Studies in Computational Intelligence 1032
Valentin V. Klimov David J. Kelley Editors
Biologically Inspired Cognitive Architectures 2021 Proceedings of the 12th Annual Meeting of the BICA Society
Studies in Computational Intelligence Volume 1032
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/7092
Valentin V. Klimov · David J. Kelley
Editors
Biologically Inspired Cognitive Architectures 2021 Proceedings of the 12th Annual Meeting of the BICA Society
Editors

Valentin V. Klimov
MEPhI National Research Nuclear University
Moscow, Russia

David J. Kelley
Boston Consulting Group
Seattle, WA, USA
ISSN 1860-949X ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-96992-9 ISBN 978-3-030-96993-6 (eBook)
https://doi.org/10.1007/978-3-030-96993-6

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
This volume contains papers presented at the 2021 Annual International Conference on Biologically Inspired Cognitive Architectures (BICA 2021), held on September 12–19, 2021, in Fukuchiyama City, Kyoto, with a follow-up electronic poster session held on November 28. BICA 2021 was held as a purely virtual conference consisting of two parallel events: (1) the BICA Event at IS4SI 2021 and (2) the BICA Workshop at IVA 2021. Here, (1) and (2) are:

1. Information in Biologically Inspired Cognitive Architectures (BICA)-based Systems at the 2021 Summit of the International Society for the Study of Information, September 12–19, 2021, Vienna, Austria, online (https://summit2021.is4si.org)
2. 2021 International Workshop on Biologically Inspired Cognitive Architectures: BICA Workshop at the 21st ACM International Conference on Intelligent Virtual Agents, September 14–17, University of Fukuchiyama, Fukuchiyama City, Kyoto, Japan, online (https://sites.google.com/view/iva2021/).

Together, (1) and (2) constitute the 2021 Annual International Conference on Biologically Inspired Cognitive Architectures, also known as the 2021 Annual International Conference on Brain-Inspired Cognitive Architectures for Artificial Intelligence (BICA*AI 2021) and as the 12th Annual Meeting of the BICA Society, September 12–19, 2021, Vienna, Austria, and the University of Fukuchiyama, Fukuchiyama City, Kyoto, Japan, online (bica2021.bicasociety.org). BICA*AI 2021 was sponsored by the BICA Society, ACM IVA 2021, AGI Lab, and NRNU MEPhI.

Biologically Inspired Cognitive Architectures (BICAs) are computational frameworks for building intelligent agents that are inspired by biological intelligence. Biological intelligent systems, notably animals such as humans, have many qualities that are often lacking in artificially designed systems, including robustness, flexibility, and adaptability to environments. At a point in time when visibility into naturally intelligent systems is exploding thanks to modern brain imaging and recording techniques, our ability to learn from nature and to build biologically inspired intelligent systems has never been greater.
At the same time, the growth in computer science and technology has unleashed enough computational power that an explosion of intelligent applications, from augmented reality to intelligent virtual agents, is now certain. The growth in these fields poses the challenge of computationally replicating all essential aspects of the human mind (the BICA Challenge), an endeavor that is interdisciplinary in nature and promises to yield a bidirectional flow of understanding between all involved disciplines.

The BICA Conference Series was preceded by the AAAI Fall Symposia on BICA held in 2008-2009. Originally, participants were mostly members of the DARPA BICA Program of 2005-2006; however, the audience expanded rapidly. As a result, in 2010 the BICA Conference Series was initiated as a separate venue, managed by a newly formed nonprofit: the BICA Society. For 11 years, the BICA conference was organized around the world (USA, Italy, Ukraine, France, Russia, Czech Republic, Brazil), demonstrating impressive and steady growth. The conference was complemented by schools and symposia on BICA, making 15 separate events in total. The BICA Society membership reached many hundreds, and the BICA Society published a number of books and journals, some of them on a regular basis. During the years of the COVID pandemic, the BICA conference was a great success in cyberspace.

There were 132 submissions to BICA 2021. Each submission was peer reviewed, and the committee decided to accept 69 papers for this volume.

December 2021
David J. Kelley Valentin V. Klimov
Organization
Program Committee

Taisuke Akimoto (Kyushu Institute of Technology, Japan)
Kenji Araki (Hokkaido University, Japan)
Joscha Bach (AI Foundation, USA)
Feras Batarseh (Virginia Tech, USA)
Paul Baxter (Plymouth University, USA)
Paul Benjamin (Pace University, New York, USA)
Galina A. Beskhlebnova (Scientific Research Institute for System Analysis RAS, Russia)
Jordi Bieger (Reykjavik University, Iceland)
Perrin Bignoli (Yahoo Labs, USA)
Douglas Blank (Bryn Mawr College, USA)
Peter Boltuc (University of Illinois at Springfield, USA)
Jonathan Bona (University of Arkansas for Medical Sciences, USA)
Michael Brady (Boston University, USA)
Mikhail Burtsev (Moscow Institute of Physics and Technology, Russia)
Erik Cambria (Nanyang Technological University, Singapore)
Suhas Chelian (Fujitsu Laboratories of America, Inc., USA)
Antonio Chella (Dipartimento di Ingegneria Informatica, Università di Palermo, Italy)
Olga Chernavskaya (P. N. Lebedev Physical Institute, Moscow, Russia)
Thomas Collins (University of Southern California (Information Sciences Institute), USA)
Christopher Dancy (Penn State University, USA)
Haris Dindo (University of Palermo, Italy)
Sergey A. Dolenko (D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Russia)
Alexandr Eidlin (Sberbank, Moscow, Russia)
Jim Eilbert (AP Technology, USA)
Thomas Eskridge (Florida Institute of Technology, USA)
Usef Faghihi (Universite de Quebec in Trois-Rivieres, Canada)
Elena Fedorovskaya (Rochester Institute of Technology, USA)
Stan Franklin (Computer Science Department & Institute for Intelligent Systems, University of Memphis, USA)
Marcello Frixione (University of Genova, Italy)
Salvatore Gaglio (University of Palermo, Italy)
Olivier Georgeon (Claude Bernard Lyon 1 University, France)
John Gero (University of North Carolina at Charlotte, USA)
Jaime Gomez (Universidad Politécnica de Madrid, Spain)
Ricardo R. Gudwin (University of Campinas (Unicamp), Brazil)
Eva Hudlicka (Psychometrix Associates, USA)
Dusan Husek (Institute of Computer Science, Academy of Sciences of the Czech Republic, Czech Republic)
Christian Huyck (Middlesex University, UK)
Ignazio Infantino (Consiglio Nazionale delle Ricerche, Italy)
Eduardo Izquierdo (Indiana University, USA)
Alex James (Kunming University of Science and Technology, China)
Li Jinhai (Kunming University of Science and Technology, China)
Magnus Johnsson (Lund University, Sweden)
Darsana Josyula (Bowie State University, USA)
Kamilla Jóhannsdóttir (Reykjavik University, Iceland)
Omid Kavehei (The University of Sydney, Australia)
David Kelley (AGI Laboratory and Boston Consulting Group, Inc., USA)
Troy Kelley (U.S. Army Research Laboratory, USA)
William Kennedy (George Mason University, USA)
Deepak Khosla (HRL Laboratories LLC, USA)
Muneo Kitajima (Nagaoka University of Technology, Japan)
Valentin Klimov (National Research Nuclear University MEPhI, Russia)
Unmesh Kurup (LG Electronics, USA)
Giuseppe La Tona (University of Palermo, Italy)
Luis Lamb (Federal University of Rio Grande do Sul, Brazil)
Leonardo Lana de Carvalho (Federal University of Jequitinhonha and Mucuri Valleys, Brazil)
Othalia Larue (University of Quebec, Canada)
Christian Lebiere (Carnegie Mellon University, USA)
Jürgen Leitner (Australian Centre of Excellence for Robotic Vision, Australia)
Simon Levy (Washington and Lee University, USA)
Antonio Lieto (University of Turin, Italy)
James Marshall (Sarah Lawrence College, USA)
Olga Mishulina (PF “Logos” LLC, Russia)
Sergey Misyurin (National Research Nuclear University MEPhI, Moscow, Russia)
Steve Morphet (Enabling Tech Foundation, USA)
Amitabha Mukerjee (Indian Institute of Technology Kanpur, India)
Daniele Nardi (Sapienza University of Rome, Italy)
Sergei Nirenburg (Rensselaer Polytechnic Institute, New York, USA)
David Noelle (University of California Merced, USA)
Natalia Yu. Nosova (National Research Nuclear University MEPhI, Russia)
Andrea Omicini (Alma Mater Studiorum–Università di Bologna, Italy)
Marek Otahal (Czech Institute of Informatics, Robotics and Cybernetics, Czech Republic)
Aleksandr I. Panov (Moscow Institute of Physics and Technology, Russia)
David Peebles (University of Huddersfield, UK)
Giovanni Pilato (ICAR-CNR, Italy)
Roberto Pirrone (University of Palermo, Italy)
Michal Ptaszynski (Kitami Institute of Technology, Japan)
Uma Ramamurthy (Baylor College of Medicine, Houston, USA)
Thomas Recchia (US Army ARDEC, USA)
Vladimir Redko (Scientific Research Institute for System Analysis RAS, Russia)
James Reggia (University of Maryland, USA)
Frank Ritter (The Pennsylvania State University, USA)
Paul Robertson (DOLL Inc., USA)
Brandon Rohrer (Sandia National Laboratories, USA)
Christopher Rouff (Johns Hopkins Applied Physics Laboratory, USA)
Rafal Rzepka (Hokkaido University, Japan)
Ilias Sakellariou (Department of Applied Informatics, University of Macedonia, Greece)
Alexei V. Samsonovich (National Research Nuclear University MEPhI, Russia)
Fredrik Sandin (Lulea University of Technology, Sweden)
Ricardo Sanz (Universidad Politecnica de Madrid, Spain)
Michael Schader (Yellow House Associates, USA)
Howard Schneider (Sheppard Clinic North, Canada)
Michael Schoelles (Rensselaer Polytechnic Institute, USA)
Valeria Seidita (Dipartimento di Ingegneria - Università degli Studi di Palermo, Italy)
Ignacio Serrano (Instituto de Automatica Industrial - CSIC, Spain)
Javier Snaider (FedEx Institute of Technology, The University of Memphis, USA)
Donald Sofge (Naval Research Laboratory, USA)
Meehae Song (Simon Fraser University, Canada)
Rosario Sorbello (University of Palermo, Italy)
John Sowa (Kyndi, Inc., USA)
Terry Stewart (University of Waterloo, Canada)
Swathikiran Sudhakaran (Fondazione Bruno Kessler, Trento, Italy)
Sherin Sugathan (Enview Research & Development Labs, India)
Junichi Takeno (Meiji University, Japan)
Knud Thomsen (Paul Scherrer Institute, Switzerland)
Jan Treur (Vrije Universiteit Amsterdam, Netherlands)
Vadim L. Ushakov (National Research Nuclear University MEPhI, Russia)
Alexsander V. Vartanov (Lomonosov Moscow State University, Russia)
Rodrigo Ventura (Universidade de Lisboa, Portugal)
Evgenii Vityaev (Sobolev Institute of Mathematics SB RAS, Russia)
Pei Wang (Temple University, USA)
Mark Waser (Digital Wisdom Institute, USA)
Roseli S. Wedemann (Universidade do Estado do Rio de Janeiro, Brazil)
Özge Nilay Yalçin (University of British Columbia, Canada)
Terry Zimmerman (University of Washington Bothell, USA)
Contents
Development of COMOS Architecture Based on a Story-Centric View of the Mind (Taisuke Akimoto)
Gibsonian Information: A New Approach to Quantitative Information (Bradly Alicea, Daniela Cialfi, Avery Lim, and Jesse Parent)
Consciousness Semanticism: A Precise Eliminativist Theory of Consciousness (Jacy Reese Anthis)
A Second-Order Adaptive Network Model for Exam-Related Anxiety Regulation (Isabel Barradas, Agnieszka Kloc, Nina Weng, and Jan Treur)
Using Boolean Functions of Context Factors for Adaptive Mental Model Aggregation in Organisational Learning (Gülay Canbaloğlu and Jan Treur)
User Group Classification Methods Based on Statistical Models (Andrey Igorevich Cherkasskiy, Marina Valeryevna Cherkasskaya, Alexey Anatolevich Artamonov, and Ilya Yurievich Galin)
From Mental Network Models to Virtualisation by Avatars: A First Software Implementation (Frank de Jong, Edgar Eler, Lars Rass, Roy M. Treur, Jan Treur, and Sander L. Koole)
Mapping Speech Intonations to the VAD Model of Emotions (Alexandra Dolidze, Maria Morozevich, and Nikolay Pak)
Security Risk Management Methodology for Distributed Ledger Systems (Anatoly P. Durakovskiy, Victor S. Gorbatov, Dmitriy A. Dyatlov, and Dmitriy A. Melnikov)
Criticism of the «Chinese Room» by J. Searle from the Position of a Hybrid Model for the Design of Artificial Cognitive Agents (Roman V. Dushkin and Vladimir Y. Stepankov)
Walking Through the Turing Wall (Albert Efimov, David I. Dubrovsky, and Philipp Matveev)
A Controlled Adaptive Network Model for Joint Attention (Dilay F. Ercelik and Jan Treur)
Means of Informational Support for the Program of Increasing the Public's Loyalty to Projects in the Field of Nuclear Energy (Anna I. Guseva, Elena Matrosova, Anna Tikhomirova, and Matvey Koptelov)
The Research of Characteristic Frequencies for Gesture-Based EMG Control Channels (Anna Igrevskaya, Alexandra Kachina, Aliona Petrova, and Konstantin Kudryavtsev)
Specification Language Based on Linear Temporal Logic for Automatic Construction of Statically Verified Systems (Larisa Ismailova, Sergey Kosikov, Igor Slieptsov, and Viacheslav Wolfengagen)
Semantic Management of Domain Modification in a Virtual Environment for Modeling Vulnerable Information Subjects (Larisa Y. Ismailova, Viacheslav E. Wolfengagen, and Sergey V. Kosikov)
Semantic Stabilization Tools for Managing the Cognitive Activity of the Subject (Larisa Ismailova, Sergey Kosikov, Igor Slieptsov, and Viacheslav Wolfengagen)
Intelligent Web-Application for Countering DDoS Attacks on Educational Institutions (Ivanov Mikhail, Radygin Victor, Sergey Korchagin, Pleshakova Ekaterina, Sheludyakov Dmitry, Yerbol Yerbayev, and Bublikov Konstantin)
Toward Human-Level Qualitative Reasoning with a Natural Language of Thought (Philip C. Jackson Jr.)
Developing of Smart Technical Platforms Concerning National Economic Security (Ksenia Sergeevna Khrupina, Irina Viktorovna Manakhova, and Alexander Valentinovich Putilov)
Toward Working Definitions of Cognitive Processes Suitable for Design Specifications of BICA (Joao E. Kogler Jr.)
Intelligent System for Express Analysis of Electrophysical Characteristics of Nanocomposite Media (Korchagin Sergey, Osipov Aleksey, Pleshakova Ekaterina, Ivanov Mikhail, Kupriyanov Dmitry, and Bublikov Konstantin)
Classification and Generation of Virtual Dancer Social Behaviors Based on Deep Learning in a Simple Virtual Environment Paradigm (Andrey I. Kuzmin, Denis A. Semyonov, and Alexei V. Samsonovich)
Possibility of Benford's Law Application for Diagnosing Inaccuracy of Financial Statements (Pavel Y. Leonov, Viktor P. Suyts, Vadim A. Rychkov, Anastasia A. Ezhova, Viktor M. Sushkov, and Nadezhda V. Kuznetsova)
Artificial Intelligence Limitations: Blockchain Trust and Communication Transparency (Sergey V. Leshchev)
Application of the Multi-valued Logic Apparatus for Solving Diagnostic Problems (Larisa A. Lyutikova)
Semantic Generalization Means Based on Knowledge Graphs (Nikolay Maksimov, Olga Golitsina, Anastasia Gavrilkina, and Alexander Lebedev)
Knowledge Graphs in Text Information Retrieval (Nikolay Maksimov, Olga Golitsyna, and Alexander Lebedev)
Digitalization of the Economy and Advanced Planning Technologies as a Way to Overcome the Economic Recession (Yulia M. Medvedeva, Rafael E. Abdulov, Daler B. Dzhabborov, and Oleg O. Komolov)
Genetic-Memetic Relational Approach for Scheduling Problems (Sergey Yu. Misyurin and Andrey P. Nelyubin)
Multicriteria Optimization of a Hydraulic Lifting Manipulator by the Methods of Criteria Importance Theory (S. Yu. Misyurin, A. P. Nelyubin, G. V. Kreinin, and N. Yu. Nosova)
The Hexabot Robot: Kinematics and Robot Gait Selection (S. Yu. Misyurin, A. P. Nelyubin, G. V. Kreynin, N. Yu. Nosova, A. S. Chistiy, N. M. Khokhlov, and E. M. Molchanov)
On the Possibility of Using the Vibration Displacement Theory in the Analysis of Ship Accident Rate Using Artificial Intelligence Systems (S. Yu. Misyurin, Yu. A. Semenov, and E. B. Semenova)
Proposal and Evaluation of Deep Profit Sharing Method in a Mixed Reward and Penalty Environment (Kazuteru Miyazaki)
Generalized Structure of Active Speech Perception Based on Multiagent Intelligence (Zalimkhan Nagoev, Irina Gurtueva, and Murat Anchekov)
Multiagent Neurocognitive Models of the Processes of Understanding the Natural Language Description of the Mission of Autonomous Robots (Z. V. Nagoev, O. V. Nagoeva, I. A. Pshenokova, K. Ch. Bzhikhatlov, I. A. Gurtueva, and S. A. Kankulov)
Exploring the Workspace of a Robot with Three Degrees of Freedom (Natalia Yu. Nosova and Sergey Yu. Misyurin)
Digitalization as a New Paradigm of Economic Progress (Svetlana Nosova, Anna Norkina, Svetlana Makar, Irina Arakelova, and Galina Fadeicheva)
Artificial Intelligence Technology as an Economic Accelerator of Business Process (Svetlana Nosova, Anna Norkina, Olga Medvedeva, Andrey Abramov, Svetlana Makar, Nina Lozik, and Galina Fadeicheva)
The Collaborative Nature of Artificial Intelligence as a New Trend in Economic Development (Svetlana Nosova, Anna Norkina, Olga Medvedeva, Svetlana Makar, Sergey Bondarev, Galina Fadeicheva, and Alexander Khrebtov)
Digital Technologies as a Process of Strategic Maneuvering in Economic Development (Svetlana Nosova, Anna Norkina, Olga Medvedeva, Irina Aracelova, Victoria Grankina, and Lidia Shirokova)
Evaluation of fMRI Data at the Individual Level (Vyacheslav A. Orlov, Sergey I. Kartashov, Denis G. Malakhov, Mikhail V. Kovalchuk, and Yuri I. Kholodny)
The Impact of Internet Media on the Cognitive Attitudes of Individuals on the Example of RT and BBC (Alexandr Y. Petukhov, Sofia A. Polevaya, and Evgeniy A. Gorbov)
Applications of the Knowledge Base and Ontologies for the Process of Unification and Abstraction of the Information System Description (Pavel Piskunov and Igor Prokhorov)
Designing an Economic Cross as a Condition for the Formation of Technological Platforms of a Digital Society (Olga Bronislavovna Repkina, Galina Ivanovna Popova, and Dmitriy Vladimirovich Timokhin)
A Physical Structural Perspective of Intelligence (Saty Raghavachary)
One Possibility of a Neuro-Symbolic Integration (Alexei V. Samsonovich)
A Comparison of Two Variants of Memristive Plasticity for Solving the Classification Problem of Handwritten Digits Recognition (Alexander Sboev, Yury Davydov, Roman Rybka, Danila Vlasov, and Alexey Serenko)
Sentiment Analysis of Russian Reviews to Estimate the Usefulness of Drugs Using the Domain-Specific XLM-RoBERTa Model (Alexander Sboev, Aleksandr Naumov, Ivan Moloshnikov, and Roman Rybka)
Correlation Encoding of Input Data for Solving a Classification Task by a Spiking Neural Network with Spike-Timing-Dependent Plasticity (Alexander Sboev, Alexey Serenko, and Roman Rybka)
The Two-Stage Algorithm for Extraction of the Significant Pharmaceutical Named Entities and Their Relations in the Russian-Language Reviews on Medications on Base of the XLM-RoBERTa Language Model (Alexander Sboev, Ivan Moloshnikov, Anton Selivanov, Gleb Rylkov, and Roman Rybka)
Causal Cognitive Architecture 2: A Solution to the Binding Problem (Howard Schneider)
Fundamentals of a Multi-agent Planning and Logistics Model for Managing the Online Industry's Reproduction Cycle (Vasily V. Shpak)
Automated Flow Meter for LPG Cylinders (Viktor A. Shurygin, Igor M. Yadykin, and Aleksandr B. Vavrenyuk)
Construction of Statically Verified System Interacting with User in Question-Answer Mode According to the Specification Set by the Formula of Linear Temporal Logic (Igor Slieptsov, Larisa Ismailova, Sergey Kosikov, and Viacheslav Wolfengagen)
Comparison of ERP in Internal Speech (Meaningful and Non-existent Words) (Alisa R. Suyuncheva and Alexander V. Vartanov)
Digital Transformation of the Economy and Industrialization Based on Industry 4.0 as a Way to Leadership and High-Quality Economic Growth in the World After the Structural Economic Crisis (Tamara O. Temirova and Rafael E. Abdulov)
The Use of the Economic Cross Model in Planning Sectoral Digitalisation (Dmitriy Vladimirovich Timokhin)
Modeling the Economic Cross of Technological Platform for Sectoral Development in the Context of Digitalization (Marina Vladimirovna Bugaenko, Dzhannet Sergoevna Shikhalieva, and Dmitriy Vladimirovich Timokhin)
Cooperative Multi-user Motor-Imagery BCI Based on the Riemannian Manifold and CSP Classifiers (Sergey A. Titaev)
Sentiment Analysis of Social Networks Messages (Evgeny Tretyakov, Dobrica Savić, Anastasia Korpusenko, and Kristina Ionkina)
Smart Technologies in REM Production Economy in Russia (Victoria Olegovna Pimenova, Gusov Zakharovich Auzby, and Evgeniy Valerievich Trubacheev)
fMRI Study of Brain Activity in Men and Women During Rhythm Reproduction and Measuring Short Time Intervals (V. L. Ushakov, S. I. Kartashov, V. A. Orlov, M. V. Svetlik, and V. Yu. Bushov)
Development of the Intelligent Object Detection System on the Road for Self-driving Cars in Low Visibility Conditions (Nikita Vasiliev, Nikita Pavlov, Osipov Aleksey, Ivanov Mikhail, Radygin Victor, Ekaterina Pleshakova, Sergey Korchagin, and Bublikov Konstantin)
Application of Machine Learning for Solving Problems of Nuclear Power Plant Operation (V. S. Volodin and A. O. Tolokonskij)
The Architecture of Cognition as a Generalization of Adaptive Problem-Solving in Biological Systems (Andy E. Williams)
Cognitive System for Traversing the Possible Worlds with Individual Information Processes (Viacheslav Wolfengagen, Larisa Ismailova, Sergey Kosikov, and Sebastian Dohrn)
Integrated Multi-task Agent Architecture with Affect-Like Guided Behavior (James B. Worth and Mei Si)
The Control Algorithm of Compressor Equipment of Automobile Gas-Filling Compressor Stations with Fuzzy Logic Elements (Andrew A. Evstifeev, Margarita A. Zaeva, and Nadezhda A. Shevchenko)

Author Index
Development of COMOS Architecture Based on a Story-Centric View of the Mind

Taisuke Akimoto
Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
[email protected]
Abstract. A general aim in studies on cognitive architectures or artificial cognitive systems is a systematic framework and computational principles for achieving comprehensive, human-analogous artificial intelligence. We have taken a story-centered approach to this issue during the past few years. The essential idea of this approach is to construct the foundation of an artificial cognitive system as the internal movement of stories. The internal movement of stories is considered to be the mid-level cognition between a lower level associated with neural/bodily processes and a higher level associated with symbolic/manipulative processes. From this standpoint, we present the architecture of the cognitive monogatari (narrative/story) system (COMOS). Parallel distributed processing is adopted as the central principle of the system in dealing with the complex process at the mid-level. The development of COMOS is a continuous long-term project, and the present implementation remains at an early stage. However, the work presented in this paper mainly includes the following two contributions: First, generic concepts for constructing COMOS are systematized. Second, the implementation of the COMOS architecture is presented along with primitive process scenarios.

Keywords: Story · Mid-level cognition · Parallel distributed architecture · Connection · Combination
1 Introduction

A common aim in studies on cognitive architectures or artificial cognitive systems [1–4] is to seek systematic frameworks and computational principles for building comprehensive, human-analogous intelligent systems or agents. This goal overlaps with the original aim of artificial intelligence (AI). Although various perspectives on the general basis of intelligence currently exist, story cognition has been considered one of the essential aspects of human intelligence. In particular, Schank et al. proposed knowledge and memory models focusing on story cognition [5, 6]. Winston also emphasized the importance of story cognition, based on his strong story hypothesis: “Our inner language enables us to tell, understand, and re-combine stories, and those abilities distinguish our intelligence from that of other primates” [7]. Our approach to an artificial cognitive system is influenced by these pioneering approaches and incorporates various other perspectives.
We have developed the concept of a story-centered approach over the past few years [8–10]. Its basic idea is to construct the foundation of an artificial cognitive system as the internal movement of narrative-form information. A narrative is a universal form of human communication about the world in which events and things are organized from the perspective of the individual. A narrative can represent various contents, including past, present, future, hypothetical, and fictional events, regardless of whether they are based on experience, communicated knowledge, or imagination. The story-centered approach is based on an analogy from this social information to the internal information of a cognitive system. The term “story” is used specifically to distinguish this internal information from a narrative in the original sense. From this perspective, the foundation of a human cognitive system is considered to generate stories internally by interacting with the environment. This view is partially inspired by autopoiesis [11]; that is, the semi-closed generative movement of stories inside a cognitive system is emphasized as the basis of cognition.

The internal movement of stories is considered to be mid-level cognition, between a lower level associated with neural/bodily processes and a higher level associated with symbolic/manipulative processes. Thus, the story-centered approach takes a three-level perspective on cognition and focuses on the mid-level. Although story cognition may seem to be a higher-level symbolic cognition, there are two main reasons for placing it at the mid-level. First, a story as a world representation must be integrative information in which the sensory/bodily and conceptual dimensions are mixed. Second, the generative process of stories involves a high complexity in nature, and hence modeling it as symbolic manipulation with central control is unsuitable.

In dealing with the complex process at the mid-level, we adopt a connectionist approach rooted in the parallel distributed processing movement [12]. However, our approach differs from recent major connectionist approaches in several respects. First, whereas most recent connectionist studies adopt neural networks, our perspective is on an upper abstraction level, that is, the information level. A similar abstraction level is seen in earlier parallel distributed processing models, and we have also been inspired by Minsky's theory of a multi-agent cognitive system [13]. Second, a common essential mechanism among major connectionist models is categorical cognition or pattern recognition. In contrast, the core of COMOS is a memory system in which internal stories are generated and organized, and its principal operations are associative, relational, and combinational processes.

Studies on cognitive architectures/systems generally pursue two objectives: the computational elucidation of cognition, with a scientific purpose, and the creation of human-analogous machines, with an engineering purpose. This study places its main emphasis on the latter.
The broad design directions of COMOS are as follows: (a) the potential to be a generic foundation of various types and levels of cognition (including memory functions, future-oriented action perception, communication, personality, imagination, creativity, and integrative cognition); (b) internal consistency of the system at both the conceptual and implementation levels; (c) the conceptual and structural economy of the system; and (d) a qualitative similarity of observable behaviors to those of humans.

The development of COMOS is a long-term project, and the approach presented in this paper is at an early stage.
The potential of COMOS for direction (a) is still only a hypothesis and will be gradually proven through continuous development. In directions (b) and (c), the internal consistency and economy of the system are supported by an implementation based on systematic concepts derived through the system construction. Issues of direction (d) are not addressed in this paper and will be supplemented in the future from narrower perspectives on particular cognitive behaviors. Thus, the approach presented in this paper mainly includes the following two contributions: First, generic concepts for constructing COMOS are systematized (Sect. 2). Second, the architecture of COMOS based on an implementation is presented (Sect. 3) along with primitive process scenarios (Sect. 4). However, technical details are omitted in this paper owing to limited space.
2 Generic Concepts

In this section, we systematize generic or inclusive concepts for constructing the COMOS architecture. Section 2.1 describes the concept of a story as an internal information structure. Sections 2.2 and 2.3 describe concepts for the internal structure and movement of COMOS.

2.1 Story as Internal Information Structure

In this study, a story is considered to have a common form among internal world representations. In our previous study, we proposed Cogmic Space as a representational framework for a story as a world representation [10]. Cogmic Space is designed using an analogy from narrative comics. Narrative comics can be seen as a comprehensive sign system for expressing stories involving rich conceptual and modal information. Analogously, Cogmic Space is intended to provide a mid-level representation in which the conceptual and sensory/bodily dimensions of the world are mixed. The structure of a story based on Cogmic Space is summarized below (a data-model sketch follows the list):

• Story: A story consists of a temporally organized sequence (or network) of panels.
• Panel: A panel represents a temporal segment involving both static and dynamic aspects of the world. In a panel, various types of information elements, broadly classified into objects and motions, are organized spatially and relationally.
• Object: An object represents a character, thing, speech, thought, sound, feeling, effect, or other type of perceptual or conceptual element, irrespective of the presence of physical substances in the external world.
• Motion: A motion represents a physical, mental, or social action in which one or more objects are involved.
• Relationship: An organic structure among story elements (i.e., objects and motions) is formed through mutual relationships between elements inside or across panels. This makes an element not an isolated, but rather a contextual, unit. Conceivable qualitative dimensions of relationships include sequentiality, continuity, change, causality, intentionality, spatiality, and conceptual relatedness.
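The paper gives no code for Cogmic Space, but the listed structure maps naturally onto algebraic data types. The following is a minimal Scala sketch (Scala is the paper's implementation language; see Sect. 3); all type and member names are illustrative assumptions, not the actual COMOS code.

```scala
// Hypothetical sketch of the Cogmic Space story structure described above.
object CogmicSpace {

  // Qualitative dimensions of relationships between story elements.
  sealed trait RelationalFeature
  case object Sequentiality extends RelationalFeature
  case object Continuity extends RelationalFeature
  case object Change extends RelationalFeature
  case object Causality extends RelationalFeature
  case object Intentionality extends RelationalFeature
  case object Spatiality extends RelationalFeature
  case object ConceptualRelatedness extends RelationalFeature

  // A story element is either an object or a motion.
  sealed trait Element { def id: String }

  // A character, thing, speech, thought, sound, feeling, effect, etc.,
  // irrespective of physical substance in the external world.
  final case class Obj(id: String, label: String) extends Element

  // A physical, mental, or social action involving one or more objects.
  final case class Motion(id: String, label: String, involved: List[Obj]) extends Element

  // A relationship between two elements, inside or across panels.
  final case class Relationship(from: String, to: String, feature: RelationalFeature)

  // A panel: a temporal segment organizing elements spatially and relationally.
  final case class Panel(elements: List[Element], relations: List[Relationship])

  // A story: a temporally organized sequence (or network) of panels.
  final case class Story(panels: Vector[Panel])
}
```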
Conscious and Subconscious Dimensions. In Cogmic Space, the conscious and subconscious (unconscious) levels of a story-form representation and its generative process are distinguished. The term “story” is used particularly to refer to a subconscious-level representation, under the assumption that the generative process of a story involves a high complexity unsuitable for central control. This subconscious generative process is considered a self-organization through parallel distributed operations. A conscious-level representation is referred to as an “inner discourse.” An inner discourse is dynamically generated when a part of a story reaches the conscious level. This conscious generative process is conceptualized as “inner narrating.” Both representation levels are based on the Cogmic Space framework. However, a story is a relatively fragmented and incomplete representation, whereas an inner discourse is a more organized representation.

Concrete and Generalized Dimensions. A story represents concrete and unique events and objects. However, events and objects also have a repetitive nature, or sameness, among different stories. For example, the girl Lisa playing baseball at a park today and Lisa eating a steak at a restaurant yesterday can be recognized as the same person, whereas both have certain differences. The girl Rosa, a member of Lisa's baseball team, is not the same person as Lisa, but they have commonalities, including being “girls,” “baseball players,” and “humans.” Different event sequences in similar situations, for example, eating at a restaurant, may have common patterns [5]. The process of constructing general structures is assumed to be a generalization of stories or inner discourses through a semi-isomorphic relationship between the concrete and generalized dimensions.
2.2 Basics of Internal Structure of COMOS

The essence of COMOS is the internal movement of stories. Thus, the internal structure of COMOS in this context refers to the informational structure of story-form representations across the conscious–subconscious and concrete–generalized levels. The place in which the internal structure emerges is a memory system, the core of COMOS. We adopt a network structure to organize all story-form representations as a whole. A basic information unit (node) and connection (link) constituting this network are conceptualized as a memory actor (M-actor) and a connector, respectively. These constituents are illustrated in Fig. 1.
Fig. 1. Basic structures of an M-actor (x) and a connector. The internal structure of a connector is a bundle of relational fibers.
M-Actor. An M-actor is the upper class of the basic memory elements, including a story, story element (object or motion), concept, schema, and sensory/motor pattern (these elements are described in Sect. 3.1). The functional nature of an M-actor is mainly formed by its relative relationships with other M-actors and by its internal state, including its activation level. An M-actor has internal processes that operate from a local perspective. There are two basic processes of an M-actor: connection formation (coordinating the strengths and relationships of connectors to other M-actors) and activation (calculating its own activation level based on the activation levels of adjacent M-actors).

Connector. A connector is a directed link from one M-actor (the tail) to another (the head). A connector is held and managed by its tail M-actor. As shown in Fig. 1, a connector has an internal structure, a bundle of relational fibers, that forms rich relational information. A relational fiber corresponds to a relational feature with a strength. A relational feature is a qualitative type of relationship between M-actors based on their appearances in a story; relational fibers are thus associated with the “relationships” of Cogmic Space (see Sect. 2.1). Hence, a connector can be interpreted as a relational feature vector in which each fiber is a vector component. This multidimensional connection enables dynamic weighting of a connector depending on the perspective in which particular types of relationships are emphasized or de-emphasized. The mechanism controlling such perspectives is referred to as the lens of story association (LenSA). A simple implementation of LenSA is a weight vector, where the integrated weight of a connector is derived from the inner product of the feature and weight vectors (with some subsequent function). A code sketch of these structures is given after Fig. 2.

Connectors can be distinguished in terms of duration: (a) continuous (relatively long-term) connectors formed through gradual change and (b) temporary (relatively short-term) connectors generated and disappearing from moment to moment. Generally speaking, the former fit the generalized dimension formed through the accumulation of concrete experiences, whereas the latter fit the internal structure of a story or inner discourse, which has a dynamic and one-time-only nature.

2.3 Basics of the Internal Movement of COMOS

The internal movement of COMOS, including the generation and organization of the network of M-actors, is conceptualized by three processes: spreading activation, connection, and combination.

Spreading Activation. When a part of a story in the memory system receives attention through inner narrating, a spreading activation among M-actors occurs from the focused part to the surrounding M-actors. This process creates a temporary organization of relevant M-actors as a distribution of their activation levels. It provides utility information that can be used in various cognitive processes.

Connection. Forming connectors between M-actors is essential for the movement of the network. We use the term relational connection to refer to a direct connection, that is, a connector, between M-actors (Fig. 2a). In addition, two types of higher structural connections, similar to analogical mapping at a relational level [14], are defined as follows:
• A similarity connection (Fig. 2b) indirectly connects an M-actor to another M-actor through a mapping between their connectors to adjacent M-actors. Furthermore, the degrees of similarity and difference between the connected M-actors are quantified based on this mapping.
• A systematic connection (Fig. 2c) is formed by two or more maps between two partial structures. These maps are formed through similarity connections. Furthermore, the degrees of similarity and difference between the connected structures are quantified based on this mapping.
Fig. 2. Three levels of connections. Circles, solid arrows, and dashed arrows denote M-actors, connectors, and mapping, respectively.
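To make the fiber-bundle connector and the LenSA weighting concrete, here is a minimal Scala sketch building on the CogmicSpace sketch above. The local activation-update rule, the clamp used as the “subsequent function,” and all names are illustrative assumptions; the paper does not publish this code.

```scala
// Hypothetical sketch of M-actors, fiber-bundle connectors, and LenSA.
object MActorSketch {

  // A connector: a bundle of relational fibers (feature -> strength),
  // held by its tail M-actor and pointing at a head M-actor.
  final case class Connector(head: MActor, fibers: Map[String, Double])

  // LenSA: a weight vector over relational features. The integrated weight
  // of a connector is the inner product of fiber strengths and weights,
  // passed through a subsequent function (here, a clamp to [0, 1]).
  final case class LenSA(weights: Map[String, Double]) {
    def integratedWeight(c: Connector): Double = {
      val dot = c.fibers.map { case (f, s) => s * weights.getOrElse(f, 0.0) }.sum
      math.max(0.0, math.min(1.0, dot))
    }
  }

  final class MActor(val id: String) {
    var connectors: List[Connector] = Nil
    var activation: Double = 0.0

    // One distributed step of spreading activation, computed locally:
    // decay one's own activation and take in LenSA-weighted activation
    // from adjacent M-actors (the aggregation rule is an assumption).
    def updateActivation(lens: LenSA, decay: Double = 0.9): Unit = {
      val incoming = connectors.map(c => lens.integratedWeight(c) * c.head.activation)
      val best = if (incoming.isEmpty) 0.0 else incoming.max
      activation = math.max(activation * decay, best)
    }
  }
}
```

Repeating `updateActivation` across all M-actors approximates the spreading activation described in Sect. 2.3 without any central controller, since each node reads only its own connectors.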
Combination. A combination is a process of composing a new structure from two existing structures. A generated structure may be more complex, rich, or general than the original structures. The positioning of combination as a generic process type of COMOS is rooted in similarity-based theories of creative cognition, particularly the conceptual blending theory developed by Fauconnier and Turner [15], including computational studies based on that theory [16, 17]. The original conceptual blending theory explains the fundamental mechanism of human creative cognition as the production of a novel concept through the combination of different familiar concepts, and it has certain similarities with analogical cognition.

As shown in Fig. 3a, the cognitive structure of conceptual blending comprises three types of mental spaces: input, generic, and blended. A mental space refers to a relatively small conceptual packet that represents a situation, event, scene, or object. An input space provides source information for composing a blended space (or blend). The blend is constructed based on a shared structure (the generic space) through cross-space mapping between the input spaces. In COMOS, a similarity or systematic connection becomes a cross-space mapping for blending, and each side of the connection becomes an input space (Fig. 3b–c). The generic space is assumed to be an implicit structure underlying the formation of a similarity or systematic connection. Furthermore, similarity or systematic connections are also formed between each input space and a blend, producing a fundamental driving force of blending based on the similarity and difference between the blend and the input spaces.
Fig. 3. Generative structures of blending. The basic diagram in (a) is based on [15].
3 Architecture of COMOS

The concepts described in the previous section are synthesized at the computational level in the COMOS architecture. Its actual development has an exploratory nature; that is, the concepts and the architecture have been systematized through system development, including our preceding studies [10, 18, 19]. The implementation language is Scala, a multi-paradigm language that combines object-oriented and functional programming.

Figure 4 shows the overall structure of COMOS. The core of COMOS is a memory system in which multidimensional movements of story-form representations occur. The other two parts, an inner narrating module and the action, perception, and language systems (APL), are each coupled with the memory system.
Fig. 4. Overall structure of COMOS. The processes denoted by dashed lines are unimplemented or extremely limited in the present implementation.
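As a rough orientation, the tripartite structure of Fig. 4 could be expressed in Scala along the following lines. The trait boundaries and method signatures are our own guesses for illustration (building on the earlier sketches), not the actual COMOS API.

```scala
// Hypothetical top-level decomposition mirroring Fig. 4.
// Story and MActor refer to the sketches given earlier.
import CogmicSpace.Story
import MActorSketch.MActor

trait MemorySystem {
  def memoryOrganization: Seq[MActor] // continuous network of M-actors
  def generativeSpace: Seq[Story]     // activated stories; temporary structures
  def innerDiscourse: Option[Story]   // conscious-level representation
}

trait InnerNarratingModule {
  def attend(target: MActor): Unit        // attention: triggers inner discourse generation
  def manipulate(discourse: Story): Story // adds higher conceptual structures
}

trait APL { // action, perception, and language systems
  def perceiveOrComprehend(input: String): Unit // inward: environment -> stories
  def actOrExpress(discourse: Story): Unit      // outward: inner discourse -> behavior
}

final class Comos(
    val memory: MemorySystem,
    val narrating: InnerNarratingModule,
    val apl: APL
)
```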
3.1 Memory System

The memory system consists of three dimensions, a memory organization, a generative space, and an inner discourse, in which a higher dimension forms more integrative and narrower structures based on the lower dimension. These are outlined below from the lower to the higher dimensions.

The memory organization forms a continuous network among M-actors. Its internal structure is divided into the following four regions:

• Story region: Stories are contained in this region, regardless of whether they are based on actual experience. A story has an internal structure based on Cogmic Space, and each element (a story, panel, object, or motion) works as an M-actor.
• Concept region: Concepts corresponding to linguistic symbols are organized in this region. These concepts comprise two sub-classes: individual concepts for identical objects, places, or times, and general concepts for general words. The essential information of a concept is formed as connectors to other concepts, built up through the accumulation of its appearances in stories or inner discourses.
• Schema region: Schemas associated with concepts and stories are organized in this region. A schema forms connectors to concepts over a relatively macroscopic timescale, based on the co-appearances of concepts throughout a flow of inner discourses based on a story.
• Sensory/motor pattern region: This region comprises vector spaces for organizing multimodal features associated with concrete story elements and generalized concepts. It is extremely limited in the present implementation because there is no actual body or APL.

The generative space involves the dynamic, generative processes of stories at the subconscious level. Activated stories, including those in an ongoing generative process, are incorporated into this space. Furthermore, higher (similarity and systematic) connections and generative structures of blending are constructed in this space. These structures are temporarily generated and are not held long-term.

An inner discourse, as a conscious-level representation, is dynamically generated based on a story in the generative space. It is closely related to the inner narrating module.

3.2 Inner Narrating and Memory System

Inner narrating drives a conscious-level generative process from outside the memory system. It is particularly involved in the generative space and the inner discourse through three process types: attention, appearance, and manipulation. The attention process is directed from the inner narrating module to the generative space. The inner narrating module observes the generative space and provides one or a few attentional signals to one or a few parts of the active stories. An attentional signal triggers the generation of an inner discourse. The generated inner discourse then appears in the inner narrating module as a manipulatable object. The manipulation process is directed from the inner narrating module to the inner discourse and may add higher conceptual structures to an inner discourse (analogous to conscious thinking about the represented world).
3.3 APL and Memory System

The role of the APL is to mediate between the memory system and the external environment. This rough segmentation (lumping together action, perception, and language) is used because its internal mechanism has yet to be developed. Conceptually, the APL and the memory system are coupled in the form of bidirectional processes between action/expression and perception/comprehension. Action and expression are outward processes, assumed to be intentional processes based on or driven by an inner discourse. Perception and comprehension are inward processes that generate stories based on perceptual and/or linguistic information from the environment. This process is associated with the generative space because it is based on the complex, total operations of the memory system.
4 Process Scenarios

Whereas COMOS is based on a parallel distributed architecture with no central control, achieving an integrated operation remains a difficult issue. Our developmental strategy is conceptualized from the following three-level perspective on system operation:

• A microscopic perspective focuses on the internal processes of each distributed element. In this context, each element operates automatically with the surrounding information captured from a local perspective.
• A macroscopic perspective focuses on the (semi- or quasi-) autonomous integrative operations of the entire system. Although we have no clear solution for realizing this level, it is assumed that operations at this level must emerge from lower operations.
• A mesoscopic perspective bridges the gap between the microscopic and macroscopic operations. It focuses on compound operations consisting of lower distributed operations and/or other compound operations. Whereas microscopic operations are semantically abstract, mesoscopic operations are associated with relatively general cognitive terms, potentially including association, remembering, generalization, analogy, blending, perception, action, comprehension, and expression. A mesoscopic operation works automatically in a given internal context, typically provided by a trigger with or without input information from other operations. However, there is no autonomy, because the internal context of an operation must be produced from outside.

Our immediate focus is on the relationship between the microscopic and mesoscopic levels, and two process scenarios at the mesoscopic level are presented below.

4.1 Scenario 1: Network Formation in a Story and Memory Organization

The first scenario focuses on the coupled processes of forming a temporary experience and a continuous memory organization. More precisely, a temporary network inside a story is generated and then inscribed into the network of the memory organization. These two processes are referred to as inner discourse generation and memory network formation, respectively. A story in this scenario is a mostly static representation; that is, the scenario does not include the generation of the content elements of a story.
A procedural outline of this scenario is as follows (see also Fig. 5):

1. Inner discourse generation: An inner discourse (as the relational structure of a temporary experience) is generated through the following three steps:
   a. Making an attention center: The inner narrating module gives an attentional signal to a story element, that is, a motion or object in a panel.
   b. Internal-story connection: Relational connections around the attention target are generated through the spreading of attention levels and the formation of temporary connectors by the distributed operations of the story elements.
   c. Meta-level connection: Some of the story elements receive connections from a meta-story perspective (conceptualized as the narrator-self). For example, meta-level connections potentially represent temporal and personal relationships corresponding to the tense and person in a language, and beliefs in the reality or fictionality of events.
2. Memory network formation: The relational structure of the inner discourse is inscribed into the network of the memory organization through a partially isomorphic relationship. This process comprises the following two steps:
   a. Translation: The relational structure of an inner discourse is translated into a relational structure between concepts and schemas. A story element preliminarily has connectors to zero or more concepts in the memory organization (e.g., an object John is connected with the concepts "John" and "boy"). Hence, a connector between story elements can be translated into connector(s) between their associated concepts. For example, in a simplified intuitive notation, a connector c_d = [m_eat → o_John (r_agent: 0.7, r_coappearance: 0.3)] in an inner discourse (including a part expressed as "John eats _") is translated into a connector c_t = [g_eat → d_John (r_agent: 0.7, r_coappearance: 0.3)], where m, o, g, d, and r denote a motion, object, general concept, individual concept, and relational fiber, respectively.
   b. Assimilation: In the memory organization, each M-actor (particularly a concept or schema) incorporates translated connectors whose tail overlaps with itself. Within an M-actor, the incorporated connectors are mapped to existing connectors based on a match in the head M-actor. The mapped connectors are assimilated into the corresponding connectors, and unmapped connectors become new connectors. The actual process of assimilation includes somewhat more complicated mechanisms, including a weakening of relational fibers over time, a strengthening and stabilization of relational fibers through accumulating experiences, and a coordination of fiber strengths among connectors.
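The assimilation step can be pictured with a small Scala sketch over the fiber maps from the earlier connector sketch. The decay and accumulation constants, and the cap at 1.0, are illustrative assumptions; the paper states only that such mechanisms exist.

```scala
// Hypothetical sketch of assimilation: merging translated connectors into
// an M-actor's existing connectors (head-concept id -> relational fibers).
object AssimilationSketch {
  type Fibers = Map[String, Double] // relational feature -> strength

  def assimilate(
      existing: Map[String, Fibers],
      translated: Map[String, Fibers],
      decay: Double = 0.99, // weakening of fibers over time (assumed value)
      rate: Double = 0.1    // strengthening through accumulated experience
  ): Map[String, Fibers] = {
    // 1. Weaken all existing fibers slightly (forgetting).
    val decayed = existing.map { case (head, fs) =>
      head -> fs.map { case (f, s) => f -> s * decay }
    }
    // 2. Mapped connectors are assimilated; unmapped ones become new.
    translated.foldLeft(decayed) { case (acc, (head, incoming)) =>
      val old = acc.getOrElse(head, Map.empty: Fibers)
      val merged = (old.keySet ++ incoming.keySet).map { f =>
        val s0 = old.getOrElse(f, 0.0)
        val s1 = incoming.getOrElse(f, 0.0)
        f -> math.min(1.0, s0 + rate * s1) // accumulate evidence, capped
      }.toMap
      acc + (head -> merged)
    }
  }
}
```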
Fig. 5. Outline of scenario 1.
Fig. 6. Outline of scenario 2.
4.2 Scenario 2: Internal Movement of Generative Space

The second scenario focuses on the internal movement of the generative space, including story association, the formation of higher structural connections, and the blending of stories. In general terms, these processes are the foundation of memory retrieval, of the computation of similarity and difference between events or objects, and of memory-based creativity, including analogical cognition and blending.
A procedural outline of this scenario is as follows (see also Fig. 6):

1. Story association through spreading activation: This scenario starts by giving an attentional signal to a story's element. This causes the generation of an inner discourse (as described in Scenario 1), and a spreading activation then occurs in the memory organization through parallel distributed operations among M-actors. Stories containing activated elements then rise into the generative space and are accessible during subsequent processes.

2. Similarity connection: The next process is to generate a similarity connection from the attention center (base, b) to another story's element (target, t). This process consists of three steps:
   a. Target setting: The target of the connection, t, is chosen from the activated elements of the stories. A simple method for choosing t is to prioritize a higher activation level. The chosen t also receives a (sub-)attentional signal to generate a temporary network around t.
   b. Mapping: The connectors of b are mapped to the connectors of t based on their similarities. The similarity between connectors is defined as the synthesis of a relational similarity (between bundles of relational fibers) and an elemental similarity (between head M-actors). This mapping is constructed on a one-to-one basis, and connectors with no corresponding connector are mapped to/from a blank.
   c. Quantification: The degrees of similarity and difference between b and t, denoted as sim_α(b, t) and diff_α(b, t), are quantified based on the above mapping.

3. Systematic connection: The next, optional, process is to generate a systematic connection from a partial structure in a story (base, B) to a partial structure in another story (target, T). In this context, a partial structure is a temporary network around an attention center. This procedure is analogous to a similarity connection:
   a. Target setting: The network around t of a similarity connection becomes T of a systematic connection.
   b. Mapping: The elements of B are mapped to the elements of T on a one-to-one basis. Here, each map is formed by a similarity connection, and mapping is applied based on the maximum-similarity (sim_α) priority. Elements with no corresponding elements are mapped to/from a blank.
   c. Quantification: The degrees of similarity and difference between B and T, denoted as sim_β(B, T) and diff_β(B, T), are quantified based on the above mapping.

4. Blending: Two input spaces, S_l and S_r, and a mapping from S_l to S_r, M_lr, are provided by a similarity or systematic connection. These input spaces are blended to create a new story or to update the original story of S_l. This process is divided into the following two steps:
   a. Combination: A blended space, S_b, is composed by combining each pair in M_lr in a step-by-step fashion. Three simple operations for paired elements are
defined through a selection and/or combination of the elements, including their associated concepts: a) choosing one side, b) making a simile, and c) creating a compound concept. For example (in a simplified intuitive notation), when S_l = [m_1: kick (r_agent o_1: fighter) (r_object o_2: robber)], S_r = [m_2: blow_up (r_agent o_3: soldier) (r_object o_4: building)], and M_lr = (m_1–m_2, o_1–o_3, o_2–o_4), a possible blend is S_b = [m_3: kick as-if-to blow_up (r_agent o_5: soldier_fighter) (r_object o_6: robber)] (see the sketch following this outline).
   b. New addition or integration: There are two options for the destination of the blended structure. (1) If S_b has continuity from S_l, S_b can be integrated into the original story of S_l. A general meaning of this process is to complement or enrich the original story by incorporating information from S_r; thus, it is similar to analogical reasoning. (2) If S_b is treated as an imagination with a discontinuity from the input spaces, S_b becomes a new representation in the generative space (and may be inscribed into the memory organization when it receives attention through an inner narration).
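A minimal sketch of the combination step follows, under the simplifying assumption that elements are plain strings and that one of the three operations is chosen per mapped pair; the function names and the choice rule are hypothetical, not part of COMOS.

```python
# A minimal sketch of the combination step in blending. Illustration only:
# elements are plain strings, and a mode string selects one of the three
# simple operations for each mapped pair.

def combine(left: str, right: str, mode: str) -> str:
    """Apply one of the three simple operations to a mapped pair."""
    if mode == "choose_left":            # a) choosing one side
        return left
    if mode == "simile":                 # b) making a simile
        return f"{left} as-if-to {right}"
    return f"{right}_{left}"             # c) creating a compound concept

# M_lr from the example: (m_1-m_2, o_1-o_3, o_2-o_4)
M_lr = [("kick", "blow_up"), ("fighter", "soldier"), ("robber", "building")]
modes = ["simile", "compound", "choose_left"]

S_b = [combine(l, r, mode) for (l, r), mode in zip(M_lr, modes)]
print(S_b)  # ['kick as-if-to blow_up', 'soldier_fighter', 'robber']
```

Run on the example mapping, the sketch reproduces the blend S_b given in the text; in COMOS, the choice of operation per pair would itself be an internal operation rather than a fixed list.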
5 Conclusion and Remaining Issues

In this paper, the COMOS architecture was presented, with an emphasis on its holistic and theoretical aspects. Although empirical evidence is still lacking, we believe that to produce engineering knowledge for comprehensive AI (i.e., cognitive architectures/systems), it is important not only to accumulate solid building blocks but also to construct a holistic theory (hypothesis) in an abductive fashion. Although the proposed COMOS architecture is mostly unsophisticated at the element level, essential issues remain at the system level. We provide short accounts of two such issues.

The first remaining issue is relevant to Scenario 1 (Sect. 4.1). A key point of COMOS is that it lumps together various types of experiences, such as environmental or communicative experiences, internal experiences (e.g., recollection and daydreaming), and their hybrids. However, the qualitative differences among them are not considered in memory network formation, even though they seem essential to the nature of memory and learning. This issue must be addressed from multiple perspectives, such as a story's informational quality, the emotional values and meta-level beliefs (e.g., real or fantasy) associated with events and objects, and the effects of prediction/intent on network formation.

The second remaining issue is the orientation and constraints of the generative movement of stories. In Scenario 2 (Sect. 4.2), most operation selections (e.g., connection targets and the manner of combination used in blending) are conducted based on internal abstract metrics including the activation level, similarity, and difference. However, we consider that there are three missing pieces: a) goal- or future-orientedness in story generation, b) maintenance of the internal consistency of a story, with detection of inconsistencies caused by local changes in the story, and c) coordination between a story and the represented environment through APL. In the future, we should seek an emergent theory that synthesizes these missing pieces.
Acknowledgement. This work was supported by JSPS KAKENHI, Grant Number JP18K18344, and the Support Center for Advanced Telecommunications Technology Research Foundation. We would like to thank Editage (www.editage.com) for English language editing.
Gibsonian Information: A New Approach to Quantitative Information

Bradly Alicea^{1,2}(B), Daniela Cialfi^{1,3}, Avery Lim^{1,4}, and Jesse Parent^{1}

1 Orthogonal Research and Education Laboratory, Champaign-Urbana, IL, USA
[email protected]
2 OpenWorm Foundation, Boston, MA, USA
3 University of Chieti-Pescara, Pescara, Italy
4 Center for Enabling EA Research and Learning, Blackpool, UK
Abstract. We propose a new way to quantitatively characterize information: Gibsonian Information (GI). GI provides a means to characterize how agents extract information from direct perceptual signals. In this paper, we characterize GI quantitatively and contrast it with rival approaches to quantitative information. Our formulation differs from existing approaches to measuring information in two ways. The first is an emphasis on sensory processing and the dynamic evolution of such interactions. The second is that GI provides a means to measure a generalized indicator of nervous system input and can be characterized in terms of multisensory integration and collective behavior. Overall, GI enables a differential system distinguishing motion (information) from random noise/stasis (non-information) that can potentially be applied to a wide range of problem domains.

Keywords: Ecological psychology · Information Theory · Development and plasticity · Naturalistic perception
1 Introduction

Information can be difficult to measure and define. With respect to cognition and the brain, information is a polysemic concept [1] that can be formalized in a number of different ways, ranging from Shannon Information [2] and Bayesian Surprise [3] to Integrated Information Theory [4]. We introduce an alternative way to measure and otherwise characterize information content based on direct perception and ecological perspectives on perception. This perspective allows us to emphasize the nature of information as a continuous dynamical phenomenon, dependent on both an embodied observer and multiple sensory modalities. Additionally, two properties make our alternative perspective unique: the embodiment of the perceptual agent and the role of an observer. Embodiment is particularly critical to this perspective, and our approach can be applied to many different types of agents, from biological cells to multicellular organisms with nervous systems, and from computational agents to multi-agent collectives. One core notion in GI is that information arises from inhomogeneities and motion in the environment. This differs from Shannon Information, where information
content is synonymous with diversity and unpredictability [5]. In particular, the ability of an observer to detect motion against a static background and to exploit spatiotemporal distinctions in environmental information is key to the utilization of GI. The combination of an embodied context, in which there are multiple modes of sensory input, and interpretable motion provides affordances that evoke structure in the environment. While we stop short of proposing a mathematical theory of affordances, our mathematical formulation of GI relies on the structure provided by affordances as an external source of information that is encoded into the nervous system through perceptual processes. In a manner similar to Shannon Information, we can define GI in terms of an ensemble. A GI ensemble is a sequence of events observed in a spatial array. This results in a distribution that is not binomial but may instead take the form of any number of continuous distributions. We can think of a movement information signal with rich spatial structure as an Exponential distribution and random background noise as a Gaussian distribution. This is similar in principle to the concept of visual momentum [6], the reinforcement of relevant data to support an effective distribution of direct perceptual information, which allows an agent to extract and integrate information over time and is driven by spatial parallelism. Athalye et al. [7] have observed that neural activity patterns leading to the reinforcement of environmental structure receive more frequent feedback. High visual momentum results from a continuity of action, while low visual momentum results from discontinuous action. The relationship between perceptual motion and spatiotemporal distributions yields an information measurement captured as a parameter value or scalar representing the degree of change from one time point to the next. This provides us with a dynamical behavior rather than a discrete measure of bits.
2 Mathematical Definitions

GI is defined by motion or, more generally, by the distribution of objects in space and time. Unlike Shannon Information, GI is defined by the arrival of environmental information in space and time, ideally described statistically in the form of a Poisson distribution. This suggests parallels between GI and Fisher Information. The statistical signature of GI is embedded in a spatiotemporally-dependent external communication channel which delivers sensory input to an agent's sensory apparatus. The external and internal communication channels which define GI and its internal representation are described mathematically in the form

gi(t) = (λ^x / x!) e^(−λ),    gi(t) = (λ^x / x!) e^(−λ) + τ    (1)
where gi(t) is the information function over time, τ is the temporal delay, and the channel content is defined by a standard Poisson distribution. The Poisson arrival model allows us to model the structure of the sensory environment as a set of discrete spatiotemporal points that turn on and off as the agent encounters clusters of objects called affordances. An affordance could be a surface or signal that is distinct from a random background, or a surface or signal that has a characteristic orientation relative to the agent's position with respect to the sensory input.
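As a minimal sketch of Eq. (1), the external channel can be computed as a standard Poisson probability mass function, with the internal channel adding the delay term τ; the function names and parameter values below are illustrative assumptions, not part of the GI formalism itself.

```python
# A minimal sketch of Eq. (1): the external channel as a standard Poisson
# probability mass function, and the internal channel adding the temporal
# delay term tau. Parameter values are illustrative assumptions.
from math import exp, factorial

def gi_external(x: int, lam: float) -> float:
    """Poisson probability of x environmental arrivals at rate lambda."""
    return (lam ** x / factorial(x)) * exp(-lam)

def gi_internal(x: int, lam: float, tau: float) -> float:
    """Internal representation: external channel content plus temporal delay."""
    return gi_external(x, lam) + tau

print(gi_external(3, 2.0))         # ~0.180: three object encounters at rate 2
print(gi_internal(3, 2.0, 0.05))   # ~0.230: the same content shifted by tau
```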
Our Poisson arrival model is an encoding of information that translates into the activity of the internal communication channel. Sensory signals enter the internal communication channel at the sensory apparatus via sensory transduction, which occurs at a characteristic rate. This rate is controlled by the parameter τ, which can result in effects such as under- or over-sampling (e.g., flickering and aliasing). As GI relies upon feedback from previous states of the internal component, the temporal delay term τ can also be modeled as an evolution equation, which takes the form

τ = f(t, τ)    (2)
where the temporal delay evolves over time t. The ability to evolve the temporal delay as feedback over time provides a mechanism for generativity, creating an internal representation that resembles an ever-changing and adaptive distortion of directly perceived sensory input. The Poisson model of environmental information is only one possible way to represent an ensemble of sensory signals. In cases where an informative background must be separated from other relevant information, the following equation provides a differential measure of GI:

gi(t) = ∫_0^n (d_2 − d_1)    (3)
where d_i is a statistical distribution characterizing each source of spatial information and n is the extent of the contrast in spatial units. For a neutral background (one with no useful information), d_1 would be 0. To better understand the GI concept as an operational phenomenon, it is necessary to shed light on three important properties that lead us to the three quantitative principles of GI.
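The following sketch illustrates Eqs. (2) and (3) under stated assumptions: the update rule f(t, τ) is a hypothetical choice (the text leaves f unspecified), and the integral in Eq. (3) is approximated by a discrete sum over sampled distributions.

```python
# A minimal sketch of Eqs. (2) and (3). The form of f(t, tau) is a
# hypothetical assumption, and the integral is approximated discretely.
import numpy as np

def evolve_tau(tau: float, t: int) -> float:
    """Eq. (2): the temporal delay evolves as feedback over time."""
    return tau + 0.01 * np.sin(t * tau)          # assumed form of f(t, tau)

def gi_differential(d1: np.ndarray, d2: np.ndarray, dx: float) -> float:
    """Eq. (3): accumulated contrast between two spatial distributions."""
    return float(np.sum(d2 - d1) * dx)

# An Exponential "movement signal" against a neutral background (d1 = 0).
x = np.linspace(0.0, 10.0, 100)
d2 = np.exp(-x)                                  # rich spatial structure
d1 = np.zeros_like(d2)                           # no useful information
print(gi_differential(d1, d2, x[1] - x[0]))      # ~1.0: informative contrast

tau = 0.05
for t in range(5):                               # the delay drifts over time
    tau = evolve_tau(tau, t)
```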
3 GI in a Biological Nervous System

Let us now consider the idea of emergent direct perception as multisensory integration in the context of a biological nervous system. In this case, direct perception arises both from experience with the sensory information that constitutes a scene and from the ability to integrate this information in a coherent manner. We can take as an example a human observer riding a roller coaster: the observer sees the track ahead of them, expecting visual movement in certain directions, while also experiencing strong inertial forces that contribute to the experience. Each sensory channel transmits a different sensory modality, and GI is assessed by the degree of disjoint information between all channels. We expect to see a variety of superadditive and suppressive effects over time, particularly as cues of different modalities become incongruent. Each sensory channel has two possible signal types: direct perception and indirect perception. These signal types represent the external and internal communication channels, respectively. The signals themselves can be pictured as arrows with a specific orientation and strength, given by the arrow's direction and length. When they all point in the same direction, the sensory environment is said to be congruent. By contrast, signals pointed in different directions are incongruent, which has effects on multisensory integration
and the overall coherence of a given percept. Even when the sources fail to point in the same direction, this incongruence does not necessarily result in confusion or suppressive effects. Unlike the confirmatory action of congruent sensory channels, incongruence might lead to superadditive effects by serving as a counterfactual source of information and yielding a richer representation of the phenomenon in question.
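As a minimal sketch of this congruence picture, each channel's signal can be represented as a vector whose direction and norm stand for the arrow's orientation and strength; using mean pairwise cosine similarity as the congruence measure is an assumption made for illustration, since the text describes congruence only qualitatively.

```python
# A minimal sketch of sensory-channel congruence. Representing signals as
# vectors and using mean pairwise cosine similarity are assumptions made
# for illustration.
import numpy as np

def congruence(signals: list) -> float:
    """Mean pairwise cosine similarity: 1.0 = fully congruent channels."""
    sims = []
    for i in range(len(signals)):
        for j in range(i + 1, len(signals)):
            a, b = signals[i], signals[j]
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))

# Roller-coaster example: vision and vestibular cues nearly aligned,
# an inertial jolt pointing elsewhere.
vision     = np.array([1.0, 0.1])
vestibular = np.array([0.9, 0.2])
inertia    = np.array([-0.3, 1.0])
print(congruence([vision, vestibular, inertia]))  # well below 1: incongruent
```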
4 Overarching Themes and Future Directions

One overarching theme of GI is the integration of sensory information, particularly over time. In biological observers, as opposed to computational agents, multisensory information is known to be important in cognitive functions such as attention, and can arise from phenomena such as the observation of biological motion and latencies between different sensory modalities. With respect to multisensory congruence, then, future research should investigate the relationship between informational fluxes and changes in configurational diversity over time. As GI is an inherently dynamical approach to information, constructs related to the principles of disjoint distributions and coherent movement can capture these fluxes in structural information. GI can be placed in a theoretical context of agent action by comparison with Simultaneous Localization and Mapping (SLAM [8]) and the perception-action loop. The basic SLAM algorithm decomposes perceptual input-output into four components: input, mapping, localization, and output. The internal components of the SLAM model provide spatial content and map to the additive τ term in Eq. (1); as a delay parameter of the external encoding, this delay provides a means to model displacements in the representation of space. Unlike in SLAM, the τ term in Eq. (2) represents temporal delay and provides temporal context to the external signal. The application of GI to a SLAM context is particularly useful in the case of collective behaviors and adds to our understanding of how GI characterizes coherent movement. Upon considering a population of observers, GI can be measured from different points of view in a common environment. This is potentially valuable for understanding the spatial variation and richness of features in a given environment. Every individual agent takes an egocentric view of its environment. In cases where a single agent has multiple sensory inputs (e.g., multimodal perception in a sensory receptor array), the egocentric viewpoint is enriched. This produces a set of differential perspectives that samples variety quite differently from the ensemble approach of Shannon Information, in which variation is condensed into a single parameter. This is but one example of how GI can enrich our quantitative understanding of information.
References

1. Floridi, L.: The Philosophy of Information. Oxford University Press, Oxford, UK (2011)
2. Shannon, C., Weaver, W.: The Mathematical Theory of Communication. University of Illinois Press, Urbana, IL (1949)
3. Itti, L., Baldi, P.: Bayesian surprise attracts human attention. Vision Res. 49(10), 1295–1306 (2009)
4. Oizumi, M., Albantakis, L., Tononi, G.: From the phenomenology to the mechanisms of consciousness: Integrated Information Theory 3.0. PLOS Comput. Biol. 10(5), e1003588 (2014)
5. Dretske, F.: Knowledge and the Flow of Information. MIT Press, Cambridge, MA (1981)
6. Woods, D.D.: Visual momentum: a concept to improve the cognitive coupling of person and computer. Int. J. Man Mach. Stud. 21, 229–244 (1984)
7. Athalye, V.R., Santos, F.J., Carmena, J.M., Costa, R.M.: Evidence for a neural law of effect. Science 359, 1024–1029 (2018)
8. Durrant-Whyte, H., Bailey, T.: Simultaneous localization and mapping: part I. IEEE Robot. Autom. Mag. 13(2), 99–110 (2006)
Consciousness Semanticism: A Precise Eliminativist Theory of Consciousness

Jacy Reese Anthis^{1,2}

1 University of Chicago, Chicago, IL 60637, USA
[email protected]
2 Sentience Institute, New York, NY 10006, USA
Abstract. Many philosophers and scientists claim that there is a ‘hard problem of consciousness’, that qualia, phenomenology, or subjective experience cannot be fully understood with reductive methods of neuroscience and psychology, and that there is a fact of the matter as to ‘what it is like’ to be conscious and which entities are conscious [13]. Eliminativism and related views such as illusionism argue against this. They claim that consciousness does not exist in the ways implied by everyday or scholarly language. However, this debate has largely consisted of each side jousting analogies and intuitions against the other. Both sides remain unconvinced. To break through this impasse, I present consciousness semanticism, a novel eliminativist theory that sidesteps analogy and intuition. Instead, it is based on a direct, formal argument drawing from the tension between the vague semantics in definitions of consciousness such as ‘what it is like’ to be an entity [41] and the precise meaning implied by questions such as, ‘Is this entity conscious?’ I argue that semanticism naturally extends to erode realist notions of other philosophical concepts, such as morality and free will. Formal argumentation from precise semantics exposes these as pseudo-problems and eliminates their apparent mysteriousness and intractability. Keywords: Philosophy of mind · Consciousness · Hard problem of consciousness · Artificial Intelligence · Neuroscience · Eliminativism · Illusionism · Materialism · Semantics · Semanticism
1 Introduction

This paper attempts to add a new perspective to the debate on 'What is consciousness?' I sidestep the conventional approaches in an effort to revitalize intellectual progress. Much of the debate on the fundamental nature of consciousness takes the form of intuition jousting, in which the different parties each report their own strong intuitions and joust them against each other in the form of intuition pumps [24], gestures, thought experiments, poetic descriptions, and analogies. Consider, for example, the 'deflationary critiques' of Chalmers' argument for the 'hard problem of consciousness'. In 1995, Chalmers asserted that there are some 'easy problems of consciousness' that could eventually be explained with reductive methods of scientific inquiry, but that 'the problem
of experience’ seems to remain even with a full behavioural and neuroscientific understanding of the human brain [13]. Critics such as P.M. Churchland, P.S. Churchland, and Dennett argued in 1996 that it clearly does not make sense to speak of a ‘hard problem’ for other concepts, such as life, perception, cuteness, light, and heat [19, 20, 22]. They argued that because there is no good reason to think consciousness is relevantly different from these concepts, there is no ‘hard problem of consciousness’. Chalmers replied in 1997 by insisting that phenomena related to consciousness ‘cry out for explanation’ beyond their material function while these other phenomena do not [14]. This intuition jousting leads to an impasse that continues to encumber progress in the field of consciousness studies. In this paper, I take an alternative approach. I readily yield that the claim, ‘Consciousness doesn’t exist’, is counterintuitive, and I try to critique that intuition with neither intuitions nor analogy. I make use of analogy to explain my theory, but unlike most arguments on the nature of consciousness, analogy and intuition do no work in the theory itself. The current literature is also plagued by crosstalk. Consider Strawson’s 2018 claim [56] that eliminativism is ‘the silliest claim ever made’. This seems to result from Strawson’s focus on a strawman of consciousness eliminativism—portraying it as the claim that there is no mental phenomenon that we can directly gesture at, whatever it corresponds to after third-person analysis. This directly accessible phenomenon is not the sense of consciousness that I deny, and I have not seen it explicitly denied elsewhere. Indeed, in Dennett’s 2018 reply [23] and Strawson’s subsequent 2019 reply [54], both acknowledge they are using different definitions of consciousness. The cornerstone of my approach is thus to clarify a distinction between ‘consciousness-as-self-reference’, to denote it as the mere contentless self-reference, and a different phenomenon, manifest in a statement such as, ‘Is this entity conscious?’, which I call ‘consciousness-as-property’ because it is the assignment of a property rather than an ostensive self-reference. Both of these phenomena appear to fall under definitions of ‘phenomenal consciousness’1 [7], ‘qualia’ [36], and other references to subjective mental phenomena, though I will discuss at length the issue of vagueness in these definitions. This distinction between the direct (arguably undeniable2 ) datum of one’s consciousness-as-self-reference and the ambitious thesis of a real property of consciousness applicable across entities seems to be a neglected analytical perspective.3 1 This article is oriented towards phenomenal consciousness, rather than Block’s related concept
of access consciousness [7]; most philosophical and scientific study of consciousness centers on the former, and the latter is less semantically problematic. I thank Keith Frankish for discussion of this distinction.
2 Debates on consciousness fallibilism are beyond the scope of this paper, but one argument is that the move from 'I directly observe my experiences' to 'My experiences exist' still hinges on logic, such as modus ponens from the conditional 'If I directly observe something, it exists', and we cannot have absolute certainty in even basic logic. Nonetheless, debates on eliminativism do not hinge on fallibilist or infallibilist claims.
3 McGinn [39] can be read as gesturing at such a distinction in his argument that the 'property P, instantiated by the brain, in virtue of which the brain is the basis of consciousness' is 'closed to perception', though the distinction is not developed in the sort of detail required here. I thank David Chalmers for raising this point. There is also a similar point made by Gloor in Footnote (18) [27].
The theory argued for in the following pages, consciousness semanticism, asserts that consciousness-as-property does not exist in the way commonly implied by everyday and scholarly discourse. This theory is closely related to other formulations of eliminativism, type-A materialism,4 and illusionism (we could call it semantic eliminativism or semantic illusionism), but it is not intended to perfectly align with any particular one. The main difference is that I avoid argument by intuition and analogy—rather, the work in the argument is done by highlighting the semantic vagueness of the definitions of 'consciousness' used in everyday and scholarly discourse, arguing that this vagueness is inconsistent with the precision implied in the everyday and scholarly questions we ask about consciousness, such as 'Is this computer program conscious?', and concluding that consciousness therefore does not exist as we like to assume. In other words, I argue that the way in which 'consciousness' is used by consciousness realists to imply that there is a 'hard problem' hinges on its vague definition, which is incoherent. Once we have established the fundamental nature of consciousness, there are many important conceptual and empirical research questions that naturally follow, such as the extent to which structures and processes such as Baars' 1988 'global workspace' [5] and Tononi's 2008 'integrated information' [59] obtain in humans, nonhuman animals, and artificial intelligences, and the extent to which these phenomena correlate with and cause self-reports of human and machine consciousness. These are deeply important questions, but the arguments herein suggest that there is no fact of the matter—no discovery that could be made—about which of these phenomena is the correct description of consciousness. If we add precision to the consciousness debates, as is more obviously necessary to evaluate other properties such as 'life' or 'brightness', then notions of the hard problem and consciousness-as-property evaporate, clearing the intellectual quagmire and making way for intellectual progress in a rigorous understanding of biological and artificial minds.
2 Terminology and Concepts

Semanticism is intended to be a precise, semantics-driven theory, so I begin with an extensive articulation of the terms and concepts involved in its argument before presenting the formal argument in the following section. When we force definitions to be precise, we must smooth out the scatterplot of semantic intuitions just as we would with statistical data points in a mathematical regression, retaining some valuable intuitions while discarding contrary ones to achieve a parsimonious description of reality. As argued by Wittgenstein in 1922 [60], we must select definitions that best 'picture' the world. My position here is especially vulnerable given how explicit I have made its substance, including those definitions, but I think that is how we will make intellectual
4 Eliminativism is not always materialist or physicalist. Eliminativists tend to think dualist explanations are incorrect, but it is possible that the best explanations of conscious-seeming mental phenomena will invoke phenomena outside physical reality as currently understood. Under semanticism, even if we were to find such explanations, we would still need to decide whether to categorize those phenomena as conscious, non-conscious, or indeterminate.
progress and, perhaps optimistically, how we will achieve far greater agreement among philosophers on the nature of consciousness.5

2.1 Eliminativism

The specific claim I will articulate and defend is that the property of consciousness implied in our everyday language and scholarly discourse does not exist. I take 'property' and 'exist' as the key terms herein. Semanticism is one example of the eliminativist view, defined by The Stanford Encyclopedia of Philosophy as 'the radical claim that our ordinary, common-sense understanding of the mind is deeply wrong and that some or all of the mental states posited by common-sense do not actually exist and have no role to play in a mature science of the mind' [49]. Eliminativism is a general view, in that it does not specify exactly which mental states do not exist; I call the specific view 'consciousness semanticism' (or equivalently 'semantic eliminativism' or 'semantic illusionism') because it is grounded in a criticism of the inconsistent semantics of consciousness in everyday and scholarly discourse.6

2.2 Property

Here, I adopt the conventional usage of the term 'property' to mean 'those entities that can be predicated of things or, in other words, attributed to them' [43]. I expect this usage to be uncontroversial or at least easily translated into a different framework of definitions (e.g., some formulations of structuralism).

2.3 Existence

The definition of 'exist' is much more fraught. Depending on your definitional approach, the definition can vary based on context or other factors [30]. What exactly does it mean for a property such as consciousness, life, or wetness to exist? There seems to be no consensus in the extant literature on how to approach this. Moreover, there is no established definition that is sufficiently precise to resolve debates about consciousness' existence.

5 It may be that even staunch defenders of consciousness' existence would be eliminativists under
my view. I take this as confirmation that the argument is sound and that intellectual consensus is more tractable than commonly assumed, but with other approaches to philosophical inquiry, one can take this as rendering the eliminativist position trivial and thus uninteresting. I thank David Chalmers and Jake Browning for developing this point. 6 Labeling philosophical concepts is challenging, given almost every plausible English word already has an established meaning, especially the most meaningful words. ‘Semanticism’ fortunately only has one significant established usage, as far as I can tell. According to Akiba, ‘Semanticism about vagueness (or the semantic theory of vagueness) holds that vagueness exists only in language and other forms of representation and not in the world itself’ [1]. While this internalist view is related to my argument, it is not isomorphic, and given a large majority of potential philosophical terms (e.g., ‘relevant word’ + ‘ism’) have already been used somewhere in the field, ‘semanticism’ still seems to be the best descriptor.
How then do I approach establishing a definition? I take it that there is no right or wrong definition without preestablishing some criteria for our definitions.7 In this case, I aim for the criteria of simplicity, precision, and approximation of the usage of the word in everyday language. Thus, I propose this working definition:

Existence: A property exists if and only if, given all relevant knowledge and power, we could categorize the vast majority of entities8 in terms of whether and to what extent, if any, they possess that property.

While it is counterintuitive to be so specific about the definition of such a fundamental term, this definition corresponds with common-sense questions such as 'Is this computer program conscious?' or 'Is there something it is like to be a rock?'9 By asking these questions, we act as if there is an objective, potentially discoverable answer as to whether any given entity in the universe has consciousness-as-property. This is ubiquitous in everyday and scholarly discourse on consciousness, but the semanticist view entails that, even if we could observe everything in the universe with unlimited time and intellectual capacity to analyse those observations, we could never answer these questions or even assign coherent probabilities to possible answers given the common definitions of consciousness.10 Despite its appeal, this operationalization of existence will inevitably be contentious. The 'potential discoverability' definition seems to best fit the criteria of simplicity, precision, and approximation of the usage of the word in everyday language. However, if one disagrees with this definitional approach and wants to keep the definition of 'existence' vague, that entails not merely that the nature of consciousness is a challenging or confusing topic, but that the debate is fundamentally impossible to resolve. In other words, I cannot tell you whether I am a consciousness eliminativist unless we start

7 As Putnam says, 'the logical primitives themselves, and in particular the notions of object and
existence, have a multitude of different uses rather than one absolute “meaning”’ [47]. I am not claiming that there is a univocal or unambiguously correct definition of existence. In fact, much of my criticism of the current consciousness debate can be read as a rejection of that claim, at least in the sense that for a definition to be correct, we need to specify criteria. My choice of meaning is thus only motivated as the best operationalization of the term that I know of for the purpose of resolving debate over the nature of consciousness. Without such operationalization, we could not make progress one way or the other. 8 We can define existence more strongly, requiring the categorization of all entities. The strong and weak versions of existence both seem worth consideration to me, and the argument for consciousness semanticism works with both. I thank Keith Frankish for raising this point. 9 Sometimes the referent of ‘conscious’ is not an entity per se, but a mental state itself, such as, ‘Is anger conscious?’ or ‘Is the vestibular sense conscious?’ I take these claims to be similarly vague in most cases. The exception is a deictic definition of consciousness, in which they may be true by definition, but as I discuss below, the one example does not constitute a definition that can be extended to the world at large. 10 With definitions that are at least somewhat precise, such as those defining consciousness as ‘brain or no brain’ or ‘integrated information’, we can of course assign coherent probabilities in some cases (e.g. an animal with no brain or a computer with no integration has zero probability of consciousness). While these definitions are interesting to discuss, they have not been widely endorsed in the literature.
the discussion with a precise definition of what it means for consciousness to 'exist'. You might have a different definition you want to use for existence, such as that all possible properties exist, or that all properties we can intelligently discuss, even if we do so vaguely, exist. Again, I think there are good arguments for various operationalizations, and I think we should engage all of them in order to avoid what Chalmers calls a 'verbal dispute' [17]. Under a different definition, or an insistence on avoiding precisification of 'existence', the rest of this paper may be unpersuasive. However, the question addressed by this paper can simply be read instead as 'Is consciousness potentially discoverable?' or 'Is consciousness vaguer than we make it out to be?'11 The view outlined in this paper may then be called 'consciousness incoherentism'. This would not change the significant practical implications. The map is not the territory [35].

2.4 Omniscience and Potential Discoverability

One could also argue that this definition is simply a restatement of the 'hard problem' itself, an impassable gap between our scientific knowledge of the physical world and the ineffable state of consciousness [13], or a restatement of the 'problem of other minds', that we cannot peer into the minds of other beings to know whether they are conscious [15]. Prima facie, it seems that these claims also render the nature of consciousness impossible to discover. My claim is different from these because I specify omniscience. That something exists does not mean we will ever in practice have the tools to discover it. Rather, it means that in principle it could be discovered by some sufficiently powerful entity—powerful not just in observing everything, but in having the intellectual capacity to analyse and fully understand all observations. An omniscient entity therefore would not have trouble overcoming the 'hard problem' or the 'problem of other minds'. In other words, one can assert that consciousness exists in terms of 'potential discoverability' while still maintaining that the 'hard problem' is an impassable gap for mere humans lacking omniscience. Semanticism implies that the 'hard problem' and the 'problem of other minds' do not represent meaningful issues in consciousness studies, except perhaps as meta-meaningful facets of our imperfect scholarly understanding. On the far side of these proposed gaps—in the realm of subjective experience or other ineffable or impermeable properties—there is nothing that matches our common-sense definition of consciousness. Gaps that have nothing on the other side are merely borders of reality. In the language of Chalmers [14], semanticism implies that all that needs explaining about consciousness is its functions and any other similarly effable features. There are many different definitions of omniscience we can use here, but none have major implications. We can try to imagine a category of entities that exist but are not

11 I am not the first to suggest that consciousness is a vague concept and that this leads to problems
in conventional approaches to understanding consciousness. For example, Papineau writes, ‘My thesis will not be that there is anything vague about how it is for the octopus itself. Rather, the vagueness lies in our concepts, and in particular whether such phenomenal concepts as pain draw a precise enough boundary to decide whether octopuses lie inside or outside’ [33].
potentially discoverable. This seems intuitively reasonable, but I have not read a coherent operationalization of this possibility that jibes with common sense. These distinctions seem relatively arbitrary. Take a very large number, 3↑↑↑↑↑↑3 in up-arrow notation [34], with however many arrows you need to make it so large that its decimal representation cannot be computed with all the computing power made from all the atoms in the universe. The decimal representation of this number seems undiscoverable, but does it exist? Is it real? It seems that the everyday and intuitive definitions of 'exist' and 'real' are unclear in this kind of edge case, so it becomes an arbitrary semantic choice. Moreover, nobody is claiming that consciousness exists but is hidden from us by computational limitations. Another class of entities are those that are too far away in time or space for us to know about given the laws of physics, even with all knowledge and power up to, but not including, breaking those laws. For example, 'Is there currently a blue moon around any planet in the Andromeda galaxy?' is unknowable in the sense that it would take millions of years for that light from Andromeda to reach us, even if we could build equipment sensitive enough to detect it. Similarly, assuming time travel is impossible, we cannot examine the past: 'Were any atoms in the White House today also in the eggs George Washington ate for breakfast on March 17, 1760?' is unknowable in the sense that we cannot go back and check what Washington had for breakfast, and it currently does not seem that a computer could ever reverse-engineer the physical processes of the universe with this level of fidelity. This speaks to a broader observation that there is a wide gradient of possibilities when one specifies a property such as omniscience: exactly which laws of the universe does omniscience allow you to break?12 Some logical positivists also wrestled with these possibilities, such as Blumberg and Feigl [8], countering critics who claimed that some scientific assertions are not verifiable; Blumberg and Feigl argue that those critics are confused about the meaning of 'possibility of verification', which assumes we can surpass the limitations of scientific instruments and laws of nature. In the present work on consciousness, it is sufficient to leave these as ambiguous edge cases because nobody is claiming that consciousness exists but is hidden from us by natural laws such as the light barrier or the arrow of time. Of course, some people use very loose definitions of 'exist' and 'real', such as holding that something is real if you are simply able to talk about it—which means that dragons and other imaginary creatures are real [9]—or the even stronger definition of modal realism, which arguably views mere possibility as sufficient for realness [37]. With those definitions, I agree that consciousness is real and exists. But the most common scholarly and everyday usages of 'consciousness' seem to imply a much stronger sense of reality or existence. The discussion of whether consciousness exists also seems to evade debates over idealism, whether the physical world exists outside of mental phenomena, since consciousness is very clearly a mental phenomenon [51].

12 Another category is dualist or non-physical phenomena, which are commonly posed as answers
to, ‘What is consciousness?’ As noted above, the semanticist argument does not rely on physicalism. If interactionist dualist phenomena exist or if non-interactionist dualist phenomena exist and we have dualist means of knowledge production, then they could be part of a precise definition of consciousness, and semanticism applies. If non-interactionist dualist phenomena exist and we are limited to physical means of knowledge production, then they could not be a part of a precise definition of consciousness, and semanticism applies.
2.5 Related Views

A number of philosophical views can be interpreted as claiming that consciousness does not exist in some fashion: consciousness eliminativism [58], eliminative materialism [49], type-A materialism [12], some definitions of materialism itself, such as Papineau's [44], illusionism as developed by Frankish [26], and the view of consciousness as a 'strange loop' as developed by Hofstadter [29]. I do not claim that the view I propose here is perfectly aligned with any existing version of eliminativism. Developing a novel view allows me to position my arguments with more freedom and clarity. The overlap with extant discussions will be discussed throughout the paper as relevant. For example, while I am sympathetic to the broad gesture that consciousness (or at least qualia) is an illusion, or to specific assertions that experiences do not have 'what it is like' properties, that consciousness is analogous to a trick of stage magic, and so on, none of the available formulations of illusionism seem to capture my precise claim regarding imprecision and potential discoverability. In other words, I consider myself an illusionist, but my claim is also more precise than that. Similarly, while I largely agree with previous discussions of eliminativism, I worry that the ontological claim that consciousness does not exist has been conflated with the pragmatic claim that we should eliminate the term 'consciousness' from consciousness studies, which is a difficult epistemic and sociological question about the best way to make scholarly progress as a field.
3 The Semanticism Argument

Now that terminology is established, the semanticism argument is brief and straightforward.

1. Consider the common definitions of the property of consciousness (e.g., 'what it is like to be' an entity) and the standard usage of the term (e.g., 'Is this entity conscious?').
2. Notice, on one hand, that each common definition of 'consciousness' is imprecise.
3. Notice, on the other hand, that standard usage of the term 'consciousness' implies precision.
4. Therefore, definitions and standard usage of consciousness are inconsistent.
5. Consider the definition of existence as proposed earlier: existence of a property requires that, given all relevant knowledge and power, we could precisely categorize all entities in terms of whether and to what extent, if any, they possess that property.
6. Therefore, consciousness does not exist.

First, define consciousness using any of the available definitions: most commonly, 'subjective experience', 'something it is like to be that organism' [41], or the homage to jazz, 'If you got to ask, you ain't never gonna get to know' [55]. Also consider the common deictic and ostensive definitions, which simply gesture at consciousness by referring to personal examples, such as saying it is the common feature between you seeing the colour red, imagining the shape of a triangle, and feeling the emotion of joy. Also consider standard usage of 'consciousness' in scholarly and everyday discourse, such as the question, 'Is this computer program conscious?'
Second, notice that all of these definitions are imprecise. They do not clearly differentiate all possible entities into conscious or non-conscious, even if we know all there is to know about such beings. It might seem clear via introspection that there is something that it is like for you to see red and feel joy. In fact, that is loaded into the deictic definition. But one example does not constitute a definition that can be extended to the world at large. Consider not just extensions to the mental lives of other beings (e.g., 'Is this computer program conscious?'), but also borderline cases in your own mental life, such as whether the vestibular sense (i.e., sensing the spatial orientation of your body) is a conscious experience.

Third, notice that scholarly and everyday usages of the term 'consciousness', such as the question, 'Is this computer program conscious?', imply precision: that there is in principle a correct answer as to whether any particular being is conscious.

Fourth, because all standard definitions of consciousness are imprecise (again, except for precision regarding one individual if using the deictic definition), yet common usage implies precision across individuals, there is an inconsistency. You could not, even as an omniscient and omnipotent being, categorize any entities (except yourself, if you use the deictic definition) in terms of whether and to what extent they are conscious. Consider a hypothetical example where humanity builds a sophisticated AI and understands every single detail of its inner workings. In this case, what exactly would you check for to determine whether the being is conscious or non-conscious? You would have no reasonable basis for claiming it is or is not. This implies there is some issue with common definitions and standard usage.

Fifth, as discussed at length above, define 'existence' such that: 'A property exists if and only if, given all relevant knowledge and power, we could categorize all entities in terms of whether and to what extent, if any, they possess that property'. This allows us to test whether a property exists.

Sixth, notice that an imprecisely defined property cannot be used to categorize all entities as having to a full extent, having to some extent, or not having that property. As such, the property of consciousness fails the test for existence.13
4 Objections

As with previous literature on the subject, it seems that the best way to respond to objections, and thus illustrate my view, is to analogize 'consciousness' to properties that lack the intellectual and intuitive baggage burdening 'consciousness' itself. But, importantly, the analogies are explanatory and not part of the formal argument.

Since I am familiar with, and in fact certain of, my first-person experience, I can reasonably guess that other entities like me have their own first-person experiences.

13 Currently this semanticism argument seems conclusive, but to consider another argument, the
existence of consciousness would make the world more complicated than a world without it because it adds an extra feature, which adds the weight of parsimony in favor of semanticism relative to most non-eliminativist views. Absent the semanticism argument, since parsimony is not conclusive (i.e., it is a heuristic, not a proof), it would then need to be weighed, rather subjectively, against the intuition that consciousness exists as a property. Moreover, some realist views would consider consciousness realism to be more parsimonious.
This is a useful heuristic for most contexts, but not for consciousness. I contend that the following situation is analogous to the case of consciousness: I show you a few distinct shapes on a piece of paper: circles, triangles, squiggly lines, etc. Then I point to one shape and say, 'That is a baltad', which is a word I just made up. Can you now categorize all the shapes on the paper as baltads or non-baltads? Of course not. Since all we know about baltads is that one example, baltadness does not exist. Even if I give you an arbitrarily large number of examples, you still would not be able to take a new shape and tell me whether it is a baltad. Of course, you could guess the definition I have in mind based on human psychology, such as by using a machine learning classifier and coding the shapes as a matrix of pixels, but if I have no hidden definition behind the curtain—no definition I came up with for baltad that I have not yet shared—then there is no empirical strategy that can estimate it, because it does not exist. That is the case with the purported property of consciousness, where proponents do not seem to endorse any behind-the-curtain definition. This is one benefit of defining existence the way I do in this paper, since we can say that the property 'baltadness' does not exist despite knowing for a fact that a single 'baltad' exists. This raises a fatal issue for questions such as 'Is this computer program conscious?', in the same way we would struggle to answer 'Is a virus alive?' (about the property of life) or 'Is Mount Davidson, at 282 m above sea level, a mountain?' (about the property of mountainhood). We cannot hope to find an answer, or even give a probability of an answer, to these questions without creating a more exact definition of the term. 'Life' is a particularly interesting comparison to 'consciousness' because it has endured some of the same discursive challenges. It is useful to talk about living beings versus non-living beings and to get a sense of the properties that match our intuitions regarding what is alive and what is not. But do we need academic journals filled with papers asking what is alive and what is not? Do we assign probabilities to whether each different entity is alive? Do we need to spend precious research resources trying to figure out whether a virus, which cannot reproduce on its own but can do so within a living host, is alive? No, we simply accept that it depends on how we define life, and then move on to more important research questions, such as how exactly viruses work. As Cleland said about her seminal paper on life's definition [21], 'I argue that it is a mistake to try to define "life". Such efforts reflect fundamental misunderstandings about the nature and power of definitions' [42]. I expect the same evolution of thought will happen to consciousness research, though it might be much harder to reach that resolution. Scientists and philosophers regularly ask meaningless questions about consciousness, assuming that they have the potentially discoverable answers that normal empirical questions do, such as, 'Will this coin land on heads?' In that case, it is very clear to everyone involved what it would mean for the coin to land on heads. It might be very clear to you that you are conscious—if we are using a deictic definition of self-reference—but that does not speak to whether a virus, a computer program, or even another human is conscious.
In other words, there will never be a ‘conscious-o-meter’ as some have imagined, even if we have a perfect understanding of what is happening inside every mind [57].
This ties into another objection: But if we accept the semanticist argument against consciousness realism, and we’re extending this argument to baltads, life, mountainhood, and coin-flipping, doesn’t that imply that the vague everyday properties we refer to like wetness or brightness don’t exist in this way either? Yes, it does, and it is a very important realisation about our everyday language. We do not spend decades of research on questions like, ‘Is this lightbulb bright?’ or ‘Is my raincoat wet?’ or ‘Is my uncle bald?’ because it is so clear that the answers simply hinge on how we choose to define the terms. We instead use these terms only in a much more limited sense of existence, a cluster of things in the multidimensional definition space; we never presume a fact of the matter. For most terms, the ambiguity is an entirely reasonable part of discourse. For example, we all know the sun is very bright and an ember is only slightly bright, and we can ask perfectly fruitful questions such as, ‘Is this lamp bright?’, even if we have not precisely laid out lumens as a unit of measurement.14 In this sense, wetness and brightness do not exist in the sense of ‘exist’ this paper rests on, but wetness and brightness do exist in the sense of ‘exist’ implied by everyday discourse. To put it another way, if our discourse referred to consciousness only as ‘consciousness-as-cluster’ and not as consciousness-as-property, there would be no issue.

14 Some terms are ambiguous simply because of indexicality, where the term is ambiguous until placed in a certain context, such as ‘me’ (which depends on who is using the word) or ‘the first item in the list’ (which depends on which list is being referred to). An example of an indexical property is ‘in our group’ (which classifies entities based on whether they are in the group of the speaker). This is not the sort of ambiguity I am referring to here.

Moreover, we do not have obfuscating intuitions on brightness and wetness, so we have never cascaded into a ‘hard problem of brightness’, and we probably never will. Yet, because of the different intuitional landscape for consciousness, much of scholarly discourse on consciousness is hopelessly wrapped up in questions of, ‘Is this entity conscious?’, which conflates two very different questions:

1. What mental capacities does the entity have?
2. Which mental capacities do we want to categorize as components of consciousness?

The first is an important and substantive scientific question that deserves much attention and detailed neuroscientific research. The second is a relatively arbitrary decision, which we might make based on practical considerations of scholarly or public communication. Neither derives an objective answer as to whether the entity is conscious, and the only reason left to believe in an objective answer is the fallible intuition that consciousness exists. Using current definitions, we will never discover which beings are conscious, and claims like, ‘There is a 10% chance that this computer program is conscious’, are incoherent. But we can decide on a more precise definition and thus know which beings are conscious, and this can be based on genuinely interesting discoveries about the mental
lives of other beings, such as whether honeybees possess moods (in fact, current evidence suggests they do [6]).

This is the same step we have taken for other proto-scientific concepts such as ‘star’. Before we could see the sky with lenses and telescopes, it made perfect sense to use the term ‘star’ to refer to bright little things in the sky. But as we discovered more about the physical properties of different celestial objects, we decided to narrow the definition to mean masses of plasma held together by gravity, as distinct from planets or comets that are also bright little things in the sky. But this was not finding an answer to, ‘Is Polaris a star?’ It was simply an evolution of language as we gathered more detailed empirical information about celestial objects, and our language about consciousness can evolve the same way as we get more and more detail on the workings of brains and AI. For another example, consider metals (e.g., gold, iron). It is potentially discoverable whether a certain hunk of metal ore is gold or iron, but only after humanity has decided what exactly gold and iron are—chemical elements with precise atomic numbers.15

15 One could argue that even atomic elements may not be entirely precise. What if scientists encounter an atom identical to a gold atom but with a new, undiscovered subatomic particle included? This seems impossible based on current physics, but most epistemic views imply we cannot completely eliminate its possibility. In this case as in others, we can accommodate much vagueness by speaking of atomic elements given our current physical models but not by presuming a discoverable answer as to which entities constitute the element in scenarios where those physical models no longer apply.

This is related to debates on semantic ‘internalism’ [18] and ‘externalism’ [53], whether linguistic meaning is internal to the speaker or an external aspect of the speaker’s environment. An example of the internalist view is that ‘meanings are instructions for how to build concepts of a special sort’ [45], while externalism is famously exemplified in Putnam’s ‘Twin Earth’ thought experiment [46]. The mapping of these semantic positions onto consciousness realism and eliminativism is not straightforward. For example, the Twin Earth comparison between water and Twin water entails a singular category, while the proto-star comparison entails a plural category, given that ‘bright little things in the sky’ includes at least (i) masses of plasma and (ii) masses of rock or gas. The claim herein is only that such meanings can change over time based on new empirical knowledge, and this claim may be fit into an internal or external conceptualization, though the internalist fit is more natural in the sense that internal meaning seems more malleable.

This is also related to the recent literature on ‘topic continuity’ [33] and ‘conceptual engineering’ [10], alongside older work such as Quine [48] that describes definitions and observations as co-evolving over time. For example, if one sees philosophical inquiry as grounded not only in analysing the current usage of a concept but in engineering future usage [31], then the vagueness entailed in the current usage of ‘consciousness’ may be more comfortable. Nonetheless, the vagueness would still entail that precisifying the concept is a part of answering the central question of consciousness, ‘Is this entity conscious?’.

You admit that there is something—you call it ‘consciousness-as-self-reference’—that exists and is such that I have 100% confidence in it. Is this first-person knowledge not ineffable? Even if we solve all the ‘easy’ problems of consciousness, how could someone else ever gain that knowledge?
It is true that self-reference is a unique phenomenon in that it seems to remain even if you are in any imaginable brain-in-a-vat or Laplace’s demon scenario. We need to be careful not to conflate this with the broader claim that the indexicality or positionality of the observer implies an existing property. No additional knowledge is accrued by indexicality: if I am the first person in a single-file queue at a marketplace, I am unique in my indexicality in the sense that no other person could ever be first in line at the same time, but what knowledge does that endow me with that others necessarily lack? Other people in line could assess what my visual field is from that position, how happy I must be to be first in line, and so on. Ineffable knowledge does not follow from unique indexicality.

You say that the common definitions, such as ‘something it is like to be that organism’, are not precise. But first-person experience offers exactly the precision you’re missing! Observe, in yourself, the ‘what it is like’-ness, such as the redness of red or the feeling of your hand on a hot stove. This is a very specific thing that an omniscient being could look for and use to categorize the vast majority of (or, more strongly, all) entities.

When I observe my most personal of mental features, I share this intuition about the nature of those features. And I agree that the core semanticism argument (1–6 above) rests on whether we accept or reject the intuition that consciousness exists. I agree that the property of consciousness, then, seems to exist in a very real, undeniable way; explaining this intuition is known as the ‘meta-problem of consciousness’ [16]. However, there are two issues with the reliability of this intuition.

First, as explained above, we must differentiate the two uses of the word ‘consciousness’: the act of self-reference and the broader property. Reasonable people may consider the former a brute fact, but the latter is not self-evidenced through introspection—it transcends that datum. In other words, we may have strong intuitions about consciousness-as-property, but we cannot have direct introspective evidence about consciousness-as-property in the way we do for consciousness-as-self-reference. In this paper, I exclusively argue for the nonexistence of consciousness-as-property. There is no ‘hard problem’ of consciousness-as-self-reference because the self-reference is simply a datum; it has no extension across individuals as a property. There is no category that we know my consciousness-as-self-reference and your consciousness-as-self-reference both belong to, other than those following from the similar processes by which we made those self-references.

Second, if we properly disentangle self-reference from property, it seems that humans do not have reliable intuitions about the category of deep question to which questions of consciousness belong.16

16 An unreliable intuition is still interesting and worth discussion, but less reliability should correspond to proportionally less evidential weight in our beliefs.

Humans did not evolve making judgments of, and getting feedback on, our answers to deep questions, such as the nature of sentience, quantum physics, molecular biology, or any other field that was not closely related to the day-to-day decision-making of our distant ancestors. Such intuitions are unrefined extrapolations from our intuition-building, evolutionary environment. Moreover, there are strong, specific reasons to expect humans to have an intuition that consciousness exists even if it
does not. The idea of an objective property of consciousness is in line with a variety of intuitions humans have about their own superiority and special place in the universe. We tend to underestimate the mental capacities of nonhuman animals; we struggle to accept our own inevitable deaths; and even with respect to other humans, most of us suffer from biases a la the Dunning–Kruger effect. Consciousness realism is the same sort of phenomenon: it places our mental lives in a distinct, special category, which is a quite enticing prospect. Indeed, another objection I often hear is that consciousness eliminativism cannot be accepted because it would not allow us to give consciousness the moral consideration it deserves, but the moral impetus yielded by a belief is not evidence of that belief’s validity.

As Elizabeth Anscombe said of an interaction with Ludwig Wittgenstein, ‘He once greeted me with the question: “Why do people say that it was natural to think that the sun went round the earth rather than that the earth turned on its axis?” I replied: “I suppose, because it looked as if the sun went round the earth.” “Well,” he asked, “what would it have looked like if it had looked as if the earth turned on its axis?”’ [3]. In the case of consciousness, we should reflect on how it would seem if consciousness did not exist as a property; it would seem17 no different from the current situation, and thus our intuition provides no net evidence against (or in favour of) eliminativism. The intuition that consciousness exists may appear more definitive than the intuition in favour of geocentrism, but again, I insist that we are discussing consciousness-as-property, not consciousness-as-self-reference—and thus there is no reason to give it special weight in our belief system.18

17 One may respond that any ‘seeming’ is itself consciousness. I take this to be an uncommon and almost always dismissed definition upon reflection, but if one’s definition of consciousness extends that widely across human mental activity, then of course it exists. It simply does not get us anywhere in our understanding of the mind.

18 Because of the strength of religious doctrine circa 1500, the geocentrism intuition may have felt even stronger than the consciousness intuition at that time.

If one continues to testify that they have direct introspective evidence that consciousness-as-property exists, this creates an impasse. Once we have carved out all the discursive space around an individual’s testimony, there is no argument I can offer on this or any other subject that will defeat brute insistence. Moreover, if intuition weighs heavily in this analysis, as seems to be the preference of most philosophers of mind, then we should account for the fact that new survey research suggests most people do not even agree there is a ‘hard problem’ [25]. And if the response to this evidence is that most people have not engaged in the proper reflection on those intuitions, then we should also consider the new experimental research suggesting that lay-person judgments about philosophical cases tend to stay the same after such reflection [32].

Ah, but now you have trapped yourself. Can’t the semanticist argument now be applied to your own claims about the property of ‘existence’, and thus the entire debate between eliminativism and realism is meaningless? This is the objection I am most sympathetic to: rather than saying consciousness does not exist, I could say that there is no meaningful or determinate answer as to whether consciousness exists or does not. The upshot of my argument would remain.
This approach, removing the superficial layer of a philosophical question, could be seen as a version of logical positivism, or more precisely verificationism [40]: where I say a property ‘does not exist’, you can replace that with ‘is cognitively meaningless’. However, verificationism seems not to determinately resolve the debate on consciousness, because realists could simply assert that the existence of consciousness is empirically verifiable or discoverable through introspection. In the language of Carnap [11], I am arguing that the ‘hard problem’ is a ‘Scheinproblem’ or ‘pseudoproblem’, a philosophical problem that is worded as if it has meaningful content, yet cannot ‘be translated into the formal mode or into any other unambiguous and clear mode’. Or, in Ryle’s [52] language, I am arguing that consciousness realism is a ‘category error’, mistakenly putting consciousness-as-property in the category of the precise, real, or meaningful, whereas it actually belongs in the category of the vague, unreal, or meaningless. Each of these formulations is a reasonable translation of the view laid out in this paper.19

19 I view semanticism as a precisification of logical positivism alongside eliminativism, anti-realism, and so on, but a full development of such ideas is beyond the scope of this paper.

5 Implications and Concluding Remarks

The semanticist claim, if correct, is deeply important. Not only would it mean that a deeply seated intuition about an intimate component of the human experience is wrong, but it would force us to re-evaluate how we assess the consciousness of other beings. What does it mean to conduct neuroscientific research on consciousness if that property does not exist? How do we make legal and ethical decisions about brain-dead patients, foetuses, or nonhuman animals if there is no fact of the matter regarding who is conscious and who is not? What properties of the mind will we imbue with normative value if we can no longer rest on a vague gesture towards consciousness or sentience?

Perhaps even more importantly, humanity seems to be rapidly developing the capacity to create vastly more intelligent beings than currently exist. Scientists and engineers have already built artificial intelligences from chess bots to sex bots. Some projects are already aimed at the organic creation of intelligence, growing increasingly large sections of human brains in the laboratory. Such minds could have something we want to call consciousness, and they could exist in astronomically large numbers. Consider if creating a new conscious being becomes as easy as copying and pasting a computer program or building a new robot in a factory. How will we determine when these creations become conscious or sentient? When do they deserve legal protection or rights? These are important motivators for the study of consciousness, particularly for the attempt to escape the intellectual quagmire that may have grown from notions such as the ‘hard problem’ and ‘problem of other minds’. Andreotta [2] argues that the project of ‘AI rights’, including artificial intelligences in the moral circle, is ‘beset by an epistemic problem that threatens to impede its progress—namely, a lack of a solution to the “Hard Problem” of consciousness’. While the extent of the impediment is unclear, a resolution of the ‘hard problem’ such as the one I have presented could make it easier to extend moral concern to artificial intelligences.

So, how should our discussions move forward if we accept semanticism? Let me first clarify what semanticism is not. It is not just a view on semantics, such as ‘This
is what “consciousness” means’, or a view on epistemology, such as, ‘This is what we can know about consciousness’. These are both deeply involved in the analysis, but only insofar as they are components of the ontological question, ‘Does consciousness exist?’. I take that to be a primarily ontological question, but answering it requires the semantics (i.e., meaning) of its three words, and the meaning of ‘existence’ involves epistemological facts about what we can know, as argued above. Moreover, even if you choose to define the word ‘exist’ differently than how I use it here, and thus have a different answer to the question, ‘Does consciousness exist?’, the empirical upshot remains.

Let me also clarify that semanticism is distinct from the popular question of whether we have consciousness or only seem to have consciousness (an operationalization of the claim that ‘consciousness is an illusion’), which simply depends on whether we choose to define our individual conscious experience in a way that allows for a distinction between the seeming and the conscious experience itself. If we choose this definition, then this operationalization of illusionism can be correct. If we do not, then it is obviously wrong. I think this amounts to a verbal dispute, or specifically what I would call a definitional trap, in which the two sides are talking past each other with different definitions of ‘illusion’, and the question is trivially resolved if we just pick a definition. Moreover, this relatively arbitrary choice of definitions seems much less important than the substantive question of potential discoverability.

For now, I suggest we continue to use the word ‘consciousness’. While vague, the term still fills an important social niche that no other term is currently poised to fill. With certain definitions of ‘eliminativist’ or ‘reductionist’, this view means I no longer qualify for either identity, because I am not saying we should eliminate our use of the term; rather, I suggest that we disentangle ontological views on what exists from strategic views on how we should use language in intellectual discourse. Deciding on the best words to use relies on empirical investigation into what makes for effective communication, and it hinges on a variety of psychological and sociological variables, so I have much less confidence in my view on which term we should use than in my answer to the ontological question of whether consciousness exists.

While we may continue using the term ‘consciousness’, I suggest that we no longer approach consciousness as if it were some potentially discoverable property, and that we avoid assumptions that there is a ‘hard problem’, a ‘problem of other minds’, ‘neural correlates of consciousness’, or any other sort of monumental gap between scientific understanding of the mind and the ‘mystery’ of conscious experience. Research projects resting on those assumptions are wild goose chases. We should merely use our scientific knowledge to precisify the discourse. As ‘life’ has been broken down into reproduction, growth, homeostasis, and other characteristics, we may break consciousness down into more precise characteristics. Personally, I break it down into reinforcement learning, goal-oriented behaviour, moods, integration, and complexity. There are various empirical descriptions we can give for these characteristics, usually found through neuroscience or behavioural tests, and those are all the explanations we need for a full account of consciousness.
We will never discover what consciousness is, except that it is a vague gesture towards certain interesting, important mental phenomena. Theories of consciousness can only succeed in describing such phenomena, perhaps in a relatively unified way
such as Baars’ ‘global workspace theory’ [5] or Tononi’s ‘integrated information theory’ [59]. However, the success of a particular theory of consciousness will have been a semantic decision, not an objective discovery—a feat of engineering, rather than a feat of analysis.20

20 It is possible that the precise phenomena associated with consciousness may be tightly clustered in feature space. For example, with advanced brain imaging and thalamic bridging, we may notice that all adult, non-vegetative humans share a specific information processing system, and when we turn that circuit off (e.g., through transcranial magnetic stimulation), subjects consistently report, ‘Wow, everything is exactly the same, except now it doesn’t feel like anything to be me’, a more generalized version of pain asymbolia. Then we notice that if and only if we place this circuit into an artificial intelligence (e.g., the ‘emotion chip’ in Star Trek) does the AI report a ‘what it is like’ to be them. No other circuits have this effect. In this hypothetical scenario, while semanticism would still be correct, it would not matter much in practice because the vagueness could be somewhat resolved by empirical experimentation. Of course, this sort of scenario seems extremely unlikely, especially the consensus of consciousness evaluations of dissimilar entities, such as simple computer programs or alien species. We could differentiate the philosophical view that consciousness will not become precise without further precisification (‘semantic eliminativism’, ‘semantic illusionism’, or semanticism, developed and defended in this paper) from the empirical view that there will not be a convergence of views on such a precisification (‘convergence eliminativism’ or ‘convergence illusionism’).

There is much new ground to be broken in a new line of consciousness research with a more concrete framework that avoids being caught up in claims of ineffable mystery. We can talk about properties even if they do not exist in this sense, as long as our talk does not imply that they do. In neuroscience, we can still figure out neural correlates of what we intuitively think of as consciousness, and we can still figure out all the wondrous machinery that causes our reports of conscious experience. In fact, because semanticism allows us to disentangle these two phenomena, it seems we can make neuroscientific discoveries more efficiently, casting off the metaphysical and semantic baggage.

There is an unfortunate cyclical effect in consciousness studies: our misguided intuition fuels vague terminology and makes philosophers and scientists work hard to justify that intuition—as they have for centuries—which then perpetuates that intuition.21 I believe that if we can get past this mental roadblock, accepting the imprecision of our current terminology and that there is no objective truth about consciousness as it is currently defined, then we can make meaningful progress on the two questions that are actually very real and important: What exactly are the features of various organisms and artificial intelligences, and which exact features do we morally care about?

21 I sympathize greatly with physicists defending the Everett or Many-Worlds Interpretation of quantum mechanics. It is easy for eliminativists like me to imagine an alternate history of theoretical physics where the notion of wavefunction collapse was never assumed (analogous to never assuming consciousness realism) and Many-Worlds took off as the default interpretation in the early 1900s instead of taking until at least the 1980s to catch on among quantum field theorists. I also hear woes from theoretical physicists who see a morass keeping string theory in place despite its challenges. Similar dynamics obtain in theology. Consciousness realists ask, ‘Without the reality of consciousness, can we still have compassion for and seek to protect other beings? How can we prevent suffering if suffering does not exist?’ Religious people ask their nonreligious alters, ‘If God doesn’t exist, why don’t you just steal and murder like a selfish hedonist?’ The nonreligious person replies, ‘How scary it would be if my belief in God were the only compelling reason I had not to steal and murder’.

There are also two important moral implications of eliminativism, particularly semanticism, outside of consciousness research. First, it reduces the likelihood of moral convergence (i.e., human descendants settling on a specific moral framework in the future). This is because one way moral convergence could happen is if humanity discovers which beings are conscious or sentient and uses that as a criterion for moral consideration. This reduced likelihood should make us more pessimistic about the expected value of the far future (in terms of goodness versus badness) given humanity’s continued existence, which in turn makes reducing extinction risk a relatively less promising strategy for doing good [4]. Second, eliminativism tends to increase the moral weight people place on small and weird minds, such as insects and simple artificial intelligences, which is an important topic in the burgeoning field of research on the moral consideration of artificial entities [28].22 This is not a necessary consequence of the view, but it tends to happen for the following reason: when you view consciousness as a discoverable, real property in the world, you tend to care about all features of various beings (e.g., neurophysiology and behaviour, but also physical appearance, evolutionary distance from humans, substrate, etc.) because these are all analogical evidence of a real property. However, if you instead view consciousness as a vague property, you tend to care less about the features that seem less morally relevant in themselves (e.g., physical appearance, evolutionary distance). Those features may still be indicators, since neither eliminativists nor realists have full knowledge of the mental capacities of different entities. However, the indication for an eliminativist is more direct, such as from evolutionary distance to capacity for reinforcement learning, rather than the realist evidential pathway from evolutionary distance to capacity for reinforcement learning to ineffable qualia. In other words, if an insect has the capacity for reinforcement learning, moods, and integration of mental processes, then the eliminativist seems to have more freedom to say, ‘That’s a being I care about. My moral evaluation could change based on more empirical evidence, but those are mental features I want to consider’. Eliminativism places an evidential burden on those who deny consciousness to animals and other nonhuman entities: they are compelled to point to at least one specific, testable mental feature that those entities lack.

22 Eliminativism and illusionism are gateway drugs to panpsychism, in the sense that they encourage us to focus on specific mental features, such as nociception, that exist in a wide range of entities. However, discussion of panpsychism is beyond the scope of this paper and, predictably, hinges on exactly how we define panpsychism.

When we take a close look at the arguments for or against the existence of consciousness, our common-sense understanding evaporates, and that is okay. In fact, morality, free will, the meaning of life, the purpose of life, and thick concepts that serve as ‘useful fictions’ [50], such as justice, evaporate in analogous ways. The modern stall of intellectual progress on philosophical questions—if the field decides that it sincerely wants to make progress—may be overcome by generalizing semanticism or other versions of eliminativism beyond consciousness. Developing such a view is beyond the scope of the current work, but it seems that the arguments for consciousness eliminativism, moral
anti-realism,23 free will reductionism, personal identity reductionism, as well as empiricism, positivism, and verificationism, which also sweep away much of philosophical discourse, are valuable starting points for generalization.24 The thrust of semanticism is that a great philosophical clarity comes from accepting the vagueness of most philosophical terms. Questions such as ‘Does free will exist?’ evaporate, leaving only tractable questions such as, ‘What notions of free will are most useful in society today?’.

23 This kind of moral anti-realism also seems to constitute a counterargument to moral uncertainty [38], the idea that we should account for being factually wrong about morality, analogous to empirical uncertainty, though a version of moral uncertainty could persist where the moral agent simply decides to care about their future moral preferences and to account for new occurrences changing those. There may also be a sort of Pascal’s wager for moral realism whereby anti-realists should account for the expected realist value of their actions, but I am not persuaded by such argumentation because it hinges on the plausibility of moral realism, which to me seems semantically mistaken and thus cannot be assigned even a tiny probability.

24 There are differences in the current discourses on these topics. For example, while consciousness is mostly referred to as a fact of the matter whose discovery is at least theoretically possible (if not practical), some other concepts are more often properly acknowledged as useful fictions, where we are merely smoothing out a scatterplot of intuitions, which would remove the force of my argument. Currently it seems to me that none of these discourses fully acknowledges the fictitious, subjective nature of its object of study. Instead of trimming the literature with Occam’s razor, it may be so distended that we need to launch Occam’s nuke.

For consciousness studies, my hope is that semanticism, illusionism, or another view under the eliminativist umbrella—broadly construed—will take off and reach escape velocity, driving a new, scientific, and unadulterated understanding of consciousness. Which exact perspective, along with its respective jargon and conceptual Lego, takes off is much less important than the overarching objective of clearing out the intellectual quagmire. I think that clarity could be reached through a variety of intellectual campaigns. For example, it could be that we say, ‘Consciousness exists, but qualia do not’, which may not be eliminativism per se but could have the same underlying claims about the world. Again, the map is not the territory. In any case, I expect that we will cast off the baggage of the ‘hard problem’ and similarly confused concepts, entering a new era of clarity in consciousness studies.

The nonexistence of consciousness can be one of the most challenging claims to accept: it pushes against a deeply held intuition about the nature of human experience. We crave a unique, unsolvable mystery at the core of our being. We want something to hang onto in this perilous territory, and due to academic happenstance, the terms carved out as handholds have been ‘hard problem’ and ‘qualia’ and other words that mistakenly gesture at ineffability and grandiosity. If we want to advance our species’ understanding of who we are, we need to let go of these unsubstantiated intuitions. There is no insurmountably hard problem of consciousness, only the exciting and tractable problems that call out for empirical and theoretical study. The deepest mysteries of the mind are within our reach.

Acknowledgments. I am grateful for insight from Kelly Anthis, Matthew Barnett, Peter Brietbart, Liam Bright, Jake Browning, Rachel Carbonara, David Chalmers, Patricia Churchland, Rodrigo Diaz, Kynan Eng, Keith Frankish, Douglas Hofstadter, Peter Hurford, Tyler John, Ali Ladak, Tom
McClelland, Kelly McNamara, Seán O’Neill McPartlin, Caleb Ontiveros, Jay Quigley, Jose Luis Ricon, Ilana Rudaizky, Atle Ottesen Søvik, and Brian Tomasik.
References

1. Akiba, K.: How Barnes and Williams have failed to present an intelligible ontic theory of vagueness. Analysis 75(4), 565–573 (2015). https://doi.org/10.1093/analys/anv074
2. Andreotta, A.J.: The hard problem of AI rights. AI Soc. 36(1), 19–32 (2020). https://doi.org/10.1007/s00146-020-00997-x
3. Anscombe, G.E.M.: An Introduction to Wittgenstein’s Tractatus. Hutchinson University Library, London (1959)
4. Anthis, J.R., Paez, E.: Moral circle expansion: a promising strategy to impact the far future. Futures 130, 102756 (2021). https://doi.org/10.1016/j.futures.2021.102756
5. Baars, B.J.: A Cognitive Theory of Consciousness. Cambridge University Press, Cambridge (1988)
6. Bateson, M., Desire, S., Gartside, S.E., Wright, G.A.: Agitated honeybees exhibit pessimistic cognitive biases. Curr. Biol. 21(12), 1070–1073 (2011). https://doi.org/10.1016/j.cub.2011.05.017
7. Block, N.: On a confusion about a function of consciousness. Behav. Brain Sci. 18(2), 227–247 (1995). https://doi.org/10.1017/S0140525X00038188
8. Blumberg, A.E., Feigl, H.: Logical positivism. J. Philos. 28(11), 281–296 (1931)
9. Bowles, N.: Jordan Peterson, Custodian of the Patriarchy. The New York Times, New York (2018)
10. Cappelen, H.: Fixing Language: An Essay on Conceptual Engineering. Oxford University Press, Oxford (2018)
11. Carnap, R.: Scheinprobleme in der Philosophie (1928)
12. Chalmers, D.J.: Consciousness and its place in nature. In: Stich, S.P., Warfield, T.A. (eds.) Blackwell Guide to the Philosophy of Mind. Blackwell Publishing, Oxford (2003)
13. Chalmers, D.J.: Facing up to the problem of consciousness. J. Conscious. Stud. 2(3), 200–219 (1995)
14. Chalmers, D.J.: Moving forward on the problem of consciousness. J. Conscious. Stud. 4(1), 3–46 (1997)
15. Chalmers, D.J.: The Conscious Mind: In Search of a Fundamental Theory. Oxford University Press, Oxford (1996)
16. Chalmers, D.J.: The meta-problem of consciousness. J. Conscious. Stud. 25(9–10), 6–61 (2018)
17. Chalmers, D.J.: Verbal disputes. Philos. Rev. 120(4), 515–566 (2011)
18. Chomsky, N., Smith, N.: New Horizons in the Study of Language and Mind. Cambridge University Press, Cambridge (2000)
19. Churchland, P.S.: The Hornswoggle problem. J. Conscious. Stud. 3(5–6), 402–408 (1996)
20. Churchland, P.M.: The rediscovery of light. J. Philos. 93(5), 211 (1996)
21. Cleland, C.E., Chyba, C.F.: Defining ‘life’. Orig. Life Evol. Biosph. 32(4), 387–393 (2002). https://doi.org/10.1023/A:1020503324273
22. Dennett, D.: Facing backwards on the problem of consciousness. J. Conscious. Stud. 3(1), 4–6 (1996)
23. Dennett, D.: Magic, Illusions, and Zombies: An Exchange. The New York Review. WW Norton and Company, New York (2018)
24. Dennett, D.: The milk of human intentionality. Behav. Brain Sci. 3(3), 428–430 (1980). https://doi.org/10.1017/S0140525X0000580X
25. Díaz, R.: Do people think consciousness poses a hard problem? Empirical evidence on the meta-problem of consciousness. J. Conscious. Stud. 28(3–4), 55–75 (2021)
26. Frankish, K.: Illusionism as a theory of consciousness. J. Conscious. Stud. 23(11–12), 11–39 (2016)
27. Gloor, L.: Moral Anti-Realism Sequence #2: Why Realists and Anti-Realists Disagree. Effective Altruism Forum (2020). https://forum.effectivealtruism.org/s/R8vKwpMtFQ9kDvkJQ/p/6nPnqXCaYsmXCtjTk
28. Harris, J., Anthis, J.R.: The moral consideration of artificial entities: a literature review. Sci. Eng. Ethics 27(4), 1–95 (2021). https://doi.org/10.1007/s11948-021-00331-8
29. Hofstadter, D.R.: I Am a Strange Loop. Basic Books, New York (2007)
30. van Inwagen, P.: Existence, ontological commitment, and fictional entities. In: Loux, M.J., Zimmerman, D.W. (eds.) The Oxford Handbook of Metaphysics. Oxford University Press, Oxford (2005)
31. Jackman, H.: Construction and continuity: conceptual engineering without conceptual change. Inquiry 63(9–10), 909–918 (2020). https://doi.org/10.1080/0020174X.2020.1805703
32. Kneer, M., Colaço, D., Alexander, J., Machery, E.: On second thought: reflections on the reflection defense. In: Lombrozo, T., Knobe, J., Nichols, S. (eds.) Oxford Studies in Experimental Philosophy, vol. 4. Oxford University Press, Oxford (forthcoming)
33. Knoll, V.: Verbal disputes and topic continuity. Inquiry 1–22 (2020). https://doi.org/10.1080/0020174X.2020.1850340
34. Knuth, D.E.: Mathematics and computer science: coping with finiteness. Science 194(4271), 1235–1242 (1976). https://doi.org/10.1126/science.194.4271.1235
35. Korzybski, A.: Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics. International Non-Aristotelian Library Publishing Company, Lancaster (1933)
36. Lewis, C.I.: Mind and the World-Order. C. Scribner’s Sons, New York (1929)
37. Lewis, D.K.: On the Plurality of Worlds. Basil Blackwell, Oxford (1986)
38. MacAskill, W., Bykvist, K., Ord, T.: Moral Uncertainty. Oxford University Press, New York (2020)
39. McGinn, C.: Can we solve the mind-body problem? Mind XCVIII(391), 349–366 (1989). https://doi.org/10.1093/mind/XCVIII.391.349
40. Misak, C.J.: Verificationism: Its History and Prospects. Routledge, London (1995)
41. Nagel, T.: What is it like to be a bat? Philos. Rev. 83(4), 435–450 (1974). https://doi.org/10.2307/2183914
42. NASA: Life’s Working Definition: Does It Work?: Interview with Carol Cleland (2003). https://www.nasa.gov/vision/universe/starsgalaxies/life's_working_definition.html
43. Orilia, F., Swoyer, C.: Properties. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Stanford (2020)
44. Papineau, D.: Thinking about Consciousness. Oxford University Press, Oxford (2002)
45. Pietroski, P.M.: Conjoining Meanings. Oxford University Press, Oxford (2018)
46. Putnam, H.: Meaning and reference. J. Philos. 70(19), 699 (1973). https://doi.org/10.2307/2025079
47. Putnam, H.: The Many Faces of Realism. Open Court, Chicago (1987)
48. Quine, W.V.: Two dogmas of empiricism. Philos. Rev. 60(1), 20 (1951). https://doi.org/10.2307/2181906
49. Ramsey, W.: Eliminative materialism. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Stanford (2020)
50. Rosen, G.: Modal fictionalism. Mind 99, 395 (1990)
51. Russell, B.: Idealism. In: The Problems of Philosophy. Oxford University Press, Oxford (1912)
52. Ryle, G.: The Concept of Mind. Hutchinson’s University Library, London (1949)
53. Speaks, J.: Theories of meaning. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Stanford (2021)
54. Strawson, G.: Appendix: dunking Dennett. Estudios de Filosofia 59, 9–43 (2019)
55. Strawson, G.: Consciousness Isn’t a Mystery. It’s Matter. The New York Times, New York (2016)
56. Strawson, G.: The Consciousness Deniers (2018)
57. The Brain Dialogue: Towards a Conscious-O-Meter (2016). https://www.cibf.edu.au/towards-a-conscious-o-meter
58. Tomasik, B.: The Eliminativist Approach to Consciousness. Center on Long-Term Risk (2015). https://longtermrisk.org/the-eliminativist-approach-to-consciousness
59. Tononi, G.: Consciousness as integrated information: a provisional manifesto. Biol. Bull. 215(3), 216–242 (2008). https://doi.org/10.2307/25470707
60. Wittgenstein, L.: Tractatus Logico-Philosophicus. Harcourt, Brace and Company, New York (1922)
A Second-Order Adaptive Network Model for Exam-Related Anxiety Regulation

Isabel Barradas1, Agnieszka Kloc2, Nina Weng3, and Jan Treur4(B)

1 Faculty of Science and Technology, Free University of Bozen-Bolzano, Bolzano, Italy
[email protected]
2 Department of Technology and Operations Management, Erasmus University Rotterdam,
Rotterdam, The Netherlands
[email protected]
3 Technical University of Denmark, DTU Compute, Kongens Lyngby, Denmark
[email protected]
4 Department of Computer Science, Social AI Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
[email protected]
Abstract. A common type of performance anxiety is the so-called “exam anxiety”, in which students can experience physical and emotional reactions before or during the exam due to the testing situation. Exam anxiety was already quite prevalent in students’ lives, and one could expect that this condition got even worse due to the COVID-19 pandemic. Besides all the worrying factors that COVID-19 brought to the general population, students had to rapidly adapt to the reality of online exam modalities – introducing extra sources of stress. Therefore, our aim is to model the differences between online and offline modalities in the emotion regulation processes used to overcome exam anxiety. To model these processes, we used a second-order adaptive network model. We employed reappraisal, since it is considered the most effective emotion regulation strategy to deal with this type of anxiety. We show that, even though the reappraisal processes take place and the exam anxiety is regulated, the exam anxiety levels are higher in online exams than in offline exams.

Keywords: Second-order adaptive · Network model · Exam anxiety · Emotion regulation · Reappraisal
I. Barradas, A. Kloc and N. Weng—The first three authors contributed equally to this research and the writing of the paper.

1 Introduction

Anxiety is an emotion that accompanies everyone throughout their lives, in different moments and to different extents, and is considered by some psychologists as “being at the root of what it means to be human” [1, 2]. It can be triggered by distinct current, upcoming or also past events. Researchers distinguish between various types of anxiety
depending on the context it refers to, for example classroom anxiety, public speaking anxiety, workplace-related anxiety, or exam anxiety. In this work, we focus on the latter and examine how the specifics of the context – online or offline examination mode – could affect the experienced levels of anxiety.

The experienced levels of anxiety depend additionally on whether an individual is able to employ processes that can help decrease his/her/their emotion levels. In the psychological literature, such processes are referred to as emotion regulation [3]. Examples of the most widely discussed emotion regulation strategies are acceptance, avoidance, problem solving, and reappraisal. These emotion regulation strategies are extremely important for human mental functioning, as it is often argued that the impairment or lack of ability to employ them when faced with stressful situations can explain the existence of the so-called distress disorders – generalized anxiety disorder or depression [4]. Typically, these strategies are used on a daily basis by everyone, at least to a certain extent, consciously or subconsciously, in each anxiety-inducing situation. To the best of our knowledge, however, emotion regulation models have not been used in the context of modeling the differences between anxiety levels in online and offline exams, and thus we aim to fill this gap by addressing these processes in our research.

To study the anxiety-related mental processes in the online and offline exam context, we construct a computational model which takes into account the differences between anxiety-related states that become activated in a specific (online or offline exam) context. We also model the emotion regulation process with regard to two anxiety conditions: one concerning having a stable examination environment and one concerning the new exam format with a different type of questions. Finally, we add two adaptation levels to the model, which helps make the model more realistic by making the connection strength between anxiety and emotion regulation control states, as well as the speed of its development, adaptive.
2 Background

Exam anxiety is a type of performance anxiety [5, 6] that is defined by Zeidner in his book as a “set of phenomenological, physiological, and behavioral responses that accompany concern about possible negative consequences or loss of competence on an exam or similar evaluative situation” [7]. This type of anxiety is experienced by many students, and it tends to occur when the student perceives either the test performance or the testing situation as a threat [8]. As a consequence, the student believes that his/her/their intellectual, motivational, and social capabilities are not enough to cope with the test [7]. Exam anxiety has also been referred to as test anxiety, exam stress or test stress.

There are different factors that impact exam anxiety, such as fear of failure, lack of preparation, poor test history, high pressure, and perfectionism. Besides all of this, the COVID-19 pandemic brought more stressful factors on top of the ones just mentioned. Not only did people’s anxiety levels increase due to concerns related to their health and economic situation, but students also had classes and were assessed in a novel scenario to which they were not used. Moreover, adjusting to the new required routine can be challenging. Even before the COVID-19 pandemic, online assessment had been gaining followers. In fact, it can be more convenient to schedule, it saves time grading
and entering grades (this can even be automatic), and it costs less than paper and pencil exams [9]. Moreover, Stowell & Bennett (2010) [10] reported that students who normally had high levels of exam anxiety in the classroom had reduced exam anxiety during online exams, while students with low classroom anxiety had higher anxiety during online exams. Nonetheless, changing the paradigm from offline exams (exams in the classroom) to online exams requires time and resources for adaptation – which was not possible to achieve during the COVID-19 pandemic. In many cases, online exams became mandatory and the process of converting the assessment methodologies was accelerated. To cope with this new reality, both students and instructors had to learn fast how to use these platforms, which introduced an extra source of stress.

A low level of test-related anxiety is beneficial to students, since it makes them more aware and alert. However, an overall state of anxiety has negative consequences on the well-being of students and also on their performance. Exam anxiety has a mental and physical impact that can be reflected in many symptoms, such as loss of sleep, appetite and hair, nervousness, fear, irritability, headaches, inability to concentrate, and craving for food (before the exam), as well as confusion, panic, mental blocks, fainting, and feeling too hot or too cold (during the exam) [5]. Also, Akinsola & Nwajei (2013) [11] reported a coexistence of exam anxiety with depression. This further affects students’ exam performance and their grades, as established by Putwain (2008) [12] and Segool et al. (2013) [13]. These examples of consequences are already enough to understand that strategies are needed to reduce the anxiety level to a state in which it can be healthy and positively stimulating.

Many strategies to reduce exam anxiety that can be adopted and combined can be found in Mashayekh & Hashemi (2011) [5]. Improving self-image and motivation, keeping a healthy lifestyle, and taking test samples are good examples of practices that can help students decrease their felt exam anxiety. Even though these strategies are general and not specific to the COVID-19 situation, they can be helpful in that context too. Nonetheless, since COVID-19 forced the implementation of online assessments, our goal here is to assess the differences in exam anxiety for online and offline exams. Therefore, the emotion regulation strategies adopted in this work will take the differences between these two modalities into consideration, and these general strategies will not be considered.

Emotion regulation refers to the ability to effectively control which emotions individuals have, when they have them, and how they experience or express these emotions [14]. Processes to regulate emotions are either automatic or controlled, conscious or unconscious, and can affect different parts of the emotion process. Although the definition of emotion is ambiguous, here we consider that emotions can be an abbreviation for “emotional episodes”, since they are multicomponential processes that evolve in time. Therefore, emotion regulation mechanisms cause changes in the emotion dynamics. With this in mind, there is the need to adopt models that are able to reflect these dynamical changes.
According to Gross (2013) [3], emotion regulation has three core features: the goal (what the individual is trying to accomplish), the strategy (which processes are engaged to achieve that goal), and the outcome (the consequences of trying to reach that goal using
that strategy). In this work, we focus on the strategies to regulate emotions. Whatever the goal is, there are different ways to achieve it, and they can be categorized as adaptive (associated with greater wellbeing or fewer symptoms) or maladaptive (associated with psychological symptoms and other negative outcomes) [15]. Maladaptive strategies include, for instance, suppression (the attempt to hide, inhibit or reduce ongoing emotion-expressive behavior [16, 17], which happens once an emotion is already taking place [18]) and rumination (the repetitive thinking about the thoughts and feelings related to the event [19]). Both processes give the individual the feeling that they are dealing with the problem, but actually trigger more negative emotions [20]. On the other hand, adaptive strategies include, for instance, putting an event into perspective, i.e., diminishing the meaning of the event [19], or reappraisal (the attempt to reinterpret an emotion-eliciting situation, altering its meaning and its emotional impact [17]). Reappraisal can be employed even before the emotion takes place and is therefore able to modify the entire dynamics of the emotional process before the response has been completely generated [18].

In fact, individuals can change their exam anxiety through reappraisal. Brady et al. (2018) [21] suggested that a way to do so would be to reinterpret the role of anxiety – instead of considering this state as harmful, students should try to see it as neutral or beneficial. Due to the believed effect of reappraisal in controlling exam anxiety, this emotion regulation technique is commonly employed during cognitive therapy [22, 23]. Strain & D’Mello (2011) [24] concluded that reappraisal is an effective strategy for emotion regulation during learning, which was also beneficial for the learning process itself. Moreover, it can be argued that even though certain strategies, such as distraction, can be effective in high-intensity negative situations [25], they cannot be effective in the learning context, especially if distraction is the only strategy used. For instance, if students exclusively apply this strategy before the exam, it can happen that they will not have enough time to study everything and, once they realize this, they get too anxious to study more (worsening the process) and will be even more anxious during the exam. If students decide to exclusively apply the strategy during the exam, it is easy to understand that either they do not finish the exam in time, or at a certain point they will realize that they do not have enough time to finish it, affecting their performance on the rest of the exam. For these reasons, reappraisal is the only strategy that is going to be considered here. It is important to note that reappraisal can be cognitively demanding and therefore, to be effective, it should not happen too late in the emotional process.
3 The Adaptive Network Modeling Approach Used

In this work, the adaptive modeling approach from [26] is applied for designing and simulating a dynamic process with multiple orders of adaptation. The network model is built by extracting states and causal relationships between them for a certain mental or social process. The linkages can be manipulated further by adding extra states to the network, which are called self-model states or reification states. Given initial values for all states, the state activations change over time. The network structure characteristics for this approach are described as follows:
- Connectivity characteristics: a connection weight ω_{X,Y} represents the strength of the connection from state X to state Y.
- Aggregation characteristics: for each state Y, a combination function c_Y(..) is used to aggregate the multiple incoming impacts.
- Timing characteristics: η_Y represents the speed factor of state Y, which controls the timing of the impact on Y.

These three types of network characteristics provide the standard numerical representation of the network model in difference equation format:

Y(t + Δt) = Y(t) + η_Y [c_Y(ω_{X1,Y} X1(t), …, ω_{Xk,Y} Xk(t)) − Y(t)] Δt    (1)

where Y is any given state that has incoming connections from states X1, …, Xk. The characteristics for all states are specified in a standard table format called role matrices, which allows us to simulate the process automatically in the software environment.

Using the notion of self-modeling network (also called reified network) introduced in [26], any network characteristic can be made adaptive by adding a (self-model) state to the network that represents the value of this characteristic. This is applied here to obtain a second-order adaptive network in which, for some states X and Y: (1) first-order self-model states W_{X,Y} are included in the network that represent the value of the connection weight ω_{X,Y}, and (2) second-order self-model states H_{W_{X,Y}} are included in the network that represent the value of the speed factor of W_{X,Y} (the learning rate).
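To make this numerical representation concrete, the following minimal Python sketch (our illustration, not the authors' software environment; the state names, weights, speed factors, initial values, and the sigma, tau, and mu parameter values are all assumed for the example) applies Eq. (1) to a toy network with a single adaptive connection, including a first-order self-model state W and a second-order self-model state H as described above.

```python
import math

# Combination functions (cf. Table 2 of the paper).
def alogistic(vs, sigma, tau):
    # Advanced logistic sum with steepness sigma and threshold tau.
    s = sum(vs)
    return (1 / (1 + math.exp(-sigma * (s - tau)))
            - 1 / (1 + math.exp(sigma * tau))) * (1 + math.exp(-sigma * tau))

def hebb(v1, v2, w, mu):
    # Hebbian learning with persistence factor mu.
    return v1 * v2 * (1 - w) + mu * w

# Toy network: base states X (stimulus, held constant) and Y; the first-order
# self-model state W represents the weight omega_{X,Y}; the second-order
# self-model state H represents the speed factor (learning rate) of W.
X, Y, W, H = 1.0, 0.0, 0.1, 0.3       # placeholder initial values
eta_Y, dt = 0.4, 0.5                  # placeholder speed factor and step size
for _ in range(200):
    # Eq. (1) for Y: the weight used for the impact of X is read from W.
    Y_next = Y + eta_Y * (alogistic([W * X], sigma=5, tau=0.2) - Y) * dt
    # Eq. (1) for W: Hebbian aggregation; its speed factor is read from H.
    W_next = W + H * (hebb(X, Y, W, mu=0.9) - W) * dt
    Y, W = Y_next, W_next             # H is kept constant in this sketch
print(f"Y = {Y:.3f}, W = {W:.3f}")    # Y and W settle as the connection is learned
```

The key design point of the self-modeling approach is visible in the two update lines: the base update for Y reads its connection weight from the state W, and the update for W reads its speed factor from the state H, so adaptation at every level is realized with the same Eq. (1).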
4 The Second-Order Adaptive Network Model

The knowledge from the background literature (Sect. 2) and the modeling approach from Sect. 3 were combined into an adaptive mental network model for exam-related anxiety regulation. The connectivity of the multi-levelled model is 3D-visualized in Fig. 1, with a detailed explanation of the states in Table 1. This model simulates the process of activating and regulating anxiety before and during the exam.

In the base level, the two exam states are introduced (X1 – on_ex, X2 – off_ex) to specify the two modalities of exam: online and offline. Each modality causes different issues/problems and then leads to different levels of anxiety. The states X3 to X9 are potential issues that students might encounter before the exam, while the during-exam issues are represented by the states X10 to X16. Some of these potential problems might be triggered only by the online or the offline exam (e.g., the Internet problem during the exam, X10, can only be caused by an online exam) or by both (e.g., the difficulty of checking course materials before the exam, X3, would be introduced by both online and offline settings with different levels of impact). After the “issue states”, four states about the way students feel are created: the expected difficulties for exam performance (X17), the expected difficulties for technical performance (X18), the feeling of being demoralized (X19), and the difficulty to stay focused (X20). It is necessary to mention that the first two states affect the latter two, due to the fact that the mental states before the exam influence the feelings during the exam, especially in its early stage. Finally, the exam anxiety level (X21) is integrated based on these four states.
Fig. 1. Overview of the self-modeling network architecture for exam-related anxiety regulation with the base level (lower plane, pink), the first reification level (middle plane, blue), and the second reification level (upper plane, purple).
Since reappraisal is applied as the emotion regulation strategy (as mentioned in Sect. 2.2), there are two pairs of belief states – X_5 and X_6, X_12 and X_13 – indicating two opposite beliefs about the environment and about the type of questions, respectively. The negative state in each pair is suppressed by the corresponding control state (X_22 and X_23), so that the student is able to regulate the emotion by reappraisal. These two control states are triggered by the level of exam anxiety (X_21). The states were chosen based on a survey that reported the views and difficulties of students during online exam modalities as a consequence of the COVID-19 pandemic [27]. We took this study into consideration but grouped difficulties that were related to each other (about technical difficulties, for instance). To make the comparison possible, we also added states that are only influenced by the offline modality and therefore were not included in the referred survey. The second level describes first-order network adaptation at the first reification level, which models the Hebbian learning process for anxiety regulation taking place before and during the exam. Through changes of the W-states, which provide the self-model representation of the connection weights at the base level, the student learns over time to switch from a negative to a positive interpretation. The third level describes second-order adaptation, which controls the speed of the first-order learning at the first reification level; H-states are used as these self-model states. Here, by adapting the speed factor of the first-order adaptation, the student can further develop the ability of reappraisal. Some connections in this model have negative weights, indicating that the source state suppresses the target state; these connections are colored dark red in Fig. 1. The combination functions used in this model are listed in Table 2. When a state has only one incoming connection, the identity function is used. By contrast, the advanced logistic function alogistic was chosen to combine multiple impacts on a node. There are
two parameters for alogistic: the steepness σ and the threshold τ. For the Hebbian learning, the function hebb is used with persistence factor μ. Figure 2 shows the role matrices of this multi-level network model. Matrix mb represents the incoming base connections, which are either between states at the same level or from lower states to higher states. Notice that even though arrows from higher states to lower states exist in Fig. 1 (e.g., the pink/red arrows from H-states to W-states), those arrows do not indicate base connectivity but connectivity based on reification for a specific role; they are specified in the matrix for that role instead.

Table 1. Overview of the states of the multi-level network model.
Table 2. The combination functions used in the introduced network model

Function | Notation | Formula | Parameters
Identity | id(V) | V | –
Advanced logistic sum | alogistic_{σ,τ}(V_1, …, V_k) | [1/(1 + e^{−σ(V_1 + … + V_k − τ)}) − 1/(1 + e^{στ})] (1 + e^{−στ}) | steepness σ > 0; excitability threshold τ
Hebbian learning | hebb_μ(V_1, V_2, W) | V_1 V_2 (1 − W) + μ W | V_1, V_2 activation levels of the connected states; W activation level of the self-model state for the connection weight; μ persistence factor
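As a hedged illustration, the two non-identity functions from Table 2 can be written out in Python as follows; this is a sketch of the standard definitions from [26], not the library code itself.

```python
import math

def alogistic(sigma, tau, impacts):
    # Advanced logistic sum: logistic of the summed impacts with steepness
    # sigma and threshold tau, shifted and rescaled so that zero input maps
    # to 0 and the output stays within [0, 1].
    s = sum(impacts)
    return ((1 / (1 + math.exp(-sigma * (s - tau)))
             - 1 / (1 + math.exp(sigma * tau))) * (1 + math.exp(-sigma * tau)))

def hebb(mu, v1, v2, w):
    # Hebbian learning: the connection self-model state W grows when both
    # connected states are active; mu controls persistence of what was learnt.
    return v1 * v2 * (1 - w) + mu * w
```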
In the top-right corner of Fig. 2, matrix mcw shows the connection weights, which have values within [−1, 1]. Two weights here are not fixed numbers but adaptive – the two connections adapted through Hebbian learning; this is where the related downward connections are specified. Matrix mcfw shows the combination function weights, with only the W-states applying the hebb function. Matrix mcfp presents the combination function parameters for all states; here the two negative belief states are given lower thresholds than the others to model the reappraisal process. In addition, the steepness for X_19 (feeling demoralized), X_20 (difficulty of staying focused), and X_21 (exam anxiety) is adjusted downward to imitate the gradually increasing anxiety level during the exam stages. In the bottom-right corner, the matrices of speed factors (msv) and initial values (iv) are displayed. The values shown are for the online setting; to switch to the offline setting, the initial value of X_1 should be set to 0 and that of X_2 to 1.
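To make the role-matrix format concrete, here is a minimal sketch of how such matrices could be encoded for a single node; the state names, the dict encoding, and all values are illustrative placeholders of our own, not the actual entries of Fig. 2.

```python
# Hypothetical role-matrix specification for one node X3 with two incoming
# base connections X1 -> X3 and X2 -> X3 (all values are placeholders).
mb   = {"X3": ["X1", "X2"]}                 # incoming base connections
mcw  = {"X3": [1.0, -0.4]}                  # connection weights; negative = suppressing
mcfw = {"X3": {"alogistic": 1.0}}           # combination function weights
mcfp = {"X3": {"sigma": 5.0, "tau": 0.6}}   # function parameters (steepness, threshold)
msv  = {"X3": 0.5}                          # speed factor eta
iv   = {"X1": 1.0, "X2": 0.0, "X3": 0.0}    # initial values (here: online setting, X1 = 1)
```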
5 Simulation Results

In this section, we present some of the simulation results for the introduced model of context-dependent exam anxiety. We show separate simulations for offline and online exams and discuss the differences in the anxiety-inducing processes in these two contexts.
Fig. 2. Role matrices for connectivity, aggregation and timing characteristics of the network model.
Figures 3 and 4 show the simulations with two adaptation levels for online and offline exams, respectively. For the online modality (Fig. 3), the control state for emotion regulation related to the environment reaches a maximum value of 0.74. The speed modification makes the values of self-model state X_24 grow more slowly than in a model without the speed adaptation. The anxiety level reaches a maximum value of 0.87. In the second emotion regulation process, the values of control state X_23 again grow slowly at first, but the growth then picks up speed. For offline exams (Fig. 4), only the control state X_23 becomes active. This is because students are used to the classroom environment; since the negative belief X_6 never reaches high values, the control state X_22 does not need to become active. The anxiety level (X_21) is much lower than in the online scenario.
Fig. 3. Simulation results for online exam using emotion regulation processes.
Fig. 4. Simulation results for offline exam using emotion regulation processes.
6 Conclusion and Future Directions

This work uses a multi-order adaptive network model to study the differences in emotion regulation for exam-related anxiety between two exam modalities – online and offline. As the emotion regulation strategy, we applied reappraisal processes, since they represent a commonly employed strategy for dealing with learning-related anxiety [21]. Here, our general comparisons were conducted in a way in which the timeline did not represent the exact moments at which the different states occurred: the chronological order is correct, but the pre-exam period is “compressed”. For the purpose of this paper and its comparisons, we considered this approach satisfactory. However, to precisely define the time intervals in which the pre-exam states and the during-exam states occur, and to understand how that affects the overall exam anxiety, two extra context states can be added – pre exam (X_28) and during exam (X_29) – that are only activated during the time periods in which they apply. We checked this (for a non-adaptive scenario) and it seems to work well.
If one compares our models with non-adaptive models, it might seem that the adaptive models should achieve lower values of exam anxiety (X_21). However, the two adaptation levels were not added for this purpose, but to make the model resemble a real-life situation more closely, because the learning speed of reappraisal is not constant. As expected, the emotion regulation processes promoted by the control states X_22 and X_23 were influenced by the introduction of the two reification levels, which also affected the beliefs related to the way students perceive the exam environment (X_5 and X_6) and the difficulties related to the type of questions (X_12 and X_13). It is important to mention that within the scope of this work we only considered reappraisal with respect to these two variables. Different kinds of beliefs can be taken into account in the future. We argue so because emotion regulation processes are highly subjective: a certain individual could, for instance, regulate his/her/their belief about the duration of the exam, seeing it no longer as a negative factor that introduces pressure but as a positive factor that allows extra time to rest. For our simulation, however, we applied the emotion regulation process to the two beliefs we considered most likely to be regulated by an average individual. In the future, we would like to simulate what the process would look like for a person already suffering from anxiety; we expect the anxiety levels to be higher. Based on preliminary results, it was already possible to see that the reappraisal processes may then not manage to significantly reduce the perceived anxiety. Furthermore, the coexistence of other mental health disorders, such as depression or attention deficit hyperactivity disorder, could also be included in the model, altering for instance the parameters of the states related to feeling demoralized and to having difficulties staying focused, respectively.
References

1. May, R.: The Meaning of Anxiety. Pocket Books, New York (1979)
2. Barlow, D.H.: Unraveling the mysteries of anxiety and its disorders from the perspective of emotion theory. Am. Psychol. 55(11), 1247 (2000)
3. Gross, J.J.: Handbook of Emotion Regulation. Guilford Publications, New York (2013)
4. Aldao, A., Nolen-Hoeksema, S., Schweizer, S.: Emotion-regulation strategies across psychopathology: a meta-analytic review. Clin. Psychol. Rev. 30(2), 217–237 (2010)
5. Mashayekh, M., Hashemi, M.: Recognizing, reducing and coping with test anxiety: causes, solutions and recommendations. Procedia Soc. Behav. Sci. 30, 2149–2155 (2011)
6. Pekrun, R.: A social-cognitive, control-value theory of achievement emotions. In: Heckhausen, J. (ed.) Motivational Psychology of Human Development: Developing Motivation and Motivating Development, pp. 143–163. Elsevier Science, Amsterdam (2000)
7. Zeidner, M.: Test Anxiety: The State of the Art. Springer Science and Business Media, Berlin (1998)
8. Russo, T.J.: Multimodal approaches to student test anxiety. Clearing House J. Educ. Strat. Issues Ideas 58(4), 162–166 (1984)
9. Alexander, M.W., Bartlett, J.E., Truell, A.D., Ouwenga, K.: Testing in a computer technology course: an investigation of equivalency in performance between online and paper and pencil methods. J. Career Tech. Educ. 18(1), 69–80 (2001)
10. Stowell, J.R., Bennett, D.: Effects of online testing on student exam performance and test anxiety. J. Educ. Comput. Res. 42(2), 161–171 (2010)
11. Akinsola, E.F., Nwajei, A.D., et al.: Test anxiety, depression and academic performance: assessment and management using relaxation and cognitive restructuring techniques. Psychology 4(06), 18 (2013)
12. Putwain, D.: Do examination stakes moderate the test anxiety–examination performance relationship? Educ. Psychol. 28(2), 109–118 (2008)
13. Segool, N.K., Carlson, J.S., Goforth, A.N., Von Der Embse, N., Barterian, J.A.: Heightened test anxiety among young children: elementary school students' anxious responses to high-stakes testing. Psychol. Sch. 50(5), 489–499 (2013)
14. Gross, J.J.: The emerging field of emotion regulation: an integrative review. Rev. Gen. Psychol. 2(3), 271–299 (1998)
15. Aldao, A., Nolen-Hoeksema, S.: Specificity of cognitive emotion regulation strategies: a transdiagnostic examination. Behav. Res. Ther. 48(10), 974–983 (2010)
16. Gross, J.J., Levenson, R.W.: Emotional suppression: physiology, self-report, and expressive behavior. J. Pers. Soc. Psychol. 64(6), 970 (1993)
17. Gross, J.J., John, O.P.: Individual differences in two emotion regulation processes: implications for affect, relationships, and well-being. J. Pers. Soc. Psychol. 85(2), 348 (2003)
18. Cutuli, D.: Cognitive reappraisal and expressive suppression strategies role in the emotion regulation: an overview on their modulatory effects and neural correlates. Front. Syst. Neurosci. 8, 175 (2014)
19. Garnefski, N., Kraaij, V., Spinhoven, P.: Manual for the Use of the Cognitive Emotion Regulation Questionnaire. DATEC, Leiderdorp, The Netherlands (2002)
20. Zuzama, N., Fiol-Veny, A., Roman-Juan, J., Balle, M.: Emotion regulation style and daily rumination: potential mediators between affect and both depression and anxiety during adolescence. Int. J. Environ. Res. Public Health 17(18), 6614 (2020)
21. Brady, S.T., Martin Hard, B., Gross, J.J.: Reappraising test anxiety increases academic performance of first-year college students. J. Educ. Psychol. 110(3), 395 (2018)
22. Nadinloyi, K.B., Sadeghi, H., Garamaleki, N.S., Rostami, H., Hatami, G.: Efficacy of cognitive therapy in the treatment of test anxiety. Procedia Soc. Behav. Sci. 84, 303–307 (2013)
23. Ejei, J., Gholamali Lavasani, M., et al.: The effectiveness of coping strategies training with irrational beliefs (cognitive approach) on test anxiety of students. Procedia Soc. Behav. Sci. 30, 2165–2168 (2011)
24. Strain, A.C., D'Mello, K.: Emotion regulation during learning. In: International Conference on Artificial Intelligence in Education, pp. 566–568. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21869-9_103
25. Sheppes, G., Scheibe, S., Suri, G., Gross, J.J.: Emotion-regulation choice. Psychol. Sci. 22(11), 1391–1396 (2011)
26. Treur, J.: Network-Oriented Modeling for Adaptive Networks: Designing Higher-Order Adaptive Biological, Mental and Social Network Models. Springer Nature, Cham (2020). https://doi.org/10.1007/978-3-030-31445-3
27. Ocak, G., Karakus, G.: Undergraduate students' views of and difficulties in online exams during the COVID-19 pandemic. Themes eLearn. 14, 13–30 (2021)
Using Boolean Functions of Context Factors for Adaptive Mental Model Aggregation in Organisational Learning

Gülay Canbaloğlu 1,2 and Jan Treur 1,3 (B)

1 Center for Safety in Healthcare, Delft University of Technology, Delft, The Netherlands
2 Department of Computer Engineering, Koç University, Istanbul, Turkey
[email protected]
3 Social AI Group, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
[email protected]
Abstract. Aggregation of individual mental models to obtain shared mental models for an organisation is a crucial process in organisational learning. This aggregation process usually depends on several context factors that may vary over circumstances. It is explored how Boolean functions of these context factors can be used to model this form of adaptation. For the adaptation of the aggregation of mental model connections (represented by first-order self-model states), a second-order adaptive self-modeling network model for organisational learning was designed. It is shown how, in such a network model, Boolean functions can be used to express logical combinations of context factors and, based on this, to exert context-sensitive control over the mental model aggregation process.
1 Introduction

Organisational learning (Argyris and Schön 1978; Crossan et al. 1999; Fischhof and Johnson 1997; Kim 1993; Wiewiora et al. 2019) is a challenging, complex adaptive phenomenon within an organisation, especially when it comes to computational modelling of it. Within organisational learning, different types of adaptation processes work together in a dynamical manner via a number of feedforward and feedback cycles; e.g., (Crossan et al. 1999; Kim 1993; Wiewiora et al. 2019). Specific adaptation processes involved are, for example, individual learning and development of mental models, formation of shared mental models for teams or for the organisation as a whole, and improving individual mental models or team mental models based on a shared mental model of the organisation; e.g., (Crossan et al. 1999; Kim 1993; Wiewiora et al. 2019). Kim (1993), p. 44, puts forward that 'Organizational learning is dependent on individuals improving their mental models; making those mental models explicit is crucial to developing new shared mental models'. An interesting challenge addressed in the current paper is how specific context factors play their role in the formation of a shared mental model by aggregating a number of individual mental models in an adaptive, context-sensitive manner.
Concerning mental models, in (Treur and Van Ments 2022) it has been explored how self-modeling networks (Treur 2020) provide an adequate modeling approach to obtain computational models of how mental models are used for internal simulation, adapted by learning, revision or forgetting, and how this is controlled. In (Canbaloğlu et al. 2021), it has also been shown how computational models of organisational learning can be designed based on self-modeling networks; there, the aggregation of individual mental models was addressed in a fixed, nonadaptive manner. In (Canbaloğlu and Treur 2021), a first attempt was made to model context-sensitive adaptation of the aggregation process, in that case by a heuristic approach. In contrast, the current paper addresses the adaptation of the aggregation process in a more precise manner by modeling it based on more exact knowledge expressed by Boolean propositions or functions of the considered context factors. In this paper, some background knowledge is discussed in Sect. 2. Section 3 describes the self-modeling network modeling approach. In Sect. 4, a computational self-modeling network model for organisational learning with context-dependent aggregation based on Boolean functions is introduced. Section 5 illustrates the model by an example simulation scenario. Finally, Sect. 6 is a discussion.
2 Mental Models and Organisational Learning

In this section, the concepts and processes that need to be addressed are briefly discussed. This provides a basis for the self-modeling network model that will be presented in Sect. 4 and for the scientific justification of the model. In (Van Ments and Treur 2021), an analysis of various types of mental models (Craik 1943) and of the types of mental processes processing them is reviewed. Based on this analysis, a three-level cognitive architecture has been introduced, where:

• the base level models internal simulation of a mental model
• the middle level models the adaptation of the mental model (formation, learning, revising, and forgetting a mental model, for example)
• the upper level models the (metacognitive) control over these processes

By using the notion of self-modeling network (or reified network) from (Treur 2020), this cognitive architecture has recently been formalized computationally and used in computer simulations for various applications of mental models. For an overview of this approach and its applications, see (Treur and Van Ments 2022). Organisational learning is an area which has received much attention over time; see, for example, (Argyris and Schön 1978; Bogenrieder 2002; Crossan et al. 1999; Fischhof and Johnson 1997; Kim 1993; McShane and Glinow 2010; Stelmaszczyk 2016; Wiewiora et al. 2019). However, contributions to the computational formalization of organisational learning are very rare. By Kim (1993), mental models are considered a vehicle for both individual learning and organizational learning. By learning and developing individual mental models, a basis is created for the formation of shared mental models at the level of the organization, which provides a mechanism for organizational learning. The overall process consists of the following cyclical processes and interactions (see also (Kim 1993), Fig. 8):
(a) Individual level
(1) Creating and maintaining individual mental models
(2) Choosing for a specific context a suitable individual mental model as focus
(3) Applying a chosen individual mental model for internal simulation
(4) Improving individual mental models (individual mental model learning)

(b) From individual level to organization level
(1) Deciding about creation of shared mental models
(2) Creating shared mental models based on developed individual mental models

(c) Organization level
(1) Creating and maintaining shared mental models
(2) Associating to a specific context a suitable shared mental model as focus
(3) Improving shared mental models (shared mental model refinement or revision)

(d) From organization level to individual level
(1) Deciding about individuals to adopt shared mental models
(2) Individuals adopting shared mental models by learning them

(e) From individual level to organization level
(1) Deciding about improvement of shared mental models
(2) Improving shared mental models based on further developed individual mental models
3 The Self-modeling Network Modeling Approach Used

In this section, the network-oriented modeling approach used is briefly introduced. A temporal-causal network model is characterised by the following network characteristics; here X and Y denote nodes of the network, also called states (Treur 2020):
• Connectivity characteristics: connections from a state X to a state Y and their weights ω_{X,Y}
• Aggregation characteristics: for any state Y, some combination function c_Y(..) defines the aggregation that is applied to the impacts ω_{X,Y} X(t) on Y from its incoming connections from states X
• Timing characteristics: each state Y has a speed factor η_Y defining how fast it changes for given causal impact.

The following canonical difference (or related differential) equations are used for simulation purposes; they incorporate these network characteristics ω_{X,Y}, c_Y(..), η_Y in a standard numerical format:

Y(t + Δt) = Y(t) + η_Y [c_Y(ω_{X_1,Y} X_1(t), …, ω_{X_k,Y} X_k(t)) − Y(t)] Δt    (1)

for any state Y, where X_1 to X_k are the states from which Y gets its incoming connections. The available dedicated software environment described in (Treur 2020, Ch. 9) includes a combination function library with currently around 50 useful basic combination functions. The above concepts enable the design of network models and their dynamics in a declarative manner, based on mathematically defined functions and relations. The examples of combination functions that are applied in the model introduced here can be found in Table 1. Combination functions as shown in Table 1 and available in the combination function library are called basic combination functions. For any network model, some number m of them can be selected; they are represented in a standard format as bcf_1(..), bcf_2(..), …, bcf_m(..). In principle, they use parameters π_{1,i,Y}, π_{2,i,Y} such as the λ, σ, and τ in Table 1. Including these parameters, the standard format used for basic combination functions is (with V_1, …, V_k the single causal impacts): bcf_i(π_{1,i,Y}, π_{2,i,Y}, V_1, …, V_k). For each state Y just one basic combination function can be selected, but also a number of them can be selected, as happens in the current paper; this is interpreted as a weighted average of them according to the following format:

c_Y(π_{1,1,Y}, π_{2,1,Y}, …, π_{1,m,Y}, π_{2,m,Y}, V_1, …, V_k) = [γ_{1,Y} bcf_1(π_{1,1,Y}, π_{2,1,Y}, V_1, …, V_k) + … + γ_{m,Y} bcf_m(π_{1,m,Y}, π_{2,m,Y}, V_1, …, V_k)] / [γ_{1,Y} + … + γ_{m,Y}]    (2)

with combination function weights γ_{i,Y}. Selecting only one of them for state Y, for example bcf_i(..), is done by putting weight γ_{i,Y} = 1 and the other weights 0. This is a convenient way to indicate combination functions for a specific network model. The function c_Y(..) can then be specified by the weight factors γ_{i,Y} and the parameters π_{i,j,Y}.
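The weighted average (2) is straightforward to express in code. The sketch below is a hedged Python illustration with names of our own choosing, not the actual library API; each basic combination function is assumed to take its two parameters and the list of single causal impacts, as in the standard format above.

```python
def weighted_combination(gammas, bcfs, params, impacts):
    """gammas: combination function weights gamma_i,Y (0 = not selected)
       bcfs: basic combination functions bcf_i(p1, p2, impacts)
       params: parameter pairs (pi_1,i,Y, pi_2,i,Y), one per function
       impacts: the single causal impacts omega_Xj,Y * Xj(t)"""
    total = sum(gammas)  # assumes at least one function is selected
    num = sum(g * f(p1, p2, impacts)
              for g, f, (p1, p2) in zip(gammas, bcfs, params) if g > 0)
    return num / total
```

Selecting a single function bcf_i then simply means passing gammas with a 1 at position i and 0 elsewhere.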
Table 1. The combination functions used in the introduced self-modeling network model
Realistic network models are usually adaptive: often not only their states but also some of their network characteristics change over time. By using a self-modeling network (also called a reified network), a similar network-oriented conceptualization can also be applied to adaptive networks to obtain a declarative description using mathematically defined functions and relations for them as well; see (Treur 2020). This works through the addition of new states to the network (called self-model states) which represent (adaptive) network characteristics. In the graphical 3D format as shown in Sect. 4, such additional states are depicted at a next level (called self-model level or reification level), where the original network is at the base level. As an example, the weight ω_{X,Y} of a connection from state X to state Y can be represented (at a next self-model level) by a self-model state named W_{X,Y}. Similarly, all other network characteristics from ω_{X,Y}, c_Y(..), η_Y can be made adaptive by including self-model states for them. For example, an adaptive speed factor η_Y can be represented by a self-model state named H_Y, and an adaptive combination function weight γ_{i,Y} by a self-model state C_{i,Y}. This self-modeling network construction can easily be applied iteratively to obtain multiple orders of self-models at multiple (first-order, second-order, …) self-model levels. For example, a second-order self-model may include a second-order self-model state H_{W_{X,Y}} representing the speed factor η_{W_{X,Y}} for the dynamics of first-order self-model state W_{X,Y}, which in turn represents the adaptation of connection weight ω_{X,Y}. Similarly, a persistence factor μ_{W_{X,Y}} of such a first-order self-model state W_{X,Y} used for adaptation (e.g., based on Hebbian learning) can be represented by a second-order self-model state
M_{W_{X,Y}}. In particular, for the aggregation process for the formation of a shared mental model, which is a main focus of the current paper, second-order self-model states C_{i,W_{X,Y}} will be used in Sect. 4 that represent the i-th combination function weight γ_{i,W_{X,Y}} of the combination functions selected for a shared mental model connection weight W_{X,Y} (where the latter is a first-order self-model state).
4 The Adaptive Network Model for Organisational Learning

The self-modeling network model for organisational learning with adaptive aggregation introduced here is illustrated for a scenario using the more extensive case study of an intubation process from (Van Ments et al. 2021). The part of the mental models used addresses four mental states (see Table 2), which involve tasks that for the sake of simplicity are indicated by a, b, c, and d.

Table 2. The mental model used for the simple case study

States for mental models of persons A, B and organization O | Short notation | Explanation
a_A  a_B  a_O | Prep_eq_N | Preparation of the intubation equipment by the nurse
b_A  b_B  b_O | Prep_d_N | Nurse prepares drugs for the patient
c_A  c_B  c_O | Pre_oy_D | Doctor executes pre-oxygenation
d_A  d_B  d_O | Prep_team_D | Doctor prepares the team for intubation
Initially the mental models of the nurse (person A) and doctor (person B) are different and based on weak connections; they cannot use a stronger shared mental model as that does not exist yet. In this scenario, person A and person B have knowledge on different tasks. Development of the organizational learning covers: 1. Individual learning processes of A and B for their separate mental models through internal simulation. By Hebbian learning, mental models become stronger but they are still incomplete. A has no knowledge for state d_A, and B has no knowledge for state a_B: they do not have connections to these states. 2. Shared mental model formation by aggregation of the different individual mental models (feed forward learning). Here the considered context factors exert control over the aggregation process. 3. Individuals’ adoption of shared mental model (feedback learning), e.g., a form of instructional learning. 4. Strengthening of individual mental models by individual learning through internal simulation, strengthening knowledge for less known states of persons A and B (by Hebbian Learning). Then, persons have stronger and now more complete mental models.
5. Improvements on the shared mental model by aggregation of the effects of the strengthened individual mental models. Again, the considered context factors (which may have changed in the meantime) exert control over the aggregation process.

In the aggregation process for shared mental model formation, not all individual mental models will be considered equally valuable. Due to more experience, person A may be more knowledgeable than person B, who is a beginner, for example. And when they are both experienced, can they be considered independent sources, or have they learnt from the same source? In the former case, aggregation of their knowledge may be assumed to lead to a stronger outcome than in the latter case. Based on such considerations, a number of example context factors have been included that affect the type of aggregation that is applied. They are used to control the aggregation process in such a way that it becomes context-sensitive. Recall that in a network model in general, aggregation is specified by combination functions (see Sect. 3) indicated by combination function weights γ_{i,Y} (and parameters π_{i,j,Y} of these functions). In the specific case of mental model aggregation considered here, it concerns aggregation for the first-order self-model states W_{X,Y} for the weights of the connections X → Y of the shared mental model. Therefore, to make aggregation of mental models adaptive, the (choice of) combination functions for these states W_{X,Y} has to become adaptive in relation to the considered context factors, specifically here using knowledge expressed by Boolean propositions or functions of the considered context factors. In the example scenario four options for combination functions are considered for these W-states of the shared mental model (see Table 1): alogistic, smax, eucl, sgeomean, numbered by i = 1, …, 4 in this order. To make the combination functions of the first-order self-model states W_{X,Y} adaptive, second-order self-model states C_{i,W_{X,Y}}, i = 1, …, 4, are introduced that represent the combination function weights γ_{i,W_{X,Y}}:

• C_{1,W_{X,Y}} for the advanced logistic sum combination function alogistic
• C_{2,W_{X,Y}} for the scaled maximum combination function smax
• C_{3,W_{X,Y}} for the Euclidean combination function eucl
• C_{4,W_{X,Y}} for the scaled geometric mean combination function sgeomean
So, there are four C_{i,W_{X,Y}}-states for each shared mental model connection, and there are three such connections in total. Thus, the model has 12 C_{i,W_{X,Y}}-states at the second-order self-model level to model the aggregation process. These second-order self-model states and the functions they represent are used depending on the context (due to the connections from the context states to the C_{i,W_{X,Y}}-states), and the average is taken (according to (2) in Sect. 3) if more than one i has a nonzero C_{i,W_{X,Y}} for a given W_{X,Y}-state. The influences of the context factors on the aggregation as pointed out in Table 3 have been used to specify the context-sensitive control of the choice of combination function via these C_{i,W_{X,Y}}-states. For example, if A and B have a similar category of knowledgeability, in principle a form of average is supported (via the states C_{3,W_{X,Y}} or C_{4,W_{X,Y}} for a Euclidean or geometric mean combination function), but if they are independent, some form of amplification is supported (via the state C_{1,W_{X,Y}} for a logistic
combination function). If they differ in knowledgeability, the maximal knowledge is chosen (via the state C_{2,W_{X,Y}} for a maximum combination function). This setup is meant as an example to illustrate the idea and can easily be replaced by other context factors and other knowledge relating them to the control of the aggregation via the C_{i,W_{X,Y}}-states.

Table 3. Examples of heuristics for context-sensitive control of mental model aggregation

Context: knowledgeable | Context: dependency | Context: preference for type of quantity | Combination function type
A and B both not experienced | — | Additive | Euclidean
A and B both not experienced | — | Multiplicative | Geometric mean
A and B both experienced | A and B dependent | Additive | Euclidean
A and B both experienced | A and B dependent | Multiplicative | Geometric mean
A and B both experienced | A and B not dependent | — | Logistic
A experienced, B not experienced | — | — | Maximum
B experienced, A not experienced | — | — | Maximum
The indications from Table 3 have been represented by four Boolean functions for the considered combination functions in Table 4. In a standard manner, such Boolean functions can also be represented by Boolean propositions (in disjunctive normal form) as shown in the upper part of Box 1. Here ∧ denotes conjunction (AND), ∨ disjunction (OR), and ¬ negation (NOT). As handling negations adds undesirable complexity to the model, the choice has been made to introduce additional context states for the opposites of the given context states:

begX = X is a beginner (¬ expX)
indepAB = A and B are independent (¬ depAB)
multpref = preference for multiplicative (¬ addpref)

By replacing the negations in the upper part by these opposite states, the lower part of Box 1 is obtained. From this it can easily be derived which connections are needed from the context states to the C_{i,W_{X,Y}}-states. To specify such Boolean propositions in the developed self-modeling network model, not only the connections are needed but also a function for the propositional structure. As the standardised disjunctive normal form is used, this can be done by one function based on the minimum within each of the conjunctions followed by the maximum over the disjunction.
Table 4. Examples of Boolean functions formalising the indications from Table 3.

expA expB depAB addpref | alogistic smax Euclidean sgeomean
 1    1    1    1       |    0      0      1        0
 1    1    1    0       |    0      0      0        1
 1    1    0    1       |    1      0      0        0
 1    1    0    0       |    1      0      0        0
 1    0    1    1       |    0      1      0        0
 1    0    1    0       |    0      1      0        0
 1    0    0    1       |    0      1      0        0
 1    0    0    0       |    0      1      0        0
 0    1    1    1       |    0      1      0        0
 0    1    1    0       |    0      1      0        0
 0    1    0    1       |    0      1      0        0
 0    1    0    0       |    0      1      0        0
 0    0    1    1       |    0      0      1        0
 0    0    1    0       |    0      0      0        1
 0    0    0    1       |    0      0      1        0
 0    0    0    0       |    0      0      0        1
Box 1. Boolean propositions for the knowledge relating context factors to the control states C_{i,W_{X,Y}}.
For any propositional formula F in disjunctive normal form

F = [a_{1,1} ∧ … ∧ a_{1,q1}] ∨ … ∨ [a_{p,1} ∧ … ∧ a_{p,qp}]

when (truth) values 0 or 1 are assigned to the a_{i,j}, the value of F can be determined by the following function:

maxmin([[a_{1,1}, …, a_{1,q1}], …, [a_{p,1}, …, a_{p,qp}]]) = max(min(a_{1,1}, …, a_{1,q1}), …, min(a_{p,1}, …, a_{p,qp}))

Note that this function is also meaningful if continuous values within the [0, 1] interval are used. This may also enable adaptation based on gradual changes of the context factors. For the current scenario, where only one or two conjunctions occur (see Box 1), the function maxmin2 was used, defined as

maxmin2([[a_{1,1}, …, a_{1,q1}], [a_{2,1}, …, a_{2,q2}]]) = max(min(a_{1,1}, …, a_{1,q1}), min(a_{2,1}, …, a_{2,q2}))

In MATLAB terms, this function was implemented with a parameter p(1) for the length of the first conjunction as follows:
maxmin2(p, v) = max([min(v(1:p(1))), min(v(p(1)+1:end))])
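For readers who prefer a general form, here is a hedged Python re-implementation of the maxmin evaluation together with one example proposition. The proposition for smax is read off from Table 4 (smax is 1 exactly when one of A and B is experienced and the other is a beginner), so its first conjunction has length 2, matching p(1) = 2; the concrete context values are ours.

```python
def maxmin(conjunctions):
    # Evaluate a DNF proposition: AND = min within each conjunction, OR = max
    # over conjunctions. Also meaningful for continuous values in [0, 1].
    return max(min(c) for c in conjunctions)

# Context factors at one moment (1 = true, 0 = false); begA/begB are the
# negation-free opposites of expA/expB introduced above.
ctx = {"expA": 1, "begA": 0, "expB": 0, "begB": 1}

# C-state value for the maximum function smax, derived from Table 4:
# smax = (expA AND begB) OR (begA AND expB)
c_smax = maxmin([[ctx["expA"], ctx["begB"]],
                 [ctx["begA"], ctx["expB"]]])
print(c_smax)  # 1 -> maximum-based aggregation: A experienced, B a beginner
```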
The maxmin2 function was added to the combination function library in the software environment. Given the specification in Box 1, the parameter p(1) is 2 for the maximum function, 3 for the logistic function, and 4 for the other two combination functions (Euclidean and geometric mean). The approach to adaptive aggregation of mental models described above was integrated in the self-modeling network model for organisational learning (with fixed, nonadaptive aggregation) introduced in (Canbaloğlu et al. 2021). The connectivity of the network model extended in this way is depicted in Fig. 1. The added states for the context factors are depicted by the 16 grey ovals in the middle (blue) plane, and the added C_{i,W_{X,Y}}-states by the 12 blue-green ovals in the upper (purple) plane. This increases the number of states in the network model introduced in (Canbaloğlu et al. 2021) from 46 to 74. At the base level (the pink plane), states for the individual and shared mental models are included; context states for the different phases were also added here. The middle level (the blue plane) represents the first-order self-model level. It is based on W-states representing the weights of the connections between states within the mental models. For the organisational learning, a number of (intralevel) connections that connect W-states of individual mental models to those of shared mental models and conversely are crucial. From left to right, these intralevel connections are used to provide input from the W-states of the individual mental models to the W-states of the shared mental model for the formation (or improvement) of the shared mental model: feed forward learning in terms of (Crossan et al. 1999). The intralevel connections from right to left model the influence of the
shared mental model on the individual mental models, for example, based on instruction of the shared mental model to employees: feedback learning in terms of (Crossan et al. 1999).
Fig. 1. The connectivity of the second-order adaptive network model
The middle level also includes context states for the context factors that are used at the second-order self-model level (the purple plane) in the Boolean functions for the control of the aggregation for shared mental model formation. Overall, the second-order self-model level includes W_W-, M_W- and H_W-states to control the adaptation of the W-states at the first-order self-model level. The W_W-states can be seen as higher-order W-states; they represent the weights of the intralevel connections from W-states of the shared organization mental model to W-states of the individual mental models used for feedback learning. The W_W-states initiate and control this feedback learning by making these weights within the first-order self-model level nonzero when a shared mental model has become available. The W_W-states also have a learning mechanism (which can be considered a form of higher-order Hebbian learning), so that they are maintained over time: persons keep relating and updating their individual mental models to the shared mental model. Finally, the H_W-states are used for controlling the adaptation speeds of connection weights and the M_W-states for controlling the persistence of the adaptation.
5 An Example Simulation Scenario

In this section some of our simulation results are discussed. More details of the model, a full specification, and more simulation scenarios can be found as Linked Data at https://www.researchgate.net/publication/354402996. In this scenario, different options for combination functions are used to observe different types of aggregation while feed forward organizational learning progresses by the aggregation of separate individual
mental models. To see the different processes better, the scenario was structured in phases; in reality, and also in the model, these processes can overlap or take place entirely simultaneously. The phases were designed as follows:

• Phase 1: Individual mental model usage and learning
Two different mental models of persons A and B, belonging to an organization, are learnt in this phase by Hebbian learning during internal simulations of the mental models. Person A mainly has knowledge of the first part of the job, and person B of the last part.

• Phase 2: Feed forward organisational learning: shared mental model formation
Aggregation of individual mental models occurs here to form the shared mental model: feed forward organisational learning. During this formation, different combination functions are used for different cases in terms of context factors such as knowledgeability, dependence, and the preference for additive or multiplicative aggregation. This feed forward organizational learning takes place by the determination of the values of the W-states for the organization's general states for jobs a_O to d_O.

• Phase 3: Feedback organisational learning: the shared mental model is learnt by the individuals
Feedback learning from the organization's shared mental model can be viewed as learning from each other in an indirect manner. It takes place in this phase by the activation of the connections from the organization's general W-states to the individual W-states. Through these, individuals receive the knowledge from the shared mental model, for example as a form of instructional learning. Since there is a single shared mental model, there is no need for many mutual one-to-one connections between persons to learn from each other.

• Phase 4: Individual mental model usage and learning
Further improvements of the individual mental models take place through Hebbian learning during internal simulation of the mental models, similar to Phase 1.

One scenario is discussed to show how the context factors affect the aggregation process. In this scenario, the context states have binary values. In Fig. 2 the phases are shown: phase 1 (individual mental model learning), phase 2 (feed forward learning for formation of the shared mental model), phase 3 (feedback learning of individual mental models from the shared mental model) and phase 4 (further individual mental model learning) show a classical organizational learning process. At the beginning of the whole simulation, the combination functions are selected by determining values for the C-states based on the context states. For example, in Fig. 3 it can be seen that in the first phase the second-order self-model state C3 for the connection from a_O to b_O becomes 1 and C1 for the connection from b_O to c_O becomes 1. However, after the context change at time 400, this changes. Then C3 for the connection from a_O to
b_O becomes 0 whereas C2 for that connection becomes 1; this happens because of the contextual change that from that time on A is not a beginner anymore but is considered an experienced person. This illustrates how the form of aggregation adapts to context.
Fig. 2. Overall simulation of the example scenario
Using Boolean Functions of Context Factors for Adaptive Mental Model
67
Fig. 3. Adapting C-states for the example scenario by context changes
6 Discussion

An important process within organisational learning is the aggregation of developed individual mental models to obtain shared mental models; e.g., (Kim 1993; Wiewiora et al. 2019). The current paper focuses on how Boolean functions of context factors can be used in this aggregation process. It was shown how a second-order adaptive self-modeling network model for organisational learning, based on the self-modeling network models described in (Treur 2020), can model the adaptivity of this aggregation process for individual mental models based on Boolean combinations of context factors. In previous work addressing organisational learning (Canbaloğlu et al. 2021), the type of aggregation used for the process of shared mental model formation was fixed: it was not addressed in an adaptive manner and not made context-sensitive. In (Canbaloğlu and Treur 2021), different forms of aggregation were incorporated and addressed in an adaptive heuristic manner. In contrast, in the current paper Boolean functions of context factors were used to address the adaptivity of the aggregation, which provides a more precise way to specify knowledge for the context-sensitive adaptive control. For more details about computational modeling of organisational learning, see the forthcoming book (Canbaloğlu et al. 2022).
References

Argyris, C., Schön, D.A.: Organizational Learning: A Theory of Action Perspective. Addison-Wesley, Reading, MA (1978)
Canbaloğlu, G., Treur, J.: Context-sensitive mental model aggregation in a second-order adaptive network model for organisational learning. In: Proceedings of the 10th International Conference on Complex Networks and their Applications. Studies in Computational Intelligence, vol. 1015, pp. 411–423. Springer Nature (2021). https://doi.org/10.1007/978-3-030-93409-5_35
Canbaloğlu, G., Treur, J., Roelofsma, P.H.M.P.: Computational modeling of organisational learning by self-modeling networks. Cognitive Systems Research (2021). https://doi.org/10.1016/j.cogsys.2021.12.003
Canbaloğlu, G., Treur, J., Wiewiora, A. (eds.): Computational Modeling of Multilevel Organisational Learning and its Control Using Self-Modeling Network Models. Springer Nature, to appear (2022)
Craik, K.J.W.: The Nature of Explanation. University Press, Cambridge, MA (1943)
Crossan, M.M., Lane, H.W., White, R.E.: An organizational learning framework: from intuition to institution. Acad. Manag. Rev. 24, 522–537 (1999)
Kim, D.H.: The link between individual and organisational learning. Sloan Manage. Rev. Fall 1993, 37–50 (1993)
Treur, J.: Network-Oriented Modeling for Adaptive Networks: Designing Higher-Order Adaptive Biological, Mental and Social Network Models. Springer Nature, Cham (2020). https://doi.org/10.1007/978-3-030-31445-3
Treur, J., Van Ments, L. (eds.): Mental Models and their Dynamics, Adaptation, and Control: a Self-Modeling Network Modeling Approach. Springer Nature (2022). https://doi.org/10.1007/978-3-030-85821-6
Van Ments, L., Treur, J.: Reflections on dynamics, adaptation and control: a cognitive architecture for mental models. Cogn. Syst. Res. 70, 1–9 (2021)
Van Ments, L., Treur, J., Klein, J., Roelofsma, P.: A second-order adaptive network model for shared mental models in hospital teamwork. In: Nguyen, N.T., Iliadis, L., Maglogiannis, I., Trawiński, B. (eds.) ICCCI 2021. LNCS (LNAI), vol. 12876, pp. 126–140. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-88081-1_10
Wiewiora, A., Smidt, M., Chang, A.: The 'How' of multilevel learning dynamics: a systematic literature review exploring how mechanisms bridge learning between individuals, teams/projects and the organization. Eur. Manag. Rev. 16, 93–115 (2019)
User Group Classification Methods Based on Statistical Models

Andrey Igorevich Cherkasskiy 1, Marina Valeryevna Cherkasskaya 2 (B), Alexey Anatolevich Artamonov 1,2, and Ilya Yurievich Galin 1

1 National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Highway, 31, 115409 Moscow, Russia
[email protected]
2 Plekhanov Russian University of Economics, Stremyanny Lane, 36, 117997 Moscow, Russia
Abstract. The fundamental difficulty of building an information model of a target object in social networks is that objects in social networks are described by a large number of characteristics (several tens) involving all conceivable types of data: numbers, score estimates of qualitative characteristics, texts, symbols, video, and audio information. Obviously, such non-additive data types cannot be used directly to construct an integral criterion for the specification of the target object. To solve this problem, the article introduces the concept of a “vector of target search”. The general idea proposed by the authors for solving this problem is to convert physical characteristics into relative dimensionless quantities with normalized values from 0 to 1. The authors have implemented modern promising ideas in the development of intelligent information technologies, such as computer training of intelligent agents using illustrative examples from the training sample, agent-based technologies for working with Big Data, and the method of wave scanning of social networks when searching for target objects. The implementation of wave scanning of social networks during the agent-based search for targets significantly reduces the computing power required for the search process and reduces the amount of “noise” in agent collections. The authors developed a method of marking a single object of a social network to solve the problems of streaming classification of objects in the interests of various groups of researchers, including the problem of targeted attraction of applicants to a higher educational institution.

Keywords: Multi-agent systems · Agent search · Social media analysis · Wave scan · Statistical analysis of the training sample · Targeted attraction of applicants
1 Introduction

Every day, millions of users of information and communication networks generate a huge amount of unstructured data, including in social information systems. The attractiveness of social networks is that communication in them is fast and free of charge and does not depend on state borders, languages of communication, or censorship. At the same time, users of social networks can use absolutely any type of data: numerical characteristics of physical quantities, various score scales of qualitative
characteristics, symbolic images, video images, audio objects (music, sound signals), texts in various national languages, etc. In this regard, it becomes relevant to develop methods for converting such unstructured information for subsequent analysis of social networks and microblogs, in which users express their positions on socially significant events. Based on the views, opinions, and sentiments of users on the network, social trends arise, develop, and spread. At the same time, it is necessary to assess how the opinions expressed by activists in forums can change and shape future events in the real world. Among the features of modern social networks, it is worth noting the possibility of contacts “each with each” (a fully connected network [1]), the presence of a single channel for transmitting and storing all types of data, and almost unlimited memory for transmitting and storing messages. This makes it possible to accurately assess the positions of individual subjects and user groups on various social issues, to disseminate targeted information for various purposes, and to form target thematic groups capable of collective decisions and actions and to control them remotely.
2 Methodology of Agent-Based Target Search in Social Networks

Monitoring and analysis of social networks, with subsequent identification and classification of users, can cover any area of human activity: from marketing research to state security issues [2]. In this work, a methodology for training agents to carry out agent-based information retrieval in social networks has been developed. An agent is any entity that is situated in a certain environment and perceives it with sensors, obtaining data that reflect events occurring in the environment, interpreting these data, and acting on the environment through effectors [3]. To achieve a goal, agents can interact with each other and with the passive environment; together they form multi-agent systems. Each agent of such a system has its own ideas about the external world, its own tasks, and logic that determines its behavior. In the process of work, agents communicate with each other: sensor agents are responsible for collecting and processing information, while effector agents affect the environment. Agents can act independently of each other or conflict over resources, communicating to resolve disputes [4]. The questions of agent-based search for target objects in social networks have been little studied; solving this issue will significantly expand the possibilities of social research and of forming thematic user groups [5]. In this regard, the authors have developed a method for marking a single object of a social network for solving problems of streaming classification of objects in the interests of various groups of researchers, including the problem of attracting targeted applicants to a higher educational institution. To achieve this goal, it was necessary to solve the following tasks:

• expert collective formation of a training sample of target objects based on their characteristics from social network accounts;
• converting characteristics of target objects contained in the training sample into a vector of target search characteristics;
• converting physical (dimensional) values of search vector characteristics into relative (dimensionless) values;
• calculation of the values of the integral target search index for each object from the training sample and marking of targets;
• development of a methodology for experimental evaluation of the completeness and accuracy of the agent collections gathered by trained intelligent agents;
• including agent-based collections of targets in a thematic database.

2.1 Wave Method of Choosing Agent Routes in Social Networks (Search Agent)

The idea of wave scanning is to determine the route of the agent on the social network step by step (a code sketch of this traversal is given at the end of Sect. 2.2). The initial position of the route is the training sample, which by construction contains only target objects. The first steps of the agent's movement are determined by the relationships of the targets with other objects (the “friends” section of the profile). The objects found at these addresses are analyzed by the agent for target compliance. The target objects found in this way constitute the first wave front. The direction of the next step on the route is determined by analogy with the first. Thus, the route of the agent's movement is determined iteratively. The route model is a directed graph with a hierarchical structure. Note that the proposed methodology for constructing a target agent search route addresses the Big Data problem and significantly reduces the number of search operations compared to the full search method.

2.2 Case Study: Targeted Attraction of University Applicants

Multi-agent scanning of social networks opens up ample opportunities for analyzing the contingent most suitable for a specific type of activity. Let us assume that a recently organized university trains specialists in various humanitarian fields. The university administration needs to determine the social portrait of its students and the number of potential applicants that can be expected in the near future (5 years). It is necessary to solve the following tasks:

• the university needs to submit a training sample – several dozen of the most successful students who have accounts on the VKontakte social network;
• form the training sample in terms of the user characteristics accepted on the VKontakte social network;
• generate a vector of agent-based target search and a set of groups in which users from the training sample participate;
• generate empirical thesauruses and score estimates of the components of the target search vector;
• conduct a statistical analysis of the training sample, calculate the values of the integral criterion for target compliance and its critical value (the target compliance marker).

Thus, the formed starting content of a multi-agent system is itself a solution to one of the user's information and analytical tasks – an “impersonal social portrait” of a successful university student.
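The wave scan of Sect. 2.1 can be pictured as a breadth-first traversal seeded by the training sample. The sketch below is our own hedged Python illustration: `friends_of` stands in for whatever social network API supplies the “friends” list, and `is_target` for the trained classifier applying the target compliance marker; neither is the authors' actual implementation.

```python
from collections import deque

def wave_scan(seed_targets, friends_of, is_target, max_waves=3):
    """seed_targets: training-sample accounts (known targets);
       friends_of: account -> iterable of linked accounts (API stub);
       is_target: trained classifier applying the target compliance marker."""
    visited = set(seed_targets)
    collection = set(seed_targets)   # the growing agent collection
    front = deque(seed_targets)      # current wave front
    for _ in range(max_waves):
        next_front = deque()
        for account in front:
            for candidate in friends_of(account):
                if candidate in visited:
                    continue
                visited.add(candidate)
                if is_target(candidate):     # only targets seed the next wave
                    collection.add(candidate)
                    next_front.append(candidate)
        front = next_front
    return collection
```

Because only recognized targets propagate the next wave, the traversal keeps the number of analyzed accounts far below a full group-by-group scan.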
To solve the second problem (the number of potential applicants), it is necessary to analyze, with the help of the trained agents, the composition of all groups that include users from the training sample, and to select those who correspond to the “social portrait” of a successful university student and to the corresponding year of graduation from high school. Only publicly open accounts were investigated. On the basis of the personal profile of a VKontakte user, 39 field names were collected, divided into five categories: main, contacts, activities, interests, and life position. All data available for viewing within the privacy settings were structured as a table and analyzed. Expert analysis of the 39 user-characterizing fields showed that 16 fields are irrelevant for selecting components of the target search vector. The remaining 23 fields were taken as components of the target search vector, ordered and “weighted” by their degree of importance for the agent-based target search. Taking into account the target search vector and the statistical analysis of the values of its components, we obtain a statistical model of the target search object (the profile of a potential university applicant, Table 1).

Table 1. Statistical model of the target (profile of a potential university applicant).

Criterion | Meaning
Date of birth | 2000–2003
Attitude to smoking/alcohol | Negative
Favorite books | F. Dostoevsky, G. Nouven, K. Lewis
Audio recordings | Classical, pop music
Video recordings | Art films
Groups | Language communities (>2)
School | Classical gymnasium/school
Status | Uses keywords from the thesaurus
Inspire | Uses keywords from the thesaurus
Personal site | Skype, Instagram, Facebook
The main thing in life | Self-development
The main thing in people | Kindness and honesty
First name/Last name | Cyrillic is used
Family ties | Big family
About myself | Uses keywords from the thesaurus
Number of friends | >100
The analysis of the training sample showed that the target objects are members of 308 different groups; moreover, this indicator is dynamic and often varies across objects from the training sample. The average number of group members is 48,037. It follows that scanning the social network group by group would require analyzing about 15,000,000 accounts. Wave scanning from the training sample minimized encounters with non-target objects and required the analysis of only 27,229 accounts; that is, the number of search operations was more than two orders of magnitude smaller than with the full search method. The implementation
of wave scanning of social networks during the agent search for targets significantly reduces the computing power required for the search process and reduces the amount of "noise" in the agent collections. The target marker values of the objects in the training sample range from 0.163 to 0.584. Values above 0.584 would also be suitable but are unlikely in practice, while values below 0.163 are too low to be worth considering. The obtained data allow a forecast of the number of applicants to be calculated using the following rule: the number of applicants equals the volume of the agent collection of target objects whose marker exceeds 0.163. The results of computational experiments have shown that the value of the forecast depends significantly on the interpretation of the concept of "applicant", namely on the value of the "age" field. For example, of the 27,229 accounts analyzed, 55 fall into the agent collection if the age of the applicant is in the range of 17–18 years (this year's school graduates). If the age range is expanded to 17–33 years, the volume of the agent collection grows to 4,826 accounts. It was the latter forecast that turned out to be plausible, since the university's applicants are mainly mature people who want to obtain a second education, or people who have made a considered decision to study in order to subsequently devote themselves to this thematic area.
3 Conclusion

It was shown that the members of groups on social networks share similar views; they form group opinions and group behavior. In this regard, the effectiveness of agent technologies for information and analytical research on social networks has been demonstrated. Methods for constructing statistical models of target objects have been developed, as well as inductive methods for training search agents to solve multi-criteria problems of recognizing target objects while moving through the information environment of a social network. Methods of wave agent scanning of a social network are proposed and implemented, which minimize encounters with non-target objects and thus address the Big Data problem that arises with other methods of social network scanning. Based on the developed methodology of agent search for target objects in social networks, targeted admission of applicants oriented toward a certain type of study was carried out for a real university. In view of the wide choice and increasing competition among higher educational institutions, and taking into account the prospects for development, the authors are considering the possibility of drawing up typical portraits of applicants for any specific field of study in order to attract new students and form digital trajectories of their education. The analysis of consumers of educational services and the degree of their interest creates grounds both for adjusting curricula and improving the management of the educational process, and for improving university policies aimed at developing search methods and attracting potential students.

Acknowledgments. The study was carried out at the expense of the Russian Science Foundation grant (project # 19-71-30008).
From Mental Network Models to Virtualisation by Avatars: A First Software Implementation

Frank de Jong1, Edgar Eler1,2, Lars Rass1, Roy M. Treur1, Jan Treur1(B), and Sander L. Koole2

1 Social AI Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
[email protected], {e.eler,j.treur}@vu.nl, [email protected] 2 Department of Clinical Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands [email protected]
Abstract. Mental processes in the brain and body are often modelled and simulated by causal models from a dynamical systems perspective. These causal models can be adapted to fit natural human processes and to describe specific human traits. The patterns or time series generated by simulations are usually displayed as graphs of activation levels of mental states against time. However, from the time series generated by such a model, virtual agents based on avatars can be constructed which express these patterns. Such forms of virtualisation can be used in therapy or coaching sessions to help clients gain insight into their behaviour and understand their functioning better. This paper describes part of a contribution to the virtualisation project CoSiHuman, focusing on first steps toward the development of a software environment supporting this path. The presented approach utilises the Unity game engine and includes the creation of several human-like avatars with the ability to express several human emotions. Several programming libraries were created to facilitate the easy application of simulation data from causal models to obtain virtual agents based on avatars. These dynamic libraries were made easy to expand and to be used by other researchers on future occasions. The approach was successfully applied to a simulation based on an example causal model in the context of a couple's therapy, showing its utility. Finally, the software was evaluated by two experienced software developers from industry and found to be good, well-documented and easily extendable.
1 Introduction

Simulating human mental processes has been the subject of much research in the past decade; e.g., (Treur 2020). These simulations use human-like computational causal models based on causal pathways in the brain and body. Such causal models contain connections between mental states and describe how these states interact and behave together. Mental processes are based on complex networks, often with cyclic connections, between several mental states (Kim 1996). Causal models can be adapted to fit natural mental processes and specific human traits. From simulations by such models, virtual agents based on avatars
can be created which express the generated patterns of mental states over time. This approach can be used in coaching or therapy sessions as a way for people to understand their functioning better. Since everything is virtual, the scenarios can be flexible and can convey complicated behaviours involving multiple people. Moreover, these virtual agents can also be used in games or education; they can help to provide insight into human mental processes. This process of virtualisation involves creating digital, human-like avatars. These avatars act as a medium for expressing human-like attributes using the patterns generated by simulations from the computational models. This paper focuses on such virtualisation by digital avatars: how the avatars are created and how they can be applied to causal models to express human-like patterns and underlying traits. Humans observe the avatars depicting some natural process to gain insight into their own behaviour and underlying mental processes. An approach is explored which uses existing software to build virtual agents based on avatars and causal models in a systematic manner. Thus a reusable software environment is obtained which makes it easier to achieve virtualisation of any causal mental model and its simulations, so that this does not need to be done from scratch every time. The main software for virtualisation is Unity. As an illustrative example, two virtual avatars are created and imported as assets into a Unity project. Libraries were built that allow the avatars to be controlled for a wide variety of potential applications. The above considerations have led to a methodology based on adaptive mental causal network models and avatars expressing their simulations; see the CoSiHuman project described in (Treur et al. 2021) or the URL https://www.researchgate.net/project/CoSiHuman-Cooperative-Simulated-Human. The shared use of mental networks for mental processes is a basic similarity between natural humans and the artificial humans obtained in this way.
2 The Example Causal Mental Network Model Used

In the network-oriented modeling approach based on temporal-causal networks described in (Treur 2020), a network structure is defined by network characteristics for connectivity, aggregation and timing. Network nodes have activation values that change over time: they serve as state variables and (as in Philosophy of Mind) are also called (mental) states. More specifically, a (temporal-causal) network model is characterised as follows (here X and Y denote nodes of the network, also called states):

• Connectivity characteristics: connections from a state X to a state Y and their weights ω_{X,Y}
• Aggregation characteristics: for any state Y, some combination function c_Y(..) defines the aggregation that is applied to the impacts ω_{X,Y} X(t) on Y from its incoming connections from states X
• Timing characteristics: each state Y has a speed factor η_Y defining how fast it changes for a given impact.
Such states are depicted in Fig. 1 by small ovals, and the causal relations between them (connections in the causal network) by arrows. The following difference (or differential) equations, which are used for simulation purposes and also for the analysis of temporal-causal networks, incorporate these network characteristics ω_{X,Y}, c_Y(..), η_Y in a standard numerical format:

\[ Y(t + \Delta t) = Y(t) + \eta_Y \left[ \mathbf{c}_Y\big(\omega_{X_1,Y} X_1(t), \ldots, \omega_{X_k,Y} X_k(t)\big) - Y(t) \right] \Delta t \tag{1} \]

where X_1 to X_k are the states from which state Y has incoming connections. Such equations are hidden in the dedicated software environment; see (Treur 2020), Ch. 9. Many different approaches are possible to address the issue of aggregating multiple impacts by combination functions. Therefore, for this aggregation a combination function library with a number of basic combination functions (currently more than 50) is available, while own-defined functions can also be added. Examples of basic combination functions from this library can be found in Table 1.

Table 1. Basic combination functions from the library used in the presented model

| | Notation | Formula | Parameters |
|---|---|---|---|
| Stepmod | $\text{stepmod}_{\rho,\delta}(V)$ | 0 if $t \bmod \rho < \delta$, else 1 | repetition parameter ρ; duration parameter δ |
| Advanced logistic sum | $\text{alogistic}_{\sigma,\tau}(V_1,\ldots,V_k)$ | $\left[\dfrac{1}{1+e^{-\sigma(V_1+\cdots+V_k-\tau)}}-\dfrac{1}{1+e^{\sigma\tau}}\right]\left(1+e^{-\sigma\tau}\right)$ | steepness parameter σ > 0; threshold parameter τ |
As an example, consider a couple therapy for partners A and B, where partner A has aggression issues and being-in-control issues. For example, the following script can be generated by causal models for emotions and emotion regulation; e.g., (Essau et al. 2017). The aim of using a virtualised version of this script can be for the couple to gain more insight into interaction patterns relating to anger and fear regulation.

• Person A shows some level of anger; no emotion regulation
• Person B (seeing that) gets a stressful emotion
• Person B's gaze goes away, not at A anymore (fear emotion regulation: attentional deployment)
• Person A (seeing that) shows more anger; no emotion regulation
• Person A makes a threatening gesture
• Person B (seeing that) gets a stronger stressful emotion
• Person B walks out of the room (fear emotion regulation: situation modification)
• Person A (seeing that) shows still more anger; no emotion regulation
• Person A breaks some valuable thing in the room

These are interaction processes that can well be modeled by a causal modeling approach, where Person A has poor anger regulation together with
a dependency on feeling fully in control of situations, and Person B has well-functioning (but, for A, undesirable) emotion regulation based on attentional deployment and situation modification. This has been modeled in a network-oriented manner according to (Treur 2020) as shown in Fig. 1; for an explanation of all states, see Table 2.

Fig. 1. The connectivity of the example causal mental network model
An example simulation of this network model is shown in Fig. 2. The full specification by role matrices is included in the Appendix.

Table 2. Overview of the states in the example causal mental network model

| State nr | State name | Explanation |
|---|---|---|
| X1 | ss_s | Sensor state for stimulus s |
| X2 | ss_gazeawayB | Sensor state for seeing gazeawayB |
| X3 | ss_walkawayB | Sensor state for seeing walkawayB |
| X4 | srs_s | Sensory representation state for stimulus s |
| X5 | srs_gazeawayB | Sensory representation state for gazeawayB |
| X6 | srs_walkawayB | Sensory representation state for walkawayB |
| X7 | bs_negintB | A's negative interpretation of B |
| X8 | ps_break | Preparation state for breaking |
| X9 | ps_anger | Preparation state for anger |
| X10 | ps_threaten | Preparation state for threatening B |
| X11 | es_break | Execution state for breaking |
| X12 | es_anger | Execution (expression) state for anger |
| X13 | es_threaten | Execution state for threatening B |
| X14 | ss_angryA | Sensor state for seeing angry A |
| X15 | ss_threatenA | Sensor state for seeing threatening A |
| X16 | srs_angryA | Sensory representation for seeing angry A |
| X17 | srs_threatenA | Sensory representation for seeing threatening A |
| X18 | bs_negintA | B's negative interpretation of A |
| X19 | ps_gazeaway | Preparation state for gazing away |
| X20 | ps_walkaway | Preparation state for walking out of the room |
| X21 | es_gazeaway | Execution state for gazing away |
| X22 | es_walkaway | Execution state for walking out of the room |
Fig. 2. Simulation outcome for the example model
3 The Setup of the Software Environment for Virtualisation

Once the avatars are virtualised, they are meant to interact with humans, so that the humans can obtain more insight into the mental processes. When natural humans observe avatars, the avatars must elicit empathy and a connection from the observer. Empathy, the ability to relate to another person's feelings or emotions (Singer and Tusche 2014), is suggested to be triggered automatically (Preston and de Waal 2002). Empathy allows the observer to connect with the behaviour of the avatars and helps them gain insight into their own behaviour. One element in accomplishing empathy (in addition to the notion of self-other distinction) is emotion contagion, which occurs when an individual observes a behavioural change and reflexively produces the same behaviour or emotion (Burgos-Robles et al. 2019). The avatars should be applicable to simulation
data from several different causal models to provide insight into several behaviours. To allow the avatars to express several unique human-like traits, they must have the capacity for many human functions. The avatars must express the six basic emotions from Ekman's Theory of Basic Emotions: anger, disgust, fear, happiness, sadness and surprise (Miller 2016). Although Ekman later theorised that there are several other basic emotions beyond the initial six (Ekman and Cordaro 2011), the work reported in the current paper focused on the initial basic emotions due to the scope of the research, while leaving the door open for more emotions to be implemented and integrated easily. Following Ekman's 'Unmasking the Face' (Ekman and Friesen 2003), each basic emotion can be encoded using the FACSHuman plugin for Makehuman (Gilbert et al. 2021). This encoding is achieved using the Facial Action Coding System (FACS) developed by Ekman, Hager and Friesen (Ekman et al. 2002). Each emotion is split into several 'Action Units', which are applied separately. Splitting ensures that each group of Action Units can be triggered independently. Triggering different parts together can blend emotions, e.g., fear in addition to surprise, by triggering the upper Action Units for fear and the lower Action Units for surprise (Ekman and Friesen 2003). Furthermore, the avatars must be able to walk, run, strafe, and turn. The approach can be expanded to support even more complex virtual behaviours; for example, virtual avatars could drive a car based on a causal model for road-rage issues.

The Makehuman application is an open-source tool 'designed to simplify the creation of virtual humans using a Graphical User Interface' (Makehumancommunity.org 2016). This software is ideal since it abstracts from the complexity of 3D modelling and allows for the creation of realistic virtual human-like avatars. Realistic and relatable-looking avatars are essential, given that a higher level of empathy can be obtained based on appearance and similarity to the avatar (Hall and Woods 2005). The creation of an avatar is very straightforward, with sliders for the avatar's visual features, i.e., gender, age, weight, height, race, etc. The application also has many plugins that allow additional features and customisations to be added. A plugin to the Makehuman application (Gilbert et al. 2021) is utilised for encoding facial expressions built on the FACS (Ekman et al. 2002).

The avatar from Makehuman is exported into the Blender software (Blender Foundation 2018), an open-source 3D creation suite. Support for the entire 3D pipeline is included, allowing the avatars to be animated, modelled, rendered, etc. A feature called blend shapes is used, enabling a shape to be deformed gradually into a new shape by scaling, transforming or rotating any of its vertices (Blender Foundation 2021). A blend shape can be applied gradually, so vertices are deformed over time or up to a limited value; this technique is often used for animation. First, a base avatar is exported from Makehuman into Blender, which acts as the neutral shape. In Makehuman, the FACS system is used to export the same avatar with a different facial expression. These two avatars are then merged, and the difference between the base avatar and the morphed avatar is converted into a blend shape. Triggering this blend shape then morphs the base avatar into the avatar with the facial expression.
Animations are collected from Mixamo (MCV 2021), which is part of Adobe Creative Cloud. These are pre-made and freely available animations that can be applied
to custom uploaded avatars. Using existing animations removes the tedious and often challenging task of creating animations. The Unity game engine, or Unity software, is the final piece of software used in the work reported in this paper. In the engine, the avatar can be programmed to express human traits, driven by simulation data from the causal models. Unity is a game engine created by Unity Technologies that allows 3D and 2D games to be created; it is also used in the film, architecture, engineering and construction industries. Importing Blender objects is very well supported. Unity takes care of almost all parts of the virtualisation: rendering the avatars, world creation, controlling the camera, sound, and any other component used to show the avatars. Unity is also a very powerful engine because it supports almost any platform easily; in the scope of the work reported in this paper, this allows easy cross-platform access to the virtualised avatars. Moreover, Unity is a good fit for avatar virtualisation as it already has many features for optimising performance and quality.
4 Creating Avatars

Avatars are first created in the Makehuman application. More specifically, the avatar's features are specified, e.g., age, gender, race, weight, the shape of the face, etc. Other features like clothing, hairstyle, and pose are also selected. In this first creation step, almost all visual attributes are specified. The avatar can be made to look human-like only to the extent that the Makehuman project allows, which is a limitation. Once this base avatar is created, a so-called skeleton must be added, which is used later to add animations. In skeleton animation, or rigging, the mesh (the surface of the avatar) is connected to the avatar's bones; rotating, scaling, or translating these bones deforms the mesh, and combining such deformations over several bones creates an animation. Makehuman provides several skeletons, among them 'advanced skeletons', which allow complete control of all parts of the avatar. This paper's scope did not require precise control of the hand and foot bones; however, adding these to new avatars is also possible and can be done in future development. A simpler skeleton is used instead, which is also less taxing on the game engine. The final step is to export the avatar in the FBX format, which is then imported into Blender.

In Blender, the facial shape keys (blend shapes) are added. This part consists of first importing the base avatar with a neutral expression. Once the Action Units from the theoretical framework are applied to the avatar in Makehuman, it is exported to Blender as well. When both avatars are imported, the one with an applied Action Unit and the base avatar, each avatar component is merged to form several blend shapes. The avatar is split up into the body mesh (which contains the arms, legs, and face), the eyebrows mesh, the eyes mesh, the tongue mesh, the teeth mesh, the clothing mesh, and the hair mesh. Depending on the Action Units applied, not all components are deformed, and therefore the blend shape is only added to the morphed components. For example, moving only the eyebrows deforms just the eyebrows component, so a single blend shape is added; when the mouth moves, the body, tongue, and teeth components are all deformed, so a blend shape is added for each of those components. This process is repeated for each of the six basic emotions and each individual Action Unit specified in 'Unmasking the Face' (Ekman and Friesen 2003); see Fig. 3.
Fig. 3. Full-face fear expression encoded from Fig. 22 in ‘Unmasking the Face’ (Ekman and Friesen 2003) into Blender using FACSPlugin.
Adding blend shapes must be repeated for every new avatar created. The FACSHuman plugin allows each Action Unit to be saved, so the Action Units only need to be loaded each time a blend shape is added. This repetitive process also opens the door for a potential automation program that could repeat it several times; Blender's scripting API makes this feasible, while avatar creation itself stays straightforward through an easy-to-use Graphical User Interface (GUI). Animations allow the avatars to come to life by simulating movements like walking and running and other actions like throwing an object, hand gestures, and most other conceivable actions. Animating is a critical component in making the avatar human-like and allows humans to relate and empathise better. However, animating each bone in the avatar's skeleton is a time-consuming obstacle, and professional animators would likely be required to make these actions look believable. Animations are therefore imported from Mixamo (MCV 2021), a library of existing animations, which saves time. Once the avatar has animations and blend shapes, it can be imported into the Unity game engine, which supports all the Blender features used.

Unity has a concept called a 'scene' to which game objects can be added. Any object in Unity is a game object: cameras, lights, characters, special effects, and anything else. Each game object can also have several child game objects, which can give it more properties. The avatars are created as a Prefab Variant, a special kind of Prefab game object with stored properties. Typically, a game object is added to the scene and its properties are then configured; a Prefab is a game object with the properties already defined. If the scene requires two or more of the same game objects with the same properties, a Prefab is helpful as the properties are already defined. When many of these game objects are in the scene and the properties need to be edited, you do not need to edit each game object, just the Prefab: editing the Prefab applies the changes to each game object in the scene. A Prefab Variant inherits the properties of the Prefab but can override any number of them. In the context of avatar virtualisation, a new Prefab Variant can be created for each different scenario. This way, all properties are inherited and do not need to be re-added for each scenario; instead, each scenario creates a variant and only changes the required properties, saving a lot of time. The Prefab Variant for each avatar contains a 'Rigidbody', responsible for the game object being affected by gravity, and an Animator component, allowing the animations to be added to the avatar. The variant also contains a Box Collider attached to both feet
of the avatar, which prevents it from falling through other objects, and a script called the Avatar Emotion Controller, which controls the avatar's expression of emotions. The use of the Prefab makes it easy for the avatar to be used in multiple different scenes with little to no setup required.

Now the avatars are imported into the engine as assets, but there must also be a way for the avatars to be controlled and programmed. A dynamic library was created that allows the avatars to be programmed easily. Several scripts were made and organised into C# namespaces, a programming concept in which different functionalities are grouped. Only relevant classes are exposed in these namespaces; this way, complex implementation details are not revealed, keeping the usage simple. The 'FacialExpressions' namespace is responsible for allowing an avatar to show a facial expression or emotion. A single class called 'EmotionController' is made available in this namespace, which can be added to a script game object. A script object is a piece of code that can modify a component's properties, create components, trigger events, and respond to user input. Since the 'EmotionController' class must be attached to an avatar, a script called 'AvatarEmotionController' was created in which an 'EmotionController' is instantiated. The 'EmotionController' class exposes a method that takes an emotion name and an intensity and makes the avatar's face express that emotion. This class uses a hidden class that is responsible for parsing the blend shapes in the following manner. Each of the avatar's blend shapes is named after an emotion, e.g., angry, sadness, etc. A dash separates this from the name of a facial area, and another dash separates the last part of the name, the so-called variation. Each blend shape is thus grouped by the initial name, representing the category and name of the emotion; this is followed by a sub-category, the more specific part of the face involved, and then by the variation value, where 'a' is zero and each next letter of the alphabet increments it by one. After these are parsed, an expression can be controlled by setting the main category (the emotion name) followed by an intensity value: all the blend shapes with that emotion name are collected, and each is set to the intensity value. This way, all the corresponding blend shapes are expressed uniformly. However, this is also a hurdle, as some emotions might need to be expressed in only one part of the face. Furthermore, the game engine must set the intensity gradually and not in a single frame. This is done by calling a coroutine, a loop that can start and then pause its execution for some time before continuing as if it never stopped; this way, one iteration can be completed every half a second, for example. This is ideal for setting the emotion intensity gradually, to obtain a smooth transition from neutral to any emotion and back. The time for an emotion to be set can be specified as well. Another class in the 'FacialExpressions' namespace is the 'EyeMovementController', which acts similarly to the emotion controller but for the avatar's eye gaze position. It combines four blend shapes, two for moving the eyes left and right and two for moving them up and down. The movement is abstracted into a single function that moves the eyes to an x and y position.
Now that the avatars can express facial expressions and move their eyes, there also needs to be a way for the avatars to move. An avatar could walk away, run away, throw objects, turn around, etc. Many of the actions these avatars perform have some things in common: their implementation requires access to the avatar's Animator,
Rigidbody, and Transform game objects. The Transform object represents where in the world the object is. An abstract class is created, which is implemented for each action that the avatar needs to perform. This class ensures that the same function can be called on any class that inherits from it. This function is called 'trigger' and, as its name suggests, triggers the action to be performed. In any action, force can be applied to the Rigidbody, an animation can be activated or deactivated, and the avatar can be translated or rotated relative to the world. This standard implementation also ensures that all actions are triggered using a coroutine, so that an action is not performed in a single frame but takes however long or short a time is required.

For the avatars to be applied to simulation data from a causal model, a scene must first be set up. To this scene the required avatars can be added along with any other components that make up the world: cameras, lights, etc. The scene can be different for each model and is not addressed by the work described in the current paper. However, each scenario requires an interface between the avatars and the scenario. The easiest way to accomplish this is to create an empty (invisible) game object. A script is then added to this game object, which activates when the scene runs. It contains all the processes specific to that model, i.e., reading the simulation data, initialising the references to the avatars in the scene, and applying the data to the scene over time.

The approach is set up in the Unity software. The components created include avatars A and B, which are created and then imported into Unity. These avatars are general and reusable in several different virtualisation scenarios. The libraries created for the avatars include the Emotion Controller, the Eye Movement Controller, the Actions library and the 'ModelData' script. However, several components need to be created to connect the virtualisation to simulation data from a model, namely a scenario-specific script responsible for controlling the avatars. The 'ModelData' script must be called to read the simulation data and apply it so that the avatars express emotions or carry out actions. Furthermore, the scene must also be created. The scene contains the environment the avatars are in, which can be tailored to the specific model: an argument between two people may occur in a house, but a model describing a different situation might be set somewhere else. Finally, for the virtualisation to be viewed, a few cameras must be set up and controlled to show the relevant avatar. The approach contains some templates for the virtualisation script, so it is easier to configure.
5 The Designed Virtualisation for the Example Network Model

Inside the Unity project, a new scene was created. The first part that must be created is the environment in which the avatars are placed. This process can be extensive but is skipped here because it is not necessary for testing the libraries; instead, a simple cube is created, which acts as the ground. Next, two avatars are placed into the scene facing each other, and a few lights and cameras are added to make the avatars visible. Finally, an empty game object is added to the scene; later, a script will be added to this invisible object, which will run automatically when the scene is started.

The next step is to create the virtualisation script. Here the simulation data is read using the 'ModelData' script, and the avatars are controlled with the read data. This script is attached to the empty game object created in the scene. In this scenario, there
are two avatars, and a reference to both avatars is added. Person A will be avatar A, and person B will be avatar B from this point on. The actions of each avatar need to be implemented, since these can be specific to the model. In this example, a couple of activities are performed by avatar B: looking away and walking away. Therefore, two actions need to be created, a gaze-away action and a walk-away action. The latter can be added to the 'Actions' library through a new class that implements the abstract 'AvatarAction' class, making it easy to call later. The implementation of this action has access to the avatar's components, which enables anything to be modified. The avatar needs to turn around and move forward, implemented by rotating the Transform object and applying force to the Rigidbody of the avatar. Additionally, a few animations can be triggered to make the action look more realistic. The second action is avatar B's gaze-away, implemented in the same way as the walk-away action: a new class was added to the 'Actions' library in which the eyes move to the right and down, utilising the eye movement library. Along with this, the avatar also rotates a couple of degrees to the side.

With all the behaviours implemented, the script defines a step function in which each row of the simulation data is read and applied to the corresponding actions of the avatars. In this example, three values per row are applied. Avatar A's anger, a value between 0 and 1 generated by the causal model, represents the intensity of the anger emotion; it is virtualised by accessing avatar A's emotion controller and setting the 'anger' emotion to that intensity. Next, avatar B's walk-away value, also between 0 and 1, is applied: once this value exceeds 0.5, the action is triggered and the avatar walks away. A simple check is added in the step function, and once the condition becomes true the action is initiated. The gaze-away value is handled much like the anger value: in each step, the gaze-away action is set to the value representing its intensity, where 0 means the avatar is not looking away at all and 1 means it is looking away entirely. The step function is triggered in the virtualisation script once, then waits 200 ms and steps again; it is repeated until all the simulation data has been applied.

Figure 4 shows how the libraries are used in the virtualisation of this example. Avatar B has two additional actions that were added specifically for this example. The virtualisation script also uses a camera controller script that sets the active camera to the avatar currently performing an action. When avatar B walks away, the camera is positioned so that both avatars A and B are in frame; when avatar B is looking away, the camera showing only that avatar is activated. Some subsequent snapshots of the video displaying this virtualisation of the scenario for the causal mental network model discussed in Sect. 2 are shown in Fig. 5. An mp4 video displaying this virtualisation case can be found as Linked Data at the URL https://www.researchgate.net/publication/354200343.
6 Discussion

As part of the CoSiHuman project (Treur et al. 2021), the work reported in this paper aimed to explore the creation of virtual avatars using the Makehuman software, Blender, and
Fig. 4. Diagram of the example virtualisation using the avatars and libraries.
Fig. 5. Snapshots from the virtualisation of the scenario from Sect. 3.
Unity software and to make these avatars express human-like traits using causal models. Furthermore, a goal was to create avatars and libraries that can be used in several different virtualisation examples. The outcome shows that making these avatars is possible, and applying them to simulation data works. The avatars are also dynamic, as they can act out several behaviours in many different scenarios. Since the software used allows
a lot of customisation, creating new and different-looking avatars is straightforward, although additional tools must be developed to make this less time-consuming. Building the approach further is straightforward, as a lot of documentation is provided, and the two avatars already created can also be applied to different models. One of the goals was to make the approach easy to expand, and laying out the creation process should allow for easy adaptations. Since many different software components were used for avatar virtualisation, any part can be modified and changed. For example, if an avatar requires very precise animation of hand gestures, an avatar can be created using the Makehuman software with a more detailed skeleton, while all blend shapes stay the same; inside Blender, more animations can be added as required. Another example concerns facial expressions: if more emotions are needed, or existing emotions need to be modified, the changes can be performed in Makehuman while the remaining export process stays the same.

However, creating avatars is not yet easy, because it is very time-consuming. After the initial avatar is created in Makehuman, the facial expressions must be applied and exported. This process is very repetitive and provides an ideal opportunity for automation: a new part of the software could take a base avatar created in Makehuman and automatically apply the blend shapes and animations. While this process might take some time, it should not require any more input from a human. Such automation would dramatically improve the flexibility of this virtualisation approach, as the creation of an avatar is very easy with the already existing graphical user interface and its many customisation options.

A vital feature of this approach is to make the avatars relatable to the humans observing them. The avatars must elicit empathy, which can be achieved in several ways, for example by matching the appearance to the individuals interacting with the avatar; the most straightforward way to accomplish this would be to match the gender. Making the avatar more human-like is also effective. In the work reported in the current paper, the focus was on exploring the possibility of virtualisation. Fortunately, the Unity game engine allows for a vast amount of graphical features, so making the environment feel more authentic is both essential and possible. Furthermore, the human-likeness of the avatars' behaviour can be improved: more natural features can be added, like blinking at random intervals or turning the head slightly every once in a while, to make the avatars less robotic; avatars can also change poses occasionally.

Additionally, several libraries were created to make controlling the virtual avatars easier: the emotion controller, the eye movement controller, and the avatar actions library. Each component is responsible for making it easy to set the properties of the avatar. Setting emotions, moving the eyes, and triggering actions are easy to use because the complex implementation is abstracted away. These libraries can also be built on and expanded with relative ease. Overall, the aim of this approach, to create virtual avatars, was accomplished. The avatars were also applied to simulation data from a causal model, showing that the approach can work. This was further supported by the two experienced software developers from industry who evaluated the approach.
They provided very positive feedback and described the approach as well-documented and ready to be expanded in the future. Furthermore, several further prospects came into view: automating avatar creation, additional realism features, and making the avatars more human-like. Along with this, allowing
bidirectional interactions between humans and avatars also becomes a possibility. In therapy, as a role-playing tool, humans could interact with the avatars not just by observing but by making choices on the fly. Such an interaction could consist of choosing a different emotion regulation method, which in turn causes the avatar to behave differently; this is particularly useful for showing how different behaviour changes the avatar's actions. Furthermore, the possibilities for expanding this approach are virtually endless, and its applications to the real world are extensive.
References

Blender Foundation: Blender - a 3D modelling and rendering package. Stichting Blender Foundation, Amsterdam (2018)
Blender Foundation: Introduction — Blender Manual, Docs.blender.org (2021). https://docs.blender.org/manual/en/2.90/animation/shape_keys/introduction.html. Accessed 12 July 2021
Burgos-Robles, A., Gothard, K., Monfils, M., Morozov, A., Vicentic, A.: Conserved features of anterior cingulate networks support observational learning across species. Neurosci. Biobehav. Rev. 107, 215–228 (2019). https://www.sciencedirect.com/science/article/pii/S0149763419301393
Ekman, P., Cordaro, D.: What is meant by calling emotions basic. Emot. Rev. 3(4), 364–370 (2011). https://doi.org/10.1177/1754073911410740
Ekman, P., Friesen, W.: Unmasking the Face. Malor Books, Cambridge (2003)
Ekman, P., Hager, J., Friesen, W.: Facial Action Coding System. Research Nexus, Salt Lake City (2002)
Essau, C., LeBlanc, S.S., Ollendick, T.H. (eds.): Emotional Regulation and Psychopathology in Children and Adolescents. Oxford University Press (2017)
Gilbert, M., Demarchi, S., Urdapilleta, I.: FACSHuman, a software program for creating experimental material by modeling 3D facial expressions. Behav. Res. Methods 53(5), 2252–2272 (2021). https://doi.org/10.3758/s13428-021-01559-9
Hall, L., Woods, S.: Empathic interaction with synthetic characters: the importance of similarity. Encyclopedia Hum. Comput. Interact. Idea Group Publishing (2005)
Kim, J.: Philosophy of Mind, 3rd edn., pp. 104–123. Westview Press, Colorado (1996)
Makehumancommunity.org: Documentation: What is MakeHuman? - MakeHuman Community Wiki (2016). http://www.makehumancommunity.org/wiki/Documentation:What_is_MakeHuman%3F. Accessed 26 Jul 2021
Miller, H., Jr.: The SAGE Encyclopedia of Theory in Psychology, pp. 249–250. SAGE Publications, Inc., Thousand Oaks (2016)
MCV: Digging into rigging: Mixamo's character service explored. MCV/DEVELOP (2021). https://www.mcvuk.com/development-news/digging-into-rigging-mixamos-character-service-explored/. Accessed 12 Jul 2021
Preston, S., de Waal, F.: Empathy: its ultimate and proximate bases. Behav. Brain Sci. 25(1), 1–20 (2002)
Singer, T., Tusche, A.: Neuroeconomics, 2nd edn., pp. 513–532. Academic Press, San Diego (2014)
Treur, J.: Network-Oriented Modeling for Adaptive Networks: Designing Higher-Order Adaptive Biological, Mental and Social Network Models. Springer Nature Publishing, Cham (2020)
Treur, R.M., Treur, J., Koole, S.L.: From natural humans to artificial humans and back: an integrative neuroscience-AI perspective on confluence. In: First International Conference on Being One and Many (2021). https://www.researchgate.net/publication/349297552
Mapping Speech Intonations to the VAD Model of Emotions

Alexandra Dolidze(B), Maria Morozevich, and Nikolay Pak

National Research Nuclear University "MEPhI" (Moscow Engineering Physics Institute), 31 Kashirskoe shosse, Moscow 115409, Russian Federation
Abstract. Human speech carries information not only in its semantics but also in its intonation, which delivers a vast amount of information that is usually ignored in modern text-to-speech (TTS) technologies. Developing methods to define, and further to compute, the dependencies between speech and emotions is considered a promising field of research. After examining the role of intonation in the Russian language, an experiment is proposed to create a mapping between the intonational configurations of spoken Russian and sentiments. The impact of emotions on individuals is tangible on many levels, which is why emotions are a key part of modelling affective states. As understanding sentiments is as important as expressing them, the ways to define them vary from discrete sets to multidimensional spaces. The suggested mapping is expected to be used further in the task of synthesising more natural and emotion-saturated speech.

Keywords: Affective computing · Affective prosody · Speech conversion · Emotional speech · Russian language
1 Introduction

Nowadays there is a pronounced need for tools aimed at operating with sentiments in oral speech (analysis, synthesis and conversion). Voice assistants, virtual agents and chatbots are especially in need of such solutions. Despite the constant improvement of partial solutions and the discovery of new approaches, it is premature to speak of a full-fledged solution to the problem. The reason for this is the long-term neglect of oral speech as a potentially attainable and solvable area of research, due to its complexity and characteristic features, which lead to difficulties in modelling, analysis and synthesis [1]. The analysis and synthesis of sentiments in oral speech is a complex task, since it requires a comprehensive understanding of the interaction of all aspects of sounding speech, both semantic and phonetic (more precisely, prosodic); i.e., for sentiment analysis it is important to know not only what we say but, rather, how we say it. As stated by Kröger et al. [2], speech can be a source for many different attributes, especially moods/emotions, impressions on other people and even personality traits. The existence of dependencies between different markers in speech and the described attributes is well established, yet the extraction of such features is a challenging task, and many attempts using various approaches have been made to solve it [3]. For instance,
Deschamps-Berger, Lamel and Devillers [4] proposed a CNN-BiLSTM neural network architecture for the recognition of emotions in real-life scenarios (emergency calls) that showed promising results. EmoVoice, designed by Vogt et al. [5], is a good example as well, as it integrates a naive Bayes (NB) classifier and a support vector machine (SVM) classifier for emotion classification. Different methods can be used for approaching the problem of emotions in speech, and solving it is beneficial for creating intelligent social virtual agents and augmenting their voices. Integrating and developing human-like artificial intelligence is a growing area, and making agents more socially acceptable is one of its important goals, especially for biologically inspired cognitive architectures (BICA) [6]. Many speech synthesis systems available on the market offer a decent level of naturalness in their speech but lack emotionality. Improving that aspect may change the perception of interacting with agents, making the experience more realistic. Prosodic features of speech (i.e., pronunciation features that complement the basic articulation of sounds) are the main source of in-speech sentiment and form the basis of the analysis. It is also important to mention that the prosodic components can be almost completely separated from the articulatory and semantic ones. Moreover, it should be noted that prosodic elements form stand-alone systems, one of the most important of which is intonation. As the relationship between emotions and speech is multi-layered and complex, it was decided to narrow the research field. Our aim is to construct a mapping from a set of intonations, given by their characteristics, to an affective model; in other words, to identify the sets corresponding to specifically coloured emotional speech.
2 Intonation Characteristics

Intonation itself is the most important phonetic means of phrase forming or utterance making; it is also one of the major ways of expressing emotions in speech. It is characterised by a combination of such prosodic means as tone, tempo (speed of utterance) and timbre, as well as speech melody, volume (intensity), stress and pause placement [7, 8]. At the same time, the following acoustic characteristics are of the greatest importance for emotional expression in intonation: melody, intensity, pauses, phrasal stress and tempo. The set of characteristics of the means of intonation of a particular utterance forms its intoneme [8].

Melody of speech is a change in the frequency of the main speech tone, characterised by its direction, range, interval and rising or falling rate. In the Russian language, speech melody plays an important role in expressing intonation, since the language is not tonal (i.e., the tone of a phoneme's pronunciation does not affect its grammatical or lexical meaning), so the precise role of melody is to convey intonation and emotions. Along with the pause, it is one of the most significant means of forming intonation. It should be noted that pauses have several language functions. The main one, implemented along with changes of melody and tempo, consists in dividing the utterance into syntagmas (complete, meaningful segments of speech [9]). In the context of oral-speech sentiment analysis, the mentioned role of the phrasal pause is the most intriguing, as it allows us to determine the values of melodic intervals and their ranges, while at the same time, apart from
the location of pauses, it indicates their duration. It is also important to note that pauses are the only means of intonation that can be partially transmitted in writing through punctuation marks, so their functions are not to be confused.
3 Impact of Emotions

There is no easy way to define the concept of emotion: it is difficult to describe and to give a precise formulation. Moreover, the concept is multidisciplinary, involving such sciences as physiology, psychology and more. Further, affect is a spectrum of reactions that can be expressed through emotions, feelings, mood, etc., so there is still much discussion about modelling and theorizing emotions. The most widely known types of descriptions, enumerated in [10], include:

1. Discrete models (the six basic Ekman emotions);
2. Dimensional models (the Valence-Arousal-Dominance model);
3. Appraisal models (the OCC model).

Dimensional models provide a wider range of expressible emotions. The VAD model has three independent dimensions: valence (unhappiness to happiness), arousal (sleep to excitement) and dominance (submissive to dominant). Ekman's set of basic emotions includes anger, surprise, disgust, enjoyment, fear, and sadness. The VAD model is chosen because of its known correspondence with Ekman's model [11]. The set of Ekman emotions is intuitive for people and can be used as a simple description for participants in an experiment; it is therefore possible to assign VAD values to an emotion as labels for data using the known correspondence (see the sketch below). The use of emotions can improve the performance of solutions in the field of AI development. For instance, Suresh and Ong [12] have shown that including an external source of emotional knowledge (emotional lexicons were used) in pre-trained linguistic models can improve their performance, opening opportunities for future work and advances. In addition, there are many other, non-affective general semantic dimensions (e.g., abstractness [19]), controlling which would benefit conversational agents. The aim of our work is to find a mapping from the set of utterance intonations to the VAD model of emotions (using Ekman's basic emotions as a baseline). To this end, we propose an experiment that allows us to create such a model.
4 Experiment Design and Conduction

As part of the experiment, we accept some limitations and determine the analysed means of intonation. As mentioned above, the main acoustic component of intonation, speech melody, is determined by a number of characteristics; however, only the interval (the times of the beginning and the end of the melody on a syntagma, in seconds) and the range (the minimum and maximum frequencies on the interval, in Hz) have numerical values.
Furthermore, the rate of speech is subject to empirical observation and is calculated, in the context of the experiment, as the average speed of the utterance, i.e., as the ratio of the total number of words in a phrase to the time of its utterance in seconds. The intensity of the utterance is also one of the countable intonation characteristics, measured in decibels. The placement of pauses is read off in seconds, and their duration is calculated as the difference between the moments of the beginning and the end of the pause. The last characteristic of intonation of interest within the bounds of the experiment is the phrasal stress, which falls on some syntagma during the utterance and whose ordinal number is recorded. For all statistical data, the variance is calculated as well as the expected value. (A sketch of extracting these characteristics appears after the stage list below.)

To simplify the identification of the characteristics and keep the experiment clean, a number of restrictions were also applied regarding the initial conditions of the experiment and the choice of utterances for voicing:

1. The set of six basic emotions according to Ekman (anger, surprise, disgust, enjoyment, fear, and sadness) was selected at the initial stage of the experiment.
2. Actors aged 16–30 of both sexes were chosen as potential participants.
3. Single-syntagmatic statements expressed in simple sentences were chosen to eliminate the need to identify the pauses' function [9].
4. Narrative sentences were chosen. Since the Russian language, according to Bryzgunova [13], has seven types of intonemes depending on the type of sentence, only one of them (with a descending tone on the central vowel) is used to express the completeness of a narrative sentence. As emotional intonemes are "layered" on top of more basic intonation constructions, we decided to choose the most neutral one.
5. All the suggested sentences are grammatically correct artificial phrases in which all root morphemes were replaced by meaningless sets of sounds, e.g., «Gloka kyzdpa xteko bydlanyla bokpa (The glocky kuzdra shteckly budled the bocker)» [14]. Similar sentences can be found in English as well, e.g., «The gostak distims the doshes» [15]. The prosodic and semantic levels of language are considered independent and can be separated from each other; to avoid participants expressing emotions based on the semantic significance of a sentence, the use of such sentences is required for a better separation of prosodics from semantics.
6. Three participants are asked to evaluate each recording to verify the accuracy of the expressed sentiment. If two out of three assessments say that the conveyed emotion does not match the labelled one, the recording is omitted.

The experiment is carried out in the following stages (see Fig. 1):

1. Creation of a set of statements and determination of a set of emotions.
2. Data collection: recording of the statements voiced by participants with the selected emotions.
3. Selection of the required intonation characteristics from the source data.
4. Acquiring the neighbourhood values of the characteristics of the means of intonation.
5. Identifying intonemes.
6. Acquiring a linear mapping of intonation to emotions.

Fig. 1. The listed steps of the experiment.
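As an illustrative sketch of steps 3–6, the Python fragment below computes the countable characteristics described above (rate of speech, melodic range, pause duration, stress position, plus the expected value and variance across recordings) and fits a least-squares linear mapping from those features to points in the VAD space. All field names and numeric values are hypothetical placeholders, not data or code from the experiment itself.

```python
import numpy as np

# Hypothetical annotations of three recordings:
# (words, duration_s, f_min_hz, f_max_hz, total_pause_s, stressed_syntagma_no)
recordings = [
    (5, 1.8, 110.0, 240.0, 0.12, 1),
    (5, 2.4, 100.0, 190.0, 0.30, 1),
    (5, 1.5, 130.0, 310.0, 0.05, 1),
]

def features(words, duration, f_min, f_max, pause, stress):
    return [
        words / duration,  # rate of speech, words per second
        f_max - f_min,     # melodic range on the interval, Hz
        pause,             # total pause duration, s
        stress,            # ordinal number of the stressed syntagma
    ]

X = np.array([features(*r) for r in recordings])
mean, var = X.mean(axis=0), X.var(axis=0, ddof=1)  # expected value and variance

# Hand-labelled VAD coordinates (valence, arousal, dominance) for the same recordings
Y = np.array([[0.6, 0.7, 0.5], [-0.4, -0.2, -0.3], [0.1, 0.9, 0.4]])

# Step 6: least-squares linear mapping from intonation features to the VAD space
Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
vad = np.append(features(5, 2.0, 115.0, 220.0, 0.15, 1), 1.0) @ W
```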
5 Further Development of the Idea

In order to validate the hypothesis of generating speech with augmented sentiment, text-to-speech synthesis technologies would be needed. In [12] it is stressed that adding emotional knowledge to pre-trained models is a complex task, especially with regard to the smoothness of integration; the authors nevertheless consider such an approach the most promising. One of the well-known text-to-speech (TTS) systems is Tacotron 2: a sequence-to-sequence recurrent network with attention that predicts mel spectrograms, followed by a modified WaveNet vocoder. It has shown remarkable results and achieves state-of-the-art sound quality close to that of natural human speech [16]. Recently the NeMo (Neural Modules) open-source toolkit [17] has become available. It focuses on creating AI applications through re-usability, abstraction, and composition: its developers provide a collection of pre-built, packaged neural modules for particular domains. For instance, there are modules for automatic speech recognition (ASR) and natural language processing (NLP); TTS can also be implemented with NeMo, where it relies on Tacotron 2.
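A minimal sketch of such a pipeline in NeMo is shown below: a pre-trained Tacotron 2 generates a mel spectrogram from text, and a neural vocoder converts it to a waveform. The checkpoint names and module API follow NeMo's published TTS examples but may differ between releases, so this should be read as an assumption-laden outline rather than a fixed recipe.

```python
import soundfile as sf
from nemo.collections.tts.models import Tacotron2Model, WaveGlowModel

# Pre-trained checkpoints from NGC; names may vary by NeMo release
spec_gen = Tacotron2Model.from_pretrained("tts_en_tacotron2")
vocoder = WaveGlowModel.from_pretrained("tts_waveglow_88m")

# One of the semantically empty test sentences from the experiment
tokens = spec_gen.parse("The gostak distims the doshes.")
spectrogram = spec_gen.generate_spectrogram(tokens=tokens)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

sf.write("neutral.wav", audio.to("cpu").detach().numpy()[0], samplerate=22050)
```

An emotional conversion module, driven by the intoneme mapping above, could then post-process either the spectrogram or the waveform.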
Paper [18] discusses splitting the generation of flexible speech into two sub-tasks: a TTS system and modular voice conversion. Solving those tasks separately and then integrating the solutions yields not only natural speech but also, to an extent, controllable voice features. Hereafter it would be possible to use NeMo modules to generate speech and to apply the sentiment–intoneme model with its mapping for emotional conversion. In this way the result could improve the perception of the spoken language.
6 Conclusion

Affective computing has been exploring how to incorporate the influence of sentiment into computer systems. Building virtual agents able to interact and converse freely with people has become not only a theoretical pursuit but an important practical application. Still, the delivery of speech leaves much room for improvement. The existing relationship between speech and emotion can be expressed and calculated as a correlation between a set of intonation characteristics (intonemes) and a point in a three-dimensional emotional space. With this in mind, an experiment was proposed to validate that relationship. Collecting and processing oral speech within the experiment's limitations would provide a basis for hypothesis testing. The suggested concept, if carried out, can provide a good outline for speech conversion. In combination with TTS software, natural speech with augmented intoneme characteristics can be produced and henceforth used in the construction of intelligent emotional virtual agents, making them more socially acceptable.
References

1. Chomsky, N.: Syntactic Structures, 2nd edn. De Gruyter Mouton, Berlin–New York (2002)
2. Kröger, J.L., Lutz, O.H.M., Raschke, P.: Privacy implications of voice and speech analysis – information disclosure by inference. In: IFIP International Summer School on Privacy and Identity Management, Luxembourg, pp. 242–258 (2020)
3. Sarma, M., Ghahremani, P., Povey, D., Goel, N.K., Sarma, K.K., Dehak, N.: Emotion identification from raw speech signals using DNNs. In: Interspeech, Hyderabad, pp. 3097–3101 (2018)
4. Deschamps-Berger, T., Lamel, L., Devillers, L.: End-to-end speech emotion recognition: challenges of real-life emergency call centers data recordings. In: 9th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 1–8. IEEE, Nara (2021)
5. Vogt, T., André, E., Bee, N.: EmoVoice—a framework for online recognition of emotions from voice. In: André, E., Dybkjær, L., Minker, W., Neumann, H., Pieraccini, R., Weber, M. (eds.) PIT 2008. LNCS (LNAI), vol. 5078, pp. 188–199. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69369-7_21
6. Samsonovich, A.V.: Socially emotional brain-inspired cognitive architecture framework for artificial intelligence. Cogn. Syst. Res. 60, 57–76 (2020). https://doi.org/10.1016/j.cogsys.2019.12.002
7. Torsueva, I.G.: Intonation and Meaning of the Statement. Nauka, Moscow (1979)
8. Svetozarova, N.D.: Intonation System of the Russian Language. Leningrad University Publishing House, Leningrad (1982)
9. Shcherba, L.V.: An Essay on French Pronunciation in Comparison with Russian. Vysshaya Shkola, Moscow (1963)
10. Hudlicka, E.: Guidelines for designing computational models of emotions. Int. J. Synth. Emot. (IJSE) 2(1), 26–79 (2011)
11. Bălan, O., Moise, G., Petrescu, L., Moldoveanu, A., Leordeanu, M., Moldoveanu, F.: Emotion classification based on biophysical signals and machine learning techniques. Symmetry 12(1), 21 (2020)
12. Suresh, V., Ong, D.C.: Using knowledge-embedded attention to augment pre-trained language models for fine-grained emotion recognition. In: 9th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 1–8. IEEE, Nara (2021)
13. Bryzgunova, E.A.: Intonation and syntax. In: Beloshapkova, V.A. (ed.) Modern Russian Language, 3rd edn. Moscow (1997)
14. Uspensky, L.V.: A Word About Words (Essays on Language), 5th edn. Detgiz, Leningrad (1954)
15. Richards, I.A., Ogden, C.K.: The Meaning of Meaning. Harcourt Brace Jovanovich, Orlando (1989)
16. Shen, J., et al.: Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4779–4783. IEEE, Calgary (2018)
17. NVIDIA NeMo page. https://developer.nvidia.com/nvidia-nemo. Accessed 19 Dec 2021
18. Mertes, S., Kiderle, T., Schlagowski, R., Lingenfelser, F., André, E.: On the potential of modular voice conversion for virtual agents. In: 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), pp. 1–7. IEEE, Nara (2021)
19. Samsonovich, A.V., Ascoli, G.A.: Augmenting weak semantic cognitive maps with an "abstractness" dimension. Comput. Intell. Neurosci. 2013, 308176 (2013). https://doi.org/10.1155/2013/308176
Security Risk Management Methodology for Distributed Ledger Systems

Anatoly P. Durakovskiy(B), Victor S. Gorbatov, Dmitriy A. Dyatlov, and Dmitriy A. Melnikov

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe shosse, 31, Moscow 115409, Russian Federation
[email protected]
Abstract. Currently, the implementation of distributed ledger (DL) systems as the technological base of trusted electronic document management (EDM) is at the initial stage of practical application. Solving security issues comes down to checking the developed application software using standard methods for assessing the reliability of software tools, which is obviously not enough to ensure comprehensive security. The purpose of this study is to analyze the possibility of solving this problem using the well-known risk-oriented approach to managing the information security (IS) of information technology systems (ITS). The subject of the research is the IS risk management (RM) methodology in relation to DL systems (DLS). In this study, the concept of «trust» has been clarified to take into account the presence in the DLS architecture of not only applied ITS but also other fundamentally important technological components, which greatly expands the understanding of the problem of managing IS under the real operating conditions of such systems. Criteria for a comparative analysis based on well-known techniques are proposed. Recommendations have been developed on the possible structure and content of an IS RM methodology in relation to a specific DLS. The research results can be useful to IS specialists in the development of new DLS and the modernization of existing ITS in terms of increasing their level of IS.

Keywords: Risk management · Information security · Information assets · Distributed ledger · Threat · Vulnerability
1 Introduction

Currently, the well-known DL technology¹ has passed the initial stage of real application in various fields of commercial activity [1]. Having moved past the stage of declared «revolutionary» potential, this technology began to be gradually introduced in the credit and financial sphere, and then in the management systems of real sectors of the economy [2, 3].

¹ Distributed ledger terminology. Version 1.0, 11.04.2019. Distributed Ledger Technologies Center of St. Petersburg State University. URI: https://dltc.spbu.ru/images/files/glossary_v1_0.pdf.
Nevertheless, there are three main «barriers» on the way to the wide practical implementation of DLS [4]. The first is the limited experience of practical use in the ITS market. Currently, only «blockchain» technology (BC) is actively (though narrowly) developing, and only in the credit and financial sector, in the form of the introduction of digital assets (cryptocurrencies) into market turnover. This does not yet allow an objective assessment of the efficiency of using this technology in the real sector of the economy, nor does it increase the confidence of the majority of market participants in it. The second is the lack of legal regulation, including effective methods (standards, regulations) for the use of such technology. The third, as a result, is the lack of guarantees for managing the IS of a DLS, including generally accepted authentication procedures and the management of non-repudiation and confidentiality, as well as the lack of principles for conducting transactions, delegating authority, and establishing responsibility between participants.

The main efforts of IS specialists in the course of creating applied software tools (ST) based on DL technology are aimed at checking the source code of the software, as a rule of smart contracts [5], in order to detect errors and vulnerabilities. In addition, other standard software security methods are used, for example, testing and design-by-contract (specification-based) methodologies. However, such techniques are not effective enough for DLS because of their significant subjectivity and the large number of complex testing procedures, and they do not guarantee completeness of verification when the system specification changes dynamically and is supplemented with new elements [6]. As a consequence, DLS are recognized as unreliable and subject to colossal risk due to the lack of trust in them resulting from ignoring non-repudiation management [7].

Another «weak» element in DLS IS management is managing the confidentiality of data, in particular of the transactions registered in the ledger. Active scientific research and practical development are being conducted in this direction, for example within the research topics «cryptographic mixing networks», «ring signatures», «homomorphic encryption», and «zero-knowledge proofs» [8–10]. However, none of the proposed methods yet solves this problem as a whole, nor the problems of DLS performance and of storing the data created by network participants during transactions [11]. The widespread misconception [12] that the participants of a DLS (BC) are equal and independent as subjects of a decentralized system, presented as an advantage because it supposedly removes conflicts of interest, does not contribute to a satisfactory solution of this problem. The developer, the customer, the support service, users, and/or the market regulator are, objectively, neither equal nor independent. It is also necessary to take into account the communication networks that unite all participants of a general-use DLS, which are likewise potentially vulnerable to attacks by intruders.

The relatively short period of DL technology development indicates the lack of effective methods (standards, regulations) for the use of DLS. Today, five areas of DLS standardization have been identified [13]: vocabulary; reference model; IS; data management; smart contracts.
In 2016, the International Organization for Standardization (ISO) created a specialized technical committee, ISO/TC 307 «Blockchain and distributed ledger technologies», which is developing ten standards, of which only four have been published, three of them in 2020. The first, ISO 23455 «Blockchain and distributed ledger technologies – Overview of and interactions between smart contracts in blockchain and distributed ledger technology systems», was published in September 2019 as the best methodology for organizing the above-mentioned security verification of smart contracts.

Based on an analysis of open publications, it can be stated that the insufficient or incomplete research of DLS IS management problems, an incorrect understanding of the content and properties of DL technology, the lack of a regulatory legal framework, and the absence of acceptable methods (standards, regulations) for DLS application carry serious risks for all DLS participants. Obviously, an integrated approach is needed to assess DLS IS, covering the entire life cycle of the DLS, from design through maintenance to decommissioning. The basis of such an approach, well tested in the implementation and operation of well-known ITS and most fully meeting the requirements, is the IS RM methodology, applied at all stages of the ITS life cycle. RM helps to identify existing vulnerabilities and potential threats and to optimize the use of technological resources. Currently, risk-based solutions are used at all levels of operating business systems, and there are many different RM methods with a large set of indicators and characteristics, which makes it possible to determine more accurately the ones necessary from an IS point of view.

The purpose of this work is an attempt to apply well-known IS RM techniques to DLS (BC), which is highly relevant to the development and implementation of ITS using DL and BC technologies. The result of the work is a methodology that can be considered the basis for creating a specific methodology for the development and implementation of a standard DLS in a particular real sector of the economy. Practical application of the IS RM method is possible in the following cases (among others):

– at the design stage, for a preliminary assessment of the IS risks of newly created ITS, or when implementing individual components based on BC technology into existing ITS;
– when comparing a DLS with other options for supporting a business process from the point of view of managing IS, based on the criterion of minimizing the resources used;
– when conducting an IS audit, analyzing and monitoring IS management systems, and creating an IS policy for organizations.
2 The Problem of Trust in the DLS

Over the past ten years, DL technology has travelled a «thorny» path from general euphoria to the evaluation of real efficiency, and has found the greatest use in three areas of activity: finance, logistics, and trade [14]. Certain advantages have become obvious: the streamlining of transactions, transparency within corporate transactions, reduction of the corruption component, and the fight against counterfeiting. At present, applied ITS developed on the basis of DL technology are merely being integrated with existing business processes.
The essence of DL technology (BC) is an attempt to create a single trusted environment for general use, that is, an environment whose participants exchange information openly while fundamentally not trusting each other. From a theoretical point of view, the trust problem is solved by a purely technological method [1], i.e., by creating appropriate application protocols for the exchange of cryptographic information. From a practical point of view, however, before the integration of a new technology into business systems begins in any field of activity, there must be a «public» agreement, a mutual agreement of the participants in that sphere. For example, as soon as the participants in financial (although at the moment largely speculative) operations «agreed» on the use of BC technology, the «accelerated» development of cryptocurrencies began, whose role and use in the credit and financial sphere defies any explanation. The inability of states to control the «printing» of electronic money and to tie it to gold and foreign exchange reserves (the determination of its nominal value, commodity or gold equivalent) became obvious. The cryptocurrency market has become an essentially unregulated speculative sphere. Moreover, there are known facts of criminal money «laundering» and of the financing of terrorist and extremist activities using cryptocurrencies [15, 16].

It becomes obvious that when implementing a DLS, more attention should be paid not to the problems of introducing the technology itself, but to the problems of managing IS [7], the main of which are managing availability, confidentiality, integrity, accountability, and assurance (reliability)². The solution to each of these problems has its own specifics, depending on the particular ITS using the DL technology, and therefore requires significant adaptation of existing IS management methods or the development of new ones.

In many ways, the security level of an ITS depends on its security architecture, and there is no unified technique for creating a security architecture for a DLS. This is a consequence of the fact that the actual DLS architecture (reference model) itself is presented differently in different sources. For example, the State Corporation «Rostec» has developed a draft plan for the development of this technology [17] in which only three levels of architecture are defined: transport, presentation, and application (Table 1). This approach is not the only one, but it gives an understanding of the main technological components of a DLS.

Now, the most widely used (by the criterion of supply in the ST market) are applied ST, which (most likely) realize the main theoretical advantage of DL technology as a trusted business process environment. Applied STs are mainly smart contracts, that is, data transfer protocols that exercise full control of business processes and perform certain operations (transactions) according to a given mathematical algorithm. It is at the applied ST level that the requirements for ST reliability (assurance) are implemented in practice, by testing and checking the program code or on the basis of the design-by-contract methodology, as in the development of standard computer programs [6], and without taking into account the features of the other technological components.

² NIST. Underlying Technical Models for Information Technology Security. NIST Special Publication 800-33. December 2001. URI: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-33.pdf.
Table 1. Three-level architecture of DL technologies.

DL level     | OSI reference model level | Technological components
Application  | Application               | IS management ST: creation; integration; operation
Presentation | Presentation              | Software architecture; programming language; ways to stimulate; digital assets; functionality; execution of smart contracts; regulation of access rights; ways to increase throughput
Transport    | Session, transport        | Consensus; security and privacy; transactions
In other words, the methods and techniques used today to manage systemic trust in DL-based STs are not only unadapted to the applied ST level, but also fail to take into account (fully) the architectural features of such systems (and of their security architectures). A more detailed and comprehensive methodology is therefore needed to create a truly trusted environment based on a comprehensive solution to the problems of managing business process IS. The risk-oriented method is the most widespread and acceptable technique for determining the necessary means of information protection, ensuring that threats are parried and IS incidents are responded to in time, according to the criterion of minimizing their negative consequences. Based on specified acceptable risk values, a set of necessary measures is carried out to ensure the required level of trust in the functioning of the DLS.
3 IS RM Frameworks

Any application of the risk-oriented IS management method to a specific type of ITS should be considered an integral part of a more general methodology for RM in modern business processes. RM provides the most complete information about data assets and their significance (value), existing system vulnerabilities, and possible threats, and determines the relationships between all functional processes. According to GOST R ISO/IEC 13335-1-2006³, the RM process must develop recommendations for protecting the organization's assets using appropriate methods and means, which is, in fact, the purpose of applying such a methodology. That standard defines the theoretical and practical foundations of IS RM, establishing that the RM process is continuous and includes several stages of the life cycle of complex systems: planning, implementation, operation, and maintenance.

³ GOST R ISO/IEC 13335-1-2006. Information technology. Security techniques. Part 1. Concepts and models for information and communications technology security management.
At the planning stage, ITS IS risks are assessed by specifying data assets and their value and by identifying system vulnerabilities and potential threats. Based on the results of the risk assessment, protective measures with a given level of trust (security) are carried out at the subsequent implementation and operation stages. Risk assessment, in turn, consists of several stages, which are presented in Fig. 1 (in accordance with the international standard ISO/IEC 27001:2013⁴).
Fig. 1. Stages of IS risk analysis (assessment).
The stages presented in Fig. 1 are also general in nature; therefore, when defining a particular methodology, it is necessary to proceed from the features of a typical ITS, taking into account its typical architecture and business processes. A methodology is developed for a specific ITS, and the main requirement for it is that the evaluation results be comparable and reproducible. The general procedure for carrying out IS RM is described in detail in GOST R ISO/IEC 27005-2018⁵; its first stage is identification, and the second is risk analysis (Fig. 1). In addition to identifying risks, the analysis also specifies their possible values as described in that standard. It also specifies the main criteria for assessing and accepting risk, as well as the scope and boundaries of IS management. Possible options for IS RM are set out in the outdated standard GOST R ISO/IEC TR 13335-3-2007⁶, which includes four variants of RM strategy: the basic and informal approaches, detailed risk analysis, and the combined approach. The most acceptable technique for assessing assets, threats, and vulnerabilities is comprehensive RM, the results of which make it possible to choose specific protective measures. Figure 2 shows a flowchart of the procedure for such an analysis (risk acceptability).
⁴ ISO/IEC 27001:2013. Information technology – Security techniques – Information security management systems – Requirements.
⁵ GOST R ISO/IEC 27005-2018. Information technology. Security techniques. Information security risk management.
⁶ GOST R ISO/IEC TR 13335-3-2007. Information technology – Guidelines for the management of information technology security – Part 3: Techniques for the management of information technology security (inactive).
Fig. 2. Flowchart (algorithm) of the procedure for determining IS risk acceptability.
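As a minimal illustration of the acceptability check in Fig. 2, the sketch below scores risk on a qualitative consequence/probability matrix and compares it against an acceptance threshold. The 1–5 scales and the threshold value are assumptions; in practice they are fixed by the organization's risk acceptance criteria.

```python
def risk_level(probability: int, consequence: int) -> int:
    """Qualitative risk score; both inputs on a 1..5 scale."""
    return probability * consequence

def is_acceptable(level: int, threshold: int = 8) -> bool:
    # Threshold is an assumption standing in for the organization's criteria
    return level <= threshold

for p, c in [(2, 3), (4, 4)]:
    lvl = risk_level(p, c)
    verdict = "accept" if is_acceptable(lvl) else "treat (reduce, transfer or avoid)"
    print(f"P={p}, C={c}: level {lvl} -> {verdict}")
```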
4 Comparative Analysis of IS RM Methods

To adapt existing IS RM techniques to DLS, it is necessary to carry out a comparative analysis using the best available techniques⁷,⁸,⁹. At the initial stage of studying RM methods in relation to DLS, given the lack of quantitative data and resources, it is advisable to carry out a qualitative assessment of the consequences and of the probability of incidents. It is advisable to use GOST R 58771-2019 as a basis: it provides a list (Table 2) of various RM technologies for solving a wide range of practical problems, even though the list is not exhaustive. The list is given in alphabetical order without any priority. As can be seen from Table 2, some technologies are used for the same RM procedures. Therefore, in what follows, only the RM methods presented in Table 2 are analysed, after being grouped for application at the corresponding stages of the RM procedure.

⁷ GOST R ISO 31000-2019. Risk management. Principles and guidelines.
⁸ GOST R 51897-2011/Manual ISO 73:2009. The national standard of the Russian Federation. Risk management. Terms and definitions (approved and put into effect by the Order of Rosstandart of 16.11.2011 No. 548-st).
⁹ GOST R 58771-2019. Risk management. Risk assessment technologies.

Table 2. Qualitative IS RM methods.

Methods                                              | Application (operations)
Failure mode and effects analysis (FMEA), failure mode, effects and criticality analysis (FMECA) | Risk identification
Fault tree analysis (FTA)                            | Probability analysis; analysis of the causes
Event tree analysis (ETA)                            | Analysis of controls; impact analysis
Human reliability analysis (HRA)                     | Analysis of risk sources; risk analysis
Hazard analysis and critical control points (HACCP)  | Analysis of controls; monitoring
Pareto charts                                        | Setting priorities
Hazard and operability studies (HAZOP)               | Risk analysis; risk identification
Risk indices                                         | Risk comparison
Checklists, classification and systematization       | Identification of controls; risk identification
The matrix of consequences/probabilities             | Risk report; risk evaluation
The «bow tie» method                                 | Analysis of controls; description of the risk
The Delphi method                                    | Views identification
The Ishikawa method («fish bone»)                    | Analysis of risk sources
Nominal group method                                 | Views identification
Fuzzy logic methods                                  | Risk analysis; choice between options
Multi-criteria analysis (MCA)                        | Choice between options
Brainstorming                                        | Views identification
Surveys                                              | Views identification
Causal mapping                                       | Analysis of the causes
Risk registers                                       | Fixing information about risks
The cindynic approach                                | Identification of risk factors
Value at risk (VaR)                                  | Risk analysis
Structured or semi-structured interviews             | Views identification
Structured «what-if?» technique (SWIFT)              | Risk identification
Scenario analysis                                    | Impact analysis; risk identification
Reliability-centered maintenance (RCM)               | Controls selection; risk evaluation
Toxicological risk assessment                        | Risk evaluation
To select a criterion for the comparative analysis of RM techniques, so as to take one of the IS risk management methods as a basis, let us consider the main characteristics of RM techniques, presented in Table 3. An important feature of a DLS that distinguishes it from other ITS is that such systems are focused on the interaction of all participants, united by a public network [1], in solving a joint task (business process). Therefore, it seems logical to consider only those RM methods that do not require a large amount of input data about participants. Grouping the risk assessment methods (Table 2) by application in alphabetical order and adding the above criteria, we obtain the data for comparison presented in Table 4.

Another relevant characteristic of RM technologies in relation to DLS is the level of expertise used. A DLS has a rather complex architecture that combines a set of various IT tools [1]. In addition, the complexity of DLS architectures increases every year due to the emergence of new, more stringent requirements for the performance and functionality of the ITS. The competence (professionalism) level of the expert performing the IS risk analysis (assessment) should correspond to the complexity of this system and may require additional training or a highly specialized (narrowly focused) examination.

Table 3. The main characteristics of the IS RM methods.
Characteristics                        | Details
Application                            | Use of the IS RM method
Volume (the level of risk application) | Company, department, process
Time horizon                           | Short-term, medium-term, long-term
Decision-making level                  | Strategic, tactical, operational
Incoming amount of information or data | Low, medium, high
The level of expertise used            | Low, medium, high
Type of method                         | Qualitative, quantitative, combined
Time and cost of application           | Low, medium, high
Thus, for DLS, it is logical to use, as the main criteria for comparing the quality of RM methods, a low or medium volume of incoming information or data and a high competence level of the experts involved (high-level expertise). Three of the IS RM methods in Table 4, «VaR», «Toxicological risk assessment», and «Checklists, classification and systematization», require a large amount of input data and do not meet the first criterion. Of the remaining methods, most have low or medium requirements for the expertise level and thus do not meet the «high level of expertise» criterion. Only three methods fully meet this criterion, «FTA», «HRA», and «Fuzzy logic methods», which are based on high-level expertise. Two methods presented in Table 4, «HAZOP» and «RCM», meet the «high level of expertise» criterion partially: they require either a high or a medium level of expertise, depending on the research being conducted. If a deeper analysis is needed in HAZOP, or a more effective IS RM process is needed in RCM, then a higher expertise level is required. It is precisely these output characteristics that correspond to DLS, which makes it possible to accept these two risk assessment methods as acceptable for the case studied in this article. Table 5 presents the IS RM methods that meet the specified criteria, together with additional characteristics of the risk assessment technologies.

As can be seen from the comparison, only a few IS RM methods meet the specified criteria, and they do not cover the entire analysis (assessment) procedure. Excluded from consideration are such operations (Table 2) as the analysis of controls and of consequences, views identification, identification of controls and of risk factors, monitoring, description and comparison of risk, setting priorities, and fixing information about risks.
Table 4. The incoming data volume and the expertise level during the IS RM.

Methods                                        | Incoming data volume       | Expertise level
FMEA, FMECA                                    | Depends on the application | Medium
FTA                                            | Low, medium                | High
ETA                                            | Low, medium                | Medium
HRA                                            | Medium                     | High
HACCP                                          | Medium                     | Medium
Pareto charts                                  | Medium                     | Medium
HAZOP                                          | Medium                     | Medium, high
Risk indices                                   | Medium                     | Low, medium
Checklists, classification and systematization | High                       | Low, medium
The matrix of consequences/probabilities       | Medium                     | Low, medium
The «bow tie» method                           | Any                        | Low, medium
Delphi                                         | –                          | Medium
The Ishikawa method («fish bone»)              | Low                        | Low, medium
Nominal group method                           | –                          | Low
Fuzzy logic methods                            | Low, medium                | High
MCA                                            | Low                        | Medium
Brainstorming                                  | –                          | Low, medium
Surveys                                        | Low                        | Medium
Causal mapping                                 | Medium                     | Medium
Risk registers                                 | Low, medium                | Low, medium
The cindynic approach                          | Low                        | Medium
VaR                                            | High                       | High
Structured or semi-structured interviews       | –                          | Medium
SWIFT                                          | Medium                     | Low, medium
Scenario analysis                              | Low, medium                | Medium
RCM                                            | Medium                     | Medium, high
Toxicological risk assessment                  | High                       | High
The selected methods (Table 5), which are basic for IS RM, are mainly used at the risk identification stage; for the «IS risk assessment» stage (Fig. 1), none of the selected methods meets the specified criteria. These methods have several applications in risk analysis (assessment) at different levels of the ITS architecture. Basically, they allow operational or tactical risk to be determined in the medium term, and they require medium or long duration and cost of expertise.

From this point of view, the most preferable method is the one based on «Fuzzy logic methods», which is applicable at any level of the system architecture and in any time perspective, and is used at various levels of decision-making with an average duration and implementation cost.

The five ITS IS RM methods selected as a result of the comparison (Table 5) do not, as mentioned above, cover the entire procedure of risk analysis (assessment). However, they can be used as the basis for creating an integral methodology for DLS. First, the application of these methods is not strictly specified; methods are constantly being developed and improved. For example, the «HAZOP» methodology was originally developed for the analysis of chemical process systems but has over time spread to other areas, including electronic systems. That methodology is mainly applicable at the stage of detailed system design; however, its structure, a systematic examination of individual project elements by a group of specialists in order to identify hazard and operability problems, allows it to be used at all stages of risk analysis (assessment).

Table 5. IS RM methods that meet the specified criteria.
Application              | Method              | Application level   | Time horizon | Decision-making level | Effort level
Probability analysis     | FTA                 | Department, process | Medium       | Operational, tactical | Medium, high
Analysis of risk sources | HRA                 | Department, process | Any          | Operational, tactical | Medium, high
Analysis of the causes   | FTA                 | Department, process | Medium       | Operational, tactical | Medium, high
Risk analysis            | HRA                 | Department, process | Any          | Operational, tactical | Medium, high
Risk analysis            | Fuzzy logic methods | Any                 | Any          | Any                   | Medium
Risk analysis            | HAZOP               | Process             | Medium, long | Operational, tactical | Medium, high
Controls selection       | RCM                 | Company, department | Medium       | Operational, tactical | Medium, high
Choice between options   | Fuzzy logic methods | Any                 | Any          | Any                   | Medium
Risk identification      | HAZOP               | Process             | Medium, long | Operational, tactical | Medium, high
Risk evaluation          | RCM                 | Company, department | Medium       | Operational, tactical | Medium, high
Second, these methods are oriented towards expanding their fields of application, i.e., towards new systems such as DLS, and can be adapted to new application conditions and/or combined with each other. For example, the «FTA» method is most useful in complex ITS that involve interaction between many objects/subjects, taking into account possible hard-to-predict variants of information exchange. Third, as already mentioned, DLS are created under conditions of distrust between their participants; that is, the subjective factor plays a rather significant role in such systems. One of the selected methods, «HRA», is aimed specifically at the active participant and at assessing his contribution to managing the security and reliability of interaction. Fourth, because of the limited experience of using DL technology, the statistical data necessary and sufficient for assessing its security are not known. Under conditions of mutual distrust, the data received by a DLS have varying degrees of reliability and can be highly doubtful; in such cases the method based on «Fuzzy logic methods» is preferable. Fifth, a DLS is a system consisting of complexes of technical means with vulnerabilities, and therefore requires attention from the point of view of reliability and maintenance security. This calls for the «RCM» methodology, which covers the entire life cycle of the ITS and, in addition, makes it possible to assess the economic efficiency and expediency of the decisions taken. At the same time, one cannot exclude the use of other IS RM methods that do not meet the criteria selected in this work, for example after their modernization, further development, or integration with other methods that do meet the specified conditions.
5 Recommendations for the Development of Specific IS RM Methods

As mentioned above, the methodological basis of IS RM is the practice according to which, when analyzing (assessing) risks, it is necessary to take into account the consequences of previous IS incidents, the detection of ITS vulnerabilities, and the likelihood of IS threats. Based on the IS RM results, specific proposals and recommendations (measures) are developed to reduce risk, data on existing risks are exchanged between interested parties, and continuous monitoring is carried out, all of which improves ITS control and monitoring processes.

The ISO/IEC 27001:2013 standard proposes an iterative method that allows the level of detail (specification) of RM to be increased [18] and which provides for a step-by-step cyclic IS RM procedure at each level of the DLS architecture and for each of its technological components. In this regard, when developing a practical methodology, it is necessary to clarify the specific DLS architecture. It is necessary, for example, to point out the limitations of the three-level DLS architecture (Table 1), which does not take into account the hardware of the participants' computing devices or the telecommunications equipment of data transmission networks. Accordingly, the risks of fault tolerance and/or disaster tolerance, for example, fall out of consideration. Even if hardware reliability can, by default, still be considered acceptable from a practical point of view, ensuring the disaster tolerance of complex ITS is in itself a complex scientific and technical problem [19]. The importance of taking technological factors into account is also evidenced by the relatively new and developing direction of state regulation of IS management, legally termed managing the security of critical information infrastructure facilities [20]. Therefore, the three-level DLS architecture (Table 1) needs to be supplemented with another level, which we will call the physical level; it combines the physical, channel, and network layers of the reference model of open systems interconnection (RMOSI) [21]. Such a four-level DLS architecture is presented in Table 6.

The main methods recommended for use in IS RM systems in the interests of specific DLS are presented in Table 5. Obviously, any methodology includes a mandatory section describing the procedures for documenting the IS RM results, the architecture and all technological components, the assessment methods used, the final results and recommendations, and suggestions for risk reduction. Thus, the structure of the DLS IS RM methodology consists of two main elements:

– a description of the step-by-step IS RM procedure for the specified technological DL components;
– documentation of the IS RM results.

Iterations of the analysis (assessment) procedure can be carried out sequentially or in parallel, as determined by the responsible executor in each specific case, based on the features of the created or existing DLS, and also when additional data appears that affects particular technological components.

Table 6. A four-level model of the DL architecture.
DL level     | RMOSI level                            | Technological components
Applications | Applications                           | Security applications: creation; integration; functioning
Presentation | Presentation                           | Software architecture; programming language; ways to stimulate; digital assets; functionality; execution of smart contracts; managing access rights; ways to increase throughput
Transport    | Session, transport                     | Consensus; security and privacy; execution of transactions
Physical     | Network, channel (data link), physical | Telecommunications networks; telecommunications equipment; computer systems
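The step-by-step, per-level iteration implied by this four-level model can be sketched as follows. The component names mirror Table 6, and the assess() callback is a placeholder for a concrete method such as «FTA» or «HRA», so the fragment shows the traversal and documentation discipline rather than any particular assessment.

```python
# Four-level DLS architecture from Table 6 (component lists abridged)
DLS_ARCHITECTURE = {
    "physical":     ["telecom networks", "telecom equipment", "computer systems"],
    "transport":    ["consensus", "security and privacy", "transactions"],
    "presentation": ["smart contract execution", "access rights", "digital assets"],
    "application":  ["security applications"],
}

def rm_iteration(assess):
    """One cyclic IS RM pass: a result (or the reason for its absence)
    is documented for every technological component of every level."""
    report = {}
    for level, components in DLS_ARCHITECTURE.items():
        for comp in components:
            result = assess(level, comp)
            report[(level, comp)] = result if result is not None else "not assessed: no data"
    return report

report = rm_iteration(lambda level, comp: None)  # placeholder assessor
```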
When preparing the final documentation, all technological components are listed, regardless of whether a risk assessment was carried out for them. If no risk assessment was performed for a component, a corresponding record is made indicating the reason why the check was not carried out. Records are also made of all new data or information that appeared during the verification and affects any technological component, and of the measures taken in this regard. When RM is repeated for any component, the results of both checks are documented.

The main content of the developed DLS IS RM methods is determined by the combination of descriptions of two essential elements: procedural risk assessments and risk assessment methods. In accordance with the standard GOST R 58771-2019 «Risk management. Risk assessment technologies» (hereinafter, the Standard), the IS RM methods selected in this study as meeting the specified criteria for DLS have specific descriptions and are assigned to the appropriate technology categories.

The «FTA» method belongs to Standard category B.5, «Technologies for understanding consequences, probability and risk». Approaches in this category provide a comprehensive understanding of the expected consequences and the likelihood of their occurrence. Using the «FTA» method, the conditions for the occurrence of some undesirable event are set out and the causes of these conditions are analysed. A logical cause-and-effect relationship between events and their sources is built, and the existing patterns of risk occurrence are determined. The method is mainly used to analyse the probability of occurrence and to identify the causes of IS incidents; that is, the «FTA» method is suitable for the entire DLS IS RM process.

The «HRA» method also belongs to Standard category B.5. With the help of this method, the possibilities of erroneous actions by the user are identified and analysed, as are the factors affecting the likelihood of subjective and objective errors relating directly to the user, a third party, or the interaction infrastructure. This method is mainly used for the analysis of risk sources and the analysis of the risk itself; like the «FTA» method, it is suitable for the entire IS RM process. In addition, the «HRA» method is used in the design of ITS in order to understand and take into account all the requirements of their participants, which makes it possible to use it when defining the main criteria, the scope and boundaries of the analysis, and the organizational structure of the IS RM system. Using the «HRA» method during system modification makes it possible to determine the user's impact on the system and directions for improving risk assessment procedures in order to reduce errors, as well as to identify and reduce the influence of error-inducing factors; all of this allows the method to be used in monitoring and in risk reassessment. Determining the influence of the human factor is necessary at all stages of the system's life, both as a whole and in the interaction of individual services and employees during operation. That is, the «HRA» method can be used at the stage of exchanging information about risk and when that information is shared by several participants in the system, in particular by those affected to one degree or another by the IS RM procedure.
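To make the «FTA» mechanics concrete, the sketch below propagates probabilities of independent basic events through OR and AND gates to a top event; the events and numbers are hypothetical and serve only to show how a fault tree quantifies an undesirable DLS outcome.

```python
def p_or(*ps):
    """OR gate: the event occurs if at least one independent input occurs."""
    q = 1.0
    for p in ps:
        q *= 1.0 - p
    return 1.0 - q

def p_and(*ps):
    """AND gate: the event occurs only if all independent inputs occur."""
    out = 1.0
    for p in ps:
        out *= p
    return out

# Top event: an invalid entry is committed to the ledger, caused either by
# a consensus failure, or by node compromise combined with key leakage
p_top = p_or(0.01, p_and(0.05, 0.02))
print(f"P(top event) = {p_top:.4f}")  # 0.0110
```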
The «HAZOP»¹⁰ method belongs to Standard category B.2, «Identification technologies», and uses a structured IS RM technique. With its help, potential security breaches of developed or existing systems are identified; it is applicable to risk analysis and risk identification. In addition, the «HAZOP» method is used for the detailed development and design of a system, which allows it to be used at the stage of determining the risk analysis (assessment) context, for testing the system, and for improving the final documentation at the documentation stage. The applicability of «HAZOP» to assessing changes in organizational structures and the ways of carrying out such changes also allows it to be used at the stage of exchanging information about risk.

The method based on «Fuzzy logic methods» belongs to Standard category B.5, «Technologies for understanding consequences, probability and risk». With its help, qualitative indicators and their rationale can be formalized when making decisions under uncertainty (lack of information). The method is used in the analysis (assessment) of risk and in choosing between possible solutions; that is, it can be used throughout the IS RM stage and at the risk acceptability assessment stages.

The «RCM»¹¹ method belongs to Standard category B.8, «Technologies for assessing the significance of risk». Methods of this category are used to determine the admissibility or acceptability of the final security risk. With the help of the «RCM» method, risk values are assessed and the required policies and tasks for the maintenance of the system and its components are determined from the point of view of managing their security. The method is mainly used to select controls and for risk analysis (assessment), that is, to assess the qualitative parameters of risk; however, it can also be used for risk identification. The greatest efficiency of the «RCM» method is achieved at the system development stage, that is, when determining the content of the IS RM procedure. The analysis of functional failures using this method makes it possible to determine the tasks necessary to reduce risk during further system operation, which allows the method to be used at the stage of determining risk acceptability. The «RCM» method is also used when monitoring the system in order to determine the requirements for functional support and a possible reassessment of its elements. At the operation stage, the data collected within the procedure implementing this methodology provide feedback between the various system elements, making it possible to use the «RCM» method at the stage of risk information exchange. Note that the process of applying the «RCM» method is itself subject to documentation.

Figure 3 shows the general flowchart of the DLS IS RM procedure. As noted above, the selected methods do not yet fully cover the entire IS RM process and are recommended only as a basis (an integral methodology) for adapting existing and creating new methods of risk analysis (assessment).
¹⁰ GOST R 27.012-2019. Dependability in technics. Hazard and operability studies (HAZOP studies).
¹¹ GOST 27.606-2013. Dependability in technics. Dependability management. Reliability centred maintenance.
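A minimal sketch of the fuzzy-logic approach described above: an expert's numeric estimate is mapped onto qualitative terms through triangular membership functions. The 0–10 scale and the breakpoints are assumptions for illustration, not values from GOST R 58771-2019.

```python
def triangular(x, a, b, c):
    """Membership rises linearly on [a, b] and falls on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

likelihood = 6.0  # expert estimate of threat likelihood on a 0..10 scale
memberships = {
    "low":    triangular(likelihood, -1, 0, 5),
    "medium": triangular(likelihood, 2, 5, 8),
    "high":   triangular(likelihood, 5, 10, 11),
}
print(memberships)  # «medium» and «high» are both partly activated
```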
Fig. 3. The general flowchart of the DLS IS RM procedure.
At the same time, it should be understood that managing the IS of complex ITS such as DLS requires complex approaches that combine the use of several risk analysis (assessment) methods and technologies at once, and these can be identified only in the course of the practical implementation of the methods in demand.
6 Conclusion

Thus, the study of DLS security issues shows the obvious need for a risk-based approach to the task. This is due to the expansion of the sphere of the trust concept in DLS, taking into account all of their fundamentally important real technological components. The analytical review of ITS IS assessment methods carried out in this paper in relation to DL technology, on the basis of the proposed comparison criteria, shows that an integrated methodology combining the use of several risk assessment methods and technologies at once is necessary to solve the task. The concretization of such a solution on the basis of the recommendations proposed here can be achieved only in the course of the practical creation of the necessary IS RM methods for specific DLSs, which the authors set as the goal of their further research in developing the ideas of this work.
References

1. Konkin, A., Zapechnikov, S.: Techniques for private transactions in corporate blockchain networks. In: 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), pp. 2356–2360 (2021). https://doi.org/10.1109/ElConRus51938.2021.9396228
2. Khmelnitskaya, Z.B., Bogdanova, E.S., Ivich, M.L.: Construction of integrated logic systems using blockchain technology. Sci. Works Free Econ. Soc. Russia 227(1), 360–378 (2021). https://doi.org/10.38197/2072-2060-2021-227-1-360-378
3. Zakoldaev, D., Yamshchikov, R., Yamshchikova, N.: The blockchain technology in Russia: achievements and problems. Bull. Moscow State Regional Univ. (2), 93–107 (2018). https://doi.org/10.18384/2224-0209-2018-2-889
4. Ducrée, J.: Research – a blockchain of knowledge? Blockchain: Res. Appl. 1(1–2), 100005 (2020). https://doi.org/10.1016/j.bcra.2020.100005
5. Hsain, Y.A., Laaz, N., Mbarki, S.: Ethereum's smart contracts construction and development using model driven engineering technologies: a review. Procedia Comput. Sci. 184, 785–790 (2021). https://doi.org/10.1016/j.procs.2021.03.097
6. Merkin-Janson, L.A., Rezin, R.M., Vasilyev, N.K.: Architecture of the formally-verified distributed ledger system InnoChain. Model. Anal. Inf. Syst. 27(4), 472–487 (2020). https://doi.org/10.18255/1818-1015-2020-4-472-487
7. Budzko, V.I., Melnikov, D.A.: Information security and blockchain. Highly Avail. Syst. 14(3), 5–11 (2018). https://doi.org/10.18127/j20729472-201803-02
8. Shukla, S., Thakur, S., Hussain, S., Breslin, J.G., Jameel, S.M.: Identification and authentication in healthcare internet-of-things using integrated fog computing based blockchain model. Internet Things 15 (2021). https://doi.org/10.1016/j.iot.2021.100422
9. Regueiro, C., Seco, I., de Diego, S., Lage, O., Etxebarria, L.: Privacy-enhancing distributed protocol for data aggregation based on blockchain and homomorphic encryption. Inf. Process. Manage. 58 (2021). https://doi.org/10.1016/j.ipm.2021.102745
10. Konkin, A., Zapechnikov, S.: Privacy methods and zero-knowledge proof for corporate blockchain. Procedia Comput. Sci. 190, 471–478 (2021). https://doi.org/10.1016/j.procs.2021.06.055
11. Zapechnikov, S.V.: The distributed ledgers ensuring privacy-preserving transactions. Bezopasnost Informacionnyh Tehnology 27(4), 108–123 (2020). https://doi.org/10.26583/bit.2020.4.09
12. Pravikov, D.I., Scherbakov, A.Y.: Changing the paradigm of information security. Highly Avail. Syst. (2), 35–39 (2018). http://radiotec.ru/ru/journal/Highly_available_systems/number/2018-2/article/20178. Accessed 14 Aug 2021
13. Bylinkina, E.V.: Blockchain: legal regulation and standardization. Law Politics (9), 143–155 (2020). https://doi.org/10.7256/2454-0706.2020.9.33614. https://nbpublish.com/library_read_article.php?id=33614. Accessed 14 Aug 2021
14. Blinova, O.: Blockchain and real business: integration problems. Invest-Foresight. Blockchain, 5 Mar 2021. https://www.if24.ru/blokchejn-i-realnyj-biznes-problemy-integratsii. Accessed 14 Aug 2021
15. Budzko, V.I., Melnikov, D.A.: The historical view of the blockchain technology. The more things change, the more they stay the same. IT Secur. (Russia) 25(4), 23–33 (2018). ISSN 2074-7136. https://doi.org/10.26583/bit.2018.4.02
16. NIST: Blockchain Technology Overview. National Institute of Standards and Technology Internal Report 8202 (NISTIR 8202), October 2018. https://doi.org/10.6028/NIST.IR.8202
17. Korolyov, I.: Russia will spend 36 billion rubles on the development of blockchain. What will it give? CNews. IT in the public sector, 20.04.2020. https://www.cnews.ru/articles/2020-04-19_v_rossii_potratyat_36_mlrd_rub_na_razvitie. Accessed 14 Aug 2021
18. Il'chenko, L.M., Bragina, E.K., Egorov, I.E., Zaysev, S.I.: Calculation of risks of information security of telecommunication enterprise. Open Educ. 22(2), 61–70 (2018). https://doi.org/10.21686/1818-4243-2018-2-61-70
19. Budzko, V.I., Keyer, P.A., Senatorov, M.Y.: The disaster recovery solution for HAITS: holistic approach. J. Highly Avail. Syst. 9(5), 14–24 (2008). https://www.elibrary.ru/item.asp?id=10439557. Accessed 14 Aug 2021
20. Natalichev, R.V., et al.: Evolution and paradoxes of the regulatory framework for ensuring the security of critical information infrastructure facilities. IT Secur. (Russia) 28(3), 6–27 (2021). ISSN 2074-7136. https://doi.org/10.26583/bit.2021.3.01
21. Fathi, V.A., Otakulov, A.S.: OSI network model. Mod. Sci. 28(10–2), 545–547 (2020). https://elibrary.ru/item.asp?id=44150156. Accessed 14 Aug 2021
Criticism of the «Chinese Room» by J. Searle from the Position of a Hybrid Model for the Design of Artificial Cognitive Agents

Roman V. Dushkin² and Vladimir Y. Stepankov¹(B)

¹ National Research Nuclear University MEPhI, Moscow 115409, Russia
² Artificial Intelligence Agency, Volkonsky 1st Lane, 15, Moscow 127473, Russia
Abstract. The article reviews the phenomenon of understanding the meaning of natural language and, more broadly, the meaning of the situation in which a cognitive agent is located, taking context into account. A specific definition of understanding is given, lying at the intersection of neurophysiology, information theory, and cybernetics. A scheme of the abstract architecture of a cognitive agent (of arbitrary nature) is given, of which it is claimed that an agent with such an architecture can understand in the sense described in the work. The article also provides a critique of J. Searle's thought experiment «The Chinese Room» from the standpoint of constructing artificial cognitive agents implemented within a hybrid paradigm of artificial intelligence. The novelty of the presented work lies in the application of the authors' methodological approach to the construction of artificial cognitive agents, within which not just the perception of external stimuli from the environment is considered, but the philosophical problem of an artificial cognitive agent «understanding» its sensory inputs. The relevance of the work follows from the renewed interest of the scientific community in the theme of Strong Artificial Intelligence (or AGI). The authors' contribution to the theme consists in a comprehensive consideration, from different points of view, of the understanding of what is perceived by artificial cognitive agents, forming prerequisites for the development of new models and a theory of understanding within the framework of artificial intelligence, which in the future will help to build a holistic theory of the nature of the human mind. The article will be of interest to specialists working in the field of constructing artificial intelligent systems and cognitive agents, as well as to scientists from other fields, first of all philosophy, neurophysiology, psychology, etc.

Keywords: Philosophy of mind · Philosophy of artificial intelligence · Chinese room · Semantics · Perception · Understanding · Learning · Machine learning · Artificial intelligence · Strong artificial intelligence
1 Introduction

"The Chinese Room" is a thought experiment at the intersection of the philosophy of mind and the philosophy of artificial intelligence, proposed in 1980 by John Searle (Searle 1980). It is perhaps the most discussed thought experiment ever proposed in this area. Nevertheless, the question that J. Searle raised regarding "The Chinese Room" has still not been resolved: "Does the Chinese Room understand Chinese?" (Searle 2001). Interestingly, J. Searle in his article tried to answer in advance a wide range of arguments from his future opponents; nevertheless, an astonishingly large number of publications on "The Chinese Room" appeared in subsequent years. The 40 years since the initial publication have, however, seen the most powerful breakthrough in the theoretical understanding of artificial intelligence (AI) and, in particular, in the development and application of applied AI technologies (Dushkin 2019). There are achievements, among other things, in the development of methods for constructing artificial cognitive agents of the general level (Artificial General Intelligence (AGI), or "general-purpose artificial intelligence", which in J. Searle's terminology is called "Strong AI"). For example, the work (Dushkin and Andronov 2019) proposes a model of a hybrid architecture of an artificial cognitive agent which can become a prototype of AGI; this architecture is briefly discussed in the next section. There is therefore a reason to reconsider "The Chinese Room" from these positions.

In the authors' opinion, the methodological problem with J. Searle's "Chinese Room" is that this thought experiment attempted to justify the impossibility of building Strong AI, while the example used was the scheme of a weak AI agent based on a computational approach within the von Neumann architecture. This is roughly the same as asking, of a Homo sapiens, "does any particular neuron (or even a complex of neurons) in its neocortex understand what this person is reading at a given moment?". On the one hand, this is a reference to the so-called "Cartesian theater" (Dennett 1991): according to modern ideas, there is no "center of consciousness" in the human nervous system, so it is the person (his central nervous system) as a whole that understands meaning, not some separate complex of neurons. On the other hand, the description of the thought experiment gave no clear definition of the notion of "understanding", and hence the discussion of whether "The Chinese Room" understands the meaning of the perceived statements slid down to the level of the intuitive approach to artificial intelligence (Turing 1950), which today is not taken seriously by the scientific and engineering communities. Therefore, in this paper we will try to give an operational definition of the phenomenon of understanding meaning by a cognitive agent of arbitrary nature.

Finally, an important question in the philosophy of artificial intelligence, which directly follows from the conclusions of the thought experiment under consideration, is whether artificial cognitive agents (AI systems) will ever be able to understand the meaning of what is happening around them and of the signals received by their sensory inputs.
The answer depends on the nature of “understanding meaning” itself, and this paper argues for the thesis that artificial cognitive agents are fundamentally capable of understanding meaning once “life experience” and context are taken into account.
The novelty of the presented work lies in applying the authors' methodological approach to the construction of artificial cognitive agents (Dushkin and Andronov 2019) not merely to the perception of external stimuli from the environment, but to the philosophical problem of an artificial cognitive agent “understanding” its sensory inputs. This understanding rests on constructing a description of a dynamic model of perceived reality that takes into account the continuity of the acts of perception, the “personal experience” the agent brings to the process of perception, and the context in which the agent is situated. The relevance of the work stems from the renewed interest of the scientific community in the topic of Strong artificial intelligence. The authors' contribution is a comprehensive consideration, from various points of view, of how artificial cognitive agents understand what they perceive, laying the groundwork for new models and a theory of understanding within artificial intelligence, which in the future could help build a holistic theory of the nature of the human mind. The article will be of interest to specialists working on artificial intelligent systems and cognitive agents, as well as to scientists from other fields, first of all philosophy, neurophysiology, psychology, etc. The authors invite all interested readers to an interdisciplinary dialogue on developing models, methods and approaches for studying and building both individual modules and functions of AGI, and Strong AI in general.
2 Briefly About the Hybrid Architecture of AI Agents

The work (Dushkin and Andronov 2019) defines and describes the hybrid architecture of an artificial cognitive agent: a program that perceives information from its operating environment through sensors, acts on the environment with the help of actuators (executive devices), and solves cognitive tasks in the course of its functioning. It makes sense to present this architecture here, briefly summarize its main properties, and give the motivation behind its development.

In this work, the classification of artificial intelligence methods (as an interdisciplinary field of research) is based on distinguishing two paradigms, “top-down” and “bottom-up”, as defined by J. McCarthy and M. Minsky at the MIT Laboratory of Informatics and Artificial Intelligence (Dushkin 2019). The “top-down” paradigm combines the logical and symbolic approaches to building AI agents, which are based on formal logic and symbolic mathematics. The “bottom-up” paradigm includes the structural (artificial neural networks), evolutionary (genetic algorithms), and quasi-biological (using chemical reactions to carry out computations) approaches. Interestingly, the two paradigms have complementary properties. Using the methods of the “top-down” paradigm, one can design and implement AI agents that are difficult to train (all their knowledge is described explicitly during implementation) but that can more or less easily explain the decisions made in the course of their functioning. On the other hand, AI agents based on the “bottom-up” paradigm can be trained easily, which is why most of the methods of
the “bottom-up” paradigm are grouped under the term “machine learning”. However, explaining and interpreting the decisions made by such AI agents is extremely difficult (Shumsky 2020).

The so-called “hybrid paradigm” addresses this by combining the two approaches to building artificial cognitive agents, taking the best of each while mitigating their drawbacks. In other words, AI agents built on the hybrid paradigm can solve complex cognitive problems with “bottom-up” methods while still enjoying the significant benefits of “top-down” methods: modeling, forecasting, and explaining the decisions made.

Figure 1 shows a generalized hybrid architecture of an artificial cognitive agent (a simplification and reworking of the technical scheme from (Dushkin and Andronov 2019)). The diagram shows a standard architecture of an agent operating in an environment, extended with specific details that are explained later in this article. A distinctive feature of the hybrid approach to designing and implementing artificial cognitive agents is the simultaneous use of methods from the “bottom-up” and “top-down” paradigms of artificial intelligence (Dushkin and Stepankov 2021). Here this means that the cognitive agent uses both neural network methods for analyzing input information and symbolic methods for making decisions. In addition, an organismic approach is taken into account: the architecture of an artificial cognitive agent is partially based on the known principles of the functioning of natural cognitive agents. In particular:

1. All information from the environment is perceived by the agent through sensors (there is nothing special in this: all agents are arranged this way, and it is hard to imagine otherwise). The sensor information is then cleaned and fed to the input of the sensory integration unit, which also receives information from sensors monitoring the internal state of the cognitive agent in order to maintain homeostasis of its vital signs. The sensory integration module is based on the neural network approach; the basic processes of perception, recognition and cognition are carried out here.

2. At the same time, information from sensors can be fed directly to the reactive management subsystem, which searches for ready-made patterns of response to the situation in the environment. If such a pattern exists, it is used directly to act on the environment. However, the proactive control subsystem can suppress the reactive circuit if, during its operation, it detects features of the environment description that require a more careful approach to developing the agent's reaction.

3. Multisensory integration consists in creating a holistic description of the situation in the external environment in which the agent finds itself (in this respect, Searle's argument can be juxtaposed with the cross-modal Turing test (Leshchev 2021_1)). This description is fed into the proactive management subsystem to decide how the agent should act.
4. The proactive management subsystem has a memory in which the agent's experience is accumulated (we call this Hierarchical Associative Memory (Stepankov and Dushkin 2021)), as well as tools for modeling the agent's behavior in the environment and predicting the environment's possible states after acts of influence on it. This makes it possible to plan the agent's activities intelligently, resolving an “internal conflict” by choosing the most effective course of action.

5. The selected action option is fed to the module that translates high-level commands into the language of the actuators; this may take the form of sequentially programming the agent's actuators for active interaction with the environment. The agent's own internal states can also be affected through internal actuators.

6. In addition, the proactive management subsystem releases the newly created action program to the reactive management subsystem for rapid response in the future, whenever the environment is in approximately the same state as when the program was developed. From then on, this program is executed automatically by the reactive management subsystem.

The presented scheme of activity of a hybrid artificial cognitive agent is generalized enough to be concretized with specific methods of recognition, modelling, forecasting and all the other tasks listed above, which are solved cyclically by the presented architecture. This allows specific cognitive agents to be designed from this template, working on the basis of various artificial intelligence methods in environments of different natures.

It should also be noted that the presented architecture is suitable for natural cognitive agents as well, humans in particular. Indeed, a person receives information about the external environment through a set of sensory systems, and the body continuously monitors vital signs to maintain homeostasis (Cannon 1926). The thalamus then performs multisensory integration of this information to form a complete picture of perceived reality in the neocortex (Melchitzky and Lewis 2017). If a situation requires instant decision-making and action, the reflex circuit is activated (for example, when a person touches a burning object), although many (but not all) reflexes can be deliberately suppressed. In various zones of the neocortex, basic pattern recognition, recognition of the situation as a whole, modeling, forecasting and decision-making are carried out, followed by programming of the sequence of actions. If the command to act is given, this program is implemented by translating it into a sequence of neuronal activations along the many pathways to the muscles that influence the environment (Ashby 1960). As such commands are brought to automatism, consciousness and conscious thinking become less and less involved in solving similar problems, and the response program itself “descends into the cerebellum” (Schmidt and Tevs 1996). This general scheme of human activity is the motivation underlying hybrid schemes for organizing artificial cognitive agents and, in particular, the hybrid architecture presented in this section.
Fig. 1. Generalized hybrid architecture of an artificial cognitive agent
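To make the cycle of Fig. 1 easier to follow, the sketch below (in Python) traces one pass through it. This is a minimal illustration under our own assumptions, not the authors' implementation: all class and method names are hypothetical, and the “planning” step is a placeholder.

```python
# Minimal sketch of one pass through the hybrid agent cycle of Fig. 1.
# All names and data shapes are hypothetical illustrations.

class ReactiveSubsystem:
    """Fast reflex loop: maps recognized situations to cached action programs."""
    def __init__(self):
        self.patterns = {}  # situation signature -> action program

    def match(self, percept):
        return self.patterns.get(percept["signature"])

    def learn(self, signature, program):
        # Item 6: cache a program developed by the proactive subsystem.
        self.patterns[signature] = program


class ProactiveSubsystem:
    """Slow deliberative loop: integrates senses, models, predicts, decides."""
    def __init__(self):
        self.memory = []  # stand-in for hierarchical associative memory

    def needs_attention(self, percept):
        # Suppress the reflex circuit when the situation looks unfamiliar.
        return percept["novelty"] > 0.5

    def decide(self, percept):
        self.memory.append(percept)       # accumulate "personal experience"
        program = ["orient", "approach"]  # placeholder planning result
        return percept["signature"], program


def agent_step(signature, novelty, reactive, proactive, actuate):
    # Items 1-2: sensing, cleaning, and the reactive shortcut.
    percept = {"signature": signature, "novelty": novelty}
    program = reactive.match(percept)
    if program is None or proactive.needs_attention(percept):
        # Items 3-6: multisensory integration, deliberation, caching.
        sig, program = proactive.decide(percept)
        reactive.learn(sig, program)
    for command in program:               # Item 5: translate to actuators
        actuate(command)


# A trivial run: the first step deliberates, the second reuses the cached program.
reactive, proactive = ReactiveSubsystem(), ProactiveSubsystem()
agent_step("door-ahead", 0.1, reactive, proactive, print)
agent_step("door-ahead", 0.1, reactive, proactive, print)
```

On the first call the reactive subsystem has no matching pattern, so the proactive subsystem deliberates and caches the resulting program; on the second call the cached program fires without deliberation, mirroring items 2 and 6 above.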
3 What is Meant by “Understanding”?

Understanding is one of the most important concepts in the philosophy of consciousness in particular and in philosophy in general. Setting aside a purely anthropocentric definition, understanding can be taken as a certain process that relates in some way to an abstract concept, physical object or phenomenon and that allows one to interact adequately with that concept, object or phenomenon. Understanding is one of the sufficient properties of intellectual behavior (Bereiter 2009). At the same time, in his original work J. Searle gave no exact definition of the phenomenon of “understanding”, but simply asked: “Does the person sitting in the Chinese Room understand Chinese?” (Searle 2001). In fact, it was an appeal
to an intuitive understanding of the phenomenon of “understanding”. By doing so, J. Searle opened up the widest possibilities for various interpretations of his thought experiment, which is why “The Chinese Room” became the most debated experiment in the history of the philosophy of consciousness. J. Searle tried to refute the possibility of creating so-called Strong Artificial Intelligence, that is, a set of technologies that could lead to a self-aware artificial cognitive agent that has “understanding” in the human sense of the word, “whatever that means”. Moreover, such a cognitive agent would probably have consciousness, including phenomenal consciousness. In J. Searle's formulation of this thesis, an “appropriately programmed computer with the right inputs and outputs will be a mind, in the sense that the human mind is a mind” (Searle 1980).

If we turn to the neurophysiological foundations of how natural neural networks operate in learning mode, the mystery of the phenomenon of “understanding” is dispelled. According to modern views (Shumsky 2020), in the process of perception and learning a large number of associative connections are built in the human neocortex between the so-called “neocortical hypercolumns”, each of which consists of a large number of columns that are activated when perceived images of different modalities appear in the receptive fields of the sensory zones of the cerebral cortex. In effect, activation of a neocortical hypercolumn means that, at that moment, the image of the object, phenomenon or abstract concept corresponding to the activated hypercolumn and a specific set of columns within it is involved in a person's flow of thoughts. Throughout human development, from embryo to adult, a huge number of learning acts are continuously carried out in the brain. They consist in building or removing associative connections between semantic concepts, which at the physiological level corresponds to the appearance or destruction of synaptic connections between neurons of different cortical columns (Stout and Khreisheh 2015). Such synapses allow an extensive set of related concepts to be activated when an object or phenomenon is being understood. And the richer a person's life experience, the more associative connections there are in the cerebral cortex and the more associations are involved in the process of thinking (while, of course, the basal ganglia “conduct” the ensemble of activations so that it corresponds to the current goal) (Shumsky 2020). One aspect of neuroplasticity, the building of synaptic connections, is precisely what is responsible for the mass creation of associations in the human neocortex (Chang 2014). An important element of this process is the so-called multisensory integration that occurs in the thalamus and is responsible for the holistic perception of surrounding reality (Jones 2012).

Thus, in the process of teaching a child, multisensory integration forms an astonishing number of associative connections between hypercolumns in the neocortex. For example, the concept “table” will correspond to a hypercolumn in which, by means of sparse coding, activation occurs in response to images of tables of various types appearing before the child, to acoustic waves perceived as the spoken form of the word “table”, and even to images of the letters forming the corresponding written word.
Moreover, the columns that are part of this hypercolumn will activate various related concepts, including the general “table” pattern — the concepts of legs, surface,
horizontality, smoothness, woodenness, etc. Learning what a “table” is means building all these associative connections, which are excited in response to the appearance of a sensory signal of any modality associated with the real-world object “table”. Thus, if such a trained child is asked whether he understands what a “table” is, he will answer positively. And the nature of his understanding lies precisely in the activation of the indicated connections between different representations of the object of reality in his neocortex. In other words, understanding is the recognition and activation of associative relationships. After all, when a person does not understand something, it is because, in the course of cognitive activity, not enough associative connections were activated with objects already present in his personal experience of cognition, so the incomprehensible object cannot be integrated into the hierarchical structure of associative relationships of his cortical hypercolumns.

What happens if a child trained in this way is shown the inscription “桌子” (zhuōzi, “table” in Chinese)? In his neocortex there will be no “response” from the associative connections accumulated through personal experience, and therefore these Chinese characters will not conjure the image of a “template table” before his inner eye. However, with teaching and consolidation of what has been learned, these two symbols will be included in the cognitive field of the concept “table”: at first through translation into the native language, and later by directly exciting all the necessary associations. In this way, the child will learn to understand Chinese, to understand it in the sense described a few paragraphs earlier.

From the above reasoning follows an interesting cognitive bias about understanding to which many people are subject. A person's personality is based on his personal experience (LeDoux 2020), which determines the unique connectome peculiar to that person. The connectome is the entire dynamic set of connections between neurons, from which it follows that the connections between cortical columns and hypercolumns are also unique to each individual. Although at high levels of abstraction people form more or less similar hierarchical systems of concepts, at lower, more concrete levels understanding is based on personal experience. For example, in inwardly concretizing the concept “table”, each person draws on images of specific tables captured in memory as engrams, often acquired in childhood. A more vivid example is the internal representation of imaginary images while reading a work of fiction. Suppose an author describes some locality in which the events of the story unfold. When creating the work, the author surely imagined some specific locality, often from his own childhood. When reading, each reader will picture a locality known to him and him alone from his personal childhood experience. Childhood, because it is at an early age that memory is filled with especially vivid impressions of perceived reality. In other words, each person understands meaning in his own way. And the cognitive bias is the tendency to assume that all people understand what is said and perceived in the same way. This is far from the case. So, J. Searle's original question about “The Chinese Room” does not make much sense even when asked about humans.
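To make the section's thesis concrete (understanding as the activation of associative connections accumulated through multimodal experience), here is a toy sketch. It is our own illustration under strong simplifications, not a model of the neocortex; all names and the “modality:stimulus” encoding are assumptions.

```python
from collections import defaultdict

# Toy associative memory: stimuli of any modality map to concepts, and
# concepts link to related concepts. "Understanding" a stimulus is modeled
# here as the (possibly empty) set of associations it activates.

stimulus_to_concept = {}          # e.g. "image:wooden-table" -> "table"
concept_links = defaultdict(set)  # concept -> related concepts

def learn(stimulus, concept, related=()):
    stimulus_to_concept[stimulus] = concept
    concept_links[concept].update(related)

def understand(stimulus):
    concept = stimulus_to_concept.get(stimulus)
    if concept is None:
        return set()              # no "response": the stimulus is not understood
    return {concept} | concept_links[concept]

# Multimodal training on the concept "table":
learn("image:wooden-table", "table", {"legs", "surface", "horizontality"})
learn("sound:spoken-table", "table")
learn("text:table", "table")

print(understand("text:table"))   # {'table', 'legs', 'surface', 'horizontality'}
print(understand("text:桌子"))    # set(): no associations respond yet
learn("text:桌子", "table")       # teach the new symbols
print(understand("text:桌子"))    # now the same associations are excited
```

The untrained inscription “桌子” activates nothing, and after a single learning act it excites the same field of associations as the native word, which is exactly the trajectory described for the child above.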
4 Will Artificial Cognitive Agents Be Able to Understand Meaning?

If we move from the neurophysiological foundations of memory and understanding to the possibility of artificial cognitive agents (artificial intelligence systems) acquiring the function of understanding, the very possibility at which the “Chinese Room” thought experiment was originally aimed, the picture is not so rosy. Here we see a certain methodological error, which few people notice amid the heated debate. The very formulation of the experiment gives reason to believe that it answers incorrectly the question of whether artificial cognitive agents will ever be able to understand the meaning of perceived information in the same way a person does. J. Searle describes an analogue of the von Neumann architecture of computing systems, in which the central processor is a person sitting in a room. And he asks his question about this person, that is, if we follow the analogy, about the central processor of a computer. However, according to modern views (Chalmers 2018), a person does not have an analogue of a computer processor in the central nervous system; the processing of input information in the human brain is carried out on completely different principles and is built on a different architecture. Nevertheless, the work of the nervous system does not go beyond the computational paradigm: it can be explained in terms of information theory and cybernetics. That is, a person is also a computing system, a quite complex one, but nonetheless one working on the basic principles of mathematics. A similar position is described in (Chalmers 2018, chapter 9, Section 4).

J. Searle asks whether the analogue of the von Neumann architecture presented in the thought experiment understands the input information. In his opinion the answer is negative, and he argues this quite reasonably. However, the methodological error lies in the fact that the system presented in the thought experiment is a so-called weak AI system. A weak AI system is one aimed at solving a specific problem using cognitive methods; a visual image recognition system is an example. Is it reasonable to ask the question of understanding in relation to a weak AI system, if understanding as a phenomenon is the prerogative of strong cognitive agents? The answer is obvious.

In modern artificial intelligence theory (Dushkin 2019; Shumsky 2020), all modern technologies for solving cognitive problems can be divided into several types: pattern recognition, search for hidden patterns, processing and understanding of natural language text, and decision-making. Weak AI agents by definition solve some specific task, that is, a task reducible to one of the four listed areas. Among them, only the processing and understanding of natural language text bears directly on the subject of this work, understanding. The remaining tasks are correlated and related to understanding in various ways, but do not require it as such for their solution. Therefore, weak AI systems that solve these problems, by definition, do not possess understanding. It is nonetheless interesting to consider weak AI systems designed to analyze natural language utterances.
Within artificial intelligence, the task of understanding natural language is formulated as natural language understanding (NLU). At the moment, however, most existing technologies solve the problem of natural language
processing, i.e., natural language processing (NLP), which does not require understanding in the sense described in the previous section. Current NLP technologies, based on formal grammars as well as statistical and neural network approaches, in fact exhibit a reactive model of artificial cognitive agents: judgments about the quality of their functioning and the solution of their task are made by observing their external behavior. If an artificial cognitive agent responds adequately to most of the natural language utterances addressed to it, then it “understands” them.

But, of course, the same questions can be addressed to a human or even to other higher animals. Does a dog really understand, for example? Judging by its behavior, there is no doubt that after a certain number of training acts the dog begins to understand the speech addressed to it. “Let's go for a walk”, says the owner, and the dog jumps happily, wagging its tail. We are most likely not dealing with primitive reflex reactions, since, in recognizing key words in the phrases addressed to it, the dog also takes into account its internal state, expressed in its emotional background and, at the very least, in its memory of recent events. Comparing the structure of the nervous system of dogs and humans, one may assume that to some extent all the arguments from the neurophysiology of perception, recognition and understanding given in the previous section apply to the dog, as well as to other higher animals and even to birds, although the latter have no cerebral cortex. In the nervous systems of animals there are mechanisms for the mass creation of associative connections between groups of neurons responsible for recognizing sensory images or abstract concepts. Thus, in the dog's neocortex, associative connections run from the neurons of the auditory sensory cortex that recognize the phrase “Let's go for a walk” to the areas of memory and positive emotion associated with walking, and then to motor zones with programs for expressing joy. This can really be called understanding, since the mechanism is similar to the human one.

But how could an understanding AI agent be designed? (The question of whether this is necessary is left outside the scope of this work, since here an attempt is made to answer J. Searle's question about understanding.) At the moment, there is at least one example of a natural agent with a more or less comprehensible architecture about which it can be said that it “knows how to understand” in the sense given to this term in this work. Based on this analogy, one can design an understanding AI agent that could become the forerunner of general-level AI (AGI). Yes, it could be argued that “an airplane does not fly like a bird, but it does fly”, and therefore an understanding AI agent does not necessarily have to copy the principles underlying human understanding. However, one has to start somewhere, and generalizing concepts can be sought later, including by considering the cognitive abilities of birds, especially as regards understanding. In the second section of this article, the architecture of a hybrid artificial cognitive agent was presented, on the basis of which one could try to implement the understanding function. Indeed, natural cognitive agents (humans, dogs, parrots, etc.) function according to essentially this architecture.
Therefore, in accordance with the principle stated in the previous paragraph, it is this architecture that should be taken as the basis for an understanding artificial cognitive agent.
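Before turning to design principles, the purely reactive, behavior-judged mode of response discussed above (the dog example, and NLP systems judged only by external behavior) can be made concrete with a deliberately trivial sketch; it is hypothetical and not any production system:

```python
# A purely reactive responder: key-word patterns mapped to canned reactions,
# with no associative memory and hence no understanding in the sense of
# Sect. 3. Hypothetical illustration only.

REACTIONS = {
    "walk": "jumps and wags its tail",
    "food": "runs to the bowl",
}

def react(utterance: str) -> str:
    for keyword, reaction in REACTIONS.items():
        if keyword in utterance.lower():
            return reaction
    return "tilts its head"  # default behavior for unrecognized input

print(react("Let's go for a walk"))  # -> jumps and wags its tail
```

Judged only from the outside, such a responder may appear to “understand”; the point of the architecture below is precisely to go beyond this reactive circuit by adding associative memory and proactive modeling.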
Thus, the design and development of an artificial intelligent agent that could possess understanding can be carried out on the following principles:

1. The AI agent must have a set of various sensors with which it interacts with the environment, receiving from it sensory information of various modalities as well as sensory information about the internal state of the AI agent itself. It is important to have sensors of several modalities, at least two, but preferably more. The question of the upper limit on the number of sensory modalities remains open and requires additional research.

2. Sensory information coming from the environment must be filtered and aggregated directly in the AI agent's sensory systems. The cleaned and aggregated information is then transmitted onward to the reactive control subsystem and to the multisensory integration unit.

3. The reactive management subsystem of an AI agent performs a rapid reflex response to known stimuli. However, its operation can be suppressed by a signal from the proactive management subsystem, which is designed to simulate the future state of the environment and of the AI agent within it for more deliberate decision-making. In addition, the reactive management subsystem can escalate a situation to the proactive focus of attention when something has gone wrong, that is, when in a known situation requiring a reflex reaction the environment does not respond as usual.

4. Signals from the AI agent's sensors are also sent to the multisensory integration unit, where a complete picture of the environment is built using data fusion technology. It is here that the set of associative connections is formed which constitutes the AI agent's memory of everything that has happened to it, its personal experience of functioning.

5. The memory of the AI agent, located in the proactive management subsystem, should be organized on hierarchical and associative principles. After the next package of sensory information has been cleaned, aggregated and integrated by the preceding blocks, associative and hierarchical connections are used to form a complete description of the model of the situation in which the AI agent finds itself, including its context. It is at this step that the AI agent develops an understanding of what is happening.

6. The generated situation model is processed by the proactive management subsystem, in which a set of control actions is created and sent both to the unit that translates commands into low-level code and to the reactive management subsystem for the formation of a new reflex circuit.

7. Through the actuators, the commands translated into low-level code act on the environment, and the cycle repeats from the very beginning.

A special feature of this architecture is that the described cycle of interaction between the AI agent and its operating environment runs continuously and, moreover, without waiting for the previous iteration of the cycle to complete. That is, the AI agent constantly perceives multisensory information from the environment, which undergoes this kind of processing with the formation of a branched associative memory,
which will eventually allow it to understand what is happening, including taking the context into account.

As can be seen, the described scheme of functioning of an understanding AI agent is quite abstract and independent of the specific type of the AI agent itself, the type of environment and the problem area (especially in such contexts as ambient intelligence, ubiquitous computing and smart technologies (Leshchev 2021_2)). This means that the scheme is a template for building understanding AI agents, to be specialized for a given environment and the task to be solved in it. Returning from the abstract template to “The Chinese Room”, the described scheme will allow an AI agent to understand natural language in the sense that a person does, since from the very beginning of its functioning it will accumulate personal experience in its associative memory. The associative memory of such an artificial cognitive agent will constitute a very strongly connected semantic network (Žáček and Telnarová 2019), the activation of whose nodes will represent an “understanding of the current situation”. The set of all nodes of the semantic network activated at a given moment is an act of cognition, a thought of the cognitive agent. The transition from one set of activated nodes to another is then a “stream of thoughts”.

In fact, the presented principles and template scheme for designing and implementing an understanding AI agent open the way to strong AI, or artificial general intelligence (AGI). It is important to note that an AI agent created with multisensory integration capabilities and associative memory may well turn out to acquire internal phenomenal states (Dushkin 2020). However, this remains a matter for further research.
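The “stream of thoughts” picture can likewise be sketched as spreading activation over a semantic network. The following is a minimal sketch under our own assumptions: the graph, weights and threshold are illustrative, not taken from the cited works.

```python
# Spreading activation over a toy semantic network: a "thought" is the set of
# nodes active at one moment; a "stream of thoughts" is the sequence of such
# sets. Nodes and weights are illustrative assumptions.

GRAPH = {
    "table":  {"legs": 0.9, "surface": 0.8, "dinner": 0.4},
    "dinner": {"food": 0.9, "table": 0.4},
    "legs": {}, "surface": {}, "food": {},
}

def stream_of_thoughts(seed, steps=2, threshold=0.5):
    active = set(seed)
    stream = [frozenset(active)]
    for _ in range(steps):
        nxt = set(active)
        for node in active:
            for neighbor, weight in GRAPH.get(node, {}).items():
                if weight >= threshold:  # only strong associations fire
                    nxt.add(neighbor)
        active = nxt
        stream.append(frozenset(active))
    return stream

for thought in stream_of_thoughts({"table"}):
    print(sorted(thought))
# ['table']
# ['legs', 'surface', 'table']
# ['legs', 'surface', 'table']
```

Each printed set is one “thought”; weak associations (here, “dinner”) stay below threshold and do not enter the flow, a crude stand-in for the conducting role attributed to the basal ganglia above.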
5 Conclusion

This paper has argued that J. Searle's thought experiment “The Chinese Room” should be revised and should not be applied to modern approaches to building artificial cognitive agents designed and implemented within the hybrid paradigm of artificial intelligence. The term “understanding” was never clearly defined by the author of the thought experiment, and therefore the question of whether an AI agent understands the meaning of phrases addressed to it in natural language is, as posed, meaningless. Nevertheless, an attempt has been made here to define the phenomenology of understanding in natural intelligent agents and to transfer this definition to agents of artificial nature. A hybrid AI agent architecture has been presented, based on an abstract, high-level view of the processes occurring in the nervous systems of higher animals. AI agents designed and implemented within this architecture can be forerunners of strong AI.

However, unresolved issues remain outside the scope of this work that require further study. It is necessary to work out the principles of organizing hierarchical associative memory for implementing understanding in AI agents, including studying how the understanding of abstract concepts that have no embodiment in reality is arranged. It is also necessary to analyze carefully the similarities and differences between the main components of understanding and decision-making in birds, which lack a cerebral cortex, and in terrestrial animals, which have different types of cortex and architectonics.
Moreover, researchers should also attend to the swarm intelligence systems of the natural environment: ant colonies, bee swarms, bird flocks, etc. The authors will continue their research and experiments in this area. The authors are grateful for the valuable ideas received in discussions with E. Vvedenskaya, E. Gavrilina and Y. Kochubeyev.
References

Ashby, W.R.: Design for a Brain: The Origin of Adaptive Behaviour. Wiley, New York, 304 p. (1960)
Bereiter, C.: Education and Mind in the Knowledge Age. Lawrence Erlbaum Associates, 523 p. (2009). ISBN 0-8058-3942-9
Cannon, W.B.: Physiological regulation of normal states: some tentative postulates concerning biological homeostatics. In: Pettit, A. (ed.) À Charles Richet: ses amis, ses collègues, ses élèves. Les Éditions Médicales, Paris, 91 p. (1926)
Chalmers, D.: The Conscious Mind: In Search of a Fundamental Theory. Librocom, Moscow, series “Philosophy of Consciousness”, 512 p. (2018)
Chang, Y.: Reorganization and plastic changes of the human brain associated with skill learning and expertise. Front. Hum. Neurosci. 8(55), 35 (2014). https://doi.org/10.3389/fnhum.2014.00035
Dennett, D.C.: Consciousness Explained. Allen Lane, The Penguin Press, London, 551 p. (1991). ISBN 978-0-7139-9037-9
Dushkin, R.V.: Artificial Intelligence. DMK-Press, Moscow, 280 p. (2019). ISBN 978-5-97060-787-9
Dushkin, R.V.: On the question of recognition and differentiation of the philosophical zombie. Philos. Thought (1), 52–66 (2020). https://doi.org/10.25136/2409-8728.2020.1.32079. http://e-notabene.ru/fr/article_32079.html
Dushkin, R.V., Andronov, M.G.: Hybrid scheme for constructing artificial intelligent systems. Cybern. Program. (4), 51–58 (2019). https://doi.org/10.25136/2644-5522.2019.4.29809. http://e-notabene.ru/kp/article_29809.html
Dushkin, R.V., Stepankov, V.Y.: Hybrid bionic cognitive architecture for artificial general intelligence agents. Procedia Comput. Sci. 190, 226–230 (2021). ISSN 1877-0509. https://doi.org/10.1016/j.procs.2021.06.028
Jones, E.G.: The Thalamus, 2 vols (reprint of the 1985 edn.). Springer, Boston, 915 p. (2012). ISBN 978-1-4615-1749-8. https://doi.org/10.1007/978-1-4615-1749-8
LeDoux, J.E.: How does the non-conscious become conscious? Curr. Biol. 30, R196–R199 (2020)
Leshchev, S.V.: Cross-modal Turing test and embodied cognition: agency, computing. Procedia Comput. Sci. 190, 527–531 (2021). https://doi.org/10.1016/j.procs.2021.06.061
Leshchev, S.V.: From artificial intelligence to dissipative sociotechnical rationality: cyberphysical and sociocultural matrices of the digital age. In: Popkova, E.G., Ostrovskaya, V.N., Bogoviz, A.V. (eds.) Socio-Economic Systems: Paradigms for the Future. SSDC, vol. 314, pp. 65–72. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-56433-9_8
Melchitzky, D.S., Lewis, D.A.: Functional neuroanatomy: thalamus. In: Sadock, B.J., Sadock, V.A., Ruiz, P. (eds.) Kaplan and Sadock's Comprehensive Textbook of Psychiatry, 10th edn., vol. 2, pp. 158–170. Lippincott Williams & Wilkins (2017). ISBN 978-1451100471
Schmidt, R., Tevs, G. (eds.): Human Physiology, vol. 1, pp. 107–112. Mir, Moscow (1996). ISBN 5-03-002545-6
Searle, J.: Minds, brains, and programs. Behav. Brain Sci. 3(3), 417–424 (1980). https://doi.org/10.1017/S0140525X00005756
Searle, J.: Chinese Room argument. In: The MIT Encyclopedia of the Cognitive Sciences, pp. 115–116. MIT Press, Cambridge (2001). ISBN 0262731444
Shumsky, S.A.: Machine Intelligence: Essays on the Theory of Machine Learning and Artificial Intelligence. RIOR, Moscow, 340 p. (2020). ISBN 978-5-369-01832-3
Stepankov, V.Y., Dushkin, R.V.: Hierarchical associative memory model for artificial general-purpose cognitive agents. Procedia Comput. Sci. 190, 723–727 (2021). ISSN 1877-0509. https://doi.org/10.1016/j.procs.2021.06.084
Stout, D., Khreisheh, N.: Skill learning and human brain evolution: an experimental approach. Camb. Archaeol. J. 25(4), 867–875 (2015). https://doi.org/10.1017/S0959774315000359
Turing, A.: Computing machinery and intelligence. Mind LIX(236), 433–460 (1950). https://doi.org/10.1093/mind/LIX.236.433
Žáček, M., Telnarová, Z.: Language networks and semantic networks. In: Central European Symposium on Thermophysics 2019 (CEST), AIP Conf. Proc. 2116(1), 060007 (2019). https://doi.org/10.1063/1.5114042
Walking Through the Turing Wall

Albert Efimov1,2, David I. Dubrovsky3, and Philipp Matveev1,4(B)

1 Sberbank Robotics Laboratory, Moscow, Russian Federation
2 National Research Technology University “MISiS”, Moscow, Russian Federation
3 Institute of Philosophy of the Russian Academy of Sciences, Moscow, Russian Federation
4 Lomonosov Moscow State University, Moscow, Russian Federation

Abstract. Can machines that play board games or recognize images only in the comfort of the virtual world be intelligent? To become reliable and convenient assistants to humans, machines need to learn to act and communicate in physical reality just as people do. The authors propose two novel ways of designing and building Artificial General Intelligence (AGI). The first seeks to unify all participants in any instance of the Turing test (the judge, the machine, the human subject, and the means of observation) instead of building a separating wall. The second aims to design AGI programs in such a way that they can move through various environments. The authors thoroughly discuss four areas of interaction for robots with AGI and introduce the new idea of the techno-umwelt, bridging artificial intelligence with biology in a new way.

Keywords: Artificial intelligence · Turing test · Post-Turing methodology · Techno-umwelt · Intelligent robotics · Artificial general intelligence · AGI
1 Introduction

A. Turing's paper Computing Machinery and Intelligence was first published in 1950, quite a long time ago. However, the history of the idea of a humanlike creature endowed with artificial intelligence goes back much further than the last century. Even Aristotle seriously considered the “automation” of reasoning by formalizing it with syllogisms, i.e., logical premises and conclusions that serve as elementary building blocks of rational thinking. In the 1930s, Kurt Gödel formulated and then proved his incompleteness theorems, according to which no system of formal arithmetic can be complete and internally consistent at the same time; in other words, there is no such system that allows one to prove or disprove any given statement. This puzzled many researchers for a while. But soon Alan Turing and Alonzo Church introduced the concept of a computable function (one solvable in some system) and showed that such functions can be computed not by formulas but algorithmically, for example, by a Turing machine [1]. Turing's thesis, in its simplest form, says that a universal Turing machine can perform any computation that a human can do [3]. This idea, surprising in its simplicity and depth, paved the way for the emergence of the first computers, on which Turing himself worked during the Second World War.
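As a reminder of what “algorithmically” means here, below is a minimal Turing machine interpreter in Python. It is a standard textbook construction included purely for illustration; the interpreter and the bit-flipping example machine are ours, not taken from the cited sources.

```python
# A minimal Turing machine interpreter. The transition table maps
# (state, read symbol) -> (next state, symbol to write, head move).

def run_tm(input_tape, transitions, state="start", blank="_", max_steps=1000):
    tape = dict(enumerate(input_tape))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, blank)
        state, write, move = transitions[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape))

# Example machine: flip every bit of a binary word, then halt on the blank.
flip = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt", "_", "R"),
}

print(run_tm("0110", flip))  # -> 1001_
```

Despite its bareness, such a machine suffices, by the Church-Turing thesis, to express any computation a human could carry out by rote.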
At that time the British scientist already spoke of creating an “intelligent machine”; the term “artificial intelligence” had not yet been coined.

Today, a lot of effort is being put into building Artificial General Intelligence by various institutions at the national level. Notably, in Russia Sberbank, a leading Russian technology corporation, has launched a large-scale AGI research program, attracting the world's top talent such as J. Schmidhuber [3]. The authors of this paper are working on the philosophy and methodology of this AGI research program and want to share their views on it. In particular, we have jointly developed some novel approaches to a cognitive architecture for future AGI. Combining the Narrow AI approach with a more general one may prove fruitful.

In recent years, the pressing issues of AI development have been widely discussed at high-level conferences such as Artificial General Intelligence [4], Robophilosophy [5] and others. Notably, the issues raised by Turing 70 years ago provoked discussion at the important conference “Beyond Turing” [6]. It was organized by G. Marcus, and such researchers of artificial intelligence and robotics as B. Lenat, K. Forbus, S. Scheiber, T. Podgio, E. Meires, S. Adams, G. Banavar, M. Campbell, C. Ortiz, L. Zhitnik, A. Agraval, S. Antol, M. Mitchell, H. Kitano, V. Jarrold, O. Etzioni and others took part in it. Many papers at that conference were dedicated to novel ways of testing robotics and artificial intelligence, and substantiated proposals were made on using embodied intelligence to create AGI models. Many of these researchers simply try to transfer the Turing test methodology by using a robot instead of an abstract computing machine that simulates a human conversation. One of the original ideas from the “Beyond Turing” conference, for example, was that of using AI as an independent factor in scientific discoveries. Today, the head of the AI program at Sony Corporation, H. Kitano, believes that the Turing test can no longer be a criterion for creating artificial intelligence, and that at the existing level of technology it is possible to develop an “AI system that can make major scientific discoveries in biomedical sciences, and that is worthy of a Nobel Prize” [7]. Additionally, it is necessary to note individual works and works by groups of researchers such as W. Nöth [8], A. Clark [9], H. Ishiguro [10], S. Penny [11] et al.

Overall, there are three approaches to a long-term research program in AGI: connectionism, logical representation and embodied intelligence. The authors of this paper side with the last one. It is strongly supported in the works of R. Brooks [12], Clark [13] and many others, who maintain that there is a connection between the cognitive functions of intelligence (both human and machine) and physicality, and who are convinced that the classical view of machine functionalism on the role of representations in cognition is “too cerebral” [14]. Clark gives the example of a study of female crickets, which localize the source of the male's song using a unique sound source localization system. Clark argues that this process is carried out entirely without internal representations of the surrounding world, relying instead on the mechanics of the female cricket's body. Similar mechanisms are used by people to solve everyday problems [9].
Some researchers bridge robotics and AGI with biology and semiotics by attempting to implement the idea of the umwelt for robotics. For example, it is clear that a robot perceiving the world solely via the radio waves emitted and absorbed by
radars cannot understand the color “red”. Thus, a robot might have a kind of limited umwelt, similar to the umwelt of an insect. A few authors discuss this, addressing the issues of the umwelt for robots and the semiotic meaning of perception and artificial intelligence [8]. However, their discussion is limited to the application of robotics to the physical world. A very thorough discussion of the issues of virtual humans, AI and embodiment can be found in Burden et al. [15]. AGI and robotics are developing very rapidly, and their definitions are evolving just as fast. For definitions and a clearer understanding of the issues of modern robotics, including intelligent robotics and the use of AI in robotics, one should consult Murphy's handbook [16].
2 Chess and Ciphers

Many experts today believe that before embarking on the creation of artificial intelligence, one should figure out the nature and structure of natural intelligence. Turing, however, saw the problem in a completely different way. He inherited the ideas of René Descartes, who considered living organisms to be fully automated beings and believed that full-fledged consciousness and thinking are characteristic of humans alone. In this world view, the human mind and intellect are separated from the real world as part of a different “sphere of consciousness”. Likewise, in Alan Turing's thought, the intellect did not specifically depend on its physical carrier. In his famous 1950 paper Computing Machinery and Intelligence he identified several areas representing the “highest manifestations” of human intelligence that should be modelled in the future [17]: the study of languages (and translation), games (chess, etc.), mathematics, and cryptography (including the solving of riddles). If in these fields of activity a computer cannot be distinguished from a human, Turing claimed, then we should consider their thinking equivalent and can say that we are dealing with an “intelligent machine”.

It is not that Turing thought the most prominent thing about a person was the ability to play chess, conduct sublime dialogues or solve cryptographic riddles. Turing understood that creating intelligent machines with abilities truly comparable to humans' would require more than keeping the machine away from the physical world: in a 1948 report to the National Physical Laboratory, he wrote that such a machine “would not be able to evaluate such things so important to humans as food, sports, or sex” [18]. Creating a machine capable of interacting with the real world is the path to a more assured artificial intelligence, but Turing considered this path longer and more expensive than teaching a computer to play chess. Although his article was written more than 70 years ago, it set the conceptual foundations for many generations of researchers in artificial intelligence [19]. According to Turing's approach, the high-level intellectual functions of the brain can be reproduced in an artificial system without the system imitating anything in the physical world. The test described in his article embodied precisely these ideas.
3 Inside and Outside the Wall

In devising the test, Turing started from the Victorian “imitation game”. According to its rules, a presenter, exchanging notes with the players, must determine which of them is a woman and which is merely pretending to be one. The “referee” is separated from the players by a wall impenetrable to everything except symbolic information, such as notes or, in modern terms, chat messages. This game can be seen as an “intelligence test” for a man who has to imitate “feminine” (in Victorian terms, of course) behaviour and reactions. The Turing test transferred this situation to a game with a computer, which must simulate a living person while hidden from the judge by the same “wall”. The wall seems an indispensable element of the test, because without it we would immediately see whom we are dealing with. It hides the physical reality of the conversation partner and reduces his entire thinking to a certain limited set of processes. At the same time, even Turing himself admitted that comprehensive human knowledge of the world is impossible without direct interaction with that world. However, at the time, imitating such activities as sports or sex seemed completely unthinkable, so the British scientist postponed them to the indefinitely distant future and suggested focusing on games, languages, and cryptography.

As a result, Turing launched a kind of race between man and machine exclusively in virtual space. The idea of such a test stimulated the development of systems that performed certain narrow functions better than humans, whether playing chess, translating, or driving a car, systems even ready to replace us in one area or another. The narrowness of the intelligent machine's capabilities was laid down in Turing's paradigmatic idea, which limited intellect to simple verbal, symbolic communication and ignored all other modalities. Can an intelligence that plays chess, chats, and solves riddles be called a general AI? Hardly. As long as a machine (a robot or a computer) remains separated from the person and the world by a wall, it is unable to interact with them fully, and the machine's true intelligence is replaced by the complexity of the functions it implements. That is probably enough for an unmanned vehicle or a chess program, but it is not enough in the pursuit of general AI. Without a paradigm shift we cannot proceed: we must “break the wall” and move on to a new, post-Turing methodology. The post-Turing methodology implies that all the elements of the “test” mentioned by Turing constitute a single whole and are seen as a complex: the observing judge, the subject (a person or a computer), and the questioning tool (the wall turns into a rich interactive environment, a sort of interface between machine and person).
4 Verification by Dating a Girl

For an easier explanation of the post-Turing methodology, we refer to the thought experiment “verification by dating a girl” proposed by Alexeev [2]. Suppose a young man meets a girl on an online dating platform. After a long virtual conversation the young man finally invites her out on a date, only to discover that he has been talking to a program all this time. This rather embarrassing discovery is equivalent to artificial intelligence successfully passing
the Turing test in its classic version. Expecting technological evolution to continue on its current course, Alexeev suggested that “in the near future, the ‘Dating Girl' scenario will also come true” [20]. However, even if this scenario is realized, it will make no practical sense, because interaction “through the Turing wall” deprives the machine of comprehensive, direct and useful interaction with people and the world. To clarify this, let us imagine a different ending to the same experiment. Suppose the young man meets the girl in a cafe and she is real and alive. However, the offline conversation does not go so well: it turns out they do not have much in common. The young man discovers that the witty and apt remarks the girl made when chatting online were prompted automatically by artificial intelligence. Disappointed, the young man goes home and writes to her that he is embarrassed by the meeting, but she replies with a quote from his favorite TV series, and the interaction resumes.

Thus, the wall separating the interactions of human and machine brings no value to AGI development. The wall is superfluous; it is no longer needed to assess the degree of artificial intelligence or of its interaction with people. At the same time, the computer turns out to be emotionally closer and more understandable than even a human conversation partner; it is “more human than the human himself”. In this regard, one can again recall Plato, with his “eternal ideas” and real objects as their manifestations. Artificial intelligence, unencumbered by the Turing wall, can embody the “idea of intelligence” in the same way a person does: in its own right and in interaction with a human. The person knows whom he is dealing with and realizes that he is better off with the machine: he finds it more interesting, more useful and more reliable.
5 Post Turing

A real thinking machine should be the product of versatile interactions with humans and the outside world: verbal and non-verbal, taking place in both virtual and real environments. The classical Turing test, by contrast, covers only verbal and virtual interaction, as do the Winograd schemas and most other popular tests of artificial intelligence. This is not surprising, because they all exist within the paradigm set by Turing, “behind the wall”. Breaking it down means moving into the field of non-verbal and real assimilation of the world by artificial intelligence. Today we realize that many animals, even including cephalopods, possess certain forms of consciousness. And in every case, thinking and its manifestations turn out to be tied to the real conditions in which the living creature exists, to its corporeality and its motor skills. According to Dubrovsky [21], mental phenomena arose only in organisms that are active in the external environment. It seems that complete knowledge of the surrounding world is essentially impossible without physically interacting with it. Therefore, a condition for creating “general” artificial intelligence will be its capability to work in different modalities and environments. It needs a gateway to the non-verbal and physical fields.
Fig. 1. The Turing continuum [22].
The idea of all kinds of robots (or any machines) interacting in the verbal/non-verbal and physical/virtual areas is represented graphically in Fig. 1. Two axes yield four dimensions from a robot's (or a machine's) point of view: 1) verbal interactions in the virtual world; 2) verbal interactions in the physical world; 3) non-verbal interactions in the virtual world; 4) non-verbal interactions in the physical world. These four dimensions (referred to below as “techno-umwelts”) cover all possible interactions for all kinds of machines, which makes them important and worth discussing in detail.

Verbal Interaction in the Virtual World. The history of the emergence of AI research described above shows why most of the tests (thought experiments) developed before 2008 lie in this quadrant. Testing various verbal communication skills is the basis of the canonical Turing test, the Lady Lovelace test, the Colby test, the Searle test and the Block test. In all these tests a person acts in the virtual world of his imagination and of a computer program. The interface is standard: a display, a keyboard and a mouse. An example of such interaction is a user communicating with a banking program.
Causal connections within already existing systems of concepts form the basis of a machine's responses, even if the computer can assess the social motives of its conversation partner.

Verbal Interaction in the Physical World. There are very few examples of Turing tests in this quadrant; they are hard to devise because we do not yet have truly human-level robotics and artificial general intelligence combined in one universal machine. Essentially, a real robot with a fully-fledged AGI would have to be tested here if it were to pass any Turing test at all. There are no known successful examples of robots capable of communicating with humans and interacting with the physical world at the same time. R. Brooks [12] notes that it might take ages before robots reach this quadrant by becoming capable of operating at the level of an 8-year-old child. S. Harnad suggested that his Total Turing test would be placed in this quadrant.

Non-verbal Interaction in the Virtual World. An example of such interaction is a duel between characters in computer games. Although A. Turing pointed out its importance, this area of tests was long ignored by researchers. Recognition of images and human speech, as well as their synthesis, are examples of non-verbal interaction in the virtual world. Influence from the physical world is fundamentally absent, and even if a machine recognizes a person's speech (for example, a player's), it determines only the words, ignoring their meaning. Another example of non-verbal interaction in the virtual world is the actions or emotions of virtual avatars, which carry a heavy semantic load while no verbal information is transmitted at all [22].

Non-verbal Interaction in the Physical World. Put extremely simply, this quadrant is the automatic barrier that must be lifted when a computer recognizes a person's face using a camera. Everything becomes much more complicated when it is necessary to imitate the actions of a person, or when a robot must move freely around an apartment or a hospital corridor, deliver parcels to people and receive objects from them. The virtual space created by people has quantifiable, programmable characteristics and is very limited in its varieties. By contrast, reality is inexhaustible. In the physical world the role of chance increases sharply, and abstraction becomes a separate task [23]. This quadrant of Turing-like tests is the most challenging because it directly depends on combining powerful artificial intelligence with advanced robotics. Researchers have simply been ignoring this area since the 1940s, following the example set by A. Turing himself. Meanwhile, its significance for communication between people is self-evident (consider gestures, for example) and is emphasized by all researchers of communication. One potential test at this level of robotics development involves comparing an android and a person: the machine pronounces only previously recorded phrases of a person but resembles him as closely as possible [10]. The intellectual abilities of a machine can also be determined with the Brooks test [12], based on non-verbal interaction. The Brooks test, probably the hardest to implement in this context, engages all four areas/quadrants of Turing-like tests.
Examples of artificial intelligence coping with non-verbal tasks include already existing systems capable of playing computer games, or the virtual TV presenter Elena (created at the Sberbank Robotics Laboratory; she can fully imitate a real TV presenter, her
movements, emotions and gestures). However, neither system leaves the confines of the virtual world. Real interaction with humans in the physical world remains an extremely difficult task. This is not enough for artificial general intelligence to come true, as such a machine must cover all four areas of interactions and environments [22].
6 The Advent of "Techno-Umwelts"
In the early 20th century, the eminent biologist Jakob Johann von Uexküll noticed that different living creatures have perceptual worlds that differ from those of other species and are peculiar to their own, which he called "umwelts". By analogy, we propose to call the four areas of possible machine interaction "techno-umwelts". A techno-umwelt is a domain of world perception, the way a machine sees the world around it. Everyone knows what a personal umwelt is, and many have seen the "techno-umwelt" of unmanned vehicles using radars and lidars in videos [22]. It seems acceptable to draw a parallel between the post-Turing architecture of an intelligent robot and biological evolution, where an environment (an umwelt) played a key role in the adaptation of biological species. Let us try to compare one of the techno-umwelts to the area where life arose on Earth: the World Ocean. In this case, the emergence of intelligent robots from the first techno-umwelt can give them new, adaptive features, just as the "blind watchmaker" of evolution has, over millions of years, given new opportunities to living creatures that came to land from the ocean. A transition to the next techno-umwelt could mean an upgrade to the next range of features for robots. The point is not that a robot that must autonomously move on land suddenly learns how to swim autonomously. The point is that the capabilities of a robot that has succeeded in one of the techno-umwelts should be gradually transferred, like skills, to another techno-umwelt. At the same time, in the evolutionary cycle of the development of intelligent, embodied robots, the role of the human creator increases: while observing the course of this evolution, the creator can endow robots with additional technical capabilities.
The above profile of human-machine interactions (verbal/non-verbal and virtual/physical) gives four independent "techno-umwelts": verbal virtual, non-verbal virtual, verbal physical, and non-verbal physical. The versatility of Artificial General Intelligence (AGI) is only possible when a machine is capable of shifting freely between all four "techno-umwelts". The current generation of AI is capable of recognizing objects of different classes without prior training. This is a most important achievement, but it has nothing to do with the capability of working in different "techno-umwelts". To achieve this, it will be necessary to implement a kind of "translator" from the language inherent to one perceptual world to the language of another. Only then will artificial intelligence be able to become truly multimodal, solve a whole range of potential tasks, and fully "communicate" with a person. There are many ideas for AGI implementation, such as virtual personal assistants solving different kinds of puzzles or playing board games, and so on. However, none of these will be representative of true AGI, as they are limited to just one techno-umwelt, rendering their experience of that particular techno-umwelt useless for another one. For
example, do not ask an AGI-enabled virtual personal e-mail manager to control a self-driving car: it has no such capabilities. However, almost every adult human can drive a car and answer emails (though preferably not both at the same time). Humans have an innate ability to act in different environments: we are better than machines in the physical world, but we struggle to compete with machines in the virtual world, as it is not something inherent to us.
7 Conclusions: New Cognitive Architectures
The post-Turing approach to AGI methodology allows us to design novel architectures for cognitive systems. For example, instead of separated, silo-like intelligent machines working in a sense-think-act paradigm in various environments, we could build architectures universal for all techno-umwelts. Of course, we need low-level integration for the fusion of a robot's skills acquired in various techno-umwelts. Techno-codes, translating experiences from one techno-umwelt into another for use by a robot or a machine, might be such a basis. This resembles the case where the same machine can drive as well as answer emails for its owner. To summarize, the authors have proposed two things. Firstly, we need to join together the subject, the object, and the observation tool in one unified testbed. Thus, the Turing test will be transformed into a post-Turing one. There is no need for a wall separating the subject and the object; this wall only makes things worse, creating competition between humans and machines. Future AGI tests should seek better performance of robots and humans working without any "walls" and be open to all sorts of interactions: verbal, non-verbal, virtual and physical. Secondly, we need to focus our efforts on designing and building machines capable of operating autonomously in various techno-umwelts rather than just in one at a time. The same robot (or AGI) should be able to autonomously answer questions and drive a car. A specialization profile is for insects and old machines, not for humans or the AGI-enabled robots of the future.
The emergence of AGI will forever change our interactions with technology. After millennia of philosophical reflection and scientific and technological progress, for the first time in history people will encounter some truly "smart" things: devices that may possess an even more comprehensive and accurate knowledge of the world and of us than we ourselves do. These processes have already begun today, and we are beginning to "dissolve" in the technologies and gadgets that surround us everywhere. The very notion of "man" is being blurred. As computers master new areas of activity, be it chess or translation, these areas can no longer be considered an exclusive prerogative of a person. Perhaps being a person is something that a machine is not yet capable of imitating. Human engineers can create a machine that autonomously gets from point A to point B, but one has to be a philosopher to see where point B should be.
References
1. Turing, A.: On computable numbers, with an application to the Entscheidungsproblem. Proc. London Math. Soc. s2-42, 230–265 (1937). https://doi.org/10.1112/plms/s2-42.1.230
2. Turing, A.: Computability and λ-definability. J. Symb. Log. 2, 153–163 (1937). https://doi.org/10.2307/2268280
3. Efimov, A., Vidyakhin, A.: Artificial General Intelligence: On Approaches to the Supermind. Intellectualnaya Literatura, Moscow, Russia (2021)
4. Goertzel, B., Panov, A., Potapov, A., Yampolskiy, R. (eds.): Artificial General Intelligence. 13th International Conference, AGI 2020, St. Petersburg, Russia, September 16–19, 2020, Proceedings. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52152-3
5. Seibt, J., Funk, M., Coeckelbergh, M., Nørskov, M., Loh, J. (eds.): Envisioning Robots in Society – Power, Politics, and Public Space: Proceedings of Robophilosophy 2018. IOS Press, Amsterdam, Netherlands (2018)
6. Marcus, G., Rossi, F., Veloso, M.: Beyond the Turing test. AI Mag. 37, 3–4 (2016). https://doi.org/10.1609/aimag.v37i1.2650
7. Kitano, H.: Artificial intelligence to win the Nobel Prize and beyond: creating the engine for scientific discovery. AI Mag. 37, 39–49 (2016). https://doi.org/10.1609/aimag.v37i1.2642
8. Nöth, W.: Semiosis and the umwelt of a robot. Semiotica 2001 (2001). https://doi.org/10.1515/semi.2001.049
9. Clark, A.: Reasons, robots and the extended mind. Mind Lang. 16, 121–145 (2001)
10. Ishiguro, H.: Android science. In: Thrun, S., Brooks, T., Durrant-Whyte, H. (eds.) Springer Tracts in Advanced Robotics, pp. 118–127. Springer, Heidelberg. https://doi.org/10.1007/978-3-540-48113-3_11
11. Penny, S.: What robots still can't do (with apologies to Hubert Dreyfus) or: deconstructing the technocultural imaginary. FAIA 311, 3–5 (2018). https://doi.org/10.3233/978-1-61499-931-7-3
12. Brooks, R.: Steps toward super intelligence IV, things to work on now. https://rodneybrooks.com/forai-steps-toward-super-intelligence-iv-things-to-work-on-now/
13. Clark, A.: Can philosophy contribute to an understanding of artificial intelligence? http://undercurrentphilosophy.com/medium/can-philosophy-contribute-to-an-understanding-of-artificial-intelligence/
14. Spitzer, E.: Tacit representations and artificial intelligence: hidden lessons from an embodied perspective on cognition. In: Müller, V.C. (ed.) Fundamental Issues of Artificial Intelligence. SL, vol. 376, pp. 425–441. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-26485-1_25
15. Burden, D., Savin-Baden, M.: Virtual Humans: Today and Tomorrow. CRC Press, Boca Raton (2019)
16. Murphy, R.: Introduction to AI Robotics. MIT Press, Cambridge (2019)
17. Turing, A.: Computing machinery and intelligence. Mind LIX(236), 433–460 (1950)
18. Turing, A.: Intelligent Machinery. Report for National Physical Laboratory (1948). https://pdfs.semanticscholar.org/b6b8/523c4e6ab63373fdf5ca584b02e404b14893.pdf. Accessed 22 Jun 2021
19. Ackerman, E.: Can Winograd schemas replace Turing test for defining human-level AI? https://spectrum.ieee.org/automaton/artificial-intelligence/machine-learning/winograd-schemas-replace-turing-test-for-defining-humanlevel-artificial-intelligence
20. Alekseev, A.: Kompleksnyj test T'juringa: filosofsko-metodologicheskie i socio-kul'turnye aspekty [Turing comprehensive test: philosophical, methodological and socio-cultural aspects]. IInteLL, Moscow, Russia (2013)
21. Dubrovsky, D.I.: The hard problem of consciousness. Theoretical solution of its main questions. AIMS Neurosci. 6, 85–103 (2019)
22. Efimov, A.: Post-Turing methodology: breaking the wall on the way to artificial general intelligence. In: Goertzel, B., Panov, A.I., Potapov, A., Yampolskiy, R. (eds.) AGI 2020. LNCS (LNAI), vol. 12177, pp. 83–94. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52152-3_9
23. Richert, A., Müller, S., Schröder, S., Jeschke, S.: Anthropomorphism in social robotics: empirical results on human–robot interaction in hybrid production workplaces. AI Soc. 33(3), 413–424 (2017). https://doi.org/10.1007/s00146-017-0756-x
A Controlled Adaptive Network Model for Joint Attention
Dilay F. Ercelik1 and Jan Treur2
1 Faculty of Brain Sciences, University College London, London, UK
[email protected]
2 Social AI Group, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
[email protected]
Abstract. Joint or shared attention is a fundamental cognitive ability, which manifests itself in shared-attention episodes where two individuals attend to the same object in the environment. Network-oriented modeling provides an explicit framework for laying out this attentional process from the perspective of the individual initiating the episode. To this end, we describe an adaptive network with two reification levels and clearly explain the role of its states. We conclude with some suggestions for extending this modeling work and thinking about the potential use-cases of more developed models. Keywords: Joint attention · Network model · Adaptive · Second-order
1 Introduction
Joint attention (or shared attention) is a cognitive ability or process that emerges when two individuals attend to the same object in the environment in an overlapping time period, provided they are also both aware of each other's attentional state. This is referred to as a "shared-attention episode". Accordingly, a shared-attention episode requires two individuals (or agents), with one of them acting as the initiator: the initiator drives the shift in the gaze of their partner, effectively leading them to the object at the center of the episode. Although joint attention was primarily studied in the visual modality (gaze behaviours: gaze leading, gaze response and monitoring), especially in the early years of research, it is now understood and investigated as a multimodal phenomenon. Indeed, the initiator may, for instance, support their gaze action with a hand gesture, e.g., pointing at the object to further draw the partner's attention. See also (Stephenson et al. 2021). Joint attention is thought to be relatively important in social cognition and development, and figures among the many factors that come into play in infant vocabulary learning (Akhtar and Gernsbacher 2007) and social group maintenance (Manninen et al. 2017), for instance. The literature on infant development has shed light on the importance of joint attention in social cognition, essentially revealing that it needs to be toned down: indeed, early word learning can occur without joint attention, in typical and autistic development, as well as in Williams and Down Syndrome (Akhtar and Gernsbacher 2007); at best, joint attention ought to be understood as a composite (not necessary and
sufficient) phenomenon relating to social cognition, without which the latter can very well emerge and develop. However, this should not undermine the interest in describing and giving accounts of joint attention. Rather, there is much interest in better understanding the complex relationship between joint attention and social cognition, which, as described above, is not straightforward. In this direction, the first step is to begin with joint attention alone. With this rationale, this paper aims to precisely lay out some key components that make up a shared-attention episode, from the perspective of the initiator. In particular, we aim to create a computational model of joint attention from the initiator's standpoint. To achieve this, we turn to network-oriented modeling, a computational tool that enables the conceptualization and simulation of "inherently complex behavior" (Treur 2020). Network-oriented modeling is a useful framework for understanding joint attention because it enables the modeling of adaptive networks (i.e., networks whose structures are dynamic), and shared-attention episodes are dynamic in nature: the attentional and action states (among others) of the two partners change over time during the episode, for instance with regard to connection strengths between states. Adaptive networks from the network-oriented modeling literature are thus well suited to conceptualizing the emergence of joint attention from the initiator's perspective.
2 The Adaptive Modeling Approach Used
Although more details about network-oriented modeling can be found in Treur (2020), some key components should be addressed before describing the adaptive network for joint attention in the next section. Such networks have a "base level", which can be thought of as the basic temporal-causal network itself. This level comprises "states" or variables, where each variable symbolizes the occurrence of an event in the shared-attention episode. Connections between variables indicate the direction of the causal relationship between any two variables; they are further defined by weights. Adaptive temporal-causal networks go a step further in that they incorporate higher-order levels, where each additional level adaptively changes some components of the lower-order level. Most adaptive networks have one or two of these higher-order levels, also called reification levels. This approach can be described as follows. According to the network-oriented modeling approach described in (Treur 2020), a network model is characterised by:
• connectivity characteristics: connections from a node (or state) X to a node Y and their weights $\omega_{X,Y}$
• aggregation characteristics: for any node Y, some combination function $c_Y(..)$ defines the aggregation applied to the single impacts $\omega_{X,Y} X(t)$ on Y through its incoming connections from states X
• timing characteristics: each node Y has a speed factor $\eta_Y$ defining how fast it changes for a given (aggregated) impact
The difference (or differential) equations that are useful for simulation purposes and also for analysis of network dynamics incorporate these network characteristics $\omega_{X,Y}$, $c_Y(..)$, $\eta_Y$: it holds

$$Y(t + \Delta t) = Y(t) + \eta_Y \left[ c_Y\!\left(\omega_{X_1,Y} X_1(t), \ldots, \omega_{X_k,Y} X_k(t)\right) - Y(t) \right] \Delta t \quad (1)$$

for any state Y, where $X_1, \ldots, X_k$ are the states from which it gets its incoming connections. Examples of useful combination functions are:
• the simple logistic sum function $\mathrm{slogistic}_{\sigma,\tau}(..)$ defined by:

$$\mathrm{slogistic}_{\sigma,\tau}(V_1, \ldots, V_k) = \frac{1}{1 + e^{-\sigma(V_1 + \cdots + V_k - \tau)}} \quad (2)$$

• the advanced logistic sum function $\mathrm{alogistic}_{\sigma,\tau}(..)$ defined by:

$$\mathrm{alogistic}_{\sigma,\tau}(V_1, \ldots, V_k) = \left[ \frac{1}{1 + e^{-\sigma(V_1 + \cdots + V_k - \tau)}} - \frac{1}{1 + e^{\sigma\tau}} \right] \left(1 + e^{-\sigma\tau}\right) \quad (3)$$
This function is obtained from the simple logistic sum function by subtracting its value for sum 0 and rescaling the result to the [0, 1] interval. The aforementioned concepts enable the design of network models, and of their dynamics, in a declarative manner based on mathematically defined functions and relations. Realistic network models are usually adaptive: their network characteristics are often adapted over time. Therefore, their dynamics are usually an interaction (sometimes called co-evolution) of two sorts of dynamics: dynamics of the nodes (or states) in the network (dynamics within the network) versus dynamics of the characteristics of the network (dynamics of the network). Dynamics of the network's nodes are modeled by declarative mathematical functions and relations. In contrast, the dynamics of the network characteristics are traditionally described in a procedural, algorithmic, non-declarative manner, which leads to a hybrid type of model. But by using self-models within the network, a network-oriented conceptualisation can also be applied to adaptive networks to obtain a declarative description using mathematically defined functions and relations; see (Treur 2020). This works through the addition of new nodes to the network (called self-model states or reification states) which represent (adaptive) network characteristics. Such nodes are depicted at a next level (self-model level), while the original network is at the base level. These types of characteristics with their self-model states and their roles are shown in Table 1. This provides an extended network, called a self-modeling network. Like all network models, a self-modeling network model is specified in a (network-oriented) declarative mathematical manner based on nodes and connections. These include interlevel connections relating nodes at one level to nodes at the other.
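As an illustration of the two combination functions in Eqs. (2) and (3), consider the following minimal Python sketch (not part of the original paper; the parameter values are arbitrary). Note how the advanced variant maps an all-zero input to 0 and rescales the output to [0, 1], as described above.

```python
import numpy as np

def slogistic(sigma, tau, values):
    # Simple logistic sum function, Eq. (2)
    return 1.0 / (1.0 + np.exp(-sigma * (np.sum(values) - tau)))

def alogistic(sigma, tau, values):
    # Advanced logistic sum function, Eq. (3): subtract the value of
    # Eq. (2) at sum 0, then rescale the result to the [0, 1] interval
    at_zero = 1.0 / (1.0 + np.exp(sigma * tau))
    return (slogistic(sigma, tau, values) - at_zero) * (1.0 + np.exp(-sigma * tau))

print(alogistic(5.0, 0.5, [0.0, 0.0]))  # 0.0: an all-zero input maps to zero
print(alogistic(5.0, 0.5, [0.6, 0.7]))  # a value strictly between 0 and 1
```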
Table 1. Different network characteristics and self-model states for them

| Types of characteristics | Concepts | Notations | Self-model states | Role played by the self-model state |
|---|---|---|---|---|
| Connectivity characteristics | Connection weights | $\omega_{X,Y}$ | $W_{X,Y}$ | Connection weight W |
| Aggregation characteristics | Combination functions and their parameters | $c_Y(..)$, $\pi_{i,j,Y}$ | $C_{i,Y}$, $P_{i,j,Y}$ | Combination function weight C; combination function parameter P |
| Timing characteristics | Speed factors | $\eta_Y$ | $H_Y$ | Speed factor H |
The outcome is also a network model (Treur 2020, Ch. 10). This whole construction can be applied iteratively to obtain multiple self-model levels that provide higher-order adaptive networks, and is quite useful to model, for example, plasticity and metaplasticity in the form of a second-order adaptive network with three levels: one base level and a first- and a second-order self-model level; e.g., (Treur 2020, Ch. 4). To support the design of network models and their simulation, for any application predefined basic combination functions $\mathrm{bcf}_i(..)$, $i = 1, \ldots, m$ are selected from a library by assigning weights $\gamma_{i,Y}$; the combination function then becomes the weighted average

$$c_Y(..) = \frac{\gamma_{1,Y}\,\mathrm{bcf}_1(..) + \cdots + \gamma_{m,Y}\,\mathrm{bcf}_m(..)}{\gamma_{1,Y} + \cdots + \gamma_{m,Y}} \quad (4)$$

Furthermore, parameters of combination functions are specified, so that $\mathrm{bcf}_i(..) = \mathrm{bcf}_i(p, v)$, where p is a list of parameters and v is a list of values.
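Putting Eqs. (1) and (4) together, a single simulation step for one state can be sketched in Python as follows. This is an illustration under our own naming, not the paper's MATLAB template, and it reuses the alogistic function defined earlier.

```python
def combine(bcfs, gammas, impacts):
    # Weighted average of basic combination functions, Eq. (4)
    return sum(g * f(impacts) for f, g in zip(bcfs, gammas)) / sum(gammas)

def update_state(y, speed, weights, sources, bcfs, gammas, dt):
    # One Euler step of Eq. (1) for a single state Y
    impacts = [w * x for w, x in zip(weights, sources)]  # omega_{Xi,Y} * Xi(t)
    return y + speed * (combine(bcfs, gammas, impacts) - y) * dt

# Example: one state with two incoming connections and alogistic aggregation
bcfs = [lambda v: alogistic(5.0, 0.5, v)]
y_next = update_state(y=0.1, speed=0.5, weights=[0.8, 0.6],
                      sources=[0.9, 0.4], bcfs=bcfs, gammas=[1.0], dt=0.5)
```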
3 The Designed Adaptive Network Model for Joint Attention
Using the modeling framework introduced above, we designed a second-order adaptive network to model the emergence of a shared-attention episode between two individuals, from the initiator's perspective. Table 2 summarises the states of the network model. Below, we explain the organization of the behaviour; see the graphs at the end of this section for a more holistic view of the network (for more clarity, two versions of the network are shown: the top figure omits the top-down connection arrows, while the complete version of the network is displayed in the bottom figure). At the base level, we model the temporal-causal emergence of the shared-attention episode, from the initiator's perspective. The initiator perceives both the object o and the person p in the environment, which leads to two separate sensory representations srs_o and srs_p. This causes the initiator to prepare and subsequently execute the action es_a, which represents the multimodal action of the initiator: e.g., the initiator could be looking at their partner (gaze leading) while also pointing towards the object o. Then, this action, which initiates the shared-attention episode, is thought to trigger a new world state for the object o (ws_a) that leads to a new, and perhaps altered, sensory representation of o for the initiator.
The connection weight self-model states $W_{X,Y}$ influence connections at the base level. The initiator's sensory representation of their partner (dependent on their perception of the partner) is modulated by their self-knowledge about the image of that partner p (named X13 or W_imgP), be it conscious or unconscious; the corresponding idea applies to the object o (X19 or W_imgO). Then, the link from this sensory representation to the action preparation state (X7 or ps_a) is influenced by (1) the initiator's knowledge about their own (likely) response to that partner (X14 or W_repP), (2) the initiator's (successful) Theory of Mind (especially in the context of the partner's mind) (X17 or W_ToM), and (3) the extent of their social affiliation, i.e. whether, and to what extent, the initiator views the partner as belonging to their group (X18 or W_Aff2). Going from action preparation (X7) to action execution (X8), we have included two W-states: (1) the initiator executes the action depending on their tendency to proceed with the prepared action in this specific environment, based on reward prediction/expectation (X15 or W_R); (2) as before, the strength of social bonding/mirroring modulates the connection between action preparation and action execution, and can be thought of as a link from the preparation to the execution (a moderate social affiliation, as perceived by the initiator, may be sufficient to initiate action preparation but not develop into the actual execution of the action, for instance) (X16 or W_Aff1).
Fig. 1. An adaptive network model of joint attention between two individuals: from the initiator’s perspective
The speed factors (represented by second-order self-model H-states) in the network can be briefly explained using this quote from Robinson et al. (2016, p. 2): "Adaptation accelerates with increasing stimulus exposure." Based on these assumptions, the adaptive network model has been designed as shown in Fig. 1 (base level and downward causal pathways) and Fig. 2 (all causal pathways).
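To make the role of the self-model states concrete, the sketch below (our own illustration, extending the combine and update_state functions above) shows how, at each step, the current values of the W-states stand in for the base-level connection weights and an H-state stands in for the speed factor, following the reification scheme of Table 1.

```python
def update_state_adaptive(y, h_state, w_states, sources, bcfs, gammas, dt):
    # Same Euler step as Eq. (1), but the (adaptive) self-model state values
    # replace the static characteristics: W_{X,Y}(t) plays the role of the
    # connection weight omega_{X,Y}, and H_Y(t) the role of the speed factor eta_Y
    impacts = [w * x for w, x in zip(w_states, sources)]
    return y + h_state * (combine(bcfs, gammas, impacts) - y) * dt
```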
Fig. 2. An adaptive model of joint attention between two individuals: from the initiator’s perspective (complete version)
Table 2. State names for the joint attention model with their explanations.

| State nr | State name | Explanation | Level |
|---|---|---|---|
| X1 | ws_o | World state for stimulus o (object) | Base level |
| X2 | ss_o | Sensor state for stimulus o (object) | Base level |
| X3 | srs_o | Sensory representation for X = o (object) | Base level |
| X4 | ws_p | World state for stimulus p (partner) | Base level |
| X5 | ss_p | Sensor state for stimulus p (partner) | Base level |
| X6 | srs_p | Sensory representation for X = p (partner) | Base level |
| X7 | ps_a | Preparation state for action a | Base level |
| X8 | es_a | Execution state for action a: action a is the initiating action undertaken by one of the partners (the initiator); it can be multimodal, e.g. the initiator is pointing at the object while turning their head to gaze at the partner and saying the label of the object (in order to draw the partner's attention) | Base level |
| X9 | srs_e | Sensory representation for X = e (evaluation loop) | Base level |
| X10 | ws_a | Following es_a, new world state for stimulus o (object) via the initiator's reorientation of gaze from the partner p to the object o again | Base level |
| X11 | ss_a | Following es_a, new sensor state for stimulus o (object) | Base level |
| X12 | srs_a | Following es_a, new sensory representation for X = o (object) | Base level |
| X13 | W_imgP | Self-model state for connection weight $\omega_{ss_p,srs_p}$, i.e. one's self-knowledge about the image of the partner (conscious or unconscious) | First-order self-model level |
| X14 | W_repP | Self-model state for connection weight $\omega_{srs_p,ps_a}$, i.e. one's self-knowledge of one's response to a partner | First-order self-model level |
| X15 | W_R | Self-model state for connection weight $\omega_{ps_a,es_a}$, i.e. one's tendency to execute prepared action a, based on reward prediction | First-order self-model level |
| X16 | W_Aff1 | Self-model state for connection weight $\omega_{ps_a,es_a}$, i.e. affiliation or social bonding, similar to mirroring strength and empathy (unconscious process); see (Iacoboni 2007) | First-order self-model level |
| X17 | W_ToM | Self-model state for connection weight $\omega_{srs_p,ps_a}$, i.e. the initiator's Theory of Mind (conscious process); see (Goldman 2006) | First-order self-model level |
| X18 | W_Aff2 | Self-model state for connection weight $\omega_{srs_p,ps_a}$, i.e. affiliation or social bonding, mirroring strength (unconscious process); see (Iacoboni 2007) | First-order self-model level |
| X19 | W_imgO | Self-model state for connection weight $\omega_{ss_o,srs_o}$, i.e. one's self-knowledge about the image of the object (conscious or unconscious) | First-order self-model level |
| X20 | H_WimgP | Second-order self-model state for the speed factor $\eta$ of self-model state W_imgP | Second-order self-model level |
| X21 | H_WR | Second-order self-model state for the speed factor $\eta$ of self-model state W_R | Second-order self-model level |
| X22 | H_WAff1 | Second-order self-model state for the speed factor $\eta$ of self-model state W_Aff1 | Second-order self-model level |
4 Simulation Results
In this section, we describe the simulation plot obtained using the following hyperparameters: end of simulation = 40, Δt = 0.5. Using the adaptive network MATLAB template with the role matrices specified in detail in the Appendix at https://www.researchgate.net/publication/356195174, we ran the script and obtained the simulation plot shown in Fig. 3. For more clarity, that Appendix also contains screenshots of the simulation plot with only a few states at a time.
Fig. 3. Simulation results
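For orientation, the same kind of run can be reproduced in miniature with the sketches above: iterating the update for end time 40 with Δt = 0.5 gives 80 steps per state. This is a toy single-state illustration, not the full 22-state role-matrix specification of the model.

```python
T, dt = 40.0, 0.5             # hyperparameters reported above
y_trace = [0.0]               # initial activation of one illustrative state
for _ in range(int(T / dt)):  # 80 Euler steps
    y_trace.append(update_state(y_trace[-1], speed=0.5, weights=[0.8],
                                sources=[1.0], bcfs=bcfs, gammas=[1.0], dt=dt))
```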
5 Discussion and Future Perspectives
We begin this section with some considerations about the simulation plot shown previously. Here, we expect all states to show increased activation levels over time: what matters most is the order of these increases (i.e., which state's activity precedes which other state's). Let us summarise these observations below:
• For the base-level states, the activation of the world stimulus precedes that of the sensor state and the sensory representation, for both the object o and the person p.
• Following the states above, the activation of the preparation state comes into the picture.
• Shifted to the right (relative to the previous states), we see the three states associated with the new sensing and representation of the object o (X10, X11, X12).
• Compared to the other W-states, the activity levels of W_imgP (X13), W_R (X15) and W_Aff1 (X16) increase more over time: this is expected, since they receive positive modulation from the H-states at the second reification level.
• As expected, the H-states have higher activation levels than all other states.
Finally, we consider some ways in which the network above can be improved (Extension) and used in the future (Future Use).
Extension Options
• a network that models the exact temporal dynamics of a shared-attention episode, e.g., with the flexibility of having the initiator perceive the object or the person with an offset, etc.; here, it is assumed that they are both perceived in parallel
• a more fine-grained modeling of the multimodal behaviour of the initiator (where the multimodal behaviour drawing the attention of the partner is not reduced to a single state es_a)
• a network with loops, where the sensory representation of the person p may, for example, influence the new sensory representation of the object o (ws_a, ss_a and srs_a)
• a network from the perspective of the partner
• better than the above: an interactional network, which models both the initiator and their partner (e.g., the gaze response of the partner, etc.)
Future Use
Modeling what happens in a shared-attention episode is useful for better understanding how joint attention arises between two individuals. It is of particular interest to do so using the network-oriented modeling approach in order to capture the dynamics of the episode, namely which event causes which other event, when, and with how large an impact. A precise network model of joint attention would be of interest to the developmental psychology field (especially for studying typical and atypical development). For instance, joint attention is relevant to research investigating development (e.g., lexical development and word learning), and in the realm of autism spectrum disorder in children. Other aspects of social cognition and social psychology, such as idea sharing, are also intertwined with the emergence of joint attention.
References
Akhtar, N., Gernsbacher, M.A.: Joint attention and vocabulary development: a critical look. Lang. Linguist. Compass 1(3), 195–207 (2007)
Goldman, A.I.: Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford University Press, Oxford (2006)
Hebb, D.O.: The Organization of Behavior: A Neuropsychological Theory. Wiley, Hoboken (1949)
Iacoboni, M.: Face to face: the neural basis of social mirroring and empathy. Psychiatr. Ann. 37(4), 236 (2007)
Manninen, S., et al.: Social laughter triggers endogenous opioid release in humans. J. Neurosci. 37(25), 6125–6131 (2017)
Robinson, B.L., Harper, N.S., McAlpine, D.: Meta-adaptation in the auditory midbrain under cortical influence. Nat. Commun. 7, 13442 (2016)
Stephenson, L.J., Edwards, S.G., Bayliss, A.P.: From gaze perception to social cognition: the shared-attention system. Perspect. Psychol. Sci. 16(3), 553–576 (2021)
Treur, J.: Network-Oriented Modeling for Adaptive Networks: Designing Higher-Order Adaptive Biological, Mental and Social Network Models. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-31445-3
Means of Informational Support for the Program of Increasing the Public's Loyalty to Projects in the Field of Nuclear Energy
Anna I. Guseva, Elena Matrosova, Anna Tikhomirova, and Matvey Koptelov
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow, Russia
[email protected], [email protected], [email protected]
Abstract. The article is devoted to modern means of disseminating information among Internet users with consideration for their individual characteristics. The authors propose applying the practice of contextual advertising to inform the public about projects in the field of nuclear energy and to form a positive attitude towards them. The paper also presents an algorithm for working with Internet users aimed at increasing the level of public loyalty to nuclear energy technologies. Criteria for analyzing the activities of users on the Internet are proposed, as well as a flexible approach to determining the number of differentiable user groups, in order to keep the labor costs of creating various scenarios and forms of information presentation commensurate with the individual characteristics of users.
Keywords: Megaproject · Loyalty program · Decision-making process · Automated decision support system · Fuzzy logic · Fuzzy knowledge base
1 Introduction
Nowadays, the implementation of megaprojects is an effective tool for maintaining the competitiveness of the state at both the national and international levels. Progress in the field of complex technologies is the basis for the innovative development of the domestic market of products and services, as well as for Russia's stable position in the foreign market. The spread of unique Russian technologies is one of the ways to strengthen the country's position on the international market; it is also key to the formation of effective and good-neighborly relations with other countries. The development of peaceful atom technologies is one of the successfully implemented directions, within which the State Corporation Rosatom is building nuclear power plants (NPP) on the territories of other countries [1]. With the expansion of the geography of the project, the number of risks associated with it increases as well. When implementing a megaproject on the territory of another
country, a large number of significant additional economic, political, country-specific and information risks emerge [2, 3]. To reduce the likelihood of these risks materializing, specialized programs aimed at increasing public loyalty to megaprojects are developed and executed. The construction and operation of nuclear power plants are designed for the long term. The socio-political situation in the NPP location area can have a significant impact on the effectiveness of the implementation of the full technological cycle [4]. A significant percentage of the population with a negative attitude towards the project, capable of taking active action, can lead to a substantial delay in construction and a shift in the project implementation plan in general, which will directly affect the economic efficiency of the project. Hence, various measures aimed at increasing the level of public loyalty are being widely implemented.
2 Means of Informational Support
Currently, the actions of users on the Internet are actively analyzed by companies in order to ensure the most effective interaction with users, as well as to offer the goods, services and information materials of primary interest to a particular user. The user's search requests are the basis for the formation of relevant contextual advertising [5]. The appearance of targeted advertising increases the probability of a purchase and also encourages the user to make it sooner. The working principles of contextual advertising can be used for targeted informing of active Internet users in order to increase citizens' loyalty to nuclear power plant construction projects. The reason for a negative attitude towards the construction of an NPP is often poor public awareness of the technologies used and of the advantages that can be obtained from this particular technology of energy production. Presently, the level of Internet penetration is quite high, and constant access to online sources of information is becoming common. For instance, the Rosatom newsletter website, supported in 6 languages, is visited monthly by more than 1,500 users from around 60 countries. The most active users are from Russia, Turkey, the USA, the UK and India [6]. Sources containing opinions of other users and stories about experience in a particular field are gaining more and more popularity. Among them, alongside information articles selected according to users' preferences, contextual advertising stands out; it is often disguised as an account of experience with a technology or product. In this form, a person is more likely to get acquainted with the proposed material and perhaps even come to purchase the proposed service or product. For effective work with public opinion, the changing behavior of a person who spends less and less time in front of the TV and more and more on the Internet cannot be ignored, because there they receive exactly the information in which they are most interested. With that said, it is reasonable to use similar technologies to change public opinion about nuclear energy. Several key criteria can be used to assess the public's attitude to a nuclear energy project [7]:
• Level of knowledge about the nuclear industry;
• User settings;
• Benefit for the consumer.
Precisely on this basis it is possible to form targeted information materials aimed at correcting a person's opinion. According to the type of effect obtained from the implementation of a megaproject in the field of nuclear energy, the criterion of "benefit for the consumer" can be divided into three components: economic, social and ecological effects (Fig. 1) [8].
Fig. 1. Criteria for evaluation of the public's attitude towards nuclear energy
For example, the implementation of a wide range of Russian-Belarusian projects on the development of education, science and human resources for nuclear energy has led to the following dynamics of public opinion. The results of a survey held in 2017 show that almost 40% of Belarusians believed that the nuclear power plant being built in Ostrovets might be unsafe. Another 9.5% claimed that the project would not pay off, while 26.9% of the country's residents viewed the Belarusian NPP project positively. At the same time, a significant part of society, 17%, remained indifferent on this issue [9, 10]. The 2019 survey shows that the majority of the population (67.8%) positively assessed the work of the energy industry; the evaluations of the involved experts (leaders and leading specialists of the energy industry, specialized scientists, university teachers of the country, etc.) were even higher, at 84.1%. At the same time, the number of supporters of nuclear energy decreased by 3.1% compared to the previous survey, and the number of opponents decreased by 0.6%, while the number of respondents who found it difficult to answer this question increased by 4.3% [11]. The obtained results show the need to take the named criteria into account when forming targeted information materials. The specifics of the materials are that they should not be large or difficult to perceive and should not be overloaded with information, yet they should correspond as closely as possible to the characteristics of a particular user, which should be identified by analyzing the user's search requests and their orientation. Examples:
• A person who reads many specialized information websites may be offered an informational page containing more scientific information;
• A user who forms requests only of a domestic nature should be provided with a page that is as simple as possible to perceive, containing information describing how the standard of living of an ordinary citizen improves;
• If the user shows interest in the topic of ecology, a page should appear that covers this aspect and explains the safety of nuclear energy technologies;
• If a user spends a lot of time reading stories about the lives and experiences of other people, they should be presented with information in the form of a life story of a particular person or group of people and the positive impact on this life resulting from the implementation of new technologies;
• If a user spends a significant amount of time looking at photos from open sources, they should be provided with material containing many photos on the topics of nuclear energy and NPP construction, with a small text accompaniment, etc.
To engage a person with the topic, it is necessary to form competent and laconic information materials that do not cause rejection. Once a person's interest has been identified, further information materials corresponding to the user's interests should be selected. Figure 2 shows an approximate algorithm for working with Internet users in order to influence public opinion. The proposed algorithm is one of the possible activities of the loyalty program aimed at retaining existing contractual relationships and partners, as well as attracting new ones. Work on any loyalty program event should begin with an analysis and forecast of its coverage and the degree of potential impact. Since the proposed event involves working with the population exclusively via the Internet, at the first stage it is necessary to analyze the coverage of the country's population by this resource. For this, one can count the percentage of the country's population that accessed the Internet during the study period (for example, a month), the average daily audience of users on stationary and mobile devices, the number of Internet users, the Internet audience in the country by population groups, etc. If the coverage of the population via the Internet proves satisfactory, it is necessary to proceed to the next stage of analysis: identifying the most popular resources among users, through which the dissemination of information will cover the maximum possible number of users.
Fig. 2. Algorithm for working with Internet users
After identifying the key resources or widely used apps, it is necessary to analyze the characteristic requests of the population and identify user groups along the most appropriate directions of impact. For the identified user groups, appropriate information pages should be prepared, aimed at matching the interests of a person as accurately as possible and delivering the necessary information in the form that he or she is ready to accept. The effectiveness of the event largely depends both on the correct division of users into groups and on the quality of the prepared information materials. The number of groups depends on the special aspects of society, the differentiation of views, the presence of active segments of the population interested in global technologies, traditions, gender and age structure, etc. Thus, it is possible to propose several criteria that determine which group a respondent falls into. When forming information pages, it is necessary to take into account the following criteria (segmentation signs) with their corresponding features:
• Preferred form of information perception:
  - Viewing photos;
  - Viewing videos;
  - Listening to audio;
  - Reading text pages.
• Direction of the request:
  - Scientific materials;
  - Political materials;
  - Information about the life and experience of other people;
  - Household information;
  - Information about goods and services.
• Emotional dimension [12]:
  - General perception of information: positive, neutral, negative.
  - Perception of nuclear energy technologies: positive, neutral, negative, lack of interest.
Since maximum targeted impact is required, it is necessary to consider all possible combinations of the criteria features (Fig. 3). To do this, one can apply the principles of combinatorial problems and calculate the number of combinations using a scheme called the "tree of possible options". The first criterion divides users into 4 groups and the second into 5; the third distinguishes 3 groups, each with 4 possible subgroups. Thus, with the proposed definition of criteria and their features, 4 × 5 × 3 × 4 = 240 user groups are obtained, whose features must be taken into account when forming information pages and determining the appropriate emphasis and style of presentation of information.
Fig. 3. Features of user groups
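The 240 combinations can be enumerated directly; the following small Python snippet (an illustration, not part of the original paper) confirms the count:

```python
from itertools import product

perception = ["photo", "video", "audio", "text"]
direction = ["science", "politics", "life and experience of other people",
             "household", "goods and services"]
general_emotion = ["positive", "neutral", "negative"]
nuclear_emotion = ["positive", "neutral", "negative", "lack of interest"]

# Every user group is one combination of the four criteria features
groups = list(product(perception, direction, general_emotion, nuclear_emotion))
print(len(groups))  # 4 * 5 * 3 * 4 = 240
```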
Such a large degree of differentiation of society into groups can be worthwhile only with a very large coverage of the population. However, if the task is to influence the opinion of local groups, then such high labor inputs will not be justified. Hence, it is possible to reduce the number of groups by forming them with the mathematical tools of fuzzy logic theory. To assign a user to a certain group, a set of fuzzy knowledge base rules can be used. The rules are formulated by partially compressing the criteria described above; they use the logical functions AND and OR [13]. A rule is constructed as an implication: the premise is indicated on the left, and the conclusion is indicated on the
right. In natural language, this corresponds to the form IF…THEN. For example, we can assume the following reduction in the number of groups due to the merging of sub-criteria:
• in the "preferred form of perception" criterion, keep 2 options: text perception and perception through video and audio objects;
• in the "direction of the request" criterion, combine the sub-criteria "household information" and "goods and services";
• in the "emotional dimension of information" criterion, keep 2 options, "positive" and "negative", for all sub-criteria.
As a result, for the reduced differentiation, the total number of possible options will be 2 × 4 × 2 × 2 = 32. As examples of the rules, the following can be given:
• IF the "preferred form of perception of information" is video OR audio files, AND the "direction of the request" is scientific materials, AND the "emotional dimension" regarding general perception is positive OR neutral, AND the "emotional dimension" regarding the perception of nuclear energy technologies is negative, THEN it is necessary to use the set of prepared information materials № i (Group i).
• IF the "preferred form of perception of information" is photo OR audio files, AND the "direction of the request" is general household themes OR goods and services, AND the "emotional dimension" regarding general perception is positive OR neutral, AND the "emotional dimension" regarding the perception of nuclear energy technologies is positive OR neutral OR lack of interest, THEN it is necessary to use the set of prepared information materials № k (Group k).
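A crisp sketch of how such rules could be encoded is shown below. This is a hypothetical illustration of the rules' structure only: a full fuzzy implementation would aggregate membership degrees rather than boolean conditions, as in [13].

```python
def assign_group(user):
    # user: dict with keys "form", "direction", "general_emotion", "nuclear_emotion"
    if (user["form"] in ("video", "audio")
            and user["direction"] == "science"
            and user["general_emotion"] in ("positive", "neutral")
            and user["nuclear_emotion"] == "negative"):
        return "Group i"  # use prepared material set No. i
    if (user["form"] in ("photo", "audio")
            and user["direction"] in ("household", "goods and services")
            and user["general_emotion"] in ("positive", "neutral")
            and user["nuclear_emotion"] in ("positive", "neutral", "lack of interest")):
        return "Group k"  # use prepared material set No. k
    return None  # the remaining rules of the knowledge base would follow
```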
As a result, the general approach to user analysis and the analysis criteria remain identical, but due to the use of the fuzzy logic method it is possible to narrow the number of groups, and hence the number of prepared differentiated options, in accordance with the differences among people and the size of the target audience in general. After the launch of the event, it is necessary to analyze the degree of its impact on users and thus assess the expediency of conducting this particular event. Let us consider some of the loyalty program activities implemented by the State Corporation Rosatom during the construction of NPPs abroad. For example, in June 2021, the information coverage of the signing of the act of acceptance of the launch complex of BELNPP-1 (Belarus) caused a response in the form of 220 publications in local media, and the solemn ceremony of pouring the first concrete at unit No. 5 of Kudankulam (India) produced 150. In addition to these informational events, the state corporation held an intersectoral seminar on the prospects of the Northern Sea Route focused on Japanese partners, and its representatives took part in the VIII International Nuclear Energy Summit "IPPS 2021", Nordic Nuclear Forum 2021, the BUDIKS festival, the strategic online forum on nuclear technologies "Thinking Outside the Dome", APCOM21, etc. [10]. In total, the June information events provoked a response in the form of 5237 publications, of which 4861 were neutral-positive and 236 negative. Negative responses were associated with the suspension of operation of Kudankulam power unit No. 2 (India) due to a malfunction of one of the turbines, the statement of the European Commissioner for Energy K. Simson about the exploitation of the Belarusian NPP, Russia's attempts to control the Finnish media, and unreliable information about the implementation of Rosatom projects in Argentina in exchange for the Sputnik V vaccine. The proposed method of working with public opinion may be highly effective for certain groups of the population, while other groups may be completely insensitive to it. The lack of a result should be quickly identified, and other measures to increase loyalty should be applied to such groups of the population.
3 Conclusions
Internet technologies are penetrating ever deeper into all spheres of life. Companies that manage to monitor in a timely manner the trends in the development of IT communications and the forming preferences of Internet users have a stable income. At the same time, the segment that falls within the sphere of influence of large corporations often changes more slowly and with more difficulty. This is due to the presence of stable bureaucratic mechanisms and the fear of switching from traditional, proven methods to new technologies. However, the lack of prompt action leads to a decrease in work efficiency and a loss of opportunities. Elements of successful technologies popular in a rapidly changing and adapting market, close to perfect competition, should be studied and adjusted to the needs of organizations with a large infrastructure. The construction of NPPs belongs to megaprojects, implemented at the international level as well [14]. But the problems of competition are not alien to this area, and the role of potential customers here may be played by the society of the country in which the construction of an NPP is planned, whose mood needs to be analyzed in advance, with specialized measures taken to correct it. Modern technologies, widely used today in commercial companies, are aimed at maximum adaptation to the individual requests and interests of a particular user. The individualization of the material shown to the user is justified and has a positive effect on sales volumes. Targeted information in the correct form reduces the time for making a purchase decision and contributes to an increase in the number of spontaneous purchases. Since this approach has already proven its effectiveness in practice, it is expedient to extend it to other areas of activity, including increasing user loyalty to an object, project or direction of activity. The most accurate consideration of individual characteristics will make it possible to exert a directed and effective influence on an individual and thus increase the loyalty of society in general towards the relevant project. The events of the loyalty program should be formed taking into account the peculiarities of the society. In order to ensure their maximum effectiveness, all the most successful modern technologies should be analyzed, adapted and applied, taking into account the growing digitalization of society and its specifics in a particular country.
Acknowledgments. This work was supported by RFBR grant № 20-010-00708\21.
References
1. https://www.rosatom.ru/about-nuclear-industry/
2. Kovtun, D., Koptelov, M., Guseva, A.: Megaproject risk management based on loyalty program using neural network models. In: Proceedings 2020 2nd International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA), pp. 228–231 (2020)
3. Guseva, A., Koptelov, M., Kovtun, D.: Decision support for project risk management using the information-semantic field. In: Proceedings 2019 1st International Conference on Control Systems, Mathematical Modelling, Automation and Energy Efficiency (SUMMA), pp. 256–260 (2019)
4. Chernyakhovskaya, Y.V.: NPP integrated sales for sustainable development in the field of nuclear energy. Econ. Sci. 11(132), 24–27 (2015)
5. Perry, M., Bryan, T.: Ultimate Guide to Google AdWords: How to Access 100 Million People in 10 Minutes. Entrepreneur Media, Inc. (2012). Published in Russian translation by Mann, Ivanov and Ferber under license from Entrepreneur Media, Inc. dba Entrepreneur Press (2012)
6. https://rosatomnewsletter.com/
7. Guseva, A., Matrosova, E., Tikhomirova, A., Matrosov, N.: Loyalty program tool application in megaprojects. Adv. Intell. Syst. Comput. 1310, 106–114 (2020)
8. Guseva, A., Matrosova, E., Tikhomirova, A., Kovtun, D.N.: Assessment of the public acceptance of the nuclear power plant construction plan on the territory of a foreign country. Procedia Comput. Sci. 190, 301–311 (2021). https://doi.org/10.1016/j.procs.2021.06.040
9. https://greenbelarus.info/articles/23-05-2017/dva-oprosa-s-raznicey-v-polgoda-i-raznym-rezultatom-chto-dumayut-belarusy-o
10. Link to the State Corporation report: https://www.canva.com/design/DAEkVV8Oaak/CsjTBTI-7RvEJDsZWj48HA/view?utm_content=DAEkVV8Oaak&utm_campaign=designshare&utm_medium=link&utm_source=sharebutton#7
11. https://atom.belta.by/ru/analytics_ru/view/sotsiologi-i-minenergo-izuchili-podderzhku-aes-v-belarusi-10506/
12. Samsonovich, A.V., Eidlin, A.A.: A proposal for modeling cognitive ontogeny based on the brain-inspired generic framework for social-emotional intelligent actors. Procedia Comput. Sci. 186, 149–155 (2021). https://doi.org/10.1016/j.procs.2021.04.165
13. Demidova, G.L., Lukichev, D.V.: Regulators Based on Fuzzy Logic in Control Systems of Technical Objects. ITMO University, St. Petersburg, 81 p. (2017)
14. Chernyakhovskaya, Y.V.: The macro impacts of international NPP projects. Stud. Russ. Econ. Dev. 29, 21–27 (2018). https://doi.org/10.1134/S1075700718010021
The Research of Characteristic Frequencies for Gesture-Based EMG Control Channels
Anna Igrevskaya, Alexandra Kachina, Aliona Petrova, and Konstantin Kudryavtsev
Institute of Cyber Intelligence Systems, National Research Nuclear University MEPhI, Moscow, Russia
Abstract. The paper describes experimental studies on the selection of characteristic frequencies for use in EMG-based human-machine interfaces in the problem of gesture recognition. Such interfaces allow the operator to interact with a mobile robotic device without significant physical effort. The flexion of the fingers is characterized by neuromuscular temporal signals, which are read by sensors and subsequently subjected to mathematical processing. The discrete Fourier transform makes it possible to obtain a spectral representation of the signal and select the main frequencies at which the maximum signal amplitude appears. The resulting frequency sets can be used to identify gestures and generate control commands for a robotic device. The selection of a frequency set was carried out for the particular case of recognizing thumb flexion as a gesture. As a result, it was found that the considered parameters are not resistant to the influence of various factors, including the execution of other gestures. Therefore, it is also suggested to take into account the signal power at these frequencies.

Keywords: EMG · Gesture recognition · FFT · Signal processing · Robotics
1 Introduction

Currently, EMG-based human-machine interfaces are becoming more widespread, since they significantly expand the means of controlling a computing device, especially for people with disabilities. However, despite significant progress, a large number of problems remain, related to usability, the number of possible commands, the recognition quality, etc. This manuscript considers a gesture interface based on EMG sensors that acquire an electrical signal related to muscle contraction. EMG signal processing can be time-consuming and computationally intensive, which restricts the system's ability to operate in real time and makes it more complex and expensive. Although there are now several mathematical methods for efficient and undemanding real-time data processing (e.g., neural networks), for some devices they can be overly complicated and require preparatory classifier training. Thus, if the EMG system is only required to recognize primitive gestures, then simple general characteristics of the signal can be considered. In this manuscript, we consider an EMG system that implements the recognition of finger flexion as target gestures and the possibility of using peak frequencies for data classification.
2 Related Works

The implementation of a control channel for a robotic device using an EMG signal involves the following stages: signal acquisition, preprocessing, classification, and interpretation into commands. Preprocessing includes cleaning the signal from noise, representing it in a chosen domain and selecting the features to be used for classification. The most popular features are parameters in the time, frequency, or time-frequency domains [1], e.g., MAV, PSD. The features in each of the domains have their advantages and disadvantages. The frequency features are more resistant to deviations and lack of stationarity in the signal than the time-domain ones. The time-domain features have lower computational costs and are more suitable for classification in the case of a stationary signal [2]. Special features that take into account the dynamic component of the signal [3] can also positively affect classification quality. Frequency features at the same time represent qualitative criteria for determining muscle tension and fatigue [4]. In many papers, different features are used in combination. In addition to the classic signal features, non-standard feature selection methods are applied, e.g., mel-cepstral coefficients [5] or neural networks with automatic feature selection [6]. At the same time, this approach is computationally intensive. The sets of recognized gestures vary widely between implementations and can include both relatively simple movements [3, 7–10] and certain gesture parameters (e.g., torque [11]). Even for the simplest gestures, any differences (the applied force, the duration of a gesture [7, 9], etc.) may affect the recognition quality, and even the control commands themselves may provoke a decrease in quality due to conflicts [12]. The position of the arm can also affect hand-gesture recognition quality [13]. Thus, there are many aspects to the gesture recognition process in non-laboratory conditions.
3 Theory

As stated earlier, numerous studies focus on enhancing the recognition quality by improving both the feature set and the classifiers. The optimization of the use of existing features of the raw EMG signal and the selection of new universal, discriminative and stable features is a key objective. In this paper, it is proposed to consider in this context one of the frequency-domain features, peak frequency (PKF). PKF is defined as the frequency at which the maximum EMG power occurs:

PKF = \max_{j=1,\dots,M} P_j \qquad (1)

In formula (1), P_j is the EMG power spectrum at frequency bin j, and M is the number of frequency bins. Like other frequency parameters, PKF depends considerably on the force exerted by the subject as part of the gesture, although PKF is not considered the most reliable gesture classification feature [2]: within the framework of this study, PKF divides the considered gestures into only 2 distinct categories. Moreover, it can remain the same for different hand movements because of the involvement of approximately the same muscle groups, though this parameter may change for gestures that involve different muscle groups.
Thus, we will consider this feature in the problem of recognizing finger movements. As a simplification, we will consider one finger and analyze how much this feature changes depending on whether the hand is at rest, the target finger moves, or the other fingers move. We will also analyze the sequence of such peak frequencies arranged in descending order of power, which can potentially reveal new patterns and features in the signal, by which gestures can be distinguished more precisely.
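As an illustration of how the PKF feature of formula (1) can be extracted, the following Python sketch computes the power spectrum of an EMG window with the discrete Fourier transform and returns the highest-power frequencies. It is a minimal sketch under assumed parameters (window length, sampling rate, synthetic test signal), not the authors' processing pipeline.

```python
import numpy as np

def peak_frequencies(signal, fs, n_peaks=3):
    """Return the n_peaks frequencies with the largest power, descending.

    A minimal sketch of the PKF feature from formula (1): the power
    spectrum is taken from the DFT of the raw EMG window.
    """
    spectrum = np.fft.rfft(signal - np.mean(signal))   # remove DC offset
    power = np.abs(spectrum) ** 2                      # power at each bin
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)   # bin -> frequency (Hz)
    order = np.argsort(power)[::-1][:n_peaks]          # bins sorted by power
    return freqs[order], power[order]

# Example: a synthetic ~7 s recording sampled at 10 Hz, as in the setup below.
fs = 10.0
t = np.arange(0, 7, 1 / fs)
emg = np.sin(2 * np.pi * 0.7 * t) + 0.3 * np.random.randn(t.size)
f_max, p_max = peak_frequencies(emg, fs)
print(f_max)   # e.g. F_max1, F_max2, F_max3 in descending order of power
```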
4 Experiment Setup

4.1 Hardware

In the experimental session, we use a Grove EMG Detector surface sensor by Seeed Studio connected to an Arduino Uno R3 microcontroller to collect data. The data from the microcontroller is transmitted to the PC for processing using the HC-06 Bluetooth module. The sampling rate is determined by the microcontroller firmware and in our case is equal to 10 Hz. The selected hardware is not high-precision professional medical equipment and therefore has lower noise robustness and a lower sampling rate. However, such equipment offers advantages such as low cost, portability, and configuration flexibility, which makes it available to a wide range of users.

4.2 Scheme

During the experimental sessions, EMG data were obtained for the subject's hand in different states: the resting state and target/non-target finger movement. The gesture consisted of sequential flexion and extension of a finger. The thumb was chosen as the target finger and the electrode for data collection was placed on it (see Fig. 1).
Fig. 1. Location of the electrodes during the experiments.
The experiments were carried out over several days, with one participant, and the electrodes were reinstalled before each repetition of the procedure. The duration of each data session was ~7 s. In the resting state, the relaxed hand lay on the table. For the finger movements, the hand was initially relaxed, then recording began and the gesture was performed several times (2–3) without interruption. In total there were 100 resting-state recording sessions, 68 sessions of target finger movement, and 92 non-target sessions (23 sessions for each non-target finger).
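On the PC side, the acquisition path described above (Arduino firmware streaming over the HC-06 module, which appears as an ordinary serial port) can be sketched as follows. The port name, baud rate and one-reading-per-line framing are assumptions made for illustration, since the paper does not specify the firmware's output format; the pyserial package is assumed to be installed.

```python
import time
import serial  # pyserial; the HC-06 module appears as a serial port

PORT = "/dev/rfcomm0"   # assumed port name; on Windows e.g. "COM5"
BAUD = 9600             # assumed HC-06 default baud rate
SESSION_S = 7           # ~7 s per recording session, as in the experiments

def record_session(port=PORT, baud=BAUD, duration=SESSION_S):
    """Collect one session of EMG samples; assumes the firmware prints
    one integer ADC reading per line at ~10 Hz."""
    samples = []
    with serial.Serial(port, baud, timeout=1) as link:
        t_end = time.time() + duration
        while time.time() < t_end:
            line = link.readline().strip()
            if line:
                samples.append(int(line.decode()))
    return samples
```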
5 Result Analysis

Several features were calculated for each dataset: RMS, WL, MAV – taken for assessing the general state of the signal; 3 maximum PKF features (F_max1, F_max2, F_max3); A1, A2, A3 – amplitudes at F_max1, F_max2, F_max3 (the mean values are presented in Table 1). The frequencies shown are fairly low due to the current experimental configuration. Therefore, the purpose of this analysis is not to represent the exact values of the frequency features, but their relationship to each other.

Table 1. The arithmetic mean values of EMG-signal features.

Gesture       | F_max1/A1   | F_max2/A2    | F_max3/A3    | RMS    | MAV    | WL
Rest state    | 0.39/810.0  | 0.63/576.66  | 0.78/461.08  | 422.69 | 420.95 | 520.91
Thumb         | 0.69/3498.5 | 0.77/2727.90 | 0.80/1943.50 | 443.12 | 424.77 | 3383.90
Forefinger    | 0.51/950.98 | 0.49/802.76  | 0.71/630.89  | 388.34 | 385.11 | 1091.78
Middle finger | 0.43/847.41 | 0.52/658.82  | 0.54/503.45  | 388.93 | 387.36 | 633.78
Ring finger   | 0.33/592.86 | 0.33/473.57  | 0.69/365.77  | 394.26 | 393.45 | 357.39
Little finger | 0.31/663.90 | 0.39/446.39  | 0.50/348.32  | 401.15 | 400.43 | 384.61
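The features in Table 1 can be reproduced from a raw recording along the following lines. This is a sketch that assumes the standard textbook definitions of RMS, MAV and waveform length (WL), which the paper does not spell out; the function name is illustrative.

```python
import numpy as np

def emg_features(x, fs=10.0):
    """Compute the Table 1 features for one recording x (a 1-D array).
    Standard definitions are assumed for RMS, MAV and WL."""
    x = np.asarray(x, dtype=float)
    feats = {
        "RMS": np.sqrt(np.mean(x ** 2)),
        "MAV": np.mean(np.abs(x)),
        "WL":  np.sum(np.abs(np.diff(x))),   # waveform length
    }
    # Three largest spectral peaks and their amplitudes (F_maxN / AN).
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    top = np.argsort(spectrum)[::-1][:3]
    for n, j in enumerate(top, start=1):
        feats[f"F_max{n}"] = freqs[j]
        feats[f"A{n}"] = spectrum[j]
    return feats
```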
The mean values of F_maxN for the non-target fingers and the resting state are lower than those for the target finger; however, the numerical difference between them is relatively small, and they vary significantly across the experiments. The obtained frequency values F_max1 and F_max2 were scatter-plotted along with the signal power (Fig. 2). All considered hand states contain components with the highest amplitudes at common frequencies from 0 to 0.75 Hz. The thumb and forefinger movements are clearly distinguished by higher-frequency components.
Fig. 2. Peak frequencies and power for certain gestures.
Thumb gestures differ significantly in power level from the resting state and non-target gestures, especially in the range of 0.75–1.25 Hz. To examine whether the thumb flexion frequencies differ significantly from the other classes, a pairwise Wilcoxon signed-rank comparison of the PKF values was performed (Table 2), but not all cases showed significant differences.

Table 2. Wilcoxon signed-rank test results.

Thumb vs:        | Resting | Forefinger | Middle finger | Ring finger | Little finger
p-value (F_max1) | 0.0003  | 1.0        | 0.12          | 0.002       | 0.0007
p-value (F_max2) | 0.8     | 0.12       | 0.25          | 0.001       | 0.002
p-value (F_max3) | 0.5     | 0.92       | 0.3           | 0.22        | 0.08
During the experiments, it was revealed that the values of F_max1, F_max2 and F_max3 periodically interchanged. Even though the mean values of these frequencies are quite different, these groups of values can be considered almost equivalent due to the strong scattering. This is also confirmed by the result of the Wilcoxon signed-rank test for these samples, where their pairwise comparison yielded a p-value > 0.1.
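A pairwise comparison of this kind can be run with scipy. Note that the signed-rank test requires paired samples of equal length; the sketch below simply truncates the longer sample, since the authors' exact pairing scheme is not specified in the paper, and the function name is illustrative.

```python
from scipy.stats import wilcoxon

def compare_pkf(thumb_pkf, other_pkf):
    """Pairwise Wilcoxon signed-rank test on PKF samples, as in Table 2.
    The longer sample is truncated so the two samples can be paired."""
    n = min(len(thumb_pkf), len(other_pkf))
    stat, p = wilcoxon(thumb_pkf[:n], other_pkf[:n])
    return p

# Example use: p = compare_pkf(f_max1_thumb, f_max1_rest); p < 0.05 would
# indicate a significant difference between the two gesture classes.
```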
6 Conclusions

The considered PKF features showed the ability to distinguish between the target gesture and hand relaxation. However, PKF features by themselves are not resistant to signal distortions that can be triggered by other gestures. Using a sequence of PKF features instead of a single value gives an advantage: it allows one peak frequency or another to be chosen, especially when a peak is shifted by interference from a higher-frequency component. When the signal power at the peak frequencies is taken into account as well, the considered classes of gestures can be easily demarcated. It can be noted that the analysis of factors affecting EMG patterns, including the execution of other gestures, is extremely important for EMG-based gesture recognition systems. It was found that there are frequency ranges in which the PKF features of other gestures/noise/background activity are observed and may overlap with the target gestures. In this regard, when analyzing PKF features, it is recommended to determine such frequencies and exclude them from consideration. In future studies, it is planned to increase the sampling rate of the EMG signal and analyze the parameters of the signal in the time-frequency domain using the wavelet transform, in terms of resistance to interference related to other gestures.
References

1. Parajuli, N., et al.: Real-time EMG based pattern recognition control for hand prostheses: a review on existing methods, challenges and future implementation. Sensors 19(20), 4596 (2019)
2. Phinyomark, A., Phukpattaranont, P., Limsakul, C.: Feature reduction and selection for EMG signal classification. Expert Syst. Appl. 39(8), 7420–7431 (2012)
3. Phinyomark, A., et al.: Feature extraction of the first difference of EMG time series for EMG pattern recognition. Comput. Methods Programs Biomed. 117(2), 247–256 (2014)
4. Phinyomark, A., Thongpanja, S., Hu, H., Phukpattaranont, P., Limsakul, C.: The usefulness of mean and median frequencies in electromyography analysis. In: Computational Intelligence in Electromyography Analysis: A Perspective on Current Applications and Future Challenges, vol. 8, pp. 195–220 (2012)
5. Kapur, A., Kapur, Sh., Maes, P.: AlterEgo: a personalized wearable silent speech interface. In: 23rd International Conference on Intelligent User Interfaces (IUI 2018), pp. 43–53. Association for Computing Machinery, New York (2018)
6. Demir, F., Bajaj, V., Ince, M.C., Taran, S., Şengür, A.: Surface EMG signals and deep transfer learning-based physical action classification. Neural Comput. Appl. 31(12), 8455–8462 (2019). https://doi.org/10.1007/s00521-019-04553-7
7. Zhang, Z., Yang, K., Qian, J., Zhang, L.: Real-time surface EMG pattern recognition for hand gestures based on an artificial neural network. Sensors 19(14), 3170 (2019)
8. Gonzalez-Ibarra, J.C., Soubervielle-Montalvo, C., Vital-Ochoa, O., Perez-Gonzalez, H.G.: EMG pattern recognition system based on neural networks. In: 2012 11th Mexican International Conference on Artificial Intelligence, pp. 71–74 (2012)
9. Kerber, F., Puhl, M., Krüger, A.: User-independent real-time hand gesture recognition based on surface electromyography. In: Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI 2017), pp. 1–7. Association for Computing Machinery, New York (2017)
10. Qi, J., Jiang, G., Li, G., Sun, Y., Tao, B.: Intelligent human-computer interaction based on surface EMG gesture recognition. IEEE Access 7, 61378–61387 (2019)
11. Khokhar, Z.O., Xiao, Z.G., Menon, C.: Surface EMG pattern recognition for real-time control of a wrist exoskeleton. BioMed. Eng. OnLine 9, 41 (2010)
12. Petrova, A.I., Voznenko, T.I., Chepin, E.V.: The impact of artifacts on the BCI control channel for a robotic wheelchair. In: Misyurin, S.Y., Arakelian, V., Avetisyan, A.I. (eds.) Advanced Technologies in Robotics and Intelligent Systems. MMS, vol. 80, pp. 105–111. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33491-8_12
13. Rhee, K., Shin, H.-C.: Electromyogram-based hand gesture recognition robust to various arm postures. Int. J. Distrib. Sensor Netw. (2018)
Specification Language Based on Linear Temporal Logic for Automatic Construction of Statically Verified Systems

Larisa Ismailova1, Sergey Kosikov2, Igor Slieptsov2, and Viacheslav Wolfengagen1(B)

1 National Research Nuclear University “Moscow Engineering Physics Institute”, Moscow 115409, Russian Federation
[email protected]
2 NAO “JurInfoR”, Moscow 119435, Russian Federation
Abstract. The given paper considers an approach to the construction of information systems whose interaction with the user can be described in the form of a small set of formal requirements. A means of formal specification is proposed in the form of a language that allows requirements to be expressed compactly and close to how they are formulated by the developer. The language is an extension of the language of linear temporal logic. The language support tools ensure the construction of a supporting environment that is sufficient for building the system and statically verifying its correctness. Based on the previously proposed approach to the automatic construction of systems, using the example of a model of asynchronous discrete interaction (question-answer) with the user, the paper demonstrates that a statically verifiable solution to the problem in the form of a function with a certain signature does exist. The specification language makes it possible to identify a variety of interaction protocols that can be implemented regardless of user actions.

Keywords: Information system · Formal specification · Linear temporal logic · Consistency · Verification

1 Introduction
Among the methods of developing information systems, development based on a formal specification, which specifies the interaction of the system with the environment and/or the user, is in wide use. If the interaction can be represented in the form of a sequential exchange of question-answer messages, the data on the implemented interaction can be written in the form of a protocol, which is a sequence of pairs of question-answer messages.
In this case the formal specification, by imposing a restriction on interaction, defines a predicate on a set of protocols. The given paper considers the problem as applied to asymmetric synchronous discrete interaction. This interaction model is widely used and is suitable for describing a large class of systems (remote procedure call (RPC) [2–4, 8], the LSP server [1], etc.). The paper examines the automatic construction of systems according to their formal specification, which imposes restrictions on the set of possible protocols of system interaction with the user that must be followed regardless of user actions [9]. The paper considers a variant of discrete linear temporal logic [5] with no quantifiers and terms.

1.1 The Problem
The paper [9] proposed the architecture of the system and the signature of the function interpreting the specification, the implementation of which guarantees the correctness of the system; that is, under any user actions, the interaction will meet the requirements of the specification. The correctness of the construction procedure is ensured by the derivability of the type of the interpreting function within the dependent type system. An extension of linear temporal logic was offered as the specification language. The given paper proves the existence of a function with such a signature by constructing it for a definite specification language. All constructions are made formally. The adopted formal approach allows implementation in a programming language with dependent types; its correctness is confirmed by the availability of a prototype in Idris 2. The article also suggests and develops directions for abstracting the implementation:

– according to the language of questions and answers – the ability to specialize the system interface for applied tasks;
– according to the semantics of atomic predicates – the ability to define a set of atomic predicates and their semantics in terms of an applied problem;
– according to the model of abstract generators – the ability to create additional semantic markup of questions that can be asked at the current stage.

1.2 Related Works
Approaches to solving this problem have been proposed for 20 years [6]. The known approaches impose restrictions on the type of formulas and provide no guarantees of the correctness of the constructed system, nor static verification of the implementation of the proposed model. The paper [10] studies how to determine the framework semantic models hidden behind a variety of external forms of information processes in the network. The goal is to correct illogical patterns, some of which are permanent and some of which change due to time/events.
The paper [7] examines the use of inheritance and linking mechanisms in developing an information model; this allows the model developer to extend the properties of a class. The distinction between these two closely related representations is established and used in aspect-oriented modeling. The paper [11] also considers the use of a marked-up metalanguage for interactive analysis of semantic processes. The given paper suggests a solution based on the architecture proposed in the paper [9].
2 Model of Asymmetric Synchronous Discrete Interaction with User
Asymmetric synchronous interaction in the question-answer mode consists of a sequence of stages, each of which is divided into two phases:

1. in the question phase, the first of the two interacting objects forms a question y ∈ Y as a word of some known question language Y and transmits it to the second object;
2. in the answer phase, the second of the interacting objects forms the answer v ∈ V(y) – a message of some known language V(y), which in the general case may depend on the asked question y – and transmits it to the first object.

Synchronous client-server interaction over the RPC protocol [4] or, in particular, over the LSP protocol [1] may serve as an example. In this example, the client initiates the interaction at each stage by sending a request and waits for a response from the server to continue its work. We will consider infinite interaction with a protocol

M = (y_0, v_0), (y_1, v_1), \ldots, \qquad M : \mathbb{N} \to \sum_{y \in Y} V(y). \qquad (1)
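The protocol of formula (1) is a sequence of dependent pairs: each answer must belong to the language V(y) of the question it follows. A minimal Python sketch of one interaction stage under this reading might look as follows; it is illustrative only, since the paper's prototype is in Idris 2, where the dependency of V on y is enforced by types rather than by a runtime check, and all names here are hypothetical.

```python
from typing import Callable, List, Tuple

Question = str
Answer = str
History = List[Tuple[Question, Answer]]   # H = (y0, v0), ..., (y_{n-1}, v_{n-1})

def run_stage(history: History,
              ask: Callable[[History], Question],
              answers: Callable[[Question], List[Answer]],   # models V(y)
              reply: Callable[[Question], Answer]) -> History:
    """One stage of the question-answer interaction of formula (1):
    the system forms a question y, the user returns some v in V(y)."""
    y = ask(history)            # phase 1: the system poses a question
    v = reply(y)                # phase 2: the user answers
    assert v in answers(y), "answer must belong to the language V(y)"
    return history + [(y, v)]
```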
3 Architecture of System
The paper [9] proposes the architecture of the system, parameterized by the specification φ. The system consists of a memory or database that stores the history of user interaction, the interpreting function for φ, and a control mechanism. The interpreting function [φ] = (q, ω) maps each specification φ to a control function q and a proof ω of its correctness. The control function

q(H) : \big( (H \models^{E} \varphi) \times G(H, \varphi) \big) + (H \models^{A} \varphi) + (H \models^{A} \neg\varphi) + 1 \qquad (2)
[Figure: state diagram of the modes U, A, ¬A and G with transitions on (y, v) pairs; legend: data produced by the system, input data the system consumes, branching on a predicate.]

Fig. 1. Diagram of possible modes and produced output data (generators and proofs) depending on the input data (history of interaction).
determines the mode of operation of the system at each stage according to the available history H = (y_0, v_0), ..., (y_{n-1}, v_{n-1}). In the generation mode the control function returns a question generator g : G(H, φ), which contains a non-empty set of questions L(g) ⊂ [Y]^+, and a proof

H \models^{E} \varphi \;\Rightarrow\; \exists M \, (HM \models \varphi) \qquad (3)

that there is a possible way to continue the interaction according to the specification (· ⊨ φ). In the case when the interpreting function can deduce from H that the specification will or will not be satisfied regardless of the following interaction, it returns the deduced proof

H \models^{A} \varphi \;\Rightarrow\; \forall M \, (HM \models \varphi). \qquad (4)

For some specifications it is decidable that no correct system can be generated (the last summand 1). The correctness proof

\omega(H) : \begin{cases} \forall y \in L(g)\, \forall v \, \big( q(H,(y,v)) : (H \models^{E} \varphi) \times G(H,\varphi) + H \models^{A} \varphi \big), & q(H) = (\alpha : H \models^{E} \varphi,\; g : G(\varphi,H)), \\ \forall y \, \forall v \, \big( q(H,(y,v)) : H \models^{A} \varphi \big), & q(H) = (\alpha : H \models^{A} \varphi), \\ \forall y \, \forall v \, \big( q(H,(y,v)) : H \models^{A} \neg\varphi \big), & q(H) = (\alpha : H \models^{A} \neg\varphi), \\ \text{True}, & q(H) : 1 \end{cases} \qquad (5)

expresses, via the Curry-Howard isomorphism, the restrictions on the modes available at the next stage depending on the mode at the current one. The control mechanism conducts the user interaction based on the value of the control function q(H) and, if q(H) has returned the question generator g, selects a question y ∈ L(g) to send to the user. The diagram in Fig. 1 illustrates the requirements for the implementation of the interpreting function [A] of an atomic predicate A. The diagram is a formal consequence of the type signature (2) and the required proof (5).
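The four summands of signature (2) form a tagged union: continue with a question generator, specification decided true, specification decided false, or no correct system. The following Python sketch mirrors that shape; it is illustrative only, since in the Idris 2 prototype the proof components (the α and ω of (2) and (5)) are dependent types, which a dynamically typed sketch necessarily erases, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class Generate:            # (H |=E phi) x G(H, phi): interaction can continue
    questions: List[str]   # the non-empty set L(g)

@dataclass
class AlwaysSat:           # H |=A phi: holds regardless of what follows
    pass

@dataclass
class AlwaysViolated:      # H |=A ~phi: fails regardless of what follows
    pass

@dataclass
class Rejected:            # the final summand 1: no correct system exists
    pass

ControlResult = Union[Generate, AlwaysSat, AlwaysViolated, Rejected]

def step(q: Callable[[list], ControlResult], history: list) -> list:
    """Control mechanism: if q returns a generator, pick a question from
    L(g) and extend the history; otherwise the interaction mode is fixed."""
    outcome = q(history)
    if isinstance(outcome, Generate):
        y = outcome.questions[0]   # any selection strategy is allowed
        v = input(y)               # the user's answer
        return history + [(y, v)]
    return history
```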
4 Conclusion
The paper formalizes the architecture proposed in the paper [9] and suggests a specification language based on linear temporal logic without quantifiers. The formal interpretation maps each specification to a pair (q, ω) of the function q, which is used as part of the system to select a question message at each stage, and the proof ω that the constructed system will interact with the user strictly according to the specification regardless of the user’s actions. Acknowledgements. This research is supported in part by the Russian Foundation for Basic Research, RFBR grants 20-07-00149-a, 19-07-00326-a, 19-07-00420-a.
References

1. Bünder, H.: Decoupling language and editor: the impact of the language server protocol on textual domain-specific languages. In: MODELSWARD, pp. 129–140 (2019)
2. Choi, K., Chang, B.M.: A theory of RPC calculi for client-server model. J. Funct. Program. 29, 1–39 (2019)
3. Choi, K., Cheney, J., Fowler, S., Lindley, S.: A polymorphic RPC calculus. Sci. Comput. Program. 197, 102499 (2020)
4. Cooper, E.E., Wadler, P.: The RPC calculus. In: Proceedings of the 11th ACM SIGPLAN Conference on Principles and Practice of Declarative Programming, pp. 231–242 (2009)
5. Emerson, E.A.: Temporal and modal logic. In: Formal Models and Semantics, pp. 995–1072. Elsevier (1990)
6. Fisher, M.: Concurrent MetateM – a language for modelling reactive systems. In: Bode, A., Reeve, M., Wolf, G. (eds.) PARLE 1993. LNCS, vol. 694, pp. 185–196. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-56891-3_15
7. Ismailova, L., Wolfengagen, V., Kosikov, S.: Hereditary information processes with semantic modeling structures. Procedia Comput. Sci. 169, 291–296 (2020)
8. Lampson, B.W.: Distributed Systems - Architecture and Implementation. An Advanced Course, held from March 4 to March 13, 1980 at the Technische Universität München (1981)
9. Slieptsov, I., et al.: Construction of statically verified system interacting with user in question-answer mode according to the specification set by the formula of linear temporal logic (2021)
10. Wolfengagen, V., Ismailova, L., Kosikov, S.: Capturing information processes with variable domains. Procedia Comput. Sci. 169, 276–283 (2020)
11. Wolfengagen, V., Ismailova, L., Kosikov, S., Babushkin, D.: Modeling spread, interlace and interchange of information processes with variable domains. Cogn. Syst. Res. 66, 21–29 (2021)
Semantic Management of Domain Modification in a Virtual Environment for Modeling Vulnerable Information Subjects

Larisa Y. Ismailova1, Viacheslav E. Wolfengagen1(B), and Sergey V. Kosikov2

1 National Research Nuclear University “MEPhI” (Moscow Engineering Physics Institute), Kashirskoe shosse, 31, Moscow 115409, Russia
[email protected]
2 NAO “JurInfoR”, Malaya Pirogovskaya street, 5, Moscow 119435, Russia
Abstract. The paper deals with the development of information technology tools for semantic stabilization of interaction in virtual environments. The basis of the approach is the development and systematic application of a semantic network of processes. Within the model, subjects who are vulnerable to targeted information influence are described. The paper proposes a semantic model of the virtual environment, which allows the emergence of a channel of potential undesirable effects to be registered. One of the distinctive features of the use of semantic information during conceptual modeling is the generation of a family of related concepts along pre-laid “trajectories”. In this regard, the conceptual design of a virtual environment is constructed as a sequence of several stages: (1) the virtual world is conceptually modeled using the generated semantic data; (2) the semantic data is used to generate families of concepts; (3) these families form synonyms of subject concepts. After generation, the main displaced concepts characteristic of the subject application of the virtual environment are formed. The paper offers a tool that provides control of domain modifications using handlers and adapters. Interaction with the environment is carried out by matching the selected network fragment (specified in the user interface) with a generalized domain network that implements the semantic interaction environment (the system and subject semantic components available in the environment). The system configuration of the environment is performed by specifying a system of states and possible transitions, which are controlled by a family of semantic handlers. The subject configuration of the environment is performed by means of families of semantic adapters.
Keywords: Virtual environment · Variable domain · Semantic stabilization
1 Introduction
The further development of methods for extracting and processing knowledge is one of the main directions in the development of information technologies. Variants of approaches that became classical in the 1970s are under intensive development; they are intended to capture such features of subject areas as the dependence of the properties of subject area objects on time and on the aspect of consideration, the traceability of the dynamic nature of subject areas, the possibility of different origins of objects and their histories, etc. [8]. Each method of describing the subject area requires an appropriate environment for its support. Thus, the task of developing virtual semantic modeling environments continues to be relevant.

The semantic modeling environment contains objects of various types. It is common practice to distinguish real, possible and virtual objects [3]. The most interesting are the virtual objects, introduced to increase order in the domain model. Such objects can occur both in the domain description language, to increase its uniformity, and in computations, as intermediate results. They are characterized by the fact that they can temporarily violate the restrictions imposed on the domain model.

Some objects of the model can reflect individual parts of the model as a whole in order to increase its adequacy, that is, to support a partial information model of the subject area within themselves. This property is an essential characteristic of such an object, since ordinary objects, as a rule, are arranged differently. In most methodologies for constructing models [8], when designing objects it is recommended to observe the principle of encapsulation, which ensures the independence of the internal structure and way of functioning of an object from its external environment. To support a partial information model, the subjects must interact with each other as well as with the objects of the model. Therefore, as a rule, a sign of activity is attributed to subjects. Active objects, while working with the model, are endowed with the capability to generate actions or initiate certain processes in the model [6]. In general, the support of partial information models of the subject area is an open problem that does not have a comprehensive solution within the framework of existing information technologies [4].

The technical means of expressing partial domain models mapped to subjects is the concept of a variable domain. The concept of a variable domain arises in the approach to modeling subject areas based on category theory. The domains, in this regard, correspond to the objects of the category lying at the model's basis, and the model elements correspond to the arrows of the category [5]. At the conceptual level a variable domain arises as a result of indexing model elements. Indexes can be considered as a means of assigning elements of the general model of the subject area to a particular class of partial models. As a rule, the index in this sense is denoted by the semantically neutral term “correlation point”. In category theory, the construction of indexing corresponds to the fibered product [8]. When assigning model elements to partial models or their classes, partial model elements arise. Contradictory partial models lead to the fact that the corresponding partial elements cannot be continued to complete ones.
The task of managing such elements is currently unsolved and can be considered an urgent problem in the development of information technologies.
2 Virtual Environments
The characterization of the considered semantic modeling (simulation) environments as virtual ones requires refinement. The simulation environment can be considered virtual because it provides modeling capabilities that are independent of the real supporting system. Such an environment actually implements an abstract machine that provides the modeling of subjects containing partial models of the subject area, and the transmission of messages from subjects to objects (including other subjects). The method of implementing a virtual machine presupposes the ability to develop and maintain virtual objects. In practice, virtual objects arise in the course of computations and can include both parts of real (actual) objects and parts introduced to establish order in the domain model. Such parts can be represented by special values or by procedures that are called to obtain a value based on the state of the entire environment as a whole.
3 Related Work
The task of semantic management of domain modifications is a special case of the task of including semantic information in virtual environments. Various approaches to solving this problem are known; the approaches that focus on the interaction of objects are the closest to the proposed one. It is known that the content of virtual environment support systems is usually intended for presentation to people and is therefore not suitable for machine access. The paper [1] considers 3D content services in virtual environments in connection with the extensibility of the basic system. The eXtensible Application Markup Language (XAML) is proposed as the base language for including external modules. The paper [7] proposes using ontologies to bring semantic information into virtual environments, which can also be used by intelligent virtual agents. An ontology-oriented semantic architecture is presented, which mainly supports the representation and interaction of complicated objects. The paper [9] notes that ontologies are used to solve terminological problems or to provide automatic processing of information. They are also used to improve software development. One of the promising areas of ontology application is virtual reality. The considered approaches demonstrate a variety of methods for interpreting semantic information in virtual environments [2]. At the same time, the integration of methods for modeling the semantics of the subject, including the construction of partial models of the subject area, with methods for modeling the interaction of subjects is, as a rule, not achieved. This task continues to be a pressing problem in the development of information technologies.
4 Task Setting
Consideration of various options for the interaction of subjects in a virtual environment, and of possible ways to modify the partial domain models associated with subjects, allows us to formulate requirements for the theoretical methods of describing information models based on variable domains, for the tools supporting virtual environments that provide domain modification management, and for the methods of their development.

Theoretical methods should provide the following:

– description of indexed domains;
– the ability to track the information history of domains;
– the ability to modify domains to ensure the integration of partial information models and the evaluation of the integration results.

The tools should provide the following:

– the ability to define elements of variable domains, including the information subjects containing partial information models; models can be represented as contexts;
– the ability to control the definition of domain elements in order to ensure their semantic coordination;
– the possibility of domain modification, including both domain expansion (when data is identified that extends the composition of elements or their structure) and narrowing (when inconsistencies between partial information models are identified).
5 Theoretical Model
The management of domain modification modeling can be provided on the basis of the variable domain formalism in category theory. We assume that the basic concepts of category theory are known (see, for example, [8]). The domain can be described as a functor U from the category Asg, which describes the structure of a set of partial domain models, to the base category C. The domain modification itself is described as a transition from the value of the functor U on one object of the category Asg to its value on another object. The transition is defined by a mapping traditionally called a constraint mapping (although it can describe not only a restriction but also an extension of elements). When defining the constraint mapping it is convenient to omit the indication of the functor U when it can be retrieved from the context. In accordance with this, the following notation can be introduced:

(Uf)(a) = af,

where f is an arrow of the category Asg and a is an element of the object U(dom f). The functorial character of the mapping may be expressed in the following way:

a(f ∘ g) = (af)g, \qquad a\,1_A = a,

where f and g are arrows of the category Asg and A is an object of the category Asg.
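To make the notation concrete, here is a toy Python sketch of a variable domain: an assignment of element sets to objects and of restriction maps to arrows, with the functor laws just stated checked over illustrative finite data. This is only a finite illustration of the formalism under assumed data, not the authors' implementation; mapping an element to None models a partial element that cannot be continued under narrowing.

```python
# Objects of Asg are named stages; arrows are (source, target) pairs.
domains = {
    "s0": {"a", "b", "c"},          # U(s0): the fuller partial model
    "s1": {"a", "b"},               # U(s1): a narrowed partial model
}

restrictions = {
    ("s0", "s1"): lambda a: a if a in domains["s1"] else None,
    ("s0", "s0"): lambda a: a,      # identity arrow 1_{s0}
    ("s1", "s1"): lambda a: a,      # identity arrow 1_{s1}
}

def apply(a, f):
    """af -- the action of arrow f on element a, i.e. (Uf)(a)."""
    return restrictions[f](a)

def compose(f, g):
    """Diagrammatic composition f . g: first f, then g."""
    assert f[1] == g[0], "arrows must be composable"
    return lambda a: apply(apply(a, f), g)

# Functor laws from the text: a(f . g) == (af)g and a 1_A == a.
f, g = ("s0", "s1"), ("s1", "s1")
for a in domains["s0"]:
    assert compose(f, g)(a) == apply(apply(a, f), g)
    assert apply(a, ("s0", "s0")) == a
```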
6 Tools
To practically support domain modification in a virtual modeling environment, the authors developed a virtual modeling environment (hereinafter referred to as the environment) that provides a description of partial information models. The environment was developed in the JavaScript language. The environment provides a two-level conceptual description of the subject area. The objects of the general domain model and the processes in which they participate are described at the lower level. The partial information models of the system subjects are described at the upper level. The proposed method provides a description of partial models in the form of graphical constructions called diagrams. The constructed environment was used in the practical development of training systems in the field of jurisprudence. A training system was constructed that simulates the actions of stakeholders in obtaining an integrated environmental permit. The implementation of the system confirmed the sufficient expressive capacity of the proposed methods and tools.
7 Conclusion
The paper characterizes vulnerable information subjects and traces the characteristics of a virtual environment for modeling such subjects. Domain modification management is offered as a method of modeling by subjects. A means of describing such subjects in the form of variable domains is proposed. Theoretical methods and tools for semantic management of domain modification are described. The proposed tools provide:

– description of vulnerable information subjects based on the concepts of process and context;
– support for a virtual environment of modeling subjects based on the concept of a diagram;
– support for domain modifications based on environment objects representing a variable context;
– domain modification management based on identifying the semantics of the context elements and considering the expansion and narrowing of the representing elements when integrating information models.

The proposed methods and tools were used to create models of the subject area in the field of jurisprudence. The use confirmed the practical applicability of the described methods and tools.

Acknowledgements. This research is supported in part by the Russian Foundation for Basic Research, RFBR grants 20-07-00149-a, 19-07-00326-a, 19-07-00420-a.
References

1. Alahmad, R., Robert, L.P.: Capturing the complexity of cognitive computing systems: co-adaptation theory for individuals, pp. 93–95. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3458026.3462148
2. Angles, R., Arenas, M., Barceló, P., Hogan, A., Reutter, J., Vrgoč, D.: Foundations of modern query languages for graph databases. ACM Comput. Surv. 50(5), 68:1–68:40 (2017). https://doi.org/10.1145/3104031
3. Ismailova, L., Kosikov, S., Wolfengagen, V.: Prototype mechanisms for supporting the network of links to parameterized data objects. Procedia Comput. Sci. 190, 317–323 (2021). https://doi.org/10.1016/j.procs.2021.06.042
4. Ismailova, L., Wolfengagen, V., Kosikov, S.: Cognitive system to clarify the semantic vulnerability and destructive substitutions. Procedia Comput. Sci. 190, 341–360 (2021). https://doi.org/10.1016/j.procs.2021.06.044
5. Ismailova, L., Wolfengagen, V., Kosikov, S.: A mathematical model of the feature variability. Procedia Comput. Sci. 190, 312–316 (2021). https://doi.org/10.1016/j.procs.2021.06.041
6. Ismailova, L., Wolfengagen, V., Kosikov, S.: A semantic model for indexing in the hidden web. Procedia Comput. Sci. 190, 324–331 (2021). https://doi.org/10.1016/j.procs.2021.06.043
7. Vishal, J., Akash, T., Jaspreet, S., Arun, S. (eds.): Cognitive Computing Systems: Applications and Technological Advancements, 1st edn. Apple Academic Press, Cambridge (2021)
8. Lawvere, F.W., Schanuel, S.J.: Conceptual Mathematics: A First Introduction to Categories. Cambridge University Press, Cambridge (1997)
9. Xie, Y., Ravichandran, A., Haddad, H., Jayasimha, K.: Capturing concepts and detecting concept-drift from potential unbounded, ever-evolving and high-dimensional data streams. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, C.J. (eds.) Data Mining: Foundations and Practice. Studies in Computational Intelligence, pp. 485–499. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78488-3_28
Semantic Stabilization Tools for Managing the Cognitive Activity of the Subject

Larisa Ismailova1, Sergey Kosikov2, Igor Slieptsov2, and Viacheslav Wolfengagen1(B)

1 National Research Nuclear University “Moscow Engineering Physics Institute”, Moscow 115409, Russian Federation
[email protected]
2 NAO “JurInfoR”, Moscow 119435, Russian Federation
Abstract. The paper considers a model of knowledge extraction based on the conceptual modeling of user interaction with a domain-oriented virtual environment. The environment is modeled as a network of information graphs that changes its structure over time. This allows us to pose the task of supporting the modeling of the cognitive activity of the subject in a changing environment. Changes are described on the basis of a parameterized computational model using the construction of a variable domain. The paper shows that a given set of variable domains can be considered as embedded in a topos, which provides a natural construction of program structures. The paper offers a tool for working with variable domains, which provides the specification of semantically stable fragments of information graphs. The tool is a specialized applicative-type evaluator that provides computations in a changing environment. The build-up of network vertices and connections can lead to system contradictions, whose resolution requires the inclusion of special handlers, the number of which can grow excessively. The evaluator provides a solution to the problem of managing handlers. Attempts to resolve the mentioned difficulties and contradictions in practice lead to the idea of a multi-layer network architecture and semantic adapters.
Keywords: Virtual environment · Variable domain · Semantic stabilization

1 Introduction
The problem of conceptual modeling of activity, including cognitive activity, continues to attract the attention of researchers. The difficulty of modeling activity stems not least from the difficulty of choosing a conceptual basis for its description [6].
As a rule, interaction in some environment is considered, which leads to the construction of virtual interaction environments. The environment, as a rule, allows the display of semantic information, which is used in describing the interaction. However, the choice of an adequate conceptual basis for describing semantics is still an open problem.

The virtuality of the environment is understood as the possibility of modeling by creating virtual contexts independent of the real supporting system, and placing the modeled objects in the appropriate contexts, which provides management of their interaction. In practice, the virtuality of the environment is ensured by the introduction of a special class of objects: virtual objects. Such objects may violate the integrity constraints imposed on system objects. The best-known example is floating-point arithmetic in the IEEE 754-2008 standard, which includes the representation of positive and negative zero, positive and negative infinity, as well as Not-a-Number values [8].

Of particular interest are models of subject area objects acting on the basis of their existing knowledge in accordance with accepted goals. Such objects will be further referred to as subjects. An essential characteristic of a subject is that it has a certain model of the subject area, on the basis of which it plans and carries out its actions. Such a model may, in principle, coincide with the model of the subject area as a whole, but in general it is partial, i.e., it reflects only part of the full model of the subject area.

The replenishment of a partial model for a specific subject can be considered from the standpoint of cognitive science. In general, cognitive activity is understood as “grasping” and fixing meaning. The results of cognitive activity can be connected with the formation of a system of meanings (concepts) related to information about the actual or possible state of things in the world [1].

The environment can be considered in different ways. The given paper considers the environment as a network of information graphs consisting of vertices and arcs (links) [9]. The partial model used by the subject has a similar structure. As time passes, the network structure may change: new connections may appear, and old connections may lose relevance. Partial models in such cases may no longer correspond to the general model, which leads to the appearance of inconsistency in the models. The process of restoring consistency will be considered as a process of semantic stabilization.
2 Related Work
Semantic stabilization for cognitive activity management is a particular problem within the sphere of supporting semantically oriented models of cognitive activity in virtual environments. The activity considered in this paper includes interaction with the user. Similar problems have been researched both in connection with the construction of semantically oriented models and in connection with the construction of virtual environments. The WordNet project [2] is of interest because its subject area model is a network of concepts, represented by words, with synonymy relationships established on them.
However, the poverty of the types of connections between concepts limits the possibilities of using the model for describing practical subject areas. Another possible approach is taken in the paper [4], which presents a framework of multimodal interaction for semantic manipulation of 3D objects in virtual reality. However, the paper treats the virtual environment narrowly as an environment for interaction with 3D objects, which makes it difficult to generalize the adopted approach to a wider class of subject areas. The same direction of research is seen in the paper [5], which focuses on assistance in 3D interaction, adding adaptability depending on the tasks, goals and general context of interaction. However, here the understanding of the virtual environment is limited to spatial relationships. A different approach is applied in the paper [3], where the concepts of semantic stability are used for natural language processing. However, the model used is strictly oriented to natural language processing, which makes it difficult to generalize. On the whole, the existing approaches to ensuring semantic stability when modeling the activity of a subject in a virtual environment focus either on providing semantically meaningful work in a virtual environment of a specific type (usually a 3D interaction environment), or on ensuring semantic stability with a rigidly defined domain model. For this reason, the problem of semantic stability under a subject area model of a sufficiently general type continues to be critical.
3 Setting the Problem
The adopted tools for modeling the subject area in the form of a network of information graphs make it possible to consider the cognitive activity of the subject as a restructuring of the partial model of the subject area associated with the subject. The model of cognitive activity is presented in the form of means of systematic, purposeful generation of network vertices (as well as other network elements: links, marks, etc.). Systematicity is understood as the property whereby the information graphs, in their turn, control the creation of vertices, which allows describing, in particular, strategies for knowledge increment. Purposefulness is understood as a hierarchical organization of the system in which the higher-level elements define goals for the lower-level ones. A specific configuration of the network, or a part of it, may be mapped to an expression of a specialized language, which is a variant of the language of higher-order intuitionistic logic. Support for the signification of language expressions is needed in relation to a particular state of the network or a sequence of such states. The sequence is also set by means of the network. The accepted approach makes it possible to formulate requirements for tools supporting work with variable domains, which ensure the specification of semantically stable fragments of information graphs. The tool is a specialized applicative-type evaluator that provides computations in a changing environment. The evaluator must provide:
– manipulation of conceptual graphs, including the creation of vertices and connections, their modification, change of markings, and deletion;
– support for embedding one conceptual graph into another as a vertex;
– creation of subgraphs embedded in various contexts;
– recursive embedding, which allows a subgraph to be embedded into itself.

The description of changes in the subject area model requires adequate methods. The paper adopts the approach developed earlier in [7], based on the concept of a variable domain. The language and methods of category theory are used essentially for the description [10]. A variable domain is understood as a functor from the category Asg, which formally defines the models of subjects and their interrelationships, to the basic category C.
4 Tools
The modeling of elements of the cognitive activity of the subject involves both modeling the transmission of messages containing semantically meaningful fragments of the network, and performing computations that allow some fragments of the network to be derived from others. In accordance with this, the proposed tools are grouped into two libraries: ProcessJS and LambdaJS. The libraries are written in the ECMAScript 5th Edition dialect and use the AMD/RequireJS module system. The LambdaJS library provides a Turing-complete computing system based on the λ-calculus, with explicit and serialized storage of computational expressions and with the ability to integrate with the JavaScript computing environment. The library is designed to extend the Turing-complete functional applicative programming language (hereinafter referred to as Lambda) with transformational semantics, whose expressions are serializable. The ProcessJS library provides a computing system that supports defining and performing a set of sequential processes that interact both with each other and with an external user. Processes are defined as an inductive class in a specialized process language that allows a specific set of processes to be represented in the form of a recursively constructed term. The language provides a set of basic processes that ensure different types of user interaction and computations (computations are described by means of the supporting programming system). Tools for process composition are also provided; they serve to obtain complex processes from simpler ones.
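As an illustration of serialized storage of computational expressions, the following Python sketch (the actual libraries are written in JavaScript) represents λ-terms as plain dictionaries, so that any term, including the closures that capture free-variable values, remains serializable. All names are illustrative, not the LambdaJS API.

```python
import json

# Terms are plain dictionaries, so that a computational expression can be
# stored and transmitted in serialized form.
def Var(name):        return {"tag": "var", "name": name}
def Lam(param, body): return {"tag": "lam", "param": param, "body": body}
def App(fun, arg):    return {"tag": "app", "fun": fun, "arg": arg}

def evaluate(term, env=None):
    """Environment-based evaluation; closures keep the values of free
    variables, matching the static-binding discipline described above."""
    env = env or {}
    tag = term["tag"]
    if tag == "var":
        return env[term["name"]]
    if tag == "lam":
        return {"tag": "closure", "param": term["param"],
                "body": term["body"], "env": dict(env)}
    if tag == "app":
        fun = evaluate(term["fun"], env)
        arg = evaluate(term["arg"], env)
        new_env = dict(fun["env"], **{fun["param"]: arg})
        return evaluate(fun["body"], new_env)
    raise ValueError(tag)

# The identity applied to itself; the whole term is JSON-serializable.
term = App(Lam("x", Var("x")), Lam("y", Var("y")))
print(json.dumps(term))   # serialized storage of the expression
print(evaluate(term))     # a closure representing \y. y
```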
5 Support to Multi-layer Architecture
The networks of information graphs appearing in problems of supporting systems for modeling cognitive activity of subjects usually have a multi-layer structure. At the top level of the network there are information graphs that carry out goal setting. At the next level there are graphs that organize the computation process. At the lower level there are graphs that work with data taking into account their syntactic and semantic features. The LambdaJS and ProcessJS libraries ensure
the support for a multi-layer architecture, respectively, by maintaining functional mechanisms and mechanisms for supporting high-level operations on processes. In particular, the possibility of forming closures, which are serializable objects containing the values of free variables, provides both interaction between the layers of the network and the ability to configure the system.
6 Semantic Adapters
Support for the multi-layer architecture of the system depends critically on mechanisms for matching fragments of information graphs with data. Special program components, semantic adapters, are used for this purpose. Semantic adapters are components associated with real data; they serve to support the representation of the corresponding virtual data consistent with a common semantic model. The LambdaJS and ProcessJS libraries provide support for semantic adapters. In particular, one of the standard support constructions is the ability to call a specific processing function in accordance with the type of the received argument. In the general case, it is possible to additionally analyze an argument and generate an adapter in the form of a composition of basic adapters, which are performed by LambdaJS.
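The two constructions just mentioned, dispatch by argument type and composition of basic adapters, can be sketched as follows. This is a Python sketch with illustrative names and data shapes (the common semantic model is simplified to tagged dictionaries), not the libraries' actual interfaces.

```python
# Each basic adapter turns one kind of real datum into a virtual
# representation consistent with the common semantic model.
basic_adapters = {
    int:  lambda x: {"kind": "number", "value": float(x)},
    str:  lambda x: {"kind": "text", "value": x.strip()},
    list: lambda x: {"kind": "collection", "items": [adapt(i) for i in x]},
}

def adapt(datum):
    """Call the processing function that matches the argument's type."""
    handler = basic_adapters.get(type(datum))
    if handler is None:
        raise TypeError(f"no semantic adapter for {type(datum).__name__}")
    return handler(datum)

def compose(*adapters):
    """Build a compound adapter as a composition of basic ones."""
    def composed(datum):
        for a in adapters:
            datum = a(datum)
        return datum
    return composed

print(adapt([1, " two "]))   # nested dispatch over a heterogeneous list
```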
7 Conclusion
The paper considers a model of the cognitive activity of the subject, presented as a set of ways to expand the subject area model based on information graphs. It shows that such an extension, generally speaking, can lead to a contradictory model of the subject area, and formulates the requirement of semantic stability as the absence of contradictions during the expansion. Variable domains were used to describe the extension of the model. The LambdaJS and ProcessJS libraries were presented as tools to support the processing of the subject area model.

The Lambda library provides the following:

– input of constructions of a typed lambda-calculus;
– checking the types of objects;
– processing of closures, which makes it possible to transfer the values of free variables in accordance with the discipline of static binding;
– performing higher-order operations, such as map, reduce, filter;
– serialization of objects.

The ProcessJS library provides the following:

– representation of user interaction as a set of conceptually parallel processes;
– a set of basic processes of interaction with the user, producing and receiving information;
– basic process management tools (start, stop, pause, resume);
– tools for combining processes to obtain composite processes;
– process analysis tools.
The developed tools were used to support a network of information graphs with a multi-layer architecture and to develop semantic adapters that provide semantic stability when building up a semantic network. Acknowledgements. This research is supported in part by the Russian Foundation for Basic Research, RFBR grant 20-07-00149-a.
Intelligent Web-Application for Countering DDoS Attacks on Educational Institutions

Ivanov Mikhail1, Radygin Victor2(B), Sergey Korchagin1, Pleshakova Ekaterina1, Sheludyakov Dmitry3, Yerbol Yerbayev4, and Bublikov Konstantin5
1 Financial University Under the Government of the Russian Federation, Shcherbakovskaya, 38,
Moscow, Russian Federation {MNivanov,SAKorchagin,ESPleshakova}@fa.ru 2 National Research Nuclear University “MEPHI”, 31 Kashirskoe Shosse, Moscow 115409, Russian Federation [email protected] 3 Yuri Gagarin State Technical University of Saratov, Polytechnic, 77, Saratov, Russian Federation 4 Zhangir Khan West Kazakhstan Agrarian-Technical University, Uralsk, Republic of Kazakhstan 5 Institute of Electrical Engineering of the Slovak Academy of Sciences, Dubravska Cesta 3484/9, Bratislava, Slovakia [email protected]
Abstract. The work is devoted to the development of an intelligent system for preventing DDoS attacks on educational institutions using blockchain technology. The principles of developing decentralized applications with smart contracts were studied. A model of a system for countering DDoS attacks using blockchain has been developed, and a new architecture of an intelligent system is proposed that uses the blockchain to counter cyber-attacks such as distributed denial of service. A computational experiment emulating DDoS attacks on educational institutions was carried out, and the proposed intelligent system was compared with the traditional countermeasures against DDoS attacks used by educational institutions to ensure information security. The advantages of the intelligent blockchain system for preventing DDoS attacks on educational institutions are established. The results will be useful to specialists and researchers both in the field of information security and in the field of social and political sciences.

Keywords: Information security · DDoS attack · Artificial intelligence · Education · Web-application · Digital technologies
1 Introduction

Modern realities require interaction with information systems of various scales in all spheres of human life and society as a whole. With the growth of the volume and significance of the processed information, the problems of ensuring information security,
in particular for educational institutions, become a priority. Educational institutions, as objects of information security, have a number of distinctive features: a wide range of DDoS attack volumes (100–500 Gb/s), multiple levels of attack (L1–L7 in the OSI network model), a variety of exploited protocols (NTP, DNS, SNTP, HTTP, Chargen, SSDP, etc.), and a large scale of impact and consequences [1–3]. There are many works and applied tools designed to counter DDoS attacks, including solutions that organize a coordinated response to DDoS attacks across networks; the development of such solutions is an active and promising area. Despite the large number of traditional tools for protecting against DDoS attacks, the number of victims of such attacks increases every year [4]. Currently, there is growing interest in developing decentralized information systems to solve the information security problems of educational institutions [5]. In such systems, the integrity and protection of information is maintained using blockchain technology, in contrast to traditional systems, where each attack target has its own protection system and an ongoing attack is detected independently within a single domain network. Protection systems based on blockchain technology and smart contracts provide the necessary mechanism without requiring a new protocol and allow information about ongoing attacks to be exchanged in a fully distributed and automated mode. The foregoing determines the relevance of the work, as well as its goals and objectives.

The aim of this work is to develop an intelligent system based on blockchain technology that will protect the information networks of educational institutions from attacks such as distributed denial of service (DDoS). The research objectives include a comparative analysis of existing blockchain algorithms in relation to information security tasks, modeling a system of protection against DDoS attacks using blockchain technology, and developing a program that implements a coordinated response to DDoS attacks.
2 Overview of Methods for Preventing DDoS Attacks

Once a DDoS attack is detected, often nothing can be done except to fix the problem manually and disconnect the victim's system from the network. DDoS attacks tie up many resources, for example, processor power, network bandwidth, memory, processing time, etc. The main goal of any DDoS defense mechanism is to detect attacks as early as possible and to stop them as close to their sources as possible. DDoS protection schemes are divided into four classes depending on the deployment location: source, victim, intermediate routers, and distributed or hybrid defense mechanisms [6]. The advantages and disadvantages of all considered approaches are shown in Table 1. Let us take a quick look at each of these methods.

Defense mechanisms installed on the side of the attack source. In this type of DDoS protection, tools are deployed at the source of the attack to prevent network users from launching DDoS attacks. With this approach, source devices identify malicious packets in outbound traffic and filter or restrict the traffic. Detecting and preventing a DDoS attack at the source is the best possible defense, since legal traffic suffers minimal damage [7–9].
Table 1. Comparison of DDoS prevention methods.

| Defense mechanism | Advantages | Limitations |
|---|---|---|
| Protection installed on the attack source side | Detecting and stopping DDoS attacks at the source provides the best protection, since legal traffic suffers minimal damage; the volume of traffic to be inspected at the source is minimal, so detection and prevention require fewer machine resources | Detecting a DDoS attack is difficult because the sources are widely distributed across the network and any single source may carry near-normal traffic; deploying the system at every source is complex |
| Protection installed on the victim side | Detecting a DDoS attack is relatively easy due to the high volume of available resources; the most practicable type of protection scheme for web servers, since servers providing critical services always try to protect their resources | During DDoS attacks the victim's resources, such as gateway network capacity, are often overloaded, and these approaches cannot stop traffic on its way to the victim; the attack is detected only after it reaches the victim, when legitimate clients may already have been rejected, which makes the approach barely applicable in practice |
| Protection installed on intermediate routers | Detecting and tracing the sources of attacks is easy in this approach through the joint operation of several routers; all traffic is aggregated, i.e., both attacker and legitimate packets arrive at the router, which is the best place to rate-limit all traffic | The main difficulty of this approach is deployment: to achieve full detection accuracy, all routers on the Internet would need to follow the detection scheme, since its unavailability on even one router can cause the detection and tracing process to fail; full practical implementation is extremely difficult, as it requires reconfiguring all routers on the Internet |
| Distributed or hybrid protection | Detection can be implemented on the victim's side, and a response can be initiated by the victim and extended to other nodes; distributing detection and mitigation methods across different ends of the network may be more beneficial | Close cooperation is required between deployment points; complexity and overhead arise from cooperation and communication between distributed components scattered all over the Internet |
Defense mechanisms installed on the victim's side of the attack. In this type of DDoS protection, the victim detects, filters, or rate-limits malicious incoming traffic on the routers of victim networks, that is, networks that provide web services. Legal and attack traffic can be distinguished using either misuse-based or anomaly-based intrusion detection [10]. However, attack traffic reaching the victim can disrupt or degrade the quality of service and drastically reduce the available bandwidth [11].

Protection mechanisms installed on intermediate routers. Any router on the network can independently attempt to detect malicious traffic and filter or rate-limit it. It can also adjust the balance between detection accuracy and the bandwidth consumed by the attack [11]. Locating and tracing the sources of attacks is made easy by the joint operation of multiple routers on the network. At this point of defense, all traffic is aggregated, i.e., both attacker and legitimate packets arrive at the router, and this is the best place to rate-limit all traffic [12].

Distributed or hybrid defense mechanisms. This type of protection may be the best strategy against DDoS attacks. Hybrid defense mechanisms are deployed (or their components are distributed) in multiple locations, such as the attack source, the victims, or intermediate networks, and interaction is usually carried out between the points of deployment [13]. Router mechanisms are best suited for rate-limiting all types of traffic, while victim-side mechanisms can accurately detect attack traffic in a mix of legitimate and attacking packets. Therefore, using this DDoS protection strategy may be more beneficial [14].

The method proposed in this work can be classified as a distributed or hybrid defense mechanism against DDoS attacks. A distinctive feature of the proposed method is the use of blockchain technology, in particular, the Stratis platform for developing decentralized applications.
3 Development of an Intelligent System for Countering DDoS Attacks Using Blockchain

3.1 A Model for Reducing the Impact of DDoS Attacks on Educational Institutions

The proposed model for mitigating the impact of DDoS attacks on the network is based on combining the advantages of blockchain technology and software-defined networking
(SDN). The flexibility of configuring SDN solutions within domains makes it possible to quickly implement changes in network policy without sacrificing performance, and the cross-domain interaction capabilities of decentralized blockchain-based applications make it possible to organize a fast and effective automatically reconfiguring network that combats attacks both across domains and within one network domain.

Consider the following scenario of a DDoS attack on educational institutions (see Fig. 1). A web server hosted in Autonomous System (AS) C is exposed to a DDoS attack from devices located in different domains (AS A, B, and C). Without a coordinated approach to preventing DDoS attacks between domains, the web server relies on the protection mechanisms implemented in the network where it is located, which in most cases is far from the sources of the parasitic traffic, with the consequence that several domains are overloaded at once.

To organize coordinated counteraction to DDoS attacks between users and ASs, it is necessary to develop a smart contract and publish it to a blockchain network with smart contract support. As soon as one of the users or ASs detects a DDoS attack on the network, it sends the IP address from which the parasitic traffic originates to the agreed smart contract. As described earlier, creating a new block in the Stratis network takes 16 s; therefore, after the called smart contract is processed, the ASs subscribed to it will receive an updated blacklist of addresses and confirm the fact of the attack if a received address matches a source of parasitic traffic.
Fig. 1. Scenario of a DDoS attack on educational institutions.
As soon as the rest of the ASs receive the list of attackers and confirm that an attack is taking place, each of them should run its own mitigation strategies in accordance with the security policies and mechanisms available in its domain. In addition, an AS can block
malicious traffic near the place of its origin, which is the best solution for the Internet in general, since it reduces the total cost of forwarding traffic whose packets, in the event of a DDoS attack, consist mainly of useless and heavy data. In a setting of cross-domain interaction, nodes participating in joint protection can, upon receiving information about attacks, take measures in accordance with their security policies. However, in this case a reward mechanism for helping to prevent the attack is needed in order to encourage each participant to cooperate.

3.2 A System Model for a Coordinated Response to DDoS Attacks

As the number of DDoS attacks continues to increase and their nature continues to vary, the need for a coordinated response that effectively mitigates attacks also increases. It should be noted that cooperation between users and ASs is an important complementary approach to existing protection mechanisms.
Fig. 2. System model based on SDN architecture.
The proposed architecture consists of three components (see Fig. 2):

– users/clients: send white and black lists of IP addresses to the smart contracts of the Stratis blockchain;
– autonomous systems: send their own white and black lists of IP addresses to the blockchain, apply the lists obtained from the blockchain, and implement mechanisms to counter DDoS attacks;
– blockchain/smart contract: a decentralized application in the open Stratis network, implemented in C#, which maintains the logic of forming common black and white lists of IP addresses based on messages from the participants in a coordinated response to DDoS attacks.

The architecture is built on the following principles:

1. DDoS detection and prevention measures are provided as services, through mechanisms defined either in the AS or through third-party services;
2. To transfer or send information about an attack to the domain, a node connected to the blockchain must be selected. This can be dedicated hardware used solely for this purpose, or a node virtualized with SDN to reduce resource consumption;
3. For an effective coordinated response to attacks, the blockchain-connected modules deployed on clients and ASs must regularly query the smart contract for changes in the lists of IP addresses, as sketched below;
4. Entries in the lists of the smart contract are updated only by the participants in the coordinated response, by way of Proof of Ownership;
5. There is no prescription on how to react to the fact of a DDoS attack. After notification of an attack by a client that has verified its identity, countermeasures are determined in accordance with the established domain security policies.
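The sketch below illustrates principle 3: a module on a client or AS that periodically queries the contract for list changes. Stratis full nodes expose an HTTP API for read-only ("local") contract calls, but the endpoint path, port, and payload fields shown here are assumptions for illustration; consult the Stratis full-node API documentation for the actual schema, and note that the contract address and method name are placeholders.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

// Illustrative sketch of principle 3: poll the blockchain node for the
// current blacklist and hand the result to local traffic rules.
class BlacklistPoller
{
    // Port and endpoint are assumptions; adjust to the node's actual API.
    static readonly HttpClient Http = new() { BaseAddress = new Uri("http://localhost:37221") };

    static async Task Main()
    {
        while (true)
        {
            // Hypothetical read-only ("local") call to the smart contract.
            var response = await Http.PostAsJsonAsync("/api/SmartContracts/local-call", new
            {
                contractAddress = "<contract address>",   // placeholder
                methodName = "GetBlacklist",              // hypothetical method
                sender = "<participant address>",         // placeholder
                gasPrice = 100,
                gasLimit = 50_000
            });
            string blacklist = await response.Content.ReadAsStringAsync();

            // In a real deployment this would update SDN/firewall rules.
            Console.WriteLine($"Updated blacklist: {blacklist}");

            // Stratis creates a block roughly every 16 s, so polling faster
            // than that gains nothing.
            await Task.Delay(TimeSpan.FromSeconds(16));
        }
    }
}
```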
3.3 Development of a Smart Contract

A logical diagram of the developed smart contract is shown below (see Fig. 3); the contract is deployed as an additional solution alongside the existing mitigation mechanisms for DDoS attacks. Domain networks connected to the system must implement the principles described above.

Fig. 3. Block diagram of the smart contract being developed.
First, each connected domain must create a smart contract in the Stratis network containing either a personal IP address (for example, for an individual user) or a list of addresses certified by a membership certificate. For users, such a certificate can be created by the AS in which they work, while a connected AS must request such a certificate. The issuance of certificates is the only centralized element of the architecture. In order to track the participation of the various networks and identify the relevant smart contracts, they must be registered in a corresponding registry, which is itself a smart contract registered in the Stratis blockchain when the interaction is organized.

Traffic arriving both at a user and at an AS can be analyzed and filtered using monitoring tools. A blockchain-connected device that downloads and publishes domain-specific whitelists and blacklists can be deployed as a security add-on. Traffic analysis
at the gateway is carried out using SDN, so the approach relies on a monitoring framework based on the OpenFlow protocol.

The owner of the account that created the smart contract can add other addresses to the contract, thereby granting them the right to add entries to the list of IP addresses. Before an address is added, it is checked against the parent subnet. Each IP address added to the list carries the validity period of the applied rules, which is saved as an additional parameter in the record. Time is measured in created blocks, and access to the stored data is public: it can be viewed by anyone. Before the resulting list of IP address pairs (source/destination) can be obtained, ownership must be proven, i.e., participation in the coordinated response must be confirmed. After that, any AS can use the resulting list to adjust the rules for processing traffic in its subnet.
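A minimal sketch of this logic as a Stratis smart contract in C# is shown below. It is not the authors' contract (whose source is referenced but not reproduced in this paper); it only illustrates the behavior just described, namely owner-controlled participation, entries with a validity period measured in blocks, and publicly readable state, using the Stratis.SmartContracts base class. The method and key names are illustrative, and the subnet check is left as a comment.

```csharp
using Stratis.SmartContracts;

public class DdosBlacklist : SmartContract
{
    // The deploying account becomes the owner - the only centralized role.
    public DdosBlacklist(ISmartContractState state) : base(state)
    {
        PersistentState.SetAddress("Owner", Message.Sender);
    }

    private Address Owner => PersistentState.GetAddress("Owner");

    // The account that created the contract can authorize other addresses
    // to add entries to the IP address list.
    public void AddParticipant(Address participant)
    {
        Assert(Message.Sender == Owner, "Only the owner adds participants.");
        PersistentState.SetBool($"Participant:{participant}", true);
    }

    // A participant reports an attacking IP together with the validity
    // period of the rule, measured in created blocks.
    public void ReportAddress(string ip, ulong ttlBlocks)
    {
        Assert(PersistentState.GetBool($"Participant:{Message.Sender}"),
               "Sender is not a registered participant.");
        // A full contract would also check `ip` against the caller's
        // parent subnet here, as the text describes.
        PersistentState.SetUInt64($"Expires:{ip}", Block.Number + ttlBlocks);
    }

    // Stored data is public; an entry counts only until its block expires.
    public bool IsBlacklisted(string ip)
    {
        return PersistentState.GetUInt64($"Expires:{ip}") > Block.Number;
    }
}
```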
4 Computational Experiment Emulating DDoS Attacks on Educational Institutions

To set up a computational experiment emulating DDoS attacks on educational institutions, component virtualization technology was required. Virtualization is now a consolidated technology in modern computing; the advances of the last decade have made virtualization of computing platforms commonplace in information systems. Hypervisor technology allows different virtual machines to share the same hardware resources. Virtual resources were acquired in the MS Azure cloud environment, whose services range from computing to data storage systems. This approach made it possible to harness large computing power at relatively low research cost. A strength of component virtualization is that virtual machines can easily be moved from one physical server to another, and new configurations can be created or shut down on demand, which allows services to be provided through flexible and simple management tools.

To carry out the computational experiment emulating DDoS attacks on educational institutions, virtual machines from Microsoft Azure were used; to emulate network equipment, the Azure IoT Central service was used (see Fig. 4). A disadvantage of this approach is that the virtual network infrastructure of MS Azure cannot support arbitrary network topologies and addressing schemes: a user who has rented resources cannot simultaneously configure both the computing nodes and the network topology. For our computational experiment, however, this is not a significant drawback and does not prevent the research.

Two technical solutions were tested: a "traditional" system for countering DDoS attacks and the proposed intelligent blockchain system. For the intelligent blockchain system, setting up the computational experiment requires implementing smart contracts. To implement a smart contract with the logic presented earlier, one needs to download a copy of the Full Node source code files from the official Stratis repository to a working PC [13].
Fig. 4. Screenshot of successful deployment of virtual infrastructure for computational experiment in MS Azure.
The smart contract was developed by implementing a class (a special data type in the .NET Core platform) inherited from the SmartContract class. Using inheritance in .NET Core allows existing functionality to be extended safely. The source code of the developed smart contract is presented in (44). To check the code for syntax errors, run:

cd [YOUR_PATH]/src/Stratis.SmartContracts.Tools.Sct
dotnet run -- validate [CONTRACT_PATH] -sb

where [CONTRACT_PATH] is the location of the smart contract on the virtual machine disk. If, at the end of the command execution, the result of the smart contract compilation is displayed in the console window in the form of bytecode, then the verification was successful and the smart contract is ready to be published on the blockchain. Figure 5 shows a screenshot of a successful smart contract compilation.

After implementing the smart contract, both systems (the traditional and the intelligent blockchain system) can be tested [15–18]. Testing was carried out at the following levels of the OSI model (see Table 2). To emulate DDoS attacks on educational institutions, an application was developed in Microsoft Power Apps. We limited ourselves to the following attack methods: ICMP flood, SYN flood, HTTP flood, and application flood. According to statistics [19–21], these methods are the most common in cyber-attacks on educational institutions. The DDoS attack parameters used in the computational experiment were a duration of 4 h and a traffic volume of 100 Gb/s.
Fig. 5. Smart contract compilation result.
Table 2. Analysis of DDoS attacks by levels of the OSI model.

| Level | Protocols | Technologies used by DDoS |
|---|---|---|
| L7 – application | FTP, HTTP, POP3, SMTP and the gateways that use them | PDF GET requests, HTTP GET, HTTP POST (website forms: login, photo/video upload, feedback confirmation) |
| L6 – presentation | Compression and coding protocols (ASCII, EBCDIC) | Forged SSL requests: checking SSL-encrypted packets is very resource-intensive, so attackers use SSL for HTTP attacks on the victim's server |
| L5 – session | I/O protocols (RPC, PAP) | A Telnet protocol attack exploits weaknesses in the Telnet server software on the switch, making the server unavailable |
| L4 – transport | TCP, UDP protocols | SYN flood, Smurf attack (an attack with ICMP requests carrying spoofed addresses) |
| L3 – network | IP, ICMP, ARP, RIP protocols and the routers that use them | ICMP flood: DDoS attacks at the third layer of the OSI model that use ICMP messages to overload the bandwidth of the target network |
5 Results

Figure 6 shows the results of the computational experiment. The graph shows that the proposed intelligent method demonstrates the best performance in countering DDoS attacks (attacks on educational institutions are considered as an example) across five levels (L3–L7).
Fig. 6. Comparison of a traditional system and an intelligent blockchain system as a defense against DDoS attacks on educational institutions.
According to statistical analyses of DDoS attacks on educational institutions [21–24], these levels contain the greatest number of vulnerabilities. The proposed intelligent system is characterized by configuration flexibility, allowing changes to network policy to be applied quickly without sacrificing performance, and the use of the Stratis blockchain makes it possible to organize a coordinated response of many networks to detected DDoS attacks.
6 Conclusion

As a result of the research, a system using blockchain to counter DDoS attacks was developed. Educational institutions, which have a number of distinctive features, were considered as an example object of attacks. Stratis and software-defined networks were proposed as the technologies for designing an intelligent system for countering DDoS attacks. A model of an intelligent system for countering DDoS attacks using a blockchain was developed, and a computational experiment emulating a cyber-attack on educational institutions was carried out. The experiment showed that the proposed system demonstrates better performance than the traditional protection system at five levels out of five in the OSI model. The results will be useful to specialists and researchers both in the field of information security and in the field of social and political sciences.
References

1. Dunn Cavelty, M., Wenger, A.: Cyber security meets security politics: complex technology, fragmented politics, and networked science. Contemp. Secur. Policy 41(1), 5–32 (2020)
2. Kuznetsova, A., Maleva, T., Soloviev, V.: Detecting apples in orchards using YOLOv3. In: Gervasi, O., et al. (eds.) Computational Science and Its Applications – ICCSA 2020. LNCS, vol. 12249, pp. 923–934. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58799-4_66
3. Korchagin, S.A., Terin, D.V., Klinaev, Y.V., Romanchuk, S.P.: Simulation of current-voltage characteristics of conglomerate of nonlinear semiconductor nanocomposites. In: 2018 International Conference on Actual Problems of Electron Devices Engineering (APEDE), pp. 397–399. IEEE (2018)
4. Canh, N.P., et al.: Systematic risk in cryptocurrency market: evidence from DCC-MGARCH model. Financ. Res. Lett. 29, 90–100 (2019)
5. Husain, S.O., Franklin, A., Roep, D.: The political imaginaries of blockchain projects: discerning the expressions of an emerging ecosystem. Sustain. Sci. 15, 379–394 (2020). https://doi.org/10.1007/s11625-020-00786-x
6. Conti, M., et al.: A survey on security and privacy issues of bitcoin. IEEE Commun. Surv. Tutorials 20(4), 3416–3452 (2018)
7. Korchagin, S.A., et al.: Software and digital methods in the natural experiment for the research of dielectric permeability of nanocomposites. In: 2018 International Conference on Actual Problems of Electron Devices Engineering (APEDE), Saratov, pp. 262–265. IEEE (2018)
8. Gataullin, T.M., Gataullin, S.T.: Best economic approaches under conditions of uncertainty. In: 11th International Conference Management of Large-Scale System Development, MLSD 2018, Moscow (2018)
9. Gataullin, T.M., Gataullin, S.T.: Management of financial flows on transport. In: 12th International Conference Management of Large-Scale System Development, MLSD 2019, Moscow (2019)
10. Apel, S., Hertrampf, F., Späthe, S.: Towards a metrics-based software quality rating for a microservice architecture. In: Lüke, K.-H., Eichler, G., Erfurth, C., Fahrnberger, G. (eds.) I4CS 2019. CCIS, vol. 1041, pp. 205–220. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22482-0_15
11. Mladenov, V., Mainka, C., zu Selhausen, K.M., Grothe, M., Schwenk, J.: Trillion dollar refund: how to spoof PDF signatures. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, pp. 1–14. ACM (2019)
12. Salah, K., Rehman, M.H.U., Nizamuddin, N., Al-Fuqaha, A.: Blockchain for AI: review and open research challenges. IEEE Access 7, 10127–10149 (2019)
13. Yadav, A.K., Singh, K.: Comparative analysis of consensus algorithms of blockchain technology. In: Hu, Y.C., Tiwari, S., Trivedi, M., Mishra, K. (eds.) Ambient Communications and Computer Systems. Advances in Intelligent Systems and Computing, vol. 1097, pp. 205–218. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-1518-7_17
14. Gupta, B.B., Dahiya, A., Upneja, C., Garg, A., Choudhary, R.: A comprehensive survey on DDoS attacks and recent defense mechanisms. In: Gupta, B.B., Srinivasagopalan, S. (eds.) Handbook of Research on Intrusion Detection Systems, pp. 186–218. IGI Global (2020)
15. Tavares, B., Correia, F.F., Restivo, A.: A survey on blockchain technologies and research. J. Inf. 14, 118–128 (2019)
16. Schulzke, M.: The politics of attributing blame for cyberattacks and the costs of uncertainty. Perspect. Politics 16(4), 954–968 (2018)
17. Rahman, N.A.A., Sairi, I.H., Zizi, N.A.M., Khalid, F.: The importance of cybersecurity education in school. Int. J. Inf. Educ. Technol. 10(5), 378–382 (2020)
18. Dawson, M.: National cybersecurity education: bridging defense to offense. Land Forces Acad. Rev. 25(1), 68–75 (2020)
19. Khasanshin, I.: Application of an artificial neural network to automate the measurement of kinematic characteristics of punches in boxing. Appl. Sci. 11(3), 1223 (2021)
20. Soboleva, E.V., Suvorova, T.N., Zenkina, S.V., Bocharov, M.I.: Professional self-determination support for students in the digital educational space. Eur. J. Contemp. Educ. 9(3), 603–620 (2020). https://doi.org/10.32744/pse.2020.6.32
21. Korchagin, S., Romanova, E., Serdechnyy, D., Nikitin, P., Dolgov, V., Feklin, V.: Mathematical modeling of layered nanocomposite of fractal structure. Mathematics 9(13), 1541 (2021)
22. Shirokanev, A.S., Andriyanov, N.A., Ilyasova, N.Y.: Development of vector algorithm using CUDA technology for three-dimensional retinal laser coagulation process modeling. Comput. Opt. 45(3), 427–437 (2021)
23. Soloviev, V., Titov, N., Smirnova, E.: Coking coal railway transportation forecasting using ensembles of ElasticNet, LightGBM, and Facebook Prophet. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2020. LNCS, vol. 12566, pp. 181–190. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64580-9_15
24. Kuznetsova, A., Maleva, T., Soloviev, V.: Detecting apples in orchards using YOLOv3. In: Gervasi, O., et al. (eds.) ICCSA 2020. LNCS, vol. 12249, pp. 923–934. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58799-4_66
Toward Human-Level Qualitative Reasoning with a Natural Language of Thought

Philip C. Jackson Jr.(B)

TalaMind LLC, PMB #363, 55 E. Long Lake Road, Troy, MI 48085, USA
[email protected]
Abstract. How could AI systems achieve human-level qualitative reasoning? This research position paper proposes that a system architecture for human-level qualitative reasoning would benefit from a neuro-symbolic approach combining a 'natural language of thought' with qualitative process semantics.

Keywords: Qualitative reasoning · Human-level artificial intelligence · Neuro-symbolic · Natural language of thought
1 Introduction

Forbus [1] surveys research on qualitative reasoning and makes a convincing case for the importance of qualitative reasoning and qualitative concept representations in AI. He also makes a convincing case that qualitative reasoning and qualitative concept representation are essential to human-level intelligence, and pervasive in human-level intelligence. Therefore, we can expect that qualitative reasoning and qualitative concept representation will be essential to achieving human-level AI, and pervasive in human-level AI.

Forbus [1] gives many examples of qualitative reasoning and qualitative concepts. These examples are discussed using a Lisp-like symbolic notation, presented in the book. He makes the point that qualitative reasoning, and understanding of qualitative concepts, are necessary for human-level understanding of natural language, writing that "qualitative reasoning forms an important component of natural language semantics" [1, p. 9]. And he discusses how qualitative concepts and reasoning can be represented in AI systems, using symbolic representations and languages developed for AI systems.

This paper proposes that a 'natural language of thought' would be ideal for qualitative reasoning, and for expression of qualitative concepts, in systems that may eventually achieve human-level AI. The following pages will consider these questions, in order:

• What is human-level intelligence and would be human-level artificial intelligence?
• What are the major options to achieve human-level AI?
• What are the major options for AI understanding natural language?
• What is human-level qualitative reasoning, and what is the role of natural language in it?
• How could a 'TalaMind' architecture support qualitative reasoning, with a natural language of thought?
I will begin by discussing human-level AI.
2 What is Human-Level Intelligence and Would be Human-Level Artificial Intelligence?

The issue of how to define human-level intelligence has been a challenge for AI researchers. Rather than define it, one might just expect to use a Turing Test to recognize it, if it is ever achieved. Some have suggested human intelligence may not be a coherent concept that can be analyzed, even though we can recognize it when we see it in other human beings [2].

While a Turing Test may help recognize human-level AI if it is created, the test does not define intelligence nor indicate how to design, implement, and achieve human-level AI. Also, the Turing Test focuses on recognizing human-identical AI, indistinguishable from humans. It may be sufficient (and even important, for achieving beneficial human-level AI) to develop systems that are human-like, and understandable by humans, rather than human-identical [3].

An approach different from the Turing Test was proposed in [4]: to define human-level intelligence by identifying capabilities achieved by humans and not yet achieved by any AI system, and to inspect the internal design and operation of any proposed system to see if it can in principle robustly support these capabilities, which [4] calls higher-level mentalities:

• Natural Language Understanding
• Self-Development and Higher-Level Learning
• Metacognition and Multi-Level Reasoning
• Imagination
• Self-Awareness – Artificial Consciousness
• Sociality, Emotions, Values
• Visualization and Spatial-Temporal Reasoning
• Curiosity, Self-Programming, Theory of Mind
• Creativity and Originality
• Generality, Effectiveness, Efficiency
The higher-level mentalities together comprise a qualitative difference which would distinguish human-level AI from current AI systems and computer systems in general. Discussions of the higher-level mentalities are given in [4] and [5], beginning in Chapter 2 Sect. 1.2. It appears that qualitative reasoning and qualitative concept representation are important in supporting virtually all the higher-level mentalities. This will be discussed further beginning in Sect. 5 below.
3 What are the Major Options to Achieve Human-Level Artificial Intelligence?

Logically, there are three major alternatives toward this goal:
• Purely symbolic approaches to HLAI.
• Neural network architectures.
• Hybrid systems.

3.1 Purely Symbolic Approaches to HLAI

Based on computational universality, one might argue theoretically that purely symbolic processing is sufficient to achieve human-level AI. Over the decades, researchers have proposed a variety of symbolic processing approaches toward the eventual goal of achieving human-level artificial intelligence. Based on the generality of symbolic logic, one might think it should be possible to achieve human-level AI by developing systems which only use symbolic logic, with extensions of first-order logic, along with other symbolic approaches, such as semantic networks and frame-based systems. Such approaches have been developed in AI research for many years.

However, symbolic logic effectively handicaps achieving human-level AI, because it is not as flexible as natural language for representing human thoughts and knowledge:

Formal logic specializes and standardizes the use of certain natural language words and phrases, and in principle, anything that can be expressed in formal logic could be translated into equivalent expressions in natural language. But the opposite is not true: Natural language can express ideas and concepts much more flexibly than formal logic. Natural language allows communication without needing to be precise about everything at once. [6]

Natural language supports expressing thoughts about what you think or think other people think, thoughts about irrational or self-contradictory situations, emotions, etc. To achieve human-level artificial intelligence, a system will need the ability to understand the full range of human thoughts that can be expressed in a natural language like English. No existing formal logic language can represent this range of thoughts. A natural language like English already can do this perhaps as well as any artificial, formal logic language ever could. So, the 'TalaMind' approach advocates representing and using a natural language like English as a 'natural language of thought' within AI systems that may eventually achieve human-level artificial intelligence [5]. This is further discussed in Sect. 4 below.

3.2 Neural Network Architectures for HLAI

Based on computational generality, one can argue theoretically that neural networks are sufficient to achieve human-level AI. The technology is being applied to a wide variety of tasks in robotics, vision, speech, and linguistics. The technology is essentially domain independent.

However, if human-level AI is achieved solely by relying on neural networks then it may not be very explainable to humans: Immense neural networks may effectively be a black box, much as our own brains are largely black boxes to us. To achieve beneficial human-level AI [3] it will be important for a human-level AI to be more open to inspection and more explainable than a black box. This suggests that research on neural networks to achieve human-level AI should be pursued in conjunction with other approaches that
support explanations in a natural language like English and avoid complete dependence on neural networks.

3.3 Hybrid Architectures for HLAI

It should be possible to develop hybrid ("neuro-symbolic") architectures, combining symbolic processing and neural networks, to support eventually achieving human-level AI. Such architectures should have substantial advantages: Symbolic processing would support explainable reasoning and learning with sentential structures, networks, contexts, etc. Neural networks would support learning, representing, and recognizing complex patterns and behaviors not easily defined by symbolic expressions. This paper advocates a class of hybrid architectures called the 'TalaMind architecture' [5] and will focus on discussing the symbolic processing side of the architecture, to support a natural language of thought. Integration of neural networks with symbolic processing is a topic for ongoing and future research.
4 What are the Major Options for AI Understanding Natural Language?

Natural language understanding was listed in Sect. 2 as one of the higher-level mentalities of human-level intelligence. It is a key higher-level mentality because it supports other higher-level mentalities. AI systems could use purely symbolic programming methods for processing and understanding natural language, or rely entirely on neural networks, or use hybrid approaches combining symbolic processing and neural networks. If we focus just on the symbolic processing methods, there are two major alternatives to discuss.

4.1 Treating Natural Language as External Data for Symbolic Processing

The first alternative for symbolic processing is to treat natural language expressions as external data, and to use other, internal symbolic languages for representing thoughts and for specifying how to process, interpret and generate external natural language expressions. In AI research, it has been a traditional approach to translate natural language expressions into a formal language such as predicate calculus, frame-based languages, conceptual graphs, relational tuples, etc., and then to perform reasoning and other forms of cognitive processing, such as learning, with expressions in the formal language. Although this has been the traditional approach, it is not the approach advocated by this paper, for reasons discussed in [5].

4.2 Using Natural Language as a Language of Thought in an AI System

A major alternative for symbolic processing of natural language is to represent natural language expressions as internal data structures and to use natural language itself as an internal symbolic language for representing thoughts, and for describing (at least at a
high level) how to process thoughts, and for interpreting and generating external natural language expressions. This approach is what I describe as implementing a 'natural language of thought' in an AI system [5]. Other symbolic languages could be used internally to support this internal use of natural language, e.g., to support pattern-matching of internal natural language data structures, or to support interpretation of natural language data structures. This approach could also be combined with neural networks, in hybrid approaches for processing natural language. Yet in this approach the data structures representing natural language expressions are the general high-level representations of thoughts. For domains like mathematics, physics, chemistry, etc. an AI system might use additional symbolic languages to help represent domain-specific thoughts.

This approach involves more than just representing and using the syntax of natural language expressions to represent thoughts: It also involves representing and using the semantics of natural language words and expressions, to represent thoughts [5].

There is not a consensus based on analysis and discussion among scientists that an AI system cannot use a natural language like English as an internal language for representation and processing of thoughts. Rather, in general it has been an assumption by AI scientists over the decades that computers should use formal logic languages (or simpler symbolic languages) for internal representation and processing within AI systems. Yet it does not appear there is any valid theoretical reason why the syntax and semantics of a natural language like English cannot be used directly by an AI system for its language of thought, without translation into formal languages, to help achieve human-level AI [5, pp. 156–177].
5 What is Human-Level Qualitative Reasoning, and What is the Role of Natural Language in it?

Forbus [1] gives a broad survey of qualitative reasoning in human intelligence. His discussion makes it clear that qualitative reasoning and qualitative concepts play an important role in many aspects of human intelligence. We may define 'human-level qualitative reasoning' as all the forms of qualitative reasoning that people use to support the higher-level mentalities of human-level intelligence, listed in Sect. 2 above. Such a definition only opens a door for us to ponder the topic. Following are some initial discussions of how qualitative reasoning could support a few of the higher-level mentalities. These discussions suggest directions and topics for future research.

5.1 Self-Development and Higher-Level Learning

I use the term 'higher-level learning' to encompass the following kinds of learning, and distinguish them from lower-level forms of learning investigated in previous research on machine learning:
• Learning by induction of new linguistic concepts.
• Learning by creating explanations and testing predictions, using causal and purposive reasoning.
• Learning about new domains by developing analogies and metaphors with previously known domains.
• Learning by reflection and self-programming.
• Reasoning about thoughts and experience to develop new methods for thinking and acting.
• Reasoning about ways to improve methods for thinking and acting.
• Learning by invention of languages and representations.

Given the importance of qualitative words and expressions in natural language, it seems clear that qualitative representations and qualitative reasoning could play an important role in supporting at least the above italicized forms of higher-level learning, within an AI system that uses a natural language of thought. So, for example, "Learning about new domains by developing analogies and metaphors with previously known domains" is italicized because it seems clear that qualitative reasoning will be important in developing analogies and metaphors, in human-level AI. "Learning by reflection and self-programming" is not italicized because the importance of qualitative reasoning seemed less clear for this - though I do not wish to foreclose it. It may be possible to come up with a good example for it.

5.2 Metacognition and Multi-level Reasoning

Metacognition is "cognition about cognition," cognitive processes applied to cognitive processes. This does not say much, until we say what we mean by cognition. There are both broad and narrow usages for the term cognition in different branches of cognitive science and AI. Many authors distinguish cognition from perception and action. However, Newell [7, p. 15] gave reasons why perception and motor skills should be included in "unified theories of cognition." If we wish to consider metacognition as broadly as possible, then it makes sense to start with a broad idea of cognition, including perception, reasoning, learning, and acting, as well as other cognitive abilities Newell identified, such as understanding natural language, imagination, and consciousness.

Since cognitive processes may in general be applied to other cognitive processes, we may consider several different forms of metacognition, for example: Reasoning about reasoning. Reasoning about learning. Learning how to learn…. Others have focused on different aspects of metacognition, such as "knowing about knowing" or "knowing about memory." Cognitive abilities could be considered in longer metacognitive combinations, e.g., "imagining how to learn about perception" – the combination could be instantiated to refer to a specific perception.

Such examples illustrate that natural language has syntax and semantics which can support describing different forms of metacognition. More importantly, a 'natural language of thought' could help an AI system perform metacognition by enabling the 'inner speech' expression of specific thoughts about other specific thoughts, specific thoughts about specific perceptions, etc. Using qualitative
words and expressions in natural language, a natural language of thought could support qualitative representations and qualitative reasoning in metacognition within a system achieving human-level artificial intelligence.

5.3 Sociality and Emotions

A human-level AI will need some level of social understanding to interact with humans. It will need some understanding of cultural conventions, etiquette, politeness, etc. It will need some understanding of emotions humans feel, and it may even have some emotions of its own, though we will need to be careful about this. One of the values of human-level artificial intelligence is likely to be its objectivity and freedom from being affected by some emotions. Within an AI system, emotions could help guide choices of goals, or prioritization of goals. Apart from whether and how emotions may be represented internally, a human-level AI would also need to understand how people express emotions in behaviors and linguistically, and how its behaviors and linguistic expressions may affect people and their emotions. So again, using qualitative words and expressions in natural language, a natural language of thought could play an important role in supporting sociality, and understanding and expression of emotions, within a system achieving human-level artificial intelligence.

5.4 Natural Language Understanding and Human-Level Qualitative Reasoning

Qualitative reasoning is of great importance to natural language understanding, and natural language is the main vehicle that people use to express qualitative reasoning, in communication with each other. We have only to read a newspaper, and find sentences like this:

"Individual investors are holding more stocks than ever before as major indexes climb to fresh highs. They are also upping the ante by borrowing to magnify their bets or increasingly buying on small dips in the market." [8]

Many nouns, verbs, adjectives, and adverbs refer to qualitative concepts, even in discussing quantitative matters like stocks. Thus, in the above example qualitative words include "major", "climb", "highs", "magnify", "increasingly", "small". Natural language expressions group these concepts into larger constructs, which can retain a qualitative nature. Quantitative concepts may be used as necessary, of course, in statements like: "Households increased stockholdings to 41% of their total financial assets in April." [8]

How does human intelligence internally represent and perform qualitative reasoning? Though presumably everything is grounded in the biological neural networks and biochemical dynamics of the human brain, several authors have advocated that at some level the brain uses a language of thought. (To provide examples, the References section
includes citations for Berwick and Chomsky [9], Fernyhough [10], Fodor [11], Jackendoff [12], and Schneider [13].) So perhaps internally the human brain represents and uses qualitative concepts in a language of thought. In any case, in attempting to develop human-level AI, we are free to develop systems that represent and use qualitative concepts and expressions within a ‘natural language of thought’. A natural language of thought is advocated as part of the TalaMind approach toward eventually achieving human-level artificial intelligence. The above discussion is in effect an argument that qualitative reasoning will be essential for understanding and processing a natural language of thought, and that a natural language of thought could help support qualitative reasoning, within human-level AI. These topics will be further discussed in the next section.
6 How Could a 'TalaMind' Architecture Support Qualitative Reasoning, With a Natural Language of Thought?

6.1 An Overview of the TalaMind Approach and Architecture

The TalaMind approach was proposed by [4] for research toward eventually achieving human-level artificial intelligence. The approach is summarized by three hypotheses:

I. Intelligent systems can be designed as 'intelligence kernels', i.e., systems of concepts that can create and modify concepts to behave intelligently within an environment.
II. The concepts of an intelligence kernel may be expressed in an open, extensible conceptual language, providing a representation of natural language semantics based very largely on the syntax of a particular natural language such as English, which serves as a language of thought for the system.
III. Methods from cognitive linguistics may be used for multiple levels of mental representation and computation. These include constructions, mental spaces, conceptual blends, and other methods [14, 15].

The first hypothesis essentially describes the 'seed AI' approach in AGI [16]. The second hypothesis conjectures that a language of thought based on the syntax and semantics of a natural language can support an intelligence kernel achieving human-level artificial intelligence. The third hypothesis envisions that cognitive linguistics can support multiple levels of cognition.

A TalaMind¹ architecture has three levels of conceptual representation and processing, called the linguistic, archetype, and associative levels, adapted from Gärdenfors' [17] paper on levels of inductive inference. He called these the linguistic, conceptual, and associative levels, but the perspective of the TalaMind approach is that all three are conceptual levels. For example, the linguistic level includes sentential concepts. Hence the middle level is called the archetype level, to avoid implying it is the only level where concepts exist.

¹ TalaMind® and Tala® are trademarks of TalaMind LLC, to support future development.
At the linguistic level, the architecture includes a natural language of thought called Tala, in which qualitative expressions would occur frequently, as discussed in Sect. 5 above and Sect. 6.2 below. The linguistic level also includes a 'conceptual framework' for managing concepts expressed in Tala, and conceptual processes that operate on concepts in the conceptual framework to produce intelligent behaviors and new concepts. In the TalaMind approach, conceptual processes can be implemented with 'executable concepts', also expressed in Tala, which can create and modify executable concepts [5, pp. 214–217]. The potential scope of conceptual processes would be computationally universal [5, p. 73].

At the archetype level, cognitive categories and concepts may be represented using methods such as conceptual spaces, image schemas, semantic frames, radial categories, etc. In future research, qualitative concepts could be represented using semantic frames at this level. The next section gives some additional discussion of this topic. The associative level would typically interface with a real-world environment and support deep neural networks, Bayesian processing, etc.

At present, the TalaMind approach does not prescribe specific research choices at the archetype and associative levels. The TalaMind architecture is open at the three conceptual levels, permitting conceptual graphs, predicate calculus, and other formal languages in addition to the Tala language at the linguistic level, and permitting integration across the three levels, e.g., potential use of deep neural networks at the linguistic and archetype levels. So, the TalaMind architecture is actually a broad class of architectures, open to further design choices at each level. For concision, a system with a TalaMind architecture is called a 'Tala agent'.

When Tala expressions are created and processed internally within a Tala agent, they are created and processed as syntactic structures. There is no need within a Tala agent to convert internal syntactic structures to and from linear text strings. Such internal processing also might not involve disambiguation of word senses, because Tala expressions can include pointers to word senses and referents.

The TalaMind hypotheses do not require it, but it is consistent and natural to have a society of mind at the linguistic level of a TalaMind architecture. The term 'society of mind' is used in a broader sense than the approach described by Minsky [18]. This broader, generalized sense corresponds to a paper by Doyle [19], who referred to a multiagent system using a language of thought for internal communication, although Doyle did not discuss a 'natural language of thought'.

6.2 TalaMind's Potential to Use and Support Qualitative Reasoning

A Tala expression is a (potentially reentrant) multi-level list structure representing the dependency parse-tree (syntax) of a natural language expression. These list structures can be arbitrarily complex, corresponding to the potential complexity of natural language syntax [5]. As a simple example, the sentence Can you turn grain into food for people? could be represented by:
(turn (wusage verb)
      (modal can)
      (sentence-class question)
      (subj you)
      (obj (grain (wusage noun)))
      (into (food (wusage noun)
                  (for (people (wusage noun))))))
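For readers who prefer a concrete data structure, a Tala-like expression of this kind could be held in an ordinary programming language as a nested structure. The Python sketch below is our own illustrative encoding, not code from the TalaMind prototype; the field names follow the printed example above, and the hypothetical 'sense' slots stand in for the word-sense pointers mentioned earlier.

```python
# Hypothetical sketch of a Tala-like expression as a nested Python structure.
# Field names (wusage, modal, subj, obj, into, for) follow the printed
# example above; the 'sense' slots are placeholders for word-sense pointers.

tala_sentence = {
    "head": "turn",
    "wusage": "verb",
    "sense": None,                 # a pointer to a word sense would go here
    "modal": "can",
    "sentence-class": "question",
    "subj": {"head": "you"},
    "obj": {"head": "grain", "wusage": "noun", "sense": None},
    "into": {
        "head": "food", "wusage": "noun", "sense": None,
        "for": {"head": "people", "wusage": "noun", "sense": None},
    },
}

def heads(expr):
    """Collect the head words of a nested Tala-like expression."""
    result = [expr["head"]]
    for value in expr.values():
        if isinstance(value, dict):
            result.extend(heads(value))
    return result

print(heads(tala_sentence))  # ['turn', 'you', 'grain', 'food', 'people']
```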
At the linguistic level of a TalaMind architecture, 'turn' would be represented symbolically as a natural language word with multiple senses. Tala constructions are used in the prototype demonstration to translate 'turn X into Y' into 'make X be Y'.2 The prototype then simulates how a Tala agent could discover and develop an executable concept for making grain be food for people, i.e., making bread from grain [5]. Qualitative words occur frequently in the TalaMind 'discovery of bread' simulation, as illustrated by bold font in the following steps:

1...3 Leo wants Ben to make edible grain.
1...4 Leo says grain is not edible because grain is too hard.
1...4 Ben wants Ben to experiment with grain.
1...4 Ben wants Ben to examine grain.
1...5 Ben asks can you turn over some to me for experiments?.
1...6 Leo gives some grain to Ben.
1...8 Ben thinks wheat grains resemble nuts.
1...8 Ben imagines an analogy from nuts to grain focused on food for people.
1...8 Ben thinks grain perhaps is an edible seed inside an inedible shell.
1...8 Ben thinks humans perhaps can remove shells from grains by pounding grains because pounding breaks shells off grains.
1...12 Ben thinks grain is not edible because grain is very hard.
1...12 Ben thinks how can Ben make softer grain?.
1...13 Ben soaks grain in water.
1...17 Ben mashs grain.
1...18 Ben thinks grain is a gooey paste.
1...20 Ben thinks dough is soft, too gooey, and tastes bland.
2 Pages 218–222 of [5] discuss the use of Tala constructions in the prototype demonstration to translate 'turn X into Y' as meaning 'make X be Y' and to translate 'turn over X to Y' as meaning 'give X to Y', and note that extensions to the Tala constructions would be needed to distinguish these usages of 'turn' from usages that refer to changes in physical orientation, such as 'turn the car into the driveway' or 'turn the boat into the wind'.
1...22 Ben thinks baked dough is a flat, semi-rigid object.
1...24 Ben thinks flat bread is edible, flat, not soft, not gooey, and tastes crisp.
1...28 Leo asks can you make thick, soft bread?.
1...29 Ben thinks thick, soft bread would be less dense.
1...29 Ben thinks thick, soft bread might have holes or air pockets.
1...29 Ben thinks air pockets in thick, soft bread might resemble bubbles in bread.
1...29 Ben thinks Ben might create bubbles in bread by adding a drinkable liquid with bubbles to dough.
1...30 Ben thinks Ben might create bubbles in bread by adding beer foam to dough.
1...33 Ben mixs the dough with beer foam.
1...33 Ben bakes dough.
1...34 Leo tries to eat bread.
1...36 Leo says bread is edible, thick, soft, tastes good, and not gooey.
1...37 Ben says Eureka!
Forbus [1] (pp. 221–234) discusses how frame-based qualitative process representations could support understanding the semantics of natural language expressions, citing research by Kuehne and Forbus [20, 21]. In principle, frame-based qualitative process representations could be used in a future TalaMind system to support understanding the semantics of qualitative words in the steps to "turn grain into food for people". This could be supported by conceptual processes at the linguistic level of the TalaMind system, applying qualitative process representations to expressions in Tala. The TalaMind archetype level is envisioned to include semantic frame representations [5].

The potential for a natural language of thought to use and support qualitative reasoning extends in general to qualitative process reasoning that can be expressed in natural language, and in general to domains that human-level intelligence uses qualitative process reasoning to consider. This generality is potentially within the scope of human-level artificial intelligence by using a natural language of thought in the TalaMind approach. However, developing this approach is a topic for future research; the goal of this paper has just been to describe a direction for future work, with wide applicability in natural language understanding and support of human-level artificial intelligence.

6.3 Further Thoughts

If natural language is the symbolic representation for knowledge and thoughts within an AI system, then symbolic reasoning is essentially what it always was before formal logic languages were developed: symbolic reasoning is using natural language expressions to derive other natural language expressions. For example, if we have two natural language sentences "All men are mortal" and "Socrates is a man", then symbolic reasoning would derive a natural language sentence "Socrates is mortal." In the TalaMind approach, this symbolic reasoning would be performed by matching Tala data structures representing the syntax and semantics of the natural language expressions involved. There would be no translation of the expressions into a logical formalism like predicate calculus or conceptual graphs.
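As a toy illustration of such matching-based derivation (our sketch, assuming a drastically simplified parse; this is not the TalaMind mechanism itself, which matches full Tala dependency structures):

```python
# Minimal sketch of matching-based syllogistic derivation over toy
# dependency-like structures; the 'parse' and lemma map are hypothetical
# helpers for this example only, not the TalaMind prototype.

LEMMA = {"men": "man"}  # toy lemmatizer, sufficient for this example

def parse(sentence):
    """Toy parse into a flat dependency-like structure."""
    words = sentence.rstrip(".").split()
    if words[0].lower() == "all":                     # "All men are mortal"
        return {"quant": "all", "subj": LEMMA.get(words[1], words[1]),
                "pred": words[3]}
    return {"quant": None, "subj": words[0],          # "Socrates is a man"
            "pred": LEMMA.get(words[-1], words[-1])}

def derive(major, minor):
    """If the minor premise's predicate matches the universal subject,
    transfer the major predicate to the minor subject."""
    m, n = parse(major), parse(minor)
    if m["quant"] == "all" and n["pred"] == m["subj"]:
        return f'{n["subj"]} is {m["pred"]}.'
    return None

print(derive("All men are mortal.", "Socrates is a man."))
# -> Socrates is mortal.
```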
Of course, this Socratic example is trivial. The true power of reasoning with natural language is the ability to represent and reason with thoughts that are not easy to represent in formal logic. The extent to which natural language processing is currently being performed with neural networks indicates how difficult it is to perform with formal logic. Yet I expect that symbolic reasoning with a natural language of thought could surpass neural networks in some respects, and that a neuro-symbolic approach could have the best of both worlds.

Since TalaMind systems would be neuro-symbolic, they would be able to have some concepts represented as patterns by neural networks, which they might not be able to define very well in natural language. This is also a characteristic of human intelligence. It may be considered a feature, rather than a limitation.

An anonymous reviewer asked whether TalaMind's use of a natural language of thought "is essentially an embodiment of the strong version of linguistic relativity, i.e., that language determines thought." I would say that TalaMind is only constrained by a weak version of linguistic relativity: to achieve human-level intelligence, the intelligence kernel should have a set of initial concepts which would enable the system to learn (and also invent) new languages. Thus, the system would not be limited to a single natural language of thought, si ves lo que quiero decir. And as noted previously, the system's neural networks could support learning concepts not limited by its language(s) of thought.

Historically, it appears there have been very few research endeavors directed toward developing an AI natural language of thought, though there have been endeavors in related directions. More discussion of this is given in Section 7 of [22].
7 Summary

A 'natural language of thought' would be ideal for support of qualitative reasoning within systems which may eventually achieve human-level artificial intelligence, supported by neural networks. The TalaMind approach envisions an architecture for such systems. Of course, there is much more work needed to achieve human-level AI, including human-level qualitative reasoning, via the TalaMind approach. The goal of this paper is just to motivate future work in this direction.

Acknowledgement. I thank three anonymous reviewers for questions and comments which prompted further remarks and discussions.
References

1. Forbus, K.D.: Qualitative Representations – How People Reason and Learn about the Continuous World. MIT Press, Cambridge (2018)
2. Kaplan, J.: Artificial Intelligence – What Everyone Needs to Know. Oxford University Press, Oxford (2016)
3. Jackson, P.C.: Toward beneficial human-level AI… and beyond. In: AAAI Spring Symposium Series Technical Reports, SS-18-01, pp. 48–53 (2018)
4. Jackson, P.C.: Toward human-level artificial intelligence – representation and computation of meaning in natural language. Ph.D. thesis, Tilburg University, The Netherlands (2014)
5. Jackson, P.C.: Toward Human-Level Artificial Intelligence – Representation and Computation of Meaning in Natural Language. Dover Publications, Mineola (2019)
6. Sowa, J.F.: Fads and fallacies about logic. IEEE Intell. Syst. 22(2), 84–87 (2007)
7. Newell, A.: Unified Theories of Cognition. Harvard University Press, Cambridge (1990)
8. Banerji, G.: Americans can't get enough of the stock market. The Wall Street Journal, New York (2021)
9. Berwick, R.C., Chomsky, N.: Why Only Us – Language and Evolution. The MIT Press, Cambridge (2016)
10. Fernyhough, C.: The Voices Within – The History and Science of How We Talk to Ourselves. Basic Books, New York (2016)
11. Fodor, J.A.: LOT2 – The Language of Thought Revisited. Oxford University Press, Oxford (2008)
12. Jackendoff, R.: What is a concept that a mind may grasp it? Mind Lang. 4(1–2), 68–102 (1989)
13. Schneider, S.: The Language of Thought – A New Philosophical Direction. The MIT Press, Cambridge (2011)
14. Evans, V.E., Green, M.: Cognitive Linguistics – An Introduction. Lawrence Erlbaum Associates, Mahwah (2006)
15. Fauconnier, G., Turner, M.: The Way We Think – Conceptual Blending and the Mind's Hidden Complexities. Basic Books, New York (2002)
16. Yudkowsky, E.: Levels of organization in general intelligence. In: Goertzel, B., Pennachin, C. (eds.) Artificial General Intelligence. Cognitive Technologies. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-68677-4_12
17. Gärdenfors, P.: Three levels of inductive inference. Stud. Logic Found. Math. 134, 427–449 (1995)
18. Minsky, M.L.: The Society of Mind. Simon & Schuster, New York (1986)
19. Doyle, J.: A society of mind – multiple perspectives, reasoned assumptions, and virtual copies. In: Proceedings of the Eighth International Joint Conference on Artificial Intelligence, pp. 309–314 (1983)
20. Kuehne, S.E.: Understanding natural language descriptions of physical phenomena. Ph.D. thesis, Northwestern University (2004)
21. Kuehne, S.E., Forbus, K.D.: Qualitative physics as a component in natural language semantics: a progress report. In: Twenty-Fourth Annual Meeting of the Cognitive Science Society, George Mason University (2002)
22. Jackson, P.C.: On achieving human-level knowledge representation by developing a natural language of thought. Procedia Comput. Sci. 190, 388–407 (2021)
Developing of Smart Technical Platforms Concerning National Economic Security

Ksenia Sergeevna Khrupina3(B), Irina Viktorovna Manakhova2, and Alexander Valentinovich Putilov1

1 National Research Nuclear University MEPHI, 115409 Moscow, Russia
2 Lomonosov Moscow State University, 119991 Moscow, Russia
3 Moscow State University of Humanities and Economics, 109044 Moscow, Russia
Abstract. The article examines the possibilities of taking into account the requirements of national economic security when smart technological platforms are used by domestic business. The requirements for economic security are formulated taking into account the opinions of leading researchers in this field in Russia and abroad, as well as the current provisions of the legislation of the OECD countries and Russia. The phenomena and trends that pose threats to the economic security of the country when its residents use the potential of smart technological platforms are identified. Those of the identified phenomena and trends that are of particular relevance from the point of view of Russia's economic security are studied in detail. The changes in approaches to the economic security of the use of smart technology platforms that took place during the Covid-19 pandemic are noted. An interpretation of the changes that have occurred is proposed, and possible directions for further transformation of the concept of national economic security in relation to technological platforms are formulated. The proposed assessments are based on the results of a study of foreign experience of abuses by technology platforms claiming to be global smart ecosystems, and of state countermeasures against such abuses. The strengths and weaknesses of the national economy of Russia as a subscriber to global smart technology platforms are investigated. Based on the results of the research carried out, a system of proposals has been formed regarding Russia's participation in the development of its own technological platforms.

Keywords: Technological platforms · National security · Smart technologies · Government regulation
1 Introduction

The current situation in the global economy is characterized by the intensification of the participation of individuals and legal entities in information flows. The information flows themselves are acquiring a more complex structure, both from the point of view of the transmitted information and from the point of view of the participants supporting them. Technological platforms that provide information exchange on the Internet use infrastructure that is dispersed in terms of geography and country ownership.
Against the background of improving quality in the organization of information flows and increasing throughput of information channels, there is a reduction in the number of organizers of the information exchange process. Information flows are unified, creating a single information space for the participants. At present, a counterparty performing a single cognitive operation in the network is actually involved in a large number of simple operations related to initiating the information flow, transmitting and protecting information, and fixing information on media, servers and other information storage devices.

The infrastructural convergence of information transfer technologies used in various industries has led to the closure of transmission channels for information that differs in its economic and cognitive purpose at common nodes, be it individual communicators (phones, netbooks, etc.) or servers involved in organizing the transfer of information. Cognitive convergence of information has become a source of additional information security risks. The combination of information flows of different economic nature, cognitively identifiable, for example, as entertainment and financial, within one technological platform has increased the risks of distortion or loss of information, or of its misuse by unscrupulous partners.

The rapid development of the means of collecting and transmitting information, including urban video surveillance, has given rise to a number of information security issues that did not previously figure among the topical ones. First of all, this is the problem of protecting the confidential information of citizens obtained through video surveillance, of distributing responsibility for preserving its confidentiality, and of organizing compensation payments in the event of an offense by an unidentified person.

Given the emergence of the aforementioned problems, ensuring the economic security of counterparties while expanding the practice of using technological platforms is one of the primary issues to be addressed in the post-coronavirus period. At the same time, the economic impossibility of a mass withdrawal of national counterparties from the use of technological platforms should be emphasized. It is also impossible to reorient national counterparties to use exclusively national platforms. At the same time, both the option of reorienting national business to use mainly national platforms and the option of tightening control over foreign technological platforms (or their elements), minimizing national economic risks, are relevant.

The purpose of the research carried out in the article is to develop recommendations for taking into account the priorities of national economic security in the context of systemic transformation of the global information space based on the expansion of the practice of using technological platforms. The conclusions and suggestions of the authors are based on changes in the structure of economic security risks caused by trends in the development of cybercrime over the period 2010–2021 and trends in the digitalization of the public administration system. Based on the results of the risk structure study, a system of recommendations for minimizing these risks has been prepared, based on the proactive embedding of digital security tools at the level of functioning of the digital platforms of economy 4.0.
2 Risks to Economic Security Associated with the Increasing Importance of Technology Platforms for National Information Spaces

Informatization of the economy is a fundamental trend whose importance in the post-coronavirus reality will only increase. As applied to individual industries and spheres of human life, informatization takes various forms; however, regardless of the forms of its manifestation, the result that can be quantitatively measured is traffic. According to forecasts prepared by UNCTAD, a multiple increase in traffic is expected by 2026, including through the most modern forms of its organization. Indicators of the dynamics of global traffic across the three main technologies for its implementation are presented in Fig. 1.
Fig. 1. Dynamics of global traffic in 2019–2020 and its forecast for 2026 (fixed data traffic, mobile data traffic, and fixed wireless access), in exabytes per month [1]
The growth in global traffic is accompanied by an increase in the complexity of information flows. The volume of heterogeneous information is increasing, and the formats for transmitting information packets are becoming more complex. One of the most negative consequences of the increase in traffic is the impossibility of ensuring the security of information flows, and of managing them, without the use of special hardware and software.

As a result of this impossibility of non-specialized management of information flows, an increase in the share of cybercrimes in the total number of crimes should be noted. At the same time, by the criterion of losses incurred by victims, the absolute leader, both among cybercrimes and among all crimes in general, became crimes made possible by the low digital literacy of employees, while crimes related to insufficient software security did not bring such significant losses. Comparative characteristics of the financial losses incurred by society from various crimes in 2019 are presented in Fig. 2.
Developing of Smart Technical Platforms Concerning National Economic
211
Analysis of Fig. 2 allows us to state that only one of the five groups of economic losses of victims of global crime presented in it is not directly related to cybercrime. Taking into account the fact that the importance of cybercrime as a systemic risk continues to grow at the present time, it can be recommended to overcome these risks by embedding information protection mechanisms into the structure of the technological platforms that provide information interactions.
Fig. 2. Comparative characteristics of global financial losses from various crimes in 2019, in billion US dollars: BEC/EAC – 1.297; confidence fraud – 0.363; investment – 0.353; non-payment – 0.341; personal data breach – 0.149. Compiled by the authors based on [2]
As part of the writing of this article, the authors have prepared an assessment of the correlation between the share of expenses of the leading Russian technological platforms on the prevention of cyber offenses and the share of these cyber offenses in their total amount. Comparison of the infographics presented in Fig. 3 indicates the effectiveness of the expenditures carried out by domestic technological platforms for the prevention of economic security risks. At the same time, these expenses do not ensure the prevention of negative consequences from the influence of new risks. The lag of the human potential of the average worker behind the economy's demands for digital competencies is not taken into account.

In view of the above, the structure of the technological platforms used in Russia requires industry specialization. In accordance with the report of the Ministry of Economic Development of the Russian Federation "Russian Technological Platforms", by 2035 it is planned to introduce 28 industry technological platforms for such areas as medicine, nuclear energy, and finance. The technological and logical base of such platforms rests on IT technologies operated in 2010–2020, but their cognitive component is changing.
Fig. 3. Assessment of the relationship between the costs of leading Russian technological proto-platforms (Sberbank, Yandex, Mail) to prevent threats to economic security and the losses from the realization of these threats, as a percentage of the total (categories compared by share in losses and cost share: vulnerability to hacker attacks; vulnerability to illegal activities of employees; software failures and unstable operation of platforms; defects in the functioning of the platform when scaling its coverage; damage caused due to the influence of the human factor; other)
Along with threats to economic security at the micro level, threats to economic security associated with the dominance of foreign service providers of technological platforms are currently becoming urgent. Threats to economic security at the macro level can be divided into the following categories:

a) threats associated with the location of physical storage devices in territories outside the jurisdiction of the country, which creates a danger for their safety and risks that the possibility of using these devices will be manipulated for the purpose of unfair competition;
b) the risks of assigning a lower priority of access to the technological platform to the Russian user, and the risks of discrimination against the Russian user by the owner of the technological platform;
c) risks that the attachment of national Russian manufacturers to foreign technological platforms will be exploited as part of sanctions pressure on the country;
d) risks of technological lag of the Russian innovation system due to its focus on the consumption of a foreign product; these include the risks of degradation of the country's human potential in the field of IT technologies and the risks of technological dependence on foreign suppliers of IT products.

As a result of the growing interest of counterparties in the use of smart technology platforms in 2022–2025, an exponential increase in the economic losses of their customers from the activities of fraudsters is expected. In addition, by the end of this period the market for the services of smart technology platforms is expected to take its final shape, followed by commercialization by the market leaders of the influence they have achieved.
3 Recommendations for the Development of Technological Platforms in Russia, Taking into Account the Requirements of Economic Security

The proposed recommendations for the development of technology platforms are formulated taking into account the results of a SWOT analysis of the practice of using technology platforms by Russian business. The results of the SWOT analysis are presented in Table 1.

Table 1. SWOT analysis of the organization of technology platforms in Russia taking into account the requirements of economic security

Strengths:
1. Foreign suppliers are diversified
2. Technological platforms of defense and other strategically important state-owned enterprises are isolated, and their operators are residents of Russia
3. There is potential for import substitution of foreign technology platforms
4. With regard to the development of technology platforms, a priority development program is being implemented
5. There is a national strategy for the development of technology platforms

Weaknesses:
1. Great dependence on a collective foreign supplier of equipment and software
2. Civilian spheres of the economy are 89% under the control of foreign technological platforms
3. There are no unified intersectoral technological platforms
4. The digital competencies of employees do not fully meet the needs of economic security
5. There is no effective risk insurance system

Opportunities:
1. Centralization of industry technology platforms based on the most effective domestic IT solutions
2. Strengthening control over the movement of information within the channels provided by foreign partners
3. Combining the development of technology platforms and the smart city

Threats:
1. Growing technological lag due to the current economic superiority of foreign operators of technological platforms
2. Loss of access to development technologies due to sanctions pressure
3. Growth of economic losses of users of technological platforms due to insufficient preparation of personnel for the risks of working in the information environment
In this context, it is recommended to invest more actively in the development of domestic smart technology platforms in the following areas:

a) strategically important areas; here, the recommended tool for keeping the development of smart technology platforms aligned with the requirements of economic security is the tightening of requirements for the share of the domestic manufacturer in the gross value of a digital product and for its quality;
b) the financial sector; increasing economic security for technological platforms operating in this area requires both an increase in the security of information through IT
solutions and, to a greater extent, the legal distribution of risks and of responsibility for compensating users' losses in the event of leakage, loss or distortion of information during its transfer and processing within the framework of the technological platform.

Particular attention should be paid to preventing information leakage as information enters more than one smart platform. Thus, the concept of a "smart" city assumes full coverage of the urban area by a video surveillance system. Accordingly, leakage of information about, for example, the password of a client's bank card when using an absolutely reliable financial technology platform may occur through the fault of the organizer of the video surveillance system. These problems can be solved by organizing cross-platform technological integration based on modeling possible information flows, centralized collection of information, and the use of big data algorithms to identify potential opportunities for leakage or distortion of information at the junction of platforms.
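As a purely illustrative sketch of the last idea (our construction; the platform names, log fields and matching rule are hypothetical), the toy Python fragment below joins centrally collected event logs from two platforms and flags identifiers that appear on both sides of a junction:

```python
# Toy illustration of flagging cross-platform exposure: centrally collected
# event logs from two platforms are joined on a shared identifier, and
# identifiers seen on both sides of a 'junction' are flagged for review.

platform_a_events = [  # hypothetical video-surveillance platform log
    {"id": "client42", "event": "face_match", "zone": "atm_lobby"},
]
platform_b_events = [  # hypothetical financial platform log
    {"id": "client42", "event": "pin_entry", "zone": "atm_lobby"},
]

def junction_exposures(events_a, events_b):
    """Identifiers appearing in both logs at the same zone are candidates
    for information leakage at the platform junction."""
    seen_a = {(e["id"], e["zone"]) for e in events_a}
    return [e for e in events_b if (e["id"], e["zone"]) in seen_a]

print(junction_exposures(platform_a_events, platform_b_events))
```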
4 Conclusions

Thus, the results of the study demonstrate the need for cross-platform integration in order to increase the economic security of participants forced to carry out economic activities at the junction of technological platforms.
References

1. Digital Economy Report 2021. UNCTAD. https://unctad.org/system/files/official-document/der2021_en.pdf (2021)
2. Gorkov, S.: Russian technological platforms. http://biotech2030.ru/wp-content/uploads/2019/02/TP_RUS-04.02.2019.pdf (2021)
3. Sista, E., De Giovanni, P.: Scaling up smart city logistics projects: the case of the smooth project. Smart Cities 4(4), 1337–1365 (2021). https://doi.org/10.3390/smartcities4040071
4. Artificial Intelligence and National Security. Congressional Research Service, US. https://sgp.fas.org/crs/natsec/R45178.pdf (2021)
5. Hsiao, Y.-C., Ming-Ho, W., Li, S.C.: Elevated performance of the smart city—a case study of the IoT by innovation mode. IEEE Trans. Eng. Manage. 68(5), 1461–1475 (2021)
6. Esposti, S.D., Ball, K., Dibb, S.: What's in it for us? Benevolence, national security, and digital surveillance. Public Adm. Rev. 81(5), 862–873 (2021). https://doi.org/10.1111/puar.13362
7. Jiang, J., Chen, J.: Framework of blockchain-supported e-commerce platform for small and medium enterprises. Sustainability 13(15), 8158 (2021)
8. Gromova, E., Timokhin, D., Popova, G.: The role of digitalisation in the economy development of small innovative enterprises. Procedia Comput. Sci. 169, 461–467 (2020). https://doi.org/10.1016/j.procs.2020.02.224
9. Polukhin, A.A., Yusipova, A.B., Panin, A.V., Timokhin, D.V., Logacheva, O.V.: The effectiveness of reserves development to increase effectiveness in agricultural organizations: economic assessment. In: Bogoviz, A.V. (ed.) The Challenge of Sustainability in Agricultural Systems. LNNS, vol. 206, pp. 3–14. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-72110-7_1
10. Pawar, K.B., Dharwadkar, N.V., Deshpande, P.A., Honawad, S.K., Dhamadhikari, P.A.: An android based smart robotic vehicle for border security surveillance system. In: Proceedings 4th International Conference on Computational Intelligence and Communication Technologies, CCICT 2021, pp. 296–301 (2021). https://doi.org/10.1109/CCICT53244.2021.00062
Toward Working Definitions of Cognitive Processes Suitable for Design Specifications of BICA

Joao E. Kogler Jr.(B)

University of Sao Paulo, Sao Paulo, SP 05508-900, Brazil
[email protected]
http://www.lsi.usp.br/~kogler

Abstract. This paper proposes some working definitions of cognitive processes and related concepts, following an information-theoretical approach and aimed at the specification of cognitive systems design. Their extension to biologically inspired cognitive applications (BICA) is also considered, verifying whether their interpretation conforms with the corresponding sense usually found in biology and neuroscience. We address this discussion motivated by the increasing use of the terms 'cognition' and 'cognitive' by design teams of companies, in the specification and dissemination of several new technological applications, made without providing a clear indication of their meaning and the role they play in meeting the application's functional requirements. This situation is partly due to the imprecise presentations of these concepts available in the literature, which are generally not easily translatable into design requirements specification. The price paid for this deficiency increases considerably in cases of critical dependency, suggesting the need for more accurate terminology, at least for design purposes.

Keywords: Cognition · Cognitive systems · Cognitive processes · Working definitions · Information-theoretic approach · Design specification
1 Introduction
This paper addresses a discussion on the search for working definitions of terms related to cognitive processes, based on an information-theoretical approach and aimed at the specification of cognitive systems design. A working definition conceptualizes a notion intending its use restricted to a particular domain, and does not require that all the conditions contained in an authoritative definition be observed. In the present case, the working definitions considered are intended to obtain greater clarity and explanatory precision for terms to be used to specify design requirements of artificial cognitive systems. The approach will be done in two steps. In a first step it will not consider the extension of the working definitions to natural cognitive systems, that is, biological ones. In a second step we
will also consider their extension to include cognitive applications with biological inspiration, verifying their conformity with the usual concepts used by the life sciences.

The paper does not intend to provide an exhaustive review or rereading of the subject, as it would not fit in this restricted space. Nor does it intend to propose a definitive perspective, which would be one more to add to the many already existing, making it even more difficult for one to understand. The purpose is to raise a discussion on this topic, which has gained importance recently, although not explicitly, but implied by two aspects:

– (i) The profusion of artificial intelligence (AI) technology application projects presented as "cognitive", and
– (ii) The growing concern with normative issues and design recommendations on ethical and safety aspects of the uses of AI.

The first question to be analyzed is not whether natural cognitive systems can simply be explained on the basis of the conceptualization we propose, but whether artificial cognitive systems should be based on it. That is, whether these definitions are reasonable and adequate enough to characterize what is meant by 'cognition' and 'cognitive', without necessarily explaining how these attributes manifest themselves biologically. At first, we will only be interested in their coherence with the role they play in the natural world, so that these terms, when used for artificial artifacts, can make the same sense as they do in nature. Later, we will examine the other issue, which concerns checking whether the definitions, considered satisfactory for the design of artificial cognitive systems, could be maintained or should be modified when introducing biologically inspired components, and also for the case of designing devices that must connect directly to living organisms, exchanging signals with them.

In the next section, we will discuss some details of the necessity of better terminology for use in the design of artificial cognitive systems in general. In the following section we will discuss the extension of such terminology to the design of biologically inspired cognitive systems. Finally, in the last section we will consider perspectives arising from the use of this terminology for research purposes in cognitive science.
2 What is an Artificial Cognitive System
An artificial cognitive system is easy to define: it is a system designed to present cognitive features that determine its functioning and use. Therefore, not only does the functioning of the system depend on these features, but so does its own identity as an artifact of the "cognitive" class. Consequently, a lack of clarity and precision in the characterization of this "cognitive" aspect can bring serious consequences for its proper functioning and use. However, if this "cognitive" profile is not critical for the functioning of the application, any deficiencies in this characterization may go unnoticed, until a problem arises. This is the case of a product advertised and sold as being "cognitive" whose users are unable to discern the cognitive nature of its intended use: without complaints,
there would be no tangible threat to those who designed and sold it. This happens, for example, in the case of "cognitive" technological applications aimed at sophisticated services that depend on typical machine intelligence features, such as learning, pattern recognition, and translation, among others, composing large systems capable of learning many workflows. These attributes are often advertised by manufacturers as the ability to perform tasks in "human ways". In fact, what differentiates these applications from others that are "intelligent, but non-cognitive" is the capacity to manage complexity more efficiently, which does not necessarily present the trait of what we might call "cognitiveness".1

The difficult question, therefore, is not to define what a cognitive system is, but what it means to be cognitive. The academic reference literature on the subject generally refrains from defining the term cognition, limiting itself to presenting its concept as encompassing the set of mental faculties comprising the processes of perception, attention, memory, reasoning, language, learning, action, consciousness and conation. However, this presentation brings two difficulties:

– (i) It is a list with many items, and one could ask whether these components are all indispensable for cognition. A positive answer would imply that a cognitive system would then necessarily have all these capacities. Otherwise, one could ask whether the presence of any one of these capacities would be sufficient to make a system cognitive, which would then imply that any agent would be cognitive, simply because it is capable of acting. If more than one of these capacities is required, one could then argue over which of their combinations would suffice for the conditions of being cognitive, and why.

– (ii) If cognition is identified with this list, then wouldn't there be some way to characterize it functionally without having to resort to the items on that list? Nevertheless, we could always ask if there is some common denominator among the members of this list, and if so, couldn't it be cognition?

Definitions that convey the meaning of a term by pointing to a list of other terms are called ostensive.2 The ostensive definition expedient is usual when a term is difficult to define in an explanatory way, and assumes that the list of defining terms is well known [13] §30. However, this is not the case for the list used for the term cognition, because although it contains intuitively intelligible terms, these can also offer difficulties to be formally defined. The use of this definition of cognition for design purposes brings potential uncertainty about the specific functionality expected for the application. However, designers cannot be blamed, as a significant number of textbooks on cognition employ this definition [4–6], while many simply omit it [1, 9].

Explanatory definitions of cognition usually resort to the etymology of the term, seeking to rescue its original meaning, which characterizes it as the ability to build, transform and store knowledge. In turn, knowledge can be characterized as information endowed with a special status, namely, information capable of guiding the production and coordination of actions. Following this line, some

1 See reference [7] for some examples of such cases.
2 Sometimes referred to as umbrella concepts.
define knowledge as actionable information; that is, under this view, information requires an agent (or agency), and its interpretation is relative to the agent. Information, in turn, is understood as interpreted data, and the term data is defined as the codification of observed facts. For these authors, this codification translates into the recording of facts using a symbolic language (adapted from [10]).

We consider these definitions adequate, albeit with some caveats, and suggest some modifications. Firstly, when taking information as something already interpreted (data) with reference to an agent, it is assumed that specific aspects of the agent are involved: the interpretation must depend on the agent's state and memory. Therefore, the potential character of the concept of information, as something in principle capable of affecting different receivers in a similar way, is lost; this view also eliminates the concept of information intrinsically associated with a message intentionally sent by a source, assuming a certain margin of interpretation. It also excludes the concept of information from a source that cannot be characterized as an agent, but rather as a phenomenon or fact, simply reducing it to data. Therefore, the adaptation we suggest is that information is a property immanent to the data that allows its interpretation by an agent, without constituting the interpretation itself.

Secondly, there is the question of interpretation itself. Meaning can be produced in two ways: semantic and pragmatic. The latter refers to a particular context, whether of situation or use. The former must be independent of specific contexts and must have an unconditional value [2]. We will then consider that cognition is the process by which an agent obtains, transforms and stores knowledge from information that is invariant in reference to a class of contexts to which it refers. Therefore, knowledge presupposes semantic interpretation. As a consequence of this definition, a role in the construction of something similar to knowledge, but of a pragmatic character, will be reserved for perception. This proposal arises naturally, as perception produces what is called a percept, which is associated with an interpretation dependent on a particular, specific and generally immediate context, whose objective is to guide the current behavior of the agent.

By adopting the distinction between the concepts of perception and cognition, we are indirectly proposing the removal of cognition as encompassing perception, and eventually this could extend to the other capacities in the ostensive definition of cognition mentioned earlier. However, the use of this ostensive definition has become almost instinctive and permeates most of the literature in several areas outside neuroscience. In our view, this can be resolved by introducing the concept of the cognitive process as an entity of a broader character.

Our proposal consists, therefore, in defining a cognitive process as a member of a more general category of processes that obtain from information the aspects that are invariant in relation to some type of reference. Therefore, knowledge would correspond to invariants in relation to several contexts within a certain class, and percepts would correspond to invariants within a certain context, that is, a situation or perceptual condition. Analogously, memory could be considered a process of preservation of information invariant in relation to temporally
distinct situations. Attention could be considered a process of seeking information invariant in relation to different observation situations that refer to the same objective or goal. And so on: this proposal to characterize cognitive processes as members of a class of transformations that produce invariants of certain types yields a more accurate understanding, in favor of the needs imposed by specification in design. Naturally, this proposal still needs further development, and it has been studied and improved as indicated in [8].
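As a toy formalization of this reading (our sketch, not part of [8]), one can treat a cognitive process as an operator that extracts what remains constant over a set of observations: percepts are then invariants within one context, and knowledge invariants across contexts.

```python
# Toy sketch of the 'invariant extraction' reading of cognitive processes.
# Observations are hypothetical attribute dictionaries; a percept keeps
# what is constant within one context, knowledge what is constant across
# the per-context percepts.

def invariants(observations):
    """Return the attribute/value pairs shared by all observations."""
    common = dict(observations[0])
    for obs in observations[1:]:
        common = {k: v for k, v in common.items() if obs.get(k) == v}
    return common

# One context: several glimpses of the same scene -> a percept.
context_a = [{"shape": "round", "color": "red", "light": "dim"},
             {"shape": "round", "color": "red", "light": "bright"}]
context_b = [{"shape": "round", "color": "green", "light": "bright"}]

percept_a = invariants(context_a)          # {'shape': 'round', 'color': 'red'}
percept_b = invariants(context_b)
knowledge = invariants([percept_a, percept_b])
print(knowledge)                            # {'shape': 'round'}
```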
3 The Case of Biologically-Inspired Cognitive Systems
The terminology proposed in the previous section is directly aimed at the specification of the characteristics and requirements of component systems of artificial agents and technological applications that require the use of cognitive processes. However, no consideration was given to its adherence and compatibility with the terminology used in neuroscience, an aspect that becomes essential when turning to designs inspired by the biological instances of the cognitive processes considered.

An important item to consider is the distinction between cognition and perception emphasized in the previous section. In the view of some authors outside neuroscience, this separation can be seen as artificial; however, it is widely present in the neuroscience literature. This distinction between cognition and perception is traditionally adopted by neuroscience, although recently there have been some controversies related to the non-penetration of perception by cognition. Although this aspect bears on the distinction between these two concepts, it is not an immediate consequence of it. The distinction between the concepts of cognition and perception can be adopted without requiring the total separation of these capacities in a biological agent [3].

The proposal of using the term cognitive process as the concept underlying perception and the other related ones fits better than the usual view of cognition as their encompassing process, because the very idea of cognition presupposes an agent, and this could make the definitions circular. On the other hand, the concept of cognitive process as proposed is sufficient for the characterization of all the mentioned processes, including cognition itself, which would then have its identity better characterized.
4 Conclusion and Perspectives
This article proposes a prototypical terminology that can be used as the starting point for a discussion about the necessity of providing greater clarity and conceptual accuracy for the benefit of the advancement of the design of artificial cognitive systems. The area is naturally expanding and growing rapidly, aiming at proposals of increasing complexity. This leads to two issues that have been examined with some priority: the safety and reliability requirements, and the observance of ethical aspects. Recently, a European committee [11] and the IEEE standards society [12] have produced normative recommendations aimed at meeting these current demands. Note that these publications deal explicitly
with the field of AI, but omit even implicit references to cognitive systems in their current editions. We argue that the use of clearer and more precise terminology for the concepts underlying cognition could encourage the explicit inclusion of topics about the design and uses of cognitive systems under these safety and ethical points of view.

This article also intends that a discussion on the fundamental aspects of the functional nature of cognition and the cognitive processes would begin with the analysis of the definitions provided here and their use as a basis for the definitions of the related processes of reasoning, learning, acting, consciousness and conation. Furthermore, the terms introduced here also involve innovative aspects, considering the use of invariants as the basis for the construction of the terminology. It is noticed that these terms are sufficiently neutral from the point of view of the dispute between the different schools of thought in the conception of cognitive processes and, therefore, offer an interesting perspective for the construction of a more unifying vision that could consolidate the interdisciplinary character of cognitive science.
References

1. Anderson, J.: Cognitive Psychology and its Implications, 7th edn. Worth Publishers, New York, NY (2010)
2. Carston, R.: The semantics/pragmatics distinction - a view from relevance theory. In: Turner, K. (ed.) The Semantics-Pragmatics Interface from Different Points of View, pp. 85–125. Elsevier (1999)
3. Firestone, C., Scholl, B.: Cognition does not affect perception: evaluating the evidence for top-down effects. Behav. Brain Sci. 229, 1–77 (2016)
4. Friedenberg, J., Silverman, G.: Cognitive Science - An Introduction to the Study of Mind, 3rd edn. Sage Publications, Thousand Oaks, California (2016)
5. Gazzaniga, M., Ivry, R., Mangun, G.: Cognitive Neuroscience - The Biology of the Mind, 2nd edn. Worth Publishers, New York, NY (2002)
6. Hayes, N.: Foundations of Psychology, 3rd edn. South-Western Cengage Learning, London, UK (2010)
7. Kogler, J.J.: Cognitive is the new hype word. Blog article, WordPress.com (2018). https://kogler.wordpress.com/2018/04/21/cognitive-is-the-new-hype-word/
8. Kogler, J.R., Santos, J.P.: Information, structure and context in cognition. In: Adams, F., Pessoa, J.R., Kogler, J.J.E. (eds.) Cognitive Science: Recent Advances and Recurring Problems, pp. 179–196. Vernon Press (2017)
9. Posner, M.: Foundations of Cognitive Science. MIT Press, Cambridge, Massachusetts (1998)
10. da Silva, F., Cullel, J.: Knowledge Coordination. Wiley, Chichester, England (2003)
11. The European Commission's High-Level Expert Group on Artificial Intelligence: Draft Ethics Guidelines for Trustworthy AI. Technical Report, March 2021 update (2018). https://digital-strategy.ec.europa.eu/en/library/draft-ethics-guidelines-trustworthy-ai
12. The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems: Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems, First Edition. IEEE (2019). https://standards.ieee.org/content/ieee-standards/en/industry-connections/ec/autonomous-systems.html
13. Wittgenstein, L.: Philosophical Investigations, 4th edn. Wiley-Blackwell, Malden, Massachusetts (2009)
Intelligent System for Express Analysis of Electrophysical Characteristics of Nanocomposite Media

Korchagin Sergey1, Osipov Aleksey1, Pleshakova Ekaterina1, Ivanov Mikhail1, Kupriyanov Dmitry2(B), and Bublikov Konstantin3

1 Financial University under the Government of the Russian Federation, Shcherbakovskaya, 38, Moscow, Russian Federation
2 National Research Nuclear University "MEPHI", 31 Kashirskoe Shosse, Moscow 115409, Russian Federation
[email protected]
3 Institute of Electrical Engineering of the Slovak Academy of Sciences, Dubravska cesta 3484/9, Bratislava, Slovakia
Abstract. The article deals with the problem of training specialists in the field of nanotechnology. To address this problem, it is proposed to use Web applications that allow mathematical and computer modeling of nanocomposites and interact with specialized databases. As an example, a new Web application has been developed that allows express analysis of the electrophysical characteristics of nanocomposite media. Distinctive features of the Web application are the presence of a database with information about the complex dielectric constant of 147 different materials, the possibility of remote control, and a multi-user mode. The Web application allows students to make predictions about synthesized composites, as well as to create new materials with specified properties. The advantages of the developed Web application in the educational process are shown, among which it is worth noting the speed of development of new materials and the clarity of the computational experiment.

Keywords: Web engineering · Nanocomposites · Software package · Electromagnetic radiation · Education
1 Introduction

The rapid development of nanotechnology, accompanied by targeted research and effective projects in this area, is impossible without specialists in nanotechnology. Their preparation should be considered an integral part of innovative education that produces personnel for a new type of economy. At the same time, the main requirements for future specialists are a high level of professional knowledge and skills and high innovative activity [1–3].
It should be noted that the constantly increasing flow of new knowledge in the field of nanotechnology, and of information about the properties of nanomaterials and nano- and microsystem technology, requires constant updating, which creates additional difficulties in teaching these disciplines [4–6]. To overcome this problem, it is necessary to develop educational Web applications that allow mathematical and computer modeling and interact with specialized databases that are regularly updated.

Mathematical and computer modeling is a powerful tool for theoretical research in physics and an effective tool for teaching nanotechnology [6, 7]. Computer modeling as a research method actually embodies the iterative paradigm of a computational experiment, since in the process of its implementation the mathematical model is refined, the computational algorithm is improved, and, possibly, the organization of the computational process is revised. This is especially important when studying the electrophysical characteristics of nanocomposites, since the experimental determination of the interaction of electromagnetic radiation with nanocomposite media is a laborious and expensive stage of research. Compared to a natural experiment, a computational experiment is much cheaper and more accessible, its preparation and implementation take less time, and it is easily controllable [8, 9]. In addition, mathematical and computer models provide more detailed information than physical experiments themselves. Thus, the development of Web applications that allow mathematical and computer modeling of nanocomposites is a relevant method of training specialists in nanotechnology.

Analysis of the modern scientific literature [3, 10–12] has shown that the existing Web applications for training specialists in the field of nanotechnology are rather narrow in nature, applicable to specific applied engineering problems. In addition, most of them are not freely available. Thus, there is an obvious need for universal Web applications that cover a range of materials and allow, based on the available data on substances, predictions about synthesized composites. The paper proposes a new Web application for express analysis of the electrophysical characteristics of nanocomposite media, which differs from analogs by the presence of a database with information on the complex dielectric constant of 147 different materials, the possibility of remote control, and a multi-user mode.
2 Models and Methods

As a modeling method, the Web application uses a set of models of effective-medium theory (Rayleigh, Maxwell, Lorenz-Lorentz, Bruggemann, Lichtenecker, Odelevsky, etc.). The essence of the effective-medium model is that the system of clusters that form a composite material is considered as a kind of new medium with the same level of polarization. Thus, knowing the parameters of each of the components of the composite, their geometric shape and concentration, it is possible to determine the characteristics of the resulting nanocomposite medium as a whole. The advantage of this approach is that, in order to analyze the propagation of an electromagnetic field in a composite medium, there is no need to solve Maxwell's equations at each point in space. The construction and analysis of such models are based on solving the electrostatics problem of a local field in a ball. Depending on the type and shape of the inclusions in the matrix of the composite and the direction of the field action, various models are given within the framework of effective-medium theory for determining the electrodynamic properties of the composite medium:

– Matrix mixture with inclusions of the same radius located at the nodes of a simple cubic lattice [12]:

$$\varepsilon = \varepsilon_m \left[ 1 + \frac{\nu}{\dfrac{\varepsilon_p + 2\varepsilon_m}{\varepsilon_p - \varepsilon_m} - \nu - 1.65\,\dfrac{\varepsilon_p - \varepsilon_m}{\varepsilon_p + \frac{4}{3}\varepsilon_m}\,\nu^{10/3}} \right],$$
inclusions in the matrix of the composite and the direction of the field action, within the framework of the effective medium theory, various models are given for determining the electrodynamic properties of the composite medium.- Matpiqna cmec, vklqeni odinakovogo padiyca pacpoloeny v yzlax ppocto kybiqecko pexetki [12]: ⎡ ⎤ ν ⎦, ε = εm ⎣1 + ε +2ε εp −εm p m 10/3 − ν − 1,65 ν εp −εm εp +4/3εm where ε is the dielectric constant of the entire nanocomposite, εm is the dielectric constant of the matrix, εp is the dielectric constant of inclusions, ν is the volume fraction of inclusions in the nanocomposite. – System densely aggregated by two types of spherical particles [13]: εp − εm ε − εm = ν, 3ε εp + 2ε 2ε2 + ε εp − 2εm + 3ν εm − εp − εm εp = 0; – Statistical mixture with a chaotic arrangement of particles [14]: ε1 ε2 , ε = a + a2 + 2 a=
(3ν − 1)ε1 + (2 − 3ν)ε2 ; 4
– An ordered cubic system of spherical inclusions in a matrix [15]:
ε1 − ε2 3,33 ε1 + 2ε2 εeff = ε2 1 + 3v/( − v − 1, 31 v ) ; ε1 − ε2 ε1 + 43 ε2 – Chaotically arranged cylindrical inclusions in the matrix under the action of a field along the axes of the cylinders [15]: 1 εeff = ε1 1 + v(ε1 − ε2) /(ε1 + (1 − v)(ε1 − ε2 )) ; 2 – Chaotically located cylindrical inclusions in the matrix under the influence of a field perpendicularly directed relative to the axes of the cylinders [16]:
εeff = (v − 0, 5)(ε1 − ε2 ) + ((v − 0, 5)(ε1 − ε2 ))2 + ε1 ε2 . The interface of the software package was created in the C++ Builder 10.0 tool environment and consists of three forms. The first form is the main window in which the selection of models and methods for the analysis of the nanocomposite takes place. In the second form, in accordance with the selected category, calculations and charting are
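To illustrate how such closed-form models could be evaluated with complex permittivities, here is a minimal Python sketch with our own function names and purely illustrative test values; it is not code from the described software package.

```python
# Sketch evaluating two of the closed-form effective-medium models above
# with complex permittivities; names and test values are illustrative only.
import cmath

def statistical_mixture(eps1, eps2, v):
    """Statistical mixture with a chaotic arrangement of particles [14];
    v is the volume fraction of component 1."""
    a = ((3 * v - 1) * eps1 + (2 - 3 * v) * eps2) / 4
    return a + cmath.sqrt(a * a + eps1 * eps2 / 2)

def random_cylinders_perpendicular(eps1, eps2, v):
    """Chaotic cylindrical inclusions, field perpendicular to the axes [16]."""
    b = (v - 0.5) * (eps1 - eps2)
    return b + cmath.sqrt(b * b + eps1 * eps2)

# Illustrative values only: lossy inclusions in a nearly lossless matrix.
eps_incl, eps_matrix, v = 12.0 + 3.0j, 2.1 + 0.01j, 0.3
print(statistical_mixture(eps_incl, eps_matrix, v))
print(random_cylinders_perpendicular(eps_incl, eps_matrix, v))
```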
The interface of the software package was created in the C++ Builder 10.0 tool environment and consists of three forms. The first form is the main window, in which the selection of models and methods for the analysis of the nanocomposite takes place. In the second form, in accordance with the selected category, calculations and charting are carried out; the libraries necessary for the creation and operation of the program are also connected to it. The third form is the "About" window. It contains a button for returning to the first form, links to the first and second forms, information about the author of the software package, and the name, version, release date and tool environment of the program [17–19].

The Web application allows the user to study the electrodynamic properties of heterogeneous media based on a number of models. The analysis of the properties is carried out on the basis of the data entered into the system on the parameters of the medium: the volume fractions of the components, the complex dielectric constant of the substances, and the shape, structure and orientation of the inclusion particles in space [20–26]. The modularity of the development (Fig. 1) allows the functionality to be expanded; especially important is the ability to add new models of heterogeneous media and new numerical methods.
Fig. 1. Structure of the software package
At the moment, more than 20 models (Rayleigh, Lichtenecker, Lorenz-Lorentz, Maxwell-Garnett, Bruggemann, Odelevsky, etc.) have been implemented for various composite morphologies. During development of the software package, the need for a source of initial data on material properties became apparent. The most convenient source for this role is tabular data from experimental studies of material properties. For these purposes, a scheme of operation for several software systems using a single database of material properties has been designed (Fig. 2).
Fig. 2. Structure of interaction using a unified database of material properties
The remote database of material properties has an interface for performing administrative functions and an interface for uploading data to an external client. In turn, the software package on the client side receives data through a unified data collection module, either from the global database or from a local one. The application communicates with the remote database via a local or global network. This approach makes it possible to create an instrumental environment for various kinds of research on the properties of heterogeneous media and to implement remote connection to an always up-to-date database of material properties. The interface allows express analysis of complex nanostructures with higher accuracy and speed than existing analogs and supports the development and simulation of new functional materials. An important feature of the developed software package is the possibility of remote control and the presence of a multi-user mode, which is especially useful in training specialists in nanotechnology.
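To make the data-collection scheme concrete, the following is a minimal sketch of how the client-side unified data collection module might query the remote database and fall back to a local copy; the endpoint URL, file name, and record schema are illustrative assumptions, not part of the described package:

```python
# Hedged sketch of the unified data-collection module.
# The endpoint URL, cache file name, and record schema are assumptions.
import json
import urllib.request

REMOTE_URL = "https://example.org/api/materials"  # placeholder endpoint
LOCAL_CACHE = "materials_local.json"              # placeholder local database copy

def get_material_properties(name: str) -> dict:
    """Fetch a material record from the remote database, falling back
    to the local database if the network is unavailable."""
    try:
        with urllib.request.urlopen(f"{REMOTE_URL}?name={name}", timeout=5) as resp:
            return json.load(resp)
    except OSError:
        with open(LOCAL_CACHE, encoding="utf-8") as f:
            return json.load(f)[name]
```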
3 Results and Discussion
Figure 3 shows the interface of the developed educational Web application. Using the developed Web application, the frequency dependences of the real and imaginary components of the effective dielectric constant were obtained for composite media consisting of a matrix and inclusions of spherical and cylindrical shapes. The dependences of the complex dielectric constant on the frequency of the external electromagnetic field in the wavelength range of 1000–6000 nm were investigated for nanocomposites with the following types of inclusions:
• chaotically located spherical inclusions in the matrix;
• an ordered cubic system of spherical inclusions in the matrix;
• randomly located cylindrical inclusions in the matrix under the influence of the field along the axes of the cylinders;
• close-packed cylindrical inclusions in the matrix under the influence of a field perpendicularly directed relative to the axes of the cylinders;
Fig. 3. Web-application interface for express analysis of electrophysical properties of nanocomposite media
• randomly located cylindrical inclusions in the matrix under the influence of a field perpendicular to the axes of the cylinders.
The results obtained indicate that the structural features of the composite have a significant effect on the electrodynamic properties of the object under study. The type of inclusions and the electrical layers on the boundary surfaces of dispersed particles significantly affect the complex dielectric constant. Investigating the models presented in this work by theoretical and computational methods makes it possible to determine the conditions under which various electrodynamic effects manifest themselves and to establish optimal requirements for preparing a full-scale experiment, which are very important aspects of studying the interaction of electromagnetic radiation with nanocomposites.
4 Conclusion
The paper presents the implementation of a Web application for rapid analysis of the electrophysical properties of nanocomposite media, intended for training specialists in the field of nanotechnology. The proposed approach allows students to quickly obtain information on the electrophysical properties of nanocomposites and to design new materials with desired properties. The system of nanotechnological education has yet to take its mature shape. Much joint work remains at the junction of information technology, physics and
materials science to saturate the developing nanotechnology market with qualified specialists capable of setting up the production of a wide range of nanotechnological products, which are currently in great demand.
References 1. Patrikeev, L.N., Kargin, N.I.: Nanotechnology in development of vital engineering projects (introductory course for the preparation of bachelors, masters and specialists in nanotechnology). Communications 8(2), 28–31 (2020) 2. Jumaah, M.W., Altaie, M.: Application of nanotechnology in Iraqi construction projects. IOP Conf. Ser. Mater. Sci. Eng. 90, 012019 (2020) 3. Korchagin, S.A., Terin, D.V., Klinaev, Yu.V., Romanchuk, S.P.: Simulation of currentvoltage characteristics of conglomerate of nonlinear semiconductor nanocomposites. In: 2018 International Conference on Actual Problems of Electron Devices Engineering (APEDE), pp. 397–399. IEEE (2018) 4. Mandrikas, A., Michailidi, E., Stavrou, D.: Teaching nanotechnology in primary education. Res. Sci. Technol. Educ. 38(4), 377–395 (2020) 5. Yu, H.P., Jen, E.: Integrating nanotechnology in the science curriculum for elementary highability students in Taiwan: evidenced-based lessons. Roeper Rev. 42(1), 38–48 (2020) 6. Dogadina, E.P, Smirnov, M.V., Osipov, A.V., Suvorov, S.V.: Evaluation of the forms of education of high school students using a hybrid model based on various optimization methods and a neural network. Informatics 8(3), 46 (2021). Multidisciplinary Digital Publishing Institute 7. Korchagin, S.A., et al.: Software and digital methods in the natural experiment for the research of dielectric permeability of nanocomposites. In: 2018 International Conference on Actual Problems of Electron Devices Engineering (APEDE), Saratov, pp. 262–265. IEEE (2018) 8. Souza, B.E., et al.: Elucidating the drug release from metal-organic framework nanocomposites via in situ synchrotron microspectroscopy and theoretical modeling. ACS Appl. Mater. Interfaces 12(4), 5147–5156 (2020) 9. Kim, B., et al.: Multiscale modeling of interphase in crosslinked epoxy nanocomposites. Compos. B Eng. 120, 128–142 (2017) 10. Shen, X., et al.: A three-dimensional multilayer graphene web for polymer nanocomposites with exceptional transport properties and fracture resistance. Mater. Horiz. 5(2), 275–284 (2018) 11. Liu, Y., et al.: Fluorescent microarrays of in situ crystallized perovskite nanocomposites fabricated for patterned applications by using inkjet printing. ACS Nano 13(2), 2042–2049 (2019) 12. Korchagin, S.A., Terin, D.V.: Development program complex for the suppression of chaos in the process of corrosion of metals. In: 2016 International Conference on Actual Problems of Electron Devices Engineering (APEDE), Saratov, pp. 1–6. IEEE (2016) 13. Cali, M., et al.: An effective model for the sliding contact forces in a multibody environment. In: Advances on Mechanics, Design Engineering and Manufacturing, pp. 675–685 (2017) 14. Alfa, A.A., et al.: An effective instruction execution and processing model in multiuser machine environment. In: Khanna, A., Gupta, D., Bhattacharyya, S., Snasel, V., Platos, J., Hassanien, A.E. (eds.) International Conference on Innovative Computing and Communications. AISC, vol. 1087, pp. 805–817. Springer, Singapore (2020). https://doi.org/10.1007/ 978-981-15-1286-5_71 15. Korchagin, S.A., et al.: Modeling the dielectric constant of silicon-based nanocomposites using machine learning. In: 2020 International Conference on Actual Problems of Electron Devices Engineering (APEDE), Saratov, pp. 1–3. IEEE (2020)
16. Kim, R.P., Romanchuk, S.P., Terin, D.V., Korchagin, S.A.: The use of a genetic algorithm in modeling the electrophysical properties of a layered nanocomposite. Izv. Saratov Univ. (N. S.) Ser. Math. Mech. Inform. 19(2), 217–225 (2019) 17. Hu, R., Oskay, C.: Multiscale nonlocal effective medium model for in-plane elastic wave dispersion and attenuation in periodic composites. J. Mech. Phys. Solids 124, 220–243 (2019) 18. Sarout, J., et al.: Stress-dependent permeability and wave dispersion in tight cracked rocks: experimental validation of simple effective medium models. J. Geophys. Res. Solid Earth 122(8), 6180–6201 (2017) 19. Shirokanev, A.S., Andriyanov, N.A., Ilyasova, N.Y.: Development of vector algorithm using CUDA technology for three-dimensional retinal laser coagulation process modelling. Comput. Opt. 45(3), 427–437 (2021) 20. Soloviev, V., Titov, N., Smirnova, E.: Coking coal railway transportation forecasting using ensembles of ElasticNet, LightGBM, and Facebook prophet. In: Nicosia, G., et al. (eds.) LOD 2020. LNCS, vol. 12566, pp. 181–190. Springer, Cham (2020). https://doi.org/10.1007/9783-030-64580-9_15 21. Sebyakin, A., Soloviev, V., Zolotaryuk, A.: Spatio-temporal deepfake detection with deep neural networks. In: Toeppe, K., Yan, H., Chu, S.K.W. (eds.) iConference 2021. LNCS, vol. 12645, pp. 78–94. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71292-1_8 22. Gataullin, T.M., Gataullin, S.T.: Best economic approaches under conditions of uncertainty. In: 11th International Conference “Management of Large-Scale System Development”, MLSD 2018, Moscow (2018). https://doi.org/10.1109/MLSD.2018.8551800 23. Gataullin, T.M., Gataullin, S.T.: Management of financial flows on transport. In: 12th International Conference “Management of Large-Scale System Development”, MLSD 2019, Moscow (2019). https://doi.org/10.1109/MLSD.2019.8911006 24. Gataullin, T.M., Gataullin, S.T., Ivanova, K.V.: Modeling an electronic auction. In: Popkova, E.G., Sergi, B.S. (eds.) ISC 2020. LNNS, vol. 155, pp. 1108–1117. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-59126-7_122 25. Yerznkyan, B.H., Gataullin, T.M., Gataullin, S.T.: Solow models with linear labor function for industry and enterprise. Montenegrin J. Econ. (2021). https://doi.org/10.14254/1800-5845/ 2021.17-1.8 26. Korchagin, S., Romanova, E., Serdechnyy, D., Nikitin, P., Dolgov, V., Feklin, V.: Mathematical modeling of layered nanocomposite of fractal structure. Mathematics (2021). https://doi.org/ 10.3390/math9131541
Classification and Generation of Virtual Dancer Social Behaviors Based on Deep Learning in a Simple Virtual Environment Paradigm Andrey I. Kuzmin, Denis A. Semyonov, and Alexei V. Samsonovich(B) National Research Nuclear University MEPhI, Kashirskoe Hwy 31, Moscow 115409, Russia [email protected]
Abstract. This work examines one possibility of using a deep neural network to control a virtual dance partner behavior. A neural-network-based system is designed and used for classification, evaluation, prediction, and generation of socially emotional behavior of a virtual actor. The network is trained using deep learning on the data generated with an algorithm implementing the eBICA cognitive architecture. Results show that, in the selected virtual dance paradigm, (1) the functionality of the cognitive model can be efficiently transferred to the neural network using deep learning, allowing the network to generate socially emotional behavior of a dance partner similar to a human participant behavior or the behavior generated algorithmically based on eBICA, and (2) the trained neural network can correctly identify the character types of virtual dance partners based on their behavior. When considered together with related studies, our findings lead to more general implications extending beyond the selected paradigm. Keywords: Deep neural networks · Cognitive architecture · Virtual dance partner · Socially Emotional Agents · Deep learning · LSTM · Multilayer perceptron
1 Introduction
Research in the field of emotional Artificial Intelligence (AI) is becoming increasingly popular, a consequence of both the growing scientific interest in emotions and creativity per se and the potential applicability of research in this field to many emergent practical tasks, including various applications of emotionally intelligent social agents [5]. New technologies, like Socially Emotional Agents, often transform science, opening new approaches to research and development projects that could not be conceived before. This work examines one possibility of applying deep learning technologies [7] to the Virtual Partner Dance paradigm [2]. In this simplistic paradigm, three actors dance with each other in a virtual environment. The only freedom of an actor is to select a new partner (or nobody) at any moment in time. Here we study a neural-network-based system designed for classifying, evaluating, and generating socially emotional behavior of an actor in this paradigm. An implementation of the cognitive model described earlier
[2] based on the emotional Biologically Inspired Cognitive Architecture (eBICA: [3, 4]) is used here for two purposes: (a) to generate the synthetic behavioral data necessary for training the neural network, and (b) to measure the neural network performance. As a result, the functionality is transferred from the cognitive model to the neural network. The rest of the paper is organized as follows. First, an analysis of the theoretical aspects of the approach is presented, identifying possible tasks for the neural network in the selected paradigm and possible forms of its integration with a cognitive-model-based approach. The next section deals with the adaptation of the cognitive model for training neural networks. It also provides an overview and development of the neural network architecture for solving the generation and classification tasks. Then, a system is designed for classifying, evaluating, predicting, and generating socially emotional behavior of an actor in the Virtual Partner Dance paradigm. Preliminary results of implementation and testing are reported. The paper is concluded with discussion of related work and implications of the findings.
2 Materials and Methods
2.1 Implementation of the Virtual Partner Dance Paradigm
For the implementation of the Virtual Partner Dance paradigm with human participants, the Unreal Engine 4 game engine was chosen; its use allowed us to significantly reduce development time. The project also uses materials purchased both on third-party sites and in the official Epic Games store. The implemented functionality includes the following features. The current implementation allows the simultaneous participation of one or two human participants (each on a separate computer), with the remaining two or one avatars, respectively, controlled by Virtual Actors. Control of avatar behavior by a Virtual Actor is implemented using a socket interface. For human participants, control of the avatar is carried out using a mouse and keyboard. The mouse sets the direction of the subject's gaze, defined as the direction of view at the center of the monitor. The avatar the player is currently focused on is highlighted and considered the current partner choice. The direction of the subject's head and body is also derived from this mouse-controlled view direction. Keyboard buttons are used to indicate the current emotional state and the dance state. Also implemented is the capability to read the participant's facial expression using an iPad with the LiveLink Face application installed, from which the participant's current emotional state and gaze direction can be estimated. This capability, however, was not used in the presented study. The spectrum of available emotional states includes calm, joy, sadness, and disgust. The set of dance states controlled by a human participant or by a Virtual Actor is limited to two, standing or dancing, while the engine automatically selects a random dance pattern or an idle pose for the avatar to perform, depending on the dance state. The available gaze and head direction states include straight ahead, in the eyes of the partner, and away from the partner (Fig. 1).
Fig. 1. Screenshot of the running NightClub3 application (from [2]).
Several avatars are available for the player to choose from as partner avatars (the participant's own avatar is invisible to the participant). Each avatar's partner choice is shown by a luminous arrow on the floor. The implementation allows for conducting Turing-like-test experiments in the "two human participants and one Virtual Actor" format. This is done via the local network, while the two participants are located in separate isolated rooms and cannot see or hear each other. One of the two computers in this case works in server mode, and the other in client mode. The interface is implemented in such a way that it is impossible for the participant to distinguish between the single-participant and dual-participant modes. In sessions with three Virtual Actors controlled by eBICA, the visualization of the virtual environment was not used. This approach allowed us to speed up the simulated time flow substantially and therefore to collect large volumes of behavioral data in the available physical time.
2.2 Virtual Actor Behavior Model
Details of the Virtual Actor implementation based on the eBICA cognitive architecture and of their simulated social interaction in the Virtual Partner Dance paradigm are described in the previous work [2]. Here we recall several key features. Considering the context of each act of switching or not switching partners, actions of actor X with respect to actor Y in the Virtual Partner Dance paradigm can be classified into the following categories.
• Invitation: agent X turns to Y, demonstrating an intention to dance with Y, without being the current partner of Y.
• Rejection: agent X shows no intention to partner with Y for a period of time longer than T after Y invited X.
• Harassment: agent X continues "inviting" Y after being rejected by Y.
• Acceptance: agent X chooses Y as the current partner after Y has been inviting or harassing X.
• Revocation: agent X stops inviting agent Y without being accepted by Y.
• Maintaining: after time T from the moment of establishing mutual partnership with Y, agent X continues partnering with Y.
• Dropping: agent X breaks mutual partnership with Y by selecting no one.
• Abandoning: agent X breaks mutual partnership with Y by selecting another partner.
Here T is a private intrinsic parameter of an actor. The value of T in each case depends on the character type of the actor who is judging the action; e.g., one and the same action may have different interpretations when judged from the X, Y, or Z perspectives. In this model, four main character types of actors are used, determined by their private parameters T and D (dominance). The types are understood as follows [2].
• Timid: uncertain, indecisive in relations with others. This type does not invite a potential partner first, does not dance alone, and never drops or abandons the partner.
• Ringleader: tries to get everyone engaged in the dance. This type tries not to dance alone and is actively looking for a partner; it does not abandon the partner, except when a third party who is not unpleasant to the Ringleader has not been its partner for a long time.
• Dancer: just wants to dance. This type can dance alone and can drop or abandon partners for no reason. It is likely to keep a mutual partnership during T, but over time begins to lose interest in the partner.
• Naïve: wants to keep the partner. This type can dance alone and can invite others. In the absence of partnership it always accepts an invitation and never drops or abandons the partner.
Again, details of the implementation of these character types in Virtual Actors performing as dance partners in the selected paradigm can be found in [2].
2.3 Teaching Neural Networks with eBICA to Control Virtual Actor Behavior
Ideas of this approach go back to the work of Trafton et al. [1] and were recently used in our related studies [20, 21]. As a result of a virtual dance session, with or without human participation, we have behavioral data consisting of all events of changing the partner choice by all actors. These are categorical data that should be used for training a neural network. In addition, each performed action, considered in context, is characterized by a vector of appraisals, including Valence, Arousal and Dominance, as specified by the eBICA model [3, 4]. It should be pointed out that the three semantic dimensions are universal [6] and, with certain reservations, can be considered common to many models, from the Semantic Differential of Osgood [15] and Russell's Circumplex [16] to Plutchik's wheel [17] and Lövheim's cube [18]. By no means is our consideration limited to a particular version of the model. The terms Valence, Arousal and Dominance used here are merely symbolic and have different flavors of meaning in each new paradigm. In our case, for example, Arousal is related to the individual time
scale T, Valence determines the likelihood of partner acceptance and maintenance, and Dominance determines the likelihood of imposing one's own decisions on others. The behavior of each actor results in an objective appraisal of that actor computed according to eBICA, and that appraisal in turn objectively determines the character type. In this way, eBICA works as a classifier. And vice versa: given the character type, eBICA generates behavior specific to that type. Therefore, as a result of a virtual dance session we have a sample of behavioral data in the form of discrete categorical values, each with its timestamp. In addition, we have continuous values of the action appraisals. To be able to use a neural network approach for any task based on these types of input data, we need to decide on the format in which the data are presented to the network. Among the various methods of presenting categorical data for processing by a neural network, here we select one-hot encoding ([19], p. 129). Two tasks for the neural network are of interest here: categorization and prediction of a given virtual dancer's behavior. The first, categorization, can also be used for evaluating behavior, while the second, prediction, can be used to generate a continuation of behavior, given its beginning and the actions of other agents. The first task is in fact the task of behavior classification based on the observed interaction between dancers, with the character type as the outcome. The second task is to generate a sequence of dancer's actions, either starting from scratch or by continuing a given observed behavior. A neural network can do both types of behavior generation: (1) continuation of an actor's behavior, and (2) generation of behavior "from scratch", given the character type. In this work we studied (1) only, in addition to the first task. There are multiple possibilities for combining cognitive and connectionist approaches in solving the aforementioned tasks. Two usages of cognitive models (in our case, eBICA) are of interest here: (a) to generate the data necessary for training neural networks using deep learning, and (b) to evaluate the performance of the neural network. As a reminder, the eBICA architecture consists of seven components [4]: interface buffer, working memory, episodic memory, value system, semantic memory, procedural memory, and semantic maps, including maps of emotional values. Its particular implementation is adapted to the selected paradigm and limited to the necessary components only, as described earlier [2].
2.4 Development of the System Architecture
A neural-network-based system needs to be developed for the following two tasks: (a) to generate behavior that is similar to given examples, and (b) to classify the given behavior based on given examples of behavior classes. In this work, two different systems were created and used for the two tasks.
Developing a Deep Neural Network Architecture for Behavior Generation. The paradigm for behavior generation by a neural network consists in priming the network with the beginning of a sequence of actions generated by the cognitive model. This phase is followed by a continuation of the sequence, now generated by the neural network. For this task, an LSTM network was used. The reason for this choice is that LSTM is a recurrent network that can remember past behavior, determining the dynamical state of
the simulated system, for a sufficiently long time. LSTM models are designed specifically to avoid the problem of long-term dependencies [7–14]; remembering information for long periods of time is their distinguishing feature. Given that the input is categorical data converted using one-hot encoding, it was necessary to implement an encoder-decoder model for use with the LSTM. The encoder-decoder model is a method of adapting recurrent neural networks to sequence-to-sequence prediction problems. The approach involves two recurrent neural networks: one for encoding the source sequence, called the encoder, and a second for decoding the encoded source sequence into the target sequence, called the decoder. When constructing the encoder-decoder model, the sizes of the input and output sequences are given as inputs; the result is the generated encoder model, the decoder model, and the neural network model that needs to be trained. To train a neural network using this scheme, one needs to set three basic parameters: the optimizer, the loss, and the metrics for correctness. Next, one uses the prediction capability of the system, which in turn depends on the encoder/decoder model, the source sequence, and the output characteristics, including the length and cardinality of the output and target sequences. In summary, the entire process of training and testing the system involves the following stages.
• Acquisition and preprocessing of the data.
• Setting parameters of the training paradigm: the sequence length and cardinality, the number of examples, and their division into training and test groups.
• Designing the encoder/decoder and the neural network model to be trained.
• Training of the neural network model.
• Data prediction using the trained neural network and comparison of results against the target sequences.
Developing a Deep Neural Network Architecture for the Classification Task. To adapt categorical data for processing with a neural network, categories can be represented either as patterns of numbers or using one-hot encoding. In this way, categorical values can be encoded as numeric values, represented by activities in an array of neuronal units. The neural network architectures most commonly used for data categorization are feedforward multilayer models (multilayer perceptrons). Input neurons may represent particular features of the input data, and patterns of activity of the output neurons represent categories. To train the neural network, we used datasets containing sequences of Virtual Actor actions (representing the behavior of a virtual dancer) generated using the eBICA-based cognitive model. The input data was divided according to the cross-validation principle: given K samples of data, one is set aside for testing, while the remaining K-1 samples are used as the training dataset. In the present study, the input data was divided into three groups: training, validation, and test datasets, in the ratio of 60 to 30 to 10. The available data was split into the three datasets at random. Various optimizers were used during training (e.g., Adam, Adagrad,
SGD, Nesterov), before the final optimal choice was made. In summary, the procedure involved the following steps.
• Uploading and preprocessing of datasets and their annotations.
• Splitting available datasets into three samples (training, validation, and test).
• Training the system using several choices of hyperparameters.
• Making an optimal choice and performing the training with it.
• Testing the trained system and analyzing the results.
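As an illustration, here is a minimal sketch of such a classification network, following the layer sizes, activations, and initializers listed in Sect. 2.5 below and the best Nesterov hyperparameters reported in Sect. 3; the input width is an illustrative assumption:

```python
# Hedged sketch of the character-type classifier (Sect. 2.5): a three-layer
# perceptron with 10/10/5 units, sigmoid/sigmoid/softmax activations, and
# SGD with Nesterov momentum (learning rate 0.01, momentum 0.6 per Sect. 3).
from tensorflow import keras
from tensorflow.keras import layers

INPUT_WIDTH = 40  # assumption: width of the one-hot-encoded behavior window
NUM_CLASSES = 5   # output units per the 10/10/5 layout given in Sect. 2.5

model = keras.Sequential([
    keras.Input(shape=(INPUT_WIDTH,)),
    layers.Dense(10, activation="sigmoid",
                 kernel_initializer=keras.initializers.RandomUniform(-1.0, 1.0)),
    layers.Dense(10, activation="sigmoid",
                 kernel_initializer=keras.initializers.RandomUniform()),
    layers.Dense(NUM_CLASSES, activation="softmax",
                 kernel_initializer=keras.initializers.Ones()),
])

model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.6, nesterov=True),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```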
Integrated System Design. The two designed neural-network-based systems, together with the eBICA-based cognitive model, were integrated into one system. Given the integrated system's capabilities, the following modes of usage became available.
• Neural-network-based Virtual Actor behavior generation from scratch.
• Virtual Actor behavior generation using the cognitive model first, and then continuing behavior generation using the trained neural network.
• Neural-network-based Virtual Actor character type identification based on behavior.
• eBICA-based Virtual Actor character type identification based on behavior using the cognitive model.
2.5 Implementation Details
The system was implemented in Python using the following libraries: Tensorflow, Keras, NumPy, Pandas, MatPlotLib, Random, and IterTools. The network used for classification consisted of the following layers:
• One-hot encoding (for personality types);
• Replacement of categorical input data with numbers (numbering);
• A three-layer perceptron (activations: "sigmoid", "sigmoid", "softmax"; numbers of neurons: 10/10/5; initializers: RandomUniform(-1:1), RandomUniform(), Ones());
• Cross-validation (60/30/10).
Categorical Cross-entropy was selected as the loss function. The optimizers used for testing include Adam, Adagrad, SGD, and Nesterov. The network used for behavior generation was an LSTM network, including the encoder and decoder layers (optimizer: Adam; loss: Categorical Cross-entropy; metrics: accuracy). The design and implementation of the eBICA-based cognitive model controlling and evaluating Virtual Actor behavior is described in the previous work [2].
2.6 Participants, Experimental Procedures and Data Collection
In addition to synthetic data generation, human participants were recruited in order to collect samples of human behavior. In total, 20 human participants took part in this study. All of them were NRNU MEPhI college students studying Program Engineering, aged 21 to 27, native Russian speakers, men and women in nearly equal proportions.
Participants were instructed to behave naturally and to try to have a partner most of the time during the experimental session. In some sessions, participants interacted with two Virtual Actors, whereas in other sessions they interacted with one human participant (located in a different room) and one Virtual Actor, without any a priori information as to who was who. Thanks to the capabilities of the system, all human and Virtual Actor actions were logged in every session. In addition, a synchronous video of the participant's face was recorded in all sessions for subsequent offline analysis.
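Before turning to the results, here is a minimal sketch of the encoder-decoder LSTM used for behavior generation (Sect. 2.4 and 2.5); the action cardinality and latent state size are illustrative assumptions:

```python
# Hedged sketch of the encoder-decoder LSTM for behavior continuation.
from tensorflow import keras
from tensorflow.keras import layers

N_FEATURES = 40  # assumption: cardinality of the one-hot-encoded actions
LATENT = 128     # assumption: LSTM state size

# Encoder: consumes the priming sequence of one-hot actions and keeps its state.
enc_in = keras.Input(shape=(None, N_FEATURES))
_, state_h, state_c = layers.LSTM(LATENT, return_state=True)(enc_in)

# Decoder: generates the continuation, conditioned on the encoder state.
dec_in = keras.Input(shape=(None, N_FEATURES))
dec_seq = layers.LSTM(LATENT, return_sequences=True)(
    dec_in, initial_state=[state_h, state_c])
dec_out = layers.Dense(N_FEATURES, activation="softmax")(dec_seq)

model = keras.Model([enc_in, dec_in], dec_out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```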
3 Preliminary Results
As a result of testing, it was found that Nesterov was the best optimizer for the behavior classification network. The test results are shown in Tables 1 and 2.

Table 1. Results of the Nesterov learning depending on the learning rate.

Learning rate              0.05     0.01     0.005
Test loss                  0.7644   0.4709   0.5532
Test classification error  0.2538   0.1078   0.1377

Table 2. Results of the Nesterov learning as a function of momentum.

Momentum                   0.3      0.6      0.9
Test loss                  0.5613   0.4709   0.4718
Test classification error  0.1622   0.1078   0.1144
The Loss/Error graph of the final training of the model using Nesterov is shown in Fig. 2. Next, the ability of the system to generate Virtual Actor behavior was tested. To do this, the data sample generated with the cognitive model was divided into two parts: 90% for training and 10% for testing. Each dataset was preprocessed and divided into two parts according to this scheme. As the outcome, the fraction of correctly identified character types was 76%.
Fig. 2. Plots representing the loss function.
4 Discussion
In this work we examined one possibility of using a deep neural network to control a virtual dance partner's behavior. A neural-network-based system was designed and used for classification, evaluation, prediction, and generation of socially emotional behavior of a virtual actor. The network was trained using deep learning on data generated with an algorithm implementing the eBICA cognitive architecture described previously [2]. The preliminary results of our study presented in this paper show that, in the selected virtual dance paradigm: 1) the functionality of the cognitive model can be efficiently transferred to the neural network using deep learning, allowing the network to generate socially emotional behavior of a dance partner similar to human participant behavior or to behavior generated algorithmically based on eBICA, and 2) the trained neural network can correctly identify the character types of virtual dance partners based on their behavior. When considered together with related studies [22–31], our findings lead to more general implications extending beyond the selected paradigm. Their analysis suggests that many semantic dimensions can be used in the processing of cognitive appraisals: the set is not limited to affective dimensions and may also include characteristics such as abstractness [32], openness, etc. Increasing the number of semantic map dimensions may not be a complete solution: eventually, aspects of agency and the notion of the self [33] will become important. In the near future, we plan to add the following features to the system:
• Obtaining the participant's emotion and gaze direction from facial parameters captured using the iPad and the LiveLink Face application.
• The ability to launch the application in VR mode using the HTC Vive Pro Eye virtual reality headset, using the gaze tracking built into the headset. It is assumed that the headset can be used in both single-user and multi-user modes. In the multi-user case, both the format "two subjects in virtual reality headsets" and the format "one subject in a virtual reality headset and one at the computer" are possible.
• The possibility of choosing a male dancer (at the moment only female dancers are implemented in the application). Male character models have already been purchased from the official Epic Games store.
The application is potentially extendable and can be augmented to accommodate more behavioral freedoms, such as horizontal movement with respect to the partner (approaching or distancing from the partner). Dance movement synchronization would be another possible enhancement. The current implementation is intentionally simplistic, with very few degrees of freedom for the participant to control. This choice was necessary for the Virtual Actor to be compatible with a human participant in the ability to control the avatar. Future applications should overcome this limitation.
Acknowledgments. This work was supported by the Ministry of Science and Higher Education of the Russian Federation, state assignment project No. 0723-2020-0036.
References 1. Trafton, J.G., Hiatt, L.M., Brumback, B., McCurry, J.M.: Using cognitive models to train big data models with small data. In: An, B., Yorke-Smith, N., El Fallah Seghrouchni, A., Sukthankar, G. (eds.). Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), pp. 1413–1421. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2020). ISBN 978-1-45037518-4 2. Karabelnikova, Y., Samsonovich, A.V.: Virtual partner dance as a paradigm for empirical study of cognitive models of emotional intelligence. Procedia Comput. Sci. 190, 414–433 (2021). https://doi.org/10.1016/j.procs.2021.06.05 3. Samsonovich, A.V.: Emotional biologically inspired cognitive architecture. Biol. Inspired Cogn. Archit. 6, 109–125 (2013) 4. Samsonovich, A.V.: Socially emotional brain-inspired cognitive architecture framework for artificial intelligence. Cogn. Syst. Res. 60, 57–76 (2020). https://doi.org/10.1016/j.cogsys. 2019.12.002 5. Marsella, S., Gratch, J., Petta, P.: Computational models of emotion. In: Scherer, K.R., Bänziger, T., Roesch, E. (eds.) A Blueprint for Affective Computing: A Sourcebook and Manual. Oxford University Press, Oxford (2010) 6. Goleman, D.: Emotional Intelligence. Bantam Books, New York (1995) 7. Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT Press, Cambridge (2016) 8. Khaikin, S.: Neural Networks: A Complete Course, 2nd edn. Williams Publishing House, Moscow (2006). 1104 p.
9. Scholle, F.: Deep Learning in Python. Publishing House “Peter”, St. Petersburg (2018). (in Russian) 10. Isakov, S.: Recurrent neural networks: types, training, examples and applications (Electronic resource). https://neurohive.io/ru/osnovy-data-science/rekurrentnye-nejronnye-seti. (in Russian) 11. Gafarov, A.F.: G12 Artificial Neural Networks and Applications. Kazan Publishing House, Kazan (2018). (in Russian). 121 p. 12. Tsaregorodtsev, V.G.: Optimization of preprocessed data. Neurocomput. Dev. Appl. 7, 3–8 (2003) 13. Chollet, F.: Deep Learning with Python, 2nd edn. Manning Pub. Co., Shelter Island, New York (2021). ISBN-13: 978-1617296864 14. Nikolenko, S., Kadurin, A., Arkhangelskaya, E.: Deep learning. Piter, St. Petersburg (2018). (in Russian) 15. Osgood, C.E., Suci, G., Tannenbaum, P.: The Measurement of Meaning. University of Illinois Press, Urbana (1957) 16. Russell, J.A.: A circumplex model of affect. J. Pers. Soc. Psychol. 39(6), 1161–1178 (1980) 17. Plutchik, R.: A psychoevolutionary theory of emotions. Soc. Sci. Inf. 21, 529–553 (1982) 18. Lövheim, H.: A new three-dimensional model for emotions and monoamine neurotransmitters. Med. Hypotheses 78(2), 341–348 (2012) 19. Harris, D., Harris, S.: Digital design and computer architecture, 2nd edn. Morgan Kaufmann, San Francisco (2012). ISBN 978-0-12-394424-5 20. Samsonovich, A.V.: A virtual actor behavior model based on emotional biologically inspired cognitive architecture. In: Goertzel, B., Iklé, M., Potapov, A. (eds.) AGI 2021. LNCS, vol. 13154, pp. 221–227. Springer, Cham (2022) 21. Samsonovich, A., Dodonov, A., Klychkov, M., Budanitsky, A., Grishin, I., Anisimova, A.: A virtual clown behavior model based on emotional biologically inspired cognitive architecture. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y., Klimov, V.V. (eds.) NEUROINFORMATICS 2021. SCI, vol. 1008, pp. 99–108. Springer, Cham (2022). https:// doi.org/10.1007/978-3-030-91581-0_14 22. Berman, A., James, V.: Kinetic imaginations: exploring the possibilities of combining AI and dance. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 2431–2437 (2015) 23. Deng, L.Q., Leung, H., Gu, N.J., Yang, Y.: Real-time mocap dance recognition for an interactive dancing game. Comput. Animation Virtual Worlds 22(2–3), 229–237 (2011). https:// doi.org/10.1002/cav.397 24. Ho, E.S.L., Chan, J.C.P., Komura, T., Leung, H.: Interactive partner control in close interactions for real-time applications. ACM Trans. Multimed. Comput. Commun. Appl. 9(3), 19 (2013). https://doi.org/10.1145/2487268.2487274 25. Holldampf, J., Peer, A., Buss, M.: Virtual partner for a haptic interaction task. In: Ritter, H., Sagerer, G., Dillmann, R., Buss, M. (eds.) Human Centered Robot Systems. COSMOS, vol. 6, pp. 183–191. Springer, Heidelberg (2009). ISBN: 978-3-642-10402-2. https://doi.org/10. 1007/978-3-642-10403-9_19 26. Kirakosian, S., Maravelakis, E., Mania, K., and IEEE: Immersive simulation and training of person-to-3D character dance in real-time, pp. 170–173 (2019). ISBN: 978-1-7281-4540-2 27. Mousas, C.: Performance-driven dance motion control of a virtual partner character, pp. 57– 64. IEEE (2018). ISBN: 978-1-5386-3365-6 28. Senecal, S., Nijdam, N.A., Aristidou, A., Magnenat-Thalmann, N.: Salsa dance learning evaluation and motion analysis in gamified virtual reality environment. Multimedia Tools Appl. 79(33–34), 24621–24643 (2020). https://doi.org/10.1007/s11042-020-09192-y
29. Tamborini, R., et al.: The effect of behavioral synchrony with black or white virtual agents on outgroup trust. Comput. Hum. Behav. 83, 176–183 (2018). https://doi.org/10.1016/j.chb. 2018.01.037 30. Tsampounaris, G., El Raheb, K., Katifori, V., Ioannidis, Y.: Exploring visualizations in realtime motion capture for dance education. Association for Computing Machinery (2016). https://doi.org/10.1145/3003733.3003811 31. Yokoyama, R., Sugiura, M., Yamamoto, Y., Nejad, K.K., Kawashima, R.: Neural bases of the adaptive mechanisms associated with reciprocal partner choice. Neuroimage 145, 74–81 (2017). https://doi.org/10.1016/j.neuroimage.2016.09.052 32. Samsonovich, A.V., Ascoli, G.A.: Augmenting weak semantic cognitive maps with an “abstractness” dimension. Comput. Intell. Neurosci. 2013, 308176 (2013). https://doi.org/10. 1155/2013/308176 33. Samsonovich, A.V., Ascoli, G.A.: The conscious self: ontology, epistemology and the mirror quest. Cortex 41(5), 621–636 (2005). https://doi.org/10.1016/S0010-9452(08)70280-6
Possibility of Benford’s Law Application for Diagnosing Inaccuracy of Financial Statements Pavel Y. Leonov1(B) , Viktor P. Suyts2 , Vadim A. Rychkov1 , Anastasia A. Ezhova1 , Viktor M. Sushkov1 , and Nadezhda V. Kuznetsova1 1 National Research Nuclear University MEPhI (Moscow Engineering Physics Institute),
Kashira Hwy, 31, Moscow, Russia [email protected] 2 Lomonosov Moscow State University, Kolmogorova Street, 1, Moscow, Russia
Abstract. The paper describes a technique for diagnosing data inaccuracy using Benford's law. The Benford distribution for the first significant digit of a random decimal number is presented graphically and mathematically. The main requirements for data to be consistent with Benford's law are listed: the data must refer to one process, there must be no maximum or minimum restrictions in the studied population, no artificially introduced numbering system, and no obvious linking patterns between numbers. To examine the possibility of applying Benford's law to diagnose inaccuracies in an organization's financial statements, the costs of two companies for payment of suppliers' services were analyzed. It was found that, in the absence of attempts to manipulate reporting, performance indicators are close to those theoretically predicted by Benford's law, whereas attempts to manipulate reporting are reflected in corresponding deviations from it. The possibility of applying Benford's law to diagnose the unreliability of an organization's financial statements has thus been demonstrated.
Keywords: Benford's law · Distribution · Statistics · Accounting · Financial statements · Unreliability · Diagnosing inaccuracy
JEL Codes: C46 · H26 · K42
1 Introduction
Currently, the possibility of using the statistical patterns identified by Frank Benford to analyze the reliability of an organization's financial statements is widely discussed [1–4]. Benford's law describes the frequency with which a particular digit occurs as the first significant digit in numbers from naturally formed arrays [5]. To test this law in practice, it is necessary to extract the first digits from the elements of the numerical array under investigation and compare the actual frequency of their occurrence with the theoretical one determined by F. Benford [6]. The second, third, and subsequent digits can be analyzed similarly. Having identified the deviation of the empirical distribution from the theoretical one, one can judge the probability of errors in the data or their deliberate distortion.
2 Analytical Part 2.1 Benford Distribution
Fig. 1. Benford distribution. Horizontally – the first significant digits, vertically – the probability of their occurrence.
Benford’s law states that if a data array is formed in a natural way without outside interference, then the numbers in the highest digits of all numbers in the array have a discrete exponential distribution (Fig. 1). The expected probability of occurrence of the digit d1 as the first digit is described by the following equation: 1 ; d1 = 1, 2, . . . , 9 P(d1 ) = log 10 1 + (1) d1 The distribution of digital values for the first, second and third digit of a number in accordance with Benford’s law is shown in Table 1. Table 1. Expected distribution of digits in a random decimal number according to Benford’s law. Digit
Digit   1st digit   2nd digit   3rd digit
0       -           0.11968     0.10178
1       0.30103     0.11389     0.10138
2       0.17609     0.10882     0.10097
3       0.12494     0.10433     0.10057
4       0.09691     0.10031     0.10018
5       0.07918     0.09668     0.09979
6       0.06695     0.09337     0.09940
7       0.05799     0.09035     0.09902
8       0.05115     0.08757     0.09864
9       0.04576     0.08500     0.09827
When the first two digits d1d2 are considered, more accurate results are obtained for further analysis. It has been shown theoretically [7] that the probability of the first two digits taking the value d = 10, 11, ..., 99 is P(d) = log10(1 + 1/d); the resulting Benford distribution for the first two digits is shown in Fig. 2.
Fig. 2. Benford distribution for the first two digits. Horizontally – the first two significant digits, vertically – the probability of their occurrence.
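A minimal sketch of this first-two-digits test is given below; the helper names are assumptions, and a complete study would add a formal goodness-of-fit test:

```python
# Hedged sketch: compare observed first-two-digit frequencies of payment
# amounts with the Benford expectation P(d) = log10(1 + 1/d), d = 10..99.
import math
from collections import Counter

def first_two_digits(x: float) -> int:
    """Return the first two significant digits (10-99) of a positive amount."""
    x = abs(x)
    while x < 10:
        x *= 10
    while x >= 100:
        x /= 10
    return int(x)

def benford_expected() -> dict:
    return {d: math.log10(1 + 1 / d) for d in range(10, 100)}

def observed(amounts) -> dict:
    counts = Counter(first_two_digits(a) for a in amounts if a > 0)
    n = sum(counts.values())
    return {d: counts.get(d, 0) / n for d in range(10, 100)}

# Example with illustrative amounts only:
sample = [1200.50, 1750.00, 230.10, 4890.99, 1015.00]
obs, exp = observed(sample), benford_expected()
deviation = {d: obs[d] - exp[d] for d in range(10, 100)}
```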
At the end of the 20th century, scientific papers were published by Mark Nigrini, an American researcher in accounting, auditing and mathematics. In his works, M. Nigrini analyzed more than 200,000 tax returns and concluded that Benford's law applies to many sets of financial data, including data on accounts receivable and payable, income tax and stock exchanges, corporate expenses and sales, as well as an organization's revenue and profit indicators [8].
2.2 Comparison of Companies
For the analysis, we took two companies operating in different industries: "A" (IT sphere) and "B" (construction business). Company "A" does not hide its real income, performs legal transactions with funds, does not manipulate financial data, and pays its taxes to the state in full. This company has been operating for more than 20 years and today occupies a leading position in its market. In company "B", on the other hand, the situation is reversed. It is known for certain that in previous reporting periods its management repeatedly acted to evade taxation. Veiling real indicators of financial performance, they entered into contractual relationships with third parties, understated profits and overstated costs, and implemented complex schemes involving fictitious firms.
To answer the question of what data need to be collected for statistical analysis, one must determine what restrictions are imposed on these data. Sample elements should relate to the same characteristics; there should be no maximum or minimum restrictions in the studied population; artificial introduction of a numbering system is not allowed; and there must be no obvious linking patterns between the numbers. One of the main conditions of statistical analysis requires that the array under consideration be of sufficient volume [9]. In view of this, it is not possible to use data on the revenue or profit indicators of economic entities. To investigate the possibility of applying Benford's law to diagnose the inaccuracy of an organization's financial statements [10], the companies' expenses for paying for suppliers' services were selected. Below, the data of company "A" on settlements with counterparties for services rendered in 2018 are presented. Of the entire array of transactions, only payments to suppliers were kept; in that year there were more than 3,000 such items. In each operation, the first two digits were identified and the frequencies of their occurrence were analyzed. The results are presented as a bar graph in Fig. 3.
Fig. 3. Frequency of coincidences of the first two digits in the operations of company “A”.
The obtained results are close to the theoretically expected ones, but it is not difficult to notice the existing outliers. They correspond to multiple fixed payments to a single counterparty. Despite the lack of perfect agreement with the theoretical values, the observed results agree with them well, and an exponential decrease is observed over the first two digits of the highest-order positions of financial transactions. In a similar way, we analyzed settlements with counterparties in company "B", which, as mentioned above, had been found to engage in financial fraud and was held accountable for its actions. The grouped data on payments for 2018 is presented in the form of a histogram, with the frequencies of the first pair of digits in the most significant positions of operations determined (Fig. 4).
Fig. 4. Frequency of coincidences of the first two digits in the operations of company “B”.
As can be seen from the data presented, the results differ significantly from the theoretically calculated ones. The number of outliers significantly exceeds the number of repeated operations carried out with a single counterparty. There are many deviations, both upward and downward, which cannot be explained by the specifics of the operations performed. If company "B" were assessed without prior knowledge, one would have to assume that there are errors or intentional misstatements in the financial statements provided. Such a situation should serve as an impetus for a more detailed analysis of its economic activities.
3 Conclusion
Based on the results presented above, it can be concluded that it is advisable to apply Benford's law to diagnose the inaccuracy of an organization's financial statements. It has been established that settlements for payments to counterparties over one year follow a distribution close to the theoretical one when the organization under study does not mask its actual financial indicators and provides actual data to the tax authorities. If an organization, in order to evade taxation, resorts to various methods of manipulating accounting data, the distribution will differ significantly from the theoretical one defined by Frank Benford. Failure of accounting data to conform to Benford's law is a reason for clarifying them and analyzing them thoroughly.
References 1. Tammaru, F., Alver, L.: Application of Benford’s law for fraud detection in financial statements: theoretical review. In: 5th International Conference on Accounting, Auditing, and Taxation (ICAAT 2016) (2016). https://doi.org/10.2991/icaat-16.2016.46 2. Nigrini, M.J.: Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations, 2nd edn. Wiley, Hoboken (2020) 3. Henselmann, K., Scherr, E., Ditter, D.: Applying Benford’s law to individual financial reports: an empirical investigation on the basis of SEC XBRL filings. Working Papers in Accounting Valuation Auditing 2012-1 (2012) 4. Kurochkina, I.P., Bystrygina, N.V.: Innovative methods for assessing the degree of reliability of information in the consolidated financial statements of organizations. Econ. Stat. Inform. 6(2), 309–313 (2014). (in Russian)
5. Alekseev, M.A.: Applicability of Benford’s law to determine the reliability of financial statements. NSUEU Bull. 4, 114–128 (2016). (in Russian) 6. Suyts, V.P., Khorin, A.N., Zhakipbekov, D.S.: Diagnostics of the unreliability of the organization’s reporting: statistical methods in assessing the reliability of financial statements. Audit Financ. Anal. 1, 179–188 (2015). (in Russian) 7. Nigrini, M.J.: I’ve got your number: how a mathematical phenomenon can help CPAs uncover fraud and other irregularities. J. Account. 187(5), 79–83 (1999) 8. Zverev, E., Nikiforov, A.: Benford distribution: identifying non-standard items in large collections of financial. Information 3, 4–18 (2018). (in Russian) 9. Leonov, P.Y., Suyts, V.P., Kotelyanets, O.S., Ivanov, N.V.: K-means method as a tool of big data analysis in risk-oriented audit. Commun. Comput. Inf. Sci. 1054(3), 206–216 (2019) 10. Savelieva, M.Y., Vyuzhanina, I.I.: Comparative characteristics of approaches for detecting the manipulation of accounting reporting. Econ. Bus. Theory Pract. 12–2, 58–60 (2018). (in Russian)
Artificial Intelligence Limitations: Blockchain Trust and Communication Transparency Sergey V. Leshchev(B) National Research Nuclear University MEPhI, Moscow, Russia
Abstract. Natural and artificial systems and agents overcome communication limitations in different ways. It is necessary to develop scenarios for their interaction (for example, in hybrid environments and Industry 4.0), identifying the most risky strategic lines of mutual danger and discovering areas of transparent communication. In this vein, the problem of communication transparency can be understood as a problem of trust. The article discusses the concept of transparency (security, reliability) of communication in the social and technological spheres. This problem has been studied in the specific contexts of artificial intelligence and blockchain. It has been methodologically shown that the combination of these two technologies can significantly expand the capabilities of both. Artificial intelligence is generating more efficient and secure blockchain protocols as a network, as well as private implementations of blockchain solutions such as smart contracts. The probable ecological optimization of the blockchain by artificial intelligence due to the detection of ineffective algorithmic solutions or used software and hardware resources is also important. In turn, blockchain technology provides artificial intelligence with an environment for unfolding and building a transparent history (logging, precedency, patterns, templates). The network nature of the blockchain contributes, on the one hand, to the globality of solutions, the distribution of information, and, on the other hand, to the locality and privacy of owner interests. Keywords: Artificial intelligence · Turing test · Blockchain · Communication
1 Introduction. Transparency in AI-Sphere
The most representative part of electronic culture today is artificial intelligence. Gradually, intelligent solutions are taking over the market due to the versatility of the approach: machine learning and big data, security systems and analytical platforms, system programs and applied graphics, music and text packages. Artificial intelligence today is represented not only by software or hardware developments, which is especially visible in the mass market of mobile technology, but also by new tools and practices, such as additive technologies, soft robotics, and hybrid cyber-physical solutions that use the whole variety of "software" in the Industry 4.0 framework [1–3]. The physical and mathematical apparatus and the engineering and technical substance of this technology operate with various logical, mathematical, cybernetic,
thermodynamic, electrical and other structures, models, and systems. At the same time, the philosophy of information [4], the neurobiological theory of integrated information [5, 6], and the philosophy of mind and artificial intelligence [7, 8] follow parallel paths. Together, all these approaches produce a certain pool of concepts that fix a general set of ideas about how AI should have a "friendly interface" and should be safe, "understandable" and "open". A more technical formulation of the question is as follows: how should AI communicate, i.e., relate to models of the world, to the world itself, to natural consciousness, and to rationality, in order to remain reliable, stable, safe and transparent? In the development of intelligent, cognitive and robotic systems, the usual front of such concepts is data and information, signals and communication, computation and thinking, syntax and semantics, signs and symbols, explanation and understanding, representation and knowledge, knowledge and wisdom. It is assumed that the first members of the given oppositions are factually reserved for weak or narrow AI, and the second ones mainly, but not absolutely, for natural intelligence or self-conscious, autonomous, "strong" AI. Not only is this contradistinction not absolute, it is even difficult to formulate correctly. Thus, "knowledge" is totally ambivalent and, in the human sense, difficult to ascribe to a cognitive or socio-technical system. "Semantics" is used in phrases such as "semantic web" without any assumption of, or comparison to, "semantic understanding". There are concepts that are provocative in their origin, such as the "logical depth" coined by Ch. Bennett, which has nothing to do with "deep understanding" on the one hand or "deep learning" on the other. In light of the above, it seems necessary to outline more clearly the demarcation lines between general humanitarian "correctness" and "reliability" of communication and more specifically safe, reliable, transparent AI communication.
2 Research

2.1 Cases of Transparency: Neural Network Non-linearity, Social Contract

Artificial intelligence practices are determined by communicative solutions: from the Turing test, which allows assessing the intelligence of a system or of a partial cognitive module (see, for example, [9]), to multi-agent systems, where the communicative connections of agents determine their behavior. In the 20th century, communication as a prototype of computational processing took on many incarnations. From the point of view of correct communication, the theory of communicative action is methodologically important. J. Habermas describes communicative rationality in terms of truth, comprehensibility, truthfulness, and legitimacy [10]. In different contexts, it is convenient to apply similar categorizations: adequacy, relevance, coherence, and the like. The technical constraints imposed on communication systems inherit some of the problems of human communication, which can be avoided at the architectural level by using such categories.

One of the characteristic features of thinking is the ambiguity and ambivalence of many concepts, which generates associative series and, ultimately, the complexity and recursive nature of thinking. However, the construction of an intelligent machine based on mathematical models and algorithms first goes through the stage of rigorous formal models. The point here is not a real necessity of such a formal stage (which, perhaps, will be overcome in other types of computing, such as biomolecular or quantum computing), but the "availability of a solution" for the user. It is possible to reserve strictly one meaning for one concept only in highly specific areas, mainly in strictly formal sciences such as logic and mathematics (although even here questions arise about the determinism of solutions in, for example, probabilistic and indeterministic models). Technically, such a fixation maintains a certain front of the system's transparency, reversibility, linearity, i.e. the possibility of the solution being audited by the operator.

This problem also arises where the symbolic approach (modeling reasoning, decision making, theorem proving) is opposed to connectionism (neural networks, deep learning). The deliberate introduction of "opacity" in the form of a nonlinear activation function in neural networks is due to the need for the "production of new knowledge" or the "production of complexity". A completely transparent model (with a linear activation function) brings no new knowledge and is therefore not used (except in some cases when, for convenience, a linear activation function in the form of, for example, scaling by a factor is still used). However, it is not only nonlinearity that reduces transparency. Various computational techniques, such as drop-out (the selection of only certain, including probabilistically selected, nodes of a given layer for recalculation), are needed so that the network does not come to depend on the settings of specific nodes and does not memorize the training data set (overfitting). Transparency can also be considered at a deeper level: it is desirable that nonlinear activation functions be continuously differentiable, which keeps the network transparent for error correction (gradient descent as a method of "error propagation"). This understanding, in turn, can be extended: in the case of stochastic gradient descent, we abandon the guaranteed reliability of the "one step" of standard gradient descent and allow the training examples to act on the weight corrections sequentially, which adds to the efficiency of learning (another name for this method is online gradient descent) but hides the "bias" of the data, i.e. undermines the reliability of the necessary correction. Another important example of transparency loss is dimension reduction in the pooling layers of deep learning, where values are irreversibly reduced by a maximum or weighted-mean function. A natural consequence of nonlinearity (and of any other signal reduction) is that, despite the productivity of neural networks in producing the required result, this is insufficient in those areas where we cannot confirm a particular choice by an expert assessment of the network's causal choices, or where the so-called verbalization of the neural network (reducing the trained network to a set of logical and algebraic functions) is difficult.

The problem of transparency as unambiguous interpretation is technically important due to the need for communication between any cognitive agents. The "social contract" of the human community arises thanks to well-established social protocols: laws and regulations ("legitimacy" according to Habermas). Such protocols limit the impact of possible errors. The social contract, however, does not by itself provide a high degree of social coherence. Highly coherent sociality acquires a new quality of mutual coordination of human agents at the cost of efforts to reach a consensus [10]; this quality can conventionally be called intersubjectivity. Human agents are related to each other in a different way than intelligent machines (social circles, community, etc.).
This becomes possible due to a certain structure of the transparency of society, an achieved transparency of communication.
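To make the transparency trade-off described above concrete, the following minimal NumPy sketch (our own illustration, not part of the original study; weights and input are arbitrary random values) shows that a network with only linear activations collapses into a single auditable matrix, while a nonlinear activation destroys this collapse:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Linear activations: the two-layer network collapses into the single
# matrix W2 @ W1 -- fully "transparent" (auditable as one linear map),
# but producing no new representational complexity.
linear_out = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
assert np.allclose(linear_out, collapsed)

# A nonlinear activation (ReLU) breaks this collapse: no single matrix
# reproduces the mapping -- the deliberate "opacity" described above as
# the price of producing new knowledge.
relu = lambda z: np.maximum(z, 0.0)
nonlinear_out = W2 @ relu(W1 @ x)
assert not np.allclose(nonlinear_out, collapsed)
```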
Transparency formats take on different forms depending on the constraints of the research. Thus, thermodynamic "transparency" is associated with the reversibility of computations. Reversible computing could become a "perpetual motion machine" of information processing, because its condition is the freezing of entropy. The possibility of its physical realization was substantiated (with some reservations) by C. Bennett.

2.2 Blockchain Transparency Advantage

The transparency format that we refer to in this article is related to the communication and communicative constraints of social and AI systems. Limitations, as a rule, follow either from deficiencies of the communication model or from specific dysfunctions of particular communication decisions, i.e. errors. Media-related and message-related errors destabilize the communication transparency of the system or interrupt communication entirely. In social systems, it is not only the direct violation of protocols that is recognized as erroneous, but also marginal behavior and destructive practices. Writing computer viruses, for example, is not a prohibited practice, but it works "against" legitimate communication and gives rise to the antivirus industry, i.e. an industry of mistrust and opacity.

Blockchain is a technological solution to the problem of trust. The synthesis of blockchain technology and artificial intelligence can lead to an extremely effective solution to the problem of transparency. Moreover, the advantages of both communication directions of such technological convergence are revealed here: from artificial intelligence to the blockchain, and vice versa [11–14]. For example, problems of identification and authentication of arbitrary objects are solved, i.e. the problem of fraud and fake communications is addressed. Objects, processes, and any other network entities acquire their own identity and a traceable network trajectory (biography). Transactions, in addition to being secure, become once-and-for-all fixed points of no return, i.e. they become available for audit and authentication. Therefore, implementing the blockchain will allow decision-making chains to be "accompanied", and the validation of decision making becomes possible due to the end-to-end transparency of all steps. In addition to the obvious advantage of distributed ledgers in the decentralization of data storage, blockchain allows crypto-protected data to be processed and stored, which intelligent algorithms can process far more efficiently than analytical processing by scientists. In turn, a secure, traceable storage mode for data distributed on a network scale across all user computing power allows machine learning algorithms and neural networks to learn from reliable data. Moreover, continuous training will make it possible to identify and recognize the "trust" format itself; that is, starting from a certain level of training, artificial intelligence will be able to determine the "trust style" in external data outside of blockchain protocols. Intelligent algorithms will be able to identify data susceptible to fake influence not only by the digital trail but, in a sense, by recognizing more abstract styles, "reliability structures". At the new stage of the fusion of blockchain and AI, the transparency of communication will increasingly be understood as communication integrity, like the "psychological integrity" of a person.
It is the blockchain that can turn out to be the certifying mechanism thanks to which such analogies to the oppositions of "common sense" and the natural attitude as "internal–external", "system–environment", and "friend–foe" can be established. It is also possible to assume that fixing checkpoints "forever" will generate a kind of data marts with the required level of deployment, as happens in intelligent data warehouses. Such "audit points", once aggregated, will serve as the foundation for new levels of abstraction and super-connectivity of data. This factor can play the role of "heat removal", i.e. turn out to be decisive in terms of energy savings in the generation of blockchain blocks, since "knowledge" will be born on the basis of new, global, pre-calculated "abstracts", i.e. aggregated big data.
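As a minimal illustration of how blockchain-style fixation makes a decision history auditable, consider the following sketch of a hash-chained log (our own simplification: a real blockchain adds consensus and distributed replication, which are omitted here; the record contents are hypothetical):

```python
import hashlib, json, time

def make_block(record, prev_hash):
    """Append an 'audit point': any later change to a record breaks
    every subsequent hash, making the decision history tamper-evident."""
    body = {"record": record, "prev": prev_hash, "ts": time.time()}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

chain = [make_block("model v1 approved input X", "0" * 64)]
chain.append(make_block("decision: grant access", chain[-1]["hash"]))

def verify(chain):
    # Re-derive each hash and check the linkage to the previous block.
    for prev, cur in zip(chain, chain[1:]):
        body = {k: cur[k] for k in ("record", "prev", "ts")}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if cur["prev"] != prev["hash"] or cur["hash"] != recomputed:
            return False
    return True

print(verify(chain))  # True; mutating any stored record makes this False
```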
3 Results

The idea of the transparency of communication in the social and technological spheres has been considered. This problem has been studied in the specific contexts of artificial intelligence and blockchain. It is shown that the combination of these two technologies can significantly expand the capabilities of both. It has been demonstrated that the problem of communication transparency reduces to trust interactions, which can be formalized as, for example, blockchain transactions. Artificial intelligence based on blockchain technology is able to learn without the need to identify inaccurate data. In addition, artificial intelligence trained on blockchain data will be able to identify data susceptible to forgery in external information environments, taking into account their internal dynamics.
4 Conclusion

Electronic culture is gradually acquiring an all-encompassing character. Almost all spheres of social life are represented today by a parallel, informational and virtual, embodiment [15]. Some of these implementations, such as digital twins, are used to parallelize simulation processes, i.e. to completely replace the life cycle of a real object. Others, such as e-government and smart cities, have a regulatory impact on other smart services and intelligent systems [16, 17]. The information environments of our time require new ideas about the security and transparency of systems and communications. Since natural, social, and artificial (multi-agent, information, artificial intelligence) environments overcome their communication limitations in different ways, it is necessary to develop new scenarios of their interaction (for example, in Industry 4.0, see [3]), defining the most risky strategic lines of mutual danger. In this vein, the problem of transparency of communication can be understood as a problem of trust. Blockchain is the most significant form of technological trust among modern technologies. Using blockchain technology in artificially intelligent developments, it is possible to create the most trusted forms both of the artificially intelligent assistants themselves and of the types of interaction with them, controlled "from within" by the technology of trust. The network society (a term coined by M. Castells) finds in the blockchain a guarantee of its transparency and sustainable development, and in artificial intelligence components the basis for communication flexibility and the possibility of ecological optimization.
References

1. Schuh, G., Potente, T., Hauptvogel, A.: Sustainable increase of overhead productivity due to cyber-physical-systems. In: Proceedings of the 11th Global Conference on Sustainable Manufacturing, pp. 332–335. Berlin, Germany (2013)
2. Kumar, A.: Methods and materials for smart manufacturing: additive manufacturing, internet of things, flexible sensors and soft robotics. Manuf. Lett. 15, 122–125 (2018)
3. Leshchev, S.V.: From artificial intelligence to dissipative sociotechnical rationality: cyberphysical and sociocultural matrices of the digital age. In: Popkova, E.G., Ostrovskaya, V.N., Bogoviz, A.V. (eds.) Socio-economic Systems: Paradigms for the Future. SSDC, vol. 314, pp. 65–72. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-56433-9_8
4. Floridi, L.: The Philosophy of Information. Oxford University Press, Oxford (2010)
5. Tononi, G., Koch, C.: Consciousness: here, there and everywhere? Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 370, 20140167 (2015)
6. Tononi, G., Boly, M., Massimini, M., Koch, C.: Integrated information theory: from consciousness to its physical substrate. Nat. Rev. Neurosci. 17(7), 450–461 (2016)
7. Dennett, D.C.: Consciousness Explained. Little, Brown and Company, Boston (1991)
8. Searle, J.: Minds, brains, and programs. Behav. Brain Sci. 3(3), 417–457 (1980)
9. Leshchev, S.V.: Cross-modal turing test and embodied cognition: agency, computing. Procedia Comput. Sci. 190, 527–531 (2021)
10. Habermas, J.: Theorie des kommunikativen Handelns, Bd. I. Suhrkamp, Frankfurt am Main (1987)
11. Salah, K., Rehman, M.H., Nizamuddin, N., Al-Fuqaha, A.: Blockchain for AI: review and open research challenges. IEEE Access 7, 10127–10149 (2019)
12. Dinh, T.N., Thai, M.T.: AI and blockchain: a disruptive integration. Computer 51(9), 48–53 (2018)
13. Raja, G., Manaswini, Y., Vivekanandan, G.D., et al.: AI-powered blockchain – a decentralized secure multiparty computation protocol for IoV. In: IEEE INFOCOM 2020 – IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 865–870. IEEE, Toronto, ON, Canada (2020)
14. Dillenberger, D.N., Novotny, P., Zhang, Q., et al.: Blockchain analytics and artificial intelligence. IBM J. Res. Dev. 63(2/3), 1–14 (2019)
15. Leshchev, S.V.: The infogenesis and infotectonics of electronic culture: new horizons of information technologies. Sci. Tech. Inf. Process. 42(3), 135–139 (2015)
16. Kadyrova, G.M., Shedko, Y.N., Orlanyuk-Malitskaya, L.A., Bril, D.V.: Prospects to create and apply artificial intelligence in the activities of public authorities. IOP Conf. Ser. Earth Environ. Sci. 650(1), 012014 (2021)
17. Anthopoulos, L.G.: Smart government: a new adjective to government transformation or a trick? In: Understanding Smart Cities: A Tool for Smart Government or an Industrial Trick? vol. 22, pp. 263–293. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57015-0_6
Application of the Multi-valued Logic Apparatus for Solving Diagnostic Problems

Larisa A. Lyutikova(B)

Institute of Applied Mathematics and Automation, KBSC RAS (IAMA KBSC RAS), Nalchik, Russia
Abstract. The paper proposes a general approach, and a software package developed on its basis, for diagnostic tasks. Today, the most popular methods for solving such problems are neural networks. However, in cases where it is necessary to process small and poorly structured amounts of data, neural network algorithms do not guarantee accurate results. To work with such data, an approach is proposed based on the apparatus of multi-valued logic, which first analyzes the data for patterns, structures them, selects particularly significant connections, and selects a group of individual signs for each diagnosis. The computer implementation of this approach is a program for diagnosing gastritis. The input data are the real results of examinations of real patients. The diagnosis is made according to a number of signs identified by doctors. Each feature has its own domain of definition, which is the reason for employing the apparatus of multi-valued logic. The program allows the diagnostic accuracy to be set in advance. Within the limits of this accuracy, all possible diagnoses will be shown on the screen. If the data are such that it is impossible to make a diagnosis with this accuracy, it is proposed to change the accuracy of the conclusion or to undergo an auxiliary examination.

Keywords: Diagnostics · Knowledge base · Algorithm · Clauses · Axioms
1 Introduction

Medical diagnostics is a fairly well-known problem. There are various methods for solving it, which depend on the type of system and its purpose. There are systems based on statistical and other mathematical models: they rely on mathematical algorithms that search for a partial correspondence between the symptoms of the observed patient and the symptoms of previously observed patients whose diagnoses are known [1–3]. There are systems based on expert knowledge, in which algorithms operate on knowledge about diseases presented in a form close to the ideas of doctors and described by expert doctors. And there are systems based on machine learning that require a large amount of data; it is in this case that the algorithm is able to learn to work independently. In recent years, deep learning algorithms have been widely used to diagnose and predict the development of a disease [4].
Methods of remote diagnostics are also being developed, including remote sensing using convolutional neural networks [5]. The purpose of this work is to develop a method for data analysis and to create on its basis an adequate software package for the diagnosis of gastritis. The proposed method is based on logical data analysis and on the construction of a complex discrete function whose variables are suitably encoded symptoms and diagnoses. This makes it possible, even with a small amount of data, to find patterns, build classes based on the revealed commonality of features, and select the most important properties for making a decision.
2 Formulation of the Problem

A group of doctors proposed the problem considered in this work. To solve it, the data of patients who were diagnosed with gastritis according to gastroenterological examinations were provided. There are 28 symptoms (signs of a disease) in total, each of which has 2 to 4 possible answers. The signs by which diseases are diagnosed are the result of established clinical practice and include various examinations. The number of diagnosed types of gastritis is 17. In total, 132 people were examined and diagnosed. Based on these data, it is necessary to build an algorithm for the adequate diagnosis of the remaining patients. Figure 1 shows a sample questionnaire.
Fig. 1. Fragment of the assessment of the patient’s symptoms
We have a function Y = f(X) of 28 variables, defined at 132 points; the domain of each variable contains from 2 to 4 values. It is necessary to restore the value of the function at other requested points. The formulation of this problem reduces to a problem of learning from precedents.

Let $X = \{x_1, x_2, \dots, x_n\}$, $x_i \in \{0, 1, \dots, k_i - 1\}$, where $k_i \in [2, \dots, N]$, $N \in \mathbb{Z}$, be the set of symptoms of the diagnosed diseases, and let $Y = \{y_1, y_2, \dots, y_m\}$ be the set of diagnoses. Each diagnosis is characterized by a corresponding set of symptoms $x_1(y_i), \dots, x_n(y_i)$: $y_i = f(x_1(y_i), \dots, x_n(y_i))$. In other words, $X_i = \{x_1(y_i), x_2(y_i), \dots, x_n(y_i)\}$, $i = 1, \dots, m$, $y_i \in Y$, is the processed input data, and $Y = \{y_1, y_2, \dots, y_m\}$ is the output:

$$\begin{pmatrix} x_1(y_1) & x_2(y_1) & \dots & x_n(y_1) \\ x_1(y_2) & x_2(y_2) & \dots & x_n(y_2) \\ \dots & \dots & \dots & \dots \\ x_1(y_m) & x_2(y_m) & \dots & x_n(y_m) \end{pmatrix} \to \begin{pmatrix} y_1 \\ y_2 \\ \dots \\ y_m \end{pmatrix}$$

It is necessary to find the general rules that generate the given patterns, exclude uninformative variables, and break the set of diagnoses into classes [7]. Perhaps, in such a narrow area of knowledge as the diagnosis of a type of gastritis, a good specialist, relying on his experience, would give a more objective and complete picture of the possibilities of making a diagnosis than the one obtained as a result of the proposed method. What matters, however, is a general approach based on the logical analysis of data, which allows one to formally find the most important rules, the totality of which is capable of completely restoring the original information [8, 9].
3 Solution Methods

The data one has to deal with in solving diagnostic problems are known to be incomplete, inaccurate, and ambiguous. However, the solutions obtained must comply with the patterns that are explicitly and implicitly present in the data under consideration. Logical methods can analyze the initial data well enough, highlight essential and insignificant features, and identify the minimum set of rules necessary to fully restore the original patterns. As a result, one can obtain a more compact and reliable representation of the original information, which can be processed more reliably and faster. We say that the constructed system of rules is complete if it ensures the derivation of all solutions in the area under consideration.

A group of diagnoses identified by a specific symptom (or a group of symptoms) will be called a class. Each diagnosis can be a representative of one or several classes; each class is defined by a set of similar symptoms. A production rule allows one to expressively represent the relationship between a specific diagnosis and its symptoms:

$$\bigwedge_{j=1}^{m} x_j(y_i) \to P(y_i), \quad i = 1, \dots, l; \quad x_j(y_i) \in \{0, 1, \dots, k - 1\},$$

where the predicate $P(y_i)$ takes the value true, i.e. $P(y_i) = 1$, if $y = y_i$, and $P(y_i) = 0$ if $y \neq y_i$; or, in clause form: $\bigvee_{i=1}^{n} \bar{x}(y_j) \lor P(y_j)$, $j \in [1, \dots, m]$.

The decision function for a set of data will be called the conjunction of all decision rules:

$$f(X) = \bigwedge_{j=1}^{m} \left( \bigvee_{i=1}^{n} \bar{x}_i \lor P(y_j) \right).$$
The same diagnosis can be characterized by slightly different symptoms; this function helps to exclude insignificant symptoms and break the data into classes. In general, combining each individual rule into a common function by the conjunction operation offers a wide range of data interpretation. As a result, we get a Boolean function of m + n variables (symptoms and diagnoses) which equals one on every assignment except those where all the symptoms of some diagnosis are present but the diagnosis corresponding to these symptoms is denied. We can say that this function allows any rules except the negation of those that exist. Below, an elementary example and a table of values of the function under consideration are described for a visual theoretical demonstration of the proposed method.
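A minimal sketch of how such a decision function could be evaluated in code is given below (our own illustration; the symptom names and diagnoses are hypothetical, and the rule set is reduced to two precedents):

```python
# Each rule: a symptom-value vector (a precedent) paired with its diagnosis.
rules = [
    ({"acidity": 2, "pain": 1}, "gastritis type A"),  # hypothetical data
    ({"acidity": 0, "pain": 1}, "gastritis type B"),
]

def decision_function(observed, claimed_diagnosis):
    """Conjunction over all rules of the clause
    (NOT all-symptoms-match) OR (diagnosis predicate holds).
    It is False only when every symptom of some rule is present
    but the diagnosis corresponding to those symptoms is denied."""
    for symptoms, diagnosis in rules:
        matches = all(observed.get(s) == v for s, v in symptoms.items())
        predicate = (claimed_diagnosis == diagnosis)
        if matches and not predicate:
            return False
    return True

print(decision_function({"acidity": 2, "pain": 1}, "gastritis type A"))  # True
print(decision_function({"acidity": 2, "pain": 1}, "gastritis type B"))  # False
```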
4 Knowledge System Modeling Algorithms

The algorithm for selecting the rules from which the entire volume of the considered data can be obtained can be as follows. The number of columns in the table is $\sum_{i=1}^{n} k_i$: this is, for each symptom, the number of questions and possible answers (in our case from one to four). The number of rows corresponds to the number of diagnoses, in our case 17, plus the number of classes that will be found [11].

Observing the order, we write down the data of all patients for each item in the table as follows. We take each diagnosis made and spread it across the corresponding columns: for each item, the diagnosis y1 will be placed in the column matching the results of the examination of that patient. For example, gender will have two columns with values 0 and 1, and the diagnosis will be placed in the column based on the patient's gender. The general view of the table is shown below.

In the course of filling in the table, we check the column into which the diagnosis of the patient in question falls. If there are already other diagnoses in that column, then we cross them out, join them into a class with the diagnosis being considered, and enter them in the next row in the same column. These diagnoses are grouped into a class for the given diagnostic item. Further, we sequentially consider the rows: if, in the row corresponding to some diagnosis, there remain squares with diagnoses that are not crossed out, then we select the column corresponding to this diagnosis and consider it a unique sign of this particular diagnosis. We also consider the classes formed as a result of the data analysis [12–15].

Thus, the algorithm allows one to construct those clauses that contain the diagnoses by which recognition is performed. Analysis of the presence of free knowledge in the resulting answer allows us to implement a procedure signaling the need for additional training: replenishing the original system of production rules with new admissible solutions. In this way, a procedure can be implemented to improve the adaptive properties of gastritis diagnosis.
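The table-filling procedure just described can be sketched as follows (our own illustration on a toy data set; the symptom names, answer codes, and diagnosis labels are hypothetical):

```python
from collections import defaultdict

# Hypothetical training data: each patient is a (symptom -> answer) map
# together with the diagnosis made.
patients = [
    ({"gender": 1, "acidity": 2}, "y1"),
    ({"gender": 0, "acidity": 2}, "y2"),
    ({"gender": 1, "acidity": 0}, "y3"),
]

columns = defaultdict(set)  # one column per (symptom, answer) pair
for symptoms, diagnosis in patients:
    for item in symptoms.items():
        columns[item].add(diagnosis)

# Columns shared by several diagnoses define classes of similar diagnoses;
# a column containing exactly one diagnosis is a unique sign of it.
classes = {col: ds for col, ds in columns.items() if len(ds) > 1}
unique_signs = {next(iter(ds)): col
                for col, ds in columns.items() if len(ds) == 1}

print(classes)       # {('gender', 1): {'y1', 'y3'}, ('acidity', 2): {'y1', 'y2'}}
print(unique_signs)  # {'y2': ('gender', 0), 'y3': ('acidity', 0)}
```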
5 Program Description

The program that implements the above algorithm consists of two executable modules.

Module 1 performs database decoding using a dictionary, loading symptoms and diagnoses in question-and-answer form, and analyzes the results.

Module 2 is the knowledge base generation program. It creates information on the basis of the source data file or refines the given knowledge system; it reduces the size of the database in accordance with the approximation value (otherwise it is worth reducing the accuracy of the algorithm) and adds information to check the correctness of the stored data.

After filling in all the fields shown in Fig. 1, we can obtain a diagnostic result with a given accuracy.
6 Conclusion

The result of the study is a software package for the diagnosis of gastritis based on logical data analysis. The proposed analysis allows one to find hidden patterns in the data, break the data under study into classes, and find the unique properties of each diagnosis. Unlike neural network methods, this method exhibits the patterns and connections it finds and does not require retraining. It can highlight the most significant patterns and shorten the process of finding a solution. It can be argued that logical algorithms can be used to analyze data: they make it possible to consider the initial data as a certain set of general rules, among which to identify the minimum set of rules sufficient to derive all the initial ones. These rules will be generative for the area under consideration; they will help to better understand the nature of the objects under consideration and minimize the search for correct answers.

Acknowledgments. The reported study was funded by RFBR according to the research project № 19-01-00648 A.
References

1. Zhuravljov, J.: Ob algebraicheskom podhode k resheniju zadach raspoznavanija ili klassifikacii. Problemy Kibernetiki 33, 5–68 (1978)
2. Shibzukhov, Z.M.: Correct aggregation operations with algorithms. Pattern Recog. Image Anal. 24(3), 377–382 (2014)
3. Naimi, A.I., Balzer, L.B.: Stacked generalization: an introduction to super learning. Eur. J. Epidemiol. 33, 459–464 (2018)
4. Haoxiang, W., Smys, S.: Big data analysis and perturbation using data mining algorithm. J. Soft Comput. Paradigm (JSCP) 3(01), 19–28 (2021)
5. Vijesh, J.C., Jennifer, S.R.: Location-based orientation context dependent recommender system for users. J. Trends Comput. Sci. Smart Technol. (TCSST) 3(01), 14–23 (2021)
6. Grabisch, M., Marichal, J.-L., Pap, E.: Aggregation Functions. Encyclopedia of Mathematics and Its Applications, vol. 127 (2009)
7. Calvo, T., Beliakov, G.: Aggregation functions based on penalties. Fuzzy Sets Syst. 161(10), 1420–1436 (2010). https://doi.org/10.1016/j.fss.2009.05.012
8. Mesiar, R., Komornikova, M., Kolesarova, A., Calvo, T.: Fuzzy aggregation functions: a revision. In: Sets and Their Extensions: Representation, Aggregation and Models. Springer-Verlag, Berlin (2008)
9. Yang, F., Yang, Z., Cohen, W.W.: Differentiable learning of logical rules for knowledge base reasoning. Adv. Neural Inf. Process. Syst. 2017, 2320–2329 (2017)
10. Flach, P.: Machine Learning: The Art and Science of Algorithms that Make Sense of Data, p. 396. Cambridge University Press, Cambridge (2012). ISBN 978-1107096394
11. Rahman, A., Tasnim, S.: Ensemble classifiers and their applications: a review. Int. J. Comput. Trends Technol. 10(1), 31–35 (2014)
12. Dyukova, Ye.V., Zhuravlev, Yu.I., Prokof'yev, P.A.: Metody povysheniya effektivnosti logicheskikh korrektorov. Mashinnoye obucheniye i analiz dannykh 1(11), 1555–1583 (2015)
13. Lyutikova, L.A., Shmatova, E.V.: Constructing logical operations to identify patterns in data. E3S Web Conf. 224, 01009 (2020)
14. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining Knowl. Discovery 2, 121–167 (1998)
15. Aladjev, V.: Computer Algebra System Maple: A New Software Library
16. Cook, S.A., Reckhow, R.A.: The relative efficiency of propositional proof systems. J. Symbolic Logic 44(1), 36–50 (1979)
17. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
18. Lyutikova, L.A.: Use of logic with a variable valency under knowledge bases modeling. In: CSR-2006 (2006)
19. Ryazanov, V.V., Senko, O.V., Zhuravlev, Y.: Methods of recognition and prediction based on voting procedures. Pattern Recog. Image Anal. 9(4), 713–718 (1999)
20. Shibzoukhov, Z.M.: On constructive method of synthesis of majority correct algorithm families. In: Conference Proceedings of the VII International Conference on Pattern Recognition and Image Analysis, vol. 1, pp. 113–115 (2004)
Semantic Generalization Means Based on Knowledge Graphs

Nikolay Maksimov, Olga Golitsina, Anastasia Gavrilkina, and Alexander Lebedev(B)

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Shosse, 31, Moscow, Russia
[email protected]
Abstract. The article proposes approaches to scaling the semantic content of text documents, represented in knowledge graph form, in order to reduce the working space of cognitive activity. Two types of scaling operations on graphs are considered: enlargement, i.e. aggregation based on inclusion (the part–whole relationship), and generalization based on generic relationships. Examples of the use of declarative means for implementing the scaling operations, namely a classification of relationships and a thesaurus, are given.

Keywords: Cognition · Semantic content scaling · Semantic generalization · Semantic enlargement · Knowledge graphs
1 Introduction

In the processes of cognitive activity, man tries to make for himself, in the fashion that suits him best, a simplified and intelligible picture of the world [1]. There are also physical limitations of human consciousness in terms of processing information in working memory: a maximum of 7 ± 2 objects of attention in the perception-understanding process [2]. No less significant is the knowledge level of the person, the subject of cognition. At the same time, the fundamental operation that an observer can perform is the operation of distinction: the specification of an entity by operationally cleaving it from a background. Furthermore, that which results from an operation of distinction and can thus be distinguished is a thing with the properties that the operation of distinction specifies, and which exists in the space that these properties establish [3].

In the semantic scaling task, two aspects can be distinguished: 1) reducing the number of objects/connections in the field of cognitive activity (including in the vision area), and 2) "bringing" the semantics of cognitive objects to a level adequate to the person's cognitive situation (scaling proper). The factors that determine the need for scaling can be, on the one hand, the person's knowledge level in a given subject area (SbA) and, on the other hand, the depth and/or breadth of the SbA subject, entailing a significant variety and number of objects/relationships/properties, which significantly complicates understanding.

The article discusses approaches to the formation of semantic scaling (as thinking-like operations) based on a graph-theoretic representation of the content of text documents, as well as examples obtained using the visual ontological analysis service for scientific and technical texts [7].
2 Semantic Scaling

In general, by semantic scaling (granulation) of knowledge we mean a change in the amount of information about objects presented integrally for perception. Within the graph-theoretic model framework, reducing information is possible by (1) decreasing/highlighting/selecting, in the field of cognitive activity, a fragment of reality described by the graph and, accordingly, the number of displayed vertices and relationships (that is, highlighting/forming a subgraph), and/or (2) decreasing the number of analyzed properties of the objects available in the field of cognitive activity, that is, by transition to higher-level concepts and relationships.

To reduce the number of entities/relationships in the field of cognitive activity, the aspect projection operation may be used, which displays a certain semantic slice of the subject area (described in more detail in [4]); for "bringing" the semantics of cognitive objects to a level adequate to the task, semantic scaling (enlargement and generalization) can be used, as well as abstraction and idealization. The scaling types are determined by the nature (essence) of the features: whether object instances or object properties are used in the operations. In the first case, it is enlargement: aggregation of objects on the basis of inclusion (the part–whole relationship). In the second case, it is generalization: bringing object concepts/relationships (constituting a generalized image) to a higher level of generality (the genus–species relationship).

In the context of a knowledge graph, enlargement is an operation in which, on routes where all relationships belong to the "part–whole" class, the intermediate (i.e., non-initial and non-final) vertices of the original graph are "hidden". Generalization is an operation in which more specific concepts are replaced by generic ones (i.e., common properties of objects are used) while preserving the original relationships.

Text is considered as a set of elementary facts, each fixing a relationship between a pair of entities, and the graph-theoretic representation is used to display and transform the text content [5]. The following sections present semantic enlargement and generalization technologies based on the use of a relationships classification and concept thesauri.
3 Technology

3.1 Semantic Enlargement

Semantic enlargement of a knowledge graph constructed from text is performed by identifying arcs of the graph that correspond to relationships of the "part–whole" type and replacing each such arc, together with its incident vertices, with the "whole" vertex while preserving all other arcs of both vertices. Thus, the subgraphs that present the result of dividing a whole into parts are replaced by one vertex.

Revealing "part–whole" relations is ensured by using the relationships classification [6] when constructing the knowledge graph. The top level of this classification is represented by a combination of three facets, reflecting the ratio of the separate and the aggregate (separate–separate, separate–whole, whole–whole), the reality/model ratio, and the form of manifestation of a relationship. To encode classes, a positional coding system is used, with the first code position reserved for the facet of the ratio between the separate and the aggregate. Thus, the code of a relationship class is included in a hierarchy growing from the top-level class of this facet that represents the "separate–whole" relationship (the code of this class begins with the numeral "2").

Let us consider an example of graph enlargement1 for the following text: "The pump consists of a housing and a removable part. The removable part is sealed with a trapezoidal copper gasket to ensure tightness. The housing is made of heat-resistant steel grade 48TC, protected from the inside by a stainless overlay. Support legs are welded to the housing. The support legs lean on the base frame. The removable part consists of an upper radial-axial bearing, a pump shaft, an impeller, a guide apparatus, a frame and a cover with a neck welded from 48TC steel forgings. A mechanical shaft seal and a hydrostatic bearing are located in the neck. The pump shaft is one-piece forged and made of steel 20X13. The impeller with double curvature blades is welded from two parts: a disk with blades and a cover disk. The impeller and the guide apparatus are made of stainless steel 10X18H9TL". Figure 1 shows a graph built from the original text.
Fig. 1. The graph built from the original text (arc labels are the codes and names of relationship classes; font size corresponds to concept weight).
The constructed graph contains 30 connected vertices. The "removable part" concept appears to be the most significant. Figure 2 shows the graph obtained as the result of applying the enlargement operation.

1 All examples were built for Russian texts and are given in translation in the article.
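Operationally, such enlargement can be sketched as repeated contraction of "part–whole" arcs into their "whole" vertices; the following minimal example (our own illustration, assuming the networkx library; the node and relation labels are a hypothetical fragment, and the actual result on the full graph is shown in Fig. 2) is a sketch, not the authors' implementation:

```python
import networkx as nx

# A toy fragment of the knowledge graph: arcs point from part to whole
# and carry a relationship class label.
G = nx.DiGraph()
G.add_edge("housing", "pump", rel="part-whole")
G.add_edge("removable part", "pump", rel="part-whole")
G.add_edge("support legs", "housing", rel="part-whole")
G.add_edge("support legs", "base frame", rel="lean on")

def enlarge(g, part_whole="part-whole"):
    """Contract every 'part-whole' arc into its 'whole' vertex,
    preserving all other arcs of both vertices."""
    g = g.copy()
    while True:
        edge = next(((u, v) for u, v, d in g.edges(data=True)
                     if d.get("rel") == part_whole), None)
        if edge is None:
            return g
        part, whole = edge
        # contracted_nodes keeps `whole` and re-attaches `part`'s arcs to it
        g = nx.contracted_nodes(g, whole, part, self_loops=False)

H = enlarge(G)
print(sorted(H.nodes()))  # parts are absorbed into "pump"
```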
Fig. 2. The graph obtained as the result of the enlargement
The figure shows that "pump" is now the most significant concept, representing the entity of which the "removable part" is a part. This is obtained as the result of the absorption of the vertices connected with the "pump" vertex by "part–whole" relationships. In this case, the number of vertices is reduced to 16.

3.2 Semantic Generalization

Another approach to scaling the semantic content of document texts is built on the use of subject area thesauri, which a priori contain generalizations (generic relationships). The generalization operation consists in replacing the vertices of the original graph with higher-ranking thesaurus terms. To do this, a subject area thesaurus is searched for descriptors that coincide with the contents of graph nodes or match them by the principle of lexicographic inclusion. After that, from the found descriptors, a transition is made along a branch of the generic hierarchy (towards higher terms) to a top generic descriptor, which as a result replaces the entity name in the graph node. If the found thesaurus descriptors have no generic extension, then branches from the descriptors associated with the found one by USE relationships are considered. After replacing entity names with thesaurus descriptors, relationships between descriptors (if any) are added to the graph.

Figure 3 shows the graph built from the following text: "The main circulation pumping unit (MCPU) is designed for coolant circulation creation in primary loop and heat extraction from a reactor core. The MCPU has an additional function of coolant circulation ensuring on runaway in case of various accidents with blackout. An additional function allows for a smooth transition to a natural circulation mode".
Fig. 3. The original text graph.
In this example, the IAEA INIS thesaurus [8] was used, in which descriptors corresponding to the original text were selected. As a result, the following chains were obtained:

"reactor core" – context-thesaurus – "REACTOR CORES" – BT – "REACTOR COMPONENTS";
"runaway" – context-thesaurus – "RUNAWAY (REACTOR ACCIDENT)" – USE – "EXCURSIONS" – BT – "ACCIDENTS";
"main circulation pumping unit" – context-thesaurus – "PUMPS" – BT – "EQUIPMENT";
"coolant circulation ensuring" – context-thesaurus – "COOLANTS";
"blackout" – context-thesaurus – "STATION BLACKOUT" – BT – "ACCIDENTS";
"heat extraction" – context-thesaurus – "HEAT EXTRACTION" – RT – "EQUIPMENT";
"heat extraction" – context-thesaurus – "HEAT EXTRACTION" – RT – "ENERGY TRANSFER";
"primary loop" – context-thesaurus – "HEATING LOOPS" – BT – "ENERGY SYSTEMS";
"primary loop" – context-thesaurus – "COOLANT LOOPS" – BT – "ENERGY SYSTEMS";
"various accidents" – context-thesaurus – "ACCIDENTS";
"natural circulation mode" – context-thesaurus – "NATURAL CIRCULATION" – USE – "NATURAL CONVECTION" – BT – "ENERGY TRANSFER";
"coolant circulation creation" – context-thesaurus – "COOLANTS".
266
N. Maksimov et al.
The generalization technological operation consists in replacing the vertices referring to the original text with the associated thesaurus terms included in generic relationships. Thus, the "REACTOR COMPONENTS" vertex will replace the "reactor core" vertex. The vertex "main circulation pumping unit" corresponds to the thesaurus term "PUMPS", which is connected by a BT relationship with the term "EQUIPMENT", so these three vertices will be replaced by the single vertex "EQUIPMENT". The vertices "primary loop", "COOLANT LOOPS", "HEATING LOOPS", and "ENERGY SYSTEMS" will be replaced by the single vertex "ENERGY SYSTEMS". The vertices "runaway", "various accidents", and "blackout" have corresponding thesaurus terms connected by BT-type relationships with the term "ACCIDENTS"; therefore, all these vertices will be replaced by the single vertex "ACCIDENTS". The "ENERGY TRANSFER" vertex will replace the "natural circulation mode" vertex. Finally, the "coolant circulation creation" and "coolant circulation ensuring" vertices will be replaced by the single vertex "COOLANTS". Additionally, the graph will include the relationships "HEAT EXTRACTION" – RT – "EQUIPMENT" and "HEAT EXTRACTION" – RT – "ENERGY TRANSFER". The result of the generalization operation is shown in Fig. 4.
Fig. 4. The text graph after the generalization operation. Thesaurus terms are given in upper case (except for "MCPU").
After the generalization, it can be seen that the text refers to equipment associated with coolants in energy systems, which realizes heat extraction from reactor components, with the additional function of a smooth transition to energy transfer in an accident.
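A minimal sketch of how such thesaurus-based generalization could be implemented is given below (our own illustration, assuming the networkx library; the thesaurus fragment is a hypothetical excerpt modeled on the chains above, not the INIS data itself):

```python
import networkx as nx

# Hypothetical thesaurus fragment: term -> broader term (BT),
# and non-preferred term -> preferred term (USE).
BT = {"REACTOR CORES": "REACTOR COMPONENTS",
      "EXCURSIONS": "ACCIDENTS",
      "PUMPS": "EQUIPMENT",
      "STATION BLACKOUT": "ACCIDENTS"}
USE = {"RUNAWAY (REACTOR ACCIDENT)": "EXCURSIONS"}
# Mapping from text entities to descriptors (the "context-thesaurus" step).
DESCRIPTOR = {"reactor core": "REACTOR CORES",
              "runaway": "RUNAWAY (REACTOR ACCIDENT)",
              "main circulation pumping unit": "PUMPS",
              "blackout": "STATION BLACKOUT"}

def generalize(term):
    """Map a text entity to a descriptor, follow USE, then climb the BT
    hierarchy to the top generic descriptor."""
    t = DESCRIPTOR.get(term, term)
    t = USE.get(t, t)
    while t in BT:
        t = BT[t]
    return t

def generalize_graph(g):
    # relabel_nodes merges vertices mapping to the same generic descriptor,
    # preserving the original arcs between the merged groups.
    return nx.relabel_nodes(g, {v: generalize(v) for v in g}, copy=True)

G = nx.DiGraph()
G.add_edge("main circulation pumping unit", "reactor core", rel="heat extraction [from]")
print(list(generalize_graph(G).edges()))  # [('EQUIPMENT', 'REACTOR COMPONENTS')]
```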
4 Conclusion

The above examples show the principal possibility of implementing semantic scaling operations. It should be noted, however, that generalization (like other intellective operations) cannot be performed as a separate action on a separate concept. A concept (more precisely, its specific content, its meaning) depends on the surrounding concepts (as context) and, in turn, determines the meaning of others, thereby creating a semantic integrity. Similarly, the identification of relationships, which reflect the interrelations of reality and changes of properties, depends on context. That is, first it is necessary to reveal and identify the base essence of the text content, and then to determine the significance of each entity and the class of each relationship (from the point of view of its role in the scaled image being formed). Based on a functional-cybernetic model, the essence may be represented: 1) by a predominant class (area) of properties presented in the text; and 2) by a predominant aspect orientation of the text and/or by the nature of the goal effect. Accordingly, in knowledge graph scaling operations, all concepts and relationships should be analyzed and evaluated together, not in a separate aspect. The scaling procedure should be based on both of the above approaches, with the use of declarative means, i.e. thesauri of concepts and relationships, and also a top-level ontology.
References

1. Einstein, A.: Sobranie Nauchnykh Trudov [The Collection of Scientific Works]. In: Tamm, I.E., Smorodinskii, Y.A., Kuznetsov, B.G. (eds.), vol. 4. Nauka, Moscow (1967). (in Russian)
2. Miller, G.A.: The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol. Rev. 63(2), 81–97 (1956)
3. Maturana, H.: Biology of language: the epistemology of reality. In: Miller, G., Lenneberg, E. (eds.) Psychology and Biology of Language and Thought, pp. 28–62. Academic Press, New York (1978)
4. Maksimov, N.V., Golitsina, O.L., Monankov, K.V., Lebedev, A.A., Bal, N.A., Kyurcheva, S.G.: Semantic search tools based on ontological representations of documentary information. Autom. Documentation Math. Linguist. 53(4), 167–178 (2019). https://doi.org/10.3103/S0005105519040046
5. Maksimov, N.V., Golitsina, O.L., Monankov, K.V., Gavrilkina, A.S.: Methods of visual graph-analytical presentation and retrieval of scientific and technical texts. Sci. Visual. 13(1), 138–161 (2021)
6. Maksimov, N.V., Gavrilkina, A.S., Andronova, V.V., Tazieva, I.A.: Systematization and identification of semantic relations in ontologies for scientific and technical subject areas. Autom. Documentation Math. Linguist. 52(6), 306–317 (2018)
7. Maksimov, N.V., Golitsina, O.L., Monankov, K.V., Gavrilkina, A.S.: Opytnyj obrazec servisa vizual'nogo ontologicheskogo analiza nauchno-tekhnicheskih tekstov [The prototype of the service of visual ontological analysis of scientific and technical texts]. Certificate of official registration of computer program no. 2021610648 (2021). (in Russian)
8. INIS Repository Search. https://inis.iaea.org/search/. Accessed 25 Oct 2021
Knowledge Graphs in Text Information Retrieval

Nikolay Maksimov, Olga Golitsyna, and Alexander Lebedev(B)

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe shosse, 31, Moscow, Russia
[email protected]
Abstract. The article discusses the interactive use of graph forms of ontological representations of texts in tasks of information support, by means of documentary information retrieval systems, for one of the most complex types of human activity: scientific research, the process of deriving new scientific knowledge, in which new facts are established and generalized. Cognitive-like search tools over full texts based on knowledge graphs are discussed. Examples of graph search using path-finding technologies and analysis of the neighborhood of an entity or property are given.

Keywords: Knowledge graphs · Ontologies · Semantic information retrieval
1 Introduction

The use of automated documentary information retrieval (IR) systems acts as an activity that substitutes for the main (target, useful, practical) human activity under modern digitalization conditions. Thus, the task of synthesizing new knowledge can be presented as the task of forming a solution image as a result of (and by) constructing a new text from fragments of the texts of relevant documents. Such a text (in the form of abstracts, explanatory notes, scientific articles, etc.) represents an image of the solution of the main activity problem.

Graph-based knowledge representations (such as conceptual graphs and knowledge graphs) are actively used in various fields as a convenient means of displaying relationships between objects of different nature. Appearing in classical works, first of all Ch. Peirce's "diagrammatic reasoning" for existential graphs [1], as well as [2, 3], they were given new life and popularized by Google, whereupon they became frequent in works on natural language processing, the Semantic Web, and logic [4–7]. There is also a tendency to move from the traditional verbal form of a request to a graphical one in technologies of interaction with a search engine. For example, [8] suggests using semantic networks for query extension: query terms are compared with the terms of a semantic network, from which concepts are selected to form additional search conditions. [9] provides a visual representation of the query environment space, allowing the user to interpret the relationships between query terms and terms from their environment. This enables the user to visually evaluate the information and interactively refine the request by choosing those terms that accurately reflect his information need.
It becomes possible to switch to semantic IR (in the limit, cognitive IR, in some points similar to [7, 10]) by combining graphical knowledge representation with information retrieval tools on graphs, when the objects of search are not only documents but also their semantic context. Knowledge as an object of intellectual human activity is adequately represented by ontological tools, since, according to the logical-semiotic form [11, 12], they reflect not only the immanent and situational links of the subject area (SbA), but also the relationships between the concepts of the cognition tool and between language terms. In this sense, ontologies can be a "polygon with contour maps" where the user implements a trajectory of both informational and subject search: the expediency of using key concepts is understood through the visualized contexts of these concepts and the construction of paths between them. At the same time, interactive visualization of the semantic graph of a document ontology1 allows the graph to be used as a navigation tool for the document material, since it makes it possible to operate with context-specific subgraphs and to move from graph vertices to text fragments. In the limit, an ontology graph formed from texts in natural language can serve as a tool for constructing a number of images (alternatives and additions), which together make it possible to solve the user's pragmatic problem. Note that such a representation (the combination of subject, conceptual and sign components, which actually constitute the knowledge base) makes the semantic network a knowledge graph.

The article briefly presents models and tools for constructing semantic search images of full texts, providing the possibility of formal conceptual analysis and the synthesis of graph structures for searching texts for functionally oriented dependences of entities (as deep semantic IR, but not as AI reasoning).
2 Semantic Image Model

Just like a meaningful text, an ontology graph constructed from this text can be considered as a set of facts expressing some meaning in aggregate. The following types of information components can be distinguished:

– an elementary fact: an image that fixes a certain state of a particular interaction of a pair of entities, where a concept, an object, a subject, etc. acts as an entity, and the interaction is represented by a connection (relationship);
– a situational fact: an elementary fact in which both entities (or one of them) are additionally determined by the circumstances of the entity's participation in the interaction, a specific situation; thus, a new named entity is formed, which includes a set of elementary facts;
– a completed fact (statement, observation, description): a network of elementary and/or situational facts forming an integrity, correlated with an information request and thus forming a meaning.
1 An ontology representation in graph form (datalogical form) is a data model of a functional system [11], represented by a labeled directed graph that has the multigraph property and on which a metagraph and a hypergraph can be dynamically formed.
In this case, an elementary fact corresponds to a triplet "entity–relationship–entity" in the ontology graph, and a situational fact to a triplet in which one or both entities are represented by a set of elementary triplets that make up the semantic neighborhood of the atomic entity. A completed fact is represented in the ontology graph by a certain integral construction of triplets, reconstructing the intentions of the source text's creator on the one hand, and corresponding to the context of the problem being solved by the user on the other.

In an ontology functional system, besides the set of entities and the set of functional relationships, the presence of a set of characteristic properties and a composition law makes it possible to group entities not only dynamically, for example, according to the principle of correspondence to a chain of synthesized facts, but also statically, for example, according to the principle of possessing a common property, according to lexicographic inclusion, etc.

The formation of an ontology as a network-organized search image reduces to presenting the document text in the form of a set of elementary facts. At this stage, in addition to entity names, situational relationships are formed, which can be typed in accordance with the relations taxonomy proposed in [12]. Analysis of entity names makes it possible to additionally form structural and linguistic relations, based on the recognition of abbreviations and measurement units, the division of long phrases according to the rules of the natural language, etc.
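As an illustration of this model, the fact types can be sketched as data structures (our own sketch; the type and field names are hypothetical and do not reproduce the internals of the authors' system):

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Entity:
    """An atomic entity: a concept, object, subject, etc."""
    name: str

@dataclass(frozen=True)
class SituationalEntity:
    """An entity additionally determined by the circumstances of its
    participation in an interaction: a named entity wrapping a set of
    elementary facts."""
    name: str
    context: Tuple["ElementaryFact", ...]

@dataclass(frozen=True)
class ElementaryFact:
    """A triplet 'entity - relationship - entity'."""
    subject: Union[Entity, SituationalEntity]
    relation: str  # typed according to a relationships taxonomy
    obj: Union[Entity, SituationalEntity]

# A completed fact is then a connected network (set) of such triplets:
fact = ElementaryFact(Entity("steam pressure"), "locativity [in]",
                      Entity("steam generator"))
completed_fact = {fact}
```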
3 Search on Knowledge Graphs

The semantic IR problem can be reduced to an iterative, sequential solution of two problems: the classical information retrieval problem and the in-depth analysis of the found documents, with the document ontology graph used as an interactive tool for navigating through text and concepts. Visual analysis of a document ontology graph allows one to find paths or connected components that are relevant to the user's information needs. The transition from graph vertices to fragments of the source text, and the combination of such fragments, allows new knowledge to be discovered and existing knowledge to be checked for consistency.

Information retrieval tasks can be divided into two classes: the task of finding a problem solution and tasks of an information-analytical nature. Problems of the first type involve a search for a solution that can be represented by a process, that is, a directed sequence of events and actions on objects. This predetermines the need to present each result in a form reflecting the direction of the solution (from the starting point to the "answer") and, ideally, to represent the algorithm of this solution. For this kind of task, the path-finding metaphor is suitable, which involves building a sequence of items corresponding to objects, events, actions, and the concepts expressing them (chains of elementary facts), from base concepts to concepts in the context of a potential solution of the main activity problem. For tasks of an information-analytical nature, the neighborhood-finding metaphor can be used, which involves visualizing the contexts of basic concepts. Grouping around mainstay concepts allows the user to deepen the research topic.
4 Examples2

4.1 Path Finding on a Knowledge Graph

Let us illustrate the application of the developed models and methods, implemented within the xIRBIS framework3, on an SbA example corresponding to the problem of incomplete demand for the capacities of the Baltic NPP under construction. As the result of an information search in an information resource, documents were found whose relevant fragments were combined4. An ontology graph was constructed from this text; it is not presented here due to its large size. The fragment of the ontology graph containing the "Power" vertex is shown in Fig. 1. Let us build the shortest path between the vertices "Pressure" and "Power" (see Fig. 1; the path is highlighted by a bold line).
Fig. 1. A fragment of the ontology graph for the text on the problem of incomplete demand for the capacities of the Baltic NPP under construction, and the shortest path (highlighted by a bold line) between the vertices "Pressure" and "Power".
The path (see Fig. 1) contains the elementary fact "steam pressure" – "to be a goal (assignment) [for] dependency [result] changing [to increase]" – "decreasing Baltic NPP power unit electric capacity". Let us consider further the elementary fact "steam pressure" – "locativity [in]" – "steam generator". From these two elementary facts it follows that the parameters – generator steam pressure and power unit electric capacity – are related. "Reading" the graph allows us to conclude that an increase of the steam pressure in the steam generator will lead to a decrease of the steam generator temperature head, as a result of which the average coolant temperature will increase, which will lead to the formation of excess reactivity. The absorption of the excess reactivity will lead to a decrease of the electric capacity of the Baltic NPP power unit. Thus, a connection has been established between the parameters of steam pressure in the steam generator and power unit electric capacity, and the nature of the connection has been established: an increase in pressure will lead to a decrease in capacity.

4.2 Neighborhood Analysis

Let us consider the "neighborhood analysis" search scheme using the example of tracing requirements. Figure 2 shows one of the sub-graphs formed as the result of applying the projection operation to the text containing the technical requirements for the MCPU, which allows the following dependencies to be traced: "MCPU" – "retains" – "Operability" – "under conditions" – "Joined action of operational and seismic loads"; "MCPU" – "retains" – "structural strength" – "limitation [up to]" – "design earthquake". Thus, according to the constructed graph, it can be concluded that there are conditions for MCPU operability under seismic loads.

2 The examples are built for Russian-language texts from http://atomenergoprom.ru and https://tesiaes.ru//. The texts and figures contain the corresponding translations in the article.
3 Maksimov, N., Golitsyna, O., Monankov, K., Gavrilkina, A.: Documentary information and analytical system xIRBIS (revision 6.0): computer program [in Russian]. Certificate of state registration No. 2020661683 dated 09/29/2020.
4 An alternative way is to build graphs for each document and then take the union of the graphs.
Fig. 2. Example of condition highlighting by the aspect projection operation. Bold type in the translation denotes vertices, italics denote relations.
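The neighborhood scheme can be sketched in the same spirit. The following hypothetical snippet is not the xIRBIS implementation (there this role is played by the aspect projection operation); it simply collects the concepts within a given distance of a mainstay vertex, with the graph encoding the dependencies read off Fig. 2.

```python
from collections import deque

def neighborhood(graph, center, radius=2):
    """Vertices reachable from center in at most radius steps (BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue  # do not expand beyond the requested radius
        for u in graph.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return set(dist)

# Dependencies read off Fig. 2 (illustrative encoding).
requirements = {
    "MCPU": ["operability", "structural strength"],
    "operability": ["joint action of operational and seismic loads"],
    "structural strength": ["design earthquake"],
}
print(neighborhood(requirements, "MCPU", radius=1))
```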
A sub-graph for permissible values of seismic loads is shown in Fig. 3.
Based on the constructed graph, the user can draw a conclusion about the admissibility of a maximum horizontal acceleration on the free soil surface equal to 0.388 g.
Fig. 3. Example of highlighting requirements with numerical values by the aspect projection operation. Bold type denotes vertices, italics denote relations.
Transition to the source text fragments showed that these conclusions do not contradict the document content.
5 Conclusion

The user's ultimate goal is to build an image of the problem solution, or rather a number of images, which will ultimately make it possible to form a tool/method/technology for actually solving a pragmatic problem. In the general case, such an image can be presented as an algorithm for solving the main activity problem, where individual components (data/functions) or blocks taken from the received information and/or the user's knowledge are linked into a logical chain. This is fundamentally different from the classical search concept, which presupposes the formation of a final “whole” result in response to a semantically complete query. That is, we need to arrive at a technological and semantic integration of the main and informational activities in an interactive “quantization” mode, when requests relate to a minimal amount of information that is nevertheless significant for decision making, and the answers (document fragments, as semantic “building blocks” of the pragmatic problem solution) fill the SbA “fractally”.

We have developed a full-text IR technology which, in addition to the stages of classical information retrieval, also includes cognitive search based on the construction, visualization and analysis/transformation of document ontology graphs. Search on graphs is reduced to schemes that include the search for a chain of facts and the search for the neighborhood of an elementary (situational) fact. In general, this makes it possible to
increase the efficiency of the interactive, iterative process of building new knowledge by combining the search, analysis and synthesis of information, including through a purposeful and controlled reduction of the dimension of the operating concept space. Thus, the “fuzziness” of the linguistic variables (words and phrases) of the language and of the possible meanings of the text is removed, on the one hand, by using the ontologies of the language and of the SbA, and on the other hand, by the user's interactive selection of the relevant branch/subgraph of the knowledge graph.

Acknowledgements. This work was supported by the Ministry of Science and Higher Education of the Russian Federation (state assignment project No. 0723-2020-0036).
Digitalization of the Economy and Advanced Planning Technologies as a Way to Overcome the Economic Recession

Yulia M. Medvedeva1, Rafael E. Abdulov1,2(B), Daler B. Dzhabborov3, and Oleg O. Komolov4,5
1 National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Shosse 31, Moscow 115409, Russia
2 National University of Science and Technology (MISiS), Leninskiy Prospekt 4, Moscow 119049, Russia
3 Institute of Economics, Russian Academy of Sciences, Nakhimovskiy Prospekt 32, Moscow 117218, Russia
4 Financial University Under the Government of the Russian Federation, Leningradsky Prospekt 49, Moscow 125993, Russia
5 Plekhanov Russian University of Economics, Stremyanny Lane 36, Moscow 117997, Russia
Abstract. Over the past ten years, modern digital technologies have been actively introduced into the planning systems of large transnational corporations. First of all, this concerns the analysis of large amounts of data and the tools of machine learning and deep learning. Advanced digital technologies make it possible to respond flexibly to changes in consumer preferences, and therefore to adjust to demand and control the supply of goods. These technologies can also be used to improve state economic activity, in particular in planning. The role of planning is likely to increase, primarily in overcoming the protracted recession in the global economy that has lasted for the past decades, as well as in eliminating the imbalances caused by the COVID-19 pandemic.

Keywords: Digital technologies · Big data · Planned economy · Central planning
1 Introduction

In the modern economic mainstream, it is generally accepted that the planned economic system lost the competition to the market economy because of its inefficiency. In many ways, the problems of the planned economy are associated with the so-called calculation argument put forward by L. Mises [1], which asserts the impossibility, in contrast to the market mechanism of “supply and demand”, of correctly assessing the need for the production and distribution of a particular product or service.

Indeed, the problem of the complexity of calculating the required amount of goods for production exists in central planning. A colossal number of factors must be taken
into account in order to understand what, how and for whom to produce, and how to distribute it. Nevertheless, in the Soviet Union, with all these problems and given the complexity of the calculations in the absence of computers and modern technology, the deficit invariably imputed to the planned system was not a constant and widespread phenomenon. In addition, attempts were made to introduce information technologies to improve planning; for example, there was a project for a nationwide automated system for recording and processing information (OGAS), led by V.M. Glushkov. It is also worth mentioning the concept of decentralized, or democratic, planning, as opposed to centralized state planning, which resolves some of the issues mentioned above [2, pp. 105–111]. In particular, this type of planning takes into account the wishes of the people themselves and of enterprises “from below”, and it simplifies the task of determining the quantity and distribution of manufactured products. At the same time, even in this field problems may arise, which some associate with the impossibility of taking so many factors into account. As it turns out, planning of this kind works and successfully resolves many similar issues today, for example in Enterprise Resource Planning (ERP) systems [3]. Such issues are, as a rule, resolved in large companies, and this is done quite successfully.
2 Business Planning

High technologies are actively contributing to the resolution of these issues. With the development of information and communication technologies, it became possible to collect and store new data in volumes that could not be imagined before, which makes it possible to forecast much more efficiently. This type of data is called “big data”. It is characterized by large volume, varied structure, high processing and storage speed, and new opportunities for use in business [4]. It should be noted that today big data is used by all large companies, not to mention transnational corporations.

The implementation and analysis of big data solves a significant number of problems that state planning faced in the past; these are highlighted by B. Wang and X. Li:

1) Big data can reveal and use hidden information not visible to the human eye in order to plan the production and distribution of goods more efficiently.
2) Big data makes it possible to use nowcasting [5], a method of macroeconomic analysis that involves modern technologies for forecasting and assessing the recent past, the present and the near future. This is especially applicable for determining current macroeconomic parameters, which was impossible in the past, since macroeconomic parameters were determined ex post facto (e.g., GDP for the previous year, not for the current one); see, e.g., [6].
3) Big data takes into account individual supply and demand, which even the market does not fully do, since the market imposes the consumption of only those goods produced by disparate manufacturers. When planning according to individual preferences, one can determine what an individual needs even at the production stage.
4) Big data leads to a change of business processes from a hierarchical structure to a network structure, and even now large enterprises use, in many respects, elements of democratic, decentralized planning [7, p. 146].
Big data are analyzed using machine learning, deep learning [8], neural networks, etc. While the former consists of pretrained algorithms based on technologies that have existed for quite a long time (since the second half of the 20th century), although their efficiency has increased significantly over the past 10–20 years, deep learning, despite its theoretical development in approximately the same period, came into full use only at the beginning of the 21st century, since it requires large amounts of data and high computing capacity. The peculiarity of deep learning is that a person leaves it to the computer to find patterns in the data. As a rule, the training of such a model is based on a training sample (data with a predetermined result), from which an artificial neural network (a mathematical model built on the principle of biological neural networks) learns and finds dependencies that a person cannot find. Such dependencies are usually even difficult to interpret, but at the same time the quality of the forecasts becomes higher.

The use of such technologies makes it possible to track individual preferences and to forecast consumption. Amazon studies its customers in detail and has a huge amount of very different information about them [9]. It is no coincidence that companies from this field have been the most effective in recent years. Amazon's revenue and profits are growing rapidly. The share of Internet commerce in total retail volume is growing [21]. At the same time, marketplaces account for almost 50% of online commerce. For example, in the United States in 2019, marketplaces accounted for 47% of online retail trade, while supermarket and other sellers' websites accounted for 26%, and branded online stores for 18% [20].

Today, companies operating outside the Internet also use digital technologies. For example, this is described in “The People's Republic of Walmart” [10], which argues that Walmart, the largest American retailer, like many TNCs, uses big data tools to analyze its customers and plan its activities in a way similar to state planning, and thus provides a basis for the socialist economy of the future. Some media also agree with this; in particular, the Financial Times writes that a large amount of data and the ability to process it make it possible to improve the tools for centralized planning and reduce the information imbalance between the market and the plan [11]. The owner of one of the largest online trading platforms, AliExpress, is in solidarity with this position, saying that with artificial intelligence the understanding of many processes can reach a completely new level, and that big data can improve the market and allow us to approach or even reach a planned economy [12].

In fact, the applications of these tools are wide and varied. Here are a few examples. Elo, one of the largest bank card companies in Brazil with over 1 million transactions per day, has implemented a system to anticipate customers' short-term preferences based on their current location, the weather, traffic, social media and previous purchases [13]. Many other companies have achieved similar results (see, for example, [14]). Among Russian companies, one can cite the example of Beeline, which was able to optimize service offerings for its customers. Using big data analytics tools such as machine learning and deep learning, the company categorized subscribers in terms of the services to provide to them, whether they are at the airport, and how many devices they use. This analysis
made it possible to predict what services a customer needs at a given time, which in turn led to an increase in service consumption, a growth in revenue and greater convenience for subscribers [15].

Naturally, the use of similar tools is typical for almost any industry, including real production, education and healthcare. For example, it becomes possible to approach patients individually based on their medical history and the statistical base of such histories, and to optimize the planning of medical expenses [15]. In retail, these tools are used to determine consumer preferences, which is directly related to the planned economy. Such examples already exist among Russian retailers such as Perekrestok. The demand and preferences of buyers in a particular region and the availability of goods there were analyzed. Predicting future purchases, the company delivers goods in advance to increase its revenue, while keeping in mind that the goods should not be left sitting on the shelves [16]. There are also a significant number of examples of the use of this kind of technology in real production; for example, Nestlé, “… using sales data for previous periods and optimization algorithms, automatically determines the demand for materials and forms logistics supply chains” [17]. Domestic companies engaged in real production also use big data in the production process [17]. Along with industry and services, big data is also used in agriculture. The American agricultural company FarmLogs, using publicly available satellite data about upcoming weather conditions, soil conditions, precipitation, solar activity, etc., was able to introduce automatic seeding of various crops into its production. Thanks to automatic analysis of the past and current state of these crops, a forecast is developed, “detailed recommendations for farming are formed, and all necessary calculations, including financial ones, are made automatically” [15].
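As a toy illustration of the forecasting-for-planning idea running through these examples, the snippet below forecasts next-period demand per product from past sales and turns the forecasts into delivery quantities. The data and the choice of simple exponential smoothing are illustrative assumptions; the companies cited use far richer models and data.

```python
def exponential_smoothing_forecast(sales, alpha=0.5):
    """One-step-ahead demand forecast by simple exponential smoothing."""
    level = sales[0]
    for observation in sales[1:]:
        level = alpha * observation + (1 - alpha) * level
    return level

# Invented weekly sales histories per product.
weekly_sales = {"bread": [120, 130, 125, 140], "milk": [80, 85, 90, 95]}

# Planned delivery quantities: goods are shipped in advance of demand.
plan = {item: round(exponential_smoothing_forecast(history))
        for item, history in weekly_sales.items()}
print(plan)
```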
3 Principles of Decentralized Planning

After analyzing the above examples, we can conclude that modern technologies are indeed capable of improving both state centralized and decentralized democratic planning. One of the main problems the State Planning Committee of the USSR faced was insufficiently developed technology, which did not allow taking into account all the factors needed to correctly predict the preferences of the population. Today this problem has been eliminated, which suggests that the planned model is returning to the economy and can compete with the uncontrolled market, and that elements of the plan at the state level can and should be introduced now. Thus, modern technologies make it possible to:

1) make more accurate forecasts;
2) receive a colossal amount of information;
3) take into account individual preferences;
4) reduce the cost of information processing;
5) make planning an effective tool for economic management.

As noted by prof. A.V. Buzgalin and A.I. Kolganov, in some industries, as a first step, even within the framework of capitalism it is advisable to introduce such a “system of relations and institutions, which we will call selective planning. Its systemic quality is the determination by society and approval by the state for a certain period of clearly fixed goals and basic “rules of the game” in the field of indirect (for the private sector) and direct (for the public sector) regulation of the part of the national economy that is subject to public regulation” [18].
4 Conclusion

The current stage of world economic development is characterized by deglobalization and the recession that has been going on since 2008–09. In recent years, deglobalization has manifested itself in the form of stagnating growth of foreign direct investment and international credit, as well as growing protectionist sentiment, trade wars and the introduction of mutual sanctions. As a result of these events, many previously successful global value chains began to be disrupted. The global recession was not overcome, as the world entered a new economic crisis in 2020, intensified moreover by the COVID-19 pandemic [22].

Phases of deglobalization have historically been followed by phases of globalization. At the same time, long-term and medium-term cycles can be distinguished. The former correlate with the systemic cycles of capital accumulation described by G. Arrighi, who singled out phases of material and financial expansion in each such long cycle. Globalization correlates with financial expansion, and deglobalization with material expansion. The most recent period of financial expansion began in the 1970s. It was characterized by a sharp increase in financialization, that is, the replacement of industrial capital by financial capital, the overaccumulation of capital in the real sector, a sharp increase in social differentiation and an increase in poverty, along with an increase in labor productivity around the world [23]. Medium-term cycles of globalization are associated with long waves in the economy and technological paradigms (TP). At the dawn of a new TP, international relations can stagnate, while at the stage of TP expansion the growth of international trade, capital movement, etc. manifests itself.

Thus, today the world economy is on the eve of a new systemic cycle of capital accumulation and a new technological paradigm based on the so-called NBICS technologies. The modern digitalization of the economy is already profoundly transforming our lives, and truly revolutionary technologies are beginning to be mastered and introduced around us today. In these conditions, planning is acquiring a crucial role for the sustainable and crisis-free development of the economy [24]. In the future, thanks to the development of modern technologies, elements of decentralized planning will spread everywhere, owing to their high efficiency in many segments of economic and other human activity, as we can see from the example of the modern world.
References

1. Von Mises, L.: Economic Calculation in the Socialist Commonwealth. Lulu Press (2016)
2. Kotz, D.M.: What economic structure for socialism? Soc. Sci. China (internal publication) 10(4), 105–111 (2008)
3. Gartner IT Glossary: ERP. https://www.gartner.com/it-glossary/enterprise-resource-planning-erp/
4. Stallings, W.: Foundations of Modern Networking: SDN, NFV, QoE, IoT, and Cloud. Addison-Wesley Professional (2015)
5. Giannone, D., Reichlin, L., Small, D.: Nowcasting: the real-time informational content of macroeconomic data. J. Monet. Econ. 55(4), 665–676 (2008)
6. Now-Casting.com. https://www.now-casting.com/home
7. Wang, B., Li, X.: Big data, platform economy and market competition: a preliminary construction of plan-oriented market economy system in the information era. World Rev. Polit. Econ. 8(2), 138–161 (2017)
8. Deng, L., et al.: Deep learning: methods and applications. Found. Trends Sign. Process. 7(3–4), 197–387 (2014)
9. Durand, C., Keucheyan, R.: Economic planning is back. https://www.opendemocracy.net/en/oureconomy/economic-planning-back/
10. Phillips, L., Rozworski, M.: The People's Republic of Wal-Mart: How the World's Biggest Corporations Are Laying the Foundation for Socialism. Verso (2019)
11. Thornhill, J.: The Big Data revolution can revive the planned economy. Financial Times (2017)
12. Can big data help to resurrect the planned economy. http://www.globaltimes.cn/content/1051715.shtml
13. Cloudera.com. https://www.cloudera.com/about/customers/cartao-elo.html
14. Mandiri, Transamerica. https://www.cloudera.com/about/customers/bank-mandiri.html, https://www.cloudera.com/about/customers/transamerica.html
15. Big data. https://habr.com/ru/company/newprolab/blog/318208/
16. PredTech analyzed the behavior and lifestyle of the customers of Perekrestok. https://www.tadviser.ru/index.php/Ppoekt:Pepekpectok,_topgovy_dom_(Ppoekty_IT-aytcopcinga)
17. Cases of using Big Data technologies in production. https://habr.com/ru/company/newprolab/blog/325550/
18. Buzgalin, A.V., Kolganov, A.I.: Planning in the economy of the XXI century: what and for what? Terra Economicus 15(1) (2017)
19. Glushkov, V.M.: Fundamentals of Paperless Informatics. Nauka, Moscow (1987)
20. Salesforce.com. https://www.salesforce.com/resources/research-reports/?sfdc-redirect=404#!page=1
21. Visionmonday.com. https://www.visionmonday.com/eyecare/coronavirus-briefing/the-latest-covid19-data/article/worldwide-ecommerce-is-on-the-rise-despite-retail-downturn/
22. Komolov, O.O.: Deglobalization and the “Great Stagnation”. Int. Crit. Thought 10(3), 424–439 (2020)
23. Abdulov, R., Jabborov, D., Komolov, O., Maslov, G., Stepanova, T.: Deglobalization: the crisis of neoliberalism and the movement towards a new world order. https://doi.org/10.13140/RG.2.2.28808.14087, https://www.researchgate.net/publication/350878182_DEGLOBALIZACIA_KRIZIS_NEOLIBERALIZMA_I_DVIZENIE_K_NOVOMU_MIROPORADKU
24. Abdulov, R.E.: Artificial intelligence as an important factor of sustainable and crisis-free economic growth. In: Postproceedings of the 10th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2019 (Tenth Annual Meeting of the BICA Society), August 15–19, 2019, Seattle, WA, USA, pp. 468–472 (2019)
Genetic-Memetic Relational Approach for Scheduling Problems

Sergey Yu. Misyurin1,2 and Andrey P. Nelyubin2(B)

1 National Research Nuclear University MEPhI, 31 Kashirskoe Shosse, Moscow, Russia
2 Mechanical Engineering Research Institute RAS, 4 Malyi Kharitonievski Pereulok, Moscow, Russia
Abstract. A general approach to the construction and optimization of schedules is proposed, based on the representation of schedules in the form of a set of binary relations. The description of a schedule in the language of relations is natural and reflects its essential characteristics. It allows us to formalize many flexible constraints involving the different priorities, requests and wishes of the schedule participants. It also brings us closer to solving the problem of schedule recognition that arises in the process of regular rescheduling. To optimize the schedules, we use a hybrid algorithm scheme that includes genetic, memetic and greedy algorithms, and heuristic rules. The proposed relations are used to encode the key scheduling features within the genetic and memetic routine. The relational approach allows us to derive new information while maintaining consistency. As an example of the application of the approach, we formulate the problem of distributing objects among a group of autonomous mobile robots during emergency rescue or exploration work.

Keywords: Scheduling algorithms · Rescheduling · Genetic algorithm · Memetic algorithm · Binary relations · Robotic systems
1 Introduction

Scheduling problems are found in many areas: in production planning, in the delivery of goods, in curriculum development, in distributed computing. When carrying out complex work with robotic systems, the schedule should be built automatically. Constructing a feasible schedule that takes into account many constraints is a complex problem requiring computational resources. Since a schedule usually affects many stakeholders, in addition to “hard” constraints (on resources, on technology) there are many “flexible” constraints involving the priorities, requests and wishes of different participants. Such constraints are not always easy to formalize.

The formulation of the schedule optimization problem is further complicated by the fact that there is no universal characteristic of which schedule is “good” and which is “bad”. A number of criteria have been proposed for this [1], among which one can choose the most suitable one for a specific problem, or choose a combination of these criteria and solve a multicriteria optimization problem [2]. In any case, at the
stage of formulating the schedule optimization problem, or in the process of solving it, it will be necessary to take preferences and priorities into account and find a compromise solution.

The next problem is rebuilding the schedule, or briefly, rescheduling. We can obtain and understand a lot of information by looking at the schedule built at the beginning: the amount of work, the estimated time of its completion, the capacity and resource requirements, etc. However, following a fixed schedule usually fails. Regularly “something goes wrong”: equipment breaks down, the supply of resources is late, orders are canceled… In such cases, instead of trying to return to the original schedule, it may be more expedient to rebuild the schedule. In the rescheduling problem an additional requirement (request) arises: the new schedule should not differ much from the previous one. And this concerns not just some summary characteristics, but the entire schedule as a whole. Formally, such a requirement presupposes the development of a measure of the similarity of two schedules. Such a problem can be attributed to the field of pattern recognition.

In this paper, we propose an approach to formalizing schedules aimed at solving the problems listed above. It consists in constructing various binary relations on the sets of objects of the scheduling problem. We use these relations in scheduling and rescheduling problems, as well as in schedule optimization algorithms using the ideas of genetic [3] and memetic [4, 5] computations.
2 Relational Representation of Schedules

Let us start describing the approach with examples of relations that can be used in a fairly wide range of planning and scheduling problems.

2.1 Assignment Relations

Introduce the following relations on the sets of machines (performers) M and jobs J:

α → A – assignment feasibility relation: the machine α can do the job A.
α ⇒ A – assignment relation: the machine α is assigned to the job A.
α ⇏ A – assignment denial relation: the machine α must not be assigned to the job A.

In practice, not all assignments are feasible. For example, a worker may not be qualified to perform all the jobs, or a machine may be equipped to handle only certain items. Also, as a rule, not all feasible assignments are implemented in the schedule. In the general case, the non-strict inclusions hold: (⇒) ⊆ (→) ⊆ M × J.

2.2 Sequence Relations

Sequences can be described from different points of view, depending on the context of the problem being solved. If priorities and deadlines are important, then it is logical to use precedence relations:
A ≺ B – strict precedence relation: the job A is done before the job B.
A ∼ B – concurrency relation: the jobs A and B run at the same time.
A ⪯ B – non-strict precedence relation: the job A is done not later than the job B.

The relation ≺ is an order: it is transitive, irreflexive and asymmetric. The relation ∼ is an equivalence: it is transitive, reflexive and symmetric. The relation ⪯ is a quasi-order: it is transitive and reflexive. These relations are interconnected: ⪯ = ≺ ∪ ∼, and the relations ≺ and ∼ are the asymmetric and symmetric parts of ⪯, correspondingly.

Also, in the context of the problem being solved, there may be a different cost of transitions between jobs, for example, the distance between the points of delivery of goods, or the changeover of equipment between products of different types. The cost of switching between jobs can be either symmetric or asymmetric. For example, when switching from casting a dirty metal alloy to a clean one, flushing of the mixers is required. To describe schedules in such cases, it is useful to use connectivity relations:

A ∥ B – connectivity relation: the jobs A and B are performed directly one after the other, in either order.
A ▷ B – connected precedence relation: the job A is done just before the job B.

The relation ∥ is symmetric and ▷ is asymmetric. Both of these relations are irreflexive and non-transitive. However, it is possible to construct the transitive closure of the relation ∥, which is an equivalence with a clear meaning: the jobs A and B are performed in the same bundle. We can also define a relation that is transitive in ▷: the job A is executed just before the bundle of jobs that includes the job B.

2.3 Using Relations in Scheduling and Rescheduling

From a human point of view, scheduling is a complex process of making many different types of decisions that are linked to each other, where some decisions may depend on the results of previously made decisions. Such complex decision-making processes are represented by decision trees (directed graphs). The above relations correspond to such “atomic” decisions. Therefore, the description of a schedule in the language of relations is natural and reflects its essential characteristics.

This is what is meant when we talk about the similarity or proximity of two schedules, for example, in the rescheduling problem. The mathematical framework of binary relations allows us to formalize this problem. Various metrics can be introduced on the sets of relations, taking into account their intersections, interdependence, and the presence of contradictions. Then, to measure the similarity of two schedules, we can compare the sets of relations representing these schedules.

The priorities, requests and wishes for the schedule coming from the various participants in the process can also be formalized in the form of relations. Here are some examples:

• A priority job A must be placed before all others: A ≺ B, ∀B ∈ J.
• It is better to produce the product A with the new equipment α, if it is free: α → A.
• A teacher at the university wants his three classes to be scheduled side by side on the same day: A1 ∥ A2, A2 ∥ A3, A1 ∥ A3.
All such requests can be added to the hard constraints that determine the feasibility of the schedule. However, the requests of different participants often contradict each other, and it becomes impossible to build a schedule that satisfies all the constraints and requests. Another approach is to use penalty functions for violating such requests, but this transfers the problem of choosing priorities to the problem of choosing weighting coefficients for the penalty functions. The set of all requests can also be represented as a separate set of relations. Then the degree to which a particular schedule satisfies the requests can be measured by comparing the set of relations representing this schedule with the set of relations representing the requests.
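A minimal sketch of this representation is given below: a schedule is encoded as a set of “messages” (relation name plus object pair), and the similarity of two schedules, needed in the rescheduling problem, is measured by set overlap. Jaccard similarity is used here purely as one simple choice; the paper deliberately leaves the concrete metric open.

```python
def as_messages(precedence=(), assignment=(), connectivity=()):
    """Encode a schedule as a set of relational messages."""
    msgs = {("precedes", a, b) for a, b in precedence}
    msgs |= {("assigned", m, j) for m, j in assignment}
    # Connectivity is symmetric, so store each pair in canonical order.
    msgs |= {("connected",) + tuple(sorted(pair)) for pair in connectivity}
    return msgs

def similarity(schedule1, schedule2):
    """Jaccard similarity of two schedules given as message sets."""
    union = schedule1 | schedule2
    return len(schedule1 & schedule2) / len(union) if union else 1.0

old = as_messages(precedence=[("A", "B"), ("B", "C")], assignment=[("m1", "A")])
new = as_messages(precedence=[("A", "B")], assignment=[("m1", "A"), ("m2", "C")])
print(similarity(old, new))  # 0.5: two shared messages out of four distinct
```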
3 Schedule Optimization Algorithms

Various heuristic rules are widely used in automatic scheduling. The schedules obtained with their help are easier to justify. However, such schedules are often suboptimal, because not all factors are taken into account and many implicit opportunities are missed. Finding the best solutions requires the use of global optimization algorithms [6].

Rather complex models can be used to construct schedule alternatives, including nonlinear, discrete and mixed constraints. From the point of view of the optimization algorithm, such models represent a black box, whose input is the parameters of the schedule and whose output is a schedule for which the values of the characteristics can be estimated. For solving such optimization problems, genetic algorithms, as well as other variations of evolutionary algorithms, have proven themselves well. To optimize the schedules, we use a hybrid algorithm scheme that includes genetic, memetic, greedy algorithms, and heuristic rules.

When developing genetic algorithms, the key issue is the choice of a method for encoding solutions in the chromosome. Assignments are successfully encoded using a binary string, but for coding sequences there are several approaches, each with its own advantages and drawbacks [3]. In addition, with complete coding of the schedule in the chromosome, the key information is accompanied by a lot of “noise”. Instead, we write to the chromosome a set of messages in the form of relations. These messages must not be inconsistent, and other messages can be derived from them. For example, from the messages A ≺ B and B ≺ C, the message A ≺ C follows by the rule of transitivity. If, as a result of such transitive derivation, a cycle A ≺ A is obtained, then the set of messages is inconsistent and the corresponding chromosome should be discarded (a sketch of such a check is given below). The resulting relation may not be complete; the chromosome may even contain no information at all. The schedule corresponding to the chromosome is reconstructed using basic scheduling algorithms that rely on the messages from the chromosome. Any heuristic can be used in the basic algorithm, including greedy algorithms or constraint programming elements.
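A sketch of this consistency check, under the simplifying assumption that only strict-precedence messages are present, could look as follows: compute the transitive closure of the messages and reject the chromosome if a cycle A ≺ A appears.

```python
def is_consistent(precedence_messages):
    """precedence_messages: iterable of (A, B) pairs meaning A precedes B.
    Returns False if the transitive closure contains a cycle."""
    closure = set(precedence_messages)
    changed = True
    while changed:  # naive transitive closure; fine for small message sets
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return all(a != b for a, b in closure)  # a cycle yields some (A, A)

print(is_consistent([("A", "B"), ("B", "C")]))              # True; (A, C) is derived
print(is_consistent([("A", "B"), ("B", "C"), ("C", "A")]))  # False: a cycle
```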
Consider the following examples of the basic scheduling algorithm using information from the chromosome.

Example 1. Suppose it is required to distribute the production of orders for products across several machines during a month. For each order, the quantity, type and price of the items are known. For each machine, it is known what types of products it can produce (the relation → is given) and with what productivity.

1. For each feasible assignment α → A, calculate its marginality as the price multiplied by the productivity.
2. Sort the assignments α ⇒ A available on the chromosome by marginality.
3. Try to implement the assignments α ⇒ A one at a time in accordance with the sorting.
4. Sort the remaining feasible assignments α → A by marginality.
5. If there is no denial message α ⇏ A in the chromosome, then try to implement the assignments one at a time in accordance with the sorting.
:
1. Check whether there are unallocated items in the order A and whether the machine α has capacity in the form of free hours.
2. Calculate how many items of the order A can be produced on the machine α.
3. Subtract this number of items from the order A.
4. Subtract the required number of hours from the capacity of the machine α.

Example 2. Suppose it is required to build a sequence for the manufacture of items on one machine.

1. Sort the items according to their production deadlines.
2. Put the items into the production sequence one by one according to the sorting.
3. If item A is being considered and there is a message A ▷ B in the chromosome, then put the item B immediately after A.

The initial population of schedules can be obtained by adding one or several different messages to empty chromosomes. The crossover operator for two chromosomes consists in the probabilistic combination of the message sets of the two chromosomes. General rules can be formulated:

1. The combination of messages in the child chromosomes must be consistent.
2. If the same message is found in both parental chromosomes, then with a high probability it should appear in the child chromosomes as well.
3. If some message is absent from both parental chromosomes, then it can appear in the child chromosomes only as a result of derivation from other messages (for example, by the transitivity rule).

The chromosome mutation operator can be the random addition or deletion of individual messages while maintaining consistency.

A memetic algorithm is also used for local search. Its idea is to isolate the most common features (memes) among solutions and distribute them among other solutions. Thus, the optimization search is carried out within the population. In our approach, as
memes, it is natural to use individual messages in chromosomes, which are often found among the most successful schedules.
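The crossover rules above admit a compact sketch. The probabilities below are illustrative choices, not taken from the paper, and only strict-precedence messages are handled; an inconsistent child (one whose precedence graph contains a cycle) is rejected and resampled.

```python
import random

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    state = {}  # vertex -> "visiting" | "done"

    def visit(node):
        if state.get(node) == "done":
            return False
        if state.get(node) == "visiting":
            return True
        state[node] = "visiting"
        if any(visit(nxt) for nxt in graph.get(node, ())):
            return True
        state[node] = "done"
        return False

    return any(visit(n) for n in list(graph))

def crossover(parent1, parent2, p_both=0.95, p_one=0.5, max_tries=100):
    """Probabilistic combination of two message sets (rules 1-2 above)."""
    for _ in range(max_tries):
        child = {m for m in parent1 | parent2
                 if random.random() < (p_both if m in parent1 and m in parent2
                                       else p_one)}
        if not has_cycle(child):
            return child
    return set(parent1)  # fallback: no consistent child was sampled

p1 = {("A", "B"), ("B", "C")}
p2 = {("A", "B"), ("C", "A")}
print(crossover(p1, p2))
```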
4 Application in Scheduling Robotic Systems

The proposed approach can be applied to the automatic scheduling and rescheduling of robotic systems [7–11]. Groups of autonomous mobile robots are increasingly being used for rescue, exploration or research work in areas where human access is difficult or impossible. In complex problems, groups of heterogeneous robots are used, differing not only in their characteristics but also in their specialty [10]. Some robots look for objects, others transport resources or materials, and still others perform the basic work on the objects. The distribution of these jobs among the executing robots is a scheduling problem. Moreover, such a schedule should be regularly rebuilt taking into account incoming information: new objects are discovered, object priorities change. For groups of robots, there are strategies that automatically identify priorities and preferences in multicriteria problems in the form of relations [7, 9]. At the same time, only partial relations are sufficient to select the desired strategy and distribute the work (goals, objects) among the robots [8, 10]. In addition, by controlling the configuration of a group of heterogeneous robots, it is possible to perform online optimization [7–9] and adaptation [11], involving the ideas of genetic and memetic algorithms.
5 Conclusion

The article describes the general ideas of the proposed approach and touches upon a wide range of problems for its application. Promising areas for further research include:

– use of fuzzy relations;
– introducing relations on subsets of objects (for example, bundles of jobs);
– construction and research of different metrics on sets of relations;
– research on the effectiveness of the optimization algorithms.
This work was supported by the Russian Foundation for Basic Research (RFBR) grant No. 18–29-10072 mk.
References

1. Brucker, P.: Scheduling Algorithms, 5th edn. Springer, Heidelberg (2007)
2. Ehrgott, M., Figueira, J., Greco, S. (eds.): Trends in Multiple Criteria Decision Analysis. Springer, New York (2010)
3. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, 3rd edn. Springer, Heidelberg (1996)
4. Moscato, P., Cotta, C.: Memetic algorithms. In: Gonzalez, T. (ed.) Handbook of Approximation Algorithms and Metaheuristics, Chap. 27. Chapman & Hall/CRC, New York (2007)
5. Ong, Y.-S., et al.: Classification of adaptive memetic algorithms: a comparative study. IEEE Trans. Syst. Man Cybern. Part B Cybern. 36(1), 141–152 (2006)
6. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Global Optim. 13(4), 455–492 (1998)
7. Misyurin, S., Nelyubin, A.P.: Dominance relations approach to design and control configuration of robotic groups. Procedia Comput. Sci. 190, 622–630 (2021)
8. Misyurin, S.Y., Nelyubin, A.P., Potapov, M.A.: Designing robotic groups under incomplete information about the dominance of many goals. In: Misyurin, S.Y., Arakelian, V., Avetisyan, A.I. (eds.) Advanced Technologies in Robotics and Intelligent Systems. MMS, vol. 80, pp. 267–273. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-33491-8_32
9. Misyurin, S.Yu., Nelyubin, A.P., Potapov, M.A.: Multicriteria approach to control a population of robots to find the best solutions. Adv. Intell. Syst. Comput. (Biologically Inspired Cognitive Architectures) 948, 358–363 (2019)
10. Misyurin, S.Yu., Nelyubin, A.P., Potapov, M.A.: Applying partial domination in organizing the control of the heterogeneous robot group. J. Phys.: Conf. Ser. 1203(1), 012068 (2019)
11. Misyurin, S.Yu., Nelyubin, A.P.: Multicriteria adaptation principle on example of groups of mobile robots. J. Phys.: Conf. Ser. 937(1), 012034 (2017)
Multicriteria Optimization of a Hydraulic Lifting Manipulator by the Methods of Criteria Importance Theory

S. Yu. Misyurin1,2, A. P. Nelyubin2(B), G. V. Kreinin2, and N. Yu. Nosova2

1 National Research Nuclear University MEPhI, 31 Kashirskoe Shosse, Moscow, Russia
2 Blagonravov Mechanical Engineering Research Institute of RAS, 4 Malyi Kharitonievski Pereulok, Moscow, Russia
Abstract. The article describes the procedure for multicriteria optimization and choosing the best parameter values of a manipulator designed to lift a heavy, bulky load using two parallel and synchronously operating hydraulic drives. Information about the dynamics of the system was obtained by computer simulation of a sufficiently complete dimensionless model. Three characteristics of the system are considered as optimality criteria: the imbalance of the mass loads on the drives, the power (size) of the drives, and the synchronization of their operation. To search for feasible solutions to the optimization problem, a sequence of points uniformly distributed in the parameter space was generated. The sets of feasible and Pareto optimal solutions are analyzed using the visualization tools of the MOVI program. Within the framework of the mathematical criteria importance theory, expert information on preferences regarding the criteria was formalized and refined. In the course of this iterative procedure, the set of feasible solutions was narrowed down first to 67, then to 4 alternatives, and in the end one best solution was chosen.

Keywords: Dynamic system · Hydraulic drive · Dimensionless parameters · Visualization · Multicriteria optimization · Criteria importance
1 Introduction

This paper describes the procedure for multicriteria optimization and choosing the best parameter values for a rather complex model of a manipulator designed to lift a heavy, bulky load using two parallel and synchronously operating hydraulic drives. This model was investigated in [1], where a method was proposed for the graphical representation of the results of numerical calculations of the dynamics of the system.

17 technological and operational parameters were considered as the optimized parameters of the manipulator. The values of these parameters can be varied within certain ranges. A design of the manipulator corresponds to each feasible set of parameters, and its characteristics can be calculated and analyzed using the approach proposed in [1]. To search for feasible solutions in this work, we used the parameter space investigation method, generating a sequence of uniformly distributed points [2]. For the calculations and the visualization of the set of solutions, the MOVI software was used [3].
As the main optimality criteria (objective functions) of the system, three indicators were taken, characterizing the imbalance of the mass loads on the drives, the power (size) of the drives, and the maximum divergence of the displacements of their rods (deviation from synchronicity) during movement.

Among all the found feasible solutions, the set of Pareto optimal solutions can be selected. It is known that the best solution should be chosen from this set. However, the set of Pareto optimal solutions is also too large to examine exhaustively. To narrow down the range of choice, additional assumptions should be made about the preferences of the decision maker (DM) regarding the values of the criteria. For the formal modeling of these preferences and for drawing conclusions from this information, the approach of the mathematical criteria importance theory (CIT) was used in this work [4, 5]. The DASS software [5, 6] was used for the calculations.
2 The Manipulator Optimization Problem

The object of research in this work is a rather complex model of a manipulator designed to lift a heavy, bulky load using two parallel and synchronously operating hydraulic drives 1 and 2 (Fig. 1). The manipulator is controlled by a two-stage valve system connected to a power supply with pressure pM through the intermediate cavity 4. The valve 3 of the first stage regulates the overall rate of lifting of the load and the law of motion of the actuator. The valve 5 of the second stage distributes the flows of the working fluid directed to the drives, stabilizing the speed and synchronization of their movement, which can be disturbed by a difference in the mass loads on the drives or in the resistance forces acting in them.
Fig. 1. Lifting manipulator diagram.
The detailed mathematical model of the system under optimization and the results of numerical calculations are given in [1].

The synchronously operating drives move a load of mass m = m1 + m2, where m1,2 are the mass loads applied to the drives. The proportion of the total manipulator load on the first drive is characterized by the value c1 = m1/m. To assess the ability of the manipulator to work under conditions when the drives are loaded unequally, the
unbalance criterion K1 = |0.5 − c1| is introduced. The greater its value, the greater the difference in the loads on the drives that the manipulator allows.

The indicator of the power (size) of the drives is the relative value K2 = χ = mg/(pM F), where pM F is the maximum driving force of one drive. Since there are two drives, the limit value of the total driving force is 2pM F, and the criterion K2 can theoretically vary from 0 to 2. The other resistance forces acting in each drive are measured on the scale of one drive, that is, on this scale they are limited by one. For a given mass load and given resistance forces, the higher the load effectiveness K2 of the drives, the smaller the drive.

The synchronicity indicator of the movement of the drives is the current divergence of the positions of the rods, λ = |λ1 − λ2|, i.e. the violation of the synchronization at different moments of the movement. As the criterion of synchronization of the movement of the drives, its maximum (worst) value over the stroke period, K3 = λmax, is selected, which must be minimized.

The principle of constructing the mathematical model applied in [1] was similarly applied in [7, 8]. Table 1 shows the parameters of the system and the ranges of their values.

Table 1. System parameters and the ranges of their values.

Parameter | Range | Description
c1 | 0.25–0.75 | Weight load imbalance
χL | −2 to −0.4 | Relative total operating load on the drives, simultaneously serving as a measure of their sizes
λV | 0.2–1.0 | Intermediate chamber volume measure
β0 | 0.3–0.7 | Specified proportion of the opening of the common channel in the line leading to the drives, attributable to the first drive
α1 | 0.25–1 | The ratio between the flow areas of the common supply channel and the channel leading to the first drive
α2 | 0.25–1 | The same for the channel leading to the second drive
κ1 | 0.05–0.1 | First drive fluid friction coefficient
κ2 | 0.05–0.1 | Second drive fluid friction coefficient
ϑ1 | 25–50 | Position feedback ratio
ϑ2 | 0–50 | Speed feedback ratio
ϑD | 25–100 | Position feedback ratio
ϑV | 0–5 | Speed feedback ratio
tA | 0.02–0.04 |
tB | 0.02–0.04 |
τS | 10–50 | Mass m movement time
The maximum feasible unbalance between the loads c1 and c2 can be considered a characteristic of the manipulator that can be optimized.
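For orientation, the three criteria can be computed from a simulated trajectory in a few lines. The following sketch uses illustrative input values; the actual dimensionless model of [1] is far more detailed.

```python
def criteria(c1, m, p_M, F, lambda1, lambda2, g=9.81):
    """K1: mass-load unbalance; K2: relative drive power (size);
    K3: worst divergence of the rod positions over the stroke."""
    K1 = abs(0.5 - c1)
    K2 = m * g / (p_M * F)
    K3 = max(abs(a - b) for a, b in zip(lambda1, lambda2))
    return K1, K2, K3

# One hypothetical design point with invented rod-position histories.
print(criteria(c1=0.35, m=1000.0, p_M=16e6, F=0.002,
               lambda1=[0.0, 0.4, 0.8, 1.0],
               lambda2=[0.0, 0.41, 0.79, 1.0]))
```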
3 Feasible Solutions Generation and Initial Analysis

4000 solutions whose coordinates are uniformly distributed in the space of variable parameters were generated in the MOVI software [2]. 2628 of these solutions were found to be feasible subject to the constraints of the model. Among the feasible solutions, there were 126 Pareto optimal ones.

Each solution x can be associated with a 3D vector K(x) = (K1(x), K2(x), K3(x)) of the optimality criteria estimates. If the solutions are depicted as points in the 3D space of criteria, then they form a cloud in a certain area, and the points of the Pareto optimal solutions are located on a part of the boundary of this cloud. Figure 2 shows the projection of this cloud onto the 2D space of the criteria K1 and K2. Blue dots denote feasible solutions, green circles – Pareto optimal ones.
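Selecting the Pareto optimal points from the generated set is straightforward to sketch. Following the text, K1 and K2 are maximized and K3 is minimized, so K3 is negated to reduce everything to maximization; the data below are invented.

```python
def pareto_front(points):
    """points: list of (K1, K2, K3) tuples; returns the nondominated subset."""
    def dominates(a, b):
        ga = (a[0], a[1], -a[2])  # negate K3: smaller divergence is better
        gb = (b[0], b[1], -b[2])
        return all(x >= y for x, y in zip(ga, gb)) and ga != gb

    return [p for p in points if not any(dominates(q, p) for q in points)]

solutions = [(0.241, 1.29, 0.00561), (0.243, 1.23, 0.00374),
             (0.149, 1.50, 0.00630), (0.100, 1.20, 0.00900)]
print(pareto_front(solutions))  # the last point is dominated and drops out
```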
Fig. 2. The set of solutions in the space of criteria K1 and K2 . Blue dots denote feasible solutions, green circles – Pareto optimal solutions.
Figure 2 represents the initial, primary information for the subsequent analysis and choice of the best solution. At the first stage, such images make it possible to assess in what ranges of criteria values feasible solutions exist. That is, the decision maker receives primary information about the available possibilities in terms of achieving the best values of the criteria.

The first practical conclusion based on the analysis of Fig. 2 is the following: many feasible solutions are obtained with an acceptable value of the load imbalance K1. Therefore, we can safely discard some of the solutions with weakly acceptable values of K1 by imposing the additional constraint K1 > 0.1; this still leaves quite a lot of feasible solutions – 1371. Of these, 67 are Pareto optimal. The result of imposing this constraint in the criteria space is shown in Fig. 3.

Further, one can also impose constraints on the remaining criteria and then gradually tighten these constraints, thereby narrowing the set of choices. This is one possible approach; it corresponds to a selection carried out by specialists in this field.
Fig. 3. The set of solutions in the space of criteria K1 and K2 . Blue dots denote feasible solutions, crimson dots – discarded ones due to the constraint K1 > 0.1, green circles – Pareto optimal solutions.
4 Solving the Choice Problem by the CIT Methods

It is required to choose the best solution among the 67 solutions obtained at the previous stage, taking into account the constraint K1 > 0.1.

For the application of the CIT methods, the individual criteria K1, K2, K3 must be brought to a homogeneous form with a common scale Z, which may be merely ordinal [4]. In this problem, we use a 10-point scale: the higher the score, the higher the value (usefulness, preference) of such criteria values for the decision maker. To bring the criteria to the 10-point scale Z, we use linear normalization of the criteria values and rounding. As a result, each of the 67 solution options is associated with its vector estimate from the set Z³ = Z × Z × Z. It should be noted that the requirements for minimization or maximization of the initial criteria can differ (K1 → max, K2 → max, K3 → min), while the estimates on the Z scale are always maximized (y1 → max, y2 → max, y3 → max).

Due to rounding, each vector estimate describes a certain small region in the original 3D space of the criteria. Some solutions may fall in the same region and then have the same vector estimate. For example, the alternatives #2478, #3442 and #3768 have the same vector estimate (10, 7, 10). Further, by the CIT method we solve the problem of choosing the best vector estimate. Having chosen this vector estimate, we obtain the corresponding small region in the original space of criteria, which includes one or more of the 67 solutions under consideration.

In the CIT, the preferences of the DM are modeled using binary relations [4]. The strict preference relation P is introduced on the set of vector estimates Z³: the notation yPz means that the vector estimate y is preferable to z (or y dominates z). The preference relation P may be incomplete, or partial. It is known that the best vector estimate should be chosen among the vector estimates nondominated with respect to P. Since the DM's preferences increase along the criterion scale Z, the Pareto relation is defined on the set of vector estimates Z³: yP^∅z ⇔ yi ≥ zi, i = 1, 2, 3, and y ≠ z.

Among the 67 vector estimates under consideration, 11 are nondominated with respect to the Pareto relation P^∅. In fact, 9 vector estimates remain, since the alternatives with
numbers #2478, #3442 and #3768 correspond to the same vector estimate (10, 7, 10). These 9 vector estimates and the corresponding alternatives are shown in Table 2.

Table 2. Pareto-optimal vector estimates.

Alternatives | K1 | K2 | K3 | y1 | y2 | y3
257 | 0.179 | 1.35 | 0.00327 | 6 | 8 | 10
286 | 0.149 | 1.50 | 0.00630 | 4 | 10 | 8
1134 | 0.192 | 1.39 | 0.00814 | 7 | 9 | 7
1474 | 0.141 | 1.50 | 0.00551 | 3 | 10 | 9
1616 | 0.241 | 1.29 | 0.00561 | 10 | 8 | 9
1627 | 0.210 | 1.39 | 0.01131 | 8 | 9 | 5
2478 | 0.243 | 1.23 | 0.00374 | 10 | 7 | 10
3014 | 0.180 | 1.44 | 0.00495 | 6 | 9 | 9
3258 | 0.154 | 1.38 | 0.00390 | 4 | 9 | 10
3442 | 0.235 | 1.23 | 0.00354 | 10 | 7 | 10
3768 | 0.240 | 1.16 | 0.00366 | 10 | 7 | 10
At the next step of solving the problem by the CIT method, we enter into the DASS software [5, 6] information about the ordering of the criteria according to their importance, as shown in Fig. 4. This qualitative criteria importance information is denoted as Ω = {1≻2≻3}. According to the CIT definitions [4], the information 1≻2 that the first criterion is more important than the second means that any vector estimate y = (y1, y2, y3) from Z³ for which y1 > y2 is more preferable than the vector estimate z = (y2, y1, y3) obtained from y by permuting the components y1 and y2. This can be written as follows: yP^{1≻2}z.
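The comparisons used below can be reproduced by a small sketch. This is not the DASS implementation: it applies one textbook-style decision rule for criteria linearly ordered by importance (1 ≻ 2 ≻ 3) on a common ordinal scale, namely that y dominates z if, for every prefix of the most important criteria and every grade t, the prefix of y contains at least as many estimates ≥ t as the prefix of z, and the vectors differ. The authoritative formulations are given in [4, 5].

```python
def dominates_chain(y, z, grades=range(1, 11)):
    """Dominance under the importance ordering 1 > 2 > 3 on a 10-point scale."""
    if y == z:
        return False
    for k in range(1, len(y) + 1):  # prefixes of the most important criteria
        for t in grades:
            if sum(v >= t for v in y[:k]) < sum(v >= t for v in z[:k]):
                return False
    return True

print(dominates_chain((10, 7, 10), (6, 8, 10)))  # True: y(2478) dominates y(257)
print(dominates_chain((10, 8, 9), (8, 9, 5)))    # True: y(1616) dominates y(1627)
print(dominates_chain((10, 8, 9), (10, 7, 10)))  # False: the finalists are incomparable
```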
Fig. 4. Choice based on information about the ordering of the criteria by importance.
As a result, there are only 2 nondominated vector estimates and the corresponding 4 alternatives shown in Table 3. For each of the 4 vector estimates that turned out to be dominated with respect to P , it is possible to formally justify why it should be excluded from consideration. Namely, what other vector estimate dominates it and on the basis of what information about preferences this conclusion is made: y(2478) = (10, 7, 10)P 12 (7, 10, 10)P φ (6, 8, 10) = y(257) y(2478) = (10, 7, 10)P 12 (7, 10, 10)P φ (4, 9, 8) = y(286) y(2478) = (10, 7, 10)P 12 (7, 10, 10)P φ (7, 9, 7) = y(1134) y(2478) = (10, 7, 10)P 12 (7, 10, 10)P φ (3, 10, 9) = y(1474) y(1616) = (10, 8, 9)P 23 (10, 9, 8)P φ (8, 9, 5) = y(1627) y(2478) = (10, 7, 10)P 12 (7, 10, 10)P φ (6, 9, 9) = y(3014) y(2478) = (10, 7, 10)P 12 (7, 10, 10)P φ (4, 9, 10) = y(3258) For example, the notation (10, 7, 10) P 12 (7, 10, 10) means that the vector estimate (10, 7, 10) is preferable to the vector estimate (7, 10, 10), since the first criterion is more important than the second. Table 3. Selected vector estimates. Alternatives
Alternatives | K1    | K2   | K3      | y1 | y2 | y3
-------------|-------|------|---------|----|----|----
1616         | 0.241 | 1.29 | 0.00561 | 10 | 8  | 9
2478         | 0.243 | 1.23 | 0.00374 | 10 | 7  | 10
3442         | 0.235 | 1.23 | 0.00354 | 10 | 7  | 10
3768         | 0.240 | 1.16 | 0.00366 | 10 | 7  | 10
The remaining vector estimates are incomparable under the preference information introduced so far. At the next step in solving the choice problem, we enter information about the type of the criteria scale, as shown in Fig. 5. As a result, one nondominated vector estimate remains, (10, 8, 9), which corresponds to option #1616.
Fig. 5. Choice based on the first ordered metric scale of the criteria.
5 Conclusion

The problem of multicriteria optimization is one of the most difficult and controversial problems. Difficulties begin with the construction of the vector of optimization criteria. In real problems, as a rule, the criteria conflict, in the sense that it is not possible to obtain solutions that are optimal for all criteria at once; it is necessary to take their relative importance into account and determine preferences. This paper presents an interactive step-by-step optimization procedure which, without determining importance weights, significantly reduces the space of optimal solutions. The main stages of the developed procedure are as follows.

1. Based on the mathematical model of a physical object, multiparameter and multicriteria optimization is carried out with the selection of the Pareto set. In our problem, 4000 solutions were calculated using the mathematical model; 2628 of them were feasible with respect to the constraints of the model, of which 126 were Pareto optimal.
2. At the next stage, the criteria space is analyzed in order to reduce the area of suitable solutions. In our case, with the introduction of an additional constraint on the values of criterion K1 > 0.1, there are 1371 feasible solutions, of which 67 are Pareto optimal.
3. Next, the preferences regarding the importance and values of the criteria are refined, in the form of qualitative estimates. In our case, after determining the relative importance of the criteria and the type of the criteria scale, we obtained one best solution.
Acknowledgment. The research was supported by the Russian Foundation for Basic Research, project No. 18-29-10072 mk (Optimization of nonlinear dynamic models of robotic drive systems taking into account forces of resistance of various nature, including frictional forces).
References
1. Kreinin, G.V., Misyurin, S.Yu., Nelyubin, A.P., Nosova, N.Yu.: Visualization of the interconnection between dynamics of the system and its basic characteristics. Sci. Vis. 12(2), 9–20 (2020)
2. Sobol, I.M., Statnikov, R.B.: Choice of Optimal Parameters in Problems with Many Criteria. Drofa, Moscow (2006)
3. Homepage. http://www.psi-movi.com/. Accessed 09 May 2021
4. Podinovski, V.V.: Ideas and methods of the criteria importance theory in multicriteria decision-making problems. Nauka, Moscow (2019)
5. Nelyubin, A.P., Podinovski, V.V., Potapov, M.A.: Methods of criteria importance theory and their software implementation. In: Kalyagin, V.A., Pardalos, P.M., Prokopyev, O., Utkina, I. (eds.) NET 2016. SPMS, vol. 247, pp. 189–196. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96247-4_13
6. Homepage. http://mcodm.ru/soft/dass. Accessed 06 June 2021
7. Misyurin, S.Y., Kreinin, G.V., Nosova, N.Y.: Similarity and analogousness in dynamical systems and their characteristic features. Russ. J. Nonlinear Dyn. 15(3), 213–220 (2019)
8. Misyurin, S.Yu., Kreinin, G.V., Nelubin, A.P., Nosova, N.Yu.: The synchronous movement of mechanisms taking into account forces of the different nature. J. Phys. Conf. Ser. 1439(1), 012016 (2020)
The Hexabot Robot: Kinematics and Robot Gait Selection

S. Yu. Misyurin1,2(B), A. P. Nelyubin2, G. V. Kreynin2, N. Yu. Nosova2, A. S. Chistiy1, N. M. Khokhlov1, and E. M. Molchanov1

1 National Research Nuclear University MEPhI, 31 Kashirskoe shosse, Moscow, Russia
2 Blagonravov Mechanical Engineering Research Institute of RAS, 4 Malyi Kharitonievski pereulok, Moscow, Russia
Abstract. The article is devoted to the problems of movement and control of a six-legged walking spider robot. The direct and inverse problems of kinematics of a spatial mechanism are solved. The motion of the walking six-legged robot Hexabot (a "spider" robot) with the possibility of realizing different motions is considered. The task is to increase the speed of the robot's movement from one position to another by reducing the dynamic loads on the robot leg, as well as by choosing a more rational, dynamically balanced gait. At the first stage, one leg is considered separately as an open kinematic system with three degrees of freedom, and the equations of direct and inverse kinematics are obtained. The motion of the platform (the "body" of the spider) is not considered in this work.

Keywords: Kinematics · Dynamics · Spider robot · Optimization · Robot gait · Mathematical modelling
1 Introduction

The problems of kinematics and control of spatial mechanisms (walking robots) with a large number of degrees of freedom have been studied for a long time by such well-known researchers as A.K. Platonov, V.B. Larin, D.E. Okhotsimsky and others [1–3]. Many authors have repeatedly emphasized the great relevance of this area of research; however, the conditions for its active development have appeared only recently, due to the progress of information and machine-building technologies [4, 5]. Highly intelligent systems have been created that can control technical systems, and a large experimental base has appeared. With the development of additive technologies, almost all components can be manufactured using a 3D printer [6], which significantly speeds up the verification of theoretical developments by carrying out model and full-scale experiments. The research area of robotics has become so popular and relevant that many articles are devoted to the problems of optimizing the behavioral state of robots, including robotic group control [7–9]. The class of walking robots is developing rapidly. Six-legged robots are more stable than bipedal or zoomorphic robots due to the number of legs [10–16]. Such robots have
increased stability and are most effective in off-road conditions, on difficult terrain and when overcoming obstacles. Even on viscous soils, they can be more effective than mechanisms with tracked movers [17–23], traditionally considered the most passable, which have access to only about half of the earth's surface [24]. Although walking robots have a high crossing ability, their movement on a flat surface is energetically inefficient. In [25], a comparison is made between wheeled and tracked movers and walking movers. Increasing the speed and efficiency of movement of walking robots is therefore an urgent area of research. In this article, we consider a method that does not require changes to the design of the mechanism. The time and energy costs during the movement of walking robots depend not only on the starting and ending points, but also on the given trajectory of movement of the individual links. Each limb (leg) of a robot is composed of many parts, so research based on the methods of theoretical mechanics is needed to increase the efficiency of movement. In traditional transport, the driver's task is to choose the direction and speed of movement; in the case of a walking robot, the operator performs the same actions, while the control of individual links of the mechanism and their coordination remains with the computer system. That is, one of the most important problems in creating a walking robot is the development of a control system. There are many algorithms for the movement of the robot, and most are suitable for a wide variety of robots. However, for heavy robots, many algorithms are poorly optimized. Thus, for such robots it is necessary to select the optimal algorithm and its parameters so as to achieve the desired results.
Fig. 1. The Hexabot model: (a) an experimental robot layout; (b) a robot kinematic diagram.
To build and debug control algorithms, it is necessary to create an adequate robot model. First, a kinematic model is built, in which the weight of the links and the dynamic connections between them are neglected. Here, the limb of the robot is considered as a set of material points (nodes) with certain coordinates. Such a model is convenient for the development of various trajectories of movement of the robot’s limbs. At the second stage of the development of the control system, it is necessary to build a dynamic model that takes into account the mass of individual robot links and dynamic connections between the links. Such a model makes it possible to obtain the equations of motion of individual nodes when moving using various trajectories and to estimate the speed. The
main problem of the control system is to ensure the movement of the links according to the most effective law of motion by supplying the appropriate control signals. Figure 1(a) shows an experimental sample of the Hexabot robot with 18 degrees of freedom, three for each leg. Figure 1(b) shows its simplified kinematic diagram.
Fig. 2. Schematic representation of a spider robot leg: (a) before optimization; (b) after optimization.
Each leg of the robot uses three servos when moving. To reduce the time it takes for the limb to move from the lower position to the upper position and vice versa, as well as the amount of computation, it is necessary to simplify the gait. Let us take a simple move of the leg from one position to another. To begin the movement, the leg must rise. Let us consider two cases (Fig. 2(a), 2(b)). In the first case, the leg is lifted by the drive between the coxa and femur links, and in the second case by the drive between the femur and tibia links. In the second case, only the tibia link is lifted to raise the leg. Therefore, this method is preferable if there are no constraints on the space of movement.
2 Description of the Mechanism

To control the robot, we need to know what position the leg will take depending on the angles of rotation of the gears – in other words, to solve the direct problem of kinematics. This issue was considered in more detail in [15, 16]. Consider the robot leg separately (Fig. 3(a)). Let us introduce a coordinate system at the center of the hinge attaching the leg to the body, O1. This mount is a rotational pair in the plane of the "spider" body. The O1z axis is directed vertically upwards along the rotation axis. The O1x axis is perpendicular to the O1z axis and points toward the center of mass of the spider's body; the O1y axis is perpendicular to the O1zx plane and completes a right-handed coordinate system. O1, O2 and O3 are the rotary kinematic pairs with the corresponding rotation angles. The O1O2 link rotates in the O1xy plane, with α the angle of deviation of the link from the O1y axis. The axes of rotation of the kinematic pairs O2 and O3 are parallel to the O1xy plane. The points A1, A2 and A3 are the centers of mass of the links. Denote the lengths |O1O2| = l1, |O2O3| = l2, |O3O4| = l3, |O1A1| = ρ1, |O2A2| = ρ2, |O3A3| = ρ3, |O1A2| = L1, |O1A3| = L2, |O2A3| = L3. The angles α, β and γ are generalized coordinates – the
rotation angles of the links, measured according to the diagram in Fig. 3(b). Also, the following characteristics are given for i = 1, 2, 3: mi are masses of links; mg3 is the mass of the third gear; JAi are the moments of inertia of the links relative to the centers of mass; Jdi are the moments of inertia of the gears; Mdi are the moments developed by the engines; Mci are the moments of resistance.
Fig. 3. The robot leg: (a) a location of the O1 xyz coordinate system in relation to the real object – the robot’s leg; (b) a spider robot’s leg kinematic scheme.
3 Mechanism Kinematics

When solving the problems of kinematics of a robot with six legs, we rely on the fact that part of the legs moves in the air, while part of the legs is on the ground and supports the robot.

Direct Kinematics Problem. Let us find out how each leg moves separately and which movement will be the most efficient, or the fastest. That is, at the first stage we assume that the robot's body is motionless, and accordingly we consider each leg as an open kinematic chain. Under this assumption, the equation of the direct problem of kinematics has the following form:

$$\overrightarrow{O_1O_4} = M_z(\alpha)\left[ M_y(\beta)\left[ M_y(-\gamma)\begin{pmatrix}0\\ l_3\\ 0\end{pmatrix} + \begin{pmatrix}0\\ l_2\\ 0\end{pmatrix} \right] + \begin{pmatrix}0\\ l_1\\ 0\end{pmatrix} \right] \qquad (1)$$

where M are rotation matrices:

$$M_x(\alpha)=\begin{pmatrix}1&0&0\\ 0&\cos\alpha&-\sin\alpha\\ 0&\sin\alpha&\cos\alpha\end{pmatrix},\quad M_y(\alpha)=\begin{pmatrix}\cos\alpha&0&\sin\alpha\\ 0&1&0\\ -\sin\alpha&0&\cos\alpha\end{pmatrix},\quad M_z(\alpha)=\begin{pmatrix}\cos\alpha&-\sin\alpha&0\\ \sin\alpha&\cos\alpha&0\\ 0&0&1\end{pmatrix} \qquad (2)$$
Substituting matrices (2) into Eq. (1), we obtain:

$$\begin{pmatrix}x\\ y\\ z\end{pmatrix} = \begin{pmatrix}(L_1 + L_2\cos\beta + L_3\cos(\gamma-\beta))\cos\alpha\\ (L_1 + L_2\cos\beta + L_3\cos(\gamma-\beta))\sin\alpha\\ L_2\sin\beta - L_3\sin(\gamma-\beta)\end{pmatrix} \qquad (3)$$
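As a cross-check of Eqs. (1)–(3), the direct kinematics is easy to evaluate numerically. A minimal sketch (the function names are ours):

```python
import numpy as np

def rot_y(a):
    """Rotation matrix M_y(a) about the y axis, as in Eq. (2)."""
    return np.array([[np.cos(a), 0, np.sin(a)],
                     [0, 1, 0],
                     [-np.sin(a), 0, np.cos(a)]])

def rot_z(a):
    """Rotation matrix M_z(a) about the z axis, as in Eq. (2)."""
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a), np.cos(a), 0],
                     [0, 0, 1]])

def foot_position(alpha, beta, gamma, L1, L2, L3):
    """Foot coordinates from the closed form of Eq. (3)."""
    r = L1 + L2 * np.cos(beta) + L3 * np.cos(gamma - beta)
    return np.array([r * np.cos(alpha),
                     r * np.sin(alpha),
                     L2 * np.sin(beta) - L3 * np.sin(gamma - beta)])
```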
Inverse Kinematics Problem. To obtain expressions for the inverse problem of kinematics, we solve the system (3) with respect to α, β and γ. Here we have the opportunity to obtain an explicit solution (4). In the general case, there are several solutions to this problem; therefore, the solution was chosen that corresponds to the position of the leg shown in Fig. 3:

$$\alpha = \tan^{-1}\!\left(\frac{y}{x}\right),$$
$$\beta = \sin^{-1}\!\left(\frac{L_1^2 + z^2 + \left(\sqrt{x^2+y^2}-L_3\right)^2 - L_2^2}{2L_1\sqrt{z^2+\left(\sqrt{x^2+y^2}-L_3\right)^2}}\right) - \tan^{-1}\!\left(\frac{\sqrt{x^2+y^2}-L_3}{|z|}\right), \qquad (4)$$
$$\gamma = \pi - \cos^{-1}\!\left(\frac{L_1^2 + L_2^2 - z^2 - \left(\sqrt{x^2+y^2}-L_3\right)^2}{2L_1L_2}\right)$$
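Since the explicit solution (4) requires choosing the correct branch among several, a numerical inversion of system (3) can serve as a sanity check. A sketch using SciPy's fsolve together with the foot_position helper from the previous sketch (the initial guess selects the leg configuration of Fig. 3):

```python
import numpy as np
from scipy.optimize import fsolve

def ik_numeric(target, L1, L2, L3, guess=(0.1, 0.3, 1.0)):
    """Solve system (3) for (alpha, beta, gamma) numerically.

    The initial guess selects one of the several possible solutions.
    """
    def residual(q):
        return foot_position(*q, L1, L2, L3) - np.asarray(target)
    return fsolve(residual, guess)
```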
With the help of these expressions, it became possible to set the trajectory of the robot's limb more accurately; for example, in Fig. 4 the movement is implemented in the vertical plane. The movement is organized along the points (xi, yi, zi), which correspond to the angles (αi, βi, γi), i = 1...n. We can move from point to point to describe the trajectory of the spider's leg (n = 3, Fig. 4(a)). The finer the division into points, the more accurately the trajectory will be traced by the robot's foot (n = 6, Fig. 4(b)). But at the same time, the time for recalculating the trajectory increases, which can slow down the movement.
4 Gaits

In this work, we optimize the robot's gait in terms of speed when moving forward on a flat, horizontal surface. The main goal is to reach maximum speed with a six-legged robot. The robot has a relatively large mass (2.3 kg), while one leg weighs 0.2 kg, so the question arises of choosing an algorithm for its movement. Various types of gaits are analyzed in detail in the studies [13, 14]. Thus, in [14], using the example of Drosophila melanogaster, gaits were considered both on a horizontal surface and on a vertical one, with the legs sticking to the surface. However, these works did not take into account the dynamics of leg movement, the forces developed by the drives located in the leg joints, or the different kinematics of the movement of the leg (as a mechanism with three degrees of freedom) in space. These movements are described by Eqs. (4). In reality, all of these factors affect the speed of the six-legged robot, whose gaits are similar to those of Drosophila melanogaster.
Fig. 4. The trajectory of movement of the robot's leg: (a) a less accurate trajectory (n = 3); (b) a more accurate trajectory (n = 6).
Consider several gaits that we have implemented. The robot's movement patterns (a description of the movement of each of its limbs on a plane during its movement) are shown in Fig. 5(a, b, c). Horizontally, the numbers of the legs (left and right) are indicated as in Fig. 1(b); vertically there are intervals of time, where black squares indicate that the leg is in contact with the surface and white squares that the leg is moving from one position to another. Figure 5(b) characterizes the most frequently implemented Tripod Gait, in which three legs alternately stand on the surface, forming a "support polygon" – a stable triangle that keeps the center of gravity of the entire structure inside it. Figures 5(c), 5(d) and 5(e) depict movement where at any given time only two legs are in contact with the surface and the rest move in the air. In this case, it is necessary to ensure that the system is dynamically stable. As noted in [14] from observations of insects, the fastest gaits had, on average, almost two feet on the ground at any given moment. In many cases, during movement on "two supports", the projection of the center of gravity of the system is almost never within the support polygon bounded by the legs, as it is in the Tripod Gait of Fig. 5(b); this leads to static instability of the gait, as in many fast-running vertebrates. Movement on "two supports" is possible exclusively due to the coordination of the legs. During locomotion, each front leg moves in sync with the opposite hind leg (Fig. 5(e)), and the middle legs move together. This generates three power strokes per locomotion cycle. Consequently, all other things being equal (for example, the same foot speed), a "two-support" gait can provide more continuous and therefore faster forward movement than the Tripod Gait. The gait depicted in Fig. 5(a) corresponds to a movement where only one leg is in contact with the surface. This is how some species of insects run, but it is difficult to implement for the movement of a six-legged robot.
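Movement patterns of this kind are conveniently encoded as binary support matrices. A small sketch; note that the particular leg-to-tripod assignment shown is one common choice, given here only as an assumption about Fig. 5(b):

```python
import numpy as np

# Support pattern for a tripod gait: rows are time intervals, columns are
# legs L1, L2, L3, R1, R2, R3; 1 = leg on the ground, 0 = leg in the air.
tripod = np.array([
    [1, 0, 1, 0, 1, 0],   # first tripod (e.g. L1, L3, R2) supports the body
    [0, 1, 0, 1, 0, 1],   # second tripod (e.g. L2, R1, R3) supports the body
])

# A gait cycle is a repetition of these phases:
cycle = np.tile(tripod, (3, 1))
print(cycle)
```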
Fig. 5. Robot movement patterns
5 Conclusion

There is currently considerable worldwide interest in the study of biologically inspired robots. This research area covers various kinds of similarity to biological creatures, such as crawling robots (similar to worms or snakes), robots on four (dogs) or six (insects) legs, flying robots (like a butterfly), etc. Researchers are trying to adopt the unique designs and movement capabilities created by nature in order to use robots in various circumstances. But the apparent simplicity created by nature is not easy to reproduce: simple copying of movement does not give the desired effect, and a whole complex of scientific research is necessary in order to obtain one or another required advantage in the design. On the example of a spider robot, the following problems requiring solution can be distinguished:

1. Type of movement. Determination of all types of gaits of the biological prototype. Optimization – choosing the best gait for the task at hand.
2. Determination of the kinematic structure capable of implementing this movement.
3. Construction of a mathematical model of the movement of the robot using the equations of kinematics and dynamics. Carrying out multi-criteria and multi-parameter optimization of the mathematical model of the robot.
Acknowledgment. The research was supported by the Russian Foundation for Basic Research, project No. 18-29-10072 mk (Optimization of nonlinear dynamic models of robotic drive systems taking into account forces of resistance of various nature, including frictional forces).
References
1. Okhotsimsky, D.E., Golubev, Yu.F.: Mechanics and Motion Control of an Automatic Walking Apparatus. Nauka, Moscow (1984). (in Russian: Mekhanika i upravleniye dvizheniyem avtomaticheskogo shagayushchego apparata)
2. Okhotsimsky, D.E., Platonov, A.K., Kirilchenko, A.A., Lapshin, V.V.: Walking machines: preprint. IPM of the USSR Academy of Sciences, No. 87, Moscow (1989). (in Russian: Shagayushchiye mashiny: preprint)
3. Larin, V.B.: Control of a Walking Apparatus. Naukova dumka, Kiev (1980). (in Russian: Upravleniye shagayushchim apparatom)
4. Lapshin, V.V.: Mechanics and Motion Control of Walking Machines. Publishing house of Bauman Moscow State Technical University, Moscow (2012). (in Russian: Mekhanika i upravleniye dvizheniyem shagayushchikh mashin)
5. Pavlovsky, V.E.: On the development of walking machines: preprint. IPM RAS, No. 101, Moscow (2013). (in Russian: O razrabotkakh shagayushchikh mashin: preprint)
6. Egunov, V.A., Kachalov, A.L., Petrosyan, M.K., Tarasov, P.C., Yankina, E.V.: Development of the insectoid walking robot with inertial navigation system. In: Proceedings of the 2018 International Conference on Artificial Life and Robotics (ICAROB 2018), pp. 387–390. IEEE, B-Con Plaza, Beppu, Oita, Japan (2018)
7. Misyurin, S., Nelyubin, A.P.: Dominance relations approach to design and control configuration of robotic groups. Procedia Comput. Sci. 190, 622–630 (2021)
8. Misyurin, S.Y., Nelyubin, A.P., Potapov, M.A.: Multicriteria approach to control a population of robots to find the best solutions. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 358–363. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_46
9. Misyurin, S.Y., Nelyubin, A.P., Potapov, M.A.: Applying partial domination in organizing the control of the heterogeneous robot group. J. Phys. Conf. Ser. 1203(1), 012068 (2019)
10. Nelson, G., et al.: Petman: a humanoid robot for testing chemical protective clothing. Robot. Soc. Jpn. 30(4), 372–377 (2012)
11. Yang, U.J., Kim, J.Y.: Mechanical design of powered prosthetic leg and walking pattern generation based on motion capture data. Adv. Robot. 29(16), 1061–1079 (2015)
12. Sutyasadi, P., Parnichkun, M.: Gait tracking control of quadruped robot using differential evolution based structure specified mixed sensitivity H∞ robust control. J. Control Sci. Eng. 2016(2), 1–18 (2016)
13. Campos, R., Matos, V., Oliveira, M., Santos, C.: Gait generation for a simulated hexapod robot: a nonlinear dynamical systems approach. In: 36th Annual Conference of IEEE Industrial Electronics, Glendale, AZ, USA, pp. 1–6 (2010)
14. Pavan, R., Thandiackal, R., Cherney, R.: Climbing favours the tripod gait over alternative faster insect gaits. Nature Commun. 8(1), 14494 (2017)
15. Misyurin, S.Y., Kreinin, G.V., Nosova, N.Y., Nelyubin, A.P.: Kinematics and dynamics of the spider-robot mechanism, motion optimization. In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) BICA 2020. AISC, vol. 1310, pp. 320–326. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_38
16. Misyurin, S.Y., Kreinin, G.V., Nosova, N.Y., Nelubin, A.P.: Six-legged walking robot (hexabot), kinematics, dynamics and motion optimization. Procedia Comput. Sci. 190, 604–610 (2021)
17. Hong, S., Kim, H.W., Choi, J.S.: Transient dynamic analysis of tracked vehicles on extremely soft cohesive soil. In: The 5th ISOPE Pacific/Asia Offshore Mechanics Symposium, Seoul, Korea, pp. 100–107 (2002)
18. Briskin, E.S., Chernyshev, V.V., Maloletov, A.V., Sharonov, N.G.: Comparative analysis of wheeled, tracked and walking machines. Robotics and Technical Cybernetics 1, 6–14 (2013). (in Russian: Sravnitel'nyy analiz kolesnykh, gusenichnykh i shagayushchikh mashin. Robototekhnika i tekhnicheskaya kibernetika)
19. Pavlovsky, V.E., Platonov, A.K.: Cross-country capabilities of a walking robot, geometrical, kinematical and dynamic investigation. In: Morecki, A., Bianchi, G., Rzymkowski, C. (eds.) Romansy 13. International Centre for Mechanical Sciences (Courses and Lectures), vol. 422, pp. 131–138. Springer, Vienna (2000)
20. Briskin, E.S., Chernyshev, V.V., Maloletov, A.V., et al.: Walking machine "Octopus". Mechatron. Autom. Control 5, 48–49 (2004). (in Russian: Shagayushchaya mashina "Vos'minog". Mekhatronika, avtomatizatsiya, upravlenie)
21. Chernyshev, V.V.: Experience in the use of a walking machine for the elimination of emergency oil spill. Life Safety 5, 28–30 (2003). (in Russian: Opyt ispol'zovaniya shagayushchey mashiny dlya likvidatsii avariynogo razliva nefti. Bezopasnost' zhiznedeyatel'nosti)
22. Briskin, E.S., Chernyshev, V.V., Maloletov, A.V., et al.: On ground and profile practicability of multi-legged walking machines. In: Climbing and Walking Robots, CLAWAR 2001: Proceedings of the 4th International Conference, Karlsruhe, Germany, pp. 1005–1012 (2001)
23. Briskin, E.S., Chernyshev, V.V., Maloletov, A.V., Zhoga, V.V.: The investigation of walking machines with movers on the basis of cycle mechanisms of walking. In: 2009 International Conference on Mechatronics and Automation, pp. 3631–3636. IEEE, Changchun, China (2009)
24. Raibert, M.: Legged Robots That Balance. The MIT Press (1986)
25. Ignatev, M.B.: Cybernetic Picture of the World. Complicated Systems' Theory. GUAP Publ., Saint-Petersburg (2011). (in Russian: Kiberneticheskaia kartina mira. Teoria slozhnykh system)
On the Possibility of Using the Vibration Displacement Theory in the Analysis of Ship Accident Rate Using Artificial Intelligence Systems

S. Yu. Misyurin1,2(B), Yu. A. Semenov1, and E. B. Semenova1

1 Blagonravov Mechanical Engineering Research Institute RAS (IMASH RAS), Moscow, Russia
2 Moscow Engineering Physics Institute (MEPhI), Moscow, Russia
Abstract. The problem of the transportation of unfixed cargo by ships in sea conditions is considered. As one of the influencing factors for predicting the accident rate, it is proposed to use the results of the theory of vibration displacement (transportation). The effective roll angle model is considered as the simplest model of this theory. The model used makes it possible to determine the speed of the body's "slow" motion and its direction: toward the vessel's plane of symmetry or toward its sides. The latter case is associated with a shift in the center of gravity of the vessel, leading to increased roll or even overturning. An estimate is given of the critical roll angle, the angle upon reaching which breakaway and movement of unfixed cargo toward the ship's side occur. The possibility of applying the model in artificial intelligence systems is noted.

Keywords: Sea transportation · Heaving · Unfixed cargo · Bulk bodies · Vibration movement · Modeling · Accident prediction · Artificial intelligence systems
1 Introduction

Accidents of ships in sea waves caused by the displacement of transported unfixed cargo (individual objects, loose bodies, logs) to one of the sides of the ship are well known. As a result of this displacement, the position of the ship's center of gravity changes, which causes an increased roll and even overturning. A significant number of studies, including those using artificial intelligence systems, have been devoted to modeling the behavior of unfixed cargo on ships in waves, as well as to the development of measures to prevent the corresponding accidents. We refer, in particular, to [1, 2]. M.A. Moskalenko's [3] and T.E. Malikova's [4] dissertations provide detailed surveys of the problem's current state.
Despite the presence of these studies and developments, there are still reports of accidents and destruction of ships carrying unfixed cargo. According to [4], 3–5% of large ships are destroyed annually, and due to mechanical damage to cargo, annual losses amount to billions of dollars. This paper draws attention to the possibility of using the theory of vibrational displacement to study the behavior of unfixed loads on a ship and to predict the corresponding emergency situations using artificial intelligence systems. This theory was developed largely by Russian scientists [5–9]. It has been developed in relation to the study of directional motion under the influence of vibration, both of individual bodies of various shapes and of bulk media. The model makes it possible to estimate the speed of the body's "slow" motion and its direction: toward the plane of symmetry of the ship or toward its sides. The latter case is associated with a shift in the center of gravity of the vessel, leading to increased roll or even overturning. In robotics, artificial intelligence is often based upon numerical solutions of differential equations. Such equations can be obtained, as in this paper, using the methods of classical mechanics or the theory of mechanisms and machinery, and such solutions are used for control or precise positioning problems [10–12].
2 On the Effect and Theory of Vibration Displacement

The effect of vibrational movement is generally understood as the emergence of a directed, on average, "slow" change (in particular, movement) due to non-directional, on average, usually "fast" vibrational influences [5]. One of the main special cases of the effect is the vibrational transportation of bodies in vibrating trays and vessels.
3 Basic Provisions and Assumptions

We use the model of rotary vibrations presented in [9], but it is worth quoting here its main provisions and assumptions as they apply to the conditions of the problem.

1. The roll of a vessel in waves is a rotational vibration relative to a certain average roll angle α0, which is unequivocally determined by the position of the vessel's center of gravity.
2. An emergency situation occurs if this average angle, added to the largest value of the oscillatory component α1, exceeds a certain value α∗ (α0 + α1 > α∗), which we call the critical roll angle.

With such a picture of an emergency, an important question is in which direction the cargo will move when the ship rolls: to one of the sides or to the plane of symmetry of the ship. To resolve this question, we will use the simplest model shown in Fig. 1.
Fig. 1. The direction of the vibrational movement of the load on a symmetrically oscillating rough surface (the simplest case): (a) rotary vibrations about the axis O lying below the surface plane – the load is displaced toward the plane of symmetry SS; (b) rotary vibrations about the axis O lying above the surface – the load is displaced away from the plane of symmetry.
To the above, the following notes should be made:

1) Usually, the theory of vibrational displacement assumes that the vibrating surface moves translationally, i.e. the amplitude A and the angle of vibration β are constant, while in our case they depend on the x coordinate. However, as shown in [9], such dependencies can be taken into account purely parametrically. Under the conditions of Fig. 1, we have α = α1 sin ωt and, in a more general case, when there is a certain average slope α0,

$$\alpha = \alpha_0 + \alpha_1 \sin \omega t \qquad (1)$$
2) When we apply this model in relation to the case of granular bodies, it may be advisable to take into account the trajectory of the cargo medium center of gravity (Fig. 2) when its free surface is inclined.
Fig. 2. Displacement of the center of gravity of a granular medium when its free surface is tilted
Simple calculations show that the displacement occurs along a parabola:

$$z(x) = \frac{6H}{b^2}\,x^2$$
where H is the height of the layer, and b is the width of the compartment with bulk cargo. This shift can also be taken into account purely parametrically.
4 An Approximate Estimation of the Critical Roll Angle

Unlike its static analogues, the proposed dynamic model postulates that the body begins its motion toward the sides not upon reaching the friction angle ρ1 = arctan f1, where f1 is the coefficient of static friction, but at a smaller angle α = α∗ – the critical roll angle – and that this difference can be significant. Let us estimate the value of the angle α∗. Consider the position of the body near some point x of the flat surface PP in the local moving coordinates xMy associated with this surface (Fig. 3). The equations of relative motion in these axes are as follows:

$$m\ddot{x} = mr(x)\alpha_1\omega^2\cos\beta(x)\sin\omega t - mg\sin(\alpha_0 + \alpha_1\sin\omega t) + F \qquad (2)$$
$$m\ddot{y} = mr(x)\alpha_1\omega^2\sin\beta(x)\sin\omega t - mg\cos(\alpha_0 + \alpha_1\sin\omega t) + J_c + N \qquad (3)$$
where m is the body mass, g is the acceleration of gravity, r(x) = h/cos β(x) is the distance from the center of gravity M to the point O, α1 is the amplitude of the ship's onboard oscillation, ω is the oscillation frequency, F is the dry friction force, N is the normal reaction, and Jc is the Coriolis inertia force. As shown in [9], the Coriolis acceleration can be neglected due to its smallness.
Fig. 3. Notation for the equations of motion.
In a state of relative rest (x = const, y = 0, Jc = 0), the following expressions for the forces F and N can be obtained from Eqs. (2), (3):

$$F = mg\sin(\alpha_0 + \alpha_1\sin\omega t) - mA(x)\omega^2\cos\beta(x)\sin\omega t$$
$$N = mg\cos(\alpha_0 + \alpha_1\sin\omega t) - mA(x)\omega^2\sin\beta(x)\sin\omega t$$

where A(x) = r(x)α1 is the amplitude of oscillations of the plane at the location of the center of gravity of the body. We can assume that N > 0, so the body does not detach from the surface. The body will begin to move in the x-axis direction upon fulfillment of the inequality

$$F < -f_1 N \qquad (4)$$
After simple transformations, taking into account the relation tan ρ1 = f1, inequality (4) can be represented as (assuming β − ρ1 < π/2):

$$\sin\omega t > \frac{g\sin(\alpha + \rho_1)}{A\omega^2\cos(\beta - \rho_1)} \qquad (5)$$
The dependence β = β(x), taken into account parametrically, is omitted here, and expression (1) is used for α.
It is enough for inequality (5) to be fulfilled only on some finite interval of time t; with this in mind, and also taking into account (1), it can be represented as

$$\sin\omega t > \frac{g\sin(\alpha_0 + \alpha_1 + \rho_1)}{A\omega^2\cos(\beta - \rho_1)}$$

At roll angles α0 + α1 > α∗ there is a tendency to move the load in the direction of one of the sides, despite the fact that this inequality is fulfilled only at certain intervals. This tendency may be stronger than the opposite trend in the situation of Fig. 1a. It strengthens the unfavorable trend in the situation of Fig. 1b. And, of course, this tendency determines the behavior of the load at Aω²/g ≫ 1. This is due to the fact that the movement of the load during each period of time leads to a gradual increase in the modulus of the mean angle of inclination |α0|. In other words, there is an instability of the equilibrium position of the load "on average".
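To make the breakaway analysis concrete, the following minimal sketch (ours) evaluates the rest-state forces F and N per unit mass over one roll period, with A and β held constant as the parametric treatment above allows, and checks the condition (4):

```python
import numpy as np

def slip_onset(alpha0, alpha1, A, omega, beta, f1, g=9.81, n=2000):
    """Report whether the breakaway condition (4), F < -f1*N, is ever met
    during one roll period (forces per unit mass, A and beta constant)."""
    t = np.linspace(0.0, 2 * np.pi / omega, n)
    alpha = alpha0 + alpha1 * np.sin(omega * t)              # roll angle, Eq. (1)
    F = g * np.sin(alpha) - A * omega**2 * np.cos(beta) * np.sin(omega * t)
    N = g * np.cos(alpha) - A * omega**2 * np.sin(beta) * np.sin(omega * t)
    return bool(np.any(F < -f1 * N))
```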
6 Conclusion

The paper presents the idea of using the apparatus and results of the theory of vibrational displacement in the problem of the accident rate of ships carrying unfixed loads in rolling conditions. The simplest model shown here can be used as one of the influencing factors for predicting the accident rate of ships using artificial intelligence systems.
References
1. Korobtsov, V.I.: Change of some physical and mechanical properties of grain cargo and patterns of its movement in conditions of sea transportation. Technology of safe transportation of goods by sea. In: Proceedings, TsNIIMF, L.: Transport, Issue 56, pp. 59–64 (1964)
2. Garkavy, V.V.: Dynamics of a vessel with unfixed loads at high inclinations. Doct. dis., Leningrad State Marine Technical University, Leningrad (1991)
3. Moskalenko, M.A.: Methodological bases of the structural safety of ships. Doct. dis., Marine State University, Vladivostok (2006)
4. Malikova, T.E.: Theoretical foundations and methodology for regulating the displacement of cargo on sea vessels. Doct. dis., Marine State University, Vladivostok (2014)
5. Blekhman, I.I.: Theory of Vibration Processes and Devices. Vibration Mechanics and Vibration Technology. Publishing house "Ruda i Metally", SPb (2013)
6. Blekhman, I.I., Janelidze, G.: Vibration Displacement. Nauka, Moscow (1964)
7. Nagaev, R.F.: Periodic Modes of Vibration Movement. Science, Moscow (1978)
8. Vibrations in Technology: Handbook, in 6 volumes. Mechanical Engineering, Moscow (1978–1981)
9. Blekhman, I.I., Vasilkov, V.B., Semenov, Y.: Vibrotransporting of bodies on a surface with non-translational rotational oscillations. J. Mach. Manuf. Reliab. 49(4), 280–286 (2020)
10. Kreinin, G.V., Misyurin, S.: Dynamics and synthesis of the parameters of a positioning drive. J. Mach. Manuf. Reliab. 38(6), 523–530 (2009)
11. Kreinin, G.V., Misyurin, S.: Selection of the scheme for incorporating a drive into the structure of a mechanism in solving problems of kinematic synthesis. J. Mach. Manuf. Reliab. 37(1), 1–5 (2008)
12. Misyurin, S.Y., Kreinin, G.V., Nosova, N.Y., Nelyubin, A.P.: Kinematics and dynamics of the spider-robot mechanism, motion optimization. In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) BICA 2020. AISC, vol. 1310, pp. 320–326. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_38
Proposal and Evaluation of Deep Profit Sharing Method in a Mixed Reward and Penalty Environment

Kazuteru Miyazaki(B)

Research Department, National Institution for Academic Degrees and Quality Enhancement of Higher Education, Kodaira, Japan
[email protected]
Abstract. Deep reinforcement learning, which combines deep learning and reinforcement learning, is attracting attention. In many cases, Q-learning (QL) is used as the reinforcement learning method. The authors, on the other hand, focus on exploitation-oriented learning such as Profit Sharing (PS), from the standpoint of strongly reinforcing experience. Though many methods that combine exploitation-oriented learning and deep learning, such as DQNwithPS, have been proposed, complete independence from QL has not been realized in an environment where rewards and penalties are mixed. In this paper, we propose a deep exploitation-oriented learning method called Deep Profit Sharing that does not use QL in an environment where rewards and penalties coexist. We confirm its effectiveness by numerical experiments.

Keywords: Deep reinforcement learning · Exploitation-oriented learning · Reward and penalty · Q-learning · Profit Sharing
1 Introduction
What do we expect when we use a reinforcement learning system? In many cases, users of a learning system have some sort of "what they want to do (A)" and "what they don't want to do (B)" in mind. They seem to expect the acquisition of policies that meet demands such as "acquiring A while avoiding B" through interaction with the environment. In such cases, a reward is usually attached to the states to be reached and a penalty to the states to be avoided. Then, a positive value is set for the reward and a negative value for the penalty, and the system usually aims to maximize the (discounted) expected acquired reward value as a one-dimensional quantity. A well-known representative of such methods is Q-learning (QL) [5], which guarantees optimality under Markov decision processes (MDPs). Many other methods do not differ greatly in that "users set reward and penalty values". Such a technique is very effective in the sense
of "optimization under given reward and penalty values". However, depending on the reward and penalty values that are set, the policy expected by the user may not be obtained. The authors, on the other hand, do not regard the reward as a continuously varying numerical quantity, but rather treat it as a signal emitted when a goal is achieved. We also aim to reduce the number of trials and errors by strongly reinforcing the experience gained. Learning methods designed from this standpoint are called Exploitation-oriented Learning (XoL) [1]. The authors have proposed various methods after proving the rationality theorem of Profit Sharing (PS) [1] as a method satisfying XoL. For example, the Penalty Avoiding Rational Policy Making algorithm (PARP) [1] is known as a method for avoiding penalties. Furthermore, in recent years, the authors have proposed methods that integrate XoL with deep learning, such as DQNwithPS [3], but penalty avoidance there depends on the Deep Q-Network (DQN) [4], which learns by QL. Therefore, in the combination of deep learning and exploitation-oriented learning, a method handling reward and penalty at the same time has not been completed. In this paper, we therefore propose a deep exploitation-oriented learning method that does not use QL in an environment where rewards and penalties coexist, and confirm its effectiveness by numerical experiments.
2 Reward and Penalty Design Issues
Here is an example of the reward and penalty design issues mentioned in Sect. 1. The environment used in the experiment is shown in the lower left of Fig. 1b). It contains two types of rewards (R1, R2) and one type of penalty (P). Figures 1a and 1b show the results of QL with (R1 = 50, R2 = 100, −1 ≤ P ≤ −1000) and (R1 = 0, R2 = 100, −1 ≤ P ≤ −1000), respectively. In this environment, if the agent selects action a0 for every sensory input, it can get R1 without P1. This is the only penalty-avoiding rational policy; we call it the right circle policy (RCP). On the other hand, if the agent selects action a1 for every sensory input, it gets R2 and P1; this is called the left circle policy (LCP). The vertical axes of both figures show the number of RCP acquisitions in 100 different trials. For this problem, we compared QL and PARP. The learning rate of QL was 0.05, the discount rate was 0.9, and roulette selection based on the Q-value was used for action selection. PARP used random selection as its environmental exploration strategy. In this environment, we can consider the following four types of priorities, because we assume that a priority does not contain two or more rewards: "A(P) > G(R1)", "A(P) > G(R2)", "G(R1) > A(P)", and "G(R2) > A(P)". Only the priority "G(R2) > A(P)" requires LCP; the other priorities require RCP. PARP always learns RCP; therefore, PARP cannot fit the priority "G(R2) > A(P)". Though QL can learn either RCP or LCP, it is difficult to design appropriate P, R1 and R2 values for each priority. For example, if we set (R1 = 50, R2 = 100, P = −1000), QL always learns RCP even when it should learn LCP under "G(R2) > A(P)". In this case, if we set (R1 = 50, R2 = 100, P = −1), QL can always
Fig. 1. The number of RCP acquisitions.
learn LCP. However, it is very difficult to find the combination of reward and penalty values before learning.
3 Deep Exploitation-Oriented Learning that Handles Rewards and Penalties at the Same Time
In this paper, in order to treat reward and penalty at the same time, a PS that learns with positive rewards and a PS that learns with negative rewards (penalties) are used together. These two PSs are executed simultaneously, and when an action is to be selected, it is chosen by the positive-reward PS from the space excluding the actions whose values have been updated by the negative-reward PS. If all actions are excluded by the negative-reward PS, one action is selected uniformly at random. If there are multiple types of rewards and penalties, the various methods proposed within XoL can be used. The method that incorporates this action selection into deep exploitation-oriented learning is referred to as Deep Profit Sharing (DPS).
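The action selection rule just described can be stated compactly. A minimal sketch (the function name is ours; greedy selection over the positive-PS values is shown, though a stochastic rule such as roulette selection could equally be used):

```python
import numpy as np

def dps_select_action(pos_values, neg_flags, rng=np.random.default_rng()):
    """DPS action selection (a sketch).

    pos_values: action values learned by the positive-reward PS.
    neg_flags:  True for actions whose values were updated by the
                negative-reward (penalty) PS; these are excluded.
    """
    pos_values = np.asarray(pos_values, dtype=float)
    allowed = np.flatnonzero(~np.asarray(neg_flags))
    if allowed.size == 0:                        # all actions penalized:
        return int(rng.integers(len(pos_values)))  # uniform random choice
    return int(allowed[np.argmax(pos_values[allowed])])
```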
4 Evaluation of Deep Exploitation-Oriented Learning

4.1 Experimental Environment
The environment shown in Fig. 2 is used as a testbed. Each agent has four kinds of desire levels (A, B, C, D), and each level is decreased by a predetermined, level-specific amount for each output of an action. The learning agent in s0 selects one of four actions (a, b, c, d), corresponding to the desire levels (A, B, C, D), respectively. The agent can properly observe all desire levels, but the function that defines the decrement of each desire level is not known in advance. The agent receives 21 types of sensory inputs labeled s0 to s35 (some numbers are skipped). State transitions that do not change the sensory input, such as returning to s1 after selecting action c in s1, are omitted in Fig. 2. When the agent selects action a in s1, it transits to s3 with probability p = 0.5 and to a new state s6 located between s1 and s2 with probability 1.0 − p, though this is also omitted in Fig. 2.
Fig. 2. Environment used in the numerical experiments.
Fig. 3. Input for Deep Reinforcement Learning in decision making system application.
On reaching a goal, represented by triangle A, B, C, or D in Fig. 2, the corresponding desire level is increased. This increment, which is predetermined for each desire level, corresponds to a reward. On the other hand, if one level decreases to zero, all desire levels are initialized and the agent returns to s0, which represents a penalty. In this situation, we consider the problem of which action should be selected in s0. The solution depends on the change in desire level resulting from selecting an action and acquiring a reward. The purpose of the agent is to learn a rational policy that can avoid the penalty.
4.2 Use of Deep Profit Sharing
In this paper, the input to the network is obtained by discretizing the values of desire levels A, B, C, and D into 64 cells, as shown in Fig. 3. When the value of each level is greater than or equal to the value written in the upper left corner of the corresponding square box in Fig. 3, a 1 is given at that location; otherwise, a 0 is given as input. For example, the input shown in Fig. 3b) is obtained when the values of desire levels A, B, C, and D are 54, 99, 77, and 16, respectively. Two types of 3 × 3 filters are applied to this 8 × 8 input. Setting the stride to 1 results in two 6 × 6 outputs. After that, average pooling is applied to each 2 × 2 region to obtain a 3 × 3 × 2 output. For these 18 inputs, a fully connected network consisting of an intermediate layer of 3 neurons and an output layer of 4 neurons is constructed, and the connection weights of the network are learned by the backpropagation method. Here, the four neurons in the output layer correspond to the evaluation values of desire levels A, B, C, and D, respectively, and the network weights are trained so that the largest output value indicates the desire level to be selected for the current input pattern. The reinforcement function that regulates the distribution of rewards in DPS uses a geometrically decreasing function with a discount rate of 0.99 for rewards. For penalties, on the other hand, a constant value function without discounting was used to ensure penalty avoidance.
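The input coding of Fig. 3 can be sketched as follows. Note that the exact assignment of grid cells to desire levels is an assumption here: each level is taken to occupy two rows of the 8 × 8 grid, i.e. a 16-step thermometer code against the thresholds written in the cells.

```python
import numpy as np

def encode_desire_levels(levels, n_steps=16, max_level=100):
    """Binary 8x8 input pattern in the spirit of Fig. 3 (a sketch).

    Assumption: each of the four desire levels occupies two rows of the
    grid; cell k is 1 when the level reaches that cell's threshold.
    """
    thresholds = np.linspace(0, max_level, n_steps, endpoint=False)
    rows = [(np.asarray(levels)[i] >= thresholds).astype(np.float32).reshape(2, 8)
            for i in range(4)]
    return np.vstack(rows)          # shape (8, 8)

print(encode_desire_levels([54, 99, 77, 16]))
```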
4.3 Action Selection Method
In this problem setting, it can be expected that in many cases the minimum priority selection method (MPS) [2], which always aims at the goal with the minimum desire level, works effectively. Therefore, unless a problem occurs, the MPS output is passed to the environment as it is. On the other hand, if we get a penalty as a result of acting according to MPS in a certain environment, we consider using the "penalty avoidance list" proposed in [2] in the same environment. Learning using the penalty avoidance list has the disadvantage that if all actions are registered in the avoidance list, an action is selected uniformly at random. For such cases, we consider using DPS to reduce random selection as much as possible. That is, the penalty avoidance list is prioritized first, and when a random selection would be requested within that list, the output of DPS is adopted.
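The resulting hierarchy – MPS first, then the penalty avoidance list, with DPS replacing the uniform random fallback – can be sketched as follows (names are ours; the handling of an MPS choice that lies on the avoidance list is one plausible reading of the rule above):

```python
def select_action(state, actions, mps, penalized, dps):
    """Hierarchical selection: MPS first, penalty avoidance list next,
    DPS instead of a uniform random fallback (a sketch).

    mps(state) is the minimum priority selection; penalized(state) is the
    penalty avoidance list; dps(state) is the Deep Profit Sharing choice.
    """
    avoid = penalized(state)
    allowed = [a for a in actions if a not in avoid]
    if not allowed:
        # the avoidance list would require a uniformly random choice;
        # adopt the DPS output instead to avoid random selection
        return dps(state)
    choice = mps(state)                       # quick-response device
    return choice if choice in allowed else dps(state)
```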
4.4 Experimental Method
For this problem, it is assumed that the method of reaching each goal (the shortest path) has been learned in advance as lower-level learning. That is, the (shortest) way to reach each target (A, B, C, D) from each state (s0 to s35) is known. In the environment shown in Fig. 2, the number of actions required to reach each goal from the state s0 is 6 in all cases. The initial value of each desire level was set to 100, and the experiment was conducted assuming four different decrement amounts, 1, 3, 5, 7, by which a desire level is reduced each time the agent outputs one action. Specifically, desire levels A and B are fixed at 1, C varies from 1 to 7, and D varies between the value of C and 7. For each of these combinations of desire level decrements, the amount of reward for achieving a goal, that is, the amount of increase in the desire level reached when each goal was attained, was prepared in 5 variants: 20, 40, 60, 80, 100. That is, a total of (4 + 3 + 2 + 1) · 5⁴ = 6,250 experiments were performed. In this problem setting, the minimum priority strategy (MPS) is effective in many experiments. So, we first try MPS, which acts as a quick-response device, and if it does not work effectively, we use a contemplation device. This makes it possible to verify how large a change in desire levels the proposed method can follow when there are four types of rewards, A, B, C, and D.
4.5 Results and Discussion
To confirm the effectiveness of DPS, we compared it with random selection and with DQN. Among all 6,250 experiments, the average value over 30 experiments in which random selection (random), DPS and DQN succeeded while MPS failed is shown in the upper row of Table 1, and the standard deviation in the lower row. In addition to the case where DQN gives the same value to the reward and the penalty, an experiment in which the reward is given a value 100 times the penalty (P100M) and an experiment learning only with the penalty, without rewards (P0M), were executed.
Table 1. Comparison of the number of successes: random selection, DPS, and the DQN variants.

The number of successes | Random | DPS   | DQN  | DQN (P100M) | DQN (P0M)
------------------------|--------|-------|------|-------------|----------
Ave.                    | 101.9  | 106.3 | 99.5 | 100.3       | 103.5
S.D.                    | 3.53   | 5.14  | 4.55 | 4.39        | 4.79
These results were tested at a significance level of 5% using a two-sample t-test; significant differences were found between DPS and everything else, and between random and DQN. Regarding DQN (P0M), significant differences were confirmed with respect to DQN and DQN (P100M). From this, it was confirmed that DQN is inferior to random, whereas DQN (P100M), whose reward is 100 times the penalty, has the same performance as random. In addition, it was confirmed that DQN (P0M), which learns only from the penalty, is superior to DQN and DQN (P100M), but shows no difference from random and DPS. In this way, it was found that the performance of DQN changes greatly depending on the reward and penalty values. On the other hand, it was confirmed that DPS can learn stably without being affected by such reward and penalty values.
5 Conclusion
A normal reinforcement learning system represented by Q-learning handles positive rewards and negative rewards at the same time, but there is a problem that appropriate reward and penalty values must be designed for each problem. In this paper, we applied the idea of XoL, which treats rewards and penalties as independent signals without setting values, to deep exploitation-oriented learning. As a deep exploitation-oriented learning method that can handle rewards and penalties at the same time, we proposed Deep Profit Sharing and confirmed the effectiveness of the proposed method by numerical experiments. In the future, we plan to apply it to actual problems and extend it to a multi-agent environment.
References
1. Miyazaki, K.: Exploitation-oriented learning XoL: a new approach to machine learning based on trial-and-error searches. In: Multi-Agent Applications with Evolutionary Computation and Biologically Inspired Technologies, pp. 267–293 (2010)
2. Miyazaki, K.: Proposal and evaluation of deep exploitation-oriented learning under multiple reward environment. Cognitive Syst. Res. 70, 29/39 (2021)
3. Miyazaki, K.: Exploitation-oriented learning with deep learning – introducing profit sharing to a deep Q-network. J. Adv. Comput. Intell. Intell. Inform. 21(5), 849/855 (2017)
4. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., et al.: Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop 2013 (2013)
5. Watkins, C.J.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8, 55/68 (1992)
Generalized Structure of Active Speech Perception Based on Multiagent Intelligence

Zalimkhan Nagoev, Irina Gurtueva(B), and Murat Anchekov

The Federal State Institution of Science Federal Scientific Center, Kabardino-Balkarian Scientific Center of the Russian Academy of Sciences, I. Armand Street, 37-a, 360000 Nalchik, Russia
[email protected]
Abstract. Recent success in the field of speech technology is undoubted. Developers from Microsoft and IBM have reported the efficiency of automated speech recognition systems at the human level in transcribing conversational telephone speech; according to various estimates, their WER is now about 5.1–5.8%. However, the most challenging problems in speech recognition – diarization and noise cancellation – are still open. A comparative analysis of the most frequent errors made by systems and by people when solving the recognition problem shows that, in general, the errors are similar. However, errors made by a human in speech recognition are much less critical: they seldom distort the meaning of a statement. In other words, these errors are not semantic. That is why the mechanisms of human speech perception are the most promising area of research. This paper proposes a model of the general structure for a theory of active auditory perception, together with the neurobiological basis of the hypothesis put forward. The proposed concept is a basic platform for a general multiagent architecture. We assume that speech recognition is guided by attention even in its early stages, with the early auditory code changing according to context and experience. This model simulates the involuntary attention used by children in mastering their native language, based on an emotional assessment of perceptually significant auditory information. The multiagent internal dynamics of auditory speech coding can provide new insights into how hearing impairment can be treated. The formal description of the structure of speech perception can be used as a general theoretical basis for the development of universal systems for automatic speech recognition that are highly effective in noisy conditions and cocktail-party situations. Multiagent systems are the formal means for the software implementation of the present model.

Keywords: Automatic speech recognition · Speech perception · Psycholinguistic models · Multiagent systems
1 Introduction

The efficiency of modern speech recognition systems has been significantly improved with deep learning methods [1–7]. However, implementations of these algorithms are characterized by high latency [7]. Deep learning methods are also currently used to solve the problem of diarization, but they are limited to speakers whose voices are used
at the stage of preliminary training and are unstable under operating conditions when the number of speakers increases. The deep attractor network successfully performs source separation based on projecting the time-frequency representation of a mixed audio signal into a multidimensional space where the speakers' representations are separated more clearly [6], but this algorithm also has high latency. A new online implementation of a deep attractor network, carried out at Columbia University [8], seems promising. But so far, the task of universal speech separation remains one of the most difficult speech processing problems. A comparison of the effectiveness of a human and the Deep Speech 2 model from Baidu at different SNR (signal-to-noise ratio) levels showed that the percentage of automatic recognition errors increases by at least a factor of eight [9]. Classic methods using multiple microphones do not show high reliability, and we may say that noise suppression in speech recognition is still an open problem. Modern concepts of human speech perception need to move beyond the framework of the cortical-centric approach. Conceptualizing speech perception as a passive process of mapping acoustic patterns onto neural phonetic representations is insufficient; it is necessary to view speech coding as plastic and attention-driven already in the early stages of processing. Recent studies have shown that speech perception results from direct and inverse interactions between the multiple brain regions involved in sound processing, including descending projections [10, 11]. This could provide new options for the treatment of hearing impairment with augmentation, or expand therapeutic treatments.
2 A Brief Review of Model Representations of Speech Perception in Modern Psycholinguistics
Psycholinguistic models of speech perception vary widely: from generalized descriptive models using the postulates of general communication theory [12], to mathematical models that attempt to formalize the recognition process in rigid mathematical form, to simulation models based on research into the cognitive processes involved in speech processing [13]. The present review discusses the model representations that have had the most significant impact on subsequent developments in this area and satisfy the feasibility criterion.
One of the well-known models in the analysis of speech perception is the logogen model, based on evidence for the phenomenon of parallel lexical activation [14]. This system is a thesaurus - an information structure that covers all possible information about a word: meaning; phonological, morphological, derivational, and syntactic features; frequency rank; and even features of admissible contexts. All available features and parameters of the analyzed signal are evaluated against the information in the dictionary. If the total sum of the parameters exceeds a certain critical threshold, the logogen is activated and a decision is made to recognize the word. The disadvantages of this model are the interaction of features of different levels at an early stage of the perceptual process and the lack of a clear mechanism for integrating different types of information. It is also unclear how the recognition process unfolds in time and what the structure of the vocabulary is. But it offered a representation of the parallel processing
of speech information, which behavioral research considers the main content of speech recognition [15–17].
The cohort model is based on the fact, verified in experiments on semantic activation, that the perceptual processing of words is carried out "from left to right" [18]. Recognition proceeds in three stages. At the access stage, the system activates a set of words from the internal dictionary that match the initial elements of the analyzed word, forming a cohort. At the selection stage, all words that mismatch the incoming signal in more than one feature are excluded from the cohort. At the integration stage, the features of the words remaining in the cohort are tested for integrability with higher levels of linguistic knowledge. Upon reaching the "recognition point," that is, when the feature sequence becomes unique, the decision is made without further reference to the phonetic characteristics of the word. Behavioral research shows, however, that a human can recognize a word even when it mismatches acoustically or contextually [18], whereas the exclusion of inappropriate words from the cohort leaves the model unable to correct errors arising from such inconsistencies. It has also been shown that listeners recognize high-frequency words more easily than low-frequency ones [19], but the cohort model does not account for frequency effects.
The cohort model served as the basis for a whole family of models of sequential narrowing of the search class (the so-called tuple models). For example, Shillcock, Taft, and Hambly modified the cohort model [20] by declining to stop the phonetic analysis after the recognition point. In the Neighborhood Activation Model [21], the tuple is formed by quasi-homonyms: it is composed of words that match the input not only in their initial elements but in any features, from the point of view of phonetic similarity. The principles of the logogen model and the cohort model are synthesized in Shortlist [22]. In this approach, a reconfigurable perceptual setting, having analyzed the acoustic content of the signal, provides a tuple of lexical units; the word recognition decision is then made using word frequency and context-limiting information. Formally, ambiguity is removed by adjusting the weight coefficients of the recognition thresholds according to word frequency.
Minerva 2 is a mathematical model of episodic memory [23–25]. It was developed on the assumption that primary memory, as temporary storage, is coupled to secondary memory. The primary memory sends the analyzed signal (the "probe") to the secondary memory and receives a response (the "echo"). When a probe is sent to long-term memory, only one echo is returned: all traces in the secondary memory are activated and respond simultaneously, so the echo is a combination of their messages. A trace's contribution to the echo is determined by its degree of activation, so the most significant contribution is made by traces that are most similar to the probe. When the presentation of the same word is repeated many times (so that there are multiple tokens of the same word), the echo reveals the most common aspects of these traces. Although Minerva 2 is a purely episodic model, it thus simulates abstraction-like behavior through the blending of probes and stored traces. Unfortunately, the model offers no solution to the continuous speech recognition problem.
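As a rough illustration of the probe-echo computation just described, the following Python sketch blends stored traces in proportion to the cube of their similarity to the probe (a common reading of Hintzman's formulation [23]); the vector encoding and all names are our illustrative assumptions, not the model's original implementation.

```python
import numpy as np

def echo(probe: np.ndarray, traces: np.ndarray) -> np.ndarray:
    """Return the Minerva 2 echo for a probe against all stored traces."""
    # Similarity of the probe to each trace: normalized dot product.
    sims = traces @ probe / probe.size
    # Activation is the cubed similarity, so close matches dominate
    # while the sign of each similarity is preserved.
    acts = sims ** 3
    # The echo is the activation-weighted sum of all traces.
    return acts @ traces

# Toy usage: three noisy tokens of one "word" plus an unrelated trace.
rng = np.random.default_rng(0)
word = rng.choice([-1, 1], size=16)
tokens = [np.where(rng.random(16) < 0.1, -word, word) for _ in range(3)]
tokens.append(rng.choice([-1, 1], size=16))
print(np.sign(echo(word, np.array(tokens))))
```

Because repeated tokens of the same word reinforce their shared features, the echo behaves like a prototype even though only episodes are stored, which is the blending behavior noted above.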
The mutual activation model (TRACE) is a hierarchically structured network whose nodes are distinctive features, phonemes, and words [13]. A feature node activated by an acoustic signal brings all phonemes possessing that feature into an activated state; phonemes, in turn, activate all words whose exponents contain the given phoneme. A phoneme or word is identified when its activation level exceeds the activation of all competing phonemes or words. Quantitative formulas are proposed for calculating the probability of choosing a particular unit for given parameters of the system and the input signal. The TRACE model provides no special segmentation procedures; the system examines the probability of a new word beginning at each "phonemic" point of the speech chain. A simplified sketch of this bottom-up activation cascade is given at the end of this section.
Obviously, modern recognition models differ significantly in their details, since modern psychology and psycholinguistics have not arrived at a single view of the unit, mechanisms, and structure of speech perception. Many experiments yield inconsistent results, which can be explained by the significant complexity of the problem itself and the difficulty of developing adequate research methods.
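A deliberately minimal sketch of the feature-to-phoneme-to-word activation cascade might look as follows; the toy lexicon, feature bundles, and winner-margin decision rule are our illustrative stand-ins for the quantitative formulas of the original model [13].

```python
# Toy TRACE-style cascade: features excite phonemes, phonemes excite
# words containing them, and a word is identified once its activation
# clearly exceeds that of every competitor.
PHONEME_FEATURES = {            # illustrative feature sets, not real phonetics
    "k": {"stop", "velar"},
    "a": {"vowel", "open"},
    "t": {"stop", "alveolar"},
}
LEXICON = {"cat": ["k", "a", "t"], "at": ["a", "t"]}

def recognize(feature_stream, margin=0.5):
    word_act = {w: 0.0 for w in LEXICON}
    for features in feature_stream:           # one feature bundle per time slice
        for ph, ph_feats in PHONEME_FEATURES.items():
            ph_act = len(features & ph_feats) / len(ph_feats)
            for word, phones in LEXICON.items():
                if ph in phones:              # phoneme excites containing words
                    word_act[word] += ph_act
    best = max(word_act, key=word_act.get)
    runner_up = max(a for w, a in word_act.items() if w != best)
    return best if word_act[best] > runner_up + margin else None

print(recognize([{"stop", "velar"}, {"vowel", "open"}, {"stop", "alveolar"}]))
# -> 'cat'
```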
3 Cognitive Architecture of the Process of Speech Perception Based on Multiagent Recursive Intelligence
The latest behavioral and human neuroimaging results, together with analysis of the problems of automatic speech recognition, require treating auditory perception as an active process that includes cognitive processing of sound information and different types of learning [26]. The theoretical foundations of the development proposed in this work rest on modern research in cognitive psychology and neurobiology. The implementation tool is multiagent systems, whose construction method, based on ontoneuromorphogenesis, is described in detail in [27]. The multiagent conceptualization of active speech perception assumes that adaptive perception is possible when auditory information is processed at four levels: preliminary recognition (level I), subconscious recognition (level II), conscious recognition (level III), and the situation level (level IV). The first level of the architecture - "preliminary recognition" - simulates the functioning of the human peripheral auditory system. At this stage of processing, features are extracted and a vector of spectral and dynamic features is formed that sufficiently characterizes the acoustic properties of the signal [28–30]. The feature vector is fed to the input of a multiagent recursive cognitive architecture [27], in which the developers have previously formed a set of neuro-factories - agents of a special type that create intellectual agents on demand and determine their type. The second level is activated by a request from an agent-actor to a neuro-factory to create an intellectual agent-grapheme. Such agents are created by a dedicated program that reads the agent's genome - a starting set of production rules in the agent's knowledge base [27]. In other words, the genome is the a priori information on the significance of events for the fulfilment of the intellectual agent's objective function. The multiagent system assigns an additional feature to the agent-actor - an emotional coloring. The system also marks the agent-actor as a prototype or non-prototype, based on experimental data on the positive perception of prolonged vowels and the concentration of attention on sounds with a low volume level
[31]. The prototype/non-prototype parameter is a binary feature, and emotional coloration is defined in the range from 0 to 1. Thus, at this stage the knowledge base of the agent characterizing the phoneme consists of spectral-temporal features together with an emotional coloration and a prototype/non-prototype mark. These parameters serve to exclude or weight auditory significance and to interpret context in one-to-many mapping situations. Then, in order to find the class to which it belongs, the agent asks the expert a question. Based on the expert's answer, a contract is concluded between the first-level agent-actor and the agent-grapheme characterizing the given class; that is, supervised machine learning is implemented. The set of features is supplemented with a classifying contractual relationship with the grapheme agent, and the agent is placed in the perceptive space [31]. The prototype/non-prototype feature and the emotional assessment determine not only the spatial position of the agent in the multiagent perceptive space but also the agent's initial lifetime in the system. Correct settings of the lifetime parameter allow the system to remain adaptive without losing stability; they also make it possible to investigate the effectiveness of children's acquisition of new knowledge and the mechanisms of memory [31]. Evaluating the prototype/non-prototype feature in accordance with the statistical regularities of the input data allows one to shift the agent's position in the feature space and to simulate the plasticity of listeners' perception: sensitivity to the distributional patterns of speech syllables [32] and to correlations between the acoustic characteristics that define units [33]. Emotional evaluation in the early stages of processing allows the formation of a primary contextual framework and becomes the key to solving the classification problem where first-order predicate logic is not enough. This stage of processing simulates auditory information processing in the brain's subcortical regions, corresponding to procedural learning, in which categorization rules are difficult to verbalize or non-verbal. It is shown in [34] that, when solving problems of integrating information from several dimensions at the stages preceding decision making, dopamine-mediated reward signals provided by the basal ganglia are used. Recent research has shown that the nuclei of the basal ganglia are connected with large areas of the cerebral cortex [34] and play an important role not only in motor activity but also in cognitive functions and in a wide range of learning problems, including perceptual categorization [35]. Research on general learning suggests that several learning systems and their corresponding neural groups exist, including in subcortical structures [35]. Studying the involvement of subcortical learning systems is essential for developing holistic neurobiological models of speech categorization, since modern neurobiological and theoretical models of speech processing focus mainly on the cerebral cortex.
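To make the second processing level concrete, the following hypothetical data structure sketches the knowledge base of such an agent; the field names, the lifetime rule, and the decay step are our assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PhonemeAgent:
    """Second-level agent-actor: one hypothesized unit of auditory code."""
    features: list            # spectral-temporal feature vector
    emotion: float            # emotional coloration in [0, 1]
    prototype: bool           # binary prototype/non-prototype mark
    grapheme: Optional[str] = None   # class fixed by contract with a grapheme agent
    lifetime: float = field(init=False)

    def __post_init__(self):
        # Illustrative rule: prototypical, emotionally salient agents start
        # with a longer lifetime, keeping the system adaptive yet stable.
        self.lifetime = 10.0 * (1.0 + self.emotion) * (2.0 if self.prototype else 1.0)

    def tick(self, dt: float = 1.0) -> bool:
        """Age the agent; return False once it should leave the system."""
        self.lifetime -= dt
        return self.lifetime > 0.0

agent = PhonemeAgent(features=[0.3, 0.8], emotion=0.7, prototype=True)
agent.grapheme = "a"          # supervised step: contract concluded via the expert
```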
At the third level - the level of "conscious recognition" - the agents of the previous layer are grouped around significant objects and actions. We believe that for reliable speech recognition it is necessary to establish a connection between the spectral characteristics of the signal and the underlying "articulatory event." An articulation event is a pair of agents, one of which identifies an object and the other an action, as
well as their contract, that is, the dynamic connection that served as the source of the sound. At this stage of training, the expert comments on audio signals in natural language: "The car has passed," "John said," et cetera. The choice of the articulatory event and its connection with the spectral characteristics of an utterance as an independent object of analysis agrees well with the central postulate of the motor theory of speech perception, namely that a person identifies non-verbal sound information as an event [27]. This level imitates declarative learning, in which the studied categories are defined explicitly and verbally; hypotheses are consciously advanced and tested, mediated by the anterior cingulate gyrus and the prefrontal cortex. At the fourth level - "situation" [27] - articulatory events receive a general emotional assessment and forecasts are made, whose outcomes, via feedback, allow categories to be refined or reorganized. Thus, the proposed model of the cognitive mechanism of speech perception makes it possible to include in the signal analysis procedure all aspects of a speech message, including the extralinguistic component, expressed in this approach in terms of an event and a situation. Each subsequent level of structuring the signal into the hierarchy of word elements, words, phrases, et cetera imposes additional constraints, such as known word pronunciations or permitted word sequences, which compensate for errors and uncertainties at lower levels.
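A minimal sketch of the articulatory event as defined above, that is, an object agent and an action agent bound by a contract; the class layout and all names are purely illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArticulatoryEvent:
    """Third-level unit: an object-action pair plus the contract binding them."""
    obj: str        # agent identifying the sound source, e.g. "car", "John"
    action: str     # agent identifying what the source did, e.g. "passed", "said"
    contract: str   # the dynamic connection that produced the sound

# The expert comment "The car has passed" grounds one such event:
event = ArticulatoryEvent(obj="car", action="passed", contract="car-passing noise")
# At the fourth level, a "situation" would attach an overall emotional
# assessment and a forecast to events like this, refining categories
# through feedback.
```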
4 Conclusion
It is not enough to consider speech perception as the imprinting of acoustic patterns into phonetic representations; the strategies a person uses when coding speech necessarily include cognitive processing even in the early stages of analysis. In this paper, the main principles of an integral approach to constructing a generalized mechanism of speech perception, taking linguistic and neurobiological data into account, have been determined. Elements of an active cognitive model of speech recognition, which takes into account plasticity and the effect of directed attention, were developed using a multiagent recursive architecture. The proposed conceptualization of adaptive processing includes context-dependent changes in the perceptual system.
Acknowledgments. The research was supported by the Russian Foundation for Basic Research, grant No. 19-01-00648.
References
1. Hershey, J.R., Rennie, S.J., Olsen, P.A., Kristjansson, T.T.: Super-human multi-talker speech recognition: a graphical modeling approach. Comput. Speech Lang. 24, 45–66 (2010)
2. Weng, C., Yu, D., Seltzer, M.L., Droppo, J.: Single-channel mixed speech recognition using deep neural networks. In: Proceedings IEEE ICASSP, pp. 5632–5636 (2014)
3. Matsoukas, S., et al.: Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system. IEEE Trans. Audio Speech Lang. Process. 14, 1541–1556 (2006)
4. Evermann, G., et al.: Development of the 2003 CU-HTK conversational telephone speech transcription system. In: Proceedings IEEE ICASSP, vol. 1, p. I-249 (2004)
5. Glenn, M.L., Strassel, S.M., Lee, H., Maeda, K., Zakhary, R., Li, X.: Transcription methods for consistency, volume and efficiency. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC, pp. 2915–2920 (2010)
6. Hannun, A.: Writing about Machine Learning. https://awni.github.io/speech-recognition/. Accessed 21 Aug 2021
7. Han, C., O'Sullivan, J., Luo, Y., Herrero, J., Mehta, A.D., Mesgarani, N.: Speaker-independent auditory attention decoding without access to clean speech sources. Sci. Adv. 5(5), 1–11 (2019). https://doi.org/10.1126/sciadv.aav6134
8. Amodei, D., et al.: Deep Speech 2: End-to-end speech recognition in English and Mandarin. arXiv preprint arXiv:1512.02595. Accessed 11 May 2020
9. Galbraith, G.C., Arroyo, C.: Selective attention and brainstem frequency-following responses. Biol. Psychol. 37, 3–22 (1993)
10. Giard, M.-E., Collet, L., Bouchet, P., Pernier, J.: Auditory selective attention in the human cochlea. Brain Res. 633, 353–356 (1994)
11. Sakharny, L.V.: Introduction into Psycholinguistics. Publishing House of Leningrad University, Leningrad (1989). [Sakharny, L.V.: Vvedeniye v psikholingvistiku. Izdatel'stvo Leningradskogo Universiteta, Leningrad (1989)]
12. Ventzov, A.V., Kasevich, V.B.: Problems of Speech Perception. Editorial Publishing House, Moscow (2003). [Ventzov, A.V., Kasevich, V.B.: Problemy Vospriyatia Rechi. Izdatel'stvo Editorial, Moscow (2003)]
13. Morton, J.: The integration of information in word recognition. Psychol. Rev. 76, 165–178 (1969)
14. Marslen-Wilson, W.D.: Functional parallelism in spoken word-recognition. Cognition 25, 71–102 (1987)
15. Marslen-Wilson, W.D.: Activation, competition and frequency in lexical access. In: Altmann, G.T.M. (ed.) Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives, pp. 148–172. MIT Press, Cambridge (1990)
16. Marslen-Wilson, W.D., Brown, C.M., Tyler, L.K.: Lexical representations in spoken language comprehension. Lang. Cogn. Process. 3, 1–16 (1988)
17. Cole, R.A.: Listening for mispronunciations: a measure of what we hear during speech. Percept. Psychophys. 1, 153–156 (1973)
18. Taft, M., Hambly, G.: Exploring the cohort model of spoken word recognition. Cognition 22, 259–328 (1986)
19. Bard, E.G., Shillcock, R.C., Altmann, G.E.: The recognition of words after their acoustic offsets in spontaneous speech: evidence of subsequent context. Percept. Psychophys. 44, 395–408 (1988)
20. Luce, P.A.: A computational analysis of uniqueness points in auditory word recognition. Percept. Psychophys. 39, 155–158 (1986)
21. Norris, D.: Shortlist: a connectionist model of continuous speech recognition. Cognition 52, 189–234 (1994)
22. Massaro, D.W., Cohen, M.M.: The paradigm and the fuzzy logical model of perception are alive and well. J. Exp. Psychol. 122(1), 115–124 (1993)
23. Hintzman, D.L.: Minerva 2: a simulation model of human memory. Behav. Res. Methods Instrum. Comput. 16(2), 96–101 (1984)
24. Hintzman, D.L., Block, R., Inskeep, N.: Memory for mode of input. J. Verb. Learn. Verb. Behav. 11, 741–749 (1972)
25. Heald, S.L.M., Van Hedger, S.C., Nusbaum, H.C.: Understanding Sound: Auditory Skill Acquisition. https://www.researchgate.net/publication/316866628_Understanding_Sound_Auditory_Skill_Acquisition. https://doi.org/10.1016/bs.plm.2017.03.003. Accessed 12 June 2020
26. Nagoev, Z.V.: Intellectics, or Thinking in Living and Artificial Systems. Publishing House KBSC RAS, Nalchik (2013). [Nagoev, Z.V.: Intellektika ili myshleniye v zhyvych i iskusstvennych sistemach. Izdatel'stvo KBNC, Nal'chik (2013)]
27. Nagoev, Z., Lyutikova, L., Gurtueva, I.: Model for automatic speech recognition using multiagent recursive cognitive architecture. In: Annual International Conference on Biologically Inspired Cognitive Architectures BICA, Prague, Czech Republic. https://doi.org/10.1016/j.procs.2018.11.089
28. Nagoev, Z., Gurtueva, I., Malyshev, D., Sundukov, Z.: Multi-agent algorithm imitating formation of phonemic awareness. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 364–369. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_47
29. Nagoev, Z.V., Gurtueva, I.: Fundamental elements for cognitive model of speech perception mechanism based on multiagent recursive intellect. News of Kabardino-Balkarian Scientific Center of RAS 3(89), 3–14 (2019). [Nagoev, Z.V., Gurtueva, I.A.: Bazovye elementy kognitivnoi modeli mehanizma vospriyatiya rechi na osnove multiagentnogo rekursivnogo intellekta. Izvestiya Kabardino-Balkarskogo nauchnogo tsentra RAN (89), 3–14 (2019)]
30. Nagoev, Z., Gurtueva, I.: Multiagent model of perceptual space formation in the process of mastering linguistic competence. Adv. Intell. Syst. Comput., 327–334. https://doi.org/10.1007/978-3-030-65596-9_39
31. Maye, J., Werker, J.F., Gerken, L.: Infant sensitivity to distributional information can affect phonetic discrimination. Cognition 82(3), B101–B111 (2002)
32. Holt, L.L., Lotto, A.J.: Behavioral examinations of the level of auditory processing of speech context effects. Hear. Res. 167(1–2), 156–169 (2002). https://doi.org/10.1016/S0378-5955(02)00383-0
33. Lim, S.-J., Fiez, J.A., Holt, L.L.: How may the basal ganglia contribute to auditory categorization and speech perception? Front. Neurosci. 8, 1–18 (2014)
34. Ashby, F.G., Maddox, W.T.: Human category learning. Annu. Rev. Psychol. 56, 149–178 (2005)
35. Elman, J.L., McClelland, J.L.: Exploiting lawful variability in the speech wave. In: Perkell, J.S., Klatt, D.H. (eds.) Invariance and Variability in Speech Processes, pp. 360–385. Lawrence Erlbaum Associates, Inc., Hillsdale (1986)
Multiagent Neurocognitive Models of the Processes of Understanding the Natural Language Description of the Mission of Autonomous Robots
Z. V. Nagoev, O. V. Nagoeva, I. A. Pshenokova, K. Ch. Bzhikhatlov, I. A. Gurtueva(B), and S. A. Kankulov
The Kabardino-Balkarian Scientific Center of the Russian Academy of Sciences, I. Armand Street, 37-a, 360000 Nalchik, Russia
Abstract. The article is devoted to the development of neurocognitive models of the processes of understanding natural language statements that describe the goals, essential conditions, and course of the mission of autonomous robots. Such models serve to provide a dialogue interface in human-robotic teams. The key problem of automatically forming a plan for achieving goals common to all members of such a team is solved using algorithms that synthesize the optimal path in a single decision graph, relying on the self-organization processes occurring in the multi-agent neurocognitive architecture we have developed. This approach makes it possible, when constructing the desired plan, to take into account the semantics of natural language messages and the implicit information they contain. Keywords: Artificial Intelligence · Cognitive models · Multi-agent systems
1 Introduction
The problem of dialogue control of robots originated long ago and was associated with the realization of the fundamental complexity of setting missions for autonomous mobile robots [10, 16–18]. One of the first and subsequently most common solutions to the problem of dialogue with a robot was an approach based on previously prepared dialogue scenarios [11–14]. The construction of dialogue-system interpreters based on developed subject-area ontologies made it possible to replace rigid procedures with strategies that allow for variability of specific conditions within semantically significant units of interpretation correlated with the composition of the ontologies [15]. Only recently have works begun to appear that postulate the need for a robot to understand not only the structure of the environment and the composition of partners in the human-machine team, but also the context of joint problem solving when forming interpretations of natural language statements [19, 20, 24]. To identify contexts based
on several modalities, probabilistic [1], fuzzy [21, 25], semantic (based on semantic graphs) [22], automata-based [9], and other methods are used. At present, one of the main directions in organizing the joint work of human-machine teams with dialogue control is the use of cognitive architectures [2, 23]. Our approach to choosing the main metaphor for designing a system of dialogue control of collective behavior consists in the synthesis of cognitive architectures and multi-agent systems based on the concept of an intelligent agent [3, 4, 8]. The proposed approach uses the so-called multi-agent neurocognitive architecture, constructed using the principles of self-organization of the neural tissue of the brain [5]. In such an architecture, the active elements of cognitive nodes are software agents with local target functions, which these agents, imitating brain neurons, maximize by cooperating with each other through message exchange. For the experiments in this ongoing work, a mobile autonomous robot OFFICE-COMPANION-01 (O-COM) (Fig. 1), equipped with an anthropomorphic torso and manipulators with hands, was used. The intelligent control system of the robot was developed using IntellectOn, an editing program for multi-agent neurocognitive architectures.
Fig. 1. Autonomous mobile robot O-COM
2 The Problem of the Natural Language Description of the Missions of Autonomous Robots
A mission is a certain change in the "operator-autonomous robots-environment" system, planned by the operator, which the operator wants to achieve by organizing coordinated collective actions of autonomous robots over a certain time horizon. In most cases it is not possible to specify rigid behavior algorithms for autonomous robots to achieve mission goals, owing to the enormous labor costs involved. Moreover, such an approach does not guarantee mission fulfillment, since the slightest change in the initial mission scenario breaks the integrity of the constructed algorithms. The situation can be corrected if the goals, conditions, and limitations of the mission are determined without specifying concrete behavior algorithms for each of the autonomous robots involved in its implementation. This level of generalization, in our
opinion, is achievable by introducing the design metaphor of an intelligent agent - a simulation model of an artificial life system immersed in a real environment by means of some physical body (natural or artificial) and controlled by a self-developing multi-agent neurocognitive architecture. The concept of such an intelligent agent was introduced in [4]. In particular, it was shown that such intelligent agents are able to solve problems associated with a change in the state of the "agent-environment" system by synthesizing their behavior over a certain time interval in the future, up to a certain planning horizon. Both people (operators) and autonomous (including mobile) robots can be considered such intelligent agents. The basic principles of formalizing the semantics of natural-language statements using multi-agent neurocognitive architectures are outlined in [4–7]. The unity of the semantics of the natural-language mission description arises as a consequence of the unity of interpretation of the systemic bases of agent behavior as a model of artificial life. The formal basis for the possibility of such an interpretation is that intelligent agents treat the behavior of other intelligent agents in the system as fragments of the decision graph on the way to the target state of the mission. Such a graph is, in the general case, a dynamic formation built by an intelligent agent in order to identify and resolve specific problems. The states of the "intelligent agent-environment" system act as its vertices, and the actions performed by the environment and by the intelligent agent itself act as its arcs.
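Read literally, the decision graph described above is a labeled directed graph over states of the "intelligent agent-environment" system, on which plan synthesis reduces to an optimal-path search. A compact sketch follows; the states, actions, and costs are our invented examples, not the architecture's internal format.

```python
import heapq

# Vertices are states of the "intelligent agent-environment" system;
# arcs are actions (by the agent or the environment) labeled with costs.
GRAPH = {
    "at_dock":    [("drive_to_kitchen", "in_kitchen", 2.0)],
    "in_kitchen": [("load_drinks", "loaded", 1.0)],
    "loaded":     [("drive_to_office", "at_office", 3.0)],
    "at_office":  [("hand_over", "mission_done", 1.0)],
}

def plan(start, goal):
    """Synthesize the optimal action sequence by Dijkstra search."""
    frontier, seen = [(0.0, start, [])], set()
    while frontier:
        cost, state, actions = heapq.heappop(frontier)
        if state == goal:
            return actions, cost
        if state in seen:
            continue
        seen.add(state)
        for action, nxt, c in GRAPH.get(state, []):
            heapq.heappush(frontier, (cost + c, nxt, actions + [action]))
    return None, float("inf")

print(plan("at_dock", "mission_done"))
# (['drive_to_kitchen', 'load_drinks', 'drive_to_office', 'hand_over'], 7.0)
```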
3 Synthesis of a Multi-agent Mission Model Based on Neurocognitive Architectures
Since the proposed multi-agent neurocognitive architecture of an intelligent agent's decision-making and control system is recursive, conceptual representations held by this intelligent agent can be formalized in the form of embedded models of the multi-agent neurocognitive architectures of other agents. This approach makes it possible to conceptualize the representations of intelligent agents about each other and about themselves within those representations. Such a possibility is of fundamental importance for ensuring natural language communication based on a dialogue between the operator and intelligent agents in the process of setting and executing a mission. When setting the problem, the operator, using a limited subset of natural language, informs the agent of the goals, conditions, and constraints of the mission. The intelligent agent interprets this information with the help of the formalization performed by the multi-agent neurocognitive architecture during its translation from a linguistic representation into a mathematical one, thus determining the vertices and arcs of the problem-situation graph. When constructing the graph, the intelligent agent may lack information; having identified the missing vertices and arcs using the multi-agent neurocognitive architecture, it performs the reverse formalization and formulates a request to the operator in natural language. Once all the missing elements of the mission description have been agreed upon and the constructed plan approved by the operator, the intelligent agent proceeds to its implementation. Figure 2 shows the
elements of a control multi-agent neurocognitive architecture, which includes agents-neurons performing the functional representation of individual words of natural language (volumetric figures with numerous "hairs") and of the concepts associated with these words (full balls are objects, incomplete balls are subjects of actions, polygons are actions). The architecture also includes agents-neurons that control the construction of the problem-situation graph and the execution of the plan for its resolution ("hearts" are evaluative agents-neurons, arrows are agents-neurons controlling actions, "flags" are agents-neurons controlling synthesis and goal setting).
Fig. 2. A fragment of a multi-agent neurocognitive architecture that interprets statements in natural language.
4 Implementation
Our robot moves around the floor of an office building, delivering several types of coffee and tea, sweets, and cookies to employees in various offices. Missions are set for the robot using a dialogue system whose lexicon can be expanded through training. The user (operator) formulates tasks in a general form, detailed to the level at which the mission statement would be understandable to an ordinary person. When the robot cannot complete the problem-situation graph associated with the task because some information is missing, it asks the operator clarifying questions. In the example under study, the mission is set using the following dialogue.
Operator: "Deliver drinks to the employees at 13:10."
Robot: "Which employee should I deliver drinks to?"
Operator: "To everyone who is not at the meeting."
Robot: "Where is the meeting?"
Operator: "Meeting in office number 5."
Robot: "Understood the task. I'm getting started."
Using the knowledge accumulated by the intelligent control system in the multi-agent neurocognitive architecture, the robot interprets the operator's statements, activating individual agents as part of the neurocognitive architecture, performing the functional
representation of the concepts corresponding to the natural-language words in these statements. Interacting with each other to maximize their local target functions by exchanging messages, these agents-neurons form the functional states of the intelligent control system. Figure 3 shows a fragment of the decision graph built by the multi-agent neurocognitive architecture of the O-COM robot after processing the above mission description.
Fig. 3. Problem situation graph
In this figure, black denotes the states attributed by the neurocognitive architecture to the present, green to the past, and red to the future. Solid lines denote states that the intelligent agent considers reliable (confirmed by experience or by logical inference), and dashed lines denote states it considers merely possible (in the past, present, or future). The triangular elements in Fig. 3 indicate actions of the intelligent agent (robot) itself; the square ones indicate actions of other participants in the problem situation - partners in implementing the joint action plan. The given fragment of the problem-situation graph shows events whose occurrence was not explicitly specified in the initial mission description.
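The clarifying-question behavior in the dialogue above can be read schematically as slot completion over the mission description: the robot keeps asking until every vertex the plan needs is grounded. The slots and prompts below are our illustrative reconstruction, not the actual IntellectOn interface.

```python
REQUIRED_SLOTS = {"what": None, "when": None, "to_whom": None, "meeting_place": None}
PROMPTS = {
    "to_whom": "Which employee should I deliver drinks to?",
    "meeting_place": "Where is the meeting?",
}

def run_dialogue(initial, ask):
    """Fill missing mission slots by asking the operator clarifying questions."""
    slots = dict(REQUIRED_SLOTS, **initial)
    for name in slots:
        if slots[name] is None and name in PROMPTS:
            slots[name] = ask(PROMPTS[name])   # reverse formalization -> question
    print("Understood the task. I'm getting started.")
    return slots

mission = run_dialogue(
    {"what": "drinks", "when": "13:10"},       # parsed from the first utterance
    ask=lambda q: input(q + " "),
)
```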
5 Conclusion
As a result of this work, the basic principles of mission interpretation in dialogue systems for interaction with autonomous robots using multi-agent neurocognitive architectures have been formulated. It has been established that the use of multi-agent neurocognitive architectures allows an intelligent agent to complete the construction of the decision graph by automatically adding to it events related to the processing of information implicitly contained in the natural language description. It is shown that, in order to coordinate joint efforts to form and implement a plan for resolving a problem situation, an intelligent agent, using the knowledge and models accumulated by its controlling multi-agent neurocognitive architecture, completes the situation graph by including subgraphs describing the actual or hypothetical behavior of other intelligent agents: people, robots, and software agents.
Acknowledgments. The work was supported by RFBR grant No. 19-01-00648.
References
1. Awais, M., Henrich, D.: Human-robot collaboration by intention recognition using probabilistic state machines. In: IEEE 19th International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD 2010), Balatonfüred, Hungary, pp. 75–80 (2010)
2. Haikonen, P.: Consciousness and quest for sentient robots. In: Biologically Inspired Cognitive Architectures 2012, Proceedings of the Third Annual Meeting of the BICA Society, AISC, pp. 19–27. Springer, Cham (2012)
3. Hewitt, C.: Viewing control structures as patterns of message passing. Artif. Intell. 8(3), 323–364 (1977)
4. Nagoev, Z.V.: Intelligence, or Thinking in Living and Artificial Systems, 211 p. Publishing House of KBNTs RAS, Nalchik (2013)
5. Nagoev, Z.V., Nagoeva, O.V., Pshenokova, I.A.: Formal model of semantics of natural language statements based on multi-agent recursive cognitive architectures. Izvestiya KBSC RAS, Nalchik: Publishing House KBSC RAS, no. 4(78), pp. 19–31 (2017)
6. Nagoev, Z., Nagoeva, O., Gurtueva, I.: Multi-agent neurocognitive models of semantics of spatial localization of events. Cogn. Syst. Res. 59, 91–102 (2020)
7. Nagoev, Z., Gurtueva, I., Malyshev, D., Sundukov, Z.: Multi-agent algorithm imitating formation of phonemic awareness. AISC, vol. 948, pp. 364–369. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_47
8. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Russian translation: Williams Publishing House, Moscow (2006)
9. Shuai, I., Yushchenko, A.S.: Dialogue control system of a robot based on the theory of finite automata. Mechatronics, Automation, Control 20(11), 686–695 (2019). https://doi.org/10.17587/mau.20.686-695
10. https://www.researchgate.net/publication/220604947_Introduction_to_the_Special_Issue_on_Dialog_with_Robots
11. https://www.aclweb.org/anthology/W14-0207.pdf
12. https://cyberleninka.ru/article/n/rechevoy-dialog-s-kolloborativnym-robotom-na-osnovemnogomodalnoy-semantiki
13. https://www.ri.cmu.edu/pub_files/pub3/fong_terrence_w_2001_2/fong_terrence_w_2001_2.pdf
14. http://dspace.nbuv.gov.ua/bitstream/handle/123456789/58356/10-Budkov.pdf?sequence=1
15. https://web.eecs.umich.edu/~kuipers/research/ssh/human-robot-dialog.html
16. https://www.semanticscholar.org/paper/Exploring-Spoken-Dialog-Interaction-in-HumanRobot-Marge-Pappu/e6b3c3f7f231c6480729e5fad67c10b44c9025b4
17. https://www.semanticscholar.org/paper/Spatial-Representation-and-Reasoning-for-Kennedy-Bugajska/3665b4f1b22014b4f018bcf24dd1d36eb28d1911
18. https://web.stanford.edu/group/arl/cgi-bin/drupal/sites/default/files/public/publications/JonesR%202002.pdf
19. https://www.semanticscholar.org/paper/The-impact-of-adding-perspective-taking-to-spatialDogan-Gillet/dd1cd2de92afa6ddc8c122d3826091980454df4c
20. https://www.semanticscholar.org/paper/Open-Challenges-on-Generating-Referring-Expressions-Dogan-Leite/50ba7d33e88a4cf3b384659b77ae1c2cab69718f
21. https://elibrary.ru/item.asp?id=36818551
22. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4541855/
23. https://www.usna.edu/Users/cs/nchamber/pubs/multirobot-05.pdf
24. https://pdf.sciencedirectassets.com/282173/1-s2.0-S2212827119X00025/1-s2.0-S2212827119301970/main.pdf?X-Amz-Security-Token
25. https://elibrary.ru/item.asp?id=30674885
Exploring the Workspace of a Robot with Three Degrees of Freedom
Natalia Yu. Nosova1(B) and Sergey Yu. Misyurin1,2
1 Blagonravov Mechanical Engineering Research Institute of RAS, 4 Malyi Kharitonievski Pereulok, Moscow, Russia
2 National Research Nuclear University MEPhI, 31 Kashirskoe Shosse, Moscow, Russia
Abstract. In this paper, we consider the mechanical part of a robot controlled by artificial intelligence along three independent coordinates. Solutions of the kinematics problems are considered. The task is to determine the size and shape of the workspace of the robot's mechanical part by solving the inverse kinematics under given initial conditions. This is important for the motion planning problem, since all trajectories of the output link carrying the working tool (end-effector) of the robot must remain within the obtained workspace. It is possible to create quite complex mechanisms of both serial and parallel structure, but if the workspace of such a mechanism is small, its scope of application is significantly reduced. Each kinematic chain of the developed mechanism includes a spatial four-link mechanism (hinged parallelogram). This is a planar mechanism in which the kinematic pairs lie in a plane perpendicular to their axes of rotation, and all points of the links move along trajectories in parallel planes. Thus, the spatial four-link mechanism can be used both to transmit translational motion and to transmit rotation to the output link and/or the working tool. Keywords: Workspace · Kinematics · Singularity · Robot · Parallel mechanism · Hinge parallelogram · Mathematical modelling
1 Introduction
One of the main characteristics of a mechanism is its workspace - the space formed by the set of points that the working tool of the output link reaches as the input variables (parameters) change [1–10]. It is possible to create rather complex mechanisms of both serial and parallel structure, but if the workspace of the mechanism is small, its scope of application is significantly reduced. Therefore, when creating a new mechanism that is part of a robot controlled by artificial intelligence, special attention is paid to such an important factor as the workspace size and shape. This is important for the motion planning task, since all trajectories of the robot's output link must remain within the resulting workspace. The workspace size is determined analytically after compiling the constraint equations of the mechanism, which in some cases is a rather difficult task. For serial-structure mechanisms the direct kinematic analysis is solved quite simply, whereas for parallel-structure
mechanisms we have, as a rule, a complex system of nonlinear algebraic or trigonometric equations that cannot be solved analytically, only numerically. It is worth noting that when the mechanism approaches a singularity, numerical solution also becomes difficult: in the zone of a singular position, we may arrive at the degenerate case where the determinant of the Jacobian matrix of the system of constraint equations vanishes. Over the past decade, the workspaces of mechanisms have been explored by many researchers using various methods and algorithms: a spatial search approach for obtaining the workspace based on inverse kinematics [7]; a numerical search method [11]; an investigation of the reachable and dexterous workspaces based on the condition number of a homogeneous Jacobian matrix, with a quantitative workspace analysis [12]; a discrete boundary-searching method for calculating the workspace of a parallel manipulator, taking the driving and joint constraints into account in a polar coordinate system [13]; a chord-based method for constructing the maximum workspace of Isoglide-type mechanisms [14]; and a geometric analysis of the workspaces of two parallel robots with planes of symmetry [15]. In [16], the Monte-Carlo method was used to determine the workspace of a six-degree-of-freedom hybrid-parallel manipulator modeling a three-fingered human hand. A number of works also optimize parallel mechanisms with respect to the workspace: maximizing the workspace while maintaining dexterity [17]; maximizing the reachable workspace efficiency of a parallel mechanism [18]; using the spatial inscribed-ball method, with the inscribed-ball volume as the objective function [19]; and using the spatial inscribed-cuboid method to quantify and optimize the workspace of the Tricept parallel mechanism [20]. This article discusses a parallel-structure mechanism involving hinged parallelograms. Such mechanisms have a special feature in the formulation of the kinematics equations: some of the equations are linear, which simplifies the system of constraint equations as a whole. By solving the direct and inverse kinematic analysis, we determine the size and shape of the workspace. A method based on solving the inverse kinematics was chosen to determine the workspace of the mechanism.
2 A Problem Statement
Consider a parallel-structure mechanism with three degrees of freedom. It consists of three kinematic chains, each containing a linear motor, a hinged parallelogram, and rotational kinematic pairs whose axes are perpendicular to the axes of the hinged parallelogram (Fig. 1) [21]. The prototype chosen was the Orthoglide parallel mechanism with three degrees of freedom, developed by the French scientists Wenger Ph. and Chablat D., which belongs to the family of three-axis mechanisms with variable foot points and fixed strut lengths.
This mechanism has three identical kinematic chains of type PRPaR (where P, R, and Pa denote prismatic, revolute, and parallelogram joints, respectively) [22, 23]. The kinematic pairs (joints) can be driven by linear actuators or ball-screw drives. The output link is connected to the prismatic actuators via a set of three parallelograms, so it can move only translationally. This robot has a symmetrical structure and a fairly simple kinematic chain in which all connections have one degree of freedom. Due to its design, the Orthoglide robot has no singularities [24]. Owing to the prismatic pairs, the output link does not change the orientation of the working tool in space but only its position. To change the orientation of the working tool and extend the functional capabilities of the developed three-degree-of-freedom robot, the French scientists created a mechanism with five degrees of freedom by adding cardan shafts to the hinged parallelograms of two kinematic chains; a spherical mechanism with two degrees of freedom, in the form of a wrist, was arranged at the output link [25]. The disadvantage of this five-degree-of-freedom mechanism is the introduction of additional elements into the original design (cardan shafts and rotary actuators to drive them), which complicates the overall design and reduces its rigidity. The authors of this article proposed mounting a rotary actuator on the same axis as the linear actuator. In addition to the rotary actuator, the kinematic chain is equipped with an output rotational kinematic pair rigidly connected to the output link of the mechanism. In this case, each kinematic chain can be supplemented with a rotary actuator, and a spherical mechanism with two or three degrees of freedom can be arranged on the output link, yielding a mechanism with five or six degrees of freedom, respectively. These mechanisms are considered in detail in [26, 27]. For constructing the workspace, the three-degree-of-freedom mechanism, which is responsible for the spatial position of the output link and hence of the working tool, is of primary interest. The workspace of the added spherical mechanism with two or three degrees of freedom is sphere-shaped, so it is not included in the following calculations.
3 Model Design Description
Let us consider in detail the mechanism with three degrees of freedom (Fig. 1). The base and the moving platform are designed as triangles connected by three kinematic chains. So that no single kinematic chain bears all the weight of the structure, the three chains are arranged in the form of a pyramid. In the middle position, the entire mechanism is symmetrical (Fig. 1). Linear or rotary actuators can be located in each of the kinematic chains. The linear actuator is attached to the base by means of a vertically mounted rod. The moving platform is mated to the kinematic chains by rods connected to the hinged parallelograms. The spatial four-link mechanisms (hinged parallelograms) are connected to the rods of the linear actuators via rotational kinematic pairs (Bi and Ki, i = 1, 2, 3), whose axes are simultaneously perpendicular to the axes of the actuators and to the axes of the corresponding hinged-parallelogram kinematic pairs (Fig. 2).
Fig. 1. Structure scheme of a mechanism with four degrees of freedom: base (1), output link (2), working tool (3), linear actuators (4, 4′, 4″), initial rotational kinematic pairs (5, 5′, 5″), hinged parallelograms consisting of an initial link (6, 6′, 6″) and an end link (9, 9′, 9″), final rotational kinematic pairs (8, 8′, 8″), rotational kinematic pairs of the hinged parallelograms (7, 7′, 7″), final links of the kinematic chains (10, 10′, 10″); rotary actuator (11); output rotational kinematic pair (12) [21, 26].
Fig. 2. A spatial four-link mechanism with parallel rotation axes.
In [28], a spatial four-link mechanism in which the axes of rotation of the kinematic pairs are arbitrarily positioned in space is considered in detail. It was shown that such a mechanism is inoperable and constitutes a spatial truss. If, however, all kinematic pairs are arranged in a plane perpendicular to their axes of rotation, we obtain a planar mechanism in which all points of the links describe trajectories lying in parallel planes. Using this property, the spatial four-link mechanism (hinged parallelogram) can be used to transmit both translational and rotational motion. To determine singular configurations of the output link in the workspace, let us consider the functionality of the three-degree-of-freedom parallel-structure mechanism [29, 30]. From the design constraints it can be seen that the displacement of the moving platform along one of the horizontal axes is equal to the radius of the moving platform itself, as well as to the stroke limit of the linear actuator rod.
Fig. 3. Extreme positions of the output link in the workspace: (a) an extreme position of the output link; (b) the maximum lift of the output link.
When one of the rods is extended to its limit, the moving platform reaches its maximum displacement, moving simultaneously along the horizontal coordinate axes (Fig. 3(a)). Extending all the linear actuator rods simultaneously lifts the moving platform by a distance comparable to the limiting stroke of the actuator rod (Fig. 3(b)). Determining the singularities of parallel-structure mechanisms is one of the most important issues. From a theoretical point of view, two kinds of singularities are possible for the model in question: the first involves the loss of one or more degrees of freedom; the second may involve the loss of controllability or the presence of uncontrollable mobility. For this particular model, these degenerate situations are avoided by its design features. In particular, to reach the singular configuration associated with the loss of a degree of freedom (a singularity of the first kind), at least one hinged parallelogram would have to "fold" so that its links extend into a single line, which is technically impossible. A further design feature of the resulting mechanism model is the absence of singular configurations associated with loss of controllability (singularities of the second kind). Since the position of the output (moving) platform (2) is determined by the input variables providing three translational degrees of freedom, and the orientation (rotation) of the working tool (3) is determined by the input variable providing one rotational degree of freedom (at a fixed platform position), the input variables responsible for rotation (orientation) can be neglected in further calculations [21].
4 Mechanism Kinematics
The direct kinematic analysis consists in determining the position of the output link from the input variables (parameters) of the parallel-structure mechanism, whose linear actuators are arranged along the axes of the Cartesian coordinate system Oxyz (Fig. 4). Translational movements are possible because each chain contains a hinged parallelogram, two rotational kinematic pairs, and a linear actuator. The rotation axes of the initial kinematic pairs are defined by the unit vectors

$$ e_1(e_x) = (0, 1, -1), \quad e_2(e_y) = (-1, 0, 1), \quad e_3(e_z) = (1, -1, 0). $$
Fig. 4. A translational mechanism with three degrees of freedom: (a) the original diagram of the mechanism; (b) a calculation diagram of the mechanism.
These vectors lie in the Oyz, Oxz, and Oxy planes, respectively (Fig. 5(a)). Assume that the distance from points A, B, and C to O is R. Write down the coordinates of the points: A1(R, 0, 0), A2(0, R, 0), A3(0, 0, R);
B1(R − r1, 0, 0), B2(0, R − r2, 0), B3(0, 0, R − r3).
Fig. 5. (a) Rotation axes of the initial rotational kinematic pairs. (b) Definition of the position of point M.
To simplify the expressions, set R − ri = Pi, where i = 1, 2, 3 is the number of the kinematic chain and ri is a variable value. The coordinates of the points K1, K2, K3 are K1(x1, y1, z1), K2(x2, y2, z2), K3(x3, y3, z3). Connecting the points K1, K2, K3 with each other forms a triangle (Fig. 5(b)); the position of point M is uniquely defined by this triangle and is not considered here. To write down the kinematic equations, we use the algebraic method, which is based on the invariance of the link lengths. The invariance for the links B1K1, B2K2, and B3K3 is expressed by the following equations:

$$\begin{aligned}
(P_1 - x_1)^2 + y_1^2 + z_1^2 &= a^2,\\
x_2^2 + (P_2 - y_2)^2 + z_2^2 &= a^2,\\
x_3^2 + y_3^2 + (P_3 - z_3)^2 &= a^2.
\end{aligned} \qquad (1)$$
To simplify the solution, the rotation axis of the initial kinematic pair B1 is perpendicular to the Ox axis and lies in the Oyz plane. Let us write out the length-invariance relations for K1K2, K2K3, and K3K1:

$$\begin{aligned}
(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2 &= l^2,\\
(x_1 - x_3)^2 + (y_1 - y_3)^2 + (z_1 - z_3)^2 &= l^2,\\
(x_3 - x_2)^2 + (y_3 - y_2)^2 + (z_3 - z_2)^2 &= l^2.
\end{aligned} \qquad (2)$$
Consider the triangle K1K2K3 (Fig. 5(b)). The bisector of the angle K3K1K2 is perpendicular to the rotation axis of the final rotational kinematic pair K1, and hence to the rotation axis of the initial kinematic pair B1, that is, to the unit vector e1. Let us write this down in the form of three equations. Denote by K1′ the midpoint of K2K3; its coordinates and the vector K1K1′ are

$$K_1' = \left(\frac{x_3 + x_2}{2},\; \frac{y_3 + y_2}{2},\; \frac{z_3 + z_2}{2}\right), \qquad
\overrightarrow{K_1 K_1'} = \left(x_1 - \frac{x_3 + x_2}{2},\; y_1 - \frac{y_3 + y_2}{2},\; z_1 - \frac{z_3 + z_2}{2}\right).$$

Multiplying the vectors e1 and K1K1′ scalarly, we obtain

$$e_1 \cdot \overrightarrow{K_1 K_1'} = \left(y_1 - \frac{y_2 + y_3}{2}\right) - \left(z_1 - \frac{z_2 + z_3}{2}\right) = 0. \qquad (3)$$
Let’s write out the rest of the equations in the same way and get, respectively: x1 + x3 z1 + z3 e2 · K2 K2 = x2 − − z2 − =0 (4) 2 2 x1 + x2 y1 + y2 e3 · K3 K3 = x3 − − y3 − =0 (5) 2 2
This gives us three additional linear equations. Let us write out the general system of nine equations in nine unknowns:

$$\left\{\begin{aligned}
(P_1 - x_1)^2 + y_1^2 + z_1^2 &= a^2,\\
x_2^2 + (P_2 - y_2)^2 + z_2^2 &= a^2,\\
x_3^2 + y_3^2 + (P_3 - z_3)^2 &= a^2,\\
(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2 &= l^2,\\
(x_1 - x_3)^2 + (y_1 - y_3)^2 + (z_1 - z_3)^2 &= l^2,\\
(x_3 - x_2)^2 + (y_3 - y_2)^2 + (z_3 - z_2)^2 &= l^2,\\
\left(y_1 - \tfrac{y_2 + y_3}{2}\right) - \left(z_1 - \tfrac{z_2 + z_3}{2}\right) &= 0,\\
\left(x_2 - \tfrac{x_1 + x_3}{2}\right) - \left(z_2 - \tfrac{z_1 + z_3}{2}\right) &= 0,\\
\left(x_3 - \tfrac{x_1 + x_2}{2}\right) - \left(y_3 - \tfrac{y_1 + y_2}{2}\right) &= 0.
\end{aligned}\right. \qquad (6)$$
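Since system (6) generally admits no closed-form solution, a numerical solve is the practical route. The following Python sketch is our assumption about one such route (the authors work in Maple); it uses scipy.optimize.fsolve, with the initial guess selecting which assembly is found.

```python
import numpy as np
from scipy.optimize import fsolve

a, l = 0.12, 0.16     # parallelogram side and platform-triangle side (see Sect. 5)

def equations(v, P1, P2, P3):
    """Residuals of the nine equations of system (6)."""
    x1, y1, z1, x2, y2, z2, x3, y3, z3 = v
    return [
        (P1 - x1)**2 + y1**2 + z1**2 - a**2,
        x2**2 + (P2 - y2)**2 + z2**2 - a**2,
        x3**2 + y3**2 + (P3 - z3)**2 - a**2,
        (x1 - x2)**2 + (y1 - y2)**2 + (z1 - z2)**2 - l**2,
        (x1 - x3)**2 + (y1 - y3)**2 + (z1 - z3)**2 - l**2,
        (x3 - x2)**2 + (y3 - y2)**2 + (z3 - z2)**2 - l**2,
        (y1 - (y2 + y3) / 2) - (z1 - (z2 + z3) / 2),
        (x2 - (x1 + x3) / 2) - (z2 - (z1 + z3) / 2),
        (x3 - (x1 + x2) / 2) - (y3 - (y1 + y2) / 2),
    ]

P = 0.255 - 0.0975                  # Pi = R - ri at full actuator stroke
guess = np.full(9, 0.1)             # rough start; may need tuning per assembly
sol, info, ok, msg = fsolve(equations, guess, args=(P, P, P), full_output=True)
if ok == 1:
    K = sol.reshape(3, 3)           # rows are K1, K2, K3
    print("platform centroid M:", K.mean(axis=0))
```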
5 Constructing the Workspace of the Mechanism with Three Degrees of Freedom
In general, system (6) is of order 32. Reducing the system to a single equation yields biquadratic equations, which lowers the overall degree of the system. Ultimately, system (6) has two imaginary roots and two real ones; the two real roots correspond to two different assemblies of the design (Fig. 6).
Fig. 6. Different assemblies of the mechanism with three degrees of freedom; the input variables have maximum values: (a) the first assembly; (b) the second assembly.
We choose the first assembly option (Fig. 6(a)). Based on the solution of the system of Eqs. (6), the workspace of the mechanism was constructed under the following initial conditions: R = 0.255 m is the total length of the link; ri = 0.0975 m is a variable value; a = 0.12 m is the side length of the hinged parallelogram;
l = 0.16 m is the side of the triangle, K1K2 = K2K3 = K3K1; km = 0.076 m is the height of the working tool above the center M of the triangle K1K2K3; h = 0.075 is the counting step. The simulation of the mechanism's operation and the determination of the workspace were performed in the Maple environment. As can be seen in Fig. 7, the workspace of the mechanism is quite large, which gives great prospects for using the studied design in various areas of robotics.
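As a Python analogue of this computation (reusing equations() from the sketch at the end of Sect. 4; the grid resolution, warm-start strategy, and fixed tool-axis direction are our assumptions), the workspace can be approximated by sweeping the input variables and keeping the configurations where the solve succeeds:

```python
import numpy as np
from scipy.optimize import fsolve

R, r_max, km = 0.255, 0.0975, 0.076
# The platform translates without rotating, so the tool-axis direction
# stays fixed; we take it along the (1, 1, 1) symmetry axis (assumption).
normal = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)

grid = np.linspace(0.0, r_max, 14)    # resolution chosen for illustration
points, prev = [], np.full(9, 0.1)
for r1 in grid:
    for r2 in grid:
        for r3 in grid:
            args = (R - r1, R - r2, R - r3)
            sol, _, ok, _ = fsolve(equations, prev, args=args, full_output=True)
            if ok == 1:
                prev = sol                            # warm start keeps one assembly
                M = sol.reshape(3, 3).mean(axis=0)    # centroid of triangle K1K2K3
                points.append(M + km * normal)        # working-tool point
workspace = np.array(points)          # point cloud approximating Fig. 7
```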
Fig. 7. The workspace of the mechanism with three degrees of freedom: (a), (b) a side view; (c) a top view; (d) a bottom view.
6 Conclusions
In this paper, a mechanism with three degrees of freedom has been considered, and its direct and inverse kinematic analyses have been solved. It has been shown that there are no singularities in this mechanism design. It has also been shown that the spatial four-link mechanisms (hinged parallelograms) in each kinematic chain are able to transmit translational and rotational motion to the output link (the moving platform).
For the considered mechanism with three degrees of freedom, the size and shape of the workspace under the given initial conditions were determined by numerical experiment. The large workspace makes the mechanism a good candidate for applications in various fields such as robotics, medicine, simulators, and more.
Acknowledgment. The research was supported by the Russian Foundation for Basic Research, project No. 18-29-10072 mk (Optimization of nonlinear dynamic models of robotic drive systems taking into account forces of resistance of various nature, including frictional forces).
References
1. Merlet, J.P.: Workspace-oriented methodology for designing a parallel manipulator. In: Proceedings of the 1996 International Conference on Robotics and Automation, pp. 3726–3731. IEEE, Minneapolis (1996)
2. Arockia, S.A.D.: A review on the kinematic simulation and analyses of parallel manipulators. Robot. Autom. Eng. J. 1(2), 34–45 (2017)
3. Kamada, S., Laliberté, T., Gosselin, C.: Kinematic analysis of a 4-DOF parallel mechanism with large translational and orientational workspace. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 1637–1643. IEEE, Montreal (2019)
4. Wen, K., Harton, D., Laliberté, T., Gosselin, C.: Kinematically redundant (6+3)-DOF hybrid parallel robot with large orientational workspace and remotely operated gripper. In: 2019 International Conference on Robotics and Automation (ICRA), pp. 1672–1678. IEEE, Montreal (2019)
5. Wang, Y., Fan, S., Zhang, X., Lu, G., Zhao, G.: Kinematics and singularity analysis of a 3-RPS parallel mechanism. In: 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 1–6. IEEE, Macau (2018)
6. Chablat, D., Kong, X., Zhang, C.: Kinematics, workspace and singularity analysis of a multi-mode parallel robot. In: ASME 2017 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, pp. 1–10. Cleveland, USA (2017)
7. Zou, Q., Zhang, D., Zhang, S., Luo, X.: Kinematic and dynamic analysis of a 3-DOF parallel mechanism. Int. J. Mech. Mater. Des. 17(3), 587–599 (2021). https://doi.org/10.1007/s10999-021-09548-8
8. Zhang, H., Fang, H., Fang, Y., Jiang, B.: Workspace analysis of a hybrid kinematic machine tool with high rotational applications. Math. Probl. Eng. 2018, 1–12 (2018)
9. Meng, X., Li, B., Zhang, Y., Xu, S.: Workspace analysis of 2-RPU 2-SPS spatial parallel mechanism. In: Proceedings of 2020 IEEE International Conference on Mechatronics and Automation, ICMA 2020, pp. 130–135. IEEE, Beijing (2020)
10. Erastova, K.G., Laryushkin, P.A.: Workspaces of parallel mechanisms and methods of determining their shape and size. BMSTU J. Mech. Eng. 8(689), 78–87 (2017)
11. Fu, J., Gao, F.: Optimal design of a 3-leg 6-DOF parallel manipulator for a specific workspace. Chin. J. Mech. Eng. 29(4), 659–668 (2016). https://doi.org/10.3901/CJME.2016.0121.011
12. Pond, G., Carretero, J.A.: Quantitative dexterous workspace comparison of parallel manipulators. Mech. Mach. Theory 42(10), 1388–1400 (2007)
13. Chi, Z.Z., Zhang, D., Xia, L., Gao, Z.: Multi-objective optimization of stiffness and workspace for a parallel kinematic machine. Int. J. Mech. Mater. Des. 9(3), 281–293 (2013)
Exploring the Workspace of a Robot
343
14. Antonov, A.V., Chernetsov, R.A., Ulyanov, E.E., Ivanov, K.A.: Use of the chord method for analyzing workspaces of a parallel structure mechanism. In: IOP Conference Series: Materials Science and Engineering, vol. 747, p. 012079 (6 pages) (2020) 15. Bihari, B., Kumar, D., Jha, C., Rathore, V.S., Dash, A.K.: A geometric approach for the workspace analysis of two symmetric planar parallel manipulators. Robotica 34(4), 738–763 (2016) 16. Chaudhury, A.N., Ghosal, A.: Determination of workspace volume of parallel manipulators using monte carlo method. In: Zeghloul, S., Romdhane, L., Laribi, M.A. (eds.) Computational Kinematics. MMS, vol. 50, pp. 323–330. Springer, Cham (2018). https://doi.org/10.1007/9783-319-60867-9_37 17. Fang, X., Zhang, S., Xu, Q., Wang, T., Liu, Y., Chen, X.: Optimization of a crossbar parallel machine tool based on workspace and dexterity. J. Mech. Sci. Technol. 29(8), 3297–3307 (2015). https://doi.org/10.1007/s12206-015-0620-1 18. Lou, Y., Liu, G., Chen, N., Li, Z.: Optimal design of parallel manipulators for maximum effective regular workspace. In: Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 795–800. IEEE, Edmonton (2005) 19. Hao, Q., Guan, L., Wang, L.: Intelligent acceleration/deceleration control algorithm for drive force for a heavy duty hybrid machine tool. Qinghua Daxue Xuebao/J. Tsinghua Univ. 49(11), 1770–1778 (2009) 20. Hosseini, M.A., Daniali, H.M.: Cartesian workspace optimization of Tricept parallel manipulator with machining application. Robotica 33(9), 1948–1957 (2015) 21. Nosova, N.Yu., Glazunov, V.A., Palochkin, S.V., Kheilo, S.V. Spatial mechanism with four degrees of freedom. Patent of Russia for invention RU No. 2534706, Oct 06, 2014. (In Russian: Prostranstvennyy mekhanizm s chetyr’mya stepenyami svobody) 22. Majou, F., Wenger, Ph., Chablat, D.: Design of a 3 axis parallel machine tool for high speed machining: the orthoglide. In: IDMME 2002, pp. 1–10. Clermont-Ferrand, France. Primeca (2002) 23. Pashkevich, A., Chabla,t D., Wenger, P.: Kinematics and workspace analysis of a three-axis parallel manipulator: the Orthoglide. Robotica 24(1), 39–49 (2006) 24. Wenger, P., Chablat, D.: Kinematic analysis of a new parallel machine tool: the orthoglide. In: Proceedings of the 7th International Symposium on Advances in Robot Kinematics, Portoroz, Slovenia (2000) 25. Chablat, D., Wenger, P.: Device for the movement and orientation of an object in space and use thereof in rapid machining. United States Patent Application Publication No.: US 2007/006232, 22 March 2007 26. Nosova, N.Y., Glazunov, V.A., Palochkin, S.V., Terekhova, A.N.: Synthesis of mechanisms of parallel structure with kinematic interchange. J. Mach. Manuf. Reliab. 43(5), 378–383 (2014). https://doi.org/10.3103/S1052618814050136 27. Nosova, N., Glazunov, V.A., Misyurin, S., Filippov, D.N.: Synthesis and the kinematic analysis of mechanisms of parallel structure with the outcome of progress. Izvestiya Vysshikh Uchebnykh Zavedenii, Seriya Teknologiya Tekstil’noi Promyshlennosti 2, 109–113 (2015) 28. Misyurin, S.Y., Kreinin, G.V., Markov, A.A., Sharpanova, N.S.: Determination of the degree of mobility and solution of the direct kinematic problem for an analogue of a delta robot. J. Mach. Manuf. Reliab. 45(5), 403–411 (2016) 29. Gosselin, C.M., Angeles, J.: Singularity analysis of closed-loop kinematic chains. IEEE Trans. Robot. Autom. 6(3), 281–290 (1990) 30. 
Laryushkin, P.A., Rashoyan, G.V., Erastova, K.G.: On the features of applying the theory of screws to the evaluation of proximity to specific positions of the mechanisms of parallel structure. J. Mach. Manuf. Reliab. 46(4), 349–355 (2017). https://doi.org/10.3103/S10526 18817040100
Digitalization as a New Paradigm of Economic Progress Svetlana Nosova1(B)
, Anna Norkina1 , Svetlana Makar2 , Irina Arakelova3 , and Galina Fadeicheva4
1 National Research Nuclear University “MEPHI”, Kashirskoe Shossestr. 31, 115409 Moscow,
Russian Federation [email protected] 2 Financial University Under the Government of the Russian Federation, Leningradsky Prospectstr. 49, 125599 Moscow, Russian Federation 3 Volgograd State Medical University, Palih Bortsov Square 1, 400131 Volgograd, Russian Federation 4 Academy of Labor and Social Relations, Lobachevskystr. 90, 119454 Moscow, Russian Federation
Abstract. Digitalization challenges us, instills a sense of progress, grants autonomy, and promises a bright future. It can be viewed as the result of the large-scale use of digital technologies within modern integration processes in the international space, aimed at enhancing economic progress. Based on this goal, we assess the degree to which the current generation of digital technologies transforms the factors of economic growth. We argue that digitalization has become a critical necessity in all spheres of the economy and the basis for long-term success. Analyzing the use of digital technologies as the “core” of interaction between science, business and government, we substantiate the resulting economic growth, improvement in the quality of human life and prosperity of society. We identify ways to consistently implement Russia’s coordinated strategy based on the use of digital technologies in all spheres of economic and social activity, and we justify the introduction of the “digital spirit” into economic life, especially the development of artificial intelligence technologies, as an accelerator for achieving global advantages and reducing risks in modern economic development in response to the COVID-19 pandemic. Keywords: COVID-19 pandemic · Digitalization · Digital economy · Digital business · Artificial intelligence
1 Introduction Digital technology is changing all of economic life: customer purchases (Amazon), employment (Uber), investments (Betterment and Wealthfront), value creation. It is in this regard that the use of digital technologies forces companies to rethink their current strategies and learn to cope with the challenges caused by increasing digitalization. Today, Uber, Airbnb, Amazon, Apple and PayPal are leaders in the global economy.
Digital platforms are expected to hold potential for multiple purposes for industrial enterprises, for example when integrated into the Industry 4.0 vision. Digital platforms are becoming ever more entrenched in the current digital age, bringing with them many new business ideas and models. Practice shows that the functioning of digital platforms differs from that of traditional organizations. A distinctive feature of digital platforms is the limited power of their controlling side and the absence of centralized management; coordination is built from the bottom up. The overall management of digital platforms is not so much a command-and-control style as an organizing, stimulating and coordinating one. It follows that the effective development of a platform requires the design and deployment of a management structure, a set of control mechanisms and pricing models, and other measures consistent with the life-cycle stage of the business model and the platform architecture. Notable examples of platform ecosystems are mobile application platforms (such as the Apple App Store) and social media platforms (such as Facebook), where participants turn value creation into a collaborative process.

At the present stage, these technological transformations are taking the form of digitalization, which stands in a contradictory relationship with the processes of capitalization and socialization. The processes of digitalization, capitalization and socialization, taken in isolation, lead to negative consequences and destabilization of the economy. Therefore, new governance of the digital economy needs to be learned, given that “digitalization is contributing to a historic turn in governance where practice-oriented research can be done with less effort and improved quality, and micro-level data in the form of digital archives and online content facilitates the adoption of critical perspectives” ([10], p. 340). In this regard, digital technologies can be viewed as a process of strategic maneuvering in economic development. During this period, large companies face not only the problem of attracting resources, such as capital and labor, but also the development of a competitive proposition. They also face an institutional structure that sometimes requires active transformation to better fit the business. The outcome of such processes is highly uncertain and subject to scrutiny. Recent studies have shown that the digitalization of economic life is not only an attempt to look into the future; first of all, it is a search for ways to create the future. President of Russia V. V. Putin emphasized: “And of course, we need to make broader use of the advantages of new technological solutions. Here we also need to become one of the world leaders” [13].
2 Theoretical Analysis 2.1 The Development of Digitalization Creates the Potential for Economic Progress Digital technologies exponentially expand the information space, and hence the scale of the transformation of economic activity ([17], p. 1143). “Exogenous shocks in the form of new technologies are more likely to be introduced into business processes and spread in the digital economy, and this will foster their perception, including by politicians and digital entrepreneurs, which allows them to constantly interact, thereby
strengthening the trust associated with them” ([24], p. 656). In this regard, the structure of the economic system is changing, and with it its dynamic properties, since a key element of the digital transformation process is the transition from analog or physical technologies to digital data systems. The development of digitalization should therefore be viewed as a new phenomenon in modern economic life, since it “manifests itself in the emergence of completely new technologies such as big data, cloud technologies, artificial intelligence” [2]. At the same time, practice shows that social stratification is worsening in the digital economy and cyber threats are growing immensely [16], which often lead humanity to major failures (turbulence) in the global market system, especially given the unpredictable consequences of COVID-19.

Nevertheless, digital technologies are rapidly spreading across various sectors of the economy. Digital advances are increasingly embedded in various business functions, including customer service. Large tech companies are already competing in the digital arena to improve their business processes, and therefore their results, in a digital environment. Thus, “digital technologies are changing the traditional form of business to digital business” ([25], p. 77). Digitalization forces us to rethink how companies conduct their business processes, what management methods they use and how their information systems work, and to learn everything about the nature of customer relationships. As is well known, new business models are designed on the basis of digital platforms. The digital platform as a technological tool provides mediation processes. In fact, it contributes to the creation of added value through digital collaboration, which underpins the effectiveness of the digital version of economic change. The potential of digital technologies is likely to raise digitalization most in those countries where companies massively apply digital technologies in their fields of activity to create a product “with less effort and improved quality” ([9], p. 91).

Digital technologies are driven by globalization, which has brought with it social media, mobility, integration, e-business, digital products and services, new organizational forms, and more. They are transforming businesses. It is important to emphasize that the maneuvering of economic activity under the influence of digital technologies is associated more with new activities and products than with higher productivity. The emphasis is on “being the best in your environment,” because being “good enough” is no longer “good enough.” Everyone is “good enough” now. In this respect, digital technologies should become the driving force of economic progress. It is appropriate to emphasize that “the basis for the successful implementation of AI in the context of digital transformation offers specific recommendations in the field of intelligence, integration, flexibility and leadership of business companies” ([4], p. 110).

The dominant logic of management research continues to rest on assumptions derived from neoclassical economics, where aggregate data are analyzed with econometric approaches under assumptions of rational producer and consumer behavior. According to experts, this remains an integral part of management in the context of digitalization.
Yet it is emphasized that the digitalization of the economy can bring about “the introduction of fundamentally new business opportunities and business models” ([5], p. 537). At the basis of these various transformation processes lies the transition from the industrial-market to the information-network management system, which determines the historical meaning of the modern stage of society’s development. It is therefore
necessary to manage these processes in order to harmonize them and subordinate them to the goals of human development. David L. Rogers argues that “digital business models do not destroy traditional businesses, but rather make them more competitive” ([15], p. 3). In other words, “digitalization will lead to an increase in production in mature markets, because it reduces operating costs and enables companies to reduce their dependence on such a factor as the difference in wages in different countries” [14]. Given the fundamentally different laws governing these systems, the transition from one to the other is contradictory and increases uncertainty and instability. “Big data, the Internet of Things, and artificial intelligence are so disruptive that they have redefined the dynamics of technology leadership” [20]. Under these conditions, new requirements arise for the management of transformation processes. Information is increasingly becoming an object of control, both as a result and as a means of production, and the production of information using information technology is gradually becoming the defining process.

2.2 Artificial Intelligence as a Safe and Profitable Direction of Digitalization Artificial Intelligence (AI) is “a broad-based tool that empowers people to rethink how we integrate information, analyze data and use insights to improve decision-making, and it is already transforming every aspect of life” [22]. Artificial intelligence stands out as the transformational technology of our digital age, and its applications across the economy are growing at a rapid pace. Based on research from the McKinsey Global Institute and McKinsey Analytics’ applied AI experience, both the practical application and the economic potential of advanced AI techniques can be assessed across industries and business functions.

The field of AI is developing very rapidly. In recent years there have been breakthroughs in image and speech recognition, autonomous robotics, language tasks, and games. “Human-friendly virtual and physical collaborative robots, or cobots, will work side-by-side with users to assist minds and hands in a variety of creative cognitive tasks, including design, invention, art creation, or setting goals in unexpected situations in unpredictable environments” ([18], p. 57). In the coming decades, significant progress is likely in the development of AI. This bodes well: new scientific discoveries, cheaper and better-quality goods and services, and medical advances, including climate change mitigation, pandemic response and food security. After all, “AI will be capable of performing many analytical and thought tasks” ([7], p. 43). On April 8, 2019, the European Commission’s High-Level Expert Group (“HLEG”) on artificial intelligence concluded that (1) AI must be ethical and (2) it must be reliable, from both a technical and a socio-economic point of view. “AI systems should benefit all people, including future generations” [8]. This assessment of AI forms the basis of the digital version of economic progress. Experts have developed a digital maturity index to identify the so-called “digital champions” [14].

AI can be a valuable tool for customer service management and task personalization. AI enables smoother customer experiences and more efficient document processing. AI capabilities go beyond words alone.
For example, deep learning analysis of audio allows systems to gauge the emotional tone of a client; if the client reacts poorly to the system, the call can be automatically redirected to human operators and managers. In other areas of marketing and sales, AI techniques can also have a significant impact.
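A minimal sketch of this kind of routing logic is given below. It is purely illustrative: a naive keyword scorer stands in for a deep learning audio model, and the cue words and threshold are arbitrary assumptions.

NEGATIVE = {"angry", "cancel", "terrible", "refund", "complaint"}

def sentiment_score(transcript: str) -> float:
    # Stand-in for a deep learning model: fraction of negative cue words.
    words = transcript.lower().split()
    return sum(w.strip(".,!?") in NEGATIVE for w in words) / max(len(words), 1)

def route_call(transcript: str, threshold: float = 0.1) -> str:
    # Escalate to a human operator when the client's tone looks negative.
    return "human_operator" if sentiment_score(transcript) > threshold else "bot"

print(route_call("I want a refund, this is terrible!"))  # -> human_operator

In production the scorer would be a model trained on labeled audio, but the escalation logic around it stays this simple.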
Combining customer demographics and past transactions with social media monitoring can help create tailor-made product recommendations targeting individual customers, as companies such as Amazon and Netflix have successfully done. Such companies can double their sales conversion rates. Two-thirds of AI use lies in improving the performance of existing analytics use cases. Geoffrey G. Parker, Marshall W. Van Alstyne and Sangeet Paul Choudary argue that the success of leading-edge businesses is that they are “built on platforms: two-sided markets that will revolutionize the way we do business” [12].

Most modern AI systems are “narrow” applications designed to solve a well-defined problem in one area, such as a specific game. Such approaches cannot adapt to new or broader challenges without substantial revision. While such a system may far exceed human performance in its own area, it does not excel in others. A long-standing goal of the field, however, has been to develop artificial intelligence that can learn and adapt to a very wide range of problems.

AI also raises short-term concerns: privacy, bias, inequality, safety and security [4]. The study highlighted new threats and trends in global cybersecurity and examined the problems at the intersection of AI, digitization and nuclear weapons systems. As AI systems become more powerful and more general, they may surpass human performance in many areas. If they do, the transition could be as transformative economically, socially and politically as the industrial revolution. This could lead to extremely positive developments, but could also present catastrophic risks from accidents (safety) or misuse (security). There are a number of complex technical challenges associated with designing fail-safe artificial intelligence (AI). Aligning the behavior of current systems with our goals has already proven difficult and has led to unpredictable negative results; accidents caused by more powerful systems would be far more devastating. In terms of security, advanced AI systems can be key economic and military assets. If these systems ended up in the hands of bad actors, they could be used in harmful ways; and if several groups competed to develop them first, a destabilizing arms-race dynamic could arise. Reducing the risks and achieving the global benefits of AI will present unique governance challenges and will require global collaboration and representation.
3 Results 3.1 Digitalization in the System of Economic Progress in Russia Russia is striving to solve key problems in promoting its digital development in line with world standards. It is believed that Russia’s progress in digital technologies should keep pace with China and the United States, since the digital industry in those countries has become an important driver of economic development. Russia is striving for significant achievements in the development of national digital products. This is key to overcoming the low annual growth rate of Russia’s GDP, given that the country has a significant technological groundwork and the human competencies needed to find a way out of this situation. Business and government simply need to make every effort to achieve major breakthroughs in digital technology. We
need to make sure that AI becomes the main driver of industrial progress and digital transformation. The entire power of the digital economy is determined by the development of the digital sector, where digital products and new technologies are directly created; in Russia these are now like a new “oil” whose value is constantly increasing. Hence, the time has come to apply the full force of economic development to a new type of production: digital production. To this end, Western business partners should be involved, ones engaged not in finance but in engineering and in digital product development strategies based on the digital model of the Russian economy. It is therefore no coincidence that most studies of the digital economy are devoted to the analysis of digital platforms, new forms of governance and the reorganization of joint activity of business entities across the phases of reproduction: production - distribution - exchange - consumption. Digital technologies can bridge the gaps between these phases, as they facilitate quick, accurate decisions in a changing environment, both internal and external. This is because digital objects can be copied many times, and this can be done across a country, groups of countries, and the entire planet. Integrated digital technologies embedded in the reproduction process enable companies to achieve greater benefits at lower costs, identify and analyze valuable information, plan strategies, predict outcomes and draw on global expertise. “Researchers have modeled the potential impact of artificial intelligence on 12 advanced economies. The United States will benefit the most from the introduction of AI into the economy - Accenture predicts that by 2035 the growth rate of the American economy using artificial intelligence will be 4.6% (2.6% in the baseline scenario)” [1]. The Russian government should encourage private-sector investment and establish a national digital development fund. Looking to the future, Russia intends to conduct research and development aimed at building up its inherent national advantages through the use of both domestic and international digital resources.

3.2 Digital Business Models in the Russian Economy The digital economy of Russia operates on the principles of public-private and intersectoral partnership. In Russia, it is the entire life of the population that is to be transformed, not individual business functions and business areas. Governments and regulators therefore need to support the growth of digital technologies in order to overcome possible turbulence in economic development under the influence of external factors, in particular foreign sanctions. The development of Russian business models in the digitalization mode testifies to a new stage in the transformation of the world market, which inevitably drives the modernization of the internal socio-economic environment of all countries. Theoretical analysis of the design of digital business models, together with methodological and empirical analyses, determines the vector and dynamics of the development of the digital economy [18]. Just as importantly, however, such models can also be viewed as tools for capturing world markets that threaten the sovereignty of countries. This factor is seen as especially important given the spread of COVID-19, which is forcing businesses to reduce the proportion of human labor.
At present, Russia and other countries need to make vital decisions on the following issues: (1) preservation (which is supposed to be carried out in accordance with fundamental classical economic
theory) of the reproduction chain corresponding to the sequence of stages “production - distribution - exchange - consumption”; (2) the formation of business digitalization concepts focused on the growth of innovation, increased competitiveness and spatial development, which will ultimately contribute to the creation of a new technological base, help the global society overcome turbulence, and solve the problem of interaction and coordination in the context of a global pandemic; (3) the development of tools (based on the properties of the digital business model and its architecture) for the practical task of building the prestige of domestic entrepreneurship and of the country as a whole.

Currently, Russian business needs to move actively toward the development and implementation of automated technical systems. Leveraging them effectively requires organizations to address key data challenges, including design and regulatory-constraint management, as digital technology becomes an increasingly intelligent resource. The main obstacle to their use in business lies in the personnel sector, which lacks specialists with the necessary qualifications. As part of a national digital transformation strategy, the focus should be on creating new job opportunities and protecting human well-being. Enacting comprehensive federal privacy laws and policies requiring accountability for the ethical design and implementation of digital transformation is critical to ensuring data sharing that “can be an advantage, not a burden” [11]. “Sharing is a phenomenon as old as humanity, while co-consumption and the ‘sharing economy’ are phenomena born of the era of the Internet” [2]. A successful national digital strategy requires an analysis of the existing regulatory and policy landscape to identify and remove any barriers to development and implementation. “But strategists—whether CEOs of established firms, divisional presidents, or entrepreneurs—must have a strategy, an integrated, overarching vision of how the business will achieve its goals” ([6], p. 50), including trade agreements and diplomacy to achieve them.

The issues of adapting the interaction of science, business and government to the development of Russian regions are relevant in the context of the inevitable transition to digital transformation of the economy. Consideration of the traditional type of organization and conduct of research within innovative territorial clusters has shown it to have a significant number of advantages. The coordinated activity of science, business and government makes it possible to build up business potential and carry out developments in search of new forms of organizational activity. Government incentives that make sharing information with the public and private sectors more attractive and comfortable will raise the propensity to share. The governments of a number of countries pay great attention to digital technologies, in particular AI. For example, China has announced its desire to become a leader in artificial intelligence by 2030 [3]. Experts have speculated that the next few decades will usher in a fourth industrial revolution. “The fourth industrial revolution will be driven by digitalization, information and communication technologies, machine learning, robotics and artificial intelligence; and will move more decision-making from humans to machines.
The ensuing social change will have a profound impact on both personal selling and sales management research and practice” [21]. This is especially true for the development of artificial intelligence.
Solving the Problems of the Development of Artificial Intelligence in Russia. First of all, one must recognize that there is great uncertainty and disagreement about the timing of the development of advanced AI systems. But whatever the speed of progress in this area, useful work can be done in Russia right now. Technical research on machine learning safety is currently led by teams at OpenAI, DeepMind, and the Center for Human-Compatible AI. More advanced and powerful AI systems will be developed and deployed in the coming years; these systems may be transformative, with both negative and positive consequences, and useful work on them can begin immediately. While many uncertainties remain, Russian researchers must make serious efforts to lay the foundations for the safety of future systems and to better understand the implications of such advances.

As companies continue their march toward digitalization, they are increasingly adding AI techniques to their value-creation toolboxes. What makes AI so powerful is its ability to learn. The organizations that benefit the most from AI will be those that can define their goals most clearly and accurately; the companies able to sharpen their vision the most will gain the most. AI is now capital that learns. Companies need to ensure that information flows into solutions, then learn from the results and pass that learning back into the system. Because of the way AI is trained, its effectiveness is directly related to the clarity of objectives and specifications.

Our government has a stake in supporting the widespread adoption of AI, as it can lead to increased productivity, economic growth and community prosperity. Its tools include public investment in R&D and support for training programs that can help prepare excellent AI professionals. Opening up public-sector data can spur private-sector innovation. Setting common data standards can also help accelerate the development of AI, while raising new questions that must be faced. Some policy innovation will therefore likely be required to deal with this rapidly advancing technology. But given the magnitude of AI’s beneficial effects on business, the economy and society, the goal should not be to limit the adoption and use of AI, but rather to encourage its beneficial and safe use. “The future of artificial intelligence (AI) depends on intelligent agents becoming human-like in their cognitive abilities, leading to creative cobots and social actors who will inherit human values and be accepted by society as equal minds” [19]. It is necessary “to develop artificial intelligence, the essence of which is to ‘break’ the matrix of everyday life in order to launch a large-scale virtual program of a new being of humanity, spurred on by the COVID-19 pandemic” ([23], p. 657).
4 Discussion The 21st century can rightly be considered the century of accelerated digitalization of human life, activity and environment. Many companies have become digital giants playing a role in global processes comparable to that of states, and in some respects even greater; they pursue their own policies and set development trends for the whole world and all its subjects, affecting all areas of the biosphere and technosphere. In 2020, the key leaders, especially in the digital
aspect, can be considered to be two states, China and the United States, which created their own models and distributed them throughout the world, thereby strengthening and expanding their own subject and digital sovereignty. The United States has led in cybernetics and its technical applications for more than half a century since the end of World War II, demonstrating its strength and power through the growing quantitative and qualitative distribution of key digital products, from the devices themselves to their content, from platforms to the specific content that shapes the lives of the masses. China, meanwhile, has made a breakthrough in its digital development over the past 30 years, becoming a serious competitor to the United States by creating comparable alternative solutions around the world. Several Chinese companies can be mentioned, including the global players Tencent, Huawei and WeChat, which not only pursue an active policy of capturing global markets and audiences, but also actively threaten the sovereignty of other states. The differing characteristics of China and the United States determine the vector and dynamics of development of technology companies and products, as well as the principles governing interaction between companies and the state, between the state and its citizens, and between citizens and the companies that provide products.

It is important to highlight the significant advantages China holds in its race with the United States for digital leadership. First, there is the numerical advantage of the Chinese population, which, under uniform rules regulated by one state, uses system solutions for everyday life, entrusting them with its choices, leaving a digital footprint, creating data and improving algorithms. Second, there is the ability of the Chinese Communist Party to influence the ideology and policies of all companies in the country’s market. At the same time, some applications developed by parent companies located in China and belonging to its jurisdiction, such as TikTok, lead not only in their own market but also abroad. According to the latest data, TikTok ranks first worldwide in downloads across all types of mobile devices, and even the recent presidential decree threatening to block the app unless it was sold has not diminished its audience.

The essence of the new era is that a new information resource has appeared as an integral part of a more complex technogenic reality, which is especially important in the context of confrontation between states. Now all citizens of all countries, and therefore every person in the world, are involved in this confrontation. The development and future of companies, and the rules set for them by the countries in which they operate, directly affect the lives of ordinary people who come into contact with their products. In effect, countries use the capabilities of their companies to reach specific audiences, broadcast and develop their own cultural models, predetermine people’s lifestyles, and obtain various kinds of resources from them.
5 Conclusion In order to achieve digital economic progress, the following recommendations are proposed:

1. The rapid pace of digitalization fueled by the pandemic means that organizations need to adapt faster than ever before and develop the right skills to sustain the change. The development of digitalization should be accompanied by coordinated actions of the interested government and commercial structures, which will serve as an incentive for structural and technological reform and modernization of the national economy. The digital winner will be the one who develops the best practices for digital technology. If Russia manages to create such a business environment, it will reach the world level of global digitalization.

2. The impact of digital technologies on the economy should be treated as an effective form of economic progress: their practical application will give far more people access to participation in the digital economy and raise the quality of life of the country’s population. The development of digital business in Russia is currently constrained by the distorted motivation of Russian enterprises, banks and other financial institutions regarding its feasibility. Yet economic progress is impossible without digital business. Business leaders need to be encouraged to work together, leverage long-term capital and drive bold change.

3. In the context of the COVID-19 pandemic, artificial intelligence must take on a “scenario mission” to lead the economy out of the current global turbulence. Today, the world leaders in AI are countries that invest wisely in digital assets and use digital technologies to capitalize on the multiplier effect of digitalization. Ultimately, it is becoming increasingly clear that the value of AI lies in companies’ ability to use it. In doing so, issues of confidentiality and data security in the use of artificial intelligence must always be considered.
Acknowledgements. This work was supported by the National Research Nuclear University MEPhI.
References 1. Accenture: Artificial intelligence will accelerate annual economic growth by 2035 - Inc. Russia (2017). (in Russian) 2. Belk, R.: You are what you can access: sharing and collaborative consumption online. J. Bus. Res. 67(8), 1595–1600 (2014) 3. China has announced plans to become a leader in artificial intelligence by 2030 (2018). (hightech.fm) 4. Davenport, T.H., Ronanki, R.: Artificial intelligence for the real world. Harvard Bus. Rev. 96(1/2), 108–116 (2018)
5. Gomber, P., Koch, J.-A., Siering, M.: Digital finance and FinTech: current research and future research directions. J. Bus. Econ. 87, 537–580 (2017). https://doi.org/10.1007/s11573-017-0852 6. Hambrick, D.C., Fredrickson, J.W.: Are you sure you have a strategy? Acad. Manage. Executive 15(4), 48–59 (2001) 7. Huang, M., Rust, R., Maksimovic, V.: The feeling economy: managing in the next generation of artificial intelligence. Calif. Manage. Rev. 61(4), 43–65 (2019). https://doi.org/10.1177/0008125619863436 8. European Commission: Ethics guidelines for trustworthy AI, 8 Apr 2019. huntonprivacyblog.com 9. Laurell, C., Sandström, C., Eriksson, K., Nykvist, R.: Digitalization and the future of management learning. Manage. Learn. 51(1), 89–108 (2020) 10. Laurell, C., Sandström, C.: Comparing the impact of social and traditional media on disruptive change - evidence from the sharing economy. Technol. Forecast. Soc. Change 129, 339–344 (2018) 11. McCarthy, J., Minsky, M.L., Rochester, N., Shannon, C.E.: A proposal for the Dartmouth summer research project on artificial intelligence. Department of Computer Science, Stanford University (1955). http://wwwformal.stanford.edu/jmc/history/dartmo 12. Parker, G., Van Alstyne, M., Choudary, S.P.: Platform Revolution: How Networked Markets Are Transforming the Economy and How to Make Them Work for You (2016). (mann-ivanov-ferber.ru) 13. Parloff, R.: From 2016: why deep learning is suddenly changing your life. Fortune.com (2016). https://fortune.com/longform/ai-artificial-intelligence-deep-machine-learning/ 14. PwC: Leaders in the field of digitalization are named, April 2018. (itweek.ru) 15. Kleinert, J.: Digital transformation. Empirica 48(1), 1–3 (2021). https://doi.org/10.1007/s10663-021-09501-0 16. Russians have been warned of a new type of card fraud (2019). https://www.rbc.ru/finances/14/11/2019/5dccdf3a9a79471a888f6e58 17. Rachinger, M., Rauter, R., Müller, C., Vorraber, W., Schirgi, E.: Digitalization and its influence on business model innovation. J. Manuf. Technol. Manag. 30(8), 1143–1160 (2019). https://doi.org/10.1108/JMTM-01-2018-0020 18. Samsonovich, A.V.: Socially emotional brain-inspired cognitive architecture framework for artificial intelligence. Cogn. Syst. Res. 60, 57–76 (2020). https://doi.org/10.1016/j.cogsys.2019.12.002 19. Samsonovich, A.V.: On semantic map as a key component in socially-emotional BICA. Biologically Inspired Cogn. Architectures 23, 1–6 (2018). https://doi.org/10.1016/j.bica.2017.12.002 20. Siebel, T.: Why digital transformation is now on the CEO’s shoulders. McKinsey Quarterly, December 2017. http://www.dlinst.com/digital-transformation-the-ceo-imperative/ 21. Syam, N., Sharma, A.: Waiting for a sales renaissance in the fourth industrial revolution: machine learning and artificial intelligence in sales research and practice. Ind. Mark. Manage. 69, 135–146 (2018) 22. West, D.M., Allen, J.R.: How artificial intelligence is transforming the world, 24 Apr 2018. (brookings.edu) 23. Nosova, S., Norkina, A., Makar, S., Fadeicheva, G.: Digital transformation as a new paradigm of economic policy. Procedia Comput. Sci. 190, 657–665 (2021) 24. Nosova, S., Norkina, A.: Digital technologies as a new component of the business process. Procedia Comput. Sci. 190, 651–656 (2021) 25. Nosova, S.S., Putilov, A.V., Norkina, A.N.: Fundamentals of the Digital Economy. KNORUS, Moscow (2021)
Artificial Intelligence Technology as an Economic Accelerator of Business Process Svetlana Nosova1(B) , Anna Norkina1 , Olga Medvedeva2 , Andrey Abramov2 , Svetlana Makar3 , Nina Lozik3 , and Galina Fadeicheva4 1 National Research Nuclear University “MEPHI”, Kashirskoe Shossestr. 31, 115409 Moscow,
Russian Federation [email protected] 2 The State University of Management, Ryazansky Prospektstr. 99, 109542 Moscow, Russian Federation 3 Financial University under the Government of the Russian Federation, Leningradsky Prospectstr. 49, 125599 Moscow, Russian Federation 4 Academy of Labor and Social Relations, Lobachevskystr. 90, 119454 Moscow, Russian Federation
Abstract. The article covers a range of problems related to the need to use artificial intelligence (AI) technology, on the one hand, as an economic accelerator of business processes under high competition in the global market, and, on the other, as a condition for the growth of the economy as a whole. The main purpose of the article is to substantiate a conceptual framework that reveals the value of AI technology in areas such as probabilistic thinking, machine learning and computer vision, helping managers better understand how promising advances can remove certain limitations on business growth and create a new wave of opportunities in the socio-economic development of the country. The results of the study confirm that AI technology: first, helps to create innovative products, serve customers better, free employees for more creative tasks, reduce costs and achieve high results; second, gives a clear incentive to its developers, companies, policymakers and users to solve the socio-economic problems facing them; third, removes restrictions on obtaining massive datasets through global cooperation in the field of digital transformation. The article assesses the possibility of drawing on the experience of AI technology development in advanced countries and proposes a number of measures to adapt foreign AI technology to the requirements of the national digital economy program of Russia in response to the COVID-19 pandemic. Keywords: Artificial intelligence · Digital economy · Business processes · Strategy
1 Introduction The real world is being updated by AI. The era of large-scale use of AI is coming. Now we can say: AI is a new economic accelerator, “a new phenomenon in modern economic
life” [23], a new resource in the global market system of the 21st century, whose value increases every day. The largest technology companies are already competing in AI technologies and applications to accelerate their business processes [3]. The issues of human adaptation to the development of AI are relevant in the context of the inevitable transition to the digital transformation of the economy.

Human participation is necessary throughout the life cycle of any AI application, from data preparation and algorithms to testing outputs, retraining the model and verifying the results. As data is collected and prepared, human review is essential to process the data according to the requirements of the application. As algorithms sift through the data and generate outputs (e.g., classifications, outliers and forecasts), the next important component is human analysis of the results for relevance, accuracy and usefulness. Business and technology stakeholders typically work together to analyze AI-based results and provide appropriate feedback to AI systems to refine the model. The absence of such human analysis and feedback can lead to irrelevant, incorrect or inappropriate results from artificial intelligence systems, potentially creating inefficiency, missed opportunities or new risks if actions are taken on the basis of erroneous outputs.

Consideration of the traditional type of organization and conduct of research within the framework of AI development has shown it to have a significant number of advantages. The coherence of the activities of science, business and government under the influence of AI makes it possible to build up business potential and carry out developments in search of new forms of organizational activity. The development of interaction between science, business and government contributes to the transition to cyber-physical space and virtual reality, giving rise to new directions for the development and analysis of big data, their storage, patenting and protection. In the collaboration of science, business and government plus AI, the range of research areas is expanding in order to improve the methods and tools for using AI and big data. The implementation of priority projects of interaction between science, business, government and AI improves the quality of life of the population, creates jobs for young professionals of different profiles, and minimizes the risk of bringing to market new innovative products that are relevant in modern economic conditions. Business fully supports the state strategy for the development of AI in Russia. Thus, the interaction of science, business, government and AI is one of the current trends in the development of the Russian economy against the backdrop of a digital transformation that is gaining momentum.

Most experts agree that AI technology has practical value, although in the 1950s its viability was the subject of fierce debate. By the 2000s, when high-bandwidth networks, cloud computing and powerful graphics-enabled microprocessors appeared, researchers began to create multi-layer neural networks - still extremely slow and limited compared to the natural brain, but useful in practical terms. The availability of massive data, improved algorithms, AI technology and the growing computing power of electronic devices lead to a stronger strategic position for companies and for the economy of each country as a whole.
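The human review loop described above can be made concrete with a minimal sketch. Everything here is an illustrative assumption, not a method from the article: synthetic data, a plain logistic-regression classifier, and an arbitrary confidence threshold below which cases are routed to a human reviewer whose verified labels are fed back into retraining.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))            # synthetic feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic ground-truth labels

model = LogisticRegression().fit(X[:150], y[:150])

THRESHOLD = 0.8                          # arbitrary confidence cut-off
reviewed_X, reviewed_y = [], []
for x, true_label in zip(X[150:], y[150:]):
    proba = model.predict_proba(x.reshape(1, -1))[0]
    if proba.max() < THRESHOLD:
        # Low confidence: route to a human reviewer (ground truth stands in).
        reviewed_X.append(x)
        reviewed_y.append(true_label)

# Feed the human-verified labels back and retrain the model.
if reviewed_X:
    X_train = np.vstack([X[:150], np.array(reviewed_X)])
    y_train = np.concatenate([y[:150], np.array(reviewed_y)])
    model = LogisticRegression().fit(X_train, y_train)
print(f"{len(reviewed_y)} cases sent for human review")

In a real deployment the reviewer, not the ground truth, supplies the corrected label, and retraining runs on a schedule rather than immediately.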
Given all this, it is no coincidence that the governments of different countries propose to solve urgent business tasks by implementing AI quickly and on a large scale.
For example, the French President said: “AI is much more than just a field of research. AI is one of the keys to tomorrow’s world. This is not only a technological revolution, but also an economic, social, ethical and, consequently, political revolution” [11]. Russian President Vladimir Putin stressed that “the leader in the field of artificial intelligence will rule the world” [9]. China speaks of its desire to become a leader in AI by 2030. All this points to the accelerated introduction of AI into almost every business decision and business process.
2 Literature Most academic research on AI is published by researchers from technical disciplines such as computer science. These valuable studies, as a rule, offer little insight into the prospects for business and management. Successful business development under the influence of AI technology therefore requires a joint approach combining theoretical, technical and managerial skills. In the book Artificial Intelligence: A Modern Approach, AI is defined as “designing and creating intelligent agents that receive perception from the environment and take actions that affect this environment” [25]. The book Fundamentals of the Digital Economy examines the role of AI in the new model of economic development in Russia [22]. AI is the subject of countless discussions and articles, from treatises on technical advances to tabloid headlines about its consequences. “Even as the debate continues, the technologies underlying AI are moving forward step by step, permeating our lives” ([7], p. 1).

Machine learning, the basis of artificial intelligence and data science, addresses the question of how to create computers that automatically improve through experience. It is one of the fastest-growing technical fields of our time, lying at the junction of computer science and statistics. Data-intensive machine learning methods can be found in science, technology and commerce, leading to more evidence-based decision-making in many areas of life, including healthcare, manufacturing, education, financial modeling, policing and marketing [15]. Brynjolfsson and Mitchell outlined eight criteria for the tasks “that, in their opinion, machine learning systems should be able to perform” ([6], p. 1530). Experts note that “we are at a technological inflection point where robots are developing the ability to perform cognitive as well as physical work of some fractions of the workforce and thus become able to replace workers in many activities” [10]. This framing brings to the fore a new problem in economic life: the robot economy.

The most exciting developments in the field of AI are advances in deep learning methods [8]. “Given the significant computational demands of deep learning, some organizations will maintain their own data centers due to regulations or security concerns” [20]. A great deal of research has addressed the application of deep learning in remote sensing [17]. Articles on the business theory of AI usually describe various aspects of its application across industries in order to better illustrate its impact on business processes.
3 Theoretical Foundations 3.1 The Conceptual Basis of the Accelerated Impact of AI on Business Process The conceptual approach to studying the impact of AI on business processes is best framed through companies’ digital transformation strategies [21], which also make it possible to assess its impact on the growth of the economy as a whole correctly. AI is usually implemented and used together with other advanced digital technologies in companies’ digital transformation projects. “Digital transformation projects that use AI mainly support the existing business of firms. The framework for the successful implementation of AI in the context of digital transformation offers specific recommendations in the areas of data, intelligence, grounding, integration, unification, flexibility and leadership” ([4], p. 110). Because AI technology does not yet possess the full range of human cognitive capabilities, it is initially applied to routine tasks; its value will grow as three important developments occur - improved algorithms, mass availability of data and more powerful hardware - allowing AI to approach and exceed human abilities in visual recognition and natural language processing. AI has already produced many significant and impressive results. No one can predict the future in detail, yet it is already clear today that computers with human-level intelligence (or better) will have a huge impact on our daily lives and on the future development of civilization.

Practice shows that the development of AI technology is essentially a startup in the transformation of economic activity on a global scale. Deep learning methods based on artificial neural networks generate up to 40% of the total potential value that all analytical methods can deliver. Experts say these results highlight the significant potential of applying deep learning methods in the economy: they can provide a significantly greater lift than more traditional analytics methods. Recent breakthroughs such as improved algorithms, more powerful hardware and the availability of massive data have enabled the use of AI technologies in business processes such as process optimization, human resource allocation and product innovation. This study is therefore aimed at identifying the opportunities and limitations of AI from a business point of view and at developing a conceptual framework describing the significant elements of AI.

AI, despite the pandemic, continues to allow organizations to address business priorities quickly and at scale. AI improves a company’s cost base by augmenting human capabilities for greater efficiency, and also helps to increase or protect top-line revenue, experience and engagement. The importance of AI is evident. Key findings from the new data include:

• AI is an economic accelerator. As the pandemic continued, the introduction of AI began to correlate positively with superior business results in terms of revenue, costs and profitability, and this holds across industries and regions.
• The financial impact of AI has become apparent. Business leaders now recognize that investing in AI is directly related to the financial benefits of using AI.
• Full commitment to AI pays off. Advanced AI users have achieved proportionally greater financial returns from AI technologies.
• Investments are necessary prerequisites for progress toward more advanced AI implementation, and time is needed to evaluate it.
• AI capabilities provide short-term financial results.

The field of artificial intelligence is characterized by successive cycles of rise and decline. Advances in deep learning and other machine learning methods have opened the floodgates of artificial intelligence. AI shares have risen sharply since 2015, and startups based on artificial intelligence have become widespread. Industry leaders who did not immediately grasp the increased importance of AI were confronted with the need for it even while still trying to realize the benefits of their investments in big data, business analytics and advanced analytics. In this environment, the core set of technologies with AI at its center proved especially important during the pandemic. Several AI-related trends have become particularly salient, as reported in a recent study of the impact of companies’ digital transformations on their financial performance in the midst of the pandemic:

• Digital transformation is accelerating: the pandemic has enabled specific transformation initiatives that previously faced organizational resistance.
• Technologically savvy companies show the best results: organizations that had already deeply and meaningfully embedded technology in business operations and processes consistently outperformed competitors in revenue growth during the pandemic by an average of 6 percentage points.

AI, although central to transformation, is still underutilized. The greatest opportunities for AI lie in industries such as life sciences, banking, marketing and financial markets. Artificial intelligence technology is transforming the financial services industry around the world. Financial institutions devote significant resources to the study, development and implementation of AI-based applications in order to offer new innovative products, increase revenue, reduce costs and improve customer service. AI technology has gained significant momentum over the past decade, becoming more widespread in part due to the availability of low-cost computing power, large datasets, cloud storage and sophisticated open-source algorithms. The heads of financial institutions expect AI to become the most important driver of business in the financial services industry in the short term.

3.2 AI Properties AI technology:
• simulates the processes of human intelligence;
• analyzes data and makes suggestions based on the user's interests;
• predicts results using statistical algorithms and machine learning;
• improves machine learning systems without explicit instructions;
• designs, manufactures and operates robots;
• understands human speech as it is spoken.
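By way of illustration of the third property, the following is a minimal sketch in Python (our own illustrative example using scikit-learn and synthetic data; the feature names and the dataset are hypothetical, not drawn from the cited studies) of how a statistical learning algorithm predicts a business outcome from historical records:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical customer records: [visits_per_month, avg_basket_usd, months_active]
rng = np.random.default_rng(0)
X = rng.normal(loc=[8.0, 40.0, 12.0], scale=[3.0, 15.0, 6.0], size=(400, 3))
# Hypothetical label: whether the customer responded to a promotion
y = (0.3 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0.0, 1.0, 400) > 4.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)  # learns from historical data
print("hold-out accuracy:", model.score(X_test, y_test))
print("response probability, one new customer:", model.predict_proba(X_test[:1])[0, 1])

The same pattern, with richer features and models, underlies the customer-interaction and prediction applications discussed throughout this section.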
AI technology has already become an integral part of business operations. The consistent, evolutionary and iterative development of AI is therefore the future of the entire global economy [2]. While traditional analytics can surface insights from data, cognitive analytics turns those insights into recommendations. Cognitive systems can understand unstructured information such as images, natural language and sounds found in books, emails, tweets, blogs, and image, audio and video files. Moreover, cognitive systems can reason over data to uncover meaning, learn iteratively to expand the scope for more informed action, and interact in ways that remove barriers between humans and machines.
AI has the potential to be used in many sectors of the economy, such as automotive, energy, mining, finance, agriculture, security, transport, tourism and services. AI is increasingly changing services, performing a variety of tasks and serving as a major source of innovation, while also threatening human jobs. AI replaces work mainly at the task level, not at the job level, starting with "lower" (simpler for AI) intellectual tasks. The progression of AI task replacement from lower to higher intelligence leads to predictable shifts over time in the relative importance of intelligence for service employees. "Eventually, AI will be able to perform even intuitive and sensitive tasks, which provides innovative ways to integrate humans and machines to provide services" ([12], p. 171). There is a "feeling economy" in which AI performs many analytical and thinking tasks ([13], p. 43).
On April 8, 2019, the High-Level Expert Group of the European Commission ("HLEG") on Artificial Intelligence released the final version of its Ethics Guidelines for Trustworthy AI ("Guidelines") [14]. The Guidelines set out the basis for achieving trustworthy AI and offer recommendations on two of its fundamental components: (1) AI should be ethical, and (2) it should be robust, both from a technical and a socio-economic point of view.
AI turns traditional business into digital business, uniting people in the process of production and consumption of goods and services and making them partners in the strategic development of the country:
• it diversifies the economy, which leads to a greater variety of products;
• it makes information profitable;
• it reduces costs, introduces non-standard business models, and offers a high level of service to customers and contractors, which forms the basis of the digital economy.
"The system of collaboration in the Russian economy contributes to:
• selection of priority directions for the development of science, technology and critical technologies as an element in solving the strategic tasks of improving the technological structure of production;
• effective integration of the Russian scientific and technical potential into the planetary system of innovative connections;
• ensuring the continuity of the transition from basic research to innovation; development of a holistic, end-to-end system and institutional mechanism for state support of innovation activities" [24].
4 Results

4.1 Strategy for the Introduction of AI Technology into Business Processes

The strategy for introducing AI and other cognitive technologies into business processes is developed by so-called innovators, who differ from other employees in the following respects:
• deep knowledge of cognitive technologies and concepts;
• leadership in innovation;
• recognition of the importance of implementing cognitive capabilities in their organizations;
• readiness of their industry to use cognitive computing.
Innovators deliver disproportionately strong results compared to their competitors, outperforming them both in revenue growth and in operational efficiency, and they are able to realize the value of both structured and unstructured data. Innovators primarily view AI and cognitive technologies as a driver of growth [1]. They identify customer retention, revenue growth and customer satisfaction as the key rationale for introducing cognitive technologies, and they consider cognitive capabilities the most important factors for increasing revenue and significantly improving the quality of customer service. Initially, self-learning AI systems provide deep interaction with customers, through which the technology underlying the interaction is recognized, studied and continuously improved.
Strategic leaders cannot learn from successful efforts alone; they need to recognize the types of failures that turn into successes. They also need to learn how to manage the tensions associated with uncertainty and how to recover from failure in order to try new ventures again [18]. Subsequently, AI can empower employees and increase productivity by automating repetitive tasks. "Companies as diverse as Walmart, UPS and Uber have found ways to use AI technology to create new profitable business models" [5]. Government involvement is necessary to maximize the effects of AI ([23], p. 1209).
Strategic, large-scale investments in AI-enabling capabilities, driven by specific AI initiatives, are beneficial in themselves: companies begin to realize value even before AI is introduced. For organizations struggling with the pandemic day in and day out, this point is perhaps the most intriguing. While an enterprise's path to greater maturity is neither simple nor immediate, the journey can begin with tangible steps. For example, deploying an AI-supported virtual agent, or re-optimizing demand forecasting using an intelligent recommendation engine, can quickly deliver a positive return; a sketch of such a forecasting step is given below. This not only brings benefits today but also lays the foundation for additional, accelerated returns from AI in the future.
As with other actions in the current state of emergency, it is necessary to gain new knowledge about AI. Almost all of the executives surveyed say they expect continued business turmoil. Many say that operational efficiency is actually improving as a result of the pandemic. Nevertheless, the same executives express a growing fear that the scale of innovation is declining, and declining quickly. Turning a disruptive environment into a catalyst for AI ingenuity can allay these concerns, helping leading AI adopters identify the winning strategies of tomorrow.
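The forecasting step mentioned above can be made concrete with a minimal sketch, assuming only a generic weekly demand series (our own Python illustration with synthetic data and scikit-learn; it does not describe the recommendation engines of any surveyed company):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical weekly demand with annual seasonality and noise
rng = np.random.default_rng(1)
weeks = np.arange(120)
demand = 100 + 10 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0.0, 3.0, 120)

# Supervised framing: predict this week's demand from the previous three weeks
X = np.array([demand[t - 3:t] for t in range(3, 120)])
y = demand[3:120]

model = GradientBoostingRegressor(random_state=1).fit(X[:-4], y[:-4])
forecast = model.predict(X[-4:])  # one-step-ahead forecasts for the last four weeks
print("forecast:", np.round(forecast, 1))
print("actual:  ", np.round(y[-4:], 1))

Even automation at this modest level is the kind of tangible first step the text describes; a production system would add promotions, prices and cross-product signals as features.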
4.2 AI Technology and the Russian Economy

Russia and the whole world are experiencing the greatest catastrophe of the twenty-first century: the COVID-19 pandemic. In this regard, the whole world must address the tasks of AI development at an accelerated pace. The ongoing global economic crisis in the context of the pandemic is becoming more acute and is worsening the situation worldwide. Problems such as international terrorism, social stratification, environmental disasters and growing cyber threats have escalated and are rapidly leading humanity towards major disruptions (turbulence) in the global market system, especially against the background of the unpredictable consequences of the COVID-19 pandemic. Overcoming the pandemic demonstrates the importance of global economic integration, which rests on a sustainable partnership among governments, and between governments and the private sector, aimed at growing a competitive economy through the introduction of AI and the transformation of economic relations. The restrictions on business activity and on the free movement of people caused by the COVID-19 pandemic require changes in the coordination of actions between countries. A new quality of economic globalization is emerging in the pursuit of the digitalization of business.
In this regard, Russia is trying to develop a new model of economic development based on artificial intelligence technology. Russian government organizations are investigating the role of AI technology as a driver of business process growth, analyzing industry-specific problems of AI implementation and looking for possible solutions. Russia is striving to keep pace with the world's advanced countries in promoting AI development. In particular, Russia's overall progress in AI technology and applications should keep pace with China and the US, as the AI industry becomes an important locus of economic growth. Russia hopes to make significant progress not only in next-generation AI technology, but also in technologies such as big data, swarm intelligence, hybrid advanced intelligence and autonomous intelligent systems. These keys to success would help overcome Russia's low annual GDP growth rates: the annual growth rate of Russia's GDP in the first quarter of 2021 was −0.7% [16]. But Russia has the technological reserves and human competencies to escape this situation [27]. What is needed is a concerted effort by business and government to achieve major breakthroughs in the field of AI and to make AI the main driver of industrial progress and economic transformation.
The full power of the digital economy is determined by the development of the digital sector, where digital products and new technologies are directly created; these are now a new "oil" in Russia, whose value increases over time. Hence, the time has come to apply the full force of economic development to a new, digital type of production. To this end, it is necessary to attract Western business partners engaged not in finance but in engineering and in strategies for developing digital products based on the digital model of the Russian economy.
It is therefore no coincidence that most research in the field of the digital economy is devoted to analyzing digital technologies, company platforms, new forms of public administration, and the reorganization of the joint activity of economic entities across the reproductive phases: production, distribution, exchange and consumption. If a gap opens between these phases, turbulence in economic development inevitably arises.
Digital technologies can eliminate such gaps, as they support the rapid and competent adoption of accurate decisions in a changing internal and external business environment. This is because a digital object can be copied many times and deployed across a country, groups of countries and the entire planet. Integrated digital technologies embedded in the reproduction process allow companies to achieve greater benefits at lower costs, identify and analyze valuable information, plan strategies, predict results and cooperate within the framework of world experience. When we speak about the digital economy, we are not talking about numbers as such, but about digital technologies that transform the economy through the digital representation of information and thereby drive economic growth.
The Russian government will invest in a number of AI projects, encourage private-sector investment in AI, and create a national AI development fund. Importantly, the plan will also cultivate high-class talent, recognized as an integral element of national competitiveness in the field of AI. Looking to the future, Russia intends to conduct high-quality research and development in order to build on its inherent national advantages through both domestic and international "innovative resources". "I really hope," said Alexey Samsonovich, a professor at MEPhI, "that artificial intelligence will be free from human shortcomings. Now, against the background of the development of biological and genetic weapons, artificial intelligence is the most harmless of the upcoming discoveries. I think it will be a big step forward, a big event for humanity" [26]. Russia, represented by the MEPhI Research Institute, aims to create emotional AI. There is no doubt that AI is the real mainstream of our time.
5 Discussion

Artificial intelligence (AI) stands out as the transformative technology of our digital age. Questions about what it is, what it can already do and, more than that, what it may become intersect with technology, psychology, politics, economics, law and ethics. AI is the subject of countless discussions and articles, from technical treatises to popular pieces on its consequences. Even as the debate continues, the technologies underlying AI continue to advance, allowing smartphone applications to recognize faces and AI algorithms to identify the causes of diabetes and hypertension with increasing accuracy. The use of AI around the world is growing rapidly and permeates our lives. Ultimately, the value of AI lies not in the models themselves, but in an organization's ability to apply them; business leaders will need to prioritize and make careful choices about how, when and where to use them.
AI is not hype but a capacity to transform the economy through technological innovation, scientific knowledge and entrepreneurship. The progressive growth of artificial intelligence over the last decade makes AI the main technology responsible for maximum automation and connectivity, and thus brings the world closer to the beginning of the fourth industrial revolution. This will have a profound impact on governments, communities, companies and individuals. The extremely high capabilities of intelligent agents in various games and in recognition and classification tasks open up opportunities for technological as well as product innovation. This leads to the development of assistive technologies and products for the disabled and the elderly.
It also contributes to the development of the toy and game industry, improving the entertainment experience and developing children's cognitive and emotional intelligence. In conclusion, it should be noted that the introduction of autonomous technologies in almost every sector, and the launch of a large number of machines and services based on artificial intelligence, will improve health, educational opportunities, safety, transport, security, trade and all other aspects of life. However, there are security, privacy and ethics issues related to the use of artificial intelligence technology that require close attention. The innovation process and global competitiveness are enhanced as corporate firms (companies and startups) adopt various strategies to become AI firms; the underlying intention is to build on the most advanced artificial intelligence technologies and win the technological race. It is advisable to investigate thoroughly the leading automation and artificial intelligence industries that will create the most opportunities in the near future, namely healthcare, cybersecurity, basic artificial intelligence, business analytics, marketing and sales. There is no doubt that AI has the potential to change the economy.
6 Conclusion

1. AI technology provides a competitive advantage for accelerating business processes if its strengths and weaknesses are well studied. The AI strategy should be well coordinated between the AI technology and the concepts of each business function, including data selection, determining the relationship between task concepts and technology, and fine-tuning the AI system.
2. AI technologies should be introduced into specific areas of activity only after thorough study, since automated solutions can destroy a company's reputation if ethical and regulatory safeguards do not work well.
3. The business strategy for introducing AI should be aimed at ensuring that results correspond to business goals and to the accelerated tasks of business processes whenever a new competitor appears on the market, or interest rates, customer preferences or global trends change.
4. AI features widely in academic and practical discussions of its potential to revolutionize business processes and institutional systems. To date, however, as Russian practice shows, companies are not quite ready to live up to the transformative claims made about AI. To help them move along this path, the Russian state must provide conditions for the accelerated introduction of AI in the context of digital transformation.
5. Russia calls for encouraging cooperation between domestic AI enterprises and leading foreign universities, research institutes and teams. Russia will encourage its own AI enterprises to pursue foreign mergers and acquisitions, equity investments and venture capital, and to create research centers abroad, while also encouraging foreign AI enterprises to create their own research centers in Russia. Ultimately, Russia's AI agenda reflects its ambition to take a leading role in the emerging international competition in this crucial technological field. To succeed in AI, Russia must create leading innovation and training bases, while building a more comprehensive legal, regulatory, ethical and policy framework. AI is a fundamental and absolutely necessary economic accelerator for the development of future business opportunities.
In the context of the COVID-19 pandemic, AI in Russia should take on a "scenario mission" to lead the economy out of the current state of global turbulence.
Acknowledgements. This work is supported by the National Research Nuclear University MEPhI. The reported study was funded by RFBR and BRFBR, project no. 20-510-00029, "Methodology of formation of cross-cluster interactions in the innovation sphere and their infrastructure in integration groupings".
References

1. Abercrombie, C., Ezry, R., Goehring, B., Marshall, A., Nakayama, H.: Accelerating enterprise reinvention: How to build a cognitive organization. IBM Institute for Business Value, June 2017. https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=GBE03838USEN
2. Agrawal, A., Gans, J.S., Goldfarb, A.: What to expect from artificial intelligence. MIT Sloan Manage. Rev., 7 February 2017. https://sloanreview.mit.edu/article/what-to-expect-from-artificial-intelligence/
3. Bataller, C., Harris, J.: Turning artificial intelligence into business value. Today (2016). https://pdfs.semanticscholar.org/a710/a8d529bce6bdf75ba589f42721777bf54d3b.pdf
4. Brock, J.K.U., von Wangenheim, F.: Demystifying AI: What digital transformation leaders can teach you about realistic artificial intelligence. Calif. Manage. Rev. 61(4), 110–134 (2019)
5. Brynjolfsson, E., McAfee, A.: The business of artificial intelligence. Harvard Bus. Rev., 26 July 2017. https://hbr.org/cover-story/2017/07/the-business-of-artificial-intelligence
6. Brynjolfsson, E., Mitchell, T.: What can machine learning do? Workforce implications. Science 358(6370), 1530–1534 (2017). https://doi.org/10.1126/science.aap8062
7. Chui, M., Manyika, J., Miremadi, M., Henke, N., Chung, R., Nel, P., Malhotra, S.: Notes from the AI frontier: Insights from hundreds of use cases. McKinsey Global Institute (2018). https://www.mckinsey.com/~/media/mckinsey/featured%20insights/artificial%20intelligence/notes%20from%20the%20ai%20frontier%20applications%20and%20value%20of%20deep%20learning/notes-from-the-ai-frontier-insights-from-hundreds-of-use-cases-discussion-paper.ashx
8. Chui, M., Manyika, J., Miremadi, M.: What AI can and can't do (yet) for your business. McKinsey Quarterly, January 2018. https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/what-ai-can-and-cant-do-yet-for-your-business
9. CNBC: Putin: Leader in artificial intelligence will rule world. CNBC, 4 September 2017. https://www.cnbc.com/2017/09/04/putin-leader-in-artificial-intelligence-will-rule-world.html
10. Gillham, J., Rimmington, L., Hugh, D., Verweij, G., Rao, A., Roberts, K.B., Paich, M.: Macroeconomic impact of AI. PwC Report, PricewaterhouseCoopers (2018)
11. Gouvernement: Artificial intelligence: "Making France a leader". Gouvernement, 30 March 2018. https://www.gouvernement.fr/en/artificial-intelligence-making-france-a-leader
12. Huang, M., Rust, R.T.: Artificial intelligence in service. J. Serv. Res. 21(2), 155–172 (2018). https://doi.org/10.1177/1094670517752459
13. Huang, M., Rust, R., Maksimovic, V.: The feeling economy: managing in the next generation of artificial intelligence. Calif. Manage. Rev. 61(4), 43–65 (2019). https://doi.org/10.1177/0008125619863436
14. European Commission: Ethics guidelines for trustworthy AI, 8 April 2019. huntonprivacyblog.com
15. Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science 349(6245), 255–260 (2015). https://doi.org/10.1126/science.aaa8415
16. Key indicators of the Russian economy (2020). http://take-profit.org/statistics/countries/russia
17. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015). https://doi.org/10.1038/nature14539
18. Leitch, J., Lancefield, D., Dawson, M.: Find your strategic leaders. Leadership, 18 May 2016
19. National strategy for the development of artificial intelligence, 20 February 2021. (tadviser.ru)
20. Nosova, S., Norkina, A.: Digital technologies as a new component of the business process. Procedia Comput. Sci. 190, 651–656 (2021)
21. Nosova, S., Norkina, A.: Digital transformation as a new paradigm of economic policy. Procedia Comput. Sci. 190, 657–665 (2021)
22. Nosova, S.S., Putilov, A.V., Norkina, A.N.: Fundamentals of the Digital Economy, p. 378. KNORUS, Moscow (2021)
23. Nosova, S.S., et al.: Artificial intelligence as a phenomenon of digitalization of the economy. Econ. Entrepreneurship 13(3), 1204–1209 (2019). (intereconom.com)
24. Nosova, S.S., et al.: The digital economy as a new paradigm for overcoming turbulence in the modern economy of Russia. Espacios 39(24) (2018)
25. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach, 3rd edn. Prentice-Hall, Englewood Cliffs, NJ (2010)
26. Samsonovich, A.V.: Science on the verge of creating an "emotional" computer, 18 March 2016. (mephi.ru)
27. Sovereignty and the "figure". Russia in Global Politics 2 (2021). (globalaffairs.ru). https://doi.org/10.31278/1810-6439-2021-19-2-106-119
The Collaborative Nature of Artificial Intelligence as a New Trend in Economic Development

Svetlana Nosova1(B), Anna Norkina1, Olga Medvedeva2, Svetlana Makar3, Sergey Bondarev4, Galina Fadeicheva5, and Alexander Khrebtov6

1 National Research Nuclear University "MEPhI", Kashirskoe Shosse 31, 115409 Moscow,
Russian Federation
[email protected]
2 The State University of Management, Ryazansky Prospekt 99, 109542 Moscow, Russian Federation
3 Financial University under the Government of the Russian Federation, Leningradsky Prospect 49, 125599 Moscow, Russian Federation
4 Plekhanov Russian University of Economics, Stremyanny Lane 36, 117997 Moscow, Russian Federation
5 Academy of Labor and Social Relations, Lobachevsky Street 90, 119454 Moscow, Russian Federation
6 Belarusian State Technological University, Sverdlova Street 13a, 220006 Minsk, Republic of Belarus
Abstract. In this article, the nature of artificial intelligence (AI) is considered as the result of its collaboration (cooperation) with people, represented by scientists from different fields of science, by companies that carry out joint actions in AI development and implementation in various fields of activity, and by the governments of different countries. The purpose of the study is to help market economy actors use AI effectively and intelligently and to move from experiments to reliable capabilities that act as a source of competitive flexibility at the scale of the organization and of the economic growth of the country as a whole. Research results: (1) stakeholders, from inventors to business leaders and government authorities, should come together to define the processes of AI development, deployment and management; (2) among economic entities, it is necessary to cultivate a better understanding of the collaborative nature of AI as a basis for obtaining significant business benefits, from improved productivity to products adapted to consumers; (3) businesses need to take on more responsibility, do more together, and open their organizations to AI implementation in order to survive and, possibly, succeed further; (4) the collaborative nature of AI transforms the economy, turning it into a global, dynamic, computable general equilibrium model; (5) Russia can become one of the leading countries in developing the collaborative nature of AI through international cooperation in the development and use of AI as a new reality and a promising direction for the development of the modern economy.

Keywords: Collaboration · Artificial intelligence · Business processes
1 Introduction

The collaborative nature of AI is manifested in the development of cooperation among representatives of science and education, business and the state, with the aim of using information and knowledge in the production, distribution, exchange and consumption of goods and services and of obtaining economic benefits. Researchers at the MEPhI Research Institute, led by A.V. Samsonovich, a well-known scientist in the field of cybernetics, made the timely observation that "AI should be able to establish a relationship of mutual understanding and trust with a person. A lot of work is being done on the basis of neural networks and cognitive modeling in the direction of creating a socio-emotional AI" ([24], p. 787). This formulation of the question allows us to consider AI as a new participant in cooperation, one that becomes a catalyst for broad structural transformations, since "economies using AI not only do things differently, but will also do different things" [2].
The collaborative nature of AI requires changing the way we think and introducing a new concept called "collaborative thinking", meaning AI cooperation with other players in the ecosystem, which provides favorable conditions for a unique network synergy effect in its development. In this respect, collaborative thinking is implemented in trends such as:
• assisting scientific and educational organizations in accelerating economic development;
• establishing partnerships between the public and private sectors to solve global problems;
• cooperating with international organizations and states to develop global AI-based collaboration.
Collaborative thinking is one of the main conditions for industrial development and a critical factor in promoting the integration of new technologies such as the Internet of Things, cloud computing and blockchain, big data and Industry 4.0. The study of the collaborative nature of AI proceeds through systematic analysis, from basic mechanisms to practical applications, from fundamental algorithms to industrial achievements, and from the current state to future trends in AI development. To get the most out of AI, firms need to understand which technologies create "a priority portfolio of projects based on business needs and develop company-wide scaling plans" ([8], p. 108). Several distinct behaviors are emerging among successful AI adopters:
• A model of strategic, top-level thinking: AI leaders maintain a balanced focus on profitable growth initiatives supported by AI, with a clear prioritization of functions, workflows and use cases that directly support the business strategy. They go beyond one-dimensional cost-based automation, even when faced with economic obstacles.
• A model of prioritized, targeted investment in AI capabilities: AI leaders make large-scale, phased investments to create basic AI-readiness capabilities. They prioritize AI and data projects that generate cash flows, rather than massive, quixotic and often uncoordinated efforts. They seek and realize tangible, quantifiable benefits.
• A human-centered approach model: AI leaders adhere to an ethical and thoughtful approach, including design, attracting talent and developing technical skills throughout the enterprise. They take a holistic approach to AI strategy, operating-model development, team building and culture, because AI is not just a technological game.
Companies are often held back by ambiguous strategic goals related to AI. Many organizations are burdened with outdated technical systems and inflexible organizational structures that add unnecessary complexity to AI planning and execution. Others simply forget that technology is there to help people, and fail to put people at the center of their AI efforts. As is well known, large companies have already made huge strides in data analysis and storage. But "companies that don't understand the premise or don't prioritize the perspective of design thinking may end up paying a high price" [14]. Therefore, "scientists and developers continue to design and develop intelligent machines that can simulate reasoning, develop and study knowledge, and try to simulate how people think" [26]. There are many problems with AI, and the development of AI collaboration will undoubtedly play an important role in solving them at an accelerated pace, especially across a wide range of applications and areas. "If you ask who will be the winner after the pandemic, then you can only answer this way: if one of the countries wins, especially in the race for artificial intelligence, as the most advanced technology used for data processing, and therefore for influencing processes, then the whole world may fall under the power of this state, because it will be the 'ruler of the world'" ([29], p. 657).
2 Theoretical Foundations

2.1 Collaborative Nature of AI: Theoretical Aspect

"Artificial intelligence" is a term that has attracted many different interpretations. Its origin can be traced back to 1956, when the book "Automata Studies" was published, which included now well-known articles in the field of cybernetics [3]. The term "artificial intelligence" broadly refers to the application of technology to perform tasks that resemble human cognitive functions, and it is usually defined as "the ability of a machine to simulate intelligent human behavior". Although such definitions give a general idea of the meaning of the term, there is no single, generally accepted definition of AI. In practice, AI is used as an umbrella term covering a wide range of different technologies and applications. Based on an analysis of the applicability of AI in various spheres of life, it has been found that AI comprises software systems that "make decisions and usually require a human level of knowledge, help people anticipate and solve problems as they arise. As such, they act intentionally, intelligently and adaptively" [22]. As Peter M. Senge correctly noted, "all societies are the product of their epoch and, in turn, create their epoch" [23]. Indeed, today the whole world is moving towards living in the era of artificial intelligence. In real life, from the point of view of macroeconomics, AI is recognized as one of the most innovative areas in the modern world [13].
To date, the use of artificial intelligence has attracted a great deal of attention from researchers and practitioners seeking to open up a wide range of useful opportunities for its use in business processes and in the economy as a whole. The potential of AI will lead to significant economic growth in those countries where companies use it massively in their fields of activity. Businesses now need to move actively towards the development of automated technical systems. The effective use of AI requires organizations to solve key data problems, including creating effective data management, defining ontologies, designing data "pipelines" from data sources, and managing regulatory constraints. Given the significant computational requirements of deep learning, some organizations will maintain their own data centers because of regulations or security concerns, but the capital costs can be significant, especially when specialized equipment is used. On the technical side, organizations will have to develop reliable data maintenance and management processes. Companies will have to address the "first mile", i.e., how to collect and organize data, as well as the "last mile", i.e., making sure that the insights provided by AI are embodied in the behavior of the people and processes of the enterprise.
AI represents huge opportunities for economic development. A project undertaken by PricewaterhouseCoopers estimated that "artificial intelligence technologies can increase global GDP by $15.7 trillion, which is as much as 14%, by 2030" [18] (a 14% lift of this size implies a baseline 2030 global GDP of roughly $112 trillion, since 15.7/0.14 ≈ 112). China is making rapid strides in AI development because it has set a national goal of "building a domestic industry worth almost $150 billion" by 2030 [17]. More generally, the United States, China, North Korea and other countries invest significant resources in AI ([4], p. 1).

2.2 Collaborative Nature of AI: Practical Aspect

First developed in the early 1940s, artificial intelligence technology has gained significant momentum over the past decade and has become more widespread in part due to the availability of low-cost computing power, large datasets, cloud storage and sophisticated open-source algorithms. The growing role of collaboration with AI is ushering in new, knowledge-based businesses. It manifests itself as a tendency to increase a company's advantages, promote its products further, and increase profits. An AI-based business differs from its predecessors in behavior, style and culture; it requires new management methods, strategies and codes of state regulation. These differences are most visible in high technology and in the service sector, i.e., wherever AI is applied. Collaboration with AI should be used to help managers understand the multiplicative effect its application has on the company's work ([31], p. 77). In this regard, it is necessary to counter negative attitudes towards the use of AI when efforts to use it encounter real barriers that can reduce the desired result. We need to move on to further investment in AI, seeing how others are moving forward. As has been established, AI creates a "moving target" problem: it is hard to hit a target that is moving. This is discouraging, but also inspiring: some manage to hit the target and some do not. Hence there is a gap between leaders and laggards in the development and application of AI. To close this gap, it is necessary to strengthen the cooperation of market economy entities with AI, which is the subject of our study.
Cooperation with AI is a powerful tool for the development of modern business, since it:
• allows the development of new products and new, effective business processes;
• contributes to the rapid spread of innovation throughout an industry;
• creates significant economies of scale;
• increases benefits as the use of AI expands.
Some tech giants, such as Google and Amazon, warn against uncritical enthusiasm for artificial intelligence. Everyone now regards AI as a convenient thing, but people tend to ignore the dangerous consequences that should be taken into account. In this regard, it is necessary to show not only the advantages of AI, but also the risks it can pose to society [21]. As artificial intelligence systems become more powerful and more general, they may outperform human performance in many areas. This can lead to extremely positive developments, but it can also pose catastrophic risks, which explains why the future of AI, despite its advantages, remains uncertain. Nevertheless, modern business cannot develop without the use of AI, since "the rapid introduction of AI in comparison with competitors will bring greater profit potential, which improves the business rationale of AI and, therefore, further encourages firms to implement it" ([6], p. 54).
AI technology is transforming the financial services industry around the world. Financial institutions devote significant resources to the study, development and implementation of AI-based applications in order to offer innovative products, increase revenue, reduce costs and improve customer service. In a recent survey-based report, the heads of financial institutions noted that AI will become the most important driving force of business in the financial services industry in the short term, with 77% of all respondents expecting AI to have high or very high overall importance for their business within two years. Broker-dealers study and implement AI-based applications in various functions of their organizations, including customer service, investment and operations. In July 2018, FINRA requested comments from industry representatives about potential problems related to the use and control of artificial intelligence applications in broker-dealer firms. In response, the commentators recommended that FINRA conduct a broad review of the use of AI in the securities industry in order to better understand the various applications of the technology, the problems associated with them and the measures taken by broker-dealers to address these problems. Based on this feedback, FINRA, through its Office of Financial Innovation (OFI), has engaged in an active dialogue with the industry over the past year and has held meetings with more than two dozen market participants, including broker-dealer firms, scientists, technology providers and service providers, to learn more about the use of AI in the securities industry.
3 Results

3.1 The Impact of AI Collaboration on Business Processes and Economic Development in General

Practice shows that specific AI technologies can be combined with advanced predictive and descriptive analytics and complemented by robotics and other forms of automation. As a result, AI technologies and cognitive computing provide completely new types of interaction with customers and clients, and transform, or indeed rethink, strategic innovation and business. Cognitive computing refers to a new generation of information systems that understand, reason, learn and interact with people more naturally than traditional programmable systems.
Better algorithms, the mass availability of data, and more substantial hardware, which allow AI to approach human cognitive abilities in visual recognition and natural language processing, should be analyzed within a conceptual approach that treats the development of collaboration with AI as a factor of economic growth in the context of digitalization. At the same time, a number of factors that determine consumer behavior in a market economy become important: the risk of buying online, price and promotion, and the variety of products.
AI technologies provide a competitive advantage for firms if their strengths and weaknesses are well studied. The AI strategy should be well coordinated between the AI technology and the concepts of each business function, including data selection, determining the relationship between task concepts and technology, and fine-tuning the AI system. Relying heavily on AI capabilities can lead to unintended consequences arising from biased data or undisclosed ethical situations; when these risks are managed, AI provides opportunities to improve all business processes. AI technologies should be introduced into specific areas of activity only after careful study, since automated solutions can destroy a company's reputation if ethical and regulatory safeguards do not work well. Finally, we have confirmed that AI technologies can be used in many sectors, such as healthcare, automotive, energy, mining, finance, agriculture, security, IT, transportation, retail and e-commerce, education and insurance.
The business strategy for the implementation of AI should be aimed at ensuring that results correspond to business goals and to the objectives of business processes. If there is no AI implementation strategy, one should be developed. It is useful to revisit the strategy whenever a competitor appears on the market, interest rates or customer preferences change, or global trends shift; if a strategy exists, it is a good idea to keep business processes up to date in case things change.
Russia aims to create a strong AI. The idea of an economic perspective for AI development can help managers determine the timing of investment and the budget share of AI implementation. In other words, it involves defining an investment strategy based on future assessments of the economic impact of AI and on the economic indicators of the company associated
with AI. While AI is not yet endowed with the full range of human cognitive capabilities, its early applications will be in routine, high-volume tasks that do not require intuitive intelligence. Finally, we summarize the consequences of using collaboration with AI in business processes and in the economy as a whole (see Fig. 1).
Fig. 1. Results of the impact of AI collaboration on business processes and the digital economy as a whole. Source: compiled by the authors.
Taking the above factors into account, it is important to emphasize that AI technologies should be introduced into specific areas of activity only after careful study, since automated solutions can destroy the reputation not only of a company but of the economy as a whole if ethical and regulatory rules do not work well. The goal of introducing AI into business processes is to cultivate a better understanding of the task at hand and to use this understanding as a basis for future action [25]. In particular, these reflections relate directly to the introduction of AI into economic activity where it comes to overcoming market failures (fiascos): to working together, experimenting, taking on more responsibility and doing more together, not only to help overcome the turbulent state of business processes, organizations and the economy as a whole, but also to open them to collaboration and to new remedies for social and business problems and failures, which matters for survival and perhaps even for future success. It is fair to note that under the influence of the COVID-19 pandemic the borders of national economies are being erased, which leads to the creation of new forms of collaboration, in particular "regional economic zones and the single economic space as conditions for survival in modern conditions" ([29], p. 652). As for the modern era, it has rightly been said that "it is necessary to assemble a team of potential strategic leaders with a collective task (emphasis ours), i.e. to create a fully developed solution to the problem or design a new critical potential and a way to generate it. Give them a small budget and a tentative deadline. Then carry out assessments
with the help of in-depth analysis" [23]. The field of AI is developing rapidly. In recent years there have been breakthroughs in image and speech recognition, autonomous robotics, language tasks and games, and significant progress is likely in the coming decades. This promises great benefits: new scientific discoveries, cheaper and better goods and services, medical advances. But people should be able to exercise supervision and control over AI systems. First of all, AI should be governed by all the laws that have already been developed for human behavior [12]. Above all, people should be aware of the privacy risks associated with AI.

3.2 Collaborative AI Implementation Strategy

Creating a collaborative AI implementation strategy confers definite advantages in the market. Such a strategy helps to find the keys to success and sets the direction for achieving goals. It can also help expand the production of AI products or services. Without a collaborative AI implementation strategy, individual decisions are often made that contradict one another and ultimately worsen the firm's financial and competitive position. The collaborative nature of AI should be aimed at ensuring that results correspond to business goals and to the objectives of business processes. If no such AI implementation strategy exists, it would be wise to develop one. Three key elements of a collaborative AI implementation strategy need to be developed:
• the goal that the business is striving for;
• the scope of business activity;
• the advantage that makes the business unique.
The goal of a competition-based collaborative AI strategy is to achieve rapid and profitable growth in the right way. This requires leaders. As a rule, such leaders share several personality traits: they can challenge the prevailing point of view without provoking outrage or cynicism; they can change course if their chosen path turns out to be wrong; and they lead with inquiry as well as advocacy, and with engagement as well as command, acting all the time out of deep-rooted humility and respect for others. Finding a strategic fit for AI technology means achieving the set goals of improving competitive advantage. Governments should therefore consider a range of measures to promote the introduction of AI, covering skills development, competition, taxation and employment. But there is a general lack of reliable evidence and statistics to support data-driven policy development and effective monitoring of progress. "The world scientific community has come a long way from the development of artificial intelligence as a concept to its modern appeal as an area with almost limitless potential in changing the ways of carrying out activities in a functioning society. The ultimate frontier for artificial intelligence systems is still the achievement of a level of complexity corresponding to the level of the human mind" ([7], p. 23).
AI and ITCs. Today, the development of innovation and territorial clusters (ITCs) is of the greatest relevance for the Russian economy as the most important condition for the transition to a new technological order in the coming decades of the twenty-first century and for the sustainable economic development of Russia, achieved by minimizing problems and strengthening its potential in the ecosystem. The ITC, under the influence of AI,
strengthens the collaboration of economic relations between research and educational organizations, business, development institutions and the state, and also contributes to adapting the resulting model of managing joint activities to the problems of sustainable economic development in the context of the digital transformation of the Russian economy. All the prerequisites for this are in place: the rapid development of digital technologies and, with them, the growth and rapid advancement of AI, without which there is no future. It can therefore be said that ITCs with high technological potential and highly qualified specialists should serve the Russian economy as a source of digital innovations, including AI, creating new jobs, increasing the viability of their territories and ensuring the food security of the country as a whole.
To help an ITC work effectively in a complex environment, it is proposed to create applications and deploy them anywhere, to monitor services, and to ensure the security of the entire organization. Within an ITC, a company can:
• get private access to services launched in other regions;
• expand its own services by placing them behind the load-balancing subsystem to implement a private channel;
• access Azure Sentinel.
Azure Sentinel is a scalable, cloud-based security information and event management (SIEM) system. It gives the IT department access to real-time security analytics and threat analysis throughout the organization and serves as a single solution for detecting alerts, identifying and proactively hunting for threats, and responding to them. Data leaks continue to have a negative impact on business, so rapid detection and remediation are essential to the security of the infrastructure. Azure Sentinel collects data across the hybrid cloud architecture and from other cloud service providers, supporting multi-cloud strategies. By combining global and industry-specific threat intelligence, the platform can also detect experienced attackers and reduce the number of false positives. Azure Sentinel uses artificial intelligence (AI) to help companies respond quickly and effectively to every threat. Azure Sentinel improves investigation and detection processes thanks to the stream of analytical information about threats from Microsoft, and it allows organizations to create their own analytics using AI and machine learning.
Azure Sentinel capabilities (a cloud SIEM + SOAR system):
• collection of security data throughout the organization;
• threat detection using an extensive database of threat analytics;
• AI-based investigation of critical incidents;
• response and automation of protection.
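To illustrate the kind of AI-based anomaly detection such platforms perform, here is a minimal sketch (our own Python illustration with scikit-learn and simulated telemetry; it does not reproduce Azure Sentinel's actual algorithms, and the feature names are hypothetical):

import numpy as np
from sklearn.ensemble import IsolationForest

# Simulated network telemetry: [requests_per_min, failed_logins, mb_sent]
rng = np.random.default_rng(2)
normal = rng.normal(loc=[20.0, 1.0, 5.0], scale=[5.0, 1.0, 2.0], size=(500, 3))
attack = rng.normal(loc=[200.0, 30.0, 80.0], scale=[20.0, 5.0, 10.0], size=(5, 3))
events = np.vstack([normal, attack])

# An isolation forest scores how easily each event is isolated; outliers isolate fast
model = IsolationForest(contamination=0.01, random_state=2).fit(events)
labels = model.predict(events)  # -1 = anomaly, 1 = normal
print("flagged event indices:", np.where(labels == -1)[0])

In a production SIEM the features would be far richer and the flags would feed an analyst queue rather than a print statement, consistent with the focusing of analysts' attention discussed below.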
Thus, the main task for IT organizations is to provide a truly integrated solution for their developers, users and administrators. To help enterprises cope with the increasing complexity of cluster management, ITC managers need to work out the principles of administrative management in AI mode in order to increase productivity and flexibility without compromising security or compliance with ITC requirements, and thus to keep up with the pace of innovation growth in the economy.
The Use of AI in Cybersecurity. With the help of AI in the collaboration system, significant success can certainly be achieved in a number of areas, one of which is cybersecurity. Many cybersecurity service providers currently offer products that use AI and machine learning (ML) to help detect and respond to cyber threats. According to the US Department of Homeland Security, "a large American bank uses machine learning to track callers, robocalls and potentially fraudulent calls" [16]. Detecting most cyber threats requires analyzing huge amounts of data in search of anomalies or indicators of a possible attack [5]. After a potential threat is detected, further data analysis is required to determine the details of the attack, the impact of the breach and the consequences for computers. All of this requires a great deal of number crunching and combing through the data for anomalies. Efforts to hijack AI in order to obtain confidential information should be severely punished as a way of deterring such actions ([15], p. 3).
The main reason why AI is included in the cyber domain is that it is a scalable way to ensure the security of an organization and of the country as a whole. It acts as a magnifying glass for human efforts: the AI system ensures that analysts' attention is focused where it is most needed. In a rapidly changing world, where many organizations have advanced computing capabilities, serious attention must be paid to cybersecurity. Countries should take care to protect their own systems and to prevent other countries from harming their security [10]. The technology of AI in the collaboration system is still developing, and the full range of possible applications of AI in the cybersecurity space is still poorly understood. However, many AI-based systems are already being used to help protect computers and networks [11].
Fraud, threats related to data protection, new cyber threats, legal and reputational risks, and a shortage of AI specialists pose a real danger to business. Among the drawbacks of digital transformation, cyber fraud stands out: fraud committed using computer networks, information and communication systems and the Internet. "The Internet is widely used to promote extremist ideas and movements" [14]. Currently, cyber fraud occupies a leading place in the list of losses experienced by organizations, and the introduction of AI supports the fight against it. Here we face a dilemma of AI versus AI: AI is first introduced into socio-economic life, and then, when the results of its activity are exploited by cybercriminals for criminal actions, a new AI becomes necessary to root out the abuse. The conclusion is to implement AI rather than merely debate its usefulness. Despite the drawbacks of the digital transformation of the economy, we must admit that thanks to AI, Russia can move away from gas and oil dependence and achieve an exclusive position in the system of international economic relations. And, despite the risks, the introduction of AI should be woven into business processes continuously. "In this case, it should be about creating a strategy that will help make the best business decisions on the use of AI technology to meet the limitless needs of the population" [27]. Overall, it follows from all of the above that the digital transformation of the economy is a promising area of research.
4 Discussion

If the issues of regulation and reliability of AI collaboration are not carefully considered, firms' reputations can be destroyed by the adverse impact of a product or service. AI shapes global competitiveness for the future, promising its adopters a significant economic and strategic advantage. Today, national governments and regional and intergovernmental organizations are striving to develop AI-oriented policies in order to maximize the prospects of AI technology and to address its social and ethical implications [28]. Research shows that AI can increase consumer demand by providing personalized and/or better products or services. Related studies focus on consumer decisions about whether to use the Internet and mobile networks to purchase products. In this case, the essence of the AI decision-making mechanism depends on data collected from various sources, such as customers, transactions, sensors and devices. However, automated solutions built on biased data can lead to undesirable consequences. On the other hand, many problems related to the development of AI collaboration can be effectively solved only at the regional or international level. All development cooperation agencies should consider how to integrate collaboration fully into AI in order to achieve the intended goals.
5 Conclusion

1. Better algorithms, the mass availability of data and more substantial hardware allow us to regard the collaboration of science, business and government in the AI system as an impetus for economic growth in the strategic development of the country as a whole.
2. The collaboration strategy in the AI system should be well coordinated between AI technology and the concepts of each business function, including data selection and determining the relationship between task concepts and technology, providing for the fine-tuning of the digitalization of business processes in various fields of activity with careful study, since automated solutions can destroy a company's reputation if ethical and regulatory institutions do not work well. If no collaboration strategy exists in the AI system, one should be developed, and it is useful to revisit it whenever a competitor appears on the market, interest rates or customer preferences change, or global trends shift.
3. Collaboration in the AI system contributes to the formation of global competitiveness in the coming decades, promising those who introduce AI into business processes a significant economic and strategic advantage. Today, national governments and regional and intergovernmental organizations are striving to develop policies aimed at strengthening collaboration in the AI system in order to maximize the prospects of the technology and to address its social and ethical consequences.
4. Collaborative AI cybersecurity systems will continue to play an important role in managing cyber attacks in the coming years. Using machine learning, organizations will be able to detect security breaches readily and ensure that information security authorities take the necessary measures in advance.
Acknowledgements. This work is supported by the National Research Nuclear University MEPhI. The reported study was funded by RFBR and BRFBR, project no. 20-510-00029 "Methodology of formation of cross-cluster interactions in the innovation sphere and their infrastructure in integration groupings".
References

1. Artificial Intelligence. WIPO Technology Trends 2019. World Intellectual Property Organization, Geneva (2019)
2. Artificial intelligence for industrial growth for now and for the future. IoT & AI World Summit Eurasia (2019). (iotsummiteurasia.com)
3. Shannon, C.E., McCarthy, J. (eds.): Automata Studies, p. 285. Princeton University Press, Princeton, NJ (1956)
4. Barton, D., Woetzel, J., Seong, J., et al.: Artificial Intelligence: Implications for China, p. 1. McKinsey Global Institute, New York (2017)
5. Brundage, M., et al.: The Malicious Use of Artificial Intelligence. University of Oxford, unpublished paper, February 2018
6. Bughin, J., Seong, J., Manyika, J., Chui, M., Joshi, R.: Notes from the AI Frontier: Modeling the Impact of AI on the World Economy. MGI Discussion Paper, p. 56. McKinsey Global Institute (2018)
7. Chen, N., Christensen, L., Gallagher, K., Mate, R., Rafert, G.: Global Economic Impacts Associated with Artificial Intelligence. Analysis Group (2016)
8. Davenport, T.H., Ronanki, R.: Artificial intelligence for the real world. Harvard Bus. Rev. 96(1/2), 108–119 (2018)
9. Digital Acceleration in the Time of COVID-19. IBM Nordic Blog, 7 December 2020
10. The Economist: The Challenger: Technopolitics, 17 March 2018
11. Ethical Considerations in Artificial Intelligence and Autonomous Systems, unpublished paper. IEEE Global Initiative (2018)
12. Etzioni, O.: How to Regulate Artificial Intelligence. New York Times, 1 September 2017
13. Gillham, J., et al.: Macroeconomic Impact of AI. PwC Report. PricewaterhouseCoopers (2018)
14. How experts like Bo Zou from Toronto look at Design Thinking (2018). (digitalconnectmag.com)
15. Markoff, J.: As Artificial Intelligence Evolves, So Does Its Criminal Potential. New York Times, p. 3, 24 October 2016
16. Maughan, D.: Testimony before the House Committee on Oversight and Government Reform Subcommittee on Information Technology, 7 March 2018
17. Mozur, P.: China Sets Goal to Lead in Artificial Intelligence. New York Times, p. B1, 21 July 2017
18. PricewaterhouseCoopers: Sizing the Prize: What's the Real Value of AI for Your Business and How Can You Capitalise? (2017)
19. Purdy, M., Daugherty, P.: Why Artificial Intelligence Is the Future of Growth. Accenture Institute for High Performance (2016)
20. PwC AI Predictions. trust-infographic.pdf (2019). (pwc.ru)
21. Risks from artificial intelligence (2021). (cser.ac.uk)
22. Schubhendu, S., Vijay, J.: The applicability of artificial intelligence in various spheres of life. ai-report-061020.pdf (2013). (finra.org)
23. Senge, P.: Leadership in Living Organizations (1999). (scribd.com)
24. Samsonovich, A.V.: Toward a socially acceptable model of emotional artificial intelligence. Procedia Comput. Sci. 190, 771–788 (2021)
25. The uncertain future of artificial intelligence. In: 8th International Conference on Cloud Computing, Data Science and Engineering (Confluence) (2018). https://www.semanticscholar.org/paper/The-Uncertain-Future-of-Artificial-Intelligence-Dasoriya-Rajpopat. https://doi.org/10.1109/CONFLUENCE.2018.8442945
26. Top 10 trends in artificial intelligence technologies for 2020. AI Trends (2020). (powerzeka.com)
27. West, D.M.: The Future of Work: Robots, AI, and Automation. Brookings Institution Press (2018)
28. West, D.M., Allen, J.R.: How artificial intelligence is transforming the world, 24 April 2018. (brookings.edu)
29. Nosova, S., Norkina, A., Makar, S., Fadeicheva, G.: Digital transformation as a new paradigm of economic policy. Procedia Comput. Sci. 190, 657–665 (2021)
30. Nosova, S., Norkina, A.: Digital technologies as a new component of the business process. Procedia Comput. Sci. 190, 651–656 (2021)
31. Nosova, S.S., Putilov, A.V., Norkina, A.N.: Fundamentals of the Digital Economy, p. 378. KNORUS, Moscow (2021)
Digital Technologies as a Process of Strategic Maneuvering in Economic Development

Svetlana Nosova1(B), Anna Norkina1, Olga Medvedeva2, Irina Aracelova3, Victoria Grankina4, and Lidia Shirokova5

1 National Research Nuclear University "MEPhI", Kashirskoe Shosse 31, 115409 Moscow, Russian Federation
[email protected]
2 The State University of Management, Ryazansky Prospekt 99, 109542 Moscow, Russian Federation
3 Volgograd State Medical University, Pavshikh Bortsov Square 1, 400131 Volgograd, Russian Federation
4 Moscow Polytechnic University, Bolshaya Semenovskaya Str. 38, 107023 Moscow, Russian Federation
5 Gzhel State University, Elektroizolyator 67, 140155 Moscow, Russian Federation
Abstract. This paper presents ideas and practical proposals for the more efficient use of digital technologies in the interests of strategic maneuvering in the business activity of the economy, a task aggravated by the COVID-19 pandemic. Based on the transition to a new digital structure, we found that effective strategic maneuvering in the business activity of the economy, and overcoming its current turbulent state, require: first, the large-scale use of digital technologies as a component of modern integration processes in the international scientific space, in order to minimize problems and strengthen the economic potential of the country; second, the use of digital technologies as the "nuclei" of the interaction of science, business and government, which supports the growth of a new strategic management system based on improved scenario tools for forecasting, allowing trends and incentives to be identified for the development of cluster zones in the regional development of the economy; third, the accelerated use of digital technologies prompted by the COVID-19 pandemic, as a result of the consistent implementation of a coordinated strategy for the development of the Russian economy based on global cooperation in all fields of activity; fourth, the introduction of a "digital style" in strategic public administration, including specific measures to develop digital technologies, especially artificial intelligence technologies, in order to identify and eliminate barriers, to stimulate public and private participation in the introduction of digital technologies, and to schedule a program for mitigating negative or unintended consequences of the development of digital technologies, which will focus the country's current efforts and ease the way to the digital future in response to the COVID-19 pandemic.

Keywords: Artificial intelligence · Digital economy · Business processes · Strategy
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 V. V. Klimov and D. J. Kelley (Eds.): BICA 2021, SCI 1032, pp. 380–392, 2022. https://doi.org/10.1007/978-3-030-96993-6_41
1 Introduction

Strategic digital maneuvering scans the overall technological and competitive environment, considers scenarios, and presents a range of digital use cases that are essential to the business of an economy. Digital is changing everything: customer purchases, employment (Uber), investment and value creation. In this regard, the use of digital technologies has "forced companies to reconsider the current strategy and learn how to cope with the problems caused by the increase in digitalization" at both the micro and macro levels. But the dominant logic of management research is still based on assumptions derived from neoclassical economics, where aggregated data are analyzed using econometric approaches and, accordingly, assumptions about rational behavior are an integral part of management in a digital environment. Nevertheless, there have been calls for a rethinking of the management of the digital economy, and this requires experience. Experts argue that "management is neither a science, nor a profession, nor a function, nor a combination of functions. Management is a practice, it needs to be assessed on the basis of experience, in context" ([12], p. 101). Therefore, it is necessary to learn a new way of managing the digitalized economy.

Digitalization covers the entire system of economic relations, since digital objects can be copied many times and deployed throughout a country, groups of countries and the whole planet. "Digitalization is contributing to a historic turn in governance, that practice-oriented research can be conducted with less effort and improved quality, and that micro-level data in the form of digital archives and online content facilitates the adoption of critical perspectives" ([13], p. 340). Digitization can affect strategic maneuvering, especially in terms of "shifting the focus of digitalization from improving the performance of traditional tasks to introducing fundamentally new business opportunities and models for companies serving financial services" ([9], p. 537). In this regard, digital technologies can be viewed as a process of strategic maneuvering in economic development.

During this period, large companies face not only the problem of attracting resources, such as capital and labor, but also the development of a competitive proposition. They also face an institutional structure that sometimes requires active transformation to better fit the entrepreneur's business. The outcome of such processes is highly uncertain and should be scrutinized. In recent years, a special form of economy has emerged, often referred to as the sharing economy: platforms for the exchange of goods and services using information technology, based on non-market logics such as exchange, lending, provision and sharing, as well as market logics such as rent and sale. Initially, business models of the sharing economy were found in the information economy [3], but in their modern manifestations they have grown and entered industries related to physical goods and services. It is estimated that the various sectors of the sharing economy could collectively generate $335 billion in revenue in 2025 [23]. The platform revolution and the growth of the digital economy, while not new, are still important megatrends that could revolutionize our economy in the coming decades.
Recent studies have shown that digital technologies as a process of strategic maneuvering in economic development are not merely an attempt to look into the future; above all, they are a search for ways to create it.
2 Literature

The problems associated with the development of digital technologies in the mode of strategic maneuvering in business activity, and the formation of a digital economy on their basis, have attracted the natural interest of scientists. The theoretical and methodological basis for the study of digital technologies is the theory of digital transformation [18], management theory, economic theory, theories of institutionalism, and strategic management. The starting point for the study of the digital economy in Russia was the statement by V.V. Putin, made during a live question-and-answer session at the St. Petersburg International Economic Forum, that "without the digital economy, Russia has no future" [25]. This formulation of the question immediately triggered the development of the necessary government documents and, at the same time, a surge of scientific publications in Russia. Accordingly, funds were allocated to the problem, a working group was formed to coordinate it, and so on. The professional community perceived this as a signal for action, and today hundreds of specialists speak up and discuss the topic of the digital economy across the Internet.

Of course, the main goal of studying the digital economy is not to settle on an exact definition, but to develop a mechanism for implementing the digital economy, as well as numerical criteria that characterize the degree of its development. We are experiencing a paradigm shift in economic development. The emphasis is on restructuring management and organizing network interaction between the government, business entities and society as a whole. Currently, more and more attention is paid to the role of digital technologies in the mode of strategic maneuvering in business activity in order to ensure sustainable economic growth of the country in the context of the pandemic [4].

Nowadays, terms such as digital technology and machine intelligence are frequently used in scientific research and applied work. Gradually, a wide range of digital technology applications have acquired real value for business processes. Although research related to digital technologies has been around for more than half a century, the amount of research on their use in business processes is gaining momentum. Studies assessing the economic impact of digital technology and the digital economic performance of a firm can help managers determine the timing of investments and the government's stake in implementing them. In other words, the allocation of time and resources is supposed to be planned on the basis of macroeconomic, sectoral and organizational reports and forecasts. Davenport and Ronanki ([8], p. 110) studied 152 cognitive projects to assess the impact of digital technology on business processes, dividing the digital impact into three parts: process automation, cognitive insight, and cognitive engagement.

Governments place great emphasis on digital technologies, in particular artificial intelligence. For example, China has announced its intention to become a leader in artificial intelligence by 2030 [7]. The President of Russia, declaring the strategic importance of artificial intelligence for the near future, stressed that "artificial intelligence is, of course, the basis for the next leap forward of all mankind in its development. These are the so-called end-to-end technologies that… permeate and will permeate all spheres of our life: production, the social sphere, science and even culture; all this will be combined with each other".
3 Theoretical Foundations

3.1 Digital Technologies: Conceptual Analysis

"A conceptual approach to studying the impact of digital technologies on business processes and the economy as a whole, including on the development of regions, is best defined through a digital transformation strategy" ([18], p. 660). Traditionally it was believed that economic growth can be achieved only by building up new productive forces, especially in the manufacturing industry. In the context of digital transformation, the emphasis shifts to the development of digital communication. For such development to be effective, it is necessary not only to have good IT specialists, but also to promote the development of cluster zones that create a digital sector, where digital equipment and infrastructure are directly produced ("a piece of the digital economy"), especially in terms of robotics ([11], p. 43).

Digital technologies have exponentially expanded the information space, and hence innovation, and hence the scale of strategic changes and transformations of the world economy. In this regard, the structure of the economic system is changing, and with it its dynamic properties. This interconnectedness underlies the fact that a key element of the digitalization process is the transition from analog or physical technologies to digital data systems. The Internet emerged and increased the interconnectedness of the economy in a way that was unthinkable before it began operating in the early 1990s. The digitalization of information, combined with the Internet, creates a wide range of combinations of information and knowledge use, through which modern technologies and broader technical opportunities can be turned into economic opportunities. The Internet is fueled by economies of scale, and platforms such as consumer electronics, mobile devices and urban infrastructure provide wider service availability to consumers as well as easier access to potential consumers.

The development of digitalization should be considered a new phenomenon in modern economic life, since it has led to an increased need for greater computing power and backbone network bandwidth, and to the emergence of completely new technologies such as big data, cloud technologies and artificial intelligence [5]. Practice shows that social stratification is worsening in the digital economy and cyber threats are growing immensely, pushing humanity toward a major failure (turbulence) in the global market system, especially given the unpredictable consequences of COVID-19. Despite this, digital technologies are rapidly spreading across various sectors of the economy. Digital advances are increasingly embedded in various business functions, including management, customer service and more. Large tech companies are already competing in the digital arena to improve their business processes and, hence, their results in a digital environment.
"The properties of digital technologies can help solve pressing social and global problems: simplifying communication between science, business, government and civil society; increasing productivity; creating new opportunities for entrepreneurship and work, education and the continuous improvement and expansion of professional qualifications; allowing the special needs of socially vulnerable groups to be taken into account; creating new opportunities for socially significant scientific research; and mitigating the risks of climate change, lack of drinking water and food, energy shortages, etc. Digital technologies are changing the traditional form of business to digital business, new business models" ([22], p. 77). New business models are being designed around digital platforms. The digital platform as a technological tool mediates processes; in fact, it contributes to the creation of added value based on digital collaboration, which is the basis for increasing the efficiency of companies' business activities.

3.2 Digital Business Needs More Than Just a Strategy, But Strategic Maneuvering

The process of strategically maneuvering the business of an economy under the influence of digital technologies describes a dynamic rather than a static economy, one that is more about new activities and products than higher productivity. Digital technologies are driven by globalization, which has brought with it networks, mobility, integration, e-business, digital products and services, new organizational forms and much more. Digital technologies are transforming the economy and our entire world. Machine learning and deep learning algorithms, and the hardware to accelerate them, are transformative strategies in business processes. They provide business flexibility and can help build in further forms of flexibility if the situation calls for it. The strategy evaluates the current conditions and only then determines the future [1]. The emphasis is on "being the best in your environment," because being "good enough" is no longer good enough: everyone is "good enough" now. If a product moves forward, increased profits can increase the advantage and the product can continue to consolidate its position in the market. Mechanisms of increasing returns exist alongside mechanisms of decreasing returns in all sectors, but, in general, decreasing returns prevail in traditional, resource-processing industries [6], while increasing returns operate in the high-tech and service sectors.

Competitive business strategies, together with the competence of top managers, play a critical role in creating strategic digital value in organizations. Other critical factors are human assets, innovation and knowledge management, business partnerships, business alignment, and business IT support. All of these factors contribute to the growth of the strategic value of digital technologies and, therefore, to sustainable competitive advantage. It is important to emphasize that digital technologies can contribute to the competitive advantage of an organization if an integrated approach to their use is adopted. Advances in digital technology have attracted a great deal of attention from researchers and practitioners and have opened up a wide range of useful opportunities for business processes [2]. The potential of digital technologies will lead to significant economic growth in countries where companies use them massively in their fields of activity. Currently, business needs to move actively toward the development of automated technical systems.

Leveraging digital technology requires organizations to address key data challenges, including building effective data governance, defining ontologies, designing data around "feeds" from data sources, and managing regulatory constraints [10]. Given the significant computational requirements of deep learning, some organizations will maintain their own data centers for regulatory or security reasons, but the capital costs can be significant, especially when specialized equipment is used.
On the technical side, organizations will need to develop robust data maintenance and management processes and implement modern software disciplines such as Agile and DevOps. Companies will need to consider "first mile" efforts, that is, how to collect and organize data, and "last mile" efforts, that is, ensuring that the superior insights provided by digital technologies, especially artificial intelligence, are translated into human behavior and business processes.

Digital technologies are becoming an increasingly intelligent resource. We can now say that digital technologies are a new resource in the economy of the twenty-first century, whose value grows every day. The main obstacles to using digital technologies in business lie in the technical and organizational plane, but especially in the personnel sector, which lacks specialists with the necessary qualifications in this area. For example, Intel's national AI strategy for the United States focuses on four key pillars:

• creating new job opportunities and protecting people's well-being;
• stimulating innovation through investment in research and development;
• releasing data responsibly to accelerate AI development;
• removing legal and policy barriers to the development and implementation of AI.

This statement of the question indicates that digital technologies, especially artificial intelligence, should become the driving force behind global economic growth. Success in digital leadership requires coordination across academia, industry and civil society through digital entrepreneurship.

Digital Entrepreneurship Strategy. The full power of the digital economy is determined by the development of digital entrepreneurship, where digital products and new technologies are directly created; these are now the new "oil," whose value increases over time. The time has come when the full force of economic development must be applied to a new type of production: digital. To this end, Western business partners should be involved who are engaged not in finance but in engineering and in digital product development strategies based on the digital model of the economy. It is therefore no coincidence that most studies are devoted to analyzing the problems of digital technologies in the system of the reproductive process [19]. Digital technologies can bridge the gap between its phases, as they facilitate quick and competent decision-making in a changing business environment, both internal and external. Integrated digital technologies embedded in the reproduction process enable companies to achieve greater benefits at lower costs, identify and analyze valuable information, plan strategies, predict outcomes and collaborate with global expertise. When we talk about the digital economy, we are not talking about numbers as such, but about digital technologies that transform the economy through the digital representation of information and drive economic growth. Significant economies of scale are integral to digital entrepreneurship. Digital entrepreneurship can be seen as a process of strategic maneuvering in both the commercial and managerial spheres, i.e., as a market where supply competes with demand. At the same time, it should be borne in mind that the emergence of digital technologies and the related entrepreneurship causes not only economic growth but also competitive turbulence. For example, it is assumed that, by placing too much emphasis on meeting current customer needs, companies fail to adapt to or adopt the new technologies that
will meet future customer needs, and such companies ultimately fall behind. This phenomenon is called "disruptive technology." The outcome of interrelated technological and institutional changes is highly uncertain and subject to extensive negotiation, as actors seek to influence the institutional structure in their favor. It has been found that, since web platforms enable peer-to-peer transactions and allow the creation of new and unique combinations of resources that generate new products and services, digital entrepreneurship is becoming more common in many sectors of the economy, generating institutional conflicts, since new initiatives are often incompatible with the formal and informal laws and regulations governing established industries. In our opinion, the main goal of digital entrepreneurship should be the development of mechanisms for introducing digital technologies in order to increase the competitive production of goods and services and accelerate entry into world markets, so as to be ready to respond to the rapidly changing conditions of the digital economy [15]. This should be facilitated by the digital sector, or digital entrepreneurship, where digital technologies are directly created. Initially, companies face increased challenges, but these can be overcome. The digital sector is the backbone of the digital economy, which:

• turns industrial business into digital business,
• makes production diversified,
• reduces costs,
• implements non-standard business models,
• makes information profitable,
• offers a high level of service for clients and contractors.
In addition to direct productivity gains, companies derive from digital technologies a number of indirect benefits, which materialize across multiple channels and are critical to understanding their multiplier role in the digital economy. World practice shows that digital transformation reliably ensures the global economic superiority of the country that embodies it. In fact, it is the basis of the sixth technological order, whose era is currently beginning.

Innovative Territorial Clusters. "Biotechnology, genetic engineering, alternative energy and nanotechnology are currently being actively developed; on the basis of their development, a transition is possible from machine-mechanical technology to a 'hybrid' one, where machine technology coupled with information technology opens the door to a new technological revolution" ([17], p. 171). The emergence of digital technologies and the associated digital entrepreneurship has influenced the creation of a new strategic business model in the form of innovative territorial clusters that provide new forms of relations between research and educational organizations, business, development institutions, government and civil society. In this case, the partnership is a source of collaborative ideas from engineers, scientists, manufacturers, designers and enthusiasts who focus on identifying market needs and solving deep engineering problems to unleash disruptive product innovation. The partnership is creating a new "digital factory": a specialized facility focused on producing small batches of products at a fast pace, where ideas from the community will be created, tested and sold. The strategy of digital technologies should support their implementation at all levels of
economic development, while ensuring the continuous interaction of science, business and government in the system of socio-economic development of Russia. Digital technologies that generate electronic goods or services provide dynamic efficiency, increased connectivity and numerous combinations of ideas that lead, in the mode of strategic maneuvering, to the interaction of science, business and government, which subsequently forms a new way of organizing entrepreneurial activity in the form of innovative territorial clusters. These organizations, offering a truly integrated solution for developers, users and administrators, help enterprises keep pace with digital growth in business. The interaction of science, business and government should observe the following principles:

• develop new digital technologies;
• create integrated management, i.e., manage local spheres as one medium;
• provide a production safety system as a result of the introduction of digital technologies.

In this respect, the formation of innovative territorial clusters is one of the most important concepts in the modern Russian economy, since, based on world practice, "clusters can provide an increase in the competitiveness of not only regions, but the national economy as a whole" ([16], p. 7). The development of interaction between science, business and government contributes to the transition to cyber-physical space and virtual reality, giving rise to new directions for the development and analysis of big data, their storage, patenting and protection. In the digital economy, innovations are products of digitalization, which requires expanding the range of research areas in order to improve the methods and tools for using big data. The implementation of priority projects of interaction between science, business and government improves the quality of life of the population, increases jobs for young specialists of different profiles, and minimizes the risks involved in bringing to market new innovative products that are relevant in modern economic conditions. Society fully supports the state strategy for the development of the digital economy in Russia.

Thus, the interaction of science, business and government is one of the modern trends in the development of the Russian economy against the backdrop of accelerating digital transformation. As for the regions, the basis for their development can be a new paradigm of economic doctrine: the development of cluster zones as a result of the interaction of scientific and educational circles, development institutions, government and civil society in the mode of accelerated digitalization, not only at the macro and micro levels but also at the meso, i.e., regional, level. The country's territory plays an important economic role, and lagging behind in digital development negatively affects the competitiveness of the country's economy as a whole. It therefore becomes urgent to search for incentives for the development of the mesoeconomics of modern Russia based on the introduction of digital technologies into the activities of innovative territorial clusters, which transform the economic behavior of agents and market structures in all directions.
4 Results

4.1 Digital Technology Strategy in Russia

For the stable development of Russia's macroeconomy, the development of digital technologies in the mode of strategic maneuvering in business activity and the economy as a whole is of particular importance. Russia has set itself the goal of entering the top five largest economies in the world. This is difficult, but it must be done despite the pandemic [11]. What is to be done? Is there a way out of this state? There is. It is necessary to develop a new model of economic development based on the development and implementation of digital technologies in the mode of strategic maneuvering in business activity and the economy. The strategy should be well coordinated between digital technology and the concepts of each business function in order to tune the digital technology development system in our country to the global level [16].

To maximize the effects, government participation in funding research and development in the field of digital technologies is required. The government should invest in fundamental research on the future potential uses and adoption of digital technologies. Additional funding should also be allocated to government agencies that support basic research and digital innovation. Russian companies plan to transform the entire work of the enterprise, not just individual business functions and directions [12]. The digital economy of Russia should work on the principles of public-private and intersectoral partnership. Government incentives to increase the readiness and comfort of sharing information with the public and private sectors will help change the mindset about data as a product and will stimulate data sharing. The government can also increase the propensity to share data by investing in privacy-preserving AI research, which allows academic and industrial labs to access training data while protecting it from reverse engineering.

In Russia, the various levels of authority (federal, republican, regional, local) mainly use a targeted programmatic approach to develop and implement measures to support the regions. The study of program-targeted management of the economy shows the need for such an approach given the complexity of solving intersectoral and interregional socio-economic problems that go beyond the framework of a particular region. Groups of strategic measures at the regional and international levels certainly contribute to the growth of the number and quality of innovative projects for the production and supply of final goods and services. Thus, the interaction of science, business and government makes it possible to predict more accurately the competitive advantage of a particular type of activity, provided that its strategy has been constructed correctly. Therefore, governments and regulators need to support the development of digital technologies in order to overcome possible turbulence in economic development under the influence of external factors, in particular foreign sanctions on the Russian economy.

However, increasing access to reliable data must be coupled with privacy protection and comprehensive ethical measures. Enacting comprehensive federal privacy
laws and policies requiring accountability for ethical design and implementation is critical to ensuring effective practices that mitigate potential individual and social harm. With such a policy, data sharing can be an advantage, not a burden [14]. A successful national digital strategy requires an analysis of the existing regulatory and policy landscape to identify and remove barriers to development and implementation, while advancing legislation and policies that promote responsible and ethical development. The government should avoid requiring companies to transfer or provide access to technology, source code, algorithms or encryption keys as a condition of doing business, and should support the use of all available tools, including trade agreements and diplomacy, to achieve these goals. The issues of adapting the interaction of science, business and government to the development of the Russian regions are relevant in the context of the inevitable transition to the digital transformation of the economy. Consideration of the traditional way of organizing and conducting research within innovative territorial clusters has shown a significant number of advantages. The coordinated activity of science, business and government makes it possible to build up business potential and carry out developments in search of new forms of organizational activity.
5 Discussion

In today's dynamic business environment, strategy must become dynamic, leading to new consumer behavior and changes in informal institutions. Changes to formal institutions also seem more likely in the city. "Exogenous shocks in the form of new technologies are more likely to be embedded in business processes and further spread in the digital economy, and this will ease their perception, including by politicians and digital entrepreneurs, allowing them to interact constantly, thereby building trust around the uncertain processes of interrelated technological and managerial change" ([20], p. 656). The combination of technological and institutional change is associated with high transaction costs, since the process is highly uncertain and involves many actors. Co-location of key actors in an area undergoing institutional change appears to be a favorable factor [27]. Negotiations can be continuous, and ambiguities can be resolved more effectively, when actors work in the same geographic area. Thus, cities contain potential allies for those actors who want to implement institutional change. It is easier to obtain a critical mass of participants in these platforms in a geographically dense agglomeration such as a city, since cities contain a higher degree of diversity, which creates additional opportunities for institutional change. Research results show that technological advances have little negative impact on the ongoing digital revolution ([27], p. 1143).

In order to understand the nature and future of the digital society, an understanding of digitalization is needed, and such an understanding must use ways of knowing other than rational thinking. To empower people to participate in a digital society, it is necessary to use the concept of "digi-grasping": through "grasping" the digital world, an ethical and aesthetic attachment to society can be created. It is important to note that digital technologies are reforming the real sector of the economy, which forms the basis
of material production and creates the foundation for interconnected types of economic activity. Digital technologies such as advanced business intelligence, process automation and robotics, cloud technologies, and intelligent control and monitoring systems are becoming the basis for innovation across the entire business. Big data and artificial intelligence are becoming a form of capital. Striving to be the first to take advantage of digital technologies and to lead the digital revolution, many countries have devoted significant resources to their development and implementation. It is noteworthy that, as a result of strategic maneuvering, digital technologies lead to the emergence of innovative territorial clusters, which, in the end, are their producers.
6 Conclusion

In order to increase the efficiency of strategic maneuvering in the business activity of companies and the economy, taking into account the large-scale use of digital technologies, the authors have developed the following recommendations:

1. Introduce a set of measures for the adaptive and tactical correction of strategic plans for sectoral programs. A conceptual overview of management science can help to identify and correctly assess the impact of digital technology on the strategic maneuvering of business and the economy.
2. Strategically develop the interaction of technical and managerial business skills. Ensure that the company's strategy is understood and accepted by employees, so that current decisions and employee behavior align well with competitive intentions and with each other. Senior management needs to promote actively, and take responsibility for, spreading the company's strategy by communicating its message directly to employees. In this respect, it is important to identify the strategic opportunities and limitations of business processes.
3. Consider the impact of digital technologies on the interaction of science, business and government as an effective form of ensuring strategic maneuvering in business activity by expanding the digital space on the basis of innovative territorial clusters, which over time can expand markets, strengthen the brand, achieve higher profits and maintain competitive advantages, while non-collaborating organizations do not achieve the same success even within the same industry.
Acknowledgements. This work is supported by the National Research Nuclear University MEPhI. The reported study was funded by RFBR and BRFBR, project no. 20-510-00029 "Methodology of formation of cross-cluster interactions in the innovation sphere and their infrastructure in integration groupings".
References

1. Abell, D.F.: Defining the Business: The Starting Point of Strategic Planning. Prentice-Hall, Englewood Cliffs, NJ (1980)
2. Armstrong, J.S.: The value of formal planning for strategic decisions: a review of empirical research. Strateg. Manag. J. 3(3), 197–211 (1982). https://doi.org/10.1002/smj.4250030303
3. Antoniou, H., Ansoff, I.: Strategic technology management. Technol. Anal. Strateg. Manage. 16(2), 275–291 (2004)
4. Bagire, V., Namada, J.: Managerial skills, financial capability and strategic planning in organizations. Am. J. Ind. Bus. Manage. 3(5), 480–487 (2013). https://doi.org/10.4236/ajibm.2013.35055
5. Belk, R.: You are what you can access: sharing and collaborative consumption online. J. Bus. Res. 67(8), 1595–1600 (2014)
6. Carlsson, B.: The digital economy: what is new and what is not? Struct. Change Econ. Dyn. 15(3), 245–264 (2004)
7. China has announced plans to become a leader in artificial intelligence by 2030 (2020). (hightech.fm)
8. Davenport, T.H., Ronanki, R.: Artificial intelligence for the real world. Harvard Bus. Rev. 96(1/2), 108–116 (2018)
9. Gomber, P., Koch, J.-A., Siering, M.: Digital Finance and FinTech: current research and future research directions. J. Bus. Econ. 87(5), 537–580 (2017). https://doi.org/10.1007/s11573-017-0852-x
10. Hambrick, D.C., Fredrickson, J.W.: Are you sure you have a strategy? Acad. Manage. Executive 15(4), 48–59 (2001)
11. Huang, M.-H., Rust, R., Maksimovic, V.: The feeling economy: managing in the next generation of artificial intelligence. Calif. Manage. Rev. 61(4), 43–65 (2019). https://doi.org/10.1177/0008125619863436
12. Laurell, C., Sandström, C., Eriksson, K., Nykvist, R.: Digitalization and the future of management learning: new technology as an enabler of historical, practice-oriented and critical perspectives in management research and learning. Manag. Learn. 51(1), 89–108 (2020). https://doi.org/10.1177/1350507619872912
13. Laurell, C., Sandström, C.: Comparing coverage of disruptive change in social and traditional media: evidence from the sharing economy. Technol. Forecast. Soc. Change 129, 339–344 (2018). https://doi.org/10.1016/j.techfore.2017.09.038
14. McCarthy, J., Minsky, M.L., Rochester, N., Shannon, C.E.: A proposal for the Dartmouth summer research project on artificial intelligence (1955). http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html
15. Mehr, H.: Artificial intelligence for civil services and government. Harvard Ash Center for Technology and Democracy, pp. 1–15, August 2017. https://ash.harvard.edu/files/ash/files/artificial_intelligence_for_citiz
16. Nosova, S.S., et al.: Regional clusters in the strategy of achieving technological leadership of the modern economy of Russia. Int. J. Appl. Bus. Econ. Res. 15(11), 247–254 (2017)
17. Nosova, S.S., Grankina, V.L.: Innovative Territorial Clusters (Monograph), 265 p. Rusines, Moscow (2017)
18. Nosova, S.S., Kolodnyaya, G.V., Novikova, N.N., Medvedeva, A.M., Makarenko, A.V.: The strategy of the digital transformation of the Russian economy in the XXI century. Int. J. Civ. Eng. Technol. (IJCIET) 10(02), 1638–1648 (2019)
19. Nosova, S., Norkina, A., Makar, S., Fadeicheva, G.: Digital transformation as a new paradigm of economic policy. Procedia Comput. Sci. 190, 657–665 (2021)
20. Nosova, S.S., Norkina, A.N.: Digital technologies as a new component of the business process. Procedia Comput. Sci. 190, 651–656 (2021)
21. Nosova, S.S., Norkina, A.N., Makar, S.V.: The digital economy as a new paradigm for overcoming turbulence in the modern economy of Russia. Revista Espacios 39(24), 27 (2018)
22. Nosova, S.S., Putilov, A.V., Norkina, A.N.: Fundamentals of the Digital Economy, p. 378. KNORUS, Moscow (2021)
23. Nosova, S.S., Ryabcun, V.V., Norkina, A.N.: Digital economy as a new model of modern socio-economic development of Russia. Econ. Entrepreneurship 3, 26–32 (2018)
24. Parker, G., Van Alstyne, M., Choudary, S.: Platform Revolution: How Networked Markets Are Transforming the Economy and How to Make Them Work for You (2016). (mann-ivanov-ferber.ru)
25. Russia has no future without a digital economy (2016). https://forklog.com/putin-no-digitaleconomy-Russia-has-no-future
26. PwC: The sharing economy: an assessment of income generation opportunities. Sharing Economy final_0814.pptx (2018). (blogs.com)
27. Rachinger, M., Rauter, R., Müller, C., Vorraber, W., Schirgi, E.: Digitalization and its impact on business model innovation. J. Manuf. Technol. Manag. 30(8), 1143–1160 (2019). https://doi.org/10.1108/JMTM-01-2018-0020
Evaluation of fMRI Data at the Individual Level

Vyacheslav A. Orlov, Sergey I. Kartashov, Denis G. Malakhov, Mikhail V. Kovalchuk, and Yuri I. Kholodny(B)

National Research Center "Kurchatov Institute", Moscow, Russia
Abstract. This work continues research aimed at creating a method for the optimal evaluation of fMRI data at the individual level. The article presents the results of experiments in the concealed information paradigm. During the study, certain aspects of the developed method were examined and the factors influencing the formation of such estimates were evaluated. The paper demonstrates the prospects of the developed approach to creating a method for the optimal evaluation of fMRI data at the individual level, which allowed stimuli significant for a person to be identified successfully in 80% of cases. Keywords: fMRI · Forensic diagnostics · Concealed information paradigm · Evaluation of fMRI data · Individual level of analysis
1 Introduction

The article continues the series of publications on the research conducted at the National Research Center "Kurchatov Institute" (NRC "Kurchatov Institute") focused on the study of neurocognitive processes, including those underlying the diagnosis of information concealed by a person [1]. The study of the possibility of using functional magnetic resonance imaging (fMRI) to detect a person's lies began in 2001 [2]. The subsequent studies [3, 4] attracted the attention of scientists and specialists, contributed to the development of neuroscience and led to the creation of an independent branch, forensic neuroscience, within which a separate direction was distinguished: fMRI-based lie detection [5]. In the USA, two commercial companies were established in 2006 [6], which began to provide fMRI-based lie detection (fMRI-BLD) services and to present the obtained results to the court as evidence [7]. However, the applied use of fMRI-BLD did not have the required scientific justification and experimental verification, and the activity of these companies was therefore sharply criticized by scientists [8, 9].

This year marks twenty years of scientific research on fMRI-BLD. The analysis of scientific publications and several meta-analyses on the subject [5, 10–13] showed that no results suitable for practical application in the forensic sciences have yet been demonstrated, and it also allowed us to identify the main reasons for the current situation. Due to the limited scope of this article, we will briefly mention only some of them.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 V. V. Klimov and D. J. Kelley (Eds.): BICA 2021, SCI 1032, pp. 393–399, 2022. https://doi.org/10.1007/978-3-030-96993-6_42
The authors of the meta-analyses drew attention to the fact that most studies on fMRI-BLD were fundamental in nature, carried out in the differentiation of deception (DoD) paradigm, and that the results of such studies were obtained at the group level (on groups of experiment participants). Fundamental studies are undoubtedly important for the development of science; however, their results obtained at the group level are of little value for solving the practical problems of the forensic sciences, which work exclusively at the individual level. In addition, research within the DoD paradigm is of interest to fundamental science only; it takes these studies beyond the interests of the forensic sciences and excludes the possibility of using their results for forensic purposes.

Fundamental fMRI studies in the concealed information (CI) paradigm were conducted much less frequently than in the DoD paradigm, and the first study in the CI paradigm at the individual level was carried out only in 2009 [14]. In the following decade, no reports were found in scientific articles about the development of a technology for evaluating fMRI data at the individual level. It should also be noted that the analysis of foreign articles has shown a certain methodological incorrectness in a number of experimental studies performed in both paradigms; due to the limited scope, we will not dwell on this aspect in this article.

Thus, the creation of a method for evaluating fMRI data at the individual level is in demand in practice: such assessment is necessary both for forensic neuroscience and for neurocognitive medical research, when it is necessary to assess the dynamics of the brain activity of a particular person.
2 Materials and Methods

NRC "Kurchatov Institute" conducts a cycle of fundamental and applied neurocognitive research for various purposes. In organizing the research, special attention was paid to the creation of a methodically correct technology for performing fMRI experiments; therefore, at the initial stage, the experiments were of a fundamental nature and were focused on obtaining results at the group level. Earlier, specifically for these studies, an MRI-compatible polygraph (MRIcP) was created at NRC "Kurchatov Institute" [1]: the use of MRIcP significantly expanded the possibilities of monitoring the dynamics of a subject's responses to the stimuli presented during fMRI scanning. In particular, MRIcP allowed us to:

a) objectively divide the participants of the experiments into groups depending on their reactivity [15];
b) identify interference that makes it difficult to register high-quality fMRI data, and develop methods to mitigate such interference [16, 17];
c) confirm the possibility of using forensic tests in the CI paradigm during fMRI experiments, which provided a sufficient amount of fMRI data for research [18].

It should be noted that the use of tests borrowed from the forensic sciences imposed strict restrictions on the volume of fMRI data obtained: as a result of their application, only very small stimulus sets could be presented, the minimum of which consisted of only five
stimuli. Therefore, in order to increase the volume of fMRI data and thereby increase the accuracy of detecting and evaluating the activity of brain areas, it was proposed to switch during fMRI from traditional (standard) scanning intervals (for example, TR = 3 s [14] or TR = 2 s [19]) to ultra-fast ones [20], and the usefulness of such a transition was studied.

The mode with TR = 2 s provided 42 axial slices to a depth of 8.4 cm along the axis parallel to the direction of the magnetic field (Z); note that all the configurations quoted here correspond to a nominal slice thickness of 2 mm (e.g., 42 slices × 2 mm = 8.4 cm). To evaluate the effectiveness of the transition to an ultra-fast scanning interval, experiments were conducted (at the group level) with TR = 1 s. Scanning with TR = 1 s was better than TR = 2 s in signal-to-noise ratio and time resolution; its disadvantage was the limit on the number of recorded axial slices (a maximum of 48) within a 9.6 cm band along the Z axis. Therefore, a scanning interval of TR = 1.11 s was selected and investigated: this interval provided the same signal-to-noise ratio as TR = 1 s at slightly worse time resolution, but allowed the number of recorded axial slices to be increased to 51 and, thereby, the coverage of the brain along the Z axis to be extended to 10.2 cm.

After this preliminary study, a control check of the usefulness of switching to ultra-fast scanning intervals was carried out on a group of 22 participants (right-handed men, students of a technical university aged 21–23, who reported no diseases at the time of the study). In total, the preliminary studies allowed us to form a methodically correct technology for the fMRI experiment and created the prerequisites for the transition to finding ways to evaluate fMRI data at the individual level. The need to obtain such assessments at the individual level forced us to pay attention to some aspects of fundamental importance for the forensic sciences.

The phenomenon of neurovascular coupling, which underlies the fMRI method and consists of a regional change in blood flow in response to the activation of nearby neurons, directly suggested a way to assess the activity of brain zones: counting the number of voxels detected (during statistical analysis at a given threshold) in those zones. It was therefore natural to assume that a stimulus-zone assessment of fMRI data (measured in voxels, with the formation of so-called beta values), recorded in brain areas structured according to the CONN atlas (https://web.conn-toolbox.org/) in the MNI coordinate system, would create the conditions for ranking the stimuli presented to a person according to their subjective significance.

The processing of the registered primary fMRI data goes through two stages, preprocessing and postprocessing, which ends with the determination of the number of voxels characterizing the activity of a brain zone during a particular cognitive task. Preprocessing has an unavoidable and difficult-to-control effect on the subsequent quantitative (voxel-count) estimates and, due to the specifics of this stage, it is impossible to repeat a preprocessing run identically to the original one. As a result, the estimates obtained, i.e., the number of voxels in each of the studied zones and their total number in a particular test, exhibit a certain spread.
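A minimal sketch of this stimulus-zone counting is given below, assuming that a statistical map for each stimulus and an atlas label image have already been computed and resampled into the same MNI space. The file names, the threshold value and the number of stimuli are hypothetical placeholders, not the authors' actual pipeline.

# Sketch: count suprathreshold voxels per atlas zone and rank stimuli.
# File names, the threshold and the number of stimuli are assumptions.
import numpy as np
import nibabel as nib

atlas = nib.load("atlas_labels_mni.nii.gz").get_fdata().astype(int)

def zone_counts(stat_map_path, threshold=3.1):
    """Number of voxels above threshold in each atlas zone (id -> count)."""
    stat = nib.load(stat_map_path).get_fdata()
    active = stat > threshold
    return {z: int(np.sum(active & (atlas == z)))
            for z in np.unique(atlas) if z != 0}  # label 0 = background

# One statistical map per stimulus of a five-stimulus forensic test.
per_stimulus = {f"stimulus_{i}": zone_counts(f"tmap_stim{i}.nii.gz")
                for i in range(1, 6)}

# Rank stimuli by total suprathreshold voxels across all zones; the
# subjectively significant stimulus is expected to rank first.
totals = {name: sum(c.values()) for name, c in per_stimulus.items()}
for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(name, total)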
Therefore, to determine the influence of this factor on the obtained results, it was necessary to repeat the preprocessing of the raw fMRI data several times. This factor was studied on fMRI data (taken from 2 of the 22 datasets mentioned above), which were put through the preprocessing stage three times. In addition to the above, another question needed study: how many voxels should be detected during statistical analysis for the resulting assessment to reflect the actual picture of the activation of brain zones?
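The spread introduced by repeated preprocessing can be quantified with a simple per-zone variability check. The sketch below uses invented voxel counts, since the actual per-run estimates are not reproduced here.

# Sketch: relative spread of per-zone voxel counts across preprocessing runs.
# The three 'runs' are invented numbers used only to illustrate the metric.
import numpy as np

# rows = independent preprocessing runs, columns = atlas zones
runs = np.array([[120.0, 85.0, 40.0, 200.0],
                 [135.0, 80.0, 52.0, 190.0],
                 [110.0, 90.0, 46.0, 210.0]])

# Relative range per zone: (max - min) / mean, in percent.
spread = (runs.max(axis=0) - runs.min(axis=0)) / runs.mean(axis=0) * 100
print(np.round(spread, 1))  # a zone near 35% would match the worst cases
                            # of inter-run variability reported in Sect. 3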
3 Results

The control check experiments clearly demonstrated the usefulness of the transition from standard to ultra-fast scanning intervals. Ultra-fast scanning with TR = 1.11 s made it possible to identify more effectively the active areas of the brain involved in separating perceived stimuli by significance, in particular in the forensic test with a concealed name (TCN) [18]. This is confirmed by the group statistical maps of brain-area activity (Fig. 1), which were obtained during the TCN at different TR values (2 s, 1 s and 1.11 s).

The preprocessing of the primary fMRI data of two subjects was carried out three times, and the subsequent processing of these data eventually gave six groups of stimulus-zone estimates, which, of course, varied; the changes in individual zones could reach 35%. It should be noted that in one of the subjects, the stimulus estimates of brain-zone activation, while varying in magnitude with each preprocessing run, maintained the trends of interzone differences and remained invariably maximal for a certain stimulus. In the second subject, the activation estimates in 5 brain zones (out of 132, i.e., in 4% of cases) did not maintain the proportions of interzone differences, and the maximum estimates shifted from one stimulus to another.

The data obtained after preprocessing were subjected to the second stage of processing, which ended with the formation of stimulus estimates. The results of the study (conducted on 19 subjects from another sample) experimentally confirmed that repeating the second stage on the basis of unchanged preprocessing does not affect the resulting stimulus-zone estimates: their values remain unchanged to within a single voxel. This result indicates that, despite the obvious variability of the preprocessing data, the estimates reflect inter-stimulus differences in the activation of brain zones and can thus serve as a basis for developing a method for evaluating fMRI data at the individual level. At the same time, ultra-fast scanning improved the stimulus-zone assessment of brain-zone activity at the individual level when ranking stimuli by their subjective significance.

The experiments showed that the number of voxels identified during a test can serve as a criterion for the quality of the registration of the primary fMRI data. The analysis of stimulus-zone estimates in the group of 22 subjects established that, when fewer than 8,000 voxels in total are registered, a significant stimulus cannot be identified: such data were obtained from 2 subjects and were excluded from the subsequent analysis. In the remaining 20 subjects, the TCN registered from 14,700 to 74,900 voxels in total, and with the help of the stimulus-zone assessment for selected zones [21], significant stimuli were correctly diagnosed in 16 people (i.e., 80%); other researchers report similar results [14].
Fig. 1. Group statistical maps (p < 0.001) overlaid on a template model of the human cerebral cortex at: a) TR = 1.11 s (sample size – 20 people); b) TR = 1 s (sample size – 18 people); c) TR = 2 s (sample size – 18 people).
Realizing that such assessments performed at the individual level are rather rough, a pairwise inter-stimulus comparison of zonal assessments was carried out in order to determine significant differences in the activation of brain zones. This approach proved promising: according to the empirically chosen criterion, the correct diagnosis of significant stimuli in the TCN was the same 80%; the discussion of the obtained data will be the subject of the next article.
4 Conclusion
The results presented above are preliminary in nature and require subsequent confirmation. At the same time, the obtained results, firstly, indicate the correctness of the chosen way of creating a method for evaluating fMRI data at the individual level. Secondly, the experiments allowed us to understand why methods for evaluating fMRI data at the individual level are currently lacking and why mainly fundamental research is carried out instead: surprisingly, the latter is much easier to conduct, since assessments at the group level make it possible to smooth over a number of methodological inaccuracies and to leave “acute issues” without discussion.
Acknowledgements. The research presented above is an initiative internal research project conducted by the NRC “Kurchatov Institute” (order No. 1059 of July 2, 2020 “Biomedical technologies”, 4.14).
References
1. Kovalchuk, M.V., Kholodny, Y.I.: Functional magnetic resonance imaging augmented with polygraph: new capabilities. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 260–265. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_33
2. Spence, S.A., et al.: Behavioral and functional anatomical correlates of deception in humans. NeuroReport 12(13), 2849–2853 (2001)
3. Langleben, D.D., et al.: Brain activity during simulated deception: an event-related functional magnetic resonance study. Neuroimage 15, 727–732 (2002)
4. Ganis, G., et al.: Neural correlates of different types of deception: an fMRI investigation. Cereb. Cortex 13(8), 830–836 (2003)
5. Farah, M.J., Hutchinson, J.B., Phelps, E.A., Wagner, A.D.: Functional MRI-based lie detection: scientific and societal challenges. Nat. Rev. Neurosci. 15(2), 123–131 (2014)
6. Jones, O.D., Shen, F.X.: Law and neuroscience in the United States. In: Spranger, T.M. (ed.) International Neurolaw, pp. 349–380. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-21541-4_19
7. Wegmann, H.: Summary: neurolaw in an international comparison. In: Spranger, T.M. (ed.) International Neurolaw, pp. 381–411. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-21541-4_20
8. Greely, H.T., Illes, J.: Neuroscience-based lie detection: the urgent need for regulation. Am. J. Law Med. 33, 377–431 (2007)
9. Deceiving the law. Nat. Neurosci. 11, 1231 (2008)
10. Wagner, A.: Can Neuroscience Identify Lies? A Judge’s Guide to Neuroscience: A Concise Introduction, p. 13. University of California (2010)
11. Gamer, M., Verschuere, B., Ben-Shakhar, G., Meijer, E.: Detecting of deception and concealed information using neuroimaging techniques. In: Verschuere, B., Ben-Shakhar, G., Meijer, E. (eds.) Memory Detection: Theory and Application of the Concealed Information Test, pp. 90–113. Cambridge University Press, Cambridge (2011). https://doi.org/10.1017/CBO9780511975196.006
12. Beecher-Monas, E., Garcia-Rill, E.: Overselling images: fMRI and the search for truth. John Marshall Law Rev. 48(3), 651–692 (2015)
13. Rosenfeld, J.P.: Detecting Concealed Information and Deception: Recent Developments. Elsevier Inc. (2018)
14. Nose, I., Murai, J., Taira, M.: Disclosing concealed information on the basis of cortical activations. Neuroimage 44, 1380–1386 (2009)
15. Orlov, V.A., Kholodny, Y.I., Kartashov, S.I., Malakhov, D.G., Kovalchuk, M.V., Ushakov, V.L.: Application of registration of human vegetative reactions in the process of functional magnetic resonance imaging. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 393–399. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_51
16. Kholodny, Y.I., Kartashov, S.I., Malakhov, D.G., Orlov, V.A.: Improvement of the technology of fMRI experiments in the concealed information paradigm. In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) Brain-Inspired Cognitive Architectures for Artificial Intelligence: BICA*AI 2020. AISC, vol. 1310, pp. 591–597. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_73
17. Kovalchuk, M.V., Kholodny, Y.I., Kartashov, S.I., Malakhov, D.G., Orlov, V.A.: Combined application of fMRI and an MRI-compatible polygraph: new capabilities in conducting human studies (in Russian). Vestnik Voennogo Innovatsionnogo Tekhnopolisa “ERA” 1(1), 112–116 (2020)
18. Kholodny, Y.I., Kartashov, S.I., Malakhov, D.G., Orlov, V.A.: Study of neurocognitive mechanisms in the concealed information paradigm. In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) BICA 2020. AISC, vol. 1310, pp. 149–155. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_19
19. Ofen, N., Whitfield-Gabrieli, S., Chai, X.J., Schwarzlose, R.F., Gabrieli, J.D.E.: Neural correlates of deception: lying about past events and personal beliefs. Soc. Cogn. Affect. Neurosci. 12(1), 116–127 (2017)
20. Feinberg, D.A., Setsompop, K.: Ultra-fast MRI of the human brain with simultaneous multi-slice imaging. J. Magn. Reson. 229, 90–100 (2013). https://doi.org/10.1016/j.jmr.2013.02.002
21. Kovalchuk, M.V., Kartashov, S.I., Orlov, V.A., Kholodny, Y.I.: fMRI diagnostics of concealed information at the individual level (in Russian). Vestnik Voennogo Innovatsionnogo Tekhnopolisa “ERA” (in press)
The Impact of Internet Media on the Cognitive Attitudes of Individuals on the Example of RT and BBC
Alexandr Y. Petukhov1,2(B), Sofia A. Polevaya2, and Evgeniy A. Gorbov2
1 Lomonosov Moscow State University, Leninskie Gory, 1, Moscow 119991, Russia
2 Nizhniy Novgorod Lobachevski State University, Gagarin Avenue 23, Nizhniy Novgorod 603950, Russia
Abstract. The information influence in the modern globalizing world is a serious challenge to the security of any state. This article presents the results of an experimental study of the way modern Internet media affect the cognitive attitudes of individuals, using the example of two leading international TV channels, RT and BBC. In order to conduct this study, our team developed an experimental plan for the psychophysiological recording of the deformation of cognitive attitudes under external informational influence. The study was conducted at the Department of Psychophysiology of the Lobachevsky State University of Nizhny Novgorod from March to May 2018. The experiment was conducted on twenty-one (21) volunteers aged from nineteen to thirty-six, with an average age of twenty-four. Since the younger generation constitutes the largest audience of modern communication networks, they became the focus of the study. The authors analyzed the deformations of the cognitive attitudes of individuals to identify the distinctive features of these processes.
Keywords: Cognitive attitudes · Information influence · Emotional maladjustment · Telemetry of heart rate · Survey · Internet media
1 Introduction
The same information obtained from different sources may be perceived differently by the same individual. This phenomenon is often exploited in so-called information wars. This, in turn, increases the relevance of studies of the psychophysiological registration of the ways the cognitive attitudes of an individual are deformed by external informational influence. Such studies make it possible to identify specific mechanisms, algorithms, and patterns of such processes, which will, in turn, not only help with their correct explanation and definition but can also be used for forecasting in certain particular cases [1–3]. The study is aimed at solving a fundamental scientific problem, which consists in identifying the characteristic patterns of change in the psychophysiological parameters of an individual during the deformation of their cognitive attitudes through external informational influence [4].
The information influence in the modern globalizing world is a serious challenge to the security of any state. The improvement of methods and the development of communication networks make this problem one of the most pressing issues of today’s world.
2 Method
2.1 Procedure
The study was conducted at the Department of Psychophysiology of the Lobachevsky State University of Nizhny Novgorod from March to May 2018. The experiment was conducted on twenty-one (21) volunteers aged from nineteen to thirty-six, with an average age of twenty-four. Since the younger generation constitutes the largest audience of modern communication networks, they became the focus of the study. Sample details:
– eighteen women (86%) and three men (14%);
– eleven subjects with university degrees (52%) and ten students (48%);
– eight people in the process of obtaining, or already holding, a degree in Psychology (38%) and thirteen people pursuing a degree in other fields, or a degree in psychology as a second higher education (62%);
– ten unemployed (48%) and eleven employed (52%).
The subjects were given general, unbiased information about a well-known event (the problems with the admission of Russian athletes to the 2018 Winter Olympic Games), which was covered by global media. The BBC video is 4 min 11 s long, and the RT video is 3 min 59 s. Both videos are in the channels’ standard news style. The order of video demonstration was randomly determined each time. The test subjects were to put on a sensor monitoring their heart rate variability and to fill in questionnaires that allow monitoring cognitive attitudes, in the process of information influence, on a scale of inclination toward liberalism and conservatism. Next, the subjects passed the LEM test to determine their level of emotional maladjustment.
We chose the tendency toward conservatism and liberalism since, for the tasks of our study, we needed two different ideologies with sometimes conflicting ideas, concepts, attitudes, and principles, but at the same time widely represented in the media as sources of informational influence. In addition, this is a fairly classic comparison/opposition in psychology and philosophy. Heart rate variability and the level of emotional maladjustment are effective parameters in psychophysiology for determining the dynamics of an individual’s state in the process of information interaction [3].
Statements are divided into two types: the first expressing a pronounced conservative position, the second a liberal one. The resulting points are summed, positive and negative values separately, divided by 22 and −22, respectively, and multiplied by 100 to obtain the levels of conservatism and liberalism in percentage points.
Questionnaire 1 mirrors the questions of Questionnaire 2, just as Questionnaire 3 mirrors the questions of Questionnaire 4, thereby making it possible to trace how the informational influence affects the level of propensity toward a conservative or liberal worldview. As a result, if a subject gains the maximum of 22 points, it can be argued that he or she shows the maximum (100% on this scale) tendency toward conservatism at the time of the experiment. Conversely, a score of −22 points suggests a maximum inclination toward liberalism. Comparing the results of Questionnaire 1 with those of Questionnaire 2, and of Questionnaire 3 with those of Questionnaire 4, we can trace the changes in the levels of tendency toward conservatism and liberalism after the external informational influence, which was carried out between the questionnaires.
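As a minimal illustration of the scoring arithmetic just described, the following Python sketch (the function name and sample points are ours, not part of the study's materials) converts per-statement points into percentage levels of conservatism and liberalism.

```python
# Hypothetical implementation of the questionnaire scoring described above.
def attitude_scores(points: list[int]) -> tuple[float, float]:
    """Return (conservatism %, liberalism %) from per-statement points."""
    positive = sum(p for p in points if p > 0)   # conservative statements add points
    negative = sum(p for p in points if p < 0)   # liberal statements subtract points
    conservatism = positive / 22 * 100           # +22 is the maximum possible score
    liberalism = negative / -22 * 100            # -22 is the minimum possible score
    return conservatism, liberalism

# A subject with +11 conservative and -5 liberal points:
print(attitude_scores([2, 3, 6, -1, -4]))        # -> (50.0, ~22.7)
```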
3 Data Analysis
There are a number of factors that influence the results of this experiment. In particular, it is necessary to take into account the specifics of the sample of individuals who participated in this study. Most of the subjects belong to the “younger generation”, which is characterized by a specific perception of information from the outside, namely increased distrust and skepticism towards the information offered [8]. At the same time, RT, according to the estimates of most experts [9, 10], expresses a relatively conservative, pro-government position, which can be perceived quite critically by the young part of the population. The BBC, in contrast, is a relatively liberal media outlet that expresses a Western position, often criticizing conservative values in Russia [11, 12]. The overt expression of such a position (conservative or liberal) may lead to its rejection and to the observed change in characteristics [12, 13] (Fig. 1).
Fig. 1. Differences in levels of conservatism and liberalism before and after the information influence, taking into account the CONTEXT factor (RT - watching a news segment of RT; BBC - watching a news segment of BBC). A star indicates significant differences according to the Wilcoxon test (p < 0.05).
We assessed the significance of the differences in the levels of conservatism and liberalism before and after the informational impact of the RT and BBC news clips. After watching the videos of both RT and BBC, the level of emotional maladjustment increases (p < 0.05) (Fig. 2). We used the Wilcoxon test, since it is effective for small samples (up to 25 elements). The increase in the level of emotional maladjustment can be associated with the active and purposeful informational influence carried out by these TV channels.
Fig. 2. Differences in levels of emotional maladjustment before and after the information influence, taking into account the CONTEXT factor (RT - watching a news segment of RT; BBC - watching a news segment of BBC).
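For readers who wish to reproduce this kind of comparison, a paired Wilcoxon signed-rank test on a small before/after sample can be run as below; the numbers are fabricated for illustration, and the scipy call is standard usage, not the authors' code.

```python
# Illustrative paired Wilcoxon signed-rank test on made-up before/after scores.
from scipy.stats import wilcoxon

before = [34, 41, 28, 52, 45, 38, 30, 47, 36, 40]   # e.g. maladjustment before viewing
after  = [39, 44, 35, 55, 44, 45, 33, 50, 41, 46]   # e.g. maladjustment after viewing

stat, p = wilcoxon(before, after)                    # paired, non-parametric test
print(f"W = {stat}, p = {p:.3f}")                    # p < 0.05 would indicate a significant shift
```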
3.1 Discussion
Thus, within the framework of this study, we examined the impact of modern media on the cognitive attitudes of an individual using the example of two leading TV channels, RT and BBC. We also developed an experimental plan for the psychophysiological recording of cognitive attitude deformations under external informational influence, and a method to quantify cognitive attitudes, in the form of a questionnaire, in terms of the propensity toward conservatism and liberalism. Based on a group analysis of the deformation of cognitive attitudes, the following feature was identified:
• Watching RT news clips decreases the level of conservatism; watching BBC news clips, on the contrary, increases the level of conservatism.
There are a number of factors that influence the results of this experiment. In particular, it is necessary to take into account the specific features of the sample of individuals who participated in this study. Most of the subjects belonged to the “younger generation”, which is characterized by a specific perception of information from the outside, namely increased distrust and skepticism towards the information offered. At the same time, RT, according to the estimates of most experts, expresses a relatively conservative, pro-government position, which can be perceived quite critically by the young part
of the population. The BBC, in contrast, is a relatively liberal media outlet that expresses a Western position, often criticizing conservative values in Russia. The outspoken expression of such a position (conservative or liberal) may lead to its rejection and to the observed change in characteristics.
We discovered the following correlation between the level of emotional maladjustment and the deformation of cognitive attitudes:
• Watching the clips of both RT and BBC drives up the level of emotional maladjustment. The increase in the level of emotional maladjustment can be associated with the active and purposeful informational influence carried out by these TV channels.
In terms of comparing and interpreting the results against existing studies on similar topics [12, 13], we should note the following:
• The novelty of the study lies in the use of the latest experimental methods, including the authors’ own, to identify the psychophysiological characteristics of cognitive distortions, and also in the use of psychophysiological methods to determine the degree of influence of modern media on individuals (most studies are limited to polls and questionnaires).
• The results obtained are rather unexpected, since the shift in the cognitive attitudes of individuals somewhat contradicts the declared positions of these TV channels.
• However, the dynamics of changes in the cognitive attitudes of individuals (including on the conservatism–liberalism scale) may look somewhat different over the long term, with regular consumption of these sources of information.
References
1. Kooi, B.W.: Modelling the dynamics of traits involved in fighting-predators–prey system. J. Math. Biol. 71(6–7), 1575–1605 (2015). https://doi.org/10.1007/s00285-015-0869-0
2. Faugeras, O., Inglis, J.: Stochastic neural field equations: a rigorous footing. J. Math. Biol. 71(2), 259–300 (2014)
3. Petukhov, A.Y., Polevaya, S.A.: Modeling of communicative individual interactions through the theory of information images. Curr. Psychol. 36(3), 428–433 (2016). https://doi.org/10.1007/s12144-016-9431-5
4. Sebastian, G., Gaskell, G.M., Zwitserlood, P.: Stroop effects from newly learned color words: effects of memory consolidation and episodic context. Front. Psychol. 6, 278 (2015)
5. Eremin, E.V., Kozhevnikov, V.V., Polevaya, S.A., Bakhchina, A.V.: Web service for visualization and storage of heart rate measurement results. Russian patent; certificate of state registration of database No. 2014621202 from 08.26.2014
6. McCraty, R., Shaffer, F.: Heart rate variability: new perspectives on physiological mechanisms, assessment of self-regulatory capacity, and health risk. Glob. Adv. Health Med. 4(1), 46–61 (2015)
7. Grigorieva, V.N., Tkhostov, A.S.: Method of assessment of emotional state of an individual. RF Patent RU 2291720 C1, published 20.01.2007 in Patent Database no. 2
8. Kassikhina, V.E.: On legal education of younger generations. State Law XXI 2, 23–28 (2016)
9. Solomatin, A.N.: Communicative strategies of RT (Russia Today). Bull. Electr. Printed Media 2, 60–76 (2014)
10. Babayeva, S.: Free from morality, or what Russia believes in today. Russ. Glob. Aff. 5(3), 34–45 (2007)
11. Hosseini, F.: BBC versus Euro news: discourse and ideology in news translation. Russ. Linguist. Bull. 3(7), 128–132 (2016)
12. Subbotkin, V.D.: Global media and their influence on international policy (based on CNN and BBC). In: Vilshinskaya-Butenko, M.E. (ed.) Collection of Works: Articles of the Institute of Business Communications. Scientific Publication, Saint-Petersburg, pp. 49–54 (2017)
13. Prokofieva, V., Kostromina, S., Polevaia, S., Fenouillet, F.: Understanding emotion-related processes in classroom activities through functional measurements. Front. Psychol. 10, 2263 (2019). https://doi.org/10.3389/fpsyg.2019.02263
Applications of the Knowledge Base and Ontologies for the Process of Unification and Abstraction of the Information System Description
Pavel Piskunov(B) and Igor Prokhorov
National Research Nuclear University “MEPhI”, Moscow 115409, Russia
Abstract. Nowadays, in the development of information systems (IS), the cost of implementing a single piece of functionality is becoming lower than the cost of simplifying the design, applying existing solutions, and adapting technologies. The degree of abstraction of architectural solutions, and the attention paid to the transition from individual systems to the information landscape, are also growing. At the same time, the size of systems and the number of template solutions are beginning to exceed the processing capabilities of a single person. In addition, high-level design can be more subjective than functional design. We are trying to address this by simplifying the work with the representation of a system, its modules, or its data – by unifying and abstracting the description of the system. Unifying design through generalizations and rules, or creating elaborate notations, is not a new idea. But we work with refining the description itself, not the method of description, and with automating translation, additions, or changes in the scale (amount of detail) of the system description. We use semantic libraries, identification patterns, and the cognitive perception of the person involved in designing. The goal is to create a kind of analytical agent or digital assistant for IS design, capable of taking into account the specifics of a particular organization or subject area, and which will learn through the simple replenishment of databases of rules and templates. Such a utilitarian assistant should remove some workload from the architect as well as reduce his subjective influence.
Keywords: Information system architecture · Cognitive representations · Abstraction · Meaning translation · Knowledge base · Translation pipeline · Ontology · Information system design
1 Introduction
The larger the modern information landscape of an organization, the more resources are spent on maintaining each new round of its development. At its core, it is a gradually tightening knot. At present, thanks to the implementation and improvement of development tools and approaches, the problem has shifted from writing the code well to designing the system and maintaining the effectiveness of its structure.
Often it comes down to the need to process new connections or to restructure entire segments of the system into separate closed modules [1]. Many such tasks are solved using certain architectural patterns. One of the most common now is the microservice approach, which defines a concept of dividing the system into independent modules, each operating on a separate business task. But general concepts cannot untie the knot of complex, poorly organized integrated systems or guarantee that it will not form. They only give some idea that simplifies the design by individual teams – a recommendation, not an unambiguous solution.
Keeping information systems in an adequate condition is the job of the architect or of individual high-level developers. But when we rely on one person, we run into the limitation of the capacity of his brain and concentration, since he cannot add computational resources as a machine can. On the other hand, in the case of a collective design process, we run into the difficulty of transferring knowledge and of exchanging ideas between participants efficiently and rapidly, without loss of data or quality. Both problems can be mitigated by reducing the subjectivity of perception and by hiding certain well-known and habitual elements of the description to free up a person’s limited resources [2].
Within this paper, these problems are scoped to the unification of the system presentation, to simplify exchange, and to the adaptive abstraction of the description, so that the degree of detail can be changed on the fly. The word “description” within this article denotes a certain format describing interacting objects in the system and their connections, both functional and at the data level. The main goal is not so much the high-quality design of the system as the simplification of working with it by people or by other automated middleware (even a hypothetical AI will work and learn better on a narrowed and standardized set of concepts). Therefore, some of our ideas for unification and abstraction were taken directly from the general patterns of how a person perceives information and forms high-level solutions from simple data. We use the concept of a knowledge base that implements a kind of memory of a specialist in some field, who also remembers a set of rules and best practices adopted in his organization or subject area. The approach described in this paper can also be used to help form dynamic design requirements based on automated categorization [3].
In a sense, we can say that our goal is to create a meticulous designer assistant who has read all the working documentation and recommendations, knows a couple of languages, and remembers all previous projects and how problems were solved there. He does not have intuition, but he recognizes recurring template cases. He also knows how to describe the final structure to other participants in the development process without any personal opinion, using only rules. In addition, the modular structure of the knowledge base allows individual rule blocks to be enabled and disabled – for example, when switching between translations for internal or external use (generalized and domain ontologies [4]).
2 System Description Translation Pipeline
The general implementation of the concept of this article is based on the gradual modification of individual description nodes or connections by applying different types of rules and templates. Below you can see a generalized pipeline for the transformation of the system description (Fig. 1). It displays the stages and related blocks of knowledge (cropped blocks) represented by ontologies or structural templates. Each block of knowledge in this scheme can, in general, be further detailed as the relevant knowledge objects (rules) grow. Since at this stage the goal is only semantic transformation, the stages of loading/converting the input description format and of output (rendering, export) are omitted from the pipeline. We can say that we do not consider how the excitation of the ocular nerves transforms into information, or how a person turns ready-made thoughts into sound or another information flow.
Fig. 1. Overall pipeline with current knowledge modules
Concretization and meaning narrowing. At this stage, ambiguous terms and double interpretations are cleared away. There is a transition from a subjective set of possible names and descriptions of objects to a limited, specialized set based on a dictionary of synonyms and a dictionary of “dangerous” or vague words (words that have many meanings, are very context-sensitive, or are too formal for some broad part of the system). In human perception, this stage corresponds to a translation into concepts closer to the person. Developing different ontologies for different purposes or employee types can, at the same time, give the opportunity to have a dynamic target audience – for example, different meanings for programmers and managers – or to have additional blocks of knowledge that provide the most correct normative meaning of words for working with various regulations and normative acts.
Normalization is a stage that aims to break down the structure of the system by reducing the number of data duplications and various structural mismatches. This stage is directly related to the theory of database design. It also implies the identification of inheritance relationships between objects and the extraction of common objects from particular ones. At the level of human perception, this is the identification of the properties of an object according to the class to which it corresponds. This stage is based on the rules of inheritance and type nesting; from a human point of view, this may be the structural identification of components (e.g. geometric shapes, color, and material).
Complementation (the stage of completing the structure) represents the most high-level actions with the structure. At this stage, a database of patterns or template structures is used; in a way, these can be regarded as pointwise best practices. By matching repeated sections of the structure against the stored templates, recommendations for supplementing the structure can be generated. In the human mind, this stage corresponds to experience with similar objects: a person recognizes similar cases and either reuses a ready-made solution or realizes that something may be missing.
Contextual concretization (or semantic finalization) is the stage of the final elimination of ambiguity and, possibly, of reducing the variability of names. This stage is generally similar to the usual concretization, but implies getting rid of accumulated errors and further refinement, taking into account the changes made at the previous stages. This stage can also use contextual particularities that help target specific use cases and sub-areas [5] – for example, more formal meanings for programmers, or less formal and more human ones for managers. In human thinking, this is the stage of preparing information for the outside: when a person has formed his thought but, after hearing it entirely in his head, transforms it for a better understanding by another person.
These stages correspond, in a way, to the stages of perception mentioned above. The concretization block can be seen as a filter of incoming information; normalization is the grouping of small objects and their classification; complementation is the identification of existing patterns; contextual concretization is the analysis of the final picture and the improvement of its representation in our head, or the formation of a qualitative description.
The figure below shows the simplest special case of a transformation based on ontologies and the pipeline described above. We use the subject area of the educational process of a university.
Fig. 2. Case of a simple transformation in the educational process area
The following explains how the pipeline can interpret these objects to improve the structure. The labels next to the object names in Fig. 2 reference the corresponding rules described below.
1. Specification
A) Dictionary of synonyms (substitutions): learner => student. “Student” is narrower but more common. (A minimal code sketch of this stage is given after this list.)
B) Dictionary of dangerous words: adding “educational” to “group” and “plan”, since on their own they can be associated with too many meanings.
2. Normalization
A) Rules for type nesting: Student = personal data + properties for training; Employee = personal data + contractual properties. Both rules generate the allocation of the “personal data” entity (which becomes “person” after the last stage).
B) Rules of inheritance: Teacher = an employee of a certain department participating in the teaching process; Department = a separate type of unit providing training. Both rules provoke the allocation of two heirs, from “employee” and from “division”, for the implementation of more highly specialized functions.
3. Complementation. Template “teacher – department”: almost always, the teacher is somehow connected with an academic department, even if this connection is described through “employee – division”. Template “academic department – subject”: it is based on the fact that usually only education departments are fixed in the education plan, which leads to the branching of “department”. We can say that this is a forced allocation of purely functional objects without taking into account their internal structure.
4. Contextual concretization. “Personal data” is a normal definition, but there is a more effective and familiar one – “Person” – which in this case is free. In a more complex structure, where “person” was already in use and separate processing of some personal data was required (for instance, as an “identifying document”), this renaming might not have happened. This is the essence of the contextuality of this stage and of the possibility of correcting what was generated by the previous stages for the target use case.
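The sketch below is a minimal, illustrative Python rendering of the specification/concretization stage (rule 1 above); the dictionary contents and function name are our stand-ins for the authors' rule blocks, not their implementation.

```python
# Minimal sketch of the dictionary-driven concretization stage (illustrative only).
SYNONYMS = {"learner": "student"}                 # substitution dictionary (rule 1A)
DANGEROUS = {"group": "educational group",        # "dangerous" vague words get clarified (rule 1B)
             "plan": "educational plan"}

def concretize(objects: list[str]) -> list[str]:
    out = []
    for name in objects:
        name = SYNONYMS.get(name, name)           # synonym substitution
        name = DANGEROUS.get(name, name)          # disambiguation of vague terms
        out.append(name)
    return out

print(concretize(["learner", "plan", "teacher"]))
# -> ['student', 'educational plan', 'teacher']
```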
3 Conclusions and Future Steps
This article presents the methodological basis and the pipeline for the transformation of the description of an IS structure – its functional and data links and objects – as well as the applicability of a replenishable, adaptive knowledge base that implements the functions of a highly specialized trainee employee and can simplify the work of designing and maintaining large-scale system landscapes. This allows maintaining objectivity, unification, and the desired level of abstraction of the IS description.
At the moment, the implementation of the concept is based on a set of local rules applied manually. This is convenient for the initial study, but for mass filling of the knowledge base, as well as for the ability to apply it outside of one system, we are gradually switching to existing notations and approaches for describing IS. It is also possible to switch the description of the rules and ontologies themselves to UML [6]. We also plan to create a small toolkit to simplify the creation of rules and templates.
To increase the flexibility of the knowledge base, we plan to apply the concepts of fuzzy logic and decision-making when determining which rules will be applied, rather than using rigid predicates. At the same time, we are trying to address the problem of intersections between rules by teaching the system to choose the most appropriate rule or to create a combination of actions from several rules. For now, we use an additional layer of rules between the choices, or create cross-rules in advance for frequent cases. We are also working out a dynamic order of knowledge blocks in the pipeline.
We fill our knowledge base using observed design errors and by processing various documentation. At the same time, we are working on the unification of a simple dictionary-block format. This will allow existing ontologies to be used to accelerate the expansion of the knowledge base and to simplify the work of those who already have their own ontologies in their organization.
References
1. Hanseth, O., Lyytinen, K.: Design theory for dynamic complexity in information infrastructures: the case of building internet. In: Willcocks, L.P., Sauer, C., Lacity, M.C. (eds.) Enacting Research Methods in Information Systems, pp. 104–142. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-29272-4_4
2. Lieder, F., Griffiths, T.: Resource-rational analysis: understanding human cognition as the optimal use of limited computational resources. Behav. Brain Sci. 43, E1 (2020). https://doi.org/10.1017/S0140525X1900061X
3. Moser, T., Winkler, D., Heindl, M., Biffl, S.: Requirements management with semantic technology: an empirical study on automated requirements categorization and conflict analysis. In: Mouratidis, H., Rolland, C. (eds.) CAiSE 2011. LNCS, vol. 6741, pp. 3–17. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21640-4_3
4. Li, Y., Cleland-Huang, J.: Ontology-based trace retrieval. In: 2013 7th International Workshop on Traceability in Emerging Forms of Software Engineering (TEFSE), pp. 30–36 (2013). https://doi.org/10.1109/TEFSE.2013.6620151
5. Avgerou, C.: The significance of context in information systems and organizational change. Inf. Syst. J. 11(1), 43–63 (2001). https://doi.org/10.1046/j.1365-2575.2001.00095.x
6. Baclawski, K., et al.: Extending the Unified Modeling Language for ontology development. Softw. Syst. Model. 1(2), 142–156 (2002)
Designing an Economic Cross as a Condition for the Formation of Technological Platforms of a Digital Society
Olga Bronislavovna Repkina1,3, Galina Ivanovna Popova2, and Dmitriy Vladimirovich Timokhin1,2(B)
1 Moscow State University of Humanities and Economics, 109044 Moscow, Russia
2 National Research Nuclear University MEPHI, 115409 Moscow, Russia
3 Russian University of Transport (MIIT), 127994 Moscow, Russia
Abstract. The article examines the possibilities of using the economic cross methodology for the design of technological platforms. An assessment was made of the additional economic efficiency, per one ruble of expenses, gained by leading Russian high-tech companies from the use of “smart” technology platforms. The economic efficiency of the use of infrastructural technological solutions for the interaction of Russian high-tech companies with the external environment is investigated. A classification of leading high-tech Russian companies based on clusters of their needs for “smart” technology platforms is proposed. Based on the study of trends in the technological transformation of the IT market, an economic interpretation of each of the discovered clusters of high-tech companies in Russia is presented. The trends of technological development of the global economy are investigated. For the cluster containing the largest share of the leading Russian IT companies, and taking into account the economic interpretation carried out, a system of recommendations for the formation of a technology platform is proposed. The main problems that the economy will face when integrating the considered high-tech companies on the basis of a “smart” independent infrastructure ecosystem are identified. A system of proposals has been developed for the use of the “economic cross” methodology for the early detection and prevention of the relevant problems, taking into account the already known difficulties of integrating private and corporate interests on the basis of unified intersectoral “smart” ecosystems.
Keywords: Technological platforms · Smart technologies · Industry economic modeling · Technological convergence · Innovation · Rosatom
1 Introduction
The technological transformation taking place at all levels of the value-added production chain in the modern economy is due to the introduction of smart technologies at the supra-sectoral level. In the period 2022–2035, the formation of a new industry of technological aggregator services is expected. The product of these market participants will
include an arrangement of disparate technological products, supplied by participants external to the aggregator industry, combined in the way most appropriate to the needs of the market participant, based on a combination concept developed by the aggregator. The importance of aggregators as participants in the innovation process stems from the technological convergence of the modern production process. Such convergence is a consequence of the formation of Industry 4.0, which implies an increase in the share of “smart” infrastructure in the cost of a single product. At the same time, the process of integrating manufacturers of diverse industry composition on the basis of “smart” technology platforms is hindered by economic barriers.
A comparison of the technological solutions offered to business in 2010–2019 with the practice of integrating these solutions into business processes revealed an asymmetry that caused the technological transformation of business to lag behind existing opportunities [4]. The result of businesses rejecting the possibilities of technological re-equipment of production processes was a slowdown in economic development and a partial monopolization of high-tech markets by established leaders such as Google, Apple, and Microsoft. At the same time, this slowdown gave growing economies, primarily China, an opportunity to bridge the technological gap between national producers of innovative products and global high-tech companies. The slowdown in the technological development of the global economy at the expense of the transnational corporations of the collective West has cost them their unconditional technological superiority in a number of areas, including the development of 5G communication technology [5].
The forced experiment of transferring economic relations to a digital format to the maximum extent, which took place in 2019–2021, became a trigger for eliminating the imbalance between the technological capabilities offered to business and companies’ practice in the technologization of the economic and production process. The proven results of eliminating this technological asymmetry, as of mid-2021, were:
– more active use by consumers of online payment forms of interaction with the external environment, which led to the rapid development of the digital communications and e-commerce industries;
– a reformatting of the labor market based on technologies for organizing remote workplaces.
At the same time, the active involvement of “smart” technologies in the life of modern society needs coordination at the infrastructural level. Demand for the services of technology platforms over the period 2019–2021 increased 5.6 times against the background of falling GDP in almost all the leading economies of the world, with the exception of China. It is obvious that the period 2021–2025 will be of fundamental importance from the point of view of designing the infrastructure architecture of Industry 4.0.
The restructuring of economic and production processes on the basis of technological platforms requires a revision of approaches to the organization of sectoral development planning. The task of maximizing the synergetic effect through the integration of various technological solutions on the basis of already existing ecosystem infrastructure technologies is of crucial importance. In addition, the technological structure of a business
based on the technology platform is multivariate, which makes urgent the task of selecting optimal combinations of technologies and revising them in the shortest possible time as market conditions change.
2 Assessment of the Economic Prospects for the Technological Formation of Industry 4.0 Based on “Smart” Technology Platforms
The forced experiment of transferring a significant part of economic interactions online, which took place in the conditions of the Covid-19 pandemic, raised a number of economic issues related to the clearly marked trends in the technological transformation of the global economy up to 2021. These issues require reformatting the systems for managing and forecasting economic systems. Let’s consider these issues in more detail.
Among the intersectoral problems, it is necessary to highlight the problem of reducing the response time to changes in demand from private companies, the problem of technological convergence, and the problem of integrating enterprises into the “smart” space. According to the results of the work of Nobel laureate Richard Thaler, the economic model of pre-coronavirus demand was largely driven by behavioral mechanisms. The stability of the relationships between buyers and the suppliers known to them, both at the micro level and at the institutional level, increased the effectiveness of the marketing tools traditional for each micro-niche of the product within that niche, and reduced their effectiveness in other niches where these tools were used less often.
Table 1 presents the results obtained by the author on the basis of an assessment of the effectiveness of the use of “traditional” and “smart” technologies of interaction with customers by the TOP-10 Russian high-tech enterprises, on average for 2010–2019. As the criterion for assigning a customer-interaction technology to the “smart” category, a share of IT-component costs of less than 55% in the total price of the marketing tool was chosen.
Based on the data calculated in Table 1, a cluster analysis is performed and a distance matrix is constructed (see Table 2). The distances highlighted in Table 2 are analyzed and grouped, and their economic meaning is interpreted by the author taking into account the trends of economic and technological development of the global competitive era; three clusters of companies are identified based on the minimum-Euclidean-distance model. Combining high-tech companies into clusters makes it possible to identify the needs of each cluster in relation to the technology platform being formed.
The cost-effective expenditure on the formation of the technology platform is calculated in accordance with the “economic cross” methodology, on the basis of solving the following system of equations:

C(x_i) + B = const (1)
F(x_i) − C(x_i) → max (2)
Table 1. Comparative average assessment of the economic efficiency of expenditures per 1 ruble of costs for traditional and “smart” tools of interaction of the leading participants of the Russian IT sector with the external environment, for 2010–2020

Company | Revenue per 1 ruble of costs, “smart” tools | Revenue per 1 ruble of costs, traditional tools
Mail.ru (x1) | 1.56 | 1.76
Yandex (x2) | 1.59 | 1.43
Kasperskiy Laboratory (x3) | 1.87 | 1.34
T8 ltd (x4) | 1.45 | 1.32
Ulanotech (x5) | 1.76 | 1.78
CRT Group (x6) | 1.56 | 1.56
JSC “Radio and microelectronics” (x7) | 1.23 | 1.43
Angara (x8) | 1.41 | 1.88
LLC “CMI Moscow state university named after M.V. Lomonosov” (x9) | 1.32 | 1.34
Usergate (x10) | 1.34 | 1.45
Formulas (1) and (2) use the following notation: B – the basic cost of development of the technology platform, calculated as the initial budget constraint on the formation of the technology platform for the period for which the “economic cross” is planned; C(x_i) – the cost of development and operation of the technology platform by the participants of the industrial economy, implemented by the leading technology companies, which provides an additional economic result from the use of this technology platform in the activities of the i-th company; F(x_i) – the economic result of the use of digital platforms obtained per one ruble invested in the creation and operation of the technology platform (including initial costs and adjusted budget financing).
Let’s calculate the matrix of Euclidean distances for the results indicated in Table 1, using the agglomerative hierarchical classification algorithm and taking the usual Euclidean distance as the distance between objects:

ρ(x_i, x_j) = √((s_i − s_j)² + (t_i − t_j)²) (3)

where i and j index the companies (columns) of Table 1, and s and t denote the revenue per 1 ruble of costs for “smart” and traditional tools, respectively.
Below we carry out the necessary calculations on the data of Table 1 in accordance with formula (3):

ρ(x_1, x_2) = √((1.56 − 1.59)² + (1.76 − 1.43)²) = 0.33
ρ(x_1, x_3) = √((1.56 − 1.87)² + (1.76 − 1.34)²) = 0.52
ρ(x_1, x_4) = √((1.56 − 1.45)² + (1.76 − 1.32)²) = 0.45
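The full distance matrix and the cluster structure reported below can be cross-checked with a few lines of Python; the values are transcribed from Table 1, and single linkage is assumed as the “nearest neighbor” method named in the table captions – everything else is our own illustrative scaffolding.

```python
# Cross-check of Table 2 (distances) and Table 3 (clusters) from the Table 1 values.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

# (smart, traditional) revenue per ruble for x1..x10, transcribed from Table 1
X = np.array([[1.56, 1.76], [1.59, 1.43], [1.87, 1.34], [1.45, 1.32],
              [1.76, 1.78], [1.56, 1.56], [1.23, 1.43], [1.41, 1.88],
              [1.32, 1.34], [1.34, 1.45]])

D = squareform(pdist(X))                  # pairwise Euclidean distances (Table 2)
Z = linkage(X, method="single")           # agglomerative, nearest-neighbor linkage
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.round(D, 3))
print(labels)                             # x3 and x5 split off; the rest form one cluster
```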
The obtained data on Euclidean distances, characterizing the effectiveness of the use of “smart” and traditional infrastructure technology platforms, are placed in Table 2.

Table 2. Matrix of Euclidean distances of the economic efficiency of using “smart” and traditional technology platforms by the TOP-10 Russian high-tech companies, calculated using the “nearest neighbor” method

    | x1    | x2    | x3    | x4    | x5    | x6    | x7    | x8    | x9    | x10
x1  | 0     | 0.331 | 0.522 | 0.454 | 0.201 | 0.2   | 0.467 | 0.192 | 0.484 | 0.38
x2  | 0.331 | 0     | 0.294 | 0.178 | 0.389 | 0.133 | 0.36  | 0.485 | 0.285 | 0.251
x3  | 0.522 | 0.294 | 0     | 0.42  | 0.454 | 0.38  | 0.646 | 0.709 | 0.55  | 0.541
x4  | 0.454 | 0.178 | 0.42  | 0     | 0.555 | 0.264 | 0.246 | 0.561 | 0.132 | 0.17
x5  | 0.201 | 0.389 | 0.454 | 0.555 | 0     | 0.297 | 0.635 | 0.364 | 0.622 | 0.534
x6  | 0.2   | 0.133 | 0.38  | 0.264 | 0.297 | 0     | 0.355 | 0.353 | 0.326 | 0.246
x7  | 0.467 | 0.36  | 0.646 | 0.246 | 0.635 | 0.355 | 0     | 0.485 | 0.127 | 0.112
x8  | 0.192 | 0.485 | 0.709 | 0.561 | 0.364 | 0.353 | 0.485 | 0     | 0.547 | 0.436
x9  | 0.484 | 0.285 | 0.55  | 0.132 | 0.622 | 0.326 | 0.127 | 0.547 | 0     | 0.112
x10 | 0.38  | 0.251 | 0.541 | 0.17  | 0.534 | 0.246 | 0.112 | 0.436 | 0.112 | 0
For the purposes of identifying needs for technological platforms that are common to the “economic crosses” of innovative systems, let us consider the trends of convergence of the needs of high-tech corporations. For the period 2022–2025, we should expect the formation of 6 types of intersectoral technology platforms that satisfy the cognitive needs of counterparties who are unable to conduct a detailed analysis of the totality of characteristics of a high-tech product. The main trends for the period 2022–2025 will be:
a) core modernization;
b) risk management systems development;
c) IoT infrastructure development;
d) cloud and distributed platforms development;
e) data and analytics & AI development;
f) digital experience & digital reality technology development.
Establishing the similarity of the requests of leading domestic IT companies regarding the economic and technical characteristics of technology platforms allows identifying the priority types of technology platforms in terms of organizing their state support. Let’s evaluate which of these solutions in the field of technology platforms are most suitable for the domestic high-tech companies under consideration (Table 3). The study of the Euclidean distances presented in Table 2 by the method of smallest distances allowed us to group the Russian IT companies into three clusters, the most capacious of which is cluster No. 1 (see Table 3).

Table 3. Estimation of Euclidean distances between clusters of leading domestic IT companies, grouped by the similarity of their economic cross’s requests to technological platforms

                  | x1,8,2,6,4,7,9,10 | x3    | x5
x1,8,2,6,4,7,9,10 | 0                 | 0.294 | 0.201
x3                | 0.294             | 0     | 0.454
x5                | 0.201             | 0.454 | 0
The study showed the similarity of the requests to technology platforms made by Mail.ru; Yandex; T8 ltd; Ulanotech; CRT Group; JSC “Radio and microelectronics”; LLC “CMI Moscow state university named after M.V. Lomonosov”; and Usergate. Based on the requests of these companies and of foreign competitors operating in related industries [2, 3], the main features that will ensure maximum compliance of the technology platform’s economic cross with the requests of leading IT companies in their interaction with the external environment were identified:
a) cognitive accessibility, that is, the ability of a wide range of people without special industry knowledge, including mass investors, to interact with companies on the basis of the technology platform;
b) infrastructure accessibility, that is, the compliance of the interaction infrastructure with the technological capabilities of IT companies;
c) the possibility of barrier-free introduction of a new product to the market.
3 Conclusions
Thus, within the framework of the conducted research, based on modeling the economic cross of the intersecting infrastructure requests of leading domestic IT companies to technology platforms, the largest group of IT companies with common technological
requests was identified. Taking into account the trends of technological development that formed in the pre-coronavirus period and that, presumably, will remain stable in 2022–2025, the most general recommendations have been formulated for the characteristics of a technology platform that meets the needs of the most capacious cluster of leading IT companies in Russia.
References
1. Budman, M., Hurley, B., Khan, A.: Tech Trends. Deloitte Development LLC (2021). https://www2.deloitte.com/content/dam/insights/articles/6730_TT-Landing-page/DI_2021-Tech-Trends.pdf
2. Cheshmehzangi, A.: Smart platforms and technical solutions: can we really achieve smart-resilient models? In: Cheshmehzangi, A. (ed.) Urban Health, Sustainability, and Peace in the Day the World Stopped, pp. 169–176. Springer, Singapore (2021). https://doi.org/10.1007/978-981-16-4888-5_20
3. Liu, Z., Qian, P., Wang, X., Zhuang, Y., Qiu, L., Wang, X.: Combining graph neural networks with expert knowledge for smart contract vulnerability detection. IEEE Trans. Knowl. Data Eng. (2021). https://doi.org/10.1109/TKDE.2021.3095196
4. Fedotova, O., Platonova, E., Igumnov, O., Tong, B., Zhang, T.: The impact of the digital technological platforms on the institutional system of the higher education during the COVID-19 pandemic. E3S Web Conf. 273, 12069 (2021). https://doi.org/10.1051/e3sconf/202127312069
5. Hein, A., et al.: Digital platform ecosystems. Electron. Mark. 30(1), 87–98 (2019). https://doi.org/10.1007/s12525-019-00377-4
6. Ye, J., Jiang, Y., Hao, B., Feng, Y.: Knowledge search strategies and corporate entrepreneurship: evidence from China’s high-tech firms. Eur. J. Innov. Manage. (2021). ISSN 1460-1060. https://www.emerald.com/insight/content/doi/10.1108/EJIM-02-2021-0111/full/html
7. Pimenova, O.V., Repkina, O.B., Timokhin, D.V.: The economic cross of the digital post-coronavirus economy (on the example of rare earth metals industry). In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) Brain-Inspired Cognitive Architectures for Artificial Intelligence: BICA*AI 2020. AISC, vol. 1310, pp. 371–379. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_45
8. Luzgina, K.S., Popova, G.I., Manakhova, I.V.: Cyber threats to information security in the digital economy. In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) Brain-Inspired Cognitive Architectures for Artificial Intelligence: BICA*AI 2020. AISC, vol. 1310, pp. 195–205. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_25
A Physical Structural Perspective of Intelligence
Saty Raghavachary(B)
University of Southern California, Los Angeles, CA 90089, USA
[email protected]
Abstract. The BICA Challenge is about building cognitive architectures inspired by biological systems. To contribute towards that goal, this short paper presents a novel view of natural intelligence, stemming from the following broad observation: from the quantum scale to the cosmological, physical structures, by virtue of their design and constitution, result in appropriate phenomena. Such a physical structure-oriented view is shown to account for natural intelligence exhibited by a wide range of living systems, including but not limited to flora, insects, viruses, groups/colonies, and humans; the underlying principles at work could turn out to be quite useful in designing robust artificial systems that display intelligent behavior similar to those of natural life forms.
Keywords: AGI · Artificial general intelligence · Artificial intelligence · Evolution · Adaptation · Intelligence · Physical structure · Natural phenomena · Biomimetics · Emergence · Computation · Analog computing
1 Approaches to A(G)I
Traditional approaches to AGI have focused on designing cognitive architectures, which are complex computational models that usually contain components focused on specific intelligence-related aspects such as perception, reasoning, types of memory (eg. episodic, semantic), knowledge representation, reinforcement learning, emotion, action selection, etc. Quite a few such architectures have been proposed over the years; a good catalog of them is summarized in [1]. These architectures are all symbol-oriented, reflecting the long-standing symbolic approach that has dominated research and development since AI’s inception. Also, while not yet commonplace, creating AGI architectures based on brain-inspired connectionist ideas has been proposed [2, 3]. A hybrid symbolic/connectionist architecture called CogPrime has been in development as well [4].
And finally, a very different approach has involved embodiment, where explicit human-created rules/data/goals for designing AGI are replaced by ‘model-free’ architectures that rely on physical (‘embodied’) agents directly engaging with their environment, displaying intelligence on account of their physicality [5, 6]. In this brief paper, we build on the ‘embodiment’ approach, and present a view of intelligence that is based exclusively on the behavior of physical structures (rather than on explicit computation via digital processors).
2 Structures → Phenomena (S → P)
Physical structures exhibit phenomena - this principle is ubiquitous, and universal. By physical structure, we mean arrangements or assemblies of matter, from the subatomic to cosmological scales. Atomic nuclei, molecules, single crystals, cell membranes, prisms, sunflower seeds, tornadoes, rockets, galaxies - these are all examples of such structures. We differentiate physical structures from intangible ones such as conceptual, logical, or computational structures. Physical structures exhibit/display/produce… spatio-temporal phenomena, on account of energy or matter exchange with their environment, undergoing chemical reactions, etc.; this might possibly lead to the structure itself undergoing change. Phenomena exhibited could be mechanical (eg. vibrational, oscillatory), thermal, optical, acoustic, electro-magnetic, chemical, etc., depending on the structure. Sometimes we use ‘behavior’ to colloquially refer to phenomena.
One definition of ‘engineering’ could be ‘the exploitation of phenomena for useful purposes’, which would involve the design and construction of appropriate physical structures. In our engineered world, examples include MEMS (Micro Electro Mechanical Systems) sensors, lasers, filtration systems, antennae for radio waves, suction cups, turbines and myriad others. Modern materials science rests on this premise: form produces function (analogous to ‘structures display phenomena’). The idea is to design and build specific structures tailored to desired functionality, eg. semiconductors, zeolites, optical coatings, quantum dots, etc.
Aggregates of similar structures (eg. chunks of snow, gas molecules) can result in emergent phenomena (eg. avalanche, gas pressure) that are distinct from their own component phenomena, as a result of neighboring component structures interacting with each other (snow blocks sliding, molecules colliding) - such emergent phenomena are considered to occur at the group/ensemble/collective/assembly/aggregate/population level. This is true in the biological world as well, where statistical ensembles (ie. collections) of structures such as cells, receptors, follicles etc. display individual as well as group behavior, which might be emergent.
A biological structure’s behavior (ie. the exhibited phenomenon that constitutes its intelligence) can be regarded as a form of ‘considered response’ [7], since the structure can be presumed to have specifically evolved to display that behavior. For example, certain ion channel receptors (specific proteins) on our skin act as sensors for heat and touch (the discovery of these led to the 2021 Nobel Prize in Physiology or Medicine, for David Julius and Ardem Patapoutian). In that sense, the following are analogous: structure → phenomenon, design → function, consideration → response. In other words, biologically speaking, consideration (of stimuli, for example) occurs via designed (evolved) structures, whose function (phenomena) constitutes the response.
Structures can be assembled in such a way that their component phenomena interact to achieve a specific, ‘higher level’ (possibly surprising, and non-obvious) purpose/functionality. The most instructive and delightful examples of this would have to be the assemblies conceived and sketched by cartoonist Rube Goldberg [8]. A ‘Rube Goldberg Machine’ is a complex, hand-drawn assembly, whose components are comprised of common household items (eg. candle, bucket, rope…), and even pets, which
together would amusingly carry out a non-obvious task (eg. wiping a sitting person's chin) through the synergistic combination of component phenomena (eg. string burning, a parrot flying, bucket dumping water, etc.) that are almost always sequentially executed. There is an important realization to make: the entire contraption can be regarded as an analog computer which displays (considers inputs, computes, and responds with) intelligent behavior. In that sense, the mechanism is the computer; in other words, physical structures can be engineered to compute.
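To make this concrete, here is a minimal software sketch (all names and constants here are ours, chosen purely for illustration) of a vehicle whose two light sensors are cross-wired to two wheel motors: the 'turn toward the light' behavior is computed entirely by the wiring of the structure, with no symbolic representation of 'light' or 'goal' anywhere.

```python
import numpy as np

def step(pos, heading, light, dt=0.1, base=0.2):
    # two light sensors mounted at the front-left and front-right of the body
    left  = pos + 0.1 * np.array([np.cos(heading + 0.5), np.sin(heading + 0.5)])
    right = pos + 0.1 * np.array([np.cos(heading - 0.5), np.sin(heading - 0.5)])
    # sensed intensity falls off with squared distance to the light source
    i_left  = 1.0 / (0.5 + np.sum((left  - light) ** 2))
    i_right = 1.0 / (0.5 + np.sum((right - light) ** 2))
    # crossed excitatory wiring: each sensor drives the *opposite* wheel
    v_left, v_right = base + i_right, base + i_left
    heading += (v_right - v_left) * dt          # differential drive steers the body
    speed = (v_left + v_right) / 2.0
    pos = pos + speed * dt * np.array([np.cos(heading), np.sin(heading)])
    return pos, heading

pos, heading = np.array([0.0, 0.0]), 0.0
for _ in range(300):
    pos, heading = step(pos, heading, light=np.array([2.0, 1.0]))
print(pos)  # the trajectory bends toward the light at (2, 1)
```

The 'program' is nothing but the geometry of the sensors and the crossing of the connections - exactly the sense in which the mechanism is the computer.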
3 Biological Structures

The 'structure → phenomenon' principle readily applies to biological systems, ie. to living entities: life itself can be viewed as a collection of cooperating structures (at multiple levels, eg. molecular, sub-cellular…) that are custom-built (have evolved) to exhibit certain characteristics that qualify them to be living. The apparent circularity in this definition can be resolved by considering that life on earth emerged from non-living entities (from inorganic molecules, via 'abiogenesis') about 3.5 billion years ago - in this view, specific structures and their associated phenomena are what led to what we characterize as 'life'.

3.1 Life: In Terms of S → P

For an entity to be considered 'life', it needs to exhibit the following three characteristics: autopoiesis, survival, reproduction. And in our view, biological structures exhibit phenomena that manifest all three:
• autopoiesis [9] is the notion that life involves autonomous self-copying, via cells: a cell contains all requisite materials and mechanisms that permit it to replicate itself - it is self-organizing as well as self-fabricating. Autopoiesis is what permits growth, regeneration, repair etc.
• survival includes not only feeding and avoiding predators and other dangers (self-preservation in general), but also homeostasis - self-regulating processes (eg. perspiration that permits thermoregulation) that help a living system maintain internal equilibrium. Survival includes intelligence as an aspect.
• reproduction involves creating a replica of an organism that is biologically similar to itself.
Autopoiesis, survival and reproduction all involve exchange of matter, energy, and information ('MEI') between an organism and its environment - interestingly, this makes an organism function as a system that is 'open' to the world, but also 'closed' in the autopoietic sense of self-sufficiency. In summary, 'life' consists of evolved/biological structures and associated phenomena that carry out autopoiesis, survival and reproduction, via 'MEI' exchange with the environment.
3.2 Natural Intelligence: In Terms of S → P

Considering intelligence as an aspect of survival, as we just did, we come to the following realization: intelligence, which is a life process, is realized using structures and their associated phenomena. That is what we noted earlier as well: a biological structure's behavior, being its considered response, is 'intelligent'. This also means that intelligent behavior is not limited to an organism's aggregate 'top-level' response that is externally observable - rather, it is present at every level, from the creature, organ, tissue, cell, and sub-cellular levels down to the molecular - in the words of Michael Levin and Daniel Dennett, it is 'cognition all the way down' [10]. In this view, which we might call Ultra-radical Embodied Cognition (UEC), cognition is embodied not just at the highest/external level, but at all levels of bodily organization - in however many arbitrary levels we might conceive it to be. Further, hypothesizing that consciousness solely arises from matter, we would use the terms 'physical structuralism' or 'sphenomenality' to characterize it, rather than 'physicalism' or 'materialism', both of which are less specific. There are numerous examples [11–13] of structures in the biological world (including both flora and fauna, ranging widely in scale, complexity and diversity, from bacteria to humans) where structures manifest intelligence. Here is a sampler:
• bacterial proteins that perform signal transduction to help navigate to food sources (undergo chemotaxis) using their flagella
• the existence of specific sensors for salt in C. elegans [14] - presumably to sense (salt-laden) bacteria which constitute their food source
• sea stars using pressure differentials among their feet (podia) to navigate in an aggregate yet decentralized manner
• cats' low-light vision that relies on retro-reflective pigments in their retina
• certain butterfly species' wing iridescence that results from light interference caused by ultra-thin layers
• bats' use of sonar for navigation
• certain fishes' use of body-generated electric fields to locate and stun prey
• certain birds' beaks containing a fulcrum mechanism to help crack nuts
• maple trees' seeds having a long 'wing' shape that balances seed weight, helping propel them far
• baby locusts' use of meshing gears (!) to perform jumping with coordinated feet [15]
• muscle contraction effected by a regular placement (eg. along a hexagonal array) of elastic filaments [16]
• planar as well as spatial (3D) distribution of 'place' cells to help higher animals encode and navigate surfaces and 3D space
It is abundantly clear that structures and their phenomena are responsible for all life processes related to intelligence, and more generally, survival. Eg. in higher animals, voice/speech production, hearing, vision, touch, smell, taste, proprioception, graviception, defense, camouflage… all involve specific structures in the body, together with matching structures (processing mechanisms) in the brain. None of these involve direct,
explicit, symbolic representations and their manipulation; instead, it is appropriate structures that 'do the work' on account of their design, material makeup, and organization (ie. spatial makeup). It is indeed interesting that computations can be carried out by structures (which are comprised of the organic molecules that make up all cells and organs) - obviously, no digital processing (using a stored program architecture) is involved. As a specific example, fruit flies perform vector addition using a set of suitably-evolved neurons that generate the direction of travel, regardless of head orientation - those neurons even transform the flies' body-based inputs (in terms of their local coordinate system) into a world-centric flight direction (eg. relative to a spatial target or direction) [17]. Rather than use a single general-purpose programmable architecture (similar to a microprocessor), natural design appears to favor specialized processors (structures) instead, custom-created for the task to be solved. It is as if the structures (molecules, all the way up to organisms and even colonies) represent 'compiled circuitry', where computability has been designed directly into the structures themselves; the response generated via the structures' phenomena would correspond to task execution. Use of specialized processors (structures) presumably produces a design that is clean/elegant, scalable, robust, and flexible.

3.3 SPSH (Compared to PSSH)

The notion that intelligence results from physical structures and their phenomena inspires the following Structured Physical System Hypothesis (SPSH): 'A structured physical system has the necessary and sufficient means for specific intelligent response'. SPSH can be considered as a complement to the widely accepted PSSH - the Physical Symbol System Hypothesis [18], which has resulted in massive gains for society on account of its underlying almost all AI advances to date [19]: 'A physical symbol system has the necessary and sufficient means for general intelligent action'. While aspects of intelligence such as reasoning, planning, specific types of problem-solving, game-playing etc. can be simulated on a physical symbol system (specifically, on digital processors that are part of a von Neumann stored program architecture), there are other types of intelligence that are experiential (eg. kinesthetic, inter-personal). These do not need to be represented/computed using symbols; instead, they can be directly, actively, incrementally and continuously (in time) acquired via a body, stored in our memory, recalled for subsequent use, etc. Rodney Brooks [20] calls this 'Cognition without computation' - presumably he means without digital computation that employs a von Neumann architecture, as we just mentioned. Body-based intelligence (eg. related to spatial navigation, proprioception etc.) is hypothesized to be better realized via physical structures, which is why SPSH can be considered to complement PSSH. Here, "physical structure matters". Such a form of intelligence can be viewed as 'considering and responding to phenomena in the environment, using phenomena generated by bodily structures' (where 'body' includes the brain as well, if present) - this is expressed as 'CRiSP' - 'Considered Response in-terms-of Structure-derived Phenomena'.
Varela and Thompson used three questions to compare their enactive paradigm (where sense-making occurs when an embodied mind actively negotiates its environment) with the symbolic and connectionist ones [21]. The three questions serve to clarify the 'CRiSP' approach to intelligence presented above. Here are the questions and answers. Q1. What is cognition? A. Considered Response (CR). Q2. How does it work? A. Through appropriate physical structures that exhibit phenomena (SP). Q3. How do I know when a cognitive system is functioning adequately? A. When an agent can successfully negotiate its environment, making use of phenomena exhibited by suitably-designed body and brain structures.
4 Biomimetics for AGI

Nature 'tirelessly experiments' in order to come up with workable designs for plants and animals - biological diversity in the natural world is quite astonishing indeed. Animal engineering and plant engineering have resulted in a plethora of materials and designs that make use of practically every phenomenon known to humans (and possibly others that humans have not discovered yet). Biomimetics is the practice of harvesting nature's efficient structures and designs for human purposes (there are plenty of examples of this in science, engineering and medicine). We would do well to mimic nature as a shortcut to coming up with good architectures for our AGI.

4.1 Existing Structure-based Designs

There is a small but growing body of work along the lines of what is presented in this paper. Subsumption architecture [22] is a robot design principle where hierarchical layers are used to effect control - higher-level layers (whose goals are more abstract) are fed inputs by lower-level ones (with concrete goals such as object avoidance). Neuromorphic computing is based on algorithmic approaches that emulate how biological brains operate (eg. via spiking neural networks that handle sequence-based outputs called spike trains). Developmental robotics is a relatively new area whose goal is to examine the architectures, constraints and principles that govern how an embodied agent could continue to acquire new skills on its own by continuously interacting with the environment. Reservoir Computing (RC) is a relatively lightweight algorithm, expressed as an Echo State Network (ESN) or a Liquid State Machine (LSM) - where the 'reservoir' is a hidden layer (in classic neural network terminology) with random, non-trainable weights. Physical reservoir computing (PRC) is an adaptation of RC meant for developing mechanical intelligence, eg. to control a robot constructed using trusses [23].
Braitenberg [24] designed a series of machines as thought experiments, where each machine is outfitted with relatively simple sensors that are inter-connected to create biologically-inspired circuits, producing complex, unexpected behaviors that might be presumed to have come from much more involved architectures. Here again, a takeaway would be that appropriately designed, simpler structures can achieve behavioral results similar to those from more complex computational architectures that compute the behavior - it is the structure that computes, rather than an explicit algorithm.

4.2 Design Principles

We conclude with a list of high-level principles (in no particular order) that could help with designing physical structures that give rise to AGI:
• ensure tight coupling between the body and brain - in other words, set up specific brain regions for handling specific inputs and outputs; consider 'embrainment' for AGI with complex functionality: rather than use standard chip-like hardware (including neuromorphic processors), it might be useful to design the brain portion of the AGI as a physical structure as well; this might help situate processing close to the sensing and actuating, and permit setting up connections between various brain regions (possibly even useful for consciousness explorations, on the premise that physically-based interactions between different brain areas might play a role in creating consciousness - eg. long-range interactions via brain waves)
• consider 'field computing' in the design of structures [25]
• consider analog hardware that has direct signal-based routes between computing elements, inputs and outputs, in addition to the usual digital architectures (which employ compiled instructions that reside in memory external to the processor)
• perform architecture search (eg. using genetic algorithms or reinforcement learning) for forms (structures) that enable desired functions (via phenomena) for locomotion, various -ceptions (eg. interoception), -tropisms (eg. heliotropism), sensing (eg. pressure, EM radiation, vibration), and for robotics control systems (eg. subsumption, central pattern generators (CPGs), inverse kinematics (IK) solvers, dynamic balancers etc.)
• reverse-engineer natural structures and their phenomena to obtain usable solutions for core body functions such as heat exchange, energy transformation cycles, structural design (eg. reinforcement strengthening of thin surfaces) etc.
• consider the 'Umwelt' (the specific perceptual experience/world of an agent, as popularized by the biologist Jakob Johann von Uexküll) of the AGI being designed, and deploy it in an environment suitable for maximizing its effectiveness; alternatively, design for an Umwelt that matches the environment in which the AGI will be deployed
• analyze the 'affordances' (made popular by James Gibson) of objects in the AGI's environment (properties that the objects offer the AGI), and ensure that the AGI can indeed respond to them (either by modifying the AGI's design suitably, or by modifying the objects to match how the AGI would utilize/respond to them)
• design allopoietic systems where we humans would provide growth (eg. by swapping out parts) and perform repairs; futuristic versions would be based on inorganic/organic materials that display self-organization, whereby autopoiesis would become possible
(among other capabilities, this would permit memory formation and modification, neural plasticity, etc.)
• design aspects of homeostasis, whereby the AGI would attempt self-restoration of equilibrium, self-repair, possess pain awareness (nociception), etc.
• consider non-anthropomorphic designs for eyes, limbs, bodies and internal organs, which might lead to enhanced or alternative ways of sensing, perceiving and locomoting, compared with humans
• consider deploying promising designs initially in a suitable VR environment [26], fixing design and other issues in VR, and subsequently building physical versions
• consider creating statistical ensembles of structures that function as aggregates, for purposes of fault tolerance, for use in environments with fluctuating or uneven conditions (eg. uneven temperatures, rough terrains etc.)
The above principles would help jump-start various forms of AGI, whose structure and capabilities would be based on our needs (in other words, would help us play Intelligent Designer); some forms, depending on their design, might even be able to adapt to changes in the environment during the course of their functioning, ie. ‘evolve’ on their own.
5 Conclusions

In this paper, we looked at biological intelligence as being exhibited by appropriately evolved structures at multiple spatial scales, which display phenomena that get manifested as intelligence. The takeaway from this view is that to build AGI, we would do well to design and build physical structures that would display the forms of embodied intelligence we seek; for complex behavior equivalent to what we see in higher animals, our designs would need to be not only for bodies ('embodiment') that match their environment, but also for suitably matched brains ('embrainment') coupled with the bodies. The premise is that such physical structure-based design would lead to robust and flexible behavior on the part of the AGI that is built using it.
References

1. Samsonovich, A.V.: Toward a unified catalog of implemented cognitive architectures. BICA 221, 195–244 (2010)
2. Hassabis, D., Kumaran, D., Summerfield, C., Botvinick, M.: Neuroscience-inspired artificial intelligence. Neuron 95, 245–258 (2017)
3. Yamakawa, H.: The whole brain architecture approach: accelerating the development of artificial general intelligence by referring to the brain. Neural Netw. 144, 478–495 (2021)
4. Hart, D., Goertzel, B.: OpenCog: a software framework for integrative artificial general intelligence. In: AGI, pp. 468–472. IOS Press, Amsterdam (2008)
5. Brooks, R.: Intelligence without representation. Artif. Intell. 47, 139–159 (1991)
6. Clark, A.: Can philosophy contribute to an understanding of artificial intelligence? http://undercurrentphilosophy.com/medium/can-philosophy-contribute-to-an-understanding-of-artificial-intelligence. Accessed 28 Dec 2021
7. Raghavachary, S.: Intelligence - consider this and respond! In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) BICA 2020. AISC, vol. 1310, pp. 400–409. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_48
8. Wolfe, M.F., Goldberg, R.: Rube Goldberg: Inventions! Simon & Schuster, New York (2000)
9. Maturana, H.R., Varela, F.J.: Autopoiesis and Cognition: The Realization of the Living. D. Reidel Publishing Company, Dordrecht (1980)
10. Levin, M., Dennett, D.C.: Cognition all the way down. https://aeon.co/essays/how-to-understand-cells-tissues-and-organisms-as-agents-with-agendas. Accessed 28 Dec 2021
11. Vogel, S.: Life's Devices: The Physical World of Animals and Plants. Princeton Paperbacks, Princeton (1988)
12. Tributsch, H.: How Life Learned to Live. The MIT Press, Cambridge (1982)
13. Griffin, D.R.: Animal Engineering: Readings from Scientific American. W. H. Freeman and Company, New York (1975)
14. How, J.J., et al.: Neural network features distinguish chemosensory stimuli in Caenorhabditis elegans. PLoS Comput. Biol. (2021). https://doi.org/10.1371/journal.pcbi.1009591
15. Burrows, M., Sutton, G.P.: Interacting gears synchronise propulsive leg movements in a jumping insect. Science 341, 1254–1256 (2013)
16. Huxley, H.E.: The mechanism of muscular contraction. Science 164(3886), 1356–1366 (1969)
17. Lyu, C., Abbott, L.F., Maimon, G.: Building an allocentric travelling direction signal via vector computation. Nature (2021). https://doi.org/10.1038/s41586-021-04067-0
18. Newell, A.: Physical symbol systems. Cogn. Sci. 4(2), 135–183 (1980)
19. Nilsson, N.: The Quest for Artificial Intelligence: A History of Ideas and Achievements. Cambridge University Press, Cambridge/New York (2010)
20. Brooks, R.: Cognition without computation. https://spectrum.ieee.org/computational-cognitive-science. Accessed 28 Dec 2021
21. Varela, F.J., Thompson, E., Rosch, E.: The Embodied Mind. The MIT Press, Cambridge (1991)
22. Brooks, R.A.: A robust layered control system for a mobile robot. IEEE J. Rob. Autom. 2, 14–23 (1986)
23. Bhovad, P., Li, S.: Physical reservoir computing with origami and its application to robotic crawling. Sci. Rep. 11, 13002 (2021). https://doi.org/10.1038/s41598-021-92257-1
24. Braitenberg, V.: Vehicles: Experiments in Synthetic Psychology. The MIT Press, Cambridge (1984)
25. MacLennan, B.J.: Field computation in natural and artificial intelligence. Inf. Sci. 119, 73–89 (1999)
26. Raghavachary, S., Lei, L.: A VR-based system and architecture for computational modeling of minds. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 417–425. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_55
One Possibility of a Neuro-Symbolic Integration

Alexei V. Samsonovich(B)

National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe Shosse 31, Moscow 115409, Russian Federation
[email protected]
Abstract. Deep learning (DL) technologies automate the labor of theoreticians, developers, and programmers that is otherwise needed to design knowledge representations and algorithms for each particular domain. At the same time, DL has limitations rooted in the statistical nature of the method, which largely ignores the available domain knowledge. An example of a DL-hard domain is human social-emotional behavior. In contrast, biologically inspired cognitive architectures (BICA) are based on the scientific knowledge accumulated in the cognitive and neuro-sciences, and therefore can be successful in generating believable socially-emotional behavior, although at great expense of intellectual human labor. The challenge is to integrate the two approaches, adding their strengths and compensating for their weaknesses. Here a particular form of such integration is proposed that also involves methods of evolutionary programming. The expected result could be a self-developing artificial socially-emotional intelligence, capable of growing autonomously to the human level and beyond. Potential future applications of the method can be expected to have a global impact on society.

Keywords: Deep learning · Cognitive architectures · Evolutionary algorithms · Neuro-symbolic integration · Socially-emotional intelligence
1 Introduction

Since the inception of Artificial Intelligence (AI) as a field [1], there have always been hopes that one day AI would achieve the human level of general intelligence and would be able to develop itself and grow cognitively without limit, similarly to mankind, but on a much shorter time scale [2]. With this process under control, humans would free themselves from hard intellectual labor, only steering the wheel and enjoying the ride. However, nearly seven decades of AI history show the opposite trend: more and more programmer and developer labor is needed for AI technologies to grow, and the process does not tend to become fully automated and autonomous. Moreover, new seemingly unbreakable limitations emerge over time. This is true about all currently pursued approaches in AI, including (a) traditional approaches, such as formal reasoning, planning, decision making, knowledge representation, semantic parsing, etc. [3], (b) cognitive approaches, such as cognitive modeling in general and Biologically Inspired Cognitive Architectures (BICA) in particular [4–6], and (c) statistical data-scientific approaches, the most promising representatives of which are deep neural networks [7, 8] and evolutionary algorithms
(EA) [9, 10]. They too have limitations, despite the impressive impacts of the deep learning (DL) revolution [11]. Indeed, nowadays DL technologies change our life significantly by automating the labor of theoreticians, developers, and programmers that would otherwise have to be done at great expense, including the design of the architecture, knowledge representations and algorithms for each new class of problems [7, 8, 11]. With DL, a data scientist may not even know how all these details are automatically generated in a neural network during training. It seems like the dream [1] is coming true, to the extent that the term "deep learning" has become a synonym for AI. But still, the capabilities of deep neural networks are limited, and these limitations are rooted in the nature of the method. DL is a statistical approach that is entirely data-driven and data-oriented. While (a) and (b) are based on the scientific and general human knowledge accumulated over centuries, (c) in general ignores this knowledge and treats any given domain as an abstract dataset. Then a standard device is used to automatically learn to perform one of several standard functions on the data, such as prediction, generation, classification, characterization, optimization, and the like. This is done automatically (which is a strength of the approach), but apart from, and in ignorance of, the human understanding of the nature of the phenomenon (which eventually becomes a weakness). As a result, large volumes of data are required that may not be available, and the approach does not work for many domains. For example, the problem of modeling human socially-emotional behavior appears difficult to solve with DL: difficulties emerge when the lack of understanding of the context of behavior becomes an obstacle [12, 13]. At the same time, this task presents one of the main challenges for modern AI, and many efforts today are devoted to its solution. There are principal differences between approaches (a), (b) and (c). Tools developed under (a) and (b) can be called "man-made" and "cognitive" at the same time, because they are created manually by theoreticians, developers and programmers, and are based on the available human knowledge accumulated using the scientific method. In contrast, tools developed under (c) are machine-made and statistical, in the sense that they are created automatically using machine learning procedures applied to available data. Both kinds of approaches have their weaknesses. While in principle almost any intellectual task can be machine-learned given sufficient amounts of data and computer power, in many domains these factors become the bottleneck. On the other hand, in the case of (a) or (b), the bottleneck is human labor. At the same time, (b) has recently been remarkably successful in modeling human social-emotional behavior [14–19]. Therefore, it can be argued that a challenge in AI today is to find a particular form of integration of (b) and (c), such that they strengthen each other while compensating for each other's weaknesses. This topic is very popular today, and many forms of neural-symbolic integration (that go beyond trivial hybrid schemes) have recently been explored that cannot be reviewed here in detail [20–27]. With all respect to these efforts, yet another idea for neural-symbolic integration is outlined in this short paper.
2 The Concept and Methods

The idea is to combine three major paradigms of AI: (1) BICA, (2) DL, and (3) EA, not by creating their hybrid, but by connecting them in one pipeline.
From their onset, BICA were intended to represent the quintessence of scientific knowledge about the highest known form of natural phenomena: conscious voluntary behavior and internal experiences of intelligent agents at the human level [28–30]. Therefore, BICA must be able to recognize and evaluate, as well as to predict and reproduce, human-specific behaviors across a wide range of domains, paradigms and situations. Today, the available scientific knowledge and computing powers make this possible. The difficulty here is that BICA need to be "hand-crafted", including the development and coding of everything. While many forms of machine learning may be available in BICA, there is still a long way to go before BICA will be able to grow from a child level to a human adult level by developing themselves autonomously. In contrast, DL is capable of creating an intelligent agent automatically, without human intervention. The user only needs to choose one of the standard models, including the network architecture, paradigm and learning rules, as well as data representation formats, hyperparameters, etc., and provide sufficient data for training along with an efficient way to preprocess, annotate and evaluate the data [8]. The situation with EA is similar: one just needs to choose an evolutionary model, including definitions of the genotype and the mechanisms of ontogeny, define the initial population of individuals, determine the fitness function or other selection criteria, introduce the mechanisms of mutation and recombination, etc. [9, 10]. Then the process continues automatically. In the case of the creation of a social agent, there are many gaps in the last two schemes, namely DL and EA. For example, data in the required large volumes, as well as its automated annotation and fitness ranking, may not be available. In addition, new data and a new training procedure would be required for each new case or paradigm. This is the barrier. The idea of the proposed solution is to integrate BICA, DL and EA: specifically, use BICA to fill in the gaps in DL, and then use BICA and DL together to implement EA. In this case, BICA directs DL and EA and provides a scaffolding for cognitive growth, while DL and EA automate the development of BICA. Details are shown in Fig. 1 and explained below.

2.1 The Roadmap

This section outlines the goal and a plan of work for a hypothetical research project that could bring the above ideas to life. By no means should it be considered the only possibility or the right way to do it: many details should become clear during the process.

Objective. The goal is to create a technological pipeline for the automated production of new types of social intelligent agents, implemented as deep neural networks. This pipeline can be called a Cognitive Conveyor. The plan is divided into several phases, each of which, in turn, is divided into several levels (the same levels for all phases), as explained below. These levels correspond to columns 1, 2, 3 in Fig. 1, representing BICA, DL, and EA, respectively.

Phase 0. Research is carried out theoretically and using computational experiments on abstract models, without human subjects involved in experiments. The semantics of model elements is limited to the values of their appraisals and otherwise remains unspecified.
Fig. 1. Top-level logic model of the proposed neuro-symbolic integration. Vertical columns (pairs of boxes) in the diagram represent, respectively, 1: BICA, 2: DL, 3: EA. For description of connections and loops, see text.
Phase 1. Research is conducted using "toy" paradigms involving human participation. The paradigms should require adequate socially-emotional behavior from participants, while immediate practical usefulness of the outcome is not a requirement. Examples of paradigms may include a virtual dance partner [31, 32], a virtual pet [33, 34], a virtual clownery [35, 36], a virtual composer assistant [37], a virtual listener [38], and the like.

Phase 2. Paradigms and agents of greater practical importance are developed. Examples of possible paradigms include: a virtual Pokémon-style multipurpose registrar or guide (can be used ubiquitously), an intelligent virtual or robotic tutor, a virtual presenter, a robotic toy, an NPC for videogames, a virtual performer, actor, artist or musician, a "soul" (or "spirit inside the machine") for electronic devices, appliances, vehicles, "smart" facilities, and many more.

Phases break into levels. The levels included in each phase are explained below. At this stage of analysis, they can be considered identical for all phases.

Level 0. Accumulating knowledge about the subject domain, choosing a paradigm, implementing a virtual environment, collecting data on human behavior in it, developing a BICA model together with the underlying data, and adapting it to the paradigm.

Level 1. Solution based on BICA (Fig. 1, column 1). Implementing a first-approximation solution in the form of agents based on the BICA model. Analytical and computational study of the solution. Experiments with human participants, data collection. Refinement, tuning and validation of the model. Generation of annotated synthetic behavior data using BICA that will be used for training neural networks.
Level 2. Solution based on DL. The first loop (Fig. 1, see also below) is implemented, which includes the transfer of functionality from BICA to the neural network with feedback, meaning that the BICA model is corrected based on the study of the transfer outcomes. Specific steps are listed below.
1. Choice of a neural network model, representation format and learning algorithm.
2. Transfer of functionality from BICA to a neural network by means of DL on the data generated by BICA, using the BICA evaluation capabilities during the training.
3. Testing and validation of the transfer results in experiments with human participants. Analysis of results, clarification of the role and helpfulness of specific features of BICA in goal achievement.
4. Manual correction of BICA parameters based on the analysis of experimental results.

Level 3. Solution based on EA. The loops described below (Fig. 1) are added sequentially to control and organize the evolution of the population of neural networks implementing social agents. Each loop on its own can be viewed as an evolutionary scheme.
• First loop. Transfer of functionality from BICA to the neural network with feedback (Fig. 1, columns 1 and 2, see above). No EA involvement.
• Second loop. Evolution or co-evolution of a population of trained neural networks (Fig. 1, columns 2, 3). Evolution occurs during social interaction of individuals in the selected paradigm. It becomes co-evolution when individuals of different kinds (e.g., performing different roles) are interacting. An individual (genotype) is a neural network trained on BICA-generated data (and, as a result, possessing the functionality of BICA). In the case of co-evolution, some of the interacting entities can be BICA agents. Mutations¹ are understood as DL in the process of (co-)evolution. Recombinations can be understood as one-to-one or two-to-one DL training, arranged among individuals temporarily set aside from the social interaction paradigm, resulting in a functionality transfer among individuals. The fitness function can be implemented using the evaluative capabilities of BICA, and eventually using a special sub-population of trained neural networks. Feedback consists in the modification of DL parameters based on the analysis of the outcome.
• Third loop. The genotype is the DL algorithm and its hyperparameters. The initial population of individuals consists of untrained neural networks. Ontogenesis involves DL based on BICA-generated data and then maturation during social interactions. Mutations and recombinations are understood as changes in the genotype. In this case the second loop plays the role of the individual's ontogenesis. Feedback consists in the addition to, and modification of, the database of behavior examples used for initial training, with the additional set of behavior examples used for training accumulated during co-evolution.
• Fourth loop. The genotype consists of the set of parameters of the cognitive architecture and the set of examples of behavior generated with its help. Mutations and recombinations are, as always, understood as changes in the genotype. The third loop in this case plays the role of ontogenesis. Feedback consists in the manual adaptation of BICA based on the analysis of the outcome.

¹ In Evolutionary Computation, a mutation is an operation that alters the genotype of an individual. If a genotype is a result of DL, then it can be altered by an additional DL process.

Level 4. This level can be formally added for completeness, to close all loops in Fig. 1. It consists in the experimental and theoretical study of the resulting machine-made intelligent social agents, their practical deployment and usage. The outcome of this level is a contribution to machine learning technology, and the feedback goes back to science.
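As an illustration of the second loop above, here is a minimal, hypothetical Python sketch; `Agent`, `interact`, `train_on`, `generate_behavior` and `bica_fitness` are placeholder names (not from any existing library), standing in for a network pre-trained on BICA-generated data, its behavior in the social paradigm, an additional DL pass, self-generated training data, and BICA's evaluative capability, respectively:

```python
import copy
import random

class Agent:
    """Placeholder for a neural network pre-trained on BICA-generated data."""
    def clone(self):
        return copy.deepcopy(self)
    def interact(self, others):
        return "behavior trace"           # behavior shown in the social paradigm
    def generate_behavior(self):
        return "synthetic behavior data"  # data for an additional DL pass
    def train_on(self, data):
        pass                              # stands for a DL training pass

def bica_fitness(behavior):
    return random.random()                # stands for BICA's evaluative capability

def second_loop(agents, generations=50):
    for _ in range(generations):
        # individuals interact in the paradigm and are fitness-ranked by BICA
        scores = [bica_fitness(a.interact(agents)) for a in agents]
        ranked = [a for _, a in sorted(zip(scores, agents), key=lambda p: -p[0])]
        survivors = ranked[:len(ranked) // 2]
        offspring = []
        for parent in survivors:
            child = parent.clone()
            if random.random() < 0.5:
                child.train_on(child.generate_behavior())   # mutation = extra DL
            else:
                donor = random.choice(survivors)
                child.train_on(donor.generate_behavior())   # recombination = one-to-one transfer
            offspring.append(child)
        agents = survivors + offspring
    return agents

population = second_loop([Agent() for _ in range(10)])
```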
3 Discussion

The key idea in this concept of integration is to use BICA and DL as sequential stages in the same technological pipeline for developing AI agents designated for a specific task that requires human-level social-emotional intelligence. The role of BICA is to bootstrap the process and to provide a scaffolding for DL, which results in the transfer of the human scientific and commonsense knowledge, compressed into BICA, to the neural network. Neural networks trained in this way should be able to continue their evolution during social interactions with each other, similarly to human evolution, but on a much shorter timescale. This evolutionary process will involve interactions with BICA agents and, directly or indirectly, with humans. Thus, the top level achievable based on this scheme is determined by the evolution of neural networks and can be significantly higher than the level determined by the initial DL using BICA-generated data. Evidence that the first loop is feasible and works comes from the recent work of the Trafton group [39]. There are other studies also supporting this statement (e.g., [36]). The proposed concept, however, goes deeper than just the idea of using BICA to train a neural network (which is a variety of the nowadays-popular model-based learning [40]). The main hope is for the EA implemented using DL in neural networks and BICA, as described above. This should be a synergistic unification, resulting in a whole that is much greater than its parts. At present, this bigger hope is only a hypothesis. Another aspect, or potential starting point here, is the question: how to make AI socially emotional at the human level? This question is practically important today, because we need intelligent agents who can understand human emotions and respond to them adequately, earn trust and empathy, establish emotional contact based on mutual understanding, and at the same time act as efficient assistants or partners in solving specific tasks. Recent attempts to solve this problem by DL from scratch lead only to superficial mimicry of human emotionality and run into a barrier: the lack of understanding of the context of social interaction [12, 13, 41]. At the same time, it is known that the cognitive approach can overcome this barrier [42]. Can we expect that DL placed on the shoulders of the cognitive approach will be able to take us much further? The efficiency of DL stands as a paradox. The number of layers in popular neural network models has grown exponentially over the past several years. The more layers, the higher the robustness of the solution, contrary to intuition. At the same time, the limitations of DL attract more and more attention today. These limitations are related to (1) the dependence of DL on the availability of large volumes of data, (2) the required large number of training cycles, (3) the knowledge transfer problem, (4) the limited size
of the input vector, (5) the logical reasoning problem, (6) the forgetting problem, (7) the sub-goaling problem, (8) the lack-of-common-sense problem, (9) the transparency, or explainability, problem, and so on. Many of these problems are finding their solutions today; others appear too hard to solve within DL alone. Among the hard problems is the one related to the inability of neural networks to understand human psychology. This situation urges data scientists to seek alternative solutions in the field of cognitive modeling that could augment pure machine learning solutions. Scaffolding the DL of neural networks with BICA within an EA is an ambitious proposal that needs domain-specific justification. A well-known example of a successful outcome of a simple evolutionary scheme involving DL is the strongest Go player, known as AlphaGo Zero. In this case, a deep convolutional neural network was trained over several days starting from scratch, without any prior knowledge about the game. The network was playing against itself and was trained during this process [43]. Interestingly, any usage of prior domain knowledge made the learning process less efficient. A similar approach has led to similar successes in other tasks, including the game of poker [44]. In these examples, however, the rules and objectives of the games are well defined, easy to compute, and readily available as a reward signal during DL. This is not the case in paradigms of human social interaction, where there are no formal criteria of "victory" or "defeat", and it is not clear who should be the judge and what rules should be used to judge behavior. A similar problem is known for the domain of artificial creativity, where a solution is to use another population of evolving neural networks as judges [45]. In examples like these, semantic maps of various nature (e.g., [46]) can be used for judgment, and their development is an important task. In fact, the task of social behavior evaluation stands next to the task of social behavior generation. They both can be solved with BICA [47], and both eventually should be solved using a DL-EA scheme. However, if this process starts from scratch, with zero domain knowledge, then there is no hope for neural networks to develop a faithful equivalent of human psychology. If the initial population knows nothing about human psychology and has no access to human or human-like behavioral data or its derivatives, then it is difficult to expect that the result of their evolution will be socially acceptable for a human. Therefore, some sort of scaffolding is required to direct the evolutionary process along the right path, and this is why BICA is needed and should become involved. Details of the scheme will be clarified and may change during implementation. E.g., multiple co-evolving populations can help to achieve a diversity of outcomes necessary for their compatibility with human intelligence. The idea of the need for cognitive models to complement DL is in the air today, primarily because the limitations of DL are becoming more and more evident. It can therefore be expected that cognitive models, especially BICA, describing human emotions and social behavior will be in high demand in the near future. However, not every cognitive model or BICA can do the job of providing the large amount and diversity of data required by DL to obtain an acceptable outcome.
Another functional requirement for BICA in this scheme is their ability to evaluate psychological characteristics and believability of observed behavior [47, 48], necessary to implement the fitness function. It is therefore important to pay attention to promising new general approaches in BICA development
that are capable of replicating higher cognitive and emotional abilities of the human mind [49], in other words, capable of solving the BICA Challenge [50]. Acknowledgments. The author is grateful to all Reviewers for their useful comments. This work was supported by the Ministry of Science and Higher Education of the Russian Federation, state assignment project No. 0723-2020-0036.
References

1. McCarthy, J., Minsky, M., Rochester, N., Shannon, C.: A proposal for the Dartmouth summer research project on artificial intelligence. AI Mag. 27(4), 12–14 (1955)
2. Vinge, V.: The coming technological singularity: how to survive in the post-human era. Whole Earth Review (1993)
3. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River (1995)
4. Gray, W.D. (ed.): Integrated Models of Cognitive Systems. Series on Cognitive Models and Architectures. Oxford University Press, Oxford (2007)
5. Laird, J.E.: The Soar Cognitive Architecture. MIT Press, Cambridge (2012)
6. Anderson, J.R.: How Can the Human Mind Occur in the Physical Universe? Oxford University Press, New York (2007)
7. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
8. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2016)
9. Holland, J.: Adaptation in Natural and Artificial Systems. MIT Press, Cambridge (1992). ISBN 978-0262581110
10. De Jong, K.A.: Evolutionary Computation: A Unified Approach. MIT Press, Cambridge (2016). ISBN 9780262529600
11. Sejnowski, T.J.: The Deep Learning Revolution. The MIT Press, Cambridge (2021)
12. Suresh, V., Ong, D.C.: Using knowledge-embedded attention to augment pre-trained language models for fine-grained emotion recognition. In: 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 1–8 (2021). https://doi.org/10.1109/ACII52823.2021.9597390
13. Casas, J., Spring, T., Daher, K., Mugellini, E., Khaled, O.A., Cudré-Mauroux, P.: Enhancing conversational agents with empathic abilities. In: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, pp. 41–47. Association for Computing Machinery, Inc. (2021). https://doi.org/10.1145/3472306.3478344
14. Marsella, S., Gratch, J., Petta, P.: Computational models of emotion. In: Scherer, K.R., Banziger, T., Roesch, E. (eds.) A Blueprint for Affective Computing: A Sourcebook and Manual. Oxford University Press, Oxford (2010)
15. Lucas, G.M., Gratch, J., King, A., Morency, L.-P.: It's only a computer: virtual humans increase willingness to disclose. Comput. Hum. Behav. 37, 94–100 (2014). https://doi.org/10.1016/j.chb.2014.04.043
16. Lieto, A.: Cognitive Design for Artificial Minds, p. 152. Taylor & Francis, UK (2021). ISBN 9781315460536
17. Rodriguez, L.-F., Ramos, F.: Development of computational models of emotions for autonomous agents: a review. Cogn. Comput. 6(3), 351–375 (2014)
18. Gratch, J., Marsella, S.: A domain-independent framework for modeling emotion. Cogn. Syst. Res. 5(4), 269–306 (2004)
19. Gratch, J., Wang, N., Gerten, J., Fast, E., Duffy, R.: Creating rapport with virtual agents. In: Proceedings of the Seventh International Conference on Intelligent Virtual Agents, pp. 125–138 (2007)
20. Goertzel, B., Suárez-Madrigal, A., Gino, Y.: Guiding symbolic natural language grammar induction via transformer-based sequence probabilities. In: Goertzel, B., Panov, A.I., Potapov, A., Yampolskiy, R. (eds.) AGI 2020. LNCS (LNAI), vol. 12177, pp. 153–163. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52152-3_16
21. Garcez, A.D., Gori, M., Lamb, L.C., Serafini, L., Spranger, M., Tran, S.N.: Neural-symbolic computing: an effective methodology for principled integration of machine learning and reasoning. J. Appl. Logics 6(4), 611–631 (2019)
22. Besold, T.R., Kühnberger, K.-U.: Towards integrated neural-symbolic systems for human-level AI: two research programs helping to bridge the gaps. Biol. Insp. Cogn. Arch. 14, 97–110 (2015)
23. Goertzel, B.: Perception processing for general intelligence: bridging the symbolic/subsymbolic gap. In: Bach, J., Goertzel, B., Iklé, M. (eds.) AGI 2012. LNCS (LNAI), vol. 7716, pp. 79–88. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35506-6_9
24. Riley, H., Sridharan, M.: Integrating non-monotonic logical reasoning and inductive learning with deep learning for explainable visual question answering. Front. Rob. AI 6, art. no. 125 (2019)
25. Kovalev, A.K., Shaban, M., Osipov, E., Panov, A.I.: Vector semiotic model for visual question answering. Cogn. Syst. Res. 71, 52–63 (2022). https://doi.org/10.1016/j.cogsys.2021.09.001
26. Skrynnik, A., Staroverov, A., Aitygulov, E., Aksenov, K., Davydov, V., Panov, A.I.: Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations. Knowl.-Based Syst. 218, 106844 (2021). https://doi.org/10.1016/j.knosys.2021.106844
27. Thomsen, K.: The Ouroboros model: proposal for self-organizing general cognition substantiated. AI 2, 89–105 (2021). https://doi.org/10.3390/ai2010007
28. Newell, A.: Unified Theories of Cognition. Harvard University Press, Harvard (1990)
29. Anderson, J.R., Lebiere, C.: The Atomic Components of Thought. Lawrence Erlbaum Associates, Mahwah (1998)
30. Sun, R.: Anatomy of the Mind: Exploring Psychological Mechanisms and Processes with the Clarion Cognitive Architecture. Oxford University Press, Oxford (2016)
31. Krylov, D.I., Samsonovich, A.V.: Designing an emotionally-intelligent assistant of a virtual dance creator. In: Samsonovich, A.V. (ed.) BICA 2018. AISC, vol. 848, pp. 197–202. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-99316-4_26
32. Karabelnikova, Y., Samsonovich, A.V.: Virtual partner dance as a paradigm for empirical study of cognitive models of emotional intelligence. Procedia Comput. Sci. 190, 414–433 (2021). https://doi.org/10.1016/j.procs.2021.06.05
33. Bogatyreva, A.A., Sovkov, A.D., Tikhomirova, S.A., Vinogradova, A.R., Samsonovich, A.V.: Virtual pet powered by a socially-emotional BICA. Procedia Comput. Sci. 145, 564–571 (2018)
34. Tsarkov, V.S., Enikeev, V.A., Samsonovich, A.V.: Toward a socially acceptable model of emotional artificial intelligence. Procedia Comput. Sci. 190, 771–788 (2021). https://doi.org/10.1016/j.procs.2021.06.090
35. Samsonovich, A.V., Dodonov, A.D., Klychkov, M.D., Budanitsky, A.V., Grishin, I.A., Anisimova, A.S.: A virtual clown behavior model based on emotional biologically inspired cognitive architecture. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y., Klimov, V.V.
(eds.) NEUROINFORMATICS 2021. SCI, vol. 1008, pp. 99–108. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-91581-0_14
36. Samsonovich, A.V.: A virtual actor behavior model based on emotional biologically inspired cognitive architecture. In: Goertzel, B., Iklé, M., Potapov, A. (eds.) Artificial General Intelligence: 14th International Conference, AGI 2021, Palo Alto, CA, USA, October 15–18, 2021, Proceedings, pp. 221–227. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-030-93758-4_23
37. Voznenko, T.I., Samsonovich, A.V., Gridnev, A.A., Petrova, A.I.: The principle of implementing an assistant composer. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds.) NEUROINFORMATICS 2018. SCI, vol. 799, pp. 300–304. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01328-8_36
38. Eidlin, A.A., Chubarov, A.A., Samsonovich, A.V.: Virtual listener: emotionally-intelligent assistant based on a cognitive architecture. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 73–82. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_10
39. Trafton, J.G., Hiatt, L.M., Brumback, B., McCurry, J.M.: Using cognitive models to train big data models with small data. In: An, B., Yorke-Smith, N., El Fallah Seghrouchni, A., Sukthankar, G. (eds.) Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020), pp. 1413–1421. International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC (2020)
40. Sense, F., et al.: Cognition-enhanced machine learning for better predictions with limited data. Topics Cogn. Sci., 1–17 (2021). https://doi.org/10.1111/tops.12574
41. Ong, D.C., et al.: Modeling emotion in complex stories: the Stanford Emotional Narratives Dataset. IEEE Trans. Affect. Comput. 12(3), 579–594 (2021). https://doi.org/10.1109/Taffc.2019.2955949
42. Marsella, S., Gratch, J.: EMA: a process model of appraisal dynamics. Cogn. Syst. Res. 10, 70–90 (2009). https://doi.org/10.1016/j.cogsys.2008.03.005
43. Silver, D., et al.: Mastering the game of Go without human knowledge. Nature 550(7676), 354 (2017)
44. Moravcik, M., et al.: DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337), 508 (2017)
45. Macret, M., Pasquier, P.: Automatic design of sound synthesizers as pure data patches using coevolutionary mixed-typed cartesian genetic programming. In: GECCO 2014: Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 309–316. Association for Computing Machinery (2014). https://doi.org/10.1145/2576768.2598303
46. Samsonovich, A.V., Ascoli, G.A.: Augmenting weak semantic cognitive maps with an "abstractness" dimension. Comput. Intell. Neurosci. 2013, 308176 (2013). https://doi.org/10.1155/2013/308176
47. Tikhomirova, D., Zavrajnova, M., Rodkina, E., Musayeva, Y., Samsonovich, A.: Psychological portrait of a virtual agent in the teleport game paradigm. In: Goertzel, B., Panov, A.I., Potapov, A., Yampolskiy, R. (eds.) AGI 2020. LNCS (LNAI), vol. 12177, pp. 327–336. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52152-3_35
48. Samsonovich, A.V.: Believable character reasoning and a measure of self-confidence for autonomous team actors. In: Nisar, A., Cummings, M., Miller, C. (eds.) Self-Confidence in Autonomous Systems: Papers from the AAAI Fall Symposium. AAAI Technical Report FS-15-05, p. 5. AAAI Press, Palo Alto (2015). ISBN 978-1-57735-751-3
49. Samsonovich, A.V.: Socially emotional brain-inspired cognitive architecture framework for artificial intelligence. Cogn. Syst. Res. 60, 57–76 (2020).
https://doi.org/10.1016/j.cogsys.2019.12.002
50. Samsonovich, A.V.: On a roadmap for the BICA challenge. Biol. Insp. Cogn. Arch. 1, 100–107 (2012)
A Comparison of Two Variants of Memristive Plasticity for Solving the Classification Problem of Handwritten Digits Recognition

Alexander Sboev1,2(B), Yury Davydov1, Roman Rybka1, Danila Vlasov1, and Alexey Serenko1
1 National Research Centre "Kurchatov Institute", Moscow, Russia
2 National Research Nuclear University MEPhI, Moscow, Russia
Abstract. Nowadays, the task of creating and training spiking neural networks (SNNs) is extremely relevant due to the high energy efficiency achieved by implementing such networks via neuromorphic hardware. Especially interesting is the possibility of building SNNs based on memristors, which have properties that potentially allow them to be used as analog synapses. With that in mind, it seems relevant to study spiking networks built upon plasticity rules that correspond to the experimentally observed nonlinear laws of conductivity change in memristors. Earlier it was shown that spiking neural networks trained with a biologically inspired local STDP (Spike-Timing-Dependent Plasticity) rule are capable of successfully solving classification problems. In addition, it was also demonstrated that classification problems can be solved with spiking neural networks operating with a plasticity rule that models the change in conductivity in nanocomposite (NC) memristors. This paper presents a continuation of the study of the applicability of memristive plasticity rules to the handwritten digit recognition problem. Two types of memristive plasticity are compared: for nanocomposite and PPX memristors. It is shown that both models can successfully solve the classification problem, and the key differences between them are identified.

Keywords: Spiking neural networks · Spike-timing-dependent plasticity · Memristors · Classification · Machine learning
1 Introduction
Spiking neural networks (SNNs) [10,21] are interesting from the theoretical point of view because the laws by which they are governed are close to those according to which real biological neurons function. The fascinating efficiency and the ability to solve problems of incredible complexity inherent in the brain's neural structures make the task of building artificial neural networks functioning according to the same laws extremely promising.
From the practical point of view, SNNs are interesting because their implementations using neuromorphic devices have extremely high energy efficiency [2,6]. Implementations in which both neurons and synapses are realized as analog devices may be even more energy efficient [15]. In particular, the use of memristors in this kind of analog implementation looks promising [1,18]. On this basis, it seems relevant to develop spiking neural networks whose local learning mechanism (synaptic plasticity) is based on the experimentally observed laws of conductivity change in memristors. So far, many models of memristive plasticity have been experimentally obtained, in which the change in the conductance of a memristor depends on its current conductance and the time difference between pre- and postsynaptic spikes [7,8,12,17,19]. In this sense, memristive plasticity rules are similar to the classical weight-dependent STDP rule [16], but have a much higher degree of nonlinearity. Earlier it was shown that spiking neural networks with a plasticity model approximating the law of plasticity change in nanocomposite (NC) memristors (CoFeB)x(LiNbO3)1−x successfully solve the problem of classifying handwritten digits on the MNIST corpus [3]. In addition, the laws of conductivity change in highly plastic poly-p-xylylene (PPX) memristors have been studied [9], which makes the modeling of plasticity for this type of memristor, as well as its comparison to NC plasticity, a relevant task. This paper studies the applicability of two types of memristive synaptic plasticity (NC and PPX) to the handwritten digit classification problem on the Digits corpus provided in the Scikit-Learn [11] library. Unlike the MNIST corpus, the Digits dataset is an order of magnitude smaller in both image size and number of examples. The latter circumstance in particular makes this corpus suitable for the current study: by their nature, STDP-like learning mechanisms are unsupervised, and their efficiency often strongly depends on the number of training examples. Consequently, a successful solution to the classification problem on a small-sized corpus may indicate a wider range of problems that can be solved in the future. The topology of the network used in this problem corresponds to that presented in [4]. The reason for this choice is the proven effectiveness of this network for image recognition problems, which makes it possible to evaluate the effectiveness of the proposed mechanisms of synaptic plasticity more objectively. For the same reason, this paper uses classical frequency coding with Poisson spike sequences, which has been used repeatedly in other works devoted to the study of memristive plasticity rules [3,13,20,22].
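For concreteness, here is a minimal sketch of loading the Digits corpus and rate-coding one image into Poisson spike trains; the `intensity` value below is illustrative only, not the tuned hyperparameter reported in Sect. 2.3:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()               # 1797 images of 8x8 = 64 pixels, values 0..16
X = digits.data / digits.data.max()  # normalize pixel values to [0, 1]

def poisson_spike_train(x, intensity=64.0, time_ms=300, dt_ms=1.0):
    """Encode one 64-pixel image as Poisson spike trains:
    the mean firing rate of input neuron i is proportional to pixel i."""
    rates_hz = x * intensity                        # firing rate per input neuron
    p_spike = rates_hz * (dt_ms / 1000.0)           # spike probability per time step
    steps = int(time_ms / dt_ms)
    return np.random.rand(steps, x.size) < p_spike  # boolean raster (steps x 64)

raster = poisson_spike_train(X[0])
print(raster.shape, raster.sum(), "spikes for digit", digits.target[0])
```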
2 Materials and Methods

2.1 Nanocomposite Memristor Plasticity
The dependence of the synaptic conductance change Δw on the current value of the conductance w and on the time difference Δt between the presynaptic and postsynaptic spikes was proposed earlier for nanocomposite memristors [3]:
Fig. 1. The dependence of the change in synaptic conductance on the interval Δt between a presynaptic spike and a postsynaptic spike, A: for the nanocomposite memristive plasticity [3]; B: for the PPX memristive plasticity. Different curves correspond to different initial values of synaptic conductance.
$$\Delta w(\Delta t) = \begin{cases} A^{+} \cdot w \cdot \left(1 + \tanh\left(-\frac{\Delta t - \mu^{+}}{\tau^{+}}\right)\right) & \text{if } \Delta t > 0;\\[4pt] A^{-} \cdot w \cdot \left(1 + \tanh\left(\frac{\Delta t - \mu^{-}}{\tau^{-}}\right)\right) & \text{if } \Delta t < 0, \end{cases} \tag{1}$$
where A+ = 0.074, A− = −0.047, μ+ = 26.7 ms, μ− = −22.3 ms, τ+ = 9.3 ms, τ− = 10.8 ms.
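For illustration only, Eq. (1) translates directly into a plain Python function. This is a sketch, not the authors' published code; it assumes the convention Δt = t_post − t_pre, consistent with Fig. 1.

```python
import numpy as np

# Constants of Eq. (1) for the nanocomposite memristors [3]
A_PLUS, A_MINUS = 0.074, -0.047
MU_PLUS, MU_MINUS = 26.7, -22.3   # ms
TAU_PLUS, TAU_MINUS = 9.3, 10.8   # ms

def dw_nc(dt: float, w: float) -> float:
    """Conductance change of an NC memristive synapse for a spike pair
    separated by dt = t_post - t_pre (ms), given the current weight w."""
    if dt > 0:   # pre-before-post: potentiation branch
        return A_PLUS * w * (1.0 + np.tanh(-(dt - MU_PLUS) / TAU_PLUS))
    if dt < 0:   # post-before-pre: depression branch
        return A_MINUS * w * (1.0 + np.tanh((dt - MU_MINUS) / TAU_MINUS))
    return 0.0
```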
2.2 Model of Poly-p-xylylene Memristors
In PPX memristors, where resistive switching is driven by an electrochemical metallization mechanism [9], the timing and initial-weight dependence curves (Fig. 1 B) differ significantly from those for NC memristors (Fig. 1 A). The plasticity of PPX memristors is modelled here by fitting experimental measurements [9] with the following function:

$$\Delta w(\Delta t) = \begin{cases} \alpha^{+}\, e^{-\beta^{+} \frac{w_{\max}-w}{w_{\max}-w_{\min}}} \cdot \frac{|\Delta t|}{\tau}\, e^{-\gamma^{+}\left(\frac{\Delta t}{\tau}\right)^{2}} & \text{if } \Delta t > 0;\\[4pt] -\alpha^{-}\, e^{-\beta^{-} \frac{w-w_{\min}}{w_{\max}-w_{\min}}} \cdot \frac{|\Delta t|}{\tau}\, e^{-\gamma^{-}\left(\frac{\Delta t}{\tau}\right)^{2}} & \text{if } \Delta t < 0. \end{cases} \tag{2}$$

Here τ = 10 ms, α+ = 0.316, α− = 0.011, β+ = 2.213, β− = −5.969, γ+ = 0.032, γ− = 0.146, wmax = 1, wmin = 0. The observed exponential dependence of the change in synaptic conductance on the initial conductance value is similar to what has been applied in other studies [14]. In contrast to the NC-plasticity, for the PPX-plasticity there is a significant asymmetry associated with the initial value of the synaptic weight w: if w is small, the positive branch of the curve dominates over the negative one. If w > 0.6, however, the contribution of the negative branch increases significantly. In other words, the greater the value of the synaptic weight, the more it decreases when the pre-spike arrives after the post-spike, and vice versa.
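Eq. (2) can be sketched the same way (again an illustration under the same Δt convention, not the authors' implementation):

```python
import numpy as np

# Constants of Eq. (2) for the PPX memristors, fitted to measurements in [9]
TAU = 10.0                          # ms
ALPHA_P, ALPHA_M = 0.316, 0.011
BETA_P, BETA_M = 2.213, -5.969
GAMMA_P, GAMMA_M = 0.032, 0.146
W_MAX, W_MIN = 1.0, 0.0

def dw_ppx(dt: float, w: float) -> float:
    """Conductance change of a PPX memristive synapse for a spike pair
    separated by dt = t_post - t_pre (ms), given the current weight w."""
    x = abs(dt) / TAU
    if dt > 0:
        return (ALPHA_P * np.exp(-BETA_P * (W_MAX - w) / (W_MAX - W_MIN))
                * x * np.exp(-GAMMA_P * x ** 2))
    if dt < 0:
        return (-ALPHA_M * np.exp(-BETA_M * (w - W_MIN) / (W_MAX - W_MIN))
                * x * np.exp(-GAMMA_M * x ** 2))
    return 0.0
```

Note that with β− = −5.969 the depression magnitude grows with w, which reproduces the asymmetry described above.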
2.3 Network Model
The network topology used to solve the problem here corresponds in general to that given in [4]. The network consists of two layers: the first converts the input data into Poisson spike sequences whose mean rates depend on the corresponding input vector components (multiplied by the intensity, which is a hyperparameter), and the second layer performs their processing. The same number of neurons was used for both types of plasticity: 64 input neurons (by the number of pixels in each image) and 1600 output neurons. This choice of the output layer size strikes a balance between the recognition quality and the computational speed of the network. The neurons are Leaky Integrate-and-Fire with an adaptive threshold, and with the postsynaptic current obeying the synaptic conductance model [4]. The only modification made to the network topology in the implementation used here is the absence of a separate population of inhibitory neurons: instead, the excitatory neurons are linked by additional untrainable inhibitory connections in an all-to-all manner (with no self-connections). This modification does not affect the operation principle of the network. The simulations were performed using the open-source BindsNET [5] library. The hyperparameters of the network differed for the NC and PPX plasticities and were chosen as follows:

– NC: time = 300 ms, dt = 1 ms, intensity = 3.98, inh = 17.5, rest = −65, reset = −65, thresh = −42.08, refrac = 3.4, tc_decay = 109.74, tc_trace = 10, norm = 36.76;
– PPX: time = 15 ms, dt = 0.05 ms, intensity = 82.69, inh = 15.2, rest = −65, reset = −65, thresh = −28.83, refrac = 5.78, tc_decay = 120.29, tc_trace = 10, norm = 62.14.

It is important to note that the NC and PPX plasticities operate on different time scales: for the PPX plasticity, the sample exposure time must be significantly shorter, and the frequency of the Poisson sequences is much higher. This fact can have a significant impact on hardware implementations of spiking networks based on PPX memristors.
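For orientation, the first layer's frequency coding can be sketched in plain NumPy as a Bernoulli approximation of a Poisson process. This is a simplified illustration, not the BindsNET implementation; the exact mapping of intensity to firing rate is our assumption.

```python
import numpy as np

def poisson_encode(image, time_ms, dt_ms, intensity, rng=None):
    """Turn an 8x8 image into a boolean spike raster of shape (steps, 64):
    each pixel drives one input channel at a rate ~ intensity * pixel value."""
    rng = rng or np.random.default_rng()
    rates_hz = intensity * image.ravel()
    p_spike = rates_hz * dt_ms / 1000.0      # spike probability per time step
    steps = int(time_ms / dt_ms)
    return rng.random((steps, p_spike.size)) < p_spike
```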
2.4 Output Decoding
Three different methods of obtaining the class of an input example based on the neurons' spiking rates are compared. All three methods involve recording the output spiking rates of a trained network in response to both the training set and the testing set, and then classifying the testing-set samples by comparing their output spiking rates to the recorded training-set rates.

– voting: in this decoding method, during the training process the output neurons of the network are iteratively assigned the labels of the classes for which their average activity is maximal. Prediction is then performed by the neurons voting with their spikes: for instance, class “1” is predicted for
a test sample if the average activity of the neurons labeled with class “1” is the highest among all neurons.
– proportion: the method is almost identical to the previous one, but instead of plain averaging, weighting is performed based on the representativeness of the classes in the dataset.
– logistic regression: in this decoding method, a LogisticRegression classifier from the Scikit-Learn [11] library is trained on the output spiking rates recorded when presenting the training set to the spiking network. During the testing phase, the output spiking rates in response to the test examples are fed into the trained classifier, which predicts their class labels (a minimal sketch follows this list).
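The logistic-regression decoding, in particular, reduces to a few lines of Scikit-Learn; the rate matrices and label array below are placeholder variables:

```python
from sklearn.linear_model import LogisticRegression

# rates_train, rates_test: output spike counts of the trained SNN,
# shape (n_samples, 1600); y_train: labels of the training examples.
decoder = LogisticRegression(max_iter=1000)
decoder.fit(rates_train, y_train)
y_pred = decoder.predict(rates_test)
```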
3 Experiments
The performance of an SNN with each of the plasticity models is tested on the benchmark problem of handwritten digit classification on the Digits dataset available in the Scikit-Learn [11] library. The dataset consists of 1797 gray-scale images of handwritten digits 0 to 9, with roughly equal numbers of examples in all 10 classes. The size of each image is 8 × 8 pixels, and the intensity of each pixel is encoded by an integer between 0 and 16. Classification performance is estimated using 5-fold cross-validation with stratified splitting. The training is performed for 3 epochs, with the statistics for training the regression model recorded only at the last training epoch. As the performance metric we use the F1-score averaged over all classes (F1-macro), due to the almost identical representativeness of the classes in the corpus.
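The evaluation protocol itself is standard; a sketch of the cross-validation loop with Scikit-Learn (the SNN training step is abstracted away, and y_pred is a placeholder for the network's decoded predictions):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import f1_score

X, y = load_digits(return_X_y=True)   # 1797 images, 8x8, intensities 0..16
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in skf.split(X, y):
    # ... train the SNN for 3 epochs on X[train_idx], decode X[test_idx] ...
    y_pred = y[test_idx]               # placeholder for the SNN's predictions
    scores.append(f1_score(y[test_idx], y_pred, average="macro"))
```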
4 Results
Table 1 shows the performance of the spiking neural network described in Sect. 2.3 with NC and PPX plasticity. Both memristive plasticity rules solve the proposed handwritten digit classification problem equally successfully. Of the three decoding methods used, logistic regression is the most efficient, providing a gain in accuracy of about 5% over the voting and proportion methods.
5 Discussion
In order to examine the structure of the synaptic weights established after learning with each of the plasticity types, the weights are depicted in Fig. 2, averaged for each class over all neurons assigned to that class. As expected, both plasticities lead to synaptic weights that resemble the average shape of each class of examples present in the training set. The representation to which the NC-plasticity converges is more “fuzzy” (i.e., the shape of the resulting digits corresponds less well to the real images presented in the bottom row), but it also has higher contrast than the corresponding representation for the PPX-plasticity.
Table 1. F1-macro (%) of handwritten digit recognition by the spiking neural network trained with PPX or NC plasticity, with different decoding methods. Minimum, maximum and mean values are presented over the cross-validation folds.

Plasticity  Decoding             Min  Max  Mean
PPX         Voting               77   82   80
PPX         Proportion           79   82   80
PPX         Logistic regression  83   88   86
NC          Voting               78   85   81
NC          Proportion           80   84   82
NC          Logistic regression  84   86   85
Fig. 2. Top and middle rows: average synaptic weights of neurons corresponding to each of the classes after training the network, for the PPX-trained network and the NC-trained one respectively. Bottom row: samples of input images of each class, averaged over the training set.
Confusion matrices for the class labels obtained using each of the decoding methods (Fig. 3) show that both plasticities are prone to the errors in handwritten digit recognition typical for most neural network algorithms: the network trained with PPX plasticity makes more errors in classes 3 and 9, while the NC-trained one makes errors in classes 8 and 9.
Fig. 3. Confusion matrices obtained after testing the network. The topmost row corresponds to the PPX-trained network, the bottom-most one – to the NC-trained one. Each row contains 3 matrices corresponding to the voting, proportion and log.reg methods of decoding the output. The color of a matrix element in row i and column j depicts the number of input images having the true class i and the predicted class j.
6 Conclusion

In this paper, two types of memristive plasticity rules (PPX and NC) were compared on the task of handwritten digit recognition on the Digits corpus. Experiments show that both plasticity rules solve the task equally successfully: the F1-macro accuracies coincide within error limits for all three decoding methods considered (voting, proportion and logistic regression) and are 81, 81 and 86% (±2%), respectively. Thus, logistic regression yields the best results: the gain for both plasticities is about 5%. Nevertheless, despite their almost identical accuracies, when analyzed in more detail the considered types of memristive plasticity turn out to be significantly different:

– the operation time (exposure time of one example) and the time step for the PPX-plasticity are an order of magnitude smaller than for the NC-plasticity: this fact can be essential when building physical models of spiking networks based on PPX memristors;
– the internal representations of the training data accumulated by the networks trained with the two plasticities also differ: the NC-plasticity is characterized by a fuzzier but higher-contrast representation, while the PPX-plasticity provides a distribution closer to that actually present in the training data, but with less contrast (i.e., the amplitude of the weights is on average lower than for NC).
Acknowledgements. This work has been supported by the Russian Science Foundation grant No.21-11-00328 and has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.
References

1. Camuñas-Mesa, L.A., Linares-Barranco, B., Serrano-Gotarredona, T.: Neuromorphic spiking neural networks and their memristor-CMOS hardware implementations. Materials 12 (2019). https://doi.org/10.3390/ma12172745
2. Davies, M., et al.: Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38(1), 82–99 (2018). https://doi.org/10.1109/MM.2018.112130359
3. Demin, V., et al.: Necessary conditions for STDP-based pattern recognition learning in a memristive spiking neural network. Neural Networks 134, 64–75 (2021)
4. Diehl, P.U., Cook, M.: Unsupervised learning of digit recognition using Spike-Timing-Dependent Plasticity. Front. Comput. Neurosci. (2015). https://doi.org/10.3389/fncom.2015.00099
5. Hazan, H., Saunders, D.J., Khan, H., Patel, D., Sanghavi, D.T., Siegelmann, H.T., Kozma, R.: BindsNET: a machine learning-oriented spiking neural networks library in Python. Front. Neuroinform. 12 (2018). https://doi.org/10.3389/fninf.2018.00089
6. Indiveri, G., Corradi, F., Qiao, N.: Neuromorphic architectures for spiking deep neural networks. In: 2015 IEEE International Electron Devices Meeting, pp. 4.2.1–4.2.4 (2016). https://doi.org/10.1109/IEDM.2015.7409623
7. Ismail, M., Chand, U., Mahata, C., Nebhen, J., Kim, S.: Demonstration of synaptic and resistive switching characteristics in W/TiO2/HfO2/TaN memristor crossbar array for bioinspired neuromorphic computing. J. Mater. Sci. Technol. 96, 94–102 (2022). https://doi.org/10.1016/j.jmst.2021.04.025
8. Lapkin, D.A., Emelyanov, A.V., Demin, V.A., Berzina, T.S., Erokhin, V.V.: Spike-timing-dependent plasticity of polyaniline-based memristive element. Microelectron. Eng. 185–186, 43–47 (2018). https://doi.org/10.1016/j.mee.2017.10.017
9. Minnekhanov, A.A., et al.: On the resistive switching mechanism of parylene-based memristive devices. Org. Electron. 74, 89–95 (2019). https://doi.org/10.1016/j.orgel.2019.06.052
10. Paugam-Moisy, H., Bohte, S.M.: Computing with spiking neuron networks. In: Rozenberg, G., Back, T., Kok, J. (eds.) Handbook of Natural Computing, pp. 335–376. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-540-92910-9_10
11. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
12. Prudnikov, N.V., et al.: Associative STDP-like learning of neuromorphic circuits based on polyaniline memristive microdevices. J. Phys. D: Appl. Phys. 53(41), 414001 (2020). https://doi.org/10.1088/1361-6463/ab9262
13. Qu, L., Zhao, Z., Wang, L., Wang, Y.: Efficient and hardware-friendly methods to implement competitive learning for spiking neural networks. Neural Comput. Appl. 32(17), 13479–13490 (2020). https://doi.org/10.1007/s00521-020-04755-4
14. Querlioz, D., Dollfus, P., Bichler, O., Gamrat, C.: Learning with memristive devices: how should we model their behavior? In: 2011 IEEE/ACM International Symposium on Nanoscale Architectures, pp. 150–156 (2011). https://doi.org/10.1109/NANOARCH.2011.5941497
15. Rajendran, B., Sebastian, A., Schmuker, M., Srinivasa, N., Eleftheriou, E.: Low-power neuromorphic hardware for signal processing applications: a review of architectural and system-level design approaches. IEEE Signal Process. Mag. 36(6), 97–110 (2019). https://doi.org/10.1109/MSP.2019.2933719
16. van Rossum, M.C.W., Bi, G.Q., Turrigiano, G.G.: Stable Hebbian learning from spike timing-dependent plasticity. J. Neurosci. 20(23), 8812–8821 (2000). http://www.jneurosci.org/content/20/23/8812.long
17. Ryu, J.H., Mahata, C., Kim, S.: Long-term and short-term plasticity of Ta2O5/HfO2 memristor for hardware neuromorphic application. J. Alloys Compounds 850, 156675 (2021). https://doi.org/10.1016/j.jallcom.2020.156675
18. Saïghi, S., Mayr, C.G., Serrano-Gotarredona, T., Schmidt, H., Lecerf, G., Tomas, J., Grollier, J., Boyn, S., Vincent, A.F., Querlioz, D., La Barbera, S., Alibart, F., Vuillaume, D., Bichler, O., Gamrat, C., Linares-Barranco, B.: Plasticity in memristive devices for spiking neural networks. Front. Neurosci. 9, 51 (2015). https://doi.org/10.3389/fnins.2015.00051
19. Sboev, A.G., et al.: Self-adaptive STDP-based learning of a spiking neuron with nanocomposite memristive weights. Nanotechnology 31(4), 045201:1–10 (2019). https://doi.org/10.1088/1361-6528/ab4a6d
20. Serrano-Gotarredona, T., Masquelier, T., Prodromakis, T., Indiveri, G., Linares-Barranco, B.: STDP and STDP variations with memristors for spiking neuromorphic learning systems. Front. Neurosci. 7, 2 (2013)
21. Taherkhani, A., Belatreche, A., Li, Y., Cosma, G., Maguire, L.P., McGinnity, T.: A review of learning in biologically plausible spiking neural networks. Neural Networks 122, 253–272 (2020). https://doi.org/10.1016/j.neunet.2019.09.036
22. Wang, Z., et al.: Fully memristive neural networks for pattern classification with unsupervised learning. Nature Electron. 1(2), 137–145 (2018). https://doi.org/10.1038/s41928-018-0023-2
Sentiment Analysis of Russian Reviews to Estimate the Usefulness of Drugs Using the Domain-Specific XLM-RoBERTa Model

Alexander Sboev¹,², Aleksandr Naumov¹(B), Ivan Moloshnikov¹, and Roman Rybka¹

¹ National Research Centre “Kurchatov Institute”, Moscow, Russia
² National Research Nuclear University MEPhI, Moscow, Russia
Abstract. This paper considers the problem of classifying Russian-language drug reviews into five classes (corresponding to the authors' rating scores) and into two classes (the drug is helpful or useless), posed as a sentiment analysis task. A dataset of reviews with a markup of pharmaceutically significant entities and a set of neural-network models based on language models are used for the study. The result obtained in this task formulation is compared to a solution based on extracting the named entities relevant to drug-taking effectiveness, including the positive dynamics after taking the drug, the side effects occurring, the worsening of the condition, and the absence of effect. It is shown that both approaches (the classification one and the one based on the extracted entities) demonstrate close results when using the best-performing model, XLM-RoBERTa-sag.

Keywords: Sentiment analysis · Text classification · Transfer learning · Russian-language texts · Public opinion
1 Introduction
Nowadays, much attention in the literature is paid to the task of extracting pharmaceutically and medically relevant information from texts written in natural language, in particular from reviews posted by Internet users on certain websites [1,7,14] or from tweets [7,14], where they share their individual experiences of using certain medicines. The relevance of this task is determined by the usefulness of the information extracted from such sources for pharmacists, physicians, and pharmaceutical manufacturers in terms of expanding the data on the use of drugs and the estimates of their effectiveness beyond the relatively limited scope of clinical trials [7]. The added value of such information is that it relates to using medicines in natural, real-life settings, often not as prescribed by a physician.
There are several approaches to solving the problem in question, such as classifying texts by the presence of meaningful information [12], recognizing significant named entities (the NER problem) [21], extracting the relations between entities [6], etc. In general, the texts under study are written by people without any medical background, in everyday vocabulary, often with the use of slang words, which significantly complicates the task and reduces the accuracy of its solution by the mentioned approaches based on marked-up corpora prepared explicitly for them. Therefore, the use of alternative options seems relevant. One such option is to use the rating of a drug's usefulness given by the review's author, as adopted by many review sites. This rating can be interpreted as an expression of sentiment about the drug's usefulness: the higher the rating, the more positive the assessment. This approach is particularly relevant for Russian-language texts due to the limited set of available corpora with detailed annotation of pharmaceutically important information and of specialized dictionaries and thesauri [21]. In this paper, using the Russian-language corpus of annotated reviews (RDRS), we compare the sentiment scores derived from an existing solution to the NER task of highlighting drug effects with those obtained by classifying reviews on a usefulness scale, and assess their consistency.
2 Related Works
The task of classifying text sentiment into two [11,17], three [1,4,10,16], or more classes [4,20] is well represented in various applied fields. However, there are far fewer works for the medical field, as confirmed in [4,11,17]. In the paper [17], the authors propose a method based on a bi-directional recurrent neural network, BiLSTM, together with the GloVe vector word representation [19] for binary sentiment classification of medical drug reviews¹. The authors showed that this method achieves an accuracy of 0.96 on the F1-score metric and outperforms all baseline machine-learning methods, including decision trees, random forest, k-nearest neighbors, and a naive Bayesian classifier. The authors of [4] compared different deep learning approaches to analyzing the sentiment of medical drug reviews, categorized into three and ten classes (0 – negative experience, 10 – maximally positive experience). The source of the reviews was Drugs.com. The paper shows that applying the BERT language model [8] improves the final result relative to a neural network based on convolutional layers. The drawback of the better-performing approach is a severe increase in training time and the need for significant computational resources. As a result, the authors obtained accuracies of 0.82 and 0.64 on the F1-macro metric for three and ten classes, respectively. A set of drug reviews from Drugs.com was also used in [1]. That paper investigates deep learning methods using recurrent
¹ UCI ML Drug Review dataset at https://www.kaggle.com/.
and convolutional layers, as well as the classical naive Bayesian machine learning model. The best result was obtained by an ensemble classifier (an accuracy of 0.87 on the F1-macro metric for classification into three classes). In the paper [10], the authors marked up the SentiDrugs corpus containing reviews of medical drugs from Druglib.com, the aspects extracted from them, and the sentiment assigned to those aspects. A transfer learning technique was used there in conjunction with a multitasking approach. The model consists of the bi-directional recurrent neural network GRU along with the GloVe vector word representation. Training the model included running two tasks at once: analyzing the sentiment of the drug review text and determining the sentiment relative to a given aspect. Various objects were considered as aspects, such as adverse drug reactions (ADRs), as in [9]. The model was pre-trained on a large corpus of data. The obtained result of 0.78 on the F1-macro metric outperformed the then-current models for aspect-based sentiment analysis, which shows the effectiveness of transfer learning. In another paper [24], researchers conducted investigations on the PsyTar psychiatric medication user review dataset [26]. The authors investigated the effect of the presence of health-related factors in the text (e.g., adverse drug reactions (ADRs) and drug indications) on predicting the review rating reflecting user satisfaction with the medication. The results showed that a regression model based on gradient boosting, which takes these factors into account, predicts the rating better. At the same time, the marked ADRs are significant for predicting negative satisfaction (a low rating). Currently, there is a limited set of Russian-language corpora labeled for the sentiment classification task in open sources, and it does not contain corpora of drug reviews. Therefore, we use an approach to sentiment evaluation based on the usefulness scale available on review sites. The latest reviews [22,23] of solutions to the sentiment analysis task for Russian-language texts show that language models based on the Transformer architecture are the most effective. In this work, we use the corpus of labeled Russian-language RDRS reviews from our previous work [21], which includes user evaluations of the usefulness of drugs. Moreover, we also use a language model based on the XLM-RoBERTa architecture proposed in that work (XLM-RoBERTa-sag), additionally trained on a large corpus of unlabeled drug reviews. This model showed higher accuracy than the original model for the extraction of pharmaceutically relevant entities. The prepared corpus and the model are used in this paper to compare review-based approaches to assessing drug usefulness.
3 Dataset
The RDRS corpus consists of 2,800 drug reviews from Otzovik.com, annotated by expert pharmacists. The reviews contain a title, the advantages and disadvantages of the drug, a commentary, a general impression, and a rating of the drug on a 5-point scale. The annotation scheme includes 18 types of entities, of which the most interesting for this paper are the mentions of the effects of using the drugs:
– “BNE-Pos” – mentions of a positive trend after or during use of the drug;
– “ADE-Neg” – mentions of a negative trend after or during use of the drug;
– “ADR” – mentions of adverse side effects observed while using the drug;
– “Worse” – mentions of a worsening condition after taking the drug;
– “NegatedADE” – mentions of no effect of the drug, i.e., the drug has no effect.
A user review explaining the assigned rating of a drug's usefulness usually mentions several medications at once. However, most reviews mention the primary drug in the title, which makes it possible to avoid identifying the drug under evaluation. Table 1 contains brief information about the corpus.

Table 1. RDRS dataset statistics

Reviews with the main drug in the title:  Yes – 2792;  No – 8
Text length (number of characters):  Min – 148;  Avg – 782.2;  Max – 1194
User rating of a drug's helpfulness:  1 star – 437 (15.6%);  2 stars – 370 (13.2%);  3 stars – 495 (17.7%);  4 stars – 1098 (39.2%);  5 stars – 400 (14.3%)
The total numbers of mentions of the considered entity types are: “BNE-Pos”: 5623; “ADR”: 1795; “NegatedADE”: 2795; “Worse”: 219; “ADE-Neg”: 84 (see Fig. 1). The abovementioned effects can be divided into two groups according to their sentiment:

– Negative effects: “ADE-Neg”, “ADR”, “Worse”, “NegatedADE”;
– Positive effects: “BNE-Pos”.
Fig. 1. Distribution of the number of mentions for each of the effects in the RDRS dataset reviews analyzed in this paper.
Figure 2 shows histograms of the distribution of the average number of “negative” tags (left) and “positive” tags (right) in the reviews for each of the rating values in the training part of Fold #1 (see Sect. 5).
Fig. 2. Distribution of negative (left) and positive (right) effects in reviews with different user ratings for the training part of the Fold #1.
The figure shows that the authors mention more “negative” tags when they give a drug a rating of “1”, “2”, or “3”, and more “positive” tags when they give a rating of “4” or “5”. For a rating value of “3”, the authors use on average 2.5 times more “negative” tags than “positive” ones. The other folds demonstrate a similar picture. Based on these results, the available ratings allow us, in addition to the classification into five usefulness classes, to pose the sentiment classification problem in a binary formulation as well: reviews with ratings of “1”, “2”, and “3” are assigned the negative class, and those with ratings of “4” and “5” the positive class.
4 Methods

4.1 XLM-RoBERTa-sag Approach
As the basis of this paper's investigation, we chose a model from our previous work based on the XLM-RoBERTa model [5] of the Transformer architecture, trained in two stages. In the first stage, the model was trained as a language model on a large set of unlabeled drug reviews in Russian (XLM-RoBERTa-sag). The second stage consisted of fine-tuning the model to solve the task of extracting named entities: the final model extracts pharmaceutically significant entities, including the effects of drug use (see [21] for details). The model's effect-extraction accuracies on the full-sized RDRS corpus are 50.3 for the “BNE-Pos” tag, 52.8 for the “ADR” tag, and 52.0 for the union of the “Worse”, “NegatedADE” and “ADE-Neg” tags. Currently, these are the state-of-the-art published accuracies established on a full-sized Russian-language corpus with a wide variability of texts. In this paper, this model is used in two variants of determining the usefulness of a review:

– Classification task: the model is used to determine either the sentiment class in the binary setting or one of the five classes corresponding to the original user score;
– Named entity recognition: the original model is used to determine the mentions of effects, on the basis of which the usefulness of the review is calculated
using the following set of rules: the sentiment class is determined either by the first effect encountered in the review or by the larger number of effects of the same sentiment class in the review.

Training the model for the classification task was performed with the following hyperparameters: batch size 64; number of epochs 10; learning rate 3.275 × 10⁻⁵.
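For reference, a classification fine-tuning setup of this kind can be sketched with the Hugging Face transformers API. This is a hypothetical sketch, not the authors' code: the public xlm-roberta-large checkpoint stands in for the domain-adapted XLM-RoBERTa-sag, and train_ds is a placeholder dataset with a "text" field and labels.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-large", num_labels=5)   # 5 rating classes (2 for binary)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

args = TrainingArguments(output_dir="xlmr-reviews",
                         per_device_train_batch_size=64,
                         num_train_epochs=10,
                         learning_rate=3.275e-5)
Trainer(model=model, args=args, tokenizer=tokenizer,
        train_dataset=train_ds.map(tokenize, batched=True)).train()
```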
4.2 Other Language Models
Several Transformer-based language models were used in comparative experiments for the classification task, among them:

– Rubert-base-cased². This model is based on the multilingual version of BERT-base [13] and was trained on a part of the Russian-language Wikipedia and on news data. The model contains 12 layers, 768 hidden neurons, 12 attention heads, and 180 million parameters. Among works on the analysis of Russian-language texts, this model is the most commonly used one;
– XLM-Roberta-large³. This model was the basis for the XLM-RoBERTa-sag model described in Sect. 4.1. For its training, 2.5 TB of data from the CommonCrawl project was used, containing texts written in 100 languages; Russian is the second most represented language in the entire corpus after English. The model contains 24 layers, 1024 hidden neurons, 16 attention heads, and 550 million parameters.

The language model implementations were taken from the open-source Huggingface project [25]. The hyperparameters for training correspond to those used for XLM-RoBERTa-sag.
4.3 Other Approaches
In solving the classification problem, additional comparison experiments were conducted using baseline classifiers and estimates:

– Random: the class label is determined randomly for each input example;
– Lexicon: a binary-only classifier (not included in the multiclass comparison) based on sentiment lexicons. The class label is determined by the words from the positive and negative parts of the RuSentiLex [15] sentiment lexicon found in the input example text (lower-case word lemmas are considered). The positive class label is applied to an example that contains more words from the positive part of the lexicon. If there are more words from the negative part, or an equal number of them, the negative class label is applied. If there are no words from the lexicon in the example, the negative class label is used;
² ‘rubert-base-cased’ model from the Hugging Face website: https://huggingface.co/.
³ ‘xlm-roberta-large’ model from the Hugging Face website: https://huggingface.co/.
– Rule-based-NER: a binary-only classifier (not included in the multiclass comparison) with rules over the positive and negative tags:
  • gold_ner – the calculation is based on the reference annotations of the effect entities represented in the corpus;
  • pred_ner – the calculation is based on the entities extracted by the XLM-RoBERTa-sag model, whose accuracy is presented in Sect. 4.1.
– LinearSVM⁴: a classifier based on the support vector machine with a linear kernel. The implementation is taken with standard hyperparameters from the open-source library scikit-learn [18]. The features used for this classifier are:
  • TF-IDF – the input text is encoded by the TF-IDF frequency method over character n-grams (from 4 to 8); a sketch follows this list;
  • fastText – a vector word representation model based on the fastText skipgram method [2] with vector dimensionality 300. The model was trained for five epochs on the Russian part of Wikipedia and Russian-language news data from lenta.ru. The implementation of the model⁵ was taken from the open-source library DeepPavlov [3].
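The TF-IDF variant of this baseline amounts to a two-step Scikit-Learn pipeline; a minimal sketch with placeholder data variables:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Character n-grams of length 4 to 8, as described above
baseline = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(4, 8)),
    LinearSVC())
baseline.fit(train_texts, train_labels)
predicted = baseline.predict(test_texts)
```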
5 Experiments
The data were divided into training, validation, and test parts to conduct the experiments and evaluate the accuracy of the classifiers. The entire dataset was divided into five equal parts (folds), stratified with respect to the rating scores (each part had the same proportion of rating scores). Cross-validation was then performed, in which each of the parts, in turn, served as the test set, with the others forming the training set. As the validation set, 5% of examples from the training part were taken. All the models were evaluated using the F1-score metric:

$$\mathrm{Precision} = \frac{TP}{TP + FP}; \quad \mathrm{Recall} = \frac{TP}{TP + FN}; \quad F_{1}\text{-score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where TP is the number of true-positive decisions, FP the number of false-positive decisions, and FN the number of false-negative decisions. The final score is calculated with macro averaging, where the F1-scores obtained for each class separately are averaged over all classes (hereafter, F1-macro). This averaging is chosen because we consider all classes equally important, regardless of their representativeness in the corpus. For the rule-based classifier, which relies on the effects mentioned in a review, two estimates are presented: one based on the reference effect mentions highlighted by the annotators when creating the corpus (hereafter gold_ner), the other based on the mentions highlighted by the XLM-RoBERTa-sag model (hereafter pred_ner). Table 2 shows the results of the set of models on the RDRS corpus for binary and multiclass classification.
⁴ LinearSVC model from the scikit-learn website: https://scikit-learn.org/.
⁵ fastText pre-trained model from the website: https://docs.deeppavlov.ai/.
Table 2. F1-score (macro) of the compared models on the binary and multiclass classification tasks

Model                 Features     Binary task  Multiclass task
Random                –            0.5          0.2
Lexicon               –            0.21         not applicable
Rules (first entity)  gold_ner     0.86         not applicable
Rules (first entity)  pred_ner     0.85         not applicable
Rules (majority)      gold_ner     0.87         not applicable
Rules (majority)      pred_ner     0.86         not applicable
Linear SVM            tf-idf       0.84         0.39
Linear SVM            fastText     0.77         0.33
Rubert                bert         0.88         0.52
XLM-Roberta-large     xlm-roberta  0.91         0.56
XLM-Roberta-sag       xlm-roberta  0.92         0.58
6 Conclusion
The conducted research allows us to draw the following conclusions. The results of solving the binary sentiment estimation task based on user ratings and those based on extracting the entities reflecting the effectiveness of taking medications agree with each other: the difference between the approaches is 5–6%, with an error of 2%. Using the mentions highlighted by the best model, XLM-RoBERTa-sag, instead of the reference effect mentions in the corpus does not significantly affect the result, increasing the difference by 2%. A similar change in the difference results from using a score based on the first mention of an effect instead of a score based on the majority of mentions in the review. The XLM-RoBERTa-sag model outperforms the closest baseline, Linear SVM, by 8% with a total std of 3% in the binary case, and by 19% with a total std of 5% in the five-class case.

The most typical errors in comparing the approaches are related to the impossibility of unambiguously matching a user's review text and their rating of the drug, which is expressed in the following typical cases:

– the rating is inconsistent with the review: for example, a worsening of the condition is mentioned with a positive rating, or the rating is positive but there is no mention of any improvement;
– the review describes several cases of use of the same drug with different numbers of mentioned effects;
– the review is not about the use of the drug at all, although the rating is positive.

Errors also arise from not taking into account the relations between entities, in particular between the mentioned effects and drugs. Proper consideration of the
above cases for a more representative evaluation of the effectiveness of drugs is the goal of our further work.

Acknowledgments. This work has been supported by the Russian Science Foundation grant No. 20-11-20246 and has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.
References

1. Basiri, M.E., Abdar, M., Cifci, M.A., Nemati, S., Acharya, U.R.: A novel method for sentiment classification of drug reviews using fusion of deep and machine learning techniques. Knowl.-Based Syst. 198, 105949 (2020)
2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
3. Burtsev, M., et al.: DeepPavlov: open-source library for dialogue systems. In: Proceedings of ACL 2018, System Demonstrations, pp. 122–127 (2018)
4. Colón-Ruiz, C., Segura-Bedmar, I.: Comparing deep learning architectures for sentiment analysis on drug reviews. J. Biomed. Inf. 110, 103539 (2020)
5. Conneau, A., et al.: Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116 (2019)
6. Dai, D., Xiao, X., Lyu, Y., Dou, S., She, Q., Wang, H.: Joint extraction of entities and overlapping relations using position-attentive sequence labeling. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 6300–6308 (2019)
7. Denecke, K.: Health Web Science: Social Media Data for Healthcare. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-319-20582-3
8. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
9. Gräßer, F., Kallumadi, S., Malberg, H., Zaunseder, S.: Aspect-based sentiment analysis of drug reviews applying cross-domain and cross-data learning. In: Proceedings of the 2018 International Conference on Digital Health, pp. 121–125 (2018)
10. Han, Y., Liu, M., Jing, W.: Aspect-level drug reviews sentiment analysis based on double BiGRU and knowledge transfer. IEEE Access 8, 21314–21325 (2020)
11. Jiménez-Zafra, S.M., Martín-Valdivia, M.T., Molina-González, M.D., Ureña-López, L.A.: How do we talk about doctors and drugs? Sentiment analysis in forums expressing opinions for medical domain. Artif. Intell. Med. 93, 50–57 (2019)
12. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown, D.: Text classification algorithms: a survey. Information 10(4), 150 (2019)
13. Kuratov, Y., Arkhipov, M.: Adaptation of deep bidirectional multilingual transformers for Russian language. arXiv preprint arXiv:1905.07213 (2019)
14. Li, Z., Fan, Y., Jiang, B., Lei, T., Liu, W.: A survey on sentiment analysis and opinion mining for social multimedia. Multimedia Tools Appl. 78(6), 6939–6967 (2018). https://doi.org/10.1007/s11042-018-6445-z
15. Loukachevitch, N., Levchik, A.: Creating a general Russian sentiment lexicon. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 1171–1176 (2016)
16. Naumov, A.: Neural-network method for determining text author's sentiment to an aspect specified by the named entity. In: CEUR Workshop Proceedings (2020)
17. Obayes, H.K., Al-Turaihi, F.S., Alhussayni, K.H.: Sentiment classification of user's reviews on drugs based on global vectors for word representation and bidirectional long short-term memory recurrent neural network. Indonesian J. Electric. Eng. Comput. Sci. 23(1), 345–353 (2021)
18. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
19. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
20. Sboev, A., Naumov, A., Rybka, R.: Data-driven model for emotion detection in Russian texts. Procedia Comput. Sci. 190, 637–642 (2021)
21. Sboev, A., et al.: An analysis of full-size Russian complexly NER labelled corpus of internet user reviews on the drugs based on deep learning and language neural nets. arXiv preprint arXiv:2105.00059 (2021)
22. Smetanin, S.: The applications of sentiment analysis for Russian language texts: current challenges and future perspectives. IEEE Access 8, 110693–110719 (2020)
23. Smetanin, S., Komarov, M.: Deep transfer learning baselines for sentiment analysis in Russian. Inf. Process. Manag. 58(3), 102484 (2021)
24. Tutubalina, E., Alimova, I., Solovyev, V.: Biomedical entities impact on rating prediction for psychiatric drugs. In: van der Aalst, W.M.P., et al. (eds.) AIST 2019. LNCS, vol. 11832, pp. 97–104. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37334-4_9
25. Wolf, T., et al.: HuggingFace's Transformers: state-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 (2019)
26. Zolnoori, M., et al.: The PsyTar dataset: from patients generated narratives to a corpus of adverse drug events and effectiveness of psychiatric medications. Data Brief 24, 103838 (2019)
Correlation Encoding of Input Data for Solving a Classification Task by a Spiking Neural Network with Spike-Timing-Dependent Plasticity

Alexander Sboev¹,²(B), Alexey Serenko¹, and Roman Rybka¹

¹ National Research Centre “Kurchatov Institute”, Moscow, Russia
² National Research Nuclear University MEPhI, Moscow, Russia
Abstract. We propose a new approach to encoding input data into the spike sequences presented to a spiking neural network in a classification task: an input vector is represented by the mutual correlations of input spike sequences. The accuracy obtained on the benchmark classification tasks of Fisher's Iris and Wisconsin breast cancer is comparable to the results of other existing approaches to spiking neural network learning based on Spike-Timing-Dependent Plasticity.

Keywords: Spiking neural networks · Spike-timing-dependent plasticity · Machine learning
1 Introduction
Developing learning algorithms for spiking neural networks based on local plasticity mechanisms such as Spike-Timing-Dependent Plasticity (STDP) is increasingly relevant thanks to the ongoing progress towards hardware implementation of such networks in neuromorphic devices with ultra-low power consumption [8]. Applying a spiking network to classification tasks requires encoding the input data into spike sequences presented to the input synapses of the network [1]. Rate encoding, in which an input synapse receives a Poisson spike train with a mean rate depending on the corresponding input vector component, allows learning based either on the ability of Hebbian plasticity to strengthen synapses that receive more input spikes [2] or on the stabilization of the neuron's mean output spiking rate observed with certain forms of STDP [9]. Such learning achieves competitive performance on the benchmark tasks of classifying images [2] and real-valued vectors [9]. Temporal encoding, in which an input vector component is encoded by the timing of an input spike arriving at the corresponding synapse, allows learning based on the ability of a neuron with STDP to memorize repeating input spike patterns [5]. A one-layer network with temporal encoding was shown to classify real-valued vectors [9], while multi-layer networks have been applied to image classification tasks [4].
However, no commonly accepted technique of STDP-based spiking network learning has yet been established that would be consistently efficient on a wide range of tasks. Thus, the optimal choice of input encoding and output decoding for STDP learning remains an open question. This work proposes a novel way of input encoding, in which each input vector component is assigned a bunch of input synapses, and the value of the vector component is encoded by the mutual correlation of the spike sequences these synapses receive. The method used for generating correlated input spike sequences is described in Sect. 2.1. The neuron and synapse models (Sect. 2.2), the model setup and the learning algorithm (Sect. 2.3), and the input preprocessing (Sect. 2.6) are chosen in accordance with earlier work on STDP learning with other input encodings [9]. Output decoding is described in Sect. 2.4 and is based on the assumption that the more the inputs of a bunch correlate with each other, the more they will correlate with the neuron's output spikes, which will cause STDP to strengthen these inputs, thus making the neuron specifically sensitive to the input it was trained on. The applicability of the proposed learning algorithm is tested on the benchmark classification tasks of Fisher's Iris and Wisconsin breast cancer described in Sect. 2.5.
2 Materials and Methods

2.1 Input Encoding
Each component xi of the input vector x corresponds to K input synapses of a neuron, and is encoded by feeding these synapses with 2-s-long spike sequences, all of which have the same mean rate r = 20 Hz and correlate with each other with the Pearson coefficient ci = k · xi. Spike sequences are generated with the help of the algorithm described in the literature [7]. First, a reference sequence S0 of the mean rate r is generated by iterating over discrete time steps δt = 0.1 ms, each time step containing a spike with the probability P(S0(t) = 1) = r · δt, and no spike otherwise (here S0(t) = 1 denotes the presence of a spike, and 0 denotes its absence). Then, each of the K correlated spike sequences is derived independently from the reference sequence under the following rule: a timestep that contains a spike in the reference sequence has the probability P(Si(t) = 1 | S0(t) = 1) = Θ to contain a spike in a derived sequence. With the probability P(Si(t) = 0 | S0(t) = 1) = 1 − Θ, a spike in the reference sequence does not get into the derived sequence. At a timestep that is empty in the reference sequence, a spike can occur with the probability P(Si(t) = 1 | S0(t) = 0) = ϕ in the derived sequence. Θ and ϕ are chosen so that the Pearson coefficient of correlation between two derived sequences is c, and their mean rate equals the rate r of the reference sequence:

$$\Theta = r\,\delta t + (1 - r\,\delta t) \cdot \sqrt{c}, \qquad \varphi = r\,\delta t \cdot (1 - \sqrt{c}).$$
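A compact NumPy rendering of this generation procedure may clarify it (our sketch of the algorithm from [7], not the authors' code):

```python
import numpy as np

def correlated_trains(c, K, r_hz=20.0, length_ms=2000.0, dt_ms=0.1, seed=None):
    """K spike trains with pairwise Pearson correlation c, derived from a
    common reference Poisson train of rate r_hz, as described above."""
    rng = np.random.default_rng(seed)
    p = r_hz * dt_ms / 1000.0                 # P(spike) per time step
    theta = p + (1.0 - p) * np.sqrt(c)        # keep probability, Θ
    phi = p * (1.0 - np.sqrt(c))              # insertion probability, ϕ
    steps = int(length_ms / dt_ms)
    ref = rng.random(steps) < p               # reference sequence S0
    u = rng.random((K, steps))
    return ref, np.where(ref, u < theta, u < phi)
```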
2.2 Neuron and Synapse Model
For the neuron model we use the computationally simple Leaky Integrate-and-Fire:

$$\frac{dV}{dt} = -\frac{V - V_{\mathrm{rest}}}{\tau_{m}} + \frac{I_{\mathrm{syn}}(t)}{C_{m}},$$

with the exponential form of the postsynaptic current Isyn(t), in which an input spike arriving at time tsp at the i-th input synapse of the neuron adds to Isyn(t) an exponentially decaying pulse

$$w_{i}(t_{\mathrm{sp}})\, \frac{q_{\mathrm{syn}}}{\tau_{\mathrm{syn}}}\, e^{-\frac{t - t_{\mathrm{sp}}}{\tau_{\mathrm{syn}}}}\, \Theta(t - t_{\mathrm{sp}}),$$

where wi(t) is the synaptic weight, and Θ is the Heaviside step function. A neuron emits a spike as soon as its membrane potential V(t) exceeds the threshold Vth. After that, V(t) is reset to Vrest, and the neuron is insensitive to its input spikes during its refractory period τref.

The input synapses of the neuron possess additive STDP with the restricted symmetric spike pairing scheme [6]: an input (presynaptic) spike arriving at the synapse at time tpre triggers a weight decrease, the amount of which depends on the interval between tpre and the time tpost of the latest preceding output (postsynaptic) spike:

$$\Delta w = -\lambda \cdot \alpha \cdot \exp\left(-\frac{t_{\mathrm{pre}} - t_{\mathrm{post}}}{\tau^{-}}\right),$$

but only if there have been no other presynaptic or postsynaptic spikes between tpre and tpost. Analogously, a neuron emitting a post-spike causes a weight increase

$$\Delta w = \lambda \cdot \exp\left(-\frac{t_{\mathrm{post}} - t_{\mathrm{pre}}}{\tau^{+}}\right),$$

if no other spikes occur between tpre and tpost. The synaptic weight is restricted so as not to exceed 1 nor fall below 0, and is clamped to these boundary values if it would cross them after being changed by Δw.

The neuron and synapse model constants are chosen in accordance with earlier work with other input encoding [9]: Vrest = 0, Vth = −54 mV, τm = 10 ms, qsyn = 5 fC, τsyn = 5 ms, λ = 0.01. Cm, α, τ+, and τ− are adjusted separately for each particular classification task with the help of the MultiNEAT [10] genetic algorithm.
2.3 Learning Algorithm
The model consists of unconnected neurons, the number of which equals the number of classes in the classification task. During learning, each neuron is assigned a class and receives training set vectors of the corresponding class only. The vectors are encoded as described
in Sect. 2.1, with the reference spike sequence S0 generated independently for each neuron and each input vector. During the testing stage, the weights are fixed, and every neuron receives the training and testing set vectors from all the classes.
2.4 Output Decoding
We assume the output value Outi(x), one per neuron i, sufficient for classifying the current input vector x, to be contained in the correlation of the sequence Sout(t) of output spikes (emitted by the neuron in response to the input vector) with the reference sequence S0(t) based on which the input spike sequences encoding the current vector were generated. Taking into account the delay between receiving input spikes and emitting output spikes caused by the neuron membrane potential dynamics, Outi(x) is defined as the number of spike pairs in which the output spike lags the reference spike by 0 to 2τm = 20 ms:

$$\mathrm{Out}_{i}(x) = \sum_{\Delta t = 0}^{20\,\mathrm{ms}} \;\; \sum_{t \text{ while presenting } x} S_{0}(t)\, S_{\mathrm{out}}(t + \Delta t),$$

where summing is performed instead of integrating because time is measured up to the simulation timestep δt. During testing, the values {Outi(x), x ∈ training set of class i} form the own-class training-set distribution Pi of neuron i. For a testing set vector y, its Outi(y) values for all neurons i are compared to the own-class training-set distributions Pi of these neurons, and the vector is assigned to the class j to whose training-set distribution Pj its Outj(y) has the highest probability of belonging. The probability is estimated by assuming P to be normal and calculating its mean and standard deviation.

In order to assess the efficiency of the decoding method described above, it is compared to decoding with a conventional Gradient Boosting classifier (GBC): the classifier is trained on the Out values of each neuron in response to the training set vectors, and then is given these values for the testing set vectors and has to predict their classes.
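As an illustration, counting such reference/output spike pairs takes a few lines of NumPy (a sketch; the boolean arrays ref and out are assumed to be sampled at the same timestep δt):

```python
import numpy as np

def out_value(ref, out, dt_ms=0.1, window_ms=20.0):
    """Count (reference, output) spike pairs in which the output spike lags
    the reference spike by 0..window_ms, i.e. the Out_i(x) defined above."""
    ref, out = np.asarray(ref, dtype=bool), np.asarray(out, dtype=bool)
    max_lag = int(window_ms / dt_ms)
    return sum(int(np.sum(ref[:ref.size - k] & out[k:]))
               for k in range(max_lag + 1))
```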
2.5 Datasets
The classification performance is assessed on two benchmarks: Fisher's Iris [3], having 3 classes of 50 4-dimensional vectors each, and Wisconsin breast cancer [11], having 357 30-dimensional vectors in the class “benign” and 212 in the class “malignant”.
2.6 Input Preprocessing
Before encoding, input vectors are minmax-normalized so that each component ranges from 0 to 1, and then are processed with Gaussian receptive fields [12],
which increases the input dimension by the factor of N, where N is the number of receptive fields. The receptive field centers μ1, …, μN are placed uniformly from 0 to 1, and each component xi of the input vector is transformed into the N components g(xi, μ1), …, g(xi, μN), where

$$g(x_{i}, \mu_{j}) = \exp\left(-\frac{(x_{i} - \mu_{j})^{2}}{\sigma}\right).$$

The parameter N is adjusted for each particular task using the genetic algorithm.
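The preprocessing amounts to one vectorized expression; a sketch (the value of σ is a placeholder, since the paper does not state it here):

```python
import numpy as np

def gaussian_receptive_fields(x, n_fields, sigma=0.1):
    """Expand each component of a minmax-normalised vector into n_fields
    activations g(x_i, mu_j) = exp(-(x_i - mu_j)**2 / sigma)."""
    x = np.asarray(x, dtype=float)
    mu = np.linspace(0.0, 1.0, n_fields)   # receptive field centres on [0, 1]
    return np.exp(-(x[:, None] - mu[None, :]) ** 2 / sigma).ravel()
```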
3 Results
The classification accuracy of the proposed algorithm is presented in Table 1, measured in F1-score. Error ranges are given over the splits of 5-fold cross-validation. When decoding with Gradient Boosting, the accuracy is the same (up to the error ranges) as when decoding as described in Sect. 2.4. This confirms the efficiency of the proposed decoding method. Overall, the accuracy of the proposed learning with correlation encoding is not significantly inferior to the earlier results of one-layer spiking networks with STDP learning and rate or temporal encoding.

Table 1. F1-scores of the proposed learning algorithm and a few other existing ones on the classification tasks of Fisher's Iris and Wisconsin breast cancer

Spiking neural network                        Iris      Cancer
With correlation encoding                     95% ± 5%  85% ± 3%
With corr. encoding and GBC-based decoding    94% ± 3%  87% ± 2%
With rate encoding [9]                        97% ± 3%  90% ± 1%
With temporal encoding [9]                    99% ± 1%  89% ± 1%
4 Conclusion
The proposed algorithm of spiking neural network learning with correlation encoding shows, on the benchmark tasks of Fisher's Iris and Wisconsin breast cancer classification, accuracies that do not significantly differ from those of one-layer spiking networks with STDP-based learning and other input encodings. This confirms, as a proof of concept, the prospective feasibility of correlation encoding within learning algorithms for spiking neural networks based on Hebbian plasticity mechanisms. Studying the possible application of correlation encoding in multi-layer networks, as well as developing less complicated ways of decoding the output spikes into the classification result, are directions of further work.

Acknowledgments. This work has been supported by the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/. The authors are grateful to Niyaz Yusupov, M.Sc. student at Moscow Institute of Physics and Technology, for his assistance in computational experiments.
References

1. Auge, D., Hille, J., Mueller, E., Knoll, A.: A survey of encoding techniques for signal processing in spiking neural networks. Neural Process. Lett. 53(6), 4693–4710 (2021). https://doi.org/10.1007/s11063-021-10562-2
2. Demin, V., et al.: Necessary conditions for STDP-based pattern recognition learning in a memristive spiking neural network. Neural Netw. 134, 64–75 (2021). https://doi.org/10.1016/j.neunet.2020.11.005
3. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936). https://doi.org/10.1111/j.1469-1809.1936.tb02137.x
4. Kheradpisheh, S.R., Ganjtabesh, M., Thorpe, S.J., Masquelier, T.: STDP-based spiking deep convolutional neural networks for object recognition. Neural Netw. 99, 56–67 (2018). https://doi.org/10.1016/j.neunet.2017.12.005
5. Masquelier, T., Guyonneau, R., Thorpe, S.J.: Spike Timing Dependent Plasticity finds the start of repeating patterns in continuous spike trains. PLoS ONE 3(1), e1377 (2008). https://doi.org/10.1371/journal.pone.0001377
6. Morrison, A., Diesmann, M., Gerstner, W.: Phenomenological models of synaptic plasticity based on spike timing. Biol. Cybern. 98, 459–478 (2008). https://doi.org/10.1007/s00422-008-0233-1
7. van Rossum, M.C.W., Bi, G.Q., Turrigiano, G.G.: Stable Hebbian learning from spike timing-dependent plasticity. J. Neurosci. 20(23), 8812–8821 (2000). http://www.jneurosci.org/content/20/23/8812.long
8. Saïghi, S.: Plasticity in memristive devices for spiking neural networks. Front. Neurosci. 9, 51 (2015). https://doi.org/10.3389/fnins.2015.00051
9. Sboev, A., Serenko, A., Rybka, R., Vlasov, D.: Solving a classification task by spiking neural network with STDP based on rate and temporal input encoding. Math. Meth. Appl. Sci. 43(13), 7802–7814 (2020). https://doi.org/10.1002/mma.6241
10. Stanley, K., Miikkulainen, R.: MultiNEAT - a portable software library for performing neuroevolution. http://multineat.com/index.html
11. Street, W.N., Wolberg, W.H., Mangasarian, O.L.: Nuclear feature extraction for breast tumor diagnosis. In: International Symposium on Electronic Imaging: Science and Technology, vol. 1905, pp. 861–870 (1993). https://doi.org/10.1117/12.148698
12. Yu, Q., Tang, H., Tan, K.C., Yu, H.: A brain-inspired spiking neural network model with temporal encoding and learning. Neurocomputing 138, 3–13 (2014). https://doi.org/10.1016/j.neucom.2013.06.052
The Two-Stage Algorithm for Extraction of the Significant Pharmaceutical Named Entities and Their Relations in the Russian-Language Reviews on Medications on Base of the XLM-RoBERTa Language Model

Alexander Sboev1,2(B), Ivan Moloshnikov1, Anton Selivanov1, Gleb Rylkov1, and Roman Rybka1

1 National Research Centre “Kurchatov Institute”, Moscow, Russia
2 National Research Nuclear University MEPhI, Moscow, Russia
Abstract. The Internet contains a large amount of heterogeneous information, the extraction and structuring of which is currently a relevant task. This is especially relevant for tasks of social importance, in particular the analysis of the experience of using pharmaceutical products. In this paper, we propose a two-step sequential algorithm for extracting named entities and the relationships between them. Its creation was made possible by the availability of a marked-up corpus of Internet users’ reviews of medicines (the Russian Drug Review Corpus). The basis of the algorithm is the XLM-RoBERTa-sag language model, which is pre-trained on a large corpus of unlabeled review texts. The developed algorithm achieves an accuracy of 71.6 for identifying related entities and 80.5 for relations, which is the first accuracy estimate for this problem on Russian-language drug review texts. Keywords: Neural networks · Natural language processing · Relation extraction · Pharmaceutical dataset · Russian language · Language models
1 Introduction

The task of extracting meaningful information is relevant to a number of applied tasks of analyzing Internet resources, in particular evaluating the effectiveness of medicines. The manual execution of such analysis is a time-consuming process that requires the involvement of experts familiar with the subject. Automating this process requires solving two main tasks: recognizing named entities and determining the relationships between them. Several papers are devoted to solving this problem, and the approaches presented there can be divided into two types.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
V. V. Klimov and D. J. Kelley (Eds.): BICA 2021, SCI 1032, pp. 463–471, 2022. https://doi.org/10.1007/978-3-030-96993-6_51
The first is a step-by-step approach to identifying entities and relationships, based on the sequential use of two separate models, which may differ in their principle of organization. The best results are shown by neural network methods, both those based on the popular LSTM-CRF topologies [13] and those based on language neural network models [7,8]. The second is an approach based on a joint, or end-to-end, model that processes the text to extract pairs of related entities. The text is represented by a set of word combinations (spans), which are classified according to the classes of named entities; span pairs are then classified by the presence of relations of a certain type. Successful implementations of this approach are presented in [2–4]. As modern studies of textual data analysis methods show, the most efficient approach is transfer learning based on language models, a separate class of neural networks pre-trained on a large set of unlabeled texts of the selected subject area [6,7,10]. However, the applicability of a particular method depends on the availability of the data and the complexity of the entity and relationship annotation scheme used: the presence of nested entities, the matching of multiple entities to a single reference (overlapping), or the presence of references separated by other words (discontinuous entities). In this paper, we focus on the most representative corpus of Russian-language reviews with markup of pharmaceutically significant entities and relationships – the Russian Drug Reviews Corpus (RDRS1) [8]. It contains all of the complex markup examples mentioned above. There are no open solutions for this problem, and in this paper we propose an algorithm based on the cascade approach to extract the relationships between the following entities: Drugname (the name of the medication reviewed), ADR (adverse reaction to the drug), Diseasename (name of the disease in the review), SourceInfoDrug (source of the information about the drug), and Indication (a symptom of the disease).
2 Data

In this paper the Russian Drug Review Corpus (RDRS) [8] was used, which contains internet users’ reviews on medicines from the otzovik.ru website. The total number of texts in RDRS is 2800. Each of them was manually annotated by pharmacological experts to label significant entities. In this paper only the five types of entities described above were used. 1590 texts from RDRS were additionally marked up to select groups of entities related to one case of using a drug (a context). Different contexts are distinguished primarily by mentions of new drugs, by mentions of new effects that appeared after the use of the same drug against different diseases, and also by whether the review describes cases of drug use by different people. One text can contain a different number of contexts, and one entity can be a part of several contexts. A sample of texts that contain multiple contexts (908 such texts in total) was drawn from the corpus for investigation in this paper.
1 https://sagteam.ru/en/med-corpus/ – project site dedicated to creating the corpus.
Entities that occur in the same group were considered related (positive class), and entities from different sense groups were considered unrelated (negative class). The following pairs of entity types were considered in this work as the most interesting for analysis from the practical point of view:

– ADR–Drugname – the relationship between a drug and its adverse effects;
– Drugname–SourceInfodrug – the relationship between a drug and a source of information about it (e.g., “was advised in a pharmacy”, “the doctor recommended it”);
– Drugname–Diseasename – the relationship between the drug and the disease;
– Diseasename–Indication – the relationship between the disease and its symptoms (e.g., “cough”, “temperature 39”).

The total number of entities in the corpus is 21497, and of relations – 16481. The statistics of the entity types are presented in Table 1; Table 2 presents the statistics of the relation types in the used corpus.

Table 1. Information on the used corpus. Entity type – type of the entity considered; continuous/discontinuous entities – the number of continuous/discontinuous entities; text number – the number of texts containing the considered entity type; text number continuous – the number of texts containing only continuous entities of the considered type; text number discontinuous – the number of texts containing discontinuous entities of the considered type.

Entity type    | Number of entities | Continuous entities | Discontinuous entities | Text number | Text number continuous | Text number discontinuous
ADR            | 733                | 705                 | 28                     | 257         | 231                    | 26
Drugname       | 3629               | 3629                | 0                      | 908         | 908                    | 0
Diseasename    | 1638               | 1631                | 7                      | 595         | 588                    | 7
Indication     | 1667               | 1637                | 30                     | 611         | 583                    | 28
SourceInfodrug | 959                | 896                 | 63                     | 552         | 491                    | 61
Table 2. Statistics on the types of relations in the RDRS corpus with 908 multi-context reviews. Entity pair types are specified in the names of the 4 main columns; positive (pos.) – the number of pairs of entities of the indicated type between which there is a relationship; negative (neg.) – the number of pairs of entities of the specified type between which there is no relationship because the entities are in different contexts; Relation number – the total number of relations of a given type in the dataset; Text fraction – the ratio of the number of texts with relations of a given type to the total number of texts in the corpus.

Relation classes | ADR & Drugname | Drugname & Diseasename | Drugname & SourceInfoDrug | Diseasename & Indication
                 | pos.  | neg.   | pos.  | neg.           | pos.  | neg.              | pos.  | neg.
Relation number  | 1913  | 917    | 4277  | 2153           | 2700  | 1232              | 2588  | 701
Text fraction    | 0.273 | 0.204  | 0.634 | 0.514          | 0.598 | 0.457             | 0.416 | 0.148
3 Materials and Methods

3.1 Proposed Approach
The approach proposed in this paper is based on the sequential application of two neural network models (hereafter, the Pipeline). First, the text is processed and the named entities are determined using the first model. Next, pairs of entities between which the existence of a relationship will be checked are generated, taking into account the predicted entity types. For example, ADR–Indication pairs are not considered, since determining the relationship in such an entity pair is outside the scope of this study. After that, the determined entity pairs are classified for the presence of a relationship.
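As an illustration of the pair-generation step, a minimal sketch is given below (the helper name and data layout are ours, not part of the published implementation; only the four pair types listed above are kept):

```python
from itertools import product

# The four admissible entity-pair types considered in this paper;
# any other combination (e.g., ADR-Indication) is skipped.
ALLOWED_PAIR_TYPES = {
    ("ADR", "Drugname"),
    ("Drugname", "SourceInfodrug"),
    ("Drugname", "Diseasename"),
    ("Diseasename", "Indication"),
}

def generate_candidate_pairs(entities):
    """entities: list of dicts {"text": ..., "type": ...} predicted by
    the NER model; returns the pairs whose predicted types form one of
    the admissible pair types, for classification by the RE model."""
    return [
        (e1, e2)
        for e1, e2 in product(entities, repeat=2)
        if (e1["type"], e2["type"]) in ALLOWED_PAIR_TYPES
    ]
```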
3.2 Language Model

The neural network models within the Pipeline are based on a language model of the Transformer architecture [12]. In our previous works, the application of different models to the named entity extraction [7,8] and RE [9] tasks separately was investigated. It was shown that the best accuracy is achieved by the original XLM-RoBERTa-large [1] model additionally trained on a corpus of unlabeled Internet users’ reviews of medications (XLM-RoBERTa-sag). Two XLM-RoBERTa-sag models were used in the Pipeline, each one fine-tuned to solve the NER or the RE problem, respectively. The scheme of the pipeline is shown in Fig. 1.
Fig. 1. The pipeline scheme
3.3 Named Entity Recognition

The named entity recognition task is a sequence classification problem, where a class should be determined for every input “token” of the input text. A significant feature of some text datasets is “overlapping entities” – one token can be a part of several entities, so the task becomes sequential multi-label classification.
Our approach to the multi-label classification is to use a fully-connected layer for each entity class as an additional neural network block on top of the language model. The layer for each class has 3 neurons to determine whether the token is the beginning of an entity of the given type (B-tag in BIO notation), a continuation of such an entity (I-tag), or not an entity of this type (O).
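A minimal PyTorch sketch of such a multi-head token classifier is given below (class and variable names are illustrative; this is a sketch, not the authors’ exact implementation):

```python
import torch.nn as nn

class MultiLabelBIOHead(nn.Module):
    """One three-way (B/I/O) linear classifier per entity type on top of
    the language model's per-token hidden states, so that overlapping
    entities of different types can be predicted for the same token."""
    def __init__(self, hidden_size, entity_types):
        super().__init__()
        self.heads = nn.ModuleDict(
            {etype: nn.Linear(hidden_size, 3) for etype in entity_types}
        )

    def forward(self, token_states):
        # token_states: (batch, seq_len, hidden_size) from the language model.
        # Returns independent B/I/O logits for every entity type.
        return {etype: head(token_states) for etype, head in self.heads.items()}

# Example instantiation (1024 is the XLM-RoBERTa-large hidden size):
# head = MultiLabelBIOHead(1024, ["ADR", "Drugname", "Diseasename",
#                                 "Indication", "SourceInfodrug"])
```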
3.4 Relation Extraction

When solving the relation extraction task, the language model forms a vector representation of the text, which is fed into a linear layer. The output activities of the linear layer determine whether there is a relationship between the pair of entities fed to the input. To predict whether there is a relationship between entities, the following representation of the input data is used: [CLS] “the text of the first entity” [SEP] “the text of the second entity” [TXTSEP] “the text that contains the entities in question” (a sketch of this input assembly is given at the end of this subsection). Here, [CLS] is a special service “token” whose vector representation is used to determine whether or not the target entities are connected. During training, the weights of the model are adjusted so that the vector representation of this “token” aggregates information about the entities and words of the text and is the most informative in the context of the classification task being solved (determination of the relationship between entities); [SEP] is a special service “token” used to separate the pair of entities in question; [TXTSEP] is a special service “token” used to separate the pair of entities from the text. Thus, the model receives a pair of entities, between which the relationship is to be established, together with a text that contains information about their context, which allows the model to form a more informative vector representation. The effectiveness of such a representation of the input data in the problem of determining relationships between named entities with a language model was shown in our previous work [9]. Additionally, experiments were carried out with basic machine learning (ML) methods on the “gold” named entity annotation to obtain a baseline accuracy estimate for the relation extraction task. The following basic ML methods were used in the experiments:

– Support vector machine [11] – a linear model based on building a hyperplane that maximizes the margin between two classes;
– Multinomial Naive Bayes model [5] – a popular baseline for such text analysis tasks as spam filtering or text classification, which classifies texts based on the co-occurrence probability of word n-grams.

The data representation for the basic ML methods is a concatenation of TF-IDF vectors of character n-grams of the entities in question.
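The input string for the RE model can be assembled as in the sketch below (hypothetical helper; with a HuggingFace-style tokenizer, [TXTSEP] would have to be registered as an additional special token, while the [CLS]/[SEP] equivalents of the model vocabulary are inserted by the tokenizer itself):

```python
def build_re_input(entity1_text, entity2_text, review_text):
    # [SEP] separates the two entities, following the template in the
    # text above; [TXTSEP] separates the entity pair from the review
    # text that supplies their context.
    return f"{entity1_text} [SEP] {entity2_text} [TXTSEP] {review_text}"

# Hypothetical usage with a transformers tokenizer:
# tokenizer.add_special_tokens({"additional_special_tokens": ["[TXTSEP]"]})
# model.resize_token_embeddings(len(tokenizer))
# enc = tokenizer(build_re_input("headache", "Aspirin", review_text),
#                 truncation=True, max_length=512, return_tensors="pt")
```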
4 Experiments

4.1 Experimental Setup
The calculations were performed using cross-validation with splitting of the data into 5 parts (a sketch of this split is given below). Thus, at each cross-validation iteration, 80% of the texts were used for fine-tuning the model, and 20% were used for testing. From a pair of entities and a review text, it was required to determine the class of the relationship, which includes the type of entity pair and the label of the presence or absence (pos./neg.) of a relationship for this type (see Table 2). To test the pipeline, the following experiments were performed:

1. efficiency estimation of the relation extraction model using the “gold” annotation of entities;
2. efficiency estimation of the named entity recognition model (1st step of the pipeline);
3. efficiency estimation of the relation extraction model on the entities determined by the NER model (2nd step of the pipeline).
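A sketch of this split using scikit-learn (the list of texts is a stand-in for the 908 reviews; the fold logic is our illustration, not the authors’ code):

```python
from sklearn.model_selection import KFold

texts = [f"review {i}" for i in range(908)]  # stand-in for the 908 reviews

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kfold.split(texts)):
    train_texts = [texts[i] for i in train_idx]  # 80% for fine-tuning
    test_texts = [texts[i] for i in test_idx]    # 20% for testing
    # fine-tune the NER and RE models on train_texts and evaluate
    # the three experiments listed above on test_texts
```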
4.2 Evaluation Metrics

The accuracy of the model was evaluated on the testing part of each fold by the f1-score metric. For the NER task we use the f1-exact metric, in which an entity is considered correctly defined if both its class and its boundaries in the source text are correct. We use the evaluation script from conlleval2 to obtain the f1-exact value. In the case of overlapping, we represented all predicted entity types in separate files of CONLL format, and f1-exact was estimated for each of them. In the case of the RE task, for each positive (pos.) relationship class A from Table 2:

\[ f1\text{-}score = \frac{2 \cdot P \cdot R}{P + R}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \]

where P is “precision” – the share of correctly predicted objects of the considered class among the objects that the model assigned to that class; R is “recall” – the share of correctly predicted objects of the considered class among the real number of objects of that class; TP (true positive examples) is the number of objects of class A correctly identified by the model; FP (false positive examples) is the number of objects recognized by the RE model as class A but actually having a different class – when the RE task is considered on the entities predicted by the NER model, all erroneously predicted entities that form relations classified by the RE model as class A are also counted as false positives for class A; FN (false negative examples) is the number of relations of class A that were incorrectly recognized by the RE model as a different class – when the RE task is considered on the entities predicted by the NER model, all entities forming a relation of class A according to the gold annotation that were not correctly recognized by the NER model are also counted as false negatives for class A.

2 https://www.clips.uantwerpen.be/conll2000/chunking/output.html.
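In code, the per-class score with these pipeline corrections reduces to the sketch below (the counting of TP/FP/FN themselves follows the definitions above):

```python
def f1_for_class(tp, fp, fn):
    """Per-class f1-score. For the full pipeline, fp additionally counts
    relations built on erroneously predicted entities, and fn additionally
    counts gold relations whose entities the NER model missed."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# f1-macro over relation classes, as reported in Table 5:
# f1_macro = sum(f1_for_class(*c) for c in counts_per_class) / len(counts_per_class)
```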
5 Results
In this section we present the results of the evaluation of the models in the pipeline, as well as the final evaluation of its performance on the problem of extracting related named entities. The accuracies were obtained on the sample of 908 RDRS corpus texts containing multiple contexts. Table 3 presents the results of a comparative analysis of relation extraction methods obtained using entities with the reference markup from the RDRS corpus. For the most complete analysis of the model’s performance, we compared machine learning models of different complexity and type, as well as a classifier based on the probability distribution of positive and negative examples of the entity pairs in question.

Table 3. Accuracy of the models for the relation extraction task on the basis of named entities with gold markup

Model                                           | ADR & Drugname | Drugname & Diseasename | Drugname & SourceInfoDrug | Diseasename & Indication
RE-XLM-RoBERTa-sag                              | 88.2           | 84.2                   | 87.5                      | 61.9
RuBERT                                          | 82.1           | 78.2                   | 82.6                      | 56.7
Linear SVM                                      | 58.9           | 60.3                   | 61.6                      | 53.8
Multinomial Naive Bayes                         | 50.1           | 47.5                   | 43.9                      | 42.8
Dummy Classifier (stratified random generation) | 49.2           | 49.9                   | 51.4                      | 49.9
As the comparison shows, the proposed XLM-RoBERTa-sag model, fine-tuned to the task of classifying the existence of links between entities, achieves the best accuracies. Table 4 presents the accuracies of the first stage of the Pipeline’s work: identifying the entities involved in the formation of relations. In general, the obtained accuracy estimates for the NER problem are comparable to our previous work [8], where a comparative analysis and selection of the most effective approach to this problem were performed. The observed differences in accuracy are related to the different number of texts used for training and testing and, accordingly, to the examples of specific entities.
Table 4. Estimations of the NER task using fine-tuned XLM-RoBERTa-sag

Type of named entity | f1-exact
ADR                  | 54.7
Diseasename          | 87.2
Indication           | 63.6
Drugname             | 96.2
SourceInfodrug       | 64.0
Table 5 presents the accuracies of the second stage of the pipeline: determining the relationships between the entities identified in the first stage.

Table 5. Estimations of the relation extraction task on the basis of predicted named entities

Type of relation            | f1-score
ADR & Drugname              | 50.5
Drugname & Diseasename      | 78.0
Drugname & SourceInfoDrug   | 50.7
Diseasename & Indication    | 49.0
f1-macro                    | 55.9
The presented estimates show the performance of the algorithm; however, they are lower than the estimates of determining the existence of links between the entities of the benchmark markup. Thus, the main avenue for improving the overall accuracy is to improve the accuracy of the named entity recognition model. The following hyperparameters were set for the named entity recognition model: learning rate 2 · 10−5, 10 epochs, maximum input sequence length of 512, and batch size 8. For the relation extraction model, the maximum accuracy was achieved with the following parameters of the learning process: learning rate 1 · 10−5, maximum input sequence length of 512 tokens, 10 learning epochs, and batch size 8.
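For reference, the two training configurations collected in one place (the dictionary layout is our illustration; the values are those given above):

```python
NER_TRAIN_CONFIG = {
    "learning_rate": 2e-5,    # 0.00002
    "epochs": 10,
    "max_input_length": 512,
    "batch_size": 8,
}

RE_TRAIN_CONFIG = {
    "learning_rate": 1e-5,
    "epochs": 10,
    "max_input_length": 512,
    "batch_size": 8,
}
```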
6 Conclusion

As a result, a two-stage algorithm based on the XLM-RoBERTa language model was developed for the recognition of pharmaceutically significant named entities and the relationships between them. The algorithm was evaluated on a complex part of the Russian Drug Review Corpus that contains texts describing several cases of drug use. It is shown that the use of a language model pre-trained on a large set of
unlabeled reviews makes it possible to improve the accuracy of the extraction of related named entities. Further improvement of the algorithm can be achieved by increasing the accuracy of the named entity recognition step. This requires the development of more efficient models and the expansion of the labeled corpus of examples, which is the goal of our further work.

Acknowledgments. This work has been supported by the Russian Science Foundation grant No. 20-11-20246 and has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http://ckp.nrcki.ru/.
References

1. Conneau, A., et al.: Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116 (2019)
2. Gu, Y., et al.: Domain-specific language model pretraining for biomedical natural language processing. arXiv preprint arXiv:2007.15779 (2020)
3. Joshi, M., Chen, D., Liu, Y., Weld, D.S., Zettlemoyer, L., Levy, O.: SpanBERT: improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Linguist. 8, 64–77 (2020)
4. Peng, Y., Chen, Q., Lu, Z.: An empirical study of multi-task learning on BERT for biomedical text mining. arXiv preprint arXiv:2005.02799 (2020)
5. Rish, I., et al.: An empirical study of the naive Bayes classifier. In: Workshop on Empirical Methods in Artificial Intelligence, IJCAI 2001, vol. 3, pp. 41–46 (2001)
6. Sboev, A., Sboeva, S., Gryaznov, A., Evteeva, A., Rybka, R., Silin, M.: A neural network algorithm for extracting pharmacological information from Russian-language internet reviews on drugs. J. Phys. Conf. Ser. 1686, 012037 (2020)
7. Sboev, A., et al.: An analysis of full-size Russian complexly NER labelled corpus of internet user reviews on the drugs based on deep learning and language neural nets. arXiv preprint arXiv:2105.00059 (2021)
8. Sboev, A., et al.: An analysis of full-size Russian complexly NER labelled corpus of internet user reviews on the drugs based on deep learning and language neural nets (2021). http://arxiv.org/abs/2105.00059
9. Sboev, A., Selivanov, A., Rybka, R., Moloshnikov, I., Rylkov, G.: Evaluation of machine learning methods for relation extraction between drug adverse effects and medications in Russian texts of internet user reviews (2021)
10. Sboev, A., Selivanov, A., Rylkov, G., Rybka, R.: On the accuracy of different neural language model approaches to ADE extraction in natural language corpora. Procedia Comput. Sci. 190, 706–711 (2021)
11. Suykens, J.A., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)
12. Vaswani, A., et al.: Attention is all you need. arXiv preprint arXiv:1706.03762 (2017)
13. Zeng, D., Sun, C., Lin, L., Liu, B.: LSTM-CRF for drug-named entity recognition. Entropy 19(6), 283 (2017)
Causal Cognitive Architecture 2: A Solution to the Binding Problem

Howard Schneider(B)

Sheppard Clinic North, Richmond Hill, ON, Canada
[email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
V. V. Klimov and D. J. Kelley (Eds.): BICA 2021, SCI 1032, pp. 472–485, 2022. https://doi.org/10.1007/978-3-030-96993-6_52
Abstract. The binding problem is considered in terms of how the brain or another cognitive system can recognize multiple sensory features from an object which may be among many objects, process those features individually and then bind the multiple features to the object they belong to. The Causal Cognitive Architecture 2 (CCA2) builds upon its predecessor with the Navigation Module now consisting of an Object Segmentation Gateway Module allowing segmentation of a sensory scene, the core Navigation Module where the navigation maps are operated on, and the Causal Memory Module storing navigation maps the CCA2 has made in the course of its experiences. Objects within an input sensory scene are segmented, and sensory features (i.e., visual, auditory, etc.) of each segmented object are spatially mapped onto a navigation map in addition to a mapping of all objects on another navigation map. Keywords: Binding problem · Cognitive architecture · Spatial navigation · Artificial general intelligence
1 Introduction

1.1 The Binding Problem

Different sensory features, both between and within sensory systems, are often processed by different assemblies in the mammalian brain. For example, it is known that visual motion and color sensory inputs are processed in different brain regions with only sparse connections between them [1]. The “binding problem” considers how the brain can recognize and then essentially “bind” these separately processed pieces of data [2]. The binding problem has been considered key to explaining the unity of one’s experience of consciousness [3, 4], with an obvious extension to creating more general forms of artificial intelligence. However, in this paper, we consider only a narrow mechanistic aspect of the binding problem, i.e., what is a possible mechanism to segment objects and, in particular, to combine back sensory features so that a single object is recognized. A number of solutions to the binding problem have been described in the literature. Olshausen and colleagues [5] proposed that attention is focused on a particular region in a visual scene, for example, with particular features fed forward to higher visual processing areas. A longstanding proposed solution to the binding problem has been temporal
synchronization of the firing of neurons in different cortical areas with sensory features associated with a given object. For example, work by Engel and colleagues [6] showed that neuronal oscillations in the cat visual cortex at separate sites can transiently synchronize and were affected by features of the visual input data. Shadlen and Movshon [7] considered the experimental evidence, which was not fully supportive of this hypothesis. Merker [8] suggested that the synchronized gamma-range neuronal oscillations, thought to be important in binding sensory input features, may simply reflect activation of cortical areas and have little other function. Kahneman, Treisman and Gibbs [9] proposed an “object-file theory” in which different input features of an object are bound by their location: if in a visual scene there are three different objects, then three location tags are created to bind the various features. Goldfarb and Treisman [10] suggested that the ability of the brain to represent temporary object files as such gives rise to an arithmetic system in humans as well as in animals. As well, Goldfarb and Treisman suggested how the neural synchronization hypothesis above can work together with the object-file theory: a location tag for each object exists, but the features for a particular object are bound by neuronal oscillation synchronization of features with a particular location. Thus, the features of different objects are not erroneously bound to each other. Isbister and colleagues [11] described polychronous neuronal groups containing binding neurons that, with training, learn to bind features of objects between lower and higher visual cortical levels; as such, Isbister and colleagues believed this to be a solution to the binding problem in the primate visual system. However, at the time of this writing, a well-proven mechanism for the binding of input sensory features to the various real-world objects sensed by the brain remains unresolved.

1.2 The Neural-Symbolic Gap

Artificial neural networks (ANNs) can recognize patterns and perform reinforcement learning at a human-like proficiency [12, 13]. However, compared to a four-year-old child, in terms of logically and causally making sense of their environment or a problem at hand, especially if training examples are limited, they perform poorly [14, 15]. This is sometimes referred to as the neural-symbolic gap [16]. A number of cognitive architectures integrate subsymbolic and symbolic processing to varying degrees [17, 18]. A review of the field by Langley [19] noted that while early models were mainly symbolic, many of the modern architectures are more hybrid. Lake and colleagues [20] proposed that thinking machines should build causal models of the world and discussed the intuitive physics and psychology present in infants. Epstein [21] discussed cognitive and robotic modeling of spatial navigation. Hawkins and others [22, 23] described how abstract concepts can be represented in a spatial framework. Despite many of the above designs and implementations combining ANNs and symbolic elements, they do not approach the causal abilities seen in human children. The Causal Cognitive Architecture 1 (CCA1) and its predecessors combined connectionist and symbolic elements in a biologically plausible manner in which sensory input vectors were processed causally [24–27].
A collection of intuitive and learned logic, physics, psychology and goal-planning procedural vectors (essentially acting as small algorithms) is applied against inputs, and intermediate causal results can be fed back to
the sensory input stages and processed over and over again. An overview of the CCA1 is shown in Fig. 1. As will be discussed below, in toy examples the CCA1 seemed to narrow the neural-symbolic gap, i.e., it offered connectionist pattern recognition along with the ability to generate causal behavior, i.e., to take actions based on exploration of the possible cause and effect of those actions.

1.3 The Binding Problem and the Causal Cognitive Architecture

An issue relevant to both cognitive science and artificial intelligence is the binding problem—how can the brain or another cognitive system recognize multiple sensory features from an object which may be among many objects, process those features individually and then bind the multiple features to the object they belong to? In work to enhance the abilities of the CCA1, particularly with regard to the neural-symbolic gap, it is shown below how removing the Sensory Vectors Binding Module (Fig. 1) of the CCA1 led to the development of the CCA2 and improved its ability to bind sensory input features to an object.
2 Functioning of the Causal Cognitive Architecture (CCA1)

2.1 Sensory Processing in the CCA1

We start with an overview of the CCA1 before moving on to the CCA2. A summary of the architecture of the CCA1 is shown in Fig. 1. Sensory Inputs 1..n from different sensory systems 1..n propagate to the Input Sensory Vectors Association Modules 1..n, with a module dedicated to each sensory system. Each such module contains a conventional neural network [12], a hierarchy of Hopfield-like Network units (HLNs) [24], or another similar mechanism that can robustly associate an input sensory vector with other vectors (previously learned ones, instinctive pre-programmed ones, as well as other recent sensory input vectors) within the CCA1. Further binding of the processed input sensory vectors, via straightforward temporal mechanisms or more complex global feedback mechanisms, occurs within the Sensory Vectors Binding Module.

2.2 Pre-Causal Cognitive Processing and Output

Cognition in the CCA1 is fundamentally movement-based. At the simplest level, the CCA1’s embodiment navigates through physical space, although at higher cognitive levels navigation occurs through a space of concepts and analogies. This is discussed more in Schneider [27]. In the CCA1 the Navigation Module holds a current navigation map of a small part of the inferred physical world. Represented on this modest map are objects from the Sensory Vectors Binding Module, an object representing the CCA1 embodiment itself, and possible objects from the Instinctive Primitives Module and the Learned Primitives Module.
The Instinctive Primitives Module and the Learned Primitives Module are triggered by processed vectors from the Input Sensory Vectors Association Modules and the Sensory Vectors Binding Module, as well as by the Goal/Emotion Module and the Autonomic Module. The Instinctive Primitives Module and the Learned Primitives Module can manipulate the representations of the objects in the current navigation map and produce an output signal. This output signal from the Navigation Module causes the Output Vector Association Module to produce the desired movement of the embodiment of the CCA1.

2.3 Causal Cognitive Processing and Output

In pre-causal operation of the CCA1, associations are made in the Input Sensory Vectors Association Modules and the other modules described above. The Navigation Module effectively allows pre-causal processing, since objects and rules are applied by the Instinctive and Learned Primitives Modules onto the current navigation map. Then the Navigation Module makes a navigation decision which becomes the output of the CCA1. In causal operation, a more significant feedback pathway from the Navigation Module to the Input Sensory Vectors Association Modules allows the intermediate results of a problem or a causal situation to be fed back to the sensory input stages and processed again in the next processing cycle. This is described in more detail in Schneider [27].
Fig. 1. Causal cognitive architecture 1 (CCA1) (not all connections shown. D – internal developmental timer)
Essentially, intermediate results can be fed back to the sensory input stages and then processed by the Navigation Module, repeated as needed. As shown in the simulation example below, causal processing of the inputs often results in this manner. The navigation maps are stored in the Causal Memory Module. When similar events occur again, the most relevant navigation map(s) are activated and recalled into the Navigation Module, and thus provide the CCA1 with an instant possible model of the world and the actions it took previously.

2.4 Pre-Causal Mode CCA1 Simulation Example

An embodiment of the CCA1 plus the CCA1 itself (both together informally referred to as “the CCA1”) must enter a simulated grid-world forest, and find and rescue a lost hiker. Figure 2 shows the starting position of the CCA1 in this grid world. Consider a simulation example. After a number of moves, the CCA1 ends up in the square “forest” just north of the square “waterfall.” (Note that “waterfall” is labeled in the map in Fig. 2 for the convenience of the reader. The CCA1 does not know this square is a waterfall, nor does it have access to this map’s information—it must try to build up its own internal map.) In the next few processing cycles the CCA1 recognizes a lake to the west (which the Instinctive Primitives Module signals to avoid moving to) and, to the south, a shallow river with fast-flowing noisy water (the cliff part of the waterfall is not visible). A shallow river does not trigger any avoidance signals in the Instinctive Primitives Module, and the CCA1 moves south to the square labeled “waterfall” in Fig. 2. It is swept by the fast-moving river over the waterfall’s cliff and is damaged. Associative learning does occur: the next time the CCA1 recognizes a fast-flowing river with much noise, this will trigger in the Goal/Emotion Module and the Learned Primitives Module a signal to the Navigation Module not to navigate to this square.

2.5 Causal Mode CCA1 Simulation Example

A brand new CCA1 starts off in a new simulation, again as shown in Fig. 2, and after a number of moves it happens to navigate again to the square “forest” just north of the square which in Fig. 2 (intended only for the reader) is labelled “waterfall.” The CCA1 recognizes to the south a shallow river with fast-flowing noisy water (the cliff part of the waterfall is not visible). This new CCA1 has never seen a waterfall before. However, {“water”} + {“fast flow” + “noise”} triggers in the Instinctive Primitives Module {“water”} + {“push”}, which is sent to the Navigation Module. The Navigation Module is unable to further process the vector representing {“water” + “push”}. Thus, it feeds it back to the Input Sensory Vectors Association Module. In the next processing cycle, the Input Sensory Vectors Association Module ignores the external sensory inputs and instead forwards the intermediate result {“water” + “push”} as if it were the new sensory input. {“water” + “push”} triggers in the Instinctive Primitives Module a vector propagated to the Navigation Module which causes the Navigation Module to create a new current navigation map. On this new map is an object representing the CCA1 and objects
representing water on much of the map. The Navigation Module feeds {“CCA1 under water”} back to the Input Sensory Vectors Association Module. In the next processing cycle, the Input Sensory Vectors Association Module ignores the external sensory inputs but forwards the intermediate result {“CCA1 under water”} as the sensory input. This triggers in the Instinctive Primitives Module a vector representing “do not go”, which is propagated to the Navigation Module. This triggers the Navigation Module to load the previous current navigation map of the forest, and “do not go” is applied to the square to the south, i.e., the “waterfall.” The CCA1 recognizes the square to the east as forest and moves there instead. Note that all these processing cycles were triggered one from another, with no central controlling stored program other than the basic architecture of the CCA1.
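The chain of processing cycles in this example can be summarized by the toy sketch below (purely illustrative Python; the trigger tables and function names are ours, and the real architecture has no such central program, only the data flow being summarized):

```python
def instinctive_primitives(features):
    # Toy triggers mirroring the waterfall example in the text.
    if features == {"water", "fast flow", "noise"}:
        return {"water", "push"}
    if features == {"water", "push"}:
        return {"CCA1 under water"}
    if features == {"CCA1 under water"}:
        return {"do not go"}
    return features

def navigation_module(primitives):
    # Returns (decision, intermediate result); an unresolved intermediate
    # result is fed back as the next cycle's pseudo-sensory input.
    if "do not go" in primitives:
        return "mark square to the south; move east instead", None
    return None, primitives

features = {"water", "fast flow", "noise"}  # fast shallow river to the south
decision = None
while decision is None:
    decision, features = navigation_module(instinctive_primitives(features))
print(decision)
```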
[Figure 2 shows a 6 × 6 grid whose border squares are labeled EDGE; the interior squares are forest apart from the CCA1’s starting square (marked *), a lake, a shallow river, the waterfall square, a forest square marked **, and the HIKER’s square.]

Fig. 2. Birds-eye view of the starting positions of the CCA1 and the lost hiker (Note: For the reader. The CCA1 does not possess this information but must construct its own internal current navigation map).
3 The Evolution of the CCA2 from the CCA1

3.1 Problems Arising in Attempts to Enhance the CCA1

The Causal Cognitive Architecture 1 (CCA1) appears to be able to narrow the neural-symbolic gap. However, the examples described above are toy problems—very simplified problems without the complexity found in real-world examples. Thus, further development work began on the CCA2, which was to be a more robust version of the CCA1.
The CCA2, like the CCA1, incorporated several sensory input systems, and received visual, auditory, and olfactory features of objects and the environments in the direction in front of itself. An issue which arose in attempting to create a more robust system, i.e., the CCA2, is that the Sensory Vectors Binding Module must output some vector which represents the object/environment it has detected by fusing the sensory features together and then using neural network-like pattern recognition to identify the objects and the sensory scene. (We will use the term “sensory scene”, or just “scene” for short, to refer to the sensory stimuli being presented to the CCA1 or CCA2—visual, auditory, olfactory, etc.) The output vector from the Sensory Vectors Binding Module then goes to the Navigation Module, the Instinctive and Learned Primitives Modules, and several other modules, as shown in Fig. 1. Once examples become even modestly larger than the toy problems of the lost hiker in the simulated forest (or the three gears in Schneider [27]), it becomes extremely difficult for the other modules to understand what it is exactly that the Sensory Vectors Binding Module is recognizing. Is a sensory scene with a few trees, some large rocks, the sound of fast-flowing water and an algae-like water odor identified the same as a scene with no trees at all but containing rocks, the fast-flowing water sound, and the algae-like water odor, or the same as a scene with no trees but containing some large rocks and sounds of fast-flowing water, but with the distant smell of pine needles? In the toy examples above we can use a convolutional neural network (CNN) or equivalent to crudely classify the sights, sounds, and smells into a forest square, an edge square or a waterfall/fast-river square. However, once we allow even a slightly larger combination of features to be classified and reported to the Navigation Module, the conundrum arises of how to label the different combinations detected by the Sensory Vectors Binding Module. Even if we devise a training scheme to produce a variety of different labels for different combinations of sensory input features in the Sensory Vectors Binding Module, how do the Navigation Module, the Instinctive Primitives Module, the Learned Primitives Module and so on (Fig. 1) process particular labels, i.e., the output vectors from the Sensory Vectors Binding Module? There is also the issue of how the system handles the addition or loss of a sensory system. For example, if vision inputs are occluded and the CCA1 only receives auditory and olfactory inputs for a scene, how does the Sensory Vectors Binding Module use this reduced information to make a decision?

3.2 Elimination of the Sensory Vectors Binding Module and the Emergence of the CCA2

Over the last decade many techniques have been developed to avoid brittleness in deep learning networks, i.e., the inability to recognize even slightly out-of-distribution sensory representations of an object [12]. For example, Chidester and colleagues [28] discuss using CNNs to automatically analyze high volumes of microscope slide images. A problem is that different microscopic images of the same classification category may be somewhat rotated relative to each other. If a large variety of rotated versions of the microscopic images are available for training, then the CNN can become somewhat rotation-invariant.
However, Chidester and colleagues note that biological data is often limited or costly to obtain, and thus they use various schemes (e.g., group convolution, conic convolution)
to obtain rotation-invariance, so that their deep learning network will more accurately classify microscopic images that may have different rotations. Applying various regularization techniques to reduce generalization errors within a particular sensory system of the CCA1 is advantageous. However, these techniques do not adequately deal with the issues which arise when, for example, a complete sensory system or several sensory system inputs are not available. Although various engineering workarounds were considered to allow the Sensory Vectors Binding Module to better handle varying levels of sensor fusion, the conundrum raised above still remains—how to label the different combinations detected by the Sensory Vectors Binding Module and feed this signal to the Navigation Module and other modules. If, through various sensor fusion techniques, the Sensory Vectors Binding Module classifies the input object and thus produces a symbolic output (e.g., forest or edge or fast-moving river, etc.), this can be fed to the Navigation and other modules (and in fact this is what occurs in the CCA1 toy models). However, the problem is that if there are a number of possible sensory systems, each with a number of possible sensory features, and a number of possible objects in the scene to be analyzed, there is an explosion of possible combinations, making the processing of a particular symbol for each possible combination unwieldy. The decision was thus made in the CCA2 to eliminate the Sensory Vectors Binding Module. Binding of the various sensory inputs thus takes place directly in the Navigation Module. The Input Sensory Vectors Association Modules still remain in the CCA2 for the various sensory modalities, although they function largely to pre-process sensory input signals, providing some classifications to help with segmentation, which will be discussed in the next section, and to help trigger more rapidly various Learned and Instinctive Primitives. Binding of the input sensory features no longer occurs in a “Binding Module” but in the Navigation Module.
4 Mechanism of Sensory Binding in the CCA2

4.1 Multiple Objects in a Scene

An overview of the architecture of the CCA2 is shown in Fig. 3. While it is similar to the CCA1, the Sensory Vectors Binding Module is no longer present. As well, the Navigation Module now has an Object Segmentation Gateway Module leading into it, and the Causal Memory Module is now more of an integral part of the Navigation Module. The function of the Object Segmentation Gateway Module is to segment a sensory scene into objects of interest. (As noted above, we use the term “sensory scene” to refer to the sensory stimuli being presented to the CCA2—visual, auditory, olfactory, etc.) The individual objects segmented in the sensory scene, as well as the entire scene itself treated as one composite object, then trigger navigation maps in the Causal Memory Module to be retrieved and moved to the Navigation Module. A limited generative process is used in combination with an assembly of recognition units (e.g., a CNN or other neural network). If a particular segmentation of the sensory scene better matches one stored navigation map (for example, a particular river) than another stored map, then that particular navigation map in the Causal Memory Module will be transferred to the Navigation Module. (Actually, data is not transferred as in a typical computer architecture; instead, that navigation map is activated.)
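As a purely illustrative sketch (the CCA2 is not specified at code level in this paper), the matching of a segmented scene against stored navigation maps could be modeled as a feature-overlap search:

```python
def retrieve_navigation_map(scene_features, causal_memory):
    """Return (activate) the stored navigation map whose feature set best
    matches the segmented sensory scene; Jaccard overlap stands in here
    for the recognition units described in the text."""
    def overlap(map_features):
        union = scene_features | map_features
        return len(scene_features & map_features) / len(union) if union else 0.0
    return max(causal_memory, key=lambda m: overlap(m["features"]))

causal_memory = [
    {"name": "river map", "features": {"water", "bubbling sound", "algae odor"}},
    {"name": "forest map", "features": {"trees", "pine odor", "birdsong"}},
]
scene = {"water", "bubbling sound"}
print(retrieve_navigation_map(scene, causal_memory)["name"])  # -> river map
```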
4.2 Binding Sensory Features of a Sensory Scene to an Object

For simplicity, and to better explain binding in the Navigation Module, we will assume there is a single object (or a single collection of objects representing a composite object) in the sensory scene. The Object Segmentation Gateway Module (Fig. 3) propagates the sensory features of this object to the Navigation Module. For example, as shown in Fig. 4, the CCA2, looking for the lost hiker in the simulated forest grid world, is presented with a sensory scene of what we call a “river.” However, to the CCA2 there are, emerging out of the Input Sensory Vectors Association Modules, olfactory, auditory and visual sensory features. These sensory features propagate to the Object Segmentation Gateway Module. As noted above, for simplicity we consider a single object here. These sensory features are then propagated to the Navigation Module. (Actually, they are already within the Navigation Module, since in the CCA2 the Object Segmentation Gateway Module, the Navigation Module and the Causal Memory Module are all tightly integrated. It is often not necessary to transfer data from one part of the module to another, but simply to activate data structures of interest.)
Fig. 3. Causal cognitive architecture 2 (CCA2) (not all connections shown. D - Internal developmental timer)
An existing navigation map that has sensory features similar to those of the sensory scene is recalled from the Causal Memory Module, e.g., a map of a previous “river” that the CCA2 had seen. (Actually, a number of navigation maps are recalled and operated on in parallel, allowing a generative aspect to finding the most suitable navigation map to use, but for simplicity in the discussion here we consider a single navigation map.) The sensory features of the sensory scene that are being propagated to the Navigation Module are now mapped onto the recalled navigation map, or a copy of it, thus changing it into a newer navigation map. Sensory features are spatially bound onto the navigation map. As shown in Fig. 4, lines demarcating water from land are stored onto the navigation map. It is not an artistic reproduction of the scene but rather a representation of how the CCA2 senses it. Bubbling sounds are mapped onto two places on this navigation map. An algae-like smell is also mapped onto a location on this navigation map. The CCA2 uses navigation maps as its main storage of data. Note that the binding of separate streams of sensory data occurs automatically, in a spatially oriented fashion, onto a navigation map.
Fig. 4. Different sensory features are bound onto a navigation map representing the object, in this case a river
All the words and shapes on the map in Fig. 4 are the lowest-level sensory primitives for the CCA2. “Water”, “bubbling sound”, “algae odor”, and the lines of the river are sensory primitives. While, of course, water, for example, is built from even lower-level visual sensory features, it is considered a recognizable sensory feature and treated in the CCA2 as a lowest-level sensory primitive. Not shown in Fig. 4, and beyond the scope of this paper’s subject, are links between navigation maps. However, the operations on the navigation maps by the procedural vectors from the Instinctive and Learned Primitives Modules and other modules, and the connections between navigation maps, are discussed in Schneider [27]. Figure 4 does not classify the sensory scene as a river, but there can be links from Fig. 4 (and other navigation maps similar to it) to other navigation maps holding language words such as “river.”
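A toy data-structure sketch of this spatial binding follows (the coordinates, feature names and the class itself are illustrative, not part of the published architecture):

```python
from collections import defaultdict

class NavigationMap:
    """Features arriving from separate sensory streams are bound simply
    by being written to the same (x, y) cell of one common map."""
    def __init__(self):
        self.cells = defaultdict(list)

    def bind(self, x, y, modality, feature):
        self.cells[(x, y)].append((modality, feature))

river_map = NavigationMap()
river_map.bind(2, 3, "visual", "line demarcating water from land")
river_map.bind(2, 3, "auditory", "bubbling sound")
river_map.bind(4, 1, "auditory", "bubbling sound")
river_map.bind(3, 2, "olfactory", "algae odor")
# Everything written to cell (2, 3) is now bound by location:
print(river_map.cells[(2, 3)])
```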
While in Fig. 4 the features on the map are the lowest-level sensory primitives, it is possible, and indeed an important element of advantageous cognitive behavior, to hold higher-level complex objects and properties on a navigation map, which link to other navigation maps and in turn to yet other navigation maps. For example, a tiny crude image of sorts of the lost hiker (not shown in Fig. 4, but it would exist in another square of the simulated grid-world forest) could be part of a navigation map representing the CCA2’s internal map of the forest. However, this tiny crude image of the lost hiker would link to other navigation map(s) that contain more information about the lost hiker. The Causal Memory Module of the CCA2 contains navigation maps, all linked to a variety of other navigation maps.

4.3 A Solution to the Binding Problem

As shown in Fig. 4, it does not really matter that sensory features are processed in distinct streams, since they are all mapped onto a common navigation map. The various visual features—edges, colors, and so on—are mapped onto the map. The auditory features are also mapped onto the map, as best as the location of the sounds can be determined. The olfactory features are mapped onto the map. As shown in Fig. 4, the purpose of this navigation map is not to classify the river as a “river” but to spatially map the sensory features onto a navigation map, which is the common data structure used by the CCA2. The binding problem was described above as how the brain or another cognitive system can recognize multiple sensory features from an object which may be among many objects, process those features individually and then bind the multiple features to the object they belong to. The Object Segmentation Gateway Module described in the section above allows segmenting a number of objects in a sensory scene. Sensory features for each object are then spatially mapped onto a navigation map. Although not shown in Fig. 4, since for simplicity we only have one object (i.e., the river), scenes containing multiple objects are also mapped onto another navigation map (i.e., providing a composite map of the whole scene) with links to the objects’ individual navigation maps. The navigation map-based structure of the architecture requires data to be stored in, and operated on in, the form of navigation maps. Binding operations are implicit in the operation of the CCA2. Note that if one sensory system is unable to be used (for example, the visual sensors are blocked) and the other available sensory systems are reasonably functional, the same navigation maps can end up being used—the CCA2 still has almost the same representation of the world and can take similar actions on this navigation map-based representation of the world. As discussed in Schneider [27], the causal cognitive architecture easily allows the emergence of pre-causal behavior for straightforward operation. It easily allows the emergence of full causal behavior, as well as the possibility of psychosis (both occurring in humans but weakly or rarely in other mammals), when the intermediate results from the Navigation Module are fed back to the sensory modules and operated on again in the next processing cycle. It easily allows the emergence of analogies. In this paper, we discuss how, with modest modifications of the CCA1 into the CCA2, the causal cognitive architecture readily handles the binding of sensory features, even if these features are processed separately. The causal cognitive architecture is brain-inspired.
Given that a navigation map-based data structure is required for its operation, it is hypothesized that
a similar functional requirement applies to mammalian, including human, brains, and that this map-based structure allows the brain to bind sensory features from an object, even if the sensory features have been processed separately.
5 Discussion

There are theoretically a myriad of different mechanisms which could produce a general intelligence (i.e., artificial general intelligence in the case of machines), but at the time of this writing there is only one which exists in practice—the human brain. Hence, it was worthwhile to consider the binding problem, an aspect of the brain which continues to intrigue neuroscientists, cognitive scientists and neurophilosophers. The CCA1 and its predecessors [24, 26, 27] were originally designed to consider other questions of animal and human brain function: Why do humans but not other animals demonstrate robust causality? Why do humans but not other animals demonstrate psychosis with any significant frequency? Why does navigation appear to be a basic ability of most animals, and in mammals a basic function of the hippocampal-entorhinal system? In this paper we consider another question of animal and human brain function: What is a solution to the binding problem? While the CCA2 may only loosely (or even incorrectly) model the biological binding process, by attempting to answer this question the CCA2 is able to surpass the CCA1 in processing sensory scenes involving multiple objects with multiple sensory features across multiple different sensory systems. As noted above, in this paper the binding problem is considered in terms of how the brain or another cognitive system can recognize multiple sensory features from an object which may be among many objects, process those features individually and then bind the multiple features to the object they belong to. In order to do so, the Sensory Vectors Binding Module of the CCA1 is removed, and binding instead occurs within the Navigation Module. In the CCA2 the greater Navigation Module now consists of an Object Segmentation Gateway Module allowing segmentation of a sensory scene, the Navigation Module proper where the navigation maps are operated on, and the Causal Memory Module storing the navigation maps the CCA2 has made in the course of its experiences.
References

1. Bartels, A., Zeki, S.: The temporal order of binding visual attributes. Vision Res. 46(14), 2280–2286 (2006). https://doi.org/10.1016/j.visres.2005.11.017
2. Herzog, M.: Binding problem. In: Binder, M.D., Hirokawa, N., Windhorst, U. (eds.) Encyclopedia of Neuroscience. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-29678-2_626
3. Revonsuo, A.: Binding and the phenomenal unity of consciousness. Conscious. Cogn. 8(2), 173–185 (1999). https://doi.org/10.1006/ccog.1999.0384
4. Feldman, J.: The neural binding problem(s). Cogn. Neurodyn. 7(1), 1–11 (2013). https://doi.org/10.1007/s11571-012-9219-8
5. Olshausen, B.A., Anderson, C.H., Van Essen, D.C.: A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J. Neurosci. 13(11), 4700–4719 (1993). https://doi.org/10.1523/JNEUROSCI.13-11-04700.1993
6. Engel, A.K., König, P., Gray, C.M., Singer, W.: Stimulus-dependent neuronal oscillations in cat visual cortex. Eur. J. Neurosci. 2, 588–606 (1990). https://doi.org/10.1111/j.1460-9568.1990.tb00449.x
7. Shadlen, M.N., Movshon, J.A.: Synchrony unbound: a critical evaluation of the temporal binding hypothesis. Neuron 24(1), 67–77 (1999). https://doi.org/10.1016/S0896-6273(00)80822-3
8. Merker, B.: Cortical gamma oscillations: the functional key is activation, not cognition. Neurosci. Biobehav. Rev. 37(3), 401–417 (2013)
9. Kahneman, D., Treisman, A., Gibbs, B.J.: The reviewing of object files: object-specific integration of information. Cogn. Psychol. 24(2), 175–219 (1992). https://doi.org/10.1016/0010-0285(92)90007-o
10. Goldfarb, L., Treisman, A.: Counting multidimensional objects: implications for the neural-synchrony theory. Psychol. Sci. 24(3), 266–271 (2013). https://doi.org/10.1177/0956797612459761
11. Isbister, J.B., Eguchi, A., Ahmad, N., Galeazzi, J.M., Buckley, M.J., Stringer, S.: A new approach to solving the feature-binding problem in primate vision. Interface Focus 8(4), 20180021 (2018). https://doi.org/10.1098/rsfs.2018.0021
12. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
13. Mnih, V., Kavukcuoglu, K., Silver, D., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015). https://doi.org/10.1038/nature14236
14. Ullman, S.: Using neuroscience to develop artificial intelligence. Science 363(6428), 692–693 (2019). https://doi.org/10.1126/science.aau6595
15. Waismeyer, A., Meltzoff, A.N., Gopnik, A.: Causal learning from probabilistic events in 24-month-olds. Dev. Sci. 18(1), 175–182 (2015). https://doi.org/10.1111/desc.12208
16. Besold, T.R., et al.: Neural-symbolic learning and reasoning: a survey and interpretation. arXiv preprint arXiv:1711.03902 (2017)
17. Anderson, J.R., Bothell, D., Byrne, M.D., et al.: An integrated theory of mind. Psychol. Rev. 111(4), 1036–1060 (2004). https://doi.org/10.1037/0033-295X.111.4.1036
18. Rosenbloom, P., Demski, A., Ustun, V.: The Sigma cognitive architecture and system. J. Artif. General Intell. 7(1), 1–103 (2016). https://doi.org/10.1515/jagi-2016-0001
19. Langley, P.: Progress and challenges in research on cognitive architectures. In: Proceedings of AAAI-17 (2017)
20. Lake, B.M., Ullman, T.D., Tenenbaum, J.B., Gershman, S.J.: Building machines that learn and think like people. Behav. Brain Sci. 40, E253 (2017)
21. Epstein, S.L.: Navigation, cognitive spatial models, and the mind. In: AAAI 2017 Fall Symposium: Technical Report FS-17-05 (2017)
22. Hawkins, J., Lewis, M., Klukas, M., et al.: A framework for intelligence and cortical function based on grid cells in the neocortex. Front. Neural Circ. 12, 121 (2019). https://doi.org/10.3389/fncir.2018.00121
23. Schafer, M., Schiller, D.: Navigating social space. Neuron 100(2), 476–489 (2018). https://doi.org/10.1016/j.neuron.2018.10.006
24. Schneider, H.: Meaningful-based cognitive architecture. Procedia Comput. Sci. 145, 471–480 (2018). https://doi.org/10.1016/j.procs.2018.11.109
25. Schneider, H.: Subsymbolic versus symbolic data flow in the meaningful-based cognitive architecture. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 465–474. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_61
26. Schneider, H.: The meaningful-based cognitive architecture model of schizophrenia. Cogn. Syst. Res. 59, 73–90 (2020). https://doi.org/10.1016/j.cogsys.2019.09.019
Causal Cognitive Architecture 2
485
27. Schneider, H.: Causal cognitive architecture 1: integration of connectionist elements into a navigation-based framework. Cogn. Syst. Res. 66, 67–81 (2021). https://doi.org/10.1016/j. cogsys.2020.10.021 28. Chidester, B., Zhou, T., Do, M.N., Ma, J.: Rotation equivariant and invariant neural networks for microscopy image analysis. Bioinformatics 35(14), i530–i537 (2019). https://doi.org/10. 1093/bioinformatics/btz353
Fundamentals of a Multi-agent Planning and Logistics Model for Managing the Online Industry's Reproduction Cycle

Vasily V. Shpak(B)

Institute of Microdevices and Control Systems, Moscow, Russia
[email protected]
Abstract. The transition of the economy to the "Industry 5.0" paradigm and the Sixth Technological Mode will be far more effective if the management models of the real sector stay ahead of technological, technical, organizational and other processes. The article discusses the structure of a multi-agent system for managing the electronics industry and proposes a list of requirements whose fulfillment would allow the system to be used for promptly making optimal management decisions. The article emphasizes that the work of scientists and specialists in the electronics industry, in the context of the widespread introduction of digital technologies into the economy, must comply with Ashby's law of requisite variety: the national control subsystem of the industry must have a margin of complexity relative to the complexity of the controlled actors and processes. The creation of a multi-agent platform will ensure that this law of management is respected in the industry. A multi-agent platform for the coordination, planning and logistics conjugation of adjacent chains of electronics manufacturers and consumers may be in demand in the very near future, which makes this article timely.

Keywords: Electronics · Radio electronics · Management · Multi-agent technologies · Smart Industry
1 Introduction

The productive forces of the modern economy are developing rapidly under the powerful influence of digital technologies and on the basis of qualitatively improved hardware systems capable of performing an ever-growing number of procedures, calculations and operations ever faster. In describing these processes, in place of the common but, from a scientific point of view, not entirely correct term "artificial intelligence", it is necessary to apply the concept of "distributed intelligent systems" for the coordination, planning and logistical support of coordinated, optimal decisions on managing the resources of the industry (and of the economy as a whole) based on the principles of self-organization and evolution. In what follows, for brevity, we will use the term "multi-agent approach (technology, platform)".
The available practical solutions in the field of multi-agent digital technologies clearly show that they are of universal importance for a wide range of management tasks across diversified clusters developing and producing complex products [1]. Domestic developments in this field are not inferior to world achievements and can become a serious asset for the leadership, enterprises and organizations of the industry in implementing the ambitious plans of the "Strategy for the Development of the Electronic Industry of the Russian Federation for the Period up to 2030" (hereinafter the "Strategy") [2]. Among the leading domestic developers of multi-agent platforms are the St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), the St. Petersburg State Electrotechnical University "LETI", and Skolkovo.
2 The Strategy for Managing the Development of the Electronic Industry and Instruments for Its Implementation

The distinctive properties of multi-agent technology include at least the following advantages:

– application of the mathematical apparatus and software developed for controlling complex adaptive systems, including such branches of applied mathematics as fractal mathematics and the mathematics of strange attractors, models and methods of collective decision-making in network structures, etc. (hereinafter, in treating the electronics industry as a complex system, we proceed from the understanding that a complex system, in addition to a significant number of heterogeneous structures, has the properties of synergy, recursiveness and autopoiesis) [3];
– implementation of dynamic, continuously adjusted optimization of the available limited resources (primarily natural ones) to maintain a complex system in a state of "stable equilibrium in a nonequilibrium super-system" (self-creation), which is fundamentally impossible in the paradigm of classical manual control;
– translation of a discrete approach to planning (in stages, from one agreed decision to the next, etc.) into real-time dynamic optimization based on the principles of self-organization and evolution of both the system itself and a control system that accumulates experience (recursiveness);
– development, implementation and debugging in management practice of a new architecture and model of a network-centric system (a system of systems), starting from the macro level, covering the meso level and extending to the micro level (fractal mathematics);
– combination of fundamental research into the phenomena of complexity and systems theory discovered by Russian scientists [4], which makes it possible to form approaches to managing the quality and efficiency of the coordination and logistical conjugation of goals, tasks and resources, with an applied procedure for forming an adequate hardware and software control complex that has no analogues in the world [5].
Currently, all fundamental and exploratory research on a fundamentally new multi-agent platform, which will realize the principle of the fractal "matryoshka" (fractally nested system structures) in creating intelligent systems for real-time resource management in the industry, is in its final stage [6]. The most important elements of the new management system have been worked out in individual projects [7]. In the near future, a management system for the electronics industry should be built that takes into account the world's most advanced Russian developments. Such distributed joint resolution of complex coordination and logistics issues, for example, synchronizing the supply of components and complete units for final production, will significantly reduce lead times in an environment where no enterprise on the market can independently fulfill a production order in full and on time. Managers, freed from routine iterative meetings and negotiations, will participate in this continuous process mainly in critical and conflict situations that go beyond the algorithms used; such situations will be handled by self-learning agents, which in the future will be able to further free the optimization process from the subjective interference of people. A holistic family of "agents" operating within a single multi-agent platform will implement the "matryoshka" principle in management. Bringing such a platform online and launching its self-learning program will provide an adaptive, optimized distribution of national economic and industrial resources in kind and in real time. The efficiency and optimization criteria laid down in the "Strategy" will be achieved by combining the principle of bottom-up planning with dynamic feedback from final customers. Multi-agent technologies make it possible not only to combine, in kind, the chains of interaction of enterprises from R&D to the production of final products, but also to coordinate the fair interests of all participants in production chains on the principles of a solidary economy. Compensation of costs and revenues, in exchange for mutual concessions aimed at optimizing the final result, will be distributed from the resulting total profit of each project as a whole. This will make it possible to move from competition for profit among vertically associated co-executors to mutually beneficial and financially transparent cooperation of all enterprises and organizations in the production chains [8]. Given the rapid technological progress in electronics, the most important issue is the passage of innovative impulses along the entire chain from R&D to final consumer products, and of the signals of effective demand in the opposite direction [9]. The multi-agent platform and digital eco-system should be integrated with the stages of R&D [10], with the computer-aided design (CAD) systems used in electronics, and with BigData technology for monitoring and analyzing market priorities, and should also be open for connecting any other services, performers, startups, etc.
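To make the negotiation idea concrete, the following Python sketch shows a toy contract-net style exchange: a consumer agent requests components, supplier agents bid, and the order goes to the bid that best meets the deadline and then the cost. This is a minimal sketch under stated assumptions; the class names, suppliers and figures are illustrative inventions, not part of the platform described above.

```python
"""Toy contract-net round: a consumer agent requests components and
supplier agents bid; the consumer awards the order to the bid that
minimizes lateness, then cost. All names and figures are illustrative."""
from dataclasses import dataclass

@dataclass
class Bid:
    supplier: str
    ready_day: int   # day the components can be delivered
    cost: float

class SupplierAgent:
    def __init__(self, name, backlog_days, unit_cost):
        self.name, self.backlog_days, self.unit_cost = name, backlog_days, unit_cost

    def make_bid(self, quantity, day_needed):
        # A real agent would consult its local schedule; here the bid is
        # derived from a fixed backlog and a fixed unit cost.
        return Bid(self.name, self.backlog_days, quantity * self.unit_cost)

def award(suppliers, quantity, day_needed):
    bids = [s.make_bid(quantity, day_needed) for s in suppliers]
    # Prefer bids that meet the deadline; break ties by cost.
    return min(bids, key=lambda b: (max(0, b.ready_day - day_needed), b.cost))

if __name__ == "__main__":
    pool = [SupplierAgent("fab-A", backlog_days=12, unit_cost=3.0),
            SupplierAgent("fab-B", backlog_days=5, unit_cost=3.4)]
    print(award(pool, quantity=1000, day_needed=7))  # fab-B wins: it meets the deadline
```

In a full platform, the same award loop would run continuously and re-open negotiations whenever a disturbing event invalidates the current plan.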
This approach will meet the principles of the emerging Industry 5.0. In contrast to Industry 4.0, which addresses industrial automation and the integration of industrial control systems with information-analytical and accounting systems, the focus of Industry 5.0 is the digitalization of knowledge and the creation of "colonies" of multi-agent systems that are able, within digital eco-systems, to simulate the processes of developing and making resource-management decisions in real time, and to interact both with each other and with users to agree on decisions [11].
It is fundamentally important that the developed digital eco-system not become a "product for the sake of production". It should open up opportunities for a strategic breakthrough towards "Society 5.0" and the creation of a qualitatively new model of economic management at the top, middle and lower levels. Specialists in the field of management have no doubt that the classical organization of the management of enterprises, their associations, industries and the economy as a whole, with its bureaucracy, rigid framework of job descriptions and discretely changed business processes, has exhausted itself. Only a modern management system methodologically based on adhocracy can answer these challenges. (The term "adhocracy" was introduced into scientific circulation in 1968 by the American psychologist Warren Bennis [12] and became widely known thanks to Alvin Toffler [13]. Adhocracy formalizes the style of interaction between managers and subordinates, society and the economy, with a simultaneous dynamic growth of the degree of freedom in the powers and actions of "customers" and "executors".) The meso-level of management [14], to which the industry can be attributed, should be aimed at identifying and supporting technological, organizational, commercial and other leaders. The practice of the world's leading electronics corporations demonstrates an active transition from hierarchies established once and for all to internal network relations that combine a healthy competition of ideas with mutually beneficial cooperation in production. The benefits from centralizing several common functions in modern large structures (TNCs, industries) are multiplied by the development of distributed structures and network interactions at all stages, from start-up to participation in market competition. This multi-agent, network-centric approach will enable a deep digital transformation of the place and value of the industry. The multi-agent platform will make it possible to implement the most advanced management principles based on intelligent, self-learning systems. All over the world, traditional "bureaucratic management" is being replaced by management complexes that optimally balance competition with cooperation between enterprises, organizations, other industry structures and individual generators of new knowledge, while achieving a significant synergistic end result [9]. The network-centric platform, the intelligent multi-agent platform and the Smart Electronics eco-system will be created and continuously improved through modularity. This creates a long-term opportunity both for replacing old modules with more advanced ones and for connecting new modules that will be in demand in the future; such modification of the system will not require stopping its operation. All co-executors will function as part of the organic structure of the Smart Industry multi-agent platform. Particular attention should be paid to the major "cumulative manufacturers" of electronics. In relation to the industry as a whole, the internal "nesting dolls", fractally coupled with other management levels, will develop according to a coordinated program for creating an embedded digital system, "Smart Enterprise". This should include all the research institutes, design bureaus, enterprises, etc. involved in the implementation of the Strategy.
This will allow the management of the industry to move away from linking co-contractors and customers through discrete approvals, planning and difficult renegotiations whenever disturbing factors arise. Through multi-agent technology, all eleven technological directions defined in the "Strategy" will turn
into a single, holistic organism working towards common ultimate goals, like clockwork. Figure 1 presents a matrix of cross-cutting projects showing the mutual exchange of products among the eleven technological areas of development of the electronics industry.
Fig. 1. Matrix of cross-cutting projects.
Any forced adjustments of mutual efforts and resource allocation made by this control system will be incorporated into the scientific-technological and production-technical plans of the performers in real time as disturbing changes occur. This will significantly increase the final efficiency of using allocated budget funds and, accordingly, the profitability of attracted private investments. The growth of the aggregate efficiency of reproduction in general, and the transparency of all stages of production and sale of final products, will become a reliable foundation for developing and expanding public-private partnership in the industry, extending not only to the finishing stages of production but also to the starting costs of R&D, which today remain the most significant bottleneck. For a multi-agent platform in such a configuration, combined with a BigData database, there are today no restrictions on the multi-parametric characteristics of the processed information, including the high complexity of the composition of components and products and the interdependence of work along the entire technological chain, from raw materials and the skills of individuals to final products. Nor are the features of technological processes, or the need for the required materials and resources (material, financial, personnel and others) to be available not in an abstract representation but in a strictly coordinated volume, place and time without deviations, a limitation on the use of a multi-agent platform. Constant hardware and software monitoring and control of the progress of all work, without exception, in real time will reduce by an order of magnitude
the time specialists spend on routine problems and circumstances, which the multi-agent platform will gradually "learn" to solve optimally and independently [11]. At the same time, the human factor will remain decisive in the system in the event of fundamentally new challenges, unforeseen events and problems. On the one hand, this will mean a significant reduction in the range of problems solved manually, to 20–30% of the current workload of personnel; on the other hand, it will require a significant improvement in the qualifications of those who make decisions in difficult situations. The purpose of developing the multi-agent platform and digital eco-system "Smart Industry" is to complete all tasks of the "Strategy" in full and within the established deadlines, and to create an organizational and managerial groundwork both for bringing domestic electronics to leading positions in the world and for forming a visible and effective example of sectoral logistics coordination of participants in the R&D - production - consumption - utilization chain. The ecological block of the multi-agent platform will represent a socially significant mechanism of the state approach to preserving and improving the human environment. A multi-agent platform and digital eco-system for managing the industry, in the context of the interaction of enterprises of various forms of ownership and the improvement of market-oriented relations in the country, should turn into a dynamic system for holding project competitions, both for budget consumers of electronics in the most general sense and for commercial structures, as well as for the selection and certification of personnel and their training based on individual career paths, mentoring, etc.
3 Scientific Foundations of Multi-agent Technology in the Real Sector of the Economy

In the course of the practical implementation of the specific tasks of transferring the management of the industry to this latest technological platform, work should be continued in a field that does not yet have world analogues, including:

– creation of an ontology and a knowledge base of the subject area, which make it possible to formalize knowledge for multi-agent systems and standardize the protocols of their interaction at the industry level, with a focus on electronics, radio electronics and the production of communications equipment;
– creation of a fundamentally new distributed intelligent system for managing industry resources (human, material, intellectual, financial) online, based on blockchain technology;
– "intellectual" support, within the framework of a multi-agent platform, of coordinated decision-making processes along the entire chain of development and production of electronic components and final electronics products, taking into account all the specifics of the industry and its subsectors, in order to achieve the goals of the "Strategy";
– reducing the labor intensity of strategic and operational resource-allocation planning by transferring routine optimization, logistics and coordination operations to a multi-agent platform that will perform its role around the clock, without interruptions and stops, and prepare draft final decisions for industry leaders and its most important actors;
– creation of an unprecedentedly effective mechanism for the monitoring, control and coordination of projects at all stages of their execution, with recursive adjustments, which will ensure the transparency of the entire system;
– increasing the efficiency of resource use, primarily by pinpointing the scarcest positions and bottlenecks;
– reducing the time and cost of developing both electronic components and final products in accordance with the standards set by world competitors;
– minimization of the risks of projects for creating fundamentally new products, owing to the continuous recursive optimization and coordination of the activities of all agents working towards a single ultimate goal;
– creation of new technologies of enterprise management focused on unleashing the creative potential of employees and making full use of people's knowledge, skills, will and energy to raise the productivity of enterprises and of labor.

To this end, the systems developed for all structures of the "matryoshka" will interact, develop options for plans and other decisions, resolve conflicts, coordinate decisions with people and pursue their implementation despite various disturbing events. Multi-agent technology, actively developed in our country, provides an optimal solution to complex problems of distributing, planning and effectively using the available limited (including because of sanctions) resources in real time. It is based on the fundamental principles of self-organization and evolution inherent in living systems, which are fairly well formalized thanks to the mathematical apparatus and the cybernetic approach to reproduction processes built on effective feedback. Autonomous software agents included in the multi-agent platform will act on behalf of industry, functional and territorial projects, tasks, works, organizations, performers, products, components and materials. They will dynamically coordinate mutually beneficial solutions and form adaptive networks of operations and multi-level plans according to the integral criterion of maximizing the final national economic efficiency [15]. This will make it possible, without prejudice to the quality and timing of work and production, to rebuild the current production and technological plans, including in the event of unforeseen circumstances. Unlike well-known batch production management systems and technologies, such as Microsoft Project, Alt Invest, Project Expert, Primavera and a number of others, the multi-agent platform will create conditions for end-to-end flexible and adaptive work and production planning in a single space of industry resources, and for the almost instant cost estimation of any decisions of higher authorities that may lead to a delay in project financing (for example, in a sequestration format), to the "underdelivery" by the educational module of personnel of the required qualifications, etc. Multi-agent technology is based on the principles of a formalized representation of knowledge (ontologies) about the object and subjects of control. For this purpose, the agents included in the multi-agent platform will be endowed with several parameters, which in their most general form can be reduced to four groups: actors, collections of actors, hierarchies of actors, and formalized attributes of actors.
Such ontology-based formalization will make it possible to specify, in a unified form, the requirements for the tasks set in the "Strategy" and the resources allocated for achieving them, with the aim of making Russia one of the world's leading electronics countries. The use of this technology
allows the knowledge of the subject area to be separated from the program code of the system and enables users to replenish this knowledge or tune the system more precisely for a specific project [16]. In contrast to the classical systems now in existence, which consist of internal modules, the digital intelligent system "Smart Industry" under development will function as a network-centric system. Autonomous participants of the multi-agent system will interact, through a Service-Oriented Architecture (SOA), via a Branch Service Bus (BSB) common to the industry. Thanks to such a "system of systems", the "negotiations" of the actors will not be a tug-of-war over local benefits, but a form of iterative optimization of the costs and benefits of each actor and of the industry as a whole. The currently published developments and the practice of using similar control systems demonstrate their advantage over the hyper-complex, discrete-monolithic control systems currently used by business entities. The future network-centric adaptive p2p (peer-to-peer) platform will allow actors to interact "each with each": both horizontally, as equals, and vertically, respecting the adaptive relations of hierarchical subordination. A cluster of existing, tested software solutions will interface with the future multi-agent platform. The network-centric adaptive platform, while achieving a high level of information security, ensures the openness, flexibility, high performance, scalability, reliability and survivability of the created system [6]. Up to 80% of routine operations can be formalized and performed by the developed system faster and more efficiently, including the assessment of:

– the efficient use of resources in each of the ongoing projects,
– the time parameters for the implementation of both individual stages of work and entire projects,
– possible downtime, bottlenecks and resource scarcity, in a proactive rather than emergency manner,
– the degree of transparency and the timeliness of the reaction of decision-makers, which will increase the consistency and coherence of work along the entire chain from R&D to the release of finished products,
– the dangers and consequences of unforeseen events in real time,
– the positive or negative influence of the human factor, including mistakes, collusion, etc.

The multi-agent platform thus turns into an "escalator" that allows overall success to be achieved much faster [1].
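As an illustration of the four parameter groups just named (actors, collections of actors, hierarchies of actors, and formalized attributes), the following minimal Python sketch encodes them in one data structure. The schema and field names are assumptions made for illustration, not the platform's actual ontology.

```python
"""Illustrative encoding of the four parameter groups: individual actors,
collections, hierarchy, and formalized attributes. Hypothetical schema."""
from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str
    kind: str                          # e.g. "project", "organization", "product"
    attributes: dict = field(default_factory=dict)   # formalized attributes
    parent: "Actor | None" = None      # position in the hierarchy
    members: list = field(default_factory=list)      # collection of sub-actors

    def add(self, child: "Actor") -> "Actor":
        child.parent = self
        self.members.append(child)
        return child

industry = Actor("electronics", "industry")
plant = industry.add(Actor("plant-1", "organization",
                           attributes={"capacity_per_month": 5000}))
order = plant.add(Actor("order-42", "task",
                        attributes={"quantity": 1200, "due_day": 30}))
assert order.parent.parent is industry   # the hierarchy is navigable bottom-up
```

Separating such declarative knowledge from program code is precisely what lets users replenish it without modifying the system itself.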
4 Stages of Development and Construction of a Multi-agent Platform "Smart Electronics"

The knowledge and practical experience available to public administration practitioners, scientists and specialists in research centers and IT companies make it possible to accomplish the most important tasks in the shortest possible time:
1. Formalize the "Development Scheme" of the electronic industry on the basis of the developed management methodology [17].
2. Form and constantly develop a knowledge base built on the ontology of the subject area of electronic industry products, for a formalized ontological specification of the designed components and products, the applied design and manufacturing processes, the mechanisms and processes for expanding and deepening personnel competencies, and the rational use of resources, including the results of intellectual activity.
3. Create, debug and put into operation the "Smart Electronics" digital platform for industry resource management, in order to achieve optimal reproduction over a long period without prejudice to nature and the interests of future generations.
4. Create and dynamically improve industry and government standards, ontologically formalized business processes, and procedures and protocols for the interaction of actors within the global industry multi-agent platform "Smart Electronics".
5. On the basis of the created digital platform, knowledge base and standards, develop a system of smart services ("AI systems") along the entire chain from research and development to the engineering and production of finished products. The development of the "Smart Electronics" platform should provide for the connection, on the principles considered, of enterprises that are not part of the electronics industry, of subcontractors and of all chains of cooperation, to obtain a new quality of specialization and division of production in the economy as a whole.
6. Develop a software and hardware complex to supplement short-term and medium-term planning of the industry's development with predictive models of strategic planning.
7. Create the theoretical possibility and practical portals for the safe and mutually beneficial connection of foreign participants, primarily from the CIS, SCO and BRICS countries, to the developed system.
5 Conclusion

The work of scientists and specialists in the electronics industry, in the context of the widespread introduction of digital technologies into the economy, must comply with Ashby's law: the nationwide governing subsystem of the industry must have a margin of complexity relative to the complexity of the controlled actors and processes. The creation of a multi-agent platform for the successful implementation of the goals and objectives set by the country's leadership in the "Strategy" will guarantee that this immutable law of management is respected in the industry and, consequently, that both strategic and tactical plans are implemented. An undoubted advantage of this approach to management at the meso-level is the use of scientific developments already debugged in practice in related science-intensive sectors of the economy.
References

1. Vittikh, V.A.: Introduction to the Theory of Intersubjective Control. SSC RAS, Samara (2013)
2. Order of the Government of the Russian Federation No. 20-r of 17 January 2020, Moscow (2020)
3. Capra, F.: The Web of Life. Anchor Books, a division of Random House, New York (1995)
4. Bogdanov, A.A.: Tectology: A General Organizational Science, 3rd edn. (revised and enlarged). Economics, Moscow (1989)
5. A Guide to the Project Management Body of Knowledge, 4th edn. Project Management Institute, Moscow (2012)
6. Vittikh, V.A., Skobelev, P.O.: The method of conjugate interactions for managing resource allocation in real time. Avtometriya 2, 78–87 (2009)
7. Skobelev, P.O.: Multi-agent technologies in industrial applications: to the 20th anniversary of the founding of the Samara scientific school of multi-agent systems. Mechatron. Autom. Control 12, 33–46 (2010)
8. Porter, M.E.: International Competition. International Relations, Moscow (1993)
9. Cloke, K., Goldsmith, J.: The End of Management and the Formation of Organizational Democracy. Piter, St. Petersburg (2004)
10. Kleimenova, E.M., et al.: The method of risk assessment in a multi-agent project management system for research and development projects in real time. Inf. Control Syst. 2(63), 29–37 (2013)
11. Vittikh, V.A., Skobelev, P.O.: Multiagent interaction models for building networks of needs and opportunities in open systems. Autom. Telemech. 1, 162–169 (2003)
12. Yuldashev, R.T.: Insurance Business: Dictionary-Reference. Ankil, Moscow (2005)
13. Toffler, A.: Future Shock. AST, Moscow (1970)
14. Kleiner, G.B. (ed.): Mesoeconomics of Development. Nauka, Moscow (2011)
15. Skobelev, P.O.: Intelligent resource management systems in real time: development principles, experience of industrial implementations and development prospects. Inf. Technol. 1, 1–32 (2013)
16. Burkov, V.N., Korgin, N.A., Novikov, D.A.: Introduction to the Theory of Management of Organizational Structures. Librokom, Moscow (2009)
17. Shpak, V.V., Brykin, A.V.: On the formation of an organizational and managerial model for the development of the radio-electronic industry in Russia. Resour. Inf. Supply Compet. (RISK) 3, 108–115 (2020)
Automated Flow Meter for LPG Cylinders

Viktor A. Shurygin, Igor M. Yadykin, and Aleksandr B. Vavrenyuk(B)

National Research Nuclear University MEPhI, Kashirskoe Highway 31, Moscow 115409, Russia
{IMYadykin,ABVavrenyuk}@mephi.ru
Abstract. The article discusses the stages of the practical implementation of measuring the flow of liquefied gas from a cylinder. The gas flow rate is determined from the revealed dependence of the inductance of the gas cylinder on the mass of the gas and the ambient temperature. Devices for carrying out reference measurements, the stages of flow meter calibration, and requirements for the design of the measuring system are described. An example of the implementation of an automated measuring system for calibration is considered.

Keywords: Meter · Gas cylinder · Inductance coil · Temperature sensor · Calibration · Automated test bench
1 Introduction

Currently, there is a sharp demand for natural gas in the world, both pipeline and liquefied. This is explained by the massive rejection by the leading economies of the world, for example Germany, Britain and the USA, of traditional energy sources (coal, oil, nuclear energy) in the hope of using so-called "green", environmentally friendly sources such as solar and wind energy. However, these hopes have not been fully realized, which has led to a sharp demand for traditional sources, primarily natural gas. The priority of natural gas is due to the prospects of the world's reserves and the lower cost of liquefied gas compared with fuels based on oil and coal. Fuel based on liquefied gas is also more environmentally friendly: burning gas produces, on average, 30% fewer harmful emissions into the atmosphere than using oil-based fuel, in addition to the lower cost of liquefied natural gas [1]. This transition concerns almost all areas of modern industry, from strategic sectors to the production of consumer goods: motor transport, heating systems, chemical production. One can particularly single out the activities associated with the delivery, storage and distribution of liquefied natural gas, i.e. areas affecting the widest segments of the population and largely determining the economic, environmental and social situation. Consequently, fuel based on liquefied natural gas is the fuel of the future, which determines the necessity and relevance of scientific and technical developments in this area.
The mass transition to liquefied gas requires improved methods and means of controlling the amount and consumption of fuel in order to optimize the operation of the relevant systems, from the pipeline transport used for gas delivery to consumer devices: the complex technological equipment of the corresponding industries, utilities and vehicles. The following directions are relevant for solving these problems [2, 3]:

• Monitoring of gas leaks in the event of loss of tank tightness, and prevention of situations involving possible explosions and poisoning of the environment.
• Development of automatic fuel consumption regulators to ensure economical operation and reduce harmful emissions, using the capabilities of modern computer technologies.
• Control of unauthorized access to gas transmission systems for the purpose of fuel theft.
• Provision of simple visual, as well as differential and integral, monitoring of gas flow.

This article discusses the development of measuring instruments for monitoring gas consumption, oriented towards road transport. The gas engine fuel market is one of the most dynamically developing in the world. In Russia, the use of natural gas as a motor fuel is one of the priority innovative directions for the development of the oil and gas complex [1].
2 Problem Statement

An analysis of the currently common direct and indirect methods of monitoring the consumption of gas engine fuel, based on measurements of weight, density and pressure and on float-type level gauges, reveals the following disadvantages [4–8]:

• they do not provide high accuracy and reliability under shaking and vibration, in various climatic conditions and over wide temperature ranges;
• implanting sensors in gas tanks impairs their original strength, which is unsafe under high pressure;
• methods based on pressure measurement do not allow the mass of the liquefied gas remaining in the cylinder to be judged reliably;
• in a number of applications, fuel consumption monitoring devices are placed directly on the gas cylinder, which is unergonomic in car operation.

In this regard, the task was set to find parameters whose measurement makes it possible to control the amount and flow of liquefied gas without violating the integrity of the tank and regardless of factors such as shaking, vibration, humidity and the possibility of an electric spark, while allowing convenient output of information to the front panel. The authors hypothesized that the inductance of the reservoir can act as such a parameter.
As a result of the experimental studies carried out [9], the authors showed that the inductance of a winding installed on a cylinder of liquefied gas does change with the gas flow rate. The inductor and the cylinder with its changing volume of gas can be regarded as a circuit from which the core is gradually withdrawn. The changes in inductance are small compared to those produced by a ferrite core, but they are nevertheless quite measurable by modern compact digital devices [2]. In addition, the authors showed that not only the inductance but also the temperature of the cylinder, determined by the ambient temperature, correlates with the volume of gas in the cylinder. The temperature of the cylinder itself also changes as the amount of gas in the cylinder decreases, owing to the expansion of the gas. The temperature of the gas cylinder can likewise be measured with modern small-sized digital thermometers [2]. Thus, the amount of liquefied gas in the cylinder at the current time uniquely corresponds to the pair of values of the cylinder's inductance and temperature, {L, t}.
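The {L, t} -> mass correspondence described above is naturally stored as a calibration table and interpolated between grid points. The following Python sketch shows one way to do this with bilinear interpolation; the axis values and table entries are invented for illustration, since real tables come from the calibration stage described below.

```python
"""Sketch of the {L, t} -> gas-mass mapping: bilinear interpolation over
a calibration table. All numeric values here are illustrative."""
import numpy as np

L_axis = np.array([102.0, 103.0, 104.0])      # inductance, microhenries
t_axis = np.array([-20.0, 0.0, 20.0])         # cylinder temperature, deg C
mass_table = np.array([                       # gas mass, kg (rows: L, cols: t)
    [2.1, 2.0, 1.9],
    [5.3, 5.1, 4.9],
    [8.4, 8.2, 8.0],
])

def gas_mass(L, t):
    # Locate the surrounding grid cell, then interpolate along both axes.
    i = np.clip(np.searchsorted(L_axis, L) - 1, 0, len(L_axis) - 2)
    j = np.clip(np.searchsorted(t_axis, t) - 1, 0, len(t_axis) - 2)
    u = (L - L_axis[i]) / (L_axis[i + 1] - L_axis[i])
    v = (t - t_axis[j]) / (t_axis[j + 1] - t_axis[j])
    return ((1 - u) * (1 - v) * mass_table[i, j] + u * (1 - v) * mass_table[i + 1, j]
            + (1 - u) * v * mass_table[i, j + 1] + u * v * mass_table[i + 1, j + 1])

print(round(gas_mass(103.4, 5.0), 2))   # interpolated mass for the pair {L, t}
```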
3 Practical Implementation

The authors in [10] proposed a functional diagram of a device for determining the amount of liquefied gas in a cylinder. The purpose of this work is to develop the requirements and stages for creating a simple automated device (a gas flow meter) suitable for mass use on motor vehicles. The studies conducted allow the following conclusion: the consumption (volume, weight) of the liquefied gas in the cylinder can be measured by matching the mass of the remaining gas with the change in two parameters, the inductance L and the temperature t° of the cylinder. To move to practical implementation, that is, to the creation of a device for the operational control of gas consumption, a number of applied tasks must be solved. Although the range of applications of liquefied gas is exceptionally wide, two main stages common to all of them can be distinguished: design development and calibration.

3.1 Technical Tasks for the Design Development

At this stage, the following technical tasks must be solved:

• For each size and set of characteristics of the gas cylinder, calculate the parameters of the inductance winding (wire diameter, number of turns, type of winding, frame material, tightness) sufficient for measurement. It is known that the more turns there are, the greater the inductance and, therefore, the easier its measurement. At the same time, copper is expensive, so reducing the number of turns is economical, especially in mass use.
• Identify ways and means to measure the parameters. Ready-made measuring instruments can be used: a large number of portable small-sized digital inductance and temperature meters are available on the modern electronics market. The second approach is the development of original measuring devices.
• Development of standard housings for the inductors and temperature sensors to be installed on a gas cylinder. Determine the location of the inductor and temperature sensors on the cylinder: in the middle, along the edges, or along the entire length. Development of means for installing, fastening and mounting the inductors and temperature sensors and for replacing them in case of failure.
• Selection of computing tools and development of the corresponding calculation programs, or development of specialized devices for converting the measured values of inductance and temperature into the corresponding values of gas mass. For example, the correspondence tables obtained after calibration of a typical cylinder can be stored in read-only memory, with the measured current values of inductance and temperature used as addresses at which the corresponding mass of gas is recorded.
• For operation over wide temperature ranges, select (or develop) devices capable of applying scale coefficients and adapting the calculations to changing conditions, or of using several correspondence tables recorded in memory, for example, devices based on computing microarchitectures or flash memory.
• For special operating conditions, it is advisable to place the gas cylinder in an outer shell (such as a thermos) to exclude the influence of external temperature changes.
• Select the means to visualize the results of current operational and accumulated monitoring of the flow rate and the remaining amount of gas.
• Develop or select tools (programs) to collect and process statistical data on gas consumption. In addition, interaction is possible via the communication channel between the on-board device and the tachograph cards that are part of the tachograph installed on vehicles. In this case, cryptographic mechanisms for data integrity and authentication must be provided in accordance with the standards.

3.2 Calibration of the Flow Meter

This is an important stage, necessary for matching the values of the current mass of gas in the cylinder with the values of the parameter pair: the inductance L and the temperature t° of the cylinder. Calibration is carried out on test benches that can reproduce different climatic conditions, on one gas cylinder or on a group of cylinders simultaneously. The following steps are performed for calibration:

• The inductance windings and temperature sensors are selected and calibrated.
• An inductance coil and temperature sensor(s) are installed and fixed on the gas cylinder.
• The gas cylinder is placed on an electronic scale and a gas flow device is connected, for example, a car motor or a control gas burner.
• The inductance coil and the temperature sensor(s) are connected to the corresponding meters, which are connected to the information channels for transmitting measurement data.
• A preset ambient temperature is established and maintained throughout the test period. The temperature range is determined by the application area of the cylinder; for a car, for example, it can be from –50 °C to +50 °C.
• The gas flow device is switched on and runs until the cylinder is completely empty.
• During the tests, the measured parameters from the inductor, the temperature sensors and the electronic scale are transmitted via the information channels to the basic processing unit (computer) and stored. A set of controlled parameters is registered with a specified time step and/or at specified parameter intervals.

Then the cylinder is refueled, or another full cylinder of the same type is taken; a new ambient temperature value is set and the parameter measurements are repeated. The step of temperature change within a given range is determined by the specific conditions and accuracy requirements. For mass use of flow meters, the steps described are performed for several inductors and temperature sensors in order to determine the measurement errors. Studies must also be conducted in test chambers that reproduce extreme external operating conditions, for example, aggressive atmospheric influences such as rain, wind and snow. Based on the processing of the test results, tables of correspondence between the mass of gas and the values of inductance, cylinder temperature and ambient temperature are formed. Later, at the stage of trial tests, monitoring and data collection are carried out in real conditions in the field, the data are compared, and the correspondence tables are adjusted as necessary. The calibration procedure can be lengthy, but it is performed only once for each commercially available standard size of cylinder.
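As a sketch of how logged calibration runs might be condensed into such a correspondence table, the following Python fragment averages (L, t, mass) records per (L, t) bin. The bin widths and the sample records are invented for illustration; a real run would use the data streamed from the meters and the scale.

```python
"""Turn logged calibration records into a correspondence table by
averaging the reference scale mass per (L, t) bin. Illustrative data."""
from collections import defaultdict

records = [  # (L in uH, t in deg C, mass in kg) as logged during a test run
    (102.2, -18.5, 2.10), (102.3, -18.4, 2.08), (103.1, 0.6, 5.05),
    (103.0, 0.4, 5.11), (104.0, 19.8, 8.01), (103.9, 20.1, 7.98),
]

def bin_of(value, step):
    # Snap a reading to the nearest grid node with the given step.
    return round(value / step) * step

cells = defaultdict(list)
for L, t, m in records:
    cells[(bin_of(L, 0.5), bin_of(t, 5.0))].append(m)

correspondence = {cell: sum(ms) / len(ms) for cell, ms in cells.items()}
for cell in sorted(correspondence):
    print(cell, "->", round(correspondence[cell], 2))
```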
4 Implementation Example

To implement the method of measuring the gas flow in a cylinder based on measuring the cylinder's inductance and temperature, a block diagram of a prototype flow meter was developed; it is shown in Fig. 1.
Fig. 1. Block diagram of the device.
The prototype device contains the following elements:

– a gas cylinder;
– a coil with an inductance winding mounted on the cylinder body;
– a temperature sensor;
– a digital inductance meter;
– a digital temperature meter;
– a basic processing unit containing a computer;
– information channels for transmitting measurement data.
We will select specific standard devices, available on the market of electronic measuring instruments, to implement the blocks of the block diagram.

4.1 Inductance Meters

One example of a ready-made inductance meter is the "MINY USB L/C METER" described in [11]. Consider its characteristics and operation. The meter is made in the form of a USB attachment to a computer: measurement processing is carried out by a special program, and the readings are displayed on the monitor screen. The appearance of the inductance meter is shown in Fig. 2.
Fig. 2. Appearance of the inductance meter device.
Specifications:

– Measuring range L: 0.01 µH to ~100 mH.
– Automatic range switching: 0.01–999.99 µH, 1 mH–99.99 mH.
– The device does not require a driver.
– It does not require adjustment (except for the calibration procedure).
The schematic diagram of the meter is shown in Fig. 3. There are no controls in the circuit; all control (L measurements, as well as calibration of the device) comes from the control program. Only two terminals for connecting the part under measurement, a USB connector and an LED, which lights up when the control program is running and flashes otherwise, are available to the user. The basis of the device is an LC generator built on an LM311 comparator. A miniature 12 MHz quartz crystal, smaller than a watch crystal, is used. Atmega48, Atmega8 or Atmega88 can serve as the microcontroller; firmware images are provided for each. The microcontroller uses the V-USB library to communicate with the computer and counts the pulses from the generator. The frequency calculation itself is handled by the control program; the microcontroller only sends raw data from the timers. Upon completion of calibration, all data are recorded in the non-volatile memory of the microcontroller; thus, the device's memory stores the settings specific to it.
Fig. 3. Schematic diagram of the inductance meter.
The algorithm of the program is as follows. Frequency counting is performed using two microcontroller timers. The 8-bit timer operates in pulse-counting mode on the T0 input and generates an interrupt every 256 pulses; its handler increments a counter variable (COUNT). The 16-bit timer operates in clear-on-compare-match (CTC) mode and generates an interrupt once every 0.36 s; its handler stores the value of the counter variable (COUNT), together with the residual value of the 8-bit timer's counter (TCNT0), for subsequent transmission to the computer. Further frequency calculation is performed by the control program, written in C++ in the Embarcadero RAD Studio XE environment. Knowing the generator frequency, as well as the reference values of L used in calibration, the nominal value of the inductance connected for measurement can be determined.
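The host-side arithmetic implied by this description can be sketched as follows in Python. The gate interval comes from the text above; the tank capacitance and stray inductance are placeholder calibration constants, and the series LC-tank formula is a standard assumption about how such meters convert frequency to inductance, not a statement about this particular device's firmware.

```python
"""Host-side sketch: raw counter values sent every gate interval give the
LC-oscillator frequency; an assumed LC-tank model converts frequency to
inductance. C_TANK and L_STRAY are illustrative calibration constants."""
import math

GATE_S = 0.36            # gate interval of the 16-bit timer, seconds (from the text)
C_TANK = 1.0e-9          # tank capacitance from calibration, farads (assumed)
L_STRAY = 2.0e-6         # stray inductance from calibration, henries (assumed)

def frequency(count: int, tcnt0: int) -> float:
    # COUNT overflows of the 8-bit counter, each worth 256 pulses,
    # plus its residual value, divided by the gate time.
    return (count * 256 + tcnt0) / GATE_S

def inductance(freq_hz: float) -> float:
    # For an LC tank, f = 1 / (2*pi*sqrt(L*C))  =>  L = 1 / ((2*pi*f)^2 * C).
    return 1.0 / ((2 * math.pi * freq_hz) ** 2 * C_TANK) - L_STRAY

f = frequency(count=703, tcnt0=112)     # raw values as sent by the MCU
print(f, inductance(f))                 # about 500 kHz and about 99 uH here
```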
4.2 Temperature Meter with USB Output

One example of a ready-made temperature meter is the RODOS-5S digital thermometer described in [12]. Consider its characteristics and operation. The meter is made in the form of a USB attachment to a computer; measurement processing is carried out by a special program. The appearance of the thermometer is shown in Fig. 4.

Specifications:

– Range of measured temperatures: –55 °C to +125 °C.
– Error: ±0.5 °C.
– Power supply voltage: 5 V (from the USB bus).
– Dimensions: length 72 mm, width 18 mm, height 13 mm.
– Device weight: 8 g.
Fig. 4. Appearance of the RODOS-5S thermometer.
The RODOS-5S digital thermometer is connected directly to a personal computer via a USB port. The device is designed to measure temperature at one point. The DS18B20 sensor is mounted directly on the board and placed outside the case to ensure accurate readings. The RODOS-5S can also measure the ambient temperature. The meter can monitor the current temperature via the Internet: the device logs the measured temperature and generates a file with the current readings in HTML format. The device supports the Windows and Linux operating systems. The thermometer is housed in a durable ABS plastic case. If the device needs to be placed at a distance from the computer, a standard USB extension cable can be used. The RODOS-5 device is available in further modifications: RODOS-5B, in a durable ABS plastic housing with a remote moisture-proof DS18B20 sensor 1 m long, and RODOS-5Z, in a durable ABS plastic housing with the ability to connect DS18B20-type sensors independently (the maximum allowable number of connected sensors is 32, with sensor leads up to 20 m long).
5 Implementation of the Flow Meter

Figure 5 shows the implementation of the block diagram (Fig. 1) using the standard inductance and temperature meters described above.
A laptop is used as the processing unit; it provides both mobility and operation as a stationary workstation, depending on whether testing takes place on test benches or in the field.
Fig. 5. Flow meter on standard meters.
The computer automatically collects data from the inductance and temperature meters over a certain time interval, at specified steps or upon the occurrence of an event. It then processes the data and generates statistical results. The operator at the monitor can analyze the results obtained and manage the tests. In the case of an on-site test, the collected data can be transmitted via the Internet. Testing can also be conducted by comparing the recorded results with the reference data, obtained during calibration of the gas cylinder, that are stored in permanent memory.
6 Conclusion

This article develops the requirements and steps for creating a simple automated device (a gas flow meter) suitable for mass use. The flow rate and the gas remaining in the cylinder are determined on the basis of the proposed correspondence between the mass of the gas and the readings of the inductor and temperature sensors installed on the outer housing of the gas cylinder. A block diagram of the device has been developed, and examples of commercially available inexpensive components, inductance and temperature meters, are given. Special attention is paid to the requirements for the calibration stage of the device, both in stationary conditions and under changing extreme external operating conditions. The proposed flow meter is characterized by increased safety in the operation of gas cylinder systems and can be recommended for mass use in a number of areas, including road transport, where liquefied gas is used.
References 1. Avtaikina, E.E.: Innovations in the gas industry. Gas engine fuel – an innovative direction for the Russian economy. NovaInfo.Ru 6, 79–81 (2014) 2. Tereshin, V.I., Sovlukov A.S.: Comparative analysis of existing methods for measuring the mass of light petroleum products in tanks. In: III-rd International Metrological Conference “Topical Issues of Metrological Support of Flow and Quantity Measurements of Liquids and Gases”, Kazan, Russia, 2–4 September 2015, pp. 72–78. Publishing house the World without Borders, Kazan (2015) 3. Zolotarevsky, S.A.: On the applicability of various flow measurement methods for commercial gas metering. Electron. J. Energy Serv. Comp. Ecol. Syst. 5 (2007) 4. Rozhkov, N.: Metering devices: which one to choose? Gas Russia 3, 50–53 (2014) 5. Sovlukov, A.S., Tereshin, V.I.: Method to determine mass of liquefied petroleum gas in reservoir. Patent RU 2506545 C1. Application RU 2012132133/28, 27.07.2012. Published 10.02.2014, bull no. 4. IPC G01F23/26, G01F1/64 (2014) 6. Chistoforova, N.V., Kolmogorov, A.G.: Technical measurements and devices. Angarsk, AGTA, p. 200 (2008) 7. Mustafin, T.M., Burkov, A.S.: A method for controlling the pressure of gas cylinders. Patent RU 2548059 C1. Application RU 2013153745/28, 05.12.2013. Published 10.04.2015, bull. no. 10. IPC G06L11/06 (2015) 8. Tereshin, V.I., Sovlukov, A.S.: Improving the accuracy of measurements of the mass of petroleum products when using radio frequency sensors. In: III-rd International Metrological Conference “Topical Issues of Metrological Support of Flow and Quantity Measurements of Liquids and Gases”, Kazan, Russia, 7–9 September 2016, pp. 70–75. Publishing house the World without Borders, Kazan (2016) 9. Makarov, V.V., Klarin, A.P., Shurygin, V.A., Dyumin, A.A., Yadykin, I.M.: Measurement of inductance of liquefied natural gas. In: Proceeding of the 22-nd Conference of FRUCT Association, May 2018, pp. 138–143 (2018) 10. Shurygin, V.A., Yadykin I.M.: A device for determining the amount of liquefied gas in a cylinder. Patent RU 161814 U1. Application RU 2015157224/28, 29.12.2015. Published 10.05.2016, bull. no. 13. IPC G01F 23/26 (2016) 11. A meter of capacitance and inductance of small quantities. https://www.radiokot.ru/circuit/ digital/pcmod/57, Accessed 15 Sept 2021 12. RODOS-5S – USB temperature sensor. https://www.silines.ru/, Accessed 15 Sept 2021
Construction of Statically Verified System Interacting with User in Question-Answer Mode According to the Specification Set by the Formula of Linear Temporal Logic

Igor Slieptsov2, Larisa Ismailova1, Sergey Kosikov2, and Viacheslav Wolfengagen1(B)

1 National Research Nuclear University "Moscow Engineering Physics Institute", Moscow 115409, Russian Federation
[email protected]
2 NAO "JurInfoR", Moscow 119435, Russian Federation
Abstract. The paper considers the automatic construction of information systems from a formal specification of the interaction with a user. The constructed system is supplied with a proof of correctness, so that static verification of both the compliance of the constructed system with its specification and the correctness of the construction itself is possible. The specification language is an extension of linear temporal logic, which allows the author of a specification to define a class of admissible scenarios of interaction between the information system and the environment and/or the user. The proposed solution is based on the implementation of an interpretive function that checks the consistency of the specification with the interaction history at each stage.
Keywords: Information system · Formal specification · Linear temporal logic · Consistency

1 Introduction
When designing an information system that interacts with a user (a person, a program, etc.) according to a certain specification, there arises a need to make sure that the constructed system is correct, that is, that it complies with the given specification. For systems developed by programmers (rather than generated automatically), there are well-established methods for detecting and eliminating errors: testing and static verification. Testing makes it possible to investigate the system's behavior against a prepared model of the user. Being a means of finding a counterexample (a situation in which the system deviates from the specification), testing does not guarantee the absence of errors if no counterexample was found in the allotted time. Common forms of testing include the following.
– Passive monitoring consists in analyzing the interaction of an information system with an independent user in "live" conditions.
– Manual and semi-automatic testing consists of monitoring the information system in a managed environment.
– Automatic testing based on test cases analyzes the interaction of the information system with a user represented by a prepared deterministic program acting on pre-prepared data (test cases).
– Automatic property-based testing [3,8] checks that certain properties hold for the result (protocol) of interaction with a user represented by a (pseudo)non-deterministic program.
– Assertions and dynamic type checking [13] provide run-time monitoring and localization of errors.

Property-based tests deserve special mention: they allow a property of the system to be described in some formal language, for example as a predicate, also expressed in that language, which any interaction with any user of a certain model must satisfy. The testing tool automatically generates test cases by the strategy implied by the property, collects the necessary data and checks the predicate. If the predicate is not satisfied, a counterexample is printed, indicating that the developed system does not satisfy its formal specification. Note that no testing methodology reliably guarantees that the system is correct; testing aims at finding a falsifying counterexample. The sort of counterexample is determined by the test maker's expert assessment (including the subject and method of requirement formalization and the strategy for finding a counterexample) of possible deviations of the implemented system from the specification, made on the basis of analyzing the specification and observing the functioning system [8]. In contrast to testing, static verification of an information system examines the code and statically proves the presence of certain properties. There are various methods of static verification, including the automatic construction of a formal proof of the absence of errors of a certain type (memory leaks, null pointer dereference, etc.), proof through a system of statically verifiable contracts (Hoare logic), and type inference, which guarantees type safety. Dependent type systems [2] are of particular interest, because any constraint expressible in first-order logic can be naturally encoded as a type [1]. This research is aimed at the automatic construction of a system in the case when 1) the specification of the interaction with the user can be expressed formally, 2) no additional restrictions (timing, memory consumption, etc.) are imposed on the implementation, and 3) precise compliance with the specification is essential.
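For concreteness, here is what a property-based test of the kind just described can look like in Python with the hypothesis library. The system under test is a stand-in stub, and the property (a reply is never empty) is chosen purely for illustration; neither is taken from the paper.

```python
"""A small property-based test: the property states that the system's
reply to any non-empty question is itself non-empty. The answer()
function is a stub standing in for the real question-answer system."""
from hypothesis import given, strategies as st

def answer(question: str) -> str:      # stub, not the paper's system
    return "yes" if question.endswith("?") else "please ask a question"

@given(st.text(min_size=1))            # hypothesis generates the test cases
def test_reply_is_never_empty(question):
    assert answer(question) != ""

if __name__ == "__main__":
    test_reply_is_never_empty()        # runs many generated cases; prints a
                                       # falsifying counterexample on failure
```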
2 Problem of Constructing a Statically Verified System According to the Specification
If one needs a system whose interaction with the user must meet the specification, and at the same time there are no other requirements (for memory
consumption, response time, etc.), the task may be set to automatically construct a system according to a specification written in a certain language. This approach is similar to how a property-based testing framework generates an environment and a usage scenario for the system according to a formal property written in a special language. The problem can be considered the reverse one: where property-based testing takes a formal property and an implemented system and checks that the interaction of the program satisfies the property, the problem considered here is to automatically construct a system from the specification, which states formal properties that any interaction with the user must satisfy. When constructing a system, one cannot exclude developers' errors that make the behavior of the system differ from the specified one. In order to exclude such errors and ensure compliance of the constructed system with the specification, the task in this paper is complicated by the requirement to guarantee two correctness properties using static verification by means of a dependent type system: 1. compliance of the constructed system with the arbitrary correct specification it is built from, i.e., any interaction of the constructed system with any user must satisfy the properties specified in the specification; 2. correctness of the implementation of the procedure for automatic system construction.
3 Related Works
The problem of representing concepts and domain data and constructing processes that work with them without run-time errors is actively explored [5,6,12]. Methods aimed at localizing and eliminating type errors in various systems have been designed. For example, Wolfengagen et al. [10] propose an approach to constructing a computational model in which information processes are represented in an untyped theory but considered to be special cases of a typed theory. Another approach is presented in Wolfengagen et al. [11]. It is based on a semantic metalanguage that is used to study semantic processes in information modeling systems and their safe interaction.
4 Proposed Approach
The first correctness property is ensured by the fact that the control function of a constructed system is provided with a proof that any interaction satisfies the property set by the specification. The proof is represented by an instance of the type that corresponds to the property under consideration by the Curry–Howard isomorphism; the very existence of an instance proves that the constructed function has this property. Given static verification of all other components, this implies the correctness of the entire system. The formal specification must be supplied with enough data to generate the available options for the system's actions in every possible situation and the proof that it is possible
to continue interaction strictly according to the specification for any possible interaction scenario. The second correctness property is ensured by the derivability of the type of the function that maps specifications to the control function of the system being constructed. Type safety guarantees the absence of certain errors in the implementation, such as 1) substitution of formulas, properties, proofs and concepts, in particular, unjustified use of the control function of one specification for another, or the use of a proof of correctness of one control function to prove the correctness of another; 2) incorrect proofs of correctness: the use of implicit or false assumptions, errors in the structure of the proof. All axioms about the model of system interaction with the user and about the semantics of the specification language are listed explicitly and localized in the code. The approach proposed in this paper reduces the probability of making a mistake by dividing the software and proof system into components: the type inference system (delegated to the compiler in this paper), the semantics of the specification language (a set of types and axioms gathered in one place) and the procedure for constructing the system, implemented by a correctly typed function that relies on the axioms. We'd like to note that the proposed approach considers specifications that establish the system's behavior but not its other characteristics: response time, the scope of consumed memory, and alike.
5 Architecture of Solution
This paper considers single-user discrete interaction, whose protocol can be represented as an infinite sequence M = (y1, v1), (y2, v2), . . . of message pairs (yk, vk), where yk ∈ Y is the question message to the user at the k-th stage and vk ∈ V(yk) is the user's response to yk. As a formal language for specifying the system's interaction with the user, an extension of linear temporal logic [7,9] is used, a formula φ of which divides all protocols M into those satisfying it (M |= φ) and those not satisfying it. We'd like to emphasize that the developed specification language allows one to specify predicates on the set of all user interaction protocols, but does not allow one to impose restrictions on the internal properties of the system (scope of consumed memory, response time, and alike). The architecture of the solution is shown in Fig. 1. The constructed system is marked by a dotted line. It is parameterized by the specification. At every stage of interaction the system either emits one of the three signals (A, ¬A or U) or forms a message-question. The database contains the "history" of interaction H = (y1, v1), . . . , (yn−1, vn−1), which is a finite prefix of the protocol. The control function q has a type dependent on the history:

q(H) : (H |=ᴱ φ) × [Y] + (H |=ᴬ φ) + (H |=ᴬ ¬φ) + 1,
Fig. 1. Architecture of the information system operating in accordance with formal specification. At every stage the system forms a question y or sends one of the signals A, ¬A or U
where φ is a specification, a marked formula of linear temporal logic, and H |=ᴱ φ and H |=ᴬ φ are types-as-formulas [4].
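The sum type above can be approximated in plain Python; this hypothetical sketch drops the proof components that the dependent types carry and keeps only the four-way shape of the result, with the variants mapped, on our reading, to the question-forming case and the signals A, ¬A and U.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class Ask:
    # (H |=E φ) × [Y]: interaction can still proceed according to φ;
    # the proof component is omitted, only the offered questions remain.
    questions: List[str]

@dataclass
class Accept:        # (H |=A φ): the specification is already satisfied, signal A
    pass

@dataclass
class Reject:        # (H |=A ¬φ): the specification is already violated, signal ¬A
    pass

@dataclass
class Undetermined:  # 1: the remaining case, signal U
    pass

ControlResult = Union[Ask, Accept, Reject, Undetermined]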
6 Implementation of Prototype
The prototype is developed in the Idris 2 language. The prototype contains definitions of data types that represent formulas of linear temporal logic, the marked-up formula, pre-defined axioms of linear temporal logic, the type of finite interaction histories, as well as the types defined in the previous section. The following abstractions are introduced into the implemented prototype: 1) abstraction over the language of questions and answers; 2) abstraction over the type of atomic predicates and their semantics; 3) abstraction over the type of generator, which opens up the possibility of creating additional semantic markup of the questions that can be asked at the current stage. In the future it is planned to implement the modal operators Until and Release, as well as to implement several generator models.
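Since the Idris 2 sources are not reproduced in the paper, the following Python sketch only illustrates how a temporal formula classifies finite interaction histories; the encoding and the Until semantics follow the standard finite-trace reading of linear temporal logic rather than the paper's marked formulas.

def holds(phi, trace, i=0):
    # Formulas are nested tuples: ("atom", p), ("not", f), ("and", f, g),
    # ("next", f), ("until", f, g); trace is a finite history of
    # (question, answer) pairs.
    op = phi[0]
    if op == "atom":
        return phi[1](trace[i])
    if op == "not":
        return not holds(phi[1], trace, i)
    if op == "and":
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if op == "next":
        return i + 1 < len(trace) and holds(phi[1], trace, i + 1)
    if op == "until":
        # g holds at some position k >= i, and f holds everywhere before k.
        return any(holds(phi[2], trace, k)
                   and all(holds(phi[1], trace, j) for j in range(i, k))
                   for k in range(i, len(trace)))
    raise ValueError(f"unknown operator {op!r}")

# Example: "the user never gives an empty answer", i.e. ¬(true Until empty).
never_empty = ("not", ("until",
                       ("atom", lambda step: True),
                       ("atom", lambda step: step[1] == "")))
assert holds(never_empty, [("question-0", "yes"), ("question-1", "no")])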
7 Conclusion
When it is necessary to develop many systems that are guaranteed to interact with the user strictly in accordance with their formal specifications in all interaction scenarios, regardless of user actions, the automation of system construction and the obtaining of correctness guarantees, in the form of a formal proof of the system's compliance with its specification, come to the fore.
This paper proposes an approach in which the formal specification is a logical formula that all possible interaction protocols must satisfy. A system architecture is proposed that accepts a formal specification as a parameter. The control function classifies the specification either as incorrect, as a specification that will always or never be satisfied, or as a generator of a set of questions. The proof of correctness is expressed by a function whose type is interpreted as the requirements for the correctness of the control function. Static verification is performed with the dependent type system. The proposed approach was generalized and implemented in the form of a prototype in the Idris 2 language.
Acknowledgement. This research is supported in part by the Russian Foundation for Basic Research, RFBR grants 20-07-00149-a, 19-07-00326-a, 19-07-00420-a.
References
1. Barendregt, H.: Introduction to generalized type systems. J. Funct. Program. 1(2), 125–154 (1991)
2. Brady, E.: Idris 2: quantitative type theory in practice. arXiv preprint arXiv:2104.00480 (2021)
3. Claessen, K., Hughes, J.: QuickCheck: a lightweight tool for random testing of Haskell programs. In: Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, pp. 268–279 (2000)
4. Girard, J.Y., Taylor, P., Lafont, Y.: Proofs and Types, vol. 7. Cambridge University Press, Cambridge (1989)
5. Ismailova, L., Kosikov, S., Kucherov, I., Zhuravleva, O.: Tools of algebraic type for manipulating methodologically oriented cognitive information. Procedia Comput. Sci. 169, 23–30 (2020)
6. Ismailova, L., Wolfengagen, V., Kosikov, S.: Hereditary information processes with semantic modeling structures. Procedia Comput. Sci. 169, 291–296 (2020)
7. Manna, Z., Pnueli, A.: The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer Science & Business Media (2012). https://doi.org/10.1007/978-1-4612-0931-7
8. Reich, J.S.: Property-based testing and properties as types: a hybrid approach to supercompiler verification. Ph.D. thesis, University of York (2013)
9. Warford, J.S., Vega, D., Staley, S.M.: A calculational deductive system for linear temporal logic. ACM Comput. Surv. (CSUR) 53(3), 1–38 (2020)
10. Wolfengagen, V., Ismailova, L., Kosikov, S.: Capturing information processes with variable domains. Procedia Comput. Sci. 169, 276–283 (2020)
11. Wolfengagen, V., Ismailova, L., Kosikov, S., Babushkin, D.: Modeling spread, interlace and interchange of information processes with variable domains. Cogn. Syst. Res. 66, 21–29 (2021)
12. Wolfengagen, V., Ismailova, L., Kosikov, S., Dohrn, J.: Superimposing semantic mesh to prevent information processes entanglement. Procedia Comput. Sci. 169, 645–651 (2020)
13. Wolfengagen, V., Kosikov, S., Slieptsov, I.O.: A cognitive type system simulation by a dynamically typed language. Procedia Comput. Sci. 145, 641–645 (2018)
Comparison of ERP in Internal Speech (Meaningful and Non-existent Words) Alisa R. Suyuncheva and Alexander V. Vartanov(B) Lomonosov Moscow State University, Moscow, Russia
Abstract. In this article we pay attention to the mechanisms of mental articulation of words with semantic meaning and words without meaning. The research is based on scalp EEG registration and application of a new source-localization method, "Virtual implanted electrode," for calculating an analogue of the local field potential in brain regions specified by their coordinates. The purpose of the study is to compare EPs to the signal for internal utterance (repetition) of words in the native language and meaningless words pronounced in an unfamiliar language (Japanese). In Caput n. Caudati there is a complex of N150 - P200 peaks, whereas in Putamen the opposite picture is observed: a complex of P120 - N200 peaks. There is also a significant difference in Gl. Pallidus Med., especially on the left side. At the same time, meaningful words show large-amplitude peaks: P100, P120, N150, P220, P300. A strong and significant distinction is found for middle-latency peaks, in the range of P230 - P350: the amplitude of peaks when uttering words with a semantic load is greater. For left- and right-sided areas of Parietal c. BA7, as well as in the middle part of the cingular cortex (G. Cingulate Med. BA24), the potentials are similar and characterized by the two-component structure P120 - N230, analogous to the potentials in Putamen. Keywords: EEG · EP · Internal pronunciation · Covert speech · Words · Non-existent words · Japanese · Neologisms
1 Introduction The perception of words in different languages, as well as the problem of reproduction of words that have no semantic meaning, remains poorly studied. There are many contradictory hypotheses. Recent studies on the peculiarities of neural adaptation processes suggest a possible influence of the respondent's reading-skill development on his perception of lexico-grammatical constructions. In dyslexia, the phenomenon of repeated word reproduction (neural adaptation) can interfere with implicit learning and the formation of stable representations of word semantics, which was hypothesized in earlier studies. In this work it has been decided to pay attention to the mechanisms of perception and articulation of words with semantic meaning and words without meaning. In the future this can be used not only to improve technologies of speech reconstruction (BCI interfaces), but first of all for scientific purposes, to obtain more detailed information about the
substrate of such a phenomenon as inner speech (self-pronunciation). The research is based on registration of scalp EEG and application of a new source-localization method, "Virtual implanted electrode" (developed by A.V. Vartanov; the method is at the patenting stage in the Russian Federation). This technology allows one to reconstruct, from scalp EEG data, the electrical activity whose source is located at a certain place inside the head with preset coordinates relative to the scalp electrodes, which can be considered an analogue of the local field potential. The closest procedure is the method of spatial filtering (presented in US Patent No. 5263488, authors Van Veen, Joseph, Hecox, 23.11.1993), as well as its subsequent development in the form of a group of source-localization methods united under the common name "beamforming". In contrast to them, the method used here solves this problem in a different way, which allows us to obtain an unambiguous and reliable solution. Besides, this method can be applied to the same EEG data an unlimited number of times, which allows an effective "cleanup": differentiating the activity of the investigated source from the activity generated in the surrounding areas by subtracting the activity of the surrounding points (within a radius of 1 cm).
2 Method 2.1 Data Processing The objective of the research is to compare the EPs in auditory perception and internal pronunciation of the stimuli. Electrical activity was measured in tasks of auditory perception and pronunciation during an electrophysiological experiment with registration of 19-channel EEG (according to the international 10–20 system, using a Neuro KM electroencephalograph). The electrodes were arranged in the 10–20 system with two bridges. Presentation v.18.0 was used to present the stimuli. Stimuli were presented in random order. For data processing, the Brainsys program was used. To present a sound stimulus, the subjects wore headphones in which they listened to stimuli pre-recorded on a recorder. The total experiment time was from 40 to 55 min, owing to different times for presenting instructions. The experimental design corresponds to the standard designs described in similar studies: a sound stimulus is presented to the subject for 700 ms; then comes a sound stimulus on which the subject should focus his attention and which is the start command for internal speaking (1500 ms). 2.2 Procedure Our experiments included two series: 1. A control series: the subject was presented words auditorily in random order; the task was to remember them (but not to pronounce them internally). 2. An experiment with initialization of covert speech on the basis of auditory stimuli (words). In this series, the sound image already exists and only needs to be repeated. Participants spoke with their eyes closed.
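As a rough sketch of the ERP computation behind the figures below (not the authors' pipeline, which additionally includes the patented source-reconstruction step), stimulus-locked epochs can be averaged and given a 95% confidence band as follows; the trial count and sampling rate in the usage lines are hypothetical.

import numpy as np

def erp_with_ci(epochs, z=1.96):
    # epochs: (n_trials, n_samples) array of one reconstructed source signal,
    # each row time-locked to the start command for internal pronunciation.
    mean = epochs.mean(axis=0)
    sem = epochs.std(axis=0, ddof=1) / np.sqrt(epochs.shape[0])
    return mean, mean - z * sem, mean + z * sem

# Hypothetical usage: 40 trials, 900 samples (-300..600 ms at 1 kHz).
rng = np.random.default_rng(0)
erp, lo, hi = erp_with_ci(rng.normal(size=(40, 900)))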
2.3 Participants and Stimuli The study involved 18 people, 10 male and 8 female, aged 18 to 25 years. None of the participants had a history of head injuries or mental illness at the time of the study, and all were right-handed. The participants were instructed to covertly pronounce phonemes and syllables related to Japanese words, with eyes closed. All stimuli were presented in a random order. The beginning of pronunciation was set by a special stimulus following the first one, which sets the word to be pronounced. An auditory stimulus lasts 700 ms. Presentation of the stimulus was followed by a pause of 500 ms. After the pause, participants heard a special signal. This is the start command for internal pronunciation, which must be completed within 1500 ms. To present the auditory stimuli the participants wore headphones in which they listened to pre-recorded words and the conditioned stimuli required for speaking. The experiment was considered and approved by the Research Ethics Committee of the Faculty of Psychology, Lomonosov Moscow State University.
3 Results As a result of the data analysis, we obtained information on activation for 33 points selected according to the MNI152 atlas at the centers of the following structures: Hypothalamus, Brainstem, Mesencephalon, Medulla Oblongata, Caput n. Caudati L, Caput n. Caudati R, L Gl. Pallidus Med., R Gl. Pallidus M., Putamen L, Putamen R, Thalamus L, Thalamus R, Hippocampus L, Hippocampus R, L Amygdaloideum, R Amygdaloideum, G. Cingulate Med. BA24, Ant. Cingulate BA32, Insula L BA13, Insula R BA13, Vent. Striatum BA2, L Dor.Med.Prefr. 9, R Dor. Med. Prefr. 9, L Supramarg. g. BA40, R Supramarg. g. BA40, L Parietal c. BA7, R Parietal c. BA7, V1 BA17 L, V1 BA17 R, Broca BA44 L, Wernicke BA22 L, BA44 R, BA22 R. Figure 1 shows the evoked potentials for the deep structures of the brain responsible for the organization of movements: Caput n.Caudati, Putamen, Gl. Pallidus Med.
[Fig. 1 plot panels: Caput n.Caudati L and R (top), Putamen L and R (middle), L and R Gl.Pallidus Med. (bottom); amplitude in µV against TimeEP from −300 to 600 ms; curves: mental articulation jap_aud and rus_aud, means with 95% confidence intervals.]
Fig. 1. EPs for the signal of internal utterance (repetition) after an auditory presentation of words in Russian (native) and Japanese (unfamiliar to the subjects). The graphs are presented for the following subcortical structures: top, the caudate nucleus; middle, the putamen; bottom, the globus pallidus. The column on the left represents the corresponding structures of the left hemisphere, and the column on the right represents the right hemisphere. Here and afterwards, a blue solid line shows the EPs averaged over all stimuli and over the whole sample of subjects for meaningful words (in native Russian), and a red dotted line shows similarly averaged EPs for unfamiliar words (in Japanese). A small dotted line in the corresponding color shows 95% confidence intervals.
A pronounced potential is registered in these areas, which in Caput n. Caudati and Putamen is two-component, and in Gl. Pallidus Med., especially on the left side, is more complex. Thus, in Caput n. Caudati the complex N150 - P200 appears, whereas in Putamen the opposite picture is observed: the complex of peaks P120 - N200. At the same time, both the right and left parts of these structures show a significant difference between the potentials for uttering words that make sense (in the native language) and words in an unfamiliar language. There is also a significant difference in Gl. Pallidus Med., especially on the left side. Meaningful words give large-amplitude peaks: P100, P120, N150, P220, P300. Let us further consider the classical speech zones, Broca's and Wernicke's (as well as their homologues in the opposite hemisphere), in which we also obtained pronounced evoked potentials, although of lower amplitude than in the subcortical nuclei (Fig. 2).
[Fig. 2 plot panels: Broca BA44 L, BA44 R, Wernicke BA22 L, BA22 R; amplitude in µV (−1.1 to −0.2) against TimeEP from −300 to 600 ms; curves: mental articulation jap_aud and rus_aud, means with 95% confidence intervals.]
Fig. 2. EPs for the signal of internal speech (repetition) after an auditory presentation of words in Russian (native) and Japanese (unfamiliar to subjects), recorded in the speech areas of Broca (top, left; homologous zone BA 44, right) and Wernicke (bottom left; homologous zone BA22, right).
[Fig. 3 plot panels: L and R Parietal c. BA7 (top), G.Cingulate Med. BA24 and Ant.Cingulate BA32 (bottom); amplitude in µV against TimeEP from −300 to 600 ms; curves: mental articulation jap_aud and rus_aud, means with 95% confidence intervals.]
Fig. 3. EPs on the signal of internal utterance (repetition) after an auditory presentation of words in Russian (native) and Japanese (unfamiliar to subjects), recorded in the parietal cortex BA7 (top, left and right), as well as in the cingular cortex (bottom: left, medial part BA24; right, anterior part BA32).
Figure 2 shows that for the motor areas, compared to the sensory ones, the potential looks simpler and two-component, N120 - P250, and is similar to the potential in Caput n.Caudati. At the same time, the strongest and most significant difference is found for the middle-latency peaks, in the range of P230 - P350: the amplitude of peaks during the pronunciation of words with a semantic load is greater. A more complex potential is observed in the sensory areas, and the difference between the pronunciation of meaningful and non-meaningful words is much smaller. This means that during internal speech activity the motor zones reflect the meaning of the spoken word to a greater extent. Figure 3 shows the potentials in the parietal and cingular cortex. These potentials are of sufficiently large amplitude, comparable with the potentials in the deep nuclei. Figure 3 shows that for the left- and right-sided areas of Parietal c. BA7, as well as in the middle part of the cingular cortex (G. Cingulate Med. BA24), the potentials are similar and characterized by a two-component structure, P120 - N230, similar to the potentials in Putamen. In these cases, large significant differences between the utterance of meaningful and meaningless words are found at an early stage, around 100 ms. A different pattern is observed in the anterior cingulate cortex (Ant. Cingulate BA32). In this area, the response has a significantly lower amplitude and the opposite sign of
Fig. 4. Comparative connectome (for pronunciation of meaningful words and words in an unfamiliar language), constructed for the strongest connections between the studied structures based on correlation coefficients between EPs (shown at the top of the connections; solid lines represent positive correlation, dotted lines negative correlation), taking into account the time shift (preceding changes are considered to be the fact of influence of one structure on another and are shown by an arrow; the delay time in ms is given at the bottom of the connection). Black shows the connections that are equal in strength under the two conditions. Increased activity of the studied structures for the unknown (Japanese) language (amplitude greater than that of 75% of all) is highlighted in blue, and the highest (greater than 75%) signal amplitude in both conditions is highlighted in yellow.
the two components of the peak structure, N120 - P200, similar to the potential in the caudate nucleus (Caput n. Caudati). At the same time, a small but significant difference between the utterance of meaningful and meaningless words is detected only at a late latency of 300 ms. The obtained reconstructions of the local field potentials using the new method "Virtual implanted electrode" for the structures described above allow us not only to perform an analysis for each of the structures separately, but also to investigate their functional relations based on calculation of correlation coefficients and construction of the connectivity graph. Figure 4 shows the comparative connectome of correlations in the internal utterance of meaningful and meaningless words. Figure 4 shows that the putamen, which activates the right parietal and dorsomedial cortex and the hippocampus, but inhibits the caudate nuclei, amygdala, and anterior cingular cortex, can be distinguished as the main node of directed connections. At the same time, its activity is the highest: it is in the first quantile by signal value. It is considered that the main functions of the putamen are regulation of movement and influence on various types of learning. The hippocampus, insula, and motor area of the right hemisphere (homologous to Broca's area) turned out to be the objects of activation from different levels. The sensory speech area turned out to be connected only with the middle cingular cortex, receiving an inhibitory influence from it. At the same time, the middle cingular cortex itself is activated more strongly in the case of uttering words that do not make sense.
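A minimal sketch of the connection-estimation step as we read the description of Fig. 4: for each pair of reconstructed source signals, find the time shift with the correlation of maximal magnitude and treat the leading signal as influencing the lagging one. The thresholds, significance testing and 75% amplitude quantiles mentioned in the caption are omitted, and the lag window is an assumption.

import numpy as np

def directed_connection(x, y, fs, max_lag_ms=100):
    # Returns (r, delay_ms): the correlation of maximal magnitude between x
    # and y over shifts up to max_lag_ms, and the shift itself. A positive
    # delay pairs x[t + lag] with y[t], i.e. y precedes (influences) x.
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    best_r, best_lag = 0.0, 0
    max_lag = int(max_lag_ms * fs / 1000)
    for lag in range(-max_lag, max_lag + 1):
        a, b = (x[lag:], y[:n - lag]) if lag >= 0 else (x[:n + lag], y[-lag:])
        r = np.corrcoef(a, b)[0, 1]
        if abs(r) > abs(best_r):
            best_r, best_lag = r, lag
    return best_r, 1000 * best_lag / fs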
4 Discussion The relationship between the neural correlates of imagined speech and articulated speech is still a subject of debate. Two early hypotheses about the neural correlates of imagined speech belong to Watson [1], who argued that the neural correlates are similar, and to Vygotsky [2, 3], who argued that they are quite different. A large number of studies testing these hypotheses are based on the model of speech production proposed by Levelt [4]. This model divides the production of articulated speech into several stages: (1) lemma search and selection, (2) phonological code search, (3) syllabic separation, (4) phonetic coding, and (5) articulation. The results of studies based on Levelt's model are inconsistent. Several studies [5, 6] have shown that greater activation of motor and premotor areas occurs during articulated speech, whereas some other studies [7] have shown that greater frontal lobe activation, in contrast, occurs during imagined speech. Overall, the results obtained in this study are in fairly good agreement with the literature and existing theories. For example, in Molfese's experiment [8] three groups of children were selected: a control group (normally developed reading skills), a group with poor reading skills, and a group with dyslexic disorders. Children in the control group showed higher latencies compared to the dyslexic and poorly reading groups, as well as larger N1 amplitudes, while N2 amplitudes were larger in the group with diagnosed dyslexia and the group with poorly developed reading skill. The poorly reading group also showed higher P2 amplitudes. To support Molfese's claim, we suggest looking at the data from Broca's and Wernicke's
zones. In Broca's area the difference in amplitude of the P200 component is very clear, allowing us not only to speak of the efficacy of the virtual implanted electrode method but also to consider whether the results of the Molfese study can be transferred to clinically normal adults. Summarizing, Fig. 4 depicts the reference from lower structures to higher ones, which is also logical for the analysis of imagined speech, because one of the well-known models of the neural representation of articulated speech is the two-streams hypothesis [9–11]. According to this hypothesis, humans have two different auditory pathways, a ventral stream and a dorsal stream, that pass through the primary auditory cortex. In the ventral stream, phonemes are processed in the left superior temporal gyrus (STG) and words are processed in the left anterior STG [12, 13]. In addition, this area responds predominantly to speech rather than semantically comparable environmental sounds. In the dorsal stream, auditory sensory representations are mapped to articulatory motor representations.
5 Conclusion Thus, when mentally uttering meaningful and meaningless words, pronounced responses are found in a number of subcortical structures and cortical brain areas, along with the emergence of a complex functional system of their interconnections. At the same time, the greatest difference between the pronunciation of meaningful and meaningless words is detected at early and middle latencies (100–300 ms). This shows that the subcortical structures involved in movement control (especially the Putamen) not only play an important role in internal enunciation, but are also related to the analysis of the meaning of the words being enunciated. Acknowledgments. Funding: The research was financially supported by the Russian Science Foundation, project No. 20-18-00067.
References
1. Watson, J.: Psychology as the behaviorist sees it. Psychol. Rev. 20, 158 (1913)
2. Vygotsky, L.S.: Psychology of art, vol. 3 (1986)
3. Martin, S., Brunner, P., Holdgraf, C., Heinz, H., Crone, N., Rieger, J.: Decoding spectrotemporal features of overt and covert speech from the human cortex. Front. Neuroeng. 7, 1–15 (2014). https://doi.org/10.3389/fneng.2014.00014
4. Levelt, W.J.M.: Speaking: From Intention to Articulation. MIT Press, Cambridge (1993)
5. Bookheimer, S., et al.: Functional MRI of language: new approaches to understanding the cortical organization of semantic processing. Ann. Rev. Neurosci. 25(1), 151–188 (2002). https://doi.org/10.1146/annurev.neuro.25.112701.142946
6. Shuster, L.: An fMRI investigation of covertly and overtly produced mono- and multisyllabic words. Brain Lang. 93(1), 20–31 (2005). https://doi.org/10.1016/j.bandl.2004.07.007
7. Langland-Hassan, P., Vicente, A.: Inner Speech: New Voices. Oxford University Press, Oxford (2018)
8. Molfese, D.L.: Predicting dyslexia at 8 years of age using neonatal brain responses. Brain Lang. 72, 238–245 (2000). https://doi.org/10.1006/brln.2000.2287
9. Hickok, G., Poeppel, D.: The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007)
10. Hickok, G., Poeppel, D.: Towards a functional neuroanatomy of speech perception. Trends Cogn. Sci. 4, 131–138 (2000)
11. Rauschecker, J.P., Scott, S.K.: Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12(6), 718–724 (2009). https://doi.org/10.1038/nn.2331
12. DeWitt, I., Rauschecker, J.P.: Phoneme and word recognition in the auditory ventral stream. Proc. Natl. Acad. Sci. 109(8), E505–E514 (2012). https://doi.org/10.1073/pnas.1113427109
13. DeWitt, I., Rauschecker, J.P.: Wernicke's area revisited: parallel streams and word processing. Brain Lang. 127(2), 181–191 (2013). https://doi.org/10.1016/j.bandl.2013.09.014
Digital Transformation of the Economy and Industrialization Based on Industry 4.0 as a Way to Leadership and High-Quality Economic Growth in the World After the Structural Economic Crisis Tamara O. Temirova1
and Rafael E. Abdulov1,2(B)
1 National Research Nuclear University MEPhI (Moscow Engineering Physics Institute),
Kashirskoe shosse 31, Moscow 115409, Russia 2 National University of Science and Technology (MISiS),
Leninskiy Prospekt 4, Moscow 119049, Russia
Abstract. The current state of the world economy is in anticipation of rapid transformational changes. The stagnation of world economic globalization, expressed in the strengthening of protectionist measures and the growing introduction of sanctions and trade restrictions, including due to the Covid-19 pandemic, testifies to the crisis of the old world order and the onset of a new one based on qualitatively new equipment, advanced technologies such as NBICS, and a radically different system of economic relations. The way to leadership and to high-quality, crisis-free economic growth lies only through digitalization of the economy, industrialization based on Industry 4.0, and the introduction of a planning system based on the new technological paradigm. Keywords: Digital transformation · Industry 4.0 · New technological paradigm
1 Introduction The "digital economy" concept has been used since the mid-1990s. At that time, technologies of the fifth technological paradigm, based on the active introduction of network and communication technologies, were being actively disseminated. Of course, progress in telecommunication technologies started much earlier, and their active implementation coincided in time with the globalization of the world economy, dating back to the 1970s. This coincidence is not accidental. At the time, the capitalist economy was in a severe crisis known as stagflation. The nature of that crisis can be explained through the process of relative overaccumulation of capital in the real sector after the world economy's final recovery from the Second World War. After stagflation, competition only increased and, accordingly, profit margins began to decline [1]. New approaches were required to overcome the crisis.
2 World Economy Globalization Causes The response of big business and politics in developed countries to these events was expressed in the rejection of Keynesian methods of economic management (support of demand, regulation of employment) and a turn towards globalization and financialization. These processes presupposed, first of all, the transfer of industry to third-world countries with an army of cheap labor. Low wages in the so-called periphery countries made it possible to reduce costs and thus increase corporate profitability. New telecommunication technologies, in turn, made it possible to control businesses effectively over vast distances. It also became easy to build global value chains, that is, processes of interaction between large transnational corporations and companies from peripheral countries on outsourcing principles. In these respects, large capital from the countries of the center was in a more privileged position, since almost all innovations, license packages and pools of patents were assigned to it, that is, everything that allowed it to control the highly profitable stages of global value chains, whereas labor-intensive production with low added value was assigned to businesses from peripheral countries. Globalization brought benefits, first of all, to the countries of the world-system center. The G7 countries were major agents in the rise of the global economy [2]. Some countries from the Asia-Pacific region also took advantage of the globalization outcomes and went down the path of modernization. We are talking about countries that adapted to new relations in the world-system and associated their interests with the interests of the countries of the center. In this way, Japan, South Korea, Hong Kong, Singapore, Taiwan, then Thailand, Malaysia, the Philippines, Indonesia, and finally China and Vietnam managed to modernize. The sequence and mechanics of the modernization of these countries was well analyzed by T. Ozawa [3], who used a flying-geese model to describe the stages of development of this group of countries. This model shows that industrialization initially encompassed labor-intensive industries with low added value, such as food and agriculture. Then modernization extended to metallurgy, the chemical industry, heavy industry and pharmaceuticals, followed by household appliances and computers. At the same time, with each such iteration, production businesses with low added value moved to less developed countries, while highly profitable businesses, on the contrary, were retained in the original country. It is this pattern that resembles the flying-geese model.
3 Financialization and Limits of the World Economy Globalization The advanced technologies in the information and communication field, of course, remained with the countries of the center, first of all the United States, then Japan, and so on. This hierarchy made it possible to effectively integrate each country into the world-system. Simultaneously with telecommunications, the countries of the center still controlled finance, which also promised huge profits due to the growth of speculative transactions. Indeed, since capital could not be profitably invested in the industries of developed countries, it could either be exported to the periphery or migrate to finance, warming up stock markets, over-the-counter markets, mergers and acquisitions, etc. This
process of replacing productive assets with financial ones became known as financialization. Thus, by the end of the 1990s, belief in new telecommunication technologies led to unreasonable optimism and growth in the quotations of such companies, which inflated giant financial bubbles by 2000. In fact, it was a manifestation of the crisis in the capitalist system dating back to the stagflation of the 1970s [1]. The next global economic crisis, in 2008, showed the limits of financialization's development and the threat of speculative bubbles for the entire world economy. However, these events could not force the governments of developed countries to abandon the ideas of globalization. On the contrary, the central banks of the countries of the center began to issue unsecured money in order to stimulate the economy. Such actions strengthened financialization further. Speculative bubbles began to grow at an insane rate. New technologies, first of all elements of the digital economy, made it possible to effectively serve the financialization process. Nowadays, financial capital is able to move around the world at huge speed, which sometimes threatens the financial markets of small sovereign states, since the volumes and capitalization there are incomparable with the financial power of capital from the countries of the center. It should be noted that similar situations, with finance dominating real industry, have existed before. For example, a representative of the world-system approach, G. Arrighi [4], introduced into scientific circulation the concept of systemic cycles of capital accumulation. They consist of two stages: material expansion and financial expansion. Each such cycle developed under the auspices of some hegemon of the capitalist world: Holland, Great Britain and now the United States. Thus, the end of the current financial stage marks the end of the American cycle of accumulation. According to Arrighi, it was financial expansion that was always accompanied by world economy globalization, while material expansion, on the contrary, was accompanied by protectionism.
4 De-globalization and Digital Transformation as a Path to Leadership in the Global Economy Even before the pandemic, the process of globalization had stumbled upon its own limits to growth. Investments became less profitable, and risks and volatility increased in financial markets. We can see a decrease in the activity of the development of global value chains and a decrease in the intensity of world trade and investment [5]. On the one hand, the countries of the center need a tendency to constantly expand the world-system and include more and more countries in it on certain conditions, under which the periphery would technologically and financially depend on the center [6, 7]. However, these plans may be violated. An illustrative example is China, which has been demonstrating high economic growth in recent years and developing knowledge-intensive and high-tech industries. China is making great efforts to develop technologies for Industry 4.0. This state of affairs could potentially threaten some knowledge-intensive industries in the United States and Europe. That is why we are seeing tough sanctions against high-tech companies in China such as Huawei. We also see tension in relations in other industries such as metallurgy and the chemical industry.
The current state of international economic relations has begun to show the prevalence of protectionist tendencies. This is no coincidence, since at the edge of a new technological paradigm it is necessary to protect one's own market from competitors. Only at the stage of maturation of the revolutionary technologies that will be the conductors in the introduction of new basic technologies can we again follow the path of opening markets and globalization. Nowadays, the image of the future leader – a country capable of creating a new industry based on the new, sixth technological paradigm – is being formed. For example, the United States and some European countries are again potential leaders in digital transformation. China ranks 16th in the ranking (see Fig. 1).
[Fig. 1 bar chart, scale 0–120; countries shown: Ukraine, Colombia, Peru, Croatia, Greece, Romania, Russia, Cyprus, Portugal, Slovenia, Saudi Arabia, Belgium, Luxembourg, New Zealand, China, Ireland, Finland, United Kingdom, Sweden, Netherlands, United States.]
Fig. 1. Country-level digital competitiveness rankings worldwide as of 2020, International Institute for Management Development [8].
Digital transformation is another tool for dominating and exploiting the periphery. The acquisition of economic independence and self-sufficiency today is impossible without revolutionary transformations in the digital economy. It is its implementation that will allow one to engage in a broad modernization of the economy, based on a new technological paradigm, including NBIC technologies and Industry 4.0. Thus, if a new accumulation cycle begins, it can proceed under the auspices of a high-tech and science-intensive center. That is why digital transformation is promoted in the main international forums, where plans are drawn up for the digitalization of the world’s economies. Basic principles of the so-called information society were established by the Okinawa Charter in 2000, then the Tunis Commitment Action Plan in 2005, etc. Today, the ideas of digitalization are being actively promoted by the World Economic Forum as an element of the beginning of the fourth industrial revolution [9].
5 Prospects for Digitalization in Russia Russia has joined many international treaties and declarations of this kind, which resulted in Decree of the President of the Russian Federation No. 203 dated 09.05.2017, "On the Strategy for the Development of the Information Society in the Russian Federation for 2017–2030" [10]. This decree presupposes the formation of a knowledge-based information space. In general, the document declares only positive outcomes of digitalization processes: the development of science, education and culture, the availability of services for business and the population, reduced transaction and general costs, increased competitiveness and jobs, etc. These actions are undoubtedly useful for the economy, but they also pose threats, for example, leakage of personal information of citizens, confidential business information and critical information of governmental and other institutions. Of course, the legislation focuses on means of strengthening and using the cryptography tools of Russian developers, the use of Russian software, etc. However, under modern world conditions, the use of exclusively domestic equipment is simply impossible. There is no industrial base in Russia for mass production of microcircuits, chips, microprocessors, logic circuits, etc. There are no corresponding factories for the production of communication and network equipment or modules for implementing Industry 4.0 technologies, and no corresponding component base. Thus, the introduction of domestic information systems and software running on imported basic equipment will not fully protect the entire information space. With the growing contradictions in the global economy, the threat to the digital economy will grow. Due to its peripheral nature, Russia remains a consumer of basic technologies for digitalization and for Industry 4.0. Of course, digitalization cannot be abandoned. On the contrary, it should be used to modernize the economy, for new industrialization based on new technology, and for planning sustainable development, for example using the experience of the Asia-Pacific countries. For a long time it was believed that planning was impossible to organize due to the lack of computational capabilities. Nowadays, with the increase in computing performance and digitalization, such an opportunity already exists. Many technical difficulties can be overcome through the introduction of artificial intelligence and big data analysis [11]. Thus, it is necessary to get rid of the fundamental problems of the Russian economy expressed in peripherality, that is, in the primitivization of the means of production and subordination to global capital accumulation. Only in this way can digital transformation become the basis for a new technological paradigm and high-quality economic growth, especially when the de-globalization of the world economy and the crisis of the world-system also open windows of opportunity for overcoming the peripheral deadlock of development.
References
1. Komolov, O.: Deglobalization and the great stagnation. Int. Crit. Thought 10(3), 424–439 (2020)
2. Temirova, T., Titov, A.: Interests of national economies in the conditions of globalization of the world market. Bull. Moscow State Open Univ. 11, 5–11 (2011)
3. Ozawa, T.: Pax Americana-led macro-clustering and flying-geese-style catch-up in East Asia: mechanism of regionalized endogenous. J. Asian Econ. 13, 700 (2003)
4. Arrighi, G.: The Long Twentieth Century. Verso, London (1994)
5. Abdulov, R., Jabborov, D., Komolov, O., Maslov, G., Stepanova, T.: Deglobalization: the crisis of neoliberalism and the movement towards a new world order. https://doi.org/10.13140/RG.2.2.28808.14087. https://www.researchgate.net/publication/350878182_DEGLOBALIZACIA_KRIZIS_NEOLIBERALIZMA_I_DVIZENIE_K_NOVOMU_MIROPORADKU
6. Dzarasov, R.S.: Place of Russia in the global economy in the context of digitalization. In: Digital Economy: Trends and Development Prospects. Collection of abstracts of the national scientific-practical conference. Publishing house of REU, Moscow (2020)
7. Dzarasov, R.S., Gritsenko, V.S.: Colours of a revolution. Post-communist society, global capitalism and the Ukraine crisis. Third World Q. 41(8), 1285–1305 (2020)
8. IMD World Digital Competitiveness Ranking, pp. 28–29 (2020)
9. Schwab, K.: The Fourth Industrial Revolution. Eksmo, Moscow (2018)
10. Decree of the President of the Russian Federation of 09.05.2017 No. 203 On the Strategy for the Development of the Information Society in the Russian Federation for 2017–2030. http://kremlin.ru/acts/bank/41919/page/2
11. Abdulov, R.E.: Artificial intelligence as an important factor of sustainable and crisis-free economic growth. In: Postproceedings of the 10th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2019 (Tenth Annual Meeting of the BICA Society), August 15–19, 2019, Seattle, Washington, USA, pp. 468–472 (2019)
The Use of the Economic Cross Model in Planning Sectoral Digitalisation Dmitriy Vladimirovich Timokhin(B) National Research Nuclear University MEPHI, 115409 Moscow, Russia
Abstract. The article examines methodological approaches to organizing the planning and forecasting of digitalization processes at the industry level. The obstacles and problems faced by the beneficiaries involved in planning and forecasting digitalization are investigated, along with possible ways to eliminate them based on the "economic cross" methodology. Specific requests for expanding the use of digital technologies in nuclear energy are examined in relation to the economic situation around the "Proryv" ("Breakthrough") project of the state corporation Rosatom. Directions for the use of the "economic cross" method are proposed in order to ensure complex interaction of the participants in digitalization processes in the nuclear industry, using the example of planning, forecasting and the formation of contractual agreements between representatives of the state, Rosatom and other beneficiaries within the framework of the development of the digital component of the Proryv project. The features of the "economic cross" methodology for planning digitalization, taking into account current macro-market sectoral factors such as the priority of import substitution, sanctions risks, and aggravated competition between global technological alliances, are investigated. Based on the results of the analysis, a system of decisions is proposed for organizing digitalization planning in the nuclear power industry, using the example of the Proryv project for the formation of a two-component nuclear power industry. Keywords: Sectoral digitalization · Nuclear energy · Breakthrough project of the state corporation Rosatom · Economic modeling · Big-data analysis
1 Introduction During 2010–2020, the development of the digital component of industrial production and economic processes, outstripping the average growth rate of industry, was the most important development trend of the world's leading economies. At the same time, during this period digitalization was carried out mainly at the level of individual modules of the economic and production processes of private participants in economic relations. This situation created an asymmetric digital transformation at the industry level. The most significant manifestations of digital asymmetry were:
a) heterogeneity of access to Internet infrastructure and the related infrastructural capabilities of different regions of Russia;
b) insufficient involvement of some representatives of regional business in the system of economic relations using modern technological capabilities;
c) the outflow of capital and specialists from the least technologically developed regions of the country, which consolidated their technological degradation.
A feature of Industry 4.0 is a higher mutual infrastructural dependence between sectors of the national economy than in a traditional industrial economy. A lag of at least one system-forming branch of the national economy in the quantitative and qualitative indicators of involvement in the use of digital technologies makes it impossible to fully develop such innovative areas of industrial technological development as:
a) the Internet of Things, due to the lack of a technologically unified information space for transferring information from the sender's object to the recipient's object;
b) the introduction of big data, due to the incompleteness of network information about some of the potential contractors of an industry manufacturer and the lack of a single digital information space;
c) the outsourcing division of labor and the involvement of small innovative enterprises in sectoral economic and production processes.
At the same time, in the period 2010–2019 there was a trend of outstripping growth in the capitalization of industrial companies that form their own digital ecosystems. This trend is typical for both the Russian economy (Sber, Yandex) and foreign ones (Google, Apple, Facebook). The example of the BEER brand in Russia and the Facebook brand in the United States indicates the expediency of cross-industry convergence for industry-leading companies: refusal to position oneself as a single-industry supplier, although this is difficult in terms of product structure, and transition to the role of a "style of consumption" supplier. One should also note the need to restructure the state system for supporting and stimulating the development of sectoral and inter-sectoral digitalization as the technological convergence of the digital interaction schemes of various participants in the sectoral economy based on universal digital ecosystems. The priority area of such support, from the point of view of the Russian economy, is to ensure the digital reintegration of the production and infrastructural potential of strategically important regions of the country, which currently lag significantly behind the regions leading the industrial development rating. First of all, these are the Far Eastern regions, the North Caucasus and the regions adjacent to the largest Russian megalopolises but economically isolated from the non-digital (traditional) infrastructure of the corresponding megalopolises. The task of building digital infrastructure for innovative backbone industry projects is especially urgent. Such projects include the project of the state corporation Rosatom
"Proryv" ("Breakthrough"). A study of its infrastructure needs revealed the following requests for digital infrastructure elements:

a) IaaS;
b) control automation systems;
c) software systems for the modernization and support of virtual nuclear power plants;
d) automated control facilities with minimal human participation;
e) big data tools for analyzing innovative technological solutions in the field of two-component nuclear power.
2 Assessment of the Requests of the Russian Economy for the Digitalization of Its Individual Industries in the Context of the Analysis of the Events of the Covid-19 Pandemic

The assessment of the requests of the sectors of the Russian economy for digital transformation was carried out by comparing the economic results of these sectors in 2020 and in the period from January to July 2021. As a benchmark, the average performance of the respective industries in Singapore, South Korea and Japan, which are in the top 10 countries of the digital development rating, is used. The analysis covered the industries that are relevant from the point of view of infrastructure support for the project of the state corporation Rosatom "Proryv". The results of the analysis are presented in Fig. 1.
[Figure 1: radar chart comparing the performance indicator for Russia (based on data for the Central Federal District) with the integrated benchmark across five sectors: Finance, Construction, IT sector, Trade, Transport.]

Fig. 1. Comparative characteristics of the economic performance of some sectors of the Russian economy and the reference integrated indicator, 2020 – July 2021, percent. Compiled by the author based on [1–8].
Figure 2 shows the results of a comparative assessment of the development of individual elements of the digital infrastructure in particular sectors of the Russian economy.
Comparison of the financial results presented in Fig. 1 with the assessment of the volume of industry investments in basic digital infrastructure presented in Fig. 2 allows the following conclusions to be drawn.

[Figure 2: radar chart comparing the performance indicator for Russia (based on data for the Central Federal District) with the integrated benchmark across the average indicator of digital infrastructure development and five sectors: Finance, Construction, IT sector, Trade, Transport.]

Fig. 2. Comparative assessment of integrated indicators of sectoral development of digital infrastructure and averaged integrated indicators calculated for Singapore, South Korea and Japan, in percent. Compiled by the author on the basis of [1–8].
First, the economic performance of the sectors of the economy is directly related to the effectiveness of the development of their sectoral digital infrastructure. Second, there is a clear correlation between an industry's position in the ranking by economic performance and its position in the ranking by the integrated indicator of digital infrastructure development. The study also revealed that the leaders in economic efficiency are the industries that make the most use of the cross-sectoral ecosystem digital infrastructure. The correlation index between the degree of use of an industry's basic digital infrastructure and its economic performance was 0.63, while for the use of the cross-industry digital infrastructure ecosystem it reaches 0.76. In general, the correlation between the digital involvement of sectors of the Russian economy and the average indicators of their economic performance for 2020–2021 corresponds to the indicators typical for Japan, South Korea and Singapore. This correspondence allows us to assert that in Russia there is a significant demand for the formation of an inter-sectoral ecosystem digital infrastructure as a condition for the formation of Industry 4.0 in the country.
At the same time, it should be noted that Russia lags behind the leading countries in the digital infrastructure development rating. The reasons for this lag include:

a) a funding gap;
b) the low investment attractiveness of the inter-sectoral ecosystem digital infrastructure for private capital;
c) the asymmetry of the regions in the provision of basic elements of the inter-sectoral ecosystem infrastructure, such as the access of the population and business to the Internet and the number of electronic payment terminals per capita.

Given these problems, none of Russia's industries is self-sufficient in its economic capabilities for digital transformation. The solution is seen in the use of a synergistic effect, within which the development of an inter-sectoral infrastructure ecosystem is financed by the participants of the sectoral business on an equal footing. The economic efficiency of such financing will increase due to the synergistic effect, while the scheme for organizing state support for sectoral digital transformation will be simplified by eliminating duplicate requests from sectors and harmonizing funding over time. Of great importance for the implementation of such financing is the selection of a methodology that makes it possible to link industry infrastructure demands and to rank them according to the criterion of the complex economic attractiveness of the smart technologies they support.
3 Assessment of the Economic Attractiveness of the Formation of the Digital Infrastructure of the Project of the State Corporation "Breakthrough" Based on the Model of the Economic Cross

The economic cross model is a vector model of the closure of isolated technological processes based on projects that form consumer value. The economic cross consists of two technological processes of the maximum level, each of which can include technological processes represented by economic crosses of lower levels. If the closure of more than two technological processes is necessary for the formation of consumer value (an innovative good), the technological process to be evaluated is selected as independent, and the remaining technological processes are presented as a single integrated technological process. Taking into account the infrastructure needs, we present a model of the economic cross of the project of the state corporation Rosatom "Proryv" in Table 1.

Modeling the development of the project of the state corporation Rosatom on the basis of the economic cross is carried out in three stages (a computational sketch of this selection procedure is given below):

a) all elements of the technological process that form R_i and C_j are selected;
b) the continuum of R_i and C_j is calculated, and a set of economic crosses is built on the basis of their possible closures of technological processes;
c) the option of closing R_i and C_j is chosen so as to maximize the function F = \sum_{i=1}^{5} \sum_{j=1}^{5} (R_i - C_j).
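As a rough computational sketch of stages (b) and (c), the fragment below enumerates possible closures of revenues R_i with costs C_j and selects the combination maximizing F. All numeric values and the feasibility matrix are hypothetical placeholders, not project data, and treating a closure as an optional pairing of one revenue with one cost is only one possible reading of the procedure.

```python
# A minimal sketch of stages (b)-(c); all values and feasibility are illustrative.
R = [3.1, 2.4, 1.8, 2.9, 1.5]   # hypothetical revenues R1..R5 (e.g., bn RUB)
C = [1.2, 0.9, 1.4, 1.0, 0.7]   # hypothetical costs C1..C5
# feasible[i][j] == 1 if revenue R_i can technologically be closed with cost C_j
feasible = [
    [1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
]

def search(i=0, used=frozenset(), f=0.0, chosen=()):
    """Recursively decide which feasible cost C_j (if any) closes each R_i,
    maximizing F = sum of (R_i - C_j) over the chosen closures."""
    if i == len(R):
        return f, chosen
    best = search(i + 1, used, f, chosen)          # option: leave R_i unclosed
    for j in range(len(C)):
        if feasible[i][j] and j not in used:       # option: close R_i with C_j
            cand = search(i + 1, used | {j}, f + R[i] - C[j], chosen + ((i, j),))
            if cand[0] > best[0]:
                best = cand
    return best

F_max, closures = search()
print(F_max, closures)   # maximum F and the selected (R_i, C_j) pairs
```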
Table 1. Model of the economic cross of the digital infrastructure of the state corporation Rosatom "Breakthrough" project. Developed by the author.

| Digital infrastructure element | Finance | Construction | Trade | IT industry | Transport | Associated costs |
|---|---|---|---|---|---|---|
| IaaS | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | Server purchase (costs C1) |
| Control automation systems | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | Planning the life cycle of modules (costs C2) |
| Software systems for modernization and support of virtual nuclear power plants | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | Creation of a unified information and software ecosystem (costs C3) |
| Automated control facilities with minimal human participation | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | Planning and organization of the replacement of management chains with digital protocols (costs C4) |
| Big data tools for analyzing innovative technological solutions in the field of two-component nuclear power | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | P = Ri − Cj | Reorganization of the contract network and optimization of contract terms based on big data analysis (costs C5) |
| Sectoral revenue | Ensuring the safety and sustainability of long-term financial flows (revenue R1) | Formation of "smart" modular production in two-component nuclear energy (revenue R2) | Optimization of the network of contracts with buyers and suppliers using big data analysis (revenue R3) | Process virtualization (simulation) (revenue R4) | Industry-wide real-time traffic management and planning (revenue R5) | |
4 Conclusions

Thus, the use of the economic cross model in the design of sectoral digitalization makes it possible to select the technological solutions that provide the maximum synergistic economic effect on an industry-wide scale. The use of the economic cross is recommended when distributing government funding for the "Breakthrough" project.
References

1. Portal site of official statistics of Japan. https://www.e-stat.go.jp/en
2. Korean Statistical Information Service. https://kosis.kr/eng/
3. A Singapore Government Agency Website. https://www.singstat.gov.sg/
4. Russian Federal State Statistics Service. https://eng.rosstat.gov.ru/
5. Analytical site Testfirm (sectoral data). https://www.testfirm.ru/keyrates/
6. JETRO Global Connection – Accelerate Innovation with Japan (2020). https://www.jetro.go.jp/en/jgc/reports/2020/6790871cde54c518.html
7. OECD.ORG Innovation and Technology. https://data.oecd.org/korea.htm#profile-innovationandtechnology
8. Economic Survey of Singapore 2020, Chapter 6: Sectoral Performance. https://www.mti.gov.sg/-/media/MTI/Resources/Economic-Survey-of-Singapore/2020/Economic-Survey-of-Singapore-2020/Ch6_AES2020.pdf
9. Vidushini, S., Hoppe, T., Mansi, J.: Green buildings in Singapore: analyzing a frontrunner's sectoral innovation system. Sustainability 9(6), 919 (2017). https://doi.org/10.3390/su9060919
10. Timokhin, D.V.: The use of digital tools in the formation of two-component nuclear energy on the base of economic cross method. In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) BICA 2020. AISC, vol. 1310, pp. 508–516. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_62
11. Putilov, A.V., Timokhin, D.V., Bugaenko, M.V.: The use of the economic cross method in IT modeling of industrial development (using the example of two-component nuclear energy). In: Samsonovich, A.V., Gudwin, R.R., Simões, A.d.S. (eds.) BICA 2020. AISC, vol. 1310, pp. 391–399. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_47
12. Plotonova, O.: Rosatom creates a virtual nuclear power plant. Electr. J. Atom. Expert (2020). https://atomicexpert.com/virtual_npp_rosatom
13. Putilov, A.V.: On the prospects of digital transformation in nuclear industry (2020). http://sng-atom.com
Modeling the Economic Cross of Technological Platform for Sectoral Development in the Context of Digitalization

Marina Vladimirovna Bugaenko1, Dzhannet Sergoevna Shikhalieva2, and Dmitriy Vladimirovich Timokhin1,2(B)

1 National Research Nuclear University MEPhI, 115409 Moscow, Russia
2 Moscow State University of Humanities and Economics, 109044 Moscow, Russia
Abstract. The article examines the trends in the technological development of modern society from an economic point of view. The most valuable areas of technological development, and the infrastructural and economic conditions for the long-term support of the relevant trends, have been identified. The economic problems and demands that businesses face when designing new industries based on smart technologies, and the range of solutions currently in use, are investigated in the context of the coronavirus pandemic and the forced experiment of scaling up the technological platforms that existed at that time. Taking into account the development trends of the innovative economy in 2021–2022, forecasts were made regarding the further development of technological platforms as a factor in the formation of a "smart" economy, and the platforms most effective from the point of view of the current geoeconomic environment have been identified. The applicability of the economic cross model to the design of inter-industry technological platforms has been demonstrated. The areas in which the formation of technological platforms based on the economic cross model is economically justified are identified, as are the barriers to the introduction of smart technologies into sectoral reproduction chains that can be removed through the development of such platforms. The directions of adaptation of the economic cross methodology to the needs of forming a "smart" technological platform for nuclear energy (within the framework of the project of the state corporation Rosatom) are proposed.

Keywords: Technological platforms · Smart technologies · Industry economic modeling · Technological convergence · Innovation · Rosatom
1 Introduction

The system-forming trend in the technological development of economic sectors is the consolidation of individual industrial producers within the production chains of aggregated innovative products. This trend formed in the period 2000–2010 but received a significant impetus in 2019–2020 as a result of the Covid-19 pandemic, which caused the transfer to the online sphere of a part
of production and economic interactions that were previously implemented mainly in an offline format.

The main result of the Covid-19 pandemic is the growing demand for complex "smart" products whose producer structure is cross-sectoral in nature. The separate consumption of individual high-tech products, implying significant structural differentiation of consumption, is being replaced by standard package demand. The package demand for standardized innovative products implies:

a) a standard structure of the inter-sectoral product, both in terms of its elements and in terms of the structure of its price;
b) an increase in the share of secondary goods and services in an innovative product, such as standard insurance, standard delivery services, and information security services for consumption;
c) the inclusion in the package product of services related to ensuring uninterrupted access to the product; pricing for such services is based on the public sector economic cross model.

In general, in 2022–2025 the development of the innovative product will move toward the formation of a "smart" ecosystem of convergence of inter-sectoral development, production, promotion and delivery of the innovative product to the client on the basis of a single inter-sectoral technological platform. At the same time, there are still systemic obstacles to the cross-sectoral technological integration of the manufacturers of an innovative product based on "smart" technological platforms. A review of the practices of industry companies in the United States, Great Britain, China and Russia during the Covid-19 pandemic revealed the following systemic obstacles to technological integration:

a) the inability of companies, especially small and medium-sized businesses, to work with big data; for this reason, organizations prefer to build economic relations with counterparties in the institutional space familiar to them;
b) the lack of uniform standards for the presentation of commercially significant information by companies operating in different industries; for this reason, finding the optimal value chain partner from another industry for the production of an integrated "smart" product is difficult.

The purpose of this article is to develop proposals for modeling an industry technological platform, taking into account the threats and opportunities caused by the outstripping growth of economic needs for smart market technologies in 2010–2018 and by the removal of behavioral barriers to their mass introduction into value chains after the 2019–2020 coronavirus crisis.
2 Economic and Technological Features of the Economic Cross of the Industry Technology Platform

The systemic trend of global technological development in the context of the introduction of "smart" technologies and the supplementing of offline technological chains with a
digital component is technological convergence. Industry-specific innovative solutions currently available on the global market meet the technological requirements of both the manufacturer and the global market. Moreover, a significant part of the technological potential available on the market has not yet been introduced into production. Figure 1 shows the author's estimates of the ratio between the existing economic potential of individual elements of "smart" technology platforms and their actual use.

[Figure 1: bar chart contrasting market capacity with assessed market demand at the end of 2018 (USD billion) for five platform elements: big data analysis toolkit; systems for ensuring remote interaction, including the organization of remote work; smart management platforms; remote control and monitoring systems; B2B, B2C, B2G platforms.]

Fig. 1. Assessment of the volume of use of individual elements of "smart" technology platforms and their specific use in the pre-Covid period, billions of dollars.
The processes in the economy caused by the forced experiment of transferring part of economic and industrial relations to the online format led to a narrowing of the gap between the potential capacity of the global market for smart technologies and the real needs of the economy. The author's assessment of the ratio of these indicators is shown in Fig. 2. Comparison of the information presented in Figs. 1 and 2 allows us to draw the following conclusions.

1. Events in the global market for smart technologies have not led to a change in the structure of demand for them.
2. The lag of the actual demand for smart technologies behind the estimated market capacity decreased for all smart technologies presented in Figs. 1 and 2; the structure of the remaining gap has changed, with the smart technologies that enable interactive interaction leading the reduction of the gap.
3. The potential for the technological development of traditional business is preserved through the replacement of traditional forms of interaction with smart technologies.
[Figure 2: bar chart contrasting market capacity with assessed market demand (USD billion) for the same five platform elements as in Fig. 1: big data analysis toolkit; systems for ensuring remote interaction, including the organization of remote work; remote control and monitoring systems; smart management platforms; B2B, B2C, B2G platforms.]

Fig. 2. Assessment of the volume of use of individual elements of "smart" technology platforms and their specific use as of August 2021, billion dollars.
The Russian companies that are prioritized in terms of technological development focused on the involvement of smart technologies are shown in Fig. 3. The most capacious market for the period 2022–2030 is the market for smart technologies for traditional activities, such as trade, translation, finance and other services that involve the organization of package consumption. The technological trend in the development of the relevant areas will be the introduction of smart technologies, the most significant of which are identified in Fig. 3.

[Figure 3: horizontal bar chart of the share of respondents marking each technology as a priority: AI 39%; big data 37%; remote control and monitoring systems 35%; bionanotechnology 29%; cloud computing systems 29%; global connection systems 28%; interface 27%; other (average) 12%.]

Fig. 3. Main technological trends for the period 2022–2030; the percentage of respondents who marked the relevant technology as a priority is indicated. Prepared based on the results of the author's survey of representatives of Russian companies (ROSATOM, Yandex, Mail.ru, Sber).
The implementation of the appropriate technologies is proposed on the basis of the formation of the "economic cross" of the smart technology platform. The corresponding economic structure represents the closure of the technological and economic cycles. An example of closing such a cycle is presented in Table 1.

Table 1. An example of closing the "economic cross" for the product life cycle.

| Life cycle stage | AI | Big data | Remote control and monitoring systems | Cloud computing systems | Global connection systems | Interface |
|---|---|---|---|---|---|---|
| Market entry | y/n | y/n | y/n | y/n | y/n | y/n |
| Development | y/n | y/n | y/n | y/n | y/n | y/n |
| Saturation | y/n | y/n | y/n | y/n | y/n | y/n |
| Organization of re-export | y/n | y/n | y/n | y/n | y/n | y/n |
The choice of specific solutions for the organization of the economic cross of the technological platform at each intersection of a row and a column in Table 1 is determined by the specific challenges facing the industry.
3 Modeling a Technological Platform for Interaction with the External Environment for the State Corporation Rosatom

The most significant obstacles to the development of nuclear energy are the lack of public trust and the information asymmetry of the space in which nuclear scientists and other members of society interact. The resolution of this issue will be facilitated by the formation of an information technology platform that promotes the popularization of nuclear energy. The proposed economic cross of such a platform is presented in Table 2 and includes the following elements:

1. A system for collecting information on the main concerns of the population regarding the safety of nuclear power, with software support for its analysis and interpretation, and a system for informational feedback using the potential of social networks and other smart inter-sectoral infrastructure.
2. An analytical system that provides a consolidated collection of information about technical solutions that are relevant to the current innovation projects of the state corporation Rosatom.
3. A design system for developing the global geography of the presence of the state corporation Rosatom, taking into account the geography of public opinion loyalty to nuclear energy and the trends in its development, for the reformatting of public opinion in the target foreign markets of the state corporation Rosatom.
4. Organization of the collection of information on the effectiveness of safety control processes at nuclear power plants and the presentation of this information, taking into account the technological and information needs of society.
5. Ensuring timely, complete and comprehensive information interaction between the state corporation Rosatom and stakeholders on the entire spectrum of issues of socio-economic discussion around the local (regional) prospects of nuclear energy.
6. Automation of the processes of interaction with potential foreign partners of the state corporation Rosatom, including foreign investors, creditors, governments and other interested parties.

Table 2. Model of the economic cross of the information technology platform of interactions of the state corporation "Rosatom" with the external environment.

| Life cycle stage | AI | Big data | Remote control and monitoring systems | Cloud computing systems | Global connection systems | Interface |
|---|---|---|---|---|---|---|
| Market entry | – | 1 | – | – | 2 | – |
| Development | 3 | 1 | 4 | – | 5 | – |
| Saturation | – | – | 4 | – | – | 6 |
| Organization of re-export | 6 | – | – | – | – | – |
4 Conclusions

Based on the assessment of the changing requirements of the global market for smart technological platforms, directions for the adaptive modernization of the technological platform of interaction with the external environment are proposed for the state corporation Rosatom. Taking into account the current trends in the development of the energy market, the technologies necessary to ensure the digitalization of the nuclear power industry have been identified. The solutions and recommendations formulated for the energy industry in the abstract are made concrete through the example of their use in solving the problems facing the state corporation Rosatom in the post-coronavirus period. The economic effect that the state corporation Rosatom is expected to obtain from the introduction of digital platforms into its production and economic activities is calculated using the economic cross method.
References

1. Matyushok, V., Krasavina, V., Berezin, A., García, J.: The global economy in technological transformation conditions: a review of modern trends. Econ. Res. Ekonomska Istraživanja 34, 1–41 (2021). https://doi.org/10.1080/1331677X.2020.1844030
2. Adelino, R., Silveira, R.C.: ZeroWaste: technological platform to promote solidarity in smart cities. Int. J. Entrepreneurship Gov. Cogn. Cities 2, 61–82 (2021). https://doi.org/10.4018/IJEGCC.2021010105
3. Samah, A., Ahood, H.: Cyber security of internet of things platform, smart home as example, pp. 27–37 (2021)
4. Luzgina, K., Popova, G., Manakhova, I.: Cyber threats to information security in the digital economy. Adv. Intell. Syst. Comput. 1310, 195–205 (2021). https://doi.org/10.1007/978-3-030-65596-9_25
5. Kosel, K., Miff, S.: Technology Platform Track (2020). https://doi.org/10.4324/9781003010838-5
6. Wang, H., Yongjia, F.: Research on automotive cloud technology platform and solutions. J. Phys. Conf. Ser. 1915, 042013 (2021). https://doi.org/10.1088/1742-6596/1915/4/042013
7. Pimenova, O.V., Repkina, O.B., Timokhin, D.V.: The economic cross of the digital post-coronavirus economy (on the example of the rare earth metals industry). In: Brain-Inspired Cognitive Architectures for Artificial Intelligence: BICA*AI 2020. Advances in Intelligent Systems and Computing, vol. 1310. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-65596-9_45
Cooperative Multi-user Motor-Imagery BCI Based on the Riemannian Manifold and CSP Classifiers

Sergey A. Titaev(B)

National Research Nuclear University MEPhI, 31 Kashirskoe highway, 115409 Moscow, Russia
Abstract. This work belongs to the field of neuroscience, specifically neuroengineering, which studies, restores, or enhances the nervous system using engineering methods. Research in this area can provide a detailed scientific understanding of how the brain works and of many of its complex processes and their nature. Based on this knowledge, we can also find ways to rehabilitate various pathologies and develop neuroimplants and neuroprosthetics, including for people with disabilities. Research and projects that require the cooperative interaction of several users are promising: examples include the study of processes corresponding to the psychological or social aspects of human behavior, and the development of interfaces for integrating existing electronic computing systems and mechanisms into control loops. This article describes work that includes research into the possibility of creating a cooperative BCI, an attempt to create one, and research into the possibility of using BCI in the rehabilitation of disorders of brain activity caused by stroke.

Keywords: Electroencephalography (EEG) · Brain-computer interface (BCI) · Neuroactivity · Neuroplasticity · Classification of conditions
1 Introduction

The brain-computer interface is a system that implements the exchange of information between the brain and an external electronic device by reading the electrical activity of the brain and converting it into signals for the external device. Readings can be performed using a variety of invasive and non-invasive neuroimaging technologies, but the most common is the electroencephalogram. A key property of a BCI is the ability to implement a channel for the exchange of information with the external environment that is independent of conventional pathways such as peripheral nerves and muscles. The use of this additional channel requires conscious control by the user and is a trainable skill [1]. The use of brain-computer interfaces is not limited to clinical medicine, but such developments are most relevant and necessary for people who are deprived (due to illness or injury) of the ability to use natural neural pathways to control their own body or interact with the outside world. Another promising direction is the development
of devices and appropriate software that can expand the capabilities of the human body and bring interaction with computers to the physiological level.
2 BCI Application in Neuroplasticity and Potentiation of the Sensorimotor Cortex of the Brain in Stroke Diseases

In the acute period of a stroke, functional recovery occurs as a result of the disinhibition of partially damaged parts of the brain and the subsequent resumption of their activity. Later, the mechanisms of neuroplasticity are involved in the rehabilitation process. Here the term "neuroplasticity" covers various mechanisms, such as plasticity (the ability to rearrange), substitution (change in the localization of the representation of a function in the cortex), compensation (involvement of additional motor cortical zones), reorganization, and so on. Evidence for the existence of neuroplasticity can be found in the numerous animal experiments carried out over the past two decades, which showed that the morphofunctional organization of the neuronal structures of the cerebral cortex can be modulated both by damage to the peripheral and central nervous systems and by learning [2].

The core component of neuroplasticity is the synapse, a dynamic formation that is the main active vector of functional changes [3]. Early structural transformations (in the number, size, and shape of dendrites) that occur some time after damage to areas of the brain are most likely associated with the synthesis of new proteins, as well as with the action of a number of growth factors and neurotrophins [4]. Since neuroplasticity is initiated by internal and external factors, some control of this process is possible. Thus, the rehabilitation of patients after acute cerebrovascular accident, which consists of the repeated performance of a certain set of tasks, makes it possible to stimulate neuroplasticity, ultimately consolidating the stereotype of one movement and inhibiting another.

Using the methods of functional neuroimaging, it was found that the activation of the sensorimotor areas of the cerebral cortex can be caused by the observation of a motor act, its mental image, or the passive performance of a movement [5]. In patients after a stroke, training sessions are able to expand the area of representation of certain muscle groups in the M1 zone of the cerebral cortex, with a clear correlation with an increase in the strength and range of motor skills [5, 6]. Thus, the scientific rationale for the use of BCI in neurorehabilitation is provided by the above data on the influence of the process of imagining movement on the processes of neuroplasticity [7].

The use of EEG sensorimotor rhythms in BCI seems the most promising for neurorehabilitation, because this phenomenon is associated with the motor areas of the brain, while desynchronization of the sensorimotor rhythm does not require real movement, only its imagination [8]. Thus, the natural type of mental activity that can be recognized in a BCI system is simply imagining the movement of some executive organ (a limb, the tongue) or some real object. Imagining the movements of different organs generates a different distribution of activity over the surface of the cerebral cortex and different spatial EEG patterns. This facilitates the classification task and improves the accuracy of recognizing the states evoked by the stimuli presented to the user.
3 Result of Designing a Cooperative BCI for the Rehabilitation of Post-stroke Patients

A cooperative BCI with neuro-feedback was designed for two users for the rehabilitation of post-stroke patients. The neural feedback is implemented as a two-dimensional maze in which an object moves. The users control this object by imagining its movement: one user controls the movement up and down, the other left and right. The system operates in two stages, a preparatory (training) stage and the main work cycle (see Fig. 1 and Fig. 2).
Fig. 1. The calibration stage of BCI operation: graphical interpretation.

Fig. 2. The main stage of BCI operation: graphical interpretation.
At the training stage of the system, three processes are launched in parallel: one central process and two processes for receiving and processing data. In the central process, a random maze is generated using a randomized depth-first search algorithm (or Prim's
algorithm; the choice is made before starting the system). Then a server is opened in the central process, to which the client processes will later join. At the same time, the two other processes connect to the EEG data streams of each of the users. The learning process begins: with the help of a special interface (see the example in Fig. 3), users receive information about which movement image should be imagined at each moment. Thus, EEG samples of fixed duration corresponding to known classes are collected. Then, using the implemented algorithms (described in the following sections), the obtained data is processed mathematically. Finally, clients are opened and joined to the central process of the software system. This completes the training phase.

The main stage of the work of the developed BCI is cyclical. After completing the training stage, a new window with a maze appears on the screen, which users are invited to traverse. Each user imagines his own component of the movement of the object based on its current position. Client processes receive samples of brain activity of fixed length, classify them using the training data, and transmit the results to the server (the central process). Based on the data obtained, the central process decides on the movement of the object in the maze and animates the movement. Users see the movement and, in accordance with it, "go" further along the maze, adjusting their brain activity depending on the position of the object on the screen. This continues until the object reaches the end of the maze, which is marked in a special way. When the maze is completed, the system stops working and users are notified.
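For illustration, below is a minimal sketch of the randomized depth-first search maze generation mentioned above; the grid representation and function names are illustrative choices, not taken from the system's actual source code.

```python
import random

def generate_maze(width, height):
    """Randomized depth-first search ("recursive backtracker") maze
    generation; Prim's algorithm is the alternative mentioned above."""
    moves = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
    opposite = {"N": "S", "S": "N", "E": "W", "W": "E"}
    # Every cell starts with all four walls intact
    walls = [[{"N", "S", "E", "W"} for _ in range(width)] for _ in range(height)]
    visited = [[False] * width for _ in range(height)]

    stack = [(0, 0)]
    visited[0][0] = True
    while stack:
        x, y = stack[-1]
        unvisited = [(d, x + dx, y + dy) for d, (dx, dy) in moves.items()
                     if 0 <= x + dx < width and 0 <= y + dy < height
                     and not visited[y + dy][x + dx]]
        if unvisited:
            d, nx, ny = random.choice(unvisited)  # random branching shapes the maze
            walls[y][x].discard(d)                # knock down the shared wall
            walls[ny][nx].discard(opposite[d])
            visited[ny][nx] = True
            stack.append((nx, ny))
        else:
            stack.pop()                           # dead end: backtrack
    return walls                                  # walls[y][x] = remaining walls
```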
Fig. 3. An example of the training interface in operation: User 1 is instructed to imagine movement to the left and User 2 movement up. There are also commands for imagining movement to the right and down, in addition to the pause between these states (the rest state).
4 Description of the Brain-Activity Conditions Classification Algorithms and Presentation of the Testing Results on the EEG Simulator

4.1 Description of the Brain-Activity Conditions Classification Algorithms and Methods Used in the Implemented BCI

There are several popular approaches to solving this problem: the CSP algorithm (Common Spatial Patterns), the concepts of Riemannian geometry, and deep neural networks. This work uses the CSP algorithm and the Minimum Riemannian Distance to Mean (MRDM) algorithm. These algorithms were selected as offering the best trade-off between input data requirements, computational complexity, and simplicity of presenting results and features [9].

The CSP classification method is based on the simultaneous diagonalization of the covariance matrices of each class. First of all, the CSP transformation is applied (this transformation is described in more detail in [9]):

Z_i = W X_i,    (1)

where X_i is a matrix of signal amplitudes of size E × T, E is the number of electroencephalograph electrodes used, and T = f_s · t is the number of time counts for a sample of length t at a sampling frequency f_s. W is the matrix of the CSP transformation:

W = (B^T P)^T,    (2)

where P is a whitening matrix and B is the matrix resulting from the spectral decomposition of the mean covariance matrices of each class (these matrices must first go through the whitening process). After applying the CSP transformation, the logarithm of the normalized variance of the 2m best components is taken as the classification feature (taking the logarithm is necessary to bring the feature distribution closer to normal):

f_p = log( var(Z_p) / \sum_{i=1}^{2m} var(Z_i) ),    (3)

where the m best components for classification correspond to the first and last m eigenvectors of matrix B.
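As an illustration of formulas (1)–(3), here is a minimal sketch of CSP feature extraction for two classes. It uses the generalized eigendecomposition formulation, which is mathematically equivalent to the explicit whitening plus spectral decomposition described above; the function names and normalization details are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.linalg import eigh

def fit_csp(trials_a, trials_b, m=2):
    """trials_a, trials_b: lists of (E, T) arrays (electrodes x time counts).
    Returns a function mapping one trial to its 2m log-variance features."""
    def mean_cov(trials):
        # Trace-normalized spatial covariance, averaged over one class's trials
        return np.mean([X @ X.T / np.trace(X @ X.T) for X in trials], axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Joint diagonalization of both class covariances via the generalized
    # symmetric eigenproblem  Ca v = lambda (Ca + Cb) v
    eigvals, B = eigh(Ca, Ca + Cb)
    order = np.argsort(eigvals)[::-1]
    B = B[:, order]
    # Keep the first and last m spatial filters (the most discriminative ones)
    W = np.hstack([B[:, :m], B[:, -m:]]).T

    def features(X):
        Z = W @ X                    # formula (1)
        v = Z.var(axis=1)
        return np.log(v / v.sum())   # formula (3)

    return features
```

The resulting 2m-dimensional feature vectors can then be passed to a linear classifier such as scikit-learn's LinearDiscriminantAnalysis, matching the LDA step used in Sect. 4.2.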
The MRDM classification method is a generalization of the nearest-neighbor method to a Riemannian manifold, based on the assumption that objects close to each other belong to the same class. The algorithm considers the metric space between the signals of the electroencephalograph [10]. To determine the proximity of samples, the Riemannian metric of the distance between the covariance matrices of the samples under consideration is introduced:

δ_R(C_1, C_2) = || log(C_1^{-1} C_2) ||_F = [ \sum_{i=1}^{n} log^2(λ_i) ]^{1/2},    (4)

where C_1, C_2 are covariance matrices of EEG probes and λ_i, i = 1, ..., n are the eigenvalues of the matrix C_1^{-1} C_2. After receiving all samples of each class, the covariance matrix of each sample is calculated and, using the introduced metric, the geometric mean of each class is found:

C̄^k = arg min_C \sum_{i=1}^{n_k} δ_R^2(C, C_i^k),    (5)

where C̄^k is the geometric mean matrix of class k (the classification problem can be solved in the general case for N classes) and C_i^k is the i-th covariance matrix of class k. In the process of classifying conditions, the covariance matrix of each incoming probe is calculated and, using formula (4), the Riemannian distance to the geometric mean of each class is computed. The probe is assigned the class for which this distance is smallest.
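A minimal sketch of this classifier follows. The fixed-point iteration used for the geometric mean of (5) is a standard choice assumed here rather than taken from the paper, and the covariance matrices are assumed symmetric positive definite (so the matrix functions below return real results); libraries such as pyRiemann provide production-grade implementations.

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm, inv

def riemann_dist(c1, c2):
    """Formula (4), computed from the eigenvalues of C1^{-1/2} C2 C1^{-1/2}
    (the same eigenvalues as those of C1^{-1} C2, but in symmetric form)."""
    s = inv(sqrtm(c1))
    lam = np.linalg.eigvalsh(s @ c2 @ s)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def geometric_mean(covs, iters=20):
    """Riemannian (Karcher) mean of formula (5) via fixed-point iteration."""
    c = np.mean(covs, axis=0)                     # arithmetic mean as a start
    for _ in range(iters):
        s = sqrtm(c)
        s_inv = inv(s)
        t = np.mean([logm(s_inv @ ci @ s_inv) for ci in covs], axis=0)
        c = s @ expm(t) @ s                       # step along the manifold
    return c

def mrdm_classify(probe_cov, class_means):
    """Assign the probe's covariance matrix to the nearest class mean."""
    return min(class_means, key=lambda k: riemann_dist(class_means[k], probe_cov))
```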
4.2 Approbation of Classification Algorithms for BCI Using an EEG Emulator

The emulator is a continuous data stream that is controlled by a special interface. There are four emulated states: "right", "left", "legs" and "rest", which correspond respectively to the user imagining movement with the right hand, with the left hand, with the legs, and to the user being at rest, not imagining anything. An example of the input data of the emulator is shown in Fig. 4.
Fig. 4. An example of the input emulated EEG data (imagination of left-hand movement): the horizontal axis shows time counts at a sampling rate of 250 Hz (sample length 2 s = 500 counts); the vertical axis shows the signal power from each electrode (the emulator has 30 electrodes).
For training, three states were used: "Rest", to emulate the absence of motion imagery, and two arbitrary other states corresponding to imagined movements. The number of training trials was 20; this is the maximum value close to the conditions of a real experiment, since the human factor cannot provide correct data on large samples: users simply lose concentration, up to making errors in the imagination of motion images.
After the learning process, sets of 20 samples of a single class were fed to the input of each of the algorithms; there were 50 such experiments. The number of correctly determined classes per 20 submitted samples was recorded; thus, statistics on classification accuracy were collected. First, the MRDM algorithm was tested in the way described above; the results for two states coming from the emulator are shown in Fig. 5. From the statistics obtained, it is easy to estimate the average accuracy of the implemented classifier based on the MRDM algorithm: it is about 73–74%.
Fig. 5. Distribution of correctly classified conditions using the MRDM algorithm from sets containing 20 probes of the same class. The left picture corresponds to the “Left” condition of the emulator, the right picture – to the “Right”.
Fig. 6. Distribution of correctly classified conditions using the CSP algorithm from sets containing 20 probes of the same class. The left picture corresponds to the “Left” condition of the emulator, the right picture – to the “Right”.
The next step was to evaluate the accuracy of the CSP algorithm. Each new sample at the classification stage is characterized by a 4-dimensional feature vector (for the number of features m = 2, see formula (3)), which must be correlated with the feature vectors of samples of already known classes. This was implemented using Linear Discriminant Analysis (LDA). The feature vectors of the conditions of the training
set are shown in Tables 1, 2 and 3, and Fig. 6 shows the results of testing the CSP classifier by the method already described. The resulting average accuracy varies in the range of 78–80%, which is higher than that of the MRDM algorithm. In addition, the CSP algorithm showed a smaller spread in the number of correct classifications per 20 probes, as evidenced by the variance values of each of the samples indicated under the histograms.

Table 1. Feature vectors of the "left" EEG condition.

| № of learning probe | 1st component | 2nd component | 3rd component | 4th component |
|---|---|---|---|---|
| 1 | −5.06641545 | −4.74341537 | −0.38194071 | −0.23287185 |
| 2 | −5.16701268 | −4.67705359 | −0.29027315 | −0.31208486 |
| 3 | −5.1562899 | −4.78693821 | −0.45188741 | −0.18929583 |
| 4 | −5.17392949 | −4.66416199 | −0.43624019 | −0.19809117 |
| 5 | −5.46125482 | −5.35697796 | −0.31217475 | −0.29017075 |
| 6 | −5.19499739 | −4.70102787 | −0.24834282 | −0.36102928 |
| 7 | −5.33144234 | −4.65254137 | −0.40608757 | −0.21651999 |
| 8 | −5.14244089 | −4.62667454 | −0.35416722 | −0.25371526 |
| 9 | −5.1959026 | −4.64256622 | −0.33705891 | −0.26778586 |
| 10 | −5.22127202 | −4.83382079 | −0.28895875 | −0.31346485 |
Table 2. Feature vectors of the "right" EEG condition.

| № of learning probe | 1st component | 2nd component | 3rd component | 4th component |
|---|---|---|---|---|
| 1 | −4.23745229 | −3.63226168 | −0.38009729 | −0.23438143 |
| 2 | −4.85182281 | −3.60667109 | −0.37148448 | −0.24062329 |
| 3 | −4.87242297 | −3.41261791 | −0.43531305 | −0.19888301 |
| 4 | −4.84512115 | −3.5914799 | −0.35107304 | −0.25637404 |
| 5 | −4.91998263 | −3.45508865 | −0.32348615 | −0.27997815 |
| 6 | −4.9769095 | −3.41128528 | −0.3609393 | −0.24869871 |
| 7 | −4.51185938 | −3.38614795 | −0.37070021 | −0.24134095 |
| 8 | −5.0511385 | −3.48450449 | −0.27362492 | −0.33059443 |
| 9 | −4.77566419 | −3.5684178 | −0.33492918 | −0.26981818 |
| 10 | −4.80331698 | −3.28827936 | −0.54034119 | −0.14795164 |
Table 3. Feature vectors of the "rest" EEG condition.

| № of learning probe | 1st component | 2nd component | 3rd component | 4th component |
|---|---|---|---|---|
| 1 | −4.08020075 | −4.30489095 | −0.92912583 | −0.05446208 |
| 2 | −3.98513907 | −4.17362551 | −0.7527035 | −0.08454472 |
| 3 | −3.84804289 | −4.18780608 | −0.85292584 | −0.06576024 |
| 4 | −3.97773794 | −4.19057913 | −0.80748131 | −0.07363298 |
| 5 | −4.03442135 | −4.23816729 | −0.70993918 | −0.09429149 |
| 6 | −3.89680555 | −4.102682 | −0.61243624 | −0.12165259 |
| 7 | −4.06468118 | −4.04416351 | −0.67236001 | −0.10392267 |
| 8 | −4.09108215 | −4.03169481 | −0.72973488 | −0.08964055 |
| 9 | −3.95377483 | −4.38001793 | −0.50192945 | −0.16429604 |
| 10 | −3.85912947 | −4.15201551 | −0.63332754 | −0.1151154 |
5 Conclusion

As a result of this research, a multi-user BCI was developed that has a fairly good level of accuracy in classifying states: on average at least 70%, as shown by tests using a special EEG emulator, which at the first stage replaced real users and made it possible to test the system outside laboratory conditions. At the current stage of research, carried out on the basis of the Center of Bioelectric Interfaces of the National Research University Higher School of Economics, the developed system has shown lower accuracy in classifying brain-activity conditions, and trials with real users using electroencephalography are actively underway. The developed system is being refined: frequency filters are being added, with a search for optimal filtering frequencies, and the existing algorithms are being optimized with new mathematical tools. The goal is to achieve high accuracy in the classification of conditions (at least 90–92%), with an attempt to integrate the system into the medical field and to carry out statistical studies in the rehabilitation of post-stroke patients.
References

1. Lebedev, M.A., Nicolelis, M.A.L.: Brain-machine interfaces: past, present and future. Trends Neurosci. 29, 536–546 (2006)
2. Zhivolupov, S.A.: Changes in the nervous system in traumatic lesions of the nerve trunks of the limbs and plexuses (experimental and morphological research), pp. 45–65. VMedA, St. Petersburg (1988)
3. Foeller, E., Feldman, D.: Synaptic basis for developmental plasticity in somatosensory cortex. Curr. Opin. Neurobiol. 14, 89–95 (2004)
4. Poo, M.: Neurotrophins as synaptic modulators. Nat. Rev. Neurosci. 2(1), 24–32 (2001)
5. Liepert, J., Graef, S., Uhde, I.: Training-induced changes of motor cortex representations in stroke patients. Acta Neurol. Scand. 101, 321–326 (2000)
6. Johansson, B.B.: Brain plasticity and stroke rehabilitation. Stroke 31, 223–230 (2000)
7. Zhivolupov, S.A., Samartsev, I.N., Syroezhkin, F.A.: The modern concept of neuroplasticity (theoretical aspects and practical significance). J. Neurol. Psychiatry named after S.S. Korsakov 113(10), 102–108 (2013)
8. Pfurtscheller, G., Aranibar, A.: Evaluation of event-related desynchronization (ERD) preceding and following voluntary self-paced movement. Electroencephalogr. Clin. Neurophysiol. 46(2), 138–146 (1979)
9. Kapralov, N.V., Nagornova, Z.V., Shemyakina, N.V.: Methods for the classification of EEG patterns of imaginary movements. Inform. Autom. 20(1), 94–132 (2021)
10. Barachant, A., Bonnet, S., Congedo, M., Jutten, C.: Multiclass brain-computer interface classification by Riemannian geometry. IEEE Trans. Biomed. Eng. 59(4), 920–928 (2012)
Sentiment Analysis of Social Networks Messages

Evgeny Tretyakov1, Dobrica Savić2, Anastasia Korpusenko1, and Kristina Ionkina3(B)

1 National Research Nuclear University MEPhI, Kashirskoe Highway 31, 115409 Moscow, Russian Federation
2 Vienna International Centre, International Atomic Energy Agency, PO Box 100, 1400 Vienna, Austria
3 Plekhanov Russian University of Economics, Stremyanny Lane 36, 117997 Moscow, Russian Federation
[email protected]
Abstract. In the modern era of artificial intelligence and machine learning, data mining is becoming an important tool for determining public opinion and for social research. In this regard, sentiment analysis is a new method of studying public opinion and, in particular, a nontrivial approach to the analysis of political texts. This paper examines the nature of sentiment analysis of political texts, identifies the problems researchers face when analyzing them, and identifies the difficulties that affect the accuracy of results. The aim of this study is to determine the relevance of sentiment analysis to the analysis of political texts. It presents ongoing work on an algorithm that combines a lexicon-oriented approach with machine learning, studies stylistic devices (e.g., sarcasm, irony and hyperbole), and provides options for determining the sentiment of sentences containing these stylistic devices. As a result of the experiments, patterns that affect the accuracy of the analysis are identified, and ways to handle them are suggested in order to improve the accuracy of the results. As a contribution to the field, options for determining the sentiment of sentences containing stylistic devices are provided.

Keywords: Computational linguistics · Machine learning · Natural language processing · Sentiment analysis · Social media analysis
1 Introduction

Sentiment analysis has become a very useful tool in data mining. It is also considered useful in analyzing political texts to identify people's opinions about politics, politicians, and elections. Yet the algorithms that aim to determine the sentiment of texts, including those dealing with political discourse, fail to demonstrate high accuracy. The sentiment analysis of political texts has in recent years aroused the interest of many researchers who study opinion mining (e.g., Haselmayer et al. 2016; Bhayani et al. 2009; Pak et al. 2010; Gunda et al. 2016). However, despite the large number of studies in this area, the topic remains relevant, since the problem of creating an algorithm for the analysis of political texts with a sufficiently
high accuracy of results has not yet been solved. This is the motivation for this study, which aims to find out experimentally the reasons for the low accuracy of results and to provide possible solutions to this problem.

Our work uses the social network Twitter to perform the analysis. This choice is due to several reasons, including the nature of the posts used on Twitter: posts of small size, containing concise and expressive writing. Moreover, Twitter contains a large number of subjective statements on political topics, which is more important for our research than articles in the mass media and scientific journals, which mostly contain factual descriptions of political events and opinions about them.

In view of the above, this research is conducted to gain an understanding of the reasons for the high complexity of analyzing the sentiment of political texts. It also aims to develop an algorithm for determining the sentiment of texts that takes into consideration the features of political texts that impact the accuracy of results.
2 Background

One of the most popular areas for data mining is socio-political life. Political discourses cannot be condensed into a simple objective presentation: the tone of a message should have the same level of influence on the recipient of information as its content. Numerous studies in this field are devoted to the tone of information in news media, political speeches, election campaigns, and so forth. There is also a growing body of research showing that the effect of perceived texts addresses not only the conscious but also the subconscious component of the human psyche and of cognitive perception, thus giving rise to emotions and feelings. With this in mind, an in-depth analysis of political texts should identify the mood among people and capture the effect of information on that mood. Such analysis can serve as a basis for the research goals of data mining, especially in sentiment analysis.

The main task of sentiment analysis is identifying emotions and determining their polarity. While emotion detection focuses exclusively on emotive labels, the task is often a binary classification consisting of two groups, producing "positive sentiment" and "negative sentiment" results; the two are usually interrelated and interdependent. The aim of sentiment analysis in social networks is to identify users' attitudes and opinions by analyzing and extracting subjective texts that include users' moods, opinions and preferences. The main information base for political texts consists of users' messages in social networks, blog posts and news columns. Sentiment analysis can be aimed at two distinct sources: opinions from users and opinions from news articles (containing the author's personal opinions). Although news articles may be unbiased and provide only facts, any text written by a person tends to be subjective in one way or another. Sentiment analysis in politics mainly involves analyzing trends, political orientation or the ideological bias of the electorate for targeting campaigns or evaluating political reactions.
2.1 Literature Review

With the increasing popularity of blogs and social networks, opinion mining and sentiment analysis have attracted the attention of researchers. Many people have started to actively express opinions on a wide range of issues on the Internet, mainly through Twitter and similar channels. Twitter opinion analysis is a fast and effective method of measuring public opinion, adopted by political parties, businesses, marketing departments, and social and political research. However, the unique characteristics of the messages that users leave on Twitter create new challenges for modern methods of sentiment analysis, which initially focused on analyzing text corpora of larger volume. Some types of data, such as posts on Twitter and other social media sites, differ from regular, formally stylized texts due to their informal style and grammatical mistakes. Yet such posts are as significant as any formal documents published on social media in terms of content.

Sentiment analysis is considered a natural language processing method for performing various tasks. Starting with the problem of classification at the document level [11], sentiment analysis has been actively used for determining the sentiment of sentences [6] and, more recently, of individual phrases [12]. Posts from microblogs (e.g., Twitter), where users post real-time reactions and opinions on various topics, create new challenges for data mining, for which sentiment analysis is a new approach and a feasible solution.

Some early results of sentiment data analysis on Twitter are presented by Go, Bhayani, and Huang [5]. To obtain sentiment data, they used machine-learning techniques which identified tweets ending with positive emoji (e.g., ":)", ":–)") and negative emoji (e.g., ":(" and ":–("). They built models that used a simplified Bayes algorithm with support vector machine (SVM) methods. Among the features of the analysis units, they used unigram and bigram models combined with part-of-speech tagging functions. It was found that the unigram model was superior to the other models and showed the most accurate results. Pak and Paroubek [7] collected data within a similar machine learning paradigm, although they performed a different task: subjective versus objective classification. To obtain subjective data, they collected tweets ending with emoji in the same way as Go and colleagues did; to gain objective data, they looked at the Twitter accounts of popular newspapers (e.g., the New York Times and the Washington Post). They reported that the model based on part-of-speech functions and bigrams showed the best results. Yet these approaches were based on n-gram models, while the data they used for training and testing was collected by search queries and was therefore regarded as biased.
3 Results of Linguistic Formula Experiments In contrast to the above work, this study investigates a different method of sentiment analysis. The method adapted in this research is based on the use of sentiment lexicons. The material is a random sample of streaming tweets on political topics, as opposed to data collected from specific queries.
This study reveals the possibility of using Twitter as a corpus of materials for sentiment analysis on political topics. Twitter was used for the sentiment analysis of political messages for the following reasons:

• microblogging platforms, especially Twitter, are used by people in different walks of life (e.g., politicians, journalists, writers, students and housewives) to express opinions on various topics; Twitter is thus a valuable source of people's opinions;
• Twitter contains many text messages, and that number keeps growing daily, so the assembled research sample can be vast;
• the Twitter audience ranges from ordinary people to celebrities, company representatives, politicians, and even state leaders, so one can collect text messages from members of different social groups;
• the Twitter audience is geographically dispersed, represented by users from many countries, although predominantly from the US; the data is multilingual and can be collected in many different languages.

We used a corpus-based, semantically oriented approach to analyze Twitter messages in which users express their personal opinions. We collected a sample of messages from the Twitter network consisting of 2458 messages from 20 accounts over a period of 70 days. The sample was gathered with the Octoparse scraper, a service that allows collecting material from open-access sites, including Twitter. The collected messages were mainly about Russian and American politics; they also included current news and interviews with politicians. Each message was a text of 7–280 characters. Images, videos and audio files were excluded during preprocessing and not considered in the analysis. The selection consisted of four columns: message text, UserName, UserID and UserURL. A fifth column, the user's assessment of the message sentiment, was added manually after all messages were marked.

Following the collection of the sample, all messages were marked manually as positive, negative, or neutral. It should be noted that the analysis focused exclusively on polarized opinions and considered only three moods: positive, neutral and negative. However, three basic moods are insufficient for increased accuracy of results; sentiment analysis should distinguish several mood parameters (i.e., very positive, positive, neutral, negative, very negative).

After the messages were marked, we created and tested a machine learning algorithm to determine the tone of texts. For this purpose, experiments were conducted to analyze the sentiment of the messages. The first experiment was to select and test a ready-made machine learning algorithm on our data. The second was to create and test a machine learning algorithm for determining sentiment combined with elements of a lexicon-oriented approach.

In the first experiment, we chose an algorithm for sentiment analysis written by Daityari Shaumik on September 26, 2019. Shaumik prepared a data set of sample tweets from the nltk package for NLP with various data cleaning methods. When the
data set was ready for processing, Shaumik trained the model on pre-classified tweets and used the model to classify sample tweets into negative and positive sentiments. Shaumik used binary classification for positive and negative messages in his work. As a classifier for determining the tone of texts, he used a Naive Bayes classifier, which makes strong independence assumptions between the words. Firstly, Shaumik used the example Twitter messages provided in the nltk library. We conducted our experiment with messages collected from Twitter on political topics.

After collecting the messages, we tokenized them. A token is a sequence of characters in a text that serves as a unit. Depending on how the tokens are created, they may consist of words, emoticons, hashtags, links, or individual characters. A basic way of breaking language into tokens is by splitting the text on whitespace and punctuation. Shaumik’s next step was to normalize the data. Normalization in NLP is the process of converting a word to its canonical form. Normalization helps group words with the same meaning but different forms. He also assigned a part of speech to each word using the pos_tag function from the nltk library. After we determined the parts of speech, it was easier to analyze sentences, assuming that adverbs and adjectives carry more shades of emotion and have more sentiment weight than other parts of speech, that prepositions and conjunctions can be ignored, and that negative particles change the meaning of a phrase to the opposite. Shaumik proceeded with deleting words that disrupt analysis and removed noise from the dataset. Noise is any part of a text that does not add meaning or information to the data:

• hyperlinks – all hyperlinks in Twitter are converted to the URL-shortened “t.co” form; keeping them for text processing would not add any value to the analysis;
• Twitter handles in replies – these Twitter usernames are preceded by a “@” symbol, which does not convey meaning;
• punctuation and special characters – while they provide context to textual data, this context is often difficult to process; for simplicity, Shaumik removed punctuation and special characters from tweets.

Shaumik then used the Naive Bayes classifier to train the algorithm on pre-marked positive and negative messages. He reports an accuracy of 99.5%, while the accuracy on our data was just 53%.

We followed this with two experiments. The first one was to create and test a sentiment detection algorithm using a sentiment lexicon. In this experiment, we used the SentiWordNet sentiment lexicon, which is based on the WordNet online dictionary supplemented with sentiment information. Its annotation is fuzzy and was performed by adding three mood ratings to each word in WordNet. Each synset has three types of scores:

1. Pos: positive score;
2. Neg: negative score;
3. Obj: objective score.
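As an illustration of these scores, the following is a minimal sketch of querying SentiWordNet through the nltk interface; averaging over all synsets of a word is an illustrative choice here, not necessarily the exact procedure used in the experiment.

```python
import nltk
from nltk.corpus import sentiwordnet as swn

# One-time downloads of the lexicons used below.
nltk.download("wordnet")
nltk.download("sentiwordnet")

def word_polarity(word, pos=None):
    """Average positive/negative scores over all synsets of a word.
    Averaging is an illustrative aggregation; others are possible."""
    synsets = list(swn.senti_synsets(word, pos))
    if not synsets:
        return 0.0, 0.0
    pos_score = sum(s.pos_score() for s in synsets) / len(synsets)
    neg_score = sum(s.neg_score() for s in synsets) / len(synsets)
    return pos_score, neg_score

print(word_polarity("good", "a"))   # adjective senses of "good"
print(word_polarity("awful", "a"))
```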
In our experiment, we consider only positive and negative scores. The results showed an accuracy of 56%. The second experiment was also conducted using the SentiWordNet sentiment lexicon, but additionally used part-of-speech formulas, in order to test the effect of formulas for the interdependence of parts of speech on the accuracy of results. The following abbreviations are used:

• N – noun;
• Adj – adjective;
• Adv – adverb;
• Pos – positive;
• Neg – negative;
• Col – collocation;
• Sh – shifter;
• Dow – downtoner;
• Amp – amplifier.

Sentiment determination formulas with parts of speech:
N(Pos) + N(Pos) = Col(Pos)
N(Neg) + N(Neg) = Col(Neg)
Adj(Pos) + N(Pos) = Col(Pos)
Adj(Neg) + N(Neg) = Col(Neg)
Adv(Pos) + Adj(Pos) = Col(Pos)
Adv(Neg) + Adj(Neg) = Col(Neg)
Adj(Pos) + Adv(Pos) = Col(Pos)
Adj(Neg) + Adv(Neg) = Col(Neg)
The formulas show that when two positive words are used in a phrase, the phrase can be considered positive; the same holds for a chain of two negative words.

N(Pos) + N(Neg) = Col(Neg)
N(Neg) + N(Pos) = Col(Neg)
Adj(Pos) + N(Neg) = Col(Neg)
Adj(Neg) + N(Pos) = Col(Neg)
Adv(Pos) + Adj(Neg) = Col(Neg)
Adv(Neg) + Adj(Pos) = Col(Pos)
Adv(Pos) + Adj(Neg) = Col(Neg)
Adj(Neg) + Adv(Pos) = Col(Neg)
The analyzed material demonstrates that when a positive and a negative word are used in a phrase, with the positive word followed by the negative one, the negative word carries more weight, and the phrase takes over its sentiment and becomes negative. The study of the examples also showed that when a negative word is immediately followed by a positive one, the general tone of the phrase still becomes negative, except for the chain Adverb + Adjective, where the positive adjective carries more weight and makes the entire phrase positive.
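The following is a minimal sketch of these combination rules; the tag and polarity labels are illustrative, and word-level polarities are assumed to come from the lexicon lookup described earlier.

```python
# A sketch of the part-of-speech combination formulas above.
def collocation_polarity(first, second):
    """first/second are (pos_tag, polarity) pairs, e.g. ("Adj", "Pos")."""
    (tag1, pol1), (tag2, pol2) = first, second
    if pol1 == pol2:
        return pol1                      # two same-signed words keep their sign
    # Mixed signs: the negative word dominates, except Adv(Neg) + Adj(Pos),
    # where the positive adjective outweighs the negative adverb.
    if (tag1, pol1, tag2, pol2) == ("Adv", "Neg", "Adj", "Pos"):
        return "Pos"
    return "Neg"

print(collocation_polarity(("Adj", "Pos"), ("N", "Neg")))    # Neg
print(collocation_polarity(("Adv", "Neg"), ("Adj", "Pos")))  # Pos
```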
4 Experiments with Shifters

Using formulas in sentiment analysis is convenient: it simplifies the task and increases the accuracy of results. However, these resources are insufficient
because classifying polarity is complex and requires considering many semantic details and subtleties of language that affect results, increasing or decreasing the weight of sentiment and sometimes reversing sentiment completely. Such semantic details are called “shifters”. Sentiment shifters (also called valence shifters) are words that affect the polarity of opinions. For example, in the sentence “I do not like the politics of this party”, the shifter “not”, which stands before the positively charged word “like”, changes the meaning of the entire phrase into its opposite, making it negative. Words that merely increase or decrease the weight of sentiment are called amplifiers and downtoners. Ignoring shifters in sentiment analysis can thus lead to a pronounced decrease in the accuracy of results.

Sh + N(Pos) = N(Neg)
Amp + N(Pos) = N(Pos)*2
Dow + N(Pos) = N(Pos)/2
Sh + N(Neg) = N(Pos)
Amp + N(Neg) = N(Neg)*2
Dow + N(Neg) = N(Neg)/2
Sh + Adj(Pos) = Adj(Neg)
Amp + Adj(Pos) = Adj(Pos)*2
Dow + Adj(Pos) = Adj(Pos)/2
Sh + Adj(Neg) = Adj(Pos)
Amp + Adj(Neg) = Adj(Neg)*2
Dow + Adj(Neg) = Adj(Neg)/2
Sh + Adv(Pos) = Adv(Neg)
Amp + Adv(Pos) = Adv(Pos)*2
Dow + Adv(Pos) = Adv(Pos)/2
Sh + Adv(Neg) = Adv(Pos)
Amp + Adv(Neg) = Adv(Neg)*2
Dow + Adv(Neg) = Adv(Neg)/2
Shifters that change the meaning of a phrase to the opposite are not limited to negative particles. Some verbs (such as “reduce”) and adverbs (such as “less”) can also act as shifters; a sketch of these rules follows.
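The following is a minimal sketch of the shifter, amplifier and downtoner rules above, applied to a signed word score (positive > 0, negative < 0); the modifier word lists are illustrative placeholders.

```python
SHIFTERS = {"not", "no", "never", "reduce", "less"}
AMPLIFIERS = {"very", "extremely", "really"}
DOWNTONERS = {"slightly", "somewhat", "barely"}

def apply_modifier(modifier, score):
    if modifier in SHIFTERS:
        return -score        # Sh flips the polarity
    if modifier in AMPLIFIERS:
        return score * 2     # Amp doubles the weight
    if modifier in DOWNTONERS:
        return score / 2     # Dow halves the weight
    return score

print(apply_modifier("not", 0.7))        # -0.7
print(apply_modifier("very", 0.7))       # 1.4
print(apply_modifier("slightly", -0.6))  # -0.3
```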
5 Discussion

In our study, we used part-of-speech formulas combined with a sentiment lexicon to increase the accuracy of results. However, the experiment showed that the accuracy was 56%, and the difference between the responses of the two experiments was only 31 positions. The large gap between the accuracy reported for Shaumik’s algorithm and its accuracy on other training data is typical of many modern machine learning algorithms. This leads to the conclusion that, at present, the issue is not creating yet another algorithm for determining the tone of texts, since there are already many ready-made algorithms with high claimed accuracy that show low results in practice. Rather, the issue is increasing the accuracy of results, and a possible solution is to study stylistic devices and techniques, especially sarcasm, irony and hyperbole.
6 Conclusions

The study revealed that, despite the different methods used by researchers in the field of sentiment analysis, the problem of low accuracy of results remains relevant. The experiments showed that the stylistic component of language, specifically stylistic devices and means of lexical expressiveness, plays
an important role in determining the sentiment of a text. In this regard, we see a solution to the problem of low accuracy in comprehensive work with stylistic devices. In view of what has been discussed here, a new task confronts us: to identify sarcasm, irony and hyperbole in texts in order to increase the accuracy of sentiment analysis. Possible ways to identify these stylistic devices include:

• choosing words that are often found in sentences with sarcasm and inverting the sentiment of sentences containing such words; one simple method is: if a tweet is classified as “positive” in sentiment analysis but contains a “sarcasm word”, the sentiment of the tweet should be changed to “negative” [2];
• some authors have noted that if a text begins with an interjection followed by an adjective or adverb, then the text is also sarcastic; examples are ‘wow’ and ‘unbelievable’ [3];
• searching for signs associated with ambiguity; one aspect of sarcasm and irony is the ambiguity of statements that contain them, and it can be assumed that if a word has many meanings, the probability that the literal and intended meanings of the sentence are opposite increases [9];
• punctuation-related features can be used to detect sarcasm and irony through the form of expression: for each tweet, the number of exclamation marks, question marks, capital letters and quoted words should be counted, as should the use of the same letter more than twice in a word [1, 10] (see the sketch after this list);
• a hyperbolic text contains one of several properties (e.g., an intensifier, interjection, quotes or punctuation), and these features can be used to achieve a result in sarcasm detection;
• it is also possible to collect tweets containing the tags #sarcasm, #sarc, #sarcastic, #sarcastictweet, #not and #irony;
• expressions of laughter (e.g., hehe, haha and lol), exclamation words (e.g., ah, uh, eh, oh, wow and wah) and rarely used words can sometimes be found in messages containing sarcasm [8].

It is expected that sarcasm detection would improve the results of sentiment analysis. Furthermore, the consideration of context in sarcasm detection is an interesting area for future work.
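As an illustration of the punctuation-related and hashtag-based features listed above, the following is a minimal sketch; the feature names and word lists are our own illustrative choices.

```python
import re

SARCASM_TAGS = {"#sarcasm", "#sarc", "#sarcastic", "#sarcastictweet", "#not", "#irony"}

def sarcasm_features(tweet: str) -> dict:
    tokens = tweet.split()
    return {
        "exclamations": tweet.count("!"),
        "questions": tweet.count("?"),
        "capitals": sum(ch.isupper() for ch in tweet),
        "quotes": tweet.count('"') + tweet.count("'"),
        # Same letter repeated more than twice in a word, e.g. "sooo".
        "stretched_words": sum(bool(re.search(r"(\w)\1{2,}", t)) for t in tokens),
        "sarcasm_hashtag": any(t.lower() in SARCASM_TAGS for t in tokens),
        "laughter": any(t.lower() in {"haha", "hehe", "lol"} for t in tokens),
    }

print(sarcasm_features('Ohhh wow, "great" job!!! #not'))
```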
References
1. Barbieri, F., Saggion, H., Ronzano, F.: Modelling sarcasm in twitter, a novel approach. In: Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 50–58 (2014)
2. Bharti, S.K., Vachza, B., Pradhan, R.K., Babu, K.S., Jena, S.K.: Sarcastic sentiment detection in tweets streamed in real time: a big data approach. Digit. Commun. Netw. 2, 108–121 (2016)
3. Davidov, D., Tsur, O., Rappoport, A.: Semi-supervised recognition of sarcastic sentences in twitter and amazon. In: Proceedings of the Fourteenth Conference on Computational Natural Language Learning, ACL, pp. 107–116 (2010)
4. Digital Ocean [Online source]. Shaumik Daityari: How To Perform Sentiment Analysis in Python 3 Using the Natural Language Toolkit (NLTK) (2019). https://www.digitalocean.com/community/tutorials/how-to-perform-sentiment-analysis-in-python-3-using-the-natural-language-toolkit-nltk. Accessed 04 Jun 2020
5. Go, A., Bhayani, R., Huang, L.: Twitter Sentiment Classification Using Distant Supervision, p. 138. Technical report, Stanford (2009)
6. Kim, S., Hovy, E.: Determining the sentiment of opinions. In: COLING 2004, p. 171 (2004)
7. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of LREC, p. 143 (2010)
8. Rajadesingan, A., Zafarani, R., Liu, H.: Sarcasm detection on twitter: a behavioral modeling approach. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, pp. 97–106 (2015)
9. Tayal, D., Yadav, S., Gupta, K., Rajput, B., Kumari, K.: Polarity detection of sarcastic political tweets. In: Proceedings of International Conference on Computing for Sustainable Global Development (INDIACom), IEEE, pp. 625–628 (2014)
10. Tungthamthiti, P., Kiyoaki, S., Mohd, M.: Recognition of sarcasm in tweets based on concept level sentiment analysis and supervised learning approaches. In: Proceedings of Pacific Asia Conference on Language, Information and Computing, pp. 112–124 (2014)
11. Turney, P.D.: Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In: De Raedt, L., Flach, P. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 491–502. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44795-4_42
12. Wang, H., Lu, Y., Zhai, C.: Latent aspect rating analysis without aspect keyword supervision. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 618–626. ACM, New York (2010)
Smart Technologies in REM Production Economy in Russia
Victoria Olegovna Pimenova1, Gusov Zakharovich Auzby2, and Evgeniy Valerievich Trubacheev1,3(B)
1 National Research Nuclear University MEPHI, 115409, Moscow, Russia 2 Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya Street,
Moscow 117198, Russian Federation 3 Moscow State University of Humanities and Economics, 109044 Moscow, Russia
Abstract. The article discusses the problems of the development of the rare earth metals (REM) industry in Russia using smart technologies. Although China currently controls about 60% of the rare earth metals market, Russia has significant competitive potential in this area. In the context of the expansion of Russian business into resource markets and the preparation of the country for technologically independent development during the cyclical downturn of the global economy in 2022–2030, Russia’s interest in recreating the Soviet closed system of REM production is expected to increase. The geography of REM production is determined by the geography of REM ore deposits, while the planned contour of the geography of REM production is determined by the strategic needs of the Russian industry and its partners in the technological chain. At the same time, it is not advisable to restore REM production on the basis of the Soviet model, since that model did not take into account the capabilities of existing smart technologies. The purpose of this article is to determine the directions for introducing “smart” technologies into the production and economic model of Russian REM production. The reserves for increasing the economic efficiency of REM production are investigated, taking into account the possibilities of introducing “smart” technologies into the process of designing and constructing REM production in Russia. Practical ways of using foreign experience of integrating “smart” technologies, taking into account the current needs of REM production in Russia, are revealed.

Keywords: Smart technologies · Modernization · REM production · Industry digitalization
1 Introduction

The development of the production of rare earth metals is an important condition for the formation of a technologically integrated innovative economy in Russia. The dependence of modern technological solutions on the use of certain categories of rare earth metals as an alternative resource is increasing. After a broader integration of additive technologies
into industry and a comprehensive technological transformation of the materials manufacturing industry based on manipulating their structure at the nano level, the market capacity of rare earth materials may grow by 2.5 times. This circumstance, against the background of tougher competition between the concepts of innovative planetary development proposed by global centers of technological development, raises the question of the need to ensure Russia’s self-sufficiency in supplying its producers with rare earth resources. The resource potential of the Eurasian Union, whose central economy is Russia, makes it possible to recreate the industry of production of rare earth metals. The prototype of the architecture of such an industry is supposed to be the architecture of REM production in the USSR. At the same time, the economic side of recreating the REM industry on the basis of the countries of the Eurasian Union includes such aspects as:

a) providing conditions for financing the process of recreating the REM industry at the initial stage, since the payback period is 3–7 years;
b) designing the structure of industries that are heterogeneous from the point of view of REM participants in such a way as to ensure the maximum synergetic effect for the national innovation process in the long term;
c) linking the needs of the target buyer of Eurasian REM products with the structure of REM production, designed at the planning stage of the life cycle.
2 Assessment of the Economic Prospects of the Eurasian REM Production

The features of the market of REM products are:

– China’s dominance in all types of REM products: its share throughout the period 2000–2020 did not decrease below 65% despite the efforts of the United States and other industrialized countries; in the 2010s, at peak values, China’s share was about 90% of the global market;
– the increase in demand for rare earth metals outpaces the growth of the global economy and demographic growth rates;
– the increasing importance of rare earth metals as non-alternative resources for innovative production; the peak of this trend is projected for the period 2035–2040 and will coincide with the transition of global production to the mass use of additive technologies.

The assessment of the reserves of REM ore in Russia indicates the possibility of fully meeting the needs of the national economy for rare earth raw materials while maintaining the country’s role as an exporter. Currently, Russia’s share in the global market is significantly lower than the value determined by its resource capabilities. The comparative characteristics of the resource and production potential of the leading participants in global REM production are presented in Fig. 1. Currently, the involvement of resources for the development of REM production will require the formation of an
infrastructure “from scratch”, which allows us to raise the question of its initial design using smart technologies.
(Bar chart: for China, the US, Russia, India, Australia, Myanmar, Brazil and other countries, the share in world reserves is compared with the global market share, in percent.)
Fig. 1. Comparative characteristics of the resource and production potential of the leading participants in global REM production [1]
The dominant position in the market of rare earth products is occupied by countries with growing domestic demand for rare earth metals. Over the period 2010–2020, this demand increased by 49.8%, and by 2030 it is expected to increase by another 61.2% [1]. Accordingly, Russia needs to form its own REM production. An important condition for the formation of effective REM production in Russia is the integration of this industry into the logistics, resource and production structure of the national economy. The greatest potential in this sense lies in such areas of development of the REM economy as: a) extraction of raw materials; b) production of the top ten most popular REM products; c) participation of the REM manufacturer in the development of international innovation programs, where the Russian REM manufacturer can participate in global innovation chains as an investor. The average payback period for investments in REM production is 7.5 years. In order to overcome the competitive lag of the Eurasian REM platform, taking into account the trends in the development of mining technologies, it may be recommended to focus the design of the national REM subsector on advanced technological development. This
implies the fullest possible use of the potential of “smart” technologies and the design of the closed-cycle REM production chain as a centralized system, for which integrated planning and management using smart technologies are carried out.
3 Assessment of the Economic Potential of Using Smart Technologies in the Design and Development of REM Production in Russia

An assessment of the experience of Chinese REM production suggests that the whole range of smart solutions is applicable to this industry. The map of smart technologies recommended for implementation under the concept of advanced development of the Russian REM industry is shown in Fig. 2.
Fig. 2. Map of smart technologies recommended for the implementation of the concept of advanced development of REM production in Russia [4]
The concept of advanced development of smart REM-production based on smart technologies assumes complex territorial development with the simultaneous formation
of both individual manufacturers and a centralized logistics system controlled on the basis of digital smart technologies. Such an approach makes it possible to ensure the maximum degree of integration between heterogeneous participants of the Russian rare-earth metals subsector. The main groups of smart technologies recommended for priority implementation in the activities of REM manufacturers are defined in Table 1. When designing the structure of the smart solutions used at various stages of REM production, it is proposed to take the Chinese structure as a basis. Data on the recommended use of certain types of digital technologies at each stage of REM production in the mining industry are presented in Fig. 3. More than 89% of the cost of the smart technologies proposed in Fig. 3 belongs to the four groups of functional use of technologies listed in Table 1. The Russian IT industry is able to meet up to 70% of all the needs of the REM industry [9], and taking into account import substitution programs, this figure is expected to increase to 90% by 2035. With respect to the remaining 10% of smart solutions required by the REM industry, 8% are supplied by at least 3 independent firms, while the market for the remaining 2% of technologies (based on their total cost) is dominated exclusively by Chinese IT manufacturers. According to the results of the study of the Chinese experience, it is recommended to introduce most of the smart technologies (from 65%) already at the first stages of the development of the domestic REM industry. At the first stage, it is recommended to fully form digital profiles of companies that are potential participants in REM production, smart systems for designing the structure of participants, and the infrastructure for interaction of smart monitoring and control systems with GLONASS and other external partner systems. At the second stage, it is proposed to gradually replace existing interaction technologies with “smart” analogues, taking into account state support. Particular attention should be paid to linking the volumes and deadlines of state support and support for import-substituting IT manufacturers stated in the Strategy for the Development of the Rare and Rare Earth Metals Industry of the Russian Federation for the period up to 2035 (hereinafter referred to as the Strategy). In accordance with the Strategy, financing is provided in an amount covering 43% of all financial needs of the digital infrastructure (minimum estimate). Taking this into account, the tasks of attracting investors and creditors to the emerging industry are of paramount importance. In this regard, the use of smart technologies can address such tasks as:

– popularization of the concept of the emerging REM industry by creating audiovisual models, available to interested partners, that explain the principles of functioning of the technical solutions underlying the emerging industry;
– providing potential partners with the ability to monitor the status of potential investment objects in real time;
– reducing the risks of partners investing in obsolete technologies due to the reorientation of the REM industry towards the most advanced solutions and their integration on the basis of “smart” infrastructure.
Table 1. Smart technologies recommended for priority implementation in Russian REM production

Technology: Big data analysis
Form of implementation: Designing the logistics structure of REM production based on the existing supply of logistics infrastructure services
Tasks to be solved: a) ensuring budget savings in financing additional infrastructure facilities; b) increasing the load of regional logistics and increasing employment in the regions; c) reducing the financial burden on REM manufacturers

Technology: Processing of information received from drones and through space exploration
Form of implementation: Assessment of the resource potential of REM deposits and the infrastructure and economic potential of its development
Tasks to be solved: a) overcoming the asymmetry of information about the complex of available resources; b) prevention of abuse in the field of information distortion

Technology: Creation of digital doubles of potential participants in a closed chain of REM production
Form of implementation: Inclusion of requirements for the presence of a digital double, represented by a full set of digital economic and technical characteristics, in the structure of requirements for recipients of state support
Tasks to be solved: a) updating information about potential participants of the REM industry; b) formation of an information base for further assessments of a company’s activities; c) creation of conditions for the centralized design of the national REM production chain using smart technologies; a fully automated modeling process is proposed

Technology: Smart technologies for automated management of territorial elements of REM production using satellite communications
Form of implementation: a) equipping REM production facilities with smart sensors providing monitoring of technological process parameters; b) formation of a national technology platform and ensuring its integration with the GLONASS network
Tasks to be solved: a) reducing the risk of accidents on the ground; b) increasing the efficiency of the production process due to timely identification of risk zones of technological accidents; c) increasing the efficiency of the production process through the introduction of smart systems for predictive modeling of the development of REM production based on the study of retrospective data of industry development
(Bar chart: costs of ERP, CRM, SCM, RFID and other digital technologies for mining operations, planning and operations, in $ billion.)
Fig. 3. Recommended amounts of costs for digital technologies in certain sections of the process of REM production (based on Chinese experience). Compiled by the authors on the basis of [2, 7, 9]
4 Conclusions

The paper presents an assessment of the needs of the Russian REM industry, whose Development Strategy was adopted in 2019 and calculated until 2035, taking into account the experience of the Chinese REM manufacturer. The spheres most promising for the introduction of “smart” technologies are proposed, and the smart technologies most relevant for long-term industry development are indicated.
References
1. Rasheed, M., Myung-suk, P., Sang-min, N., Sun-Woo, H.: Rare earth magnet recycling and materialization for a circular economy—a Korean perspective. Appl. Sci. 11, 6739 (2021). https://doi.org/10.3390/app11156739
2. Rare Earth Production Commencing (2021). https://investorintel.com/wp-content/uploads/2020/11/2020-AGM-Presentation.pdf
3. Daigle, B., DeCargo, S.: Rare Earths and the U.S. Electronics Sector: Supply Chain Developments and Trends, Office of Industries, Working Paper ID-075 (2021). https://www.usitc.gov/publications/332/working_papers/rare_earths_and_the_electronics_sector_final_070921_2-compliant.pdf
4. Barnewold, L., Lottermoser, B.: Identification of digital technologies and digitalisation trends in the mining industry. Int. J. Mining Sci. Technol. 30, 747–757 (2020). https://doi.org/10.1016/j.ijmst.2020.07.003
5. Xu, Y., Junqiang, Z., Xiekang, S.: Application and development of smart mine in China. MATEC Web Conf. 295, 02005 (2019). https://doi.org/10.1051/matecconf/201929502005
6. Polukhin, A.A., Yusipova, A.B., Panin, A.V., Timokhin, D.V., Logacheva, O.V.: The effectiveness of reserves development to increase effectiveness in agricultural organizations: economic assessment. Lect. Notes Netw. Syst. 206, 3–14 (2021). https://doi.org/10.1007/978-3-030-72110-7_1
7. Kalenov, O., Kukushkin, S.: Digital transformation of mining enterprises. E3S Web Conf. 278, 01015 (2021). https://doi.org/10.1051/e3sconf/202127801015
8. Mareschal, B., Kaur, M., Vilas, S.: Convergence of smart technologies for digital transformation. Tehnički glasnik 15, II–IV (2021). https://doi.org/10.31803/tg-20210225102651
9. Pimenova, V., Repkina, O., Timokhin, D.: The economic cross of the digital post-coronavirus economy (on the example of rare earth metals industry). Adv. Intell. Syst. Comput. 1310, 371–379 (2020). https://doi.org/10.1007/978-3-030-65596-9_45
10. Development strategy of the rare and rare earth metals industry of the Russian Federation for the period up to 2035. Official website of the Ministry of Industry and Trade of Russia. https://minpromtorg.gov.ru/docs/#!strategiya_razvitiya_otrasli_redkih_i_redkozemelnyh_metallov_rossiyskoy_federacii_na_period_do_2035_goda
fMRI Study of Brain Activity in Men and Women During Rhythm Reproduction and Measuring Short Time Intervals
V. L. Ushakov1,2,3(B), S. I. Kartashov4, V. A. Orlov4, M. V. Svetlik5, and V. Yu. Bushov5
1 Institute for Advanced Brain Studies, Lomonosov Moscow State University, GSP-1,
Leninskie Gory, 119991 Moscow, Russia 2 SFHI “Mental-Health Clinic No. 1 Named N.A. Alexeev of Moscow Health Department”,
2 Zagorodnoe Highway, 117152 Moscow, Russia 3 National Research Nuclear University MEPhI, 31 Kashirskoe shosse, 115409 Moscow, Russia 4 National Research Centre “Kurchatov Institute”, 1 Akademika Kurchatova Pl.,
123182 Moscow, Russia 5 National Research Tomsk State University, 36 Lenin Ave., 634050 Tomsk, Russia
Abstract. The aim of the research was to use fMRI to study brain activity in men and women during memorizing and reproducing a rhythm (5 s) and measuring short time intervals (0.8 s). The volunteers were young people (men and women) aged 18 to 27 years. It was shown that extensive brain areas (prefrontal and motor cortex, insular cortex, sensory and associative areas of the parietal and temporal cortex, amygdala, hippocampus, thalamus, basal ganglia and cerebellum) are involved in providing the sensorimotor activity associated with rhythm reproduction and measuring short time intervals. It was found that the measurement of short time intervals is partially provided by the same brain structures as rhythm reproduction. When measuring duration, the activation of a number of additional structures in both hemispheres (frontal pole, supramarginal gyrus, angular gyrus, temporal area and some other cortical areas) was detected. The results of this study indicate that the “brain support” of sensorimotor activity associated with rhythm reproduction and measuring short time intervals depends significantly on the method of scaling time intervals and on gender differences.

Keywords: Rhythm reproduction · Measuring time intervals · Gender differences · fMRI
1 Introduction

The problem of time and its perception by humans has attracted the attention of researchers for hundreds of years. In recent decades, however, this problem has become particularly urgent. This is due to the fact that increasing computerization and the widespread introduction of the latest information technologies in production and transport, in education and science impose increased requirements on the ability of a
modern person to navigate in time and to make urgent decisions under time pressure. On the other hand, a number of neuropsychiatric disorders are accompanied by disturbances in the perception of time, which can be used in the diagnosis of these diseases [1]. Therefore, the study of individual features and mechanisms of time perception is a relevant problem of modern physiology and medicine. Despite significant progress in this area, especially in recent decades, there are still many unresolved issues within this problem. For example, although many researchers [2–8] have noted a clear dependence of time perception on the individual characteristics of a person (gender, age, properties of the nervous system, temperament, etc.), the causes and psychophysiological mechanisms of these dependencies are still poorly understood. The role of various structures and hemispheres of the brain in time perception has not been sufficiently studied, since most of these data were obtained through clinical observations of patients who have suffered a stroke, traumatic brain injury, etc., and such data cannot be unconditionally transferred to the healthy brain. At the same time, the emergence of modern non-invasive and minimally invasive neuroimaging methods (fMRI, PET) opens up wide opportunities for resolving these issues. The aim of this study was to investigate brain activity in men and women during rhythm reproduction and measuring short time intervals using the functional magnetic resonance imaging (fMRI) method. The task of the study was to find out which brain structures are involved in providing these types of activity, and how this “brain support” depends on gender differences.
2 Materials and Methods

The fMRI method was used to study brain activity in men and women during the reproduction of a five-second rhythm and the measurement of short time intervals. The study involved 40 healthy volunteers: 20 young men and 20 girls, university students aged 19 to 27 years (average age 23 years). Voluntary consent to participate in the experiment was obtained from each subject. Permission to conduct these studies was granted by the Ethics Commission of the National Research Center “Kurchatov Institute”. During the preliminary examination, the dominant hand was identified using the Annett questionnaire. The study included several series of experiments. In the series with rhythm reproduction, the subject is shown a video in which a white square with a side of 2 cm appears in the center of the screen periodically (with a period of 5 s) for 200 ms. The subject must memorize this rhythm and then reproduce it by pressing a button with the left or right hand, depending on the instructions. After that, the subject is shown a video with the image of a stimulus (a white cross on a dark background in the center of the screen), to which the subject’s gaze should be directed during rest. In the series with duration measurement, the subject is shown a video in which a white square with a side of 2 cm appears in the center of the screen periodically (with a period of 0.8 s) for 200 ms. Then the subject measures a time interval of 0.8 s: the first press on the button marks the beginning of the interval (with the right or left hand, depending on the instructions), and the second marks the end. After that, the subject is shown a video with the image of the stimulus (a white cross on a dark background in the center of the screen), to which the gaze should be directed during
rest. When performing tasks on the perception of time, the subject was forbidden to use a clock, oral counting or other methods of measuring time. The structural and functional MRI results were obtained at the NBICS technologies complex of the National Research Center “Kurchatov Institute” on a SIEMENS Magnetom Verio 3 T MR tomograph. To obtain a high-resolution structural image of the brain (T1-weighted image), the following parameters of the rapid gradient echo sequence were used: 176 slices, TR (repetition time) = 1900 ms, TE (echo time) = 2.19 ms, slice thickness = 1 mm, flip angle = 90°, inversion time = 900 ms, field of view = 250 × 218 mm. To obtain fMRI data, the following parameters were used: TR = 2 s, TE = 20 ms, number of slices = 42, voxel size = 2 × 2 × 2 mm. Additionally, the GFM (gre_field_map) MRI scan mode was used to take into account the inhomogeneity of the magnetic field of the tomograph and correct the associated artifacts. Field map imaging was performed with a double-echo spoiled gradient echo sequence (gre_field_map; TR = 580.0 ms, TE = 4.92/7.38 ms, voxel size: 2 × 2 × 2 mm (0.6-mm gaps), flip angle 90°) that generated a magnitude image and 2 phase images. The fMRI and anatomical MRI data were pre-processed using Statistical Parametric Mapping (SPM12; Wellcome Trust Centre for Neuroimaging, London, UK; available free at http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) based on Matlab 2016a. After importing the Siemens DICOM files into the SPM NIFTI format, the center of the anatomical and functional data was manually set to the anterior commissure for each subject. The BROCCOLI algorithm was used to correct motion artifacts. The head movement parameter was less than 0.5 mm after the correction. Motion parameters were re-calculated in the realignment step and were afterwards included as additional regressors in the general linear model (GLM) analysis. Spatial distortions of the EPIs resulting from motion-by-field inhomogeneity interactions were reduced using the FieldMap toolbox implemented in SPM12. Anatomical MPRAGE images were segmented using the segmentation algorithm implemented in SPM, and both anatomical and functional images were normalized into the ICBM stereotactic reference frame (MNI (Montreal Neurological Institute) coordinates) using the warping parameters obtained from segmentation. Anatomical data were segmented into 3 possible tissues (grey matter, white matter, cerebrospinal fluid). Functional data were smoothed using a Gaussian function with an isotropic kernel of 6 mm FWHM. Within each of the paradigms, pairwise comparisons were made based on Student’s statistics, and individual and group maps were obtained with a significance level of p < 0.001.
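For orientation, the following is a minimal sketch of the core realign, normalize and smooth chain described above, expressed with the nipype wrappers for SPM12 (a working MATLAB/SPM12 installation is assumed; file names are placeholders, and the fieldmap correction and BROCCOLI steps are omitted). It illustrates the shape of the pipeline, not the exact batch used in the study.

```python
from nipype.interfaces import spm

# Realign the functional series; motion parameters from this step can
# later be added as nuisance regressors in the GLM, as described above.
realign = spm.Realign(in_files="sub01_func.nii", register_to_mean=True)
realign_results = realign.run()

# Warp anatomical and functional data into MNI space.
normalize = spm.Normalize12(
    image_to_align="sub01_anat.nii",
    apply_to_files=realign_results.outputs.realigned_files,
)
norm_results = normalize.run()

# Smooth with a 6 mm isotropic Gaussian kernel, as in the text.
smooth = spm.Smooth(
    in_files=norm_results.outputs.normalized_files,
    fwhm=[6, 6, 6],
)
smooth.run()
```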
3 Results and Discussion

The study of the lateral organization of the subjects’ brains showed that in the group of 20 young men, 10 are characterized by a high level of right-handedness, 8 by a low level of right-handedness, 1 by a high level of left-handedness, and 1 by a low level of left-handedness. In the group of 20 girls, 12 are characterized by a high level of right-handedness, 5 by a low level of right-handedness, 1 by a high level of left-handedness, and 2 are “ambidextrous”. In the young men group, a comparison of the condition of five-second rhythm reproduction in relation to the resting state revealed the activation of the supplementary motor cortex, right and left, the precentral and superior frontal gyri, right and
left, the left middle frontal gyrus, the inferior frontal and postcentral gyri, right, the cingulate gyrus, the lingual gyrus and the insular cortex, right and left, as well as the thalamus, amygdala, basal ganglia and some areas of the cerebellum. Under the same conditions in the girls group, activation of partly the same brain areas as in the men group was found: the supplementary motor cortex, the precentral gyrus and superior frontal gyrus, right and left, the inferior frontal gyrus, right, the postcentral gyrus, right and left, the cingulate gyrus, the insular cortex, right and left, as well as the thalamus, amygdala, basal ganglia and some areas of the cerebellum. Along with this, they showed activation of the frontal pole, on the left, the supramarginal gyrus, on the right, the temporal lobe, on the right, the superior temporal gyrus, on the right, the middle temporal gyrus, on the right and left, the angular gyrus, on the right, and the hippocampus. It is important to note that the most significant gender differences were observed during rhythm reproduction with the left hand. Moreover, these differences were manifested not only in the composition of the “interested” brain structures, but also in the level of their activation. Thus, while in the men group 36 voxels were activated in the area of the precentral gyrus in the right hemisphere and 15 voxels in the left hemisphere, in the women group the corresponding numbers were 387 voxels on the right and 589 voxels on the left. Probably, the observed differences are mainly associated with the features of the lateral organization of the male and female brain, which have a significant effect on speech [9] and other cognitive functions [10–13]. Group statistical maps plotted on a high-resolution template T1 image, obtained in the men and women groups when comparing the conditions for reproducing a five-second rhythm with the left hand in relation to the resting state with fixation of the gaze on a white cross in the center of the screen, are shown in Fig. 1.
Fig. 1. Group statistical maps plotted on a high-resolution template T1 image obtained in men (A) and women (B) groups when comparing the conditions for reproducing a five-second rhythm with the left hand in relation to the resting state with gaze fixation on a white cross in the center of a darkened screen (warm tones - a positive effect, cold tones - a negative effect, p < 0.001)
Comparison of the conditions for measuring the time interval of 0.8 s in relation to the resting state made it possible to detect in the men group the activation of partly the same brain regions as during rhythm reproduction: the superior, middle and inferior frontal gyri, right and left, the pre- and postcentral gyri, right and left, the cingulate gyrus, as well as the thalamus, amygdala, basal ganglia and some areas of the cerebellum. Along with this, the activation of other brain areas was found: the frontal pole, right and left, the supramarginal gyrus, right and left, the angular gyrus, right and left, the superior and middle temporal gyri, right and left, the temporal area, right and left, the inferior temporal gyrus, on the right, and some other cortical areas. The obtained data indicate that the “brain support” of the two types of activity, reproduction of the rhythm and measuring time intervals, differs significantly. Probably, these differences are associated with the fact that measuring time intervals requires the participation of the second signaling system and the actualization of long-term memory [7, 8], while the reproduction of the rhythm does not require this. In the women group, the same conditions revealed activation of the pre- and postcentral gyri, right and left, the supramarginal and paracingulate gyri, right and left, the superior, middle and inferior frontal gyri, right and left, the frontal pole, right and left, the superior and middle temporal gyri, right and left, the inferior temporal gyrus, right, the temporal platform, right and left, the insular cortex, right and left, the angular gyrus, right, and other areas of the cortex, as well as the thalamus, amygdala, basal ganglia and some areas of the cerebellum. Gender differences were manifested mainly in the different levels of activation of the “interested” brain structures and, in particular, in the fact that in the men group, when measuring an interval of 0.8 s, the numbers of activated voxels in the area of the precentral gyrus in the right and left hemispheres were 1319 and 231, respectively, while in the women group they were 818 and 409, respectively. Group statistical maps plotted on a high-resolution template T1 image, obtained in the men and women groups when comparing the conditions for measuring an interval of 0.8 s using the left hand in relation to the resting state with fixation of the gaze on a light cross in the center of the screen, are shown in Fig. 2. The obtained data indicate that large areas of the brain are involved in providing the sensorimotor activity associated with rhythm reproduction and measuring short time intervals, among them:

– the prefrontal cortex, which provides planning, goal setting and initiation of action;
– the motor cortex, which ensures the implementation of motor programs;
– sensory and associative areas of the temporal cortex, which provide processing of auditory-speech information and sensory integration by forming a holistic image of the object;
– sensory and associative areas of the parietal cortex, which are involved in the processing of signals from proprioceptors and in spatial attention;
– the insular cortex, which provides the interaction of motor, sensory and autonomic functions in the organization of appropriate adaptive activity, regulates complex behavioral acts, and participates in mnemonic processes involved in the formation of emotions;
– the hippocampus and amygdala, which are involved in the processes of memory and the formation of emotions;
– the thalamus, whose motor nuclei are involved in the formation of motor programs;
– the basal ganglia and cerebellum, which are also involved in the formation of motor programs and their storage.
Fig. 2. Group statistical maps plotted on a high-resolution template T1 image obtained in men (A) and women (B) groups when comparing the conditions for measuring an interval of 0.8 s with the left hand in relation to the state of rest with fixation of the gaze at a light cross in the center of the screen (warm tones - a positive effect, cold tones - a negative effect, p < 0.001)
The participation of these structures in time perception is also confirmed by literature data: the frontal, intra-parietal and auditory cortex [14], the cerebellum and basal ganglia [15], as well as the hippocampus [16]. There is evidence that large areas of the auditory and motor cortex, the supplementary motor cortex, prefrontal regions, the inferior parietal lobe, the basal ganglia, and the cerebellum play an important role in the perception and production of rhythm [17].
4 Conclusion

Thus, the “brain support” of sensorimotor activity associated with rhythm reproduction and measuring short time intervals depends significantly on the method of scaling time intervals and on gender differences. All this must be taken into account when developing methods for diagnosing neuropsychic disorders associated with impaired time perception. This work was in part supported by the Russian Foundation for Basic Research (project No. 18-013-00758) and by the Ministry of Science and Higher Education of the Russian Federation (Grant No. 075-15-2020-801).
References
1. Krylov, V.I.: Time perception disorders: psychopathological features, diagnostic value, taxonomy. Neurol. Bull. 50(3), 88–92 (2018). In Russian
2. Tsukanov, B.I.: The time factor and the nature of temperament. Psychol. Issues 4, 129–136 (1988). In Russian
3. Tsukanov, B.I.: The quality of the internal clock and the problem of intelligence. Psychol. J. 12(3), 38–44 (1991). In Russian
4. Werth, R.: The influence of culture and environment on the perception of time. Int. J. Psychophysiol. 7(2–4), 436–437 (1989). https://doi.org/10.1016/0167-8760(89)90371-1
5. Kostandov, E.A., Zakharova, N.N., Vaginova, T.T., et al.: Distinguishing of micro-intervals of time by emotionally excitable individuals. J. Higher Nerv. Act. 23(4), 614–619 (1988). In Russian
6. Zabrodin, Y., Borozdina, D.V., Musina, I.A.: To the methodology for assessing the level of anxiety by the characteristics of temporary perception. Psychol. J. 10(5), 87–94 (1989). In Russian
7. Lupandin, V.I., Surnina, O.E.: Subjective Scales of Space and Time. Ural University Publishing House, Sverdlovsk (1991). In Russian
8. Bushov, Y., Khodanovich, M.Y., Ivanov, A.S., Svetlik, M.V.: Systemic Mechanisms of Time Perception. Tomsk University Publishing House, Tomsk (2007). In Russian
9. Wolf, N.V.: Gender Differences in the Functional Organization of the Processes of Hemispheric Processing of Speech Information. Publishing house of LLC “CVVR”, Rostov-on-Don (2000). In Russian
10. Ignatova, Y., Makarova, I.I., Zenina, O., Aksenova, A.V.: Modern aspects of the study of functional interhemispheric asymmetry of the brain (literature review). Hum. Ecol. 9, 30–39 (2016). In Russian
11. Olonenko, E.S., Kodochigova, A.I., Kirichug, V.F., Deevav, M.A.: Psychophysiological aspects of gender differentiation. Psychosom. Integr. Res. 2(1), 1–4 (2016). In Russian
12. Khorolskaya, E.N., Pogrebnyak, T.A.: Gender features of functional asymmetry of the cerebral hemispheres and channels of perception of educational information in 14–15-year-old adolescents. Sci. Result Physiol. 3(1), 19–24 (2017). In Russian
13. Xin, J., Zhang, Y., Tang, Y., Yang, Y.: Brain differences between men and women: evidence from deep learning. Front. Neurosci. 13, 185–194 (2019). https://doi.org/10.3389/fnins.2019.00185
14. Sysoeva, O.V., Vartanov, A.V.: Reflection of the duration of the stimulus in the characteristics of the evoked potential. Psychol. J. 25(1), 101–110 (2004). In Russian
15. Jeuptner, M., et al.: Localization of cerebellar timing processes using PET. Neurology 45, 1540–1545 (1995). https://doi.org/10.1212/WNL.45.8.1540
16. Mehring, T.A.: About different forms of time reflection by the brain. Philos. Questions 7, 119–128 (1975). In Russian
17. Kovaleva, A.V.: Physiological bases of perception and reproduction of rhythm in neurology. Russ. Med. J. 12(1), 61–65 (2018). In Russian
Development of the Intelligent Object Detection System on the Road for Self-driving Cars in Low Visibility Conditions
Nikita Vasiliev1, Nikita Pavlov1, Osipov Aleksey1, Ivanov Mikhail1, Radygin Victor2(B), Ekaterina Pleshakova1, Sergey Korchagin1, and Bublikov Konstantin3
Moscow, Russian Federation {AVOsipov,MNivanov,ESPleshakova,SAKorchagin}@fa.ru 2 National Research Nuclear University “MEPHI”, 31 Kashirskoe Shosse, Moscow 115409, Russian Federation [email protected] 3 Institute of Electrical Engineering of the Slovak Academy of Sciences, Dubravska cesta 3484/9, Bratislava, Slovakia [email protected]
Abstract. Self-driving cars are a rapidly developing area in the modern world. Many companies, such as Tesla, use computer vision technologies to automate the driving process with numerous cameras integrated into car bodies. One of the problems is weather. We solve the following task: one day a self-driving car is driving and suddenly it starts to rain. The main camera loses image clarity, except for a random square area. We need to find out whether a man or a car is in this square area. It is guaranteed that there is a car or a man there, and the classes are balanced. For the train dataset we have images, the positions of the square areas with their sizes, and labels. We do not have a test dataset, but it is known that it would contain only images. So, we need to create an algorithm to locate the areas where the image is not blurred and only then solve the classification task. Finally, the model is scored using cross-validation.

Keywords: Neural network · Computer vision · Edge detection · Image classification
1 Introduction

The autonomous driving system is currently one of the most difficult tasks in autonomous vehicles and an important area of research. Machine learning is a current trend in the development of this industry, encompassing classical machine learning and new deep learning methods based on convolutional neural networks. A convolutional neural network (CNN) is one of the most popular algorithms in
deep learning. It is a type of machine learning in which the model learns to perform tasks directly on data, such as image classification [1, 2], speech recognition [3, 4] and natural language processing [5, 6]. The use of CNNs for deep learning has become more popular due to the following advantages: deep neural networks offer a flexible architecture in which layers of neurons can be added, which increases the ability of these networks to learn [7]; CNNs can be retrained to perform new recognition tasks, which allows the reuse of existing networks; high accuracy; and reliability. A CNN is a multi-stage feed-forward learning architecture, where each stage contains several layers used to configure the network [8]. For classification tasks, a CNN accepts input data in the form of images. These images are passed through a series of convolutional layers and subsampling layers to extract features. Fully connected layers then classify the image, yielding results such as accuracy and loss [9]. The deep learning methods used in the field of autonomous driving can be classified according to the tasks they solve. CNNs provide an optimal architecture for image recognition and pattern detection, whose performance has recently improved significantly [10–12]. Combined with advances in GPUs and parallel computing, CNNs are a key technology underlying new developments in driverless driving and facial recognition. We looked at the CSV file for our train dataset (see Table 1). Using both the pictures and the CSV file, we were able to highlight the required area in each photo. An example of a photograph used to highlight an area is shown in Fig. 1. However, we also had “bad” photos, on which it was impossible to detect the required area even for a human being.

Table 1. Train CSV file.
      filename   center_x  center_y  size  label
0     571.jpeg   1746      691       138   1
1     387.jpeg   1132      649       100   0
2     813.jpeg   166       642       108   0
3     1351.jpeg  1693      753       104   1
4     1367.jpeg  1320      647       112   1
…     …          …         …         …     …
2451  379.jpeg   634       697       130   0
2452  648.jpeg   508       633       102   1
2453  1700.jpeg  534       636       64    0
2454  691.jpeg   937       685       150   0
2455  1677.jpeg  1885      749       68    1
Fig. 1. Image sample.
2 Main Part

2.1 Canny Edge Detector

Our first task was to find an algorithm to detect the required areas. We decided to use the Canny edge detector. First, it applies a Gaussian filter to smooth the image in order to remove noise. The equation for a Gaussian filter kernel of size (2k + 1) × (2k + 1) is given by:

Hij = (1 / (2πσ²)) · exp(−((i − (k + 1))² + (j − (k + 1))²) / (2σ²)), 1 ≤ i, j ≤ (2k + 1)   (1)

Then the Canny edge detector finds the intensity gradients of the image and applies non-maximum suppression to get rid of spurious responses to edge detection. After that, it applies a double threshold to determine potential edges [1–4]. We used 80 for the first threshold and 200 for the second. Finally, the Canny edge detector finalizes the detection of edges by suppressing all the remaining edges that are weak and not connected to strong edges. The main idea is that edges can be found only in the area where the image is not blurred [13–19]. This algorithm worked quite well on our image sample (see Fig. 2). Since black on the picture is zero, and we were working with a gray image with only one channel, we were able to find the first non-zero elements from the left, right, top and bottom and thus detect the required area; a sketch of this step follows Fig. 2 [20–25]. However, 4.52% of our photos had fully black edge images, so we did not train on them.

2.2 IoU

Then we needed a way to score our algorithm [26–30]. We decided to use the Intersection over Union (IoU) metric (see Fig. 3).
Fig. 2. Edge image.
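The following is a minimal sketch of the area-detection step that produces an edge image like Fig. 2, assuming OpenCV; the thresholds 80/200 follow the text, while the file names are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("571.jpeg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 80, 200)   # smoothing, gradients, hysteresis thresholding

ys, xs = np.nonzero(edges)        # edges survive only in the non-blurred area
if xs.size == 0:
    print("fully black edge image, skip")  # the 4.52% case from the text
else:
    left, right = xs.min(), xs.max()
    top, bottom = ys.min(), ys.max()
    crop = img[top:bottom + 1, left:right + 1]
    cv2.imwrite("571_crop.jpeg", crop)
```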
IoU is defined as:

IoU = Area of Overlap / Area of Union   (2)
To compute this metric, we transformed our images with true and predicted areas to the following format: the required area was drawn in white and the rest of the image in black (see Fig. 4). However, we would not be able to detect anything on photos whose IoU metric is less than 10, so we added 5.33% of our photos to our “bad” list, and in total we dropped 9.61% of our train dataset, which is not critical. Moreover, there were nontrivial cases when the true area, being a square with known center and size, could go beyond the borders of the image. To prevent that, we checked the borders of the true area for non-negativity and clipped them to zero when they were negative [31–34]. The mean value of the IoU metric was 78.21. We believe that this is enough to understand whether a man or a car is in the area. The 25% quantile for the width of the new images was 71, and for the height it was 66. We need to remember these values, since we will use a convolutional neural network (CNN), which accepts images of fixed size. After that, we cut the predicted areas from the images and saved them as new files.
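As a minimal sketch of the IoU computation from Eq. (2) on such black-and-white masks, the following assumes axis-aligned square boxes given as (left, top, size), mirroring the train CSV columns; the concrete numbers are illustrative.

```python
import numpy as np

def box_to_mask(box, shape):
    mask = np.zeros(shape, dtype=bool)
    x, y, s = box
    mask[max(y, 0):y + s, max(x, 0):x + s] = True   # clip negative borders
    return mask

def iou(box_a, box_b, shape):
    a, b = box_to_mask(box_a, shape), box_to_mask(box_b, shape)
    union = np.logical_or(a, b).sum()
    # Scaled to 0-100, matching the scores quoted in the text.
    return 100.0 * np.logical_and(a, b).sum() / union if union else 0.0

# True area vs. predicted area on a 1080 x 1920 frame (illustrative values).
print(iou((1677, 622, 138), (1660, 630, 150), (1080, 1920)))
```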
Fig. 3. IoU metric.
Fig. 4. Transformed image.
2.3 Classification

We decided to scale each new image to 70 × 70. After splitting the training dataset into training and validation subsets, we obtained the image classification task presented in Fig. 5.
Fig. 5. Images and labels.
So, a car is label 0 and a man is label 1. We defined the following CNN structure (kernel size 3 for all convolutions and 2 for pooling):

Conv2d (from 3 channels to 4) → MaxPool2d (stride 2) → Conv2d (from 4 channels to 6) → flattening of the M × N feature matrix into a vector of dimension M · N → Linear (from 16 · 16 · 6 to 500) → ReLU → Linear (from 500 to 50) → ReLU → Linear (from 50 to 2). (3)

We trained on a Tesla K80, but such a powerful GPU is not required for this task. We used cross-entropy loss as the loss function and the Adam optimizer with
a learning rate of 1e-4. The CNN was trained for 10 epochs (see Fig. 6). Figure 6 compares the predictions of networks trained with different loss functions according to the selected criteria.
Fig. 6. Loss of the CNN.
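A PyTorch sketch of the structure in (3) together with the training setup quoted above; the layer names in (3) suggest PyTorch. Note that reaching the 16 · 16 · 6 input of the first linear layer from a 70 × 70 image with 3 × 3 convolutions requires a second pooling step after the second convolution, which we assume was lost in typesetting.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3),   # 70x70 -> 68x68
            nn.MaxPool2d(2, stride=2),        # -> 34x34
            nn.Conv2d(4, 6, kernel_size=3),   # -> 32x32
            nn.MaxPool2d(2, stride=2),        # -> 16x16 (assumed, see above)
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 16 * 6, 500), nn.ReLU(),
            nn.Linear(500, 50), nn.ReLU(),
            nn.Linear(50, 2),                 # two classes: car (0), man (1)
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))  # M x N matrix -> M*N vector

model = SmallCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```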
We reached an accuracy of 90.625% on the validation dataset. However, we had no test dataset with which to assess our real performance, so we decided to use 5-fold cross-validation. Our CNN was trained 5 times for 15 epochs each, and the mean accuracy over the 5 validation folds was 84.89%, which is a representative score.
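A sketch of the 5-fold evaluation just described, assuming scikit-learn; `images`, `labels`, and `train_one_fold` are placeholders (the 15-epoch training loop is not given in the paper).

```python
import numpy as np
from sklearn.model_selection import KFold

accuracies = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True).split(images):
    fold_model = SmallCNN()  # fresh network for every fold
    acc = train_one_fold(fold_model, images[train_idx], labels[train_idx],
                         images[val_idx], labels[val_idx], epochs=15)
    accuracies.append(acc)
print(np.mean(accuracies))   # the paper reports a mean of 84.89%
```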
3 Conclusion

In our research, we solved a computer vision problem that can be applied today in unmanned vehicles. We propose a convolutional neural network (CNN) for recognition on a medium-scale dataset. A large number of experiments demonstrate that the CNN model gives good results, with an average recognition accuracy of up to 84.89%. Compared to other work, the CNN provides better image recognition performance in poor visibility conditions than other classic network models. We have proposed a multi-label classification structure for the recognition task. Higher-quality images are subjected to selection techniques so that less visible objects become detectable. The basic idea is that edges can only be found in regions where the image is not blurred, and this algorithm worked well on our sample image. However, much remains to be done: we can build a more complex CNN, tune its hyperparameters, and expand our training dataset with newly labeled data. With further improvement, our method may also work well for images taken at night.
References
1. Ni, J., Chen, Y., Chen, Y., Zhu, J., Ali, D., Cao, W.: A survey on theories and applications for self-driving cars based on deep learning methods. Appl. Sci. 10(8), 2749 (2020)
2. Zhong, Z., Li, J., Luo, Z., Chapman, M.: Spectral-spatial residual network for hyperspectral image classification: a 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 56, 847–858 (2018)
3. Agrawal, P., Ganapathy, S.: Modulation filter learning using deep variational networks for robust speech recognition. IEEE J. Sel. Top. Sign. Process 13, 244–253 (2019)
4. Zhang, Z., Geiger, J., Pohjalainen, J., Mousa, A.E.D., Jin, W., Schuller, B.: Deep learning for environmentally robust speech recognition: an overview of recent developments. ACM Trans. Intell. Syst. Technol. 9, 49 (2018)
5. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning-based natural language processing. IEEE Comput. Intell. Mag. 13, 55–75 (2018)
6. Sun, S., Luo, C., Chen, J.: A review of natural language processing techniques for opinion mining systems. Inform. Fus. 36, 10–25 (2017)
7. Raisa, A., Hosen, M.I.: CNN-based leaf image classification for Bangladeshi medicinal plant recognition. In: Computing Communication and Electronics (ETCCE) 2020 Emerging Technology, pp. 1–6 (2020)
8. Jogin, M., Madhulika, M.S., Divya, G.D., Meghana, R.K., Apoorva, S.: Feature extraction using convolution neural networks (CNN) and deep learning. In: 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pp. 2319–2323. IEEE (2018)
9. Jogin, M., Mohana, M.S., Madhulika, G.D., Divya, R.K.M., Apoorva, S.: Feature extraction using convolution neural networks (CNN) and deep learning. In: 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pp. 2319–2323 (2018)
10. Marino, S., Beauseroy, P., Smolarz, A.: Weakly-supervised learning approach for potato defects segmentation. Eng. Appl. Artif. Intell. 85, 337–346 (2019)
11. Afonso, M., Blok, P.M., Polder, G., van der Wolf, J.M.: Blackleg detection in potato plants using convolutional neural networks. IFAC-PapersOnLine 52(30), 6–11 (2019)
12. Ang, W., Juanhua, Z., Taiyong, R.: Detection of apple defect using laser-induced light backscattering imaging and convolutional neural network. Comput. Electr. Eng. 81, 106454 (2020)
13. Badue, C., et al.: Self-driving cars: a survey. Expert Syst. Appl. 165(1), 113816 (2020)
14. Gambi, A., Mueller, M., Fraser, G.: Automatically testing self-driving cars with search-based procedural content generation. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 318–328 (2019)
15. Ndikumana, A., Nguyen, H.T., Do Hyeon, K., Ki Tae, K., Choong Seon, H.: Deep learning based caching for self-driving cars in multi-access edge computing. In: IEEE Transactions on Intelligent Transportation Systems, pp. 2862–2877 (2021)
16. Kolomeets, M., Zhernova, K., Chechulin, A.: Unmanned transport environment threats. In: Proceedings of 15th International Conference on Electromechanics and Robotics "Zavalishin's Readings", pp. 395–408. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-5580-0_32
17. Wang, Z., Wang, M.: Development status and challenges of unmanned vehicle driving technology. In: International Conference on Big Data Analytics for Cyber-Physical-Systems, pp. 68–74. Springer, Singapore (2019). https://doi.org/10.1007/978-981-15-2568-1_11
18. Korchagin, S.A., Terin, D.V., Klinaev, Yu.V., Romanchuk, S.P.: Simulation of current-voltage characteristics of conglomerate of nonlinear semiconductor nanocomposites. In: 2018 International Conference on Actual Problems of Electron Devices Engineering (APEDE), pp. 397–399. IEEE (2018)
19. Benhamza, K., Seridi, H.: Canny edge detector improvement using an intelligent ants routing. Evol. Syst. 12(2), 397–406 (2019). https://doi.org/10.1007/s12530-019-09299-0
20. Korchagin, S.: Forecasting oil tanker shipping market in crisis periods: exponential smoothing model application. Asian J. Shipping Logistics 37(3), 239–244 (2021)
21. Rezatofighi, H., et al.: Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 658–666 (2019)
22. Dogadina, E.P., Smirnov, M.V., Osipov, A.V., Suvorov, S.V.: Evaluation of the forms of education of high school students using a hybrid model based on various optimization methods and a neural network. Informatics 8(3), 46 (2021)
23. Korchagin, S., Romanova, E., Serdechnyy, D., Nikitin, P., Dolgov, V., Feklin, V.: Mathematical modeling of layered nanocomposite of fractal structure. Mathematics 9(13), 1541 (2021)
24. Shirokanev, A.S., Andriyanov, N.A., Ilyasova, N.Y.: Development of vector algorithm using CUDA technology for three-dimensional retinal laser coagulation process modeling. Comput. Opt. 45(3), 427–437 (2021)
25. Kuznetsova, A., Maleva, T., Soloviev, V.: Detecting apples in orchards using YOLOv3 and YOLOv5 in general and close-up images. In: International Symposium on Neural Networks, pp. 233–243. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64221-1_20
26. Korchagin, S.A., et al.: Development of an optimal algorithm for detecting damaged and diseased potato tubers moving along a conveyor belt using computer vision systems. Agronomy 11(10), 1980 (2021)
27. Korchagin, S.A., Terin, D.V., Klinaev, Yu.V., Romanchuk, S.P.: Simulation of current-voltage characteristics of conglomerate of nonlinear semiconductor nanocomposites. In: 2018 International Conference on Actual Problems of Electron Devices Engineering (APEDE), pp. 397–399 (2018)
28. Shirokanev, A.S., Andriyanov, N.A., Ilyasova, N.Y.: Development of vector algorithm using CUDA technology for three-dimensional retinal laser coagulation process modeling. Comput. Opt. 45(3), 427–437 (2021)
29. Kuznetsova, A., Maleva, T., Soloviev, V.: Detecting apples in orchards using YOLOv3 and YOLOv5 in general and close-up images. In: Han, M., Qin, S., Zhang, N. (eds.) ISNN 2020. LNCS, vol. 12557, pp. 233–243. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64221-1_20
30. Soloviev, V., Titov, N., Smirnova, E.: Coking coal railway transportation forecasting using ensembles of ElasticNet, LightGBM, and Facebook Prophet. In: Nicosia, G., et al. (eds.) LOD 2020. LNCS, vol. 12566, pp. 181–190. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-64580-9_15
31. Kuznetsova, A., Maleva, T., Soloviev, V.: Detecting apples in orchards using YOLOv3. In: Gervasi, O., et al. (eds.) ICCSA 2020. LNCS, vol. 12249, pp. 923–934. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58799-4_66
32. Kuznetsova, A., Maleva, T., Soloviev, V.: Using YOLOv3 algorithm with pre- and post-processing for apple detection in fruit-harvesting robot. Agronomy 10(7), 1016 (2020)
33. Gataullin, T.M., Gataullin, S.T., Ivanova, K.V.: Synergetic effects in game theory. In: 13th International Conference Management of Large-Scale System Development, MLSD 2020, Moscow (2020)
34. Gataullin, T.M., Gataullin, S.T., Ivanova, K.V.: Modeling an electronic auction. In: Popkova, E.G., Sergi, B.S. (eds.) ISC 2020. LNNS, vol. 155, pp. 1108–1117. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-59126-7_122
Application of Machine Learning for Solving Problems of Nuclear Power Plant Operation

V. S. Volodin(B) and A. O. Tolokonskij

National Research Nuclear University MEPhI, 31 Kashirskoe shosse, 115409 Moscow, Russia
[email protected]
Abstract. Nowadays, industry is actively introducing technologies based on machine learning: predictive analytics, computer vision, industrial robots, etc. In this article, the authors discuss possible applications of machine learning to improve the operation of nuclear power plant (NPP) power units: diagnostics of the state of equipment (both technological equipment of normal operation systems and equipment of safety systems); detection of irrelevant alarms; determination of the state of the reactor plant; and application of machine learning in equipment control algorithms. The article also examines the existing difficulties in introducing machine learning into NPP operation: the stability of control systems based on machine learning; the interpretability of solutions issued by machine learning based systems; and the small dataset sizes available for training machine learning models.

Keywords: Nuclear power plant · Machine learning · Industry 4.0
1 Introduction

Artificial Intelligence (AI) is one of the key technologies of Industry 4.0, along with robotics, the Industrial Internet of Things (IIoT), and Big Data. The rise in popularity of machine learning came after the publication of an article by Geoffrey Hinton et al. in 2006, in which they showed how to train a deep neural network [1]. This method is used to solve a large number of tasks, from image classification to developing a car autopilot. The technology finds application in complex technical objects with real-time control systems. In combination with other Industry 4.0 technologies, the following trends of AI application in industry can be distinguished [2, 3]: digital twins of technological objects, data mining coupled with the IIoT, and computer vision. In the nuclear industry, computer vision and digital twins of power units (including predictive analytics) are used in operational tasks, or their application is being explored.
2 Digital Twins of a Nuclear Power Plant Unit

A digital twin (DT) is a software analogue of a physical device that simulates the internal processes, technical characteristics, and behavior of a real object under the influence
of interference and the environment. An important feature of the digital twin is that information from the sensors of the real device, operating in parallel, is used to set the input influences on it. A DT improves the process of managing a complex system at all stages of its life cycle, from design to decommissioning. With the help of information collected by the IIoT elements of a complex system (for example, as Tesla does with its cars), it is possible to predict failures, estimate the residual life of equipment, and optimize the operation process (for example, by adjusting automatic regulators). Either full-scale mathematical models describing the object or mathematical models based on machine learning are used to solve these problems. The application of AI makes it possible to avoid the time-consuming process of building an accurate model without loss of accuracy, but large datasets (Big Data) are required for training. One way to get around this difficulty is to train the neural network on accurate mathematical models using powerful graphics accelerators. Digital twins are designed and used by such large companies as General Electric [4], Tesla [5], Siemens [6], and Rosatom [7]. In the nuclear industry, mathematical modeling has historically been used mainly for checking design solutions at the stage of designing technological equipment, justifying the safety of operation of nuclear power plant (NPP) power units, and training operating personnel (the growth in the use of mathematical modeling for personnel training came after the tragedy at the fourth power unit of the Chernobyl NPP in 1986). However, the development of computer and information technologies over the last ten years has made it possible to introduce mathematical modeling at other stages of the NPP process control system life cycle, such as commissioning work on the process control system and its industrial operation. Currently, Russia is developing a digital twin of an NPP power unit: the software and hardware complex (SHC) "Virtual-digital NPP with VVER", a set of calculation codes that simulate the physical processes occurring in the technological equipment of an NPP power unit with a VVER-type reactor, running on a high-performance computing system. A distinctive feature of the SHC is the inclusion of severe accident codes developed by the Nuclear Safety Institute of the Russian Academy of Sciences. In 2017, the development of the main calculation codes that form the core of the complex was completed. The first demonstration of the SHC "Virtual-digital NPP with VVER" took place on May 23, 2018 at VNIIAES JSC (a subsidiary of Rosenergoatom Concern JSC, part of the electric power division of Rosatom State Corporation). In 2019, about 100 autonomous and complex tests were carried out on the SHC. Trial operation confirmed the possibility of using it for emergency training and integrated emergency exercises, verification of control algorithms for the NPP power unit, and verification of emergency instructions and design solutions for the most critical equipment. On March 3, 2020, the SHC "Virtual-digital NPP with VVER" was put into commercial operation.
In addition, a predictive analytics system developed to detect hidden defects in electrical equipment is currently in trial operation at Novovoronezh NPP-2: the model analyzes equipment parameters in real time (for example, the concentration of
various gases) and determines the presence of a developing process that will lead to equipment failure.
3 Problems of Modern Dynamical Systems

Topical problems of the theory of control systems include the following [8]:

• prediction of the future state;
• structure and optimization: selecting the parameters of the system to improve its performance or stability;
• assessment and management: it is often possible to actively control a dynamic system using feedback, that is, to use measurements of the system to provide signals to the actuators and thereby change its behavior; in this case, the overall state of the system must be estimated from a limited set of measurements.
4 Operational Tasks That Can Be Solved Using Machine Learning

All tasks to which machine learning can be applied can be conditionally divided into two groups: improving the control algorithms for technological equipment and reducing the information load on the operating personnel of the power unit [9]. However, given the open questions around proving the stability of control systems based on machine learning algorithms, as well as the conservatism of the nuclear industry, their practical application in the near future is not feasible. Using machine learning algorithms, the following tasks of information support for operating personnel (not only within the unit control room) can be solved:

• determining the state of the reactor plant in accordance with the technological regulations for the safe operation of a power unit;
• predicting violations of operational limits and limits of safe operation (including the time remaining before the violation);
• monitoring the reliability of sensor readings in normal operation and safety systems;
• identifying slowly changing characteristics of equipment (for example, the efficiency of heat exchangers), which in turn leads to the need to retune automatic controllers for the new parameters of the controlled objects;
• tuning automatic regulators during industrial operation;
• determining the root cause of violations;
• suppressing irrelevant alarms (determining which of the triggered alarms is irrelevant is a classification problem).

Solving these problems with machine learning will improve information support for the operating personnel of the NPP power unit. The problem of issuing recommendations for controlling technological equipment can also be posed in machine learning terms. The task of forming optimal recommendations for managing a technological process is a dynamic optimization task, whose target functional may include separate functionals for the optimal control of individual pieces of technological equipment. Such functionals can be built from the list of output parameters of the system that calculates the technical and economic indicators of unit operation. In machine learning terms, this task belongs to regression problems, which allows it to be solved with fewer resources. A minimal sketch of the classification framing for alarm suppression follows.
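The following illustration uses scikit-learn on synthetic placeholder data; the feature encoding and labels are hypothetical, since no data schema is fixed in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical encoding: one row per triggered alarm, with features such as
# a unit-state code, nearby sensor readings, and the count of co-fired alarms.
X = np.random.rand(1000, 8)          # placeholder for archived alarm features
y = np.random.randint(0, 2, 1000)    # 1 = operators marked the alarm irrelevant
clf = RandomForestClassifier(n_estimators=200).fit(X, y)
suppress = clf.predict(X[:5])        # candidate alarms to suppress
```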
5 Difficulties in Applying Machine Learning

The introduction of machine learning into the NPP operation workflow is difficult for the following reasons:

• outdated technical means on which the systems are implemented, in particular, a small amount of RAM;
• the poor interpretability of machine learning algorithms;
• the long duration of data collection, processing, and exploratory analysis: as a rule, it takes about 3–4 years of intensive work by IT specialists (including data scientists), technologists, and operators of the respective enterprises.

It is also important to note the issue of scalability of machine learning based solutions to subsequent projects, which arises because such algorithms are built on statistical data collected from a specific technical object. Since each power unit is unique, the quality of machine learning models will most likely degrade when they are applied to similar units. While models for units already in operation can be retrained on the corresponding historical archives of the NPP control system, for units still under construction it is advisable to synthesize data from the historical archive of a full-scale simulator for the initial training of the model, with subsequent retraining during the operation of the unit. The uniqueness of a unit stems from different design organizations, as well as from improvements to the technical design based on the experience of operating previously commissioned similar units. Thus, even serial units can differ not only in engineering equipment but also in the software and hardware complexes of their control systems.
6 Conclusion

Artificial Intelligence is one of the key technologies of Industry 4.0. It finds application in complex technical objects with real-time control systems. In combination with other Industry 4.0 technologies, the following trends of AI application in industry can be distinguished: digital twins of technological objects, data mining coupled with the IIoT, and computer vision. The use of neural networks as control systems for real equipment is hindered by a fundamental difficulty that remains relevant today: no one knows how a neural network with a large number of hidden layers makes its decisions. It is the interaction of computations within a deep neural network that is crucial for making complex decisions, but these computations are a web of mathematical functions and variables. For safety reasons, we cannot use such a network as a control system for a real complex system (for example, an NPP power unit), and at the moment this problem remains unresolved. This is the main obstacle to implementing the idea at any power unit, but it does not prevent developing a neural network for the problem under consideration. True, we cannot train the network on a real power unit
for safety reasons. However, it is possible to simulate the physical environment under study: one can use either analytical simulators of the various units (supplemented with archives from the upper-level control system) or the software and hardware complex "Virtual-digital NPP" described above.
References
1. Geron, A.: Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media Inc., Boston (2017)
2. Patel, A.R., Ramaiya, K.K., Bhatia, C.V., Shah, H.N., Bhavsar, S.N.: Artificial intelligence: prospect in mechanical engineering field—a review. In: Kotecha, K., Piuri, V., Shah, H., Patel, R. (eds.) Data Science and Intelligent Applications. Lecture Notes on Data Engineering and Communications Technologies, vol. 52, pp. 267–282. Springer, Singapore (2021)
3. Swami, M., Verma, D., Vishwakarma, V.P.: Blockchain and industrial internet of things: applications for industry 4.0. In: Bansal, P., Tushir, M., Balas, V.E., Srivastava, R. (eds.) Proceedings of International Conference on Artificial Intelligence and Applications. Advances in Intelligent Systems and Computing, vol. 1164, pp. 279–290. Springer, Singapore (2021). https://doi.org/10.1007/978-981-15-4992-2_27
4. GE Digital. Digital twin. https://www.ge.com/digital/applications/digital-twin
5. Tesla. About Tesla. https://www.tesla.com/about
6. Cambashi. Industry Knowledge for Business Advantage. https://cambashi.com/siemens-goes-all-in-on-digital-twin/amp/
7. Atomic Expert and Atomic Energy journal. Digital twins in nuclear industry. http://atomicexpert.com/virtual_npp_rosatom
8. Brunton, S.L., Kutz, N.: Data-Driven Science and Engineering: Machine Learning, Dynamical Systems and Control. Cambridge University Press, Cambridge (2019)
9. Framatome official site. Innovation in AI, data analytics, codes, and neural networks. https://www.framatome.com/solutions-portfolio/portfolio/solution?sol=innovation-in-ai-data-analytics-codes-and-neural-networks
The Architecture of Cognition as a Generalization of Adaptive Problem-Solving in Biological Systems

Andy E. Williams(B)

Nobeah Foundation, Nairobi, Kenya
[email protected]
Abstract. The emerging science of Human-Centric Functional Modeling has recently been used to develop what has been suggested to be the first model of human cognition with the potential to represent all of the functionality of human intelligence, a significant milestone for the goal of creating a real-life computational equivalent of the human mind that is central to the field of Biologically Inspired Cognitive Architectures. This paper provides an overview of this modeling technique and of why it can potentially represent all biological processes as generalizations of the same adaptive problem-solving process. It also outlines why this approach has the potential to solve fundamentally different problems than any preceding one, such as increasing our collective capacity to understand and make use of each of the different cognitive models, to the point that we are able to collectively converge on the single understanding that is most "fit" at solving the problem of cognition.

Keywords: Artificial general intelligence · Human-Centric Functional Modeling · Functional Modeling Framework
1 Introduction

There are a number of biologically inspired architectures that have been proposed to represent human cognition [1]. Rather than defining another, this paper begins by defining a modeling approach with which one might determine whether a cognitive architecture is complete and self-consistent.
2 Human-Centric Functional Modeling

In Human-Centric Functional Modeling, living systems are modeled as having a set of human-observable behaviours (functions). All the functional states accessible through these functions within a given domain of behaviour form a "functional state space" through which the system acting in that domain moves. Any collection of such systems then moves through a collective functional state space. As an example, the cognitive system executes reasoning and understanding processes; as it does so, it moves from one concept
to another, thereby moving through a space of concepts or a "conceptual space" (the functional state space of the cognitive system). Using this same approach, a collective cognition can be represented as navigating a collective conceptual space in order to solve group problems. Similarly, other living processes, such as homeostasis, reproduction, and evolution, can be represented as adaptive problem-solving systems that move through their own functional state spaces. These functional state spaces are dynamic in that functional states (for example, concepts in conceptual space) can be added or removed and can also move within the space. In each functional domain, this network of functional states is represented by a graph, with each node representing a functional state and the connections between states representing the behavior through which the system can transition from one functional state to another. This graph of the network of states accessible through these behaviors is a representation of a "functional state space". In conceptual space, each node represents a concept, and the nodes are connected by edges representing the reasoning relationships through which one concept might be transformed into another. These reasoning relationships also define concepts. Therefore, any section of the graph is potentially a complete semantic model providing a fully self-contained representation of meaning, so that by exchanging sections of this graph it might be possible to exchange and accumulate meaning, as opposed to just information, at exponentially greater rates. Conceptual space is potentially the first complete semantic representation in existence: before this Human-Centric Functional Modeling approach, no cognitive model suggested by any researcher is believed to have had the capacity to represent all of the functions of cognition, and as one researcher has stated, "it is hard to imagine that one could give a complete theory of semantic representation outside of a complete theory of cognition in general" [4]. Any system with a stable set of repeatable functions must also stay within a bounded region of a "fitness space" that describes the fitness of the system to execute its functions. A change in fitness of the system occurs as a result of some action, that is, as a result of following some path in functional state space. Defining a generalized "fitness space" for all problems, for instance by the three dimensions of target fitness, actual fitness, and predicted fitness, the path through this fitness space must stay within a bounded region. In this sense, the motion in fitness space must be globally stable throughout the fitness space, despite being potentially chaotic in functional state space due to random interactions with the environment. From the functional modeling perspective, all reasoning or understanding processes are selected during the course of that navigation by some process within the cognitive system (which we will call the "cognitive awareness process") that maintains the fitness of the cognitive system within a stable range. In terms of the cognitive system, we can intuitively understand its position in this fitness space as "cognitive well-being". In conceptual space, a set of concepts is still a concept.
A specific problem is represented as the lack of a reasoning path from one concept (one point in conceptual space) to another so the cognitive system has to find
reasoning or understanding processes that allow it to navigate a path from that specific initial concept to that specific final concept. General problem-solving ability is represented as the ability to find reasoning or understanding processes that allow the cognitive system to navigate a path from any initial concept to any final concept. The magnitude of general problem-solving ability is represented as being related to the total volume of conceptual space that can be navigated per unit time, multiplied by the density of concepts in the volume the cognitive system must move through. If our cognition is to retain the ability to function, it must tend to navigate our space of concepts (conceptual space) in a way that solves the problem of maintaining our cognitive well-being (the fitness of the cognitive system) within a stable range. From the perspective of HCFM, this dynamically stable navigation of the conceptual space is general problem-solving ability in the cognitive domain, and in performing this navigation through the selection of reasoning processes, the cognitive system acts as a fitness optimization function. HCFM, as a paradigm for the design and implementation of the next generation of biologically inspired cognitive architectures, is intended to model not only all the functions of cognition but also all the structures by which those functions might be implemented. In this sense, HCFM represents a continuum that covers the relevant dimensions in the most up-to-date and extensive overview of the literature on cognitive design as of this writing [2]; namely, it considers both functional and structural models of cognition, as in some cases the solutions assessed as most fit in this approach might be represented by functional models and in other cases by structural approaches.
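As a toy sketch of this representation (our illustration, not part of HCFM itself): concepts are nodes, reasoning processes are labeled directed edges, a problem is a missing path, and solving it is a graph search. The concept and process names are arbitrary.

```python
from collections import deque

# conceptual space: concept -> {reasoning process: resulting concept}
space = {
    "c0": {"r1": "c1"},
    "c1": {"r2": "c2", "r3": "c0"},
}

def solve(space, start, goal):
    """A 'problem' is the lack of a path from start to goal; a 'solution'
    is a sequence of reasoning processes that navigates between them."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        concept, path = queue.popleft()
        if concept == goal:
            return path
        for process, nxt in space.get(concept, {}).items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [process]))
    return None   # no path: the problem remains unsolved

print(solve(space, "c0", "c2"))   # ['r1', 'r2']
```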
3 The Functional Modeling Framework for Cognition

The Functional Modeling Framework (FMF) defines living organisms as consisting of a hierarchy of functional components, each set of which operates within its own functional domain, has its own functional state space, and operates within its own fitness space. The functional components in this hierarchy required to implement cognition, together with the functional components within the cognitive domain specifically, constitute the FMF for cognition. Representing the mind as moving through its own functional state space (again, a space of concepts or a "conceptual space"), it is hypothesized that part of this FMF consists of a set of four operations that can "span" the conceptual space, in the sense that any concept or reasoning in that space can be represented as some combination of those operations [3]. The FMF also defines functional state spaces for the body, the emotions, and the consciousness. All awareness processes by which the consciousness might become aware of perceptions in the body, emotions, or mind are represented in the FMF as paths through an "awareness space". All of these spaces are represented in the FMF as being integrated into a single conscious self-awareness (i.e., consciousness), which is represented as selecting a sequence of awareness processes with which to navigate the awareness space. In other words, one awareness process might direct awareness to a reasoning process in the mind, while another might direct awareness to a sensation in the body. The FMF defines three functions hypothesized as being required for the consciousness to be "aware" of potentially every point (every perception) in every
other perceptual space, and also required for the cognition to be able to "conceive" of points in the other perceptual spaces. The separate physical awareness process in our bodies, the emotional awareness process in our emotions, and the cognitive awareness process in our mind can continue to process input and output in parallel with each other, but our conscious awareness does not always focus on those processes. This is consistent with our experience of our consciousness system, which can be observed to switch its focus from awareness of one system to another, like a spotlight moving to different areas of a stage (Fig. 1).
Fig. 1. Conscious switching between processes.
4 Intelligence, Procedural Computer Programs, and Pattern Detection

As mentioned, in each functional domain problems are defined as the lack of a path from an initial point in functional state space to a final target point in that space. Solutions are defined as paths that accomplish that navigation. In any adaptive domain there are two ways problems can be solved. One is by recalling patterns of solutions (i.e., paths which solve the required navigation problem) observed in the past, used when solutions cannot be computed (i.e., when they are non-computable). The other is by using known path segments to compute the unknown path (i.e., using those segments to compute the solution when it is computable). In the domain of cognition, cognitive psychologists have confirmed the existence of these two problem-solving methods, namely type 1 (fast or intuitive) reasoning and type 2 (slow, rational, methodical) reasoning. Type 1 reasoning solves non-computable
problems by recalling patterns of solutions observed in the past. Type 2 reasoning solves computable problems through some methodical process, such as evaluating an equation. The conceptual space model represents both of these reasoning types and is therefore capable of modeling both AI solutions that perform pattern matching, in analogy to type 1 (intuitive) reasoning, and procedural software programs that represent type 2 (rational, methodical) reasoning. The cognitive awareness process is then potentially capable of navigating both.
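Continuing the toy sketch above, the two reasoning types can be contrasted as recall versus computation: type 1 looks up a remembered solution path, and type 2 computes one from the known edges (reusing solve() from the previous sketch). Again, this is our illustration only.

```python
solution_cache = {}   # (start, goal) -> remembered path, i.e. type 1 recall

def solve_problem(space, start, goal):
    path = solution_cache.get((start, goal))   # type 1: recall a known pattern
    if path is None:
        path = solve(space, start, goal)       # type 2: methodically compute
        if path is not None:
            solution_cache[(start, goal)] = path   # memorize for future recall
    return path
```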
5 Implications

If every AI solution or procedural software program can be represented as a set of paths in conceptual space, then each AI solution or AGI model can potentially be decoupled into a library of functions that any AGI architecture might use to increase its general problem-solving ability (intelligence), provided that architecture is able to model itself and all other AI solutions or AGI models in this common way. Every reasoning process executed by any researcher's model of cognition can then potentially be represented as a path through conceptual space. By decoupling that model into a set of reasoning processes, those processes can potentially be added to a library that any other model can use to increase its general problem-solving ability. We can then define a single measure of fitness by which all such processes might be compared. This can enable a model to be reused in an exponentially greater number of instances where it is most fit at achieving an outcome. Given that this model of conceptual space proposes an objective definition of general problem-solving ability (intelligence) and predicts that an artificial cognition can achieve an exponential increase in general problem-solving ability over that of individual humans [5], this potential increase introduces the possibility that HCFM and GCI might exponentially increase our collective capacity to reuse components of every biologically inspired cognitive architecture, to the point that it becomes reliably achievable to converge on the single collective understanding of cognition that is most fit. Furthermore, if it is possible to define a General Collective Intelligence (GCI) platform able to exponentially increase a group's fitness at solving any general problem [6], such a system might be used to increase the ability of all research groups to collectively converge on a working model of cognition, to the point that artificial cognition is for the first time reliably achievable. In turn, when applied to software and hardware, GCI predicts a world in which groups self-assemble software and hardware in a self-sustaining and self-adapting way to maximize collective outcomes and to achieve collective outcomes not possible today. Since such technology must have a "genome" [7] enabling it to self-assemble once the patterns of solutions become sufficiently complex, this technology will in effect be "grown".
6 Conclusions

The approach of Human-Centric Functional Modeling has been used to present what is believed to be a computationally grounded science of mind. In summary, Human-Centric Functional Modeling represents cognition in terms of a set of reasoning processes that can be navigated, which we have called the external functions of cognition, and a set
of functions required to select those processes in a way that navigates the conceptual space with general problem-solving ability, which we have called the internal functions of cognition. This approach attempts to solve all the problems of cognition by reusing the functional models and implementations the larger community is currently developing. Using General Collective Intelligence to coordinate our efforts, it may be possible to exponentially increase our capacity to converge on the single understanding of cognition that is most fit at solving the problem. It is hoped that this article will inspire others to independently confirm or refute the viability of this model and the feasibility of using it to exponentially increase individual as well as collective intelligence.
References
1. Goertzel, B., Lian, R., Arel, I., De Garis, H., Chen, S.: A world survey of artificial brain projects, part II: biologically inspired cognitive architectures. Neurocomputing 74(1–3), 30–49 (2010)
2. Lieto, A.: Cognitive Design for Artificial Minds. Routledge, Abingdon (2021)
3. Williams, A.E.: A model for artificial general intelligence. In: Goertzel, B., Panov, A.I., Potapov, A., Yampolskiy, R. (eds.) AGI 2020. LNCS (LNAI), vol. 12177, pp. 357–369. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-52152-3_38
4. Griffiths, T.L., Steyvers, M., Tenenbaum, J.B.: Topics in semantic representation. Psychol. Rev. 114(2), 211 (2007)
5. Williams, A.E.: Human intelligence and general collective intelligence as phase changes in animal intelligence (2020). https://doi.org/10.31234/osf.io/dr8qn
6. Williams, A.E.: Defining a continuum from individual, to swarm, to collective intelligence, to general collective intelligence. Int. J. Collab. Intell. (2021, in press)
7. Williams, A.E.: Defining the genome and gametes of a general collective intelligence based smart city (2020). https://doi.org/10.31730/osf.io/b6ep8
Cognitive System for Traversing the Possible Worlds with Individual Information Processes

Viacheslav Wolfengagen1(B), Larisa Ismailova1, Sergey Kosikov2, and Sebastian Dohrn2

1 National Research Nuclear University "Moscow Engineering Physics Institute", Moscow 115409, Russian Federation
[email protected]
2 NAO "JurInfoR", Moscow 119435, Russian Federation
Abstract. In this work, an individual process (or, for short, an individual) is selected, and the "history" of its transformations is traced depending on the scenario. A scenario is considered as a restriction imposed on the behavior of the individual. States, or separate scenes, are represented by possible worlds, and transitions between worlds are given by the reachability relation, or evolvent. The evolvent restricts the transition from one stage of knowledge to later stages. In fact, only preliminary work has been done here to study the dynamics of the individual, and a leading example is formulated. Its formulation gives rise to induced commutative diagrams in a representative functor category. The mathematical apparatus involved is based on variable domains and the functor category that represents them. Its purpose is to develop a certain intuition of semantic modeling and then, if possible, to form a cognitive system. Some limited semantic models have been implemented and are undergoing further practical testing and improvement.
Keywords: Possible worlds · Cognitive system · Information process · Semantic model · Functor

1 Introduction
The purpose of this work is not so much an exhaustive study of the issue (that would require a much more substantial volume) as an indication of the real attempts of a team of authors to develop the problems of using the idea of possible worlds in the construction of semantic models of varying degrees of generality. Some of these attempts are gradually being supported by implementation. The applied model-theoretical apparatus is not something frozen, but is gradually being formed in accordance with the requirements of current information technologies. At first glance, developing information technologies may seem like a conglomerate, but the task is precisely to reveal the cognitive system behind them, including
possible semantic models. In turn, the representation of the data also changes, converting various semantic properties as needed. The evolution of views on such a cognitive system is the subject of a separate study. But at the core are information processes and the channels through which they are distributed. At the most apparent level of abstraction, information graphs suggest themselves as a representation. The next level is formed by semantic networks and semantic maps. Their use is determined by the peculiarities of the tasks to be solved and, as a rule, is satisfactory in practice. Attempts to generalize information processes based on semantic maps often lead to the use of category theory. Functor categories and Cartesian closed categories (c.c.c.) are considered promising, but in the case of arbitrarily changing processes, variable domains are used. In this work, an individual process (or, for short, an individual) is selected, and the "history" of its transformations is traced depending on the scenario. States, or separate scenes, are represented by possible worlds, and transitions between worlds are given by the reachability relation, or evolvent. In fact, only preliminary work has been done here to study the dynamics of the individual, together with a leading example. The mathematical apparatus involved is based on variable domains and the functor category that represents them. Its purpose is to develop a certain intuition of semantic modeling and then, if possible, to form a cognitive system. Some limited semantic models have been implemented and are undergoing further practical testing and improvement.

1.1 Related Authors' Works
The abundance of information services provided makes us take a fresh look at what functions they actually perform. For this purpose we indicate six promising areas, as follows:

– the parameterized semantic model for mappings [7];
– mechanisms for supporting the network of links to parameterized data objects [4];
– a semantic model for indexing in the hidden Web [8];
– the cognitive system to clarify semantic vulnerability and destructive substitutions in order to verify data consistency [5];
– establishing equalities between combinators to evaluate arbitrary expressions and obtain a computational view of semantics [6];
– the channeled semantic models for data enrichment in the information graphs environment, based on a specialized architecture of information channels [11].

This choice is not accidental but reflects both the semantic information models of disparate levels of abstraction and the associated implementation activity behind them.

1.2 Related Theoretical Work
An early insight into cross-referencing individuals using possible worlds was discussed in [12], and even in [15] as the calculus of relations.
Further study based on a functor category was given in [14] and in [3] using morphisms and functors. A more recent study was given in [1], combining topics in topology and modality; this resulted in the topological interpretation of first-order modal logic. The most recent general study was presented in [16] by the Univalent Foundations Program, based on homotopy type theory. Identity and existence in intuitionistic logic were analyzed in [13]. On the other hand, there are promising attempts at implementing a categorical information system, see [10], and a database of categories [2]. This resulted in the invention of category-based sketch data models, relational schemas, and data specifications, mainly covered in [9].
2 Script Analysis

At present, the rate of technology development is so high that it does not allow any project to be brought to full readiness. This is especially true in software engineering: a newly completed (software) product begins to become obsolete immediately after completion, and often already becomes obsolete in the course of work on it. What remains is to take permanent variability as a basis, which requires following a special model of the technological process in which speed and dynamism are important. One has to adapt to this whenever a development is planned.

2.1 Agility Principle
Less and less space remains for constancy, and mathematics is in dire need of models, methods, and techniques for working with variable sets, whose elements are not just constants in the usual sense but generalized elements. All of this requires adopting a special principle of agility. As it turns out, this notion of agility is mathematically modeled by functors. Functors, in turn, represent the idea of mutability.

2.2 Direct and Inverse Search Problem
The following is an analysis of the script given by a "prerequisite-effect" scheme. Here is the processing of the questions:

Image. Question: find the image of inhabitant h of the world A. Answer: the transmutant h^{k∘g}_{f∘e} of the world D, possibly displacing h̄. Who has displaced the inhabitant h̄?

Preimage. Question: find the preimage of inhabitant h̄ of the world D, possibly a transmutant. Answer: the inhabitant h̄^{g∘k}_{f∘e} of the world A, i.e., the inhabitant h that evolved up to h̄. From what could the inhabitant with the properties of h̄ appear? What was the inhabitant h̄ earlier?
2.3 Analysis by Prerequisites
(The events evolve from A to B and from B to D.) Consider the possibilities for the transmutation of the preimage h into the image h^{k∘g}_{f∘e}. The prerequisite prer1 of the world 'Supply' was the world 'Order', and the prerequisite prer2 of the world 'Receive' was the world 'Supply'. For the individuals, see Fig. 1. A prerequisite of occurring in the world B is the transition prer1: B → A, and a prerequisite of occurring in the world D is the transition prer2: D → B. An effect of inhabiting the world A can be the transition eff1: A → B, and an effect of occurring in the world B can be the transition eff2: B → D. We can assume f = prer1, e = prer2, and also g = eff1, k = eff2. Since we consider generalized elements, we should distinguish 'order' (an individual, h), 'Order' (the world A, domain A), and 'Order' (the operation H_T = C on the class (world) A, giving rise to the codomain H_T(A) = C(A)).

Individuals. Consider the individuals h : A → T, which are assumed to be generalized elements. Their classes H_T(A) = {h | h : A → T} are considered as variable sets.
Fig. 1. Question: by whom could the inhabitant h̄ be displaced? Answer: by the inhabitant h^{k∘g}_{f∘e}. The worlds are A ('Order'), B ('Supply'), D ('Receive'). The composed functions are distinct: h^g = g ∘ h, h_f = h ∘ f, H_k ∘ H_g = H_{k∘g}, H(e) ∘ H(f) = H(f ∘ e), V ⊆ U ⊆ T.
The script is represented in terms of 'after' and 'prerequisite-effect' relations.

Domains. The classes H_T(A) = {h | h : A → T} are considered as variable sets.
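A minimal executable sketch of the diagram chase behind Fig. 1, with individuals as Python functions and the action H(f) given by precomposition; the concrete world data are identity placeholders, and only the naming (f = prer1, e = prer2, g = eff1, k = eff2) follows the example.

```python
def compose(*fs):
    """compose(f, g, ...)(x) = f(g(...(x)))"""
    def composed(x):
        for fn in reversed(fs):
            x = fn(x)
        return x
    return composed

def H(f):
    """H_T(f): H_T(A) -> H_T(B) acts on an individual h: A -> T by h |-> h ∘ f."""
    return lambda h: compose(h, f)

# illustrative transitions (identity placeholders for real scenario data)
f = lambda b: b    # prer1 : B -> A
e = lambda d: d    # prer2 : D -> B
g = lambda t: t    # eff1  : T -> U
k = lambda u: u    # eff2  : U -> V

h = lambda a: a    # an individual h : A -> T
# image of h in world D: h^{k∘g}_{f∘e} = k ∘ g ∘ h ∘ f ∘ e : D -> V
h_image = compose(k, g, H(e)(H(f)(h)))
# note H(e)(H(f)(h)) behaves as H(compose(f, e))(h), i.e. H(e) ∘ H(f) = H(f ∘ e)
```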
3 Conclusion
1. A model of the individual information process has been built. It takes into account the scenarios of the evolution of events, represented as a connected graph of possible worlds.
2. Variable domains are defined as collections of individuals. A semantic model of variable domains is built.
3. The foundations have been laid for the formation of a cognitive system based on semantic models of transitions through possible worlds.

Acknowledgement. This research is supported by the Russian Foundation for Basic Research, RFBR grant 20-07-00149-a. Thanks to numerous and fruitful discussions with A. G. Panteleev, M. A. Bulkin, and Yu. R. Gabovich, we managed to achieve a clearer understanding of the role and place of objects in the computer system.
References
1. Awodey, S., Kishida, K.: Topology and modality: the topological interpretation of first-order modal logic. Rev. Symbolic Logic 1(2), 146–166 (2008)
2. Fleming, M.W., Gunther, R., Rosebrugh, R.D.: A database of categories. J. Symb. Comput. 35(2), 127–135 (2003)
3. Gierz, G., Hofmann, K.H., Keimel, K., Lawson, J.D., Mislove, M.W., Scott, D.S.: A Compendium of Continuous Lattices, chap. Morphisms and Functors, pp. 177–236. Springer, Heidelberg (1980). https://doi.org/10.1007/978-3-642-67678-9_5
4. Ismailova, L., Kosikov, S., Wolfengagen, V.: Prototype mechanisms for supporting the network of links to parameterized data objects. Procedia Comput. Sci. 190, 317–323 (2021)
5. Ismailova, L., Wolfengagen, V., Kosikov, S.: Cognitive system to clarify the semantic vulnerability and destructive substitutions. Procedia Comput. Sci. 190, 341–360 (2021)
6. Ismailova, L., Wolfengagen, V., Kosikov, S.: Equalities between combinators to evaluate expressions. Procedia Comput. Sci. 190, 332–340 (2021)
7. Ismailova, L., Wolfengagen, V., Kosikov, S.: A mathematical model of the feature variability. Procedia Comput. Sci. 190, 312–316 (2021)
8. Ismailova, L., Wolfengagen, V., Kosikov, S.: A semantic model for indexing in the hidden web. Procedia Comput. Sci. 190, 324–331 (2021)
9. Johnson, M., Rosebrugh, R.D.: Sketch data models, relational schema and data specifications. Electr. Notes Theor. Comput. Sci. 61, 51–63 (2002)
10. Johnson, M., Rosebrugh, R.: Implementing a categorical information system. In: Meseguer, J., Roșu, G. (eds.) AMAST 2008. LNCS, vol. 5140, pp. 232–237. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-79980-1_18
11. Kosikov, S., Ismailova, L., Wolfengagen, V.: Data enrichment in the information graphs environment based on a specialized architecture of information channels. Procedia Comput. Sci. 190, 492–499 (2021)
12. Scott, D.: Advice on modal logic. In: Lambert, K. (ed.) Philosophical Problems in Logic: Some Recent Developments, pp. 143–173. Springer, Dordrecht (1970). https://doi.org/10.1007/978-94-010-3272-8_7
13. Scott, D.: Identity and existence in intuitionistic logic, pp. 660–696. Springer, Heidelberg (1979). https://doi.org/10.1007/BFb0061839
14. Scott, D.: Relating theories of the λ-calculus. In: Hindley, J., Seldin, J. (eds.) To H.B. Curry: Essays on Combinatory Logic, Lambda-Calculus and Formalism, pp. 403–450. Academic Press, Berlin (1980)
15. Tarski, A.: On the calculus of relations. J. Symbolic Logic 6(3), 73–89 (1941). http://www.jstor.org/stable/2268577
16. The Univalent Foundations Program: Homotopy Type Theory: Univalent Foundations of Mathematics (2013). http://homotopytypetheory.org/book/
Integrated Multi-task Agent Architecture with Affect-Like Guided Behavior

James B. Worth1 and Mei Si2(B)

1 Substrate AI, Valencia, Spain
[email protected]
2 Rensselaer Polytechnic Institute, Troy, USA
[email protected]
Abstract. Inspired by how people's cognitive and affective systems work together, this work proposes a reinforcement learning agent framework that supports adaptive learning by dynamically adjusting the agent's goals, its focus of attention on states, and its available actions. The framework includes subsystems for modeling affective states, sub-goal selection, attention, and action affordance, all implemented as individual reinforcement learning systems. The agent's affective states moderate its other subsystems. Results show that agents using this architecture outperform vanilla Q-learning agents in a mini Go game.

Keywords: Goal learning · Reinforcement learning · Attention

1 Introduction and Motivation
Deep reinforcement learning (RL) algorithms have advanced rapidly in many fields, from robotics to automated stock trading to AI-based game agents. The most visible achievements are their successes in playing large-scale complex games traditionally believed to require human intelligence, such as AlphaGo for the game Go [11], OpenAI Five for Dota 2 [1], and AlphaStar for StarCraft II [13]. Game environments have therefore often been used as benchmarks for reinforcement learning algorithms. In this work, we propose the Integrated Multi-Task (IMT) agent, which is strongly inspired by how people's perception, attention, and decision-making processes are shaped by their past experiences and affective states. It aims at fast learning in complex environments through dynamic reduction of the state, goal, and action spaces in which the agent reasons. The IMT agent architecture uses a hierarchical reinforcement learning based design that supports general adaptive multi-task learning and execution. The agent framework includes subsystems for modeling affective states, sub-goal selection, attention, and action affordance, all implemented as individual reinforcement learning systems. The agent's affective states moderate its other subsystems and guide its decision-making process by reducing the state space in which the agent reasons and by selecting contextually appropriate sub-goals and available actions.
The proposed agent design is inspired by the various components of human cognition, including affective regulation, attention, goal learning, working memory, and skill learning. The initial evaluation successfully demonstrates the agent's advantage over the vanilla Q-learning approach in a mini (5 × 5) Go game environment.

1.1 Reinforcement Learning
Reinforcement learning is inspired by how humans and animals learn from interacting with the environment [12]. It assumes the agent is actively pursuing a set of goals. Through exploring its action space, the agent gradually learns the best strategy to maximize its utility. Reinforcement learning can be formulated as an MDP with a tuple $\langle S, A, P_a, R \rangle$, where $S$ is the set of states the agent can be in, $A$ is the set of actions the agent can take, $P_a$ is the probability of the state transition caused by taking action $a$ in state $s$, $P_a = P[S_{t+1} = s' \mid S_t = s, A_t = a]$, and $R$ is the reward received from the environment. Through repeated trial and error, the agent learns a policy $\pi$ for choosing action $a$ in state $s$ to maximize its future reward. Many reinforcement learning algorithms have been proposed. The basic Q-learning algorithm is a value-based, model-free reinforcement learning algorithm. Using temporal-difference updates, it seeks to estimate the Q value associated with each state ($s$) and action ($a$) pair. The basic Q-learning algorithm only works with discretized action spaces. DQN replaces the Q-table with a neural network and further improves the effectiveness of learning using experience replay [8]. Several improvements to the vanilla DQN algorithm have been suggested. Double DQN [4] and dueling DQN [14] use separate neural networks to stabilize the network's weights during training. DDPG extends Q-learning to continuous action spaces [6]. TRPO improves the efficiency of policy learning by limiting the direction of the gradient descent process [9]. PPO optimizes TRPO by providing an approximation function that runs faster than TRPO the majority of the time [10]. Hierarchical Reinforcement Learning (HRL) is a promising approach for solving long-horizon problems with sparse rewards [5]. HRL improves upon Q-learning by enabling the agent to plan behaviors more efficiently while still learning low-level policies for execution.
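For concreteness, a minimal Python sketch of the tabular Q-learning loop described above is given below. The environment interface (reset, step, actions) is an assumed placeholder for illustration, not part of any specific library or of the original system.

import random
from collections import defaultdict

# Minimal tabular Q-learning sketch of the temporal-difference update.
# The `env` interface (reset/step/actions) is an illustrative assumption.
def q_learning(env, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration over the discrete action set
            if random.random() < epsilon:
                a = random.choice(env.actions(s))
            else:
                a = max(env.actions(s), key=lambda a: Q[(s, a)])
            s_next, r, done = env.step(a)
            # TD update toward r + gamma * max_a' Q(s', a')
            best_next = max((Q[(s_next, a2)] for a2 in env.actions(s_next)),
                            default=0.0)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q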
1.2 Proposed Enhancement Inspired by Human Cognitive Systems
The idea behind reinforcement learning has always represented a strong link between neuroscience and AI. All reinforcement learning algorithms implement the basic idea of learning by repeatedly exploring the environment. More specific phenomena in human learning processes have also been modeled. For example, work modeling the dopamine system explored the idea of introducing both internal and external rewards into the reinforcement learning process [3]. In this work, we seek to explore a reinforcement learning model inspired by how people's perception, action, and affective systems work together. The two key ideas proposed are reward-based dynamic goal selection and limited attention and action affordance.
Most existing reinforcement learning algorithms assume the agent's goals are fixed, while people and animals constantly adjust their objectives. The adjustments are made based on the circumstances and on past successes and failures in achieving the objectives. We try to mimic this phenomenon by computing the change of the agent's reward and the speed of that change, and allowing the agent to learn a subgoal through a separate reinforcement learning process using this computed information combined with its state and external rewards. We name the change of reward and the speed of that change emotive and arousal, respectively, because they mimic the affective experience in goal-oriented behaviors in humans and animals. The subgoal works as an intrinsic reward to the agent. Further, a dynamically updated attention mask over the agent's states and an affordance mask over the agent's actions are applied. This mimics people's selective attention and their estimation of action affordance. The proposed approach is inspired by the fact that biological agents are constantly bombarded by vast amounts of complex sensory data and numerous, potentially conflicting reward signals. One way an organism can solve complex problems is by reducing the state space in which it needs to reason.
2 Agent Architecture
The proposed agent architecture utilizes a hierarchical design that implements different aspects of the agent's workflow. The Integrated Multi-Task (IMT) Model, as shown in Fig. 1, is a top-down system that consists of multiple subsystems. Individual Q-learning models are used to model the agent's affective, goal selection, attention, and action affordance adjustment processes. These models set parameters in the Experiential Model, which also uses Q-learning to interact with the outside environment. In turn, the external rewards received by the Experiential Model are used to train all other models. Thus, the IMT agent learns its subgoal, attentional mask, and action affordance over time to regulate its behaviors.

2.1 Agent State
The agent’s state is composed of state features and their values. We call this environmental state to differentiate from other state features computed by the agent, e.g., the emotive and arousal states. For example, we used a 5 * 5 mini Go game to test our agent. This game contains 25 state features corresponding to the 25 board positions. Furthermore, each state feature can have three alternative values corresponding to the status of the board position: empty, black, and white. 2.2
2.2 Affective System
The affective system is composed of the Arousal and Emotive models. The reward signals for the affective system are computed by evaluating the percent difference of the first and second derivative changes in the agent's Q value over time.
Fig. 1. Integrated multi-task model.
Using a sliding window of N steps, the Emotive model averages the first derivatives of the Q values in this window, and the Arousal model averages the second derivatives of the Q values. The typical size for the sliding window is 20.

$$\mathit{avg\_first\_derivative} \leftarrow \frac{1}{N}\sum_{t=1}^{N}\frac{dQ}{dt} \qquad (1)$$

$$\mathit{avg\_second\_derivative} \leftarrow \frac{1}{N}\sum_{t=1}^{N}\frac{d^2Q}{dt^2} \qquad (2)$$
Then both the Emotive and the Arousal models use a Q-learning agent to learn their final outputs. For the Arousal model, the agent's state is the environmental state concatenated with the avg_second_derivative value discretized using an interval of 0.01. The reward is computed as

$$|\mathit{target} - \mathit{avg\_second\_derivative}| \times \mathit{external\_reward} \qquad (3)$$
The action of the Q-learning agent is the final output, arousal, which is a numeric value ranging from 0 to 1.0, discretized using an interval of 0.01. Similarly, for the Emotive model, the agent's state is the environmental state concatenated with the avg_first_derivative value discretized using an interval of 0.01, and the final output from the Q-learning agent for Arousal. The reward is computed as

$$|\mathit{target} - \mathit{avg\_first\_derivative}| \times \mathit{external\_reward} \qquad (4)$$
Again, the action of the Q-learning agent is the final output, emotive, a numeric value ranging from 0 to 1.0, discretized using an interval of 0.01. The Arousal and Emotive reward functions are computed from the differences between the observed and the expected (target) values of the first and second derivatives of the Q value over time. The Arousal target value is set statically for the agent and is designed to regulate its reward-seeking behavior. In this way, if the agent receives reward beyond the target rate, the agent will adopt different behavior, reducing its possible execution of higher-risk behavior. The Emotive target value is dynamically set by the Arousal model.
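A minimal sketch of the computations in Eqs. (1)-(4) follows, assuming discrete time steps (so the derivatives become finite differences) and illustrative target values; the function names are ours, not the paper's.

import numpy as np

# Sketch of Eqs. (1)-(4): sliding-window averages of the first/second
# differences of Q, and the reward signals for the Emotive/Arousal
# models. N = 20 follows the text; `*_target` values are assumptions.
def affective_signals(q_history, N=20):
    window = np.asarray(q_history[-(N + 2):], dtype=float)
    d1 = np.diff(window)        # discrete dQ/dt
    d2 = np.diff(window, n=2)   # discrete d2Q/dt2
    avg_first = d1[-N:].mean()  # Eq. (1)
    avg_second = d2[-N:].mean() # Eq. (2)
    return avg_first, avg_second

def affective_rewards(avg_first, avg_second, external_reward,
                      emotive_target=0.5, arousal_target=0.5):
    arousal_reward = abs(arousal_target - avg_second) * external_reward  # Eq. (3)
    emotive_reward = abs(emotive_target - avg_first) * external_reward   # Eq. (4)
    return emotive_reward, arousal_reward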
2.3 Goal Model
The Affective system provides enhanced state context with the additional features of emotive and arousal, and the Goal model selects the next subgoal for the Experiential model. When the expected external rewards of several actions are equal during interaction with the external environment, the agent biases towards the action that leads to the subgoal, i.e., whose expected next state is closer to the subgoal. Thus, the Goal model provides the agent with an intrinsic reward. As the agent interacts with the environment, the states it visits are added to the Goal model and serve as the candidate states for the subgoal. The Goal model uses a Q-learning agent to learn the subgoal. The agent's state is the environmental state concatenated with the emotive and arousal states. Initially, the subgoal state is randomly selected from the candidate states and denoted by an index number. There are only two actions the Goal model's Q-learning agent can take: increasing or decreasing the index of the subgoal state. This agent uses the external reward received from the Experiential model as its reward. Observations of the new environmental state are passed from the Experiential model as well. Over time, the agent learns to select contextually relevant subgoals to maximize its reward.
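As an illustration, the subgoal-index mechanism can be sketched as follows; the candidate list, the two-action encoding, and the clamping behavior are our assumptions about details the text leaves open.

# Sketch of the Goal model's subgoal selection: the subgoal is an index
# into the list of visited candidate states, and the Q-learning agent's
# two actions move that index up or down. Names are illustrative.
def update_subgoal_index(index, action, candidate_states):
    index += 1 if action == "increase" else -1
    # clamp to the currently known candidates (an assumption)
    return max(0, min(index, len(candidate_states) - 1))

def observe_state(state, candidate_states):
    # states visited by the agent become candidate subgoals
    if state not in candidate_states:
        candidate_states.append(state)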
2.4 Attentional Model
Inspired by the attention process in humans and animals, the Attentional model selects the active state features that the agent should attend to, i.e., use when interacting with the environment. The Attentional model selects state features by creating and modifying an Attentional mask, which is a bit map vector.
The mask has the same size as the environmental state. The Attentional mask is represented by the decimal value of the bit map. For example, assuming the agent's environmental state has six state features, the value of the Attentional mask below is 42.

$$\mathit{Attentional\_mask} = 1\,0\,1\,0\,1\,0 \qquad (5)$$
Over time, as the agent interacts with the environment, the Attentional model learns to select feature masks that are contextually relevant to its task. The Attentional model leverages a Q-learning agent that functions similarly to the Goal model's. The Q-learning agent's state is the environmental state. Initially, the Attentional mask is randomly set and represented by its decimal integer value. The Q-learning agent has two actions: increasing or decreasing the decimal value of the mask. The external reward received from the Experiential model is used as this Q-learning agent's reward. The Experiential model also updates the environmental state.
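A small sketch of the mask mechanics described above (encoding, application, and the two mask-adjusting actions); the function names are illustrative, and the final assertion reproduces the 101010 -> 42 example of Eq. (5).

# Sketch of the Attentional mask as a decimal-encoded bit map.
def mask_to_bits(mask_value, n_features):
    # most-significant bit first, one bit per state feature
    return [(mask_value >> i) & 1 for i in reversed(range(n_features))]

def apply_mask(state, mask_value):
    # zero out features whose mask bit is 0 (an assumed convention)
    bits = mask_to_bits(mask_value, len(state))
    return [f if b else 0 for f, b in zip(state, bits)]

def adjust_mask(mask_value, action, n_features):
    # the Q-learning agent's two actions: increment or decrement the mask
    new_value = mask_value + (1 if action == "increase" else -1)
    return max(0, min(new_value, 2 ** n_features - 1))

assert mask_to_bits(42, 6) == [1, 0, 1, 0, 1, 0]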
2.5 Affordance Model
The Affordance model is responsible for selecting which action types are enabled within a given context. As the agent interacts with the environment, the Affordance model provides an action inhibition function, similar to how the Attentional model limits the active states the agent considers. The Affordance model uses a masking approach similar to the Attentional model's, where the state features are composed of the environmental state and the emotive and arousal states. The affordance mask is a bit map that specifies which actions are applicable in each state. As in the other models, the mask is randomly initialized. Then, using the external reward observed from the Experiential model as reward, a Q-learning agent either increases or decreases the mask's value, and eventually learns to reduce the possible errant actions.
2.6 Experiential Model
The Experiential model is responsible for executing agent behavior in the environment. It is also responsible for passing the environmental state and the rewards received from the environment to all other models for training at each time step. In return, it receives the emotive and arousal values, the subgoal state, the Attentional mask, and the Affordance mask from the Affective, Goal, Attentional, and Affordance models, respectively. The emotive and arousal values become part of its state. The Attentional mask is applied to reduce its full state space to only the active state space. Then a Q-learning agent is used to interact with the environment. The set of actions the Q-learning agent can take is the set of actions supported by the environment. For example, in the game Go, the action set includes placing a piece at any possible position on the board.
Algorithm 1. Experiential Model()
  states ← environmental_state + emotive + arousal
  actions ← actions supported by the environment
  while True do
    states ← states * Attentional_mask
    action ← Q-learning with subgoal for tie-breaking
    action ← action filtered by Affordance_mask
    environmental_state, reward ← execute action in the environment
    states ← environmental_state + emotive + arousal
    emotive, arousal ← Affective_Model(reward)
    subgoal ← Goal_Model(states, reward)
    Attentional_mask ← Attentional_Model(environmental_state, reward)
    Affordance_mask ← Affordance_Model(states, reward)
  end while
The subgoal is used for tie-breaking during the agent's action selection process. If multiple actions are of equal utility to the agent, the agent will take the action that can move its state closer to the subgoal state. Finally, if the action selected by Q-learning is masked out in the Affordance mask, the agent will not take any action during that turn. Algorithm 1 describes the workflow of the Experiential model.
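For illustration, the tie-breaking rule can be sketched as follows. The predict_next function and the Hamming distance are assumptions made for this sketch; the paper does not specify how the expected next state or its distance to the subgoal is computed.

# Sketch of the Experiential model's action selection: among actions of
# (near-)equal Q value, prefer the one whose predicted next state is
# closest to the subgoal.
def select_action(Q, state, actions, subgoal, predict_next, tol=1e-9):
    best_q = max(Q[(state, a)] for a in actions)
    tied = [a for a in actions if abs(Q[(state, a)] - best_q) <= tol]
    if len(tied) == 1:
        return tied[0]
    # tie-break: Hamming distance between predicted next state and subgoal
    def distance(a):
        nxt = predict_next(state, a)
        return sum(x != y for x, y in zip(nxt, subgoal))
    return min(tied, key=distance)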
3 Evaluation
The IMT agent framework has been applied to many real-world applications at Substrate AI, including financial trading, agtech (animal management, water conservation, product optimization), and renewable energy (power plant monitoring automation). One of the IMT agent's most successful applications is stock trading, described in Sect. 3.1. In addition, we present an experimental study in which the IMT agent competes against a standard Q-learning agent in a mini Go game.
3.1 Stock Trading
The financial platform presents unique challenges and opportunities for reinforcement learning algorithms. Using historical and current datasets of financial markets and the goal of maximizing profit, the application of reinforcement learning has the potential to provide a viable means of asset management, risk assessment, and security valuation. At the same time, algorithm-based trading has always been regarded as a challenging task because of the noisiness of the data. The IMT agent has been used to manage live portfolios for the past three years. Published on the Collective2 platform, these portfolios have performed in the top 2-5 percentile, with an average annual return of 49.7% versus the S&P 500's average annual return of 10%-11%.
Fig. 2. Comparison of accumulative win rates (IMT Agent Win Rate, Q Agent Win Rate, Draw Rate; 0-100%, plotted over 500-3000 games): (a) Episode 0. (b) Episode 1. (c) Episode 2.
A trading model is composed of an ensemble of agents that learn to trade a security from a collection of securities. Training of these models uses daily technical data, including Last Trade Price, High, Low, Close, 3v7 Momentum, and Volume, with the agent's actions being Buy, Sell, and Hold. From the collection of trading results, the top-performing models are used to build a portfolio.
3.2 Mini Go Game
To further test the IMT agent architecture, we created a mini Go game simulator with a 5 × 5 board and used a standard Q-learning agent as the opponent. The agents played three episodes, with 3000 games per episode. The agents alternated starting the game to account for the first-move advantage. The hyperparameter values for the test/reference agents were:

– Alpha = 0.1
– Gamma = 0.9
– Epsilon = 0.1.

The results from this study support our hypothesis that our proposed agent architecture improves on the performance of the vanilla Q-learning agent. As can be seen in Figs. 2a, 2b, and 2c, the IMT agent consistently outperforms the Q-learning agent. By the end of the three episodes, the IMT agent's win rate is 4.96%, 7.69%, and 5.47% higher than the Q-learning agent's, respectively.
4 Discussion and Future Work
The performance of the IMT agent in the mini Go game seems less impressive than in its real-world application of stock trading. This may be because the affective system's moderation of decision-making is more suitable for stock trading than for the mini Go game. In stock trading, managing risk is essential. Therefore, the agent shouldn't always aim for faster reward growth, since that is often too risky. The target values in the affective system prevent the agent from doing so. However, fast reward growth is typically not risky in the mini Go game. As part of future work, we plan to further develop the agent by incorporating a richer set of mechanisms by which the affective system can influence the agent's decision-making processes. The IMT agent was developed and refined through its use in real-world applications. However, it has not been tested systematically against other reinforcement learning algorithms on established benchmarks. We are interested in running more controlled experiments and comparing its performance with other state-of-the-art reinforcement learning agents in the future. We are particularly interested in running experiments on two platforms. One is FinRL, a deep reinforcement learning library for quantitative finance [7]. It contains simulated environments that can be configured with different stock market datasets and trading agents, and thus provides a convenient platform for evaluating different trading algorithms. We also want to evaluate the IMT agent's performance in
games systematically. We will use OpenAI Gym, a classic environment suite for testing reinforcement learning algorithms [2]. While running these experiments, we also plan to use ablation studies to test the effectiveness and performance of the individual models separately. This will help us understand each model's contribution to the agent's performance and identify possible opportunities for enhancement. Finally, we are interested in studying whether the performance of the system can be improved by offline training. At present, all the models are trained online. We want to experiment with adding an experience replay buffer to the Goal, Attentional, and Affordance models.
5 Conclusion
This paper proposes a reinforcement learning agent with a hierarchical learning architecture inspired by how people's cognitive and affective systems work together. The architecture contains an Integrated Multi-Task (IMT) Model, which uses the environment context and changes in reward over time to learn goals, attentional features, and action affordances in order to regulate behavior and build enhanced reference frames. Experiments performed on the 5 × 5 Go game show that the IMT agent demonstrates improved results compared to the vanilla Q-learning agent.
References

1. Berner, C., et al.: Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680 (2019)
2. Brockman, G., et al.: OpenAI Gym. arXiv preprint arXiv:1606.01540 (2016)
3. Castro, P.S., Moitra, S., Gelada, C., Kumar, S., Bellemare, M.G.: Dopamine: a research framework for deep reinforcement learning. arXiv preprint arXiv:1812.06110 (2018)
4. Hasselt, H.V.: Double Q-learning. In: Advances in Neural Information Processing Systems, pp. 2613–2621 (2010)
5. Kulkarni, T.D., Narasimhan, K., Saeedi, A., Tenenbaum, J.: Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. Adv. Neural Inf. Process. Syst. 29, 3675–3683 (2016)
6. Lillicrap, T.P., et al.: Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015)
7. Liu, X.Y., et al.: FinRL: a deep reinforcement learning library for automated stock trading in quantitative finance. arXiv preprint arXiv:2011.09607 (2020)
8. Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013)
9. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
10. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
11. Silver, D., et al.: A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362(6419), 1140–1144 (2018)
12. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
13. Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)
14. Wang, Z., Schaul, T., Hessel, M., Hasselt, H., Lanctot, M., Freitas, N.: Dueling network architectures for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1995–2003 (2016)
The Control Algorithm of Compressor Equipment of Automobile Gas-Filling Compressor Stations with Fuzzy Logic Elements

Andrew A. Evstifeev (1,2), Margarita A. Zaeva (2), and Nadezhda A. Shevchenko (2)

1 LLC "Gazprom VNIIGAZ", Proektiruemyj proezd 5537, 15, 1, Razvilka, s.p. Razvilkovskoe, Leninsky District, Moscow Region 142717, Russian Federation
2 National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoe shosse, 31, Moscow 115409, Russian Federation ([email protected], https://mephi.ru)
Abstract. The need for motor fuel is highly uneven over time. This unevenness of fuel consumption negatively affects the energy and economic efficiency of the production activities of automobile gas-filling compressor stations. The paper proposes algorithms for controlling compressor equipment that reduce this negative effect. Applying the theory of fuzzy sets and a fuzzy inference algorithm to estimating the parameters and the residual resource of gas equipment in the presence of fuzzy values in the equations allows us to obtain numerical values of the uncertainty of the estimated parameters. Varying the test results of equipment (nominal reference objects) makes it possible to take informed decisions when operating a gas-filling station even if an expert only has results for a limited number of tests, with a set of models built on their basis.

Keywords: Takagi-Sugeno algorithm · Automobile gas-filling compressor stations · Control algorithm
1 Introduction
The need for motor fuel is markedly uneven. Among the main types of irregularity in the consumption of motor fuel are daily and seasonal ones. The daily unevenness is associated with production activities, the opening hours of consumer facilities and entertainment venues, and the biological rhythms of the general population. Seasonal unevenness is associated with changes in the intensity of production processes, primarily agricultural and construction work carried out in spring and autumn, as well as the mass delivery of products and goods before the New Year holidays. Stations filling transport
Fig. 1. General view of the control scheme for switching compressors on and off at CNG stations: 1 – high-pressure compressor with an electric drive; 2 – compressor power supply lines; 3 – relay switching the power supply to the compressor motor on/off; 4 – controller with rigid logic; 5 – pressure gauge with a pressure sensor; 6 – gas filling column (GFC) for compressed natural gas; 7 – gas tee with a measuring tube.
with compressed natural gas (CNG), as well as other motor-fuel filling stations, are subject to these factors of uneven fuel consumption [1]. At the same time, the unevenness of fuel consumption negatively affects the energy and economic efficiency of the production activities of automobile gas-filling compressor stations (CNG stations). Currently, the electric drives of the compressor equipment of a CNG station, the main consumer of electricity at the station, are governed by automatic control with rigid logic according to configured boundary pressure values in the station battery. Figure 1 shows a typical control scheme for the electric drives of the compressor equipment of CNG stations. The principle of operation of this solution is as follows. If the maximum pressure in the battery is set to 25 MPa, the pressure gauge with a pressure sensor (5) outputs the corresponding voltage value to the control unit (4), which sends control signals to the relays R1, ..., Rn (3) to disconnect the voltage supply from the source (2) to the electric drives of the compressors K1, ..., Kn (1), after which the natural gas compression process stops. When a vehicle is connected to the GFC (6) and the tap of the remote filling device is opened, high-pressure gas flows through the GFC and the pressure in the battery gradually drops, which is reflected in the readings of the pressure gauge (5) and signaled by the pressure sensor. After enough compressed natural gas has left the battery for the pressure to drop to 19 MPa, the control unit issues a control signal to relay R1 (3) and turns on compressor K1 (1); with a further pressure drop to 16 MPa, the second compressor turns on, and when the pressure reaches 14 MPa, all the compressors are switched on. All compressors are switched off simultaneously by the controller (4) when the pressure in the battery reaches 25 MPa, as signaled by the pressure sensor of the pressure gauge (5). The formalized decision-making rules and control actions for the scheme shown in Fig. 1 can be written as:
Fig. 2. General view of the upgraded control scheme for switching compressors on and off at CNG stations: 1 – high-pressure compressor with an electric drive; 2 – compressor power supply lines; 3 – relay switching the power supply to the compressor motor on/off; 4 – controller with memory and adaptive software logic; 5 – pressure gauge with a pressure sensor; 6 – GFC for compressed natural gas; 7 – controlled gas shut-off valves.
1. if $U_{dp} > U_{unp}$ then $U_{r1} = U_{rn} = 0$;
2. if $U_{dp} < U_{unp}$ and $U_{dp} \geq 0.75\,U_{unp}$ then $U_{r1} = 1$, $U_{rn} = 0$;
3. if $U_{dp} < U_{unp}$ and $U_{dp} \leq 0.65\,U_{unp}$ then $U_{r1} = U_{rn} = 1$;

where $U_{dp}$ is the current voltage at the analog output of the natural gas pressure sensor in the battery, $U_{unp}$ is the preset cut-off voltage corresponding to a pressure of 25 MPa, and $U_{r1}$ and $U_{rn}$ are the cut-off voltages of the relays supplying current to the electric motors of the 1st and n-th compressors. The main disadvantages of the currently implemented algorithm are: control feedback based solely on the readings of the pressure sensor in the CNG battery, no possibility of limiting the gas flow at the gas filling gallery, and no information on the volumes of compressed gas required for refueling. These disadvantages lead to an increase in the frequency of switching the compressor equipment on and off, which negatively affects the compressors' service life and increases electricity consumption; moreover, when vehicles arrive for refueling unevenly, peak load periods occur, characterized by a lack of capacity and a sharp increase in vehicle refueling time.
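For illustration, these rules translate directly into threshold logic. The sketch below is a minimal rendering of rules 1-3 above; the handling of the band between 0.65 U_unp and 0.75 U_unp, which the rules leave unspecified, is an assumption here.

# Sketch of the rigid-logic relay rules as a function of the
# pressure-sensor voltage. U_unp corresponds to the 25 MPa cut-off,
# so 0.75*U_unp is roughly the 19 MPa level and 0.65*U_unp roughly 16 MPa.
def relay_outputs(u_dp, u_unp):
    if u_dp > u_unp:                 # rule 1: at/above cut-off, all off
        return {"r1": 0, "rn": 0}
    if u_dp >= 0.75 * u_unp:         # rule 2: first compressor on
        return {"r1": 1, "rn": 0}
    if u_dp <= 0.65 * u_unp:         # rule 3: all compressors on
        return {"r1": 1, "rn": 1}
    # the rules are silent in the 0.65-0.75 band; keeping only the
    # first compressor on is our assumption for this sketch
    return {"r1": 1, "rn": 0}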
2 Improved Control Scheme of CNG Compressor Equipment
Figure 2 shows a variant of the control scheme that ensures uniform operation of the station's compressor equipment in power-saving mode. A special feature of this control scheme is the presence of a double feedback loop and a modified control unit.
Fig. 3. The principle of operation of the control unit and the order of operations in the system.
3 Principle of Operation of the Control Unit
The principle of operation of the control unit and the order of operations in the system can be represented as follows (see Fig. 3). After power is supplied to the compressor equipment control unit, the unit switches to start mode, in which the control program is loaded from permanent memory, the channels and connected devices are diagnosed for operability, and the initial mode is set. After passing these checks, the system switches to operating mode, in which the compressor equipment is switched on and off based on the readings of the pressure sensors and the calculated rate of pressure increase per unit of time in the vehicle's cylinder. Once a day, if there is a break in the operation of the compressor equipment and there are no refueling requests, the self-diagnosis mode is activated, which checks the operability of the sensors, equipment, and internal control units. If incorrectly functioning elements are detected, error information is generated and an attempt is made to eliminate the malfunction, partially turn off the equipment, or transfer the equipment to limited-load mode. All actions of the system are recorded by writing the errors found to permanent memory, with the information duplicated to the operator on the control panel.
When a shutdown signal is received from the power supply unit, the switch-off mode is activated: all compressor equipment stops normally, information about the shutdown and its causes is recorded, and all devices are switched to the off state. One of the operating modes of the control unit is the mode of identifying the criticality of a detected error. The algorithm of this module is implemented using elements of fuzzy logic based on the Takagi-Sugeno algorithm, which provides a fuzzification stage for the antecedent part of the rules [2]. In this paper, we used asymmetric smooth membership functions (Fig. 4a) based on a set of sigmoid functions that can open either to the left or to the right, depending on the type of function. Symmetric and closed functions are synthesized using two sigmoids: a main one and an additional one. When determining the criticality of an emergency mode of compressor equipment, the residual life of the compressor is calculated based on the analysis of vibration sensor data, statistics on the number of starts and stops, and other data. In some cases, when facts of thermal exposure are identified, it is necessary to take into account the presence of microcracks, their number, their locations, and their dimensions for compliance with the manufacturer's technical specifications [4]. When identifying criticality, the information obtained and the results of the survey (if any) are compared with the nominal test results obtained during the inspection of such equipment by the manufacturer, or with data from precedents involving it. The manufacturer keeps the information about the nominal indicators and, as facts and incidents appear, adjusts the normative curve for the indicators. In this case, a measure of similarity between the installed and the rated equipment and a function of the distance to the nearest known test result are introduced, after which the numerical values of the equipment membership function are calculated. To illustrate the application of the approach described above, we present graphs of the membership functions for estimating the residual resource (Fig. 4b) obtained by two methods: actual and normative. Applying this method for obtaining an integral estimate of heterogeneous fuzzy information allows us to form an adapted membership function of the fuzzy parameters using corrective procedures, bringing them to the conservative form for critical equipment, shown in Fig. 4c, and, for non-critical equipment, to the non-conservative form, capturing the entire space under the curves on the left and right. These membership functions are then used in the well-known fuzzy inference algorithm of fuzzy set theory to assess the parameters and the remaining life, which allows us to conclude whether the condition of the equipment is critical and, if necessary, to take corrective actions. Varying the test results of equipment (nominal reference objects) makes it possible to take informed decisions when operating a gas station even when an expert only has results for a limited number of tests, with a set of models built on their basis [3].
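A minimal sketch of the membership-function machinery and a zero-order Takagi-Sugeno combination follows; all parameter values are illustrative assumptions rather than values from the paper.

import numpy as np

# Sigmoid membership functions that open to the left or the right, a
# closed function built from two sigmoids, and a firing-strength-weighted
# Takagi-Sugeno output.
def sigmoid(x, a, c):
    # a > 0 opens the function to the right, a < 0 to the left
    return 1.0 / (1.0 + np.exp(-a * (x - c)))

def closed_membership(x, a1, c1, a2, c2):
    # product of a rising and a falling sigmoid gives a closed, smooth,
    # possibly asymmetric membership function
    return sigmoid(x, a1, c1) * sigmoid(x, -a2, c2)

def takagi_sugeno(x, rules):
    # rules: list of (membership_fn, consequent_fn); output is the
    # weighted average of the rule consequents
    weights = np.array([mu(x) for mu, _ in rules])
    outputs = np.array([f(x) for _, f in rules])
    return float((weights * outputs).sum() / weights.sum())

# e.g., a hypothetical residual-resource estimate from two wear rules
rules = [
    (lambda x: sigmoid(x, -8.0, 0.4), lambda x: 0.9),  # low wear -> large resource
    (lambda x: sigmoid(x, 8.0, 0.6), lambda x: 0.2),   # high wear -> small resource
]
print(takagi_sugeno(0.5, rules))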
Fig. 4. a) Gaussian and sigmoid membership functions; b) membership functions: Rn – the normative curve, Rf – the actual curve, Rp – the agreed assessment; c) corrected original membership functions.
4 Conclusions
As a result of the conducted research, the control system of the main production equipment of the CNG station was analyzed, and an improved scheme for controlling the switching of compressors on and off at the station was developed. The addition of extra feedback made it necessary to adjust the algorithms of the control unit in order to expand its functionality and implement modes for priority refueling of vehicles, self-diagnosis, and artificial intelligence elements in error processing. A feature of the proposed algorithm is the use of corrected conservative initial membership functions based on the normative membership functions. Varying the test results of equipment makes it possible to take informed decisions when operating a gas station. This is especially important for new equipment that is still being fine-tuned, as well as for outdated equipment that is used periodically, whose production has already ended but whose operation is still ongoing.
References

1. Ministry of Economic Development of the Russian Federation: Forecast of the socio-economic development of the Russian Federation for 2021 and for the planning period of 2022 and 2023. https://www.economy.gov.ru/material/file/956cde638e96c25da7d978fe3424ad87/Prognoz.pdf
2. Rybina, G., Blokhin, Y.: Automated planning: usage for integrated expert systems construction. In: Biologically Inspired Cognitive Architectures (BICA) for Young Scientists: Proceedings of the First International Early Research Career Enhancement School (FIERCES 2016), vol. 449, pp. 169–177 (2016). https://doi.org/10.1007/978-3-319-32554-5_21
3. Kluchnikov, M., Matrosova, E., Tikhomirova, A., Tikhomirova, S.: Development of an optimal production plan using fuzzy logic tools. In: Samsonovich, A.V. (ed.) BICA 2019. AISC, vol. 948, pp. 211–218. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-25719-4_27
4. Evstifeev, A.A., Zaeva, M.A.: The method of planning the process of refueling vehicles using artificial intelligence and fuzzy logic methods. Procedia Comput. Sci. 190, 252–255 (2021). https://doi.org/10.1016/j.procs.2021.06.031
Author Index
A Abdulov, Rafael E., 275, 522 Abramov, Andrey, 355 Akimoto, Taisuke, 1 Aleksey, Osipov, 223, 576 Alicea, Bradly, 15 Anchekov, Murat, 319 Anthis, Jacy Reese, 20 Aracelova, Irina, 380 Arakelova, Irina, 344 Artamonov, Alexey Anatolevich, 69 Auzby, Gusov Zakharovich, 561 B Barradas, Isabel, 42 Bondarev, Sergey, 367 Bugaenko, Marina Vladimirovna, 535 Bushov, V. Yu., 569 Bzhikhatlov, K. Ch., 327 C Canbalo˘glu, Gülay, 54 Cherkasskaya, Marina Valeryevna, 69 Cherkasskiy, Andrey Igorevich, 69 Chistiy, A. S., 297 Cialfi, Daniela, 15 D Davydov, Yury, 438 de Jong, Frank, 75 Dmitry, Kupriyanov, 223 Dmitry, Sheludyakov, 182 Dohrn, Sebastian, 596 Dolidze, Alexandra, 89
Dubrovsky, David I., 127 Durakovskiy, Anatoly P., 96 Dushkin, Roman V., 113 Dyatlov, Dmitriy A., 96 Dzhabborov, Daler B., 275 E Efimov, Albert, 127 Ekaterina, Pleshakova, 182, 223 Eler, Edgar, 75 Ercelik, Dilay F., 138 Evstifeev, Andrew A., 613 Ezhova, Anastasia A., 243 F Fadeicheva, Galina, 344, 355, 367 G Galin, Ilya Yurievich, 69 Gavrilkina, Anastasia, 261 Golitsina, Olga, 261 Golitsyna, Olga, 268 Gorbatov, Victor S., 96 Gorbov, Evgeniy A., 400 Grankina, Victoria, 380 Gurtueva, I. A., 327 Gurtueva, Irina, 319 Guseva, Anna I., 148 I Igrevskaya, Anna, 158 Ionkina, Kristina, 552 Ismailova, Larisa, 164, 176, 506, 596 Ismailova, Larisa Y., 170
J Jackson Jr., Philip C., 195 K Kachina, Alexandra, 158 Kankulov, S. A., 327 Kartashov, Sergey I., 393, 569 Khokhlov, N. M., 297 Kholodny, Yuri I., 393 Khrebtov, Alexander, 367 Khrupina, Ksenia Sergeevna, 208 Kloc, Agnieszka, 42 Kogler Jr., Joao E., 216 Komolov, Oleg O., 275 Konstantin, Bublikov, 182, 223, 576 Koole, Sander L., 75 Koptelov, Matvey, 148 Korchagin, Sergey, 182, 576 Korpusenko, Anastasia, 552 Kosikov, Sergey, 164, 176, 506, 596 Kosikov, Sergey V., 170 Kovalchuk, Mikhail V., 393 Kreinin, G. V., 288 Kreynin, G. V., 297 Kudryavtsev, Konstantin, 158 Kuzmin, Andrey I., 231 Kuznetsova, Nadezhda V., 243 L Lebedev, Alexander, 261, 268 Leonov, Pavel Y., 243 Leshchev, Sergey V., 249 Lim, Avery, 15 Lozik, Nina, 355 Lyutikova, Larisa A., 255 M Makar, Svetlana, 344, 355, 367 Maksimov, Nikolay, 261, 268 Malakhov, Denis G., 393 Manakhova, Irina Viktorovna, 208 Matrosova, Elena, 148 Matveev, Philipp, 127 Medvedeva, Olga, 355, 367, 380 Medvedeva, Yulia M., 275 Melnikov, Dmitriy A., 96 Mikhail, Ivanov, 182, 223, 576 Misyurin, S. Yu., 288, 297, 306 Misyurin, Sergey Yu., 281, 333 Miyazaki, Kazuteru, 313 Molchanov, E. M., 297 Moloshnikov, Ivan, 447, 463 Morozevich, Maria, 89
N Nagoev, Z. V., 327 Nagoev, Zalimkhan, 319 Nagoeva, O. V., 327 Naumov, Aleksandr, 447 Nelyubin, Andrey P., 281, 288, 297 Norkina, Anna, 344, 355, 367, 380 Nosova, Natalia Yu., 288, 297, 333 Nosova, Svetlana, 344, 355, 367, 380
O Orlov, Vyacheslav A., 393, 569
P Pak, Nikolay, 89 Parent, Jesse, 15 Pavlov, Nikita, 576 Petrova, Aliona, 158 Petukhov, Alexandr Y., 400 Pimenova, Victoria Olegovna, 561 Piskunov, Pavel, 406 Pleshakova, Ekaterina, 576 Polevaya, Sofia A., 400 Popova, Galina Ivanovna, 412 Prokhorov, Igor, 406 Pshenokova, I. A., 327 Putilov, Alexander Valentinovich, 208
R Raghavachary, Saty, 419 Rass, Lars, 75 Repkina, Olga Bronislavovna, 412 Rybka, Roman, 438, 447, 457, 463 Rychkov, Vadim A., 243 Rylkov, Gleb, 463
S Samsonovich, Alexei V., 231, 428 Savi´c, Dobrica, 552 Sboev, Alexander, 438, 447, 457, 463 Schneider, Howard, 472 Selivanov, Anton, 463 Semenov, Yu. A., 306 Semenova, E. B., 306 Semyonov, Denis A., 231 Serenko, Alexey, 438, 457 Sergey, Korchagin, 223 Shevchenko, Nadezhda A., 613 Shikhalieva, Dzhannet Sergoevna, 535
Shirokova, Lidia, 380 Shpak, Vasily V., 486 Shurygin, Viktor A., 496 Si, Mei, 602 Slieptsov, Igor, 164, 176, 506 Stepankov, Vladimir Y., 113 Sushkov, Viktor M., 243 Suyts, Viktor P., 243 Suyuncheva, Alisa R., 512 Svetlik, M. V., 569 T Temirova, Tamara O., 522 Tikhomirova, Anna, 148 Timokhin, Dmitriy Vladimirovich, 412, 528, 535 Titaev, Sergey A., 542 Tolokonskij, A. O., 585 Tretyakov, Evgeny, 552 Treur, Jan, 42, 54, 75, 138 Treur, Roy M., 75 Trubacheev, Evgeniy Valerievich, 561 U Ushakov, V. L., 569
V Vartanov, Alexander V., 512 Vasiliev, Nikita, 576 Vavrenyuk, Aleksandr B., 496 Victor, Radygin, 182, 576 Vlasov, Danila, 438 Volodin, V. S., 585 W Weng, Nina, 42 Williams, Andy E., 590 Wolfengagen, Viacheslav, 164, 176, 506, 596 Wolfengagen, Viacheslav E., 170 Worth, James B., 602 Y Yadykin, Igor M., 496 Yerbayev, Yerbol, 182 Z Zaeva, Margarita A., 613