Boris Kryzhanovsky, Witali Dunin-Barkowski, Vladimir Redko, Yury Tiumentsev (Editors)
Advances in Neural Computation, Machine Learning, and Cognitive Research IV
Selected Papers from the XXII International Conference on Neuroinformatics, October 12–16, 2020, Moscow, Russia
Studies in Computational Intelligence Volume 925
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted to indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.
More information about this series at http://www.springer.com/series/7092
Editors

Boris Kryzhanovsky, Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia
Witali Dunin-Barkowski, Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia
Vladimir Redko, Scientific Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia
Yury Tiumentsev, Moscow Aviation Institute (National Research University), Moscow, Russia
ISSN 1860-949X; ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-60576-6; ISBN 978-3-030-60577-3 (eBook)
https://doi.org/10.1007/978-3-030-60577-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
Preface
The International Conference on Neuroinformatics is an annual multidisciplinary scientific forum dedicated to the theory and applications of artificial neural networks, problems of neuroscience and biophysical systems, artificial intelligence, adaptive behavior, and cognitive studies. The scope of the Neuroinformatics conference is broad, ranging from the theory of artificial neural networks, machine learning algorithms, and evolutionary programming to neuroimaging and neurobiology. The main topics of the conference cover theoretical and applied research in the following fields:

• neurobiology and neurobionics: cognitive studies, neural excitability, cellular mechanisms, cognition and behavior, learning and memory, motivation and emotion, bioinformatics, adaptive behavior and evolutionary modeling, brain–computer interfaces;
• neural networks: neurocomputing and learning, paradigms and architectures, biological foundations, computational neuroscience, neurodynamics, neuroinformatics, deep learning networks, neuro-fuzzy systems, and hybrid intelligent systems;
• machine learning: pattern recognition, Bayesian networks, kernel methods, generative models, information-theoretic learning, reinforcement learning, relational learning, dynamical models, classification and clustering algorithms, and self-organizing systems;
• applications: medicine, signal processing, control, simulation, robotics, hardware implementations, security, finance and business, data mining, natural language processing, image processing, and computer vision.
In 2020, the XXII Neuroinformatics Conference was held as part of CAICS 2020, the National Congress on Cognitive Research, Artificial Intelligence, and Neuroinformatics, which took place during October 13–16, 2020, in Moscow. More than 100 reports were presented at the Neuroinformatics-2020 Conference. Of these, 51 papers, including two invited papers, were selected; the corresponding articles were prepared and published in this volume.

July 2020
Boris Kryzhanovskiy Witali Dunin-Barkowski Vladimir Red’ko Yury Tiumentsev
Organization
Editorial Board

Boris Kryzhanovsky, Scientific Research Institute for System Analysis of Russian Academy of Sciences
Witali Dunin-Barkowski, Scientific Research Institute for System Analysis of Russian Academy of Sciences
Vladimir Red'ko, Scientific Research Institute for System Analysis of Russian Academy of Sciences
Yury Tiumentsev, Moscow Aviation Institute (National Research University), Russia
Advisory Board

Alexander N. Gorban, Tentative Chair of the International Advisory Board, University of Leicester, Great Britain; Lobachevsky State University of Nizhny Novgorod, Russia
Nikola Kasabov, Professor of Computer Science and Director of KEDRI, Auckland University of Technology, Auckland, New Zealand
Jun Wang, Chair Professor of Computational Intelligence, Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong
Program Committee of the XXII International Conference “Neuroinformatics-2020”

General Chair

Gorban Alexander, University of Leicester, Great Britain; Lobachevsky State University of Nizhny Novgorod, Russia
Co-chairs

Kryzhanovskiy Boris, Scientific Research Institute for System Analysis, Moscow, Russia
Dunin-Barkowski Witali, The Moscow Institute of Physics and Technology (State University), Russia
Shumsky Sergey, The Moscow Institute of Physics and Technology (State University), Russia
Program Committee Members

Abraham Ajith, Machine Intelligence Research Laboratories (MIR Labs), Scientific Network for Innovation and Research Excellence, Washington, USA
Baidyk Tatiana, The National Autonomous University of Mexico, Mexico
Borisyuk Roman, Plymouth University, UK
Burtsev Mikhail, The Moscow Institute of Physics and Technology (State University)
Cangelosi Angelo, Plymouth University, UK
Chizhov Anton, Ioffe Physical Technical Institute, Russian Academy of Sciences, Saint Petersburg
Dolenko Sergey, Skobeltsyn Institute of Nuclear Physics, Lomonosov Moscow State University, Russia
Dosovitskiy Alexey, Albert-Ludwigs-Universität, Freiburg, Germany
Dudkin Alexander, United Institute of Informatics Problems, Minsk, Belarus
Ezhov Alexandr Alexandrovich, State Research Center of Russian Federation “Troitsk Institute for Innovation and Fusion Research”, Moscow
Frolov Alexander, Institute of Higher Nervous Activity and Neurophysiology of RAS, Moscow
Golovko Vladimir, Brest State Technical University, Belarus
Gorban Alexander Nikolaevich, University of Leicester, Great Britain
Hayashi Yoichi, Meiji University, Kawasaki, Japan
Husek Dusan, Institute of Computer Science, Czech Republic
Izhikevich Eugene, Braincorporation, San Diego, USA
Jankowski Stanislaw, Warsaw University of Technology, Poland
Kaganov Yuri, Bauman Moscow State Technical University, Russia
Kazanovich Yakov, Institute of Mathematical Problems of Biology, Russian Academy of Sciences
Kecman Vojislav, Virginia Commonwealth University, USA
Kernbach Serge, Cybertronica Research, Research Center of Advanced Robotics and Environmental Science, Stuttgart, Germany
Koprinkova-Hristova Petia, Institute of Information and Communication Technologies, Bulgaria
Kussul Ernst, The National Autonomous University of Mexico, Mexico
Litinskii Leonid, Scientific Research Institute for System Analysis, Moscow, Russia
Makarenko Nikolay, The Central Astronomical Observatory of the Russian Academy of Sciences at Pulkovo, Saint Petersburg, Russia
Mishulina Olga, National Research Nuclear University (MEPhI), Moscow, Russia
Narynov Sergazy, Alem Research, Almaty, Kazakhstan
Pareja-Flores Cristobal, Complutense University of Madrid, Spain
Prokhorov Danil, Toyota Research Institute of North America, USA
Red’ko Vladimir, Scientific Research Institute for System Analysis, Moscow, Russia
Rutkowski Leszek, Czestochowa University of Technology, Poland
Samarin Anatoly, Kogan Research Institute for Neurocybernetics, Southern Federal University, Rostov-on-Don, Russia
Samsonovich Alexei Vladimirovich, George Mason University, USA
Sandamirskaya Yulia, Institute of Neuroinformatics, UZH/ETHZ, Switzerland
Shumsky Sergey, The Moscow Institute of Physics and Technology (State University), Russia
Sirota Anton, Ludwig Maximilian University of Munich, Germany
Snasel Vaclav, Technical University Ostrava, Czech Republic
Terekhov Serge, JSC “Svyaznoy Logistics”, Moscow, Russia
Tikidji-Hamburyan Ruben, Louisiana State University, USA
Tiumentsev Yury, Moscow Aviation Institute (National Research University), Russia
Trofimov Alexander, National Research Nuclear University (MEPhI), Moscow, Russia
Tsodyks Misha, Weizmann Institute of Science, Rehovot, Israel
Tsoy Yury, Institut Pasteur Korea, Republic of Korea
Ushakov Vadim, National Research Center “Kurchatov Institute”, Moscow, Russia
Vvedensky Viktor, National Research Center “Kurchatov Institute”, Moscow, Russia
Wunsch Donald, Missouri University of Science and Technology
Yakhno Vladimir, The Institute of Applied Physics of the Russian Academy of Sciences, Nizhny Novgorod, Russia
Zhdanov Alexander, Lebedev Institute of Precision Mechanics and Computer Engineering, Russian Academy of Sciences, Moscow, Russia
Contents
Invited Papers

Reverse Engineering the Brain Based on Machine Learning . . . 3
S. A. Shumsky

Who Says Formalized Models are Appropriate for Describing Living Systems? . . . 10
V. G. Yakhno, S. B. Parin, S. A. Polevaya, I. V. Nuidel, and O. V. Shemagina

Cognitive Sciences and Brain-Computer Interface

Mirror Neurons in the Interpretation of Action and Intentions . . . 37
Yuri V. Bushov, Vadim L. Ushakov, Mikhail V. Svetlik, Sergey I. Kartashov, and Vyacheslav A. Orlov

Revealing Differences in Resting States Through Phase Synchronization Analysis. Eyes Open, Eyes Closed in Lighting and Darkness Condition . . . 44
Irina Knyazeva, Boytsova Yulia, Sergey Danko, and Nikolay Makarenko

Assessment of Cortical Travelling Waves Parameters Using Radially Symmetric Solutions to Neural Field Equations with Microstructure . . . 51
Evgenii Burlakov, Vitaly Verkhlyutov, Ivan Malkov, and Vadim Ushakov

The Solution to the Problem of Classifying High-Dimension fMRI Data Based on the Spark Platform . . . 58
Alexander Efitorov, Vladimir Shirokii, Vyacheslav Orlov, Vadim Ushakov, and Sergey Dolenko

A Rehabilitation Device for Paralyzed Disabled People Based on an Eye Tracker and fNIRS . . . 65
Andrey N. Afonin, Rustam G. Asadullayev, Maria A. Sitnikova, and Anatoliy A. Shamrayev

Analytic Model of Mental Rotation . . . 71
Evgeny Meilikov and Rimma Farzetdinova

Topology of the Thesaurus of Russian Adjectives Revealed by Measurements of the Spoken Word Recognition Time . . . 85
Victor Vvedensky and Konstantin Gurtovoy

Adaptive Behavior and Evolutionary Simulation

Model of Self-organizing System of Autonomous Agents . . . 93
Zarema B. Sokhova

Role of Resource Production in Community of People and Robots . . . 101
Vladimir B. Kotov and Zarema B. Sokhova

Modeling of Interaction Between Learning and Evolution at Minimizing of Spin-Glass Energy . . . 112
Vladimir G. Red’ko and Galina A. Beskhlebnova

Symmetry Learning Using Non-traditional Biologically Plausible Learning Method . . . 118
Alexander Lebedev, Vladislav Dorofeev, and Vladimir Shakirov

Providing Situational Awareness in the Control of Unmanned Vehicles . . . 125
Dmitry M. Igonin, Pavel A. Kolganov, and Yury V. Tiumentsev

Neurobiology and Neurobionics

Complexity of Continuous Functions and Novel Technologies for Classification of Multi-channel EEG Records . . . 137
Boris S. Darkhovsky, Alexandra Piryatinska, Yuri A. Dubnov, Alexey Y. Popkov, and Alexander Y. Kaplan

Towards Neuroinformatic Approach for Second-Person Neuroscience . . . 143
Lubov N. Podladchikova, Dmitry G. Shaposhnikov, and Evgeny A. Kozubenko

Is the Reinforcement Learning Theory Well Suited to Fit the Functioning of the Cerebral Cortex-Basal Ganglia System? . . . 149
Irina A. Smirnitskaya

Neural Activity Retaining in Response to Flash Stimulus in a Ring Model of an Orientation Hypercolumn with Recurrent Connections, Synaptic Depression and Slow NMDA Kinetics . . . 157
Vasilii S. Tiselko, Margarita G. Kozeletskaya, and Anton V. Chizhov

Applications of Neural Networks

Deep Neural Networks for Ortophoto-Based Vehicle Localization . . . 167
Alexander Rezanov and Dmitry Yudin

Roof Defect Segmentation on Aerial Images Using Neural Networks . . . 175
Dmitry A. Yudin, Vasily Adeshkin, Alexandr V. Dolzhenko, Alexandr Polyakov, and Andrey E. Naumov

Choice of Hyperparameter Values for Convolutional Neural Networks Based on the Analysis of Intra-network Processes . . . 184
Dmitry M. Igonin, Pavel A. Kolganov, and Yury V. Tiumentsev

Generation an Annotated Dataset of Human Poses for Deep Learning Networks Based on Motion Tracking System . . . 198
Igor Artamonov, Yana Artamonova, Alexander Efitorov, Vladimir Shirokii, and Oleg Vasilyev

Automatic Segmentation of Acute Stroke Lesions Using Convolutional Neural Networks and Histograms of Oriented Gradients . . . 205
Nurlan Mamedov, Sofya Kulikova, Victor Drobakha, Elena Bartuli, and Pavel Ragachev

Learning Embodied Agents with Policy Gradients to Navigate in Realistic Environments . . . 212
Alexey Staroverov, Vladislav Vetlin, Stepan Makarenko, Anton Naumov, and Aleksandr I. Panov

Comparative Efficiency of Prediction of Relativistic Electron Flux in the Near-Earth Space Using Various Machine Learning Methods . . . 222
Irina Myagkova, Vladimir Shirokii, Roman Vladimirov, Oleg Barinov, and Sergey Dolenko

Metal Oxide Gas Sensors Response Processing by Statistical Shape Analysis and Machine Learning Algorithm for Industrial Safety Applications . . . 228
Alexander Efitorov, Matvei Andreev, and Valeriy Krivetskiy

Feature Selection in Neural Network Solution of Inverse Problem Based on Integration of Optical Spectroscopic Methods . . . 234
Igor Isaev, Olga Sarmanova, Sergey Burikov, Tatiana Dolenko, Kirill Laptinskiy, and Sergey Dolenko

The Text Fragment Extraction Module of the Hybrid Intelligent Information System for Analysis of Judicial Practice of Arbitration Courts . . . 242
Maria O. Taran, Georgiy I. Revunkov, and Yuriy E. Gapanyuk

Construction of a Neural Network Semi-empirical Model of Deflection of a Sample from a Composite Material . . . 249
Dmitry Tarkhov, Valeriy Tereshin, Galina Malykhina, Anastasia Gomzina, Ilya Markov, and Pavel Malykh

Combined Neural Network for Assessing the State of Computer Network Elements . . . 256
Igor Saenko, Fadey Skorik, and Igor Kotenko

Operational Visual Control of the Presence of Students in Training Sessions . . . 262
Oleg I. Fedyaev and Nikolay M. Tkachev

Classification of Microscopy Image Stained by Ziehl–Neelsen Method Using Different Architectures of Convolutional Neural Network . . . 269
Inga G. Shelomentseva and Serge V. Chentsov

Using a Sparse Neural Network to Predict Clicks Probabilities in Online Advertising . . . 276
Yuriy S. Fedorenko

Error Analysis for Visual Question Answering . . . 283
Artur Podtikhov, Makhmud Shaban, Alexey K. Kovalev, and Aleksandr I. Panov

Super Intelligence to Solve COVID-19 Problem . . . 293
Vladislav P. Dorofeev, Alexander E. Lebedev, Vladimir V. Shakirov, and Witali L. Dunin-Barkowski

Neural Network Theory, Concepts and Architectures

Time Series Prediction by Reservoir Neural Networks . . . 303
Mikhail S. Tarkov and Ivan A. Chernov

Monitoring Human Cognitive Activity Through Biomedical Signal Analysis . . . 309
Konstantin V. Sidorov, Natalya I. Bodrina, and Natalya N. Filatova

Adversarial Dataset Augmentation Using Reinforcement Learning and 3D Modeling . . . 316
Vladimir V. Kniaz, Vladimir A. Knyaz, Vladimir Mizginov, Ares Papazyan, Nikita Fomin, and Lev Grodzitsky

Data Representation in All-Resistor Systems . . . 330
Vladimir B. Kotov and Galina A. Beskhlebnova

The Neuromorphic Model of the Human Visual System . . . 339
Anton Korsakov and Aleksandr Bakhshiev

Ring of Unidirectionally Synaptically Coupled Neurons with a Relay Nonlinearity . . . 347
Sergey D. Glyzin and Margarita M. Preobrazhenskaia

Find a Needle in a Haystack - Sorting arXiv Articles by Your Criteria . . . 357
Vladimir V. Shakirov, Vladislav P. Dorofeev, Alexander E. Lebedev, and Witali L. Dunin-Barkowski

Quantization of Weights of Trained Neural Network by Correlation Maximization Method . . . 363
Maria Pushkareva and Yakov Karandashev

Solar Plant Intelligent Control System Under Uniform and Non-uniform Insolation . . . 374
Ekaterina A. Engel and Nikita E. Engel

Hopfield Neural Network and Anisotropic Ising Model . . . 381
Dmitry V. Talalaev

Development of the Learning Logic Gate for Optimization of Neural Networks at the Hardware Level . . . 387
Taras Mikhailyuk and Sergey Zhernakov

Approximating Conductance-Based Synapses by Current-Based Synapses . . . 394
Mikhail Kiselev, Alexey Ivanov, and Daniil Ivanov

STDP-Based Classificational Spiking Neural Networks Combining Rate and Temporal Coding . . . 403
Aleksandr Sboev, Danila Vlasov, Alexey Serenko, Roman Rybka, and Ivan Moloshnikov

Solving Equations Describing Processes in a Piecewise Homogeneous Medium on Radial Basis Functions Networks . . . 412
Dmitry A. Stenkin and Vladimir I. Gorbachenko

Decoding Neural Signals with a Compact and Interpretable Convolutional Neural Network . . . 420
Artur Petrosyan, Mikhail Lebedev, and Alexey Ossadtchi

Estimation of the Complexity of the Classification Task Based on the Analysis of Variational Autoencoders . . . 429
Andrey A. Brynza and Maria O. Korlyakova

Author Index . . . 439
Invited Papers
Reverse Engineering the Brain Based on Machine Learning

S. A. Shumsky
P. N. Lebedev Physical Institute, Moscow, Russia
Moscow Institute of Physics and Technology, Moscow, Russia
[email protected]
Abstract. A research program is proposed for developing human-like AGI. The top-down strategy of reverse engineering the brain architecture is complemented with the bottom-up machine learning approach. Narrow AI based on deep neural networks mimics low-level conditional reflexes. General intelligence is supposed to have a higher-level computational architecture of artificial psyche, possessing inner motivations to explore and model a complex environment. Such an architecture may be borrowed from the brain. Namely, by reverse engineering the cortico-striatal system of the mammalian brain we developed a hierarchical reinforcement learning model, capable of learning complex behavior with an arbitrarily large planning horizon. The latter extends simple models of the artificial psyche of animats, like MicroPsi2, providing the next step on the “animat path to AGI”. We argue that research should focus on modeling the architecture of the brain in terms of machine learning.

Keywords: Human-like artificial general intelligence, Cognitive architectures, Machine learning
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 3–9, 2021. https://doi.org/10.1007/978-3-030-60577-3_1

1 Introduction

In recent years, several major brain research programs have been launched around the world: the European Human Brain Project, the American BRAIN Initiative, and the China Brain Project, to name a few. In this paper, we propose an alternative research program: to create an artificial mind with the computational architecture of the brain, based on machine learning. The proposed research program rests on the same “what I can’t create, I don’t understand” paradigm as computational neuroscience and artificial intelligence (AI), but differs from both of these areas in a special statement of the problem. Namely, we propose to shift the focus from the physiology of the brain to its computational architecture, i.e., the logical structure of the algorithms of the mind. At the same time, the incredibly complex algorithms of thinking are supposed to be generated automatically, in the course of learning, by much simpler learning algorithms – “embryos of thinking”. Such an approach – focusing not on the substrate (the brain) but on the algorithms (the mind) – is usually associated with AI. As a result, current AI uses the accumulated body of knowledge about the brain to a minimal extent. We propose the development of an artificial psyche with the computational architecture of the human brain as a way to approach understanding the basic principles of the brain on the one hand, and to create an artificial general intelligence (AGI) on the other.

Indeed, a huge body of empirical evidence has been accumulated in the brain sciences about what happens in the brain and when. At the same time, there is clearly a lack of generalized concepts summarizing these data [1]. Comprehensive models of the psyche are absent, though several approaches, like MicroPsi2 [2], may serve as a good starting point. Nevertheless, we still do not understand how the brain works. On the other hand, as a result of the deep learning revolution of the 2010s, many types of cognitive tasks can now be solved with human-level accuracy (weak AI). However, there is still no generally accepted approach to AGI. Thus, brain scientists lack a working model of the human psyche based on the accumulated data, and AI developers lack ideas on how an artificial psyche can be created. Accordingly, in this paper we urge a return to the idea of joining forces in understanding both natural and artificial intelligence within a single framework [3], namely within the machine learning paradigm.

The paper is organized as follows. After discussing the basic concepts (Sect. 2), we formulate the essence of the proposed research program (Sect. 3). In Sect. 4, we elaborate on the role of the brain sciences in this program. Finally, in Sect. 5, we propose a series of projects for modeling the artificial psyche, which may be of interest both to brain scientists and to AI developers.
2 Basic Concepts and Problem Statement

A recent publication [4] fueled a heated debate in the AI community [5] regarding the definition of the term intelligence, which sits at the crossroads of many sciences – philosophy, psychology, neurophysiology and, of course, AI, whose initial idea was to supplement the historically established approaches to intelligence with new ideas from the then recently emerged computer sciences. In the late 1940s, Norbert Wiener proposed cybernetics as the science of the general principles of control and communication in living and technical systems [6]. The participants of the 1956 Dartmouth seminar, who coined the term AI, raised this bar even higher – to create artificial intelligence [7]. As it turned out, the AI pioneers clearly overestimated the ability of the science of that time to model the “algorithms of the mind.” This led, as we all know, to the collapse of high expectations and to setting a significantly lower bar for narrow (weak) AI. However, since then the situation has changed significantly, both in the field of algorithms and in the brain sciences. Apparently, the time has come to rethink the research program of the cognitive sciences, including AI.

As the discussion made clear, most AI experts consider the ability of agents to learn to solve a wide range of tasks to be the defining property of intelligence [8]. Accordingly, intelligence may be defined as the algorithms of adaptive goal-directed behavior of agents capable of accumulating experience using learning algorithms. This ability to learn to solve new tasks, i.e., to evolve more complex behavior, distinguishes intelligent agents from merely adaptive physical systems, such as a thermostat.
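The thermostat contrast can be made concrete with a toy sketch (an illustration of the definition above, not taken from the paper; all names and numbers here are invented). A thermostat adapts its output but never changes its rule; a learning agent rewrites its own decision rule from accumulated experience:

```python
import random

# A thermostat adapts its output, but its decision rule is fixed forever:
def thermostat(temp, setpoint=20.0):
    return "heat" if temp < setpoint else "off"

# A learning agent, by contrast, changes its own decision rule from experience.
# Here: an epsilon-greedy two-armed bandit learner tracking each action's mean reward.
class BanditLearner:
    def __init__(self, actions):
        self.value = {a: 0.0 for a in actions}   # estimated mean reward per action
        self.count = {a: 0 for a in actions}

    def act(self, eps=0.1):
        if random.random() < eps:                      # occasionally explore
            return random.choice(list(self.value))
        return max(self.value, key=self.value.get)     # otherwise exploit

    def learn(self, action, reward):
        self.count[action] += 1                        # incremental mean update
        self.value[action] += (reward - self.value[action]) / self.count[action]

random.seed(0)
agent = BanditLearner(["a", "b"])
for _ in range(500):
    action = agent.act()
    reward = 1.0 if action == "b" else 0.2             # hidden environment: "b" pays more
    agent.learn(action, reward)
assert agent.value["b"] > agent.value["a"]             # the decision rule itself has changed
```

After training, the agent prefers action "b", although nothing in its code singled out "b" in advance; the thermostat, however long it runs, will never behave differently.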
3 Machine Learning Research Program

We defined intelligence as a special class of algorithms that can increase their complexity when interacting with the environment, improving their ability to achieve their goals in line with changing conditions. Such algorithms are the subject of machine learning. Therefore, machine learning is the most appropriate paradigm for the study of intelligence and mind. In our opinion, machine learning plays the same role for understanding the mind as the theory of evolution plays for understanding life. We consider machine learning a fundamental science about that part of nature that has learned to learn – in contrast to physics, which studies inert matter, incapable of learning. In this understanding, the theory of evolution is the theory of how living matter learns, explaining the appearance of all life forms in all their amazing diversity. Thus, the well-known thesis of F. Dobzhansky, “Nothing in biology makes sense, except in the light of the theory of evolution” [9], can be rephrased as follows: “Nothing in the cognitive sciences makes sense, except in the light of machine learning”. Accordingly, it is machine learning that is able to combine the efforts of different sciences in understanding the mind, both human and artificial, by providing a unifying scientific paradigm – a mathematical language convenient for all of them to communicate in.

Neurophysiology describes the brain as a thinking machine and seeks to explain the operation of this machine. In “brain physics”, as elsewhere in physics, explanation means finding causal relationships. Thus, the patriarch of neurophysiology I. M. Sechenov explained thinking as a complex hierarchy of reflexes: “By certain external influences, series of associated thoughts are invoked sequentially, and the end of the reflex follows logically from the strongest” [10]. Psychology, on the contrary, describes thinking teleologically, based on the internal goals of the organism. Our mind seeks, indirectly, through interaction with the outside world, to satisfy our innate needs and acquired values. That is, thinking is determined mainly not by external circumstances but by our internal needs, both biological and social. This position is best illustrated by the famous metaphor of L. S. Vygotsky: “Clouds of thoughts, driven by the winds of motives, spill the rain of words” [11]. Logic focuses on the laws of conscious symbolic thinking (the “clouds of thoughts”), psycholinguistics on how they “spill the rain of words”, while psychology is more interested in the subconscious “winds of motives”. In any case, the focus is not on the physics of the brain as a device, but on models of the mind – often only in descriptive verbal form, rather than as working models with clear algorithms of thinking.

The eternal question arises: how are causal relationships and goal-directed behavior, i.e., the physical and the mental, related? Of course, determinism and teleology do not contradict each other. These are two ways of understanding the world – as the trajectories of dynamical systems or as their attractors. However, a deterministic description of a complex system does not tell us anything about its goals, and knowing the goals does not help to reveal the algorithms for achieving them.
The theory of machine learning provides the missing link between the principles of causality and goal-directed behavior by changing the setting of the problem. Physics takes a system as given and asks: how does this system work? Machine learning, by contrast, is interested in how to create a system with desired properties. The attention of researchers here focuses not on the “morphology” but on the “embryology” of complex systems. Accordingly, machine learning is interested mainly not in the “fast” algorithms of a ready-made system, but in the “slow” algorithms of its self-assembly – the gradual increase of the system’s complexity in its interaction with the world. And these latter (the learning algorithms) are formulated precisely in terms of optimal control, i.e., the achievement of some goals. Thus, in machine learning, goals are embedded in completely deterministic learning algorithms, allowing us to create arbitrarily complex systems with predetermined goal-directed behavior, thereby reconciling causal and teleological descriptions.

So, machine learning offers us the following program for researching and creating complex learning systems: identifying or developing learning algorithms (a causal description) that solve optimization problems (achieve goals) by increasing the complexity of the system (accumulating knowledge). Physics, in which the concept of learning is absent, is not able to explain the appearance of systems as complex as the mind. For this task, we have to turn to machine learning. Accordingly, the structure and working principles of the brain can also be understood only in the light of machine learning – not as a ready-made system, but as the result of a long learning process. Hence the emphasis on the learning algorithms of the brain. Incidentally, Anokhin’s theory of functional systems offers, in essence, a reinforcement learning algorithm as the system-forming principle of brain organization [12].
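The claim that goals can be embedded in a fully deterministic update rule is exactly what standard reinforcement learning demonstrates. As a minimal textbook sketch (not taken from this paper; the environment and all parameters are invented for illustration), here is tabular Q-learning on a five-state chain: the goal enters only through the reward signal, yet the deterministic update produces goal-directed behavior.

```python
import random

# States 0..4 on a chain; reaching state 4 (the goal) yields reward 1.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)              # step left / step right
ALPHA, GAMMA = 0.5, 0.9         # learning rate, discount factor

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(1)
for _ in range(300):                              # training episodes
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS)                # explore with a random policy
        s2 = min(max(s + a, 0), N_STATES - 1)     # environment dynamics (walls at the ends)
        r = 1.0 if s2 == GOAL else 0.0            # the goal enters ONLY through the reward
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])   # deterministic update rule
        s = s2

# The greedy policy extracted from Q moves toward the goal from every state:
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
assert all(a == +1 for a in policy.values())
```

Nothing in the update rule mentions the goal state explicitly; the causal description (the update) and the teleological one (reaching state 4) coincide in the learned behavior.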
Such algorithms are most relevant in modern machine learning related to robots, agents, and AGI. In recent years, the problem of machine vision has been practically solved, so that fully autonomous cars have already appeared on the roads. Deep learning algorithms form a hierarchy of increasingly abstract features that recognize the situation, at the same time helping us to understand the general principles of sensory intelligence. Following the ability to perceive the world, machine learning now turns to managing behavior and mastering the skills of abstract thinking [13]. Effective demand for an artificial psyche for robots and AGI is on the rise. In our understanding, it is brain science that can help overcome this important technological barrier. One only needs to look at the brain in terms of machine learning – to abstract away from the brain’s substrate and focus on its learning algorithms. It is especially important to reverse engineer the system-level computational architecture of the brain, paving the way to the artificial psyche of robots.
Reverse Engineering the Brain Based on Machine Learning
4 Reverse-Engineering the Brain

Such a problem statement is not new to neurophysiology. A similar research program was proposed, for example, by David Marr in the 1970s, who distinguished three levels of description of complex systems:

• computational level – the system architecture, the purpose and interaction of its subsystems;
• algorithmic level – data structures and algorithms;
• implementation level – the physical implementation of these algorithms.

Only the lowest level is directly accessible to neurophysiologists – how specific neurons or neural ensembles interact with each other. The majority of studies remain at this descriptive level. Less often it is possible to unravel the algorithms of some subsystems of the brain and understand the logic of their design. For instance, studies of the primary visual cortex made it possible to reverse engineer the algorithms of local cortical modules [14]. However, a true understanding of the principles of the brain is possible only with access to its system-level architecture, namely the purpose and interaction of the main brain subsystems – the cortex, basal ganglia, thalamus, cerebellum, stem structures, etc. – together with an understanding of the algorithms of their work and learning. Reverse engineering of the computational architecture of the brain is, from our point of view, the primary task of neurophysiology in the framework of the proposed research program. We agree with Marr, who noted that “although the top level is most neglected, it is also the most important” [15]. As an example of this approach, we present the author’s version of reverse engineering of the cortico-striatal system [16], as a hierarchy of modules learning to jointly implement predictive behavior control with reinforcing signals coming from the dopamine system of the midbrain [17–19]. The architecture of such a large brain subsystem can serve as a prototype of the simplest artificial psyche.
The learning algorithms in this case comprise:

• a hierarchy of self-organizing maps of cortical modules, each predicting the activity of lower-level cortical modules, with primary sensory-motor modules at the lowest level;
• comparison of those predictions with the actual signals from the lower levels;
• an assessment of the usefulness of various patterns of cortical activity and the selection of the winning pattern by the basal ganglia, implementing reinforcement learning.

Based on this reconstruction, a model of the artificial psyche, ADAM (Adaptive Deep Autonomous Machine), “in the image and likeness” of the human is being developed in the MIPT Cognitive Architecture Lab. We emphasize that it was reverse engineering of the cortico-striatal system that helped us propose a learning scheme for hierarchical behavior planning, a problem that has so far remained unresolved [20]. In the spirit of this approach, we offer a program to develop increasingly realistic models of the psyche and mind that reproduce animal and human behavior in real experiments, thereby making sure that we really understand it.
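The three ingredients above can be caricatured in a few lines (our illustration only, not the authors' ADAM code; all names and numbers are invented): a higher module learns to predict lower-level activity from the prediction error, and a basal-ganglia-like component selects the winning pattern by learned value.

```python
# Minimal sketch of: (1) a higher module predicting lower-level activity,
# (2) a prediction-error signal driving learning, and (3) value-based
# selection of the winning pattern by a basal-ganglia-like component.

def learn_step(w, x_prev, x_now, lr=0.2):
    err = x_now - w * x_prev             # top-down prediction vs. actual signal
    return w + lr * err * x_prev, err    # adjust the weight to shrink the error

# Lower-level activity alternates sign each tick (x_now = -x_prev); the
# higher module must discover that rule, i.e. its weight w should reach -1.
w, x = 0.0, 1.0
for _ in range(100):
    x_next = -x
    w, err = learn_step(w, x, x_next)
    x = x_next

# "Basal ganglia": values learned by reinforcement pick the winning pattern.
values = {"reach": 0.2, "wait": 0.7, "explore": 0.1}
winner = max(values, key=values.get)
```

The same error-driven loop, stacked across levels and coupled to the value-based selector, is the schematic content of the bulleted list above.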
5 Modeling the Mind

We have shown that an understanding of the computational architecture of the brain is genuinely useful for developing models of the artificial psyche. And with the advent of such models, we are now able to test various theoretical ideas about how brain structure determines the behavior of animals and humans. Planning and conducting experiments with the artificial psyche by combined teams of cognitive scientists and AI developers is the essence of the proposed research program. The goal is to reproduce, with the gradual complication of the artificial psyche, the evolution of the cognitive abilities of mammals from rodents to primates and on to humans, following “the animat path to AGI” [21]. We start, naturally, with the simplest task of reproducing the basic behavioral patterns of an “artificial mouse”: finding food in labyrinths, etc. We then proceed to the study of the game behavior of different agents, say predators and prey (“cat and mouse”), as well as the cooperative behavior of gregarious animals that can exchange signals – the language of postures and other ways of demonstrating their intentions. As a result, agents develop the ability to plan their behavior taking into account the intentions of other agents, an ability commonly called the “theory of mind”. Moreover, if we really want to understand the human mind and create robots that can fit into human civilization, we should develop a theory of machine education – namely, how can robots learn human values and human culture? Here we follow Vygotsky, who stressed the decisive role of the upbringing process: instilling the concepts of collective human intelligence into the individual mind. In this connection, we note a modern analogue of the “three laws of robotics” proposed by Stuart Russell as the basis for a future “AI ethics” [20].
6 Conclusions

In conclusion, we summarize the logic of the proposed research program. We proceed from the concept that “to understand is to reproduce.” Therefore, the only way to understand the mind is to develop its working model. To do this, the brain and cognitive sciences should participate in modern AI research as part of a joint research program. In the last decade, AI has undergone a paradigm shift: it was realized that machine learning is the only way to build intelligent machines of unlimited complexity. Accordingly, the proposed research program should be based on the machine learning methodology. The task of the cognitive sciences in such a joint research program is reverse engineering of the brain architecture and the construction of models of the artificial psyche. The proposed joint research program aims to provide:
• AI developers with a clear path to AGI;
• neurophysiologists with an understanding of the principles and algorithms of the brain;
• psychologists with an understanding of the principles of the organization of the psyche and its formation in the process of individual development.
References

1. Anokhin, K.V.: The last great frontier of life sciences. Economic Strategies 12(11), 56–63 (2010)
2. Bach, J.: Modeling motivation in MicroPsi 2. In: International Conference on Artificial General Intelligence 2015, pp. 3–13. Springer, Cham (2015)
3. Newell, A.: Unified Theories of Cognition. Harvard University Press (1994)
4. Wang, P.: On defining artificial intelligence. J. Artif. Gen. Intell. 10(2), 1–37 (2019)
5. Monett, D., et al. (eds.): On defining artificial intelligence (special issue). J. Artif. Gen. Intell. 11(2), 1–99 (2020)
6. Wiener, N.: Cybernetics: or Control and Communication in the Animal and the Machine. Technology Press (1948)
7. McCarthy, J., et al.: A proposal for the Dartmouth summer research project on artificial intelligence, 31 August 1955
8. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall (2009)
9. Dobzhansky, T.: Nothing in biology makes sense except in the light of evolution. Am. Biol. Teach. 35(3), 125–129 (1973)
10. Sechenov, I.M.: Reflexes of the Brain. MIT Press (1965)
11. Vygotsky, L.S.: Thought and Language. MIT Press (2012)
12. Anokhin, P.K.: Essays on the Physiology of Functional Systems. Medicine (1975)
13. Bengio, Y.: From system 1 to system 2. In: NeurIPS (2019)
14. Miikkulainen, R., et al.: Computational Maps in the Visual Cortex. Springer, New York (2006)
15. Marr, D., Poggio, T.: From understanding computation to understanding neural circuitry (1976)
16. Haber, S.N.: Corticostriatal circuitry. Dialogues Clin. Neurosci. 18(1), 7–21 (2016)
17. Shumsky, S.A.: Reengineering of brain architecture: the role and interaction of the main subsystems. In: Russian Scientific Conference NEUROINFORMATICS 2015. Lectures on Neuroinformatics, pp. 13–45 (2015)
18. Shumsky, S.A.: Deep structural learning: a new look at reinforced learning. In: XX Russian Scientific Conference NEUROINFORMATICS 2018. Lectures on Neuroinformatics, pp. 11–43 (2018)
19. Shumsky, S.A.: Machine Intelligence. Essays on the Theory of Machine Learning and Artificial Intelligence. RIOR, Moscow (2019). ISBN 978-5-369-02011-1
20. Russell, S.: Human Compatible: Artificial Intelligence and the Problem of Control. Viking (2019)
21. Strannegård, C., et al.: Learning and decision-making in artificial animals. J. Artif. Gen. Intell. 9(1), 55–82 (2018)
22. Russell, S., et al.: Ethics of artificial intelligence. Nature 521(7553), 415–416 (2015)
Who Says Formalized Models are Appropriate for Describing Living Systems?

V. G. Yakhno 1,2, S. B. Parin 1, S. A. Polevaya 1,3, I. V. Nuidel 1,2, and O. V. Shemagina 1,2

1 N. I. Lobachevsky State University of Nizhny Novgorod (National Research University), Nizhny Novgorod, Russia; [email protected], [email protected]
2 Federal Research Center Institute of Applied Physics of the Russian Academy of Sciences (IAP RAS), Nizhny Novgorod, Russia; [email protected]
3 Federal State Budgetary Educational Institution of Higher Education “Privolzhsky Research Medical University” of the Ministry of Health of the Russian Federation, Nizhny Novgorod 603950, Russia
Abstract. We consider mathematical models of subsystems whose joint functioning allows analyzing a wide range of reactions inherent in living systems. One of these subsystems describes the possible states of a basic control recognition module that makes decisions based on the processed sensor signals. If the errors in its functioning exceed a specified threshold, a stress response trigger signal is generated. The other subsystem describes the mechanisms of formation and the different levels of development of the three stages of the stress response, alternative ways of getting out of stress, and the control of sensory signal perception thresholds in the first subsystem. For this purpose, additional variables corresponding to the experimentally recorded data and to knowledge about the “fast” and “slow” stages are used in the model subsystems. The performed research demonstrates the validity of the chosen model architecture and the possibility of using the results of its analysis as an adequate “language” of communication for researchers of living systems. We believe that such formalized models are important for understanding the meanings and consequences of unconscious perception through “images”, including sensation channels. They allowed us to formalize the description of a number of processes that were earlier interpreted ambiguously. Options for comparing the dynamic modes of the integral system with known experimental data, and their interpretation by an interested audience, are discussed.

Keywords: Stress function modeling · Dynamic recognition modes · Experimental data structuring
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 10–33, 2021. https://doi.org/10.1007/978-3-030-60577-3_2

1 Introduction

In living systems, the semantic understanding of external influences or internal states is usually realized through the sensory (emotional) sensation of “image representations” of such signals (see, for example, [1–5]). This occurs unconsciously, especially if the sensory sensation is far from “disturbing”. If the system is motivated to become aware of the results of its functioning, it uses various types of “languages” of description (everyday, specialized, and model-formalized descriptions). It is well known that a number of concepts whose meaning raises no doubt in everyday or specialized languages are difficult to translate into the language of formalized (engineering) models (see, for example, [6–28]). Why? Perhaps because of improperly selected models and, consequently, an inadequate language? The lack of agreement on the meaning of concepts such as “mind”, “consciousness”, “intuition”, “cognitive filters”, and “specific” and “non-specific” reactions of living systems inhibits understanding of the mechanisms implemented in living systems and motivates the development of new approaches to their formalized description. Even a physical methodology does not eliminate subjective preferences in the interpretation of recorded data (for example, “we do not like the data, so we omit them”). For mutual understanding between interested researchers, it is necessary to agree on the “description languages” used, by announcing both the languages themselves and the “image representations” that correspond to them by default. The title of this article draws attention to this very feature.

The goal of this work is a schematic consideration of an integral system consisting of two subsystems, which demonstrates the possibility of implementing a wide range of modes analogous to the states and reactions of living systems. The main motivation is the need for further development of the concepts of the capabilities and functions of living systems under stressful conditions. One of the subsystems describes possible states of the basic control recognition module, which makes decisions based on algorithms adapted to the type of the processed sensor signals.
When errors in the subsystem’s operation exceed a specified threshold, a signal triggering the stress response is generated. The second subsystem describes the mechanisms of formation and the different levels of development of the three stages of the stress response, alternative ways of getting out of stress, and the control of sensory signal perception thresholds in the first subsystem. Such hierarchically organized models exhibit a large variety of possible dynamic modes. By comparing them with experimental data, an interested researcher can choose the interpretations and the level of description of living systems which, in their view, best corresponds to the currently accepted scientific approach. However, there are a lot of data about living systems that are rejected by the “scientific approach”. The integral system and its functional modes of operation considered here allow us to demonstrate the existence of schematic descriptions that may seem to contradict generally accepted “scientific points of view”.
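The two-subsystem coupling just described can be sketched in a toy loop (entirely our own construction; the stage names, thresholds and feedback factors are invented for illustration): recognition errors above a perception threshold trigger the stress subsystem, whose current stage in turn feeds back on that threshold.

```python
def stress_step(stage, triggered):
    """Second subsystem: advance through the stress stages, or recover."""
    order = ["calm", "alarm", "resistance", "exhaustion"]
    i = order.index(stage)
    if triggered:
        return order[min(i + 1, 3)]       # develop the stress response further
    return order[max(i - 1, 0)]           # getting out of stress as errors subside

def perception_threshold(stage, H0=0.5):
    """Feedback on the first subsystem: stress modulates the sensory threshold."""
    return {"calm": H0, "alarm": 0.8 * H0,
            "resistance": 0.6 * H0, "exhaustion": 1.2 * H0}[stage]

stage, trace = "calm", []
errors = [0.2, 0.7, 0.9, 0.9, 0.3, 0.1]   # recognition errors |I5| over time
for e in errors:
    H = perception_threshold(stage)       # threshold set by the current stage
    stage = stress_step(stage, triggered=(e > H))
    trace.append(stage)
```

Running this, a burst of large errors drives the system through alarm, resistance and exhaustion, and the stages unwind once the errors drop, which is the qualitative behavior the integral model is built to reproduce.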
2 Subsystem Recognizing Sensor Signals

Biological systems can be modeled as interacting hierarchies of recognition systems (modules) for diverse streams of sensory signals. Naturally, the desire to describe every detail and capability of real biological systems is not an optimal approach. It is important to choose the most universal variables (words in the language of description or construction) that allow one to understand the meaning of the operations performed at different levels of description of living beings. For example, the introduced variables must allow determining and describing the modes of functioning of the
“mind” of the model in question; demonstrating the processes of manifestation of “consciousness”, “intuition”, and “cognitive filters”; and distinguishing between “specific” and “non-specific” reactions. The evolution of the levels and types of “knowledge” and “data” of the considered model systems should preferably be comparable with the main characteristics of the age-related stages of development whose descriptions are statistically most adequate for people in the conditions of their existence.

2.1 Variants of Architectures of the First Subsystem and Its Reaction
A number of architecture options corresponding to the experimental data have been proposed by many researchers (see, for example, [6–28]). The conceptual approaches of P. Anokhin’s theory of functional systems and variants of the corresponding models were considered in [6–12]. Versions of the processes associated with the comprehension of sensory signals, the formation of system goals based on past experience, and modes of increasing accuracy in training and decision-making procedures were addressed in [12–20]. It is worth mentioning the works [21–24], which demonstrate significant advances in the software implementation of cognitive architectures for a wide range of applications. Studies of the goal-oriented behavior of neural network agents choosing various strategies for population development, and analyses of the influence of the learning levels achieved by individual agents on the rate of reproduction of generations in the population, were considered, for example, in [25–27]. All these systems can use hierarchical architectures to achieve their goals more effectively. However, for the goals set in this paper – joint functioning of the first subsystem with the stress response module (the second subsystem) – it proved sufficient and more convenient for us to rely on the results of the functional models [6–12]. The use of such subsystems allows highlighting the main variables (words of the language) that describe the state of the recognizing module as a whole, without going deep into details, which often hinder understanding. We have chosen the Basic Recognizing Module (BRM) [9–12], which permits describing the main dynamic modes of perception and decision making by living systems. The scheme of such a model system is shown in Fig. 1. The implementation of such a module makes it possible to tune the available algorithms to the type of processed signals from the sensors.
This universal model scheme contains the elements that are necessary and sufficient for the formation of the “language” of description of the known pictures of the World. These elements, as discussed in [5], are connected in a sign–meaning cognitive unity of five parameters: the meaning; a specific way of specifying this meaning; the corresponding sign (word); an indication of the subject area of the World to which the sign and meaning belong; and the function of this system (a cogneme), i.e., a general assessment of its relevance or irrelevance for recreating the image of the world. Comparison of the results of the works based on the “logical-engineering” [6–12] and “image” [5] approaches allows expanding this “image” (description) of semantic elements important for modeling procedures:
Fig. 1. Main elements of the first subsystem (BRM) which allow describing a wide range of modes when processing sensor signals and similar reactions of living systems. The functional operations performed by individual elements are shown in Table 1.
1) MEANING – the target vector (the meaning of functioning) and the description of the characteristic features of the studied signals (images) are usually set by external circumstances; the cognitive filters, the error vector, and the decision made on the expected signal (“YES” – “SETUP” – “NO”) are provided by the a priori data and algorithms of the researcher-developer, as well as by the results obtained over the previous period of functioning of the system under construction;
2) METHOD – the sets of processing and decision-making algorithms representing the “knowledge” of the system, developed over the previous period of functioning of both the researcher-developer and the system being constructed;
3) SIGN – the code or semantic description formed by the researchers-developers of the system;
4) AREA – the subject areas of operational activity and the precedent base of signals for training; the “providence” of the system is set by its researchers-developers or by external circumstances;
5) FUNCTION – the functioning modes (unconscious, conscious, and intuitive processes), provided with data and algorithms developed over the previous period of the system’s functioning.

Table 1 provides a further, more detailed description of a set of variables sufficient for the “logical-engineering” modeling of such a physical system [6–12] and a consideration of the necessary set of functional operations known for living systems.
Table 1. The main functional operations performed by the elements of the Basic Recognizing Module (BRM, the first model subsystem).

CODING:
– Knowledge variable 1 (A1): coding algorithms used to convert input signals into a code description.
– Data variable 1 (I1): input sensor signals – images (number tables).
– Data variable 2 (I2): code description – feature vectors.
– Data variable 3 (I3): filter masks – tables or vectors of weight coefficients that change the input signals or the code description so as to increase the accuracy (efficiency) of the decisions made; they are formed on the basis of the system’s previous operating experience for each of the algorithms used.

RECOVERY (IMITATION of the input signal):
– Knowledge variable 2 (A2): algorithms for generating versions of input signals from code descriptions.
– Data variable 4 (I4): the image imitating the input sensor signal – the system’s internal idea of the input signal.
The analysis of experimental data allows us to postulate that the implementation of precisely these functional operations distinguishes living systems from non-living ones.

ERROR (RESIDUAL DIFFERENCE):
– Knowledge variable 3 (A3): algorithms for calculating the measure of difference (error, or residual difference) between a given level on the path of converting the input signal into a code description and the corresponding level for the expected signal on the path of converting the code description into an image that imitates the input sensor signal.
– Data variable 5 (I5): the error vector between the input image and the image imitating the input sensor signal.

THE CHOICE of a VIRTUAL IMAGE and DECISION MAKING:
– Knowledge variable 4 (A4): algorithms controlling the sequence of actions for registering an input signal, coding it, calculating differences, and performing tuning, decision-making and action-planning procedures at different stages; in effect, these algorithms constitute the OPERATING SYSTEM controlling the module, which organizes the cycles of operation: the STAGE of TRAINING by precedents of input signals – the RECOGNITION STAGE – the STAGE of ANALYSING the ERRORS recorded by the module at the training and recognition stages – etc.
– Knowledge variable 5 (A5): algorithms for making decisions based on the I5 value.
– Data variable 6 (I6): one of the types of decision: a) I6 < H1 – “YES”, consistent with the expected “image”; b) H1 < I6 < H2 – “continuation of the search for an adequate solution”, continued adaptive tuning; c) I6 > H2 – “NO”, does not match the expected “image”.

DATA about PAST and EXPECTED (PLANNED) MODES of the MODULE (this element implements modes similar to the basic functional operations of EPISODIC MEMORY in living systems):
– Knowledge variable 6 (A6): algorithms indexing all the algorithms used by the system in decision-making procedures on the basis of the input signal.
– Knowledge variable 7 (A7): algorithms that compose the sequence of index descriptions for all actions of the module at a fixed (n-th) stage of work with a specific input sensor signal (a semantic description of the operations performed by the module) and write these data (I7) to memory if the error vector (I5) exceeds the threshold formed by the module on the basis of its past experience.
– Knowledge variable 8 (A8): algorithms that evaluate and decide on the formation of the signal Sai(t) and trigger a particular type of emotional reaction of the integral system if the error vector (I5) exceeds a threshold defined by the module.
– Knowledge variable 9 (A9): algorithms restoring (using A4, A5 and other algorithms subordinate to them) any previously performed signal processing procedures on the basis of the available index descriptions.
– Knowledge variable 10 (A10): algorithms controlling the acceptable level of errors accumulated by the module; they provide analysis and adaptation of the existing algorithms, as well as the search for new possibilities for correcting errors detected at the training and recognition stages; the solutions found are recorded in the memory of semantic descriptions (I7) and marked as probable candidates for execution in the planned actions of the system.
– Data variable 7 (I7): index data on the operations performed by the module at the registered stages, allowing all processing and decision-making operations to be reproduced with the available algorithms at the error-analysis stage, even in the absence of an input signal.
– Data variable 8 (I8): the signal Sai(t) transmitted to the second subsystem to trigger the emotional reaction of the integral system that the first subsystem, on the basis of its past experience, considers most appropriate in the current situation.

DATABASE and BASE of FILTERING MASKS: provides recording, storage, search and retrieval of variants of the data variables I1–I8.

BASE for MODELS and signal PROCESSING ALGORITHMS: provides recording, storage, search and retrieval of variants of the algorithms A1–A5.

BASE for COGNITIVE MODELS and control ALGORITHMS: provides recording, storage, search and retrieval of variants of the algorithms A6–A10.
It should be noted that in the chosen architecture of the first subsystem (Fig. 1), the element RECOVERY (IMITATION of the input signal) plays a decisive role in describing the dynamic recognition processes of interest to us. The remaining major elements are modified to implement and maintain P.K. Anokhin’s basic idea [6] of “advanced reflection”, introduced for an adequate description of living systems. Using the basic recognizing module, one can construct both image-response modes and a model-logical description of the behavior modes of living systems. Such a model module demonstrates universal properties and can be used to describe signal processing modes at different levels of hierarchical control in an integral system (for example, the “I” and “SELF” modules are the upper level; the underlying ones are
specialized modules, etc.), with any input signals and at different time scales of the analyzed signals. At the first step of studying the functioning modes of a model module, it is usually assumed by default that its algorithms work stably (i.e., the energy supply is unchanged during the process under study) and that external impact is expressed in the changes made by the developer-researcher. However, in a more realistic analysis, one should take into account the dependence of the parameters of the module’s algorithms on the levels of energy supply, emotional states, the adequacy of the developed algorithms to the specific features of the subject (functioning) area, the target functions (plans), and also the types of interaction of the subsystem with the environment. With these features taken into account, it is convenient to consider the main patterns in the functional modes of the basic control recognition module using “resource diagrams” (Fig. 2, Fig. 3), which can be considered an additional level of the integral description of perception modes in such a system. Resource diagrams are built on the basis of experimental statistical verification of processing and decision-making algorithms in the recognition of sensory signals. The statistics of such studies show that in the case of unconscious perception, when the signal is close to the expected one (at the decision threshold d = |I5| < H1, see Fig. 2a), no more than Runcon resources are required (time, allocated energy, the number of tested algorithms, and some other parameters).
Fig. 2. A) Resource diagram demonstrating the levels of resources available for different modes of signal perception; d0frr < d < d0far. B) Experimental dependences of the error rates of the first kind, FAR(d), and the second kind, FRR(d), on the decision threshold d in the case EER = 0: in the range d0frr < d < d0far the system demonstrates 100% recognition accuracy. C) Experimental dependences of the error rates of the first and second kind on the decision threshold in the case FAR(deer) = FRR(deer) = EER(deer) > 0.
When filtering a signal far from the expected one (with d = |I5| > H2), only a low level of resources Rfiltr ≈ Runcon is likewise required. The point is that the values of H1 and H2 are determined from an analysis of the dependences of the error rates of the first kind, FAR(d) (False Acceptance Rate), and the second kind, FRR(d) (False Rejection Rate): H1 ≤ d0far and H2 ≥ d0frr (see Fig. 2b). By contrast, switching on the mode of conscious perception (for H1 < d = |I5| < H2, see Fig. 2c) requires numerous tunings and checks, which leads to a significant consumption of resources. Therefore, the statistically
determined maximum level of resources Rcon allocated for this purpose will be ≫ Runcon ≈ Rfiltr. Thus, if, in the tuning (conscious) mode, the system finds which of the previously known algorithms achieves the condition d = |I5| < H1 at R < Rcon, then a “specific reaction” of error elimination is realized. If no such algorithm can be found and the expended resource R exceeds Rcon, the first subsystem is forced to start a “non-specific reaction”, activating the second subsystem. The mission of a “non-specific reaction” is to create conditions under which a version of an algorithm able to eliminate the registered error can be found. In this case, the vector of triggering signals Sai(t) for the second subsystem is formed on the basis of the past experience of the first subsystem. The variety of such signals corresponds to established stereotypes, i.e., those familiar sequences of states or reactions that the first subsystem experienced and remembered at previous stages of its life. One set of possible stereotypic conditions was proposed by S. Grof on the basis of his extensive experience with patients [3, 4]. Four conditions were identified, which he called condensed experience systems (CES) and which reflect characteristic combinations of emotions, sensations and images associated with the successive stages of childbirth and the experiences of the fetus and newborn in the perinatal period. In a person’s further life experience, all events occurring to them are compared with the corresponding CES.
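Returning to the thresholds of Fig. 2b, the way H1 and H2 can be read off the empirical FAR(d) and FRR(d) curves can be sketched with synthetic distance scores (our example; all numbers are invented):

```python
# Synthetic distance scores for matching (genuine) and non-matching (impostor)
# signal pairs; a signal is accepted when its distance d' falls below d.
genuine  = [0.10, 0.20, 0.25, 0.30, 0.35]
impostor = [0.70, 0.80, 0.85, 0.90, 0.95]

def far(d, impostor):   # False Acceptance Rate: impostor distances below d
    return sum(x < d for x in impostor) / len(impostor)

def frr(d, genuine):    # False Rejection Rate: genuine distances above d
    return sum(x > d for x in genuine) / len(genuine)

d0frr = max(genuine)    # smallest threshold at which FRR(d) reaches zero
d0far = min(impostor)   # largest threshold at which FAR(d) is still zero
# In the zero-error range d0frr < d < d0far the text picks H1 <= d0far and
# H2 >= d0frr; here we take the boundary values themselves.
H1, H2 = d0frr, d0far
```

When the two score populations overlap (the EER > 0 case of Fig. 2c), no zero-error range exists, and signals landing between H1 and H2 are exactly those requiring the resource-expensive conscious tuning mode.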
A simplified interpretation of the CESs and their notation in our model, if we proceed within the framework of this concept, is as follows: a) the "state of inspiration for fulfilling current plans" is denoted by Sa1(t); b) the "state and feeling of hopelessness" by Sa2(t); c) "overcoming a hopeless state, difficult search for solutions, implementation of selected solutions" by Sa3(t); and d) the "implementation of the solution and achievement of the intended results" by Sa4(t).

A comparison of the operating modes of a model recognizing system (Fig. 1, Table 2) with experimental data allows us to present, in natural terms, the characteristic features that correspond to the main dynamic modes of signal processing or conversion. Naturally, the dynamic modes of transition between the various possible states or reactions of the system under study are determined by energy recharge, information recharge and control signals from the external environment.

Table 2. The main modes of functioning in the perception of the first model subsystem.

UNCONSCIOUS perception of sensory signals: Unconscious perception is based on a great deal of learning experience and the correct operation of the system, when the recorded signals are recognized with a small error (for d = |I5| < H1). A low level of resources, Rfiltr ≈ Runcon, is required.
Who Says Formalized Models are Appropriate for Describing Living Systems?
CONSCIOUS perception of sensory signals: "Awareness – consciousness" of any sensory signal is always work with images imitating the input sensory signal (including the process of its construction), i.e. with the system's internal idea of the input signal [6–8]. The elementary process of "awareness – comprehension" of the image of the input sensory signal is aimed at optimizing (tuning cycles) and improving the accuracy of the recognizing system, associated with the selection of an adequate coding algorithm, features and filter mask for the information signal (image). Conscious signal recognition requires a higher level of resources: Rcon > (or ≫) Runcon ≈ Rfiltr.

CONTROLLING COGNITIVE FILTERS of the recognizing system: Cognitive filters are used to improve the accuracy of decisions, as well as to save the energy resources of the system. The effectiveness of their work is checked against the past experience of the system's functioning in a specific area of its operation for each type of signal. As a result, stereotypes about "dangerous" signals are formed, on which it is undesirable to spend "valuable" resources.

"SPECIFIC" reactions of the INTEGRAL SYSTEM: The integral system is confident (i.e. the current errors made during its operation are less than a certain threshold) in its ability to control the resources it needs to reduce the errors found in the area of its functioning. The system uses only adaptive tuning modes for the functioning algorithms already available in the first subsystem.

"NON-SPECIFIC" reactions of the INTEGRAL SYSTEM: The integral system is aware that its available resources (information or energy) lead to large functioning errors in the area of its operational activity. The system is forced to start a control process of the same (non-specific) type to find the resources it needs to overcome the "difficult" situation, consisting of three phases: "Anxiety (Alarm)", "Resistance" and "Exhaustion".

INTUITIVE sensory perception: Intuitive perception is characterized by a correctly made decision in the absence of experience with this type of signal; this distinguishes it from unconscious perception. At the first stage, when H1 < d = |I5| < H2, the system spends its resource searching for the desired algorithm and, having spent the allocated resource Rcon, launches a "non-specific reaction" for further search. The options for finding the right solutions and a description of the reactions after eliminating the error require separate consideration, as they depend on the contextual conditions of the system.
MIND of the recognizing system: The mind of a recognizing system is determined, first of all, by the area of its operational activity and can be estimated from the experimental curves FAR(d) and FRR(d), the errors of the first and second kind as functions of the decision threshold d. The system's accuracy criteria are d0far, d0frr and EER. If EER = 0, the system demonstrates 100% accuracy in the range d0frr < d < d0far. In the cases d0far < deer < d0frr, FAR(deer) = FRR(deer) = EER(deer) > 0. Of course, a system is more reasonable if its criteria demonstrate a greater increase in accuracy after the training procedure; a decrease in the accuracy criteria indicates degradation of the system. An intellectual system seeks to increase its information and energy resources, although, depending on the contextual situation, other options are possible.

2.2 The Impact of Context Conditions on the Operation of the First Subsystem
When only one integral system is considered, the influence of the second subsystem on the first during the implementation of the "stress reaction" manifests itself in changes in the perception thresholds (H1stress and H2stress) that allow energy resources to be saved. First of all, the range of the conscious search for solutions decreases; there is even a tendency to reject it altogether. Figure 3 illustrates options for such changes. The influence of other basic recognition modules can be realized in more complex modes of change of H1stress(t) and H2stress(t).
Fig. 3. Examples of possible changes in resource diagrams under stress. A) H1stress(t) and H2stress(t) decrease to values less than H1; B) H1stress(t) and H2stress(t) tend towards each other in the range H1 < H1stress(t) ≈ H2stress(t) < H2; C) H1stress(t) ≈ H2stress(t) increase beyond H2.
The situation shown in Fig. 3A means that the stress changes in the thresholds allow only part of the previously recognized signals to be recognized correctly, while all other signals leave the perception zone (a variant of a responsible system). In the case shown in Fig. 3B, the stress changes in the thresholds increase the error for correctly recognized signals and increase the proportion of screened-out signals (a variant of a medium-responsible system). The changes shown in Fig. 3C lead to a strong increase in the recognition error (a variant of an irresponsible system). The interpretation of these results can change dramatically if we take into account the well-known multiple connections (hierarchical, network, etc.) with other similar recognition systems. These include social systems [16], the manifestation of the second "I", the "SELF", of the system [2, 29], and various options for non-verbal communication in living systems [2–4, 30–32]. In this case, stress can change the effectiveness of cognitive filtering in the communication channels through which new informational representations from other recognizing systems are transmitted. Then the situation shown in Fig. 3A means the closure of the system within the framework of its own algorithmic representations (a variant of isolation from new knowledge), while the changes shown in Fig. 3C provide for the perception of new images and algorithmic models from other systems with different operational experience (a variant of a system open to new knowledge).
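The qualitative effect of the threshold changes in Fig. 3 can be illustrated with a toy calculation. The following Python sketch is purely illustrative: the distance distribution for d = |I5| and the threshold values are invented for the example and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical distances d = |I5| between input signals and learned images.
d = np.abs(rng.normal(0.0, 1.5, 100_000))

def mode_fractions(h1, h2):
    """Fractions of signals processed unconsciously (d < H1),
    consciously (H1 <= d < H2), and screened out (d >= H2)."""
    unconscious = float(np.mean(d < h1))
    conscious = float(np.mean((d >= h1) & (d < h2)))
    screened = float(np.mean(d >= h2))
    return unconscious, conscious, screened

normal = mode_fractions(1.0, 3.0)
# Fig. 3B-like change: H1stress and H2stress approach each other,
# collapsing the band in which conscious search is possible.
stressed = mode_fractions(0.5, 0.6)
```

Under the stressed thresholds the conscious band shrinks and the share of screened-out signals grows, which is the "medium-responsible system" behavior described above.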
3 Subsystem Describing the Stages of the Stress Reaction

The stress theory of Hans Selye [33] has undergone major changes over the 85 years of its existence. Of course, most of its basic provisions remain relevant today. For example, the classical triad of H. Selye (hypertrophy of the adrenal glands, involution of the thymus, and ulceration in the gastrointestinal tract) has been verified by millions of experimental and clinical observations, which confirms H. Selye's initial thesis about the unconditional destructiveness of the stress reaction. Despite some attempts to revise the three-stage structure of stress [34, 35], further studies have shown these attempts to be untenable. As H. Selye pointed out, stress develops through the successive stages of anxiety, resistance and exhaustion. However, we cannot fail to mention significant changes in the structure of H. Selye's theory associated with the emergence of new methods, approaches and research paradigms. Central among them are the issues related to the concretization of the neuroendocrine stress mechanisms, the complexity of the systemic management of this process, and the features of the fundamentally anti-adaptive manifestations of the reaction. The basic principles of stress theory, which postulated a decisive role in its development for two stress-reactive systems (the sympathoadrenal, SAS, and the hypothalamic-pituitary-adrenal, HPAS), have in the last four decades been supplemented by an understanding of the leading role of a third system, the endogenous opioid system (EOS) [36–38]. It was revealed that the multiplicity of control systems of body functions, which is characteristic of a comfortable state, reduces (regresses) under stress to the absolute dominance of the three stress-reactive systems: SAS, HPAS and EOS [39, 40]. Finally, the key issue is the anti-adaptive properties of stress. H. Selye's wording [41], stress as a general adaptation syndrome, had a dramatic effect on the whole doctrine of stress.
Inappropriate use of the term "adaptation", popular in the first half of the twentieth century [42], led to scholastic disputes about the origin of the unknown "adaptive energy" [43]. The mainstream in the study of stress was blurred, which led to the disavowal of the main provisions of the theory and jeopardized its very existence. Meanwhile, even in the early works of H. Selye, convincing evidence of the fundamentally anti-adaptive properties of the protective stress reaction was presented: the H. Selye triad is formed not as an adaptation to a harmful effect, but as retribution for the ability to resist this damage. Under stress, as opposed to adaptation, the thresholds for distinguishing sensory signals (differential thresholds) steadily increase [11, 44], because there is no need to closely monitor the features of processes for which a rigidly determined non-specific defense reaction has already been launched. Finally, stress always leads to the extreme standardization of all vital signs: the variety of adaptive responses is replaced by uniformly unified processes [39]. In fact, stress, like other non-specific defensive reactions (e.g. inflammation), ensures the salvation of the whole organism by sacrificing its parts: the cells and tissues of this organism. Pneumonia with COVID-19 is a typical example of this kind of reaction: as excessive protection during stress leads to the development of shock, so excessive protection during inflammation leads to the fatal degeneration of the tissues of the vascular bed and lungs. Thus, stress should today be considered a non-specific, protective, staged, systemic, reduced (regressive), psycho-physiological response to damage or its threat [45]. Obviously, further investigation of stress requires not only experimental modeling and clinical observations, but also a formalized description of the mechanisms of its development. We have previously made attempts to model this process mathematically.
In the first case [46], the model biologically plausibly describes the dynamics of mean arterial pressure in the interaction of the three stress-reactive systems, SAS, HPAS and EOS, in stress and shock. However, it did not take into account the starting threshold conditions for triggering the stress response, which depend on the genetically determined and acquired characteristics of the subject, or the mechanisms of stress completion. Another model [47] reproduces with high accuracy the dynamics of the rhythmocardiogram under acute stress and has been reliably verified in experiment. The model demonstrated certain predictive capabilities. However, it describes only the initial stage of stress, the stage of anxiety, and takes into account the interaction of the mediator ("fast") component of the EOS and the activity of the SAS. Some models [48, 49] also take into account the energy supply of the body. When comparing model and experimental results, one should also take into account the data from [2–4, 44–50].

3.1 Basic Variables and Model Balance Equations
Consider the next stage in the development of a mathematical model capable of giving a complete formalized description of the mechanisms of triggering a stress reaction, the formation and development of the three stages of this reaction, and alternative ways of overcoming stress. For this, the model uses knowledge of "fast" mechanisms ("light" opioid peptides mainly with mediator properties – enkephalins) and "slow" mechanisms ("heavy" opioid peptides mainly with hormonal properties – endorphins and dynorphins) [51]. Data on the role of the "awakening" paraopioid FaRP (FMRFamide-Related Peptides) peptides were also used [52]. At the stage of stress anxiety, options are described both for active protection ("fight-or-flight" [42]) with the dominance of the SAS, and for a passive-defensive reaction ("freezing" [50]) with the dominance of the EOS. The following variables (Table 3) are used in the model subsystem describing the non-specific reactions of the integral system.

Table 3. Description of variables.

ANXIETY PHASE (the characteristic time is from a second to tens of seconds):
Variable 1 – К1: SAS, the sympathoadrenal system; activates the system.
Variable 2 – К2: EOS, the endogenous opioid system ("light" enkephalins, fast component); reduces sensory sensitivity.
Variable 3 – К3: E(t), efficient energy supply (fast spending and restoration of used resources, apparently with a limited amount of reserves).

RESISTANCE PHASE (the characteristic time is tens of minutes to hours):
Variable 4 – К4: HPAS, the hypothalamic-pituitary-adrenal system; helps to memorize the solutions (actions) found.
Variable 5 – К5: oxytocin; contributes to the erasure (forgetting) of the image of the "pain" associated with the event.
Variables 3 and 7 – К3: E(t) and К7: E(t), energy from different sources.

EXHAUSTION PHASE (the characteristic time is hours to days):
Variable 6 – К6: EOS, the endogenous opioid system ("heavy" endorphins and dynorphins, slow component); causes relaxation of the system, increases the thresholds of the sensory systems, and launches recovery mechanisms for the resources used.
Variable 7 – К7: E(t), additional energy sources (with reduced efficiency; significant but less accessible reserves); provide the necessary energy expenditure and the mechanism of slow recovery of used resources.
Variable 8 – К8: awakening peptides (FMRFamide-Related Peptides = FaRPs) [45], apparently aimed at controlling the thresholds of the slow ("heavy" endorphins and dynorphins) component of the EOS (К6).
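For numerical experiments it can be convenient to encode the Table 3 variables in program form. The sketch below merely restates the table; the class and field names are our own choice, not part of the authors' model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StressVariable:
    index: int   # i in Ki
    name: str    # physiological system or quantity
    phase: str   # stage of stress where the variable dominates
    role: str    # short functional description from Table 3

VARIABLES = [
    StressVariable(1, "SAS", "anxiety", "sympathoadrenal system; activates the system"),
    StressVariable(2, "EOS (fast)", "anxiety", "light enkephalins; reduces sensory sensitivity"),
    StressVariable(3, "E(t)", "anxiety", "efficient energy supply with limited reserves"),
    StressVariable(4, "HPAS", "resistance", "hypothalamic-pituitary-adrenal system; memorizes found solutions"),
    StressVariable(5, "oxytocin", "resistance", "erases (forgets) the image of the 'pain' of the event"),
    StressVariable(6, "EOS (slow)", "exhaustion", "heavy endorphins/dynorphins; relaxation, raised sensory thresholds"),
    StressVariable(7, "E(t) additional", "exhaustion", "less efficient but larger energy reserves"),
    StressVariable(8, "FaRPs", "exhaustion", "awakening peptides controlling the slow EOS thresholds"),
]

# Group variable indices by the stress phase in which they dominate.
by_phase = {p: [v.index for v in VARIABLES if v.phase == p]
            for p in ("anxiety", "resistance", "exhaustion")}
```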
Thus, any information signal (external or internal) either causes a specific reaction (with adequate conscious or unconscious recognition), or starts a non-specific process associated with the search for new algorithms for getting out of situations in which the previously mastered algorithms fail. It is known that experienced people with the necessary amount of knowledge and "energy supply" can learn in any difficult situation without falling into stress; when the knowledge or "energy supply" is insufficient, the search for a way out of a difficult situation starts through the stress stages. The balance equations for the variables (components) providing non-specific reactions are written in the form (the procedure for deriving similar equations was described in detail in [10], and then implemented in the previous model of three-stage stress [46]):

  dKi/dt = −Ki/τ2i + (1/τ1i) Fi[−T0i + Σj rij Kj + Sai(t)],  i = 1, 2, 4, 5, 6, 8,   (1)

where the sum runs over j = 1, …, 8.
Here Ki are the variables introduced in Table 3; τ1i is the characteristic activation time of the i-th variable; τ2i is the characteristic utilization time of the i-th variable; T0i is the threshold of the nonlinear activation function for the i-th variable; rij are the coefficients of the influence of the j-th variable on the i-th variable; and Sai(t) are the activation signals sent to the i-th variable by the recognition system when it detects a failure in the operation of its algorithms. Changes in the variables K3 and K7 are determined by the equations

  dKi/dt = −Σj Kj/τ3ij + (1/τ1i) Fi[−T0i + Σj rij Kj + Sai(t)],  i = 3, 7,   (2)
where both sums run over j = 1, …, 8, and τ3ij is the characteristic time of consumption by the j-th component of the energy taken from the i-th energy source. All other parameters are the same as in Eq. (1). The nonlinear activation functions Fi[·] are defined in piecewise-linear form.
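A minimal numerical sketch of Eqs. (1) and (2) can be written as follows. The saturating form chosen for Fi[·], the Euler integration scheme, and all parameter values used below are assumptions made for illustration; the paper does not specify them in this section.

```python
import numpy as np

def F(z):
    """Assumed piecewise-linear activation: zero below 0, linear ramp, saturation at 1."""
    return np.clip(z, 0.0, 1.0)

def simulate(K0, r, tau1, tau2, tau3, T0, Sa, dt=0.01, steps=1000):
    """Euler integration of the balance equations. Indices 0..7 stand for
    K1..K8; the energy variables K3 and K7 (indices 2 and 6) follow Eq. (2),
    all others follow Eq. (1). Sa(t) returns the 8-vector of trigger signals."""
    K = np.array(K0, dtype=float)
    energy = {2, 6}
    traj = [K.copy()]
    for n in range(steps):
        drive = F(-T0 + r @ K + Sa(n * dt))  # thresholded activation term
        dK = np.empty(8)
        for i in range(8):
            if i in energy:
                # Eq. (2): the energy source is drained by every consumer j
                dK[i] = -(K / tau3[i]).sum() + drive[i] / tau1[i]
            else:
                # Eq. (1): first-order utilization plus thresholded activation
                dK[i] = -K[i] / tau2[i] + drive[i] / tau1[i]
        K = K + dt * dK
        traj.append(K.copy())
    return np.array(traj)
```

With all couplings rij set to zero and no trigger signals, every variable simply decays, which is a convenient sanity check before adding the stage-specific interactions.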
2. The first stage of stress, ANXIETY (the characteristic time is from a second to tens of seconds).
2:1a. Activation under the condition К1(t) > К2(t):
2:1b. Activation under the condition when at first К1(t) < К2(t), and then К1(t) > К2(t):
Fig. 4. a) Activation, search for a solution based on genetically specified ("instinctive") programs ("fight-or-flight" according to W. Cannon [42]), transition to the 2nd stage. b) Combined mode consisting of two phases: the initial "freeze", and then the inclusion of the search for new solutions.
2:2. Fading under the condition K1 (t) < K2 (t):
Fig. 5. a. Fading (“freezing” according to T. Inoue [50]), refusal to search for new solutions, the option of “depression”, transition to the 2nd stage of stress (curve 1 in b).
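The alternative between activation (mode 2:1a, "fight-or-flight") and fading (mode 2:2, "freezing") can be illustrated with a self-contained two-variable toy model of the К1–К2 (SAS vs. fast EOS) competition. The mutual-inhibition coefficient, drives and time constants below are invented for the illustration and are not the authors' parameter values.

```python
def F(z):
    """Piecewise-linear activation clipped to [0, 1] (an assumed form)."""
    return min(max(z, 0.0), 1.0)

def stage_one(s1, s2, w=-0.8, tau=1.0, dt=0.01, t_end=10.0):
    """Euler integration of two mutually inhibiting variables:
    K1 (SAS, active defense) and K2 (fast EOS, passive defense).
    The variable receiving the stronger drive suppresses the other."""
    k1 = k2 = 0.0
    for _ in range(int(t_end / dt)):
        dk1 = -k1 / tau + F(s1 + w * k2)
        dk2 = -k2 / tau + F(s2 + w * k1)
        k1 += dt * dk1
        k2 += dt * dk2
    return k1, k2

fight_flight = stage_one(s1=1.0, s2=0.3)  # SAS dominates: mode 2:1a
freezing = stage_one(s1=0.3, s2=1.0)      # fast EOS dominates: mode 2:2
```

Whichever variable receives the larger drive ends up dominant, reproducing the qualitative dichotomy between the active and passive-defensive reactions.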
If К3 = E(t) becomes less than Кcr = Ecr, the body passes to a more severe pathological state ("shock", loss of consciousness – curve 2 in Fig. 5b).
3. The second stage of stress, RESISTANCE (the characteristic time is tens of minutes to hours).
3:1. Activation of HPAS and oxytocin under the condition К4(t) > К5(t):
Fig. 6. If the action of HPAS prevails over the action of oxytocin (a), then the found solutions (actions) are memorized, and the transition to the 3rd stage occurs (curve 1 in b).
If К3 = E(t) becomes less than Кcr = Ecr, the body passes to a more severe pathological state ("shock", loss of consciousness – curve 2 in Fig. 6b).
3:2. Activation of HPAS and oxytocin under the condition К4(t) < К5(t):
3:3. When the level of available energy resources К3 = E(t) drops below Кcr = Ecr, the body passes to a more severe pathological state ("shock", loss of consciousness – curve 2 in Fig. 7b).
4. The third stage of stress, EXHAUSTION (the characteristic time is hours to days).
4:1. The interaction of the components under the condition К8(t) > К6(t), with К7(t) = E(t) always greater than Кcr = Ecr, the critical value of the energy supply of the body. In this case the stage of "rest and restoration" of the system is implemented.
Fig. 7. If the action of oxytocin prevails over the action of HPAS, then the found solutions (completed actions) are erased (forgotten), and the transition to the 3rd stage occurs (curve 1 in b).
Fig. 8. If the action of the "awakening peptides" (FaRPs) prevails over the action of the slow component of the EOS (a), then the level of energy available to the body (b) is restored (grows), and the variables responsible for non-specific reactions return to the initial "norm" range (below the threshold values).
4:2. The interaction of the components under the condition К8(t) < К6(t):
4:3. This increases the risk of a transition to a more severe pathological condition. If the variable К3 = E(t) becomes less than the value Кcr = Ecr (curve 2 in Fig. 9b), the system passes to the state of "shock" and loss of consciousness.
Fig. 9. The interaction of the components К8(t) and К6(t) during the slowed down process of restoring the level of energy available to the body (curve 1 in b).
The variants of non-specific responses triggered by the first subsystem via the signal Sai(t) generate feedback from the second subsystem to the first. These influences change the conditions for decision making in the first subsystem. The options shown in Fig. 3 demonstrate a change in the accuracy of the decisions made and, accordingly, in the choice of the new control signal that launches the second subsystem. Thus the integral system, gradually changing its interpretation of the input information signal, chooses a relationship with the environment that is more appropriate for it.
4 Examples of Operating Modes of an Integral System and Comparison with Experimental Data

To verify the solutions of the model, we intend to use, first of all, the results of the technology of event-related telemetry of heart rhythm (ERT HR) [53], which allows high-accuracy recording of the dynamics of various functional states under natural conditions of activity. At the heart of ERT HR is the cardiac rhythmography method [54], implemented on the basis of mobile telemetry and Web technologies. The implementation provides continuous monitoring of the functional state of the test subject in non-stationary contexts without restrictions on mobility and distance. Algorithms for detecting stress and mapping stressful contexts have been developed and automated. The recorded data allow, in particular, real-time evaluation of the levels of the subject's energy resources and of the significance of the information context in various situations of natural activity. It is important to note that the interpretation of the recorded data is practically impossible without theoretical hypotheses adequate to the living subject under study. The integral system considered here allows us to single out the most important parameters, the dynamics of which must be recorded in order to assess the mechanisms
of changes in the functional states of a living system. For example, during the discussion of the results of this article, one of the authors had occasion to record the interpretation of external signals and assess the processes of change in his own functional state. It turned out that a signal from the external environment, which led to the transition from the state Sa1(t) ("state of inspiration for fulfilling current plans") to the state Sa2(t) ("state and feeling of hopelessness"), triggered the dynamic process shown in Fig. 4b. But when the process presented in Fig. 6 changed to that in Fig. 8, it became clear that the interpretation of the original signal was erroneous. The decision thresholds in the recognition subsystem were changed, and, in accordance with the new interpretation, the second subsystem was re-launched by the signal Sa3(t) ("overcoming a hopeless state, difficult search for solutions, implementation of selected solutions"). Then the functional state changed following a similar scheme (Fig. 4a → Fig. 6), and again it was realized that, in accordance with the experience of the integral system, the interpretation of the signal was erroneous. After the thresholds in the recognition subsystem were changed and the signal Sa4(t) ("implementation of the solution and achievement of the intended results") was triggered, it became clear that a solution satisfying the integral system could be implemented without launching the stress system. Such an experiment with subjective analysis, obviously unplanned by the authors of the article, completely convinced the authors of the efficiency of the results obtained here. Is such a formalization necessary if such subjective descriptions can be interpreted through a simple and understandable image: "well, a person suffered, and then found the right solution"?
However, only through such formalization can adequate diagnostic features be found in the large database of instrumental measurements now being accumulated within the framework of the ERT HR technology [53]. It is with the help of this instrumental method that we plan to compare theoretical and experimental results in the future.
5 Conclusion and Outlook

The use of an "image" description with training, a "formalized" description with training, and a non-specific system that launches the search for ways of correcting the accumulated errors in a modified channel model allows "formalized" description procedures to be constructed for a larger body of experimental data. The demonstrated variety of modes makes it possible to further refine the model description language, as well as to identify controversial points of the approach used here.

1. For a recognizing subsystem, an understanding of the object being studied, and of the meaning of the situation in which it operates, is associated with the rapid formation of the "image" of the sensory signal. Then algorithms for a formalized description of the details of such an "image" come into play; these algorithms were created when this "image" was learned. Consequently, the "image" channel represents the most effective way to describe meaning, while the "formalized" description channel provides the opportunity to capture the more detailed structure of the "image" during recognition, or the ability to create a single "image" from the flows of sensory signals during learning.
2. The variety of dynamic processes in the procedures of the "formalized" description of the operating modes of recognizing systems can be demonstrated using various types of information and energy resource diagrams. Such diagrams show the level of security of the algorithms for processing sensory signals (the information resource) and of the sources that ensure the operation of these algorithms (the energy resource) at the considered moment of integral system operation. A sufficient level of resources allows such a system to overcome the problems created by the external environment using the previously developed algorithms of system functioning. It is natural to call such processes specific reactions.

3. Non-specific reactions are triggered when the system becomes aware that the available resources are inadequate for performing the target settings of the recognition system. The purpose of such reactions is to find a new information-algorithmic resource. At each stage of this process, from 2 to 5 modes of competition between the possibilities of activating or deactivating the function in progress at that stage can be realized ("active search" vs. "freezing"; "remembering the result" vs. "forgetting"; "relaxation" vs. "return to initial activity"). The parameters for the implementation of such modes are formed by the operating conditions of the integral system over the preceding period. At the same time, energy consumption increases at all stages; consequently, loss of control over the timely connection of new energy sources can lead to the system's breakdown into pathological states. An important role can be played by controlling the type of non-specific reactions of the recognition module.

4. The proposed formalized model allowed us to consider a wide range of states and various types of dynamic modes and to compare them with reactions of living systems previously registered only experimentally.
This fact demonstrates the validity of the chosen model architecture and the possibility of using the results of its analysis as an adequate “language” of communication for researchers of living systems. We believe that formalized models are necessary for understanding the meanings and consequences of unconscious perception through “image”, including sensation channels. They allowed us to formalize the description of a number of processes which were earlier interpreted ambiguously. We intend to further verify the efficiency of this approach. Acknowledgements. The authors are grateful to A.N. Gorban for his support and active participation in the discussion of the results presented here. This research was funded by the Ministry of Education and Science of Russia, grant number 14.Y26.31.0022.
References

1. Piaget, J.: The Psychology of Intelligence. Routledge, London (1948)
2. Wilber, K.: Integral Psychology: Consciousness, Spirit, Psychology, Therapy. LLC "Publisher AST" and others, Moscow (2004). (in Russian)
3. Grof, S.: Adventure of Self-discovery: Dimension of Consciousness. New Perspectives in Psychotherapy. State University of New York Press, Albany (1988)
4. Grof, S.: The Ultimate Journey: Consciousness and the Mystery of Death. Publishing house AST, Moscow (2007). (in Russian)
5. Karaulov, Yu.: Conceptography of a linguistic picture of the world. Article 1. The first stage of the "ascent" to the image of the world: from elementary figures of knowledge to subject-reference areas of culture. In: Vasilyeva, N.V. (ed.) Problems of Applied Linguistics, Collection of Articles, no. 2, pp. 7–17. Azbukovnik, Moscow (2004). (in Russian)
6. Anokhin, P.K.: Selected Works. Philosophical Aspects of the Functional System Theory. Nauka, Moscow (1978). (in Russian)
7. Sergin, V.Ya.: Psychophysiological mechanisms of awareness: self-identification hypothesis. J. High. Nerv. Act. 48(3), 558–570 (1998). (in Russian)
8. Ivanitsky, A.M.: Brain Mechanisms of Signal Evaluation. Moscow (1976). (in Russian)
9. Yakhno, V.G.: Basic models of hierarchy neuron-like systems and ways to analyze some of their complex reactions. J. Opt. Mem. Neural Netw. 4(2), 141–155 (1995)
10. Yakhno, V.G.: Models of neuron-like systems. Dynamic information conversion modes. In: Gaponov-Grekhov, A.V., Nekorkin, V.I. (eds.) Nonlinear Waves 2002, pp. 90–114. IAP RAS, Nizhny Novgorod (2003). (in Russian)
11. Yakhno, V.G., Polevaya, S.A., Parin, S.B.: The basic architecture of the system that describes the neurobiological mechanisms of awareness of sensory signals. In: Aleksandrov, Yu.I., Solovyov, V.D. (eds.) Cognitive Research: Collection of Scientific Papers, no. 4, pp. 273–301. Publishing House "Institute of Psychology RAS", Moscow (2010). (in Russian)
12. Yakhno, V.G., Makarenko, N.G.: Will the creation of the "digital human twin" help us better understand each other? In: Redko, V.G. (ed.) Approaches to the Modeling of Thinking, pp. 169–202. LENAND, Moscow (2014). (in Russian)
13. Osipov, G.S., Panov, A.I., Chudova, N.V.: Behavior control as a function of consciousness. 1. Picture of the world and goal-setting. News of the Russian Academy of Sciences. Theory Control Syst. 4, 49 (2014). (in Russian)
14. Chernavskaya, O.D., Chernavsky, D.S., Karp, V.P., Nikitin, A.P.: On the role of the concepts of "image" and "symbol" in modeling the process of thinking by means of neurocomputing. Appl. Nonlinear Dyn. 19(6), 5–21 (2011). (in Russian)
15. Shumsky, S.A.: Reengineering of brain architecture: the role and interaction of the main subsystems. In: XVII All-Russian Scientific and Technical Conference with International Participation "Neuroinformatics", Lectures on Neuroinformatics, pp. 13–46. NRNU "MEPhI", Moscow (2015). (in Russian)
16. Internal Predictor USSR: Fundamentals of Sociology, St. Petersburg (2010). http://kob.su/kobbooks/osnovy-sotsiologii. (in Russian)
17. Stankevich, L.A.: Cognitive systems of visual perception. In: Scientific Session MEPhI 2013, XV All-Russian Scientific and Technical Conference "Neuroinformatics 2013", Lectures on Neuroinformatics, pp. 14–71. NRNU MEPhI, Moscow (2013). (in Russian)
18. Zhdanov, A.A.: Autonomous Artificial Intelligence. Binom, Laboratory of Knowledge, Moscow (2008). (in Russian)
19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
20. Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete Event Dyn. Syst. 13(1–2), 41–77 (2003)
21. Mininger, A., Laird, J.E.: Using domain knowledge to correct anchoring errors in a cognitive architecture. Adv. Cogn. Syst. 7, 1–16 (2019)
22. Laird, J.E., Mohan, S.: Learning fast and slow: levels of learning in general autonomous intelligent agents. In: The Thirty-Second AAAI Conference on Artificial Intelligence, AAAI 2018, pp. 7983–7987 (2018)
32
V. G. Yakhno et al.
23. Lindes, P.: The common model of cognition and humanlike language comprehension. Procedia Comput. Sci. 145, 765–772 (2018) 24. Demin, A.V., Vityaev, E.E.: Adaptive control of modular robots. In: Biologically Inspired Cognitive Architectures (BICA) for Young Scientists, 26 July 2017, pp. 145–150 (2017). https://link.springer.com/chapter/10.1007/978-3-319-63940-6_29 25. Telnykh, A., Nuidel, I., Samorodova, Yu.: Construction of efficient detectors for character information recognition. Procedia Comput. Sci. 169, 744–754 (2020). https://doi.org/10. 1016/j.procs.2020.02.170 26. Red’ko, V.G., Tsoy, Yu.R.: Estimation of the efficiency of evolution algorithms. Dokl. Math. (Rep. Math.) 72(2), 810–813 (2005) 27. Red’ko, V.G.: Model of interaction between learning and evolution. In: Biologically Inspired Cognitive Architectures (BICA) for Young Scientists, 26 July 2017, pp. 145–150 (2017). https://link.springer.com/chapter/10.1007/978-3-319-63940-6_20 28. Lakhman, K.V., Burtsev, M.S.: Mechanisms of short-term memory in the purposeful behavior of neural network agents. Math. Biol. Bioinform. 2, 419–431 (2013). http://www. matbio.org/2013/Lakhman_8_419.pdf 29. Moltz, M.: I am I, or How To Become Happy. Translation from English, Preface V. P. Zinchenko, E.B. Morgunova. Progress, Moscow (1991). (in Russian) 30. Maslow, A.: Motivation and Personality, 3rd edn. Translation from English. Peter, St. Petersburg (2008) 31. Sviyash, A.: Project “Humanity”: Success or Failure? Reflections on People and Their Strange Behavior. Astrel, AST, Moscow (2006). (in Russian) 32. Wilson, R.A.: Prometheus Risen. Psychology of Evolution. Translation from English, Nevstrueva, Ya. (eds). JANUS, Kiev (1999). http://www.yugzone.ru/uilson.htm 33. Selye, H.A.: Syndrome produced by diverse nocuous agents. Nature 138, 32 (1936) 34. Garkavi, L.K., Kvakina, E.B., Ukolova, M.A.: Adaptive Reactions and Body Resistance. Publishing house Rost. University, Rostov (1977). (in Russian) 35. 
Garkavi, L.Kh.: Activation Therapy. Antistress activation and training reactions and their use for healing, prevention and treatment. (in Russian). www.rak.by, http://knigi.konflib.ru/ 8himiya/74202–1-aktivacionnaya-terapiya-antistressornie-reakcii-aktivacii-trenirovkiispolzovanie-dlya-ozdorovleniya-profilaktiki.php 36. Bodnar, R.J., Kelly, D.D., Brutus, M., Glusman, M.: Stress-induced analgesia: neural and hormonal determinants. Neurosci. Biobehav. Rev. 4(1), 87–100 (1980) 37. Golanov, E.V., Fufacheva, A.A., Cherkovich, G.M., Parin, S.B.: The effect of opiate receptor ligands on the emotiogenic reactions of the cardiovascular system in lower primates. Bull. Exp. Biol. Med. 103(4), 424–427 (1987) 38. Onaka, T., Yagi, K.: Differential effects of naloxone on neuroendocrine responses to fearrelated emotional stress. Exp. Brain Res. 81(1), 53–58 (1990) 39. Parin, S.B.: People and animals in extreme situations: neurochemical mechanisms, evolutionary aspect. Bull. Novosibirsk State Univ. 2(2), 118–135 (2008). (in Russian) 40. Aleksandrov, Y.I., Svarnik, O.E., Znamenskaya, I.I., Kolbeneva, M.G., Arutyunova, K.R., Krylov, A.K., Bulava, A.I.: Regression as a Stage of Development. Kogito-Center, Moscow (2017). (in Russian) 41. Selye, H.: Experimental evidence supporting the conception of «adaptation energy». Am. J. Physiol. 123, 758–765 (1938) 42. Cannon, W.B.: Bodily Changes in Pain, Hunger, Fear and Rage: An Account of Recent Researches into the Function of Emotional Excitement. D. Appleton & Company, New York (1915) 43. Goldstone, B.: The general practitioner and the general adaptation syndrome. South Afr. Med. J. 26(5), 88–92, 106–109 (1952)
Who Says Formalized Models are Appropriate for Describing Living Systems?
33
44. Polevaya, S.A., Kovalchuk, A.V., Parin, S.B., Yakhno, V.G.: The role of the parameters of the internal state of the physiological system in the awareness of sensory signals. In: Scientific Session of NRNU MEPhI 2010, Materials of Selected Scientific Works on the Topic: “Actual Issues of Neurobiology, Neuroinformatics and Cognitive Research”, pp. 58– 68. NRNU MEPhI, Moscow (2010). (in Russian) 45. Parin, S.B., Bakhchina, A.V., Polevaia, S.A.: A neurochemical framework of the theory of stress. Int. J. Psychophysiol. 94(2), 230 (2014) 46. Parin, S.B., Yakhno, V.G., Tsverov, A.V., Polevaya, S.A.: Psychophysiological and neurochemical mechanisms of stress and shock: experiment and model. Bull. Nizhny Novgorod Univ. N.I. Lobachevsky 4, 190–196 (2007). (in Russian) 47. Parin, S., Polevaia, S., Gromov, K., Polevaia, A., Kovalchuk, A.: Short-term variability of RR intervals during acute stress in healthy adults: neuromorphic model, experiment data, monitoring of daily life activity. Int. J. Psychophysiol. 108, 88 (2016) 48. Gorban, A.N., Tyukina, T.A., Smirnova, E.V., Pokidysheva, L.I.: Evolution of adaptation mechanisms: adaptation energy, stress, and oscillating death. J. Theor. Biol. 405, 127–139 (2016) 49. Gorban, A.N., Pokidysheva, L.I.,Smirnova, E.V., Tyukina, T.A.: Law of the minimum paradoxes. Bull. Math. Biol. 73(9), 2013–2044 (2011) 50. Inoue, T., Tsuchiya, K., Koyama, T.: Effects of typical and atypical antipsychotic drugs on freezing behavior induced by conditioned fear. Pharmacol. Biochem. Behav. 55(2), 195–201 (1996) 51. Akil, H., Darragh, F.M., Devine, P., Watson, S.J.: Molecular and neuroanatomical properties of the endogenous opioid system: implications for treatment of opiate addiction. Semin. Neurosci. 9(3–4), 70–83 (1997) 52. Tinyakov, R.L., Parin, S.B., Bespalova, Zh.D., Krushinskaya, Ya.V., Sokolova, N.A.: FMRFa and FMRFamide-like peptides (FaRPs) in the pathogenesis of shock. Adv. Physiol. Sci. 29(3), 56–65 (1998) 53. 
Polevaya, S.A., Eremin, E.V., Bulanov, N.A., Bakhchina, A.V., Kovalchuk, V., Parin, S.B.: Event-related telemetry of heart rhythm for personalized remote monitoring of cognitive functions and stress under conditions of everyday activity. Sovremennye tehnologii v medicine 11(1), 109–115 (2019). (in Russian) 54. Parin, V.V., Baevsky, P.M.: Introduction to Medical Cybernetics. Medicine, Moscow (1966). (in Russian)
Cognitive Sciences and Brain-Computer Interface
Mirror Neurons in the Interpretation of Action and Intentions

Yuri V. Bushov1, Vadim L. Ushakov2,3,4, Mikhail V. Svetlik1,5, Sergey I. Kartashov2, and Vyacheslav A. Orlov2

1 National Research Tomsk State University, Tomsk, Russia
[email protected]
2 National Research Center «Kurchatov Institute», Moscow, Russia
3 Institute for Advanced Brain Studies, Lomonosov Moscow State University, Moscow, Russia
4 NRNU MEPhI, Moscow, Russia
5 Siberian State Medical University, Tomsk, Russia
Abstract. The aim of the research was to study the activity of mirror neurons in humans during the observation and reproduction of rhythm. As markers of mirror neuron activity, we used depression of the EEG mu-rhythm in the alpha and beta frequency ranges, cortical interactions at the frequency of this rhythm, and the results of fMRI brain mapping. The research involved volunteer men and women aged 18 to 27 years (university students). The research showed that observing the reproduction of a five-second rhythm is accompanied by activation not only of those cortical areas where the «motor» mirror neurons are located, but also of other cortical areas, as well as the basal ganglia and cerebellum. These findings suggest that mirror neurons by themselves do not provide an understanding of actions and intentions, although they are involved in these processes. It is assumed that these neurons provide interaction between the prefrontal, sensory, and motor areas of the cortex, as well as the places where motor programs are stored in the brain. The result of the interaction of these structures is an understanding of the actions and intentions of other people.

Keywords: Mirror neurons · Interpretation of actions · Observation and reproduction of rhythm
1 Introduction

Studying the functions of mirror neurons is a relevant scientific and practical problem, important for understanding human social behavior. According to a currently popular hypothesis [1], mirror neurons can serve as a neural basis for interpreting actions, for imitation learning, and for imitating the behavior of other people. According to researchers [2], this is achieved by the observer's brain copying the actions of another person through activation of the corresponding motor programs. However, from this point of view it is not clear why mirror neurons are activated not only when observing an action, but also when performing or mentally reproducing the same action.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 37–43, 2021. https://doi.org/10.1007/978-3-030-60577-3_3
The purpose of this research was to study the activity of mirror neurons in humans during the observation and reproduction of rhythm.
2 Materials and Methods

During the preliminary examination, the features of the lateral organization of the brain were studied, with determination of the leading hand (by questionnaire) and of the speech hemisphere (dichotic listening test). As a model of cognitive activity, the subjects were offered tasks related to the observation and reproduction of a five-second rhythm. As markers of mirror neuron activity, we used EEG mu-rhythm depression in the alpha and beta frequency ranges, cortical interactions at the frequency of this rhythm, and the results of fMRI brain mapping. The electroencephalographic study involved volunteers, practically healthy men (31 people) and women (34 people), students aged 18 to 23 years. All subjects gave informed consent to participate in this study. Several series of experiments were carried out. In the first series («monitoring the reproduction of rhythm»), the subject observed the operator's hand: the operator first memorized the five-second rhythm and then reproduced it with the middle and index fingers of the leading hand, periodically pressing the «Space» key. The rhythm period was set by a visual stimulus (a light square with a side of 2 cm, appearing periodically for 200 ms in the center of the darkened monitor screen). In the second and third series, the subject first memorized a five-second rhythm and then reproduced it with the fingers of the left hand, and then with the right hand. Before and during cognitive activity, EEG was recorded in the frontal, central, temporal, parietal, and occipital leads according to the 10–20 system. When processing the obtained data, the maximum values of the cross-correlation functions and spectral power estimates were calculated on short (1.5 s) artifact-free segments of the EEG record: 3 s («Background») and 1.5 s («Preparation») before pressing the key, and immediately after this event («action Execution»).
For statistical data processing, we used the MatLab v14.0 package and the Wilcoxon test for related and independent samples. In part of the experiments, fMRI was used to study brain activity during the observation and perception of time. These studies involved volunteers, 20 men and 20 women aged 19 to 27 years (university students). The study included several series of experiments. In the first series, the subject was shown a video clip in which a white square with a side of 2 cm appears periodically (with a period of 5 s) in the center of the screen; the subject had to remember this rhythm. Then a video clip was shown of the operator's hand reproducing the five-second rhythm by pressing the «Space» key with the middle and index fingers of the leading hand. After that, a video clip with an image of the operator's motionless hand was shown. In the second and third series, the subject reproduced the five-second rhythm by pressing buttons with the left or right hand, depending on the instructions. The subject was then shown a video clip with an image of the fixation stimulus (a white cross on a dark background in the center of the screen), on which the eyes should be directed during rest. The fMRI results were obtained in the complex of NBICS technologies of the Kurchatov Institute using a SIEMENS Magnetom Verio 3 T scanner. All fMRI data were pre-processed using the SPM8 package. Within each of the paradigms,
pairwise comparisons were made on the basis of Student's statistics, and individual and group maps with a significance level of p < 0.001 were obtained. All the resulting statistical maps were superimposed on a T1 template image, and the «active» voxels were anatomically assigned using the CONN atlas.
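The EEG spectral-power comparison described in this section can be sketched as follows. This is an illustrative sketch only: the sampling rate, the segment data, and the group size are hypothetical placeholders rather than values from the study, and SciPy routines stand in for the MatLab processing actually used.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import wilcoxon

FS = 256  # assumed sampling rate in Hz (not reported in the paper)

def band_power(segment, freq_hz, fs=FS):
    """Spectral power of a short EEG segment at the bin nearest freq_hz."""
    f, pxx = welch(segment, fs=fs, nperseg=len(segment))
    return pxx[np.argmin(np.abs(f - freq_hz))]

rng = np.random.default_rng(0)
n_subjects = 31               # e.g. the male group
seg_len = int(1.5 * FS)       # 1.5-s artifact-free segments, as in the text

# Hypothetical per-subject segments for "Background" and "action Execution".
background = rng.standard_normal((n_subjects, seg_len))
execution = rng.standard_normal((n_subjects, seg_len))

p_bg = np.array([band_power(s, 9.0) for s in background])
p_ex = np.array([band_power(s, 9.0) for s in execution])

# Paired Wilcoxon test for related samples, as used in the paper.
stat, p_value = wilcoxon(p_bg, p_ex)
print(f"median power change at 9 Hz: {np.median(p_ex - p_bg):+.4f}, p = {p_value:.3f}")
```

With real data, one such comparison would be run per lead and per 1-Hz frequency bin of the mu-rhythm range.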
3 Results and Discussion

The conducted research allowed us to detect statistically significant changes in the spectral characteristics of the EEG mu-rhythm in the central regions of the cortex at different stages of the performed activity in men and women. It turned out that the nature of these changes depends on the frequency of this rhythm, gender, the lateral organization of the brain, and the type and stage of the activity performed. In particular, in men in the series «monitoring the reproduction of rhythm», at the «action Execution» stage there was, in comparison with the background, a statistically significant decrease in the spectral power of the EEG in lead C3: by 10.1% at a frequency of 9 Hz (p < 0.005) and by 9.4% at a frequency of 10 Hz (p < 0.05). At the same time, at the «Preparation» stage there were no statistically significant changes in the spectral power of the EEG at mu-rhythm frequencies in comparison with the background. The dependence of changes in the spectral characteristics of the mu-rhythm on its frequency and on the type and stage of the performed activity was also found in women. In particular, when they reproduced the five-second rhythm with the right hand, at the «action Execution» stage there was, in comparison with the background, a statistically significant increase in the spectral power of the EEG in lead C4 at a frequency of 8 Hz (p < 0.01) and a decrease of 10.8% in the same lead at a frequency of 9 Hz (p < 0.02). Probably, the decrease in the spectral power of the mu-rhythm detected at certain frequencies at the «Preparation» and «action Execution» stages reflects the activation of «motor» mirror neurons [3]. Analysis of cortical interactions at the mu-rhythm frequency between the central and other cortical regions during the observation and reproduction of the five-second rhythm allowed us to detect, in men and women, a dependence of these interactions on the type and stage of the performed activity.
The dependence of cortical interactions on the type and stage of the performed activity in men, when observing and reproducing the five-second rhythm, is illustrated in Fig. 1. Comparison of the series with rhythm reproduction by the right and left hands allowed us to detect interhemispheric differences in the levels of cortical connections. In particular, when the rhythm was reproduced with the left hand, a statistically significant strengthening of right-hemisphere connections was observed (Fig. 1B), whereas when the rhythm was reproduced with the right hand, left-hemisphere connections were strengthened (Fig. 1C). A clear dependence of cortical interactions on the type and stage of the performed activity was also found in women. In particular, when reproducing the rhythm with
Fig. 1. Dependence of cortical interactions at the mu-rhythm frequency on the type and stage of the performed activity in men, in the series «monitoring of rhythm reproduction» (A), «left-hand rhythm reproduction» (B) and «right-hand rhythm reproduction» (C). Note: correlation coefficient values (rel. units) are plotted on the ordinate axis; only statistically significant (p < 0.05) differences are shown
the left hand at the «Preparation» stage, they showed a statistically significant (p = 0.01–0.02) increase in the levels of cortical connections between the central and temporal and between the central and occipital EEG leads compared to the background. Thus, if in the background the correlation coefficient between leads C3 and T5 is r = 0.634, at the «Preparation» stage it increases statistically significantly (p = 0.02) to r = 0.649; if in the background the correlation coefficient between leads C3 and O1 is r = 0.598, at the «Preparation» stage it increases statistically significantly (p = 0.01) to r = 0.617. Along with this, there are clear gender differences in the dynamics of cortical connections during the observation and reproduction of the rhythm. Thus, when observing the reproduction of the rhythm at the «action Execution» stage, men showed a statistically significant (p < 0.05) increase, in comparison with the background, in cortical connections between the central and parietal (Cz and P4) and between the central and temporal (Cz and T6, C4 and T6) leads; no such changes were detected in women. The comparison of the fMRI brain mapping results obtained while monitoring the operator's hand reproducing the five-second rhythm with the resting state (viewing the video image of the operator's motionless hand) made it possible to detect, in men and women, activation not only in the regions of «motor» mirror neurons [3], but also in other regions of the cortex, in particular the right and left supramarginal gyrus, right and left angular gyrus, right and left lateral occipital cortex, right planum temporale, and right and left frontal pole, as well as the basal ganglia and some cerebellar areas. Gender differences were manifested, in particular, in the fact that in men the angular gyrus is more activated on the left, while in women it is more activated on the right. At the same time, the area of the right and left middle temporal gyrus is more active in women. Probably,
these gender differences are related to the peculiarities of the lateral organization of the brain in men and women. Figure 2 shows group statistical maps of pairwise comparisons between the conditions of presenting a rhythmic stimulus (stimuli), observing the operator reproducing the five-second rhythm (reprod), and observing the operator's motionless hand (rest1). All these maps were superimposed on a high-resolution T1 template image.
Fig. 2. Comparison of group statistical maps of the brains of men obtained when presenting a stimulus (stimuli), a light square that appears with a period of 5 s; when observing the operator's hand reproducing the five-second rhythm by pressing the «Space» key (reprod); and at rest (rest1), when a video clip showing the operator's motionless hand is shown. A) reprod-rest1, B) stimuli-reprod, C) stimuli-rest1. Note: the comparison of the stimuli and rest1 conditions did not reveal statistically significant differences.
The comparison of the fMRI results during the reproduction of the five-second rhythm with the resting state showed that in men and women the same cortical regions are partially activated as when observing the reproduction of the rhythm, together with the precuneus, the occipital pole and some other cortical regions. However, when reproducing the rhythm, the activity of these brain regions is much more pronounced: in men, when reproducing the rhythm, the numbers of activated voxels in the right and left lateral occipital cortex are 878 and 1007 respectively, whereas when observing the reproduction of the rhythm they are 28 and 60 voxels respectively. Group statistical maps superimposed on a high-resolution T1 template image, obtained in men when comparing the conditions of reproducing the five-second rhythm with the right (right) or left (left) hand and the resting state (rest2) with the gaze fixed on a white cross in the center of the screen, are shown in Fig. 3. Gender differences were manifested in the fact that the brain regions involved in the proposed activity are activated more strongly in women than in men. Thus, if the
Fig. 3. Comparison of group statistical maps of the brains of men obtained when reproducing the five-second rhythm with the right (right) or left (left) hand and at rest (rest2) with the gaze fixed on the white cross in the center of the screen. A) left-rest2, B) right-left, C) right-rest2
number of activated voxels in the area of the right precentral gyrus is 179 in men, then in women it is 509. Thus, the studies showed that the observation and reproduction of the rhythm are accompanied by depression of the mu-rhythm at certain frequencies and, most often, by an increase in cortical connections at the mu-rhythm frequency between the central and other cortical areas. It turned out that the nature of these changes depends on the frequency of the mu-rhythm, the type and stage of the activity performed, gender, and the lateral organization of the brain. It was found that the observation of rhythm reproduction is accompanied by activation of the right and left lateral occipital cortex, right fusiform gyrus, right and left middle temporal gyrus, right and left precentral gyrus, right and left planum temporale, right and left supramarginal gyrus, right and left angular gyrus, and right and left frontal pole, as well as the basal ganglia and some areas of the cerebellum. These structures probably form a functional system that provides an understanding of actions and intentions. It is important to note that in addition to the temporal regions of the brain and the prefrontal cortex, where the «motor» mirror neurons are located [3], this system includes other areas of the cortex, as well as the cerebellum and basal ganglia, which are considered the storage place for motor programs. These results and some literature data [4] suggest that mirror neurons by themselves do not provide an understanding of actions and intentions, although they are involved in these processes. Analysis of cortical connections at the mu-rhythm frequency suggests that these neurons provide interaction between the sensory, motor, and prefrontal zones of the cortex, as well as the places where motor programs are stored in the brain. The result of the interaction of these structures is probably an understanding of the actions and intentions of other people.
4 Conclusion

Thus, the research shows that mirror neurons by themselves do not provide interpretation of actions and intentions, although they are involved in these processes. The results suggest that these neurons provide interaction between the prefrontal, sensory, and motor cortical regions, as well as the storage locations of motor programs in the brain. The result of the interaction of these structures, apparently, is an understanding of the actions and intentions of other people. The study was partially funded by RFBR according to research project no. 18-001-00001.
References

1. Skoyles, J.R.: Gesture language origins and right handedness. Psychology 11, 24–29 (2000)
2. Rizzolatti, G., Sinigaglia, C., Anderson, F.: Mirrors in the Brain: How Our Minds Share Actions, Emotions, and Experience, 1st edn., p. 242. Oxford University Press (2008)
3. Kosonogov, V.: Why the mirror neurons cannot support action understanding. Neurophysiology 44(6), 499–502 (2012)
4. Schippers, M.B., Roebroeck, A., Renken, R., Nanetti, L., Keysers, C.: Mapping the information flow from one brain to another during gestural communication. Proc. Natl. Acad. Sci. 107(20), 9388–9393 (2010)
Revealing Differences in Resting States Through Phase Synchronization Analysis. Eyes Open, Eyes Closed in Lighting and Darkness Condition

Irina Knyazeva1,4, Yulia Boytsova3, Sergey Danko3, and Nikolay Makarenko2,4

1 Saint-Petersburg State University, Saint-Petersburg, Russia
[email protected]
2 Pulkovo Observatory, Saint-Petersburg, Russia
3 Institute of the Human Brain, Russian Academy of Sciences, Saint-Petersburg, Russia
4 Institute of Information and Computational Technologies, Almaty, Kazakhstan
Abstract. Modern neuroimaging studies have shown that there are significant differences in the active brain zones during resting states with open and closed eyes, in both light and dark circumstances. Our previous EEG studies also showed that these states differ in the power spectrum and in between-channel coherence, both in light and in dark conditions. This work continues our previous studies. Here we enlarged the group of subjects and focused on estimating changes in the spatial between-channel coherence. We explored the following conditions: the resting state with closed eyes in the dark, the resting state with open eyes in the dark, and the resting state with open eyes in the light with the eyes focused on a screen. To characterize the distinctions between each pair of states, we used individual differences in phase synchronization. We calculated them for each pair of channels for a set of seven frequencies corresponding to the main rhythms. The effect size and its level of uncertainty were estimated using the Bayesian approach.

Keywords: EEG resting state · EEG synchronization · Bayesian methods · Brain default system
1 Introduction
Since the first EEG studies, it has been known that the transition from a resting state with closed eyes (EC) to a resting state with open eyes (EO) is associated with changes in the EEG patterns. These changes are usually explained by a reorganization of brain activity in response to visual information. The described observations are used in clinical neurophysiology as functional tests for brainstem dysfunction.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 44–50, 2021. https://doi.org/10.1007/978-3-030-60577-3_4
Mental and Sensory Attention
Modern neuroimaging studies have shown that there are significant differences in the activity of brain zones between resting states with open and closed eyes in daylight conditions (for example, [12,13]). It was also shown that resting states with EO and EC differ in complete darkness [9], and these differences are not limited to the activation of the visual zones of the cortex. Since visual sensory input is excluded in darkness, the observed changes came to be interpreted as a consequence of different modes of the brain: an exteroceptive mode with open eyes, when attention is focused more on external perception, and an interoceptive mode with closed eyes, when attention is mainly focused on internal mental processes. EEG studies have shown that the differences between the states with open and closed eyes in daylight conditions are characterized by extensive rearrangements of brain activity that affect all frequency ranges, not just the alpha range [4]. Studies conducted on people with open and closed eyes in complete darkness have also pointed to certain distinctions between the EEG parameters [1]. The present work continues our previous studies by increasing the group of subjects and by exploring the following conditions: the resting state with EC in the dark, the resting state with EO in the dark, and the resting state with EO in daylight with the eyes focused on a screen. To characterize the distinctions between each pair of states, we used individual differences in phase synchronization. We calculated them for each pair of channels for a set of seven frequencies corresponding to the main rhythms. We used the Bayesian approach to quantify distinctions between the states mentioned above. Within the framework of this approach, it is possible to estimate the magnitude of the difference in synchronization along with a characterization of the uncertainty of this quantity.
Since studies of resting states are strongly linked with studies of the default system of the brain, in this work we also compared our results with the latter studies.
2 Methods and Materials

2.1 Experiment Design
The study involved 53 healthy right-handed subjects (23 male and 30 female), aged 18 to 30 years. In accordance with the ethical standards of the 1964 Helsinki Declaration, all subjects signed a voluntary consent to participate in the study. The study was approved by the decision of the Ethics Commission of the Institute of the Human Brain of the Russian Academy of Sciences. Subjects sat in a comfortable chair at rest, without a task. EEG was recorded for 3 min in each of the three states: eyes focused on a black dot in the middle of a white computer screen in daylight conditions; EC in complete darkness; EO in complete darkness.

2.2 EEG Data Preprocessing
Nineteen EEG channels were recorded with standard 10–20 electrode placement on the scalp using a Mitsar-202 computer electroencephalograph. The sampling
frequency of the data was 500 Hz; we additionally applied a band-pass filter with a passband of 0–70 Hz. We used Independent Component Analysis (ICA), as implemented in the WinEEG software package, to remove eye-blink and movement artifacts from the EEG records. The cutoff frequencies of the high-pass and low-pass filters used when removing eye-movement and cardiographic artifacts were 0.53 Hz and 50 Hz, respectively.

2.3 Phase-Based Connectivity Analysis
In order to evaluate interchannel interactions, we used phase-based connectivity analysis based on the distribution of phase angle differences between two electrodes. It is assumed that phase synchronization indicates consistent behavior of neural populations [3]. In the literature, there are several terms and methods describing phase-based connectivity or synchronization. In this paper, we used two characteristics: intersite phase clustering (ISPC) and the phase lag index (PLI). ISPC is defined as the clustering in polar space of the phase angle differences between electrodes. The PLI [3] measures the proportion of phase angle differences pointing toward the positive or negative side of an imaginary axis on the complex plane. The main advantage of the PLI is its robustness to spurious connectivity caused by volume conduction, whereas ISPC is generally more sensitive and able to detect the weakest connectivity. Both measures are non-negative, symmetric, and undirected. The phase of a time series can be obtained from the complex representation of the signal using the wavelet transform or the Hilbert transform. With a wavelet transform, for example using the Morlet wavelet, we need to choose the central frequency of the wavelet; since the wavelet transform is a convolution of the signal with the wavelet, the signal is automatically filtered around this central frequency. Before using the Hilbert transform, this filtering must be done beforehand. In this paper, we utilized the Morlet wavelet transform.

2.4 Data Analysis Pipeline
The data analysis pipeline for each individual is presented below (see Fig. 1). After basic preprocessing of the EEG data we computed the ISPC and PLI connectivity between each pair of channels for every resting-state condition (3 × 2 × 171 values in total). The complex signal was obtained using the Morlet wavelet transform. This was done for a grid of 7 frequencies, 3, 6, 9, 12, 16, 25, and 35 Hz, corresponding to the main EEG rhythms δ, θ, α1, α2, β1, β2 and γ, respectively. Next, we obtained matrices of differences in connectivity for each pair of states and each frequency. In this case, we have three contrast matrices: EO relative to EC in the dark, EO in the light relative to EC in the dark, and EO in the light relative to EO in the dark. Thus, differences were obtained both for ISPC and PLI, for each channel pair, for each individual, and for each frequency. The total number of estimated parameters is 2 × 171 × 7 × 3. Conducting any tests in a frequentist framework is complicated by the problem of multiple comparisons. So, to assess the magnitude of the dissimilarities between
Mental and Sensory Attention
47
Fig. 1. Data analysis pipeline for one subject.
resting states in different conditions at the group level, we used the Bayesian approach. Within this approach it is possible to avoid the problem of multiple comparisons and to obtain not only point estimates of the parameters of interest, but also a distribution with information about the uncertainty in parameter determination. In the Bayesian framework a posterior distribution of the parameters can be simulated based on information about their prior distribution and the data generating model. In our case, the following scheme was used for modelling the dissimilarities in phase synchronization: we took a normal prior distribution with zero mean and unit variance for the differences, and a normal distribution for the data generating model. As the output of the model, we obtain a posterior distribution for the differences, which shows the average difference in synchronization and the Bayesian confidence interval, that is, the interval in which this difference lies with 95% probability; in the Bayesian framework this interval is also called the highest posterior density (HPD) interval. Such estimates were obtained for all combinations of connectivity contrasts and frequencies. For further analysis we selected only those for which this interval does not contain zero, since zero corresponds to the absence of a synchronization difference. The analysis and discussion of the results are presented in the following section. We applied the algorithms described in the book [3] and in our previous work [8] to calculate the ISPC and PLI values. For Bayesian modelling we used the pymc3 package. The code was written in Python and is available in a GitHub repository with application examples.
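The ISPC and PLI computations reduce to a few lines of NumPy. The sketch below is an illustrative reimplementation following the definitions in [3], not the exact repository code; the wavelet support and the number of cycles are assumptions:

```python
import numpy as np

def morlet_phase(signal, fs, f0, n_cycles=7):
    # complex Morlet wavelet convolution: band-pass filters the signal at the
    # central frequency f0 and returns the instantaneous phase
    t = np.arange(-2, 2, 1 / fs)
    sigma = n_cycles / (2 * np.pi * f0)
    wavelet = np.exp(2j * np.pi * f0 * t) * np.exp(-t**2 / (2 * sigma**2))
    return np.angle(np.convolve(signal, wavelet, mode="same"))

def ispc(phase1, phase2):
    # intersite phase clustering: length of the mean vector of
    # phase-angle differences in polar space
    return np.abs(np.mean(np.exp(1j * (phase1 - phase2))))

def pli(phase1, phase2):
    # phase lag index: asymmetry of phase-angle differences around zero
    # (robust to volume conduction)
    return np.abs(np.mean(np.sign(np.sin(phase1 - phase2))))
```

For two channels with a stable non-zero phase lag, both measures approach 1; both are symmetric in their arguments and undirected, as noted above.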
3 Results and Discussion
First of all, it should be noted that the ISPC and PLI characteristics for assessing the spatial coherence of the EEG reveal dissimilarities between the compared states in different zones of the cortex, which overlap only partially. This is connected with the above-mentioned differences in their calculation (see the Methods section).
48
I. Knyazeva et al.
EO in Lighting Versus EC in Darkness. The ISPC and PLI results for this contrast are consistent with our previous results [4]. This difference is characterized by a coherence decrease in the anterior regions and a coherence increase in the posterior regions, as well as between the anterior and posterior regions, in all EEG frequency bands (Fig. 2). In the α1, α2, and β1 bands a decrease is observed throughout the whole cortex, together with a coherence increase in the posterior regions, as well as between the anterior and posterior regions (Fig. 2). In this case, the effects from the two other contrasts are superimposed (Figs. 3, 4), which confirms the earlier assumption [4] that here we have both effects: reorienting attention from the interoceptive eyes-closed state to the exteroceptive eyes-open state, as well as the effect of receiving visual information.
Fig. 2. Changes in the spatial coherence (measured with ISPC and PLI) of the EEG in EO state in lighting relative to the EC state in darkness
EO Versus EC in Complete Darkness. The results are shown in Fig. 3. In this case we obtained fewer significant differences than in the previous work [1], but the direction of the differences coincides with the previous findings. This state is characterized by a coherence decrease between the frontal regions in all frequency bands (Fig. 3) and a decrease in coherence between the right anterior temporal region and other cortical regions in the α1, α2, β1 and γ bands. Some increases in coherence are noticed in the posterior temporal and parietal regions, as well as between the frontal and the parietal and occipital regions. These changes are not associated with the perception of visual information and could be considered a reflection of the process of reorienting attention from the interoceptive eyes-closed state to the exteroceptive eyes-open state.

EO in Lighting Versus EO in Darkness. This state (see Fig. 4) is also characterized by a decrease of coherence between frontal regions in all frequency bands, but primarily in the θ, α1, α2, and β1 bands. Also, the EO state in lighting is characterized by stronger connections in all frequency bands between the frontal and parietal regions on the one hand, and the parietal and occipital regions on the other. In this comparison, the EO states differ only in the level of illumination; therefore, the revealed dissimilarities can be the result of visual information entering the cortex.
Fig. 3. Changes in the spatial coherence of the EEG in EO state relative to EC state in complete darkness.
Fig. 4. Changes in the spatial coherence of the EEG in the EO state in lighting relative to EO state in darkness.
4 Conclusion and Discussion
Since the discovery made in [7,10], studies of the default mode network (DMN) have been associated with studies of resting states [10]. It was initially believed [5,6] that the activity of the DMN does not change between the EO and EC resting states, since these states practically do not differ at the behavioral level. However, most studies agreed that the DMN is more associated with internal attention, and the state of rest with EC is also more characterized by an internal focus of attention compared with the EO state. Accordingly, later neuroimaging studies showed that the transition from the EC state to the EO state is characterized by a decrease in DMN activity [2,11]. Despite the large number of studies in this field, such a chain of states, along with the methods applied here, is studied for the first time. Comparison of our data with the literature is largely difficult, because the DMN has mostly been studied using neuroimaging methods. But several works can be mentioned. The DMN activity is associated with an increase in interhemispheric α-coherence in the frontal and parietal regions [11] or with increased α- and θ-coherence in the anterior regions [2]. These results are consistent with our findings in the chain of states (EC in darkness - EO in darkness - EO in lighting) and indicate a decrease of DMN activity. Thus, the DMN activity can indeed differ between resting states with EO and EC. This may be related to the focus of attention and the flow of visual information.
Acknowledgments. We gratefully acknowledge financial support of Institute of Information and Computational Technologies (Grant AR05134227, Kazakhstan) and RFBR, project number 19-07-00337.
References

1. Boytsova, Y.A., Danko, S.G.: EEG differences between resting states with eyes open and closed in darkness. Hum. Physiol. 36(3), 367–369 (2010). https://doi.org/10.1134/S0362119710030199
2. Chen, A.C.: EEG default mode network in the human brain: spectral field power, coherence topology, and current source imaging. In: 2007 Joint Meeting of the 6th International Symposium on Noninvasive Functional Source Imaging of the Brain and Heart and the International Conference on Functional Biomedical Imaging, pp. 215–218. IEEE (2007)
3. Cohen, M.: Analyzing Neural Time Series Data. MIT Press, Cambridge (2014)
4. Danko, S.: The reflection of different aspects of brain activation in the electroencephalogram: quantitative electroencephalography of the states of rest with the eyes open and closed. Hum. Physiol. 32(4), 377–388 (2006). https://doi.org/10.1134/S0362119706040013
5. Fox, M.D., Snyder, A.Z., Vincent, J.L., Corbetta, M., Van Essen, D.C., Raichle, M.E.: The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc. Natl. Acad. Sci. 102(27), 9673–9678 (2005)
6. Fransson, P.: Spontaneous low-frequency BOLD signal fluctuations: an fMRI investigation of the resting-state default mode of brain function hypothesis. Hum. Brain Mapp. 26(1), 15–29 (2005)
7. Gusnard, D.A., Raichle, M.E.: Searching for a baseline: functional imaging and the resting human brain. Nat. Rev. Neurosci. 2(10), 685–694 (2001)
8. Knyazeva, I., Boytsova, Y., Danko, S., Makarenko, N.: Spatial and temporal dynamics of EEG parameters during performance of tasks with dominance of mental and sensory attention. In: International Conference on Neuroinformatics, pp. 321–327. Springer, Cham (2018)
9. Marx, E., Deutschländer, A., Stephan, T., Dieterich, M., Wiesmann, M., Brandt, T.: Eyes open and eyes closed as rest conditions: impact on brain activation patterns. Neuroimage 21(4), 1818–1824 (2004)
10. Raichle, M.E.: The brain's default mode network. Annu. Rev. Neurosci. 38, 433–447 (2015)
11. Travis, F., et al.: A self-referential default brain state: patterns of coherence, power, and eLORETA sources during eyes-closed rest and transcendental meditation practice. Cogn. Process. 11(1), 21–30 (2010)
12. Wei, J., Chen, T., Li, C., Liu, G., Qiu, J., Wei, D.: Eyes-open and eyes-closed resting states with opposite brain activity in sensorimotor and occipital regions: multidimensional evidences from machine learning perspective. Front. Hum. Neurosci. 12, 422 (2018)
13. Xu, P., et al.: Different topological organization of human brain functional networks with eyes open versus eyes closed. Neuroimage 90, 246–255 (2014)
Assessment of Cortical Travelling Waves Parameters Using Radially Symmetric Solutions to Neural Field Equations with Microstructure

Evgenii Burlakov1,2(B), Vitaly Verkhlyutov3, Ivan Malkov1, and Vadim Ushakov4,5,6

1 University of Tyumen, 6 Volodarskogo St., 625003 Tyumen, Russian Federation
eb @bk.ru
2 V.A. Trapeznikov Institute of Control Sciences of RAS, 65 Profsoyuznaya St., 117997 Moscow, Russian Federation
3 Institute of Higher Nervous Activity and Neurophysiology of RAS, 5A Butlerova St., 117485 Moscow, Russian Federation
4 National Research Center Kurchatov Institute, 1 Akademika Kurchatova pl., 123182 Moscow, Russian Federation
5 National Research Nuclear University MEPhI, 31 Kashirskoe hwy, 115409 Moscow, Russian Federation
6 Institute for Advanced Brain Studies, Lomonosov Moscow State University, GSP-1, Leninskie Gory, 119991 Moscow, Russian Federation

Abstract. We model cortical travelling waves by radially symmetric solutions to planar neural field equations with periodic spatial heterogeneity, which capture e.g. the microstructure observed in the primary visual cortex. We investigate the so-called bump-solutions that correspond to the generation of local excitation in the brain tissue and to an early stage of radially symmetric spread of cortical waves. We study how the neural medium heterogeneity and other biophysical parameters, such as the neuron activation threshold and the lengths and the strengths of neuronal connections, affect cortical waves mathematically represented in terms of radially symmetric solutions to neural field equations with microstructure.

Keywords: Neural field equations · Cortical travelling waves · Radially symmetric solutions · Periodic spatial heterogeneity · BCI systems · Artificial intelligence

The reported study was funded by RFBR and FRLC, project number 20-511-23001, by RFBR, project number 20-015-00475, and in part supported by OFIm, project number 17-29-02518. The results in Sect. 2 were obtained with the support of the Russian Science Foundation (grant no. 20-11-20131) in V.A. Trapeznikov Institute of Control Sciences of RAS.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 51–57, 2021. https://doi.org/10.1007/978-3-030-60577-3_5
1 Introduction
Brain travelling waves are registered in many animals, from snails to higher mammals and humans, by electrical methods (see e.g. [1,2]). However, the complexity of registration and the ambiguity of interpretation have so far kept neurotechnology developers from focusing their attention on this phenomenon. In the analysis of integrated electrical signals from a large number of neurons (EEG, MEG, ECoG, MUA), trained artificial neural networks are often used. In some cases this brings success, but the approach does not allow one to understand how the useful signal is extracted from these data [3], which reduces the effectiveness of artificial neural networks and impedes the development of advanced BCI and AI systems. From experimental observations, we can conclude that the recorded integrated signals of EEG, MEG, and even ECoG above the noise level are mainly conditioned by cortical travelling waves. It has been theoretically shown that "point" stimulation of the cortical layer can generate travelling waves of various configurations, from locally pulsating bumps to propagating and spiral waves [4]. However, in direct measurements, only fragments of travelling waves can be observed experimentally [1]. In this case, models of neural circuits, such as neural field equations, and, in particular, their solutions representing the forms of radially propagating excitations, which can be used to interpret not only local field potentials but also EEG and MEG [5], are of significant importance. Here, we investigate the interaction of radially propagating excitation with a functional microstructure in the framework of the neural field theory, which allows us to understand how subtle changes in neural interactions affect the global potentials of the brain. The main object of our study is radially symmetric solutions to two-dimensional neural fields with periodic microstructure.
The following two-dimensional modification of the classical Amari model, which captures the neural field heterogeneity, was suggested in [6]:

∂t u(t, x, xf) = −τ u(t, x, xf) + ∫Ω ∫Y ω(x − y, xf − yf, γ) fβ(u(t, y, yf)) dyf dy,   (1)

t ≥ 0, x ∈ Ω ⊆ R², xf ∈ Y, γ ∈ [0, 1).
Here u(t, x, xf) is the activity of a neuron u at time t and position (x, xf), Ω is a planar neural field, Y is the unit of the neural field microstructure taken here as the two-dimensional unit torus, x and xf are the global-scale and the micro-scale spatial variables, respectively, τ > 0 is a relative time constant. The integration kernel (connectivity function) ω determines the lengths and the strengths of neuronal connections and often has the shape of a Mexican hat function, γ is the neural medium heterogeneity parameter (we refer the reader to [7] for more information on neural fields with microstructure as well as for the derivation of (1)). The probability fβ(u) of firing of a neuron with activity u is defined by the non-negative smooth function fβ. Typically fβ has sigmoidal shape and can be modeled by e.g. the Hill function:
fβ(u) = 0 for u ≤ 0,   fβ(u) = u^β / (θ^β + u^β) for u > 0,   (2)
where θ > 0 is the activation threshold and β > 0 is the steepness parameter. As β → ∞, the Hill function converges to the Heaviside function

H(u) = 0 for u ≤ θ,   H(u) = 1 for u > θ,   (3)

so we put f∞ = H. Other types of sigmoidal functions also admit parameterizations with the same asymptotic behavior. The Heaviside function, considered as the limit case of sigmoidal functions with growing steepness, has often been employed e.g. to obtain analytic solutions of important types, to assess the stability of these solutions, and to perform numerical investigations of the corresponding neural field models, thus significantly facilitating the investigation of neural fields. At the same time, it has been shown (see [8]) that even a simple system consisting of two interconnected excitatory and inhibitory neurons ceases to function after the change of a sigmoidal activation function to the Heaviside function. Therefore, the use of Heaviside-type activation functions has to be rigorously justified. For the general solutions to neural field equations, such justification has been recently suggested in [9]. The main goal of the present research is to investigate the impact of biophysical parameters (such as the neural field microstructure expression level γ, the neuronal excitation rate determined by β, the neuronal activation threshold θ, and the lengths and the strengths of neuronal connections formalized by ω) on the existence of radially symmetric solutions to (1), which model the generation of a cortical wave, and on the shapes of these solutions, which determine the wave profile and allow us to study early stages of formation of cortical waves from the corresponding localized activity spots. In Sect. 2 we provide conditions that guarantee that the substitution of the Heaviside activation function can be used to assess the properties of radially symmetric solutions to the equation (1) of the neural field with periodic microstructure. In Sect.
3 we investigate numerically the effects of variation of the aforementioned biophysical parameters on the properties of cortical waves. Section 4 provides a brief summary of the main results obtained.
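As a small numerical illustration (not code from the paper), the Hill-type activation (2) and its Heaviside limit (3) can be written down directly; the θ^β normalization, under which the step forms at u = θ as β → ∞, is assumed here:

```python
import numpy as np

def hill(u, theta, beta):
    # Hill-type firing probability: 0 for u <= 0,
    # u**beta / (theta**beta + u**beta) for u > 0
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    pos = u > 0
    out[pos] = u[pos]**beta / (theta**beta + u[pos]**beta)
    return out

def heaviside_theta(u, theta):
    # Heaviside activation with threshold theta: 0 for u <= theta, 1 for u > theta
    return (np.asarray(u, dtype=float) > theta).astype(float)
```

Already for β of a few hundred, hill is numerically indistinguishable from heaviside_theta away from u = θ, matching the limit f∞ = H used in the analysis.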
2 Analytical Results
Under the assumptions that the wave spread in the cortex is close to radially symmetric (see e.g. [2,10]) and that the neuron activation is described by the Heaviside function, we obtain the following stationary solution to (1):

U_{A,B}(r) = 2π ∫_0^∞ ω̂(ρ) J0(rρ) [B b J1(bρ) − A a J1(aρ)] dρ,   b > a > 0,   A, B ∈ R,   (4)
where Jk is the Bessel function of the first kind of order k and ω̂ is the Hankel transform of the averaged connectivity kernel ω̄, ω̄(x) = ∫_Y ω(x, xf) dxf (the derivation of (4) from (1) can be found in [11]). From (4) we have U_{0,1} to be a bump-solution to (1) of the radius b, which corresponds to an early stage of the cortical wave spread, and U_{1,1} to be a ring-solution to (1) with the contours r = b and r = a, which represents the cortical wave spread in later stages. Hereinafter we consider the following connectivity kernel

ω(x, y, γ) = (1/ς(y, γ)) φ(x / ς(y, γ)),   (5)

where φ(x) = (1/4π) exp(−|x|) − (1/8π) exp(−|x|/2), ς(y, γ) = 1 + γ cos(2πy1) cos(2πy2), y = (y1, y2), γ ∈ [0, 1). However, we emphasize that the main result of this section, which we formulate below as Proposition 1, is valid for a broad class of connectivity functions standardly used in the neural field theory. Let us introduce the following notation:

Y(ρ) = B b J1(bρ) − A a J1(aρ),
Ya(ρ) = −A (J1(aρ) + (aρ/2)(J0(aρ) − J2(aρ))),
Yb(ρ) = B (J1(bρ) + (bρ/2)(J0(bρ) − J2(bρ))).
Proposition 1. Assume that the conditions

∫_0^∞ |ω̂(r)| r² dr < ∞,   (6)

∫_0^∞ ω̂(r) [J0(ξr)J1(ξr) + (ξr/2)(J0²(ξr) − 2J1²(ξr) − J0(ξr)J2(ξr))] dr ≠ 0,   (7)

∫_0^∞ ω̂(r) J0(· r) [J1(ξr) + (ξr/2)(J0(ξr) − J2(ξr))] dr ≢ 0,   (8)

are satisfied for ξ = a, b, and the system of equations

∫_0^∞ ω̂(ρ)(J0(aρ)Ya(ρ) − j11 J1(aρ)Y(ρ)) dρ = 0,
∫_0^∞ ω̂(ρ)(J0(bρ)Yb(ρ) − j12 J1(bρ)Y(ρ)) dρ = 0,
∫_0^∞ ω̂(ρ)(J0(aρ)Ya(ρ) − j21 J1(aρ)Y(ρ)) dρ = 0,
∫_0^∞ ω̂(ρ)(J0(bρ)Yb(ρ) − j22 J1(bρ)Y(ρ)) dρ = 0,   (9)
j11 = j12 j21 / (j22 − 1) + 1,

with respect to the variables j11, j12, j21, j22 is inconsistent. Then for any radially symmetric solution of the form (4) to the equation (1) with the Heaviside-type activation function (3), there exists an arbitrarily close solution of the form (4) to the equation (1) with a sigmoidal activation function of the form (2).
The proof of this statement is based on the general results of [12] and can be found in [13]. The validity of the conditions of Proposition 1 thus allows us to use the Heaviside activation function in the investigation of radially symmetric solutions to (1).
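For a concrete feel of the connectivity kernel (5), it can be evaluated pointwise; the snippet below is an illustrative sketch, not the authors' code:

```python
import numpy as np

def phi(r):
    # Mexican-hat radial profile from Eq. (5)
    return np.exp(-r) / (4 * np.pi) - np.exp(-r / 2) / (8 * np.pi)

def varsigma(y, gamma):
    # periodic microstructure modulation, y = (y1, y2) on the unit torus
    y1, y2 = y
    return 1.0 + gamma * np.cos(2 * np.pi * y1) * np.cos(2 * np.pi * y2)

def omega(x, y, gamma):
    # heterogeneous connectivity kernel: phi(|x| / varsigma) / varsigma
    s = varsigma(y, gamma)
    return phi(np.linalg.norm(x) / s) / s
```

For γ = 0 the kernel reduces to the homogeneous Mexican hat φ(|x|): excitatory (positive) near the origin and inhibitory (negative) at larger distances.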
3 Numerical Results
Here we present and interpret the results of a numerical investigation of radially symmetric solutions to (1) for β = ∞, i.e. in the case of the Heaviside activation function. Figure 1 illustrates the existence condition for bump-solutions for the following fixed values of the heterogeneity parameter: γ = 0, 0.2, 0.5, 0.9. The intersections between the threshold values θ (horizontal lines) and the graphs determine the corresponding bump radii. In the general case, we have two bump-solutions for each admissible activation threshold value: one broad bump and one narrow bump. We also observe that an increase of the value of the parameter γ, i.e. of the heterogeneity degree, yields an increase of the bump radii of both the broad and the narrow bump-solutions for each fixed value of the threshold θ. The level curves in Fig. 2 reveal that the broad bump radii decrease with γ for large values of the activation threshold θ and increase for small and medium values of θ.
Fig. 1. The curves of the bump-solution existence condition for different values of γ. The figures in the insets a) and b) show the bump-solutions shapes corresponding to the solid and the dashed branches of the curves, respectively. The colored boxes indicate the values of transition between these shapes.
Fig. 2. The level curves of the bump-solution existence condition in different regions of the parameter plane (γ, a). The lines are marked with the values of θ.
Fig. 3. Dependence of the radius of transition between the bump shapes a) and b) (the vertical axis) on the brain medium heterogeneity γ.
Figure 1 also exhibits the following interesting feature. With the increase of the broad bump radii, the bump-solution shape changes from shape a) to shape b), which is consistent with the natural process of cortical wave spread in the brain cortex. Interestingly, the values of the activation thresholds and the radii for which the transitions occur depend on the level of the brain medium heterogeneity. The dependence of the radius of transition between the bump shapes a) and b) on the neural field heterogeneity is presented in Fig. 3.
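For the homogeneous case γ = 0, the bump-existence curve of Fig. 1 can be reproduced numerically from (4) and (5). The sketch assumes the order-zero Hankel transform convention ω̂(k) = ∫_0^∞ φ(r) J0(kr) r dr, under which exp(−sr) transforms to s/(s² + k²)^{3/2}; it is an illustration, not the code behind the figures:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0, j1

def omega_hat(k):
    # Hankel transform of phi(r) = exp(-r)/(4*pi) - exp(-r/2)/(8*pi)
    return (1 / (4 * np.pi)) / (1 + k**2)**1.5 \
        - (1 / (8 * np.pi)) * 0.5 / (0.25 + k**2)**1.5

def U01(r, b):
    # bump solution U_{0,1}(r) from Eq. (4) with A = 0, B = 1
    val, _ = quad(lambda k: omega_hat(k) * j0(r * k) * b * j1(b * k),
                  0, np.inf, limit=400)
    return 2 * np.pi * val

# a bump of radius b exists when the curve b -> U01(b, b) crosses the threshold theta
radii = np.linspace(0.2, 8.0, 40)
curve = np.array([U01(b, b) for b in radii])
```

Intersections of this curve with horizontal lines θ = const then give the admissible bump radii, as in Fig. 1.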
4 Conclusions
Travelling waves and pulsating bumps can exist in a neural medium with lower heterogeneity for higher values of neuronal activation thresholds compared to the cortical waves in neural fields with a higher expression of the microstructure. For small and medium values of the activation threshold, the maximal admissible regions of excitation in the brain cortex grow together with the heterogeneity
level, while at the same time decreasing with an increase of the heterogeneity for high values of neuronal activation thresholds. Finally, the transition from a bump to the initial stage of the corresponding radially symmetric travelling wave formation occurs at earlier stages of the growth of the excited regions (i.e. for smaller values of the bump radius) in a brain medium with lower levels of heterogeneity compared to neural media with higher expression of microstructure.
References

1. Muller, L., Chavane, F., Reynolds, J., Sejnowski, T.: Cortical travelling waves: mechanisms and computational principles. Nat. Rev. Neurosci. 19, 255–268 (2018)
2. Martinet, L.E., Fiddyment, G., Madsen, J.R., Eskandar, E.N., Truccolo, W., Eden, U.T., Cash, S.S., Kramer, M.A.: Human seizures couple across spatial scales through travelling wave dynamics. Nat. Commun. 8, 14896 (2017)
3. Ha, K., Jeon, J.: Motor imagery EEG classification using capsule networks. Sensors 19(13), 2854 (2019)
4. Naoumenko, D., Gong, P.: Complex dynamics of propagating waves in a two-dimensional neural field. Front. Comput. Neurosci. 13, 50 (2019)
5. Verkhlyutov, V., Sharaev, M., Balaev, V., Osadtchi, A., Ushakov, V., Skiteva, L., Velichkovsky, B.: Towards localization of radial traveling waves in the evoked and spontaneous MEG: a solution based on the intra-cortical propagation hypothesis. Procedia Comput. Sci. 145, 617–622 (2018)
6. Burlakov, E., Wyller, J., Ponosov, A.: Two-dimensional Amari neural field model with periodic microstructure: rotationally symmetric bump solutions. Commun. Nonlinear Sci. Numer. Simul. 32, 81–88 (2016)
7. Svanstedt, N., Woukeng, J.L.: Homogenization of a Wilson-Cowan model for neural fields. Nonlinear Anal. Real World Appl. 14(3), 1705–1715 (2013)
8. Burlakov, E.: On inclusions arising in neural field modeling. Differ. Equ. Dyn. Syst. (2018). https://doi.org/10.1007/s12591-018-0443-5
9. Burlakov, E., Zhukovskiy, E., Verkhlyutov, V.: Neural field equations with neuron-dependent Heaviside-type activation function and spatial-dependent delay. Math. Meth. Appl. Sci. (2020). https://doi.org/10.1002/mma.6661
10. Xu, W., Huang, X., Takagaki, K., Wu, J.Y.: Compression and reflection of visually evoked cortical waves. Neuron 55(1), 119–129 (2007)
11. Burlakov, E., Nasonkina, M.: On connection between continuous and discontinuous neural field models with microstructure I. General theory. Tambov Univ. Rep. Ser. Nat. Tech. Sci. 23(121), 17–30 (2018)
12. Burlakov, E., Ponosov, A., Wyller, J.: Stationary solutions of continuous and discontinuous neural field equations. J. Math. Anal. Appl. 444, 47–68 (2016)
13. Radially symmetric solutions to continuous and discontinuous neural field equations. https://sites.google.com/view/vhbtw/main/rsscdnfe
The Solution to the Problem of Classifying High-Dimension fMRI Data Based on the Spark Platform

Alexander Efitorov1(&), Vladimir Shirokii1,2, Vyacheslav Orlov3, Vadim Ushakov2,4, and Sergey Dolenko1

1 D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
[email protected]
2 National Nuclear Research University "MEPhI", Moscow, Russia
3 National Research Center "Kurchatov Institute", Moscow, Russia
4 Institute for Advanced Brain Studies, Lomonosov Moscow State University, Moscow, Russia
Abstract. This paper compares approaches to solving the classification problem based on fMRI data of the original dimension using the big data platform Spark. The original data are 4D fMRI time series with a time resolution (TR) of 0.5 s for one sample recording. Participants had to solve 6 tasks, requiring the activation of various types of thinking, during a 30-min session. The large number of subjects and the short time resolution generated a dataset with more than 86 000 samples, which allowed applying machine learning methods to solve this problem instead of classical statistical maps. The random forest model was used to solve the binary classification problem. The paper analyzes the dependence of model performance on time during the problem-solving sessions. Evidence has been obtained that there is some limited time required for solving problems of the same type, and if more time is spent, this is because the brain does not instantly get involved in the work on the proposed task, but stays in the resting state for some time.

Keywords: fMRI · Big data · Spark · Random forest
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 58–64, 2021. https://doi.org/10.1007/978-3-030-60577-3_6

1 Introduction

fMRI is one of the most popular methods for studying the functional characteristics of the human brain over the past 25 years [1]. The development of tomography technology has allowed not only clinical studies of the structure of the brain, but also the study of functional features, ranging from functional networks of various phases of sleep [2] to more complex tasks of localizing individual words in the human brain [3]. The solution of such grandiose tasks became possible thanks to the development of the functional tomography technique itself (temporal resolution changed from 4 s to 0.5 s [4]) and to the development of data processing methods [5] and computer technology. While in the first versions of generally accepted processing approaches simple statistical tests were applied to the sample of subjects [6], modern studies already use
permutation tests [7] and machine learning models [8]. It should be noted that despite the success of modern research, discussions regarding the reliability of the obtained results still take place [9, 10]: fMRI images are high-dimensional data with a high noise level, undergoing a number of geometric transformations, such as realignment to a standard anatomical brain (e.g. MNI152) [11] and motion correction [12], which can significantly damage the original image. The development of modern computer science and the big data approach [13] will allow us to finally close the question regarding the reliability of the obtained results: if one manages to build a model that stably solves the inverse problem (for example, determining the type of human activity), by studying this model it will be possible to determine the active zones of the brain that correspond to the problem being solved. Note that in recent years, datasets compiled for hundreds of task-based [14] and resting-state [15] subjects have been published, which made it possible to use modern methods of big data analysis and deep learning to process them. Approaches and software for processing fMRI data were formed back in the late 90s, and are based on the MATLAB [16], shell [17] and GUI interfaces [18] popular in the scientific community. Note that in recent years, solutions have appeared in the Python language (nipype [19], nilearn [20]), which is popular in the field of data science. However, all these packages use standalone personal computers, which do not allow efficient processing of large amounts of data. Because of this, until recently, the standard approach was to convert the data using the PCA and ICA methods to low-dimensional data (fewer than 1000 features), and to work with these data using classical machine learning methods [21].
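The classical low-dimensional pipeline described above can be sketched with scikit-learn; the data here are synthetic stand-ins for flattened fMRI volumes (dimensions shrunk for the example):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# synthetic (samples x voxels) matrix with a weak class-dependent signal
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))
y = np.repeat([0, 1], 100)
X[y == 1, :50] += 1.5  # signal confined to a few "voxels"

# reduce dimensionality first, then apply a classical classifier
clf = make_pipeline(PCA(n_components=50), LogisticRegression(max_iter=1000))
clf.fit(X[::2], y[::2])            # even rows: train
acc = clf.score(X[1::2], y[1::2])  # odd rows: test
```

This is exactly the pattern that a big data platform replaces: instead of discarding most of the dimensions up front, the full feature vectors are processed in distributed fashion.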
BROCCOLI [17] realized the concept of parallel data processing based on multiple CPU/GPU computational cores, and C-PAC [22] became one of the popular cloud-based solutions, although technically this package just accumulated the default packages (AFNI, FSL and ANTS) and effectively parallelized all manipulations on virtual machines in the cloud. In this study, we solve the problem of binary classification of human mental states on full-dimension fMRI data. All the processing procedures were performed using the big data analysis platform Spark; the random forest model was used as the classifier.
2 Experimental

The fMRI records had the following parameters: TR = 0.5 s, TE = 50 ms, spatial resolution = 1 mm for the T1 image and 2 mm for the T2 image. All images were realigned to the MNI 152 template, motion artifacts were removed based on [23], and then spatial smoothing over 8 mm was performed. In this study, we used data recorded on a total of 31 subjects. There were 6 various types of tasks (15 tasks of each type) for all participants, randomly selected from a database without repetition [24]. A participant had 16 s to solve the problem; after that the screen was turned off, and the answer was not accepted. If a participant pressed the button with the answer earlier, the answer was accepted, and the screen was turned off immediately. After a short pause for rest, a special picture appeared to fix the participant's gaze, and a new task appeared on the screen.
The six types of tasks were as follows:

1. From the four proposed options, choose a puzzle element matching in shape and in the pattern on it (S1);
2. From the four proposed options, choose a puzzle element matching in shape and in the inscription on it (S2);
3. From four proposed words, choose the one logically unsuitable for any of the four squares presented inside (V1);
4. From four proposed words, choose the one logically unsuitable for any of the figures presented inside the four squares (V2);
5. From four proposed drawings, choose the one logically unsuitable for any of the drawings presented inside the four squares (V3);
6. From four proposed words, choose the one logically unsuitable for any of those presented inside one square (V4).

Note that different test subjects took different time to solve the problems. To exclude unmotivated answers, decisions that took less than 5 s were discarded. Samples corresponding to the first and the last seconds of the active state were also excluded because of motion-related artifacts (change of the point of view when the task image appears, and the button press). All the remaining samples corresponding to the task (presumed active state) were marked as Class 1; the samples corresponding to the time between task-solving sessions (rest state) were marked as Class 2. The total dataset consisted of over 86,000 samples, about 43% of which belonged to Class 1. The entire dataset was randomly divided into training and test sets in a ratio of 80% to 20%, respectively. Samples of the test set were not used in the training process, only for model performance evaluation. Based on these data, the sets for training random forest classifier models were formed. To avoid unnecessary problems during model evaluation, features having only zero values over the whole dataset were removed. As a result, the number of features was halved (from 1.08 million to 434 thousand), and these vectors of particular fMRI samples were used for further calculations.
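The preprocessing described above (dropping features that are zero over the whole dataset, then an 80%/20% random split) can be sketched as follows; the array sizes and variable names are illustrative stand-ins, not taken from the authors' actual Spark pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the real data: rows = fMRI samples,
# columns = voxel features, y = labels (1 = task, 2 = rest)
X = rng.random((1000, 500)) * (rng.random(500) > 0.5)  # some columns all zero
y = rng.integers(1, 3, size=1000)

# Remove features having only zero values over the whole dataset
nonzero = np.any(X != 0, axis=0)
X = X[:, nonzero]

# Random 80%/20% train/test split; test samples are held out entirely
order = rng.permutation(len(X))
cut = int(0.8 * len(X))
X_train, y_train = X[order[:cut]], y[order[:cut]]
X_test, y_test = X[order[cut:]], y[order[cut:]]
print(X.shape, X_train.shape[0], X_test.shape[0])
```

Pruning all-zero columns before the split is safe because the decision does not depend on the labels or on the train/test partition.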
3 Results

The accuracy of the best random forest model (20 trees, maximum depth 15) on the described binary classification problem was 97.6% on the training set and 73.6% on the test set. All the results presented below were obtained on the test set. An analysis of the behavior of the model depending on the time required to solve the problem is presented in Fig. 1. It should be noted that the resting state was always labeled correctly, while for some samples marked as active state (Class 1) the subject in fact had no task-related activity. Therefore, some of the samples marked as active state in fact corresponded to the resting state (while all the samples marked as resting state were really resting state). In this case of supervised learning with a supervisor that sometimes lies, some of the false negative errors were in fact not errors, and there were practically no false positives. Having this
The Solution to the Problem of Classifying High-Dimension fMRI Data
Fig. 1. Left: the dependence of classification accuracy on the time spent to solve the problem. Right: the number of fMRI images for different durations of problem solving.
in mind, it is interesting to analyze the “accuracy” of the random forest model with respect to the supervisor’s desired responses and its dependence on time. As Fig. 1 shows, the accuracy of the model does not depend on the number of samples corresponding to the time spent on solving the problem (no obvious correlation exists between the two parts of Fig. 1). The most accurate classification results were observed when the problem-solving time was from 6 to 10 s. For longer times, the accuracy drops, even below 0.5, suggesting that it was mostly in these long trials that the supervisor lied, i.e. when the brain of the subject presumed to be solving the task was in fact in the resting state. This assumption is partly confirmed by Fig. 2. The left part of the figure shows the accuracy of the model depending on the time remaining before the tested subject pressed the answer button. It can be clearly seen that for time moments distant from the button click (if the subject gives an answer only 8 s or more after the current moment), in most cases the model marks the brain state as rest, not active. Also, the second immediately before the button click is never associated with the rest state (most likely, the brain activity is determined by the motion of pressing the button).
Fig. 2. Left: the dependence of classification accuracy on the time remaining before pressing the response button by the tested subject. Right: the dependence of the classification accuracy on the type of problem being solved (rest state – no problem being solved).
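With scikit-learn (one possible implementation; the authors built their model on Spark), a random forest with the hyperparameters quoted above (20 trees, maximum depth 15) would be configured roughly as follows, here on synthetic stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X_train, y_train = rng.random((400, 50)), rng.integers(1, 3, size=400)
X_test, y_test = rng.random((100, 50)), rng.integers(1, 3, size=100)

# Hyperparameters of the best model reported in the text
model = RandomForestClassifier(n_estimators=20, max_depth=15, random_state=0)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train acc: {train_acc:.3f}, test acc: {test_acc:.3f}")
```

Even on unstructured data, a forest this deep memorizes much of the training set, which illustrates why a sizable train/test accuracy gap, like the 97.6% vs. 73.6% reported above, needs to be interpreted with care.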
Thus, if a person spends a long time solving a problem, then most likely the brain did not start solving it during the first several seconds. Only 6–8 s of brain activity are required to solve the problem, and all the time exceeding this limit is actually spent in the rest state. This assumption is also confirmed by the analysis of model accuracy depending on the type of problem (Fig. 2, right): the accuracy of determining the rest state of the brain is much higher than that for the state of solving any of the problems. Note that the fraction of Class 2 samples (definite rest state) is about 57%, which means that the effect is not related to class imbalance in the dataset. The low accuracy of determining the active state (problem solving) can be explained by the fact that the brain does not get involved in solving a problem immediately, so the transition from the resting state to the active state is delayed. Further studies should aim at confirming the offered hypothesis by a detailed analysis of the confusion matrices and their dependence on the time before the button click. Another possible approach is to cluster the samples into 2 or 3 classes and to juxtapose the results of clustering and classification, with the hope that the Class 1 samples that actually describe the rest state at the beginning of the problem-solving time will fall into a separate cluster. The results should also be checked using alternative machine learning algorithms such as convolutional neural networks or the LSTM-based attention mechanism, which allows framing a sequence of input vectors as a single vector.
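The proposed juxtaposition of clustering and classification could be prototyped along the following lines; the minimal k-means and the synthetic two-blob data are our illustrative choices, not the authors' implementation:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: returns a cluster index for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(2)
# Synthetic samples: two blobs standing in for "task" and "rest" fMRI vectors
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(4, 1, (100, 5))])
marked = np.array([1] * 100 + [2] * 100)  # supervisor's (sometimes wrong) labels

clusters = kmeans(X, k=2)
# Cross-tabulate cluster membership against the supervisor's labels:
# off-diagonal mass would point at mislabeled "active" samples
table = np.array([[np.sum((clusters == c) & (marked == m))
                   for m in (1, 2)] for c in (0, 1)])
print(table)
```

In the real setting, Class 1 samples that fall into the "rest" cluster would be the candidates for trials where the supervisor lied.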
4 Conclusion

This paper presents an approach to the study of brain activity based on analyzing the performance of a random forest machine learning model built on full-dimension fMRI data using the Spark big data analytics platform. This approach eliminates the influence of mathematical dimensionality-reduction methods (for example, PCA) on the result. During the study, evidence has been obtained that there is some limited time required for solving problems of the same type, and if more time is spent, this is because the brain does not instantly get involved in the work on the proposed task but stays in the resting state for some time. Further studies can aim at confirming the offered hypothesis by a more detailed analysis of the classification results, by sample clustering, or by alternative machine learning models.

Acknowledgement. This study was approved by the ethics committee of the NRC Kurchatov Institute, ref. no. 5 (from 05.04.2017). This study has been conducted at the expense of the Russian Science Foundation, grant no. 18-1100336.
A Rehabilitation Device for Paralyzed Disabled People Based on an Eye Tracker and fNIRS

Andrey N. Afonin, Rustam G. Asadullayev, Maria A. Sitnikova, and Anatoliy A. Shamrayev

Belgorod National Research University, Belgorod, Russia
[email protected]
Abstract. The article considers a robotic arm rehabilitation device for severely paralyzed and motor disabled people with a control system using a portable eye tracker and functional near-infrared spectroscopy (fNIRS). An eye tracker is used to control the trajectory of the robotic arm's grip, while fNIRS is used to switch between control planes or to open and close a prosthetic hand. Artificial neural networks (ANN) are employed for pattern recognition in the processing of fNIRS signals. User safety is ensured by a vision system and mechanical safety elements. The pilot experiment showed the feasibility of the implemented device. A prototype of the robotic arm rehabilitation device with a spherical coordinate system was implemented to help severely paralyzed people take care of themselves.

Keywords: Severely paralyzed people · Rehabilitation device · Robotic arm · Manipulator · Neurotechnology · Brain-computer interface · Eye tracker · fNIRS
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 65–70, 2021. https://doi.org/10.1007/978-3-030-60577-3_7

1 Introduction

Nowadays there are about 750,000 severely paralyzed and motor disabled people all over the world who face many challenges in performing their everyday duties, in communicating with other people, and in interacting with the environment. The problem of improving the quality of life of severely paralyzed people is extremely urgent from both medical and social perspectives. Since medical aid cannot restore the motor functions of disabled people in most cases, it is possible to improve their quality of life and help them independently perform some activities by applying robotic devices, for example robotic manipulators combined with brain-computer interfaces (BCIs). By utilizing a manipulator, a disabled person can drink, eat, rub his face with a sponge, control various household appliances, call a nurse, etc. without assistance. Some prototypes of such devices have already been invented [1, 2, 3, etc.], for example the service robot My Spoon (Japan). The core components of such rehabilitation devices are a robotic manipulator capable of performing its functions and a BCI that can be easily controlled by severely paralyzed people. However, the feasibility of skillfully performing even simple but important tasks with the help of a robotic device is still at a low level for a disabled person. The main problems, such as the level of accuracy of a manipulator in performing
any movement, as well as an easy-to-understand and easy-to-operate BCI that can be controlled by any disabled person, have not been successfully solved to date. For this reason, robotic devices have not yet been widely used to improve the quality of life of people with disabilities. For severely paralyzed people who cannot speak and move, especially people with locked-in syndrome, it is extremely difficult to invent rehabilitation devices capable of restoring lost functions.

Considerable attention has recently been paid to the invention and development of robotic devices controlled by a BCI to improve the quality of life of severely paralyzed people. For many people with disabilities, BCIs are tools that can support motor recovery and the restoration of mobility, as well as communication with other people. Various BCIs (invasive, non-invasive, partially invasive) that use brain activity directly, without any motor involvement, for the activation of a computer or other external devices are being developed nowadays [4–7]. Non-invasive BCIs based on electroencephalography (EEG) are fast and safe [4, 6], but their main disadvantage is a high percentage of errors, which makes them unacceptable for controlling robotic devices in motion. Non-invasive BCIs based on indirect measurement of neural activation through the analysis of blood oxygen saturation in the brain, such as the blood oxygen level-dependent (BOLD) signal in functional magnetic resonance imaging (fMRI) and the signals measured by functional near-infrared spectroscopy (fNIRS) [7–9], are more accurate than EEG-based BCIs but have a low temporal resolution, which complicates their application with moving robotic devices. fNIRS has some advantages over fMRI: low cost, portability, safety, low noise, and easy applicability.
fNIRS is a noninvasive optical imaging technique that measures changes in hemoglobin (Hb) concentrations within the brain using the characteristic absorption spectra of Hb in the near-infrared range. fNIRS has been successfully applied in BCIs for communication by severely paralyzed people [10].

Most severely paralyzed and motor disabled people retain the ability to move their eyes. The measurement of fixation positions and of rapid eye movements between fixation points (saccades) has been widely studied for various scientific and practical purposes. Therefore, eye trackers are regarded as a promising tool for detecting gaze movements to control robotic manipulators [11]. There are successful examples of using eye trackers to control robots and robotic devices [12, 13]. However, several limitations should be taken into consideration. Human eye movements are accompanied by vibrations with different frequencies [11]. Transmitting unprocessed eye-tracker signals, with their vibrations and jerks, to the robotic manipulator can lead to a significant loss of accuracy in positioning the manipulator grip. Therefore, filtering out spontaneous vibrations of the eyeball must be built into the control system of the manipulator. Some difficulties can also occur while fixing the gaze at a given point. Eye trackers allow controlling eye movements in two coordinates, while the manipulator requires control in three-dimensional space. Additional independent channels are required in the robotic device to start and stop the movement of the manipulator and to close and open the grip. Implementing such channels by eye blinking, for example, is not possible, since blinking inevitably causes a gaze shift, and therefore a shift in the position of the manipulator grip. For this reason, some scientists suggest applying a BCI [13] in addition to the eye tracker.
Most severely paralyzed people are not capable of moving any part of the body except the eyes. Thus, a promising direction is the development of robotic rehabilitation devices for disabled people with a combined control system based on detecting and converting both eye movements and neural activity.
2 Methods

Objectives. This project aims to develop a rehabilitation device (a prosthetic hand) that, on the one hand, measures human cerebral activity and applies neural control directly as an input signal to open and close a prosthetic hand and, on the other hand, uses gaze tracking to switch the control of a robotic limb to smoothly perform rudimentary actions.

Procedure. The rehabilitation device includes two video cameras mounted perpendicular to the display of the PC; an eye tracker fixed stationary so that it can detect the gaze of a person with disabilities; and an fNIRS device with sensors mounted over the motor cortex. The main principle of the rehabilitation device is that motor imagery tasks detected by fNIRS are used to start and stop the robotic manipulator, to switch the eye-tracker control between the planes (YOZ and XOZ) along which the gripper moves, and to close the gripper. The hemodynamic patterns of mentally represented movements (motor imagery tasks) of the upper and lower extremities are used in the experiment.

The rehabilitation device operates as follows: a severely paralyzed or motor disabled person (the User of the device) sets the grip of the manipulator in the XOZ plane using his or her gaze, tracking its position on the screen where the image from the corresponding camera is displayed. After the position of the manipulator grip is set along the XOZ plane, the User mentally presses, with the left hand for example, the marked point on the screen where the manipulator grip is located (so as not to lose its position by moving the gaze). This press is the key signal for switching the control system to the YOZ plane, with simultaneous switching to the other camera. The next step is to set the manipulator grip along the YOZ plane. After the manipulator is finally set in the desired position, the User sends a command to open/close the grip by mentally pressing the marked point of the grip on the screen with the right hand.
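The control sequence just described reduces to a small state machine; the state and event names below are our illustrative encoding, not the authors' software:

```python
# States of the combined eye-tracker/fNIRS control loop, as described in the
# text: gaze positions the grip within the current plane; a left-hand motor
# imagery "press" switches from the XOZ to the YOZ plane; a right-hand
# "press" opens/closes the grip.
TRANSITIONS = {
    ("XOZ", "imagery_left"): "YOZ",    # switch control plane + camera
    ("YOZ", "imagery_right"): "GRIP",  # open/close the grip
    ("GRIP", "done"): "IDLE",
}

def step(state, event):
    """Advance the controller; gaze events keep the current state."""
    if event == "gaze":
        return state  # gaze only moves the grip within the current plane
    return TRANSITIONS.get((state, event), state)

state = "XOZ"
for ev in ["gaze", "gaze", "imagery_left", "gaze", "imagery_right"]:
    state = step(state, ev)
print(state)  # → GRIP
```

A gaze event never changes the controller state, only the grip position within the current plane, which is exactly the property that makes blink-free mode switching via motor imagery attractive.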
Participants of the pilot experiment gave written informed consent. The study was approved by the Ethics Committee of the General and Clinical Psychology department of Belgorod National Research Institute (Belgorod, Russia), meeting no. 10 held on 06.05.2020.

Data Acquisition. In the described device, the portable Tobii Eye Tracker 4C was used together with custom software developed to track changes in the gaze coordinates on the plane. Neuroimaging was performed using the mobile near-infrared spectroscopy device NIRSport Model 88 (NIRx Medical Technologies, LLC, NY, USA). Eight sources and eight detectors were mounted at a distance of 3 cm from each other in an fNIRS cap according to the 10/20 system, covering the primary motor cortex. NIRStim and NIRStar software were used to present tasks to the subject according to the paradigm and to register fNIRS signals, respectively. The research paradigm was a block design with motor imagery tasks and motor execution tasks of hand gripping. The
inter-block interval was 4 s to ensure the relaxation of neurons in the motor cortex of the brain.
3 Results

A pilot experiment was carried out to verify the effectiveness of the rehabilitation device and the joint operation of the eye tracker and fNIRS. During the experiment, the subject's task was to fix his gaze at a point in the lower left corner of the screen to set the manipulator to the zero position. Then he moved his gaze to the upper left corner of the screen to move the manipulator in the XOZ plane. To switch the manipulator between control planes, the subject imagined pressing a button at a fixed point on the screen with his left hand. Then he moved his gaze to the upper right corner of the screen to move the manipulator in the YOZ plane. To open the manipulator grip, the subject mentally pressed a button at a fixed point on the screen with his right hand. Then he moved his gaze to the lower right corner to retract the manipulator. The fNIRS data obtained during the experiment are presented in Fig. 1. The figure clearly shows that the motor imagery tasks used as input signals at the moments when a button was mentally pressed by the subject caused a noticeable increase in the concentration of oxyhemoglobin in the motor cortex of the brain, which can easily be extracted and processed using artificial neural networks, for example [14].
Fig. 1. Change in the concentration of oxyhemoglobin in the motor cortex of the brain during the experiment.
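A minimal way to flag oxyhemoglobin increases of the kind shown in Fig. 1 is baseline-relative thresholding, sketched below on synthetic data (a real pipeline would use the ANN classifier mentioned in the text, e.g. [14]):

```python
import numpy as np

def detect_activation(hbo, baseline_len=20, z_thresh=3.0):
    """Flag samples where oxy-Hb rises z_thresh SDs above the initial baseline."""
    base = hbo[:baseline_len]
    z = (hbo - base.mean()) / (base.std() + 1e-12)
    return z > z_thresh

rng = np.random.default_rng(3)
hbo = rng.normal(0.0, 0.05, 100)  # resting-state oxy-Hb signal (arbitrary units)
hbo[60:80] += 1.0                 # hemodynamic response to the mental "press"
active = detect_activation(hbo)
print("activation samples:", int(active.sum()))
```

The baseline length, threshold, and signal amplitudes here are invented for illustration; the point is only that a clear hemodynamic rise is easily separable from the resting baseline.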
The gaze trajectory obtained during the pilot experiment is presented in Fig. 2.
Fig. 2. The gaze trajectory during the pilot experiment.
It shows that rapid eye movements between fixation points during neural activation (motor imagery tasks) are insignificant and can be removed by a filter (for example, by applying the moving-average method). This proves that neural activation does not lead to a shift of the manipulator grip.
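The moving-average filtering suggested here for suppressing gaze jitter can be written, in an illustrative form, as:

```python
import numpy as np

def smooth_gaze(xy, window=5):
    """Moving-average filter over gaze samples; xy has shape (n_samples, 2)."""
    kernel = np.ones(window) / window
    # mode="same" keeps the length; edge values are attenuated by partial overlap
    return np.column_stack([np.convolve(xy[:, i], kernel, mode="same")
                            for i in range(xy.shape[1])])

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 200)
path = np.column_stack([t, t**2])               # intended gaze trajectory
noisy = path + rng.normal(0, 0.02, path.shape)  # tremor / micro-saccade noise
smoothed = smooth_gaze(noisy, window=9)

# Smoothing should bring the trajectory closer to the intended path
err_raw = np.abs(noisy[20:-20] - path[20:-20]).mean()
err_smooth = np.abs(smoothed[20:-20] - path[20:-20]).mean()
print(err_smooth < err_raw)
```

The window length trades jitter suppression against lag in tracking intentional gaze shifts, so in a real controller it would be tuned to the eye tracker's sampling rate.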
4 Discussion

The findings of the pilot experiment show that the described control scheme of the robotic rehabilitation device is feasible. However, to increase the accuracy of positioning the manipulator grip, it is advisable to apply special labels to all handles, buttons, and other surfaces that are intended to be gripped by the manipulator. The vision system of the robotic device is designed to recognize them when the manipulator grip is close to them (by using neural networks, for example). Then, depending on the type of label, the manipulator will capture the object or press on it. The shape and size of these surfaces should be unified. It is also advisable to apply special labels to the body of the paralyzed person to facilitate the positioning of the manipulator.

A surface equidistant from the body of the disabled person at a distance of 1–2 cm defines the forbidden area for moving the manipulator grip. However, a failure in the control system of the manipulator can still injure the User. An additional safety mechanism of the rehabilitation device can be provided by limiting the gripping force of the robotic arm to 1 kg by means of mechanical safety tools such as shear pins.

A series of eye blinks by the patient, recorded by the eye tracker, can be used in the proposed control system to switch modes, for example from the manipulator control mode to a typing mode on the screen. While typing, one can use the eye tracker to move along a virtual keyboard and use neural activation (mental commands) to press letters.
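The forbidden zone around the User's body could be enforced by a simple distance check before any motion command is executed; the 2 cm margin follows the text, while the point-cloud representation and coordinates are an assumed sketch:

```python
import numpy as np

SAFETY_MARGIN = 0.02  # metres: forbidden zone of 1-2 cm around the body

def is_target_safe(target, body_points, margin=SAFETY_MARGIN):
    """Reject grip targets closer than `margin` to any sampled body-surface point."""
    d = np.linalg.norm(np.asarray(body_points) - np.asarray(target), axis=1)
    return bool(d.min() > margin)

# Body surface sampled as a coarse point cloud (illustrative coordinates, metres)
body = np.array([[0.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.2, 0.0]])

print(is_target_safe([0.10, 0.10, 0.0], body))  # → True  (well clear of the body)
print(is_target_safe([0.01, 0.10, 0.0], body))  # → False (inside forbidden zone)
```

Such a software check complements, but does not replace, the mechanical force limiting described above.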
The proposed rehabilitation device for severely paralyzed and motor disabled people based on the eye tracker and fNIRS was successfully tested in an experiment with a desktop manipulator with a spherical coordinate system and an Arduino-based control system.

Acknowledgment. Research is supported by RFBR grant 20-08-01178.
References

1. Chen, T.L., Ciocarlie, M., Cousins, S., et al.: Robots for humanity: a case study in assistive mobile manipulation. IEEE Rob. Autom. Mag. 20(1), 30–39 (2013)
2. Hammel, J., Hall, K., Lees, D., et al.: Clinical evaluation of a desktop robotic assistant. J. Rehabil. Res. 26(3), 1–16 (1989)
3. Park, D., Hoshi, Y., Mahajan, H.P., Rogers, W.A., Kemp, C.C.: Active robot-assisted feeding with a general-purpose mobile manipulator: design, evaluation, and lessons learned. Robot. Auton. Syst. 124 (2020)
4. Lazarou, I., Nikolopoulos, S., Petrantonakis, P.C., Kompatsiaris, I., Tsolaki, M.: EEG-based brain-computer interfaces for communication and rehabilitation of people with motor impairment: a novel approach of the 21st century. Front. Hum. Neurosci. 12, art. 14 (2018)
5. Afonin, A.N., Asadullaev, R.G., Sitnikova, M.A., Gladishev, A.R., Davletchurin, K.Kh.: Brain-computer interfaces in robotics. COMPUSOFT Int. J. Adv. Comput. Technol. 8(8), 3356–3361 (2019)
6. Kaplan, A.Ya., Zhigulskaya, D.D., Kirjanov, D.A.: The control of human phantom fingers by means of a P300 brain-computer interface for neurorehabilitation. Opera Medica et Physiologica 2(S2), 73–74 (2016)
7. Lee, J.H., Ryu, J., Jolesz, F.A., et al.: Brain–machine interface via real-time fMRI: preliminary study on thought-controlled robotic arm. Neurosci. Lett. 450(1), 1–6 (2009)
8. Bianchi, T., Croitoru, N.I., Frenz, M., et al.: NIRS monitoring of muscle contraction to control a prosthetic device. In: Proceedings of SPIE - The International Society for Optical Engineering, vol. 3570, pp. 157–163 (1998)
9. Batula, A.M., Kim, Y.E., Ayaz, H.: Virtual and actual humanoid robot control with four-class motor-imagery-based optical brain-computer interface. BioMed. Res. Int. 2017, article ID 1463512 (2017)
10. Chaudhary, U., Xia, B., Silvoni, S., Cohen, L.G., Birbaumer, N.: Brain computer interface based communication in the completely locked-in state. PLoS Biol. 15(1), e1002593 (2017)
11. Duchowski, A.T.: Eye Tracking Methodology: Theory and Practice, 3rd edn. Springer, Heidelberg (2017)
12. Bhattacharjee, K., Soni, M.: Eye controlled robotic motion using video tracking in real time. Int. J. Innov. Res. Sci. Eng. Technol. 6(7), 14828–14838 (2017)
13. Shishkin, S.L., Zhao, D.G., Isachenko, A.V., Velichkovsky, B.M.: Gaze-and-brain-controlled interfaces for human-computer and human-robot interaction. Psychol. Russ. State Art 10(3), 120–137 (2017)
14. Asadullaev, R.G., Afonin, A.N., Lomakin, V.V., Sitnikova, M.A.: Neural network classifier of hemodynamic brain activation patterns within a functional near-infrared spectroscopy-based brain-computer interface for a bionic prosthetic hand. Drug Invent. Today 12(9), 2130–2136 (2019)
Analytic Model of Mental Rotation

Evgeny Meilikov and Rimma Farzetdinova

National Research Centre “Kurchatov Institute”, 123182 Moscow, Russia
[email protected]
Abstract. Mental rotation is a cognitive process used by the brain to solve various common problems and some more complicated (for instance, abstract-geometric) tasks. That process likely underlies spatial perception. We suggest a simple analytic model of mental rotation in which the patterns of various images of the same object (for instance, its views from various viewpoints), stored in memory, are identified with long-ago created wells in the energy landscape of the neuron system, while the image of a new object, to be compared with the old image, is considered a virtual (just created) well. In such a model, recognizing the image is the transfer of the system from the new well of the energy landscape, corresponding to the object being presented just now, into one of the old wells, corresponding to other engrams of the same object. The directionality of that transfer is defined by the fact that barriers between the wells have various heights: the more similar distinct images of the object are, the lower the barriers between the respective wells. This favors the transfer from the new (just presented) well of the object to the old well corresponding to the image long ago stored in memory. The suggested model is, in principle, based on considering dynamical processes of switching between various engrams of the same object. However, no dynamical equations are employed in our scheme; the whole dynamics arises from the Arrhenius-Boltzmann relationship, which, in fact, determines the probability and the time scale of transferring the system between the wells of various engrams of the same object. The model is in qualitative agreement with experiments.

Keywords: Mental rotation · Analytic model

1 Introduction
Mental rotation is a complex cognitive task. To solve it effectively, a variety of spatially distributed and correlated neuronal processes have to be activated. The phenomenon is intensively studied, and the corresponding results are well reproduced and quite objective. However, the neuronal processes that take place during mental rotation are not yet completely understood. Though it is generally accepted that a variety of neuronal mechanisms contribute to the rotation act, there is no clarity about which particular brain zones are involved in the process.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 71–84, 2021. https://doi.org/10.1007/978-3-030-60577-3_8
Today's methods of neural visualization provide many instruments for further insight into cognitive phenomena earlier investigated only via behavioral observations. Functions of brain cells are studied with positron tomography and fMRI. However, the results on brain activation turn out to be irreproducible for similar behavioral acts. Differences are observed in such aspects as gender inequality, the relative role of the brain hemispheres, the contributing motor regions, and the involvement of higher-order visual areas. The main brain regions participating in mental rotation are the parietal areas, some frontal components, and the occipital and temporal regions (more specifically, premotor, somatomotor and basal ganglia) (Ark 2002). However, there are still many significant differences between neuroimaging studies of mental rotation. In this regard, it makes sense to formulate a phenomenological model that reproduces the experimental characteristics of the mental rotation process.

That process consists in the imaginary rotation of a given object about some center in 2D or 3D space. To study mental rotation experimentally, laboratories usually use three basic types of objects: (i) the classical 3D block stimuli introduced in (Shepard and Metzler 1971), (ii) alphanumeric stimuli, and (iii) line drawings of abstract figures or hands. According to (Vingerhoets et al. 2001), mental rotation is the cognitive spatial transformation of imagined stimuli. In that imaginary version of the physical rotation, the evolution of the object orientation proceeds along the shortest trajectory. The first evidence in support of that theory was provided in (Shepard and Metzler 1971): it was shown that the time required to answer the question of whether two objects are congruent depends linearly on the angle between those objects. Such a monotonic dependence is accepted as evidence of mental rotation.
Thus, the larger the rotation angle, the more time is needed to adjust the stimulus to the final object by means of mental rotation (see the review in (Shepard and Cooper 1982)).¹ Now we know (see below) that, in the general case, the mentioned dependence can be nonlinear (but monotonic). A three-dimensional object is easily recognized when shown from a definite viewpoint. The corresponding view can be termed canonical. The preference for such canonical views is expressed through shorter response times, a lower probability of errors, and higher subjective estimates of the quality of the presented image (Palmer 1981). However, practice influences this phenomenon: after a few trials, differences in response time for random and canonical views are greatly reduced (Edelman 1989). Thus, the recognition time monotonically increases with the disorientation relative to the canonical image, as if the object were turned in the brain according to its internal representation. Rates of rotation range from 40 to 550 degrees per second, depending on the object and the task (Edelman 1991).

¹ In the experiments of Shepard, the goal was to determine whether two simultaneously presented images are projections of the same object or of different objects related by a mirror transformation.

Experiments studying the dependence of the response time on the disorientation angle reveal the class of phenomena whose investigation was initiated by the seminal paper (Shepard and Metzler 1971) and which is now known as mental rotation. The description of rotation in terms of an analog process consisting of a continuous transformation of the internal image is now part of the foundation of today's paradigm of vision (Marr and Freeman 1982). The rotation process is also reflected in the pattern of eye fixations during the experiment (Carpenter and Eisenberg 1978). Subjects make successive fixations, looking back and forth between corresponding features of the two figures, with approximately one additional comparison for every 45° of angular disparity. The monotonic dependence τR(θ) of the response time on the disorientation angle θ is often recognized as evidence of an analog transformation (rotation). Formally, that dependence can be written as a simple linear relationship

τR = τR0 + C1 θ,   (1)
τR = τR0 + C1 θ + F (θ),
(2)
or in more general form where τR0 is the minimal response time defined by the image recognition (not by the comparison), C1 is a constant (various for different observers and being equal to C1 ∼ 50 deg/s by order of value), and F (θ) is a nonlinear function. There are many papers devoted to the experimental study of dependencies τR (θ). Below, to be definite we deal with typical examples: the linear dependency τR (θ) ∝ θ (Shepard and Metzler 1971; Christova 2008) and two types of nonlinear dependencies shown in Fig. 1 (the superlinear dependency) and in Fig. 2 (the sublinear dependency).
Fig. 1. Experimental superlinear dependency of the response time τR (in seconds) on the disorientation angle θ (in degrees) (Bilge and Taylor 2017). Solid line is the fitting by the dependency (14) (see below).
E. Meilikov and R. Farzetdinova
Fig. 2. Experimental sublinear dependency of the response time τR (in seconds) on the angular disparity θ (in degrees) (Bilge and Taylor 2017). Solid line is the fitting by the dependency (20) (see below).
2 Neuronal Network Model of Mental Rotation Under Recognition
How do the mental rotation model and the neural network perception model correlate? The former is purely phenomenological, while the latter goes further and partly involves physiological neural network ideas concerning the mechanisms of information storage in the brain. The concrete microscopic mechanism of mental rotation is not known, but there are various formal models of that phenomenon based on the idea of comparing different neuronal populations (engrams) and the competition between them (Lago-Fernández 2002; Laing 2002; Wilson 2001). The existence of fluctuations (noise) is a fundamental attribute of such models: they initiate the process of switching between different perceptions (different engrams) of the presented object. Below, by an engram we mean some limited segment of the neuronal network that stores the result of an external exposure (for instance, the presentation of a certain stimulus). The assumption is that the functional role of neurons is to signal the similarity of the presented stimulus to the one which could be classified as the base, or canonical, one (owing to its presentation in the past, preferred demonstration from a certain viewpoint, and so on) (Barlow 1972). Neuronal engrams code and switch the memory of the object, but how they do that is unknown. Most frequently it is said (Mu-ming Poo et al. 2016) that the engram storing the memory is the set of synapses activated or consolidated during the remembering process. From that viewpoint, recall is the strengthening of the engram's excitability, leading to its accommodation to environment changes (Pignatelli 2019). There are three possibilities for storing the memory of an object (Tarr and Pinker 1989): 1. Engrams of a given object, viewed from various viewpoints, could be represented in the memory as structural descriptions in the coordinate
system tied to the object and, hence, independent of its orientation (Marr and Nishihara 1978). 2. An alternative hypothesis is that the object is kept in a single representation corresponding to the canonical orientation, and the mental rotation operation transforms the input image to this particular orientation (Tarr 1989). Classical mental rotation differs from recognition – the process in which a comparison between the object image and its internal representation is carried out (Cheung 2009). 3. Finally, the third possibility is the storage of object forms as a set of representations, each of which corresponds to a certain orientation of the object. Experiments (Tarr and Pinker 1989) show that subjects store object images in each of the presented orientations and recognize them in new orientations by means of rotation to one of the stored orientations. These results correspond to a hybrid of the second (mental transformation) and third (set of views) hypotheses of object recognition: input images are transformed into stored ones – either into the image of the nearest orientation or into the image of the canonical orientation. In doing so, mental orientation transformations follow the shortest rotation path that adjusts the input image to its stored analog. Hence, there is no continuous mental rotation (requiring a time that rises with the disparity angle); rather, there is a ready set of oriented images used for comparison.
3 Energy Function
We will study the popular model where the dynamical process of image recognition is identical to the motion of the system along an energy landscape in the presence of sufficiently strong noise (Haken 1996). Frequently, such dynamics can be imagined as the motion of a ball across some energy landscape. Deeper wells of that landscape are associated with old, or canonical, engrams (stored for a long time), while ball positions in higher-energy regions of the landscape correspond to new images subjected to identification. The process of recognizing an image by the brain consists of successive displacements of the ball into the nearest deeper wells corresponding to one or another known image. Commonly, the probabilities of flips between wells corresponding to different objects differ significantly, which leads to unambiguous object recognition. The system state changes with time because of inevitable energy fluctuations. However, instead of taking that noise into account explicitly, we shall apply the known Arrhenius-Kramers formula (Kramers 1940) for the average lifetime τ of the system in a given quasi-stable state, which is defined by the relation between the height Δ of the energy barrier and the mean amplitude Φ of noise energy fluctuations (this value could be termed the chemical temperature):

τ = τ0 exp(Δ/Φ),
(3)
where τ0 is a constant which, in a sense, is the time between successive attempts to overcome the barrier. In fact, that relationship determines the probability of the system transferring from one state into another. The chemical, or noise, temperature Φ is the chemical analog of thermal fluctuations (in traditional chemical kinetics that is the thermal motional energy). The concept of those fluctuations is a purely phenomenological one, and various authors mean by it a variety of processes (Moreno-Bote 2007). In the considered case, those are, for instance, chemical fluctuations in synapses (fluctuations of ion or neurotransmitter concentrations in synaptic contacts). It is assumed (Moreno-Bote 2007) that neuronal engrams corresponding to various object images (in our case, differing by orientation) compete with each other, changing the activity of their neurons. Such a model is based on introducing an energy function U with a variety of local minima (quasi-steady states corresponding to different image orientations) and barriers between those states. Commonly, the energy function is selected by analogy with the phenomenological theory of phase transitions (Toledano 1987), in the form of a power function of some state parameter whose variation corresponds to the transfer of the system from one state into another (in our case, the disparity angle Θ). But such a choice is only justified by the ease of expanding that function in a series over the small state parameter near its minima. Therefore, the form of that function can be chosen (from the class of functions describing the required evolution of the multi-well potential under variation of the control parameter J) largely for convenience reasons.
4 Various Forms of Energy Landscape
We will be interested in the interval 0 < Θ < π of the generalized coordinate Θ, which corresponds to the series of energy minima in the range 0 < θ < 180° of misorientation angles. In the chosen example (see (4)), the minima of U of interest to us (wells of the energy landscape) are located at the points θ = 0, 45, 90, 135 and 180°, typical for experiments, while the maxima (barriers) sit between them. In view of that, the model energy function can, for instance, be taken in the form

U(Θ) = U0 [sin² 4Θ + JΘ],   (4)

where Θ = πθ/180 is the generalized coordinate of the system (the reduced misorientation angle, in our case) and U0 is the characteristic system energy. At J = 0 the energy landscape is symmetric, while at J ≠ 0 it is skewed so that even for J ≪ 1 the line of the average energy lowers significantly with decreasing misorientation angle, while each of the landscape extremes moves only slightly towards lower θ-values. It is important that in doing so the wells become nonsymmetric: the barriers Δl for transfers to the left are lower than the barriers Δr for transfers to the right (cf. Fig. 3 for J = 0.3). That sets the stage for the system drift to the state of the lowest energy with θ = 0 (that is, to the canonical state).
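The drift mechanism described above is easy to check numerically. The following sketch (illustrative only: the values of U0, J, Φ and τ0 are our assumptions, not fitted to any experiment) tabulates the energy function (4), locates its wells and barrier tops, and converts the barrier heights into Kramers lifetimes via (3):

```python
import numpy as np

# Assumed illustrative parameters; not fitted to any experimental data.
U0, J, Phi, tau0 = 1.0, 0.3, 0.25, 1.0

def U(theta):
    """Tilted multi-well energy function (4); theta is the reduced angle."""
    return U0 * (np.sin(4.0 * theta) ** 2 + J * theta)

grid = np.linspace(0.0, np.pi, 200001)
u = U(grid)

# Interior local minima (wells) and maxima (barrier tops) of the curve.
i = np.arange(1, len(u) - 1)
wells = i[(u[i] < u[i - 1]) & (u[i] < u[i + 1])]
tops = i[(u[i] > u[i - 1]) & (u[i] > u[i + 1])]

# Left/right barriers and Kramers lifetimes for the well near theta = pi/2.
well = wells[1]
d_left = u[tops[tops < well][-1]] - u[well]   # Delta_l (hop towards theta = 0)
d_right = u[tops[tops > well][0]] - u[well]   # Delta_r (hop away from it)
tau_left = tau0 * np.exp(d_left / Phi)        # lifetimes by Eq. (3)
tau_right = tau0 * np.exp(d_right / Phi)
```

For J = 0.3 the extracted barriers come out close to the linear estimates (5) below (Δl ≈ 0.88 U0, Δr ≈ 1.12 U0), so leftward, canonical-ward hops are exponentially faster than rightward ones.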
Fig. 3. Extremes of the energy function (4) and the inter-well barriers Δl, Δr at J = 0.3 (U(Θ)/U0 versus Θ; upper scale: θ in degrees). Arrows are the possible inter-well (over-barrier) transfers.
The barrier heights Δl and Δr are readily found from the relation (4) for the energy. In the linear approximation in J,

Δl/U0 ≈ 1 − πJ/8,   Δr/U0 ≈ 1 + πJ/8.
(5)
Those barriers prevent the system from transferring from one minimum to another and determine the transition times τl, τr for transferring the system into the neighboring states with, respectively, lower and higher energies:

τl = τ0 exp(Δl/Φ),   τr = τ0 exp(Δr/Φ).
(6)
If the barrier heights differ significantly – more exactly, under the condition

τr/τl = exp[(Δr − Δl)/Φ] ≫ 1,   (7)
the transfers into states of higher energy (that is, to engrams corresponding to larger image misorientation) can be neglected. Then the total time τR of transferring from the initial N-th minimum into the final one (the leftmost in Fig. 3), which corresponds to the canonical image, equals

τR = τR0 + N τl,   (8)

where N is the number of minima separating the initial and final states. It is natural to assume that this number is larger for larger disorientation angles of the presented original image. In other words, we will suppose that a certain interval δθ ∼ 30° of disorientation angles corresponds to each engram (from the set of engrams of a given object). Then

N = θ/δθ,   (9)
τR(θ) = τR0 + N τl = τR0 + (θ/δθ) τl,   (10)

where θ is the misorientation angle of the original image. This is the empirical (linear in θ) Shepard-Metzler law. In the framework of the considered model, that law is a consequence of the assumption that all wells of the energy landscape (and, hence, all inter-well barriers) are identical. In a more general model, we can abandon that assumption. Let, for instance, the barriers Δl(N) separating the corresponding wells become lower and lower on approaching the engram of the canonical image (N = 0). Formally, that can be described by the energy function (see Fig. 4)

U(Θ) = U0 (1 + JΘ) sin² 4Θ,   (11)
which (according to (9)) corresponds to a linearly growing dependency of the barrier height on N:

Δl(N) = Δl0 + N δl,   (12)

where Δl0 is the depth of the canonical image well (N = 0), and δl = πJU0/4 is the difference in heights of neighboring barriers. We will see that this variant of the energy landscape leads to a superlinear dependency τR(N).

Fig. 4. Extremes of the energy function (11) and inter-well barriers at J = 0.2 (U(Θ)/U0 versus Θ; upper scale: θ in degrees). Arrows are inter-well transfers in the process of the mental rotation.
In this case, the response time τR is defined by the sum

τR = τR0 + τ0 [exp(Δl(0)/Φ) + exp(Δl(1)/Φ) + … + exp(Δl(N−1)/Φ)]
   = τR0 + τ0 exp(Δl0/Φ) [1 + exp(δl/Φ) + … + exp((N−1)δl/Φ)],   (13)

or

τR(N) = τR0 + τ0 exp(Δl0/Φ) · [exp(Nδl/Φ) − 1] / [exp(δl/Φ) − 1].   (14)
If τR0 …

…(Er > 0) to the expression (5); this term decreases the selection abilities of agents:

fk = exp(bE(SPk)) + Er.   (5a)

Using the fitness (5a), we compared two types of evolutionary processes: (1) with learning and (2) without learning. The parameter Er was large (Er = 10^13); this value essentially suppresses the selection process. Figure 2 shows the dependence of the agent energy E on the generation number G for evolution in the presence of learning (curve 1) and without learning (curve 2).
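The suppressing role of a huge additive term Er can be illustrated with a toy roulette-wheel selection step (a sketch under assumptions: fitness-proportional selection and made-up fitness values; the concrete selection scheme of the model is not reproduced here):

```python
def selection_probs(fitness, Er=0.0):
    """Fitness-proportional selection probabilities with additive term Er."""
    shifted = [f + Er for f in fitness]
    total = sum(shifted)
    return [f / total for f in shifted]

# Made-up fitness values for a toy population of four agents.
fitness = [1.0, 2.0, 4.0, 8.0]

sharp = selection_probs(fitness)             # Er = 0: strong selection
flat = selection_probs(fitness, Er=1e13)     # huge Er: nearly neutral
```

With Er = 0 the fittest agent is selected about eight times more often than the least fit one; with Er = 10^13 the selection probabilities are uniform to within about 10^-13, so selection becomes almost neutral and learning-driven effects dominate.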
Fig. 2. The agent energy E versus the generation number G in the presence of learning (curve 1) and without learning (curve 2). The expression (5a) is used. Both curves show the average energy of agents in the population (averaged over 1000 different calculations).
V. G. Red'ko and G. A. Beskhlebnova
The lifetime of agents in this computer experiment was T = 10. Figure 2 demonstrates a certain form of genetic assimilation: learning helps evolution to minimize the spin-glass energy. A certain form of the hiding effect was also observed in the computer simulation: if the lifetime of agents was sufficiently large (e.g., T = 1000), then the phenotype energy at the end of a generation was small (E ≈ −60); however, the genotype energy was not essentially minimized (E ≈ −30 over a large number of generations, G ∼ 1000). Thus, strong learning suppresses the evolutionary optimization of agent genotypes. We also analyzed the effect of the learning load. The expression (7) for the fitness was used. The results of the computer simulation are shown in Fig. 3. The presence (curve 1, c = 1) and absence (curve 2, c = 0) of the learning load were considered. The lifetime of agents was T = 10.
Fig. 3. The agent energy E versus the generation number G for the evolutionary search in the presence of learning. Curve 1 shows the agent energy in the presence of the learning load (c = 1); curve 2 shows the energy of agents in the absence of the learning load (c = 0). T = 10. The expression (7) is used. Results are averaged over 1000 different calculations.
According to the expression (7), the evolutionary process selects agents that do not need a large change of phenotype during learning. Therefore, the learning load accelerates the evolutionary optimization. The effect of the learning load is not large; however, it does take place.
3 Conclusion

Thus, the model of the interaction between learning and evolution for the case of minimization of the spin-glass energy has been designed and studied. The model analyzes a population of autonomous agents whose genotypes and phenotypes are coded by spin glasses. The most essential results of the current model are the following. Genetic assimilation, the hiding effect, and the effect of the learning load have been demonstrated in our work. According to the computer simulations, some of these effects are not strong for the considered parameters; however, all of them take place.
Modeling of Interaction Between Learning and Evolution
Funding. The work was financially supported by the State Program of the Scientific Research Institute for System Analysis. The project number is 0065-2019-0003 (AAA-A19-119011590090-2).
References

1. Hinton, G.E., Nowlan, S.J.: How learning can guide evolution. Complex Syst. 1(3), 495–502 (1987)
2. Mayley, G.: Guiding or hiding: explorations into the effects of learning on the rate of evolution. In: Husbands, P., Harvey, I. (eds.) Proceedings of the Fourth European Conference on Artificial Life (ECAL 1997), pp. 135–144. MIT Press, Cambridge (1997)
3. Red'ko, V.G.: Mechanisms of interaction between learning and evolution. Biologically Inspired Cogn. Archit. 22, 95–103 (2017)
4. Sherrington, D., Kirkpatrick, S.: Solvable model of a spin-glass. Phys. Rev. Lett. 35(26), 1792–1796 (1975)
5. Kirkpatrick, S., Sherrington, D.: Infinite-ranged models of spin-glasses. Phys. Rev. B 17(11), 4384–4403 (1978)
6. Tanaka, F., Edwards, S.F.: Analytic theory of the ground state of a spin glass: 1. Ising spin glass. J. Phys. F: Metal Phys. 10(12), 2769–2778 (1980)
7. Young, A.P., Kirkpatrick, S.: Low-temperature behavior of the infinite-range Ising spin-glass: exact statistical mechanics for small samples. Phys. Rev. B 25(1), 440–451 (1982)
8. Red'ko, V.G.: Neutral evolution game. In: Heylighen, F., Joslyn, C., Turchin, V. (eds.) Principia Cybernetica Web (Principia Cybernetica, Brussels) (1998). http://cleamc11.vub.ac.be/NEUTEG.html. Accessed 28 June 2020
9. Red'ko, V.G., Beskhlebnova, G.A.: Evolutionary minimization of spin glass energy. In: Kryzhanovsky, B., Dunin-Barkowski, W., Red'ko, V., Tiumentsev, Y. (eds.) Advances in Neural Computation, Machine Learning, and Cognitive Research III. NEUROINFORMATICS 2019, Studies in Computational Intelligence, vol. 856, pp. 124–130. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-30425-6_13
Symmetry Learning Using Non-traditional Biologically Plausible Learning Method

Alexander Lebedev, Vladislav Dorofeev, and Vladimir Shakirov

Scientific Research Institute for System Analysis of Russian Academy of Sciences, Moscow, Russia
[email protected]
Abstract. Most modern methods for training artificial neural networks are based on the error backpropagation algorithm. However, it has several drawbacks: it is not biologically plausible, it needs to compute and propagate derivatives of the error function, and it cannot be directly applied to binary neurons. On the other side, algorithms based on Hebb's rule offer more biologically plausible local learning methods, but they are basically unsupervised and cannot be directly applied to the vast number of tasks designed for supervised learning. There have been several attempts to adapt Hebb's rule for supervised learning, but those algorithms did not become very widespread. We propose another hybrid method, which uses locally available information about neuron activity but also utilizes information about the error. In contrast to other methods, the presence of an error does not invert the direction of the weight change, but affects the learning rate. We test the proposed learning method on the symmetry detection task. This task is characterized by a practically infinite number of training samples, so methods should demonstrate a generalization ability to solve it. We compare the obtained results with those obtained in our previous work.

Keywords: Artificial neural networks · Symmetry detection · Hebb's rule · Biologically plausible learning
1 Introduction

1.1 Backpropagation and Its Alternatives
Backpropagation (backprop, BP) has been the most popular algorithm for training artificial neural networks for many years. It was independently invented many times, but the method gained its name and popularity after the publication of [1]. However, it is considered biologically implausible by many neuroscientists: the implementation of backpropagation is not possible in a real nervous system. For example, it assumes that axons are able to propagate back precise values of error function derivatives in order to calculate the corresponding derivatives on the next level, which seems unrealistic. In 2014, Timothy Lillicrap published a pioneering work [2] in which he presented a new learning algorithm, called Feedback Alignment, that does not utilize derivatives. The algorithm transmits error signals not as precise information about derivative values, but by multiplication by weights taken at random. Since the publication of this work, many alternative algorithms for training artificial neural networks, pretending

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 118–124, 2021. https://doi.org/10.1007/978-3-030-60577-3_13
to be more biologically plausible were proposed. For example, [3] proposes an approximate learning algorithm for memristor-based neural networks inspired by Lillicrap's work. A further advance was the development of the Direct Feedback Alignment algorithm, published in [4] by Arild Nokland. In this study, a random matrix of weights was used to propagate error information directly to the neurons of the hidden layers. This architecture was sufficient to successfully solve the MNIST problem. Independently, in 2015 the Pavlov principle was introduced. We quote its formulation from [5]: «PAVLOV PRINCIPLE (PP): The network of neurons, such that the strength of each of the connection between neurons is gradually changing as a function of locally available error signal components and activity states of the neurons connected, comes in the process of network functioning to error-free operation». Another alternative to backpropagation is to use Hebbian-like local learning rules. A detailed survey of a vast subset of local learning rules is presented in [6, 7] and [8]. The original Hebb's rule is unsupervised by its nature, but there have been several attempts to modify it for the supervised learning problem; such an approach was used, for example, in [9, 10, 11] and [12]. We propose another modification of Hebb's rule, which can solve tasks in a supervised learning setting.

1.2 Symmetry Detection Task as a Test Problem
We use the symmetry detection task to test our method. This task is a popular benchmark in the field of artificial neural networks and artificial intelligence research. For example, in [1] the authors used the problem of detecting mirror symmetry as one of the first tasks to test the backpropagation method. In this work, the neural network had six input neurons, divided into two subsets. If the binary activity value of each input of the first subset was equal to that of the corresponding input of the second subset, the answer was considered positive. The training set contained 100000 samples of input vectors. In [13], another case of exploiting the symmetry detection problem was presented; this time it was solved with the help of a Boltzmann machine. The problem was formulated as determining whether a square matrix with binary values had horizontal, vertical or diagonal symmetry. The training set consisted of 4×4 and 10×10 binary matrices, so the input vectors consisted of 16 and 100 digits, respectively. The Boltzmann machine was trained on a set of such images selected at random. It obtained 98.2% accuracy on the 4×4 problem and 90% accuracy on the 10×10 problem.
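Such a mirror-symmetry task is easy to reproduce. The sketch below generates training samples for the six-input variant (the exact mirroring convention and the 50/50 balancing of positive samples are our assumptions based on the description above):

```python
import random

def make_sample(n_half=3):
    """Return (inputs, label) for the mirror-symmetry task: 2 * n_half binary
    inputs, with label 1 iff the second half mirrors the first half."""
    first = [random.randint(0, 1) for _ in range(n_half)]
    if random.random() < 0.5:
        second = first[::-1]                      # force a symmetric sample
    else:
        second = [random.randint(0, 1) for _ in range(n_half)]
    vec = first + second
    label = int(vec == vec[::-1])                 # symmetric <=> palindrome
    return vec, label
```

Because the inputs are drawn at random, repeated samples are rare, which is what makes the task a test of generalization rather than memorization.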
2 The Model

2.1 Neuron Model and Learning Procedure
The main idea of the proposed learning algorithm is to combine unsupervised Hebbian-like learning with error signal consideration according to the Pavlov principle. The learning algorithm should enable a neural network to learn in an unsupervised manner and find correlations in input signals. At the same time, it should be able to utilize
information from error signals. As a solution, we propose to adjust the weights similarly to Hebb's rule, but to vary the learning rate according to the error signal: the presence of an error decreases the learning rate, and its absence increases it. This principle intends to form patterns that correspond to error-free behavior. The weight adjustment of an individual i-th synapse is performed according to the formulas:

w′i(t + 1) = wi(t) + lr(t) · Xi(t) · Y(t),   (1)

lr(t) = e / (1 + |E(t)|),   (2)
Here Xi(t) are the components of the binary input vector X presented at the i-th synapse at step t, and Y(t) is the corresponding output value of the neuron. We use binary neurons, so the output values can be either 0 or 1. lr(t) is a dynamic learning rate that depends on the error value, and e is a constant (the base learning rate). E(t) is the error value of the network; it corresponds to the difference between the desired output signal (of the network as a whole, not of a particular neuron) and the actual one, and it is the same for all neurons of the network. If the output contains several components, the error value can be composed of the sum of the absolute values (or squares) of the differences between the corresponding output values and the desired ones. It equals zero if the output is totally correct, and it increases with the inconsistency of the network output. This error value can also be considered a penalty, or anti-reward, in a reinforcement learning setting. As follows from formula (2), the highest reward (i.e., the lowest error) leads to the highest learning rate. Note that this learning rule is asymmetric: weights increase only in the case of simultaneous activation of the input and output neurons; otherwise they do not change. To prevent the weights from unbounded growth and to implement their decrease, a normalization procedure preserving the sum of squares of the weights is applied at each step, so the final weight values are determined by the formula:

wi(t + 1) = w′i(t + 1) / √( Σ_{i=1}^N (w′i(t + 1))² ).   (3)
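Formulas (1)–(3) amount to only a few lines of code. The following sketch is an illustrative implementation of a single learning step for one neuron (the base rate value e = 0.05 and the use of plain Python lists are our assumptions):

```python
import math

def learning_step(w, x, y, E, e=0.05):
    """One update of a neuron's weights following Eqs. (1)-(3).
    w: current weights, x: binary inputs, y: binary output of the neuron,
    E: error signal of the whole network, e: base learning rate (assumed)."""
    lr = e / (1.0 + abs(E))                             # Eq. (2)
    w_new = [wi + lr * xi * y for wi, xi in zip(w, x)]  # Eq. (1): Hebbian term
    norm = math.sqrt(sum(wi * wi for wi in w_new))      # Eq. (3): renormalize
    return [wi / norm for wi in w_new]
```

Note that a large error |E(t)| only damps the step, never reverses its direction, and the normalization keeps the sum of squared weights equal to one, so co-active synapses can grow only at the expense of the inactive ones.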
The final weight values wi(t + 1) are computed by normalizing the intermediate weight vector w′(t + 1) by its length. It is important to note that the proposed learning procedure is consistent with the Pavlov principle, since the weight change is a function of the locally available error signal components and the activity states of the connected neurons. A formal mathematical rationale goes beyond this work, but we present some intuition. One of the main features of the proposed method is that the weight change is performed in the same direction (assuming the same input signals) both in the presence and in the absence of an error. So the method remains consistent with the original formulation of Hebb's postulate: "When an axon of cell A is near enough to excite cell B or repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased". This is in contrast, for example, to [11], where the simultaneous activity of
neurons in the presence of negative reward leads to a decrease of the weights between them. The latter approach may cause an undesirable effect, since we do not know the preferred output values for all neurons, only for the output ones. On the other hand, Hebb's rule is famous for its ability to learn (in an unsupervised manner) to detect hidden correlated patterns, and the inversion of the weight change direction can interfere with this ability. It can be shown that patterns that are not correlated with the error signal can be learned by the proposed method in the same way as by a traditional Hebbian network. These patterns can then be used as features by subsequent neurons. Unlike many approaches (for example [10]), we do not explicitly specify layers for unsupervised pattern learning; we hypothesize that this makes the learning method more flexible. At the same time, among patterns that correlate with the error signal, a neuron will learn faster to react to patterns that positively correlate with the absence of error (or with positive reward). Let's assume the network has reached error-free behavior. Let's assume also that a neuron has learned to react to a pattern characterized by the presence of 1s on synapses Xa1, Xa2, …, Xak and 0s on all other synapses. Assuming these synapses have weights equal to 1/√k and the others equal to 0, these values will be stable during further weight modification. Indeed, according to (1), a weight change occurs only after the simultaneous appearance of a 1 on the corresponding synapse and on the output of the neuron; following (1), the new value of the weight of the i-th synapse will be w′i(t + 1) = 1/√k + e. After normalization according to (3), the weight value returns to 1/√k. A similar calculation can be performed for patterns that are defined in a probabilistic manner. We can conclude that error-free behavior, once learned by the network, becomes a stable state. In the next section we also describe an experimental study of the stability of the method.

2.2 Experiments
To test the effectiveness of the proposed learning method, we applied it to the symmetry detection problem; a detailed description of this problem is given in [14]. The obtained detection accuracy for different numbers of hidden layers is presented in Table 1. In this series of experiments, each hidden layer had 400 neurons, and each neuron of a hidden layer had 50 connections with randomly selected neurons from the previous layer.

Table 1. Comparison of the performance of the proposed learning method for configurations with different numbers of hidden layers (fully-connected network)

Hidden layers   Share of correct answers (averaged over several runs)   Standard deviation   Number of runs
1               98.55%                                                  0.42%                4
2               76.48%                                                  12.96%               4
3               74.34%                                                  15.2%                3
For comparison, in Table 2 we present results from our previous work [14], obtained by applying the Pavlov principle to a more traditional learning rule that uses random weights to multiply error signals (similarly to direct feedback alignment). In this series of experiments, each hidden layer had 400 neurons and the network was fully-connected.

Table 2. Comparison of the performance of the old learning method for configurations with different numbers of hidden layers (without normalization)

Hidden layers   Share of correct answers (averaged over several runs)   Standard deviation   Number of runs
1               94.80%                                                  1.41%                5
2               82.14%                                                  2.20%                3
3               74.45%                                                  2.37%                3
Recognition accuracy decreases with the increase of the number of layers in both cases; neither the newly proposed method nor the old one demonstrates a clear advantage. However, the latter setting differs not only by the learning method, but also by the lack of normalization. For a more careful comparison, we performed a series of experiments with both the newly proposed learning method and the learning method from [14], with normalization in both cases and only 50 neurons in the hidden layers. The results for the proposed learning rule are presented in Table 3.

Table 3. Comparison of the performance of the proposed learning method for configurations with different numbers of hidden layers (50 neurons in each hidden layer)

Hidden layers   Share of correct answers (averaged over several runs)   Standard deviation   Number of runs
1               98.15%                                                  0.53%                4
2               95.57%                                                  0.65%                4
3               91.94%                                                  2.54%                5
The results for the learning method from [14] are presented in Table 4.
Table 4. Comparison of the performance of the old learning method for configurations with different numbers of hidden layers (with normalization, 50 neurons in hidden layers)

Hidden layers   Share of correct answers (averaged over several runs)   Standard deviation   Number of runs
1               95.46%                                                  3.03%                4
2               87.62%                                                  5.14%                4
3               66.85%                                                  10.11%               4
As can be observed from Table 3 and Table 4, the situation changed. The reduction of the number of neurons in the hidden layers down to 50 decreased the accuracy of the learning method from [14], and even weight normalization did not help. On the contrary, the newly proposed method only benefited from the decreased number of neurons in the hidden layer; its result is even better than that of the old method in the case of a fully-connected network. The accuracy of the proposed learning method also decreases with the increase of the number of layers, but not so dramatically. We also studied the evolution of the recognition ability in dynamics. As an illustration, Fig. 1 shows a sample history of the percentage of correct symmetry recognitions for a network with 3 hidden layers and 50 synapses per neuron during training by the proposed method. As can be observed, after about 200,000 iterations the recognition ability reaches a stable level.
Fig. 1. History of the percentage of correct symmetry recognition for a network with 3 hidden layers and 50 synapses per neuron during training by the proposed method. The vertical axis shows the percentage of correct answers; the horizontal axis shows the number of training steps (in thousands)
3 Conclusion

According to the experiments, the proposed learning method demonstrates an ability to learn patterns and solve tasks in a supervised setting. Since the symmetry detection task, which we used as a test task, can provide an almost "infinite" amount of training samples (the probability of repeats is very low), the ability to solve the task implies generalization ability. At least in some cases the proposed method achieves better accuracy than the previously studied one, although it relies neither on non-local information nor on propagating complex derivative information. It does not even rely on the sign of the error signal, only on its absolute value. That
perhaps means that the proposed method can be applied even when it is not possible to obtain target values for specific neurons, or even when the error signal is a function of the network's behavior as a whole rather than of individual neurons. We consider this feature a step towards biological plausibility. The method needs further investigation. Another direction of investigation would be to solve the symmetry recognition task by training a similar network with the traditional backpropagation method and to compare the results with those obtained here.

Acknowledgements. The work was financially supported by the State Program of SRISA RAS No. 0065-2019-0003 (AAA-A19-119011590090-2).
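Since the training stream for the symmetry task is generated on the fly, a small sketch may help. The following generator is a hypothetical illustration: the pattern length, binary encoding, and class balancing are our assumptions, as the excerpt does not restate the exact input format.

```python
import random

def make_sample(n=20, rng=random):
    """One training pair for the mirror-symmetry task.

    Returns (pattern, label): label is 1 if the binary pattern is
    symmetric about its midpoint, 0 otherwise.  With n = 20 the chance
    of accidentally drawing a symmetric second half is 2**-10, so
    repeated positive patterns are rare, as the text notes.
    """
    half = [rng.randint(0, 1) for _ in range(n // 2)]
    if rng.random() < 0.5:
        # Mirror the first half to build a symmetric pattern.
        pattern = half + half[::-1]
    else:
        # Independent second half; almost always asymmetric.
        pattern = half + [rng.randint(0, 1) for _ in range(n // 2)]
    label = int(pattern[:n // 2] == pattern[n // 2:][::-1])
    return pattern, label
```

Because every call draws a fresh pattern, solving the task cannot rely on memorization, which is what makes it a test of generalization.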
References
1. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323, 533–536 (1986)
2. Lillicrap, T., Cownden, D., Tweed, D.B., Akerman, C.J.: Random feedback weights support learning in deep neural networks (2014). arXiv:1411.0247
3. Wang, L., Li, H., Duan, S., Huang, T., Wang, H.: Pavlov associative memory in a memristive neural network and its circuit implementation. Neurocomputing 171, 23–29 (2016)
4. Nokland, A.: Direct feedback alignment provides learning in deep neural networks (2016). arXiv:1609.01596
5. Dunin-Barkowski, W.L., Solovyeva, K.P.: Pavlov principle and brain reverse engineering. In: 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, Saint Louis, Missouri, USA, 30 May–2 June 2018, Paper #37, pp. 1–5 (2018)
6. Baldi, P., Sadowski, P.: A theory of local learning, the learning channel, and the optimality of backpropagation (2015). arXiv:1506.06472v2
7. Gerstner, W., Kistler, W.M.: Mathematical formulations of Hebbian learning. Biol. Cybern. 87, 404–415 (2002)
8. Kuriscak, E., et al.: Biological context of Hebb learning in artificial neural networks, a review. Neurocomputing 157, 22–23 (2005)
9. Negrov, D., Karandashev, I., Shakirov, V.Yu., Matveyeva, Yu., Dunin-Barkowski, W., Zenkevich, A.: An approximate backpropagation learning rule for memristor based neural networks using synaptic plasticity. Neurocomputing 237, 193–199 (2017)
10. Kryzhanovskiy, V., Malsagov, M.: Increase of the speed of operation of scalar neural network tree when solving the nearest neighbor search problem in binary space of large dimension. Opt. Memory Neural Netw. (Inf. Opt.) 25(2), 59–71 (2016)
11. Hopfield, J., Krotov, D.: Unsupervised learning by competing hidden units (2018). arXiv:1806.10181v1
12. Mazzoni, P., Andersen, R., Jordan, M.: A more biologically plausible learning rule for neural networks. Proc. Natl. Acad. Sci. USA 88, 4433–4437 (1991)
13. Sejnowski, T.J., Kienker, P.K., Hinton, G.E.: Learning symmetry groups with hidden units: beyond the perceptron. Physica D 22, 260–275 (1986)
14. Lebedev, A.E., Solovyeva, K.P., Dunin-Barkowski, W.L.: The large-scale symmetry learning applying Pavlov principle. In: Advances in Neural Computation, Machine Learning, and Cognitive Research III. Springer, Switzerland (2020)
Providing Situational Awareness in the Control of Unmanned Vehicles

Dmitry M. Igonin, Pavel A. Kolganov(B), and Yury V. Tiumentsev

Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected], [email protected], [email protected]

Abstract. The article considers one aspect of the situational awareness problem for control systems of unmanned vehicles. We interpret this problem as obtaining information about the current situation in which, for example, an unmanned aerial vehicle (UAV) is operating. This information is required as source data for decision-making in the UAV behavior control process. One possible component of situational awareness is information about objects in the space surrounding the UAV. At the same time, it is important to know along which trajectories these objects move, and we need to predict the motion of the observed objects. We consider this task in the article as an example of forming one of the elements of situational awareness. To solve this problem, we prepare a data set using the FlightGear flight simulator. We extract from this set the training, validation, and test sets required to obtain a neural network that predicts the trajectory of the object being tracked. Then, based on the collected data characterizing the behavior of the object of interest, we design a model based on recurrent neural networks to solve the problem of predicting the trajectory of a dynamic object.

Keywords: Artificial intelligence · Recurrent neural network · Convolutional neural network · Machine learning · Dynamical system · Unmanned vehicle
1 Introduction
For modern information technologies, one of the most challenging scientific and engineering problems is the development of behavior control systems for highly autonomous robotic unmanned aerial vehicles (UAVs). Such UAVs should be able to perform complicated missions independently, without human involvement, under conditions of uncertainty [1,2]. For this purpose, the UAV must be able to produce control actions adequate to the control goals and the current situation in which the UAV operates. It follows that a critical part of the problem of robotic UAV behavior control is to obtain an assessment of the current situation, i.e., to get the information about it that is required to support decision-making processes in the control system. We will refer to this type of information as situational awareness.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 125–134, 2021. https://doi.org/10.1007/978-3-030-60577-3_14

The concept of situational awareness is one of the most
important in terms of both manned and unmanned aircraft, as well as other types of controllable systems. One possible component of situational awareness is information about objects in the space around the UAV. We can obtain this information by solving the problem of tracking these objects and predicting their behavior. This article deals with the component of the situational awareness formation subsystem that receives input from various sources installed onboard the UAV. These data sources can be all kinds of sensor systems, such as radar, LIDAR, ultrasonic sensors, video cameras, communication modules, sensors of positioning systems, etc. All signals from the various sources must be processed at a speed close to that of real-time processing systems. This requirement is necessary to respond to a changing situation promptly and to form control actions appropriate to that situation. A particular case of the described problem of tracking dynamic objects in the vicinity of the UAV is predicting the trajectories of objects located in this vicinity. Technologies based on recurrent neural networks (RNN) [3,4], networks with long short-term memory (LSTM) [5–8], gated recurrent units (GRU) [9], as well as other types of artificial neural networks [8,10,11], including deep neural networks (DNN) [12–14], are quite often used to solve problems of predicting dynamic object trajectories. The neural network architecture used for a specific task depends significantly on many factors, such as the specifics of the problem being solved, the features of the data used, etc.
2 Statement of the Problem
We can formulate the object trajectory prediction task as forecasting m characteristics of the trajectory of an observed object in d-dimensional phase space l steps forward in discrete time, based on information about k previous observations, each of which contains n features (n < d). Thus, the task of restoring a functional dependence in the context of supervised learning is solved. There is a set of pairs of feature sequences x = (x_1, ..., x_n) ∈ R^n, R^n ⊂ R^d, and target variables y = (y_1, ..., y_s) ∈ R^s, R^s ⊂ R^d, namely {(x_i, y_i)}, i = 1, ..., N, where N is the number of pairs, on which we need to restore a relationship of the form y = f(x). The restored functional dependence is an approximation y ≈ f*(x) of the true relationship f (Fig. 1). To estimate the quality of the approximation of f* to f, a loss function L(y, f(x)) is used, which must be minimized:

f*(x) = argmin_{f(x)} L(y, f(x)).
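In code, this supervised setup amounts to a sliding-window transform over the observed series. The sketch below is ours, not the authors' implementation (function name and array layout are assumptions): X[i] collects k consecutive observations and y[i] is the observation l steps after the end of that window.

```python
import numpy as np

def make_windows(series, k, l):
    """Build supervised pairs from a (T, n) multivariate series:
    X[i] holds k consecutive observations, y[i] is the observation
    l steps ahead of the end of that window."""
    series = np.asarray(series)
    T = series.shape[0]
    X, y = [], []
    for i in range(T - k - l + 1):
        X.append(series[i:i + k])        # k past observations
        y.append(series[i + k + l - 1])  # target l steps ahead
    return np.asarray(X), np.asarray(y)
```

With l = 1 and a scalar series this reduces to the classical one-step-ahead regression setup.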
3 Description of the Experiment
Source Data Set. We generate an appropriate data set to solve the task considered in this paper. This data set serves as a source of training, validation,
and test data for the neural network being formed. The data set is a record of synthetically generated flight paths of a Schleicher ASK 13 two-seat glider used for training glider pilots. We use the FlightGear flight simulator to obtain these glider trajectories. The simulator is cross-platform and open-source, released under the GNU General Public License. It supports several flight dynamics models; the JSBSim model [15] was used to generate the data set used in this article. The duration of each flight simulation is 1000 s, with intervals between measurements of 1 s. Features describing the spatial position of the aircraft's center of mass, such as flight altitude, latitude, longitude, and the angles of pitch, roll, and yaw, were recorded. We perform the simulation from a starting altitude of 5000 m with an initial speed of 250 km/h, followed by a gradual descent of the glider. We give an example of a part of a trajectory from the generated data set in Fig. 1. Points on the trajectory represent records of geographic coordinates and altitude marks. For ease of perception, we connect the points in chronological order. Darker points correspond to the beginning of the trajectory, and lighter points to the end. This example clearly demonstrates the nature of the aircraft's motion, namely its low intensity of maneuvering combined with a gradual decrease in altitude.
Fig. 1. Example of a part of a trajectory from a generated data set
Fig. 2. Change of the latitude, longitude, and altitude features over time
During the data set generation process, medium-intensity signals were provided as control actions. Figures 2 and 3 show how the values of each of the above features changed during the flight. When working with neural networks, it is important to understand the data used for their training, validation, and testing. Figures 2 and 3, in combination with Fig. 1, show the peculiarities of the generated data set in sufficient detail. In particular, they demonstrate the changes in the position of the aircraft over time.
Fig. 3. Change in roll, pitch and yaw characteristics over time
Data Preprocessing. We convert the geographic coordinate values obtained from the API of the FlightGear simulator into Universal Transverse Mercator (UTM) values. This conversion was performed using the utm package. In addition, the data were converted from absolute coordinates to shifts Δx_k = x_k − x_{k−1} at step k relative to step (k − 1), for all k.

Description of the Neural Network Model. We use a rather simple variant of a recurrent neural network to solve the problem under consideration.
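Before turning to the network itself, the preprocessing step can be sketched as follows. The actual latitude/longitude conversion in the paper goes through `utm.from_latlon` of the utm package, which we only reference in a comment; the differencing functions and their names are our stand-ins.

```python
import numpy as np

# In the paper the geographic coordinates are first converted to UTM,
# e.g.: easting, northing, zone, letter = utm.from_latlon(lat, lon)
# (utm package).  Below we only illustrate the differencing step.

def to_shifts(coords):
    """Absolute coordinates (T, dims) -> shifts Δx_k = x_k - x_{k-1}."""
    return np.diff(np.asarray(coords, dtype=float), axis=0)

def from_shifts(x0, shifts):
    """Invert the transform: integrate (predicted) shifts from a start point."""
    return np.asarray(x0, dtype=float) + np.cumsum(shifts, axis=0)
```

Training on shifts rather than absolute coordinates makes the targets roughly stationary; predicted shifts are then integrated back into a trajectory.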
Fig. 4. Model errors when predicting the coordinate x
It incorporates a combination of LSTM blocks in association with a fully connected layer. The implementation of this network was carried out in Python 3, using the Keras and TensorFlow libraries. We train the constructed network on the shifts introduced above for the coordinates x, y, z, independently for each target coordinate. Prediction of the angles responsible for the spatial orientation of the aircraft was not carried out. In training, we used the gradient clipping method with the parameter value clipnorm = 1. To solve the optimization problem, we use the Adam algorithm with a learning rate lr = 0.02. For validation, we used a hold-out sample comprising 30% of the source data set. As the quality functional, the mean square error (MSE) of the prediction of the relative shifts with a forecast horizon of one second was used.

Neural Network Modeling Results. Consider the operation quality of the trained model for the coordinate x based on the data presented in Figs. 4, 5 and 6. Figure 4 shows the graph of the forecasting error of the model along the coordinate x. The experimental data were divided into training, validation, and test parts. The network parameters were learned on the data from the training set; the validation sample was used to assess the generalization ability. The time window in Figs. 2 and 3 corresponds to one complete trajectory on the basis of which the data set was built. We can see in Fig. 4 that the model predicts quite well on average, but it errs at moments of relatively sharp changes in the direction of motion of the aircraft. The values of the standard deviation and the coefficient of determination R² indicate a reasonably good predictive ability of the model on the hold-out sample. The vertical green bar in Fig. 6 shows the mean error. The vertical red bars indicate the error value shifted by ± MAE; the black bar indicates the reference value of unbiased error.
Similar results can be given for predicting the coordinate y; they are presented in Figs. 7, 8 and 9.
Fig. 5. Scattering chart of model errors when predicting the coordinate x
Fig. 6. Distribution of model errors in predicting for coordinate x
Fig. 7. Model errors when predicting the coordinate y
Fig. 8. Scattering chart of model errors when predicting the coordinate y
Fig. 9. Distribution of model errors in predicting for coordinate y
Fig. 10. Scattering chart of model errors when predicting the coordinate alt
In terms of flight altitude (the alt feature), we limit ourselves to the scattering chart of model errors in prediction, which is shown in Fig. 10. Concerning the problem of situational awareness formation, we can conclude that the proposed approach to predicting the trajectories of dynamic objects in the vicinity of the considered aircraft works acceptably well. This statement is confirmed by the simulation results presented above.
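The error statistics quoted above (MSE as the training criterion, MAE and the coefficient of determination R² as quality indicators) follow the standard formulas; the sketch below is a generic illustration, not the authors' code.

```python
import numpy as np

def error_stats(y_true, y_pred):
    """Return (MSE, MAE, R^2) for one predicted coordinate."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot  # fraction of variance explained
    return mse, mae, r2
```

R² close to 1 on the hold-out sample is what the text refers to as "reasonably good predictive ability".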
4 Conclusions
Formation of situational awareness, i.e., obtaining the information required by the UAV control system to make decisions, is one of the components of the UAV behavior control problem. One of the essential elements of situational awareness for a UAV is data on the aircraft in its vicinity, as well as prediction of the behavior of those aircraft. The article solves this problem for the case when there is only one such aircraft. In the case of several aircraft, the task is essentially the same, but the required computational resources increase significantly. In the advanced version, the UAV should detect and track aircraft in its vicinity using its onboard facilities. This, however, is not always possible, especially for small UAVs. For the time being, we are considering the case where objects of interest to the UAV control system are tracked by devices external to our UAV. The data acquired in this way are then transmitted to the UAV via a radio channel. In the future, we also plan to consider obtaining the required data with UAV onboard facilities. We can potentially use various recurrent neural network architectures to solve the tracking problem for dynamic objects such as aircraft. It should be taken into account that such a model, in its traditional form, is based only on experimental data about the behavior of the object under study and does not use knowledge about this object in any way. The involvement of such knowledge is one of the essential reserves for increasing the complexity level of the tasks being solved and the simulation accuracy. The application of convolutional neural networks seems to be effective as one of the tools for preprocessing the source data used to generate situational awareness for the control system of a robotic aircraft.
Such use is one of the directions for the development of the discussed work, since we can use machine vision to provide source data for solving the problem of predicting the trajectories of dynamic objects. In general, attempts to predict the behavior of a dynamic object located in the attention area of our controlled object, based only on empirical models, may not provide sufficient accuracy. Therefore, as mentioned above, the development of this paper assumes the use of a priori information about the observed object and physical motion models in combination with an empirical approach, to build more accurate behavior predictions. The general conclusion that can be drawn from the obtained results is that the high generalizing ability and adaptability of various neural networks
when processing data and signals of diverse nature is a significant advantage. We have to use this advantage in solving real problems, including the tasks of forming situational awareness of robotic systems.
References
1. Finn, A., Scheding, S.: Developments and Challenges for Autonomous Unmanned Vehicles. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-10704-7
2. Valavanis, K.P. (ed.): Advances in Unmanned Aerial Vehicles: State of the Art and the Road to Autonomy. Springer, Heidelberg (2007). https://doi.org/10.1007/978-1-4020-6114-1
3. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)
4. Jordan, M.I.: Attractor dynamics and parallelism in a connectionist sequential machine. In: Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pp. 531–546. Erlbaum, Hillsdale (1986)
5. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
6. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12(10), 2451–2471 (2000)
7. Zyner, A., Worrall, S., Nebot, E.: A recurrent neural network solution for predicting driver intention at unsignalized intersections. IEEE Robot. Autom. Lett. 3(3), 1759–1764 (2018)
8. Phillips, D.J., Wheeler, T.A., Kochenderfer, M.J.: Generalizable intention prediction of human drivers at intersections. In: 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, pp. 1665–1670 (2017). https://doi.org/10.1109/IVS.2017.7995948
9. Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. arXiv:1409.1259 (2014)
10. Yoon, S., Kum, D.: The multilayer perceptron approach to lateral motion prediction of surrounding vehicles for autonomous vehicles. In: 2016 IEEE Intelligent Vehicles Symposium (IV), Gothenburg, pp. 1307–1312 (2016). https://doi.org/10.1109/IVS.2016.7535559
11. Khosroshahi, A., Ohn-Bar, E., Trivedi, M.M.: Surround vehicles trajectory analysis with recurrent neural networks. In: IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 2267–2272 (2016)
12. Zyner, A., Worrall, S., Nebot, E.: Naturalistic driver intention and path prediction using recurrent neural networks. arXiv:1807.09995v1 (2018)
13. Deo, N., Trivedi, M.M.: Convolutional social pooling for vehicle trajectory prediction. arXiv:1805.06771 (2018)
14. Altché, F., de La Fortelle, A.: An LSTM network for highway trajectory prediction. arXiv:1801.07962v1 (2018)
15. JSBSim build team: JSBSim Flight Dynamics Model. http://jsbsim.sourceforge.net/ (2009)
Neurobiology and Neurobionics
Complexity of Continuous Functions and Novel Technologies for Classification of Multi-channel EEG Records

Boris S. Darkhovsky1(B), Alexandra Piryatinska4, Yuri A. Dubnov1,2, Alexey Y. Popkov1, and Alexander Y. Kaplan3

1 Federal Research Center “Computer Science and Control” of Russian Academy of Sciences, Moscow, Russia
[email protected]
2 National Research University “Higher School of Economics”, Moscow, Russia
3 Moscow State University, Moscow, Russia
4 Department of Mathematics, San Francisco State University, San Francisco, USA

Abstract. A multi-channel EEG signal is a time series for which there is no universally recognized mathematical model. The analysis and classification of such and many similar complex signals with an unknown generation mechanism requires the development of model-free technologies. We propose a fundamentally novel approach to the problem of classification for vector time series of arbitrary nature and, in particular, for multi-channel EEG. The proposed approach is based on our theory of the ε-complexity of continuous vector-functions. This theory is in line with the general idea of A.N. Kolmogorov on the complexity of an individual object. The theory of ε-complexity enables us to effectively characterize the complexity of an individual continuous vector-function. Such a characterization does not depend on the generation mechanism of the continuous vector-function and is its “intrinsic” property. The main results of the ε-complexity theory are given in the paper. Based on this theory, the principles of new classification technologies for multi-channel EEG signals are formulated. The proposed technologies do not use any assumptions about the mechanisms of EEG signal generation and are therefore model-free. We present the results of the first applications of the new technologies to the analysis of real EEG and fNIRS data. We conducted two experiments with data obtained from studies of people with schizophrenia and autism spectrum disorder, and obtained classification accuracy up to 85% for the first and up to 88.9% for the second.

Keywords: Complexity · EEG-signal · Model-free technologies for classification
· EEG-signal · Model-free technologies for
Introduction
The electroencephalographic (EEG) signal is one of the main tools for studying brain activity.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 137–142, 2021. https://doi.org/10.1007/978-3-030-60577-3_15

Analysis of these signals allows in many cases the identification of pathological
changes and various mental states of the brain. In recent years, brain-computer interfaces (BCI) have been developed actively [1,2]. One of the main problems in this area is the construction of classifiers that could divide recordings of multi-channel EEG signals into classes depending on one or another mental state of the subject. Currently, a number of classifiers are known that have proven themselves quite well [3–5], provided that the feature space for classification is selected adequately. However, for the classification of EEG signals, the choice of a feature space presents significant difficulties. The main reason for these difficulties is that, firstly, today there is no generally accepted mathematical model of the EEG signal and, secondly, this signal is fundamentally nonstationary (in any reasonable sense of the word) (see [6]). The same reasons make the EEG signal one of the most complex physical signals, according to most experts. Under these conditions, many researchers try to use statistical methods to analyze the EEG signal, (often implicitly) assuming that this signal is generated by some probabilistic mechanism. However, such assumptions are unfounded. Moreover, even if we adopt such a point of view, the corresponding random processes are essentially nonstationary, and their adequate analysis is impossible without a model generating these processes. Nevertheless, in practice, estimates of statistical spectra in different frequency ranges are often used for the classification of EEG signals. The result is a feature space of large dimension (many tens and even hundreds of features), which one has to work with, as a rule, empirically. Using today's popular neural network technology to find a satisfactory classification in such a multidimensional feature space is difficult.
Firstly, training a neural network requires a large amount of data (which is problematic for an EEG, where the record length is usually tens of seconds) and, secondly, a network trained for one person (such training can take many attempts due to the lack of an adequate mathematical description of the EEG signal) must be trained anew for another. Under these conditions, the attention of researchers in recent years has been attracted by the idea of using an estimate of the "complexity" of an EEG signal. From a mathematical point of view, the recording of a multi-channel EEG signal is a continuous vector-function of time defined on a finite segment. The question arises: how to quantify the "complexity" of a continuous vector-function? Researchers try to use a variety of mathematical methods for this assessment: methods of nonlinear dynamics, information theory, dimension theory, and the theory of text compression. In our works [7,9], these approaches were examined in detail, and it was explained why they are not quite adequate to the problem. In recent years, a new theory has been developed in our works [8,9]: the theory of ε-complexity of continuous functions (maps). This quantity is an "intrinsic" characteristic of a continuous function, and it does not depend on the mechanism of its generation. It turns out that for functions satisfying the Hölder condition, the ε-complexity admits a simple characterization. Exactly this allows us to use the complexity parameters as a new diagnostic tool, which allows us to propose a fundamentally
different, model-free approach to the problem of classification of multi-channel EEG signals. The paper is organized as follows. In Sect. 2, the main definitions and results (on a semantic level) of the ε-complexity theory of continuous functions are given. In Sect. 3, the main ideas of model-free classification of multi-channel EEG are given, and two examples of real data classification are described. In Sect. 4, conclusions are given.
2 Basic Definitions and Results of the ε-Complexity Theory
In this section, the necessary definitions are given and the results of the theory are described on a semantic level. Exact formulations can be found in [9].

Let x(t) be a continuous function defined on [0, 1], and let the function be given by its values on some uniform grid with step 0 < h < 1. Let F be a collection of methods for recovering (approximating) the function from its values on the grid. Fix some (small) number ε > 0 and choose the recovery method from the family F that delivers the minimum relative (with respect to x(t)) recovery error. If this minimal error is less than ε, we increase the grid step; if it is more than ε, we decrease the grid step. Since, under natural assumptions, the recovery error tends to zero as the grid step tends to zero, there exists a minimal number n(ε, F) of samples of the function for which the relative recovery error for the given set of methods F does not exceed ε.

Main definition. The value Sx(ε, F) = log n(ε, F) is called the (ε, F)-complexity of the continuous function x(t).

The (ε, F)-complexity is a characterization of the shortest description of a continuous function by the given methods with a given accuracy. In this sense, our definition is consistent with the main idea of A.N. Kolmogorov that the complexity of an object should be measured by the length of its shortest description [10]. The definition given above generalizes to the case of a continuous vector-function. The main result can be formulated on a semantic level as follows: for a sufficiently rich set of recovery methods F and sufficiently small ε, for "almost any" individual Hölder vector-function x(t) the following relationship holds:

Sx(ε, F) ≈ A + B log ε.

In most modern applications, one deals with a vector-function defined by its values on a discrete set of points (i.e., with a finite array of values). We will assume that this array of values is the trace of a continuous vector-function on
some uniform grid on [0, 1]. Let us consider how the definition of (ε, F)-complexity should be transformed in this case.

Suppose that we have n values of a vector-function. Let us choose a number 0 < S < 1 and discard uniformly [(1 − S)n] time points from the time series ([·] denotes the integer part of a number). Therefore, S is the fraction of sample points remaining after rejection. From the main result, it is easy to obtain that for "almost any" continuous vector-function satisfying the Hölder condition and given by a discrete set of values on a uniform grid, if the family of approximation methods is rich enough and the sample size is large enough, the following relation holds:

log ε ≈ A(n) + B(n) log S.

This relationship is the foundation of our model-free methodology for the classification of multidimensional time series of arbitrary nature, in particular for the classification of multi-channel EEG records. The coefficients A and B will be called the ε-complexity coefficients.
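Both the definition of n(ε, F) and the estimation of the ε-complexity coefficients can be illustrated in a few lines. In the sketch below the recovery family F is deliberately poor (just piecewise-linear interpolation), whereas the theory assumes a rich family; the function names, the grid-doubling search, and the choice of kept fractions are our assumptions, not the authors' implementation.

```python
import numpy as np

def n_epsilon(f, eps, n_max=4096):
    """Smallest grid size n for which piecewise-linear recovery of f on
    [0, 1] from n samples has relative sup-error <= eps; the
    (eps, F)-complexity is then log n.  Here F = {linear interpolation}."""
    t_fine = np.linspace(0.0, 1.0, 2049)
    f_fine = f(t_fine)
    scale = np.max(np.abs(f_fine))
    n = 2
    while n <= n_max:
        t = np.linspace(0.0, 1.0, n)
        rec = np.interp(t_fine, t, f(t))
        if np.max(np.abs(rec - f_fine)) / scale <= eps:
            return n
        n *= 2
    return None

def complexity_coefficients(x, fractions=(0.5, 0.25, 0.125, 0.0625)):
    """Estimate A, B from log eps(S) ~ A + B log S for a sampled signal x:
    keep a fraction S of the samples, recover the rest by linear
    interpolation, record the relative error eps(S), fit a line."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(n)
    scale = np.max(np.abs(x))
    log_S, log_eps = [], []
    for S in fractions:
        idx = np.unique(np.linspace(0, n - 1, max(2, int(S * n))).astype(int))
        rec = np.interp(t, idx, x[idx])
        eps = np.max(np.abs(rec - x)) / scale
        if eps > 0:
            log_S.append(np.log(S))
            log_eps.append(np.log(eps))
    B, A = np.polyfit(log_S, log_eps, 1)
    return float(A), float(B)
```

For a smooth signal B is negative (keeping fewer samples inflates the recovery error); rougher signals give flatter slopes, which is what makes the pair (A, B) an informative feature.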
3 Idea of Model-Free Classification; Some Examples of Real Data Classification
Our main idea for the classification of multi-channel EEG records is to use as a feature space the ε-complexity coefficients of both the vector series itself and of its finite differences of different orders (these differences serve as analogues of derivatives). This feature space does not rely on any time series models, and one can use well-known classifiers (for example, Random Forest and Support Vector Machine) to search for classes in it. Let us give two examples of the classification of real EEG recordings using our technology.

Example 1. Classification of multi-channel EEG records to recognize patients with schizophrenia. The study included a group of 39 adolescents without diagnosed schizophrenia and a group of 45 adolescents with schizophrenia. For all subjects, EEG was registered using the standard 10/20 international electrode scheme with 16 electrodes (O1, O2, P3, P4, Pz, T5, T6, C3, C4, Cz, T3, T4, F3, F4, F7, F8) referenced to linked earlobe electrodes. The sampling frequency of the recordings is 128 Hz, and the length of each record after removal of artifacts is 7680 points. The multi-channel EEG record is treated as the projection of a continuous vector-function x(t) = (x1(t), ..., x16(t)), t ∈ [a, b], onto the uniform grid. For each patient we estimated the complexity coefficients Ai, Bi according to our technology. Here i is the patient number, i = 1, ..., 84, where the first 39 patients are controls and the last 45 patients have schizophrenia-type disorders.
After that, we replaced the original EEG signals by their finite differences: x(1)(t) = x(t) − x(t−1) (t = 1, ..., n−1), then x(2)(t) = x(1)(t) − x(1)(t−1) (t = 1, ..., n−2), then x(3)(t) = x(2)(t) − x(2)(t−1) (t = 1, ..., n−3), and x(4)(t) = x(3)(t) − x(3)(t−1) (t = 1, ..., n−4). For each of them, we estimated the complexity coefficients ADki, BDki, k = 1, 2, 3, 4. We tested different combinations of the ε-complexity coefficients of the original series and of the series of finite differences and found that the complexity coefficients Ai, Bi together with the coefficients of the 4th differences AD4i, BD4i (i = 1, ..., 84) are the best for classifying our patients into the two groups, case (schizophrenia-type symptoms) and control. It was found that in this feature space the accuracy of the classification was about 85% (see details in [7]).

Example 2. Classification of fNIRS records to recognize participants with autism spectrum disorder. The study included 21 participants aged 20–30. All participants were right-handed. Exclusion criteria were structural heart disease, neurologic disease, diabetes, psychoactive medications, and scalp or hair not permitting adequate optical light detection. All participants took the Autism Spectrum Quotient (AQ) test, which quantifies autistic traits in adults. The minimum score on the AQ is 0 and the maximum is 50. If an adult has 32 or more out of 50 such traits, this is highly predictive of ASD (Autism Spectrum Disorder). In this study, which deals with healthy participants, we divided participants into high and low AQ groups according to a cutoff of 20. The experiment included three different conditions: Alone, Spontaneous, and Synchronization; the analysis was performed only for the Synchronization condition, where the participant and a research assistant sat facing each other and were instructed to move their right hands in synchronization. Conditions were interleaved, with a rest period of 15 s between them.
The Alone condition was repeated three times, and the Spontaneous and Synchronization conditions four times each. Each condition lasted 40 s. For each participant, there were from 3 to 5 segments in the Synchronous state (within one record) with durations of approximately 100 to 450 samples (10–45 s). The original segments were split into small segments of the same length (10 s each). Thus, on average, 10–15 fragments were obtained for each patient. For each patient, we estimated the complexity coefficients A and B for the original fNIRS signal and for its differences up to 4th order. To the obtained dataset we added some more features (median, maximum spread, and interquartile range). As a result we got 30 features. Then we conducted an experiment using cross-validation based on the SVM classifier, and we obtained a classification accuracy (by the accuracy metric) of 78.9% for 5-fold cross-validation and 88.9% for leave-one-out (LOO) cross-validation. A full description of this experiment will be given in a separate paper.
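The leave-one-out protocol can be sketched as follows; a nearest-centroid classifier stands in for the SVM used in the actual experiment, and the data below are synthetic, not the fNIRS features:

```python
import numpy as np

def loo_accuracy(X, y, fit, predict):
    """Leave-one-out cross-validation: hold out each sample in turn,
    train on the rest, and score the held-out prediction."""
    hits, n = 0, len(y)
    for i in range(n):
        mask = np.arange(n) != i
        model = fit(X[mask], y[mask])
        hits += predict(model, X[i:i + 1])[0] == y[i]
    return hits / n

# Stand-in classifier: nearest class centroid (the paper used an SVM).
def fit_centroid(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroid(model, X):
    classes = list(model)
    cents = np.stack([model[c] for c in classes])
    d = np.linalg.norm(X[:, None, :] - cents[None], axis=2)
    return np.array(classes)[d.argmin(axis=1)]

# Toy data: two well-separated groups in a 30-dimensional feature space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 30)), rng.normal(3, 1, (20, 30))])
y = np.array([0] * 20 + [1] * 20)
print(loo_accuracy(X, y, fit_centroid, predict_centroid))
```

The 5-fold variant differs only in holding out a fifth of the samples per round instead of a single sample.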
B. S. Darkhovsky et al.

4 Conclusion
In this paper, we described the ideas of fundamentally new technologies for the classification of multichannel EEGs. These technologies are based on the concept of ε-complexity of continuous vector-functions developed in our works. The main feature of the proposed technologies is that they do not use any mathematical models of the EEG signal, i.e., they are model-free. Thanks to this, it becomes possible to construct a feature space for classification by a targeted search in a relatively small space. Two examples of applying the new technologies to real data are described. The proposed methods can be used for the classification of multidimensional data in many other applications.

Acknowledgements. We want to thank our colleagues Z. Volkovich and A. Dahan from ORT Braude College, Israel, and H. Gvirts from Ariel University, Israel, for the fNIRS data provided for the experiments. This work was supported by the Russian Foundation for Basic Research (project nos. 17-29-02115, 20-07-00221).
References

1. Bambad, M., Zarshenas, H., Auais, M.: Application of BCI systems in neurorehabilitation: a scoping review. Disabil. Rehabil. Assist. Technol. 10(5), 355–364 (2015)
2. Kaplan, A.Y.: Neurophysiological foundations and practical realizations of the brain-machine interfaces in the technology in neurological rehabilitation. Hum. Physiol. 42(1), 103–110 (2016)
3. Bishop, C.: Pattern Recognition and Machine Learning (Information Science and Statistics), 1st edn. Springer, New York (2007)
4. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth and Brooks/Cole Statistics/Probability Series. CRC Press, Boca Raton (1984)
5. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)
6. Kaplan, A.Y.: Nonstationary EEG: methodological and experimental analysis. Success Physiol. Sci. 29, 35–55 (1998)
7. Piryatinska, A., Darkhovsky, B., Kaplan, A.: Binary classification of multichannel EEG records based on the ε-complexity of continuous vector functions. Comput. Methods Program. Biomed. 152, 131–139 (2017)
8. Darkhovsky, B.S., Piryatinska, A.: New approach to the segmentation problem for time series of arbitrary nature. Proc. Steklov Inst. Math. 287, 54–67 (2014)
9. Darkhovsky, B.S.: On a complexity and dimension of continuous finite-dimensional maps. In: Theory of Probability and its Applications (2020). In press
10. Kolmogorov, A.N.: Combinatorial foundations of information theory and the calculus of probabilities. Russ. Math. Surv. 38(4), 29–40 (1983)
Towards Neuroinformatic Approach for Second-Person Neuroscience

Lubov N. Podladchikova(&), Dmitry G. Shaposhnikov, and Evgeny A. Kozubenko
Research Centre of Neurotechnology, Southern Federal University, 194, Stachka Avenue, Rostov-on-Don 344090, Russia [email protected]
Abstract. In this work, experimental and neuroinformatic approaches to current problems in the area of second-person neuroscience are presented. This direction has been called second (multi)-person neuroscience, as opposed to one-person neuroscience. For a very long time, inter-person interaction was a subject of conceptual and heuristic consideration within the frameworks of psychology and social science. At present, there are many studies of inter-person communication by experimental neuroscience methods. Up to now, research in this area has mainly focused on the accumulation of single phenomena and the development of methodology. The main attention in our study is paid to the dynamics of gaze fixations and the emotions of test participants jointly viewing videos, because these types of human activity are always the first behavioral responses during task solution. Given the wide variety of known methods used and results obtained in the field of second-person neuroscience, the following items may be determined as priorities for our research: (i) the development of methods for rigorous quantitative evaluation of the results obtained, for example, to estimate the synchronicity of brain activity and eye movements; (ii) direct comparison of viewing emotional videos under the conditions of one-person and two-person experiments; (iii) the search for criteria for selectively assessing the contribution of the lower- and upper-level mechanisms of visual attention to the observed phenomena. At present, we are starting the investigation of these objectives. Initial testing of the developed experimental approach indicates the possibility of successfully solving the tasks under study.

Keywords: Second-person neuroscience · Emotional video clips · Fixation of eye movements · Area of interest · Scan path · EEG · Activity of autonomous nervous system
1 Introduction

For a very long time, inter-person interaction was a subject of conceptual and heuristic consideration within the frameworks of psychology and social science. At present, there are many studies of inter-person communication by experimental neuroscience methods [5, 10, 14, 20, 25]. This direction has been called second (multi)-person neuroscience, as opposed to one-person neuroscience [25].

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 143–148, 2021. https://doi.org/10.1007/978-3-030-60577-3_16
Up to now, research in this area has mainly focused on the accumulation of single phenomena and the development of methodology [9, 20, 25, 28]. The basic directions of research in this field are as follows: behavioral actions, eye movement recording, hyper-scanning of brain activity (EEG or fMRI), responses of the autonomous nervous system (galvanic skin potential, heart rate and its variability), and estimation of the emotions and individual activity characteristics of interacting participants. Based on the accumulated results, it is assumed that the human brain is not as individual as previously thought [8]. Given the wide variety of known methods used and results obtained in the field of second-person neuroscience, the following items may be determined as priorities for our research: (i) the development of methods for rigorous quantitative evaluation of the results obtained, for example, to estimate the synchronicity of brain activity and eye movements; (ii) direct comparison of viewing emotional videos under the conditions of one-person and two-person experiments; (iii) the search for criteria for selectively assessing the contribution of the lower- and upper-level mechanisms of visual attention to the observed phenomena. At present, we are starting the investigation of these objectives.
2 Known Results in Second-Person Neuroscience: An Overview The studies performed by experimental neuroscience methods are carried out in several directions namely: behavioral actions, eye movement recording, hyper-scanning of brain activity (EEG or fMRI), reactions of autonomous nervous system (skin-galvanic potential, heart rate and its variability), estimation of emotions and individual characteristics of interacting participants [5, 6, 19, 20, 25, 27]. In any case, synchronous registration of various activities is used when test participants jointly perform different tasks during video clips viewing. Work of Yun et al. [27] is one that proposed a quantitative criterion for evaluating visual-motor coordination of test participants, namely, the index of synchronicity for the positions of the gaze and the fingertip. During the behavioral test, emotional reactions and EEG were also recorded from both participants simultaneously. It was revealed that synchrony of both the movement of the fingertips and neural activity between the two participants increased after cooperative interaction. The authors suggest that increasing the synchrony of interpersonal body movements during interpersonal interaction can be a measurable basis for implicit social interaction. Anderson et al. [1] developed a method Recurrence quantification analysis of eye movements, which can quantify the visual attention dynamics by calculation of the location and number of return fixations. Previously, we identified return fixations by similar method at viewing stationary images and scenes in one-person paradigm [18, 23]. Dual registration of eye movements in combination with other methods is used in most studies in the field of two-person neuroscience [1, 5, 8, 11–14, 16, 21, 22]. Using this method was determined that when solving joint communication tasks, the eye was attracted to faces and to the eye areas [12, 24]. 
It was found that psychophysiological reactions to eye contact are more pronounced when a live face is presented rather than
its static image [11, 19, 26]. At the same time, when static images of previously examined persons are presented in video, the gaze is more often attracted to faces in a frontal position than to those with averted eyes [11], which is interpreted as involvement of long-term memory mechanisms. A wide range of brain-activity synchronization phenomena is described in [8, 10, 13, 16]. In some studies [16], EEG hyper-scanning was performed without direct eye contact between people, especially in the case of a remote conversation. Phase synchronization was detected in different frequency bands and between different brain areas. Up to now, the mechanisms and functional role of the synchronization of brain activity between individuals jointly solving communication tasks remain the subject of various hypotheses. One of these hypotheses is the assumption that the mirror neuron system is involved in regulating the synchronization of both the body actions and the brain activity of interacting individuals [25]. Many studies have found significant emotional reactions of people involved in watching affective videos [6, 22, 24]. To assess the emotional state of test participants, several methods are used, such as self-report, facial expressions, and reactions of the autonomic nervous system. The similarity of results obtained in different laboratories has been described. For example, when jointly examined events are accompanied by strong emotions, the EEG activity of the individuals becomes more synchronized [15]. In addition, when examining both static and dynamic scenes, emotionally significant fragments, especially faces and eyes, primarily attract the gaze, as estimated by the latent period and the duration of the first fixations. Significant differences among persons in eye movement parameters were shown during viewing of both static [17] and dynamic scenes [21].
In particular, obvious differences between individuals in the preferred face areas for gaze fixations, namely the eyes or the mouth, during conversation were revealed in [21]. One more individualizing factor is the type of scan path [18, 23], which makes it possible to evaluate the contribution of a subject's dominant type of visual attention (focal or spatial) to the phenomena observed in studies of interaction between persons.
3 Experimental Setup

The experimental setup developed for the simultaneous recording of various human activity during the viewing of emotional video clips is presented in Fig. 1. Some modifications of the method of Golland et al. [6] were developed for probing tests. In particular, in addition to the lack of direct eye contact and the registration of reactions of the autonomous nervous system, as in [6], eye movements and EEG were recorded simultaneously (Fig. 1). Video clips from the Annotated Creative Commons Emotional Database [2] (https://liris-accede.ec-lyon.fr) were used as visual stimuli. Three types of short emotional video clips, namely positive, negative, and neutral ones (n = 12), were selected from this database. The Bioethics Committee of SFedU approved the experimental protocol N 6. Each volunteer signed an agreement to participate in the experiment. Initial testing of the developed experimental setup indicates the possibility of successfully solving the tasks under study.
Fig. 1. Experimental setup: a) scheme of human activity recordings; b) recording of eye movements during the joint viewing of emotional video clips by two participants.
4 Neuroinformatic Approach to Solve the Current Problems in Second-Person Neuroscience

In several works, models of some phenomena in second-person neuroscience have been developed [3, 4, 7, 28]. Most of these models operate with standard neural network methods, such as reinforcement learning [3], deep neural networks [4], and cascade and parallel convolutional recurrent neural networks [28], and do not use a formalization of the experimental data concerning the assessment of eye movements and emotions. These objectives will be addressed in our modeling studies. As in our previous studies of the viewing of stationary images and scenes in the one-person paradigm [17, 18, 23], a neuroinformatic approach will be used, including the formalization of the related quantitative parameters of the synchrony of body actions and brain activity, the development of realistic models, the implementation of computer experiments, and the verification of model assumptions. Methods developed earlier to identify areas of interest and scan path types, and to estimate the individual features of test participants, will be modified for the analysis of raw data collected during the watching of emotional video clips.
5 Conclusions

In the overview, the basic methods, the results obtained, and the unsolved tasks in the area of second-person neuroscience with test participants jointly viewing videos have been considered. Experimental and neuroinformatic approaches to current problems in the second-person neuroscience area are presented. The main attention is paid to the dynamics of gaze fixations and the emotions of test participants jointly viewing videos, because these types of human activity are always the first behavioral responses during task solution. Initial testing of the developed experimental approach indicates the possibility of successfully solving the tasks under study.
Acknowledgments. This work is supported by the Ministry of Science and Higher Education of the Russian Federation in the framework of Decree No. 218, project N 2019–218-11–8185 «Creating high-tech production of a software package for managing human capital based on neurotechnology for enterprises in the high-tech sector of the Russian Federation».
References

1. Anderson, N., Bischof, W., Laidlaw, W., Risko, E., Kingstone, A.: Recurrence quantification analysis of eye movements. Behav. Res. Methods 45(3), 842–856 (2013)
2. Baveye, Y., Dellandrea, E., Chamaret, C., Chen, L.: LIRIS-ACCEDE: a video database for affective content analysis. IEEE Trans. Affect. Comput. 6(1), 43–55 (2015)
3. Botvinick, M., Ritter, S., Wang, J., Kurth-Nelson, Z., Blundell, C., Hassabis, D.: Reinforcement learning, fast and slow. Trends Cogn. Sci. 23(5), 408–422 (2019)
4. Cichy, R., Kaiser, D.: Deep neural networks as scientific models. Trends Cogn. Sci. 1886, 1–13 (2019)
5. García, A., Ibáñez, A.: Two-person neuroscience and naturalistic social communication: the role of language and linguistic variables in brain-coupling research. Front. Psych. 5, art. 124 (2014)
6. Golland, Y., Arzouan, Y., Levit-Binnun, N.: The mere co-presence: synchronization of autonomic signals and emotional responses across co-present individuals not engaged in direct interaction. PLoS ONE 10(5), e0125804 (2015)
7. Gunkel, D.J.: Computational interpersonal communication: communication studies and spoken dialogue systems. Communication+1 5(1), 1–20 (2016)
8. Hari, R., Himberg, T., Nummenmaa, L., Hämäläinen, M., Parkkonen, L.: Synchrony of brains and bodies during implicit interpersonal interaction. Trends Cogn. Sci. 17(3), 105–106 (2013)
9. Kharitonov, A., Zhegallo, A., Ananyeva, K., Kurakova, O.: Registering eye movements in collaborative tasks: methodological problems and solutions. Percep. ECVP Abstract 41, 104–105 (2012)
10. Liu, D., Liu, Sh., Liu, X., Zhang, Ch., Li, A., Jin, Ch., Chen, Y., Wang, H., Zhang, X.: Interactive brain activity: review and progress on EEG-based hyper-scanning in social interactions. Front. Psychol. 9, art. 1862 (2018)
11. Lyyra, P., Myllyneva, A., Hietanen, J.K.: Mentalizing eye contact with a face on a video: gaze direction does not influence autonomic arousal. Scand. J. Psychol. 59(4), 360–367 (2018)
12. Macdonald, R.G., Tatler, B.W.: Gaze in a real-world social interaction: a dual eye-tracking study. Q. J. Exp. Psychol. 71(10), 2162–2173 (2018)
13. Manssuer, L.R., Roberts, M.V., Tipper, S.P.: The late positive potential indexes a role for emotion during learning of trust from eye-gaze cues. Soc. Neurosci. 10(6), 635–650 (2015)
14. Maran, T., Furtner, M., Liegl, S., Kraus, S., Sachse, P.: In the eye of a leader: eye-directed gazing shapes perceptions of leaders' charisma. Leadersh. Q. 30, 101337 (2019)
15. Nummenmaa, L., Glerean, E., Viinikainen, M., Jääskeläinen, I.P., Hari, R., Sams, M.: Emotions promote social interaction by synchronizing brain activity across individuals. PNAS 109(24), 9599–9604 (2012)
16. Pfeiffer, U.J., Vogeley, K., Schilbach, L.: From gaze cueing to dual eye-tracking: novel approaches to investigate the neural correlates of gaze in social interaction. Neurosci. Biobehav. Rev. 37(10), 2516–2528 (2013)
17. Podladchikova, L.N., Koltunova, T.I., Shaposhnikov, D.G., Lomakina, O.V.: Individual features of viewing emotionally significant images. Neurosci. Behav. Physiol. 47(8), 941–947 (2017)
18. Podladchikova, L.N., Shaposhnikov, D.G., Koltunova, T.I.: Spatial and temporal properties of gaze return fixations while viewing affective images. Russ. J. Physiol. 104(2), 245–254 (2018) (in Russian)
19. Pönkänen, L.M., Alhoniemi, A., Leppänen, J.M., Hietanen, J.K.: Does it make a difference if I have an eye contact with you or with your picture? An ERP study. Soc. Cogn. Affect. Neurosci. 6(4), 486–494 (2011)
20. Redcay, E., Schilbach, L.: Using second-person neuroscience to elucidate the mechanisms of social interaction. Nat. Rev. Neurosci. 20(8), 495–505 (2019)
21. Rogers, S.L., Speelman, C.P., Guidetti, O., Longmuir, M.: Using dual eye tracking to uncover personal gaze patterns during social interaction. Sci. Rep. 8, 1–19 (2018)
22. Rubo, M., Gamer, M.: Social content and emotional valence modulate gaze fixations in dynamic scenes. Sci. Rep. 8(1), 1–11 (2018)
23. Samarin, A., Koltunova, T., Osinov, V., Shaposhnikov, D., Podladchikova, L.: Scanpaths of complex image viewing: insights from experimental and modeling studies. Perception 44(8–9), 1064–1076 (2015)
24. Scheller, E., Büchel, C., Gamer, M.: Diagnostic features of emotional expressions are processed preferentially. PLoS ONE 7(7), e41792 (2012)
25. Schilbach, L., Timmermans, B., Reddy, V., Costall, A., Bente, G., Schlicht, T., Vogeley, K.: Toward a second-person neuroscience. Behav. Brain Sci. 36, 393–462 (2013)
26. Smith, T.J., Mital, P.K.: Attentional synchrony and the influence of viewing task on gaze behavior in static and dynamic scenes. J. Vis. 8(16), 1–24 (2013)
27. Yun, K., Watanabe, K., Shimojo, Sh.: Interpersonal body and neural synchronization as a marker of implicit social interaction. Sci. Rep. 2, 959 (2012)
28. Zhang, D., Yao, L., Zhang, X., Wang, S., Chen, W., Boots, R.: Cascade and parallel convolutional recurrent neural networks on EEG-based intention recognition for brain computer interface. In: Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, pp. 1703–1710 (2018)
Is the Reinforcement Learning Theory Well Suited to Fit the Functioning of the Cerebral Cortex-Basal Ganglia System?

Irina A. Smirnitskaya(&)

Research Institute for System Analysis, Russian Academy of Sciences, Moscow, Russia [email protected]
Abstract. The research of W. Schultz in the late 1980s and early 1990s on the effect of uncertainty in reward delivery on the release of dopamine by dopaminergic structures of the midbrain, in behavioral experiments with monkeys [1, 2], highlighted the analogy between the amount of phasic dopamine release by dopaminergic structures and the reward prediction error of RL theory [3]. Since then, the functioning of the cortex-basal ganglia system has been analysed as a possible Reinforcement Learning (RL) [4] network. This system is an array of partly connected parallel loops. The basal ganglia are divided into dorsal and ventral subdivisions. In accordance with their functions, we can further distinguish four parts: the dorsolateral striatum, the dorsomedial striatum, the nucleus accumbens core, and the nucleus accumbens medial shell. The part of the whole cerebral cortex-basal ganglia system centered on the dorsolateral striatum may represent the action a used in RL theory; the part centered on the dorsomedial striatum may represent the action value Q(s,a); the part with the nucleus accumbens core may contain the state value V(s); and the part of this system based on the nucleus accumbens medial shell calculates the policy π, but in a different way than RL theory does.

Keywords: Reinforcement learning theory · Basal ganglia · Dorsomedial striatum · Dorsolateral striatum · Ventral striatum · Dopamine
1 Introduction

The hallmark of developed intelligence is the ability to respond flexibly to any change in the external environment. In artificial devices, this quality corresponds to the decision-making function. A widely used technique in modern brain research is to build analogies between the Reinforcement Learning theory (RL) [4] and the algorithm of that part of the brain that generates commands to permit or prohibit various actions. The decisive argument was the finding that the amount of dopamine released by dopaminergic structures is proportional to the reward prediction error, an important variable in the TD method of RL theory [3]. The part of the brain that works with dopamine includes the neocortex, thalamus, and basal ganglia. It is a distributed, hierarchically organized system. The principles of its organization can be clarified in special behavioral experiments.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 149–156, 2021. https://doi.org/10.1007/978-3-030-60577-3_17
2 Structure and Functions of the Basal Ganglia

The function of the cerebral cortex is to remember objects that the subject has ever encountered and to make decisions about actions with them. To perform actions with an object, a distributed representation of it is formed in the cortex, which reflects not only different characteristics of the object itself, but also the way we interact with similar objects in the surrounding world (Fig. 1).
Fig. 1. A schematic view of the areas of the cortex involved in decision-making (for primates). The object's visual features are represented in the posterior parietal cortex; the object, as a possible goal to deal with, is represented in the inferior temporal cortex; the object's value is represented in the orbitofrontal cortex; the areas that calculate the possibility of changing the current goal to a new one are in the medial prefrontal cortex and lateral cortex (not shown). The executive areas are the motor and premotor cortex, which operate under the permission of the dorsolateral and ventrolateral prefrontal cortex (not shown). Dotted lines indicate the insular cortex, hidden inside the lateral sulcus, and the medial prefrontal cortex, located on the medial wall of the hemisphere.
To make a decision, it is not enough to remember the object's visual features; one must also take into account its value (determined by the results of past actions with this object) and the value of other objects that can compete with it for our attention, and one must recognize that an object is often important not in itself, but as an element of some larger structure. These properties of the object are represented in different parts of the cortex (Fig. 1). Decision making occurs as a result of an exchange of signals, both intracortical and with subcortical structures such as the thalamus and the basal ganglia. The function of the basal ganglia in behavioral choice is twofold. In general, they are a two-level structure consisting of inhibitory neurons. The basal ganglia input module, the striatum, receives a signal from the cortex and sends a signal to the globus pallidus. The signal from the output module, the globus pallidus, goes to the thalamus. If the input from the cortex is strong enough to excite the striatum neurons,
then they inhibit the output module neurons, which in turn stop inhibiting the thalamus. This operation is called disinhibition. As a result, the thalamus transmits a signal to the cortex. If the cortical signal is weak, disinhibition does not occur, and the signal from the thalamus to the cortex does not pass. In this way, cortical patterns are contrasted: strong signals are amplified and weak signals are weakened. The second function of the striatum is closely related to the first: the striatum controls the release of the neuromodulator dopamine. Only in the presence of dopamine are the striatum neurons fired by a cortical signal. Only in the presence of dopamine does learning occur, i.e., an increase in the effectiveness of the cortical input weights in the striatum. Like the distributed representation of the external world in the cerebral cortex, the cortex-basal ganglia system is also distributed; its individual parts play different but complementary roles. Structurally, it is a set of parallel, partially connected loops [5]. The ventral striatum affects dopamine release more strongly, in this way keeping control of the dorsal striatum [6]. Shown in Fig. 2 are four subdivisions of the basal ganglia that play different, relatively independent roles. We try to interpret the data of behavioral experiments by assuming that the different parts of the striatum represent different variables of RL theory.
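The disinhibition gating described above can be caricatured with threshold rate units (the thresholds and gains below are illustrative, not taken from the paper):

```python
def thalamic_output(cortical_input, threshold=1.0):
    """Minimal caricature of basal-ganglia disinhibition.

    The striatum and globus pallidus are inhibitory, and the pallidum
    is tonically active.  A strong cortical signal fires the striatum,
    the striatum silences the pallidum, and the thalamus is released
    (disinhibited).  A weak signal leaves pallidal inhibition intact,
    so the thalamic signal is gated off.
    """
    striatum = cortical_input if cortical_input > threshold else 0.0
    pallidum_tonic = 1.0
    pallidum = max(pallidum_tonic - striatum, 0.0)  # inhibited by striatum
    thalamus = max(1.0 - pallidum, 0.0)             # inhibited by pallidum
    return thalamus

print(thalamic_output(2.0))  # strong input -> thalamus relays: 1.0
print(thalamic_output(0.5))  # weak input  -> thalamus gated:   0.0
```

The all-or-none contrast between the two calls is the "contrasting" of cortical patterns described in the text: above-threshold signals pass, below-threshold signals are suppressed.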
3 The Roles of Ventral and Dorsal Basal Ganglia Regions

Behavioral experiments revealed differences between the ventral and dorsal regions of the striatum. Pavlovian conditioning is the development of the ability to anticipate a reward when a cue signal (for example, audio or visual) appears. That is, a rat in an experimental chamber, on hearing the bell, runs to the food port because it has learned that after the signal, food will appear there. Instrumental conditioning differs from Pavlovian: in this case, the appearance of the cue signal requires the rat to perform an action such as a lever press; only then will food appear. Removing the dorsal striatum prevents instrumental learning; removing the ventral one prevents both types of learning. A detailed examination showed that the ventral and dorsal striatum are in turn subdivided into two parts that perform different functions [7] (Fig. 2).

3.1 Dorsal Striatum. Acquisition of Goal-Directed Actions and Formation of Habits
In instrumental learning, there are two stages: an early one, when the movement has been learned but errors still occur from time to time, and a late one, with execution brought to automatism. One can learn to perform stereotypical actions without reward, merely by repeating them many times. Therefore, actions performed to get a reward are called goal-directed, in contrast to stereotypical, habitual actions. The different roles of the dorsal striatum subdivisions were demonstrated in experiments [8] with outcome devaluation: first, the rats were trained to press a lever for sucrose; then sucrose was devalued by pairing its consumption with illness (induced by an injection of lithium chloride). After that, the sucrose no longer seemed attractive, and the rat stopped pressing the lever. If the same procedure was performed with a rat trained to automatism, the devaluation of the reward did not
affect the instrumental actions: after the cue signal, the rat still pressed the lever. Later [9], it was shown that the dorsomedial striatum is responsible for goal-directed actions; its lesion led to the same automatic behavior, in which the rat did not register the devaluation of the outcome and continued to perform the devalued action. A lesion of the dorsolateral striatum, on the contrary, made even well-learned actions sensitive to outcome devaluation [10]; it became impossible to develop a habit, in other words, all actions became goal-directed.
[Fig. 2 box labels: action value Q(s,a); state value V(s); non-RL, in-fly decision making; motor output for RL learning, habit learning; medial prefrontal cortex, amygdala, hippocampus; medial orbitofrontal cortex; prefrontal, parietal cortex; sensorimotor cortex; NAc medial shell; NAc core; dorsomedial striatum; dorsolateral striatum; ventral pallidum; GPi/SNr; VTA; SNc; dopamine]
Fig. 2. The distributed nature of decision-making by the cerebral cortex-basal ganglia system (the thalamus is not shown). Four subdivisions of the system are shown. The part of the whole cerebral cortex-basal ganglia system centered on the dorsolateral striatum represents the action a used in RL theory; the part centered on the dorsomedial striatum represents the action value Q(s,a); the part with the nucleus accumbens core (NAc core) contains the state value V(s); and the part of the system based on the nucleus accumbens medial shell calculates the policy π. Open arrows are excitatory glutamatergic inputs, solid arrows are inhibitory inputs, and dotted arrows indicate dopamine. In primates, the rodent dorsomedial striatum corresponds to the caudate head, and the dorsolateral striatum corresponds to the caudal putamen. The subdivisions of the ventral striatum are the nucleus accumbens core and the nucleus accumbens medial shell.
Previously, it was believed that the neurons of the dorsomedial striatum are active at the beginning of learning, and that as the stereotype is formed, the neurons of the dorsolateral subdivision become predominantly active: the activity "passes" to the region designed specifically for this function [11]. The study [12] verified this assumption. A rat was trained to press the lever 5 times in a row. The procedure was as follows: the rat was placed in an experimental chamber, where it waited for the appearance of the lever, after which it had to learn to make 5 lever presses. Then the lever was removed, which served as a signal for the appearance of food in the food port. At an early stage of learning, neurons in the dorsomedial part were active either when the lever appeared or disappeared (that is, at the "start" and "stop" signals), and neurons in the dorsolateral part were active during the entire period from the
appearance to the disappearance of the lever. At a later stage of training, their activity became almost identical, with a slightly more pronounced reaction of the dorsomedial striatum to the “start” and “stop” signals. That is, the transition of activity from the dorsomedial to the dorsolateral striatum did not occur. The experimental data showed that different striatal regions each perform their own function, but with repeated training the neurons of both sections begin to work in a similar way, synchronously with movements.

3.2 Ventral Striatum and the Control of Dopamine Release
Since the pioneering work [13], the ventral striatum has been considered the limbic–motor interface. After W. Schultz's discovery that the phasic dopamine release is proportional to the reward prediction error [3], the striatum's functioning has been described in terms of Reinforcement Learning theory [4]. The importance of the ventral striatum in learning is due to its effect on the phasic dopamine release in the midbrain structures. The RL theory explains the process of trial-and-error learning. Describing the learning of action sequences, the theory shows how a cue that predicts the appearance of the reward, and at first has no value for the learning subject, gradually acquires value during training. The striatum is divided into sections that differ both in their internal structure and in their connections with surrounding structures [14, 15]. In [16], it was found that the phasic release of dopamine in the NAc core became linked to a signal predicting future reward at an early stage of learning, confirming that this cue signal acquired value and that the activity of NAc core neurons is proportional to it. This is in agreement with RL theory. Moreover, [16] found that the release of dopamine in the NAc shell increases during all events requiring an immediate response (a possible explanation is given below, in the Discussion).
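The reward-prediction-error account can be illustrated with a minimal sketch of temporal-difference value learning. The states, rewards, and parameter values below are hypothetical, chosen only to show how a reward-predicting cue gradually acquires value through the prediction error δ (the quantity the phasic dopamine signal is proportional to):

```python
def td_update(V, s, s_next, r, alpha=0.1, gamma=0.9):
    """One TD(0) step: delta is the reward prediction error."""
    delta = r + gamma * V.get(s_next, 0.0) - V[s]
    V[s] += alpha * delta
    return delta

# A cue state reliably precedes a rewarded state: over trials the reward
# state's value settles near 1, and the cue itself acquires value ~gamma.
V = {"cue": 0.0, "reward": 0.0}
for _ in range(500):
    td_update(V, "cue", "reward", r=0.0)   # cue predicts the reward state
    td_update(V, "reward", None, r=1.0)    # reward delivered, episode ends
```

After training, the prediction error at the cue shrinks toward zero, mirroring how a dopamine response transfers from reward delivery to the predictive cue.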
4 Pavlovian-to-Instrumental Transfer

Pavlovian-to-instrumental transfer (PIT) is usually studied in animals, although it can also be observed in humans [7]. The rat participates in two experimental procedures. The first is Pavlovian training with two types of conditional signals: for example, a bell foreshadows the appearance of a grain pellet, and white noise signals an opportunity to lick a drop of sucrose. The second procedure is instrumental with the same reward options: two levers appear in the experimental chamber; pressing the left one leads to the delivery of a grain pellet, and pressing the right one is rewarded with sucrose. Finally, the trained rat is placed in a chamber with two levers and one of the sound signals from the Pavlovian training is presented. The rat will press the lever with which the same reward was paired in the Pavlovian stage. Lesioning the NAc shell erases the memory of the difference between stimuli in the previous procedures: the rat presses both levers with equal probability. Removal of the NAc core did not interfere with this memory [7].
I. A. Smirnitskaya
5 Insular Cortex and Ventral Striatum

From Fig. 2 we can infer that the cortex–basal ganglia–midbrain subsystems work as a whole: afferents to their cortical parts are afferents to the whole subsystem. The main cortical input to the NAc core comes from the insular cortex. If we search for an implementation of RL in the brain, there should be some structure in it that receives information about the amount of reward after an action is performed. The best candidate is the insular cortex, which receives inputs carrying information about food, pain, and the state of the internal milieu, that is, about everything resulting from the subject's behavioral choice. In addition, given that the state of the environment s in RL theory most often acts for the animal as the appearance of an object it wants to grasp or push, or the furnishings of a room it can or must not enter (that is, the “context”), or a sound characteristic of the context, the animal must remember the features of the object together with the corresponding reward. The insular cortex is divided into three areas: one that receives a reward signal from the internal environment (granular insula), one that receives signals about the external, sensory characteristics of objects (dysgranular insula), and one that sends signals to the executive structures (anterior insula). The involvement of the insular cortex in learning was confirmed, for example, in [17], where the activity of neurons in the insular cortex of mice was recorded during Pavlovian learning. In mice that learned that a sound signal foreshadows the appearance of food in the food port, neural activity appeared in the insular cortex at the moment of the sound signal, and the mouse ran to the food port in advance. Inactivation of the insular cortex removed this behavior. In [18], it was shown that the interaction of the insular cortex and NAc core mediates the effect of memory on behavioral choice.
The rats developed an instrumental reflex of lever pressing for a reward: pressing the left and right levers was rewarded with different types of food. Then one type of reward was devalued by giving unlimited access to that type of food. Rats with inactivation of the insular cortex in one hemisphere and the NAc core in the other hemisphere did not notice the devaluation.
6 Conclusion

At first glance, it seems inappropriate to compare a simple mathematical theory with the complex control apparatus of a living organism. However, the theory of reinforcement learning, being a mathematical theory, operates with the most simplified model of trial-and-error learning, which is the same kind of learning that animals use. This means that the search for analogues of the variables of RL theory among the modules of the brain is quite acceptable. Thus, it is possible to draw analogies between the parts of the cerebral cortex–striatum–dopaminergic structures complex and the modules calculating the control variables of RL theory:

1) a part centered in the dorsolateral striatum and an output unit that sends motor commands a;
2) a part centered in the dorsomedial striatum and a block that calculates the action value Q(s,a); this block can contain either a single action or an entire sequence of actions;

3) a part centered in the NAc core and a block containing a representation of the state value V(s).

The fourth block, centered in the NAc medial shell, which makes a real-time decision about an action, corresponds to the block that calculates the behavior strategy, the policy π of RL theory; importantly, however, it does not choose between a new and an already tested strategy according to the epsilon-greedy rule. This block is more complex: the choice is made through a dialogue between the medial prefrontal cortex, basolateral amygdala, and hippocampus [19]. The behavioral difference is obvious: humans and vertebrates have the property of curiosity, so that, in contrast to RL theory, the average probability of choosing a novel option is greater than the probability of choosing an already known one.

This review provides some data concerning the contribution of the brain system centered in the basal ganglia to learning and behavior. It reveals the acceptability of RL theory for characterizing this process. But there are two clear discrepancies. First, an animal needs only a few trials to solve a typical learning problem, which is orders of magnitude fewer than an artificial learning system requires. Perhaps this is due to the flexibility of calculating the policy by the appropriate structure, the fourth block with its center in the NAc medial shell. Second, an artificial system armed with Reinforcement Learning theory can solve only those tasks that were assigned by its creator.

Acknowledgements. The review was done within the 2020 state task 0065-2019-0003 “Research into Neuromorphic Big-Data Processing Systems and Technologies of Their Creation.”
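For contrast with the dialogue-based choice attributed above to the fourth block, the epsilon-greedy rule of standard RL can be sketched in a few lines. The action values are toy numbers for illustration only:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Standard RL choice rule: with probability epsilon explore uniformly
    at random; otherwise exploit the action with the highest value Q(s,a).
    Note the exploration probability is fixed, with no bias toward novelty."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# With epsilon = 0 the rule always exploits the best-known action.
choice = epsilon_greedy([0.2, 1.0, 0.5], epsilon=0.0)
```

The fixed, value-blind exploration term is exactly what the review argues the NAc-shell-centered circuit does not implement.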
References

1. Schultz, W.: Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J. Neurophysiol. 56(5), 1439–1461 (1986)
2. Mirenowicz, J., Schultz, W.: Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature 379(6564), 449–451 (1996)
3. Schultz, W., Dickinson, A.: Neuronal coding of prediction errors. Annu. Rev. Neurosci. 23, 473–500 (2000)
4. Sutton, R.S., Barto, A.G.: Reinforcement Learning. MIT Press, Cambridge (1998)
5. Alexander, G.E., DeLong, M.R., Strick, P.L.: Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Ann. Rev. Neurosci. 9, 357–381 (1986)
6. Haber, S.N., Fudge, J.L., McFarland, N.R.: Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. J. Neurosci. 20(6), 2369–2382 (2000)
7. Balleine, B.W., O'Doherty, J.P.: Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology 35, 48–69 (2010)
8. Adams, C.D., Dickinson, A.: Instrumental responding following reinforcer devaluation. J. Exp. Psychol. 33B, 109–121 (1981)
9. Yin, H.H., Ostlund, S.B., Knowlton, B.J., Balleine, B.W.: The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523 (2005)
10. Yin, H.H., Knowlton, B.J., Balleine, B.W.: Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19, 181–189 (2004)
11. Graybiel, A.M.: Habits, rituals, and the evaluative brain. Ann. Rev. Neurosci. 31(1), 359–387 (2008)
12. Vandaele, Y., Mahajan, N.R., Ottenheimer, D.J., Richard, J.M., Mysore, S.P., Janak, P.H.: Distinct recruitment of dorsomedial and dorsolateral striatum erodes with extended training. Elife 8, e49536 (2019)
13. Mogenson, G.J., Jones, D.L., Yim, C.Y.: From motivation to action: functional interface between the limbic system and the motor system. Prog. Neurobiol. 14(2–3), 69–97 (1980)
14. Groenewegen, H.J., Berendse, H.W., Meredith, G.E., Haber, S.N., et al.: Functional anatomy of the ventral, limbic system-innervated striatum. In: Willner, P., Scheel-Kruger, J. (eds.) The Mesolimbic Dopamine System, pp. 19–59. Wiley, New York (1991)
15. Zahm, D.S.: An integrative neuroanatomical perspective on some subcortical substrates of adaptive responding with emphasis on the nucleus accumbens. Neurosci. Biobehav. Rev. 24, 85–105 (2000)
16. Saddoris, M.P., Cacciapaglia, F., Wightman, R.M., Carelli, R.M.: Differential dopamine release dynamics in the nucleus accumbens core and shell reveal complementary signals for error prediction and incentive motivation. J. Neurosci. 35(33), 11572–11582 (2015)
17. Kusumoto-Yoshida, I., Liu, H., Chen, B.T., Fontanini, A., Bonci, A.: Central role for the insular cortex in mediating conditioned responses to anticipatory cues. Proc. Natl. Acad. Sci. U.S.A. 112(4), 1190–1195 (2015)
18. Parkes, S.L., Bradfield, L.A., Balleine, B.W.: Interaction of insular cortex and ventral striatum mediates the effect of incentive memory on choice between goal-directed actions. J. Neurosci. 35(16), 6464–6471 (2015)
19. Sil'kis, I.G.: The mechanisms of interdependent influence of prefrontal cortex, hippocampus and amygdala on the basal ganglia functioning and selection of behaviour. Zh. Vyssh. Nerv. Deiat. Im. I.P. Pavlova 64(1), 82–100 (2014)
Neural Activity Retaining in Response to Flash Stimulus in a Ring Model of an Orientation Hypercolumn with Recurrent Connections, Synaptic Depression and Slow NMDA Kinetics

Vasilii S. Tiselko1,2, Margarita G. Kozeletskaya1, and Anton V. Chizhov1,2

1 Ioffe Institute, Politekhnicheskaya Street, 26, 194021 St. Petersburg, Russia
[email protected]
2 Sechenov Institute of Evolutionary Physiology and Biochemistry of RAS, Torez pr., 44, 194223 St. Petersburg, Russia
Abstract. Cortical neural networks in vivo are able to retain their activity evoked by a short stimulus. We compare and analyze two mathematical models of ring-structured networks of an orientation hypercolumn of the visual cortex in order to distinguish the contributions of recurrent connections, synaptic depression and slow synaptic kinetics to the retention effect. Comparison of a more elaborate model with the classical ring model has helped to translate the mathematical analysis of the latter model to the former one. As shown, the network with developed recurrent connections reproduces a retention effect comparable to that in experiments. The synaptic depression prevents the effect; however, the long-lasting excitatory synaptic current recovers the property. When the slow kinetics of the NMDA receptors are taken into account, a characteristic post-peak plateau of activity is reproduced. The models show invariance to the contrast of visual stimuli. Simulations reveal a major role of strong excitatory recurrent connections in the retention effect.

Keywords: Flash-stimulus response retention · Recurrent connections · Neural activity
1 Introduction

The occurrence of prolonged neural activity in response to a brief flash-stimulus was initially described in studies related to visual perception [1–3]. With electrophysiological methods, the phenomenon of neural activity retention in response to a brief flash-stimulus (less than 50 ms) was observed in studies related to the processing of visual information in mice (10–20 ms in [4], 10–40 ms in [5], 1 ms in [6]), monkeys (25 ms in [7], 10–50 ms in [8], 10–50 ms in [9]), cats (50 ms in [10]), rats (5 ms in [11], 1–10 ms in [12]), and humans (18 ms in [13], 50 ms in [14]). Similar neural activity retention takes place in experiments with longer visual stimulation of humans (250 ms in [15]), and in responses to non-visual stimulation such as electrical stimulation of human fingertips (0.2 ms) [16]. Early and secondary excitation in response to a visual flash-stimulus have been distinguished [17, 18]. Since a single neuron is not capable of maintaining prolonged activity, the observed phenomenon must be determined by an active network property. Starting in the retina, visual information passes through the thalamic nucleus LGN to the primary visual cortex [19, 20]. The structure of each of these areas suggests the presence of recurrent connections between neurons, which are especially crucial for functional orientation hypercolumns in the primary visual cortex [21–23]. Despite numerous experimental observations, the exact mechanisms underlying the retention effect have not been revealed. In the present work, we investigated simple models of connected populations of neurons and tried to determine a «minimal» model capable of reproducing the effect. We considered the firing-rate ring model (Ring model) [24, 25] and another, more complex model with synaptic depression and slow kinetics of NMDA receptors (SDN model) [26].

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 157–163, 2021. https://doi.org/10.1007/978-3-030-60577-3_18
2 Methods

2.1 Firing-Rate Ring Model
The simple firing-rate ring model (Ring model) was inspired by the structure of neuronal connections in the orientation hypercolumn of the primary visual cortex [24, 25, 27], which provides an increase in orientation selectivity. The Ring model describes the activity of an orientation hypercolumn, in which neurons are evenly distributed along a ring according to the preferred orientation of a stimulus. The spike activity of inhibitory neurons in this model is recursively proportional to the activity of neighboring excitatory neurons. The firing-rate model is a system of first-order differential equations, in which the dependence of the firing rate on the synaptic current is represented by a linear threshold function:

$$\tau_r \frac{df_i}{dt} = -f_i + f^{st}(I_i^{syn}), \qquad (1)$$

where $\tau_r$ is the time constant, $f_i$ is the firing rate of the $i$-th population, $I_i^{syn}$ is the synaptic current of the $i$-th population, $\theta_i = 2\pi i/N$ is the angle of the preferred stimulus orientation, $N$ is the number of populations in the hypercolumn, and $f^{st}(x) = [x]_+$. The synaptic current of the $i$-th population is as follows:

$$I_i^{syn} = N^{-1} \sum_{j=1}^{N} \left(J_0 + J_1 \cos(\theta_i - \theta_j)\right) f_j + I_0 + I_1 \cos(\theta_i - \theta_0), \qquad (2)$$

where the first term is the total synaptic current generated by the recurrent connections, $J_0$, $J_1$ are the connection strength parameters, $I_0$ is the amplitude of the homogeneous part of the stimulus, and $I_1$ is the amplitude of the oriented part of the stimulus with orientation $\theta_0$.
2.2 SDN Model
The SDN model simulates the activity of neural populations with recurrent connections, synaptic depression and slow kinetics of NMDA channels. This model is similar to the previous one, but the basic system of differential equations describes the dynamics of the current instead of the population firing rate:

$$\tau \frac{dI^i(t)}{dt} = -I^i(t) + I_{lat}^i(t) + I_{stim}^i, \qquad (3)$$

where $\tau$ is the time constant of excitatory transmission, the input stimulus is $I_{stim}^i = A\left[\cos(2\pi(i-k)/N)\right]_+$, $I_{lat}^i(t) = N^{-1} \sum_j w_{ij}^{lat} P_{rel}^j(t)\, r^j(t)$ is the total input current from the lateral recurrent connections, and $w_{ij}^{lat} = g_{lat}\,(1 - \delta_{ij}) \cos(2\pi(i-j)/N)$ is the matrix of lateral recurrent connections, with the connection strength $g_{lat}$. The synaptic depression affects the current through the release probability $P_{rel}(t)$, which is given by the following first-order differential equation:

$$\tau_{depr} \frac{dP_{rel}(t)}{dt} = P_0 - \left(1 + \tau_{depr}\, r(t)(1 - f)\right) P_{rel}(t), \qquad (4)$$

where the parameter $\tau_{depr} = 500$ ms describes how quickly the synapses recover toward the default release probability $P_0 = 1$. The depression factor $f = 0.8$ describes how much the synapses depress after each spike, thus changing the release probability ($P_{rel} \to f P_{rel}$). The firing rate is determined by a simple threshold function equivalent to that in the Ring model, but calculated from the sum of the fast and slow excitatory processes, $r(t) = \left[I^{fast}(t) + I^{slow}(t)\right]_+$.
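As a quick check on Eq. (4), integrating it at a constant firing rate drives the release probability to the fixed point $P_0/(1 + \tau_{depr}\, r (1-f))$. In the sketch below, $\tau_{depr}$, $P_0$ and $f$ follow the values quoted in the text, while the rate $r$ is an illustrative choice of ours:

```python
# Integrating Eq. (4) at a constant firing rate r drives the release
# probability to its fixed point P0 / (1 + tau_depr * r * (1 - f)).
tau_depr, P0, f_depr = 0.5, 1.0, 0.8   # s, default release prob., depression factor
r, dt = 10.0, 0.001                    # firing rate (Hz), Euler step (s)

P = P0
for _ in range(20000):                 # 20 s, long enough to settle
    dP = (P0 - (1.0 + tau_depr * r * (1.0 - f_depr)) * P) / tau_depr
    P += dt * dP

P_star = P0 / (1.0 + tau_depr * r * (1.0 - f_depr))   # analytic fixed point
```

At 10 Hz the synapses settle at half their resting release probability, which is how sustained activity weakens its own recurrent drive in the SDN model.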
2.3 Quantification of the Retention Effect
Analyzing the profiles of neural activity in response to a flash stimulus, we estimate the index $C_{loc}$ as the ratio of the firing-rate peak amplitude to the amplitude at the plateau:

$$C_{loc} = \frac{r(t_{peak})}{r(t_{plateau})}, \qquad (5)$$

where $r(t) = \max_i r_i(t)$ is the maximal firing rate over all neurons. For most of the numerical experiments, the maximum peak was observed at the end of the stimulus, $t_{peak} = 18$ ms, and the characteristic time for the plateau was chosen to be $t_{plateau} = 100$ ms.
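Eq. (5) amounts to two lookups in the population-maximum rate trace. A small sketch (the function name and the synthetic rate trace are our own, for illustration):

```python
def retention_index(rates, dt, t_peak=0.018, t_plateau=0.100):
    """Eq. (5): C_loc = r(t_peak) / r(t_plateau), where r(t) is the maximum
    firing rate over all populations at time t. `rates` is a list of
    per-time-step lists of population rates; dt and times are in seconds."""
    r = [max(pop) for pop in rates]          # r(t) = max_i r_i(t)
    return r[round(t_peak / dt)] / r[round(t_plateau / dt)]

# Synthetic trace: a peak of 4.0 at 18 ms decaying to a plateau of 2.0.
rates = [[4.0] if t == 18 else [2.0] for t in range(200)]
C = retention_index(rates, dt=0.001)
```

A value of $C_{loc}$ close to 1 indicates strong retention (the plateau nearly matches the peak), while a large value indicates rapid decay.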
3 Results

The SDN model is similar to the Ring model without the synaptic depression $P_{rel}$. Both models are structured as a ring parameterized by the angle of the orientation preferred by each neuronal population. A neuronal population at each node of the ring evolves according to differential Eq. (1) in the Ring model and Eq. (3) in the SDN model. In contrast to the Ring model, where Eq. (1) is written for the firing rate, in the SDN model the main Eq. (3) is for the current, which then determines the firing rate. Comparison of the models reveals the correspondence between their parameters: the strength of the recurrent connections $J_1 \approx g_{lat}$, $J_0 = 0$, and the input stimulus. With these relations between the parameters, the models show quite similar solutions (Fig. 2a), thus confirming the similarity between the models. This allows us to use the results of the mathematical analysis of the Ring model for the SDN model. In particular, the types of solutions have been determined on the parameter plane $(J_0, J_1)$. The most important is the boundary between the domain of marginal solutions with bump attractors and the domain of amplitude instability [25]. The critical recurrent connection strength $g_{lat}$ is about 4 (Fig. 1).
Fig. 1. The retention effect ($C_{loc}$) as a function of the recurrent connection strength in responses to a flash-stimulus. Dotted lines indicate the boundaries of the amplitude instability domains for the various models: the Ring model, the complete SDN model with synaptic depression and NMDA, the SDN model without synaptic depression, and the SDN model without both synaptic depression and NMDA.
Simulations show that the network with developed recurrent connections reproduces the retention effect. In response to a flash-stimulus the activity decreases after the end of the stimulation (Fig. 2a). The response duration increases with the recurrent connection strength up to the critical boundary of the amplitude instability (Fig. 1).
Fig. 2. (a) Firing-rate response to a flash-stimulus (18 ms) in the Ring and SDN models with strong (solid lines) and weak (dashed line) recurrent connections. (b) Responses of the SDN model with and without the NMDA channels. (c–d) Profiles of the SDN solutions in the orientation domain in response to constant oriented stimuli of different contrasts show that the invariance to contrast is absent for weak recurrent connections (c) and takes place for strong ones (d), as in the Ring model [25].
Synaptic depression in the models based only on fast synaptic recurrent connections (without the slow kinetics of the NMDA receptors) abolishes the retention effect (Fig. 2a, dotted line). When the slow kinetics of the NMDA receptors are taken into account, a characteristic post-peak plateau of activity appears (Fig. 2b), similar to the plateau observed in experiments, for instance in [8]. In the SDN model with the NMDA current but without the depression, the amplitude instability is observed at smaller values of the recurrent connection strength (Fig. 1). The profile of the activity distribution along the ring in the Ring model is the “bump”. Interestingly, the presence of contrast invariance in the SDN model is consistent with the Ring model [25] because of the matched parameter values (Fig. 2c, 2d).
In the complete SDN model with the synaptic depression and slow NMDA kinetics, we observed a retention effect comparable to the experimental data (Fig. 2b). The maximum of the retention effect is reached when the recurrent connection strength is close to the boundary of the amplitude instability domain in the models without NMDA. With stronger connections, the peak amplitude increases and thus decreases the ratio $C_{loc}$ (Fig. 1).
4 Conclusion

Our analysis confirms that networks with strong recurrent connections are able to retain the activity evoked by a flash-stimulus. The comparison of the more elaborate SDN model with the classical Ring model has helped to translate the analysis of the latter model to the former one. The SDN model reveals that the synaptic depression prevents the retention effect; however, the long-lasting excitatory synaptic current recovers the property.
References

1. Sperling, G.: The information available in brief visual presentations. Psychol. Monogr. 74(11), 1–29 (1960)
2. Coltheart, M.: The persistences of vision. Phil. Trans. R. Soc. Lond. B Biol. Sci. 290(1038), 57–69 (1980)
3. Loftus, G.R., Duncan, J., Gehrig, P.: On the time course of perceptual information that results from a brief visual presentation. J. Exp. Psychol. Hum. Percept. Perform. 18(2), 530–549 (1992)
4. Land, R., Engler, G., Kral, A., Engel, A.K.: Response properties of local field potentials and multiunit activity in the mouse visual cortex. Neuroscience 254(19), 141–151 (2013)
5. Reinhold, K., Lien, A.D., Scanziani, M.: Distinct recurrent versus afferent dynamics in cortical visual processing. Nat. Neurosci. 18(12), 1789–1797 (2015)
6. Sachidhanandam, S., Sreenivasan, V., Kyriakatos, A., Kremer, Y., Petersen, C.C.: Membrane potential correlates of sensory perception in mouse barrel cortex. Nat. Neurosci. 16(11), 1671–1677 (2013). https://doi.org/10.1038/nn.3532
7. Ayzenshtat, I., Gilad, A., Zurawel, G., Slovin, H.: Population response to natural images in the primary visual cortex encodes local stimulus attributes and perceptual processing. J. Neurosci. 32(40), 13971–13986 (2012)
8. Muller, L., Reynaud, A., Chavane, F., Destexhe, A.: The stimulus-evoked population response in visual cortex of awake monkey is a propagating wave. Nat. Commun. https://doi.org/10.1038/ncomms4675
9. Reynaud, A., Takerkart, S., Masson, G.S., Chavane, F.: Linear model decomposition for voltage-sensitive dye imaging signals: application in awake behaving monkey. NeuroImage 54(2), 1196–1210 (2011)
10. Jancke, D., Chavane, F., Naaman, S., Grinvald, A.: Imaging cortical correlates of illusion in early visual cortex. Nature 428(6981), 423–426 (2004)
11. Griffen, T.C., Haley, M.S., Fontanini, A., Maffei, A.: Rapid plasticity of visually evoked responses in rat monocular visual cortex. PLoS One 12(9), e0184618 (2017)
12. Todorov, M.I., Kékesi, K.A., Borhegyi, Z., Galambos, R., Juhász, G., Hudetz, A.G.: Retinocortical stimulus frequency-dependent gamma coupling: evidence and functional implications of oscillatory potentials. Physiol. Rep. 4(19), e12986 (2016)
13. Keysers, C., Xiao, D.K., Foldiak, P., Perrett, D.I.: Out of sight but not out of mind: the neurophysiology of iconic memory in the superior temporal sulcus. Cogn. Neuropsychol. 22(3), 316–332 (2005)
14. Zhou, H., Davidson, M., Kok, P., McCurdy, L.Y., de Lange, F.P., Lau, H., Sandberg, K.: Spatiotemporal dynamics of brightness coding in human visual cortex revealed by the temporal context effect. NeuroImage. https://doi.org/10.1016/j.neuroimage.2019.116277
15. Podvalny, E., Yeagle, E., Mégevand, P., Sarid, N., Harel, M., Chechik, G., Mehta, A.D., Malach, R.: Invariant temporal dynamics underlie perceptual stability in human visual cortex. Curr. Biol. 27(2), 155–165 (2017)
16. Palva, S., Linkenkaer-Hansen, K., Näätänen, R., Palva, J.M.: Early neural correlates of conscious somatosensory perception. J. Neurosci. 25(21), 5248–5258 (2005)
17. Funayama, K., Minamisawa, G., Matsumoto, N., Ban, H., Chan, A.W., Matsuki, N., Murphy, T.H., Ikegaya, Y.: Neocortical rebound depolarization enhances visual perception. PLoS Biol. 13(8), e1002231 (2015)
18. Funayama, K., Hagura, N., Ban, H., Ikegaya, Y.: Functional organization of flash-induced V1 offline reactivation. J. Neurosci. 36(46), 11727–11738 (2016)
19. Usrey, W.M., Alitto, H.J.: Visual functions of the thalamus. Annu. Rev. Vis. Sci. 1, 351–371 (2015)
20. Van Essen, D.C., Lewis, J.W., Drury, H.A., Hadjikhani, N., Tootell, R.B., Bakircioglu, M., Miller, M.I.: Mapping visual cortex in monkeys and humans using surface-based atlases. Vision. Res. 41(10–11), 1359–1378 (2001)
21. Kaas, J.H.: Evolution of columns, modules, and domains in the neocortex of primates. Proc. Natl. Acad. Sci. U.S.A. 109, 10655–10660 (2012)
22. Mountcastle, V.B.: The columnar organization of the neocortex. Brain 120(4), 701–722 (1997)
23. Hübener, M., Shoham, D., Grinvald, A., Bonhoeffer, T.: Spatial relationships among three columnar systems in cat area 17. J. Neurosci. 17(23), 9270–9284 (1997)
24. Ben-Yishai, R., Bar-Or, R.L., Sompolinsky, H.: Theory of orientation tuning in visual cortex. Proc. Natl. Acad. Sci. U.S.A. 92(9), 3844–3848 (1995)
25. Hansel, D., Sompolinsky, H.: Modeling feature selectivity in local cortical circuits. In: Methods in Neuronal Modeling: From Ions to Networks, pp. 499–657. MIT Press (1998)
26. Van Rossum, M.C.W., van der Meer, M.A.A., Xiao, D., Oram, M.W.: Adaptive integration in the visual cortex by depressing recurrent cortical circuits. Neural Comput. 20(7), 1847–1872 (2008)
27. Smirnova, E., Chizhov, A.V.: Orientation hypercolumns of the visual cortex: ring model. Biofizika 56(3), 527–533 (2011)
Applications of Neural Networks
Deep Neural Networks for Orthophoto-Based Vehicle Localization

Alexander Rezanov and Dmitry Yudin

Moscow Institute of Physics and Technology, Institutsky Per. 9, Dolgoprudny, Moscow Region 141700, Russia
[email protected]
Abstract. Navigation of unmanned vehicles, especially using orthophotos, is a topic of active research. This paper is dedicated to the study of different orthophoto-based localization methods. For this task, a new dataset was created. It consists of pairs of ground-level and bird's-eye-view images collected on the vehicle test site of the technology contest Up Great “Winter City”. Different deep-network approaches to localization were used: 1) embedding-based, and 2) based on synthesis of a bird's eye view using the Pix2pix conditional generative adversarial network and masked cross-correlation in a map subwindow. The second approach has demonstrated good applicability for the proposed dataset. The mean absolute error of localization on known scenes reached 1 m. The average total time of bird's-eye-view generation and subsequent localization is from 0.1 s to 0.2 s. This is acceptable quality for the task solution and its further use as part of the navigation systems of unmanned vehicles.

Keywords: Localization · Unmanned vehicle · Orthophoto · Deep learning · Neural network · Generative adversarial network · Cross-correlation
1 Introduction

The low quality of coverage of precise global navigation satellite systems (e.g., GNSS with real-time kinematic, RTK) in urban areas and the high monthly price of RTK make it relevant to search for other methods of unmanned vehicle localization. Various methods already exist based on the use of cameras, IR images, lidars, and orthophotos. Existing approaches to orthophoto-based vehicle navigation are conveniently divided into two directions. The first direction uses classical computer vision approaches. The second one utilizes deep networks. Most of the classic computer vision approaches work in the same way: first, keypoints are extracted from the onboard image (or its perspective-warped copy), then they are matched with keypoints on the orthophoto [1, 2]. Deep-network-based approaches can be divided into three groups. The first group utilizes image embeddings to match the ground image and a satellite image patch [3, 4, 5]. The second group first transforms the onboard image into an orthophoto using conditional generative adversarial networks (cGAN) [6, 7]; then this orthophoto is matched with the ground-truth orthophoto [9]. The third group suggests real-time vehicle detection and localization based only on aerial images from a drone [10]. But such an approach requires constant simultaneous use of both an unmanned vehicle and a drone, which is not always possible. This paper is aimed at creating an approach to vehicle localization using an image from an onboard video camera and an orthophoto, but without using data from a global positioning system. We present an approach to creating an orthophoto-based dataset for vehicle localization and an analysis of different deep neural networks operating on it.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 167–174, 2021. https://doi.org/10.1007/978-3-030-60577-3_19
2 Problem Definition

During our work we have to find the vehicle coordinates (x, y) on the orthophoto image using onboard camera data. This localization task is very challenging because of the dramatic viewpoint change between the ground image and the orthophoto. The described case is notable for the complexity of the terrain: a small number of buildings and bad weather conditions, including snow. In this paper, the localization problem is considered under several restrictions: planar motion of the vehicle and known vehicle orientation. Also, the orthophoto and onboard images should be shot in similar conditions (weather, season and light). If conditions change dramatically, a new orthophoto should be made. In order to solve this localization problem, a new dataset was created during autonomous car testing on a real vehicle test site. This paper involves the study of two groups of localization methods: the embedding-based method and the cGAN-based method with cross-correlation. These two approaches utilize different metrics. We use top-1% accuracy for the embedding-based approach: the percentage of cases in which the true aerial image is among the 1% of aerial images closest to the query. For the cGAN- and cross-correlation-based approach we use the mean absolute error of localization, the distance between the ground-truth location and the found one. The ground-truth location is obtained from a GNSS module with RTK.
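The top-1% retrieval metric can be sketched directly. The descriptor vectors and function names below are toy illustrations of ours, not outputs of the actual networks:

```python
# Top-k-percent retrieval accuracy for embedding-based localization: a query
# counts as correct if its true aerial image ranks within the closest
# `percent` of all candidates by cosine distance.
def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return 1.0 - dot / norm

def top_percent_accuracy(queries, candidates, true_idx, percent=1.0):
    k = max(1, int(len(candidates) * percent / 100.0))
    hits = 0
    for q, t in zip(queries, true_idx):
        order = sorted(range(len(candidates)),
                       key=lambda i: cosine_distance(q, candidates[i]))
        hits += t in order[:k]
    return hits / len(queries)
```

With 884 test pairs, for instance, top-1% means the true patch must rank within roughly the 8 nearest candidates.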
3 Dataset Preparation

In this section we discuss dataset preparation for our research. Rosbag files and a drone-made orthophoto are used. The dataset includes pairs of onboard images and bird's eye view images. It was formed during autonomous car tests of the Auto-RTK team on the vehicle test site of the technology contest Up Great “Winter City” [11]. The dataset includes 8051 train image pairs (7 tracks) and 884 test pairs (1 track). The test track was selected based on a large number of intersections with other tracks and complicated turns. Our approach (shown in Fig. 1) is the following:

1. Ground truth GPS coordinates and images from the onboard camera are read from rosbag.
2. The time stamps of the GPS coordinates (100 measurements per second) and the stored images (10 frames per second) are matched.
3. GPS coordinates are transformed to UTM.
Deep Neural Networks for Orthophoto-Based Vehicle Localization
Fig. 1. Illustration of dataset preparation
4. Then the UTM trajectories are superimposed on the orthophoto (Fig. 1).
5. Keypoints are chosen to solve the system of Eqs. (1) and (2):

$X_{image} = k_X X_{utm} + b_X \quad (1)$

$Y_{image} = k_Y Y_{utm} + b_Y \quad (2)$
6. The found coefficients are used to transform the trajectories and crop the bird's eye view corresponding to a point on the trajectory: we cut the orthophoto part oriented along the tangent to the trajectory of movement (the lower edge of the figure is perpendicular to the direction of movement of the vehicle). Then the image is rotated so that the velocity vector points up. The image size is 400 × 400 pixels, chosen so that the visible area of the road scene corresponds to that of the onboard camera image.
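Step 5 amounts to fitting the linear coefficients of Eqs. (1) and (2) from matched keypoints; a least-squares sketch (function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def fit_utm_to_image(utm_pts, img_pts):
    """Fit X_image = k_X * X_utm + b_X and Y_image = k_Y * Y_utm + b_Y
    (Eqs. (1)-(2)) from matched keypoints via least squares."""
    coeffs = []
    for axis in (0, 1):
        # Design matrix [coord, 1] for the linear model k * coord + b.
        A = np.stack([utm_pts[:, axis], np.ones(len(utm_pts))], axis=1)
        k, b = np.linalg.lstsq(A, img_pts[:, axis], rcond=None)[0]
        coeffs.append((k, b))
    return coeffs  # [(k_X, b_X), (k_Y, b_Y)]
```

With more than two keypoints per axis, the least-squares fit also averages out small annotation errors in the chosen keypoints.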
4 Methods

4.1 Embedding-Based Localization
Embedding-based approaches utilize a deep network (CVM-NET-I) to create descriptors of the aerial image and the ground image. Then the closest aerial image in terms of cosine distance is selected. This approach can be improved by using Markov localization [4]. In this research we use CVM-NET-I [3] as the baseline architecture for creating descriptors for bird's eye view and aerial images: it outperforms CVM-NET-II and its code is publicly available. The approach is shown in Fig. 2.
4.2 Orthophoto-Based Vehicle Localization Using Synthesized Bird's Eye View and Cross-Correlation
This research proposes an approach to the localization task whose structure is shown in Fig. 3. The pix2pix [12] conditional generative adversarial network is used to synthesize bird's eye view images, and cross-correlation is used to find the synthesized bird's eye view on the orthophoto. Also, experiments have shown that searching in a subwindow (white square in Fig. 3) at a distance of no more than 30 m from the previous location area (blue square) significantly improves the results.
Fig. 2. CVM-NET-based approach to localization task
Fig. 3. cGAN-based localization with cross-correlation
Bird's eye view images obtained with pix2pix were rotated using information about vehicle orientation so that they are oriented in the same way as the orthophoto (see Fig. 3). Then
these fragments of the top view were localized on the map using cross-correlation with a mask:

$$R(x,y) = \frac{\sum_{x',y' \in M} T(x',y')\, I(x+x', y+y')}{\sqrt{\sum_{x',y' \in M} T(x',y')^2 \cdot \sum_{x',y' \in M} I(x+x', y+y')^2}} \quad (3)$$

Several improvements are used to increase performance:
• mask (M) based cross-correlation (black areas are not used to find R(x, y));
• searching for the next localization near the previous coordinates (in a subwindow, see Table 2);
• resolution is decreased by the scale factor to improve inference time (Table 2).
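Equation (3) can be mirrored term by term in a direct (unoptimized) NumPy loop; in practice a library routine such as OpenCV's template matching with a mask would be used instead. Names below are illustrative, not the authors' code:

```python
import numpy as np

def masked_ncc(image, template, mask):
    """Masked normalized cross-correlation of Eq. (3).
    Returns a map indexed as out[y, x]; pixels where mask is 0
    are excluded from both numerator and denominator."""
    th, tw = template.shape
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    t = template * mask
    t_energy = np.sqrt(np.sum(t * t))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + th, x:x + tw] * mask
            denom = t_energy * np.sqrt(np.sum(patch * patch))
            if denom > 0:
                out[y, x] = np.sum(t * patch) / denom
    return out
```

The best localization is the argmax of the returned map; restricting the loop ranges to a subwindow around the previous position gives the speedups reported in Table 2.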
5 Experimental Results

5.1 CVM-NET-Based Approach
We trained CVM-NET-I for 100 epochs on 8051 train image pairs (7 tracks). The same test dataset with 884 test pairs (1 track) is used. We used a pretrained model with a starting learning rate of 0.00001, dropped every 30 epochs. The embedding-based approach has shown poor performance on our data. This can be explained by the fact that our onboard images contain less information than CVUSA (360° view) and the Vo and Hayes data (part of a panorama made by a stereo camera) (Table 1).

Table 1. CVM-Net approach quality

Dataset         Top 1% accuracy
Vo and Hayes    91.4%
CVUSA           67.9%
Our             40%
5.2 Conditional Generative Adversarial Network for Bird's Eye View Reconstruction with Cross-Correlation
The pix2pix conditional generative adversarial network was trained for 200 epochs. The initial learning rate was 0.001 and it was dropped every 50 epochs. The obtained results are shown in Fig. 4.
Fig. 4. Results of trained pix2pix in two columns: input onboard image on the left, ground truth bird's eye view next and cGAN-generated bird's eye view (BEV) on the right
Also, the performance of the approach was measured using a computer with an NVidia RTX 2080 8 GB GPU, an Intel Core i5-8400 CPU, and 16 GB RAM. It takes 0.042 s to reconstruct a bird's eye view.

Table 2. Localization time performance

Localization method                       Scale   Time, s
Without subwindow                         4       0.230
                                          8       0.050
Subwindow with 7× size of the template    4       0.150
                                          8       0.150
Subwindow with 4× size of the template    4       0.063
                                          8       0.047
Fig. 5. Absolute error of vehicle localization
The localization error of this approach is shown in Fig. 5. The average error of vehicle localization on the test data in known areas is about 1 m. At the same time, when the vehicle meets previously unfamiliar areas and makes unfamiliar maneuvers, the error increases. This, however, can be overcome by expanding the training set. During the project, the method's limitations were clarified:
• a clear difference between the texture of the road and the space around it, without obvious details, is needed (this can be improved by orthophoto segmentation);
• knowledge of the planar orientation of vehicle motion with an error within 5°;
• the ability to localize accurately only in previously seen areas under similar lighting conditions and similar road scenes.
6 Conclusions

The paper describes a solution to the orthophoto-based localization task. For this purpose, a new dataset was created. It consists of pairs of ground-level and bird's eye view images collected on the vehicle test site of the technology contest Up Great “Winter City”. Different deep network approaches to localization were studied: embedding-based, and based on a cGAN-synthesized bird's eye view with masked cross-correlation in a map subwindow. The second approach has demonstrated good applicability on the proposed dataset. The mean absolute error of localization on known scenes reached 1 m. The average total time of bird's eye view generation and subsequent localization is from 0.1 s to 0.2 s. This is acceptable quality for solving the task and for further use as part of the navigation systems of unmanned vehicles.

Acknowledgments. This study was carried out under the contract with the Scientific-Design Bureau of Computing Systems (SDB CS) and supported by the Government of the Russian Federation (Agreement No 075-02-2019-967). The authors are also grateful to the Auto-RTK team of the technology contest Up Great “Winter City” who provided source data for the prepared dataset.
References

1. Viswanathan, A., Pires, B.R., Huber, D.: Vision based robot localization by ground to satellite matching in GPS-denied situations. In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 192–198 (2014)
2. Lefèvre, S., Tuia, D., Wegner, J.D., Produit, T., Nassaar, A.S.: Toward seamless multiview scene analysis from satellite to street level. Proc. IEEE 105, 1884–1899 (2017)
3. Hu, S., Feng, M., Nguyen, R.M., Lee, G.H.: CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7258–7267 (2018)
4. Hu, S., Lee, G.H.: Image-based geo-localization using satellite imagery. Int. J. Comput. Vis. 1–15 (2019)
5. Sun, B., Chen, C., Zhu, Y., Jiang, J.: GeoCapsNet: aerial to ground view image geolocalization using capsule network (2019). ArXiv, abs/1904.06281
6. Tang, H., Xu, D., Sebe, N., Wang, Y., Corso, J.J., Yan, Y.: Multi-channel attention selection GAN with cascaded semantic guidance for cross-view image translation. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2412–2421 (2019)
7. Regmi, K., Borji, A.: Cross-view image synthesis using geometry-guided conditional GANs. Comput. Vis. Image Underst. 187, 102788 (2018)
8. Regmi, K., Borji, A.: Cross-view image synthesis using conditional GANs. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3501–3510 (2018)
9. Regmi, K., Shah, M.: Bridging the domain gap for ground-to-aerial image matching. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 470–479 (2019)
10. Yudin, D.A., Skrynnik, A., Krishtopik, A., Belkin, I., Panov, A.I.: Object detection with deep neural networks for reinforcement learning in the task of autonomous vehicles path planning at the intersection. Opt. Mem. Neural Netw. (Inf. Opt.) 28(4), 283–295 (2019)
11. Final tests. Technology contest «Winter City». https://en.city.upgreat.one/final/. Accessed 29 Apr 2020
12. Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976 (2017)
Roof Defect Segmentation on Aerial Images Using Neural Networks

Dmitry A. Yudin¹, Vasily Adeshkin¹, Alexandr V. Dolzhenko², Alexandr Polyakov², and Andrey E. Naumov²

¹ Moscow Institute of Physics and Technology, Moscow Region, Institutsky Per. 9, Dolgoprudny 141700, Russia
[email protected]
² Belgorod State Technological University named after V.G. Shukhov, Kostukova Str. 46, Belgorod 308012, Russia
Abstract. The paper describes the usage of deep neural networks for flat roof defect segmentation on aerial images. Architectures such as U-Net, DeepLabV3+ and HRNet + OCR are studied for the recognition of five categories of roof defects: “hollows”, “swellings”, “folds”, “patches” and “breaks”. The paper introduces the RoofD dataset containing 6400 image pairs: aerial photos and corresponding ground truth masks. Based on this dataset, different approaches to neural network training are analyzed. A new SDice coefficient with categorical cross-entropy is studied for precise training of U-Net and the proposed light U-NetMCT architecture. Weighted categorical cross-entropy is studied for DeepLabV3+ and HRNet + OCR training. It is shown that these training methods allow correct recognition of rare categories of defects. The state-of-the-art multi-scale HRNet + OCR model achieves the best quality metric of 0.44 mean IoU. In terms of inference time, the fastest models are U-NetMCT and DeeplabV3+, with a worse quality of 0.33–0.37 mean IoU. The most difficult category for segmentation is “patches” because of the small number of images with this category in the dataset. The paper also demonstrates the possibility of implementing the obtained models in special software for automation of roof state examination in industry, housing and communal services.

Keywords: Image segmentation · Roof defect · Aerial image · Neural network · Deep learning
1 Introduction

Roof defect recognition on aerial images is of serious importance for building state examination in industry, housing and communal services. The automation of this process is a natural application of deep neural networks, which can give a precise end-to-end solution to the image segmentation task. At the same time, there are approaches to defect detection using classical methods of computer vision [1], but they provide a narrower solution that is difficult to transfer to other tasks.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 175–183, 2021. https://doi.org/10.1007/978-3-030-60577-3_20
In general, the problems of defect recognition on various surfaces are solved in similar ways. Some publications are devoted to the classification of local defects using convolutional neural networks, for example steel surface defects [2] and rail surface defects [3]. Other researchers treat defect recognition as an image segmentation task. Paper [4] discusses semantic segmentation of deflectometric recordings for quality control of reflective surfaces with a U-Net-based method. The article [5] explores the use of FCN and SegNet architectures for automatic pixel-level multiple damage detection of concrete structures, demonstrating fairly high quality. In the examination of buildings and structures, the analysis of defects of widespread flat roofs is in high demand [6]. Figure 1 shows examples of photographs of such defects made from a drone. At the same time, no open datasets with such images were found. Therefore, in this study serious attention is paid to the preparation of our own dataset.
Fig. 1. Examples of aerial images with flat roof defects: a - “hollows”, b - “swellings”, c - “folds”, d - “patches” and e - “breaks”
The paper contains the following new contributions of the authors to solving problems in the area of aerial photograph segmentation: 1) a new dataset containing labeled images of flat roof defects, 2) the study of new approaches to training modern neural networks in the segmentation task: loss functions based on the proposed modified Dice coefficient (SDice), as well as weighted categorical cross-entropy.
2 Task Formulation

The paper considers flat roof defect recognition as a task of segmentation of color aerial images obtained from a drone. Photographs should be taken from approximately the same distance and should have a close meter/pixel ratio for image areas containing roof segments. The 5 most common categories of defects are considered: “hollows”, “swellings”, “folds”, “patches” and “breaks” (Fig. 1). The work is aimed at studying various training methods for different neural network segmentation models that have demonstrated high quality in other practical applications. The importance of such a study is due, on the one hand, to the small amount of existing labeled data and, on the other hand, to the need to recognize defect types that are rarely found in images. The research involves developing our own dataset and estimating the quality of neural network
algorithms on it using the Intersection over Union (IoU) metric, as well as their inference time.
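The per-category IoU used throughout this work can be computed from integer label masks as follows (a generic sketch, not the authors' implementation):

```python
import numpy as np

def per_category_iou(pred, gt, num_classes):
    """Intersection over Union for each category ID in [0, num_classes).
    pred and gt are integer label masks of the same shape."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        # NaN marks a category absent from both prediction and ground truth.
        ious.append(inter / union if union > 0 else float("nan"))
    return ious
```

Averaging the returned list (ignoring NaN entries) gives the mean IoU reported in the experiments.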
3 Dataset Preparation

In the course of solving the assigned task, a new dataset was created containing 1618 manually labeled images. The labeling was carried out with polygonal areas using OpenCV CVAT [7] for the five flat roof defect categories mentioned above. Next, data augmentation was performed using the following sequential steps:

1. Affine transformation – image rotation by a random angle from −15 to 15 degrees, translation along x and y from −10% to +10% of width and height respectively.
2. Horizontal flip with 0.5 probability.
3. Image resizing to a fixed size of 1024 × 576 pixels.

The finally formed RoofD dataset contains 6400 color image pairs of size 1024 × 576 divided into two parts: train (5184 images) and test (1216 images). Its details are shown in Table 1. Several image pair examples from the dataset are illustrated in Fig. 2.

Table 1. Details of RoofD Dataset

Segment category  ID  Color (RGB format)  Pixels, train  Pixels, test  Images, train  Images, test
Background        0   (0, 0, 0)           2749885350     638910979     5184           1216
Hollows           1   (255, 255, 0)       154395478      34493513      3598           751
Swellings         2   (153, 255, 204)     143853128      42520381      3307           896
Folds             3   (6, 89, 182)        8763604        1098290       1199           208
Patches           4   (0, 255, 255)       642958         126472        107            31
Breaks            5   (255, 193, 255)     107098         76349         54             32
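Steps 1 and 2 of the augmentation above can be sketched library-agnostically by sampling the random parameters and composing the corresponding 2×3 affine matrix (rotation about the image center plus translation); the function name and structure are our own, not the authors' pipeline:

```python
import numpy as np

def sample_augmentation(rng, width, height):
    """Sample augmentation parameters (rotation -15..15 deg, translation
    within +/-10% of size, horizontal flip with p = 0.5) and build the
    2x3 affine matrix for rotation about the image center plus shift."""
    angle = np.deg2rad(rng.uniform(-15, 15))
    tx = rng.uniform(-0.1, 0.1) * width
    ty = rng.uniform(-0.1, 0.1) * height
    flip = rng.random() < 0.5
    c, s = np.cos(angle), np.sin(angle)
    cx, cy = width / 2, height / 2
    # p' = R (p - center) + center + t, written as a single affine matrix.
    m = np.array([
        [c, -s, cx - c * cx + s * cy + tx],
        [s,  c, cy - s * cx - c * cy + ty],
    ])
    return m, flip
```

The matrix can then be fed to any warping routine (e.g. an OpenCV-style warpAffine), followed by the flip and the resize to 1024 × 576.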
Fig. 2. Examples from the proposed RoofD dataset with aerial images and corresponding masks
The proposed RoofD dataset is publicly available: https://github.com/yuddim/RoofD.
4 U-Net-Based Models Training with Modified Dice

Usually, when training neural network models based on the popular U-Net architecture [8], the categorical cross-entropy loss function is used:

$$L_{CCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} 1_{y_i \in S_c} \ln p[y_i \in S_c],$$

where N is the number of pixels in the mask, C is the number of categories, y_i is the predicted segment for the i-th pixel, S_c is the ground truth segment of the c-th category, p[y_i ∈ S_c] is the predicted probability of the i-th pixel belonging to the c-th category, and 1_{y_i ∈ S_c} is the indicator function, equal to 1 if the predicted segment for the i-th pixel is equal to the ground truth S_c.

To detect small objects, the Dice coefficient is often used as the loss function [9, 10]:

$$Dice_c = \left(2\sum_{i=1}^{N} \hat{y}_i^c y_i^c + \epsilon\right) \Big/ \left(\sum_{i=1}^{N} \hat{y}_i^c + \sum_{i=1}^{N} y_i^c + \epsilon\right),$$

where ε is a small constant equal to 1, ŷ_i^c is the ground truth value of the output segmentation mask for the i-th pixel of the c-th category, and y_i^c is the predicted value of the output mask for the i-th pixel of the c-th category. At the same time, if the desired segments are present only on some images (or a small fraction of the images), then the optimization method training the neural network to obtain a high Dice coefficient (low loss) will prefer to generate a zero mask at the output. To avoid this problem, the paper proposes to use the modified Dice coefficient SDice, which “works” only on those images where there are objects of the category for which it is used:

$$SDice_c = \begin{cases} 0, & \text{if } \sum_{i=1}^{N} \hat{y}_i^c < \delta, \\ Dice_c, & \text{otherwise}, \end{cases}$$

where δ is the threshold value corresponding to the minimum number of pixels in the true segment. It is the same for all categories and is assumed to be 10 pixels. The loss function looks like:

$$L_{SDice} = 1 - \frac{1}{C}\sum_{c=1}^{C} a_c\, SDice_c, \qquad a_c = N_{all} / N_c,$$

where N_all is the total number of images in the training set, N_c is the number of images in the training set containing objects of the c-th category, and a_c is the weight of the c-th category. During training the loss function is averaged over all examples, so some of the examples with this approach will fall out of consideration.
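The definitions above can be re-implemented directly in NumPy (a sketch from the stated formulas, not the authors' code; eps = 1 and delta = 10 follow the text, and the one-hot mask layout is our own assumption):

```python
import numpy as np

def sdice_loss(y_true, y_pred, class_weights, eps=1.0, delta=10):
    """L_SDice over one-hot masks of shape (N_pixels, C).
    SDice_c is zeroed when the ground-truth segment of category c
    contains fewer than delta pixels."""
    num_classes = y_true.shape[1]
    total = 0.0
    for c in range(num_classes):
        gt, pr = y_true[:, c], y_pred[:, c]
        if gt.sum() < delta:      # category absent on this image
            sdice = 0.0
        else:
            sdice = (2.0 * np.sum(gt * pr) + eps) / (gt.sum() + pr.sum() + eps)
        total += class_weights[c] * sdice
    return 1.0 - total / num_classes
```

With a perfect prediction of a present category and a gated-out rare category, the loss reduces to 1 minus the weighted average of the surviving SDice terms.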
This article explores the possibility of improving the quality of training both the U-Net model and its light modification U-NetMCT, shown in Fig. 3, using the integrated loss function

$$L_{CCE+SDice} = L_{CCE} + L_{SDice}.$$
5 Weighted Training of DeeplabV3+ and HRNet + OCR

We also explore the possibility of roof defect segmentation based on other modern architectures: DeeplabV3+ [11] and HRNet + OCR [12], which were not previously used for this task. The first architecture, through the use of a lightweight Xception-based backbone [13] and the concept of Atrous Spatial Pyramid Pooling [11], provides significantly higher speed than U-Net. The second, thanks to the deep HRNet model [14] and refinement of the resulting segments using the Object-Contextual Representation (OCR) approach [12], provides state-of-the-art segmentation quality for a number of practical tasks, for example, road scene recognition.
Fig. 3. U-NetMCT architecture: an encoder-decoder network taking a 1024×576 color image as input and producing 6 segmentation maps of size 1024×576. It is built from conv2D 3×3 ReLU layers (including dilated convolutions with dilation = 2), 2×2 max pooling down to 256×144 resolution, Dropout 0.5 with conv2DTranspose 2×2 upsampling and ReLU, concatenation skip connections, and a final 1×1 conv2D with Softmax.
When training these neural network models, a weighted modification of the traditional loss function based on categorical cross-entropy was studied:

$$L_{wCCE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \beta_c\, 1_{y_i \in S_c} \ln p[y_i \in S_c],$$
where β_c is the weight for the c-th roof defect category. The vector of weights was chosen empirically and has the form β = [0.05, 0.50, 0.50, 0.50, 1.00, 1.00]. This approach allows us to increase the significance (weight) of rare classes of segments that are found only in a small part of the images.
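A minimal NumPy sketch of L_wCCE with the weight vector from the text (the function itself is illustrative, not the authors' code):

```python
import numpy as np

def weighted_cce(y_true, y_prob, beta):
    """Weighted categorical cross-entropy L_wCCE: y_true is one-hot of
    shape (N, C), y_prob holds predicted class probabilities, beta
    weights the categories."""
    n = y_true.shape[0]
    true_class = np.argmax(y_true, axis=1)
    # Predicted probability of each pixel's true class, weighted by beta.
    p = y_prob[np.arange(n), true_class]
    w = np.asarray(beta)[true_class]
    return -np.mean(w * np.log(p + 1e-12))

# Weights from the text: background down-weighted, rare defects up-weighted.
beta = [0.05, 0.50, 0.50, 0.50, 1.00, 1.00]
```

The small 1e-12 constant only guards against log(0) and has negligible effect on the loss value.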
6 Experimental Results The experiments had performed using hardware platform with NVidia Tesla V100 graphics card 32 GB, central processor Intel Xeon Gold 6154 and 128 GB DDR4 RAM. Methods are implemented on the Python 3.7 programming language and deep learning frameworks Tensorflow (U-Net and DeeplabV3+) and PyTorch (HRNet + OCR) with NVidia CUDA technology. Quality of roof defect segmentation was estimated on test sample of RoofD dataset using Intersection over Union (IoU) metric per each category. As speed metric, we had used average inference time (pre- and post-processing steps were not taken into account). Table 2 consists quality and speed estimation for all used neural networks with different training approaches. It shows that addition of SDice to loss metric for UNet-based models improves their quality. In addition, usage of weighted categorical cross-entropy significantly increase IoU metric of DeeplabV3 + and HRNet + OCR for all roof defect categories. Table 2. Quality and speed of roof defect segmentation on test sample of RoofD dataset Metric
U-Net CCE
Background IoU, % Hollows IoU, % Swellings IoU, % Folds IoU, % Patches IoU, % Breaks IoU, % Mean IoU, % Inference time, s
CCE +SDice
U-NetMCT
DeeplabV3+
CCE
CCE
CCE +SDice
HRNet + OCR
wCCE Singlescale + CCE
Singlescale + wCCE
Multiscale + wCCE
93.66 93.91
93.63 93.30
91.52 93.76
91.29
93.32
93.24
47.53 48.59
38.56 37.12
49.65 52.00
51.33
50.84
52.26
34.52 34.39
25.06 24.03
35.49 36.14
36.43
37.05
38.48
22.50 23.88 10.29 12.84
16.81 21.47 0.06 9.27
20.11 22.00 0.00 0.08
25.02 3.87
28.28 16.37
29.64 12.35
23.59 26.80 38.68 40.07 0.088 0.088
0.36 12.95 29.08 33.02 0.043 0.043
0.00 20.36 32.79 37.39 0.047 0.047
39.16 41.18 0.094
33.01 43.15 0.094
38.87 44.14 2.401
Figure 4 demonstrates the output segmentation masks for the models with the best training techniques.
Fig. 4. Output segmentation masks: a – ground truth, b – U-Net (CCE + SDice), c – U-NetMCT (CCE + SDice), d – DeeplabV3+ (wCCE), e – HRNet + OCR (single-scale + wCCE), f – HRNet + OCR (multi-scale + wCCE).
The most stable result is obtained for HRNet + OCR. Its multi-scale version shows similar quality to the single-scale one but is 25 times slower. U-NetMCT is the fastest model but it has a lot of noise in the segmentation masks because of its small receptive field. As a compromise option for further practical application, we propose using the single-scale HRNet + OCR model, which achieves high quality in a time comparable to the operation of U-Net. The described neural network segmentation methods can be implemented in practice in the form of an off-line application for automated state examination of flat building roofs and generation of reports for downloadable aerial images.
7 Conclusions

Different approaches were studied for training modern deep neural networks in the roof defect segmentation task. The new SDice coefficient with categorical cross-entropy is analyzed for precise training of U-Net and the proposed light U-NetMCT architecture. Weighted categorical cross-entropy is used to improve DeepLabV3+ and HRNet + OCR training. It is shown that these training methods allow correct recognition of rare categories of roof defects. The state-of-the-art multi-scale HRNet + OCR model achieves the best quality metric of 0.44 mean IoU. In terms of inference time, the fastest
models are U-NetMCT and DeeplabV3+, with a worse quality of 0.33–0.37 mean IoU. The most difficult category for segmentation is “patches” because of the small number of images with this category in the dataset. The compromise option for further practical application is the single-scale HRNet + OCR model, which achieves high quality with inference time comparable to the operation of U-Net. The paper also demonstrates the possibility of implementing the obtained models in special software for automation of roof state examination in industry, housing and communal services.

Acknowledgment. The task formulation, RoofD dataset and training approaches of deep neural networks (with the modified Dice coefficient and weighted cross-entropy) were developed during the project of the Russian Fund of Basic Research No 18-47-310009. Experimental results were obtained during works supported by the Government of the Russian Federation (Agreement No. 075-02-2019-967).
References

1. Kofler, C., Spöck, G., Muhr, R.: Classifying defects in topography images of silicon wafers. In: Winter Simulation Conference (WSC), pp. 3646–3657 (2017)
2. Soukup, D., Huber-Mörk, R.: Convolutional neural networks for steel surface defect detection from photometric stereo images. Lect. Notes Comput. Sci., vol. 8887, pp. 668–677 (2014)
3. Faghih-Roohi, S., et al.: Deep convolutional neural networks for detection of rail surface defects. In: International Joint Conference on Neural Networks (IJCNN), pp. 2584–2589 (2016)
4. Maestro-Watson, D., Balzategui, J., Eciolaza, L., Arana-Arexolaleiba, N.: Deflectometric data segmentation for surface inspection: a fully convolutional neural network approach. J. Electron. Imaging 29(4), 041007 (2020)
5. Li, S., Zhao, X., Zhou, G.: Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput. Aided Civil Infrastruct. Eng. 34(7), 616–634 (2019)
6. Yudin, D., Naumov, A., Dolzhenko, A., Patrakova, E.: Software for roof defects recognition on aerial photographs. J. Phys: Conf. Ser. 1015(3), 032152 (2018)
7. Computer Vision Annotation Tool (CVAT). https://github.com/opencv/cvat. Accessed 10 May 2020
8. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
9. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Cardoso, M.J., Arbel, T., Carneiro, G., Syeda-Mahmood, T., Tavares, J.M.R.S., Moradi, M., Bradley, A., Greenspan, H., Papa, J.P., Madabhushi, A., Nascimento, J.C., Cardoso, J.S., Belagiannis, V., Lu, Z. (eds.) DLMIA/ML-CDS 2017. LNCS, vol. 10553, pp. 240–248. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_28
10.
Yudin, D.A., Skrynnik, A., Krishtopik, A., Belkin, I., Panov, A.I.: Object detection with deep neural networks for reinforcement learning in the task of autonomous vehicles path planning at the intersection. Opt. Memory Neural Networks 28(4), 283–295 (2019). https://doi.org/10.3103/S1060992X19040118
11. Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 833–851. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_49
12. Yuan, Y., Chen, X., Wang, J.: Object-contextual representations for semantic segmentation. ArXiv, abs/1909.11065 (2019)
13. Chollet, F.: Xception: deep learning with depthwise separable convolutions. CVPR 2017, arXiv:1610.02357 (2017)
14. Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., Xiao, B.: Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. (2020)
Choice of Hyperparameter Values for Convolutional Neural Networks Based on the Analysis of Intra-network Processes

Dmitry M. Igonin, Pavel A. Kolganov, and Yury V. Tiumentsev

Moscow Aviation Institute (National Research University), Moscow, Russia
[email protected], [email protected], [email protected]

Abstract. One of the critical tasks that have to be solved when forming a convolutional neural network (CNN) is the choice of the values of its hyperparameters. Existing attempts to solve this problem are based, as a rule, on one of two approaches. The first of them implements a series of experiments with different values of the hyperparameters of the CNN. For each of the obtained sets of hyperparameter values, the corresponding network versions are trained. These experiments are performed until we obtain a CNN with acceptable characteristics. This approach is simple to implement but does not guarantee high performance of the CNN. In the second approach, the choice of network hyperparameter values is treated as an optimization problem. With the successful solution of such a problem, it is possible to obtain a CNN with sufficiently high characteristics. However, this task is of considerable complexity and also requires a large amount of computing resources. This article proposes an alternative approach to solving the problem of choosing the values of hyperparameters for a CNN, based on an analysis of the processes taking place in the network. We demonstrate the efficiency of this approach by solving the problem of classifying functional dependencies as an example.

Keywords: Artificial neural network · Hyperparameters · Convolutional neural network · Analysis of intra-network processes
1 Introduction

Currently, significant efforts of researchers are aimed at creating methods and tools of artificial intelligence, as well as applying them to a variety of applied problems [1,2]. The modern interpretation of artificial intelligence is based mainly on the ideas of machine learning. Deep learning [3], applied to artificial neural networks of different classes, is the most popular and in-demand at present. One such class is convolutional neural networks (CNN) [3,4], whose importance for practice is determined by the fact that they are a tool for solving a large number of various applied tasks, primarily machine vision tasks. One

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 184–197, 2021. https://doi.org/10.1007/978-3-030-60577-3_21
of the problems of this type is the semantic segmentation of images obtained by remote sensing of the earth’s surface. Convolutional neural networks are widely used to solve this problem. It was in solving this problem that a method we developed for visualizing the processes occurring in the CNN [5,6], which in the following sections is the basis for the procedure for selecting the values of the CNN hyperparameters. As we know, the formation of the CNN, designed to solve some application problem, involves the choice of values of parameters that determine the architecture of the network (so-called hyperparameters) [7–14], as well as search for values of configurable network parameters (synaptic weights). The second of these two tasks are nothing more than network learning. Before starting to solve this task, we need to choose in one way or another the set of hyperparameters of the network and lock their values. Thus, we can say that hyperparameters are network parameters whose values are set before the learning process starts, and “usual” parameters are those parameters whose values are determined by learning the network. Hyperparameters define the network architecture, i.e., the number and kinds of elements forming the network, the number of network layers, and the links between the elements of these layers. Besides, the hyperparameters include the values that drive the learning algorithm. The specific set of hyperparameters will be different for networks of various classes. It may also differ for networks of the same class, depending on the problem being solved and the features of the learning algorithm. In particular, the hyperparameters set for the deep feedforward network, including CNN, may include the following variables: 1) 2) 3) 4) 5) 6) 7) 8) 9) 10)
number of hidden layers in the network; number of elements in each hidden layer; number of learning parameters in the network; types of activation functions in the network layers. number of learning epochs; learning rate value; initial values of the trained network parameters; training set size; batch size; dropout value.
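A hyperparameter set like the one above is typically collected into a single configuration that is fixed before training begins. The sketch below is purely illustrative (all names and values are examples, not taken from the paper); it also shows how hyperparameter 3, the number of learning parameters, follows from the layer sizes for a simple dense stack:

```python
# Illustrative hyperparameter configuration, fixed before training starts.
# Names and values are examples, not those used in the paper.
hyperparams = {
    "num_hidden_layers": 3,
    "units_per_layer": [300, 150, 75],
    "activation": "relu",
    "epochs": 150,
    "learning_rate": 1e-3,
    "weight_init": "glorot_uniform",
    "train_set_size": 10890,
    "batch_size": 64,
    "dropout": 0.0,  # dropout is not used in this task
}

def num_trainable_params(units, input_dim=300):
    """Count weights + biases of a dense stack (hyperparameter 3 above)."""
    total, prev = 0, input_dim
    for u in units:
        total += prev * u + u  # weight matrix plus bias vector
        prev = u
    return total
```

Changing an entry such as `units_per_layer` changes the architecture itself, which is exactly why these values must be fixed before training and can only be revised between training runs.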
Defining CNN hyperparameter values is a rather complicated and time-consuming problem that requires considerable computing resources. A number of approaches have been proposed to solve it [7–14]. These approaches are based on a search over hyperparameter values organized in one way or another: direct search on a value grid [7], optimization using random search [8,9], or optimization using deterministic techniques [10–14]. This article proposes an alternative approach to choosing CNN hyperparameter values, based on the analysis of the processes running in the network. This approach can be used either independently or in addition to the methods listed above.

D. M. Igonin et al.

In the problem solved in this article, from the above list of hyperparameters we treat as variable the number of hidden layers in the network, as well as the number of elements in each of the hidden layers. The values of all other hyperparameters, except the dropout parameter, are determined based on experience with problems of this kind and are fixed; the dropout parameter is not used in our task.
2 A Technique for Choosing CNN Hyperparameter Values
As noted above, the proposed approach to choosing CNN hyperparameter values is based on obtaining and processing information about how a certain set of patterns passes through the trained network. This information includes the following elements:

– the number of network parameters used in each layer for all the presented patterns;
– the distribution of the network parameters of each layer among all patterns;
– a visualization of the passage of each pattern through each network layer [6].

The procedure for using this information to choose hyperparameter values is iterative. At each iteration, the current hyperparameter values are set and fixed, and the network is then trained according to the following rules:

1. The training, validation, and test samples remain unchanged from iteration to iteration.
2. Before the learning process starts, the pseudorandom number generators are seeded with the same initial value, so that results from different iterations can be compared correctly.
3. Based on experience with such tasks, the number of learning epochs is set to 150, which is sufficient for training the network. The learning algorithm stops early if the network error has not improved over the last 15 epochs (10% of the prescribed number of epochs).
4. We use two metrics during training. The first is Accuracy, the ratio of the number of correct predictions to the total number of patterns in a given class. The second is Loss, the value to be minimized during learning. At each epoch, the current accuracy and loss values are recorded. We save the weight tables for the epochs with maximum accuracy on the training and validation sets, as well as for the epochs with minimum loss on those sets, i.e., four weight tables in total. From these weight tables, the one with the best classification quality on the test sample is then selected for subsequent analysis.
5. The batch size fed to the network at each step during learning is 64 patterns (this value is determined by the limitations of the hardware used).
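Rule 3 above (a budget of 150 epochs with an early stop after 15 epochs without improvement) can be written out in plain Python; in Keras the same behavior would come from an `EarlyStopping` callback with `patience=15`. This is an illustrative sketch, not the authors' code:

```python
def stop_epoch(val_losses, max_epochs=150, patience=15):
    """Return the 1-based epoch at which training stops under the rule:
    run at most max_epochs, but stop once the best validation loss has
    not improved for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # early stop triggered
    return min(len(val_losses), max_epochs)
```

The 10% ratio between the patience window and the epoch budget keeps the stopping rule proportional if the budget is changed in a later experiment.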
We use the information gathered while training the current version of the CNN under consideration to analyze how the patterns pass through the network. This analysis allows us to identify problem areas in the network, eliminate them, and proceed to the next iteration of the hyperparameter selection process.
3 The Problem of Pattern Classification and Source Data for Its Solution
Let us consider the method of choosing the hyperparameter values of a convolutional neural network using the example of the pattern classification problem [4,16,17]. In the output layer of this network, we use the softmax activation function [3]; it is required so that the network returns a probabilistic estimate of the correspondence of a pattern to each of the presented classes. Functional dependencies belonging to 10 classes were chosen as the patterns to be classified: Sin, Sin2, Sin3, Cos, Cos2, Cos3, Tan, Exp, Sinh, Cosh (Fig. 1(a)). In the Sin2, Sin3, Cos2, and Cos3 functions, the current values of the argument are multiplied by 2 and 3, respectively. For all the functions considered, the argument belongs to the segment [−2π, 2π], and the functions take values in the segment [−1, 1]. The function values are computed at 300 points evenly distributed over [−2π, 2π]. We further interpret a pattern as a vector of the values of one of the functional dependencies shown in Fig. 1(b), with 300 components.

Fig. 1. Classified objects: (a) reference patterns; (b) distorted patterns

To generate a database from which the training, validation, and test samples are then extracted, we artificially distorted the original functional dependencies (the reference patterns, Fig. 1(a)) by applying additive Gaussian noise with zero mean and standard deviation σ = 4.0 (in Fig. 1(b), for better visibility, the results for σ = 1.5 are shown). This approach makes it possible to generate an extensive database sufficient for training and testing a convolutional neural network. We divide the set of patterns obtained by applying noise to the reference patterns into three subsets (samples): a training set of 10890 patterns, a validation set of 3630 patterns, and a test set of 3630 patterns. Figure 2 shows the distribution of the generated set of patterns over these samples and over the individual functional dependencies.

Fig. 2. Database distribution by samples and test set composition by pattern classes
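The data generation described above can be sketched with NumPy. The sketch is illustrative: the paper states that all functions take values in [−1, 1], so each reference curve is scaled by its maximum absolute value here, but the exact normalization used by the authors is our assumption:

```python
import numpy as np

# The ten reference dependencies; Sin2/Sin3/Cos2/Cos3 multiply the argument.
REFERENCES = {
    "Sin":  lambda x: np.sin(x),     "Sin2": lambda x: np.sin(2 * x),
    "Sin3": lambda x: np.sin(3 * x), "Cos":  lambda x: np.cos(x),
    "Cos2": lambda x: np.cos(2 * x), "Cos3": lambda x: np.cos(3 * x),
    "Tan":  lambda x: np.tan(x),     "Exp":  lambda x: np.exp(x),
    "Sinh": lambda x: np.sinh(x),    "Cosh": lambda x: np.cosh(x),
}

def make_patterns(n_per_class, sigma=4.0, n_points=300, seed=0):
    """Generate noisy 300-component patterns and integer class labels."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2 * np.pi, 2 * np.pi, n_points)
    patterns, labels = [], []
    for label, f in enumerate(REFERENCES.values()):
        ref = f(x)
        # Scale into [-1, 1]; the exact scaling is our assumption.
        ref = ref / np.abs(ref).max()
        for _ in range(n_per_class):
            patterns.append(ref + rng.normal(0.0, sigma, n_points))
            labels.append(label)
    return np.asarray(patterns), np.asarray(labels)
```

With a fixed seed the generator is reproducible, which matches rule 2 of the training protocol (identical pseudorandom sequences across iterations).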
4 A Procedure for Choosing the CNN Hyperparameter Values
As noted above, the procedure for using information about the passage of classified patterns through the CNN layers to choose the network hyperparameter values is iterative. Let us consider the essence of the iterations that make up the process of analyzing the information obtained during CNN training, as well as the nature of the changes made to the hyperparameter values based on this analysis. Iteration 1. In the example under consideration, we choose the 14-layer variant of the convolutional neural network shown in Fig. 3(a) as the initial variant of
the neural network for solving the classification problem. In this variant, the network contains 350335 configurable parameters.

Fig. 3. Neural network variants: (a) first iteration; (b) second iteration; (c) third iteration

After the CNN has been trained in this variant, we use it to classify the patterns of the test set and then build the probability classification matrix for this set. Each diagonal element of this matrix is a probabilistic estimate of a class matching itself; each off-diagonal element is the probability of confusing the corresponding pair of classes. For the first iteration, this matrix is shown in Fig. 4(a).
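The paper does not spell out how the probability matrix is computed; one plausible reading (a sketch with illustrative names, not the authors' code) is to average the softmax output of the network over all test patterns of each true class:

```python
import numpy as np

def probability_matrix(softmax_out, true_labels, n_classes=10):
    """softmax_out: (N, n_classes) network outputs; true_labels: (N,) ints.
    Row i holds the mean probability assigned to each class over all
    patterns of true class i: the diagonal measures self-matching,
    off-diagonal entries measure mixing between classes."""
    m = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        rows = softmax_out[true_labels == i]
        if len(rows):
            m[i] = rows.mean(axis=0)
    return m
```

Because each softmax row sums to 1, every row of the resulting matrix also sums to 1, so a weak diagonal entry immediately shows which other classes absorb the probability mass.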
Fig. 4. Learning result at first iteration for the test set: (a) probability matrix; (b) histogram of probability distribution at the network output
We can also build a histogram of the distribution of the matching probability for each of the classes. As an example, Fig. 4(b) shows such a histogram for the Exp class, which has the worst result in the probability matrix presented in Fig. 4(a).
Fig. 5. Use of network parameters at the first iteration for each of the reference patterns
The next operation is to analyze how the trained parameters of the network under study are used. For this purpose, we pass the reference patterns of the classes shown in Fig. 1(a) through the trained model. The result is information on the use of the network parameters by each of the reference patterns, layer by layer, shown in Fig. 5. We can see from Fig. 5 that, on average, less than 20% of the network parameters are used, which means that the number of network parameters is excessive. The important thing when reducing the network size is to keep the classification quality shown in Fig. 4(a). For the network under consideration, as the experiment shows, the number of neurons in all layers can be halved without a critical loss in this sense.

Iteration 2. The reduced variant of the network (see Fig. 3(b)) has 128955 learning parameters. The results of training and analysis are presented in Fig. 6. The recognition quality demonstrated by the data in Fig. 6(a) is not worse than at the previous iteration (see Fig. 4(a)); consequently, the hypothesis of an excessive number of learning parameters in the previous CNN version is confirmed. Based on our analysis of the results presented in Fig. 7, we can assume that, to improve the classification results, the hidden layers closer to the network input should use more configurable parameters than the other hidden layers, compared to the version shown in Fig. 3(b). This version of the CNN is characterized by extremely inefficient use of the initial hidden layers (a low percentage of the learnable parameters is in use). Most of the output layer values are in the Mix area (some network parameters are used by several patterns) and the Empty area (some network parameters are not used by any pattern).
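The visualization method itself is described in [5,6] and is not reproduced here; as an illustrative proxy for the "share of parameters used" reported above, one can count, layer by layer, the fraction of units with a nonzero (ReLU) activation when a reference pattern is passed through the network:

```python
import numpy as np

def usage_per_layer(layer_activations):
    """layer_activations: list of per-layer activation arrays recorded
    while one pattern passes through the network. Returns, for each
    layer, the fraction of units that fired (nonzero activation)."""
    return [float(np.count_nonzero(a)) / a.size for a in layer_activations]
```

Averaging these fractions over all reference patterns gives a per-layer usage profile of the kind plotted in Fig. 5 and Fig. 7, from which over-provisioned layers can be spotted.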
Therefore, it is necessary to expand the initial hidden network layers in order to divide the Mix area between the patterns. This expansion allows information about a pattern to be transferred to the subsequent layers of the network. Compared to the previous iteration, the network in Fig. 3(b) uses a larger share of its parameters (see Fig. 7) than its predecessor did (see Fig. 5). The improvement in the network classification quality is revealed by the reduced number of Exp class patterns with a low probability of matching the Exp class at the network output (see Fig. 6(b)).

Fig. 6. Learning result at second iteration for the test set: (a) probability matrix; (b) histogram of probability distribution at the network output
Choice of Hyperparameter Values for Convolotional Neural Networks
193
Fig. 7. Use of network parameters at the second iteration for each of the reference patterns
Iteration 3. The expansion of the hidden layers of the CNN, the need for which was shown at the previous iteration, is carried out by adding deconvolution layers. Since there is no support for a one-dimensional deconvolution layer in the TensorFlow package, while the network classifies one-dimensional patterns, the data are converted to a two-dimensional layer, a two-dimensional deconvolution is performed, and the result is converted back to one dimension. The variant with enlarged initial hidden layers, shown in Fig. 3(c), consists of 16 layers and contains 148715 learning parameters. The analysis of the learning results (see Fig. 8) showed that part of the Empty area (network parameters not used by any of the patterns) in Fig. 9 has decreased. For the trained version of the network (see Fig. 3(c)), the analysis was carried out for the reference patterns. The number of unused network parameters has decreased (the Empty areas in Fig. 9) in comparison with the previous iteration (the Empty areas in Fig. 7). This means that the network uses more of its parameters in the initial hidden layers and, consequently, in the subsequent layers, which has affected the classification quality. The average classification quality has improved, as follows from a comparison of Fig. 8(a) and Fig. 6(a). The decrease in the number of low-probability patterns at the network output is also confirmed by the increase in the probability of the test set patterns matching the Exp class. It would be possible to increase the number of parameters in the middle part of the network and perform a few more iterations to further improve the classification quality; however, the above results are sufficient to demonstrate the capabilities of the proposed hyperparameter choice method.
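The one-dimensional deconvolution workaround described above can be sketched with tf.keras (an illustrative sketch, not the authors' code; newer TensorFlow versions provide `Conv1DTranspose` directly, which makes the detour unnecessary):

```python
import tensorflow as tf
from tensorflow.keras import layers

def deconv1d(x, filters, kernel_size):
    """1D deconvolution via 2D: expand the 1D feature map to
    (batch, length, 1, channels), apply Conv2DTranspose with a
    (kernel_size, 1) kernel, then squeeze back to (batch, length, filters)."""
    x = layers.Lambda(lambda t: tf.expand_dims(t, axis=2))(x)
    x = layers.Conv2DTranspose(filters, (kernel_size, 1), padding="same")(x)
    return layers.Lambda(lambda t: tf.squeeze(t, axis=2))(x)
```

With `padding="same"` and unit strides the sequence length is preserved and only the channel count changes, which matches the role of the added layers: widening the initial hidden layers rather than upsampling the signal.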
Fig. 8. Learning result at third iteration for the test set: (a) probability matrix; (b) histogram of probability distribution at the network output
Fig. 9. Use of network parameters at the third iteration for each of the reference patterns
5 Conclusions
The article suggests an iterative method for choosing the hyperparameter values of a convolutional neural network. The method is based on analyzing how the trained parameters of the network are used on a given set of patterns. As an example of its application, we solve the task of classifying functional dependencies. Three iterations were performed, in the course of which information about the passage of patterns through the network was analyzed and the values of its hyperparameters were corrected. In performing these iterations, it was found that:

1. The analysis performed at the first iteration showed that the number of trained network parameters is excessive for the specific problem being solved. Based on experience with problems of this kind, this number can be halved. Reducing the size of the network did not degrade the quality of pattern classification, which confirms the hypothesis of an excessive number of trained parameters.
2. The analysis performed at the second iteration demonstrated that the trained parameters in the initial hidden layers of the network are practically unused (the share of their use is less than 20%). Deconvolution layers were added to the network structure, which increased the use of the trained parameters of the initial hidden layers up to 40% and thereby improved the quality of pattern classification.
3. By the last iteration, the initial network variant selected to solve the task had changed; the total number of configurable parameters had decreased by more than a factor of two, while such an important measure as the classification quality was preserved.
It was found experimentally that, in the problem under consideration, the neural network solves the classification problem inefficiently when less than 20% or more than 80% of the trained parameters are used. Further development of the described approach to choosing CNN hyperparameter values involves applying it to the problem of object detection in images obtained by remote sensing of the Earth's surface, considered in [5,18].
References

1. Shakirov, V., Solovyeva, K., Dunin-Barkowski, W.: Review of state-of-the-art in deep learning artificial intelligence. Opt. Mem. Neural Networks 27(2), 65–80 (2018)
2. Neapolitan, R.E., Jiang, X.P.: Artificial Intelligence with an Introduction to Machine Learning. CRC Press, London (2018)
3. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press, Cambridge (2016)
4. Gu, J., et al.: Recent advances in convolutional neural networks. arXiv preprint arXiv:1512.07108v6 (2017)
5. Igonin, D.M., Tiumentsev, Y.V.: Efficiency analysis for various neuroarchitectures for semantic segmentation of images in remote sensing applications. Opt. Mem. Neural Networks 28(4), 306–320 (2019)
6. The brain from the inside (visualization of the pattern passing through the model of artificial neural network) (2019). https://habr.com/ru/post/438972/
7. Krishnakumari, K., et al.: Hyperparameter tuning in convolutional neural networks for domain adaptation in sentiment classification (HTCNN-DASC). Soft. Comput. 24, 3511–3527 (2019)
8. Neary, P.L.: Automatic hyperparameter tuning in deep convolutional neural networks using asynchronous reinforcement learning. In: Proceedings of 2018 IEEE International Conference on Cognitive Computing, pp. 73–77 (2018)
9. Florea, A., Andonie, R.: Weighted random search for hyperparameter optimization. arXiv preprint arXiv:2004.01628v1 (2020)
10. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)
11. Feurer, M., Hutter, F.: Hyperparameter optimization. In: Hutter, F., et al. (eds.) Automated Machine Learning, pp. 3–33. Springer, Cham (2019)
12. Cardona-Escobar, A.F., et al.: Efficient hyperparameter optimization in convolutional neural networks by learning curves prediction. In: Lecture Notes in Computer Science, vol. 10657, pp. 143–151 (2018)
13. Diaz, G.I., et al.: An effective algorithm for hyperparameter optimization of neural networks. IBM J. Res. Dev. 61(4/5), 9:1–9:11 (2017)
14. Hinz, T., et al.: Speeding up the hyperparameter optimization of deep convolutional neural networks. Int. J. Comput. Intell. Appl. 17(2), 15 (2018)
15. Ososkov, G., Goncharov, P.: Shallow and deep learning for image classification. Opt. Mem. Neural Networks 26(4), 221–248 (2017)
16. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. arXiv preprint arXiv:1505.04597v1 (2015)
17. Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. arXiv preprint arXiv:1511.00561v3 (2016)
18. Igonin, D.M., Tiumentsev, Y.V.: Semantic segmentation of images obtained by remote sensing of the Earth. In: Advances in Neural Computation, Machine Learning, and Cognitive Research. Studies in Computational Intelligence, vol. 856, pp. 309–318 (2020)
Generation an Annotated Dataset of Human Poses for Deep Learning Networks Based on Motion Tracking System

Igor Artamonov1, Yana Artamonova1, Alexander Efitorov2, Vladimir Shirokii2, and Oleg Vasilyev3
1 Neurocorpus Ltd., Moscow, Russia ([email protected], [email protected])
2 D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
3 Russian State University of Physical Education, Sport, Youth, Tourism, Moscow, Russia
Abstract. In this paper, we propose an original method for the relatively fast generation of an annotated dataset of human poses for deep neural network training, based on a 3D motion capture system. Compared to default pose-detection DNNs trained on commonly used open datasets, the method makes it possible to recognize specific poses and actions more accurately and decreases the need for additional image-processing operations aimed at correcting the various detection errors inherent in these DNNs. We used a preinstalled IR motion capture system with reflective passive tags not to capture the movement itself but to extract human keypoints in 3D space, and we obtained video records at the corresponding timestamps. The obtained 3D trajectories were synchronized in time and space with the streams from several cameras using mutual camera calibration and photogrammetry. This allowed us to accurately project keypoints from 3D space onto the 2D video frame plane, generate human pose annotations for the recorded video, and train a deep neural network on this dataset.

Keywords: Deep learning · Human pose detection · Key points · Motion tracking · Photogrammetry · Qualisys
1 Introduction

Modern recognition systems based on deep neural networks (DNNs) have been actively developed during the recent decade. As a result, DNNs recognize images better than humans [1] and are used in real-world applications such as car autopilots [2], medical diagnostics [3], etc. These results were made possible by the development of hardware technology (GPUs, TPUs), open-source software (TensorFlow, PyTorch, etc.), and open datasets with millions of annotated objects (ImageNet [4], MS COCO [5], etc.). Despite these achievements, the development of customized applications for specialized tasks remains difficult, since it requires a large training set containing measured data (images) and descriptions for them. In our work, we consider the problem of detecting human poses and generating an annotated dataset based on motion capture systems.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 198–204, 2021. https://doi.org/10.1007/978-3-030-60577-3_22

Detection and capture of object movements (motion capture) are technologies for determining, reconstructing, and modeling the movements of a whole object or of its parts [6]. In a broad sense, these technologies are a step toward the "remote presence" of a person, which opens new opportunities in various fields: in medicine, qualified remote assistance; in education, access to learning with a deeper degree of immersion; in industry, the ability to work remotely in inaccessible or hazardous areas. Unfortunately, existing solutions are expensive and not widely available. This work attempts to propose a more accessible solution based on the generation of a labeled dataset of human poses. Modern motion recognition approaches are divided into four groups [7, 8]:

a) fixed infrastructure with specialized video equipment and tags;
b) motion detectors/sensors;
c) specialized video equipment with no tags;
d) conventional video equipment with no sensors/tags.

1.1 Solutions Based on Special Equipment with Landmarks
The first solutions in this class were introduced in the early 1990s, and since then some of them have been successfully commercialized and are actively used in the film, animation, and gaming industries. Such systems require a complete infrastructure that includes a dedicated site with installed and calibrated high-frequency infrared cameras. These cameras, together with software, can precisely fix the coordinates of active (light-emitting) and passive (light-reflecting) marks. Examples of such optical systems are OptiTrack [9] and Qualisys [10]. The use of specialized pre-installed and calibrated video recording equipment with marks makes it possible to achieve high motion capture accuracy and to form a high-quality 3D motion model. The disadvantages include the high price of the equipment, high requirements for staff qualifications, and the need to allocate a special site for all that infrastructure. The last restriction is particularly important, as it limits motion capture to the working space of the site.

1.2 Solutions Using Specialized Sensors
These systems consist of (1) a number of independent active motion sensors installed at key points of the moving object and (2) a signal recording and recognition system. The sensors can be based on various principles, including (but not limited to) magnetic, inertial (MEMS), acoustic, and radio-frequency (RFID) ones. The most widely used sensors are based on a combination of miniature accelerometers, barometers, and gyroscopes with a separate wireless data transmission part. Examples of such inertial systems are Rokoko [11], Neuron [12], and Xsens [13]. The advantage of these systems is the ability to capture movement outside a fixed site with pre-mounted infrastructure. The main disadvantage is that they work at a constant elevation level, with no possibility of dynamically changing the elevation of
observation. The mechanism of operation and additional materials on tuning a system based on inertial sensors are presented in [14].

1.3 Solutions Based on Specialized Video Equipment
Typical members of this class are integrated depth sensors such as Kinect by PrimeSense/Microsoft® and Intel® RealSense, which are widely used in motion capture and recognition research [15]. In this case, the only equipment needed is a specialized camera. The advantages of this technology are its relative ease of application and lower cost. The disadvantages are the limited and noticeably varying accuracy of the resulting 3D model across the field, the restriction of the practical working distance to 5–7 m from the detector, and restrictions on the number of "recognizable" participants. The representation of geometric parameters, the quality of the obtained data, and a theoretical analysis of errors and accuracy are given in [16, 17].

1.4 Solutions Based on Standard Video Equipment Without Dedicated Marks, Sensors, or Tags
This is the most modern approach in the field of motion capture and recognition, made possible by advances in the training and application of neural networks [18, 19]. In this case, classic "motion capture" turns into "motion recognition and semantic evaluation of the action", which is closer to what the human eye and brain do. A comparison of traditional computer vision methods and convolutional neural networks is presented in [20]. This approach demonstrates clear advantages: low cost, mobility of use, absence of tags, etc. Unfortunately, the recognition quality is still far from that required for active commercial use and does not allow the creation of 3D models [21] with an accuracy acceptable for specific practical applications.
2 Experimental

Our experiments with human pose detection on real-world video records showed that a direct application of commonly used pose recognition solutions gives a high number of errors, observed in the following cases:

• unreasonable "ghosts" appear: an external object with no humanoid attributes is detected as a person with a few keypoints;
• "Siamese twins" arise when parts of the body of one person become an extension of another when people are close in the image, even without overlapping in the frame;
• parts of the body are lost and "cut off"; in particular, this often applies to raised hands, even when they are perfectly visible and contrast well with the background;
• for people sitting behind tables, the lower parts of their bodies were hidden; although the training dataset includes a "hidden" attribute for keypoints, we did not get
Generation an Annotated Dataset of Human Poses
201
hidden keypoints for such cases, and the confidence of the detected, fully observable keypoints was low.

The first two points can be addressed by applying the pose detection DNN not to the whole video frame but to a previously selected ROI with a high confidence of containing a human; for this purpose, Faster R-CNN can be used. The main disadvantage of such a system is the greatly increased amount of computational resources required: if there are 10 humans and 10 ROIs are extracted, the human pose DNN may have to be run 10 times for this frame. Unfortunately, the third and fourth points cannot be easily solved with additional computational methods: transfer learning on relevant annotated datasets is required. The standard way to produce such a dataset is to generate annotations by hand for relevant images; for example, as mentioned in [22], the creation of the full COCO dataset (91 common object categories with 2,500,000 labeled instances in 328,000 images) required more than 70,000 man-hours. Of course, this approach cannot solve the fourth point and requires a great deal of human labor. An elegant solution is to use motion tracking equipment together with video recording for this purpose. The whole pipeline is demonstrated in Fig. 1 and includes the following steps:

1. Use a motion tracking system and synchronized cameras to get the 3D coordinates of human keypoints and a video record of human actions.
2. Use Generative Multi-column Convolutional Neural Networks (GMcCNN) [23] to remove the specific objects (such as flexible sensors or light-colored points) used by the motion tracking system to catch keypoints.
3. Project the 3D points onto the 2D camera frame plane to get the keypoint coordinates in the corresponding images.
4. Use the generated dataset for training and validation of the OpenPose system [19].
Fig. 1. Image processing pipeline for automatically generating an annotated pose dataset based on a motion tracking system and video record. DNN1 is a GMcCNN-based network for removing tracking system sensors from the original image; DNN2 is the OpenPose keypoint detection system.
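Step 3 of the pipeline (projecting 3D keypoints onto the 2D frame plane) reduces to a standard pinhole-camera projection once a 3×4 projection matrix has been obtained from mutual camera calibration. The sketch below uses illustrative names; the matrix `P` is an assumed calibration result, not a value from the paper:

```python
import numpy as np

def project_points(points_3d, P):
    """points_3d: (N, 3) world coordinates of keypoints; P: (3, 4) camera
    projection matrix from calibration. Returns (N, 2) pixel coordinates
    after homogeneous projection and perspective division."""
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # (N, 4)
    uvw = homog @ P.T                                             # (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]                               # divide by depth
```

Repeating this per camera, with each camera's own `P`, yields 2D keypoint annotations for every synchronized video frame.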
In our experiments, the Qualisys system used is an infrared optical system with passive markers. During the measurements, the testees wear only a small number of reflecting tags and do not wear any EM-emitting or generating equipment; the system does not affect their health in any way and is completely safe. We collect only tag tracks and do not collect any personal information that could reveal the identities of the testees or other sensitive information about them. Qualisys is one of the most expensive and accurate solutions in the line of motion capture systems, since it requires installation, fixation, and accurate calibration of the infrastructure; however, it captures motion with extremely high accuracy, up to 1–2 mm across a field of up to 10 m × 10 m at a sampling rate of 100 Hz. Our setup included the full Qualisys package with Qualisys Track Manager (QTM) and Visual 3D (C-Motion) software, a set of Oqus 3-series cameras, and markers with a reflective coating (diameter from 2.5 to 40 mm); camera resolution 1280 × 1024 pixels, 500 fps, x-coordinate 82000, y-coordinate 65000. Four short sessions were recorded, less than 5 min in total, but because of the high sampling rate, after filtering out irrelevant 3D coordinate records we obtained approximately 17 thousand human pose records. After 2D projection and light-spot removal, the prepared dataset was used for validation and transfer learning of the default OpenPose network, with the results estimated by the average precision (AP) metric [24]. It should be noted that there is no direct correspondence between the Qualisys keypoints and the MS COCO keypoints, and some keypoints are located slightly shifted relative to the center of the joint; this is why results worse than the known benchmarks were demonstrated on the test dataset (a subset of frames and poses not used for DNN training): 43.2 for the default network and 51.9 for the retrained network.
3 Discussion

In this paper, a full pipeline for the automatic generation of large annotated human pose datasets was proposed, which is a crucial part of creating successful AI-based custom systems for human action recognition. The following recommendations could be applied to the described pipeline to improve the results:

– Use sensor-based motion tracking systems (e.g. Rokoko) for generating 3D point coordinates: such systems can work with multiple people and hidden keypoints, unlike optical systems such as the Qualisys system used in this paper.
– Develop an adjustment model from the keypoints estimated by the motion tracking system to the COCO keypoints, to correct the localization of points in the generated annotations.

Acknowledgments. The reported study was funded by FASIE according to the research project № 1GS1NTI5/43222 06.09.2018.
Generation an Annotated Dataset of Human Poses
References

1. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
2. Oliveira, G.L., Burgard, W., Brox, T.: Efficient deep models for monocular road segmentation. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, pp. 4885–4891 (2016)
3. Wu, Y.-H., Gao, S.-H., Mei, J., Xu, J., Fan, D.-P., Zhao, C.-W., Cheng, M.-M.: JCS: an explainable COVID-19 diagnosis system by joint classification and segmentation. arXiv:2004.07054 (2020)
4. https://image-net.org/download
5. https://cocodataset.org/#home
6. Su, Y., Backlund, P., Engström, H.: Business intelligence challenges for independent game publishing. Int. J. Comput. Games Technol. 2020, 1–8 (2020). Article id 5395187
7. Mündermann, L., Corazza, S., Andriacchi, T.: The evolution of methods for the capture of human movement leading to markerless motion capture for biomechanical applications. J. Neuroeng. Rehabil. 3(1), 1–11 (2006). Article no. 6
8. Jáuregui, D., Horain, P.: 3D Motion Capture by Computer Vision and Virtual Rendering. Lambert Academic Publishing, Saarbrücken (2012). 156 p.
9. https://www.optitrack.com/
10. https://www.qualisys.com/
11. https://www.rokoko.com/
12. https://neuronmocap.com/
13. https://www.xsens.com/
14. Chen, P., Li, J., Luo, M., Zhu, N.: Real-time human motion capture driven by a wireless sensor network. Int. J. Comput. Games Technol. 2015(8), 1–14 (2015)
15. https://web.archive.org/web/20120103103154/http://www.kinectforwindows.org/
16. Khoshelham, K.: Accuracy analysis of Kinect depth data. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XXXVIII-5/W12, pp. 133–138. ISPRS (2011)
17. Izadi, S., Kim, D., Hilliges, O., Molyneaux, D., Newcombe, R., Kohli, P., Shotton, J., Hodges, S., Freeman, D., Davison, A., Fitzgibbon, A.: KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. In: UIST 2011 Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, pp. 559–568. ACM (2011)
18. Toshev, A., Szegedy, C.: DeepPose: human pose estimation via deep neural networks. In: CVPR 2014, pp. 1–9 (2014). arXiv:1312.4659v3
19. Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., Sheikh, Y.: OpenPose: realtime multi-person 2D pose estimation using part affinity fields. arXiv:1812.08008 (2018)
20. Walsh, J., O'Mahony, N., Campbell, S., Carvalho, A., Harapanahalli, S., Hernandez, G., Krpalkova, L., Riordan, D.: Deep learning vs. traditional computer vision. In: Computer Vision Conference (CVC), pp. 1–17. CVC (2019)
21. Mehta, D., Sotnychenko, O., Mueller, F., Xu, W., Elgharib, M., Fua, P., Seidel, H.-P., et al.: XNect: real-time multi-person 3D motion capture with a single RGB camera. ACM Trans. Graph. 39(4), 1–24 (2020). Article 1. arXiv:1907.00837v2
22. Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Lawrence Zitnick, C., Dollar, P.: Microsoft COCO: common objects in context. arXiv:1405.0312v3 (2014)
23. Wang, Y., Tao, X., Qi, X., Shen, X., Jia, J.: Image inpainting via generative multi-column convolutional neural networks. arXiv:1810.08771 (2018)
24. https://cocodataset.org/#keypoints-eval
Automatic Segmentation of Acute Stroke Lesions Using Convolutional Neural Networks and Histograms of Oriented Gradients

Nurlan Mamedov1, Sofya Kulikova1, Victor Drobakha2, Elena Bartuli2, and Pavel Ragachev2

1 Higher School of Economics (National Research University), Perm, Russia
[email protected]
2 Perm State Medical University named after Academician E. A. Wagner, Perm, Russia
Abstract. In this work, research was focused on the recognition of stroke lesions in FLAIR MRI images. The main tools used in the study were convolutional neural networks (CNN) and histograms of oriented gradients (HOG). To train the neural network, 706 FLAIR MRI images of real patients were collected and labeled. During testing of the program implementing the algorithm, it was found that the histogram of oriented gradients method was ineffective in clarifying the edges of the lesion. To replace the HOG technique, we proposed a method based on calculating the average pixel intensity in the lesion area. The result is a program that achieves an average Dice score of 0.554. The developed program can be used as a digital assistant for physicians who identify stroke lesions on MRI images and for training physicians in radiology and neurology. With the help of the developed technology it is possible to reduce the time needed to detect stroke lesions, reduce the variability of the results, increase their reproducibility, and process a huge amount of data.

Keywords: CNN · Stroke · MRI · Segmentation
1 Introduction

Stroke is a neurological disorder caused by disrupted or severely reduced brain blood supply. It is one of the leading causes of death and disability in the world. Accurate and timely identification and characterization of stroke lesions is crucial for choosing the right treatment, making a correct recovery prognosis, and selecting appropriate rehabilitation strategies. The use of magnetic resonance imaging (MRI) has significantly advanced the diagnostics and follow-up of stroke lesions. Yet the gold standard for lesion segmentation is still manual outlining by radiology experts. Such an approach is time consuming and prone to high inter-expert variability of the results. Thus, developing automated approaches for segmentation of stroke lesions should facilitate and standardize decision-making in stroke management. Furthermore, such automated approaches are indispensable for conducting clinical research on large databases when it is not possible to inspect all data manually.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 205–211, 2021. https://doi.org/10.1007/978-3-030-60577-3_23
N. Mamedov et al.
A number of approaches have been suggested for automatic lesion segmentation in stroke, with the best performance achieved by algorithms based on Random Decision Forests (RDF) and Convolutional Neural Networks (CNN) [6, 7]. For example, Extra Tree Forests is one of the RDF-like methods; R-CNN, U-Net, and YOLO are architectures based on CNN. Yet none of the existing approaches achieves an acceptable level of segmentation accuracy compared to manual segmentation. Furthermore, most of the approaches require acquisition of several MRI images, often including diffusion-weighted images, which are not always available under clinical protocols. Since FLAIR (fluid-attenuated inversion recovery) images are almost always included in stroke MRI scanning protocols, the goal of the present work was to suggest an automated procedure for segmenting acute stroke lesions on FLAIR MRI images and to compare the segmentation results with existing FLAIR-based approaches on the same clinical dataset. In this research, the CNN is the main tool for lesion detection and the HOG is the method for lesion edge detection.
2 Material and Methods

2.1 Material and Image Pre-processing
The database used in this work for the development and evaluation of the proposed lesion segmentation method was collected from Perm State Medical University named after Academician E. A. Wagner. The dataset contains 706 FLAIR MRI images taken on a GE Healthcare Brivo MR 355 scanner with a magnetic field strength of 1.5 T. All lesions were annotated manually; the annotation information is stored in an ".xml" file for each scan. Another 136 images without labels were chosen for testing. All images were converted from DICOM to JPG format. No skull stripping was used in this research. In each image, the area of the lesion was labeled (Fig. 1).
Fig. 1. Image labeling
2.2 Lesion Segmentation
2.2.1 Convolutional Neural Network
The proposed method uses a CNN as the main tool to detect the area of the stroke lesion and the histogram of oriented gradients for edge detection. Convolutional neural networks are networks whose main idea is to reuse the same parts of the network to work with small areas of the input data. They are mainly used to recognize objects in an image (Fig. 2).
Fig. 2. Scheme of the convolution neural network
The essence of the convolution layer is that if a part of the image is similar to the sought pattern, which is determined by the convolution filter, then the sum of the multiplied values will be greater. The convolution filter is a matrix that is obtained during training. 706 FLAIR MRI images were used to train the network to recognize stroke lesions. The data from the convolution layers are fed to a fully connected layer. A fully connected layer is a layer that outputs data as a K-dimensional vector, where K is the number of recognition classes; the program then selects one of them. Advantages of using a convolutional neural network:

1) One convolution filter is used for the whole image, instead of one per pixel.
2) The network preserves the input structure, as it is applied to each input region individually.
3) The ability to parallelize the calculations.
4) The method of backpropagation of error is used for learning.
5) Resistance to small image rotations and changes.
6) Plenty of ready-made architectures for applied use.
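The "sum of multiplied values" intuition can be sketched with a minimal NumPy-only valid-mode convolution (a hypothetical toy example, not part of the described system): a filter shaped like a vertical line responds strongest where the image actually contains one.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding); each output value is the
    sum of elementwise products, so regions resembling the kernel score higher."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-line filter responds strongest where the image contains that line.
image = np.zeros((5, 5))
image[:, 2] = 1.0                      # vertical line at column 2
kernel = np.array([[0., 1., 0.],
                   [0., 1., 0.],
                   [0., 1., 0.]])
response = conv2d_valid(image, kernel)
```

In a trained CNN the kernel values are learned rather than hand-set, and many such filters run in parallel.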
2.2.2 CNN Architectures
A CNN is a great approach to recognizing objects in images, but in practice objects have no fixed aspect ratio, size, or rotation, and all of these factors make recognition a computationally heavy process, because the network should check a huge number of regions. Architectures were therefore invented that solve this problem.

1. R-CNN [4] divides the image into regions with selective search, based on color similarity, texture similarity, and shape compatibility. The regions then become the input of the CNN and get classified. The problem is that about 2000 regions are extracted from each image, which takes about 53 s per image.
2. Fast R-CNN [3] applies one neural network to the image to generate a convolutional feature map. It then identifies the regions of interest (RoI) with selective search, and they get processed by a fully connected layer. The method is about nine times faster than R-CNN.
3. Faster R-CNN [11], similar to the Fast R-CNN architecture, uses a neural network to generate a feature map, but it does not use selective search; instead it uses a separate network to predict the region proposals. These are then reshaped using a RoI pooling layer, which is used to classify the objects.
4. YOLO (You Only Look Once) [10] applies one neural network to the whole image to divide it into regions and predict objects and probabilities for each region. The architecture works faster than the previous ones but has slightly lower accuracy. YOLO has issues with small objects in the image because of constraints of the algorithm.

In this research, Faster R-CNN was used, as it is the most appropriate and effective in terms of speed and accuracy of recognition.

2.2.3 Histogram of Oriented Gradients
A histogram of oriented gradients is a way to represent a graphical object based on distinguishing features. It is used in computer vision for edge detection and for recognizing objects in the image. The principle of operation consists in dividing the image into many small areas and calculating the gradient directions of the pixels in each of them. An example of HOG is shown in Fig. 3.
Fig. 3. The example of using HOG
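A simplified sketch of the HOG idea is given below: per-cell orientation histograms weighted by gradient magnitude. Block normalization, which full HOG implementations add, is omitted here, and the cell/bin sizes are illustrative assumptions:

```python
import numpy as np

def cell_orientation_histograms(image, cell=8, bins=9):
    """Simplified HOG descriptor: one `bins`-bin histogram of unsigned
    gradient orientations per `cell` x `cell` block, magnitude-weighted."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    h, w = image.shape
    ny, nx = h // cell, w // cell
    hist = np.zeros((ny, nx, bins))
    for cy in range(ny):
        for cx in range(nx):
            m = mag[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            a = ang[cy * cell:(cy + 1) * cell, cx * cell:(cx + 1) * cell]
            idx = (a / (180.0 / bins)).astype(int) % bins
            for b in range(bins):
                hist[cy, cx, b] = m[idx == b].sum()
    return hist.ravel()

# A 16x16 vertical-stripe pattern yields strong horizontal gradients.
features = cell_orientation_histograms(np.tile(np.array([0., 1.] * 8), (16, 1)))
```

The resulting vector can then be fed to a classifier or, as attempted in this paper, used to delineate lesion edges.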
2.3 Proposed Approach
The first stage of the proposed approach is training the Faster R-CNN on the labeled dataset. It can be trained end-to-end by backpropagation and stochastic gradient descent [5]. The neural network is then able to detect the regions of the lesion. The proposed method cuts out the regions and then applies the edge-clarifying algorithm to them. It then overlays the result on the original image and shows the result of processing.
3 Results

3.1 Issues
During testing, it was found that it was inefficient to use the HOG method to detect the edges of the lesion. This is due to the fact that the resolution of the image is low, and the program also works only with a part of it, i.e. with an even lower resolution. Because of this issue, a method was developed based on calculating the average pixel intensity in the recognized area. It consists of two steps:

1) Calculation of the average intensity value among all pixels in the selected area.
2) Covering all pixels whose intensity is above the average.

Examples of automatic segmentation are presented in Fig. 4 and Fig. 5.
Fig. 4. Stroke lesion detection on FLAIR-MRI image
Fig. 5. The stroke lesion edges detection
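The two steps of this mean-intensity rule reduce to a one-line NumPy operation; the 3×3 region below is a toy illustration, not patient data:

```python
import numpy as np

def refine_lesion_edges(region):
    """Edge refinement used instead of HOG: keep pixels whose intensity
    exceeds the mean intensity of the detected region."""
    return region > region.mean()

region = np.array([[10., 10., 200.],
                   [10., 220., 210.],
                   [10., 10., 10.]])
mask = refine_lesion_edges(region)   # True where the bright lesion pixels are
```

In the full pipeline, `region` would be the image patch cut out by the Faster R-CNN detector, and `mask` would be overlaid on the original image.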
3.2 Comparison with Modern Technologies
Segmentation results were compared using the Dice score across different existing segmentation approaches (Table 1).
The Dice score (1) measures the overlap between the manual ground-truth lesion segmentation and the automatic lesion segmentation. It was used as the general outcome measure.

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (1)$$
Here A is the automatically segmented area and B is the ground-truth segmentation.

Table 1. Comparison with modern technologies

1) Proposed method on FLAIR-MRI scans: 0.554 ± 0.156
2) Method using customized Markov random fields on FLAIR-MRI scans [13]: 0.582 ± 0.257
3) Method using CNN on DWI-MRI scans [2]: 0.580 ± 0.230
4) Method using DenseNets on DWI-MRI scans [15]: 0.790 ± 0.120
5) Method using automatic tree learning anomaly segmentation [1]: 0.670 ± 0.160
6) Method using two DeconvNets and MUSCLE Net on DWI-MRI scans [13]: 0.475 ± 0.135
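Equation (1) for two boolean masks can be computed directly; the sketch below assumes NumPy arrays and is not the authors' actual evaluation code:

```python
import numpy as np

def dice_score(auto_mask, truth_mask):
    """Dice(A, B) = 2|A intersect B| / (|A| + |B|) for segmentation masks."""
    a = np.asarray(auto_mask).astype(bool)
    b = np.asarray(truth_mask).astype(bool)
    denom = a.sum() + b.sum()
    # Convention: two empty masks overlap perfectly.
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Half of the predicted pixels overlap the ground truth here.
overlap = dice_score(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]))
```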
4 Discussion and Conclusion

This work presented a technology for acute stroke lesion recognition on FLAIR MRI images using Faster R-CNN and an algorithm based on the pixel intensity in the region of the lesion. The research concerned neural networks, especially convolutional neural networks, and their practical use in the recognition of acute stroke lesions in FLAIR MRI images. The use of HOG proved to be ineffective; instead, an edge-detection algorithm based on information about the pixel intensity in the selected area was used. The documented application of the CNN showed an average Dice score of 0.554. A prospect of the study is to use a U-Net architecture and to retrain the network on a larger dataset labeled by specialists in radiology and neurology. The retraining will improve and refine the results of the neural network. The system for recognition of stroke lesions in MRI images can be used in practice as a digital assistant for doctors to reduce the time of MRI image analysis, clarify the boundaries of the lesions, increase the reproducibility of the results, and process a huge amount of data.
References

1. Boldsen, J., Engedel, T., Pedraza, S., Cho, T., Thomalla, G., Nighoghossian, N., Baron, J., Fiehler, J., Ostergaard, L., Mouridsen, K.: Better diffusion segmentation in acute ischemic stroke through automatic tree learning anomaly segmentation. https://www.frontiersin.org/articles/10.3389/fninf.2018.00021/full. Accessed 25 Feb 2020
2. Chen, L., Bentley, P., Rueckert, D.: Fully automatic acute ischemic lesion segmentation in DWI using convolutional neural networks. NeuroImage Clin. 15, 633–643 (2017)
3. Girshick, R.: Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
4. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, pp. 580–587 (2014). https://doi.org/10.1109/CVPR.2014.81
5. LeCun, Y., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989). https://doi.org/10.1162/neco.1989.1.4.541
6. Maier, O., Schröder, C., Forkert, N.D., Martinetz, T., Handels, H.: Classifiers for ischemic stroke lesion segmentation: a comparison study. PLoS ONE 10(12), e0145118 (2015). https://doi.org/10.1371/journal.pone.0145118. Published correction appears in PLoS ONE 2016;11(2):e0149828
7. Nielsen, M.: Neural Networks and Deep Learning. Determination Press, Brisbane (2015)
8. Nikolenko, S., Kadurin, A., Arkhangelskaya, E.: Deep Learning: Immersion into the World of Neural Networks (in Russian). Piter (2017)
9. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
10. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031
11. Subbanna, N., Rajashekar, D., Cheng, B., Thomalla, G., Fiehler, J., Arbel, T., Forkert, N.: Stroke lesion segmentation in FLAIR MRI datasets using customized Markov random fields. https://www.ncbi.nlm.nih.gov/pubmed/31178820. Accessed 04 Dec 2019
12. Woo, I., Lee, A., Jung, S.C., Lee, H., Kim, N., Cho, S.J., Kim, D., Lee, J., Sunwoo, L., Kang, D.-W.: Fully automatic segmentation of acute ischemic lesions on diffusion-weighted imaging using convolutional neural networks: comparison with conventional algorithms. Korean J. Radiol. 20(8), 1275–1284 (2019)
13. Zhang, R., Zhao, L., Lou, W., Abrigo, J., Mok, V., Chu, W., Wang, D., Shi, L.: Automatic segmentation of acute ischemic stroke from DWI using 3-D fully convolutional DenseNets. IEEE Trans. Med. Imaging 37, 2149–2160 (2018)
Learning Embodied Agents with Policy Gradients to Navigate in Realistic Environments

Alexey Staroverov1, Vladislav Vetlin2, Stepan Makarenko1, Anton Naumov2, and Aleksandr I. Panov1,3

1 Moscow Institute of Physics and Technology, Moscow, Russia
[email protected]
2 Higher School of Economics, Moscow, Russia
3 Artificial Intelligence Research Institute, Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, Moscow, Russia
Abstract. Indoor navigation is one of the main tasks in robotic systems. Most solutions in this area rely on ideal agent coordinates and a pre-known room map. However, high accuracy of indoor localization cannot be achieved in realistic scenarios: for example, GPS has low accuracy indoors, odometry often gives too much noise for accurate positioning, etc. In this paper, we study the navigation problem in the realistic Habitat simulator. We propose a method based on the neural network approach and reinforcement learning that takes these factors into account. The most promising recent approaches were DDPPO and ANM for agent control and DF-VO for localization; a new approach was developed during their analysis. This method takes into account the non-determinism of the robot's actions and the noise level of the data from its sensors.

Keywords: Reinforcement learning · Active Neural Mapping · Navigation · Neural networks · ROS · DDPPO · Cognitive mapping and planning · Habitat · SLAM · RTAB-MAP
1 Introduction
Reinforcement learning [10] is one of the machine learning methods in which an intelligent system (agent) learns by interacting with a specific environment. The response of the environment to the decisions made takes the form of reinforcement signals. Such learning is therefore a special case of supervised learning, where the supervisor is the environment.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 212–221, 2021. https://doi.org/10.1007/978-3-030-60577-3_24

Nonetheless, ongoing reinforcement learning research focuses heavily on overly simplified tasks performed in monotonous virtual environments that cannot be transferred to real-world tasks. Today, intelligent robotics that uses deep neural models is far from functional when robots
are free to navigate in the semantically rich real world. The navigation problem is one of the first components that needs to be implemented in a general control architecture for an intelligent agent. The statement of the navigation problem can be formulated in different forms. In this article, we discuss the following: navigation to given coordinates, exploration of an area to compile its detailed map, and navigation to given objects. Building the required robot and training it in real-world areas could be a very demanding task, and reinforcement learning requires much time to train the agent, which makes learning from scratch on the robot ineffective. To overcome this problem, during the initial stage of training we have used a simulator that plausibly simulates all the processes occurring with the robot. There are many simulators to pick from for simulating the movement of a mobile robot. We have decided to use the Habitat environment [1] developed by Facebook. Habitat is a versatile, high-performance 3D simulator with multiple sensors and agent parameters that allows us to handle 3D datasets. As 3D datasets, we have used the Gibson and Matterport databases of 3D premises. Gibson's primary database consists of 572 full buildings; we used a small part of it for Habitat Sim with 60 scenes. Matterport contains 90 scenes. The databases were accumulated as 3D reconstructions scanned from real indoor spaces. For each scene, RGB-D data, semantic object annotations, the 3D reconstruction, and surface normals are available. With the use of these datasets, the Habitat environment enables the training of agents in very photorealistic conditions, after which agent models can be transferred into real life. This approach supports the paradigm shift from 'internet AI' based on static datasets (e.g. VQA, ImageNet, COCO) to embodied AI, where agents act as if they were under realistic circumstances, testing abilities like long-term planning, map reconstruction, and learning from interaction.
2 Problem Formulation
The task of navigating to given coordinates initializes the agent at a random place on the map. The goal is the target coordinates, which are set in the form "Go 5 m to the north, 3 m to the west relative to the start." The room map is not available to the agent, and during the evaluation process the agent can only use the input from the RGB-D camera for navigation. The agent has four actions: move forward, turn left, turn right, and stop. Evaluation occurs when the agent selects the 'STOP' action. As a metric, SPL (Success weighted by Path Length) is used (1). The episode is considered successful if, when calling 'STOP', the agent is within 0.36 m (2x the radius of the agent) of the coordinates of the target.
$$\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N}\frac{l_i}{\max(p_i, l_i)} \qquad (1)$$
A. Staroverov et al.
where l_i is the length of the shortest path between start and goal for an episode, and p_i is the length of the path taken by the agent in that episode.

The main features of the Habitat environment are:

– Noisy Actuation and Sensing: In many simulators, the agent's actions are deterministic: when the agent performs turn-right 10°, it turns precisely 10°, and a forward 0.10 m action moves the agent exactly 0.10 m forward (excluding collision situations). However, in the real world no robot moves like that, due to actuation error, surface friction, and many other sources, which result in a significant drift over a long trajectory. RGB-D sensor noise is also notable and should be taken into consideration.
– Collision Dynamics and 'Sliding': In MINOS, DeepMind Lab, AI2-THOR, Gibson v1, and many other simulators, by default, when a collision happens the agent slides along the obstacle. In the real world this is not possible, due to the damage the agent could take. This behavior is commonly present in video game engines, as it makes human control more predictable. Habitat's developers have found that this behavior enables 'cheating' by learned agents: the agents exploit this sliding mechanism and start to travel through non-navigable areas of the environment (like walls). To correct this issue, sliding on collisions was disabled.
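Combining the success criterion above with Eq. (1), the SPL metric can be sketched as follows. Note that in the standard benchmark definition failed episodes contribute zero to the sum, so Eq. (1) shows the per-episode term for successful episodes; the numeric values below are hypothetical:

```python
import math

def episode_success(agent_xy, goal_xy, agent_radius=0.18):
    """'STOP' within 2x the agent radius (0.36 m) of the goal counts as success."""
    return math.dist(agent_xy, goal_xy) <= 2 * agent_radius

def spl(episodes):
    """episodes: iterable of (success, l_i, p_i), where l_i is the shortest-path
    length and p_i the length of the path actually taken; failures score 0."""
    total = 0.0
    for success, l, p in episodes:
        if success:
            total += l / max(p, l)
    return total / len(episodes)

# One successful but inefficient episode (twice the optimal length), one failure.
score = spl([(True, 5.0, 10.0), (False, 5.0, 5.0)])
```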
3 Existing Approaches
Navigational research has a long history in robotics. A classic pipeline consists of modules: mapping the environment, localizing the agent, planning a path given the map, and following this path. The first two steps are often treated as one and named SLAM (simultaneous localization and mapping). Another pipeline is to use end-to-end RL algorithms. The RL approach has shown massive progress within a couple of years, taking first place in different navigation challenges. Despite that, it can be computationally hard to train from the ground up, and it can struggle when deployed in previously unseen, realistic, large environments. As state-of-the-art SLAM algorithms, we have taken ORB-SLAM2 and RTAB-MAP. Among RL methods, we have singled out DDPPO because of its efficient training process and ANM because of its ability to build a map and divide a task into subtasks.

3.1 Decentralized Distributed Proximal Policy Optimization
In reinforcement learning (RL) algorithms, one of the main ideas is asynchrony. Asynchronous distribution is a very demanding process; even minor errors can lead to agent failure. This makes RL very different from supervised learning, where the synchronous distribution of learning is done through data parallelism. The values of the new parameters are calculated as the weighted average of the
gradients of all the workers. This parallelism provides a linear acceleration of learning for up to 32,000 GPUs. Decentralized Distributed Proximal Policy Optimization (DDPPO) adapted this idea for on-policy RL algorithms [11]. As a general abstraction, this method implements the following: at step k, worker n has a copy of the parameters θ_n^k, calculates the gradient ∇_θ J^{PPO}(θ_n^k), and updates θ via

$$\theta_n^{k+1} = \mathrm{ParamUpdate}\left(\theta_n^{k},\ \mathrm{AllReduce}\left(\nabla_\theta J^{PPO}(\theta_1^{k}), \ldots, \nabla_\theta J^{PPO}(\theta_N^{k})\right)\right) \qquad (2)$$

where ParamUpdate is any first-order optimization technique (e.g. gradient descent) and AllReduce performs a reduction (e.g. mean) over all copies of a variable and returns the result to all workers. The core of DDPPO is the Proximal Policy Optimization (PPO) algorithm [9], a method for training an RL agent by gradient ascent on a policy objective. There are two versions of this algorithm: the first is based on Trust Region Policy Optimization (TRPO) [8], the second on a clipping method. The first method moves along the gradient vector only so far that the Kullback-Leibler distance between the policy at the end of the gradient step and the policy at the current point stays minimal. The second method limits how much the update can change the policy. Given a θ-parameterized policy π_θ and a set of trajectories collected with it (commonly referred to as a 'rollout'), PPO updates π_θ as follows. Let Â_t = R_t − V̂_t be the estimate of the advantage, where R_t = Σ_{i=t} γ^{i−t} r_i, V̂_t is the expected value of R_t, and r_t(θ) is the ratio of the probability of the action a_t under the current policy to that under the policy used to collect the rollout. The parameters are then updated by maximizing

$$J^{PPO}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\varepsilon, 1+\varepsilon)\hat{A}_t\right)\right] \qquad (3)$$

Clipping means that if the ratio has changed by more than a factor of (1+ε) or less than (1−ε), the new value is replaced by the nearest boundary. DDPPO is an end-to-end algorithm trained in the Habitat environment.
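A NumPy sketch of the clipped surrogate objective in Eq. (3), with hypothetical ratio and advantage values; a real implementation would differentiate this expression through the policy network:

```python
import numpy as np

def ppo_clip_objective(ratios, advantages, eps=0.2):
    """J^PPO = mean_t[ min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) ]."""
    r = np.asarray(ratios, dtype=float)
    a = np.asarray(advantages, dtype=float)
    unclipped = r * a
    clipped = np.clip(r, 1.0 - eps, 1.0 + eps) * a
    return np.minimum(unclipped, clipped).mean()

# The clip removes the incentive to push ratios beyond the trust interval:
# r=1.5 with A=+1 is capped at 1.2; r=0.5 with A=-1 is capped at -0.8.
j = ppo_clip_objective([1.5, 0.5], [1.0, -1.0])
```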
As input, the agent uses data from the RGB-D camera and a GPS+Compass sensor. All input then passes through the SE-ResNeXt-101 neural network [3] and an LSTM with 1024 units [7]. The agent, presented by Facebook, went through a long training process and showed good results. We believe that this algorithm can be successfully used as an integral part of the learning process. Its main disadvantage is that in the absence of a GPS+Compass sensor, the SPL metric drops from 0.98 to 0 at 100 million steps and to 0.15 at 2.5 billion steps.

3.2 Active Neural Mapping
The Active Neural Mapping (ANM) algorithm [2] introduced a modular navigational paradigm that combines classical and learning-based navigational approaches. The main idea of ANM is to split the agent into four components: Mapper, Global policy, Planner, and Local policy.
The use of a learning process in ANM, compared to classic approaches like SLAM and planning, provides versatility with respect to input variation and more resistance to noise. The learned global policy can exploit consistent patterns in the structure of real-world environments, and learned local policies that use visual feedback can achieve more robust behavior. As a result, this modular approach leads to better performance as well as sample efficiency.

– Mapper builds a map of the explored room and determines the location of the agent on this map. The map has dimensions 2 × M × M, where the first channel is the probability of finding an obstacle at a particular point, and the second channel indicates how well each cell of the map has been explored. At each step, the Mapper receives current data from an RGB camera, current and previous data from GPS and compass sensors with some noise imposed on them, as well as the last state of the map. As output, this module gives the constructed map and the current coordinates of the agent with the noise reduced.
– Global policy is used to set a global goal for the algorithm, i.e. where to move. For example, if the agent sees that the point it needs to reach is behind a wall, then the global policy plots a route around the obstacle. As input, the Global policy receives the map and coordinates obtained from the Mapper module, as well as the set of points that the agent has already visited. As output, the Global policy gives the coordinates of the current global goal for the agent.
– Planner receives the map, the current position, and the global goal. Based on the data received, the algorithm uses the Fast Marching Method and generates a local coordinate near the agent to reach the point issued by the global policy in the most optimal way. All unexplored points on the map are treated as empty space.
– Local policy is used to obtain a specific agent action and to adjust this action depending on minor obstacles in front of the agent. It receives an RGB image and a local goal from the Planner. As output, it gives a specific action for the agent.

3.3 Simultaneous Localization and Mapping
Simultaneous Localization and Mapping (SLAM) is the task of reconstructing a map of an unknown environment while simultaneously localizing the agent's position within it. In recent decades, SLAM has become a very popular approach in the Computer Vision and Robotics fields and is used by many high-technology companies. In our work, we used SLAM methods to determine the location of the robot and pass this agent position to the RL part. As the most promising visual odometry method based on neural networks, we used deep frame-to-frame visual odometry (DF-VO) [12]. Among classic SLAM systems, we have focused on RTAB-MAP [5] and ORB-SLAM2 [6].
Learning Embodied Agents
217
ORB-SLAM 2: a SLAM algorithm that uses the ORB keypoint detector and descriptor, as well as bag-of-words optimization. The algorithm finds key points in the old and new images, and then uses these key points to determine the offset between frames. The detected position is used to update the global map, further adjusting the position prediction on the map itself. The algorithm itself builds only a sparse map of the area, but a complete map can be assembled from keyframes using the key points found in both images (before and after the step).

RTAB-MAP: a graph-based SLAM method. It uses a visual loop-closure detector to detect and adjust complete loops. To associate new and old data, the algorithm compares frames obtained from different angles. For matching frames, it can use various detectors and keypoint descriptors, including ORB. Each node of the graph (keyframe) contains its position in 3D space, a color 3D depth map, and a list of keymap points found in this image. The edges of the graph reflect the relationships between these nodes. A link is created only between neighboring nodes or nodes between which a loop closure is detected. The loop-closure detector searches for the relationship between the keypoint descriptors of the current keyframe and the keypoint descriptors found previously. If the number of common key points for the current keyframe and the frame with the highest match exceeds a certain threshold, the loop is closed. In this case, the position of the current frame is refined to match the previous keyframe, the remaining loop nodes are optimized, and the corresponding new edge is added. After a loop closure is found, the graph positions are optimized in order to minimize errors in the graph (Fig. 1).
Fig. 1. An example of constructing a trajectory with the RTAB-MAP algorithm. The green line is the path that RTAB-MAP builds based on the map.
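The loop-closure criterion described above (count the matching binary keypoint descriptors between the current keyframe and a previously stored one, and close the loop when the count exceeds a threshold) can be sketched as follows; the descriptors, distances and thresholds are illustrative toy values, not RTAB-MAP's actual parameters:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors packed as ints."""
    return bin(a ^ b).count("1")

def count_matches(desc_a, desc_b, max_dist=10):
    """Greedy one-to-one matching of ORB-style binary descriptors."""
    used, matches = set(), 0
    for d in desc_a:
        best, best_j = None, None
        for j, e in enumerate(desc_b):
            if j in used:
                continue
            dist = hamming(d, e)
            if best is None or dist < best:
                best, best_j = dist, j
        if best is not None and best <= max_dist:
            used.add(best_j)
            matches += 1
    return matches

def loop_closed(current, candidate, threshold=2):
    """Declare a loop closure when enough descriptors agree."""
    return count_matches(current, candidate) >= threshold

frame_a = [0b1010_1010, 0b1111_0000, 0b0000_1111]
frame_b = [0b1010_1011, 0b1111_0001]          # near-duplicates of frame_a
print(loop_closed(frame_a, frame_b))          # → True
```

Real ORB descriptors are 256-bit strings; the matching logic is the same, only the sizes and thresholds differ.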
DF-VO: a monocular visual odometry (VO) algorithm that combines geometry-based and deep learning methods. Compared with existing SLAM and visual odometry approaches based on geometry, which must be carefully adapted to each task scenario, DF-VO is robust in most circumstances. Classic algorithms also suffer from scale-drift issues on long trajectories. Some recent works based on machine learning implemented
218
A. Staroverov et al.
visual odometry as an end-to-end task, but their performance falls far short of geometry-based methods. That is why DF-VO combines deep learning and geometry-based methods. Specifically, DF-VO utilizes single-view depth and two-view optical flow convolutional neural networks (CNNs). Given these CNN outputs, geometry-based correspondences are established. We disabled the depth CNN, since we have a ground-truth depth sensor. For the optical flow CNN, LiteFlowNet [4] was used. LiteFlowNet consists of two sub-networks trained for pyramidal feature extraction and optical flow estimation.
4 Experiments

4.1 Reinforcement Learning Policy
As solutions, DDPPO and ANM with ground-truth map variations and additional segmentation layers were tested. Although these algorithms showed almost perfect results in the Habitat Challenge 2019, the results change once noise is added to the actions and sensors. Even with the ideal agent position, the trained models could not reproduce the stated outcomes under the new conditions, and their complete retraining from scratch (100 million steps) gave us a success rate of 0.78 (Fig. 2).
Fig. 2. The success rate metric that the algorithm achieves after 100 million steps
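DDPPO [11] distributes rollout collection across workers but keeps the clipped surrogate objective of PPO [9] for its policy update; for reference, a minimal sketch of the per-sample objective (ε is the clip range):

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# A large policy update (ratio 1.5) on a positive advantage is clipped:
print(ppo_clip_objective(1.5, 1.0))   # → 1.2
# Negative advantages are clipped on the other side:
print(ppo_clip_objective(0.5, -1.0))  # → -0.8
```

The clipping keeps each update close to the behavior policy, which is what makes the massively parallel, decentralized rollout collection of DDPPO stable.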
Since the ANM mapper module is trained with supervised learning, and the noise prevents an ideal map from being built, ANM performance is worse than DDPPO's. We therefore settled on DDPPO. With the ground-truth position, our agent fully learned to move through the environment without collisions, even in the presence of noise (Fig. 3, 4).
4.2 Localization Modules
Our goal was to complete the task without an ideal agent position. To achieve that, RTAB-MAP and DF-VO were used as the odometry module, and their output was fed to the trained DDPPO. Comparing ORB-SLAM 2 and RTAB-MAP, we found them very close in performance, but RTAB-MAP can use various detectors and descriptors depending on the task, so we made the choice in its favor.
Fig. 3. DDPPO result with action and sensor noise
Fig. 4. DDPPO result without noise
In the RTAB-MAP and DF-VO comparison, we used ground-truth coordinates and compared only the proximity of the restored trajectory. Overall, RTAB-MAP can achieve much higher quality than DF-VO, but it fails in noisy conditions and starts to output the same coordinates at every step regardless of agent movements. In the example of Fig. 5, RTAB-MAP (red line) reconstructs the ground truth (green line) with decent quality until the moment it fails and stops in one place (the red line stops, but the green line goes on). DF-VO, in contrast, tracks the agent to the end of the trajectory.
Fig. 5. RTAB-MAP and DF-VO comparison. The green path is the ground truth; the red path is the predicted one.
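Trajectory proximity of this kind is commonly quantified by the absolute trajectory error (ATE), the RMSE between corresponding predicted and ground-truth positions; the paper does not state its exact metric, so the sketch below is illustrative:

```python
import math

def ate_rmse(pred, gt):
    """Absolute trajectory error: RMSE over corresponding (x, y) points."""
    assert len(pred) == len(gt)
    sq = [(px - gx) ** 2 + (py - gy) ** 2
          for (px, py), (gx, gy) in zip(pred, gt)]
    return math.sqrt(sum(sq) / len(sq))

gt   = [(0, 0), (1, 0), (2, 0), (3, 0)]
pred = [(0, 0), (1, 0.1), (2, 0.1), (3, 0.2)]   # slight drift
print(round(ate_rmse(pred, gt), 3))             # → 0.122
```

When one trajectory stops updating, as RTAB-MAP does under noise, the per-point errors (and hence the ATE) grow with every further step of the ground-truth path.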
4.3 Combined Approach
As the overall approach, we used DF-VO as the localization module and trained DDPPO with its coordinates. This approach gave us an SPL of around 0.32 in normal conditions and 0.16 in noisy conditions (Table 1). We evaluated these results on ten different maps and took the average. We conducted three parts of the experiment: testing our approach under action and camera noise, under camera noise only, and with no noise at all. For comparison, we also tested DDPPO with zero coordinates (zero pos) and with ideal coordinates (ground truth pos) from the environment. For RTAB-MAP, the turn angle was reduced from 10° to 5°, since RTAB-MAP cannot track the position with such a large difference between frames. In the presence of camera noise, RTAB-MAP also fails and outputs a zero position at every step, but in good conditions it outperforms DF-VO by far and achieves almost ideal localization. The main reason why both RTAB-MAP and DF-VO perform significantly worse than ground-truth coordinates is that it is hard to determine the final stopping place relative to the reconstructed map. In particular, if the goal is near a wall, the agent may reconstruct the goal coordinates on the other side of the wall due to localization error and start to move into a different room.

Table 1. SPL results for different types of DDPPO localization module

                          Zero   Ground truth   RTAB-MAP   DF-VO
Action and sensor noise   0.08   0.58           0.09       0.16
Sensor noise              0.10   0.62           0.11       0.20
Without noise             0.13   0.72           0.40       0.32

5 Conclusion
In this paper, we propose a new reinforcement learning method to train an agent whose goal is to navigate an unseen map of an indoor area. Extensive work has been done to study and test the most promising approaches for the navigation task in the Habitat environment. We use DDPPO as the basic algorithm and introduce a number of improvements and modifications intended to improve navigation accuracy. As the localization module, we use DF-VO to determine the position of the agent, which allowed us to navigate using only RGB-D input. The proposed approaches have shown their effectiveness in the presence of different kinds of noise. In the future, we plan to develop our method further and improve the localization accuracy, including for the task of navigation to an object.

Acknowledgements. The reported study was supported by RFBR, research Project No. 17-29-07079.
References

1. Kadian, A., Truong, J., Gokaslan, A., Clegg, A., Wijmans, E., Lee, S., Savva, M., Chernova, S., Batra, D.: Are we making real progress in simulated environments? Measuring the sim2real gap in embodied visual navigation. arXiv preprint arXiv:1912.06321 (2019)
2. Chaplot, D.S., Gandhi, D., Gupta, S., Gupta, A., Salakhutdinov, R.: Learning to explore using active neural SLAM (2020)
3. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
4. Hui, T.W., Tang, X., Loy, C.C.: LiteFlowNet: a lightweight convolutional neural network for optical flow estimation (2018)
5. Labbé, M., Michaud, F.: RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation. J. Field Robot. 36(2), 416–446 (2019). https://doi.org/10.1002/rob.21831
6. Mur-Artal, R., Tardos, J.D.: ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 33(5), 1255–1262 (2017). https://doi.org/10.1109/tro.2017.2705103
7. Sak, H., Senior, A.W., Beaufays, F.: Long short-term memory recurrent neural network architectures for large scale acoustic modeling (2014)
8. Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization (2015)
9. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms (2017)
10. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (2018)
11. Wijmans, E., Kadian, A., Morcos, A., Lee, S., Essa, I., Parikh, D., Savva, M., Batra, D.: DD-PPO: learning near-perfect PointGoal navigators from 2.5 billion frames (2020)
12. Zhan, H., Weerasekera, C.S., Bian, J., Reid, I.: Visual odometry revisited: what should be learnt? (2019)
Comparative Efficiency of Prediction of Relativistic Electron Flux in the Near-Earth Space Using Various Machine Learning Methods

Irina Myagkova, Vladimir Shirokii, Roman Vladimirov, Oleg Barinov, and Sergey Dolenko

D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Moscow 119991, Russia
{irina,shiroky,dolenko}@sinp.msu.ru
Abstract. This paper performs a comparative analysis of the prediction quality for the time series of the hourly average flux of relativistic electrons (E > 2 MeV) in the near-Earth space 1 to 24 h ahead by various machine learning methods. As inputs, all the predictive models used hourly average values of the parameters of the solar wind and interplanetary magnetic field measured at the Lagrange point between the Sun and the Earth, the values of the Dst and Kp geomagnetic indexes, and the values of the flux of relativistic electrons itself. The machine learning methods used for prediction were multi-layer perceptron (MLP) type artificial neural networks, the decision tree (random forest) method, and gradient boosting. A comparison of the quality indicators of short-term forecasts with a horizon of one to 24 h showed that the best results were demonstrated by the MLP. The horizon of satisfactory forecast accuracy on independent data is 9 h; the horizon of acceptable accuracy is 12 h.

Keywords: Machine learning · Artificial neural networks · Random forest · Gradient boosting · Time series prediction · Near-Earth space · Radiation belts of the Earth · Relativistic electron flux · Solar wind
1 Introduction

Prediction of relativistic electron (RE) fluxes in the outer radiation belt of the Earth (ERB) is important because the exposure of modern spacecraft to high-energy particles can result in significant damage to onboard systems. The difficulties in predicting the time series of the RE flux are largely caused by the nonlinearity of the system "solar wind (SW) – Earth's magnetosphere" (e.g., [1]). The RE flux in the outer ERB can change by an order of magnitude or more in a few hours under the influence of interplanetary magnetic field (IMF) and SW parameters, mainly the SW velocity [2–4]. The outer ERB is a much less stable structure compared to the inner one. It is characterized by strong and abrupt changes of relativistic and sub-relativistic electron flux intensity during geomagnetic storms, when the IMF and SW parameters vary strongly. The radiation environment in the outer ERB is very important, since there are many

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 222–227, 2021. https://doi.org/10.1007/978-3-030-60577-3_25
Comparative Efficiency of Prediction of Relativistic Electron Flux
223
spacecraft whose orbits are situated in this area of the near-Earth space or cross it. Electronic microcircuits on board these spacecraft can be damaged by fluxes of high-energy electrons, which can lead to their malfunction or loss [5]. Despite many years of experimental studies of the outer ERB and numerous models of electron acceleration in the Earth's magnetosphere (e.g., [6–8]), the mechanisms of formation of the outer ERB have no final generally accepted explanation. A very urgent task for ensuring radiation safety is predicting the state of the radiation environment. It can be solved with the help of modern machine learning (ML) methods, which make it possible to establish relationships between the analyzed variables by approximation of empirical dependencies (e.g., [1, 9–11]). The aim of this work is to compare the quality indicators of forecasts of the time series of the RE flux in the outer ERB, carried out using the experimentally obtained SW and IMF parameters, by various ML methods: multilayer perceptron (MLP) type artificial neural networks (ANN) [12], the decision tree (random forest) method [13], and the gradient boosting method [14], with a prediction horizon of one to 24 h.
2 Data Sources and Preparation

For short-term forecasting of the time series of the RE flux in the outer ERB, it is necessary to have operational information about the values of the IMF and SW parameters, the geomagnetic indexes characterizing the state of the geomagnetic field, and the flux of electrons itself. The parameters of the SW plasma and the IMF used in this study were measured at the Lagrange point L1 between the Sun and the Earth on board the ACE (Advanced Composition Explorer) spacecraft (https://www.srl.caltech.edu/ACE). The values of geomagnetic indexes obtained from the website of the World Data Center for Geomagnetism in Kyoto (https://wdc.kugi.kyoto-u.ac.jp/) were also used. Data about the RE flux at the geostationary orbit (GEO) were obtained from the spacecraft of the GOES (Geostationary Operational Environmental Satellite) series (https://www.ngdc.noaa.gov/stp/satellite/goes/dataaccess.html). The electron flux has a wide dynamic range (more than 6 orders of magnitude), so instead of the real values of the flux, its logarithmic values were used. To build a predictive model of the RE flux (hourly average values of the relativistic, E > 2 MeV, electron flux at GEO), we used the time series (TS) of hourly averaged values of the following physical quantities:
• SW parameters at the Lagrange point L1 between the Earth and the Sun: SW velocity V (measured in km/s) and proton density Np (measured in cm−3).
• IMF vector parameters at the same point L1 (measured in nT): Bz (the IMF z-component in the GSM system) and the amplitude B (IMF modulus).
• Geomagnetic indexes: the equatorial index Dst (measured in nT) and the global geomagnetic index Kp (dimensionless).
• Average hourly flux of relativistic electrons with energies > 2 MeV at GEO (measured in (cm2 s sr)−1).
The input of the prediction algorithms was also fed with the values of sine and cosine with daily and annual periods, which made it possible to take into account the
224
I. Myagkova et al.
recurrent changes in the predicted values associated with the rotation of the Earth around its axis and around the Sun [15]. To take into account the prehistory of the input parameters, we used delay embedding of all the TS to a depth of 24 h; that is, in addition to the current values of all the inputs, the preceding values for 1, 2, 3, …, 23, 24 h before the current one were fed to the algorithm input. This depth of delay embedding seems sufficient for data with an hourly time resolution. The operational data (Browse Data) were used instead of the pre-processed and cleaned Level 2 data intended for scientific research. The reason is that the forecasting system was developed to work online, where the quality of the input data corresponds to the operational data; therefore, the ML algorithms should be trained on data of this quality. Level 2 data also has a significantly larger number of gaps, which complicates the use of ML methods: with delay embedding of the TS, each gap in the initial non-embedded data removes a proportional number of patterns. To create the predicting model, we used the data array from November 1997 (when the ACE satellite began operation) to December 2019. The data array was divided into the training sample and the test set: the training sample was used to train the algorithm, and the test set was used to independently evaluate the obtained model. For the MLP, the training sample was further divided into training and validation data sets. The training set was used to adjust the weights during MLP training, and the validation set was used for periodic verification during training in order to prevent overtraining. The training and validation sets used data from November 1997 to the end of 2014, randomly divided in the ratio of 80% to 20%. The test set included data from the beginning of 2015 to the end of 2019.
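The delay embedding described above (the current value plus the preceding 24 hourly values of every input series) can be built with a simple sliding window; a sketch, with illustrative names:

```python
import numpy as np

def delay_embed(series: np.ndarray, depth: int = 24) -> np.ndarray:
    """Turn a 1-D time series into rows [x(t-depth), ..., x(t-1), x(t)].

    Returns an array of shape (len(series) - depth, depth + 1); the
    first `depth` time steps lack a full prehistory and are dropped,
    which is why gaps in the raw data remove a proportional number
    of embedded patterns.
    """
    n = len(series) - depth
    return np.stack([series[i:i + depth + 1] for i in range(n)])

x = np.arange(30.0)            # a toy hourly series
X = delay_embed(x, depth=24)
print(X.shape)                 # → (6, 25)
print(X[0])                    # hours 0..24 feeding the prediction at t = 24
```

In the paper's setting, one such window is built per input quantity (V, Np, Bz, B, Dst, Kp, the electron flux itself), and the rows are concatenated into a single feature vector per hour.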
3 Results

The adaptive models used to solve the problem were based on the following adaptive methods: MLP, the multilayer perceptron [12]; the decision tree method (random forest, RF) [13]; and gradient boosting (GB) [14]. The optimal values of the hyper-parameters for each method were determined by grid search. The best MLP architecture had a single hidden layer with 32 neurons and tanh activation functions in both the hidden layer and the output one; it was trained by stochastic gradient descent with Nesterov momentum, learning rate 0.001, momentum 0.5, batch size 200; training stopped after 1000 epochs without error improvement on the validation set. RF and GB used 100 estimators, and all the input features could be fed into each model. The maximum depth of a tree was 10 for RF and 3 for GB; the minimum number of patterns in a leaf was 4 for RF and 1 for GB; the minimum number of patterns to allow splitting a leaf was 10 for RF and 2 for GB; the learning rate for GB was 0.1. To compare the quantitative quality indicators of the forecasts obtained by the various methods, the dependences of the multiple determination coefficient R2 and of the root-mean-square error (RMSE) on the forecast horizon from 1 to 24 h were calculated on the test set. Fig. 1 displays the obtained dependences for the average of
Fig. 1. The dependences of the coefficient of multiple determination R2 (left) and of the root-mean-square error (RMSE) (right) on the forecast horizon.
predictions of 5 identical models with various initializations, and for the trivial model, whose prediction equals the latest known value of the predicted TS. We can see that the best results for all horizons were obtained using the neural network model (black circles). The MLP model's advantage becomes significant for horizons greater than 3 h. The worst results (the smallest R2 and the greatest RMSE) were demonstrated by the random forest model (blue triangles), but, as one can see, the differences between the RF and GB models are small. All three investigated models are, expectedly, significantly better than the trivial model (red crosses), starting from the horizon of 2 h. The non-monotonous behavior of the trivial model is caused by a pronounced variation of the hourly flux with a period equal to the period of the daily rotation of the Earth. As examples of the forecast results, the values of the predicted hourly averaged RE flux with energies >2 MeV at GEO three and six hours ahead for the time period from September 26 to October 5, 2019 are shown in Fig. 2. The red solid curve shows the RE flux measured on board GOES satellites. The hourly averaged RE flux predicted by the MLP (cyan), random forest (purple) and gradient boosting (green) is presented. Based on the results obtained, the following conclusions can be drawn: 1) The best prediction quality among the three investigated ML methods was shown by the MLP, the worst by the random forest algorithm. The possible reason is the significant non-linearity of the approximated dependence. 2) The maximum forecast horizon at which the prediction quality can be considered satisfactory is 9 h. With a further increase in the horizon, the statistical indicators of the forecasts become worse than the reasonable threshold values (R2 = 0.85, RMSE = 0.4).
3) Since the results shown by the various ML methods turned out to be quite close to each other, the main directions of further work to improve forecasting quality should be reducing the dimensionality of the input data by feature selection or feature extraction, and correcting the distribution of input features by nonlinear transformation.
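For reference, the two forecast-quality indicators used throughout, the coefficient of multiple determination R2 and the RMSE, can be computed as:

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error of the predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]   # toy logarithmic flux values
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(y_true, y_pred), 3))  # → 0.158
print(round(r2(y_true, y_pred), 3))    # → 0.98
```

Both indicators are computed on the logarithmic flux values, consistent with the logarithmic transformation applied to the electron flux in Sect. 2.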
Fig. 2. An example of comparison of the predicted hourly averaged flux of RE (>2 MeV) at geostationary orbit three and six hours ahead on independent data for the time period from September 26 to October 5, 2019 using various ML methods: MLP (cyan), random forest (purple) and gradient boosting (green), together with measured data (red).
4 Summary

The present paper demonstrates that machine learning algorithms make it possible to predict the amplitude values of the hourly averaged flux of relativistic electrons with energies >2 MeV at geostationary orbit up to 24 h ahead with reasonable accuracy. The best performance was demonstrated by the multi-layer perceptron. The horizon of satisfactory forecast accuracy on independent data (R2 = 0.85) is 9 h; the horizon of acceptable accuracy (R2 = 0.8) is 12 h.

Acknowledgements. This study has been conducted at the expense of the Russian Science Foundation, grant no. 16-17-00098-P.
References

1. Cole, D.: Space weather: its effects and predictability. Space Sci. 107, 295–302 (2003)
2. Kataoka, R., Miyoshi, Y.: Average profiles of the solar wind and outer radiation belt during the extreme flux enhancement of relativistic electrons at geosynchronous orbit. Ann. Geophys. 26, 1335–1339 (2008)
3. Myagkova, I., Panasyuk, M., et al.: Correlation between the earth's outer radiation belt dynamics and solar wind parameters at the solar minimum according to EMP instrument data onboard the CORONAS-Photon satellite. Geomag. Aeron. 51(7), 897–901 (2011)
4. Myagkova, I., Shugay, Yu., Veselovsky, I., Yakovchouk, O.: Comparative analysis of recurrent high-speed solar wind streams influence on the radiation environment of near-earth space in April–July 2010. Sol. Syst. Res. 47(2), 141–155 (2013)
5. Iucci, N., Levitin, A., Belov, A.: Space weather conditions and spacecraft anomalies in different orbits. Space Weather 3(1), S01001 (2005)
6. Friedel, R., Reeves, W., Obara, T.: Relativistic electron dynamics in the inner magnetosphere: a review. J. Atmos. Sol.-Terr. Phys. 64, 265–283 (2002)
7. Reeves, G., McAdams, K., et al.: Acceleration and loss of relativistic electrons during geomagnetic storms. Geophys. Res. Lett. 30, 1529–1561 (2003)
8. Turner, D., Shprits, Y., Hartinger, M., Angelopoulos, V.: Explaining sudden losses of outer radiation belt electrons during geomagnetic storms. Nat. Phys. 8, 208–212 (2012)
9. Koons, H., Gorney, D.: A neural network model of the relativistic electron flux at geosynchronous orbit. J. Geophys. Res. 96, 5549–5556 (1990)
10. Fukata, M., Taguchi, S., Okuzawa, T., Obara, T.: Neural network prediction of relativistic electrons at geosynchronous orbit during the storm recovery phase: effects of recurring substorms. Ann. Geophys. 20(7), 947–951 (2002)
11. Ling, A., Ginet, G., Hilmer, R., Perry, K.: A neural network-based geosynchronous relativistic electron flux forecasting model. Space Weather 8(9), S09003 (2010)
12. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. Pearson (2008)
13. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
14. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
15. Myagkova, I., Dolenko, S., Efitorov, A., Shiroky, V., Sentemova, N.: Prediction of relativistic electron flux in the earth's outer radiation belt at geostationary orbit by adaptive methods. Geomag. Aeron. 57(1), 8–15 (2017)
Metal Oxide Gas Sensors Response Processing by Statistical Shape Analysis and Machine Learning Algorithm for Industrial Safety Applications

Alexander Efitorov¹, Matvei Andreev², and Valeriy Krivetskiy²

¹ Skobeltsyn Institute of Nuclear Physics, Lomonosov Moscow State University, Leninskie gory, GSP-1, Moscow 119991, Russia
² Department of Chemistry, Lomonosov Moscow State University, Leninskie Gory 1/3, Moscow 119234, Russia
[email protected]
Abstract. Development of new signal processing approaches is essential for improving the reliability of metal oxide gas sensor performance in real atmospheric conditions. Advantages of the statistical shape analysis (SSA) pre-processing method in combination with a deep learning artificial neural network based machine learning classification algorithm are presented in this regard. The results of applying the presented method are compared to simple signal pre-processing techniques for amplitude and baseline disturbance compensation in the task of selective detection of hydrogen and propane. Laboratory-made sensors based on highly sensitive Au- and Pd-modified nanocrystalline SnO2 were used. Modulation of the sensor working temperature between 150 and 500 °C was applied. A nearly 30% higher accuracy of identification of hydrogen or propane at a concentration range of 30–550 ppm under variable real atmospheric conditions has been demonstrated. The key feature of the presented SSA pre-processing approach is the absence of a characteristic feature extraction step for the temperature-modulated sensor signal. Instead, the characteristic pattern of the sensor response towards various gases is revealed by applying the correction procedures of translation, scaling and rotation. As a result of such pre-processing, no useful signal shape components are lost, while the effects of signal amplitude and baseline drift are eliminated, facilitating recognition.

Keywords: Metal oxide gas sensor · Signal processing · Statistical shape analysis
1 Introduction

Propylene is one of the most important primary components in the chemical industry, the demand for and production of which grows year by year. The propane dehydrogenation (PDH) process is gaining popularity in this regard, especially due to increasing shale gas exploitation [1]. The key stage of the process is the endothermic catalytic dehydrogenation of propane at high temperatures, usually over 500 °C, and pressures over 1 MPa. Highly flammable gases (hydrogen, propane and propylene), which can form

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 228–233, 2021. https://doi.org/10.1007/978-3-030-60577-3_26
Metal Oxide Gas Sensors Response Processing by Statistical Shape Analysis
229
extremely explosive mixtures with air, are the main components circulating through valves, pipelines, reactors, refining columns, etc. This circumstance requires a high level of assurance in the integrity of the whole plant system for transportation, conversion and purification of primary components and products [2]. Direct monitoring of volatile hydrocarbon leakage in the air is one of the most reliable means of pipeline integrity control. The application of an unmanned distributed network of chemical sensors is one of the most promising solutions, allowing fast and precise detection of a fracture and estimation of its extent. Metal oxide gas sensors are highly sensitive, low-cost, low-energy-consuming and miniature devices with the possibility of long-term continuous operation, which may be a principal component of such industrial safety technology [3]. Insufficient selectivity of response of metal oxide semiconductor gas sensors, a crucial obstacle which hinders their widespread application, can be overcome by various machine learning algorithms of signal processing in combination with sensor working temperature modulation and/or sensor array assembly [4, 5]. Recent field tests indicate the feasibility of such an approach; however, long-term stability of the sensor response becomes a problem [6]. Such negative phenomena as sensor response drift and inconsistency and sensitivity degradation hamper the application of metal oxide gas sensors for ambient air monitoring. A number of approaches (principal component analysis, PCA; discrete wavelet transform, DWT; Fourier transform, FT) for raw sensor response pre-processing before application of ML procedures have been demonstrated and shown to be effective in improving metal oxide gas sensor performance. They allow extraction of characteristic features of the sensor response towards a particular gas or gas mixture and minimize the low- and high-frequency signal components arising from drift and thermal noise effects, respectively. However, all these methods rely on heuristic hyperparameters (the number of principal components, the frequency band of interest, the type of mother wavelet function) chosen to ensure the best performance of the ML algorithm on the exact data set at hand. Such an approach may cause unreliable performance of metal oxide gas sensors in the task of monitoring ambient air with variable atmospheric characteristics, local background pollution, etc. In the present paper we demonstrate the implementation of statistical shape analysis (SSA) procedures [7] for pre-processing of the temperature-modulated metal oxide gas sensor response for reliable classification of hydrogen and propane at low concentrations in real urban air by a deep learning ANN based ML algorithm. The key enabling feature of the SSA approach is the absence of sensor response feature extraction and selection procedures. Instead, the correction procedures of translation, scaling and rotation are applied to each data sample. The proposed method's performance in gas discrimination is tested in real atmospheric conditions of a highly urbanized area and compared with other previously reported signal pre-processing techniques on the example of hydrogen and propane in the low concentration range relevant to leakage detection.
230
A. Efitorov et al.
2 Methods and Materials

2.1 Data Collection
The experimental setup is presented in Fig. 1. A previously reported laboratory-made highly sensitive gas sensor based on bimetallic-modified tin dioxide, AuPd/SnO2, was used in the study [8]. The sensor was placed in a stainless-steel box, through which an outdoor air flow was established at a linear velocity of 0.2 m/s. Hydrogen and propane were admixed to the air flow through a capillary tube at a fixed rate maintained by mass-flow controllers (Bronkhorst, Netherlands). The gases were diluted by air supplied by a clean air generator (GChV 2.0, OOO "NPP Himelektronika", Russia). During collection of the data set for algorithm training, the following sequence of gas concentrations was passed through the box: hydrogen 900 ppm, propane 900 ppm, hydrogen 500 ppm, propane 500 ppm, hydrogen 100 ppm, propane 100 ppm. Each stage lasted for one hour and was separated from the next by one hour of intact outdoor air flow. This protocol was repeated 16 times to yield the training dataset. The independent dataset for signal processing algorithm testing consisted of the following steps: outdoor air; hydrogen 550 ppm, 450 ppm, 350 ppm, 200 ppm, 50 ppm; outdoor air; propane 550 ppm, 450 ppm, 350 ppm, 200 ppm, 50 ppm. Each stage lasted for 1 h. This protocol was repeated 2 times. Additionally, the lowest possible concentrations of hydrogen and propane were tested according to the protocol: outdoor air, hydrogen 30 ppm, outdoor air, propane 30 ppm, outdoor air, with a duration of 1 h for each stage, repeated 18 times to complete the model testing dataset. The sensors were operated in the working temperature modulation mode described in Fig. 2. The resistance of the sensor's sensitive layer was measured by a calibrated voltage divider circuit at a 10 Hz frequency, so each data sample consisted of 600 data points containing values of time, actual sensor temperature and resistance.
Metal Oxide Gas Sensors Response Processing by Statistical Shape Analysis
231

3 Data Processing

Each data sample of the raw sensor response data set was converted to the logarithmic scale, after which the resistance curve was shifted to zero by its minimal value. This form of data, referred to herein as the “raw” sensor response, was fed to the ANN algorithm for training, validation and testing of the obtained model. A dense multilayer perceptron artificial neural network (MLP ANN) with two hidden layers (160 neurons in each) with hyperbolic tangent activation, based on the “keras” Python library, was used as the statistical model. Standard scaler normalization (for each feature over the data set: mean = 0, standard deviation = 1) was applied to all data samples. A high level of dropout (0.8 for hidden layers and 0.5 for the output layer) and batch normalization were used to prevent MLP ANN overfitting. Additional signal preprocessing consisted of normalization of each data sample in order to remove baseline and amplitude drift. The raw sensor response was also pre-processed by the statistical shape analysis protocol. According to this procedure, Kendall's shape coordinates are used to remove the translation and size aspects of the signal shape. The original two-dimensional shape of the gas sensor response (resistance–temperature) is multiplied by the Helmert submatrix H to remove the location, and divided by its norm to eliminate the scaling factor. Thus, the original data sample is transformed into the pre-shape space, where the response retains its rotation as well as the shape itself.
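This pre-shape construction can be sketched in a few lines (a hedged numpy illustration; the function names are ours, not from the paper's code, and the landmark matrix stands for the (temperature, resistance) curve of one measurement cycle):

```python
import numpy as np

def helmert_submatrix(k):
    # (k-1) x k Helmert submatrix: orthonormal rows, each orthogonal to the 1-vector
    H = np.zeros((k - 1, k))
    for j in range(1, k):
        d = np.sqrt(j * (j + 1))
        H[j - 1, :j] = -1.0 / d
        H[j - 1, j] = j / d
    return H

def to_preshape(X):
    # X: k x m matrix of k landmarks (here m = 2: temperature and resistance)
    H = helmert_submatrix(X.shape[0])
    Z = H @ X                        # removes location (translation)
    return Z / np.linalg.norm(Z)     # removes size (scaling)
```

Because every row of H sums to zero, adding a constant to all landmarks leaves the pre-shape unchanged, and the final division fixes the size to one.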
Fig. 1. A laboratory setup for metal oxide gas sensor operation in the simulated conditions of industrial safety purpose detection of hydrogen and propane.
Fig. 2. Curves of sensor resistance (logarithmic scale) and temperature change during a single cycle of measurements in the flow of outdoor air (left), outdoor air with hydrogen admixture (center), and propane admixture (right).
For further pre-processing we used a linearized version of the shape space – the Procrustes tangent space. In this case only the rotation angle between the given sample and the complex pole has to be estimated to eliminate the common rotation of the data sample, which may be caused, for example, by drift of the sensor response. The pole of the shape space sphere is usually chosen to correspond to an average shape (the full Procrustes average shape). In the case of the given gas classification problem, the full
Procrustes average shape was computed on the basis of the training set data samples corresponding to intact urban air. This approach guarantees that only local differences in the waveform of the sensor response for different gases have a significant impact on the identification process, while all global factors, such as baseline and response amplitude, are eliminated. Both types of pre-processed sensor response – pre-shape and Procrustes tangent space – were used as input for the ANN classifier algorithm for model training and testing.
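For a two-dimensional shape represented as a complex vector (as suggested by the complex pole above), the rotation alignment and tangent projection can be sketched as follows (a hedged numpy illustration; `tangent_coords` is an assumed name):

```python
import numpy as np

def tangent_coords(z, mu):
    # z, mu: unit-norm complex pre-shape vectors; mu is the pole (average shape)
    s = np.vdot(mu, z)                      # Hermitian inner product <mu, z>
    z_rot = z * np.exp(-1j * np.angle(s))   # optimal (full Procrustes) rotation onto mu
    # project onto the tangent space at mu: remove the component along the pole
    return z_rot - mu * np.vdot(mu, z_rot)
```

Samples differing from the pole only by translation, scale and rotation map to (near-)zero tangent coordinates, so only genuine shape differences reach the classifier.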
4 Results and Discussion

The accuracy of hydrogen and propane identification by the ANN algorithm coupled with different signal pre-processing methods is reflected in Table 1.

Table 1. Error of gas identification by the ML algorithm with different signal pre-processing.

                          Error of gas identification, %*
Gas      Conc., ppm   Raw    Norm   Pre-shape   Tan
H2       30           11     14     5           1
         50           2      1      0           0
         200          0      0      0           0
         350          0      0      0           0
         450          0      0      0           0
         550          0      0      0           0
C3H8     30           87     96     86          88
         50           100    100    100         100
         200          100    100    63          63
         350          100    99     17          3
         450          65     59     4           1
         550          94     88     6           0
Total error           30     32     22          21

* Raw – raw sensor response; Norm – normalized sensor response; Pre-shape – sensor response transformed into pre-shape space with removed translation and scaling; Tan – pre-shape sensor response transformed to Procrustes tangent space.
Hydrogen, being a much more reactive gas, is detected with decent accuracy by the ANN algorithm with all applied pre-processing methods, while the chemically relatively inert propane is often confused with intact urban air (Fig. 3). Application of shape space pre-processing of the sensor response significantly improves the discrimination ability of the algorithm, at least for propane concentrations above 200 ppm. The observed benefits should be attributed to the removal of the signal amplitude, which is similar for urban air with and without propane admixture, while the shape features remain unaffected.
Fig. 3. Confusion matrix of algorithm answers (fraction of 1) between urban air, H2 and C3H8.
5 Conclusions

Application of the statistical shape analysis signal pre-processing technique to the response of a thermally modulated metal oxide gas sensor improves the selectivity of gas detection by a machine learning ANN classification algorithm. New sensor working temperature modulation modes should be designed with the aim of increasing the content of characteristic shape features in the measured resistance response.

Acknowledgements. The reported study was funded by RFBR according to the research project № 18-33-20220.
References

1. Li, C.F., et al.: Defective TiO2 for propane dehydrogenation. Ind. Eng. Chem. Res. 59(10), 4377–4387 (2020)
2. Mujica, L.E., et al.: Leak detection and localization on hydrocarbon transportation lines by combining real-time transient model and multivariate statistical analysis. Struct. Hlth. Monit. 1–2, 2350–2357 (2015)
3. Thorson, A., et al.: Using a low-cost sensor array and machine learning techniques to detect complex pollutant mixtures and identify likely sources. Sensors-Basel 19(17), 3723 (2019)
4. Krivetskiy, V., et al.: Selective detection of individual gases and CO/H2 mixture at low concentrations in air by single semiconductor metal oxide sensors working in dynamic temperature mode. Sensor Actuat. B-Chem. 254, 502–513 (2018)
5. Vergara, A., et al.: Demonstration of fast and accurate discrimination and quantification of chemically similar species utilizing a single cross-selective chemiresistor. Anal. Chem. 86, 6753–6757 (2014)
6. Smith, K.R., et al.: Clustering approaches to improve the performance of low cost air pollution sensors. Faraday Discuss. 200, 621–637 (2017)
7. Dryden, I., Mardia, K.: Statistical Shape Analysis, p. 376. Wiley, Hoboken (1998)
8. Krivetskiy, V., et al.: Effect of AuPd bimetal sensitization on gas sensing performance of nanocrystalline SnO2 obtained by single step flame spray pyrolysis. Nanomaterials-Basel 9(95), 728 (2019)
Feature Selection in Neural Network Solution of Inverse Problem Based on Integration of Optical Spectroscopic Methods

Igor Isaev1, Olga Sarmanova1,2, Sergey Burikov1,2, Tatiana Dolenko1,2, Kirill Laptinskiy1, and Sergey Dolenko1

1 D.V. Skobeltsyn Institute of Nuclear Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
[email protected], [email protected]
2 Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
[email protected]
Abstract. This study considers a neural network solution to the inverse problem based on the integration of optical spectroscopy methods for determining ion concentrations in aqueous solutions. The effect of integration of physical methods is studied using the selection of significant input features. The previously formulated thesis is confirmed that if the integrated methods differ much in their accuracy, then the integration of these methods is ineffective. Use of joint selection of significant input features may only slightly improve the quality of the solution in a limited number of situations. Among the tested methods of feature selection, the embedded method based on the analysis of weights of an already trained neural network (multi-layer perceptron) gives better results than filter methods with selection based on standard deviation, cross-correlation, or cross-entropy, at the expense of some additional computation. Selection of the significant input features improves the quality of the ANN solution relative to the solution obtained on the full sets of features; the worse the initial solution, the greater the improvement.

Keywords: Inverse problems · Neural networks · Integration of physical methods · Feature selection
This study has been performed at the expense of the grant of the Russian Science Foundation (project no. 19-11-00333).

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 234–241, 2021. https://doi.org/10.1007/978-3-030-60577-3_27

1 Introduction

Water plays an important role in human life, so the problem of controlling the impurities contained in it becomes one of the most important. Water is a good solvent, so many substances dissociate in water and are contained in it as ions. This applies primarily to inorganic salts. In this case, the problem of determining the chemical composition of the solution is reduced to determining the concentration of ions
dissolved in water. Among the ions of inorganic elements, ions of heavy metal salts are of great interest, since they are very dangerous for humans [1–3]. They can enter natural waters not only as a result of natural processes (erosion processes, volcanic activity), but also as a result of human activity, e.g. with emissions and runoff of industrial enterprises, as a result of mining and processing of minerals, fuel combustion, etc. [4, 5]. The increase in anthropogenic pressure on the environment makes it especially important to solve the problem of express monitoring of the ionic composition of water. To date, a significant number of methods for determining the concentration of ions in water have been developed. Widely used are chromatographic methods [6, 7], atomic absorption spectroscopy [8], mass spectroscopy [9], electrochemical methods [10] etc. These methods provide very high accuracy of determination of the concentration of ions, but require complex expensive equipment and reagents, complex preprocessing of samples, and they are time-consuming. At the same time, simplicity and efficiency of solving the problem of express monitoring of the water composition are required much more than the highest accuracy of determination of the concentrations. Methods of optical spectroscopy are free from the listed shortcomings, and today they are increasingly used for express monitoring of water environments. Among the optical methods, two of the most widely used methods can be distinguished: Raman spectroscopy [11] and optical absorption spectroscopy [12, 13]. The possibility of using Raman spectra to determine the concentration of ions is based on the presence of own Raman bands of complex ions [11] and the influence of simple ions on the position and shape of the water Raman valence band [14]. 
These methods are developing quite actively, and it should be noted that recent progress in this area is determined mainly not by the improvement of experimental techniques, but by the development of new approaches to the processing and analysis of spectra. The problem of determination of the type and concentration of ions from absorption or Raman spectra is a typical multi-parameter inverse problem. Various methods and approaches are used to solve this problem, including the method of artificial neural networks (ANN) [15]. This method is widely used for the solution of problems of optical spectroscopy [16, 17]. It can be assumed that integration of methods – the use of two or more methods in one experiment – should increase the accuracy of solving the inverse problem (in our case, determination of the type and concentration of ions in aqueous solutions). However, as it turned out, simultaneous application of Raman spectroscopy and optical absorption spectroscopy to the array of data obtained in the same experiment does not lead to an increase in the accuracy of solving the problem using the method of ANN [18]. The reason for this result, unexpected at first glance, may be the specificity of the ANN method in solving problems of spectroscopy, where one has to deal with a high dimension of the input data (the spectra contain hundreds and thousands of spectral channels). In this unfavorable situation, the number of weights of the ANN used to solve the problem is much greater than the number of patterns (experimental spectra) used to train the ANN. In the case of combining two optical methods, the situation is aggravated, since the two spectra are used simultaneously and the number of input features increases. The solution lies in the reduction of the dimensionality of the problem. To do this, a preliminary selection of the input features is made based on
236
I. Isaev et al.
their significance (since it is obvious that not all regions of the spectrum are equally informative). Significant feature selection has been successfully used to increase the accuracy of solving inverse problems of optical spectroscopy [19, 20]. The purpose of this study was to test the effect of integration of physical methods in the conditions of using a reduced and equalized number of significant input features selected from Raman and optical absorption spectra.
2 Experimental Data

In the experiment, water solutions of the salts CuSO4, NiSO4, CoSO4, Cu(NO3)2, Ni(NO3)2, and Co(NO3)2 were studied. In total, 3506 solutions were prepared; the number of ions in a solution varied from 2 to 5, and the number of salts in a solution varied from 1 to 6. Spectra of optical absorption, Raman spectra, and the value of pH were measured for each sample. The details of sample preparation, the experimental equipment, and the shape of the spectra were described elsewhere [18]. The output dimension of the problem was 6: the determined parameters were pH and the concentrations of 5 ions (Cu2+, Ni2+, Co2+, NO3−, SO42−). The original dataset, which was obtained during the experiment, contained 3506 patterns. It was divided into training, validation and test sets of 2356, 750, and 400 patterns, respectively. Each pattern was described by 2048 features corresponding to Raman spectroscopy and 811 features corresponding to absorption spectroscopy. Each feature represents the value of the spectrum intensity in the corresponding channel.
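A fixed-size random split of this kind can be sketched as follows (a hedged numpy illustration; the function name and the use of a random permutation are our assumptions, not the paper's procedure):

```python
import numpy as np

def split_dataset(X, Y, sizes=(2356, 750, 400), seed=0):
    # Shuffle the patterns, then cut into train / validation / test of the given sizes
    assert sum(sizes) == len(X)
    idx = np.random.default_rng(seed).permutation(len(X))
    parts = np.split(idx, np.cumsum(sizes)[:-1])
    return [(X[p], Y[p]) for p in parts]
```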
3 Methods

3.1 Reducing the Output Dimension
To reduce the output dimension of the problem, autonomous determination of parameters [21–23] was used, where the initial task with N outputs was divided into N single-output tasks, with the construction of a separate single-output ANN for each task.

3.2 Use of Neural Networks
To solve the inverse problem, in this study we chose a multilayer perceptron (MLP) type of ANN, which is a universal approximator. Each MLP used had 1 output and 32 neurons in the single hidden layer, a logistic activation function in the hidden layer, and a linear one in the output layer. This architecture showed a high enough quality of solution on the full sets of features (see Fig. 1 below). To reduce the influence of weight initialization, 5 identical MLPs were trained for each case considered; the statistical indicators of the solution quality for these 5 MLPs were averaged.
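A minimal numpy sketch of one such single-output MLP, folding in the training settings given later in this section (per-layer learning rates and momentum); this is our illustration, not the authors' code, and early stopping on a validation set is omitted for brevity:

```python
import numpy as np

def train_mlp(X, y, hidden=32, lr_hidden=0.01, lr_out=0.001,
              momentum=0.5, epochs=200, seed=0):
    # One logistic hidden layer, linear output; per-sample SGD with momentum
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, 1)); b2 = np.zeros(1)
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = X[i:i + 1]
            h = sigm(x @ W1 + b1)                 # (1, hidden)
            err = (h @ W2 + b2) - y[i]            # d(squared error)/d(output)
            gW2 = h.T @ err; gb2 = err.ravel()
            dh = (err @ W2.T) * h * (1.0 - h)     # backprop through the sigmoid
            gW1 = x.T @ dh; gb1 = dh.ravel()
            vW2 = momentum * vW2 - lr_out * gW2;    W2 += vW2
            vb2 = momentum * vb2 - lr_out * gb2;    b2 += vb2
            vW1 = momentum * vW1 - lr_hidden * gW1; W1 += vW1
            vb1 = momentum * vb1 - lr_hidden * gb1; b1 += vb1
    return lambda X: sigm(X @ W1 + b1) @ W2 + b2

```

One such network is trained per determined parameter, which implements the autonomous scheme of Sect. 3.1.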
Training was performed by stochastic gradient descent with a learning rate of 0.01 in the hidden layer and 0.001 in the output layer, and a momentum of 0.5. To prevent overtraining, training was terminated by early stopping after 500 epochs without improvement of the quality of the solution on the validation data set.

3.3 Feature Selection Methods
The following groups are usually distinguished among the methods of selecting significant input features based on supervised training [24]: filter methods, embedded methods, and wrappers. Filter methods are the most computationally efficient, but the feature sets they select may not be optimal for the target algorithm (MLP ANN in this study). Among the embedded methods for MLP, two main groups can be noted. The first one includes methods where the reduction of the input dimension is built into the training procedure, in the form of weight decay [26, 27] or regularization [26–29]. The other type of embedded methods is based on the analysis of the weights of an already trained neural network [30, 31]. Wrapper methods show better results than filter methods, as the target problem-solving algorithm is included in the selection of features. However, they have significantly higher computational costs, especially for a large number of input features. In this study, the following methods were used: filter methods with selection based on standard deviation (SD), cross-correlation (CC), and cross-entropy (CE), and the embedded method based on the weight analysis (WA) of an already trained MLP. This choice is due to the requirement for ease of interpretation of the results. Wrapper methods were not used due to the high initial dimensionality of the problem. The decision threshold for each method was adjusted in such a way that the exact number of selected features was one of the series: 5, 10, 20, 30, 40, 50, 100, 150, 200, 300, 400, 800. The selected input features were used to train the MLP. The dependences of the quality of the solution of the studied inverse problem on the test set upon the number of selected input features were analyzed for various combinations of the features produced by the two types of spectroscopy. The results of the solution were compared for the following sets of input features:

• Sets containing features corresponding only to Raman spectroscopy. The full set consisted of 2048 features.
• Sets containing features corresponding only to absorption spectroscopy. The full set consisted of 811 features.
• Sets obtained when the selection procedure was applied to the full aggregate set of features corresponding to both types of spectroscopy. This full set consisted of 2859 features.
• Symmetric union — features selected separately from the Raman full set and from the absorption full set were merged in equal parts. For example, to obtain a set of 20 features, subsets of 10 features each were taken from the two types of spectra.
• Asymmetric union — features selected separately from the Raman full set and from the absorption full set were combined in a ratio close to the ratio of the number of features in the initial full sets (about 2.5:1). For example, to obtain a set of 20 features, 14 features corresponding to Raman spectroscopy and 6 features corresponding to absorption spectroscopy were taken.
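Two of the tested criteria can be sketched as follows (a hedged numpy illustration with assumed function names; `W1` and `W2` denote the input-to-hidden and hidden-to-output weight matrices of a trained single-output MLP):

```python
import numpy as np

def select_by_correlation(X, y, n_keep):
    # Filter method (CC): rank input features by |Pearson correlation| with the target
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(corr))[:n_keep]

def select_by_weights(W1, W2, n_keep):
    # Embedded method (WA): score input i by the total |w1_ij * w2_j| over hidden units j
    scores = (np.abs(W1) * np.abs(W2).ravel()).sum(axis=1)
    return np.argsort(-scores)[:n_keep]
```

Both functions return feature indices sorted by relevance and truncated to the desired set size, so the decision threshold reduces to choosing `n_keep`.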
4 Results

The dependences of the quality of the solution on the test set upon the number of features selected by various methods are shown in Fig. 1. Due to lack of space, only the results for WA and CC are presented, and only for 3 determined parameters. The results for CE are close to those for CC, and the results for SD are significantly worse.
Fig. 1. Dependence of the quality indicator of the solution (coefficient of multiple determination) on the test set upon the number of selected features for the filter selection method by cross-correlation (CC) and for the embedded selection method by analysis of MLP weights (WA).
Analysis of the results of the study allows one to draw the following conclusions:

• For low and moderate rates of reduction of the dimensionality of the input feature space, all the methods of feature selection demonstrate satisfactory results for all determined parameters.
• The best quality of solution is nearly always observed using only data of absorption spectroscopy, the worst one – using only data of Raman spectroscopy. The reason is that in an absorption spectrum each ion has its own wide band, only partially overlapping with the bands of other ions. On the contrary, in Raman spectra only complex ions have their own narrow bands, and simple ions influence only the shape of the bands of water – so there are no features whose changes would be determined by the change in concentration of a single simple ion.
• However, there are several exceptions. pH influences the shape of several bands of the spectra simultaneously – so, when the number of selected features is small, various kinds of integration of methods outperform determination of pH by absorption spectroscopy data only. Also, well-performed feature selection yields solutions with quality close to that on the data of absorption spectroscopy only. This may be partly due to the fact that joint selection may select mostly the features corresponding to absorption spectroscopy.
• Selection of the significant input features allows improving the quality of the ANN solution relative to the solution obtained on the full sets of features. The worse the initial solution, the greater the improvement. Therefore, the effect is most pronounced when using only Raman spectroscopy data and least pronounced when using only absorption spectroscopy data.
• Among the methods of integration, joint selection is the most flexible, and so it performs best.
• Selection by standard deviation (not shown in Fig. 1) takes into account the general amount of information brought by a feature (spectral channel), but a large part of this information may be useless for solving the specific studied problem. Therefore, the quality of the solution degrades fast with a decrease in the number of selected features (wrong noisy features are selected, and the features most relevant to the studied problem are not). The only exception is determination of the concentrations of complex anions by Raman spectra, where several channels in the vicinity of the narrow own bands of these ions are enough for a good solution of the problem.
• Selection by cross-correlation and cross-entropy captures well only the dependences having a strong linear component. Therefore, it degrades relatively quickly with a decrease in the number of selected features (it selects wrong features, like selection by SD), especially if there are no own bands influenced only by the determined parameter (Raman spectra of simple cations, the absorption spectrum of the sulfate ion, and also pH determination). In all other cases, these methods are effective enough.
• Selection by the method of neural network weight analysis is more computationally expensive than the other tested methods, but it outperforms them nearly always, except in the case of extremely strong compression (down to 5–10 features) for separate types of integration.
5 Conclusion

In this study, it has been confirmed that if the integrated physical methods differ much in their accuracy, then the integration of these methods is ineffective. Use of joint selection of significant input features may only slightly improve the quality of the solution in a limited number of situations. Among the tested methods of feature selection, the embedded method based on the analysis of the weights of an already trained neural network gives better results than the other methods, at the expense of some additional computation.
References

1. Singh, R., Gautam, N., Mishra, A., Gupta, R.: Heavy metals and living systems. Indian J. Pharmacol. 43(3), 246–254 (2011)
2. Koedrith, P., Kim, H.L., Weon, J.I., Seo, Y.R.: Toxicogenomic approaches for understanding molecular mechanisms of heavy metal mutagenicity and carcinogenicity. Int. J. Hyg. Environ. Health 216, 587–598 (2013)
3. Morais, S., de Costa, F.G., de Lourdes Pereira, M.: Heavy metals and human health. Environ. Health Emerg. Issues Pract. 10, 227–246 (2012)
4. Bradl, H.: Heavy Metals in the Environment: Origin, Interaction and Remediation, vol. 6. Elsevier (2005)
5. He, Z.L., Yang, X.E., Stoffella, P.J.: Trace elements in agroecosystems and impacts on the environment. J. Trace Elements Med. Biol. 19(2–3), 125–140 (2005)
6. Fa, Y., Yu, Y., Li, F., et al.: Simultaneous detection of anions and cations in mineral water by two-dimensional ion chromatography. J. Chromatogr. A 1554, 123–127 (2018)
7. Sarzanini, C., Bruzzoniti, M.C.: Metal species determination by ion chromatography. Trends Anal. Chem. 20(6–7), 304–310 (2001)
8. Maurya, V.K., Singh, R.P., Prasad, L.B.: Comparative evaluation of trace heavy metal ions in water sample using complexes of dithioligands by flame atomic absorption spectrometry. Orient. J. Chem. 34(1), 100–109 (2018)
9. Kogan, V.T., Pavlov, A.K., Chichagov, Y., et al.: Mobile mass spectrometer for determination of heavy metals in sea water: numerical simulation and experimental verification. Tech. Phys. 52(12), 1604–1610 (2007)
10. Farghaly, O.A., Hameed, R.A., Abu-Nawwas, A.A.H.: Analytical application using modern electrochemical techniques. Int. J. Electrochem. Sci. 9(1), 3287–3318 (2014)
11. Kauffmann, T.H., Fontana, M.D.: Inorganic salts diluted in water probed by Raman spectrometry: data processing and performance evaluation. Sens. Actuators B 209, 154–161 (2015)
12. Crompton, T.R.: Determination of Anions in Natural and Treated Waters, 828 p. Taylor & Francis (2002)
13. Kulkarni, S., Dhokpande, S., Kaware, J.: A review on spectrophotometric determination of heavy metals with emphasis on cadmium and nickel determination by UV spectrophotometry. Int. J. Adv. Eng. Res. Sci. (IJAERS) 2(9), 1836–1839 (2015)
14. Burikov, S.A., Dolenko, T.A., Velikotnyi, P.A., Sugonyaev, A.V., Fadeev, V.V.: The effect of hydration of ions of inorganic salts on the shape of the Raman stretching band of water. Opt. Spectrosc. 98(2), 235–239 (2005)
15. Hassoun, M.H.: Fundamentals of Artificial Neural Networks. MIT Press, Cambridge (1995)
16. Efitorov, A.O., Dolenko, S.A., Dolenko, T.A., et al.: Use of adaptive methods to solve the inverse problem of determination of ionic composition of multi-component solutions. Opt. Memory Neural Networks (Inf. Opt.) 27(2), 89–99 (2018)
17. Sarmanova, O.E., Laptinskiy, K.A., Burikov, S.A., et al.: Determination of heavy metal ions concentration in aqueous solutions using adaptive data analysis methods. In: Proceedings of the SPIE, vol. 11354, art. 113540L (2020)
18. Isaev, I., Trifonov, N., Sarmanova, O., et al.: Joint application of Raman and optical absorption spectroscopy to determine concentrations of heavy metal ions in water using artificial neural networks. In: Proceedings of the SPIE, vol. 11458, art. 114580R (2020)
19. Gushchin, K.A., Burikov, S.A., Dolenko, T.A., et al.: Data dimensionality reduction and evaluation of clusterization quality in the problems of analysis of composition of multicomponent solutions. Opt. Mem. Neural Networks (Inf. Opt.) 24(3), 218–224 (2015)
20. Efitorov, A., Burikov, S., Dolenko, T., et al.: Significant feature selection in neural network solution of an inverse problem in spectroscopy. Procedia Comput. Sci. 66, 93–102 (2015)
21. Dolenko, S., Isaev, I., Obornev, E., et al.: Study of influence of parameter grouping on the error of neural network solution of the inverse problem of electrical prospecting. In: Iliadis, L., Papadopoulos, H., Jayne, C. (eds.) EANN 2013. CCIS, vol. 383. Springer, Heidelberg (2013)
22. Isaev, I., Obornev, E., Obornev, I., et al.: Increase of the resistance to noise in data for neural network solution of the inverse problem of magnetotellurics with group determination of parameters. In: Villa, A., Masulli, P., Pons Rivero, A. (eds.) ICANN 2016. LNCS, vol. 9886, pp. 502–509. Springer, Cham (2016)
23. Isaev, I., Burikov, S., Dolenko, T., Laptinskiy, K., Vervald, A., Dolenko, S.: Joint application of group determination of parameters and of training with noise addition to improve the resilience of the neural network solution of the inverse problem in spectroscopy to noise in data. In: Kurkova, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds.) ICANN 2018. LNCS, vol. 11139, pp. 435–444. Springer, Cham (2018)
24. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
25. Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H.: Feature selection: a data perspective. ACM Comput. Surv. 50(6), 1–45 (2017)
26. Cibas, T., Soulié, F.F., Gallinari, P., Raudys, S.: Variable selection with neural networks. Neurocomputing 12(2–3), 223–248 (1996)
27. Verikas, A., Bacauskiene, M.: Feature selection with neural networks. Pattern Recogn. Lett. 23(11), 1323–1335 (2002)
28. Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 67(2), 301–320 (2005)
29. Scardapane, S., Comminiello, D., Hussain, A., Uncini, A.: Group sparse regularization for deep neural networks. Neurocomputing 241, 81–89 (2017)
30. Gevrey, M., Dimopoulos, I., Lek, S.: Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecol. Modell. 160(3), 249–264 (2003)
31. Pérez-Uribe, A.: Relevance metrics to reduce input dimensions in artificial neural networks. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D. (eds.) ICANN 2007. LNCS, vol. 4668, pp. 39–48. Springer, Heidelberg (2007)
The Text Fragment Extraction Module of the Hybrid Intelligent Information System for Analysis of Judicial Practice of Arbitration Courts

Maria O. Taran, Georgiy I. Revunkov, and Yuriy E. Gapanyuk

Bauman Moscow State Technical University, Moscow, Russia
[email protected]
Abstract. The architecture of a hybrid intelligent information system for the analysis of the judicial practice of arbitration courts is discussed. The structure of the subsystems of consciousness and subconsciousness in the architecture of the proposed system is considered in detail. The text fragments extraction module plays a crucial role in the subconsciousness subsystem of the proposed system. The principles of operation of the text fragment extraction module are examined in detail. The architecture of a deep neural network, which is the basis of the module, is proposed. The aspects of the training of the proposed deep neural network are considered. Variants of text vectorization based on the tf-idf and fasttext approaches are investigated; vectorized texts are input data for the proposed neural network. Experiments were conducted to determine the quality metrics for the proposed vectorization options. The experimental results show that the vectorization option based on tf-idf is superior to the combined vectorization option based on tf-idf and fasttext. The developed text fragments extraction module makes it possible to implement the proposed system successfully.

Keywords: Arbitration court · Hybrid Intelligent Information System (HIIS) · The consciousness of information system · The subconsciousness of information system · Text mining · Deep learning
1 Introduction

Although the Russian judicial system is not a case-based one, court decisions play a crucial role in preparing for a trial. Due to legislative conflicts or lack of regulatory norms, most court decisions are based on the positions of the Supreme Court of the Russian Federation. Representatives of both plaintiffs and defendants necessarily begin preparing for a case by studying judicial practice, searching for similar cases, determining similar circumstances, and justifying their positions. A judicial act may take from 4 to 15 pages. At the same time, such decisions need to be studied thoroughly enough to get a better idea of the possible outcome of the trial.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 242–248, 2021. https://doi.org/10.1007/978-3-030-60577-3_28
Searching for judicial acts, reading them, analyzing, and comparing them with a specific situation is a very time-consuming process. Often, court documents contain a large amount of secondary information that does not help, and sometimes even hinders lawyers. Therefore, automating the processing of legal texts is an urgent task [1]. There is no doubt that such automation involves the use of intelligent methods. Currently, it is possible to note a clear trend towards the joint use of different intelligent methods to solve various classes of AI problems. It has led to the emergence of such scientific area as “hybrid intelligent systems” (HIS). As fundamental research in the field of HIS, it is possible to consider the works of Professor Kolesnikov and his colleagues [2] and [3]. It should be noted that in the English-language literature the term “hybrid intelligence” is mainly used to denote the hybridization of various methods of soft computing and expert systems [4] and [5]: neuro-fuzzy systems, fuzzy expert systems, the use of evolutionary approaches for constructing neural networks and fuzzy models and other methods. Nowadays, as a rule, intelligent systems are not developed separately; instead, they are embedded as modules in a traditional information system to solve tasks related to the intelligent processing of data and knowledge. According to [6], this combined system is referred to as a hybrid intelligent information system (HIIS). The main components of a HIIS system include: • The subsystem of subconsciousness is related to the environment in which a HIIS operates. Because the environment can be represented as an unstructured data or a set of continuous signals, the data processing techniques of the MS are mostly based on neural networks, fuzzy logic, and combined neuro-fuzzy methods. • The subsystem of consciousness is based on conventional data and knowledge processing, which may be based on traditional programming or workflow technology. 
As to the data model, the subsystem of consciousness uses ontology-based models. These can be classical ontologies developed within the Semantic Web technology (the RDF, RDFa, OWL, and OWL2 standards), or nonstandard ontology models such as [7], including those based on complex networks. The classical object-oriented approach, which in practice is used in most information systems, may also be included.
• The boundary model of consciousness and subconsciousness is intended for deep integration of the modules of consciousness and subconsciousness and represents an interface between these modules with the function of data storage.

In this article, we apply the HIIS approach to the architecture of a system for analyzing the judicial practice of arbitration courts.
2 The Hybrid Intelligent Information System for Analysis of Judicial Practice of Arbitration Courts In this section, we will briefly review the architecture of the proposed system, which is based on HIIS principles (Fig. 1). In the proposed approach, textual (legal) documents act as the HIIS environment.
M. O. Taran et al.
The subsystem of subconsciousness includes the concept extraction module (designed to extract concepts from legal documents) and the text fragment extraction module (which is discussed in detail in the following sections). According to [8], the relevant text fragment extraction is a crucial step in legal text processing. The extracted concepts and text fragments are saved into the repository (which acts as a boundary model of consciousness and subconsciousness). The contents of the repository can be considered as a kind of ontology that contains input data for the subsystem of consciousness. The subsystem of consciousness includes modules for automatic referencing, visualization, statistics, and search for associative rules. The subsystem is not discussed in detail in this paper. In the following sections, we will discuss in detail the text fragment extraction module.
Fig. 1. The architecture of the proposed system
3 The Text Fragment Extraction Module

The text of a court decision is sent to the module's input. At the output, we get the same text, but with the fragments that are most valuable for further analysis highlighted. The size of the fragments can be arbitrary; by default, separate paragraphs are used as text fragments. In accordance with Art. 170 of the Arbitration Procedure Code of the Russian Federation, a court decision consists of introductory, descriptive, motivational, and resolutive parts. In practice, it is the motivational part that is of the most interest. Therefore, before processing the text of a judicial act, the above-described semantic parts are identified.
The module operation example in the form of a colorized text fragment is shown in Fig. 2.
Fig. 2. The text fragment extraction module operation example
Text processing is performed in several steps:

1. In the first step, the text is cleaned; in particular, extra spaces and empty paragraphs are deleted.
2. In the second step, the motivational part is divided into fragments (paragraphs). In this case, all tables are deleted, and lists are merged into a single paragraph.
3. In the third step, additional features are extracted from the text fragments, such as dates, amounts, legal norms, and so on.
4. In the fourth step, the selected text fragments are vectorized. The tf-idf and fasttext approaches were used for vectorization. Vectorization options are discussed in the experiments section.
5. In the fifth step, the classification of vectorized text fragments is performed. From the point of view of machine learning, the main task of the considered module can be posed as a multi-class classification problem [9, 10]. Determining the exact number of classes is the subject of further research. Currently, there are three classes that match the colors used to highlight text fragments: blue, orange, and green. The blue fragments are links to regulatory legal acts, the orange ones are interpretations of the norms, and the green ones are the rationale for the decision.
6. In the sixth step, the class labels obtained from the classification are mapped to the source text. As a result of the mapping, tags corresponding to class labels are assigned to fragments of the source text. Visually, tags can be encoded by colorizing text fragments.

To implement the fifth step, a deep neural network is proposed, the architecture of which is shown in Fig. 3. Dense and dropout layers are used in the architecture of the neural network in order to improve the quality of classification.
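As an illustration of the vectorization step, a minimal tf-idf computation can be sketched as follows. This is a toy sketch, not the authors' implementation (in practice a library such as scikit-learn would be used, and the fragments here are invented plain strings):

```python
import math
from collections import Counter

def tf_idf(fragments):
    """Compute tf-idf vectors for a list of text fragments (toy sketch).

    tf  = term frequency within a fragment,
    idf = log(N / df), where df is the number of fragments containing the term.
    """
    tokenized = [fragment.lower().split() for fragment in fragments]
    n_docs = len(tokenized)
    # Document frequency of each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = [
            (counts[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in vocab
        ]
        vectors.append(vec)
    return vocab, vectors

vocab, vectors = tf_idf([
    "the court applies article 170",
    "the claimant filed a motion",
])
```

Terms occurring in every fragment (such as "the" above) get zero weight, while fragment-specific terms are emphasized, which is why tf-idf is a reasonable baseline for separating citation-like fragments from argumentative ones.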
Fig. 3. The architecture of the neural network
4 The Experiments

Optimal parameters of the neural network were found experimentally. Accuracy was used as the metric. The best model achieved an accuracy of 0.94 on the test data after 30 epochs of training (Fig. 4).
Fig. 4. Training and validation curves for the neural network
Since there is no open dataset in Russian for the task to be solved, the dataset was created by the authors.
In accordance with [11], the vectorization method used can significantly affect the quality of classification. Thus, two options for text vectorization were experimentally investigated. The results of vectorization based on tf-idf and the results of combined vectorization based on tf-idf and fasttext are presented in Table 1. Precision, recall, and F1-score were used as quality metrics. The experimental results show that the vectorization option based on tf-idf is superior to the combined vectorization option based on tf-idf and fasttext.

Table 1. Results of experiments for text vectorization options

Class  | 1. Vectorization based on tf-idf     | 2. Combined vectorization (tf-idf + fasttext)
       | Precision | Recall | F1-score        | Precision | Recall | F1-score
Blue   | 0.717     | 0.849  | 0.777           | 0.603     | 0.904  | 0.723
Orange | 0.943     | 0.940  | 0.942           | 0.929     | 0.903  | 0.916
Green  | 0.951     | 0.911  | 0.931           | 0.954     | 0.858  | 0.903
In the future, to improve the quality of classification, it is planned to increase the size of the training dataset.
5 Conclusions The system for the analysis of the judicial practice of arbitration courts may be implemented based on the HIIS approach. The relevant text fragment extraction is a crucial step in legal text processing. Thus, the text fragment extraction module is implemented in the subsystem of subconsciousness. From the point of view of machine learning, the main task of the considered module can be posed as a multi-class classification problem. The deep neural network architecture for classification task solving is proposed. The experimental results show that the vectorization option based on tf-idf is superior to the combined vectorization option based on tf-idf and fasttext. The developed text fragments extraction module makes it possible to implement the proposed system successfully.
References

1. Naykhanova, L.V., Naykhanova, I.V.: Recognition of situations described in the text of legal documents. In: 2019 International Multi-Conference on Industrial Engineering and Modern Technologies (FarEastCon), Vladivostok, Russia, pp. 1–4 (2019)
2. Kirikov, I.A., Kolesnikov, A.V., Listopad, S.V., Rumovskaya, S.B.: Melkozernistie gibridnie intellektualnie sistemy. Chast 1: Lingvisticheskiy podhod [Fine-Grained Hybrid Intelligent Systems. Part 1: Linguistic Approach]. Informatika i ee Primeneniya 9(4), 98–105 (2015)
3. Kirikov, I.A., Kolesnikov, A.V., Listopad, S.V., Rumovskaya, S.B.: Melkozernistie gibridnie intellektualnie sistemy. Chast 2: Dvunapravlennaya gibridizatsia [Fine-Grained Hybrid Intelligent Systems. Part 2: Bidirectional Hybridization]. Informatika i ee Primeneniya 10(1), 96–105 (2016)
4. Zadeh, L.A., Abbasov, A.M., Yager, R.R., Shahbazova, S.N., Reformat, M.Z.: Recent Developments and New Directions in Soft Computing. Springer, Cham (2014)
5. Melin, P., Castillo, O., Kacprzyk, J.: Nature-Inspired Design of Hybrid Intelligent Systems. Springer, Cham (2017)
6. Chernenkiy, V., Gapanyuk, Y., Terekhov, V., Revunkov, G., Kaganov, Y.: The hybrid intelligent information system approach as the basis for cognitive architecture. Procedia Comput. Sci. 145, 143–152 (2018)
7. Shpak, M., Smirnova, E., Karpenko, A., Proletarsky, A.: Mathematical models of learning materials estimation based on subject ontology. In: Abraham, A., Kovalev, S., Tarassov, V., Snášel, V. (eds.) Proceedings of the First International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'16). Advances in Intelligent Systems and Computing, vol. 450, pp. 271–276. Springer, Cham (2016)
8. Medvedeva, M., Vols, M., Wieling, M.: Using machine learning to predict decisions of the European Court of Human Rights. Artificial Intelligence and Law (2019). https://doi.org/10.1007/s10506-019-09255-y
9. Soh, J., Lim, H.K., Chai, I.E.: Legal area classification: a comparative study of text classifiers on Singapore Supreme Court judgments. In: Proceedings of the Natural Legal Language Processing Workshop 2019, pp. 67–77 (2019). https://www.aclweb.org/anthology/W19-2208
10. Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., Androutsopoulos, I.: Extreme multi-label legal text classification: a case study in EU legislation. ArXiv, abs/1905.10892 (2019)
11. Sugathadasa, K., Ayesha, B., de Silva, N., Perera, A.S., Jayawardana, V., Lakmal, D., Perera, M.: Legal document retrieval using document vector embeddings and deep learning. In: Arai, K., Kapoor, S., Bhatia, R. (eds.) Intelligent Computing. SAI 2018. Advances in Intelligent Systems and Computing, vol. 857, pp. 160–175. Springer, Cham (2019)
Construction of a Neural Network Semi-empirical Model of Deflection of a Sample from a Composite Material

Dmitry Tarkhov, Valeriy Tereshin, Galina Malykhina, Anastasia Gomzina, Ilya Markov, and Pavel Malykh
Peter the Great St. Petersburg Polytechnic University (SPbPU), St. Petersburg 195251, Russia [email protected]
Abstract. In the course of this work, experiments were carried out to simulate the deflection of a composite tape between two supports under the action of gravity, without load and under load. The position of the object during deformation was approximated with account of its individual properties. For this problem, we propose an approach that makes it possible to take into account the individual features of a real object, using real measurements, without complicating the differential structure of the model. The differential model itself describes a narrow class of objects that are similar in some properties and can give a poor forecast for a specific real object. Accounting for measurements makes it possible to capture the individual characteristics of the simulated object without complicating the differential structure. In the proposed neural network approach, the loss function is composed so as to take into account the differential structure, the boundary conditions, and the measurement data. The loss function is represented as a sum, each term of which is a weighted functional. One functional is responsible for satisfying the differential structure of the model, another for the boundary conditions, and a third for the quality of the description of real measurements. The role of the weights is to bring all terms to a comparable scale. The application of global minimum search methods to such a loss function makes it possible to obtain reliable results even when an imperfect differential model is used.

Keywords: Neural networks · Modeling · Refinement of the differential model
1 Introduction

Modeling real objects is an urgent task in various fields of science and technology. The main method of solving it is various kinds of differential models. They are capable of predicting the behavior of a real object with high accuracy if the original differential model is accurate enough. The transition from a differential model to its exact solution allows us to describe the simulated object in the entire domain of definition of its properties. The application of this approach is associated with overcoming several major difficulties:

• selection of the structure of the differential equation;
• selection of coefficients;
• setting boundary conditions;
• finding the exact solution.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 249–255, 2021. https://doi.org/10.1007/978-3-030-60577-3_29
The selection of the structure of the differential equation is a complex process that requires both theoretical and experimental study, and it often does not improve the prognostic qualities of the model in some ranges of the simulated object's parameters. The selection of the coefficients of a differential equation with a known structure often allows us to get good results and is less time-consuming, but it gives a narrower range of possibilities for tuning the model. To overcome the difficulties associated with the selection of the structure of a differential model, the model is often built so that it describes well not a specific real object but some ideal object that is close to it in terms of its physical properties. The coefficients are often made constant, which can greatly simplify the search for a solution. Finding an exact analytical solution to the differential equation is most often not possible; to solve the simulation problem, various numerical methods are used [1].

We propose a different approach, which allows the differential model to take into account the features of a particular investigated object. Correction of the parameters is carried out using real measurements simultaneously with the construction of the solution [2–6]. Our approach is as follows. A feedforward neural network is constructed. The loss function minimized during training is, in the general case, the sum of three functionals with numerical factors. These factors are necessary in order to align the order of the terms in the loss function. The first functional in the loss function is composed in such a way that the neural network under construction satisfies the differential equation of the model. The coefficients of the model are free parameters and are also selected in the learning process. The second functional provides a qualitative description of the actual measurements of the behavior of the object.
The third functional ensures the satisfaction of boundary conditions.
2 Description of the Problem

The article considers the problem of modeling the deflection of a single-span beam with various options for fixing at the ends. Two methods of fastening were considered: rigid termination (wide fastening) and an articulated fixed support (narrow fastening). For each type of fastening of the beam at the ends, free sagging was considered without load and with a weight of 0.1 kg fixed at the geometric center of the beam. A composite-section panel made of PVC with a length of 800 mm was used as the beam. At the fastening points, the beam was fixed in such a way that the wider edge of the section was horizontal. The beam fixing points were set horizontally at the same level; the level line was set using a bubble level with an error of no more than 3 mm per 1000 mm of length.
To simplify the mathematical description, the following assumptions were used:

• the beam was considered infinitely thin;
• the mass m is evenly distributed over the entire length with a constant density;
• the load created by the weight suspended at the geometric center of the beam is considered as a concentrated force of magnitude $G$ acting in the direction of the acceleration of gravity;
• the deflection is flat, absolutely elastic, and symmetrical with respect to the geometric center of the beam.

These assumptions significantly simplify the differential model; however, the direct solution of this simplified problem is not confirmed by real data, which indicates its roughness. In contrast to the pure differential model, the application of our approach gives results that describe the real deflection well.

We use the following notation: $A$ is the projection of the support reaction force onto the $x$ axis; $B$ is the projection of the support reaction force onto the $y$ axis; $D$ is a differential operator defining the model equation $D\theta = f$; $l$ is the length of the undeformed beam, and $q = mg/l$; the horizontal $x$ axis passes through the centers of fastening of the loaded beam; the vertical $y$ axis is perpendicular to the $x$ axis and directed downward, in the direction of the deflection; $X_{data} := \{x_j\}_{j=1}^{N}$ and $Y_{data} := \{y_j\}_{j=1}^{N}$ are the results of measuring the $x$ and $y$ coordinates of the deflection line for each half of the deflection line; $\theta$ is the angle of rotation of the deflection line, measured from the horizon (the $x$ axis), and $s$ is the length of the deflection line, measured from the bottom of the deformed beam; $\hat{X}_{data} := \{(x_j, y_j)\}_{j=1}^{N} = X_{data} \times Y_{data}$ are the results of measurements of the deflection line for each half of the deflection line; $\|x\| = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2}$ is the norm used for a discrete representation of functions.

The coordinates in which the differential equation modeling the beam deflection is solved are related to the Cartesian coordinates, in which the measurements were carried out, by the differential dependencies:

$$\begin{cases} \dfrac{dx}{ds} = \cos\theta \\[1ex] \dfrac{dy}{ds} = \sin\theta \end{cases} \qquad (1)$$
When the bending stiffness is taken into account in the accepted coordinates, the problem reduces to solving an ordinary second-order differential equation:

$$EJ\,\frac{d^2\theta}{ds^2} = (B - qs)\cos\theta + A\sin\theta \qquad (2)$$

For the convenience of processing the experimental data, a change of variables $\bar{s} = 1 - \frac{2}{l}s$ was introduced. With this change of variables, Eq. (2) takes the following form:

$$\frac{d^2\theta}{d\bar{s}^2} = a(1 - \bar{s})\cos\theta + b\sin\theta \qquad (3)$$
The coefficients in the boundary value problem (3) are selected in the process of training the network. Note that building a model for the entire beam at once did not lead to success; the reason was the insufficiently accurate placement of the load at the middle of the beam. Subsequently, models were built separately for the parts of the beam to the left and to the right of the load. This approach allowed us to create fairly accurate models.
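For fixed coefficients (a, b) and given initial conditions, Eq. (3) can be integrated numerically. The sketch below uses a classical fourth-order Runge–Kutta scheme; this is our own illustration of the differential model (the paper itself solves (3) only implicitly, through the neural network approximation), and the coefficient and initial values are arbitrary:

```python
import math

def rk4_theta(a, b, theta0, omega0, n_steps=200):
    """Integrate d^2(theta)/ds^2 = a*(1-s)*cos(theta) + b*sin(theta) on s in [0, 1]."""
    def rhs(s, theta, omega):
        # First-order system: theta' = omega, omega' = right-hand side of Eq. (3).
        return omega, a * (1.0 - s) * math.cos(theta) + b * math.sin(theta)

    h = 1.0 / n_steps
    s, theta, omega = 0.0, theta0, omega0
    for _ in range(n_steps):
        k1t, k1w = rhs(s, theta, omega)
        k2t, k2w = rhs(s + h / 2, theta + h / 2 * k1t, omega + h / 2 * k1w)
        k3t, k3w = rhs(s + h / 2, theta + h / 2 * k2t, omega + h / 2 * k2w)
        k4t, k4w = rhs(s + h, theta + h * k3t, omega + h * k3w)
        theta += h / 6 * (k1t + 2 * k2t + 2 * k3t + k4t)
        omega += h / 6 * (k1w + 2 * k2w + 2 * k3w + k4w)
        s += h
    return theta

theta_end = rk4_theta(a=0.5, b=0.1, theta0=0.0, omega0=0.2)
```

A convenient sanity check: with a = b = 0 the equation degenerates to θ'' = 0, so the angle grows linearly, θ(1) = θ₀ + ω₀.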
3 Suggested Solution Method

For this problem, a feedforward neural network with one hidden layer and a nonlinear activation function of the neurons was used. A neuron with a linear activation function was used as the output layer of the network. The network has a single input $\bar{s}$, corresponding to the variable with respect to which problem (3) is solved. The network output corresponds to a function $\theta_{net} = \theta_{net}(\bar{s})$ that approximates the solution of problem (3). To build the neural network solution, the following loss function was used:

$$J(\theta_{net}; P_{equ}, P_{data}) = \delta_{equ} J_{equ}(\theta_{net}; P_{equ}) + \delta_{data} J_{data}(\theta_{net}; P_{data}) \qquad (4)$$
In functional (4), the incoming variables $P_{equ}$ and $P_{data}$ (pages) are finite, countable subsets of a fixed size: their elements $\bar{s}_i$ and $(x_j, y_j)$ are randomly selected from $[0, 1] \subset \mathbb{R}_{+}$ and from $X_{data} \times Y_{data}$, respectively. The selection of elements is carried out in accordance with a uniform distribution law. The first term in the loss function (4) is determined as follows:
$$J_{equ}(\theta_{net}; P_{equ}) = \sum_{i=1}^{m} \left\{ \left.\frac{d^2\theta_{net}}{d\bar{s}^2}\right|_{\bar{s}=\bar{s}_i} - \left[ a(1-\bar{s}_i)\cos(\theta_{net}(\bar{s}_i)) + b\sin(\theta_{net}(\bar{s}_i)) \right] \right\}^2 \qquad (5)$$
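A minimal numerical illustration of the residual functional (5): given any candidate function θ(s̄), the second derivative can be approximated by central finite differences and the squared residual of Eq. (3) summed over the sample points. This is our own sketch; the paper's implementation differentiates the network analytically, and the coefficients a, b would be trainable parameters:

```python
import math

def equation_loss(theta, a, b, points, h=1e-4):
    """Sum of squared residuals of theta'' - [a*(1-s)*cos(theta) + b*sin(theta)]."""
    loss = 0.0
    for s in points:
        # Central finite-difference estimate of the second derivative.
        d2 = (theta(s + h) - 2.0 * theta(s) + theta(s - h)) / h**2
        residual = d2 - (a * (1.0 - s) * math.cos(theta(s)) + b * math.sin(theta(s)))
        loss += residual**2
    return loss

# A linear function has zero second derivative, so with a = b = 0 the loss vanishes.
pts = [0.1, 0.3, 0.5, 0.7, 0.9]
loss_linear = equation_loss(lambda s: 0.2 + 0.5 * s, a=0.0, b=0.0, points=pts)
```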
The second term in functional (4) is defined as follows:

$$J_{data}(\theta_{net}; P_{data}) = \sum_{j=1}^{n} \left( \tilde{y}_j - y_j \right)^2 \qquad (6)$$
The function $\tilde{y}_j = \tilde{y}(\theta_{net}(\bar{s}_j))$ is determined so as to satisfy the differential dependence (1). To simplify the learning process, instead of finding the exact solution of system (1), the solution was obtained by the Simpson formula with a variable upper limit [2–4]:

$$\begin{cases} \tilde{x}(\bar{s}) = \dfrac{\tilde{l}\,\bar{s}}{12M}\left[ \cos(\theta_{net}(0)) + \cos(\theta_{net}(\bar{s})) + 4\sum_{n=1}^{M}\cos\theta_{net}\!\left(\dfrac{(2n-1)\bar{s}}{2M}\right) + 2\sum_{n=1}^{M-1}\cos\theta_{net}\!\left(\dfrac{n\bar{s}}{M}\right) \right] \\[2ex] \tilde{y}(\bar{s}) = \dfrac{\tilde{l}\,\bar{s}}{12M}\left[ \sin(\theta_{net}(0)) + \sin(\theta_{net}(\bar{s})) + 4\sum_{n=1}^{M}\sin\theta_{net}\!\left(\dfrac{(2n-1)\bar{s}}{2M}\right) + 2\sum_{n=1}^{M-1}\sin\theta_{net}\!\left(\dfrac{n\bar{s}}{M}\right) \right] \end{cases} \qquad (7)$$
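Formula (7) can be sketched numerically as follows. This is our own illustration: `theta` stands in for the trained network output, and the overall factor l̃/2 reflects the assumption that s̄ ∈ [0, 1] parametrizes one half of the stretched beam (the extracted formula is partly garbled, so this scaling is our reading of it):

```python
import math

def simpson_xy(theta, s_bar, l_tilde, M=16):
    """Simpson's rule with a variable upper limit for system (1):
    integrates cos(theta) and sin(theta) over [0, s_bar] in 2*M subintervals."""
    def integrate(f):
        total = f(theta(0.0)) + f(theta(s_bar))
        total += 4.0 * sum(f(theta((2 * n - 1) * s_bar / (2 * M))) for n in range(1, M + 1))
        total += 2.0 * sum(f(theta(n * s_bar / M)) for n in range(1, M))
        return l_tilde * s_bar / (12.0 * M) * total

    return integrate(math.cos), integrate(math.sin)

x_t, y_t = simpson_xy(lambda s: 0.1 * s, s_bar=1.0, l_tilde=0.8)
```

For θ ≡ 0 (a perfectly straight half-beam) the formula gives x̃ = l̃·s̄/2 and ỹ = 0, which is a convenient sanity check.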
Here $\tilde{l} = l(1+\xi)$, where $\xi$ is some given number. The parameter $\bar{s}_j$ is selected from the condition: for every $(x_j, y_j) \in \hat{X}_{data}$, $\bar{s}_j$ is such that $|\tilde{x}(\theta_{net}(\bar{s}_j)) - x_j| \le \varepsilon$, where $\varepsilon$ is some small number set in advance.

Neural network training was carried out by minimizing functional (4) by the method of pagination with restart [2]. The essence of the technique is that the functional is minimized by some local algorithm for several epochs of learning with unchanged pages $P_{equ}$ and $P_{data}$. Then the page $P_{equ}$ is updated and the process repeats. Changing the training pages makes it possible to avoid falling into local minima, even when a local minimization algorithm is used. A variant of the conjugate gradient method (CG) was chosen as the learning algorithm, since for this problem it gave the best result among the compared methods.

During the training of the network, the weight factors $\delta_{equ}$ and $\delta_{data}$ were chosen as follows. At first, for several epochs of training, the weight factor $\delta_{equ}$ was gradually increased from zero to unity, with the goal of gradually adding the term corresponding to the differential equation, while the second factor remained constant, $\delta_{data} = 1$. Then the coefficient $\delta_{equ} = 1$ was kept constant until the end of the learning process, and the second coefficient $\delta_{data}$ was changed periodically. The coefficient $\delta_{data}$ was updated much less frequently than the pages. Such an approach to the selection of the coefficients made it possible to increase the stability of the learning algorithm.
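The page-refresh idea can be illustrated on a toy least-squares problem. In this sketch of ours, plain gradient descent stands in for the conjugate gradient method and a linear model y = w0 + w1·s̄ stands in for the neural network; only the page mechanism (resampling the training points every few epochs) is the point being demonstrated:

```python
import random

def page_training(f, n_pages=40, page_size=8, epochs_per_page=25, lr=0.05):
    """Toy illustration of training with periodically refreshed 'pages':
    fit y = w0 + w1*s to f(s) by gradient descent, resampling the page of
    collocation points after every few epochs."""
    random.seed(0)
    w0, w1 = 0.0, 0.0
    for _ in range(n_pages):
        page = [random.random() for _ in range(page_size)]  # fresh page P
        for _ in range(epochs_per_page):
            g0 = sum(2 * (w0 + w1 * s - f(s)) for s in page) / page_size
            g1 = sum(2 * (w0 + w1 * s - f(s)) * s for s in page) / page_size
            w0, w1 = w0 - lr * g0, w1 - lr * g1
    return w0, w1

w0, w1 = page_training(lambda s: 1.0 + 2.0 * s)  # recovers w0 ~ 1, w1 ~ 2
```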
4 Results of the Method Application

Application of the developed methodology made it possible to obtain fairly accurate solutions of the problem. The averaged results for the various fastening options and loads are presented in Table 1.
Table 1. Averaged simulation results

Beam fixing     | Part of data       | e_data, unloaded beam | e_data, loaded beam
Sealing the end | Left side of data  | 3.76 × 10⁻²           | 9.60 × 10⁻³
                | Right side of data | 3.24 × 10⁻²           | 4.00 × 10⁻²
Hinge           | Left side of data  | 5.39 × 10⁻²           | 1.60 × 10⁻²
                | Right side of data | 1.81 × 10⁻²           | 6.89 × 10⁻³

The values $e_{data}$ were determined as follows: $e_{data} = \frac{1}{M}\sum_{j=1}^{M} \left| \tilde{y}_j^{net} - y_j^{data} \right| / \left\| y^{data} \right\|$.
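The relative error metric can be sketched as follows (our reading of the partially garbled formula, assuming the denominator is the RMS norm defined in Sect. 2):

```python
import math

def e_data(y_pred, y_meas):
    """Mean absolute deviation of predictions, relative to the RMS norm
    of the measured deflection values."""
    M = len(y_meas)
    norm = math.sqrt(sum(y**2 for y in y_meas) / M)  # RMS norm ||y_data||
    return sum(abs(p - m) for p, m in zip(y_pred, y_meas)) / (M * norm)

err = e_data([0.11, 0.19, 0.32], [0.10, 0.20, 0.30])
```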
Figures 1 and 2 show some results of network training. A continuous line indicates the network output, and round markers indicate the measurement data.
Fig. 1. Rigid fastening, with load, the right part of data.
Fig. 2. Hinged, no load, left side of data.
5 Conclusions

Based on the results presented in this paper, it can be argued that the proposed method allows us to obtain high-quality solutions that describe the real data well.
The proposed algorithm makes it possible to obtain a reliable solution even in the presence of measurements containing outliers; an example of such a solution is given in Fig. 2. The degree of influence of the differential model can be adjusted by increasing the page size $P_{equ}$ and adjusting the weight coefficient $\delta_{equ}$. However, we should keep in mind that an increase in the accuracy of the solution of Eq. (2) leads to a deterioration in the learning outcomes and in the description of the experimental data. Improving the accuracy of solving system (1) by the Simpson method, by increasing the number of terms, did not increase the accuracy of the solution during training. The highest-quality results were achieved for the parameter $M \approx 15{-}20$. Using the length of the sample as an adjustable parameter allowed us to obtain qualitative results and to take into account the significant stretching of the sample, which the differential model does not account for. The best results for this task were achieved for a feedforward neural network with one hidden layer of about ten neurons. Using a larger hidden layer is impractical, because the quality of the data description does not improve, while the required training time increases significantly.

Acknowledgment. This paper is based on research carried out with the financial support of the grant of the Russian Science Foundation (project No. 18-19-00474).
References

1. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I: Nonstiff Problems. Springer-Verlag, Berlin (1987)
2. Tarkhov, D., Vasilyev, A.: Semi-empirical Neural Network Modeling and Digital Twins Development. Academic Press, Elsevier (2019)
3. Vasilyev, A.N., Tarkhov, D.A., Tereshin, V.A., Berminova, M.S., Galyautdinova, A.R.: Semi-empirical neural network model of real thread sagging. In: Studies in Computational Intelligence, vol. 736, pp. 138–146. Springer (2017)
4. Zulkarnay, I.U., Kaverzneva, T.T., Tarkhov, D.A., Tereshin, V.A., Vinokhodov, T.V., Kapitsin, D.R.: A two-layer semi-empirical model of nonlinear bending of the cantilevered beam. Journal of Physics: Conference Series, vol. 1044, conference 1
5. Bortkovskaya, M.R., Vasilyev, P.I., Zulkarnay, I.U., Semenova, D.A., Tarkhov, D.A., Udalov, P.P., Shishkina, I.A.: Modeling of the membrane bending with multilayer semi-empirical models based on experimental data. In: Proceedings of the II International Scientific Conference "Convergent Cognitive Information Technologies" (Convergent 2017), Moscow, Russia, 24–26 November 2017
6. Tarkhov, D.A., Bortkovskaya, M.R., Kaverzneva, T.T., Kapitsin, D.R., Shishkina, I.A., Semenova, D.A., et al.: Semi-empirical model of the real membrane bending. In: Advances in Neural Computation, Machine Learning, and Cognitive Research II. SCI, vol. 791, pp. 221–226. Springer (2018)
Combined Neural Network for Assessing the State of Computer Network Elements

Igor Saenko¹, Fadey Skorik², and Igor Kotenko¹

¹ Saint-Petersburg Institute for Information and Automation of the Russian Academy of Sciences, 14-th Line, 39, Saint-Petersburg 199178, Russia. {ibsaen,ivkote}@comsec.spb.ru
² Military Communication Academy, Tikhoretsky Avenue, 3, Saint-Petersburg 194064, Russia. [email protected]
Abstract. Monitoring the status of computer networks is an integral part of network administration. The number of nodes in modern networks is constantly increasing, and the topology is becoming more complicated. It is becoming increasingly difficult for a system administrator to identify and eliminate contingencies in a timely manner. Specialized intelligent support systems or specialized knowledge bases can help in this task. As a rule, the basis of these systems is artificial neural networks. The paper considers the structure of a combined neural network focused on solving the problem of assessing the state of computer network elements. Three training methods are considered: stochastic gradient descent, the adaptive learning rate method, and the adaptive inertia method. The results of an experimental evaluation of various options for implementing a combined neural network and its training methods are presented. The results show high accuracy of calculations, good adaptability, and applicability in a wide range of computer network configurations.

Keywords: Artificial neural network · Combined neural network · Kohonen layer · Computer network
1 Introduction

Modern computer networks have a developed, branched topology. They include a large number of different network devices, both of the same type and oriented to the implementation of specific, highly specialized tasks. In addition, existing computer networks tend to expand over time. As a result, the complexity of administration and the need for highly qualified computer network administrators are steadily increasing. The problem of qualified administration of computer networks can be solved by using combined neural networks (CNNs) to assess the state of computer network elements. Such a solution will significantly simplify the analysis of incoming data and automate the identification of emergencies without human intervention [1–4]. The paper proposes an original CNN structure focused on solving the problem of assessing the state of computer network elements, and discusses

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 256–261, 2021. https://doi.org/10.1007/978-3-030-60577-3_30
the results of its testing using various training methods. The aim of the paper is to substantiate the possibility and evaluate the effectiveness of the proposed CNN and to choose the method of its training.
2 Related Work

An analysis of works on the use of artificial neural networks (ANNs) in monitoring systems and intelligent decision support has shown significant limitations in the areas of their use. In [5], an expert system based on ANNs and fuzzy logic algorithms is considered. The system contains a knowledge base with a specific set of rules for interpreting input data. The advantage of the system is its stability; the disadvantage is the impossibility of self-learning. In [6], a self-learning system based on hybrid neural networks and fuzzy logic rules is proposed. The system is quite robust to noise; however, its training requires significant computational resources. In [7], a method for predicting user reactions in social networks using machine learning methods is presented. The advantage of this method is the simplicity of the neural network; the disadvantage is the complexity of analyzing the source data and reducing it to numerical form. In [8], an original algorithm for classifying real objects based on the Hamming network was proposed. The advantages of this work are fast learning and a low need for computing resources; the disadvantages include low resistance to noise and distortion of the input data. In [9], an infrastructural method for detecting DDoS attacks based on the Bayesian model of multiple changes was considered. Its advantages include high accuracy of calculations; the disadvantage is the narrow specialization of the proposed solution. In [10], a method for applying neural networks to modeling the financial performance of a company using pre-trained neural networks is proposed. The advantage is the high speed of training, but the disadvantage is the narrow range of application. In [11], a method for the use of adaptive resonance theory neural networks to detect and classify network attacks is considered.
Despite the prospects for the use of such neural networks, this method requires significant time-consuming training and needs to be improved. Thus, the analysis showed that a common drawback of all the considered works is their narrow specialization or significant limitations. This makes it difficult or impossible to use them to assess the state of computer network elements.
3 The Structure of the Combined Neural Network

Assessment of the state of computer network elements begins with data collection. A large number of values of various indicators characterizing the state of computer network elements at various points in time are collected. As a rule, the quality indicators of a computer network are used as such indicators; they determine the reliability, transparency, performance, and throughput of the network, as well as the security of the information processed in the network. Sources of information can be the system logs of network devices, or network management and control systems based on the
SNMP protocol. The requirements for the system of indicators are ease of obtaining and the ability to obtain a complete, redundant characteristic of a computer network. If the selected indicators meet these requirements, then the process of training the ANN and setting up the administrator's decision support system is greatly simplified. The collected data are normalized: the initial data are brought to the form necessary for the correct training and operation of the neural network. If necessary, during normalization, non-digital values are converted to digital form. It is proposed to normalize the data using a sigmoid logistic function lying within [0, 1]:

$$\tilde{x}_{ik} = \frac{1}{e^{-a(x_{ik} - x_i^c)} + 1} \qquad (1)$$
where ex ik the initial and normalized values of the k-th data element of the i-th interval, respectively; xci – the center of the normalized i-th interval, a – the function slope parameter. The collected and normalized data directly go to the input of the neural network, where they are further processed. It is proposed to use the CNN for this purpose, the generalized structure of which is shown in Fig. 1.
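The normalization step of Eq. (1) can be sketched in a few lines of Python. The function name `normalize`, the interval `center` and the slope `a` are illustrative names of ours, and the standard logistic form with a negative exponent is assumed:

```python
import numpy as np

def normalize(x, center, a=1.0):
    """Map raw indicator values of one interval into [0, 1] with a
    logistic function centered at the interval center (Eq. 1)."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (np.exp(-a * (x - center)) + 1.0)

# raw values below the center map toward 0, above it toward 1
scaled = normalize([30.0, 50.0, 70.0], center=50.0, a=0.2)
```

Because the mapping is monotonic, non-digital indicators converted to numbers beforehand keep their ordering after normalization.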
Fig. 1. The generalized structure of the CNN.
The CNN includes: 1) a Kohonen layer (self-organizing map, SOM); 2) two or more blocks, each consisting of three layers of neurons (with linear, nonlinear and ReLU activation functions); 3) an output layer with a linear activation function. The Kohonen layer reduces the dimensionality of the input data and the training time of the neural network as a whole [12]. The three-layer blocks generalize and filter the incoming information; their number and the number of active neurons depend on the dimension of the initial data. The output linear layer reduces the dimensionality of the output data to the required level.

The technique of using the CNN to assess the state of computer network elements includes the following steps: data sampling, normalization of values, reduction of the sample size, sample processing using the ANN, and interpretation of the obtained results in accordance with the information available in the knowledge repository.
Combined Neural Network for Assessing
259
The use of CNNs with the proposed structure is possible in a wide range of computer network configurations. By reducing the dimensionality of the input data, quick retraining of such networks becomes possible and the risk of overfitting is reduced.
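As an illustration only (not the authors' implementation), the layer ordering of the CNN — Kohonen reduction, a three-layer block, linear output — can be sketched as a toy forward pass. All dimensions, weight initializations and the distance-based SOM reduction below are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def som_reduce(x, prototypes):
    """Kohonen-style reduction: the (negated) distances to a small set of
    SOM prototypes replace the high-dimensional raw indicator vector."""
    return -np.linalg.norm(prototypes - x, axis=1)

def block(z, w_lin, w_tanh, w_relu):
    """One three-layer block: linear -> nonlinear (tanh) -> ReLU."""
    z = w_lin @ z                        # linear layer
    z = np.tanh(w_tanh @ z)              # nonlinear layer
    return np.maximum(0.0, w_relu @ z)   # ReLU layer

x = rng.random(100)                      # 100 raw network indicators
protos = rng.random((16, 100))           # 16 SOM units
w1 = (rng.standard_normal((8, 16)), rng.standard_normal((8, 8)),
      rng.standard_normal((8, 8)))
w_out = rng.standard_normal((1, 8))      # linear output layer

z = block(som_reduce(x, protos), *w1)
state_score = w_out @ z                  # assessed state of the element
```

The sketch shows only the data flow; in practice the SOM and the blocks would be trained, and more blocks would be stacked for higher-dimensional inputs.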
4 Experimental Evaluation of the Combined Neural Network

To evaluate the effectiveness of various training methods for these networks, series of experiments were carried out. Tables 1 and 2, respectively, provide the minimum training error without and with reducing the dimensionality of the input data. The Kohonen layer reduced the dimensionality.

Table 1. Minimum network training error without dimensionality reduction.

Input dimension | Number of hidden layers | Neurons in the hidden layer | Minimal training error
20 × 20   | 2 | 180  | 0.0011
50 × 50   | 3 | 450  | 0.0016
100 × 100 | 3 | 800  | 0.0015
200 × 200 | 4 | 1600 | 0.0018
When conducting a comparative analysis of various training methods, stochastic gradient descent (SGD) was the first to be considered [13]. This method updates each of the parameters by subtracting the gradient of the optimized function with respect to the corresponding parameter. The weight \theta_{t+1} is calculated as

\theta_{t+1} = \theta_t + v_{t+1},   v_{t+1} = -\eta \, \mathrm{grad} f_i(\theta_t),

where f_i is the function calculated on the i-th part of the data; t is the iteration step; v_{t+1} is the weight increment; \mathrm{grad} f_i is the gradient of the function f_i; \eta is the learning rate. With a very large \eta the learning algorithm diverges, and with a very small one it converges slowly.

The adaptive learning rate method (Adadelta) [14] uses an exponential moving average to estimate the second moment of the gradient. The value v_{t+1} is calculated as

v_{t+1} = -\frac{\sqrt{x_t + \epsilon}}{\sqrt{g_{t+1} + \epsilon}} \, \mathrm{grad} f_i(\theta_t),   (2)

where g_t is the scaling parameter, calculated as g_{t+1} = \gamma g_t + (1 - \gamma)(\mathrm{grad} f_i(\theta_t))^2; \gamma is a hyper-parameter; \epsilon is the stability constant; x_t is the moving average, calculated as x_{t+1} = \gamma x_t + (1 - \gamma) v_{t+1}^2.

To implement the adaptive inertia method (Adam), the estimates of the first and second moments are initialized with zeros, followed by a small correction. The first moment estimate is calculated as m_{t+1} = \gamma_1 m_t + (1 - \gamma_1) \mathrm{grad} f_i(\theta_t). The second
moment estimate is calculated as g_{t+1} = \gamma_2 g_t + (1 - \gamma_2)(\mathrm{grad} f_i(\theta_t))^2, where \gamma_1 and \gamma_2 are hyper-parameters. The bias-corrected estimates of the first moment \hat{m}_{t+1} and the second moment \hat{g}_{t+1} are

\hat{m}_{t+1} = m_{t+1} / (1 - \gamma_1^{t+1}),   \hat{g}_{t+1} = g_{t+1} / (1 - \gamma_2^{t+1}).   (3)

Then the weight coefficients are calculated as

\theta_{t+1} = \theta_t - \eta \, \hat{m}_{t+1} / (\sqrt{\hat{g}_{t+1}} + \epsilon).

Table 2. Minimum network training error with reduced dimensionality.

Input dimension | Number of hidden layers | Neurons in the hidden layer | Minimal training error
20 × 20   | 2 | 100  | 0.0012
50 × 50   | 2 | 250  | 0.0018
100 × 100 | 3 | 520  | 0.0018
200 × 200 | 4 | 1100 | 0.0021

Table 3 presents data on the average training error for the various training methods.

Table 3. Average training error for different learning methods.

Learning method                          | Average training error
Stochastic gradient descent (SGD)        | 0.0034
Adaptive learning step method (Adadelta) | 0.0014
Adaptive inertia method (Adam)           | 0.0021

Analysis of the obtained experimental data allows us to draw the following conclusions. First of all, there is a tendency towards an increase in the error of the neural network as the dimension of the input data sample grows; hence, the number of neurons in the hidden layers, or the number of hidden layers, has to be increased in order to minimize the error. Using the Kohonen layer as an input filter allows us to reduce the size of the neural network and the required training time, with only a slight increase in the error at its outputs. The most effective of the training methods considered was the Adadelta method. Moreover, the classification error when testing a trained neural network did not exceed 6.4%.
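The Adam update rules above can be checked on a toy quadratic. The sketch below is ours; \gamma_1 = 0.9 and \gamma_2 = 0.999 are the usual defaults, assumed here rather than taken from the paper:

```python
import numpy as np

def adam_step(theta, grad, m, g, t, eta=0.1, g1=0.9, g2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, weight step."""
    m = g1 * m + (1 - g1) * grad           # first moment estimate
    g = g2 * g + (1 - g2) * grad ** 2      # second moment estimate
    m_hat = m / (1 - g1 ** (t + 1))        # bias-corrected moments
    g_hat = g / (1 - g2 ** (t + 1))
    theta = theta - eta * m_hat / (np.sqrt(g_hat) + eps)
    return theta, m, g

# minimize f(theta) = theta**2, whose gradient is 2*theta
theta, m, g = 5.0, 0.0, 0.0
for t in range(300):
    theta, m, g = adam_step(theta, 2.0 * theta, m, g, t)
```

After a few hundred steps the weight settles near the minimum; replacing the body of `adam_step` with the SGD or Adadelta formulas gives the other two methods of the comparison.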
5 Conclusion

The paper suggested an approach to the implementation of the CNN as a part of an intelligent support system for a computer network administrator. The proposed solution has high flexibility and adaptability to various input data configurations. The use of Kohonen networks can significantly reduce the training time and simplify the structure of the neural network. The application of the proposed solution is possible in remote control and monitoring systems of peer-to-peer computer networks. Further studies are
related to the use of the ANN apparatus in heterogeneous computer network monitoring systems with complex topology. Acknowledgement. This research is being supported by the grant of RSF #18-11-00302 in SPIIRAS.
References

1. Chen, Y., Kak, S., Wang, L.: Hybrid neural network architecture for on-line learning. Intell. Inf. Manage. 2, 253–261 (2010)
2. Kotenko, I., Saenko, I., Skorik, F., Bushuev, S.: Neural network approach to forecast the state of the Internet of Things elements. In: Proceedings of the XVIII International Conference on Soft Computing and Measurements (SCM'2015), IEEE Xplore, pp. 133–135 (2015). https://doi.org/10.1109/scm.2015.7190434
3. Wan, L., Zhu, L., Fergus, R.: A hybrid neural network-latent topic model. In: Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), La Palma, Canary Islands, vol. 22, pp. 1287–1294 (2012)
4. Saenko, I., Skorik, F., Kotenko, I.: Application of hybrid neural networks for monitoring and forecasting computer networks states. In: Cheng, L., Liu, Q., Ronzhin, A. (eds.) ISNN 2016. LNCS, vol. 9719, pp. 521–530. Springer, Cham (2016)
5. Azruddin, A., Gobithasan, R., Rahmat, B., Azman, S., Sureswaran, R.: A hybrid rule based fuzzy-neural expert system for passive network monitoring. In: Proceedings of the Arab Conference on Information Technology (ACIT), pp. 746–752 (2002)
6. Mishra, A., Zaheeruddin, Z.: Design of hybrid fuzzy neural network for function approximation. J. Intell. Learn. Syst. Appl. 2(2), 97–109 (2010)
7. Popova, E.P., Leonenko, V.N.: Predicting user reactions in social networks using machine learning methods. Scientific and Technical Bulletin of Information Technologies, Mechanics and Optics 20(1), 118–124 (2020). (in Russian)
8. Khristodulo, O.I., Makhmutov, A.A., Sazonova, T.V.: Use algorithm based at Hamming neural network method for natural objects classification. Procedia Comput. Sci. 103, 388–395 (2017)
9. Behal, S., Krishan, K., Sachdeva, M.: D-FACE: an anomaly based distributed approach for early detection of DDoS attacks and flash events. J. Netw. Comput. Appl. 111, 49–63 (2018)
10. Kurochkina, I.P., Kalinin, I.I., Mamatova, L.A., Shuvalova, E.B.: Neural networks method in modeling of the financial company's performance. Stat. Math. Methods Econ. 4(5), 33–41 (2017)
11. Bukhanov, D.G., Polyakov, V.M.: An approach to improve the architecture of ART-2 artificial neural network based on multi-level memory. Fuzzy Technol. Ind. FTI 2018, 235–242 (2018)
12. Souza, L.G.M., Barreto, G.A.: Nonlinear system identification using local ARX models based on the self-organizing map. Learning and Nonlinear Models – Revista da Sociedade Brasileira de Redes Neurais (SBRN) 4(2), 112–123 (2006)
13. Amari, S.: A theory of adaptive pattern classifiers. IEEE Trans. Electron. Comput. EC-16(3), 299–307 (1967)
14. Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv:1212.5701 (2012)
Operational Visual Control of the Presence of Students in Training Sessions

Oleg I. Fedyaev
and Nikolay M. Tkachev
Donetsk National Technical University, Donetsk, Ukraine [email protected]
Abstract. The problem of automating the registration of students' attendance in the classroom is solved by using computer vision. A convolutional neural network is used to recognize a person's face, and the recognition process is implemented in real time. Localization of faces in frames from a video camera is performed by the Viola–Jones method. The convolutional neural network of the VGGFace model forms the features of a person's face, and identification of the person occurs by facial feature similarity. The software is implemented using the Keras and OpenCV libraries. The control system performs the following functions: captures the faces of students on a video camera when they enter the classroom, compares the faces with a database of students, notes presence at the lesson (or lateness) in case of successful identification, and saves the data in the attendance register. For the convenience of video monitoring, the color of the student's line changes depending on his condition: not present, present, late, absent. The system provides for manual editing of the electronic register and the choice of the subject name. The student's photo can be uploaded into the database from a file or directly from the camera.

Keywords: Face recognition · Convolutional neural network · Video stream · Localization of faces · Video control
1 Introduction

Image recognition is currently used in many applications. The most relevant and complex recognition problem is the recognition of a person's face. The solution of this problem helps in people monitoring, security and many other computerized systems. Despite the successes achieved in recent years in the implementation of computer vision, there are still a number of unsolved problems in this area. The main difficulty of computer face recognition in a video stream that needs to be overcome is to recognize a person by the face image regardless of angle and lighting conditions, as well as various changes related to age, hair, etc. [1]. Nowadays, great prospects in solving image recognition problems are associated with deep neural networks, in particular with the convolutional neural network [1], which develops the ideas of such architectures as multilayer networks of the cognitron and neocognitron type [2]. Unlike the known classical neural network types, the architecture of the convolutional neural network is built on the principles of the human visual system.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 262–268, 2021. https://doi.org/10.1007/978-3-030-60577-3_31

Such an architecture allows identifying all the characteristic facial
features of a two-dimensional image topology. It is believed that in this case objects are recognized with high accuracy and speed. Moreover, face recognition from a video stream shows that neural networks can work in real time using limited resources. Recently, complex deep neural network recognition models have been developed. Therefore, the goal of this work is to assess the feasibility of implementing a system of neural network face recognition from a video stream based on the Keras and OpenCV library tools [3, 4].
2 Process of Face Recognition in Frames from a Video Stream

The face recognition process is preceded by an important stage of automatic face localization in the image, the implementation methods of which are now being actively developed [5]. The stage of detecting a face in an image is the first step in the process of solving the higher-level problem (for example, face recognition or facial expression recognition). However, the information about the presence and the number of faces in the image or in the video stream can be useful for applications such as security systems and the indexing of image or video clip databases.
Fig. 1. The main processes of computer face recognition
The face detection module receives images from a video camera in real time at a speed of 30 frames per second, and selects and localizes faces on them (see Fig. 1). This function is performed by the face detection algorithm in the current frame of the video stream. As a result, a sequence of face images captured by the video camera is formed for their further recognition. Each selected face image is transmitted to the feature vector block, which implements the recognition function f: X → Y, where X is the set of input face images and Y is the set of feature vectors for faces from X. Thus, the neural network function
f associates with each input face x ∈ X a feature vector y = f(x) ∈ Y that characterizes the face. A convolutional neural network, which was previously trained on a dataset of photographs of 2622 people (1000 photographs per person) [1, 3], was used to create the facial features. The network is configured to classify a recognizable face, using faces from the training dataset as classes. Therefore, the result of the network is a 2622-dimensional vector, each element of which represents the likelihood of the person resembling one of the training identities. Two face images are considered to belong to the same person if they are equally similar to each person from the training sample; for this, the feature vectors of the two images should form a fairly acute angle between themselves.

When setting up the system, it is necessary to initially create a database of faces for all recognizable people, represented by a finite set of corresponding surnames L. For this purpose, for every x, using the neural network recognition function f, the set of correct pairs is determined as

\{(y, l) \mid y = f(x),\ x \in X,\ l \in L\},   (1)

where X is the set of prepared photos of recognized faces, i.e. the patterns of recognized face images; y is the facial feature vector of the image x ∈ X; l is the name of the person shown in picture x. The whole set of pairs (y, l) is entered into the database of facial feature vectors.

In the normal mode of system operation, i.e. upon recognition, the comparison module compares the feature vector of the recognizable face obtained from the output of the convolutional neural network with all database vectors. The comparison procedure is based on the cosine similarity of the recognized face vector with each reference vector from the database:

\cos(Y, Y^{*}) = \frac{\sum_{i=1}^{n} y_i y_i^{*}}{\sqrt{\sum_{i=1}^{n} y_i^2}\ \sqrt{\sum_{i=1}^{n} (y_i^{*})^2}} = \frac{Y \cdot Y^{*}}{\|Y\| \, \|Y^{*}\|},   (2)

where Y and Y^{*} are the feature vectors of the recognized face and a reference face from the database, respectively; n = 2622. A recognized face is considered to correspond to a reference if the obtained similarity coefficient is above a certain threshold (a value of 0.7 was used in this work).
3 Face Image Localization Algorithm

The face recognition system uses the VideoCapture function from the OpenCV library [4] to capture video from the camera. In this work, two algorithms were considered to solve the problem of automatic localization of faces in the frames of the video stream: the first is based on the color of human skin, the second on the Viola–Jones method [5]. Since the first algorithm had a significant drawback due to the strong
dependence on lighting, faces in this work were localized using the Viola–Jones method. The advantages of this method include a high degree of correct face localization, a small number of false positives and a high speed of operation; it is also less sensitive to lighting. The accuracy of isolating faces in a picture using the Viola–Jones algorithm under ideal conditions reaches 90–95%, which is quite acceptable for solving practical problems. The frames from the video stream are processed by the cascaded Haar classifier, represented by the detectMultiScale function of the CascadeClassifier class. As a result, the Viola–Jones method determines the location of each selected face in the image with a set of parameters, including the coordinates and dimensions of the rectangular frame that bounds the image of the person's face. All images of selected faces are normalized to a standard size of 224 × 224 pixels, each presented in the RGB format.
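The step from detected rectangles to normalized 224 × 224 RGB face images can be sketched without the library itself. The `(x, y, w, h)` boxes are assumed to come from `detectMultiScale`, and nearest-neighbour sampling stands in here for OpenCV's resize:

```python
import numpy as np

def crop_faces(frame, boxes, size=224):
    """Crop each detected box (x, y, w, h) from an RGB frame and rescale
    it to size x size pixels with nearest-neighbour sampling."""
    faces = []
    for x, y, w, h in boxes:
        roi = frame[y:y + h, x:x + w]
        rows = np.arange(size) * roi.shape[0] // size
        cols = np.arange(size) * roi.shape[1] // size
        faces.append(roi[rows][:, cols])
    return faces

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # one video frame
faces = crop_faces(frame, [(100, 50, 120, 150)])  # one detected box
```

Each returned array has the fixed shape the VGGFace input layer expects, regardless of the size of the detected rectangle.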
4 Convolutional Neural Network Architecture

The convolutional neural network was proposed by Yann LeCun for effective object recognition in images [6]. Its multilayer architecture consists of convolution and subsampling layers that alternate with each other. Each layer has a set of several feature planes, and neurons of the same plane share the same weights connecting them to the corresponding local areas of the previous layer. The image of the previous layer is, as it were, scanned by a small window, i.e. passed through a set of weights (the convolution kernel), and the scan result is mapped to the corresponding neuron of the current layer. The convolution kernel is interpreted as a graphic encoding of some feature, for example, the presence of a horizontal or vertical line. Thus, the set of planes is a feature map, which allows each plane to find its image areas anywhere in the previous layer. The subsampling operation reduces the dimension of the generated feature maps. In this network architecture, it is believed that information about the presence of the desired feature is more important than exact knowledge of its coordinates; therefore, the maximum of several neighboring neurons of the feature map is selected and taken as one neuron of a compacted feature map of smaller dimension. Due to this operation, in addition to accelerating further calculations, the network becomes more invariant to the scale of the input image.
Fig. 2. The architecture of the multilayer convolutional neural network model VGGFace
The alternation of layers allows subsequent feature maps to be composed from previous ones, containing more general characteristics that are less dependent
on image distortion. On each next layer, the map decreases in size, but the number of maps increases. In practice, this means the ability to recognize complex feature hierarchies. Usually, after passing through several layers, the feature map degenerates into a vector or even a scalar, but there are hundreds of such feature maps. An additional multilayer perceptron is installed at the output of the convolutional layers of the network. The network is trained using the standard back-propagation method. The VGGFace neural network was used as a model of the convolutional neural network in the system (see Fig. 2). Vector identifiers (facial features) were formed at its output.
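A minimal sketch of one convolution-plus-subsampling stage described above (cross-correlation form; the toy 8 × 8 image and the edge kernel are illustrative choices of ours):

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution: slide the kernel window over the image."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, k=2):
    """Subsampling: keep the maximum of each k x k neighbourhood."""
    h, w = fmap.shape[0] // k, fmap.shape[1] // k
    return fmap[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

img = np.zeros((8, 8)); img[:, 4:] = 1.0      # vertical edge at column 4
fmap = conv2d(img, np.array([[-1.0, 1.0]]))   # edge-detecting kernel
pooled = max_pool(np.maximum(0.0, fmap))      # ReLU, then 2x2 subsampling
```

The pooled map still signals the presence of the edge but at half the resolution, which is exactly the "presence matters more than exact coordinates" behavior described above.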
Fig. 3. Localization and face recognition results
5 Video Control of Student Presence in Class

The system program is based on the Python library Keras. The selected development tools provided a high prototyping speed and platform independence. The convolutional neural network model VGGFace used here was developed and trained by the Visual Geometry Group of the University of Oxford. Figure 3 shows an example of the detection and recognition of two faces. The proximity coefficients of the recognized faces to each of the three database samples are shown on the left. A recognized face is marked on top with the name of the sample (in this case, the surname). If the face does not correspond to any of the samples, it is marked as “unknown”.
The tasks of automatic registration of the presence of students in the classroom include the following subtasks:

– video recording of students entering the classroom;
– analysis of video stream frames for the presence of faces;
– comparison of selected faces with a student database;
– noting students' presence in the lesson in case of successful identification;
– saving data in the attendance register.
During system operation, various problems can occur that impede recognition accuracy: the student's head was not turned squarely enough toward the camera, poor lighting, too many faces in the image (due to which some faces were not captured), etc. Therefore, the system provides manual editing of the attendance register. The teacher can visually observe and manage information about students using the main window of the video registration system (see Fig. 4).
Fig. 4. Teacher UI
The system functionality provides adding and editing information about students: name, group and photo. The student's photo can be selected from a file or taken directly from the camera. A table with the list of students of the group in which the teacher conducts the lesson is displayed in the left part of the window. The time of arrival and the student's presence state (not marked, arrived, late, absent) are recorded in the table when students enter the classroom. For the convenience of video monitoring, the color of the student's line changes depending on his state.
The subject selector, the stop and continue recognition buttons, the late-mode switch and the lesson data reset button are placed on the right side of the window. When switching to the late mode, the status of all unmarked students changes to “Absent”, and those who are recognized in this mode are marked with the status “Late”. The “End lesson” button writes the data from the table of the current lesson to the attendance register (this button is active only in late mode).
6 Conclusions

The article proposes an approach to solving the problem of recognizing a person's face based on a convolutional neural network. Localization of faces in frames from a video camera is performed by the Viola–Jones method. The convolutional neural network forms the features of a person's face, and identification of the person occurs by calculating the cosine similarity coefficient of the facial feature vectors. The recognition process is implemented in real time. The developed version of the system, which uses the resources of the Keras and OpenCV libraries, serves for the operational accounting of students' presence in the classroom using face images from the video stream.
References

1. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning, p. 802. MIT Press, Cambridge (2016)
2. Fedyaev, O., Makhno, Y.: The system for recognition of noisy and deformed graphic patterns based on a neocognitron neural network. In: Proceedings of the 11th National Conference on Artificial Intelligence-2008, vol. 3, pp. 75–83. URSS, Moscow (2008). (in Russian)
3. Gulli, A., Pal, S.: Deep Learning with Keras: Implement Neural Networks with Keras on Theano and TensorFlow, p. 318. Packt Publishing, Birmingham (2017)
4. OpenCV – open source computer vision library. https://software.intel.com/en-us/articles/
5. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features, pp. 1–8 (2001)
6. LeCun, Y., Bengio, Y.: Convolutional neural networks for images, speech and time series, pp. 1–14. AT&T Laboratories, Holmdel (1995)
Classification of Microscopy Image Stained by Ziehl–Neelsen Method Using Different Architectures of Convolutional Neural Network

Inga G. Shelomentseva1,2 and Serge V. Chentsov2

1 Krasnoyarsk State Medical University named after Professor V.F. Voino-Yasenetsky, 1, Partizan Zheleznyak Ave, Krasnoyarsk 660022, Russia
[email protected]
2 Siberian Federal University, 79, Svobodny Ave, Krasnoyarsk 660041, Russia
Abstract. Tuberculosis (TB) is a global public health issue. The paper presents the results of an investigation of the clinical efficacy of convolutional neural networks for the detection of acid-fast stained TB bacilli. The experimental set contains images of the results of microscopy of patients' sputum stained by the Ziehl–Neelsen method. During the experiment, the original set of images was segmented to augment the data. We built several convolutional neural network (CNN) models to recognize TB bacilli by transfer learning. The experiment was conducted based on AlexNet, VGGNet-19, ResNet-18, DenseNet, GoogLeNet-incept-v3, InceptionResNet-v2 and a classic three-layer model. DenseNet was the most productive transfer learning model on the experimental set. During the study, a conventional three-layer convolutional network was also developed, which showed the maximum accuracy in the experiment. A convolutional neural network with a simple structure may be an effective base for an automated detection system for stained TB bacilli, but image segmentation is required to increase recognition accuracy.

Keywords: Ziehl–Neelsen · Image processing · Convolutional neural network
1 Introduction

When we create an automated system for the recognition of tuberculosis mycobacteria in digital sputum images, we must take into account such features of the data as the similarity of color shades between the mycobacteria and the background and the low depth of field of the analyzed images. These features affect the accuracy of the classification of mycobacteria by the automated recognition system. One way to improve classification accuracy in image recognition is to use convolutional neural networks: a set of methods and algorithms aimed at modeling high-level abstractions and extracting hidden features in the analyzed image [1].

The task of the current study is to test the applicability of convolutional neural networks for building an automated tuberculosis diagnostic system.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 269–275, 2021. https://doi.org/10.1007/978-3-030-60577-3_32

In solving this task, the
experimenter must take into account the following conditions: the diagnostic method is microscopy of sputum smears stained by the Ziehl–Neelsen method; the problem being solved is pattern recognition; the number of classes is 2; the input data type is the RGB color model, BMP image format; accuracy on a balanced data set of at least 95%; response time of no more than 5 min for the classification of a single image.
2 Materials and Methods

In this study, 630 images of sputum samples of patients were obtained by microscopy of smears stained by the Ziehl–Neelsen method, using a Micromed trinocular microscope at a magnification of 10 × 60 with a mounted ToupCam digital camera with a resolution of 0.3 MP. The Ziehl–Neelsen method involves treating the sputum sample with carbolic fuchsin, followed by decolorization with 5% sulfuric acid or 3% hydrochloric acid, and counterstaining with a 0.25% methylene blue solution [2]. As a result, acid-resistant bacteria, which include mycobacterium tuberculosis, are stained in shades of red (Fig. 1).
Fig. 1. Examples of images of sputum samples stained by the Ziehl–Neelsen technique and ROI after segmentation
A convolutional neural network (CNN) is a neural network model with three characteristics: local receptive fields (the connection of neurons with a small area of the previous layer), shared weights, and subsampling (compaction using a nonlinear transformation). Together these give resistance to morphological transformations and a high degree of generalization for the feature vector. A convolutional network recognizes two-dimensional images using a combination of the operations of feature vector formation and classification.
A CNN consists of the following types of layers: input layer, convolution layer, subsampling layer and output layer. The input layer for image recognition is a tensor corresponding to the color channels of the image; the pixels of the input image are normalized. The output layer is one-dimensional.

We used transfer learning to build a convolutional network architecture for recognizing images of mycobacterium tuberculosis. Transfer learning means the transfer of a convolutional network model developed for one problem to solving another problem. For transfer learning, we used widely used universal convolutional network architectures, such as AlexNet, VGG, ResNet, GoogLeNet, DenseNet, etc. There are medical diagnostic systems for image classification using transfer learning, for example, the detection of skin cancer based on GoogLeNet Inception v3 [3] or chest radiography based on DenseNet and ResNet [4–5]. The AlexNet network, developed by A. Krizhevsky and colleagues in 2012, had a great influence on the theory of convolutional networks: it was in this model that such optimization methods as dropout, data augmentation and ReLU, together with work on big data using the GPU, first appeared [6]. At present, the theory of CNNs is developing rapidly: VGGNet uses several small-size filters instead of one large-size filter [7], the ResNet network uses shortcut connections, in the DenseNet network K levels (a dense block) have K(K + 1)/2 direct connections [8], the GoogLeNet network aims to reduce the number of network parameters [9], and the InceptionResNet network combines the principles of residual learning and the parallel use of filters of different dimensions [10–11].
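The essence of transfer learning as used in the experiments — reuse a pretrained feature extractor with frozen weights and train only a new classification head — can be illustrated with a toy numpy stand-in. The random frozen matrix below replaces a real pretrained network such as DenseNet, and all data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

W_frozen = rng.standard_normal((16, 64))   # "pretrained", frozen weights

def features(X):
    """Frozen feature extractor: only the head below is trained."""
    return np.maximum(0.0, X @ W_frozen.T)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy 2-class data standing in for bacillus / background ROI vectors
X = np.vstack([rng.normal(0.5, 0.1, (50, 64)),
               rng.normal(-0.5, 0.1, (50, 64))])
t = np.concatenate([np.ones(50), np.zeros(50)])

F = features(X)
w, b = np.zeros(16), 0.0
for _ in range(500):                       # gradient descent on the head
    p = sigmoid(F @ w + b)
    w -= 0.02 * F.T @ (p - t) / len(t)
    b -= 0.02 * float(np.mean(p - t))

accuracy = float(np.mean((sigmoid(F @ w + b) > 0.5) == t))
```

Only the small head is optimized, which is why transfer learning needs far less data and time than training the full architecture from scratch.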
3 Results

The computational experiment was carried out in three phases.

Phase 1 was carried out on a personal computer with the following characteristics: Intel Core i7, 2.70 GHz, 10.00 GB RAM, 64-bit Microsoft Windows 7. Sample results of the transfer learning experiments on microscopy images of sputum stained by the Ziehl–Neelsen method are presented in Table 1.

Phase 2 was carried out on the GPU of Google Colab with the TensorFlow framework and parallel computing. The studied images went through the stages of filtering and segmentation, which produced about 5084 ROI (regions of interest) with mycobacterium tuberculosis and 123197 ROI without it. Data augmentation was applied to the ROI with mycobacteria to obtain a balanced data set. Sample results of the transfer learning experiments on the segmented microscopy images are presented in Table 2.

Phase 3 was carried out on the GPU of Google Colab with the TensorFlow framework and parallel computing. Sample results of the experiments with our own CNN model are presented in Table 3.

Accuracy was used as the criterion for comparing the results of the experiment: a universal criterion that reflects the percentage of all correct decisions (positive and negative) of the classifier.
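The augmentation step used for balancing the two classes can be sketched as simple geometric transforms of each square ROI. Flips and right-angle rotations are an illustrative choice of ours, not necessarily the exact set used in the study:

```python
import numpy as np

def augment(roi):
    """Return flipped and rotated variants of one square RGB ROI,
    multiplying the number of minority-class samples sixfold."""
    return [roi, np.fliplr(roi), np.flipud(roi),
            np.rot90(roi, 1), np.rot90(roi, 2), np.rot90(roi, 3)]

roi = np.arange(75, dtype=np.uint8).reshape(5, 5, 3)
variants = augment(roi)
```

Such label-preserving transforms are valid here because the orientation of a bacillus in the microscope field carries no diagnostic meaning.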
I. G. Shelomentseva and S. V. Chentsov

3.1 Phase 1
AlexNet, VGGNet-19, ResNet-18, DenseNet, GoogLeNet-inception-v3 and InceptionResNet-v2 were selected as models for the computational experiment. The experimental accuracy on the training data is equal to an average of 95%, which corresponds to the objectives of the experiment. The experimental accuracy on the test data fluctuated between 76% and 81% (Table 1), which does not correspond to the expected quality criterion. To improve network performance, we used data augmentation and model simplification. Data augmentation slightly increased the accuracy value, by 1–2% for each model.

Table 1. Experiment results of transfer learning on phase 1 (classification of the original input image)

CNN                      Training time, min   Accuracy on the test data, %   Classification time, sec
AlexNet                  28                   77.89                          15.4
VGGNet-19                502                  76.84                          152.6
ResNet-18                58                   81.05                          24.8
GoogLeNet-inception-v3   246                  79.47                          72.3
DenseNet                 315                  80.53                          95.2
InceptionResNet-v2       606                  76.84                          142.2

3.2 Phase 2
Filtering and segmentation operations were applied to the studied images. This preprocessing is necessary to exclude the background from the image and to select objects for further classification as acid-resistant mycobacteria or as other objects. In this study, linear convolution was used as a filtering method and the Mexican hat wavelet transform was used as a segmentation method [12]. VGGNet-16, ResNet-50, DenseNet, GoogLeNet-inception-v3 and InceptionResNet-v2 were selected as models for the computational experiment. The experimental accuracy on the training data is equal to an average of 98%, which corresponds to the objectives of the experiment. The experimental accuracy on the test data fluctuated between 85% and 95% (Table 2), which partially corresponds to the expected quality criterion. To improve network performance, we increased the number of epochs, which slightly increased the accuracy value, by 1–2% for each model.

3.3 Phase 3
Goodfellow, Bengio, and Courville [13] recommend simplifying the network model to improve performance on test data, which corresponds with Christian Szegedy's fourth recommendation about balancing the depth of the network, the width of the network, and the amount of data processed [14]. Based on the recommendations of Goodfellow et al. and Szegedy, a three-layer convolutional neural network was designed, which includes the classic technologies of convolutional networks: dropout, ReLU, and filters of dimension 3 × 3. Figure 2 presents the network architecture, where MaxPooling is a subsampling layer with the max function and Dense is a fully connected layer.

Table 2. Experiment results of transfer learning on phase 2 (classification of the ROIs)

CNN                      Training time, min   Accuracy on the test data, %
VGGNet-16                93                   92.24
ResNet-50                153                  86.27
GoogLeNet-inception-v3   100                  90.91
DenseNet                 113                  95.47
InceptionResNet-v2       153                  85.44

Table 3. Architecture of the developed convolutional network (classification of the original input image)

Layer   Layer type    Input dimension   Output dimension   Filter dimension, amount of filters   Activation function
1       Convolution   (550,550)         –                  (3,3), 32                             relu
2       MaxPooling    (548,548)         –                  (2,2)                                 –
3       Convolution   (274,274)         –                  (3,3), 32                             relu
4       MaxPooling    (272,272)         –                  (2,2)                                 –
5       Convolution   (136,136)         –                  (3,3), 64                             relu
6       MaxPooling    (134,134)         –                  (2,2)                                 –
7       Flatten       (67,67)           –                  –                                     –
8       Dense         4489              128                –                                     relu
9       Dropout       –                 –                  –                                     –
10      Dense         –                 2                  –                                     sigmoid
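The spatial dimensions in Table 3 follow from simple arithmetic: each 3 × 3 valid convolution shrinks the side by 2 and each 2 × 2 pooling halves it. This can be checked with a short script:

```python
# Check the spatial sizes in Table 3: a 3x3 convolution without padding
# reduces the side by 2; a 2x2 max-pooling with stride 2 halves it.
def conv3x3(n):
    return n - 2

def pool2x2(n):
    return n // 2

n = 550
dims = [n]
for _ in range(3):            # three convolution + pooling stages
    n = conv3x3(n); dims.append(n)
    n = pool2x2(n); dims.append(n)

print(dims)   # [550, 548, 274, 272, 136, 134, 67]
print(n * n)  # 4489 inputs to the Dense(128) layer, as in the table
```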
In addition to designing the architecture and selecting the optimal hyperparameters, it is necessary to determine the basic technology of the neural network implementation. The researchers selected Google Colab to run a Jupyter Notebook on a GPU (NVidia Tesla K80) on a Linux server with 13 GB of video memory, using the Python 3.6 programming language, the TensorFlow framework, and the Keras library. Testing of our own CNN model on images of dimension 150 × 150 showed a result of 75% accuracy. With an increase in dimension to 550 × 550 pixels, the accuracy increased to 78.57%. Testing of our own CNN model on ROIs (regions of interest, the result of segmentation of microscopy images) of dimension 150 × 150 showed a result of 96.97% accuracy.
4 Conclusion

Goodfellow recommends choosing a basic model and a quality metric based on the characteristics of the studied problem. In the second step, Goodfellow recommends experimenting with a pipeline containing measuring tools for diagnosing the structural elements of the studied model. During the experiment, it is necessary to measure the parameters of the model and apply optimization strategies in turn: data augmentation, optimizing the structure of the model, changing the model architecture. It is also necessary to consider the cost of the optimization strategies used. For example, when working with medical applications, collecting new data may be more expensive than other optimization strategies, or not possible at all. In the first phase, the researchers worked with the original images. The task was to determine the applicability of convolutional neural networks without using segmentation operations. For this problem statement, the researchers applied data augmentation and optimization of the convolutional neural network architecture, and an accuracy of 81% on the test data was obtained. However, the required accuracy on the test data (95%) was not achieved. Therefore, the next strategy to be applied in the current experiment was the collection of additional data by using the segmentation operation, because obtaining a sufficient number of additional images is difficult. In the second phase, segmentation of the original images was applied, and the required accuracy on the test data (95%) was achieved. In the third phase, the model was simplified to three convolutional layers (Fig. 2).
Fig. 2. Architecture of the developed convolutional network (classification of the ROIs)
This model showed an accuracy of 75–78% on the original images and 96.97% on the segmented ROIs. The next stage of the experiment will be fine-tuning this model to increase the sensitivity of the network.
References

1. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Classification of Microscopy Image stained by Ziehl–Neelsen Method
2. Koch, M.L., Cote, R.A.: Comparison of fluorescence microscopy with Ziehl–Neelsen stain for demonstration of acid-fast bacilli in smear preparations and tissue sections. Am. Rev. Respir. Dis. 91, 283–284 (1965)
3. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115 (2017)
4. Rajpurkar, P., Irvin, J., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., Lungren, M.: CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. Preprint arXiv:1711.05225 (2017)
5. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.: ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: 30th IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, Hawaii, pp. 3462–3471 (2017)
6. Alom, Z., Taha, T., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, S., Van Essen, B., Awwal, A., Asari, V.: The history began from AlexNet: a comprehensive survey on deep learning approaches. Preprint arXiv:1803.01164 (2018)
7. Zhang, X., Zou, J., He, K., Sun, J.: Accelerating very deep convolutional networks for classification and detection. Preprint arXiv:1505.06798 (2015)
8. Zhang, A., Lipton, Z., Li, M., Smola, A.: Dive into Deep Learning. Preprint, d2l.ai (2020)
9. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. Preprint arXiv:1608.06993v3 (2016)
10. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of 31st AAAI Conference on Artificial Intelligence, pp. 4278–4284 (2017)
11. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of 28th IEEE Conference on Computer Vision and Pattern Recognition 2015, Boston, MA, pp. 1–9 (2015)
12. Narkevich, A.N., Shelomentseva, I.G., Vinogradov, K.A., Sysoev, S.A.: Comparison of segmentation methods for digital microscopic images of sputum stained by the Ziehl–Neelsen method. Eng. J. Don 4, 1–11 (2017)
13. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
14. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. Preprint arXiv:1512.00567 (2015)
Using a Sparse Neural Network to Predict Clicks Probabilities in Online Advertising

Yuriy S. Fedorenko

Bauman Moscow State Technical University, Baumanskaya 2-ya, 5, 105005 Moscow, Russia
[email protected]
Abstract. We consider the task of selecting personalized advertisements for Internet users in a targeted advertising system. In fact, this leads to a regression problem: for an arbitrary user U, it is necessary to predict the click probability on a set of banners B1…Bn in order to select the most suitable banners. The real values of the predicted probabilities are also important because they may be used in an auction between different advertising systems on many sites. Since the users' interests and the set of banners change frequently, it is necessary to train the model in online mode. In addition, large advertising systems have to deal with a large amount of data that needs to be processed in real time. This limits the complexity of the applicable models. Therefore, linear models, which are well suited for dynamic learning, remain popular for this task. However, data are rarely linearly separable, and therefore, when using such models, it is required to construct derivative features, for example, by hashing combinations of the original features. A serious drawback is that these combinations need to be selected manually. In this paper, it is proposed to use a neural network with a specialized architecture to avoid this problem. Special attention is paid to the analysis of the results on the test set, for which a specialized statistical testing technique is used. The results of testing showed that a neural network model with automatically constructed features performs on par with logistic regression with manually selected combinations for hashing.

Keywords: Online advertising · Clicks probability · Online learning · Feature hashing · Sparse neural network · Logistic regression · Statistical hypothesis testing
1 Introduction

The task of building targeted advertising systems is especially important due to the Internet penetration into all spheres of life, including business, shopping, entertainment, etc. The selection of personalized advertisements for Internet users is a machine learning task in which, based on historical data about shows and clicks of the user (and of users similar to him), it is necessary to select new advertising banners. The successful solution of this problem depends mainly on the features of the input data [1]. There are methods supporting automatic feature extraction, selection, and construction, but when applying these methods to online learning tasks, some difficulties arise. In particular, the learning of decision trees in an incremental way requires regular model rebuilding,

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 276–282, 2021. https://doi.org/10.1007/978-3-030-60577-3_33
that is not suitable for high-load systems with a large-scale input data stream. Deep neural networks have many parameters, approximate complex functions with a large number of local minima and, as a result, are poorly adapted for online learning. They are also computationally expensive, which makes their use difficult in Internet advertising, where it is necessary to process many incoming requests in a strictly limited time. As a result, in such systems, the use of linear models (for example, logistic regression) remains a popular approach because they are simple and well suited for online learning [2]. However, data are rarely linearly separable, and therefore, when using such models, it is required to construct complex features, for example, by hashing combinations of the original features [3]. In this case, the values of the hash function from the combinations of original features are used as binary features in the input vector of the model (Fig. 1).

Fig. 1. Hashing combinations of the original features
Such a method with the mapping to another feature space supports online learning and allows finding complex dividing surfaces in the original feature space. A serious drawback is that experts need to select these combinations manually. This paper considers the use of the specific neural network architecture proposed in [4] for solving this problem.
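The hashing scheme of Fig. 1 can be sketched as follows. Feature names, the hashed-space size D, and the helper function are illustrative assumptions; a production system would use a stable hash such as MurmurHash instead of Python's per-run salted hash():

```python
# Illustrative sketch of hashed feature combinations: a chosen cross of
# user/banner features is mapped to positions of a sparse binary vector.
D = 2 ** 20  # size of the hashed feature space (an assumption)

def hashed_indices(user_features, banner_features, combos):
    """Map chosen feature combinations to active positions of the input vector."""
    indices = set()
    for combo in combos:  # e.g. ("age", "banner_topic")
        values = tuple(
            user_features.get(f) or banner_features.get(f) for f in combo
        )
        # note: Python's hash() is salted per process; a real system
        # would use a deterministic hash for reproducible features
        indices.add(hash((combo, values)) % D)
    return indices

u = {"age": "25-34", "region": "moscow"}
b = {"banner_topic": "travel", "banner_format": "native"}
idx = hashed_indices(u, b, [("age", "banner_topic"), ("region", "banner_format")])
print(len(idx))  # two active positions in the sparse input vector
```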
2 Task Definition

The regression problem is considered: for an arbitrary user U, described by a feature vector (f_{u1}, f_{u2}, …, f_{un}), and a banner B_i with features (f_{bi1}, f_{bi2}, …, f_{bim}), it is necessary to predict the click probability pr(U, B_i). Here B_i ∈ B, where B is the set of banners that can be shown to user U (when selecting advertising banners, it is necessary to predict the click probability for all such banners in order to choose the best ones). Although N banners with the highest click probability are chosen, this is not a ranking task: the real values of the predicted probabilities are also important because on many sites they are used in an auction between different advertising systems.
3 The Process of Selection of Advertising Banners

In Fig. 2 the banner selection process is presented. The click probability prediction model is trained on logs with shows and clicks data. A small part of the requests does not pass through the model, which leads to a random selection of banners. Due to this, banners for which the model predicts a low click probability may appear in the training set, so the variety of the data is increased.
Fig. 2. The process of selection of advertising banners: with probability P_random, N random banners are selected for a user; otherwise the model predicts the click probabilities Pr(Um, B1) … Pr(Um, Bk) and the N banners with maximum probability are shown, while shows and clicks update the regression model
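The selection step of Fig. 2 can be sketched as follows (function names, banner ids, and probability values are hypothetical):

```python
# Sketch of the selection step: with a small probability p_random the system
# shows N random banners for exploration, otherwise the N banners with the
# highest predicted click probability.
import random

def select_banners(banners, predict, n, p_random=0.05, rng=random):
    """banners: list of banner ids; predict: banner id -> click probability."""
    if rng.random() < p_random:
        return rng.sample(banners, n)          # exploration branch
    return sorted(banners, key=predict, reverse=True)[:n]

banners = ["b1", "b2", "b3", "b4"]
probs = {"b1": 0.02, "b2": 0.07, "b3": 0.01, "b4": 0.05}
print(select_banners(banners, probs.get, 2, p_random=0.0))  # ['b2', 'b4']
```

The exploration branch is what lets low-scored banners occasionally enter the training logs, increasing data variety.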
N ti xi = −1 if (1) j Wij xj < ti ⎩ xi else. Here ti is the threshold level for the activation of the i-th neuron. The probabilistic Hopfield model differs from the deterministic one by the property that the transition is performed with the probability specified by the Fermi sigmoid function (1 + e−βxi ( j Wij xj −ti ) )−1 . P (x , x) = i
For the threshold level t_i = 0 this expression can be rewritten as

P(x', x) = e^{\frac{\beta}{2}\sum_{ij} W_{ij} x_i' x_j} \Big/ \sum_{x''} e^{\frac{\beta}{2}\sum_{ij} W_{ij} x_i'' x_j}. \quad (2)

The similarity of such an expression with the partition function of the Ising model was remarked in [1].
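A toy numerical illustration of the deterministic dynamics (1) with zero thresholds (size, weights, and seed are arbitrary assumptions): for a symmetric weight matrix, asynchronous updates never increase the quadratic energy, which anticipates the Lyapunov property discussed below.

```python
# For symmetric W with zero diagonal and t_i = 0, the asynchronous update (1)
# never increases E = -1/2 sum_ij W_ij x_i x_j.
import random

random.seed(0)
n = 8
W = [[0.0] * n for _ in range(n)]       # symmetric random weights, zero diagonal
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = random.uniform(-1, 1)

def energy(x):
    return -0.5 * sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

x = [random.choice([-1, 1]) for _ in range(n)]
E = energy(x)
for step in range(100):
    i = random.randrange(n)                    # asynchronous: one neuron at a time
    h = sum(W[i][j] * x[j] for j in range(n))  # local field
    if h > 0:
        x[i] = 1
    elif h < 0:
        x[i] = -1
    E_new = energy(x)
    assert E_new <= E + 1e-12                  # Lyapunov property
    E = E_new
print("final energy:", E)
```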
Hopfield Neural Network and Anisotropic Ising Model
We need to say some words about the learning stage in the Hopfield model. The Hebbian paradigm for the learning process of the Hopfield network on the set of m patterns \{\xi^1, \dots, \xi^m\}, each of which is a vector \xi^k = (\xi^k_1, \dots, \xi^k_n), is achieved instantaneously by fixing the weight matrix in the following way:

W_{ij} = \frac{1}{n} \sum_{k=1}^{m} \xi^k_i \xi^k_j.
The work of the network in the recall stage consists of iterative time dynamics defined by the transformation probability (2). The principal interplay of the Hopfield network and the Ising model is due to the energy functional. It turns out that for a symmetric weight matrix in the asynchronous regime the expression

E = -\frac{1}{2} \sum_{i,j} W_{ij} x_i x_j - \sum_i t_i x_i

plays the role of a Lyapunov function: under the time dynamics it either decreases or stays the same. This observation allows one to analyze the asymptotic behavior of the Hopfield model. The stable points of its time dynamics are hence the states of locally minimal energy for the Ising model on the same lattice. In this paper we explore another relationship between two such models: the Hopfield network on a two-dimensional lattice and the Ising model on the 3-dimensional lattice, making one of the spatial directions relevant to the time evolution.

1.2 Time Evolution and the Ising Model
Let us consider the Hopfield model on the triangular lattice (Fig. 1) colored by 3 colors in Z/3Z. The nontrivial weights are defined by the rule: the neuron with the color c is influenced only by the neurons with the colors (c − 1) mod 3. This network is not symmetric, despite the canonical definition. This lattice can be viewed as a projection of the cubic lattice to the plane i + j + k = 0. The vertices with different colors represent the planes i + j + k = c mod 3. The temporal behavior of this lattice is equivalent to 3 independent 3-dimensional cubic lattices. We then consider one of these 3 lattices. The conditional probability that the model passes through the states with free initialization data is:

P = \prod_{i+j+k=a}^{b} \left(1 + \exp\left(-x_{ijk}\left(w^1_{ijk} x_{i-1jk} + w^2_{ijk} x_{ij-1k} + w^3_{ijk} x_{ijk-1}\right)\right)\right)^{-1}
  = \prod_{i+j+k=a}^{b} \frac{\exp\left(x_{ijk}\left(w^1_{ijk} x_{i-1jk} + w^2_{ijk} x_{ij-1k} + w^3_{ijk} x_{ijk-1}\right)/2\right)}{2\cosh\left(x_{ijk}\left(w^1_{ijk} x_{i-1jk} + w^2_{ijk} x_{ij-1k} + w^3_{ijk} x_{ijk-1}\right)/2\right)}
D. V. Talalaev
Fig. 1. Triangular lattice
Let us define a matrix

A = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix},

which is a square root of 1: A^2 = 1. Let us consider a transformation f: w \to w', f(w^1, w^2, w^3) = (w^0, w^{12}, w^{13}, w^{23}), given by the system of equations

\begin{pmatrix} w^0 \\ w^{12} \\ w^{13} \\ w^{23} \end{pmatrix} = -A \begin{pmatrix} \log\cosh((w^1 + w^2 + w^3)/2) \\ \log\cosh((w^1 + w^2 - w^3)/2) \\ \log\cosh((w^1 - w^2 + w^3)/2) \\ \log\cosh((w^1 - w^2 - w^3)/2) \end{pmatrix}. \quad (3)
Lemma 1. Let f(w^1, w^2, w^3) = (w^0, w^{12}, w^{13}, w^{23}); then the following equation holds:

\left(\cosh\left((w^1 s_1 + w^2 s_2 + w^3 s_3)/2\right)\right)^{-1} = \exp\left((w^0 + w^{12} s_1 s_2 + w^{13} s_1 s_3 + w^{23} s_2 s_3)/2\right) \quad (4)

for all s_i = \pm 1.

Proof. Both sides of (4) are invariant with respect to the total change of signs s_i \to -s_i. Hence it is sufficient to prove this statement for the 4 combinations of spins with s_1 = 1. In this way we get a system of linear equations with the defining matrix A. The fact that it is a square root of unity gives the result.
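Lemma 1 can be checked numerically. The sketch below implements f as w' = −A · (log cosh …), the sign convention under which identity (4) is satisfied (a minus sign in the extracted transformation is easy to lose); sample weights are arbitrary assumptions.

```python
# Numerical check of Lemma 1: the map f sends edge weights (w1, w2, w3)
# to (w0, w12, w13, w23) so that identity (4) holds for all spins s_i = +/-1.
import itertools
import math

# The matrix A: half of a 4x4 Hadamard-type matrix, so that A @ A = I.
A = [[0.5, 0.5, 0.5, 0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def f(w1, w2, w3):
    lc = [math.log(math.cosh((w1 + s2 * w2 + s3 * w3) / 2))
          for s2, s3 in [(1, 1), (1, -1), (-1, 1), (-1, -1)]]
    return [-v for v in matvec(A, lc)]

w1, w2, w3 = 0.3, -0.7, 1.1              # arbitrary sample weights
w0, w12, w13, w23 = f(w1, w2, w3)
for s1, s2, s3 in itertools.product([1, -1], repeat=3):
    lhs = 1.0 / math.cosh((w1 * s1 + w2 * s2 + w3 * s3) / 2)
    rhs = math.exp((w0 + w12 * s1 * s2 + w13 * s1 * s3 + w23 * s2 * s3) / 2)
    assert abs(lhs - rhs) < 1e-12
print("identity (4) holds for all 8 spin configurations")
```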
Remark 1. The transformation f restricted to the last 3 variables, F: (w^1, w^2, w^3) \to (w^{12}, w^{13}, w^{23}), was known in the theory of the Ising model as a “star-triangle” transformation [11]. It appears to be a solution for the Zamolodchikov tetrahedron equation [12].

Theorem 1. The conditional probability coincides with the Ising-type partition function:

P = \prod_{i+j+k=a}^{b} \exp\left(x_{ijk}\left(w^1_{ijk} x_{i-1jk} + w^2_{ijk} x_{ij-1k} + w^3_{ijk} x_{ijk-1}\right)/2\right) \times
    \prod_{i+j+k=a}^{b} \exp\left(\left(w^{12}_{ijk} x_{i-1jk} x_{ij-1k} + w^{13}_{ijk} x_{i-1jk} x_{ijk-1} + w^{23}_{ijk} x_{ij-1k} x_{ijk-1}\right)/2\right),

where (w^{12}_{ijk}, w^{13}_{ijk}, w^{23}_{ijk}) = F(w^1_{ijk}, w^2_{ijk}, w^3_{ijk}).
Remark 2. This model can be interpreted as an Ising model on the regular cubic lattice with additional diagonal edges with weights defined by (w^{12}_{ijk}, w^{13}_{ijk}, w^{23}_{ijk}). We illustrate the weight distribution in Fig. 2.
Fig. 2. Cubic lattice
1.3 Conclusion
The principal aim of this work concerns the possibility of applying the Bethe ansatz method to the description of the critical behavior of the Hopfield neural network. This could be fruitful in such techniques as simulated annealing in neural networks [13]. In this note we generalized a relation between the Hopfield model and the Ising one to the case of completely anisotropic models. This case is interesting for us also because it allows one to apply the methods of cluster algebraic structures [14, 15] to neural network models.
Acknowledgments. The work was partially supported by the grant Leader (math) 20-7-1-21-1 of the foundation for the advancement of theoretical physics and mathematics “BASIS”, this work was carried out within the framework of a development programme for the Regional Scientific and Educational Mathematical Center of the Yaroslavl State University with financial support from the Ministry of Science and Higher Education of the Russian Federation (Agreement No. 075-02-2020-1514/1 additional to the agreement on provision of subsidies from the federal budget No. 075-022020-1514).
References

1. Little, W.A.: The existence of persistent states in the brain. Math. Biosci. 19, 101–120 (1974)
2. Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. PNAS 79, 2554–2558 (1982)
3. Haykin, S.: Neural Networks and Learning Machines, 3rd edn. (2009)
4. Recanatesi, S., Katkov, M., Romani, S., Tsodyks, M.: Neural network model of memory retrieval. Front. Comput. Neurosci. 9, 149 (2015)
5. Romani, S., Tsodyks, M.: Short-term plasticity based network model of place cells dynamics. Hippocampus 25, 94–105 (2015)
6. Onsager, L.: Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev. Ser. II 65(3–4), 117–149 (1944)
7. Amit, D.J., Gutfreund, H., Sompolinsky, H.: Statistical mechanics of neural networks near saturation. Ann. Phys. 173, 30–67 (1987)
8. Belavin, A.A., Kulakov, A.G., Usmanov, R.A.: Lectures on Theoretical Physics. MCCME (2001)
9. Talalaev, D.V.: Towards integrable structure in 3D Ising model. J. Geom. Phys. 148, 103545 (2020)
10. Talalaev, D.V.: Asymmetric Hopfield neural network and twisted tetrahedron equation. Preprint arXiv:1806.06680
11. Baxter, R.J.: Exactly Solved Models in Statistical Mechanics. Academic Press (1982)
12. Zamolodchikov, A.B.: Tetrahedra equations and integrable systems in three-dimensional space. Soviet Phys. JETP 52, 325–336 (1980)
13. Da, Y., Xiurun, G.: An improved PSO-based ANN with simulated annealing technique. Neurocomputing 63, 527–533 (2005)
14. Berenstein, A., Fomin, S., Zelevinsky, A.: Parametrizations of canonical bases and totally positive matrices. Adv. Math. 122, 49–149 (1996)
15. Gorbounov, V., Talalaev, D.: Electrical varieties as vertex integrable statistical models. Preprint arXiv:1905.03522
Development of the Learning Logic Gate for Optimization of Neural Networks at the Hardware Level

Taras Mikhailyuk and Sergey Zhernakov

Ufa State Aviation Technical University, Ufa, Russia
[email protected]
Abstract. The problem of the implementation of fully hardware neural networks based on programmable logic circuits is considered. A method for minimizing the hardware costs of artificial neural networks is proposed. The model of the learning logic gate network is given. For a field-programmable gate array, a model of a learning logic gate is offered. On the basis of the methods of the algebra of logic, a decomposition of a two-layer gate neural network with a trained hidden layer is proposed. Models of learning logic gates based on the conjunction and disjunction functions are developed. The intellectual properties of such models are shown. A method of mapping to the basis of learning logic gate networks is proposed. The results of discrete optimization of the network parameters are presented. It is shown that the genetic algorithm has the lowest training error. A conclusion is drawn on the applicability of the obtained models for constructing optimal combinational circuits for neural network processing.

Keywords: Boolean algebra · Learning logic gate · Logic gate network · Hardware neuron · Gate neural network · FPGA
1 Introduction

Currently, there is a problem of creating compact intelligent computers based on artificial neural networks in the fields of robotics, mobile technology and embedded electronics. Neural networks require significant hardware resources and power, which imposes restrictions on their use in many applications. As part of this study, a method for constructing neural networks using the properties of the hardware platform of FPGA chips is being developed. Unlike GPUs, they are more energy efficient, compact and can be an integral subsystem. An important property of these chips for neural network implementation is the ability to physically place the network on a crystal. Due to the increasing complexity of networks, this task becomes especially urgent. At the hardware level, the addition operation is organized in such a way that an increase in the number of terms is practically impossible, since this leads to serious costs for the adder. Therefore, the classical approach involves the use of an adder with feedbacks to simulate a neuron. This fact makes it impossible to create a fully digital hardware neural network. As one of the possible approaches to solving the problem under consideration, a transition to a learning logical basis is proposed. This allows one

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 387–393, 2021. https://doi.org/10.1007/978-3-030-60577-3_46
to completely abandon the usual “macro operations” of addition and multiplication and consider a neuron in the form of a network of learning logical gates. The network of learning logical gates is a representative of logical networks with the ability to specify the type of mapping of the vector of input signals to the output vector using a learning algorithm. The network is based on the synergistic effect obtained by combining neural network technology and the apparatus of the algebra of logic. This interaction of the two approaches leads to the emergence of a new class of models which allows one to automate the synthesis of the hardware components of a computing device.
2 Learning Logic Gate (LLG) Model

In the paper [1], a model of the learning logic gates network with a fixed hidden layer was proposed (1). This network can describe any function of the algebra of logic:

y_s = \bigvee_{n=1}^{N} \left( w_{sn} \wedge \bigwedge_{p=1}^{P} x_p^{M_p(n-1)} \right), \quad (1)

where M_p(a) = 0 if a \in [iT, (i + 0.5)T) for T = 2^p, i = 0, 1, 2, \dots, and M_p(a) = 1 otherwise.

The main disadvantage of the network is the exponential growth in the number of its elements. A similar approach is considered for diagnosing malfunctions of FPGA lookup tables [2]. For the practical application of the LLG network, it is more advisable to get rid of excess logic units in the learning process. For this purpose, it is necessary to determine the elementary units that make up the network. In [3, 4] it was shown that productivity growth is inversely related to the number of hardware resources, and this dependence is hyperbolic in nature. Since an LLG network can be considered as a neural network computing device in a logical basis, it is easy to notice the rapid growth in the number of logic gates for a two-layer network topology with an increase in the dimension of the input vector. Thus, increasing the depth of the network can reduce the number of logic gates in each hidden layer. Use of this approach in deep learning can reduce the rapid growth of network complexity by extracting more abstract features in each layer [5]. For the general form of a network, it is necessary to replace M_p(a) with an arbitrary bit variable s, removing the functional dependence. The resulting model (2) is distinguished by the presence of a trained hidden layer:
y = \bigvee_{n=1}^{2^{P-1}} \left( w_n \wedge \bigwedge_{p=1}^{P} \left( x_p^{s_{pn}} \vee b_{pn} \right) \right) \quad (2)
This approach allows you to train the first layer and minimize the network during the learning process [6]. Moreover, the maximum number of logic gates is halved due to the symmetry of the p-hypercube [7, 8].
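As an illustration, model (1) with M_p(a) taken as the p-th bit of a (which matches the interval definition for T = 2^p) can realize any truth table by selecting minterms with the weights. A minimal sketch, assuming the common convention x^1 = x and x^0 = NOT x; here the weights encode 2-input XOR:

```python
# Sketch of model (1): a DNF over all minterms, where M_p(n-1) is bit p of
# n-1 and the weights w select which minterms enter the disjunction.
def M(p, a):
    return (a >> (p - 1)) & 1          # p-th bit of a, p = 1..P

def lit(x, s):
    return x if s == 1 else 1 - x      # x^s: x for s=1, NOT x for s=0

def llg_network(x, w):                 # model (1) for a single output
    P, N = len(x), len(w)              # N = 2**P minterms
    return int(any(w[n] and all(lit(x[p], M(p + 1, n)) for p in range(P))
                   for n in range(N)))

w_xor = [0, 1, 1, 0]                   # minterms 01 and 10 selected
for x1 in (0, 1):
    for x2 in (0, 1):
        assert llg_network((x1, x2), w_xor) == (x1 ^ x2)
print("XOR realized by model (1)")
```

The exponential cost is visible directly: the weight vector has 2^P entries, one per minterm.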
Model (2) has a fixed two-layer topology. To define a multilayer network, we obtain a model of a learning logic gate. We write Eq. (2) in the form of a system:

u_n = \bigwedge_{p=1}^{P} \left( x_p^{s_{pn}} \vee b_{pn} \right), \qquad y = \bigvee_{n=1}^{2^{P-1}} \left( w_n \wedge u_n \right). \quad (3)
After transforming the first equation in (3) according to De Morgan's law and changing the variables, we get:

u_n = \bigwedge_{p=1}^{P} \left( x_p^{s_{pn}} \vee b_{pn} \right) = \overline{\bigvee_{p=1}^{P} \left( \overline{b_{pn}} \wedge \overline{x_p^{s_{pn}}} \right)} \;\mapsto\; u_n = \left( \bigvee_{p=1}^{P} \left( w_{pn} \wedge x_p^{g_{pn}} \right) \right)^0
Next, we replace the zero degree of the last expression with the free parameter s and obtain the system:

u_n = \left( \bigvee_{p=1}^{P} \left( w_{pn} \wedge x_p^{g_{pn}} \right) \right)^{s_n}, \qquad y = \bigvee_{n=1}^{2^{P-1}} \left( w_n \wedge u_n \right).
This transformation is valid because for s_n = 1 the system degenerates into a disjunction:

y = \bigvee_{n=1}^{N} \left( w_n \wedge \left( \bigvee_{p=1}^{P} \left( w_{pn} \wedge x_p^{s_{pn}} \right) \right)^1 \right) = \bigvee_{n=1}^{N} \bigvee_{p=1}^{P} \left( \left( w_n \wedge w_{pn} \right) \wedge x_p^{s_{pn}} \right) \;\mapsto\; y = \bigvee_{p=1}^{P} \left( w_p \wedge x_p^{s_p} \right)
After substituting the first expression into the second, we write:

y = \bigvee_{n=1}^{2^{P-1}} \left( w_n \wedge \left( \bigvee_{p=1}^{P} \left( w_{pn} \wedge x_p^{g_{pn}} \right) \right)^{s_n} \right)

This model is equivalent to (2) in terms of the set of implemented functions; however, it can be seen that both layers of the learning logic gate network consist of the same computing units. Now we can write the model of the learning logic gate (4).
y = \bigvee_{p=1}^{P} \left( w_p \wedge x_p^{s_p} \right) \quad (4)
After transforming (4) according to De Morgan's law, we obtain a model of the learning logic gate based on the conjunction:

y = \bigwedge_{p=1}^{P} \left( x_p^{s_p} \vee b_p \right) \quad (5)
In contrast to expression (1), the resulting models are of practical interest for constructing general combinational circuits using discrete optimization methods. On their basis, deep logical networks can be built, minimizing the exponential increase in the number of elements. It is known that any logical function can be represented as a neural network. The basic functions AND, OR and NOT can each be replaced by a single neuron. Such an approach allows one to translate a logical function into a neural network basis. Currently, this mapping has been studied sufficiently and does not have wide application. First of all, this is due to the fact that a neural network requires much more resources, while a logical function is defined initially. Of practical interest is the problem of transition from a neural network basis to a basis of the algebra of logic. In this case, the effect of changing the basis is the optimal use of digital integrated circuits at the hardware level. At the same time, it becomes possible to synthesize digital circuits using optimization algorithms; in this instance, digital circuits acquire a network architecture. The proposed approach allows us to make this transition, thereby realizing the possibility of a fully hardware implementation of a neural network.
3 Generalized Ability of the Learning Logic Gates

A generalization of a neural network is understood to mean its ability to produce the correct result on a dataset that was not presented to it during the training process but which belongs to the set being processed. For the ability to generalize, the neural network must be trained on a sufficiently large amount of input data [6]. At the same time, the capacity of each neuron to generalize derives from its ability to determine the boundary between sets and not between their individual representatives. A logical extension of such inference is the conclusion about the presence of generalization in the obtained models of LLGs. The binary feature vector forms a finite set of possible values. In the geometric sense, the feature vector defines an n-hypercube with some distribution of classes 1 and 0. Due to this, the number of possible orientations of the separating hyperplane can be specified by a finite number [9]. In addition, the hyperplane itself can be replaced by a discrete set of points belonging to it. Then, in the general case, the n-hyperplane can be replaced by some set of logic gates. Figure 1 shows the case of separation of a 3-hypercube by hyperplanes for the cases “at least one” and
Development of the Learning Logic Gate for Optimization
391
“exactly all”. These options correspond to neurons trained to perform the 3-AND and 3-OR functions.
Fig. 1. Separation of a 3-hypercube into classes using AND and OR elements. The hyperplane is shown by a dashed line. Minterms belonging to class 1 are marked with bold dots.
The figure shows that the functions differ only in the threshold number of variables that must enter class 1; the spatial orientations of the 3-hyperplane are equivalent. It is noteworthy that a superposition of such functions implements the required hyperplane. Using the models (4) and (5), it can be shown that the parameter s makes rotation of the hyperplane possible. The vector of sign variables (x_1^{s_1}, x_2^{s_2}, ..., x_n^{s_n}) defines a point in the n-dimensional Boolean space, and the separation of classes 1 and 0 is considered with respect to this point. The disjunction and conjunction functions define low and high activation thresholds, respectively. During LLG training, the hyperplane rotates due to the vector of weights and signs. At the same time, the LLG does not require viewing all combinations of labeled data; therefore, for part of the input data, the output values will be determined by the orientation of the hyperplane. By connecting LLGs into a network, an arbitrary dividing surface can be obtained. The linearly separating hyperplane is the most valuable in this research since it allows the synthesis of neurons. Undoubtedly, the number of hyperplane orientations is not enough to model a classic neuron, but an LLG network is capable of this. Thus, the LLG is a primitive intelligent device capable of learning and generalization.
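A hypothetical sketch of such a gate may clarify the role of the sign variables: with x^1 = x and x^0 = NOT x, a conjunction or disjunction over sign-adjusted literals separates the hypercube with respect to the point defined by the sign vector. The exact models (4) and (5) from the paper are not reproduced here; this only illustrates the mechanism described in the text.

```python
# Illustrative learning-logic-gate (LLG) primitive: an AND or OR over
# sign-adjusted literals x_i^{s_i}. The sign vector s plays the role of
# the hyperplane-rotation parameter discussed in the text.

def literal(x, s):
    """x^s: the literal itself if s == 1, its negation if s == 0."""
    return x if s == 1 else 1 - x

def llg(x, s, mode="and"):
    lits = [literal(xi, si) for xi, si in zip(x, s)]
    if mode == "and":            # high activation threshold: all literals
        return int(all(lits))
    return int(any(lits))        # "or": low threshold, at least one literal

# Example: with s = (1, 0, 1) the "and" gate computes x1 AND (NOT x2) AND x3,
# i.e. it separates a single vertex of the 3-hypercube from the rest.
```

Training would search over the sign vector (and, in the paper's full models, the weights) to orient the separating surface.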
4 Experiment Results

To simulate a threshold element on a network of learning logic gates, a network configuration method based on a trained model of the McCulloch-Pitts neuron is proposed. Figure 2 shows a structural diagram of the simulation of a neuron. The learning process is divided into two stages. At the first stage, a threshold neuron is
392
T. Mikhailyuk and S. Zhernakov
trained on the given samples. The training set is defined by labeled binary vectors. The Hebb rule is used as the learning algorithm.

Fig. 2. The model of simulating the McCulloch-Pitts neuron (in the neuron learning cycle, labeled binary feature vectors train the McCulloch-Pitts neuron by the Hebb rule; in the neuron simulation cycle, an optimization algorithm fits the LLG network to the predicted value)
At the second stage, the training of the LLG network takes place according to the neuron simulation circuit, with the expected value set by the output of the neuron. Such an experiment allows us to evaluate the accuracy of mapping the neuron model onto the network of logic gates. As learning algorithms, several combinatorial optimization algorithms were tested: the branch and bound method, simulated annealing, and the genetic algorithm (Table 1). The best results in the accuracy of copying the properties of a neuron belong to the genetic algorithm with tournament selection and homogeneous mutation. The proposed approach to learning can reduce the effect of overfitting for the LLG network, since the aim of the training is a complete copy of the neuron. The experimental results show the possibility of building high-speed neural networks in the basis of LLGs with a sufficiently small learning error.

Table 1. Results of simulating a threshold neuron.

Learning algorithm     Number of iterations   Learning error   Average training time, h
Simulated annealing    1600                   0.14             0.5
Branch and bound       141044                 0.095            1
Genetic algorithm      10820                  0.053            1.5
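The first stage (training a threshold neuron with the Hebb rule on labeled binary vectors) can be sketched as follows; the bipolar encoding of inputs and labels is an illustrative assumption, since the paper does not specify the exact coding:

```python
# Stage 1 of the simulation: train a McCulloch-Pitts neuron with the
# Hebb rule on labeled binary vectors. Bipolar (+1/-1) coding is an
# illustrative choice, not specified in the paper.

def hebb_train(samples):
    """samples: list of (binary feature vector, label in {0, 1})."""
    n = len(samples[0][0])
    w = [0] * n
    for x, label in samples:
        xb = [2 * xi - 1 for xi in x]       # map {0, 1} -> {-1, +1}
        t = 2 * label - 1
        for i in range(n):
            w[i] += xb[i] * t               # Hebbian weight update
    return w

def neuron_out(w, x):
    """Threshold-neuron output on a binary input vector."""
    xb = [2 * xi - 1 for xi in x]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) >= 0 else 0

# Stage 2 would then fit an LLG network to reproduce neuron_out on the
# inputs, using e.g. a genetic algorithm as the optimization procedure.
```

The trained neuron's outputs then serve as the expected values for fitting the LLG network in the second stage.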
Experiments were conducted on a laptop with an Intel Core i5-3230M processor running at 2.6 GHz and 8 GB of RAM, running Manjaro Linux 20.0.2.
5 Conclusion

A model of an LLG network with the ability to learn and generalize is proposed. This approach can be applied to various problems; it is expected to simplify intellectual processing in a multidimensional Boolean space and to increase the
productivity of the learning process. Various investigations in the field of the synthesis of optimal digital circuits demonstrate its importance for the construction of optimal digital neural networks. The developed mathematical apparatus for networks of learning logic gates is universal and can be used for the synthesis of optimal high-speed logic circuits. Using this approach, we can eliminate the need to specify a fixed number of logic gates in the hidden layer (1). Based on the use of LLGs, it becomes possible to create neural structures of a higher level of organization, such as, for example, a linear threshold element. The use of a logic network as a basis of intelligent data processing will contribute to a more rational (optimal) use of FPGA chip resources. At the current stage of the development of the LLG network, one of its key problems is the exponential increase in the number of elements of the hidden layer [10]. One of the "classical" methods that mitigates this problem is adjusting the number of elements in such a network [11]. At the same time, the experimental results indicate that adjusting the depth of the learning logic gate network model can solve the problem of a rapid increase in the number of elementary units.
References
1. Mikhailyuk, T., Zhernakov, S.: Implementation of a gate neural network based on combinatorial logic elements. In: Kryzhanovsky, B., Dunin-Barkowski, W. (eds.) Neuroinformatics 2017, Advances in Neural Computation, Machine Learning, and Cognitive Research, pp. 23–32. Springer, Cham (2018)
2. Gorodilov, A.Y.: Metody i algoritmy diagnostiki i rekonfiguratsii logiki vysokonadezhnykh PLIS (Methods and algorithms for diagnosing and reconfiguring highly reliable FPGA logic). Kand. diss., Perm, PSU (2016). 260 p.
3. Galushkin, A.I.: Neuromathematics. IPRJR, Moscow (2002)
4. Mikhailyuk, T.E., Zhernakov, S.V.: On the question of hardware implementation of a module of streaming encryption for a comprehensive information security system. Vestnik UGATU 19(4), 138–148 (2015)
5. Poggio, T., Mhaskar, H., Rosasco, L., et al.: Why and when can deep – but not shallow – networks avoid the curse of dimensionality: a review. Int. J. Autom. Comput. 14, 503–519 (2017)
6. Osovsky, S.: Neural Networks for Information Processing. Finance and Statistics, Moscow (2004)
7. Yablonsky, S.V.: Introduction to Discrete Mathematics, 2nd edn. Nauka, Moscow (1986)
8. Kudryavtsev, V.B., Gasanov, E.E., Podkolzin, A.S.: Introduction to the Theory of Intelligent Systems. MAKS Press, Moscow (2006)
9. Anthony, M.: Discrete Mathematics of Neural Networks: Selected Topics. SIAM, London (2001)
10. Nigmatullin, R.G.: The Complexity of Boolean Functions. Nauka, Moscow (1991)
11. Mikhailyuk, T.E., Zhernakov, S.V.: On an approach to the selection of the optimal FPGA architecture in neural network logical basis. Informacionnye tehnologii 23(3), 233–240 (2017)
Approximating Conductance-Based Synapses by Current-Based Synapses

Mikhail Kiselev(✉), Alexey Ivanov, and Daniil Ivanov

Chuvash State University, Cheboksary, Russia
[email protected]
Abstract. The conductance-based synaptic model is more biologically plausible than the current-based synapse but is much harder to implement on neuromorphic hardware. This paper tries to answer the question: does approximation of the realistic conductance-based model by the simple current-based synapse lead to significant changes of spiking network behavior and degradation of its characteristics? The results obtained from theoretical analysis and computational experiments support the thesis that the considered simplification of the synapse model can be evaluated as acceptable.

Keywords: Spiking neuron · Current-based synapse · Conductance-based synapse · Neuromorphic processor · Leaky integrate-and-fire neuron with adaptive threshold · Liquid state machine
1 Introduction

The last decade is marked by significant progress in neuromorphic processor technology. Specialized VLSI chips capable of simulating in real time neural networks consisting of a hundred thousand neurons and using operation principles close to those of neuronal ensembles in the brain became a reality. The two most prominent chips of this kind are TrueNorth [1] (designed by IBM) and Loihi [2] (by Intel). There exist a number of similar projects (for example, the project Altay [3] developed in Russia). These chips have an important common feature – they are all based on a non-von Neumann architecture and are designed to simulate the most physiologically plausible class of neural networks – spiking neural networks (SNN). SNNs are ensembles of simple asynchronously functioning neurons, which communicate by sending short pulses of constant amplitude (spikes). The design of these devices is a very difficult technological problem. Its success depends crucially on the proper choice of the tradeoff between the simplicity of the processing units emulating neurons (which determines the size of the network that can fit on the chip) and their flexibility, the set of functional features they possess. The most successful SNN known is the human brain. Correspondingly, one of the important guidelines taken into account in the process of neurochip design is neurophysiological plausibility

The present work is a part of the research project ArNI conducted in Chuvash State University as a member of the Intel Neuromorphic Research Community (INRC).

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 394–402, 2021. https://doi.org/10.1007/978-3-030-60577-3_47
Approximating Conductance-Based Synapses
395
– it is highly desirable that the hardware model of neurons fixed in neuromorphic processors closely resemble biological neurons. In order to set this tradeoff correctly, it is necessary to understand which properties of living neurons are essential and which can be discarded without significant harm to SNN performance. The present study is devoted to one particular question related to this general problem, namely to the choice of the model of synapses. The majority of synaptic connections in the brain are chemical synapses. When an action potential in the presynaptic neuron arrives at such a synapse, it causes the release of a certain amount of neurotransmitter into the synaptic cleft. Specific receptors in the postsynaptic neuron membrane react to the appearance of the neurotransmitter and change the conductance of ion gates or the activity of certain ion pumps. When a neuron does not receive input spikes for a long time, its membrane potential u becomes equal to its equilibrium value u_REST. If an excitatory synapse receives a spike, the effect of the spike is depolarizing, moving the membrane potential u closer to its threshold value u_THR (the neuron fires when u reaches this value). An inhibitory spike acts in the opposite direction, hyperpolarizing the neuron. Since presynaptic spikes change the membrane ionic conductance, this neuron model is called conductance-based [4]. In this model, the value of the potential change is determined by two factors: the strength of the synaptic connection w (also called its weight) and the current value of the potential u, or more precisely, the difference between u and the ultimate potential value û toward which the incoming spike tries to move u (also called the reversal potential). For depolarizing excitatory spikes, the reversal potential equals û⁺ (it is positive); for hyperpolarizing inhibitory spikes, it has the negative value û⁻. Thus, the effect of incoming spikes in the conductance-based model can be described by the formula

$$u \leftarrow \begin{cases} u + w\,(\hat{u}^{+} - u), & \text{for an excitatory synapse,} \\ u - w\,(u - \hat{u}^{-}), & \text{for an inhibitory synapse.} \end{cases} \qquad (1)$$
Let us note that for living neurons u is always negative and lies in the range [û⁻, u_THR]. The model described is quite simple. However, in many cases an even simpler model, called the current-based model, is used. In this model, each incoming spike injects a certain charge into the neuron, so that the potential change does not depend on its current value. The only requirement is that u should not drop below û⁻:

$$u \leftarrow \begin{cases} u + v, & \text{for an excitatory synapse,} \\ \max(u - v,\ \hat{u}^{-}), & \text{for an inhibitory synapse.} \end{cases} \qquad (2)$$
Here we denote the synaptic weight by v because it is measured in volts, while w in (1) is dimensionless. The difference between these two models seems minor, but it is crucial from the viewpoint of digital neurochips. Indeed, the processing of input spikes is the most frequent operation in the life of real neurons and their hardware imitations. The efficiency of its implementation determines to a great extent the performance of the whole chip. The conductance-based model involves multiplication, while the current-based model uses only addition and min/max operations. Multiplication takes several
396
M. Kiselev et al.
times longer and requires much more sophisticated digital circuits than addition and min/max. Thus, it is no wonder that all the above-mentioned modern digital neurochips support only the current-based synaptic model (in contrast to partially analog neurochips like BrainScaleS [5], for which the conductance-based model is more natural). This fact raises the question: what can we lose when we approximate the more realistic conductance-based model by the simpler current-based model? Is the difference between them really significant from a practical point of view? During the last two decades, this question was the subject of several studies. We mention below three of them, the most recent and closest to the topic of our article. In the paper [6], SNNs consisting of conductance-based or current-based neurons, with all other characteristics similar, were compared. The authors did not find significant differences in the first-order spectral properties of the network activity; however, the discovered distinctions in more detailed properties, such as cross-neuron correlations, were characterized by them as substantial. The research described in [7] was devoted to the simulation of epileptic stroke as a transition of neuronal ensembles, considered as dynamical systems, from a state with normal dynamics to an abnormal state interpreted as epilepsy. It was shown that the conditions for this transition depend significantly on the choice of the current-based or conductance-based synaptic model. In the paper [8] it was discussed how to build an SNN with conductance-based synapses equivalent (in terms of general network activity statistics) to a given SNN with current-based synapses, and what the degree of this equivalence is. The authors of [8] discovered that transition from the conductance-based synaptic model to the current-based one does not lead to significant changes for feed-forward SNNs, while the dynamics of recurrent SNNs differ noticeably.
Our study has two major distinctions from previous studies:
• In the previous research works, the simplest LIF (leaky integrate-and-fire) neuron model was considered. We will study a more realistic and functionally rich model, implemented on modern neuroprocessors (e.g. Loihi [2]), namely, the leaky integrate-and-fire neuron with adaptive threshold (LIFAT) [9]. As we will see, this is an important difference.
• We will evaluate the differences between current-based and conductance-based synapses in terms of the performance demonstrated by an SNN in a practical task, namely classification, and, besides, using a classification technique for which the functional diversity of SNN ensembles is crucial – the so-called liquid state machine (LSM) [10].
2 Theoretical Analysis

Let us try to analyze the difference between current-based and conductance-based synapses from the theoretical point of view. Consider a single synapse of a single neuron. What will be the difference between the values of u after a presynaptic spike in these two cases if the values before the spike were the same? The question makes sense if the synapses have comparable weights. It is natural to consider two synapses
as comparable if the effect of spikes coming to them when u = u_REST is equal. In the case of the excitatory synapse, this requirement together with (1) and (2) yields

$$v = w\,(\hat{u}^{+} - u_{REST}). \qquad (3)$$
Then, let us find the maximum difference Δ between the changes of postsynaptic membrane potential resulting from a presynaptic spike coming to current-based and conductance-based synapses with the weights bound by (3). We are interested in the pure effect of a synapse on the membrane potential – without its reset after firing. Obviously, this difference is greater for stronger synapses (in the limits where the neuron does not fire). On the other hand, the strength of excitatory synapses is limited – usually, several presynaptic spikes are necessary to force a neuron at the resting state to fire. Therefore, it is reasonable to stipulate that v < (u_THR − u_REST)/3. It can easily be shown that for physiological values of the constants entering (1) and (2) (û⁻ = −83 mV, u_REST = −65 mV, u_THR = −50 mV, û⁺ = +67 mV [4]), Δ is maximal when u = û⁻, and in this case

$$\Delta = \frac{u_{THR} - u_{REST}}{3}\left(\frac{\hat{u}^{+} - \hat{u}^{-}}{\hat{u}^{+} - u_{REST}} - 1\right) \approx 0.68\ \text{mV}. \qquad (4)$$
Now, let us consider inhibitory synapses. In this case, there is the natural limitation w < 1 and, therefore, v < u_REST − û⁻. Then the maximum Δ, reached when u is close to u_THR, is

$$\Delta = u_{THR} - u_{REST} \approx 15\ \text{mV}. \qquad (5)$$
The main result of this admittedly rough evaluation nevertheless seems intuitively plausible – the choice of the current-based or conductance-based model has a much stronger impact for inhibitory synapses than for excitatory synapses. Moreover, such a small absolute value of the maximum theoretical difference between these models for excitatory synapses leads us to doubt that it may be significant. For this reason, we concentrate on inhibitory synapses in this study and, this time, we will try to find experimental evidence of significant distinctions between the synaptic models explored. Namely, we selected two different classification problems and compared the classification accuracy demonstrated by networks with current-based and conductance-based synapses.
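The bounds (4) and (5) can be checked numerically by scanning u over its physiological range with the update rules (1) and (2) applied to synapses of comparable weight. The following is a verification sketch, not code from the original experiments:

```python
# Numerically check the maximum CuB-vs-CoB discrepancy for synapses of
# comparable weight, using the physiological constants quoted in the
# text (all values in mV).
u_inh, u_rest, u_thr, u_exc = -83.0, -65.0, -50.0, 67.0  # û⁻, u_REST, u_THR, û⁺

# Excitatory case: comparability (3) gives v = w * (û⁺ - u_REST); take
# the strongest allowed synapse, v = (u_THR - u_REST) / 3.
v = (u_thr - u_rest) / 3
w = v / (u_exc - u_rest)
# Scan u from û⁻ to u_THR in 0.01-mV steps (3301 points): the CoB change
# is w * (û⁺ - u), the CuB change is the constant v.
delta_exc = max(abs(w * (u_exc - u) - v)
                for u in (u_inh + 0.01 * k for k in range(3301)))
print(round(delta_exc, 2))   # ≈ 0.68, maximal at u = û⁻, as in (4)

# Inhibitory case: w < 1 and v = w * (u_REST - û⁻); the discrepancy
# w * (u - u_REST) approaches u_THR - u_REST as u -> u_THR and w -> 1.
delta_inh = u_thr - u_rest
print(delta_inh)             # 15.0, as in (5)
```

The scan confirms that the excitatory discrepancy stays below one millivolt, while the inhibitory bound is more than twenty times larger.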
3 Search for Experimental Evidence of the Distinctions Between Current-Based and Conductance-Based Synapses

3.1 Sequential MNIST
MNIST [11] is one of the most popular public-domain benchmarks for pattern recognition algorithms. It contains 60,000 28×28-pixel gray-level images of handwritten digits. In our experiments, the network has 28 input nodes, one per every
vertical column of pixels. Thus, every image is represented as 28 time series of pixel intensity, 28 values per series – from top to bottom. Every time series x(t) is converted to the spike form by the following algorithm. An input node has an internal variable, y, whose value is incremented by x(t) every time step t. When y exceeds a certain threshold, H, the input node emits a spike and y is decremented by the value of H. As is often assumed in SNN studies, we take the time step in these series equal to 1 ms. Presentations of digits are separated by 32 ms intervals without input spikes. The task is to learn to recognize the 10 different digits using this spike representation.

3.2 Moving Light Spot
In this task, input spike signals are generated by an imaginary camera with a light spot moving in its view field. The light spot has an isotropic 2D Gaussian distribution of brightness. The view field of the camera is divided into 10×10 squares. The following 11 variables are calculated for each square and each time step:
• mean brightness b_i – 100 values;
• mean brightness dynamics: d_i^+ = max(b_i − b_{i−1}, 0) and d_i^− = max(b_{i−1} − b_i, 0) – 200 values;
• brightness gradients: every square is bisected by a vertical, horizontal or diagonal line, and the difference in the mean intensity between the halves is calculated (if negative, it is set equal to 0), giving 8 gradient values per square – 800 values.
In this task, 1100 numeric time series are thus generated. These time series are converted into the spike form by the same algorithm as described above. Target classes are composed of the discretized coordinates and movement directions of the light spot. Namely, the view field is broken into 3×3 square zones, and the direction of spot movement is discretized into 8 possible direction ranges, 45° each. This gives 9 × 8 = 72 possible values. It is easy to obtain the spatial position of the spot from the input signals, while determining the spot movement direction is a much more nontrivial task.
For our experiments, we chose one of the most general classification methods, the so-called liquid state machine (LSM) [10]. More precisely, we use a modification of the classic LSM described in [12]: in our case, the excitatory synapses of neurons in the SNN entering the LSM are plastic – their weights change in accordance with a variant of the spike-timing-dependent plasticity (STDP) model [4], described in [13]. We call this modification the self-organizing liquid state machine (SOLSM). The decision forest algorithm is used as the classifier in the read-out mechanism of the LSM.
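The accumulator-based conversion of a numeric time series into spikes, described in Sect. 3.1 and reused here, can be sketched as follows (an illustrative sketch; the threshold value and the series are arbitrary):

```python
# Convert a numeric time series x(t) into spike times: an accumulator y
# is incremented by x(t) at each 1-ms step and emits a spike (y being
# decremented by the threshold H) whenever it reaches H.

def to_spikes(series, H):
    """Return the list of time steps (ms) at which the input node spikes."""
    y = 0.0
    spikes = []
    for t, x in enumerate(series):
        y += x
        if y >= H:           # threshold reached: emit one spike
            y -= H
            spikes.append(t)
    return spikes

# Example: a constant intensity of 0.5 with H = 1 produces a spike
# every second time step.
```

This encoding makes the instantaneous spike rate proportional to the input intensity.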
In this study, the first 1000 s of the input signal were used to train the LSM; the remaining part of the signal was used to test it (to determine its accuracy). Another important distinctive feature of the present research is the neuron model selected. All the previous works considered in the Introduction tested the two models of synapses on simple LIF neurons. However, many recent research projects in SNN use a more advanced neuron model – the leaky integrate-and-fire neuron with adaptive threshold (LIFAT) [14]. In this model, the threshold membrane potential is not constant but increases for overly active neurons, preventing further growth of their firing frequency.
The LIFAT model is utilized in actively studied SNN architectures (for example, LSNN [15]). It is considered so important that it was implemented at the hardware level in Intel's Loihi neuroprocessor. We considered two of its variants – with inhibitory synapses described by the conductance-based (henceforth – CoB) or the current-based (CuB) model. Let us describe this model formally. An excitatory synapse receiving a spike instantly increases the membrane potential by the value of its weight. When an inhibitory synapse receives a spike, it decreases the membrane potential by its weight (CuB) or increases the membrane conductance by its weight (CoB). Thus, the state of a neuron at the moment t is described by its membrane potential u(t), its threshold potential u_THR(t) (now it is not constant), and its membrane conductance c(t) (only for CoB). The dynamics of these values are described by the equations

$$\begin{cases} \dfrac{du}{dt} = -\dfrac{u}{\tau_v} - c\,(u - \hat{u}^{-}) + \sum_{i,j} w_i^{+}\,\delta(t - t_{ij}^{+}), \\ \dfrac{dc}{dt} = -\dfrac{c}{\tau_c} + \sum_{i,j} w_i^{-}\,\delta(t - t_{ij}^{-}), \\ \dfrac{du_{THR}}{dt} = -\dfrac{u_{THR} - 1}{\tau_T} + \hat{T}\sum_{k} \delta(t - \hat{t}_k) \end{cases} \quad \text{(CoB)} \qquad (6)$$

or

$$\begin{cases} \dfrac{du}{dt} = -\dfrac{u}{\tau_v} + \sum_{i,j} w_i^{+}\,\delta(t - t_{ij}^{+}) - \sum_{i,j} w_i^{-}\,\delta(t - t_{ij}^{-}), \\ \dfrac{du_{THR}}{dt} = -\dfrac{u_{THR} - 1}{\tau_T} + \hat{T}\sum_{k} \delta(t - \hat{t}_k) \end{cases} \quad \text{(CuB)} \qquad (7)$$

and the condition that if u exceeds u_THR, the neuron fires and the values of u and c (for CoB) are reset to 0. For the sake of simplicity, all potentials are rescaled so that after a long absence of presynaptic spikes u → 0 and u_THR → 1. The meaning of the other constants in (6) and (7) is the following: τ_v – the membrane passive leakage time constant; τ_c – the decay time of the inhibitory membrane conductance; τ_T – the time constant of decreasing u_THR to its base value; w_i^+ – the weight of the i-th excitatory synapse; w_i^− – the weight of the i-th inhibitory synapse; t_ij^+ – the time moment when the i-th excitatory synapse received its j-th spike; t_ij^− – the same for the i-th inhibitory synapse; T̂ – the value by which u_THR is incremented when the neuron fires at the moment t̂_k. For the purpose of optimizing these constants, we used a genetic algorithm. In the next section, we discuss the results obtained by this optimization procedure for the two synaptic models and the classification problems described above.
4 Experimental Results

We used the following protocol for the experiments with the two classification tasks described above.
1. The genetic algorithm was used to find the optimum parameter combination for the conductance-based model. Genetic optimization stopped when 2 consecutive generations demonstrated no progress in accuracy or when all chromosomes were very similar. For the best network configuration, the mean LSM accuracy was measured over 10 networks with varying detailed connectivity.
2. Since we studied inhibitory synapses, we made sure that inhibitory neurons play a significant role. The inhibitory neurons were removed from the best network, and the accuracy of the LSM with this network was measured. A significant decrease in accuracy was treated as an indication of the importance of inhibitory neurons.
3. If the presence of inhibitory neurons was significant, step 1 was repeated for the current-based model. Its results were compared against the results for the conductance-based model.

4.1 MNIST
Optimization of the conductance-based model took 9 generations – after that, all chromosomes became too similar. The best network reached an accuracy of 92.3 ± 0.4%. It was found that inhibitory neurons play an important role in this network. Elimination of inhibitory neurons led to a catastrophic growth of excitation – the mean firing frequency exceeded 100 Hz and did not depend on external stimulation. Accuracy dropped below 10%. After that, the same optimization procedure was applied to the current-based model. An accuracy of 92.06%, close to the value reached by the conductance-based model, was obtained in the 8th generation. After that, further progress was not observed during 3 generations, and the optimization was terminated. Thus, the MNIST test does not show that the choice of the conductance-based or current-based model is important.

4.2 Moving Light Spot
In this test, the optimization procedure for the conductance-based model took 21 generations. Exploration of the activity record of the best network showed that inhibitory neurons were silent during the whole simulation due to very weak synaptic stimulation from input nodes. For this reason, the choice of inhibitory synapse model is insignificant in this test. Thus, it should be concluded that the experiments with LSM trained to solve the two considered classification problems failed to discover significant differences between conductance-based and current-based synaptic models.
5 Conclusion

We failed to find significant differences between the conductance-based and current-based models of synapses. The simple theoretical analysis shows that the maximum difference of membrane potential changes after a presynaptic spike coming to excitatory conductance-based and current-based synapses of comparable weight is almost
negligible (about 2% of the whole range of possible membrane potential values). On the contrary, this difference may be very large for inhibitory synapses. Therefore, we tried to find the impact of the inhibitory synapse model selection on an SNN property important from the practical point of view, namely, the classification accuracy of the LSM including this network. In the first classification task tested, the difference between the two models was insignificant. In the second problem, the presence of inhibitory neurons themselves was found to be insignificant. We see two reasons why our results differ from the results reported earlier:
• The difference is great only for inhibitory neurons. However, the role of inhibitory neurons in the case of the LIFAT model is less important – they support network homeostasis, but in the LIFAT model, the same effect is provided by threshold adaptivity.
• We studied the effect of the model selection on the high-level network characteristics important for applications instead of detailed network activity properties.
It is evident that, while the statement about the unimportance of the choice "conductance-based vs. current-based" cannot be proven rigorously like a mathematical theorem and needs more exploration on other tasks and under other conditions (and we plan to perform these experiments), the present results can already be viewed as strong evidence supporting our thesis.
References
1. Merolla, P.A., et al.: A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197), 668–673 (2014)
2. Davies, M., et al.: Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38(1), 82–99 (2018)
3. Grishanov, N.V., et al.: Energy-efficient computations using the neuromorphic processor Altay. In: Proceedings of Microelectronics-2019, p. 592. Technosphera, Moscow (2019). (in Russian)
4. Gerstner, W., Kistler, W.: Spiking Neuron Models. Single Neurons, Populations, Plasticity. Cambridge University Press, Cambridge (2002)
5. Schemmel, J., et al.: A wafer-scale neuromorphic hardware system for large-scale neural modeling. In: Proceedings of 2010 IEEE International Symposium on Circuits and Systems, pp. 1947–1950 (2010)
6. Cavallari, S., Panzeri, S., Mazzoni, A.: Comparison of the dynamics of neural interactions between current-based and conductance-based integrate-and-fire recurrent networks. Front. Neural Circuits 8, 12 (2014)
7. Andre, D.H., et al.: A homotopic mapping between current-based and conductance-based synapses in a mesoscopic neural model of epilepsy. arXiv:1510.00427v2 [q-bio.NC] (2018)
8. Stöckel, A., Voelker, A.R., Eliasmith, C.: Point neurons with conductance-based synapses in the Neural Engineering Framework. arXiv:1710.07659 (2017)
9. Huang, C., et al.: Adaptive spike threshold enables robust and temporally precise neuronal encoding. PLoS Comput. Biol. 12(6), e1004984 (2016)
10. Maass, W.: Liquid state machines: motivation, theory, and applications. In: Computability in Context: Computation and Logic in the Real World, pp. 275–296. World Scientific (2011)
11. http://yann.lecun.com/exdb/mnist/
12. Kiselev, M.: Chaotic spiking neural network connectivity configuration leading to memory mechanism formation. In: NEUROINFORMATICS 2019. SCI, vol. 856, pp. 398–404 (2020)
13. Kiselev, M., Lavrentyev, A.: A preprocessing layer in spiking neural networks – structure, parameters, performance criteria. In: Proceedings of IJCNN-2019, Budapest, paper N-19450 (2019)
14. Gerstner, W., et al.: Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press, Cambridge (2014)
15. Bellec, G., et al.: Long short-term memory and learning-to-learn in networks of spiking neurons. In: Advances in Neural Information Processing Systems, Montréal, pp. 787–797 (2018)
STDP-Based Classificational Spiking Neural Networks Combining Rate and Temporal Coding

Aleksandr Sboev1,2(B), Danila Vlasov1,2, Alexey Serenko1, Roman Rybka1, and Ivan Moloshnikov1

1 NRC “Kurchatov Institute”, Moscow, Russia
[email protected]
2 MEPhI National Research Nuclear University, Moscow, Russia
Abstract. This paper proposes a classification algorithm comprising a multi-layer spiking neural network with Spike-Timing-Dependent Plasticity (STDP) learning. Two layers are trained sequentially. In the rate-encoding layer, learning is based on the STDP effect of output spiking rate stabilization. The first layer's output rates are re-encoded into spike times and presented to the second layer, where learning is based on the effect of memorizing repeating spike patterns. Thus, the first layer acts as a transformer of the input data, and the temporal-encoding layer decodes the first layer's output. The accuracy of the two-layer network is 96% on the Fisher's Iris dataset and 95% on the Wisconsin breast cancer dataset, which outperforms the sole first layer if the latter is decoded by rules interpreting output spike rates. The result resolves the lack of efficient decoding rules for the rate-stabilization-based learning algorithm and shows the principal possibility of stacking layers with different input encoding and learning algorithms.

Keywords: Spiking neural networks · Synaptic plasticity · STDP · Classification

1 Introduction
Spiking neural networks are a new generation of artificial neural networks with local weight change rules based on synaptic plasticity and with biological neuron models. Information in these networks is processed in the form of current pulses called spikes. Their scientific significance is related to the possibility of their implementation in modern neuromorphic computing devices with low energy consumption. There are a number of approaches to training spiking neural networks with various methods of encoding and decoding the input data. The approach preferred for practical implementation on neuromorphic hardware is based on a local synaptic plasticity mechanism, Spike-Timing-Dependent Plasticity (STDP) [1]. Previously, methods were proposed [8] both for training spiking neural networks with rate encoding of input data (where training is based on the effect of output rate stabilization [6]) and with temporal encoding (where training is based on memorizing repeating patterns [3]). However, the decoding rules developed for the rate approach were inferior in efficiency to decoding by machine learning methods, in particular the gradient boosting classifier. The aim of this work is to study the possibility of using a spiking neural network layer with temporal encoding as a decoder. As a result, a model is proposed that consists of two sequentially trained layers: the first with rate encoding of the input data, and the second, which decodes the first layer's outputs, with temporal coding. The proposed model's classification accuracy on the Fisher's iris dataset and the Wisconsin breast cancer dataset is compared with the accuracy of single-layer models.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 403–411, 2021. https://doi.org/10.1007/978-3-030-60577-3_48
2 Models
Both networks, the one with rate encoding and the one with temporal encoding, are comprised of leaky integrate-and-fire (LIF) neurons whose incoming synapses possess additive Spike-Timing-Dependent Plasticity. In additive STDP, the change Δw of a synaptic weight depends only on the interval between the arrival of a presynaptic spike t_pre and the moment of emitting a postsynaptic spike t_post:

\[
\Delta w =
\begin{cases}
-\alpha\lambda \cdot \exp\left(-\dfrac{t_{\mathrm{pre}} - t_{\mathrm{post}}}{\tau_-}\right), & \text{if } t_{\mathrm{pre}} - t_{\mathrm{post}} > 0; \\[2ex]
\lambda \cdot \exp\left(-\dfrac{t_{\mathrm{post}} - t_{\mathrm{pre}}}{\tau_+}\right), & \text{if } t_{\mathrm{pre}} - t_{\mathrm{post}} < 0
\end{cases}
\tag{1}
\]

The weight is clamped to the range 0 ≤ w ≤ 1 to prevent its unlimited growth: if w + Δw > 1, then Δw = 1 − w; if w + Δw < 0, then Δw = −w. Model parameters are λ = 0.03 and α = 0.65, while τ+ and τ− are chosen separately for each layer and each dataset. The rule that triggers a weight update under Eq. (1) is, however, different for the two networks. An important part of the STDP model is the scheme [4] of which spikes to take into account as t_pre and t_post in rule (1). For the network with temporal encoding we use the all-to-all scheme, where a postsynaptic spike triggers multiple simultaneous weight increase events with that moment as t_post and the times of all preceding presynaptic spikes as t_pre in Eq. (1). Analogously, each presynaptic spike causes weight decrease events paired with all postsynaptic spikes that occurred before it. The network with rate encoding uses the restricted symmetric nearest-neighbour spike pairing scheme, because it is with this scheme that STDP exhibits the output rate stabilizing effect on which the learning is based. In this scheme, only adjacent pairs of spikes trigger weight change: facilitation in the post-after-pre case and depression in the pre-after-post case.
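As an illustration, the pair-based update (1) with weight clamping can be sketched as follows. λ and α are the values from the text; the time constants τ+ = τ− = 20 ms are placeholder values, since the paper tunes them per layer and per dataset.

```python
import math

def stdp_dw(t_pre, t_post, lam=0.03, alpha=0.65, tau_plus=20.0, tau_minus=20.0):
    """Additive STDP weight change (Eq. 1) for one pre/post spike pair (times in ms)."""
    dt = t_pre - t_post
    if dt > 0:    # pre after post: depression
        return -alpha * lam * math.exp(-dt / tau_minus)
    elif dt < 0:  # pre before post: facilitation
        return lam * math.exp(-(t_post - t_pre) / tau_plus)
    return 0.0

def clamp_update(w, dw):
    """Apply dw with the hard bounds 0 <= w <= 1 from the text."""
    return min(1.0, max(0.0, w + dw))
```

Under the all-to-all scheme this function would be called for every pre/post pair; under the nearest-neighbour scheme, only for adjacent pairs.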
The LIF neuron was chosen for both networks due to its computational simplicity, while the STDP properties on which both learning algorithms under consideration are based, rate stabilization and memorizing repeating spike patterns, were previously shown numerically with this neuron model. In this model the membrane potential V obeys

\[
\frac{dV}{dt} = \frac{-\left(V(t) - V_{\mathrm{rest}}\right)}{\tau_m} + \frac{I_{\mathrm{syn}}(t)}{C_m}.
\]

As soon as V exceeds the threshold V_th, V is reset back to the resting potential V_rest, and during the refractory period τ_ref the neuron is insensitive to incoming spikes. When the neuron is not refractory, incoming spikes add exponential pulses to the postsynaptic current I_syn:

\[
I_{\mathrm{syn}} = \underbrace{\sum_i}_{\text{over synapses}} w_i \underbrace{\sum_{t^i_{\mathrm{sp}}}}_{\text{over spike times at } i\text{-th synapse}} \frac{q_{\mathrm{syn}}}{\tau_{\mathrm{syn}}}\, e^{-\frac{t - t^i_{\mathrm{sp}}}{\tau_{\mathrm{syn}}}}\, \Theta\!\left(t - t^i_{\mathrm{sp}}\right),
\]

where τ_syn = 5 ms, q_syn = 5 fC, w_i is the weight of the synapse, and Θ is the Heaviside step function. The neuron constants are chosen according to preliminary research [7]: V_rest = 0 mV, V_th = 1 mV, C_m = 1 pF. The neuron leakage constant τ_m is adjusted separately for each layer and for each classification task.
3 Data

3.1 Fisher's Iris

The Fisher's iris dataset¹ consists of 150 flowers divided into three classes of 50 flowers each, corresponding to three species: Iris Setosa Canadensis, Iris Virginica and Iris Versicolor. A flower is described by a vector of four numbers: sepal and petal length and width in centimeters.

3.2 Wisconsin Breast Cancer

Published by the University of Wisconsin, the dataset² consists of 569 samples describing breast puncture tests, divided into two classes: 357 benign and 212 malignant. A sample is a vector of 30 numbers characterizing cell nuclei in the breast mass, namely the mean, standard deviation and worst (over all nuclei) values of ten characteristics of a nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension.

¹ https://archive.ics.uci.edu/ml/datasets/Iris
² https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
4 Methods

4.1 Preprocessing
Input vectors x are normalised so that each component x_i lies in the range from 0 to 1:

\[
x_i \leftarrow \frac{x_i - X^i_{\min}}{X^i_{\max} - X^i_{\min}},
\tag{2}
\]

where X^i_min denotes the minimum of the i-th component over all vectors u of the training set X: X^i_min = min_{u∈X} u_i, and accordingly X^i_max = max_{u∈X} u_i.

After that, the input data is processed with Gaussian receptive fields [2,9,10], which transform an N-dimensional input vector x of the dataset X into a vector of higher dimension N · M: {g(x₁, μ₀), …, g(x₁, μ_M), …, g(x_N, μ₀), …, g(x_N, μ_M)}. Each component x_i is expanded into the components g(x_i, μ₀), …, g(x_i, μ_M):

\[
g(x_i, \mu_j) = \exp\left(-\frac{(x_i - \mu_j)^2}{\sigma^2}\right),
\tag{3}
\]

where μ_j = X^i_min + (X^i_max − X^i_min) · j/(M − 1) is the center of the j-th receptive field.
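A possible NumPy sketch of this preprocessing step; the values of M and σ here are illustrative, since the text does not fix them at this point.

```python
import numpy as np

def minmax_normalize(X):
    """Eq. (2): per-component min-max normalization over the training set."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def receptive_fields(X, M=5, sigma=0.25):
    """Eq. (3): expand each of the N components into M Gaussian responses.
    For normalized inputs the centers mu_j lie evenly on [0, 1].
    Output shape: (n_samples, N*M)."""
    n, N = X.shape
    mu = np.linspace(0.0, 1.0, M)
    out = np.exp(-((X[:, :, None] - mu[None, None, :]) ** 2) / sigma ** 2)
    return out.reshape(n, N * M)
```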
4.2 Scheme of the Layer for Rate Coding
After the above normalization and processing by receptive fields, the input vectors are encoded into spike sequences which are presented to the input of the neural network model: each component x₁, …, x_N of a pre-processed input vector x is mapped to K input synapses of each neuron, and these synapses receive Poisson sequences of spikes of duration T with the mean frequencies p₁(x₁), …, p₁(x_N), …, p_K(x₁), …, p_K(x_N), where

\[
p(x) = \nu_{\mathrm{low}} + x \cdot \nu_{\mathrm{high}}.
\tag{4}
\]

Here the values K = 2, ν_low = 2 Hz and ν_high = 60 Hz were found earlier [8]. The scheme of the layer with rate encoding is presented in Fig. 1A. The network consists of one layer, comprising one neuron for each class in the input data; during training each neuron receives only input samples of its corresponding class. The neurons are not connected to each other. The functioning of this layer is based on the effect of output frequency stabilization under STDP with the nearest-neighbour spike pairing scheme: upon successful training, the weights are set so that the neuron fires spikes with similar output rates in response to all samples of the class on which it was trained. After training, this output frequency (see Fig. 2) serves as an indicator that the neuron is receiving a sample of the class on which it was trained, and thus each neuron learns to distinguish its class from the rest.
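The rate encoding (4) can be sketched as follows; the Poisson trains are generated by drawing a Poisson spike count and then uniform spike times, one common way to sample a homogeneous Poisson process over a fixed window.

```python
import numpy as np

def rate_encode(x, K=2, nu_low=2.0, nu_high=60.0, T=1.0, rng=None):
    """Eq. (4): map each preprocessed component of x to K Poisson spike trains
    of duration T seconds with mean rate nu_low + x_i * nu_high (in Hz)."""
    rng = rng or np.random.default_rng(0)
    trains = []
    for _ in range(K):
        for xi in x:
            rate = nu_low + xi * nu_high
            n = rng.poisson(rate * T)                       # spike count
            trains.append(np.sort(rng.uniform(0.0, T, n)))  # spike times
    return trains
```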
Fig. 1. Topology of the layer with rate encoding (A) and the layer with temporal encoding (B). In (B), L is the number of classes multiplied by the number of ensembles.
Previously [8], the applicability of such a training scheme was shown for two methods of decoding the neurons' output rates into class labels: "by rate-based rules" and "using Gradient Boosting". In both methods, STDP plasticity is turned off at the testing stage, both the training and the test samples are presented to the model, and the relation between the class labels of the test samples and the output spiking rates of the neurons in response to them is determined based on the output rates in response to the training sample. Decoding "by rate-based rules" is inspired by the spiking rate stabilization effect: a vector of the test sample is considered to belong to the "own" class of the neuron whose output frequency in response to this vector differs the least from that neuron's average frequency in response to the vectors of its own class. Decoding using Gradient Boosting (from the sklearn package [5] with default parameters), further called GBM, is carried out by training it on the output frequencies of all neurons in response to the training set, after which the classifier predicts the class labels from the output frequencies of the neurons in response to test vectors. In this paper, the accuracy of the two decoding methods described above is compared with the accuracy of a temporal encoding neural network model for which the output frequencies of the rate encoding model act as input data.
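The "rate-based rules" decoding can be sketched as below, under the assumption (a convention of this sketch, not stated in the text) that column c of the rate matrices holds the output rate of the neuron trained on class c.

```python
import numpy as np

def decode_by_rate_rules(train_rates, train_labels, test_rates):
    """Assign each test vector to the class whose neuron's output rate
    deviates least from that neuron's mean rate on its own training class.
    Rows are samples; column c corresponds to the class-c neuron."""
    classes = np.unique(train_labels)
    # mean output rate of neuron c over training samples of its own class
    own_mean = np.array([train_rates[train_labels == c, ci].mean()
                         for ci, c in enumerate(classes)])
    dev = np.abs(test_rates - own_mean[None, :])  # deviation per neuron
    return classes[np.argmin(dev, axis=1)]
```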
4.3 Scheme of the Layer for Temporal Coding
The scheme with temporal encoding is based on the effect of memorizing repeating spike patterns. In temporal coding, a preprocessed input vector x corresponds to a vector t of spike times:

\[
\mathbf{t} = \mathbf{x} \cdot T.
\tag{5}
\]
During a time window T each synapse receives one spike at a moment tᵢ between 0 and T. The topology of the network layer is presented in Fig. 1B. The output neurons are interconnected by static inhibitory synapses and receive samples of all classes in random order during learning. Additionally, the neuron corresponding to the class of the current input example is stimulated by a reinforcement current, forcing it to fire an output spike early in the presentation of that example. This output spike, via the inhibitory connections, depresses the membrane potentials of the other neurons of the layer. That way, STDP strengthens the synaptic connections of the output neuron corresponding to the current example's class with the inputs that receive the earliest input spikes. Since the spikes of other classes arrive later than the spikes induced by the reinforcement signal, the weights of the synapses receiving them fall to zero. At the testing stage, no reinforcement current is applied, and the class of an input example is determined by which neuron emits the first spike in response to it.
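The encoding (5) and the first-spike readout used at the testing stage can be sketched as:

```python
import numpy as np

def temporal_encode(x, T=100.0):
    """Eq. (5): the spike time of input i is x_i * T (one spike per synapse)."""
    return x * T

def first_spike_class(output_spike_times):
    """Testing-stage readout: the class is the index of the output neuron
    that fires first (use np.inf for a neuron that stays silent)."""
    return int(np.argmin(output_spike_times))
```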
4.4 Proposed Two-Layer Method
The method considers a two-layer spiking network: the first layer uses the topology for rate coding (described in Sect. 4.2), and the second one uses the topology for temporal coding (described in Sect. 4.3). Learning is performed sequentially using the following algorithm:

1. Data is normalized (according to Eq. (2)), processed by receptive fields (3) and encoded into spike trains (4).
2. Data is split into 5 independent folds for cross-validation.
3. The first layer (depicted in Fig. 1A) is trained with rate-encoded input.
4. The output frequencies of the trained layer in response to the training dataset are recorded.
5. The frequencies obtained are normalized (2) and encoded into spike times (5).
6. The examples encoded into spike times are used as input for training the temporal-encoding layer (Fig. 1B).
7. The accuracy of the trained network is assessed on the testing dataset by F1-score.
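The data flow of the algorithm above can be sketched as a thin pipeline; `rate_layer` is a hypothetical callable standing in for the trained first layer (mapping a sample to its vector of output frequencies), so only steps 2, 4, 5 and 6 are shown.

```python
import numpy as np

def kfold_indices(n, k=5, rng=None):
    """Step 2: split n sample indices into k cross-validation folds."""
    rng = rng or np.random.default_rng(0)
    return np.array_split(rng.permutation(n), k)

def normalize(X):
    """Min-max normalization (Eq. 2), reused in steps 1 and 5."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def second_layer_input(X, rate_layer, T=100.0):
    """Steps 4-6: record first-layer output rates, normalize them,
    and re-encode them into spike times (Eq. 5) for the temporal layer."""
    rates = np.stack([rate_layer(x) for x in X])
    return normalize(rates) * T
```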
5 Experiments
The proposed method is compared in efficiency with a single-layer model with temporal coding, as well as with a single-layer rate-coding model with various methods of decoding the output rates (extracted at step 5 of the above algorithm). Ensembling is applied: 5 instances of the first layer are trained independently, and the output rates of the trained ensemble are fed as a single vector to the second layer. Table 1 shows the classification accuracy (average, minimum and maximum values over the cross-validation partitions) on the two datasets with different training schemes.
Table 1. Classification F1-score of the proposed ensemble compared to its separate components

Model             | Encoding        | Decoding      | Iris (mean/min/max) | Cancer (mean/min/max)
------------------|-----------------|---------------|---------------------|----------------------
Gradient Boosting | —               | —             | 96 / 87 / 100       | 95 / 92 / 98
SNN [8]           | Temporal coding | First spike   | 97 / 93 / 100       | 90 / 88 / 91
SNN [8]           | Rate coding     | Rules         | 97 / 93 / 100       | 90 / 88 / 92
SNN ensemble      | Rate coding     | Rules         | 97 / 93 / 100       | 91 / 87 / 94
SNN ensemble      | Rate coding     | GBM           | 95 / 90 / 100       | 94 / 93 / 95
SNN ensemble      | Rate coding     | SNN (Fig. 1B) | 96 / 93 / 100       | 95 / 94 / 97
Fig. 2. Output frequencies distribution of the trained first layer on the task of Fisher’s Iris (left) and Breast Cancer Wisconsin (right).
Table 1 shows that the use of ensembles for the breast cancer dataset gives a 1% increase in accuracy compared to the previously published spiking network without ensembles [8]. For a rate-coding layer, its decoding methods, whether rule-interpreting its output rates or training GBM on them, can be replaced with the temporal-coding layer, which leads to the same accuracy on the Fisher's Iris dataset and increased accuracy on the breast cancer dataset. Figure 2 shows the distribution of the output frequencies of the first-layer neurons in response to the training set examples. The frequency distribution of neuron 1 in response to class 1 has a peak in the low-frequency region, as does that of neuron 2 in response to class 2, and so on. Thus, the neurons of the first layer learn to fire spikes with a stationary frequency that is lower in response to examples of their own classes (Fig. 2). After training, the decoding layer neurons establish high synaptic weights only with those first layer
neurons that are trained on their class: the decoding layer neuron responsible for class 1 will strengthen synaptic weights with the first layer neurons trained on class 1, and so on.
6 Conclusion
The results demonstrate the effectiveness of the method of constructing classification neural networks from two independently trained layers: the first with frequency coding of the data and training based on the output rate stabilization effect, and the second with temporal encoding. The proposed method demonstrates the same accuracy as when the second layer of the model is replaced with a conventional gradient boosting classifier, which confirms the effectiveness of the second layer as a decoder of the first layer's outputs. Moreover, this method is more stable across tasks than decoding the first layer's output frequencies using the previously applied rules.
References

1. Feldman, D.E.: The spike-timing dependence of plasticity. Neuron 75(4), 556–571 (2012). http://www.sciencedirect.com/science/article/pii/S0896627312007039
2. Gütig, R., Sompolinsky, H.: The tempotron: a neuron that learns spike timing-based decisions. Nat. Neurosci. 9(3), 420–428 (2006)
3. Masquelier, T., Guyonneau, R., Thorpe, S.J.: Spike Timing Dependent Plasticity finds the start of repeating patterns in continuous spike trains. PLoS ONE 3(1), e1377 (2008)
4. Morrison, A., Diesmann, M., Gerstner, W.: Phenomenological models of synaptic plasticity based on spike timing. Biol. Cybern. 98, 459–478 (2008)
5. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
6. Sboev, A., Serenko, A., Rybka, R., Vlasov, D.: Influence of input encoding on solving a classification task by spiking neural network with STDP. In: Proceedings of the 16th International Conference of Numerical Analysis and Applied Mathematics, vol. 2116, pp. 270007:1–270007:4 (2019). http://aip.scitation.org/doi/abs/10.1063/1.5114281
7. Sboev, A., Serenko, A., Rybka, R., Vlasov, D., Filchenkov, A.: Estimation of the influence of spiking neural network parameters on classification accuracy using a genetic algorithm. In: Postproceedings of the 9th Annual International Conference on Biologically Inspired Cognitive Architectures (BICA), vol. 145, pp. 488–494 (2018). http://www.sciencedirect.com/science/article/pii/S1877050918323998
8. Sboev, A., Serenko, A., Rybka, R., Vlasov, D.: Solving a classification task by spiking neural network with STDP based on rate and temporal input encoding. Math. Methods Appl. Sci. (2020). https://onlinelibrary.wiley.com/doi/abs/10.1002/mma.6241
9. Wang, J., Belatreche, A., Maguire, L., McGinnity, T.M.: An online supervised learning method for spiking neural networks with adaptive structure. Neurocomputing 144, 526–536 (2014). http://www.sciencedirect.com/science/article/pii/S0925231214005785
10. Yu, Q., Tang, H., Tan, K.C., Yu, H.: A brain-inspired spiking neural network model with temporal encoding and learning. Neurocomputing 138, 3–13 (2014)
Solving Equations Describing Processes in a Piecewise Homogeneous Medium on Radial Basis Functions Networks

Dmitry A. Stenkin and Vladimir I. Gorbachenko

Penza State University, Penza, Russia
[email protected], [email protected]
Abstract. The solution of boundary value problems describing piecewise-homogeneous media on networks of radial basis functions is considered. The proposed algorithm is based on solving individual problems for each area with different properties of the medium and using a common error functional that takes into account errors at the interface between the media. This removes the restrictions on the use of radial basis functions and allows the use of radial basis functions with both unlimited and limited definition areas. We used the fast algorithm proposed by the authors for training networks of radial basis functions by the Levenberg-Marquardt method with analytical calculation of the Jacobi matrix. The algorithm makes it possible to reduce the number of iterations by several orders of magnitude compared to the currently used gradient descent algorithm and to reach an accuracy of the solution that is practically unattainable by gradient descent. The results of solving a model problem show the effectiveness of the proposed algorithm.

Keywords: Piecewise homogeneous medium · Partial differential equations · Radial basis functions networks · Neural network learning · Levenberg-Marquardt method
1 Introduction

Only some boundary value problems described by partial differential equations (PDE) can be solved analytically. In most cases, equations are solved numerically by finite difference and finite element methods. A promising direction is the solution of PDEs by meshless methods using radial basis functions (RBF), based on the principles proposed by E. Kansa [1, 2] (a review of the methods is given in [3]). These methods do not require the construction of a connected grid and allow one to obtain an approximate differentiable analytical solution in the form of a sum of basis functions multiplied by weights. When using RBFs, it is necessary to calculate the vector of weights for the selected RBF parameters so that the obtained approximate solution satisfies the equation and the boundary conditions on a certain set of sampling points. To calculate the weights from the conditions that the residuals equal zero at test points inside the solution domain and on its boundary, a system of linear algebraic equations is formed.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 412–419, 2021. https://doi.org/10.1007/978-3-030-60577-3_49

The main disadvantages of using RBFs are the difficulty of formalizing the
selection of RBF parameters and the need to solve ill-conditioned systems of linear algebraic equations. The problem of the informal choice of RBF parameters is solved by using machine learning to calculate both the weights and the RBF parameters. With this approach, the PDE solution amounts to the training of a special type of neural network called the radial basis function network (RBFN) [4–6]. Solving a PDE on an RBFN is an approximation of an unknown solution. RBFNs are universal approximators of functions, as proved in [7]: for any continuous function f(x) of many variables there is an RBFN with many centers and a width common to all functions such that the function realized by the network is close to f(x), where proximity is understood in the mean-square sense. When using RBFNs, the tasks typically considered describe processes in homogeneous media. But many practically important tasks, for example problems of thermal conductivity [8], modeling of oil reservoirs [9, 10] and modeling of groundwater [11], require consideration of piecewise-homogeneous media. The complexity of using RBFs to solve problems that describe processes in piecewise homogeneous media is determined by two factors: the nonlocal nature of RBFs (each RBF affects the solution in the entire solution area) and the inability to model the discontinuity of the derivative of the solution on the border between areas. There is a known approach to modeling processes in piecewise homogeneous media using RBFs, based on the finite-difference approximation of the differential operators in the problem [12, 13]. The approach relies on the fact that the finite difference method ensures the fulfillment of ideal conjugation conditions on the border between areas [14]. But with this approach, the error of the difference approximation is introduced. In [15], the one-dimensional problem of the theory of elasticity for a piecewise homogeneous medium is solved without using the finite difference method.
The solution area is divided into two subareas corresponding to areas with different properties of the media. Each area uses RBFs with compactly supported functions, so the RBFs of each area affect the solution only in their own area. Additional conjugation conditions are introduced at the boundary between the media. In this paper, we propose an approach to solving PDEs in piecewise homogeneous media, the features of which are:

• The use of RBFNs, which allows an adjustable functional basis.
• Separate iterative problem solving for subdomains with different medium properties, with a common error functional that takes into account the residuals of the solution at the interface between the media. This removes the restrictions on RBFs and allows the use of RBFs with both unlimited and limited areas of definition.
• The use of the fast RBFN learning algorithm developed by the authors, based on the Levenberg-Marquardt method with analytical calculation of the Jacobi matrix. The algorithm reduces the number of iterations by several orders of magnitude in comparison with the currently used gradient descent algorithm and reaches a solution accuracy that is practically unattainable by gradient descent [16–18].
2 Solution Algorithm

Consider the solution of a problem describing processes in a piecewise homogeneous medium, using the model problem described by the equation

\[
\frac{\partial}{\partial x}\left(\sigma_i(x, y)\,\frac{\partial u}{\partial x}\right) + \frac{\partial}{\partial y}\left(\sigma_i(x, y)\,\frac{\partial u}{\partial y}\right) = f(x, y),\quad (x, y) \in \Omega,\quad i = 1, 2,
\tag{1}
\]

\[
u(x, y) = p(x, y),\quad (x, y) \in \partial\Omega,
\tag{2}
\]

where Ω is the solution domain, ∂Ω is the boundary of the region, f = sin(2πx₁)sin(πx₂), p = 0, and σᵢ is a function describing the properties of the medium. The solution domain is the square 1 × 1, divided by the vertical line x = 0.5 into two areas with different constant values σᵢ, i = 1, 2. On the border S between the areas the conjugation conditions must be met:

\[
u_1\big|_S = u_2\big|_S,\qquad \sigma_1\,\frac{du_1}{dx}\bigg|_S = \sigma_2\,\frac{du_2}{dx}\bigg|_S.
\tag{3}
\]

Problem (1)–(3) is solved on an RBFN. An RBFN is a two-layer network [19]. The first layer consists of RBFs that implement a nonlinear transformation of the coordinate vector of the point at which the approximation to the solution is calculated. The second RBFN layer is a linear weighted adder:

\[
u(\mathbf{x}) = \sum_{m=1}^{n_{\mathrm{RBF}}} w_m\,\varphi_m(\mathbf{x}, \mathbf{p}_m),
\]

where n_RBF is the number of RBFs, w_m is the weight of RBF φ_m, and p_m is the parameter vector of RBF φ_m. In this paper, we use the Gauss function (Gaussian) φ(‖x − c‖, a) = exp(−‖x − c‖²/(2a²)), where c is the position of the function center and a is the shape parameter, often called the width. We consider the solution of problem (1)–(3) as the solution of two problems for regions 1 and 2, taking into account the conjugation conditions (3). Solving a PDE on an RBFN is mathematically the minimization of a loss function. Network learning is the minimization of the loss function, which is the sum of the squared residuals of the approximation of the solution at the sampling points inside and on the boundary of the solution domain. In the case of solving the problem for a piecewise homogeneous medium, we introduce additional terms in the loss function in the form of the sum of squared residuals on the interface between the media for the condition of equal
solutions and flows. The loss function for problem 1 has the form (the function for problem 2 has a similar form)

\[
J_1 = \sum_{i=1}^{N_1}\left[L_1 u_1(\mathbf{x}_{1i}) - f_1(\mathbf{x}_{1i})\right]^2
+ \lambda_1\sum_{j=1}^{K_1}\left[B_1 u_1(\mathbf{x}_{1j}) - p_1(\mathbf{x}_{1j})\right]^2
+ \lambda_3\sum_{j=1}^{T}\left[u_1(\mathbf{x}_{Sj}) - u_2(\mathbf{x}_{Sj})\right]^2
+ \lambda_4\sum_{j=1}^{T}\left[\sigma_1\frac{\partial u_1(\mathbf{x}_{Sj})}{\partial x} - \sigma_2\frac{\partial u_2(\mathbf{x}_{Sj})}{\partial x}\right]^2,
\tag{4}
\]
where L₁ is the differential operator, B₁ is the boundary condition operator, N₁ is the number of sampling points in the inner region of Ω, K₁ is the number of sampling points on the boundary ∂Ω, T is the number of sampling points on the border between the areas, λ₁, λ₂, λ₃, λ₄ are penalty factors, and x₁ᵢ, x₁ⱼ and x_Sⱼ are the coordinates of the test points inside the area, on its boundary, and on the border between the media, respectively.

The solution process is iterative and uses two RBFNs. At each iteration, one step is taken to minimize the loss (4) for region 1, using the solution at the media interface and the flow σ₂ ∂u₂(x_Sⱼ)/∂x of region 2 from the previous iteration. Then the minimization step for region 2 is performed similarly, using the obtained solution values at the media border and the flow for region 1. Iterations continue until the root-mean-square errors of the residuals inside and on the boundary of the region, as well as those of the equality of solutions and flows for each medium, differ by no more than a small amount.

The PDE solution is RBFN learning, during which the weights and network parameters are adjusted (when using Gaussians, these are the center coordinates and widths). Since the PDE solution is formed during RBFN learning, it is important to use fast network learning algorithms. Usually, slow algorithms based on the gradient descent method are used for learning RBFNs. In [6], a fast trust region algorithm for learning RBFNs was proposed. In [16–18], an algorithm based on the Levenberg-Marquardt method [20] was proposed for RBFN learning. The Levenberg-Marquardt method is equivalent to the trust region method [21] but is easier to implement, since it does not require solving a constrained optimization problem at each iteration. In the Levenberg-Marquardt algorithm, the correction of the vector θ of RBFN parameters, composed of the weights and parameters of the RBFs, is performed at iteration k according to the formula θ(k+1) = θ(k) + Δθ(k+1), in which the parameter correction vector Δθ(k) is found from the solution of a system of linear algebraic equations
\[
\left(\mathbf{J}_{k-1}^{\mathrm T}\mathbf{J}_{k-1} + \mu_k\mathbf{E}\right)\Delta\boldsymbol{\theta}(k) = -\mathbf{g}_{k-1},
\]

where J_{k−1} and J_k are the Jacobi matrices calculated at iterations k − 1 and k, E is the identity matrix, μ_k is the regularization parameter, which changes at each learning step, g = Jᵀr is the gradient vector of the functional with respect to the parameter vector θ, and r is the residual vector at internal and boundary sampling points. The Jacobi matrix is the matrix of derivatives of the residual vector r with respect to the elements of the parameter vector.
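One update of this scheme can be sketched with NumPy; `residual_fn` and `jacobian_fn` are hypothetical callables returning r(θ) and J(θ).

```python
import numpy as np

def lm_step(theta, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: solve (J^T J + mu*E) dtheta = -J^T r."""
    r = residual_fn(theta)
    J = jacobian_fn(theta)
    g = J.T @ r                          # gradient of 0.5 * ||r||^2
    A = J.T @ J + mu * np.eye(len(theta))
    return theta + np.linalg.solve(A, -g)
```

For a linear residual r(θ) = Aθ − b and a very small μ, a single step nearly reaches the least-squares solution, which is a convenient sanity check.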
Let us represent the Jacobi matrix in block form, with one block per group of network parameters (weights, centers, widths). The blocks have size (N₁ + K₁ + 2T) × n_RBF (similarly for area 2) and contain the derivatives of the residuals with respect to the network parameters. The residuals are:

- rᵢ = L₁u₁(x₁ᵢ) − f₁(x₁ᵢ), i = 1, 2, …, N₁: residuals at test points in the solution area;
- rᵢ = B₁u₁(x₁ᵢ) − p₁(x₁ᵢ), i = N₁ + 1, …, N₁ + K₁: residuals at test points on the boundary of the solution domain;
- rᵢ = u₁(x_Sᵢ) − u₂(x_Sᵢ), i = N₁ + K₁ + 1, …, N₁ + K₁ + T: residuals at test points on the interface between the media for the condition of equality of solutions;
- rᵢ = σ₁ ∂u₁(x_Sᵢ)/∂x − σ₂ ∂u₂(x_Sᵢ)/∂x, i = N₁ + K₁ + T + 1, …, N₁ + K₁ + 2T: residuals at test points on the interface between the media for the condition of equality of flows.

Given the differentiability of the RBFs and the structure of the RBFN, analytical expressions for the Jacobi matrix were obtained. The Laplacian at a sample point xᵢ is

\[
\Delta u_i = \frac{\partial^2 u_i}{\partial x_{i1}^2} + \frac{\partial^2 u_i}{\partial x_{i2}^2}
= \sum_{k=1}^{n_{\mathrm{RBF}}} w_k\, e^{-\frac{\|\mathbf{x}_i - \mathbf{c}_k\|^2}{2a_k^2}} \cdot \frac{\|\mathbf{x}_i - \mathbf{c}_k\|^2 - 2a_k^2}{a_k^4}.
\]

Then the elements of the block J_w for internal test points are

\[
\frac{\partial r_i}{\partial w_m} = \varphi_m(\mathbf{x}_i)\,\frac{\|\mathbf{x}_i - \mathbf{c}_m\|^2 - 2a_m^2}{a_m^4},\quad i = 1, 2, \ldots, N_1,\quad m = 1, 2, \ldots, n_{\mathrm{RBF}}.
\]

For boundary test points, calculations are performed according to the formula

\[
\frac{\partial r_i}{\partial w_m} = \varphi_m(\mathbf{x}_i),\quad i = N_1 + 1, \ldots, N_1 + K_1,\quad m = 1, 2, \ldots, n_{\mathrm{RBF}}.
\]

Given that the solution at an interface point x_Sᵢ is u₁(x_Sᵢ) = Σ_{k=1}^{n_RBF1} w_k exp(−‖x_Sᵢ − c_k‖²/(2a_k²)), we get

\[
\frac{\partial r_i}{\partial w_m} = \varphi_m(\mathbf{x}_{Si}),\quad i = N_1 + K_1 + 1, \ldots, N_1 + K_1 + T,\quad m = 1, 2, \ldots, n_{\mathrm{RBF}}.
\]

The normal derivative to the interface at the point x_Sᵢ is

\[
\frac{\partial u_1(\mathbf{x}_{Si})}{\partial x_{Si1}} = \sum_{k=1}^{n_{\mathrm{RBF}1}} \left[-w_k\,\frac{x_{Si1} - c_{k1}}{a_k^2}\,\varphi_k(\mathbf{x}_{Si})\right].
\]

Then

\[
\frac{\partial r_i}{\partial w_m} = -\sigma_1\,\frac{x_{Si1} - c_{m1}}{a_m^2}\,\varphi_m(\mathbf{x}_{Si}),
\]

where i = N₁ + K₁ + T + 1, …, N₁ + K₁ + 2T and m = 1, 2, …, n_RBF. The remaining elements of the blocks of the Jacobi matrix are calculated similarly. The differential operators of the loss function are also calculated analytically.
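The analytic Laplacian above can be cross-checked against a finite-difference approximation; a sketch:

```python
import numpy as np

def gaussian(x, c, a):
    """phi(||x - c||, a) = exp(-||x - c||^2 / (2 a^2))."""
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * a ** 2))

def rbfn_laplacian(x, centers, widths, weights):
    """Analytic Laplacian of the 2D Gaussian RBFN output:
    Delta u = sum_k w_k phi_k(x) (||x - c_k||^2 - 2 a_k^2) / a_k^4."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    phi = np.exp(-d2 / (2.0 * widths ** 2))
    return float(np.sum(weights * phi * (d2 - 2.0 * widths ** 2) / widths ** 4))
```

The test below compares the analytic value with a five-point finite-difference stencil on an arbitrary small network.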
3 Experimental Study

The experiments were carried out in the MATLAB system on a computer with an Intel Core i5 8500 processor (3.0 GHz) and 16.0 GB of RAM. Gauss functions were used as RBFs. For both tasks, the number of RBFs is 64, with 80 test points inside the solution area and on the boundary, and 20 test points on the media conjugation line. The initial weights are random numbers from zero to 0.001. The initial values of the components of the width vector for the two regions are 0.3. All penalty factors are 100. The initial value of the regularization parameter is 10. Iterations were performed until a root-mean-square error of 10⁻⁶ was reached. The centers were regularly located on a square grid of size 8 × 8. RBFN learning was conducted by the Levenberg-Marquardt method. The solution is achieved on average in 570 iterations and 35.9 s (due to the random initialization of the network, the number of iterations varies between experiments). Figure 1 shows the locations of the centers and the weights and widths of the RBFs: the width is shown by the diameter of the circles, and the values of the weights are conditionally shown by the intensity of the fill. Figure 1 demonstrates the importance of tuning not only the weights but also the RBF parameters.
Fig. 1. The centers, weights, and widths of the RBFs: a) before training the network; b) region 1 after training the network; c) region 2 after training the network
D. A. Stenkin and V. I. Gorbachenko
Fig. 2. Solution plots: a) numerical solution; b) graph of the numerical solution in the middle section
Figure 2 shows plots of the solution. The discontinuity in the derivatives of the solution at the boundary between the regions is clearly visible in Fig. 2.
4 Conclusion

An algorithm is proposed for solving problems that describe processes in a piecewise homogeneous medium based on radial basis function networks. The Levenberg-Marquardt algorithm is implemented as the network learning algorithm. A root-mean-square residual error of $10^{-7}$ was achieved. Thus, the proposed algorithm provides an effective approximate solution to the problem for piecewise homogeneous media.
References

1. Kansa, E.J.: Multiquadrics—a scattered data approximation scheme with applications to computational fluid-dynamics—I. Surface approximations and partial derivative estimates. Comput. Math. Appl. 19(8–9), 127–145 (1990)
2. Kansa, E.J.: Multiquadrics—a scattered data approximation scheme with applications to computational fluid-dynamics—II. Solutions to parabolic, hyperbolic and elliptic partial differential equations. Comput. Math. Appl. 19(8–9), 147–161 (1990)
3. Chen, W., Fu, Z.-J.: Recent Advances in Radial Basis Function Collocation Methods. Springer, Heidelberg (2014)
4. Yadav, N., Yadav, A., Kumar, M.: An Introduction to Neural Network Methods for Differential Equations. Springer, Dordrecht (2015)
5. Vasiliev, A.N., Tarkhov, D.A.: Neural Network Modeling: Principles. Algorithms. Applications. St. Petersburg Polytechnic University Publishing House, St. Petersburg (2009)
6. Gorbachenko, V.I., Zhukov, M.V.: Solving boundary value problems of mathematical physics using radial basis function networks. Comput. Math. Math. Phys. 57(1), 145–155 (2017)
7. Park, J., Sandberg, I.W.: Universal approximation using radial-basis-function networks. Neural Comput. 3(2), 246–257 (1991)
8. Carslaw, H.S.: Introduction to the Mathematical Theory of the Conduction of Heat in Solids. Franklin Classic, Blackwell (2018)
9. Ertekin, T., Sun, Q., Zhang, J.: Reservoir Simulation: Problems and Solutions. Society of Petroleum Engineers, Richardson (2019)
10. Islam, M.R., Abou-Kassem, J.H., Farouq-Ali, S.M.: Petroleum Reservoir Simulation: The Engineering Approach. Gulf Professional Publishing, Houston (2020)
11. Anderson, M.P., Woessner, W.W., Hunt, R.J.: Applied Groundwater Modeling: Simulation of Flow and Advective Transport. Academic Press, Cambridge (2015)
12. Ngo-Cong, D., Tien, C.M.T., Nguyen-Ky, T., An-Vo, D.-A., Mai-Duy, N., Strunin, D.V., Tran-Cong, T.: A generalised finite difference scheme based on compact integrated radial basis function for flow in heterogeneous soils. Int. J. Numer. Methods Fluids 85(7), 404–429 (2017)
13. Piret, C., Dissanayake, N., Gierke, J.S., Fornberg, B.: The radial basis functions method for improved numerical approximations of geological processes in heterogeneous systems. Math. Geosci. (2019). https://doi.org/10.1007/s11004-019-09820-w
14. Samarskii, A.A., Vabishchevich, P.N.: The Finite Difference Methodology, Volume 2, Computational Heat Transfer. Wiley, New York (1996)
15. Chen, J.-S., Wang, L., Hu, H.-Y., Chi, S.-W.: Subdomain radial basis collocation method for heterogeneous media. Int. J. Numer. Methods Eng. 80(2), 163–190 (2009)
16. Gorbachenko, V.I., Alqezweeni, M.M.: Learning radial basis functions networks in solving boundary value problems. In: 2019 International Russian Automation Conference (RusAutoCon), Sochi, Russia, 8–14 September 2019, pp. 1–6 (2019)
17. Aggarwal, C.C.: Neural Networks and Deep Learning. Springer, Cham (2018)
18. Gorbachenko, V., Savenkov, K.: Improving algorithms for learning radial basic functions networks to solve the boundary value problems. In: Avatar-Based Control, Estimation, Communications, and Development of Neuron Multi-Functional Technology Platforms, pp. 66–106. IGI Global, Hershey (2020)
19. Gorbachenko, V.I., Alqezweeni, M.M.: Modeling of objects with distributed parameters on neural networks. Models Syst. Netw. Econ. Technol. Nat. Soc. 4(32), 50–64 (2019). (in Russian)
20. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (2006)
21. Marquardt, D.W.: An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 11(2), 431–441 (1963)
Decoding Neural Signals with a Compact and Interpretable Convolutional Neural Network

Artur Petrosyan, Mikhail Lebedev, and Alexey Ossadtchi

National Research University Higher School of Economics, Moscow, Russia
[email protected], [email protected]
https://bioelectric.hse.ru/en/
Abstract. In this work, we motivate and present a novel compact CNN. For architectures that combine adaptation in both space and time, we describe a theoretically justified approach to interpreting the temporal and spatial weights. We apply the proposed architecture to the Berlin BCI IV competition data set and to our own data sets to decode the electrocorticogram into finger kinematics. Without feature engineering, our architecture delivers decoding accuracy similar to or better than that of the BCI competition winner. After training the network, we interpret the solution (the spatial and temporal convolution weights) and extract physiologically meaningful patterns.
Keywords: Limb kinematics decoding · ECoG · Machine learning · Convolutional neural network

1 Introduction
The algorithms used to extract relevant neural modulations are a key component of a brain-computer interface (BCI) system. Most often, they implement signal conditioning, feature extraction, and decoding steps. Modern machine learning prescribes performing the last two steps simultaneously with deep neural networks (DNNs) [5]. DNNs automatically derive features in the context of the assigned regression or classification task. Interpretation of the computations performed by a DNN is an important step to ensure that the decoding is based on brain activity and not on artifacts only indirectly related to the neural phenomena at hand. A proper interpretation of the features obtained from the first several layers of a DNN can also benefit the automated knowledge discovery process. In the case of BCI development, one way to enable this is to use specific DNN architectures that reflect prior knowledge about the neural substrate of the specific neuromodulation used in a particular BCI.

Several promising and compact neural architectures have been developed in the context of EEG, MEG, and ECoG data analysis over recent years: EEGNet [4], DeepConvNet [8], LF-CNN and VAR-CNN [9]. By design, the weights of these DNNs are readily interpretable with the use of well-known approaches for understanding the weights of linear models [3]. However, extra care is needed to make such interpretations correct. Here we present another compact architecture, technically very similar to LF-CNN, but motivated by somewhat different arguments than those in [9]. We also provide a theoretically grounded approach to the interpretation of the temporal and spatial convolution weights and illustrate it using realistically simulated and real data.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 420–428, 2021. https://doi.org/10.1007/978-3-030-60577-3_50
2 Methods
We assume the phenomenological setting presented in Fig. 1. The activity $e(t)$ of a complex set of neural populations $G_1$–$G_I$, responsible for performing a movement act, gets translated into a movement trajectory by means of some, most likely non-linear, transformation $H$, i.e. $z(t) = H(e(t))$. There are also populations $A_1$–$A_J$ whose activity is not related to movement but impinges onto the sensors. We do not have direct access to the firing intensity $e(t)$ of the individual populations. Instead, we observe a $K$-dimensional vector of sensor signals $\mathbf{x}(t)$, which is traditionally modeled as a linear mixture of the local field potentials (LFPs) $\mathbf{s}(t)$ formed around the task-relevant populations and the task-irrelevant LFPs $\mathbf{f}(t)$. The task-relevant and task-irrelevant LFPs impinge onto the sensors with forward model matrices $\mathbf{G}$ and $\mathbf{A}$ correspondingly, i.e.

$$\mathbf{x}(t) = \mathbf{G}\mathbf{s}(t) + \mathbf{A}\mathbf{f}(t) = \sum_{i=1}^{I} \mathbf{g}_i s_i(t) + \sum_{j=1}^{J} \mathbf{a}_j f_j(t) \qquad (1)$$

We will refer to the task-irrelevant term recorded by our $K$ sensors as $\boldsymbol{\eta}(t) = \sum_{j=1}^{J} \mathbf{a}_j f_j(t)$. The LFPs are thought to be the result of the activity of the nearby populations, and the characteristic frequency of an LFP is related to the population size [1]. The envelope of an LFP then approximates the firing intensity of the proximal neuronal population. The inverse mapping is also most commonly sought in linear form, so that the estimates of the LFPs are obtained as a linear combination of the sensor signals, i.e. $\hat{\mathbf{s}}(t) = \mathbf{W}^T \mathbf{x}(t)$, where the columns of $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_M]$ are the spatial filters that aim to counteract the volume conduction effect and to tune away from the activity of the interference sources. Our goal is to approximate the kinematics $z(t)$ using the concurrently obtained indirect recordings $\mathbf{x}(t)$ of the activity of the neural populations. In general, we do not know $\mathbf{G}$, and the most straightforward approach is to learn the direct mapping $z(t) = F(\mathbf{x}(t))$.
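A minimal numpy sketch of the generative model (1) follows; the dimensions and the random topographies are illustrative assumptions, not the simulation settings used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(42)
K, I, J, T = 8, 4, 10, 1000        # sensors, task-relevant sources, interference sources, samples

G = rng.standard_normal((K, I))    # columns g_i: topographies of task-relevant LFPs
A = rng.standard_normal((K, J))    # columns a_j: topographies of interference LFPs

s = rng.standard_normal((I, T))    # task-relevant LFPs s(t)
f = rng.standard_normal((J, T))    # task-irrelevant LFPs f(t)

x = G @ s + A @ f                  # sensor signals, Eq. (1)
eta = A @ f                        # task-irrelevant term eta(t)
```

Each sensor channel is thus a weighted sum of all source time series, which is exactly the volume-conduction mixing the spatial filters W are meant to undo.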
3 Network Architecture
Fig. 1. Phenomenological model

Based on the above considerations, we have developed the compact adaptable architecture shown in Fig. 2. The key component of this architecture is an adaptive envelope extractor. Interestingly, the envelope extractor, a typical module widely used in signal processing, can be readily implemented using deep learning primitives. It comprises several convolutions used for band-pass and low-pass filtering, and a computation of the absolute value. We also use a non-trainable batch norm before the activation and standardize the input signals.
Fig. 2. The proposed compact DNN architecture
The envelope detectors receive spatially filtered sensor signals $s_m$ obtained by the pointwise convolution layer, which counteracts the volume conduction process modeled by the forward model matrix $\mathbf{G}$, see Fig. 1. Then, as mentioned earlier, we approximate the operator $H$ as some function of the lagged power of the source time series by means of a fully connected layer that mixes lagged samples of the envelopes $[e_m(n), \ldots, e_m(n - N + 1)]$ from all branches into a single prediction of the kinematics $z(n)$.
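The envelope-extraction pipeline (temporal band-pass convolution, rectification, low-pass smoothing) can be sketched in plain numpy; the filter coefficients and the test signal below are illustrative stand-ins, not the trained convolution weights of the network:

```python
import numpy as np

def envelope(signal, band_taps, smooth_len):
    """Band-pass convolution -> absolute value -> moving-average low-pass filter."""
    band = np.convolve(signal, band_taps, mode="same")   # temporal convolution h_m
    rectified = np.abs(band)                             # |.| non-linearity
    lp = np.ones(smooth_len) / smooth_len                # low-pass (smoothing) kernel
    return np.convolve(rectified, lp, mode="same")

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
amp = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)              # slow amplitude modulation
sig = amp * np.sin(2 * np.pi * 100 * t)                  # 100 Hz carrier

# crude band-pass around 100 Hz: a cosine-modulated Hamming window (illustrative)
n = np.arange(-32, 33)
band_taps = np.cos(2 * np.pi * 100 * n / fs) * np.hamming(len(n))
band_taps /= np.abs(band_taps).sum()

env = envelope(sig, band_taps, smooth_len=25)
```

Up to scaling, `env` tracks the slow modulation `amp`, which is exactly the quantity the fully connected layer consumes.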
4 Two Regression Problems and DNN Weights Interpretation
The proposed architecture processes the data in chunks $\mathbf{X}(t) = [\mathbf{x}(t), \mathbf{x}(t-1), \ldots, \mathbf{x}(t-N+1)]$ of some prespecified duration of $N$ samples. In the case when the chunk size $N$ equals the length of the first convolution layer weight vector $\mathbf{h}_m$, the processing of $\mathbf{X}(t)$ by the first two layers, applying spatial and temporal filtering, can simply be presented as

$$b_m(n) = \mathbf{w}_m^T \mathbf{X}(t) \mathbf{h}_m \qquad (2)$$

By design, the absolute-value non-linearity followed by the low-pass filtering performed by the second convolution layer extracts the envelopes of the estimates of the underlying rhythmic LFPs. Given the one-to-one mapping between the analytic signal and its envelope [2], we can mentally replace the task of optimizing the parameters of the first three layers of the architecture in Fig. 2 to predict the envelopes $e_m(t)$ with a simple regression task of adjusting the spatial and temporal filter weights to obtain the envelope's generating analytic signal $b_m(t)$, see Fig. 2. Fixing the temporal weights at their optimal value $\mathbf{h}_m^*$, the optimal spatial weights can be obtained as the solution of the following convex optimization problem:

$$\mathbf{w}_m^* = \arg\min_{\mathbf{w}_m} \left\{ \left\| b_m(n) - \mathbf{w}_m^T \mathbf{X}(t) \mathbf{h}_m^* \right\|_2^2 \right\} \qquad (3)$$

and similarly for the temporal convolution weights:

$$\mathbf{h}_m^* = \arg\min_{\mathbf{h}_m} \left\{ \left\| b_m(t) - \mathbf{w}_m^{*T} \mathbf{X}(t) \mathbf{h}_m \right\|_2^2 \right\} \qquad (4)$$

If we assume statistical independence of the neural sources $s_m(t)$, $m = 1, \ldots, M$, then (given the regression problem (3) and the forward model (1)) their topographies can be assessed as

$$\mathbf{g}_m = E\{\mathbf{Y}(t)\mathbf{Y}(t)^T\} \mathbf{w}_m^* = \mathbf{R}_m^Y \mathbf{w}_m^*, \qquad (5)$$

where $\mathbf{R}_m^Y = E\{\mathbf{Y}(t)\mathbf{Y}(t)^T\}$ is the $K \times K$ covariance matrix of the temporally filtered multichannel data $\mathbf{Y}(t) = \mathbf{X}(t)\mathbf{h}_m$, under the assumption that $x_k(t)$, $k = 1, \ldots, K$, are all zero-mean processes [3]. Then we observe the exactly symmetric recipe for interpreting the temporal weights. The temporal pattern can be found as

$$\mathbf{q}_m = E\{\mathbf{V}(t)\mathbf{V}(t)^T\} \mathbf{h}_m^* = \mathbf{R}_m^V \mathbf{h}_m^*, \qquad (6)$$

where $\mathbf{V}(t) = \mathbf{X}(t)^T \mathbf{w}_m^*$ is a chunk of the input signal passed through the spatial filter and $\mathbf{R}_m^V = E\{\mathbf{V}(t)\mathbf{V}(t)^T\}$ is a branch-specific $N \times N$ covariance matrix of the spatially filtered data. Here we again assume that $x_k(t)$, $k = 1, \ldots, K$, are zero-mean processes. Commonly, we explore the frequency-domain representation of the temporal pattern to get a sense of it, i.e. $Q_m(f) = \sum_{t=0}^{N-1} q_m(t) e^{-j 2\pi f t}$, where $q_m(t)$ is the $t$-th element of the temporal pattern vector $\mathbf{q}_m$. When the chunk of data is longer than the filter length, equation (2) has to be written with the convolution operation and will result not in a scalar but in a vector. In this case, using the standard Wiener filtering arguments, we arrive at

$$Q_m^*(f) = P_m^{yy}(f) H_m^*(f) \qquad (7)$$

as the expression for the Fourier-domain representation of the LFP activity pattern in the $m$-th branch. $H_m^*(f)$ in equation (7) is simply the Fourier transform of the temporal convolution weights vector $\mathbf{h}_m^*$.
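The recipe (5) can be checked numerically. The following numpy sketch (with assumed toy dimensions and spatially correlated noise) recovers a known topography from a spatial filter via the covariance matrix, in the spirit of [3]:

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 6, 20000

g_true = rng.standard_normal(K)                  # ground-truth topography of the source
s = rng.standard_normal(T)                       # unit-variance source time series
mix = rng.standard_normal((K, K))
noise = 0.5 * mix @ rng.standard_normal((K, T))  # spatially correlated noise
Y = np.outer(g_true, s) + noise                  # temporally filtered multichannel data Y(t)

R = (Y @ Y.T) / T                                # K x K covariance matrix R_Y
w = np.linalg.solve(R, Y @ s / T)                # a least-squares spatial filter for s

# Eq. (5): the pattern is the covariance applied to the filter weights
g_pattern = R @ w
```

Up to scaling, `g_pattern` matches `g_true`, while the filter weights `w` themselves need not resemble the topography when the noise is spatially correlated.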
5 Simulated and Real Data
In order to generate the simulated data, we precisely followed the setup described in our phenomenological diagram in Fig. 1 with the following parameters. We generated four task-related sources with rhythmic LFPs $s_i(t)$ as narrow-band processes obtained by filtering Gaussian pseudo-random sequences into the 30–80 Hz, 80–120 Hz, 120–170 Hz and 170–220 Hz bands using FIR filters. We added 10 task-unrelated sources per band with activation time series located in four bands: 40–70 Hz, 90–110 Hz, 130–160 Hz and 180–210 Hz. The kinematics $z(t)$ was generated as a linear combination of the four envelopes. To simulate the volume conduction effect, we simply randomly generated a 4 × 5-dimensional forward matrix $\mathbf{G}$ and a 40 × 5-dimensional forward matrix $\mathbf{A}$. We simulated 15 min of the synthetic data sampled at 1000 Hz and then split it into equal contiguous train and test parts. We used the open source ECoG + kinematics data set from BCI Competition IV, collected by Kubanek et al., to compare the decoding quality of our compact DNN to linear models with pre-engineered features. The winning solution provided by Liang and Bougrain [6] was chosen as the baseline in this comparison. Another data set is our own ECoG data, CBI (the Center for Bioelectric Interfaces), recorded with a 64-channel microgrid during self-paced flexion of each individual finger over 1 min. The ethics research committee of the National Research University Higher School of Economics approved the experimental protocol of this study.
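The simulation described above can be sketched in numpy; this is shortened to one minute of data, an FFT mask and an FFT-based Hilbert transform stand in for the paper's FIR filtering, and the mixing coefficients are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 1000
T = 60 * fs                       # 1 kHz sampling, 1 minute (the paper uses 15 min)

def narrowband(lo, hi):
    """Gaussian noise band-limited to [lo, hi] Hz via an FFT mask."""
    spec = np.fft.rfft(rng.standard_normal(T))
    f = np.fft.rfftfreq(T, d=1.0 / fs)
    spec[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(spec, n=T)

bands = [(30, 80), (80, 120), (120, 170), (170, 220)]
sources = np.array([narrowband(lo, hi) for lo, hi in bands])  # task-related LFPs s_i(t)

def env(x):
    """Envelope as the magnitude of the analytic signal (FFT Hilbert transform)."""
    spec = np.fft.fft(x)
    h = np.zeros(T)
    h[0] = 1.0
    h[1:T // 2] = 2.0
    h[T // 2] = 1.0
    return np.abs(np.fft.ifft(spec * h))

# Kinematics: a linear combination of the four envelopes
coeffs = rng.uniform(0.5, 1.5, 4)
z = coeffs @ np.array([env(s) for s in sources])
```

Mixing these sources into sensors through random forward matrices, as in Eq. (1), completes the synthetic recording.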
6 Simulated Data Results
We trained the algorithm on the simulated data to decode the kinematics $z(t)$ and then recovered the patterns of the sources that were found to be important for this task. Figure 3 shows that a good match with the simulated topographies of the true underlying sources is achieved only by the patterns computed using the branch-specific temporal filters. The spectral characteristics of the trained temporal filter weights exhibit characteristic dips in the bands corresponding to the activity of the interference sources. Using the estimation-theoretic approach (7), we obtain spectral patterns that closely match the simulated ones and in which the dips are compensated.
Fig. 3. Temporal and spatial patterns acquired for a noisy case, SNR = 1.5. See the main text for a more detailed description.
7 Real Data Results: BCI Competition IV
In the context of processing electrophysiological data, the main advantage of deep-learning-based architectures is their ability to perform automatic feature selection in regression or classification tasks [7]. We found that the architecture with the adaptive envelope detectors, applied to the Berlin BCI Competition IV data set, performs on par with or better than the winning solution [6], see Table 1.
8 Real Data Results: CBI Data
Table 2 shows the decoding accuracy achieved with the proposed architecture for four fingers in the two patients. In Fig. 4 we applied the interpretation of the obtained spatial and temporal weights in the same way as for the realistically simulated data. Below we show the interpretation plots for the index finger of Patient 1.
Fig. 4. The interpretation of network weights for the index finger decoder for Patient 1 from the CBI data set. Each plot line corresponds to one of the three trained decoder branches. The leftmost column shows the spatial filter weights mapped into colours, while the second and third columns correspond to the vanilla spatial patterns and the properly recovered ones. The line graphs interpret the temporal filter weights in the Fourier domain. The filter weights are presented by the solid line; the power spectral density (PSD) pattern of the underlying LFP is marked by the blue dashed line. The orange dashed line, which is more similar to the Fourier coefficients of the filter weights, is the PSD of the signal at the output of the temporal convolution block.

Table 1. Comparative performance of our model architecture (NET) and the winning solution (Winner) of the BCI IV competition, Data set 4: «finger movements in ECoG».

Subject 1   Thumb   Index   Middle   Ring   Little
Winner      0.58    0.71    0.14     0.53   0.29
NET         0.53    0.69    0.19     0.57   0.24

Subject 2   Thumb   Index   Middle   Ring   Little
Winner      0.51    0.37    0.24     0.47   0.35
NET         0.49    0.35    0.23     0.39   0.22

Subject 3   Thumb   Index   Middle   Ring   Little
Winner      0.69    0.46    0.58     0.58   0.63
NET         0.72    0.49    0.49     0.53   0.6
Table 2. Decoding performance obtained in two CBI patients. The results show the correlation coefficients between the actual and decoded finger trajectories for four fingers in two patients.

            Thumb   Index   Ring   Little
Subject 1   0.47    0.80    0.62   0.33
Subject 2   0.74    0.54    0.77   0.80
The DNN architecture for the CBI data had three branches, each tuned to a specific spatio-temporal pattern. We show the spatial filter weights and the vanilla and properly recovered patterns, interpreted using the expressions described in the Methods section. As can be seen in Fig. 4, the temporal filter weights (the solid line) clearly emphasize the frequency range above 100 Hz in the first two branches, while the actual spectral pattern of the source (the dashed line), in addition to the gamma-band content, has peaks at around 11 Hz (in the first and second branches) and in the 25–50 Hz range (in the second branch). These may correspond to the sensorimotor rhythm and to the lower components of the gamma rhythm, respectively. The third branch appears to be focused on a lower frequency range. Its spatial pattern is notably more diffuse than the patterns focused on the higher-frequency components in the first two branches. This is consistent with the phenomenon that the characteristic frequency and the size of a neural population are mutually related.
9 Conclusion
We introduced a novel compact and interpretable architecture motivated by the prior knowledge present in the field. We also extended the weight interpretation approach described earlier in [3] to the interpretation of the temporal convolution weights. We performed experiments with the proposed approach using both simulated and real data. On the simulated data set, the proposed architecture was able to almost exactly recover the underlying neuronal substrate contributing to the kinematic time series that it was trained to decode. We applied the proposed architecture to the real data set of the BCI IV competition. Our neural network achieved decoding accuracy similar to that of the winning solution of the BCI competition [6]. Unlike the traditional approach, our DNN model does not require any feature engineering. On the contrary, after training the structure to decode the finger kinematics, we are able to interpret both the temporal and the spatial convolution weights and to extract the physiologically meaningful patterns that correspond to them.

Acknowledgement. This work is supported by the Center for Bioelectric Interfaces NRU HSE, RF Government grant, ag. No. 14.641.31.0003.
References

1. Buzsaki, G.: Rhythms of the Brain. Oxford University Press, New York (2006)
2. Hahn, S.L.: On the uniqueness of the definition of the amplitude and phase of the analytic signal. Signal Process. 83(8), 1815–1820 (2003)
3. Haufe, S., Meinecke, F., Görgen, K., Dähne, S., Haynes, J.D., Blankertz, B., Bießmann, F.: On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage 87, 96–110 (2014)
4. Lawhern, V.J., Solon, A.J., Waytowich, N.R., Gordon, S.M., Hung, C.P., Lance, B.J.: EEGNet: a compact convolutional network for EEG-based brain-computer interfaces. arXiv preprint arXiv:1611.08024 (2016)
5. Lemm, S., Blankertz, B., Dickhaus, T., Müller, K.R.: Introduction to machine learning for brain imaging. Neuroimage 56(2), 387–399 (2011)
6. Liang, N., Bougrain, L.: Decoding finger flexion from band-specific ECoG signals in humans. Front. Neurosci. 6, 91 (2012). https://doi.org/10.3389/fnins.2012.00091
7. Roy, Y., Banville, H., Albuquerque, I., Gramfort, A., Falk, T.H., Faubert, J.: Deep learning-based electroencephalography analysis: a systematic review. J. Neural Eng. 16(5), 051001 (2019)
8. Schirrmeister, R.T., Springenberg, J.T., Fiederer, L.D.J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W., Ball, T.: Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG. arXiv preprint arXiv:1703.05051 (2017)
9. Zubarev, I., Zetter, R., Halme, H.L., Parkkonen, L.: Adaptive neural network classifier for decoding MEG signals. NeuroImage 197, 425–434 (2019)
Estimation of the Complexity of the Classification Task Based on the Analysis of Variational Autoencoders

Andrey A. Brynza and Maria O. Korlyakova

Federal State Budgetary Educational Institution of Higher Education «Bauman Moscow State Technical University» (Kaluga Branch), Moscow, Russia
[email protected]
Abstract. The problem of constructing a criterion that allows the complexity of a problem to be estimated by evaluating the inner layer of a variational autoencoder is considered. A variational autoencoder was modeled for image classification tasks of several levels of complexity. A complexity estimate based on the study of the inner layer of the autoencoder, in the form of measured distances between distributions, was formed. The calculated complexity estimates allow classification tasks to be ranked in agreement with an expert rating.

Keywords: Similarity score · Evaluation of neural network architectures · Variational autoencoder · Analysis of the latent space
1 Introduction

The development and dissemination of neural networks as a technique for data analysis and pattern recognition poses the problem of finding methods for forming a neural network architecture and selecting its hyperparameters in automatic or semi-automatic modes [1]. A number of projects are focused on this issue, among them Google AutoML (focused on a building-block technology and a controlled process of combining the blocks based on meta-models [2]), AutoKeras (using Bayesian optimization to choose the hyperparameters and even the implementation scheme of a deep neural network [3]), Auto-Sklearn (oriented toward solving the CASH problem when selecting hyperparameters [4]), and others. Modern approaches to determining the size of a neural network offer options for selecting hyperparameters or even structural elements based on various search strategies (grid search, evolutionary models [5], selection and adaptation of blocks [6] from some tasks to solve other tasks). However, such a search always requires significant resources and involves the construction of many candidate models in the course of forming the final scheme. Determining the complexity of the task at the initial stage of the analysis is therefore one of the most relevant and significant directions in neural networks, as it allows resource costs to be reduced.

Different types of classification tasks imply different approaches to their solution, both in terms of the chosen training concept and in terms of the formed architecture. An obvious technique for assessing the complexity of a task is to estimate the size of a model that has solved it with a given quality. It is assumed that:

– the deeper the network, the more difficult the task,
– the smaller the size of the fragments and the number of classes, the easier the task.

However, this principle can be true only for certain classes of problems. It is proposed to evaluate the complexity of image classification tasks by analyzing the internal representation of the classes in the trained decoder of a variational autoencoder. Let us consider the features of the formation and functioning of a VAE.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 429–437, 2021. https://doi.org/10.1007/978-3-030-60577-3_51
2 Variational Autoencoder

An autoencoder is a network consisting of two interconnected networks, called the encoder and the decoder (see Fig. 1(a)). The purpose of the encoder is to take the input data and convert it into a more compact representation.

Fig. 1. a) Autoencoder scheme; b) VAE latent space [7]

As a rule, the encoder is trained simultaneously with the other parts of the neural network by backpropagation, which allows it to perform the compression required for a specific task. The compressed representation is much smaller than the input, so the encoder loses some of the information, but at the same time it tries to preserve as much relevant information as possible. In turn, the decoder is trained to receive the compressed data and correctly recover the input fragments from it:

$$P(Z|X) = N(\mu(X), \Sigma(X))$$

A variational autoencoder (VAE) is a widely used generative model [8] that creates objects close, in some metric, to the training ones (see Fig. 1(b)):

$$P(X; \theta) = \int_Z P(X|Z; \theta)\, P(Z)\, dZ \qquad (1)$$
The residual density is defined as follows:

$$P(X|Z) = e^{-\frac{(X - f(Z))^T (X - f(Z))}{2\sigma^2}},$$

where $(X - f(Z))^T (X - f(Z))$ is the distance between $X$ and its projection. At some point $Z^*$ this distance reaches its minimum; the selection of the point $Z^*$ where the distance is minimal is performed by an optimization process. Expanding $f(Z)$ in a Taylor series around $Z^*$ and assuming that $P(Z)$ is a fairly smooth function that does not change much in the neighborhood of $Z^*$, we get:

$$P(X) = P(Z^*)\, e^{-\frac{(X - f(Z^*))^T (X - f(Z^*))}{2\sigma^2}} \int_z e^{-(Z - Z^*)^T W(X)^T W(X)\,(Z - Z^*)}\, dz,$$

where $W(X) = \nabla f(Z^*)/\sigma$, $Z^* = g(X)$, and the integral on the right is the $n$-dimensional Euler-Poisson integral. Then the final estimate is:

$$P(X) = P(Z^*)\, e^{-\frac{(X - f(Z^*))^T (X - f(Z^*))}{2\sigma^2}} \sqrt{\frac{1}{\det(W(X)^T W(X)/2\pi)}}, \quad Z^* = g(x).$$

It follows that $P(X)$ depends on:

• the distance between the input vector and its reconstruction: the worse the reconstruction, the smaller $P(X)$;
• the probability density $P(Z^*)$ at the point $Z^* = g(x)$;
• the normalization of $P(Z)$ at the point $Z^*$.

Generative models are used to reproduce random data that looks similar to the training dataset. However, it is often necessary to change or investigate variations of data that is already available, and not randomly but in a certain desired way. In this case VAEs work better than the other currently available methods, because their latent space is continuous, which allows random transformations and interpolation. The continuity of the latent space is achieved as follows: the encoder produces not one vector but two, the vector of mean values $\mu$ and the vector of standard deviations $\sigma$. The $i$-th elements of the vectors $\mu$ and $\sigma$ are the mean and the standard deviation of the $i$-th random variable $X_i$ [9]. Together, these values form an $n$-dimensional random vector, which is sent to the decoder for data recovery. This stochastic generation means that even for the same input the encoding result will differ due to the randomness of the sampling of the encoding vector [10]. It has been noticed that for simple models (MNIST) the internal space of a VAE looks organized in the sense of the placement of objects of the same classes (it is a reflection of the original descriptions in a new form). Let us try to use this fact in analyzing the complexity of a task, using the VAE as a model tester for the dataset. It is assumed that the simpler the classification task, the more often the structure of the placement of objects in the latent space of the VAE is repeated and the more stable it is to noise in the input images.
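The stochastic encoding described above (often called the reparameterization trick) can be sketched as follows; the toy "encoder" is a placeholder assumption, since in a real VAE μ and σ are network outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 10

def encode(x):
    """Placeholder encoder producing the two vectors: means mu and std deviations sigma."""
    mu = np.tanh(x[:latent_dim])
    sigma = np.exp(-np.abs(x[latent_dim:2 * latent_dim]))  # strictly positive
    return mu, sigma

def sample_latent(mu, sigma):
    """z = mu + sigma * eps with eps ~ N(0, I): the stochastic encoding."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

x = rng.standard_normal(2 * latent_dim)
mu, sigma = encode(x)
z1 = sample_latent(mu, sigma)
z2 = sample_latent(mu, sigma)
# z1 differs from z2: the same input yields different codes due to the random eps
```

It is exactly these μ and σ vectors, collected over the test fragments, that the complexity analysis below inspects.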
3 Task Complexity Analysis

For any task, the generative process has the form (1), where $P(X)$ is the probability of a particular image, in principle, being drawn (if a fragment does not belong to the class, the probability is low, and vice versa), $P(Z)$ is the probability distribution of the hidden factors, and $P(X|Z)$ is the probability distribution of fragments for given hidden factors. Each specific value of $X$ can be produced by only a small subset of $Z$; for the others, $P(X|Z)$ is close to zero. Similarity of two latent-space distributions to each other indicates low complexity of the task; reduced similarity of the hidden-space distributions $P_1(Z)$ and $P_2(Z)$ indicates increased complexity. Let us consider the following tasks:

1) MNIST [11] - a conditionally simple task. The accuracy of the solver is 99.5% [12].
2) Angle measuring system (AMS) - recognition of the type of the angles of a calibration template. The task is complicated by the small size of the training sample and by high distortion due to rotation. Possible overlap and blurring of fragments further complicate the classification. The quality of the solution on a simple computer with a 3-layer convolutional network is 87.3% [13].
3) CIFAR-10 - a dataset consisting of 60000 fragments in 10 classes [14]. The quality of the solution using powerful computers is ~98.5%. We consider this task difficult, because a solution on a comparatively weak computer in a short time interval will not give such accuracy (Fig. 2).
Fig. 2. Fragments of the samples: a) MNIST; b) AMS; c) CIFAR-10
Each fragment is scaled to 28 × 28 pixels and normalized. At the first stage, we build the VAE with the following architecture:

Encoder: Input (28 × 28) → Conv1 (32 maps, 3 × 3, stride 2) → ReLU → Conv2 (64 maps, 3 × 3, stride 2) → ReLU → FC layer (20 outputs)

Decoder: Input (1 × 1 × 10) → Conv′1 (64 maps, 7 × 7 × 10, stride 7) → ReLU → Conv′2 (64 maps, 3 × 3 × 64, stride 2) → Conv′3 (32 maps, 3 × 3 × 64, stride 2) → ReLU → Conv′4 (1 map, 3 × 3 × 32) → ReLU
Because the CIFAR-10 and MNIST samples are large, we construct the principal components of the mean values and standard deviations for 2000 fragments per experiment; for the second task, 200 fragments are used. We pass the test fragments through the trained encoder and at the output obtain the following distributions of the means and standard deviations by class (see Fig. 3). The distributions of objects share a general character of the arrangement of attributes for each of the classes. The points in the figure are placed at the centers of the most distinguishable, maximally distant classes. A slight displacement of the centers is observed.
Fig. 3. Distributions of μ and σ for fragments of the 1st and 2nd experiments passed through the VAE
To reduce the likelihood of distortion of the calculated centers by individual «outliers», whose features after recovery are more similar to the features of another class, an outlier elimination procedure is introduced based on a modification of the KOLAPS cluster analysis algorithm [15]; it eliminates up to 25% of the test-sample elements that distort the distribution in each experiment. As a result, the class groups become sharper. Considering the centers of the classes as vertices of a graph, we can evaluate the measure of similarity of the positions of the classes relative to each other using the method of separation of partial isomorphism [16]. A mismatch of the differentiation processes at any step of the integration of the codes of structural differences means a lack of similarity between the structures.
4 Evaluation of the Similarity of Distributions as a Criterion for the Complexity of a Task

To obtain a weighted average estimate over the entire subset, we use measures of the distance between two probability distributions:

– Kullback–Leibler distance (K-L) [17], a non-negative functional that is an asymmetric measure of the distance between two probability distributions:

KL = −(1/2) Σᵢ₌₁ᴺ (1 + ln σᵢ² − μᵢ² − σᵢ²)
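This K-L term is the familiar VAE regularizer measuring the distance of the latent distribution from N(0, I). A small stdlib-only helper (the function name is ours) evaluates it:

```python
import math

def vae_kl(mu, logvar):
    """K-L term: -1/2 * sum_i (1 + ln sigma_i^2 - mu_i^2 - sigma_i^2),
    with logvar_i = ln sigma_i^2."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

# At mu = 0, sigma = 1 the latent matches N(0, I) and the term vanishes;
# any deviation of the mean or variance makes it positive.
vae_kl([0.0, 0.0], [0.0, 0.0])
```

Classes whose latent codes drift far from the prior (or from each other) therefore produce larger K-L values, which is what the comparison below exploits.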
A. A. Brynza and M. O. Korlyakova
– Fisher's criterion (KF) [18], the ratio of the variances of the experimental deviations: F = σ₁²/σ₂².

It is assumed that adding noise to fragments of the test sample barely affects the principal components of the means and standard deviations for a task of easy complexity but, on the contrary, affects them strongly for a high-complexity task. We test the hypothesis by adding uniform noise with zero mean and unit dispersion:

e = 2√3 · (n/100) · (rand(I) − 0.5),

where n is the image noise level in percent and rand(I) is a matrix of pseudo-random numbers of the same size as the original image. The readability threshold lies at the level of 35–40% (see Fig. 4(e), Fig. 5); beyond it, the Similarity coefficient (KP), the Fisher criterion, and the K-L measure are calculated incorrectly for this problem.
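The factor 2√3 · (n/100) makes the standard deviation of e exactly n/100: a uniform variable on [−a/2, a/2] has variance a²/12, which equals (n/100)² when a = 2√3 · n/100. A quick pure-Python check (the function name is ours):

```python
import random

def noise(n, size):
    """Uniform noise e = 2*sqrt(3) * (n/100) * (rand - 0.5):
    zero mean, standard deviation n/100."""
    a = 2 * 3 ** 0.5 * n / 100
    return [a * (random.random() - 0.5) for _ in range(size)]

random.seed(0)
e = noise(30, 200_000)                      # 30% noise level
mean = sum(e) / len(e)                      # close to 0
var = sum((x - mean) ** 2 for x in e) / len(e)  # close to 0.3**2 = 0.09
```

To add the noise to an image, e is generated element-wise with the image's own shape and summed with the pixel values.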
Fig. 4. View of fragments of the MNIST dataset with noise level: a) without noise, b) 5%, c) 10%, d) 20%, e) 30%, f) 40%
Noise growth distorts the obtained distributions. However, up to the 30% level, the general dynamics of behavior is maintained, and the K-L measure, like the KP (see Fig. 8(a, b), Table 1), stays close in value, from which we conclude that the problem is simple. The variation in the KF value is due to the significant effect of noise on the dispersion. Consider similar estimates for the other tasks (Fig. 6):

– AMS. The readability level under noise for this task is up to 10%. Increasing noise distorts the obtained distributions already at the 10% noise level, although some general dynamics of behavior is still preserved (see Fig. 8(a, b), Table 1). KF allows an evaluation only at noise levels up to 5%; beyond that the picture is distorted. The K-L measure and the similarity coefficient remain relatively close to each other, but less so than in the previous problem, from which it follows that the task is more complicated.

– CIFAR-10. The decoder trained on our computer cannot produce a clear recovery even for basic fragments; the result is a blurry picture. Test fragments and the distributions of µ and σ become hardly distinguishable already at the 5% noise level (see Fig. 7). An increase in the KF value is observed, indicating the absence of any similarity between class variances. With increasing noise, the K-L measure decreases (see Fig. 8(a), Table 1). The graph similarity coefficient is not informative, because it cannot reflect the actual shift of the class centers.
Fig. 5. Distribution of µ and σ for experiments 1 and 2 with 30% noise level

Table 1. Fisher's measure of the relationship between graphs (class centers)

Noise, % | Graph 1-2             | Graph 2-3             | Graph 3-4
         | MNIST  AMS    CIFAR   | MNIST  AMS    CIFAR   | MNIST  AMS    CIFAR
–        | 1.01   2.04   1.00    | 1.12   1.12   3.29    | 1.05   1.01   3.47
5        | 4.11   2.09   1.27    | 3.02   2.917  4.18    | 3.75   3.08   1.21
10       | 4.30   5.66   1.99    | 3.26   11.9   5.31    | 4.3    8.08   2.03
20       | 3.44   12.5   4.56    | 2.70   21.5   13.4    | 3.40   21.7   4.76
Fig. 6. Fragments of the test sample with 10% noise level
Fig. 7. Distribution of µ and σ for CIFAR-10 with 5% noise addition
Fig. 8. Diagrams of the change in magnitude of a) the K-L measure, b) the Similarity coefficient
For the CIFAR-10 task, due to the bad recovery quality of the VAE, the resulting distributions are not amenable to analysis. Thus, the analysis technique has the following form:

1. Build VAE variants for the proposed set of examples and for examples with noise. It suffices to train without overfitting on a fixed (test) VAE architecture. All types of tasks must be trained with the same volume of internal space, i.e., fitting the test model concerns only the size of the latent space.
2. If the K-L, KP, and KF measures stay within the boundaries of a task type, the task can be qualified as equivalent to a simple, intermediate, or complex one.
3. The choice of a neural network architecture is then limited to architectures similar to those used for tasks of equivalent complexity.
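Step 2 of the technique can be sketched as a simple boundary check. The boundary values below are purely illustrative, not taken from the paper; the function and class names are ours:

```python
def qualify_task(kl, kp, kf, bounds):
    """Return the first complexity class whose bounds contain all three
    measures (K-L, similarity coefficient KP, Fisher criterion KF).
    `bounds` maps class name -> ((kl_lo, kl_hi), (kp_lo, kp_hi), (kf_lo, kf_hi))."""
    for name, ((a, b), (c, d), (e, f)) in bounds.items():
        if a <= kl <= b and c <= kp <= d and e <= kf <= f:
            return name
    return "unknown"

# Illustrative boundaries only, ordered from strictest to loosest:
bounds = {
    "simple":       ((0.0, 2.0),  (0.8, 1.0), (0.5, 2.0)),
    "intermediate": ((0.0, 5.0),  (0.5, 1.0), (0.5, 5.0)),
    "complex":      ((0.0, 99.0), (0.0, 1.0), (0.0, 99.0)),
}
qualify_task(1.2, 0.9, 1.1, bounds)  # -> "simple"
```

In practice the boundaries would be calibrated on reference tasks of known complexity (as with MNIST, AMS, and CIFAR-10 above) before qualifying a new task.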
5 Conclusions

We analyzed the influence of task complexity on the structure of the VAE's latent space. It is shown that the complexity of the task determines the placement of classes in the latent space. Using the Kullback–Leibler measure, the Similarity coefficient, and the Fisher criterion, one can assess the complexity of the task under consideration at an initial stage and determine its class. There is no need to obtain a perfect model or to worry about the training quality of the test VAE. Thus, we propose a methodology for testing the complexity of classification tasks and for approximately estimating the size of a neural network needed to solve them.
References

1. Zhang, Y., Yang, Q.: A Survey on Multi-Task Learning (2017). https://arxiv.org/abs/1707.08114
2. Ghiasi, G., Lin, T.-Y., Le, Q.V.: NAS-FPN: learning scalable feature pyramid architecture for object detection (2019). https://openaccess.thecvf.com/content_CVPR_2019/papers/Ghiasi_NAS%20FPN_Learning_Scalable_Feature_Pyramid_Architecture_for_Object_Detection_CVPR_2019_paper.pdf
3. Du, M., Yang, F., Zou, N., Hu, X.: Fairness in Deep Learning: A Computational Perspective (2020). https://arxiv.org/pdf/1908.08843.pdf
4. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and Robust Automated Machine Learning (2015). https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
5. Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y.L., Tan, J., Le, Q., Kurakin, A.: Large-Scale Evolution of Image Classifiers (2017). https://arxiv.org/abs/1703.01041
6. Cai, H., Zhu, L., Han, S.: Direct neural architecture search on target task and hardware (2019). https://arxiv.org/pdf/1812.00332.pdf
7. Doersch, C.: Tutorial on Variational Autoencoders (2016). https://arxiv.org/abs/1606.05908
8. Blei, D., Kucukelbir, A.: Variational Inference: A Review for Statisticians (2018). https://arxiv.org/abs/1601.00670v9
9. Li, Y., Yosinski, J., Clune, J., Lipson, H.: Convergent Learning: Do Different Neural Networks Learn the Same Representations? Accessed 28 Feb 2016
10. Subbotin, S.A.: A set of characteristics and comparison criteria for training samples for solving diagnostic and pattern recognition problems (1) (2010)
11. MNIST Database of handwritten digits. http://www.machinelearning.ru/wiki/index.php?title=MNIST_database_of_handwritten_digits
12. Description of obtaining the accuracy of solving the MNIST task. https://xu932.github.io/blog/kaggle/2020-03-29-digit-recognizer
13. Brynza, A.A., Korlyakova, M.O.: Assessment of the complexity of the neural network convolutional classifier. In: International Scientific and Technical Conference Neuroinformatics-2018, Collection of Scientific Papers (2018)
14. The CIFAR-10 dataset. https://www.cs.toronto.edu/~kriz/cifar.html
15. Pestunov, I.A., Sinyavsky, Yu.N.: Clustering algorithms in satellite image segmentation problems. Bull. KemSU (2) (2012)
16. Pogrebnoy, A.V.: A method for determining the similarity of graph structures based on the allocation of partial isomorphism in geoinformatics problems. News TPU (11) (2015)
17. Maturana, D., Mery, D., Soto, A.: Face recognition with local binary patterns, spatial pyramid histograms and naive Bayes nearest neighbor classification. In: Proceedings of the XXVIII International Conference of the Chilean Computer Science Society, IEEE CS Society (2009)
18. Kobzar, A.I.: Applied Mathematical Statistics. Fizmatlit, 816 p. (2006)
Author Index
A A. Kozubenko, Evgeny, 143 Adeshkin, Vasily, 175 Afonin, Andrey N., 65 Andreev, Matvei, 228 Artamonov, Igor, 198 Artamonova, Yana, 198 Asadullayev, Rustam G., 65 B Bakhshiev, Aleksandr, 339 Barinov, Oleg, 222 Bartuli, Elena, 205 Beskhlebnova, Galina A., 112, 330 Bodrina, Natalya I., 309 Brynza, Andrey A., 429 Burikov, Sergey, 234 Burlakov, Evgenii, 51 Bushov, Yuri V., 37 C Chentsov, Serge V., 269 Chernov, Ivan A., 303 Chizhov, Anton V., 157 D Danko, Sergey, 44 Darkhovsky, Boris S., 137 Dolenko, Sergey, 58, 222, 234 Dolenko, Tatiana, 234 Dolzhenko, Alexandr V., 175 Dorofeev, Vladislav, 118 Dorofeev, Vladislav P., 293, 357 Drobakha, Victor, 205
Dubnov, Yuri A., 137 Dunin-Barkowski, Witali L., 293, 357 E Efitorov, Alexander, 58, 198, 228 Engel, Ekaterina A., 374 Engel, Nikita E., 374 F Farzetdinova, Rimma, 71 Fedorenko, Yuriy S., 276 Fedyaev, Oleg I., 262 Filatova, Natalya N., 309 Fomin, Nikita, 316 G Gapanyuk, Yuriy E., 242 Glyzin, Sergey D., 347 Gomzina, Anastasia, 249 Gorbachenko, Vladimir I., 412 Grodzitsky, Lev, 316 Gurtovoy, Konstantin, 85 I Igonin, Dmitry M., 125, 184 Isaev, Igor, 234 Ivanov, Alexey, 394 Ivanov, Daniil, 394 K Kaplan, Alexander Y., 137 Karandashev, Yakov, 363 Kartashov, Sergey I., 37
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 B. Kryzhanovsky et al. (Eds.): NEUROINFORMATICS 2020, SCI 925, pp. 439–441, 2021. https://doi.org/10.1007/978-3-030-60577-3
Kiselev, Mikhail, 394 Kniaz, Vladimir V., 316 Knyaz, Vladimir A., 316 Knyazeva, Irina, 44 Kolganov, Pavel A., 125, 184 Korlyakova, Maria O., 429 Korsakov, Anton, 339 Kotenko, Igor, 256 Kotov, Vladimir B., 101, 330 Kovalev, Alexey K., 283 Kozeletskaya, Margarita G., 157 Krivetskiy, Valeriy, 228 Kulikova, Sofya, 205 L Laptinskiy, Kirill, 234 Lebedev, Alexander, 118 Lebedev, Alexander E., 293, 357 Lebedev, Mikhail, 420 M Makarenko, Nikolay, 44 Makarenko, Stepan, 212 Malkov, Ivan, 51 Malykh, Pavel, 249 Malykhina, Galina, 249 Mamedov, Nurlan, 205 Markov, Ilya, 249 Meilikov, Evgeny, 71 Mikhailyuk, Taras, 387 Mizginov, Vladimir, 316 Moloshnikov, Ivan, 403 Myagkova, Irina, 222 N Naumov, Andrey E., 175 Naumov, Anton, 212 Nuidel, I. V., 10 O Orlov, Vyacheslav, 58 Orlov, Vyacheslav A., 37 Ossadtchi, Alexey, 420 P Panov, Aleksandr I., 212, 283 Papazyan, Ares, 316 Parin, S. B., 10
Petrosyan, Artur, 420 Piryatinska, Alexandra, 137 Podladchikova, Lubov N., 143 Podtikhov, Artur, 283 Polevaya, S. A., 10 Polyakov, Alexandr, 175 Popkov, Alexey Y., 137 Preobrazhenskaia, Margarita M., 347 Pushkareva, Maria, 363 R Ragachev, Pavel, 205 Red’ko, Vladimir G., 112 Revunkov, Georgiy I., 242 Rezanov, Alexander, 167 Rybka, Roman, 403 S Saenko, Igor, 256 Sarmanova, Olga, 234 Sboev, Aleksandr, 403 Serenko, Alexey, 403 Shaban, Makhmud, 283 Shakirov, Vladimir, 118 Shakirov, Vladimir V., 293, 357 Shamrayev, Anatoliy A., 65 Shaposhnikov, Dmitry G., 143 Shelomentseva, Inga G., 269 Shemagina, O. V., 10 Shirokii, Vladimir, 58, 198, 222 Shumsky, S. A., 3 Sidorov, Konstantin V., 309 Sitnikova, Maria A., 65 Skorik, Fadey, 256 Smirnitskaya, Irina A., 149 Sokhova, Zarema B., 93, 101 Staroverov, Alexey, 212 Stenkin, Dmitry A., 412 Svetlik, Mikhail V., 37 T Talalaev, Dmitry V., 381 Taran, Maria O., 242 Tarkhov, Dmitry, 249 Tarkov, Mikhail S., 303 Tereshin, Valeriy, 249 Tiselko, Vasilii S., 157
Tiumentsev, Yury V., 125, 184 Tkachev, Nikolay M., 262 U Ushakov, Vadim L., 37 Ushakov, Vadim, 51, 58 V Vasilyev, Oleg, 198 Verkhlyutov, Vitaly, 51 Vetlin, Vladislav, 212 Vladimirov, Roman, 222
Vlasov, Danila, 403 Vvedensky, Victor, 85 Y Yakhno, V. G., 10 Yudin, Dmitry, 167 Yudin, Dmitry A., 175 Yulia, Boytsova, 44 Z Zhernakov, Sergey, 387