English Pages 264 Year 2010
Copyright © 2010. IOS Press, Incorporated. All rights reserved.
BIOLOGICALLY INSPIRED COGNITIVE ARCHITECTURES 2010
Biologically Inspired Cognitive Architectures 2010 : Proceedings of the First Annual Meeting of the BICA Society, edited by A. V.
Frontiers in Artificial Intelligence and Applications

FAIA covers all aspects of theoretical and applied artificial intelligence research in the form of monographs, doctoral dissertations, textbooks, handbooks and proceedings volumes. The FAIA series contains several sub-series, including “Information Modelling and Knowledge Bases” and “Knowledge-Based Intelligent Engineering Systems”. It also includes the biennial ECAI, the European Conference on Artificial Intelligence, proceedings volumes, and other ECCAI – the European Coordinating Committee on Artificial Intelligence – sponsored publications. An editorial panel of internationally well-known scholars is appointed to provide a high quality selection.

Series Editors: J. Breuker, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras, R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong

Volume 221

Recently published in this series

Vol. 220. R. Alquézar, A. Moreno and J. Aguilar (Eds.), Artificial Intelligence Research and Development – Proceedings of the 13th International Conference of the Catalan Association for Artificial Intelligence
Vol. 219. I. Skadiņa and A. Vasiļjevs (Eds.), Human Language Technologies – The Baltic Perspective – Proceedings of the Fourth Conference Baltic HLT 2010
Vol. 218. C. Soares and R. Ghani (Eds.), Data Mining for Business Applications
Vol. 217. H. Fujita (Ed.), New Trends in Software Methodologies, Tools and Techniques – Proceedings of the 9th SoMeT_10
Vol. 216. P. Baroni, F. Cerutti, M. Giacomin and G.R. Simari (Eds.), Computational Models of Argument – Proceedings of COMMA 2010
Vol. 215. H. Coelho, R. Studer and M. Wooldridge (Eds.), ECAI 2010 – 19th European Conference on Artificial Intelligence
Vol. 214. I.-O. Stathopoulou and G.A. Tsihrintzis, Visual Affect Recognition
Vol. 213. L. Obrst, T. Janssen and W. Ceusters (Eds.), Ontologies and Semantic Technologies for Intelligence
Vol. 212. A. Respício et al. (Eds.), Bridging the Socio-Technical Gap in Decision Support Systems – Challenges for the Next Decade
Vol. 211. J.I. da Silva Filho, G. Lambert-Torres and J.M. Abe, Uncertainty Treatment Using Paraconsistent Logic – Introducing Paraconsistent Artificial Neural Networks
Vol. 210. O. Kutz et al. (Eds.), Modular Ontologies – Proceedings of the Fourth International Workshop (WoMO 2010)
ISSN 0922-6389 (print) ISSN 1879-8314 (online)
Biologically Inspired Cognitive Architectures 2010 Proceedings of the First Annual Meeting of the BICA Society
Edited by
Alexei V. Samsonovich George Mason University, USA
Kamilla R. Jóhannsdóttir University of Akureyri, Iceland
Antonio Chella University of Palermo, Italy
and
Ben Goertzel Novamente LLC, USA
Amsterdam • Berlin • Tokyo • Washington, DC
© 2010 The authors and IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 978-1-60750-660-7 (print)
ISBN 978-1-60750-661-4 (online)
Library of Congress Control Number: 2010938639

Publisher
IOS Press BV
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 687 0019

Distributor in the USA and Canada
IOS Press, Inc.
4502 Rachael Manor Drive
Fairfax, VA 22032
USA
fax: +1 703 323 3668
LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved.
Preface

This volume documents the proceedings of the First International Conference on Biologically Inspired Cognitive Architectures (BICA 2010), which is also the First Annual Meeting of the BICA Society. This conference was preceded by the 2008 and 2009 AAAI Fall Symposia on BICA, which were similar in content (indeed, the special issue of the International Journal of Machine Consciousness¹ is composed of a selection of papers and abstracts from all three events, and it is an official complement of this Proceedings volume). The 2008–2009 BICA symposia were in turn preceded by a sequence of DARPA BICA meetings in 2005–2006 (see below). However, BICA 2010 is the first independent event in the BICA series: it has the status of the first annual meeting of the newly established BICA Society (further information is available at http://bicasociety.org).

¹ A.V. Samsonovich (guest editor): International Journal of Machine Consciousness, special issue on Biologically Inspired Cognitive Architectures, Vol. 2, No. 2, 2010.

Like the 2008 and 2009 BICA Symposia, the present BICA 2010 conference contains a wide variety of ideas and approaches, all centered around the theme of understanding how to create general-purpose humanlike artificial intelligence using inspirations from studies of the brain and the mind. BICA is no modest pursuit: the long-term goals are no less than understanding how human and animal brains work, and creating artificial intelligences with comparable or greater functionality. But, in addition to these long-term goals, BICA research is also yielding interesting and practical research results right now.

A cognitive architecture, broadly speaking, is a computational framework for the design of intelligent and even conscious agents. Cognitive architectures may draw their inspiration from many sources, including pure mathematics or physics or abstract theories of cognition. A biologically inspired cognitive architecture (BICA), in particular, is one that incorporates formal mechanisms from computational models of human and animal cognition, drawn from cognitive science or neuroscience.

The appeal of the BICA approach should be obvious: currently human and animal brains provide the only physical examples of the level of robustness, flexibility, scalability and consciousness that we want to achieve in artificial intelligence. So it makes sense to learn from them regarding cognitive architectures: both for research aimed at closely replicating human or animal intelligence, and also for research aimed at creating and using human-level artificial intelligence more broadly.

Research on the BICA approach to intelligent agents has focused on several different goals. Some BICA projects have a primary goal of accurately simulating human behavior, either for purely scientific reasons – to understand how the human mind works – or for applications in domains such as entertainment, education, military training, and the like. Others are concerned with even deeper correspondence between models and the human brain, going down to the neuronal and sub-neuronal level. The goal in this approach is to understand how the brain works. Yet another approach is concerned with designing artificial systems that are successful, efficient, and robust at performing
cognitive tasks that today only humans can perform, tasks that are important for practical applications in human society and require interaction with humans. Finally, there are BICA projects aimed broadly at creating generally intelligent software systems, without focus on any one application area, but also without a goal of closely simulating human behavior. All four of these goals are represented in the various papers contained in this volume.

The term BICA was coined in 2005 by the Defense Advanced Research Projects Agency (DARPA), when it was used as the name of a DARPA program administered by the Information Processing Technology Office (IPTO). The DARPA BICA program was terminated in 2006 (more details are available at the DARPA BICA web page at http://www.darpa.mil/ipto/programs/bica/bica.asp). Our usage of the term “BICA” is similar to its usage in the DARPA program; however, the specific ideas and theoretical paradigms presented in the papers here include many directions not encompassed by DARPA’s vision at that time. Moreover, there is no connection between DARPA and the BICA Society.

One of the more notable aspects of the BICA approach is its cross-disciplinary nature. The human mind and brain are not architected based on the disciplinary boundaries of modern science, and to understand them almost surely requires rich integration of ideas from multiple fields including computer science, biology, psychology and mathematics. The papers in this volume reflect this cross-disciplinarity in many ways.

Another notable aspect of BICA is its integrative nature. A well-thought-out BICA has a certain holistic integrity to it, but also generally contains multiple subsystems, which may in some cases be incorporated into different BICAs, or used in different ways than the subsystem’s creator envisioned.
Thus, the reader who is developing their own approach to cognitive architectures may find many insights in the papers contained here useful for inspiring their own work or even importing into their own architecture, directly or in modified form.

Finally, we would like to call attention to the relationship between cognition, embodiment and development. In our view, to create a BICA with human-level general intelligence, it may not be necessary to engineer all the relevant subsystems in their mature and complete form. Rather, it may be sufficient to understand the mechanisms of cognitive growth in a relatively simple form, and then let the mature forms arise via an adaptive developmental process. In this approach, one key goal of BICA research becomes understanding what the key cognitive subsystems are, how they develop, and how they become adaptively integrated in a physical or virtual situated agent able to perform tight interactions with its own body, other entities and the surrounding environment. With this sort of understanding in hand, it might well be possible to create a BICA with human-level general learning capability, and teach it like a child. Potentially, a population of such learners could ignite a cognitive chain reaction of learning from each other and from common resources, such as human instructors or the Internet.

Currently BICA research is still at an early stage, and the practical capabilities of BICA systems are relatively undeveloped. Furthermore, the relationships between the ideas of various researchers in the field are not always clear; and there is considerable knowledge in relevant disciplines that is not yet fully incorporated into our concrete BICA designs. But BICA research is also rapidly developing, with each year bringing significant new insights, moving us closer to our ambitious goals.
In this sense, the newborn BICA Society, according to the intentions of the Founding Members, will be a main vehicle for the growth and dissemination of breakthrough research in the field of BICA systems. The papers presented in this volume form part of this ongoing process, as will the papers in the ongoing BICA conferences to follow.

Alexei V. Samsonovich, Kamilla R. Jóhannsdóttir, Antonio Chella and Ben Goertzel
Editors
BICA 2010 Conference Committees

Organizing Committee

Chairs
Alexei V. Samsonovich (George Mason University, USA)
Kamilla R. Jóhannsdóttir (University of Akureyri, Iceland)

Core
Igor Aleksander (Imperial College London, UK)
Bernard J. Baars (The Neurosciences Institute, USA)
Antonio Chella (University of Palermo, Italy)
Ben Goertzel (Novamente LLC, USA)
Stephen Grossberg (Boston University, USA)
Christian Lebiere (Carnegie Mellon University, USA)
David C. Noelle (University of California Merced, USA)
Roberto Pirrone (University of Palermo, Italy)
Frank E. Ritter (Penn State University, USA)
Murray P. Shanahan (Imperial College London, UK)
Kristinn R. Thorisson (CADIA; Reykjavik University, Iceland)

Program Committee
Samuel S. Adams (Watson IBM Research, USA)
Itamar Arel (University of Tennessee, USA)
Son K. Dao (HRL Laboratories, LLC, USA)
Scott E. Fahlman (Carnegie Mellon University, USA)
Ian Fasel (University of Arizona, USA)
Stan Franklin (University of Memphis, USA)
Eva Hudlicka (Psychometrix Assoc., USA)
Magnus Johnsson (Lund University Cognitive Science, Sweden)
Alexander A. Letichevsky (Glushkov Institute of Cybernetics, Ukraine)
Ali A. Minai (University of Cincinnati, USA)
Shane T. Mueller (Klein Associates Division / ARA Inc., USA)
Brandon Rohrer (Sandia National Laboratories, USA)
Ricardo Sanz (Universidad Politécnica de Madrid, Spain)
Colin T. Schmidt (Le Mans University & Arts et Metiers ParisTech, France)
Josefina Sierra (Technical University of Catalonia, Spain)
Terry Stewart (University of Waterloo, Canada)
Andrea Stocco (Carnegie Mellon University, USA)
Bruce Swett (Decisive Analytics Corporation, USA)
Rodrigo Ventura (Instituto Superior Técnico, Portugal)
Pei Wang (Temple University, USA)
Juyang (John) Weng (Michigan State University, USA)
Reviewers James S. Albus Igor Aleksander Itamar Arel Bernard J. Baars Jonathan Brickliln Antonio Chella Son K. Dao Scott E. Fahlman Ian Fasel Stan Franklin Ben Goertzel Stephen Grossberg Wan Ching Ho Eva Hudlicka Kamilla R. Jóhannsdóttir Magnus Johnsson Benjamin Johnston Christian Lebiere Alexander A. Letichevsky Ali A. Minai Jonathan H. Morgan Shane T. Mueller David C. Noelle Rony Novianto
Roberto Pirrone Lorenzo Riano Frank E. Ritter Brandon Rohrer Paul Rosenbloom Alexei Samsonovich Ricardo Sanz Colin T. Schmidt Michael Sellers Murray P. Shanahan Josefina Sierra Terry Stewart Andrea Stocco Leopold Stubenberg Bruce Swett Kristinn R. Thórisson Peter Tripodes Akshay Vashist Robert N. VanGulick Craig M. Vineyard Rodrigo Ventura Pei Wang Mark Waser Juyang (John) Weng
The BICA 2010 conference was held on Friday, Saturday and Sunday, November 12–14, 2010, in Arlington, Virginia, USA.
Contents

Preface
    Alexei V. Samsonovich, Kamilla R. Jóhannsdóttir, Antonio Chella and Ben Goertzel
BICA 2010 Conference Committees

Conference Papers and Extended Abstracts

Reverse Engineering the Vision System
    James Albus
Application Feedback in Guiding a Deep-Layered Perception Model
    Itamar Arel and Shay Berant
NeuroNavigator: A Biologically Inspired Universal Cognitive Microcircuit
    Giorgio A. Ascoli and Alexei V. Samsonovich
BINAReE: Bayesian Integrated Neural Architecture for Reasoning and Explanation
    Robert (Rusty) Bobrow, Paul Robertson and Robert Laddaga
SCA-Net: A Sensation-Cognition-Action Network for Speech Processing
    Michael Connolly Brady
A Connectionist Model of MT+/MSTd Explains Human Heading Perception in the Presence of Moving Objects
    N. Andrew Browning
Discovering the Visual Patterns Elicited by Human Scan-Path
    Andrea Carbone
An Architecture for Humanoid Robot Expressing Emotions and Personality
    Antonio Chella, Rosario Sorbello, Giorgio Vassallo and Giovanni Pilato
An Evolutionary Approach to Building Artificial Minds
    James L. Eilbert
Explanatory Aspirations and the Scandal of Cognitive Neuroscience
    Ross W. Gayler, Simon D. Levy and Rens Bod
An Experimental Cognitive Robot
    Pentti O.A. Haikonen
Dopamine and Self-Directed Learning
    Seth Herd, Brian Mingus and Randall O’Reilly
Modelling Human Memory in Robotic Companions for Personalisation and Long-Term Adaptation in HRI
    Wan Ching Ho, Kerstin Dautenhahn, Mei Yii Lim and Kyron Du Casse
Assessing the Role of Metacognition in GMU BICA
    Michael Q. Kalish, Alexei V. Samsonovich, Mark A. Coletti and Kenneth A. De Jong
Towards Understanding Trust Through Computational Cognitive Modeling
    William G. Kennedy
An Externalist and Fringe Inspired Cognitive Architecture
    Riccardo Manzotti
Architecture of the Mind with Artificial Neurons
    Deepak J. Nath
Online Event Segmentation in Active Perception Using Adaptive Strong Anticipation
    Bruno Nery and Rodrigo Ventura
Four Kinds of Learning in One Agent-Oriented Environment
    Sergei Nirenburg, Marjorie McShane, Stephen Beale, Jesse English and Roberta Catizone
Attention in the ASMO Cognitive Architecture
    Rony Novianto, Benjamin Johnston and Mary-Anne Williams
On the Emergence of Novel Behaviours from Complex Non Linear Systems
    Lorenzo Riano and T.M. McGinnity
GRAVA – Context Programming
    Paul Robertson and Robert Laddaga
Implementing First-Order Variables in a Graphical Cognitive Architecture
    Paul Rosenbloom
Cathexis: An Emotional Basis for Human-Like Learning
    Michael Sellers
Biological and Psycholinguistic Influences on Architectures for Natural Language Processing
    John F. Sowa
A Bio-Inspired Model for Executive Control
    Narayan Srinivasa and Suhas E. Chelian
Neural Symbolic Decision Making: A Scalable and Realistic Foundation for Cognitive Architectures
    Terrence C. Stewart and Chris Eliasmith
The Role of the Basal Ganglia – Anterior Prefrontal Circuit as a Biological Instruction Interpreter
    Andrea Stocco, Christian Lebiere, Randall C. O’Reilly and John R. Anderson
Learning to Recognize Objects in Images Using Anisotropic Nonparametric Kernels
    Douglas Summers-Stay and Yiannis Aloimonos
Disciple Cognitive Agents: Learning, Problem Solving Assistance, and Tutoring
    Gheorghe Tecuci, Mihai Boicu, Dorin Marcu and David Schum
Attention Focusing Model for Nexting Based on Learning and Reasoning
    Akshay Vashist and Shoshana Loeb
A Neurologically Plausible Artificial Neural Network Computational Architecture of Episodic Memory and Recall
    Craig M. Vineyard, Michael L. Bernard, Shawn E. Taylor, Thomas P. Caudell, Patrick Watson, Stephen Verzi, Neal J. Cohen and Howard Eichenbaum
Validating a High Level Behavioral Representation Language (HERBAL): A Docking Study for ACT-R
    Changkun Zhao, Jaehyon Paik, Jonathan H. Morgan and Frank E. Ritter

Manifesto

Introducing the BICA Society
    Alexei V. Samsonovich, Kamilla R. Jóhannsdóttir, Andrea Stocco and Antonio Chella

Review

Toward a Unified Catalog of Implemented Cognitive Architectures
    Alexei V. Samsonovich

Subject Index
Author Index
Conference Papers and Extended Abstracts
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-3
Reverse Engineering the Vision System

James ALBUS
Krasnow Institute for Advanced Studies, George Mason University
4400 University Drive MS 2A1, Fairfax, VA 22030-4444, USA
Abstract
The vision system is perhaps the most well understood part of the neocortex. The input from the eyes consists of a set of images made up of pixels that are densely packed in the fovea and less so in the periphery. Each pixel is represented by a vector of attributes such as color, brightness, spatial and temporal derivatives. Pixels from each eye are registered in the lateral geniculate nucleus and projected to the cortex where they are processed by a hierarchy of array processors that detect features and patterns and compute their attributes, state, and relationships. These array processors consist of Cortical Computational Units (CCUs) made up of cortical hypercolumns and their underlying thalamic and other subcortical nuclei. Each CCU is capable of performing complex computational functions and communicating with other CCUs at the same and higher and lower levels. The entire visual processing hierarchy generates a rich, colorful, dynamic internal representation that is consciously perceived to be external reality. It is suggested that it may be possible to reverse engineer the human vision system in the near future [1].
References [1] J.S. Albus, Reverse Engineering the Brain, International Journal of Machine Consciousness 2 (2010), 193-211.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-4
Application Feedback in Guiding a Deep-Layered Perception Model

Itamar Arel (Department of Electrical Engineering & Computer Science, University of Tennessee) and Shay Berant (Binatix, Inc., Palo Alto, CA)
Abstract. Deep-layer machine learning architectures continue to emerge as a promising biologically-inspired framework for achieving scalable perception in artificial agents. State inference is a consequence of robust perception, allowing the agent to interpret the environment with which it interacts and map such interpretation to desirable actions. However, in existing deep learning schemes, the perception process is guided purely by spatial regularities in the observations, with no feedback provided from the target application (e.g. classification, control). In this paper, we propose a simple yet powerful feedback mechanism, based on adjusting the sample presentation distribution, which guides the perception model in allocating resources for patterns observed. As a result, a much more focused state inference can be achieved leading to greater accuracy and overall performance. The proposed paradigm is demonstrated on a small-scale yet complex image recognition task, clearly illustrating the advantage of incorporating feedback in a deep-learning based cognitive architecture. Keywords. Deep-layered machine learning, perception, spatiotemporal inference.
Introduction

Perception is at the core of intelligent systems. The vast amount of information that humans (and advanced robotic systems) are exposed to every second of the day is driven by sensory inputs that span a huge observation space. The latter is due to the natural complexity of the world with which such systems interact. This inestimable amount of information must somehow be represented efficiently if one is to function successfully in the real world. Deep machine learning (DML) is an emerging field [1] within cognitive computing which may be viewed as a framework for effectively coping with vast amounts of sensory information.

One of the key challenges facing the field of cognitive computing is perception in high-dimensional sensory inputs. An application domain in which this challenge clearly arises is pattern recognition in large images, where an input may comprise millions of pixels. These millions of simultaneous input variables span an enormous space of possible observations. In order to infer the content perceived, a system is required to map each observation to a possible set of recognized patterns. However, due to a phenomenon known as the curse of dimensionality [2], the complexity of training a system to map observations to recognized pattern classes grows exponentially with the number of input variables. Such growth primarily pertains to the number of examples the system must be presented with before becoming adequately proficient.
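To make the exponential growth concrete, the following sketch uses a simple grid-coverage argument that is an illustrative assumption rather than something taken from the paper: if each input variable is quantized into a fixed number of bins, seeing every combination at least once requires one example per grid cell. The function name `cells_needed` and the bin counts are hypothetical.

```python
def cells_needed(bins_per_variable: int, num_variables: int) -> int:
    """Count the grid cells obtained when each input variable is split
    into the same number of bins (an illustrative assumption)."""
    return bins_per_variable ** num_variables


if __name__ == "__main__":
    # Even a coarse two-bin quantization grows exponentially with the
    # number of input variables.
    for num_variables in (1, 2, 10, 20):
        print(num_variables, cells_needed(2, num_variables))
```

Under this assumption, doubling the number of input variables squares the number of cells, which suggests why brute-force sampling is not an option for images with millions of pixels.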
A common approach to overcoming the curse of dimensionality is to pre-process the data in a manner that reduces its dimensionality to a level that can be effectively processed by a classification module, such as a multi-layer perceptron (MLP) artificial neural network. Such dimensionality reduction is often referred to as feature extraction. Its goal is to retain the key information needed to correctly classify the input within a lower-dimensional space. As a result, it can be argued that the intelligence behind many pattern recognition systems has shifted to human-engineered feature extraction processes, which at times are very difficult to design and highly application-dependent. Moreover, if incomplete, distorted or erroneous features are extracted, classification performance may degrade significantly.

Some recent neuroscience findings [3][4] have provided insight into the principles governing information representation in the mammalian brain, leading to new ideas for designing systems that represent information. One of the key findings has been that the neocortex, which is associated with many cognitive abilities, does not explicitly pre-process sensory signals, but rather allows them to propagate through a complex hierarchy of modules that, over time, learn to represent observations based on the regularities they exhibit. Such hierarchical representation offers many advantages, including robustness to a diverse range of noise and distortions in the data, as well as the ability to cope with missing or erroneous inputs.

DML continues to emerge as a promising, biologically-inspired framework for complex pattern inference. A key assumption in DML is that representation is driven by regularities in the observations. As one ascends the hierarchical architecture of DML systems, more abstract notions are formed. Hence, in the higher layers of the hierarchy, scope is gained while detail is lost. This appears to be a pragmatic trade-off, as well as a biologically plausible one.
In the context of artificial general intelligence (AGI) [5], one can view perception as being identical to modeling data, in the sense that partial observations of a large visual field are utilized in inferring the state of the world with which the agent interacts. In most existing deep learning schemes [6][7] there is either no relationship or only a weak one between the (unsupervised) training of the model (DML) engine and the decision-making modules. This forces DML systems to form a representation based purely on regularities in the observations, rather than one that is also driven by the application at hand (e.g. visual pattern recognition). It is well known, for example, that neurons in layer IV of the neocortex receive all of the synaptic connections from outside the cortex (mostly from the thalamus), and themselves make short-range, local connections to other cortical layers. This suggests that learning may not be driven exclusively by regularities in the observations, but rather co-guided by external signals.

In this paper we present an elegant methodology for guiding the representation of a DML system such that it serves as a more relevant perception engine, yielding improved classification accuracy. The approach is based on adjusting the DML sample presentation distribution during training such that relevant salient features can be hierarchically captured.

The rest of this paper is structured as follows. In Section 1 we outline the proposed deep learning system and its operational modes. Section 2 describes the proposed feedback-based scheme for guiding DML representation. Section 3 describes the simulation results, while in Section 4 conclusions are drawn.
1. Deep-layered Inference Engine

The proposed DML architecture comprises a hierarchy of multiple layers, each hosting a set of identical cortical circuits (or nodes), which are homogeneous across the entire system, as illustrated in Figure 1. Each node models the inputs it receives in an identical manner to all other nodes. This modeling, which can be viewed as a form of lossy compression, essentially represents the inputs in a compact form that captures only the dominant regularities in the observations. The system is trained in an unsupervised manner by exposing the hierarchy to a large set of observations and letting the salient attributes of the inputs be formed across the layers. Next, signals are extracted from this deep-layered inference engine and passed to a supervised classifier for the purpose of robust pattern recognition. Robustness here refers to the ability to exhibit classification invariance to a diverse range of transformations and distortions, including noise, scale, rotation, displacement, etc.
Figure 1. Deep-layered visual perception network comprising multiple layers hosting identical cortical circuits. The lowest layer of the hierarchy receives raw sensory inputs. Features generated by the cortical circuits are passed as input to a supervised classifier.
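The layered structure in Figure 1 can be illustrated with a minimal sketch. The node model used here, the mean of a non-overlapping 2x2 patch, is only a stand-in for the learned lossy compression described in the text and is not the authors' cortical circuit; the function names and the requirement that the image be square with a power-of-two side length are likewise assumptions made for illustration.

```python
def node(patch):
    """Stand-in node model: compress a patch to its mean (lossy)."""
    return sum(patch) / len(patch)


def layer(grid):
    """Apply the same node to every non-overlapping 2x2 patch of a square grid."""
    size = len(grid)
    out = []
    for r in range(0, size, 2):
        row = []
        for c in range(0, size, 2):
            patch = [grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(node(patch))
        out.append(row)
    return out


def hierarchy(image):
    """Return the output grid of every layer, lowest layer first.

    Assumes a square image whose side length is a power of two.
    """
    outputs = []
    grid = image
    while len(grid) > 1:
        grid = layer(grid)
        outputs.append(grid)
    return outputs


def features(image):
    """Flatten all layer outputs into one feature vector for a classifier."""
    return [v for grid in hierarchy(image) for row in grid for v in row]
```

Each higher layer covers a wider region of the input with fewer values, so scope is gained while detail is lost; the flattened outputs play the role of the features passed to the supervised classifier.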
The internal signals of the cortical circuits comprising the hierarchy may be viewed as forming a feature space, thus capturing salient characteristics of the observations. The top layers of the hierarchy capture broader, more abstract, features of the input data, which are often most relevant for the purpose of pattern recognition. The nature of this deeply-layered inference architecture involves decomposing high-dimensional inputs into smaller patches, representing these patched in a compact manner and hierarchically learning the relationships between these representations across multiple scales. The underlying assumption is that input signal proximity is coherent with the nature of the data structure that is being represented. As an example, two pixels in an image, which are in close proximity, are assumed to exhibit stronger correlation than that exhibited by two pixels that are very distant. This assumption
holds firmly for many natural modalities, including natural images and videos, radar images and frequency components of an audio segment. In a face recognition application, for example, the output of the classifier may be a single value denoting whether or not the input pattern corresponds to a particular person. DML combined with a classifier may be viewed as a general semi-supervised learning framework. Training the system can be summarized as follows. During the first step, a set of unlabeled samples (i.e. inputs/observations that do not have a known class label associated with them) is provided as input to the DML engine. The engine learns from such samples the general structure of the sensory input space it is presented with. During the second step, a set of labeled samples (i.e. inputs that have distinct class labels associated with them) is provided as input. Signals are extracted to a classifier, which is then trained in a supervised manner on the labeled set. Testing is then achieved by presenting unseen observations and evaluating the output of the classifier relative to the actual image class.
2. Application Feedback for Improved Perception
As described above, the semi-supervised framework that applies to most deep machine learning schemes implies a strict decoupling between the model (i.e. unsupervised training of the DML architecture) and the application (i.e. classifier). Learning to represent the input space based purely on regularities in the observations appears elegant. However, a more pragmatic approach is to guide the learning process of the DML engine so that it is of greater relevance to the classifier. For example, if the observations exhibit regularities that do not help the classifier perform well, they may as well be ignored or discarded by the model. Thus, we propose a feedback mechanism between the classifier and the deep learning engine such that the representation is optimized for the classification process. The proposed feedback mechanism adapts the distribution with which samples (i.e. observations) are presented to the DML engine, based on results obtained from the classification process. To do so, the DML and classifier trainings are performed concurrently, rather than in succession. This is somewhat of a paradigm shift from existing DML methodologies, but one that we argue is vital. The classifier considered is a simple MLP feedforward neural network. As opposed to uniformly presenting samples from the unlabeled set to the DML engine, the sample presentation distribution is modified such that observations that need to be reinforced are presented more frequently. The need to reinforce presentations is derived from the measured classification error, such that observations (i.e. input samples) that yield relatively high errors are presented more frequently to the DML engine.
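The feedback idea can be illustrated with a minimal sketch, assuming the per-sample classification errors are already available as an array. It blends the current presentation distribution with the normalized errors, in the spirit of the convex update used in the simulations. The function names, the mixing weight `alpha`, the fallback for an all-zero error vector, and sampling with replacement are illustrative assumptions, not details given by the authors.

```python
import numpy as np

def update_presentation_probs(p, errors, alpha=0.9):
    """Convex blend of the current distribution and the normalized errors."""
    p = np.asarray(p, dtype=float)
    errors = np.asarray(errors, dtype=float)
    total = errors.sum()
    if total <= 0.0:
        # Assumption: with no error signal, keep the distribution unchanged.
        return p.copy()
    return alpha * p + (1.0 - alpha) * errors / total

def select_batch(p, batch_size, rng):
    """Draw sample indices for one batch according to p (with replacement)."""
    return rng.choice(len(p), size=batch_size, replace=True, p=p)

# Toy run: five samples, uniform start, sample 2 is classified poorly.
rng = np.random.default_rng(0)
N = 5
p = np.full(N, 1.0 / N)
errors = np.array([0.05, 0.05, 0.40, 0.05, 0.05])
p_next = update_presentation_probs(p, errors, alpha=0.5)
batch = select_batch(p_next, batch_size=100, rng=rng)
```

Because both terms of the blend sum to one, the result remains a valid probability distribution, and the poorly classified sample receives the largest share of presentations.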
3. Simulation Results
The simulation results pertain to a simple image classification scenario. The goal is to provide an example highlighting the advantage of adjusting the sample distribution in an online manner. A database consisting of training and test sets of images was created synthetically. Each of the two sets contained 500 images belonging to 9 classes (the letters 'C','G','H','M','P','R','T','X','Z'). Each class was represented by a template image. Every image in both sets of the database was created from one of the template
images, with random distortion applied. The distortion included scaling, rotation, erosion and application of additive noise. Sample images from the test set are illustrated in Figure 2.
Figure 2. Examples of letter images distorted randomly and used in evaluating the proposed DML system.
The system was trained in a supervised manner on the training set, whereby the classifier target was a vector filled with '0's except for a single '1' in the location corresponding to the image label. Testing was conducted on test set images, which are guaranteed to differ from those in the training set. During the training phase, a model and an MLP neural network classifier were trained concurrently, in two modes. In the first mode, there was no classifier-model feedback involved, and the sample distribution remained uniform throughout the entire learning process. In the second mode, a feedback mechanism was applied in order to influence the sample presentation distribution, such that images with high classification error were presented more frequently, as described above. In both modes, training was performed in batches. During each batch, 100 images were randomly preselected for presentation from the N images comprising the training set. In the first mode, these images were selected uniformly and independently at random. In the second mode, an adaptive presentation scheme was applied. At the end of each batch, after the DML parameter update, the classification error for each of the images in the training set was evaluated. The sample presentation distribution was then updated by applying a simple convex summation of the form

$$p_i^{t+1} = \alpha\, p_i^{t} + (1-\alpha)\, \frac{e_i^{t}}{\sum_{i=1}^{N} e_i^{t}}, \quad i = 1, \ldots, N, \quad t = 2, 3, \ldots \qquad (1)$$
where $p_i^t$ denotes the probability of selecting image $i$ for presentation at batch $t$ (initially $1/N$), $e_i^t$ the classification error for the $i$th image at batch $t$ (calculated as the element-wise mean of the absolute difference between the classifier output and the target vector), and 0 […]

$$[\Psi]_{ij} = \begin{cases} 0 & \text{if } [a_R]_{ij} < 0 \\ \dfrac{[a_R]_{ij}}{\sqrt{\sum_{i,j:\,[a_R]_{ij} > 0} [a_R]_{ij}^{2}}} & \text{otherwise} \end{cases} \qquad (3)$$
It can be shown that this procedure can be viewed as a statistical estimation process, in which the probability amplitude is estimated starting from the sample probability amplitude A.
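The estimation step can be sketched as follows, under stated assumptions: the sample probability amplitude A is taken as the element-wise square root of jointly normalized term-document counts, $a_R$ is taken to be the rank-R truncated SVD of A, and negative entries are zeroed before renormalizing so that the squared entries sum to one. The function names and the toy count matrix are hypothetical.

```python
import numpy as np

def amplitude_matrix(counts):
    """Square root of sample probabilities (assumed normalized over all entries)."""
    counts = np.asarray(counts, dtype=float)
    return np.sqrt(counts / counts.sum())

def truncated_svd(A, rank):
    """Rank-`rank` approximation of A obtained from its SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def probability_amplitude(a_R):
    """Zero the negative entries, then rescale to unit sum of squares."""
    positive = np.where(a_R > 0.0, a_R, 0.0)
    norm = np.sqrt(np.sum(positive ** 2))
    if norm == 0.0:
        raise ValueError("no positive entries to normalize")
    return positive / norm

# Toy term-document counts: 4 terms, 3 documents.
counts = np.array([[3, 0, 1],
                   [0, 2, 0],
                   [1, 0, 4],
                   [0, 1, 1]])
A = amplitude_matrix(counts)
a_R = truncated_svd(A, rank=2)
psi = probability_amplitude(a_R)
```

After this step the squared entries of `psi` can again be read as probabilities, which is what makes the truncated approximation interpretable as a probability amplitude.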
2. The Cognitive Architecture
The architecture of the presented system is inspired by the approach illustrated in [14], [15]. As shown in Figure 1, it is organized in three main areas: the sub-conceptual area, the emotional area, and the behavioral area.
2.1. The Sub-Conceptual Area
The sub-conceptual area receives stimuli from the different modalities of the robot and also controls its actuators at a low level. Two main modules compose this area:
• The MotionModule controls the actuators of the robot in order to perform the movement requested by the Behavioral Area.
• The PerceptualModule processes the raw data coming from the robot sensors in order to obtain information about the environment and the external stimuli.
The perceptual information obtained by processing the raw sensor data is associated with an English description: for example, if the system recognizes a “red hammer”, the percept is associated with the sentence “I can see a red hammer”. Using a verbal description of the information retrieved from the environment makes it easy to map that information into the emotional area. These natural language descriptions, one for each modality, are the input of the conceptual area.
2.2. The Emotional Area
The emotional area enables the robot to find emotional analogies between the current status and the previous knowledge stored on the robot, using the semantic space of emotional states. The associative area is built to reflect and encode not only emotions and the objects that provoke them, but also the personality of the robot. This holds in two ways:
• the documents used to induce the associative space characterize its dimensions and the organization of concepts within the space;
• personality is also encoded as a set of knoxels that play the role of “attractors” of perceived objects within the space.
A corpus of documents dealing with emotional states has been chosen in order to infer the space. Emotional states have been coded as “emotional knoxels” in this space using verbal descriptions of situations capable of evoking feelings and reactions. Incoming environmental stimuli are also encoded using natural language sentences. We have selected the following emotional expressions: sadness, fear, anger, joy, surprise, love and a neutral state. A corpus of 1000 documents, equally distributed among the seven states and including other documents characterizing the personality of the robot, has been built. This set of documents represents the emotional knowledge base of the robot. The excerpts have been organized in paragraphs that are homogeneous in both text length and emotion. A matrix has been organized in which the 6 emotional states and the neutral state have also been coded according to the procedure illustrated in the previous section. Each document has been processed in order to remove all “stopwords”, i.e. words that do not carry semantic information. An 87x1000 term-document matrix A has been created, where M=80+7 is the number of words plus the emotional states and N=1000 is the number of excerpts. The generic entry $a_{ij}$ of the matrix is the square root of the sample probability of the i-th word belonging to the j-th document. The TSVD technique, with K=150, has been applied to A in order to obtain its best probability amplitude approximation Ψ. This process leads to the construction of a K=150 dimensional conceptual space of emotions S. The axes of S represent the “fundamental” emotional concepts automatically induced from the data by the TSVD procedure. For each emotional state corresponding to one of the six “basic emotions” $E_i$, a subset of $n_i$ documents has been projected into S using the folding-in technique, i.e. each excerpt is coded as the sum of the vectors representing the terms composing it.
As a result, the j-th excerpt belonging to the subset corresponding to the emotional state $E_i$ is represented in S by an associated vector $em_j^i$, and the emotional state $E_i$ is represented by the set of vectors

$$\{\, em_j^i : j = 1, 2, \ldots, n_i \,\}. \qquad (4)$$
The personality of the robot is encoded as a set of knoxels derived from documents describing the fundamental personality characteristics of the robot. As an example, if the robot has a “shy” personality, a set of documents dealing with shyness, bashfulness, diffidence, sheepishness, reserve, reservedness, introversion, reticence, timidity, and so on is used to construct a cloud of “personality knoxels” that represent attraction points for the external stimuli that provoke emotions in the robot. The i-th excerpt belonging to the subset corresponding to the personality characteristic is therefore mapped as a vector $p_i$ in S. The inputs from the sense channels are coded as natural language words or sentences describing them and projected into the conceptual space using the folding-in technique. These vectors, representative of the inputs from the channels, are merged as a weighted sum into a single vector f(t) that synthesizes the input stimuli from the environment at instant t. The input stimulus is then “biased” by adding the contribution of the personality attractors in the space, computed as the weighted sum of the “personality knoxels” $p_i$ representing the personality of the robot. Each personality knoxel can be weighted with a coefficient $\alpha_i$ (with $0 \le \alpha_i \le 1$) in order to fine-tune the influence of personality upon the robot’s behaviors.
$$\hat{f}(t) = \frac{f(t) + \sum_i \alpha_i p_i}{\left\| f(t) + \sum_i \alpha_i p_i \right\|} \qquad (5)$$
This procedure mirrors the common process in human beings whereby reality is “filtered” and interpreted through personality. The emotional semantic similarity between the vector $\hat{f}(t)$ and the knoxels that code the six emotions in S, plus the “neutral” state, can be evaluated using the cosine similarity measure between each $em_j^i$ and $\hat{f}(t)$:

$$\mathrm{sim}(\hat{f}(t), em_j^i) = \frac{\hat{f}(t) \cdot em_j^i}{\|\hat{f}(t)\| \, \|em_j^i\|} \qquad (6)$$

A higher value of $\mathrm{sim}(\hat{f}(t), em_j^i)$ corresponds to a higher similarity between the emotion evoked by the input and the emotion $E_i$ associated with the vector $em_j^i$. The semantic similarity measure is calculated between $\hat{f}(t)$ and each $em_j^i$. The vector $em_{j^*}^i$ that maximizes the quantity expressed in formula (6) determines the inferred emotional state $E_i$. This process activates the emotional stimulus “i” with an intensity given by:

$$\mathrm{Intensity}_{ij} = \mathrm{sim}(\hat{f}(t), em_{j^*}^i) \qquad (7)$$
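The inference chain of this section (personality bias, cosine similarity, selection of the best-matching emotional knoxel) can be sketched as below. This is a toy illustration in a 3-dimensional space rather than the 150-dimensional space S; the vectors, labels, and function names are invented for the example, and all vectors are assumed to be non-zero.

```python
import numpy as np

def bias_with_personality(f, knoxels, alphas):
    """Add the weighted personality knoxels to the stimulus and normalize."""
    biased = f + np.sum(alphas[:, None] * knoxels, axis=0)
    return biased / np.linalg.norm(biased)

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def infer_emotion(f_hat, emotion_knoxels):
    """Return the emotion whose knoxel is most similar, and that similarity."""
    best_label, best_sim = None, -np.inf
    for label, vectors in emotion_knoxels.items():
        for v in vectors:
            s = cosine(f_hat, v)
            if s > best_sim:
                best_label, best_sim = label, s
    return best_label, best_sim

# Hypothetical toy data.
f = np.array([1.0, 0.2, 0.0])                  # merged input stimulus f(t)
personality = np.array([[0.0, 1.0, 0.0]])      # one personality knoxel
alphas = np.array([0.5])                       # its weight, in [0, 1]
emotion_knoxels = {
    "joy": np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "sadness": np.array([[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]),
    "neutral": np.array([[0.0, 0.0, 1.0]]),
}

f_hat = bias_with_personality(f, personality, alphas)
emotion, intensity = infer_emotion(f_hat, emotion_knoxels)
```

The returned similarity doubles as the intensity of the activated emotion, so raising a personality weight both shifts which emotion wins and changes how strongly it is expressed.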
2.3. The Behavioral Area
The purpose of the Behavioral Area is to manage and execute behaviors coherently with the emotional state inferred by the Emotional Area, the personality of the robot, the environmental status, the stimulus given by the user, and the behaviors recently adopted by the robot. A behavior is described as a sequence of primitive actions sent directly to the robot actuators. Each emotional state $E_i$ is related to several behaviors $b_k^{(E_i)}$ in order to give the robot a non-monotonous, non-deterministic, and non-boring response. The choice of the behavior is a function g() of the emotion aroused in the robot (itself a function of the environment, the stimulus perceived and the personality of the robot) and of the behaviors recently adopted by the robot. Among the behaviors associated with the emotional state $E_i$, the behavior $b_k^{(E_i)}$ is selected by evaluating a score associated with each of them:

$$\mathrm{Bestbehavior}_i(k,t) = \mu_k\, r + \lambda_k\, t \qquad (8)$$
where r is a random value ranging from 0 to 1; t (t>0) is the time elapsed between the instant at which the k-th behavior $b_k^{(E_i)}$ associated with the emotional state $E_i$ was last executed and the instant at which this evaluation is performed; and $\mu_k$ and $\lambda_k$ are the weights assigned to the random value and to the elapsed time respectively. The response with the highest score is chosen and executed. The emotional stimulus is also weighted through the intensity parameter, so the reaction is executed with a matching intensity: movements of the parts of the body will be quicker or slower. Summing up, the resulting behavior is a composite function:

$$b_k^{(E_i)} = F\big(g(\text{emotion}, \text{personality}, \text{stimulus}),\ \mathrm{Bestbehavior}(k,t),\ \mathrm{Intensity}_{ij}\big) \qquad (9)$$
If the emotional state is classified as “neutral” a standard behavior (to lie down, to sleep, and so on) is randomly selected.
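The score-based choice among the behaviors of one emotional state can be sketched as follows, assuming the score is a weighted sum of a random draw r and the time t elapsed since the behavior was last executed. The behavior names, weights, and timestamps are hypothetical, and drawing a fresh r for each behavior is an assumption of this sketch.

```python
import random

def best_behavior(behaviors, now, rng):
    """Score each behavior as mu * r + lam * elapsed and return the best one."""
    scores = {}
    for name, b in behaviors.items():
        r = rng.random()                      # random value in [0, 1)
        elapsed = now - b["last_executed"]    # time since last execution
        scores[name] = b["mu"] * r + b["lam"] * elapsed
    chosen = max(scores, key=scores.get)
    return chosen, scores

# Two hypothetical behaviors associated with the same emotional state.
behaviors = {
    "lower_head": {"mu": 1.0, "lam": 0.2, "last_executed": 9.0},
    "cover_eyes": {"mu": 1.0, "lam": 0.2, "last_executed": 0.0},
}
rng = random.Random(0)
chosen, scores = best_behavior(behaviors, now=20.0, rng=rng)
```

The elapsed-time term favors behaviors that have not been shown recently, while the random term keeps the choice from becoming fully predictable when elapsed times are similar.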
Figure 1. The Emotional Humanoid Robot Architecture.
3. Experimental Results
The architecture of the emotional humanoid robot NAO was modeled using the cognitive architecture described in the previous section. A human can interact with the robot by speaking, through its voice recognition system. The architecture processes the user's sentence, treated as the present stimulus, together with the image information from the camera system, and generates the appropriate behavior, taking into account its own “personality”, the context in which it is immersed, and the most recent behaviors executed by the robot. As a preliminary test bed, we considered a human as a storyteller and the humanoid robot as a child robot. If the human, for example, sends the robot the following phrase as a stimulus: “Faithful Henry had been so unhappy when his master was changed into a frog…”, the dominant emotional state activated in the emotional space of a robot with a “shy” personality is “SAD”, and the robot executes a “SAD” behavior.
4. Conclusions and Future Works
The results shown in this paper demonstrate the possibility for a humanoid robot to generate emotional behaviors recognizable by humans. In the future we intend to focus on increasing the emotional interaction with humans.
References
[1] S.M. Anzalone, F. Cinquegrani, R. Sorbello, A. Chella, “An Emotional Humanoid Partner”. Linguistic and Cognitive Approaches To Dialog Agents (LaCATODA 2010) at AISB 2010 Convention, Leicester, UK, April 2010.
[2] A. Chella, G. Pilato, R. Sorbello, G. Vassallo, F. Cinquegrani, S.M. Anzalone, “An Emphatic Humanoid Robot with Emotional Latent Semantic Behavior”. Simulation, Modeling, and Programming for Autonomous Robots (Springer), LNAI 5325, 2008, pp. 234-245. First International Conference, SIMPAR 2008, Venice, Italy, November 2008.
[3] A. Chella, R.E. Barone, G. Pilato, R. Sorbello, “An Emotional Storyteller Robot”. AAAI 2008 Spring Symposium on Emotion, Personality and Social Behavior, Stanford University, March 26-28, 2008.
[4] E. Menegatti, G. Silvestri, E. Pagello, N. Greggio, A. Cisternino, F. Mazzanti, R. Sorbello, A. Chella, “3D Models of Humanoid Soccer Robot in USARSim and Robotics Studio Simulators”. International Journal of Humanoids Robotics (2008).
[5] R.C. Arkin, M. Fujita, T. Takagi, R. Hasegawa, “An Ethological and Emotional Basis for Human-Robot Interaction”. Robotics and Autonomous Systems 42, pp. 191–201 (2003).
[6] R. Arkin, M. Fujita, T. Takagi, R. Hasegawa, “Ethological Modeling and Architecture for an Entertainment Robot”. In: IEEE Int. Conf. on Robotics & Automation, pp. 453-458, Seoul (2001).
[7] H. Miwa, K. Itoh, M. Matsumoto, M. Zecca, H. Takanobu, S. Roccella, M.C. Carrozza, P. Dario, A. Takanishi, “Effective Emotional Expressions with Emotion Expression Humanoid Robot WE-4RII Integration of Humanoid Robot Hand RCH-1”. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai International Center, Sendai, Japan, September 28 - October 2, 2004, pp. 2203-2208, vol. 3.
[8] C. Breazeal, “Emotion and Sociable Humanoid Robots”. International Journal Human-Computer Studies 59, pp. 119–155 (2003).
[9] A. Bruce, I. Nourbakhsh, R. Simmons, “The Role of Expressiveness and Attention in Human-Robot Interaction”. AAAI Technical Report FS-01-02, 2001.
[10] Zhen Liu, Zhi Geng Pan, “An Emotion Model of 3D Virtual Characters in Intelligent Virtual Environment”. Lecture Notes in Computer Science, Volume 3784/2005, Affective Computing and Intelligent Interaction, November 2005.
[11] F. Agostaro, A. Augello, G. Pilato, G. Vassallo, S. Gaglio, “A Conversational Agent Based on a Conceptual Interpretation of a Data Driven Semantic Space”. In: Lecture Notes in Artificial Intelligence, Springer-Verlag GmbH, vol. 3673, pp. 381-392 (2005).
[12] T.K. Landauer, P.W. Foltz, D. Laham, “Introduction to Latent Semantic Analysis”. In: Discourse Processes, 25, pp. 259-284 (1998).
[13] G. Pilato, F. Vella, G. Vassallo, M. La Cascia, “A Conceptual Probabilistic Model for the Induction of Image Semantics”. Proc. of the Fourth IEEE International Conference on Semantic Computing (ICSC 2010), September 22-24, 2010, Carnegie Mellon University, Pittsburgh, PA, USA (in press).
[14] A. Chella, M. Frixione, S. Gaglio, “An Architecture for Autonomous Agents Exploiting Conceptual Representations”. In: Robotics and Autonomous Systems 25, pp. 231–240 (1998).
[15] A. Chella, M. Frixione, S. Gaglio, “A cognitive architecture for robot self-consciousness”. Artificial Intelligence in Medicine, vol. 44, no. 2, pp. 147–154, 2008.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-40
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-137
A Bio-inspired Model for Executive Control Narayan SRINIVASA1 and Suhas E. CHELIAN Department of Information & System Sciences, HRL Laboratories, LLC 3011 Malibu Canyon Road., Malibu, California 90265, United States of America {nsrinivasa, sechelian}@hrl.com
Abstract. Prefrontal cortex (PFC) is implicated in executive control, herein defined as the ability to map external inputs and internal goals toward actions that solve problems or elicit rewards. In this work, we present a computational model of PFC and accompanying brain systems (e.g. limbic, motor control, and sensory areas) based on a review of structure and function. The current design leverages previous models, but organizes them in novel ways to provide transparent and efficient computation. We propose this model provides a biologically plausible architecture that learns from and uses multimodal spatio-temporal working memories to develop and refine reward-eliciting behaviors. It addresses several anatomical and physiological constraints consistent with neurophysiology. The functional competence of the model is illustrated using the “Egg Hunt” scenario. Keywords. executive control, prefrontal cortex (PFC), spatio-temporal working memory
Introduction
Prefrontal cortex (PFC) is implicated in executive control, herein defined as the ability to map external inputs and internal goals toward actions that solve problems or elicit rewards. Miller and Cohen [1] have reviewed several bio-inspired methods for executive control, and there are also several inspired by cognitive psychology (e.g. SOAR, ACT-R, etc.). The former typically consider simple inputs or limit themselves to a small subset of PFC functions, while the latter often do not consider detailed anatomical or physiological constraints. In this work, we present a computational model of PFC and accompanying brain systems (e.g. limbic, motor control, and sensory areas) based on a review of structure and function in primates. The current design leverages previous models, but organizes them in novel ways to provide transparent and efficient computation. This model can be used with any application, from simple (e.g. Wisconsin Card Sorting Task) to complex (e.g. “Egg Hunt”), that requires the development and refinement of reward-eliciting behavior. The integrated architecture described below was developed for DARPA IPTO’s Phase 1 of BICA, whose primary goal was to develop biological embodied cognitive agents that could learn and be taught like a human. Section 1 reviews the structure of PFC and its accessory brain regions; section 2 details dynamics within neural circuits for executive control; section 3 discusses major functional components that interact with neural circuits for executive control; and section 4 describes how this integrated architecture would be exercised in a challenge problem using robotics.
1 Corresponding Author.
1. Integrated Architecture Design
In this section, we describe the structure of our model. An integrated architecture of PFC, along with accompanying brain regions that assist it in making executive decisions and plans, is shown in Figure 1. The interaction between these components enables an animal or robot to exhibit adaptive behaviors in the face of changing environments.
[Figure 1 legend] Regions: Limbic Cortex (LC), e.g. amygdala; Basal Ganglia (BG), e.g. SNr/Gp; Motor Cortices (MC), e.g. pre-motor cortex; Thalamus (THAL), e.g. thalamus (relay/driver); lower-level Sensory Cortices (NC), e.g. visual cortex; higher-level Sensory Cortices (NC2), e.g. parietal cortex; Prefrontal Cortex (PFC), e.g. LPFC.
Connections: limbic (e.g. hippocampus to multisensory cortex); BG (e.g. SNc to VA-thalamus (modulator)); motor (e.g. primary motor cortex to spinal circuits); thalamic (e.g. thalamus (relay/driver) to pre-motor cortex); higher-level cortical (e.g. parietal cortex to LPFC); lower-level cortical (e.g. visual cortex to parietal cortex); prefrontal (e.g. OPFC to LPFC).
Figure 1. An integrated architecture of PFC (red center portion) with its accessory brain regions. Some abbreviations used later include: amygdala (AM), hippocampus (HC), arousal system/counter (AS/C), hypothalamus (HT), lateral hypothalamus (LH), pedunculopontine tegmental nucleus (PPTN), substantia nigra pars compacta (SNc), ventroanterior thalamus (VA THAL), premotor cortex (PMC), primary motor cortex (M1), parietal cortex (PC), temporal cortex (TC).
Cognitive inputs of PFC originate from sensory cortices (dark purple boxes in Figure 1). Processed stimuli from these regions feed into the next level, the parietal, temporal and multisensory cortices (light purple boxes in Figure 1), where they can be
stored as long-term memories. These regions of cerebral cortex interface with lateral and ventrolateral PFC to create higher level spatio-temporal schemas, providing PFC with necessary details about the external world. Here “spatio-temporal schema” refers to a module that can generically represent both the “what” and “when” of stimuli; we use “schema” and “chunk” interchangeably. These inputs are further grouped into multimodal chunks in orbitofrontal PFC (specifically orbitolateral PFC) to create an episodic memory of events. “Multimodal chunks” are composed of two or more unimodal chunks, such as from vision and audition. Orbitofrontal PFC (specifically orbitomedial PFC) is connected to the limbic (yellow boxes in Figure 1) and forebrain structures (green boxes in Figure 1). This provides PFC with details about emotional, motivational and appetitive/aversive stimuli. Lateral and orbitolateral PFC interface with dorsolateral PFC to either create or refine motor plans. These motor plans, along with other cognitive plans (such as body-centered path plans), are constantly monitored by anterior cingulate cortex for conflicts. When conflicts arise, anterior cingulate cortex resolves them using prior knowledge of rewards and contexts. Motor schemas learned at dorsolateral PFC are also reinforced by drives via orbitomedial PFC. This allows for selection of motor plans that are rewarding or motivationally salient. Dorsolateral PFC interacts with motor control regions by downloading its deliberately created plans into the pre-motor area. Due to space limitations, only selection of motor plans is detailed below.
2. Neural Circuits for Executive Control
We now consider the dynamics of executive control within PFC and its accessory brain regions. PFC is a convergence zone that receives a variety of inputs including those from limbic, motor control, and sensory areas. These inputs provide PFC with a summary of external and internal state to enable planning and decision making on tasks that are impending or have to be performed immediately. We partition PFC into five main regions (proceeding counter-clockwise from the upper left of the red boxes in Figure 1): lateral PFC (LPFC), ventrolateral PFC (VLPFC), orbitolateral PFC (OLPFC), orbitomedial PFC (OMPFC) and dorsolateral PFC (DLPFC) [2, 3]. In our model, all five regions are capable of chunking their respective inputs into hierarchical spatio-temporal chunks shown in Figure 2b. This can be realized using hierarchical ARTSTORE (hARTSTORE) networks [4]. We define hARTSTORE as the n-tier, possibly multimodal, generalization of the original ARTSTORE network. (If desired, hARTSTORE can be replaced with similar spatio-temporal memory models such as HTM [5] or sequence of clusters [6].) Figure 2a provides an example of the neural representation of spatio-temporal patterns within our PFC model. By using joint rotations as inputs, hARTSTORE can encode sequences of robot poses as a motor schema. The lowest ART network clusters spatial patterns, the second STORE network encodes transitions between these patterns, leaving the third ART network to cluster spatio-temporal chunks; repeating this structure yields greater abstraction and generalization at higher layers. Providing different inputs leads to schemas specialized for vision, language, navigation paths, and so on. These working memories, their long-term storage and associations between these stored memories form the bulk of the representations within our model.
One fundamental function of PFC is to enable behavior selection. This is achieved in our model via a set of three neural signals, shown in the lower left of Figure 1, that enable PFC-guided behavior switching.
1. The first signal is provided directly from sense organs such as the tongue (e.g. juice rewards) that provide hypothalamic dopaminergic bursts to OMPFC (green lines in Figure 3). OMPFC has reinforcing pathways to motor schemas in DLPFC that can affect motor plan selection and hence behavior.
2. A second type of signal is received from the hippocampus (“mismatch counters” in Figure 1) and keeps track of the number of mismatches in the perceptual domain, given that an animal or robot is looking for something specific (i.e., with a goal in mind). When the counter exceeds a threshold, negative reward is sent to the hypothalamus, resulting in a reset of an OMPFC node and thereby causing a new motor schema to be selected. It can also shut off the “GO” signal in motor channels, thereby stopping currently active behaviors.
3. The third type of signal connects to the drive nodes of the amygdala (“reset toggles” in Figure 1). These neurons toggle the state of amygdala neurons if there is an antagonistic rebound after a certain goal or emotionally relevant input is removed (e.g., fear released produces a rebound of relief). This toggle of states is monitored by PFC and results in a negative reward to OMPFC that can cause a total reset of all previously operational motor schemas and hence can terminate behaviors prior to the release of the input stimulus (e.g., a fear stimulus).
We predict that the last two types of signals, the resetting of motor schemas due to perceptual or drive states, are computed somewhere between PFC and limbic cortex (Figure 1). In particular, there is neurophysiological evidence that cingulate cortex maintains computations like those of mismatch and toggle switches.
Ablation studies suggest that posterior cingulate cortex is involved in spatial orientation and memory, likely served by parahippocampal afferents [7]. Thus we propose that hippocampal/perceptual mismatch resets are relayed via the parahippocampal region to the posterior cingulate cortex, which maintains mismatch counts. Anterior cingulate cortex is involved in affective processes, likely served by amygdala connections [7], and thus serves as a potential site for reset toggles.
In our model, PFC is implicated in deliberative actions, which are more accurate in their response to the world but require some time to form and be executed. On the other hand, there is evidence for a crude pathway that responds very quickly (e.g. [8]). This pathway (yellow arrows in Figure 1) begins when a stimulus (such as motion or a loud noise) is fed from thalamus to amygdala to hypothalamus, where it generates a response by activating regions such as spinal circuits. This response can range widely, from reflexive action (e.g. removing a hand from heat) to an increase in blood pressure, perspiration, heart rate, etc.
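The mismatch-counter reset described in item 2 above can be illustrated with a minimal sketch. It is not the authors' neural implementation: the class name `MismatchCounter`, the function `run_search`, the threshold value, the rule that a match clears the counter, and the round-robin choice of the next schema are all assumptions made for illustration.

```python
class MismatchCounter:
    """Counts consecutive perceptual mismatches while a goal is active (hypothetical helper)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def observe(self, matched):
        """Record one comparison; return True when a reset should be issued."""
        if matched:
            self.count = 0          # assumption: a match clears the counter
            return False
        self.count += 1
        if self.count > self.threshold:
            self.count = 0          # issuing the reset also clears the counter
            return True
        return False


def run_search(observations, schemas, threshold=3):
    """Switch to the next motor schema each time the counter issues a reset."""
    counter = MismatchCounter(threshold)
    active = 0
    history = []
    for matched in observations:
        if counter.observe(matched):
            # Round-robin stands in for reward-driven schema reselection.
            active = (active + 1) % len(schemas)
        history.append(schemas[active])
    return history


# Five mismatches in a row with a threshold of three.
history = run_search([False] * 5, ["saccade", "turn head", "locomote"])
```

With five consecutive mismatches, the fourth one exceeds the threshold, so the sketch stays on the first schema for three steps and then switches to the second.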
[Figure 2 diagram. Panel (a): stacked ART, STORE, and ART networks with F0, F1, and F2 layers, LTM weights, and reset signals; the input is the robot pose, items a1 ... an are stored in temporal order (e.g. {a1, a3, a2, a4}), and the example chunk is C1 = a1a3a2a4, “Hold upright”. Panel (b): LPFC (unimodal schemas), VLPFC (bimodal schemas), OLPFC (multimodal schemas), OMPFC (EXIN reward schemas), and DLPFC (motor schemas, output to pre-motor cortex); other labels include hippocampus, amygdala, Va-Thal/SNc, top-down bias, mismatch reset, and “To ACC”.]
Figure 2. (a) hARTSTORE, composed of stacked ART, STORE, and ART networks, can store general spatio-temporal patterns such as motor schemas. (b) All five regions of PFC chunk their respective representations with hARTSTORE. LPFC, VLPFC, OLPFC, OMPFC, and DLPFC are in counter-clockwise order starting from the upper left.
Figure 3. Limbic cortices within an integrated architecture.
3. Major Functional Components
In this section, we outline how additional components interact with the neural circuits of executive control. The major functional components in our model are: (1) drives and “EXIN” schemas, (2) sensations and memories, and (3) habits. The embodiment of the integrated architecture is referred to as TICA, for “Toddler-Inspired Cognitive Agent.”
Drives are internal emotions or urges that are influenced by innate or learned external factors (e.g. looming figures cause fear). They compete against each other (e.g. fear v. explore) and produce only one winner through gated dipole dynamics [9]. (Gated dipoles simulate competition with habituation; i.e., each winner will “tire,” ceding its place to other competitors, who will also tire, and so on.) The following is a list of the dipoles we use: “Please instructor” v. “Ignore instructor,” Explore v. Exploit, and Fear v. Relief. We assume (or program TICA to that effect) that only the “Please instructor” drive has reverberation, or tonic support that prevents its gradual replacement. This maintains fixation on an instructor's commands. The “Please instructor” drive can cede its place, however, if competing drives (especially “Ignore instructor”) have sufficient strength. If all drives are equally active, we assume (or program TICA to that effect) that the “Explore” drive will win.
“EXIN” schemas are a compressed representation of EXternal motivators (e.g. looming figure) and INternal states (e.g. fear and reward/punishment). EXIN schemas help choose motor plans based on reward/punishment. All drives output to and affect EXIN schemas, along with reward/punishment and external state information (Figure 2b). We assume (or program
TICA to that effect) that the verbal expressions of “Look here,” “Now you build…,” etc. have become conditioned reinforcers to the “Please instructor” drive. As in section 2, negative rewards due to too many mismatches in the hippocampal arousal system cause the currently active EXIN schema to be turned off. This shuts off support to the currently most active motor schema (unless it is supported top-down from OLPFC’s episodic memory) and allows another motor schema to be selected. Negative rewards from the “Ignore instructor” drive cause all EXIN schemas to be reset. This in turn causes a new episodic memory to be chosen. The details of how episodic memory is created and retrieved are discussed below.
Sensations are external stimuli such as words, visual objects, observed (or recognized) behaviors, etc. These sensations are stored in spatio-temporal schemas, chunks, or memories with hARTSTORE. Memories can also store internal information such as motor schemas (e.g. to get to B, start at A and go left at some waypoint) or episodic memories (e.g. a particular trip from A to B). Furthermore, memories can be multimodal (e.g., the sight of an egg might be linked to the sound of the word egg). Language schemas can be encoded and recognized via a spatio-TEMPORAL hARTSTORE network (space is a Wernicke’s Area/Medial Temporal (WA/MT) word, and time is a sequence of words), while visual appearance schemas can be handled via a SPATIO-temporal hARTSTORE network (space is an Inferotemporal (IT) image, and time is a sequence of saccades over the item). Similarly, self-guided motor schemas can be represented as a sequence of joint rotations learned from a sequence of inverse kinematic transformations. Externally recognized action schemas can be represented as a sequence of target or present position vectors translated into egocentric coordinates by mirror neurons (e.g. [10]). The former represents the “where” of an action, while the latter represents the “how” of an action.
Bimodal schemas occur through the binding of two unimodal schemas in VLPFC. Episodic memories are then generated when OLPFC samples a sequence of bimodal schemas. In our case, we have focused on imitation learning and the ability to encode the actions of an instructor. These actions can be played back by reading out bimodal schemas into VLPFC. After separation of these bimodal schemas into unimodal schemas in LPFC, they can be played by DLPFC and its supporting structures. As in section 2, OLPFC nodes require bottom-up support from OMPFC’s EXIN schemas. EXIN schemas represent the EXternal and INternal state of TICA: the former comes from LPFC and VLPFC while the latter comes from amygdala and ventroanterior thalamus.
Habits are inbuilt motor plans that are triggered when a match of external stimuli and internal goal occurs (e.g. the sight of an object matches an internal memory of it), or when there is a lack of external motivators or internal goals (e.g. no looming figures or no desire to find an object). In the case that no motor schema is active, habits are chosen in order of their energy expenditure. For example, given a set of salient points, TICA will first saccade to them, then turn its head, then locomote towards them. A match between a top-down goal and bottom-up input causes TICA (by design) to say “I found it.” or “I build it.”
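The drive competition with habituation described at the start of this section can be illustrated with a deliberately simplified sketch. It is not a gated dipole in the sense of [9]: the function `compete`, the depletion and recovery rates, and the additive `tonic` term standing in for reverberation are all assumptions chosen to show the qualitative behavior (a winner tires and cedes its place, unless it has tonic support).

```python
def compete(inputs, steps, deplete=0.2, recover=0.05, tonic=None):
    """Winner-take-all among drives whose gates habituate while they win.

    inputs: dict mapping drive name -> input strength
    tonic:  dict mapping drive name -> constant support (assumed stand-in
            for the reverberation of the "Please instructor" drive)
    """
    tonic = tonic or {}
    gate = {name: 1.0 for name in inputs}      # 1.0 means fully rested
    winners = []
    for _ in range(steps):
        effective = {n: inputs[n] * gate[n] + tonic.get(n, 0.0) for n in inputs}
        winner = max(effective, key=effective.get)
        winners.append(winner)
        for n in gate:
            if n == winner:
                gate[n] -= deplete * gate[n]            # the winner tires
            else:
                gate[n] += recover * (1.0 - gate[n])    # the others recover
    return winners


# Without tonic support the stronger drive tires and the weaker one takes over.
winners = compete({"Fear": 1.0, "Relief": 0.9}, steps=4)

# With tonic support the "Please instructor" drive is not gradually replaced.
fixated = compete({"Please instructor": 1.0, "Ignore instructor": 0.9},
                  steps=20, tonic={"Please instructor": 1.0})
```

In this sketch a competitor can still displace a tonically supported drive if its input is made large enough, which mirrors the statement that “Please instructor” can cede its place to sufficiently strong competing drives.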
4. Case Study of Executive Control: “Egg Hunt”
Here we show how the integrated architecture would be exercised in a challenge problem using robotics. Our model is capable of handling the test suite described by Mueller et al. [11] but for brevity we focus on an Egg Hunt. (“Egg” represents a
generic object of interest such as an IED or requested tool.) Table 1 outlines what structures and functions would be activated in our integrated architecture giving rise to executive control. References to information flow between components shown in Figure 1 (e.g. AC→WA/MT→LPFC, auditory) form a flowchart of “what gets used when.” Again, the embodiment of the integrated architecture is referred to as TICA.
Table 1. Structures and functions activated during the “Egg Hunt” scenario. Gray rows contain actions performed by TICA, while plain rows describe how those actions unfold in the current architecture.
Initialization: TICA wakes up, sees the instructor, and an empty table → TICA idly scans the room, instructor, and table.
Here, the “Explore” drive will win when no other drive is active (see sec. 3). This drive state along with the sensory state of the room will create a new EXIN node. Because no motor schemas are active “top down” (i.e., from LPFC), DLPFC will learn to chunk the saccades created between salient points. After several uneventful saccades (assuming the table and instructor do not induce a change in drives or reward or drastically different sensory context), the same EXIN node will be weakly connected to several saccade motor schemas.

Tasking: Instructor then places an item on the table and says, “Look here, find this egg.” After a brief moment, the instructor takes the item off the table. → TICA attends to and stores the item.
The placement of the item on the table creates a salient point (due to motion), and TICA will saccade to it. When the instructor says, “Look here…” the “Please instructor” drive will be activated (see sec. 3). This will create a new EXIN node, shutting off support to the currently active saccade motor schema in DLPFC, and a new episodic memory will be selected in OLPFC.
Information flow: AC→WA/MT→LPFC, auditory; LPFC, auditory→AM; AM→OMPFC

Search: TICA searches for the item. → TICA will saccade, turn its head, and locomote until it finds the item.
After the item is removed from the table, a saliency map of egg-like objects will be created in the superior colliculus of TICA. If the winning object in this map is not the egg, then there will be a mismatch generated in IT. Repeated mismatches from the Arousal System Counter in the HC will cause a negative reward to arrive at OLPFC via HT and BG.
Information flow: HC (AS→ASC)→HT (LH)→PPTN→SNc→VA THAL→OLPFC
In this case, a new EXIN node will be selected, shutting off support to the currently active motor schema. A new motor schema will be selected based on energy expenditure and habituative constraints (e.g., after saccading a long while, even though it consumes the least energy, head rotations are chosen). (These habituative dynamics are like those found in the SPAN network [12], where certain motor schemas habituate more slowly but recover more quickly.) These dynamics unfold in PMC and M1. The most active EXIN node will condition onto the active motor schema. These schemas in turn will represent inverse kinematics conversions that the DLPFC has chunked. Note that the Arousal System has separate reset counters for PC (“where”) and TC (“what”). The vigilance of TC is higher than that of PC, so that mismatches of features cause more resets than mismatches of locations.
Information flow: OLPFC→DLPFC→PMC→M1
If the winning object is the egg, then there will be a match generated in IT. By design, TICA will say “I found it.” This is a conditioned reinforcer to the “Ignore instructor” drive, and a new motor schema will be selected given a new EXIN node. Since the “Ignore instructor” drive does not have any reverberation, the “Explore” drive will win again and TICA will saccade idly about.
5. Conclusion
In this paper, we propose a bio-inspired model for executive control that learns from and uses multimodal spatio-temporal working memories to develop and refine reward-eliciting behaviors. The primary goal of this work was to develop an integrated architecture that could be used in embodied cognitive agents that could learn and be taught like a human. It addresses several anatomical and physiological constraints consistent with neurophysiology. The functional competence of the model is illustrated using the “Egg Hunt” scenario. There are other bio-inspired architectures of executive control, but these primarily focus on navigation [13, 14] or do not provide detailed biological implementations [15]. In the future, we hope to simulate many if not all portions of the model embedded in a virtual environment. Studies of this model’s dynamics and parameters could provide insight into how perceptual, cognitive, and motor control functions are carried out in the brain.
References
[1] E. Miller, J. Cohen, An integrative theory of prefrontal cortex function, Annual Review of Neuroscience 24 (2001), 167-202.
[2] H. Barbas, D. Pandya, Architecture and frontal cortical connections of the premotor cortex (area 6) in the rhesus monkey, Journal of Comparative Neurology 256 (1987), 211-228.
[3] H. Barbas, D. Pandya, Patterns of connections of the prefrontal cortex in the rhesus monkey associated with cortical architecture, In: Frontal Lobe Function and Injury, H. Levin, H. Eisenberg and A. Benton (Eds.), Oxford University Press, Cambridge, 1991, 34-58.
[4] G. Bradski, S. Grossberg, Fast learning VIEWNET architectures for recognizing 3-D objects from multiple 2-D views, Neural Networks 8 (1995), 1053-1080.
[5] D. George, J. Hawkins, Towards a mathematical theory of cortical micro-circuits, PLoS Computational Biology 5 (2009), e1000532.
[6] R. Granger, Engines of the brain: the computational instruction set of human cognition, AI Magazine 27 (2006), 15-31.
[7] B. Vogt, D. Finch, C. Olson, Functional heterogeneity in cingulate cortex: the anterior executive and posterior evaluative regions, Cerebral Cortex 2 (1992), 435-44.
[8] J. LeDoux, Emotion, memory, and the brain, Scientific American 12 (2002), 62-71.
[9] S. Grossberg, N. Schmajuk, Neural dynamics of attentionally-modulated Pavlovian conditioning: conditioned reinforcement, inhibition and opponent processing, Psychobiology 15 (1987), 195-240.
[10] G. Rizzolatti, L. Craighero, The mirror-neuron system, Annual Review of Neuroscience 27 (2004), 169-192.
[11] S. Mueller, M. Jones, B. Minnery, J. Hiland, The BICA cognitive decathlon: A test suite for biologically-inspired cognitive agents, In: Proceedings of Behavior Representation in Modeling and Simulation Conference, Norfolk, 2007.
[12] S. Grossberg, D. Repin, A neural model of how the brain represents and compares multi-digit numbers: spatial and categorical processes, Neural Networks 16 (2003), 1107-1140.
[13] A. Baloch, A. Waxman, Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN, Neural Networks 4 (1991), 271-302.
[14] W. Gnadt, S. Grossberg, SOVEREIGN: an autonomous neural system for incrementally learning planned action sequences to navigate towards a rewarded goal, Neural Networks 21 (2008), 699-758.
[15] J. Taylor, M. Hartley, Through reasoning to cognitive machines, IEEE Computational Intelligence Magazine 2 (2007), 12-24.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-147
Neural Symbolic Decision Making: A Scalable and Realistic Foundation for Cognitive Architectures
Terrence C. STEWART¹ and Chris ELIASMITH
Centre for Theoretical Neuroscience, University of Waterloo, Canada
Abstract. We have developed a computational model using spiking neurons that provides the decision-making capabilities required for production system models of cognition. This model conforms to the anatomy and connectivity of the basal ganglia, and the neuron parameters are set based on known neurophysiology. Behavioral-level timing and neural-level spike predictions have been made, and are consistent with empirical results. Here we demonstrate how this system can be used to implement standard production system rules, including complex variable matching and other binding operations. This results in predictions about neural connectivity in the thalamus and cortex. We believe our model can be used as a part of any biologically inspired cognitive architecture, allowing researchers to connect low-level neural implementation details to high-level behavioral effects.
Keywords. Neural engineering framework, action selection, basal ganglia, production systems, vector symbolic architectures
¹ Corresponding Author: Terrence C. Stewart, Centre for Theoretical Neuroscience, University of Waterloo, Canada; E-mail: [email protected]
Introduction
As we have discussed elsewhere[1], a biological explanation of cognition requires a scalable and neurally realistic mechanism for producing and making use of compositionality. That is, neurons need to be able to represent symbols and symbol structures (e.g. chunks) and manipulate them in a fast and flexible manner, all while taking into account the constraints on the numbers of neurons available and the stochasticity of their output. We achieve this by combining two techniques: the Neural Engineering Framework (NEF)[2] and Vector Symbolic Architectures (VSAs)[3]. The NEF treats groups of neurons as representing vectors, where the dimensionality of the vector is less than the number of neurons. This redundancy allows neurons to represent a value using a distributed representation that is robust to neuron variation and death, and maps well onto the representations found in sensory and motor neurons. Furthermore, the NEF allows us to calculate synaptic connection weights that can compute linear and non-linear transformations of these vectors. VSAs are a family of methods for converting symbols and symbol structures into high-dimensional vectors. Individual symbols map to individual vectors, chosen
randomly or based on semantic similarity. Importantly, an arbitrarily complicated symbol tree is also mapped onto a single vector of the same dimensionality. As the complexity of the symbol tree increases, the accuracy of this representation gradually decreases. This allows for the implementation of chunks that contain a flexible number of slots and values using a fixed number of neurons. To show how this system can be used to implement a production system, we start by using the NEF to develop an action selection model based on the neuroanatomy of the basal ganglia. We then describe how cortical structures can determine the inputs to the action selection model, implementing the Left-Hand-Side (“IF”) of production rules. Finally, we show the thalamic and cortical structures needed to effect the outputs of the action selection, implementing the Right-Hand-Side (“THEN”) portion.
1. The Basal Ganglia and Action Selection
The core of our model of action selection is an adaptation of an existing rate-neuron model of the basal ganglia[4]. Unlike the majority of action selection models[5][6], and our own previous model[7], this does not use mutual inhibition to achieve selection in a “winner-take-all” manner, as there is little evidence for such connections in the mammalian basal ganglia[4]. Instead, this model makes use of a feedback loop between the globus pallidus external (GPe) and the sub-thalamic nucleus (STN) to regulate the excitation and inhibition arriving at the globus pallidus internal (GPi), as shown in Figure 1. Each dark circle in Figure 1 represents a group of 40 leaky-integrate-and-fire spiking neurons, and five such groups are needed per production. The model therefore scales linearly, requiring a total of 200 neurons per production. The input consists of the activation levels for each production. The output indicates the single production that should be selected (the one with the highest activation). Since this output is inhibitory, the selected production is the one for which the corresponding output neurons stop firing. Further details on the model, including its timing properties, can be found in our previous work[8].
Figure 1. Action selection in the basal ganglia. Dark circles represent 40 leaky-integrate-and-fire spiking neurons. As in the mammalian basal ganglia, inhibitory connections are direct (no connections between productions), while excitatory connections are broad. The model shown here has three productions (A, B, and C). As the activation level for each of these productions changes over time (right), the output neurons for the one with the highest activation stop firing, indicating the selection of that production. Striatum D1 and D2 indicate two separate types of neurons in the striatum.
2. The Cortex and Chunk Representation
Using the Neural Engineering Framework[2], a group of spiking neurons can represent a vector in a distributed manner to an arbitrary level of accuracy. This is achieved by each neuron having a randomly chosen preferred direction vector e, the vector for which this neuron will fire most strongly (a common feature of sensory and motor neurons). If we want a group of neurons to represent a particular vector x, we can set the input current J to each of these neurons as per Eq. (1), where α is the neuron gain, Jbias is the background input current, and e is the neuron's preferred direction vector. Jbias and α are chosen from a distribution to match those found in real neurons. By taking this input current and using any standard neuron model (such as the leaky-integrate-and-fire model used here), we can convert a vector into a pattern of spikes that represents that vector.
To determine how well that vector is represented, we also need to be able to convert a pattern of spikes back into a vector. This is done by calculating a decoding vector di for each neuron using Eq. (2), where ai is the average firing rate of neuron i for a given value of x, and the integration is over all values of x. The resulting di values are the least-squares linearly optimal decoding vectors for recovering the original value of x given only outputs from each of these neurons. That is, if we take the post-synaptic current from each neuron (i.e. the ionic current caused by the neural spiking) and add them together, weighting each one by di, then the result is an estimate of x. The amount of error for this decoding decreases linearly with the number of neurons used[2], so any degree of accuracy can be achieved simply by increasing the number of neurons. In the models described here, we use 8000 neurons to represent 200-dimensional vectors, giving an RMSE of 0.019.
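The encoding and decoding steps described above can be illustrated with a rate-based sketch for a one-dimensional x. This is not the authors' simulation: it uses steady-state LIF firing rates instead of spikes, 40 neurons instead of 8000, and parameter ranges (gains, biases, time constants) picked only for illustration. The ridge term added to the diagonal of Γ is an assumed regularizer for numerical stability and is not part of the text above; `lif_rate`, `solve`, `rates`, and `decode` are hypothetical helper names.

```python
import math
import random

TAU_REF, TAU_RC = 0.002, 0.02   # assumed LIF refractory / membrane time constants (s)

def lif_rate(current):
    """Steady-state LIF firing rate (Hz); the firing threshold is current = 1."""
    if current <= 1.0:
        return 0.0
    return 1.0 / (TAU_REF - TAU_RC * math.log(1.0 - 1.0 / current))

def solve(matrix, rhs):
    """Solve matrix * d = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * d[c] for c in range(r + 1, n))
        d[r] = (aug[r][n] - tail) / aug[r][r]
    return d

rng = random.Random(1)
N = 40                                                    # neurons in the group
encoders = [rng.choice([-1.0, 1.0]) for _ in range(N)]    # preferred directions e
gains = [rng.uniform(1.0, 3.0) for _ in range(N)]         # alpha (assumed range)
biases = [rng.uniform(0.5, 2.0) for _ in range(N)]        # J_bias (assumed range)

def rates(x):
    """Encoding: J = alpha * (e . x) + J_bias, passed through the LIF rate curve."""
    return [lif_rate(g * e * x + b) for e, g, b in zip(encoders, gains, biases)]

samples = [k / 40.0 - 1.0 for k in range(81)]             # x sampled over [-1, 1]
A = [rates(x) for x in samples]                           # A[k][i] = a_i(x_k)

# Decoding: d = Gamma^-1 Upsilon, with sums over samples replacing the integrals.
ridge = (0.1 * max(max(row) for row in A)) ** 2 * len(samples)   # assumed regularizer
gamma = [[sum(row[i] * row[j] for row in A) + (ridge if i == j else 0.0)
          for j in range(N)] for i in range(N)]
upsilon = [sum(row[i] * x for row, x in zip(A, samples)) for i in range(N)]
decoders = solve(gamma, upsilon)

def decode(activity):
    """Weighted sum of neural activities gives the estimate of x."""
    return sum(a * d for a, d in zip(activity, decoders))

rmse = math.sqrt(sum((decode(rates(x)) - x) ** 2 for x in samples) / len(samples))
```

Increasing `N` in this sketch should reduce `rmse`, which is the qualitative point of the paragraph above; the specific figure of 0.019 quoted in the text refers to the authors' 8000-neuron spiking model, not to this toy.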
The advantage of calculating d is that we can use it to determine synaptic connection weights that will compute transformations of the vectors being represented. If one group of neurons represents x and we have another group of neurons that we want to represent Mx (any linear transformation of x), then this can be achieved by setting their synaptic connection weights wij via Eq. (3). Furthermore, we can calculate connections that compute any nonlinear function f(x) by finding a new set of decoding weights with Eq. (4). All synaptic connections for the models discussed here use these equations. It should be noted that we are not making claims as to how these connection weights are learned. Rather, we use these calculations to determine what the final outcome of a learning process should be.

$$J = \alpha \, e \cdot x + J_{bias} \tag{1}$$
$$d = \Gamma^{-1} \Upsilon, \qquad \Gamma_{ij} = \int a_i a_j \, dx, \qquad \Upsilon_j = \int a_j \, x \, dx \tag{2}$$
$$w_{ij} = \alpha_j \, e_j \cdot M d_i \tag{3}$$
$$d^{f(x)} = \Gamma^{-1} \Upsilon, \qquad \Gamma_{ij} = \int a_i a_j \, dx, \qquad \Upsilon_j = \int a_j \, f(x) \, dx \tag{4}$$

To use this system as part of a cognitive model, we need to convert chunks (i.e. sets of symbol-value pairs or other more complex tree-like symbolic structures) into a single vector. Converting a single symbol into a vector is straightforward: each symbol can be replaced by a particular vector. These vectors can be randomly chosen (as in this paper), or chosen to respect semantic similarity so that symbols that are similar to each other (e.g. dog and cat) map to similar vectors (i.e. whose dot product is large).
To perform symbol manipulation and to achieve the compositional abilities seen in human cognition, we need to be able to represent and manipulate combinations of these symbols. To achieve this, we make use of Vector Symbolic Architectures[3]. Here, a single vector can represent a symbol structure of arbitrary complexity, but as the complexity increases the accuracy of the representation gradually decreases. These combined vectors are formed by performing mathematical operations on the simpler vectors. Crucially, these operations are approximately reversible, allowing the original vectors to be reconstructed, if needed. The particular Vector Symbolic Architecture used here is Plate's Holographic Reduced Representations[9]. This involves two operations: addition, which combines two vectors to create a third that is similar to both of the original vectors ((a+b)·b ≈ 1), and circular convolution (⊛), which creates a vector that is highly dissimilar to both of the original vectors ((a⊛b)·b ≈ 0). There is also an inverse operation (*) that reverses the order of the elements in a vector (except for the first one) and allows for an approximate inverse of the circular convolution (a⊛b⊛b* ≈ a). All of these operations can be implemented in neurons using Eqs. (2) to (4).
This approach allows us to compute a vector to represent any symbol structure, based on the vectors for each individual symbol. For example, “dogs chase cats” could be calculated as subject⊛dogs + verb⊛chase + object⊛cats. Importantly, the resulting vector is very different from the one for “cats chase dogs”, allowing these different symbol structures to be distinguished (unlike what would happen if we simply calculated dogs+chase+cats, which is the same as cats+chase+dogs). Each of the individual terms in the calculation is a slot-value pair, allowing us to implement the standard chunks or frames seen in many cognitive architectures.
Furthermore, we can also calculate more complex structures, such as “Tim knows that dogs chase cats” as subj⊛Tim + verb⊛knows + obj⊛(subj⊛dogs + verb⊛chase + obj⊛cats). We can thus use a group of spiking neurons to represent a collection of slot-value pairs. In concordance with common terminology in cognitive architectures (such as ACT-R[10]), we refer to these neural groups as buffers. We believe there are multiple buffers found throughout the cortex, and that they form the basis of communication between neural areas. We have previously shown[11] that 20,000 spiking neurons representing a 1,000-dimensional vector are sufficient for storing 8 slot-value pairs using a vocabulary of 100,000 different symbols.
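The binding and unbinding operations described above can be tried out directly on plain vectors, without any neurons. The sketch below is an illustration under stated assumptions: it uses 512-dimensional random unit vectors, a direct O(D²) circular convolution written for clarity, and hypothetical helper names (`random_vector`, `cconv`, `inverse`, `add`, `dot`). It builds the “dogs chase cats” vector and then asks which symbol fills the subject slot.

```python
import math
import random

D = 512                      # assumed dimensionality for this illustration
rng = random.Random(0)

def random_vector():
    """Random unit-length vector standing in for one symbol."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(D)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(*vectors):
    """Element-wise sum (superposition of several vectors)."""
    return [sum(xs) for xs in zip(*vectors)]

def cconv(a, b):
    """Circular convolution, the binding operation."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def inverse(a):
    """Approximate inverse: keep the first element, reverse the rest."""
    return [a[0]] + a[:0:-1]

names = ["subject", "verb", "object", "dogs", "chase", "cats"]
vocab = {name: random_vector() for name in names}

# "dogs chase cats" as a sum of three slot-value bindings.
sentence = add(cconv(vocab["subject"], vocab["dogs"]),
               cconv(vocab["verb"], vocab["chase"]),
               cconv(vocab["object"], vocab["cats"]))

# Unbind the subject slot and compare the noisy result with candidate symbols.
who = cconv(sentence, inverse(vocab["subject"]))
sim_dogs = dot(who, vocab["dogs"])
sim_cats = dot(who, vocab["cats"])
```

The unbound vector is only an approximation of `dogs`, so `sim_dogs` lands near 1 rather than at exactly 1, while `sim_cats` stays near 0; a practical system would follow this with a clean-up step that picks the closest known symbol.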
3. Calculating Production Activation
Given buffers throughout the cortex, there need to be connections from these buffers to the basal ganglia which allow it to compute the activation level for each production. Exactly how this is done depends on the complexity of the production rule. The simplest production is a matching rule. A set of slot-value pairs is provided for each buffer, and if the buffer contains that particular slot with that particular value, then the rule can fire (i.e. it will have a high activation). We can achieve this by setting the activation of a production to the dot product of the current vector stored in the buffer with the ideal vector for that production. For example, the activation for the production rule “IF subject=dog” can be computed as (subj⊛dog)·x, where x is the current buffer value. This value will be large if x contains that slot-value pair, but will be near zero if it does not. This can be efficiently computed for all productions at once
by forming the matrix M where each row is the vector for that production, using Eq. (3) to determine synaptic connection weights. Figure 2 shows the behaviour of three productions as the value stored in the buffer changes. The three productions match to the vectors subj⊛dog, obj⊛cat, and subj⊛dog + obj⊛cat. We also scale the vectors in each of these rows to adjust the preference of one production over another (its utility). In this case, the third production is preferred over the other two.
Figure 2. Production matching. The buffer is driven to four states for 100ms each (subj⊛dog, obj⊛cat, subj⊛dog + obj⊛cat, and obj⊛mouse). Activation in the basal ganglia is computed by multiplying the value in the buffer by the matrix M via Eq. (3), where each row is the matching pattern for a production. The output shows the three productions being successfully chosen in turn, and no production chosen for the final pattern, since it does not match any of them.
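The matching scheme of Figure 2 can be mimicked at the vector level, leaving out the spiking basal ganglia entirely. In the sketch below the selection step is a plain argmax with a cut-off, which is only a functional stand-in for the circuit of section 1. The utility factor of 0.75 on the third row and the activation threshold of 0.3 are assumed values, chosen so that the conjunction loses to a single-pair match (0.75 < 1) but wins when both pairs are present (2 × 0.75 > 1); the helper names are hypothetical.

```python
import math
import random

D = 512
rng = random.Random(2)

def random_vector():
    raw = [rng.gauss(0.0, 1.0) for _ in range(D)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cconv(a, b):
    """Circular convolution (binding)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

subj, obj, dog, cat, mouse = (random_vector() for _ in range(5))
subj_dog, obj_cat, obj_mouse = cconv(subj, dog), cconv(obj, cat), cconv(obj, mouse)
both = [x + y for x, y in zip(subj_dog, obj_cat)]

UTILITY = 0.75        # assumed scaling of the third production's row
THRESHOLD = 0.3       # assumed minimum activation for any production to fire

# Each row of M is the matching pattern of one production.
M = [subj_dog, obj_cat, [UTILITY * x for x in both]]

def select(buffer):
    """Return the index of the winning production, or None if nothing matches."""
    activation = [dot(row, buffer) for row in M]
    best = max(range(len(M)), key=lambda i: activation[i])
    return best if activation[best] > THRESHOLD else None

# The four buffer states used in Figure 2.
winners = [select(state) for state in (subj_dog, obj_cat, both, obj_mouse)]
```

Under these assumptions the three productions are selected in turn and the final state, which contains obj⊛mouse, selects nothing.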
There are two other production matching rules needed: matching only when a slot-value pair is not present, and matching only when the values in two different slots are the same (or different). The first is implemented by subtracting the vector representation of the slot-value pair from the corresponding row in matrix M, since this adds a negative component to the dot product (-x·x = -1). For the same/different matching rules, new neurons are needed to compute (buffer⊛slot1*)·(buffer⊛slot2*), as shown in Figure 3. This approach will work for comparing slot values in two different buffers, but can only perform one comparison at a time. Matching productions examining different slots must either share this structure (by changing the values in the cortical areas slot1 and slot2 using the techniques in section 4), or use another one.
Figure 3. Complex production matching. Production 1 matches if the subject and object are the same, unless the subject is cat, demonstrating the not rule. Production 2 matches if the subject and object are different. The graph shows the activation levels for each production and successful output action selection as the buffer changes through the four possible object/subject combinations (subj⊛cat + obj⊛dog, subj⊛dog + obj⊛dog, etc.).
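The negative-match rule described above (subtracting a slot-value pair from a row of M) can be checked with a few vector operations. The rule used here, “IF obj=cat AND NOT subj=dog”, is a hypothetical example rather than one of the productions in Figure 3, and the same/different comparison is not covered by this sketch. Helper names and the dimensionality are assumptions.

```python
import math
import random

D = 512
rng = random.Random(3)

def random_vector():
    raw = [rng.gauss(0.0, 1.0) for _ in range(D)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cconv(a, b):
    """Circular convolution (binding)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

subj, obj, dog, cat = (random_vector() for _ in range(4))
subj_dog, obj_cat = cconv(subj, dog), cconv(obj, cat)

# Row of M for the hypothetical rule "IF obj=cat AND NOT subj=dog":
# the required pair is added and the forbidden pair is subtracted.
row = [c - d for c, d in zip(obj_cat, subj_dog)]

# Activation when only obj=cat is in the buffer.
allowed = dot(row, obj_cat)

# Activation when the forbidden pair subj=dog is present as well.
blocked = dot(row, [c + d for c, d in zip(obj_cat, subj_dog)])
```

The forbidden pair contributes roughly -1 to the dot product, so `blocked` ends up near 0 while `allowed` stays near 1.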
4. The Thalamus and Action Execution
To perform an action, the output of the basal ganglia must be connected back to the buffer(s) and other cortical areas so that the desired change occurs. The simplest action is a production that sends a particular value to a particular buffer. This is implemented in the thalamus by having a set of neurons (40 per production) which receive inhibition from the basal ganglia. These thalamic neurons are connected to the buffers with a separate transformation matrix Me where each column is the vector for the desired state when this production fires (or zeros for productions which do not affect that buffer). We have previously shown this gives 40-60 ms between sequential productions [8].
For more complex actions, we can transfer information from one buffer to another by creating a communication channel that is inhibited by the basal ganglia (so information will only flow when the action is selected). A communication channel is a separate group of neurons connected using the identity matrix I for M in Eq. (3). We can also apply other transformations by adding neural groups that combine information from different vectors. For example, to send the value from slot1 in buffer1 to slot2 in buffer2, we need to compute buffer2 = buffer1⊛slot1*⊛slot2. This can be done with a similar connectivity structure as computing buffer⊛slot1* in Figure 3.
We have presented a method for translating a set of production rules into a spiking neural model that adheres to the anatomy and timing properties of the real brain. We define productions that fire based on matches to particular slot values, the absence of particular values (negative matches), and whether two slot values are the same or different. When these productions fire, they can affect the state of buffers by setting them to particular values or transferring information from one buffer to another. These are the capabilities needed for implementing a wide range of cognitive architectures.
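The two kinds of action described in this section can be sketched without spiking neurons. The first part sets a buffer through a matrix Me whose columns are target states; the second computes buffer2 = buffer1⊛slot1*⊛slot2 behind a gate that stands in for the basal ganglia releasing the channel from inhibition. Unitary vectors, the dimensionality, the scalar gate, and all names are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 1024  # assumed dimensionality

def unitary(d):
    """Random vector with unit-magnitude Fourier coefficients (an assumption)."""
    f = np.fft.rfft(rng.standard_normal(d))
    return np.fft.irfft(f / np.abs(f), n=d)

def bind(a, b):
    """Circular convolution via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def inv(a):
    """Approximate inverse a*."""
    return np.roll(a[::-1], 1)

v = {n: unitary(D) for n in ("slot1", "slot2", "dog", "cat")}

# Simple action: each column of Me is the state a production writes to the
# buffer; the zero column belongs to a production that leaves it untouched.
Me = np.stack([v["dog"], v["cat"], np.zeros(D)], axis=1)
thalamus = np.array([0.0, 1.0, 0.0])  # production 2 released from inhibition
buffer_input = Me @ thalamus          # the "cat" vector is sent to the buffer

# Complex action: gated transfer of slot1's value in buffer1 to slot2 in buffer2.
def routed(buffer1, gate):
    """gate = 1.0 when the action is selected, 0.0 while the channel is inhibited."""
    transform = bind(inv(v["slot1"]), v["slot2"])
    return gate * bind(buffer1, transform)

buffer1 = bind(v["slot1"], v["dog"]) + bind(v["slot2"], v["cat"])
buffer2_on = routed(buffer1, gate=1.0)
buffer2_off = routed(buffer1, gate=0.0)

# Unbinding slot2 from buffer2 should give something close to "dog".
similarity = float(bind(buffer2_on, inv(v["slot2"])) @ v["dog"])
print(round(similarity, 2))
```

The scalar gate is a deliberate simplification: in the model the gating is performed by inhibition of the channel's neurons, not by multiplication.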
References
[1] Stewart, T. C. and Eliasmith, C. [in press] Compositionality and biologically plausible models, in M. Werning, W. Hinze, & E. Machery (eds.), Oxford Handbook of Compositionality (Oxford U. Press).
[2] Eliasmith, C. and Anderson, C. [2003] Neural Engineering: Computation, representation, and dynamics in neurobiological systems (MIT Press).
[3] Gayler, R. [2003] Vector symbolic architectures answer Jackendoff's challenges for cognitive neuroscience. In Proc. 4th International Conference on Cognitive Science (Sydney, Australia).
[4] Gurney, K., Prescott, T. and Redgrave, P. [2001] A computational model of action selection in the basal ganglia. Biological Cybernetics 84, 401-423.
[5] Zylberberg, A., Slezak, D., Roelfsema, P., Dehaene, S. and Sigman, M. [2010] The brain's router: A cortical network model of serial processing in the primate brain. PLoS Computational Biology 6(4).
[6] Stocco, A., Lebiere, C. and Anderson, J.R. [2010] Conditional routing of information to the cortex: A model of the basal ganglia's role in cognitive coordination. Psychological Review 117(2), 541-574.
[7] Stewart, T.C. and Eliasmith, C. [2009] Spiking neurons and central executive control: The origin of the 50-millisecond cognitive cycle. 9th International Conference on Cognitive Modeling (Manchester, UK).
[8] Stewart, T.C., Choo, X. and Eliasmith, C. [2010] Dynamic behaviour of a spiking model of action selection in the basal ganglia. 10th Int. Conference on Cognitive Modeling (Philadelphia, USA).
[9] Plate, T. [2003] Holographic reduced representations. (Stanford, CA: CSLI).
[10] Anderson, J. R. and Lebiere, C. [1998] The atomic components of thought. (Mahwah, NJ: Erlbaum).
[11] Stewart, T., Tang, Y. and Eliasmith, C. [2009] A biologically realistic cleanup memory: Autoassociation in spiking neurons. 9th Int. Conference on Cognitive Modelling (Manchester, UK).
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-153
The Role of the Basal Ganglia–Anterior Prefrontal Circuit as a Biological Instruction Interpreter
Andrea STOCCO a,1, Christian LEBIERE b, Randall C. O’REILLY c and John R. ANDERSON b
a Institute for Learning and Brain Sciences, University of Washington, Seattle, WA
b Department of Psychology, Carnegie Mellon University, Pittsburgh, PA
c Department of Psychology, University of Colorado at Boulder, Boulder, CO
Abstract. Intelligent and versatile behavior requires the capability of adapting to novel and unanticipated situations. When facing novel and unexpected tasks, a fast and general solution consists of creating new declarative task representations, and subsequently acting upon them. Although this mechanism seems straightforward in general terms, it is difficult to implement in a biological model, and the exact neural substrates of this process are still unknown. Based on the analysis of two different computational models, we hypothesized that the brain circuit for interpreting instructions would comprise the aPFC (holding dependencies among specialized cortical areas) and the basal ganglia (orchestrating the exchange of information among regions). To verify this hypothesis, we designed and ran an fMRI experiment where participants had to perform changing tasks that consisted of different combinations of atomic cognitive operations. Both models and experimental data suggest that the aPFC is critical in representing abstract knowledge that reflects planned cognitive operations. This is consistent with the late appearance of aPFC in the evolution of the human brain, and its role in enabling human intelligence and culture. On the other hand, results and simulations show that the effect of this cortical region is made possible by the contribution of the basal ganglia circuit, which works as a general-purpose interpreter of declarative knowledge.
Keywords. Instructions; Basal Ganglia; Cognitive Models; Neural Networks.
Introduction
One of the hallmarks of intelligent behavior is the capability of directing one’s own behavior on the basis of predefined, declarative representations. This capability is useful because declarative knowledge is usually more flexible to manipulate than other types of knowledge, and can be more easily communicated. Humans routinely exhibit this type of intelligent behavior when they are engaged in complex tasks such as planning or problem solving. Perhaps the most striking example of this behavior is following instructions, i.e., the capability of translating abstract representations of behavior into action. This process is akin to interpreting a programming language statement in computer science. Computationally, this process requires some mandatory steps that are independent of the implementation of the interpreter itself; in particular, instructions need to be translated into operations and structures that match the underlying hardware.
In this paper, we provide converging computational and experimental evidence that a particular circuit in the human brain is responsible for interpreting instructions. In particular, we present two different models of the task, developed in two different modeling frameworks, together with preliminary results from a neuroimaging experiment. The models and the data suggest that the circuit involved in interpreting instructions comprises the anterior regions of the prefrontal cortex and a set of medial nuclei collectively known as the basal ganglia.
1 Corresponding Author: Andrea Stocco, Institute for Learning and Brain Sciences, University of Washington, Seattle, WA 98195. Email: [email protected].
1. The Task
Instructed behavior is seldom investigated in cognitive psychology, and data from the instructional phase of experiments are routinely discarded. Thus, we developed a novel task that was used both for testing our models and for collecting experimental data from participants. The task consists of solving a series of arithmetic problems, each of which is a combination of three operations, such as “divide x by 3”, “multiply y by 2”, and “multiply x and y”. Each problem required exactly two input numbers (x and y) and always contained one binary and two unary operations. In order to ensure that intermediate and final results were always integer numbers, participants were instructed to use the quotient as the result of a division, and discard the remainder (e.g., 7 / 2 = 3). The three operations were randomly selected from a set of five, each of which was associated with a letter of the alphabet, L = {A, B, C, D, E}. Table 1 illustrates the operations used in the experiment and provides some examples.
Each trial consisted of three consecutive phases: (a) an instruction phase, where the problem was presented; (b) an execution phase, where the two input numbers were presented and calculations were performed; and (c) a response phase, where participants indicated whether a certain number was the solution to the problem or not. The structure of a sample trial is illustrated in Figure 1. Instructions were presented as a string of letters and variables such as AExDy. Instructions were in prefix notation, so that the above problem was interpreted as A(E(x), D(y)), that is, (x / 3) × (y + 1) (see Table 1).
Table 1. The five operations used in the experiment
Operation   Meaning   Examples
A(x, y)     x × y     A(4, 2) = 4 × 2 = 8; A(2, 3) = 2 × 3 = 6
B(x, y)     x / y     B(8, 2) = 8 / 2 = 4; B(6, 3) = 6 / 3 = 2
C(x)        x × 2     C(4) = 4 × 2 = 8; C(3) = 3 × 2 = 6
D(x)        x + 1     D(7) = 7 + 1 = 8; D(3) = 3 + 1 = 4
E(x)        x / 3     E(9) = 9 / 3 = 3; E(6) = 6 / 3 = 2
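As a concrete illustration of the task semantics, the sketch below evaluates a prefix-notation instruction string such as AExDy using the five operations of Table 1, with division keeping only the quotient. It is a plain recursive evaluator written for this example; the names `OPS` and `evaluate` are hypothetical and it is not a model of how participants solve the problems.

```python
# Operations from Table 1: letter -> (number of arguments, function).
# Division keeps the quotient and discards the remainder.
OPS = {
    "A": (2, lambda x, y: x * y),
    "B": (2, lambda x, y: x // y),
    "C": (1, lambda x: x * 2),
    "D": (1, lambda x: x + 1),
    "E": (1, lambda x: x // 3),
}

def evaluate(instruction, x, y):
    """Evaluate a prefix-notation instruction string for inputs x and y."""
    tokens = list(instruction)
    inputs = {"x": x, "y": y}

    def parse():
        token = tokens.pop(0)
        if token in inputs:              # a variable: return its value
            return inputs[token]
        arity, fn = OPS[token]           # an operation: parse its arguments
        args = [parse() for _ in range(arity)]
        return fn(*args)

    result = parse()
    if tokens:
        raise ValueError("unused symbols left in instruction: " + "".join(tokens))
    return result

# AExDy is read as A(E(x), D(y)) = (x / 3) × (y + 1).
print(evaluate("AExDy", 9, 2))
```

For x = 9 and y = 2 the string AExDy gives (9 / 3) × (2 + 1) = 9, and for x = 7 the quotient rule gives (7 / 3) = 2 before the multiplication.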
Figure 1. Structure of a sample trial in the experiment.
2. Models for Interpreting Instruction
To explore the nature of the processes involved in interpreting instructions we developed two computational cognitive models that could perform the task. The two models exemplify two complementary and converging approaches. The first model was developed within an integrated cognitive architecture that allows symbolic encoding and decoding of declarative knowledge by production rules. The second model, on the other hand, was built upon an existing lower-level neural network model of the basal ganglia-prefrontal circuit.
2.1. The ACT-R Model
The higher-level model of the instruction task was implemented in ACT-R [1], a cognitive architecture that has been particularly successful in modeling human learning and memory and, more recently, neuroimaging data [2]. ACT-R includes declarative knowledge, represented as dictionary-like arrays of slot-value pairs called chunks, and procedural knowledge, represented as production rules. Chunks are permanently stored in a long-term memory but, in contrast to most production systems, can be accessed only when available in buffers serving as interface with memory and sensory modules [1]. Buffers have a limited capacity of one chunk only, and can only be accessed by production rules. Figure 2 illustrates the relationship between modules, buffers, and procedural knowledge. Production rules specify the chunk patterns across the various buffers in both the condition and action sides. Production rules can typically variabilize only the slot value of a chunk, and only under specific circumstances (effectively, involving no search) can they use a variable to refer to a specific slot (and not its value).
Figure 2. Overview of the ACT-R architecture [1]. Modules are in light grey; buffers in dark grey.
The ACT-R model can execute the entire task, including visually parsing the screen and performing simulated motor responses. During the instruction phase, the model encodes each problem as a series of three consecutive steps. Each step is created by scanning the instruction string right to left, recursively finding the first unattended letter; retrieving the associated operation; and determining whether to apply the operation to x, y, or both. During the execution phase, the model simply retrieves the three steps in order, executing the corresponding operations and updating the values of x and y at the conclusion of each step.
In ACT-R, all the task information must be either available in the buffers or retrieved prior to being used. Thus, some choices had to be made on how to distribute the relevant task information. These choices are usually constrained both by the specific computations available in a module and its established mapping to a brain region [1]. For instance, the intermediate values of x and y, together with the current step’s position in the series, were stored in a chunk in the imaginal buffer. This is consistent with the imaginal buffer’s association with the parietal cortex, a brain region critically involved in visuo-spatial working memory and mathematical cognition [1-3].
The two most critical parts of the model are the chunks representing the problem steps and the production rules that interpret them. Problem steps were maintained in a special module that mimics the computations of the existing goal module. A new module was created because the goal module is associated with internal control states and not with declarative templates for future actions [2]. No established association exists between this novel module that processes instructions and a brain region, but some speculations are possible. Its role in holding higher-level representations that tie together lower-level actions suggests an association with the anterior prefrontal cortex (aPFC), which has been often associated with similar functions [4,5].
The model’s second key component is the production rules that interpret instructions. These rules differ from standard ACT-R rules in that they use variables to indicate slot names, and not only slot values. This procedure is needed to properly instantiate operations that refer to either x or y. The execution of production rules has been associated with the basal ganglia [2], and basal ganglia activity has been successfully predicted either by simply counting the number of production rules fired per time unit [1-3], or by counting the number of variable bindings per time unit [6]. Thus, the model predicts that the activity of the basal ganglia should reflect the increased number of variables in the Execution phase.
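The encode-then-execute scheme can be sketched in a few lines. This is not ACT-R code: it is a hypothetical Python analogue in which the instruction string is scanned right to left, each operation letter becomes a stored step that names the slots it applies to, and execution looks slots up by name, loosely mirroring rules that use variables for slot names. The dictionary standing in for the imaginal chunk, the convention that a step writes its result into its first slot, and all function names are assumptions of this example.

```python
# Operations of the task: letter -> (number of arguments, function).
OPS = {
    "A": (2, lambda x, y: x * y),
    "B": (2, lambda x, y: x // y),
    "C": (1, lambda x: x * 2),
    "D": (1, lambda x: x + 1),
    "E": (1, lambda x: x // 3),
}

def encode(instruction):
    """Instruction phase: scan right to left and emit one step per letter.

    Each step is (operation, slots), where slots names x, y, or both.
    """
    pending, steps = [], []
    for symbol in reversed(instruction):
        if symbol in ("x", "y"):
            pending.append(symbol)
            continue
        arity, _ = OPS[symbol]
        slots = tuple(pending.pop() for _ in range(arity))
        steps.append((symbol, slots))
        pending.append(slots[0])  # assumed convention: result lives in the first slot
    return steps

def execute(steps, x, y):
    """Execution phase: retrieve the steps in order and update x and y."""
    imaginal = {"x": x, "y": y}   # stand-in for the chunk holding x and y
    target = None
    for op, slots in steps:
        _, fn = OPS[op]
        target = slots[0]
        imaginal[target] = fn(*(imaginal[s] for s in slots))
    return imaginal[target]

steps = encode("AExDy")
print(steps)
print(execute(steps, 9, 2))
```

For AExDy the three stored steps are D applied to y, E applied to x, and A applied to both, and executing them with x = 9 and y = 2 yields 9.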
2.2. The Conditional Routing Model
The ACT-R model provides only indirect evidence of the neural basis of interpreting instructions. More compelling evidence can be obtained by modeling the process of following instructions within a framework that directly deals with the underlying biological circuits. Interpreting instructions requires frequent updating of representations in working memory, a process that is mediated by a neural loop that connects various cortical areas with the prefrontal cortex through the basal ganglia. Several models of this circuit exist (e.g., [7-9]). The conditional routing model by Stocco, Lebiere, and Anderson [9] is both consistent with the known biology of the circuit and provides a biological explanation for some of the computations required by an ACT-R model, in particular for the variable binding process.
The basal ganglia comprise a number of interconnected nuclei that route signals from the entire cortex to the frontal lobes. The heart of the model is the simulated striatum, which receives afferents from the entire cortex and is the entry point of the circuit. The striatum is modeled as a flat structure of projection neurons, the so-called striatal matrix, controlled by a set of interneurons. Biologically, interneurons have a high tonic activity maintaining a constant inhibition on projection neurons [10]. In the model, projection neurons have a high threshold that is calculated to match the expected incoming signals from the cortex and the inhibitory interneurons:
$$\sum_i w_i \, E(x_i) \qquad (1)$$
where $w_i$ is the weight of the synapses formed with pre-synaptic neuron $i$, and $E(x_i)$ is the rate-coded expected activation value of $i$. Variable binding is permitted by the particular two-level organization of the model striatum. The striatal matrix is divided into regions that reflect the organization of the cortex. Thus, every cortical region is represented by a corresponding patch on the striatal matrix. Each patch also has an internal organization, with sub-compartments representing the different parts of the cortex that the original cortical region projects to. This two-level organization can be imagined as a matrix of source-destination pairs of cortical regions, and the entire striatum can be imagined as a switchboard [9]. Figure 3 provides a visual rendition of this organization.
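A small numeric example may help with Equation (1). The sketch below computes the threshold as the weighted sum of expected pre-synaptic activations and then checks whether a given input pattern exceeds it. The weights, expected rates, the symbol `theta`, and the rule that a neuron responds only when its net input exceeds the threshold are all assumptions made for illustration.

```python
import numpy as np

def net_input(weights, rates):
    """Weighted sum of pre-synaptic activations."""
    return float(np.dot(weights, rates))

# Two excitatory cortical inputs and one inhibitory interneuron (hypothetical values).
w = np.array([0.5, 0.5, -1.0])
expected = np.array([0.4, 0.4, 0.2])   # E(x_i) for each input

# Equation (1): the threshold matches the expected incoming signal.
theta = net_input(w, expected)          # 0.5*0.4 + 0.5*0.4 - 1.0*0.2 = 0.2

def responds(rates):
    """Assumed firing rule: the neuron responds only above its threshold."""
    return net_input(w, rates) > theta

print(theta)
print(responds(expected))                     # typical input: stays silent
print(responds(np.array([0.9, 0.9, 0.2])))    # stronger cortical drive: responds
```

With a threshold set this way, ordinary levels of cortical activity leave the projection neuron silent, and only input above the expected level activates it.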
Figure 3. Organization of the striatum and the cortex in the routing model [9].
Consistent with neurophysiology [10], neurons in the striatum are mostly silent, with only a minority of them actually active at any time. In our model, the active neurons correspond to the active combinations of sources and destinations. Ignoring local computations that occur within striatal neurons, the final state of the striatum is the block product v M of the initial vector v of activations in the source cortical area, and the switchboard matrix of allowed destinations M. The block product is a special case of the tensor product, a powerful mechanism for variable binding in neural networks [11]. In this case, the variable is the destination cortical region, which is bound to the value v, i.e. the original content of the source region. Notice the similarity between this mechanism and ACT-R’s production rules, where variables are used to bind the contents of a particular destination buffer to the values held in a source buffer.
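The switchboard idea can be illustrated with a toy example in which an outer-product-style operation stands in for the block product. Each source region's activation vector is copied into the striatal sub-compartment of every destination that the switchboard matrix M allows, and each destination then receives whatever was routed to it. The region names, activation values, and the use of `numpy.einsum` are assumptions of this sketch and not details of the published model.

```python
import numpy as np

regions = ["retrieval", "x", "y"]          # hypothetical cortical regions
idx = {name: i for i, name in enumerate(regions)}

# V[s] is the activation vector currently held in source region s.
V = np.array([
    [0.9, 0.1, 0.4],   # retrieval: an arithmetic fact just retrieved
    [0.2, 0.2, 0.2],   # x
    [0.5, 0.0, 0.7],   # y
])

# Switchboard: M[s, d] = 1 routes the content of source s to destination d.
M = np.zeros((len(regions), len(regions)))
M[idx["retrieval"], idx["x"]] = 1.0        # copy the retrieved fact into x

# striatum[s, d, :] holds V[s] wherever M[s, d] is on, and zeros elsewhere.
striatum = np.einsum("sd,sn->sdn", M, V)

# Each destination receives the sum of everything routed to it.
arriving = striatum.sum(axis=0)
print(arriving[idx["x"]])
```

Only one source-destination cell is active here, so most of the simulated striatum stays silent, and the destination region (the variable) ends up bound to the content of the source region (the value).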
2.2.1. Instructions and the Control of Variable Binding
The very structure of the model suggests one natural way of interpreting instructions. In the routing model, the execution of an operation simply consists of the proper transfer of signals between cortical regions. For example, updating the values of x and y after an operation consists of copying the representation held in the prefrontal region that retrieves arithmetic facts to the cortical region that temporarily holds either x or y. This transfer is directed by the proper activation of cells in the striatum. In fact, any internal operation can be properly represented as a switchboard matrix that shares the same organization of the striatum. Following this logic, we expanded the routing model by adding a novel cortical area that shares the switchboard organization of M, so that variable bindings in the striatal matrix can be properly controlled by the activation of the corresponding cells in the region. In addition to having a switchboard organization, neurons in this region need to have a very low tonic activity; this is required so that their expected activation value E(x) is low, minimizing their effect on the thresholds calculated in Equation (1) and making it easy to bring the activation of projection neurons above the threshold. In fact, we ran a number of simulations showing that this mechanism is sufficient to make the model execute arbitrary operations such as the instructed arithmetic operations required by the task.
One can wonder about the biological plausibility of such a hypothetical region. In fact, the anterior part of the prefrontal cortex (aPFC), and in particular the frontal pole, possesses exactly the necessary computational characteristics. Specifically, the aPFC receives massive projections from the frontal lobe, and these projections are topologically organized, thus providing an organization that resembles the frontal projections to the striatum. Also, this region is usually silent during the execution of most tasks, with its most polar part actually deactivating during a task [5], thus satisfying the condition of a low expected value. Finally, its projections seem to innervate a large part of the head of the caudate nucleus, the most frontal part of the basal ganglia [12].
3. Neurocognitive Evidence
So far, two different computational models have been presented that provide evidence that the process of interpreting instructions can be achieved by the joint workings of the anterior prefrontal cortex and the basal ganglia. Before testing this prediction, it is worth examining whether it is consistent with the existing experimental evidence. There is mounting evidence for the role of the aPFC in holding higher-level representations, such as those needed in analogical and meta-cognitive tasks [13], or in tasks that require branching of different goals [4]. To the best of our knowledge, the involvement of the basal ganglia in interpreting instructions has not been tested directly. Many converging lines of research, however, have singled out the basal ganglia as a potential basis for flexible behavior in general. For instance, there is an obvious connection between the function of the basal ganglia and the regulation and updating of working memory. Patients with either Parkinson’s or Huntington’s disease are impaired in tasks tapping different forms of working memory [14], and working memory-related activity in the basal ganglia has been reported in a number of neuroimaging studies [15,16]. Individual differences in working memory performance are also related to genetic differences in the expression of dopamine receptors in the basal ganglia [17], and high working memory capacity individuals show greater modulation of basal ganglia activity with increasing task demands [18]. Other evidence comes from tasks that require strategic reasoning to cope with changes in task rules. These tasks are often used in investigations of so-called executive functions. One such example is the Wisconsin Card Sorting Task, which requires participants to sort cards according to rules that they need to discover by trial and error and that are continuously changed by the experimenter. Again, Parkinson’s patients are unable to correctly perform this task [19].
In summary, the basal ganglia are recruited in a number of different tasks that share the common property of requiring flexible restructuring of behavior, either because new task rules come into play or because the trial difficulty changes. Furthermore, individual differences in performance in these kinds of paradigms are reliably associated with individual differences in the basal ganglia, either at the level of functional responses or at the level of neuroanatomy.
4. The Experiment
The models’ predictions were tested in a neuroimaging study. Ten participants were recruited to perform the task previously described while lying in a 3T fMRI scanner. Their brain activity was recorded at a rate of a full volume acquisition every 2 seconds, with 34 oblique slices acquired for each volume. Each participant solved 80 problems, divided into four blocks of 20 trials each. Unlike most fMRI experiments, each problem was self-paced. In addition to the distinction between encoding and executing a set of instructions, the experiment manipulated the amount of practice as a second factor. This manipulation provides an additional means to isolate the specific act of interpreting instructions, which is important when analyzing data with a limited number of participants (see below). Practice was manipulated by having participants perform a subset of the problems before the experiment. During the experiment, half of the trials were novel and half came from the subset of practiced trials.
4.1. Results
Because the low number of participants limited the statistical power of a traditional analysis, we performed a conjunction analysis, using statistical parametric maps thresholded at a liberal voxel-level value (p < 0.01, uncorrected) to isolate regions that are activated in two or more target contrasts. The ACT-R model predicts that the module corresponding to the aPFC region should be more active in Novel than Practiced trials, in both the Instruction and Execution phases. Thus, we created two statistical parametric maps (one for the Instruction phase, one for the Execution phase) that identified those voxels that were statistically more active during the Novel than during the Practiced trials (i.e., Novel > Practiced). As predicted, the analysis identified a cluster of voxels located in the aPFC region, with a smaller cluster located in an even more anterior position in the frontal lobe. The results of this analysis are illustrated in the top part of Figure 4; the crosshairs highlight the aPFC regions.
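The logic of the conjunction step can be sketched on toy data: each contrast map is thresholded at p < 0.01, and only voxels that survive in every map are kept. The arrays below are invented four-voxel examples and the function name is hypothetical; an actual analysis would operate on whole-brain maps produced by an fMRI analysis package.

```python
import numpy as np

def conjunction(p_maps, alpha=0.01):
    """Boolean mask of voxels with p < alpha in every contrast map."""
    masks = [np.asarray(p) < alpha for p in p_maps]
    return np.logical_and.reduce(masks)

# Invented voxel-wise p-values for Novel > Practiced in each phase.
p_instruction = np.array([0.002, 0.200, 0.004, 0.030])
p_execution = np.array([0.001, 0.003, 0.009, 0.500])

both = conjunction([p_instruction, p_execution])
print(both.tolist())
```

Only the first and third voxels pass the threshold in both maps, so only they would be reported by the conjunction.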
Figure 4. Results of the experiment.
Both the ACT-R and the conditional routing model predict that the basal ganglia should be more active during the Execution phase than during the Instruction phase. Additionally, both models predict that this asymmetry should hold for Novel problems only; Practiced problems can be executed as a routine, without referring to the original instructions, and there is no reason to expect any additional basal ganglia involvement during their execution. To verify this hypothesis, we created two new contrast maps that identify those voxels more active in the Execution than the Instruction phase (i.e., Execution > Instruction) in the Novel and in the Practiced problems, respectively. As predicted, we found one cluster of voxels that was more active during the Execution phase and corresponded to the right striatum; it is indicated by the crosshairs in the bottom part of Figure 4. As predicted, this cluster showed up only in the contrast map obtained from Novel trials; Practiced problems did not show, in fact, any voxel that was more active during the Execution phase.
In summary, our preliminary results support our models’ predictions and allow us to identify two regions crucially involved in interpreting instructions: the aPFC, probably responsible for encoding and accessing abstract representations of cognitive actions, and the basal ganglia, probably responsible for performing the necessary variable bindings while interpreting instructions.
5. Conclusions
This paper has presented two models and a neuroimaging study of how humans interpret instructions. The models and the experimental data suggest that a circuit formed by the basal ganglia and the anterior prefrontal cortex provides the necessary computations to translate abstract representations of behavior into action. There are at least three reasons why we believe that understanding how the brain interprets instructions is important. First, following arbitrary representations of actions is the core capability that underlies flexible behavior and planning. Thus, it provides one of the foundations of general intelligence. Additionally, interpreting instructions constitutes an interesting problem because, while its solution is rather simple within symbolic frameworks such as production systems, it is rather complex to treat within a connectionist framework. Thus, it provides a challenge for bridging the gap between abstract computations and their biological counterpart. The third and final reason why we consider this problem worth investigating is that it provides access to the basic operations of the human brain. As suggested in the introduction, the process of interpreting instructions consists of the translation of abstract representations into basic primitive operations. Thus, understanding how this translation mechanism works implicitly provides information about the nature of the primitive computations available in the human brain and their implementation.
6. Acknowledgments
This research was made possible by award FA9550-08-1-0404 from the Air Force Office of Scientific Research (AFOSR) to John Anderson, Randall C. O’Reilly, and Christian Lebiere; by support from the Army Research Laboratory’s Robotics Collaborative Technology Alliance to Christian Lebiere; and by a special award from the Brain Imaging Research Center (BIRC) of Pittsburgh to Andrea Stocco.
References
[1] J.R. Anderson, How can the human mind occur in the physical universe? Oxford University Press, New York, NY, 2007.
[2] J.R. Anderson, J.M. Fincham, Y. Qin, and A. Stocco, A central circuit of the mind. Trends in Cognitive Sciences 12 (2008), 136-143.
[3] J.R. Anderson, Human symbol manipulation within an integrated cognitive architecture. Cognitive Science 29 (2005), 313-341.
[4] E. Koechlin, G. Basso, P. Pietrini, S. Panzer, and J. Grafman, The role of the anterior prefrontal cortex in human cognition. Nature 399 (1999), 148-151.
[5] S.J. Gilbert, S. Spengler, J.S. Simons, J.D. Steele, S.M. Lawrie, C.D. Frith, and P.W. Burgess, Functional specialization within rostral prefrontal cortex (Area 10): A meta-analysis. Journal of Cognitive Neuroscience 18 (2006), 932–948.
[6] A. Stocco and J.R. Anderson, Endogenous control and task representation: An fMRI study of algebraic problem solving. Journal of Cognitive Neuroscience 20 (2008), 1300-1314.
[7] M.J. Frank, B. Loughry, and R.C. O’Reilly, Interactions between frontal cortex and basal ganglia in working memory: A computational model. Cognitive, Affective & Behavioral Neuroscience 1 (2001), 137-160.
[8] F.G. Ashby, S.W. Ell, V.V. Valentin, and M.B. Casale, FROST: a distributed neurocomputational model of working memory maintenance. Journal of Cognitive Neuroscience 17 (2005), 1728-1743.
[9] A. Stocco, C. Lebiere, and J.R. Anderson, Conditional routing of information to the cortex: A model of the basal ganglia’s role in cognitive coordination. Psychological Review 117 (2010), 540-574.
[10] J.M. Tepper and J.P. Bolam, Functional diversity and specificity of neostriatal interneurons. Current Opinion in Neurobiology 14 (2004), 685-692.
[11] P. Smolensky, Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence 46 (1990), 159-216.
[12] A. Di Martino, A. Scheres, D.S. Margulies, A.M.C. Kelly, L.Q. Uddin, Z. Shehzad, B. Biswal, J.R. Walters, F.X. Castellanos, and M.P. Milham, Functional connectivity of human striatum: A resting state fMRI study. Cerebral Cortex 18 (2008), 2735-2747.
[13] E. Ferrer, E.O. O’Hare, and S.A. Bunge, Fluid reasoning and the developing brain. Frontiers in Neuroscience 3 (2009), 46-51.
[14] M.G. Packard and B.J. Knowlton, Learning and memory functions of the basal ganglia. Annual Review of Neuroscience 25 (2002), 563-593.
[15] T.S. Braver, J.D. Cohen, L.E. Nystrom, J. Jonides, E.E. Smith, and D.C. Noll, A parametric study of prefrontal cortex involvement in human working memory. Neuroimage 5 (1997), 49-62.
[16] F. McNab and T. Klingberg, Prefrontal cortex and basal ganglia control access to working memory. Nature Neuroscience 11 (2008), 103-107.
[17] Y. Zhang, A. Bertolino, L. Fazio, G. Blasi, A. Rampino, R. Romano, M.-L. T. Lee, T. Xiao, A. Papp, D. Wang, and W. Sadée, Polymorphisms in human dopamine D2 receptor gene affect gene expression, splicing, and neuronal activity during working memory. Proceedings of the National Academy of Sciences 104 (2007), 20552-20557.
[18] C.S. Prat, T.A. Keller, and M.A. Just, Individual differences in sentence comprehension: a functional magnetic resonance imaging investigation of syntactic and lexical processing demands. Journal of Cognitive Neuroscience 19 (2007), 1950-1963.
[19] O. Monchi, M. Petrides, V. Petre, K. Worsley, and A. Dagher, Wisconsin Card Sorting revisited: Distinct neural circuits participating in different stages of the task identified by event-related functional magnetic resonance imaging. The Journal of Neuroscience 21 (2001), 7733–7741.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-163
Learning to Recognize Objects in Images Using Anisotropic Nonparametric Kernels
Douglas SUMMERS-STAY and Yiannis ALOIMONOS
University of Maryland, College Park
Abstract. We present a system that makes use of image context to perform pixel-level segmentation for many object classes simultaneously. The system finds approximate nearest neighbors from the training set for a (biologically plausible) feature patch surrounding each pixel. It then uses locally adaptive anisotropic Gaussian kernels to find the shape of the class manifolds embedded in the high-dimensional space of the feature patches, in order to find the most likely label for the pixel. An iterative technique allows the system to make use of scene context information to refine its classification. Like humans, the system is able to quickly make use of new information without going through a lengthy training phase. The system provides insight into a possible mechanism for infants to quickly learn to recognize all of the classes they are presented with simultaneously, rather than having to be trained explicitly on a few classes like standard image classification algorithms.
Keywords. Object Recognition, Anisotropic, Non-Parametric.
Introduction
When we look at the world, we are able to classify many things within the field of view quickly, simultaneously, and effortlessly. Most models of attention assume that when we first look at a scene, the brain pulls out only simple features such as contrast, information density, and saturation, creating what is known as a “saliency map.” These features are thought to provide cues for where to fixate in an image, and objects are thought to be recognized only when they are at the center of attention, as indicated by the direction of gaze.
A few recent works have shown a more complicated and interesting situation. The earlier experiments in this area simply asked participants to look at a scene and describe what they saw. In this case, the location of fixations was predicted fairly well by the low-level features described above. When participants were asked instead to look for a particular item in a natural image, however, the first few fixations were better predicted by the location of the object to be found [1]. This seems to indicate that even from the first glance at a scene, before the brain would have time for anything as slow as conscious reasoning or the fitting of a complex model, it is already able to classify many objects in a scene correctly and in parallel. Only after this process is completed do we fixate on the object of interest in order to begin the slower and more accurate processes that require focused attention.
Our system is an attempt to model this aspect of pre-attentive vision. In brief, we
• collect biologically plausible rich features, called “prototypes,” from training images with known labels
• use these to classify all features on these same images
• learn a multi-layer model that can refine these estimates using context
• apply the multi-layer model to test images
1. Object Recognition in the Brain
The following is a sketch of the current consensus about the process of object recognition in primates. Data from the eyes is streamed along the ventral visual pathway, beginning in the primary visual cortex (V1) and ending in the inferotemporal cortex (IT). This in turn informs the prefrontal cortex, where the information can be used for taking action. The entire process from V1 to IT takes only about 30 ms in humans [2]. (Information about location in the image also begins in V1 but follows a different path. We do not attempt to imitate this behavior in our model.)
The first cells along the pathway, the simple (S1) cells, are similar to local Gabor filters at a particular orientation and scale. Complex (C1) cells integrate the information from a small number of these S1 cells, responding to oriented edges over a wider range of locations and scales. The outputs of multiple C1 cells, in turn, are used to create more and more complex filters that respond to particular arrangements of multiple edges over larger and larger areas of the image (S2 and C2 cells) [3]. Cells at the end of this process act like radial basis functions, responding strongly to image regions that contain the pattern of interest, and falling off in Gaussian fashion as the similarity between the input patch and the prototype decreases [4].
Up to this point the process is largely feed-forward. But within the inferotemporal cortex, these prototypes receive feedback from the prefrontal cortex [5], influencing the interpretation of inputs so that ambiguous areas are resolved into familiar objects through association with the immediate context. For example, a distant brown blob might be interpreted as a shoe if it is found at the bottom of a leg, or as hair if found at the top of a head. For some cells in the IT cortex, the visual similarity between inputs is less important than semantic similarity. Cells that respond strongly to frontal views of faces, for example, respond partially to profiles of faces, even though their appearance is not similar [6].
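The feed-forward stages described above can be illustrated with a rough Python sketch. It assumes NumPy, and every name and parameter value in it (gabor_kernel, s1_response, c1_response, rbf_response, the 11 x 11 filter, the pooling width of 4) is hypothetical and chosen for illustration; it is not the HMAX variant used by the system, and real C1 units also pool over scale, which is omitted here.

```python
import numpy as np

def gabor_kernel(size=11, wavelength=5.0, theta=0.0, sigma=3.0, gamma=0.5):
    """Zero-mean, unit-norm Gabor filter: a sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # axis along which the sinusoid varies
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    g = envelope * np.cos(2 * np.pi * xr / wavelength)
    g -= g.mean()
    return g / np.linalg.norm(g)

def s1_response(image, kernel):
    """S1-like layer: absolute filter response at every valid window position."""
    k = kernel.shape[0]
    windows = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.abs(np.einsum("ijkl,kl->ij", windows, kernel))

def c1_response(s1, pool=4):
    """C1-like layer: maximum over non-overlapping pool x pool neighbourhoods."""
    h = (s1.shape[0] // pool) * pool
    w = (s1.shape[1] // pool) * pool
    blocks = s1[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

def rbf_response(feature, prototype, sigma=1.0):
    """Radial-basis unit: 1 at the prototype, Gaussian fall-off with distance."""
    d2 = np.sum((np.asarray(feature) - np.asarray(prototype)) ** 2)
    return float(np.exp(-d2 / (2 * sigma ** 2)))

if __name__ == "__main__":
    image = np.zeros((32, 32))
    image[:, 16:] = 1.0                            # a vertical step edge
    for theta in (0.0, np.pi / 2):
        c1 = c1_response(s1_response(image, gabor_kernel(theta=theta)))
        print(f"theta={theta:.2f} strongest C1 response={c1.max():.3f}")
```

On the step-edge image, the filter whose sinusoid varies across the edge responds far more strongly than the one rotated by 90 degrees, which is the orientation selectivity the text attributes to S1 and C1 cells.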
2. Object Recognition by Our System
Our system follows this natural model closely for the first stages of processing, approximating the action of S1, C1, S2, and C2 cells. (This part of the system uses a variation on the HMAX features described in [7].) Randomly selected 64 x 64 patches of the training images are input, and the results are 256-dimensional vectors which encode much of the shape information in the patches in a compact way. These vectors (which we will call ‘prototypes’) are associated with training labels, giving the classification of the object at the center of the patch. A sliding window is applied, and the approximate nearest neighbors to each windowed region from among these prototypes are returned. What has been described so far is similar to [8].
We extend the model beyond this with multiple layers of prototypes that do not merely classify an image as a whole, but create a classification map that shows which regions of the image belong to which class. For each sampled point in the images, we find the most similar prototypes and average their labels, making use of a rich weighting scheme (discussed later). Using the correct label maps for these training images, the system learns what it ought to produce when a particular pattern of label maps is generated. We do this by creating a new set of prototypes in a second layer, which take as input not just a patch of the original image, but also the associated patch from the estimated label map created by the first layer. This process is repeated several times.
When test images are presented, exactly the same process is followed, except that new prototypes are not collected. Instead, each layer of prototypes created during training is applied in sequence, making use of the estimated label map generated by the previous layer.
Algorithm Summary
2.1. Training
1. A set of training images is collected.
2. Corresponding label maps are created.
3. For each layer,
   4. Features are collected at many random locations within these training pairs.
   5. An index is created to enable fast searching among these features.
   6. For each training image,
      7. A feature is collected at each pixel in the image.
      8. A set of similar features is found.
      9. A weighted average of the labels of these features is found.
      10. An estimated label map is created from these labels.
2.2. Testing
1. For each test image,
   2. For each layer,
      3. Follow steps 7-10 above.
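The training and testing steps above can be sketched in Python as follows. This is a toy illustration under stated assumptions rather than the authors' implementation: raw pixel patches stand in for the HMAX features, an exhaustive search stands in for the approximate nearest-neighbor index, the weights are isotropic Gaussians, and labels are binary (0 or 1). The names patch_features, estimate_labels, train, and predict, and all parameter values, are hypothetical.

```python
import numpy as np

def patch_features(image, label_map, radius):
    """One feature per pixel: the image patch plus the estimated-label patch."""
    size = 2 * radius + 1
    h, w = image.shape
    parts = []
    for arr in (image, label_map):
        padded = np.pad(arr, radius, mode="edge")
        windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
        parts.append(windows.reshape(h * w, -1))
    return np.concatenate(parts, axis=1)

def estimate_labels(features, protos, proto_labels, k=5, sigma=0.5):
    """Steps 8-9: find the k most similar prototypes and average their labels."""
    d2 = ((features ** 2).sum(axis=1)[:, None]
          + (protos ** 2).sum(axis=1)[None, :]
          - 2.0 * features @ protos.T)
    d2 = np.maximum(d2, 0.0)
    k = min(k, protos.shape[0])
    idx = np.argsort(d2, axis=1)[:, :k]
    near = np.take_along_axis(d2, idx, axis=1)
    # Subtracting the smallest distance rescales each row without changing
    # the normalized average, and avoids underflow to an all-zero row.
    weights = np.exp(-(near - near[:, :1]) / (2 * sigma ** 2))
    return (weights * proto_labels[idx]).sum(axis=1) / weights.sum(axis=1)

def train(images, truths, n_layers=3, n_protos=200, radius=2, seed=0):
    """Steps 3-10: one set of prototypes per layer, each fed by the previous map."""
    rng = np.random.default_rng(seed)
    estimates = [np.full(im.shape, 0.5) for im in images]   # uniform initial maps
    labels = np.concatenate([t.ravel() for t in truths]).astype(float)
    layers = []
    for _ in range(n_layers):
        feats = [patch_features(im, est, radius) for im, est in zip(images, estimates)]
        pool = np.concatenate(feats)
        pick = rng.choice(len(pool), size=min(n_protos, len(pool)), replace=False)
        layers.append((pool[pick], labels[pick]))
        estimates = [estimate_labels(f, *layers[-1]).reshape(im.shape)
                     for f, im in zip(feats, images)]
    return layers

def predict(image, layers, radius=2):
    """Testing: apply each stored layer in sequence, collecting no new prototypes."""
    estimate = np.full(image.shape, 0.5)
    for protos, proto_labels in layers:
        feats = patch_features(image, estimate, radius)
        estimate = estimate_labels(feats, protos, proto_labels).reshape(image.shape)
    return estimate
```

Thresholding the returned map at 0.5 gives a binary segmentation. Handling K classes would need one weighted vote per class instead of a single averaged label.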
Though we have used biological language to describe the process in this paper, the problem can also be formulated as a straightforward statistical inference, as described in [9]. Let a training image be represented by the vector X = (x1, ..., xn), where each xi represents a single pixel. Each training image comes with a corresponding ground truth map Y = (y1, ..., yn), where yi ∈ {1, ..., K} is the label for pixel i, and an estimated probability-of-detection map W = (w1, ..., wn), where all the wi are initially set to the same value. We would like to learn to estimate p(yi | X and W). Since this is too large a space to attempt to learn directly (a megapixel image would result in a million-dimensional space), we instead learn p(yi | V), where V ⊂ (X and W) consists of a patch of pixels surrounding xi and a patch surrounding wi.
Once we have learned p(yi | V), we apply it to the patch surrounding each pixel xi in each training image X. In this way, we create an estimated label map W for each of the training images. In this map W, some pixels will be correctly labeled while their neighbors are incorrectly labeled. Since we have the truth map Y for each image, we can learn, for example, that a pixel wi surrounded by pixels belonging to a particular class k is more likely to itself belong to that class. Moreover, by using both the estimated map W and the original image X together as one half of the training pair, we can do a better job of estimating yi than if we only had the original image X. This process of iteratively creating new estimated detection maps continues until the maps no longer improve. The sharing of context information between neighboring pixels introduced in this way is comparable to the way belief propagation networks or conditional random fields (CRFs) define probabilities for sharing information between neighbors.
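The iteration described above can be written compactly as follows. The layer index t, the constant c, and the estimator symbol \hat{p}^{(t)} are notation introduced here for illustration; they do not appear in the paper.

```latex
\begin{align}
  W^{(0)}_i   &= c \quad \text{for all } i, \\
  V^{(t)}_i   &= \bigl(\text{patch of } X \text{ around } x_i,\;
                       \text{patch of } W^{(t)} \text{ around } w_i\bigr), \\
  W^{(t+1)}_i &= \hat{p}^{(t)}\bigl(y_i \mid V^{(t)}_i\bigr),
                 \qquad t = 0, 1, 2, \dots
\end{align}
```

Here \hat{p}^{(t)} is the estimate learned at layer t from the pairs (V^{(t)}_i, y_i) of the training images, and the iteration stops once W^{(t+1)} no longer improves on W^{(t)}.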
3. Anisotropic Interpolation
While the prototypes are a compressed representation of the patches they are derived from (a 64 x 64 patch with 4096 pixels is represented by only 256 values) they are still too high dimensional for approximate nearest neighbor algorithms to work well. The 100 nearest neighbors will contain some correct matches but also many incorrect matches. The usual way to weight the neighbors is with a Gaussian function on the distance from the point to be estimated. Unfortunately, in high dimensional spaces, all points are approximately the same distance apart. This is one aspect of the ‘curse of dimensionality.’ However, the relevant data lies on a lower dimensional manifold embedded in this 256 dimensional space. Because of this, adaptive anisotropic kernels give a substantial improvement over the standard isotropic Gaussians.
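The claim that distances become nearly uniform in high-dimensional spaces can be checked numerically. The sketch below assumes NumPy and uses standard normal points purely for illustration (the real prototypes are not Gaussian); the function name distance_spread is hypothetical.

```python
import numpy as np

def distance_spread(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from one query point.

    A ratio near 1 means all points are almost equally far away, so a
    Gaussian weight on distance barely distinguishes between neighbours.
    """
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, dim))
    query = rng.standard_normal(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min() / dists.max()

if __name__ == "__main__":
    for dim in (2, 16, 256):
        print(f"dim={dim:4d}  nearest/farthest = {distance_spread(dim):.3f}")
```

The ratio rises toward 1 as the dimension grows, which is why an isotropic weighting on raw distance carries little information in the 256-dimensional prototype space.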
Figure 1. Isotropic Gaussian kernels (left) and anisotropic Gaussian kernels (right) on the same ten points.
The advantage can be seen in Figure 1. Ten points form an expanding spiral. The points represent prototypes. The spiral is 2-dimensional for illustrative purposes; the actual prototypes are points in a 256-dimensional space. In the first illustration, the weights of each prototype are given by an isotropic Gaussian function. When the prototypes are very similar, the points are close together, and the interpolation between them is reasonably accurate. However, when they are widely spaced, each prototype lies on its own island. Test features which are very similar to one particular prototype will be classified correctly, but ones that lie halfway between two prototypes will not be. In the second illustration, anisotropic kernels are used. These are elongated in the direction of neighboring points of the same class. In this case, the points form a nearly connected spiral, correctly estimating the shape of the underlying manifold. This effect is even more pronounced in higher-dimensional spaces, where the weight is concentrated in one direction among hundreds, rather than one direction out of two as in the illustration.
The method we used to estimate the shape of these kernels is not biologically plausible, relying on taking the inverse of a covariance matrix. (See [10] for details and formulae for these anisotropic kernels.) In the brain, the shape of these kernels may be formed by interaction among similar prototypes gradually “reaching out” towards their neighbors in the same process that allows redundant prototypes to be gradually eliminated during the learning process. This, however, is purely speculative at present.
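A minimal sketch of the difference between the two weightings follows. It is an assumption-laden simplification rather than the formulation of [10]: each prototype's kernel shape is taken from the scatter of its m nearest same-class prototypes around it, with a small ridge term so the matrix stays invertible. The names local_covariances, anisotropic_weights, and isotropic_weights and the values of m and ridge are hypothetical.

```python
import numpy as np

def local_covariances(protos, labels, m=3, ridge=1e-2):
    """One kernel shape per prototype, from its m nearest same-class prototypes."""
    protos = np.asarray(protos, dtype=float)
    labels = np.asarray(labels)
    dim = protos.shape[1]
    covs = []
    for p, lab in zip(protos, labels):
        same = protos[labels == lab]
        order = np.argsort(np.linalg.norm(same - p, axis=1))
        diffs = same[order[: m + 1]] - p          # includes p itself (a zero row)
        covs.append(diffs.T @ diffs / len(diffs) + ridge * np.eye(dim))
    return np.array(covs)

def anisotropic_weights(query, protos, covs):
    """Gaussian weight of each prototype using its own Mahalanobis distance."""
    diffs = np.asarray(query, dtype=float) - np.asarray(protos, dtype=float)
    solved = np.linalg.solve(covs, diffs[:, :, None])[:, :, 0]
    maha2 = np.einsum("nd,nd->n", diffs, solved)
    return np.exp(-0.5 * maha2)

def isotropic_weights(query, protos, sigma=1.0):
    """Standard Gaussian weight on Euclidean distance, for comparison."""
    d2 = ((np.asarray(query, dtype=float) - np.asarray(protos, dtype=float)) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

if __name__ == "__main__":
    # Four same-class prototypes along a line; two queries equally far from the
    # second prototype, one along the line and one perpendicular to it.
    protos = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
    covs = local_covariances(protos, np.ones(4, dtype=int), m=2)
    for name, q in (("along", [1.5, 0.0]), ("across", [1.0, 0.5])):
        print(name, isotropic_weights(q, protos)[1], anisotropic_weights(q, protos, covs)[1])
```

The isotropic kernel gives both queries the same weight, while the anisotropic kernel favors the query lying in the direction of the same-class neighbors, as in the right-hand panel of Figure 1.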
4. Results
We tested the application on the Weizmann horse database [11]. This database has large variations in the appearance, lighting, and pose of the horses, and variations in background appearance. The system was trained on 300 of the images and tested on the remaining 27 (see Figure 2). For each of the five layers, 500,000 prototypes were collected at random from the training images. The system used 64 x 64 patches and created 256-dimensional prototype vectors. A sliding window was compared to the prototypes, and each pixel was assigned a value between white and black according to the ratio between the weighted kernel sums associated with horse-labeled and background-labeled prototypes. The output of each layer was used to assist the next, through a total of five layers. The system was able not only to detect the presence of horses, but also to label pixels accurately enough that the limbs of many of the horses are clearly defined in the output maps (Figure 2). Detection is made easier by the fact that each image contains only one horse and by the lack of partial occlusions. However, because the algorithm operates on local windows, multiple instances and partial occlusions have not been found to be very problematic for this system. In addition, the horses are all viewed from roughly the same angle. This means fewer prototypes are needed to learn the class than would otherwise be the case.
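The mapping from kernel sums to a gray value can be sketched as below. The paper does not give the exact formula or say which class is drawn as white, so the normalized ratio horse / (horse + background) and the convention that brighter means more horse-like are assumptions, as is the function name detection_map.

```python
import numpy as np

def detection_map(weights, is_horse, eps=1e-12):
    """Turn per-pixel neighbour weights into an 8-bit detection map.

    weights:  (H, W, k) kernel weights of the k nearest prototypes per pixel
    is_horse: (H, W, k) boolean, True where the prototype is labeled horse
    """
    horse = (weights * is_horse).sum(axis=-1)
    background = (weights * ~is_horse).sum(axis=-1)
    score = horse / (horse + background + eps)    # 0 = background, 1 = horse
    return np.round(255 * score).astype(np.uint8)
```

A pixel whose neighbors are all horse prototypes maps to 255 (white), one whose neighbors are all background maps to 0 (black), and mixed neighborhoods fall in between.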
Figure 2. Test set. Images (left) and corresponding detection maps (right).
5. Conclusion and Future Directions
This seems to be a promising approach to forming rough segmentations of the classes of objects in a scene prior to fixation and segmentation. We have begun experiments on including stereo and motion information, to learn to recognize 3D objects and motions as well as image classes.
An advantage of this system is that it requires no more resources to learn many classes from a set of training images than it does to learn just two from the same set. Even classes not explicitly specified, such as head or limb detectors in the case of the horse database, are implicitly recognized as being visually and semantically similar. Labeling a single horse leg, for example, could bring up a cluster of similar horse legs because all would activate the same prototypes. The system is thus learning something about every class in the training images, even when it doesn’t have a name for the groups it recognizes as similar. In this way it could combine supervised with unsupervised learning.
One other interesting possibility is to replace the mapping to discrete labels with a mapping into some kind of semantic space. Objects recognized as being semantically associated would be able to influence the classification of nearby objects in the scene (the presence of a spoon and plate might help to resolve an ambiguous detection as a cup).
References
[1] W. Einhäuser, M. Spain, and P. Perona, Objects predict fixations better than early saliency. Journal of Vision, 8(14):18, 1–26, 2008.
[2] JJ Foxe, GV Simpson. Flow of Activation from V1 to frontal cortex in humans. Experimental Brain Research, 2002.
[3] J Mutch, DG Lowe. Multiclass object recognition with sparse, localized features. CVPR 2006.
[4] T. Serre, L. Wolf, T. Poggio. Object recognition with features inspired by visual cortex. CVPR 2005.
[5] EK Miller, CA Erickson, R Desimone. Neural mechanisms of visual working memory in prefrontal cortex of the macaque. Journal of Neuroscience, 1996.
[6] R Desimone, TD Albright, CG Gross. Stimulus selective properties of inferior temporal neurons in the macaque. Journal of Neuroscience, Vol 4, 1984.
[7] M Riesenhuber, T Poggio. Hierarchical models of object recognition in cortex. Nature Neuro. 2, 1999.
[8] M Riesenhuber, T Poggio. Hierarchical models of object recognition in cortex. Nature Neuro. 2, 1999.
[9] Tu, Zhuowen. Auto-context and Its Application to High-level Vision Tasks. Proc. of IEEE Computer Vision and Pattern Recognition (CVPR), 2008.
[10] Thomas Brox, Bodo Rosenhahn, Daniel Cremers and Hans-Peter Seidel. Nonparametric Density Estimation with Adaptive, Anisotropic Kernels for Human Motion Tracking. Lecture Notes in Computer Science, 2007.
[11] Weizmann horse database can be found at http://www.msri.org/people/members/eranb/
[12] R. Haralick, K. Shanmugam, and I. Dinstein. Texture Features for Image Classification. IEEE Transactions on Systems, Man, and Cybernetics, 3(6), 1973.
[13] H. Seo, and P. Milanfar. Training-free, Generic Object Detection using Locally Adaptive Regression Kernels. IEEE Trans. on Pattern Analysis and Machine Intelligence, June 2009.
[14] Wu, B., & Nevatia, R. Detection and Segmentation of Multiple, Partially Occluded Objects by Grouping, Merging, Assigning Part Detection Responses. Int. J. Comput. Vis. (2009) 82: 185–204.
[15] L. Zhao and L. S. Davis. Closely Coupled Object Detection and Segmentation. ICCV, 2005.
[16] J. Winn and J. Shotton. The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. CVPR, 2006.
[17] Wu, B., & Nevatia, R. Detection and Segmentation of Multiple, Partially Occluded Objects by Grouping, Merging, Assigning Part Detection Responses. Int. J. Comput. Vis. (2009) 82: 185–204.
[12] A. Hollingworth and J.M. Henderson, Accurate visual memory for previously attended objects in natural scenes. J. of Experimental Psychology: Human Perception and Performance, 28: 113-136, 2002.
[13] A. Hollingworth, Constructing visual representations of natural scenes: The roles of short- and long-term visual memory. J. of Experimental Psychology: Human Perception and Performance, 30: 519-537, 2004.
[14] R.A. Rensink, The dynamic representation of scenes. Visual Cognition, 7:17-42, 2000.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-169
Disciple Cognitive Agents: Learning, Problem Solving Assistance, and Tutoring
Gheorghe TECUCI, Mihai BOICU, Dorin MARCU, David SCHUM
Learning Agents Center, George Mason University, 4400 University Drive MS 6B3, Fairfax, VA 22030, USA
{tecuci, mboicu, dmarcu, dschum}@gmu.edu, http://lac.gmu.edu
Abstract. Over the years we have researched a computational theory and technology that allows regular computer users who are not knowledge engineers to develop intelligent agents that incorporate their problem solving expertise [1-7]. This resulted in a series of increasingly powerful Disciple cognitive agents that integrate several complementary capabilities. They are able to learn subject matter expertise directly from their users; such expertise currently takes years to establish, is lost when experts separate from service, and is costly to replace. They can assist their users in solving complex problems in uncertain and dynamic environments, and they can tutor students in expert problem solving. Disciple agents have been developed for a wide variety of domains, including manufacturing [1], education [2], course of action critiquing [3], center of gravity determination [4, 5], and intelligence analysis [6]. The most recent Disciple agents incorporate a significant amount of generic knowledge from the Science of Evidence, allowing them to teach and help their users in discovering and evaluating evidence and hypotheses, through the development of Wigmorean probabilistic inference networks that link evidence to hypotheses in argumentation structures that establish the relevance, believability and inferential force of evidence [7].
References
[1] Tecuci G., Disciple: A Theory, Methodology and System for Learning Expert Knowledge, Thèse de Docteur en Science, University of Paris-South, 1988.
[2] Tecuci G., Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies, San Diego: Academic Press, 1998.
[3] Tecuci G., Boicu M., Bowman M., Marcu D., with a commentary by Burke M., An Innovative Application from the DARPA Knowledge Bases Programs: Rapid Development of a Course of Action Critiquer, AI Magazine, 22, 2, pp. 43-61, 2001.
[4] Tecuci G., Boicu M., Boicu C., Marcu D., Stanescu B., Barbulescu M., The Disciple-RKF Learning and Reasoning Agent, Computational Intelligence, 21, 4, pp. 462-479, 2005.
[5] Tecuci G., Boicu M., and Comello J., Agent-Assisted Center of Gravity Analysis, CD with Disciple-COG and Lecture Notes used in courses at the US Army War College and Air War College, GMU Press, 2008.
[6] Tecuci G., Boicu M., Marcu D., Boicu C., Barbulescu M., Disciple-LTA: Learning, Tutoring and Analytic Assistance, Journal of Intelligence Community Research and Development, 2008.
[7] Tecuci G., Schum D.A., Boicu M., Marcu D., Hamilton B., Intelligence Analysis as Agent-Assisted Discovery of Evidence, Hypotheses and Arguments. In: Phillips-Wren, G., Jain, L.C., Nakamatsu, K., Howlett, R.J. (eds.) Advances in Intelligent Decision Technologies, SIST 4, pp. 1-10. Springer-Verlag, Berlin Heidelberg, 2010.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-170
Attention Focusing Model for Nexting Based on Learning and Reasoning
Akshay Vashist1 and Shoshana Loeb
Telcordia Technologies, One Telcordia Dr., Piscataway, NJ 08854
Email: {vashist, shoshi}@research.telcordia.com
Abstract. Intelligence can be argued to result from a complex interplay of learning, reasoning, memory, and other higher-level functions; however, the survivability of biological systems ultimately depends on timely actions, which are a manifestation of these higher-level processes. For cognitive systems to succeed, anticipating what will come next (i.e., being “one step ahead”) and deriving when to act and how to prioritize actions is as critical as, and most likely integrated with, the higher-level sense-making processes. In this paper, we propose a biologically motivated model of execution control, called Nexting, which is founded on the current understanding of such processes in biological systems.
Keywords: Attention Focus, Executive Control, Nexting, Inference
Introduction
Arriving at timely decisions is critical to the survival of biological systems, and this necessitates limiting higher cognitive processing to relevant inputs. In biological systems this functionality is controlled by an attention-focusing mechanism that directs attention by constructing expected future events. Arguably, this functionality is most developed in humans, and it is said that “the greatest achievement of the human brain is the ability to imagine objects and episodes that do not exist in the realm of the real, and it is this ability that allows us to think about the future” [1]. The human brain is an “anticipation machine”, and the function of predicting or “making future” is perceived as the most important thing it does [2]. Motivated by this, mechanisms that use some aspects of future expectation and surprise as a trigger for learning have been incorporated in AI and robotics [3].
There are at least two ways in which brains might be said to anticipate the future. The first, which is shared across higher animals, allows a seamless and uninterrupted processing of information streams by the brain. It entails the prediction of the immediate next event or signal that the brain expects to see based on inputs from the present and the immediate past. This mechanism of creating and expecting the future remains unnoticed until it fails, in which case we are surprised, e.g., on finding a tiger in a city street. This way of anticipating or making the future is denoted as “nexting”: immediate prediction or anticipation [1]. The second way of anticipating the future is unique to humans and involves the ability to imagine an experience without any direct stream of information from the environment, e.g., imagining a reaction to seeing a tiger in the street.
1 Corresponding Author.
This position paper discusses the role of nexting in focusing a system’s attention. We surmise cognitive processes that enable nexting in biological systems and propose a cognitive architecture base. Finally, we pose several open issues for future research.
1. Related Work
There exists a vast body of work that has studied as well as modeled the mechanism of nexting, the automated near-term, localized anticipation of events. In particular, mechanisms based on “surprise” or expectation failure have been utilized to focus the learning mechanism. This has been accomplished in several ways, ranging from relying on generalized relationships between concepts in the knowledge domain to utilizing specific knowledge of experienced and concrete problem situations. The generalized knowledge could be structured in knowledge organization units such as scripts, frames, maps or schemas (e.g., discussion in [4], [5]). The specific experiential information can be structured as cases (e.g., discussion in [5], [6]) or even narratives [7]. In his book “Tell Me a Story”, Schank discusses the use of narratives as a normal part of intelligence. Being able to find “without looking for it” a story that will help one know what to do in a new situation is key to focusing on the relevant aspects of the situation and hence to predicting what will come next. Schank adds, “It is an exceptional aspect of intelligence to be able to find stories that are superficially not so obviously connected to the current situation”. This entails labeling or indexing a story in a complex fashion prior to storage so that it will be available in a variety of ways in the future. In this view, higher intelligence depends upon complex perception and labeling. Consequently, knowing what to store about a story and what to “forget” becomes a critical part of the process.
This knowledge, whether general or specific, is then used as a source for the processing of the input stream by generating the expectation for the next item and comparing it to the actual input. This comparison or matching does not necessarily have to be exact and, more importantly, it helps with the processing of incomplete or ambiguous information. The process of nexting is seamless, fast and coherent. Unless interrupted by a failed prediction or unmet expectation, it proceeds without surprises in interpreting the stream of information.
Nexting shares some of its functionality with automated planning and scheduling, which, like the second kind of anticipation described in the introduction, is a deliberate visualization of future scenarios and has been widely studied in AI [8]. The overlap and differences between the two lie primarily in the time scale of action and the amount of computation. Planning is usually defined as finding a sequence of actions from a given set of actions, and it is often formulated as a computationally expensive offline process. (Computationally, planning and scheduling problems are usually formulated as combinatorial optimization, satisfiability, dynamic programming, POMDPs, or a mix thereof, all of which are computationally expensive and offline processes.) Nexting, on the other hand, is an online process which is guided by attention focus and by the expectation of future external inputs or imagination, and is therefore not entirely goal driven, as is the case with planning. Informally, planning is more associated with scheduling, whereas nexting is more associated with execution control. Moreover, planning is a response to a particular goal, but nexting always follows the same attention focus mechanism. From a search perspective in AI, nexting is closely associated with exploration, whereas planning is related to exploitation when enough knowledge about the environment has been gathered. We believe a solution to nexting can be developed based on learning, reasoning, memory and attention focus mechanisms.
Some aspects of nexting have been studied in neuroscience and cognitive science as models for attention focusing and attention shifting. In particular, there is a wide variety of visual attention models, most of which are bottom-up models focusing on saliencies in visual content [9]. Such saliencies usually describe spatiotemporal gradients, e.g., intensities, sharp edges, and movements in visual content [10, 11]. Visual attention models are driven by external stimuli [12] without considering the context or state of the observer, and the purpose of attention shifting in these models is to maximize information gathering in order to reduce uncertainty [13, 14]. In contrast, the nexting mechanism we propose may be driven without any external stimulus, can be volitional (during imagination), and incorporates the state of the observer as a key component. Furthermore, we propose nexting as a universally applicable cognitive process which can be applied to any type of stimulus and may use visual attention focus and shifting models when dealing with visual input. Thus nexting subsumes visual attention and attention shifting models.
2. Attention Focus and Nexting
In this position paper we view nexting as a manifestation of the interaction between learning, reasoning, memory and attention focus mechanisms, along the lines of [4]. Furthermore, we view nexting as being controlled by the attention focus mechanism, which can operate in two different modes: future imagination and execution control for actions in the real world. In either case, it is an inference process. In the future imagination mode it is a slower and more conscious reasoning process, while in the execution control mode it is a combination of the two processes. From an exploration perspective, nexting prioritizes information gathering. During execution control the same process plays the role of reducing uncertainty in the global plan, either by confirming expectations or by triggering reasoning when they fail.
Nexting is a form of inference based either on knowledge gathered by learning from past experience or on knowledge generated by reasoning about the current situation. The dichotomy between the two is demonstrated during the processing of information and expectation failure. When expectations are met, the inference is likely to be based on learning (conditioning) from past experience, which is usually a fast process. When expectations fail, the system is guided by reasoning on gathered knowledge, which is usually a slower process. Furthermore, the information gathered from nexting and expectation failure can also trigger learning of new concepts. This usually happens when a critical number of expectation failures accumulate to enable generalization into new patterns or concepts.
3. Our Architecture and Engine for Nexting
As discussed above, nexting is realized by the attention focus module. Depending on the situation, nexting can be viewed as predicting the immediate action based on past experience (learning) or on reasoning³ over the active knowledge. The learning and reasoning modules shape the formation of inference in nexting and, in a role reversal, the inputs that modify learning and reasoning are obtained from the attention focus module via nexting. In other words, whenever an expectation mismatch or a surprise occurs, the cases are reasoned to be either special instances of existing concepts or instances of potential new concepts, and they become inputs to the learning of new concepts.
Figure 1. Logical architecture describing functional relationships between components involved in nexting.
Nexting emerges from an interaction between learning, reasoning, inference, memory, expectation, and attention focus mechanisms (see Figure 1). It closely interacts with expectation generation and expectation matching. When conceptual representations of perceptual inputs (from multiple sources) match or nearly match the expectation, nexting continues in a default mode where it can be thought of as inference based on learning. When perceptual input does not match the expectation or cannot be transformed into known concepts, reasoning is invoked to reconcile the current input with the recent historical inputs. If a consistent reconciliation is found, the expectation is modified; otherwise the conceptual anomaly is recorded in memory, and when enough such anomalies accumulate, learning attempts to generalize them to modify existing concepts or to develop new concepts. The modifications or the new concepts are then added as new inference rules and also act as a knowledge base for reasoning.
Nexting appears to operate similarly in the virtual world of imagination. In this mode, however, there is no input from the perceptual world, so the inference earlier used to convert inputs to concepts is now used to constrain the imagination to make it more realistic. This mode of nexting does not have to deal with uncertainty in the environment and so is dominated by reasoning, which better handles complexity, rather than by learning, which is best suited to tackling uncertainty in perceptual inputs.
³ Learning and reasoning are themselves interconnected concepts. We distinguish them based on the direction of processing inputs and the speed of inference. Inference based on learning usually processes inputs in a single direction to derive the conclusion, whereas reasoning processes inputs and knowledge in multiple passes back and forth to reach a conclusion. Thus inference based on learning is faster and is dominant in processing perceptual inputs. On the other hand, inference based on reasoning is slow and is invoked mostly when expectation fails; it is a dominant process in realizing the higher-level sensemaking capability which is necessary to guide attention during expectation failure and to generate a new set or sequence of expectations.
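The matching, reconciliation and anomaly-accumulation steps described above can be sketched as a small control loop. This is only an illustrative toy under stated assumptions, not the authors' engine: the class name `NextingLoop`, the dictionary of learned expectations, the history-lookup stand-in for reasoning, and the anomaly threshold of 3 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NextingLoop:
    """Toy nexting controller (hypothetical): fast learned inference with a
    slower reasoning fallback when an expectation fails."""
    learned: dict = field(default_factory=dict)    # concept -> expected next concept
    anomalies: list = field(default_factory=list)  # unreconciled (previous, observed) pairs
    threshold: int = 3                             # assumed number of anomalies needed to generalize
    mode_log: list = field(default_factory=list)   # which process handled each step

    def reconcile(self, previous, observed, history):
        # Stand-in for reasoning: the observation is consistent if the same
        # transition already occurred in the recent history.
        return (previous, observed) in zip(history, history[1:])

    def step(self, previous, observed, history):
        if self.learned.get(previous) == observed:
            self.mode_log.append("learning")       # default mode: expectation met
            return
        self.mode_log.append("reasoning")          # expectation failed
        if self.reconcile(previous, observed, history):
            self.learned[previous] = observed      # modify the expectation
            return
        pair = (previous, observed)
        self.anomalies.append(pair)                # record the conceptual anomaly
        if self.anomalies.count(pair) >= self.threshold:
            self.learned[previous] = observed      # generalize into a new rule
            self.anomalies = [a for a in self.anomalies if a != pair]
```

For example, starting from `NextingLoop(learned={"a": "b"})`, repeatedly observing `"c"` after `"a"` with an empty history is handled by reasoning until the third anomaly, after which the new expectation is learned and subsequent steps return to the default mode.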
4. Discussion and Open Questions
We focused on nexting, a hallmark function of biological cognitive systems that is critical to building pragmatic artificial cognitive systems. We have outlined a mechanism for nexting that relies on anticipating perceptual inputs by predicting the likely future when it operates in the real world, but relies predominantly on constraints grounded in knowledge when it deals with the virtual world of imagination. However, several open questions remain, including the calibration of the process that matches anticipated inputs to actual inputs, as reality hardly ever repeats itself. Further, emotional states likely bias anticipation, and computational models that enable this function need to be researched. Although learning, reasoning, inference, and anticipation are often construed as separate functions, a deeper look suggests that they are very similar cognitive processes, perhaps representing different phases of the same underlying cognitive process. These aspects of cognition have been researched separately, but further research is needed to understand their relationships and to develop theories that unify these mechanisms and allow sustained long-term progress in emulating biological cognitive architectures. Unless there is wide-scale research into all aspects, computational systems are unlikely to be able to imitate intelligence, which emerges from interactions between all components.
References
[1] D. Gilbert, Stumbling on Happiness, Vintage Books, 2006.
[2] D. Dennett, Kinds of Minds, Basic Books, New York, 1996.
[3] S.W. Liddle and D.W. Embley, A Common Core for Active Conceptual Modeling for Learning from Surprises. In: P.P. Chen and L.Y. Wong (Eds.), Active Conceptual Modeling of Learning: Next Generation Learning Based System Development, Springer, 2006, pp. 47-56.
[4] R. Schank, Dynamic Memory: A Theory of Reminding and Learning in Computers and People, Cambridge University Press, 1982.
[5] C. Riesbeck and R. Schank, Inside Case Based Reasoning, Lawrence Erlbaum, 1989.
[6] A. Aamodt and E. Plaza, Case Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches, AI Communications 7 (1994), 39-59.
[7] R. Schank, Tell Me a Story: Narrative and Intelligence, Northwestern University Press, Evanston, Illinois, 1990.
[8] S. Biundo and M. Fox (Eds.), Recent Advances in AI Planning, Proc. 5th European Conference on Planning, Springer, 1999.
[9] L. Itti, Models of Bottom-Up Attention and Saliency. In: L. Itti, G. Rees and J.K. Tsotsos (Eds.), Neurobiology of Attention, Elsevier, San Diego, CA, 2005, pp. 576-582.
[10] E. Knudsen, Fundamental components of attention, Annual Review of Neuroscience 30 (2007), 57-78.
[11] C. Eriksen and J. St James, Visual attention within and around the field of focal attention: A zoom lens model, Perception & Psychophysics 40(4) (1986), 225-240.
[12] R. Wright and L. Ward, Orienting of Attention, Oxford University Press.
[13] Y. Cao and L. Zhang, A novel hierarchical model of attention: maximizing information acquisition, ACCV 2009, LNCS 5994, 224-233.
[14] A. Treisman and G. Gelade, A feature-integration theory of attention, Cognitive Psychology 12 (1980), 97-136.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-175
A Neurologically Plausible Artificial Neural Network Computational Architecture of Episodic Memory and Recall
Craig M. VINEYARDᵃ,ᵇ,¹, Michael L. BERNARDᵃ, Shawn E. TAYLORᵃ, Thomas P. CAUDELLᵇ, Patrick WATSONᶜ, Stephen VERZIᵃ, Neal J. COHENᶜ and Howard EICHENBAUMᵈ
ᵃ Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87185-1188
ᵇ University of New Mexico
ᶜ University of Illinois at Urbana-Champaign, Beckman Institute
ᵈ Boston University
Abstract. Episodic memory is supported by the relational memory functions of the hippocampus. Building upon extensive neuroscience research on hippocampal processing, neural density, and connectivity we have implemented a computational architecture using variants of adaptive resonance theory artificial neural networks. Consequently, this model is capable of encoding, storing and processing multi-modal sensory inputs as well as simulating qualitative memory phenomena such as autoassociation and recall. The performance of the model is compared with human subject performance. Thus, in this paper we present a neurologically plausible artificial neural network computational architecture of episodic memory and recall modeled after cortical-hippocampal structure and function.
Keywords. Artificial neural network, hippocampus, computational model
Introduction
Without a contextual indicator, the term memory may refer to computer hardware, an encyclopedic compilation of facts, or a personal remembrance of a plethora of experiences. Rather than designing an architecture with an extremely large storehouse of information, we have focused upon the ability to form episodic memories of one’s personal history of previous actions and their outcomes. One brain area, the hippocampus, is critically involved in information processing that is fundamental to the subjective experience of recollection. Thus the hippocampus is critical to remembering the spatial and temporal context of a recalled event, to the ability to mentally replay the sequence of events that compose an entire episode, and to remembering additional related events. Importantly, the hippocampus serves its functions as part of a large and highly interconnected brain system that includes widespread areas of the cerebral cortex. Within this brain system, there is a general consensus that areas of the cerebral cortex are specialized for distinct aspects of cognitive and perceptual processing that are essential to memory, and that the cortex is the repository of detailed representations of perceptions and thoughts [7]. The medial temporal lobe (MTL), where the hippocampus is located, is the recipient of inputs from widespread areas of the cortex and supports the ability to bind together cortical representations such that, when cued by part of a previous representation, the MTL reactivates the full set of cortical representations that compose a retrospective memory. This simple, anatomically based scheme provides the framework on which our model is built. In the following sections, we will describe in greater detail the functional components of this system, the pathways by which information flows among them, and a qualitative model of how they interact.
¹ Corresponding Author: [email protected]
1. Cortex
The nature of cortical inputs to the MTL differs considerably across mammalian species [10]. The proportion of inputs derived from different sensory modalities also varies substantially between species, such that olfaction (e.g., rats), vision (e.g., primates), audition (e.g., bats), or somatosensation (e.g., moles) have become disproportionately represented in the brain in different animals [1]. Nevertheless, the sources of information derived from prefrontal and midline cortical areas, as well as posterior sensory areas, are remarkably consistent across species. Across species, most of the neocortical input to the perirhinal cortex comes from association areas that process unimodal sensory information about qualities of objects (i.e., “what” information), whereas most of the neocortical input to the parahippocampal cortex comes from areas that process polymodal spatial (“where”) information [13][1]. There are connections between the perirhinal cortex and parahippocampal cortex, but the “what” and “where” streams of processing remain largely segregated as the perirhinal cortex projects primarily to the lateral entorhinal area whereas the parahippocampal cortex projects mainly to the medial entorhinal area. Similarly, there are some connections between the entorhinal areas, but the “what” and “where” information streams mainly converge within the hippocampus. The cortical outputs of hippocampal processing involve feedback connections from the hippocampus successively back to the entorhinal cortex, then perirhinal and parahippocampal cortex, and finally, neocortical areas from which the inputs to the MTL originated.
Rather than modeling entire sensory modality streams, in this architecture we focused on higher-level visual processing. The model starts at the point where the visual input images have been separated into two components.
The first sub-image corresponds to the area seen by the focal area of the eye, and the second sub-image contains the entire field-of-view for the eye. This division models the higher resolution present in the fovea, as well as the way focus and context information are treated separately through some parts of biological cortex. As an example, at the bottom of Figure 1 this division of visual processing directs the sub-image of a blue sphere towards the ventral stream, whereas the contextual information encoding the sphere's position as upper right propagates through the dorsal stream.
The representation of cortex within our model is comprised of levels of fuzzy Adaptive Resonance Theory (ART) networks [2]. ART is a well-established self-organizing neural technique for classifying input activations. Each ART module within a particular level receives only a segment of the underlying input field, with a subsection overlap inspired by biological cortex [7]. While a single ART module is capable of performing categorizations on its own, a leveled approach results in higher-level ART modules creating categories of the categories created by the lower-level ART modules. This category-of-categories approach enables the higher-level representations to encode more abstract concepts. For example, if the lower-level modules were representing apples, oranges, and broccoli, the higher-level module could create categories to represent fruits and vegetables. In our architecture, the two lower levels of ART modules represent cortex while the third layer portrays entorhinal cortex (EC).
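To make the categorization step concrete, the following is a minimal sketch of a single fuzzy ART module, using the commonly stated formulation with complement coding, a choice function, a vigilance test and a learning rule. It is an illustration only, not the authors' implementation: the class name, the parameter values (`vigilance`, `alpha`, `beta`) and the use of plain Python lists are assumptions, and the multi-level, overlapping-segment arrangement described above is not shown.

```python
def complement_code(x):
    """Append 1 - x_i for every component so that |I| is constant."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum (fuzzy AND)."""
    return [min(u, v) for u, v in zip(a, b)]


class FuzzyART:
    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        # Hypothetical defaults; beta = 1.0 corresponds to fast learning.
        self.rho, self.alpha, self.beta = vigilance, alpha, beta
        self.weights = []  # one template per committed category

    def train(self, x):
        """Present one input in [0, 1]^n and return its category index."""
        i = complement_code(x)
        norm_i = sum(i)

        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), best first.
        def choice(j):
            w = self.weights[j]
            return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

        for j in sorted(range(len(self.weights)), key=choice, reverse=True):
            overlap = fuzzy_and(i, self.weights[j])
            # Vigilance test: accept only if |I ^ w_j| / |I| >= rho.
            if sum(overlap) / norm_i >= self.rho:
                self.weights[j] = [self.beta * o + (1 - self.beta) * w
                                   for o, w in zip(overlap, self.weights[j])]
                return j
        # No existing category resonates: commit a new one.
        self.weights.append(i)
        return len(self.weights) - 1
```

With a vigilance of 0.8, two nearby inputs such as `[0.1, 0.1]` and `[0.12, 0.1]` fall into the same category, while `[0.9, 0.9]` commits a new one. A higher level could then be trained on the category indices (suitably encoded) produced by lower-level modules, which is one way to read the category-of-categories idea.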
Figure 1: Neurocomputational Architecture
2. Hippocampus
Similar to the cortex, our hippocampal representation is comprised of several ART variations. However, as previously alluded to, the hippocampus performs a distinctly different function than the cortex. While the cortex attempts to represent the conceptual structure of its inputs, the hippocampus attempts to quickly bind snapshots of high-level cortical activity, where concepts originating in multimodal sensory input are bound together. The interaction of the hippocampus with MTL cortices is critical for binding of items and contexts [5]. As a result of this binding process, the hippocampal representation can also be used to recover neocortical representations from partial activations [3][4]. Inspired by extensive neuroscience research, the hippocampus representation in our model is a loop of neural modules starting at EC, proceeding to dentate gyrus (DG), continuing to CA3, and then returning to entorhinal cortex through a conjoined CA1 and subiculum representation. These subregions are addressed individually as follows.
3. Dentate Gyrus
Anatomically, DG consists of a large number of neurons with relatively sparse neural activation at a given instant. As a result of this phenomenon, it has been suggested that the DG creates sparse, non-overlapping codes for unique events via pattern separation [9]. The DG receives the conjoined multimodal sensory signals from EC. It performs pattern separation on this abundance of sensory information to produce sparse output activation, which ensures different semantic concepts are given unique encodings [12].
Computationally, in our model, DG is represented as a series of winner-take-all (WTA) fuzzy-ART modules. A WTA module is a competitive network in which a single category beats out all other categories to represent the input vector. As a result, a sparse encoding is created, as each of the k WTA modules yields a single output. While each of the WTA modules may consist of numerous neural populations, only a single representative output is active at a given time, as is observed in biological neural subsystems. Similar input vectors will be represented by the same single winning output, and dissimilar inputs will be represented by a differing winning output, yielding pattern-separated outputs. These outputs serve as the input for CA3.
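The sparse code produced by k winner-take-all modules can be illustrated with a short sketch. It is a simplification under stated assumptions: each module is reduced to a fixed list of prototype vectors rather than a learning fuzzy-ART module, the winner is chosen by the largest fuzzy-AND overlap, and the function names `winner` and `dg_encode` are hypothetical.

```python
def winner(prototypes, x):
    """Index of the prototype with the largest fuzzy-AND overlap with x."""
    scores = [sum(min(p_i, x_i) for p_i, x_i in zip(p, x)) for p in prototypes]
    return scores.index(max(scores))


def dg_encode(modules, x):
    """Concatenate one one-hot block per WTA module into a sparse code.

    `modules` is a list of k prototype lists; exactly k entries of the
    returned code are 1, one per module.
    """
    code = []
    for prototypes in modules:
        block = [0] * len(prototypes)
        block[winner(prototypes, x)] = 1  # only the winning unit is active
        code.extend(block)
    return code
```

With two modules holding two and three prototypes, every input is mapped to a five-element code with exactly two active units; inputs that change a module's winner yield a different code, which is the pattern-separation behavior described above.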
4. CA3
While CA1 and CA3 both contain pyramidal cells like the cortex, the existence of extensive recurrent connections in CA3 and the presence of inhibitory and excitatory interneurons have led some investigators to suggest that CA3 may be involved in pattern completion [11]. Functionally, CA3 assists with episodic binding through auto-association, by which similar concepts are grouped together.
In our implementation, the sparse output pattern from DG serves as input to CA3. The functionality of grouping similar concepts together is implemented as a Self-Organizing Map (SOM) structure. As the name implies, a SOM requires no guidance while grouping similar entities, and additionally maps related entities to distinct topological regions [6]. As opposed to using an existing SOM, such as a Kohonen map, we opted to utilize the desirable properties of ART modules while incorporating the neighborhood updating capabilities of a SOM within a fuzzy-ART module. Consequently, to embody CA3, we have created a SOMART module. Whereas a normal ART module updates the matching template only for the single winning output node, SOMART updates the winner and also the neighboring templates, with the magnitude of the update falling off with distance from the winning node. Thus the learning algorithm creates regions of activation topologically mapped from the semantics of the input space as interpreted by the ART matching algorithm. In effect, related concepts are clustered together to help associate episodic memories, and these "islands" of relational bindings form the inputs to CA1.
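The neighborhood update can be sketched as follows. The text does not specify the map topology or the fall-off function, so this sketch assumes a one-dimensional map and a Gaussian fall-off with a hypothetical width `sigma`; the function name `somart_update` is likewise an assumption. Each template moves toward the fuzzy AND of the input and itself, scaled by its distance from the winner.

```python
import math


def somart_update(templates, winner, x, beta=1.0, sigma=1.0):
    """Return updated templates for a 1-D map (assumed topology).

    The winner receives the full fuzzy-ART update; neighbors receive the
    same update scaled by a Gaussian of their distance to the winner.
    """
    updated = []
    for j, w in enumerate(templates):
        h = math.exp(-((j - winner) ** 2) / (2 * sigma ** 2))  # 1.0 at the winner
        updated.append([w_i + beta * h * (min(x_i, w_i) - w_i)
                        for x_i, w_i in zip(x, w)])
    return updated
```

Starting from three identical templates `[1.0, 1.0]` and input `[0.5, 0.5]` with the middle node winning, the winner moves all the way to `[0.5, 0.5]` while its two neighbors move only part of the way, by equal amounts.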
5. CA1
Anatomically, the output of CA3 proceeds to CA1 and then to the subiculum as the major output region of the hippocampus. Although it is known that the subiculum serves as a major output point completing the hippocampal loop, it is not well known what processing occurs within the subiculum. Consequently, as the conjoined endpoint of our hippocampal loop, we have merged the representation of CA1 and subiculum in our model. CA1 has been implicated in learning relational information for temporal sequences and connecting episodic encodings back to the original sensory inputs from EC. This ability to link sequences allows for temporal packaging of episodes.
In our model, CA1 is comprised of a unit which temporally integrates CA3 outputs using a set of leaky integrators, as well as a Laterally Primed Adaptive Resonance Theory (LAPART) module. The temporal integrator provides a gradient of input conjunctions from CA3, in which the oldest bindings have the smallest signal and the most recent bindings have the largest. LAPART is a semi-supervised neural network consisting of two ART modules. Given two inputs, one of the ART modules tries to predict the input pattern the other, lateral module receives. This predictive mapping capability implements the desired CA1/subiculum functionality in the sense that it completes the hippocampal loop, mapping CA3 encodings back to the original EC representation.
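The recency gradient produced by the leaky integrators can be illustrated with a few lines of code. This is a sketch under assumptions not given in the text: one integrator per CA3 output unit, a hypothetical multiplicative decay of 0.5 per time step, and resetting the currently active unit to 1.0. The LAPART module is not shown.

```python
def integrate(sequence, n_units, decay=0.5):
    """Leaky temporal integration of a sequence of active unit indices.

    At each step every trace decays by `decay` (assumed value) and the
    currently active unit is set to 1.0, so older items end up weaker.
    """
    trace = [0.0] * n_units
    for active in sequence:
        trace = [decay * t for t in trace]
        trace[active] = 1.0
    return trace
```

After the sequence of units 0, 1, 2 on a four-unit integrator, the trace orders the units by recency: the oldest binding (unit 0) has the smallest non-zero signal and the most recent (unit 2) has the largest.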
6. Performance Comparison
As a means of assessing the merit of this computational architecture, we have compared model performance with human subject performance. From a qualitative standpoint, we have examined regions of neural activity and activation streams in comparison with humans in familiarity and recollection tasks. Additionally, we have compared model performance with that of humans in auto-association tasks. Figure 2 depicts a series of model output representations showing activations of three face-house input pairings. The CA3 region is located in the upper left portion of each of the output representations. The top two examples correspond to two different faces with a common house. Consequently, their CA3 activations are very similar. The lower output example within Figure 2 depicts a different face-house pairing and, as expected, has a distinct CA3 encoding. For a more in-depth discussion of the quantitative approaches we have applied to assess our architecture, see [15][16].
Figure 2: Model Visualization
7. Conclusion
Anatomical evidence suggests the following hypothesis about how information is encoded and retrieved during memory processing. During encoding, representations of distinct items (e.g., people, objects, events) are formed in the perirhinal cortex and lateral entorhinal area. These representations, along with back projections to the “what” pathways of the neocortex, can then support subsequent judgments of familiarity. In addition, during encoding, item information is combined with contextual (“where”) representations that are formed in the parahippocampal cortex and medial entorhinal area, and the hippocampus associates items and their context. When an item is subsequently presented as a memory cue, the hippocampus completes the full pattern and mediates a recovery of the contextual representation in the parahippocampal cortex and medial entorhinal area. Hippocampal processing may also recover specific item associates of the cue and reactivate those representations in the perirhinal cortex and lateral entorhinal area. The recovery of context and item associations constitutes the experience of retrospective recollection.
Our model is based upon this understanding of the neuroanatomical structure and function of the hippocampus and nearby neural regions. Consequently, this model is capable of encoding, storing and processing multi-modal sensory inputs as well as simulating qualitative memory phenomena such as auto-association and recall.
References
[1] R.D. Burwell, M.P. Witter, & D.G. Amaral, Perirhinal and postrhinal cortices of the rat: a review of the neuroanatomical literature and comparison with findings from the monkey brain, Hippocampus, (1995), 390-408.
[2] G.A. Carpenter, S. Grossberg, & D.B. Rosen, Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system, Neural Networks, 4, (1991), 759-771.
[3] N. Cohen & H. Eichenbaum, Memory, Amnesia, and the Hippocampal System, MIT Press, Cambridge, MA, 1993.
[4] H. Eichenbaum & N. Cohen, From Conditioning to Conscious Recollection: Memory Systems of the Brain, Oxford University Press, New York, 2001.
[5] H. Eichenbaum, A.P. Yonelinas, & C. Ranganath, The Medial Temporal Lobe and Recognition Memory, Annual Review of Neuroscience, (2007), 123-152.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, Upper Saddle River, NJ, 1999.
[7] R. Kingsley, Concise Text of Neuroscience, Lippincott Williams & Wilkins, 2000.
[8] L. Krubitzer & J. Kaas, The evolution of the neocortex in mammals: how is phenotypic diversity generated?, Current Opinion in Neurobiology, 15, (2005), 444-453.
[9] Leutgeb, et al., Independent Codes for Spatial and Episodic Memory in Hippocampal Neuronal Ensembles, Science, 309, (2005).
[10] J.R. Manns & H. Eichenbaum, Evolution of the hippocampus, in: J.H. Kaas (Ed.), Evolution of Nervous Systems, Vol. 3, Academic Press, Oxford, (2006), pp. 465-490.
[11] R.C. O'Reilly & J.W. Rudy, Conjunctive representations, the hippocampus, and contextual fear conditioning, Cognitive, Affective, & Behavioral Neuroscience, (2001), 66-82.
[12] E.T. Rolls & R.P. Kesner, A computational theory of hippocampal function, and empirical tests of the theory, Progress in Neurobiology, 79, (2006), 1-48.
[13] W.A. Suzuki & D.G. Amaral, Perirhinal and parahippocampal cortices of the macaque monkey: cortical afferents, Journal of Comparative Neurology, 350, (1994), 497-533.
[14] L. Vila, A survey on temporal reasoning in artificial intelligence, AI Communications, 7, (1994), 4-28.
[15] C.M. Vineyard, S.E. Taylor, M.L. Bernard, S.J. Verzi, J.D. Morrow, P. Watson, H. Eichenbaum, M.J. Healy, T.P. Caudell, & N.J. Cohen, Episodic memory modeled by an integrated cortical-hippocampal neural architecture, Human Behavior-Computational Intelligence Modeling Conference, 2009.
[16] C.M. Vineyard, S.E. Taylor, M.L. Bernard, S.J. Verzi, T.P. Caudell, G.L. Heileman, & P. Watson, A Cortical-Hippocampal Neural Architecture for Episodic Memory with Information Theoretic Model Analysis, World Multi-Conference on Systemics, Cybernetics and Informatics, (2010), 281-285.
[17] S. Walczak, Artificial neural network medical decision support tool: Predicting transfusion requirements of ER patients, IEEE Transactions on Information Technology in Biomedicine, 9, (2005), 468-474.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-181
Validating a High Level Behavioral Representation Language (HERBAL): A Docking Study for ACT-R
Changkun Zhao¹, Jaehyon Paik², Jonathan H. Morgan¹, and Frank E. Ritter¹
¹ The College of Information Sciences and Technology
² The Department of Industrial and Manufacturing Engineering
The Pennsylvania State University, University Park, PA 16802
[email protected], [email protected], [email protected], [email protected]
ABSTRACT. We present a docking study for Herbal, a high-level behavioral representation language based on the problem space computational model. This study docks an ACT-R model created with Herbal to one created by hand. This comparison accomplishes several things. First, we believe such studies are necessary for achieving and demonstrating the theoretical rigor and repeatability promised by high-level representation languages. Second, it is necessary to evaluate the effectiveness and efficiency of high-level cognitive modeling languages if they are to make a significant impact in either the cognitive or social sciences. Third, this kind of study provides an opportunity to test Herbal's ability to produce ACT-R models from a GOMS-like representation that contains hierarchical methods, memory capacity, and control constructs. Finally, this study provides an example model for future validation work in this area. Our study addresses each of these points by docking Pirolli's [1] price finding model in ACT-R with the same model written in Herbal. We extended Herbal to support more memory types, and in the process may have extended the PSCM.
Keywords. Model validation, Docking, Herbal, and ACT-R
Introduction
This paper addresses the challenges associated with comparing and validating cognitive models across cognitive architectures. Cognitive models implemented in cognitive architectures have successfully modeled the effects of bounded rationality[2] on cognition[3-6] and, to some extent, on social interactions[7]. Achieving a domain of validity for cognitive models has, however, proven more difficult for several reasons. These reasons include the following: the inability to hold high-level abstractions constant across architectures while testing a specific model; ambiguity regarding the relationship between a model and unique aspects of its host architecture; and confounding variables introduced by architectural differences in generating models[8]. High-level cognitive languages such as Herbal (High-level Behavioral Representation Language)[9], HLSR (High-Level Symbolic Representation)[8], and Icarus[10] offer an approach for addressing these issues. Each has a representation structure that can serve as a baseline abstraction from which to compare models. These abstractions share a core set of commonalities found in most cognitive architectures, including declarative and procedural memory, memory retrieval mechanisms, goals, methods for responding to external events, and iterative decision-making[8]. Using high-level cognitive languages for model validation, however, requires first testing the ability of these languages to replicate the results of models developed in their supported architectures (Soar, ACT-R, and Jess in the case of Herbal).
This study is an initial docking study of Herbal’s ACT-R compiler[11, 21]. Herbal is an open source cognitive modeling language based on Newell et al.’s[6] Problem Space Computational Model (PSCM). It uses a modified version of the PSCM to represent a set of common cognitive modeling knowledge structures to create models in three cognitive architectures (Soar, ACT-R, and Jess). Users can develop models either by using a GUI editor or by editing Herbal’s XML code directly[12]. To test Herbal, we extend validation approaches developed for social modeling[13, 14] to the validation of cognitive models. Specifically, we test the equivalence of two versions of Pirolli’s[1] price finding model (one developed in ACT-R, the other in Herbal) using an alignment, or “docking”, study methodology.
1. Docking cognitive models
Cognitive Science has historically compared simulation results to human data to assess validity. While this method remains vital, it provides no clear criteria for establishing either model equivalence or subsumption within or across cognitive architectures. In addition, this method provides no means of isolating implementation effects from the premises of the theory; the theory and the implementation are, in essence, indistinguishable[8]. Docking studies conducted within a high-level cognitive architecture provide both criteria for establishing equivalence for models testing the same phenomena and a way of disambiguating implementation effects from the theory’s premises. Cooper, Fox, Farringdon, and Shallice[15] have suggested a similar direction for cognitive modeling.

Docking studies are commonly used in social modeling[16], systems engineering[17], and bioinformatics[18] to test model equivalence[13]. Model equivalence is evaluated by comparing the tested models’ output or results after processing identical inputs. Equivalence, in this approach, is further defined in one of three ways: numerical, statistical, or relational equivalence. These notions of equivalence differ in their strictness and are appropriate in different settings. The most rigorous of these tests, numerical equivalence, refers to comparing the models’ output to see if the numeric results are identical. Numerical equivalence is seldom used because it is inapplicable when testing stochastic models. When validating stochastic models, researchers generally test for statistical equivalence by comparing the models’ distributions over multiple runs. When, however, the models’ inputs or outputs differ, relational equivalence is assessed instead: for example, the degree to which the same internal relationships exist across levels of aggregation. Docking studies entail a process of translation that isolates the core premises of a model from the implementation effects associated with its host environment.
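The first two notions of equivalence described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names, the sample values, and the choice of a permutation test on the difference of means are ours and are not taken from the docking literature cited here; other two-sample tests could serve equally well for statistical equivalence.

```python
import random
from statistics import mean


def numerically_equivalent(outputs_a, outputs_b):
    """Strictest criterion: every reported output must match exactly."""
    return outputs_a == outputs_b


def permutation_p_value(sample_a, sample_b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    A large p-value means the runs give no evidence that the two models'
    output distributions differ in their means; it does not prove equality.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_resamples


# Hypothetical outputs of two deterministic models given identical inputs.
det_a = {"steps": 12, "answer": 7}
det_b = {"steps": 12, "answer": 7}
print(numerically_equivalent(det_a, det_b))

# Hypothetical per-run outputs of two stochastic models (e.g. task times).
runs_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2]
runs_b = [10.0, 10.3, 9.7, 10.1, 10.2, 9.9]
print(permutation_p_value(runs_a, runs_b))
```

Relational equivalence is not sketched here because it depends on which internal relationships a particular pair of models is expected to share.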
Achieving behavioral equivalence between models is, however, nontrivial. In addition to aligning the two models’ parameters, this process requires isolating the models’ core processes and ensuring those processes are consistent across both models. This procedure is commonly referred to as establishing component equivalence.

Establishing component equivalence across cognitive models is complicated by the layered nature of cognitive modeling. A cognitive model generally operates within a cognitive architecture to perform a specific task in a given environment. Each layer (the model, the architecture, the task, and the environment) complicates the validation process; however, the embedded nature of cognitive models poses a unique validation problem. Cognitive models implemented in cognitive architectures often represent not a single theory but a theory of theories, a layered representation specifying different but interrelated aspects of cognition across multiple levels of abstraction. A cognitive architecture is, in itself, a theory of cognition, while versions of that architecture can be viewed as elaborations or revisions of that theory. Furthermore, cognitive models developed in a given architecture can vary with respect to which architectural components they utilize, depending upon the model’s specified task.

In addition, a cognitive model’s output can be complex, making assessing equivalence difficult. Cognitive models frequently produce traces of activity that list the number, types, timing, and information content of the steps performed by the model in a given task. These steps generally rely on stochastic processes whose parameters are specified by the architecture but can be adjusted. For models capable of learning, these processes can vary retention rates, obscure instances where learning has occurred, and produce differences in the models’ learned behaviors despite performing the same steps in the same way. These factors make general predictions about other task sequences or other tasks challenging, but not impossible.
We expect these factors will also complicate validating complex cognitive models; however, these confounding variables are known[15] and can be controlled[19]. Also, in many cases, the comparisons and docking procedures do not lead to simple summative measures but formative measures giving rise to insights about the task, the cognitive architecture, and the human behavior.
2. Comparative study of ACT-R and Herbal Models
For this validation study of Herbal’s ACT-R compiler, we compare three versions of Pirolli’s[1] Price Finding Model (PFM): the original in ACT-R 5, one in ACT-R 6, and the third in Herbal 3.0.5. The PFM has two distinct advantages that made it suitable for this initial study. First, while the PFM is a simple model, we expected its sophisticated manipulation of declarative memory elements would test both Herbal’s ACT-R compiler and the ability of its ontology to represent a broader array of cognitive models. Second, while the PFM’s use of ACT-R’s declarative, procedural, and goal retrieval systems is consistent with more complex ACT-R models, the PFM does not utilize either ACT-R’s perceptual-motor components or its subsymbolic computations, simplifying the validation process.
2.1. Methodology

We tested for numerical equivalence between the three versions of the PFM by comparing three major outputs: the models’ total number of cycles, the number of subcycles, and the models’ best price. Because the original ACT-R 5 model was lost, and because the memory systems used by the PFM did not change between ACT-R 5 and 6, we compared a version of the PFM implemented in ACT-R 6 to a version implemented in Herbal. After collaborating with the author, we were able to re-implement the PFM in ACT-R 6 in approximately 6 hours, and confirmed numerical equivalence between the original and our re-implemented version (noted in Table 1) for model cycles and best price found. We expected numerical equivalence between the three models for two reasons: first, the inputs and means of interaction for all three models were identical; second, the PFM does not utilize the stochastic processes in ACT-R, namely its perceptual-motor or subsymbolic computations. We confirmed the PFM’s results were constant across both ACT-R versions by running the new model multiple times (n=10).

2.2. Models’ Description: Establishing component equivalence

Establishing component equivalence between the ACT-R versions of the PFM and its Herbal counterpart required changes to Herbal’s ACT-R compiler. Though Herbal’s ACT-R compiler has supported hierarchical task analyses[20, 21], supporting the PFM required developing a new retrieval function and changing the interface to enable users to designate chunks as either goal or retrieval chunks. We describe these changes and the steps taken to establish component equivalence below. We first describe the processes used by the PFM in ACT-R 6, and then compare them to those developed for Herbal’s ACT-R compiler.
2.2.1. Pirolli’s Price Finding Model: ACT-R 5 and ACT-R 6

Pirolli’s PFM consists of six productions: four productions for the general goal (start, first-link, next-link, and done) and two productions for the subgoal (minimum-price-stays-the-same and new-minimum-price). The model uses the general goal for finding and storing the best price and the subgoal for comparing new prices with the current best or minimum price. Of the two prices, the PFM selects the lower one.

The production start initializes the goal and retrieval buffers; the production first-link uses the first price accessed from declarative memory as the initial best price; the production next-link activates a subgoal to compare a new price with the best price; and the productions new-minimum-price and minimum-price-stays-the-same determine whether a new best price is selected. If the price selected is less than the best price, the subgoal cycle returns the new price as the new minimum price. Otherwise, it returns the current best price as the minimum price. Then, production next-link sets the returned minimum price as the best price in the general goal cycle.

The model uses a competitive iterative loop consisting of the productions next-link and done to drive the information foraging process. The PFM determines whether to fire the next-link production by evaluating two external functions (external in the sense that they are not intrinsic to ACT-R): an expected savings rate and a labor value. After calculating the price, the expected savings rate, and the labor value, the model fires the next-link production when there is an expected saving associated with searching for a new price, and fires the done production when the expected saving is less than the labor cost.

2.2.2. Pirolli’s Price Finding Model: Herbal 3.0.5
As noted previously, we achieved component equivalence in two ways: by modifying Herbal’s interface to enable users to specify whether a component will be used as a chunk-type in the retrieval buffer, and by adding a “retrieve” function as an action element, allowing the model to retrieve a particular value in the retrieval buffer. We created two Herbal types: a link type and a find type. The link type corresponds to the chunk-type link in the ACT-R versions of the PFM while the find type is a goal that replicates the chunk-types find and minimum.
Figure 1. Herbal's ontological representation of the PFM
Figure 1 shows Herbal’s ontological representation of the PFM. This representation contains five levels: agent, problem space, operators, conditions, and actions. The highest level is the user agent, which includes two problem spaces, find-best-price and find-minimum. The find-best-price problem space searches a list of prices and stores the best price. The find-minimum problem space compares the best price to a new price and returns the lower price. The two problem spaces consist of six operators that are created to match the six productions found in the ACT-R versions of the PFM. The start, first-link, next-link, and done operators are associated with the find-best-price problem space, while the price-stays-the-same and new-as-minimum operators are associated with the find-minimum problem space. Each operator contains one condition and one action to replicate the buffer testing process and the buffer managing process found in ACT-R.
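The five-level hierarchy described above can be pictured as nested data. The following Python sketch is only an illustration, not Herbal's XML format or its API: the class names, the agent name "pfm-agent", and the condition and action labels are hypothetical placeholders, while the problem space and operator names follow the description in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    name: str
    condition: str  # placeholder label, not Herbal syntax
    action: str     # placeholder label, not Herbal syntax


@dataclass
class ProblemSpace:
    name: str
    operators: List[Operator] = field(default_factory=list)


@dataclass
class Agent:
    name: str
    problem_spaces: List[ProblemSpace] = field(default_factory=list)


def make_operator(name):
    """Build an operator with exactly one condition and one action."""
    return Operator(name, f"{name}-condition", f"{name}-action")


pfm_agent = Agent(
    "pfm-agent",  # hypothetical agent name
    [
        ProblemSpace(
            "find-best-price",
            [make_operator(n) for n in ("start", "first-link", "next-link", "done")],
        ),
        ProblemSpace(
            "find-minimum",
            [make_operator(n) for n in ("price-stays-the-same", "new-as-minimum")],
        ),
    ],
)

# Six operators in total, mirroring the six ACT-R productions.
operator_count = sum(len(ps.operators) for ps in pfm_agent.problem_spaces)
print(operator_count)
```

Keeping one condition and one action per operator mirrors the pairing of buffer tests and buffer modifications in an ACT-R production, which is what makes a production-by-production comparison possible.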
2.3. Results

The ACT-R 6 version of the PFM used 30 declarative memory elements with a particular expected savings rate. Our ACT-R model examined 27 declarative memory elements before firing the done production. By the 27th memory element, the best price was $66, with an expected savings rate of $8 per hour. At this price, the expected savings rate was less than the assumed labor cost of $10 per hour, so the model fired the done production and terminated the whole process. The Herbal version of the PFM also used 30 declarative memory elements, examining 27 before firing the done production. By the 27th memory element, the best price was $66, with an expected savings rate of $8 per hour. Table 1 shows the detailed results.

Table 1. Experimental results for three tested variables.

Model                                 Total Cycles   Sub-Cycles   Best Price
Price Finding Model (ACT-R 5)         53             25           66
Price Finding Model (ACT-R 6)         53             25           66
Price Finding Model (Herbal 3.0.5)    53             25           66
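The stopping behavior behind these results (keep searching while the expected savings rate is at least the labor cost, otherwise fire done) can be sketched in Python. This is a minimal sketch under several assumptions, not the PFM itself: the price list and the `toy_rate` function are made up for illustration, since the external savings function is not given in the text, and the cycle accounting (one cycle each for start, first-link, and done, plus two per comparison) is our own reading of the production descriptions.

```python
def run_price_search(prices, expected_savings_rate, labor_cost_per_hour):
    """Examine prices in order and stop when further search is not worth it.

    Returns (best_price, total_cycles, sub_cycles).
    """
    total_cycles = 1              # 'start' initializes the goal
    best_price = prices[0]
    total_cycles += 1             # 'first-link' takes the first price as best
    sub_cycles = 0
    for new_price in prices[1:]:
        if expected_savings_rate(best_price) < labor_cost_per_hour:
            break                 # searching no longer pays off
        total_cycles += 1         # 'next-link' opens the comparison subgoal
        best_price = min(best_price, new_price)  # one of the two subgoal productions
        sub_cycles += 1
        total_cycles += 1         # the subgoal production that fired
    total_cycles += 1             # 'done' (this sketch also counts it if prices run out)
    return best_price, total_cycles, sub_cycles


def toy_rate(best_price):
    """Made-up savings rate in dollars per hour; not Pirolli's function."""
    return 0.5 * (best_price - 50)


toy_prices = [90, 80, 85, 70, 66, 60]   # made-up prices
result = run_price_search(toy_prices, toy_rate, labor_cost_per_hour=10)
print(result)
```

With these toy values the search stops at a best price of 66, where the made-up rate is 8 and falls below the labor cost of 10, so the last price is never examined. Under the assumed accounting, 25 sub-cycles would give 2 x 25 + 3 = 53 total cycles, which is arithmetically consistent with Table 1; the paper does not state how cycles were counted, so this is only a plausibility check.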
3. Discussion and conclusion
Our docking study began with recreating Pirolli’s[1] price finding model. The recreated model performed identically across two versions of ACT-R for the dimensions we tested. After reconstructing the PFM in ACT-R 6, we created a version of the PFM in Herbal. To do this, we had to extend Herbal’s ACT-R compiler to include a retrieval function. The resulting Herbal model was numerically equivalent in the following ways: the number of total cycles, the number of sub-cycles, and the best price.

We also compared the hand-coded ACT-R source code with Herbal’s autogenerated source code to determine to what degree Herbal can replicate the ACT-R models. We found that the two versions of the source code are structurally equivalent because they have the same number of productions and chunk types, and the elements of “IF” (conditions) and “THEN” (actions) in every production are nearly the same. They differ, however, in that the Herbal code checks the goal buffer before operating on each goal type slot, whereas the hand-coded model only checks the buffer at the beginning of each production’s condition and action. Nevertheless, the Herbal model passed the syntax test, running with no errors.

Reimplementing the PFM in ACT-R 6 took 6 hours after corresponding with the author, while implementing the PFM in Herbal took approximately an hour. In this case, using Herbal decreased the time required to build the model. Extending Herbal’s interface and Herbal’s ACT-R compiler, however, took a couple of days to fully develop and test. The features necessary to better support ACT-R’s declarative memory retrieval functions constitute an extension of not only Herbal’s technical abilities but also its ontology. Nevertheless, this study provides a model for testing high-level cognitive architectures, as well as several lessons for future docking studies. For the later comparison and development of models, researchers should publish or archive their models.
While rewriting the PFM was not onerous, reconstructing a larger model would be. The use of unstructured model archives, such as act.psy.cmu.edu, is clearly helpful, while more structured archives[22, 23, 26] are better yet. Herbal may be able to provide such an archive, at least for the models created in it.

Our study suggests that ACT-R 5 and 6 were functionally equivalent for this model. While the PFM was simple, it used core aspects of the architecture. Testing the effects of changes to complex architectures will, however, require docking multiple models of greater complexity than the PFM.

Finally, developing and validating models in Herbal or other high-level cognitive modeling languages facilitates both documentation and reuse. Having the model in a formal representation with more explicit entry and exit points (operators in this case) encourages reuse by putting the model in a more comprehensible format. Furthermore, using docking studies incrementally to compare increasingly complex models affords us the opportunity to identify core processes through testing, helping us retain some degree of parsimony in our theories. This study, thus, provides an incremental testing and development strategy for high-level cognitive architectures. More specifically, operationalizing a concept of component equivalence allows us to identify a set of core processes. Over time, this approach may allow us to more fully realize a unified theory of cognition by establishing more concrete methods of model subsumption.

Newell[5] noted that there is more in cognitive architectures than we have dreamed. Examining hand-written models through docking is one way to better understand the capabilities of and salient differences between cognitive architectures. The way we make models has implications for developing more unified theories of cognition. While many models fully utilize the core attributes of their host architectures, others do not. For instance, when we reverse-compiled a Soar model[24], we found that not all published Soar models follow the PSCM. In some instances, this may be appropriate and extends their use. These extensions and uses can be supported by a high-level language. On the other hand, non-canonical use makes replicating previous work more difficult, and frustrates efforts to build a more coherent body of knowledge in the Cognitive Sciences.
4. Future work
In the future, we will explore three directions regarding Herbal. First, we will continue to extend the Herbal ACT-R compiler because it cannot currently support ACT-R fully. Second, we will choose a more complex model with subsymbolic computation, such as the Diag Model[25]. Third, we will extend Herbal’s Soar compiler to compile our PFM model into Soar source code. This will allow us to validate the effectiveness of cross-architecture modeling in Herbal and compare the behavioral differences between Soar and ACT-R.
5. Acknowledgement

This work is also supported by grants from DTRA (HDTRA1-09-1-0054) and ONR (N00014-09-1-1124). We would like to thank Dr. Mark Cohen and Dr. Jong Kim for their useful suggestions and comments.
References

[1] Pirolli, P. Information foraging theory: Adaptive interaction with information. Oxford University Press, New York, NY, 2007.
[2] Simon, H. A. A behavioral model of rational choice. The Quarterly Journal of Economics, 69 (1955), 99-118.
[3] Anderson, J. R. How can the human mind exist in the physical universe? Oxford, New York, NY, 2007.
[4] Laird, J. E., Rosenbloom, P. S., & Newell, A. Chunking in Soar: The anatomy of a general learning mechanism. Machine Learning, 1 (1986), 11-46.
[5] Newell, A. Unified theories of cognition. Harvard University Press, Cambridge, MA, 1990.
[6] Newell, A., Yost, G. R., Laird, J. E., Rosenbloom, P. S., & Altmann, E. Formulating the problem space computational model. Carnegie Mellon Computer Science: A 25-Year commemorative. ACM-Press (Addison-Wesley), Reading, MA, 1991, 255-293.
[7] Carley, K. M., & Newell, A. The nature of the social agent. Journal of Mathematical Sociology, 19 (1994), 221-262.
[8] Jones, R. M., Crossman, J. A., Lebiere, C., & Best, B. J. An abstract language for cognitive modeling. In The Seventh International Conference on Cognitive Modeling, Trieste, Italy, 2006, 160-165.
[9] Cohen, M. A., Ritter, F. E., & Haynes, S. R. Applying software engineering to agent development. AI Magazine, 31 (2010), 25-44.
[10] Langley, P., & Choi, D. A unified cognitive architecture for physical agents. In Proceedings of the twenty-first national conference on artificial intelligence. AAAI Press, Boston, 2006.
[11] Haynes, S. R., Cohen, M. A., & Ritter, F. E. Designs for explaining intelligent agents. International Journal of Human-Computer Studies, 67 (2009), 99-110.
[12] Friedrich, M., Cohen, M. A., & Ritter, F. E. A gentle introduction to XML within Herbal. ACS Lab, The Pennsylvania State University, University Park, PA, 2007.
[13] Axtell, R., Axelrod, R., Epstein, J. M., & Cohen, M. D. Aligning simulation models: A case study and results. Computational and Mathematical Organization Theory, 1 (1996), 123-141.
[14] Loui, M., Carley, K. M., Haghshenass, L., Kunz, J., & Levitt, J. Model comparisons: Docking OrgAhead and SimVision. In Proceedings of the 1st Annual Conference of the North American Association for Computational Social and Organization Science 6 (Day 4). NAACSOS, Pittsburgh, PA, 2003.
[15] Cooper, R., Fox, J., Farringdon, J., & Shallice, T. A systematic methodology for cognitive modelling. Artificial Intelligence, 35 (1996), 3-44.
[16] Xu, J., Gao, Y., & Madey, G. A docking experiment: Swarm and repast for social network modeling. In Proceedings of the Seventh Annual Swarm Researchers Meeting (Swarm 2003). Notre Dame, IN, 2003.
[17] Barry, P. S., Koehler, M. T. K., McMahon, M. T., & Tivnan, B. F. Agent-directed simulation for systems engineering: Applications to the design of venue defense and the oversight of financial markets. International Journal of Intelligent Control and Systems, 14 (2009), 20-31.
[18] Fillizola, M., & Weinstein, H. The study of G-protein coupled receptor oligomerization with computational modeling and bioinformatics. The Federation of European Biochemical Societies Journal, 272 (2005), 2926-2938.
[19] Gluck, K. A., & Pew, R. W. (Eds.). Modeling human behavior with integrated cognitive architectures: Comparison, evaluation, and validation. Erlbaum, Mahwah, NJ, 2005.
[20] Paik, J., Kim, J. W., Ritter, F. E., Morgan, J. H., Haynes, S. R., & Cohen, M. A. Building Large Learning Models with Herbal. Proceedings of ICCM - 2010 - Tenth International Conference on Cognitive Modeling. Philadelphia, USA, 2010, 187-192.
[21] Cohen, M. A., Ritter, F. E., & Haynes, S. R. Applying software engineering to agent development. AI Magazine, 31 (2010), 25-44.
[22] Cornel, R., Amant, R. S., & Shrager, J. Collaboration and modeling support in CogLaborate. In Proceedings of the 19th Conference on Behavior Representation in Modeling and Simulation. BRIMS Society, Charleston, SC, 2010, 10-BRIMS-129, 146-153.
[23] Wong, T. J., Cokely, E. T., & Schooler, L. J. An online database of ACT-R parameters: Towards a transparent community-based approach to model development. Proceedings of ICCM - 2010 - Tenth International Conference on Cognitive Modeling. Philadelphia, USA, 2010, 282-286.
[24] Girouard, A., Smith, N. W., & Ritter, F. E. Lessons from decompiling an embodied cognitive model. In Cognitio 2006 Workshop, cognitio.uqam.ca/index.php?section=posters&lng=en, 2006.
[25] Ritter, F. E., & Bibby, P. A. Modeling how, when, and what learning happens in a diagrammatic reasoning task. Cognitive Science, 32 (2008), 862-892.
[26] Myung, J., & Pitt, M. Cognitive Modeling Repository. Cognitive Science, 2010.
Manifesto
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-191
Introducing the BICA Society

Alexei V. SAMSONOVICH a,1, Kamilla R. JÓHANNSDÓTTIR b, Andrea STOCCO c and Antonio CHELLA d
a Krasnow Institute for Advanced Study, George Mason University, 4400 University Drive MS 2A1, Fairfax, VA 22030, USA
b Department of Law and Social Sciences, University of Akureyri, Solborg, Nordurslod 2, 600 Akureyri, Iceland
c Institute for Learning and Brain Sciences, University of Washington, USA, Old Fisheries Center, BOX 357988, Seattle, WA 98195, USA
d Department of Computer Engineering, University of Palermo, Viale delle Scienze edificio 6, 90128 Palermo, Italy
[email protected], [email protected], [email protected], [email protected]
Abstract. The Biologically Inspired Cognitive Architectures Society, or the BICA Society, is a recently formed nonprofit organization. The purpose of the Society is to promote and facilitate the transdisciplinary study of biologically inspired cognitive architectures (BICA), in particular, aiming at the emergence of a unifying, generally accepted framework for the design, characterization and implementation of human-level cognitive architectures. The First International Conference on Biologically Inspired Cognitive Architectures (BICA 2010) is at the same time officially the First Annual Meeting of the BICA Society.
Keywords. Scientific societies, roadmap, challenge, human-level intelligence
We live in a unique historical period, when human-level general intelligence can be expected to become available in artifacts within a century, if not sooner. The size of this anticipated breakthrough exceeds human imagination, and has no analogs in the past. It is therefore vital now to unify our efforts in approaching the challenge of making this breakthrough happen and coping with its outcomes. Our vision for the BICA Society, a recently formed nonprofit organization, is that it will help researchers to create a world-wide infrastructure for this unification.

The challenge of creating a real-life computational equivalent of the human mind requires that we better understand at a computational level how natural intelligent systems develop their cognitive and learning functions. The paradigm of biologically inspired cognitive architectures (BICAs)² has emerged as a new approach bringing together disjointed disciplines to develop this understanding and to use it in order to create artifacts capable of human-like learning, intelligence, and also consciousness. Within this approach, researchers from different fields develop general computational architectures that are inspired by the workings of the biological brain and mind. BICAs provide the platform for designing artificial intelligent behavior, and are based on the progress made in understanding and quantifying the mind and the brain. This new approach combines the insights from biology and the neurosciences, the sharpness of computer sciences, and the boldness of pursuing a holistic and integrated path to reproducing intelligence.

Still, despite impressive successes and growing interest in BICA, wide gaps separate different approaches from each other and from solutions found in biology. Modern scientific societies pursue related yet separate goals, while the mission of the BICA Society consists in the integration of many efforts in addressing the above challenge. The BICA Society aims at providing a shared reference point for researchers from disjointed fields and communities who devote their efforts to solving the same challenge, but speak different languages. Starting with this very conference, the BICA Society will promote the transdisciplinary study of cognitive architectures, and empower researchers with means to exchange their ideas and tools to pursue the BICA challenge. In the long term, the BICA Society will help to put forward a unifying, generally accepted framework for the design, characterization and implementation of human-level cognitive architectures.

The creation of a real, effective human-level BICA is a tremendous challenge. It can be compared to other great challenges such as the man on the moon, the Hubble telescope, the human genome, and the Manhattan Project. We believe that it is very unlikely that a single laboratory, no matter how great and resourceful it is, may succeed in this fantastic effort. Human-level BICA is a great challenge, and it can be dealt with only by the unified efforts of many laboratories, people, and research institutions in the world that continuously share their thinking, their knowledge and their partial results. The newborn BICA Society will be a main actor in allowing and facilitating this sharing of thinking and knowledge in order to face the great challenge of a human-level biologically inspired cognitive architecture.

The list of the Founding Members of the BICA Society at this point includes Alexei V. Samsonovich, Kamilla R. Johannsdottir, Andrea Stocco, Antonio Chella, Ben Goertzel, James S. Albus, Christian Lebiere, David Noelle, Stan Franklin, Shane T. Mueller, Itamar Arel, Brandon R. Rohrer, and others.

¹ Corresponding Author.
² Here “biologically inspired” is understood broadly as “brain-mind inspired”. The acronym “BICA” was coined in 2005 by the Defense Advanced Research Projects Agency (DARPA). The BICA Society has no connection to the terminated DARPA BICA Program (http://www.darpa.mil/ipto/programs/bica/bica.asp).
Review
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved. doi:10.3233/978-1-60750-661-4-195
Toward a Unified Catalog of Implemented Cognitive Architectures

Alexei V. SAMSONOVICH
Krasnow Institute for Advanced Study, George Mason University, 4400 University Drive MS 2A1, Fairfax, VA 22030-4444, USA
[email protected]
Abstract. This work is a review of the online Comparative Table of Cognitive Architectures (the version that was available at http://bicasymposium.com/cogarch on September 20, 2010). This continuously updated online resource is a collective product of many researchers and developers of cognitive architectures. Names of its contributors (sorted alphabetically by the represented architecture name) are: James S. Albus (4D/RCS), Christian Lebiere and Andrea Stocco (ACT-R), Stephen Grossberg (ART), Brandon Rohrer (BECCA), Balakrishnan Chandrasekaran and Unmesh Kurup (biSoar), Raul Arrabales (CERA-CRANIUM), Fernand Gobet and Peter Lane (CHREST), Ron Sun (CLARION), Ben Goertzel (CogPrime), Frank Ritter and Rick Evertsz (CoJACK), George Tecuci (Disciple), Shane Mueller (EPIC), Susan L. Epstein (FORR), Stuart C. Shapiro (GLAIR), Alexei Samsonovich (GMU BICA), Jeff Hawkins (HTM), David C. Noelle (Leabra), Stan Franklin (LIDA), Pei Wang (NARS), Akshay Vashist and Shoshana Loeb (Nexting), Cyril Brom (Pogamut), Nick Cassimatis (Polyscheme), L. Andrew Coward (Recommendation Architecture), Ashok Goel, J. William Murdock and Spencer Rugaber (REM), John Laird (Soar), and Kristinn Thórisson (Ymir). All these contributions are summarized in this work in a form that makes the described architectures easy to compare against each other.
Keywords. Cognitive architectures, model and data sharing, unifying framework
Introduction

This work is a review of the online resource called Comparative Table of Cognitive Architectures.¹ At the time of writing, this resource is in the process of rapid development. It is a collective product of many researchers and developers of cognitive architectures, who submitted their contributions over email to the author of this paper for posting them on the Internet. Each contribution was posted as is, without subediting, which inevitably resulted in some inconsistency in terminology and in interpretation of statements by the contributors. Despite these inconsistencies and the incompleteness of the source, we (the contributors to the online resource) believe that the present snapshot of the Comparative Table of Cognitive Architectures needs to be documented, so that in making further steps authors would be able to use a bibliographic reference to the documentation of the first step.
1 Specifically, the version of Comparative Table of Cognitive Architectures that was available online at http://bicasymposium.com/cogarch/ on September 20, 2010.
Biologically Inspired Cognitive Architectures 2010 : Proceedings of the First Annual Meeting of the BICA Society, edited by A. V.
A.V. Samsonovich / Toward a Unified Catalog of Implemented Cognitive Architectures
Historically, the first prototype of the Comparative Table of Cognitive Architectures was made available online on October 27, 2009. The initiative for this comparative table started during the preparation of a discussion panel on cognitive architectures that was organized by Christian Lebiere and Alexei Samsonovich at the 2009 AAAI Fall Symposium on BICA,2 held in Arlington, Virginia, in November 2009. This panel, devoted to general comparative analysis of cognitive architectures, involved 15 panelists: Christian Lebiere (Chair), Bernard Baars, Nick Cassimatis, Balakrishnan Chandrasekaran, Antonio Chella, Ben Goertzel, Steve Grossberg, Owen Holland, John Laird, Frank Ritter, Stuart Shapiro, Andrea Stocco, Ron Sun, Kristinn Thórisson, and Pei Wang. The idea was to bring together researchers from disjointed communities that speak different languages and frequently ignore each other’s existence. Quite unexpectedly, the attempt to engage them in common discussion was very successful [1], and after the panel most of the panelists joined the initiative by submitting their entries to the table. The time has come to make the collected entries more visible by summarizing them in a paper. The names of the contributors to the online resource (as of September 20, 2010) and the short names of the cognitive architectures they represent are listed below, sorted alphabetically by architecture name.
1. James S. Albus (representing 4D/RCS),
2. Christian Lebiere and Andrea Stocco (representing ACT-R),
3. Stephen Grossberg (representing ART),
4. Brandon Rohrer (representing BECCA),
5. Balakrishnan Chandrasekaran and Unmesh Kurup (representing biSoar),
6. Raul Arrabales (representing CERA-CRANIUM),
7. Fernand Gobet and Peter Lane (representing CHREST),
8. Ron Sun (representing CLARION),
9. Ben Goertzel (representing CogPrime),
10. Frank Ritter and Rick Evertsz (representing CoJACK),
11. George Tecuci (representing Disciple),
12. Shane Mueller and Andrea Stocco (representing EPIC),
13. Susan L. Epstein (representing FORR),
14. Stuart C. Shapiro (representing GLAIR),
15. Alexei Samsonovich (representing GMU BICA),
16. Jeff Hawkins (representing HTM),
17. David C. Noelle (representing Leabra),
18. Stan Franklin (representing LIDA),
19. Pei Wang (representing NARS),
20. Akshay Vashist and Shoshana Loeb (representing Nexting),
21. Cyril Brom (representing Pogamut),
22. Nick Cassimatis (representing Polyscheme),
23. L. Andrew Coward (representing Recommendation Architecture),
24. Ashok Goel, J. William Murdock and Spencer Rugaber (representing REM),
25. John Laird and Andrea Stocco (representing Soar),
26. Kristinn Thórisson (representing Ymir).
2 AAAI: Association for the Advancement of Artificial Intelligence. BICA: Biologically Inspired Cognitive Architectures.
Since the beginning of modern research in cognitive modeling, it has been understood that a successful approach to intelligent agent design should be based on integrative cognitive architectures describing complete agents [2]. Since then, the cognitive architecture paradigm has proliferated extensively [3-7], resulting in powerful frameworks such as ACT-R and Soar (described below). While these two examples are the best known and most widely used, other architectures described here may be known within limited circles only. Nevertheless, every architecture in this review is treated in the same way. This seems necessary in order to be able to compare them all and to understand why cognitive modeling has not resulted in a revolutionary breakthrough in artificial intelligence in recent decades. In order to see what vital features or parameters are still missing in modern cognitive architectures, it is necessary to put many disparate approaches next to each other for comparison on an equal basis. Only then can a unifying framework for integration emerge. This is the main objective of the online Comparative Table and also of the present work.
The description of each cognitive architecture in this review is based on one and the same template, which closely follows the template of the table posted online at http://bicasymposium.com/cogarch.1 The data included here with the consent of each contributor are based on, and correspond to, the data posted in the Comparative Table.1
1. 4D/RCS
This section is based on the contribution of James S. Albus to the Comparative Table of Cognitive Architectures.1 RCS stands for Real-time Control System. 4D/RCS is a reference model architecture for unmanned vehicle systems. 4D/RCS operates machines and drives vehicles autonomously. It is general-purpose and robust in real-world environments. It uses information from a battlefield information network, a priori maps, etc. It pays attention to valued goals and makes decisions about what is most important based on rules of engagement and situational awareness.
1.1. Overview
Knowledge and experiences are represented in 4D/RCS using images, maps, objects, events, state, attributes, relationships, situations, episodes, and frames. Main components of the architecture include: Behavior Generation, World Modeling, Value Judgment, Sensory Processing, and Knowledge Database. These are organized in a hierarchical real-time control system (RCS) architecture. The cognitive cycle, or control loop, of 4D/RCS (Figure 1) is based on an internal world model that supports perception and behavior. Perception extracts from the sensory data stream the information necessary to keep the world model current and accurate. Behavior uses the world model to decompose goals into appropriate action. The most recent representative publication for 4D/RCS is [8]. This architecture was implemented, tested and studied at NIST (http://www.isd.mel.nist.gov/projects/rcslib/) using C++, Windows Real-time, VxWorks, Neutral Messaging Language (NML), Mobility Open Architecture Simulation and Tools (MOAST: http://sourceforge.net/projects/moast/), and Urban Search and Rescue Simulation (USARSim: http://sourceforge.net/projects/usarsim/). The list of funding programs,
projects and environments in which the architecture was used can be found at http://members.cox.net/bica2009/cogarch/4DRCS.pdf.
1.2. Support for Common Components and Features
The framework of 4D/RCS supports the following features and components that are common for many cognitive architectures: working memory, semantic memory, episodic memory, procedural memory, iconic memory (image and map representations), perceptual memory (pixels are segmented and grouped into entities and events), cognitive map, reward system (Value Judgment processes compute cost, benefit, risk), attention control (can focus attention on regions of interest), and consciousness (the architecture is aware of self in relation to the environment and other agents).
Figure 1. A bird’s eye view of the cognitive cycle of 4D/RCS. [Diagram labels: Mission Goal, Perception, World Model, Behavior, internal, external, Sensing, Real World, Action.]
Sensory, motor and other specific modalities include visual input (color, stereo), LADAR, GPS, odometry, inertial imagery. Supported cognitive functionality includes self-awareness.
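The cognitive cycle described in Section 1.1 (perception keeps an internal world model current, and behavior uses that model to turn a goal into action) can be illustrated with a toy loop. This is only a minimal sketch under strong simplifying assumptions: the one-dimensional "world", the names `WorldModel`, `perceive`, `behave` and `run_cycle`, and the gain and step parameters are all hypothetical and are not taken from the 4D/RCS reference model or the NIST RCS library.

```python
from dataclasses import dataclass


@dataclass
class WorldModel:
    """Internal estimate of the external world (hypothetical 1-D example)."""
    position: float = 0.0


def perceive(model: WorldModel, sensed_position: float, gain: float = 0.5) -> None:
    """Perception: blend the latest sensor reading into the world model."""
    model.position += gain * (sensed_position - model.position)


def behave(model: WorldModel, goal: float, max_step: float = 1.0) -> float:
    """Behavior: turn the goal into one bounded action, using only the model."""
    error = goal - model.position
    return max(-max_step, min(max_step, error))


def run_cycle(goal: float, steps: int = 50) -> float:
    """Repeat sensing -> perception -> behavior -> action on a toy real world."""
    real_position = 0.0           # external state (the "real world")
    model = WorldModel()          # internal state (the "world model")
    for _ in range(steps):
        perceive(model, real_position)   # sensing and perception
        action = behave(model, goal)     # behavior generation
        real_position += action          # action changes the real world
    return real_position
```

Note that `behave` never reads `real_position` directly; it only sees the world through the model, which is the point of the loop in Figure 1.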
1.3. Learning, Goal and Value Generation, and Cognitive Development
The following general paradigms and aspects of learning are supported by 4D/RCS: unsupervised learning (updates control system parameters in real-time), supervised learning (learns from subject matter experts), arbitrary mixtures of unsupervised and supervised learning, real-time learning, fast stable learning (uses CMAC algorithm that learns fast from error correction; when no errors, learning stops), learning from arbitrarily large databases (all applications are real-world and real-time), learning of non-stationary databases (battlefield environments change unpredictably). Learning algorithms used in 4D/RCS include reinforcement learning (learns parameters for actuator backlash, braking, steering, and acceleration). The architecture supports learning of new representations (maps and trajectories in new environments).
1.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
The following general paradigms are implemented in 4D/RCS: problem solving (uses rules and/or search methods for solving problems), decision making (makes decisions based on Value Judgment calculations).
In addition, the following specific paradigms were modeled with this architecture: task switching, Tower of Hanoi/London, dual task, visual perception with comprehension, spatial exploration, learning and navigation, object/feature search in an environment (search for targets in regions of interest), learning from subject matter experts.
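Section 1.3 mentions that 4D/RCS uses the CMAC algorithm, which learns from error correction and stops learning when there is no error. The following is a minimal one-dimensional CMAC sketch (overlapping offset tilings with an error-driven weight update). The class name, the parameter defaults and the input range are illustrative assumptions; this is a textbook-style simplification, not the code used in 4D/RCS.

```python
class CMAC:
    """Minimal 1-D CMAC (tile-coding) function approximator."""

    def __init__(self, n_tilings=8, n_tiles=16, lo=0.0, hi=1.0, lr=0.5):
        self.n_tilings, self.n_tiles = n_tilings, n_tiles
        self.lo, self.width = lo, (hi - lo) / n_tiles
        self.lr = lr
        # One extra tile per tiling absorbs the offset at the upper edge.
        self.w = [[0.0] * (n_tiles + 1) for _ in range(n_tilings)]

    def _active(self, x):
        """Index of the single active tile in each offset tiling."""
        idx = []
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            i = int((x - self.lo + offset) / self.width)
            idx.append(min(max(i, 0), self.n_tiles))
        return idx

    def predict(self, x):
        """Output is the sum of the weights of the active tiles."""
        return sum(self.w[t][i] for t, i in enumerate(self._active(x)))

    def train(self, x, target):
        """Error-correction update; a zero error leaves all weights unchanged."""
        error = target - self.predict(x)
        step = self.lr * error / self.n_tilings
        for t, i in enumerate(self._active(x)):
            self.w[t][i] += step
        return error
```

Because each update moves the prediction at `x` by `lr * error`, repeated training on one sample shrinks the error geometrically, and inputs close to `x` share most of its tiles and therefore generalize, while distant inputs are unaffected.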
2. ACT-R
This section is based on the contribution of Christian Lebiere and Andrea Stocco to the Comparative Table of Cognitive Architectures.1 ACT-R stands for Adaptive Control of Thought - Rational.
2.1. Overview
Knowledge and experiences are represented in ACT-R using chunks and productions. ACT-R is composed of a set of nearly independent modules that make information accessible to dedicated buffers of limited capacity (Figure 2). Information is encoded in the form of chunks and production rules; production rules are also responsible for accessing and setting chunks in the module buffers. The ACT theory was originally introduced in [9]. The key reference for ACT-R is [10]. Most recent representative publications include [11]. ACT-R was implemented and studied experimentally at Carnegie Mellon University using Lisp with an optional TCL/Tk interface. Implementations in other programming languages have also been developed by the user community.
Figure 2. A bird’s eye view of ACT-R. The boxes are modules, controlled by a central procedural module through limited-capacity buffers.
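The interplay of chunks, buffers and production rules described above can be illustrated with a toy counting model. This is a hedged sketch in plain Python, not ACT-R syntax or its Lisp API: the buffer names, the `MEMORY` list of counting facts, and the functions `make_buffers`, `retrieve`, `request_next`, `increment` and `run` are all hypothetical. Real ACT-R separates a production's condition from its action and selects among matching productions by utility; here the first matching rule simply fires.

```python
# Hypothetical declarative memory: chunks are slot-value structures.
MEMORY = [{"isa": "next", "number": n, "successor": n + 1} for n in range(10)]


def make_buffers(current, target):
    """Each buffer holds at most one chunk."""
    return {"goal": {"isa": "count", "current": current, "target": target},
            "retrieval": None}


def retrieve(pattern):
    """Return the first chunk whose slots match the pattern, else None."""
    for chunk in MEMORY:
        if all(chunk.get(k) == v for k, v in pattern.items()):
            return chunk
    return None


def request_next(b):
    """If still counting and nothing is retrieved, request the successor fact."""
    g = b["goal"]
    if g["current"] != g["target"] and b["retrieval"] is None:
        b["retrieval"] = retrieve({"isa": "next", "number": g["current"]})
        return True
    return False


def increment(b):
    """If the retrieved fact matches the goal, advance the count and clear it."""
    g, r = b["goal"], b["retrieval"]
    if r is not None and r["number"] == g["current"]:
        g["current"] = r["successor"]
        b["retrieval"] = None
        return True
    return False


PRODUCTIONS = [increment, request_next]


def run(b, max_cycles=50):
    """Fire one matching production per cycle until none matches."""
    trace = []
    for _ in range(max_cycles):
        fired = next((p for p in PRODUCTIONS if p(b)), None)
        if fired is None:
            break
        trace.append(fired.__name__)
    return trace
```

Calling `run(make_buffers(2, 4))` alternates retrieval requests and increments until the goal chunk's `current` slot reaches its `target`.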
2.2. Support for Common Components and Features
The framework of ACT-R supports the following features and components that are common for many cognitive architectures: semantic memory (encoded as chunks), procedural memory. Working and episodic memory systems are not explicitly defined, but the set of buffers used in inter-module communication can be thought of as providing working memory capacities. Sensory, motor and other specific modalities include: visual input (propositional, based on chunks), auditory input (propositional, based on chunks), basic motor functions (largely hands and finger control).
2.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in ACT-R include: reinforcement learning (for productions, linear discount version), Bayesian update (for memory retrieval). The architecture supports production compilation (forms new productions) and automatic learning of chunks (from buffer contents).
2.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
General paradigms of modeling studies with ACT-R include: problem solving, decision making, language processing, working memory tests. In addition, the following specific paradigms were modeled with this architecture: Stroop task (multiple models), task switching (multiple models), Tower of Hanoi/London, psychological refractory period (PRP) tasks, dual task, N-Back. A full list of paradigms modeled in ACT-R and associated publications can be found at: http://act-r.psy.cmu.edu/publications/index.php.
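The two learning mechanisms named in Section 2.3 can be written down compactly. The sketch below assumes the equations as commonly given in the ACT-R literature: a utility update U ← U + α(R_eff − U), where the effective reward R_eff is the reward minus the time elapsed since the production fired (the "linear discount"), and a base-level activation B = ln Σ_j t_j^(−d) over the ages t_j of past uses of a chunk. The function names and default parameter values are illustrative and are not the ACT-R API.

```python
import math


def update_utility(utility, reward, delay, alpha=0.2):
    """One utility-learning step for a production.

    The reward is discounted linearly by the delay between the firing of the
    production and the arrival of the reward.
    """
    effective_reward = reward - delay
    return utility + alpha * (effective_reward - utility)


def base_level_activation(ages, decay=0.5):
    """Base-level activation of a chunk from the ages (> 0) of its past uses.

    Recent and frequent uses raise the activation; old uses decay as a power law.
    """
    return math.log(sum(age ** -decay for age in ages))
```

For example, a production that fired 2 s before a reward of 10 is updated toward 8 rather than 10, so productions closer in time to the reward end up with higher utility.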
3. ART
This section is based on the contribution of Stephen Grossberg to the Comparative Table of Cognitive Architectures.1 ART stands for Adaptive Resonance Theory. Biologically-relevant cognitive architectures should clarify how individuals adapt autonomously in real time to a changing world filled with unexpected events; should explain and predict how several different types of learning (recognition, reinforcement, adaptive timing, spatial, motor) interact to this end; should use a small number of equations in a larger number of modules, or microassemblies, to form modal architectures (vision, audition, cognition, etc.) that control the different modalities of intelligence; should reflect the global organization of the brain into parallel processing streams that compute computationally complementary properties within and between these modal architectures; and should exploit the fact that all parts of the neocortex, which supports the highest levels of intelligence in all modalities, are variations of a shared laminar circuit design and thus can communicate with one another in a computationally self-consistent way.
3.1. Overview of the Architecture and Its Study
Knowledge and experiences are represented in ART (Figure 3) using visual 3D boundary and surface representations; auditory streams; spatial, object, and verbal working memories; list chunks; drive representations for reinforcement learning; orienting system; expectation filter; spectral timing networks. Main components of the architecture include model brain regions, notably laminar cortical and thalamic circuits.
Figure 3. A bird’s eye view of ART (included with permission of Stephen Grossberg).
ART was originally introduced in [12]. Most recent representative publications can be found at http://cns.bu.edu/~steve. ART was implemented and studied experimentally using nonlinear neural networks (with feedback, multiple spatial and temporal scales). Funding programs, projects and environments in which the architecture was used are also listed at the above URL.
3.2. Support for Common Components and Features
The framework of ART supports the following features and components that are common for many cognitive architectures: working memory (recurrent shunting on-center off-surround network that obeys LTM Invariance Principle and Inhibition of Return rehearsal law), semantic memory (limited associations between chunks), episodic memory (limited, builds on hippocampal spatial and temporal representations), procedural memory (multiple explicitly defined neural systems for learning, planning and control of action), iconic memory (emerges from role of top-down attentive interactions in laminar models of how the visual cortex sees), perceptual memory (models development of laminar visual cortex and explains how both fast perceptual learning with attention and awareness, and slow perceptual learning without attention or awareness, can occur), cognitive map (networks that learn entorhinal grid cell and hippocampal place field representations on line), reward system (models how amygdala, hypothalamus, and basal ganglia interact with sensory and prefrontal cortex to learn to direct attention and actions towards valued goals; used to help explain data about
classical and instrumental conditioning, mental disorders - autism, schizophrenia - and decision making under risk), attention control and consciousness (clarifies how boundary, surface, and prototype attention differ and work together to coordinate object and scene learning). Adaptive Resonance Theory predicts a link between processes of Consciousness, Learning, Expectation, Attention, Resonance, and Synchrony (CLEARS) and that All Conscious States Are Resonant States. Sensory, motor and other specific modalities include: visual input (natural static and dynamic scenes, psychophysical displays: used to develop emerging architecture of visual system from retina to prefrontal cortex, including how 3D boundaries and surface representations form, and how view-dependent and view-invariant object categories are learned under coordinated guidance of spatial and object attention), auditory input (natural sound streams: used to develop emerging architecture of auditory system for auditory streaming and speaker-invariant speech recognition), special modalities (SAR, LADAR, multispectral IR, night vision, etc.).
3.3. Learning, Goal and Value Generation, and Cognitive Development
The following general paradigms and aspects of learning are supported by ART: unsupervised learning (can categorize objects and events or alter spatial maps and sensory-motor gains without supervision), supervised learning (can learn from predictive mismatches with environmental constraints, or explicit teaching signals, when they are available), arbitrary mixtures of unsupervised and supervised learning (e.g., the ARTMAP family of models), real-time learning (both ART (match learning) and Vector Associative Map (VAM; mismatch learning) models use real-time local learning laws), fast stable learning (i.e., adaptive weights converge on each trial without forcing catastrophic forgetting: theorems prove that ART can categorize events in a single learning trial without experiencing catastrophic forgetting in dense nonstationary environments; mismatch learning cannot do this, but this is adaptive in learning spatial and motor data about changing bodies), learning from arbitrarily large databases (i.e., not toy problems: theorems about ART algorithms show that they can do fast learning and self-stabilizing memory in arbitrarily large non-stationary databases; ART is therefore used in many large-scale applications, see http://techlab.bu.edu/), learning of non-stationary databases (i.e., when environmental rules change unpredictably). Learning algorithms used in ART include: reinforcement learning (CogEM and TELOS models of how amygdala and basal ganglia interact with orbitofrontal cortex, etc.), Bayesian effects as emergent properties, a combination of Hebbian and anti-Hebbian properties in learning dynamics, gradient descent methods, learning of new representations via self-organization.
3.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with ART include visual and auditory information processing.
Specifically, the following paradigms were modeled: problem solving, decision making, analogical reasoning (in rule discovery applications), language processing, working memory tests. In addition, the following specific paradigms were modeled with this architecture: Stroop task, task switching, Tower of Hanoi/London, N-Back, visual perception with
comprehension, spatial exploration, learning and navigation, object/feature search in an environment, pretend-play (an architecture for teacher-child imitation).
3.5. Meta-Theoretical Questions, Discussion and Conclusions
Does this architecture allow for using only local computations? Yes. All operations are defined by local operations in neural networks.
Can it function autonomously? Yes. ART models can continue to learn stably about non-stationary environments while performing in them.
Is it general-purpose in its modality; i.e., is it brittle?
Can it pay attention to valued goals? Yes. ART derives its memory stability from matching bottom-up data with learned top-down expectations that pay attention to expected data. ART-CogEM models use cognitive-emotional resonances to focus attention on valued goals.
Can it flexibly switch attention between unexpected challenges and valued goals? Yes. Top-down attentive mismatches drive attention reset, shifts, and memory search. Cognitive-emotional and attentional shroud mechanisms modulate attention shifts.
Can reinforcement learning and motivation modulate perceptual and cognitive decision-making in this architecture? Yes. Cognitive-emotional and perceptual-cognitive resonances interact together for this purpose.
Can it adaptively fuse information from multiple types of sensors and modalities? ART categorization discovers multi-modal feature and hierarchical rule combinations that lead to predictive success.
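The match-based, fast and stable category learning described in Section 3.3 can be illustrated with a compact Fuzzy ART sketch. This is the widely taught algorithmic simplification (complement coding, a choice function, a vigilance test and fast learning), not the laminar cortical models discussed above; the class name `FuzzyART` and the default parameter values are assumptions made for illustration.

```python
class FuzzyART:
    """Minimal Fuzzy ART: unsupervised categorization with a vigilance test."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = vigilance, alpha, beta
        self.weights = []   # one weight vector per committed category

    @staticmethod
    def _complement_code(x):
        """Represent x in [0, 1]^n as (x, 1 - x) so that |I| is constant."""
        return list(x) + [1.0 - v for v in x]

    def train(self, x):
        """Present one input; return the index of the resonating category."""
        i = self._complement_code(x)
        norm_i = sum(i)

        def choice(j):
            # T_j = |I ^ w_j| / (alpha + |w_j|), with ^ the component-wise min
            w = self.weights[j]
            return sum(map(min, i, w)) / (self.alpha + sum(w))

        for j in sorted(range(len(self.weights)), key=choice, reverse=True):
            w = self.weights[j]
            overlap = [min(a, b) for a, b in zip(i, w)]
            if sum(overlap) / norm_i >= self.rho:
                # Resonance: the top-down expectation matches well enough.
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * b
                                   for o, b in zip(overlap, w)]
                return j
            # Otherwise: mismatch reset, search the next-best category.
        self.weights.append(i)   # nothing matched: commit a new category
        return len(self.weights) - 1
```

With `beta=1.0` (fast learning) a category is learned in a single trial, and an input that fails the vigilance test for every existing category creates a new one instead of overwriting old weights, which is the sense in which learning is stable here.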
4. BECCA
This section is based on the contribution of Brandon Rohrer to the Comparative Table of Cognitive Architectures.1 BECCA stands for Brain-Emulating Cognition and Control Architecture. It is a general unsupervised learning and control approach based on neuroscientific and psychological models of humans. BECCA was designed to solve the problem of natural-world interaction. The research goal is to place BECCA into a system with unknown inputs and outputs and have it learn to successfully achieve its goals in an arbitrary environment. The current state-of-the-art in solving this problem is the human brain. As a result, BECCA's design and development is based heavily on insights drawn from neuroscience and experimental psychology. The development strategy emphasizes physical embodiment in robots.
4.1. Overview
Knowledge and experiences are represented in BECCA using discrete, symbolic percepts and actions. Raw sensory inputs are converted to symbolic percepts, and symbolic actions are refined into specific motor commands. A bird’s eye view of the architecture is shown in Figure 4. Main components can be characterized as follows. Episodic learning and procedural learning are modeled in S-Learning, which is based on sequence representations in the hippocampus. Semantic memory is modeled using Context-Based Similarity (CBS). Perception (conversion of raw sensor data into symbolic percepts) is performed using X-trees, which are based on the function of the cortex. Action refinement is performed using S-trees, which are based on the function of the cerebellum.
Figure 4. A bird’s eye view of BECCA.
BECCA was originally introduced in [13]. Most recent representative publications include [14,87]. BECCA was implemented and tested / studied experimentally; Java code for robot implementations and MATLAB prototypes of individual components can be found at: http://sites.google.com/site/brohrer/source.
4.2. Support for Common Components and Features
The framework of BECCA supports the following features and components that are common for many cognitive architectures: working memory (a fixed number of recent percepts and actions are stored for use in decision making in S-learning), semantic memory (percepts are clustered into more abstract percepts using Context-Based Similarity; percepts' membership in a cluster is usually partial, and a percept may be a member of many clusters), episodic memory, procedural memory (both episodic and procedural memories are represented in S-learning as sequences of percepts and actions), iconic memory (percepts identified by X-trees may be accessed by S-learning until they are “overwritten” by new experience), cognitive map (somewhat; the processing performed by X-trees results in related percepts being located “near” each other; for instance, it generates visual primitives similar to the direction-specific fields in V1), reward system (basic rewards, such as for finding an energy source or receiving approval, are an integral part of the system; more sophisticated goal-seeking behaviors are predicated on these). Sensory, motor and other specific modalities: BECCA makes no assumptions about the nature of its inputs. It can handle inputs of any nature, including visual, auditory, tactile, proprioceptive, and chemical. It can equally well handle inputs from nonstandard modalities, including magnetic, radiation-detection, GPS, and LADAR. It can, in addition, handle symbolic inputs, such as ASCII text, days of the week, and other categorical data. It has been implemented with color vision, ultrasonic range finders, and ASCII text input.
4.3. Learning, Goal and Value Generation, and Cognitive Development
The following general paradigms and aspects of learning are supported by BECCA: unsupervised learning (X-trees form percepts from raw data with no teaching signal), supervised learning (at a high level: supervised learning can be observed when BECCA
is interacting with a human coach that provides verbal commands and expresses approval). Learning algorithms used in BECCA include reinforcement learning, Bayesian update, Hebbian learning, learning of new representations. Specifically, S-learning falls under the category of Temporal-Difference learning algorithms, which are one flavor of reinforcement learning. When evaluating the outcome of an action, past experience is recalled and summarized in Bayesian terms. The probabilities of outcomes are then used to make the decision. Although BECCA uses no neural networks, X-trees cluster inputs that “fire together”. X-trees learn percepts from raw data. Context-based similarity learns abstract concepts from sequences of percepts.
4.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with BECCA include natural-world interaction, i.e. an embodied agent interacting with an unknown and unmodeled environment, and natural language processing. Specifically, the following paradigms were modeled: problem solving (occurs through trial and error and through application of similar past experiences), decision making (S-learning makes decisions: chooses actions based on past sequences of experiences that led to a rewarded state), analogical reasoning (in an emergent fashion), language processing, spatial exploration, learning and navigation, object/feature search in an environment.
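The decision step described in Section 4.3 (recall past experience, summarize the outcomes of an action as probabilities, then decide) can be sketched as follows. This is not BECCA's S-learning implementation: the class `SequenceMemory`, its methods, and the use of plain outcome frequencies as probability estimates are assumptions made only to illustrate the idea on symbolic percepts and actions.

```python
from collections import defaultdict


class SequenceMemory:
    """Stores (percept, action) -> next-percept experiences and their rewards."""

    def __init__(self):
        # (percept, action) -> {next_percept: number of times observed}
        self.counts = defaultdict(lambda: defaultdict(int))
        self.reward = {}   # next_percept -> most recently observed reward

    def record(self, percept, action, next_percept, reward):
        """Remember one experienced sequence and the reward it led to."""
        self.counts[(percept, action)][next_percept] += 1
        self.reward[next_percept] = reward

    def expected_reward(self, percept, action):
        """Summarize recalled outcomes as frequencies and average their rewards."""
        outcomes = self.counts.get((percept, action))
        if not outcomes:
            return 0.0   # no experience with this action yet
        total = sum(outcomes.values())
        return sum(n / total * self.reward[p] for p, n in outcomes.items())

    def choose(self, percept, actions):
        """Pick the action whose recalled outcomes promise the most reward."""
        return max(actions, key=lambda a: self.expected_reward(percept, a))
```

A usage example: after recording that "forage" led from "hungry" to "food" (reward 1.0) three times and to "nothing" (reward 0.0) once, `expected_reward("hungry", "forage")` evaluates to 0.75, so `choose` prefers it over an action that has only ever led to "nothing".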
5. BiSoar
This section is based on the contribution of Balakrishnan Chandrasekaran and Unmesh Kurup to the Comparative Table of Cognitive Architectures.1 During the CogArch panel at BICA 2009, Chandrasekaran called attention to the lack of support in the current family of cognitive architectures for perceptual imagination, and cited his group’s DRS system that has been used to help Soar and ACT-R engage in diagrammatic imagination for problem solving.
5.1. Overview
Knowledge and experiences are represented in biSoar using the representational framework of Soar plus diagrams; the diagrammatic part can also be combined with any symbolic general architecture, such as ACT-R. Main components of the architecture include Diagrammatic Representation System (DRS) used for diagram representation; plus perceptual and action routines to get information from and create/modify diagrams. BiSoar was originally introduced in [15]. Other key references include [16]. BiSoar was implemented and tested / studied experimentally using the Soar framework.
5.2. Support for Common Components and Features
The framework of biSoar supports the following features and components that are common for many cognitive architectures: working memory (plus diagrammatic working memory), procedural memory (extends Soar’s procedural memory to diagrammatic components), perceptual memory, cognitive map (emergent), attention
control (Soar’s framework). Sensory and special modalities include visual input (diagrams), imagery (diagrammatic), spatial cognition, etc.
5.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in biSoar include reinforcement learning (extends Soar's chunking for diagrammatic components).
5.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
The main general paradigm of modeling studies with biSoar is problem solving. In addition, the following specific paradigms were modeled with this architecture: spatial exploration, learning and navigation.
6. CERA-CRANIUM
This section is based on the contribution of Raul Arrabales to the Comparative Table of Cognitive Architectures.1 CERA-CRANIUM is a cognitive architecture designed to control a wide variety of autonomous agents, from physical mobile robots [17] to computer game synthetic characters [18]. The main inspiration of CERA-CRANIUM is the Global Workspace Theory [19]. However, the current design also takes inspiration from other cognitive theories of consciousness and emotions.
6.1. Overview
CERA-CRANIUM consists of two main components (see Figure 5): CERA, a control architecture structured in layers; and CRANIUM, a tool for the creation and management of large numbers of parallel processes in shared workspaces. CERA uses the services provided by CRANIUM with the aim of generating a highly dynamic and adaptive perception mechanism. Knowledge and experiences are represented in CERA-CRANIUM using single percepts, complex percepts, and mission percepts. Actions are generated using single and complex behaviors. Main components of the architecture include: physical layer global workspace, mission-specific layer global workspace, core layer contextualization, attention, sensory prediction, status assessment, and goal management mechanisms. CERA-CRANIUM was originally introduced in [17]. Most recent representative publications include [20, 18]. Currently, there exist two implementations of CERA-CRANIUM. One is oriented to the control of robots and based on CCR (Concurrency and Coordination Runtime) and DSS (Decentralized Software Services), part of Robotics Developer Studio (http://www.conscious-robots.com/en/roboticsstudio/2.html). The latest implementation is mostly written in Java and has been applied to the control of computer game bots. This CERA-CRANIUM implementation is the winner of the 2K BotPrize 2010 competition, a Turing test adapted to video games [21].
[Figure 5 diagram. Labels recoverable from the image: Agent; CERA S-M; Sensor Service; CERA Physical Layer; CRANIUM Workspace; Sensor Preprocessors; Single Percepts; Complex Percepts; CERA Mission Layer; Percept Aggregators; Mission Percepts; Specialized Processors; CERA Core Layer.]
Figure 5. A bird’s eye view of CERA-CRANIUM.
6.2. Support for Common Components and Features
The framework of CERA-CRANIUM supports the following features and components that are common for many cognitive architectures: working memory, procedural memory (some processors generate single or complex behaviors), perceptual memory (only as preprocessing buffers in CERA sensor services), cognitive map (mission-specific processors build 2D maps), reward system (status assessment mechanism in core layer), attention control and consciousness (attention is implemented as a bias signal induced from the core layer to the lower-level global workspaces). Sensory, motor and other specific modalities include: visual input (both real camera images and synthetic images from the simulator), and special modalities: SONAR, Laser Range Finder.
6.3. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with CERA-CRANIUM include Global Workspace Theory and multimodal sensory binding. Specifically, the following paradigms were modeled: problem solving (implicit), decision making. In addition, the following specific paradigms were modeled with this architecture: spatial exploration, learning and navigation.
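The interplay described in Sections 6.1 and 6.2 (processors sharing a workspace, aggregation of single percepts into complex percepts, and attention as a bias signal from the core layer) can be illustrated with a minimal Python sketch. It is not taken from either CERA-CRANIUM implementation: the class names, the `aggregate_obstacle` processor, the percept labels and the activation values are all hypothetical, and the processors run sequentially here rather than in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Percept:
    kind: str          # "single" or "complex"
    label: str         # hypothetical content tag, e.g. "sonar" or "obstacle"
    activation: float  # how strongly the percept competes for attention


@dataclass
class Workspace:
    percepts: list = field(default_factory=list)
    processors: list = field(default_factory=list)

    def step(self, bias):
        """Apply the top-down bias, run every processor, return the strongest percept."""
        for percept in self.percepts:
            percept.activation += bias.get(percept.label, 0.0)
        produced = []
        for processor in self.processors:
            produced.extend(processor(self.percepts))
        self.percepts.extend(produced)
        return max(self.percepts, key=lambda p: p.activation)


def aggregate_obstacle(percepts):
    """Hypothetical aggregator: sonar and laser readings form one complex percept."""
    singles = {p.label: p for p in percepts if p.kind == "single"}
    if "sonar" in singles and "laser" in singles:
        strength = singles["sonar"].activation + singles["laser"].activation
        return [Percept("complex", "obstacle", strength)]
    return []


ws = Workspace(
    percepts=[
        Percept("single", "sonar", 0.4),
        Percept("single", "laser", 0.3),
        Percept("single", "camera", 0.6),
    ],
    processors=[aggregate_obstacle],
)
# The bias dictionary stands in for the attention signal sent by the core layer.
winner = ws.step(bias={"laser": 0.2})
```

In this toy run the bias raises the laser percept, the aggregator combines it with the sonar percept, and the resulting complex percept outcompetes the camera percept.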
7. CHREST
This section is based on the contribution of Fernand Gobet and Peter Lane to the Comparative Table of Cognitive Architectures.1
7.1. Overview
Knowledge and experiences are represented in CHREST using chunks and productions. Main components of the architecture include (Figure 6): Attention, Sensory Memory, Short-Term Memory, and Long-Term Memory. CHREST was originally introduced in [22]. Other key references include [23]. Most recent representative publications include [24]. CHREST was implemented and tested / studied experimentally using Lisp and Java.
Figure 6. A bird’s eye view of CHREST.
7.2. Support for Common Components and Features
The framework of CHREST supports the following features and components that are common for many cognitive architectures: working memory (called short-term memory; auditory short-term memory and visuo-spatial short-term memory are implemented), semantic memory (implemented by the network of chunks in long-term memory), episodic memory (“episodic” links are used in some simulations), procedural memory (implemented by productions), iconic memory, perceptual memory, attention control and consciousness. Attention plays an important role in the architecture, as it determines, for example, the next eye fixation and what will be learnt. Sensory, motor and other specific modalities include visual input (coded as a list of structures or arrays), auditory input (coded as spoken text segmented either as words, phonemes, or syllables), other natural language communications, etc.
7.3. Learning, Goal and Value Generation, and Cognitive Development
The architecture supports learning of new representations. Chunks and templates (schemata) are automatically and autonomously created as a function of the interaction of the input and the previous state of knowledge.
7.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
The main general paradigm of modeling studies with CHREST is learning. This includes implicit learning, verbal learning, acquisition of first language (syntax, vocabulary),
development of expertise, memory formation, concept formation. Specifically, the following paradigms were modeled: problem solving (CHREST solves problems mostly by pattern recognition), decision making, language processing (only acquisition of language), implicit memory tasks. In addition, the following specific paradigms were modeled with this architecture: visual perception with comprehension, spatial exploration, learning and navigation, object/feature search in an environment. CHREST can learn from arbitrarily large databases. E.g., simulations on the acquisition of language have used corpora larger than 350k utterances. Simulations with chess have used databases with more than 10k positions. Discrimination networks with up to 300k chunks have been created. CHREST can adaptively fuse information from multiple types of sensors and modalities.
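Section 7.3 states that chunks are created as a function of the input and the previous state of knowledge, and the paragraph above mentions discrimination networks. The following Python sketch shows one simplified way such a network could grow. It is an illustration under stated assumptions, not CHREST code: the `Node` and `DiscriminationNet` classes and the two-case learning rule (add a test link on a mismatch, otherwise extend the stored chunk by one element) are hypothetical simplifications.

```python
class Node:
    def __init__(self):
        self.image = []     # the chunk stored at this node (may be partial)
        self.children = {}  # test links: next feature -> child node


class DiscriminationNet:
    def __init__(self):
        self.root = Node()

    def sort(self, pattern):
        """Follow test links as far as the pattern allows; return (node, depth)."""
        node, depth = self.root, 0
        while depth < len(pattern) and pattern[depth] in node.children:
            node = node.children[pattern[depth]]
            depth += 1
        return node, depth

    def learn(self, pattern):
        """One learning step whose outcome depends on what is already known."""
        node, depth = self.sort(pattern)
        matches = node.image == list(pattern[:len(node.image)])
        if node is self.root or not matches:
            # Discrimination: the stored chunk conflicts with the input,
            # so add a new test link for the next unsorted feature.
            if depth < len(pattern):
                node.children[pattern[depth]] = Node()
        elif len(node.image) < len(pattern):
            # Familiarisation: extend the stored chunk by one element.
            node.image.append(pattern[len(node.image)])


net = DiscriminationNet()
for _ in range(3):
    net.learn("ab")   # creates a node, then fills in its chunk step by step
for _ in range(3):
    net.learn("ac")   # conflicts with the chunk for "ab", so a new node is added
```

The same input therefore has different effects at different times: the first presentation of a pattern adds a node, while later presentations refine the chunk stored there.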
8. CLARION
This section is based on the contribution of Ron Sun to the Comparative Table of Cognitive Architectures.1 CLARION is based on a four-way division of implicit versus explicit knowledge and procedural versus declarative knowledge. In addition, CLARION addresses both top-down learning (from explicit to implicit knowledge) and bottom-up learning (from implicit to explicit knowledge). CLARION also addresses motivational and metacognitive processes underlying human cognition.
8.1. Overview
Knowledge and experiences are represented in CLARION within an implicit-explicit dichotomy: using chunks and rules (for explicit knowledge), and neural networks (for implicit knowledge). Main components of the architecture include (Figure 7): the action-centered subsystem (for procedural knowledge) and the non-action-centered subsystem (for declarative knowledge), each in both implicit and explicit forms (represented by rules and neural networks). In addition, there are the motivational subsystem and the metacognitive subsystem. CLARION was originally introduced by Sun et al. in 1996 [25]. For a general overview, see [26]. For more in-depth explanations, see [27-29]. Most recent representative publications can be found at: http://www.cogsci.rpi.edu/~rsun/clarion.html. CLARION was implemented and tested / studied experimentally using Java. Funding programs and projects in which the architecture was supported include Office of Naval Research programs, Army Research Institute programs, etc.
8.2. Support for Common Components and Features
The framework of CLARION supports the following features and components that are common for many cognitive architectures but in different (more detailed) ways: semantic, episodic and procedural memory (each in both implicit and explicit forms, based on chunks, rules and neural networks), working memory (a separate structure), a reward system (in the form of a motivational subsystem and a meta-cognitive
subsystem that determines rewards based on the motivational subsystem), as well as other motivational and meta-cognition modules.
Figure 7. A bird’s eye view of CLARION.
Supported cognitive functionality includes skill learning, reasoning, memory, both top-down and bottom-up learning, metacognition, motivation, motivation-cognition interaction, self-awareness, emotion, consciousness, personality types, etc.
8.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in CLARION for implicit learning include: reinforcement learning, Hebbian learning, gradient descent methods (e.g., backpropagation). For explicit learning, CLARION uses hypothesis testing rule learning and bottom-up rule learning (from implicit to explicit knowledge). The architecture supports learning of new representations (new chunks, new rules, new neural network representations) and, furthermore, autonomous learning.
8.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
The following paradigms were modeled with CLARION: skill acquisition, implicit learning, reasoning, problem solving, decision making, working memory tasks, memory tasks, metacognitive tasks, social psychology tasks, personality psychology tasks, motivational dynamics, social simulation, etc. In addition, the following specific paradigms were modeled with this architecture: Tower of Hanoi/London, dual tasks, spatial tasks, learning and navigation, learning from instructions, trial-and-error learning, task switching, etc.
9. CogPrime
This section is based on the contribution of Ben Goertzel to the Comparative Table of Cognitive Architectures.1
Figure 8. A bird’s eye view of CogPrime.
9.1. Overview
From the point of view of knowledge representation, CogPrime is a multi-representational system. The core representation consists of hypergraphs with uncertain logical relationships and associative relations operating together. Procedures are stored as functional programs; episodes are stored in part as “movies” in a simulation engine. There are other specialized methods as well.
Main components of the architecture include the following (Figure 8). The primary knowledge store is the AtomSpace, a neural-symbolic weighted labeled hypergraph with multiple cognitive processes acting on it (in a manner carefully designed to manifest cross-process cognitive synergy), and other specialized knowledge stores indexed by it. The cognitive processes are numerous and include an uncertain inference engine (PLN, Probabilistic Logic Networks), a probabilistic evolutionary program learning engine (MOSES, developed initially by Moshe Looks), an attention allocation algorithm (ECAN, Economic Attention Networks, which is neural-net-like), concept formation and blending heuristics, etc. Work is under way to incorporate a variant of Itamar Arel’s DeSTIN system as a perception and action layer. Motivation and emotion are handled via a variant of Joscha Bach’s MicroPsi framework called CogPsi. Key references include [30,31]. The OpenCogPrime system implements CogPrime within the open-source OpenCog AI framework (see http://opencog.org). The implementation is mostly in C++ for Linux; some components are written in Java. A Scheme shell is used for interfacing with the system.
9.2. Support for Common Components and Features
The framework of CogPrime supports the following features and components that are common for many cognitive architectures: working memory, semantic memory, episodic memory, procedural memory, perceptual memory (currently only for vision), cognitive map, reward system (Value Judgment processes compute cost, benefit, risk), attention control (can focus attention on regions and topics of interest) and consciousness (is aware of self in relation to the environment and other agents).
Sensory, motor and other specific modalities include: visual input (handled via interfacing with external vision processing tools), auditory input (speech, via an external speech-to-text engine), natural language communications, and special modalities (e.g., CogPrime can read text from the Internet).
9.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in CogPrime include: reinforcement learning, Bayesian update, Hebbian learning. The architecture supports learning of new representations (tested only in simple cases).
9.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with CogPrime include control of virtual-world agents and natural language processing. Specifically, the following paradigms were modeled: problem solving, decision making, analogical reasoning, and language processing: comprehension and generation.
10. CoJACK
This section is based on the contribution of Frank Ritter and Rick Evertsz to the Comparative Table of Cognitive Architectures.1
10.1. Overview
Knowledge and experiences are represented in CoJACK using a Beliefs-Desires-Intentions (BDI) architecture that handles events, plans and intentions (procedural memory), beliefsets (declarative memory) and activation levels. Main components of the architecture include (Figure 9): beliefsets for long-term memory, plans, intentions, events, goals. Most recent representative publications include [32]. CoJACK was implemented and tested / studied experimentally as an overlay to JACK®.
Figure 9. CoJACK architecture in the context of a synthetic environment.
10.2. Support for Common Components and Features
The framework of CoJACK supports the following features and components that are common for many cognitive architectures: working memory (active beliefs and intentions), semantic memory (encoded as weighted beliefs in beliefsets, uses ACT-R's declarative memory equations), episodic memory (not explicitly defined, but would be encoded as beliefs in beliefsets), procedural memory (plans and intentions with activation levels), perceptual memory (gets input from the world as events; these events are processed by plans), cognitive map, reward system (uses ACT-R memory equations, so memories and plans get strengthened), attention control and
consciousness (represented by transient events, goal structure, and intentions/beliefs whose activation level is above the threshold). Sensory, motor and other specific modalities include visual input (depends on the particular model).
10.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in CoJACK include reinforcement learning (resulting in strengthening or weakening of plans).
10.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with CoJACK include BDI. Specifically, the following paradigms were modeled: problem solving, decision making, working memory tests. In addition, the following specific paradigms were modeled with this architecture: task switching, Tower of Hanoi/London.
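Section 10.2 notes that CoJACK uses ACT-R's declarative memory equations and that beliefs and intentions count as active when their activation is above a threshold. The text does not say which equations or parameter values are used, so the Python sketch below shows only the standard ACT-R base-level learning equation, B = ln(sum over past uses of (now - t_j)^(-d)), with the conventional decay d = 0.5. The function names and the threshold value of 0.0 are illustrative assumptions, and the spreading-activation and noise terms of the full ACT-R activation are omitted.

```python
import math


def base_level_activation(use_times, now, decay=0.5):
    """ACT-R base-level learning: B = ln(sum_j (now - t_j) ** -decay)."""
    return math.log(sum((now - t) ** -decay for t in use_times))


def is_active(use_times, now, threshold=0.0, decay=0.5):
    """Hypothetical helper: an item counts as active when B exceeds the threshold."""
    return base_level_activation(use_times, now, decay) > threshold


# A belief used once, 4 time units ago, versus one used again 1 unit ago.
old_only = base_level_activation([0.0], now=4.0)          # ln(0.5)
rehearsed = base_level_activation([0.0, 3.0], now=4.0)    # ln(1.5)
```

Each additional or more recent use raises the activation, which is the sense in which memories "get strengthened"; without further use the activation decays over time.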
11. Disciple
This section is based on the contribution of George Tecuci to the Comparative Table of Cognitive Architectures.1 Disciple is a general agent shell for building cognitive assistants that can learn subject matter expertise from their users, can assist them in solving complex problems in uncertain and dynamic environments, and can tutor students in expert problem solving.
11.1. Overview
As shown in the bottom part of Figure 10, the Disciple shell includes general modules for user-agent interaction, ontology representation, problem solving, learning, and tutoring, as well as domain-independent knowledge (e.g., knowledge for evidence-based reasoning). The problem solving engine of a Disciple cognitive assistant (see the top part of Figure 10) employs a general divide-and-conquer strategy, where complex problems are successively reduced to simpler problems, and the solutions of the simpler problems are successively combined into the solutions of the corresponding complex problems. To exhibit this type of behavior, the knowledge base of the agent contains a hierarchy of ontologies, as well as problem reduction rules and solution synthesis rules which are expressed with the concepts from the ontologies. The most representative early paper on Disciple is [33]. Other key references include [34-37]. Most recent representative publications include [38,39]. Disciple was initially implemented in Lisp and is currently implemented in Java.
11.2. Support for Common Components and Features
The framework of Disciple supports features and components that are common for many cognitive architectures, including working memory (reasoning trees), semantic
memory (ontologies), episodic memory (reasoning examples), and procedural memory (rules). Communication is based on natural language patterns learned from the user.
11.3. Learning, Goal and Value Generation, and Cognitive Development
An expert interacts directly with a Disciple cognitive assistant, to teach it to solve problems in a way that is similar to how the expert would teach a less experienced collaborator. This process is based on mixed-initiative problem solving (where the expert solves the more creative parts of a problem and the agent solves the more routine ones), integrated learning and teaching (where the expert helps the agent to learn by providing examples, hints and explanations, and the agent helps the expert to teach it by asking relevant questions), and multistrategy learning (where the agent integrates complementary learning strategies, such as learning from examples, learning from explanations, and learning by analogy, to learn general concepts and rules).
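The divide-and-conquer strategy described in Section 11.1 (reduce a complex problem to simpler ones, then combine their solutions) can be sketched in a few lines of Python. This is a toy illustration, not Disciple's engine: the problem names, the numeric assessments and the use of `min` and `max` as synthesis functions are hypothetical, and real Disciple rules are expressed with ontology concepts and learned from the expert.

```python
def solve(problem, reductions, syntheses, elementary):
    """Reduce a problem to simpler problems, solve them, and combine the solutions."""
    if problem in elementary:
        return elementary[problem]
    sub_solutions = [
        solve(sub, reductions, syntheses, elementary)
        for sub in reductions[problem]
    ]
    return syntheses[problem](sub_solutions)


# Hypothetical reduction rules: problem -> simpler problems.
reductions = {
    "assess hypothesis": ["assess evidence A", "assess evidence B"],
    "assess evidence B": ["assess relevance of B", "assess believability of B"],
}
# Hypothetical synthesis rules: problem -> how to combine sub-solutions.
syntheses = {
    "assess hypothesis": max,   # the strongest item of evidence decides
    "assess evidence B": min,   # evidence is only as good as its weakest credential
}
# Problems simple enough to be answered directly.
elementary = {
    "assess evidence A": 0.3,
    "assess relevance of B": 0.9,
    "assess believability of B": 0.6,
}

result = solve("assess hypothesis", reductions, syntheses, elementary)
```

The recursion mirrors a reasoning tree: reductions build the tree top-down, and syntheses propagate solutions bottom-up.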
Figure 10. Disciple shell and Disciple cognitive assistants.
11.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Disciple agents have been developed for a wide variety of domains, including manufacturing [33], education [34], course of action critiquing [35], center of gravity determination [36,37], and intelligence analysis [38]. The most recent Disciple agents incorporate a significant amount of generic knowledge from the Science of Evidence, allowing them to teach and help their users in discovering and evaluating evidence and
hypotheses, through the development of Wigmorean probabilistic inference networks that link evidence to hypotheses in argumentation structures that establish the relevance, believability and inferential force of evidence [39]. Disciple agents are used in courses at various institutions, including US Army War College, Joint Forces Staff College, Air War College, and George Mason University.
12. EPIC
This section is based on the contribution of Shane Mueller and Andrea Stocco to the Comparative Table of Cognitive Architectures.1
Figure 11. A bird’s eye view of EPIC.
12.1. Overview
Knowledge and experiences are represented in EPIC using production rules and working memory entries. Main components of the architecture include (Figure 11): cognitive processor (including production rule interpreter and working memory), long-term memory, production memory, detailed perceptual-motor interfaces (auditory processor, visual processor, ocular motor processor, vocal motor processor, manual motor processor, tactile processor).
EPIC was originally introduced in [40,41]. Other key references include [42]. EPIC (original version) was implemented and tested / studied experimentally using Common Lisp; EPIC-X was implemented in C++. Funding programs and projects in which the architecture was used were supported by the Office of Naval Research.
12.2. Support for Common Components and Features
The framework of EPIC supports the following features and components that are common for many cognitive architectures: working memory, procedural memory (production rules), iconic memory (part of visual perceptual processor), perceptual memory. Sensory, motor and other specific modalities include visual input, auditory input, and motor output (including interaction of visual motor and visual perceptual processors, auditory and speech objects, and spatialized auditory information).
12.3. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with EPIC include simulation of human performance, multi-tasking, the PRP procedure, air traffic, verbal working memory tasks, visual working memory tasks. In addition, the following specific paradigms were modeled with this architecture: task switching, dual task, N-Back, visual perception with comprehension.
13. FORR
This section is based on the contribution of Susan L. Epstein to the Comparative Table of Cognitive Architectures.1 FORR (FOr the Right Reasons) is highly modular. It includes a declarative memory for facts and a procedural memory represented as a hierarchy of decision-making rationales that propose and rate alternative actions. FORR matches perceptions and facts to heuristics, and processes action preferences through its hierarchical structure, along with its heuristics’ real-valued weights. Execution affects the environment or changes declarative memory. Learning in FORR creates new facts and new heuristics, adjusts the weights, and restructures the hierarchy based on facts and on metaheuristics for accuracy, utility, risk, and speed.
13.1. Overview
Knowledge and experiences are represented in FORR using Descriptives (shared knowledge resources computed on demand and refreshed only when necessary), Advisors (domain-dependent decision rationales for actions), and Measurements (synopses of problem solving experiences). Main components of the architecture can be characterized as follows (Figure 12). Advisors are organized into three tiers. Tier-1 Advisors are fast and correct, recommend individual actions, and are consulted in a pre-specified order. Tier-2 Advisors trigger in the presence of a recognized situation, recommend (possibly partially ordered) sets of actions, and are consulted in a pre-specified order. Tier-3 Advisors are heuristics, recommend individual actions, and are
consulted together. Tier-3 Advisors’ opinions express preference strengths that are combined with weights during voting to select an action. A FORR-based system learns those weights from traces of its problem-solving behavior.
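The three-tier consultation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not FORR code: the `decide` function, the toy Advisors, their preference strengths and the weight values are hypothetical, a triggered Tier-2 Advisor's set of actions is simply returned as a list, and the learning of weights is not shown.

```python
def decide(state, actions, tier1, tier2, tier3, weights):
    """Return the recommended action(s) as a list, consulting the tiers in order."""
    # Tier 1: consulted in a fixed order; the first definite recommendation wins.
    for advisor in tier1:
        action = advisor(state, actions)
        if action is not None:
            return [action]
    # Tier 2: consulted in a fixed order; a triggered Advisor proposes a set of actions.
    for advisor in tier2:
        plan = advisor(state, actions)
        if plan:
            return list(plan)
    # Tier 3: consulted together; preference strengths are combined with weights.
    totals = {action: 0.0 for action in actions}
    for name, advisor in tier3.items():
        for action, strength in advisor(state, actions).items():
            totals[action] += weights[name] * strength
    return [max(totals, key=totals.get)]


# Hypothetical Advisors for a toy domain with three actions.
def take_win(state, actions):
    return state.get("winning_move")      # None unless a sure move is known


def follow_route(state, actions):
    return state.get("known_route", [])   # empty unless a situation is recognized


tier3 = {
    "prefer_right": lambda state, actions: {"right": 1.0, "stay": 0.5},
    "cautious": lambda state, actions: {"stay": 1.0},
}
weights = {"prefer_right": 0.4, "cautious": 0.9}
actions = ["left", "right", "stay"]

voted = decide({}, actions, [take_win], [follow_route], tier3, weights)
forced = decide({"winning_move": "left"}, actions, [take_win], [follow_route], tier3, weights)
planned = decide({"known_route": ["right", "right"]}, actions, [take_win], [follow_route], tier3, weights)
```

With no Tier-1 or Tier-2 recommendation, the weighted vote favors "stay" (0.4 × 0.5 + 0.9 × 1.0 = 1.1) over "right" (0.4); a known winning move or a recognized situation short-circuits the vote.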
Figure 12. A bird’s eye view of FORR (reproduced with permission from [43]).
FORR was originally introduced in [44]. Other key references include [45]. Most recent representative publications include [43]. FORR was implemented and tested / studied experimentally using Common Lisp and Java. Funding programs, projects and environments in which the architecture was used were supported by the National Science Foundation.
13.2. Support for Common Components and Features
The framework of FORR supports the following features and components that are common for many cognitive architectures: working memory, semantic memory (in many Descriptives), episodic memory (in task history and summary measurements), procedural memory, iconic memory (in some Descriptives), perceptual memory (Advisors can be weighted by problem progress, and repeated sequences of actions can be learned and stored), reward system (Advisor weights are acquired during self-supervised learning), attention control and consciousness (some Advisors attend to specific problem features). Sensory, motor and other specific modalities include: auditory input (FORRSooth is an extended version that conducts human-computer dialogues in real time), natural language communications, etc. Supported cognitive functionality includes metacognition, emotional intelligence and elements of personality (personality and emotion can be modeled through Advisors).
13.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in FORR include: reinforcement learning, Bayesian update, Hebbian learning (with respect to groupings of Tier-3 Advisors). The architecture supports learning of new representations (can learn new Advisors).
13.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with FORR include constraint solving, game playing, robot navigation, and spoken dialogue. Specifically, the following
paradigms were modeled: problem solving, decision making, analogical reasoning (via pattern matching), language processing, metacognitive tasks. In addition, the following specific paradigms were modeled with this architecture: spatial exploration, learning and navigation, learning from instructions.
14. GLAIR
This section is based on the contribution of Stuart C. Shapiro to the Comparative Table of Cognitive Architectures.1
Figure 13. A bird’s eye view of GLAIR.
14.1. Overview
Knowledge and experiences are represented in GLAIR using SNePS, simultaneously a logic-based, assertional frame-based, and propositional graph-based representation. Main components of the architecture include (Figure 13): (1) Knowledge Layer (KL) containing Semantic Memory, Episodic Memory, Quantified and conditional beliefs, Plans for non-primitive acts, Plans to achieve goals, Beliefs about preconditions and effects of acts, Policies (Conditions for performing acts), Self-knowledge, Meta-knowledge; (2) Perceptuo-Motor Layer (PML) containing implementations of primitive actions, perceptual structures that ground KL symbols, deictic and modality registers; (3) Sensori-Actuator Layer containing sensor and effector controllers.
GLAIR was introduced in [46]; most recent representative publications include [47]. GLAIR was implemented and tested / studied experimentally using Common Lisp.
14.2. Support for Common Components and Features
The framework of GLAIR supports the following features and components that are common for many cognitive architectures: semantic memory (in SNePS), episodic memory (temporally-related beliefs in SNePS), procedural memory (in PML implemented in Lisp, could be compiled from KL). Sensory, motor and other specific modalities include: visual input (perceptual structures in PML), auditory input (has been done using off-the-shelf speech recognition), special modalities (agents have used speech and navigation).
14.3. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with GLAIR include reasoning, belief change, reasoning without a goal. Specifically, the following paradigms were modeled: analogical reasoning, language processing, learning from instructions.
15. GMU BICA
This section is based on the contribution of Alexei Samsonovich to the Comparative Table of Cognitive Architectures.1 GMU BICA is a Biologically Inspired Cognitive Architecture developed at George Mason University.
15.1. Overview
Knowledge and experiences are represented in GMU BICA using schemas and mental states. Main components of the architecture (Figure 14, left) include five memory systems: working, semantic, episodic, procedural, iconic (Input/Output); plus cognitive map, reward system, and driving engine. The distinguishing feature of GMU BICA is the dynamic system of mental states (Figure 14, right: [48]) that naturally enables various forms of metacognition [49].
Figure 14. Left: A bird’s eye view of GMU BICA. Right: a typical snapshot of working memory. Shaded boxes represent mental states. Their number, labels, contents and relations are dynamical variables.
GMU BICA was originally introduced in [50]. Other key references include [48,49,51]. GMU BICA was implemented and tested / studied experimentally at GMU using Matlab, Python, Lisp, Java, and various visualization tools. Funding for programs, projects and environments in which the architecture was used came from DARPA and the GMU Center for Consciousness and Transformation.
15.2. Support for Common Components and Features
The framework of GMU BICA supports the following features and components that are common for many cognitive architectures: working memory (includes active mental states), semantic memory (includes schemas), episodic memory (includes inactive mental states aggregated into episodes), procedural memory (includes primitives), iconic memory (input-output buffer), cognitive map, reward system, attention control (the distribution of attention is described by a special attribute in instances of schemas) and consciousness (can be identified with the content of the mental state I-Now). Sensory, motor and other specific modalities include: visual input (symbolic), auditory input (textual commands), motor output (high-level), imagery (mental states I-Imagined), spatial cognition (spatial learning and pathfinding), etc. Supported cognitive functionality includes metacognition (e.g., via mental states I-Meta), self-awareness (a mental state can be self-reflective and can reflect / operate on other mental states), self-regulated learning [48], etc.
15.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in GMU BICA include reinforcement learning (specifically, a version of temporal difference learning). In addition, the architecture supports emergence and learning of new representations (schemas and mental states) from own experience, from imagery, from observation of examples, from instruction, from guided/scaffolded task execution, and from an interactive dialogue with an instructor.
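Section 15.3 says only that GMU BICA uses "a version of" temporal difference learning, so the variant actually used is not specified here. As a point of reference, the Python sketch below shows the textbook tabular TD(0) update, V(s) ← V(s) + α[r + γV(s') − V(s)]. The function name, the state labels, the reward and the values of α and γ are illustrative assumptions.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular TD(0): move V(state) toward reward + gamma * V(next_state)."""
    td_error = reward + gamma * values.get(next_state, 0.0) - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


# Hypothetical corridor A -> B -> goal, with reward 1.0 on reaching the goal.
values = {}
first_error = td0_update(values, "B", reward=1.0, next_state="goal")
second_error = td0_update(values, "A", reward=0.0, next_state="B")
```

After these two updates the value of B is 0.1 and the value of A is 0.009: the estimate of the rewarded state is raised first, and that estimate then propagates backward to the state that precedes it.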
15.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with GMU BICA include voluntary perception, cognition and action. Specifically, the following paradigms were modeled with this architecture: problem solving, decision making (in navigation), visual perception with comprehension, perceptual illusions (Necker cube), metacognitive tasks (in navigation), spatial exploration, learning and navigation, object/feature search in an environment. Self-regulated learning, learning from instructions and pretend-play paradigms were studied at a level of meta-simulation.
16. HTM
This section is based on the contribution of Jeff Hawkins (with the assistance of Donna Dubinsky) to the Comparative Table of Cognitive Architectures.1 HTM stands for Hierarchical Temporal Memory.
16.1. Overview
Knowledge and experiences are represented in HTM using sparse distributed representations. Main components: HTM is a biologically constrained model of neocortex and thalamus. HTM models cortex related to sensory perception, learning to infer and predict from high dimensional sensory data. The model starts with a hierarchy of memory nodes. Each node learns to pool spatial patterns using temporal contiguity (using variable order sequences if appropriate) as the teacher. HTMs are inherently modality independent. Biologically, the model maps to cortical regions, layers of cells, columns of cells across the layers, inhibitory cells, and non-linear dendrite properties. All representations are large sparse distributions of cell activities. HTM was originally introduced in [52]. Other key references are listed at www.numenta.com. Most recent representative publications include [53] (see also http://en.wikipedia.org/wiki/Hierarchical_temporal_memory). HTM was implemented and tested / studied experimentally using the NuPIC development environment available for PC and Mac. It is available under a free research license and a paid commercial license from Numenta.
16.2. Support for Common Components and Features
The framework of HTM supports the following features and components that are common for many cognitive architectures: semantic memory (semantic meaning can be encoded in sparse distributed representations), attention control and consciousness (modeled covert attentional mechanisms within HTMs). HTMs have been applied to vision, audition, network sensors, power systems, and other tasks.
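The statement that semantic meaning can be encoded in sparse distributed representations can be made concrete with a small sketch. It represents each pattern as the set of its active bit indices and uses the overlap count as a similarity measure. The sizes (2048 bits, 40 active), the helper names, and the noise model are assumptions chosen for illustration; this is not NuPIC code.

```python
import random

# Sparse distributed representations as sets of active bit indices.
# Sizes and helper names are illustrative assumptions, not the NuPIC API.
SIZE = 2048    # total number of bits
ACTIVE = 40    # number of active bits (about 2% sparsity)


def random_sdr(rng):
    """Return a random SDR as a frozenset of active bit indices."""
    return frozenset(rng.sample(range(SIZE), ACTIVE))


def overlap(a, b):
    """Similarity of two SDRs: the number of shared active bits."""
    return len(a & b)


def add_noise(sdr, flips, rng):
    """Move `flips` active bits to randomly chosen inactive positions."""
    kept = rng.sample(sorted(sdr), len(sdr) - flips)
    inactive = [i for i in range(SIZE) if i not in sdr]
    return frozenset(kept) | frozenset(rng.sample(inactive, flips))


rng = random.Random(0)
original = random_sdr(rng)
noisy = add_noise(original, 10, rng)
unrelated = random_sdr(rng)

print("overlap with noisy copy:", overlap(original, noisy))
print("overlap with unrelated pattern:", overlap(original, unrelated))
```

A corrupted copy keeps most of its active bits in common with the original, while two unrelated random patterns share very few, which is why overlap can serve as a noise-tolerant similarity measure.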
16.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in HTM include: Bayesian update (HTM hierarchies can be understood in a belief propagation/Bayesian framework), Hebbian learning, learning of new representations.
16.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Specifically, the paradigms of visual perception with comprehension were modeled using this architecture.
17. Leabra
This section is based on the contribution of David C. Noelle to the Comparative Table of Cognitive Architectures.1
17.1. Overview
Knowledge and experiences are represented in Leabra using patterns of neural firing rates and patterns of synaptic strengths. Sensory events drive patterns of neural
activation, and such activation-based representations may drive further processing and the production of actions. Knowledge that is retained for long periods is encoded in patterns of synaptic connections, with synaptic strengths determining the activation patterns that arise when knowledge or previous experiences are to be employed. Main components: at the level of gross functional anatomy, most Leabra models employ a tripartite view of brain organization. The brain is coarsely divided into prefrontal cortex, the hippocampus and associated medial-temporal areas, and the rest of cortex – "posterior" areas. Prefrontal cortex provides mechanisms for the flexible retention and manipulation of activation-based representations, playing an important role in working memory and cognitive control. The hippocampus supports the rapid weight-based learning of sparse conjunctive representations, providing central mechanisms for episodic memory. The posterior cortex mostly utilizes slow statistical learning to shape more automatized cognitive processes, including sensory-motor coordination, semantic memory, and the bulk of language processing. At a finer level of detail, other components regularly appear in Leabra-based models. Activation-based processing depends on attractor dynamics utilizing bidirectional excitation between brain regions. Fast pooled lateral inhibition plays a critical role in shaping neural representations. Learning arises from an associational Hebbian component, a biologically plausible error-driven learning component, and a reinforcement learning mechanism dependent on the brain's dopamine system. Leabra was originally introduced in [54]. Most recent representative publications include [55,56]. Leabra was implemented and tested / studied experimentally using Emergent: open-source software written largely in C++.
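The role of pooled lateral inhibition in shaping sparse representations can be illustrated with a simplified k-winners-take-all rule, in which a single inhibition level shared by the layer is placed between the k-th and (k+1)-th strongest inputs. The function name `kwta`, the placement parameter `q`, and the linear activation are assumptions made for this sketch; it is a stand-in for the idea and does not reproduce the equations used in Leabra or Emergent.

```python
# Simplified k-winners-take-all inhibition for one layer of units.
# A stand-in illustration of pooled lateral inhibition; the names and the
# linear activation function are assumptions, not Leabra's actual equations.

def kwta(net_inputs, k, q=0.25):
    """Return activations in which at most k units remain active.

    One shared inhibition level is set a fraction q of the way from the
    (k+1)-th strongest input up to the k-th strongest input. Units whose
    input does not exceed that level are silenced, so tied inputs at the
    boundary can leave fewer than k units active.
    """
    ordered = sorted(net_inputs, reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must be between 1 and len(net_inputs) - 1")
    inhibition = ordered[k] + q * (ordered[k - 1] - ordered[k])
    return [max(0.0, x - inhibition) for x in net_inputs]


activations = kwta([0.2, 0.9, 0.5, 0.7, 0.1], k=2)
print(activations)
```

Because the inhibition level depends on the whole pool of inputs, raising the drive to one unit can silence another, which gives the competitive behavior that the text attributes to lateral inhibition.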
17.2. Support for Common Components and Features
The framework of Leabra supports the following features and components that are common for many cognitive architectures: working memory (many Leabra working memory models have been published, mostly focusing on the role of the prefrontal cortex in working memory), semantic memory (many Leabra models of the learning and use of semantic knowledge, abstracted from the statistical regularities over many experiences, have been published, including some language models), episodic memory (many Leabra models of episodic memory have been published, mostly focusing on the role of the hippocampus in episodic memory), and procedural memory: a fair number of Leabra models of automatized sequential action have been produced, with a smaller number specifically addressing issues of motor control. Most of these models explore the shaping of distributed patterns of synaptic strengths in posterior brain areas in order to produce appropriate action sequences in novel situations. Some work on motor skill automaticity has been done. A few models, integrating prefrontal and posterior areas, have focused on the application of explicitly provided rules. Leabra supports cognitive map: Leabra contains the mechanisms necessary to self-organize topographic representations. These have been used to model map-like encodings in the visual system. At this time, it is not clear that these mechanisms have been successfully applied to spatial representation schemes in the hippocampus. Leabra supports reward system: Leabra embraces a few alternative models of the reward-based learning systems dependent on the mesolimbic dopamine systems, including a neural implementation of temporal difference (TD) learning, and, more recently, the PVLV algorithm. Models have been published involving these mechanisms, as well as interactions between dopamine, the amygdala, and both lateral and orbital areas of
prefrontal cortex. Leabra supports iconic memory: In Leabra, iconic memory can result from activation-based attractor dynamics or from small, sometimes transient, changes in synaptic strength, including mechanisms of synaptic depression. Imagery naturally arises from patterns of bidirectional excitation, allowing for top-down influences on sensory areas. Little work has been done, however, in evaluating Leabra models of these phenomena against biological data. Leabra supports perceptual memory: different aspects of perceptual memory can be supported by activation-based learning, small changes in synaptic strengths, frontally-mediated working memory processes, and rapid sparse coding in the hippocampus. Leabra supports attention control and consciousness: In Leabra, attention largely follows a biased competition approach, with top-down activity modulating a process that involves lateral inhibition. Lateral inhibition is a core mechanism in Leabra, as is the bidirectional excitation needed for top-down modulation. Models of spatial attention have been published, including models that use both covert shifts in attention and eye movements in order to improve object recognition and localization. Published models of the role of prefrontal cortex in cognitive control generally involve an attention-like mechanism that allows frontally maintained rules to modulate posterior processing. Virtually no work has been done on consciousness in the Leabra framework, though there is some work currently being done on porting the Mathis and Mozer account of visual awareness into Leabra. Sensory, motor and other specific modalities include: visual input (an advanced Leabra model of visual object recognition has been produced which receives photographic images as input), auditory input (while a few exploratory Leabra models have taken low-level acoustic features as input, this modality has not yet been extensively explored), spatial cognition, etc.
17.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in Leabra include reinforcement learning, Bayesian update, Hebbian learning, and learning of new representations. Reinforcement learning: Leabra embraces a few alternative models of the reward-based learning systems dependent on the mesolimbic dopamine systems, including a neural implementation of temporal difference (TD) learning, and, more recently, the PVLV algorithm. Models have been published involving these mechanisms, as well as interactions between dopamine, the amygdala, and both lateral and orbital areas of prefrontal cortex. Bayesian Update: While Leabra does not include a mechanism for updating knowledge in a Bayes-optimal fashion based on singular experiences, its error-driven learning mechanism does approximate maximum a posteriori outputs given sufficient iterated learning. Hebbian Learning: An associational learning rule, similar to traditional Hebbian learning, is one of the core learning mechanisms in Leabra. Gradient Descent Methods (e.g., Backpropagation): A biologically plausible error-correction learning mechanism, similar in performance to the generalized delta rule but dependent upon bidirectional excitation to communicate error information, is one of the core learning mechanisms in Leabra. Learning of new representations: All active representations in Leabra are, at their core, patterns of neural firing rates.
17.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with Leabra include learning. Specifically, the following paradigms were modeled: decision making (much work has been done on
Leabra modeling of human decision making in cases of varying reward and probabilistic effects of actions, focusing on the roles of the dopamine system, the norepinephrine system, the amygdala, and orbito-frontal cortex), analogical reasoning (some preliminary work has been done on using dense distributed representations in Leabra to perform analogical mapping), language processing (many Leabra language models have been produced, focusing on both word level and sentence level effects), working memory tests (Leabra models of the prefrontal cortex have explored a variety of working memory phenomena). In addition, the following specific paradigms were modeled with this architecture: Stroop task, task switching, N-Back, visual perception with comprehension (a powerful object recognition model has been constructed), spatial exploration, learning and navigation, object/feature search in an environment (object localization naturally arises in the object recognition model), learning from instructions (some preliminary work on instruction following, particularly in the domain of classification instructions, has been done in Leabra).
18. LIDA
This section is based on the contribution of Stan Franklin to the Comparative Table of Cognitive Architectures.1
18.1. Overview
Knowledge and experiences are represented in LIDA as follows: perceptual knowledge – using nodes and links in a Slipnet-like net with sensory data of various types attached to nodes; episodic knowledge – using Boolean vectors (Sparse Distributed Memory); procedural knowledge – using schemes a la Schema Mechanism. Cognitive cycle (action-perception cycle) in LIDA acts as a cognitive atom. Higher level, multi-cyclic processes are implemented as behavior streams. Main components involved in the cognitive cycle (Figure 15) include sensory memory, perceptual associative memory, workspace, transient episodic memory, declarative memory, global workspace, procedural memory, action selection, sensory motor memory. LIDA was originally introduced in [57]. Other key references include [58]. Most recent representative publications include [59].
18.2. Support for Common Components and Features
The framework of LIDA supports the following features and components that are common for many cognitive architectures: working memory (explicitly included as a workspace with significant internal structure including a current situational model with both real and virtual windows, and a conscious contents queue), semantic memory (implemented automatically as part of declarative memory via sparse distributed memory), episodic memory (both declarative memory and transient episodic memory encoded via sparse distributed memory), procedural memory (schemas a la Drescher), iconic memory (defined using various processed pixel matrices), perceptual memory (semantic net with activation passing; nodes may have sensory data of various sorts attached), reward system (feeling and emotion nodes in perceptual associative memory), attention control and consciousness (implemented a la Global Workspace Theory [19]
with global broadcasts recruiting possible actions in response to the current contents and, also, modulating the various forms of learning).
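Sections 18.1 and 18.2 state that episodic content is stored as Boolean vectors in a sparse distributed memory. The sketch below shows one generic, Kanerva-style form of such a memory: a write updates counters at every hard location within a Hamming radius of the address, and a read sums those counters and thresholds them. The class name, the sizes, and the radius are assumptions chosen so that the example runs quickly; this is not the LIDA implementation.

```python
import random

# Minimal Kanerva-style sparse distributed memory over Boolean vectors.
# Generic illustration with assumed sizes; not the LIDA implementation.


class SparseDistributedMemory:
    def __init__(self, n_locations=2000, dim=256, radius=115, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.radius = radius
        # Fixed random addresses of the hard locations.
        self.addresses = [[rng.randint(0, 1) for _ in range(dim)]
                          for _ in range(n_locations)]
        # One signed counter per bit at each hard location.
        self.counters = [[0] * dim for _ in range(n_locations)]

    def _active(self, address):
        """Indices of hard locations within the Hamming radius of address."""
        return [i for i, hard in enumerate(self.addresses)
                if sum(a != b for a, b in zip(hard, address)) <= self.radius]

    def write(self, address, data):
        """Add data to the counters of every location activated by address."""
        for i in self._active(address):
            row = self.counters[i]
            for j, bit in enumerate(data):
                row[j] += 1 if bit else -1

    def read(self, address):
        """Sum the counters of activated locations and threshold at zero."""
        sums = [0] * self.dim
        for i in self._active(address):
            for j, count in enumerate(self.counters[i]):
                sums[j] += count
        return [1 if s > 0 else 0 for s in sums]


rng = random.Random(1)
memory = SparseDistributedMemory()
first = [rng.randint(0, 1) for _ in range(memory.dim)]
second = [rng.randint(0, 1) for _ in range(memory.dim)]
memory.write(first, first)     # autoassociative storage
memory.write(second, second)

cue = list(first)
for i in rng.sample(range(memory.dim), 20):
    cue[i] ^= 1                # corrupt 20 of the 256 bits

recalled = memory.read(cue)
print("recalled the stored pattern:", recalled == first)
```

Because a corrupted cue still activates many of the locations that were written with the original pattern, reading from a partial or noisy cue tends to return the stored vector, which is the property that makes this kind of memory suitable for cued episodic recall.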
Figure 15. Main architectural components and the cognitive cycle of LIDA. Labels in the diagram include: Internal Stimulus, External Stimulus, Environment, Sensory Memory, Perceptual Associative Memory, Workspace, Transient Episodic Memory, Declarative Memory, Global Workspace, Procedural Memory, Action Selection, and Motor Memory, connected by processes such as Cue, Local Associations, Consolidation, Conscious Broadcast, Instantiate schemes, Action Selected, and Action Executed, and by episodic, attentional, and procedural learning.
18.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in LIDA include reinforcement learning and learning of new representations (for perception, episodic and procedural memories via base-level activation).
18.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with LIDA include Global Workspace Theory paradigms [19]. Specifically, the following paradigms were modeled: problem solving, decision making, language processing, working memory tests.
19. NARS
This section is based on the contribution of Pei Wang to the Comparative Table of Cognitive Architectures.1 Though NARS can be considered as a cognitive architecture in a broad sense, it is very different from the other systems. Theoretically, NARS is a normative theory and model of intelligence and cognition as adaptation with insufficient knowledge and
resources, rather than a direct simulation of human cognitive behaviors, capabilities, or functions; technically, NARS uses a unified reasoning mechanism on a unified memory for learning, problem-solving, etc., rather than integrating different techniques in an architecture. Therefore, strictly speaking, it does not pursue the same goal as many other cognitive architectures, though it is still related to them in various aspects.
19.1. Overview
Knowledge and experiences are represented in NARS using beliefs, tasks, and concepts. Main components of the architecture include an inference engine and an integrated memory and control mechanism (Figure 16). The cognitive cycle can be described as follows (see labels in Figure 16). 1: Input tasks are added into the task buffer. 2: Selected tasks are inserted into the memory. 3: Inserted tasks in memory may also produce beliefs and concepts, as well as change existing ones. 4: In each working cycle, a task and a belief are selected from a concept, and fed to the inference engine as premises. 5: The conclusions derived from the premises by applicable rules are added into the buffer as derived tasks. 6: Selected derived tasks are reported as output tasks.
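The six steps of the cycle above can be traced in a toy program. In this sketch a statement is a hypothetical (subject, predicate) pair read as "subject is a kind of predicate", concepts are indexed by term, and the only inference rule is a transitive deduction. Truth values, priorities, and any resource-aware selection are omitted, and every derived task is reported as output; all class and function names are assumptions made for illustration and do not correspond to the NARS code base.

```python
from collections import defaultdict, deque

# Toy walk-through of the six-step working cycle described in the text.
# All names are illustrative assumptions; truth values and priority-based
# selection are left out, so this is not the NARS implementation.


def deduce(first, second):
    """From (A, B) and (B, C) derive (A, C); otherwise return None."""
    if first[1] == second[0] and first[0] != second[1]:
        return (first[0], second[1])
    return None


class ToyReasoner:
    def __init__(self):
        self.buffer = deque()              # task buffer
        self.concepts = defaultdict(set)   # term -> beliefs mentioning it
        self.output = []                   # reported tasks

    def input_task(self, statement):
        self.buffer.append(statement)                  # step 1

    def cycle(self):
        if not self.buffer:
            return
        task = self.buffer.popleft()                   # step 2 (FIFO here)
        subject, predicate = task
        already_known = task in self.concepts[subject]
        self.concepts[subject].add(task)               # step 3
        self.concepts[predicate].add(task)
        if already_known:
            return
        beliefs = list(self.concepts[predicate]) + list(self.concepts[subject])
        for belief in beliefs:                         # step 4
            conclusion = deduce(task, belief) or deduce(belief, task)
            if conclusion is None:
                continue
            known = conclusion in self.concepts[conclusion[0]]
            if not known and conclusion not in self.buffer:
                self.buffer.append(conclusion)         # step 5
                self.output.append(conclusion)         # step 6


reasoner = ToyReasoner()
reasoner.input_task(("robin", "bird"))
reasoner.input_task(("bird", "animal"))
for _ in range(10):
    reasoner.cycle()
print(reasoner.output)
```

The derived statement re-enters the buffer and is processed by a later cycle like any input task, which mirrors the description of conclusions being added to the buffer as derived tasks.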
Figure 16. Main components and the cognitive cycle of NARS.
For key references, see http://sites.google.com/site/narswang/home. Most recent representative publications include [60]. NARS was implemented as open-source software in Java and Prolog.
19.2. Support for Common Components and Features
The framework of NARS supports the following features and components that are common for many cognitive architectures: working memory (the active part of the memory), semantic memory (the whole memory is semantic), episodic memory (the part of memory that contains temporal information), procedural memory (the part of memory that is directly related to executable operations), cognitive map (as part of the memory), reward system (experience-based and context-sensitive evaluation).
20. NEXTING
This section is based on the contribution of Akshay Vashist and Shoshana Loeb to the Comparative Table of Cognitive Architectures.1
20.1. Overview
Knowledge and experiences are represented in NEXTING using facts, rules, frames (learned/declared), symbolic representation of raw sensory inputs, expectation generation and matching. Main components of the architecture include (Figure 17): Learning, Reasoning, Imagining, Attention Focus, Time awareness, Expectation generation and matching. Most recent representative publications include [61]. NEXTING was implemented using C++, Java, Perl, and Prolog.
Figure 17. A bird’s eye view of NEXTING.
20.2. Support for Common Components and Features
The framework of NEXTING supports the following features and components that are common for many cognitive architectures: working memory, semantic memory (frames), episodic memory (implicit), procedural memory (implicit), attention control (via expectation generation and matching).
20.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in NEXTING include Bayesian update, learning of new representations (capable of representing newly learned knowledge).
20.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with NEXTING include expectation generation and matching via learning and reasoning on stored knowledge and sensory inputs. Specifically, the following paradigms were modeled: decision making (using inference: both statistical and logical), analogical reasoning (in a limited sense), and language processing. In addition, the following specific paradigms were modeled with this architecture: task switching (a necessary feature for nexting), visual perception with comprehension, object/feature search in an environment, learning from instructions.
21. Pogamut
This section is based on the contribution of Cyril Brom to the Comparative Table of Cognitive Architectures.1
21.1. Overview
Knowledge and experiences are represented in Pogamut using rules and operators. Main components of the architecture include (Figure 18): procedural knowledge encoded as reactive rules; episodic and spatial memory encoded within a set of graph-based structures. Key references to Pogamut include [62,63]. The most recent representative publication is [64]. Pogamut was implemented and tested / studied experimentally using Java and Python. Implementations featured ACT-R binding and Emergent binding.
21.2. Support for Common Components and Features
The framework of Pogamut supports the following features and components that are common for many cognitive architectures: working memory (declarative), episodic memory (declarative or a spreading activation network), procedural memory (rules), perceptual memory (represents a couple of objects recently perceived by the agent), cognitive map (graph-based and Bayesian). Sensory, motor and other specific modalities include visual input: both symbolic and subsymbolic (tailored for special purposes).
21.3. Learning, Goal and Value Generation, and Cognitive Development
The episodic memory uses Hebbian learning. In general, the architecture supports unsupervised learning and can learn in real time, with respect to spatial and episodic memory.
Figure 18. A: A bird’s eye view of Pogamut (see [62] for details). B: A snapshot of Pogamut. Labels in panel A: Unreal Tournament, GameBots, Pogamut Agent, TCP/IP, Gavialib, JMX, Netbeans with Pogamut plugin.
21.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with Pogamut include 3D virtual worlds. Specific paradigms include problem solving (with Prolog binding), decision making, object/feature search in an environment, spatial exploration, learning and navigation.
22. Polyscheme
This section is based on the contribution of Nick Cassimatis to the Comparative Table of Cognitive Architectures.1
22.1. Overview
Knowledge and experiences are represented in Polyscheme using relational constraints, constraint graphs, first-order literals, taxonomies, and weight matrices. Main
components of the architecture include modules using data structures and algorithms specialized for specific concepts (Figure 19). A focus of attention is used for exchanging information among these modules. A focus manager is used for guiding the flow of attention and thus inference. Specialized modules are used for representing and making inferences about specific concepts. Polyscheme was originally introduced in [65]. Most recent representative publications include [66]. Polyscheme was implemented and tested / studied experimentally using Java.
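The division of labor described above, with specialized modules exchanging information through a shared focus of attention under the guidance of a focus manager, can be sketched as follows. The two specialist classes, their `opinion` and `observe` methods, and the consensus rule (a proposition is accepted when every specialist with a stance agrees) are hypothetical constructs invented for this example and are not the Polyscheme API.

```python
# Toy focus-of-attention loop with two hypothetical specialists.
# Interfaces and the consensus rule are illustrative assumptions only.


class FactSpecialist:
    """Holds explicitly stored truth values in its own data structure."""

    def __init__(self, facts):
        self.facts = dict(facts)

    def opinion(self, proposition):
        return self.facts.get(proposition)       # True, False, or None

    def observe(self, proposition, truth):
        self.facts.setdefault(proposition, truth)


class RuleSpecialist:
    """Infers a conclusion once its premise has been broadcast as true."""

    def __init__(self, rules):
        self.rules = dict(rules)                 # conclusion -> premise
        self.known = {}

    def opinion(self, proposition):
        premise = self.rules.get(proposition)
        if premise is not None and self.known.get(premise):
            return True
        return None                              # no stance

    def observe(self, proposition, truth):
        self.known[proposition] = truth


class FocusManager:
    def __init__(self, specialists):
        self.specialists = specialists

    def attend(self, proposition):
        """Put one proposition in focus, poll specialists, share the result."""
        stances = [s.opinion(proposition) for s in self.specialists]
        stances = [stance for stance in stances if stance is not None]
        if not stances:
            return None
        truth = all(stances)
        for specialist in self.specialists:
            specialist.observe(proposition, truth)
        return truth


manager = FocusManager([
    FactSpecialist({"rain": True}),
    RuleSpecialist({"wet_grass": "rain"}),
])
print(manager.attend("rain"))        # shared with the rule specialist
print(manager.attend("wet_grass"))   # inferred from the shared result
print(manager.attend("snow"))        # no specialist has a stance
```

The point of the design is that neither specialist reads the other's internal data: information crosses module boundaries only through whatever proposition is currently in focus.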
Figure 19. A bird’s eye view of Polyscheme.
22.2. Support for Common Components and Features
The framework of Polyscheme supports the following features and components that are common for many cognitive architectures: working memory (each specialist implements its own), semantic memory (each specialist implements its own), episodic memory (each specialist implements its own), procedural memory (each specialist implements its own), iconic memory (there is a spatial specialist that has some of this functionality), perceptual memory (there is a spatial specialist that has some of this functionality), cognitive map (there is a spatial specialist that has this functionality), reward system, and attention control.
22.3. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with Polyscheme include reasoning and model finding. Specifically, the following paradigms were modeled: problem solving, language processing, and pretend-play (in a limited sense).
23. Recommendation Architecture (RA)
This section is based on the contribution of L. Andrew Coward to the Comparative Table of Cognitive Architectures.1 Theoretical arguments indicate that any system which must learn to perform a large number of different behaviors will be constrained into this recommendation architecture form by a combination of practical requirements including the need to limit information handling resources, the need to learn without interference with past learning, the need to recover from component failures and damage, and the need to construct the system efficiently.
Figure 20. A bird’s eye view of Recommendation Architecture.
23.1. Overview
Knowledge and experiences are represented in RA using a large set of heuristically defined similarity circumstances, each of which is a group of information conditions that are similar and have tended to occur at the same time in past experience. One similarity circumstance does not correlate unambiguously with any one cognitive category, but each similarity circumstance is associated with a range of recommendation weights in favor of different behaviors (such as identifying categories in current experience). The predominant weight across all currently detected similarity circumstances is the accepted behavior. Main components of the architecture include
(Figure 20): Condition definition and detection (cortex); Selection of similarity circumstances to be changed in each experience (hippocampus); Selection of sensory and other information to be used for current similarity circumstance detection (thalamus); Assignment and comparison of recommendation weights to determine current behavior (basal ganglia); Reward management to change recommendation weights (nucleus accumbens etc.); Management of relative priority of different types of behavior (amygdala and hypothalamus); Recording and implementation of frequently required behavior sequences (cerebellum). RA was originally introduced in [67]. Other key references include [68,69] (see also http://cs.anu.edu.au/~Andrew.Coward/References.html). Most recent representative publications include [70,71]. RA was implemented and tested / studied experimentally using Smalltalk.
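The selection rule stated in Section 23.1, in which the behavior with the predominant recommendation weight across all currently detected similarity circumstances is accepted, can be sketched in a few lines. The text does not say how weights from different circumstances are combined, so the summation used here is an assumption, as are the function name, the circumstance labels, and the numeric weights.

```python
# Behavior selection from recommendation weights. Summing the weights of
# all detected circumstances is an assumed combination rule; names and
# numbers are illustrative only.

def select_behavior(detected, weights):
    """Return the behavior with the largest total recommendation weight.

    detected: iterable of similarity-circumstance identifiers
    weights:  dict mapping circumstance -> {behavior: weight}
    """
    totals = {}
    for circumstance in detected:
        for behavior, weight in weights.get(circumstance, {}).items():
            totals[behavior] = totals.get(behavior, 0.0) + weight
    if not totals:
        return None
    return max(totals, key=totals.get)


weights = {
    "c1": {"say_cat": 0.6, "say_dog": 0.3},
    "c2": {"say_dog": 0.5},
    "c3": {"say_cat": 0.4},
}
print(select_behavior(["c1", "c2"], weights))
print(select_behavior(["c1", "c3"], weights))
```

Note that no single circumstance decides the outcome: "c1" on its own favors one behavior, yet the accepted behavior changes depending on which other circumstances are detected at the same time, in line with the statement that a circumstance does not correlate unambiguously with any one category.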
23.2. Support for Common Components and Features
The framework of RA supports the following features and components that are common for many cognitive architectures: working memory (frequency modulation placed on neuron spike outputs, with different modulation phase for different objects in working memory), semantic memory (similarity circumstances, or cortical column receptive fields, that are often detected at the same time acquire the ability to indirectly activate each other), episodic memory (similarity circumstances, or cortical column receptive fields, that change at the same time acquire the ability to indirectly activate each other; an episodic memory is indirect activation of a group of columns that all changed at the same time at some point in the past; because the hippocampal system manages the selection of cortical columns that will change in response to each sensory experience, information used for this management is applicable to construction of episodic memories), procedural memory (recommendation weights associated in the basal ganglia with cortical column receptive field detections instantiate procedural memory), iconic memory (maintained on the basis of indirect activation of cortical columns recently active at the same time), perceptual memory (based on indirect activation of cortical columns on the basis of recent simultaneous activity), reward system (some receptive field detections are associated with recommendations to increase or decrease recently used behavioral recommendation weights), attention control and consciousness. In this architecture, cortical columns have receptive fields defined by groups of similar conditions that often occurred at the same time in past sensory experiences, and are activated if the receptive field occurs in current sensory inputs. Attention is selection of a subset of currently detected columns to be allowed to communicate their detections to other cortical areas.
The selection is on the basis of recommendation strengths of active cortical columns, interpreted through the thalamus, and is implemented by placing a frequency modulation on the action potential sequences generated by the selected columns. Consciousness is a range of phenomena involving pseudosensory experiences. A column can also be indirectly activated if it was recently active, or often active in the past, or if it expanded its receptive field at the same time as a number of currently active columns. Indirect activations lead to “conscious experiences”. Sensory, motor and other specific modalities in RA include visual input (implemented by emulation of action potential outputs of populations of simulated
sensory neurons), auditory input (implemented by emulation of action potential outputs of populations of simulated sensory neurons).
23.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in RA include: reinforcement learning (increases in recommendation weights associated in the basal ganglia with cortical column receptive field detections, on the basis of rewards), Hebbian learning (with an overlay management that determines whether Hebbian learning will occur at any point in time), learning of new representations (a new representation is a new subset of receptive field detections, with slight changes to some receptive fields).
23.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms: all cognitive processes are implemented through sequences of receptive field activations, including both direct detections and indirect activations. At each point in the sequence the behavior with the predominant recommendation weight across the currently activated receptive field population is performed. This behavior may be to focus attention on a particular subset of current sensory inputs or to implement a particular type of indirect activation (prolong current activity, or indirectly activate on the basis of recent simultaneous activity, past frequent simultaneous activity, or past simultaneous receptive field change). Recommendation weights are acquired through rewards that result in effective sequences for cognitive processing. Frequently used sequences are recorded in the cerebellum for rapid and accurate implementation. Specifically, the following paradigms were modeled: problem solving (a simple example is fitting together two objects; first step is activating receptive fields often active in the past shortly after fields directly activated by one object were active; because objects have often been seen in the past in several different orientations, this indirect activation is effectively a “mental rotation”; receptive fields combining information from the indirect activation derived from one object and the direct activation from the other object recommend movements to fit the objects together; a bias is placed upon acceptance of such behaviors by taking on the task), decision making (there can be extensive indirect activation steps, with slight changes to receptive fields at each step; eventually, one behavior has a predominant recommendation strength in the basal ganglia, and this behavior is the decision), language processing, working memory tests, perceptual illusions, implicit memory tasks (depend on indirect receptive field activations on the
basis of recent simultaneous activity). In addition, the following specific paradigms were modeled with this architecture: task switching, dual task, visual perception with comprehension, spatial exploration, learning and navigation, object/feature search in an environment, learning from instructions.
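The selection rule stated above (perform the behavior with the predominant recommendation weight across the currently active receptive fields) can be illustrated with a minimal sketch. The function name, the field and behavior labels, and the plain summation of weights are assumptions made for illustration only; the source does not specify how weights are combined.

```python
from collections import defaultdict


def select_behavior(active_fields, recommendation_weights):
    """Return the behavior with the largest summed recommendation weight.

    active_fields: identifiers of the currently active receptive fields.
    recommendation_weights: field id -> {behavior name: weight} (hypothetical layout).
    """
    totals = defaultdict(float)
    for field_id in active_fields:
        for behavior, weight in recommendation_weights.get(field_id, {}).items():
            totals[behavior] += weight  # assumption: weights simply add up
    if not totals:
        return None  # no active field recommends anything
    return max(totals, key=totals.get)


# Hypothetical weights for the "fit two objects together" example.
weights = {
    "edge_left": {"rotate_object": 0.6, "attend_left": 0.3},
    "edge_right": {"rotate_object": 0.2, "push_together": 0.5},
    "gap_seen": {"push_together": 0.4},
}
chosen = select_behavior(["edge_left", "edge_right", "gap_seen"], weights)
```

In this toy setting the reward-driven learning described in the text would correspond to adjusting the entries of `weights` over time; that part is not shown.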
24. REM
This section is based on the contribution of Ashok K. Goel, J. William Murdock, and Spencer Rugaber to the Comparative Table of Cognitive Architectures.1
24.1. Overview
Knowledge and experiences are represented in REM using tasks, methods, assertions, and traces. The main components of the architecture (Figure 21) are Task-Method-Knowledge (TMK) models, which provide functional models of what agents know and how they operate. TMK models describe components of a reasoning process in terms of intended effects, incidental effects, and decomposition into lower-level components. Tasks include the requirements and intended effects of some computation. Methods implement a task and include a state-transition machine in which transitions are accomplished by subtasks. REM was originally introduced in [72]. Other key references include [73,74]. The most recent representative publications include [75]. REM was implemented and studied experimentally using Common Lisp, with Loom as the underlying knowledge engine; a Java version is in development. Funding programs, projects and environments in which the architecture was used include the NSF Science of Design program, the Self Adaptive Agents project, and turn-based strategy games.
Figure 21. A bird’s eye view of REM.
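As an illustration of the task and method structure described in the overview, the following is a minimal sketch, not REM's actual representation. All class, field and function names (`Task`, `Method`, `run_task`, the tea-making example) are hypothetical; the only ideas taken from the text are that tasks carry requirements and intended effects, that methods are state-transition machines whose transitions are accomplished by subtasks, and that execution leaves a trace.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

World = Dict[str, bool]  # hypothetical world state: named boolean assertions


@dataclass
class Task:
    """A functional element: what must hold before, and what should hold after."""
    name: str
    requires: Callable[[World], bool]
    achieves: Callable[[World], bool]
    methods: List["Method"] = field(default_factory=list)
    primitive: Optional[Callable[[World], World]] = None  # set for leaf tasks


@dataclass
class Method:
    """A behavioral element: a state-transition machine over subtasks."""
    name: str
    start: str
    final: str
    # transitions[state] = (subtask that accomplishes the transition, next state)
    transitions: Dict[str, Tuple[Task, str]] = field(default_factory=dict)


def run_task(task: Task, world: World, trace: List[str]) -> World:
    """Execute a task, appending every visited task name to the trace."""
    if not task.requires(world):
        raise ValueError(f"requirements of {task.name} are not met")
    trace.append(task.name)
    if task.primitive is not None:
        return task.primitive(world)
    method = task.methods[0]  # simplest possible policy: always the first method
    state = method.start
    while state != method.final:
        subtask, state = method.transitions[state]
        world = run_task(subtask, world, trace)
    return world


boil = Task("boil_water", lambda w: True, lambda w: w["water_hot"],
            primitive=lambda w: {**w, "water_hot": True})
steep = Task("steep_tea", lambda w: w["water_hot"], lambda w: w["tea_ready"],
             primitive=lambda w: {**w, "tea_ready": True})
make_tea = Task("make_tea", lambda w: True, lambda w: w["tea_ready"])
make_tea.methods.append(
    Method("boil_then_steep", "s0", "done",
           {"s0": (boil, "s1"), "s1": (steep, "done")}))

trace: List[str] = []
final_world = run_task(make_tea, {"water_hot": False, "tea_ready": False}, trace)
succeeded = make_tea.achieves(final_world)  # intended effect checked against the world
```

Checking `achieves` against the resulting world state is the kind of success or failure signal that a functional task description makes available.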
24.2. Support for Common Components and Features The framework of REM supports the following features and components that are common for many cognitive architectures: working memory (no commitment to a specific theory of working memory), semantic memory (uses Powerloom as underlying knowledge engine; OWL based ontological representation), episodic memory (defined as traces through the procedural memory), procedural memory (defined as tasks, which are functional elements, and methods, which are behavioral elements), iconic memory (spatial relationships are encoded using the underlying knowledge engine), reward system (functional descriptions of tasks allow agents to determine success or failure of those tasks by observing the state of the world; success or failure can then be used to reinforce decisions made during execution), attention control and consciousness:
implicit in the fact that methods constrain how and when subtasks may be executed (via a state machine); each method can be in only one state at a time, corresponding to the attended-to portion of a reasoning task (and directly linked to the attended-to knowledge via the task's requirements and effects).
24.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in REM include: reinforcement learning (learns criteria for selecting among alternative methods for accomplishing tasks, and also among alternative transitions within a method's state-transition machine), and learning of new representations (learns refinements of methods for existing tasks and can respond to a specification of some new task by adapting the methods for some similar task).
24.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with REM include reflection (adaptation in response to new functional requirements), planning, and reinforcement learning. Specifically, the following paradigms were modeled: problem solving, decision making (specifically in selecting methods to perform a task and selecting transitions within a method), analogical reasoning (using the functional specification, that is, the requirements and effects, of tasks), and metacognitive tasks. In addition, the following specific paradigms were modeled with this architecture: Tower of Hanoi/London, spatial exploration, learning and navigation, and learning from instructions (from new task specifications).
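The idea in Section 24.3 of reinforcing the choice among alternative methods by observed task success or failure can be sketched as follows. This is a generic value-update scheme chosen for illustration, not the algorithm used in REM; the class name, the method labels and the learning-rate and exploration parameters are all assumptions.

```python
import random


class MethodSelector:
    """Keeps one preference value per alternative method of a single task."""

    def __init__(self, method_names, learning_rate=0.5, epsilon=0.1, seed=0):
        self.values = {name: 0.0 for name in method_names}
        self.learning_rate = learning_rate
        self.epsilon = epsilon          # probability of trying a random method
        self.rng = random.Random(seed)

    def choose(self):
        """Mostly pick the best-valued method, occasionally explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(sorted(self.values))
        return max(self.values, key=self.values.get)

    def reinforce(self, method_name, succeeded):
        """Move the method's value toward 1 on success and toward 0 on failure."""
        reward = 1.0 if succeeded else 0.0
        old = self.values[method_name]
        self.values[method_name] = old + self.learning_rate * (reward - old)


# Hypothetical task with two alternative methods; exploration switched off.
selector = MethodSelector(["disassemble_first", "adapt_plan"], epsilon=0.0)
selector.reinforce("disassemble_first", succeeded=False)
selector.reinforce("adapt_plan", succeeded=True)
preferred = selector.choose()
```

The same update could be kept per transition of a method's state machine instead of per method, which is the second use of reinforcement learning mentioned in the text.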
25. SOAR
This section is based on the contribution of John Laird and Andrea Stocco to the Comparative Table of Cognitive Architectures.1
25.1. Overview
Knowledge and experiences are represented in SOAR using rules (procedural knowledge), relational graph structure (semantic knowledge), and episodes of relational graph structures (episodic memory). Main components (Figure 22) are: working memory (encoded as a graph structure; knowledge is represented as rules organized as operators); semantic memory; episodic memory; mental imagery; reinforcement learning. SOAR was originally introduced in [76,77]. The most recent representative publications include [78]. SOAR was implemented in C (with interfaces to almost any language) and Java. As its general framework, Soar uses the Problem Space Computational Model.
25.2. Support for Common Components and Features
The framework of SOAR supports the following features and components that are common for many cognitive architectures: working memory (relational graph structure),
semantic memory (relational graph structure), episodic memory (encoded as graph structures: snapshots of working memory), procedural memory (rules), iconic memory (explicitly defined), reward system (appraisal-based reward as well as user-defined internal/external reward), attention control and consciousness. Sensory, motor and other specific modalities include: visual input (propositional or relational), auditory input (support for text-based communication).
Figure 22. Extended SOAR (see details in [78]).
25.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in SOAR include: reinforcement learning (for operators: SARSA/Q-learning), and learning of new representations through chunking (which forms new rules); there are also mechanisms to create new episodes and new semantic memories.
25.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with SOAR include problem solving, decision making, analogical reasoning (limited), language processing, and working memory tasks. The following specific paradigms were modeled with this architecture: Tower of Hanoi/London, dual task, spatial exploration, learning and navigation (implemented but not compared to human behavior), and learning from instructions (implemented [79] but not compared to human behavior).
Recent extensions of Soar [78] (Figure 22) support episodic memory, visual imagery (as a special modality) and emotions, among other forms of higher cognition.
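Section 25.3 names SARSA and Q-learning as the reinforcement learning rules applied to operators. The sketch below shows the two textbook update rules side by side on a table of operator values; it does not reproduce Soar's implementation, and the function names, the dictionary layout and the parameter values are assumptions.

```python
def q_learning_update(q, state, op, reward, next_state, next_ops,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best operator in the next state."""
    best_next = max((q.get((next_state, o), 0.0) for o in next_ops), default=0.0)
    old = q.get((state, op), 0.0)
    q[(state, op)] = old + alpha * (reward + gamma * best_next - old)


def sarsa_update(q, state, op, reward, next_state, next_op,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the operator actually selected next."""
    old = q.get((state, op), 0.0)
    target = reward + gamma * q.get((next_state, next_op), 0.0)
    q[(state, op)] = old + alpha * (target - old)


# Hypothetical values: in state "s1", operator "a" looks good and "b" does not.
q_table_q = {("s1", "a"): 1.0, ("s1", "b"): 0.0}
q_table_s = dict(q_table_q)

q_learning_update(q_table_q, "s0", "x", 0.0, "s1", ["a", "b"], alpha=0.5, gamma=1.0)
sarsa_update(q_table_s, "s0", "x", 0.0, "s1", "b", alpha=0.5, gamma=1.0)
```

The two tables diverge here because Q-learning credits operator "x" with the value of the best follow-up operator, whereas SARSA credits it with the value of the follow-up operator that was actually chosen.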
26. Ymir
This section is related to the contribution of Kristinn Thórisson to the Comparative Table of Cognitive Architectures.1
26.1. Overview
Knowledge and experiences are represented in Ymir using distributed modules with traditional Frames. Main components of the architecture (Figure 23) include a network of distributed heterogeneous modules interacting via blackboards.
Figure 23. A bird’s eye view of Ymir.
Ymir was originally introduced in [80,81]. Other key references include [82-85]. The most recent representative publications include [86]. Further details can be found at http://alumni.media.mit.edu/~kris/ymir.html. Ymir was implemented and studied experimentally using Common Lisp, C, and C++ on 8 networked computers with sensing hardware.
26.2. Support for Common Components and Features
The framework of Ymir supports the following features and components that are common for many cognitive architectures: working memory (including Functional
Sketchboard, Content Blackboard, Motor Feedback Blackboard, and Frames), semantic memory (Frames), episodic memory (Functional Sketchboard, Content Blackboard, Motor Feedback Blackboard), procedural memory (Frames, Action Modules: limited implementation), cognitive map (limited body-centric spatial layout of selected objects), perceptual memory: perceptual representations at multiple levels (complexity) and timescales (see “visual input” and “auditory input” below). Higher cognitive features include a controlled within-utterance attention span and a situated spatial model of embodied self (but no semantic representation of self that could be reasoned over is used in Ymir). Sensory, motor and other specific modalities include: visual input (temporally and spatially accurate vector model of the upper human body, including hands, fingers, and one eye, obtained via a body-tracking suit, gloves and an eye tracker), auditory input (speech recognition: BBN Hark; custom real-time prosody tracker with H* and L* detection), and special modalities (multimodal integration and real-time multimodal communicative act interpretation).
26.3. Learning, Goal and Value Generation, and Cognitive Development
Learning algorithms used in Ymir include reinforcement learning (in a recent implementation of RadioShowHost).
26.4. Cognitive Modeling and Application Paradigms, Scalability and Limitations
Main general paradigms of modeling studies with Ymir include integrated behavior-based and classical AI, and blackboards. Specifically, the following paradigms were modeled: decision making (using hierarchical decision modules as well as traditional planning methods), language processing, working memory tasks, task switching, and visual perception with comprehension. The architecture can function autonomously, addresses the brittleness problem, can pay attention to valued goals, can flexibly switch attention between unexpected challenges and valued goals, and can adaptively fuse information from multiple types of sensors and modalities.
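The blackboard-mediated interaction between heterogeneous modules described in Section 26.1 can be illustrated with a minimal publish/subscribe sketch. Only the blackboard names are taken from Section 26.2; the `Blackboard` class, the message types, the module function and its behavior are hypothetical and do not reproduce Ymir's actual design.

```python
from collections import defaultdict


class Blackboard:
    """Shared message store; modules post to it and react to posted messages."""

    def __init__(self, name):
        self.name = name
        self.messages = []                      # full history of (type, content)
        self._subscribers = defaultdict(list)   # message type -> callbacks

    def subscribe(self, message_type, callback):
        self._subscribers[message_type].append(callback)

    def post(self, message_type, content):
        self.messages.append((message_type, content))
        for callback in list(self._subscribers[message_type]):
            callback(content)


sketchboard = Blackboard("Functional Sketchboard")
content_board = Blackboard("Content Blackboard")


def turn_taking_decider(percept):
    """Hypothetical decision module: reacts to a percept with a decision."""
    if percept == "user_starts_speaking":
        content_board.post("decision", "listen")


# The decider never talks to the perception side directly, only via the board.
sketchboard.subscribe("percept", turn_taking_decider)
sketchboard.post("percept", "user_starts_speaking")
sketchboard.post("percept", "user_looks_away")
```

Because modules are coupled only through message types, a module written in another language or running on another machine could in principle take the place of `turn_taking_decider`, which is one motivation for blackboard designs in distributed systems.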
27. Concluding Remarks
The material included here is far from being complete. This review is a first step that will have follow-ups. Neither the list of architectures presented here nor the description of each individual architecture can be considered comprehensive or close to complete at this point. The present publication does not aim to explain the ideas behind each design (this will be done elsewhere), and merely documents the fact of co-existence of a large variety of cognitive architectures, models, frameworks, etc. that have many features in common and need to be studied in connection to each other. Many cognitive architectures are not included here, because their data are missing in the online Comparative Table.1 Their names include: Icarus, SAL, Tosca, 4CAPS, AIS, Apex, Atlantis, CogNet, Copycat, DUAL, Emotion Machine, ERE, Gat, Guardian, H-Cogaff, Homer, Imprint, MAX, Omar, PRODIGY, PRS, Psi-Theory, R-CAST, RALPH-MEA, Society of Mind, Subsumption Architecture, Teton, Theo, and many
more (e.g., see http://Cogarch.org, http://en.wikipedia.org/wiki/Cognitive_architectures, http://www.cogsci.rpi.edu/~rsun/arch.html, http://bicasymposium.com/cogarch/). It is tempting to make generalizing observations and conclusions based on the presented material; however, this will be done elsewhere, based on a more complete dataset. It is amazing to see how many cognitive architectures (to be more precise, cognitive architecture families) co-exist at present: 54 are mentioned here, of which 26 are described, and there are maybe hundreds of them in the literature. It is also amazing to see how many of them (virtually all) take their origin from biological inspirations, and how similar different approaches are in their basic foundations: as if they were copied from each other. It is also remarkable how many modern frameworks tend to be advanced and up to the state of the art, incorporating higher cognitive features and functions such as episodic memory, emotions, metacognition, imagery, etc., and (hopefully, in the near future) a human-like self capable of cognitive growth.
27.1. Acknowledgments
This publication is based on the publicly available Comparative Table of Cognitive Architectures (the version that was available at http://bicasymposium.com/cogarch on September 20, 2010). This continuously updated online resource is a collective product of many independent contributions by researchers and developers of cognitive architectures. All contributors gave me permission to use their contributions in this publication, and I am grateful to them for their help during preparation of this review. Their names include, alphabetically: James S. Albus, Raul Arrabales, Cyril Brom, Nick Cassimatis, Balakrishnan Chandrasekaran, Andrew Coward, Susan L. Epstein, Rick Evertsz, Stan Franklin, Fernand Gobet, Ashok Goel, Ben Goertzel, Stephen Grossberg, Jeff Hawkins, Unmesh Kurup, John Laird, Peter Lane, Christian Lebiere, Shoshana Loeb, Shane Mueller, William Murdock, David C. Noelle, Frank Ritter, Brandon Rohrer, Spencer Rugaber, Alexei Samsonovich, Stuart C. Shapiro, Andrea Stocco, Ron Sun, George Tecuci, Akshay Vashist, Pei Wang. Their contributions to the online resource were used here.
References
[1] Azevedo, R., Bench-Capon, T., Biswas, G., Carmichael, T., Green, N., Hadzikadic, M., Koyejo, O., Kurup, U., Parsons, S., Pirrone, R., Prakken, H., Samsonovich, A., Scott, D., and Souvenir, R. (2010). Reports on the AAAI 2009 Fall Symposia. AI Magazine 31 (1): 88-94.
[2] Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
[3] SIGArt, (1991). Special section on integrated cognitive architectures. Sigart Bulletin, 2(4).
[4] Pew, R. W., and Mavor, A. S. (Eds.). (1998). Modeling Human and Organizational Behavior: Application to Military Simulations. Washington, DC: National Academy Press. books.nap.edu/catalog/6173.html.
[5] Ritter, F. E., Shadbolt, N. R., Elliman, D., Young, R. M., Gobet, F., and Baxter, G. D. (2003). Techniques for Modeling Human Performance in Synthetic Environments: A Supplementary Review. Wright-Patterson Air Force Base, OH: Human Systems Information Analysis Center (HSIAC).
[6] Gluck, K. A., and Pew, R. W. (Eds.). (2005). Modeling Human Behavior with Integrated Cognitive Architectures: Comparison, Evaluation, and Validation. Mahwah, NJ: Erlbaum.
[7] Gray, W. D. (Ed.) (2007). Integrated Models of Cognitive Systems. Series on Cognitive Models and Architectures. Oxford, UK: Oxford University Press.
[8] Albus, J. S. and Barbera, A. J. (2005). RCS: A cognitive architecture for intelligent multi-agent systems. Annual Reviews in Control 29 (1): 87-99.
[9] Anderson, J. (1976). Language, Memory and Thought. Hillsdale, NJ: Erlbaum Associates.
[10] Anderson, J. R. and Lebiere, C. (1998). The Atomic Components of Thought. Mahwah: Lawrence Erlbaum Associates.
[11] Anderson, J. R. (2007). How Can the Human Mind Occur in the Physical Universe? New York: Oxford University Press.
[12] Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science 11: 23-63.
[13] Rohrer, B. and Hulet, S. (2008). BECCA: a brain emulating cognition and control architecture. In De Jong, D. A., Ed. Progress in Biological Cybernetics Research, pp. 1-38. Nova Publishers.
[14] Rohrer, B., Bernard, M., Morrow, J. D., Rothganger, F., and Xavier, P. (2009). Model-free learning and control in a mobile robot. In 5th International Conference on Natural Computation, Tianjin, China, Aug 14-16, 2009.
[15] Kurup, U. and Chandrasekaran, B. (2007). Modeling memories of large-scale space using a bimodal cognitive architecture. In Proceedings of the International Conference on Cognitive Modeling, July 27-29, 2007, Ann Arbor, MI, 6 pages (CD-ROM).
[16] Chandrasekaran, B. (2006). Multimodal cognitive architecture: Making perception more central to intelligent behavior. Proceedings of the AAAI National Conference on Artificial Intelligence, pp. 1508-1512. Menlo Park, CA: AAAI Press.
[17] Arrabales, R., Ledezma, A., and Sanchis, A. (2009). A cognitive approach to multimodal attention. Journal of Physical Agents 3 (1): 53-64.
[18] Arrabales, R., Ledezma, A., and Sanchis, A. (2009). Towards conscious-like behavior in computer game characters. In Proceedings of the IEEE Symposium on Computational Intelligence and Games 2009 (CIG-2009), pp. 217-224.
[19] Baars, B. J. (1988). A Cognitive Theory of Consciousness. Cambridge, MA: Cambridge University Press.
[20] Arrabales, R., Ledezma, A., and Sanchis, A. (2009). CERA-CRANIUM: A Test Bed for Machine Consciousness Research. Proceedings of the International Workshop on Machine Consciousness, Hong Kong.
[21] Hingston, P. (2009). A Turing test for computer game bots. IEEE Transactions on Computational Intelligence and AI in Games 1 (3): 169-186.
[22] Gobet, F., Lane, P. C. R., Croker, S., Cheng, P. C-H., Jones, G., Oliver, I. and Pine, J. M. (2001). Chunking mechanisms in human learning. TRENDS in Cognitive Sciences 5: 236-243.
[23] Gobet, F. and Lane, P. (2005). The CHREST architecture of cognition: Listening to empirical data. In D. Davis (Ed.). Visions of Mind: Architectures for Cognition and Affect (pp. 204-224). Hershey, PA: Information Science Publishing.
[24] Gobet, F., and Lane, P. C. R. (2010). The CHREST architecture of cognition: The role of perception in general intelligence. The Third Conference on Artificial General Intelligence. Lugano, Switzerland.
[25] Sun, R., Peterson, T., and Merrill, E. (1996). Bottom-up skill learning in reactive sequential decision tasks. Proceedings of 18th Cognitive Science Society Conference, Lawrence Erlbaum Associates, Hillsdale, NJ. pp. 684-690.
[26] Sun, R. (2004). The CLARION cognitive architecture: Extending cognitive modeling to social simulation. In: Sun, R. (Ed.). Cognition and Multi-Agent Interaction. Cambridge University Press: New York.
[27] Sun, R. (2002). Duality of the Mind. Lawrence Erlbaum Associates, Mahwah, NJ.
[28] Helie, S. and Sun, R. (2010). Incubation, insight, and creative problem solving: A unified theory and a connectionist model. Psychological Review 117 (3): 994-1024.
[29] Sun, R., Slusarz, P., and Terry, C. (2005). The interaction of the explicit and the implicit in skill learning: A dual-process approach. Psychological Review 112 (1): 159-192.
[30] Goertzel, B. (2009). OpenCogPrime: A cognitive synergy based architecture for embodied general intelligence. In Proceedings of ICCI-2009.
[31] Goertzel, B. et al. (2010). OpenCogBot: Achieving generally intelligent virtual agent control and humanoid robotics via cognitive synergy. In Proceedings of ICAI-10, Beijing.
[32] Evertsz, R., Pedrotti, M., Busetta, P., Acar, H., and Ritter, F. E. (2009). Populating VBS2 with realistic virtual actors. In Proceedings of the 18th Conference on Behavior Representation in Modeling and Simulation, pp. 1-8. 09-BRIMS-04.
[33] Tecuci, G. (1988). Disciple: A Theory, Methodology and System for Learning Expert Knowledge, Thèse de Docteur en Science, University of Paris-South.
[34] Tecuci, G. (1988). Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies. San Diego, CA: Academic Press.
[35] Tecuci, G., Boicu, M., Bowman, M., Marcu, D., with a commentary by Burke, M. (2001). An innovative application from the DARPA knowledge bases programs: Rapid development of a course of action critique. AI Magazine, 22 (2): 43-61.
[36] Tecuci, G., Boicu, M., Boicu, C., Marcu, D., Stanescu, B., and Barbulescu, M. (2005). The Disciple-RKF learning and reasoning agent. Computational Intelligence, 21 (4): 462-479.
[37] Tecuci, G., Boicu, M., and Comello, J. (2008). Agent-Assisted Center of Gravity Analysis, CD with Disciple-COG and Lecture Notes used in courses at the US Army War College and Air War College, GMU Press.
[38] Tecuci, G., Boicu, M., Marcu, D., Boicu, C., and Barbulescu, M. (2008). Disciple-LTA: Learning, tutoring and analytic assistance. Journal of Intelligence Community Research and Development.
[39] Tecuci, G., Schum, D.A., Boicu, M., Marcu, D., and Hamilton, B. (2010). Intelligence analysis as agent-assisted discovery of evidence, hypotheses and arguments. In: Phillips-Wren, G., Jain, L.C., Nakamatsu, K., and Howlett, R.J. (Eds.) Advances in Intelligent Decision Technologies, SIST 4, pp. 1-10. Springer-Verlag, Berlin Heidelberg.
[40] Kieras, D. and Meyer, D. E. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12: 391-438.
[41] Meyer, D. E. and Kieras, D. E. (1997). A computational theory of executive cognitive processes and multiple task performance: Part I. Basic mechanisms. Psychological Review 63: 81-97.
[42] Kieras, D. EPIC Architecture Principles of Operation (ftp://www.eecs.umich.edu/people/kieras/EPICtutorial/EPICPrinOp.pdf).
[43] Epstein, S. L. (2011). Learning expertise with bounded rationality and self-awareness. In Cox, M. T. and Raja, A. (Eds.). Metareasoning: Thinking about Thinking. Cambridge, MA: The MIT Press (forthcoming).
[44] Epstein, S. L. (1994). For the Right Reasons: The FORR Architecture for Learning in a Skill Domain. Cognitive Science 18(3): 479-511.
[45] Epstein, S. L. and Petrovic, S. (2010). Learning Expertise with Bounded Rationality and Self-awareness. In Hamadi, Y. and Saubion, E. M. F. (Eds.). Autonomous Search. Springer.
[46] Shapiro, S. C. and Bona, J. P. (2009). The GLAIR cognitive architecture. In Samsonovich, A. V. (Ed.). (2009). Biologically Inspired Cognitive Architectures II: Papers from the AAAI Fall Symposium. AAAI Technical Report FS-09-01. Menlo Park, CA: AAAI Press.
[47] Shapiro, S. C. and Bona, J. P. (2010). The GLAIR cognitive architecture. International Journal of Machine Consciousness 2 (2): 307-332.
[48] Samsonovich, A. V., De Jong, K. A., and Kitsantas, A. (2009). The mental state formalism of GMU-BICA. International Journal of Machine Consciousness 1 (1): 111-130.
[49] Kalish, M. Q., Samsonovich, A. V., Coletti, M. A., and De Jong, K. A. (2010). Assessing the role of metacognition in GMU BICA. In Samsonovich, A. V., Johannsdottir, K. R., Chella, A., and Goertzel, B. (Eds.). Biologically Inspired Cognitive Architectures 2010: Proceedings of the First Annual Meeting of the BICA Society. Amsterdam, The Netherlands: IOS Press (this volume).
[50] Samsonovich, A. V. and De Jong, K. A. (2005). Designing a self-aware neuromorphic hybrid. In K.R. Thórisson, H. Vilhjalmsson, and S. Marsela (Eds.). AAAI-05 Workshop on Modular Construction of Human-Like Intelligence: AAAI Technical Report WS-05-08, pp. 71–78. Menlo Park, CA: AAAI Press (http://ai.ru.is/events/2005/AAAI05ModularWorkshop/papers/WS1105Samsonovich.pdf).
[51] Samsonovich, A. V., Ascoli, G. A., De Jong, K. A., and Coletti, M. A. (2006). Integrated hybrid cognitive architecture for a virtual roboscout. In M. Beetz, K. Rajan, M. Thielscher, and R.B. Rusu (Eds.). Cognitive Robotics: Papers from the AAAI Workshop, AAAI Technical Reports WS-06-03, pp. 129–134. Menlo Park, CA: AAAI Press.
[52] Hawkins, J. and Blakeslee, S. (2005). On Intelligence. New York: Times Books.
[53] George, D. and Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology 5 (10).
[54] O’Reilly, R.C. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm. Neural Computation 8: 895–938.
[55] O’Reilly, R.C. and Munakata, Y. (2000). Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. Cambridge, MA: MIT Press.
[56] Jilk, D.J., Lebiere, C., O’Reilly, R.C. and Anderson, J.R. (2008). SAL: An explicitly pluralistic cognitive architecture. Journal of Experimental and Theoretical Artificial Intelligence, 20: 197-218.
[57] Franklin, S. and Patterson, F. G., Jr. (2006). The Lida architecture: Adding new modes of learning to an intelligent, autonomous software agent. Integrated Design and Process Technology, IDPT-2006, San Diego, CA, Society for Design and Process Science.
[58] Franklin, S., and Ferkin, M. H. (2008). Using broad cognitive models and cognitive robotics to apply computational intelligence to animal cognition. In Smolinski, T. G., Milanova, M. M., and Hassanien, A.-E. (Eds.). Applications of Computational Intelligence in Biology: Current Trends and Open Problems, pp. 363-394. Berlin: Springer-Verlag.
[59] Franklin, S. (2007). A foundational architecture for artificial general intelligence. In Goertzel, B. and Wang, P. (Eds.). Advances In Artificial General Intelligence: Concepts, Architectures and Algorithms, Proceedings of the AGI Workshop 2006, pp. 36-54. Amsterdam, The Netherlands: IOS Press.
[60] Wang, P. (2006). Rigid Flexibility: The Logic of Intelligence. Berlin: Springer.
[61] Vashist, A. and Loeb, S. (2010). Attention focusing model for nexting based on learning and reasoning. In Samsonovich, A. V., Johannsdottir, K. R., Chella, A., and Goertzel, B. (Eds.). Biologically Inspired Cognitive Architectures 2010: Proceedings of the First Annual Meeting of the BICA Society. Amsterdam, The Netherlands: IOS Press (this volume).
[62] Brom, C., Pešková, K., Lukavský, J.: What does your actor remember? Towards characters with a full episodic memory. In Proceedings of 4th International Conference on Virtual Storytelling, LNCS. Springer.
[63] Gemrot, J., Kadlec, R., Bida, M., Burkert, O., Pibil, R., Havlicek, J., Zemcak, L., Simlovic, J., Vansa, R., Stolba, M., Plch, T., Brom C. (2009). Pogamut 3 can assist developers in building AI (not only) for their videogame agents. In: Agents for Games and Simulations, LNCS 5920, pp. 1-15. Springer.
[64] Brom, C., Lukavský, J., Kadlec, R. (2010). Episodic memory for human-like agents and human-like agents for episodic memory. International Journal of Machine Consciousness 2 (2): 227-244.
[65] Cassimatis, N.L., Trafton, J.G., Bugajska, M.D., and Schultz, A.C. (2004). Integrating cognition, perception and action through mental simulation in robots. Journal of Robotics and Autonomous Systems 49 (1-2): 13-23.
[66] Cassimatis, N. L., Bignoli, P., Bugajska, M., Dugas, S., Kurup, U., Murugesan, A., and Bello, P. (in press). An Architecture for Adaptive Algorithmic Hybrids. IEEE Transactions on Systems, Man, and Cybernetics. Part B.
[67] Coward, L. A. (1990). Pattern Thinking. New York: Praeger.
[68] Coward, L.A. (2001). The Recommendation Architecture: lessons from the design of large scale electronic systems for cognitive science. Journal of Cognitive Systems Research 2: 111-156.
[69] Coward, L.A. (2005). A System Architecture Approach to the Brain: from Neurons to Consciousness. Nova Science Publishers, New York.
[70] Coward, L.A., and Gedeon, T.O. (2009). Implications of resource limitations for a conscious machine. Neurocomputing 72: 767-788.
[71] Coward, L. A. (2010). The hippocampal system as the cortical resource manager: a model connecting psychology, anatomy and physiology. Advances in Experimental Medicine and Biology 657: 315-364.
[72] Murdock, J.W. and Goel, A.K. (2001). Meta-case-based reasoning: using functional models to adapt case-based agents. Proceedings of the 4th International Conference on Case-Based Reasoning (ICCBR'01). Vancouver, Canada, July 30 - August 2, 2001.
[73] Murdock, J.W. and Goel, A.K. (2003). Localizing planning with functional process models. Proceedings of the Thirteenth International Conference on Automated Planning and Scheduling (ICAPS'03). Trento, Italy.
[74] Ulam, P., Goel, A.K., Jones, J., and Murdock, J.W. (2005). Using model-based reflection to guide reinforcement learning. Proceedings of the IJCAI 2005 Workshop on Reasoning, Representation and Learning in Computer Games. Edinburgh, UK.
[75] Murdock, J.W. and Goel, A.K. (2008). Meta-case-based reasoning: self-improvement through self-understanding. Journal of Experimental and Theoretical Artificial Intelligence, 20(1): 1-36.
[76] Laird, J.E., Rosenbloom, P.S., and Newell, A. (1986). Universal Subgoaling and Chunking: The Automatic Generation and Learning of Goal Hierarchies. Boston: Kluwer.
[77] Laird, J.E., Newell, A., and Rosenbloom, P.S. (1987). SOAR: An architecture for general intelligence. Artificial Intelligence 33: 1-64.
[78] Laird, J. E. (2008). Extending the Soar cognitive architecture. In P. Wang, B. Goertzel and S. Franklin (Eds.). Artificial General Intelligence 2008: Proceedings of the First AGI Conference, pp. 224-235. Amsterdam, The Netherlands: IOS Press.
[79] Huffman, S.B., and Laird, J.E. (1995). Flexibly instructable agents. Journal of Artificial Intelligence Research 3: 271-324.
[80] Thórisson, K. R. (1996). Communicative Humanoids: A Computational Model of Psycho-Social Dialogue Skills. Ph.D. Thesis, Media Laboratory, Massachusetts Institute of Technology.
[81] Thórisson, K. R. (1999). A mind model for multimodal communicative creatures and humanoids. International Journal of Applied Artificial Intelligence, 13(4-5): 449-486.
[82] Thórisson, K. R. (2002). Machine perception of multimodal natural dialogue. In P. McKevitt (Ed.). Language, Vision and Music. Amsterdam: John Benjamins.
[83] Thórisson, K. R. (1998). Real-time decision making in face to face communication. Second ACM International Conference on Autonomous Agents, Minneapolis, Minnesota, May 11-13, 16-23.
[84] Thórisson, K. R. (1997). Layered modular action control for communicative humanoids. Computer Animation '97, Geneva, Switzerland, June 5-6, 134-143.
[85] Thórisson, K. R. (2002). Natural turn-taking needs no manual: a computational theory and model, from perception to action. In B. Granström (Ed.). Multimodality in Language and Speech Systems. Heidelberg: Springer-Verlag.
[86] Ng-Thow-Hing, V., Thórisson, K. R., Sarvadevabhatla, R. K., Wormer, J., and List, T. (2009). Cognitive map architecture: facilitation of human-robot interaction in humanoid robots. IEEE Robotics and Automation Magazine, 16(1): 55-66.
[87] Rohrer, B., Morrow, J. D., Rothganger, F., and Xavier, P. G. (2009). Concepts from data. In Samsonovich, A. V. (Ed.). Biologically Inspired Cognitive Architectures II: Papers from the AAAI Fall Symposium. AAAI Technical Report FS-09-01, pp. 116-122. Menlo Park, CA: AAAI Press.
Biologically Inspired Cognitive Architectures 2010 A.V. Samsonovich et al. (Eds.) IOS Press, 2010 © 2010 The authors and IOS Press. All rights reserved.
Subject Index

action selection 147
active perception 86
active vision 17
ACT-R 181
Adaptive Resonance Theory (ART) 23
agent network 92
agent-based simulation 125
anisotropic 163
anticipative systems 86
architecture 85
artificial intelligence 85
artificial neural network 175
ASMO 98
attention 98
attention focus 170
basal ganglia 147, 153
Bayesian 113
biology 85
brain 85
cerebellar model 23
challenge 191
cognitive architecture 52, 98, 113, 119, 195
cognitive model 17, 153
cognitive modeling 78
cognitive neuroscience 42
cognitive robotics 86, 106
cognitive robots 52
computational model 175
computational semantics 92
conceptual graphs 131
context 113
data-oriented processing 42
DBN 17
decision-making 92
deep-layered machine learning 4
differential motion 24
docking 181
dynamic behaviour 98
emotions 33, 85, 125
episodic and semantic memory modelling 64
event segmentation 86
evolution 40
executive control 137, 170
extended mind 79
externalism 79
first-order variables 119
FoE 24
fringe 79
gaze modeling 25
genetic algorithms 106
graphical models 119
heading 24
herbal 181
hippocampus 10, 175
human-level intelligence 191
human-like intelligence 72
humanoid robot 33
inference 170
instructions 153
Jackendoff’s challenges 42
latent semantic analysis 33
learning 58, 125
learning by reading 92
long-term HRI 64
machine consciousness 52, 79
machine learning 92
MCMC 17
MDL 113
memory systems 40
metacognitive architectures 72
model and data sharing 195
model validation 181
motion 24
motivation 58
motor learning 23
movement 40
MST 24
MT 24
natural image statistics 25
natural language understanding 131
neural engineering framework 42, 147
neural networks 58, 106, 153
neurons 85
neuroscience 17
nexting 170
non linear dynamics 106
non-parametric 163
object recognition 163
perception 4
personality 33
POMDP 17
prefrontal cortex (PFC) 137
production systems 147
psycholinguistics 131
roadmap 191
scientific societies 191
self-modification 98
Sims 2 designer 125
sparse coding 25
spatial cognition 72
spatial learning 10
spatiotemporal inference 4
spatio-temporal working memory 137
spiking networks 10
trust 78
unifying framework 195
vector symbolic architecture 42, 147
virtual patient 92
vision 113
ViSTARS 24
Author Index

Albus, J. 3
Aloimonos, Y. 163
Anderson, J.R. 153
Arel, I. 4
Ascoli, G.A. 10
Beale, S. 92
Berant, S. 4
Bernard, M.L. 175
Bobrow, R. 17
Bod, R. 42
Boicu, M. 169
Brady, M.C. 23
Browning, N.A. 24
Carbone, A. 25
Catizone, R. 92
Caudell, T.P. 175
Chelian, S.E. 137
Chella, A. v, 33, 191
Cohen, N.J. 175
Coletti, M.A. 72
Dautenhahn, K. 64
De Jong, K.A. 72
Du Casse, K. 64
Eichenbaum, H. 175
Eilbert, J.L. 40
Eliasmith, C. 147
English, J. 92
Gayler, R.W. 42
Goertzel, B. v
Haikonen, P.O.A. 52
Herd, S. 58
Ho, W.C. 64
Jóhannsdóttir, K.R. v, 191
Johnston, B. 98
Kalish, M.Q. 72
Kennedy, W.G. 78
Laddaga, R. 17, 113
Lebiere, C. 153
Levy, S.D. 42
Lim, M.Y. 64
Loeb, S. 170
Manzotti, R. 79
Marcu, D. 169
McGinnity, T.M. 106
McShane, M. 92
Mingus, B. 58
Morgan, J.H. 181
Nath, D.J. 85
Nery, B. 86
Nirenburg, S. 92
Novianto, R. 98
O’Reilly, R.C. 58, 153
Paik, J. 181
Pilato, G. 33
Riano, L. 106
Ritter, F.E. 181
Robertson, P. 17, 113
Rosenbloom, P. 119
Samsonovich, A.V. v, 10, 72, 191, 195
Schum, D. 169
Sellers, M. 125
Sorbello, R. 33
Sowa, J.F. 131
Srinivasa, N. 137
Stewart, T.C. 147
Stocco, A. 153, 191
Summers-Stay, D. 163
Taylor, S.E. 175
Tecuci, G. 169
Vashist, A. 170
Vassallo, G. 33
Ventura, R. 86
Verzi, S. 175
Vineyard, C.M. 175
Watson, P. 175
Williams, M.-A. 98
Zhao, C. 181