Computational Complexity: Theory, Techniques, and Applications 1461418003, 9781461418009



English, 3533 pages


Table of contents :
Preface
Sections
About the Editor-in-Chief
Editorial Board Members
Section Editors
Table of Contents
Contributors
Additive Cellular Automata
Agent Based Computational Economics
Agent Based Modeling and Artificial Life
Agent Based Modeling and Computer Languages
Agent Based Modeling, Large Scale Simulations
Agent Based Modeling, Mathematical Formalism for
Agent Based Modeling and Simulation
Agent Based Modeling and Simulation, Introduction to
Aggregation Operators and Soft Computing
Algorithmic Complexity and Cellular Automata
Amorphous Computing
Analog Computation
Artificial Chemistry
Artificial Intelligence in Modeling and Simulation
Bacterial Computing
Bayesian Games: Games with Incomplete Information
Bayesian Statistics
Bivariate (Two-dimensional) Wavelets
Branching Processes
Cellular Automata as Models of Parallel Computation
Cellular Automata, Classification of
Cellular Automata, Emergent Phenomena in
Cellular Automata and Groups
Cellular Automata in Hyperbolic Spaces
Cellular Automata and Language Theory
Cellular Automata with Memory
Cellular Automata Modeling of Physical Systems
Cellular Automata in Triangular, Pentagonal and Hexagonal Tessellations
Cellular Automata, Universality of
Cellular Automaton Modeling of Tumor Invasion
Cellular Computing
Chaotic Behavior of Cellular Automata
Community Structure in Graphs
Comparison of Discrete and Continuous Wavelet Transforms
Complex Gene Regulatory Networks – From Structure to Biological Observables: Cell Fate Determination
Complexity in Systems Level Biology and Genetics: Statistical Perspectives
Complex Networks and Graph Theory
Complex Networks, Visualization of
Computer Graphics and Games, Agent Based Modeling in
Computing in Geometrical Constrained Excitable Chemical Systems
Computing with Solitons
Cooperative Games
Cooperative Games (Von Neumann--Morgenstern Stable Sets)
Cooperative Multi-hierarchical Query Answering Systems
Correlated Equilibria and Communication in Games
Correlations in Complex Systems
Cost Sharing
Curvelets and Ridgelets
Data and Dimensionality Reduction in Data Analysis and System Modeling
Data-Mining and Knowledge Discovery: Case-Based Reasoning, Nearest Neighbor and Rough Sets
Data-Mining and Knowledge Discovery, Introduction to
Data-Mining and Knowledge Discovery, Neural Networks in
Decision Trees
Dependency and Granularity in Data-Mining
Differential Games
Discovery Systems
DNA Computing
Dynamic Games with an Application to Climate Change Models
Dynamics of Cellular Automata in Non-compact Spaces
Embodied and Situated Agents, Adaptive Behavior in
Entropy
Ergodic Theory of Cellular Automata
Evolutionary Game Theory
Evolution in Materio
Evolving Cellular Automata
Evolving Fuzzy Systems
Extreme Value Statistics
Fair Division
Field Theoretic Methods
Firing Squad Synchronization Problem in Cellular Automata
Fluctuations, Importance of: Complexity in the View of Stochastic Processes
Food Webs
Fuzzy Logic
Fuzzy Logic, Type-2 and Uncertainty
Fuzzy Optimization
Fuzzy Probability Theory
Fuzzy Sets Theory, Foundations of
Fuzzy System Models Evolution from Fuzzy Rulebases to Fuzzy Functions
Game Theory, Introduction to
Game Theory and Strategic Complexity
Genetic and Evolutionary Algorithms and Programming: General Introduction and Application to Game Playing
Genetic-Fuzzy Data Mining Techniques
Gliders in Cellular Automata
Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach
Granular Computing, Information Models for
Granular Computing, Introduction to
Granular Computing and Modeling of the Uncertainty in Quantum Mechanics
Granular Computing, Philosophical Foundation for
Granular Computing: Practices, Theories, and Future Directions
Granular Computing, Principles and Perspectives of
Granular Computing System Vulnerabilities: Exploring the Dark Side of Social Networking Communities
Granular Model for Data Mining
Granular Neural Network
Granulation of Knowledge: Similarity Based Approach in Information and Decision Systems
Growth Models for Networks
Growth Phenomena in Cellular Automata
Hierarchical Dynamics
Human Sexual Networks
Hybrid Soft Computing Models for Systems Modeling and Control
Identification of Cellular Automata
Immunecomputing
Implementation Theory
Inspection Games
Intelligent Control
Intelligent Systems, Introduction to
Interaction Based Computing in Physics
Internet Topology
Knowledge Discovery: Clustering
Learning in Games
Learning and Planning (Intelligent Systems)
Lévy Statistics and Anomalous Transport: Lévy Flights and Subdiffusion
Link Analysis and Web Search
Logic and Geometry of Agents in Agent-Based Modeling
Machine Learning, Ensemble Methods in
Manipulating Data and Dimension Reduction Methods: Feature Selection
Market Games and Clubs
Mathematical Basis of Cellular Automata, Introduction to
Mechanical Computing: The Computational Complexity of Physical Devices
Mechanism Design
Membrane Computing
Minority Games
Mobile Agents
Molecular Automata
Motifs in Graphs
Multi-Granular Computing and Quotient Structure
Multivariate Splines and Their Applications
Multiwavelets
Nanocomputers
Network Analysis, Longitudinal Methods of
Networks and Stability
Neuro-fuzzy Systems
Non-negative Matrices and Digraphs
Non-standard Analysis, an Invitation to
Numerical Issues When Using Wavelets
Optical Computing
Phase Transitions in Cellular Automata
Popular Wavelet Families and Filters and Their Use
Positional Analysis and Blockmodeling
Possibility Theory
Principal-Agent Models
Probability Densities in Complex Systems, Measuring
Probability Distributions in Complex Systems
Probability and Statistics in Complex Systems, Introduction to
Quantum Algorithms
Quantum Algorithms and Complexity for Continuous Problems
Quantum Cellular Automata
Quantum Computational Complexity
Quantum Computing
Quantum Computing with Trapped Ions
Quantum Computing Using Optics
Quantum Cryptography
Quantum Error Correction and Fault Tolerant Quantum Computing
Quantum Information Processing
Quantum Information Science, Introduction to
Random Graphs, a Whirlwind Tour of
Random Matrix Theory
Random Walks in Random Environment
Rational, Goal-Oriented Agents
Reaction-Diffusion Computing
Record Statistics and Dynamics
Repeated Games with Complete Information
Repeated Games with Incomplete Information
Reputation Effects
Reversible Cellular Automata
Reversible Computing
Rough and Rough-Fuzzy Sets in Design of Information Systems
Rough Set Data Analysis
Rough Sets in Decision Making
Rough Sets: Foundations and Perspectives
Rule Induction, Missing Attribute Values and Discretization
Self-organized Criticality and Cellular Automata
Self-Replication and Cellular Automata
Semantic Web
Signaling Games
Social Network Analysis, Estimation and Sampling in
Social Network Analysis, Graph Theoretical Approaches to
Social Network Analysis, Large-Scale
Social Network Analysis, Overview of
Social Network Analysis, Two-Mode Concepts in
Social Networks, Algebraic Models for
Social Networks, Diffusion Processes in
Social Networks, Exponential Random Graph (p*) Models for
Social Networks and Granular Computing
Social Network Visualization, Methods of
Social Phenomena Simulation
Social Processes, Simulation Models of
Soft Computing, Introduction to
Static Games
Statistical Applications of Wavelets
Statistics with Imprecise Data
Stochastic Games
Stochastic Loewner Evolution: Linking Universality, Criticality and Conformal Invariance in Complex Systems
Stochastic Processes
Structurally Dynamic Cellular Automata
Swarm Intelligence
Synchronization Phenomena on Networks
Thermodynamics of Computation
Tiling Problem and Undecidability in Cellular Automata
Topological Dynamics of Cellular Automata
Two-Sided Matching Models
Unconventional Computing, Introduction to
Unconventional Computing, Novel Hardware for
Voting
Voting Procedures, Complexity of
Wavelets, Introduction to
Wavelets and the Lifting Scheme
Wavelets and PDE Techniques in Image Processing, a Quick Tour of
World Wide Web, Graph Structure
Zero-Sum Two Person Games
List of Glossary Terms
Index

Computational Complexity Theory, Techniques, and Applications

This book consists of selections from the Encyclopedia of Complexity and Systems Science edited by Robert A. Meyers, published by Springer New York in 2009.

Robert A. Meyers (Ed.)

Computational Complexity Theory, Techniques, and Applications

With 1487 Figures and 234 Tables


ROBERT A. MEYERS, Ph.D.
Editor-in-Chief
RAMTECH LIMITED
122 Escalle Lane
Larkspur, CA 94939
USA
[email protected]

Library of Congress Control Number: 2011940800

ISBN: 978-1-4614-1800-9
This publication is also available as:
Print publication under ISBN 978-1-4614-1799-6
Print and electronic bundle under ISBN 978-1-4614-1801-6

© 2012 Springer Science+Business Media, LLC. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

This book consists of selections from the Encyclopedia of Complexity and Systems Science edited by Robert A. Meyers, published by Springer New York in 2009.

springer.com

Printed on acid-free paper

Preface

Complex systems are systems that comprise many interacting parts with the ability to generate a new quality of collective behavior through self-organization, e.g. the spontaneous formation of temporal, spatial or functional structures. They are therefore adaptive as they evolve and may contain self-driving feedback loops. Thus, complex systems are much more than the sum of their parts. Complex systems are often characterized as having extreme sensitivity to initial conditions as well as emergent behavior that is not readily predictable or even completely deterministic. The conclusion is that a reductionist (bottom-up) approach is often an incomplete description of a phenomenon. This recognition that the collective behavior of the whole system cannot be simply inferred from an understanding of the behavior of the individual components has led to many new concepts and sophisticated mathematical and modeling tools for application to many scientific, engineering, and societal issues that can be adequately described only in terms of complexity and complex systems.

The inherent difficulty, or hardness, of computational problems in complex systems is a fundamental concept in computational complexity theory. This compendium, Computational Complexity, presents a detailed integrated view of the theoretical basis, computational methods and newest applicable approaches to solving inherently difficult problems whose solution requires extensive resources approaching the practical limits of present-day computer systems. Key components of computational complexity are detailed, integrated and utilized, ranging from Parameterized Complexity Theory (e.g. see the articles Quantum Computing and Analog Computation), the Exponential Time Hypothesis (e.g. see Cellular Automata and Language Theory) and Complexity Class P (e.g. see Quantum Computational Complexity and Cellular Automata, Universality of), to Heuristics (e.g. Social Network Visualization, Methods of and Repeated Games with Incomplete Information) and Parallel Algorithms (e.g. Cellular Automata as Models of Parallel Computation and Optical Computing).

There are 209 articles, organized into 14 sections, each headed by a recognized expert in the field and supported by peer reviewers in addition to the section editor. The sections are:

Agent Based Modeling and Simulation
Cellular Automata, Mathematical Basis of
Complex Networks and Graph Theory
Data Mining and Knowledge Discovery
Game Theory
Granular Computing
Intelligent Systems
Probability and Statistics in Complex Systems
Quantum Information Science
Social Network Analysis
Social Science, Physics and Mathematics, Applications in
Soft Computing
Unconventional Computing
Wavelets

The complete listing of articles and section editors is presented on pages VII to XII.

The articles are written for an audience of advanced university undergraduate and graduate students, professors, and professionals in a wide range of fields who must manage complexity on scales ranging from the atomic and molecular to the societal and global. Each article was selected and peer reviewed by one of our 13 section editors, with advice and consultation provided by Board Members Lotfi Zadeh, Stephen Wolfram and Richard Stearns, and by the Editor-in-Chief. This level of coordination assures the reader a level of confidence in the relevance and accuracy of the information far exceeding that generally found on the World Wide Web. Accessibility is also a priority, and for this reason each article includes a glossary of important terms and a concise definition of the subject.

Robert A. Meyers
Editor-in-Chief
Larkspur, California
July 2011

Sections

Agent Based Modeling and Simulation, Section Editor: Filippo Castiglione
Agent Based Computational Economics
Agent Based Modeling and Artificial Life
Agent Based Modeling and Computer Languages
Agent Based Modeling and Simulation, Introduction to
Agent Based Modeling, Large Scale Simulations
Agent Based Modeling, Mathematical Formalism for
Agent-Based Modeling and Simulation
Cellular Automaton Modeling of Tumor Invasion
Computer Graphics and Games, Agent Based Modeling in
Embodied and Situated Agents, Adaptive Behavior in
Interaction Based Computing in Physics
Logic and Geometry of Agents in Agent-Based Modeling
Social Phenomena Simulation
Swarm Intelligence

Cellular Automata, Mathematical Basis of, Section Editor: Andrew Adamatzky
Additive Cellular Automata
Algorithmic Complexity and Cellular Automata
Cellular Automata and Groups
Cellular Automata and Language Theory
Cellular Automata as Models of Parallel Computation
Cellular Automata in Hyperbolic Spaces
Cellular Automata Modeling of Physical Systems
Cellular Automata on Triangular, Pentagonal and Hexagonal Tessellations
Cellular Automata with Memory
Cellular Automata, Classification of
Cellular Automata, Emergent Phenomena in
Cellular Automata, Universality of
Chaotic Behavior of Cellular Automata
Dynamics of Cellular Automata in Non-compact Spaces
Ergodic Theory of Cellular Automata
Evolving Cellular Automata
Firing Squad Synchronization Problem in Cellular Automata
Gliders in Cellular Automata
Growth Phenomena in Cellular Automata
Identification of Cellular Automata
Mathematical Basis of Cellular Automata, Introduction to

Phase Transitions in Cellular Automata
Quantum Cellular Automata
Reversible Cellular Automata
Self-organized Criticality and Cellular Automata
Self-Replication and Cellular Automata
Structurally Dynamic Cellular Automata
Tiling Problem and Undecidability in Cellular Automata
Topological Dynamics of Cellular Automata

Complex Networks and Graph Theory, Section Editor: Geoffrey Canright
Community Structure in Graphs
Complex Gene Regulatory Networks – From Structure to Biological Observables: Cell Fate Determination
Complex Networks and Graph Theory
Complex Networks, Visualization of
Food Webs
Growth Models for Networks
Human Sexual Networks
Internet Topology
Link Analysis and Web Search
Motifs in Graphs
Non-negative Matrices and Digraphs
Random Graphs, a Whirlwind Tour of
Synchronization Phenomena on Networks
World Wide Web, Graph Structure

Data Mining and Knowledge Discovery, Section Editor: Peter Kokol
Data and Dimensionality Reduction in Data Analysis and System Modeling
Data-Mining and Knowledge Discovery, Introduction to
Data-Mining and Knowledge Discovery, Neural Networks in
Data-Mining and Knowledge Discovery: Case Based Reasoning, Nearest Neighbor and Rough Sets
Decision Trees
Discovery Systems
Genetic and Evolutionary Algorithms and Programming: General Introduction and Application to Game Playing
Knowledge Discovery: Clustering
Machine Learning, Ensemble Methods in
Manipulating Data and Dimension Reduction Methods: Feature Selection

Game Theory, Section Editor: Marilda Sotomayor
Bayesian Games: Games with Incomplete Information
Cooperative Games
Cooperative Games (Von Neumann–Morgenstern Stable Sets)
Correlated Equilibria and Communication in Games
Cost Sharing
Differential Games
Dynamic Games with an Application to Climate Change Models
Evolutionary Game Theory
Fair Division

Game Theory and Strategic Complexity
Game Theory, Introduction to
Implementation Theory
Inspection Games
Learning in Games
Market Games and Clubs
Mechanism Design
Networks and Stability
Principal-Agent Models
Repeated Games with Complete Information
Repeated Games with Incomplete Information
Reputation Effects
Signaling Games
Static Games
Stochastic Games
Two-Sided Matching Models
Voting
Voting Procedures, Complexity of
Zero-Sum Two Person Games

Granular Computing, Section Editor: Tsau Y. Lin
Cooperative Multi-Hierarchical Query Answering Systems
Dependency and Granularity in Data Mining
Fuzzy Logic
Fuzzy Probability Theory
Fuzzy System Models Evolution from Fuzzy Rulebases to Fuzzy Functions
Genetic-Fuzzy Data Mining Techniques
Granular Model for Data Mining
Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach
Granular Computing and Modeling of the Uncertainty in Quantum Mechanics
Granular Computing System Vulnerabilities: Exploring the Dark Side of Social Networking Communities
Granular Computing, Information Models for
Granular Computing, Introduction to
Granular Computing, Philosophical Foundation for
Granular Computing, Principles and Perspectives of
Granular Computing: Practices, Theories and Future Directions
Granular Neural Network
Granulation of Knowledge: Similarity Based Approach in Information and Decision Systems
Multi-Granular Computing and Quotient Structure
Non-standard Analysis, an Invitation to
Rough and Rough-Fuzzy Sets in Design of Information Systems
Rough Set Data Analysis
Rule Induction, Missing Attribute Values and Discretization
Social Networks and Granular Computing

Intelligent Systems, Section Editor: James A. Hendler
Artificial Intelligence in Modeling and Simulation
Intelligent Control
Intelligent Systems, Introduction to

Learning and Planning (Intelligent Systems)
Mobile Agents
Semantic Web

Probability and Statistics in Complex Systems, Section Editor: Henrik Jeldtoft Jensen
Bayesian Statistics
Branching Processes
Complexity in Systems Level Biology and Genetics: Statistical Perspectives
Correlations in Complex Systems
Entropy
Extreme Value Statistics
Field Theoretic Methods
Fluctuations, Importance of: Complexity in the View of Stochastic Processes
Hierarchical Dynamics
Lévy Statistics and Anomalous Transport: Lévy Flights and Subdiffusion
Probability and Statistics in Complex Systems, Introduction to
Probability Densities in Complex Systems, Measuring
Probability Distributions in Complex Systems
Random Matrix Theory
Random Walks in Random Environment
Record Statistics and Dynamics
Stochastic Loewner Evolution: Linking Universality, Criticality and Conformal Invariance in Complex Systems
Stochastic Processes

Quantum Information Science, Section Editor: Joseph F. Traub
Quantum Algorithms
Quantum Algorithms and Complexity for Continuous Problems
Quantum Computational Complexity
Quantum Computing Using Optics
Quantum Computing with Trapped Ions
Quantum Cryptography
Quantum Error Correction and Fault Tolerant Quantum Computing
Quantum Information Processing
Quantum Information Science, Introduction to

Social Network Analysis, Section Editor: John Scott
Network Analysis, Longitudinal Methods of
Positional Analysis and Blockmodelling
Social Network Analysis, Estimation and Sampling in
Social Network Analysis, Graph Theoretical Approaches to
Social Network Analysis, Large-Scale
Social Network Analysis, Overview of
Social Network Analysis, Two-Mode Concepts in
Social Network Visualization, Methods of
Social Networks, Algebraic Models for
Social Networks, Diffusion Processes in
Social Networks, Exponential Random Graph (p*) Models for

Social Science, Physics and Mathematics Applications in, Section Editor: Andrzej Nowak
Minority Games
Rational, Goal-Oriented Agents
Social Processes, Simulation Models of

Soft Computing, Section Editor: Janusz Kacprzyk
Aggregation Operators and Soft Computing
Evolving Fuzzy Systems
Fuzzy Logic, Type-2 and Uncertainty
Fuzzy Optimization
Fuzzy Sets Theory, Foundations of
Hybrid Soft Computing Models for Systems Modeling and Control
Neuro-fuzzy Systems
Possibility Theory
Rough Sets in Decision Making
Rough Sets: Foundations and Perspectives
Soft Computing, Introduction to
Statistics with Imprecise Data

Unconventional Computing, Section Editor: Andrew Adamatzky
Amorphous Computing
Analog Computation
Artificial Chemistry
Bacterial Computing
Cellular Computing
Computing in Geometrical Constrained Excitable Chemical Systems
Computing with Solitons
DNA Computing
Evolution in Materio
Immunecomputing
Mechanical Computing: The Computational Complexity of Physical Devices
Membrane Computing
Molecular Automata
Nanocomputers
Optical Computing
Quantum Computing
Reaction-Diffusion Computing
Reversible Computing
Thermodynamics of Computation
Unconventional Computing, Introduction to
Unconventional Computing, Novel Hardware for

Wavelets, Section Editor: Edward Aboufadel
Bivariate (Two-dimensional) Wavelets
Comparison of Discrete and Continuous Wavelet Transforms
Curvelets and Ridgelets

Multivariate Splines and Their Applications
Multiwavelets
Numerical Issues When Using Wavelets
Popular Wavelet Families and Filters and Their Use
Statistical Applications of Wavelets
Wavelets and PDE Techniques in Image Processing, a Quick Tour of
Wavelets and the Lifting Scheme
Wavelets, Introduction to

About the Editor-in-Chief

Robert A. Meyers
President: RAMTECH Limited
Manager, Chemical Process Technology, TRW Inc.
Post-doctoral Fellow: California Institute of Technology
Ph.D. Chemistry, University of California at Los Angeles
B.A. Chemistry, California State University, San Diego

Biography

Dr. Meyers has worked with more than 25 Nobel laureates during his career.

Research

Dr. Meyers was Manager of Chemical Technology at TRW (now Northrop Grumman) in Redondo Beach, CA and is now President of RAMTECH Limited. He is co-inventor of the Gravimelt process for desulfurization and demineralization of coal for air pollution and water pollution control. Dr. Meyers is the inventor of, and was project manager for, the DOE-sponsored Magnetohydrodynamics Seed Regeneration Project, which resulted in the construction and successful operation of a pilot plant for production of potassium formate, a chemical utilized for plasma electricity generation and air pollution control. Dr. Meyers managed the pilot-scale DOE project for determining the hydrodynamics of synthetic fuels. He is a co-inventor of several thermo-oxidatively stable polymers which have achieved commercial success as the GE PEI, Upjohn polyimide and Rhône-Poulenc bismaleimide resins. He has also managed projects in photochemistry, chemical lasers, flue gas scrubbing, oil shale analysis and refining, petroleum analysis and refining, global change measurement from space satellites, analysis and mitigation (carbon dioxide and ozone), hydrometallurgical refining, soil and hazardous waste remediation, novel polymer synthesis, modeling of the economics of space transportation systems, space rigidizable structures and chemiluminescence-based devices. He is a senior member of the American Institute of Chemical Engineers, a member of the American Physical Society and the American Chemical Society, and serves on the UCLA Chemistry Department Advisory Board. He was a member of the joint USA-Russia working group on air pollution control and of the EPA-sponsored Waste Reduction Institute for Scientists and Engineers.

Dr. Meyers has more than 20 patents and 50 technical papers. He has published in primary literature journals including Science and the Journal of the American Chemical Society, and is listed in Who's Who in America and Who's Who in the World. Dr. Meyers' scientific achievements have been reviewed in feature articles in the popular press, in publications such as The New York Times Science Supplement and The Wall Street Journal, as well as in more specialized publications such as Chemical Engineering and Coal Age. A public service film on Dr. Meyers' chemical desulfurization invention for air pollution control was produced by the Environmental Protection Agency.

Scientific Books

Dr. Meyers is the author or Editor-in-Chief of 12 technical books, one of which won the Association of American Publishers Award as the best book in technology and engineering.

Encyclopedias

Dr. Meyers conceived and has served as Editor-in-Chief of the Academic Press (now Elsevier) Encyclopedia of Physical Science and Technology, an 18-volume publication of 780 twenty-page articles written for an audience of university students and practicing professionals. First published in 1987, this encyclopedia was very successful and was consequently revised and reissued in 1992 as a second edition; the third edition was published in 2001 and is now online. Dr. Meyers has completed two editions of the Encyclopedia of Molecular Cell Biology and Molecular Medicine for Wiley-VCH (1995 and 2004), covering molecular- and cellular-level genetics, biochemistry, pharmacology, diseases and structure determination as well as cell biology. His eight-volume Encyclopedia of Environmental Analysis and Remediation was published in 1998 by John Wiley & Sons, and his 15-volume Encyclopedia of Analytical Chemistry in 2000, also by John Wiley & Sons; all are available online.

Editorial Board Members

LOTFI A. ZADEH
Professor in the Graduate School, Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley

STEPHEN WOLFRAM
Founder and CEO, Wolfram Research
Creator, Mathematica®
Author, A New Kind of Science

RICHARD E. STEARNS
1993 Turing Award for foundations of computational complexity
Current interests include: computational complexity, automata theory, analysis of algorithms, and game theory.

Section Editors

Agent Based Modeling and Simulation

FILIPPO CASTIGLIONE
Research Scientist
Institute for Computing Applications (IAC) “M. Picone”
National Research Council (CNR), Italy

Cellular Automata, Mathematical Basis of

ANDREW ADAMATZKY
Professor
Faculty of Computing, Engineering and Mathematical Science
University of the West of England

Complex Networks and Graph Theory

GEOFFREY CANRIGHT
Senior Research Scientist
Telenor Research and Innovation
Fornebu, Norway

Data Mining and Knowledge Discovery

PETER KOKOL
Professor
Department of Computer Science
University of Maribor, Slovenia

Game Theory

MARILDA SOTOMAYOR
Professor
Department of Economics, University of São Paulo, Brazil
Department of Economics, Brown University, Providence

Granular Computing

TSAU Y. LIN
Professor
Computer Science Department
San Jose State University

Intelligent Systems

JAMES A. HENDLER
Senior Constellation Professor of the Tetherless World Research Constellation
Rensselaer Polytechnic Institute

Probability and Statistics in Complex Systems

HENRIK JELDTOFT JENSEN
Professor of Mathematical Physics
Department of Mathematics and Institute for Mathematical Sciences
Imperial College London

Quantum Information Science

JOSEPH F. TRAUB
Edwin Howard Armstrong Professor of Computer Science
Computer Science Department
Columbia University

Social Network Analysis

JOHN SCOTT
Professor of Sociology
School of Social Science and Law
University of Plymouth

Social Science, Physics and Mathematics Applications in

ANDRZEJ NOWAK
Director of the Center for Complex Systems, University of Warsaw
Assistant Professor, Psychology Department, Florida Atlantic University

Soft Computing

JANUSZ KACPRZYK
Deputy Director for Scientific Affairs, Professor
Systems Research Institute
Polish Academy of Sciences

Unconventional Computing

ANDREW ADAMATZKY
Professor
Faculty of Computing, Engineering and Mathematical Science
University of the West of England

Wavelets

EDWARD ABOUFADEL
Professor of Mathematics
Grand Valley State University

Table of Contents

Additive Cellular Automata Burton Voorhees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

1

Agent Based Computational Economics Moshe Levy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

18

Agent Based Modeling and Artificial Life Charles M. Macal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

39

Agent Based Modeling and Computer Languages Michael J. North, Charles M. Macal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

58

Agent Based Modeling, Large Scale Simulations Hazel R. Parry . . . 76

Agent Based Modeling, Mathematical Formalism for Reinhard Laubenbacher, Abdul S. Jarrah, Henning S. Mortveit, S.S. Ravi . . . 88

Agent Based Modeling and Simulation Stefania Bandini, Sara Manzoni, Giuseppe Vizzari . . . 105

Agent Based Modeling and Simulation, Introduction to Filippo Castiglione . . . 118

Aggregation Operators and Soft Computing Vicenç Torra . . . 122

Algorithmic Complexity and Cellular Automata Julien Cervelle, Enrico Formenti . . . 132

Amorphous Computing Hal Abelson, Jacob Beal, Gerald Jay Sussman . . . 147

Analog Computation Bruce J. MacLennan . . . 161

Artificial Chemistry Peter Dittrich . . . 185

Artificial Intelligence in Modeling and Simulation Bernard Zeigler, Alexandre Muzy, Levent Yilmaz . . . 204

Bacterial Computing Martyn Amos . . . 228

Bayesian Games: Games with Incomplete Information Shmuel Zamir . . . 238

Bayesian Statistics David Draper . . . 254

Bivariate (Two-dimensional) Wavelets Bin Han . . . 275


Branching Processes Mikko J. Alava, Kent Bækgaard Lauritsen . . . 285

Cellular Automata as Models of Parallel Computation Thomas Worsch . . . 298

Cellular Automata, Classification of Klaus Sutner . . . 312

Cellular Automata, Emergent Phenomena in James E. Hanson . . . 325

Cellular Automata and Groups Tullio Ceccherini-Silberstein, Michel Coornaert . . . 336

Cellular Automata in Hyperbolic Spaces Maurice Margenstern . . . 350

Cellular Automata and Language Theory Martin Kutrib . . . 359

Cellular Automata with Memory Ramón Alonso-Sanz . . . 382

Cellular Automata Modeling of Physical Systems Bastien Chopard . . . 407

Cellular Automata in Triangular, Pentagonal and Hexagonal Tessellations Carter Bays . . . 434

Cellular Automata, Universality of Jérôme Durand-Lose . . . 443

Cellular Automaton Modeling of Tumor Invasion Haralambos Hatzikirou, Georg Breier, Andreas Deutsch . . . 456

Cellular Computing Christof Teuscher . . . 465

Chaotic Behavior of Cellular Automata Julien Cervelle, Alberto Dennunzio, Enrico Formenti . . . 479

Community Structure in Graphs Santo Fortunato, Claudio Castellano . . . 490

Comparison of Discrete and Continuous Wavelet Transforms Palle E. T. Jorgensen, Myung-Sin Song . . . 513

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination Sui Huang, Stuart A. Kauffman . . . 527

Complexity in Systems Level Biology and Genetics: Statistical Perspectives David A. Stephens . . . 561

Complex Networks and Graph Theory Geoffrey Canright . . . 579

Complex Networks, Visualization of Vladimir Batagelj . . . 589

Computer Graphics and Games, Agent Based Modeling in Brian Mac Namee . . . 604

Computing in Geometrical Constrained Excitable Chemical Systems Jerzy Gorecki, Joanna Natalia Gorecka . . . 622

Computing with Solitons Darren Rand, Ken Steiglitz . . . 646


Cooperative Games Roberto Serrano . . . 666

Cooperative Games (Von Neumann–Morgenstern Stable Sets) Jun Wako, Shigeo Muto . . . 675

Cooperative Multi-hierarchical Query Answering Systems Zbigniew W. Ras, Agnieszka Dardzinska . . . 690

Correlated Equilibria and Communication in Games Françoise Forges . . . 695

Correlations in Complex Systems Renat M. Yulmetyev, Peter Hänggi . . . 705

Cost Sharing Maurice Koster . . . 724

Curvelets and Ridgelets Jalal Fadili, Jean-Luc Starck . . . 754

Data and Dimensionality Reduction in Data Analysis and System Modeling Witold Pedrycz . . . 774

Data-Mining and Knowledge Discovery: Case-Based Reasoning, Nearest Neighbor and Rough Sets Lech Polkowski . . . 789

Data-Mining and Knowledge Discovery, Introduction to Peter Kokol . . . 810

Data-Mining and Knowledge Discovery, Neural Networks in Markus Brameier . . . 813

Decision Trees Vili Podgorelec, Milan Zorman . . . 827

Dependency and Granularity in Data-Mining Shusaku Tsumoto, Shoji Hirano . . . 846

Differential Games Marc Quincampoix . . . 854

Discovery Systems Petra Povalej, Mateja Verlic, Gregor Stiglic . . . 862

DNA Computing Martyn Amos . . . 882

Dynamic Games with an Application to Climate Change Models Prajit K. Dutta . . . 897

Dynamics of Cellular Automata in Non-compact Spaces Enrico Formenti, Petr Kůrka . . . 914

Embodied and Situated Agents, Adaptive Behavior in Stefano Nolfi . . . 925

Entropy Constantino Tsallis . . . 940

Ergodic Theory of Cellular Automata Marcus Pivato . . . 965

Evolutionary Game Theory William H. Sandholm . . . 1000

Evolution in Materio Simon Harding, Julian F. Miller . . . 1030


Evolving Cellular Automata Martin Cenek, Melanie Mitchell . . . 1043

Evolving Fuzzy Systems Plamen Angelov . . . 1053

Extreme Value Statistics Mario Nicodemi . . . 1066

Fair Division Steven J. Brams . . . 1073

Field Theoretic Methods Uwe Claus Täuber . . . 1080

Firing Squad Synchronization Problem in Cellular Automata Hiroshi Umeo . . . 1094

Fluctuations, Importance of: Complexity in the View of Stochastic Processes Rudolf Friedrich, Joachim Peinke, M. Reza Rahimi Tabar . . . 1131

Food Webs Jennifer A. Dunne . . . 1155

Fuzzy Logic Lotfi A. Zadeh . . . 1177

Fuzzy Logic, Type-2 and Uncertainty Robert I. John, Jerry M. Mendel . . . 1201

Fuzzy Optimization Weldon A. Lodwick, Elizabeth A. Untiedt . . . 1211

Fuzzy Probability Theory Michael Beer . . . 1240

Fuzzy Sets Theory, Foundations of Janusz Kacprzyk . . . 1253

Fuzzy System Models Evolution from Fuzzy Rulebases to Fuzzy Functions I. Burhan Türkşen . . . 1274

Game Theory, Introduction to Marilda Sotomayor . . . 1289

Game Theory and Strategic Complexity Kalyan Chatterjee, Hamid Sabourian . . . 1292

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing Michael Orlov, Moshe Sipper, Ami Hauptman . . . 1309

Genetic-Fuzzy Data Mining Techniques Tzung-Pei Hong, Chun-Hao Chen, Vincent S. Tseng . . . 1321

Gliders in Cellular Automata Carter Bays . . . 1337

Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach Salvatore Greco, Benedetto Matarazzo, Roman Słowiński . . . 1347

Granular Computing, Information Models for Steven A. Demurjian . . . 1369

Granular Computing, Introduction to Tsau Young Lin . . . 1377

Granular Computing and Modeling of the Uncertainty in Quantum Mechanics Kow-Lung Chang . . . 1381


Granular Computing, Philosophical Foundation for Zhengxin Chen . . . 1389

Granular Computing: Practices, Theories, and Future Directions Tsau Young Lin . . . 1404

Granular Computing, Principles and Perspectives of Jianchao Han, Nick Cercone . . . 1421

Granular Computing System Vulnerabilities: Exploring the Dark Side of Social Networking Communities Steve Webb, James Caverlee, Calton Pu . . . 1433

Granular Model for Data Mining Anita Wasilewska, Ernestina Menasalvas . . . 1444

Granular Neural Network Yan-Qing Zhang . . . 1455

Granulation of Knowledge: Similarity Based Approach in Information and Decision Systems Lech Polkowski . . . 1464

Growth Models for Networks Sergey N. Dorogovtsev . . . 1488

Growth Phenomena in Cellular Automata Janko Gravner . . . 1499

Hierarchical Dynamics Martin Nilsson Jacobi . . . 1514

Human Sexual Networks Fredrik Liljeros . . . 1535

Hybrid Soft Computing Models for Systems Modeling and Control Oscar Castillo, Patricia Melin . . . 1547

Identification of Cellular Automata Andrew Adamatzky . . . 1564

Immunecomputing Jon Timmis . . . 1576

Implementation Theory Luis C. Corchón . . . 1588

Inspection Games Rudolf Avenhaus, Morton J. Canty . . . 1605

Intelligent Control Clarence W. de Silva . . . 1619

Intelligent Systems, Introduction to James Hendler . . . 1642

Interaction Based Computing in Physics Franco Bagnoli . . . 1644

Internet Topology Yihua He, Georgos Siganos, Michalis Faloutsos . . . 1663

Knowledge Discovery: Clustering Pavel Berkhin, Inderjit S. Dhillon . . . 1681

Learning in Games John Nachbar . . . 1695

Learning and Planning (Intelligent Systems) Ugur Kuter . . . 1706

1706


Levy Statistics and Anomalous Transport: Levy Flights and Subdiffusion Ralf Metzler, Aleksei V. Chechkin, Joseph Klafter . . . 1724

Link Analysis and Web Search Johannes Bjelland, Geoffrey Canright, Kenth Engø-Monsen . . . 1746

Logic and Geometry of Agents in Agent-Based Modeling Samson Abramsky . . . 1767

Machine Learning, Ensemble Methods in Sašo Džeroski, Panče Panov, Bernard Ženko . . . 1781

Manipulating Data and Dimension Reduction Methods: Feature Selection Huan Liu, Zheng Zhao . . . 1790

Market Games and Clubs Myrna Wooders . . . 1801

Mathematical Basis of Cellular Automata, Introduction to Andrew Adamatzky . . . 1819

Mechanical Computing: The Computational Complexity of Physical Devices John H. Reif . . . 1821

Mechanism Design Ron Lavi . . . 1837

Membrane Computing Gheorghe Păun . . . 1851

Minority Games Chi Ho Yeung, Yi-Cheng Zhang . . . 1863

Mobile Agents Niranjan Suri, Jan Vitek . . . 1880

Molecular Automata Joanne Macdonald, Darko Stefanovic, Milan Stojanovic . . . 1894

Motifs in Graphs Sergi Valverde, Ricard V. Solé . . . 1919

Multi-Granular Computing and Quotient Structure Ling Zhang, Bo Zhang . . . 1929

Multivariate Splines and Their Applications Ming-Jun Lai . . . 1939

Multiwavelets Fritz Keinert . . . 1981

Nanocomputers Ferdinand Peper . . . 1998

Network Analysis, Longitudinal Methods of Tom A. B. Snijders . . . 2029

Networks and Stability Frank H. Page Jr., Myrna Wooders . . . 2044

Neuro-fuzzy Systems Leszek Rutkowski, Krzysztof Cpałka, Robert Nowicki, Agata Pokropińska, Rafał Scherer . . . 2069

Non-negative Matrices and Digraphs Abraham Berman, Naomi Shaked-Monderer . . . 2082

Non-standard Analysis, an Invitation to Wei-Zhe Yang . . . 2096


Numerical Issues When Using Wavelets Jean-Luc Starck, Jalal Fadili . . . 2121

Optical Computing Thomas J. Naughton, Damien Woods . . . 2138

Phase Transitions in Cellular Automata Nino Boccara . . . 2157

Popular Wavelet Families and Filters and Their Use Ming-Jun Lai . . . 2168

Positional Analysis and Blockmodeling Patrick Doreian . . . 2226

Possibility Theory Didier Dubois, Henri Prade . . . 2240

Principal-Agent Models Inés Macho-Stadler, David Pérez-Castrillo . . . 2253

Probability Densities in Complex Systems, Measuring Gunnar Pruessner . . . 2267

Probability Distributions in Complex Systems Didier Sornette . . . 2286

Probability and Statistics in Complex Systems, Introduction to Henrik Jeldtoft Jensen . . . 2301

Quantum Algorithms Michele Mosca . . . 2303

Quantum Algorithms and Complexity for Continuous Problems Anargyros Papageorgiou, Joseph F. Traub . . . 2334

Quantum Cellular Automata Karoline Wiesner . . . 2351

Quantum Computational Complexity John Watrous . . . 2361

Quantum Computing Viv Kendon . . . 2388

Quantum Computing with Trapped Ions Wolfgang Lange . . . 2406

Quantum Computing Using Optics Gerard J. Milburn, Andrew G. White . . . 2437

Quantum Cryptography Hoi-Kwong Lo, Yi Zhao . . . 2453

Quantum Error Correction and Fault Tolerant Quantum Computing Markus Grassl, Martin Rötteler . . . 2478

Quantum Information Processing Seth Lloyd . . . 2496

Quantum Information Science, Introduction to Joseph F. Traub . . . 2534

Random Graphs, a Whirlwind Tour of Fan Chung . . . 2536

Random Matrix Theory Güler Ergün . . . 2549


Random Walks in Random Environment Ofer Zeitouni . . . 2564

Rational, Goal-Oriented Agents Rosaria Conte . . . 2578

Reaction-Diffusion Computing Andrew Adamatzky . . . 2594

Record Statistics and Dynamics Paolo Sibani, Henrik Jeldtoft Jensen . . . 2611

Repeated Games with Complete Information Olivier Gossner, Tristan Tomala . . . 2620

Repeated Games with Incomplete Information Jérôme Renault . . . 2635

Reputation Effects George J. Mailath . . . 2656

Reversible Cellular Automata Kenichi Morita . . . 2668

Reversible Computing Kenichi Morita . . . 2685

Rough and Rough-Fuzzy Sets in Design of Information Systems Theresa Beaubouef, Frederick Petry . . . 2702

Rough Set Data Analysis Shusaku Tsumoto . . . 2716

Rough Sets in Decision Making Roman Słowiński, Salvatore Greco, Benedetto Matarazzo . . . 2727

Rough Sets: Foundations and Perspectives James F. Peters, Andrzej Skowron, Jarosław Stepaniuk . . . 2761

Rule Induction, Missing Attribute Values and Discretization Jerzy W. Grzymala-Busse . . . 2772

Self-organized Criticality and Cellular Automata Michael Creutz . . . 2780

Self-Replication and Cellular Automata Gianluca Tempesti, Daniel Mange, André Stauffer . . . 2792

Semantic Web Wendy Hall, Kieron O'Hara . . . 2810

Signaling Games Joel Sobel . . . 2830

Social Network Analysis, Estimation and Sampling in Ove Frank . . . 2845

Social Network Analysis, Graph Theoretical Approaches to Wouter de Nooy . . . 2864

Social Network Analysis, Large-Scale Vladimir Batagelj . . . 2878

Social Network Analysis, Overview of John Scott . . . 2898

Social Network Analysis, Two-Mode Concepts in Stephen P. Borgatti . . . 2912


Social Networks, Algebraic Models for Philippa Pattison . . . 2925

Social Networks, Diffusion Processes in Thomas W. Valente . . . 2940

Social Networks, Exponential Random Graph (p*) Models for Garry Robins . . . 2953

Social Networks and Granular Computing Churn-Jung Liau . . . 2968

Social Network Visualization, Methods of Linton C. Freeman . . . 2981

Social Phenomena Simulation Paul Davidsson, Harko Verhagen . . . 2999

Social Processes, Simulation Models of Klaus G. Troitzsch . . . 3004

Soft Computing, Introduction to Janusz Kacprzyk . . . 3020

Static Games Oscar Volij . . . 3023

Statistical Applications of Wavelets Sofia Olhede . . . 3043

Statistics with Imprecise Data María Ángeles Gil, Olgierd Hryniewicz . . . 3052

Stochastic Games Eilon Solan . . . 3064

Stochastic Loewner Evolution: Linking Universality, Criticality and Conformal Invariance in Complex Systems Hans C. Fogedby . . . 3075

Stochastic Processes Alan J. McKane . . . 3097

Structurally Dynamic Cellular Automata Andrew Ilachinski . . . 3114

Swarm Intelligence Gerardo Beni . . . 3150

Synchronization Phenomena on Networks Guanrong Chen, Ming Zhao, Tao Zhou, Bing-Hong Wang . . . 3170

Thermodynamics of Computation H. John Caulfield, Lei Qian . . . 3187

Tiling Problem and Undecidability in Cellular Automata Jarkko Kari . . . 3198

Topological Dynamics of Cellular Automata Petr Kůrka . . . 3212

Two-Sided Matching Models Marilda Sotomayor, Ömer Özak . . . 3234

Unconventional Computing, Introduction to Andrew Adamatzky . . . 3258


Unconventional Computing, Novel Hardware for Tetsuya Asai . . . 3260

Voting Alvaro Sandroni, Jonathan Pogach, Michela Tincani, Antonio Penta, Deniz Selman . . . 3280

Voting Procedures, Complexity of Olivier Hudry . . . 3291

Wavelets, Introduction to Edward Aboufadel . . . 3314

Wavelets and the Lifting Scheme Anders La Cour-Harbo, Arne Jensen . . . 3316

Wavelets and PDE Techniques in Image Processing, a Quick Tour of Hao-Min Zhou, Tony F. Chan, Jianhong Shen . . . 3341

World Wide Web, Graph Structure Lada A. Adamic . . . 3358

Zero-Sum Two Person Games T.E.S. Raghavan . . . 3372

List of Glossary Terms . . . 3397

Index . . . 3417

Contributors

ABELSON, HAL Massachusetts Institute of Technology Cambridge USA

ANGELOV, PLAMEN Lancaster University Lancaster UK

ABOUFADEL, EDWARD Grand Valley State University Allendale USA

ASAI, TETSUYA Hokkaido University Sapporo Japan

ABRAMSKY, SAMSON Oxford University Computing Laboratory Oxford UK

AVENHAUS, RUDOLF Armed Forces University Munich Neubiberg Germany

ADAMATZKY, ANDREW University of the West of England Bristol UK

BAGNOLI , FRANCO University of Florence Florence Italy

ADAMIC, LADA A. University of Michigan Ann Arbor USA

BANDINI , STEFANIA University of Milan-Bicocca Milan Italy

ALAVA, MIKKO J. Espoo University of Technology Espoo Finland

ALONSO-SANZ, RAMÓN Universidad Politécnica de Madrid Madrid Spain

BATAGELJ, VLADIMIR University of Ljubljana Ljubljana Slovenia

BAYS, CARTER University of South Carolina Columbia USA

AMOS, MARTYN Manchester Metropolitan University Manchester UK

BEAL, JACOB Massachusetts Institute of Technology Cambridge USA

ÁNGELES GIL, MARÍA University of Oviedo Oviedo Spain

BEAUBOUEF, THERESA Southeastern Louisiana University Hammond USA


BEER, MICHAEL National University of Singapore Kent Ridge Singapore
BENI, GERARDO University of California Riverside Riverside USA
BERKHIN, PAVEL eBay Inc. San Jose USA
BERMAN, ABRAHAM Technion – Israel Institute of Technology Haifa Israel
BJELLAND, JOHANNES Telenor R&I Fornebu Norway
BOCCARA, NINO University of Illinois Chicago USA CE Saclay Gif-sur-Yvette France
BORGATTI, STEPHEN P. University of Kentucky Lexington USA
BRAMEIER, MARKUS University of Aarhus Århus Denmark
BRAMS, STEVEN J. New York University New York USA
BREIER, GEORG Technische Universität Dresden Dresden Germany
CANRIGHT, GEOFFREY Telenor R&I Fornebu Norway
CANTY, MORTON J. Forschungszentrum Jülich Jülich Germany
CASTELLANO, CLAUDIO “Sapienza” Università di Roma Roma Italy
CASTIGLIONE, FILIPPO Institute for Computing Applications (IAC) – National Research Council (CNR) Rome Italy
CASTILLO, OSCAR Tijuana Institute of Technology Tijuana Mexico
CAULFIELD, H. JOHN Fisk University Nashville USA
CAVERLEE, JAMES Texas A&M University College Station USA
CECCHERINI-SILBERSTEIN, TULLIO Università del Sannio Benevento Italy
CENEK, MARTIN Portland State University Portland USA
CERCONE, NICK York University Toronto Canada
CERVELLE, JULIEN Université Paris-Est Marne la Vallée France
CHANG, KOW-LUNG National Taiwan University Taipei Taiwan


CHAN, TONY F. University of California Los Angeles USA
CHATTERJEE, KALYAN The Pennsylvania State University University Park USA
CHECHKIN, ALEKSEI V. Institute for Theoretical Physics NSC KIPT Kharkov Ukraine
CHEN, CHUN-HAO National Cheng–Kung University Tainan Taiwan
CHEN, GUANRONG City University of Hong Kong Hong Kong China
CHEN, ZHENGXIN University of Nebraska at Omaha Omaha USA
CHOPARD, BASTIEN University of Geneva Geneva Switzerland
CHUNG, FAN University of California San Diego USA
CONTE, ROSARIA CNR Rome Italy
COORNAERT, MICHEL Université Louis Pasteur et CNRS Strasbourg France
CORCHÓN, LUIS C. Universidad Carlos III Madrid Spain
CPAŁKA, KRZYSZTOF Częstochowa University of Technology Częstochowa Poland Academy of Humanities and Economics Lodz Poland
CREUTZ, MICHAEL Brookhaven National Laboratory Upton USA
DARDZINSKA, AGNIESZKA Białystok Technical University Białystok Poland
DAVIDSSON, PAUL Blekinge Institute of Technology Ronneby Sweden
DE NOOY, WOUTER University of Amsterdam Amsterdam The Netherlands
DE SILVA, CLARENCE W. University of British Columbia Vancouver Canada
DEMURJIAN, STEVEN A. The University of Connecticut Storrs USA
DENNUNZIO, ALBERTO Università degli Studi di Milano-Bicocca Milan Italy
DEUTSCH, ANDREAS Technische Universität Dresden Dresden Germany
DHILLON, INDERJIT S. University of Texas Austin USA
DITTRICH, PETER Friedrich Schiller University Jena Jena Germany


DOREIAN, PATRICK University of Pittsburgh Pittsburgh USA
DOROGOVTSEV, SERGEY N. Universidade de Aveiro Aveiro Portugal A. F. Ioffe Physico-Technical Institute St. Petersburg Russia
DRAPER, DAVID University of California Santa Cruz USA
DUBOIS, DIDIER Université Paul Sabatier Toulouse Cedex France
DUNNE, JENNIFER A. Santa Fe Institute Santa Fe USA Pacific Ecoinformatics and Computational Ecology Lab Berkeley USA
DURAND-LOSE, JÉRÔME Université d’Orléans Orléans France
DUTTA, PRAJIT K. Columbia University New York USA
DŽEROSKI, SAŠO Jožef Stefan Institute Ljubljana Slovenia
ENGØ-MONSEN, KENTH Telenor R&I Fornebu Norway
ERGÜN, GÜLER University of Bath Bath UK
FADILI, JALAL École Nationale Supérieure d’Ingénieurs de Caen Caen Cedex France
FALOUTSOS, MICHALIS University of California Riverside USA
FOGEDBY, HANS C. University of Aarhus Aarhus Denmark Niels Bohr Institute Copenhagen Denmark
FORGES, FRANÇOISE Université Paris-Dauphine Paris France
FORMENTI, ENRICO Université de Nice Sophia Antipolis Sophia Antipolis France
FORTUNATO, SANTO ISI Foundation Torino Italy
FRANK, OVE Stockholm University Stockholm Sweden
FREEMAN, LINTON C. University of California Irvine USA
FRIEDRICH, RUDOLF University of Münster Münster Germany
GORECKA, JOANNA NATALIA Polish Academy of Science Warsaw Poland
GORECKI, JERZY Polish Academy of Science Warsaw Poland Cardinal Stefan Wyszynski University Warsaw Poland
GOSSNER, OLIVIER Northwestern University Paris France
GRASSL, MARKUS Austrian Academy of Sciences Innsbruck Austria
GRAVNER, JANKO University of California Davis USA
GRECO, SALVATORE University of Catania Catania Italy
GRZYMALA-BUSSE, JERZY W. University of Kansas Lawrence USA Polish Academy of Sciences Warsaw Poland
HALL, WENDY University of Southampton Southampton United Kingdom
HAN, BIN University of Alberta Edmonton Canada
HAN, JIANCHAO California State University Dominguez Hills, Carson USA
HÄNGGI, PETER University of Augsburg Augsburg Germany
HANSON, JAMES E. IBM T.J. Watson Research Center Yorktown Heights USA
HARDING, SIMON Memorial University St. John’s Canada
HATZIKIROU, HARALAMBOS Technische Universität Dresden Dresden Germany
HAUPTMAN, AMI Ben-Gurion University Beer-Sheva Israel
HE, YIHUA University of California Riverside USA
HENDLER, JAMES Rensselaer Polytechnic Institute Troy USA
HIRANO, SHOJI Shimane University, School of Medicine Enya-cho Izumo City, Shimane Japan
HONG, TZUNG-PEI National University of Kaohsiung Kaohsiung Taiwan
HRYNIEWICZ, OLGIERD Systems Research Institute Warsaw Poland
HUANG, SUI Department of Biological Sciences, University of Calgary Calgary Canada
HUDRY, OLIVIER École Nationale Supérieure des Télécommunications Paris France


ILACHINSKI, ANDREW Center for Naval Analyses Alexandria USA
JARRAH, ABDUL S. Virginia Polytechnic Institute and State University Virginia USA
JENSEN, ARNE Aalborg University Aalborg East Denmark
JENSEN, HENRIK JELDTOFT Institute for Mathematical Sciences London UK Imperial College London London UK
JOHN, ROBERT I. De Montfort University Leicester United Kingdom
JORGENSEN, PALLE E. T. The University of Iowa Iowa City USA
KACPRZYK, JANUSZ Polish Academy of Sciences Warsaw Poland
KARI, JARKKO University of Turku Turku Finland
KAUFFMAN, STUART A. Department of Biological Sciences, University of Calgary Calgary Canada
KEINERT, FRITZ Iowa State University Ames USA
KENDON, VIV University of Leeds Leeds UK
KLAFTER, JOSEPH Tel Aviv University Tel Aviv Israel University of Freiburg Freiburg Germany
KOKOL, PETER University of Maribor Maribor Slovenia
KOSTER, MAURICE University of Amsterdam Amsterdam Netherlands
KŮRKA, PETR Université de Nice Sophia Antipolis Nice France Academy of Sciences and Charles University Prague Czechia
KUTER, UGUR University of Maryland College Park USA
KUTRIB, MARTIN Universität Giessen Giessen Germany
LA COUR–HARBO, ANDERS Aalborg University Aalborg East Denmark
LAI, MING-JUN The University of Georgia Athens USA
LANGE, WOLFGANG University of Sussex Brighton UK


LAUBENBACHER, REINHARD Virginia Polytechnic Institute and State University Virginia USA
LAURITSEN, KENT BÆKGAARD Danish Meteorological Institute Copenhagen Denmark
LAVI, RON The Technion – Israel Institute of Technology Haifa Israel
LEVY, MOSHE The Hebrew University Jerusalem Israel
LIAU, CHURN-JUNG Academia Sinica Taipei Taiwan
LILJEROS, FREDRIK Stockholm University Stockholm Sweden
LIN, TSAU YOUNG San Jose State University San Jose USA
LIU, HUAN Arizona State University Tempe USA
LLOYD, SETH MIT Cambridge USA
LO, HOI-KWONG University of Toronto Toronto Canada
LODWICK, WELDON A. University of Colorado Denver Denver USA
MACAL, CHARLES M. Center for Complex Adaptive Agent Systems Simulation (CAS2) Argonne USA
MACDONALD, JOANNE Columbia University New York USA
MACHO-STADLER, INÉS Universitat Autònoma de Barcelona Barcelona Spain
MACLENNAN, BRUCE J. University of Tennessee Knoxville USA
MAC NAMEE, BRIAN Dublin Institute of Technology Dublin Ireland
MAILATH, GEORGE J. University of Pennsylvania Philadelphia USA
MANGE, DANIEL Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
MANZONI, SARA University of Milan-Bicocca Milan Italy
MARGENSTERN, MAURICE Université Paul Verlaine Metz France
MATARAZZO, BENEDETTO University of Catania Catania Italy
MCKANE, ALAN J. University of Manchester Manchester UK


MELIN, PATRICIA Tijuana Institute of Technology Tijuana Mexico
MENASALVAS, ERNESTINA Facultad de Informatica Madrid Spain
MENDEL, JERRY M. University of Southern California Los Angeles USA
METZLER, RALF Technical University of Munich Garching Germany
MILBURN, GERARD J. The University of Queensland Brisbane Australia
MILLER, JULIAN F. University of York Heslington UK
MITCHELL, MELANIE Portland State University Portland USA
MORITA, KENICHI Hiroshima University Higashi-Hiroshima Japan
MORTVEIT, HENNING S. Virginia Polytechnic Institute and State University Virginia USA
MOSCA, MICHELE University of Waterloo Waterloo Canada St. Jerome’s University Waterloo Canada Perimeter Institute for Theoretical Physics Waterloo Canada
MUTO, SHIGEO Tokyo Institute of Technology Tokyo Japan
MUZY, ALEXANDRE Università di Corsica Corte France
NACHBAR, JOHN Washington University St. Louis USA
NAUGHTON, THOMAS J. National University of Ireland Maynooth County Kildare Ireland University of Oulu, RFMedia Laboratory Ylivieska Finland
NICODEMI, MARIO University of Warwick Coventry UK
NILSSON JACOBI, MARTIN Chalmers University of Technology Gothenburg Sweden
NOLFI, STEFANO National Research Council (CNR) Rome Italy
NORTH, MICHAEL J. Center for Complex Adaptive Agent Systems Simulation (CAS2) Argonne USA
NOWICKI, ROBERT Częstochowa University of Technology Częstochowa Poland
O’HARA, KIERON University of Southampton Southampton United Kingdom
OLHEDE, SOFIA University College London London UK
ORLOV, MICHAEL Ben-Gurion University Beer-Sheva Israel
ÖZAK, ÖMER Brown University Providence USA
PAGE JR., FRANK H. Indiana University Bloomington USA Université Paris 1 Panthéon-Sorbonne France
PANOV, PANČE Jožef Stefan Institute Ljubljana Slovenia
PAPAGEORGIOU, ANARGYROS Columbia University New York USA
PARRY, HAZEL R. Central Science Laboratory York UK
PATTISON, PHILIPPA University of Melbourne Parkville Australia
PĂUN, GHEORGHE Institute of Mathematics of the Romanian Academy București Romania
PEDRYCZ, WITOLD University of Alberta Edmonton Canada Polish Academy of Sciences Warsaw Poland
PEINKE, JOACHIM Carl-von-Ossietzky University Oldenburg Oldenburg Germany
PENTA, ANTONIO University of Pennsylvania Philadelphia USA
PEPER, FERDINAND National Institute of Information and Communications Technology Kobe Japan
PÉREZ-CASTRILLO, DAVID Universitat Autònoma de Barcelona Barcelona Spain
PETERS, JAMES F. University of Manitoba Winnipeg Canada
PETRY, FREDERICK Stennis Space Center Mississippi USA
PIVATO, MARCUS Trent University Peterborough Canada
PODGORELEC, VILI University of Maribor Maribor Slovenia
POGACH, JONATHAN University of Pennsylvania Philadelphia USA
POKROPIŃSKA, AGATA Jan Dlugosz University Częstochowa Poland
POLKOWSKI, LECH Polish-Japanese Institute of Information Technology Warsaw Poland


POVALEJ, PETRA University of Maribor Maribor Slovenia
PRADE, HENRI Université Paul Sabatier Toulouse Cedex France
PRUESSNER, GUNNAR Imperial College London London UK
PU, CALTON Georgia Institute of Technology Atlanta USA
QIAN, LEI Fisk University Nashville USA
QUINCAMPOIX, MARC Université de Bretagne Occidentale Brest France
RAGHAVAN, T.E.S. University of Illinois Chicago USA
RAND, DARREN Massachusetts Institute of Technology Lexington USA
RAS, ZBIGNIEW W. University of North Carolina Charlotte USA Polish Academy of Sciences Warsaw Poland
RAVI, S.S. University at Albany – State University of New York New York USA
REIF, JOHN H. Duke University Durham USA
RENAULT, JÉRÔME Université Paris Dauphine Paris France
REZA RAHIMI TABAR, M. Sharif University of Technology Tehran Iran
ROBINS, GARRY University of Melbourne Melbourne Australia
RÖTTELER, MARTIN NEC Laboratories America, Inc. Princeton USA
RUTKOWSKI, LESZEK Częstochowa University of Technology Częstochowa Poland
SABOURIAN, HAMID University of Cambridge Cambridge UK
SANDHOLM, WILLIAM H. University of Wisconsin Madison USA
SANDRONI, ALVARO University of Pennsylvania Philadelphia USA
SCHERER, RAFAŁ Częstochowa University of Technology Częstochowa Poland
SCOTT, JOHN University of Plymouth Plymouth UK
SELMAN, DENIZ University of Pennsylvania Philadelphia USA


SERRANO, ROBERTO Brown University Providence USA IMDEA-Social Sciences Madrid Spain
SHAKED-MONDERER, NAOMI Emek Yezreel College Emek Yezreel Israel
SHEN, JIANHONG Barclays Capital New York USA
SIBANI, PAOLO SDU Odense Denmark
SIGANOS, GEORGOS University of California Riverside USA
SIPPER, MOSHE Ben-Gurion University Beer-Sheva Israel
SKOWRON, ANDRZEJ Warsaw University Warsaw Poland
SŁOWIŃSKI, ROMAN Poznan University of Technology Poznan Poland Polish Academy of Sciences Warsaw Poland
SNIJDERS, TOM A. B. University of Oxford Oxford United Kingdom
SOBEL, JOEL University of California San Diego USA
SOLAN, EILON Tel Aviv University Tel Aviv Israel
SOLÉ, RICARD V. Santa Fe Institute Santa Fe USA
SONG, MYUNG-SIN Southern Illinois University Edwardsville USA
SORNETTE, DIDIER ETH Zurich Zurich Switzerland
SOTOMAYOR, MARILDA University of São Paulo/SP São Paulo Brazil Brown University Providence USA
STARCK, JEAN-LUC CEA/Saclay Gif sur Yvette France
STAUFFER, ANDRÉ Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne Switzerland
STEFANOVIC, DARKO University of New Mexico Albuquerque USA
STEIGLITZ, KEN Princeton University Princeton USA
STEPANIUK, JAROSŁAW Białystok University of Technology Białystok Poland
STEPHENS, DAVID A. McGill University Montreal Canada


STIGLIC, GREGOR University of Maribor Maribor Slovenia
STOJANOVIC, MILAN Columbia University New York USA
SURI, NIRANJAN Institute for Human and Machine Cognition Pensacola USA
SUSSMAN, GERALD JAY Massachusetts Institute of Technology Cambridge USA
SUTNER, KLAUS Carnegie Mellon University Pittsburgh USA
TÄUBER, UWE CLAUS Virginia Polytechnic Institute and State University Blacksburg USA
TEMPESTI, GIANLUCA University of York York UK
TEUSCHER, CHRISTOF Los Alamos National Laboratory Los Alamos USA
TIMMIS, JON University of York York UK
TINCANI, MICHELA University of Pennsylvania Philadelphia USA
TOMALA, TRISTAN HEC Paris Paris France
TORRA, VICENÇ Institut d’Investigació en Intelligència Artificial – CSIC Bellaterra Spain
TRAUB, JOSEPH F. Columbia University New York USA
TROITZSCH, KLAUS G. Universität Koblenz-Landau Koblenz Germany
TSALLIS, CONSTANTINO Centro Brasileiro de Pesquisas Físicas Rio de Janeiro Brazil Santa Fe Institute Santa Fe USA
TSENG, VINCENT S. National Cheng–Kung University Tainan Taiwan
TSUMOTO, SHUSAKU Faculty of Medicine, Shimane University Shimane Japan
TÜRKŞEN, I. BURHAN TOBB-ETÜ (Economics and Technology University of the Union of Turkish Chambers and Commodity Exchanges) Ankara Republic of Turkey
UMEO, HIROSHI University of Osaka Osaka Japan
UNTIEDT, ELIZABETH A. University of Colorado Denver Denver USA
VALENTE, THOMAS W. University of Southern California Alhambra USA


VALVERDE, SERGI Parc de Recerca Biomedica de Barcelona Barcelona Spain
VERHAGEN, HARKO Stockholm University and Royal Institute of Technology Stockholm Sweden
VERLIC, MATEJA University of Maribor Maribor Slovenia
VITEK, JAN Purdue University West Lafayette USA
VIZZARI, GIUSEPPE University of Milan-Bicocca Milan Italy
VOLIJ, OSCAR Ben-Gurion University Beer-Sheva Israel
VOORHEES, BURTON Athabasca University Athabasca Canada
WAKO, JUN Gakushuin University Tokyo Japan
WANG, BING-HONG University of Science and Technology of China Hefei Anhui China Shanghai Academy of System Science Shanghai China
WASILEWSKA, ANITA Stony Brook University Stony Brook USA
WATROUS, JOHN University of Waterloo Waterloo Canada
WEBB, STEVE Georgia Institute of Technology Atlanta USA
WHITE, ANDREW G. The University of Queensland Brisbane Australia
WIESNER, KAROLINE University of Bristol Bristol UK
WOODERS, MYRNA Vanderbilt University Nashville USA University of Warwick Coventry UK
WOODS, DAMIEN University College Cork Cork Ireland University of Seville Seville Spain
WORSCH, THOMAS Universität Karlsruhe Karlsruhe Germany
YANG, WEI-ZHE National Taiwan University Taipei Taiwan
YEUNG, CHI HO The Hong Kong University of Science and Technology Hong Kong China Université de Fribourg Pérolles, Fribourg Switzerland University of Electronic Science and Technology of China (UESTC) Chengdu China


YILMAZ, LEVENT Auburn University Alabama USA
YULMETYEV, RENAT M. Kazan State University Kazan Russia Tatar State University of Pedagogical and Humanities Sciences Kazan Russia
ZADEH, LOTFI A. University of California Berkeley USA
ZAMIR, SHMUEL Hebrew University Jerusalem Israel
ZEIGLER, BERNARD University of Arizona Tucson USA
ZEITOUNI, OFER University of Minnesota Minneapolis USA
ŽENKO, BERNARD Jožef Stefan Institute Ljubljana Slovenia
ZHANG, BO Tsinghua University Beijing China
ZHANG, LING Anhui University, Hefei Anhui China
ZHANG, YAN-QING Georgia State University Atlanta USA
ZHANG, YI-CHENG The Hong Kong University of Science and Technology Hong Kong China Université de Fribourg Pérolles, Fribourg Switzerland University of Electronic Science and Technology of China (UESTC) Chengdu China
ZHAO, MING University of Science and Technology of China Hefei Anhui China
ZHAO, YI University of Toronto Toronto Canada
ZHAO, ZHENG Arizona State University Tempe USA
ZHOU, HAO-MIN Georgia Institute of Technology Atlanta USA
ZHOU, TAO University of Science and Technology of China Hefei Anhui China
ZORMAN, MILAN University of Maribor Maribor Slovenia


Additive Cellular Automata
BURTON VOORHEES
Center for Science, Athabasca University, Athabasca, Canada

Article Outline

Glossary
Definition of the Subject
Introduction
Notation and Formal Definitions
Additive Cellular Automata in One Dimension
d-Dimensional Rules
Future Directions
Bibliography

Glossary

Cellular automata Cellular automata are dynamical systems that are discrete in space, time, and value. A state of a cellular automaton is a spatial array of discrete cells, each containing a value chosen from a finite alphabet. The state space for a cellular automaton is the set of all such configurations.

Alphabet of a cellular automaton The alphabet of a cellular automaton is the set of symbols or values that can appear in each cell. The alphabet contains a distinguished symbol called the null or quiescent symbol, usually indicated by 0, which satisfies the condition of an additive identity: 0 + x = x.

Cellular automata rule The rule, or update rule, of a cellular automaton describes how any given state is transformed into its successor state. The update rule of a cellular automaton is described by a rule table, which defines a local neighborhood mapping, or equivalently a global update mapping.

Additive cellular automata An additive cellular automaton is a cellular automaton whose update rule satisfies the condition that its action on the sum of two states is equal to the sum of its actions on the two states separately.

Linear cellular automata A linear cellular automaton is a cellular automaton whose update rule satisfies the condition that the sum of its actions on two states separately equals its action on the sum of the two states plus its action on the state in which all cells contain the quiescent symbol. Note that some researchers reverse the definitions of additivity and linearity.

Neighborhood The neighborhood of a given cell is the set of cells that contribute to the update of the value in that cell under the specified update rule.

Rule table The rule table of a cellular automaton is a listing of all neighborhoods together with the symbol that each neighborhood maps to under the local update rule.

Local maps of a cellular automaton The local mapping for a cellular automaton is a map from the set of all neighborhoods of a cell to the automaton alphabet.

State transition diagram The state transition diagram (STD) of a cellular automaton is a directed graph with each vertex labeled by a possible state and an edge directed from a vertex x to a vertex y if and only if the state labeling vertex x maps to the state labeling vertex y under application of the automaton update rule.

Transient states A transient state of a cellular automaton is a state that can appear at most once in the evolution of the automaton rule.

Cyclic states A cyclic state of a cellular automaton is a state lying on a cycle of the automaton update rule; hence it is periodically revisited in the evolution of the rule.

Basins of attraction The basins of attraction of a cellular automaton are the equivalence classes of cyclic states together with their associated transient states, with two states being equivalent if they lie on the same cycle of the update rule.

Predecessor state A state x is the predecessor of a state y if and only if x maps to y under application of the cellular automaton update rule. More specifically, a state x is an nth order predecessor of a state y if it maps to y under n applications of the update rule.

Garden-of-Eden A Garden-of-Eden state is a state that has no predecessor. It can be present only as an initial condition.

Surjectivity A mapping is surjective (or onto) if every state has a predecessor.

Injectivity A mapping is injective (one-to-one) if every state in its domain maps to a unique state in its range. That is, if states x and y both map to a state z, then x = y.

Reversibility A mapping X is reversible if and only if a second mapping X⁻¹ exists such that if X(x) = y then X⁻¹(y) = x. For finite state spaces, reversibility and injectivity are identical.

Definition of the Subject

Cellular automata are discrete dynamical systems in which an extended array of symbols from a finite alphabet is iteratively updated according to a specified local rule. They were originally developed by John von Neumann [1,2] in 1948, following suggestions from Stanislaw Ulam, for the purpose of showing that self-replicating automata could be constructed. Von Neumann’s construction followed a complicated set of reproduction rules, but later work showed that self-reproducing automata could be constructed with only simple update rules, e.g. [3]. More generally, cellular automata are of interest because they show that highly complex patterns can arise from the application of very simple update rules. While conceptually simple, they provide a robust modeling class for application in a variety of disciplines, e.g. [4], as well as fertile ground for theoretical research. Additive cellular automata are the simplest class of cellular automata. They have been extensively studied from both theoretical and practical perspectives.

Introduction

A wide variety of cellular automata applications, in a number of differing disciplines, has appeared in the past fifty years, see, e.g. [5,6,7,8]. Among other things, cellular automata have been used to model growth and aggregation processes [9,10,11,12]; discrete reaction-diffusion systems [13,14,15,16,17]; spin exchange systems [18,19]; biological pattern formation [20,21]; disease processes and transmission [22,23,24,25,26]; DNA sequences and gene interactions [27,28]; spiral galaxies [29]; social interaction networks [30]; and forest fires [31,32]. They have been used for language and pattern recognition [33,34,35,36,37,38,39]; image processing [40,41]; as parallel computers [42,43,44,45,46,47]; parallel multipliers [48]; sorters [49]; and prime number sieves [50]. In recent years, cellular automata have become important for VLSI logic circuit design [51].
Circuit designers need “simple, regular, modular, and cascadable logic circuit structure to realize a complex function,” and cellular automata, which show a significant advantage over linear feedback shift registers, the traditional circuit building block, satisfy this need (see [52] for an extensive survey). Cellular automata, in particular additive cellular automata, are of value for producing high-quality pseudorandom sequences [53,54,55,56,57]; for pseudoexhaustive and deterministic pattern generation [58,59,60,61,62,63]; for signature analysis [64,65,66]; error correcting codes [67,68]; pseudoassociative memory [69]; and cryptography [70]. In this discussion, attention focuses on the subclass of additive cellular automata. These are the simplest cellular automata, characterized by the property that the action of the update rule on the sum of two states is equal to the sum of the rule acting on each state separately. Hybrid additive rules (i.e., with different cells evolving according to different additive rules) have proved particularly useful for generation of pseudorandom and pseudoexhaustive sequences, signature analysis, and other circuit design applications, e.g. [52,71,72].

The remainder of this article is organized as follows: Sect. “Notation and Formal Definitions” introduces definitions and notational conventions. In Sect. “Additive Cellular Automata in One Dimension”, consideration is restricted to one-dimensional rules. The influence of boundary conditions on the evolution of one-dimensional rules, conditions for rule additivity, generation of fractal space-time outputs, equivalent forms of rule representation, injectivity and reversibility, transient lengths, and cycle periods are discussed using several approaches. Taking X as the global operator for an additive cellular automaton, a method for analytic solution of equations of the form X(μ) = β is described. Section “d-Dimensional Rules” describes work on d-dimensional rules defined on tori. The discrete baker transformation is defined and used to generalize one-dimensional results on transient lengths, cycle periods, and similarity of state transition diagrams. Extensive references to the literature are provided throughout, and a set of general references is provided at the end of the bibliography.

Notation and Formal Definitions

Let S(L) = {s_i} be the set of lattice sites of a d-dimensional lattice L, with n_r equal to the number of lattice sites on dimension r. Denote by A a finite symbol set with |A| = p (usually prime). An A-configuration on L is a map v : S(L) → A that assigns a symbol from A to each site in S(L). In this way, every A-configuration defines a size n_1 × … × n_d, d-dimensional matrix μ of symbols drawn from A. Denote the set of all A-configurations on L by E(A; L).
Each s_i ∈ S(L) is labeled by an integer vector i = (i_1, …, i_d), where i_r is the number of sites along the rth dimension separating s_i from the assigned origin in L. The shift operator on the rth dimension of L is the map σ_r : L → L defined by

    σ_r(s_i) = s_j,   j = (i_1, …, i_r − 1, …, i_d).   (1)

Equivalently, the shift maps the value at site i to the value at site j. Let μ(s_i, t) = μ(i_1, …, i_d, t) ∈ A be the entry of μ corresponding to site s_i at iteration t for any discrete dynamical system having E(A; L) as state space. Given a finite set of integer d-tuples N = {(k_1, …, k_d)}, define the N-neighborhood of a site s_i ∈ S(L) as

    N(s_i) = { s_j | j = i + k, k ∈ N }.   (2)

A neighborhood configuration is a map v : N(s_0) → A. Denote the set of all neighborhood configurations by E_N(A). The rule table for a cellular automaton acting on the state space E(A; L) with standard neighborhood N(s_0) is defined by a map x : E_N(A) → A (note that this map need not be surjective or injective). The value of x for a given neighborhood configuration is called the (value of the) rule component of that configuration. The map x : E_N(A) → A induces a global map X : E(A; L) → E(A; L) as follows: for any given element μ(t) ∈ E(A; L), the set C(s_i) = { μ(s_j, t) | s_j ∈ N(s_i) } is a neighborhood configuration for the site s_i; hence the map μ(s_i, t) → x(C(s_i)) for all s_i produces a new symbol μ(s_i, t + 1). The site s_i is called the mapping site. Taken over all mapping sites, this produces a matrix μ(t + 1) that is the representation of X(μ(t)). A cellular automaton is indicated by reference to its rule table or to the global map defined by this rule table. A cellular automaton with global map X is additive if and only if, for all pairs of states μ and β,

    X(μ + β) = X(μ) + X(β).   (3)
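Equation (3) can be checked directly by computation. The sketch below (function names are my own, not from the article) encodes an elementary rule by its Wolfram number on a periodic lattice and tests additivity on random pairs of states; the additive rules 90 and 150 pass, while the XNOR complement of rule 90, rule 165 = 1 + X, fails:

```python
import random

def elementary_step(rule, state):
    """Apply an elementary CA rule (given by Wolfram number) with periodic boundaries."""
    n = len(state)
    out = []
    for i in range(n):
        # Neighborhood (s_{i-1}, s_i, s_{i+1}) read as a 3-bit index into the rule table.
        idx = 4 * state[i - 1] + 2 * state[i] + state[(i + 1) % n]
        out.append((rule >> idx) & 1)
    return out

def is_additive(rule, n, trials=100):
    """Test X(a + b) == X(a) + X(b) (mod 2) on random pairs of n-cell states."""
    for _ in range(trials):
        a = [random.randint(0, 1) for _ in range(n)]
        b = [random.randint(0, 1) for _ in range(n)]
        s = [(x + y) % 2 for x, y in zip(a, b)]       # site-wise sum mod 2
        lhs = elementary_step(rule, s)
        rhs = [(x + y) % 2 for x, y in
               zip(elementary_step(rule, a), elementary_step(rule, b))]
        if lhs != rhs:
            return False
    return True
```

Note the failure mode for rule 165 is exactly the one computed in the text: its two sides always differ by the all-ones configuration, so any single trial detects it.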

Addition of states is carried out site-wise mod(p) on the matrix representations of μ and β; for example, for a one-dimensional six-site lattice with p = 3, the sum of 120112 and 021212 is 111021. The definition of additivity given in [52] differs slightly from this standard definition. There, a binary-valued cellular automaton is called “linear” if its local rule only involves the XOR operation and “additive” if it involves XOR and/or XNOR. A rule involving XNOR can be written as the binary complement of a rule involving only XOR. In terms of the global operator of the rule, this means that it has the form 1 + X, where X satisfies Eq. (3) and 1 represents the rule that maps every site to 1. Thus, (1 + X)(μ + β) equals 1…1 + X(μ + β), while

    (1 + X)(μ) + (1 + X)(β) = 1…1 + 1…1 + X(μ) + X(β) = X(μ) + X(β) mod(2).

In what follows, an additive rule is defined strictly as one obeying Eq. (3), corresponding to rules that are “linear” in [52]. Much of the formal study of cellular automata has focused on the properties and forms of representation of the map X : E(A; L) → E(A; L). The structure of the state transition diagram (STD(X)) of this map is of particular interest.

Example 1 (Continuous Transformations of the Shift Dynamical System) Let L be isomorphic to the set of integers Z. Then E(A; Z) is the set of infinite sequences with entries from A. With the product topology induced by the discrete topology on A and σ as the left shift map, the system (E(A; Z), σ) is the shift dynamical system on A. The set of cellular automata maps X : E(A; Z) → E(A; Z) constitutes the class of continuous shift-commuting transformations of (E(A; Z), σ), a fundamental result of Hedlund [73].

Example 2 (Elementary Cellular Automata) Let L be isomorphic to Z with A = {0, 1} and N = {−1, 0, 1}. The neighborhood of site s_i is {s_{i−1}, s_i, s_{i+1}} and E_N(A) = {000, 001, 010, 011, 100, 101, 110, 111}. In this one-dimensional case, the rule table can be written as x_i = x(i_0 i_1 i_2), where i_0 i_1 i_2 is the binary form of the index i. Listing this gives the standard form for the rule table of an elementary cellular automaton:

    000  001  010  011  100  101  110  111
    x_0  x_1  x_2  x_3  x_4  x_5  x_6  x_7

The standard labeling scheme for elementary cellular automata was introduced by Wolfram [74], who observed that the rule table for elementary rules defines the binary number Σ_{i=0}^{7} x_i 2^i and used this number to label the corresponding rule.

Example 3 (The Game of Life) This simple 2-dimensional cellular automaton was invented by John Conway to illustrate a self-reproducing system. It was first presented in 1970 by Martin Gardner [75,76]. The game takes place on a square lattice, either infinite or toroidal. The neighborhood of a cell consists of the eight cells surrounding it. The alphabet is {0, 1}: a 1 in a cell indicates that the cell is alive, a 0 indicates that it is dead. The update rules are: (a) if a cell contains a 0, it remains 0 unless exactly three of its neighbors contain a 1; (b) if a cell contains a 1, then it remains a 1 if and only if two or three of its neighbors are 1. This cellular automaton produces a number of interesting patterns, including a variety of fixed points (still lifes); oscillators (period 2); and moving patterns (gliders, spaceships); as well as more exotic patterns such as glider guns, which generate a stream of glider patterns.

Additive Cellular Automata in One Dimension

Much of the work on cellular automata has focused on rules in one dimension (d = 1). This section reviews some of this work.
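Rules (a) and (b) of Example 3 above translate directly into code. The following sketch (helper names are my own) performs one Life update on a toroidal lattice and exhibits the period-2 “blinker” oscillator:

```python
def life_step(grid):
    """One Game of Life update on a toroidal square lattice (list of lists of 0/1)."""
    n, m = len(grid), len(grid[0])
    new = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Count the eight neighbors, with periodic wrap-around.
            live = sum(grid[(i + di) % n][(j + dj) % m]
                       for di in (-1, 0, 1) for dj in (-1, 0, 1)
                       if (di, dj) != (0, 0))
            if grid[i][j] == 1:
                new[i][j] = 1 if live in (2, 3) else 0   # rule (b): survival
            else:
                new[i][j] = 1 if live == 3 else 0        # rule (a): birth
    return new

# A horizontal blinker: one step turns it vertical, and a second step
# restores it, so it is a period-2 oscillator.
blinker = [[0] * 5 for _ in range(5)]
blinker[2][1] = blinker[2][2] = blinker[2][3] = 1
```

Applying `life_step` twice to `blinker` returns the original grid, while a single step does not.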

Additive Cellular Automata

Boundary Conditions and Additivity

In the case of one-dimensional cellular automata, the lattice L can be isomorphic to the integers; to the non-negative integers; to the finite set {0, …, n−1} ⊂ Z; or to the integers modulo an integer n. In the first case, there are no boundary conditions; in the remaining three cases, different boundary conditions apply. If L is isomorphic to Z_n, the integers mod n, the boundary conditions are periodic and the lattice is circular (a necklace). This is called a cylindrical cellular automaton [77] because evolution of the rule can be represented as taking place on a cylinder. If the lattice is isomorphic to {0, …, n−1}, null, or Dirichlet, boundary conditions are set [78,79,80]. That is, the symbol assigned to all sites in L outside of this set is the null symbol. When the lattice is isomorphic to the non-negative integers Z⁺, null boundary conditions are set at the left boundary. In these latter two cases, the neighborhood structure assumed may influence the need for null conditions.

Example 4 (Elementary Rule 90) Let δ represent the global map for the elementary cellular automaton rule 90, with rule table

000 → 0    001 → 1    010 → 0    011 → 1
100 → 1    101 → 0    110 → 1    111 → 0
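This table is additive: each output bit is the sum modulo 2 of the two outer neighbors. A sketch of one rule-90 step under the two kinds of boundary conditions just discussed (periodic and null):

```python
def rule90_step(config, periodic=True):
    """One step of elementary rule 90: new[i] = (left + right) mod 2.
    Periodic boundaries wrap around; null boundaries read 0 outside."""
    n = len(config)
    def at(i):
        if periodic:
            return config[i % n]
        return config[i] if 0 <= i < n else 0   # null (Dirichlet) boundary
    return [(at(i - 1) + at(i + 1)) % 2 for i in range(n)]

assert rule90_step([0, 0, 1, 0, 0]) == [0, 1, 0, 1, 0]
# Null boundaries differ from periodic ones at the edges:
assert rule90_step([1, 0, 0, 0, 1], periodic=False) == [0, 1, 0, 1, 0]
assert rule90_step([1, 0, 0, 0, 1], periodic=True) == [1, 1, 0, 1, 1]
```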

For a binary string σ in Z or Z_n, the action of rule 90 is defined by [δ(σ)]_i = σ_{i−1} + σ_{i+1} mod 2, where all indices are taken mod n in the case of Z_n. In the remaining cases, the null boundary conditions supply the missing neighbors; for example, at the left boundary [δ(σ)]_0 = σ_1 mod 2.

K(w|v) < K_φ(w|v) + c.

Proof Let z be a shortest program for x knowing v, and let z' be a shortest program for y knowing ⟨v, x⟩. As ⟨z; z'⟩ is a program for ⟨x, y⟩ knowing v with representation system φ, one has

K(⟨x, y⟩|v) < K_φ(⟨x, y⟩|v) + c ≤ |⟨z; z'⟩| + c ≤ K(x|v) + K(y|⟨v, x⟩) + 2 log₂(K(x|v)) + 3 + c,

which completes the proof. □

Of course, this inequality is not an equality: if x = y, the complexity K(⟨x, y⟩) is the same as K(x) up to some constant. When the equality holds, it is said that x and y have no mutual information. Note that if we can represent the combination of the programs z and z' in a shorter way, the upper bound is decreased by the same amount. For instance, if we choose ⟨x; y⟩ = ℓ(ℓ(x))² 01 ℓ(x) x y, it becomes

2 log₂ log₂ |x| + log₂ |x| + |x| + |y|.

If we know that a program never contains a special word u, then with ⟨x; y⟩ = x u y it becomes |x| + |y|.

Examples The properties seen so far allow one to prove most of the relations between complexities of words. For instance, choosing f : w ↦ ww and g the identity in Corollary 1, we obtain that there is a constant c such that |K(ww|v) − K(w|v)| < c. In the same way, letting f be the identity and g be defined by g(x) = ε in Theorem 3, we get that there is a constant c such that K(w|v) < K(w) + c. By Theorem 4, Corollary 1, and choosing f as ⟨x, y⟩ ↦ xy and g as the identity, we have that there is a constant c such that

K(xy) < K(x) + K(y) + 2 min(log₂(K(x)), log₂(K(y))) + c.

The next result proves the existence of incompressible words; recall that a word w is c-incompressible if K(w) ≥ |w| − c.

Proposition 2 For any c ∈ R⁺, there are at least 2^{n+1} − 2^{n−c} c-incompressible words w such that |w| ≤ n.

Proof Each program produces only one word; therefore there cannot be more words whose complexity is below n than the number of programs of size less than n. Hence, one finds

|{w : K(w) < n}| ≤ 2^n − 1.

This implies that

|{w : K(w) < |w| − c and |w| ≤ n}| ≤ |{w : K(w) < n − c}| < 2^{n−c},

and, since

{w : K(w) ≥ |w| − c and |w| ≤ n} ⊇ {w : |w| ≤ n} \ {w : K(w) < |w| − c and |w| ≤ n},

it holds that

|{w : K(w) ≥ |w| − c and |w| ≤ n}| ≥ 2^{n+1} − 2^{n−c}. □

From this proposition one deduces that half of the words whose size is less than or equal to n reach the upper bound of their Kolmogorov complexity. The incompressibility method relies on the fact that most words are incompressible.

Martin–Löf Tests

Martin–Löf tests give another, equivalent definition of random sequences. The idea is simple: a sequence is random if it does not pass any computable test of singularity (i.e.,

Algorithmic Complexity and Cellular Automata

a test that selects words from a "negligible" recursively enumerable set). As for Kolmogorov complexity, this notion needs a proper definition and implies a universal test. However, in this review we prefer to express tests in terms of Kolmogorov complexity. (We refer the reader to [13] for more on Martin–Löf tests.)

Let us start with an example. Consider the test "to have the same number of 0s and 1s." Define the set E = {w : w has the same number of 0s and 1s} and order it in military order (length first, then lexicographic order), E = {e_0, e_1, …}. Consider the computable function f : x ↦ e_x (which is in fact computable for any decidable set E). Note that there are C(2n, n) words in E of length 2n. Thus, if e_x has length 2n, then x is less than C(2n, n), whose logarithm is 2n − log₂ √(πn) + O(1). Then, using Theorem 3 with the function f, one finds that

K(e_x) < K(x) < log₂ x < |e_x| − log₂ √(|e_x|/2),

up to an additive constant. We conclude that all members e_x of E fail to be c-incompressible whenever they are long enough. This notion of test corresponds to "belonging to a small set" in Kolmogorov complexity terms. The next proposition formalizes these ideas.

Proposition 3 Let E be a recursively enumerable set such that |E ∩ {0,1}^n| = o(2^n). Then, for all constants c, there is an integer M such that no word of E of length greater than M is c-incompressible.

Proof In this proof, we represent integers as binary strings. Let ⟨x; y⟩ be defined, for integers x and y, by

⟨x; y⟩ = ℓ(x)² 01 x y.

Let f be the computable function

f : {0,1}* → {0,1}*,  ⟨x; y⟩ ↦ the yth word of length x in the enumeration of E.

From Theorems 2 and 3, there is a constant d such that for all words e of E,

K(e) < |f⁻¹(e)| + d.

Write u_n = |E ∩ {0,1}^n|. From the definition of the function f one has

|f⁻¹(e)| ≤ 3 + 2 log₂(|e|) + log₂(u_{|e|}).

As u_n = o(2^n), log₂(u_{|e|}) − |e| tends to −∞, so there is an integer M such that when |e| > M, K(e) < |f⁻¹(e)| + d < |e| − c. This proves that no member of E whose length is greater than M is c-incompressible. □
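The compression exploited in the balanced-words example is effective and can be checked by brute force for small lengths: replacing a word of E by its rank in the military order already saves bits. A sketch (exact enumeration; the helper names are ours):

```python
from itertools import product

def balanced_words(length):
    """Words with equally many 0s and 1s, in lexicographic order."""
    return ["".join(w) for w in product("01", repeat=length)
            if w.count("0") == w.count("1")]

def rank_in_E(word):
    """Index of `word` in the military (length, then lex) order of E."""
    rank = sum(len(balanced_words(m)) for m in range(0, len(word), 2))
    return rank + balanced_words(len(word)).index(word)

# Every balanced word of length 16 is described by a rank of fewer
# than 16 bits: membership in E alone already compresses it.
for w in balanced_words(16)[::500]:        # sample a few
    assert rank_in_E(w).bit_length() < len(w)
```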

Dynamical Systems and Symbolic Factors

A (discrete-time) dynamical system is a structure ⟨X, f⟩ where X is the set of states of the system and f is a function from X to itself that, given a state, tells which is the next state of the system. In the literature, the state space is usually assumed to be a compact metric space and f a continuous function. The study of the asymptotic properties of a dynamical system is, in general, difficult. Therefore, it can be interesting to associate the studied system with a simpler one and deduce the properties of the original system from the simpler one. Indeed, under the compactness assumption, one can associate each system ⟨X, f⟩ with its symbolic factor as follows. Consider a finite open covering {β_0, β_1, …, β_k} of X. Label each set β_i with a symbol α_i. For any orbit of initial condition x ∈ X, we build an infinite word w_x on {α_0, α_1, …, α_k} such that for all i ∈ N, w_x(i) = α_j if f^i(x) ∈ β_j for some j ∈ {0, 1, …, k}. If f^i(x) belongs to β_j ∩ β_h (for j ≠ h), then arbitrarily choose either α_j or α_h. Denote by S_x the set of infinite words associated with the initial condition x ∈ X and set S = ∪_{x∈X} S_x. The system ⟨S, σ⟩ is the symbolic system associated with ⟨X, f⟩; σ is the shift map defined as ∀x ∈ S, ∀i ∈ N, σ(x)_i = x_{i+1}. When {β_0, β_1, …, β_k} is a clopen partition, then ⟨S, σ⟩ is a factor of ⟨X, f⟩ (see [12] for instance).

Dynamical systems theory has made great strides in recent decades, and a huge quantity of new systems have appeared. As a consequence, scientists have tried to classify them according to different criteria. From interactions among physicists was born the idea that, in order to simulate a certain physical phenomenon, one should use a dynamical system whose complexity is not higher than that of the phenomenon under study. The problem is the meaning of the word "complexity"; in general it has a different meaning for each scientist.
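To make the construction concrete, here is a sketch of the symbolic coding for one classical system, the doubling map f(x) = 2x mod 1 with the two-set partition β_0 = [0, 1/2), β_1 = [1/2, 1); the system and partition are illustrative choices, not taken from the text:

```python
from fractions import Fraction  # exact arithmetic avoids float drift

def symbolic_orbit(x, steps):
    """Code the orbit of the doubling map f(x) = 2x mod 1 with the
    partition [0, 0.5) -> '0', [0.5, 1) -> '1'."""
    word = []
    for _ in range(steps):
        word.append("0" if x < Fraction(1, 2) else "1")
        x = (2 * x) % 1
    return "".join(word)

# A rational initial point yields an eventually periodic symbolic word,
# mirroring the eventual periodicity of its binary expansion.
assert symbolic_orbit(Fraction(1, 3), 8) == "01010101"
assert symbolic_orbit(Fraction(1, 4), 4) == "0100"
```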
From a computer science point of view, if we look at the state space of factor systems, they can be seen as sets of bi-infinite words on a fixed alphabet. Hence, each factor ⟨S, σ⟩ can be associated with a language as follows. For a pair of finite words u, w, write u ⊑ w if u is a factor of w. Given a finite word u and a bi-infinite word v, with an abuse of notation we write u ⊑ v if u occurs as a factor in v. The language L(v) associated with a bi-infinite word v ∈ A^Z is defined as

L(v) = {u ∈ A* : u ⊑ v}.

Finally, the language L(S) associated with the symbolic factor ⟨S, σ⟩ is given by

L(S) = {u ∈ A* : u ⊑ v for some v ∈ S}.


The idea is that the complexity of the system ⟨X, f⟩ is proportional to the language complexity of its symbolic factor ⟨S, σ⟩ (see, for example,  Topological Dynamics of Cellular Automata). In [4,5], Brudno proposes to evaluate the complexity of symbolic factors using Kolmogorov complexity. Indeed, the complexity of the orbit of initial condition x ∈ X according to a finite open covering {β_0, β_1, …, β_k} is defined as

K(x, f, {β_0, β_1, …, β_k}) = lim sup_{n→∞} min_{w∈S_x} K(w_{0:n}) / n.

Finally, the complexity of the orbit x is given by K(x, f) = sup K(x, f, β), where the supremum is taken w.r.t. all possible finite open coverings β of X. Brudno has proven the following result.

Theorem 5 ([4]) Consider a dynamical system ⟨X, f⟩ and an ergodic measure μ. For μ-almost all x ∈ X, K(x, f) = H_μ(f), where H_μ(f) is the measure entropy of ⟨X, f⟩ (for more on the measure entropy see, for example,  Ergodic Theory of Cellular Automata).

Cellular Automata

Cellular automata (CA) are (discrete-time) dynamical systems that have received increased attention in the last few decades as formal models of complex systems based on local interaction rules. This is mainly due to their great variety of distinct dynamical behaviors and to the possibility of easily performing large-scale computer simulations. In this section we quickly review the definitions and useful results which are necessary in the sequel.

Consider the set of configurations C, which consists of all functions from Z^D into A. The space C is usually equipped with the Cantor metric d_C defined as

∀a, b ∈ C,  d_C(a, b) = 2^{−n}, with n = min_{v∈Z^D} {‖v‖_∞ : a(v) ≠ b(v)},   (3)

where ‖v‖_∞ denotes the maximum of the absolute values of the components of v. The topology induced by d_C coincides with the product topology induced by the discrete topology on A. With this topology, C is a compact, perfect, and totally disconnected space.

Let N = {u_1, …, u_s} be an ordered set of vectors of Z^D and δ : A^s → A be a function.

Definition 5 (CA) The D-dimensional CA based on the local rule δ and the neighborhood frame N is the pair ⟨C, f⟩, where f : C → C is the global transition rule defined as follows:

∀c ∈ C, ∀v ∈ Z^D,  f(c)(v) = δ(c(v + u_1), …, c(v + u_s)).   (4)
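Equation (4) translates directly into code. A one-dimensional sketch with an arbitrary neighborhood frame, with a periodic boundary added only to make the lattice finite (an implementation choice; the definition lives on all of Z^D):

```python
def global_rule(local_rule, neighborhood, config):
    """Apply f(c)(v) = delta(c(v+u1), ..., c(v+us)) on a cyclic 1-D lattice.
    `neighborhood` is the ordered tuple of offsets (u1, ..., us)."""
    n = len(config)
    return tuple(local_rule(tuple(config[(v + u) % n] for u in neighborhood))
                 for v in range(n))

# Rule 90 as an instance: N = (-1, +1), delta = XOR.
xor = lambda cells: cells[0] ^ cells[1]
assert global_rule(xor, (-1, 1), (0, 0, 1, 0, 0)) == (0, 1, 0, 1, 0)

# The shift map sigma as a CA: N = (+1,), delta = identity.
ident = lambda cells: cells[0]
assert global_rule(ident, (1,), (1, 2, 3, 4)) == (2, 3, 4, 1)
```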

Note that the mapping f is (uniformly) continuous with respect to d_C. Hence, the pair ⟨C, f⟩ is a proper (discrete-time) dynamical system. In [7], a new metric on the phase space was introduced to better match the intuitive idea of chaotic behavior with its mathematical formulation. More results in this research direction can be found in [1,2,3,15]. This volume dedicates a whole chapter to the subject ( Dynamics of Cellular Automata in Non-compact Spaces). Here we simply recall the definition of the Besicovitch distance since it will be used in Subsect. "Example 2".

Consider the Hamming distance between two words u, v on the same alphabet A, #(u, v) = |{i ∈ N : u_i ≠ v_i}|. This distance can be easily extended to work on factors of bi-infinite words as follows:

#_{h,k}(u, v) = |{i ∈ [h, k] : u_i ≠ v_i}|,

where h, k ∈ Z and h < k. Finally, the Besicovitch pseudodistance is defined for any pair of bi-infinite words u, v as

d_B(u, v) = lim sup_{n→∞} #_{−n,n}(u, v) / (2n + 1).
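The lim sup cannot be computed, but the density quotient inside it can be evaluated on any finite window; a sketch of the estimate #_{−n,n}(u, v)/(2n + 1) (a numerical approximation, not the pseudodistance itself):

```python
def besicovitch_estimate(u, v, n):
    """Density of disagreements of two bi-infinite words on [-n, n].
    u and v are functions from Z to symbols (e.g. lambdas)."""
    mismatches = sum(1 for i in range(-n, n + 1) if u(i) != v(i))
    return mismatches / (2 * n + 1)

a = lambda i: 0                      # the all-zero configuration
b = lambda i: 1 if i == 0 else 0     # differs in a single cell
c = lambda i: i % 2                  # differs on every other cell

# A finite number of differences has density tending to 0 ...
assert besicovitch_estimate(a, b, 1000) == 1 / 2001
# ... while periodic disagreement keeps density near 1/2.
assert abs(besicovitch_estimate(a, c, 1000) - 0.5) < 0.01
```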

The pseudodistance d_B can be turned into a distance by taking its restriction to the quotient A^Z/≈, where ≈ is the relation of "being at null d_B distance." Roughly speaking, d_B measures the upper density of differences between two bi-infinite words.

The space-time diagram Θ is a graphical representation of a CA orbit. Formally, for a D-dimensional CA f with state set A, Θ is a function from A^{Z^D} × N × Z^D to A defined as Θ(x, i, j) = f^i(x)_j, for all x ∈ A^{Z^D}, i ∈ N, and j ∈ Z^D. The limit set Ω_f captures the long-term behavior of a CA on a given set U and is defined as follows:

Ω_f(U) = ∩_{n∈N} f^n(U).

Unfortunately, any nontrivial property of the CA limit set is undecidable [11]. Other interesting information on a CA's long-term behavior is given by the orbit limit set Λ, defined as follows:

Λ_f(U) = ∪_{u∈U} (O_f(u))',


where H' denotes the set of adherence points of H. Note that in general the limit set and the orbit limit set give different information. For example, consider a rich configuration c, i.e., a configuration containing all possible finite patterns (here we adopt the terminology of [6]), and the shift map σ. Then Λ_σ({c}) = A^Z, while Ω_σ({c}) is countable.

Algorithmic Complexity as a Demonstration Tool

The incompressibility method is a valuable formal tool to decrease the combinatorial complexity of problems. It is essentially based on the following ideas (see also Chap. 6 in [13]):

– Incompressible words cannot be produced by short programs. Hence, if one has an incompressible (infinite) word, it cannot be algorithmically obtained.
– Most words are incompressible. Hence, a word taken at random can usually be considered to be incompressible without loss of generality.
– If one "proves" a recursive property on incompressible words, then, by Proposition 3, we have a contradiction.

Application to Cellular Automata

A portion of a space-time diagram of a CA can be computed given its local rule and the initial condition. The part of the diagram that depends only upon a given portion of the initial condition has at most the complexity of this portion (Fig. 1). This fact often implies strong computational dependencies between the initial part and the final part of the portion of the diagram: if the final part has high complexity, then the initial part must be at least as complex. Using this basic idea, proofs are structured as follows: assume that the final part has high complexity; use the hypothesis to prove that the initial part is not complex; then we have a contradiction. This technique provides a faster and clearer way to prove results that could otherwise be obtained by technical combinatorial arguments. The first example illustrates this fact by rewriting a combinatorial proof in terms of Kolmogorov complexity. The second one is a result that was directly stated in terms of Kolmogorov complexity.

Example 1 Consider the following result about languages recognizable by CA.

Proposition 4 ([17]) The language L = {uvu : u, v ∈ {0,1}*, |u| ≥ 1} is not recognizable by a real-time one-way CA.

Algorithmic Complexity and Cellular Automata, Figure 1 States in the gray zone can be computed from the states of the black line: K(gray zone) ≤ K(black line) up to an additive constant

Before giving the proof in terms of Kolmogorov complexity, we must recall some concepts. A language L is accepted in real time by a CA if each word x of L is accepted after |x| transitions. An input word is accepted by a CA if a distinguished cell is in an accepting state after the prescribed number of transitions. A CA is one-way if the neighborhood is {0, 1} (i.e., the central cell and the one on its right).

Proof One-way CA have some interesting properties. First, from a one-way CA recognizing L, a computer can build a one-way CA recognizing L_Σ = {uvu : u, v ∈ Σ*, |u| ≥ 1}, where Σ is any finite alphabet. The idea is to code the ith letter of Σ by 1u_0 1u_1 … 1u_k 00, where u_0 u_1 … u_k is i written in binary, and to apply the former CA. Second, for any one-way CA of global rule f we have that f^{|w|}(xwy) = f^{|w|}(xw) f^{|w|}(wy) for all words x, y, and w.

Now, assume that a one-way CA recognizes L in real time. Then there is an algorithm F that, for any integer n, computes a local rule for a one-way CA that recognizes L_{{0,…,n−1}} in real time. Fix an integer n and choose a 0-incompressible word w of length n(n − 1). Let A be the set of pairs (x, y) with x, y ∈ {0, …, n−1} and x ≠ y defined by

A = {(x, y) : the (nx + y)th bit of w is 1}.

Since A can be computed from w and vice versa, one finds that K(A) = K(w) = |w| = n(n − 1) up to an additive constant that does not depend on n. Order the set A = {(x_0, y_0), (x_1, y_1), …} and build a new word u as follows:

u_0 = y_0 x_0,  u_{i+1} = u_i y_{i+1} x_{i+1} u_i,  u = u_{|A|−1}.


From Lemma 1 in [17], one finds that for all x, y ∈ {0, …, n−1}, (x, y) ∈ A ⇔ xuy ∈ L. Let δ be the local rule produced by F(n), Q the set of final states, and f the associated global rule. Since

f^{|u|}(xuy) = f^{|u|}(xu) f^{|u|}(uy),

one has that

(x, y) ∈ A ⇔ xuy ∈ L ⇔ δ(f^{|u|}(xu), f^{|u|}(uy)) ∈ Q.

Hence, from the knowledge of each f^{|u|}(xu) and f^{|u|}(uy) for x, y ∈ {0, …, n−1}, a list of 2n integers of {0, …, n−1}, one can compute A. We conclude that K(A) ≤ 2n log₂(n) up to an additive constant that does not depend on n. This contradicts K(A) = n(n − 1) for large enough n. □

Example 2 Notation: given x ∈ A^{Z^D}, we denote by x_{→n} the word w ∈ S^{(2n+1)^D} obtained by taking all the states x_i for i in the centered hypercube of side 2n + 1, in the military order. The next example shows a proof that directly uses the incompressibility method.

Theorem 6 In the Besicovitch topological space there is no transitive CA.

Recall that a CA f is transitive if for all nonempty open sets A and B there exists n ∈ N such that f^n(A) ∩ B ≠ ∅.

Proof By contradiction, assume that there exists a transitive CA f of radius r with C = |S| states. Let x and y be two configurations such that

∀n ∈ N,  K(x_{→n} | y_{→n}) ≥ n/2.

A simple counting argument proves that such configurations x and y always exist. Since f is transitive, there are two configurations x' and y' such that for all n ∈ N

#(x_{→n}, x'_{→n}) ≤ 4εn,   (5)

#(y_{→n}, y'_{→n}) ≤ 4δn,

and an integer u (which only depends on ε and δ) such that

f^u(y') = x',   (6)

where ε = δ = (4e^{10 log₂ C})^{−1}. In what follows only n varies, while C, u, x, y, x', y', δ, and ε are fixed and independent of n. By Eq. (6), one may compute the word x'_{→n} from the following items:

– y'_{→n}, f, u, and n;
– the two words of y' of length ur that surround y'_{→n} and that are missing to compute x'_{→n} with Eq. (6).

We obtain that

K(x'_{→n} | y'_{→n}) ≤ 2ur + K(u) + K(n) + K(f) + O(1) ≤ o(n)   (7)

(the notations O and o are defined with respect to n). Note that n is fixed, and hence K(n) is a constant bounded by log₂ n. Similarly, r and S are fixed, and hence K(f) is constant and bounded by C^{2r+1} log₂ C + O(1).

Let us evaluate K(y'_{→n} | y_{→n}). Let a_1, a_2, a_3, …, a_k be the positive positions at which y_{→n} and y'_{→n} differ, sorted in increasing order. Let b_1 = a_1 and b_i = a_i − a_{i−1} for 2 ≤ i ≤ k. By Eq. (5) we know that k ≤ 4δn. Note that Σ_{i=1}^{k} b_i = a_k ≤ n. Symmetrically, let a'_1, a'_2, a'_3, …, a'_{k'} be the absolute values of the strictly negative positions at which y_{→n} and y'_{→n} differ, sorted in increasing order. Let b'_1 = a'_1 and b'_i = a'_i − a'_{i−1} for 2 ≤ i ≤ k'. Equation (5) states that k' ≤ 4δn. Since the logarithm is a concave function, one has

(Σ ln b_i)/k ≤ ln(Σ b_i / k) ≤ ln(n/k),

and hence

Σ ln b_i ≤ k ln(n/k),   (8)

which also holds for the b'_i and k'. Knowledge of the b_i, the b'_i, and the k + k' states of the cells at which y_{→n} differs from y'_{→n} is enough to compute y'_{→n} from y_{→n}. Hence,

K(y'_{→n} | y_{→n}) ≤ Σ ln(b_i) + Σ ln(b'_i) + (k + k') log₂ C + O(1).

Equation (8) states that

K(y'_{→n} | y_{→n}) ≤ k ln(n/k) + k' ln(n/k') + (k + k') log₂ C + O(1).

The function k ↦ k ln(n/k) is increasing on [0, n/e]. As k ≤ 4δn ≤ n/e^{10 log₂ C}, we have that

k ln(n/k) ≤ 4δn ln(n/(4δn)) = (n/e^{10 log₂ C}) ln(e^{10 log₂ C}) ≤ 10n log₂ C / e^{10 log₂ C},

and that

(k + k') log₂ C ≤ 2n log₂ C / e^{10 log₂ C}.

Replacing a, b, and k by a', b', and k', the same sequence of inequalities leads to a similar result. One deduces that

K(y'_{→n} | y_{→n}) ≤ (2 log₂ C + 20)n / e^{10 log₂ C} + O(1).   (9)

Similarly, Eq. (9) also holds for K(x_{→n} | x'_{→n}). The triangular inequality for Kolmogorov complexity, K(a|b) ≤ K(a|c) + K(b|c) + O(1) (a consequence of Theorems 3 and 4), gives

K(x_{→n} | y_{→n}) ≤ K(x_{→n} | x'_{→n}) + K(x'_{→n} | y'_{→n}) + K(y'_{→n} | y_{→n}) + O(1).

Equations (9) and (7) allow one to conclude that

K(x_{→n} | y_{→n}) ≤ (2 log₂ C + 20)n / e^{10 log₂ C} + o(n).

The hypothesis on x and y was K(x_{→n} | y_{→n}) ≥ n/2. This implies that

n/2 ≤ (2 log₂ C + 20)n / e^{10 log₂ C} + o(n).

The last inequality is false for big enough n. □

Measuring CA Structural Complexity

Another use of Kolmogorov complexity in the study of CA is to understand the maximum complexity they can produce, by extracting examples of CA that show high-complexity characteristics. The question is to define the meaning of "show high-complexity characteristics" and, more precisely, which characteristic to consider. This section is devoted to structural characteristics of CA, that is to say, complexity that can be observed through static particularities.

The Case of Tilings

In this section, we give the original example, which was given for tilings, often considered as a static version of CA. In [10], Durand et al. construct a tile set whose tilings all have maximal complexity. This paper contains two main results. The first one is an upper bound for the complexity of tilings, and the second one is an example of a tiling that reaches this bound. First we recall the definitions about tilings of the plane by Wang tiles.

Definition 6 (Tilings with Wang tiles) Let C be a finite set of colors. A Wang tile is a quadruplet (n, s, e, w) of four colors from C corresponding to a square tile whose top color is n, left color is w, right color is e, and bottom color

is s. A Wang tile cannot be rotated, but it can be used any number of times. Given a set of Wang tiles T, we say that the plane can be tiled by T if one can place tiles from T on the square grid Z² such that adjacent borders of neighboring tiles have the same color. A set of tiles T that can tile the plane is called a palette.

The notion of local constraint gives a point of view closer to CA than tilings. Roughly speaking, it gives the local constraints that a tiling of the plane using 0 and 1 must satisfy. Note that this notion can be defined on any alphabet, but we can equivalently code any letter with 0 and 1.

Definition 7 (Tilings by local constraints) Let r be a positive integer called the radius. Let C be a set of square patterns of size 2r + 1 made of 0 and 1 (formally, functions from ⟦−r, r⟧² to {0, 1}). The set is said to tile the plane if there is a way to put zeros and ones on the 2-D grid (formally, a function from Z² to {0, 1}) whose patterns of size 2r + 1 are all in C. The possible layouts of zeros and ones on the (2-D) grid are called the tilings acceptable for C.

Seminal papers on this subject used Wang tiles. We translate these results in terms of local constraints in order to more smoothly apply them to CA. Theorem 7 proves that, among the tilings acceptable for a local constraint, there is always one that is not too complex. Note that this is a meaningful bound, since the radius-1 constraint that allows all possible patterns accepts all tilings, in particular tilings of high complexity.

Theorem 7 ([10]) Let C be a local constraint. There is a tiling acceptable for C such that the Kolmogorov complexity of its central pattern of size n is O(n) (recall that the maximal complexity for a square pattern n × n is O(n²)).
Proof The idea is simple: if one knows the bits present in a border of width r of a pattern of size n, there are finitely many possibilities to fill the interior, so the first one in any computable order (for instance, lexicographically when putting all horizontal lines one after the other) has at most the complexity of the border, since an algorithm can enumerate all possible fillings and take the first one in the chosen order. Then, if one knows the bits in the borders of width r of all central square patterns of size 2^n for all positive integers n (Fig. 2), one can recursively compute for each gap the first possible filling for a given computable order. The tiling obtained in this way (this actually defines a tiling since all cells are eventually assigned a value) has the required complexity: in order to compute the central pattern of size n,


Algorithmic Complexity and Cellular Automata, Figure 2 Nested squares of size 2^k and border width r

the algorithm simply needs all the borders of the squares of size 2^k with 2^k ≤ 2n, which have total length at most O(n). □

The next result proves that this bound is almost reachable.

Theorem 8 Let r be any computable, monotone, and unbounded function. There exists a local constraint C_r such that for all tilings acceptable for C_r, the complexity of the central square pattern of size n is Ω(n/r(n)).

The original statement does not have the Ω part. However, note that the simulation of Wang tiles by a local constraint uses a square of size ℓ = ⌈√(log k)⌉ to simulate a single tile from a set of k Wang tiles; a square pattern of size n in the tiling corresponds to a square pattern of size nℓ in the original tiling. The function r can grow very slowly (for instance, like the inverse of the Ackermann function), provided it grows monotonically to infinity and is computable.

Proof The proof is rather technical, and we only give a sketch. The basic idea consists in taking the tiling constructed by Robinson [16] in order to prove that it is undecidable to test whether a local constraint can tile the plane. This tiling is self-similar and is represented in Fig. 3. As it contains increasingly larger squares that occur periodically (note that the whole tiling is not periodic but only quasiperiodic), one can perform more and more computation steps within these squares (the periodicity is required to be sure that the squares are present).
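Acceptability in the sense of Definition 7 can be checked mechanically on finite layouts: an array is locally admissible when every (2r + 1) × (2r + 1) pattern it contains belongs to C. A sketch for radius 1 with a toy constraint set (real constraints such as C_r above are far larger):

```python
def patterns(grid, r=1):
    """All (2r+1) x (2r+1) sub-patterns of a 2-D tuple-of-tuples grid."""
    size = 2 * r + 1
    h, w = len(grid), len(grid[0])
    for y in range(h - size + 1):
        for x in range(w - size + 1):
            yield tuple(row[x:x + size] for row in grid[y:y + size])

def acceptable(grid, constraint, r=1):
    """True if every contained pattern lies in the constraint set."""
    return all(p in constraint for p in patterns(grid, r))

checker = {((0, 0, 0),) * 3}          # only the all-zero 3x3 pattern allowed
zeros = ((0,) * 4,) * 4
ones_corner = ((1, 0, 0, 0),) + ((0,) * 4,) * 3
assert acceptable(zeros, checker)
assert not acceptable(ones_corner, checker)
```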

Algorithmic Complexity and Cellular Automata, Figure 3 Robinson’s tiling

Using this tiling, Robinson can build a local constraint that simulates any Turing machine. Note that in the present case the situation is trickier than it seems, since some technical features must be ensured, such as the fact that a square must deal with the smaller squares inside it, or the constraints to add in order to make sure that the smaller squares


have the same input as the bigger one. Using the constraint that forbids the occurrence of any final state, one gets that the compatible tilings only simulate computations on inputs for which the machine does not halt. To finish the proof, Durand et al. build a local constraint C that simulates a special Turing machine that halts on inputs whose complexity is small. Such a Turing machine exists since, though Kolmogorov complexity is not computable, testing all programs from ε to 1^n allows a Turing machine to compute all words whose complexity is below n and to halt if it finds its input among them. Then all tilings compatible with C contain in each square an input on which the Turing machine does not halt, hence an input of high complexity. The function r occurs in the technical arguments since the computable zones do not grow as fast as the side length of the squares. □

The Case of Cellular Automata

As we have seen so far, one of the results of [10] is that a tile set always produces tilings whose central square patterns of size n have a Kolmogorov complexity of O(n), and not n² (which is the maximal complexity). In the case of CA, something similar holds for space-time diagrams. Indeed, if one knows the initial row, one can compute the triangular part of the space-time diagram that depends on it (see Fig. 1). Then, as with tilings, the complexity of an n × n square is the same as the complexity of its first line, i.e., O(n). However, unlike tilings, in CA there is no restriction on the initial configuration; hence every CA has simple space-time diagrams. Thus, in this case, Kolmogorov complexity is not of great help. One idea to improve the results could be to study how the complexity of configurations evolves during the application of the global rule. This aspect is particularly interesting with respect to dynamical properties. This is the subject of the next section.

Consider a CA f with radius r and local rule δ. Its orbit limit set cannot be empty. Indeed, let a be a state of f and consider the uniform configuration ωaω. Let s_a = δ(a, …, a); then f(ωaω) = ω(s_a)ω. Consider now the graph whose vertices are the states of the CA and whose edges are the pairs (a, s_a). Since each vertex has exactly one outgoing edge, the graph must contain a cycle a_0 → a_1 → ⋯ → a_k → a_0. Then each of the configurations ω(a_i)ω for 0 ≤ i ≤ k is in the orbit limit set, since f^{k+1}(ω(a_i)ω) = ω(a_i)ω. This simple fact proves that any orbit limit set (and any limit set) of a CA must contain at least one monochromatic configuration, whose complexity is low (for any reasonable definition). However, one can build a CA whose orbit limit set contains only complex configurations, except

for the mandatory monochromatic one, using the local-constraints technique discussed in the previous section.

Proposition 5 There exists a CA whose orbit limit set contains only complex configurations.

Proof Let r be any computable, monotone, and unbounded function and C_r the associated local constraint. Let A be the alphabet on which C_r is defined. Consider the 2-D CA f_r on the alphabet A ∪ {#} (we assume # ∉ A). Let r be the radius of f_r (the same radius as C_r). Finally, the local rule δ of f_r is defined as follows:

δ(P) = P_0 if P ∈ C_r, and δ(P) = # otherwise,

where P_0 denotes the center of the pattern P. Using this local rule, one can verify the following fact: if the configuration c is not acceptable for C_r, then (O(c))' = {ω#ω}; otherwise (O(c))' = {c}. Indeed, if c is acceptable for C_r, then f(c) = c; otherwise there is a position i such that the pattern of c centered at i is not valid for C_r. Then f(c)(i) = #. By simple induction, this means that all cells that are at a distance less than kr from position i become # after k steps. Hence, for all n > 0, after k ≥ (n + 2|i|)/r steps the Cantor distance between f^k(c) and ω#ω is less than 2^{−n}, i.e., O(c) tends to ω#ω. □

Measuring the Complexity of CA Dynamics

The results of the previous section have limited range, since they tell something about the quasi-complexity but nothing about the plain complexity of the limit set. To enhance our study we need to introduce some general concepts, namely, randomness spaces.

Randomness Spaces

Roughly speaking, a randomness space is a structure made of a topological space and a measure that helps in defining which points of the space are random. More formally, we can give the following.

Definition 8 ([6]) A randomness space is a structure ⟨X, B, μ⟩ where X is a topological space, B : N → 2^X a total numbering of a subbase of X, and μ a Borel measure.

Given a numbering B of a subbase of X, one can produce a numbering B' of a base as follows:

B'(i) = ∩_{j∈D(i+1)} B(j),

where D : N → {E : E ⊆ N and E is finite} is the bijection defined by D⁻¹(E) = Σ_{i∈E} 2^i. B' is called the base derived from the subbase B. Given two sequences of open sets (V_n)


and (U_n) of X, we say that (V_n) is U-computable if there exists a recursively enumerable set H ⊆ N such that

∀n ∈ N,  V_n = ∪_{i∈N, ⟨n,i⟩∈H} U_i,

where ⟨i, j⟩ = (i + j)(i + j + 1)/2 + j is the classical bijection between N² and N. Note that this bijection can be extended to N^D (for D > 1) as follows: ⟨x_1, x_2, …, x_k⟩ = ⟨x_1, ⟨x_2, …, x_k⟩⟩.

Definition 9 ([6]) Given a randomness space ⟨X, B, μ⟩, a randomness test on X is a B'-computable sequence (U_n) of open sets such that ∀n ∈ N, μ(U_n) ≤ 2^{−n}. Given a randomness test (U_n), a point x ∈ X is said to pass the test (U_n) if x ∈ ∩_{n∈N} U_n.

In other words, tests select points belonging to sets of null measure. The computability of the tests ensures the computability of the selected null-measure sets.

Definition 10 ([6]) Given a randomness space ⟨X, B, μ⟩, a point x ∈ X is nonrandom if x passes some randomness test. The point x ∈ X is random if it is not nonrandom.

Finally, note that for any D ≥ 1, ⟨A^{Z^D}, B, μ⟩ is a randomness space when setting

B(j + |A|⟨i_1, …, i_D⟩) = {c ∈ A^{Z^D} : c_{i_1,…,i_D} = a_j},

where A = {a_1, …, a_{|A|}} and μ is the classical product measure built from the uniform Bernoulli measure over A.

Theorem 9 ([6]) Consider a D-dimensional CA f. Then the following statements are equivalent:

1. f is surjective;
2. ∀c ∈ A^{Z^D}, if c is rich (i.e., c contains all possible finite patterns), then f(c) is rich;
3. ∀c ∈ A^{Z^D}, if c is random, then f(c) is random.

Theorem 10 ([6]) Consider a D-dimensional CA f. Then ∀c ∈ A^{Z^D}, if c is not rich, then f(c) is not rich.

Theorem 11 ([6]) Consider a 1-D CA f. Then ∀c ∈ A^Z, if c is nonrandom, then f(c) is nonrandom.

where A = {a_1, …, a_j, …, a_D} and μ is the classical product measure built from the uniform Bernoulli measure over A.

Theorem 9 ([6]) Consider a D-dimensional CA f. Then the following statements are equivalent:

1. f is surjective;
2. ∀c ∈ A^(Z^D), if c is rich (i.e., c contains all possible finite patterns), then f(c) is rich;
3. ∀c ∈ A^(Z^D), if c is random, then f(c) is random.

Theorem 10 ([6]) Consider a D-dimensional CA f. Then ∀c ∈ A^(Z^D), if c is not rich, then f(c) is not rich.

Theorem 11 ([6]) Consider a 1-D CA f. Then ∀c ∈ A^Z, if c is nonrandom, then f(c) is nonrandom.

Note that the result in Theorem 11 is proved only for 1-D CA; its generalization to higher dimensions is still an open problem.

Open problem 1 Do D-dimensional CA for D > 1 preserve nonrandomness?

From Theorem 9 we know that the property of preserving randomness (resp. richness) is related to surjectivity.

Hence, randomness (resp. richness) preservation is decidable in one dimension and undecidable in higher dimensions. The opposite relations are still open.

Open problem 2 Is nonrichness (resp. nonrandomness) a decidable property?

Algorithmic Distance

In this section we review an approach to the study of CA dynamics from the point of view of algorithmic complexity that is completely different from the one reported in the previous section. For more details on the algorithmic distance see ▶ Chaotic Behavior of Cellular Automata. In this new approach we define a new distance using Kolmogorov complexity, in such a way that two points x and y are near if it is "easy" to transform x into y, or vice versa, using a computer program. In this way, if a CA turns out to be sensitive to initial conditions, for example, then it is able to create new information. Indeed, we will see that this is not the case.

Definition 11 The algorithmic distance between x ∈ A^(Z^D) and y ∈ A^(Z^D) is defined as follows:

    d_a(x, y) = limsup_{n→∞} [ K(x_{→n} | y_{→n}) + K(y_{→n} | x_{→n}) ] / ( 2(2n + 1)^D ) ,

where x_{→n} denotes the central pattern of x of side 2n + 1.
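Kolmogorov complexity K is uncomputable, so d_a cannot be evaluated exactly. In practice, one can approximate this style of distance by replacing K with the length of a compressed description, in the spirit of the normalized compression distance of Li and Vitányi. The following is an illustration of that idea using zlib as a stand-in compressor, not an implementation of the definition above:

```python
# Approximating an algorithmic distance between two finite patterns by
# substituting a real compressor (zlib) for the uncomputable K, in the
# style of the normalized compression distance.
import random
import zlib

def C(s):
    return len(zlib.compress(s, 9))

def ncd(x, y):
    return (C(x + y) - min(C(x), C(y))) / max(C(x), C(y))

random.seed(0)
periodic = b'01' * 500                                     # highly regular pattern
noisy = bytes(random.getrandbits(8) for _ in range(1000))  # incompressible pattern

# A pattern is algorithmically close to itself, far from an unrelated one.
assert ncd(periodic, periodic) < ncd(periodic, noisy)
print(round(ncd(periodic, periodic), 2), round(ncd(periodic, noisy), 2))
```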

It is not difficult to see that d_a is only a pseudo-distance, since there are many pairs of points at null distance (those that differ only on a finite number of cells, for example). Consider the relation ≅ of being at null d_a distance, i.e., ∀x, y ∈ A^(Z^D), x ≅ y if and only if d_a(x, y) = 0. Then ⟨A^(Z^D)/≅, d_a⟩ is a metric space. Note that the definition of d_a does not depend on the chosen additively optimal universal description mode, since the additive constant disappears when dividing by 2(2n + 1)^D and taking the superior limit. Moreover, by Theorem 2, the distance is bounded by 1.

The following results summarize the main properties of this new metric space.

Theorem 12 (▶ Chaotic Behavior of Cellular Automata) The metric space ⟨A^(Z^D)/≅, d_a⟩ is perfect, pathwise connected, infinite dimensional, nonseparable, and noncompact.

Theorem 12 says that the new topological space has enough interesting properties to make the study of CA dynamics on it worthwhile.

The first interesting result obtained by this approach concerns surjective CA. Recall that surjectivity plays a central role in the study of chaotic behavior, since it is a necessary condition for many other properties used to define deterministic chaos, such as expansivity, transitivity, ergodicity, and so on (see ▶ Topological Dynamics of Cellular Automata and [8] for more on this subject). The following result proves that in the new topology A^(Z^D)/≅ the situation is completely different.

Proposition 6 (▶ Chaotic Behavior of Cellular Automata) If f is a surjective CA, then d_a(x, f(x)) = 0 for any x ∈ A^(Z^D)/≅. In other words, every surjective CA behaves like the identity in A^(Z^D)/≅.

Proof In order to compute f(x)_{→n} from x_{→n}, one only needs to know the index of f in the set of CA with radius r, state set S, and dimension d; therefore

    K(f(x)_{→n} | x_{→n}) ≤ 2dr(2n + 2r + 1)^(d-1) log₂|S| + K(f) + 2 log₂ K(f) + c ,

and similarly

    K(x_{→n} | f(x)_{→n}) ≤ 2dr(2n + 1)^(d-1) log₂|S| + K(f) + 2 log₂ K(f) + c .

Dividing by 2(2n + 1)^d and taking the superior limit, one finds that d_a(x, f(x)) = 0. □

The result of Proposition 6 means that surjective CA can neither create new information nor destroy it. Hence they have a high degree of stability from an algorithmic point of view. This contrasts with what happens in the Cantor topology. We conclude that the classical notion of deterministic chaos is orthogonal to "algorithmic chaos", at least as far as CA are concerned.

Proposition 7 (▶ Chaotic Behavior of Cellular Automata) Consider a CA f that is neither surjective nor constant. Then there exist two configurations x, y ∈ A^(Z^D)/≅ such that d_a(x, y) = 0 but d_a(f(x), f(y)) ≠ 0.

In other words, Proposition 7 says that nonsurjective, nonconstant CA are not compatible with ≅ and hence not continuous. This means that, for any pair of configurations x ≅ y, this kind of CA either completely destroys the information content of x and y, or preserves it in one, say x, and destroys it in y.
However, the following result says that some weak form of continuity still persists (see ▶ Topological Dynamics of Cellular Automata and [8] for the definitions of equicontinuity point and sensitivity to initial conditions).

Proposition 8 (▶ Chaotic Behavior of Cellular Automata) Consider a CA f and let ā be the configuration with all cells in state a. Then ā is both a fixed point and an equicontinuity point for f.

Even though CA are not continuous on A^(Z^D)/≅, one can still wonder what happens with respect to the usual properties used to define deterministic chaos. For instance, by Proposition 8, it is clear that no CA is sensitive to initial conditions. The following question is still open.

Open problem 3 Is ā the only equicontinuity point for CA on A^(Z^D)/≅?

Future Directions

In this paper we have illustrated how algorithmic complexity can help in the study of CA dynamics. We essentially used it as a powerful tool to decrease the combinatorial complexity of problems. These kinds of applications are only at their beginning, and many more are expected in the future. For example, in view of the results of Subsect. "Example 1", we wonder whether Kolmogorov complexity can help in proving the famous conjecture that languages recognizable in real time by CA form a strict subclass of linear-time recognizable languages (see [9,14]). Another, completely different, development would consist in finding how and if Theorem 11 extends to higher dimensions. How this property can be restated in the context of the algorithmic distance is also of great interest. Finally, how to extend the results obtained for CA to other dynamical systems is a research direction that must be explored. We are rather confident that this can shed new light on the complexity behavior of such systems.

Acknowledgments

This work has been partially supported by the ANR Blanc Project "Sycomore".

Bibliography

Primary Literature
1. Blanchard F, Formenti E, Kůrka P (1999) Cellular automata in the Cantor, Besicovitch and Weyl topological spaces. Complex Syst 11:107–123
2. Blanchard F, Cervelle J, Formenti E (2003) Periodicity and transitivity for cellular automata in Besicovitch topologies. In: Rovan B, Vojtas P (eds) MFCS 2003. Lecture Notes in Computer Science, vol 2747. Springer, Bratislava, pp 228–238
3. Blanchard F, Cervelle J, Formenti E (2005) Some results about chaotic behavior of cellular automata. Theor Comput Sci 349(3):318–336
4. Brudno AA (1978) The complexity of the trajectories of a dynamical system. Russ Math Surv 33(1):197–198
5. Brudno AA (1983) Entropy and the complexity of the trajectories of a dynamical system. Trans Moscow Math Soc 44:127
6. Calude CS, Hertling P, Jürgensen H, Weihrauch K (2001) Randomness on full shift spaces. Chaos Solitons Fractals 12(3):491–503
7. Cattaneo G, Formenti E, Margara L, Mazoyer J (1997) A shift-invariant metric on S^Z inducing a non-trivial topology. In: Privara I, Ruzicka P (eds) MFCS'97. Lecture Notes in Computer Science, vol 1295. Springer, Bratislava, pp 179–188
8. Cervelle J, Durand B, Formenti E (2001) Algorithmic information theory and cellular automata dynamics. In: Mathematical Foundations of Computer Science (MFCS'01). Lecture Notes in Computer Science, vol 2136. Springer, Berlin, pp 248–259
9. Delorme M, Mazoyer J (1999) Cellular automata as language recognizers. In: Cellular automata: A parallel model. Kluwer, Dordrecht
10. Durand B, Levin L, Shen A (2001) Complex tilings. In: STOC '01: Proceedings of the 33rd annual ACM symposium on theory of computing, pp 732–739
11. Kari J (1994) Rice's theorem for the limit set of cellular automata. Theor Comput Sci 127(2):229–254
12. Kůrka P (1997) Languages, equicontinuity and attractors in cellular automata. Ergod Theory Dyn Syst 17:417–433
13. Li M, Vitányi P (1997) An introduction to Kolmogorov complexity and its applications, 2nd edn. Springer, Berlin

14. Delorme M, Formenti E, Mazoyer J (2000) Open problems. Research Report LIP 2000-25, Ecole Normale Supérieure de Lyon
15. Pivato M (2005) Cellular automata vs. quasisturmian systems. Ergod Theory Dyn Syst 25(5):1583–1632
16. Robinson RM (1971) Undecidability and nonperiodicity for tilings of the plane. Invent Math 12(3):177–209
17. Terrier V (1996) Language not recognizable in real time by one-way cellular automata. Theor Comput Sci 156:283–287
18. Wolfram S (2002) A new kind of science. Wolfram Media, Champaign. http://www.wolframscience.com/

Books and Reviews
Batterman RW, White HS (1996) Chaos and algorithmic complexity. Found Phys 26(3):307–336
Bennett CH, Gács P, Li M, Vitányi P, Zurek W (1998) Information distance. IEEE Trans Inf Theory 44(4):1407–1423
Calude CS (2002) Information and randomness. Texts in theoretical computer science, 2nd edn. Springer, Berlin
Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
White HS (1993) Algorithmic complexity of points in dynamical systems. Ergod Theory Dyn Syst 13:807–830


Amorphous Computing

HAL ABELSON, JACOB BEAL, GERALD JAY SUSSMAN
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA

Article Outline

Glossary
Definition of the Subject
Introduction
The Amorphous Computing Model
Programming Amorphous Systems
Amorphous Computing Paradigms
Primitives for Amorphous Computing
Means of Combination and Abstraction
Supporting Infrastructure and Services
Lessons for Engineering
Future Directions
Bibliography

Glossary

Amorphous computer A collection of computational particles dispersed irregularly on a surface or throughout a volume, where individual particles have no a priori knowledge of their positions or orientations.

Computational particle A (possibly faulty) individual device for an amorphous computer. Each particle has modest computing power and a modest amount of memory. The particles are not synchronized, although they are all capable of operating at similar speeds, since they are fabricated by the same process. All particles are programmed identically, although each particle has means for storing local state and for generating random numbers.

Field A function assigning a value to every particle in an amorphous computer.

Gradient A basic amorphous computing primitive that estimates the distance from each particle to the nearest particle designated as a source of the gradient.

Definition of the Subject

The goal of amorphous computing is to identify organizational principles and create programming technologies for obtaining intentional, pre-specified behavior from the cooperation of myriad unreliable parts that are arranged in unknown, irregular, and time-varying ways. The heightened relevance of amorphous computing today stems from the emergence of new technologies that could serve as substrates for information processing systems of

immense power at unprecedentedly low cost, if only we could master the challenge of programming them.

Introduction

Even as the foundations of computer science were being laid, researchers could hardly help noticing the contrast between the robustness of natural organisms and the fragility of the new computing devices. As John von Neumann remarked in 1948 [53]:

  With our artificial automata we are moving much more in the dark than nature appears to be with its organisms. We are, and apparently, at least at present, have to be much more 'scared' by the occurrence of an isolated error and by the malfunction which must be behind it. Our behavior is clearly that of overcaution, generated by ignorance.

Amorphous computing emerged as a field in the mid-1990s, from the convergence of three factors:

• Inspiration from the cellular automata models for fundamental physics [13,34].
• Hope that understanding the robustness of biological development could both help overcome the brittleness typical of computer systems and also illuminate the mechanisms of developmental biology.
• The prospect of nearly free computers in vast quantities.

Microfabrication

One technology that has come to fruition over the past decade is micro-mechanical electronic component manufacture, which integrates logic circuits, micro-sensors, actuators, and communication on a single chip. Aggregates of these can be manufactured extremely inexpensively, provided that not all the chips need work correctly, and that there is no need to arrange the chips into precise geometrical configurations or to establish precise interconnections among them. A decade ago, researchers envisioned smart dust elements small enough to be borne on air currents to form clouds of communicating sensor particles [26]. Airborne sensor clouds are still a dream, but networks of millimeter-scale particles are now commercially available for environmental monitoring applications [40].

With low enough manufacturing costs, we could mix such particles into bulk materials to form coatings like "smart paint" that can sense data and communicate its actions to the outside world. A smart paint coating on a wall could sense vibrations, monitor the premises for intruders, or cancel noise. Bridges or buildings coated with smart paint could report on traffic and wind loads and monitor structural integrity. If the particles have actuators, then the paint could even heal small cracks by shifting the material around. Making the particles mobile opens up entire new classes of applications that are beginning to be explored by research in swarm robotics [35] and modular robotics [49].

Cellular Engineering

The second disruptive technology that motivates the study of amorphous computing is microbiology. Biological organisms have served as motivating metaphors for computing since the days of calculating engines, but it is only over the past decade that we have begun to see how biology could literally be a substrate for computing, through the possibility of constructing digital-logic circuits within individual living cells. In one technology, logic signals are represented not by electrical voltages and currents, but by concentrations of DNA-binding proteins, and logic elements are realized as binding sites where proteins interact through promotion and repression. As a simple example, if A and B are proteins whose concentrations represent logic levels, then an "inverter" can be implemented in DNA as a genetic unit through which A serves as a repressor that blocks the production of B [29,54,55,61]. Since cells can reproduce themselves and obtain energy from their environment, the resulting information processing units could be manufactured in bulk at a very low cost.

A technology of cellular engineering is beginning to take shape that can tailor-make programmable cells to function as sensors or delivery vehicles for pharmaceuticals, or even as chemical factories for the assembly of nanoscale structures. Researchers in this emerging field of synthetic biology are starting to assemble registries of standard logical components implemented in DNA that can be inserted into E. coli [48]. The components have been engineered to permit standard means of combining them, so that biological logic designers can assemble circuits in a mix-and-match way, similar to how electrical logic designers create circuits from standard TTL parts. There is even an International Genetically Engineered Machine Competition, where student teams from universities around the world compete to create novel biological devices from parts in the registry [24].

Either of these technologies—microfabricated particles or engineered cells—provides a path to cheaply fabricate aggregates of massive numbers of computing elements. But harnessing these for computing is quite a different matter, because the aggregates are unstructured.

Digital computers have always been constructed to behave as precise arrangements of reliable parts, and almost all techniques for organizing computations depend upon this precision and reliability. Amorphous computing seeks to discover new programming methods that do not require precise control over the interaction or arrangement of the individual computing elements and to instantiate these techniques in new programming languages.

The Amorphous Computing Model

Amorphous computing models the salient features of an unstructured aggregate through the notion of an amorphous computer, a collection of computational particles dispersed irregularly on a surface or throughout a volume, where individual particles have no a priori knowledge of their positions or orientations. The particles are possibly faulty, may contain sensors and effect actions, and in some applications might be mobile. Each particle has modest computing power and a modest amount of memory. The particles are not synchronized, although they are all capable of operating at similar speeds, since they are fabricated by the same process. All particles are programmed identically, although each particle has means for storing local state and for generating random numbers. There may also be several distinguished particles that have been initialized to particular states.

Each particle can communicate with a few nearby neighbors. In an electronic amorphous computer the particles might communicate via short-distance radio, whereas bioengineered cells might communicate by chemical signals. Although the details of the communication model can vary, the maximum distance over which two particles can communicate effectively is assumed to be small compared with the size of the entire amorphous computer. Communication is assumed to be unreliable, and a sender has no assurance that a message has been received (higher-level protocols with message acknowledgement can be built on such unreliable channels).

We assume that the number of particles may be very large (on the order of 10^6 to 10^12). Algorithms appropriate to run on an amorphous computer should be relatively independent of the number of particles: the performance should degrade gracefully as the number of particles decreases.
Thus, the entire amorphous computer can be regarded as a massively parallel computing system, and previous investigations into massively parallel computing, such as research in cellular automata, are one source of ideas for dealing with amorphous computers. However, amorphous computing differs from investigations into cellular automata, because amorphous mechanisms must be independent of the detailed configuration, reliability, and synchronization of the particles.

Programming Amorphous Systems

A central theme in amorphous computing is the search for programming paradigms that work within the amorphous model. Here, biology has been a rich source of metaphors for inspiring new programming techniques. In embryonic development, even though the precise arrangements and numbers of the individual cells are highly variable, the genetic "programs" coded in DNA nevertheless produce well-defined intricate shapes and precise forms. Amorphous computers should be able to achieve similar results.

One technique for programming an amorphous computer uses diffusion. One particle (chosen by some symmetry-breaking process) broadcasts a message. This message is received by each of its neighbors, which propagate it to their neighbors, and so on, to create a wave that spreads throughout the system. The message contains a count, and each particle stores the received count and increments it before re-broadcasting. Once a particle has stored its count, it stops re-broadcasting and ignores future count messages. This count-up wave gives each particle a rough measure of its distance from the original source. One can also produce regions of controlled size, by having the count message relayed only if the count is below a designated bound.

Two such count-up waves can be combined to identify a chain of particles between two given particles A and B. Particle A begins by generating a count-up wave as above. This time, however, each intermediate particle, when it receives its count, performs a handshake to identify its "predecessor"—the particle from which it received the count (and whose own count will therefore be one less). When the wave of count messages reaches B, B sends a "successor" message, informing its predecessor that it should become part of the chain and should send a message to its predecessor, and so on, all the way back to A.
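The count-up wave and the chain construction can be sketched in a synchronous-round simulation. This is an idealization (real particles are unsynchronized and messages may be lost), and the particle count, radius, and random seed below are arbitrary illustrative choices:

```python
# Sketch of the count-up wave and chain construction on an amorphous
# computer: particles scattered at random, each talking only to neighbors
# within a fixed communication radius.
import random

random.seed(1)
N, RADIUS = 200, 0.15
pos = [(random.random(), random.random()) for _ in range(N)]

def neighbors(i):
    xi, yi = pos[i]
    return [j for j in range(N) if j != i and
            (xi - pos[j][0]) ** 2 + (yi - pos[j][1]) ** 2 <= RADIUS ** 2]

A = 0
count, pred = {A: 0}, {}     # first count heard, and who it was heard from
frontier = [A]
while frontier:              # one broadcast round per iteration
    nxt = []
    for p in frontier:
        for q in neighbors(p):
            if q not in count:           # later count messages are ignored
                count[q] = count[p] + 1  # store, increment, re-broadcast
                pred[q] = p              # remember the "predecessor"
                nxt.append(q)
    frontier = nxt

# pick some reached particle B and trace the chain back to A
B = max(count, key=count.get)
chain = [B]
while chain[-1] != A:
    chain.append(pred[chain[-1]])
chain.reverse()
assert chain[0] == A and chain[-1] == B and len(chain) == count[B] + 1
print(len(count), count[B])
```

Each particle's stored count is a hop-count estimate of its distance from A, and the predecessor walk recovers a chain from B back to A, exactly as described above.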
Note that this method works, even though the particles are irregularly distributed, provided there is a path from A to B. The motivating metaphor for these two programs is chemical gradient diffusion, which is a foundational mechanism in biological development [60].

In nature, biological mechanisms not only generate elaborate forms, they can also maintain forms and repair them. We can modify the above amorphous line-drawing program so that it produces a self-repairing line: first, particles keep rebroadcasting their count and successor messages. Second, the status of a particle as having a count or being in the chain decays over time unless it is refreshed by new messages. That is, a particle that stops hearing successor messages intended for it will eventually revert to not being in the chain and will stop broadcasting its own successor messages. A particle that stops hearing its count being broadcast will start acting as if it never had a count, pick up a new count from the messages it hears, and start broadcasting the count messages with the new count. Clement and Nagpal [12] demonstrated that this mechanism can be used to generate self-repairing lines and other patterns, and even re-route lines and patterns around "dead" regions where particles have stopped functioning.

The relationship with biology flows in the other direction as well: the amorphous algorithm for repair is a model which is not obviously inconsistent with the facts of angiogenesis in the repair of wounds. Although the existence of the algorithm has no bearing on the facts of the matter, it may stimulate systems-level thinking about models in biological research. For example, Patel et al. use amorphous ideas to analyze the growth of epithelial cells [43].

Amorphous Computing Paradigms

Amorphous computing is still in its infancy. Most of the linguistic investigations based on the amorphous computing model have been carried out in simulation. Nevertheless, this work has yielded a rich variety of programming paradigms demonstrating that one can in fact achieve robustness in the face of the unreliability of individual particles and the absence of precise organization among them.

Marker Propagation for Amorphous Particles

Weiss's Microbial Colony Language [55] is a marker propagation language for programming the particles in an amorphous computer. The program to be executed, which is the same for each particle, is constructed as a set of rules. The state of each particle includes a set of binary markers, and rules are enabled by boolean combinations of the markers.
The rules, which have the form (trigger, condition, action), are triggered by the receipt of labelled messages from neighboring particles. A rule may test conditions, set or clear various markers, and broadcast further messages to its neighbors. Each message carries a count that determines how far it will diffuse, and each marker has a lifetime that determines how long its value lasts. Supporting the language's rules is a runtime system that automatically propagates messages and manages the lifetimes of markers, so that the programmer need not deal with these operations explicitly.

Weiss's system is powerful, but the level of abstraction is very low. This is because it was motivated by cellular engineering—as something that can be directly implemented by genetic regulatory networks. The language is therefore more useful as a tool set in which to implement higher-level languages such as GPL (see below), serving as a demonstration that, in principle, these higher-level languages can be implemented by genetic regulatory networks as well. Figure 1 shows an example simulation programmed in this language, which organizes an initially undifferentiated column of particles into a structure with bands of two alternating colors: a caricature of somites in developing vertebrae.

Amorphous Computing, Figure 1 A Microbial Colony Language program organizes a tube into a structure similar to that of somites in the developing vertebrate (from [55])

Amorphous Computing, Figure 2 A pattern generated by GPL whose shape mimics a chain of CMOS inverters (from [15])

The Growing Point Language

Coore's Growing Point Language (GPL) [15] demonstrates that an amorphous computer can be configured by a program that is common to all the computing elements to generate highly complex patterns, such as the pattern representing the interconnection structure of an arbitrary electrical circuit, as shown in Fig. 2. GPL is inspired by a botanical metaphor based on growing points and tropisms. A growing point is a locus of activity in an amorphous computer. A growing point propagates through the computer by transferring its activity from one computing element to a neighbor.

As a growing point passes through the computer it effects the differentiation of the behaviors of the particles it visits. Particles secrete "chemical" signals whose count-up waves define gradients, and these attract or repel growing points as directed by programmer-specified "tropisms". Coore demonstrated that these mechanisms are sufficient to permit amorphous computers to generate any arbitrary prespecified graph structure pattern, up to topology. Unlike real biology, however, once a pattern has been constructed, there is no clear mechanism to maintain it in the face of changes to the material. Also, from a programming linguistic point of view, there is no clear way to compose shapes by composing growing points. More recently, Gayle and Coore have shown how GPL may be extended to produce arbitrarily large patterns such as arbitrary text strings [21]. D'Hondt and D'Hondt have explored the use of GPL for geometrical constructions and its relations with computational geometry [18,19].

Origami-Based Self-Assembly

Nagpal [38] developed a prototype model for controlling programmable materials. She showed how to organize a program to direct an amorphous sheet of deformable particles to cooperate to construct a large family of globally-specified predetermined shapes. Her method, which is inspired by the folding of epithelial tissue, allows a programmer to specify a sequence of folds, where the set of available folds is sufficient to create any origami shape (as shown by Huzita's axioms for origami [23]). Figure 3 shows a sheet of amorphous particles, where particles cooperate to create creases and folds, assembling the sheet into the well-known origami "cup" structure.

Amorphous Computing, Figure 3 Folding an envelope structure (from [38]). A pattern of lines is constructed according to origami axioms. Elements then coordinate to fold the sheet using an actuation model based on epithelial cell morphogenesis. In the figure, black indicates the front side of the sheet, grey indicates the back side, and the various colored bands show the folds and creases that are generated by the amorphous process. The small white spots show gaps in the sheet caused by "dead" or missing cells—the process works despite these

Nagpal showed how this language of folds can be compiled into a low-level program that can be distributed to all of the particles of the amorphous sheet, similar to Coore's GPL or Weiss's MCL. With a few differences of initial state (for example, particles at the edges of the sheet know that they are edge particles) the particles run their copies of the program, interact with their neighbors, and fold up to make the predetermined shape. This technique is quite robust. Nagpal studied the range of shapes that can be constructed using her method, and their sensitivity to errors of communication, random cell death, and density of the cells. As a programming framework, the origami language has more structure than the growing point language, because the origami methods allow composition of shape constructions. On the other hand, once a shape has been constructed, there is no clear mechanism to maintain existing patterns in the face of changes to the material.

Dynamic Recruitment

In Butera's "paintable computing" [10], processes dynamically recruit computational particles from an amorphous computer to implement their goals. As one of his examples, Butera uses dynamic recruitment to implement a robust storage system for streaming audio and images (Fig. 4). Fragments of the image and audio stream circulate freely through the amorphous computer and are marshaled to a port when needed. The audio fragments also sort themselves into a playable stream as they migrate to the port. To enhance robustness, there are multiple copies of the lower resolution image fragments and fewer copies of the higher resolution image fragments. Thus, the image is hard to destroy; with lost fragments the image is degraded but not destroyed.
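The redundancy trade-off exploited here can be sketched abstractly. The fragment kinds, copy counts, and loss rate below are illustrative numbers, not taken from [10]:

```python
# Robust storage sketch: fragments are replicated across particles, coarse
# fragments more heavily than fine ones, so that random particle loss
# degrades the image (losing detail) before it destroys it outright.
import random

random.seed(3)
copies = {'coarse': 8, 'medium': 4, 'fine': 2}   # replicas per fragment kind
particles = []
for kind, n in copies.items():
    particles += [kind] * n                      # each particle holds one copy

random.shuffle(particles)
survivors = particles[:len(particles) // 2]      # half the particles die

recoverable = {kind for kind in copies if kind in survivors}
print(sorted(recoverable))
```

With these numbers the coarse fragment can never be lost: only 6 of the 14 particles hold non-coarse copies, so any 7 survivors must include a coarse copy, while the lightly replicated fine fragments may disappear.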

Amorphous Computing, Figure 4 Butera dynamically controls the flow of information through an amorphous computer. In (a), image fragments spread through the computer so that a degraded copy can be recovered from any segment; the original image is on the left, and the blurry copy on the right has been recovered from the small region shown below. In (b), audio fragments sort themselves into a playable stream as they migrate to an output port; cooler colors are earlier times (from [10])


Amorphous Computing, Figure 5 A tracking program written in Proto sends the location of a target region (orange) to a listener (red) along a channel (small red dots) in the network (indicated by green lines). The continuous space and time abstraction allows the same program to run at different resolutions

Clement and Nagpal [12] also use dynamic recruitment in the development of active gradients, as described below. Growth and Regeneration Kondacs [30] showed how to synthesize arbitrary twodimensional shapes by growing them. These computing units, or “cells”, are identically-programmed and decentralized, with the ability to engage in only limited, local communication. Growth is implemented by allowing cells to multiply. Each of his cells may create a child cell and place it randomly within a ring around the mother cell. Cells may also die, a fact which Kondacs puts to use for temporary scaffolding when building complex shapes. If a structure requires construction of a narrow neck between two other structures it can be built precisely by laying down a thick connection and later trimming it to size. Attributes of this system include scalability, robustness, and the ability for self-repair. Just as a starfish can regenerate its entire body from part of a limb, his system can self-repair in the event of agent death: his sphere-network representation allows the structure to be grown starting from any sphere, and every cell contains all necessary information for reproducing the missing structure. Abstraction to Continuous Space and Time The amorphous model postulates computing particles distributed throughout a space. If the particles are dense, one can imagine the particles as actually filling the space, and create programming abstractions that view the space itself as the object being programmed, rather than the collection of particles. Beal and Bachrach [1,7] pursued this approach by creating a language, Proto, where programmers specify
the behavior of an amorphous computer as though it were a continuous material filling the space it occupies. Proto programs manipulate fields of values spanning the entire space. Programming primitives are designed to make it simple to compile global operations to operations at each point of the continuum. These operations are approximated by having each device represent a nearby chunk of space. Programs are specified in space and time units that are independent of the distribution of particles and of the particulars of communication and execution on those particles (Fig. 5). Programs are composed functionally, and many of the details of communication and composition are made implicit by Proto’s runtime system, allowing complex programs to be expressed simply. Proto has been applied to applications in sensor networks like target tracking and threat avoidance, to swarm robotics and to modular robotics, e. g., generating a planar wave for coordinated actuation. Newton’s language Regiment [41,42] also takes a continuous view of space and time. Regiment is organized in terms of stream operations, where each stream represents a time-varying quantity over a part of space, for example, the average value of the temperature over a disc of a given radius centered at a designated point. Regiment, also a functional language, is designed to gather streams of data from regions of the amorphous computer and accumulate them at a single point. This assumption allows Regiment to provide region-wide summary functions that are difficult to implement in Proto.
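
The field-manipulation style shared by Proto and Regiment can be caricatured in a few lines of Python (an illustrative stand-in only, not Proto or Regiment syntax): a field assigns a value to every device location, and global operations are defined pointwise over whole fields rather than per device.

```python
# Toy rendering of the field abstraction. All names here are invented
# for illustration; each dictionary key stands for a device location.
def make_field(f, points):
    """Build a field by evaluating f at every device location."""
    return {p: f(p) for p in points}

def pointwise(op, *fields):
    """Apply op pointwise across one or more fields, yielding a new field."""
    return {p: op(*(f[p] for f in fields)) for p in fields[0]}

points = [(x / 10, 0.0) for x in range(11)]         # devices along a line
temp = make_field(lambda p: 20 + 5 * p[0], points)  # a temperature field
hot = pointwise(lambda t: t > 23, temp)             # a Boolean field over space
```

Each device would approximate its own chunk of space, so the same program runs unchanged at any resolution, which is the point of the continuous abstraction.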

Primitives for Amorphous Computing

The previous section illustrated some paradigms that have been developed for programming amorphous systems, each paradigm building on some organizing metaphor.
But eventually, meeting the challenge of amorphous systems will require a more comprehensive linguistic framework. We can approach the task of creating such a framework following the perspective in [59], which views languages in terms of primitives, means of combination, and means of abstraction. The fact that amorphous computers consist of vast numbers of unreliable and unsynchronized particles, arranged in space in ways that are locally unknown, constrains the primitive mechanisms available for organizing cooperation among the particles. While amorphous computers are naturally massively parallel, the kind of computation that they are most suited for is parallelism that does not depend on explicit synchronization or on atomic operations to control concurrent access to resources. However, there are large classes of useful behaviors that can be implemented without these tools. Primitive mechanisms that are appropriate for specifying behavior on amorphous computers include gossip, random choice, fields, and gradients.

Gossip

Gossip, also known as epidemic communication [17,20], is a simple communication mechanism. The goal of a gossip computation is to obtain an agreement about the value of some parameter. Each particle broadcasts its opinion of the parameter to its neighbors, and computation is performed by each particle combining the values that it receives from its neighbors, without consideration of the identification of the source. If the computation changes a particle’s opinion of the value, it rebroadcasts its new opinion. The process concludes when there are no further broadcasts. For example, an aggregate can agree upon the minimum of the values held by all the particles as follows. Each particle broadcasts its value. Each recipient compares its current value with the value that it receives. If the received value is smaller than its current value, it changes its current value to that minimum and rebroadcasts the new value.
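
The minimum-agreement procedure just described can be sketched as a simple Python simulation (illustrative only, not code from any amorphous computing system):

```python
# Gossip agreement on the minimum: a particle rebroadcasts only when a
# received value changes its opinion, and the process quiesces once no
# broadcasts remain pending.
def gossip_min(neighbors, initial):
    """neighbors: particle -> set of particles; initial: particle -> value."""
    value = dict(initial)
    pending = set(value)                  # particles that still need to broadcast
    while pending:
        sender = pending.pop()
        for n in neighbors[sender]:
            if value[sender] < value[n]:  # received opinion beats the current one
                value[n] = value[sender]  # adopt it...
                pending.add(n)            # ...and schedule a rebroadcast
    return value

# Four particles in a line, holding values 7 -- 3 -- 9 -- 5:
nbrs = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(gossip_min(nbrs, {0: 7, 1: 3, 2: 9, 3: 5}))
# every particle converges to the global minimum, 3
```

The `pending` set stands in for the asynchronous broadcasts; any pop order reaches the same fixed point, mirroring the order-independence of the real mechanism.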
The advantage of gossip is that it flows in all directions and is very difficult to disrupt. The disadvantage is that the lack of source information makes it difficult to revise a decision.

Random Choice

Random choice is used to break symmetry, allowing the particles to differentiate their behavior. The simplest use of random choice is to establish local identity of particles: each particle chooses a random number to identify
itself to its neighbors. If the number of possible choices is large enough, then it is unlikely that any nearby particles will choose the same number, and this number can thus be used as an identifier for the particle to its neighbors. Random choice can be combined with gossip to elect leaders, either for the entire system or for local regions. (If collisions in choice can be detected, then the number of choices need not be much higher than the number of neighbors. Also, using gossip to elect leaders makes sense only when we expect a leader to be long-lived, due to the difficulty of changing the decision to designate a replacement leader.) To elect a single leader for the entire system, every particle chooses a value, then gossips to find the minimum. The particle with the minimum value becomes the leader. To elect regional leaders, we instead use gossip to carry the identity of the first leader a particle has heard of. Each particle uses random choice as a “coin flip” to decide when to declare itself a leader; if the flip comes up heads enough times before the particle hears of another leader, the particle declares itself a leader and broadcasts that fact to its neighbors. The entire system is thus broken up into contiguous domains of particles who first heard some particular particle declare itself a leader. One challenge in using random choice on an amorphous computer is to ensure that the particulars of particle distribution do not have an unexpected effect on the outcome. For example, if we wish to control the expected size of the domain that each regional leader presides over, then the probability of becoming a leader must depend on the density of the particles.

Fields

Every component of the state of the computational particles in an amorphous computer may be thought of as a field over the discrete space occupied by those particles.
If the density of particles is large enough this field of values may be thought of as an approximation of a field on the continuous space. We can make amorphous models that approximate the solutions of the classical partial-differential equations of physics, given appropriate boundary conditions. The amorphous methods can be shown to be consistent, convergent and stable. For example, the algorithm for solving the Laplace equation with Dirichlet conditions is analogous to the way it would be solved on a lattice. Each particle must repeatedly update the value of the solution to be the average of the solutions posted by its neighbors, but the boundary points must not change their values. This algorithm will
eventually converge, although very slowly, independent of the order of the updates and the details of the local connectedness of the network. There are optimizations, such as over-relaxation, that are just as applicable in the amorphous context as on a regular grid. Katzenelson [27] has shown similar results for the diffusion equation, complete with analytic estimates of the errors that arise from the discrete and irregularly connected network. In the diffusion equation there is a conserved quantity, the amount of material diffusing. Rauch [47] has shown how this can work with the wave equation, illustrating that systems that conserve energy and momentum can also be effectively modeled with an amorphous computer. The simulation of the wave equation does require that the communicating particles know their relative positions, but it is not hard to establish local coordinate systems.

Gradients

An important primitive in amorphous computing is the gradient, which estimates the distance from each particle to the nearest particle designated as a source. The gradient is inspired by the chemical-gradient diffusion process that is crucial to biological development. Amorphous computing builds on this idea, but does not necessarily compute the distance using diffusion because simulating diffusion can be expensive. The common alternative is a linear-time mechanism that depends on active computation and relaying of information rather than passive diffusion. Calculation of a gradient starts with each source particle setting its distance estimate to zero, and every other particle setting its distance estimate to infinity. The sources then broadcast their estimates to their neighbors. When a particle receives a message from its neighbor, it compares its current distance estimate to the distance through its neighbor. If the distance through its neighbor is less, it chooses that to be its estimate, and broadcasts its new estimate onwards.

Although the basic form of the gradient is simple, there are several ways in which gradients can be varied to better match the context in which they are used. These choices may be made largely independently, giving a wide variety of options when designing a system. Variations which have been explored include:

Active Gradients

An active gradient [12,14,16] monitors its validity in the face of changing sources and device failure, and maintains correct distance values. For example, if the supporting sources disappear, the gradient is deallocated. A gradient may also carry version information, allowing its source to change more smoothly. Active gradients can provide self-repairing coordinate systems, as a foundation for robust construction of patterns. Figure 6 shows the “count-down wave” line described above in the introduction to this article. The line’s implementation in terms of active gradients provides for self-repair when the underlying amorphous computer is damaged.

Amorphous Computing, Figure 6 A line being maintained by active gradients, from [12]. A line (black) is constructed between two anchor regions (dark grey) based on the active gradient emitted by the right anchor region (light greys). The line is able to rapidly repair itself following failures because the gradient actively maintains itself

Polarity

A gradient may be set to count down from a positive value at the source, rather than to count up from zero. This bounds the distance that the gradient can span, which can help limit resource usage, but may limit the scalability of programs.


Adaptivity

As described above, a gradient relaxes once to a distance estimate. If communication is expensive and precision unimportant, the gradient can take the first value that arrives and ignore all subsequent values. If we want the gradient to adapt to changes in the distribution of particles or sources, then the particles need to broadcast at regular intervals. We can then have estimates that converge smoothly to precise estimates by adding a restoring force which acts opposite to the relaxation, allowing the gradient to rise when unconstrained by its neighbors. If, on the other hand, we value adaptation speed over smoothness, then each particle can recalculate its distance estimate from scratch with each new batch of values.

Carrier

Normally, the distance value calculated by the gradient is the signal we are interested in. A gradient may instead be used to carry an arbitrary signal outward from the source. In this case, the value at each particle is the most recently arrived value from the nearest source.

Distance Measure

A gradient’s distance measure is, of course, dependent on how much knowledge we have about the relative positions of neighbors. It is sometimes advantageous to discard good information and use only hop-count values, since it is easier to make an adaptive gradient using hop-count values. Non-linear distance measures are also possible, such as a count-down gradient that decays exponentially from the source. Finally, the value of a gradient may depend on more sources than the nearest (this is the case for a chemical gradient), though this may be very expensive to calculate.

Coordinates and Clusters

Computational particles may be built with restrictions about what can be known about local geometry. A particle may know that it can reliably communicate with a few neighbors. If we assume that these neighbors are all within a disc of some approximate communication radius, then distances to other particles may be estimated by minimum hop count [28].
However, it is possible that more elaborate particles can estimate distances to near neighbors. For example, the Cricket localization system [45] uses the fact that sound travels more slowly than radio, so distance can be estimated from the difference in time of arrival between simultaneously transmitted signals. McLurkin’s swarmbots [35] use the ISIS communication system, which gives bearing and range information. Even so, a sufficiently dense amorphous computer can produce local coordinate systems for its particles with even the crudest method of determining distances. We can make an atlas of overlapping coordinate systems, using random symmetry breaking to make new starting baselines [3]. These coordinate systems can be combined and made consistent to form a manifold, even if the amorphous computer is not flat or simply connected. One way to establish coordinates is to choose two initial particles that are a known distance apart. Each one serves as the source of a gradient. A pair of rectangular axes can be determined by the shortest path between them and by a bisector constructed where the two gradients are equal. These may be refined by averaging and calibrated using the known distance between the selected particles. After the axes are established, they may source new gradients that can be combined to make coordinates for the region near these axes. The coordinate system can be further refined using further averaging. Other natural coordinate constructions include bipolar and elliptical coordinates. This kind of construction was pioneered by Coore [14] and Nagpal [37]. Katzenelson [27] did early work to determine the kind of accuracy that can be expected from such a construction. Spatial clustering can be accomplished with any of a wide variety of algorithms, such as the clubs algorithm [16], LOCI [36], or persistent node partitioning [4]. Clusters can themselves be clustered, forming a hierarchical clustering of logarithmic height.

Means of Combination and Abstraction

A programming framework for amorphous systems requires more than primitive mechanisms. We also need suitable means of combination, so that programmers can combine behaviors to produce more complex behaviors, and means of abstraction so that the compound behaviors can be named and manipulated as units. Here are a few means of combination that have been investigated with amorphous computing.

Spatial and Temporal Sequencing

Several behaviors can be strung together in a sequence. The challenge in controlling such a sequence is to determine when one phase has completed and it is safe to move on to the next.
Trigger rules can be used to detect completion locally. In Coore’s Growing Point Language [15], all of the sequencing decisions are made locally, with different growing points progressing independently. There is no difficulty of synchronization in this approach because the only time when two growing points need to agree is when they have become spatially coincident. When growing points merge, the independent processes are automatically synchronized.


Nagpal’s origami language [38] has long-range operations that cannot overlap in time unless they are in non-interacting regions of the space. The implementation uses barrier synchronization to sequence the operations: when completion is detected locally, a signal is propagated throughout a marked region of the sheet, and the next operation begins after a waiting time determined by the diameter of the sheet. With adaptive gradients, we can use the presence of an inducing signal to run an operation. When the induction signal disappears, the operation ceases and the particles begin the next operation. This allows sequencing to be triggered by the last detection of completion rather than by the first.

Pipelining

If a behavior is self-stabilizing (meaning that it converges to a correct state from any arbitrary state), then we can use it in a sequence without knowing when the previous phase completes. The evolving output of the previous phase serves as the input of the next phase, and once the preceding behavior has converged, the self-stabilizing behavior will converge as well. If the previous phase evolves smoothly towards its final state, then by the time it has converged, the next phase may have almost converged as well, working from its partial results. For example, the coordinate system mechanism described above can be pipelined; the final coordinates are being formed even as the farther particles learn that they are not on one of the two axes.

Restriction to Spatial Regions

Because the particles of an amorphous computer are distributed in space, it is natural to assign particular behaviors to specific spatial regions. In Beal and Bachrach’s work, restriction of a process to a region is a primitive [7]. As another example, when Nagpal’s system folds an origami construction, regions on different faces may differentiate so that they fold in different patterns. These folds may, if the physics permits, be performed simultaneously.
It may be necessary to sequence later construction that depends on the completion of the substructures. Regions of space can be named using coordinates, clustering, or implicitly through calculations on fields. Indeed, one could implement solid modelling on an amorphous computer. Once a region is identified, a particle can test whether it is a member of that region when deciding whether to run a behavior. It is also necessary to specify how a particle should change its behavior if its membership in a region may vary with time.
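
The membership test just described can be sketched in a few lines of Python (names invented here; the original systems express regions through fields or coordinates): a region is a predicate over particle state, and a behavior is gated on it.

```python
# A region as a predicate; a particle runs a behavior only when the
# predicate holds for it. Everything here is an illustrative stand-in.
def restrict(region_test, behavior):
    def gated(particle):
        if region_test(particle):
            behavior(particle)
    return gated

# Region: particles within 3 hops of a source, per a hop-count gradient.
grad = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}   # gradient values per particle
members = []
near_source = restrict(lambda p: grad[p] < 3, members.append)
for p in grad:
    near_source(p)
# members == [0, 1, 2]
```

A real system would re-evaluate membership as the underlying field changes, since a particle's membership in a region may vary with time.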

Modularity and Abstraction

Standard means of abstraction may be applied in an amorphous computing context, such as naming procedures, data structures, and processes. The question for amorphous computing is what collection of entities is useful to name. Because geometry is essential in an amorphous computing context, it becomes appropriate to describe computational processes in terms of geometric entities. Thus there are new opportunities for combining geometric structures and naming the combinations. For example, it is appropriate to compute with, combine, and name regions of space, intervals of time, and fields defined on them [7,42]. It may also be useful to describe the propagation of information through the amorphous computer in terms of the light cone of an event [2]. Not all traditional abstractions extend nicely to an amorphous computing context because of the challenges of scale and the fallibility of parts and interconnect. For example, atomic transactions may be excessively expensive in an amorphous computing context. And yet, some of the goals that a programmer might use atomic transactions to accomplish, such as the approximate enforcement of conservation laws, can be obtained using techniques that are compatible with an amorphous environment, as shown by Rauch [47].

Supporting Infrastructure and Services

Amorphous computing languages, with their primitives, means of combination, and means of abstraction, rest on supporting services. One example, described above, is the automatic message propagation and decay in Weiss’s Microbial Colony Language [55]. MCL programs do not need to deal with this explicitly because it is incorporated into the operating system of the MCL machine. Experience with amorphous computing is beginning to identify other key services that amorphous machines must supply.

Particle Identity

Particles must be able to choose identifiers for communicating with their neighbors.
More generally, there are many operations in an amorphous computation where the particles may need to choose numbers, with the property that individual particles choose different numbers. If we are willing to pay the cost, it is possible to build unique identifiers into the particles, as is done with current macroscopic computers. We need only locally unique identifiers, however, so we can obtain them using pseudorandom-number generators. On the surface of it,
this may seem problematic, since the particles in an amorphous computer are assumed to be manufactured identically, with identical programs. There are, however, ways to obtain individualized random numbers. For example, the particles are not synchronized, and they are not really physically identical, so they will run at slightly different rates. This difference is enough to allow pseudorandom-number generators to get locally out of synch and produce different sequences of numbers. Amorphous computing particles that have sensors may also get seeds for their pseudorandom-number generators from sensor noise.

Local Geometry and Gradients

Particles must maintain connections with their neighbors, tracking whom they can reliably communicate with, and whatever local geometry information is available. Because particles may fail or move, this information needs to be maintained actively. The geometry information may include distance and bearing to each neighbor, as well as the time it takes to communicate with each neighbor. But many implementations will not be able to give significant distance or bearing information. Since all of this information may be obsolete or inaccurately measured, the particles must also maintain information on how reliable each piece of information is. An amorphous computer must know the dimension of the space it occupies. This will generally be a constant: either the computer covers a surface or fills a volume. In rare cases, however, the effective dimension of a computer may change: for example, paint is three-dimensional in a bucket and two-dimensional once applied. Combining this information with how the number of accessible correspondents changes with distance, an amorphous process can derive curvature and local density information.
An amorphous computer should also support gradient propagation as part of the infrastructure: a programmer should not have to explicitly deal with the propagation of gradients (or other broadcast communications) in each particle. A process may explicitly initiate a gradient, or explicitly react to one that it is interested in, but the propagation of the gradient through a particle should be automatically maintained by the infrastructure.

Implementing Communication

Communication between neighbors can occur through any number of mechanisms, each with its own set of properties: amorphous computing systems have been built that communicate through directional infrared [35], RF broadcast [22], and low-speed serial cables [9,46]. Simulated systems have also included other mechanisms such as signals superimposed on the power connections [11] and chemical diffusion [55]. Communication between particles can be made implicit with a neighborhood shared memory. In this arrangement, each particle designates some of its internal state to be shared with its neighbors. The particles regularly communicate, giving each particle a best-effort view of the exposed portions of the states of its neighbors. The contents of the exposed state may be specified explicitly [10,35,58] or implicitly [7]. The shared memory allows the system to be tuned by trading off communication rate against the quality of the synchronization, and decreasing transmission rates when the exposed state is not changing.

Lessons for Engineering

As von Neumann remarked half a century ago, biological systems are strikingly robust when compared with our artificial systems. Even today, software is fragile. Computer science is currently built on a foundation that largely assumes the existence of a perfect infrastructure. Integrated circuits are fabricated in clean-room environments, tested deterministically, and discarded if even a single defect is uncovered. Entire software systems fail with single-line errors. In contrast, biological systems rely on local computation, local communication, and local state, yet they exhibit tremendous resilience. Although this contrast is most striking in computer science, amorphous computing can provide lessons throughout engineering. Amorphous computing concentrates on making systems flexible and adaptable at the expense of efficiency. Amorphous computing requires an engineer to work under extreme conditions. The engineer must arrange the cooperation of vast numbers of identical computational particles to accomplish prespecified goals, but may not depend upon the numbers. We may not depend on any prespecified interconnect of the particles. We may not depend on synchronization of the particles.
We may not depend on the stability of the communications system. We may not depend on the long-term survival of any individual particles. The combination of these obstacles forces us to abandon many of the comforts that are available in more typical engineering domains. By restricting ourselves in this way we obtain some robustness and flexibility, at the cost of potentially inefficient use of resources, because the algorithms that are appropriate are ones that do not take advantage of these assumptions. Algorithms that work well in an amorphous context depend on the average behavior of participating particles. For example, in Nagpal’s origami system a fold that
is specified will be satisfactory if it is approximately in the right place and if most of the particles on the specified fold line agree that they are part of the fold line: dissenters will be overruled by the majority. In Proto, a programmer can address only regions of space, assumed to be populated by many particles. The programmer may not address individual particles, so failures of individual particles are unlikely to make major perturbations to the behavior of the system. An amorphous computation can be quite immune to details of the macroscopic geometry as well as to the interconnectedness of the particles. Since amorphous computations make their own local coordinate systems, they are relatively independent of coordinate distortions. In an amorphous computation we accept a wide range of outcomes that arise from variations of the local geometry. Tolerance of local variation can lead to surprising flexibility: the mechanisms which allow Nagpal’s origami language to tolerate local distortions allow programs to distort globally as well, and Nagpal shows how such variations can account for the variations in the head shapes of related species of Drosophila [38]. In Coore’s language one specifies the topology of the pattern to be constructed, but only limited information about the geometry. The topology will be obtained, regardless of the local geometry, so long as there is sufficient density of particles to support the topology. Amorphous computations based on a continuous model of space (as in Proto) are naturally scale independent. Since an amorphous computer is composed of unsynchronized particles, a program may not depend upon a priori timing of events. The sequencing of phases of a process must be determined by either explicit termination signals or by times measured dynamically. So amorphous computations are time-scale independent by construction.
A program for an amorphous computer may not depend on the reliability of the particles or the communication paths. As a consequence it is necessary to construct the program so as to dynamically compensate for failures. One way to do this is to specify the result as the satisfaction of a set of constraints, and to build the program as a homeostatic mechanism that continually drives the system toward satisfaction of those constraints. For example, an active gradient continually maintains each particle’s estimate of the distance to the source of the gradient. This can be used to establish and maintain connections in the face of failures of particles or relocation of the source. If a system is specified in this way, repair after injury is a continuation of the development process: an injury causes some constraints to become unsatisfied, and the development process builds new structure to heal the injury.
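
This homeostatic style can be illustrated with the gradient primitive: if each particle recomputes its distance estimate from its current neighbors on every round (the "from scratch" adaptivity variant described earlier), the constraint "my estimate is one more than my best neighbor's" is continually re-satisfied, so the structure heals itself after failures. A synchronous Python sketch with invented names, not code from the cited systems:

```python
import math

def relax_round(neighbors, est, sources):
    # Each particle re-derives its estimate from its neighbors' last values,
    # so estimates can both fall and rise as the network changes.
    return {p: 0 if p in sources
            else min((est[n] + 1 for n in neighbors[p]), default=math.inf)
            for p in neighbors}

def converge(neighbors, est, sources, rounds=10):
    for _ in range(rounds):
        est = relax_round(neighbors, est, sources)
    return est

# A ring of five particles; particle 0 is the gradient source.
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
est = converge(ring, {p: math.inf for p in ring}, {0})
# est == {0: 0, 1: 1, 2: 2, 3: 2, 4: 1}

# Particle 1 dies: survivors drop it, and the gradient re-forms around it.
del ring[1]
ring = {p: [n for n in ns if n != 1] for p, ns in ring.items()}
est = converge(ring, {p: est[p] for p in ring}, {0})
# est == {0: 0, 2: 3, 3: 2, 4: 1}
```

No repair phase is ever invoked explicitly: healing is just the development process continuing to drive the system toward its constraints.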

By restricting the assumptions that a programmer can rely upon, we increase the flexibility and reliability of the programs that are constructed. However, it is not yet clear how this limits the range of possible applications of amorphous computing.

Future Directions

Computer hardware is almost free, and in the future it will continue to decrease in price and size. Sensors and actuators are improving as well. Future systems will have vast numbers of computing mechanisms with integrated sensors and actuators, to a degree that outstrips our current approaches to system design. When the numbers become large enough, the appropriate programming technology will be amorphous computing. This transition has already begun to appear in several fields:

Sensor networks The success of sensor network research has encouraged the planning and deployment of ever-larger numbers of devices. The ad-hoc, time-varying nature of sensor networks has encouraged amorphous approaches, such as communication through directed diffusion [25] and Newton’s Regiment language [42].

Robotics Multi-agent robotics is much like sensor networks, except that the devices are mobile and have actuators. Swarm robotics considers independently mobile robots working together as a team, like ants or bees, while modular robotics considers robots that physically attach to one another in order to make shapes or perform actions, working together like the cells of an organism. Gradients are being used to create “flocking” behaviors in swarm robotics [35,44]. In modular robotics, Stoy uses gradients to create shapes [51] while De Rosa et al. form shapes through stochastic growth and decay [49].

Pervasive computing Pervasive computing seeks to exploit the rapid proliferation of wireless computing devices throughout our everyday environment. Mamei and Zambonelli’s TOTA system [32] is an amorphous computing implementation supporting a model of programming using fields and gradients [33].
Servat and Drogoul have suggested combining amorphous computing and reactive agent-based systems to produce something they call “pervasive intelligence” [50].

Multicore processors As it becomes more difficult to increase processor speed, chip manufacturers are looking for performance gains through increasing the number of processing cores per chip. Butera’s work [10] looks toward a future in which there are thousands of cores per chip and it is no longer reasonable to assume they are all working or to have them communicate all-to-all.

While much of amorphous computing research is inspired by biological observations, it is also likely that insights and lessons learned from programming amorphous computers will help elucidate some biological problems [43]. Some of this will be stimulated by the emerging engineering of biological systems. Current work in synthetic biology [24,48,61] is centered on controlling the molecular biology of cells. Soon synthetic biologists will begin to engineer biofilms and perhaps direct the construction of multicellular organs, where amorphous computing will become an essential technological tool.

Bibliography Primary Literature 1. Bachrach J, Beal J (2006) Programming a sensor network as an amorphous medium. In: DCOSS 2006 Posters, June 2006 2. Bachrach J, Beal J, Fujiwara T (2007) Continuous space-time semantics allow adaptive program execution. In: IEEE International Conference on Self-Adaptive and Self-Organizing Systems, 2007 3. Bachrach J, Nagpal R, Salib M, Shrobe H (2003) Experimental results and theoretical analysis of a self-organizing global coordinate system for ad hoc sensor networks. Telecommun Syst J, Special Issue on Wireless System Networks 26(2–4):213–233 4. Beal J (2003) A robust amorphous hierarchy from persistent nodes. In: Commun Syst Netw 5. Beal J (2004) Programming an amorphous computational medium. In: Unconventional Programming Paradigms International Workshop, September 2004 6. Beal J (2005) Amorphous medium language. In: Large-Scale Multi-Agent Systems Workshop (LSMAS). Held in Conjunction with AAMAS-05 7. Beal J, Bachrach J (2006) Infrastructure for engineered emergence on sensor/actuator networks. In: IEEE Intelligent Systems, 2006 8. Beal J, Sussman G (2005) Biologically-inspired robust spatial programming. Technical Report AI Memo 2005-001, MIT, January 2005 9. Beebee W M68hc11 gunk api book. http://www.swiss.ai.mit. edu/projects/amorphous/HC11/api.html. Accessed 31 May 2007 10. Butera W (2002) Programming a Paintable Computer. Ph D thesis, MIT 11. Campbell J, Pillai P, Goldstein SC (2005) The robot is the tether: Active, adaptive power routing for modular robots with unary inter-robot connectors. In: IROS 2005 12. Clement L, Nagpal R (2003) Self-assembly and self-repairing topologies. In: Workshop on Adaptability in Multi-Agent Systems, RoboCup Australian Open, January 2003 13. Codd EF (1968) Cellular Automata. Academic Press, New York 14. Coore D (1998) Establishing a coordinate system on an amorphous computer. In: MIT Student Workshop on High Performance Computing, 1998

15. Coore D (1999) Botanical Computing: A Developmental Approach to Generating Interconnect Topologies on an Amorphous Computer. PhD thesis, MIT
16. Coore D, Nagpal R, Weiss R (1997) Paradigms for structure in an amorphous computer. Technical Report AI Memo 1614, MIT
17. Demers A, Greene D, Hauser C, Irish W, Larson J, Shenker S, Sturgis H, Swinehart D, Terry D (1987) Epidemic algorithms for replicated database maintenance. In: 7th ACM Symposium on Operating Systems Principles, 1987
18. D’Hondt E, D’Hondt T (2001) Amorphous geometry. In: ECAL 2001
19. D’Hondt E, D’Hondt T (2001) Experiments in amorphous geometry. In: 2001 International Conference on Artificial Intelligence
20. Ganesan D, Krishnamachari B, Woo A, Culler D, Estrin D, Wicker S (2002) An empirical study of epidemic algorithms in large scale multihop wireless networks. Technical Report IRB-TR-02003, Intel Research Berkeley
21. Gayle O, Coore D (2006) Self-organizing text in an amorphous environment. In: ICCS 2006
22. Hill J, Szewczyk R, Woo A, Culler D, Hollar S, Pister K (2000) System architecture directions for networked sensors. In: ASPLOS, November 2000
23. Huzita H, Scimemi B (1989) The algebra of paper-folding. In: First International Meeting of Origami Science and Technology, 1989
24. iGEM 2006: International Genetically Engineered Machine competition (2006) http://www.igem2006.com. Accessed 31 May 2007
25. Intanagonwiwat C, Govindan R, Estrin D (2000) Directed diffusion: a scalable and robust communication paradigm for sensor networks. In: Mobile Computing and Networking, pp 56–67
26. Kahn JM, Katz RH, Pister KSJ (1999) Mobile networking for smart dust. In: ACM/IEEE Int. Conf. on Mobile Computing and Networking (MobiCom 99), August 1999
27. Katzenelson J (1999) Notes on amorphous computing. (Unpublished draft)
28. Kleinrock L, Silvester J (1978) Optimum transmission radii for packet radio networks or why six is a magic number. In: IEEE Natl Telecommun Conf, December 1978, pp 4.3.1–4.3.5
29. Knight TF, Sussman GJ (1998) Cellular gate technology. In: First International Conference on Unconventional Models of Computation (UMC98)
30. Kondacs A (2003) Biologically-inspired self-assembly of 2d shapes, using global-to-local compilation. In: International Joint Conference on Artificial Intelligence (IJCAI)
31. Mamei M, Zambonelli F (2003) Spray computers: Frontiers of self-organization for pervasive computing. In: WOA 2003
32. Mamei M, Zambonelli F (2004) Spatial computing: the TOTA approach. In: WOA 2004, pp 126–142
33. Mamei M, Zambonelli F (2005) Physical deployment of digital pheromones through RFID technology. In: AAMAS 2005, pp 1353–1354
34. Margolus N (1988) Physics and Computation. PhD thesis, MIT
35. McLurkin J (2004) Stupid robot tricks: A behavior-based distributed algorithm library for programming swarms of robots. Master’s thesis, MIT
36. Mittal V, Demirbas M, Arora A (2003) LOCI: Local clustering service for large scale wireless sensor networks. Technical Report OSU-CISRC-2/03-TR07, Ohio State University
37. Nagpal R (1999) Organizing a global coordinate system from local information on an amorphous computer. Technical Report AI Memo 1666, MIT
38. Nagpal R (2001) Programmable Self-Assembly: Constructing Global Shape using Biologically-inspired Local Interactions and Origami Mathematics. PhD thesis, MIT
39. Nagpal R, Mamei M (2004) Engineering amorphous computing systems. In: Bergenti F, Gleizes MP, Zambonelli F (eds) Methodologies and Software Engineering for Agent Systems, The Agent-Oriented Software Engineering Handbook. Kluwer, New York, pp 303–320
40. Dust Networks. http://www.dust-inc.com. Accessed 31 May 2007
41. Newton R, Morrisett G, Welsh M (2007) The Regiment macroprogramming system. In: International Conference on Information Processing in Sensor Networks (IPSN ’07)
42. Newton R, Welsh M (2004) Region streams: Functional macroprogramming for sensor networks. In: First International Workshop on Data Management for Sensor Networks (DMSN), August 2004
43. Patel A, Nagpal R, Gibson M, Perrimon N (2006) The emergence of geometric order in proliferating metazoan epithelia. Nature 442:1038–1041
44. Payton D, Daily M, Estowski R, Howard M, Lee C (2001) Pheromone robotics. Autonomous Robots 11:319–324
45. Priyantha N, Chakraborty A, Balakrishnan H (2000) The Cricket location-support system. In: ACM International Conference on Mobile Computing and Networking (ACM MOBICOM), August 2000
46. Raffle H, Parkes A, Ishii H (2004) Topobo: A constructive assembly system with kinetic memory. In: CHI 2004
47. Rauch E (1999) Discrete, amorphous physical models. Master’s thesis, MIT
48. Registry of Standard Biological Parts. http://parts.mit.edu. Accessed 31 May 2007
49. De Rosa M, Goldstein SC, Lee P, Campbell J, Pillai P (2006) Scalable shape sculpting via hole motion: Motion planning in lattice-constrained modular robots. In: Proceedings of the 2006 IEEE International Conference on Robotics and Automation (ICRA ’06), May 2006

50. Servat D, Drogoul A (2002) Combining amorphous computing and reactive agent-based systems: a paradigm for pervasive intelligence? In: AAMAS 2002
51. Stoy K (2003) Emergent Control of Self-Reconfigurable Robots. PhD thesis, University of Southern Denmark
52. Sutherland A (2003) Towards RSEAM: Resilient serial execution on amorphous machines. Master’s thesis, MIT
53. von Neumann J (1951) The general and logical theory of automata. In: Jeffress L (ed) Cerebral Mechanisms in Behavior. Wiley, New York, p 16
54. Weiss R, Knight T (2000) Engineered communications for microbial robotics. In: Sixth International Meeting on DNA Based Computers (DNA6)
55. Weiss R (2001) Cellular Computation and Communications using Engineered Genetic Regulatory Networks. PhD thesis, MIT
56. Welsh M, Mainland G (2004) Programming sensor networks using abstract regions. In: Proceedings of the First USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI ’04), March 2004
57. Werfel J, Bar-Yam Y, Nagpal R (2005) Building patterned structures with robot swarms. In: IJCAI
58. Whitehouse K, Sharp C, Brewer E, Culler D (2004) Hood: a neighborhood abstraction for sensor networks. In: Proceedings of the 2nd International Conference on Mobile Systems, Applications, and Services
59. Abelson H, Sussman GJ, Sussman J (1996) Structure and Interpretation of Computer Programs, 2nd edn. MIT Press, Cambridge
60. Ashe HL, Briscoe J (2006) The interpretation of morphogen gradients. Development 133:385–394
61. Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, Netravali I (2003) Genetic circuit building blocks for cellular computation, communications, and signal processing. Natural Computing 2(1):47–84

Books and Reviews

Abelson H, Allen D, Coore D, Hanson C, Homsy G, Knight T, Nagpal R, Rauch E, Sussman G, Weiss R (1999) Amorphous computing. Technical Report AIM-1665, MIT


Analog Computation

BRUCE J. MACLENNAN
Department of Electrical Engineering & Computer Science, University of Tennessee, Knoxville, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Fundamentals of Analog Computing
Analog Computation in Nature
General-Purpose Analog Computation
Analog Computation and the Turing Limit
Analog Thinking
Future Directions
Bibliography

Glossary

Accuracy The closeness of a computation to the corresponding primary system.
BSS The theory of computation over the real numbers defined by Blum, Shub, and Smale.
Church–Turing (CT) computation The model of computation based on the Turing machine and other equivalent abstract computing machines; commonly accepted as defining the limits of digital computation.
EAC Extended analog computer defined by Rubel.
GPAC General-purpose analog computer.
Nomograph A device for the graphical solution of equations by means of a family of curves and a straightedge.
ODE Ordinary differential equation.
PDE Partial differential equation.
Potentiometer A variable resistance, adjustable by the computer operator, used in electronic analog computing as an attenuator for setting constants and parameters in a computation.
Precision The quality of an analog representation or computation, which depends on both resolution and stability.
Primary system The system being simulated, modeled, analyzed, or controlled by an analog computer; also called the target system.
Scaling The adjustment, by constant multiplication, of variables in the primary system (including time) so that the corresponding variables in the analog system are in an appropriate range.
TM Turing machine.

Definition of the Subject

Although analog computation was eclipsed by digital computation in the second half of the twentieth century, it is returning as an important alternative computing technology. Indeed, as explained in this article, theoretical results imply that analog computation can escape from the limitations of digital computation. Furthermore, analog computation has emerged as an important theoretical framework for discussing computation in the brain and other natural systems.

Analog computation gets its name from an analogy, or systematic relationship, between the physical processes in the computer and those in the system it is intended to model or simulate (the primary system). For example, the electrical quantities voltage, current, and conductance might be used as analogs of the fluid pressure, flow rate, and pipe diameter. More specifically, in traditional analog computation, physical quantities in the computation obey the same mathematical laws as physical quantities in the primary system. Thus the computational quantities are proportional to the modeled quantities. This is in contrast to digital computation, in which quantities are represented by strings of symbols (e. g., binary digits) that have no direct physical relationship to the modeled quantities. According to the Oxford English Dictionary (2nd ed., s.vv. analogue, digital), these usages emerged in the 1940s. However, in a fundamental sense all computing is based on an analogy, that is, on a systematic relationship between the states and processes in the computer and those in the primary system. In a digital computer, the relationship is more abstract and complex than simple proportionality, but even so simple an analog computer as a slide rule goes beyond strict proportion (i. e., distance on the rule is proportional to the logarithm of the number).
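The slide-rule remark can be made concrete with a toy sketch (my example, not from the article): multiplication is performed by adding distances that are proportional to logarithms, and the `resolution` parameter is an invented stand-in for the limited precision of reading a physical scale.

```python
import math

def slide_rule_multiply(x, y, resolution=0.001):
    """Multiply the way a slide rule does: represent each operand as a
    distance proportional to its logarithm, add the distances by sliding
    the scales, and read the answer back off the logarithmic scale."""
    dx = math.log10(x)
    dy = math.log10(y)
    # Sliding one scale against the other adds the two distances.
    total = dx + dy
    # Reading the cursor position is only good to a few digits.
    total = round(total / resolution) * resolution
    return 10 ** total

product = slide_rule_multiply(2.0, 3.0)   # close to, but not exactly, 6
```

The result is accurate to roughly three significant figures, which matches the precision usually quoted for physical slide rules.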
In both analog and digital computation—indeed in all computation—the relevant abstract mathematical structure of the problem is realized in the physical states and processes of the computer, but the realization may be more or less direct [40,41,46]. Therefore, despite the etymologies of the terms “analog” and “digital”, in modern usage the principal distinction between digital and analog computation is that the former operates on discrete representations in discrete steps, while the latter operates on continuous representations by means of continuous processes (e. g., MacLennan [46], Siegelmann [p. 147 in 78], Small [p. 30 in 82], Weyrick [p. 3 in 89]). That is, the primary distinction resides in the topologies of the states and processes, and it would be more accurate to refer to discrete and continuous computation [p. 39 in 25]. (Consider so-called analog and digital clocks. The principal difference resides in the continuity or discreteness of the representation of time; the motion of the two (or three) hands of an “analog” clock does not mimic the motion of the rotating earth or the position of the sun relative to it.)

Introduction

History

Pre-electronic Analog Computation

Just like digital calculation, analog computation was originally performed by hand. Thus we find several analog computational procedures in the “constructions” of Euclidean geometry (Euclid, fl. 300 BCE), which derive from techniques used in ancient surveying and architecture. For example, Problem II.11 is “to divide a given straight line into two parts, so that the rectangle contained by the whole and one of the parts shall be equal to the square of the other part”. Also, Problem VI.13 is “to find a mean proportional between two given straight lines”, and VI.30 is “to cut a given straight line in extreme and mean ratio”. These procedures do not make use of measurements in terms of any fixed unit or of digital calculation; the lengths and other continuous quantities are manipulated directly (via compass and straightedge). On the other hand, the techniques involve discrete, precise operational steps, and so they can be considered algorithms, but over continuous magnitudes rather than discrete numbers. It is interesting to note that the ancient Greeks distinguished continuous magnitudes (Grk., megethoi), which have physical dimensions (e. g., length, area, rate), from discrete numbers (Grk., arithmoi), which do not [49]. Euclid axiomatizes them separately (magnitudes in Book V, numbers in Book VII), and a mathematical system comprising both discrete and continuous quantities was not achieved until the nineteenth century in the work of Weierstrass and Dedekind.
The earliest known mechanical analog computer is the “Antikythera mechanism”, which was found in 1900 in a shipwreck under the sea near the Greek island of Antikythera (between Kythera and Crete). It dates to the second century BCE and appears to be intended for astronomical calculations. The device is sophisticated (at least 70 gears) and well engineered, suggesting that it was not the first of its type, and therefore that other analog computing devices may have been used in the ancient Mediterranean world [22]. Indeed, according to Cicero (Rep. 22) and other authors, Archimedes (c. 287–c. 212 BCE) and other ancient scientists also built analog computers, such as armillary spheres, for astronomical simulation and computation. Other antique mechanical analog computers include the astrolabe, which is used for the determination of longitude and a variety of other astronomical purposes, and the torquetum, which converts astronomical measurements between equatorial, ecliptic, and horizontal coordinates.

A class of special-purpose analog computer, which is simple in conception but may be used for a wide range of purposes, is the nomograph (also, nomogram, alignment chart). In its most common form, it permits the solution of quite arbitrary equations in three real variables, f(u, v, w) = 0. The nomograph is a chart or graph with scales for each of the variables; typically these scales are curved and have non-uniform numerical markings. Given values for any two of the variables, a straightedge is laid across their positions on their scales, and the value of the third variable is read off where the straightedge crosses the third scale. Nomographs were used to solve many problems in engineering and applied mathematics. They improve intuitive understanding by allowing the relationships among the variables to be visualized, and facilitate exploring their variation by moving the straightedge. Lipka (1918) is an example of a course in graphical and mechanical methods of analog computation, including nomographs and slide rules.

Until the introduction of portable electronic calculators in the early 1970s, the slide rule was the most familiar analog computing device. Slide rules use logarithms for multiplication and division, and they were invented in the early seventeenth century shortly after John Napier’s description of logarithms.

The mid-nineteenth century saw the development of the field analogy method by G. Kirchhoff (1824–1887) and others [33]. In this approach an electrical field in an electrolytic tank or conductive paper was used to solve two-dimensional boundary problems for temperature distributions and magnetic fields [p. 34 in 82].
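To illustrate the nomograph idea numerically, the sketch below assumes the standard parallel-scale construction for the relation u · v = w (three parallel scales, the outer two marked logarithmically, the middle one at half the logarithmic pitch); the layout and function names are mine, not the article's.

```python
import math

# A parallel-scale nomogram for u * v = w:
#   scale U at x = 0 marks value u at height log10(u),
#   scale V at x = 2 marks value v at height log10(v),
#   scale W at x = 1 marks value w at height log10(w) / 2.
# A straight line (the straightedge) through u and v then crosses
# scale W exactly at w = u * v.

def read_w(u, v):
    # Height where the line from (0, log u) to (2, log v) crosses x = 1:
    # the average of the two endpoint heights.
    h = (math.log10(u) + math.log10(v)) / 2
    # Invert the W scale's marking rule, height = log10(w) / 2.
    return 10 ** (2 * h)

w = read_w(4.0, 25.0)   # the straightedge crossing reads 100
```

The same construction, with different scale functions, handles any relation of the form f1(u) + f2(v) = f3(w), which is why nomographs cover "quite arbitrary" three-variable equations.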
It is an early example of analog field computation.

In the nineteenth century a number of mechanical analog computers were developed for integration and differentiation (e. g., Lipka 1918, pp. 246–256; Clymer [15]). For example, the planimeter measures the area under a curve or within a closed boundary. While the operator moves a pointer along the curve, a rotating wheel accumulates the area. Similarly, the integraph is able to draw the integral of a given function as its shape is traced. Other mechanical devices can draw the derivative of a curve or compute a tangent line at a given point.

In the late nineteenth century William Thomson, Lord Kelvin, constructed several analog computers, including a “tide predictor” and a “harmonic analyzer”, which computed the Fourier coefficients of a tidal curve [85,86]. In 1876 he described how the mechanical integrators invented by his brother could be connected together in a feedback loop in order to solve second and higher order differential equations (Small [pp. 34–35, 42 in 82], Thomson [84]). He was unable to construct this differential analyzer, which had to await the invention of the torque amplifier in 1927.

The torque amplifier and other technical advancements permitted Vannevar Bush at MIT to construct the first practical differential analyzer in 1930 [pp. 42–45 in 82]. It had six integrators and could also do addition, subtraction, multiplication, and division. Input data were entered in the form of continuous curves, and the machine automatically plotted the output curves continuously as the equations were integrated. Similar differential analyzers were constructed at other laboratories in the US and the UK. Setting up a problem on the MIT differential analyzer took a long time; gears and rods had to be arranged to define the required dependencies among the variables. Bush later designed a much more sophisticated machine, the Rockefeller Differential Analyzer, which became operational in 1947. With 18 integrators (out of a planned 30), it provided programmatic control of machine setup, and permitted several jobs to be run simultaneously. Mechanical differential analyzers were rapidly supplanted by electronic analog computers in the mid-1950s, and most were disassembled in the 1960s (Bowles [10], Owens [61], Small [pp. 50–45 in 82]).

During World War II, and even later wars, an important application of optical and mechanical analog computation was in “gun directors” and “bomb sights”, which performed ballistic computations to accurately target artillery and dropped ordnance.
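Kelvin's feedback arrangement can be mimicked in discrete form: two chained "integrators" with the output fed back solve the second-order equation x'' = -x. This is an illustrative numerical sketch (simple fixed-step integration), not a model of any particular machine.

```python
import math

# Two integrators in a feedback loop solving x'' = -x,
# with x(0) = 1 and x'(0) = 0.  Each "integrator" just accumulates
# its input over time, as Kelvin's wheel-and-disc integrators did
# continuously; the feedback wire supplies x'' = -x.

dt = 1e-4
x, v = 1.0, 0.0           # outputs of the second and first integrator
t = 0.0
while t < math.pi:        # run for half a period
    a = -x                # feedback connection: acceleration = -x
    v += a * dt           # first integrator:  x'' -> x'
    x += v * dt           # second integrator: x'  -> x
    t += dt

# The exact solution is x(t) = cos(t), so x should be near -1 at t = pi.
```

On the real machine this loop is not stepped at all: both integrations proceed simultaneously and continuously, which is the source of the speed advantage discussed later in the article.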
Electronic Analog Computation in the 20th Century

It is commonly supposed that electronic analog computers were superior to mechanical analog computers, and they were in many respects, including speed, cost, ease of construction, size, and portability [pp. 54–56 in 82]. On the other hand, mechanical integrators produced higher precision results (0.1% vs. 1% for early electronic devices) and had greater mathematical flexibility (they were able to integrate with respect to any variable, not just time). However, many important applications did not require high precision and focused on dynamic systems for which time integration was sufficient.

Analog computers (non-electronic as well as electronic) can be divided into active-element and passive-element computers; the former involve some kind of amplification, the latter do not [pp. 2-1–4 in 87]. Passive-element computers included the network analyzers, which were developed in the 1920s to analyze electric power distribution networks, and which continued in use through the 1950s [pp. 35–40 in 82]. They were also applied to problems in thermodynamics, aircraft design, and mechanical engineering. In these systems networks or grids of resistive elements or reactive elements (i. e., involving capacitance and inductance as well as resistance) were used to model the spatial distribution of physical quantities such as voltage, current, and power (in electric distribution networks), electrical potential in space, stress in solid materials, temperature (in heat diffusion problems), pressure, fluid flow rate, and wave amplitude [p. 2-2 in 87]. That is, network analyzers dealt with partial differential equations (PDEs), whereas active-element computers, such as the differential analyzer and its electronic successors, were restricted to ordinary differential equations (ODEs) in which time was the independent variable. Large network analyzers are early examples of analog field computers.

Electronic analog computers became feasible after the invention of the DC operational amplifier (“op amp”) c. 1940 [pp. 64, 67–72 in 82]. Already in the 1930s scientists at Bell Telephone Laboratories (BTL) had developed the DC-coupled feedback-stabilized amplifier, which is the basis of the op amp. In 1940, as the USA prepared to enter World War II, DL Parkinson at BTL had a dream in which he saw DC amplifiers being used to control an antiaircraft gun. As a consequence, with his colleagues CA Lovell and BT Weber, he wrote a series of papers on “electrical mathematics”, which described electrical circuits to “operationalize” addition, subtraction, integration, differentiation, etc. The project to produce an electronic gun director led to the development and refinement of DC op amps suitable for analog computation.
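How an op amp "operationalizes" arithmetic can be sketched with the idealized transfer functions of the two standard op-amp computing circuits, the inverting summer and the inverting integrator; the component values below are illustrative assumptions, not taken from the article.

```python
# Idealized behavior of the two workhorse op-amp computing circuits
# (component values are illustrative only).

def summer(v1, v2, rf=1e5, r1=1e5, r2=1e5):
    # Inverting summing amplifier: Vout = -(Rf/R1 * V1 + Rf/R2 * V2).
    # With equal resistors it adds (and inverts); unequal resistors
    # weight the inputs, which is how constant coefficients are set.
    return -(rf / r1 * v1 + rf / r2 * v2)

def integrate(vin_samples, dt, r=1e6, c=1e-6):
    # Inverting integrator: Vout = -(1/RC) * integral of Vin dt.
    acc = 0.0
    for vin in vin_samples:
        acc += vin * dt
    return -acc / (r * c)

# Integrating a constant 1 V for 1 s with RC = 1 s ramps the output to -1 V.
n = 1000
ramp = integrate([1.0] * n, 1.0 / n)
```

The sign inversions are a real feature of these circuits; analog programmers routinely paired stages so the inversions cancelled.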
The war-time work at BTL was focused primarily on control applications of analog devices, such as the gun director. Other researchers, such as E. Lakatos at BTL, were more interested in applying them to general-purpose analog computation for science and engineering, which resulted in the design of the General Purpose Analog Computer (GPAC), also called “Gypsy”, completed in 1949 [pp. 69–71 in 82]. Building on the BTL op amp design, fundamental work on electronic analog computation was conducted at Columbia University in the 1940s. In particular, this research showed how analog computation could be applied to the simulation of dynamic systems and to the solution of nonlinear equations.

Commercial general-purpose analog computers (GPACs) emerged in the late 1940s and early 1950s [pp. 72–73 in 82]. Typically they provided several dozen
integrators, but several GPACs could be connected together to solve larger problems. Later, large-scale GPACs might have up to 500 amplifiers and compute with 0.01%–0.1% precision [p. 2-33 in 87].

Besides integrators, typical GPACs provided adders, subtracters, multipliers, fixed function generators (e. g., logarithms, exponentials, trigonometric functions), and variable function generators (for user-defined functions) [Chaps. 1.3, 2.4 in 87]. A GPAC was programmed by connecting these components together, often by means of a patch panel. In addition, parameters could be entered by adjusting potentiometers (attenuators), and arbitrary functions could be entered in the form of graphs [pp. 172–81, 2-154–156 in 87]. Output devices plotted data continuously or displayed it numerically [pp. 3-1–30 in 87].

The most basic way of using a GPAC was in single-shot mode [pp. 168–170 in 89]. First, parameters and initial values were entered into the potentiometers. Next, putting a master switch in “reset” mode controlled relays to apply the initial values to the integrators. Turning the switch to “operate” or “compute” mode allowed the computation to take place (i. e., the integrators to integrate). Finally, placing the switch in “hold” mode stopped the computation and stabilized the values, allowing them to be read from the computer (e. g., on voltmeters). Although single-shot operation was also called “slow operation” (in comparison to “repetitive operation”, discussed next), it was in practice quite fast. Because all of the devices computed in parallel and at electronic speeds, analog computers usually solved problems in real-time but often much faster (Truitt and Rogers [pp. 1-30–32 in 87], Small [p. 72 in 82]).

One common application of GPACs was to explore the effect of one or more parameters on the behavior of a system. To facilitate this exploration of the parameter space, some GPACs provided a repetitive operation mode, which worked as follows (Weyrick [p.
170 in 89], Small [p. 72 in 82]). An electronic clock switched the computer between reset and compute modes at an adjustable rate (e. g., 10–1000 cycles per second) [p. 280, n. 1 in 2]. In effect the simulation was rerun at the clock rate, but if any parameters were adjusted, the simulation results would vary along with them. Therefore, within a few seconds, an entire family of related simulations could be run. More importantly, the operator could acquire an intuitive understanding of the system’s dependence on its parameters.

The Eclipse of Analog Computing

A common view is that electronic analog computers were a primitive predecessor of the digital computer, and that their use was just a historical episode, or even a digression, in the inevitable triumph of digital technology. It is supposed that the current digital hegemony is a simple matter of technological superiority. However, the history is much more complicated, and involves a number of social, economic, historical, pedagogical, and also technical factors, which are outside the scope of this article (see Small [81] and Small [82], especially Chap. 8, for more information). In any case, beginning after World War II and continuing for twenty-five years, there was lively debate about the relative merits of analog and digital computation.

Speed was an oft-cited advantage of analog computers [Chap. 8 in 82]. While early digital computers were much faster than mechanical differential analyzers, they were slower (often by several orders of magnitude) than electronic analog computers. Furthermore, although digital computers could perform individual arithmetic operations rapidly, complete problems were solved sequentially, one operation at a time, whereas analog computers operated in parallel. Thus it was argued that increasingly large problems required more time to solve on a digital computer, whereas on an analog computer they might require more hardware but not more time. Even as digital computing speed was improved, analog computing retained its advantage for several decades, but this advantage eroded steadily.

Another important issue was the comparative precision of digital and analog computation [Chap. 8 in 82]. Analog computers typically computed with three or four digits of precision, and it was very expensive to do much better, due to the difficulty of manufacturing the parts and other factors. In contrast, digital computers could perform arithmetic operations with many digits of precision, and the hardware cost was approximately proportional to the number of digits. Against this, analog computing advocates argued that many problems did not require such high precision, because the measurements were known to only a few significant figures and the mathematical models were approximations.
Further, they distinguished between precision and accuracy, which refers to the conformity of the computation to physical reality, and they argued that digital computation was often less accurate than analog, due to numerical limitations (e. g., truncation, cumulative error in numerical integration). Nevertheless, some important applications, such as the calculation of missile trajectories, required greater precision, and for these, digital computation had the advantage. Indeed, to some extent precision was viewed as inherently desirable, even in applications where it was unimportant, and it was easily mistaken for accuracy. (See Sect. “Precision” for more on precision and accuracy.)

There was even a social factor involved, in that the written programs, precision, and exactness of digital computation were associated with mathematics and science, but the hands-on operation, parameter variation, and approximate solutions of analog computation were associated with engineers, and so analog computing inherited “the lower status of engineering vis-à-vis science” [p. 251 in 82]. Thus the status of digital computing was further enhanced as engineering became more mathematical and scientific after World War II [pp. 247–251 in 82].

Already by the mid-1950s the competition between analog and digital had evolved into the idea that they were complementary technologies. This resulted in the development of a variety of hybrid analog/digital computing systems [pp. 251–253, 263–266 in 82]. In some cases this involved using a digital computer to control an analog computer by using digital logic to connect the analog computing elements, set parameters, and gather data. This improved the accessibility and usability of analog computers, but had the disadvantage of distancing the user from the physical analog system. The intercontinental ballistic missile program in the USA stimulated the further development of hybrid computers in the late 1950s and 1960s [81]. These applications required the speed of analog computation to simulate the closed-loop control systems and the precision of digital computation for accurate computation of trajectories. However, by the early 1970s hybrids were being displaced by all-digital systems. Certainly part of the reason was the steady improvement in digital technology, driven by a vibrant digital computer industry, but contemporaries also pointed to an inaccurate perception that analog computing was obsolete and to a lack of education about the advantages and techniques of analog computing.
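The precision-versus-accuracy distinction drawn above can be illustrated with a toy computation (my example, not from the cited sources): integrating f(t) = t over [0, 1] (exact answer 0.5), a "digital" method with a coarse step size can be less accurate than an "analog" result whose only limitation is a three-digit readout.

```python
# Precision vs. accuracy, in miniature.  The "analog" result is the
# exact continuous integral, read out to only 3 significant digits.
# The "digital" result carries full floating-point precision but
# suffers truncation error from its coarse step size.

def round_sig(x, digits=3):
    from math import floor, log10
    return round(x, -int(floor(log10(abs(x)))) + digits - 1)

analog = round_sig(0.5)   # 3-digit readout of the exact answer

steps = 50
digital = sum((i / steps) * (1 / steps) for i in range(steps))  # left-Riemann sum

analog_error = abs(analog - 0.5)
digital_error = abs(digital - 0.5)   # truncation error, despite many digits
```

Here the many-digit "digital" answer (0.49) is farther from the truth than the low-precision "analog" one, which is exactly the advocates' point about cumulative numerical error.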
Another argument made in favor of digital computers was that they were general-purpose, since they could be used in business data processing and other application domains, whereas analog computers were essentially special-purpose, since they were limited to scientific computation [pp. 248–250 in 82]. Against this it was argued that all computing is essentially computing by analogy, and therefore analog computation was general-purpose because the class of analog computers included digital computers! (See also Sect. “Definition of the Subject” on computing by analogy.) Be that as it may, analog computation, as normally understood, is restricted to continuous variables, and so it was not immediately applicable to discrete data, such as that manipulated in business computing and other non-scientific applications. Therefore business (and eventually consumer) applications motivated the computer industry’s investment in digital computer technology at the expense of analog technology.

Although it is commonly believed that analog computers quickly disappeared after digital computers became available, this is inaccurate, for both general-purpose and special-purpose analog computers have continued to be used in specialized applications to the present time. For example, a general-purpose electrical (vs. electronic) analog computer, the Anacom, was still in use in 1991. This is not technological atavism, for “there is no doubt considerable truth in the fact that Anacom continued to be used because it effectively met a need in a historically neglected but nevertheless important computer application area” [3]. As mentioned, the reasons for the eclipse of analog computing were not simply the technological superiority of digital computation; the conditions were much more complex. Therefore a change in conditions has necessitated a reevaluation of analog technology.

Analog VLSI

In the mid-1980s, Carver Mead, who already had made important contributions to digital VLSI technology, began to advocate for the development of analog VLSI [51,52]. His motivation was that “the nervous system of even a very simple animal contains computing paradigms that are orders of magnitude more effective than are those found in systems made by humans” and that they “can be realized in our most commonly available technology—silicon integrated circuits” [p. xi in 52]. However, he argued, since these natural computation systems are analog and highly non-linear, progress would require understanding neural information processing in animals and applying it in a new analog VLSI technology. Because analog computation is closer to the physical laws by which all computation is realized (which are continuous), analog circuits often use fewer devices than corresponding digital circuits. For example, a four-quadrant adder (capable of adding two signed numbers) can be fabricated from four transistors [pp.
87–88 in 52], and a fourquadrant multiplier from nine to seventeen, depending on the required range of operation [pp. 90–96 in 52]. Intuitions derived from digital logic about what is simple or complex to compute are often misleading when applied to analog computation. For example, two transistors are sufficient to compute the logarithm or exponential, five for the hyperbolic tangent (which is very useful in neural computation), and three for the square root [pp. 70– 71, 97–99 in 52]. Thus analog VLSI is an attractive approach to “post-Moore’s Law computing” (see Sect. “Future Directions” below). Mead and his colleagues demonstrated a number of analog VLSI devices inspired by the nervous system, including a “silicon retina” and an “electronic cochlea” [Chaps. 15–16 in 52], research that has lead to a renaissance of interest in electronic analog computing.


Analog Computation

Non-Electronic Analog Computation

As will be explained in the body of this article, analog computation suggests many opportunities for future computing technologies. Many physical phenomena are potential media for analog computation provided they have useful mathematical structure (i. e., the mathematical laws describing them are mathematical functions useful for general- or special-purpose computation) and they are sufficiently controllable for practical use.

Article Roadmap

The remainder of this article will begin by summarizing the fundamentals of analog computing, starting with the continuous state space and the various processes by which analog computation can be organized in time. Next it will discuss analog computation in nature, which provides models and inspiration for many contemporary uses of analog computation, such as neural networks. Then we consider general-purpose analog computing, both from a theoretical perspective and in terms of practical general-purpose analog computers. This leads to a discussion of the theoretical power of analog computation and in particular to the issue of whether analog computing is in some sense more powerful than digital computing. We briefly consider the cognitive aspects of analog computing, and whether it leads to a different approach to computation than does digital computing. Finally, we conclude with some observations on the role of analog computation in “post-Moore’s Law computing”.

Fundamentals of Analog Computing

Continuous State Space

As discussed in Sect. “Introduction”, the fundamental characteristic that distinguishes analog from digital computation is that the state space is continuous in analog computation and discrete in digital computation. Therefore it might be more accurate to call analog and digital computation continuous and discrete computation, respectively. Furthermore, since the earliest days there have been hybrid computers that combine continuous and discrete state spaces and processes.
Thus, there are several respects in which the state space may be continuous. In the simplest case the state space comprises a finite (generally modest) number of variables, each holding a continuous quantity (e. g., voltage, current, charge). In a traditional GPAC they correspond to the variables in the ODEs defining the computational process, each typically having some independent meaning in the analysis of the problem. Mathematically, the variables are taken to contain bounded real numbers, although complex-valued variables are also possible (e. g., in AC electronic analog computers). In a practical sense, however, their precision is limited by noise, stability, device tolerance, and other factors (discussed below, Sect. “Characteristics of Analog Computation”).

In typical analog neural networks the state space is larger in dimension but more structured than in the former case. The artificial neurons are organized into one or more layers, each composed of a (possibly large) number of artificial neurons. Commonly each layer of neurons is densely connected to the next layer. In general the layers each have some meaning in the problem domain, but the individual neurons constituting them do not (and so, in mathematical descriptions, the neurons are typically numbered rather than named). The individual artificial neurons usually perform a simple computation such as this:

    y = σ(s),  where  s = b + ∑_{i=1}^{n} w_i x_i,

and where y is the activity of the neuron, x_1, …, x_n are the activities of the neurons that provide its inputs, b is a bias term, and w_1, …, w_n are the weights or strengths of the connections. Often the activation function σ is a real-valued sigmoid (“S-shaped”) function, such as the logistic sigmoid,

    σ(s) = 1 / (1 + e^{−s}),

in which case the neuron activity y is a real number, but some applications use a discontinuous threshold function, such as the Heaviside function,

    U(s) = 1 if s ≥ 0,  U(s) = 0 if s < 0,

in which case the activity is a discrete quantity. The saturated-linear or piecewise-linear sigmoid is also used occasionally:

    σ(s) = 1 if s > 1,  σ(s) = s if 0 ≤ s ≤ 1,  σ(s) = 0 if s < 0.

Regardless of whether the activation function is continuous or discrete, the bias b and connection weights w_1, …, w_n are real numbers, as is the “net input” s = ∑_i w_i x_i to the activation function. Analog computation may be used to evaluate the linear combination s and the activation function σ(s), if it is real-valued. The biases


and weights are normally determined by a learning algorithm (e. g., back-propagation), which is also a good candidate for analog implementation. In summary, the continuous state space of a neural network includes the bias values and net inputs of the neurons and the interconnection strengths between the neurons. It also includes the activity values of the neurons, if the activation function is a real-valued sigmoid function, as is often the case. Often large groups (“layers”) of neurons (and the connections between these groups) have some intuitive meaning in the problem domain, but typically the individual neuron activities, bias values, and interconnection weights do not.

If we extrapolate the number of neurons in a layer to the continuum limit, we get a field, which may be defined as a continuous distribution of continuous quantity. Treating a group of artificial or biological neurons as a continuous mass is a reasonable mathematical approximation if their number is sufficiently large and if their spatial arrangement is significant (as it generally is in the brain). Fields are especially useful in modeling cortical maps, in which information is represented by the pattern of activity over a region of neural cortex. In field computation the state space is continuous in two ways: it is continuous in variation but also in space. Therefore, field computation is especially applicable to solving PDEs and to processing spatially extended information such as visual images. Some early analog computing devices were capable of field computation [pp. 114–17, 2-2–16 in 87]. For example, as previously mentioned (Sect. “Introduction”), large resistor and capacitor networks could be used for solving PDEs such as diffusion problems. In these cases a discrete ensemble of resistors and capacitors was used to approximate a continuous field, while in other cases the computing medium was spatially continuous.
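The elementary neuron computation summarized above — a weighted sum s = b + ∑_i w_i x_i passed through an activation function — is easy to sketch in software. A minimal Python version using the logistic sigmoid follows; all numeric values are chosen purely for illustration:

```python
import math

def logistic(s):
    # Logistic sigmoid: sigma(s) = 1 / (1 + e^(-s))
    return 1.0 / (1.0 + math.exp(-s))

def neuron(x, w, b, phi=logistic):
    # y = phi(s), where s = b + sum_i w_i * x_i
    s = b + sum(wi * xi for wi, xi in zip(w, x))
    return phi(s)

# Illustrative (hypothetical) inputs, weights, and bias:
y = neuron(x=[0.5, -1.0, 2.0], w=[0.1, 0.4, 0.2], b=0.05)
# Here s = 0.05 + 0.05 - 0.4 + 0.4 = 0.1, so y = logistic(0.1) ≈ 0.525
```

Substituting a Heaviside step or the piecewise-linear sigmoid for `logistic` yields the discontinuous and saturated variants mentioned above.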
The latter made use of conductive sheets (for two-dimensional fields) or electrolytic tanks (for two- or three-dimensional fields). When they were applied to steady-state spatial problems, these analog computers were called field plotters or potential analyzers. The ability to fabricate very large arrays of analog computing devices, combined with the need to exploit massive parallelism in real-time computation and control applications, creates new opportunities for field computation [37,38,43]. There is also renewed interest in using physical fields in analog computation. For example, Rubel [73] defined an abstract extended analog computer (EAC), which augments Shannon’s [77] general-purpose analog computer with (unspecified) facilities for field computation, such as PDE solvers (see Sects. “Shannon’s Analysis” through “Rubel’s Extended Analog Computer” below). J. W. Mills

has explored the practical application of these ideas in his artificial neural field networks and VLSI EACs, which use the diffusion of electrons in bulk silicon or conductive gels and plastics for 2D and 3D field computation [53,54].

Computational Process

We have considered the continuous state space, which is the basis for analog computing, but there are a variety of ways in which analog computers can operate on the state. In particular, the state can change continuously in time or be updated at distinct instants (as in digital computation).

Continuous Time

Since the laws of physics on which analog computing is based are differential equations, many analog computations proceed in continuous real time. Also, as we have seen, an important application of analog computers in the late 19th and early 20th centuries was the integration of ODEs in which time is the independent variable. A common technique in analog simulation of physical systems is time scaling, in which the differential equations are altered systematically so the simulation proceeds either more slowly or more quickly than the primary system (see Sect. “Characteristics of Analog Computation” for more on time scaling). On the other hand, because analog computations are close to the physical processes that realize them, analog computing is rapid, which makes it very suitable for real-time control applications.

In principle, any mathematically describable physical process operating on time-varying physical quantities can be used for analog computation. In practice, however, analog computers typically provide familiar operations that scientists and engineers use in differential equations [70,87]. These include basic arithmetic operations, such as algebraic sum and difference (u(t) = v(t) ± w(t)), constant multiplication or scaling (u(t) = cv(t)), variable multiplication and division (u(t) = v(t)w(t), u(t) = v(t)/w(t)), and inversion (u(t) = −v(t)).
Transcendental functions may be provided, such as the exponential (u(t) = exp v(t)), logarithm (u(t) = ln v(t)), trigonometric functions (u(t) = sin v(t), etc.), and resolvers for converting between polar and rectangular coordinates. Most important, of course, is definite integration (u(t) = v_0 + ∫_0^t v(τ) dτ), but differentiation may also be provided (u(t) = v̇(t)). Generally, however, direct differentiation is avoided, since noise tends to have a higher frequency than the signal, and therefore differentiation amplifies noise; typically problems are reformulated to avoid direct differentiation [pp. 26–27 in 89]. As previously mentioned, many GPACs include (arbitrary) function generators, which allow the use of functions defined

only by a graph and for which no mathematical definition might be available; in this way empirically defined functions can be used [pp. 32–42 in 70]. Thus, given a graph (x, f(x)), or a sufficient set of samples, (x_k, f(x_k)), the function generator approximates u(t) = f(v(t)). Rather less common are generators for arbitrary functions of two variables, u(t) = f(v(t), w(t)), in which the function may be defined by a surface, (x, y, f(x, y)), or by sufficient samples from it.

Although analog computing is primarily continuous, there are situations in which discontinuous behavior is required. Therefore some analog computers provide comparators, which produce a discontinuous result depending on the relative value of two input values. For example,

    u = k if v ≥ w,  u = 0 if v < w.

Typically, this would be implemented as a Heaviside (unit-step) function applied to the difference of the inputs, u = kU(v − w). In addition to allowing the definition of discontinuous functions, comparators provide a primitive decision-making ability, and may be used, for example, to terminate a computation (switching the computer from “operate” to “hold” mode).

Other operations that have proved useful in analog computation are time delays and noise generators [Chap. 7 in 31]. The function of a time delay is simply to retard the signal by an adjustable delay T > 0: u(t + T) = v(t). One common application is to model delays in the primary system (e. g., human response time). Typically a noise generator produces time-invariant Gaussian-distributed noise with zero mean and a flat power spectrum (over a band compatible with the analog computing process). The standard deviation can be adjusted by scaling, the mean can be shifted by addition, and the spectrum altered by filtering, as required by the application. Historically noise generators were used to model noise and other random effects in the primary system, to determine, for example, its sensitivity to effects such as turbulence.
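In software, the comparator and noise generator just described might be sketched as follows; the gain k, the noise parameters, and the sample count are arbitrary illustrative choices:

```python
import random

def comparator(v, w, k=1.0):
    # u = k if v >= w, else 0; equivalently u = k * U(v - w),
    # with U the Heaviside unit-step function.
    return k if v >= w else 0.0

def noise_source(mean=0.0, std=1.0, n=1000, seed=42):
    # Gaussian noise: the mean is shifted by addition and the
    # standard deviation adjusted by scaling, as in the text.
    rng = random.Random(seed)
    return [mean + std * rng.gauss(0.0, 1.0) for _ in range(n)]

print(comparator(3.0, 1.5, k=5.0))   # prints 5.0
samples = noise_source(mean=2.0, std=0.5)
```

Filtering the sample stream would likewise shape the spectrum, completing the analogy with the hardware noise generator.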
However, noise can make a positive contribution in some analog computing algorithms (e. g., for symmetry breaking and in simulated annealing, weight perturbation learning, and stochastic resonance).

As already mentioned, some analog computing devices for the direct solution of PDEs have been developed. In general a PDE solver depends on an analogous physical process, that is, on a process obeying the same class of PDEs that it is intended to solve. For example, in Mills’ EAC, diffusion of electrons in conductive sheets or solids is used to solve diffusion equations [53,54]. Historically, PDEs were solved on electronic GPACs by discretizing all but one of the independent variables, thus replacing the differential equations by difference equations [pp. 173–193 in 70]. That is, computation over a field was approximated by computation over a finite real array.

Reaction-diffusion computation is an important example of continuous-time analog computing (see the article Reaction-Diffusion Computing). The state is represented by a set of time-varying chemical concentration fields, c_1, …, c_n. These fields are distributed across a one-, two-, or three-dimensional space Ω, so that, for x ∈ Ω, c_k(x, t) represents the concentration of chemical k at location x and time t. Computation proceeds in continuous time according to reaction-diffusion equations, which have the form:

    ∂c/∂t = D∇²c + F(c),

where c = (c_1, …, c_n)^T is the vector of concentrations, D = diag(d_1, …, d_n) is a diagonal matrix of positive diffusion rates, and F is a nonlinear vector function that describes how the chemical reactions affect the concentrations.

Some neural net models operate in continuous time and thus are examples of continuous-time analog computation. For example, Grossberg [26,27,28] defines the activity of a neuron by differential equations such as this:

    ẋ_i = −a_i x_i + b_i ∑_{j=1}^{n} w^(+)_{ij} f_j(x_j) − c_i ∑_{j=1}^{n} w^(−)_{ij} g_j(x_j) + I_i.

This describes the continuous change in the activity of neuron i resulting from passive decay (first term), positive feedback from other neurons (second term), negative feedback (third term), and input (last term). The f_j and g_j are nonlinear activation functions, and the w^(+)_{ij} and w^(−)_{ij} are adaptable excitatory and inhibitory connection strengths, respectively.

The continuous Hopfield network is another example of continuous-time analog computation [30]. The output y_i of a neuron is a nonlinear function of its internal state x_i, y_i = σ(x_i), where the hyperbolic tangent is usually used as the activation function, σ(x) = tanh x, because its range is [−1, 1]. The internal state is defined by a differential equation,

    τ_i ẋ_i = −a_i x_i + b_i + ∑_{j=1}^{n} w_{ij} y_j,

where τ_i is a time constant, a_i is the decay rate, b_i is the bias, and w_{ij} is the connection weight to neuron i from


neuron j. In a Hopfield network every neuron is symmetrically connected to every other (w_{ij} = w_{ji}) but not to itself (w_{ii} = 0). Of course analog VLSI implementations of neural networks also operate in continuous time (e. g., [20,52]).

Concurrent with the resurgence of interest in analog computation have been innovative reconceptualizations of continuous-time computation. For example, Brockett [12] has shown that dynamical systems can perform a number of problems normally considered to be intrinsically sequential. In particular, a certain system of ODEs (a nonperiodic finite Toda lattice) can sort a list of numbers by continuous-time analog computation. The system is started with the vector x equal to the values to be sorted and a vector y initialized to small nonzero values; the y vector converges to a sorted permutation of x.

Sequential Time

Sequential-time computation refers to computation in which discrete computational operations take place in succession but at no definite interval [88]. Ordinary digital computer programs take place in sequential time, for the operations occur one after another, but the individual operations are not required to have any specific duration, so long as they take finite time. One of the oldest examples of sequential analog computation is provided by the compass-and-straightedge constructions of traditional Euclidean geometry (Sect. “Introduction”). These computations proceed by a sequence of discrete operations, but the individual operations involve continuous representations (e. g., compass settings, straightedge positions) and operate on a continuous state (the figure under construction). Slide rule calculation might seem to be an example of sequential analog computation, but if we look at it more closely, we see that although the operations are performed by an analog device, the intermediate results are recorded digitally (and so this part of the state space is discrete). Thus it is a kind of hybrid computation.
The familiar digital computer automates sequential digital computations that once were performed manually by human “computers”. Sequential analog computation can be similarly automated. That is, just as the control unit of an ordinary digital computer sequences digital computations, so a digital control unit can sequence analog computations. In addition to the analog computation devices (adders, multipliers, etc.), such a computer must provide variables and registers capable of holding continuous quantities between the sequential steps of the computation (see also Sect. “Discrete Time” below).

The primitive operations of sequential-time analog computation are typically similar to those in continuous-time computation (e. g., addition, multiplication, transcendental functions), but integration and differentiation with respect to sequential time do not make sense. However, continuous-time integration within a single step, and space-domain integration, as in PDE solvers or field computation devices, are compatible with sequential analog computation.

In general, any model of digital computation can be converted to a similar model of sequential analog computation by changing the discrete state space to a continuum, and making appropriate changes to the rest of the model. For example, we can make an analog Turing machine by allowing it to write a bounded real number (rather than a symbol from a finite alphabet) onto a tape cell. The Turing machine’s finite control can be altered to test for tape markings in some specified range.

Similarly, in a series of publications Blum, Shub, and Smale developed a theory of computation over the reals, which is an abstract model of sequential-time analog computation [6,7]. In this “BSS model” programs are represented as flowcharts, but they are able to operate on real-valued variables. Using this model they were able to prove a number of theorems about the complexity of sequential analog algorithms.

The BSS model, and some other sequential analog computation models, assume that it is possible to make exact comparisons between real numbers (analogous to exact comparisons between integers or discrete symbols in digital computation) and to use the result of the comparison to control the path of execution. Comparisons of this kind are problematic because they imply infinite precision in the comparator (which may be defensible in a mathematical model but is impossible in physical analog devices), and because they make the execution path a discontinuous function of the state (whereas analog computation is usually continuous). Indeed, it has been argued that this is not “true” analog computation [p. 148 in 78].
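The flavor of a BSS-style flowchart — real-valued registers, arithmetic nodes, and branches controlled by comparisons between reals — can be suggested by a sketch like the following, with Python floats standing in (imperfectly) for the model’s exact reals; the particular program and tolerance are illustrative only:

```python
def bss_sqrt(a, eps=1e-12):
    # A flowchart over the reals: real-valued registers,
    # arithmetic nodes, and branch nodes that compare real
    # quantities exactly (idealized in the BSS model; floats
    # only approximate such exact comparisons).
    x = a if a > 1.0 else 1.0   # branch on a real comparison
    while x * x - a >= eps:     # branch node controls the loop
        x = 0.5 * (x + a / x)   # arithmetic node (Newton step)
    return x

print(bss_sqrt(2.0))   # ≈ 1.41421356...
```

The branch conditions here are exactly the problematic feature discussed above: the execution path changes discontinuously as the compared real quantities cross each other.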
Many artificial neural network models are examples of sequential-time analog computation. In a simple feed-forward neural network, an input vector is processed by the layers in order, as in a pipeline. That is, the output of layer n becomes the input of layer n + 1. Since the model does not make any assumptions about the amount of time it takes a vector to be processed by each layer and to propagate to the next, execution takes place in sequential time. Most recurrent neural networks, which have feedback, also operate in sequential time, since the activities of all the neurons are updated synchronously (that is, signals propagate through the layers, or back to earlier layers, in lockstep). Many artificial neural-net learning algorithms are also sequential-time analog computations. For example, the


back-propagation algorithm updates a network’s weights, moving sequentially backward through the layers. In summary, the correctness of sequential-time computation (analog or digital) depends on the order of operations, not on their duration, and similarly the efficiency of sequential computations is evaluated in terms of the number of operations, not their total duration.

Discrete Time

Discrete-time analog computation has similarities to both continuous-time and sequential analog computation. Like the latter, it proceeds by a sequence of discrete (analog) computation steps; like the former, these steps occur at a constant rate in real time (e. g., some “frame rate”). If the real-time rate is sufficient for the application, then discrete-time computation can approximate continuous-time computation (including integration and differentiation).

Some electronic GPACs implemented discrete-time analog computation by a modification of repetitive operation mode, called iterative analog computation [Chap. 9 in 2]. Recall (Sect. “Electronic Analog Computation in the 20th Century”) that in repetitive operation mode a clock rapidly switched the computer between reset and compute modes, thus repeating the same analog computation, but with different parameters (set by the operator). However, each repetition was independent of the others. Iterative operation was different in that analog values computed by one iteration could be used as initial values in the next. This was accomplished by means of an analog memory circuit (based on an op amp) that sampled an analog value at the end of one compute cycle (effectively during hold mode) and used it to initialize an integrator during the following reset cycle. (A modified version of the memory circuit could be used to retain a value over several iterations.) Iterative computation was used for problems such as determining, by iterative search or refinement, the initial conditions that would lead to a desired state at a future time.
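The last kind of problem mentioned — searching for an initial condition that yields a desired state at a future time — can be sketched as an iterative, shooting-style refinement; the dynamical system, search interval, and tolerances below are all hypothetical:

```python
def simulate(x0, T=1.0, dt=0.001):
    # Stand-in for one analog "compute" cycle: Euler-integrate
    # a simple hypothetical ODE, x' = -x + 1, from x(0) = x0.
    x = x0
    for _ in range(int(round(T / dt))):
        x += dt * (-x + 1.0)
    return x

def find_initial(target, lo=-10.0, hi=10.0, tol=1e-6):
    # Iterative refinement by bisection: each cycle's result
    # narrows the interval used for the next cycle, analogous
    # to iterative analog operation.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if simulate(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = find_initial(target=0.9)   # x(1) = 0.9 requires x0 ≈ 0.728
```

On the analog machine the role of `simulate` was played by one compute cycle, with the memory circuit carrying the refined value into the next cycle.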
Since the analog computations were iterated at a fixed clock rate, iterative operation is an example of discrete-time analog computation. However, the clock rate is not directly relevant in some applications (such as the iterative solution of boundary value problems), in which case iterative operation is better characterized as sequential analog computation. The principal contemporary examples of discrete-time analog computing are in neural network applications to time-series analysis and (discrete-time) control. In each of these cases the input to the neural net is a sequence of discrete-time samples, which propagate through the net and generate discrete-time output signals. Many of these neural nets are recurrent, that is, values from later layers

are fed back into earlier layers, which allows the net to remember information from one sample to the next.

Analog Computer Programs

The concept of a program is central to digital computing, both practically, for it is the means for programming general-purpose digital computers, and theoretically, for it defines the limits of what can be computed by a universal machine, such as a universal Turing machine. Therefore it is important to discuss means for describing or specifying analog computations.

Traditionally, analog computers were used to solve ODEs (and sometimes PDEs), and so in one sense a mathematical differential equation is one way to represent an analog computation. However, since the equations were usually not suitable for direct solution on an analog computer, the process of programming involved the translation of the equations into a schematic diagram showing how the analog computing devices (integrators etc.) should be connected to solve the problem. These diagrams are the closest analogies to digital computer programs and may be compared to flowcharts, which were once popular in digital computer programming. It is worth noting, however, that flowcharts (and ordinary computer programs) represent sequences among operations, whereas analog computing diagrams represent functional relationships among variables, and therefore a kind of parallel data flow.

Differential equations and schematic diagrams are suitable for continuous-time computation, but for sequential analog computation something more akin to a conventional digital program can be used. Thus, as previously discussed (Sect. “Sequential Time”), the BSS system uses flowcharts to describe sequential computations over the reals. Similarly, C. Moore [55] defines recursive functions over the reals by means of a notation similar to a programming language.
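The distinction drawn above can be made concrete by simulating a small integrator diagram in software. Below is a sketch of the classic two-integrator patch for the harmonic oscillator x″ = −ω²x (the choice of problem, step size, and parameters is illustrative); note that all blocks read the same instant’s signals — a parallel data flow rather than a sequence of operations:

```python
import math

def run_diagram(omega=2.0, x0=1.0, v0=0.0, dt=1e-4, T=1.0):
    # Two integrators connected in a loop, as on an analog
    # schematic:  v' = -omega^2 * x  and  x' = v.
    x, v = x0, v0
    for _ in range(int(round(T / dt))):
        dv = -omega ** 2 * x * dt    # input to the first integrator
        dx = v * dt                  # input to the second integrator
        v, x = v + dv, x + dx        # both updated "simultaneously"
    return x

# Analytically x(T) = cos(omega * T) for x0 = 1, v0 = 0:
print(run_diagram())   # close to math.cos(2.0) ≈ -0.416
```

The code has no meaningful operation ordering beyond the simultaneous update; the “program” is really the wiring between the two integrators.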
In principle any sort of analog computation might involve constants that are arbitrary real numbers, which therefore might not be expressible in finite form (e. g., as a finite string of digits). Although this is of theoretical interest (see Sect. “Real-valued Inputs, Outputs, and Constants” below), from a practical standpoint these constants could be set with at most about four digits of precision [p. 11 in 70]. Indeed, automatic potentiometer-setting devices were constructed that read a series of decimal numerals from punched paper tape and used them to set the potentiometers for the constants [pp. 3-58–60 in 87]. Nevertheless it is worth observing that analog computers do allow continuous inputs that need not be expressed in digital notation, for example, when the parameters of a simulation are continuously varied by the operator. In principle, therefore, an analog program can incorporate constants that are represented by a real-valued physical quantity (e. g., an angle or a distance), which need not be expressed digitally. Further, as we have seen (Sect. “Electronic Analog Computation in the 20th Century”), some electronic analog computers could compute a function by means of an arbitrarily drawn curve, that is, one not represented by an equation or a finite set of digitized points. Therefore, in the context of analog computing it is natural to expand the concept of a program beyond discrete symbols to include continuous representations (scalar magnitudes, vectors, curves, shapes, surfaces, etc.).

Typically such continuous representations would be used as adjuncts to conventional discrete representations of the analog computational process, such as equations or diagrams. However, in some cases the most natural static representation of the process is itself continuous, in which case it is more like a “guiding image” than a textual prescription [42]. A simple example is a potential surface, which defines a continuum of trajectories from initial states (possible inputs) to fixed-point attractors (the results of the computations). Such a “program” may define a deterministic computation (e. g., if the computation proceeds by gradient descent), or it may constrain a nondeterministic computation (e. g., if the computation may proceed by any potential-decreasing trajectory). Thus analog computation suggests a broadened notion of programs and programming.

Characteristics of Analog Computation

Precision

Analog computation is evaluated in terms of both accuracy and precision, but the two must be distinguished carefully [pp. 25–28 in 2; pp. 12–13 in 89; pp. 257–261 in 82].
Accuracy refers primarily to the relationship between a simulation and the primary system it is simulating or, more generally, to the relationship between the results of a computation and the mathematically correct result. Accuracy is a result of many factors, including the mathematical model chosen, the way it is set up on a computer, and the precision of the analog computing devices. Precision, therefore, is a narrower notion, which refers to the quality of a representation or computing device. In analog computing, precision depends on resolution (fineness of operation) and stability (absence of drift), and may be measured as a fraction of the represented value. Thus a precision of 0.01% means that the representation will stay within 0.01% of the represented value for a reasonable period of time. For purposes of comparing analog devices, the precision is usually expressed as a fraction of full-scale

variation (i. e., the difference between the maximum and minimum representable values).

It is apparent that the precision of analog computing devices depends on many factors. One is the choice of physical process and the way it is utilized in the device. For example a linear mathematical operation can be realized by using a linear region of a nonlinear physical process, but the realization will be approximate and have some inherent imprecision. Also, associated, unavoidable physical effects (e. g., loading, and leakage and other losses) may prevent precise implementation of an intended mathematical function. Further, there are fundamental physical limitations to resolution (e. g., quantum effects, diffraction). Noise is inevitable, both intrinsic (e. g., thermal noise) and extrinsic (e. g., ambient radiation). Changes in ambient physical conditions, such as temperature, can affect the physical processes and decrease precision. At slower time scales, materials and components age and their physical characteristics change. In addition, there are always technical and economic limits to the control of components, materials, and processes in analog device fabrication.

The precision of analog and digital computing devices depends on very different factors. The precision of a (binary) digital device depends on the number of bits, which influences the amount of hardware, but not its quality. For example, a 64-bit adder is about twice the size of a 32-bit adder, but can be made out of the same components. At worst, the size of a digital device might increase with the square of the number of bits of precision. This is because binary digital devices only need to represent two states, and therefore they can operate in saturation. The fabrication standards sufficient for the first bit of precision are also sufficient for the 64th bit. Analog devices, in contrast, need to be able to represent a continuum of states precisely.
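A little arithmetic makes the asymmetry concrete: matching a given analog precision digitally requires only logarithmically many bits. A small sketch (the precision figures are illustrative):

```python
import math

def bits_for_precision(frac):
    # Bits needed so that one least-significant step of a
    # full-scale binary representation is no larger than
    # `frac`, the precision as a fraction of full scale.
    return math.ceil(math.log2(1.0 / frac))

print(bits_for_precision(1e-4))   # 14 bits match 0.01% of full scale
print(bits_for_precision(1e-5))   # a tenfold improvement costs only 3 more bits
```

Each added bit doubles the representable resolution at roughly constant per-bit hardware cost, whereas doubling analog resolution requires tightening every component tolerance.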
Therefore, the fabrication of high-precision analog devices is much more expensive than that of low-precision devices, since the quality of components, materials, and processes must be much more carefully controlled. Doubling the precision of an analog device may be expensive, whereas the cost of each additional bit of digital precision is incremental; that is, the cost is proportional to the logarithm of the precision expressed as a fraction of full range. The foregoing considerations might seem to be a convincing argument for the superiority of digital to analog technology, and indeed they were an important factor in the competition between analog and digital computers in the middle of the twentieth century [pp. 257–261 in 82]. However, as was argued at that time, many computer applications do not require high precision. Indeed, in many engineering applications, the input data are known to only a few digits, and the equations may be approximate or derived from experiments. In these cases the very high precision of digital computation is unnecessary and may in fact be misleading (e. g., if one displays all 14 digits of a result that is accurate to only three). Furthermore, many applications in image processing and control do not require high precision. More recently, research in artificial neural networks (ANNs) has shown that low-precision analog computation is sufficient for almost all ANN applications. Indeed, neural information processing in the brain seems to operate with very low precision (perhaps less than 10% [p. 378 in 50]), for which it compensates with massive parallelism. For example, by coarse coding a population of low-precision devices can represent information with relatively high precision [pp. 91–96 in 74,75].

Scaling

An important aspect of analog computing is scaling, which is used to adjust a problem to an analog computer. First is time scaling, which adjusts a problem to the characteristic time scale at which a computer operates, a consequence of its design and the physical processes by which it is realized [pp. 37–44 in 62, pp. 262–263 in 70, pp. 241–243 in 89]. For example, we might want a simulation to proceed on a very different time scale from the primary system. Thus a weather or economic simulation should proceed faster than real time in order to get useful predictions. Conversely, we might want to slow down a simulation of protein folding so that we can observe the stages in the process. Also, for accurate results it is necessary to avoid exceeding the maximum response rate of the analog devices, which might dictate a slower simulation speed. On the other hand, too slow a computation might be inaccurate as a consequence of instability (e. g., drift and leakage in the integrators). Time scaling affects only time-dependent operations such as integration. For example, suppose t, time in the primary system or “problem time”, is related to τ, time in the computer, by τ = βt.
Therefore, an integration u(t) = ∫₀ᵗ v(t′) dt′ in the primary system is replaced by the integration u(τ) = β⁻¹ ∫₀^τ v(τ′) dτ′ on the computer. Thus time scaling may be accomplished simply by decreasing the input gain to the integrator by a factor of β. Fundamental to analog computation is the representation of a continuous quantity in the primary system by a continuous quantity in the computer. For example, a displacement x in meters might be represented by a potential V in volts. The two are related by an amplitude or magnitude scale factor, V = αx (with α in volts/meter), chosen to meet two criteria [pp. 103–106 in 2, Chap. 4 in 62, pp. 127–128 in 70, pp. 233–240 in 89]. On the one hand, α must be sufficiently small so that the range of the problem variable is accommodated within the range of values
supported by the computing device. Exceeding the device’s intended operating range may lead to inaccurate results (e. g., forcing a linear device into nonlinear behavior). On the other hand, the scale factor should not be too small, or relevant variation in the problem variable will be less than the resolution of the device, also leading to inaccuracy. (Recall that precision is specified as a fraction of full-range variation.) In addition to the explicit variables of the primary system, there are implicit variables, such as the time derivatives of the explicit variables, and scale factors must be chosen for them too. For example, in addition to displacement x, a problem might include velocity ẋ and acceleration ẍ. Therefore, scale factors α, α′, and α″ must be chosen so that αx, α′ẋ, and α″ẍ have an appropriate range of variation (neither too large nor too small). Once a scale factor has been chosen, the primary system equations are adjusted to obtain the analog computing equations. For example, if we have scaled u = αx and v = α′ẋ, then the integration x(t) = ∫₀ᵗ ẋ(t′) dt′ would be computed by the scaled equation u(t) = (α/α′) ∫₀ᵗ v(t′) dt′. This is accomplished by simply setting the input gain of the integrator to α/α′. In practice, time scaling and magnitude scaling are not independent [p. 262 in 70]. For example, if the derivatives of a variable can be large, then the variable can change rapidly, and so it may be necessary to slow down the computation to avoid exceeding the high-frequency response of the computer. Conversely, small derivatives might require the computation to be run faster to avoid integrator leakage etc. Appropriate scale factors are determined by considering both the physics and the mathematics of the problem [pp. 40–44 in 62]. That is, first, the physics of the primary system may limit the ranges of the variables and their derivatives.
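The magnitude-scaling rule above can be checked with a short numeric sketch (an Euler-integration simulation, not an analog circuit; the scale factors and the signal are illustrative):

```python
# Sketch: magnitude scaling of an integration, u(t) = (alpha/alpha') * integral of v.
# Problem: x(t) = integral of xdot, with xdot(t) = cos(t), so x(t) = sin(t).
# Machine variables: u = alpha * x (volts), v = alpha_p * xdot (volts).
import math

alpha, alpha_p = 10.0, 5.0     # illustrative scale factors (volts per problem unit)
gain = alpha / alpha_p         # input gain set on the integrator
dt, T = 1e-4, 1.0

u, t = 0.0, 0.0
while t < T:
    v = alpha_p * math.cos(t)  # scaled input signal
    u += gain * v * dt         # scaled integration
    t += dt

x_est = u / alpha              # recover the problem variable from the machine variable
assert abs(x_est - math.sin(T)) < 1e-3
```

The recovered x(T) agrees with sin(1) to within the Euler step error, confirming that setting the integrator gain to α/α′ keeps the machine variable u equal to αx throughout.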
Second, analysis of the mathematical equations describing the system can give additional information on the ranges of the variables. For example, in some cases the natural frequency of a system can be estimated from the coefficients of the differential equations; the maximum of the nth derivative is then estimated as the nth power of this frequency [p. 42 in 62, pp. 238–240 in 89]. In any case, it is not necessary to have accurate values for the ranges; rough estimates giving orders of magnitude are adequate. It is tempting to think of magnitude scaling as a problem unique to analog computing, but before the invention of floating-point numbers it was also necessary in digital computer programming. In any case it is an essential
aspect of analog computing, in which physical processes are more directly used for computation than they are in digital computing. Although the necessity of scaling has been a source of criticism, advocates for analog computing have argued that it is a blessing in disguise, because it leads to improved understanding of the primary system, which was often the goal of the computation in the first place [5, Chap. 8 in 82]. Practitioners of analog computing are more likely to have an intuitive understanding of both the primary system and its mathematical description (see Sect. “Analog Thinking”).

Analog Computation in Nature

Computational processes—that is to say, information processing and control—occur in many living systems, most obviously in nervous systems, but also in the self-organized behavior of groups of organisms. In most cases natural computation is analog, either because it makes use of continuous natural processes, or because it makes use of discrete but stochastic processes. Several examples will be considered briefly.

Neural Computation

In the past neurons were thought of as binary computing devices, something like digital logic gates. This was a consequence of the “all or nothing” response of a neuron, which refers to the fact that it does or does not generate an action potential (voltage spike) depending, respectively, on whether its total input exceeds a threshold or not (more accurately, it generates an action potential if the membrane depolarization at the axon hillock exceeds the threshold and the neuron is not in its refractory period). Certainly some neurons (e. g., so-called “command neurons”) do act something like logic gates. However, most neurons are better analyzed as analog devices, because the rate of impulse generation represents significant information.
In particular, an amplitude code, the membrane potential near the axon hillock (which is a summation of the electrical influences on the neuron), is translated into a rate code for more reliable long-distance transmission along the axons. Nevertheless, the code is low precision (about one digit), since information theory shows that it takes at least N milliseconds (and probably more like 5N ms) to discriminate N values [39]. The rate code is translated back to an amplitude code by the synapses, since successive impulses release neurotransmitter from the axon terminal, which diffuses across the synaptic cleft to receptors. Thus a synapse acts as a leaky integrator to time-average the impulses. As previously discussed (Sect. “Continuous State Space”), many artificial neural net models have real-valued
neural activities, which correspond to rate-encoded axonal signals of biological neurons. On the other hand, these models typically treat the input connections as simple real-valued weights, which ignores the analog signal processing that takes place in the dendritic trees of biological neurons. The dendritic trees of many neurons are complex structures, which often have thousands of synaptic inputs. The binding of neurotransmitters to receptors causes minute voltage fluctuations, which propagate along the membrane, and ultimately cause voltage fluctuations at the axon hillock, which influence the impulse rate. Since the dendrites have both resistance and capacitance, to a first approximation the signal propagation is described by the “cable equations”, which describe passive signal propagation in cables of specified diameter, capacitance, and resistance [Chap. 1 in 1]. Therefore, to a first approximation, a neuron’s dendritic net operates as an adaptive linear analog filter with thousands of inputs, and so it is capable of quite complex signal processing. More accurately, however, it must be treated as a nonlinear analog filter, since voltage-gated ion channels introduce nonlinear effects. The extent of analog signal processing in dendritic trees is still poorly understood. In most cases, then, neural information processing is treated best as low-precision analog computation. Although individual neurons have quite broadly tuned responses, accuracy in perception and sensorimotor control is achieved through coarse coding, as already discussed (Sect. “Characteristics of Analog Computation”). Further, one widely used neural representation is the cortical map, in which neurons are systematically arranged in accord with one or more dimensions of their stimulus space, so that stimuli are represented by patterns of activity over the map.
(Examples are tonotopic maps, in which pitch is mapped to cortical location, and retinotopic maps, in which cortical location represents retinal location.) Since neural density in the cortex is at least 146 000 neurons per square millimeter [p. 51 in 14], even relatively small cortical maps can be treated as fields and information processing in them as analog field computation. Overall, the brain demonstrates what can be accomplished by massively parallel analog computation, even if the individual devices are comparatively slow and of low precision.

Adaptive Self-Organization in Social Insects

Another example of analog computation in nature is provided by the self-organizing behavior of social insects, microorganisms, and other populations [13]. Often such organisms respond to concentrations, or gradients in the concentrations, of chemicals produced by other members
of the population. These chemicals may be deposited and diffuse through the environment. In other cases, insects and other organisms communicate by contact, but may maintain estimates of the relative proportions of different kinds of contacts. Because the quantities are effectively continuous, all these are examples of analog control and computation. Self-organizing populations provide many informative examples of the use of natural processes for analog information processing and control. For example, diffusion of pheromones is a common means of self-organization in insect colonies, facilitating the creation of paths to resources, the construction of nests, and many other functions [13]. Real diffusion (as opposed to sequential simulations of it) executes, in effect, a massively parallel search of paths from the chemical’s source to its recipients and allows the identification of near-optimal paths. Furthermore, if the chemical degrades, as is generally the case, then the system will be adaptive, in effect continually searching out the shortest paths, so long as the source continues to function [13]. Simulated diffusion has been applied to robot path planning [32,69].

Genetic Circuits

Another example of natural analog computing is provided by the genetic regulatory networks that control the behavior of cells, in multicellular organisms as well as single-celled ones [16]. These networks are defined by the mutually interdependent regulatory genes, promoters, and repressors that control the internal and external behavior of a cell. The interdependencies are mediated by proteins, the synthesis of which is governed by genes, and which in turn regulate the synthesis of other gene products (or themselves). Since it is the quantities of these substances that are relevant, many of the regulatory motifs can be described in computational terms as adders, subtracters, integrators, etc. Thus the genetic regulatory network implements an analog control system for the cell [68].
It might be argued that the number of intracellular molecules of a particular protein is a (relatively small) discrete number, and therefore that it is inaccurate to treat it as a continuous quantity. However, the molecular processes in the cell are stochastic, and so the relevant quantity is the probability that a regulatory protein will bind to a regulatory site. Further, the processes take place in continuous real time, and so the rates are generally the significant quantities. Finally, although in some cases gene activity is either on or off (more accurately: very low), in other cases it varies continuously between these extremes [pp. 388–390 in 29].
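The reading of regulatory motifs as analog computational elements can be sketched with a minimal simulation: a repressed gene behaves as a leaky integrator whose input saturates according to a Hill function (all parameter values below are illustrative, not taken from any particular organism):

```python
# Sketch: a repressed gene as an analog leaky integrator.
# Protein level p integrates synthesis (a Hill function of a repressor
# concentration r) minus first-order degradation.
def dp_dt(p, r, beta=1.0, K=0.5, n=2, gamma=0.2):
    synthesis = beta / (1.0 + (r / K) ** n)   # repression: high r -> low synthesis
    return synthesis - gamma * p              # degradation acts as the "leak"

# Euler integration to steady state with a constant repressor level
p, r, dt = 0.0, 0.25, 0.01
for _ in range(20000):
    p += dp_dt(p, r) * dt

# Steady state satisfies gamma * p = beta / (1 + (r/K)^n):
# r/K = 0.5, so synthesis = 1/1.25 = 0.8 and p* = 0.8/0.2 = 4.0
assert abs(p - 4.0) < 1e-3
```

The steady-state level varies continuously with the repressor concentration, which is the sense in which such a motif computes an analog, rather than a logical, function of its inputs.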

Embryological development combines the analog control of individual cells with the sort of self-organization of populations seen in social insects and other colonial organisms. Locomotion of the cells and the expression of specific genes are controlled by chemical signals, among other mechanisms [16,17]. Thus PDEs have proved useful in explaining some aspects of development; for example, reaction-diffusion equations have been used to describe the formation of hair-coat patterns and related phenomena [13,48,57]; see  Reaction-Diffusion Computing. Therefore the developmental process is governed by naturally occurring analog computation.

Is Everything a Computer?

It might seem that any continuous physical process could be viewed as analog computation, which would make the term almost meaningless. As the question has been put, is it meaningful (or useful) to say that the solar system is computing Kepler’s laws? In fact, it is possible and worthwhile to make a distinction between computation and other physical processes that happen to be described by mathematical laws [40,41,44,46]. If we recall the original meaning of analog computation (Sect. “Definition of the Subject”), we see that the computational system is used to solve some mathematical problem with respect to a primary system. What makes this possible is that the computational system and the primary system have the same, or systematically related, abstract (mathematical) structures. Thus the computational system can inform us about the primary system, or be used to control it, etc. Although from a practical standpoint some analogs are better than others, in principle any physical system can be used that obeys the same equations as the primary system. Based on these considerations we may define computation as a physical process the purpose of which is the abstract manipulation of abstract objects (i. e., information processing); this definition applies to analog, digital, and hybrid computation [40,41,44,46].
Therefore, to determine if a natural system is computational we need to look to its purpose or function within the context of the living system of which it is a part. One test of whether its function is the abstract manipulation of abstract objects is to ask whether it could still fulfill its function if realized by different physical processes, a property called multiple realizability. (Similarly, in artificial systems, a simulation of the economy might be realized equally accurately by a hydraulic analog computer or an electronic analog computer [5].) By this standard, the majority of the nervous system is purely computational; in principle it could be
replaced by electronic devices obeying the same differential equations. In the other cases we have considered (self-organization of living populations, genetic circuits) there are instances of both pure computation and computation mixed with other functions (for example, where the specific substances used have other – e. g. metabolic – roles in the living system).

General-Purpose Analog Computation

The Importance of General-Purpose Computers

Although special-purpose analog and digital computers have been developed, and continue to be developed, for many purposes, the importance of general-purpose computers, which can be adapted easily for a wide variety of purposes, has been recognized since at least the nineteenth century. Babbage’s plans for a general-purpose digital computer, his analytical engine (1835), are well known, but a general-purpose differential analyzer was advocated by Kelvin [84]. Practical general-purpose analog and digital computers were first developed at about the same time: from the early 1930s through the war years. General-purpose computers of both kinds permit the prototyping of special-purpose computers and, more importantly, permit the flexible reuse of computer hardware for different or evolving purposes. The concept of a general-purpose computer is useful also for determining the limits of a computing paradigm. If one can design—theoretically or practically—a universal computer, that is, a general-purpose computer capable of simulating any computer in a relevant class, then anything uncomputable by the universal computer will also be uncomputable by any computer in that class. This is, of course, the approach used to show that certain functions are uncomputable by any Turing machine because they are uncomputable by a universal Turing machine. For the same reason, the concept of general-purpose analog computers, and in particular of universal analog computers, is theoretically important for establishing limits to analog computation.
General-Purpose Electronic Analog Computers

Before taking up these theoretical issues, it is worth recalling that a typical electronic GPAC would include linear elements, such as adders, subtracters, constant multipliers, integrators, and differentiators; nonlinear elements, such as variable multipliers and function generators; and other computational elements, such as comparators, noise generators, and delay elements (Sect. “Electronic Analog Computation in the 20th Century”). These are, of
course, in addition to input/output devices, which would not affect its computational abilities.

Shannon’s Analysis

Claude Shannon did an important analysis of the computational capabilities of the differential analyzer, which applies to many GPACs [76,77]. He considered an abstract differential analyzer equipped with an unlimited number of integrators, adders, constant multipliers, and function generators (for functions with only a finite number of finite discontinuities), with at most one source of drive (which limits possible interconnections between units). This was based on prior work that had shown that almost all the generally used elementary functions could be generated with addition and integration. We will summarize informally a few of Shannon’s results; for details, please consult the original paper. First Shannon offers proofs that, by setting up the correct ODEs, a GPAC with the mentioned facilities can generate a function if and only if it is not hypertranscendental (Theorem II); thus the GPAC can generate any algebraic function and a very large class of transcendental functions, but not, for example, Euler’s gamma function and Riemann’s zeta function. He also shows that the GPAC can generate functions derived from generable functions, such as the integrals, derivatives, inverses, and compositions of generable functions (Thms. III, IV). These results can be generalized to functions of any number of variables, and to their compositions, partial derivatives, and inverses with respect to any one variable (Thms. VI, VII, IX, X). Next Shannon shows that a function of any number of variables that is continuous over a closed region of space can be approximated arbitrarily closely over that region with a finite number of adders and integrators (Thms. V, VIII). Shannon then turns from the generation of functions to the solution of ODEs and shows that the GPAC can solve any system of ODEs defined in terms of non-hypertranscendental functions (Thm. XI).
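Shannon-style generation of a function by setting up an ODE can be illustrated digitally: two coupled integrators realize y′ = z, z′ = −y, whose solution with y(0) = 0, z(0) = 1 is y = sin t, z = cos t (a discrete-time sketch of the integrator wiring, not an actual GPAC):

```python
# Sketch: generating sin(t) "GPAC-style" with two coupled integrators.
# y' = z and z' = -y, with y(0) = 0, z(0) = 1, gives y = sin(t), z = cos(t).
import math

dt, t_end = 1e-5, 1.0
y, z, t = 0.0, 1.0, 0.0
while t < t_end:
    y, z = y + z * dt, z - y * dt   # each integrator feeds the other
    t += dt

assert abs(y - math.sin(1.0)) < 1e-3
assert abs(z - math.cos(1.0)) < 1e-3
```

No table of sine values is stored anywhere: the function emerges from the interconnection of the integrators, which is exactly the sense in which a GPAC "generates" a function.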
Finally, Shannon addresses a question that might seem of limited interest, but turns out to be relevant to the computational power of analog computers (see Sect. “Analog Computation and the Turing Limit” below). To understand it we must recall that he was investigating the differential analyzer—a mechanical analog computer—but similar issues arise in other analog computing technologies. The question is whether it is possible to perform an arbitrary constant multiplication, u = kv, by means of gear ratios. He shows that if we have just two gear ratios a and b (a, b ≠ 0, 1), such that b is not a rational power of a, then
by combinations of these gears we can approximate k arbitrarily closely (Thm. XII). That is, to approximate multiplication by arbitrary real numbers, it is sufficient to be able to multiply by a, b, and their inverses, provided a and b are not related by a rational power. Shannon mentions an alternative method of constant multiplication, which uses integration, kv = ∫₀ᵛ k dv, but this requires setting the integrand to the constant function k. Therefore, multiplying by an arbitrary real number requires the ability to input an arbitrary real as the integrand. The issue of real-valued inputs and outputs to analog computers is relevant both to their theoretical power and to practical matters of their application (see Sect. “Real-valued Inputs, Output, and Constants”). Shannon’s proofs, which were incomplete, were eventually refined by M. Pour-El [63] and finally corrected by L. Lipshitz and L.A. Rubel [35]. Rubel [72] proved that Shannon’s GPAC cannot solve the Dirichlet problem for Laplace’s equation on the disk; indeed, it is limited to initial-value problems for algebraic ODEs. Specifically, the Shannon–Pour-El Thesis is that the outputs of the GPAC are exactly the solutions of the algebraic differential equations, that is, equations of the form

P[x, y(x), y′(x), y″(x), …, y⁽ⁿ⁾(x)] = 0,

where P is a polynomial that is not identically vanishing in any of its variables (these are the differentially algebraic functions) [71]. (For details please consult the cited papers.) The limitations of Shannon’s GPAC motivated Rubel’s definition of the Extended Analog Computer.
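Theorem XII can be illustrated numerically: with gear ratios a = 2 and b = 3 (3 is not a rational power of 2), a search over products a^m b^n finds close approximations to an arbitrary constant k (the search bound and the target k below are illustrative):

```python
# Sketch: approximating multiplication by k with compositions of two gear
# ratios a and b (b not a rational power of a), in the spirit of Thm. XII.
import math

a, b, k = 2.0, 3.0, 5.0
best = (float("inf"), 0, 0)
for m in range(-30, 31):
    for n in range(-30, 31):
        # compare in log space: we want m*log(a) + n*log(b) close to log(k)
        err = abs(m * math.log(a) + n * math.log(b) - math.log(k))
        best = min(best, (err, m, n))

_, m, n = best
approx = a**m * b**n          # e.g. 2**15 / 3**8 = 32768/6561 is within 0.2% of 5
assert abs(approx - k) / k < 0.01
```

Because log a and log b are rationally independent, the values m log a + n log b are dense in the reals, so enlarging the search range makes the approximation arbitrarily good.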
Rubel’s Extended Analog Computer

The combination of Rubel’s [71] conviction that the brain is an analog computer together with the limitations of Shannon’s GPAC led him to propose the Extended Analog Computer (EAC) [73]. Like Shannon’s GPAC (and the Turing machine), the EAC is a conceptual computer intended to facilitate theoretical investigation of the limits of a class of computers. The EAC extends the GPAC in a number of respects. For example, whereas the GPAC solves equations defined over a single variable (time), the EAC can generate functions over any finite number of real variables. Further, whereas the GPAC is restricted to initial-value problems for ODEs, the EAC solves both initial- and boundary-value problems for a variety of PDEs. The EAC is structured into a series of levels, each more powerful than the ones below it, from which it accepts inputs. The inputs to the lowest level are a finite number of real variables (“settings”). At this level it operates on real
polynomials, from which it is able to generate the differentially algebraic functions. The computing on each level is accomplished by conceptual analog devices, which include constant real-number generators, adders, multipliers, differentiators, “substituters” (for function composition), devices for analytic continuation, and inverters, which solve systems of equations defined over functions generated by the lower levels. Most characteristic of the EAC is the “boundary-value-problem box”, which solves systems of PDEs and ODEs subject to boundary conditions and other constraints. The PDEs are defined in terms of functions generated by the lower levels. Such PDE solvers may seem implausible, and so it is important to recall that field-computing devices for this purpose were implemented in some practical analog computers (see Sect. “History”) and more recently in Mills’ EAC [54]. As Rubel observed, PDE solvers could be implemented by physical processes that obey the same PDEs (heat equation, wave equation, etc.). (See also Sect. “Future Directions” below.) Finally, the EAC is required to be “extremely well-posed”, which means that each level is relatively insensitive to perturbations in its inputs; thus “all the outputs depend in a strongly deterministic and stable way on the initial settings of the machine” [73]. Rubel [73] proves that the EAC can compute everything that the GPAC can compute, but also such functions as the gamma and zeta, and that it can solve the Dirichlet problem for Laplace’s equation on the disk, all of which are beyond the GPAC’s capabilities. Further, whereas the GPAC can compute differentially algebraic functions of time, the EAC can compute differentially algebraic functions of any finite number of real variables.
In fact, Rubel did not find any real-analytic function that is not computable on the EAC, but he observes that if the EAC can indeed generate every real-analytic function, it would be too broad to be useful as a model of analog computation.

Analog Computation and the Turing Limit

Introduction

The Church–Turing Thesis asserts that anything that is effectively computable is computable by a Turing machine, but the Turing machine (and equivalent models, such as the lambda calculus) is a model of discrete computation, and so it is natural to wonder how analog computing compares in power, and in particular whether it can compute beyond the “Turing limit”. Superficial answers are easy to obtain, but the issue is subtle because it depends upon choices among definitions, none of which is obviously correct; it involves the foundations of mathematics and its
philosophy, and it raises epistemological issues about the role of models in scientific theories. Nevertheless this is an active research area, although many of the results are apparently inconsistent due to the differing assumptions on which they are based. Therefore this section will be limited to a mention of a few of the interesting results, without attempting a comprehensive, systematic, or detailed survey; Siegelmann [78] can serve as an introduction to the literature.

A Sampling of Theoretical Results

Continuous-Time Models

P. Orponen’s [59] 1997 survey of continuous-time computation theory is a good introduction to the literature as of that time; here we give a sample of these and more recent results. There are several results showing that—under various assumptions—analog computers have at least the power of Turing machines (TMs). For example, M.S. Branicky [11] showed that a TM could be simulated by ODEs, but he used non-differentiable functions; O. Bournez et al. [8] provide an alternative construction using only analytic functions. They also prove that GPAC computability coincides with (Turing-)computable analysis, which is surprising, since the gamma function is Turing-computable but, as we have seen, the GPAC cannot generate it. The paradox is resolved by a distinction between generating a function and computing it, with the latter, broader notion permitting convergent computation of the function (that is, as t → ∞). However, the computational power of general ODEs has not been determined in general [p. 149 in 78]. M.B. Pour-El and I. Richards exhibit a Turing-computable ODE that does not have a Turing-computable solution [64,66]. M. Stannett [83] also defined a continuous-time analog computer that could solve the halting problem. C. Moore [55] defines a class of continuous-time recursive functions over the reals, which includes a zero-finding operator μ.
Functions can be classified into a hierarchy depending on the number of uses of μ, with the lowest level (no μs) corresponding approximately to Shannon’s GPAC. Higher levels can compute non-Turing-computable functions, such as the decision procedure for the halting problem, but he questions whether this result is relevant in the physical world, which is constrained by “noise, quantum effects, finite accuracy, and limited resources”. O. Bournez and M. Cosnard [9] have extended these results and shown that many dynamical systems have super-Turing power. S. Omohundro [58] showed that a system of ten coupled nonlinear PDEs could simulate an arbitrary cellular automaton (see  Mathematical Basis of Cellular Automata, Introduction to), which implies that PDEs have at least Turing power. Further, D. Wolpert and B.J. MacLennan [90,91] showed that any TM can be simulated by a field computer with linear dynamics, but the construction uses Dirac delta functions. Pour-El and Richards exhibit a wave equation in three-dimensional space with Turing-computable initial conditions, but for which the unique solution is Turing-uncomputable [65,66].

Sequential-Time Models

We will mention a few of the results that have been obtained concerning the power of sequential-time analog computation. Although the BSS model has been investigated extensively, its power has not been completely determined [6,7]. It is known to depend on whether just rational numbers or arbitrary real numbers are allowed in its programs [p. 148 in 78]. A coupled map lattice (CML) is a cellular automaton with real-valued states  Mathematical Basis of Cellular Automata, Introduction to; it is a sequential-time analog computer, which can be considered a discrete-space approximation to a simple sequential-time field computer. P. Orponen and M. Matamala [60] showed that a finite CML can simulate a universal Turing machine. However, since a CML can simulate a BSS program or a recurrent neural network (see Sect. “Recurrent Neural Networks” below), it actually has super-Turing power [p. 149 in 78]. Recurrent neural networks are some of the most important examples of sequential analog computers, and so the following section is devoted to them.

Recurrent Neural Networks

With the renewed interest in neural networks in the mid-1980s, many investigators wondered if recurrent neural nets have super-Turing power. M. Garzon and S. Franklin showed that a sequential-time net with a countable infinity of neurons could exceed Turing power [21,23,24]. Indeed, Siegelmann and E.D. Sontag [80] showed that finite neural nets with real-valued weights have super-Turing power, but W.
Maass and Sontag [36] showed that recurrent nets with Gaussian or similar noise had sub-Turing power, illustrating again the dependence of these results on assumptions about what is a reasonable mathematical idealization of analog computing. For recent results on recurrent neural networks, we will restrict our attention to the work of Siegelmann [78], who addresses the computational power of these networks in terms of the classes of languages they can recognize. Without loss of generality the languages are restricted to sets of binary strings. A string to be tested is fed to the network one bit at a time, along with an input that indicates when the end of the input string has been reached. The network is said to decide whether the string is in the language if it correctly indicates whether it is in the set or not, after some finite number of sequential steps since input began. Siegelmann shows that, if exponential time is allowed for recognition, finite recurrent neural networks with real-valued weights (and saturated-linear activation functions) can compute all languages, and thus they are more powerful than Turing machines. Similarly, stochastic networks with rational weights also have super-Turing power, although less power than the deterministic nets with real weights. (Specifically, they compute P/poly and BPP/log* respectively; see Siegelmann [Chaps. 4, 9 in 78] for details.) She further argues that these neural networks serve as a “standard model” of (sequential) analog computation (comparable to Turing machines in Church–Turing computation), and therefore that the limits and capabilities of these nets apply to sequential analog computation generally. Siegelmann [p. 156 in 78] observes that the super-Turing power of recurrent neural networks is a consequence of their use of non-rational real-valued weights. In effect, a real number can contain an infinite number of bits of information. This raises the question of how the non-rational weights of a network can ever be set, since it is not possible to define a physical quantity with infinite precision. However, although non-rational weights may not be able to be set from outside the network, they can be computed within the network by learning algorithms, which are analog computations. Thus, Siegelmann suggests, the fundamental distinction may be between static computational models, such as the Turing machine and its equivalents, and dynamically evolving computational models, which can tune continuously variable parameters and thereby achieve super-Turing power.
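The observation that a single real number can carry unboundedly many bits can be made concrete with a Cantor-style base-4 encoding, similar in spirit to the encodings used in these constructions (this sketch shows only the encoding and decoding of a finite prefix, not the network dynamics):

```python
# Sketch: storing a bit string in one real-valued "weight" w.
# Each bit b is stored as base-4 digit 2b+1 (1 or 3), so digits stay
# robustly separated and can be recovered by a shift map.
bits = [1, 0, 1, 1, 0, 0, 1, 0]

# encode: w = sum over i of (2*bits[i] + 1) / 4**(i+1)
w = 0.0
for b in reversed(bits):
    w = (w + 2 * b + 1) / 4.0

# decode by iterating the shift map w -> 4w - digit
decoded = []
for _ in bits:
    digit = int(w * 4)                     # digit 1 encodes 0, digit 3 encodes 1
    decoded.append(1 if digit == 3 else 0)
    w = w * 4 - digit

assert decoded == bits
```

An irrational weight would correspond to a non-repeating digit sequence, i.e., an infinite amount of information packed into a single continuous parameter, which is the source of the super-Turing results.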
Dissipative Models
Beyond the issue of the power of analog computing relative to the Turing limit, there are also questions of its relative efficiency. For example, could analog computing solve NP-hard problems in polynomial or even linear time? In traditional computational complexity theory, efficiency issues are addressed in terms of the asymptotic number of computation steps to compute a function as the size of the function's input increases. One way to address corresponding issues in an analog context is by treating an analog computation as a dissipative system, which in this context means a system that decreases some quantity (analogous to energy) so that the system state converges to a point attractor. From this perspective, the initial state of the system incorporates the input to the computation, and the attractor represents its output. On this basis, H. T. Siegelmann, S. Fishman, and A. Ben-Hur have developed a complexity theory for dissipative systems, in both sequential and continuous time, which addresses the rate of convergence in terms of the underlying rates of the system [4,79]. The relation between dissipative complexity classes (e.g., P_d, NP_d) and corresponding classical complexity classes (P, NP) remains unclear [p. 151 in 78].

Real-Valued Inputs, Outputs, and Constants
A common argument, with relevance to the theoretical power of analog computation, is that an input to an analog computer must be determined by setting a dial to a number or by typing a number into a digital-to-analog conversion device, and therefore that the input will be a rational number. The same argument applies to any internal constants in the analog computation. Similarly, it is argued, any output from an analog computer must be measured, and the accuracy of measurement is limited, so that the result will be a rational number. Therefore, it is claimed, real numbers are irrelevant to analog computing, since any practical analog computer computes a function from the rationals to the rationals, and can therefore be simulated by a Turing machine. (See related arguments by Martin Davis [18,19].)

There are a number of interrelated issues here, which may be considered briefly. First, the argument is couched in terms of the input or output of digital representations, and the numbers so represented are necessarily rational (more generally, computable). This seems natural enough when we think of an analog computer as a calculating device, and in fact many historical analog computers were used in this way and had digital inputs and outputs (since this is our most reliable way of recording and reproducing quantities).
However, in many analog control systems, the inputs and outputs are continuous physical quantities that vary continuously in time (also a continuous physical quantity); that is, according to current physical theory, these quantities are real numbers, which vary according to differential equations. It is worth recalling that physical quantities are neither rational nor irrational; they can be so classified only in comparison with each other or with respect to a unit, that is, only if they are measured and digitally represented. Furthermore, physical quantities are neither computable nor uncomputable (in a Church-Turing sense); these terms apply only to discrete representations of these quantities (i. e., to numerals or other digital representations).


Therefore, in accord with ordinary mathematical descriptions of physical processes, analog computations can be treated as having arbitrary real numbers (in some range) as inputs, outputs, or internal states; like other continuous processes, continuous-time analog computations pass through all the reals in some range, including non-Turing-computable reals. Paradoxically, however, these same physical processes can be simulated on digital computers.

The Issue of Simulation by Turing Machines and Digital Computers
Theoretical results about the computational power, relative to Turing machines, of neural networks and other analog models of computation raise difficult issues, some of which are epistemological rather than strictly technical. On the one hand, we have a series of theoretical results proving the super-Turing power of analog computation models of various kinds. On the other hand, we have the obvious fact that neural nets are routinely simulated on ordinary digital computers, which have at most the power of Turing machines. Furthermore, it is reasonable to suppose that any physical process that might be used to realize analog computation—and certainly the known processes—could be simulated on a digital computer, as is done routinely in computational science. This would seem to be incontrovertible proof that analog computation is no more powerful than Turing machines. The crux of the paradox lies, of course, in the non-Turing-computable reals. These numbers are a familiar, accepted, and necessary part of standard mathematics, in which physical theory is formulated, but from the standpoint of Church-Turing (CT) computation they do not exist. This suggests that the paradox is not a contradiction, but reflects a divergence between the goals and assumptions of the two models of computation.
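The routine simulability is easy to make concrete. The sketch below digitally simulates a continuous-time dissipative analog computation of the kind discussed under "Dissipative Models": a gradient flow whose point attractors play the role of outputs. The energy function, step size, and step count are illustrative choices, not taken from the literature cited above.

```python
# Forward-Euler simulation of a dissipative analog computation:
# gradient flow dx/dt = -E'(x) for the energy E(x) = (x^2 - 1)^2 / 4,
# whose point attractors x = +1 and x = -1 serve as the outputs.
def simulate(x0, dt=1e-3, steps=20_000):
    x = x0  # the initial state encodes the input
    for _ in range(steps):
        x -= dt * (x**3 - x)  # Euler step; x is always a rational float
    return x

print(simulate(0.3))   # converges to the attractor at +1
print(simulate(-0.3))  # converges to the attractor at -1
```

Note that every state the program passes through is a rational (floating-point) number; the simulation never touches the non-Turing-computable reals through which, in the mathematical description, the continuous flow passes. That is the paradox in miniature.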
The Problem of Models of Computation
These issues may be put in context by recalling that the Church-Turing (CT) model of computation is in fact a model, and therefore that it has the limitations of all models. A model is a cognitive tool that improves our ability to understand some class of phenomena by preserving relevant characteristics of the phenomena while altering other, irrelevant (or less relevant) characteristics. For example, a scale model alters the size (taken to be irrelevant) while preserving shape and other characteristics. Often a model achieves its purposes by making simplifying or idealizing assumptions, which facilitate analysis or simulation of the system. For example, we may use a linear mathematical model of a physical process that is only approximately linear. For a model to be effective it must preserve characteristics and make simplifying assumptions that are appropriate to the domain of questions it is intended to answer, its frame of relevance [46]. If a model is applied to problems outside of its frame of relevance, then it may give answers that are misleading or incorrect, because they depend more on the simplifying assumptions than on the phenomena being modeled. Therefore we must be especially cautious applying a model outside of its frame of relevance, or even at the limits of its frame, where the simplifying assumptions become progressively less appropriate. The problem is aggravated by the fact that often the frame of relevance is not explicitly defined, but resides in a tacit background of practices and skills within some discipline.

Therefore, to determine the applicability of the CT model of computation to analog computing, we must consider the frame of relevance of the CT model. This is easiest if we recall the domain of issues and questions it was originally developed to address: issues of effective calculability and derivability in formalized mathematics. This frame of relevance determines many of the assumptions of the CT model, for example, that information is represented by finite discrete structures of symbols from a finite alphabet, that information processing proceeds by the application of definite formal rules at discrete instants of time, and that a computational or derivational process must be completed in a finite number of these steps.1 Many of these assumptions are incompatible with analog computing and with the frames of relevance of many models of analog computation.

Relevant Issues for Analog Computation
Analog computation is often used for control. Historically, analog computers were used in control systems and to simulate control systems, but contemporary analog VLSI is also frequently applied in control.
Natural analog computation also frequently serves a control function, for example, sensorimotor control by the nervous system, genetic regulation in cells, and self-organized cooperation in insect colonies. Therefore, control systems provide one frame of relevance for models of analog computation. In this frame of relevance real-time response is a critical issue, which models of analog computation, therefore, ought to be able to address. Thus it is necessary to be able to relate the speed and frequency response of analog computation to the rates of the physical processes by which the computation is realized. Traditional methods of algorithm analysis, which are based on sequential time and asymptotic behavior, are inadequate in this frame of relevance. On the one hand, the constants (time scale factors), which reflect the underlying rate of computation, are absolutely critical (but ignored in asymptotic analysis); on the other hand, in control applications the asymptotic behavior of an algorithm is generally irrelevant, since the inputs are typically fixed in size or of a limited range of sizes.

1 See MacLennan [45,46] for a more detailed discussion of the frame of relevance of the CT model.

The CT model of computation is oriented around the idea that the purpose of a computation is to evaluate a mathematical function. Therefore the basic criterion of adequacy for a computation is correctness, that is, that given a precise representation of an input to the function, it will produce (after finitely many steps) a precise representation of the corresponding output of the function. In the context of natural computation and control, however, other criteria may be equally or even more relevant. For example, robustness is important: how well does the system respond in the presence of noise, uncertainty, imprecision, and error, which are unavoidable in real natural and artificial control systems, and how well does it respond to defects and damage, which arise in many natural and artificial contexts? Since the real world is unpredictable, flexibility is also important: how well does an artificial system respond to inputs for which it was not designed, and how well does a natural system behave in situations outside the range of those to which it is evolutionarily adapted? Therefore, adaptability (through learning and other means) is another important issue in this frame of relevance.2

Transcending Turing Computability
Thus we see that many applications of analog computation raise different questions from those addressed by the CT model of computation; the most useful models of analog computing will have a different frame of relevance.
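The earlier point that constant factors, not asymptotic growth, dominate when input sizes are fixed can be made concrete with a toy cost model; every number in it is invented for illustration.

```python
import math

# Toy cost models for two routines: asymptotic analysis prefers
# cost_fancy, but for the small, fixed input sizes typical of a
# control loop the constant factors decide.
def cost_fancy(n):  # O(n log n), but with a large per-step constant
    return 1000 * n * max(1.0, math.log2(n))

def cost_naive(n):  # O(n^2), with a small per-step constant
    return 2.0 * n * n

for n in (8, 64, 10**6):
    winner = "naive" if cost_naive(n) < cost_fancy(n) else "fancy"
    print(f"n = {n:>7}: {winner} is cheaper")
```

For n = 8 and n = 64 the asymptotically worse routine wins outright; only for very large inputs, which a control loop never sees, does the asymptotic ranking take over.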
In order to address traditional questions such as whether analog computers can compute "beyond the Turing limit", or whether they can solve NP-hard problems in polynomial time, it is necessary to construct models of analog computation within the CT frame of relevance. Unfortunately, constructing such models requires making commitments about many issues (such as the representation of reals and the discretization of time) that may affect the answers to these questions, but are fundamentally unimportant in the frame of relevance of the most useful applications of the concept of analog computation. Therefore, being overly focused on traditional problems in the theory of computation (which was formulated for a different frame of relevance) may distract us from formulating models of analog computation that can address important issues in its own frame of relevance.

2 See MacLennan [45,46] for a more detailed discussion of the frames of relevance of natural computation and control.

Analog Thinking
It will be worthwhile to say a few words about the cognitive implications of analog computing, which are a largely forgotten aspect of the analog vs. digital debates of the late 20th century. For example, it was argued that analog computing provides a deeper intuitive understanding of a system than the alternatives do [5, Chap. 8 in 82]. On the one hand, analog computers afforded a means of understanding analytically intractable systems by means of "dynamic models". By setting up an analog simulation, it was possible to vary the parameters and explore interactively the behavior of a dynamical system that could not be analyzed mathematically. Digital simulations, in contrast, were orders of magnitude slower and did not permit this kind of interactive investigation. (Performance has improved sufficiently in contemporary digital computers so that in many cases digital simulations can be used as dynamic models, sometimes with an interface that mimics an analog computer; see [5].)

Analog computing is also relevant to the cognitive distinction between knowing how (procedural knowledge) and knowing that (factual knowledge) [Chap. 8 in 82]. The latter ("know-that") is more characteristic of scientific culture, which strives for generality and exactness, often by designing experiments that allow phenomena to be studied in isolation, whereas the former ("know-how") is more characteristic of engineering culture; at least it was so through the first half of the twentieth century, before the development of "engineering science" and the widespread use of analytic techniques in engineering education and practice.
Engineers were faced with analytically intractable systems, with inexact measurements, and with empirical relationships (characteristic curves, etc.), all of which made analog computers attractive for solving engineering problems. Furthermore, because analog computing made use of physical phenomena that were mathematically analogous to those in the primary system, the engineer's intuition and understanding of one system could be transferred to the other. Some commentators have mourned the loss of hands-on intuitive understanding attendant on the increasingly scientific orientation of engineering education and the disappearance of the analog computer [5,34,61,67].

I will mention one last cognitive issue relevant to the differences between analog and digital computing. As already discussed in Sect. "Characteristics of Analog Computation", it is generally agreed that it is less expensive to achieve high precision with digital technology than with analog technology. Of course, high precision may not be important, for example when the available data are inexact or in natural computation. Further, some advocates of analog computing argue that high-precision digital results are often misleading [p. 261 in 82]. Precision does not imply accuracy, and the fact that an answer is displayed with 10 digits does not guarantee that it is accurate to 10 digits; in particular, engineering data may be known to only a few significant figures, and the accuracy of digital calculation may be limited by numerical problems. Therefore users of digital computers might fall into the trap of trusting their apparently exact results, while users of modest-precision analog computers were more inclined to healthy skepticism about their computations. Or so it was claimed.

Future Directions
Certainly there are many purposes that are best served by digital technology; indeed there is a tendency nowadays to think that everything is done better digitally. Therefore it will be worthwhile to consider whether analog computation should have a role in future computing technologies. I will argue that the approaching end of Moore's Law [56], which has predicted exponential growth in digital logic densities, will encourage the development of new analog computing technologies. Two avenues present themselves as ways toward greater computing power: faster individual computing elements and greater densities of computing elements. Greater density increases power by facilitating parallel computing, and by enabling greater computing power to be put into smaller packages.
Other things being equal, the fewer the layers of implementation between the computational operations and the physical processes that realize them, that is to say, the more directly the physical processes implement the computations, the more quickly they will be able to proceed. Since most physical processes are continuous (defined by differential equations), analog computation is generally faster than digital. For example, we may compare analog addition, implemented directly by the additive combination of physical quantities, with the sequential process of digital addition. Similarly, other things being equal, the fewer physical devices required to implement a computational element, the greater will be the density of these elements. Therefore, in general, the closer the computational process is to the physical processes that realize it, the fewer devices will be required, and so the continuity of physical law suggests that analog computation has the potential for greater density than digital. For example, four transistors can realize analog addition, whereas many more are required for digital addition. Both considerations argue for an increasing role of analog computation in post-Moore's Law computing.

From this broad perspective, there are many physical phenomena that are potentially usable for future analog computing technologies. We seek phenomena that can be described by well-known and useful mathematical functions (e.g., addition, multiplication, exponential, logarithm, convolution). These descriptions do not need to be exact for the phenomena to be useful in many applications, for which limited range and precision are adequate. Furthermore, in some applications speed is not an important criterion; for example, in some control applications, small size, low power, robustness, etc. may be more important than speed, so long as the computer responds quickly enough to accomplish the control task. Of course there are many other considerations in determining whether given physical phenomena can be used for practical analog computation in a given application [47]. These include stability, controllability, manufacturability, and the ease of interfacing with input and output transducers and other devices. Nevertheless, in the post-Moore's Law world, we will have to be willing to consider all physical phenomena as potential computing technologies, and in many cases we will find that analog computing is the most effective way to utilize them. Natural computation provides many examples of effective analog computation realized by relatively slow, low-precision operations, often through massive parallelism. Therefore, post-Moore's Law computing has much to learn from the natural world.

Bibliography

Primary Literature
1. Anderson JA (1995) An Introduction to Neural Networks. MIT Press, Cambridge
2. Ashley JR (1963) Introduction to Analog Computing. Wiley, New York
3. Aspray W (1993) Edwin L. Harder and the Anacom: Analog computing at Westinghouse. IEEE Ann Hist Comput 15(2):35–52
4. Ben-Hur A, Siegelmann HT, Fishman S (2002) A theory of complexity for continuous time systems. J Complex 18:51–86
5. Bissell CC (2004) A great disappearing act: The electronic analogue computer. In: IEEE Conference on the History of Electronics, Bletchley, June 2004, pp 28–30
6. Blum L, Cucker F, Shub M, Smale S (1998) Complexity and Real Computation. Springer, Berlin
7. Blum L, Shub M, Smale S (1988) On a theory of computation and complexity over the real numbers: NP completeness, recursive functions and universal machines. Bull Am Math Soc 21:1–46
8. Bournez O, Campagnolo ML, Graça DS, Hainry E (2006) The General Purpose Analog Computer and computable analysis are two equivalent paradigms of analog computation. In: Theory and Applications of Models of Computation (TAMC 2006). Lecture Notes in Computer Science, vol 3959. Springer, Berlin, pp 631–643
9. Bournez O, Cosnard M (1996) On the computational power of dynamical systems and hybrid systems. Theor Comput Sci 168(2):417–459
10. Bowles MD (1996) US technological enthusiasm and British technological skepticism in the age of the analog brain. Ann Hist Comput 18(4):5–15
11. Branicky MS (1994) Analog computation with continuous ODEs. In: Proceedings IEEE Workshop on Physics and Computation, Dallas, pp 265–274
12. Brockett RW (1988) Dynamical systems that sort lists, diagonalize matrices and solve linear programming problems. In: Proceedings 27th IEEE Conference on Decision and Control, Austin, December 1988, pp 799–803
13. Camazine S, Deneubourg J-L, Franks NR, Sneyd G, Theraulaz J, Bonabeau E (2001) Self-Organization in Biological Systems. Princeton University Press, New York
14. Changeux J-P (1985) Neuronal Man: The Biology of Mind (trans: Garey LL). Oxford University Press, Oxford
15. Clymer AB (1993) The mechanical analog computers of Hannibal Ford and William Newell. IEEE Ann Hist Comput 15(2):19–34
16. Davidson EH (2006) The Regulatory Genome: Gene Regulatory Networks in Development and Evolution. Academic Press, Amsterdam
17. Davies JA (2005) Mechanisms of Morphogenesis. Elsevier, Amsterdam
18. Davis M (2004) The myth of hypercomputation. In: Teuscher C (ed) Alan Turing: Life and Legacy of a Great Thinker. Springer, Berlin, pp 195–212
19. Davis M (2006) Why there is no such discipline as hypercomputation. Appl Math Comput 178:4–7
20. Fakhraie SM, Smith KC (1997) VLSI-Compatible Implementation for Artificial Neural Networks. Kluwer, Boston
21. Franklin S, Garzon M (1990) Neural computability. In: Omidvar OM (ed) Progress in Neural Networks, vol 1. Ablex, Norwood, pp 127–145
22. Freeth T, Bitsakis Y, Moussas X, Seiradakis JH, Tselikas A, Mangou H, Zafeiropoulou M, Hadland R, Bate D, Ramsey A, Allen M, Crawley A, Hockley P, Malzbender T, Gelb D, Ambrisco W, Edmunds MG (2006) Decoding the ancient Greek astronomical calculator known as the Antikythera mechanism. Nature 444:587–591
23. Garzon M, Franklin S (1989) Neural computability II (extended abstract). In: Proceedings, IJCNN International Joint Conference on Neural Networks, vol 1. Institute of Electrical and Electronic Engineers, New York, pp 631–637
24. Garzon M, Franklin S (1990) Computation on graphs. In: Omidvar OM (ed) Progress in Neural Networks, vol 2. Ablex, Norwood
25. Goldstine HH (1972) The Computer from Pascal to von Neumann. Princeton University Press, Princeton
26. Grossberg S (1967) Nonlinear difference-differential equations in prediction and learning theory. Proc Nat Acad Sci USA 58(4):1329–1334
27. Grossberg S (1973) Contour enhancement, short term memory, and constancies in reverberating neural networks. Stud Appl Math LII:213–257
28. Grossberg S (1976) Adaptive pattern classification and universal recoding: I. Parallel development and coding of neural feature detectors. Biol Cybern 23:121–134
29. Hartl DL (1994) Genetics, 3rd edn. Jones & Bartlett, Boston
30. Hopfield JJ (1984) Neurons with graded response have collective computational properties like those of two-state neurons. Proc Nat Acad Sci USA 81:3088–3092
31. Howe RM (1961) Design Fundamentals of Analog Computer Components. Van Nostrand, Princeton
32. Khatib O (1986) Real-time obstacle avoidance for manipulators and mobile robots. Int J Robot Res 5:90–99
33. Kirchhoff G (1845) Ueber den Durchgang eines elektrischen Stromes durch eine Ebene, insbesondere durch eine kreisförmige. Ann Phys Chem 140(64)(4):497–514
34. Lang GF (2000) Analog was not a computer trademark! Why would anyone write about analog computers in year 2000? Sound Vib 34(8):16–24
35. Lipshitz L, Rubel LA (1987) A differentially algebraic replacement theorem. Proc Am Math Soc 99(2):367–372
36. Maass W, Sontag ED (1999) Analog neural nets with Gaussian or other common noise distributions cannot recognize arbitrary regular languages. Neural Comput 11(3):771–782
37. MacLennan BJ (1987) Technology-independent design of neurocomputers: The universal field computer. In: Caudill M, Butler C (eds) Proceedings of the IEEE First International Conference on Neural Networks, vol 3. IEEE Press, pp 39–49
38. MacLennan BJ (1990) Field computation: A theoretical framework for massively parallel analog computation, parts I–IV. Technical Report CS-90-100, Department of Computer Science, University of Tennessee, Knoxville. Available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
39. MacLennan BJ (1991) Gabor representations of spatiotemporal visual images. Technical Report CS-91-144, Department of Computer Science, University of Tennessee, Knoxville. Available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
40. MacLennan BJ (1994) Continuous computation and the emergence of the discrete. In: Pribram KH (ed) Origins: Brain & Self-Organization. Lawrence Erlbaum, Hillsdale, pp 121–151
41. MacLennan BJ (1994) "Words lie in our way". Minds Mach 4(4):421–437
42. MacLennan BJ (1995) Continuous formal systems: A unifying model in language and cognition. In: Proceedings of the IEEE Workshop on Architectures for Semiotic Modeling and Situation Analysis in Large Complex Systems, Monterey, August 1995, pp 161–172. Also available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
43. MacLennan BJ (1999) Field computation in natural and artificial intelligence. Inf Sci 119:73–89
44. MacLennan BJ (2001) Can differential equations compute? Technical Report UT-CS-01-459, Department of Computer Science, University of Tennessee, Knoxville. Available from http://www.cs.utk.edu/~mclennan. Accessed 20 May 2008
45. MacLennan BJ (2003) Transcending Turing computability. Minds Mach 13:3–22
46. MacLennan BJ (2004) Natural computation and non-Turing models of computation. Theor Comput Sci 317:115–145
47. MacLennan BJ (in press) Super-Turing or non-Turing? Extending the concept of computation. Int J Unconv Comput
48. Maini PK, Othmer HG (eds) (2001) Mathematical Models for Biological Pattern Formation. Springer, New York
49. Maziarz EA, Greenwood T (1968) Greek Mathematical Philosophy. Frederick Ungar, New York
50. McClelland JL, Rumelhart DE, the PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 2: Psychological and Biological Models. MIT Press, Cambridge
51. Mead C (1987) Silicon models of neural computation. In: Caudill M, Butler C (eds) Proceedings, IEEE First International Conference on Neural Networks, vol I. IEEE Press, Piscataway, pp 91–106
52. Mead C (1989) Analog VLSI and Neural Systems. Addison-Wesley, Reading
53. Mills JW (1996) The continuous retina: Image processing with a single-sensor artificial neural field network. In: Proceedings IEEE Conference on Neural Networks. IEEE Press, Piscataway
54. Mills JW, Himebaugh B, Kopecky B, Parker M, Shue C, Weilemann C (2006) "Empty space" computes: The evolution of an unconventional supercomputer. In: Proceedings of the 3rd Conference on Computing Frontiers, New York, May 2006. ACM Press, pp 115–126
55. Moore C (1996) Recursion theory on the reals and continuous-time computation. Theor Comput Sci 162:23–44
56. Moore GE (1965) Cramming more components onto integrated circuits. Electronics 38(8):114–117
57. Murray JD (1977) Lectures on Nonlinear Differential-Equation Models in Biology. Oxford University Press, Oxford
58. Omohundro S (1984) Modeling cellular automata with partial differential equations. Physica D 10:128–134
59. Orponen P (1997) A survey of continuous-time computation theory. In: Advances in Algorithms, Languages, and Complexity. Kluwer, Dordrecht, pp 209–224
60. Orponen P, Matamala M (1996) Universal computation by finite two-dimensional coupled map lattices. In: Proceedings, Physics and Computation 1996. New England Complex Systems Institute, Cambridge, pp 243–247
61. Owens L (1986) Vannevar Bush and the differential analyzer: The text and context of an early computer. Technol Culture 27(1):63–95
62. Peterson GR (1967) Basic Analog Computation. Macmillan, New York
63. Pour-El MB (1974) Abstract computability and its relation to the general purpose analog computer (some connections between logic, differential equations and analog computers). Trans Am Math Soc 199:1–29
64. Pour-El MB, Richards I (1979) A computable ordinary differential equation which possesses no computable solution. Ann Math Log 17:61–90
65. Pour-El MB, Richards I (1981) The wave equation with computable initial data such that its unique solution is not computable. Adv Math 39:215–239
66. Pour-El MB, Richards I (1982) Noncomputability in models of physical phenomena. Int J Theor Phys 21:553–555
67. Puchta S (1996) On the role of mathematics and mathematical knowledge in the invention of Vannevar Bush's early analog computers. IEEE Ann Hist Comput 18(4):49–59
68. Reiner JM (1968) The Organism as an Adaptive Control System. Prentice-Hall, Englewood Cliffs
69. Rimon E, Koditschek DE (1989) The construction of analytic diffeomorphisms for exact robot navigation on star worlds. In: Proceedings of the 1989 IEEE International Conference on Robotics and Automation, Scottsdale. IEEE Press, New York, pp 21–26
70. Rogers AE, Connolly TW (1960) Analog Computation in Engineering Design. McGraw-Hill, New York
71. Rubel LA (1985) The brain as an analog computer. J Theor Neurobiol 4:73–81
72. Rubel LA (1988) Some mathematical limitations of the general-purpose analog computer. Adv Appl Math 9:22–34
73. Rubel LA (1993) The extended analog computer. Adv Appl Math 14:39–50
74. Rumelhart DE, McClelland JL, the PDP Research Group (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol 1: Foundations. MIT Press, Cambridge
75. Sanger TD (1996) Probability density estimation for the interpretation of neural population codes. J Neurophysiol 76:2790–2793
76. Shannon CE (1941) Mathematical theory of the differential analyzer. J Math Phys Mass Inst Technol 20:337–354
77. Shannon CE (1993) Mathematical theory of the differential analyzer. In: Sloane NJA, Wyner AD (eds) Claude Elwood Shannon: Collected Papers. IEEE Press, New York, pp 496–513
78. Siegelmann HT (1999) Neural Networks and Analog Computation: Beyond the Turing Limit. Birkhäuser, Boston
79. Siegelmann HT, Ben-Hur A, Fishman S (1999) Computational complexity for continuous time dynamics. Phys Rev Lett 83(7):1463–1466
80. Siegelmann HT, Sontag ED (1994) Analog computation via neural networks. Theor Comput Sci 131:331–360
81. Small JS (1993) General-purpose electronic analog computing. IEEE Ann Hist Comput 15(2):8–18
82. Small JS (2001) The Analogue Alternative: The Electronic Analogue Computer in Britain and the USA, 1930–1975. Routledge, London & New York
83. Stannett M (1990) X-machines and the halting problem: Building a super-Turing machine. Form Asp Comput 2:331–341
84. Thomson W (Lord Kelvin) (1876) Mechanical integration of the general linear differential equation of any order with variable coefficients. Proc Royal Soc 24:271–275
85. Thomson W (Lord Kelvin) (1878) Harmonic analyzer. Proc Royal Soc 27:371–373
86. Thomson W (Lord Kelvin) (1938) The tides. In: The Harvard Classics, vol 30: Scientific Papers. Collier, New York, pp 274–307
87. Truitt TD, Rogers AE (1960) Basics of Analog Computers. John F. Rider, New York
88. van Gelder T (1997) Dynamics and cognition. In: Haugeland J (ed) Mind Design II: Philosophy, Psychology and Artificial Intelligence, revised & enlarged edn. MIT Press, Cambridge, Chap 16, pp 421–450
89. Weyrick RC (1969) Fundamentals of Analog Computers. Prentice-Hall, Englewood Cliffs
90. Wolpert DH (1991) A computationally universal field computer which is purely linear. Technical Report LA-UR-91-2937, Los Alamos National Laboratory, Los Alamos
91. Wolpert DH, MacLennan BJ (1993) A computationally universal field computer that is purely linear. Technical Report CS-93-206, Department of Computer Science, University of Tennessee, Knoxville

Books and Reviews
Fifer S (1961) Analog Computation: Theory, Techniques and Applications, 4 vols. McGraw-Hill, New York
Bissell CC (2004) A great disappearing act: The electronic analogue computer. In: IEEE Conference on the History of Electronics, Bletchley, June 2004
Lipka J (1918) Graphical and Mechanical Computation. Wiley, New York
Mead C (1989) Analog VLSI and Neural Systems. Addison-Wesley, Reading
Siegelmann HT (1999) Neural Networks and Analog Computation: Beyond the Turing Limit. Birkhäuser, Boston
Small JS (2001) The Analogue Alternative: The Electronic Analogue Computer in Britain and the USA, 1930–1975. Routledge, London & New York
Small JS (1993) General-purpose electronic analog computing: 1945–1965. IEEE Ann Hist Comput 15(2):8–18


Artificial Chemistry

Peter Dittrich
Department of Mathematics and Computer Science, Friedrich Schiller University Jena, Jena, Germany

Article Outline

Glossary
Definition of the Subject
Introduction
Basic Building Blocks of an Artificial Chemistry
Structure-to-Function Mapping
Space
Theory
Evolution
Information Processing
Future Directions
Bibliography

Glossary

Molecular species A molecular species is an abstract class denoting an ensemble of identical molecules. Equivalently, the terms "species", "compound", or just "molecule" are used; in some specific contexts also the terms "substrate" or "metabolite".

Molecule A molecule is a concrete instance of a molecular species. Molecules are those entities of an artificial chemistry that react. Note that sometimes the term "molecule" is used equivalently to molecular species.

Reaction network A set of molecular species together with a set of reaction rules. Formally, a reaction network is equivalent to a Petri net. A reaction network describes the stoichiometric structure of a reaction system.

Order of a reaction The order of a reaction is the sum of the exponents of the concentrations in the kinetic rate law (if given). Note that an order can be fractional. If only the stoichiometric coefficients of the reaction rule are given (Eq. (1)), the order is the sum of the coefficients of the left-hand-side species. When assuming mass-action kinetics, both definitions are equivalent.

Autocatalytic set A (self-maintaining) set where each molecule is produced catalytically by molecules from that set. Note that an autocatalytic set may produce molecules not present in that set.

Closure A set of molecules A is closed if no combination of molecules from A can react to form a molecule outside A. Note that the term "closure" has also been

used to denote the catalytic closure of an autocatalytic set.

Self-maintaining A set of molecules is called self-maintaining if it is able to maintain all its constituents. In a purely autocatalytic system under flow condition, this means that every molecule can be catalytically produced by at least one reaction among molecules from the set.

(Chemical) Organization A closed and self-maintaining set of molecules.

Multiset A multiset is like a set, but elements can appear more than once; that is, each element has a multiplicity, e.g., {a, a, b, c, c, c}.

Definition of the Subject

Artificial chemistries are chemical-like systems or abstract models of chemical processes. They are studied in order to illuminate and understand fundamental principles of chemical systems, as well as to exploit the chemical metaphor as a design principle for information processing systems in fields like chemical computing or nonlinear optimization. An artificial chemistry (AC) is usually a formal (and, more rarely, a physical) system that consists of objects called molecules, which interact according to rules called reactions. Compared to conventional chemical models, artificial chemistries are more abstract in the sense that there is usually no one-to-one mapping from the molecules (and reactions) of the artificial chemistry to real molecules (and reactions). An artificial chemistry aims at capturing the logic of chemistry rather than explaining a particular chemical system. More formally, an artificial chemistry can be defined by a set of molecular species M, a set of reaction rules R, and a specification of the dynamics, e.g., kinetic laws, update algorithm, and geometry of the reaction vessel. The scientific motivation for AC research is to build abstract models in order to understand chemical-like systems in all kinds of domains, ranging from physics and chemistry to biology, computer science, and sociology.
Abstract chemical models for studying fundamental principles, such as spatial pattern formation and self-replication, can be traced back to the beginning of modern computer science in the 1940s [91,97]. Since then, a growing number of approaches have tackled questions concerning the origin of complex forms, the origin of life itself [19,81], or cellular diversity [48]. In the same way, a variety of approaches for constructing ACs have appeared, ranging from simple ordinary differential equations [20] to complex individual-based algebraic transformation systems [26,36,51].


The engineering motivation for AC research aims at employing the chemical metaphor as a programming and computing paradigm. Approaches can be distinguished according to whether chemistry serves as a paradigm to program or construct conventional in-silico information processing systems [4], or whether real molecules are used for information processing, as in molecular computing [13] or DNA computing [2] (→ Cellular Computing, → DNA Computing).

Introduction

The term artificial chemistry appeared around 1990 in the field of artificial life [72]. However, models that fall under the definition of artificial chemistry appeared decades earlier, such as von Neumann's universal self-replicating automaton [97]. In the 1950s, a famous abstract chemical system was introduced by Turing [91] in order to show how spatial diffusion can destabilize a chemical system, leading to spatial pattern formation. Turing's artificial chemistries consist of only a handful of chemical species interacting according to a couple of reaction rules, each carefully designed. The dynamics is simulated by a partial differential equation. Turing's model possesses the structure of a conventional reaction-kinetics chemical model (Subsect. "The Chemical Differential Equation"). However, it does not aim at modeling a particular reaction process, but at exploring how, in principle, spatial patterns can be generated by simple chemical processes. Turing's artificial chemistries allow one to study whether a particular mechanism (e.g., destabilization through diffusion) can explain a particular phenomenon (e.g., symmetry breaking and pattern formation). The example illustrates that studying artificial chemistries is a more synthetic approach. That is, understanding should be gained through the synthesis of an artificial system and its observation under various conditions [45,55].
The underlying assumption is that understanding a phenomenon and knowing how this phenomenon can be (re-)created (without copying it) are closely related [96]. Today's artificial chemistries are much more complex than Turing's model. They consist of a huge, sometimes infinite, number of different molecular species, and their reaction rules are defined not manually one by one, but implicitly (Sect. "Structure-to-Function Mapping"), and can evolve (Sect. "Evolution"). Questions that are tackled concern the structure and organization of large chemical networks, the origin of life, prebiotic chemical evolution, and the emergence of chemical information and its processing.

Introductory Example

It is relatively easy to create a complex, infinitely sized reaction network. Assume that the set of possible molecules M consists of strings over an alphabet. Next, we have to define a reaction function describing what happens when two molecules a1, a2 ∈ M collide. A general approach is to map a1 to a machine, operator, or function f = fold(a1) with f : M → M ∪ {elastic}. The mapping fold can be defined in various ways, such as by interpreting a1 as a computer program, a Turing machine, or a lambda expression (Sect. "Structure-to-Function Mapping"). Next, we apply the machine f derived from a1 to the second molecule. If f(a2) = elastic, the molecules collided elastically and nothing happens. This could be the case if the machine f does not halt within a predefined amount of time. Otherwise the molecules react, and a new molecule a3 = f(a2) is produced. How this new molecule changes the reaction vessel depends on the algorithm simulating the dynamics of the vessel.

Algorithm 1 is a simple, widely applied algorithm that simulates second-order catalytic reactions under flow conditions. Because the population size is constant and thus limited, there is competition, and only those molecules that are somehow created will persist. First, the algorithm chooses two molecules randomly, which simulates a collision. Note that a realistic measure of time takes the number of collisions, not the number of reaction events, into account. Then we determine stochastically whether the selected molecules react. If the molecules react, a randomly selected molecule from the population is replaced by the new product, which ensures that the population size stays constant. The replaced molecules constitute the outflow. The inflow is implicit and is generated by the catalytic production of molecules.
From a chemical point of view, the algorithm assumes that a product is built from an energy-rich substrate, which does not appear in the algorithm and which is assumed to be available at constant concentration. The following section (Sect. "Basic Building Blocks of an Artificial Chemistry") describes the basic building blocks of an artificial chemistry in more detail. In Sect. "Structure-to-Function Mapping" we explore various techniques for defining reaction rules. The role of space is briefly discussed in Sect. "Space". Then the important theoretical concepts of an autocatalytic set and a chemical organization are explained (Sect. "Theory"). Sect. "Evolution" shows how artificial chemistries can be used to study (chemical) evolution. This article does not discuss information processing in detail, because there is a series of other specialized articles on that topic, which are summarized in Sect. "Information Processing".


INPUT: P : population (array of k molecules)

randomPosition() := randomInt(0, k-1)
reaction(r1, r2) := (fold(r1))(r2)

while not terminate do
    reactant1 := P[randomPosition()]
    reactant2 := P[randomPosition()]
    product := reaction(reactant1, reactant2)
    if product != elastic
        P[randomPosition()] := product
    fi
    t := t + 1/k    ;; increment simulated time
od

Artificial Chemistry, Algorithm 1
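Algorithm 1 can be turned into runnable code. A minimal Python sketch, assuming a toy reaction function (addition modulo 100, my own choice) in place of the fold-based machine:

```python
import random

ELASTIC = object()  # sentinel for an elastic (non-reactive) collision

def simulate(population, reaction, steps, rng=random):
    """Second-order catalytic flow reactor with constant population size.

    A sketch of Algorithm 1: `reaction` returns ELASTIC or a product;
    the product overwrites a random molecule (the outflow)."""
    k = len(population)
    t = 0.0
    for _ in range(steps):
        r1 = population[rng.randrange(k)]   # a collision: pick two molecules
        r2 = population[rng.randrange(k)]
        product = reaction(r1, r2)
        if product is not ELASTIC:
            population[rng.randrange(k)] = product
        t += 1.0 / k   # time advances per collision, not per reaction event
    return population, t

# Toy chemistry (an assumption, not from the article): addition modulo 100.
pop, t = simulate([1] * 1000, lambda a, b: (a + b) % 100, steps=5000)
```

Note that the population size is invariant by construction; only the composition changes.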

Basic Building Blocks of an Artificial Chemistry

Molecules

The first step in defining an AC is to specify the set of all molecular species that can appear in the AC. The easiest way to specify this set is to enumerate all molecules explicitly as symbols, for example M = {H2, O2, H2O, H2O2} or, equivalently, M = {a, b, c, d}. A symbol is just a name without any structure. Most conventional (bio-)chemical reaction system models are defined this way. Also many artificial chemistries whose networks are designed "by hand" [20], created randomly [85], or evolved by changing the reaction rules [41,42,43] use only symbolic molecules without any structure.

Alternatively, molecules can possess a structure. In that case, the set of molecules is defined implicitly, for example M = {all polymers that can be made from the two monomers a and b}. The set of all possible molecules can then become infinite, or at least quite large. A vast variety of such rule-based molecule definitions can be found in different approaches. For example, structured molecules may be abstract character sequences [3,23,48,64], sequences of instructions [1,39], lambda expressions [26], combinators [83], binary strings [5,16,90], numbers [8], machine code [72], graphs [78], swarms [79], or proofs [28]. We can call a molecule's representation its structure, in contrast to its function, which is given by the reaction rules R. The description of the valid molecules and their structure is usually the first step when an AC is defined. This is analogous to the part of chemistry that describes what kinds of atom configurations form stable molecules and how these molecules appear.

Reaction Rules

The set of reaction rules R describes how molecules from M = {a1, a2, …} interact. A rule ρ ∈ R can be written according to the chemical notation of reaction rules in the form

    l_{a1,ρ} a1 + l_{a2,ρ} a2 + ⋯ → r_{a1,ρ} a1 + r_{a2,ρ} a2 + ⋯ .  (1)

The stoichiometric coefficients l_{a,ρ} and r_{a,ρ} describe the amount of molecular species a ∈ M in reaction ρ ∈ R on the left-hand and right-hand side, respectively. Together, the stoichiometric coefficients define the stoichiometric matrix

    S = (s_{a,ρ}) = (r_{a,ρ} − l_{a,ρ}) .  (2)

An entry s_{a,ρ} of the stoichiometric matrix denotes the net amount of molecules of type a produced in reaction ρ. A reaction rule determines a multiset of molecules (the left-hand side) that can react and subsequently be replaced by the molecules on the right-hand side. Note that the sign "+" is not an operator here, but only separates the components on either side. The set of all molecules M and the set of reaction rules R define the reaction network ⟨M, R⟩ of the AC. The reaction network is equivalent to a Petri net [71]. It can be represented by two matrices, (l_{a,ρ}) and (r_{a,ρ}), made up of the stoichiometric coefficients. Equivalently, we can represent the reaction network as a hyper-graph, or as a directed bipartite graph with two node types denoting reaction rules and molecular species, respectively, and edge labels for the stoichiometry.

A rule is applicable only if certain conditions are fulfilled. The major condition is that all of the left-hand-side components must be available. This condition can be broadened easily to include other parameters such as neighborhood, rate constants, probability for a reaction, influence of modifier species, or energy consumption. In such a case, a reaction rule would also contain additional information or further parameters. Whether or not these additional predicates are taken into consideration depends on the objectives of the artificial chemistry. If it is meant to simulate real chemistry as accurately as possible, it is necessary to integrate these parameters into the simulation system. If the goal is to build an abstract model, many of these parameters can be omitted.

Like the set of molecules, the set of reaction rules can be defined explicitly by enumerating all rules symbolically [20], or implicitly by referring to the structure of the molecules. Implicit definitions of reaction rules use, for example, string matching/string concatenation [3,60,64], the lambda-calculus [26,27], Turing machines [40,90], finite state machines [98], machine code [1,16,78,87], proof theory [27], matrix multiplication [5], swarm dynamics [79], or simple arithmetic operations [8]. Note that in some cases the reactions can emerge as the result of the interaction of many atomic particles, whose dynamics is specified by force fields or rules [79]. Section "Structure-to-Function Mapping" will discuss implicitly defined reactions in more detail.

Dynamics

The third element of an artificial chemistry is the specification of how the reaction rules give rise to a dynamic process of molecules reacting in some kind of reaction vessel. It is assumed that the reaction vessel contains multiple copies of molecules, which can be seen as instances of the molecular species M. This section summarizes how the dynamics of a reaction vessel (which usually contains a huge number of molecules) can be modeled and simulated.
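The reaction-network representation of Subsect. "Reaction Rules" (Eqs. (1) and (2)) can be sketched in code; the species, the two example rules, and all function names below are my own illustration:

```python
# Each rule maps a species to its (left, right) stoichiometric coefficients.
species = ["a", "b", "c"]
rules = [
    {"a": (2, 0), "b": (0, 1)},               # 2a -> b
    {"b": (1, 0), "c": (1, 0), "a": (0, 1)},  # b + c -> a
]

def stoichiometric_matrix(species, rules):
    """S[a][rho] = r_{a,rho} - l_{a,rho}: net production of a in rule rho (Eq. 2)."""
    return [[rule.get(a, (0, 0))[1] - rule.get(a, (0, 0))[0] for rule in rules]
            for a in species]

def applicable(rule, counts):
    """The major applicability condition: every left-hand-side species
    must be present in sufficient number in the vessel."""
    return all(counts.get(a, 0) >= l for a, (l, r) in rule.items())

S = stoichiometric_matrix(species, rules)
```

Here `counts` plays the role of the multiset of molecules in the vessel.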
The approaches can be characterized roughly by whether each molecule is treated explicitly or whether all molecules of one type are represented by a number denoting their frequency or concentration.

The Chemical Differential Equation

A reaction network ⟨M, R⟩ specifies the structure of a reaction system, but does not contain any notion of time. A common way to specify the dynamics of the reaction system is by a system of ordinary differential equations of the form

    ẋ(t) = S v(x(t)) ,  (3)

where x = (x_1, …, x_m)^T ∈ ℝ^m is a concentration vector

depending on time t, S the stoichiometric matrix derived from R (Eq. (2)), and v = (v_1, …, v_r)^T ∈ ℝ^r a flux vector depending on the current concentration vector. A flux v_ρ ≥ 0 describes the velocity or turnover rate of reaction ρ ∈ R. The actual value of v_ρ usually depends on the concentrations of the species participating in the reaction (i.e., LHS(ρ) ≡ {a ∈ M : l_{a,ρ} > 0}) and sometimes on additional modifier species, whose concentration is not influenced by that reaction. There are two important assumptions that are due to the nature of reaction systems. These assumptions relate the kinetic function v to the reaction rules R:

Assumption 1 A reaction ρ can only take place if all species of its left-hand side LHS(ρ) are present. This implies that for all molecules a ∈ M and reactions ρ ∈ R with a ∈ LHS(ρ), if x_a = 0 then v_ρ = 0. The flux v_ρ must be zero if the concentration x_a of a molecule appearing on the left-hand side of this reaction is zero. This assumption reflects the obvious fact that a molecule has to be present to react.

Assumption 2 If all species LHS(ρ) of a reaction ρ ∈ R are present in the reactor (i.e., for all a ∈ LHS(ρ), x_a > 0), the flux of that reaction is positive (i.e., v_ρ > 0). In other words, the flux v_ρ must be positive if all molecules required for that reaction are present, even in small quantities. Note that this assumption implies that a modifier species necessary for a reaction must appear as a species on the left- (and right-) hand side of ρ.

In short, taking Assumptions 1 and 2 together, we demand for a chemical differential equation:

    v_ρ > 0  ⇔  ∀a ∈ LHS(ρ): x_a > 0 .  (4)

Note that Assumption 1 is absolutely necessary. Although Assumption 2 is very reasonable, there might be situations where it does not hold. Assume, for example, a reaction of the form a + a → b. If there is only a single molecule a left in the reaction vessel, the reaction cannot take place. If we want to model this effect, however, an ODE is the wrong approach and we should choose a method like those described in the following sections.

There are a large number of kinetic laws fulfilling these assumptions (Eq. (4)), including all laws that are usually applied in practice. The most fundamental such kinetic law is mass-action kinetics, which is just the product of the concentrations of the interacting molecules (cf. [32]):

    v_ρ(x) = ∏_{a∈M} x_a^{l_{a,ρ}} .  (5)
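Mass-action kinetics (Eq. (5)) and the order of a reaction can be sketched as follows; the dictionary representation of the left-hand side is my own convention:

```python
from functools import reduce

def mass_action_flux(lhs, x):
    """v_rho(x) = prod_a x_a ** l_{a,rho} (Eq. 5).

    `lhs` maps each left-hand-side species a to its exponent l_{a,rho};
    `x` maps species to concentrations."""
    return reduce(lambda acc, a: acc * x[a] ** lhs[a], lhs, 1.0)

def order(lhs):
    """Order of the reaction: the sum of the left-hand-side exponents."""
    return sum(lhs.values())

# Example: a + b -> ... with x_a = 0.5, x_b = 0.2 gives v = 0.5 * 0.2 = 0.1.
v = mass_action_flux({"a": 1, "b": 1}, {"a": 0.5, "b": 0.2})
```

Note that the law satisfies Assumption 1 by construction: if any left-hand-side concentration is zero, the product is zero.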

The sum of the exponents in the kinetic law, Σ_{a∈M} l_{a,ρ}, is called the order of the reaction. Most more complicated laws, like Michaelis–Menten kinetics, are derived from mass-action kinetics. In this sense, mass-action kinetics is the most general approach, allowing one to emulate all more complicated laws derived from it, but at the expense of a larger number of molecular species.

Explicitly Simulated Collisions

The chemical differential equation is a time- and state-continuous approximation of a discrete system in which molecules collide stochastically and react. In the introduction we saw how this can be simulated by a rather simple algorithm, which is widely used in AC research [5,16,26]. The algorithm presented in the introduction simulates a second-order catalytic flow system with constant population size. It is relatively easy to extend this algorithm to include reaction rates and arbitrary orders (Algorithm 2). First, the algorithm chooses a subset from P, which simulates a collision among molecules. Note that the resulting subset could be empty, which can be used to simulate an inflow (i.e., reactions of the form ∅ → A, where the left-hand side is empty). By defining the probability of obtaining a subset of size n, we can control the rate of reactions of order n. If the molecules react, the reactants are removed from the population and the products are added. Note that this algorithm can be interpreted as a stochastic rewriting system operating on a multiset of molecules [70,88]. For large population size k, the dynamics created by the algorithm tends to the continuous dynamics of the chemical differential equation assuming mass-action kinetics.

For the special case of second-order catalytic flow, as described by the algorithm in the introduction, the dynamics can be described by the catalytic network equation [85]

    ẋ_k = Σ_{i,j∈M} α^k_{i,j} x_i x_j − x_k Φ(x) ,  with  Φ(x) = Σ_{i,j,k∈M} α^k_{i,j} x_i x_j ,  (6)

which is a generalization of the replicator equation [35]. If the reaction function is deterministic, we can set the kinetic constants to

    α^k_{i,j} = 1 if reaction(i, j) = k, and α^k_{i,j} = 0 otherwise.  (7)

This dynamic is of particular interest because competition is generated by the limited population size. Note that the limited population size is equivalent to an unlimited population size with a limited substrate, e.g., [3].
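The catalytic network equation (6), with the deterministic constants of Eq. (7) for the toy chemistry reaction(i, j) = (i + j) mod n, can be sketched with a simple Euler integration; the step size and initial condition are my own choices:

```python
# Euler sketch of the catalytic network equation (Eq. 6) with the
# deterministic kinetic constants of Eq. (7) for reaction(i, j) = (i + j) mod n.
n = 4

def alpha(i, j, k):
    return 1.0 if (i + j) % n == k else 0.0

def step(x, dt=0.01):
    # Phi(x) keeps the state on the simplex (total concentration constant).
    phi = sum(alpha(i, j, k) * x[i] * x[j]
              for i in range(n) for j in range(n) for k in range(n))
    return [x[k] + dt * (sum(alpha(i, j, k) * x[i] * x[j]
                             for i in range(n) for j in range(n))
                         - x[k] * phi)
            for k in range(n)]

x = [0.7, 0.1, 0.1, 0.1]   # concentrations, summing to 1
for _ in range(100):
    x = step(x)
```

Because the dilution term x_k Φ(x) exactly balances total production, the sum of concentrations is conserved (up to floating-point error).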

Discrete-Event Simulation

If the copy number of a particular molecular species becomes high, or if the reaction rates differ strongly, it is more efficient to use discrete-event simulation techniques. The most famous technique is Gillespie's algorithm [32]. Roughly, in each step the algorithm generates two random numbers, depending on the current population state, to determine the next reaction to occur as well as the time interval Δt after which it will occur. Then the simulated time is advanced, t := t + Δt, and the molecule count of the population is updated according to the reaction that occurred. The Gillespie algorithm generates a statistically correct trajectory of the chemical master equation. The chemical master equation is the most general way to formulate the stochastic dynamics of a chemical system: a first-order differential equation describing the time evolution of the probability of the reaction system to occupy each one of its possible discrete states. In a well-stirred (non-spatial) AC, a state is equivalent to a multiset of molecules.

Structure-to-Function Mapping

A fundamental "logic" of real chemical systems is that molecules possess a structure and that this structure determines how the molecule interacts with other molecules. Thus, there is a mapping from the structure of a molecule to its function, which determines the reaction rules. In real chemistry the mapping from structure to function is given by natural or physical laws. In artificial chemistries the structure-to-function mapping is usually given by some kind of algorithm. In most cases, an artificial chemistry using a structure-to-function mapping does not aim at modeling real molecular dynamics in detail. Rather, the aim is to capture the fact that there is a structure, a function, and a relation between them. With an AC with a structure-to-function mapping at hand, we can study strongly constructive dynamical systems [27,28].
These are systems where, through the interaction of their components, novel molecular species appear.

while not terminate() do
    reactants := chooseASubsetFrom(P)
    if randomReal(0, 1) < reactionProbability(reactants)
        products := reaction(reactants)
        P := remove(P, reactants)
        P := insert(P, products)
    fi
    t := t + 1/sizeOf(P)    ;; increment simulated time
od

Artificial Chemistry, Algorithm 2

Example: Lambda-Calculus (AlChemy)

This section demonstrates a structure-to-function mapping that employs a concept from computer science: the λ-calculus. The λ-calculus has been used by Fontana [26] and Fontana and Buss [27] to define a constructive artificial chemistry. In the so-called AlChemy, a molecule is a normalized λ-expression. A λ-expression is a word over the alphabet A = {λ, ., (, )} ∪ V, where V = {x_1, x_2, …} is an infinite set of available variable names. The set of λ-expressions Λ is defined for x ∈ V, s_1 ∈ Λ, s_2 ∈ Λ by

    x ∈ Λ         (variable name),
    λx.s_2 ∈ Λ    (abstraction),
    (s_2)s_1 ∈ Λ  (application).

An abstraction λx.s_2 can be interpreted as a function definition, where x is the parameter in the "body" s_2. The expression (s_2)s_1 can be interpreted as the application of s_2 to s_1, which is formalized by the rewriting rule

    (λx.s_2)s_1 = s_2[x ← s_1] ,  (8)

where s_2[x ← s_1] denotes the term generated by replacing every unbound occurrence of x in s_2 by s_1. A variable x is bound if it appears in a form like …(λx. … x …)…. The rewriting rule may not be applied if a variable would become bound by it. Example: let s_1 = λx_1.(x_1)λx_2.x_2 and s_2 = λx_3.x_3; then we can derive

    (s_1)s_2 ⇒ (λx_1.(x_1)λx_2.x_2)λx_3.x_3 ⇒ (λx_3.x_3)λx_2.x_2 ⇒ λx_2.x_2 .  (9)

The simplest way to define a set of second-order catalytic reactions R is to apply one molecule s_1 to another molecule s_2:

    s_1 + s_2 → s_1 + s_2 + normalForm((s_1)s_2) .  (10)

The procedure normalForm reduces its argument to normal form, which is in practice bounded by a maximum of available time and memory; if these resources are exceeded before termination, the collision is considered elastic. It should be noted that the λ-calculus allows an elegant generalization of the collision rule by defining it through a λ-expression Φ ∈ Λ: s_1 + s_2 → s_1 + s_2 + normalForm(((Φ)s_1)s_2). In order to simulate the artificial chemistry, Fontana and Buss [27] used an algorithm like the one described in Subsect. "Explicitly Simulated Collisions",

which simulates second-order catalytic mass-action kinetics in a well-stirred population under flow condition. AlChemy possesses a couple of important general properties: Molecules come in two forms, as passive data possessing a structure and as active machines operating on those structures. This generates a "strange loop" as in typogenetics [36,94], which allows molecules to refer to themselves and to operate on themselves. Structures operate on structures, and by doing so change the way structures operate on structures, and so on, creating a "strange", self-referential dynamical loop. Obviously, there are always some laws that cannot be changed; here these are the rules of the λ-calculus, which define the structure-to-function mapping. Thus, we can interpret these fixed and predefined laws of an artificial chemistry as its natural or physical laws.

Arithmetic and Logic Operations

One of the easiest ways to define reaction rules implicitly is to apply arithmetic operations taken from mathematics. Even the simplest rules can generate interesting behaviors. Assume, for example, that the molecules are natural numbers, M = {0, 1, 2, …, n−1}, and the reaction rules are defined by division: reaction(a_1, a_2) = a_1/a_2 if a_1 is a multiple of a_2; otherwise the molecules do not react. For the dynamics we assume a finite, discrete population and an algorithm like the one described in Subsect. "Explicitly Simulated Collisions". With increasing population size the resulting reaction system displays a phase transition, at which the population is able to produce prime numbers with high probability (see [8] for details). In a typical simulation, the initially high diversity of a random population is reduced, leaving a set of non-reacting prime numbers. A quite different behavior can be obtained by simply replacing the division by addition: reaction(a_1, a_2) = a_1 + a_2 mod n, with n = |M| the number of molecular species. Initializing the well-stirred tank reactor only with


the molecular species 1, the diversity rapidly increases towards its maximum, and after a very short transient phase the reactor behaves apparently chaotically (for a sufficiently large population size). However, there are still regularities, which depend on the prime factors of n (cf. [15]).

Matrix-Multiplication Chemistry

More complex reaction operators can be defined that operate on vectors or strings instead of scalar numbers. The matrix-multiplication chemistry introduced by Banzhaf [5,6,7] uses binary strings as molecules. The reaction between two binary strings is performed by folding one of them into a matrix, which then operates on the other string by multiplication.
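The number-division chemistry of the preceding subsection ("Arithmetic and Logic Operations") can be simulated with the reactor algorithm from the introduction. A sketch; treating self-collisions and division by 1 as elastic is my own convention:

```python
import random

def react(a1, a2):
    """Division chemistry: a1 + a2 -> a1/a2 if a1 is a proper multiple
    of a2; self-collisions and divisor 1 are treated as elastic here."""
    if a2 > 1 and a1 != a2 and a1 % a2 == 0:
        return a1 // a2
    return None  # elastic collision

rng = random.Random(0)
k = 200
pop = [rng.randrange(2, 1000) for _ in range(k)]  # random initial diversity
for _ in range(200_000):
    a1, a2 = pop[rng.randrange(k)], pop[rng.randrange(k)]
    product = react(a1, a2)
    if product is not None:
        pop[rng.randrange(k)] = product          # outflow: replace a molecule
```

Over time, composite numbers are broken down into their divisors, while primes (which react with nothing but their own multiples) accumulate.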

Example of a reaction for 4-bit molecules. Assume a reaction s_1 + s_2 ⇒ s_3. The general approach is:

1. Fold s_1 into a matrix M. Example: for s_1 = (s_11, s_12, s_13, s_14),

       M = ( s_11  s_12
             s_13  s_14 ) .  (11)

2. Multiply M with subsequences of s_2. Example: let s_2 = (s_21, s_22, s_23, s_24) be divided into the two subsequences s_2^12 = (s_21, s_22) and s_2^34 = (s_23, s_24). Then we can multiply M with each subsequence:

       s_3^12 = M ⊙ s_2^12 ,   s_3^34 = M ⊙ s_2^34 .  (12)

3. Compose s_3 by concatenating the products. Example: s_3 = s_3^12 ⊕ s_3^34.

There are various ways of defining the vector-matrix product ⊙. It was mainly used with the following threshold multiplication. Given a bit vector x = (x_1, …, x_n) and a bit matrix M = (M_ij), the term y = M ⊙ x is defined by

       y_j = 0 if Σ_{i=1}^{n} x_i M_{i,j} ≤ Φ, and y_j = 1 otherwise.  (13)

The threshold multiplication is similar to the common matrix-vector product, except that the resulting vector is mapped to a binary vector by using the threshold Φ. Simulating the dynamics by an ODE or by explicitly simulated molecular collisions (Subsect. "Explicitly Simulated Collisions"), we can observe that such a system develops into a steady state where some string species support each other in production and thus become a stable autocatalytic cycle, whereas others disappear due to the competition in the reactor.

The system has a couple of interesting properties: Despite the relatively simple definition of the basic reaction mechanism, the resulting complexity of the reaction system and its dynamics is surprisingly high. As in typogenetics and Fontana's lambda-chemistry, molecules appear in two forms, as passive data (binary strings) and as active operators (binary matrices). The folding of a binary string into a matrix is the central operation of the structure-to-function mapping. The matrix is multiplied with substrings of the operand, and thus some kind of locality is preserved, which mimics the local operation of macromolecules (e.g., ribosomes or restriction enzymes) on polymers (e.g., RNA or DNA). Locality is also conserved in the folding, because bits that are close in the string are also close in the matrix.

Autocatalytic Polymer Chemistries

In order to study the emergence and evolution of autocatalytic sets [19,46,75], Bagley, Farmer, Fontana, Kauffman and others [3,23,47,48,59] have used artificial chemistries where the molecules are character sequences (e.g., M = {a, b, aa, ab, ba, bb, aaa, aab, …}) and the reactions are concatenation and cleavage, for example:

    aa + babb ⇌ aababb   (slow) .  (14)

Additionally, each molecule can act as a catalyst enhancing the rate of a concatenation reaction:

    aa + babb + bbb ⇌ bbb + aababb   (fast) .  (15)

Figure 1 shows an example of an autocatalytic network that appeared. There are two time scales: reactions that are not catalyzed are assumed to be very slow and are not depicted, whereas catalyzed reactions, inflow, and decay are fast. Typical experiments simulate a well-stirred flow reactor using a meta-dynamical ODE framework [3]; that is, the ODE model is dynamically changed when new molecular species appear or present species vanish. Note that the structure of the molecules does not fully determine the reaction rules. Which molecule catalyzes which reaction (the dotted arrows in Fig. 1) is assigned explicitly at random. An interesting aspect is that the catalytic or autocatalytic polymer sets (or reaction networks) evolve without having a genome [47,48]. The simulation studies show how small, spontaneous fluctuations can be amplified by an autocatalytic network, possibly leading to a modification of the entire network [3]. Recent studies by Fernando and Rowe [24] include further aspects like energy conservation and compartmentalization.
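Such a randomly catalyzed condensation network can be sketched as follows; the link-sampling scheme and all names are my own, and cleavage and kinetics are omitted:

```python
import random

rng = random.Random(42)

def catalyst_table(molecules, n_links):
    """Randomly assign (catalyst, (s1, s2)) pairs, mirroring the explicit
    random assignment of catalytic links (the dotted arrows of Fig. 1)."""
    links = set()
    mols = sorted(molecules)
    while len(links) < n_links:
        links.add((rng.choice(mols), (rng.choice(mols), rng.choice(mols))))
    return links

food = {"a", "b"}                                  # monomers with constant inflow
links = catalyst_table(food | {"aa", "ab", "ba", "bb"}, n_links=5)

def catalyzed_products(present, links):
    """One network-expansion step: add every condensation product s1 + s2
    whose catalyst and educts are all currently present."""
    out = set(present)
    for cat, (s1, s2) in links:
        if cat in present and s1 in present and s2 in present:
            out.add(s1 + s2)
    return out

net = catalyzed_products(food | {"aa", "ab", "ba", "bb"}, links)
```

Iterating `catalyzed_products` until a fixed point yields the set of species reachable under fast (catalyzed) reactions, a crude analogue of the meta-dynamical expansion step.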


Artificial Chemistry

Artificial Chemistry, Figure 1: Example of an autocatalytic polymer network. The dotted lines represent catalytic links, e.g., aa + ba + aaa → aaba + aaa. All molecules are subject to a dilution flow. In this example, all polymerization reactions are reversible and there is a continuous inflow of the monomers a and b. Note that the network contains further autocatalytic networks, for example {a, b, aa, ba} and {a, aa}, of which some are closed and thus organizations, for example {a, b, aa, ba}. The set {a, aa} is not closed, because b and ba can be produced.

Bagley and Farmer [3] showed that autocatalytic sets can be silhouetted against a noisy background of spontaneously reacting molecules under moderate (that is, neither too low nor too high) flow.

Artificial Chemistries Inspired by Turing Machines

The concept of an abstract automaton or Turing machine provides a basis for a variety of structure-function mappings. In these approaches, molecules are usually represented as sequences of bits, characters, or instructions [36,51,63,86]. A sequence of bits specifies the behavior of a machine by coding for its state transition function. Thus, as in the matrix-multiplication chemistry and the lambda-chemistry, a molecule appears in two forms, namely as passive data (e.g., a binary string) and as an active machine. Here, too, we can call the mapping from a binary string to its machine "folding"; this mapping may be nondeterministic and may depend on other molecules (e.g., [40]).

In the early 1970s, Laing [51] argued for abstract, non-analogous models in order to develop a general theory for living systems. Developing such a general theory would require so-called artificial organisms. Laing suggested a series of artificial organisms [51,52,53] that should allow one to study general properties of life and thus to derive a theory which is not restricted to the instance of life we observe on Earth. The artificial organisms consist of different compartments, e.g., a "brain" plus "body" parts. These compartments contain binary strings as molecules. Strings are translated into a sequence of instructions forming a three-dimensional shape (cf. [78,86,94]). In order to perform a reaction, two molecules are attached such that they touch at one position (Fig. 2). One of the molecules is executed like a Turing machine, manipulating the passive data molecule as its tape. Laing proved that his artificial organisms are able to perform universal computation. He also demonstrated different forms of self-reproduction, self-description, and self-inspection using his molecular machines [53].

Typogenetics is a similar approach, introduced in 1979 by Hofstadter in order to illustrate the "formal logic of life" [36]. Later, typogenetics was simulated and investigated in more detail [67,94,95]. The molecules of typogenetics are character sequences (called strands) over the alphabet A, C, G, T. The reaction rules are "typographic" manipulations based on a set of predefined basic operations like cutting, insertion, or deletion of characters. A sequence of such operations forms a unit (called an enzyme) which may operate on a character sequence like a Turing machine on its tape. A character string can be "translated" into an enzyme (i.e., a sequence of operations) by mapping two characters to an operation according to a predefined "genetic code".
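The flavor of such typographic enzymes can be sketched as follows; the operation set, argument conventions, and function name here are simplified, hypothetical stand-ins rather than Typogenetics' actual instruction set:

```python
def apply_enzyme(strand, ops):
    """Apply a sequence of typographic operations to a strand.
    Each op is a (name, argument) pair."""
    for name, arg in ops:
        if name == 'cut':          # keep only the prefix before position arg
            strand = strand[:arg]
        elif name == 'insert':     # append a base at the end
            strand = strand + arg
        elif name == 'delete':     # remove the base at position arg
            strand = strand[:arg] + strand[arg + 1:]
    return strand

print(apply_enzyme("ACGT", [('insert', 'A'), ('delete', 0)]))  # -> CGTA
```

Translating a strand into such an operation list via a fixed code closes the loop: strands are both tapes and, after translation, the enzymes that rewrite tapes.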


Another artificial chemistry whose reaction mechanism is inspired by the Turing machine was suggested by McCaskill [63] in order to study the self-organization of complex chemical systems consisting of catalytic polymers. Variants of this AC were later realized on a specially designed parallel reconfigurable computer based on FPGAs (Field Programmable Gate Arrays) [12,18,89]. In this approach, molecules are binary strings of fixed length (e.g., 20 bits [63]) or of variable length [12]. As in previous approaches, a string codes for an automaton able to manipulate other strings. And again, pattern matching is used to check whether two molecules can react and to obtain the "binding site", i.e., the location where the active molecule (machine) manipulates the passive molecule (tape). The general reaction scheme can be written as:

s_M + s_T → s_M + s'_T .   (16)

In experimental studies, self-replicating strings appeared frequently. In coevolution with parasites, an evolutionary arms race started among these species, and the self-replicating strings diversified to such an extent that the parasites could not coadapt and went extinct. In a spatial environment (e.g., a 2-D lattice), sets of cooperating polymers evolved, interacting in a hypercyclic fashion. The authors also observed a chemoton-like [30] cooperative behavior, with spatially isolated, membrane-bounded, evolutionarily stable molecular organizations.

Artificial Chemistry, Figure 2: Illustration of Laing's molecular machines. A program molecule is associated with a data molecule. Figure from [52]

Machines with Fixed Tape Size

In order to perform large-scale systematic simulation studies like the investigation of noise [40] or intrinsic evolution [16], it makes sense to limit the study to molecules of a fixed, tractable length. Ikegami and Hashimoto [40] developed an abstract artificial chemistry with two types of molecular species: 7-bit long tapes, which are mapped to 16-bit long machines. Tapes and machines form two separate populations, simulated in parallel. Reactions take place between a tape and a machine according to the following reaction scheme:

s_M + s_T → s_M + s_T + s'_M + s'_T .   (17)

A machine s_M can react with a tape s_T if its head matches one substring of the tape and its tail matches a different substring of the tape. The machine operates only between these two substrings (called the reading frame), which results in a tape s'_T. The tape s'_T is then "translated" (folded) into a machine s'_M.

Ikegami and Hashimoto [40] showed that under the influence of low noise, simple autocatalytic loops are formed. When the noise level is increased, the reaction network is destabilized by parasites, but after a relatively long transient phase (about 1000 generations) a very stable, dense reaction network appears, called a core network [40]. A core network maintains its relatively high diversity even if the noise is deactivated, and its active mutation rate is high. When the noise level is very high, only small, degenerate core networks emerge, with low diversity and very low (or even no) active mutation. The core networks which emerged under the influence of external noise are very stable, so that there is no further development after they have appeared.

Assembler Automata

An assembler automaton is like a parallel von Neumann machine: it consists of a core memory and a set of processing units running in parallel. Inspired by Core Wars [14], assembler automata have been used to create artificial life systems such as Coreworld [72,73], Tierra [74], Avida [1], and Amoeba [69]. Although these machines have been classified as artificial chemistries [72], it is in general difficult to identify molecules or reactions in them. Furthermore, the assembler automaton Tierra has explicitly


been designed to imitate the Cambrian explosion, not a chemical process. Nevertheless, in some cases we can interpret the execution of an assembler automaton as a chemical process; this is possible in an especially clear way in experiments with Avida. Here, a molecule is a single assembler program, which is protected by a memory management system. The system is initialized with manually written programs that are able to self-replicate. Therefore, in a basic version of Avida, only unimolecular first-order reactions occur, which are of replicator type. The reaction scheme can be written as a → a + mutation(a), where the function mutation represents the possible errors that can occur while the program is self-replicating.

Lattice Molecular Systems

In this section we discuss systems which consist of a regular lattice, where each lattice site can hold a part (e.g., an atom) of a molecule. Between parts, bonds can be formed, so that a molecule covers many lattice sites. This is different from systems where a lattice site holds a complete molecule, as in Avida. The important difference is that in lattice molecular systems, the space of the molecular structure is identical to the space in which the molecules float around. In systems where a molecule covers just one lattice site, the molecular structure is described in a different space, independent of the space in which the molecule is located. Lattice molecular systems have been used intensively to model polymers [92], protein folding [82], and RNA structures. Besides approaches which intend to model real molecular dynamics as accurately as possible, there are also approaches which try to build abstract models. These models should give insight into statistical properties of polymers like their energy landscapes and folding processes [76,77], but are not intended to address questions concerning the origin and evolution of self-maintaining organizations or molecular information processing.
For these questions, more abstract systems are studied, as described in the following. Varela, Maturana, and Uribe introduced in [93] a lattice molecular system to illustrate their concept of autopoiesis (cf. [65,99]). The system consists of a 2-D square lattice. Each lattice site can be occupied by one of the following atoms: substrate S, catalyst K, and monomer L. Atoms may form bonds and thus form molecular structures on the lattice. If molecules come close to each other, they may react according to the following reaction rules:

(1) Composition (formation of a monomer): K + 2S → K + L
(2) Concatenation (formation of a bond between a monomer and another monomer with no more than one bond): ⋯-L-L + L → ⋯-L-L-L and L + L → L-L. This reaction is inhibited by a double-bonded monomer [65].
(3) Disintegration (decomposition of a monomer): L → 2S

Artificial Chemistry, Figure 3: Illustration of a lattice molecular automaton that contains an autopoietic entity. Its membrane is formed by a chain of L monomers and encloses a catalyst K. Only substrate S may diffuse through the membrane. Substrate inside is catalyzed by K to form free monomers. If the membrane is damaged by disintegration of a monomer L, it can be quickly repaired by a free monomer. See [65,93] for details.

Figure 3 illustrates an autopoietic entity that may arise. Note that it is quite difficult to find the right dynamical conditions under which such autopoietic structures are stable [65,68].
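A fragment of such a lattice chemistry can be sketched as follows. Only the composition and disintegration rules are shown; bond formation, membrane repair, diffusion, and semi-permeability are omitted, and the grid representation and update order are illustrative assumptions:

```python
import random

def neighbors(x, y, w, h):
    """4-neighborhood on a torus."""
    return [((x + dx) % w, (y + dy) % h)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

def step(grid, rng, p_disintegrate=0.01):
    """One sweep: a catalyst K turns two neighboring substrates S into
    one monomer L (rule 1, K + 2S -> K + L, one site becomes empty);
    a monomer may decay back to substrate (rule 3, simplified)."""
    h, w = len(grid), len(grid[0])
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 'K':
                subs = [p for p in neighbors(x, y, w, h)
                        if grid[p[1]][p[0]] == 'S']
                if len(subs) >= 2:
                    (x1, y1), (x2, y2) = subs[:2]
                    grid[y1][x1], grid[y2][x2] = 'L', ' '
            elif grid[y][x] == 'L' and rng.random() < p_disintegrate:
                grid[y][x] = 'S'

grid = [list('SSS'), list('SKS'), list('SSS')]
step(grid, random.Random(0), p_disintegrate=0.0)
```

After one sweep, the catalyst has produced a free monomer from two substrate atoms; repeated sweeps with the bonding rule added would let a monomer chain close around K.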

Cellular Automata

Cellular automata (see Mathematical Basis of Cellular Automata and Cellular Automata, Introduction to) can be used as a medium to simulate chemical-like processes in various ways. An obvious approach is to use the cellular automaton to model space, where each cell can hold a molecule (as in Avida) or an atom (as in lattice molecular automata). However, there are approaches where it is not clear at the outset what a molecule or a reaction is. The model specification does not contain any notion of a molecule or reaction, so an observer has to identify them. Molecules can be equated with self-propagating patterns like gliders, with self-reproducing loops [54,80], or with the moving boundary between two homogeneous domains [37]. Note that in the latter case, particles become visible as space-time structures, which are defined by boundaries and not by connected sets of cells with specific states. An interaction between these boundaries is then interpreted as a reaction.

Artificial Chemistry, Figure 4: Example of mechanical artificial chemistries. a Self-assembling magnetic tiles by Hosokawa et al. [38]. b Rotating magnetic discs by Grzybowski et al. [33]. Figures taken (and modified) from [33,38]

Mechanical Artificial Chemistries

There are also physical systems that can be regarded as artificial chemistries. Hosokawa et al. [38] presented a mechanical self-assembly system consisting of triangular shapes that form bonds via permanent magnets. Interpreting attachment and detachment as chemical reactions, Hosokawa et al. [38] derived a chemical differential equation (Subsect. "The Chemical Differential Equation") modeling the kinetics of the structures appearing in the system. Another approach uses millimeter-sized magnetic discs at a liquid-air interface, subject to a magnetic field produced by a rotating permanent magnet [33]. These magnetic discs exhibit various types of structures, which may even interact in a way resembling chemical reactions. The important difference from the first approach is that the rotating magnetic discs form dissipative structures, which require a continuous energy supply.

Semi-Realistic Molecular Structures and Graph Rewriting

Recent developments in artificial chemistry aim at more realistic molecular structures and reactions, like the OO chemistry by Bersini [10] (see also [57]) or toy chemistry [9]. These approaches apply graph rewriting, a powerful and quite universal approach for defining transformations of graph-like structures [50]. In toy chemistry, molecules are represented as labeled graphs, i.e., by their structural formulas; their basic properties are derived by a simplified version of the extended Hückel MO theory that operates directly on the graphs; chemical reaction mechanisms are implemented as graph rewriting rules acting on the structural formulas; and reactivities and selectivities are modeled by a variant of frontier molecular orbital theory based on the extended Hückel scheme. Figure 5 shows an example of a graph-rewriting rule for unimolecular reactions.

Space

Many approaches discussed above assume that the reaction vessel is well-stirred. However, especially in living systems, space plays an important role. Cells, for example, are not a bag of enzymes but are spatially highly structured.


Moreover, it is assumed that space has played an important role in the origin of life.

Artificial Chemistry, Figure 5: An example of a graph-rewriting rule (top) and its application to the synthesis of a bridged ring system (bottom). Figure from [9]

Techniques for Modeling Space

In chemistry, space is usually modeled by assuming a Euclidean space (partial differential equations or particle-based models) or by compartments, which we also find in many artificial chemistries [29,70]. However, some artificial chemistries use more abstract, "computer friendly" spaces, such as a core memory as in Tierra [74] or a planar triangular graph [83]. There are systems like MGS [31] that allow one to specify various topological structures easily. Approaches like P-systems and membrane computing [70] even allow one to change the spatial structure dynamically (see Membrane Computing).

Phenomena in Space

In general, space delays the extinction of species by fitter species, which is, for example, used in Avida to obtain more complex evolutionary patterns [1]. Space usually leads to higher diversity. Moreover, systems that are unstable in a well-stirred reaction vessel can become stable in a spatial setting. An example is the hypercycle [20], which stabilizes against parasites when space is introduced [11]. Conversely, space can destabilize an equilibrium, leading to symmetry breaking, which has been suggested as an important mechanism underlying morphogenesis [91]. Space can support the co-existence of chemical species which would not co-exist in a well-stirred system [49]. Space is also necessary for the formation of autopoietic structures [93] and the formation of units that undergo Darwinian evolution [24].

Theory

There is a large body of theory from domains like chemical reaction system theory [21], Petri net theory [71], and rewriting system theory (e.g., P-systems [70]) which also applies to artificial chemistries. In this section, however, we shall investigate in more detail those theoretical concepts which have emerged from artificial chemistry research. When working with complex artificial chemistries, we are usually more interested in the qualitative than the quantitative nature of the dynamics. That is, we study how the set of molecular species present in the reaction vessel changes over time, rather than studying a more detailed trajectory in concentration space.

The most prominent qualitative concept is the autocatalytic set, which has been proposed as an important element in the origin of life [19,46,75]. An autocatalytic set is a set of molecular species where each species is produced by at least one catalytic reaction within the set [41,47] (Fig. 1). This property has also been called self-maintaining [27] or (catalytic) closure [48]. Formally: a set of species A ⊆ M is called an autocatalytic set (or sometimes catalytically closed set) if for every species¹ a ∈ A there is a catalyst a' ∈ A and a reaction ρ ∈ R such that a' is a catalyst in ρ (i.e., a' ∈ LHS(ρ) and a' ∈ RHS(ρ)) and ρ can take place in A (i.e., LHS(ρ) ⊆ A).

¹ For general reaction systems, this definition has to be refined. When A contains species that are part of the inflow, like a and b in Fig. 1, but which are not produced in a catalytic way, we might still want them to be part of an autocatalytic set. Assume, for example, the set A = {a, b, aa, ba} from Fig. 1, where aa catalyzes the production of aa and ba while using up the "substrates" a and b, which are not catalytically produced.

Example 1 (autocatalytic sets): R = {a → 2a, a → a + b, a → , b → }. Molecule a catalyzes its own production and the production of b. Both molecules are subject to a dilution flow (or, equivalently, spontaneous decay), which is usually assumed in ACs studying autocatalytic sets. In this example there are three autocatalytic sets: two non-trivial autocatalytic sets {a} and {a, b}, and the empty set {}, which is from a mathematical point of view also an autocatalytic set. The term "autocatalytic set" makes sense only in ACs where catalysis is possible; it is not useful when applied to arbitrary reaction networks.

For this reason, Dittrich and Speroni di Fenizio [17] introduced a general notion of self-maintenance, which includes the autocatalytic set as a special case. Formally, given a reaction network ⟨M, R⟩ with m = |M| molecules and r = |R| reactions, let S = (s_{a,ρ}) be the (m × r) stoichiometric matrix implied by the reaction rules R, where s_{a,ρ} denotes the number of molecules of type a produced in reaction ρ. A set of molecules C ⊆ M is called self-maintaining if there exists a flux vector v ∈ R^r such that the following three conditions apply: (1) for all reactions ρ that can take place in C (i.e., LHS(ρ) ⊆ C), the flux v_ρ > 0; (2) for all remaining reactions (i.e., LHS(ρ) ⊄ C), the flux v_ρ = 0; and (3) for all molecules a ∈ C, the production rate (Sv)_a ≥ 0. Here v_ρ denotes the element of v describing the flux (i.e., rate) of reaction ρ, and (Sv)_a is the production rate of molecule a given flux vector v.

In Example 1 there are three self-maintaining (autocatalytic) sets. Interestingly, there are self-maintaining sets that cannot make up a stationary state. In Example 1, only two of the three self-maintaining sets are species combinations that can make up a stationary state (i.e., a state x_0 for which 0 = Sv(x_0) holds). The self-maintaining set {a} cannot make up a stationary state because a generates b through the reaction a → a + b. Thus, there is no stationary state (of Eq. (3) with Assumptions 1 and 2) in which the concentration of a is positive while the concentration of b is zero.

In order to filter out those less interesting self-maintaining sets, Fontana and Buss [27] introduced a concept taken from mathematics called closure. Formally, a set of species A ⊆ M is closed if for all reactions ρ with LHS(ρ) ⊆ A (the reactions that can take place in A), the products are also contained in A, i.e., RHS(ρ) ⊆ A. Closure and self-maintenance lead to the important concept of a chemical organization [17,27]: given an arbitrary reaction network ⟨M, R⟩, a set of molecular species that is both closed and self-maintaining is called an organization. The importance of an organization is illustrated by a theorem roughly saying that, given a fixed point of the chemical ODE (Eq. (3)), the species with positive concentrations form an organization [17].
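These definitions can be checked mechanically on Example 1. The sketch below enumerates all subsets of {a, b} and tests closure and self-maintenance; it uses a uniform flux guess, which happens to suffice for this tiny network, whereas a general self-maintenance test requires linear programming:

```python
from itertools import combinations

species = ['a', 'b']
# Example 1: R = {a -> 2a, a -> a + b, a -> (outflow), b -> (outflow)}
reactions = [({'a': 1}, {'a': 2}),
             ({'a': 1}, {'a': 1, 'b': 1}),
             ({'a': 1}, {}),
             ({'b': 1}, {})]

def net_production(C):
    """(Sv)_s for the uniform flux v_rho = 1 on every applicable reaction."""
    net = {s: 0 for s in species}
    for lhs, rhs in reactions:
        if set(lhs) <= C:
            for s in species:
                net[s] += rhs.get(s, 0) - lhs.get(s, 0)
    return net

def is_self_maintaining(C):
    return all(net_production(C)[s] >= 0 for s in C)

def is_closed(C):
    return all(set(rhs) <= C for lhs, rhs in reactions if set(lhs) <= C)

subsets = [set(c) for n in range(len(species) + 1)
           for c in combinations(species, n)]
orgs = [sorted(C) for C in subsets if is_closed(C) and is_self_maintaining(C)]
print(orgs)  # -> [[], ['a', 'b']]
```

As the theory predicts, {a} is self-maintaining but not closed (it produces b), {b} is closed but not self-maintaining, and only {} and {a, b} are organizations.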
In other words, we only have to check for stationary states those species combinations that are organizations. The set of all organizations can be visualized nicely by a Hasse diagram, which sketches the hierarchical (organizational) structure of the reaction network (Fig. 6). The dynamics of the artificial chemistry can then be explained within this Hasse diagram as a movement from organization to organization [61,84]. Note that in systems under flow condition, the (finite) set of all organizations forms an algebraic lattice [17]. That is, given two organizations, there is always a unique organization union and organization intersection.

Artificial Chemistry, Figure 6: Lattice of organizations of the autocatalytic network shown in Fig. 1. An organization is a closed and self-maintaining set of species. Two organizations are connected by a line if one is contained in the other and there is no organization in between. The vertical position of an organization is determined by the number of species it contains.

Although the concept of autopoiesis [93] has been described informally in quite some detail, a stringent formal definition is lacking. In a way, we can interpret the formal concepts above as an attempt to approach necessary properties of an autopoietic system step by step by precise formal means. Obviously, being (contained in) at least one organization is a necessary condition for a chemical autopoietic system. But it is not sufficient. Missing are a notion of robustness and a spatial concept that formalizes a system's ability to maintain (and self-create) its own identity, e.g., through maintaining a membrane (Fig. 3).

Evolution

With artificial chemistries we can study chemical evolution. Chemical evolution (also called "pre-biotic evolution") describes the first step in the development of life, such as the formation of complex organic molecules from simpler (in-)organic compounds [62]. Because there are no pre-biotic fossils, the study of chemical evolution has to rely on experiments [56,66] and on theoretical (simulation) models [19]. Artificial chemistries aim at capturing the


constructive nature of these chemical systems and try to reproduce their evolution in computer simulations. The approaches can be distinguished by whether the evolution is driven by external operators like mutation, or whether variation and selection appear intrinsically through the chemical dynamics itself.

Extrinsic Evolution

In extrinsic approaches, an external variation operator changes the reaction network by adding, removing, or manipulating a reaction, which may also lead to the addition or removal of chemical species. In this approach, a molecule does not need to possess a structure. The following example by Jain and Krishna [41] shows that this approach allows one to create an evolving system quite elegantly. Let us assume that the reaction network consists of m species M = {1, ..., m}. There are first-order catalytic reaction rules of the form (i → i + j) ∈ R and a general dilution flow (a →) ∈ R for all species a ∈ M. The reaction network is completely described by a directed graph represented by the adjacency matrix C = (c_{i,j}), i, j ∈ M, c_{i,j} ∈ {0, 1}, with c_{i,j} = 1 if molecule j catalyzes the production of i, and c_{i,j} = 0 otherwise. In order to prevent self-replicators from dominating the system, Jain and Krishna assume c_{i,i} = 0 for all molecules i ∈ M. At the beginning, the reaction network is randomly initialized; that is, for i ≠ j, c_{i,j} = 1 with probability p and c_{i,j} = 0 otherwise. In order to simulate the dynamics, we assume a population of molecules represented by the concentration vector x = (x_1, ..., x_m), where x_i represents the current concentration of species i. The whole system is simulated in the following way:

Step 1: Simulate the chemical differential equation

ẋ_i = Σ_{j∈M} c_{i,j} x_j − x_i Σ_{k,j∈M} c_{k,j} x_j   (18)

until a steady state is reached. Note that this steady state is usually independent of the initial concentrations, cf. [85].
Step 2: Select the "mutating" species i, which is the species with the smallest concentration in the steady state.
Step 3: Update the reaction network by replacing the mutating species with a new species, which is created randomly in the same way as the initial species. That is, the entries of the ith row and ith column of the adjacency matrix C are replaced by randomly chosen entries with the same probability p as during initialization.
Step 4: Go to Step 1.

Note that there are two explicitly simulated time scales: a slow, evolutionary time scale at which the reaction network evolves (Steps 2 and 3), and a fast time scale at which the molecules catalytically react (Step 1). In this model we can observe how an autocatalytic set inevitably appears after a period of disorder. After its arrival, the largest autocatalytic set rapidly increases its connectivity until it spans the whole network. Subsequently, the connectivity converges to a steady state determined by the rate of external perturbations. The resulting highly non-random network is not fully stable, so that we can also study the causes of crashes and recoveries. For example, Jain and Krishna [44] identified that in the absence of large external perturbations, the appearance of a new viable species is a major cause of large extinctions and recoveries. Furthermore, crashes can be caused by the extinction of a "keystone species". Note that for these observations, the new species created in Step 3 do not need to inherit any information from other species.

Intrinsic Evolution

Evolutionary phenomena can also be caused by the intrinsic dynamics of the (artificial) chemistry. In this case, external operators like those mutating the reaction network are not required. As opposed to the previous approach, we can define the whole reaction network at the outset and let only the composition of molecules present in the reaction vessel evolve. The reaction rules are (usually) defined by a structure-to-function mapping similar to those described in Sect. "Structure-to-Function Mapping". The advantage of this approach is that we need not define an external operator changing the reaction network's topology. Furthermore, the evolution can be more realistic because, for example, molecular species that have vanished can reenter at a later time, which does not happen in an approach like the one described previously. Also, when we have a structure-to-function mapping, we can study how the structure of the molecules is related to the emergence of chemical organizations.

Artificial Chemistry, Figure 7: Illustration of syntactically and semantically closed organizations. Each organization consists of an infinite number of molecular species (connected by a dotted line). A circle represents a molecule having the structure λx_1.λx_2. ... λx_i.x_j (cf. Eq. (19))

There are various approaches using, for example, Turing machines [63], lambda-calculus [26,27], abstract automata [16], or combinators [83] for the structure-to-function mapping. In all of these approaches we can observe an important phenomenon: while the AC evolves, autocatalytic sets or, more generally, chemical organizations become visible. As in the system by Jain and Krishna, this effect is also indicated by an increase of the connectivity of the species within the population. When an organization becomes visible, the system has focused on a sub-space of the whole set of possible molecules. The molecules belonging to the emerged organization usually possess relatively high concentrations, because they are generated by many reactions. The closure of this set is usually smaller than the universe M. Depending on the setting, the emerged organization can consist of a single self-replicator, a small set of mutually producing molecules, or a large set of different molecules. Note that in the latter case, the organization can be so large (even infinite) that not all of its molecules are present in the population (see Fig. 7 for an example). Nevertheless, the population can carry the organization if the system can keep a generating set of molecules within the population. An emerged organization can be quite stable but can also change, either by external perturbations like randomly inserted molecules or by internal fluctuations caused by reactions among molecules that are not part of the emerged organization but have remained in small quantities in the population; see [61] for an example. Dittrich and Banzhaf [16] have shown that it is even possible to obtain evolutionary behavior without any externally generated variation, that is, even without any inflow of random molecules, and without any explicitly defined fitness function or selection process. Selection emerges as a result of the dilution flow and the limited population size.
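The core of such an intrinsically selective algorithm can be sketched as follows; the bitwise-AND "reaction" is an illustrative stand-in for a real structure-to-function mapping such as the finite-state-machine chemistry of [16]:

```python
import random

def dilution_reactor(pop, react, steps=5000, seed=0):
    """Fixed-size population: two randomly chosen molecules react, and
    the product overwrites a randomly chosen molecule (non-selective
    dilution flow). No fitness function is defined; selection emerges
    from the dilution flow and the limited population size alone."""
    rng = random.Random(seed)
    pop = list(pop)
    for _ in range(steps):
        a, b = rng.choice(pop), rng.choice(pop)
        pop[rng.randrange(len(pop))] = react(a, b)
    return pop

# toy molecules: 8-bit integers; toy reaction: bitwise AND
final = dilution_reactor([255, 240, 15] * 10, lambda a, b: a & b)
```

Species whose mutual reactions reproduce themselves (here, bit patterns closed under AND) tend to take over the vessel, while all others are washed out.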

Artificial Chemistry, Figure 8: Example of a self-evolving artificial chemistry. The figure shows the concentration of some selected species in a reaction vessel containing approximately 10^4 different species. In this example, molecules are binary strings of length 32 bits. Two binary strings react by mapping one of them to a finite state machine operating on the second binary string. There is only a general, non-selective dilution flow; no other operators like mutation, variation, or selection are applied. Note that the structure of a new species tends to be similar to the structure of the species it was created from. The dynamics is simulated by the algorithm of the introduction. Population size k = 10^6 molecules. Figure from [16]


Syntactic and Semantic Closure

In many "constructive", implicitly defined artificial chemistries we can observe the appearance of species that share syntactical and functional similarities which are invariant under the reaction operation. Fontana and Buss [27] called this phenomenon syntactic and semantic closure. Syntactic closure refers to the observation that the molecules within an emerged organization O ⊆ M are structurally similar; that is, they share certain structural features. This allows one to describe the set of molecules O compactly by a formal language or grammar. If we instead picked a subset A from M at random, the expected length of A's description would be on the order of its size. Assume, for example, that we pick one million strings of length 100 at random from the set of all strings of length 100; then we would need about 100 million characters to describe the resulting set. Interestingly, the organizations that appear in implicitly defined ACs can be described much more compactly by a grammar.

Syntactic and semantic closure shall be illustrated with an example taken from [27], where O ⊆ M is even infinite in size. In particular experiments with the lambda-chemistry, molecules appeared that possess the following structure:

A_{i,j} ≡ λx_1.λx_2. ... λx_i.x_j   with   j ≤ i ,   (19)

for example λx_1.λx_2.λx_3.x_2. Syntactic closure means that we can specify such structural rules defining the molecules of O and that these structural rules are invariant under the reaction mechanism: if molecules with the structure given by Eq. (19) react, their product can also be described by Eq. (19). Semantic closure means that we can describe the reactions taking place within O by referring to the molecules' grammatical structure (e.g., Eq. (19)). In our example, all reactions within O can be described by the following laws (illustrated by Fig. 7):

∀ i, j > 1; k, l :   A_{i,j} + A_{k,l} ⟹ A_{i−1,j−1} ,
∀ i > 1, j = 1; k, l :   A_{i,j} + A_{k,l} ⟹ A_{k+i−1,l+i−1} .   (20)

For example, λx_1.λx_2.λx_3.λx_4.x_3 + λx_1.λx_2.x_2 ⟹ λx_1.λx_2.λx_3.x_2 and λx_1.λx_2.x_1 + λx_1.λx_2.λx_3.λx_4.x_3 ⟹ λx_1.λx_2.λx_3.λx_4.λx_5.x_4, respectively (Fig. 7). As a result of the semantic closure we need not refer to the underlying reaction mechanism (e. g., the lambda calculus). Instead, we can explain the reactions within O on a more abstract level by referring to the grammatical structure (e. g., Eq. (19)).

Note that syntactic and semantic closure also appear in real chemical systems and are exploited by chemistry to organize chemical explanations. It might even be stated that without this phenomenon, chemistry as we know it would not be possible.

Information Processing

The chemical metaphor gives rise to a variety of computing concepts, which are explained in detail in further articles of this encyclopedia. In approaches like amorphous computing (→ Amorphous Computing), chemical mechanisms are used as a sub-system for communication between a huge number of simple, spatially distributed processing units. In membrane computing (→ Membrane Computing), rewriting operations are used not only to react molecules, but also to model active transport between compartments and to change the spatial compartment structure itself. Besides these theoretical and in-silico artificial chemistries there are approaches that aim at using real molecules: spatial patterns like waves and solitons in excitable media can be exploited to compute (see → Reaction-Diffusion Computing and → Computing in Geometrical Constrained Excitable Chemical Systems), whereas other approaches like conformational computing and DNA computing (→ DNA Computing) do not rely on spatially structured reaction vessels. Other closely related topics are molecular automata (→ Molecular Automata) and bacterial computing (→ Bacterial Computing).

Future Directions

Currently, we can observe a convergence of artificial chemistries and chemical models towards more realistic artificial chemistries [9]. On the one hand, models of systems biology are inspired by techniques from artificial chemistries, e. g., in the domain of rule-based modeling [22,34]. On the other hand, artificial chemistries become more realistic, for example, by adopting more realistic molecular structures or using techniques from computational chemistry to calculate energies [9,57]. Furthermore, the relation between artificial chemistries and real biological systems is made more and more explicit [25,45,58].

In the future, novel theories and techniques have to be developed to handle complex, implicitly defined reaction systems. This development will especially be driven by the needs of systems biology, when implicitly defined models (i. e., rule-based models) become more frequent.

Another open challenge lies in the creation of realistic (chemical) open-ended evolutionary systems. For example, an artificial chemistry with "true" open-ended evolution, or the ability to show a satisfyingly long transient phase comparable to the natural process in which novelties continuously appear, has not been presented yet. The reason for this might be lacking computing power, insufficient man-power to implement what we know, or missing knowledge concerning the fundamental mechanisms of (chemical) evolution. Artificial chemistries could provide a powerful platform to study whether the mechanisms that we believe explain (chemical) evolution are indeed candidates.

As sketched above, there is a broad range of currently explored practical application domains of artificial chemistries in technical artifacts, such as ambient computing, amorphous computing, organic computing, or smart materials. And eventually, artificial chemistry might come back to the realm of real chemistry and inspire the design of novel computing reaction systems, as in the fields of molecular computing, molecular communication, bacterial computing, or synthetic biology.
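Returning to the lambda-chemistry example above: the closure of the organization O under the reaction laws of Eq. (20) can be checked mechanically. The following sketch is our illustration, not code from [27]; it represents each molecule A_{i,j} simply by its index pair and applies the two laws:

```python
# Molecules of the organization O are the projection terms
# A(i, j) = λx1. ... λxi.xj with j <= i, represented as index pairs (i, j).

def react(a, b):
    """Apply the reaction laws of Eq. (20) to molecules a = A(i,j), b = A(k,l)."""
    (i, j), (k, l) = a, b
    assert 1 <= j <= i and 1 <= l <= k, "not a molecule of the form A(i,j)"
    if i > 1 and j > 1:
        return (i - 1, j - 1)          # first law:  A(i,j) + A(k,l) => A(i-1, j-1)
    if i > 1 and j == 1:
        return (k + i - 1, l + i - 1)  # second law: A(i,j) + A(k,l) => A(k+i-1, l+i-1)
    return None                        # i == 1 is not covered by Eq. (20)

# The two worked examples from the text:
assert react((4, 3), (2, 2)) == (3, 2)  # λx1.λx2.λx3.λx4.x3 + λx1.λx2.x2 => λx1.λx2.λx3.x2
assert react((2, 1), (4, 3)) == (5, 4)  # λx1.λx2.x1 + λx1.λx2.λx3.λx4.x3 => λx1...λx5.x4

# Syntactic closure: every product is again of the form A(i,j) with 1 <= j <= i.
molecules = [(i, j) for i in range(2, 6) for j in range(1, i + 1)]
for a in molecules:
    for b in molecules:
        p = react(a, b)
        if p is not None:
            i, j = p
            assert 1 <= j <= i
```

The loop at the end is the point: products never leave the grammar of Eq. (19), which is exactly the syntactic closure property discussed above.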

Bibliography

Primary Literature

1. Adami C, Brown CT (1994) Evolutionary learning in the 2D artificial life system avida. In: Brooks RA, Maes P (eds) Proc artificial life IV. MIT Press, Cambridge, pp 377–381. ISBN 0-262-52190-3 2. Adleman LM (1994) Molecular computation of solutions to combinatorial problems. Science 266:1021 3. Bagley RJ, Farmer JD (1992) Spontaneous emergence of a metabolism. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II. Addison-Wesley, Redwood City, pp 93–140. ISBN 0-201-52570-4 4. Banâtre J-P, Métayer DL (1986) A new computational model and its discipline of programming. Technical Report RR-0566. INRIA, Rennes 5. Banzhaf W (1993) Self-replicating sequences of binary numbers – foundations I and II: General and strings of length n = 4. Biol Cybern 69:269–281 6. Banzhaf W (1994) Self-replicating sequences of binary numbers: The build-up of complexity. Complex Syst 8:215–225 7. Banzhaf W (1995) Self-organizing algorithms derived from RNA interactions. In: Banzhaf W, Eeckman FH (eds) Evolution and Biocomputing. LNCS, vol 899. Springer, Berlin, pp 69–103 8. Banzhaf W, Dittrich P, Rauhe H (1996) Emergent computation by catalytic reactions. Nanotechnology 7(1):307–314 9. Benkö G, Flamm C, Stadler PF (2003) A graph-based toy model of chemistry. J Chem Inf Comput Sci 43(4):1085–1093. doi:10.1021/ci0200570 10. Bersini H (2000) Reaction mechanisms in the oo chemistry. In: Bedau MA, McCaskill JS, Packard NH, Rasmussen S (eds) Artificial life VII. MIT Press, Cambridge, pp 39–48 11. Boerlijst MC, Hogeweg P (1991) Spiral wave structure in prebiotic evolution: Hypercycles stable against parasites. Physica D 48(1):17–28

12. Breyer J, Ackermann J, McCaskill J (1999) Evolving reaction-diffusion ecosystems with self-assembling structure in thin films. Artif Life 4(1):25–40 13. Conrad M (1992) Molecular computing: The key-lock paradigm. Computer 25:11–22 14. Dewdney AK (1984) In the game called core war hostile programs engage in a battle of bits. Sci Amer 250:14–22 15. Dittrich P (2001) On artificial chemistries. Ph D thesis, University of Dortmund 16. Dittrich P, Banzhaf W (1998) Self-evolution in a constructive binary string system. Artif Life 4(2):203–220 17. Dittrich P, Speroni di Fenizio P (2007) Chemical organization theory. Bull Math Biol 69(4):1199–1231. doi:10.1007/s11538-006-9130-8 18. Ehricht R, Ellinger T, McCaskill JS (1997) Cooperative amplification of templates by cross-hybridization (CATCH). Eur J Biochem 243(1/2):358–364 19. Eigen M (1971) Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58(10):465–523 20. Eigen M, Schuster P (1977) The hypercycle: A principle of natural self-organisation, part A. Naturwissenschaften 64(11):541–565 21. Érdi P, Tóth J (1989) Mathematical models of chemical reactions: Theory and applications of deterministic and stochastic models. Princeton University Press, Princeton 22. Faeder JR, Blinov ML, Goldstein B, Hlavacek WS (2005) Rule-based modeling of biochemical networks. Complexity. doi:10.1002/cplx.20074 23. Farmer JD, Kauffman SA, Packard NH (1986) Autocatalytic replication of polymers. Physica D 22:50–67 24. Fernando C, Rowe J (2007) Natural selection in chemical evolution. J Theor Biol 247(1):152–167. doi:10.1016/j.jtbi.2007.01.028 25. Fernando C, von Kiedrowski G, Szathmáry E (2007) A stochastic model of nonenzymatic nucleic acid replication: Elongators sequester replicators. J Mol Evol 64(5):572–585. doi:10.1007/s00239-006-0218-4 26. Fontana W (1992) Algorithmic chemistry. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II.
Addison-Wesley, Redwood City, pp 159–210 27. Fontana W, Buss LW (1994) ‘The arrival of the fittest’: Toward a theory of biological organization. Bull Math Biol 56:1–64 28. Fontana W, Buss LW (1996) The barrier of objects: From dynamical systems to bounded organization. In: Casti J, Karlqvist A (eds) Boundaries and barriers. Addison-Wesley, Redwood City, pp 56–116 29. Furusawa C, Kaneko K (1998) Emergence of multicellular organisms with dynamic differentiation and spatial pattern. Artif Life 4:79–93 30. Gánti T (1975) Organization of chemical reactions into dividing and metabolizing units: The chemotons. Biosystems 7(1):15–21 31. Giavitto J-L, Michel O (2001) MGS: A rule-based programming language for complex objects and collections. Electron Note Theor Comput Sci 59(4):286–304 32. Gillespie DT (1976) General method for numerically simulating stochastic time evolution of coupled chemical reactions. J Comput Phys 22(4):403–434 33. Grzybowski BA, Stone HA, Whitesides GM (2000) Dynamic self-assembly of magnetized, millimetre-sized objects rotating at a liquid-air interface. Nature 405(6790):1033–1036 34. Hlavacek W, Faeder J, Blinov M, Posner R, Hucka M, Fontana W (2006) Rules for modeling signal-transduction systems. Sci STKE 2006:re6 35. Hofbauer J, Sigmund K (1988) Dynamical systems and the theory of evolution. University Press, Cambridge 36. Hofstadter DR (1979) Gödel, Escher, Bach: An eternal golden braid. Basic Books Inc, New York. ISBN 0-465-02685-0 37. Hordijk W, Crutchfield JP, Mitchell M (1996) Embedded-particle computation in evolved cellular automata. In: Toffoli T, Biafore M, Leäo J (eds) PhysComp96. New England Complex Systems Institute, Cambridge, pp 153–8 38. Hosokawa K, Shimoyama I, Miura H (1994) Dynamics of self-assembling systems: Analogy with chemical kinetics. Artif Life 1(4):413–427 39. Hutton TJ (2002) Evolvable self-replicating molecules in an artificial chemistry. Artif Life 8(4):341–356 40. Ikegami T, Hashimoto T (1995) Active mutation in self-reproducing networks of machines and tapes. Artif Life 2(3):305–318 41. Jain S, Krishna S (1998) Autocatalytic sets and the growth of complexity in an evolutionary model. Phys Rev Lett 81(25):5684–5687 42. Jain S, Krishna S (1999) Emergence and growth of complex networks in adaptive systems. Comput Phys Commun 122:116–121 43. Jain S, Krishna S (2001) A model for the emergence of cooperation, interdependence, and structure in evolving networks. Proc Natl Acad Sci USA 98(2):543–547 44. Jain S, Krishna S (2002) Large extinctions in an evolutionary model: The role of innovation and keystone species. Proc Natl Acad Sci USA 99(4):2055–2060. doi:10.1073/pnas.032618499 45. Kaneko K (2007) Life: An introduction to complex systems biology. Springer, Berlin 46. Kauffman SA (1971) Cellular homeostasis, epigenesis and replication in randomly aggregated macromolecular systems. J Cybern 1:71–96 47. Kauffman SA (1986) Autocatalytic sets of proteins. J Theor Biol 119:1–24 48. Kauffman SA (1993) The origins of order: Self-organization and selection in evolution. Oxford University Press, New York 49. Kirner T, Ackermann J, Ehricht R, McCaskill JS (1999) Complex patterns predicted in an in vitro experimental model system for the evolution of molecular cooperation. Biophys Chem 79(3):163–86 50. Kniemeyer O, Buck-Sorlin GH, Kurth W (2004) A graph grammar approach to artificial life. Artif Life 10(4):413–431. doi:10.1162/1064546041766451 51. Laing R (1972) Artificial organisms and autonomous cell rules. J Cybern 2(1):38–49 52. Laing R (1975) Some alternative reproductive strategies in artificial molecular machines. J Theor Biol 54:63–84 53. Laing R (1977) Automaton models of reproduction by self-inspection. J Theor Biol 66:437–56 54. Langton CG (1984) Self-reproduction in cellular automata. Physica D 10D(1–2):135–44 55. Langton CG (1989) Artificial life. In: Langton CG (ed) Proc of artificial life. Addison-Wesley, Redwood City, pp 1–48 56. Lazcano A, Bada JL (2003) The 1953 Stanley L. Miller experiment: Fifty years of prebiotic organic chemistry. Orig Life Evol Biosph 33(3):235–42

57. Lenaerts T, Bersini H (2009) A synthon approach to artificial chemistry. Artif Life 9 (in press) 58. Lenski RE, Ofria C, Collier TC, Adami C (1999) Genome complexity, robustness and genetic interactions in digital organisms. Nature 400(6745):661–4 59. Lohn JD, Colombano S, Scargle J, Stassinopoulos D, Haith GL (1998) Evolution of catalytic reaction sets using genetic algorithms. In: Proc IEEE International Conference on Evolutionary Computation. IEEE, New York, pp 487–492 60. Lugowski MW (1989) Computational metabolism: Towards biological geometries for computing. In: Langton CG (ed) Artificial Life. Addison-Wesley, Redwood City, pp 341–368. ISBN 0201-09346-4 61. Matsumaru N, Speroni di Fenizio P, Centler F, Dittrich P (2006) On the evolution of chemical organizations. In: Artmann S, Dittrich P (eds) Proc of the 7th german workshop of artificial life. IOS Press, Amsterdam, pp 135–146 62. Maynard Smith J, Szathmáry E (1995) The major transitions in evolution. Oxford University Press, New York 63. McCaskill JS (1988) Polymer chemistry on tape: A computational model for emergent genetics. Internal report. MPI for Biophysical Chemistry, Göttingen 64. McCaskill JS, Chorongiewski H, Mekelburg D, Tangen U, Gemm U (1994) Configurable computer hardware to simulate longtime self-organization of biopolymers. Ber Bunsenges Phys Chem 98(9):1114–1114 65. McMullin B, Varela FJ (1997) Rediscovering computational autopoiesis. In: Husbands P, Harvey I (eds) Fourth european conference on artificial life. MIT Press, Cambridge, pp 38–47 66. Miller SL (1953) A production of amino acids under possible primitive earth conditions. Science 117(3046):528–9 67. Morris HC (1989) Typogenetics: A logic for artificial life. In: Langton CG (ed) Artif life. Addison-Wesley, Redwood City, pp 341–368 68. Ono N, Ikegami T (2000) Self-maintenance and self-reproduction in an abstract cell model. J Theor Biol 206(2):243–253 69. Pargellis AN (1996) The spontaneous generation of digital “life”. 
Physica D 91(1–2):86–96 70. Păun G (2000) Computing with membranes. J Comput Syst Sci 61(1):108–143 71. Petri CA (1962) Kommunikation mit Automaten. Ph D thesis, University of Bonn 72. Rasmussen S, Knudsen C, Feldberg R, Hindsholm M (1990) The coreworld: Emergence and evolution of cooperative structures in a computational chemistry. Physica D 42:111–134 73. Rasmussen S, Knudsen C, Feldberg R (1992) Dynamics of programmable matter. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II. Addison-Wesley, Redwood City, pp 211–291. ISBN 0-201-52570-4 74. Ray TS (1992) An approach to the synthesis of life. In: Langton CG, Taylor C, Farmer JD, Rasmussen S (eds) Artificial life II. Addison-Wesley, Redwood City, pp 371–408 75. Rössler OE (1971) A system theoretic model for biogenesis. Z Naturforsch B 26(8):741–746 76. Sali A, Shakhnovich E, Karplus M (1994) How does a protein fold? Nature 369(6477):248–251 77. Sali A, Shakhnovich E, Karplus M (1994) Kinetics of protein folding: A lattice model study of the requirements for folding to the native state. J Mol Biol 235(5):1614–1636 78. Salzberg C (2007) A graph-based reflexive artificial chemistry. Biosystems 87(1):1–12


79. Sayama H (2009) Swarm chemistry. Artif Life. (in press) 80. Sayama H (1998) Introduction of structural dissolution into Langton’s self-reproducing loop. In: Adami C, Belew R, Kitano H, Taylor C (eds) Artificial life VI. MIT Press, Cambridge, pp 114–122 81. Segre D, Ben-Eli D, Lancet D (2000) Compositional genomes: Prebiotic information transfer in mutually catalytic noncovalent assemblies. Proc Natl Acad Sci USA 97(8):4112–4117 82. Socci ND, Onuchic JN (1995) Folding kinetics of proteinlike heteropolymers. J Chem Phys 101(2):1519–1528 83. Speroni di Fenizio P (2000) A less abstract artificial chemistry. In: Bedau MA, McCaskill JS, Packard NH, Rasmussen S (eds) Artificial life VII. MIT Press, Cambridge, pp 49–53 84. Speroni Di Fenizio P, Dittrich P (2002) Artificial chemistry’s global dynamics. Movement in the lattice of organisation. J Three Dimens Images 16(4):160–163. ISSN 13422189 85. Stadler PF, Fontana W, Miller JH (1993) Random catalytic reaction networks. Physica D 63:378–392 86. Suzuki H (2007) Mathematical folding of node chains in a molecular network. Biosystems 87(2–3):125–135. doi:10.1016/j.biosystems.2006.09.005 87. Suzuki K, Ikegami T (2006) Spatial-pattern-induced evolution of a self-replicating loop network. Artif Life 12(4):461–485. doi:10.1162/artl.2006.12.4.461 88. Suzuki Y, Tanaka H (1997) Symbolic chemical system based on abstract rewriting and its behavior pattern. Artif Life Robotics 1:211–219 89. Tangen U, Schulte L, McCaskill JS (1997) A parallel hardware evolvable computer polyp. In: Pocek KL, Arnold J (eds) IEEE symposium on FPGAs for custom computing machines. IEEE Computer Society, Los Alamitos

90. Thürk M (1993) Ein Modell zur Selbstorganisation von Automatenalgorithmen zum Studium molekularer Evolution. Ph D thesis, Universität Jena 91. Turing AM (1952) The chemical basis of morphogenesis. Phil Trans R Soc London B 237:37–72 92. Vanderzande C (1998) Lattice models of polymers. Cambridge University Press, Cambridge 93. Varela FJ, Maturana HR, Uribe R (1974) Autopoiesis: The organization of living systems. BioSystems 5(4):187–196 94. Varetto L (1993) Typogenetics: An artificial genetic system. J Theor Biol 160(2):185–205 95. Varetto L (1998) Studying artificial life with a molecular automaton. J Theor Biol 193(2):257–285 96. Vico G (1710) De antiquissima Italorum sapientia ex linguae originibus eruenda libri tres. Neapel 97. von Neumann J, Burks A (ed) (1966) The theory of self-reproducing automata. University of Illinois Press, Urbana 98. Zauner K-P, Conrad M (1996) Simulating the interplay of structure, kinetics, and dynamics in complex biochemical networks. In: Hofestädt R, Lengauer T, Löffler M, Schomburg D (eds) Computer science and biology GCB’96. University of Leipzig, Leipzig, pp 336–338 99. Zeleny M (1977) Self-organization of living systems: A formal model of autopoiesis. Int J General Sci 4:13–28

Books and Reviews

Adami C (1998) Introduction to artificial life. Springer, New York
Dittrich P, Ziegler J, Banzhaf W (2001) Artificial chemistries – a review. Artif Life 7(3):225–275
Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge




Artificial Intelligence in Modeling and Simulation

Bernard Zeigler1, Alexandre Muzy2, Levent Yilmaz3
1 Arizona Center for Integrative Modeling and Simulation, University of Arizona, Tucson, USA
2 CNRS, Università di Corsica, Corte, France
3 Auburn University, Alabama, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Review of System Theory and Framework for Modeling and Simulation
Fundamental Problems in M&S
AI-Related Software Background
AI Methods in Fundamental Problems of M&S
Automation of M&S
SES/Model Base Architecture for an Automated Modeler/Simulationist
Intelligent Agents in Simulation
Future Directions
Bibliography

Glossary

Behavior The observable manifestation of an interaction with a system.
DEVS Discrete Event System Specification formalism describes models developed for simulation; applications include simulation-based testing of collaborative services.
Endomorphic agents Agents that contain models of themselves and/or of other endomorphic agents.
Levels of interoperability Levels at which systems can interoperate, such as syntactic, semantic and pragmatic. The higher the level, the more effective the information exchange among participants.
Levels of system specification Levels at which dynamic input/output systems can be described, known, or specified, ranging from behavioral to structural.
Metadata Data that describes other data; a hierarchical concept in which metadata are a descriptive abstraction above the data they describe.
Model-based automation Automation of system development and deployment that employs models or system specifications, such as DEVS, to derive artifacts.
Modeling and simulation ontology The SES is interpreted as an ontology for the domain of hierarchical, modular simulation models specified with the DEVS formalism.
Net-centric environment Network-centered, typically Internet-centered or web-centered, information exchange medium.
Ontology Language that describes a state of the world from a particular conceptual view; usually pertains to a particular application domain.
Pragmatic frame A means of characterizing the consumer's use of the information sent by a producer; formalized using the concept of a processing network model.
Pragmatics Pragmatics is based on Speech Act Theory and focuses on elucidating the intent of the semantics constrained by a given context. Metadata tags to support pragmatics include Authority, Urgency/Consequences, Relationship, Tense and Completeness.
Predicate logic An expressive form of declarative language that can describe ontologies using symbols for individuals, operations, variables, and functions with governing axioms and constraints.
Schema An advanced form of XML document definition; extends the DTD concept.
Semantics Semantics determines the content of messages in which information is packaged. The meaning of a message is the eventual outcome of the processing that it supports.
Sensor Device that can sense or detect some aspect of the world or some change in such an aspect.
System specification Formalism for describing or specifying a system. There are levels of system specification ranging from behavior to structure.
Service-oriented architecture Web service architecture in which services are designed to be (1) accessed without knowledge of their internals through well-defined interfaces and (2) readily discoverable and composable.
Structure The internal mechanism that produces the behavior of a system.
System entity structure Ontological basis for modeling and simulation. Its pruned entity structures can describe both static data sets and dynamic simulation models.
Syntax Prescribes the form of messages in which information is packaged.
UML Unified Modeling Language, a software development language and environment that can be used for ontology development and has tools that map UML specifications into XML.
XML eXtensible Markup Language provides a syntax for document structures containing tagged information


Artificial Intelligence in Modeling and Simulation, Table 1 Hierarchy of system specifications

Level  Name             What we specify at this level
4      Coupled Systems  System built up by several component systems that are coupled together
3      I/O System       System with state and state transitions to generate the behavior
2      I/O Function     Collection of input/output pairs constituting the allowed behavior partitioned according to the initial state the system is in when the input is applied
1      I/O Behavior     Collection of input/output pairs constituting the allowed behavior of the system from an external Black Box view
0      I/O Frame        Input and output variables and ports together with allowed values

where tag definitions set up the basis for semantic interpretation.

Definition of the Subject

This article discusses the role of Artificial Intelligence (AI) in Modeling and Simulation (M&S). AI is the field of computer science that attempts to construct computer systems that emulate human problem solving behavior with the goal of understanding human intelligence. M&S is a multidisciplinary field of systems engineering, software engineering, and computer science that seeks to develop robust methodologies for constructing computerized models with the goal of providing tools that can assist humans in all activities of the M&S enterprise. Although each of these disciplines has its core community, there have been numerous intersections and cross-fertilizations between the two fields. From the perspective of this article, we view M&S as presenting some fundamental and very difficult problems whose solution may benefit from the concepts and techniques of AI.

Introduction

To state the M&S problems that may benefit from AI, we first briefly review a system-theory-based framework for M&S that provides a language and concepts to facilitate definitive problem statement. We then introduce some key problem areas: verification and validation, reuse and composability, and distributed simulation and systems of systems interoperability. After some further review of software and AI-related background, we go on to outline some areas of AI that have direct applicability to the just-given problems in M&S. In order to provide a unifying theme for the problems and solutions, we then raise the question of whether all of M&S can be automated into an integrated autonomous artificial modeler/simulationist. We then proceed to explore an approach to developing such an intelligent agent and present a concrete means by which such an agent could engage in M&S. We close with consideration of an advanced feature that such an agent must have if it is to fully emulate human capability: the ability, to a limited but significant extent, to construct and employ models of its own "mind" as well as of the "minds" of other agents.

Review of System Theory and Framework for Modeling and Simulation

Hierarchy of System Specifications

Systems theory [1] deals with a hierarchy of system specifications which defines levels at which a system may be known or specified. Table 1 shows this Hierarchy of System Specifications (in simplified form; see [2] for full exposition).

- At level 0 we deal with the input and output interface of a system.
- At level 1 we deal with purely observational recordings of the behavior of a system. This is an I/O relation which consists of a set of pairs of input behaviors and associated output behaviors.
- At level 2 we have knowledge of the initial state when the input is applied. This allows partitioning the input/output pairs of level 1 into non-overlapping subsets, each subset associated with a different starting state.
- At level 3 the system is described by a state space and state transition functions. The transition function describes the state-to-state transitions caused by the inputs and the outputs generated thereupon.
- At level 4 a system is specified by a set of components and a coupling structure. The components are systems on their own, with their own state sets and state transition functions. A coupling structure defines how they interact. A property of a coupled system called "closure under coupling" guarantees that a coupled system at level 3 itself specifies a system. This property allows hierarchical construction of systems, i. e., that




coupled systems can be used as components in larger coupled systems. As we shall see in a moment, the system specification hierarchy provides a mathematical underpinning to define a framework for modeling and simulation. Each of the entities (e. g., real world, model, simulation, and experimental frame) will be described as a system known or specified at some level of specification. The essence of modeling and simulation lies in establishing relations between pairs of system descriptions. These relations pertain to the validity of a system description at one level of specification relative to another system description at a different (higher, lower, or equal) level of specification. On the basis of the arrangement of system levels as shown in Table 1, we distinguish between vertical and horizontal relations. A vertical relation is called an association mapping and takes a system at one level of specification and generates its counterpart at another level of specification. The downward motion in the structure-to-behavior direction, formally represents the process by which the behavior of a model is generated. This is relevant in simulation and testing when the model generates the behavior which then can be compared with the desired behavior. The opposite upward mapping relates a system description at a lower level with one at a higher level of specification. While the downward association of specifications is straightforward, the upward association is much less so. This is because in the upward direction information is introduced while in the downward direction information is reduced. Many structures exhibit the same behavior and recovering a unique structure from a given behavior is not possible. The upward direction, however, is fundamental in the design process where a structure (system at level 3) has to be found that is capable of generating the desired behavior (system at level 1). 
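The downward association mapping from a level-3 structure to its level-1 behavior can be made concrete with a small sketch. The following is our illustration, not code from the article; the particular transition and output functions are arbitrary choices:

```python
# A system known at level 3: a state set (here, integers mod 4) together
# with a state transition function and an output function.

def delta(state, x):
    """State transition function: next state caused by input x."""
    return (state + x) % 4

def out(state):
    """Output function: output generated in the given state."""
    return state * 2

def behavior(initial_state, inputs):
    """Downward association mapping: generate the input/output pair that
    this level-3 structure produces from a known initial state (level 2);
    collecting such pairs over all admissible inputs gives the level-1 view."""
    s, outputs = initial_state, []
    for x in inputs:
        s = delta(s, x)
        outputs.append(out(s))
    return (tuple(inputs), tuple(outputs))

io_pair = behavior(0, [1, 2, 3])  # one element of the I/O relation
```

The upward direction has no such simple function: many choices of `delta` and `out` generate the same collection of I/O pairs, which is why recovering a unique structure from behavior is impossible in general.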
Framework for Modeling and Simulation

The Framework for M&S as described in [2] establishes entities and their relationships that are central to the M&S enterprise (see Fig. 1). The entities of the Framework are: source system, model, simulator, and experimental frame; they are related by the modeling and the simulation relationships. Each entity is formally characterized as a system at an appropriate level of specification of a generic dynamic system.

Source System

The source system is the real or virtual environment that we are interested in modeling. It is viewed as a source of

Artificial Intelligence in Modeling and Simulation, Figure 1 Framework entities and relationships

observable data, in the form of time-indexed trajectories of variables. The data that has been gathered from observing or otherwise experimenting with a system is called the system behavior database. These data are viewed or acquired through experimental frames of interest to the model developer and user. As we shall see, in the case of model validation, these data are the basis for comparison with data generated by a model. Thus, these data must be sufficient to enable reliable comparison as well as being accepted by both the model developer and the test agency as the basis for comparison. Data sources for this purpose might be measurements taken in prior experiments, mathematical representations of the measured data, or expert knowledge of the system behavior by accepted subject matter experts.

Experimental Frame

An experimental frame is a specification of the conditions under which the system is observed or experimented with [3]. An experimental frame is the operational formulation of the objectives that motivate an M&S project. A frame is realized as a system that interacts with the system of interest to obtain the data of interest under specified conditions. An experimental frame specification consists of four major subsections:

Input stimuli Specification of the class of admissible input time-dependent stimuli. This is the class from which individual samples will be drawn and injected into the model or system under test for particular experiments.
Control Specification of the conditions under which the model or system will be initialized, continued under examination, and terminated.


Artificial Intelligence in Modeling and Simulation, Figure 2 Experimental frame and components

Metrics Specification of the data summarization functions and the measures to be employed to provide quantitative or qualitative measures of the input/output behavior of the model. Examples of such metrics are performance indices, goodness-of-fit criteria, and error accuracy bounds.
Analysis Specification of means by which the results of data collection in the frame will be analyzed to arrive at final conclusions.

The data collected in a frame consists of pairs of input/output time functions. When an experimental frame is realized as a system to interact with the model or system under test, the specifications become components of the driving system. For example, a generator of output time functions implements the class of input stimuli.

An experimental frame is the operational formulation of the objectives that motivate a modeling and simulation project. Many experimental frames can be formulated for the same system (both source system and model) and the same experimental frame may apply to many systems. Why would we want to define many frames for the same system? Or apply the same frame to many systems? For the same reason that we might have different objectives in modeling the same system, or have the same objective in modeling different systems. There are two equally valid views of an experimental frame. One views a frame as a definition of the type of data elements that will go into the database. The second views a frame as a system that interacts with the system of interest to obtain the data of

interest under specified conditions. In this view, the frame is characterized by its implementation as a measurement system or observer. In this implementation, a frame typically has three types of components (as shown in Fig. 2 and Fig. 3): a generator, which generates input segments to the system; an acceptor, which monitors an experiment to see that the desired experimental conditions are met; and a transducer, which observes and analyzes the system output segments.

Artificial Intelligence in Modeling and Simulation, Figure 3 Experimental frame and its components




Figure 2b illustrates a simple, but ubiquitous, pattern for experimental frames that measure typical job processing performance metrics, such as round trip time and throughput. Illustrated in the web context, a generator produces service request messages at a given rate. The time that has elapsed between sending of a request and its return from a server is the round trip time. A transducer notes the departures and arrivals of requests, allowing it to compute the average round trip time and other related statistics, as well as the throughput and unsatisfied (or lost) requests. An acceptor notes whether performance achieves the developer's objectives, for example, whether the throughput exceeds the desired level and/or whether, say, 99% of the round trip times are below a given threshold.

Objectives for modeling relate to the role of the model in systems design, management or control. Experimental frames translate the objectives into more precise experimentation conditions for the source system or its models. We can distinguish between objectives for verification and validation of (a) models and (b) systems. In the case of models, experimental frames translate the objectives into more precise experimentation conditions for the source system and/or its models. A model under test is expected to be valid for the source system in each such frame. Having stated the objectives, there is presumably a best level of resolution to answer these questions. The more demanding the questions, the greater the resolution likely to be needed to answer them. Thus, the choice of appropriate levels of abstraction also hinges on the objectives and their experimental frame counterparts. In the case of objectives for verification and validation of systems, we need to be given, or be able to formulate, the requirements for the behavior of the system at the I/O behavior level.
The experimental frame then is formulated to translate these requirements into a set of possible experiments to test whether the system actually performs its required behavior. In addition we can formulate measures of the effectiveness (MOE) of a system in accomplishing its goals. We call such measures, outcome measures. In order to compute such measures, the system must expose relevant variables, we’ll call these output variables, whose values can be observed during execution runs of the system.
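To make the frame components concrete, the transducer/acceptor pattern described above can be sketched in Python. All names here (Transducer, Acceptor, note_departure, and so on) are invented for illustration and do not belong to any particular M&S toolkit; the sketch assumes request identifiers and simulation-time stamps are supplied by the generator.

```python
import statistics

class Transducer:
    """Records departures and arrivals of requests and derives summary metrics."""
    def __init__(self):
        self.sent = {}          # request id -> departure time
        self.round_trips = []   # completed round-trip times
        self.lost = 0

    def note_departure(self, req_id, time):
        self.sent[req_id] = time

    def note_arrival(self, req_id, time):
        if req_id in self.sent:
            self.round_trips.append(time - self.sent.pop(req_id))

    def end_of_run(self, end_time):
        self.lost = len(self.sent)  # requests never answered within the run
        n = len(self.round_trips)
        return {
            "avg_round_trip": statistics.mean(self.round_trips) if n else None,
            "throughput": n / end_time if end_time > 0 else 0.0,
            "lost": self.lost,
        }

class Acceptor:
    """Checks whether observed metrics meet the developer's objectives."""
    def __init__(self, min_throughput, rtt_bound, quantile=0.99):
        self.min_throughput = min_throughput
        self.rtt_bound = rtt_bound
        self.quantile = quantile  # e.g., 99% of round trips must meet the bound

    def accept(self, metrics, round_trips):
        below = sum(rt <= self.rtt_bound for rt in round_trips)
        frac = below / len(round_trips) if round_trips else 0.0
        return metrics["throughput"] >= self.min_throughput and frac >= self.quantile
```

The transducer is purely observational and the acceptor purely judgmental, mirroring the separation of concerns in the frame pattern.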

Model

A model is a system specification, such as a set of instructions, rules, equations, or constraints for generating input/output behavior. Models may be expressed in a variety of formalisms that may be understood as means for specifying subclasses of dynamic systems. The Discrete Event System Specification (DEVS) formalism delineates the subclass of discrete event systems, and it can also represent the systems specified within traditional formalisms such as differential (continuous) and difference (discrete time) equations [4]. In DEVS, as in systems theory, a model can be atomic, i.e., not further decomposed, or coupled, in which case it consists of components that are coupled or interconnected together.

Simulator

A simulator is any computation system (such as a single processor, a processor network, or, more abstractly, an algorithm) capable of executing a model to generate its behavior. The more general purpose a simulator is, the greater the extent to which it can be configured to execute a variety of model types. In order of increasing capability, simulators can be:

- dedicated to a particular model or small class of similar models;
- capable of accepting all (practical) models from a wide class, such as an application domain (e.g., communication systems);
- restricted to models expressed in a particular modeling formalism, such as continuous differential equation models;
- capable of accepting multi-formalism models (having components from several formalism classes, such as continuous and discrete event).

A simulator can take many forms, such as a single computer or multiple computers executing on a network.

Fundamental Problems in M&S

We have now reviewed a system-theory-based framework for M&S that provides a language and concepts in which to formulate key problems in M&S. Next on our agenda is to discuss problem areas including: verification and validation, reuse and composability, and distributed simulation and systems of systems interoperability. These are challenging, and heretofore unsolved, problems at the core of the M&S enterprise.

Validation and Verification

The basic concepts of verification and validation (V&V) have been described in different settings, levels of detail, and points of view, and are still evolving. These concepts have been studied by a variety of scientific and engineering disciplines, and various flavors of validation and


Artificial Intelligence in Modeling and Simulation, Figure 4 Basic approach to model validation

verification concepts and techniques have emerged from a modeling and simulation perspective. Within the modeling and simulation community, a variety of methodologies for V&V have been suggested in the literature [5,6,7]. A categorization of 77 verification, validation, and testing techniques, along with 15 principles, has been offered to guide the application of these techniques [8]. However, these methods vary extensively (e.g., alpha testing, induction, cause-and-effect graphing, inference, predicate calculus, proof of correctness, and user interface testing) and are only loosely related to one another. Therefore, such a categorization can only serve as an informal guideline for the development of a process for V&V of models and systems. Validation and verification concepts are themselves founded on more primitive concepts, such as system specifications and homomorphism, as discussed in the framework of M&S [2]. In this framework, the entities system, experimental frame, model, and simulator take on real importance only when properly related to each other. For example, we build a model of a particular system for some objective, and only some models, and not others, are suitable. Thus, it is critical to the success of a simulation modeling effort that certain relationships hold. Two of the most important are validity and simulator correctness. The basic modeling relation, validity, refers to the relation between a model, a source system, and an experimental frame. The most basic concept, replicative validity, is affirmed if, for all the experiments possible within the experimental frame, the behavior of the model and the system

agree within acceptable tolerance. The term accuracy is often used in place of validity. Another term, fidelity, is often used for a combination of both validity and detail. Thus, a high fidelity model may refer to a model that is both highly detailed and valid (in some understood experimental frame). When used this way, however, the assumption seems to be that high detail alone is needed for high fidelity, as if validity were a necessary consequence of high detail. In fact, it is possible to have a very detailed model that is nevertheless very much in error, simply because some of the highly resolved components function in a different manner than their real system counterparts. The basic approach to model validation is comparison of the behavior generated by a model and the source system it represents within a given experimental frame. The basis for comparison serves as the reference (Fig. 4) against which the accuracy of the model is measured. The basic simulation relation, simulator correctness, is a relation between a simulator and a model. A simulator correctly simulates a model if it is guaranteed to faithfully generate the model's output behavior given its initial state and its input trajectory. In practice, as suggested above, simulators are constructed to execute not just one model but a family of possible models. For example, a network simulator provides both a simulator and a class of network models it can simulate. In such cases, we must establish that a simulator will correctly execute the particular class of models it claims to support. Conceptually, the approach to testing for such execution, illustrated in Fig. 5, is
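A minimal sketch of the replicative-validity comparison, assuming (purely for illustration) that both system and model behaviors have been recorded as numeric output trajectories indexed by experiment; the function name and data layout are ours, not the article's:

```python
def replicatively_valid(system_behavior, model_behavior, tolerance):
    """Check replicative validity within a frame.

    Each behavior maps an experiment (a frame-admissible input segment,
    identified by name) to an output trajectory (a list of numbers).
    The model passes only if every corresponding output sample agrees
    with the system's within the tolerance, for every experiment.
    """
    for experiment, sys_traj in system_behavior.items():
        mod_traj = model_behavior.get(experiment)
        if mod_traj is None or len(mod_traj) != len(sys_traj):
            return False
        if any(abs(s - m) > tolerance for s, m in zip(sys_traj, mod_traj)):
            return False
    return True
```

Note how the tolerance makes "agreement" an explicit, frame-dependent choice rather than exact equality.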


Artificial Intelligence in Modeling and Simulation, Figure 5 Basic approach to simulator verification

Artificial Intelligence in Modeling and Simulation, Figure 6 Basic approach to system validation

to perform a number of test cases in which the same model is provided to the simulator under test and to a “gold standard simulator” that is known to correctly simulate the model. Of course, such test case models must lie within the class supported by the simulator under test, as well as be presented in the form in which it expects to receive them. Comparison of the output behaviors, in the same manner as with model validation, is then employed to check the agreement between the two simulators.
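The gold-standard comparison can be sketched as follows, treating each simulator as a function from a test model to an output trajectory; this interface is an assumption made here for illustration only:

```python
def verify_against_gold_standard(simulator_under_test, gold_simulator,
                                 test_models, tolerance=0.0):
    """Run each test model on both simulators and compare their output
    behaviors, in the same manner as model validation compares model
    and system behaviors. Returns the list of models on which the
    simulator under test disagrees with the gold standard."""
    failures = []
    for model in test_models:
        out_test = simulator_under_test(model)
        out_gold = gold_simulator(model)
        if len(out_test) != len(out_gold) or any(
                abs(a - b) > tolerance for a, b in zip(out_test, out_gold)):
            failures.append(model)
    return failures  # empty list means agreement on every test case
```

An empty failure list supports (but, being test-based, does not prove) simulator correctness over the tested class.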

If the specifications of both the simulator and the model are available in separate form, where each can be accessed independently, it may be possible to prove correctness mathematically. The case of system validation is illustrated in Fig. 6. Here the system is considered as a hardware and/or software implementation to be validated against requirements for its input/output behavior. The goal is to develop test models that can stimulate the implemented system with


inputs and can observe its outputs to compare them with those required by the behavior requirements. Also shown is a dotted path in which a reference model is constructed that is capable of simulation execution. Such a reference model is more difficult to develop than the test models, since it requires not only knowing in advance what output to test for, but also actually generating such an output. Although a reference model is not required, it may be desirable in situations in which the extra cost of development is justified by the additional range of tests that might be possible and the consequent increase in coverage this may provide.
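One possible shape of such a system-validation harness, with invented names throughout: each test model carries an input trajectory, and the behavior requirements state the outputs the implemented system must produce for it.

```python
from collections import namedtuple

# A test model here is just a named input trajectory; a real test model
# would be an executable model that generates the stimulus.
TestModel = namedtuple("TestModel", ["name", "input_trajectory"])

def validate_system(system, test_models, behavior_requirements, tolerance=0.0):
    """Stimulate the implemented system with each test model's inputs and
    compare the observed outputs with the required I/O behavior.
    Returns the names of the tests whose requirements were not met."""
    failures = []
    for test in test_models:
        observed = system(test.input_trajectory)
        required = behavior_requirements[test.name]
        if len(observed) != len(required) or any(
                abs(o - r) > tolerance for o, r in zip(observed, required)):
            failures.append(test.name)
    return failures
```

A reference model, where available, would replace the static `behavior_requirements` table by generating the required outputs on the fly.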

Model Reuse and Composability

Model reuse and composability are two sides of the same coin: it is patently desirable to reuse models, the fruits of earlier or others' work. However, such models typically become components in a larger composite model and must be able to interact meaningfully with the other components. While software development disciplines are successfully applying a component-based approach to build software systems, the additional system dynamics involved in simulation models has resisted straightforward reuse and composition approaches. A model is only reusable to the extent that its original dynamic systems assumptions are consistent with the constraints of the new simulation application. Consequently, without contextual information to guide selection and refactoring, a model may not be reused to advantage within a new experimental frame. Davis and Anderson [9] argue that to foster such reuse, model representation methods should distinguish, and separately specify, the model, the simulator, and the experimental frame. However, Yilmaz and Oren [10] pointed out that more contextual information is needed beyond the set of experimental frames to which a model is applicable [11], namely, a characterization of the context in which the model was constructed. These authors extended the basic model-simulator-experimental frame perspective to emphasize the role of context in reuse. They make a sharp distinction between the objective context within which a simulation model is originally defined and the intentional context in which the model is being qualified for reuse. They extend the system-theoretic levels of specification discussed earlier to define certain behavioral model dependency relations needed to formalize the conceptual, realization, and experimental aspects of context. As the scope of simulation applications grows, it is increasingly the case that more than one modeling paradigm is needed to adequately express the dynamics of the different components. For systems composed of models with dynamics that are intrinsically heterogeneous, it is crucial to use multiple modeling formalisms to describe them. However, combining different model types poses a variety of challenges [9,12,13]. Sarjoughian [14] introduced an approach to multi-formalism modeling that employs an interfacing mechanism called a Knowledge Interchange Broker (KIB) to compose model components expressed in diverse formalisms. The KIB supports translation from the semantics of one formalism into those of another, ensuring coordinated and correct execution of the simulation algorithms of the distinct modeling formalisms.

Distributed Simulation and System of Systems Interoperability

The problems of model reuse and composability manifest themselves strongly in the context of distributed simulation, where the objective is to enable existing, geographically dispersed simulators to meaningfully interact, or federate, together. We briefly review experience with interoperability in the distributed simulation context and a linguistically based approach to the System of Systems (SoS) interoperability problem [15]. Sage and Cuppan [16] drew the parallel between viewing the construction of an SoS as a federation of systems and the federation that is supported by the High Level Architecture (HLA), an IEEE standard fostered by the DoD to enable composition of simulations [17,18]. HLA is a network middleware layer that supports message exchanges among simulations, called federates, in a neutral format. However, experience with HLA has been disappointing and has forced acknowledgment of the difference between technical interoperability and substantive interoperability [19]. The first only enables heterogeneous simulations to exchange data; it does not guarantee the second, which is the desired outcome of exchanging meaningful data, namely, that coherent interaction among federates takes place. Tolk and Muguirra [20] introduced the Levels of Conceptual Interoperability Model (LCIM), which identifies seven levels of interoperability among participating systems. These levels can be viewed as a refinement of the operational interoperability type, which is one of three types defined in [15]. The operational type concerns linkages between systems in their interactions with one another, the environment, and users. The other types apply to the context in which systems are constructed and acquired: constructive (linkages between organizations responsible for system construction) and programmatic (linkages between program offices that manage system acquisition).


AI-Related Software Background

To proceed to the discussion of the role of AI in addressing key problems in M&S, we need to provide some further software and AI-related background. We offer a brief historical account of object orientation and agent-based systems as a springboard for discussing the upcoming concepts of object frameworks, ontologies, and endomorphic agents.

Object-Orientation and Agent-Based Systems

Many of the software technology advances of the last 30 years have been initiated from the field of M&S. Objects, as code modules with both structure and behavior, were first introduced in the SIMULA simulation language [21]. Objects blossomed in various directions and became institutionalized in the widely adopted programming language C++ and later in the infrastructure for the web in the form of Java [22] and its variants. The freedom from straight-line procedural programming that object orientation championed was taken up in AI in two directions: various forms of knowledge representation and of autonomy. Rule-based systems aggregate modular if-then logic elements (the rules) that can be activated in some form of causal sequence (inference chains) by an execution engine [23]. In their passive state, rules represent static, discrete pieces of inferential logic, called declarative knowledge. However, when activated, a rule influences the state of the computation and the activation of subsequent rules, giving the system a dynamic, or procedural, knowledge characteristic as well. Frame-based systems further expanded knowledge representation flexibility and inferencing capability by supporting slots and constraints on their values (the frames), as well as taxonomies based on generalization/specialization relationships [24]. Convergence with object orientation became apparent in that frames could be identified as objects, and their taxonomic organization could be identified with classes within object-style organizations based on sub-class hierarchies.
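A toy forward-chaining engine illustrates how declarative if-then rules acquire a procedural character when activated: each firing changes the fact base, which in turn enables further rules, forming an inference chain. This sketch is generic and not tied to any particular expert-system shell.

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions all hold, adding their
    conclusions to the fact base, until no rule produces a new fact.

    facts: a set of atoms (strings).
    rules: a list of (frozenset_of_condition_atoms, conclusion_atom) pairs.
    Returns the closure of the facts under the rules.
    """
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Declarative rule, procedural effect: firing mutates the state
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

Real rule engines add conflict-resolution strategies and efficient matching (e.g., Rete), but the declarative/procedural duality is already visible here.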
On the other hand, the modular nature of objects, together with their behavior and interaction with other objects, led to the concept of agents, which embodied increased autonomy and self-determination [25]. Agents represent individual threads of computation and are typically deployable in distributed form over computer networks, where they interact with local environments and communicate/coordinate with each other. A wide variety of agent types exists, in large part determined by the variety and sophistication of their processing capacities, ranging from agents that simply gather information on packet traffic in a network to logic-based entities with elaborate reasoning capacity and the authority to make decisions (employing the knowledge representations just mentioned). The step from agents to their aggregates is natural, thus leading to the concept of multi-agent systems, or societies of agents, especially in the realm of modeling and simulation [26]. To explore the role of AI in M&S at present, we project the historical background just given onto the concurrent concepts of object frameworks, ontologies, and endomorphic agents. The Unified Modeling Language (UML) is gaining a strong foothold as the de facto standard for object-based software development. Starting as a diagrammatic means of software system representation, it has evolved into a formally specified language in which the fundamental properties of objects are abstracted and organized [27,28]. Ontologies are models of the world relating to specific aspects or applications; they are typically represented in frame-based languages and form the knowledge components for logical agents on the Semantic Web [29]. A convergence is underway that reinforces the commonality of the object-based origins of AI and software engineering. UML is being extended to incorporate ontology representations so that software systems in general will have more explicit models of their domains of operation. As we shall soon see, endomorphic agents are agents that include abstractions of their own structure and behavior within their ontologies of the world [30].

The M&S Framework Within Unified Modeling Language (UML)

With object orientation as unified by UML, and some background in agent-based systems, we are in a position to discuss the computational realization of the M&S framework discussed earlier. The computational framework is based on the Discrete Event System Specification (DEVS) formalism and implemented in various object-oriented environments. Using UML, we can represent the framework as a set of classes and relations, as illustrated in Figs. 7 and 8.
Various software implementations of DEVS support different subsets of the classes and relations. In particular, we mention a recent implementation of DEVS within a Service-Oriented Architecture (SOA) environment called DEVS/SOA [31,32]. This implementation exploits some of the benefits afforded by the web environment mentioned earlier and provides a context for consideration of the primary target of our discussion, comprehensive automation of the M&S enterprise. We use one of the UML constructs, the use case diagram, to depict the various capabilities that would be involved in automating all or parts of the M&S enterprise. Use cases are represented by ovals that connect to at least


Artificial Intelligence in Modeling and Simulation, Figure 7 M&S framework formulated within UML

Artificial Intelligence in Modeling and Simulation, Figure 8 M&S framework classes and relations in a UML representation

one actor (stick figure) and to other use cases through “includes” relations, shown as dotted arrows. For example, a sensor (actor) collects data (use case), which includes storage of data (use case). A memory actor stores and retrieves models, which include storage and retrieval (respectively) of data. Constructing models includes retrieving stored data within an experimental frame. Validating models includes retrieving models from memory as components and simulating the composite model to generate data within an experimental frame. The emulator-simulator actor does the simulating to execute the model so that its generated behavior can be matched against the stored data in the experimental frame. The objectives of the human modeler drive the model evaluator, and hence the choice of experimental frames to consider as well as models to validate. Models can be used in (at least) two time frames [33]. In the long term, they support planning of actions to be taken in the future. In the short term, models support real-time execution and control of previously planned actions.

AI Methods in Fundamental Problems of M&S

The enterprise of modeling and simulation is characterized by activities such as model, simulator, and experimental frame creation, construction, reuse, composition, verification, and validation. We have seen that valid model construction requires significant expertise in all the components of the M&S enterprise, e.g., modeling formalisms, simulation methods, and domain understanding and knowledge. Needless to say, few people can bring all such elements to the table, and this situation creates a significant bottleneck to progress in such projects. Among the contributing factors are the lack of trained personnel, the expense of such high-capability experts, and the time needed to construct models at the resolution required for most objectives. This section introduces some AI-related technologies that can ameliorate this situation: Service-Oriented Architecture and Semantic Web, ontologies, constrained natural language capabilities, and genetic algorithms. Subsequently we will consider these as components in unified, comprehensive, and autonomous automation of M&S.

Artificial Intelligence in Modeling and Simulation, Figure 9 UML use case formulation of the overall M&S enterprise

Service-Oriented Architecture and Semantic Web

On the World Wide Web, a Service-Oriented Architecture (SOA) is a marketplace of open and discoverable web services incorporating, as they mature, Semantic Web technologies [34]. The eXtensible Markup Language (XML) is the standard format for encoding data sets, and there are standards for sending and receiving XML [35]. Unfortunately, the problem just starts at this level. There are myriad ways, or schemata, to encode data into XML, and a good number of such schemata have already been developed. More often than not, they differ in detail when applied to the same domains. What explains this incompatibility? In a Service-Oriented Architecture, the producer sends messages containing XML documents generated in accordance with a schema. The consumer receives and interprets these messages using the same schema in which they were sent. Such a message encodes a world state description (or changes in it) that is a member of a set delineated by an ontology. The ontology takes into account the pragmatic frame, i.e., a description of how the information will be used in downstream processing. In a SOA environment, data dissemination may be dominated by “user pull of data,” incremental transmission, discovery using metadata, and automated retrieval of data to meet user pragmatic frame specifications. This is the SOA concept of data-centered, interface-driven, loose coupling between producers and consumers. The SOA concept requires the development of platform-independent, community-accepted standards that allow raw data to be syntactically packaged into XML and accompanied by metadata that describes the semantic and pragmatic information needed to effectively process the data into increasingly higher-value products downstream.

Artificial Intelligence in Modeling and Simulation, Figure 10 Interoperability levels in distributed simulation
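The shared-schema handshake between producer and consumer can be sketched with Python's standard xml.etree module. The VehicleState schema tag and its fields are invented examples; the point is only that the consumer rejects messages that do not follow the agreed schema, which is the syntactic (not semantic or pragmatic) part of the problem.

```python
import xml.etree.ElementTree as ET

def encode_state(schema_tag, fields):
    """Producer side: package a world-state description as XML under an
    agreed root tag, one child element per field."""
    root = ET.Element(schema_tag)
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def decode_state(message, schema_tag):
    """Consumer side: interpret the message only if it was generated
    in accordance with the same schema the consumer expects."""
    root = ET.fromstring(message)
    if root.tag != schema_tag:
        raise ValueError("message does not conform to the agreed schema")
    return {child.tag: child.text for child in root}
```

Note that even a successful decode yields only strings; agreeing on what the fields mean and how they will be used downstream is exactly the semantic and pragmatic residue the text describes.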

Ontologies

Semantic Web researchers typically seek to develop intelligent agents that can draw logical inferences from diverse, possibly contradictory, ontologies such as a web search might discover. Semantic Web research has led to a focus on ontologies [34]. These are logical languages that provide a common vocabulary of terms, and axiomatic relations among them, for a subject area. In contrast, the newly emerging area of ontology integration assumes that human understanding and collaboration will not be replaced by intelligent agents. Therefore, the goal is to create concepts and tools that help people develop practical solutions to the incompatibility problems that impede “effective” exchange of data, along with ways of testing that such solutions have been correctly implemented. As illustrated in Fig. 10, interoperability of systems can be considered at three linguistically inspired levels: syntactic, semantic, and pragmatic. The levels are summarized in Table 2; more detail is provided in [36].

Constrained Natural Language

Model development can be substantially aided by enabling users to specify modeling constructs using some form of constrained natural language [37]. The goal is to overcome modeling complexity by letting users with limited or nonexistent formal modeling or programming background convey essential information using natural language, a form of expression that is natural and intuitive. Practicality demands constraining the actual expressions that can be used, so that the linguistic processing is tractable and the input can be interpreted unambiguously. Some techniques allow the user to narrow down the essential components for model construction; their goal is to reduce ambiguity between the user's requirements and the essential model construction components. A natural language interface allows model specification in terms of a verb phrase consisting of a verb, noun, and modifier, for example “build car quickly.” Conceptual realization of a model from a verb phrase ties in closely with Checkland's [38] insight that an appropriate verb should be used to express the root definition, or core purpose, of a system. The main barrier between many people and existing modeling software is their lack of computer literacy, and this provides an incentive to develop natural language interfaces as a means of bridging this gap. Natural language

Artificial Intelligence in Modeling and Simulation, Table 2 Linguistic levels

Linguistic Level | A collaboration of systems or services interoperates at this level if: | Examples
Pragmatic – how information in messages is used | The receiver reacts to the message in a manner that the sender intends. | An order from a commander is obeyed by the troops in the field as the commander intended. A necessary condition is that the information arrives in a timely manner and that its meaning has been preserved (semantic interoperability).
Semantic – shared understanding of the meaning of messages | The receiver assigns the same meaning as the sender did to the message. | An order from a commander to multi-national participants in a coalition operation is understood in a common manner despite translation into different languages. A necessary condition is that the information can be unequivocally extracted from the data (syntactic interoperability).
Syntactic – common rules governing composition and transmission of messages | The consumer is able to receive and parse the sender's message. | A common network protocol (e.g., IPv4) is employed, ensuring that all nodes on the network can send and receive data bit arrays adhering to a prescribed format.


expression could create modelers out of people who think semantically but do not have the requisite computer skills to express these ideas. A semantic representation frees the user to explore the system on the familiar grounds of natural language and opens the way for brainstorming, innovation, and testing of models before they leave the drawing board.

Genetic Algorithms

The genetic algorithm belongs to the class of evolutionary algorithms, which model biological processes to search in highly complex spaces. A genetic algorithm (GA) allows a population composed of many individuals to evolve under specified selection rules to a state that maximizes the “fitness.” The theory was developed by John Holland [39] and popularized by Goldberg, who was able to solve a difficult problem involving the control of gas pipeline transmission for his dissertation [40]. Numerous applications of GAs have since been chronicled [41,42]. Recently, GAs have been applied to cutting-edge problems in the automated construction of simulation models, as discussed below [43].

Automation of M&S

We are now ready to suggest a unifying theme for the problems in M&S and possible AI-based solutions, by raising the question of whether all of M&S can be automated into an integrated, autonomous, artificial modeler/simulationist. First, we provide some background needed to explore an approach to developing such an intelligent agent, based on the System Entity Structure/Model Base framework, a hybrid methodology that combines elements of AI and M&S.

System Entity Structure

The System Entity Structure (SES) concepts were first presented in [44]. They were subsequently extended and implemented in a knowledge-based design environment [45]. Application to model base management originated with [46]. Subsequent formalizations and implementations were developed in [47,48,49,50,51]. Applications to various domains are given in [52].
A System Entity Structure is a knowledge representation formalism that focuses on certain elements and relationships relevant to M&S. Entities represent things that exist in the real world, or sometimes in an imagined world. Aspects represent ways of decomposing things into more fine-grained ones. Multi-aspects are aspects for which the components are all of one kind. Specializations represent categories or families of specific forms that a thing can assume. The SES thus provides the means to represent a family of models as a labeled tree. Two of its key features are support for decomposition and specialization. The former allows decomposing a large system into smaller systems. The latter supports the representation of alternative choices: specialization enables representing a generic model (e.g., a computer display model) as one of its specialized variations (e.g., a flat panel display or a CRT display). On the basis of SES axiomatic specifications, a family of models (a design space) can be represented and then automatically pruned to generate a simulation model. Such models can be systematically studied and experimented with based on alternative design choices. An important, salient feature of the SES is its ability to represent models not only in terms of their decomposition and specialization, but also in terms of their aspects; the SES represents alternative decompositions via aspects. The SES formalism provides an operational language for specifying such hierarchical structures. An SES is a structural knowledge representation scheme that systematically organizes a family of possible structures of a system. Such a family characterizes the decomposition, coupling, and taxonomic relationships among entities. An entity represents a real-world object. The decomposition of an entity concerns how it may be broken down into sub-entities. In addition, coupling specifications tell how sub-entities may be coupled together to reconstitute the entity, and may be associated with an aspect. The taxonomic relationship concerns admissible variants of an entity. The SES/Model-Base framework [52] is a powerful means to support the plan-generate-evaluate paradigm in systems design. Within the framework, entity structures organize models in a model base.
Thus, modeling activity within the framework consists of three sub-activities: specification of model composition structure, specification of model behavior, and synthesis of a simulation model. The SES is governed by an axiomatic framework in which entities alternate with the other items. For example, a thing is made up of parts; therefore, its entity representation has a corresponding aspect which, in turn, has entities representing the parts. A System Entity Structure specifies a family of hierarchical, modular simulation models, each of which corresponds to a complete pruning of the SES. Thus, the SES formalism can be viewed as an ontology with the set of all simulation models as its domain of discourse. The mapping from SES to the Systems formalism, particularly to the DEVS formalism, is discussed in [36]. We note that simulation models include both static and dynamic elements in any application domain, hence represent an advanced form of ontology framework.
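A highly simplified sketch of an SES as a labeled tree, with pruning that resolves each specialization to a single chosen variant, yielding one hierarchical model structure from the family the SES encodes. The data layout and function names are our own illustration and do not reproduce the formal SES axioms (for instance, multi-aspects and coupling specifications are omitted).

```python
class Entity:
    """A node in a (simplified) System Entity Structure: an entity with
    optional aspects (decompositions) and specializations (variants)."""
    def __init__(self, name, aspects=None, specializations=None):
        self.name = name
        self.aspects = aspects or {}                  # aspect name -> sub-entities
        self.specializations = specializations or {}  # spec name -> variant entities

def prune(entity, choices):
    """Resolve every specialization using `choices` (spec name -> chosen
    variant name), producing one concrete hierarchical model structure."""
    for spec_name, variants in entity.specializations.items():
        chosen = choices[spec_name]
        variant = next(v for v in variants if v.name == chosen)
        return prune(variant, choices)  # entity is replaced by its variant
    return {
        "entity": entity.name,
        "components": {
            aspect: [prune(child, choices) for child in children]
            for aspect, children in entity.aspects.items()
        },
    }
```

Echoing the article's display example, pruning a generic display entity under the choice "FlatPanel" selects that variant wherever the display appears in the composition tree.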


Artificial Intelligence in Modeling and Simulation, Figure 11 SES/Model base architecture for automated M&S

SES/Model Base Architecture for an Automated Modeler/Simulationist

In this section, we raise the challenge of creating a fully automated modeler/simulationist that can autonomously carry out all the separate functions identified in the M&S framework, as well as the high-level management of these functions that is currently under exclusively human control. Recall the use case diagram in Fig. 9, which depicts the various capabilities that would need to be involved in realizing a completely automated modeler/simulationist. To link up with the primary modules of mind, we assign model construction to the belief generator, interpreting beliefs as models [54]. Motivations outside the M&S component drive the belief evaluator, and hence the choice of experimental frames to consider as well as models to validate. External desire generators stimulate the imaginer/envisioner to run models to make predictions within pragmatic frames and to assist in action planning. The use case diagram of Fig. 9 is itself a model of how modeling and simulation activities may be carried out within human minds. We need not be committed to particular details at this early stage, but will assume that such a model can be refined to provide a useful representation of human mental activity from the perspective of M&S. This provides the basis for examining how such an artificially intelligent modeler/simulationist might work, and for considering the requirements for comprehensive automation of M&S. The SES/MB methodology, introduced earlier, provides a basis for formulating a conceptual architecture for automating all of the M&S activities depicted earlier in Fig. 9 into an integrated system. As illustrated in Fig. 11, the dichotomy between real-time use of models for acting in the real world and the longer-term development of models that can be employed in such real-time activities is manifested in the distinction between passive and active models. Passive models are stored in a repository that can be likened to long-term memory. Such models undergo the life cycle mentioned earlier, in which they are validated and employed within experimental frames of interest for long-term forecasting or decision-making. However, in addition to this quite standard concept of operation, there is an input pathway from the short-term, or working, memory in which models are executed in real time. Such execution, in real-world environments, often reveals deficiencies, which provide the impetus and requirements for instigating new model construction. Whereas long-term model development and application objectives are characterized by experimental frames, short-term execution objectives are characterized by pragmatic frames. As discussed in [36], a pragmatic frame provides a means of


characterizing the use for which an executable model is being sought. Such models must be simple enough to execute within usually stringent real-time deadlines. The M&S Framework Within Mind Architecture An influential formulation of recent work relating to mind and brain [53] views mind as the behavior of the brain, characterized by a massively modular architecture. This means that mind is composed of a large number of modules, each responsible for different functions, and each largely independent and only sparsely connected with the others. Evolution is assumed to favor such differentiation and specialization since, in suitably weakly interactive environments, such modules are less redundant and more efficient in consuming space and time resources. Indeed, this formulation is reminiscent of the class of problems characterized by Systems of Systems (SoS), in which the attempt is made to integrate existing systems, originally built to perform specific functions, into a more comprehensive and multifunctional system. As discussed in [15], the components of each system can be viewed as communicating with each other within a common ontology, or model of the world, that is tuned to the smooth functioning of the organization. However, such ontologies may well be mismatched for supporting integration at the SoS level. Although it works on the results of a long history of pre-human evolution, the fact that consciousness seems to provide a unified and undifferentiated picture of mind suggests that human evolution has to a large extent solved the SoS integration problem. The Activity Paradigm for Automated M&S At least for the initial part of its life, a modeling agent needs to work on a "first order" assumption about its environment, namely, that it can exploit only semantics-free properties [39]. Regularities, such as periodic behaviors and stimulus-response associations, are one source of such semantics-free properties.
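One such semantics-free regularity, periodicity, can be detected with no model of what the observed signal means. A minimal, purely illustrative sketch picks the lag with maximal autocorrelation; the function name and test signal are assumptions, not from the source:

```python
# Detect a periodic regularity in an observed sequence without any
# semantics: choose the lag at which autocorrelation is maximal.
# Illustrative sketch only.

def dominant_period(xs, max_lag=None):
    n = len(xs)
    max_lag = max_lag or n // 2
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    def autocorr(lag):
        # unnormalized autocovariance at the given lag
        return sum(dev[i] * dev[i + lag] for i in range(n - lag))
    return max(range(1, max_lag + 1), key=autocorr)

signal = [0, 1, 0, -1] * 8          # a period-4 oscillation
print(dominant_period(signal))      # 4
```

A modeling agent applying such detectors to raw observations obtains candidate structure (here, a cycle length) before any meaning is assigned to the data.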
In this section, we focus on a fundamental property of systems, such as the brain's neural network, that have a large number of components: the distribution of activity over space and time. This distribution provides a rich, semantics-free substrate from which models can be generated. Proposing structures and algorithms to track and replicate this activity should support automated modeling and simulation of patterns of reality. The goal of the activity paradigm is to extract mechanisms from natural phenomena and behaviors to automate and guide the M&S process. The brain offers a quintessential illustration of activity and its potential use in constructing models. Figure 12

Artificial Intelligence in Modeling and Simulation, Figure 12 Brain module activities

describes brain electrical activities [54]. Positron emission tomography (PET) is used to record electrical activity inside the brain. The PET scans show what happens inside the brain at rest and when stimulated by words and music. The red areas indicate high brain activity. Language and music produce responses on opposite sides of the brain, showing the sub-system specializations. There are many levels of activity, ranging from low to high. There is a strong link between modularity and the applicability of activity measurement as a useful concept. Indeed, modules represent loci for activity: a distribution of activity would not be discernible over a network were there no modules that could be observed to be in different states of activity. As just illustrated, neuroscientists are exploiting this activity paradigm to associate brain areas with functions and to gain insight into areas that are active or inactive over different, but related, functions, such as language and music processing. We can generalize this approach as a paradigm for an automated modeling agent. Component, Activity and Discrete-Event Abstractions Figure 13 depicts a brain description through components and activities. First, the modeler considers one brain activity (e.g., listening to music). This first-level activity corresponds to simulation components at a lower level. At this level, the activity of the components can be considered to be only on (grey boxes) or off (white boxes). At lower levels, structure and behaviors can be decomposed further. The structure of the higher component level is detailed at lower levels (e.g., down to the neuronal network). The behavior is also detailed through activity and discrete-event abstraction [55]. At the finest levels, the activity of components can be detailed.


Artificial Intelligence in Modeling and Simulation, Figure 13 Hierarchy of components, activity and discrete-event abstractions
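The discrete-event abstraction of continuously changing activity mentioned above can be sketched as a quantizer that reports an event only when the variable has moved by at least one quantum since the last report. The function name, trace values, and quantum size below are illustrative assumptions:

```python
# Sketch of quantization-based discrete-event abstraction: an event is
# emitted only when the signal moves at least one quantum away from the
# last reported level. Values here are illustrative assumptions.

def quantize(samples, quantum):
    events = []
    last = samples[0]
    for t, x in enumerate(samples):
        if abs(x - last) >= quantum:
            events.append((t, x))   # significant change -> discrete event
            last = x
    return events

trace = [0.0, 0.05, 0.5, 0.55, 1.2, 1.25]
print(quantize(trace, quantum=0.4))   # [(2, 0.5), (4, 1.2)]
```

A larger quantum filters out small peaks as insignificant, while a smaller quantum yields finer resolution, which is the trade-off the text describes tuning to a given modeling objective.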

Using pattern detection and quantization methods, continuously changing variables can also be treated within the activity paradigm. As illustrated in Fig. 14, small slopes and small peaks can signal low activity, whereas steep slopes and large peaks can signal high activity levels. To provide for scale, discrete-event abstraction can be achieved using quantization [56]. To determine the activity level, a quantum, or measure of significant change, has to be chosen. The quantum size acts as a filter on the continuous flow. For example, notice that in the figure, with the displayed quantum, the smallest peaks are not significant. Thus, different levels of resolution can be achieved by employing different quantum sizes. A genetic algorithm can be used to find the optimal level of resolution for a given modeling objective [43].

Artificial Intelligence in Modeling and Simulation, Figure 14 Activity sensitivity and discrete-events

Artificial Intelligence in Modeling and Simulation, Figure 15 Activity paths in neurons

Activity Tracking Within the activity paradigm, M&S consists of capturing activity paths through component processing and transformations. To determine the basic structure of the whole system, an automated modeler has to answer questions of the form: where and how is activity produced, received, and transmitted? Figure 15 represents a component-based view of activity flow in a neuronal network. Activity paths through components are represented by full arrows, activity by full circles, and components by squares. The modeler must generate such a graph based on observed data. But how does it obtain such data? One approach is characteristic of the current use of PET scans by neuroscientists. It exploits a relationship between activity and energy: activity requires consumption of energy; therefore, observing areas of high energy consumption signals areas of high activity. Notice that this correlation requires that energy consumption be


localizable to modules in the same way that activity is localized. For example, current computer architectures that provide a single power source to all components do not lend themselves to such observation. How activity is passed on from component to component can be related to the modularity styles (none, weak, strong) of the components. Concepts relating such modularity styles to activity transfer need to be developed to support an activity-tracking methodology that goes beyond reliance on energy correlation. Activity Model Validation Recall that, having generated a model of an observed system (whether through activity tracking or by other means), the next step is validation. To perform such validation, the modeler needs an approach for generating activity profiles from simulation experiments on the model and comparing these profiles with those observed in the real system. Muzy and Nutaro [57] have developed algorithms that exploit activity tracking to achieve efficient simulation of DEVS models. These algorithms can be

Artificial Intelligence in Modeling and Simulation, Figure 16 Evolution of the use of intelligent agents in simulation

adapted to provide an activity-tracking pattern applicable to a given simulation model, extracting its activity profiles for comparison with those of the modeled system. A forthcoming monograph will develop the activity paradigm for M&S in greater detail [58]. Intelligent Agents in Simulation Recent trends in technology, as well as the use of simulation in exploring complex artificial and natural information processes [62,63], have made it clear that simulation model fidelity and complexity will continue to increase dramatically in the coming decades. The dynamic and distributed nature of simulation applications, the significance of exploratory analysis of complex phenomena [64], and the need to model the micro-level interactions, collaboration, and cooperation among real-world entities are bringing a shift in the way systems are conceptualized. Using intelligent agents in simulation models is based on the idea that it is possible to represent the behavior of active entities in the world in terms of the interactions of an assembly of agents with their own operational autonomy.
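The idea of active entities with operational autonomy can be sketched as agents that carry their own state and decision rule, while the simulation loop only mediates their interactions. Everything here (agent rule, world state, parameters) is an illustrative assumption, not a specific framework's API:

```python
# Minimal agent simulation sketch: each agent has operational autonomy
# (its own state and decision rule); the loop only steps the agents.
# Entirely illustrative; not modeled on any particular toolkit.

import random

class Agent:
    def __init__(self, name):
        self.name = name
        self.energy = 10

    def act(self, world):
        # Autonomous decision: rest when low on energy, otherwise work.
        if self.energy < 5:
            self.energy += 2
        else:
            self.energy -= 1
            world["work_done"] += 1

def simulate(steps=20, n_agents=3, seed=0):
    random.seed(seed)
    world = {"work_done": 0}
    agents = [Agent(f"a{i}") for i in range(n_agents)]
    for _ in range(steps):
        # Random activation order, a common agent-based modeling idiom.
        for a in random.sample(agents, len(agents)):
            a.act(world)
    return world["work_done"]

print(simulate())   # 45 with these parameters
```

Because each agent decides for itself, global behavior (total work done) emerges from the interleaving of individual rules rather than from a centrally scripted plan.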


Artificial Intelligence in Modeling and Simulation, Figure 17 Agent-directed simulation framework

The early pervading view on the use of agents in simulation stems from developments in Distributed Artificial Intelligence (DAI), as well as from advances in agent architectures and agent-oriented programming. The DAI perspective of modeling systems in terms of entities capable of solving problems by reasoning through symbol manipulation resulted in various technologies that constitute the basic elements of agent systems. Early work on the design of agent simulators within the DAI community focused on answering the question of how the goals and intentions of agents emerge and how they lead to the execution of actions that change the state of their environment. The agent-directed approach to simulating agent systems lies at the intersection of several disciplines: DAI, control theory, Complex Adaptive Systems (CAS), and discrete-event systems/simulation. As shown in Fig. 16, these core disciplines gave direction to technology, languages, and possible applications, which in turn influenced the evolution of the synergy between simulation and agent systems. Distributed Artificial Intelligence and Simulation While progress in agent simulators and interpreters resulted in various agent architectures and their computational engines, the ability to coordinate agent ensembles was recognized early as a key challenge [65]. The MACE system [66] is considered one of the major milestones in DAI. Specifically, this DAI system integrated concepts from concurrent programming (e.g., the actor formalism [67]) and knowledge representation to reason symbolically about skills and beliefs pertaining to modeling the environment. Task allocation and coordination were considered fundamental challenges in early DAI systems. The contract net protocol developed in [68] provided the basis for modeling collaboration in the simulation of distributed problem solving. Agent Simulation Architectures One of the first agent-oriented simulation languages, AGENT-0 [69], provided a framework that enabled the representation of the beliefs and intentions of agents. Unlike object-oriented simulation languages such as SIMULA 67 [70], the first object-oriented language for specifying discrete-event systems, AGENT-0 and McCarthy's Elephant 2000 language incorporated speech act theory to provide flexible communication mechanisms for agents. DAI and cognitive psychology influenced the development of cognitive agents such as those found in AGENT-0, e.g., the Belief-Desire-Intention (BDI) framework [71]. Procedural reasoning and control theory, in turn, provided a basis for the design and implementation of reactive agents. Classical control theory enables the specification of a mathematical model that describes the interaction of a control system and its environment. The analogy between an agent and a control system facilitated the formalization of agent interactions in terms of a formal specification of dynamic systems. The shortcomings of reactive agents (i.e., lack of mechanisms for goal-directed behavior) and of cognitive agents (i.e., issues of computational tractability in deliberative reasoning) led to the development of hybrid architectures such as the RAP system [72]. Agents are often viewed as design metaphors in the development of models for simulation and gaming. Yet this narrow view limits the potential of agents in improving various other dimensions of simulation. To this end, Fig. 17 presents a unified paradigm of Agent-Directed Simulation that consists of two categories, as follows:


(1) Simulation for Agents (agent simulation), i.e., the simulation of systems that can be modeled by agents (in engineering, human and social dynamics, military applications, etc.), and (2) Agents for Simulation, which can be divided into two groups: agent-supported simulation and agent-based simulation.

Agent Simulation Agent simulation involves the use of agents as design metaphors in developing simulation models, employing simulation conceptual frameworks (e.g., discrete-event, activity scanning) to simulate the behavioral dynamics of agent systems; it incorporates autonomous agents that function in parallel to achieve their goals and objectives. Agents possess high-level interaction mechanisms independent of the problem being solved. Communication protocols and mechanisms for interaction via task allocation, coordination of actions, and conflict resolution at varying levels of sophistication are the primary elements of agent simulations. Simulating agent systems requires understanding the basic principles, organizational mechanisms, and technologies underlying such systems. Agent-Based Simulation Agent-based simulation is the use of agent technology to monitor and generate model behavior. This is similar to the use of AI techniques for the generation of model behavior (e.g., qualitative simulation and knowledge-based simulation). The development of novel and advanced simulation methodologies such as multisimulation suggests the use of intelligent agents as simulator coordinators, where run-time decisions for model staging and updating take place to facilitate dynamic composability. The perception feature of agents makes them pertinent for monitoring tasks. Agent-based simulation is also useful for conducting complex experiments and for deliberative knowledge processing such as planning, deciding, and reasoning. Agents are also critical enablers for improving the composability and interoperability of simulation models [73]. Agent-Supported Simulation

Artificial Intelligence in Modeling and Simulation, Figure 18 M&S within mind

Agent-supported simulation deals with the use of agents as a support facility to enable computer assistance by enhancing cognitive capabilities in problem specification and solving. Hence, agent-supported simulation involves the use of intelligent agents to improve simulation and gaming infrastructures or environments. Agent-supported simulation is used for the following purposes:
 to provide computer assistance for front-end and/or back-end interface functions;
 to process elements of a simulation study symbolically (for example, for consistency checks and built-in reliability); and
 to provide cognitive abilities to the elements of a simulation study, such as learning or understanding abilities.
For instance, in simulations with defense applications, agents are often used as support facilities to
 see the battlefield;
 fuse, integrate, and de-conflict the information presented to the decision-maker;
 generate alarms based on the recognition of specific patterns;
 filter, sort, track, and prioritize the disseminated information; and
 generate contingency plans and courses of action.
A significant requirement for the design and simulation of agent systems is the distributed knowledge that represents


Artificial Intelligence in Modeling and Simulation, Figure 19 Emergence of endomorphic agents

the mental model that characterizes each agent's beliefs about the environment, itself, and other agents. Endomorphic agent concepts provide a framework for addressing the difficult conceptual issues that arise in this domain. Endomorphic Agents We now consider an advanced feature that an autonomous, integrated, and comprehensive modeler/simulationist agent must have if it is to fully emulate human capability. This is the ability, to a limited but significant extent, to construct and employ models of its own mind as well as of the minds of other agents. We use the term "mind" in the sense just discussed. The concept of an endomorphic agent is illustrated in Figs. 18 and 19 in a sequence of related diagrams. The diagram labeled with an oval with embedded number 1 is that of Fig. 9, with the modifications mentioned earlier to match up with human motivation and desire generation modules. In diagram 2, the label "mind" refers to the set of M&S capabilities depicted in Fig. 9. As in [30], an agent, human or technological, is considered to be composed of a mind and a body. Here, "body" represents the external manifestation of the agent, which is observable by other agents. Mind, in contrast, is hidden from view and must be a construct, or model, of other agents. In other words, to use the language of evolutionary psychology, agents must develop a "theory of mind" about other

agents from observation of their external behavior. An endomorphic agent is represented in diagram 3 with a mental model of the body and mind of the agent in diagram 2. This second agent is shown more explicitly in diagram 4, with a mental representation of the first agent's body and mind. Diagram 5 depicts the recursive aspect of endomorphism, where the (original) agent of diagram 2 has developed a model of the second agent's body and mind. But the latter model contains the just-mentioned model of the first agent's body and mind. This leads to a potentially infinite regress in which, apparently, each agent can have a representation of the other agent, and by reflection of himself, that increases in depth of nesting without end. Hofstadter [59] represents a similar concept in the diagram on page 144 of his book, in which the comic character Sluggo is "dreaming of himself dreaming of himself dreaming of himself, without end." He then uses the label on the Morton Salt box on page 145 to show that not all self-reference involves infinite recursion. On the label, the girl carrying the salt box obscures its label with her arm, thereby shutting down the regress. Thus, the salt box has a representation of itself on its label, but this representation is only partial. Reference [30] related the termination of self-reference to the agent's objectives and requirements in constructing models of himself, other agents, and the environment. Briefly, the agent need only go as deep as needed to get a reliable model of the other agents. The agent can


Artificial Intelligence in Modeling and Simulation, Figure 20 Interacting models of others in baseball

stop at level 1 with a representation of the others' bodies. However, this might not allow predicting another's movements, particularly if the latter has a mind in control of those movements. This would force the first agent to include at least a crude model of the other agent's mind. In a competitive situation, having such a model might give the first agent an advantage, and this might lead the second agent to likewise develop a predictive model of the first agent. With the second agent now seeming to become less predictable, the first agent might develop a model of the second agent's mind that restores the lost predictive power. This would likely have to include a reflected representation of himself, although the impending regress could be halted if this representation did not, itself, contain a model of the other agent. Thus, the depth to which competitive endomorphic agents have models of themselves and others might be the product of a co-evolutionary "mental arms race," in which an improvement on one side triggers a contingent improvement on the other, the improvement being an incremental refinement of the internal models by successively adding more levels of nesting. Minsky [60] conjectured that termination of the potentially infinite regress in agents' models of each other within a society of mind might be constrained by sheer limitations on the ability to marshal the resources required to support the necessary computation. We can go further by assuming that agents have differing mental capacities to support such computational nesting. Therefore, an agent

with greater capacity might be able to "out-think" one of lesser capability. This is illustrated by the following real-life story drawn from a newspaper account of a critical play in a baseball game. Interacting Models of Others in Competitive Sport The following account is illustrated in Fig. 20.

A Ninth Inning to Forget: Cordero Can't Close, Then Base-Running Gaffe Ends Nats' Rally
Steve Yanda, Washington Post Staff Writer, Jun 24, 2007. Copyright The Washington Post Company.
Indians 4, Nationals 3. Nook Logan played out the ending of last night's game in his head as he stood on second base in the bottom of the ninth inning. The bases were loaded with one out and the Washington Nationals trailed by one run. Even if Felipe Lopez, the batter at the plate, grounded the ball, say, right back to Cleveland Indians closer Joe Borowski, the pitcher merely would throw home. Awaiting the toss would be catcher Kelly Shoppach, who would tag the plate and attempt to nail Lopez at first. By the time Shoppach's throw reached first baseman Victor Martinez, Logan figured he would be gliding across the plate with the tying run. Lopez did ground to Borowski, and the closer did fire the ball home. However, Shoppach elected to throw to third instead of first, catching Logan drifting too far off the bag for the final out in the Nationals' 4–3 loss at RFK Stadium. "I thought [Shoppach] was going to throw to first," Logan said. And if the catcher had, would Logan have scored all the way from second? "Easy."

We will analyze this account to show how it throws light on the advantage conferred by an endomorphic capability to process to a nesting depth exceeding that of an opponent. The situation starts in the bottom of the ninth inning with the Washington Nationals at bat, the bases loaded with one out, and the team trailing by one run. The runner on second base, Nook Logan, plays out the ending of the game in his head. This can be interpreted in terms of endomorphic models as follows. Logan makes a prediction using his models of the opposing pitcher and catcher, namely, that the pitcher would throw home and the catcher would tag the plate and attempt to nail the batter at first. Logan then makes a prediction using a model of himself, namely, that he would be able to reach home plate while the pitcher's thrown ball was traveling to first base. In actual play, the catcher threw the ball to third and caught Logan out. This is evidence that the catcher was able to play out the simulation to a greater depth than Logan. The catcher's model of the situation agreed with Logan's as it related to the other actors. The difference was that the catcher used a model of Logan which predicted that Logan would predict that he (the catcher) would throw to first. Having this prediction, the catcher decided instead to throw the ball to the third baseman, which resulted in putting Logan out. We note that the catcher's model was based on his model of Logan's model, so it was one level greater in depth of nesting than the latter.
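The depth advantage in this episode can be caricatured as nested best-response reasoning: each additional level of "my model of your model of me" flips the predicted choice. The sketch below is purely illustrative of the story, not a general theory of mind:

```python
# Caricature of endomorphic nesting depth from the baseball account.
# depth 0: the default play (throw to first). Each extra level models the
# opponent anticipating the current plan, flipping the best response.
# Purely illustrative; not a general model of strategic reasoning.

def best_throw(depth):
    """Catcher's choice given how many levels of 'model of the other's
    model' he can support."""
    throw = "first"
    for _ in range(depth):
        # Anticipate that the runner anticipates the current plan, so switch.
        throw = "third" if throw == "first" else "first"
    return throw

print(best_throw(0))  # 'first'  (Logan's prediction of the catcher)
print(best_throw(1))  # 'third'  (the catcher, reasoning one level deeper)
```

The agent whose supported depth is one greater than its opponent's wins the exchange, which is exactly the asymmetry between Shoppach and Logan in the account above.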
To have succeeded, Logan would have had to support one more level, namely, a model of the catcher predicting that the catcher would use his model of him (Logan) to out-think him, and then make the countermove of not starting toward third base. The enigma of such endomorphic agents provides extreme challenges for further research in AI and M&S. The formal and computational framework provided by the M&S framework discussed here may be of particular advantage to cognitive psychologists and philosophers interested in an active area of investigation in which the terms "theory of mind," "simulation," and "mind reading" are employed without much in the way of definition [61].

Future Directions M&S presents some fundamental and very difficult problems whose solution may benefit from the concepts and techniques of AI. We have discussed some key problem areas, including verification and validation, reuse and composability, and distributed simulation and systems-of-systems interoperability. We have also considered some areas of AI that have direct applicability to problems in M&S, such as Service-Oriented Architecture and the Semantic Web, ontologies, constrained natural language, and genetic algorithms. To provide a unifying theme for the problems and solutions, we raised the question of whether all of M&S can be automated into an integrated autonomous artificial modeler/simulationist. We explored an approach to developing such an intelligent agent based on the System Entity Structure/Model Base framework, a hybrid methodology that combines elements of AI and M&S. We proposed a concrete methodology by which such an agent could engage in M&S based on activity tracking. Implementing such a methodology in automated form presents numerous challenges to AI. We closed with consideration of endomorphic modeling capability, an advanced feature that such an agent must have if it is to fully emulate human M&S capability. Since this capacity implies an infinite regress in which models contain models of themselves without end, it can only be had to a limited degree. However, it may offer critical insights into competitive co-evolutionary human or higher-order primate behavior to launch more intensive research into model nesting depth: the degree to which an endomorphic agent can marshal the mental resources needed to construct and employ models of its own "mind" as well as of the "minds" of other agents. The enigma of such endomorphic agents provides extreme challenges for further research in AI and M&S, as well as related disciplines such as cognitive science and philosophy.

Bibliography Primary Literature 1. Wymore AW (1993) Model-based systems engineering: An introduction to the mathematical theory of discrete systems and to the tricotyledon theory of system design. CRC, Boca Raton 2. Zeigler BP, Kim TG, Praehofer H (2000) Theory of modeling and simulation. Academic Press, New York 3. Ören TI, Zeigler BP (1979) Concepts for advanced simulation methodologies. Simulation 32(3):69–82 4. http://en.wikipedia.org/wiki/DEVS Accessed Aug 2008 5. Knepell PL, Aragno DC (1993) Simulation validation: a confidence assessment methodology. IEEE Computer Society Press, Los Alamitos


6. Law AM, Kelton WD (1999) Simulation modeling and analysis, 3rd edn. McGraw-Hill, Columbus 7. Sargent RG (1994) Verification and validation of simulation models. In: Winter simulation conference, pp 77–84 8. Balci O (1998) Verification, validation, and testing. In: Winter simulation conference 9. Davis KP, Anderson AR (2003) Improving the composability of department of defense models and simulations, RAND technical report. http://www.rand.org/pubs/monographs/MG101/. Accessed Nov 2007; J Def Model Simul Appl Methodol Technol 1(1):5–17 10. Yilmaz L, Ören TI (2004) A conceptual model for reusable simulations within a model-simulator-context framework. In: Conference on conceptual modeling and simulation, Italy, 28–31 October 11. Traoré M, Muzy A (2004) Capturing the dual relationship between simulation models and their context. Simulation practice and theory. Elsevier 12. Page E, Opper J (1999) Observations on the complexity of composable simulation. In: Proceedings of winter simulation conference, Orlando, pp 553–560 13. Kasputis S, Ng H (2000) Composable simulations. In: Proceedings of winter simulation conference, Orlando, pp 1577–1584 14. Sarjoughian HS (2006) Model composability. In: Perrone LF, Wieland FP, Liu J, Lawson BG, Nicol DM, Fujimoto RM (eds) Proceedings of the winter simulation conference, pp 104–158 15. DiMario MJ (2006) System of systems interoperability types and characteristics in joint command and control. In: Proceedings of the 2006 IEEE/SMC international conference on system of systems engineering, Los Angeles, April 2006 16. Sage AP, Cuppan CD (2001) On the systems engineering and management of systems of systems and federation of systems. Information knowledge systems management, vol 2, pp 325–345 17. Dahmann JS, Kuhl F, Weatherly R (1998) Standards for simulation: as simple as possible but not simpler. The high level architecture for simulation. Simulation 71(6):378 18.
Sarjoughian HS, Zeigler BP (2000) DEVS and HLA: Complementary paradigms for M&S? Trans SCS 4(17):187–197 19. Yilmaz L (2004) On the need for contextualized introspective simulation models to improve reuse and composability of defense simulations. J Def Model Simul 1(3):135–145 20. Tolk A, Muguira JA (2003) The levels of conceptual interoperability model (LCIM). In: Proceedings fall simulation interoperability workshop. http://www.sisostds.org Accessed Aug 2008 21. http://en.wikipedia.org/wiki/Simula Accessed Aug 2008 22. http://en.wikipedia.org/wiki/Java Accessed Aug 2008 23. http://en.wikipedia.org/wiki/Expert_system Accessed Aug 2008 24. http://en.wikipedia.org/wiki/Frame_language Accessed Aug 2008 25. http://en.wikipedia.org/wiki/Agent_based_model Accessed Aug 2008 26. http://www.swarm.org/wiki/Main_Page Accessed Aug 2008 27. Unified Modeling Language (UML) http://www.omg.org/technology/documents/formal/uml.htm 28. Object Modeling Group (OMG) http://www.omg.org 29. http://en.wikipedia.org/wiki/Semantic_web Accessed Aug 2008

30. Zeigler BP (1990) Object oriented simulation with hierarchical, modular models: intelligent agents and endomorphic systems. Academic Press, Orlando 31. http://en.wikipedia.org/wiki/Service_oriented_architecture Accessed Aug 2008 32. Mittal S, Mak E, Nutaro JJ (2006) DEVS-based dynamic modeling & simulation reconfiguration using enhanced DoDAF design process. Special issue on DoDAF. J Def Model Simul 3(4):239–267 33. Zeigler BP (1988) Simulation methodology/model manipulation. In: Encyclopedia of systems and controls. Pergamon Press, England 34. Alexiev V, Breu M, de Bruijn J, Fensel D, Lara R, Lausen H (2005) Information integration with ontologies. Wiley, New York 35. Kim L (2003) Official XMLSPY handbook. Wiley, Indianapolis 36. Zeigler BP, Hammonds P (2007) Modeling & simulation-based data engineering: introducing pragmatics into ontologies for net-centric information exchange. Academic Press, New York 37. Simard RJ, Zeigler BP, Couretas JN (1994) Verb phrase model specification via system entity structures. In: AI and planning in high autonomy systems: distributed interactive simulation environments. Proceedings of the fifth annual conference, 7–9 Dec 1994, pp 192–198 38. Checkland P (1999) Soft systems methodology in action. Wiley, London 39. Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, Cambridge 40. Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley Professional, Princeton 41. Davis L (1987) Genetic algorithms and simulated annealing. Morgan Kaufmann, San Francisco 42. Zbigniew M (1996) Genetic algorithms + data structures = evolution programs. Springer, Heidelberg 43. Cheon S (2007) Experimental frame structuring for automated model construction: application to simulated weather generation. Doctoral dissertation, Dept of ECE, University of Arizona, Tucson 44.
Zeigler BP (1984) Multifaceted modelling and discrete event simulation. Academic Press, London 45. Rozenblit JW, Hu J, Zeigler BP, Kim TG (1990) Knowledge-based design and simulation environment (KBDSE): foundational concepts and implementation. J Oper Res Soc 41(6):475–489 46. Kim TG, Lee C, Zeigler BP, Christensen ER (1990) System entity structuring and model base management. IEEE Trans Syst Man Cyber 20(5):1013–1024 47. Zeigler BP, Zhang G (1989) The system entity structure: knowledge representation for simulation modeling and design. In: Widman LA, Loparo KA, Nielsen N (eds) Artificial intelligence, simulation and modeling. Wiley, New York, pp 47–73 48. Luh C, Zeigler BP (1991) Model base management for multifaceted systems. ACM Trans Model Comp Sim 1(3):195–218 49. Couretas J (1998) System entity structure alternatives enumeration environment (SEAS). Doctoral Dissertation Dept of ECE, University of Arizona 50. Hyu C Park, Tag G Kim (1998) A relational algebraic framework for VHDL models management. Trans SCS 15(2):43–55 51. Chi SD, Lee J, Kim Y (1997) Using the SES/MB framework to analyze traffic flow. Trans SCS 14(4):211–221 52. Cho TH, Zeigler BP, Rozenblit JW (1996) A knowledge based simulation environment for hierarchical flexible manufacturing. IEEE Trans Syst Man Cyber- Part A: Syst Hum 26(1):81–91

Artificial Intelligence in Modeling and Simulation

53. Carruthers P (2006) The architecture of the mind: massive modularity and the flexibility of thought. Oxford University Press, Oxford
54. Wolpert L (2004) Six impossible things before breakfast: the evolutionary origin of belief. WW Norton, London
55. Zeigler BP (2005) Discrete event abstraction: an emerging paradigm for modeling complex adaptive systems. In: Booker L (ed) Perspectives on adaptation in natural and artificial systems: essays in honor of John Holland. Oxford University Press, Oxford
56. Nutaro J, Zeigler BP (2007) On the stability and performance of discrete event methods for simulating continuous systems. J Comput Phys 227(1):797–819
57. Muzy A, Nutaro JJ (2005) Algorithms for efficient implementation of the DEVS & DSDEVS abstract simulators. In: 1st open international conference on modeling and simulation (OICMS), Clermont-Ferrand, France, pp 273–279
58. Muzy A (in process) The activity paradigm for modeling and simulation of complex systems
59. Hofstadter D (2007) I am a strange loop. Basic Books, New York
60. Minsky M (1988) Society of mind. Simon & Schuster, New York
61. Goldman AI (2006) Simulating minds: the philosophy, psychology, and neuroscience of mindreading. Oxford University Press, Oxford
62. Denning PJ (2007) Computing is a natural science. Commun ACM 50(7):13–18
63. Luck M, McBurney P, Preist C (2003) Agent technology: enabling next generation computing. A roadmap for agent based computing. AgentLink, Liverpool
64. Miller JH, Page SE (2007) Complex adaptive systems: an introduction to computational models of social life. Princeton University Press, Princeton
65. Ferber J (1999) Multi-agent systems: an introduction to distributed artificial intelligence. Addison-Wesley, Princeton
66. Gasser L, Braganza C, Herman N (1987) MACE: an extensible testbed for distributed AI research. In: Distributed artificial intelligence (Research notes in artificial intelligence), pp 119–152
67. Agha G, Hewitt C (1985) Concurrent programming using actors: exploiting large-scale parallelism. In: Proceedings of the fifth conference on foundations of software technology and theoretical computer science, pp 19–41

68. Smith RG (1980) The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Trans Comput 29(12):1104–1113
69. Shoham Y (1993) Agent-oriented programming. Artif Intell 60(1):51–92
70. Dahl OJ, Nygaard K (1967) SIMULA67 common base definition. Norwegian Computing Center, Norway
71. Rao AS, Georgeff MP (1995) BDI agents: from theory to practice. In: Proceedings of the first international conference on multi-agent systems, San Francisco
72. Firby RJ (1992) Building symbolic primitives with continuous control routines. In: Proceedings of the first international conference on AI planning systems, College Park, MD, pp 62–69
73. Yilmaz L, Paspuletti S (2005) Toward a meta-level framework for agent-supported interoperation of defense simulations. J Def Model Simul 2(3):161–175

Books and Reviews
Alexiev V, Breu M, de Bruijn J, Fensel D, Lara R, Lausen H (2005) Information integration with ontologies. Wiley, New York
Carruthers P (2006) The architecture of the mind: massive modularity and the flexibility of thought. Oxford University Press, Oxford
Goldman AI (2006) Simulating minds: the philosophy, psychology, and neuroscience of mindreading. Oxford University Press, Oxford
Holland JH (1992) Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, Cambridge
Zeigler BP (1990) Object oriented simulation with hierarchical, modular models: intelligent agents and endomorphic systems. Academic Press, Orlando
Zeigler BP, Hammonds P (2007) Modeling & simulation-based data engineering: introducing pragmatics into ontologies for net-centric information exchange. Academic Press, New York
Zeigler BP, Kim TG, Praehofer H (2000) Theory of modeling and simulation. Academic Press, New York


Bacterial Computing

MARTYN AMOS
Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, UK

Article Outline
Glossary
Definition of the Subject
Introduction
Motivation for Bacterial Computing
The Logic of Life
Rewiring Genetic Circuitry
Successful Implementations
Future Directions
Bibliography

Glossary
DNA Deoxyribonucleic acid. Molecule that encodes the genetic information of cellular organisms.
Operon Set of functionally related genes with a common promoter (“on switch”).
Plasmid Small circular DNA molecule used to transfer genes from one organism to another.
RNA Ribonucleic acid. Molecule similar to DNA, which helps in the conversion of genetic information to proteins.
Transcription Conversion of a genetic sequence into RNA.
Translation Conversion of an RNA sequence into an amino acid sequence (and, ultimately, a protein).

Definition of the Subject
Bacterial computing is a conceptual subset of synthetic biology, which is itself an emerging scientific discipline largely concerned with the engineering of biological systems. The goals of synthetic biology may be loosely partitioned into four sets: (1) to better understand the fundamental operation of the biological system being engineered, (2) to extend synthetic chemistry, and create improved systems for the synthesis of molecules, (3) to investigate the “optimization” of existing biological systems for human purposes, and (4) to develop and apply rational engineering principles to the design and construction of biological systems. It is on these last two goals that we focus in the current article. The main benefits that may accrue from these studies are both theoretical and practical; the construction and study of synthetic biosystems could improve our quantitative understanding of the fundamental underlying processes, as well as suggesting plausible applications in fields as diverse as pharmaceutical synthesis and delivery, biosensing, tissue engineering, bionanotechnology, biomaterials, energy production and environmental remediation.

Introduction
Complex natural processes may often be described in terms of networks of computational components, such as Boolean logic gates or artificial neurons. The interaction of biological molecules and the flow of information controlling the development and behavior of organisms is particularly amenable to this approach, and these models are well-established in the biological community. However, only relatively recently have papers appeared proposing the use of such systems to perform useful, human-defined tasks. For example, rather than merely using the network analogy as a convenient technique for clarifying our understanding of complex systems, it may now be possible to harness the power of such systems for the purposes of computation.

Despite the relatively recent emergence of biological computing as a distinct research area, the link between biology and computer science is not a new one. Of course, for years biologists have used computers to store and analyze experimental data. Indeed, it is widely accepted that the huge advances of the Human Genome Project (as well as other genome projects) were only made possible by the powerful computational tools available. Bioinformatics has emerged as “the science of the 21st century”, requiring the contributions of truly interdisciplinary scientists who are equally at home at the lab bench or writing software at the computer. However, the seeds of the relationship between biology and computer science were sown over fifty years ago, when the latter discipline did not even exist.
When, in the 17th century, the French mathematician and philosopher René Descartes declared to Queen Christina of Sweden that animals could be considered a class of machines, she challenged him to demonstrate how a clock could reproduce. Three centuries later, in 1951, with the publication of “The General and Logical Theory of Automata” [38], John von Neumann showed how a machine could indeed construct a copy of itself. Von Neumann believed that the behavior of natural organisms, although orders of magnitude more complex, was similar to that of the most intricate machines of the day. He believed that life was based on logic. We now begin to look at how this view of life may be used, not simply as a useful analogy, but as the practical foundation of a whole new engineering discipline.


Motivation for Bacterial Computing
Here we consider the main motivations behind recent work on bacterial computing (and, more broadly, synthetic biology). Before recombinant DNA technology made it possible to construct new genetic sequences, biologists were restricted to crudely “knocking out” individual genes from an organism’s genome, and then assessing the damage caused (or otherwise). Such knock-outs gradually allowed them to piece together fragments of causality, but the process was very time-consuming and error-prone. Since the dawn of genetic engineering – with the ability to synthesize novel gene segments – biologists have been in a position to make much more finely-tuned modifications to their organism of choice, thus generating much more refined data. Other advances in biochemistry have also contributed, allowing scientists to investigate new types of genetic systems with, for example, twelve bases, rather than the traditional four [14]. Such creations have yielded valuable insights into the mechanics of mutation, adaptation and evolution. Researchers in synthetic biology are now extending their work beyond the synthesis of single genes, and are introducing whole new gene complexes into organisms. The objectives behind this work are both theoretical and practical. As Benner and Sismour argue [5], “ . . . a synthetic goal forces scientists to cross uncharted ground to encounter and solve problems that are not easily encountered through [top-down] analysis. This drives the emergence of new paradigms [“world views”] in ways that analysis cannot easily do.” Drew Endy agrees. “Worst-case scenario, it’s a complementary approach to traditional discovery science” [18]. “Best-case scenario, we get systems that are simpler and easier to understand . . . ” Or, to put it bluntly, “Let’s build new biological systems – systems that are easier to understand because we made them that way” [31].
As well as shedding new light on the underlying biology, these novel systems may well have significant practical utility. Such new creations, according to Endy’s “personal wish list”, might include “generating biological machines that could clean up toxic waste, detect chemical weapons, perform simple computations, stalk cancer cells, lay down electronic circuits, synthesize complex compounds and even produce hydrogen from sunlight” [18]. In the next Section we begin to consider how this might be achieved, by first describing the underlying logic of genetic circuitry.

The Logic of Life
Twenty years after von Neumann’s seminal paper, François Jacob and Jacques Monod identified specific natural processes that could be viewed as behaving according to logical principles: “The logic of biological regulatory systems abides not by Hegelian laws but, like the workings of computers, by the propositional algebra of George Boole.” [29] This conclusion was drawn from earlier work of Jacob and Monod [30]. In addition, Jacob and Monod described the “lactose system” [20], which is one of the archetypal examples of a Boolean biosystem. We describe this system shortly, but first give a brief introduction to the operation of genes in general terms.

DNA as the Carrier of Genetic Information
The central dogma of molecular biology [9] is that DNA produces RNA, which in turn produces proteins. The basic “building blocks” of genetic information are known as genes. Each gene codes for one specific protein and may be turned on (expressed) or off (repressed) when required.

Transcription and Translation
We now describe the processes that determine the structure of a protein, and hence its function. Note that in what follows we assume the processes described occur in bacteria, rather than in higher organisms such as humans. For a full description of the structure of the DNA molecule, see the chapter on DNA computing. In order for a DNA sequence to be converted into a protein molecule, it must be read (transcribed) and the transcript converted (translated) into a protein. Transcription of a gene produces a messenger RNA (mRNA) copy, which can then be translated into a protein. Transcription proceeds as follows. The mRNA copy is synthesized by an enzyme known as RNA polymerase. In order to do this, the RNA polymerase must be able to recognize the specific region to be transcribed. This specificity requirement facilitates the regulation of genetic expression, thus preventing the production of unwanted proteins. Transcription begins at specific sites within the DNA sequence, known as promoters.
These promoters may be thought of as “markers”, or “signs”, in that they are not transcribed into RNA. The regions that are transcribed into RNA (and eventually translated into protein) are referred to as structural genes. The RNA polymerase recognizes the promoter, and transcription begins. In order for the RNA polymerase to begin transcription, the double helix must be opened so that the sequence of bases may be read. This opening involves the breaking of the hydrogen bonds between bases. The RNA polymerase then


moves along the DNA template strand in the 3′ → 5′ direction. As it does so, the polymerase creates an antiparallel mRNA chain (that is, the mRNA strand is the equivalent of the Watson–Crick complement of the template). However, there is one significant difference, in that RNA contains uracil instead of thymine. Thus, in mRNA terms, “U binds with A.” The RNA polymerase moves along the DNA, the DNA re-coiling into its double-helix structure behind it, until it reaches the end of the region to be transcribed. The end of this region is marked by a terminator which, like the promoter, is not transcribed.

Genetic Regulation
Each step of the conversion, from stored information (DNA), through mRNA (messenger), to protein synthesis (effector), is itself catalyzed by other effector molecules. These may be enzymes or other factors that are required for a process to continue (for example, sugars). Consequently, a loop is formed, where products of one gene are required to produce further gene products, and may even influence that gene’s own expression. This process was first described by Jacob and Monod in 1961 [20], a discovery that earned them a share of the 1965 Nobel Prize in Physiology or Medicine. Genes are composed of a number of distinct regions, which control and encode the desired product. These regions are generally of the form promoter–gene–terminator. Transcription may be regulated by effector molecules known as inducers and repressors, which interact with the promoter and increase or decrease the level of transcription. This allows effective control over the expression of proteins, avoiding the production of unnecessary compounds. It is important to note at this stage that, in reality, genetic regulation does not conform to the digital “on-off” model that is popularly portrayed; rather, it is continuous or analog in nature.

The Lac Operon
One of the most well-studied genetic systems is the lac operon.
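Before turning to the lac operon in detail, the transcription rule just described (template strand read 3′ → 5′, with uracil taking the place of thymine) can be caricatured in code. This is a toy sketch of the base-pairing rule only; the function name is my own, and promoter recognition, helix opening and termination are all omitted:

```python
# Toy transcription: map a DNA template strand (read 3' -> 5')
# to its mRNA transcript. RNA pairing rule: A->U, T->A, C->G, G->C.
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}

def transcribe(template: str) -> str:
    """Return the mRNA complement of a DNA template strand."""
    return "".join(PAIRING[base] for base in template.upper())

print(transcribe("TACGGT"))  # -> AUGCCA
```

Note that, as the text stresses, real genetic regulation is continuous rather than digital; this sketch captures only the letter-for-letter pairing.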
An operon is a set of functionally related genes with a common promoter. An example of this is the lac operon, which contains three structural genes that allow E. coli to utilize the sugar lactose. When E. coli is grown on the sugar glucose, the product of the (separate, and unrelated to the lac operon) lacI gene represses the transcription of the lacZYA operon (i.e., the operon is turned off). However, if lactose is supplied together with glucose, a lactose by-product is produced which interacts with the repressor molecule, preventing

it from repressing the lacZYA operon. This de-repression does not itself initiate transcription, since it would be inefficient to utilize lactose if the more common sugar glucose were still available. The operon is positively regulated (i.e., “encouraged”) by a different molecule, whose level increases as the amount of available glucose decreases. Therefore, if lactose were present as the sole carbon source, the lacI repression would be relaxed and the high “encouraging” levels would activate transcription, leading to the synthesis of the lacZYA gene products. Thus, the promoter is under the control of two sugars, and the lacZYA operon is only transcribed when lactose is present and glucose is absent. In essence, Jacob and Monod showed how a gene may be thought of (in very abstract terms) as a binary switch, and how the state of that switch might be affected by the presence or absence of certain molecules. Monod’s point, made in his classic book Chance and Necessity and quoted above, was that the workings of biological systems operate not by Hegel’s philosophical or metaphysical logic of understanding, but according to the formal, mathematically grounded logical system of George Boole. What Jacob and Monod found was that the transcription of a gene may be regulated by molecules known as inducers and repressors, which either increase or decrease the “volume” of a gene (corresponding to its level of transcription, which isn’t always as clear cut and binary as Monod’s quote might suggest). These molecules interact with the promoter region of a gene, allowing the gene’s level to be finely “tuned”. The lac genes are so-called because, in the E. coli bacterium, they combine to produce a variety of proteins that allow the cell to metabolise the sugar lactose (which is most commonly found in milk, hence the derivation from the Latin, lact, meaning milk). For reasons of efficiency, these proteins should only be produced (i.e., the genes be turned on) when lactose is present in the cell’s environment. Making these proteins when lactose is absent would be a waste of the cell’s resources, after all. However, a different sugar – glucose – will always be preferable to lactose, if the cell can get it, since glucose is an “easier” form of sugar to metabolise. So, the input to and output from the lac operon may be expressed as a truth table, with G and L standing for glucose and lactose (1 if present, 0 if absent), and O standing for the output of the operon (1 if on, 0 if off):

G  L  O
0  0  0
0  1  1
1  0  0
1  1  0
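The truth table above can be checked mechanically; a minimal sketch in code (the function name is my own, with 0/1 standing for absent/present and off/on exactly as in the text):

```python
def lac_output(glucose: int, lactose: int) -> int:
    """Toy model of the lac operon: transcribed (1) only when
    lactose is present and glucose is absent."""
    return int(lactose == 1 and glucose == 0)

# Reproduce the truth table from the text: rows are (G, L, O).
for g, lac, o in [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0)]:
    assert lac_output(g, lac) == o
```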


The Boolean function that the lac operon therefore physically computes is (L AND (NOT G)), since it only outputs 1 if L=1 (lactose present) and G=0 (glucose is absent). By showing how one gene could affect the expression of another – just like a transistor feeds into the input of another and affects its state – Jacob and Monod laid the foundations for a new way of thinking about genes: not simply in terms of protein blueprints, but of circuits of interacting parts, or dynamic networks of interconnected switches and logic gates. This view of the genome is now well-established [22,23,27], but in the next Section we show how it might be used to guide the engineering of biological systems.

Rewiring Genetic Circuitry
A key difference between the wiring of a computer chip and the circuitry of the cell is that “electronics engineers know exactly how resistors and capacitors are wired to each other because they installed the wiring. But biologists often don’t have a complete picture. They may not know which of thousands of genes and proteins are interacting at a given moment, making it hard to predict how circuits will behave inside cells” [12]. This makes the task of reengineering vastly more complex. Rather than trying to assimilate the huge amounts of data currently being generated by the various genome projects, synthetic biologists are taking a novel route – simplify and build. “They create models of genetic circuits, build the circuits, see if they work, and adjust them if they don’t – learning about biology in the process. ‘I view it as a reductionist approach to systems biology,’ says biomedical engineer Jim Collins of Boston University” [12]. The field of systems biology has emerged in recent years as an alternative to the traditional reductionist way of doing science. Rather than simply focussing on a single level of description (such as individual proteins), researchers are now seeking to integrate information from many different layers of complexity.
By studying how different biological components interact, rather than simply looking at their structure, systems biologists are attempting to build models of systems from the bottom up. A model is simply an abstract description of how a system operates – for example, a set of equations that describe how a disease spreads throughout a population. The point of a model is to capture the essence of a system’s operation, and it should therefore be as simple as possible. Crucially, a model should also be capable of making predictions, which can then be tested against reality using real data (for example, if infected people are placed in quarantine for a week, how does this affect the progress of the

disease in question, or what happens if I feed this signal into the chip?). The results obtained from these tests may then feed back into further refinements of the model, in a continuous cycle of improvement. When a model suggests a plausible structure for a synthetic genetic circuit, the next stage is to engineer it into the chosen organism, such as a bacterium. Once the structure of the DNA molecule was elucidated and the processes of transcription and translation were understood, molecular biologists were frustrated by the lack of suitable experimental techniques that would facilitate more detailed examination of the genetic material. However, in the early 1970s, several techniques were developed that allowed previously impossible experiments to be carried out (see [8,33]). These techniques quickly led to the first ever successful cloning experiments [19,25]. Cloning is generally defined as “ . . . the production of multiple identical copies of a single gene, cell, virus, or organism.” [35]. This is achieved as follows: a specific sequence (corresponding, perhaps, to a novel gene) is inserted into a circular DNA molecule, known as a plasmid, or vector, producing a recombinant DNA molecule. The vector acts as a vehicle, transporting the sequence into a host cell (usually a bacterium, such as E. coli). Cloning single genes is well-established, but is often done on an ad hoc basis. If biological computing is to succeed, it requires some degree of standardization, in the same way that computer manufacturers build different computers, but using a standard library of components. “Biobricks are the first example of standard biological parts,” explains Drew Endy [21]. “You will be able to use biobricks to program systems that do whatever biological systems do.” He continues.
“That way, if in the future, someone asks me to make an organism that, say, counts to 3,000 and then turns left, I can grab the parts I need off the shelf, hook them together and predict how they will perform” [15]. Each biobrick is a simple component, such as an AND gate, or an inverter (NOT). Put them together one after the other, and you have a NAND (NOT-AND) gate, which is all that is needed to build any Boolean circuit (an arbitrary circuit can be translated to an equivalent circuit that uses only NAND gates. It will be much bigger than the original, but it will compute the same function. Such considerations were important in the early stages of integrated circuits, when building different logic gates was difficult and expensive). Just as transistors can be used together to build logic gates, and these gates then combined into circuits, there exists a hierarchy of complexity with biobricks. At the bottom are the “parts”, which generally correspond to coding regions for proteins. Then, one level up, we have “devices”, which are built from parts – the oscillator of Elowitz and


Leibler, for example, could be constructed from three inverter devices chained together, since all an inverter does is “flip” its signal from 1 to 0 (or vice versa). This circuit would be an example of the biobricks at the top of the conceptual tree – “systems”, which are collections of parts to do a significant task (like oscillating or counting). Tom Knight at MIT made the first 6 biobricks, each held in a plasmid ready for use. As we have stated, plasmids can be used to insert novel DNA sequences in the genomes of bacteria, which act as the “testbed” for the biobrick circuits. “Just pour the contents of one of these vials into a standard reagent solution, and the DNA will transform itself into a functional component of the bacteria,” he explains [6]. Drew Endy was instrumental in developing this work further, one invaluable resource being the Registry of Standard Biological Parts [32], the definitive catalogue of new biological components. At the start of 2006, it contained 224 “basic” parts and 459 “composite” parts, with 270 parts “under construction”. Biobricks are still at a relatively early stage, but “Eventually we’ll be able to design and build in silico and go out and have things synthesized,” says Jay Keasling, head of Lawrence Berkeley National Laboratory’s new synthetic biology department [12].

Successful Implementations
Although important foundational work had been performed by Arkin and Ross as early as 1994 [2], the year 2000 was a particularly significant one for synthetic biology. In January two foundational papers appeared back-to-back in the same issue of Nature. “Networks of interacting biomolecules carry out many essential functions in living cells, but the ‘design principles’ underlying the functioning of such intracellular networks remain poorly understood, despite intensive efforts including quantitative analysis of relatively simple systems. Here we present a complementary approach to this problem: the design and construction of a synthetic network to implement a particular function” [11]. That was the introduction to a paper that Drew Endy would call “the high-water mark of a synthetic genetic circuit that does something” [12]. In the first of the two articles, Michael Elowitz and Stanislas Leibler (then both at Princeton) showed how to build microscopic “Christmas tree lights” using bacteria.

Synthetic Oscillator
In physics, an oscillator is a system that produces a regular, periodic “output”. Familiar examples include a pendulum, a vibrating string, or a lighthouse. Linking several oscillators together in some way gives rise to synchrony – for example, heart cells repeatedly firing in unison, or millions of fireflies blinking on and off, seemingly as one [36]. Leibler actually had two articles published in the same high-impact issue of Nature. The other was a short communication, co-authored with Naama Barkai – also at Princeton, but in the department of Physics [3]. In their paper, titled “Circadian clocks limited by noise”, Leibler and Barkai showed how a simple model of biochemical networks could oscillate reliably, even in the presence of noise. They argued that such oscillations (which might, for example, control the internal circadian clock that tells us when to wake up and when to be tired) are based on networks of genetic regulation. They built a simulation of a simple regulatory network using a Monte Carlo algorithm. They found that, however they perturbed the system, it still oscillated reliably, although, at the time, their results existed only in silico. The other paper by Leibler was much more applied, in the sense that they had constructed a biological circuit [11]. Elowitz and Leibler had succeeded in constructing an artificial genetic oscillator in cells, using a synthetic network of repressors. They called this construction the repressilator. Rather than investigating existing oscillator networks, Elowitz and Leibler decided to build one entirely from first principles. They chose three repressor-promoter pairs that had already been sequenced and characterized, and first built a mathematical model in software. By running the sets of equations, they identified from their simulation results certain molecular characteristics of the components that gave rise to so-called limit cycle oscillations; those that are robust to perturbations. This information from the model’s results led Elowitz and Leibler to select strong promoter molecules and repressor molecules that would rapidly decay.
tors together in some way gives rise to synchrony – for example, heart cells repeatedly firing in unison, or millions of fireflies blinking on and off, seemingly as one [36]. Leibler actually had two articles published in the same high-impact issue of Nature. The other was a short communication, co-authored with Naama Barkai – also at Princeton, but in the department of Physics [3]. In their paper, titled “Circadian clocks limited by noise”, Leibler and Barkai showed how a simple model of biochemical networks could oscillate reliably, even in the presence of noise. They argued that such oscillations (which might, for example, control the internal circadian clock that tells us when to wake up and when to be tired) are based on networks of genetic regulation. They built a simulation of a simple regulatory network using a Monte Carlo algorithm. They found that, however they perturbed the system, it still oscillated reliably, although, at the time, their results existed only in silico. The other paper by Leibler was much more applied, in the sense that they had constructed a biological circuit [11]. Elowitz and Leibler had succeeded in constructing an artificial genetic oscillator in cells, using a synthetic network of repressors. They called this construction the repressilator. Rather than investigating existing oscillator networks, Elowitz and Leibler decided to build one entirely from first principles. They chose three repressor-promoter pairs that had already been sequenced and characterized, and first built a mathematical model in software. By running the sets of equations, they identified from their simulation results certain molecular characteristics of the components that gave rise to so-called limit cycle oscillations; those that are robust to perturbations. This information from the model’s results lead Elowitz and Leibler to select strong promoter molecules and repressor molecules that would rapidly decay. 
In order to implement the oscillation, they chose three genes, each of which affected one of the others by repressing it, or turning it off. For the sake of illustration, we call the genes A, B and C. The product of gene A turns off (represses) gene B. The absence of B (which represses C) allows C to turn on. C is chosen such that it turns gene A off again, and the three genes loop continuously in a “daisy chain” effect, turning on and off in a repetitive cycle. However, some form of reporting is necessary in order to confirm that the oscillation is occurring as planned. Green fluorescent protein (GFP) is a molecule found occurring naturally in the jellyfish Aequorea victoria. Biologists find it invaluable because it has one interesting property – when placed under ultraviolet light, it glows. Biologists quickly sequenced the gene responsible for producing this protein, as they realized that it could have


many applications as a reporter. By inserting the gene into an organism, you have a ready-made “status light” – when placed into bacteria, they glow brightly if the gene is turned on, and look normal if it’s turned off. We can think of it in terms of a Boolean circuit – if the circuit outputs the value 1, the GFP promoter is produced to turn on the light. If the value is 0, the promoter isn’t produced, the GFP gene isn’t expressed, and the light stays off. Elowitz and Leibler set up their gene network so that the GFP gene would be expressed whenever gene C was turned off – when it was turned on, the GFP would gradually decay and fade away. They synthesized the appropriate DNA sequences and inserted them into a plasmid, eventually yielding a population of bacteria that blinked on and off in a repetitive cycle, like miniature lighthouses. Moreover, and perhaps most significantly, the period between flashes was longer than the time taken for the cells to divide, showing that the state of the system had been passed on during reproduction. Synthetic Toggle Switch Rather than model an existing circuit and then altering it, Elowitz and Leibler had taken a “bottom up” approach to learning about how gene circuits operate. The other notable paper to appear in that issue was written by Timothy Gardner, Charles Cantor and Jim Collins, all of Boston University in the US. In 2000, Gardner, Collins and Cantor observed that genetic switching (such as that observed in the lambda phage [34]) had not yet been “demonstrated in networks of non-specialized regulatory components” [13]. That is to say, at that point nobody had been able to construct a switch out of genes that hadn’t already been “designed” by evolution to perform that specific task. The team had a similar philosophy to that of Elowitz and Leibler, in that their main motivation was being able to test theories about the fundamental behaviour of gene regulatory networks. 
“Owing to the difficulty of testing their predictions,” they explained, “these theories have not, in general, been verified experimentally. Here we have integrated theory and experiment by constructing and testing a synthetic, bistable [two-state] gene circuit based on the predictions of a simple mathematical model.” “We were looking for the equivalent of a light switch to flip processes on or off in the cell,” explained Gardner [10]. “Then I realized a way to do this with genes instead of with electric circuits.” The team chose two genes that were mutually inhibitory – that is, each produced a molecule that would turn the other off. One important thing to bear in mind is that the system didn’t have a single input. Although the team acknowledged that bistability might be possible – in theory – using only a single promoter that regulated itself, they anticipated possible problems with robustness and experimental tunability if they used that approach. Instead, they decided to use a system whereby each “side” of the switch could be “pressed” by a different stimulus – the addition of a chemical on one, and a change in temperature on the other. Things were set up so that if the system was in the state induced by the chemical, it would stay in that state until the temperature was changed, and would only change back again if the chemical was reintroduced. Importantly, these stimuli did not have to be applied continuously – a “short, sharp” burst was enough to cause the switch to flip over. As with the other experiment, Gardner and his colleagues used GFP as the system state reporter, so that the cells glowed in one state, and looked “normal” in the other. In line with the bottom-up approach, they first created a mathematical model of the system and made some predictions about how it would behave inside a cell. Within a year, Gardner had spliced the appropriate genes into the bacteria, and he was able to flip them – at will – from one state to the other. As McAdams and Arkin observed, synthetic “one way” switches had been created in the mid-1980s, but “this is perhaps the first engineered design exploiting bistability to produce a switch with capability of reversibly switching between two . . . stable states” [28]. The potential applications of such a bacterial switch were clear. As they state in the conclusion to their article, “As a practical device, the toggle switch . . . may find applications in gene therapy and biotechnology.” They also borrowed from the language of computer programming, using an analogy between their construction and the short “applets” written in the Java language, which now allow us to download and run programs in our web browser.
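The mutual-repression logic at the heart of the toggle can be illustrated with a two-equation model in the spirit of Gardner’s mathematical analysis. The parameter values and the crude Euler integration here are illustrative assumptions, not the published model’s fitted values:

```python
def toggle(u0, v0, alpha=10.0, n=2.0, t_end=50.0, dt=0.01):
    """Mutual repression in the spirit of the Gardner toggle switch:
    du/dt = alpha/(1 + v**n) - u,  dv/dt = alpha/(1 + u**n) - v.
    Parameters are illustrative, not the published circuit's values."""
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du = alpha / (1.0 + v ** n) - u
        dv = alpha / (1.0 + u ** n) - v
        u, v = u + dt * du, v + dt * dv
    return u, v

# Two different "histories" lock the circuit into two different states.
state_a = toggle(9.0, 0.1)   # u ends high, v ends low
state_b = toggle(0.1, 9.0)   # v ends high, u ends low
```

Starting the system near either state leaves it locked there; a transient stimulus strong enough to push `u` or `v` past the unstable middle point is all it takes to flip the switch permanently, which is why a “short, sharp” burst suffices.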
“Finally, as a cellular memory unit, the toggle forms the basis for ‘genetic applets’ – self-contained, programmable, synthetic gene circuits for the control of cell function”.

Engineered Communication

Towards the end of his life, Alan Turing did some foundational work on pattern formation in nature, in an attempt to explain how zebras get their striped coats or leopards their spots. The study of morphogenesis (from the Greek, morphe – shape, and genesis – creation; “amorphous” therefore means “without shape or structure”) is concerned with how cells split to assume new roles and communicate with one another to form very precise shapes, such as tissues and organs. Turing postulated that the diffusion of chemical signals both within and between cells is
the main driving force behind such complex pattern formation [37]. Although Turing’s work was mainly concerned with the processes occurring amongst cells inside a developing embryo, it is clear that chemical signalling also goes on between bacteria. Ron Weiss of Princeton University was particularly interested in Vibrio fischeri, a bacterium that has a symbiotic relationship with a variety of aquatic creatures, including the Hawaiian squid. This relationship is due mainly to the fact that the bacteria exhibit bioluminescence – they generate an enzyme known as luciferase (coded by the Lux gene), a version of which is also found in fireflies, and which causes them to glow when gathered together in numbers. Cells within the primitive light organs of the squid draw in bacteria from the seawater and encourage them to grow. Crucially, once enough bacteria are packed into the light organ they produce a signal to tell the squid cells to stop attracting their colleagues, and only then do they begin to glow. The cells get a safe environment in which to grow, protected from competition, and the squid has a light source by which to navigate and catch prey. The mechanism by which the Vibrio “know” when to start glowing is known as quorum sensing, since there have to be sufficient “members” present for luminescence to occur. The bacteria secrete an autoinducer molecule, known as VAI (Vibrio Auto Inducer), which diffuses through the cell wall. The Lux gene (which generates the glowing chemical) needs to be activated (turned on) by a particular protein – which attracts the attention of the polymerase – but the protein can only do this with help from the VAI. Its particular 3D structure is such that it can’t fit tightly onto the gene unless it’s been slightly bent out of shape. When there’s enough VAI present, it locks onto the protein and alters its conformation, so that it can turn on the gene.
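The threshold logic of quorum sensing can be caricatured in a few lines of code. All of the rate constants here are invented for illustration; the point is only that the steady-state autoinducer level scales with cell number, so the “glow” decision is effectively a density threshold:

```python
def vai_steady_state(n_cells, k_secrete=0.5, d_loss=2.0):
    # Each cell secretes VAI at rate k_secrete; VAI is lost (diffusion,
    # degradation) at rate d_loss, so secretion balances loss at k*N/d.
    return k_secrete * n_cells / d_loss

def glowing(n_cells, threshold=25.0):
    # Lux is only activated once the VAI level passes the threshold.
    return vai_steady_state(n_cells) >= threshold
```

With these made-up constants the “quorum” works out at 100 cells: below that, the steady VAI level never reaches the threshold and the colony stays dark.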
Thus, the concentration of VAI is absolutely crucial; once a critical threshold has been passed, the bacteria “know” that there are enough of them present, and they begin to glow. Weiss realized that this quorum-based cell-to-cell communication mechanism could provide a powerful framework for the construction of bacterial devices – imagine, for example, a tube of solution containing engineered bacteria that can be added to a sample of seawater, causing it to glow only if the concentration of a particular pollutant exceeds a certain threshold. Crucially, as we will see shortly, it also allows the possibility of generating precise “complex patterned” development. Weiss set up two colonies of E. coli, one containing “senders”, and the other “receivers”. The idea was that the senders would generate a chemical signal made up of VAI,

which could diffuse across a gap and then be picked up by the receivers. Once a strong enough signal was being communicated, the receivers would glow using GFP to say that it had been picked up. Weiss cloned the appropriate gene sequences (corresponding to a type of biobrick) into his bacteria and placed colonies of sender and receiver cells close together on a plate, and the receivers started to glow in acknowledgment.

Synthetic Circuit Evolution

In late 2002, Weiss and his colleagues published another paper, this time describing how rigorous engineering principles may be brought to bear on the problem of designing and building entirely new genetic circuitry. The motivation was clear – “biological circuit engineers will have to confront their inability to predict the precise behavior of even the most simple synthetic networks, a serious shortcoming and challenge for the design and construction of more sophisticated genetic circuitry in the future” [39]. Together with colleagues Yohei Yokobayashi and Frances Arnold, Weiss proposed a two-stage strategy: first, design a circuit from the bottom up, as Elowitz and others had before, and clone it into bacteria. Such circuits are highly unlikely to work the first time: “because the behavior of biological components inside living cells is highly context-dependent, the actual circuit performance will likely differ from the design predictions, often resulting in a poorly performing or nonfunctional circuit.” Rather than simply abandoning their design, Weiss and his team decided to then tune the circuit inside the cell itself, by applying the principles of evolution. By inducing mutations in the DNA that they had just introduced, they were able to slightly modify the behaviour of the circuit that it represented. Of course, many of these changes would be catastrophic, giving even worse performance than before, but, occasionally, they observed a minor improvement.
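The mutate-and-select cycle can be caricatured computationally. The one-parameter “circuit” and its error function below are hypothetical stand-ins, not the actual three-gate network of the paper; they simply show how random perturbation plus selection tunes a poorly designed starting point:

```python
import random

def circuit_error(k, inputs=(0.1, 1.0, 10.0), targets=(1.0, 0.5, 0.0)):
    # A hypothetical one-parameter inverter: output = 1/(1 + (x/k)**2).
    # The error measures how far its response sits from the desired curve.
    return sum((1.0 / (1.0 + (x / k) ** 2) - t) ** 2
               for x, t in zip(inputs, targets))

def design_then_mutate(k_initial=0.01, rounds=200, seed=1):
    # Mutate the circuit parameter at random; keep a mutant only when it
    # scores better -- a caricature of tuning a designed circuit by
    # repeated rounds of mutation and selection.
    rng = random.Random(seed)
    k, err = k_initial, circuit_error(k_initial)
    for _ in range(rounds):
        mutant = k * rng.uniform(0.5, 2.0)     # random perturbation
        mutant_err = circuit_error(mutant)
        if mutant_err < err:                   # selection of the fittest
            k, err = mutant, mutant_err
    return k, err
```

Starting from a badly mis-tuned parameter, a couple of hundred mutate-and-select rounds are enough to bring the toy circuit’s behaviour close to the target response.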
In that case, they kept the “winning” bacteria and subjected them to another round of mutation, in a repeated cycle. In a microcosmic version of Darwinian evolution, mutation followed by selection of the fittest took an initially unpromising pool of broken circuits and transformed them into winners. “Ron is utilizing the power of evolution to design networks in ways so that they perform exactly the way you want them to,” observed Jim Collins [16]. In a commentary article in the same issue of the journal, Jeff Hasty called this approach “design then mutate” [17]. The team showed how a circuit made up of three genetic gates could be fine-tuned in vivo to give the correct performance, and they concluded that “the approach we have outlined should serve as a robust and widely applicable route to obtaining circuits, as well as new genetic devices, that function inside living cells.”

Pattern Formation

The next topic studied by Weiss and his team was the problem of space – specifically, how to get a population of bacteria to cover a surface with a specific density. This facility could be useful when designing bacterial biosensors – devices that detect chemicals in the environment and produce a response. By controlling the density of the microbial components, it might be possible to tune the sensitivity of the overall device. More importantly, the ability of cells to control their own density would provide a useful “self-destruct” mechanism were these genetically modified bugs ever to be released into the environment for “real world” applications. In “Programmed population control by cell-cell communication and regulated killing” [40], Weiss and his team built on their previous results to demonstrate the ability to keep the density of an E. coli population artificially low – that is, below the “natural” density that could be supported by the available nutrients. They designed a genetic circuit that caused the bacteria to generate a different Vibrio signalling molecule, only this time, instead of making the cells glow, a sufficient concentration would flip a switch inside the cell, turning on a killer gene, encoding a protein that was toxic in sufficient quantities. The system behaved exactly as predicted by their mathematical model. The culture grew at an exponential rate (doubling at regular intervals) for seven hours, before hitting the defined density threshold. At that point the population dropped sharply, as countless cells expired, until the population settled at a steady density significantly (ten times) lower than an unmodified “control” colony.
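A toy version of this population-control circuit can be written as logistic growth plus killing at a rate proportional to the accumulated signal. The constants below are arbitrary illustrative choices, tuned so that the engineered population settles roughly an order of magnitude below the unmodified one – they are not the parameters of the published model:

```python
def grow(t_end=100.0, dt=0.01, r=1.0, K=1000.0,
         gamma=0.1, k=1.0, delta=10.0, kill=True):
    # N = cell density, A = concentration of the secreted signal.
    # Growth is logistic with carrying capacity K; when kill=True the
    # killer gene fires at a rate proportional to A, so crowding
    # becomes self-limiting well below K.
    N, A = 1.0, 0.0
    for _ in range(int(t_end / dt)):
        dN = r * N * (1.0 - N / K) - (gamma * A * N if kill else 0.0)
        dA = k * N - delta * A
        N, A = N + dt * dN, A + dt * dA
    return N

engineered = grow()            # settles well below the nutrient limit
control = grow(kill=False)     # settles near the carrying capacity K
```

Because the signal level tracks the density, the killing term grows with the square of N, capping the engineered culture at a steady density roughly ten times lower than the control, qualitatively matching the experiment’s outcome.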
The team concluded that “The population-control circuit lays the foundations for using cell-cell communication to programme interactions among bacterial colonies, allowing the concept of communication-regulated growth and death to be extended to engineering synthetic ecosystems” [40]. The next stage was to programme cells to form specific spatial patterns in the dish. As we have already mentioned briefly, pattern formation is one characteristic of multicellular systems. This is generally achieved using some form of chemical signalling, combined with a differential response – that is, different cells, although genetically identical, may “read” the environmental signals and react in different ways, depending on their internal state. For example, one cell might be “hungry” and choose to move towards a food source, while an identical cell might choose

to remain in the same spot, since it has adequate levels of energy. The team used a variant of the sender-receiver model, only this time adding a “distance detection” component to the receiver circuit. The senders were placed in the center of the dish, and the receivers distributed uniformly across the surface. The receivers were constructed so that they could measure the strength of the signal being “beamed” from the senders, a signal which decayed over distance (a little like a radio station gradually breaking up as you move out of the reception area). The cells were engineered so that only those that were either “near” to the senders or “far” from the senders would generate a response (those in the middle region were instructed to keep quiet). These cells are genetically identical, and are uniformly distributed over the surface – the differential response comes in the way that they assess the strength of the signal and decide whether or not to respond. The power of the system was increased further by making the near cells glow green, and those far away glow red (using a different fluorescent protein). When the team set the system running, they observed the formation of a “dartboard” pattern, with the “bullseye” being the colony of senders (instructed to glow cyan, or light blue), which was surrounded by a green ring, which in turn was surrounded by a red ring. By placing three sender colonies in a triangle, they were also able to obtain a green heart-shaped pattern, formed by the intersection of three green circles, as well as other patterns determined solely by the initial configuration of senders [4].

Bacterial Camera

Rather than generating light, a different team decided to use bacteria to detect light – in the process, building the world’s first microbial camera. By engineering a dense bed of E. coli, a team of students led by Chris Voigt at Berkeley developed light-sensitive “film” capable of storing images at a resolution of 100 megapixels per square inch. E. coli are not normally sensitive to light, so the group took genes coding for photoreceptors from blue-green algae and spliced them into their bugs [24]. When light was shone on the cells, it turned on a genetic switch that caused a chemical inside them to permanently darken, thus generating a black “pixel”. By projecting an image onto a plate of bacteria, the team were able to obtain several monochrome images, including the Nature logo and the face of team member Andrew Ellington. Nobel Laureate Sir Harry Kroto, discoverer of “buckyballs”, called the team’s camera an “extremely exciting advance” [26], going on to say that “I have always thought that the first major nanotechnology advances would involve some sort of chemical modification of biology.”

Future Directions

Weiss and his team suggest that “the integration of such systems into higher-level organisms and with different cell functions will have practical applications in three-dimensional tissue engineering, biosensing, and biomaterial fabrication” [4]. One possible use for such a system might lie in the detection of bio-weapons – spread a culture of bacteria over a surface, and, with the appropriate control circuit, they will be able to accurately pinpoint the location of any pathogens. Programmed cells could eventually replace artificial tissues, or even organs – current attempts to build such constructions in the laboratory rely on cells arranging themselves around an artificial scaffold. Controlled cellular structure formation could do away with the need for such support – “The way we’re doing tissue engineering, right now, . . . is very unnatural,” argues Weiss. “Clearly cells make scaffolds themselves. If we’re able to program them to do that, we might be able to embed them in the site of injury and have them figure out for themselves what the pattern should be” [7]. In addition to building structures, others are considering engineering cells to act as miniature drug delivery systems – fighting disease or infection from the inside. Adam Arkin and Chris Voigt are currently investigating the use of modified E. coli to battle against cancer tumours, while Jay Keasling and co-workers at Berkeley are looking at engineering circuits into the same bacteria to persuade them to generate a potent antimalarial drug that is normally found in small amounts in wormwood plants. Clearly, bacterial computing/synthetic biology is still at a relatively early stage in its development, although the field is growing at a tremendous pace.
It could be argued, with some justification, that the dominant science of the new millennium may well prove to be at the intersection of biology and computing. As biologist Roger Brent argues, “I think that synthetic biology . . . will be as important to the 21st century as [the] ability to manipulate bits was to the 20th” [1].

Bibliography

Primary Literature
1. Anon (2004) Roger Brent and the alpha project. ACM Ubiquity 5(3)
2. Arkin A, Ross J (1994) Computational functions in biochemical reaction networks. Biophys J 67:560–578
3. Barkai N, Leibler S (2000) Circadian clocks limited by noise. Nature 403:267–268

4. Basu S, Gerchman Y, Collins CH, Arnold FH, Weiss R (2005) A synthetic multicellular system for programmed pattern formation. Nature 434:1130–1134
5. Benner SA, Sismour M (2005) Synthetic biology. Nature Rev Genet 6:533–543
6. Brown C (2004) BioBricks to help reverse-engineer life. EE Times, June 11
7. Brown S (2005) Command performances. San Diego Union-Tribune, December 14
8. Brown TA (1990) Gene cloning: an introduction, 2nd edn. Chapman and Hall, London
9. Crick F (1970) Central dogma of molecular biology. Nature 227:561–563
10. Eisenberg A (2000) Unlike viruses, bacteria find a welcome in the world of computing. New York Times, June 1
11. Elowitz M, Leibler S (2000) A synthetic oscillatory network of transcriptional regulators. Nature 403:335–338
12. Ferber D (2004) Synthetic biology: microbes made to order. Science 303(5655):158–161
13. Gardner TS, Cantor CR, Collins JJ (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403:339–342
14. Geyer CR, Battersby TR, Benner SA (2003) Nucleobase pairing in expanded Watson–Crick-like genetic information systems. Structure 11:1485–1498
15. Gibbs WW (2004) Synthetic life. Scientific Am, April 26
16. Gravitz L (2004) 10 emerging technologies that will change your world. MIT Technol Rev, February
17. Hasty J (2002) Design then mutate. Proc Natl Acad Sci 99(26):16516–16518
18. Hopkin K (2004) Life: the next generation. The Scientist 18(19):56
19. Jackson DA, Symons RH, Berg P (1972) Biochemical method for inserting new genetic information into DNA of simian virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc Natl Acad Sci 69:2904–2909
20. Jacob F, Monod J (1961) Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol 3:318–356
21. Jha A (2005) From the cells up. The Guardian, March 10
22. Kauffman S (1971) Gene regulation networks: a theory for their global structure and behaviors. Curr Top Dev Biol 6:145–182
23. Kauffman SA (1993) The origins of order: Self-organization and selection in evolution. Oxford University Press, New York
24. Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA (2005) Engineering Escherichia coli to see light. Nature 438:441–442
25. Lobban PE, Sutton CA (1973) Enzymatic end-to-end joining of DNA molecules. J Mol Biol 78(3):453–471
26. Marks P (2005) For ultrasharp pictures, use a living camera. New Scientist, November 26, p 28
27. McAdams HH, Shapiro L (1995) Circuit simulation of genetic networks. Science 269(5224):650–656
28. McAdams HH, Arkin A (2000) Genetic regulatory circuits: Advances toward a genetic circuit engineering discipline. Current Biol 10:318–320
29. Monod J (1970) Chance and Necessity. Penguin, London
30. Monod J, Changeux JP, Jacob F (1963) Allosteric proteins and cellular control systems. J Mol Biol 6:306–329
31. Morton O (2005) Life, Reinvented. Wired 13(1), January
32. Registry of Standard Biological Parts. http://parts.mit.edu/
33. Old R, Primrose S (1994) Principles of Gene Manipulation: an Introduction to Genetic Engineering, 5th edn. Blackwell, Boston
34. Ptashne M (2004) A Genetic Switch: Phage Lambda Revisited, 3rd edn. Cold Spring Harbor Laboratory Press, Woodbury
35. Roberts L, Murrell C (eds) (1998) An introduction to genetic engineering. Department of Biological Sciences, University of Warwick
36. Strogatz S (2003) Sync: The Emerging Science of Spontaneous Order. Penguin, London
37. Turing AM (1952) The chemical basis of morphogenesis. Phil Trans Roy Soc B 237:37–72
38. von Neumann J (1951) The general and logical theory of automata. In: Cerebral Mechanisms in Behavior. Wiley, New York
39. Yokobayashi Y, Weiss R, Arnold FH (2002) Directed evolution of a genetic circuit. Proc Natl Acad Sci 99(26):16587–16591
40. You L, Cox III RS, Weiss R, Arnold FH (2004) Programmed population control by cell-cell communication and regulated killing. Nature 428:868–871

Books and Reviews
Alon U (2006) An Introduction to Systems Biology: Design Principles of Biological Circuits. Chapman and Hall/CRC
Amos M (ed) (2004) Cellular Computing. Series in Systems Biology. Oxford University Press
Amos M (2006) Genesis Machines: The New Science of Biocomputing. Atlantic Books, London
Benner SA (2003) Synthetic biology: Act natural. Nature 421:118
Endy D (2005) Foundations for engineering biology. Nature 436:449–453
Kobayashi H, Kaern M, Araki M, Chung K, Gardner TS, Cantor CR, Collins JJ (2004) Programmable cells: interfacing natural and engineered gene networks. Proc Natl Acad Sci 101(22):8414–8419
Sayler GS, Simpson ML, Cox CD (2004) Emerging foundations: nano-engineering and bio-microelectronics for environmental biotechnology. Curr Opin Microbiol 7:267–273


Bayesian Games: Games with Incomplete Information
SHMUEL ZAMIR
Center for the Study of Rationality, Hebrew University, Jerusalem, Israel

Article Outline
Glossary
Definition of the Subject
Introduction
Harsanyi’s Model: The Notion of Type
Aumann’s Model
Harsanyi’s Model and Hierarchies of Beliefs
The Universal Belief Space
Belief Subspaces
Consistent Beliefs and Common Priors
Bayesian Games and Bayesian Equilibrium
Bayesian Equilibrium and Correlated Equilibrium
Concluding Remarks and Future Directions
Acknowledgments
Bibliography

Glossary
Bayesian game  An interactive decision situation involving several decision makers (players) in which each player has beliefs about (i.e., assigns a probability distribution to) the payoff relevant parameters and the beliefs of the other players.
State of nature  Payoff relevant data of the game, such as payoff functions, the value of a random variable, etc. It is convenient to think of a state of nature as a full description of a ‘game-form’ (actions and payoff functions).
Type  Also known as state of mind; a full description of a player’s beliefs (about the state of nature), beliefs about the beliefs of the other players, beliefs about the beliefs about his beliefs, etc. ad infinitum.
State of the world  A specification of the state of nature (payoff relevant parameters) and the players’ types (beliefs of all levels). That is, a state of the world is a state of nature and a list of the states of mind of all players.
Common prior and consistent beliefs  The beliefs of players in a game with incomplete information are said to be consistent if they are derived from the same probability distribution (the common prior) by conditioning on each player’s private information. In other words, if the beliefs are consistent, the only source of differences in beliefs is difference in information.

Bayesian equilibrium  A Nash equilibrium of a Bayesian game: a list of behaviors and beliefs such that each player is doing his best to maximize his payoff, according to his beliefs about the behavior of the other players.
Correlated equilibrium  A Nash equilibrium in an extension of the game in which there is a chance move, and each player has only partial information about its outcome.

Definition of the Subject

Bayesian games (also known as games with incomplete information) are models of interactive decision situations in which the decision makers (players) have only partial information about the data of the game and about the other players. Clearly this is typically the situation we face, and hence the importance of the subject: the basic underlying assumption of classical game theory, according to which the data of the game is common knowledge (CK) among the players, is too strong and often implausible in real situations. The importance of Bayesian games lies in providing the tools and methodology to relax this implausible assumption and to enable the modeling of the overwhelming majority of real-life situations in which players have only partial information about the payoff relevant data. As a result of the interactive nature of the situation, this methodology turns out to be rather deep and sophisticated, both conceptually and mathematically: adopting the classical Bayesian approach of statistics, we encounter the need to deal with an infinite hierarchy of beliefs: what does each player believe that the other player believes about what he believes . . . is the actual payoff associated with a certain outcome? It is not surprising that this methodological difficulty was a major obstacle in the development of the theory, and this article is largely devoted to explaining and resolving this methodological difficulty.
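As a concrete (and entirely hypothetical) numerical illustration of how partial information enters a player’s decision, suppose a single player faces two possible states of nature and holds belief p that the state is s = 1. The payoff numbers below are invented for illustration and are not taken from this article:

```python
# PAYOFF[state][action]: hypothetical payoffs for one player facing an
# unknown state of nature s in {1, 2}.  These numbers are invented for
# illustration only; they are not the article's (unshown) matrices.
PAYOFF = {1: {"T": 2.0, "B": 0.0},
          2: {"T": 0.0, "B": 1.0}}

def expected_payoff(action, p):
    # p is the player's belief (prior probability) that the state is s = 1.
    return p * PAYOFF[1][action] + (1.0 - p) * PAYOFF[2][action]

def best_action(p):
    # The optimal action is determined entirely by the belief p.
    return max(("T", "B"), key=lambda a: expected_payoff(a, p))
```

Here the best action switches from B to T once p exceeds 1/3. In a game, since the other player’s action also matters, each player would additionally need beliefs about the other’s belief, which is exactly where the hierarchy of beliefs begins.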
Introduction

A game is a mathematical model for an interactive decision situation involving several decision makers (players) whose decisions affect each other. A basic, often implicit, assumption is that the data of the game, which we call the state of nature, are common knowledge (CK) among the players. In particular, the actions available to the players and the payoff functions are CK. This is a rather strong assumption that says that every player knows all actions and payoff functions of all players, every player knows that all other players know all actions and payoff functions, every player knows that every player knows that every player knows. . . etc. ad infinitum. Bayesian games (also known as
games with incomplete information), which are the subject of this article, are models of interactive decision situations in which each player has only partial information about the payoff relevant parameters of the given situation. Adopting the Bayesian approach, we assume that a player who has only partial knowledge about the state of nature has some beliefs, namely a prior distribution, about the parameters which he does not know or is uncertain about. However, unlike in a statistical problem which involves a single decision maker, this is not enough in an interactive situation: as the decisions of other players are relevant, so are their beliefs, since they affect their decisions. Thus a player must have beliefs about the beliefs of other players. For the same reason, a player needs beliefs about the beliefs of other players about his beliefs, and so on. This interactive reasoning about beliefs leads unavoidably to infinite hierarchies of beliefs, which look rather intractable. The natural emergence of hierarchies of beliefs is illustrated in the following example:

Example 1 Two players, P1 and P2, play a 2 × 2 game whose payoffs depend on an unknown state of nature s ∈ {1, 2}. Player P1’s actions are {T, B}, player P2’s actions are {L, R}, and the payoffs are given in the following matrices:

Assume that the belief (prior) of P1 about the event {s = 1} is p and the belief of P2 about the same event is q. The best action of P1 depends both on his prior and on the action of P2, and similarly for the best action of P2. This is given in the following tables:

Now, since the optimal action of P1 depends not only on his belief p but also on the, unknown to him, action of P2, which depends on his belief q, player P1 must therefore have beliefs about q. These are his second-level beliefs, namely beliefs about beliefs. But then, since this is relevant and unknown to P2, he must have beliefs about that, which will be third-level beliefs of P2, and so on. The whole infinite hierarchies of beliefs of the two players pop out naturally in the analysis of this simple two-person game of incomplete information. The objective of this article is to model this kind of situation. Most of the effort will be devoted to modeling the mutual beliefs structure, and only then do we add the underlying game which, together with the beliefs structure, defines a Bayesian game, for which we define the notion of Bayesian equilibrium.

Harsanyi’s Model: The Notion of Type

As suggested by our introductory example, the straightforward way to describe the mutual beliefs structure in a situation of incomplete information is to specify explicitly the whole hierarchies of beliefs of the players, that is, the beliefs of each player about the unknown parameters of the game, each player’s beliefs about the other players’ beliefs about these parameters, each player’s beliefs about the other players’ beliefs about his beliefs about the parameters, and so on ad infinitum. This may be called the explicit approach and is in fact feasible and was explored and developed at a later stage of the theory (see [18,5,6,7]). We will come back to it when we discuss the universal belief space. However, for obvious reasons, the explicit approach is mathematically rather cumbersome and hardly manageable. Indeed this was a major obstacle to the development of the theory of games with incomplete information at its early stages. The breakthrough was provided by John Harsanyi [11] in a seminal work that earned him the Nobel Prize some thirty years later. While Harsanyi actually formulated the problem verbally, in an explicit way, he suggested a solution that ‘avoided’ the difficulty of having to deal with infinite hierarchies of beliefs, by providing a much more workable implicit, encapsulated model which we present now.


The key notion in Harsanyi’s model is that of type. Each player can be of several types, where a type is to be thought of as a full description of the player’s beliefs about the state of nature (the data of the game), beliefs about the beliefs of other players about the state of nature and about his own beliefs, etc. One may think of a player’s type as his state of mind; a specific configuration of his brain that contains an answer to any question regarding beliefs about the state of nature and about the types of the other players. Note that this implies self-reference (of a type to itself through the types of other players), which is unavoidable in an interactive decision situation. A Harsanyi game of incomplete information consists of the following ingredients (to simplify notation, assume all sets to be finite):

• I – the players’ set.
• S – the set of states of nature.
• T_i – the type set of player i ∈ I. Let T = ×_{i∈I} T_i denote the type set, that is, the set of type profiles.
• Y ⊆ S × T – a set of states of the world.
• p ∈ Δ(Y) – a probability distribution on Y, called the common prior. (For a set A, we denote the set of probability distributions on A by Δ(A).)

Remark A state of the world ω thus consists of a state of nature and a list of the types of the players. We denote it as

ω = (s(ω), t_1(ω), …, t_n(ω)).

We think of the state of nature as a full description of the game, which we call a game-form. So, if it is a game in strategic form, we write the state of nature at state of the world ω as:

s(ω) = (I, (A_i(ω))_{i∈I}, (u_i(·, ω))_{i∈I}).

The payoff functions u_i depend only on the state of nature and not on the types. That is, for all i ∈ I:

s(ω) = s(ω′) ⇒ u_i(·, ω) = u_i(·, ω′).

The game with incomplete information is played as follows: (1) A chance move chooses ω = (s(ω), t_1(ω), …, t_n(ω)) ∈ Y using the probability distribution p. (2) Each player is told his chosen type t_i(ω) (but not the chosen state of nature s(ω) and not the other players’ types t_{−i}(ω) = (t_j(ω))_{j≠i}). (3) The players choose actions simultaneously: player i chooses a_i ∈ A_i(ω) and receives a payoff u_i(a, ω), where a = (a_1, …, a_n) is the vector of chosen actions

and ω is the state of the world chosen by the chance move.

Remark The set A_i(ω) of actions available to player i in state of the world ω must be known to him. Since his only information is his type t_i(ω), we must require that A_i(ω) be T_i-measurable, i.e.,

  t_i(ω) = t_i(ω′) ⟹ A_i(ω) = A_i(ω′).

Note that if s(ω) were commonly known among the players, it would be a regular game in strategic form. We use the term 'game-form' to indicate that the players have only partial information about s(ω): they do not know which s(ω) is being played. In other words, in Harsanyi's extensive form game, the game-forms (s(ω))_{ω∈Y} are not subgames, since they are interconnected by information sets: player i does not know which s(ω) is being played because he does not know ω; he knows only his own type t_i(ω). An important application of Harsanyi's model is auction theory, since an auction is a clear-cut situation of incomplete information. For example, in a closed private-value auction of a single indivisible object, the type of a player is his private value for the object, which is typically known to him but not to the other players. We come back to this in the section entitled "Examples of Bayesian Equilibria".

Aumann's Model

A frequently used model of incomplete information was given by Aumann [2].

Definition 2 An Aumann model of incomplete information is (I, Y, (π_i)_{i∈I}, P) where:

• I is the set of players.
• Y is a (finite) set whose elements are called states of the world.
• For i ∈ I, π_i is a partition of Y.
• P is a probability distribution on Y, also called the common prior.

In this model a state of the world ω ∈ Y is chosen according to the probability distribution P, and each player i is informed of π_i(ω), the element of his partition that contains the chosen state of the world ω. This informational structure becomes a game with incomplete information if we add a mapping s : Y → S. The state of nature s(ω)
is the game-form corresponding to the state of the world ω (with the requirement that the action sets A_i(ω) are π_i-measurable). It is readily seen that Aumann's model is a Harsanyi model in which the type set T_i of player i is the set of


his partition elements, i.e., T_i = {π_i(ω) | ω ∈ Y}, and the common prior on Y is P. Conversely, any Harsanyi model is an Aumann model in which the partitions are those defined by the types, i.e., π_i(ω) = {ω′ ∈ Y | t_i(ω′) = t_i(ω)}.

Harsanyi's Model and Hierarchies of Beliefs

Since our starting point in modeling incomplete information situations was the appearance of hierarchies of beliefs, one may ask how the Harsanyi (or Aumann) model is related to hierarchies of beliefs and how it captures this unavoidable feature of incomplete information situations. The main observation towards answering this question is the following:

Proposition 3 Any state of the world in Aumann's model, and any type profile t ∈ T in Harsanyi's model, defines (uniquely) a hierarchy of mutual beliefs among the players.

Let us illustrate the idea of the proof by the following example.

Example Consider a Harsanyi model with two players, I and II, each of whom can be of two types: T_I = {I_1, I_2}, T_II = {II_1, II_2}, and thus T = {(I_1, II_1), (I_1, II_2), (I_2, II_1), (I_2, II_2)}. The probability p on types is given by:

         II_1    II_2
  I_1    3/12    3/12
  I_2    4/12    2/12

Denote the corresponding states of nature by a = s(I_1, II_1), b = s(I_1, II_2), c = s(I_2, II_1) and d = s(I_2, II_2). These are the states of nature about which there is incomplete information. The game in extensive form is depicted in the accompanying figure.

Assume that the state of nature is a. What are the belief hierarchies of the players?
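The derivation that follows can be mechanized by repeated conditioning on the prior. Below is a minimal sketch in Python; the prior values 3/12, 3/12, 4/12, 2/12 on the profiles (I_1,II_1), (I_1,II_2), (I_2,II_1), (I_2,II_2) are the unique ones consistent with the first-level beliefs listed in the text, and the encoding (dictionaries keyed by type profiles) is ours:

```python
from fractions import Fraction as F

# Prior on type profiles and the induced states of nature (a, b, c, d).
p = {('I1', 'II1'): F(3, 12), ('I1', 'II2'): F(3, 12),
     ('I2', 'II1'): F(4, 12), ('I2', 'II2'): F(2, 12)}
s = {('I1', 'II1'): 'a', ('I1', 'II2'): 'b',
     ('I2', 'II1'): 'c', ('I2', 'II2'): 'd'}

def belief_about(player, mytype, f):
    """Distribution of f(profile), conditioning p on one's own type."""
    idx = 0 if player == 'I' else 1
    rel = {t: q for t, q in p.items() if t[idx] == mytype}
    tot = sum(rel.values())
    out = {}
    for t, q in rel.items():
        key = f(t)
        out[key] = out.get(key, F(0)) + q / tot
    return out

# First-level: beliefs about the state of nature.
level1_I1 = belief_about('I', 'I1', lambda t: s[t])

# Second-level: I_1's belief over II's type, each type mapped to
# II's own first-level belief about the state of nature.
def II_first_level(t):
    return tuple(sorted(belief_about('II', t[1], lambda u: s[u]).items()))

level2_I1 = belief_about('I', 'I1', II_first_level)
```

Iterating `belief_about` in this way generates the k-th level of the hierarchy from the (k−1)-th, exactly as in the inductive construction described below.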

First-level beliefs are obtained by each player from p, by conditioning on his type:

• I_1: With probability 1/2 the state is a and with probability 1/2 the state is b.
• I_2: With probability 2/3 the state is c and with probability 1/3 the state is d.
• II_1: With probability 3/7 the state is a and with probability 4/7 the state is c.
• II_2: With probability 3/5 the state is b and with probability 2/5 the state is d.

Second-level beliefs (using shorthand notation for the above beliefs: (1/2)a + (1/2)b, etc.):

• I_1: With probability 1/2, player II believes (3/7)a + (4/7)c, and with probability 1/2, player II believes (3/5)b + (2/5)d.
• I_2: With probability 2/3, player II believes (3/7)a + (4/7)c, and with probability 1/3, player II believes (3/5)b + (2/5)d.
• II_1: With probability 3/7, player I believes (1/2)a + (1/2)b, and with probability 4/7, player I believes (2/3)c + (1/3)d.
• II_2: With probability 3/5, player I believes (1/2)a + (1/2)b, and with probability 2/5, player I believes (2/3)c + (1/3)d.

Third-level beliefs:

• I_1: With probability 1/2, player II believes that: "With probability 3/7, player I believes (1/2)a + (1/2)b, and with probability 4/7, player I believes (2/3)c + (1/3)d". And with probability 1/2, player II believes that: "With probability 3/5, player I believes (1/2)a + (1/2)b, and with probability 2/5, player I believes (2/3)c + (1/3)d".

And so on and so on. The idea is very simple and powerful: since each player of a given type has a probability distribution (beliefs) both about the types of the other players


and about the set S of states of nature, the hierarchies of beliefs are constructed inductively: if the k-th level beliefs (about S) are defined for each type, then the beliefs about types generate the (k+1)-th level of beliefs. Thus the compact model of Harsanyi does capture the whole hierarchy of beliefs, and it is rather tractable. The natural question is whether this model can be used for all hierarchies of beliefs. In other words, given any hierarchy of mutual beliefs of a set of players I about a set S of states of nature, can it be represented by a Harsanyi game? This was answered by Mertens and Zamir [18], who constructed the universal belief space: given a set S of states of nature and a finite set I of players, they looked for the space Ω of all possible hierarchies of mutual beliefs about S among the players in I. This construction is outlined in the next section.

The Universal Belief Space

Given a finite set of players I = {1, …, n} and a set S of states of nature, which is assumed to be compact, we first identify the mathematical spaces in which the hierarchies of beliefs lie. Recall that Δ(A) denotes the set of probability distributions on A, and define inductively the sequence of spaces (X_k)_{k=1}^∞ by

  X_1 = Δ(S),                                  (1)
  X_{k+1} = X_k × Δ(S × X_k^{n−1}),  for k = 1, 2, … .  (2)

Any probability distribution on S can be a first-level belief and is thus in X_1. A second-level belief is a joint probability distribution on S and the first-level beliefs of the other (n−1) players; this is an element of Δ(S × X_1^{n−1}), and therefore a two-level hierarchy is an element of the product space X_1 × Δ(S × X_1^{n−1}), and so on for any level. Note that at each level, a belief is a joint probability distribution on S and the previous-level beliefs, allowing for correlation between the two. In dealing with these probability spaces we need some mathematical structure; more specifically, we make use of the weak∗ topology:

Definition 4 A sequence (F_n)_{n=1}^∞ of probability measures (on Ω) converges in the weak∗ topology to the probability measure F if and only if lim_{n→∞} ∫_Ω g(ω) dF_n = ∫_Ω g(ω) dF for all bounded and continuous functions g : Ω → R.

It follows from the compactness of S that all the spaces defined by (1)–(2) are compact in the weak∗ topology. However, for k > 1, not every element of X_k represents a coherent hierarchy of beliefs of level k. For example, if (μ_1, μ_2) ∈ X_2, where μ_1 ∈ Δ(S) = X_1 and μ_2 ∈ Δ(S × X_1^{n−1}), then for this to describe meaningful

beliefs of a player, the marginal distribution of μ_2 on S must coincide with μ_1. More generally, any event A in the space of k-level beliefs must have the same (marginal) probability in all higher-level beliefs. Furthermore, not only are each player's beliefs coherent, but he also considers only coherent beliefs of the other players (only those are in the support of his beliefs). Expressing this coherency condition formally yields a selection T_k ⊆ X_k such that T_1 = X_1 = Δ(S). It is proved that the projection of T_{k+1} on X_k is T_k (that is, any coherent k-level hierarchy can be extended to a coherent (k+1)-level hierarchy) and that all the sets T_k are compact. Therefore, the projective limit, T = lim←_k T_k, is well defined and nonempty.¹

¹ The projective limit (also known as the inverse limit) of the sequence (T_k)_{k=1}^∞ is the space T of all sequences (μ_1, μ_2, …) ∈ ×_{k=1}^∞ T_k which satisfy: for any k ∈ N, there is a probability distribution μ^k ∈ Δ(S × T_k^{n−1}) such that μ_{k+1} = (μ_k, μ^k).

Definition 5 The universal type space T is the projective limit of the spaces (T_k)_{k=1}^∞.

That is, T is the set of all coherent infinite hierarchies of beliefs regarding S of a player in I. It does not depend on i: by construction it contains all possible hierarchies of beliefs regarding S, and it is therefore the same for all players. It is determined only by S and the number of players n.

Proposition 6 The universal type space T is compact and satisfies

  T ≈ Δ(S × T^{n−1}).  (3)

The ≈ sign in (3) is to be read as an isomorphism, and Proposition 6 says that a type of a player can be identified with a joint probability distribution on the state of nature and the types of the other players. The implicit equation (3) reflects the self-reference and circularity of the notion of type: the type of a player comprises his beliefs about the state of nature and about all the beliefs of the other players, in particular their beliefs about his own beliefs.

Definition 7 The universal belief space (UBS) is the space Ω defined by

  Ω = S × T^n.  (4)

An element of Ω is called a state of the world. Thus a state of the world is ω = (s(ω); t_1(ω), t_2(ω), …, t_n(ω)) with s(ω) ∈ S and t_i(ω) ∈ T for all i ∈ I. This is the specification of the state of nature and the types of all players. The universal belief space Ω is what we looked for: the set of all incomplete information and mutual belief configurations of a set of n players regarding the state of


nature. In particular, as we will see later, all Harsanyi and Aumann models are embedded in Ω, but it also includes belief configurations that cannot be modeled as Harsanyi games. As noted before, the UBS is determined only by the set of states of nature S and the set of players I, so it should be denoted Ω(S, I). For the sake of simplicity we shall omit the arguments and write Ω, unless we wish to emphasize the underlying sets S and I. Carrying out the construction of the UBS according to the outline above involves some non-trivial mathematics, as can be seen in Mertens and Zamir [18]. The reason is that even with a finite number of states of nature, the space of first-level beliefs is a continuum, the second level is the space of probability distributions on a continuum, and the third level is the space of probability distributions on the space of probability distributions on a continuum. This requires some structure on these spaces: for a (Borel) measurable event E, let B_i^p(E) be the event "player i of type t_i believes that the probability of E is at least p", that is,

  B_i^p(E) = {ω ∈ Ω | t_i(ω)(E) ≥ p}.

Since this is an object of beliefs of players other than i (beliefs of j ≠ i about the beliefs of i), this set must also be measurable. Mertens and Zamir used the weak∗ topology, which is the minimal topology for which the event B_i^p(E) is (Borel) measurable for every (Borel) measurable event E. In this topology, if A is a compact set then Δ(A), the space of all probability distributions on A, is also compact. However, the hierarchic construction can also be carried out with stronger topologies on Δ(A) (see [9,12,17]). Heifetz and Samet [14] worked out the construction of the universal belief space without topology, using only a measurable structure (which is implied by the assumption that the beliefs of the players are measurable). All these explicit constructions of the belief space fall within what is called the semantic approach. Aumann [6] provided another construction of a belief system using the syntactic approach, based on sentences and logical formulas specifying explicitly what each player believes about the state of nature, about the beliefs of the other players about the state of nature, and so on. For a detailed construction see Aumann [6], Heifetz and Mongin [13], and Meier [16]. For a comparison of the syntactic and semantic approaches see Aumann and Heifetz [7].

Belief Subspaces

In constructing the universal belief space we implicitly assumed that each player knows his own type, since we specified only his beliefs about the state of nature and about the beliefs of the other players. In view of that, and since

by (3) a type of player i is a probability distribution on S × T^{I∖{i}}, we can view a type t_i also as a probability distribution on Ω = S × T^I in which the marginal distribution on T_i is a degenerate delta function at t_i; that is, if ω = (s(ω); t_1(ω), t_2(ω), …, t_n(ω)), then for all i ∈ I, t_i(ω) ∈ Δ(Ω) and

  t_i(ω)[t_i = t_i(ω)] = 1.  (5)

In particular, it follows that if Supp(t_i) denotes the support of t_i, then

  ω′ ∈ Supp(t_i(ω)) ⟹ t_i(ω′) = t_i(ω).  (6)

Let P_i(ω) = Supp(t_i(ω)) ⊆ Ω. This defines a possibility correspondence: at state of the world ω, player i does not consider possible any point not in P_i(ω). By (6),

  P_i(ω) ∩ P_i(ω′) ≠ ∅ ⟹ P_i(ω) = P_i(ω′).

However, unlike in Aumann's model, P_i does not define a partition of Ω, since it is possible that ω ∉ P_i(ω), and hence the union ∪_{ω∈Ω} P_i(ω) may be strictly smaller than Ω (see Example 7). If ω ∈ P_i(ω) ⊆ Y holds for all ω in some subspace Y ⊆ Ω, then (P_i(ω))_{ω∈Y} is a partition of Y.

As we said, the universal belief space includes all possible beliefs and mutual belief structures over the state of nature. However, in a specific situation of incomplete information, it may well be that only part of Ω is relevant for describing the situation. If the state of the world is ω, then clearly all states of the world in ∪_{i∈I} P_i(ω) are relevant; but this is not all, because if ω′ ∈ P_i(ω), then all states in P_j(ω′), for j ≠ i, are also relevant in the considerations of player i. This observation motivates the following definition:

Definition 8 A belief subspace (BL-subspace) is a closed subset Y of Ω which satisfies

  P_i(ω) ⊆ Y  for all i ∈ I and all ω ∈ Y.  (7)

A belief subspace is minimal if it has no proper subset which is also a belief subspace. Given ω ∈ Ω, the belief subspace at ω, denoted Y(ω), is the minimal BL-subspace containing ω. Since Ω is itself a BL-subspace, Y(ω) is well defined for all ω ∈ Ω. A BL-subspace is a closed subset of Ω which is also closed under the beliefs of the players: for any ω ∈ Y, it contains all states of the world which are relevant to the situation. If ω′ ∉ Y, then no player believes that ω′ is possible, no player believes that any other player believes that ω′ is possible, no player believes that any player believes that any player believes…, etc.
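For finite spaces, the possibility correspondence and the closure condition (7) can be sketched directly in code. Here is a minimal sketch on a hypothetical two-state space in the spirit of Example 7 below, where player II's type at w1 excludes w1 itself, so P_II does not define a partition (the encoding and state names are ours):

```python
from fractions import Fraction as F

# Player I is uninformed (uniform belief at both states); player II
# believes w2 for sure, even at w1 -- so w1 is never in II's support.
types = {
    'I':  {'w1': {'w1': F(1, 2), 'w2': F(1, 2)},
           'w2': {'w1': F(1, 2), 'w2': F(1, 2)}},
    'II': {'w1': {'w2': F(1)},
           'w2': {'w2': F(1)}},
}

def P(i, w):
    """Possibility correspondence P_i(w): the support of i's type at w."""
    return {v for v, q in types[i][w].items() if q > 0}

def is_bl_subspace(Y):
    """Closure condition (7): P_i(w) is contained in Y for all i, w in Y."""
    return all(P(i, w) <= Y for i in types for w in Y)
```

Here `is_bl_subspace({'w1', 'w2'})` holds, while the union of II's possibility sets is only `{'w2'}`, illustrating how ω ∉ P_i(ω) makes the union strictly smaller than the space.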


Remark 9 The subspace Y(ω) is meant to be the minimal subspace which is belief-closed by all players at the state ω. Thus, a natural definition would be: Ỹ(ω) is the minimal BL-subspace containing P_i(ω) for all i ∈ I. However, if the state ω is not in P_i(ω) for every player, then ω ∉ Ỹ(ω). Yet, even if it is not in the belief closure of the players, the real state ω is still relevant (at least for the analyst), because it determines the true state of nature; that is, it determines the true payoffs of the game. This is the reason for adding the true state of the world ω, even though "it may not be in the mind of the players".

It follows from (5), (6) and (7) that a BL-subspace Y has the following structure:

Proposition 10 A closed subset Y of the universal belief space Ω is a BL-subspace if and only if it satisfies the following conditions:
1. For any ω = (s(ω); t_1(ω), t_2(ω), …, t_n(ω)) ∈ Y and for all i, the type t_i(ω) is a probability distribution on Y.
2. For any ω and ω′ in Y,

  ω′ ∈ Supp(t_i(ω)) ⟹ t_i(ω′) = t_i(ω).

In fact, condition 1 follows directly from Definition 8, while condition 2 follows from the general property of the UBS expressed in (6). Given a BL-subspace Y in Ω(S, I), we denote by T_i the type set of player i,

  T_i = {t_i(ω) | ω ∈ Y},

and note that, unlike in the UBS, in a specific model Y the type sets are typically not the same for all players, and the analogue of (4) is

  Y ⊆ S × T_1 × ⋯ × T_n.

A BL-subspace is a model of incomplete information about the state of nature. As we saw in Harsanyi's model, in any model of incomplete information about a fixed set S of states of nature, involving the same set of players I, a state of the world ω defines (encapsulates) an infinite hierarchy of mutual beliefs of the players in I on S. By the universality of the belief space Ω(S, I), there is ω′ ∈ Ω(S, I) with the same hierarchy of beliefs as that of ω. The mapping of each ω to its corresponding ω′ in Ω(S, I) is called a belief morphism, as it preserves the belief structure. Mertens and Zamir [18] proved that the space Ω(S, I) is universal in the sense that any model Y of incomplete information of the set of players I about the state of nature s ∈ S can be embedded in Ω(S, I) via a belief morphism φ : Y → Ω(S, I) such that φ(Y) is a belief subspace in Ω(S, I). In the following examples we give the BL-subspaces representing some well-known models.
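For a finite candidate set Y, the two conditions of Proposition 10 can be verified mechanically. A sketch, on a hypothetical two-state space where player I is informed and player II holds the same full-support type at both states (the encoding and numbers are ours):

```python
from fractions import Fraction as F

Y = {'w1', 'w2'}
types = {
    'I':  {'w1': {'w1': F(1)}, 'w2': {'w2': F(1)}},
    'II': {'w1': {'w1': F(1, 2), 'w2': F(1, 2)},
           'w2': {'w1': F(1, 2), 'w2': F(1, 2)}},
}

def proposition_10(Y, types):
    """Check conditions 1 and 2 of Proposition 10 for a finite Y."""
    for i, ti in types.items():
        for w in Y:
            # Condition 1: t_i(w) is a probability distribution on Y.
            if set(ti[w]) - Y or sum(ti[w].values()) != 1:
                return False
            # Condition 2: states in the support of t_i(w) carry the
            # same type t_i as w itself.
            for w2, q in ti[w].items():
                if q > 0 and ti[w2] != ti[w]:
                    return False
    return True
```

Violating condition 2, e.g. by giving player II different types at two states that lie in each other's support, makes the check fail.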

Examples of Belief Subspaces

Example 1 (A game with complete information) If the state of nature is s_0 ∈ S, then in the universal belief space Ω(S, I) the game is described by a BL-subspace Y consisting of a single state of the world: Y = {ω}, where

  ω = (s_0; [1ω], …, [1ω]).

Here [1ω] is the only probability distribution on Y, namely the trivial distribution supported by ω. In particular, the state of nature s_0 (i.e., the data of the game) is commonly known.

Example 2 (Commonly known uncertainty about the state of nature) Assume that the set of players is I = {1, …, n} and there are k states of nature representing, say, k possible n-dimensional payoff matrices G_1, …, G_k. At the beginning of the game, the payoff matrix is chosen by a chance move according to the probability vector p = (p_1, …, p_k), which is commonly known by the players, but no player receives any information about the outcome of the chance move. The set of states of nature is S = {G_1, …, G_k}. The situation described above is embedded in the UBS Ω(S, I) as the following BL-subspace Y consisting of k states of the world (denoting p ∈ Δ(Y) by [p_1 ω_1, …, p_k ω_k]):

• Y = {ω_1, …, ω_k}
• ω_1 = (G_1; [p_1 ω_1, …, p_k ω_k], …, [p_1 ω_1, …, p_k ω_k])
• ω_2 = (G_2; [p_1 ω_1, …, p_k ω_k], …, [p_1 ω_1, …, p_k ω_k])
• ……
• ω_k = (G_k; [p_1 ω_1, …, p_k ω_k], …, [p_1 ω_1, …, p_k ω_k]).

There is a single type, [p_1 ω_1, …, p_k ω_k], which is the same for all players. It should be emphasized that the type is a distribution on Y (and not just on the states of nature), which implies that the beliefs [p_1 G_1, …, p_k G_k] about the state of nature are commonly known by the players.

Example 3 (Two players with incomplete information on one side) There are two players, I = {I, II}, and two possible payoff matrices, S = {G_1, G_2}. The payoff matrix is chosen at random with P(s = G_1) = p, known to both players, and the outcome of this chance move is known only to player I. Aumann and Maschler have studied such situations, in which the chosen matrix is played repeatedly and the issue is how the informed player strategically uses his information (see Aumann and Maschler [8] and its references). This situation is presented in the UBS by the following BL-subspace:

• Y = {ω_1, ω_2}
• ω_1 = (G_1; [1ω_1], [p ω_1, (1−p) ω_2])
• ω_2 = (G_2; [1ω_2], [p ω_1, (1−p) ω_2]).

Player I has two possible types: I_1 = [1ω_1], when he is informed of G_1, and I_2 = [1ω_2], when he is informed of G_2. Player II has only one type, II = [p ω_1, (1−p) ω_2]. We describe this situation in the following extensive-form-like figure, in which the ovals describe the types of the players at the various vertices.

Example 4 (Incomplete information about the other players' information) In the next example, taken from Sorin and Zamir [23], one of the two players always knows the state of nature but may be uncertain whether the other player knows it. There are two players, I = {I, II}, and two possible payoff matrices, S = {G_1, G_2}. It is commonly known to both players that the payoff matrix is chosen at random by a toss of a fair coin: P(s = G_1) = 1/2. The outcome of this chance move is told to player I. In addition, if (and only if) the matrix G_1 was chosen, another fair coin toss determines whether to inform player II which payoff matrix was chosen. In any case, player I is not told the result of the second coin toss. This situation is described by the following belief space with three states of the world:

• Y = {ω_1, ω_2, ω_3}
• ω_1 = (G_1; [1/2 ω_1, 1/2 ω_2], [1ω_1])
• ω_2 = (G_1; [1/2 ω_1, 1/2 ω_2], [1/3 ω_2, 2/3 ω_3])
• ω_3 = (G_2; [1ω_3], [1/2 ω_2, 1/2 ω_3])

Each player has two types, and the type sets are:

  T_I = {I_1, I_2} = {[1/2 ω_1, 1/2 ω_2], [1ω_3]}
  T_II = {II_1, II_2} = {[1ω_1], [1/3 ω_2, 2/3 ω_3]}.

Note that in all our examples of belief subspaces, condition (6) is satisfied: the support of a player's type contains only states of the world in which he has that type. The game with incomplete information is described in the following figure:

Example 5 (Incomplete information on two sides: A Harsanyi game) In this example the set of players is again I = {I, II}, and the set of states of nature is S = {s_11, s_12, s_21, s_22}. In the universal belief space Ω(S, I), consider the following BL-subspace consisting of four states of the world:

• Y = {ω_11, ω_12, ω_21, ω_22}
• ω_11 = (s_11; [3/7 ω_11, 4/7 ω_12], [3/5 ω_11, 2/5 ω_21])
• ω_12 = (s_12; [3/7 ω_11, 4/7 ω_12], [4/5 ω_12, 1/5 ω_22])
• ω_21 = (s_21; [2/3 ω_21, 1/3 ω_22], [3/5 ω_11, 2/5 ω_21])
• ω_22 = (s_22; [2/3 ω_21, 1/3 ω_22], [4/5 ω_12, 1/5 ω_22])

Again, each player has two types, and the type sets are:

  T_I = {I_1, I_2} = {[3/7 ω_11, 4/7 ω_12], [2/3 ω_21, 1/3 ω_22]}
  T_II = {II_1, II_2} = {[3/5 ω_11, 2/5 ω_21], [4/5 ω_12, 1/5 ω_22]}.

The type of a player determines his beliefs about the type of the other player. For example, player I of type I_1 assigns probability 3/7 to the state of the world ω_11, in which player II is of type II_1, and probability 4/7 to the state of the world ω_12, in which player II is of type II_2. Therefore, the beliefs of type I_1 about the types of player II are P(II_1) = 3/7, P(II_2) = 4/7. The mutual beliefs about each other's type are given in the following tables:

  Beliefs of player I          Beliefs of player II
        II_1   II_2                  II_1   II_2
  I_1   3/7    4/7             I_1   3/5    4/5
  I_2   2/3    1/3             I_2   2/5    1/5

These are precisely the beliefs of Bayesian players if the pair of types (t_I, t_II) in T = T_I × T_II is chosen according to the prior probability distribution p below, and each player is then informed of his own type:

         II_1    II_2
  I_1    3/10    4/10
  I_2    2/10    1/10
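This can be verified mechanically: conditioning the prior p on each row reproduces player I's belief table, and conditioning on each column reproduces player II's. A minimal sketch (the prior entries 3/10, 4/10, 2/10, 1/10 are the unique ones consistent with the tables; the encoding is ours):

```python
from fractions import Fraction as F

# Prior over type pairs (t_I, t_II): rows I1, I2; columns II1, II2.
p = {('I1', 'II1'): F(3, 10), ('I1', 'II2'): F(4, 10),
     ('I2', 'II1'): F(2, 10), ('I2', 'II2'): F(1, 10)}

def row_conditional(ti):
    """Player I's belief about II's type, given I's own type ti."""
    tot = sum(q for (a, b), q in p.items() if a == ti)
    return {b: q / tot for (a, b), q in p.items() if a == ti}

def col_conditional(tii):
    """Player II's belief about I's type, given II's own type tii."""
    tot = sum(q for (a, b), q in p.items() if b == tii)
    return {a: q / tot for (a, b), q in p.items() if b == tii}
```

Both belief tables of Example 5 fall out of these two conditionings, which is exactly the consistency property discussed in the sequel.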


In other words, this BL-subspace is a Harsanyi game with type sets T_I, T_II and the prior probability distribution p on the types. In fact, since there is a one-to-one mapping between the type set T and the set S of states of nature, the situation is generated by a chance move choosing the state of nature s_ij ∈ S according to the distribution p (that is, P(s_ij) = P(I_i, II_j) for i and j in {1, 2}), after which player I is informed of i and player II is informed of j. As a matter of fact, all the BL-subspaces in the previous examples can also be written as Harsanyi games, mostly in a trivial way.

Example 6 (Inconsistent beliefs) In the same universal belief space Ω(S, I) of the previous example, consider now another BL-subspace Ỹ which differs from Y only by changing the type II_1 of player II from [3/5 ω_11, 2/5 ω_21] to [1/2 ω_11, 1/2 ω_21]; that is,

• Ỹ = {ω_11, ω_12, ω_21, ω_22}
• ω_11 = (s_11; [3/7 ω_11, 4/7 ω_12], [1/2 ω_11, 1/2 ω_21])
• ω_12 = (s_12; [3/7 ω_11, 4/7 ω_12], [4/5 ω_12, 1/5 ω_22])
• ω_21 = (s_21; [2/3 ω_21, 1/3 ω_22], [1/2 ω_11, 1/2 ω_21])
• ω_22 = (s_22; [2/3 ω_21, 1/3 ω_22], [4/5 ω_12, 1/5 ω_22])

with type sets:

  T_I = {I_1, I_2} = {[3/7 ω_11, 4/7 ω_12], [2/3 ω_21, 1/3 ω_22]}
  T_II = {II_1, II_2} = {[1/2 ω_11, 1/2 ω_21], [4/5 ω_12, 1/5 ω_22]}.

Now the mutual beliefs about each other's type are:

  Beliefs of player I          Beliefs of player II
        II_1   II_2                  II_1   II_2
  I_1   3/7    4/7             I_1   1/2    4/5
  I_2   2/3    1/3             I_2   1/2    1/5

Unlike in the previous example, these beliefs cannot be derived from a prior distribution p. Following Harsanyi, these are inconsistent beliefs. A BL-subspace with inconsistent beliefs cannot be described as a Harsanyi or an Aumann model; it cannot be described as a game in extensive form.

Example 7 ("Highly inconsistent" beliefs) In the previous example, even though the beliefs of the players were inconsistent in all states of the world, the true state was considered possible by all players (for example, in the state ω_12, player I assigns to this state probability 4/7 and player II assigns to it probability 4/5). As was emphasized before, the UBS contains all belief configurations, including highly inconsistent or wrong beliefs, as the following example shows. The belief subspace of the two players I and II concerning the state of nature, which can be s_1 or s_2, is given by:

• Y = {ω_1, ω_2}
• ω_1 = (s_1; [1/2 ω_1, 1/2 ω_2], [1ω_2])
• ω_2 = (s_2; [1/2 ω_1, 1/2 ω_2], [1ω_2]).

In the state of the world ω_1, the state of nature is s_1; player I assigns equal probabilities to s_1 and s_2, but player II assigns probability 1 to s_2. In other words, player II does not consider the true state of the world (and also the true state of nature) possible: ω_1 ∉ P_II(ω_1), and consequently ∪_{ω∈Y} P_II(ω) = {ω_2}, which is strictly smaller than Y. By the definition of a belief subspace and condition (6), this also implies that ∪_{ω∈Ω} P_II(ω) is strictly smaller than Ω (as it does not contain ω_1).

Consistent Beliefs and Common Priors

A BL-subspace Y is a semantic belief system presenting, via the notion of types, the hierarchies of beliefs of a set of players having incomplete information about the state of nature. A state of the world captures the situation at what is called the interim stage: each player knows his own type and has beliefs about the state of nature and the types of the other players. The question "what is the real state of the world ω?" is not addressed. In a BL-subspace there is no chance move with an explicit probability distribution that chooses the state of the world, while such a probability distribution is part of a Harsanyi or an Aumann model. Yet, in the belief space Y of Example 5 in the previous section, such a prior distribution p emerged endogenously from the structure of Y. More specifically, if the state ω ∈ Y is chosen by a chance move according to the probability distribution p and each player i is told his type t_i(ω), then his beliefs are precisely those described by t_i(ω). This is a property of the BL-subspace that we call consistency (which does not hold, for instance, for the BL-subspace Ỹ in Example 6), and which we define now. Let Y ⊆ Ω be a BL-subspace.


Definition 11 (i) A probability distribution p ∈ Δ(Y) is said to be consistent if for every player i ∈ I,

  p = ∫_Y t_i(ω) dp.  (8)

(ii) A BL-subspace Y is said to be consistent if there is a consistent probability distribution p with Supp(p) = Y. A consistent BL-subspace will be called a C-subspace. A state of the world ω ∈ Ω is said to be consistent if it is a point in a C-subspace.

The interpretation of (8) is that the probability distribution p is "the average" of the types t_i(ω) of player i (which are themselves probability distributions on Y), the average being taken over Y according to p. This definition is not transparent; it is not clear how it captures the consistency property we have just explained in terms of a chance move choosing ω ∈ Y according to p. However, it turns out to be equivalent. For ω ∈ Y denote π_i(ω) = {ω′ ∈ Y | t_i(ω′) = t_i(ω)}; then we have:

Proposition 12 A probability distribution p ∈ Δ(Y) is consistent if and only if

  t_i(ω)(A) = p(A | π_i(ω))  (9)

holds for all i ∈ I and for every measurable set A ⊆ Y.

In particular, a Harsanyi or an Aumann model is represented by a consistent BL-subspace since, by construction, the beliefs are derived from a common prior distribution which is part of the data of the model. The role of the prior distribution p in these models is actually not that of an additional parameter of the model, but rather that of an additional assumption on the belief system, namely the consistency assumption. In fact, if a minimal belief subspace is consistent, then the common prior p is uniquely determined by the beliefs, as we saw in Example 5; there is no need to specify p as additional data of the system.

Proposition 13 If ω ∈ Ω is a consistent state of the world, and Y(ω) is the smallest consistent BL-subspace containing ω, then the consistent probability distribution p on Y(ω) is uniquely determined. (The formulation of this proposition requires some technical qualification if Y(ω) is a continuum.)

Consistency (that is, the existence of a common prior) is quite a strong assumption. It assumes that differences in beliefs (i.e., in probability assessments) are due only to differences in information: players having precisely the same information will have precisely the same beliefs. It is no surprise that this assumption has strong consequences, the best known of which is due to Aumann [2]: players with consistent beliefs cannot agree to disagree. That is, if at some state of the world it is commonly known that one player assigns probability q_1 to an event E and another player assigns probability q_2 to the same event, then it must be the case that q_1 = q_2. Variants of this result appear under the title of "no trade theorems" (see, e.g., [19]): rational players with consistent beliefs cannot believe that they can both gain from a trade or a bet between them. The plausibility and justification of the common prior assumption has been extensively discussed in the literature (see, e.g., [4,10,11]); it is sometimes referred to as the Harsanyi doctrine. Here we only make the observation that within the set of BL-subspaces in Ω, the set of consistent BL-subspaces is a set of measure zero. To see the idea of the proof, consider the following example:

Example 8 (Generalization of Examples 5 and 6) Consider a BL-subspace as in Examples 5 and 6, but with type sets:

  T_I = {I_1, I_2} = {[α_1 ω_11, (1−α_1) ω_12], [α_2 ω_21, (1−α_2) ω_22]}
  T_II = {II_1, II_2} = {[β_1 ω_11, (1−β_1) ω_21], [β_2 ω_12, (1−β_2) ω_22]}.

For any (α_1, α_2, β_1, β_2) ∈ [0, 1]^4 this is a BL-subspace. The mutual beliefs about each other's type are:

  Beliefs of player I          Beliefs of player II
        II_1    II_2                 II_1    II_2
  I_1   α_1     1−α_1          I_1   β_1     β_2
  I_2   α_2     1−α_2          I_2   1−β_1   1−β_2

If the subspace is consistent, these beliefs are obtained as conditional distributions from some prior probability distribution p on T = T_I × T_II, say with entries p_ij = p(I_i, II_j):

         II_1   II_2
  I_1    p_11   p_12
  I_2    p_21   p_22


This implies (assuming p_ij ≠ 0 for all i and j):

p₁₁/p₁₂ = α₁/(1 − α₁),   p₂₁/p₂₂ = α₂/(1 − α₂),

and hence

(p₁₁ p₂₂)/(p₁₂ p₂₁) = [α₁/(1 − α₁)] · [(1 − α₂)/α₂].

Similarly,

p₁₁/p₂₁ = β₁/(1 − β₁),   p₁₂/p₂₂ = β₂/(1 − β₂),

and hence

(p₁₁ p₂₂)/(p₁₂ p₂₁) = [β₁/(1 − β₁)] · [(1 − β₂)/β₂].

It follows that the types must satisfy

[α₁/(1 − α₁)] · [(1 − α₂)/α₂] = [β₁/(1 − β₁)] · [(1 − β₂)/β₂],    (10)

which is generally not the case. More precisely, the set of (α₁, α₂, β₁, β₂) ∈ [0, 1]⁴ satisfying condition (10) is a set of measure zero; it is a three-dimensional set in the four-dimensional cube [0, 1]⁴. Nyarko [21] proved that even the ratio of the dimension of the set of consistent BL-subspaces to the dimension of the set of BL-subspaces goes to zero as the latter goes to infinity. Summing up: most BL-subspaces are inconsistent and thus do not satisfy the common prior condition.

Bayesian Games and Bayesian Equilibrium

As we said, a game with incomplete information played by Bayesian players, often called a Bayesian game, is a game in which the players have incomplete information about the data of the game. Being Bayesian, each player has beliefs (a probability distribution) about any relevant data he does not know, including the beliefs of the other players. So far, we have developed the belief structure of such a situation, which is a BL-subspace Y in the universal belief space Ω(S, I). Now we add the action sets and the payoff functions. These are actually part of the description of the state of nature: the mapping s : Ω → S assigns to each state of the world ω the game-form s(ω) played at this state. To emphasize this interpretation of s(ω) as a game-form, we denote it also as Γ_ω:

Γ_ω = (I, (A_i(t_i(ω)))_{i∈I}, (u_i(ω))_{i∈I}),

where A_i(t_i(ω)) is the action set (pure strategies) of player i at ω, u_i(ω) : A(ω) → ℝ is his payoff function, and A(ω) = ×_{i∈I} A_i(t_i(ω)) is the set of action profiles at state ω. Note that while the actions of a player depend only on his type, his payoff depends on the actions and types of all the players. For a vector of actions a ∈ A(ω), we write u_i(ω, a) for u_i(ω)(a). Given a BL-subspace Y ⊆ Ω(S, I), we define the Bayesian game on Y as follows:

Definition 14 The Bayesian game on Y is a vector-payoff game in which:
 I = {1, ..., n} is the players' set.
 Σ_i, the strategy set of player i, is the set of mappings σ_i : Y → A_i that are T_i-measurable; in particular,

t_i(ω₁) = t_i(ω₂) ⟹ σ_i(ω₁) = σ_i(ω₂).

Let Σ = ×_{i∈I} Σ_i.
 The payoff function u_i for player i is a vector-valued function u_i = (u_{t_i})_{t_i∈T_i}, where u_{t_i} (the payoff function of player i of type t_i) is a mapping u_{t_i} : Σ → ℝ defined by

u_{t_i}(σ) = ∫_Y u_i(ω, σ(ω)) dt_i(ω).    (11)

Note that u_{t_i} is T_i-measurable, as it should be. When Y is a finite BL-subspace, the Bayesian game defined above is an n-person "game" in which the payoff for player i is a vector with one payoff for each of his types (hence a vector of dimension |T_i|). It becomes a regular game-form for a given state of the world ω, since then the payoff to player i is u_{t_i(ω)}. However, these game-forms are not regular games since they are interconnected; the players do not know which of these "games" they are playing (since they do not know the state of the world ω). Thus, just like a Harsanyi game, a Bayesian game on a BL-subspace Y consists of a family of connected game-forms, one for each ω ∈ Y. However, unlike a Harsanyi game, a Bayesian game has no chance move that chooses the state of the world (or the vector of types). A way to transform a Bayesian game into a regular game was suggested by R. Selten and named by Harsanyi the Selten game G** (see p. 496 in [11]). This is a game with |T₁| × |T₂| × ... × |T_n| players (one for each type) in which each player t_i ∈ T_i chooses a strategy and then selects his (n − 1) partners, one from each T_j, j ≠ i, according to his beliefs t_i.


Bayesian Equilibrium

Although a Bayesian game is not a regular game, the Nash equilibrium concept based on the notion of best reply can be adapted to yield the solution concept of Bayesian equilibrium (also called Nash–Bayes equilibrium).

Definition 15 A vector of strategies σ = (σ₁, ..., σ_n) in a Bayesian game is called a Bayesian equilibrium if for all i in I and for all t_i in T_i,

u_{t_i}(σ) ≥ u_{t_i}(σ_{−i}; σ̃_i)  for all σ̃_i ∈ Σ_i,    (12)

where, as usual, σ_{−i} = (σ_j)_{j≠i} denotes the vector of strategies of players other than i.

Thus, a Bayesian equilibrium specifies a behavior for each player which is a best reply to what he believes is the behavior of the other players, that is, a best reply to the strategies of the other players given his type. In a game with complete information, which corresponds to a BL-subspace with one state of the world (Y = {ω}), there is only one type of each player and all beliefs place probability one on a singleton, so the Bayesian equilibrium is just the well-known Nash equilibrium.

Remark 16 It is readily seen that when Y is finite, any Bayesian equilibrium is a Nash equilibrium of the Selten game G**, in which each type is a player who selects the types of his partners according to his beliefs. Similarly, we can transform the Bayesian game into an ordinary game in strategic form by defining the payoff function of player i to be ũ_i = Σ_{t_i∈T_i} λ_{t_i} u_{t_i}, where the λ_{t_i} are strictly positive constants. Independently of the values of the constants λ_{t_i}, any Bayesian equilibrium is a Nash equilibrium of this game and vice versa. In particular, if we choose the constants so that Σ_{t_i∈T_i} λ_{t_i} = 1, we obtain the game suggested by Aumann and Maschler in 1967 (see p. 95 in [8]), and again the set of Nash equilibria of this game is precisely the set of Bayesian equilibria.

The Harsanyi Game Revisited

As we observed in Example 5, the belief structure of a consistent BL-subspace is the same as in a Harsanyi game after the chance move choosing the types. That is, the embedding of the Harsanyi game as a BL-subspace in the universal belief space captures only the interim stage, after the moment that each player gets to know his type. The Harsanyi game, on the other hand, is at the ex ante stage, before a player knows his type.
Then, what is the relation between the Nash equilibrium of the Harsanyi game at the ex ante stage and the equilibrium at the interim stage, namely, the Bayesian equilibrium of the corresponding BL-subspace?

This is an important question concerning the embedding of the Harsanyi game in the UBS since, as we said before, the chance move choosing the types does not appear explicitly in the UBS. The answer was given by Harsanyi (1967–8) (assuming that each type t_i has positive probability):

Theorem 17 (Harsanyi) The set of Nash equilibria of a Harsanyi game is identical to the set of Bayesian equilibria of the equivalent BL-subspace in the UBS.

In other words, this theorem states that any equilibrium in the ex ante stage is also an equilibrium at the interim stage and vice versa. In modeling situations of incomplete information, the interim stage is the natural one: if a player knows his beliefs (type), why should he analyze the situation, as Harsanyi suggests, from the ex ante point of view, as if his type were not known to him and he could equally well be of another type? Theorem 17 provides a technical answer to this question: the equilibria are the same in both games, and the equilibrium strategy of the ex ante game specifies for each type precisely his equilibrium strategy at the interim stage. In that respect, for a player who knows his type, the Harsanyi model is just an auxiliary game used to compute his equilibrium behavior. Of course, the deeper answer comes from the interactive nature of the situation: even though player i knows he is of type t_i, he knows that his partners do not know that, and that they may consider the possibility that he is of type t̃_i; since this affects their behavior, the behavior of type t̃_i is also relevant to player i, who knows he is of type t_i. Finally, Theorem 17 makes the Bayesian equilibrium the natural extension of the Nash equilibrium concept to games with incomplete information, for consistent or inconsistent beliefs, when the Harsanyi ordinary game model is unavailable.
Examples of Bayesian Equilibria

In Example 6, there are two players of two types each, with inconsistent mutual beliefs given by:

Beliefs of player I:

          II₁        II₂
  I₁      3/7        4/7
  I₂      2/3        1/3

Beliefs of player II:

          II₁        II₂
  I₁      1/2        4/5
  I₂      1/2        1/5
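As a quick sanity check, the odds-ratio condition (10) derived in the previous section can be evaluated for these belief tables (in the notation of Example 8, α_i is type I_i's probability on II₁ and β_j is type II_j's probability on I₁). A short sketch in exact rational arithmetic confirms that the beliefs admit no common prior:

```python
from fractions import Fraction as F

# Beliefs read off the tables above: alpha_i = P(II1 | I_i), beta_j = P(I1 | II_j).
alpha1, alpha2 = F(3, 7), F(2, 3)
beta1, beta2 = F(1, 2), F(4, 5)

# A common prior (p_ij) exists only if condition (10) holds:
#   [a1/(1-a1)] * [(1-a2)/a2]  ==  [b1/(1-b1)] * [(1-b2)/b2]
lhs = (alpha1 / (1 - alpha1)) * ((1 - alpha2) / alpha2)
rhs = (beta1 / (1 - beta1)) * ((1 - beta2) / beta2)

print(lhs, rhs)     # 3/8 versus 1/4
assert lhs != rhs   # condition (10) fails: no common prior exists
```

Since 3/8 ≠ 1/4, this BL-subspace is inconsistent, which is why the example below cannot be represented as a Harsanyi game.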

Assume that the payoff matrices for the four type profiles are G₁₁, G₁₂, G₂₁, G₂₂, where G_ij is the payoff matrix played when player I is of type I_i and player II is of type II_j.


As the beliefs are inconsistent, they cannot be represented by a Harsanyi game. Yet we can compute the Bayesian equilibrium of this Bayesian game. Let (x, y) be the strategy of player I, which is:

 Play the mixed strategy [x(T), (1 − x)(B)] when you are of type I₁.
 Play the mixed strategy [y(T), (1 − y)(B)] when you are of type I₂.

and let (z, t) be the strategy of player II, which is:

 Play the mixed strategy [z(L), (1 − z)(R)] when you are of type II₁.
 Play the mixed strategy [t(L), (1 − t)(R)] when you are of type II₂.

For 0 < x, y, z, t < 1, each player of each type must be indifferent between his two pure actions; this yields the equilibrium values

x = 3/5,   y = 2/5,   z = 7/9,   t = 2/9.

There is no overall "expected payoff", since this is a Bayesian game and not a game; the expected payoffs depend on the actual state of the world, i.e., the actual types of the players and the actual payoff matrix. For example, if the state of the world is ω₁₁ = (G₁₁; I₁, II₁), the expected payoffs are:

γ(ω₁₁) = (3/5, 2/5) G₁₁ (7/9, 2/9)ᵀ = (46/45, 6/45).

Similarly:

γ(ω₁₂) = (3/5, 2/5) G₁₂ (2/9, 7/9)ᵀ = (18/45, 4/45),
γ(ω₂₁) = (2/5, 3/5) G₂₁ (7/9, 2/9)ᵀ = (21/45, 21/45),
γ(ω₂₂) = (2/5, 3/5) G₂₂ (2/9, 7/9)ᵀ = (28/45, 70/45).

However, these are the objective payoffs as viewed by the analyst; they are viewed differently by the players. For player i of type t_i the relevant payoff is his subjective payoff u_{t_i}(σ) defined in (11). For example, at state ω₁₁ (or ω₁₂) player I believes that with probability 3/7 the state is ω₁₁, in which case his payoff is 46/45, and with probability 4/7 the state is ω₁₂, in which case his payoff is 18/45. Therefore his subjective expected payoff at state ω₁₁ is 3/7 · 46/45 + 4/7 · 18/45 = 2/3. Similar computations show that in states ω₂₁ or ω₂₂ player I "expects" a payoff of 7/15, while player II "expects" 3/10 at states ω₁₁ or ω₂₁ and 86/225 at states ω₁₂ or ω₂₂.

Bayesian equilibrium is widely used in auction theory, which constitutes an important and successful application of the theory of games with incomplete information. The simplest example is that of two buyers bidding in a first-price auction for an indivisible object. If each buyer i has a private value v_i for the object (which is independent of the private value v_j of the other buyer), and if he further believes that v_j is random with uniform probability distribution on [0, 1], then this is a Bayesian game in which the type of a player is his private valuation; that is, the type sets are T₁ = T₂ = [0, 1], a continuum. This is a consistent Bayesian game (that is, a Harsanyi game), since the beliefs are derived from the uniform probability distribution on T₁ × T₂ = [0, 1]², and it has a Bayesian equilibrium in which each player bids half of his private value: b_i(v_i) = v_i/2 (see, e.g., Chap. III in [25]). Although auction theory has developed far beyond this simple example, almost all the models studied so far are Bayesian games with consistent beliefs, that is, Harsanyi games. The main reason, of course, is that consistent Bayesian games are more manageable, since they can be described in terms of an equivalent ordinary game in strategic form.
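The claim that bidding half one's value is an equilibrium can be checked numerically. If the opponent bids half her value, a bid b wins with probability min(2b, 1), and a grid search over bids recovers b = v/2 as the best reply (a sketch; the grid resolution is an arbitrary choice):

```python
import numpy as np

def expected_payoff(b, v):
    # Opponent's value is uniform on [0, 1] and she bids half of it,
    # so bidding b wins with probability P(v_j / 2 < b) = min(2b, 1).
    return (v - b) * min(2.0 * b, 1.0)

def best_reply(v):
    grid = np.linspace(0.0, 1.0, 100_001)
    payoffs = np.array([expected_payoff(b, v) for b in grid])
    return grid[int(np.argmax(payoffs))]

for v in (0.2, 0.5, 0.9):
    assert abs(best_reply(v) - v / 2) < 1e-3   # best reply: bid half the value
```

The fixed point is immediate analytically as well: for b ≤ 1/2 the expected payoff is (v − b)·2b, which is maximized at b = v/2.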
However, inconsistent beliefs are rather plausible and exist in the marketplace in general, and even more so in auction situations. An example is the case of collusion among bidders: when a bidding ring is formed, it may well be that some of the bidders outside the ring are unaware of its existence and behave under the belief that all bidders are competitive. The members of the ring may or may not know whether the other bidders know about the ring, or they may be uncertain about it. This rather plausible mutual belief situation is typically inconsistent and has to be treated as an inconsistent Bayesian game, for which a Bayesian equilibrium is to be found.

Bayesian Equilibrium and Correlated Equilibrium

Correlated equilibrium was introduced by Aumann (1974) as the Nash equilibrium of a game extended by adding to

it random events about which the players have partial information. Basically, starting from an ordinary game, Aumann added a probability space and an information structure and obtained a game with incomplete information, the equilibrium of which he called a correlated equilibrium of the original game. The fact that the Nash equilibrium of a game with incomplete information is the Bayesian equilibrium suggests that the concept of correlated equilibrium is closely related to that of Bayesian equilibrium. In fact, Aumann noticed this and discussed it in a second paper, entitled "Correlated equilibrium as an expression of Bayesian rationality" [3]. In this section we review briefly, by way of an example, the concept of correlated equilibrium, and state formally its relation to the concept of Bayesian equilibrium.

Example 18 Consider a two-person game with actions {T, B} for player 1 and {L, R} for player 2, with corresponding payoffs given in the following matrix:

This game has three Nash equilibria: (T, R) with payoffs (2, 7), (B, L) with payoffs (7, 2), and the mixed equilibrium ([2/3(T), 1/3(B)], [2/3(L), 1/3(R)]) with payoffs (4⅔, 4⅔). Suppose that we add to the game a chance move that chooses an element of {T, B} × {L, R} according to the following probability distribution μ:

Let us now extend the game G to a game with incomplete information G* in which a chance move chooses an element of {T, B} × {L, R} according to the probability distribution μ above. Player 1 is informed of the first (left) component of the chosen element and player 2 is informed of the second (right) component. Each player then chooses an action in G and the payoff is made. If we interpret the partial information as a suggestion of which action to choose, then it is readily verified that following the suggestion is a Nash equilibrium of the extended game, yielding the payoffs (5, 5). This was called by Aumann a correlated equilibrium of the original game G. In our terminology, the extended game G* is a Bayesian game and its

Nash equilibrium is its Bayesian equilibrium. Thus, what we have here is that a correlated equilibrium of a game is just the Bayesian equilibrium of its extension to a game with incomplete information. We now make this a general formal statement. For simplicity, we use the Aumann model of a game with incomplete information. Let G = (I, (A_i)_{i∈I}, (u_i)_{i∈I}) be a game in strategic form, where I is the set of players, A_i is the set of actions (pure strategies) of player i, and u_i is his payoff function.

Definition 19 Given a game in strategic form G, an incomplete information extension (the I-extension) of the game G is the game G* given by

G* = (I, (A_i)_{i∈I}, (u_i)_{i∈I}, (Y, p), (π_i)_{i∈I}),

where (Y, p) is a finite probability space and π_i is a partition of Y (the information partition of player i).

This is an Aumann model of incomplete information and, as we noted before, it is also a Harsanyi type-based model in which the type of player i at state ω ∈ Y is t_i(ω) = π_i(ω), and a strategy of player i is a mapping from his type set to his mixed actions: σ_i : T_i → Δ(A_i). We identify a correlated equilibrium in the game G with a probability distribution μ on the vectors of actions A = A₁ × ... × A_n. Thus μ ∈ Δ(A) is a correlated equilibrium of the game G if, when a ∈ A is chosen according to μ and each player i is suggested to play a_i, his best reply is in fact to play the action a_i. Given a game with incomplete information G* as in Definition 19, any vector of strategies of the players σ = (σ₁, ..., σ_n) induces a probability distribution μ_σ ∈ Δ(A) on the vectors of actions a ∈ A. We can now state the relation between correlated and Bayesian equilibria:

Theorem 20 Let σ be a Bayesian equilibrium in the game of incomplete information G* = (I, (A_i)_{i∈I}, (u_i)_{i∈I}, (Y, p), (π_i)_{i∈I}); then the induced probability distribution μ_σ is a correlated equilibrium of the basic game G = (I, (A_i)_{i∈I}, (u_i)_{i∈I}).

The other direction is:

Theorem 21 Let μ be a correlated equilibrium of the game G = (I, (A_i)_{i∈I}, (u_i)_{i∈I}); then G has an extension to a game with incomplete information G* = (I, (A_i)_{i∈I}, (u_i)_{i∈I}, (Y, p), (π_i)_{i∈I}) with a Bayesian equilibrium σ for which μ_σ = μ.
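The incentive constraints defining a correlated equilibrium are easy to verify mechanically. The sketch below uses the payoff matrix of Aumann's well-known example, (T, L) = (6, 6), (T, R) = (2, 7), (B, L) = (7, 2), (B, R) = (0, 0); this matrix is an assumption here, chosen because it is consistent with the equilibrium payoffs quoted in Example 18, together with the distribution μ placing probability 1/3 on each of (T, L), (T, R) and (B, L):

```python
from fractions import Fraction as F

# Assumed payoff matrix (rows T/B for player 1, columns L/R for player 2):
#            L        R
#   T      (6,6)    (2,7)
#   B      (7,2)    (0,0)
U1 = {('T', 'L'): 6, ('T', 'R'): 2, ('B', 'L'): 7, ('B', 'R'): 0}
U2 = {('T', 'L'): 6, ('T', 'R'): 7, ('B', 'L'): 2, ('B', 'R'): 0}
# mu puts probability 1/3 on each of (T,L), (T,R), (B,L).
mu = {a: (F(1, 3) if a != ('B', 'R') else F(0)) for a in U1}

def obeys(mu, U1, U2):
    # Told only his own component, each player must weakly prefer to follow it.
    for a1 in ('T', 'B'):
        for d1 in ('T', 'B'):
            follow = sum(mu[(a1, a2)] * U1[(a1, a2)] for a2 in ('L', 'R'))
            deviate = sum(mu[(a1, a2)] * U1[(d1, a2)] for a2 in ('L', 'R'))
            if follow < deviate:
                return False
    for a2 in ('L', 'R'):
        for d2 in ('L', 'R'):
            follow = sum(mu[(a1, a2)] * U2[(a1, a2)] for a1 in ('T', 'B'))
            deviate = sum(mu[(a1, a2)] * U2[(a1, d2)] for a1 in ('T', 'B'))
            if follow < deviate:
                return False
    return True

assert obeys(mu, U1, U2)   # following the suggestion is a best reply
assert (sum(mu[a] * U1[a] for a in mu),
        sum(mu[a] * U2[a] for a in mu)) == (5, 5)   # the payoffs (5, 5)
```

The same `obeys` check applied to the induced distribution μ_σ of a Bayesian equilibrium σ is exactly the content of Theorem 20 in the finite case.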


Concluding Remarks and Future Directions

The Consistency Assumption

To the heated discussion of the merits and justification of the consistency assumption in economic and game-theoretic models, we would like to add a couple of remarks. In our opinion, the appropriate way of modeling an incomplete information situation is at the interim stage, that is, when a player knows his own beliefs (type). The Harsanyi ex ante model is just an auxiliary construction for the analysis. Actually, this was also the view of Harsanyi, who justified his model by proving that it provides the same equilibria as the interim stage situation it generates (Theorem 17). The Harsanyi doctrine says roughly that our models "should be consistent", and that if we get an inconsistent model it cannot be a "correct" model of the situation at hand. This becomes less convincing if we agree that the interim stage is what we are interested in: not only are most mutual beliefs inconsistent, as we saw in the section entitled "Consistent Beliefs and Common Priors" above, but it is hard to argue convincingly that the model in Example 5 describes an adequate mutual belief situation while the model in Example 6 does not; the only difference between the two is that in one model a certain type's beliefs are [3/5 ω₁₁, 2/5 ω₂₁] while in the other model his beliefs are [1/2 ω₁₁, 1/2 ω₂₁].

A related point is that if players' beliefs are the data of the situation (in the interim stage), then these data are typically imprecise and rather hard to measure. Therefore any meaningful result of our analysis should be robust to small changes in the beliefs. This robustness cannot be achieved within consistent belief systems, which form a thin set of measure zero in the universal belief space.

Knowledge and Beliefs

Our interest in this article was mostly in the notion of beliefs of players and less in the notion of knowledge. These are two related but different notions.
Knowledge is defined through a knowledge operator satisfying some axioms. Beliefs are defined by means of probability distributions. Aumann's model, discussed in the section entitled "Aumann's Model" above, has both elements: the knowledge is generated by the partitions of the players, while the beliefs are generated by the probability P on the space Y (and the partitions). Being interested in the subjective beliefs of the player, we could interpret "at state of the world ω ∈ Ω player i knows the event E ⊆ Ω" to mean "at state of the world ω ∈ Ω player i assigns to the event E ⊆ Ω probability 1". However, in the universal belief space, "belief with probability 1" does not satisfy a central axiom of the knowledge operator, namely: if at ω ∈ Ω player i knows the event E ⊆ Ω, then ω ∈ E. That is, if a player knows an event, then this event has in fact happened. In the universal belief space, where all coherent beliefs are possible, in a state ω ∈ Ω a player may assign probability 1 to the event {ω′} where ω′ ≠ ω. However, if in a BL-subspace Y the condition ω ∈ P_i(ω) is satisfied for all i and all ω ∈ Y, then belief with probability 1 is a knowledge operator on Y. This was in fact the case in Aumann's and in Harsanyi's models where, by construction, the support of the beliefs of a player at the state ω always included ω. For a detailed discussion of the relationship between knowledge and beliefs in the universal belief space, see Vassilakis and Zamir [24].

Future Directions

We have not said much about the existence of Bayesian equilibrium, mainly because it has not been studied enough and there are no general results, especially in the non-consistent case. We can readily see that a Bayesian game on a finite BL-subspace in which each state of nature s(ω) is a finite game-form has a Bayesian equilibrium in mixed strategies. This can be proved, for example, by transforming the Bayesian game into an ordinary finite game (see Remark 16) and applying the Nash theorem for finite games. For games with incomplete information with a continuum of strategies and payoff functions that are not necessarily continuous, there are no general existence results; even in consistent auction models, existence has been proved for specific models separately (see [15,20,22]). Establishing general existence results for large families of Bayesian games is clearly an important direction of future research. Since, as we argued before, most games are Bayesian games, the existence of a Bayesian equilibrium should, and could, reach at least the level of generality available for the existence of a Nash equilibrium.

Acknowledgments

I am grateful to two anonymous reviewers for their helpful comments.
Bibliography

1. Aumann R (1974) Subjectivity and correlation in randomized strategies. J Math Econ 1:67–96
2. Aumann R (1976) Agreeing to disagree. Ann Stat 4:1236–1239
3. Aumann R (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55:1–18
4. Aumann R (1998) Common priors: A reply to Gul. Econometrica 66:929–938
5. Aumann R (1999) Interactive epistemology I: Knowledge. Intern J Game Theory 28:263–300


6. Aumann R (1999) Interactive epistemology II: Probability. Intern J Game Theory 28:301–314
7. Aumann R, Heifetz A (2002) Incomplete information. In: Aumann R, Hart S (eds) Handbook of Game Theory, vol 3. Elsevier, pp 1666–1686
8. Aumann R, Maschler M (1995) Repeated Games with Incomplete Information. MIT Press, Cambridge
9. Brandenburger A, Dekel E (1993) Hierarchies of beliefs and common knowledge. J Econ Theory 59:189–198
10. Gul F (1998) A comment on Aumann's Bayesian view. Econometrica 66:923–927
11. Harsanyi J (1967–8) Games with incomplete information played by 'Bayesian' players, parts I–III. Manag Sci 14:159–182, 320–334, 486–502
12. Heifetz A (1993) The Bayesian formulation of incomplete information, the non-compact case. Intern J Game Theory 21:329–338
13. Heifetz A, Mongin P (2001) Probability logic for type spaces. Games Econ Behav 35:31–53
14. Heifetz A, Samet D (1998) Topology-free typology of beliefs. J Econ Theory 82:324–341
15. Maskin E, Riley J (2000) Asymmetric auctions. Rev Econ Stud 67:413–438

16. Meier M (2001) An infinitary probability logic for type spaces. CORE Discussion Paper 2001/61
17. Mertens J-F, Sorin S, Zamir S (1994) Repeated Games, Part A: Background Material. CORE Discussion Paper No. 9420
18. Mertens J-F, Zamir S (1985) Formulation of Bayesian analysis for games with incomplete information. Intern J Game Theory 14:1–29
19. Milgrom PR, Stokey N (1982) Information, trade and common knowledge. J Econ Theory 26:17–27
20. Milgrom PR, Weber RJ (1982) A theory of auctions and competitive bidding. Econometrica 50:1089–1122
21. Nyarko Y (1991) Most games violate the Harsanyi doctrine. C.V. Starr working paper #91-39, NYU
22. Reny P, Zamir S (2004) On the existence of pure strategy monotone equilibria in asymmetric first-price auctions. Econometrica 72:1105–1125
23. Sorin S, Zamir S (1985) A 2-person game with lack of information on 1½ sides. Math Oper Res 10:17–23
24. Vassilakis S, Zamir S (1993) Common beliefs and common knowledge. J Math Econ 22:495–505
25. Wolfstetter E (1999) Topics in Microeconomics. Cambridge University Press, Cambridge

Bayesian Statistics

DAVID DRAPER
Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California, Santa Cruz, USA

Article Outline

Glossary
Definition of the Subject and Introduction
The Bayesian Statistical Paradigm
Three Examples
Comparison with the Frequentist Statistical Paradigm
Future Directions
Bibliography

Glossary

Bayes' theorem; prior, likelihood and posterior distributions Given (a) θ, something of interest which is unknown to the person making an uncertainty assessment, conveniently referred to as You, (b) y, an information source which is relevant to decreasing Your uncertainty about θ, (c) a desire to learn about θ from y in a way that is both internally and externally logically consistent, and (d) B, Your background assumptions and judgments about how the world works, as these assumptions and judgments relate to learning about θ from y, it can be shown that You are compelled in this situation to reason within the standard rules of probability as the basis of Your inferences about θ, predictions of future data y*, and decisions in the face of uncertainty (see below for contrasts between inference, prediction and decision-making), and to quantify Your uncertainty about any unknown quantities through conditional probability distributions. When inferences about θ are the goal, Bayes' Theorem provides a means of combining all relevant information internal and external to y:

p(θ | y, B) = c p(θ | B) l(θ | y, B).    (1)

Here, for example in the case in which θ is a real-valued vector of length k, (a) p(θ|B) is Your prior distribution about θ given B (in the form of a probability density function), which quantifies all relevant information available to You about θ external to y; (b) c is a positive normalizing constant, chosen to make the density on the left side of the equation integrate to 1; (c) l(θ|y, B) is Your likelihood distribution for θ given y and B, which is defined to be a density-normalized multiple of Your sampling distribution p(y|θ, B) for future data values y* given θ and B, but re-interpreted as a function of θ for fixed y; and (d) p(θ|y, B) is Your posterior distribution about θ given y and B, which summarizes Your current total information about θ and solves the basic inference problem.

Bayesian parametric and non-parametric modeling (1) Following de Finetti [23], a Bayesian statistical model is a joint predictive distribution p(y₁, ..., y_n) for observable quantities y_i that have not yet been observed, and about which You are therefore uncertain. When the y_i are real-valued, often You will not regard them as probabilistically independent (informally, the y_i are independent if information about any of them does not help You to predict the others); but it may be possible to identify a parameter vector θ = (θ₁, ..., θ_k) such that You would judge the y_i conditionally independent given θ, and would therefore be willing to model them via the relation

p(y₁, ..., y_n | θ) = ∏_{i=1}^n p(y_i | θ).    (2)

When combined with a prior distribution p(θ) on θ that is appropriate to the context, this is Bayesian parametric modeling, in which p(y_i|θ) will often have a standard distributional form (such as binomial, Poisson or Gaussian). (2) When a (finite) parameter vector that induces conditional independence cannot be found, if You judge Your uncertainty about the real-valued y_i exchangeable (see below), then a representation theorem of de Finetti [21] states informally that all internally logically consistent predictive distributions p(y₁, ..., y_n) can be expressed in a way that is equivalent to the hierarchical model (see below)

(F | B) ~ p(F | B)
(y_i | F, B) ~ IID F,    (3)

where (a) F is the cumulative distribution function (CDF) of the underlying process (y₁, y₂, ...) from which You are willing to regard p(y₁, ..., y_n) as (in effect) like a random sample and (b) p(F|B) is Your prior distribution on the space F of all CDFs on the real line. This (placing probability distributions on infinite-dimensional spaces such as F) is Bayesian non-parametric modeling, in which priors involving Dirichlet processes and/or Pólya trees (see Sect. "Inference: Parametric and Non-Parametric Modeling of Count Data") are often used.
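De Finetti's mixture representation (3) can be illustrated by a small simulation sketch: the mixing distribution is reduced to a single Bernoulli success probability θ with a Beta(2, 3) mixing law (a hypothetical choice), θ is drawn once per sequence, and the y_i are drawn iid given θ. Exchangeability then shows up as the two orderings (1, 0) and (0, 1) having the same probability, E[θ(1 − θ)] = 1/5 for this mixing law:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rep = 200_000

# de Finetti mixture: draw theta once per sequence, then two iid
# Bernoulli(theta) observations given theta.
theta = rng.beta(2.0, 3.0, size=n_rep)          # hypothetical Beta(2, 3) mixing law
y = (rng.random((n_rep, 2)) < theta[:, None]).astype(int)

p10 = np.mean((y[:, 0] == 1) & (y[:, 1] == 0))
p01 = np.mean((y[:, 0] == 0) & (y[:, 1] == 1))

# Exchangeability: both orderings have probability E[theta*(1-theta)] = 0.2.
assert abs(p10 - 0.2) < 0.01
assert abs(p01 - 0.2) < 0.01
```

Note that the y_i produced this way are exchangeable but not independent: observing y₁ = 1 shifts Your beliefs about θ and hence Your prediction of y₂.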


Exchangeability A sequence y = (y₁, ..., y_n) of random variables (for n ≥ 1) is (finitely) exchangeable if the joint probability distribution p(y₁, ..., y_n) of the elements of y is invariant under permutation of the indices (1, ..., n), and a countably infinite sequence (y₁, y₂, ...) is (infinitely) exchangeable if every finite subsequence is finitely exchangeable.

Hierarchical modeling Often Your uncertainty about something unknown to You can be seen to have a nested or hierarchical character. One class of examples arises in cluster sampling in fields such as education and medicine, in which students (level 1) are nested within classrooms (level 2) and patients (level 1) within hospitals (level 2); cluster sampling involves random samples (and therefore uncertainty) at two or more levels in such a data hierarchy (examples of this type of hierarchical modeling are given in Sect. "Strengths and Weaknesses of the Two Approaches"). Another, quite different, class of examples of Bayesian hierarchical modeling is exemplified by equation (3) above, in which it was helpful to decompose Your overall predictive uncertainty about (y₁, ..., y_n) into (a) uncertainty about F and then (b) uncertainty about the y_i given F (examples of this type of hierarchical modeling appear in Sect. "Inference and Prediction: Binary Outcomes with No Covariates" and "Inference: Parametric and Non-Parametric Modeling of Count Data").

Inference, prediction and decision-making; samples and populations Given a data source y, inference involves drawing probabilistic conclusions about the underlying process that gave rise to y, prediction involves summarizing uncertainty about future observable data values y*, and decision-making involves looking for optimal behavioral choices in the face of uncertainty (about either the underlying process, or the future, or both).
In some cases inference takes the form of reasoning backwards from a sample of data values to a population: a (larger) universe of possible data values from which You judge that the sample has been drawn in a manner that is representative (i.e., so that the sampled and unsampled values in the population are (likely to be) similar in relevant ways).

Mixture modeling Given y, unknown to You, and B, Your background assumptions and judgments relevant to y, You have a choice: You can either model (Your uncertainty about) y directly, through the probability distribution p(y|B), or (if that is not feasible) You can identify a quantity x upon which You judge y to depend and model y hierarchically, in two stages: first by modeling x, through the probability distribution p(x|B), and then by modeling y given x, through the probability distribution p(y|x, B):

p(y | B) = ∫_X p(y | x, B) p(x | B) dx,    (4)

where X is the space of possible values of x over which Your uncertainty is expressed. This is mixture modeling, a special case of hierarchical modeling (see above). In hierarchical notation, (4) can be re-expressed as

y = { x;  (y | x) }.    (5)

Examples of mixture modeling in this article include (a) equation (3) above, with F playing the role of x; (b) the basic equation governing Bayesian prediction, discussed in Sect. "The Bayesian Statistical Paradigm"; (c) Bayesian model averaging (Sect. "The Bayesian Statistical Paradigm"); (d) de Finetti's representation theorem for binary outcomes (Sect. "Inference and Prediction: Binary Outcomes with No Covariates"); (e) random-effects parametric and non-parametric modeling of count data (Sect. "Inference: Parametric and Non-Parametric Modeling of Count Data"); and (f) integrated likelihoods in Bayes factors (Sect. "Decision-Making: Variable Selection in Generalized Linear Models; Bayesian Model Selection").

Probability – frequentist and Bayesian In the frequentist probability paradigm, attention is restricted to phenomena that are inherently repeatable under (essentially) identical conditions; then, for an event A of interest, P_f(A) is the limiting relative frequency with which A occurs in the (hypothetical) repetitions, as the number of repetitions n → ∞. By contrast, Your Bayesian probability P_B(A|B) is the numerical weight of evidence, given Your background information B relevant to A, in favor of a true-false proposition A whose truth status is uncertain to You, obeying a series of reasonable axioms to ensure that Your Bayesian probabilities are internally logically consistent.

Utility To ensure internal logical consistency, optimal decision-making proceeds by (a) specifying a utility function U(a, θ₀) quantifying the numerical value associated with taking action a if the unknown θ is really θ₀ and (b) maximizing expected utility, where the expectation is taken over uncertainty in θ as quantified by the posterior distribution p(θ|y, B).

Definition of the Subject and Introduction

Statistics may be defined as the study of uncertainty: how to measure it, and how to make choices in the

Bayesian Statistics

face of it. Uncertainty is quantified via probability, of which there are two leading paradigms, frequentist (discussed in Sect. "Comparison with the Frequentist Statistical Paradigm") and Bayesian. In the Bayesian approach to probability the primitive constructs are true-false propositions A whose truth status is uncertain, and the probability of A is the numerical weight of evidence in favor of A, constrained to obey a set of axioms to ensure that Bayesian probabilities are coherent (internally logically consistent). The discipline of statistics may be divided broadly into four activities: description (graphical and numerical summaries of a data set y, without attempting to reason outward from it; this activity is almost entirely non-probabilistic and will not be discussed further here), inference (drawing probabilistic conclusions about the underlying process that gave rise to y), prediction (summarizing uncertainty about future observable data values y*), and decision-making (looking for optimal behavioral choices in the face of uncertainty). Bayesian statistics is an approach to inference, prediction and decision-making that is based on the Bayesian probability paradigm, in which uncertainty about an unknown θ (this is the inference problem) is quantified by means of a conditional probability distribution p(θ | y, B); here y is all available relevant data and B summarizes the background assumptions and judgments of the person making the uncertainty assessment. Prediction of a future y* is similarly based on the conditional probability distribution p(y* | y, B), and optimal decision-making proceeds by (a) specifying a utility function U(a, θ₀) quantifying the numerical reward associated with taking action a if the unknown θ is really θ₀ and (b) maximizing expected utility, where the expectation is taken over uncertainty in θ as quantified by p(θ | y, B).
The Bayesian Statistical Paradigm

Statistics is the branch of mathematical and scientific inquiry devoted to the study of uncertainty: its consequences, and how to behave sensibly in its presence. The subject draws heavily on probability, a discipline which predates it by about 100 years: basic probability theory can be traced [48] to work of Pascal, Fermat and Huygens in the 1650s, and the beginnings of statistics [34,109] are evident in work of Bayes published in the 1760s. The Bayesian statistical paradigm consists of three basic ingredients:

• θ, something of interest which is unknown (or only partially known) to the person making the uncertainty assessment, conveniently referred to, in a convention proposed by Good (1950), as You. Often θ is a parameter vector of real numbers (of finite length k, say) or

a matrix, but it can literally be almost anything: for example, a function (three leading examples are a cumulative distribution function (CDF), a density, or a regression surface), a phylogenetic tree, an image of a region on the surface of Mars at a particular moment in time, …

• y, an information source which is relevant to decreasing Your uncertainty about θ. Often y is a vector of real numbers (of finite length n, say), but it can also literally be almost anything: for instance, a time series, a movie, the text in a book, …

• A desire to learn about θ from y in a way that is both coherent (internally consistent: in other words, free of internal logical contradictions; Bernardo and Smith [11] give a precise definition of coherence) and well-calibrated (externally consistent: for example, capable of making accurate predictions of future data y*).

It turns out [23,53] that You are compelled in this situation to reason within the standard rules of probability (see below) as the basis of Your inferences about θ, predictions of future data y*, and decisions in the face of uncertainty, and to quantify Your uncertainty about any unknown quantities through conditional probability distributions, as in the following three basic equations of Bayesian statistics:

p(θ | y, B) = c · p(θ | B) · l(θ | y, B)
p(y* | y, B) = ∫_Θ p(y* | θ, B) p(θ | y, B) dθ
a* = argmax_{a ∈ A} E_(θ|y,B) [U(a, θ)] .   (6)
(The basic rules of probability [71] are: for any true-false propositions A and B and any background assumptions and judgments 𝓑 (writing 𝓑 here to distinguish the background from the proposition B): (convexity) 0 ≤ P(A | 𝓑) ≤ 1, with equality at 1 iff A is known to be true under 𝓑; (multiplication) P(A and B | 𝓑) = P(A | 𝓑) P(B | A, 𝓑) = P(B | 𝓑) P(A | B, 𝓑); and (addition) P(A or B | 𝓑) = P(A | 𝓑) + P(B | 𝓑) − P(A and B | 𝓑).) The meaning of the equations in (6) is as follows.

• B stands for Your background (often not fully stated) assumptions and judgments about how the world works, as these assumptions and judgments relate to learning about θ from y. B is often omitted from the basic equations (sometimes with unfortunate consequences), yielding the simpler-looking forms

p(θ | y) = c · p(θ) · l(θ | y)
p(y* | y) = ∫_Θ p(y* | θ) p(θ | y) dθ
a* = argmax_{a ∈ A} E_(θ|y) [U(a, θ)] .   (7)
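The first equation in (7) can be sketched numerically by discretizing θ on a grid: the posterior is the pointwise product of prior and likelihood, with c supplied by normalization. The Bernoulli data and flat prior below are invented for illustration.

```python
# Grid-based Bayes' Theorem: posterior = c * prior * likelihood, as in (7).
K = 1000
grid = [(i + 0.5) / K for i in range(K)]       # theta values in (0, 1)
prior = [1.0 / K] * K                          # flat prior over the grid
s, n = 7, 10                                   # invented data: 7 successes in 10 trials

likelihood = [t ** s * (1 - t) ** (n - s) for t in grid]
unnormalized = [p * l for p, l in zip(prior, likelihood)]
c = 1.0 / sum(unnormalized)                    # the normalizing constant c
posterior = [c * u for u in unnormalized]

post_mean = sum(t * p for t, p in zip(grid, posterior))
print(round(post_mean, 3))                     # close to (s + 1) / (n + 2)
```

The grid posterior sums to 1 by construction, and its mean closely matches the exact conjugate answer (s + 1) / (n + 2) for a flat prior.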

• p(θ | B) is Your prior information about θ given B, in the form of a probability density function (PDF) or probability mass function (PMF) if θ lives continuously or discretely on ℝ^k (this is generically referred to as Your prior distribution), and p(θ | y, B) is Your posterior distribution about θ given y and B, which summarizes Your current total information about θ and solves the basic inference problem. These are actually not very good names for p(θ | B) and p(θ | y, B), because (for example) p(θ | B) really stands for all (relevant) information about θ (given B) external to y, whether that information was obtained before (or after) y arrives, but (a) they do emphasize the sequential nature of learning and (b) through long usage it would be difficult for more accurate names to be adopted.

• c (here and throughout) is a generic positive normalizing constant, inserted into the first equation in (6) to make the left-hand side integrate (or sum) to 1 (as any coherent distribution must).

• p(y* | θ, B) is Your sampling distribution for future data values y* given θ and B (and presumably You would use the same sampling distribution p(y | θ, B) for (past) data values y, mentally turning the clock back to a point before the data arrives and thinking about what values of y You might see). This assumes that You are willing to regard Your data as like random draws from a population of possible data values (an heroic assumption in some cases, for instance with observational rather than randomized data; this same assumption arises in the frequentist statistical paradigm, discussed below in Sect. "Comparison with the Frequentist Statistical Paradigm").

• l(θ | y, B) is Your likelihood function for θ given y and B, which is defined to be any positive constant multiple of the sampling distribution p(y | θ, B) but re-interpreted as a function of θ for fixed y:

l(θ | y, B) = c · p(y | θ, B) .   (8)
The likelihood function is also central to one of the main approaches to frequentist statistical inference, developed by Fisher [37]; the two approaches are contrasted in Sect. "Comparison with the Frequentist Statistical Paradigm". All of the symbols in the first equation in (6) have now been defined, and this equation can be recognized as Bayes' Theorem, named after Bayes [5] because a special case of it appears prominently in work of his that was published posthumously. It describes how to pass coherently from information about θ external to y (quantified in the prior distribution p(θ | B)) to information

both internal and external to y (quantified in the posterior distribution p(θ | y, B)), via the likelihood function l(θ | y, B): You multiply the prior and likelihood pointwise in θ and normalize so that the posterior distribution p(θ | y, B) integrates (or sums) to 1.

• According to the second equation in (6), p(y* | y, B), Your (posterior) predictive distribution for future data y* given (past) data y and B, which solves the basic prediction problem, must be a weighted average of Your sampling distribution p(y* | θ, B) weighted by Your current best information p(θ | y, B) about θ given y and B; in this integral Θ is the space of possible values of θ over which Your uncertainty is expressed. (The second equation in (6) contains a simplifying assumption that should be mentioned: in full generality the first term p(y* | θ, B) inside the integral would be p(y* | y, θ, B), but it is almost always the case that the information in y is redundant in the presence of complete knowledge of θ, in which case p(y* | y, θ, B) = p(y* | θ, B); this state of affairs could be described by saying that the past and future are conditionally independent given the truth. A simple example of this phenomenon is provided by coin-tossing: if You are watching a Bernoulli(θ) process unfold (see Sect. "Inference and Prediction: Binary Outcomes with No Covariates") whose success probability θ is unknown to You, the information that 8 of the first 10 tosses have been heads is definitely useful to You in predicting the 11th toss, but if instead You somehow knew that θ was 0.7, the outcome of the first 10 tosses would be irrelevant to You in predicting any future tosses.)
• Finally, in the context of making a choice in the face of uncertainty, A is Your set of possible actions, U(a, θ₀) is the numerical value (utility) You attach to taking action a if the unknown θ is really θ₀ (specified, without loss of generality, so that large utility values are preferred by You), and the third equation in (6) says that to make the choice coherently You should find the action a* that maximizes expected utility (MEU); here the expectation

E_(θ|y,B) [U(a, θ)] = ∫_Θ U(a, θ) p(θ | y, B) dθ   (9)

is taken over uncertainty in θ as quantified by the posterior distribution p(θ | y, B).

This summarizes the entire Bayesian statistical paradigm, which is driven by the three equations in (6). Examples of its use include clinical trial design [56] and analysis [105]; spatio-temporal modeling, with environmental applications [101]; forecasting and dynamic linear models [115]; non-parametric estimation of receiver operating characteristic curves, with applications in medicine and agriculture [49]; finite selection models, with health policy applications [79]; Bayesian CART model search, with applications in breast cancer research [16]; construction of radiocarbon calibration curves, with archaeological applications [15]; factor regression models, with applications to gene expression data [114]; mixture modeling for high-density genotyping arrays, with bioinformatic applications [100]; the EM algorithm for Bayesian fitting of latent process models [76]; state-space modeling, with applications in particle-filtering [92]; causal inference [42,99]; hierarchical modeling of DNA sequences, with genetic and medical applications [77]; hierarchical Poisson regression modeling, with applications in health care evaluation [17]; multiscale modeling, with engineering and financial applications [33]; expected posterior prior distributions for model selection [91]; nested Dirichlet processes, with applications in the health sciences [96]; Bayesian methods in the study of sustainable fisheries [74,82]; hierarchical non-parametric meta-analysis, with medical and educational applications [81]; and structural equation modeling of multilevel data, with applications to health policy [19].

Challenges to the paradigm include the following:

• Q: How do You specify the sampling distribution/likelihood function that quantifies the information about the unknown θ internal to Your data set y?
A: (1) The solution to this problem, which is common to all approaches to statistical inference, involves imagining future data y* from the same process that has yielded or will yield Your data set y; often the variability You expect in future data values can be quantified (at least approximately) through a standard parametric family of distributions (such as the Bernoulli/binomial for binary data, the Poisson for count data, and the Gaussian for real-valued outcomes), and the parameter vector θ of this family becomes the unknown of interest. (2) Uncertainty in the likelihood function is referred to as model uncertainty [67]; a leading approach to quantifying this source of uncertainty is Bayesian model averaging [18,25,52], in which uncertainty about the models M in an ensemble 𝓜 of models (specifying 𝓜 is part of B) is assessed and propagated for a quantity, such as a future data value y*, that is common to all models via the expression

p(y* | y, B) = ∫_𝓜 p(y* | y, M, B) p(M | y, B) dM .   (10)

In other words, to make coherent predictions in the presence of model uncertainty You should form a weighted average of the conditional predictive distributions p(y* | y, M, B), weighted by the posterior model probabilities p(M | y, B). Other potentially useful approaches to model uncertainty include Bayesian non-parametric modeling, which is examined in Sect. "Inference: Parametric and Non-Parametric Modeling of Count Data", and methods based on cross-validation [110], in which (in Bayesian language) part of the data is used to specify the prior distribution on 𝓜 (which is an input to calculating the posterior model probabilities) and the rest of the data is employed to update that prior.

• Q: How do You quantify information about the unknown θ external to Your data set y in the prior probability distribution p(θ | B)?

A: (1) There is an extensive literature on elicitation of prior (and other) probabilities; notable references include O'Hagan et al. [85] and the citations given there. (2) If θ is a parameter vector and the likelihood function is a member of the exponential family [11], the prior distribution can be chosen in such a way that the prior and posterior distributions for θ have the same mathematical form (such a prior is said to be conjugate to the given likelihood); this may greatly simplify the computations, and often prior information can (at least approximately) be quantified by choosing a member of the conjugate family (see Sect. "Inference and Prediction: Binary Outcomes with No Covariates" for an example of both of these phenomena).
In situations where it is not precisely clear how to quantify the available information external to y, two sets of tools are available:

• Sensitivity analysis [30], also known as pre-posterior analysis [4]: Before the data have begun to arrive, You can (a) generate data similar to what You expect You will see, (b) choose a plausible prior specification and update it to the posterior on the quantities of greatest interest, (c) repeat (b) across a variety of plausible alternatives, and (d) see if there is substantial stability in conclusions across the variations in prior specification. If so, fine; if not, this approach can be combined with hierarchical modeling [68]: You can collect all of the plausible priors and add a layer hierarchically to the prior specification, with the new layer indexing variation across the prior alternatives.

• Bayesian robustness [8,95]: If, for example, the context of the problem implies that You only wish to specify that the prior distribution belongs to an infinite-dimensional class (such as, for priors on (0, 1), the class of monotone non-increasing functions)
with (for instance) bounds on the first two moments, You can in turn quantify bounds on summaries of the resulting posterior distribution, which may be narrow enough to demonstrate that Your uncertainty in specifying the prior does not lead to differences that are large in practical terms.

Often context suggests specification of a prior that has relatively little information content in relation to the likelihood information; for reasons that are made clear in Sect. "Inference and Prediction: Binary Outcomes with No Covariates", such priors are referred to as relatively diffuse or flat (the term non-informative is sometimes also used, but this seems worth avoiding, because any prior specification takes a particular position regarding the amount of relevant information external to the data). See Bernardo [10] and Kass and Wasserman [60] for a variety of formal methods for generating diffuse prior distributions.

• Q: How do You quantify Your utility function U(a, θ) for optimal decision-making?

A: There is a rather less extensive statistical literature on elicitation of utility than probability; notable references include Fishburn [35,36], Schervish et al. [103], and the citations in Bernardo and Smith [11]. There is a parallel (and somewhat richer) economics literature on utility elicitation; see, for instance, Abdellaoui [1] and Blavatskyy [12]. Sect. "Decision-Making: Variable Selection in Generalized Linear Models; Bayesian Model Selection" provides a decision-theoretic example.

• Suppose that θ = (θ₁, …, θ_k) is a parameter vector of length k. Then (a) computing the normalizing constant in Bayes' Theorem

c = [ ∫ ⋯ ∫ p(y | θ₁, …, θ_k, B) p(θ₁, …, θ_k | B) dθ₁ ⋯ dθ_k ]⁻¹   (11)

involves evaluating a k-dimensional integral; (b) the predictive distribution in the second equation in (6) involves another k-dimensional integral; and (c) the posterior p(θ₁, …, θ_k | y, B) is a k-dimensional probability distribution, which for k > 2 can be difficult to visualize, so that attention often focuses on the marginal posterior distributions

p(θ_j | y, B) = ∫ ⋯ ∫ p(θ₁, …, θ_k | y, B) dθ_{−j}   (12)

for j = 1, …, k, where θ_{−j} is the vector θ with component j omitted; each of these marginal distributions involves a (k − 1)-dimensional integral. If k is large these

integrals can be difficult or impossible to evaluate exactly, and a general method for computing accurate approximations to them proved elusive from the time of Bayes in the eighteenth century until recently (in the late eighteenth century Laplace [63] developed an analytical method, which today bears his name, for approximating integrals that arise in Bayesian work [11], but his method is not as general as the computationally-intensive techniques in widespread current use). Around 1990 there was a fundamental shift in Bayesian computation, with the belated discovery by the statistics profession of a class of techniques – Markov chain Monte Carlo (MCMC) methods [41,44] – for approximating high-dimensional Bayesian integrals in a computationally-intensive manner, which had been published in the chemical physics literature in the 1950s [78]; these methods came into focus for the Bayesian community at a moment when desktop computers had finally become fast enough to make use of such techniques. MCMC methods approximate integrals associated with the posterior distribution p(θ | y, B) by (a) creating a Markov chain whose equilibrium distribution is the desired posterior and (b) sampling from this chain, started from an initial value θ⁽⁰⁾, (i) until equilibrium has been reached (all draws up to this point are typically discarded) and (ii) for a sufficiently long period thereafter to achieve the desired approximation accuracy.
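Steps (a)–(b) can be sketched with a minimal random-walk Metropolis sampler (one member of the MCMC family), here targeting the posterior of a Bernoulli success probability under a flat prior; the data, proposal scale, chain length, and burn-in period are all invented choices for illustration.

```python
import math
import random

random.seed(1)
s, n = 69, 385                       # invented data: s successes in n Bernoulli trials

def log_post(theta):
    # log posterior (up to a constant) under a flat prior on (0, 1)
    if not 0.0 < theta < 1.0:
        return -math.inf             # zero density outside the unit interval
    return s * math.log(theta) + (n - s) * math.log(1.0 - theta)

theta, draws = 0.5, []
for _ in range(20000):
    proposal = theta + random.gauss(0.0, 0.05)           # symmetric random walk
    log_ratio = log_post(proposal) - log_post(theta)
    if random.random() < math.exp(min(0.0, log_ratio)):  # Metropolis accept step
        theta = proposal
    draws.append(theta)

kept = draws[2000:]                  # discard the first 2000 draws as burn-in
print(round(sum(kept) / len(kept), 3))
```

With a flat prior the exact posterior mean is (s + 1) / (n + 2) ≈ 0.181; the chain's post-burn-in average approximates it to within Monte Carlo error.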
With the advent and refinement of MCMC methods since 1990, the Bayesian integration problem has been solved for a wide variety of models, with more ambitious sampling schemes made possible year after year with increased computing speeds: for instance, in problems in which the dimension of the parameter space is not fixed in advance (an example is regression changepoint problems [104], where the outcome y is assumed to depend linearly (apart from stochastic noise) on the predictor(s) x but with an unknown number of changes of slope and intercept and unknown locations for those changes), ordinary MCMC techniques will not work; in such problems methods such as reversible-jump MCMC [47,94] and Markov birth-death processes [108], which create Markov chains that permit trans-dimensional jumps, are required. The main drawback of MCMC methods is that they do not necessarily scale well as n (the number of data observations) increases; one alternative, popular in the machine learning community, is variational methods [55], which convert the integration problem into an optimization problem by (a) approximating the posterior distribution of interest by a family of distributions yielding a closed-form approximation to the integral


and (b) finding the member of the family that maximizes the accuracy of the approximation.

• Bayesian decision theory [6], based on maximizing expected utility, is unambiguous in its normative recommendation for how a single agent (You) should make a choice in the face of uncertainty, and it has had widespread success in fields such as economics (e. g., [2,50]) and medicine (e. g., [87,116]). It is well known, however [3,112], that Bayesian decision theory (or indeed any other formal approach that seeks an optimal behavioral choice) can be problematic when used normatively for group decision-making, because of conflicts in preferences among members of the group. This is an important unsolved problem.
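The MEU recipe in the third equation of (6) can be sketched by approximating each action's expected utility (9) with posterior draws and taking the argmax; the action set, utility function, and stand-in posterior below are all invented for illustration.

```python
import random

random.seed(0)
# invented stand-in posterior for an unknown rate theta
posterior_draws = [random.betavariate(70, 317) for _ in range(50000)]

actions = ["intervene", "do_nothing"]

def utility(a, theta):
    # invented utility: intervening has fixed cost 15, benefit rising with theta
    return 100.0 * theta - 15.0 if a == "intervene" else 0.0

def expected_utility(a):
    # Monte Carlo version of the expectation in (9)
    return sum(utility(a, t) for t in posterior_draws) / len(posterior_draws)

best = max(actions, key=expected_utility)
print(best, round(expected_utility(best), 2))
```

Because the expectation is taken over the full posterior, the optimal action automatically accounts for remaining uncertainty about theta, not just a point estimate.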

Three Examples

Inference and Prediction: Binary Outcomes with No Covariates

Consider the problem of measuring the quality of care at a particular hospital H. One way to do this is to examine the outcomes of that care, such as mortality, after adjusting for the burden of illness brought by the patients to H on admission. As an even simpler version of this problem, consider just the n binary mortality observables y = (y₁, …, y_n) (with mortality measured within 30 days of admission, say; 1 = died, 0 = lived) that You will see from all of the patients at H with a particular admission diagnosis (heart attack, say) during some prespecified future time window. You acknowledge Your uncertainty about which elements in the sequence will be 0s and which 1s, and You wish to quantify this uncertainty using the Bayesian paradigm. As de Finetti [20] noted, in this situation Your fundamental imperative is to construct a predictive distribution p(y₁, …, y_n | B) that expresses Your uncertainty about the future observables, rather than – as is perhaps more common – to reach immediately for a standard family of parametric models for the yᵢ (in other words, to posit the existence of a vector θ = (θ₁, …, θ_k) of parameters and to model the observables by appeal to a family p(yᵢ | θ, B) of probability distributions indexed by θ). Even though the yᵢ are binary, with all but the smallest values of n it still seems a formidable task to elicit from Yourself an n-dimensional predictive distribution p(y₁, …, y_n | B). De Finetti [20] showed, however, that the task is easier than it seems. In the absence of any further information about the patients, You notice that Your uncertainty about them is exchangeable: if someone (without telling You) were to rearrange the order in which their mortality outcomes become known to You, Your predictive distribution would not change. This still seems to leave p(y₁, …, y_n | B) substantially unspecified (where B now includes the judgment of exchangeability of the yᵢ), but de Finetti [20] proved a remarkable theorem which shows (in effect) that all exchangeable predictive distributions for a vector of binary observables are representable as mixtures of Bernoulli sampling distributions: if You're willing to regard (y₁, …, y_n) as the first n terms in an infinitely exchangeable binary sequence (y₁, y₂, …) (which just means that every finite subsequence is exchangeable), then to achieve coherence Your predictive distribution must be expressible as

p(y₁, …, y_n | B) = ∫₀¹ θ^{s_n} (1 − θ)^{n − s_n} p(θ | B) dθ ,   (13)

where s_n = Σᵢ₌₁ⁿ yᵢ. Here the quantity θ on the right side of (13) is more than just an integration variable: the equation says that in Your predictive modeling of the binary yᵢ You may as well proceed as if

• There is a quantity called θ, interpretable both as the marginal death probability p(yᵢ = 1 | θ, B) for each patient and as the long-run mortality rate in the infinite sequence (y₁, y₂, …) (which serves, in effect, as a population of values to which conclusions from the data can be generalized);

• Conditional on θ and B, the yᵢ are independent identically distributed (IID) Bernoulli(θ); and

• θ can be viewed as a realization of a random variable with density p(θ | B).

In other words, exchangeability of Your uncertainty about a binary process is functionally equivalent to assuming the simple Bayesian hierarchical model [27]

(θ | B) ∼ p(θ | B)
(yᵢ | θ, B) ∼ IID Bernoulli(θ) ,   (14)

and p(θ | B) is recognizable as Your prior distribution for θ, the underlying death rate for heart attack patients similar to those You expect will arrive at hospital H during the relevant time window.

Consider now the problem of quantitatively specifying prior information about θ. From (13) and (14) the likelihood function is

l(θ | y, B) = c θ^{s_n} (1 − θ)^{n − s_n} ,   (15)

which (when interpreted in the Bayesian manner as a density in θ) is recognizable as a member of the Beta family of probability distributions: for α, β > 0 and 0 < θ < 1,

θ ∼ Beta(α, β)  iff  p(θ) = c θ^{α−1} (1 − θ)^{β−1} .   (16)
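Under an assumed Beta(α, β) prior the integral in (13) has the closed form B(α + s_n, β + n − s_n) / B(α, β), where B(·,·) is the Beta function. The sketch below (prior parameters invented) checks this against Monte Carlo averaging over θ drawn from the prior, and confirms that the probability depends on the sequence only through s_n, which is exchangeability.

```python
import math
import random

random.seed(2)
alpha, beta = 2.0, 3.0               # invented Beta prior

def log_beta_fn(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def p_seq(y):
    # closed form of (13) under a Beta(alpha, beta) prior:
    # B(alpha + s_n, beta + n - s_n) / B(alpha, beta)
    s, n = sum(y), len(y)
    return math.exp(log_beta_fn(alpha + s, beta + n - s) - log_beta_fn(alpha, beta))

def p_seq_mc(y, m=200000):
    # Monte Carlo version: average theta^s (1 - theta)^(n - s) over prior draws
    s, n = sum(y), len(y)
    total = sum(t ** s * (1.0 - t) ** (n - s)
                for t in (random.betavariate(alpha, beta) for _ in range(m)))
    return total / m

y = [1, 1, 0, 1, 0]
print(round(p_seq(y), 5), round(p_seq_mc(y), 5))
```

Any reordering of y has the same (s_n, n) and hence the same probability, exactly as the representation theorem requires.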


Moreover, this family has the property that the product of two Beta densities is another Beta density, so by Bayes' Theorem if the prior p(θ | B) is chosen to be Beta(α, β) for some (as-yet unspecified) α > 0 and β > 0, then the posterior will be Beta(α + s_n, β + n − s_n): this is conjugacy (Sect. "The Bayesian Statistical Paradigm") of the Beta family for the Bernoulli/binomial likelihood. In this case the conjugacy leads to a simple interpretation of α and β: the prior acts like a data set with α 1s and β 0s, in the sense that if person 1 does a Bayesian analysis with a Beta(α, β) prior and sample data y = (y₁, …, y_n) and person 2 instead merges the corresponding "prior data set" with y and does a maximum-likelihood analysis (Sect. "Comparison with the Frequentist Statistical Paradigm") on the resulting merged data, the two people will get the same answers. This also shows that the prior sample size n₀ in the Beta–Bernoulli/binomial model is (α + β). Given that the mean of a Beta(α, β) distribution is α / (α + β), calculation reveals that the posterior mean (α + s_n) / (α + β + n) of θ is a weighted average of the prior mean and the data mean ȳ = n⁻¹ Σᵢ₌₁ⁿ yᵢ, with prior and data weights n₀ and n, respectively:

(α + s_n) / (α + β + n) = [ n₀ (α / (α + β)) + n ȳ ] / (n₀ + n) .   (17)
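The conjugate update and the weighted-average identity (17) can be checked directly; the illustrative values below, (α, β) = (4.5, 25.5) with s_n = 69 and n = 385, are the specification used later in this article's mortality example.

```python
alpha, beta = 4.5, 25.5      # prior used in the mortality example below
s_n, n = 69, 385             # observed deaths and sample size from that example

# conjugate Beta-Bernoulli update: posterior is Beta(alpha + s_n, beta + n - s_n)
alpha_star = alpha + s_n
beta_star = beta + n - s_n
post_mean = alpha_star / (alpha_star + beta_star)

# right side of (17): weighted average of prior mean and data mean
n0 = alpha + beta            # prior sample size
prior_mean = alpha / n0
ybar = s_n / n
weighted = (n0 * prior_mean + n * ybar) / (n0 + n)

print(round(post_mean, 4), round(weighted, 4))   # the two sides agree
```

This posterior mean is also, by the predictive calculation carried out below in (18)–(19), the probability that the next patient dies.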

These facts shed intuitive light on how Bayes' Theorem combines information internal and external to a given data source: thinking of prior information as equivalent to a data set is a valuable intuition, even in non-conjugate settings.

The choice of α and β naturally depends on the available information external to y. Consider for illustration two such specifications:

• Analyst 1 does a web search and finds that the 30-day mortality rate for heart attack (given average quality of care and average patient sickness at admission) in her country is 15%. The information she has about hospital H is that its care and patient sickness are not likely to be wildly different from the country averages but that a mortality deviation from the mean, if present, would

Bayesian Statistics, Figure 1 Prior-to-posterior updating with two prior specifications in the mortality data set (in both panels, prior: long dotted lines; likelihood: short dotted lines; posterior: solid lines). The left and right panels give the updating with the priors for Analysts 1 and 2, respectively


be more likely to occur on the high side than the low. Having lived in the community served by H for some time and having not heard anything either outstanding or deplorable about the hospital, she would be surprised to find that the underlying heart attack death rate at H was less than (say) 5% or greater than (say) 30%. One way to quantify this information is to set the prior mean to 15% and to place (say) 95% of the prior mass between 5% and 30%.

• Analyst 2 has little information external to y and thus wishes to specify a relatively diffuse prior distribution that does not dramatically favor any part of the unit interval.

Numerical integration reveals that (α, β) = (4.5, 25.5), with a prior sample size of 30.0, satisfies Analyst 1's constraints. Analyst 2's diffuse prior evidently corresponds to a rather small prior sample size; a variety of positive values of α and β near 0 are possible, all of which will lead to a relatively flat prior. Suppose for illustration that the time period in question is about four years in length and H is a medium-size US hospital; then there will be about n = 385 heart attack patients in the data set y. Suppose further that the observed mortality rate at H comes out ȳ = s_n / n = 69 / 385 ≈ 18%. Figure 1 summarizes the prior-to-posterior updating with this data set and the two priors for Analysts 1 (left panel) and 2 (right panel), with α = β = 1 (the Uniform distribution) for Analyst 2. Even though the two priors are rather different – Analyst 1's prior is skewed, with a prior mean of 0.15 and n₀ = 30; Analyst 2's prior is flat, with a prior mean of 0.5 and n₀ = 2 – it is evident that the posterior distributions are nearly the same in both cases; this is because the data sample size n = 385 is so much larger than either of the prior sample sizes, so that the likelihood information dominates. With both priors the likelihood and posterior distributions are nearly the same, another consequence of n₀ ≪ n.
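The claim that Beta(4.5, 25.5) satisfies Analyst 1's constraints can be checked by simulation; this is a Monte Carlo sketch, with Python's `random.betavariate` used as a stand-in for the exact Beta quantile computation (numerical integration) mentioned above.

```python
import random

random.seed(3)
alpha, beta = 4.5, 25.5
draws = [random.betavariate(alpha, beta) for _ in range(200000)]

prior_mean = sum(draws) / len(draws)                   # should be near 0.15
coverage = sum(1 for t in draws if 0.05 <= t <= 0.30) / len(draws)
print(round(prior_mean, 3), round(coverage, 3))        # mass between 5% and 30%
```

The exact prior mean is α / (α + β) = 4.5 / 30 = 0.15, and the simulated mass in (0.05, 0.30) comes out close to the targeted 95%.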
For Analyst 1 the posterior mean, standard deviation, and 95% central posterior interval for θ are (0.177, 0.00241, (0.142, 0.215)), and the corresponding numerical results for Analyst 2 are (0.181, 0.00258, (0.144, 0.221)); again it is clear that the two sets of results are almost identical. With a large sample size, careful elicitation – like that undertaken by Analyst 1 – will often yield results similar to those with a diffuse prior.

The posterior predictive distribution p(y_{n+1} | y₁, …, y_n, B) for the next observation, having observed the first n, is also straightforward to calculate in closed form with the conjugate prior in this model. It is clear that p(y_{n+1} | y, B) has to be a Bernoulli(θ*) distribution for some θ*, and intuition says that θ* should just be the mean α* / (α* + β*) of the posterior distribution for θ given y, in which α* = α + s_n and β* = β + n − s_n are the parameters of the Beta posterior. To check this, making use of the fact that the normalizing constant in the Beta(α, β) family is Γ(α + β) / [Γ(α) Γ(β)], the second equation in (6) gives

p(y_{n+1} | y₁, …, y_n, B)
  = ∫₀¹ θ^{y_{n+1}} (1 − θ)^{1 − y_{n+1}} · [Γ(α* + β*) / (Γ(α*) Γ(β*))] θ^{α* − 1} (1 − θ)^{β* − 1} dθ
  = [Γ(α* + β*) / (Γ(α*) Γ(β*))] ∫₀¹ θ^{(α* + y_{n+1}) − 1} (1 − θ)^{(β* − y_{n+1} + 1) − 1} dθ
  = [Γ(α* + y_{n+1}) Γ(β* − y_{n+1} + 1) / (Γ(α*) Γ(β*))] · [Γ(α* + β*) / Γ(α* + β* + 1)] ;   (18)

this, combined with the fact that Γ(x + 1) / Γ(x) = x for any real x, yields, for example in the case y_{n+1} = 1,

p(y_{n+1} = 1 | y, B) = [Γ(α* + 1) / Γ(α*)] · [Γ(α* + β*) / Γ(α* + β* + 1)] = α* / (α* + β*) ,   (19)

confirming intuition.

Inference: Parametric and Non-Parametric Modeling of Count Data

Most elderly people in the Western world say they would prefer to spend the end of their lives at home, but many instead finish their lives in an institution (a nursing home or hospital). How can elderly people living in their communities be offered health and social services that would help to prevent institutionalization? Hendriksen et al. [51] conducted an experiment in the 1980s in Denmark to test the effectiveness of in-home geriatric assessment (IHGA), a form of preventive medicine in which each person's medical and social needs are assessed and acted upon individually. A total of n = 572 elderly people living in non-institutional settings in a number of villages were randomized, n_C = 287 to a control group, who received standard health care, and n_T = 285 to a treatment group, who received standard care plus IHGA. The number of hospitalizations during the two-year life of the study was an outcome of particular interest. The data are presented and summarized in Table 1. Evidently IHGA lowered the mean hospitalization rate per

Bayesian Statistics, Table 1 Distribution of number of hospitalizations in the IHGA study

Group       Number of Hospitalizations       Sample
            0    1    2    3    4   5   6   7      n     Mean    Variance
Control     138  77   46   12   8   4   0   2      287   0.944   1.54
Treatment   147  83   37   13   3   1   1   0      285   0.768   1.02

two years (for the elderly Danish people in the study, at least) by (0.944 − 0.768) = 0.176, which is about an 18% reduction from the control level, a clinically large difference. The question then becomes, in Bayesian inferential language: what is the posterior distribution for the treatment effect in the entire population P of patients judged exchangeable with those in the study?

Continuing to refer to the relevant analyst as You: with a binary outcome variable and no covariates, in Sect. "Inference and Prediction: Binary Outcomes with No Covariates" the model arose naturally from a judgment of exchangeability of Your uncertainty about all n outcomes, but such a judgment of unconditional exchangeability would not be appropriate initially here; to make such a judgment would be to assert that the treatment and control interventions have the same effect on hospitalization, and it was the point of the study to see if this is true. Here, at least initially, it would be more scientifically appropriate to assert exchangeability separately and in parallel within the two experimental groups, a judgment de Finetti [22] called partial exchangeability and which has more recently been referred to as conditional exchangeability [28,72] given the treatment/control status covariate.

Considering for the moment just the control group outcome values C_i, i = 1, …, n_C, and seeking as in Sect. "Inference and Prediction: Binary Outcomes with No Covariates" to model them via a predictive distribution p(C_1, …, C_{n_C} | B), de Finetti's previous representation theorem is not available because the outcomes are real-valued rather than binary, but he proved [21] another theorem for this situation as well: if You're willing to regard (C_1, …, C_{n_C}) as the first n_C terms in an infinitely exchangeable sequence (C_1, C_2, …) of values on the real line (which plays the role of the population P, under the control condition, in this problem), then to achieve coherence Your predictive distribution must be expressible as

$$
p(C_1, \ldots, C_{n_C} \mid B) = \int_{\mathcal{F}} \prod_{i=1}^{n_C} F(C_i)\, dG(F \mid B);
\tag{20}
$$

here (a) F has an interpretation as F(t) = lim_{n_C→∞} F_{n_C}(t), where F_{n_C} is the empirical CDF based on (C_1, …, C_{n_C}); (b) G(F | B) = lim_{n_C→∞} p(F_{n_C} | B), where p(· | B) is Your joint probability distribution on (C_1, C_2, …); and (c) 𝓕 is the space of all possible CDFs on the real line. Equation (20) says informally that exchangeability of Your uncertainty about an observable process unfolding on the real line is functionally equivalent to assuming the Bayesian hierarchical model

$$
\begin{aligned}
(F \mid B) &\sim p(F \mid B) \\
(y_i \mid F, B) &\overset{\text{IID}}{\sim} F,
\end{aligned}
\tag{21}
$$

where p(F | B) is a prior distribution on 𝓕. Placing distributions on functions, such as CDFs and regression surfaces, is the topic addressed by the field of Bayesian nonparametric (BNP) modeling [24,80], an area of statistics that has recently moved completely into the realm of day-to-day implementation and relevance through advances in MCMC computational methods. Two rich families of prior distributions on CDFs about which a wealth of practical experience has recently accumulated are (mixtures of) Dirichlet processes [32] and Pólya trees [66].

Parametric modeling is of course also possible with the IHGA data: as noted by Krnjajić et al. [62], who explore both parametric and BNP models for data of this kind, Poisson modeling is a natural choice, since the outcome consists of counts of relatively rare events. The first Poisson model to which one would generally turn is a fixed-effects model, in which (C_i | λ_C) are IID Poisson(λ_C) (i = 1, …, n_C = 287) and (T_j | λ_T) are IID Poisson(λ_T) (j = 1, …, n_T = 285), with a diffuse prior on (λ_C, λ_T) if little is known, external to the data set, about the underlying hospitalization rates in the control and treatment groups. However, the last two columns of Table 1 reveal that the sample variance is noticeably larger than the sample mean in both groups, indicating substantial Poisson over-dispersion. For a second, improved, parametric model this suggests a random-effects Poisson model of the form

$$
\begin{aligned}
(C_i \mid \lambda_{iC}) &\overset{\text{indep}}{\sim} \text{Poisson}(\lambda_{iC}) \\
(\log \lambda_{iC} \mid \beta_{0C}, \sigma_C^2) &\overset{\text{IID}}{\sim} N(\beta_{0C}, \sigma_C^2),
\end{aligned}
\tag{22}
$$

and similarly for the treatment group, with diffuse priors for (β_{0C}, σ_C², β_{0T}, σ_T²). As Krnjajić et al. [62] note, from a medical point of view this model is more plausible than the fixed-effects formulation: each patient in the control group has his/her own latent (unobserved) underlying rate of hospitalization λ_{iC}, which may well differ from the underlying rates of the other control patients because of unmeasured differences in factors such as health status at the beginning of the experiment (and similarly for the treatment group).
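The summary columns of Table 1, including the variance-greater-than-mean pattern that motivates the move from the fixed-effects to the random-effects model, can be reproduced directly from the count distribution; a quick check in Python:

```python
import statistics

# numbers of patients with 0, 1, ..., 7 hospitalizations (Table 1)
control   = [138, 77, 46, 12, 8, 4, 0, 2]
treatment = [147, 83, 37, 13, 3, 1, 1, 0]

def summarize(counts):
    # expand the frequency table into individual outcomes
    data = [k for k, n_k in enumerate(counts) for _ in range(n_k)]
    return len(data), statistics.mean(data), statistics.variance(data)

print(summarize(control))    # (287, 0.944..., 1.53...)
print(summarize(treatment))  # (285, 0.768..., 1.01...)
```

In both groups the sample variance exceeds the sample mean, which a single-λ Poisson model cannot accommodate.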

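The Lognormal mixing in model (22) generates exactly this kind of over-dispersion, since mixing a Poisson over a random rate inflates the variance above the mean; a minimal simulation sketch (Python; the values of β_{0C} and σ_C are illustrative, not estimates from the IHGA data):

```python
import math
import random
import statistics

def rpois(lam, rng):
    # Knuth's algorithm for a single Poisson draw
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(42)
beta0C, sigmaC = math.log(0.7), 0.6   # hypothetical hyperparameter values

# each subject gets a latent rate lambda_iC, then a Poisson count given it
lam = [math.exp(rng.gauss(beta0C, sigmaC)) for _ in range(20000)]
y = [rpois(l, rng) for l in lam]

m, v = statistics.mean(y), statistics.variance(y)
print(m, v)   # variance exceeds the mean: Poisson over-dispersion
```

Marginally Var(y) = E[λ] + Var(λ) > E[y] = E[λ], so any nondegenerate mixing distribution produces over-dispersion of the kind visible in Table 1.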

Model (22), when complemented by its analogue in the treatment group, specifies a Lognormal mixture of Poisson distributions for each group and is straightforward to fit by MCMC, but the Gaussian assumption for the mixing distribution is conventional, not motivated by the underlying science of the problem, and if the distribution of the latent variables is not Gaussian – for example, if it is multimodal or skewed – model (22) may well lead to incorrect inferences. Krnjajić et al. [62] therefore also examine several BNP models that are centered on the random-effects Poisson model but which permit learning about the true underlying distribution of the latent variables instead of assuming it is Gaussian. One of their models, when applied (for example) to the control group, was

$$
\begin{aligned}
(C_i \mid \lambda_{iC}) &\overset{\text{indep}}{\sim} \text{Poisson}(\lambda_{iC}) \\
(\log \lambda_{iC} \mid G) &\overset{\text{IID}}{\sim} G \\
(G \mid \alpha, \mu, \sigma^2) &\sim \text{DP}[\alpha\, N(\mu, \sigma^2)].
\end{aligned}
\tag{23}
$$
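Draws from a Dirichlet process prior such as the one in (23) can be generated with Sethuraman's stick-breaking construction; a sketch in Python (the values of α, μ, and σ are illustrative, and the infinite sum is truncated at a fixed number of sticks):

```python
import random

def dp_draw(alpha, mu, sigma, n_sticks=500, rng=random):
    # Sethuraman stick-breaking: G = sum_k w_k * delta(theta_k), with
    # theta_k ~ N(mu, sigma^2) (the base measure) and
    # w_k = v_k * prod_{j<k} (1 - v_j), where v_k ~ Beta(1, alpha).
    weights, thetas, remaining = [], [], 1.0
    for _ in range(n_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        thetas.append(rng.gauss(mu, sigma))
        remaining *= 1.0 - v
    return weights, thetas

rng = random.Random(0)
w, th = dp_draw(alpha=2.0, mu=0.0, sigma=1.0, rng=rng)
print(sum(w))  # close to 1: each draw G is a discrete random distribution
```

Each realization G is an (almost surely discrete) distribution scattered around the N(μ, σ²) base measure; larger α makes the draws concentrate more tightly around it, consistent with model (22) being the α → ∞ limit of model (23).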

Here DP[α N(μ, σ²)] refers to a Dirichlet process prior distribution, on the CDF G of the latent variables, which is centered at the N(μ, σ²) model with precision parameter α. Model (23) is an expansion of the random-effects Poisson model (22), in that the latter is a special case of the former (obtained by letting α → ∞). Model expansion is a common Bayesian analytic tool which helps to assess and propagate model uncertainty: if You are uncertain about a particular modeling detail, instead of fitting a model that assumes this detail is correct with probability 1, embed it in a richer model class of which it is a special case, and let the data tell You about its plausibility.

With the IHGA data, models (22) and (23) turned out to arrive at similar inferential conclusions – in both cases point estimates of the ratio of the treatment mean to the control mean were about 0.82 with a posterior standard deviation of about 0.09, and a posterior probability that the (population) mean ratio was less than 1 of about 0.95 – so the evidence is strong that IHGA lowers mean hospitalizations not just in the sample but in the collection P of elderly people to whom it is appropriate to generalize. But the two modeling approaches need not yield similar results: if the latent variable distribution is far from Gaussian, model (22) will not be able to adjust to this violation of one of its basic assumptions. Krnjajić et al. [62] performed a simulation study in which data sets with 300 observations were generated from various Gaussian and non-Gaussian latent variable distributions and a variety of parametric and BNP models were fit to the resulting count data; Fig. 2 summarizes the prior and posterior predictive distributions from models (22; top panel) and (23; bottom panel) with a bimodal latent variable distribution. The parametric Gaussian random-effects model cannot fit the bimodality on the data scale, but the BNP model – even though centered on the Gaussian as the random-effects distribution – adapts smoothly to the underlying bimodal reality.

Decision-Making: Variable Selection in Generalized Linear Models; Bayesian Model Selection

Variable selection (choosing the "best" subset of predictors) in generalized linear models is an old problem, dating back at least to the 1960s, and many methods [113] have been proposed to try to solve it; but virtually all of them ignore an aspect of the problem that can be important: the cost of collecting data on the predictors. An example, studied by Fouskakis and Draper [39], which is an elaboration of the problem examined in Sect. "Inference and Prediction: Binary Outcomes with No Covariates", arises in the field of quality of health care measurement, where patient sickness at admission is often assessed by using logistic regression of an outcome, such as mortality within 30 days of admission, on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing standard variable selection methods (for instance, backward selection from a model with all predictors) to find an "optimal" subset of 10–20 indicators that predict mortality well. The problem with such benefit-only methods is that they ignore the considerable differences among the sickness indicators in the cost of data collection; this issue is crucial when admission sickness is used to drive programs (now implemented or under consideration in several countries, including the US and UK) that attempt to identify substandard hospitals by comparing observed and expected mortality rates (given admission sickness), because such quality of care investigations are typically conducted under cost constraints. When both data-collection cost and accuracy of prediction of 30-day mortality are considered, a large variable-selection problem arises in which the only variables that make it into the final scale should be those that achieve a cost-benefit tradeoff.

Variable selection is an example of the broader process of model selection, in which questions such as "Is model M_1 better than M_2?" and "Is M_1 good enough?" arise. These inquiries cannot be addressed, however, without first answering a new set of questions: good enough (better than) for what purpose? Specifying this purpose [26,57,61,70] identifies model selection as a decision problem that should be approached by constructing a contextually relevant utility function and maximizing expected utility. Fouskakis and Draper [39] create a utility


Bayesian Statistics, Figure 2 Prior (open circles) and posterior (solid circles) predictive distributions under models (22) and (23) (top and bottom panels, respectively), based on a data set generated from a bimodal latent variable distribution. In each panel, the histogram plots the simulated counts.

function, for variable selection in their severity of illness problem, with two components that are combined additively: a data-collection component (in monetary units, such as US$), which is simply the negative of the total amount of money required to collect data on a given set of patients with a given subset of the sickness indicators; and a predictive-accuracy component, in which a method is devised to convert increased predictive accuracy into decreased monetary cost by thinking about the consequences of labeling a hospital with bad quality of care "good" and vice versa.

One aspect of their work, with a data set (from a RAND study: [58]) involving p = 83 sickness indicators gathered on a representative sample of n = 2,532 elderly American patients hospitalized in the period 1980–86 with pneumonia, focused only on the p = 14 variables in the original RAND sickness scale; this subset was chosen because 2^14 = 16,384 was a small enough number of possible models to permit brute-force enumeration of the estimated expected utility (EEU) of all the models.

Bayesian Statistics, Figure 3 Estimated expected utility of all 16,384 variable subsets in the quality of care study based on RAND data

Figure 3 is a parallel boxplot of the EEUs of all 16,384 variable subsets, with the boxplots sorted by the number of variables in each model. The model with no predictors does poorly, with an EEU of about −US$14.5, but from a cost-benefit point of view the RAND sickness scale with all 14 variables is even worse (−US$15.7), because it includes expensive variables that do not add much to the predictive power in relation to cheaper variables that predict almost as well. The best subsets have 4–6 variables and would save about US$8 per patient when compared with the entire 14-variable scale; this would amount to a significant saving if the observed-versus-expected assessment method were applied widely.

Returning to the general problem of Bayesian model selection, two cases can be distinguished: situations in which the precise purpose to which the model will be put can be specified (as in the variable-selection problem above), and settings in which at least some of the end uses to which the modeling will be put are not yet known. In this second situation it is still helpful to reason in a decision-theoretic way: the hallmark of a good (bad) model is that it makes good (bad) predictions, so a utility function based on predictive accuracy can be a good general-purpose choice. With (a) a single sample of data y, (b) a future data value y*, and (c) two models M_j (j = 1, 2) for illustration, what is needed is a scoring rule that measures the discrepancy between y* and its predictive distribution p(y* | y, M_j, B) under model M_j. It turns out [46,86] that the optimal (impartial, symmetric, proper) scoring rules are linear functions of log p(y* | y, M_j, B), which has a simple intuitive motivation: if the predictive distribution is Gaussian, for example, then values of y* close to the center (in other words, those for which the prediction has been good) will receive a greater reward than those in the tails. An example [65], in this one-sample setting, of a model selection criterion (a) based on prediction, (b) motivated by utility considerations and (c) with good model discrimination properties [29] is the full-sample log score

$$
LS_{FS}(M_j \mid y, B) = \frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid y, M_j, B),
\tag{24}
$$
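As a toy illustration of (24), the following sketch compares two candidate predictive distributions on the same data set (Python; the data and the two Gaussian predictive distributions are invented for the example, with predictive densities plugged in directly rather than computed from a posterior):

```python
import math

def log_score(data, pred_logpdf):
    # full-sample log score: average log predictive density, as in (24)
    return sum(pred_logpdf(y) for y in data) / len(data)

def gauss_logpdf(mu, sigma):
    # log density of a N(mu, sigma^2) predictive distribution
    return lambda y: (-0.5 * ((y - mu) / sigma) ** 2
                      - math.log(sigma * math.sqrt(2 * math.pi)))

data = [0.1, -0.4, 0.3, 0.2, -0.1, 0.0, 0.5, -0.2]   # invented sample
ls1 = log_score(data, gauss_logpdf(0.0, 0.3))  # model M1: N(0, 0.3^2)
ls2 = log_score(data, gauss_logpdf(2.0, 0.3))  # model M2: N(2, 0.3^2)
print(ls1 > ls2)  # True: M1's predictions match the data far better
```

The model whose predictive distribution places the data closer to its center receives the larger log score, which is the sense in which (24) rewards good prediction.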

Equation (24) is related to the conditional predictive ordinate criterion [90]. Other Bayesian model selection criteria in current use include the following:

• Bayes factors [60]: Bayes' Theorem, written in odds form for discriminating between models M_1 and M_2, says that

$$
\frac{p(M_1 \mid y, B)}{p(M_2 \mid y, B)} = \frac{p(M_1 \mid B)}{p(M_2 \mid B)} \cdot \frac{p(y \mid M_1, B)}{p(y \mid M_2, B)};
\tag{25}
$$

here the prior odds in favor of M_1, p(M_1 | B)/p(M_2 | B), are multiplied by the Bayes factor p(y | M_1, B)/p(y | M_2, B) to produce the posterior odds p(M_1 | y, B)/p(M_2 | y, B). According to the logic of this criterion, models with high posterior probability are to be preferred, and if all the models under consideration are equally plausible a priori this reduces to preferring models with larger Bayes factors in their favor. One problem with this approach is that – in parametric models in which model M_j has parameter vector θ_j defined on parameter space Θ_j – the integrated likelihoods p(y | M_j, B) appearing in the Bayes factor can be expressed as

$$
p(y \mid M_j, B) = \int_{\Theta_j} p(y \mid \theta_j, M_j, B)\, p(\theta_j \mid M_j, B)\, d\theta_j = E_{(\theta_j \mid M_j, B)}\big[p(y \mid \theta_j, M_j, B)\big].
\tag{26}
$$

In other words, the numerator and denominator ingredients in the Bayes factor are each expressible as expectations of likelihood functions with respect to the prior distributions on the model parameters, and if context suggests that these priors should be specified diffusely the resulting Bayes factor can be unstable as a function of precisely how the diffuseness is specified. Various attempts have been made to remedy this instability of Bayes factors (for example, {partial, intrinsic, fractional} Bayes factors, well-calibrated priors, conventional priors, intrinsic priors, expected posterior priors, . . . ; [9]); all of these methods appear to require an appeal to ad-hockery which is absent from the log score approach.

• Deviance Information Criterion (DIC): Given a parametric model p(y | θ_j, M_j, B), Spiegelhalter et al. [106] define the deviance information criterion (DIC) (by analogy with other information criteria) to be a tradeoff between (a) an estimate of the model lack of fit, as measured by the deviance D(θ̄_j) (where θ̄_j is the posterior mean of θ_j under M_j; for the purpose of DIC, the deviance of a model [75] is minus twice the logarithm of the likelihood for that model), and (b) a penalty for model complexity equal to twice the effective number of parameters p_{D_j} of the model:

$$
\mathrm{DIC}(M_j \mid y, B) = D(\bar{\theta}_j) + 2\, \hat{p}_{D_j}.
\tag{27}
$$

When p_{D_j} is difficult to read directly from the model (for example, in complex hierarchical models, especially those with random effects), Spiegelhalter et al. motivate the following estimate, which is easy to compute from standard MCMC output:

$$
\hat{p}_{D_j} = \overline{D(\theta_j)} - D(\bar{\theta}_j);
\tag{28}
$$

in other words, p̂_{D_j} is the difference between the posterior mean of the deviance and the deviance evaluated at the posterior mean of the parameters. DIC is available as an option in several MCMC packages, including WinBUGS [107] and MLwiN [93]. One difficulty with DIC is that the MCMC estimate of p_{D_j} can be poor if the marginal posteriors for one or more parameters (using the parameterization that defines the deviance) are far from Gaussian; reparameterization (onto parameter scales where the posteriors are approximately Normal) helps but can still lead to mediocre estimates of p_{D_j}.

Other notable recent references on the subject of Bayesian variable selection include Brown et al. [13], who examine multivariate regression in the context of compositional data, and George and Foster [43], who use empirical Bayes methods in the Gaussian linear model.

Comparison with the Frequentist Statistical Paradigm

Strengths and Weaknesses of the Two Approaches

Frequentist statistics, which has concentrated mainly on inference, proceeds by (i) thinking of the values in a data set y as like a random sample from a population P (a set to which it is hoped that conclusions based on the data can validly be generalized), (ii) specifying a summary θ of interest in P (such as the population mean of the outcome variable), (iii) identifying a function θ̂ of y that can


serve as a reasonable estimate of θ, (iv) imagining repeating the random sampling from P to get other data sets y and therefore other values of θ̂, and (v) using the random behavior of θ̂ across these repetitions to make inferential probability statements involving θ. A leading implementation of the frequentist paradigm [37] is based on using the value θ̂_MLE that maximizes the likelihood function as the estimate of θ and obtaining a measure of uncertainty for θ̂_MLE from the curvature of the logarithm of the likelihood function at its maximum; this is maximum-likelihood inference.

Each of the frequentist and Bayesian approaches to statistics has strengths and weaknesses.

• The frequentist paradigm has the advantage that repeated-sampling calculations are often more tractable than manipulations with conditional probability distributions, and it has the clear strength that it focuses attention on the scientifically important issue of calibration: in settings where the true data-generating process is known (e.g., in simulations of random sampling from a known population P), how often does a particular method of statistical inference recover known truth? The frequentist approach has the disadvantage that it only applies to inherently repeatable phenomena, and therefore cannot be used to quantify uncertainty about many true-false propositions of real-world interest (for example, if You are a doctor to whom a new patient (male, say) has just come, strictly speaking You cannot talk about the frequentist probability that this patient is HIV positive; he either is or he is not, and his arriving at Your office is not the outcome of any repeatable process that is straightforward to identify).
In practice the frequentist approach also has the weaknesses that (a) model uncertainty is more difficult to assess and propagate in this paradigm, (b) predictive uncertainty assessments are not always straightforward to create from the frequentist point of view (the bootstrap [31] is one possible solution) and (c) inferential calibration may not be easy to achieve when the sample size n is small. An example of several of these drawbacks arises in the construction of confidence intervals [83], in which repeated-sampling statements such as

$$
P_f\big(\hat{\theta}_{\text{low}} < \theta < \hat{\theta}_{\text{high}}\big) = 0.95
\tag{29}
$$

(where P_f quantifies the frequentist variability in θ̂_low and θ̂_high across repeated samples from P) are interpreted in the frequentist paradigm as suggesting that the unknown θ lies between θ̂_low and θ̂_high with 95% "confidence." Two difficulties with this are that (a) equation (29) looks like a probability statement about θ but is not, because in the frequentist approach θ is a fixed unknown constant that cannot be described probabilistically, and (b) with small sample sizes nominal 95% confidence intervals based on maximum likelihood estimation can have actual coverage (the percentage of time in repeated sampling that the interval includes the true θ) substantially less than 95%.

• The Bayesian approach has the following clear advantages: (a) It applies (at least in principle) to uncertainty about anything, whether associated with a repeatable process or not; (b) inference is unambiguously based on the first equation in (6), without the need to face questions such as what constitutes a "reasonable" estimate of θ (step (iii) in the frequentist inferential paradigm above); (c) prediction is straightforwardly and unambiguously based on the second equation in (6); and (d) in the problem of decision analysis a celebrated theorem of Wald [111] says informally that all good decisions can be interpreted as having been arrived at by maximizing expected utility as in the third equation of (6), so the Bayesian approach appears to be the way forward in decision problems rather broadly (but note the final challenge at the end of Sect. "The Bayesian Statistical Paradigm"). The principal disadvantage of the Bayesian approach is that coherence (internal logical consistency) by itself does not guarantee good calibration: You are free in the Bayesian paradigm to insert strong prior information in the modeling process (without violating coherence), and – if this information is seen after the fact to have been out of step with the world – Your inferences, predictions and/or decisions may also be off-target (of course, the same is true in both the frequentist and Bayesian paradigms with regard to Your modeling of the likelihood information).
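Difficulty (b) is easy to demonstrate by simulation; a minimal sketch (Python, using the Wald interval for a binomial proportion as the maximum-likelihood-based interval; the sample size and true proportion are chosen for illustration):

```python
import math
import random

rng = random.Random(7)
n, p_true, z = 20, 0.05, 1.96   # small sample, rare event
reps, hits = 20000, 0
for _ in range(reps):
    # draw a Binomial(n, p_true) count and form the nominal 95% interval
    k = sum(rng.random() < p_true for _ in range(n))
    p_hat = k / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - half < p_true < p_hat + half:
        hits += 1
coverage = hits / reps
print(coverage)  # well below the nominal 0.95
```

In this setting the interval collapses to a point whenever no events are observed, so its actual coverage falls far short of the nominal 95%, which is the small-sample calibration failure described in the text.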
Two examples of frequentist inferences having poor calibration properties in small samples were given by Browne and Draper [14]. Their first example again concerns the measurement of quality of care, which is often studied with cluster samples: a random sample of J hospitals (indexed by j) and a random sample of N total patients (indexed by i) nested in the chosen hospitals is taken, and quality of care for the chosen patients and various hospital- and patient-level predictors are measured. With y_ij as the quality of care score for patient i in hospital j, a first step would often be to fit a variance-components model with random effects at both the hospital and patient levels, to assess the relative magnitudes of within- and between-hospital variability in quality of care:

$$
\begin{aligned}
y_{ij} &= \beta_0 + u_j + e_{ij}, \quad i = 1, \ldots, n_j; \quad j = 1, \ldots, J; \quad \sum_{j=1}^{J} n_j = N; \\
(u_j \mid \sigma_u^2) &\overset{\text{IID}}{\sim} N(0, \sigma_u^2); \qquad (e_{ij} \mid \sigma_e^2) \overset{\text{IID}}{\sim} N(0, \sigma_e^2).
\end{aligned}
\tag{30}
$$

Browne and Draper [14] used a simulation study to show that, with a variety of maximum-likelihood-based methods for creating confidence intervals for σ_u², the actual coverage of nominal 95% intervals ranged from 72–94% across realistic sample sizes and true parameter values in the fields of education and medicine, versus 89–94% for Bayesian methods based on diffuse priors. Their second example involved a re-analysis of a Guatemalan National Survey of Maternal and Child Health [89,97], with three-level data (births nested within mothers within communities), working with the random-effects logistic regression model

$$
\begin{aligned}
(y_{ijk} \mid p_{ijk}) &\overset{\text{indep}}{\sim} \text{Bernoulli}(p_{ijk}), \\
\text{logit}(p_{ijk}) &= \beta_0 + \beta_1 x_{1ijk} + \beta_2 x_{2jk} + \beta_3 x_{3k} + u_{jk} + v_k,
\end{aligned}
\tag{31}
$$

where y_ijk is a binary indicator of modern prenatal care or not and where u_jk ∼ N(0, σ_u²) and v_k ∼ N(0, σ_v²) were random effects at the mother and community levels (respectively). Simulating data sets with 2,449 births by 1,558 women living in 161 communities (as in the Rodríguez and Goldman study [97]), Browne and Draper [14] showed that things can be even worse for likelihood-based methods in this model, with actual coverages (at nominal 95%) as low as 0–2% for intervals for σ_u² and σ_v², whereas Bayesian methods with diffuse priors again produced actual coverages from 89–96%. The technical problem is that the marginal likelihood functions for random-effects variances are often heavily skewed, with maxima at or near 0 even when the true variance is positive; Bayesian methods, which integrate over the likelihood function rather than maximizing it, can have (much) better small-sample calibration performance as a result.
Some Historical Perspective

The earliest published formal example of an attempt to do statistical inference – to reason backwards from effects to causes – seems to have been Bayes [5], who defined conditional probability for the first time and noted that the result we now call Bayes' Theorem was a trivial consequence of the definition. From the 1760s until the 1920s,

all (or almost all) statistical inference was Bayesian, using the paradigm that Fisher and others referred to as inverse probability; prominent Bayesians of this period included Gauss [40], Laplace [64] and Pearson [88]. This Bayesian consensus changed with the publication of Fisher [37], which laid out a user-friendly program for maximum-likelihood estimation and inference in a wide variety of problems. Fisher railed against Bayesian inference; his principal objection was that in settings where little was known about a parameter (vector) external to the data, a number of prior distributions could be put forward to quantify this relative ignorance. He believed passionately in the late Victorian–Edwardian goal of scientific objectivity, and it bothered him greatly that two analysts with somewhat different diffuse priors might obtain somewhat different posteriors. (There is a Bayesian account of objectivity: a probability is objective if many different people more or less agree on its value. An example would be the probability of drawing a red ball from an urn known to contain 20 red and 80 white balls, if a sincere attempt is made to thoroughly mix the balls without looking at them and to draw the ball in a way that does not tend to favor one ball over another.)

There are two problems with Fisher's argument, which he never addressed:

1. He would be perfectly correct to raise this objection to Bayesian analysis if investigators were often forced to do inference based solely on prior information with no data, but in practice with even modest sample sizes the posterior is relatively insensitive to the precise manner in which diffuseness is specified in the prior, because the likelihood information in such situations is relatively so much stronger than the prior information; Sect. "Inference and Prediction: Binary Outcomes with No Covariates" provides an example of this phenomenon.

2.
If Fisher had looked at the entire process of inference with an engineering eye to sensitivity and stability, he would have been forced to admit that uncertainty in how to specify the likelihood function has inferential consequences that are often an order of magnitude larger than those arising from uncertainty in how to specify the prior. It is an inescapable fact that subjectivity, through assumptions and judgments (such as the form of the likelihood function), is an integral part of any statistical analysis in problems of realistic complexity.

In spite of these unrebutted flaws in Fisher's objections to Bayesian inference, two schools of frequentist inference – one based on Fisher's maximum-likelihood estimation and significance tests [38], the other based on the confidence intervals and hypothesis tests of Neyman [83] and Neyman and Pearson [84] – came to dominate statistical practice from the 1920s at least through the 1980s. One major reason for this was practical: the Bayesian paradigm is based on integrating over the posterior distribution, and accurate approximations to high-dimensional integrals were not available during the period in question. Fisher's technology, based on differentiation (to find the maximum and curvature of the logarithm of the likelihood function) rather than integration, was a much more tractable approach for its time. Jeffreys [54], working in the field of astronomy, and Savage [102] and Lindley [69], building on de Finetti's results, advocated forcefully for the adoption of Bayesian methods, but prior to the advent of MCMC techniques (in the late 1980s) Bayesians were often in the position of saying that they knew the best way to solve statistical problems but the computations were beyond them. MCMC has removed this practical objection to the Bayesian paradigm for a wide class of problems. The increased availability of affordable computers with decent CPU throughput in the 1980s also helped to overcome one objection raised in Sect. "Strengths and Weaknesses of the Two Approaches" against likelihood methods – that they can produce poorly-calibrated inferences with small samples – through the introduction of the bootstrap by Efron [31] in 1979.

At this writing, (a) both the frequentist and Bayesian paradigms are in vigorous inferential use, with the proportion of Bayesian articles in leading journals continuing an increase that began in the 1980s; (b) Bayesian MCMC analyses are often employed to produce meaningful predictive conclusions, with the use of the bootstrap increasing for frequentist predictive calibration; and (c) the Bayesian paradigm dominates decision analysis.
A Bayesian-Frequentist Fusion

During the 20th century the debate over which paradigm to use was often framed in such a way that it seemed necessary to choose one approach and defend it against attacks from people who had chosen the other, but there is nothing that forces an analyst to choose a single paradigm. Since both approaches have strengths and weaknesses, it seems worthwhile instead to seek a fusion of the two that makes the best use of those strengths. Because (a) the Bayesian paradigm appears to be the most flexible way so far developed for quantifying all sources of uncertainty and (b) its main weakness is that coherence does not guarantee good calibration, a number of statisticians, including Rubin [98], Draper [26], and Little [73], have suggested a fusion in which inferences, predictions and decisions are formulated using Bayesian methods and then evaluated for their calibration properties using frequentist methods, for example by using Bayesian models to create 95% predictive intervals for observables not used in the modeling process and seeing if approximately 95% of these intervals include the actual observed values. Analysts more accustomed to the purely frequentist (likelihood) paradigm who prefer not to make explicit use of prior distributions may still find it useful to reason in a Bayesian way, by integrating over the parameter uncertainty in their likelihood functions rather than maximizing over it, in order to enjoy the superior calibration properties that integration has been demonstrated to provide.

Future Directions

Since the mid- to late-1980s the Bayesian statistical paradigm has made significant advances in many fields of inquiry, including agriculture, archaeology, astronomy, bioinformatics, biology, economics, education, environmetrics, finance, health policy, and medicine (see Sect. "The Bayesian Statistical Paradigm" for recent citations of work in many of these disciplines). Three areas of methodological and theoretical research appear particularly promising for extending the useful scope of Bayesian work, as follows:

• Elicitation of prior distributions and utility functions: It is arguable that too much use is made in Bayesian analysis of diffuse prior distributions, because (a) accurate elicitation of non-diffuse priors is hard work and (b) lingering traces still remain of a desire to at least appear to achieve the unattainable Victorian–Edwardian goal of objectivity, the (false) argument being that the use of diffuse priors somehow equates to an absence of subjectivity (see, e.g., the papers by Berger [7] and Goldstein [45] and the ensuing discussion for a vigorous debate on this issue).
It is also arguable that too much emphasis was placed in the 20th century on inference at the expense of decision-making, with inferential tools such as the Neyman–Pearson hypothesis testing machinery (Sect. “Some Historical Perspective”) used incorrectly to make decisions for which they are not optimal; the main reason for this, as noted in Sect. “Strengths and Weaknesses of the Two Approaches” and “Some Historical Perspective”, is that (a) the frequentist paradigm was dominant from the 1920s through the 1980s and (b) the high ground in decision theory is dominated by the Bayesian approach.

Bayesian Statistics

Relevant citations of excellent recent work on elicitation of prior distributions and utility functions were given in Sect. "The Bayesian Statistical Paradigm"; it is natural to expect a greater emphasis on decision theory and non-diffuse prior modeling in the future, and elicitation in those fields of Bayesian methodology is an important area of continuing research.

- Group decision-making: As noted in Sect. "The Bayesian Statistical Paradigm", maximizing expected utility is an effective method for decision-making by a single agent, but when two or more agents are involved in the decision process this approach cannot be guaranteed to yield a satisfying solution: there may be conflicts in the agents' preferences, particularly if their relationship is at least partly adversarial. With three or more possible actions, transitivity of preference (if You prefer action $a_1$ to $a_2$ and $a_2$ to $a_3$, then You should prefer $a_1$ to $a_3$) is a criterion that any reasonable decision-making process should obey; informally, a well-known theorem of Arrow [3] states that even if all of the agents' utility functions obey transitivity, there is no way to combine their utility functions into a single decision-making process that is guaranteed to respect transitivity. However, Arrow's theorem is temporally static, in the sense that the agents do not share their utility functions with each other and iterate after doing so, and it assumes that all agents have the same set $\mathcal{A}$ of feasible actions. If agents $A_1$ and $A_2$ have action spaces $\mathcal{A}_1$ and $\mathcal{A}_2$ that are not identical, and they share the details of their utility specification with each other, it is possible that $A_1$ may realize that one of the actions in $\mathcal{A}_2$ that (s)he had not considered is better than any of the actions in $\mathcal{A}_1$, or vice versa; thus a temporally dynamic solution to the problem posed by Arrow's theorem may be possible, even if $A_1$ and $A_2$ are partially adversarial. This is another important area for new research.
- Bayesian computation: Since the late 1980s, simulation-based computation based on Markov chain Monte Carlo (MCMC) methods has made useful Bayesian analyses possible in an increasingly broad range of application areas, and (as noted in Sect. "The Bayesian Statistical Paradigm") increases in computing speed and sophistication of MCMC algorithms have enhanced this trend significantly. However, if a regression-style data set is visualized as a matrix with $n$ rows (one for each subject of inquiry) and $k$ columns (one for each variable measured on the subjects), MCMC methods do not necessarily scale well in either $n$ or $k$, with the result that they can be too slow to be of practical use with large data sets (e.g., at current desktop computing speeds, with $n$ and/or $k$ on the order of $10^5$ or greater). Improving the scaling of MCMC methods, or finding a new approach to Bayesian computation that scales better, is thus a third important area for continuing study.
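As a toy illustration of the calibration check described under "A Bayesian–Frequentist Fusion" above, the following sketch uses a conjugate Normal model with known variance (all settings and names here are illustrative, not from the source): parameters are drawn from the prior, 95% posterior predictive intervals are formed for held-out observables, and the empirical frequentist coverage of those Bayesian intervals is recorded.

```python
import math
import random

# Calibration check for a conjugate Normal model (illustrative sketch):
# draw theta from the prior, simulate data, form a 95% posterior
# predictive interval for a held-out observation, and record whether
# the interval covers it.  Good calibration: coverage close to 0.95.
random.seed(1)
mu0, tau0, sigma, n = 0.0, 1.0, 1.0, 20
covered, reps = 0, 4000
for _ in range(reps):
    theta = random.gauss(mu0, tau0)
    data = [random.gauss(theta, sigma) for _ in range(n)]
    y_new = random.gauss(theta, sigma)           # held-out observable
    # conjugate posterior: precision-weighted combination of prior and data
    prec = 1 / tau0**2 + n / sigma**2
    mu_n = (mu0 / tau0**2 + sum(data) / sigma**2) / prec
    sd_pred = math.sqrt(1 / prec + sigma**2)     # predictive sd for y_new
    lo, hi = mu_n - 1.96 * sd_pred, mu_n + 1.96 * sd_pred
    covered += (lo <= y_new <= hi)
coverage = covered / reps
print(round(coverage, 3))   # should be close to 0.95
```

When the Bayesian model is correct, as here, the intervals are well calibrated; repeating the check with a badly misspecified prior or likelihood would reveal the miscalibration that the fusion approach is designed to detect.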

Bibliography

1. Abdellaoui M (2000) Parameter-free elicitation of utility and probability weighting functions. Manag Sci 46:1497–1512
2. Aleskerov F, Bouyssou D, Monjardet B (2007) Utility Maximization, Choice and Preference, 2nd edn. Springer, New York
3. Arrow KJ (1963) Social Choice and Individual Values, 2nd edn. Yale University Press, New Haven CT
4. Barlow RE, Wu AS (1981) Preposterior analysis of Bayes estimators of mean life. Biometrika 68:403–410
5. Bayes T (1764) An essay towards solving a problem in the doctrine of chances. Philos Trans Royal Soc Lond 53:370–418
6. Berger JO (1985) Statistical Decision Theory and Bayesian Analysis. Springer, New York
7. Berger JO (2006) The case for objective Bayesian analysis (with discussion). Bayesian Anal 1:385–472
8. Berger JO, Betro B, Moreno E, Pericchi LR, Ruggeri F, Salinetti G, Wasserman L (eds) (1995) Bayesian Robustness. Institute of Mathematical Statistics Lecture Notes–Monograph Series, vol 29. IMS, Hayward CA
9. Berger JO, Pericchi LR (2001) Objective Bayesian methods for model selection: introduction and comparison. In: Lahiri P (ed) Model Selection. Institute of Mathematical Statistics Lecture Notes–Monograph Series, vol 38. IMS, Beachwood, pp 135–207
10. Bernardo JM (1979) Reference posterior distributions for Bayesian inference (with discussion). J Royal Stat Soc, Series B 41:113–147
11. Bernardo JM, Smith AFM (1994) Bayesian Theory. Wiley, New York
12. Blavatskyy P (2006) Error propagation in the elicitation of utility and probability weighting functions. Theory Decis 60:315–334
13. Brown PJ, Vannucci M, Fearn T (1998) Multivariate Bayesian variable selection and prediction. J Royal Stat Soc, Series B 60:627–641
14. Browne WJ, Draper D (2006) A comparison of Bayesian and likelihood methods for fitting multilevel models (with discussion). Bayesian Anal 1:473–550
15. Buck C, Blackwell P (2008) Bayesian construction of radiocarbon calibration curves (with discussion). In: Case Studies in Bayesian Statistics, vol 9. Springer, New York
16. Chipman H, George EI, McCulloch RE (1998) Bayesian CART model search (with discussion). J Am Stat Assoc 93:935–960
17. Christiansen CL, Morris CN (1997) Hierarchical Poisson regression modeling. J Am Stat Assoc 92:618–632
18. Clyde M, George EI (2004) Model uncertainty. Stat Sci 19:81–94
19. Das S, Chen MH, Kim S, Warren N (2008) A Bayesian structural equations model for multilevel data with missing responses and missing covariates. Bayesian Anal 3:197–224


20. de Finetti B (1930) Funzione caratteristica di un fenomeno aleatorio. Mem R Accad Lincei 4:86–133
21. de Finetti B (1937) La prévision: ses lois logiques, ses sources subjectives. Ann Inst H Poincaré 7:1–68 (reprinted in translation as: de Finetti B (1980) Foresight: its logical laws, its subjective sources. In: Kyburg HE, Smokler HE (eds) Studies in Subjective Probability. Dover, New York, pp 93–158)
22. de Finetti B (1938/1980) Sur la condition d'équivalence partielle. Actual Sci Ind 739 (reprinted in translation as: de Finetti B (1980) On the condition of partial exchangeability. In: Jeffrey R (ed) Studies in Inductive Logic and Probability. University of California Press, Berkeley, pp 193–206)
23. de Finetti B (1970) Teoria delle Probabilità, vol 1 and 2. Einaudi, Torino (reprinted in translation as: de Finetti B (1974–75) Theory of Probability, vol 1 and 2. Wiley, Chichester)
24. Dey D, Müller P, Sinha D (eds) (1998) Practical Nonparametric and Semiparametric Bayesian Statistics. Springer, New York
25. Draper D (1995) Assessment and propagation of model uncertainty (with discussion). J Royal Stat Soc, Series B 57:45–97
26. Draper D (1999) Model uncertainty yes, discrete model averaging maybe. Comment on: Hoeting JA, Madigan D, Raftery AE, Volinsky CT, Bayesian model averaging: a tutorial. Stat Sci 14:405–409
27. Draper D (2007) Bayesian multilevel analysis and MCMC. In: de Leeuw J, Meijer E (eds) Handbook of Multilevel Analysis. Springer, New York, pp 31–94
28. Draper D, Hodges J, Mallows C, Pregibon D (1993) Exchangeability and data analysis (with discussion). J Royal Stat Soc, Series A 156:9–37
29. Draper D, Krnjajić M (2008) Bayesian model specification. Submitted
30. Duran BS, Booker JM (1988) A Bayes sensitivity analysis when using the Beta distribution as a prior. IEEE Trans Reliab 37:239–247
31. Efron B (1979) Bootstrap methods. Ann Stat 7:1–26
32. Ferguson T (1973) A Bayesian analysis of some nonparametric problems. Ann Stat 1:209–230
33. Ferreira MAR, Lee HKH (2007) Multiscale Modeling. Springer, New York
34. Fienberg SE (2006) When did Bayesian inference become "Bayesian"? Bayesian Anal 1:1–40
35. Fishburn PC (1970) Utility Theory for Decision Making. Wiley, New York
36. Fishburn PC (1981) Subjective expected utility: a review of normative theories. Theory Decis 13:139–199
37. Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos Trans Royal Soc Lond, Series A 222:309–368
38. Fisher RA (1925) Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh
39. Fouskakis D, Draper D (2008) Comparing stochastic optimization methods for variable selection in binary outcome prediction, with application to health policy. J Am Stat Assoc, forthcoming
40. Gauss CF (1809) Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium, vol 2. Perthes and Besser, Hamburg
41. Gelfand AE, Smith AFM (1990) Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85:398–409

42. Gelman A, Meng X-L (2004) Applied Bayesian Modeling and Causal Inference From Incomplete-Data Perspectives. Wiley, New York
43. George EI, Foster DP (2000) Calibration and empirical Bayes variable selection. Biometrika 87:731–747
44. Gilks WR, Richardson S, Spiegelhalter DJ (eds) (1996) Markov Chain Monte Carlo in Practice. Chapman, New York
45. Goldstein M (2006) Subjective Bayesian analysis: principles and practice (with discussion). Bayesian Anal 1:385–472
46. Good IJ (1950) Probability and the Weighing of Evidence. Charles Griffin, London
47. Green P (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82:711–713
48. Hacking I (1984) The Emergence of Probability. Cambridge University Press, Cambridge
49. Hanson TE, Kottas A, Branscum AJ (2008) Modelling stochastic order in the analysis of receiver operating characteristic data: Bayesian non-parametric approaches. J Royal Stat Soc, Series C (Applied Statistics) 57:207–226
50. Hellwig K, Speckbacher G, Weniges P (2000) Utility maximization under capital growth constraints. J Math Econ 33:1–12
51. Hendriksen C, Lund E, Stromgard E (1984) Consequences of assessment and intervention among elderly people: a three year randomized controlled trial. Br Med J 289:1522–1524
52. Hoeting JA, Madigan D, Raftery AE, Volinsky CT (1999) Bayesian model averaging: a tutorial. Stat Sci 14:382–417
53. Jaynes ET (2003) Probability Theory: The Logic of Science. Cambridge University Press, Cambridge
54. Jeffreys H (1931) Scientific Inference. Cambridge University Press, Cambridge
55. Jordan MI, Ghahramani Z, Jaakkola TS, Saul L (1999) An introduction to variational methods for graphical models. Mach Learn 37:183–233
56. Kadane JB (ed) (1996) Bayesian Methods and Ethics in a Clinical Trial Design. Wiley, New York
57. Kadane JB, Dickey JM (1980) Bayesian decision theory and the simplification of models. In: Kmenta J, Ramsey J (eds) Evaluation of Econometric Models. Academic Press, New York
58. Kahn K, Rubenstein L, Draper D, Kosecoff J, Rogers W, Keeler E, Brook R (1990) The effects of the DRG-based Prospective Payment System on quality of care for hospitalized Medicare patients: an introduction to the series (with discussion). J Am Med Assoc 264:1953–1955
59. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795
60. Kass RE, Wasserman L (1996) The selection of prior distributions by formal rules. J Am Stat Assoc 91:1343–1370
61. Key J, Pericchi LR, Smith AFM (1999) Bayesian model choice: what and why? (with discussion). In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian Statistics 6. Clarendon Press, Oxford, pp 343–370
62. Krnjajić M, Kottas A, Draper D (2008) Parametric and nonparametric Bayesian model specification: a case study involving models for count data. Comput Stat Data Anal 52:2110–2128
63. Laplace PS (1774) Mémoire sur la probabilité des causes par les évenements. Mém Acad Sci Paris 6:621–656
64. Laplace PS (1812) Théorie Analytique des Probabilités. Courcier, Paris


65. Laud PW, Ibrahim JG (1995) Predictive model selection. J Royal Stat Soc, Series B 57:247–262
66. Lavine M (1992) Some aspects of Pólya tree distributions for statistical modelling. Ann Stat 20:1222–1235
67. Leamer EE (1978) Specification Searches: Ad Hoc Inference with Non-Experimental Data. Wiley, New York
68. Leonard T, Hsu JSJ (1999) Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers. Cambridge University Press, Cambridge
69. Lindley DV (1965) Introduction to Probability and Statistics. Cambridge University Press, Cambridge
70. Lindley DV (1968) The choice of variables in multiple regression (with discussion). J Royal Stat Soc, Series B 30:31–66
71. Lindley DV (2006) Understanding Uncertainty. Wiley, New York
72. Lindley DV, Novick MR (1981) The role of exchangeability in inference. Ann Stat 9:45–58
73. Little RJA (2006) Calibrated Bayes: a Bayes/frequentist roadmap. Am Stat 60:213–223
74. Mangel M, Munch SB (2003) Opportunities for Bayesian analysis in the search for sustainable fisheries. ISBA Bulletin 10:3–5
75. McCullagh P, Nelder JA (1989) Generalized Linear Models, 2nd edn. Chapman, New York
76. Meng XL, van Dyk DA (1997) The EM algorithm: an old folk song sung to a fast new tune (with discussion). J Royal Stat Soc, Series B 59:511–567
77. Merl D, Prado R (2007) Detecting selection in DNA sequences: Bayesian modelling and inference (with discussion). In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West M (eds) Bayesian Statistics 8. Oxford University Press, Oxford, pp 1–22
78. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1091
79. Morris CN, Hill J (2000) The Health Insurance Experiment: design using the Finite Selection Model. In: Morton SC, Rolph JE (eds) Public Policy and Statistics: Case Studies from RAND. Springer, New York, pp 29–53
80. Müller P, Quintana F (2004) Nonparametric Bayesian data analysis. Stat Sci 19:95–110
81. Müller P, Quintana F, Rosner G (2004) Hierarchical meta-analysis over related non-parametric Bayesian models. J Royal Stat Soc, Series B 66:735–749
82. Munch SB, Kottas A, Mangel M (2005) Bayesian nonparametric analysis of stock–recruitment relationships. Can J Fish Aquat Sci 62:1808–1821
83. Neyman J (1937) Outline of a theory of statistical estimation based on the classical theory of probability. Philos Trans Royal Soc Lond A 236:333–380
84. Neyman J, Pearson ES (1928) On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika 20:175–240
85. O'Hagan A, Buck CE, Daneshkhah A, Eiser JR, Garthwaite PH, Jenkinson DJ, Oakley JE, Rakow T (2006) Uncertain Judgements: Eliciting Experts' Probabilities. Wiley, New York
86. O'Hagan A, Forster J (2004) Bayesian Inference, 2nd edn. In: Kendall's Advanced Theory of Statistics, vol 2B. Arnold, London
87. Parmigiani G (2002) Modeling in Medical Decision-Making: A Bayesian Approach. Wiley, New York

88. Pearson K (1895) Mathematical contributions to the theory of evolution, II. Skew variation in homogeneous material. Proc Royal Soc Lond 57:257–260
89. Pebley AR, Goldman N (1992) Family, community, ethnic identity, and the use of formal health care services in Guatemala. Working Paper 92-12. Office of Population Research, Princeton
90. Pettit LI (1990) The conditional predictive ordinate for the Normal distribution. J Royal Stat Soc, Series B 52:175–184
91. Pérez JM, Berger JO (2002) Expected posterior prior distributions for model selection. Biometrika 89:491–512
92. Polson NG, Stroud JR, Müller P (2008) Practical filtering with sequential parameter learning. J Royal Stat Soc, Series B 70:413–428
93. Rasbash J, Steele F, Browne WJ, Prosser B (2005) A User's Guide to MLwiN, Version 2.0. Centre for Multilevel Modelling, University of Bristol, Bristol UK; available at www.cmm.bristol.ac.uk. Accessed 15 Aug 2008
94. Richardson S, Green PJ (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J Royal Stat Soc, Series B 59:731–792
95. Rios Insua D, Ruggeri F (eds) (2000) Robust Bayesian Analysis. Springer, New York
96. Rodríguez A, Dunson DB, Gelfand AE (2008) The nested Dirichlet process. J Am Stat Assoc 103, forthcoming
97. Rodríguez G, Goldman N (1995) An assessment of estimation procedures for multilevel models with binary responses. J Royal Stat Soc, Series A 158:73–89
98. Rubin DB (1984) Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann Stat 12:1151–1172
99. Rubin DB (2005) Bayesian inference for causal effects. In: Rao CR, Dey DK (eds) Handbook of Statistics: Bayesian Thinking, Modeling and Computation, vol 25. Elsevier, Amsterdam, pp 1–16
100. Sabatti C, Lange K (2008) Bayesian Gaussian mixture models for high-density genotyping arrays. J Am Stat Assoc 103:89–100
101. Sansó B, Forest CE, Zantedeschi D (2008) Inferring climate system properties using a computer model (with discussion). Bayesian Anal 3:1–62
102. Savage LJ (1954) The Foundations of Statistics. Wiley, New York
103. Schervish MJ, Seidenfeld T, Kadane JB (1990) State-dependent utilities. J Am Stat Assoc 85:840–847
104. Seidou O, Asselin JJ, Ouarda TMBJ (2007) Bayesian multivariate linear regression with application to change point models in hydrometeorological variables. Water Resources Research 43, W08401, doi:10.1029/2005WR004835
105. Spiegelhalter DJ, Abrams KR, Myles JP (2004) Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Wiley, New York
106. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A (2002) Bayesian measures of model complexity and fit (with discussion). J Royal Stat Soc, Series B 64:583–640
107. Spiegelhalter DJ, Thomas A, Best NG (1999) WinBUGS Version 1.2 User Manual. MRC Biostatistics Unit, Cambridge
108. Stephens M (2000) Bayesian analysis of mixture models with an unknown number of components – an alternative to reversible-jump methods. Ann Stat 28:40–74


109. Stigler SM (1986) The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard University Press, Cambridge
110. Stone M (1974) Cross-validation choice and assessment of statistical predictions (with discussion). J Royal Stat Soc, Series B 36:111–147
111. Wald A (1950) Statistical Decision Functions. Wiley, New York
112. Weerahandi S, Zidek JV (1981) Multi-Bayesian statistical decision theory. J Royal Stat Soc, Series A 144:85–93

113. Weisberg S (2005) Applied Linear Regression, 3rd edn. Wiley, New York
114. West M (2003) Bayesian factor regression models in the "large p, small n" paradigm. Bayesian Statistics 7:723–732
115. West M, Harrison PJ (1997) Bayesian Forecasting and Dynamic Models. Springer, New York
116. Whitehead J (2006) Using Bayesian decision theory in dose-escalation studies. In: Chevret S (ed) Statistical Methods for Dose-Finding Experiments. Wiley, New York, pp 149–171


Bivariate (Two-dimensional) Wavelets
BIN HAN
Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, Canada

Article Outline
Glossary
Definitions
Introduction
Bivariate Refinable Functions and Their Properties
The Projection Method
Bivariate Orthonormal and Biorthogonal Wavelets
Bivariate Riesz Wavelets
Pairs of Dual Wavelet Frames
Future Directions
Bibliography

Glossary
Dilation matrix  A $2 \times 2$ matrix $M$ is called a dilation matrix if all the entries of $M$ are integers and all the eigenvalues of $M$ are greater than one in modulus.
Isotropic dilation matrix  A dilation matrix $M$ is said to be isotropic if $M$ is similar to a diagonal matrix and all its eigenvalues have the same modulus.
Wavelet system  A wavelet system is a collection of square integrable functions that are generated from a finite set of functions (which are called wavelets) by using integer shifts and dilations.

Definitions
Throughout this article, $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Z}$ denote the real line, the complex plane, and the set of all integers, respectively. For $1 \le p \le \infty$, $L_p(\mathbb{R}^2)$ denotes the set of all Lebesgue measurable bivariate functions $f$ such that $\|f\|_{L_p(\mathbb{R}^2)} := \big( \int_{\mathbb{R}^2} |f(x)|^p \, dx \big)^{1/p} < \infty$. In particular, the space $L_2(\mathbb{R}^2)$ of square integrable functions is a Hilbert space under the inner product
\[
\langle f, g \rangle := \int_{\mathbb{R}^2} f(x) \overline{g(x)} \, dx, \qquad f, g \in L_2(\mathbb{R}^2),
\]
where $\overline{g(x)}$ denotes the complex conjugate of the complex number $g(x)$. In applications such as image processing and computer graphics, the following are commonly used isotropic

dilation matrices:
\[
M_{\sqrt{2}} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad
Q_{\sqrt{2}} = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}, \quad
M_{\sqrt{3}} = \begin{bmatrix} 1 & -2 \\ 2 & -1 \end{bmatrix}, \quad
dI_2 = \begin{bmatrix} d & 0 \\ 0 & d \end{bmatrix}, \tag{1}
\]
where $d$ is an integer with $|d| > 1$. Using a dilation matrix $M$, a bivariate $M$-wavelet system is generated by integer shifts and dilates from a finite set $\{\psi^1, \ldots, \psi^L\}$ of functions in $L_2(\mathbb{R}^2)$. More precisely, the set of all the basic wavelet building blocks of an $M$-wavelet system generated by $\{\psi^1, \ldots, \psi^L\}$ is given by
\[
X_M(\{\psi^1, \ldots, \psi^L\}) := \{ \psi^{\ell,M}_{j,k} : j \in \mathbb{Z},\ k \in \mathbb{Z}^2,\ \ell = 1, \ldots, L \}, \tag{2}
\]
where
\[
\psi^{\ell,M}_{j,k}(x) := |\det M|^{j/2} \, \psi^{\ell}(M^j x - k), \qquad x \in \mathbb{R}^2. \tag{3}
\]
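The glossary conditions on a dilation matrix are easy to check numerically. The following sketch (pure Python; the helper names are illustrative, and `is_isotropic` only tests the equal-modulus condition on the eigenvalues, which suffices here since the eigenvalues of these matrices are distinct) verifies that $M_{\sqrt{2}}$ and $M_{\sqrt{3}}$ are isotropic dilation matrices.

```python
import cmath

def eigenvalues_2x2(m):
    # eigenvalues of [[a, b], [c, d]] from the characteristic polynomial
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + disc) / 2, (tr - disc) / 2)

def is_dilation_matrix(m):
    # integer entries and both eigenvalues greater than one in modulus
    ints = all(isinstance(x, int) for row in m for x in row)
    return ints and all(abs(lam) > 1 for lam in eigenvalues_2x2(m))

def is_isotropic(m):
    # both eigenvalues have the same modulus
    l1, l2 = eigenvalues_2x2(m)
    return abs(abs(l1) - abs(l2)) < 1e-12

M_sqrt2 = [[1, 1], [1, -1]]   # eigenvalues ±√2
M_sqrt3 = [[1, -2], [2, -1]]  # eigenvalues ±i√3
print(is_dilation_matrix(M_sqrt2) and is_isotropic(M_sqrt2))  # True
print(is_dilation_matrix(M_sqrt3) and is_isotropic(M_sqrt3))  # True
```

The identity matrix fails the test, as expected: its eigenvalues have modulus exactly one, so it expands nothing.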

One of the main goals in wavelet analysis is to find wavelet systems $X_M(\{\psi^1, \ldots, \psi^L\})$ with some desirable properties such that any two-dimensional function or signal $f \in L_2(\mathbb{R}^2)$ can be sparsely and efficiently represented under the $M$-wavelet system $X_M(\{\psi^1, \ldots, \psi^L\})$:
\[
f = \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} h^{\ell}_{j,k}(f) \, \psi^{\ell,M}_{j,k}, \tag{4}
\]

where $h^{\ell}_{j,k} : L_2(\mathbb{R}^2) \mapsto \mathbb{C}$ are linear functionals. There are many types of wavelet systems studied in the literature. In the following, let us outline some of the most important types of wavelet systems.

Orthonormal Wavelets
We say that $\{\psi^1, \ldots, \psi^L\}$ generates an orthonormal $M$-wavelet basis in $L_2(\mathbb{R}^2)$ if the system $X_M(\{\psi^1, \ldots, \psi^L\})$ is an orthonormal basis of the Hilbert space $L_2(\mathbb{R}^2)$. That is, the linear span of elements in $X_M(\{\psi^1, \ldots, \psi^L\})$ is dense in $L_2(\mathbb{R}^2)$ and
\[
\langle \psi^{\ell,M}_{j,k}, \psi^{\ell',M}_{j',k'} \rangle = \delta_{\ell\ell'} \, \delta_{jj'} \, \delta_{kk'}, \qquad \forall\, j, j' \in \mathbb{Z},\ k, k' \in \mathbb{Z}^2,\ \ell, \ell' = 1, \ldots, L, \tag{5}
\]

where $\delta$ denotes the Dirac sequence such that $\delta_0 = 1$ and $\delta_k = 0$ for all $k \ne 0$. For an orthonormal wavelet basis $X_M(\{\psi^1, \ldots, \psi^L\})$, the linear functional $h^{\ell}_{j,k}$ in (4) is given by $h^{\ell}_{j,k}(f) = \langle f, \psi^{\ell,M}_{j,k} \rangle$ and the representation in (4) becomes
\[
f = \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} \langle f, \psi^{\ell,M}_{j,k} \rangle \, \psi^{\ell,M}_{j,k}, \qquad f \in L_2(\mathbb{R}^2), \tag{6}
\]

with the series converging in $L_2(\mathbb{R}^2)$.

Riesz Wavelets
We say that $\{\psi^1, \ldots, \psi^L\}$ generates a Riesz $M$-wavelet basis in $L_2(\mathbb{R}^2)$ if the system $X_M(\{\psi^1, \ldots, \psi^L\})$ is a Riesz basis of $L_2(\mathbb{R}^2)$. That is, the linear span of elements in $X_M(\{\psi^1, \ldots, \psi^L\})$ is dense in $L_2(\mathbb{R}^2)$ and there exist two positive constants $C_1$ and $C_2$ such that
\[
C_1 \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} |c^{\ell}_{j,k}|^2
\le \Big\| \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} c^{\ell}_{j,k} \, \psi^{\ell,M}_{j,k} \Big\|^2_{L_2(\mathbb{R}^2)}
\le C_2 \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} |c^{\ell}_{j,k}|^2
\]

for all finitely supported sequences $\{c^{\ell}_{j,k}\}_{j \in \mathbb{Z}, k \in \mathbb{Z}^2, \ell = 1, \ldots, L}$. Clearly, a Riesz $M$-wavelet generalizes an orthonormal $M$-wavelet by relaxing the orthogonality requirement in (5). For a Riesz basis $X_M(\{\psi^1, \ldots, \psi^L\})$, it is well-known that there exists a dual Riesz basis $\{\tilde{\psi}^{\ell}_{j,k} : j \in \mathbb{Z}, k \in \mathbb{Z}^2, \ell = 1, \ldots, L\}$ of elements in $L_2(\mathbb{R}^2)$ (this set is not necessarily generated by integer shifts and dilates from some finite set of functions) such that (5) still holds after replacing $\psi^{\ell',M}_{j',k'}$ by $\tilde{\psi}^{\ell'}_{j',k'}$. For a Riesz wavelet basis, the linear functional in (4) becomes $h^{\ell}_{j,k}(f) = \langle f, \tilde{\psi}^{\ell}_{j,k} \rangle$. In fact, $\tilde{\psi}^{\ell}_{j,k} := F^{-1}(\psi^{\ell,M}_{j,k})$, where $F : L_2(\mathbb{R}^2) \mapsto L_2(\mathbb{R}^2)$ is defined to be
\[
F(f) := \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} \langle f, \psi^{\ell,M}_{j,k} \rangle \, \psi^{\ell,M}_{j,k}, \qquad f \in L_2(\mathbb{R}^2). \tag{7}
\]

Wavelet Frames
A further generalization of a Riesz wavelet is a wavelet frame. We say that $\{\psi^1, \ldots, \psi^L\}$ generates an $M$-wavelet frame in $L_2(\mathbb{R}^2)$ if the system $X_M(\{\psi^1, \ldots, \psi^L\})$ is a frame of $L_2(\mathbb{R}^2)$. That is, there exist two positive constants $C_1$ and $C_2$ such that
\[
C_1 \|f\|^2_{L_2(\mathbb{R}^2)} \le \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} |\langle f, \psi^{\ell,M}_{j,k} \rangle|^2 \le C_2 \|f\|^2_{L_2(\mathbb{R}^2)}, \qquad \forall\, f \in L_2(\mathbb{R}^2). \tag{8}
\]
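A finite-dimensional analogue helps build intuition for the frame inequality (8): for finitely many vectors in $\mathbb{R}^2$, the optimal constants $C_1$ and $C_2$ are the smallest and largest eigenvalues of the frame operator $S = \sum_i v_i v_i^T$. The following sketch (illustrative, not from the source) computes them for the three-vector "Mercedes-Benz" frame, a tight frame with $C_1 = C_2 = 3/2$.

```python
import math

# three unit vectors at 120-degree spacing (the "Mercedes-Benz" frame)
vecs = [(math.cos(a), math.sin(a))
        for a in (math.pi/2, math.pi/2 + 2*math.pi/3, math.pi/2 + 4*math.pi/3)]

# frame operator S = sum of outer products v v^T (2x2, symmetric)
s11 = sum(x * x for x, y in vecs)
s12 = sum(x * y for x, y in vecs)
s22 = sum(y * y for x, y in vecs)

# eigenvalues of a symmetric 2x2 matrix via trace and determinant
tr, det = s11 + s22, s11 * s22 - s12 * s12
disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
C1, C2 = (tr - disc) / 2, (tr + disc) / 2
print(round(C1, 6), round(C2, 6))   # 1.5 1.5 -> a tight frame
```

Because $C_1 = C_2$, the three redundant vectors reproduce every $f \in \mathbb{R}^2$ from its inner products exactly as an orthonormal basis would, up to the constant $3/2$; this mirrors the tight-frame case of (8).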

It is not difficult to check that a Riesz $M$-wavelet is an $M$-wavelet frame. Then (8) guarantees that the frame operator $F$ in (7) is a bounded and invertible linear operator. For an $M$-wavelet frame, the linear functional in (4) can be chosen to be $h^{\ell}_{j,k}(f) = \langle f, F^{-1}(\psi^{\ell,M}_{j,k}) \rangle$; however, such functionals may not be unique. The fundamental difference between a Riesz wavelet and a wavelet frame lies in that for any given function $f \in L_2(\mathbb{R}^2)$, the representation in (4) is unique under a Riesz wavelet while it may not be unique under a wavelet frame. The representation in (4) with the choice $h^{\ell}_{j,k}(f) = \langle f, F^{-1}(\psi^{\ell,M}_{j,k}) \rangle$ is called the canonical representation of a wavelet frame. In other words, $\{F^{-1}(\psi^{\ell,M}_{j,k}) : j \in \mathbb{Z}, k \in \mathbb{Z}^2, \ell = 1, \ldots, L\}$ is called the canonical dual frame of the given wavelet frame $X_M(\{\psi^1, \ldots, \psi^L\})$.

Biorthogonal Wavelets
We say that $(\{\psi^1, \ldots, \psi^L\}, \{\tilde{\psi}^1, \ldots, \tilde{\psi}^L\})$ generates a pair of biorthogonal $M$-wavelet bases in $L_2(\mathbb{R}^2)$ if each of $X_M(\{\psi^1, \ldots, \psi^L\})$ and $X_M(\{\tilde{\psi}^1, \ldots, \tilde{\psi}^L\})$ is a Riesz basis of $L_2(\mathbb{R}^2)$ and (5) still holds after replacing $\psi^{\ell',M}_{j',k'}$ by $\tilde{\psi}^{\ell',M}_{j',k'}$. In other words, the dual Riesz basis of $X_M(\{\psi^1, \ldots, \psi^L\})$ has the wavelet structure and is given by $X_M(\{\tilde{\psi}^1, \ldots, \tilde{\psi}^L\})$. For a biorthogonal wavelet, the wavelet representation in (4) becomes
\[
f = \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} \langle f, \tilde{\psi}^{\ell,M}_{j,k} \rangle \, \psi^{\ell,M}_{j,k}, \qquad f \in L_2(\mathbb{R}^2). \tag{9}
\]

Obviously, $\{\psi^1, \ldots, \psi^L\}$ generates an orthonormal $M$-wavelet basis in $L_2(\mathbb{R}^2)$ if and only if $(\{\psi^1, \ldots, \psi^L\}, \{\psi^1, \ldots, \psi^L\})$ generates a pair of biorthogonal $M$-wavelet bases in $L_2(\mathbb{R}^2)$.

Dual Wavelet Frames
Similarly, we have the notion of a pair of dual wavelet frames. We say that $(\{\psi^1, \ldots, \psi^L\}, \{\tilde{\psi}^1, \ldots, \tilde{\psi}^L\})$ generates a pair of dual $M$-wavelet frames in $L_2(\mathbb{R}^2)$ if each of $X_M(\{\psi^1, \ldots, \psi^L\})$ and $X_M(\{\tilde{\psi}^1, \ldots, \tilde{\psi}^L\})$ is an $M$-wavelet frame in $L_2(\mathbb{R}^2)$ and
\[
\langle f, g \rangle = \sum_{\ell=1}^{L} \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}^2} \langle f, \tilde{\psi}^{\ell,M}_{j,k} \rangle \, \langle \psi^{\ell,M}_{j,k}, g \rangle, \qquad f, g \in L_2(\mathbb{R}^2).
\]
It follows from the above identity that (9) still holds for a pair of dual $M$-wavelet frames. We say that $\{\psi^1, \ldots, \psi^L\}$ generates a tight $M$-wavelet frame if $(\{\psi^1, \ldots, \psi^L\}, \{\psi^1, \ldots, \psi^L\})$ generates a pair of dual $M$-wavelet frames in $L_2(\mathbb{R}^2)$. For a tight wavelet frame $\{\psi^1, \ldots, \psi^L\}$, the wavelet representation in (6) still holds. So, a tight wavelet frame is a generalization of an orthonormal wavelet basis.

Introduction
Bivariate (two-dimensional) wavelets are of interest in representing and processing two-dimensional data such as images and surfaces. In this article we shall discuss some basic background and results on bivariate wavelets. We denote by $\Pi_J$ the set of all bivariate polynomials of total degree at most $J$. For compactly supported functions $\psi^1, \ldots, \psi^L$, we say that $\{\psi^1, \ldots, \psi^L\}$ has $J$ vanishing moments if $\langle P, \psi^{\ell} \rangle = 0$ for all $\ell = 1, \ldots, L$ and all polynomials $P \in \Pi_{J-1}$. The advantages of wavelet representations largely lie in the following aspects:

1. There is a fast wavelet transform (FWT) for computing the wavelet coefficients $h^{\ell}_{j,k}(f)$ in the wavelet representation (4).
2. The wavelet representation has good time and frequency localization. Roughly speaking, the basic building blocks $\psi^{\ell,M}_{j,k}$ have good time localization and smoothness.
3. The wavelet representation is sparse. For a smooth function $f$, most wavelet coefficients are negligible. Generally, the wavelet coefficient $h^{\ell}_{j,k}(f)$ only depends on the information of $f$ in a small neighborhood of the support of $\psi^{\ell,M}_{j,k}$ (more precisely, the support of $\tilde{\psi}^{\ell,M}_{j,k}$ if $h^{\ell}_{j,k}(f) = \langle f, \tilde{\psi}^{\ell,M}_{j,k} \rangle$). If $f$ in a small neighborhood of the support of $\psi^{\ell,M}_{j,k}$ is smooth or behaves like a polynomial to a certain degree, then by the vanishing moments of the dual wavelet functions, $h^{\ell}_{j,k}(f)$ is often negligible.
4. For a variety of function spaces $B$ such as Sobolev and Besov spaces, the norm $\|f\|_B$, $f \in B$, is equivalent to a certain weighted sequence norm of the wavelet coefficients $\{h^{\ell}_{j,k}(f)\}_{j \in \mathbb{Z}, k \in \mathbb{Z}^2, \ell = 1, \ldots, L}$.

For more details on advantages and applications of wavelets, see [6,8,14,16,21,58]. For a dilation matrix $M = dI_2$, the easiest way to obtain a bivariate wavelet is to use the tensor product method. In other words, all the functions in the generator set $\{\psi^1, \ldots, \psi^L\} \subseteq L_2(\mathbb{R}^2)$ take the form $\psi^{\ell}(x, y) = f^{\ell}(x) g^{\ell}(y)$, $x, y \in \mathbb{R}$, where $f^{\ell}$ and $g^{\ell}$ are some univariate functions in $L_2(\mathbb{R})$. However, tensor product (also called separable) bivariate wavelets give preference to the horizontal and vertical directions, which may not be desirable in applications such as image processing ([9,16,50,58]).
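The tensor product construction for $M = 2I_2$ can be made concrete with one level of the separable Haar fast wavelet transform, which filters the rows of an image and then its columns (a minimal sketch; the function names are illustrative).

```python
import math

# One level of the separable (tensor-product) 2-D Haar transform:
# filter along rows, then along columns.  Normalizing by 1/sqrt(2)
# makes each 1-D step orthonormal.
def haar_step_1d(v):
    s = 1 / math.sqrt(2)
    avg = [(v[2*i] + v[2*i + 1]) * s for i in range(len(v) // 2)]
    dif = [(v[2*i] - v[2*i + 1]) * s for i in range(len(v) // 2)]
    return avg + dif

def haar_step_2d(img):
    rows = [haar_step_1d(r) for r in img]         # filter rows
    out = [haar_step_1d(list(c)) for c in zip(*rows)]  # filter columns
    return [list(r) for r in zip(*out)]           # transpose back

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
coeffs = haar_step_2d(img)
# for this piecewise-constant image, all detail coefficients vanish
print([round(c, 10) for c in coeffs[0][:2]])   # low-pass row: [2.0, 4.0]
```

The low-pass block holds scaled $2 \times 2$ block averages, while the three detail blocks (horizontal, vertical, diagonal) are exactly the products $f^{\ell}(x) g^{\ell}(y)$ of univariate Haar low-pass and high-pass responses, illustrating the directional bias of separable wavelets.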

Also, for a non-diagonal dilation matrix, it is difficult or impossible to use the tensor product method to obtain bivariate wavelets. Therefore, nonseparable bivariate wavelets themselves are of importance and interest in both theory and application. In this article, we shall address several aspects of bivariate wavelets with a general two-dimensional dilation matrix.

Bivariate Refinable Functions and Their Properties
In order to have a fast wavelet transform to compute the wavelet coefficients in (4), the generators $\psi^1, \ldots, \psi^L$ in a wavelet system are generally obtained from a refinable function $\phi$ via a multiresolution analysis [6,15,16,58]. Let $M$ be a $2 \times 2$ dilation matrix. For a function $\phi$ in $L_2(\mathbb{R}^2)$, we say that $\phi$ is $M$-refinable if it satisfies the following refinement equation
\[
\phi(x) = |\det M| \sum_{k \in \mathbb{Z}^2} a_k \, \phi(Mx - k), \qquad \text{a.e. } x \in \mathbb{R}^2, \tag{10}
\]
where $a : \mathbb{Z}^2 \mapsto \mathbb{C}$ is a finitely supported sequence on $\mathbb{Z}^2$ satisfying $\sum_{k \in \mathbb{Z}^2} a_k = 1$. Such a sequence $a$ is often called a mask in wavelet analysis and computer graphics, or a low-pass filter in signal processing. Using the Fourier transform $\hat{\phi}$ of $\phi \in L_2(\mathbb{R}^2) \cap L_1(\mathbb{R}^2)$ and the Fourier series $\hat{a}$ of $a$, which are defined to be
\[
\hat{\phi}(\xi) := \int_{\mathbb{R}^2} \phi(x) e^{-ix\cdot\xi} \, dx \quad \text{and} \quad \hat{a}(\xi) := \sum_{k \in \mathbb{Z}^2} a_k e^{-ik\cdot\xi}, \qquad \xi \in \mathbb{R}^2, \tag{11}
\]
the refinement equation (10) can be equivalently rewritten as
\[
\hat{\phi}(M^T \xi) = \hat{a}(\xi) \, \hat{\phi}(\xi), \qquad \xi \in \mathbb{R}^2, \tag{12}
\]
where $M^T$ denotes the transpose of the dilation matrix $M$. Since $\hat{a}(0) = 1$ and $\hat{a}$ is a trigonometric polynomial, one can define a function $\hat{\phi}$ by
\[
\hat{\phi}(\xi) := \prod_{j=1}^{\infty} \hat{a}\big((M^T)^{-j} \xi\big), \qquad \xi \in \mathbb{R}^2, \tag{13}
\]
with the series converging uniformly on any compact set of $\mathbb{R}^2$. It is known ([4,16]) that $\phi$ is a compactly supported tempered distribution and clearly $\phi$ satisfies the refinement equation in (12) with mask $a$. We call $\phi$ the standard refinable function associated with mask $a$ and dilation $M$. Now the generators $\psi^1, \ldots, \psi^L$ of a wavelet system are often obtained from the refinable function $\phi$ via
\[
\widehat{\psi^{\ell}}(M^T \xi) := \widehat{a^{\ell}}(\xi) \, \hat{\phi}(\xi), \qquad \xi \in \mathbb{R}^2,\ \ell = 1, \ldots, L,
\]


for some $2\pi$-periodic trigonometric polynomials $\widehat{a^{\ell}}$. Such sequences $a^{\ell}$ are called wavelet masks or high-pass filters. Except for very few special masks $a$, such as masks for box spline refinable functions (see [20] for box splines), most refinable functions $\phi$, obtained in (13) from mask $a$ and dilation $M$, do not have explicit analytic expressions, and a so-called cascade algorithm or subdivision scheme is used to approximate the refinable function $\phi$. Let $B$ be a Banach space of bivariate functions. Starting with a suitable initial function $f \in B$, we iteratively compute a sequence $\{Q^n_{a,M} f\}_{n=0}^{\infty}$ of functions, where
\[
Q_{a,M} f := |\det M| \sum_{k \in \mathbb{Z}^2} a_k \, f(M\cdot - k). \tag{14}
\]
If the sequence $\{Q^n_{a,M} f\}_{n=0}^{\infty}$ of functions converges to some $f_{\infty} \in B$ in the Banach space $B$, then we have $Q_{a,M} f_{\infty} = f_{\infty}$. That is, as a fixed point of the cascade operator $Q_{a,M}$, $f_{\infty}$ is a solution to the refinement equation (10). A cascade algorithm plays an important role in the study of refinable functions in wavelet analysis and of subdivision surfaces in computer graphics. For more details on cascade algorithms and subdivision schemes, see [4,19,21,22,23,31,36,54] and numerous references therein. One of the most important properties of a refinable function $\phi$ is its smoothness, which is measured by its $L_p$ smoothness exponent $\nu_p(\phi)$, defined to be
\[
\nu_p(\phi) := \sup\{\nu : \phi \in W^{\nu}_p(\mathbb{R}^2)\}, \qquad 1 \le p \le \infty, \tag{15}
\]

where W^ν_p(R²) denotes the fractional Sobolev space of order ν > 0 in L_p(R²). The notion of sum rules of a mask is closely related to the vanishing moments of a wavelet system ([16,44]). For a bivariate mask a, we say that a satisfies the sum rules of order J with the dilation matrix M if

  Σ_{k∈Z²} a_{γ+Mk} P(γ + Mk) = Σ_{k∈Z²} a_{Mk} P(Mk),  ∀ P ∈ Π_{J−1} and ∀ γ ∈ Z².  (16)
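Since Π_{J−1} is spanned by the monomials x^p y^q with p + q < J, condition (16) can be tested numerically on monomials alone. The following is an illustrative sketch (not from the article; the helper name and the array storage convention are ours), shown for M = 2I₂:

```python
import numpy as np

def sum_rule_order(a, offset, max_J=8):
    """Largest J such that, for every coset gamma of Z^2/[2Z^2] and every
    monomial P(x, y) = x**p * y**q with p + q < J,
    sum_k a[gamma + 2k] * P(gamma + 2k) equals sum_k a[2k] * P(2k), cf. (16).
    `a` holds the mask coefficients; `offset` maps index k to array position."""
    idx = [(i - offset[0], j - offset[1])
           for i in range(a.shape[0]) for j in range(a.shape[1])]
    vals = {k: a[k[0] + offset[0], k[1] + offset[1]] for k in idx}
    for J in range(1, max_J + 1):
        for p in range(J):
            for q in range(J - p):
                # reference value: the even-even coset gamma = (0, 0)
                ref = sum(v * k[0]**p * k[1]**q for k, v in vals.items()
                          if k[0] % 2 == 0 and k[1] % 2 == 0)
                for g in [(0, 1), (1, 0), (1, 1)]:
                    s = sum(v * k[0]**p * k[1]**q for k, v in vals.items()
                            if (k[0] - g[0]) % 2 == 0 and (k[1] - g[1]) % 2 == 0)
                    if abs(s - ref) > 1e-10:
                        return J - 1
    return max_J

a1 = np.array([1.0, 2.0, 1.0]) / 4.0
a = np.outer(a1, a1)  # tensor-product mask with support [-1, 1]^2
print(sum_rule_order(a, offset=(1, 1)))  # -> 2
```

For the tensor-product mask ([1, 2, 1]/4) ⊗ ([1, 2, 1]/4) the coset sums of 1 and of the linear monomials agree, while x² already fails, so the sum-rule order is 2.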



Or equivalently, ∂₁^{μ₁} ∂₂^{μ₂} â(2πβ) = 0 for all μ₁ + μ₂ < J and all β ∈ [(M^T)^{−1}Z²]\Z², where ∂₁ and ∂₂ denote the partial derivatives in the first and second coordinate, respectively. Throughout the article, we denote by sr(a, M) the highest order of the sum rules satisfied by the mask a with the dilation matrix M.

To investigate various properties of refinable functions, we next introduce a quantity ν_p(a; M) from wavelet analysis (see [31]). For a bivariate mask a and a dilation matrix M, we denote

  ν_p(a; M) := −log_{ρ(M)} [ |det M|^{1−1/p} ρ(a; M; p) ],  1 ≤ p ≤ ∞,  (17)

where ρ(M) denotes the spectral radius of M and

  ρ(a; M; p) := max{ lim sup_{n→∞} ‖a_{n,(μ₁,μ₂)}‖^{1/n}_{ℓ_p(Z²)} : μ₁ + μ₂ = sr(a; M), μ₁, μ₂ ∈ N ∪ {0} },

where the sequence a_{n,(μ₁,μ₂)} is defined by its Fourier series as follows: for ξ = (ξ₁, ξ₂) ∈ R²,

  â_{n,(μ₁,μ₂)}(ξ) := (1 − e^{−iξ₁})^{μ₁} (1 − e^{−iξ₂})^{μ₂} â((M^T)^{n−1}ξ) â((M^T)^{n−2}ξ) ⋯ â(M^Tξ) â(ξ).

It is known ([31,36]) that ν_p(a; M) ≥ ν_q(a; M) ≥ ν_p(a; M) + (1/q − 1/p) log_{ρ(M)} |det M| for all 1 ≤ p ≤ q ≤ ∞. Parts of the following result essentially appeared in various forms in many papers in the literature. For the study of refinable functions and cascade algorithms, see [4,10,12,15,16,22,23,25,29,31,45,48,53] and references therein. The following is from [31].

Theorem 1  Let M be a 2×2 isotropic dilation matrix and a be a finitely supported bivariate mask. Let φ denote the standard refinable function associated with mask a and dilation M. Then ν_p(φ) ≥ ν_p(a; M) for all 1 ≤ p ≤ ∞. If the shifts of φ are stable, that is, {φ(·−k) : k ∈ Z²} is a Riesz system in L_p(R²), then ν_p(φ) = ν_p(a; M). Moreover, for every nonnegative integer J, the following statements are equivalent.
1. For every compactly supported function f ∈ W^J_p(R²) (if p = ∞, we require f ∈ C^J(R²)) such that f̂(0) = 1 and ∂₁^{μ₁} ∂₂^{μ₂} f̂(2πk) = 0 for all μ₁ + μ₂ < J and k ∈ Z²\{0}, the cascade sequence {Q^n_{a,M} f}_{n=0}^{∞} converges in the Sobolev space W^J_p(R²) (in fact, the limit function is φ).
2. ν_p(a; M) > J.
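Although φ rarely has a closed form, the cascade/subdivision iteration in (14) is easy to run numerically. The following minimal sketch (illustrative, not from the article) iterates the subdivision operator for the tensor-product hat-function mask with M = 2I₂; the iterates carry the values of φ on an increasingly fine dyadic grid:

```python
import numpy as np

# 1D low-pass mask of the hat function (B-spline of order 2); sum(a1) = 1
a1 = np.array([0.25, 0.5, 0.25])
# Tensor-product bivariate mask with dilation M = 2*I2, so |det M| = 4
a = np.outer(a1, a1)

def subdivision_step(v, a):
    """One cascade/subdivision step for M = 2*I2:
    (S v)[j] = |det M| * sum_k a[j - 2k] * v[k]."""
    n0, n1 = v.shape
    k0, k1 = a.shape
    out = np.zeros((2 * n0 + k0, 2 * n1 + k1))
    for i in range(n0):
        for j in range(n1):
            out[2 * i:2 * i + k0, 2 * j:2 * j + k1] += 4.0 * a * v[i, j]
    return out

v = np.ones((1, 1))  # start from a delta sequence
for _ in range(6):
    v = subdivision_step(v, a)
# v now holds samples of phi (the tensor-product hat function) on a
# dyadic grid of spacing 2**-6; its maximum is phi's peak value 1
```

For a general dilation matrix M the shift 2k is replaced by Mk, and convergence of the iteration in W^J_p is governed by ν_p(a; M) exactly as in Theorem 1.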

Symmetry of a refinable function is another important property of a wavelet system in applications. For example, symmetry is desirable in order to handle the boundary of an image or to improve the visual quality of a surface ([16,21,22,32,50,58]). Let M be a 2×2 dilation matrix and G be a finite set of 2×2 integer matrices. We say that G is a symmetry group with respect to M ([27,29]) if G is a group under matrix multiplication and MEM^{−1} ∈ G for all E ∈ G. In dimension two, two commonly used symmetry groups are D₄ and D₆ (matrices written row by row):

  D₄ := { ±[1 0; 0 1], ±[1 0; 0 −1], ±[0 1; 1 0], ±[0 1; −1 0] }

and

  D₆ := { ±[1 0; 0 1], ±[1 −1; 1 0], ±[0 −1; 1 −1], ±[0 1; 1 0], ±[1 0; 1 −1], ±[1 −1; 0 −1] }.
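These finite groups and the defining properties above can be checked by brute force. In the sketch below (illustrative, not from the article), D₆ is generated from the 60-degree lattice rotation and a reflection; the quincunx matrix used for the D₄ check is an assumed form of M_{√2} from (1):

```python
import itertools
import numpy as np

def mat_set(ms):
    """Close a list of integer matrices under negation, as hashable tuples."""
    return {tuple(map(tuple, sign * m)) for sign in (1, -1) for m in ms}

I2 = np.array([[1, 0], [0, 1]])
# D4: symmetries of the square lattice
D4 = mat_set([I2, np.diag([1, -1]),
              np.array([[0, 1], [1, 0]]), np.array([[0, 1], [-1, 0]])])
# D6: symmetries of the triangular lattice, generated by the 60-degree
# rotation r and the reflection s
r = np.array([[1, -1], [1, 0]])
s = np.array([[0, 1], [1, 0]])
D6 = mat_set([I2, r, r @ r, s, s @ r, s @ r @ r])

def is_symmetry_group(G, M):
    """Check the definition: G is closed under multiplication and
    M E M^{-1} lies in G for every E in G."""
    Minv = np.linalg.inv(M)
    closed = all(tuple(map(tuple, np.array(A) @ np.array(B))) in G
                 for A, B in itertools.product(G, repeat=2))
    conj = all(tuple(map(tuple, np.rint(M @ np.array(E) @ Minv).astype(int))) in G
               for E in G)
    return closed and conj

M_quincunx = np.array([[1, 1], [1, -1]])  # assumed form of M_sqrt2 in (1)
assert is_symmetry_group(D4, M_quincunx)
assert is_symmetry_group(D6, 2 * I2)
```

The check confirms that D₄ has 8 elements and is a symmetry group with respect to the quincunx dilation, while D₆ has 12 elements and is a symmetry group with respect to 2I₂.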

For symmetric refinable functions, we have the following result ([32, Proposition 1]):

Proposition 2  Let M be a 2×2 dilation matrix and G be a symmetry group with respect to M. Let a be a bivariate mask and φ be the standard refinable function associated with mask a and dilation M. Then a is G-symmetric with center c_a, that is, a_{E(k−c_a)+c_a} = a_k for all k ∈ Z² and E ∈ G, if and only if φ is G-symmetric with center c, that is, φ(E(·−c)+c) = φ for all E ∈ G, with c := (M − I₂)^{−1} c_a.

In the following, let us present some examples of bivariate refinable functions.

Example 3  Let M = 2I₂ and â(ξ₁, ξ₂) := cos⁴(ξ₁/2) cos⁴(ξ₂/2). Then a is a tensor product mask with sr(a; 2I₂) = 4 and a is D₄-symmetric with center 0. The associated standard refinable function φ is the tensor product spline of order 4. The Catmull-Clark subdivision scheme for quadrilateral meshes in computer graphics is based on this mask a ([21]).

Example 4  Let M = 2I₂ and â(ξ₁, ξ₂) := cos²(ξ₁/2) cos²(ξ₂/2) cos²(ξ₁/2 + ξ₂/2). Then sr(a; 2I₂) = 4 and a is D₆-symmetric with center 0. The associated refinable function φ is the convolution of the three-direction box spline with itself ([20]). The Loop subdivision scheme for triangular meshes in computer graphics is based on this mask a ([21]).

Another important property of a refinable function is interpolation. We say that a bivariate function φ is interpolating if φ is continuous and φ(k) = δ_k for all k ∈ Z².

Theorem 5  Let M be a 2×2 dilation matrix and a be a finitely supported bivariate mask. Let φ denote the standard refinable function associated with mask a and dilation M. Then φ is an interpolating function if and only if ν_∞(a; M) > 0 and a is an interpolatory mask with dilation M, that is, a_0 = |det M|^{−1} and a_{Mk} = 0 for all k ∈ Z²\{0}.

Example 6  Let M = 2I₂ and

  â(ξ₁, ξ₂) := cos(ξ₁/2) cos(ξ₂/2) cos(ξ₁/2 + ξ₂/2) [1 + 2cos(ξ₁) + 2cos(ξ₂) + 2cos(ξ₁+ξ₂) − cos(2ξ₁+ξ₂) − cos(ξ₁+2ξ₂) − cos(ξ₁−ξ₂)]/4.  (18)

Then sr(a; 2I₂) = 4, a is D₆-symmetric with center 0, and a is an interpolatory mask with dilation 2I₂. Since ν₂(a; 2I₂) ≈ 2.44077, we have ν_∞(a; 2I₂) ≥ ν₂(a; 2I₂) − 1 ≈ 1.44077 > 0. By Theorem 5, the associated standard refinable function φ is interpolating. The butterfly interpolatory subdivision scheme for triangular meshes is based on this mask a (see [23]). For more details on bivariate interpolatory masks and interpolating refinable functions, see [4,19,22,23,25,26,27,31,37,38,39,59].

The Projection Method

In applications, one is interested in analyzing some optimal properties of multivariate wavelets. The projection method is useful for this purpose. Let r and s be two positive integers with r ≤ s. Let P be an r×s real-valued matrix. For a compactly supported function φ and a finitely supported mask a in dimension s, we define the projected function Pφ and the projected mask Pa in dimension r by

  \widehat{Pφ}(ξ) := φ̂(P^T ξ)  and  \widehat{Pa}(ξ) := Σ_{j=1}^{t} â(P^T ξ + 2πε_j),  ξ ∈ R^r,  (19)

where φ̂ and â are understood to be continuous and {ε₁, …, ε_t} is a complete set of representatives of the distinct cosets of [P^T R^r]/Z^s. If P is an integer matrix, then \widehat{Pa}(ξ) = â(P^T ξ).

Now we have the following result on projected refinable functions ([27,28,34,35]).

Theorem 7  Let N be an s×s dilation matrix and M be an r×r dilation matrix. Let P be an r×s integer matrix such that PN = MP and PZ^s = Z^r. Let â be a 2π-periodic trigonometric polynomial in s variables with â(0) = 1, and let φ be the standard N-refinable function associated with mask a.


c T ) D c c Then sr(a; N) 6 sr(Pa; M) and P(M Pa()P(). That is, P is M-refinable with mask Pa. Moreover, for all 1 6 p 6 1,

p () 6 p (P) and j det Mj11/p (Pa; M; p) 6 j det Nj11/p (a; N; p) : If we further assume that (M) D (N), then p (a; N) 6

ν_p(Pa; M).

As pointed out in [34,35], the projection method is closely related to box splines. For a given r×s (direction) integer matrix Ξ of rank r with r ≤ s, the Fourier transform of its associated box spline M_Ξ and of its mask a_Ξ are given by (see [20])

  M̂_Ξ(ξ) := ∏_{k∈Ξ} (1 − e^{−ik·ξ})/(ik·ξ)  and  â_Ξ(ξ) := ∏_{k∈Ξ} (1 + e^{−ik·ξ})/2,  ξ ∈ R^r,  (20)

where k ∈ Ξ means that k is a column vector of Ξ and k goes through all the columns of Ξ once and only once. Let χ_{[0,1]^s} denote the characteristic function of the unit cube [0,1]^s. From (20), it is evident that the box spline M_Ξ is just the projected function Ξχ_{[0,1]^s}, since M̂_Ξ(ξ) = χ̂_{[0,1]^s}(Ξ^T ξ), ξ ∈ R^r. Note that M_Ξ is 2I_r-refinable with the mask a_Ξ, since M̂_Ξ(2ξ) = â_Ξ(ξ) M̂_Ξ(ξ). As an application of the projection method in Theorem 7, we have ([25, Theorem 3.5]):

Corollary 8  Let M = 2I_s and a be an interpolatory mask with dilation M such that a is supported inside [−3, 3]^s. Then ν_∞(a; 2I_s) ≤ 2 and therefore φ ∉ C²(R^s), where φ is the standard refinable function associated with mask a and dilation 2I_s.

We use proof by contradiction. Suppose ν_∞(a; 2I_s) > 2. Let P = [1, 0, …, 0] be a 1×s matrix. Then we must have sr(a; 2I_s) ≥ 3, which, combined with the other assumptions on a, forces \widehat{Pa}(ξ) = 1/2 + (9/16) cos(ξ) − (1/16) cos(3ξ). Since ν_∞(Pa; 2) = 2, by Theorem 7 we must have ν_∞(a; 2I_s) ≤ ν_∞(Pa; 2) = 2, a contradiction. So ν_∞(a; 2I_s) ≤ 2. In particular, the refinable function in the butterfly subdivision scheme in Example 6 is not C².

The projection method can also be used to construct interpolatory masks painlessly ([27]).

Theorem 9  Let M be an r×r dilation matrix. Then there is an r×r integer matrix H such that MZ^r = HZ^r and H^r = |det M| I_r. Let P := |det M|^{−1} H. Then for any (tensor product) interpolatory mask a with dilation |det M| I_r, the projected mask Pa is an interpolatory mask with dilation M and sr(Pa; M) ≥ sr(a; |det M| I_r).

Example 10  For M = M_{√2} or M = Q_{√2}, we can take H := M_{√2} in Theorem 9.

For more details on the projection method, see [25,27,28,32,34,35].

Bivariate Orthonormal and Biorthogonal Wavelets

In this section, we shall discuss the analysis and construction of bivariate orthonormal and biorthogonal wavelets. For the analysis of biorthogonal wavelets, the following result is well known (see [9,11,16,24,26,31,46,51,53] and references therein):

Theorem 11  Let M be a 2×2 dilation matrix and a, ã be two finitely supported bivariate masks. Let φ and φ̃ be the standard M-refinable functions associated with masks a and ã, respectively. Then φ, φ̃ ∈ L₂(R²) and satisfy the biorthogonality relation

  ⟨φ, φ̃(·−k)⟩ = δ_k,  k ∈ Z²,  (21)

if and only if ν₂(a; M) > 0, ν₂(ã; M) > 0, and (a, ã) is a pair of dual masks:

  Σ_{γ∈Γ_{M^T}} â(ξ + 2πγ) \overline{ẫ(ξ + 2πγ)} = 1,  ξ ∈ R²,  (22)

where Γ_{M^T} is a complete set of representatives of the distinct cosets of [(M^T)^{−1}Z²]/Z² with 0 ∈ Γ_{M^T}. Moreover, if (21) holds and there exist 2π-periodic trigonometric polynomials â¹, …, â^{m−1}, ẫ¹, …, ẫ^{m−1} with m := |det M| such that

  M_{[â; â¹,…,â^{m−1}]}(ξ) \overline{M_{[ẫ; ẫ¹,…,ẫ^{m−1}]}(ξ)}^T = I_m,  ξ ∈ R²,  (23)

where for ξ ∈ R²,

  M_{[â; â¹,…,â^{m−1}]}(ξ) :=
    [ â(ξ + 2πγ₀)        â¹(ξ + 2πγ₀)        …  â^{m−1}(ξ + 2πγ₀)      ]
    [ â(ξ + 2πγ₁)        â¹(ξ + 2πγ₁)        …  â^{m−1}(ξ + 2πγ₁)      ]
    [      ⋮                  ⋮              ⋱        ⋮                ]
    [ â(ξ + 2πγ_{m−1})   â¹(ξ + 2πγ_{m−1})   …  â^{m−1}(ξ + 2πγ_{m−1}) ]  (24)

with {γ₀, …, γ_{m−1}} := Γ_{M^T} and γ₀ := 0. Define ψ¹, …, ψ^{m−1}, ψ̃¹, …, ψ̃^{m−1} by

  ψ̂^ℓ(M^T ξ) := â^ℓ(ξ) φ̂(ξ)  and  ψ̂̃^ℓ(M^T ξ) := ẫ^ℓ(ξ) φ̂̃(ξ),  ℓ = 1, …, m − 1.  (25)

Then ({ψ¹, …, ψ^{m−1}}, {ψ̃¹, …, ψ̃^{m−1}}) generates a pair of biorthogonal M-wavelet bases in L₂(R²).


As a direct consequence of Theorem 11, for orthonormal wavelets we have:

Corollary 12  Let M be a 2×2 dilation matrix and a be a finitely supported bivariate mask. Let φ be the standard refinable function associated with mask a and dilation M. Then φ has orthonormal shifts, that is, ⟨φ, φ(·−k)⟩ = δ_k for all k ∈ Z², if and only if ν₂(a; M) > 0 and a is an orthogonal mask (that is, (22) is satisfied with ẫ replaced by â). If in addition there exist 2π-periodic trigonometric polynomials â¹, …, â^{m−1} with m := |det M| such that M_{[â; â¹,…,â^{m−1}]}(ξ) \overline{M_{[â; â¹,…,â^{m−1}]}(ξ)}^T = I_m, then {ψ¹, …, ψ^{m−1}}, defined in (25), generates an orthonormal M-wavelet basis in L₂(R²).

The masks a and ã are called low-pass filters, and the masks a¹, …, a^{m−1}, ã¹, …, ã^{m−1} are called high-pass filters in the engineering literature. The equation in (23) is called the matrix extension problem in the wavelet literature (see [6,16,46,58]). Given a pair of dual masks a and ã, though the existence of â¹, …, â^{m−1}, ẫ¹, …, ẫ^{m−1} (without symmetry) in the matrix extension problem in (23) is guaranteed by the Quillen-Suslin theorem, it is far from a trivial task to construct them algorithmically (in particular, with symmetry). For the orthogonal case, the general theory of the matrix extension problem in (23) with ã = a and ã^ℓ = a^ℓ, ℓ = 1, …, m−1, remains unanswered. However, when |det M| = 2, the matrix extension problem is trivial. In fact, letting γ ∈ Γ_{M^T}\{0} (that is, Γ_{M^T} = {0, γ}), one defines â¹(ξ) := e^{−iν·ξ} \overline{ẫ(ξ + 2πγ)} and ẫ¹(ξ) := e^{−iν·ξ} \overline{â(ξ + 2πγ)}, where ν ∈ Z² satisfies ν·γ = 1/2.

It is not an easy task to construct nonseparable orthogonal masks with desirable properties for a general dilation matrix. For example, for the dilation matrix Q_{√2} in (1), it is not yet known in the literature [9] whether there is a compactly supported C¹ Q_{√2}-refinable function with orthonormal shifts. See [1,2,9,25,27,30,32,50,51,53,57] for more details on the construction of orthogonal masks and orthonormal refinable functions. However, for any dilation matrix, "separable" orthogonal masks with arbitrarily high orders of sum rules can easily be obtained via the projection method. Let M be a 2×2 dilation matrix. Then M = E diag(d₁, d₂) F for some integer matrices E and F with |det E| = |det F| = 1 and d₁, d₂ ∈ N. Let a be a tensor product orthogonal mask with the diagonal dilation matrix diag(d₁, d₂) satisfying the sum rules of any preassigned order J. Then the projected mask Ea (that is, (Ea)_k := a_{E^{−1}k}, k ∈ Z²) is an orthogonal mask with dilation M and sr(Ea; M) ≥ J.
See [27, Corollary 3.4.] for more detail.
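As a quick numerical sanity check (illustrative, not from the article), the orthogonality of a mask, i.e. condition (22) with ã = a, can be verified pointwise; here for the simplest separable example, the tensor-product Haar mask with M = 2I₂:

```python
import numpy as np

def haar_hat(xi1, xi2):
    """Fourier series of the tensor-product Haar mask with M = 2*I2:
    coefficients a_k = 1/4 for k in {0, 1}^2, so
    a_hat(xi) = ((1 + exp(-i*xi1)) / 2) * ((1 + exp(-i*xi2)) / 2)."""
    return (1 + np.exp(-1j * xi1)) / 2 * (1 + np.exp(-1j * xi2)) / 2

# Coset representatives Gamma_{M^T} of [(M^T)^{-1}Z^2]/Z^2 for M = 2*I2
gammas = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]

rng = np.random.default_rng(0)
for _ in range(100):
    xi = rng.uniform(-np.pi, np.pi, size=2)
    total = sum(abs(haar_hat(xi[0] + 2 * np.pi * g1,
                             xi[1] + 2 * np.pi * g2))**2
                for g1, g2 in gammas)
    assert abs(total - 1.0) < 1e-12  # orthogonality condition (22) with a~ = a
```

The identity holds because |â(ξ)|² = cos²(ξ₁/2) cos²(ξ₂/2) and the four coset shifts produce the factorization (cos² + sin²)(cos² + sin²) = 1.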


Using the high-pass filters in the matrix extension problem, for a given mask a, dual masks ã of a (without any guaranteed order of sum rules of the dual masks) can be obtained by the lifting scheme in [63] and the stable completion method in [3]. Special families of dual masks with sum rules can be obtained by the convolution method in [24, Proposition 3.7] and [43], but the constructed dual masks generally have longer supports with respect to their orders of sum rules. Another method, which we shall cite here, for constructing all finitely supported dual masks with any preassigned orders of sum rules is the CBC (coset by coset) algorithm proposed in [5,25,26]. For μ = (μ₁, μ₂) and ν = (ν₁, ν₂), we say ν ≤ μ if ν₁ ≤ μ₁ and ν₂ ≤ μ₂. Also we denote |μ| := μ₁ + μ₂, μ! := μ₁!μ₂!, and x^μ := x₁^{μ₁} x₂^{μ₂} for x = (x₁, x₂). We denote by Ω_M a complete set of representatives of the distinct cosets of Z²/[MZ²] with 0 ∈ Ω_M. The following result is from [25].

Theorem 13 (CBC Algorithm)  Let M be a 2×2 dilation matrix and a be an interpolatory mask with dilation M. Let J be any preassigned positive integer.
1. Compute the quantities h^a_μ, |μ| < J, from the mask a by the recursive formula

  h^a_μ := δ_μ − Σ_{0≤ν<μ} Σ_{k∈Z²} (−1)^{|μ−ν|} (μ!/(ν!(μ−ν)!)) a_k k^{μ−ν} h^a_ν .

Theorem 16  Let M be a 2×2 dilation matrix and a be a finitely supported mask with dilation M. If ν₂(a; M) > 0 and ν₂(ã; M) > 0, where ẫ is the (1,1)-entry of the matrix [M_{[â; â¹,…,â^{m−1}]}(ξ)^T]^{−1}, then {ψ¹, …, ψ^{m−1}}, which are defined in (25), generates a Riesz M-wavelet basis in L₂(R²).

Example 17  Let M = 2I₂ and â(ξ₁, ξ₂) := cos²(ξ₁/2) cos²(ξ₂/2) cos²(ξ₁/2 + ξ₂/2) as in Example 4. Then sr(a; 2I₂) = 4 and a is D₆-symmetric with center 0. Define ([41,60])

  â¹(ξ₁, ξ₂) := e^{−i(ξ₁+ξ₂)} â(ξ₁ + π, ξ₂),
  â²(ξ₁, ξ₂) := e^{−iξ₂} â(ξ₁, ξ₂ + π),
  â³(ξ₁, ξ₂) := e^{−iξ₁} â(ξ₁ + π, ξ₂ + π).

Then all the conditions in Theorem 16 are satisfied and {ψ¹, ψ², ψ³} generates a Riesz 2I₂-wavelet basis in L₂(R²) ([41]). This Riesz wavelet derived from the Loop scheme has been used in [49] for mesh compression in computer graphics with impressive performance.

Pairs of Dual Wavelet Frames

In this section, we mention a method for constructing bivariate dual wavelet frames. The following Oblique Extension Principle (OEP) has been proposed in [18] (and independently in [7]; see also [17]) for constructing pairs of dual wavelet frames.

Theorem 18  Let M be a 2×2 dilation matrix and a, ã be two finitely supported masks. Let φ, φ̃ be the standard M-refinable functions with masks a and ã, respectively, such that φ, φ̃ ∈ L₂(R²). If there exist 2π-periodic trigonometric
ab2 (1 ; 2 ) : D ei2 aˆ(1 ; 2 C ); ˆ 1 C ; 2 C ) : ab3 (1 ; 2 ) : D e i1 a( Then all the conditions in Theorem 16 are satisfied and f 1 ; 2 ; 3 g generates a Riesz 2I 2 -wavelet basis in L2 (R2 ) ([41]). This Riesz wavelet derived from the Loop scheme has been used in [49] for mesh compression in computer graphics with impressive performance. Pairs of Dual Wavelet Frames In this section, we mention a method for constructing bivariate dual wavelet frames. The following Oblique Extension Principle (OEP) has been proposed in [18] (and independently in [7]. Also see [17]) for constructing pairs of dual wavelet frames. Theorem 18 Let M be a 2  2 dilation matrix and a; a˜ be two finitely supported masks. Let ; ˜ be the standard ˜ respectively, such M-refinable functions with masks a and a, that ; ˜ 2 L2 (R2 ). If there exist 2-periodic trigonometric


˜1 ; : : : ; a˜bL such that (0) D 1, polynomials ; ab1 ; : : : ; abL ; ab b b b 1 L 1 a (0) D    D a (0) D a˜ (0) D    D a˜bL (0) D 0, and T

M

ˆb [(M T ) a; a 1 ;:::; abL ]

()M ˆ b1 bL () [ a˜ ; a˜ ;:::; a˜ ]

D diag(( C 0 ); : : : ; ( C  m1 )) ;

(26)

where f0 ; : : : ; m1 g D  M T with 0 D 0 in Theorem 11. Then (f 1 ; : : : ; L g; f ˜ 1 ; : : : ; ˜ L g), defined in (25), generates a pair of dual M-wavelet frames in L2 (R2 ). For dimension one, many interesting tight wavelet frames and pairs of dual wavelet frames in L2 (R) have been constructed via the OEP method in the literature, for more details, see [7,17,18,24,30,52,61,62] and references therein. The application of the OEP in high dimensions is much more difficult, mainly due to the matrix extension problem in (26). The projection method can also be used to obtain pairs of dual wavelet frames ([34,35]). Theorem 19 Let M be an r  r dilation matrix and N be an s  s dilation matrix with r 6 s. Let P be an r  s integer matrix of rank r such that MP D PN and PT (Zr n[M T Zr ]) Zs n[N T Zs ]. Let 1 ; : : : ; L ; ˜ 1 ; : : : ; ˜ L be compactly supported functions in L2 (Rs ) such that

  ν₂(ψ^ℓ) > 0  and  ν₂(ψ̃^ℓ) > 0,  ∀ ℓ = 1, …, L.

If ({ψ¹, …, ψ^L}, {ψ̃¹, …, ψ̃^L}) generates a pair of dual N-wavelet frames in L₂(R^s), then ({Pψ¹, …, Pψ^L}, {Pψ̃¹, …, Pψ̃^L}) generates a pair of dual M-wavelet frames in L₂(R^r).

Example 20  Let Ξ be an r×s (direction) integer matrix such that Ξ^T(Z^r\[2Z^r]) ⊆ Z^s\[2Z^s]. Let M = 2I_r and N = 2I_s. Let {ψ¹, …, ψ^{2^s−1}} be the generators of the tensor product Haar orthonormal wavelet in dimension s, derived from the Haar orthonormal refinable function φ := χ_{[0,1]^s}. Then by Theorem 19, {Ξψ¹, …, Ξψ^{2^s−1}} generates a tight 2I_r-wavelet frame in L₂(R^r), and all the projected wavelet functions are derived from the refinable box spline function Ξφ = M_Ξ.

Future Directions

There are still many challenging problems on bivariate wavelets. In the following, we only mention a few here.
1. For any 2×2 dilation matrix M, e.g., M = Q_{√2} in (1), can one always construct a family of MRA compactly supported orthonormal M-wavelet bases with arbitrarily high smoothness (and with symmetry if |det M| > 2)?

2. The matrix extension problem in (23) for orthogonal masks. That is, for a given orthogonal mask a with dilation M, find finitely supported high-pass filters a¹, …, a^{m−1} (with symmetry if possible) such that (23) holds with ã = a and ã^ℓ = a^ℓ.
3. The matrix extension problem in (23) for a given pair of dual masks with symmetry. That is, for a given pair of symmetric dual masks a and ã with dilation M, find finitely supported symmetric high-pass filters a¹, …, a^{m−1}, ã¹, …, ã^{m−1} such that (23) is satisfied.
4. Directional bivariate wavelets. In order to handle edges of different orientations in images, directional wavelets are of interest in applications. See [13,55] and many references on this topic.

Bibliography

Primary Literature
1. Ayache A (2001) Some methods for constructing nonseparable, orthonormal, compactly supported wavelet bases. Appl Comput Harmon Anal 10:99–111
2. Belogay E, Wang Y (1999) Arbitrarily smooth orthogonal nonseparable wavelets in R2. SIAM J Math Anal 30:678–697
3. Carnicer JM, Dahmen W, Peña JM (1996) Local decomposition of refinable spaces and wavelets. Appl Comput Harmon Anal 3:127–153
4. Cavaretta AS, Dahmen W, Micchelli CA (1991) Stationary subdivision. Mem Amer Math Soc 93(453):1–186
5. Chen DR, Han B, Riemenschneider SD (2000) Construction of multivariate biorthogonal wavelets with arbitrary vanishing moments. Adv Comput Math 13:131–165
6. Chui CK (1992) An introduction to wavelets. Academic Press, Boston
7. Chui CK, He W, Stöckler J (2002) Compactly supported tight and sibling frames with maximum vanishing moments. Appl Comput Harmon Anal 13:224–262
8. Cohen A (2003) Numerical analysis of wavelet methods. North-Holland, Amsterdam
9. Cohen A, Daubechies I (1993) Nonseparable bidimensional wavelet bases. Rev Mat Iberoamericana 9:51–137
10. Cohen A, Daubechies I (1996) A new technique to estimate the regularity of refinable functions. Rev Mat Iberoamericana 12:527–591
11.
Cohen A, Daubechies I, Feauveau JC (1992) Biorthogonal bases of compactly supported wavelets. Comm Pure Appl Math 45:485–560
12. Cohen A, Gröchenig K, Villemoes LF (1999) Regularity of multivariate refinable functions. Constr Approx 15:241–255
13. Cohen A, Schlenker JM (1993) Compactly supported bidimensional wavelet bases with hexagonal symmetry. Constr Approx 9:209–236
14. Dahmen W (1997) Wavelet and multiscale methods for operator equations. Acta Numer 6:55–228
15. Daubechies I (1988) Orthonormal bases of compactly supported wavelets. Comm Pure Appl Math 41:909–996
16. Daubechies I (1992) Ten lectures on wavelets. CBMS-NSF Series. SIAM, Philadelphia


17. Daubechies I, Han B (2004) Pairs of dual wavelet frames from any two refinable functions. Constr Approx 20:325–352
18. Daubechies I, Han B, Ron A, Shen Z (2003) Framelets: MRA-based constructions of wavelet frames. Appl Comput Harmon Anal 14:1–46
19. Dahlke S, Gröchenig K, Maass P (1999) A new approach to interpolating scaling functions. Appl Anal 72:485–500
20. de Boor C, Höllig K, Riemenschneider SD (1993) Box splines. Springer, New York
21. DeRose T, Forsey DR, Kobbelt L, Lounsbery M, Peters J, Schröder P, Zorin D (1998) Subdivision for modeling and animation (course notes)
22. Dyn N, Levin D (2002) Subdivision schemes in geometric modeling. Acta Numer 11:73–144
23. Dyn N, Gregory JA, Levin D (1990) A butterfly subdivision scheme for surface interpolation with tension control. ACM Trans Graph 9:160–169
24. Han B (1997) On dual wavelet tight frames. Appl Comput Harmon Anal 4:380–413
25. Han B (2000) Analysis and construction of optimal multivariate biorthogonal wavelets with compact support. SIAM J Math Anal 31:274–304
26. Han B (2001) Approximation properties and construction of Hermite interpolants and biorthogonal multiwavelets. J Approx Theory 110:18–53
27. Han B (2002) Symmetry property and construction of wavelets with a general dilation matrix. Linear Algebra Appl 353:207–225
28. Han B (2002) Projectable multivariate refinable functions and biorthogonal wavelets. Appl Comput Harmon Anal 13:89–102
29. Han B (2003) Computing the smoothness exponent of a symmetric multivariate refinable function. SIAM J Matrix Anal Appl 24:693–714
30. Han B (2003) Compactly supported tight wavelet frames and orthonormal wavelets of exponential decay with a general dilation matrix. J Comput Appl Math 155:43–67
31. Han B (2003) Vector cascade algorithms and refinable function vectors in Sobolev spaces. J Approx Theory 124:44–88
32. Han B (2004) Symmetric multivariate orthogonal refinable functions. Appl Comput Harmon Anal 17:277–292
33.
Han B (2006) On a conjecture about MRA Riesz wavelet bases. Proc Amer Math Soc 134:1973–1983
34. Han B (2006) The projection method in wavelet analysis. In: Chen G, Lai MJ (eds) Modern methods in mathematics. Nashboro Press, Brentwood, pp 202–225
35. Han B (2008) Construction of wavelets and framelets by the projection method. Int J Appl Math Appl 1:1–40
36. Han B, Jia RQ (1998) Multivariate refinement equations and convergence of subdivision schemes. SIAM J Math Anal 29:1177–1199
37. Han B, Jia RQ (1999) Optimal interpolatory subdivision schemes in multidimensional spaces. SIAM J Numer Anal 36:105–124
38. Han B, Jia RQ (2002) Quincunx fundamental refinable functions and quincunx biorthogonal wavelets. Math Comp 71:165–196
39. Han B, Jia RQ (2006) Optimal C2 two-dimensional interpolatory ternary subdivision schemes with two-ring stencils. Math Comp 75:1287–1308
40. Han B, Jia RQ (2007) Characterization of Riesz bases of wavelets generated from multiresolution analysis. Appl Comput Harmon Anal 23:321–345

41. Han B, Shen Z (2005) Wavelets from the Loop scheme. J Fourier Anal Appl 11:615–637
42. Han B, Shen Z (2006) Wavelets with short support. SIAM J Math Anal 38:530–556
43. Ji H, Riemenschneider SD, Shen Z (1999) Multivariate compactly supported fundamental refinable functions, duals, and biorthogonal wavelets. Stud Appl Math 102:173–204
44. Jia RQ (1998) Approximation properties of multivariate wavelets. Comp Math 67:647–665
45. Jia RQ (1999) Characterization of smoothness of multivariate refinable functions in Sobolev spaces. Trans Amer Math Soc 351:4089–4112
46. Jia RQ, Micchelli CA (1991) Using the refinement equation for the construction of pre-wavelets II: Power of two. In: Laurent PJ, Le Méhauté A, Schumaker LL (eds) Curves and surfaces. Academic Press, New York, pp 209–246
47. Jia RQ, Wang JZ, Zhou DX (2003) Compactly supported wavelet bases for Sobolev spaces. Appl Comput Harmon Anal 15:224–241
48. Jiang QT (1998) On the regularity of matrix refinable functions. SIAM J Math Anal 29:1157–1176
49. Khodakovsky A, Schröder P, Sweldens W (2000) Progressive geometry compression. Proc SIGGRAPH
50. Kovačević J, Vetterli M (1992) Nonseparable multidimensional perfect reconstruction filter banks and wavelet bases for Rn. IEEE Trans Inform Theory 38:533–555
51. Lai MJ (2006) Construction of multivariate compactly supported orthonormal wavelets. Adv Comput Math 25:41–56
52. Lai MJ, Stöckler J (2006) Construction of multivariate compactly supported tight wavelet frames. Appl Comput Harmon Anal 21:324–348
53. Lawton W, Lee SL, Shen Z (1997) Stability and orthonormality of multivariate refinable functions. SIAM J Math Anal 28:999–1014
54. Lawton W, Lee SL, Shen Z (1998) Convergence of multidimensional cascade algorithm. Numer Math 78:427–438
55. Le Pennec E, Mallat S (2005) Sparse geometric image representations with bandelets. IEEE Trans Image Process 14:423–438
56. Lorentz R, Oswald P (2000) Criteria for hierarchical bases in Sobolev spaces.
Appl Comput Harmon Anal 8:32–85
57. Maass P (1996) Families of orthogonal two-dimensional wavelets. SIAM J Math Anal 27:1454–1481
58. Mallat S (1998) A wavelet tour of signal processing. Academic Press, San Diego
59. Riemenschneider SD, Shen Z (1997) Multidimensional interpolatory subdivision schemes. SIAM J Numer Anal 34:2357–2381
60. Riemenschneider SD, Shen ZW (1992) Wavelets and pre-wavelets in low dimensions. J Approx Theory 71:18–38
61. Ron A, Shen Z (1997) Affine systems in L2(Rd): the analysis of the analysis operator. J Funct Anal 148:408–447
62. Ron A, Shen Z (1997) Affine systems in L2(Rd) II: dual systems. J Fourier Anal Appl 3:617–637
63. Sweldens W (1996) The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl Comput Harmon Anal 3:186–200

Books and Reviews
Cabrelli C, Heil C, Molter U (2004) Self-similarity and multiwavelets in higher dimensions. Mem Amer Math Soc 170(807):1–82

Branching Processes

MIKKO J. ALAVA¹, KENT BÆKGAARD LAURITSEN²
¹ Department of Engineering Physics, Espoo University of Technology, Espoo, Finland
² Research Department, Danish Meteorological Institute, Copenhagen, Denmark

Article Outline

Glossary
Definition of the Subject
Introduction
Branching Processes
Self-Organized Branching Processes
Scaling and Dissipation
Network Science and Branching Processes
Conclusions
Future Directions
Acknowledgments
Bibliography

Glossary

Markov process  A process characterized by a set of probabilities to go from a certain state at time t to another state at time t+1. These transition probabilities are independent of the history of the process and depend only on a fixed probability assigned to the transition.

Critical properties and scaling  The behavior of equilibrium and many non-equilibrium systems in steady states contains critical points, where the systems display scale invariance and the correlation functions exhibit an algebraic behavior characterized by so-called critical exponents. A characteristic of this type of behavior is the lack of finite length and time scales (also reminiscent of fractals). The behavior near the critical points can be described by scaling functions that are universal and that do not depend on the detailed microscopic dynamics.

Avalanches  When a system is perturbed in such a way that a disturbance propagates throughout the system, one speaks of an avalanche. The local avalanche dynamics may either conserve energy (particles) or dissipate energy. The avalanche may also lose energy when it reaches the system boundary. In the neighborhood of a critical point the avalanche distribution is described by a power-law distribution.

Self-organized criticality (SOC)  SOC is the surprising "critical" state in which many systems, from physics to biology to social ones, find themselves. In physics jargon, they exhibit scale-invariance, which means that the dynamics – consisting of avalanches – has no typical scale in time or space. The really necessary ingredient is that there is a hidden, fine-tuned balance between how such systems are driven to create the dynamic response, and how they dissipate the input ("energy") to still remain in balance.

Networks  These are descriptions of interacting systems, where in graph-theoretical language nodes or vertices are connected by links or edges. The interesting thing can be the structure of the network and its dynamics, or that of a process on top of it, like the spreading of computer viruses on the Internet.

Definition of the Subject

Consider the fate of a human population on a small, isolated island. It consists of a certain number of individuals, and the most obvious question, of importance in particular for the inhabitants of the island, is whether this number will go to zero. Humans die and reproduce in steps of one, and therefore one can try to analyze this fate mathematically by writing down what are called master equations, to describe the dynamics as a "branching process" (BP). The branching here means that if at time t = 0 there are N humans, at the next step t = 1 there can be N − 1 (or N + 1, or N + 2 if the only change from t = 0 was that a pair of twins was born). The outcome will depend in the simplest case on a "branching number", the number of offspring λ that a human being will have [1,2,3,4]. If the offspring created are too few, then the population will decay, or reach an "absorbing state" out of which it will never escape. Likewise, if they are many (the Malthusian case in reality, perhaps), exponential growth in time will ensue in the simplest case. In between, there is an interesting twist: a phase transition that separates these two outcomes at a critical value λ_c. As is typical of such transitions in statistical physics, one runs into scale-free behavior.
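The simplest mathematical caricature of such a population is the Galton-Watson branching process. The sketch below (illustrative, not from the article; the Poisson offspring law is our choice) shows the subcritical regime λ < 1, where extinction is certain:

```python
import math
import random

def poisson(lam, rng):
    """Sample Poisson(lam) by Knuth's method (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def branching_population(lam, n0=10, steps=60, seed=1):
    """Galton-Watson process: every individual independently leaves a
    Poisson(lam) number of offspring in the next generation."""
    rng = random.Random(seed)
    n, history = n0, [n0]
    for _ in range(steps):
        n = sum(poisson(lam, rng) for _ in range(n))
        history.append(n)
        if n == 0:  # the absorbing state: extinction
            break
    return history

# lam < 1: the population decays into the absorbing state
assert branching_population(0.5)[-1] == 0
```

For λ > 1 the same routine typically produces exponential growth, while at λ = λ_c = 1 the lifetime and total progeny of a run are broadly distributed, which is the scale-free behavior discussed next.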
The lifetime of the population suddenly has no typical scale, and its total size will be a stochastic quantity, described by a probability distribution that again has no typical scale exactly at λ_c. The example of a small island also illustrates the many different twists that one can find in branching processes. The population can be "spatially dispersed" such that the individuals are separated by distance. There are in fact two interacting populations, called "male" and "female", and if the size of one of the populations becomes zero the other one will die out soon as well. The people on the island eat, and there is thus a hidden variable in the dynamics, the availability of food. This causes a history effect which makes the dynamics of human population what is called "non-Markovian". Imagine, as above, that we look at the number of persons on the island at discrete times. A Markovian process is such that the probability to go from a state (say of) N to state N + δN depends only on the fixed probability assigned to the "transition" N → N + δN. Clearly, any relatively faithful description of the death and birth rates of human beings has to consider the average state of nourishment, or whether there is enough food for reproduction.

Introduction

Branching processes are often perfect models of complex systems, or in other words exhibit deep complexity themselves. Consider the following example of a one-dimensional model of activated random walkers [5]. Take a line of sites x_i, i = 1 … L. Fill the sites randomly to a certain density n = N/L, where N is the pre-set number of individuals performing the activated random walk. Now let us apply the simple rule that if there are two or more walkers at the same x_j, two of them get "activated" and hop to j − 1 or j + 1, again at random. In other words, this version of the drunken bar-hoppers problem has the twist that they do not like each other. If the system is "periodic", i.e. i = 1 is connected to i = L, then the dynamics is controlled by the density n. For a critical value n_c (estimated by numerical simulations to be about 0.9488… [6]) a phase transition takes place, such that for n < n_c the asymptotic state is the "absorbing one", where all the walkers are immobilized since N_i = 1 or 0. In the opposite case, for n > n_c, there is an active phase such that (in the infinite-L limit) the activity persists forever. This particular model is unquestionably non-Markovian if one considers only the number of active walkers or their density ρ. One needs to know the full state of the N_i to be able to write down exact probabilities for how ρ changes in a discrete step of time.
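The activated-walker rules, together with the open-boundary and slow-driving modifications described next, can be sketched in a few lines (an illustrative implementation, not from the article; the lattice size and number of driving steps are arbitrary):

```python
import random

def manna_soc(L=32, drives=1000, seed=2):
    """Activated random walkers (Manna model) with open boundaries:
    when the system is quiescent, add one walker at a random site; any
    site holding two or more walkers sends two of them to random
    nearest neighbors; walkers stepping off the lattice are lost.
    Returns the avalanche sizes (topplings per added walker)."""
    rng = random.Random(seed)
    n = [0] * L
    sizes = []
    for _ in range(drives):
        n[rng.randrange(L)] += 1        # slow drive: one walker at a time
        s = 0
        active = [i for i in range(L) if n[i] >= 2]
        while active:
            i = active.pop()
            if n[i] < 2:                # stale entry, site already relaxed
                continue
            n[i] -= 2                   # two walkers get activated...
            s += 1
            for _ in range(2):
                j = i + rng.choice((-1, 1))
                if 0 <= j < L:          # ...and hop; off-lattice hops
                    n[j] += 1           # dissipate the walker
                    if n[j] >= 2:
                        active.append(j)
            if n[i] >= 2:
                active.append(i)
        sizes.append(s)
    return sizes

sizes = manna_soc()
```

A histogram of the returned sizes approximates the avalanche-size distribution P(s) that the following discussion is about; after a transient, the walker density self-organizes close to n_c.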
The most interesting things happen if one changes the one-dimensional lattice by adopting two new rules. First, if a walker walks out (to i = 0 or i = L + 1), it disappears. Second, if there are no active walkers (ρ = 0), one adds one new walker at a random site. Now the activity ρ(t) always stays at a marginal value, and the long-term average of n becomes a (possibly L-dependent) constant: the system becomes critical. With these rules, the model of activated random walkers is also known as the Manna model, after the Indian physicist S. S. Manna [5], and it exhibits the phenomenon dubbed "Self-Organized Criticality" (SOC) [7]. Figure 1 shows an example of the dynamics by using what is called an "activity plot",
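As a concrete illustration, the driven, open-boundary version of the model can be sketched in a few lines. This is a minimal sketch, not code from the article; the function name and all parameter values are my own choices:

```python
import random

def manna_soc(L=32, steps=5000, rng=random.Random(1)):
    """Driven 1D Manna model with open boundaries.

    A site with >= 2 walkers is active and topples, sending two walkers
    to randomly chosen neighbors.  Walkers stepping to i < 0 or i >= L
    are lost (dissipation at the boundary); between avalanches one
    walker is added at a random site (slow drive).  Returns the list of
    avalanche sizes (topplings per added walker) and the final density.
    """
    occ = [0] * L
    sizes = []
    for _ in range(steps):
        occ[rng.randrange(L)] += 1                 # drive: add one walker
        size = 0
        active = [i for i in range(L) if occ[i] >= 2]
        while active:
            i = active.pop()
            if occ[i] < 2:                          # may have been emptied already
                continue
            occ[i] -= 2                             # topple: two walkers hop
            size += 1
            for j in (i + rng.choice((-1, 1)), i + rng.choice((-1, 1))):
                if 0 <= j < L:                      # else the walker is lost
                    occ[j] += 1
                    if occ[j] >= 2:
                        active.append(j)
        sizes.append(size)
    return sizes, sum(occ) / L

sizes, density = manna_soc()
```

After a transient the density hovers near (slightly below) one walker per site, maintained by the balance of slow driving and boundary losses, while the avalanche sizes fluctuate over a wide range.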

Branching Processes, Figure 1 We follow the "activity" in a one-dimensional system of activated random walkers. The walkers stroll around the x-axis, and the pattern is in fact scale-invariant. The system is such that some of the walkers disappear (by escaping through the open boundaries), and to maintain a constant density new ones are added. One question one may ask is what the waiting time is before another (or the same) walker gets activated at the same location after a period of inactivity. As one can see from the figure, this can be the result of old activity returning, or of a new "avalanche" started by the addition of an outsider (courtesy of Lasse Laurson)

where those locations x_i are marked, both in space and time, which happen to contain just-activated walkers. One can now apply several kinds of measures to the system, but the figure already hints at the reasons why these simple models have attracted much interest. The structure of the activity is a self-affine fractal (for discussions about fractals see other reviews in this volume). The main enthusiasm about SOC comes from the avalanches, that is, the bursts of activity that separate quiescent periods (when ρ = 0). The silence is broken by the addition of a particle or walker, and this creates an integrated quantity (volume) of activity, s = ∫₀ᵀ ρ(t) dt, where ρ > 0 for 0 < t < T and ρ = 0 at the endpoints 0 and T. The original boost to SOC took place after Per Bak, Chao Tang, and Kay Wiesenfeld published in 1987 a highly influential paper in the premium physics journal Physical Review Letters, introducing what is called the Bak–Tang–Wiesenfeld (BTW) sandpile model, of which the Manna model is a relative [7]. The BTW and Manna models, and many others, exhibit the important property that the avalanche sizes s have a scale-free probability distribution; note, however, that this simple criticality is not always true, not even for the BTW model, which shows so-called multiscaling [8,9]. Scale-free probability distributions are usually written as

  P(s) ∼ s^(-τ_s) f_s(s / L^(D_s)) ;   (1)

here all the subscripts refer to the fact that we look at avalanches. τ_s and D_s define the avalanche exponent and the cut-off exponent, respectively. f_s is a cut-off function that, together with D_s, incorporates the fact that the avalanches are restricted somehow by the system size (in the one-dimensional Manna model, if s becomes too large many walkers are lost; first n drops, and then the activity goes to zero). Similar statements can be made about the avalanche durations T, the area or support A, and so forth [10,11,12,13,14]. The discovery of simple-to-define models with very complex behavior has been a welcome gift, since there are many, many phenomena that exhibit apparently scale-free statistics and/or bursty or intermittent dynamics similar to SOC models. These come from the natural sciences, but not only: economics and sociology also give rise to cases that need explanations. One particular field where these ideas have found much interest is the physics of materials, ranging from understanding earthquakes to the behavior of vortices in superconductors. The Gutenberg–Richter law of earthquake magnitudes is a power law, and one can measure similar statistics in fracture experiments by recording acoustic emission event energies in tensile tests on ordinary paper [15]. The modern theory of networks is concerned with graph or network structures which exhibit scale-free characteristics; in particular, the number of neighbors a vertex has is often a stochastic quantity with a power-law-like probability distribution [16,17]. Here, branching processes are also finding applications. They can describe the development of network structures, e.g., as measured by the degree distribution (the number of neighbors a node, or vertex, is connected to), and, as an extension of the more usual models, give rise to interesting results when the dynamics of populations on networks are considered.
When comparing with real, empirical phenomena, models based on branching processes provide a paradigm on two levels. In the case of SOC this is given by a combination of "internal dynamics" and "an ensemble". The first means, e.g., that activated walkers or particles of type A are moved around with certain rules, and of course there is an enormous variety of possible models; in the Manna model this is obvious if one splits the walkers into categories A (active) and B (passive). Then there is the question of how the balance (assuming such a balance exists) is maintained. The SOC models do this by a combination of dissipation (e.g. particles dropping off the edge of the system) and drive (the addition of B's), where the rates are in fact chosen rather carefully. In the case of growing

networks, one can similarly ask what kind of growth laws allow for scale-free distributions of the node degree. For the theory of complex systems, branching processes thus give rise to two questions: what classes of models are there, and what kinds of truly different ensembles are there? The theory of reaction–diffusion systems has tried to answer the first question since the 1970s, and the developments reflect the idea of universality in statistical physics (see e.g. the article of Dietrich Stauffer in this volume). This means that the behavior of systems at "critical points", such as those defined by n_c above, follows from the dimension at hand and the "universality class" at hand. The model of activated walkers, it has recently been established, belongs to the "fixed-energy sandpile" class, and is closely related to other seemingly far-fetched systems such as the depinning/pinning of domain walls in magnets (see e.g. [18]). The second question can be stated in two ways. First, forgetting about the detailed model at hand, when can one expect complex behavior such as power-law distributions, avalanches, etc.? Second, and more technically, can one derive exponents such as τ_s and D_s from those of the same model at its "usual" phase transition? The theory of branching processes provides many answers to these questions, and in particular helps to illustrate the influence of boundary conditions and modes of driving on the expected behavior. Thus, one gets a clear idea of the kind of complexity one can expect to see in the many different kinds of systems where avalanches, intermittency and scaling are observed.

Branching Processes

The mathematical branching process is defined for a set of objects that do not interact. At each iteration, each object can give rise to new objects with some probability p (or, in general, according to a set of probabilities). By continuing this iterative process, the objects form what is referred to as a cluster or an avalanche.
We can now ask questions of the following type: Will the process continue forever? Will it die out and stop after a finite number of iterations? What will the average lifetime be? What is the average size of the clusters? And what is (the asymptotic form of) the probability that the process is still active after a certain number of iterations, etc.

BP Definition

We will consider a simple discrete BP called the Galton–Watson process. For other types of BPs (including processes in continuous time) we refer to the book by Harris [1]. We denote the number of objects at generation n by z_0, z_1, z_2, …, z_n, …. The index n is referred to


as time, and time t = 0 corresponds to the 0th generation, where we take z_0 = 1, i.e., the process starts from a single object. We will assume that the transition from generation n to n + 1 is given by a probability law that is independent of n, i.e., it is assumed that the process is Markovian. Finally, it is assumed that different objects do not interact with one another. The probability measure for the process is characterized as follows: p_k is the probability that an object in the nth generation will have k offspring in the (n+1)th generation. We assume that p_k is independent of n. The probabilities p_k thus read p_k = Prob(z_1 = k) and fulfill

  Σ_k p_k = 1 .   (2)

Branching Processes, Figure 3 Schematic drawing of the behavior of the probability of extinction q of the branching process. The quantity 1 − q is similar to the order parameter for systems in equilibrium at their critical points, and the quantity m_c is referred to as the critical point for the BP

Figure 2 shows an example of the tree structure of a branching process and the resulting avalanche. One can define the probability generating function f(s) associated with the transition probabilities:

  f(s) = Σ_k p_k s^k .   (3)

The first and second moments of the number z_1 are denoted

  m = E z_1 ,   σ² = Var z_1 .   (4)

By taking derivatives of the generating function at s = 1, it follows that

  E z_n = m^n ,   Var z_n = n σ² .   (5)

For the BP defined in Fig. 2, it follows that m = 2p and σ² = 4p(1 − p). An important quantity is the probability of extinction q. It is obtained as follows:

  q = P(z_n → 0) = P(z_n = 0 for some n) = lim_{n→∞} P(z_n = 0) .   (6)

Branching Processes, Figure 2 Schematic drawing of an avalanche in a system with a maximum of n = 3 avalanche generations, corresponding to N = 2^(n+1) − 1 = 15 sites. Each black site relaxes with probability p to two new black sites and with probability 1 − p to two white sites (i.e., p_0 = 1 − p, p_1 = 0, p_2 = p). The black sites are part of an avalanche of size s = 7, whereas the active sites at the boundary yield a boundary avalanche of size σ_3(p,t) = 2

It can be shown that q = 1 for m ≤ 1, and that for m > 1 there exists a solution fulfilling q = f(q), where 0 < q < 1 [1]. It is possible to show that lim_{n→∞} P(z_n = k) = 0 for k = 1, 2, 3, …, and that z_n → 0 with probability q and z_n → ∞ with probability 1 − q. Thus, the sequence {z_n} does not remain positive and bounded [1]. The quantity 1 − q is similar to an order parameter for systems in equilibrium, and its behavior is shown schematically in Fig. 3. The behavior around the value m_c = 1, the so-called critical value for the BP (see below), can, in analogy with second-order phase transitions in equilibrium systems, be described by a critical exponent β defined as follows (β = 1, cf. [1]):

  Prob(survival) ∝ { 0 ,             m ≤ m_c ;
                     (m − m_c)^β ,   m > m_c .   (7)
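These definitions are easy to exercise numerically. The following minimal sketch (function names and parameter values are my own choices, not from the article) treats the binary BP of Fig. 2 with p_0 = 1 − p and p_2 = p, for which q = f(q) = (1 − p) + p q² has the smallest root q = (1 − p)/p when p > 1/2; it compares that fixed point with the extinction frequency seen in direct simulation:

```python
import random

def extinction_prob(p, iters=300):
    """Smallest root of q = f(q) for the binary BP (p0 = 1 - p, p2 = p),
    found by fixed-point iteration starting from q = 0."""
    q = 0.0
    for _ in range(iters):
        q = (1 - p) + p * q * q
    return q

def dies_out(p, rng, max_gen=25, cap=10_000):
    """One Galton-Watson realization: True if the line goes extinct."""
    z = 1
    for _ in range(max_gen):
        # each of the z objects has 2 offspring with prob. p, else 0
        z = sum(2 for _ in range(z) if rng.random() < p)
        if z == 0:
            return True
        if z > cap:              # effectively exploded: supercritical survival
            return False
    return False

p = 0.6                          # mean offspring m = 2p = 1.2 > 1
q = extinction_prob(p)           # exact answer for this BP is (1 - p)/p = 2/3
rng = random.Random(42)
trials = 4000
frac = sum(dies_out(p, rng) for _ in range(trials)) / trials
```

With m > 1 a fraction q of the lines still dies out; only the remaining 1 − q survive indefinitely, in line with Eq. (7).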

Avalanches and Critical Point

We will next consider the clusters, or avalanches, in more detail and obtain the asymptotic forms of the probability distributions. The size of a cluster is given by the sum


s = z_0 + z_1 + z_2 + ⋯ . One can also consider other types of clusters, e.g., the activity σ of the boundary (of a finite tree) and define the boundary avalanche (cf. Fig. 2). For concreteness, we consider the BP defined in Fig. 2. The quantities P_n(s;p) and Q_n(σ;p) denote the probabilities of having an avalanche of size s and boundary size σ in a system with n generations. The corresponding generating functions are defined by [1]

  f_n(x;p) ≡ Σ_s P_n(s;p) x^s ,   (8)
  g_n(x;p) ≡ Σ_σ Q_n(σ;p) x^σ .   (9)

Due to the hierarchical structure of the branching process, it is possible to write down recursion relations for P_n(s;p) and Q_n(σ;p):

  f_{n+1}(x;p) = x [ (1 − p) + p f_n²(x;p) ] ,   (10)
  g_{n+1}(x;p) = (1 − p) + p g_n²(x;p) ,   (11)

where f_0(x;p) = g_0(x;p) = x. The avalanche distribution D(s) is determined by P_n(s;p) by using the recursion relation (10). The stationary solution of Eq. (10) in the limit n ≫ 1 is given by

  f(x;p) = [ 1 − √(1 − 4x²p(1 − p)) ] / (2xp) .   (12)

By expanding Eq. (12) as a series in x, comparing with the definition (8), and using Stirling's formula for the high-order terms, we obtain for sizes 1 ≪ s ≲ n the following behavior:

  P_n(s;p) = √(2(1 − p)/(πp)) s^(-3/2) exp[−s/s_c(p)] .   (13)

The cutoff s_c(p) is given by s_c(p) = −2/ln[4p(1 − p)]. As p → 1/2, s_c(p) → ∞, thus showing explicitly that the critical value for the branching process is p_c = 1/2 (i.e., m_c = 1), and that the mean-field avalanche exponent, cf. Eq. (1), for the critical branching process is τ = 3/2. The expression (13) is only valid for avalanches which are not affected by the finite size of the system. For avalanches with n ≲ s ≲ N, it is possible to solve the recursion relation (10) and then obtain P_n(s;p) for p ≃ p_c by the use of a Tauberian theorem [19,20,21]. By carrying out such an analysis one obtains, after some algebra, P_n(s;p) ≈ A(p) exp[−s/s_0(p)], with functions A(p) and s_0(p) which cannot be determined analytically. Nevertheless, we see that for any p the probabilities P_n(s;p) will decay exponentially. One can also calculate the asymptotic form of Q_n(σ;p) for 1 ≪ σ ≲ n and p ≃ p_c by the use of

a Tauberian theorem [19,20,21]. We will return to this in the next section, where we will also define and investigate the distribution of the time to extinction.

Self-Organized Branching Processes

We now return to the link between self-organized criticality, as mentioned in the Introduction, and branching processes. The simplest theoretical approach to SOC is mean-field theory [22], which allows for a qualitative description of the behavior of the SOC state. Mean-field exponents for SOC models have been obtained by various approaches [22,23,24,25], and it turns out that their values (e.g., τ = 3/2) are the same for all the models considered thus far. This fact can easily be understood, since the spreading of an avalanche in mean-field theory can be described by a front of non-interacting particles that can either trigger subsequent activity or die out. This kind of process is reminiscent of a branching process. The connection between branching processes and SOC has been investigated, and it has been argued that the mean-field behavior of sandpile models can be described by a critical branching process [26,27,28]. For a branching process to be critical, one must fine-tune a control parameter to its critical value. This, by definition, cannot be the case in a SOC system, where the critical state is approached dynamically without the need to fine-tune any parameter. In the so-called self-organized branching process (SOBP), the coupling of the local dynamical rules to a global condition drives the system into a state that is indeed described by a critical branching process [29]. It turns out that the mean-field theory of SOC models can be exactly mapped onto the SOBP model. In the mean-field description of the sandpile model (d → ∞) one neglects correlations, which implies that avalanches do not form loops and hence spread as a branching process.
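The mean-field avalanche statistics can be sampled directly from a binary branching process at p = p_c = 1/2. A short sketch (the naming is my own): since P(s) ∼ s^(-3/2), the survival function should decay as P(S ≥ s) ∼ s^(-1/2), so quadrupling s should roughly halve the tail fraction:

```python
import random

def avalanche_size(p, rng, cap=10_000):
    """Total number of relaxed sites in one avalanche of the binary
    branching process (each active site spawns 2 with prob. p, else 0)."""
    s, active = 0, 1
    while active and s < cap:
        s += active
        active = sum(2 for _ in range(active) if rng.random() < p)
    return s

rng = random.Random(1)
sizes = [avalanche_size(0.5, rng) for _ in range(20_000)]

# P(s) ~ s^(-3/2) implies P(S >= s) ~ s^(-1/2): growing s by a factor
# of 4 should shrink the tail fraction by a factor of about 2.
tail_10 = sum(s >= 10 for s in sizes) / len(sizes)
tail_40 = sum(s >= 40 for s in sizes) / len(sizes)
ratio = tail_10 / tail_40
```

The cap only truncates the rare huge avalanches (the critical process has no typical scale), and does not affect the tail comparison at moderate s.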
In the SOBP model, an avalanche starts with a single active site, which then relaxes with probability p, leading to two new active sites. With probability 1 − p the initial site does not relax and the avalanche stops. If the avalanche does not stop, one repeats the procedure for the new active sites until no active site remains. The parameter p is the probability that a site relaxes when it is triggered by an external input. For the SOBP branching process, there is a critical value, p_c = 1/2, such that for p > p_c the probability to have an infinite avalanche is non-zero, while for p < p_c all avalanches are finite. Thus, p = p_c corresponds to the critical case, where avalanches are power-law distributed. In this description, however, the boundary conditions are not taken into account, even though they are crucial for the self-organization process. The boundary conditions can be introduced in a natural way by allowing for no more than n generations for each avalanche. Schematically, we can view the evolution of a single avalanche of size s as taking place on a tree of size N = 2^(n+1) − 1 (see Fig. 2). If the avalanche reaches the boundary of the tree, one counts the number of active sites σ_n (which in the sandpile language corresponds to the energy leaving the system), and we expect that p decreases for the next avalanche. If, on the other hand, the avalanche stops before reaching the boundary, then p will slightly increase. The number of generations n can be thought of as a measure of the linear size of the system. The above avalanche scenario is described by the following dynamical equation for p(t):

  p(t+1) = p(t) + [1 − σ_n(p,t)] / N ,   (14)

where σ_n, the size of an avalanche reaching the boundary, fluctuates in time and hence acts as a stochastic driving force. If σ_n = 0, then p increases (because some energy has been put into the system without any output), whereas if σ_n > 0 then p decreases (due to energy leaving the system). Equation (14) describes the global dynamics of the SOBP, as opposed to the local dynamics, which is given by the branching process. One can study the model for a fixed value of n and then take the limit n → ∞. In this way, we perform the long-time limit before the "thermodynamic" limit, which corresponds exactly to what happens in sandpile models. We will now show that the SOBP model provides a mean-field theory of self-organized critical systems. Consider for simplicity the sandpile model of activated random walkers from the Introduction [5]: When a particle is added to a site z_i, the site will relax if z_i = 1. In the limit d → ∞, the avalanche will never visit the same site more than once. Accordingly, each site in the avalanche will relax with the same probability p = P(z = 1). Eventually, the avalanche will stop, and σ ≥ 0 particles will leave the system. Thus, the total number of particles M(t) evolves according to

  M(t+1) = M(t) + 1 − σ .   (15)

The dynamical Eq. (14) for the SOBP model is recovered by noting that M(t) = N P(z = 1) = N p. By taking the continuum-time limit of Eq. (14), it is possible to obtain the following expression:

  dp/dt = [1 − (2p)^n] / N + η(p,t) / N ,   (16)

where η ≡ ⟨σ_n⟩ − σ_n = (2p)^n − σ_n(p,t) describes the fluctuations in the steady state. Thus, η is obtained by measuring σ_n for each realization of the process. Without the last term, Eq. (16) has a fixed point (dp/dt = 0) at p = p_c = 1/2. On linearizing Eq. (16), one sees that the fixed point is attractive, which demonstrates the self-organization of the SOBP model, since the noise η/N has a vanishingly small effect in the thermodynamic limit [29].

Branching Processes, Figure 4 The value of p as a function of time for a system with n = 10 generations. The two curves refer to two different initial conditions, one above and one below p_c. After a transient, the control parameter p(t) reaches its critical value p_c and fluctuates around it with short-range correlations

Figure 4 shows the value of p as a function of time. Independent of the initial conditions, one finds that after a transient p(t) reaches the self-organized state described by the critical value p_c = 1/2 and fluctuates around it with short-range correlations (of the order of one time unit). By computing the variance of p(t), one finds that the fluctuations are very well described by a Gaussian distribution Π(p) [29]. In the limit N → ∞, the distribution Π(p) approaches a delta function, δ(p − p_c).

Avalanche Distributions

Figure 5 shows the avalanche size distribution D(s) for different values of the number of generations n. One notices that there is a scaling region (D(s) ∼ s^(-τ) with τ = 3/2), whose size increases with n, characterized by an exponential cutoff. This power-law scaling is a signature of the mean-field criticality of the SOBP model. The distribution of active sites at the boundary, D(σ), for different values of the number of generations falls off exponentially [29].
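The feedback loop of Eq. (14) is simple to simulate. The following sketch (names and parameter values are my own choices) shows p(t) settling near p_c = 1/2 regardless of its initial value:

```python
import random

def sobp(n=10, steps=3000, p0=0.9, rng=random.Random(5)):
    """Self-organized branching process on a tree with n generations
    (N = 2**(n+1) - 1 sites).  After each avalanche the branching
    probability is updated as p <- p + (1 - sigma_n)/N, where sigma_n
    is the number of active sites reaching the boundary, cf. Eq. (14)."""
    N = 2 ** (n + 1) - 1
    p = p0
    trace = []
    for _ in range(steps):
        active = 1
        for _gen in range(n):                  # n branching generations
            active = sum(2 for _ in range(active) if rng.random() < p)
            if active == 0:
                break
        sigma_n = active                       # boundary activity
        p += (1 - sigma_n) / N
        p = min(max(p, 0.0), 1.0)
        trace.append(p)
    return trace

trace = sobp()
late = trace[len(trace) // 2:]
p_mean = sum(late) / len(late)                 # hovers near p_c = 1/2
```

Starting from p = 0.9, the large boundary avalanches quickly push p down; once near 1/2, most avalanches die before the boundary (p creeps up by 1/N) while the occasional boundary avalanche pushes it back down.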


The avalanche lifetime distribution L(t) ∼ t^(-y) yields the probability of an avalanche that lasts for a time t. For a system with m generations one obtains L(m) ∼ m^(-2) [1]. Identifying the number of generations m of an avalanche with the time t, we thus obtain the mean-field value y = 2, in agreement with simulations of the SOBP model [29]. In summary, the self-organized branching process captures the physical features of the self-organization mechanism in sandpile models. By explicitly incorporating the boundary conditions, it follows that the dynamics drives the system into a stationary state which, in the thermodynamic limit, corresponds to the critical branching process.

Scaling and Dissipation

Branching Processes, Figure 5 Log–log plot of the avalanche distribution D(s) for different system sizes. The number of generations n increases from left to right. A line with slope τ = 3/2 is plotted for reference, and it describes the behavior of the data for intermediate s values, cf. Eq. (18). For large s, the distributions fall off exponentially

In the limit n ≫ 1 one can obtain various analytical results and, e.g., calculate the avalanche distribution D(s) for the SOBP model. In addition, one can obtain results for finite, but large, values of n. The distribution D(s) can be calculated as the average value of P_n(s;p) with respect to the probability density Π(p), i.e., according to the formula

  D(s) = ∫₀¹ dp Π(p) P_n(s;p) .   (17)

The simulation results in Fig. 4 show that Π(p) for N ≫ 1 approaches the delta function δ(p − p_c). Thus, from Eqs. (13) and (17) we obtain the power-law behavior

  D(s) = √(2/π) s^(-τ) ,   (18)

where τ = 3/2, and for s ≳ n we obtain an exponential cutoff exp[−s/s_0(p_c)]. These results are in complete agreement with the numerical results shown in Fig. 5. The deviations from the power-law behavior (18) are due to the fact that Eq. (13) is only valid for 1 ≪ s ≲ n. One can also calculate the asymptotic form of Q_n(σ;p) for 1 ≪ σ ≲ n and p ≃ p_c by the use of a Tauberian theorem [19,20,21]; the result shows that the boundary avalanche distribution is

  D(σ) = ∫₀¹ dp Π(p) Q_n(σ;p) = (8σ/n²) exp(−2σ/n) ,   (19)

which agrees with simulation results for n ≫ 1, cf. [29].

Sometimes it can be difficult to determine whether the cutoff in the scaling is due to finite-size effects or due to the fact that the system is not at, but only close to, the critical point. In this respect, it is important to test the robustness of critical behavior by understanding which perturbations destroy the critical properties. It has been shown numerically [30,31,32] that breaking the conservation of particle number leads to a characteristic size in the avalanche distributions. We will now allow for dissipation in branching processes and show how the system self-organizes into a subcritical state. In other words, the degree of nonconservation is a relevant parameter in the renormalization-group sense [33]. Consider again the two-state model introduced by Manna [5]. Some degree of nonconservation can be introduced in the model by allowing for energy dissipation in a relaxation event. In a continuous-energy model this can be done by transferring to the neighboring sites only a fraction 1 − ε of the energy lost by the relaxing site [30]. In a discrete-energy model, such as the Manna two-state model, one can introduce dissipation as the probability ε that the two particles transferred by the relaxing site are annihilated [31]. For ε = 0 one recovers the original two-state model. Numerical simulations [30,31] show that different ways of considering dissipation lead to the same effect: a characteristic length is introduced into the system and the criticality is lost. As a result, the avalanche size distribution decays not as a pure power law but rather as

  D(s) ∼ s^(-τ) h_s(s/s_c) .   (20)

Here h_s(x) is a cutoff function, and the cutoff size scales as

  s_c ∼ ε^(-φ) .   (21)
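As a quick numerical illustration of this cutoff scaling, one can evaluate the branching-process cutoff s_c(p) = −2/ln[4p(1 − p)] at the effective branching probability p(1 − ε) with p = 1/2 (this anticipates the fixed point discussed below; a sketch, with my own naming, not code from the article):

```python
import math

def s_c(eps):
    """Avalanche cutoff at p = 1/2 with dissipation eps, using the
    effective branching probability p~ = (1 - eps)/2, so that
    4*p~*(1 - p~) = 1 - eps**2 and s_c = -2/ln(1 - eps**2)."""
    pt = (1 - eps) / 2
    return -2 / math.log(4 * pt * (1 - pt))

for eps in (0.1, 0.05, 0.02):
    print(eps, s_c(eps), 2 / eps ** 2)
```

For small ε, ln(1 − ε²) ≈ −ε², so s_c(ε) ≈ 2/ε², i.e. the cutoff exponent is φ = 2, as stated below in Eq. (31).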


The size s is defined as the number of sites that relax in an avalanche. We define the avalanche lifetime T as the number of steps comprising an avalanche. The corresponding distribution decays as

  D(T) ∼ T^(-y) h_T(T/T_c) ,   (22)

where h_T(x) is another cutoff function and T_c is a cutoff that scales as

  T_c ∼ ε^(-ν) .   (23)

The cutoff or "scaling" functions h_s(x) and h_T(x) fall off exponentially for x ≫ 1. To construct the mean-field theory one proceeds as follows [34]: When a particle is added to an arbitrary site, the site will relax if a particle was already present, which occurs with probability p = P(z = 1), the probability that the site is occupied. If a relaxation occurs, the two particles are transferred with probability 1 − ε to two of the infinitely many nearest neighbors, or they are dissipated with probability ε (see Fig. 6). The avalanche process in the mean-field limit is a branching process. Moreover, the branching process can be described by the effective branching probability

  p̃ ≡ p(1 − ε) ,   (24)

where p̃ is the probability to create two new active sites. We know that there is a critical value p̃ = 1/2, or

  p = p_c ≡ 1 / (2(1 − ε)) .   (25)

Thus, for p > p_c the probability to have an infinite avalanche is non-zero, while for p < p_c all avalanches are finite. The value p = p_c corresponds to the critical case, where avalanches are power-law distributed.

The Properties of the Steady State

To address the self-organization, consider the evolution of the total number of particles M(t) in the system after each avalanche:

  M(t+1) = M(t) + 1 − σ(p,t) − κ(p,t) .   (26)

Here σ is the number of particles that leave the system through the boundaries and κ is the number of particles lost by dissipation. Since (cf. Sect. "Self-Organized Branching Processes") M(t) = N P(z = 1) = N p, we obtain an evolution equation for the parameter p:

  p(t+1) = p(t) + [1 − σ(p,t) − κ(p,t)] / N .   (27)

This equation reduces to the SOBP model in the case of no dissipation (ε = 0). In the continuum limit one obtains [34]

  dp/dt = [1 − (2p(1 − ε))^n − ε p H(p(1 − ε))] / N + η(p,t) / N .   (28)

Here we have defined the function H(p(1 − ε)), which can be obtained analytically, and introduced the function η(p,t) to describe the fluctuations around the average values of σ and κ. It can be shown numerically that the effect of this "noise" term is vanishingly small in the limit N → ∞ [34]. Without the noise term, one can study the fixed points of Eq. (28), and one finds that there is only one fixed point,

Branching Processes, Figure 6 Schematic drawing of an avalanche in a system with a maximum of n = 3 avalanche generations, corresponding to N = 2^(n+1) − 1 = 15 sites. Each black site can relax in three different ways: (i) with probability p(1 − ε) it relaxes to two new black sites, (ii) with probability 1 − p the avalanche stops, and (iii) with probability pε two particles are dissipated at a black site, which then becomes a marked site, and the avalanche stops. The black sites are part of an avalanche of size s = 6, whereas the active sites at the boundary yield σ_3(p,t) = 2. There was one dissipation event, such that κ = 2

  p* = 1/2 ,   (29)

independent of the value of ε; the corrections to this value are of order O(1/N). By linearizing Eq. (28), it follows that the fixed point is attractive. This result implies that the SOBP model self-organizes into a state with p = p*. Figure 7 shows the value of p as a function of time for different values of the dissipation ε. We find that, independent of the initial conditions, after a transient p(t) reaches the self-organized steady state described by the


Branching Processes, Figure 7 The value of the control parameter p(t) as a function of time for a system with different levels of dissipation. After a transient, p(t) reaches its fixed-point value p* = 1/2 and fluctuates around it with short-range time correlations

Branching Processes, Figure 8 Phase diagram for the SOBP model with dissipation. The dashed line shows the fixed points p* = 1/2 of the dynamics, with the flow indicated by the arrows. The solid line shows the critical points, cf. Eq. (25)

fixed-point value p* = 1/2 and fluctuates around it with short-range correlations (of the order of one time unit). The fluctuations around this value decrease with the system size as 1/N. It follows that in the limit N → ∞ the distribution Π(p) approaches a delta function, Π(p) → δ(p − p*). By comparing the fixed-point value (29) with the critical value (25), we find that in the presence of dissipation (ε > 0) the self-organized steady state of the system is subcritical. Figure 8 is a schematic picture of the phase diagram of the model, including the line p = p_c of critical behavior (25) and the line p = p* of fixed points (29). These two lines intersect only for ε = 0.

Avalanche and Lifetime Distributions

In analogy with the results in Sect. "Self-Organized Branching Processes", we obtain similar formulas for the avalanche size distributions, but with p̃ replacing p. As a result we obtain the distribution

  D(s) = √(2/π) (1 + ε + ⋯) s^(-τ) exp[−s/s_c(ε)] .   (30)

We can expand s_c(p̃(ε)) = −2/ln[4p̃(1 − p̃)] in ε, with the result

  s_c(ε) ≃ 2/ε² ,   i.e.   φ = 2 .   (31)

Furthermore, the mean-field exponent for the critical branching process is obtained by setting ε = 0, i.e.,

  τ = 3/2 .   (32)

Branching Processes, Figure 9 Log–log plot of the avalanche distribution D(s) for different levels of dissipation. A line with slope τ = 3/2 is plotted for reference, and it describes the behavior of the data for intermediate s values, cf. Eq. (30). For large s, the distributions fall off exponentially. The data collapse is produced according to Eq. (30)
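The dissipative dynamics of Eq. (27) can also be simulated directly. In this sketch (naming and parameter values are my own choices), a relaxing site sends two offspring with probability 1 − ε or dissipates both particles with probability ε; the control parameter still settles near p* = 1/2, while the critical value p_c = 1/(2(1 − ε)) lies above it, so the steady state is subcritical:

```python
import random

def sobp_dissipative(n=10, eps=0.1, steps=4000, p0=0.2, rng=random.Random(6)):
    """SOBP with dissipation: p is updated with both the boundary loss
    sigma_n and the dissipation loss kappa, cf. Eq. (27)."""
    N = 2 ** (n + 1) - 1
    p = p0
    trace = []
    for _ in range(steps):
        active, kappa = 1, 0
        for _gen in range(n):
            nxt = 0
            for _ in range(active):
                if rng.random() < p:           # the site relaxes ...
                    if rng.random() < eps:
                        kappa += 2             # ... both particles dissipated
                    else:
                        nxt += 2               # ... two new active sites
            active = nxt
            if active == 0:
                break
        sigma_n = active                       # boundary activity
        p += (1 - sigma_n - kappa) / N
        p = min(max(p, 0.0), 1.0)
        trace.append(p)
    return trace

trace = sobp_dissipative()
p_star = sum(trace[2000:]) / 2000              # expected near 1/2
p_c = 1 / (2 * (1 - 0.1))                      # critical value, about 0.556
```

At p = 1/2 the expected loss per avalanche is exactly one particle (boundary plus dissipation), balancing the unit drive, which is why the fixed point stays at 1/2 for any ε while criticality would require the larger value p_c.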

These results are in complete agreement with the SOBP model and with simulations of D(s) for the SOBP model with dissipation (cf. Fig. 9). The deviations from the power-law behavior (30) are due to the fact that Eq. (13) is only valid for 1 ≪ s ≲ n. Next, consider the avalanche lifetime distribution D(T), characterizing the probability of obtaining an avalanche which spans m generations. It can be shown that the result


Branching Processes, Figure 10 Log–log plot of the lifetime distribution D(T) for different levels of dissipation. A line with slope y = 2 is plotted for reference. Note the initial deviations from the power law for ε = 0 due to the strong corrections to scaling. The data collapse is produced according to Eq. (33)

can be expressed in the scaling form [1,34]

  D(T) ∼ T^(-y) exp(−T/T_c) ,   (33)

where

  T_c ∼ ε^(-ν) ,   ν = 1 .   (34)

The lifetime exponent y was defined in Eq. (22), from which we confirm the mean-field result

  y = 2 .   (35)

In Fig. 10 we show the data collapse produced by Eq. (33) for the lifetime distributions at different values of ε. In summary, the effect of dissipation on the dynamics of the sandpile model in the mean-field limit (d → ∞) is described by a branching process. The evolution equation for the branching probability has a single attractive fixed point, which in the presence of dissipation is not a critical point. The level of dissipation ε therefore acts as a relevant parameter for the SOBP model. These results show, in the mean-field limit, that criticality in the sandpile model is lost when dissipation is present.

Network Science and Branching Processes

An interesting mixture of two kinds of "complex systems" is achieved by considering what kinds of applications one can find for branching processes applied to, or on, networks.

Here, network is a loose term in the way it is often used, as in the famous "six degrees" concept of how the social interactions of human beings can be measured to have a "small world" character [16,17,35]. These structures are usually thought of in terms of graphs, with vertices forming a set G and edges a set E, such that an edge e_gg′ ∈ E connects the two vertices g, g′ ∈ G. An edge or link can be either symmetric or asymmetric. Another article in this volume discusses the modern view on networks in depth, providing more information. The structural properties of heterogeneous networks are most usually measured by the degree k_i of vertex i ∈ G, i.e. the number of nearest neighbors the vertex i is connected to. One can go further by defining weights w_ij for the edges e_ij [36,37,38]. The interest in heterogeneous networks is largely due to two typical properties. In a scale-free network the structure is such that the degree distribution has a power-law form P(k) ∼ k^(-α) up to a size-dependent cut-off k_max. One can show, using mean-field-like reference models (such as the "configuration model", which has no structural correlations and a prescribed P(k)), that such networks tend to have a "small world" character: the average vertex–vertex distance is only logarithmic in the number of vertices N, d(G) ∼ ln N. This property is correlated with the fact that the various moments of k depend on α. For ⟨k⟩ to remain finite, α > 2 is required. In many empirically measured networks it appears that α < 3, which implies that ⟨k²⟩ diverges (we are assuming that P(k) is cut off at a maximum degree k_max for which ∫_{k_max}^∞ P(k) dk ∼ 1/N, from which the divergence follows) [16,17,35]. The first of the two central questions we pose here is: what happens to branching processes on top of such networks? Again, this can be translated into an inquiry about the "phase diagram", given a model, and its particulars.
We would like to understand the critical threshold λ_c, and what happens in sub-critical and supercritical systems (with appropriate values of λ). The answers may of course depend on the model at hand, i.e. the universality classes are still an important issue. One easy analytical result for uncorrelated (in particular tree-like) networks is that activity will persist in the branching process if the expected number of offspring of an active "individual" or vertex is at least one. Consider one such vertex i and look at the neighbors of its neighbors: there are of the order of k_i (⟨k_nn⟩ − 1) of these, where ⟨k_nn⟩ is the mean degree of a neighbor, so λ_c ∼ 1/⟨k²⟩. This demonstrates the highly important result that the critical threshold may vanish in heterogeneous networks – for γ ≤ 3 in the example case. Pastor-Satorras and Vespignani were the first to point out this fact for the Susceptible–Infected–Susceptible (SIS) model [39]. The analysis is relatively easy to do using mean-field theory for ρ_k, the average activity of the SIS branching process on nodes of degree k. The central equation reads

    ∂_t ρ_k(t) = −ρ_k(t) + λ k [1 − ρ_k(t)] Θ_k(λ) .   (36)

Here Θ_k measures the probability that at least one of the neighbors of a vertex of degree k is infected. A mean-field treatment of Θ, excluding degree correlations, makes it possible to show that the SIS model may have a zero threshold, and also establishes other interesting properties of the ρ_k. As is natural, the branching process concentrates on vertices with higher degrees, that is, ρ_k increases with k. The consequences of a zero epidemic threshold are plentiful and important [40]. The original application of the theory was to the spreading of computer viruses: the Internet is (on the Autonomous System level) a scale-free network [41]. So is the network of websites. The outcome even for subcritical epidemics is a long lifetime, and further complications ensue if the internal dynamics of the spreading do not follow Poissonian statistics in time (e.g. for the waiting time before vertex i sends a virus to a neighbor) but a power-law distribution [42]. Another highly topical issue is the behavior and control of human-based disease outbreaks. How to vaccinate against and isolate a dangerous virus epidemic depends on the structure of the network on which the spreading takes place, and on the detailed rules and dynamics. An intuitively easy idea is to concentrate on the "hubs", the most connected vertices of the network [43]. This can be an airport through which travellers acting as carriers move, or it can – as in the case of sexually transmitted viruses such as HIV – be the most active individuals. There are indeed claims that the network of sexual contacts is scale-free, with a small exponent γ. Of interest here is the recent result that for λ > λ_c and γ < 3 the spreading of branching processes deviates from the usual expectations.
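A hedged numerical sketch of Eq. (36): we Euler-integrate the degree-class equations with the standard uncorrelated-network closure Θ = Σ_k k P(k) ρ_k / ⟨k⟩. The function name, parameter values, and cutoff are our own illustrative choices, not those of [39].

```python
def sis_prevalence(lam, gamma=2.5, k_max=100, dt=0.01, steps=5000):
    """Euler integration of d(rho_k)/dt = -rho_k + lam*k*(1-rho_k)*Theta
    with Theta = sum_k k*P(k)*rho_k / <k> (uncorrelated network)."""
    ks = list(range(1, k_max + 1))
    Z = sum(k ** -gamma for k in ks)
    P = [k ** -gamma / Z for k in ks]
    mean_k = sum(k * p for k, p in zip(ks, P))
    rho = [0.01] * k_max                       # small initial infection
    for _ in range(steps):
        theta = sum(k * p * r for k, p, r in zip(ks, P, rho)) / mean_k
        rho = [r + dt * (-r + lam * k * (1 - r) * theta)
               for k, r in zip(ks, rho)]
    return sum(p * r for p, r in zip(P, rho))  # average prevalence

# The mean-field threshold lam_c = <k>/<k^2> is roughly 0.13 for these
# parameters: below it the infection dies out, above it it persists.
print(sis_prevalence(0.05))   # essentially zero
print(sis_prevalence(0.30))   # finite stationary prevalence
```

Raising `k_max` (mimicking a larger network) pushes the threshold toward zero, in line with the ⟨k²⟩ divergence discussed above.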
The outbreak covers a finite fraction of the network in a short time (vanishing in the large-N limit), and the growth in time of the number of infected nodes becomes polynomial after an initial exponential phase [44]. Branching processes can also be applied to network structure. The "standard model" of growing networks is the Barabási–Albert model. In it, one grows an example network from a small seed graph by so-called preferential attachment. That is, two mechanisms operate: i) new vertices are added, and ii) old ones grow more links (their degree increases) by getting linked to the new ones. There is a whole zoo of network growth models, but we next overview some ideas that directly connect to

Branching Processes, Figure 11 Two processes creating a growing or steady-state network: addition and merging of vertices

branching processes. These apply to cases in which vertices disappear by combining with other vertices, are generated from thin air, or split into "sub-vertices" such that old links from the original one are retained. One important mechanism, operative in cellular protein-interaction networks, is the duplication of vertices (a protein) with slight changes to the copy's connectivity relative to the parent vertex. Figure 11 illustrates an example where the structure in the steady state can be fruitfully described by similar mathematics as in other similar cases in networks. The basic rules are twofold: i) a new vertex is added to the network, and ii) two randomly chosen vertices merge. Such mechanisms are reminiscent of aggregation processes, e.g. of sticky particles that form colloidal aggregates in fluids. The simple model leads to a degree distribution of the asymptotic form P(k) ∼ k^(−3/2), reminiscent of the mean-field branching-process result for the avalanche size distribution (where τ = 3/2, cf. Sect. "Branching Processes") [45,46]. The mathematical tool here is given by rate equations that describe the number of vertices with a certain degree k. These are similar to those one can write for the size of an avalanche in other contexts. It is an interesting question what kind of generalizations one can find by considering cases where the edges have weights and the elementary processes are made dependent in some way on those [47]. It is worth noting that the aggregation/branching-process description of network structure easily develops deep complexity. This can be achieved by choosing the vertices to be joined with an eye to the graph geometry, and/or by splitting vertices with some particular rules. In the latter case, a natural starting point is to maintain all the links of the original vertex and to distribute them among the descendant vertices. Then one has to choose whether to link those to each other or not – either choice influences the local correlations.
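The add-and-merge dynamics of Fig. 11 can be prototyped in a few lines. This is a schematic variant with details chosen for simplicity (a ring as the initial graph, the new vertex attached to one random old vertex), not necessarily the exact model of [45,46]; its point is only that addition plus random merging keeps the size constant while producing a broad degree distribution.

```python
import random

def merge(adj, a, b):
    """Merge vertex b into vertex a: a inherits b's edges, b disappears."""
    for c in adj.pop(b):
        adj[c].discard(b)
        if c != a:
            adj[c].add(a)
            adj[a].add(c)

def evolve(n=500, steps=10000, seed=3):
    """Steady-state network from: (i) merge two random vertices,
    (ii) add a new vertex linked to one random old vertex."""
    random.seed(seed)
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}  # ring start
    new_id = n
    for _ in range(steps):
        a, b = random.sample(list(adj), 2)
        merge(adj, a, b)
        target = random.choice(list(adj))
        adj[new_id] = {target}
        adj[target].add(new_id)
        new_id += 1
    return adj

adj = evolve()
degrees = sorted((len(v) for v in adj.values()), reverse=True)
print(len(adj), degrees[:5])   # constant size; a few large hubs emerge
```

One addition and one pairwise merge per step leave the vertex count unchanged, which is why a statistical steady state can form at all.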
The same is naturally true if one considers reactions of the type k + k′ → k″, which originate from joining two neighboring vertices with degrees k and k′. The likelihood of this reaction depends on the conditional probability that a vertex of degree k has a neighbor of degree k′. Such processes depend on structural properties of the networks, which themselves change in time; thus the theoretical understanding is difficult due to the correlations. As a final note, in some cases the merging of vertices produces a giant component in the network: a vertex which carries a finite fraction of the total mass (or of the links in the whole). This is analogous to avalanching systems which are "weakly first-order", in other words exhibit statistics with a power-law, scale-free part and then a separate, singular peak. Often this would be a final giant avalanche.

Conclusions

In this short overview we have given some ideas of how to understand complexity via the tool of branching processes. The main issue has been that they are an excellent means of understanding "criticality" and "complexity" in many systems. We have concentrated on two particular applications, SOC and networks, to illustrate this. Many other important fields where BP-based ideas find use have been left out, from biology and the dynamics of species and molecules to geophysics and the spatial and temporal properties of, say, earthquakes. An example is the so-called ETAS model used for their modeling (see [48,49,50]).

Future Directions

The applications of branching processes to complex systems continue on various fronts. One can predict interesting developments in a number of cases. First, as indicated, the dynamics of BPs on networks are not understood very well at all when it comes to the possible scenarios. A simple question is whether the usual language of absorbing-state phase transitions applies, and if not why – and what is the effect of the heterogeneous geometry of complex networks. In many cases the structure of a network is a dynamic entity, which can be described by generalized branching processes, and there has so far been relatively little work in this direction. The inclusion of spatial effects and temporal memory dynamics is another interesting and important future avenue. Essentially, one searches for complicated variants of the usual BPs to be able to model avalanching systems, or cases where one wants to compute the typical time to reach an absorbing state and the related distribution. Or the question concerns the supercritical state (λ > λ_c) and the spreading from a seed. As noted in the networks section, this can be complicated by the presence of non-Poissonian temporal statistics. Another exciting future task is the description of non-Markovian phenomena, as when for instance the avalanche shape is non-symmetrical [51]. This indicates that there is an underlying mechanism which needs to be incorporated into the BP model.

Acknowledgments

We are grateful to our colleague Stefano Zapperi with whom we have collaborated on topics related to networks, avalanches, and branching processes. This work was supported by the Academy of Finland through the Center of Excellence program (M.J.A.) and EUMETSAT's GRAS Satellite Application Facility (K.B.L.).

Bibliography

Primary Literature

1. Harris TE (1989) The Theory of Branching Processes. Dover, New York
2. Kimmel M, Axelrod DE (2002) Branching Processes in Biology. Springer, New York
3. Athreya KB, Ney PE (2004) Branching Processes. Dover Publications, Inc., Mineola
4. Haccou P, Jagers P, Vatutin VA (2005) Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press, Cambridge
5. Manna SS (1991) J Phys A 24:L363. In this two-state model, the energy takes the two stable values z_i = 0 (empty) and z_i = 1 (particle). When z_i ≥ z_c, with z_c = 2, the site relaxes by distributing two particles to two randomly chosen neighbors
6. Dickman R, Alava MJ, Munoz MA, Peltola J, Vespignani A, Zapperi S (2001) Phys Rev E 64:056104
7. Bak P, Tang C, Wiesenfeld K (1987) Phys Rev Lett 59:381; (1988) Phys Rev A 38:364
8. Tebaldi C, De Menech M, Stella AL (1999) Phys Rev Lett 83:3952
9. Stella AL, De Menech M (2001) Physica A 295:1001
10. Vespignani A, Dickman R, Munoz MA, Zapperi S (2000) Phys Rev E 62:4564
11. Dickman R, Munoz MA, Vespignani A, Zapperi S (2000) Braz J Phys 30:27
12. Alava M (2003) Self-Organized Criticality as a Phase Transition. In: Korutcheva E, Cuerno R (eds) Advances in Condensed Matter and Statistical Physics. Nova Publishers, p 45; arXiv:cond-mat/0307688 (2004)
13. Lubeck S (2004) Int J Mod Phys B 18:3977
14. Jensen HJ (1998) Self-Organized Criticality. Cambridge University Press, Cambridge
15. Alava MJ, Nukala PKNN, Zapperi S (2006) Statistical models of fracture. Adv Phys 55:349–476
16. Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47
17. Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford; (2002) Adv Phys 51:1079; (2004) arXiv:condmat/0404593
18. Bonachela JA, Chate H, Dornic I, Munoz MA (2007) Phys Rev Lett 98:115702
19. Feller W (1971) An Introduction to Probability Theory and its Applications, vol 2, 2nd edn. Wiley, New York
20. Asmussen S, Hering H (1983) Branching Processes. Birkhäuser, Boston
21. Weiss GH (1994) Aspects and Applications of the Random Walk. North-Holland, Amsterdam
22. Tang C, Bak P (1988) J Stat Phys 51:797
23. Dhar D, Majumdar SN (1990) J Phys A 23:4333
24. Janowsky SA, Laberge CA (1993) J Phys A 26:L973
25. Flyvbjerg H, Sneppen K, Bak P (1993) Phys Rev Lett 71:4087; de Boer J, Derrida B, Flyvbjerg H, Jackson AD, Wettig T (1994) Phys Rev Lett 73:906
26. Alstrøm P (1988) Phys Rev A 38:4905
27. Christensen K, Olami Z (1993) Phys Rev E 48:3361
28. García-Pelayo R (1994) Phys Rev E 49:4903
29. Zapperi S, Lauritsen KB, Stanley HE (1995) Phys Rev Lett 75:4071
30. Manna SS, Kiss LB, Kertész J (1990) J Stat Phys 61:923
31. Tadić B, Nowak U, Usadel KD, Ramaswamy R, Padlewski S (1992) Phys Rev A 45:8536
32. Tadić B, Ramaswamy R (1996) Phys Rev E 54:3157
33. Vespignani A, Zapperi S, Pietronero L (1995) Phys Rev E 51:1711
34. Lauritsen KB, Zapperi S, Stanley HE (1996) Phys Rev E 54:2483
35. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167
36. Yook SH, Jeong H, Barabasi AL, Tu Y (2001) Phys Rev Lett 86:5835
37. Barrat A, Barthelemy M, Pastor-Satorras R, Vespignani A (2004) Proc Natl Acad Sci USA 101:3747
38. Barrat A, Barthelemy M, Vespignani A (2004) Phys Rev Lett 92:228701
39. Pastor-Satorras R, Vespignani A (2001) Phys Rev Lett 86:3200
40. Dorogovtsev SN, Goltsev AV, Mendes JFF (2007) arXiv:condmat/0750.0110
41. Pastor-Satorras R, Vespignani A (2004) Evolution and Structure of the Internet: A Statistical Physics Approach. Cambridge University Press, Cambridge
42. Vazquez A, Balazs R, Andras L, Barabasi AL (2007) Phys Rev Lett 98:158702
43. Colizza V, Barrat A, Barthelemy M, Vespignani A (2006) PNAS 103:2015
44. Vazquez A (2006) Phys Rev Lett 96:038702
45. Kim BJ, Trusina A, Minnhagen P, Sneppen K (2005) Eur Phys J B 43:369
46. Alava MJ, Dorogovtsev SN (2005) Phys Rev E 71:036107
47. Hui Z, Zi-You G, Gang Y, Wen-Xu W (2006) Chin Phys Lett 23:275
48. Ogata Y (1988) J Am Stat Assoc 83:9
49. Saichev A, Helmstetter A, Sornette D (2005) Pure Appl Geophys 162:1113
50. Lippidello E, Godano C, de Arcangelis L (2007) Phys Rev Lett 98:098501
51. Zapperi S, Castellano C, Colaiori F, Durin G (2005) Nat Phys 1:46

Books and Reviews

Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47
Asmussen S, Hering H (1983) Branching Processes. Birkhäuser, Boston
Athreya KB, Ney PE (2004) Branching Processes. Dover Publications, Inc., Mineola
Feller W (1971) An Introduction to Probability Theory and its Applications, vol 2, 2nd edn. Wiley, New York
Haccou P, Jagers P, Vatutin VA (2005) Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press, Cambridge
Harris TE (1989) The Theory of Branching Processes. Dover, New York
Jensen HJ (1998) Self-Organized Criticality. Cambridge University Press, Cambridge
Kimmel M, Axelrod DE (2002) Branching Processes in Biology. Springer, New York
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167
Weiss GH (1994) Aspects and Applications of the Random Walk. North-Holland, Amsterdam


Cellular Automata as Models of Parallel Computation

THOMAS WORSCH
Lehrstuhl Informatik für Ingenieure und Naturwissenschaftler, Universität Karlsruhe, Karlsruhe, Germany

Article Outline

Glossary
Definition of the Subject
Introduction
Time and Space Complexity
Measuring and Controlling the Activities
Communication in CA
Future Directions
Bibliography

Glossary

Cellular automaton  The classical fine-grained parallel model introduced by John von Neumann.
Hyperbolic cellular automaton  A cellular automaton resulting from a tessellation of the hyperbolic plane.
Parallel Turing machine  A generalization of Turing's classical model where several control units work cooperatively on the same tape (or set of tapes).
Time complexity  Number of steps needed for computing a result. Usually a function t: ℕ⁺ → ℕ⁺, t(n) being the maximum ("worst case") for any input of size n.
Space complexity  Number of cells needed for computing a result. Usually a function s: ℕ⁺ → ℕ⁺, s(n) being the maximum for any input of size n.
State change complexity  Number of proper state changes of cells during a computation. Usually a function sc: ℕ⁺ → ℕ⁺, sc(n) being the maximum for any input of size n.
Processor complexity  Maximum number of control units of a parallel Turing machine which are simultaneously active during a computation. Usually a function ℕ⁺ → ℕ⁺ giving the maximum for any input of size n.
ℕ⁺  The set {1, 2, 3, …} of positive natural numbers.
ℤ  The set {…, −3, −2, −1, 0, 1, 2, 3, …} of integers.
Q^G  The set of all (total) functions from a set G to a set Q.

Definition of the Subject

This article will explore the properties of cellular automata (CA) as a parallel model.

The Main Theme

We will first look at the standard model of CA and compare it with Turing machines as the standard sequential model, mainly from a computational complexity point of view. From there we will proceed in two directions: by removing computational power and by adding computational power in different ways, in order to gain insight into the importance of some ingredients of the definition of CA.

What is Left Out

There are topics which we will not cover although they would have fit under the title. One such topic is parallel algorithms for CA. There are algorithmic problems which make sense only for parallel models. Probably the most famous for CA is the so-called Firing Squad Synchronization Problem. This is the topic of Umeo's article (Firing Squad Synchronization Problem in Cellular Automata), which can also be found in this encyclopedia. Another such topic in this area is the leader election problem. For CA it has received increased attention in recent years; see the paper by Stratmann and Worsch [29] and the references therein for more details. And we do want to mention the most exciting (in our opinion) CA algorithm: Tougne has designed a CA which, starting from a single point, after t steps has generated the discretized circle of radius t, for all t; see [5] for this gem.

There are also models which generalize standard CA by making the cells more powerful. Kutrib has introduced push-down cellular automata [14]. As the name indicates, in this model each cell does not have a finite memory but can make use of a potentially unbounded stack of symbols. The area of nondeterministic CA is also not covered here. For results concerning formal language recognition with these devices refer to Cellular Automata and Language Theory. All these topics are, unfortunately, beyond the scope of this article.
Structure of the Paper

The core of this article consists of four sections:

Introduction: The main point is the standard definition of Euclidean deterministic synchronous cellular automata. Furthermore, some general aspects of parallel models and typical questions and problems are discussed.

Time and space complexity: After defining the standard computational complexity measures, we compare CA with different resource bounds. The comparison of CA with the Turing machine (TM) gives basic insights into their computational power.

Measuring and controlling activities: There are two approaches to measuring the "amount of parallelism" in CA. One is an additional complexity measure defined directly for CA, the other goes via the definition of so-called parallel Turing machines. Both are discussed.

Communication: Here we have a look at "variants" of CA with communication structures other than the one-dimensional line. We sketch the proofs that some of these variants are in the second machine class.

Introduction

In this section we will first formalize the classical model of cellular automata, basically introduced by von Neumann [21]. Afterwards we will recap some general facts about parallel models.

Definition of Cellular Automata

There are several equivalent formalizations of CA, and of course one chooses the one most appropriate for the topics to be investigated. Our point of view will be that each CA consists of a regular arrangement of basic processing elements working in parallel while exchanging data. Below, for each of the words regular, basic, processing, parallel and exchanging, we first give the standard definition for clarification. Then we briefly point out possible alternatives which will be discussed in more detail in later sections.

Underlying Grid

A cellular automaton (CA) consists of a set G of cells, where each cell has at least one neighbor with which it can exchange data. Informally speaking, one usually assumes a "regular" arrangement of cells and, in particular, identically shaped neighborhoods. For a d-dimensional CA, d ∈ ℕ⁺, one can think of G = ℤ^d. Neighbors are specified by a finite set N of coordinate differences called the neighborhood. The cell i ∈ G has as its neighbors the cells i + n for all n ∈ N. Usually one assumes that 0 ∈ N. (Here we write 0 for a vector of d zeros.)
As long as one is not specifically interested in the precise role N is playing, one may assume some standard neighborhood: the von Neumann neighborhood of radius r is N^(r) = {(k₁, …, k_d) | Σ_j |k_j| ≤ r}, and the Moore neighborhood of radius r is M^(r) = {(k₁, …, k_d) | max_j |k_j| ≤ r}. The choices of G and N determine the structure of what in a real parallel computer would be called the "communication network". We will usually consider the case G = ℤ^d and assume that the neighborhood is N = N^(1).

Discussion

The structure of connections between cells is sometimes defined using the concept of Cayley graphs. Refer to the article by Ceccherini-Silberstein (Cellular Automata and Groups), also in this encyclopedia, for details. Another approach is via regular tessellations. For example, the 2-dimensional Euclidean space can be tiled with copies of a square. These can be considered as cells, and cells sharing an edge are neighbors. Similarly one can tile, e.g., the hyperbolic plane with copies of a regular k-gon. This will be considered to some extent in Sect. "Communication in CA". A more thorough exposition can be found in the article by Margenstern (Cellular Automata in Hyperbolic Spaces), also in this encyclopedia. CA resulting, for example, from tessellations of the 2-dimensional Euclidean plane with triangles or hexagons are considered in the article by Bays (Cellular Automata in Triangular, Pentagonal and Hexagonal Tessellations), also in this encyclopedia.

Global and Local Configurations

The basic processing capabilities of each cell are those of a finite automaton. The set of possible states of each cell, denoted by Q, is finite. As inputs to be processed, each cell gets the states of all the cells in its neighborhood. We will write Q^G for the set of all functions from G to Q. Thus each c ∈ Q^G describes a possible global state of the whole CA. We will call these c (global) configurations. On the other hand, functions ℓ: N → Q are called local configurations. We say that in a configuration c cell i observes the local configuration c_{i+N}: N → Q, where c_{i+N}(n) = c(i + n). A cell gets its currently observed local configuration as input. It remains to be defined how these are processed.

Dynamics

The dynamics of a CA are defined by specifying the local dynamics of a single cell and how cells "operate in parallel" (if at all).
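For concreteness, the two standard neighborhoods can be generated mechanically; this small helper is our own illustration, not part of the formal definition.

```python
from itertools import product

def von_neumann(d, r):
    """N^(r): offsets (k_1, ..., k_d) with |k_1| + ... + |k_d| <= r."""
    return [v for v in product(range(-r, r + 1), repeat=d)
            if sum(map(abs, v)) <= r]

def moore(d, r):
    """M^(r): offsets (k_1, ..., k_d) with max_j |k_j| <= r."""
    return list(product(range(-r, r + 1), repeat=d))

# Classical 2-dimensional cases: |N^(1)| = 5 (cross), |M^(1)| = 9 (square).
print(len(von_neumann(2, 1)), len(moore(2, 1)))   # 5 9
```

In general |N^(1)| = 2d + 1 and |M^(r)| = (2r + 1)^d, which makes the growth of the Moore neighborhood with dimension explicit.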
In the first sections we will consider the classical case:

• A local transition function is a function f: Q^N → Q prescribing for each local configuration ℓ ∈ Q^N the next state f(ℓ) of a cell which currently observes ℓ in its neighborhood. In particular, this means that we are considering deterministic behavior of cells.
• Furthermore, we will first concentrate on CA where all cells are working synchronously: the possible transitions from one configuration to the next one in one global step of the CA can be described by a function F: Q^G → Q^G requiring that all cells make one state transition: ∀i ∈ G: F(c)(i) = f(c_{i+N}).

For alternative definitions of the dynamic behavior of CA see Sect. "Measuring and Controlling the Activities".

Discussion

Basically, the above definition of CA is the standard one going back to von Neumann [21]; he used G = ℤ² and N = {(−1, 0), (1, 0), (0, −1), (0, 1)} for his construction. But for all of the aspects just defined there are other possibilities, some of which will be discussed in later sections.

Finite Computations on CA

In this article we are interested in using CA as devices for computing, given some finite input, in a finite number of steps a finite output.

Inputs

As the prototypical examples of problems to be solved by CA and other models, we will consider the recognition of formal languages. This has the advantages that the inputs have a simple structure and, more importantly, the output is only one bit (accept or reject) and can be formalized easily. A detailed discussion of CA as formal language recognizers can be found in the article by Kutrib (Cellular Automata and Language Theory). The input alphabet will be denoted by A. We assume that A ⊆ Q. In addition there has to be a special state q ∈ Q which is called a quiescent state because it has the property that for the quiescent local configuration ℓ_q: N → Q: n ↦ q the local transition function must satisfy f(ℓ_q) = q. In the literature two input modes are usually considered.

• Parallel input mode: For an input w = x₁ ⋯ x_n ∈ Aⁿ the initial configuration c_w is defined as

    c_w(i) = x_j  if i = (j, 0, …, 0),
    c_w(i) = q    otherwise.

• Sequential input mode: In this case, all cells are in state q in the initial configuration. But cell (0, …, 0) is a designated input cell and acts differently from the others. It works according to a function g: Q^N × (A ∪ {q}) → Q. During the first n steps the input cell gets input symbol x_j in step j; after the last input symbol it always gets q. CA using this type of input are often called iterative arrays (IA).
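The definitions above translate directly into a small sketch of the global map F(c)(i) = f(c_{i+N}) together with parallel input mode. The dictionary representation of configurations (storing only non-quiescent cells) and the toy "shift right" rule are our own illustrative choices, not part of the formal model.

```python
def make_global_map(f, N, q):
    """Turn a local rule f: (tuple of states over N) -> state into the
    global map F with F(c)(i) = f(c_{i+N}); configurations c are dicts
    from Z to states, with the quiescent state q everywhere else."""
    def F(c):
        # cells whose neighborhood contains a non-quiescent cell
        touched = {i - n for i in c for n in N}
        nxt = {i: f(tuple(c.get(i + n, q) for n in N))
               for i in touched | set(c)}
        return {i: s for i, s in nxt.items() if s != q}
    return F

# Parallel input mode for w = "abba" (symbols on cells 1..n), then one
# step of a toy rule over N = (-1, 0, 1) that copies the left neighbor,
# i.e. shifts every symbol one cell to the right.
q = '.'
c = {j: x for j, x in enumerate("abba", start=1)}     # c_w
F = make_global_map(lambda nb: nb[0], (-1, 0, 1), q)
print(sorted(F(c).items()))   # [(2, 'a'), (3, 'b'), (4, 'b'), (5, 'a')]
```

Storing only non-quiescent cells keeps the configuration finite, mirroring the requirement f(ℓ_q) = q: the infinitely many quiescent cells never need to be updated explicitly.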

Unless otherwise noted we will always assume parallel input mode. Conceptually, the difference between the two input modes has the same consequences as for TM. If input is provided sequentially, it is meaningful to have a look at computations which "use" fewer than n cells (see the definition of space complexity later on). Technically, some results occur only for the parallel input mode but not for the sequential one, or vice versa. This is the case, for example, when one looks at devices with small time bounds like n or n + √n steps. But as soon as one considers more generally Θ(n) or more steps and a space complexity of at least Θ(n), CA and IA can simulate each other in linear time:

• An IA can first read the complete input word, storing it in successive cells, and then start simulating a CA, and
• a CA can shift the whole word to the cell holding the first input symbol and have it act as the designated input cell of an IA.

Outputs

Concerning output, one usually defines that a CA has finished its work whenever it has reached a stable configuration c, i.e., F(c) = c. In such a case we will also say that the CA halts (although formally one can continue to apply F). An input word w ∈ A⁺ is accepted iff the cell (1, 0, …, 0), which got the first input symbol, is in an accepting state from a designated finite subset F₊ ⊆ Q of states. We write L(C) for the set of all words w ∈ A⁺ which are accepted by a CA C. For the sake of simplicity we will assume that all deterministic machines under consideration halt for all inputs; e.g., the CA as defined above always reaches a stable configuration. The sequence of all configurations from the initial one for some input w to the stable one is called the computation for input w.

Discussion

For 1-dimensional CA the definition of parallel input is the obvious one. For higher-dimensional CA, say G = ℤ², one could also think of more compact forms of input for one-dimensional words, e.g., inscribing the input symbols row by row into a square with side length ⌈√n⌉. But this requires extra care. Depending on the formal language to be accepted, the special way in which the symbols are input might provide additional information which is useful for the language recognition task at hand.

Since a CA performs work on an infinite number of bits in each step, it would also be possible to consider inputs and outputs of infinite length, e.g., as representations of all real numbers in the interval [0, 1]. There is much less literature about this aspect; see for example chapter 11 of [8]. It is not too surprising that this area is also related to the view of CA as dynamical systems (instead of computers); see the contributions by Formenti (Chaotic Behavior of Cellular Automata) and Kůrka (Topological Dynamics of Cellular Automata).

Example: Recognition of Palindromes

As an example that will also be useful later, consider the formal language L_pal of palindromes of odd length:

    L_pal = {v x v^R | v ∈ A* ∧ x ∈ A} .

(Here v^R is the mirror image of v.) For example, if A contains all Latin letters, saippuakauppias belongs to L_pal (the Finnish word for a soap dealer). It is known that each TM with only one tape and only one head on it (see Subsect. "Turing Machines" for a quick introduction) needs time Ω(n²) for the recognition of at least some inputs of length n belonging to L_pal [10].

We will sketch a CA recognizing L_pal in time Θ(n). As the set of states for a single cell we use Q = A ∪ {␣} ∪ Q_l × Q_r × Q_v × Q_lr, basically subdividing each cell into 4 "registers", each containing a "substate". The substates from Q_l = {<a, <b, ␣} and Q_r = {a>, b>, ␣} are used to shift input symbols to the left and to the right, respectively. In the third register a substate from Q_v = {+, -} indicates the results of comparisons. In the fourth register substates from Q_lr = {>, <, <+>, <->, ␣} are used to realize "signals" > and < which identify the middle cell and distribute the relevant overall comparison result to all cells. (Here ␣ denotes an empty register.) As accepting states one chooses those whose last component is <+>: F₊ = Q_l × Q_r × Q_v × {<+>}. There is a total of 3 + 3 · 3 · 2 · 5 = 93 states, and for a complete definition of the local transition function one would have to specify f(x, y, z) for 93³ = 804357 triples of states. We will not do that, but we will sketch some important parts. In the first step the registers are initialized. For all x, y, z ∈ A:

    ℓ(−1)   ℓ(0)   ℓ(1)   f(ℓ)
    ␣       y      z      (<y, y>, +, >)
    x       y      z      (<y, y>, +, ␣)
    x       y      ␣      (<y, y>, +, <)
    ␣       y      ␣      (<y, y>, +, <+>)

In all later steps, if

    ℓ(−1) = (<x_l, x_r>, v_l, d_l),  ℓ(0) = (<y_l, y_r>, v_m, d_m),  ℓ(1) = (<z_l, z_r>, v_r, d_r),

then f(ℓ) = (<z_l, x_r>, v′_m, d′_m). Here, of course, the new value of the third register is computed as

    v′_m = +  if v_m = + ∧ z_l = x_r ,
    v′_m = -  otherwise.

We do not describe the computation of d′_m in detail. Figure 1 shows the computation for the input babbbab, which is a palindrome. Horizontal double lines separate configurations at subsequent time steps. Registers in state ␣ are simply left empty. As can be seen, there is a triangle in the space-time diagram, consisting of the n input cells at time t = 0 and shrinking at both ends by one cell in each subsequent step, where "a lot of activity" can happen due to the shifting of the input symbols in both directions.

Cellular Automata as Models of Parallel Computation, Figure 1 Recognition of a palindrome; the last configuration is stable and the cell which initially stored the first input symbol is in an accepting state
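The shifting-register construction can be condensed into a short simulation. This is our own re-implementation for illustration: the signal registers >, <, <+> that identify the middle cell and broadcast the verdict are replaced by reading the middle cell's comparison flag directly after (n−1)/2 steps.

```python
def is_palindrome_ca(w):
    """Simulate the shifting-register CA on an odd-length input w.
    Cell i keeps a left-moving symbol l, a right-moving symbol r and a
    cumulative comparison flag v; in each step it receives l from its
    right neighbor and r from its left neighbor and compares them."""
    n = len(w)
    l = list(w)            # left-moving register, initialized to w
    r = list(w)            # right-moving register, initialized to w
    v = [True] * n         # comparison flag, initialized to +
    for _ in range(n // 2):
        nl = [l[i + 1] if i + 1 < n else None for i in range(n)]
        nr = [r[i - 1] if i - 1 >= 0 else None for i in range(n)]
        v = [v[i] and (nl[i] == nr[i]) for i in range(n)]
        l, r = nl, nr
    # after (n-1)//2 steps the middle cell has compared w[m+t] with
    # w[m-t] for all t, i.e. exactly the palindrome condition
    return v[n // 2]

print(is_palindrome_ca("babbbab"))   # True
print(is_palindrome_ca("babbbaa"))   # False
```

Every cell updates in every step, so this runs in Θ(n) CA steps — the linear-time bound claimed above — whereas the one-head TM lower bound for L_pal is Ω(n²).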


Clearly a two-head TM can also recognize L_pal in linear time, by first moving one head to the last symbol and then synchronously shifting both heads towards each other, comparing the symbols read. Informally speaking, in this case the ability of a multi-head TM to transport a small amount of information over a long distance in one step can be "compensated" by a CA shifting a large amount of information over a short distance. We will see in Theorem 3 that this observation can be generalized.

Complexity Measures: Time and Space

For one-dimensional CA it is straightforward to define their time and space complexity. We will consider only worst-case complexity. Remember that we assume that all CA halt for all inputs (reach a stable configuration). For w ∈ A^+ let time′(w) denote the smallest number t of steps such that the CA reaches a stable configuration after t steps when started from the initial configuration for input w. Then

time : ℕ^+ → ℕ^+ : n ↦ max{time′(w) | w ∈ A^n}

is called the time complexity of the CA. Similarly, let space′(w) denote the total number of cells which are not quiescent in at least one configuration occurring during the computation for input w. Then

space : ℕ^+ → ℕ^+ : n ↦ max{space′(w) | w ∈ A^n}

is called the space complexity of the CA. If we want to mention a specific CA C, we indicate it as an index, e.g., time_C. If s and t are functions ℕ^+ → ℕ^+, we write CA-SPC(s)-TIME(t) for the set of formal languages which can be accepted by some CA C with space_C ≤ s and time_C ≤ t, and analogously CA-SPC(s) and CA-TIME(t) if only one complexity measure is bounded. Thus we only look at upper bounds. For a whole set T of functions, we will use the abbreviation

CA-TIME(T) = ⋃_{t ∈ T} CA-TIME(t).

Typical examples will be T = O(n) or T = Pol(n), where in general Pol(f) = ⋃_{k ∈ ℕ^+} O(f^k). Resource bounded complexity classes for other computational models will be noted similarly. If we want to make the dimension of the CA explicit, we write ℤ^d-CA-…; if the prefix ℤ^d is missing, d = 1 is to be assumed. Throughout this article n will always denote the length of input words. Thus a time complexity of Pol(n) simply means polynomial time, and similarly for space, so that TM-TIME(Pol(n)) = P and TM-SPC(Pol(n)) = PSPACE.

Discussion For higher-dimensional CA the definition of space complexity requires more consideration. One possibility is to count the number of cells used during the computation. A different, but sometimes more convenient, approach is to count the number of cells in the smallest hyper-rectangle comprising all used cells.

Turing Machines

For reference, and because we will consider a parallel variant, we set forth some definitions of Turing machines. In general we allow Turing machines with k work tapes and h heads on each of them. Each square carries a symbol from the tape alphabet B, which includes the blank symbol □. The control unit (CU) is a finite automaton with set of states S. The possible actions of a deterministic TM are described by a function f : S × B^{kh} → S × B^{kh} × D^{kh}, where D = {−1, 0, +1} is used for indicating the direction of movement of a head. If the machine reaches a situation in which f(s, b_1, …, b_{kh}) = (s, b_1, …, b_{kh}, 0, …, 0) for the current state s and the currently scanned symbols b_1, …, b_{kh}, we say that it halts. Initially a word w of length n over the input alphabet A ⊆ B is written on the first tape on squares 1, …, n; all other tape squares are empty, i.e., carry the □. An input is accepted if the CU halts in an accepting state from a designated subset F_+ ⊆ S. L(T) denotes the formal language of all words accepted by a TM T. We write kT^h-TM-SPC(s)-TIME(t) for the class of all formal languages which can be recognized by a TM T with k work tapes and h heads on each of them which has space complexity space_T ≤ s and time complexity time_T ≤ t. If k and/or h is missing, 1 is assumed instead. If the whole prefix kT^h is missing, 1T^1 is assumed. If arbitrary k and h are allowed, we write ∗T∗.
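Returning to the CA measures just defined, time′(w) and space′(w) can be made concrete with a small sketch. The erasing rule and all names below are my own illustration, not taken from the article: a configuration is iterated until it becomes stable, counting the steps and the cells that were ever non-quiescent.

```python
# Toy illustration of time'(w) and space'(w) for a one-dimensional CA.
# The local rule is an invented example: a cell erases itself (becomes
# quiescent) as soon as one of its neighbors is quiescent, so a word of
# length n dies out from both ends in ceil(n/2) steps.

QUIESCENT = '.'

def erase_rule(left, me, right):
    if me == QUIESCENT:
        return QUIESCENT
    return QUIESCENT if QUIESCENT in (left, right) else me

def run(word, rule):
    """Return (time'(w), space'(w)) for the given input word w."""
    config = {i: s for i, s in enumerate(word)}  # cells 0..n-1 hold the input
    used = set(config)                           # cells non-quiescent at least once
    t = 0
    while True:
        # only cells with a non-quiescent neighborhood can become non-quiescent
        candidates = {j for i in config for j in (i - 1, i, i + 1)}
        new = {}
        for i in candidates:
            s = rule(config.get(i - 1, QUIESCENT),
                     config.get(i, QUIESCENT),
                     config.get(i + 1, QUIESCENT))
            if s != QUIESCENT:
                new[i] = s
        if new == config:        # stable configuration reached after t steps
            return t, len(used)
        config = new
        used |= set(config)
        t += 1
```

For this rule time′(w) = ⌈|w|/2⌉ and space′(w) = |w|; maximizing over all words of length n then gives the functions time and space defined above.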

Sequential Versus Parallel Models

Today quite a number of different computational models are known which intuitively look as if they are parallel. Several years ago van Emde Boas [6] observed that many of these models have one property in common: the problems that can be solved in polynomial time on such a model P coincide with the problems that can be solved in polynomial space on Turing machines:

P-TIME(Pol(n)) = TM-SPC(Pol(n)) = PSPACE.


Here we have chosen P as an abbreviation for "parallel" model. Models P satisfying this equality are by definition the members of the so-called second machine class. On the other hand, the first machine class is formed by all models S satisfying the relation

S-SPC(s)-TIME(t) = TM-SPC(Θ(s))-TIME(Pol(t))

at least for some reasonable functions s and t. We deliberately avoid making this more precise. In general there is consensus on which models are in these machine classes. We do want to point out that the naming of the two machine classes does not mean that they are different or even disjoint. This is not known. For example, if P = PSPACE, it might be that the classes coincide. Furthermore there are models, e.g., Savitch's NLPRAM [25], which might be in neither machine class. Another observation is that in order to possibly classify a machine model it obviously has to have something like "time complexity" and/or "space complexity". This may sound trivial, but we will see in Subsect. "Parallel Turing Machines" that, for example, for so-called parallel Turing machines with several work tapes it is in fact not.

Time and Space Complexity

Comparison of Resource Bounded One-Dimensional CA

It is clear that time and space complexity for CA are Blum measures [2] and hence infinite hierarchies of complexity classes exist. It follows from the more general Theorem 9 for parallel Turing machines that the following holds:

Theorem 1 Let s and t be two functions such that s is fully CA space constructable in time t and t is CA computable in space s and time t. Then:

⋃_{φ ∉ O(1)} CA-SPC(Θ(s/φ))-TIME(Θ(t/φ)) ⊊ CA-SPC(O(s))-TIME(O(t))

CA-SPC(o(s))-TIME(o(t)) ⊊ CA-SPC(O(s))-TIME(O(t))

CA-TIME(o(t)) ⊊ CA-TIME(O(t)).

The second and third inclusions are simple corollaries of the first one. We do not go into the details of the definition of CA constructibility, but note that for hierarchy results for TM one sometimes needs analogous additional conditions. For details, interested readers are referred to [3,12,17].

We note that for CA the situation is better than for deterministic TM: there one needs f(n) log f(n) ∈ o(g(n)) in order to prove TM-TIME(f) ⊊ TM-TIME(g).

Open Problem 2 For the proper inclusions in Theorem 1 the construction used in [34] really needs to increase the space used in order to get the time hierarchy. It is an open problem whether there also exists a time hierarchy if the space complexity is fixed, e.g., as s(n) = n. It is even an open problem to prove or disprove whether the inclusion

CA-SPC(n)-TIME(n) ⊆ CA-SPC(n)-TIME(2^{O(n)})

is proper or not. We will come back to this topic in Subsect. "Parallel Turing Machines" on parallel Turing machines.

Comparison with Turing Machines

It is well known that a TM with one tape and one head on that tape can be simulated by a one-dimensional CA; see for example the paper by Smith [27]. But even multi-tape TM can be simulated by a one-dimensional CA without any significant loss of time.

Theorem 3 For all space bounds s(n) ≥ n and all time bounds t(n) ≥ n the following holds for one-dimensional CA and TM with an arbitrary number of heads on their tapes:

∗T∗-TM-SPC(s)-TIME(t) ⊆ CA-SPC(s)-TIME(O(t)).

Sketch of the Simulation We first describe a simulation for 1T^1-TM. In this case the actions of the TM are of the form s, b → s′, b′, d, where s, s′ ∈ S are the old and new state, b, b′ ∈ B the old and new tape symbol, and d ∈ {−1, 0, +1} the direction of head movement. The simulating CA uses three substates in each cell, one for a TM state, one for a tape symbol, and an additional one for shifting tape symbols: Q = Q_S × Q_T × Q_M. We use Q_S = S ∪ {−}, and a substate of − means that the cell does not store a state. Similarly Q_T = B ∪ {<◦, ◦>}, and a substate of <◦ or ◦> means that there is no symbol stored but a "hole" to be filled with an adjacent symbol. Substates from Q_M = B × {<, >}, like <b and b>, are used for shifting symbols from one cell to the adjacent one to the left or right. Instead of moving the state one cell to the right or left whenever the TM moves its head, the tape contents as stored in the CA are shifted in the opposite direction. Assume for example that the TM performs the following actions:

s0, d → s1, d′, +1
s1, e → s2, e′, −1
s2, d′ → s3, d″, −1
s3, c → s4, c′, +1

Figure 2 shows how shifting the tape in direction d can be achieved by sending the current symbol in that direction and sending a "hole" ◦ in the opposite direction −d. It should be clear that the required state changes of each cell depend only on information available in its neighborhood. A consequence of this approach of incrementally shifting the tape contents is that it takes an arbitrarily large number of steps until all symbols have been shifted. On the other hand, after only two steps the cell simulating the TM control unit has information about the next symbol visited and can simulate the next TM step and initialize the next tape shift.

Cellular Automata as Models of Parallel Computation, Figure 2 Shifting tape contents step by step

Clearly the same approach can be used if one wants to simulate a TM with several tapes, each having one head. For each additional tape the CA would use two additional registers analogously to the middle and bottom row used in Fig. 2 for one tape. Stoß [28] has proved that kT^h-TM (h heads on each tape) can be simulated by (kh)T-TM (only one head on each tape) in linear time. Hence there is nothing left to prove.

Discussion As one can see in Fig. 2, in every second step one signal is sent to the left and one to the right. Thus, if the TM moves its head a lot and if the tape segment which has to be shifted is already long, many signals are traveling simultaneously. In other words, the CA transports "a large amount of information over a short distance in one step". Theorem 3 says that this ability is at least as powerful as the ability of multi-head TM to transport "a small amount of information over a long distance in one step".

Open Problem 4 The question remains whether some kind of converse also holds and in Theorem 3 an = sign would be correct instead of the ⊆, or whether CA are more powerful, i.e., a ⊊ sign would be correct. This is not known. The best simulation of CA by TM that is known is the obvious one: states of neighboring cells are stored on adjacent tape squares. For the simulation of one CA step the TM basically makes one sweep across the complete tape segment containing the states of all non-quiescent cells, updating them one after the other. As a consequence one gets

Theorem 5 For all space bounds s(n) ≥ n and all time bounds t(n) ≥ n holds:

CA-SPC(s)-TIME(t) ⊆ TM-SPC(s)-TIME(O(s·t)) ⊆ TM-SPC(s)-TIME(O(t^2)).

The construction proving the first inclusion needs only a one-head TM, and no possibility is known to take advantage of more heads. The second inclusion follows from the observation that in order to use an initially blank tape square, a TM must move one of its heads there, which requires time. Thus s ∈ O(t). Taking Theorems 3 and 5 together, one immediately gets

Corollary 6 Cellular automata are in the first machine class.

And it is not known whether CA are in the second machine class. In this regard they are "more like" sequential models. The reason for this is the fact that the number of active processing units only grows polynomially with the number of steps in a computation. In Sect. "Communication in CA" variations of the standard CA model will be considered, where this is different.

Measuring and Controlling the Activities

Parallel Turing Machines

One possible way to make a parallel model from Turing machines is to allow several control units (CU), but with all of them working on the same tape (or tapes). This model can be traced back at least to a paper by Hemmerling [9], who called it systems of Turing automata. A few years later Wiedermann [32] coined the term parallel Turing machine (PTM).


We consider only the case where there is only one tape and each of the control units has only one head on that tape. As for sequential TM, we usually drop the prefix 1T^1 for PTM, too. Readers interested in the case of PTM with multi-head CUs are referred to [33].

PTM with One-Head Control Units

The specification of a PTM includes a tape alphabet B with a blank symbol □ and a set S of possible states for each CU. A PTM starts with one CU on the first input symbol, as does a sequential 1T^1-TM. During the computation the number of control units may increase and decrease, but all CUs always work cooperatively on one common tape. The idea is to have the CUs act independently unless they are "close" to each other, retaining the idea of only local interactions, as in CA. A configuration of a PTM is a pair c = (p, b). The mapping b : ℤ → B describes the contents of the tape. Let 2^S denote the power set of S. The mapping p : ℤ → 2^S describes for each tape square i the set of states of the finite automata currently visiting it. In particular, this formalization means that it is not possible to distinguish two automata on the same square and in the same state: the idea is that because of this they will always behave identically and hence need not be distinguished. The mode of operation of a PTM is determined by the transition function f : 2^S × B → 2^{S×D} × B, where D is the set {−1, 0, 1} of possible movements of a control unit. In order to compute the successor configuration c′ = (p′, b′) of a configuration c = (p, b), f is simultaneously computed for all tape positions i ∈ ℤ. The arguments used are the set of states of the finite automata currently visiting square i and its tape symbol. Let (M′_i, b′_i) = f(p(i), b(i)). Then the new symbol on square i in configuration c′ is b′(i) = b′_i. The set of finite automata on square i is replaced by a new set of finite automata (defined by M′_i ⊆ S × D), each of which changes the tape square according to the indicated direction of movement. Therefore p′(i) = {q | (q, 1) ∈ M′_{i−1} ∨ (q, 0) ∈ M′_i ∨ (q, −1) ∈ M′_{i+1}}. Thus f induces a global transition function F mapping global configurations to global configurations. In order to make the model useful (and to come up to some intuitive expectations) it is required that CUs cannot arise "out of nothing" and that the symbol on a tape square can change only if it is visited by at least one CU. In other words we require that ∀ b ∈ B : f(∅, b) = (∅, b). Observe that the number of finite automata on the tape may change during a computation. Automata may vanish, for example if f({s}, b) = (∅, b), and new automata may be generated, for example if f({s}, b) = ({(q, 1), (q′, 0)}, b).
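The successor-configuration computation just defined (the global transition function F) can be sketched directly. The sparse dictionary representation and all helper names below are assumptions of this illustration: f(states, symbol) must return a set of (state, direction) pairs together with the new symbol, with f(∅, b) = (∅, b) as required above.

```python
# One global step of a PTM per the definitions above.
# p maps tape squares to frozensets of CU states; b maps squares to
# non-blank symbols; squares absent from the dicts are empty/blank.

BLANK = '#'   # stands for the blank symbol of the tape alphabet

def global_step(p, b, f):
    """Compute the successor configuration (p', b') of (p, b) under f."""
    # f only needs to be evaluated where something could change
    squares = {j for i in p for j in (i - 1, i, i + 1)} | set(b)
    moves, b_new = {}, {}
    for i in squares:
        m, sym = f(p.get(i, frozenset()), b.get(i, BLANK))
        moves[i] = m
        if sym != BLANK:
            b_new[i] = sym
    p_new = {}
    for i in squares | {j for i in squares for j in (i - 1, i + 1)}:
        # q arrives at square i from i-1 (moving +1), i (staying), i+1 (moving -1)
        arrived = {q for (q, d) in moves.get(i - 1, ()) if d == +1}
        arrived |= {q for (q, d) in moves.get(i, ()) if d == 0}
        arrived |= {q for (q, d) in moves.get(i + 1, ()) if d == -1}
        if arrived:
            p_new[i] = frozenset(arrived)
    return p_new, b_new
```

Iterating global_step until the configuration repeats yields the stability condition F((p, b)) = (p, b) used for acceptance below.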

For the recognition of formal languages we define the initial configuration c_w for an input word w ∈ A^+ as the one in which w is written on the otherwise blank tape on squares 1, 2, …, |w|, and in which there exists exactly one finite automaton, in an initial state q_0, on square 1. A configuration (p, b) of a PTM is called accepting iff it is stable (i.e., F((p, b)) = (p, b)) and p(1) ⊆ F_+. The language L(P) recognized by a PTM P is the set of input words for which it reaches an accepting configuration.

Complexity Measures for PTM

Time complexity of a PTM can be defined in the obvious way. For space complexity, one counts the total number of tape squares which are used in at least one configuration. Here we call a tape square i unused in a configuration c = (p, b) if p(i) = ∅ and b(i) = □; otherwise it is used. What makes PTM interesting is the definition of their processor complexity. Let proc′(w) denote the maximum number of CUs which exist simultaneously in a configuration occurring during the computation for input w, and define proc : ℕ^+ → ℕ^+ : n ↦ max{proc′(w) | w ∈ A^n}. For complexity classes we use the notation PTM-SPC(s)-TIME(t)-PROC(p), etc. The processor complexity is one way to measure (an upper bound on) "how many activities" happen simultaneously. It should be clear that at the lower end one has the case of constant proc(n) = 1, which means that the PTM is in fact (equivalent to) a sequential TM. The other extreme is to have CUs "everywhere". In that case proc(n) ∈ Θ(space(n)), and one basically has a CA. In other words, processor complexity measures the amount of parallelism of a PTM.
Theorem 7 For all space bounds s and time bounds t:

PTM-SPC(s)-TIME(t)-PROC(1) = TM-SPC(s)-TIME(t)

PTM-SPC(s)-TIME(t)-PROC(s) = CA-SPC(s)-TIME(O(t)).

Under additional constructibility conditions it is even possible to get a generalization of Theorem 5:

Theorem 8 For all functions s(n) ≥ n, t(n) ≥ n, and h(n) ≥ 1, where h is fully PTM processor constructable in space s, time t, and with h processors, holds:

CA-SPC(O(s))-TIME(O(t)) ⊆ PTM-SPC(O(s))-TIME(O(st/h))-PROC(O(h)).

Decreasing the processor complexity indeed leads to the expected slowdown.


Relations Between PTM Complexity Classes (part 1)

The interesting question now is whether different upper bounds on the processor complexity result in different computational power. In general that is not the case, as PTM with only one CU are TM and hence computationally universal. (As a side remark we note that therefore processor complexity cannot be a Blum measure. In fact that should be more or less clear since, e.g., deciding whether a second CU will ever be generated might require finding out whether the first CU, i.e., a TM, ever reaches a specific state.) In this first part we consider the case where two complexity measures are allowed to grow in order to get a hierarchy. Results which only need one growing measure are the topic of the second part. First of all it turns out that for fixed processor complexity between log n and s(n) there is a space/time hierarchy:

Theorem 9 Let s and t be two functions such that s is fully PTM space constructable in time t and t is PTM computable in space s and time t, and let h ≥ log. Then:

⋃_{φ ∉ O(1)} PTM-SPC(Θ(s/φ))-TIME(Θ(t/φ))-PROC(O(h)) ⊊ PTM-SPC(O(s))-TIME(O(t))-PROC(O(h)).

The proof of this theorem applies the usual idea of diagonalization. Technical details can be found in [34]. Instead of keeping processor complexity fixed and letting space complexity grow, one can also do the opposite. As for analogous results for TM, one needs the additional restriction to one fixed tape alphabet. One gets the following result, where the complexity classes carry the additional information about the size of the tape alphabet.

Theorem 10 Let s, t and h be three functions such that s is fully PTM space constructable in time t and with h processors, and such that t and h are PTM computable in space s and time t and with h processors such that in all cases the tape is not written. Let b ≥ 2 be the size of the tape alphabet. Then:

⋃_{φ ∉ O(1)} PTM-SPC(s)-TIME(Θ(t/φ))-PROC(h/φ)-ALPH(b) ⊊ PTM-SPC(s)-TIME(Θ(st))-PROC(Θ(h))-ALPH(b).

Again we do not go into the details of the constructibility definitions, which can be found in [34]. The important point here is that one can prove that increasing time

and processor complexity by a non-constant factor does increase the (language recognition) capabilities of PTM, even if the space complexity is fixed, provided that one does not allow any changes to the tape alphabet. In particular the theorem holds for the case space(n) = n. It is now interesting to reconsider Open Problem 2. Let's assume that

CA-SPC(n)-TIME(n) = CA-SPC(n)-TIME(2^{O(n)}).

One may choose φ(n) = log n, t(n) = 2^{n/log n} and h(n) = n in Theorem 10. Using that together with Theorem 7, the assumption would give rise to

PTM-SPC(n)-TIME(2^{n/log n}/log n)-PROC(n/log n)-ALPH(b)
⊊ PTM-SPC(n)-TIME(n·2^{n/log n})-PROC(n)-ALPH(b)
= PTM-SPC(n)-TIME(n)-PROC(n)-ALPH(b).

If the polynomial time hierarchy for n-space bounded CA collapses, then there are languages which cannot be recognized by PTM in almost exponential time with n/log n processors but which can be recognized by PTM with n processors in linear time, if the tape alphabet is fixed.

Relations Between PTM Complexity Classes (part 2)

One can get rid of the fixed alphabet condition by using a combinatorial argument for a specific formal language (instead of diagonalization) and even have not only the space but also the processor complexity fixed and still get a time hierarchy. The price to pay is that the range of time bounds is more restricted than in Theorem 10. Consider the formal language

L_vv = { v c^{|v|} v | v ∈ {a, b}^+ }.

It contains all words which can be divided into three segments of equal length such that the first and third are identical. Intuitively, whatever type of machine is used for recognition, it is unavoidable to "move" the complete information from one end to the other. L_vv shares this feature with L_pal. Using a counting argument inspired by Hennie's concept of crossing sequences [10] applied to L_pal, one can show:

Lemma 11 ([34]) If P is a PTM recognizing L_vv, then time^2_P · proc_P ∈ Ω(n^3/log^2 n).


On the other hand, one can construct a PTM recognizing L_vv with processor complexity n^a for sufficiently nice a:

Lemma 12 ([34]) For each a ∈ ℚ with 0 < a < 1 holds:

L_vv ∈ PTM-SPC(n)-TIME(Θ(n^{2−a}))-PROC(Θ(n^a)).

Putting these lemmas together yields another hierarchy theorem:

Theorem 13 For rational numbers 0 < a < 1 and 0 < ε < 3/2 − a/2 holds:

PTM-SPC(n)-TIME(Θ(n^{3/2−a/2−ε}))-PROC(Θ(n^a)) ⊊ PTM-SPC(n)-TIME(Θ(n^{2−a}))-PROC(Θ(n^a)).

Hence, for a close to 1 a "small" increase in time by some n^ε suffices to increase the recognition power of PTM, while the processor complexity is fixed at n^a and the space complexity is fixed at n as well.

Open Problem 14 For the recognition of L_vv there is a gap between the lower bound of time^2_P · proc_P ∈ Ω(n^3/log^2 n) in Lemma 11 and the upper bound of time^2_P · proc_P ∈ O(n^{4−a}) in Lemma 12. It is not known whether the upper or the lower bound or both can be improved. An even more difficult problem is to prove a similar result for the case a = 1, i.e., cellular automata, as mentioned in Open Problem 2.

State Change Complexity

In CMOS technology, what costs most of the energy is a proper state change, from zero to one or from one to zero. Motivated by this fact, Vollmar [30] introduced the state change complexity for CA. There are two variants based on the same idea: Given a halting CA computation for an input w and a cell i, one can count the number of time points t, 1 ≤ t ≤ time′(w), such that cell i is in different states at times t − 1 and t. Denote that number by change′(w, i). Define

maxchg′(w) = max_{i ∈ G} change′(w, i)  and  sumchg′(w) = Σ_{i ∈ G} change′(w, i)

and

maxchg(n) = max{maxchg′(w) | w ∈ A^n}
sumchg(n) = max{sumchg′(w) | w ∈ A^n}.

For the language L_vv, which already played a role in the previous subsection, one can show:

Lemma 15 Let f(n) be a non-decreasing function which is not in O(log n), i.e., lim_{n→∞} log n / f(n) = 0. Then any CA C recognizing L_vv makes a total of at least Ω(n^2/f(n)) state changes in the segment containing the n input cells and the n cells to the left and to the right of them. In particular, if time_C ∈ Θ(n), then sumchg_C ∈ Ω(n^2/f(n)). Furthermore maxchg_C ∈ Ω(n/f(n)).

In the paper by Sanders et al. [24] a generalization of this lemma to d-dimensional CA is proved.

Open Problem 16 While the processor complexity of PTM measures how many activities happen simultaneously "across space", state change complexity measures how many activities happen over time. For both cases we have made use of the same formal language in proofs. That might be an indication that there are connections between the two complexity measures. But no non-trivial results are known until now.

Asynchronous CA

Until now we have only considered one global mode of operation: the so-called synchronous case, where in each global step of the CA all cells must update their states synchronously. Several models have been considered where this requirement has been relaxed. Generally speaking, asynchronous CA are characterized by the fact that in one global step of the CA some cells are active and do update their states (all according to the same local transition function) while others do nothing, i.e., remain in the same state as before. There are then different approaches to specifying restrictions on which cells may be active or not.

Asynchronous update mode. The simplest possibility is to not quantify anything and to say that a configuration c′ is a legal successor of configuration c, denoted c ⊢ c′, iff for all i ∈ G one has c′(i) = c(i) or c′(i) = f(c_{i+N}).

Unordered sequential update mode. In this special case it is required that there is only one active cell in each global step, i.e., card({i | c′(i) ≠ c(i)}) ≤ 1.

Since CA with an asynchronous update mode are no longer deterministic from a global (configuration) point of view, it is not completely clear how to define, e.g., formal language recognition and time complexity. Of course one could follow the way it is done for nondeterministic TM. To the best of our knowledge this has not been considered for asynchronous CA. (There are results for general nondeterministic CA; see for example Cellular Automata and Language Theory.)

It should be noted that Nakamura [20] has provided a very elegant construction for simulating a CA C_s with synchronous update mode on a CA C_a with one of the above asynchronous update modes. Each cell stores the "current" and the "previous" state of a C_s-cell before its last activation and a counter value T modulo 3 (Q_a = Q_s × Q_s × {0, 1, 2}). The local transition function f_a is defined in such a way that an activated cell does the following:

- T always indicates how often a cell has already been updated, modulo 3.
- If the counters of all neighbors have value T or T + 1, the current C_s-state of the cell is remembered as previous state and a new current state is computed according to f_s from the current and previous C_s-states of the neighbors; the selection between current and previous state depends on the counter value of that cell. In this case the counter is incremented.
- If the counter of at least one neighboring cell is at T − 1, the activated cell keeps its complete state as it is.

Therefore, if one does want to gain something using asynchronous CA, their local transition functions would have to be designed for that specific usage. Recently, interest has increased considerably in CA where the "degree of (a-)synchrony" is quantified via probabilities. In these cases one considers CA with only a finite number of cells.

Probabilistic update mode. Let 0 ≤ α ≤ 1 be a probability. In probabilistic update mode each legal global step c ⊢ c′ of the CA is assigned a probability by requiring that each cell i independently has probability α of updating its state.

Random sequential update mode. This is the case when in each global step one of the cells in G is chosen with uniform probability and its state updated, while all others do not change their state. CA operating in this mode are called fully asynchronous by some authors.
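Nakamura's construction described above can be sketched in a few lines. The tuple representation and the ring topology are assumptions of this illustration: each cell of C_a stores the current and previous C_s-state together with its counter T, and an activated cell updates only when no neighbor lags behind.

```python
# Sketch of Nakamura's simulation of a synchronous rule f_s on an
# asynchronously updated CA (names and data layout are my own choices).

def activate(cells, i, f_s):
    """Asynchronously activate cell i on a ring of cells (in place).

    Each entry of cells is a triple (cur, prev, T): the current and the
    previous synchronous state plus the update counter modulo 3.
    """
    n = len(cells)
    cur, prev, T = cells[i]
    neighbors = (cells[(i - 1) % n], cells[(i + 1) % n])
    if any(t == (T - 1) % 3 for (_, _, t) in neighbors):
        return                      # some neighbor lags behind: keep state

    def state_at_T(cell):           # neighbor's C_s-state at synchronous time T
        c, p, t = cell
        return c if t == T else p   # t == (T+1) % 3: use its remembered state

    new = f_s(state_at_T(neighbors[0]), cur, state_at_T(neighbors[1]))
    cells[i] = (new, cur, (T + 1) % 3)
```

Activating the cells in any order, each infinitely often, makes the cur components track the synchronous trajectory of C_s, mirroring Nakamura's correctness argument.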
These models can be considered special cases of what is usually called probabilistic or stochastic CA. For these CA the local transition function is no longer a map from Q^N to Q, but from Q^N to [0, 1]^Q. For each ℓ ∈ Q^N the value f(ℓ) is a probability distribution for the next state (satisfying Σ_{q ∈ Q} f(ℓ)(q) = 1). There are only very few papers about formal language recognition with probabilistic CA; see [18].
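A local function Q^N → [0, 1]^Q can be rendered directly as a function returning a dictionary of probabilities; the sampler and the two-state rule below are invented for this illustration.

```python
import random

# A stochastic local rule maps each neighborhood to a distribution over Q.
# Example rule (invented): a cell copies its right neighbor with
# probability 0.9 and takes the flipped value with probability 0.1.

Q = (0, 1)

def noisy_shift(l, m, r):
    return {r: 0.9, 1 - r: 0.1}

def step(config, rule, rng):
    """One synchronous step of a finite probabilistic CA on a ring."""
    n = len(config)
    new = []
    for i in range(n):
        dist = rule(config[(i - 1) % n], config[i], config[(i + 1) % n])
        assert abs(sum(dist.values()) - 1.0) < 1e-9  # must be a distribution
        states = list(dist)
        new.append(rng.choices(states, [dist[s] for s in states])[0])
    return new
```

Setting all probability mass on one state recovers an ordinary deterministic CA as a degenerate case.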

On the other hand, probabilistic update modes have received some attention recently. See for example [22] and the references therein. Development of this area is still at its beginning. Until now, mainly specific local rules have been investigated; for an exception see [7].

Communication in CA

Until now we have considered only one-dimensional Euclidean CA, where one bit of information can reach O(t) cells in t steps. In this section we will have a look at a few possibilities for changing the way cells communicate in a CA. First we have a quick look at CA where the underlying grid is ℤ^d. The topic of the second subsection is CA where the cells are connected to form a tree.

Different Dimensionality

In ℤ^d-CA with, e.g., von Neumann neighborhood of radius 1, a cell has the potential to influence O(t^d) cells in t steps. This is a polynomial number of cells. It comes as no surprise that ℤ^d-CA-SPC(s)-TIME(t) ⊆ TM-SPC(Pol(s))-TIME(Pol(t)) and hence ℤ^d-CA are in the first machine class. One might only wonder why for the TM a space bound of Θ(s) might not be sufficient. This is due to the fact that the shape of the cells actually used by the CA might have an "irregular" structure, so that the TM has to perform some bookkeeping or simulate a whole (hyper-)rectangle of cells encompassing all that are really used by the CA. Trivially, a d-dimensional CA can be simulated on a d′-dimensional CA, where d′ > d. The question is how much one loses when decreasing the dimensionality. The currently best known result in this direction is by Scheben [26]:

Theorem 17 It is possible to simulate a d′-dimensional CA with running time t on a d-dimensional CA, d < d′, with running time and space O(t^{2⌈d′/d⌉}).

It should be noted that the above result is not directly about language recognition; the redistribution of input symbols needed for the simulation is not taken into account. Readers interested in that as well are referred to [1].

Open Problem 18 Try to find simulations of lower-dimensional on higher-dimensional CA which somehow make use of the "higher connectivity" between cells. It is probably much too difficult, or even impossible, to hope for general speedups. But efficient use of space (small hypercubes) for computations without losing time might be achievable.

Tree CA and Hyperbolic CA

Starting from the root of a full binary tree one can reach an exponential number 2^t of nodes in t steps. If there are some computing capabilities related to the nodes, there is at least the possibility that such a device might exhibit some kind of strong parallelism. One of the earliest papers in this respect is Wiedermann's article [31] (unfortunately only available in Slovak). The model introduced there would now be called a parallel Turing machine where a tape is not a linear array of cells, but where the cells are connected in such a way as to form a tree. A proof is sketched showing that these devices can simulate PRAMs in linear time (assuming the so-called logarithmic cost model). PRAMs are in the second machine class. So, indeed, in some sense trees are powerful. Below we first quickly introduce a PSPACE-complete problem which is a useful tool for proving the power of computational models involving trees. A few examples of such models are considered afterwards.

Quantified Boolean Formula

The instances of the problem Quantified Boolean Formula (QBF, sometimes also called QSAT) have the structure

Q_1 x_1 Q_2 x_2 ⋯ Q_k x_k : F(x_1, …, x_k).

Here F(x_1, …, x_k) is a Boolean formula with variables x_1, …, x_k and connectives ∧, ∨ and ¬. Each Q_j is one of the quantifiers ∀ or ∃. The problem is to decide whether the formula is true under the obvious interpretation. This problem is known to be complete for PSPACE. All known TM, i.e., all deterministic sequential algorithms, for solving QBF require exponential time. Thus a proof that QBF can be solved by some model M in polynomial time (usually) implies that all problems in PSPACE can be solved by M in polynomial time.
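That QBF lies in PSPACE can be seen from a depth-first evaluation which stores only one partial assignment at a time; a minimal sketch, where the representation of F as a Python callable is an assumption of this illustration:

```python
# Depth-first evaluation of Q1 x1 ... Qk xk : F(x1, ..., xk).
# quantifiers is a string over {'A', 'E'} ('A' = universal quantifier,
# 'E' = existential); F is a callable on a tuple of k Booleans.

def eval_qbf(quantifiers, F, assignment=()):
    if not quantifiers:
        return F(assignment)
    q, rest = quantifiers[0], quantifiers[1:]
    branches = (eval_qbf(rest, F, assignment + (v,)) for v in (False, True))
    return all(branches) if q == 'A' else any(branches)
```

The recursion depth is k and only one partial assignment is kept, so the space used is polynomial, although the running time is 2^k in the worst case.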
Often this can be paired with the "opposite" result that problems that can be solved in polynomial time on M are in PSPACE, and hence TM-PSPACE = M-P. This, of course, is not the case only for models "with trees"; see [6] for many alternatives.

Tree CA

A tree CA (TCA for short) working on a full d-ary tree can be defined as follows: There is a set of states Q. For the root there is a local transition function f_0 : (A ∪ {□}) × Q × Q^d → Q, which uses an input symbol (if available), the root cell's own state and those of the d child nodes to compute the next state of the node. And there are d local transition functions f_i : Q × Q × Q^d → Q, where 1 ≤ i ≤ d. The ith child of a node uses f_i to compute its new state depending on the state of its parent node, its own state and the states of its d child nodes. For language recognition, input is provided sequentially to the root node during the first n steps and a blank symbol □ afterwards. A word is accepted if the root node enters an accepting state from a designated subset F_+ ⊆ Q. Mycielski and Niwiński [19] were the first to realize that sequential polynomial reductions can be carried out by TCA and that QBF can be recognized by tree CA as well: A formula to be checked, with k variables, is copied and distributed to 2^k "evaluation cells". The sequences of left/right choices on the paths to them determine a valuation of the variables with zeros and ones. Each evaluation cell uses the subtree below it to evaluate F(x_1, …, x_k) accordingly. The results are propagated up to the root. Each cell in level i below the root, 1 ≤ i ≤ k, above an evaluation cell combines the results using ∨ or ∧, depending on whether the ith quantifier of the formula was ∃ or ∀. On the other hand, it is routine work to prove that the result of a TCA running in polynomial time can be computed sequentially in polynomial space by a depth-first procedure. Hence one gets:

Theorem 19 TCA-TIME(Pol(n)) = PSPACE

Thus tree cellular automata are in the second machine class.

Hyperbolic CA

Two-dimensional CA as defined in Sect. "Introduction" can be considered as arising from the tessellation of the Euclidean plane ℤ^2 with squares. Therefore, more generally, CA on a grid G = ℤ^d are sometimes called Euclidean CA. Analogously, some hyperbolic CA arise from tessellations of the hyperbolic plane with some regular polygon. They are covered in depth in a separate article (Cellular Automata in Hyperbolic Spaces).
Here we just consider one special case: The two-dimensional hyperbolic plane can be tiled with copies of the regular 6-gon with six right angles. If one considers only one quarter and draws a graph with the tiles as nodes and links between those nodes that share a common tile edge, one gets the graph depicted in Fig. 3. Basically it is a tree with two types of nodes, black and white ones, and some "additional" edges depicted as dotted lines. The root is a white node. The first child of each node is black. All other children are white; a black node has 2 white children, a white node has 3 white children.
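Under this reading of the child rules (every node has one black first child; a black node has 2 further white children, a white node 3), the number of cells per level grows exponentially, which is what gives hyperbolic CA exponentially many cells within distance n of the root. A small sketch (our own encoding, an assumption about the intended tree):

```python
def hyperbolic_level_sizes(n):
    """Cells per level of the tree in Fig. 3: each node has one black
    first child; in addition a black node has 2 white children and a
    white node has 3 white children."""
    black, white = 0, 1            # level 0: the white root
    sizes = [black + white]
    for _ in range(n):
        black, white = black + white, 2 * black + 3 * white
        sizes.append(black + white)
    return sizes

print(hyperbolic_level_sizes(6))  # [1, 4, 15, 56, 209, 780, 2911]
```

The level sizes satisfy a(n) = 4a(n-1) - a(n-2) and grow like (2 + √3)^n, in contrast to the linear growth of levels in the Euclidean plane.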


Cellular Automata as Models of Parallel Computation

Cellular Automata as Models of Parallel Computation, Figure 3 The first levels of a tree of cells resulting from a tiling of the hyperbolic plane with 6-gons

For hyperbolic CA (HCA) one uses a formalism analogous to that described for tree CA. As one can see, HCA are basically trees with some additional edges. It is therefore not surprising that they can accept the languages from PSPACE in polynomial time. The converse inclusion is also proved similarly to the tree case. This gives:

Theorem 20 HCA-TIME(Pol(n)) = PSPACE

It is also interesting to have a look at the analogs of P, PSPACE, and so on for hyperbolic CA. Somewhat surprisingly Iwamoto et al. [11] have shown:

Theorem 21 HCA-TIME(Pol(n)) = HCA-SPC(Pol(n)) = NHCA-TIME(Pol(n)) = NHCA-SPC(Pol(n)), where NHCA denotes nondeterministic hyperbolic CA. The analogous equalities hold for exponential time and space.

Outlook

There is yet another possibility for bringing trees into play: trees of configurations. The concept of alternation [4] can be carried over to cellular automata. Since there are several active computational units, the definitions are a little bit more involved and it turns out that one has several possibilities which also result in models with slightly different properties. But in all cases one gets models from the second machine class. For results, readers are referred to [13] and [23]. On the other end there is some research on what happens if one restricts the possibilities for communication between neighboring cells: Instead of getting information about the complete states of the neighbors, in the extreme case only one bit can be exchanged. See for example [15].

Future Directions

At several points in this paper we have pointed out open problems which deserve further investigation. Here, we want to stress three areas which we consider particularly interesting in the area of "CA as a parallel model".

Proper Inclusions and Denser Hierarchies

It has been pointed out several times that inclusions of complexity classes are not known to be proper, or that gaps between resource bounds still need to be "large" in order to prove that the inclusion of the related classes is a proper one. For the foreseeable future this remains a wide area for further research. Most probably new techniques will have to be developed to make significant progress.

Activities

The motivation for considering state change complexity was the energy consumption of CMOS hardware. It is known that irreversible physical computational processes must consume energy. This seems not to be the case for reversible ones. Therefore reversible CA are also interesting in this respect. The definition of reversible CA and results for them are the topic of the article by Morita (Reversible Cellular Automata), also in this encyclopedia. Surprisingly, all currently known simulations of irreversible CA on reversible ones (this is possible) exhibit a large state change complexity. This deserves further investigation. Also the examination of CA which are "reversible on the computational core" has been started only recently [16]. There are first surprising results; the impacts on computational complexity are unforeseeable.

Asynchronicity and Randomization

Randomization is an important topic in sequential computing. It is high time that this is also investigated in much


more depth for cellular automata. The same holds for cellular automata where not all cells are updating their states synchronously. These areas promise a wealth of new insights into the essence of fine-grained parallel systems.

Bibliography

1. Achilles AC, Kutrib M, Worsch T (1996) On relations between arrays of processing elements of different dimensionality. In: Vollmar R, Erhard W, Jossifov V (eds) Proceedings Parcella '96, no. 96 in Mathematical Research. Akademie, Berlin, pp 13–20
2. Blum M (1967) A machine-independent theory of the complexity of recursive functions. J ACM 14:322–336
3. Buchholz T, Kutrib M (1998) On time computability of functions in one-way cellular automata. Acta Inf 35(4):329–352
4. Chandra AK, Kozen DC, Stockmeyer LJ (1981) Alternation. J ACM 28(1):114–133
5. Delorme M, Mazoyer J, Tougne L (1999) Discrete parabolas and circles on 2D cellular automata. Theor Comput Sci 218(2):347–417
6. van Emde Boas P (1990) Machine models and simulations. In: van Leeuwen J (ed) Handbook of Theoretical Computer Science, vol A. Elsevier Science Publishers and MIT Press, Amsterdam, chap 1, pp 1–66
7. Fatès N, Thierry É, Morvan M, Schabanel N (2006) Fully asynchronous behavior of double-quiescent elementary cellular automata. Theor Comput Sci 362:1–16
8. Garzon M (1991) Models of Massive Parallelism. Texts in Theoretical Computer Science. Springer, Berlin
9. Hemmerling A (1979) Concentration of multidimensional tape-bounded systems of Turing automata and cellular spaces. In: Budach L (ed) International Conference on Fundamentals of Computation Theory (FCT '79). Akademie, Berlin, pp 167–174
10. Hennie FC (1965) One-tape, off-line Turing machine computations. Inf Control 8(6):553–578
11. Iwamoto C, Margenstern M (2004) Time and space complexity classes of hyperbolic cellular automata. IEICE Trans Inf Syst E87-D(3):700–707
12. Iwamoto C, Hatsuyama T, Morita K, Imai K (2002) Constructible functions in cellular automata and their applications to hierarchy results. Theor Comput Sci 270(1–2):797–809
13. Iwamoto C, Tateishi K, Morita K, Imai K (2003) Simulations between multi-dimensional deterministic and alternating cellular automata. Fundamenta Informaticae 58(3/4):261–271
14. Kutrib M (2008) Efficient pushdown cellular automata: Universality, time and space hierarchies. J Cell Autom 3(2):93–114
15. Kutrib M, Malcher A (2006) Fast cellular automata with restricted inter-cell communication: Computational capacity. In: Navarro YKG, Bertossi L (eds) Proceedings Theor Comput Sci (IFIP TCS 2006), pp 151–164
16. Kutrib M, Malcher A (2007) Real-time reversible iterative arrays. In: Csuhaj-Varjú E, Ésik Z (eds) Fundamentals of Computation Theory 2007. LNCS, vol 4639. Springer, Berlin, pp 376–387
17. Mazoyer J, Terrier V (1999) Signals in one-dimensional cellular automata. Theor Comput Sci 217(1):53–80
18. Merkle D, Worsch T (2002) Formal language recognition by stochastic cellular automata. Fundamenta Informaticae 52(1–3):181–199
19. Mycielski J, Niwiński D (1991) Cellular automata on trees, a model for parallel computation. Fundamenta Informaticae XV:139–144
20. Nakamura K (1981) Synchronous to asynchronous transformation of polyautomata. J Comput Syst Sci 23:22–37
21. von Neumann J (1966) Theory of Self-Reproducing Automata. University of Illinois Press, Champaign. Edited and completed by Arthur W. Burks
22. Regnault D, Schabanel N, Thierry É (2007) Progress in the analysis of stochastic 2d cellular automata: a study of asynchronous 2d minority. In: Csuhaj-Varjú E, Ésik Z (eds) Fundamentals of Computation Theory 2007. LNCS, vol 4639. Springer, Berlin, pp 376–387
23. Reischle F, Worsch T (1998) Simulations between alternating CA, alternating TM and circuit families. In: MFCS'98 satellite workshop on cellular automata, pp 105–114
24. Sanders P, Vollmar R, Worsch T (2002) Cellular automata: Energy consumption and physical feasibility. Fundamenta Informaticae 52(1–3):233–248
25. Savitch WJ (1978) Parallel and nondeterministic time complexity classes. In: Proc. 5th ICALP, pp 411–424
26. Scheben C (2006) Simulation of d′-dimensional cellular automata on d-dimensional cellular automata. In: El Yacoubi S, Chopard B, Bandini S (eds) Proceedings ACRI 2006. LNCS, vol 4173. Springer, Berlin, pp 131–140
27. Smith AR (1971) Simple computation-universal cellular spaces. J ACM 18(3):339–353
28. Stoß HJ (1970) k-Band-Simulation von k-Kopf-Turing-Maschinen. Computing 6:309–317
29. Stratmann M, Worsch T (2002) Leader election in d-dimensional CA in time diam · log(diam). Future Gener Comput Syst 18(7):939–950
30. Vollmar R (1982) Some remarks about the 'efficiency' of polyautomata. Int J Theor Phys 21:1007–1015
31. Wiedermann J (1983) Paralelný Turingov stroj – Model distribuovaného počítača. In: Gruska J (ed) Distribuované a paralelné systémy. CRC, Bratislava, pp 205–214
32. Wiedermann J (1984) Parallel Turing machines. Tech. Rep. RUU-CS-84-11. University Utrecht, Utrecht
33. Worsch T (1997) On parallel Turing machines with multi-head control units. Parallel Comput 23(11):1683–1697
34. Worsch T (1999) Parallel Turing machines with one-head control units and cellular automata. Theor Comput Sci 217(1):3–30


Cellular Automata, Classification of

Cellular Automata, Classification of
KLAUS SUTNER
Carnegie Mellon University, Pittsburgh, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Reversibility and Surjectivity
Definability and Computability
Computational Equivalence
Conclusion
Bibliography

Glossary

Cellular automaton For our purposes, a (one-dimensional) cellular automaton (CA) is given by a local map ρ : Σ^w → Σ where Σ is the underlying alphabet of the automaton and w is its width. As a data structure, suitable as input to a decision algorithm, a CA can thus be specified by a simple lookup table. We abuse notation and write ρ(x) for the result of applying the global map of the CA to a configuration x ∈ Σ^Z.

Wolfram classes Wolfram proposed a heuristic classification of cellular automata based on observations of typical behaviors. The classification comprises four classes: evolution leads to trivial configurations, evolution leads to periodic configurations, evolution is chaotic, or evolution leads to complicated, persistent structures.

Undecidability It was recognized by logicians and mathematicians in the first half of the 20th century that there is an abundance of well-defined problems that cannot be solved by means of an algorithm, a mechanical procedure that is guaranteed to terminate after finitely many steps and produce the appropriate answer. The best known example of an undecidable problem is Turing's Halting Problem: there is no algorithm to determine whether a given Turing machine halts when run on an empty tape.

Semi-decidability A problem is said to be semi-decidable or computably enumerable if it admits an algorithm that returns "yes" after finitely many steps if this is indeed the correct answer. Otherwise the algorithm never terminates. The Halting Problem is the standard example of a semi-decidable problem. A problem is decidable if, and only if, the problem itself and its negation are semi-decidable.

Universality A computational device is universal if it is capable of simulating any other computational device. The existence of universal computers was another central insight of the early days of computability theory and is closely related to undecidability.

Reversibility A discrete dynamical system is reversible if the evolution of the system incurs no loss of information: the state at time t can be recovered from the state at time t + 1. For CAs this means that the global map is injective.

Surjectivity The global map of a CA is surjective if every configuration appears as the image of another. By contrast, a configuration that fails to have a predecessor is often referred to as a Garden-of-Eden.

Finite configurations One often considers CA with a special quiescent state: the homogeneous configuration where all cells are in the quiescent state is required to be a fixed point under the global map. Infinite configurations where all but finitely many cells are in the quiescent state are often called finite configurations. This is somewhat of a misnomer; we prefer to speak about configurations with finite support.

Definition of the Subject

Cellular automata display a large variety of behaviors. This was recognized clearly when extensive simulations of cellular automata, and in particular one-dimensional CA, became computationally feasible around 1980. Surprisingly, even when one considers only elementary CA, which are constrained to a binary alphabet and local maps involving only nearest neighbors, complicated behaviors are observed in some cases. In fact, it appears that most behaviors observed in automata with more states and larger neighborhoods already have qualitative analogues in the realm of elementary CA. Careful empirical studies led Wolfram to suggest a phenomenological classification of CA based on the long-term evolution of configurations, see [68,71] and Sect. "Introduction".
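The glossary's point that a CA is just a lookup table can be made concrete; a minimal sketch (Wolfram's rule numbering for elementary CA; the cyclic boundary is our simplification, the article itself works over Σ^Z):

```python
def make_rule(number, width=3):
    """Lookup table of a binary one-dimensional CA: a finite map
    {0,1}^w -> {0,1}. For width 3 this is Wolfram's rule numbering."""
    return {tuple((i >> j) & 1 for j in reversed(range(width))):
            (number >> i) & 1
            for i in range(2 ** width)}

def step(rule, config, width=3):
    """One application of the global map, on a cyclic configuration."""
    n, r = len(config), width // 2
    return [rule[tuple(config[(i + k - r) % n] for k in range(width))]
            for i in range(n)]

rule110 = make_rule(110)
print(step(rule110, [0, 0, 0, 1, 0, 0, 0]))  # [0, 0, 1, 1, 0, 0, 0]
```

Iterating `step` and printing rows is exactly the kind of simulation behind the empirical classifications discussed next.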
While Wolfram's four classes clearly capture some of the behavior of CA, it turns out that any attempt at formalizing this taxonomy meets with considerable difficulties. Even apparently simple questions about the behavior of CA turn out to be algorithmically undecidable, and it is highly challenging to provide a detailed mathematical analysis of these systems.

Introduction

In the early 1980s Wolfram published a collection of 20 open problems in the theory of CA, see [69]. The first problem on his list is "What overall classification of cellular automata behavior can be given?" As Wolfram points out, experimental mathematics provides a first answer to this problem: one performs a large number of explicit simulations and observes the patterns associated with the long term evolution of a configuration, see [67,71]. Wolfram proposed a classification that is based on extensive simulations, in particular of one-dimensional cellular automata, where the evolution of a configuration can be visualized naturally as a two-dimensional image. The classification involves four classes that can be described as follows:

W1: Evolution leads to homogeneous fixed points.
W2: Evolution leads to periodic configurations.
W3: Evolution leads to chaotic, aperiodic patterns.
W4: Evolution produces persistent, complex patterns of localized structures.

Thus, Wolfram's first three classes follow closely concepts from continuous dynamics: fixed point attractors, periodic attractors and strange attractors, respectively. They correspond roughly to systems with zero temporal and spatial entropy, zero temporal entropy but positive spatial entropy, and positive temporal and spatial entropy, respectively. W4 is more difficult to associate with a continuous analogue except to say that transients are typically very long. To understand this class it is preferable to consider CA as models of massively parallel computation rather than as particular discrete dynamical systems. It was conjectured by Wolfram that W4 automata are capable of performing complicated computations and may often be computationally universal. Four examples of elementary CA that are typical of the four classes are shown in Fig. 1. Li and Packard [32,33] proposed a slightly modified version of this hierarchy by refining the lower classes and in particular Wolfram's W2. Much like Wolfram's classification, the Li–Packard classification is concerned with the asymptotic behavior of the automaton, the structure and behavior of the limiting configurations. Here is one version of the Li–Packard classification, see [33].

LP1: Evolution leads to homogeneous fixed points.
LP2: Evolution leads to non-homogeneous fixed points, perhaps up to a shift.
LP3: Evolution leads to ultimately periodic configurations. Regions with periodic behavior are separated by domain walls, possibly up to a shift.
LP4: Configurations produce locally chaotic behavior. Regions with chaotic behavior are separated by domain walls, possibly up to a shift.
LP5: Evolution leads to chaotic patterns that are spatially unbounded.

LP6: Evolution is complex. Transients are long and lead to complicated space-time patterns which may be non-monotonic in their behavior.

By contrast, a classification closer to traditional dynamical systems theory was introduced by Kůrka, see [27,28]. The classification rests on the notions of equicontinuity, sensitivity to initial conditions and expansivity. Suppose x is a point in some metric space and f a map on that space. Then f is equicontinuous at x if

∀ε > 0 ∃δ > 0 ∀y ∈ B_δ(x), n ∈ N : d(f^n(x), f^n(y)) < ε

where d(·,·) denotes the metric. Thus, all points in a sufficiently small neighborhood of x remain close to the iterates of x for the whole orbit. Global equicontinuity is a fairly strong condition; it implies that the limit set of the automaton is reached after finitely many steps. The map is sensitive (to initial conditions) if

∃ε > 0 ∀x, δ > 0 ∃y ∈ B_δ(x), n ∈ N : d(f^n(x), f^n(y)) ≥ ε .

Lastly, the map is positively expansive if

∃ε > 0 ∀x ≠ y ∃n ∈ N : d(f^n(x), f^n(y)) ≥ ε .

Kůrka's classification then takes the following form.

K1: All points are equicontinuous under the global map.
K2: Some but not all points are equicontinuous under the global map.
K3: The global map is sensitive but not positively expansive.
K4: The global map is positively expansive.

This type of classification is perfectly suited to the analysis of uncountable spaces such as the Cantor space {0,1}^N or the full shift space Σ^Z which carry a natural metric structure. For the most part we will not pursue the analysis of CA by topological and measure theoretic means here and refer to Topological Dynamics of Cellular Automata in this volume for a discussion of these methods. See Sect. "Definability and Computability" for the connections between topology and computability.
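A simple numerical parameterization of rule space, Langton's λ (the fraction of neighborhoods mapped to a non-zero state, discussed next), is trivial to compute; a sketch with our own encoding of elementary rules:

```python
from itertools import product

def langton_lambda(local, width=3, alphabet=2):
    """Langton's λ: the fraction of local neighborhoods that are
    mapped to a non-quiescent (non-zero) state."""
    hoods = list(product(range(alphabet), repeat=width))
    return sum(1 for h in hoods if local(h) != 0) / len(hoods)

# Elementary CA 30, neighborhood (a, b, c) encoded as the number 4a+2b+c.
rule30 = lambda h: (30 >> (4 * h[0] + 2 * h[1] + h[2])) & 1
print(langton_lambda(rule30))  # 0.5: four of the eight neighborhoods map to 1
```

Such measures are cheap precisely because they look only at the lookup table, never at orbits; this is also why they carry so little classification power.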
Given the apparent complexity of observable CA behavior, one might suspect that it is difficult to pinpoint the location of an arbitrary given CA in any particular classification scheme with any precision. This is in contrast to simple parameterizations of the space of CA rules, such as Langton's λ parameter, that are inherently easy to


Cellular Automata, Classification of, Figure 1 Typical examples of the behavior described by Wolfram’s classes among elementary cellular automata

compute. Briefly, the λ value of a local map is the fraction of local configurations that map to a non-zero value, see [29,33]. Small λ values result in short transients leading to fixed points or simple periodic configurations. As λ increases the transients grow longer and the orbits become more and more complex until, at last, the dynamics become chaotic. Informally, sweeping the λ value from 0 to 1 will produce CA in W1, then W2, then W4 and lastly in W3. The last transition appears to be associated with a threshold phenomenon. It is unclear what the connection between Langton's λ-value and computational properties of a CA is, see [37,46]. Other numerical measures that appear to be loosely connected to classifications are the mean field parameters of Gutowitz [20,21] and the Z-parameter by Wuensche [72], see also [44]. It seems doubtful that a structured taxonomy along the lines of Wolfram or Li–Packard can be derived from a simple numerical measure such as the λ value alone, or even from a combination of several such values. However, they may be useful as empirical evidence for membership in a particular class. Classification also becomes significantly easier when one restricts one's attention to a limited class of CA such as additive CA, see Additive Cellular Automata. In this

context, additive means that the local rule of the automaton has the form ρ(x) = Σᵢ cᵢ xᵢ where the coefficients as well as the states are modular numbers. A number of properties, starting with injectivity and surjectivity as well as topological properties such as equicontinuity and sensitivity, can be expressed in terms of simple arithmetic conditions on the rule coefficients. For example, equicontinuity is equivalent to all prime divisors of the modulus m dividing all coefficients cᵢ, i > 1, see [35] and the references therein. It is also noteworthy that in the linear case methods tend to carry over to arbitrary dimensions; in general there is a significant step in complexity from dimension one to dimension two. No claim is made that the given classifications are complete; in fact, one should think of them as prototypes rather than definitive taxonomies. For example, one might add the class of nilpotent CA at the bottom. A CA is nilpotent if all configurations evolve to a particular fixed point after finitely many steps. Equivalently, by compactness, there is a bound n such that all configurations evolve to the fixed point in no more than n steps. Likewise, we could add the class of intrinsically universal CA at the top. A CA is intrinsically universal if it is capable of simulating all other CA of the same dimension in some reasonable sense. For


a fairly natural notion of simulation see [45]. At any rate, considerable effort is made in the references to elaborate the characteristics of the various classes. For many concrete CA, visual inspection of the orbits of a suitable sample of configurations readily suggests membership in one of the classes.

Reversibility and Surjectivity

A first tentative step towards the classification of a dynamical system is to determine its reversibility or lack thereof. Thus we are trying to determine whether the evolution of the system is associated with loss of information, or whether it is possible to reconstruct the state of the system at time t from its state at time t + 1. In terms of the global map of the system we have to decide injectivity. Closely related is the question whether the global map is surjective, i.e., whether there is no Garden-of-Eden: every configuration has a predecessor under the global map. As a consequence, the limit set of the automaton is the whole space. It was shown by Hedlund that for CA the two notions are connected: every reversible CA is also surjective, see [24], Reversible Cellular Automata. As a matter of fact, reversibility of the global map of a CA implies openness of the global map, and openness implies surjectivity. The converse implications are both false. By a well-known theorem by Hedlund [24] the global maps of CA are precisely the continuous maps that commute with the shift. It follows from basic topology that the inverse global map of a reversible CA is again the global map of a suitable CA. Hence, the predecessor configuration of a given configuration can be reconstructed by another suitably chosen CA. For results concerning reversibility on the limit set of the automaton see [61]. From the perspective of complexity the key result concerning reversible systems is the work by Lecerf [30] and Bennett [7].
They show that reversible Turing machines can compute any partial recursive function, modulo a minor technical problem: In a reversible Turing machine there is no loss of information; on the other hand, even simple computable functions are clearly irreversible in the sense that, say, the sum of two natural numbers does not determine these numbers uniquely. To address this issue one has to adjust the notion of computability slightly in the context of reversible computation: given a partial recursive function f : N → N, the function f̂(x) = ⟨x, f(x)⟩ can be computed by a reversible Turing machine, where ⟨·,·⟩ is any effective pairing function. If f itself happens to be injective then there is no need for the coding device and f can be computed by a reversible Turing machine directly. For example, we can compute the product of two primes reversibly. Morita demonstrated that the same holds true for one-dimensional cellular automata [38,40,62], Tiling Problem and Undecidability in Cellular Automata: reversibility is no obstruction to computational universality. As a matter of fact, any irreversible cellular automaton can be simulated by a reversible one, at least on configurations with finite support. Thus one should expect reversible CA to exhibit fairly complicated behavior in general. For infinite, one-dimensional CA it was shown by Amoroso and Patt [2] that reversibility is decidable. Moreover, it is decidable if the global map is surjective. An efficient practical algorithm using concepts of automata theory can be found in [55], see also [10,14,23]. The fast algorithm is based on interpreting a one-dimensional CA as a deterministic transducer, see [6,48] for background. The underlying semi-automaton of the transducer is a de Bruijn automaton B whose states are words in Σ^(w−1), where Σ is the alphabet of the CA and w its width. The transitions are given by ax →^c xb where a, b, c ∈ Σ, x ∈ Σ^(w−2) and c = ρ(axb), ρ being the local map of the CA. Since B is strongly connected, the product automaton of B will contain a strongly connected component C that contains the diagonal D, an isomorphic copy of B. The global map of the CA is reversible if, and only if, C = D is the only non-trivial component. It was shown by Hedlund [24] that surjectivity of the global map is equivalent to local injectivity: the restriction of the map to configurations with finite support must be injective. The latter property holds if, and only if, C = D, and is thus easily decidable. Automata theory does not readily generalize to words of dimension higher than one. Indeed, reversibility and surjectivity in dimensions higher than one are undecidable, see [26] and Tiling Problem and Undecidability in Cellular Automata in this volume for the rather intricate argument needed to establish this fact.
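The product-automaton idea can be rendered as a small brute-force test (a sketch of the principle only, not the efficient algorithm of [55]; the encoding is ours): the global map is injective exactly if no pair of distinct de Bruijn states lies on a bi-infinite path along which both components produce the same outputs.

```python
from itertools import product

def is_reversible(local, width=3, alphabet=2):
    """Injectivity test for the global map of a 1D CA via the product
    of the de Bruijn automaton with itself. Nodes are pairs of words
    of length w-1; an edge extends both words by one symbol each while
    producing equal outputs. After trimming nodes without predecessors
    or successors, the map is injective iff only diagonal pairs remain."""
    states = list(product(range(alphabet), repeat=width - 1))
    nodes = set(product(states, repeat=2))
    def successors(u, v):
        return {(u[1:] + (a,), v[1:] + (b,))
                for a in range(alphabet) for b in range(alphabet)
                if local(u + (a,)) == local(v + (b,))}
    edges = {p: successors(*p) for p in nodes}
    while True:  # trim nodes that cannot lie on a bi-infinite path
        has_pred = set().union(*(edges[p] for p in nodes))
        live = {p for p in nodes if p in has_pred and edges[p] & nodes}
        if live == nodes:
            break
        nodes = live
        edges = {p: edges[p] & nodes for p in nodes}
    return all(u == v for u, v in nodes)

rule90 = lambda w: w[0] ^ w[2]   # additive, surjective, but not injective
rule204 = lambda w: w[1]         # the identity map
print(is_reversible(rule90), is_reversible(rule204))  # False True
```

Two distinct configurations with equal images differ in some window, so they induce a bi-infinite path through a non-diagonal pair; conversely any surviving non-diagonal pair yields two such configurations. The pair graph here has |Σ|^(2(w-1)) nodes, which is why the transducer formulation of [55] matters in practice.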
While the structure of reversible one-dimensional CA is well understood, see Tiling Problem and Undecidability in Cellular Automata, [16], and while there is an efficient algorithm to check reversibility, few methods are known that allow for the construction of interesting reversible CA. There is a noteworthy trick due to Fredkin that exploits the reversibility of the Fibonacci equation X_{n+1} = X_n + X_{n−1}. When addition is interpreted as exclusive or, this can be used to construct a second-order CA from any given binary CA; the former can then be recoded as a first-order CA over a 4-letter alphabet. For example, for the open but irreversible elementary CA number 90 we obtain the CA shown in Fig. 2. Another interesting class of reversible one-dimensional CA, the so-called partitioned cellular automata (PCA), is due to Morita and Harao, see [38,39,40]. One


Cellular Automata, Classification of, Figure 2 A reversible automaton obtained by applying Fredkin’s construction to the irreversible elementary CA 77

can think of a PCA as a cellular automaton whose cells are divided into multiple tracks; specifically Morita uses an alphabet of the form Σ = Σ₁ × Σ₂ × Σ₃. The configurations of the automaton can be written as (X, Y, Z) where X ∈ Σ₁^Z, Y ∈ Σ₂^Z and Z ∈ Σ₃^Z. Now consider the shearing map σ defined by σ(X, Y, Z) = (RS(X), Y, LS(Z)) where RS and LS denote the right and left shift, respectively. Given any function f : Σ → Σ we can define a global map f ∘ σ, where f is assumed to be applied point-wise. Since the shearing map is bijective, the CA will be reversible if, and only if, the map f is bijective. It is relatively easy to construct bijections f that cause the CA to perform particular computational tasks, even when a direct construction appears to be entirely intractable.
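Fredkin's second-order construction described above is easy to demonstrate: with X_{n+1} = F(X_n) XOR X_{n-1}, running the same step on the two tracks in swapped order undoes the evolution. A sketch with cyclic boundary and rule 90 as the underlying first-order rule (variable names ours):

```python
def second_order_step(f, past, present):
    """One step of Fredkin's construction X_{n+1} = F(X_n) XOR X_{n-1},
    where F applies the local rule f at every cell of a cyclic config."""
    n = len(present)
    image = [f((present[i - 1], present[i], present[(i + 1) % n]))
             for i in range(n)]
    return present, [image[i] ^ past[i] for i in range(n)]

rule90 = lambda w: w[0] ^ w[2]
past, present = [0] * 8, [0, 0, 0, 1, 0, 0, 0, 0]
p, q = past, present
for _ in range(5):
    p, q = second_order_step(rule90, p, q)
# Swapping the two tracks turns the same step into its own inverse:
b, a = q, p
for _ in range(5):
    b, a = second_order_step(rule90, b, a)
print((a, b) == (past, present))  # True: no information was lost
```

The pair (past, present) of binary tracks is exactly the recoding as a first-order CA over a 4-letter alphabet mentioned in the text.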

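The PCA step, the shearing map σ followed by a cellwise bijection f, can be sketched like so; the toy alphabet Σ₁ = Σ₂ = Σ₃ = {0,1} and the particular bijection are our own choices. Reversibility is immediate because both stages are invertible:

```python
def pca_step(f, config):
    """f ∘ σ on a cyclic configuration of triples: shear the tracks
    (right-shift X, keep Y, left-shift Z), then apply f cellwise."""
    X, Y, Z = zip(*config)
    n = len(config)
    return [f((X[i - 1], Y[i], Z[(i + 1) % n])) for i in range(n)]

def pca_step_inv(f_inv, config):
    """The inverse: undo f cellwise, then undo the shearing."""
    X, Y, Z = zip(*[f_inv(t) for t in config])
    n = len(config)
    return [(X[(i + 1) % n], Y[i], Z[i - 1]) for i in range(n)]

rotate = lambda t: (t[2], t[0], t[1])      # a bijection on {0,1}^3
rotate_inv = lambda t: (t[1], t[2], t[0])
c = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
print(pca_step_inv(rotate_inv, pca_step(rotate, c)) == c)  # True
```

In Morita's constructions the work goes into choosing the bijection f so that the shear-and-permute dynamics carries out a desired computation.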
Definability and Computability

Formalizing Wolfram's Classes

Wolfram's classification is an attempt to categorize the complexity of the CA by studying the patterns observed during the long-term evolution of all configurations. The first two classes are relatively easy to observe, but it is difficult to distinguish between the last two classes. In particular W4 is closely related to the kind of behavior that would be expected in connection with systems that are capable of performing complicated computations, including the ability to perform universal computation; a property that is notoriously difficult to check, see [52]. The focus on the full configuration space rather than a significant subset thereof corresponds to the worst-case approach well-known in complexity theory and is somewhat inferior to an average-case analysis. Indeed, Baldwin and Shelah point out that a product construction can be used to design a CA whose behavior is an amalgamation of the behavior of two given CA, see [3,4]. By combining CA in different classes one obtains striking examples of the weakness of the worst-case approach. A natural example of this mixed type of behavior is elementary CA 184, which displays class II or class III behavior, depending on the initial configuration. Another basic example for this type of behavior is the well-studied elementary CA 30, see Sect. "Conclusion". Still, for many CA a worst-case classification seems to provide useful information about the structural properties of the automaton. The first attempt at formalizing Wolfram's classes was made by Culik and Yu who proposed the following hierarchy, given here in cumulative form, see [11]:

CY1: All configurations evolve to a fixed point.
CY2: All configurations evolve to a periodic configuration.
CY3: The orbits of all configurations are decidable.
CY4: No constraints.

The Culik–Yu classification employs two rather different methods. The first two classes can be defined by a simple formula in a suitable logic, whereas the third (and the fourth in the disjoint version of the hierarchy) relies on notions of computability theory. As a general framework for both approaches we consider discrete dynamical systems, structures of the form A = ⟨C, →⟩ where C ⊆ Σ^Z is the space of configurations of the system and → is the "next configuration" relation on C. We will only consider the deterministic case where for each configuration x there exists precisely one configuration y such that x → y. Hence we are really dealing with algebras with one unary function, but iteration is slightly easier to deal with in the relational setting. The structures most important in this context are the ones arising from a CA.
For any local map we consider the structure A D hC ; i where the next configuration relation is determined by x  (x). Using the standard language of first order logic we can readily express properties of the CA in terms of the system A . For example, the system is reversible, respectively surjective, if the following assertions are valid over A: 8 x; y; z (x  z and y  z implies x D y) ; 8 x 9 y (y  x) : As we have seen, both properties are easily decidable in the one-dimensional case. In fact, one can express the ba-

sic predicate x  y (as well as equality) in terms of finite state machines on infinite words. These machines are defined like ordinary finite state machines but the acceptance condition requires that certain states are reached infinitely and co-infinitely often, see [8,19]. The emptiness problem for these automata is easily decidable using graph theoretic algorithms. Since regular languages on infinite words are closed under union, complementation and projection, much like their finite counterparts, and all the corresponding operations on automata are effective, it follows that one can decide the validity of first order sentences over A such as the two examples above: the model-checking problem for these structures and first order logic is decidable, see [34]. For example, we can decide whether there is a configuration that has a certain number of predecessors. Alternatively, one can translate these sentences into monadic second order logic of one successor, and use wellknown automata-based decision algorithms there directly, see [8]. Similar methods can be used to handle configurations with finite support, corresponding to weak monadic second order logic. Since the complexity of the decision procedure is non-elementary one should not expect to be able to handle complicated assertions. On the other hand, at least for weak monadic second order logic practical implementations of the decision method exist, see [17]. There is no hope of generalizing this approach as the undecidability of, say, reversibility in higher dimensions demonstrates. t C Write x ! y if x evolves to y in exactly t steps, x ! y  if x evolves to y in any positive number of steps and x ! y t if x evolves to y in any number of steps. Note that ! is  definable for each fixed t, but ! fails to be so definable in first order logic. This is in analogy to the undefinability of path existence problems in the first order theory of graphs, see [34]. 
Hence it is natural to extend our language so we can express iterations of the global map, either by adding transitive closures or by moving to some limited system of 

higher-order logic over A where →^* is definable, see [8]. Arguably the most basic decision problem associated with a system A that requires iteration of the global map is the Reachability Problem: given two configurations x and y, does the evolution of x lead to y? A closely related but different question is the Confluence Problem: will two configurations x and y evolve to the same limit cycle? Confluence is an equivalence relation and allows for the decomposition of configuration space into limit cycles together with their basins of attraction. The Reachability and Confluence Problems amount to determining, given configurations x and y, whether

x →^* y ,

∃ z (x →^* z and y →^* z) ,


Cellular Automata, Classification of

respectively. As another example, the first two Culik–Yu classes can be defined like so:

∀ x ∃ z (x →^* z and z → z) ;

∀ x ∃ z (x →^* z and z →^+ z) .

It is not difficult to give similar definitions for the lower Li–Packard classes if one extends the language by a function symbol denoting the shift operator. The third Culik–Yu class is somewhat more involved. By definition, a CA lies in the third class if it admits a global decision algorithm to determine whether a given configuration x evolves to another given configuration y in a finite number of steps. In other words, we are looking for automata where the Reachability Problem is algorithmically solvable. While one can agree that W4 roughly translates into undecidability and is thus properly situated in the hierarchy, it is unclear how chaotic patterns in W3 relate to decidability. No method is known to translate the apparent lack of tangible, persistent patterns in rules such as elementary CA 30 into decision algorithms for Reachability.

There is another, somewhat more technical problem to overcome in formalizing classifications. Recall that the full configuration space is C = Σ^Z. Intuitively, given x ∈ C we can effectively determine the next configuration y = G(x). However, classical computability theory does not deal with infinitary objects such as arbitrary configurations, so a bit of care is needed here. The key insight is that we can determine arbitrary finite segments of G(x) using only finite segments of x (and, of course, the lookup table for the local map). There are several ways to model computability on Σ^Z based on this idea of finite approximations; we refer to [66] for a particularly appealing model based on so-called type-2 Turing machines; the reference also contains many pointers to the literature as well as a comparison between the different approaches. It is easy to see that for any CA the global map G as well as all its iterates G^t are computable, the latter uniformly in t.
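The finite-approximation principle is easy to demonstrate: to know t steps of the evolution on a window, it suffices to know the input on that window enlarged by rt cells on each side, where r is the radius. A minimal sketch for elementary CA (radius 1; helper names are ours):

```python
def eca_local(rule, a, b, c):
    # value of elementary CA `rule` on the neighborhood (a, b, c)
    return (rule >> (4*a + 2*b + c)) & 1

def iterate_window(rule, cells, t):
    # `cells` covers the target window plus t extra cells on each side;
    # each application of the local map shrinks the known segment by one
    # cell per side, so a finite segment of x determines the segment of
    # G^t(x), uniformly in t
    for _ in range(t):
        cells = [eca_local(rule, *cells[i:i+3]) for i in range(len(cells) - 2)]
    return cells

# three steps of rule 90 from a single live cell, seen through a 5-cell window
print(iterate_window(90, [0]*5 + [1] + [0]*5, 3))  # [0, 1, 0, 1, 0]
```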
However, due to the finitary nature of all computations, equality is not decidable in type-2 computability: the unequality operator U₀, with U₀(x, y) = 0 if x ≠ y and U₀(x, y) undefined otherwise, is computable, and thus unequality is semi-decidable; but the stronger total operator with U(x, y) = 0 if x ≠ y and U(x, y) = 1 otherwise is not computable. The last result is perhaps somewhat counterintuitive, but it is inevitable if we strictly adhere to the finite approximation principle. In order to avoid problems of this kind it has become customary to consider certain subspaces of the full configuration space, in particular C_fin, the collection of configurations with finite support, C_per, the collection of spatially periodic configurations, and C_ap, the collection of almost periodic configurations of the form … u u u w v v v …, where u, v and w are all finite words over the alphabet of the automaton. Thus, an almost periodic configuration differs from a configuration of the form ωu vω (u repeated infinitely to the left, v to the right) in only finitely many places. Configurations with finite support correspond to the special case where u = v = 0 for a special quiescent symbol 0, and spatially periodic configurations correspond to u = v, w = ε. The most general type of configuration that admits a finitary description is the class C_rec of recursive configurations, where the assignment of states to cells is given by a computable function. It is clear that all these subspaces are closed under the application of a global map. Except for C_fin, they are also closed under inverse maps in the following sense: given a configuration y in some subspace that has a predecessor x in C_all, there already exists a predecessor in the same subspace, see [55,58]. This is obvious except in the case of recursive configurations. The reference also shows that the recursive predecessor cannot be computed effectively from the target configuration. Thus, for computational purposes the dynamics of the cellular automaton are best reflected in C_ap: it includes all configurations with finite support and we can effectively trace an orbit in both directions. It is not hard to see that C_ap is the least such class. Alas, it is standard procedure to avoid minor technical difficulties arising from the infinitely repeated spatial patterns and establish classifications over the subspace C_fin. There is arguably not much harm in this simplification since C_fin is a dense subspace of C_all and compactness can be used to lift properties from C_fin to the full configuration space. The Culik–Yu hierarchy is correspondingly defined over C_fin, the class of all configurations of finite support.
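The closure of C_fin under the global map is immediate to implement, provided the background symbol is quiescent. A small sketch (helper names are ours) that steps a finite-support configuration of an elementary CA by padding with the quiescent symbol 0:

```python
def eca_local(rule, a, b, c):
    return (rule >> (4*a + 2*b + c)) & 1

def step_finite_support(rule, w):
    # 0 must be quiescent (the rule maps (0,0,0) to 0), so the support
    # stays finite: it can grow by at most one cell on each side per step
    assert rule & 1 == 0, "background symbol 0 must be quiescent"
    padded = [0, 0] + list(w) + [0, 0]
    return [eca_local(rule, *padded[i:i+3]) for i in range(len(padded) - 2)]

print(step_finite_support(110, [1]))  # [1, 1, 0]
```

The analogous step for C_ap also updates the two periodic backgrounds, which is why orbits there remain finitely describable.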
In this setting, the first three classes of this hierarchy are undecidable, and the fourth is undecidable in the disjunctive version: there is no algorithm to test whether a CA admits undecidable orbits. As it turns out, the CA classes are complete in their natural complexity classes within the arithmetical hierarchy [50,52]. Checking membership in the first two classes comes down to performing an infinite number of potentially unbounded searches and can be described logically by a Π₂ expression, a formula of type ∀x ∃y R(x, y) where R is a decidable predicate. Indeed, CY1 and CY2 are both Π₂-complete. Thus, deciding whether all configurations on a CA evolve to a fixed point is equivalent to the classical problem of determining whether a semi-decidable set is infinite. The third class is even less amenable to algorithmic attack; one can show that CY3 is Σ₃-complete, see [53]. Thus, deciding whether all orbits are decidable is as difficult as determining whether any given semi-decidable set is decidable. It is not difficult to adjust these undecidability results to similar classes such as the lower levels of the Li–Packard hierarchy that take into account spatial displacements of patterns.

Effective Dynamical Systems and Universality

The key property of CA that is responsible for all these undecidability results is the fact that CA are capable of performing arbitrary computations. This is unsurprising when one defines computability in terms of Turing machines, the devices introduced by Turing in the 1930s, see [47,63]. Unlike the Gödel–Herbrand approach using general recursive functions or Church's λ-calculus, Turing's devices are naturally closely related to discrete dynamical systems. For example, we can express an instantaneous description of a Turing machine as a finite sequence

a_{-l} a_{-l+1} … a_{-1} p a_1 a_2 … a_r

where the a_i are tape symbols and p is a state of the machine, with the understanding that the head is positioned at a_1 and that all unspecified tape cells contain the blank symbol. Needless to say, these Turing machine configurations can also be construed as finite-support configurations of a one-dimensional CA. It follows that a one-dimensional CA can be used to simulate an arbitrary Turing machine; hence CA are computationally universal: any computable function whatsoever can already be computed by a CA. Note, though, that the simulation is not entirely trivial. First, we have to rely on input/output conventions. For example, we may insist that objects in the input domain, typically tuples of natural numbers, are translated into a configuration of the CA by a primitive recursive coding function. Second, we need to adopt some convention that determines when the desired output has occurred: we follow the evolution of the input configuration until some "halting" condition applies. Again, this condition must be primitive recursively decidable, though there is considerable leeway as to how the end of a computation should be signaled by the CA.
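The one-step relation on instantaneous descriptions is itself a simple local update. The toy sketch below is our own illustration, not a construction from the text: the state marker sits immediately to the left of the scanned cell, missing transitions halt, and halting descriptions therefore become fixed points of the dynamics, one of the halting conventions mentioned above.

```python
def tm_step(delta, tape, blank="_"):
    # one step on an instantaneous description: `tape` is a list of symbols
    # containing exactly one state marker (a string starting with "q"),
    # positioned immediately left of the scanned cell
    t = list(tape)
    i = next(k for k, s in enumerate(t) if s.startswith("q"))
    if i + 1 == len(t):
        t.append(blank)              # the head scans a blank beyond the ID
    q, a = t[i], t[i + 1]
    if (q, a) not in delta:          # no transition: a halting description,
        return t, True               # i.e. a fixed point of the dynamics
    q2, a2, move = delta[(q, a)]
    t[i + 1] = a2                    # write the new symbol
    if move == "R":
        t[i], t[i + 1] = t[i + 1], q2
    else:                            # move == "L"
        if i == 0:
            t.insert(0, blank)
            i = 1
        t[i - 1], t[i] = q2, t[i - 1]
    return t, False

# a machine that appends one "1" to a block of 1s, then halts
delta = {("q0", "1"): ("q0", "1", "R"), ("q0", "_"): ("q1", "1", "R")}
tape, halted = ["q0", "1", "1"], False
while not halted:
    tape, halted = tm_step(delta, tape)
print(tape.count("1"))  # 3
```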
For example, we could insist that a particular cell reaches a special state, that an arbitrary cell reaches a special state, that the configuration be a fixed point, and so forth. Lastly, if and when a halting configuration is reached, we apply a primitive recursive decoding function to obtain the desired output.

Restricting the space to configurations that have finite support, that are spatially periodic, and so forth, produces an effective dynamical system: the configurations can be coded as integers in some natural way, and the next-configuration relation is primitive recursive in the sense that the corresponding relation on code numbers is primitive recursive. A classical example of an effective dynamical system is given by selecting the instantaneous descriptions of a Turing machine M as configurations, and the one-step relation of the Turing machine as the operation. Thus we obtain a system A_M whose orbits represent the computations of the Turing machine. Likewise, given the local map ρ of a CA we obtain a system A whose operation is the induced global map. While the full configuration space C_all violates the effectiveness condition, any of the spaces C_per, C_fin, C_ap and C_rec will give rise to an effective dynamical system. Closure properties, as well as recent work on the universality of elementary CA 110, see Sect. "Conclusion", suggest that the class of almost periodic configurations, also known as backgrounds or wallpapers, see [9,58], is perhaps the most natural setting. Both C_fin and C_ap provide a suitable setting for a CA that simulates a Turing machine: we can interpret A_M as a subspace of A for some suitably constructed one-dimensional CA; the orbits of the subspace encode computations of the Turing machine. It follows from the undecidability of the Halting Problem for Turing machines that the Reachability Problem for these particular CA is undecidable. Note, though, that orbits in A_M may well be finite, so some care must be taken in setting up the simulation. For example, one can translate halting configurations into fixed points. Another problem is caused by the worst-case nature of our classification schemes: in Turing machines and their associated systems A_M it is only the behavior on specially prepared initial configurations that matters, whereas the behavior of a CA depends on all configurations. The behavior of a Turing machine on all instantaneous descriptions, rather than just the ones that can occur during a legitimate computation on some actual input, was first studied by Davis, see [12,13], and also Hooper [25].
Call a Turing machine stable if it halts on any instantaneous description whatsoever. With some extra care one can then construct a CA that lies in the first Culik–Yu class, yet has the same computational power as the Turing machine. Davis showed that every total recursive function can already be computed by a stable Turing machine, so membership in CY1 is not an impediment to considerable computational power. The argument rests on a particular decomposition of recursive functions. Alternatively, one can directly manipulate Turing machines to obtain a similar result, see [49,53]. On the other hand, unstable Turing machines yield a natural and coding-free definition of universality: a Turing machine is Davis-universal if the set of all instantaneous descriptions on which the machine halts is Σ₁-complete. The mathematical theory of infinite CA is arguably more elegant than the actually observable finite case. As


a consequence, classifications are typically concerned with CA operating on infinite grids, so that even a configuration with finite support can carry arbitrarily much information. If we restrict our attention to the space of configurations on a finite grid, a more fine-grained analysis is required. For a finite grid of size n the configuration space has the form C_n = [n] → Σ and is itself finite; hence any orbit is ultimately periodic and the Reachability Problem is trivially decidable. However, in practice there is little difference between the finite and infinite case. First, computational complexity issues make it practically impossible to analyze even systems of modest size. The Reachability Problem for finite CA, while decidable, is PSPACE-complete even in the one-dimensional case. Computational hardness appears in many other places. For example, if we try to determine whether a given configuration on a finite grid is a Garden-of-Eden, the problem turns out to be NLOG-complete in dimension one and NP-complete in all higher dimensions, see [56]. Second, it stands to reason that the more interesting classification problem in the finite case takes the following parameterized form: given a local map ρ together with boundary conditions, determine the behavior of ρ on all finite grids. Under periodic boundary conditions this comes down to the study of C_per, and it seems that there is little difference between this and the fixed boundary case. Since all orbits on a finite grid are ultimately periodic, one needs to apply a more fine-grained classification that takes into account transient lengths. It is undecidable whether all configurations on all finite grids evolve to a fixed point under a given local map, see [54]. Thus, there is no algorithm to determine whether

⟨C_n; →⟩ ⊨ ∀ x ∃ z (x →^* z and z → z)

for all grid sizes n. The transient lengths are trivially bounded by k^n where k is the size of the alphabet of the automaton. It is undecidable whether the transient lengths grow according to some polynomial bound, even when the polynomial in question is constant.

Restrictions of the configuration space are one way to obtain an effective dynamical system. Another is to interpret the approximation-based notion of computability on the full space in terms of topology. It is well known that computable maps C_all → C_all are continuous in the standard product topology. The clopen sets in this topology are the finite unions of cylinder sets, where a cylinder set is determined by the values of a configuration in finitely many places. By a celebrated result of Hedlund, the global maps of CA on the full space are characterized by being continuous and shift-invariant. Perhaps somewhat counter-intuitively, the decidable subsets of C_all

are quite weak: they consist precisely of the clopen sets. Now consider a partition of C_all into finitely many clopen sets C_0, C_1, …, C_{n-1}. Thus, it is decidable which block of the partition a given point in the space belongs to. Moreover, Boolean operations on clopen sets as well as application of the global map and the inverse global map are all computable. The partition affords a natural projection π : C_all → Σ_n where Σ_n = {0, 1, …, n-1} and π(x) = i iff x ∈ C_i. Hence the projection translates orbits in the full space C_all into a class W of ω-words over Σ_n, the symbolic orbits of the system. The Cantor space of ω-words over Σ_n together with the shift describes all logically possible orbits with respect to the given partition, and W describes the symbolic orbits that actually occur in the given CA. The shift operator corresponds to an application of the global map of the CA. The finite factors of W provide information about possible finite traces of an orbit when filtered through the given partition. Whole orbits, again filtered through the partition, can be described by ω-words. To tackle the classification of the CA in terms of W it was suggested by Delvenne et al., see [15], to refer to the CA as decidable if it is decidable whether W has nonempty intersection with a given ω-regular language. Alas, decidability in this sense is very difficult, its complexity being Σ¹₁-complete and thus outside of the arithmetical hierarchy. Likewise it is suggested to call a CA universal if the problem of deciding whether a given word belongs to the cover of W, the collection of all finite factors, is Σ₁-complete, in analogy to Davis-universality.

Computational Equivalence

In recent work, Wolfram suggests a so-called Principle of Computational Equivalence, or PCE for short, see [71], p. 717.
PCE states that most computational processes come in only two flavors: they are either of a very simple kind and avoid undecidability, or they represent a universal computation and are therefore no less complicated than the Halting Problem. Thus, Wolfram proposes a zero-one law: almost all computational systems, and thus in particular all CA, are either as complicated as a universal Turing machine or are computationally simple. As evidence for PCE Wolfram adduces a very large collection of simulations of various effective dynamical systems such as Turing machines, register machines, tag systems, rewrite systems, combinators, and cellular automata. It is pointed out in Chap. 3 of [71] that in all these classes of systems there are surprisingly small examples that exhibit exceedingly complicated behavior – and presumably are capable of universal computation. Thus it is conceivable that universality is a rather common property, a property that is


indeed shared by all systems that are not obviously simple. Of course, it is often very difficult to give a complete proof of the computational universality of a natural system, as opposed to a carefully constructed one, so it is not entirely clear how many of Wolfram's examples are in fact universal. As a case in point consider the universality proof of Conway's Game of Life, or the argument for elementary CA 110. If Wolfram's PCE can be formally established in some form it stands to reason that it will apply to all effective dynamical systems and in particular to CA. Hence, classifications of CA would be rather straightforward: at the top there would be the class of universal CA, directly preceded by a class similar to the third Culik–Yu class, plus a variety of subclasses along the lines of the lower Li–Packard classes. The corresponding problem in classical computability theory was first considered in the 1930s by Post and is now known as Post's Problem: is there a semi-decidable set that fails to be decidable, yet is not as complicated as the Halting Set? In terms of Turing degrees the problem thus is to construct a semi-decidable set A such that ∅ <_T A <_T ∅′.

Future Directions

This short survey has only been able to hint at the vast wealth of emergent phenomena that arise in CA. Much work yet remains to be done, in classifying the different structures, identifying general laws governing their behavior, and determining the causal mechanisms that lead them to arise. For example, there are as yet no general techniques for determining whether a given domain is stable in a given CA; for characterizing the set of initial conditions that will eventually give rise to it; or for working out the particles that it supports. In CA of two or more dimensions, a large body of descriptive results is available, but these are more frequently anecdotal than systematic.
A significant barrier to progress has been the lack of good mathematical techniques for identifying, describing, and classifying domains. One promising development in this area is an information-theoretic filtering technique that can operate on configurations of any dimension [13].

Bibliography

Primary Literature
1. Boccara N, Nasser J, Roger M (1991) Particlelike structures and their interactions in spatio-temporal patterns generated by one-dimensional deterministic cellular automaton rules. Phys Rev A 44:866
2. Chate H, Manneville P (1992) Collective behaviors in spatially extended systems with local interactions and synchronous updating. Prog Theor Phys 87:1
3. Crutchfield JP, Hanson JE (1993) Turbulent pattern bases for cellular automata. Physica D 69:279
4. Eloranta K, Nummelin E (1992) The kink of cellular automaton rule 18 performs a random walk. J Stat Phys 69:1131
5. Fisch R, Gravner J, Griffeath D (1991) Threshold-range scaling of excitable cellular automata. Stat Comput 1:23–39

6. Gallas J, Grassberger P, Hermann H, Ueberholz P (1992) Noisy collective behavior in deterministic cellular automata. Physica A 180:19
7. Grassberger P (1984) Chaos and diffusion in deterministic cellular automata. Physica D 10:52
8. Griffeath D (2008) The primordial soup kitchen. http://psoup.math.wisc.edu/kitchen.html
9. Hanson JE, Crutchfield JP (1992) The attractor-basin portrait of a cellular automaton. J Stat Phys 66:1415
10. Hanson JE, Crutchfield JP (1997) Computational mechanics of cellular automata: An example. Physica D 103:169
11. Hopcroft JE, Ullman JD (1979) Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Reading
12. Lindgren K, Moore C, Nordahl M (1998) Complexity of two-dimensional patterns. J Stat Phys 91:909
13. Shalizi C, Haslinger R, Rouquier J, Klinker K, Moore C (2006) Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys Rev E 73:036104
14. Wojtowicz M (2008) Mirek's cellebration. http://www.mirekw.com/ca/
15. Wolfram S (1984) Universality and complexity in cellular automata. Physica D 10:1

Books and Reviews
Das R, Crutchfield JP, Mitchell M, Hanson JE (1995) Evolving Globally Synchronized Cellular Automata. In: Eshelman LJ (ed) Proceedings of the Sixth International Conference on Genetic Algorithms. Morgan Kaufmann, San Mateo
Gerhardt M, Schuster H, Tyson J (1990) A cellular automaton model of excitable media including curvature and dispersion. Science 247:1563
Gutowitz HA (1991) Transients, Cycles, and Complexity in Cellular Automata. Phys Rev A 44:R7881
Henze C, Tyson J (1996) Cellular automaton model of three-dimensional excitable media. J Chem Soc Faraday Trans 92:2883
Hordijk W, Shalizi C, Crutchfield J (2001) Upper bound on the products of particle interactions in cellular automata. Physica D 154:240
Iooss G, Helleman RH, Stora R (eds) (1983) Chaotic Behavior of Deterministic Systems. North-Holland, Amsterdam
Ito H (1988) Intriguing Properties of Global Structure in Some Classes of Finite Cellular Automata. Physica D 31:318
Jen E (1986) Global Properties of Cellular Automata. J Stat Phys 43:219
Kaneko K (1986) Attractors, Basin Structures and Information Processing in Cellular Automata. In: Wolfram S (ed) Theory and Applications of Cellular Automata. World Scientific, Singapore, pp 367
Langton C (1990) Computation at the Edge of Chaos: Phase transitions and emergent computation. Physica D 42:12
Lindgren K (1987) Correlations and Random Information in Cellular Automata. Complex Syst 1:529
Lindgren K, Nordahl M (1988) Complexity Measures and Cellular Automata. Complex Syst 2:409
Lindgren K, Nordahl M (1990) Universal Computation in Simple One-Dimensional Cellular Automata. Complex Syst 4:299
Mitchell M (1998) Computation in Cellular Automata: A Selected Review. In: Schuster H, Gramms T (eds) Nonstandard Computation. Wiley, New York

Cellular Automata, Emergent Phenomena in

Packard NH (1984) Complexity in Growing Patterns in Cellular Automata. In: Demongeot J, Goles E, Tchuente M (eds) Dynamical Behavior of Automata: Theory and Applications. Academic Press, New York
Packard NH (1985) Lattice Models for Solidification and Aggregation. Proceedings of the First International Symposium on Form, Tsukuba
Pivato M (2007) Defect Particle Kinematics in One-Dimensional Cellular Automata. Theor Comput Sci 377:205–228

Weimar J (1997) Cellular automata for reaction-diffusion systems. Parallel Comput 23:1699
Wolfram S (1984) Computation Theory of Cellular Automata. Comm Math Phys 96:15
Wolfram S (1986) Theory and Applications of Cellular Automata. World Scientific Publishers, Singapore
Wuensche A, Lesser MJ (1992) The Global Dynamics of Cellular Automata. Santa Fe Institute Studies in the Science of Complexity, Reference vol 1. Addison-Wesley, Redwood City


Cellular Automata and Groups
Tullio Ceccherini-Silberstein¹, Michel Coornaert²
¹ Dipartimento di Ingegneria, Università del Sannio, Benevento, Italy
² Institut de Recherche Mathématique Avancée, Université Louis Pasteur et CNRS, Strasbourg, France

Article Outline
Glossary
Definition of the Subject
Introduction
Cellular Automata
Cellular Automata with a Finite Alphabet
Linear Cellular Automata
Group Rings and Kaplansky Conjectures
Future Directions
Bibliography

Glossary

Groups A group is a set G endowed with a binary operation G × G ∋ (g, h) ↦ gh ∈ G, called the multiplication, that satisfies the following properties: (i) for all g, h and k in G, (gh)k = g(hk) (associativity); (ii) there exists an element 1_G ∈ G (necessarily unique) such that, for all g in G, 1_G g = g 1_G = g (existence of the identity element); (iii) for each g in G, there exists an element g⁻¹ ∈ G (necessarily unique) such that g g⁻¹ = g⁻¹ g = 1_G (existence of the inverses).
A group G is said to be Abelian (or commutative) if the operation is commutative, that is, for all g, h ∈ G one has gh = hg.
A group F is called free if there is a subset S ⊆ F such that any element g of F can be uniquely written as a reduced word on S, i.e. in the form g = s_1^{α_1} s_2^{α_2} ⋯ s_n^{α_n}, where n ≥ 0, s_i ∈ S and α_i ∈ ℤ ∖ {0} for 1 ≤ i ≤ n, and such that s_i ≠ s_{i+1} for 1 ≤ i ≤ n−1. Such a set S is called a free basis for F. The cardinality of S is an invariant of the group F and it is called the rank of F.
A group G is finitely generated if there exists a finite subset S ⊆ G such that every element g ∈ G can be expressed as a product of elements of S and their inverses, that is, g = s_1^{ε_1} s_2^{ε_2} ⋯ s_n^{ε_n}, where n ≥ 0 and s_i ∈ S, ε_i = ±1 for 1 ≤ i ≤ n. The minimal n for which such an expression exists is called the word length of g with respect to S and it is denoted by ℓ(g). The group G is a (discrete) metric space with the

distance function d : G × G → ℝ₊ defined by setting d(g, g′) = ℓ(g⁻¹ g′) for all g, g′ ∈ G. The set S is called a finite generating subset for G, and one says that S is symmetric provided that s ∈ S implies s⁻¹ ∈ S.
The Cayley graph of a finitely generated group G with respect to a symmetric finite generating subset S ⊆ G is the (undirected) graph Cay(G, S) with vertex set G and where two elements g, g′ ∈ G are joined by an edge if and only if g⁻¹ g′ ∈ S.
A group G is residually finite if the intersection of all subgroups of G of finite index is trivial.
A group G is amenable if it admits a right-invariant mean, that is, a map μ : P(G) → [0, 1], where P(G) denotes the set of all subsets of G, satisfying the following conditions: (i) μ(G) = 1 (normalization); (ii) μ(A ∪ B) = μ(A) + μ(B) for all A, B ∈ P(G) such that A ∩ B = ∅ (finite additivity); (iii) μ(Ag) = μ(A) for all g ∈ G and A ∈ P(G) (right-invariance).
Rings A ring is a set R equipped with two binary operations R × R ∋ (a, b) ↦ a + b ∈ R and R × R ∋ (a, b) ↦ ab ∈ R, called the addition and the multiplication, respectively, such that the following properties are satisfied: (i) R, with the addition operation, is an Abelian group with identity element 0, called the zero element (the inverse of an element a ∈ R is denoted by −a); (ii) the multiplication is associative and admits an identity element 1, called the unit element; (iii) multiplication is distributive with respect to addition, that is, a(b + c) = ab + ac and (b + c)a = ba + ca for all a, b and c ∈ R.
A ring R is commutative if ab = ba for all a, b ∈ R. A field is a commutative ring K ≠ {0} where every non-zero element a ∈ K is invertible, that is, there exists a⁻¹ ∈ K such that a a⁻¹ = 1. In a ring R a non-trivial element a is called a zero-divisor if there exists a non-zero element b ∈ R such that either ab = 0 or ba = 0. A ring R is directly finite if whenever ab = 1 then necessarily ba = 1, for all a, b ∈ R.
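The reduced words in the definition of a free group can be computed with a single stack pass. A small sketch for F₂ with free basis {a, b}, using our own encoding in which uppercase letters stand for inverses:

```python
def free_reduce(word):
    # cancel adjacent inverse pairs (aA, Aa, bB, Bb) until none remain;
    # one left-to-right pass with a stack suffices
    out = []
    for s in word:
        if out and out[-1] == s.swapcase():
            out.pop()
        else:
            out.append(s)
    return "".join(out)

print(free_reduce("abBA"))  # "" -- reduces to the identity
print(free_reduce("abAB"))  # "abAB" -- the commutator of a and b is nontrivial
```

For a free group the word length ℓ(g) with respect to the free basis is just the length of the reduced word, so `len(free_reduce(w))` computes it.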
If the ring M_d(R) of d × d matrices with coefficients in R is directly finite for all d ≥ 1, one says that R is stably finite.
Let R be a ring and let G be a group. Denote by R[G] the set of all formal sums Σ_{g∈G} α_g g where α_g ∈ R and α_g = 0 except for finitely many elements g ∈ G. We define two binary operations on R[G], namely the addition, by setting

(Σ_{g∈G} α_g g) + (Σ_{h∈G} β_h h) = Σ_{g∈G} (α_g + β_g) g ,

and the multiplication, by setting

(Σ_{g∈G} α_g g)(Σ_{h∈G} β_h h) = Σ_{g,h∈G} α_g β_h gh = Σ_{g,k∈G} α_g β_{g⁻¹k} k ,

where the last equality comes from substituting k = gh.

Then, with these two operations, R[G] becomes a ring; it is called the group ring of G with coefficients in R.
Cellular automata Let G be a group, called the universe, and let A be a set, called the alphabet. A configuration is a map x : G → A. The set A^G of all configurations is equipped with the right action of G defined by A^G × G ∋ (x, g) ↦ x^g ∈ A^G, where x^g(g′) = x(gg′) for all g′ ∈ G. A cellular automaton over G with coefficients in A is a map τ : A^G → A^G satisfying the following condition: there exists a finite subset M ⊆ G and a map μ : A^M → A such that τ(x)(g) = μ(x^g|_M) for all x ∈ A^G, g ∈ G, where x^g|_M denotes the restriction of x^g to M. Such a set M is called a memory set and μ is called a local defining map for τ.
If A = V is a vector space over a field K, then a cellular automaton τ : V^G → V^G, with memory set M ⊆ G and local defining map μ : V^M → V, is said to be linear provided that μ is linear.
Two configurations x, x′ ∈ A^G are said to be almost equal if the set {g ∈ G : x(g) ≠ x′(g)} at which they differ is finite. A cellular automaton τ is called pre-injective if whenever τ(x) = τ(x′) for two almost equal configurations x, x′ ∈ A^G one necessarily has x = x′. A Garden of Eden configuration is a configuration x ∈ A^G ∖ τ(A^G). Clearly, GOE configurations exist if and only if τ is not surjective.

Definition of the Subject

A cellular automaton is a self-mapping of the set of configurations of a group defined from local and invariant rules. Cellular automata were first only considered on the n-dimensional lattice group ℤⁿ and for configurations taking values in a finite alphabet set, but they may be formally defined on any group and for any alphabet. However, it is usually assumed that the alphabet set is endowed with some mathematical structure and that the local defining rules are related to this structure in some way. It turns out that general properties of cellular automata often reflect properties of the underlying group.
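The definition τ(x)(g) = μ(x^g|_M) is direct to implement for a finite universe. Below, a sketch (helper names are ours) over the cyclic group ℤ/5ℤ with memory set M = {−1, 0, 1} and the mod-2 sum of the two outer neighbors as local defining map; since that μ is linear over the field ℤ/2ℤ, this τ is also an instance of a linear cellular automaton:

```python
def ca_over_group(mul, elements, M, mu, x):
    # tau(x)(g) = mu( x(g*s) for s in M ): evaluate the local defining map
    # on the translate x^g of the configuration, restricted to M
    return {g: mu(tuple(x[mul(g, s)] for s in M)) for g in elements}

# universe: the cyclic group Z/5Z, memory set M = {-1, 0, 1}
mul = lambda g, s: (g + s) % 5
M = (-1, 0, 1)
mu = lambda v: (v[0] + v[2]) % 2      # linear local map over Z/2Z
x = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}

print(ca_over_group(mul, range(5), M, mu, x))
# {0: 0, 1: 1, 2: 0, 3: 0, 4: 1}
```

Replacing `mul` by the multiplication of any other group (and `elements` by its carrier) reuses the same code unchanged, which is the point of the group-theoretic definition.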
As an example, the Garden of Eden theorem asserts that if the group is amenable and the alphabet is finite, then the surjectivity of a cellular automaton is equivalent to its pre-injectivity

(a weak form of injectivity). There is also a linear version of the Garden of Eden theorem for linear cellular automata and finite-dimensional vector spaces as alphabets. It is an amazing fact that famous conjectures of Kaplansky about the structure of group rings can be reformulated in terms of linear cellular automata.

Introduction

The goal of this paper is to survey results related to the Garden of Eden theorem and the surjunctivity problem for cellular automata. The notion of a cellular automaton goes back to John von Neumann [37] and Stan Ulam [34]. Although cellular automata were firstly considered only in theoretical computer science, nowadays they play a prominent role also in physics and biology, where they serve as models for several phenomena (see Cellular Automata Modeling of Physical Systems, Chaotic Behavior of Cellular Automata), and in mathematics. In particular, cellular automata are studied in ergodic theory (see Ergodic Theory of Cellular Automata) and in the theory of dynamical systems (see Topological Dynamics of Cellular Automata), in functional and harmonic analysis, and in group theory.
In the classical framework, the universe U is the lattice ℤ² of integer points in the Euclidean plane and the alphabet A is a finite set, typically A = {0, 1}. The set A^U = {x : U → A} is the configuration space, a map x : U → A is a configuration, and a point (n, m) ∈ U is called a cell. One is given a neighborhood M of the origin (0, 0) ∈ U, typically, for some r > 0, M = {(n, m) ∈ ℤ² : |n| + |m| ≤ r} (von Neumann r-ball) or M = {(n, m) ∈ ℤ² : |n|, |m| ≤ r} (Moore's r-ball), and a local map μ : A^M → A. One then "extends" μ to the whole universe, obtaining a map τ : A^U → A^U, called a cellular automaton, by setting τ(x)(n, m) = μ((x(n + s, m + t))_{(s,t)∈M}).
This way, the value τ(x)(n, m) ∈ A of the configuration x at the cell (n, m) ∈ U only depends on the values x(n + s, m + t) of x at the neighboring cells (n + s, m + t) ∈ (n, m) + M; in other words, τ is Z²-equivariant. M is called a memory set for τ and μ a local defining map. In 1963 E.F. Moore proved that if a cellular automaton τ: A^{Z²} → A^{Z²} is surjective then it is also pre-injective, a weak form of injectivity. Shortly later, John Myhill proved the converse to Moore's theorem. The equivalence of surjectivity and pre-injectivity of cellular automata is referred to as the Garden of Eden theorem (briefly, GOE theorem), this biblical terminology being motivated by the fact that it gives necessary and sufficient conditions for the existence of configurations x that are not in the image of τ, i.e. x ∈ A^{Z²} \ τ(A^{Z²}), so that, thinking of (τ, A^{Z²}) as


Cellular Automata and Groups

a discrete dynamical system, with τ giving the time evolution, they can appear only as "initial configurations". It was immediately realized that the GOE theorem also holds in higher dimension, namely for cellular automata with universe U = Z^d, the lattice of integer points in d-dimensional space. Then, Machì and Mignosi [27] gave the definition of a cellular automaton over a finitely generated group and extended the GOE theorem to the class of groups G having sub-exponential growth, that is, for which the growth function γ_G(n), which counts the elements g ∈ G at "distance" at most n from the unit element 1_G of G, grows more slowly than any exponential; in formulæ, lim_{n→∞} ⁿ√(γ_G(n)) = 1. Finally, in 1999 Ceccherini-Silberstein, Machì and Scarabotti [9] extended the GOE theorem to the class of amenable groups. It is interesting to note that the notion of an amenable group was also introduced by von Neumann [36]. This class of groups contains all finite groups, all Abelian groups, and in fact all solvable groups, and all groups of sub-exponential growth, and it is closed under the operations of taking subgroups, quotients, directed limits and extensions. In [27] two examples of cellular automata were given with universe the free group F₂ of rank two, the prototype of a non-amenable group: one surjective but not pre-injective and the other, conversely, pre-injective but not surjective, thus providing an instance of the failure of the theorems of Moore and Myhill, and so of the GOE theorem. In [9] it is shown that these examples can be extended to the class of groups, thus necessarily non-amenable, containing the free group F₂. We do not know whether the GOE theorem holds only for amenable groups; the question remains open for groups which are non-amenable and have no free subgroups, a class which is non-empty by results of Olshanskii [30] and Adyan [1].
In 1999 Misha Gromov [20], using a quite different terminology, reproved the GOE theorem for cellular automata whose universes are infinite amenable graphs Γ with a dense pseudogroup of holonomies (in other words, such Γ's are rich in symmetries). In addition, he considered not only cellular automata from the full configuration space A^Γ into itself but also between subshifts X, Y ⊆ A^Γ. He used the notion of entropy of a subshift (a concept hidden in the papers [27] and [9]). In the mid-fifties W. Gottschalk introduced the notion of surjunctivity of maps. A map f: X → Y is surjunctive if it is surjective or not injective; equivalently, if injectivity implies surjectivity. We say that a group G is surjunctive if all cellular automata τ: A^G → A^G with finite alphabet are surjunctive. Lawton [18] proved that residually finite groups are surjunctive. From the GOE theorem for amenable groups [9] one immediately deduces that amenable groups are surjunctive as well. Finally Gromov [20] and, independently, Benjamin Weiss [38] proved that all sofic groups (the class of sofic groups contains all residually finite groups and all amenable groups) are surjunctive. It is not known whether or not all groups are surjunctive. In the literature there is a notion of a linear cellular automaton: the alphabet is not only a finite set but also bears the structure of an Abelian group, and the local defining map μ is a group homomorphism, that is, it preserves the group operation. These are also called additive cellular automata (see Additive Cellular Automata). In [5], motivated by [20], we introduced another notion of linearity for cellular automata. Given a group G and a vector space V over a (not necessarily finite) field K, the configuration space is V^G and a cellular automaton τ: V^G → V^G is linear if the local defining map μ: V^M → V is K-linear. The set LCA(V; G) of all linear cellular automata with alphabet V and universe G naturally bears the structure of a ring. The finiteness condition on the set A in the classical framework is now replaced by the finite dimensionality of V. Similarly, the notion of entropy for subshifts X ⊆ A^G is now replaced by that of mean dimension (a notion due to Gromov [20]). In [5] we proved the GOE theorem for linear cellular automata τ: V^G → V^G with alphabet a finite-dimensional vector space and with G an amenable group. Moreover, we proved a linear version of Gottschalk's surjunctivity theorem for residually finite groups. In the same paper we also established a connection with the theory of group rings. Given a group G and a field K, there is a one-to-one correspondence between the elements of the group ring K[G] and the cellular automata τ: K^G → K^G. This correspondence preserves the ring structures of K[G] and LCA(K; G). This led to a reformulation of a long-standing problem, raised by Irving Kaplansky [23], about the absence of zero-divisors in K[G] for G a torsion-free group, in terms of the pre-injectivity of all τ ∈ LCA(K; G).
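The correspondence between group-ring elements and linear cellular automata can be made tangible in a toy computation. The sketch below (our notation; G = Z_n abelian for simplicity, and coefficients in Z rather than a field) represents an element a of the group ring by its coefficient vector and checks that the ring product corresponds to composition of the associated linear cellular automata τ_a(x)(g) = Σ_h a(h) x(g + h).

```python
def convolve(a, b, n):
    """Product in the group ring Z[Z_n]; elements are length-n coefficient vectors."""
    c = [0] * n
    for g in range(n):
        for h in range(n):
            c[(g + h) % n] += a[g] * b[h]
    return c

def lca(a, n):
    """Linear CA associated with a: tau_a(x)(g) = sum_h a(h) * x(g + h)."""
    return lambda x: [sum(a[h] * x[(g + h) % n] for h in range(n))
                      for g in range(n)]

n = 5
a, b = [1, 2, 0, 0, 0], [0, 1, 0, 3, 0]
x = [1, 0, 0, 2, 0]
# The correspondence a -> tau_a sends the ring product to composition:
print(lca(convolve(a, b, n), n)(x) == lca(a, n)(lca(b, n)(x)))  # True
```

The identity element of the group ring (the delta function at the unit element) corresponds to the identity cellular automaton, in line with the statement that the correspondence preserves the ring structures.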
In [6] we proved the linear version of the Gromov–Weiss surjunctivity theorem for sofic groups and established another application to the theory of group rings. We extended the correspondence above to a ring isomorphism between the ring Mat_d(K[G]) of d × d matrices with coefficients in the group ring K[G] and LCA(K^d; G). This led to a reformulation of another famous problem, raised by Irving Kaplansky [24], about the structure of group rings: a group ring K[G] is stably finite if and only if, for all d ≥ 1, all linear cellular automata τ: (K^d)^G → (K^d)^G are surjunctive. As a byproduct we obtained another proof of the fact that group rings over sofic groups are stably finite,


a result previously established by G. Elek and A. Szabó [11] using different methods. The paper is organized as follows. In Sect. "Cellular Automata" we present the general definition of a cellular automaton for any alphabet and any group. This includes a few basic examples, namely Conway's Game of Life, the majority action and the discrete Laplacian. In the subsequent section we restrict our attention to cellular automata with a finite alphabet. We present the notions of Cayley graphs (for finitely generated groups), of amenable groups, and of entropy for G-invariant subsets of the configuration space. This leads to a description of the setting and the statement of the Garden of Eden theorem for amenable groups. We also give detailed expositions of a few examples showing that the hypothesis of amenability cannot, in general, be removed from the assumptions of this theorem. We also present the notions of surjunctivity and of sofic groups, and state the surjunctivity theorem of Gromov and Weiss for sofic groups. In Sect. "Linear Cellular Automata" we introduce the notions of linear cellular automata and of mean dimension for G-invariant subspaces of V^G. We then discuss the linear analogue of the Garden of Eden theorem and, again, we provide explicit examples showing that the assumptions of the theorem (amenability of the group and finite dimensionality of the underlying vector space) cannot, in general, be removed. Finally we present the linear analogue of the surjunctivity theorem of Gromov and Weiss for linear cellular automata over sofic groups. In Sect. "Group Rings and Kaplansky Conjectures" we give the definition of a group ring and present a representation of linear cellular automata as matrices with coefficients in the group ring. This leads to the reformulation of the two long-standing problems raised by Kaplansky about the structure of group rings. Finally, in Sect.
"Future Directions" we present a list of open problems with a description of more recent results related to the Garden of Eden theorem and to the surjunctivity problem.

Cellular Automata

The Configuration Space

Let G be a group, called the universe, and let A be a set, called the alphabet or the set of states. A configuration is a map x: G → A. The set A^G of all configurations is equipped with the right action of G defined by A^G × G ∋ (x, g) ↦ x^g ∈ A^G, where x^g(g′) = x(gg′) for all g′ ∈ G.

Cellular Automata

A cellular automaton over G with coefficients in A is a map τ: A^G → A^G satisfying the following condition: there exists a finite subset M ⊆ G and a map μ: A^M → A such that

  τ(x)(g) = μ(x^g|_M)    (1)

for all x ∈ A^G, g ∈ G, where x^g|_M denotes the restriction of x^g to M. Such a set M is called a memory set and μ is called a local defining map for τ. It follows directly from the definition that every cellular automaton τ: A^G → A^G is G-equivariant, i.e., it satisfies

  τ(x^g) = τ(x)^g    (2)

for all g ∈ G and x ∈ A^G. Note that if M is a memory set for τ, then any finite set M′ ⊆ G containing M is also a memory set for τ. The local defining map associated with such an M′ is the map μ′: A^{M′} → A given by μ′ = μ ∘ ρ, where ρ: A^{M′} → A^M is the restriction map. However, there exists a unique memory set M₀ of minimal cardinality, called the minimal memory set for τ. We denote by CA(G; A) the set of all cellular automata over G with alphabet A.

Examples

Example 1 (Conway's Game of Life [3]) The most famous example of a cellular automaton is the Game of Life of John Horton Conway. The set of states is A = {0, 1}. State 0 corresponds to absence of life while state 1 indicates life; therefore passing from 0 to 1 can be interpreted as birth, while passing from 1 to 0 corresponds to death. The universe for Life is the group G = Z², that is, the free Abelian group of rank 2. The minimal memory set is M = {−1, 0, 1}² ⊆ Z², the Moore neighborhood of the origin in Z². It consists of the origin (0, 0) and its eight neighbors ±(1, 0), ±(0, 1), ±(1, 1), ±(1, −1). The corresponding local defining map μ: A^M → A is given by

  μ(y) = 1 if Σ_{m∈M} y(m) = 3, or if Σ_{m∈M} y(m) = 4 and y((0, 0)) = 1;
  μ(y) = 0 otherwise.
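As a quick runnable sketch (the set-based representation and function name are ours), Life's global map τ can be written directly from the local rule: a cell is alive at the next step exactly when it has three live Moore neighbors, or two live neighbors and is itself alive.

```python
from collections import Counter

def life_step(live):
    """One step of Conway's Game of Life; `live` is the set of live cells (n, m) in Z^2."""
    counts = Counter((n + dn, m + dm) for (n, m) in live
                     for dn in (-1, 0, 1) for dm in (-1, 0, 1)
                     if (dn, dm) != (0, 0))
    # Alive next step iff 3 live neighbours, or 2 live neighbours and already alive.
    return {c for c, k in counts.items() if k == 3 or (k == 2 and c in live)}

blinker = {(0, -1), (0, 0), (0, 1)}
print(life_step(life_step(blinker)) == blinker)  # True: a period-two oscillator
```

Representing a configuration by its (finite) set of live cells is natural here because Life's quiescent state 0 is preserved outside the support of the configuration.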

The featured state is set to 1 for m > 0.5 and to 0 for m < 0.5. If m is exactly 0.5, then the last state is assigned (s_i^{(T)} = σ_i^{(T)}). This memory mechanism is accumulative in its demand of knowledge of past history: to calculate the memory charge ω_i^{(T)} it is not necessary to know the whole series {σ_i^{(t)}}, as it can be sequentially calculated: ω_i^{(T)} = αω_i^{(T−1)} + σ_i^{(T)}. Correspondingly, Ω(T) = (α^T − 1)/(α − 1). The choice of the memory factor α simulates the long-term or remnant memory effect: the limit case α = 1 corresponds to memory with equally weighted records (full memory, equivalent to the mode if k = 2), whereas α ≪ 1 intensifies the contribution of the most recent states and diminishes the contribution of the past ones (short-term working memory). The choice α = 0 leads to the ahistoric model. In the most unbalanced scenario up to T, i.e. σ_i^{(1)} = ⋯ = σ_i^{(T−1)} ≠ σ_i^{(T)}, it is:

  m(0, 0, …, 0, 1) = (α − 1)/(α^T − 1) ,
  m(1, 1, …, 1, 0) = (α^T − α)/(α^T − 1) .

Thus, memory is only operative if α is greater than a critical α_T that verifies

  α_T^T − 2α_T + 1 = 0 ,    (1)

in which case cells will be featured at T with state values different from the last one. Initial operative values are α₃ = 0.61805 and α₄ = 0.5437. When T → ∞, Eq. (1) becomes −2α + 1 = 0; thus, in the k = 2 scenario, α-memory is not effective if α ≤ 0.5.

A Worked Example: The Parity Rule

The so-called parity rule states: a cell becomes alive if its number of live neighbors is odd, and dead in the contrary case. Figure 2 shows the effect of memory on the parity rule starting from a single live cell in the Moore neighborhood. In accordance with the above given values of α₃ and α₄: (i) the

pattern at T = 4 is the ahistoric one if α ≤ 0.6, altered when α ≥ 0.7, and (ii) the patterns at T = 5 for α = 0.54 and α = 0.55 differ. Memory levels that are not low tend to freeze the dynamics from the early time-steps, e.g. over 0.54 in Fig. 2. In the particular case of full memory, small oscillators of short range in time are frequently generated, such as the period-two oscillator that appears as soon as T = 2 in Fig. 2. The group of evolution patterns shown in the [0.503, 0.54] interval of α variation in Fig. 2 is rather unexpected for the parity rule, because these patterns are too sophisticated for this simple rule. On the contrary, the evolution patterns with very small memory, α = 0.501, resemble those of the ahistoric model in Fig. 2. But this similitude breaks later on, as Fig. 3 reveals: from T = 19, the parity rule with minimal memory evolves producing patterns notably different from the ahistoric ones. These patterns tend to be framed in squares of size not over T × T, whereas in the ahistoric case the patterns tend to be framed in 2T × 2T square regions, so even minimal memory induces a very notable reduction of the affected cell area in the scenario of Fig. 2. The patterns of the featured cells tend not to be far from the actual ones, albeit examples of notable divergence can be traced in Fig. 2. In the minimal memory scenario of Fig. 2, that of α = 0.501, memory has no effect up to T = 9, when the pattern of featured live cells reduces to the initial one; afterward both evolutions are fairly similar up to T = 18, but at this time step the two kinds of patterns differ notably, and from then on the evolution patterns in Fig. 3 notably diverge from the ahistoric ones. Giving consideration to previous states (historic memory) in two-dimensional CA tends to confine the disruption generated by a single live cell. As a rule, full memory tends to generate oscillators, and the less historic information retained, i.e. the smaller the α value, the closer the approach to the ahistoric model, in a rather smooth form. But the transition obtained by decreasing the memory factor from α = 1.0 (full memory) to α = 0.5 (ahistoric model) is not always regular, and some kind of erratic effect of memory can be traced. The inertial (or conservative) effect of memory dramatically changes the dynamics of the semitotalistic LIFE rule. Thus, (i) the vividness that some small clusters exhibit in LIFE has not been detected in LIFE with memory. In particular, the glider in LIFE does not glide with
smaller ˛ value, implies an approach to the ahistoric model in a rather smooth form. But the transition which decreases the memory factor from ˛ D 1:0 (full memory) to ˛ D 0:5 (ahistoric model), is not always regular, and some kind of erratic effect of memory can be traced. The inertial (or conservating) effect of memory dramatically changes the dynamics of the semitotalistic LIFE rule. Thus, (i) the vividness that some small clusters exhibit in LIFE, has not been detected in LIFE with memory. In particular, the glider in LIFE does not glide with

Cellular Automata with Memory, Figure 2 The 2D parity rule with memory up to T = 15


Cellular Automata with Memory


Cellular Automata with Memory, Figure 3 The 2D parity rule with α = 0.501 memory starting from a single site live cell up to T = 55

memory, but stabilizes very close to its initial position as the tub; (ii) as the size of a configuration increases, live clusters often tend to persist with a higher number of live cells in LIFE with memory than in the ahistoric formulation; (iii) a single mutant appearing in a stable agar can lead to its destruction in the ahistoric model, whereas with memory its effect tends to be restricted to its proximity [26].

One-Dimensional CA

Elementary rules are one-dimensional, two-state rules operating on nearest neighbors. Following Wolfram's notation, these rules are characterized by a sequence of binary values (β) associated with each of the eight possible triplets

(σ_{i−1}^{(T)}, σ_i^{(T)}, σ_{i+1}^{(T)}):

  111  110  101  100  011  010  001  000
  β₁   β₂   β₃   β₄   β₅   β₆   β₇   β₈

The rules are conveniently specified by their rule number R = Σ_{i=1}^{8} β_i 2^{8−i}. Legal rules are reflection symmetric (β₅ = β₂, β₇ = β₄) and quiescent (β₈ = 0), restrictions that leave 32 possible legal rules. Figure 4 shows the spatio-temporal patterns of legal rules affected by memory when starting from a single live cell [17]. Patterns are shown up to T = 63, with the memory factor varying from 0.6 to 1.0 by 0.1 intervals, and adopting also values close to the limit of its effectivity: 0.5. As a rule, the transition from the α = 1.0 (fully historic) to the ahistoric scenario is fairly gradual, so that the patterns become more expanded as less historic memory is retained (smaller α). Rules 50, 122, 178, 250, 94, 222 and 254 are paradigmatic of this smooth evolution. Rules 222 and

254 are not included in Fig. 4 as they evolve as rule 94 but with the inside of the patterns full of active cells. Rules 126 and 182 also present a gradual evolution, although their patterns at high levels of memory hardly resemble the ahistoric ones. Examples without a smooth effect of memory are also present in Fig. 4: (i) rule 150 is sharply restrained at α = 0.6, (ii) the important rule 54 becomes extinct in [0.8, 0.9], but not with full memory, (iii) the rules in the group {18, 90, 146, 218} become extinct from α = 0.501. Memory kills the evolution for these rules already at T = 4 for α values over α₃ (thus over 0.6 in Fig. 4): after T = 3 all the cells, even the two outer cells alive at T = 3, are featured as dead. And (iv) rule 22 becomes extinct for α = 0.501, not at 0.507, 0.6 and 0.7, again becomes extinct at 0.8 and 0.9, and finally generates an oscillator with full memory. It has been argued that rules 18, 22, 122, 146 and 182 simulate rule 90 in that their behavior coincides when restricted to certain spatial subsequences. Starting with a single site live cell, the coincidence fully applies in the historic model for rules 90, 18 and 146. Rule 22 shares with these rules the extinction for high α values, with the notable exception of no extinction in the fully historic model. Rules 122 and 182 diverge in their behavior: there is a gradual decrease in the width of the evolving patterns as α grows, but they do not reach extinction. Figure 5 shows the effect of memory on legal rules when starting at random: the values of sites are initially uncorrelated and chosen at random to be 0 (blank) or 1 (gray) with probability 0.5. Differences in patterns resulting from reversing the center site value are shown as black pixels. Patterns are shown up to T = 60, in a line of size 129 with periodic boundary conditions imposed on the edges. Only the nine legal rules which generate non-periodic patterns in the ahistoric scenario are significantly affected by memory. The patterns with inverted triangles dominate the scene in the ahistoric patterns of Fig. 5, a common appearance that memory tends to eliminate. History has a dramatic effect on rule 18. Even at the low value of α = 0.6, the appearance of its spatio-temporal pattern fully changes: a number of isolated periodic structures are generated, far from the distinctive inverted-triangle world of the ahistoric pattern. For α = 0.7 the live structures are fewer, advancing the extinction found in [0.8, 0.9]. In the fully historic model, simple periodic patterns survive.

Cellular Automata with Memory, Figure 4 Elementary, legal rules with memory from a single site live cell
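The rule-number encoding R = Σ β_i 2^{8−i} and the legality constraints used throughout this section can be checked mechanically; the helper names below are ours.

```python
def rule_table(R):
    """Map each triplet (left, centre, right) to its beta value for Wolfram
    rule number R = sum of beta_i * 2**(8 - i) over triplets 111, 110, ..., 000."""
    triplets = [(b >> 2 & 1, b >> 1 & 1, b & 1) for b in range(7, -1, -1)]
    bits = [(R >> k) & 1 for k in range(7, -1, -1)]
    return dict(zip(triplets, bits))

def is_legal(R):
    """Legal rules are reflection symmetric (beta5 = beta2, beta7 = beta4)
    and quiescent (beta8 = 0)."""
    t = rule_table(R)
    return (t[(0, 1, 1)] == t[(1, 1, 0)]
            and t[(0, 0, 1)] == t[(1, 0, 0)]
            and t[(0, 0, 0)] == 0)

print(sum(is_legal(R) for R in range(256)))  # 32
```

The count 32 follows because the two symmetry constraints and quiescence leave five free bits (β₁, β₂, β₃, β₄, β₆), i.e. 2⁵ rules.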


Rule 146 is affected by memory in much the same way as rule 18, because their binary codes differ only in their β₁ value. The spatio-temporal patterns of rule 182 and its equivalent rule 146 are reminiscent of each other, though those of rule 182 look like negative photograms of those of rule 146. The effect of memory on rule 22 and on the complex rule 54 is similar. Their spatio-temporal patterns at α = 0.6 and α = 0.7 keep the essentials of the ahistoric ones, although the inverted triangles become enlarged and tend to be more sophisticated at their basis. A notable discontinuity is found for both rules when ascending in the value of the memory factor: at α = 0.8 and α = 0.9 only a few simple structures survive. But unexpectedly, the patterns of the fully historic scenario differ markedly from the others, showing a high degree of synchronization. The four remaining chaotic legal rules (90, 122, 126 and 150) show a much smoother evolution from the ahistoric to the historic scenario: no pattern evolves either to

Cellular Automata with Memory, Figure 5 Elementary, legal rules with memory starting at random

full extinction or to the preservation of only a few isolated persistent propagating structures (solitons). Rules 122 and 126 evolve in a similar form, showing a high degree of synchronization in the fully historic model. As a rule, the effect of memory on the differences in patterns (DP) resulting from reversing the value of the initial center site is reminiscent of its effect on the spatio-temporal patterns, albeit this very much depends on the actual simulation run. In the case of rule 18, for example, damage is not present in the simulation of Fig. 5. The group of rules 90, 122, 126 and 150 shows a, let us say, canonical, fairly gradual evolution from the ahistoric to the historic scenario, so that the DP appear more constrained as more historic memory is retained, with no extinction for any α value. Figure 6 shows the evolution of the fraction ρ_T of sites with value 1, starting at random (ρ₀ = 0.5). The simulation is implemented for the same rules as in Fig. 4, but with a notably wider lattice: N = 500. A visual inspection of the plots in Fig. 6 ratifies the general features observed in the patterns of Fig. 5 regarding density. That also holds for damage spreading: as a rule, memory depletes the damaged region. In one-dimensional r = 2 CA, the value of a given site depends on the values of the nearest and next-nearest neighbors. Totalistic rules with memory have the form σ_i^{(T+1)} = φ(s_{i−2}^{(T)} + s_{i−1}^{(T)} + s_i^{(T)} + s_{i+1}^{(T)} + s_{i+2}^{(T)}). The effect of memory on these rules follows the way traced in the r = 1 context, albeit with a rich casuistry studied in [14].
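Both the critical-α analysis and the α-memory mechanism itself are easy to check numerically. The sketch below (our code; periodic boundary, which is immaterial while the pattern is narrower than the lattice) solves the characteristic equation by bisection and runs an elementary rule under α-memory. By the analysis above, α = 0.5 must reproduce the ahistoric evolution exactly in the k = 2 case, and full memory must kill rule 90 by T = 4.

```python
def critical_alpha(T, k=2, tol=1e-12):
    """Root in (1/(2(k-1)), 1) of the characteristic equation
    (2k-3)*a**T - 2*(k-1)*a + 1 = 0; for k = 2 this is Eq. (1)."""
    f = lambda a: (2*k - 3) * a**T - 2*(k - 1) * a + 1
    lo, hi = 1.0 / (2 * (k - 1)) + 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)  # f > 0 below the root
    return (lo + hi) / 2

def evolve_alpha(rule, x0, steps, alpha):
    """Elementary CA in which each cell is featured by rounding the
    alpha-weighted mean of its own past states (ties keep the last state);
    alpha = 0 gives the ahistoric model."""
    bit = lambda a, b, c: (rule >> (a << 2 | b << 1 | c)) & 1
    N = len(x0)
    sigma, omega, Omega = list(x0), [float(v) for v in x0], 1.0
    out = [list(x0)]
    for _ in range(steps):
        s = [sigma[i] if 2*omega[i] == Omega else int(2*omega[i] > Omega)
             for i in range(N)]
        sigma = [bit(s[i - 1], s[i], s[(i + 1) % N]) for i in range(N)]
        omega = [alpha*omega[i] + sigma[i] for i in range(N)]
        Omega = alpha*Omega + 1.0
        out.append(sigma)
    return out

print(round(critical_alpha(3), 3), round(critical_alpha(4), 4))  # 0.618 0.5437
x0 = [0]*17; x0[8] = 1
print(evolve_alpha(90, x0, 7, 0.5) == evolve_alpha(90, x0, 7, 0.0))  # True
print(evolve_alpha(90, x0, 4, 1.0)[-1] == [0]*17)                    # True
```

The last check reproduces the behavior reported above for the group {18, 90, 146, 218}: with α over α₃ the outer live cells are out-voted by their dead past at T = 3, so the evolution dies at T = 4.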


Probabilistic CA

So far the CA considered are deterministic. In order to study perturbations of deterministic CA, as well as transitional changes from one deterministic CA to another, it is natural to generalize the deterministic CA framework to the probabilistic scenario. In the elementary scenario, the β values are replaced by probabilities p = P(σ_i^{(T+1)} = 1 | σ_{i−1}^{(T)}, σ_i^{(T)}, σ_{i+1}^{(T)}):

  111  110  101  100  011  010  001  000
  β₁   β₂   β₃   β₄   β₅   β₆   β₇   β₈     β ∈ {0, 1}
  p₁   p₂   p₃   p₄   p₅   p₆   p₇   p₈     p ∈ [0, 1]

As in the deterministic scenario, memory can be embedded in probabilistic CA (PCA) by featuring cells by a summary of past states s_i instead of by their last state σ_i^{(T)}: p = P(σ_i^{(T+1)} = 1 | s_{i−1}^{(T)}, s_i^{(T)}, s_{i+1}^{(T)}). Again, memory is embedded into the characterization of cells but not into the construction of the stochastic transition rules, as done in the canonical approach to PCA. We have explored the effect of memory on three different subsets of rules, (0, p₂, 0, p₄, p₂, 0, p₄, 0), (0, 0, p₃, 1, 0, p₆, 1, 0), and (p₁, p₂, p₁, p₂, p₂, 0, p₂, 0), in [9].

Other Memories

A number of average-like memory mechanisms can readily be proposed by using weights different from that implemented in the α-memory mechanism, δ(t) = α^{T−t}. Among the plausible choices of δ, we mention the weights δ(t) = t^c and δ(t) = c^t, c ∈ N, in which the larger the value of c, the more heavily the recent past is taken into account, and consequently the closer the scenario to the ahistoric model [19,21]. Both weights allow for integer-based


arithmetics (à la CA) comparing 2! (T) to 2˝(T) to get the featuring states s (a clear computational advantage over the ˛-based model), and are accumulative in respect to charge: ! i(T) D ! i(T1) C T c  i(T) , ! i(T) D ! i(T1) C c T  i(T) . Nevertheless, they share the same drawback: powers explode, at high values of t, even for c D 2. Limited trailing memory would keep memory of only the last states. This is implemented in the conP text of average memory as: ! i(T) D TtD> ı(t) i(t) , with > D max(1; T  C 1). Limiting the trailing memory would approach the model to the ahistoric model ( D 1). In the geometrically discounted method, such an effect is more appreciable when the value of ˛ is high, whereas at low ˛ values (already close to the ahistoric model when memory is not limited) the effect of limiting the trailing memory is not so important. In the k D 2 context, if D 3; provided that ˛ > ˛3 D 0:61805; the memory

mechanism turns out to be that of selecting the mode   D mode  i(T2) ;  i(T) ;  i(T1) , of the last three states: s(T) i i. e. the elementary rule 232. Figure 7 shows the effect of this kind of memory on legal rules. As is known, history has a dramatic effect on Rules 18, 90, 146 and 218 as their pattern dies out as early as at T D 4: The case of Rule 22 is particular: two branches are generated at T D 17 in the historic model; the patterns of the remaining rules in the historic model are much reminiscent of the ahistoric ones, but, let us say, compressed. Figure 7 shows also the effect of memory on some relevant quiescent asymmetric rules. Rule 2 shifts a single site live cell one space at every time-step in the ahistoric model; with the pattern dying at T D 4. This evolution is common to all rules that just shift a single site cell without increasing the number of living cells at T D 2, this is the case of the important rules 184 and 226. The patterns generated by


Cellular Automata with Memory, Figure 6 Evolution of the density starting at random in elementary legal rules. Color code: blue → full memory, black → α = 0.8, red → ahistoric model

Cellular Automata with Memory, Figure 7 Legal (first row of patterns) and quiescent asymmetric elementary rules significantly affected by the mode of the three last states of memory

rules 6 and 14 are rectified by memory (in the sense that the lines in the spatio-temporal pattern acquire a smaller slope) in such a way that the total number of live cells in the historic and ahistoric spatio-temporal patterns is the same. Again, the historic patterns of the remaining rules in Fig. 7 seem, as a rule, like the ahistoric ones compressed [22].
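The identification of τ = 3 trailing-mode memory with elementary rule 232 is easy to verify exhaustively (our code): the mode of three binary states is the majority, and rule 232's lookup table is exactly the majority function.

```python
def mode3(a, b, c):
    """Most frequent of three binary states (the majority)."""
    return 1 if a + b + c >= 2 else 0

def rule232(a, b, c):
    """Elementary rule 232: the bit of 232 selected by the triplet (a, b, c)."""
    return (232 >> (a << 2 | b << 1 | c)) & 1

triplets = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(all(mode3(*t) == rule232(*t) for t in triplets))  # True
```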


Cellular Automata with Memory, Figure 8 The rule 150 with elementary rules up to R = 125 as memory


Cellular Automata with Memory, Figure 9 The parity rule with elementary rules as memory. Evolution from T = 4 to 15 in the von Neumann neighborhood starting from a single site live cell

Elementary rules (ER, denoted f) can in turn act as memory rules: s_i^{(T)} = f(σ_i^{(T−2)}, σ_i^{(T−1)}, σ_i^{(T)}). Figure 8 shows the effect of ER memories up to R = 125 on rule 150, starting from a single site live cell, up to T = 13. The effect of ER memories with R > 125 on rule 150, as well as on rule 90, is shown in [23]. In the latter case, complementary memory rules (rules whose rule numbers add up to 255) have the same effect on rule 90 (regardless of the role played by the three last states in f and of the initial configuration). In the ahistoric scenario, rules 90 and 150 are linear (or additive): any initial pattern can be decomposed into the superposition of patterns from a single site seed; each of these configurations can be evolved independently and the results superposed (modulo two) to obtain the final complete pattern. The additivity of rules 90 and 150 remains in the historic model with linear memory rules.
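A sketch of ER-as-memory follows (our code; padding the first two time steps with the initial configuration is an implementation choice of this sketch). As a sanity check: elementary rule 170 maps (a, b, c) to c, so using it as the memory rule features each cell by its most recent state and must reproduce the ahistoric evolution exactly — here for rule 150.

```python
def evolve(rule, memory_rule, x0, steps):
    """1D elementary CA `rule` (periodic boundary) in which each cell is
    featured by applying the elementary rule `memory_rule` f to its own
    three last states: s_i = f(sigma_i^(T-2), sigma_i^(T-1), sigma_i^(T))."""
    bit = lambda R, a, b, c: (R >> (a << 2 | b << 1 | c)) & 1
    N = len(x0)
    history = [list(x0)]
    for _ in range(steps):
        h = history[-3:]
        h = [h[0]] * (3 - len(h)) + h          # pad the first two steps
        s = [bit(memory_rule, h[0][i], h[1][i], h[2][i]) for i in range(N)]
        history.append([bit(rule, s[i - 1], s[i], s[(i + 1) % N])
                        for i in range(N)])
    return history

x0 = [0] * 15
x0[7] = 1
with_memory = evolve(150, 170, x0, 6)   # rule 170 as memory = ahistoric model
```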

Figure 9 shows the effect of elementary rules as memory on the 2D parity rule with the von Neumann neighborhood, starting from a single site live cell; the figure shows the patterns from T = 4 onwards. The consideration of CA rules as memory induces a fairly unexplored explosion of new patterns.

CA with Three States

This section deals with CA with three possible values at each site (k = 3), denoted {0, 1, 2}, so the rounding mechanism is implemented by comparing the unrounded weighted mean m to the hallmarks 0.5 and 1.5, assigning the last state in case of equality to either of these values. Thus,

  s^{(T)} = 0 if m^{(T)} < 0.5 ,
  s^{(T)} = 1 if 0.5 < m^{(T)} < 1.5 ,
  s^{(T)} = 2 if m^{(T)} > 1.5 ,

and

  s^{(T)} = σ^{(T)} if m^{(T)} = 0.5 or m^{(T)} = 1.5 .
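The three-state rounding can be written directly (the function name is ours):

```python
def featured_state_k3(m, last):
    """Round the weighted mean m to a featured state in {0, 1, 2}:
    hallmarks at 0.5 and 1.5, ties keeping the last state sigma^(T)."""
    if m == 0.5 or m == 1.5:
        return last
    return 0 if m < 0.5 else (1 if m < 1.5 else 2)

print(featured_state_k3(0.4, 2), featured_state_k3(1.5, 0), featured_state_k3(1.7, 0))  # 0 0 2
```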


Cellular Automata with Memory, Figure 10 Parity k = 3 rules starting from a single σ = 1 seed. The red cells are at state 1, the blue ones at state 2

In the most unbalanced cell dynamics, historic memory takes effect after time step T only if α > α_T, with 3α_T^T − 4α_T + 1 = 0, which in the temporal limit becomes −4α + 1 = 0, i.e. α = 0.25. In general, in CA with k states (termed 0 to k − 1), the characteristic equation at T is (2k − 3)α_T^T − 2(k − 1)α_T + 1 = 0, which becomes −2(k − 1)α + 1 = 0 in the temporal limit. It is then concluded that memory does not affect the scenario if α ≤ α^{(k)} = 1/(2(k − 1)). We study first totalistic rules, σ_i^{(T+1)} = φ(σ_{i−1}^{(T)} + σ_i^{(T)} + σ_{i+1}^{(T)}), characterized by a sequence of ternary values (β_s) associated with each of the seven possible values of the sum s of the neighbors, (β₆, β₅, β₄, β₃, β₂, β₁, β₀), with associated rule number R = Σ_{s=0}^{6} β_s 3^s ∈ [0, 2186]. Figure 10 shows the effect of memory on quiescent (β₀ = 0) parity rules, i.e. rules with β₁, β₃ and β₅ non-null and β₂ = β₄ = β₆ = 0. Patterns are shown up to T = 26. The pattern for α = 0.3 is shown to test its proximity to the ahistoric one (recall that if α ≤ 0.25 memory takes no effect). Starting with a single site seed it can be concluded, regarding proper three-state rules such as those in Fig. 10, that: (i) as an overall rule, the patterns become more expanded as less historic memory is retained (smaller α); this characteristic inhibition-of-growth effect of memory is traced on rules 300 and 543 in Fig. 10; (ii) the transition from the fully historic to the ahistoric scenario tends to be gradual in regard to the amplitude of the spatio-temporal patterns, although their composition can differ notably, even at close α values; (iii) in contrast to the two-state scenario, memory fires the pattern of some three-state rules that die out in the ahistoric model, and no rule with memory dies out. Thus, the effect of memory on rules 276, 519, 303 and 546 is somewhat unexpected: they die out at α ≤ 0.3, but at α = 0.4 the pattern expands, the expansion being inhibited (in Fig. 10) only at α ≥ 0.8. This activation under memory of rules that die at T = 3 in the ahistoric model is unfeasible in the k = 2 scenario. The features of the evolving patterns starting from a single seed in Fig. 10 are qualitatively reflected when starting at random, as shown with rule 276 in Fig. 11, which is also activated (even at α = 0.3) when starting at random. The effect of average memory (α- and integer-based models, unlimited and limited trailing memory, even τ = 2) and that of the mode of the last three states has been studied in [21]. When working with more than three states, averaging has the inherent tendency to bias the featuring state toward the mean value 1; that explains the red shift in the previous figures. This led us to focus on a much fairer memory mechanism, the mode, in what follows. Mode memory allows for the manipulation of pure symbols, avoiding any computing/arithmetics. In excitable CA, the three states are featured: resting 0, excited 1 and refractory 2. State transitions from excited to refractory and from refractory to resting are unconditional; they take place independently of a cell's neighborhood state: σ_i^{(T)} = 1 → σ_i^{(T+1)} = 2, σ_i^{(T)} = 2 → σ_i^{(T+1)} = 0. In [15] the excitation rule adopts a Pavlovian phenomenon of defensive inhibition: when the strength of the applied stimulus exceeds a certain limit the


Cellular Automata with Memory, Figure 11 The k = 3, R = 276 rule starting at random

system "shuts down"; this can be naively interpreted as an in-built protection against energy loss and exhaustion. To simulate the phenomenon of defensive inhibition we adopt interval excitation rules [2]: a resting cell becomes excited only if one or two of its neighbors are excited, σ_i^(T) = 0 → σ_i^(T+1) = 1 if #{j ∈ N_i : σ_j^(T) = 1} ∈ {1, 2} [3]. Figure 12 shows the effect of memory of the mode of the last three time steps on the defensive-inhibition CA rule with the Moore neighborhood, starting from a simple configuration. At T = 3 the outer excited cells of the actual pattern are featured not as excited but as resting cells (twice resting versus once excited), and the series of evolving patterns with memory diverges from the ahistoric evolution at T = 4, becoming less expanded. Again, memory tends to restrain the evolution. The effect of memory on the beehive rule, a totalistic two-dimensional CA rule with three states implemented in the hexagonal tessellation [57], has been explored in [13].
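The mode-memory mechanism and the defensive-inhibition rule just described can be sketched as follows. This is a minimal illustration, not the article's code; it assumes ties in the mode are resolved in favor of the most recent state, and treats cells outside the stored grid as resting (0).

```python
def mode3(states):
    """Most frequent of the (up to) last three states; ties go to the most
    recent one (an assumed convention, in line with the text's tie-breaking)."""
    last = states[-3:]
    counts = {s: last.count(s) for s in set(last)}
    best = max(counts.values())
    for s in reversed(last):
        if counts[s] == best:
            return s

def excitable_step(grid, hist):
    """One step of the defensive-inhibition rule on featured (mode) states.
    grid: {(x, y): state}; hist: {(x, y): [past states]}."""
    feat = {c: mode3(h) for c, h in hist.items()}
    new = {}
    for (x, y), s in grid.items():
        if s == 1:
            new[(x, y)] = 2            # excited -> refractory (unconditional)
        elif s == 2:
            new[(x, y)] = 0            # refractory -> resting (unconditional)
        else:                          # resting -> excited iff 1 or 2 excited
            excited = sum(feat.get((x + dx, y + dy), 0) == 1
                          for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                          if (dx, dy) != (0, 0))
            new[(x, y)] = 1 if excited in (1, 2) else 0
    return new

grid = {(x, y): 0 for x in range(-2, 3) for y in range(-2, 3)}
grid[(0, 0)] = 1                       # single excited seed
hist = {c: [s] for c, s in grid.items()}
nxt = excitable_step(grid, hist)
assert nxt[(0, 0)] == 2 and nxt[(1, 0)] == 1 and nxt[(2, 2)] == 0
```

At T = 1 the featured states coincide with the actual ones, so the seed excites its Moore neighbors; the divergence induced by the mode appears once three time steps have accumulated.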

Cellular Automata with Memory, Figure 12 Effect of mode memory on the defensive inhibition CA rule

Reversible CA

The second-order in time implementation based on subtraction modulo the number of states (denoted ⊖), σ_i^(T+1) = φ(σ_j^(T) ∈ N_i) ⊖ σ_i^(T−1), readily reverses as σ_i^(T−1) = φ(σ_j^(T) ∈ N_i) ⊖ σ_i^(T+1). To preserve the reversible feature, memory has to be endowed only in the pivotal component of the rule transition, so: σ_i^(T−1) = φ(s_j^(T) ∈ N_i) ⊖ σ_i^(T+1). For reversing from T it is necessary to know not only σ_i^(T) and σ_i^(T+1) but also ω_i^(T), to be compared to Ω(T), in order to obtain:

  s_i^(T) = 0 if 2ω_i^(T) < Ω(T)
  s_i^(T) = σ_i^(T+1) if 2ω_i^(T) = Ω(T)
  s_i^(T) = 1 if 2ω_i^(T) > Ω(T).

Then, to progress in the reversing, to obtain s_i^(T−1) = round(ω_i^(T−1)/Ω(T−1)) it is necessary to calculate ω_i^(T−1) = (ω_i^(T) − σ_i^(T))/α. But in order to avoid dividing by the memory factor (recall that operations with real numbers are not exact in computer arithmetic), it is preferable to work with Δ_i^(T−1) = ω_i^(T) − σ_i^(T), and to compare these values to Γ(T−1) = Σ_{t=1}^{T−1} α^{T−t}. This leads to:

  s_i^(T−1) = 0 if 2Δ_i^(T−1) < Γ(T−1)
  s_i^(T−1) = σ_i^(T) if 2Δ_i^(T−1) = Γ(T−1)
  s_i^(T−1) = 1 if 2Δ_i^(T−1) > Γ(T−1).
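The second-order construction itself is easy to exercise in code. The sketch below (an illustration under stated assumptions, not the article's code) implements the ahistoric reversible parity rule for k = 2 on a ring, where ⊖ reduces to XOR, and checks that running the same formula on the time-mirrored pair of configurations recovers the initial condition exactly.

```python
def second_order_step(prev, curr):
    """sigma_i(T+1) = (sigma_{i-1} + sigma_i + sigma_{i+1} mod 2) "minus"
    sigma_i(T-1) mod 2 on a ring; the identical formula also runs backwards."""
    n = len(curr)
    return [curr[(i - 1) % n] ^ curr[i] ^ curr[(i + 1) % n] ^ prev[i]
            for i in range(n)]

a = [0, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 1, 1, 1, 0, 0, 0]
states = [a, b]
for _ in range(5):                       # run forward five steps
    states.append(second_order_step(states[-2], states[-1]))
back = [states[-1], states[-2]]          # swap the last pair to reverse time
for _ in range(5):
    back.append(second_order_step(back[-2], back[-1]))
assert back[-1] == a and back[-2] == b   # the initial pair is recovered
```

Embedding memory while keeping this property is exactly the point of applying the featured states only to the pivotal φ component, as described above.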


Cellular Automata with Memory, Figure 13 The reversible parity rule with memory

In general: Δ_i^(T−τ) = Δ_i^(T−τ+1)·α^(−1) − σ_i^(T−τ+1); Γ(T−τ) = (Γ(T−τ+1) − α)·α^(−1).

Figure 13 shows the effect of memory on the reversible parity rule starting from a single live cell, i.e. the scenario of Figs. 2 and 3 with the reversible qualification. As expected, the simulations corresponding to α = 0.6 or below show the ahistoric pattern at T = 4, whereas from α = 0.7 memory leads to a different pattern, and the patterns at T = 5 for α = 0.54 and α = 0.55 differ. Again, in the reversible formulation with memory, (i) the configuration of the patterns is notably altered, (ii) the speed of diffusion of the affected area is notably reduced, even by minimal memory (α = 0.501), and (iii) high levels of memory tend to freeze the dynamics from the early time steps.

We have studied the effect of memory in the reversible formulation of CA in many scenarios, e.g., totalistic k = r = 2 rules [7], or rules with three states [21]. Reversible systems are of interest since they preserve information and energy and allow unambiguous backtracking. They are studied in computer science in order to design computers which would consume less energy [51]. Reversibility is also an important issue in fundamental physics [31,41,52,53]. Gerard 't Hooft, in a speculative paper [34], suggests that a suitably defined deterministic, local reversible CA might provide a viable formalism for constructing field theories on a Planck scale. Svozil [50] also asks for changes in the underlying assumptions of current field theories in order to make their discretization appear more CA-like. Applications of reversible CA with memory in cryptography are being scrutinized [30,42].

Cellular Automata with Memory, Figure 14 The parity rule with four inputs: effect of memory and random rewiring. Distance between two consecutive patterns in the ahistoric model (red) and memory models of α levels 0.6, 0.7, 0.8, 0.9 (dotted) and 1.0 (blue)

Heterogeneous CA

CA on networks have arbitrary connections but, as in proper CA, the transition rule is identical for all cells. This generalization of the CA paradigm addresses the intermediate class between CA and Boolean networks (BN, considered in the following section), in which rules may be different at each site. Among networks, two topological extremes exist, random and regular, which display opposite geometric properties. Random networks have low clustering coefficients and short average path lengths between nodes (the latter commonly known as the small-world property). Regular graphs, on the other hand, have large average path lengths between nodes and high clustering coefficients. In an attempt to build a network with the characteristics observed in real networks, a large clustering coefficient together with the small-world property, Watts and Strogatz (WS, [54]) proposed a model built by randomly rewiring a regular lattice. Thus, the WS model interpolates between regular and random networks, taking a single new parameter, the random rewiring degree, i.e. the probability that any node redirects a connection, randomly, to any other. The WS model displays the high clustering coefficient common to regular lattices as well as the small-world property (which has been related to faster information transmission). The long-range links introduced by the randomization procedure dramatically reduce the diameter of the network, even when very few links are rewired.

Figure 14 shows the effect of memory and topology on the parity rule with four inputs in a lattice of size 65 × 65 with periodic boundary conditions, starting at random. As expected, memory depletes the Hamming distance between two consecutive patterns relative to the ahistoric model, particularly when the degree of rewiring is high. With full memory, quasi-oscillators tend to appear. As a rule, the higher the curve the lower the memory factor α, but in the particular case of the regular lattice (and the lattice with 10% rewiring), the evolution of the distance in the full-memory model turns out rather atypical, as it stays above that of some memory models with lower α parameters. Figure 15 shows the evolution of the damage spread when reversing the initial state of the 3 × 3 central cells in the initial scenario of Fig. 14. The fraction of cells with the state reversed is plotted in the regular and 10%-rewiring scenarios. The plots corresponding to higher


Cellular Automata with Memory, Figure 15 Damage up to T = 100 in the parity CA of Fig. 14

Cellular Automata with Memory, Figure 16 Relative Hamming distance between two consecutive patterns. Boolean network with totalistic K = 4 rules in the scenario of Fig. 14

rates of rewiring are very similar to that of the 10% case in Fig. 15. Damage spreads fast as soon as rewiring is present, even to a small extent.

Boolean Networks

In Boolean networks (BN, [38]), in contrast to canonical CA, cells may have arbitrary connections and rules may be different at each site. We work with totalistic rules: σ_i^(T+1) = φ_i(Σ_{j∈N_i} s_j^(T)). The main features of the effect of memory in Fig. 14 are preserved in Fig. 16: (i) the ordering of the historic networks tends to be stronger with a high memory factor, (ii) with full memory, quasi-oscillators appear (it seems that full memory tends to induce oscillation), (iii) in the particular case of the regular graph (and, to a lesser extent, of the networks with low rewiring), the evolution of the full-memory model turns out rather atypical, as it stays above that of some of the memory models with lower α parameters. The relative Hamming distance between the ahistoric patterns and those of historic rewiring tends to be fairly constant, around 0.3, after a very short initial transition period.

Figure 17 shows the evolution of the damage when reversing the initial state of the 3 × 3 central cells. As a rule, in every frame, corresponding to increasing rates of random rewiring, the higher the curve the lower the memory factor α. The damage-vanishing effect induced by memory does appear in the regular scenario of Fig. 17, but only full memory controls the damage spreading when the rewiring degree is not high; the dynamics with the remaining α levels tend to the damage propagation that characterizes the ahistoric model. Thus, with up to 10% of connections rewired, full memory notably controls the


Cellular Automata with Memory, Figure 17 Evolution of the damage when reversing the initial state of the 3  3 central cells in the scenario of Fig. 16

spreading, but this control capacity tends to disappear with a higher percentage of rewired connections. In fact, with rewiring of 50% or higher, not even full memory is very effective in altering the final rate of damage, which tends to reach a plateau around 30% regardless of the scenario, a level notably coincident with the percolation threshold for site percolation in the simple cubic lattice and with the critical point of the nearest-neighbor Kauffman model on the square lattice [49]: 0.31.
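The network experiments above can be sketched compactly. The code below is an illustrative reconstruction, not the article's code: it builds a one-dimensional ring with four inputs per cell, rewires each link with probability p in the WS style, and iterates the totalistic parity rule on α-featured states; all parameters (n = 64, α = 0.9, p = 0.1, 20 steps) are arbitrary choices for the sketch.

```python
import random

def rewire(n, p, rng):
    """Each cell of a ring starts with four inputs (two on each side); every
    link is redirected to a random cell with probability p (WS-style rewiring)."""
    nbrs = []
    for i in range(n):
        regular = [(i - 2) % n, (i - 1) % n, (i + 1) % n, (i + 2) % n]
        nbrs.append([rng.randrange(n) if rng.random() < p else j
                     for j in regular])
    return nbrs

rng = random.Random(1)
n, alpha, p = 64, 0.9, 0.1               # illustrative parameters
nbrs = rewire(n, p, rng)
state = [rng.randrange(2) for _ in range(n)]
omega, Omega = [float(s) for s in state], 1.0
for _ in range(20):
    # featured states from the alpha-weighted sums (ties keep the current state)
    feat = [0 if 2 * w < Omega else 1 if 2 * w > Omega else s
            for w, s in zip(omega, state)]
    state = [sum(feat[j] for j in nbrs[i]) % 2 for i in range(n)]  # parity
    omega = [alpha * w + s for w, s in zip(omega, state)]
    Omega = alpha * Omega + 1.0
assert len(state) == n and set(state) <= {0, 1}
```

Tracking the Hamming distance between consecutive `state` vectors in such a loop reproduces the kind of curves plotted in Figs. 14 and 16.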

Structurally Dynamic CA

Structurally dynamic cellular automata (SDCA) were suggested by Ilachinski and Halpern [36]. The essential new feature of this model is that the connections between cells are allowed to change according to rules similar in nature to the state-transition rules associated with conventional CA. This means that, given certain conditions specified by the link-transition rules, links between cells may be created and destroyed; the neighborhood of each cell is dynamic, so the state and link configurations of an SDCA are both dynamic and continually interacting. If cells are numbered 1 to N, their connectivity is specified by an N × N connectivity matrix in which λ_ij = 1 if cells i and j are connected, 0 otherwise. So now N_i^(T) = {j : λ_ij^(T) = 1} and σ_i^(T+1) = φ(σ_j^(T) ∈ N_i^(T)). The geodesic distance between two cells i and j, δ_ij, is defined as the number of links in the shortest path between i and j. Thus, i and j are direct neighbors if δ_ij = 1, and are next-nearest neighbors if δ_ij = 2, so NN_i^(T) = {j : δ_ij^(T) = 2}. There are two types of link-transition functions in an SDCA: couplers and decouplers; the former add new links, the latter remove links. The coupler and decoupler sets determine the link-transition rule: λ_ij^(T+1) = ψ(λ_ij^(T), σ_i^(T), σ_j^(T)).

Instead of introducing the formalism of the SDCA, we deal here with just one example, in which the decoupler rule removes all links connected to cells in which both values are zero (λ_ij^(T) = 1 → λ_ij^(T+1) = 0 iff σ_i^(T) + σ_j^(T) = 0) and the coupler rule adds links between all next-nearest-neighbor sites in which both values are one (λ_ij^(T) = 0 → λ_ij^(T+1) = 1 iff σ_i^(T) + σ_j^(T) = 2 and j ∈ NN_i^(T)). The SDCA with these transition rules for connections, together with the parity rule for cell states, is implemented in Fig. 18, in which the initial Euclidean lattice with four neighbors (so the generic cell has eight next-nearest neighbors) is seeded with a 3 × 3 block of ones. After the first iteration, most of the lattice structure has decayed as an effect of the decoupler rule, so that the active cells and links are confined to a small region. After T = 6, the link and value structures become periodic, with period two.

Memory can be embedded in links in a similar manner as in state values, so the link between any two cells is featured by a mapping of its previous link values: l_ij^(T) = l(λ_ij^(1), …, λ_ij^(T)). The distance between two cells in the historic model, d_ij, is defined in terms of l instead of λ values, so that i and j are direct neighbors if d_ij = 1 and next-nearest neighbors if d_ij = 2. Now: N_i^(T) = {j : d_ij^(T) = 1} and NN_i^(T) = {j : d_ij^(T) = 2}. Generalizing the approach


Cellular Automata with Memory, Figure 18 The SDCA described in the text, up to T = 6

Cellular Automata with Memory, Figure 19 The SD cellular automaton introduced in the text with weighted memory of factor α. Evolution from T = 4 up to T = 9, starting as in Fig. 18

to embedded memory applied to states, the unchanged transition rules (φ and ψ) operate on the featured link and cell-state values: σ_i^(T+1) = φ(s_j^(T) ∈ N_i^(T)); λ_ij^(T+1) = ψ(l_ij^(T), s_i^(T), s_j^(T)). Figure 19 shows the effect of α-memory on the cellular automaton introduced above, starting as in Fig. 18. The effect of memory on SDCA in the hexagonal and triangular tessellations is scrutinized in [11].

A plausible wiring dynamics when dealing with excitable CA is that in which the decoupler rule removes all links connected to cells in which both values are in the refractory state (λ_ij^(T) = 1 → λ_ij^(T+1) = 0 iff σ_i^(T) = σ_j^(T) = 2) and the coupler rule adds links between all next-nearest-neighbor sites in which both values are excited (λ_ij^(T) = 0 → λ_ij^(T+1) = 1 iff σ_i^(T) = σ_j^(T) = 1 and j ∈ NN_i^(T)). In the SDCA of Fig. 20, the transition rule for cell states is that of the generalized defensive-inhibition rule: a resting cell is excited if the ratio of its excited connected neighbors to its total number of connected neighbors lies in the interval [1/8, 2/8]. The initial scenario of Fig. 20 is that of Fig. 12 with the wiring network revealed, that of a Euclidean lattice with eight neighbors, in which the generic

cell has 16 next-nearest neighbors. No decoupling occurs at the first iteration in Fig. 20, but the excited cells generate new connections, most of them lost, together with some of the initial ones, at T = 3. The excited cells at T = 3 generate a crown of new connections at T = 4. Figure 21 shows the ahistoric and mode-memory patterns at T = 20. The figure makes apparent the preserving effect of memory.

Fredkin's reversible construction is feasible in the SDCA scenario by extending the ⊖ operation also to links: λ_ij^(T+1) = ψ(λ_ij^(T), σ_i^(T), σ_j^(T)) ⊖ λ_ij^(T−1). These automata may be endowed with memory as: σ_i^(T+1) = φ(s_j^(T) ∈ N_i^(T)) ⊖ σ_i^(T−1); λ_ij^(T+1) = ψ(l_ij^(T), s_i^(T), s_j^(T)) ⊖ λ_ij^(T−1) [12]. The SDCA seems particularly appropriate for modeling human brain function, in which the relevant role of memory is apparent: updating links between cells imitates the variation of synaptic connections between the neurons represented by the cells. Models similar to SDCA have been adopted to build a dynamical-network approach to quantum space-time physics [45,46]. Reversibility is an important issue at such a fundamental physics level. Technical applications of SDCA may also be traced [47].
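The example coupler/decoupler dynamics is small enough to sketch directly. The code below is an illustration, not the article's implementation; it assumes the parity rule takes the parity of the cell together with its currently linked neighbors (one plausible reading of φ), and represents links as frozensets so that the connectivity matrix stays symmetric by construction.

```python
def sdca_step(state, links):
    """One synchronous step of the example SDCA. States follow the parity
    rule over the cell and its currently linked neighbors; the decoupler
    cuts links whose two end cells are both 0, and the coupler joins
    next-nearest neighbors whose end cells are both 1."""
    nbrs = {i: set() for i in state}
    for l in links:
        i, j = tuple(l)
        nbrs[i].add(j)
        nbrs[j].add(i)
    def nnn(i):                      # cells at geodesic distance exactly 2
        d2 = set()
        for j in nbrs[i]:
            d2 |= nbrs[j]
        return d2 - nbrs[i] - {i}
    new_state = {i: (state[i] + sum(state[j] for j in nbrs[i])) % 2
                 for i in state}
    new_links = set(links)
    for l in links:
        i, j = tuple(l)
        if state[i] + state[j] == 0:
            new_links.discard(l)                     # decoupler
    for i in state:
        for j in nnn(i):
            if state[i] == 1 and state[j] == 1:
                new_links.add(frozenset((i, j)))     # coupler
    return new_state, new_links

# path 0-1-2 with states 1, 0, 1: the coupler joins the two excited ends
new_state, new_links = sdca_step({0: 1, 1: 0, 2: 1},
                                 {frozenset((0, 1)), frozenset((1, 2))})
assert new_state == {0: 1, 1: 0, 2: 1}
assert frozenset((0, 2)) in new_links
# two quiescent linked cells: the decoupler removes their link
_, quiet_links = sdca_step({0: 0, 1: 0}, {frozenset((0, 1))})
assert frozenset((0, 1)) not in quiet_links
```

Both link rules are evaluated against the configuration at T, as in the text, before the new link set takes effect at T + 1.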


Cellular Automata with Memory, Figure 20 The k = 3 SD cellular automaton described in the text, up to T = 4

Cellular Automata with Memory, Figure 21 The SD cellular automaton starting as in Fig. 20, at T = 20, with no memory (left) and mode memory in both cell states and links

Anyway, besides their potential applications, SDCA with memory have an aesthetic and mathematical interest of their own [1,35]. Nevertheless, it seems plausible that further study of SDCA (and of lattice-gas automata with dynamical geometry [40]) with memory will turn out to be profitable.

Memory in Other Discrete-Time Contexts

Continuous-Valued CA

The mechanism of implementation of memory adopted here, keeping the transition rule unaltered but applying it to a function of previous states, can be adopted in any spatialized dynamical system. Thus, historic memory can be embedded in:

• Continuous-valued CA (or coupled map lattices, in which the state variable ranges in R and the transition rule φ is a continuous function [37]), just by considering m instead of σ in the application of the updating rule: σ_i^(T+1) = φ(m_j^(T) ∈ N_i). An elementary CA of this kind with memory would be [20]: σ_i^(T+1) = (1/3)(m_{i−1}^(T) + m_i^(T) + m_{i+1}^(T)).

• Fuzzy CA, a sort of continuous CA with states ranging in the real interval [0,1]. An illustration of the effect of memory in fuzzy CA is given in [17]. The illustration operates on the elementary rule 90, σ_i^(T+1) = (σ_{i−1}^(T) ∧ (¬σ_{i+1}^(T))) ∨ ((¬σ_{i−1}^(T)) ∧ σ_{i+1}^(T)), which after fuzzification (a ∨ b → min(1, a + b); a ∧ b → ab; ¬a → 1 − a) yields σ_i^(T+1) = σ_{i−1}^(T) + σ_{i+1}^(T) − 2σ_{i−1}^(T)σ_{i+1}^(T); thus, incorporating memory: σ_i^(T+1) = m_{i−1}^(T) + m_{i+1}^(T) − 2m_{i−1}^(T)m_{i+1}^(T).

• Quantum CA, such, for example, as the simple 1D quantum CA models introduced in [32], σ_j^(T+1) = (1/√N)(iδ σ_{j−1}^(T) + σ_j^(T) + iδ* σ_{j+1}^(T)), which would become, with memory [20]: σ_j^(T+1) = (1/√N)(iδ m_{j−1}^(T) + m_j^(T) + iδ* m_{j+1}^(T)).
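The fuzzy rule 90 case is the easiest to exercise numerically. The sketch below (an illustration, not the article's code) iterates the fuzzified rule on α-weighted mean states with periodic boundaries; the memory factor α = 0.8 and the five-cell lattice are arbitrary choices.

```python
def fuzzy90_step(m):
    """Fuzzified rule 90 on featured states m (periodic boundary):
    s_i' = m_{i-1} + m_{i+1} - 2 m_{i-1} m_{i+1}."""
    n = len(m)
    return [m[i - 1] + m[(i + 1) % n] - 2 * m[i - 1] * m[(i + 1) % n]
            for i in range(n)]

# on crisp 0/1 states the fuzzy rule reduces to ordinary rule 90 (XOR):
assert fuzzy90_step([0.0, 0.0, 1.0, 0.0, 0.0]) == [0.0, 1.0, 0.0, 1.0, 0.0]

alpha = 0.8                              # illustrative memory factor
state = [0.0, 0.0, 1.0, 0.0, 0.0]
omega, Omega = state[:], 1.0
for _ in range(3):
    m = [w / Omega for w in omega]       # weighted mean of past states
    state = fuzzy90_step(m)
    omega = [alpha * w + s for w, s in zip(omega, state)]
    Omega = alpha * Omega + 1.0
assert all(0.0 <= s <= 1.0 for s in state)   # states stay in [0, 1]
```

Since a + b − 2ab = a(1 − b) + b(1 − a) maps [0,1]² into [0,1], the historic dynamics remain well defined for any α.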

Spatial Prisoner's Dilemma

The Prisoner's Dilemma (PD) is a game played by two players (A and B), who may choose either to cooperate (C, or 1) or to defect (D, or 0). Mutual cooperators each score the reward R; mutual defectors score the punishment P; D scores the temptation T against C, who scores S (the sucker's payoff) in such an encounter. Provided that T > R > P > S, mutual defection is the only equilibrium strategy pair. Thus, in a single round both players are penalized instead of both being rewarded, but cooperation may be rewarded in an iterated (or spatial) formulation. The game is simplified (while preserving its essentials) if P = S = 0. Choosing R = 1, the model has only one parameter: the temptation T = b. In the spatial version of the PD, each player occupies a site (i, j) in a 2D lattice. In each generation the payoff of a given individual, p_{i,j}^(T), is the sum over all interactions with the eight nearest neighbors and with its own site. In the next generation, an individual cell is assigned the decision, d_{i,j}^(T), that received the highest payoff among all the cells of its Moore neighborhood. In case of a tie, the cell retains its choice. The spatialized PD (SPD for short) has proved to be a promising tool to explain how cooperation can hold out against the ever-present threat of exploitation [43], a task that presents problems in the classic Darwinian struggle-for-survival framework.

When dealing with the SPD, memory can be embedded not only in choices but also in rewards. Thus, in the historic model we deal with, at T: (i) the payoffs coming from previous rounds are accumulated (Γ_{i,j}^(T)), and (ii) players are featured by a summary of past decisions (δ_{i,j}^(T)). Again, in each round or generation, a given cell plays with each of its eight neighbors and itself, the decision δ of the cell of the neighborhood with the highest Γ being adopted.
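One generation of this historic SPD can be sketched as follows. This is an illustrative reconstruction, not the article's code; it assumes a fixed (non-periodic) boundary, with cells outside the stored grid simply absent, and the same payoff simplification as in the text (R = 1, T = b, S = P = 0).

```python
def spd_step(d, gamma_prev, alpha, b):
    """One generation of the SPD with alpha-weighted accumulated payoffs.
    d: {(x, y): choice} with C = 1, D = 0; gamma_prev: Gamma at T-1."""
    def cells9(c):
        x, y = c
        return [(x + i, y + j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    gamma = {}
    for c in d:
        p = 0.0
        for o in cells9(c):            # eight neighbors plus the cell itself
            if o in d:
                if d[c] == 1 and d[o] == 1:
                    p += 1.0           # reward R = 1
                elif d[c] == 0 and d[o] == 1:
                    p += b             # temptation T = b
        gamma[c] = alpha * gamma_prev.get(c, 0.0) + p
    new_d = {}
    for c in d:
        around = [o for o in cells9(c) if o in d]
        best = max(gamma[o] for o in around)
        new_d[c] = (d[c] if gamma[c] == best       # tie: retain own choice
                    else d[max(around, key=lambda o: gamma[o])])
    return new_d, gamma

d = {(x, y): 1 for x in range(5) for y in range(5)}
d[(2, 2)] = 0                          # single defector, b > 9/8
new_d, gamma = spd_step(d, {}, 0.0, 1.85)
assert abs(gamma[(2, 2)] - 8 * 1.85) < 1e-9   # the defector earns 8b
assert new_d[(1, 2)] == 0              # its neighbors defect at T = 2
assert new_d[(0, 0)] == 1              # distant cells keep cooperating
```

With alpha = 0 the step reduces to the ahistoric Nowak-May dynamics; increasing alpha accumulates the cooperators' past rewards and, as discussed next, restrains the spread of defection.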
This approach to modeling memory has been rather neglected, the usual one being to design strategies that specify the choice for every possible outcome in the sequence of historic choices recalled [33,39]. Table 1 shows the initial scenario starting from a single defector if 8b > 9, i.e. b > 1.125, which means that the neighbors of the initial defector become defectors at T = 2. Nowak and May paid particular attention in their seminal papers to b = 1.85, a high but not excessive temptation value which leads to complex dynamics. After T = 2, defection can advance to a 5 × 5 square or be

Cellular Automata with Memory, Table 1 Choices at T = 1 and T = 2; accumulated payoffs after T = 1 and T = 2, starting from a single defector in the SPD, b > 9/8

d(1) = δ(1):
1 1 1 1 1
1 1 1 1 1
1 1 0 1 1
1 1 1 1 1
1 1 1 1 1

p(1) = Γ(1):
9 9 9  9 9
9 8 8  8 9
9 8 8b 8 9
9 8 8  8 9
9 9 9  9 9

d(2) = δ(2):
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 0 0 0 1 1
1 1 0 0 0 1 1
1 1 0 0 0 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1

Γ(2) = αp(1) + p(2):
9α+9  9α+9  9α+9  9α+9  9α+9  9α+9  9α+9
9α+9  9α+8  9α+7  9α+6  9α+7  9α+8  9α+9
9α+9  9α+7  8α+5b 8α+3b 8α+5b 9α+7  9α+9
9α+9  9α+6  8α+3b 8bα   8α+3b 9α+6  9α+9
9α+9  9α+7  8α+5b 8α+3b 8α+5b 9α+7  9α+9
9α+9  9α+8  9α+7  9α+6  9α+7  9α+8  9α+9
9α+9  9α+9  9α+9  9α+9  9α+9  9α+9  9α+9
restrained as a 3 × 3 square, depending on the comparison of 8α + 5 × 1.85 (the maximum Γ value of the recent defectors) with 9α + 9 (the Γ value of the non-affected players). As 8α + 5 × 1.85 = 9α + 9 gives α = 0.25, if α > 0.25 defection remains confined to a 3 × 3 square at T = 3. Here we see the paradigmatic effect of memory: it tends to avoid the spread of defection.

If memory is limited to the last three iterations: Γ_{i,j}^(T) = α²p_{i,j}^(T−2) + αp_{i,j}^(T−1) + p_{i,j}^(T); m_{i,j}^(T) = (α²d_{i,j}^(T−2) + αd_{i,j}^(T−1) + d_{i,j}^(T))/(α² + α + 1); δ_{i,j}^(T) = round(m_{i,j}^(T)), with assignations at T = 2: Γ_{i,j}^(2) = αp_{i,j}^(1) + p_{i,j}^(2); δ_{i,j}^(2) = d_{i,j}^(2). Memory has a dramatic restrictive effect on the advance of defection, as shown in Fig. 22. This figure shows the frequency of cooperators (f) starting from a single defector and from a random configuration of defectors in a lattice of size 400 × 400 with periodic boundary conditions, when b = 1.85. When starting from a single defector, f at time step T is computed as the frequency of cooperators within the square of size (2(T − 1) + 1)² centered on the initial D site. The ahistoric plot reveals the convergence of f to 0.318 (which seems to be the same value regardless of the initial conditions [43]). Starting from a single defector (a), the model with small memory (α = 0.1) seems to reach a similar f value, but sooner and in a smoother way. The plot corresponding to α = 0.2 still shows an early decay in f that leads it to about 0.6, but higher memory-factor values lead f close to or over 0.9 very soon. Starting at random (b), the curves corresponding to


Cellular Automata with Memory, Figure 22 Frequency of cooperators (f) with memory of the last three iterations: a starting from a single defector, b starting at random (f(1) = 0.5). The red curves correspond to the ahistoric model, the blue ones to the full-memory model, and the remaining curves to values of α from 0.1 to 0.9 in 0.1 intervals, in which, as a rule, the higher the α the higher the f for any given T

0.1 ≤ α ≤ 0.6 (thus with no memory of choices) mimic the ahistoric curve, but with higher f; for α ≥ 0.7 (also memory of choices) the frequency of cooperators grows monotonically to reach almost full cooperation: D persists as scattered, unconnected small oscillators (D-blinkers), as shown in Fig. 23. Similar results are found for any temptation value in the parameter region 0.8 < b < 2.0, in which spatial chaos is characteristic in the ahistoric model. It is then concluded that short-type memory supports cooperation.

As a natural extension of the described binary model, the 0-1 assumption underlying the model can be relaxed by allowing for degrees of cooperation in a continuous-valued scenario. Denoting by x the degree of cooperation of player A and by y the degree of cooperation of player B, a consistent way to specify the payoff for values of x and y other than zero or one is simply to interpolate between the extreme payoffs of the binary case. Thus, the payoff that player A receives is:

  G_A(x, y) = (x, 1 − x) ( R S ; T P ) (y, 1 − y)′ ,

where (R S; T P) denotes the 2 × 2 payoff matrix with rows (R, S) and (T, P). In the continuous-valued formulation it is δ ≡ m, the historic assignation at T = 2 being δ_{i,j}^(2) = (αd_{i,j}^(1) + d_{i,j}^(2))/(α + 1). Table 2 illustrates the initial scenario starting from a single (full) defector. Unlike in the binary model, in which the initial defector never becomes a cooperator, the initial defector cooperates with degree α/(1 + α) at T = 3: its neighbors which received the highest accumulated payoff (those in the corners, with Γ(2) = 8α + 5b > 8bα) achieved this mean degree of cooperation after T = 2. Memory dramatically constrains the advance of defection in a smooth way, even at the low level α = 0.1. The effect appears much more homogeneous than in the binary model, with no special case for high values of α, as memory of decisions is always operative in the continuous-valued model [24]. The effect of unlimited trailing memory on the SPD has been studied in [5,6,7,8,9,10].

Cellular Automata with Memory, Figure 23 Patterns at T = 200 starting at random in the scenario of Fig. 22b
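The bilinear interpolation of the payoff can be written out directly; the sketch below (an illustration under the text's simplification R = 1, S = P = 0, T = b) checks that it reproduces the binary payoffs at the corners of the unit square.

```python
def payoff_A(x, y, b, R=1.0, S=0.0, P=0.0):
    """Bilinear interpolation of the PD payoff for cooperation degrees x, y:
    G_A(x, y) = (x, 1-x) [[R, S], [b, P]] (y, 1-y)'."""
    return x * (R * y + S * (1 - y)) + (1 - x) * (b * y + P * (1 - y))

b = 1.85
assert payoff_A(1, 1, b) == 1.0      # mutual cooperation: R
assert payoff_A(0, 1, b) == b        # full defector vs full cooperator: T = b
assert payoff_A(1, 0, b) == 0.0      # sucker's payoff S
assert abs(payoff_A(0.5, 0.5, b) - 0.25 * (1 + b)) < 1e-12
```

Interior values such as (0.5, 0.5) average all four corner payoffs with bilinear weights, which is what makes the continuous-valued SPD a consistent extension of the binary case.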


Cellular Automata with Memory, Table 2 Weighted mean degrees of cooperation after T = 2 and degrees of cooperation at T = 3, starting with a single defector in the continuous-valued SPD with b = 1.85 (a denotes α/(1 + α))

δ(2):
1 1 1 1 1
1 a a a 1
1 a 0 a 1
1 a a a 1
1 1 1 1 1

d(3) (α < 0.25):
1 1 1 1 1 1 1
1 a a a a a 1
1 a a a a a 1
1 a a a a a 1
1 a a a a a 1
1 a a a a a 1
1 1 1 1 1 1 1

d(3) (α > 0.25):
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 a a a 1 1
1 1 a a a 1 1
1 1 a a a 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1

Discrete-Time Dynamical Systems

Memory can be embedded in any model in which time plays a dynamical role. Thus, Markov chains p′_{T+1} = p′_T M become, with memory, p′_{T+1} = δ′_T M, with δ_T being a weighted mean of the probability distributions up to T: δ_T = δ(p_1, …, p_T). In such a scenario, even a minimal incorporation of memory notably alters the evolution of p [23]. Last but not least, conventional, non-spatialized, discrete dynamical systems become, with memory, x_{T+1} = f(m_T), with m_T an average of past values. As an overall rule, memory leads the dynamics to a fixed point of the map f [4].

We will introduce an example of this in the context of the PD game, in which players follow the so-called Paulov strategy: a Paulov player cooperates if and only if both players opted for the same alternative in the previous move. The name Paulov stems from the fact that this strategy embodies an almost reflex-like response to the payoff: it repeats its former move if it was rewarded by T or R, but switches behavior if it was punished by receiving only P or S. Coding cooperation as 1 and defection as 0, this strategy can be formulated in terms of the choices x of player A (Paulov) and y of player B as: x^(T+1) = 1 − |x^(T) − y^(T)|. The Paulov strategy has proved very successful in its contests with other strategies [44]. Let us give a simple example of this: suppose that player B adopts an Anti-Paulov strategy (which cooperates to the extent that Paulov defects), with y^(T+1) = 1 − |1 − x^(T) − y^(T)|. Thus, in an iterated Paulov-Anti-Paulov (PAP) contest, with T(x, y) = (1 − |x − y|, 1 − |1 − x − y|), it is T(0, 0) = T(1, 1) = (1, 0), T(1, 0) = (0, 1), and T(0, 1) = (0, 1), so that (0, 1) turns out to be immutable. Therefore, in an iterated PAP contest, Paulov will always defect and Anti-Paulov will always cooperate.

Relaxing the 0-1 assumption in the standard formulation of the PAP contest, degrees of cooperation can be considered in a continuous-valued scenario. Now x and y will denote the degrees of cooperation of players A and B respectively, with both x and y lying in [0, 1]. In this scenario, not only is (0, 1) a fixed point, but also T(0.8, 0.6) = (0.8, 0.6). Computer implementation of the iterated PAP tournament turns out to fully disrupt the theoretical dynamics. The errors caused by the finite precision of the computer floating-point arithmetic (a common problem in dynamical systems working modulo 1) make the final fate of every point (0, 1), with no exceptions: even the theoretically fixed point (0.8, 0.6) ends up as (0, 1) in the computerized implementation.

A natural way to incorporate older choices in the strategies of decision is to feature players by a summary

Cellular Automata with Memory, Figure 24 Dynamics of the mean values of x (red) and y (blue) starting from any of the points of the 1 × 1 square


(m) of their own choices farther back in time. The PAP contest becomes in this way: x^(T+1) = 1 − |m_x^(T) − m_y^(T)|; y^(T+1) = 1 − |1 − m_x^(T) − m_y^(T)|. The simplest historic extension results from considering only the last two choices: m(z^(T−1), z^(T)) = (αz^(T−1) + z^(T))/(α + 1) (z stands for both x and y) [10]. Figure 24 shows the dynamics of the mean values of x and y starting from any of the 101 × 101 lattice points of the 1 × 1 square with sides divided into 0.01 intervals. The dynamics in the ahistoric context are rather striking: immediately, at T = 2, both x and y increase from 0.5 up to approximately 0.66 (≈ 2/3), a value which remains stable up to approximately T = 100, but soon after Paulov cooperation plummets, with the corresponding firing of the cooperation of Anti-Paulov: finite-precision arithmetic leads every point to (0, 1). With memory, Paulov not only keeps a permanent mean degree of cooperation but one higher than that of Anti-Paulov; memory tends to lead the overall dynamics to the ahistoric (theoretically) fixed point (0.8, 0.6).

Future Directions

Embedding memory in states (and in links, if the wiring is also dynamic) broadens the spectrum of CA as a tool for modeling in a fairly natural way of easy computer implementation. It is likely that in some contexts a transition rule with memory could match the correct behavior of the CA system of a given complex system (physical, biological, social and so on). A major impediment in modeling with CA stems from the difficulty of utilizing the complex behavior of CA to exhibit a particular behavior or perform a particular function. Average memory in CA tends to inhibit complexity, an inhibition that can be modulated by varying the depth of memory, but memory not of average type opens a notable new perspective in CA. This could mean a potential advantage of CA with memory over standard CA as a tool for modeling. Anyway, besides their potential applications, CA with memory (CAM) have an aesthetic and mathematical interest of their own.
Thus, it seems plausible that further study of CA with memory will turn out to be profitable and, maybe, that as a result of a further rigorous study of CAM it will become possible to paraphrase T. Toffoli in presenting CAM as an alternative to (rather than an approximation of) integro-differential equations in modeling phenomena with memory.

Bibliography

Primary Literature

1. Adamatzky A (1994) Identification of Cellular Automata. Taylor and Francis

2. Adamatzky A (2001) Computing in Nonlinear Media and Automata Collectives. IoP Publishing, London
3. Adamatzky A, Holland O (1998) Phenomenology of excitation in 2-D cellular automata and swarm systems. Chaos Solit Fract 9:1233–1265
4. Aicardi F, Invernizzi S (1992) Memory effects in discrete dynamical systems. Int J Bifurc Chaos 2(4):815–830
5. Alonso-Sanz R (1999) The Historic Prisoner's Dilemma. Int J Bifurc Chaos 9(6):1197–1210
6. Alonso-Sanz R (2003) Reversible cellular automata with memory. Phys D 175:1–30
7. Alonso-Sanz R (2004) One-dimensional, r = 2 cellular automata with memory. Int J Bifurc Chaos 14:3217–3248
8. Alonso-Sanz R (2004) One-dimensional, r = 2 cellular automata with memory. Int J Bifurc Chaos 14:3217–3248
9. Alonso-Sanz R (2005) Phase transitions in an elementary probabilistic cellular automaton with memory. Phys A 347:383–401; Alonso-Sanz R, Martin M (2004) Elementary probabilistic cellular automata with memory in cells. In: Sloot PMA et al (eds) LNCS, vol 3305. Springer, Berlin, pp 11–20
10. Alonso-Sanz R (2005) The Paulov versus Anti-Paulov contest with memory. Int J Bifurc Chaos 15(10):3395–3407
11. Alonso-Sanz R (2006) A structurally dynamic cellular automaton with memory in the triangular tessellation. Complex Syst 17(1):1–15; Alonso-Sanz R, Martin M (2006) A structurally dynamic cellular automaton with memory in the hexagonal tessellation. In: El Yacoubi S, Chopard B, Bandini S (eds) LNCS, vol 4774. Springer, Berlin, pp 30–40
12. Alonso-Sanz R (2007) Reversible structurally dynamic cellular automata with memory: a simple example. J Cell Autom 2:197–201
13. Alonso-Sanz R (2006) The beehive cellular automaton with memory. J Cell Autom 1:195–211
14. Alonso-Sanz R (2007) A structurally dynamic cellular automaton with memory. Chaos Solit Fract 32:1285–1295
15. Alonso-Sanz R, Adamatzky A (2008) On memory and structural dynamism in excitable cellular automata with defensive inhibition. Int J Bifurc Chaos 18(2):527–539
16. Alonso-Sanz R, Cardenas JP (2007) On the effect of memory in Boolean networks with disordered dynamics: the K = 4 case. Int J Mod Phys C 18:1313–1327
17. Alonso-Sanz R, Martin M (2002) One-dimensional cellular automata with memory: patterns starting with a single site seed. Int J Bifurc Chaos 12:205–226
18. Alonso-Sanz R, Martin M (2002) Two-dimensional cellular automata with memory: patterns starting with a single site seed. Int J Mod Phys C 13:49–65
19. Alonso-Sanz R, Martin M (2003) Cellular automata with accumulative memory: legal rules starting from a single site seed. Int J Mod Phys C 14:695–719
20. Alonso-Sanz R, Martin M (2004) Elementary cellular automata with memory. Complex Syst 14:99–126
21. Alonso-Sanz R, Martin M (2004) Three-state one-dimensional cellular automata with memory. Chaos Solit Fract 21:809–834
22. Alonso-Sanz R, Martin M (2005) One-dimensional cellular automata with memory in cells of the most frequent recent value. Complex Syst 15:203–236
23. Alonso-Sanz R, Martin M (2006) Elementary cellular automata with elementary memory rules in cells: the case of linear rules. J Cell Autom 1:70–86

405

406

Cellular Automata with Memory

24. Alonso-Sanz R, Martin M (2006) Memory Boosts Cooperation. Int J Mod Phys C 17(6):841–852 25. Alonso-Sanz R, Martin MC, Martin M (2000) Discounting in the Historic Prisoner’s Dilemma. Int J Bifurc Chaos 10(1):87–102 26. Alonso-Sanz R, Martin MC, Martin M (2001) Historic Life Int J Bifurc Chaos 11(6):1665–1682 27. Alonso-Sanz R, Martin MC, Martin M (2001) The Effect of Memory in the Spatial Continuous-valued Prisoner’s Dilemma. Int J Bifurc Chaos 11(8):2061–2083 28. Alonso-Sanz R, Martin MC, Martin M (2001) The Historic Strategist. Int J Bifurc Chaos 11(4):943–966 29. Alonso-Sanz R, Martin MC, Martin M (2001) The HistoricStochastic Strategist. Int J Bifurc Chaos 11(7):2037–2050 30. Alvarez G, Hernandez A, Hernandez L, Martin A (2005) A secure scheme to share secret color images. Comput Phys Commun 173:9–16 31. Fredkin E (1990) Digital mechanics. An informal process based on reversible universal cellular automata. Physica D 45:254– 270 32. Grössing G, Zeilinger A (1988) Structures in Quantum Cellular Automata. Physica B 15:366 33. Hauert C, Schuster HG (1997) Effects of increasing the number of players and memory steps in the iterated Prisoner’s Dilemma, a numerical approach. Proc R Soc Lond B 264:513– 519 34. Hooft G (1988) Equivalence Relations Between Deterministic and Quantum Mechanical Systems. J Statistical Phys 53(1/2):323–344 35. Ilachinski A (2000) Cellular Automata. World Scientific, Singapore 36. Ilachinsky A, Halpern P (1987) Structurally dynamic cellular automata. Complex Syst 1:503–527 37. Kaneko K (1986) Phenomenology and Characterization of coupled map lattices, in Dynamical Systems and Sigular Phenomena. World Scientific, Singapore 38. Kauffman SA (1993) The origins of order: Self-Organization and Selection in Evolution. Oxford University Press, Oxford 39. Lindgren K, Nordahl MG (1994) Evolutionary dynamics of spatial games. Physica D 75:292–309 40. 
Love PJ, Boghosian BM, Meyer DA (2004) Lattice gas simulations of dynamical geometry in one dimension. Phil Trans R Soc Lond A 362:1667 41. Margolus N (1984) Physics-like Models of Computation. Physica D 10:81–95

42. Martin del Rey A, Pereira Mateus J, Rodriguez Sanchez G (2005) A secret sharing scheme based on cellular automata. Appl Math Comput 170(2):1356–1364 43. Nowak MA, May RM (1992) Evolutionary games and spatial chaos. Nature 359:826 44. Nowak MA, Sigmund K (1993) A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature 364:56–58 45. Requardt M (1998) Cellular networks as models for Plank-scale physics. J Phys A 31:7797; (2006) The continuum limit to discrete geometries, arxiv.org/abs/math-ps/0507017 46. Requardt M (2006) Emergent Properties in Structurally Dynamic Disordered Cellular Networks. J Cell Aut 2:273 47. Ros H, Hempel H, Schimansky-Geier L (1994) Stochastic dynamics of catalytic CO oxidation on Pt(100). Pysica A 206:421– 440 48. Sanchez JR, Alonso-Sanz R (2004) Multifractal Properties of R90 Cellular Automaton with Memory. Int J Mod Phys C 15:1461 49. Stauffer D, Aharony A (1994) Introduction to percolation Theory. CRC Press, London 50. Svozil K (1986) Are quantum fields cellular automata? Phys Lett A 119(41):153–156 51. Toffoli T, Margolus M (1987) Cellular Automata Machines. MIT Press, Massachusetts 52. Toffoli T, Margolus N (1990) Invertible cellular automata: a review. Physica D 45:229–253 53. Vichniac G (1984) Simulating physics with Cellular Automata. Physica D 10:96–115 54. Watts DJ, Strogatz SH (1998) Collective dynamics of SmallWorld networks. Nature 393:440–442 55. Wolf-Gladrow DA (2000) Lattice-Gas Cellular Automata and Lattice Boltzmann Models. Springer, Berlin 56. Wolfram S (1984) Universality and Complexity in Cellular Automata. Physica D 10:1–35 57. Wuensche A (2005) Glider dynamics in 3-value hexagonal cellular automata: the beehive rule. Int J Unconv Comput 1:375– 398 58. Wuensche A, Lesser M (1992) The Global Dynamics of Cellular Automata. Addison-Wesley, Massachusetts

Books and Reviews Alonso-Sanz R (2008) Cellular Automata with Memory. Old City Publising, Philadelphia (in press)


Cellular Automata Modeling of Physical Systems
BASTIEN CHOPARD
Computer Science Department, University of Geneva, Geneva, Switzerland

Article Outline
Glossary
Definition of the Subject
Introduction
Definition of a Cellular Automaton
Limitations, Advantages, Drawbacks, and Extensions
Applications
Future Directions
Bibliography

Cellular automata offer a powerful modeling framework to describe and study physical systems composed of interacting components. The potential of this approach is demonstrated in the case of applications taken from various fields of physics, such as reaction-diffusion systems, pattern formation phenomena, fluid flows and road traffic models.

Glossary

BGK models: Lattice Boltzmann models where the collision term Ω is expressed as a deviation from a given local equilibrium distribution f^(0), namely Ω = (f^(0) − f)/τ, where f is the unknown particle distribution and τ a relaxation time (which is a parameter of the model). BGK stands for Bhatnagar, Gross and Krook, who first considered such a collision term, though not specifically in the context of lattice systems.

CA: Abbreviation for cellular automata or cellular automaton.

Cell: The elementary spatial component of a CA. The cell is characterized by a state whose value evolves in time according to the CA rule.

Cellular automaton: A system composed of adjacent cells or sites (usually organized as a regular lattice) which evolves in discrete time steps. Each cell is characterized by an internal state whose value belongs to a finite set. The updating of these states is made in parallel, according to a local rule involving only a neighborhood of each cell.

Conservation law: A property of a physical system in which some quantity (such as mass, momentum or energy) is locally conserved during the time evolution. These conservation laws should be included in the microdynamics of a CA model because they are essential ingredients governing the macroscopic behavior of any physical system.

Collision: The process by which the particles of an LGA change their direction of motion.

Continuity equation: An equation of the form ∂t ρ + div(ρu) = 0 expressing the mass (or particle number) conservation law. The quantity ρ is the local density of particles and u the local velocity field.

Critical phenomena: The phenomena which occur in the vicinity of a continuous phase transition, characterized by a very long correlation length.

Diffusion: A physical process described by the equation ∂t ρ = D∇²ρ, where ρ is the density of a diffusing substance. Microscopically, diffusion can be viewed as a random motion of particles.

DLA: Abbreviation of diffusion-limited aggregation, a model of a physical growth process in which diffusing particles stick to an existing cluster when they hit it. Initially, the cluster is reduced to a single seed particle and grows as more and more particles arrive. A DLA cluster is a fractal object whose dimension is typically 1.72 if the experiment is conducted in a two-dimensional space.

Dynamical system: A system of equations (differential equations or discretized equations) modeling the dynamical behavior of a physical system.

Equilibrium states: States characterizing a closed system or a system in thermal equilibrium with a heat bath.

Ergodicity: Property of a system or process for which the time averages of the observables converge, in a probabilistic sense, to their ensemble averages.

Exclusion principle: A restriction imposed on LGA or CA models to limit the number of particles per site and/or lattice direction. This ensures that the dynamics can be described with a cellular automata rule with a given maximum number of bits. The consequence of this exclusion principle is that the equilibrium distribution of the particle numbers follows a Fermi–Dirac-like distribution in LGA dynamics.
FHP model: Abbreviation for the Frisch, Hasslacher and Pomeau lattice gas model, which was the first serious candidate to simulate two-dimensional hydrodynamics on a hexagonal lattice.

Fractal: Mathematical object usually having a geometrical representation and whose spatial dimension is not an integer. The relation between the size of the object and its "mass" does not obey that of usual geometrical objects. A DLA cluster is an example of a fractal.

Front: The region where some physical process occurs. Usually the front includes the locations in space that are first affected by the phenomena. For instance, in a reaction process between two spatially separated reactants, the front describes the region where the reaction takes place.

HPP model: Abbreviation for the Hardy, de Pazzis and Pomeau model, the first two-dimensional LGA, aimed at modeling the behavior of particles colliding on a square lattice with mass and momentum conservation. The HPP model has several physical drawbacks that have been overcome with the FHP model.

Invariant: A quantity which is conserved during the evolution of a dynamical system. Some invariants are imposed by the physical laws (mass, momentum, energy) and others result from the model used to describe physical situations (spurious, staggered invariants). Collisional invariants are constant vectors in the space where the Chapman–Enskog expansion is performed, associated with each quantity conserved by the collision term.

Ising model: Hamiltonian model describing the ferromagnetic-paramagnetic transition. Each local classical spin variable s_i = ±1 interacts with its neighbors.

Isotropy: The property of continuous systems to be invariant under any rotation of the spatial coordinate system. Physical quantities defined on a lattice and obtained by an averaging procedure may or may not be isotropic in the continuous limit; it depends on the type of lattice and the nature of the quantity. Second-order tensors are isotropic on a 2D square lattice, but fourth-order tensors need a hexagonal lattice.

Lattice: The set of cells (or sites) making up the spatial area covered by a CA.

Lattice Boltzmann model: A physical model defined on a lattice where the variables associated with each site represent an average number of particles or the probability of the presence of a particle with a given velocity. Lattice Boltzmann models can be derived from cellular automata dynamics by an averaging and factorization procedure, or be defined per se, independently of a specific realization.
Lattice gas: A system defined on a lattice where particles are present and follow given dynamics. Lattice gas automata (LGA) are a particular class of such systems where the dynamics are performed in parallel over all the sites and can be decomposed in two stages: (i) propagation, where the particles jump to a nearest-neighbor site according to their direction of motion, and (ii) collision, where the particles entering the same site at the same iteration interact so as to produce a new particle distribution. HPP and FHP are well-known LGA.

Lattice spacing: The separation between two adjacent sites of a regular lattice. Throughout this work it is denoted by the symbol Δr.

LB: Abbreviation for Lattice Boltzmann.

LGA: Abbreviation for Lattice Gas Automaton. See lattice gas for a definition.

Local equilibrium: Situation in which a large system can be decomposed into subsystems, very small on a macroscopic scale but large on a microscopic scale, such that each subsystem can be assumed to be in thermal equilibrium. The local equilibrium distribution is the function which makes the collision term of a Boltzmann equation vanish.

Lookup table: A table in which all possible outcomes of a cellular automata rule are precomputed. The use of a lookup table yields a fast implementation of cellular automata dynamics since, however complicated a rule is, the evolution of any configuration of a site and its neighbors is directly obtained through a memory access. The size of a lookup table grows exponentially with the number of bits involved in the rule.

Margolus neighborhood: A neighborhood made of two-by-two blocks of cells, typically in a two-dimensional square lattice. Each cell is updated according to the values of the other cells in the same block. A different rule may possibly be assigned depending on whether the cell is at the upper-left, upper-right, lower-left or lower-right location. After each iteration, the lattice partition defining the Margolus blocks is shifted one cell right and one cell down, so that at every other step information can be exchanged across the lattice. Can be generalized to higher dimensions.

Microdynamics: The Boolean equation governing the time evolution of an LGA model or a cellular automata system.

Moore neighborhood: A neighborhood composed of the central cell and all eight nearest and next-nearest neighbors in a two-dimensional square lattice. Can be generalized to higher dimensions.

Multiparticle models: Discrete dynamics modeling a physical system in which an arbitrary number of particles is allowed at each site. This is an extension of an LGA in which no exclusion principle is imposed.

Navier–Stokes equation: The equation describing the velocity field u in a fluid flow. For an incompressible fluid (∂t ρ = 0), it reads

∂t u + (u · ∇)u = −(1/ρ)∇P + ν∇²u

where ρ is the density and P the pressure. The Navier–Stokes equation expresses the local momentum conservation in the fluid and, as opposed to the Euler equation, includes the dissipative effects with a viscosity term ν∇²u. Together with the continuity equation, this is the fundamental equation of fluid dynamics.

Neighborhood: The set of all cells necessary to compute a cellular automaton rule. A neighborhood is usually composed of several adjacent cells organized in a simple geometrical structure. Moore, von Neumann and Margolus neighborhoods are typical examples.

Occupation numbers: Boolean quantities indicating the presence or absence of a particle in a given physical state.

Open system: A system communicating with the environment by exchange of energy or matter.

Parallel: Refers to an action which is performed simultaneously at several places. A parallel updating rule corresponds to the updating of all cells at the same time, as if resulting from the computations of several independent processors.

Partitioning: A technique consisting of dividing space into adjacent domains (through a partition) so that the evolution of each block is uniquely determined by the states of the elements within the block.

Phase transition: Change of state obtained when varying a control parameter, such as the one occurring in the boiling or freezing of a liquid, or in the change between the ferromagnetic and paramagnetic states of a magnetic solid.

Propagation: The process by which the particles of an LGA are moved to a nearest neighbor, according to the direction of their velocity vector v_i. In one time step Δt the particles travel from cell r to cell r + v_i Δt, where r + v_i Δt is the nearest neighbor in lattice direction i.

Random walk: A series of uncorrelated steps of unit length describing a random path with zero average displacement but a characteristic size proportional to the square root of the number of steps.

Reaction-diffusion systems: Systems made of one or several species of particles which diffuse and react among themselves to produce some new species.
Scaling hypothesis: A hypothesis concerning the analytical properties of the thermodynamic potentials and the correlation functions in a problem invariant under a change of scale.

Scaling law: Relations among the critical exponents describing the power-law behaviors of physical quantities in systems invariant under a change of scale.

Self-organized criticality: Concept aimed at describing a class of dynamical systems which naturally drive themselves to a state where interesting physics occurs at all scales.

Site: Same as a cell, but the preferred terminology in LGA and LB models.

Spatially extended systems: Physical systems involving many spatial degrees of freedom which, usually, have rich dynamics and show complex behaviors. Coupled map lattices and cellular automata provide a way to model spatially extended systems.

Spin: Internal degree of freedom associated with particles in order to describe their magnetic state. A widely used case is that of classical Ising spins: to each particle, one associates an "arrow" which is allowed to take only two different orientations, up or down.

Time step: Interval of time separating two consecutive iterations in the evolution of a discrete-time process, like a CA or an LB model. Throughout this work the time step is denoted by the symbol Δt.

Universality: The phenomenon whereby many microscopically different systems exhibit a critical behavior with quantitatively identical properties, such as the critical exponents.

Updating: Operation consisting of assigning a new value to a set of variables, for instance those describing the states of a cellular automata system. The updating can be done in parallel and synchronously, as is the case in CA dynamics, or sequentially, one variable after another, as is usually the case for Monte Carlo dynamics. Parallel, asynchronous updating is less common but can be envisaged too. Sequential and parallel updating schemes may yield different results, since the interdependencies between variables are treated differently.

Viscosity: A property of a fluid indicating how much momentum "diffuses" through the fluid in an inhomogeneous flow pattern. Equivalently, it describes the stress occurring between two fluid layers moving with different velocities. A high viscosity means that the resulting drag force is important; a low viscosity means that this force is weak. Kinematic viscosity is usually denoted by ν and dynamic viscosity by η = ρν, where ρ is the fluid density.

von Neumann neighborhood: On a two-dimensional square lattice, the neighborhood including a central cell and its nearest neighbors north, south, east and west.

Ziff model: A simple model describing adsorption–dissociation–desorption on a catalytic surface. This model is based upon some of the known steps of the reaction A–B2 on a catalyst surface (for example CO–O2).


Cellular Automata Modeling of Physical Systems, Figure 1
The game of life automaton. Black dots represent living cells whereas dead cells are white. The figure shows the evolution of a random initial configuration and the formation of spatial structures, with possibly some emerging functionalities

Definition of the Subject

The computational science community has always been faced with the challenge of bringing efficient numerical tools to solve problems of increasing difficulty. Nowadays, the investigation and understanding of so-called complex systems, and the simulation of all kinds of phenomena originating from the interaction of many components, are of central importance in many areas of science. Cellular automata turn out to be a very fruitful approach to address many scientific problems, providing an efficient way to model and simulate specific phenomena for which more traditional computational techniques are hardly applicable. The goal of this article is to provide the reader with the foundations of this approach, as well as a selection of simple applications of the cellular automata approach to the modeling of physical systems. We invite the reader to consult the web site http://cui.unige.ch/~chopard/CA/Animations/img-root.html in order to view short movies about several of the models discussed in this article.

Introduction

Cellular automata (hereafter termed CA) are an idealization of the physical world in which space and time are of a discrete nature. In addition to space and time, the physical quantities also take only a finite set of values. Since it was proposed by von Neumann in the late 1940s, the cellular automata approach has been applied to a large range of scientific problems (see for instance [4,10,16,35,42,51]). International conferences (e.g. ACRI) and dedicated journals (J. of Cellular Automata) also describe current developments. When von Neumann developed the concept of CA, his motivation was to extract the abstract (or algorithmic) mechanisms leading to self-reproduction of biological organisms [6].

Following the suggestions of S. Ulam, von Neumann addressed this question in the framework of a fully discrete universe made up of simple cells. Each cell was characterized by an internal state, which typically consists of a finite number of information bits. Von Neumann suggested that this system of cells evolves, in discrete time steps, like simple automata which only know of a simple recipe to compute their new internal state. The rule determining the evolution of this system is the same for all cells and is a function of the states of the cell itself and its neighbors. Similarly to what happens in any biological system, the activity of the cells takes place simultaneously: the same clock is assumed to drive the evolution of every cell, and the updating of their internal states occurs synchronously. Such a fully discrete dynamical system (cellular space), as invented by von Neumann, is now referred to as a cellular automaton.

Among the early applications of CA, the game of life [19] is famous. In 1970, the mathematician John Conway proposed a simple model leading to complex behaviors. He imagined a two-dimensional square lattice, like a checkerboard, in which each cell can be either alive (state one) or dead (state zero). The updating rule of the game of life is as follows: a dead cell surrounded by exactly three living cells comes back to life; a living cell surrounded by fewer than two or more than three living neighbors dies of isolation or overcrowding. Here, the surrounding cells correspond to the neighborhood composed of the four nearest cells (north, south, east and west), plus the four second-nearest neighbors along the diagonals. It turns out that the game of life automaton has an unexpectedly rich behavior. Complex structures emerge out of a primitive "soup" and evolve so as to develop some skills that are absent from the elementary cells (see Fig. 1).
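The life rule just described can be sketched in a few lines of code. The example below uses Python with NumPy purely for illustration (neither is implied by the text); the small periodic lattice and the classic "blinker" pattern are arbitrary choices:

```python
import numpy as np

def life_step(grid):
    # Synchronous update of the game of life on a periodic (toroidal) lattice.
    # Count the eight Moore neighbors of every cell by summing shifted copies.
    neighbors = sum(np.roll(np.roll(grid, di, axis=0), dj, axis=1)
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0))
    born = (grid == 0) & (neighbors == 3)                          # birth
    survive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))  # survival
    return (born | survive).astype(int)

# A "blinker": three living cells in a row oscillate with period 2.
grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1
after_two_steps = life_step(life_step(grid))
```

Because every cell is updated from the same snapshot of the lattice, the update is synchronous, exactly as in von Neumann's cellular space.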


The game of life is a cellular automaton capable of universal computation: it is always possible to find an initial configuration of the cellular space reproducing the behavior of any electronic gate and, thus, to mimic any computation process. Although this observation has little practical interest, it is very important from a theoretical point of view since it establishes the ability of CAs to be a nonrestrictive computational technique. A very important feature of CAs is that they provide simple models of complex systems. They exemplify the fact that a collective behavior can emerge out of the sum of many simply interacting components. Even if the basic and local interactions are perfectly known, it is possible that the global behavior obeys new laws that are not obviously extrapolated from the individual properties, as if the whole were more than the sum of all the parts. These properties make cellular automata a very interesting approach to model physical systems and, in particular, to simulate complex and nonequilibrium phenomena. The studies undertaken by S. Wolfram in the 1980s [50,51] clearly establish that a CA may exhibit many of the behaviors encountered in continuous systems, yet in a much simpler mathematical framework. A further step is to recognize that CAs are not only behaving similarly to some dynamical processes; they can also represent an actual model of a given physical system, leading to macroscopic predictions that could be checked experimentally. This fact follows from statistical mechanics, which tells us that the macroscopic behavior of many systems is often only weakly related to the details of their microscopic reality. Only symmetries and conservation laws survive the change of observation level: it is well known that the flows of a fluid, a gas or even a granular medium are very similar at a macroscopic scale, in spite of their different microscopic nature.
When one is interested in the global or macroscopic properties of a system, it is therefore a clear advantage to invent a much simpler microscopic reality, which is more appropriate to the available numerical means of investigation. An interesting example is the FHP fluid model proposed by Frisch, Hasslacher and Pomeau in 1986 [18], which can be viewed as fully discrete molecular dynamics and yet behaves as predicted by the Navier–Stokes equation when the observation time and length scales are much larger than the automaton time step and lattice spacing. A cellular automata model can then be seen as an idealized universe with its own microscopic reality but, nevertheless, with the same macroscopic behavior as given in the real system.

The cellular automata paradigm nevertheless presents some weaknesses inherent to its discrete nature. In the early 1990s, lattice Boltzmann (LB) models were proposed to remedy some of these problems, using real-valued states instead of Boolean variables. It turns out that LB models are indeed a very powerful approach which combines numerical efficiency with the advantage of having a model whose microscopic components are intuitive. LB fluids are more and more used to solve complex flows, such as multi-component fluids or problems with complicated geometries. See for instance [10,40,41,49] for an introduction to LB models.

Definition of a Cellular Automaton

In order to give a definition of a cellular automaton, we first present a simple example, the so-called parity rule. Although it is very basic, the rule we discuss here exhibits a surprisingly rich behavior. It was proposed initially by Edward Fredkin in the 1970s [3] and is defined on a two-dimensional square lattice.

Each site of the lattice is a cell which is labeled by its position r = (i, j), where i and j are the row and column indices. A function ψ(r, t) is associated with the lattice to describe the state of each cell r at iteration t. This quantity can be either 0 or 1. The cellular automata rule specifies how the states ψ(r, t + 1) are to be computed from the states at iteration t. We start from an initial condition at time t = 0 with a given configuration of the values ψ(r, t = 0) on the lattice. The state at time t = 1 is obtained as follows:

(1) Each site r computes the sum of the values ψ(r′, 0) on the four nearest-neighbor sites r′ at north, west, south and east. The system is supposed to be periodic in both the i and j directions (as on a torus), so that this calculation is well defined for all sites.

(2) If this sum is even, the new state ψ(r, t = 1) is 0 (white); otherwise, it is 1 (black).

The same rule (steps 1 and 2) is repeated over and over to find the states at times t = 2, 3, 4, …
From a mathematical point of view, this cellular automata parity rule can be expressed by the following relation:

ψ(i, j; t + 1) = ψ(i + 1, j; t) ⊕ ψ(i − 1, j; t) ⊕ ψ(i, j + 1; t) ⊕ ψ(i, j − 1; t)   (1)

where the symbol ⊕ stands for the exclusive OR logical operation. It is also the sum modulo 2: 1 ⊕ 1 = 0 ⊕ 0 = 0 and 1 ⊕ 0 = 0 ⊕ 1 = 1.
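The equivalence between the XOR form of Eq. (1) and the even/odd-sum description given above is easy to check numerically. The sketch below (Python, used purely for illustration; the lattice size and random seed are arbitrary) applies both formulations to the same configuration and verifies that they coincide:

```python
import random

def parity_step(state):
    # One iteration of the parity rule on an n x n periodic lattice:
    # each cell becomes the XOR of its four nearest neighbors, Eq. (1).
    n = len(state)
    return [[state[(i + 1) % n][j] ^ state[(i - 1) % n][j]
             ^ state[i][(j + 1) % n] ^ state[i][(j - 1) % n]
             for j in range(n)]
            for i in range(n)]

random.seed(0)
n = 8
state = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
# The sum-modulo-2 formulation: 0 if the neighbor sum is even, 1 otherwise.
sums = [[(state[(i + 1) % n][j] + state[(i - 1) % n][j]
          + state[i][(j + 1) % n] + state[i][(j - 1) % n]) % 2
         for j in range(n)] for i in range(n)]
assert parity_step(state) == sums
```

Since the XOR of bits equals the parity of their sum, the two formulations agree cell by cell.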


Cellular Automata Modeling of Physical Systems, Figure 2
The ⊕ rule (or parity rule) on a 256 × 256 periodic lattice. a Initial configuration; b and c configurations after t_b = 93 and t_c = 110 iterations, respectively

When this rule is iterated, very nice geometric patterns are observed, as shown in Fig. 2. This property of generating complex patterns starting from a simple rule is generic of many cellular automata rules. Here, complexity results from some spatial organization which builds up as the rule is iterated. The various contributions of successive iterations combine together in a specific way, and the spatial patterns that are observed reflect how the terms are combined algebraically. A computer implementation of this CA can be given: in Fig. 3 we propose, as an illustration, a Matlab program (the reader can also consider Octave, which is a free alternative to Matlab).

On the basis of this example we now give a definition of a cellular automaton. Formally, a cellular automaton is a tuple (A, Φ, R, N) where

(i) A is a regular lattice of cells covering a portion of a d-dimensional space.

(ii) Φ(r, t) = {Φ^(1)(r, t), Φ^(2)(r, t), …, Φ^(m)(r, t)} is a set of m Boolean variables attached to each site r of the lattice, giving the local state of the cells at time t.

(iii) R is a set of rules, R = {R^(1), R^(2), …, R^(m)}, which specifies the time evolution of the states Φ(r, t) in the following way:

Φ^(j)(r, t + Δt) = R^(j)(Φ(r, t), Φ(r + v_1, t), Φ(r + v_2, t), …, Φ(r + v_q, t))   (2)

where r + v_k designate the cells belonging to the neighborhood N of cell r.

In the above definition, the rule R is identical for all sites and is applied simultaneously to each of them, leading to synchronous dynamics. As the number of configurations of the neighborhood is finite, it is common to precompute all the values of R in a lookup table. Otherwise, an algebraic expression can be used and evaluated at each iteration, for each cell, as in Eq. (1).

It is important to notice that the rule is homogeneous, that is, it cannot depend explicitly on the cell position r. However, spatial (or even temporal) inhomogeneities can be introduced anyway by having some Φ^(j)(r) systematically equal to 1 in some given locations of the lattice, to mark particular cells on which a different rule applies. Boundary cells are a typical example of spatial inhomogeneities. Similarly, it is easy to alternate between two rules by having a bit which is 1 at even time steps and 0 at odd time steps.

The neighborhood N of each cell (i.e. the spatial region around each cell used to compute the next state) is usually made of its adjacent cells and is often restricted to the nearest or next-to-nearest neighbors; otherwise the complexity of the rule becomes too large. For a two-dimensional cellular automaton, two neighborhoods are often considered: the von Neumann neighborhood, which consists of a central cell (the one which is to be updated) and its four geographical neighbors north, west, south and east; and the Moore neighborhood, which contains, in addition, the second-nearest neighbors north-east, north-west, south-west and south-east, that is, a total of nine cells. Another interesting neighborhood is the Margolus neighborhood, briefly described in the glossary.

According to the above definition, a cellular automaton is deterministic: the rule R is some well-defined function, and a given initial configuration will always evolve identically. However, it may be very convenient for some applications to have a certain degree of randomness in the rule. For instance, it may be desirable that a rule selects one outcome among several possible states with a probability p. Cellular automata whose updating rule is driven by some external probabilities are called probabilistic cellular automata. On the other hand, those which strictly comply with the definition given above are referred to as deterministic cellular automata.
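The lookup-table implementation mentioned above is easiest to see in one dimension. The sketch below is written in Python for illustration (the Matlab setting of Fig. 3 is not assumed); Wolfram's elementary rule 90 and the lattice size are arbitrary choices used only to show the mechanism of precomputing all 2³ neighborhood outcomes:

```python
def make_lookup(rule_number):
    # Precompute the rule: map each of the 2^3 neighborhood configurations
    # (left, center, right) of a one-dimensional CA to the new cell state,
    # following Wolfram's binary numbering of elementary rules.
    return {(l, c, r): (rule_number >> (4 * l + 2 * c + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}

def step(cells, table):
    # Synchronous update on a periodic lattice: every cell reads itself and
    # its two nearest neighbors, then looks the outcome up in the table.
    n = len(cells)
    return [table[cells[(i - 1) % n], cells[i], cells[(i + 1) % n]]
            for i in range(n)]

table = make_lookup(90)          # rule 90: new state = left XOR right
cells = [0] * 7 + [1] + [0] * 7  # a single-site seed
for _ in range(3):
    cells = step(cells, table)
```

However complicated the rule, each update is then a single dictionary access per cell, which is the efficiency argument behind lookup tables.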
Probabilistic cellular automata are a very useful generalization because they offer a way to adjust the parameters of a rule in a continuous range of values, despite the discrete nature of the cellular automata world. This is very convenient when modeling physical systems in which, for instance, particles are annihilated or created at some given rate.

nx=128; ny=128;    % size of the domain: 128x128
a=zeros(nx,ny);    % the states are first initialized to 0
north=[2:nx,1];    % vectors to access the neighbors
south=[nx,1:nx-1]; % corresponding to a cyclic permutation
east=[2:ny,1];     % of 1:nx or 1:ny
west=[ny,1:ny-1];
% a central patch is initialized with 1's
a(nx/2-3:nx/2+2, ny/2-4:ny/2+3)=1;
for t=1:65         % let us do 65 iterations
  pcolor(a)        % build a graphical representation
  axis off
  axis square
  shading flat
  drawnow          % display it
  somme=a(north,:) + a(south,:) + a(:,west) + a(:,east);
  a=mod(somme,2);
end

Cellular Automata Modeling of Physical Systems, Figure 3
An example of a Matlab program for the parity rule

Limitations, Advantages, Drawbacks, and Extensions

The interpretation of the cellular automata dynamics in terms of simple "microscopic" rules offers a very intuitive and powerful approach to model phenomena that are very difficult to include in more traditional approaches (such as differential equations). For instance, boundary conditions are often naturally implemented in a cellular automata model because they have a natural interpretation at this level of description (e.g. particles bouncing back on an obstacle). Numerically, an advantage of the CA approach is its simplicity and its suitability to computer architectures and parallel machines. In addition, working with Boolean quantities prevents numerical instabilities, since an exact computation is made: there is no truncation or approximation in the dynamics itself. Finally, a CA model is an implementation of an N-body system where all correlations are taken into account, as well as all spontaneous fluctuations arising in a system made up of many particles.

On the other hand, cellular automata models have several drawbacks related to their fully discrete nature. An important one is the statistical noise, which requires systematic averaging processes. Another one is the limited flexibility to adjust the parameters of a rule in order to describe a wider range of physical situations. The Lattice Boltzmann approach solves several of the above problems. On the other hand, it may be numerically unstable and also requires some hypotheses of molecular chaos which reduce some of the richness of the original CA dynamics [10]. Finally, we should remark that the CA approach is not a rigid framework but should allow for many extensions according to the problem at hand. The CA methodology is a philosophy of modeling where one seeks a description in terms of simple but essential mechanisms. Its richness and interest come from the microscopic content of its rule, for which there is, in general, a clear physical or intuitive interpretation of the dynamics directly at the level of the cell. Einstein's quote "Everything should be made as simple as possible, but not simpler" is a good illustration of the CA methodology applied to the modeling of physical systems.

Applications

Many physical situations, like fluid flows, pattern formation, reaction-diffusion processes, nucleation-aggregation growth phenomena, phase transitions, population dynamics, or traffic processes are very well suited to the cellular automata approach because both space and time play a central role in the behavior of these systems. Below we describe several applications which illustrate the potential of the approach and that can be extended in order to address a wide class of scientific problems.

A Growth Model

A simple class of cellular automata rules consists of the so-called majority rules. The updating selects the new state of each cell so as to conform to the value currently held by the majority of the neighbors. Typically, in these majority rules, the state is either 0 or 1. A very interesting behavior is observed with the twisted majority rule proposed by G. Vichniac [44]: in two dimensions, each cell considers its Moore neighborhood (i.e. itself plus its eight nearest neighbors) and computes the sum of the cells having a value 1. This sum can be any value between 0 and 9. The new state s(t + 1) of each cell is then determined from this local sum, according to the following table

sum(t)   0 1 2 3 4 5 6 7 8 9
s(t+1)   0 0 0 0 1 0 1 1 1 1        (3)
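A direct transcription of this table into code might look as follows (my own sketch, on a small periodic grid chosen only for illustration):

```python
# table (3): new state as a function of the Moore-neighborhood sum,
# with the entries for sums 4 and 5 swapped with respect to a plain majority
TABLE = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def twisted_majority_step(a):
    # synchronous update on a periodic 2D grid of 0/1 states
    h, w = len(a), len(a[0])
    return [[TABLE[sum(a[(y + dy) % h][(x + dx) % w]
                       for dy in (-1, 0, 1) for dx in (-1, 0, 1))]
             for x in range(w)] for y in range(h)]

# a 4x4 square of 1's in a sea of 0's: the text notes that such a square
# is a stable structure (90-degree angles are not eroded)
grid = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        grid[y][x] = 1
```

Note that the Moore sum includes the cell itself, so the table index runs from 0 to 9 as in the text.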

As opposed to the plain majority rule, here the two middle entries of the table have been swapped. Therefore, when there is a slight majority of 1s around a cell, it turns to 0. Conversely, if there is a slight majority of 0s, the cell becomes 1. Surprisingly enough, this rule describes the interface motion between two phases, as illustrated in Fig. 4. Vichniac has observed that the normal velocity of the interface is proportional to its local curvature, as required by the Allen–Cahn [21] equation. Of course, due to its local nature, the rule cannot detect the curvature of the interface directly. However, as the rule is iterated, local information is propagated to the nearest neighbors and the radius of curvature emerges as a collective effect.

This rule is particularly interesting when the initial configuration is a random mixture of the two phases, with equal concentration. Otherwise, some pathological behaviors may occur. For instance, an initial square of 1s surrounded by 0s will not evolve: 90-degree angles are not eroded and remain stable structures.

Ising-Like Dynamics

The Ising model is extensively used in physics. Its basic constituents are spins s_i which can be in one of two states: s_i ∈ {−1, 1}. These spins are organized on a regular lattice in d dimensions and coupled in the sense that each pair (s_i, s_j) of neighbor spins contributes an amount J s_i s_j to the energy of the system. Intuitively, the dynamics of such a system is that a spin flips (s_i → −s_i) if this is favorable in view of the energy of the local configuration. Vichniac [44], in the 1980s, proposed a CA rule, called Q2R, simulating the behavior of Ising spin dynamics. The model is as follows: we consider a two-dimensional square lattice such that each site holds a spin s_i which is either up (s_i = 1) or down (s_i = 0) (instead of ±1). The coupling between spins is assumed to come from the von Neumann neighborhood (i.e. north, west, south, and east neighbors).

Cellular Automata Modeling of Physical Systems, Figure 4
Evolution of the twisted majority rule. The inherent "surface tension" present in the rule tends to separate the red phase s = 1 from the blue phase s = 0. The snapshots a, b, and c correspond to t = 0, t = 72, and t = 270 iterations, respectively. The other colors indicate how "capes" have been eroded and "bays" filled: light blue shows the blue regions that have been eroded during the last few iterations and yellow marks the red regions that have been filled


In this simple model, the spins will flip (or not flip) during their discrete time evolution according to a local energy conservation principle. This means we are considering a system which cannot exchange energy with its surroundings. The model will be a microcanonical cellular automata simulation of Ising spin dynamics, without a temperature but with a critical energy. A spin s_i can flip at time t to become 1 − s_i at time t + 1 if and only if this move does not cause any energy change. Accordingly, spin s_i will flip if the number of its neighbors with spin up is the same as the number of its neighbors with spin down. However, one has to remember that the motion of all spins is simultaneous in a cellular automaton. The decision to flip is based on the assumption that the neighbors are not changing. If they are allowed to flip too (because they obey the same rule), then energy may not be conserved. A way to cure this problem is to split the updating in two phases and consider a partition of the lattice in odd and even sites (e.g. the white and black squares of a chessboard in 2D): first, one flips the spins located at odd positions, according to the configuration of the even spins. In the second phase, the even sublattice is updated according to the odd one. The spatial structure (defining the two sublattices) is obtained by adding an extra bit b to each lattice site, whose value is 0 for the odd sublattice and 1 for the even sublattice. The flipping rule described earlier is then regulated by the value of b. It takes place only for those sites for which b = 1. Of course, the value of b is also updated at each iteration according to b(t + 1) = 1 − b(t), so that at the next iteration, the other sublattice is considered. In two dimensions, the Q2R rule can then be expressed by the following expressions

s_ij(t+1) = 1 − s_ij(t)   if b_ij = 1 and s_{i−1,j} + s_{i+1,j} + s_{i,j−1} + s_{i,j+1} = 2
s_ij(t+1) = s_ij(t)       otherwise                                                        (4)

and

b_ij(t+1) = 1 − b_ij(t)                                                                    (5)

where the indices (i, j) label the Cartesian coordinates and s_ij(t = 0) is either one or zero.

The question is now: how well does this cellular automata rule perform to describe an Ising model? Figure 5 shows a computer simulation of the Q2R rule, starting from an initial configuration with approximately 11% of spins s_ij = 1 (Fig. 5a). After a transient phase (Figs. 5b and 5c), the system reaches a stationary state where domains with "up" magnetization (white regions) are surrounded by domains of "down" magnetization (black regions).

In this dynamics, energy is exactly conserved because that is the way the rule is built. However, the number of spins down and up may vary. In the present experiment, the fraction of spins up increases from 11% in the initial state to about 40% in the stationary state. Since there is an excess of spins down in this system, there is a resulting macroscopic magnetization.

It is interesting to study this model with various initial fractions ρ_s of spins up. When starting with a random initial condition, similar to that of Fig. 5a, it is observed that, for many values of ρ_s, the system evolves to a state where there is, on average, the same amount of spins down and up, that is, no macroscopic magnetization. However, if the initial configuration presents a sufficiently large excess of one kind of spins, then a macroscopic magnetization builds up as time goes on. This means there is a phase transition between a situation of zero magnetization and a situation of positive or negative magnetization.

It turns out that this transition occurs when the total energy E of the system is low enough (a low energy means that most of the spins are aligned and that there is an excess of one species over the other), or more precisely when E is smaller than a critical energy E_c. In that sense, the Q2R rule captures an important aspect of a real magnetic system, namely a non-zero magnetization at low energy (which can be related to a low temperature situation) and a transition to a nonmagnetic phase at high energy.

However, Q2R also exhibits unexpected behavior that is difficult to detect from a simple observation. There is a breaking of ergodicity: a given initial configuration of energy E_0 evolves without visiting completely the region of the phase space characterized by E = E_0. This is illustrated by the following simple 1D example, where a ring of four spins with periodic boundary conditions is considered:

t     : 1001
t + 1 : 1100
t + 2 : 0110                                                                               (6)
t + 3 : 0011
t + 4 : 1001

After four iterations, the system cycles back to its original state. The configuration of this example has E_0 = 0. As we observed, it never evolves to 0111, which is also a configuration of zero energy. This nonergodicity means that not only energy is conserved during the evolution of the automaton, but also another quantity which partitions the energy surface in independent regions.
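The four-spin cycle is easy to reproduce; the following sketch (mine, with the odd sublattice updated first as described in the text) implements the 1D version of the rule:

```python
def q2r_ring_step(s, parity):
    # flip a spin iff its two neighbors hold one up and one down spin
    # (zero energy change); only the sublattice selected by `parity` moves
    n = len(s)
    out = list(s)
    for i in range(parity, n, 2):
        if s[(i - 1) % n] + s[(i + 1) % n] == 1:
            out[i] = 1 - s[i]
    return out

s = [1, 0, 0, 1]
history = ["".join(map(str, s))]
for t in range(4):
    s = q2r_ring_step(s, parity=1 - t % 2)  # odd sites, then even, alternating
    history.append("".join(map(str, s)))
print(history)  # ['1001', '1100', '0110', '0011', '1001']
```

The zero-energy configuration 0111 indeed never appears in the cycle, illustrating the broken ergodicity.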


Cellular Automata Modeling of Physical Systems, Figure 5
Evolution of a system of spins with the Q2R rule. Black represents the spins down (s_ij = 0) and white the spins up (s_ij = 1). The four images a, b, c, and d show the system at four different times t_a = 0 < t_b < t_c < t_d

Competition Models and Cell Differentiation

In Sect. "A Growth Model" we discussed a majority rule in which the cells imitate their neighbors. In some sense, this corresponds to a cooperative behavior between the cells. A quite different situation can be obtained if the cells obey competitive dynamics. For instance, we may imagine that the cells compete for some resources at the expense of their nearest neighbors. A winner is a cell of state 1 and a loser a cell of state 0. No two winner cells can be neighbors and any loser cell must have at least one winner neighbor (otherwise nothing would have prevented it from also winning). It is interesting to note that this problem has a direct application in biology, to study cell differentiation. It

has been observed in the development of Drosophila that about 25% of the cells forming the embryo evolve to the state of neuroblast, while the remaining 75% do not. How can we explain this differentiation and the observed fraction, since at the beginning of the process all cells can be assumed equivalent? A possible mechanism [28] is that some competition takes place between the adjacent biological cells. In other words, each cell produces some substance S, but the production rate is inhibited by the amount of S already present in the neighboring cells. Differentiation occurs when a cell reaches a level of S above a given threshold. The competition CA model we propose to describe this situation is the following. Because of the analogy with the biological system, we shall consider a hexagonal lattice,


Cellular Automata Modeling of Physical Systems, Figure 6
The hexagonal lattice used for the competition-inhibition CA rule. Black cells are cells of state 1 (winners) and white cells are cells of state 0 (losers). The two possible final states with a fully regular structure are illustrated, with densities 1/3 and 1/7 of winners, respectively

which is a reasonable approximation of the cell arrangement observed in the Drosophila embryo (see Fig. 6). We assume that the values of S can be 0 (inhibited) or 1 (active) in each lattice cell.

- A cell in state S = 0 will grow (i.e. turn to S = 1) with probability p_grow, provided that all its neighbors are 0. Otherwise, it stays inhibited.
- A cell in state S = 1 will decay (i.e. turn to S = 0) with probability p_decay if it is surrounded by at least one active cell. If the active cell is isolated (all its neighbors are in state 0), it remains in state 1.

The evolution stops (stationary process) when no S = 1 cell feels any more inhibition from its neighbors and when all S = 0 cells are inhibited by their neighborhood. Then, with our biological interpretation, cells with S = 1 are those which will differentiate.

What is the expected fraction of these S = 1 cells in the final configuration? Clearly, from Fig. 6, the maximum value is 1/3. According to the inhibition condition we imposed, this is the close-packed situation on the hexagonal lattice. On the other hand, the minimal value is 1/7, corresponding to a situation where the lattice is partitioned in blocks with one active cell surrounded by six inhibited cells. In practice we do not expect either of these two limits to occur spontaneously after the automaton evolution. On the contrary, we should observe clusters of close-packed active cells surrounded by defects, i.e. regions of low density of active cells.

CA simulations show indeed that the final fraction ρ_s of active cells is a mix of the two limiting situations of Fig. 6, 0.23 ≤ ρ_s ≤ 0.24, almost irrespective of the values chosen for p_decay and p_grow. This is exactly the value expected from the biological observations made on the Drosophila embryo. Thus, cell

differentiation can be explained by a geometrical competition without having to specify the inhibitory couplings between adjacent cells and the production rate (i.e. the values of p_decay and p_grow): the result is quite robust against any possible choices.

Traffic Models

Cellular automata models for road traffic have received a great deal of interest during the past few years (see [13,32,33,36,37,47,48,52] for instance). One-dimensional models for single-lane car motion are quite simple and elegant. The road is represented as a line of cells, each of them being occupied or not by a vehicle. All cars travel in the same direction (say to the right). Their positions are updated synchronously. During the motion, each car can be at rest or jump to the nearest neighbor site, along the direction of motion. The rule is simply that a car moves only if its destination cell is empty. This means that the drivers do not know whether the car in front will move or will be blocked by another car. Therefore, the state of each cell s_i is entirely determined by the occupancy of the cell itself and that of its two nearest neighbors s_{i−1} and s_{i+1}. The motion rule can be summarized by the following table, where all eight possible configurations (s_{i−1} s_i s_{i+1})_t → (s_i)_{t+1} are given:

(s_{i−1} s_i s_{i+1})_t : 111 110 101 100 011 010 001 000
(s_i)_{t+1}             :  1   0   1   1   1   0   0   0        (7)

This cellular automaton rule turns out to be Wolfram rule 184 [50,52]. These simple dynamics capture an interesting feature of real car motion: traffic congestion. Suppose we have a low car density in the system, for instance something


like

… 0010000010010000010 …        (8)

This is a free traffic regime in which all the cars are able to move. The average velocity ⟨v⟩, defined as the number of motions divided by the number of cars, is then

⟨v_f⟩ = 1        (9)

where the subscript f indicates a free state. On the other hand, in a high density configuration such as

… 110101110101101110 …        (10)

only six cars out of twelve will move and ⟨v⟩ = 1/2. This is a partially jammed regime. If the car positions were uncorrelated, the number of moving cars (i.e. the number of particle-hole pairs) would be given by Lρ(1 − ρ), where L is the system size. Since the number of cars is ρL, the average velocity would be

⟨v_uncorrel⟩ = 1 − ρ .        (11)
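The moving-car counts quoted for the two sample configurations can be checked directly (a small sketch of mine, treating the road as periodic):

```python
def moving_cars(cfg):
    # in rule 184, a car at site i moves iff site i+1 is empty
    n = len(cfg)
    return sum(1 for i in range(n) if cfg[i] == 1 and cfg[(i + 1) % n] == 0)

free = [int(c) for c in "0010000010010000010"]
jam = [int(c) for c in "110101110101101110"]
print(moving_cars(free), sum(free))  # in the free configuration every car moves
print(moving_cars(jam), sum(jam))    # six of the twelve cars move
```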

However, in this model, the car occupancy of adjacent sites is highly correlated and the vehicles cannot move until a hole has appeared in front of them. The car distribution tries to self-adjust to a situation where there is one spacing between consecutive cars. For densities less than one-half, this is easily realized and the system can organize to have one car every other site. Therefore, due to these correlations, Eq. (11) is wrong in the high density regime. In this case, since a car needs a hole to move to, we expect that the number of moving cars simply equals the number of empty cells [52]. Thus, the number of motions is L(1 − ρ) and the average velocity in the jammed phase is

⟨v_j⟩ = (1 − ρ)/ρ .        (12)

From the above relations we can compute the so-called fundamental flow diagram, i.e. the relation between the flow of cars ρ⟨v⟩ as a function of the car density ρ: for ρ ≤ 1/2, we use the free regime expression and ρ⟨v⟩ = ρ. For densities ρ > 1/2, we use the jammed expression and ρ⟨v⟩ = 1 − ρ. The resulting diagram is shown in Fig. 7. As in real traffic, we observe that the flow of cars reaches a maximum value before decreasing.

Cellular Automata Modeling of Physical Systems, Figure 7
Traffic flow diagram for the simple CA traffic rule

A richer version of the above CA traffic model is due to Nagel and Schreckenberg [33,47,48]. The cars may have several possible velocities u = 0, 1, 2, …, u_max. Let u_i be the velocity of car i and d_i the distance, along the road, separating cars i and i + 1. The updating rule is:

- The cars accelerate when possible: u_i → u'_i = u_i + 1, if u_i < u_max.
- The cars slow down when required: u'_i → u''_i = d_i − 1, if u'_i ≥ d_i.
- The cars have a random behavior: u''_i → u'''_i = u''_i − 1, with probability p_i, if u''_i > 0.
- Finally, the cars move u'''_i sites ahead.

Cellular Automata Modeling of Physical Systems, Figure 8 The four central cells represent a roundabout which is traveled counterclockwise. The gray levels indicate the different traffic lanes: white is a northbound lane, light gray an eastbound lane, gray a southbound lane and, finally, dark gray is a westbound lane. The dots labeled a, b, c, d, e, f, g, and h are cars which will move to the destination cell indicated by the arrows, as determined by some local decision rule. Cars without an arrow are forbidden to move


Cellular Automata Modeling of Physical Systems, Figure 9 Traffic configuration after 600 iterations, for a car density of 30%. Streets are white, buildings gray and the black pixels represent the cars. Situation a corresponds to the roundabout junctions, whereas image b mimics the presence of traffic lights. In the second case, queues are more likely to form and the global mobility is less than in the first case

Cellular Automata Modeling of Physical Systems, Figure 10
Average velocity versus average density for the cellular automata street network, for a a time-uncorrelated turning strategy and b a fixed driver decision. The different curves correspond to different distances L between successive road junctions. The dashed line is the analytical prediction (see [13]). Junction deadlock is likely to occur in b, resulting in a completely jammed state

This rule captures some important behaviors of real traffic on a highway: velocity fluctuations due to the nondeterministic behavior of the drivers, and the "stop-and-go" waves observed in a high-density traffic regime. We refer the reader to the recent literature for new developments on this topic; see for instance [24,25].

Note that a street network can also be described using a CA. A possible approach is to model a road intersection as a roundabout. Cars in the roundabout have priority over those willing to enter. Figure 8 illustrates a simple four-way road junction. Traffic lights can also be modeled in a CA by stopping, during a given number of iterations, the cars reaching a given cell. Figure 9 illustrates a CA simulation of a Manhattan-like city in which junctions are either controlled by a roundabout or by a traffic light. In both cases, the destination of a car reaching the junction is randomly chosen.


Figure 10 shows the fundamental flow diagram obtained with the CA model, for a Manhattan-like city governed by roundabouts separated by a distance L. CA modeling of urban traffic has been used for real situations by many authors (see for instance [11]) and some cities in Europe and the USA use the CA approach as a way to manage traffic. Note finally that crowd motion is also addressed within the CA framework. Recent results [7,29] show that the approach is quite promising to plan evacuation strategies and to reproduce several of the motion patterns observed in real crowds.

A Simple Gas: The HPP Model

The HPP rule is a simple example of an important class of cellular automata models: lattice gas automata (LGA). The basic ingredients of such models are point particles that move on a lattice, according to appropriate rules, so as to mimic a fully discrete "molecular dynamics." The HPP lattice gas automaton is traditionally defined on a two-dimensional square lattice. Particles can move along the main directions of the lattice, as shown in Fig. 11. The model limits to 1 the number of particles entering a given site with a given direction of motion. This is the exclusion principle which is common in most LGA (LGA models without the exclusion principle are called multiparticle models [10]). With at most one particle per site and direction, four bits of information at each site are enough to describe the system during its evolution. For instance, if at iteration t site r has the state s(r, t) = (1011), it means that three particles are entering the site along directions 1, 3, and 4, respectively.

The cellular automata rule describing the evolution of s(r, t) is often split into two steps: collision and motion (or propagation). The collision phase specifies how the particles entering the same site will interact and change their trajectories. During the propagation phase, the particles actually move to the nearest neighbor site they were traveling to.
This decomposition into two phases is a quite convenient way to partition the space so that the collision rule is purely local. Figure 12 illustrates the HPP rules. According to our Boolean representation of the particles at each site, the collision part for the two head-on collisions is expressed as

(1010) → (0101) ,    (0101) → (1010)        (13)

all the other configurations being unchanged. During the propagation phase, the first bit of the state variable is shifted to the east neighbor cell, the second bit to the north and so on.
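Putting the collision rule (13) and the propagation step together, one full HPP update can be sketched as follows (my own implementation on a small periodic lattice; directions are numbered 0-3 for East, North, West, South, matching the bit order described above):

```python
def hpp_step(n):
    # n[d][y][x] is the occupation number for direction d at site (y, x),
    # with d = 0, 1, 2, 3 standing for East, North, West, South
    dirs, h, w = 4, len(n[0]), len(n[0][0])
    c = [[[0] * w for _ in range(h)] for _ in range(dirs)]
    for y in range(h):
        for x in range(w):
            e, no, we, s = (n[d][y][x] for d in range(dirs))
            if (e and we and not no and not s) or (no and s and not e and not we):
                # a head-on pair alone at a site: rotate the pair by 90 degrees
                c[0][y][x], c[1][y][x], c[2][y][x], c[3][y][x] = no, e, s, we
            else:
                c[0][y][x], c[1][y][x], c[2][y][x], c[3][y][x] = e, no, we, s
    # propagation: every bit moves one cell along its direction (periodic walls)
    out = [[[0] * w for _ in range(h)] for _ in range(dirs)]
    shift = {0: (0, 1), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}  # (dy, dx)
    for d, (dy, dx) in shift.items():
        for y in range(h):
            for x in range(w):
                out[d][(y + dy) % h][(x + dx) % w] = c[d][y][x]
    return out

# two particles in a head-on collision at site (2, 2) of a 5x5 lattice
lat = [[[0] * 5 for _ in range(5)] for _ in range(4)]
lat[0][2][2] = 1  # eastbound particle
lat[2][2][2] = 1  # westbound particle
lat = hpp_step(lat)
```

Particle number and momentum are conserved: the zero-momentum pair along one axis becomes a zero-momentum pair along the perpendicular one.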

Cellular Automata Modeling of Physical Systems, Figure 11 Example of a configuration of HPP particles

The aim of this rule is to reproduce some aspects of the real interactions between particles, namely that momentum and particle number are conserved during a collision. From Fig. 12, it is easily checked that these properties are obeyed: a pair of zero momentum particles along a given direction is transformed into another pair of zero momentum along the perpendicular axis.

It is easy to express the HPP model in a mathematical form. For this purpose, the so-called occupation numbers n_i(r, t) are introduced for each lattice site r and each time step t. The index i labels the lattice directions (or the possible velocities of the particles). In the HPP model, the lattice has four directions (North, West, South, and East) and i runs from 1 to 4. By definition and due to the exclusion principle, the n_i's are Boolean variables:

n_i(r, t) = 1 if a particle is entering site r at time t along lattice direction i, and 0 otherwise.

From this definition it is clear that, for HPP, the n_i's are simply the components of the state s introduced above

s = (n_1, n_2, n_3, n_4) .

In an LGA model, the microdynamics can be naturally expressed in terms of the occupation numbers n_i as

n_i(r + v_i Δt, t + Δt) = n_i(r, t) + Ω_i(n(r, t))        (14)


where v_i is a vector denoting the speed of the particle in the ith lattice direction. The function Ω is called the collision term and it describes the interaction of the particles which meet at the same time and same location. Note that another way to express Eq. (14) is through the so-called collision and propagation operators C and P

n(t + Δt) = PC n(t)        (15)

where n(t) describes the set of values n_i(r, t) for all i and r. The quantities C and P act over the entire lattice. They are defined as

(Pn)_i(r) = n_i(r − v_i Δt) ,    (Cn)_i(r) = n_i(r) + Ω_i .

More specifically, for the HPP model, it can be shown [10] that the collision and propagation phases can be expressed as

n_i(r + v_i Δt, t + Δt) = n_i − n_i n_{i+2}(1 − n_{i+1})(1 − n_{i+3}) + n_{i+1} n_{i+3}(1 − n_i)(1 − n_{i+2}) .        (16)

In this equation, the indices i + m are wrapped onto the values 1 to 4 and the right-hand term is computed at position r and time t.

Cellular Automata Modeling of Physical Systems, Figure 12
The HPP rule: a a single particle has a ballistic motion until it experiences a collision; b and c the two nontrivial collisions of the HPP model: two particles experiencing a head-on collision are deflected in the perpendicular direction. In the other situations, the motion is ballistic, that is, the particles are transparent to each other when they cross the same site

The HPP rule captures another important ingredient of the microscopic nature of a real interaction: invariance under time reversal. Figures 12b and c show that, if at some given time the directions of motion of all particles are reversed, the system will just trace back its own history. Since the dynamics of a deterministic cellular automaton is exact, this fact allows us to demonstrate the property of physical systems to return to their original situation when all the particles reverse their velocity.

Figure 13 illustrates the time evolution of an HPP gas initially confined in the left compartment of a container. There is an aperture on the wall of the compartment and the gas particles will flow so as to fill the entire space available to them. In order to include a solid boundary in the system, the HPP rule is modified as follows: when a site is a wall (indicated by an extra bit), the particles no longer experience the HPP collision but bounce back from where they came. Therefore, particles cannot escape a region delimited by such a reflecting boundary.

If the system of Fig. 13 is evolved, it reaches an equilibrium after a long enough time and no macroscopic trace of its initial state is visible any longer. However, no information has been lost during the process (no numerical dissipation) and the system has the memory of where it comes from. Reversing all the velocities and iterating the HPP rule makes all particles go back to the compartment in which they were initially located.

Reversing the particle velocities can be described by an operator R which swaps occupation numbers with opposite velocities

(Rn)_i = n_{i+2} .

The reversibility of HPP stems from the fact that the collision and propagation operators obey

PRP = R ,    CRC = R .

Thus (PC)^n PR(PC)^n = PR, which shows that the system will return to its initial state (though with opposite velocities) if the first n iterations are followed by a velocity change v_i → −v_i, a propagation, and again n iterations.
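This operator identity can be checked by absorbing the innermost factors, using PRP = R and then CRC = R at each stage (a short verification added here; it is not spelled out in the text):

```latex
\begin{aligned}
(PC)^n\,PR\,(PC)^n
  &= (PC)^{n-1}\,PC\,(PRP)\,C\,(PC)^{n-1} \\
  &= (PC)^{n-1}\,P\,(CRC)\,(PC)^{n-1} \\
  &= (PC)^{n-1}\,PR\,(PC)^{n-1} = \cdots = PR\,.
\end{aligned}
```

Each step strips one factor of PC from both sides, so after n reductions only PR remains.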


Cellular Automata Modeling of Physical Systems, Figure 13 Time evolution of an HPP gas. a From the initial state to equilibrium. b Illustration of time reversal invariance: in the rightmost image of a, the velocity of each particle is reversed and the particles naturally return to their initial position

This time-reversal behavior is only possible because the dynamics are perfectly exact and no numerical errors are present in the numerical scheme. If one introduces some errors externally (for instance, one can add an extra particle in the system) before the direction of motion of each particle is reversed, then reversibility is lost. Note that this property has inspired a new symmetric cryptography algorithm called Crystal [30], which exploits and develops the existing analogy between discrete physical models of particles and the standard diffusion-confusion paradigm of cryptography proposed by Shannon [39].

The FHP Model

The HPP rule is interesting because it illustrates the basic ingredients of LGA models. However, the capability of this rule to model a real gas of particles is poor, due to a lack of isotropy and to spurious invariants. A remedy to this problem is to use a different lattice and a different collision model.

The FHP rule (proposed by Frisch, Hasslacher, and Pomeau [18] in 1986) was the first CA whose behavior was shown to reproduce, within some limits, a two-dimensional fluid. The FHP model is an abstraction, at a microscopic scale, of a fluid. It is expected to contain all the salient features of a real fluid. It is well known that the continuity and Navier–Stokes equations of hydrodynamics express the local conservation of mass and momentum in a fluid. The detailed nature of the microscopic interactions does not affect the form of these equations but only the values of the coefficients (such as the viscosity) appearing in them. Therefore, the basic ingredients one has to include in the microdynamics of the FHP model are the conservation of particles and momentum after each updating step. In addition, some symmetries are required so that, in the macroscopic limit, where time and space can be considered as continuous variables, the system is isotropic. As in the case of the HPP model, the microdynamics of FHP is given in terms of Boolean variables describing the occupation numbers at each site of the lattice and at

Cellular Automata Modeling of Physical Systems

Cellular Automata Modeling of Physical Systems, Figure 14 The two-body collision in the FHP model. On the right part of the figure, the two possible outcomes of the collision are shown in dark and light gray, respectively. They both occur with probability one-half

Cellular Automata Modeling of Physical Systems, Figure 15 The three-body collision in the FHP model

each time step (i. e. the presence or the absence of a fluid particle). The FHP particles move in discrete time steps, with a velocity of constant modulus, pointing along one of the six directions of the lattice. Interactions take place among particles entering the same site at the same time and result in a new local distribution of particle velocities. In order to conserve the number of particles and the momentum during each interaction, only a few configurations lead to a nontrivial collision (i. e. a collision in which the directions of motion have changed). For instance, when exactly two particles enter the same site with opposite velocities, both of them are deflected by 60 degrees so that the output of the collision is still a zero momentum configuration with two particles. As shown in Fig. 14, the deflection can occur to the right or to the left, indifferently. For symmetry reasons, the two possibilities are chosen randomly, with equal probability. Another type of collision is considered: when exactly three particles collide with an angle of 120 degrees between each other, they bounce back (so that the momentum after collision is zero, as it was before collision). Figure 15 illustrates this rule. For the simplest case we are considering here, all interactions come from the two collision processes described above. For all other configurations (i. e. those which are not obtained by rotations of the situations given in Figs. 14

Cellular Automata Modeling of Physical Systems, Figure 16 Development of a sound wave in an FHP gas, due to an overconcentration of particles in the middle of the system

and 15), no collision occurs and the particles go through as if they were transparent to each other. Both two- and three-body collisions are necessary to avoid extra conservation laws. The two-particle collision removes a pair of particles with zero total momentum and moves it to another lattice direction; it therefore conserves momentum along each line of the lattice. On the other hand, three-body interactions deflect particles by 180 degrees and change the net momentum of each lattice line, while conserving the number of particles within each line. The FHP model has played a central role in computational physics because it can be shown (see for instance [10]) that the density ρ, defined as the average number of particles at a given lattice site, and u, the average velocity of these particles, obey the Navier–Stokes equation

∂_t u + (u · ∇)u = −(1/ρ) ∇p + ν ∇²u   (17)

where p = ρ c_s² is the scalar pressure, with c_s the speed of sound and ν the kinematic viscosity. Note that here both ν and c_s are quantities that emerge from the FHP dynamics: the speed of sound reflects the lattice topology, whereas the viscosity reflects the details of the collision process. As an illustration, Fig. 16 shows the propagation of a density wave in an FHP model, and Fig. 17 shows the eddies that form when an FHP fluid flows against an obstacle. More complex models can be built by adding new processes on top of an FHP fluid. For instance, Fig. 18 shows the result of a model of snow transport and deposition by wind. In addition to the wind flow, obtained from an FHP model, snow particles travel under the combined effect of wind and gravity. Upon reaching the ground, they pile up (possibly after toppling) so as to form a new boundary condition for the wind. In Fig. 19 an extension of the model (using the lattice Boltzmann approach described in Sect. "Lattice Boltzmann Models") shows how a fence with ground clearance creates a snow deposit.
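The two- and three-body collisions described above can be encoded as a lookup table over the 2⁶ = 64 possible site states, one bit per hexagonal lattice direction (directions i and i + 3 are opposite). The sketch below is our own illustration, not code from the article; a simulation would pick the "left" or "right" table at random, with equal probability, independently at each site and time step.

```python
# FHP collision tables: site state = 6-bit mask, bit i set if a particle
# moves along hex direction i (directions i and i+3 are opposite).
def build_fhp_tables():
    left = list(range(64))   # outcome if head-on pairs deflect by +60 degrees
    right = list(range(64))  # outcome if head-on pairs deflect by -60 degrees
    for i in range(3):
        s = (1 << i) | (1 << (i + 3))                # two-body head-on state
        left[s] = (1 << ((i + 1) % 6)) | (1 << ((i + 4) % 6))
        right[s] = (1 << ((i + 5) % 6)) | (1 << ((i + 2) % 6))
    three_a = 0b010101                               # particles along 0, 2, 4
    three_b = 0b101010                               # particles along 1, 3, 5
    left[three_a] = right[three_a] = three_b         # three-body bounce-back
    left[three_b] = right[three_b] = three_a
    return left, right

left, right = build_fhp_tables()
```

Both tables conserve the particle number for every state, and every configuration other than the zero-momentum two- and three-body ones passes through unchanged, exactly as required by the rule.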


Cellular Automata Modeling of Physical Systems, Figure 19 A lattice Boltzmann snow transport and deposition model. The three panels show the time evolution of the snow deposit (yellow) past a fence. Airborne snowflakes are shown as white dots

Cellular Automata Modeling of Physical Systems, Figure 17 Flow pattern from a simulation of a FHP model

Cellular Automata Modeling of Physical Systems, Figure 18 A CA snow transport and deposition model

Lattice Boltzmann Models

Lattice Boltzmann models (LBM) are an extension of the CA-fluids described in the previous section. The main conceptual difference is that in an LBM the CA state is no longer a Boolean number n_i but a real-valued quantity f_i for each lattice direction i. Instead of describing the presence or absence of a particle, f_i is interpreted as the density distribution function of particles traveling in lattice direction i. As with LGA (see Eq. (16)), the dynamics of an LBM can be expressed in terms of an equation for the f_i. The fluid quantities, such as the density ρ and the velocity field u, are obtained by summing the contributions from all lattice directions:

ρ = Σ_i f_i ,   ρ u = Σ_i f_i v_i ,

where v_i denotes the possible particle velocities on the lattice. From a numerical point of view, the advantages of suppressing the Boolean constraint are several: less statistical noise, more numerical accuracy and, importantly, more flexibility in choosing the lattice topology, the collision operator, and the boundary conditions. Thus, for many practical applications, the LBM approach is preferred to the LGA one. The so-called BGK (or "single-time relaxation") LBM

f_i(r + Δt v_i, t + Δt) = f_i(r, t) + (1/τ) [ f_i^eq(ρ, u) − f_i(r, t) ] ,   (18)

where f^eq is a given function and τ a relaxation time, has been used extensively in the literature to simulate complex flows. The method is now recognized as a serious competitor of the more traditional approach based on the computer solution of the Navier–Stokes partial differential equations. It is beyond the scope of this article to discuss the LBM approach in more detail; we refer the reader to several textbooks on this topic [8,10,40,41,49]. In addition to some rather technical aspects, one advantage of the LBM over the Navier–Stokes equation is its extended range of validity when the Knudsen number is not negligible (e. g. in microflows) [2]. Note finally that the LBM approach is also applicable to reaction-diffusion processes [1] or to the wave equation [10], simply by choosing an appropriate expression for f_i^eq.
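A minimal relaxation-and-streaming step in the spirit of Eq. (18) can be sketched as follows. This is our own illustration under standard assumptions not prescribed by the text: the common D2Q9 lattice (9 velocities with weights 4/9, 1/9, 1/36) and the usual second-order equilibrium f_i^eq = w_i ρ (1 + 3 v_i·u + 9/2 (v_i·u)² − 3/2 u²).

```python
import numpy as np

# D2Q9 lattice: velocities v_i and weights w_i (a standard choice)
V = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, u):
    """Second-order equilibrium distribution f_i^eq(rho, u)."""
    cu = np.einsum('id,xyd->ixy', V, u)      # v_i . u at every site
    usq = np.einsum('xyd,xyd->xy', u, u)     # u . u at every site
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * usq)

def bgk_step(f, tau):
    """One BGK step: relax toward equilibrium (Eq. (18)), then stream."""
    rho = f.sum(axis=0)                                  # density
    u = np.einsum('ixy,id->xyd', f, V) / rho[..., None]  # velocity field
    f = f + (equilibrium(rho, u) - f) / tau              # collision
    for i, (cx, cy) in enumerate(V):                     # periodic streaming
        f[i] = np.roll(np.roll(f[i], cx, axis=0), cy, axis=1)
    return f
```

A uniform fluid at rest (f_i = w_i everywhere) is a fixed point of this dynamics, and the total mass Σ f_i is conserved by construction, since the equilibrium carries the same density as f.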


Diffusion Processes

Diffusive phenomena and reaction processes play an important role in many areas of physics, chemistry, and biology, and still constitute an active field of research. Systems in which reactive particles are brought into contact by a diffusion process and transform often give rise to very complex behaviors. Pattern formation [31,34] is a typical example of such a behavior. CA provide an interesting framework to describe reaction-diffusion phenomena. The HPP rule we discussed in Sect. "A Simple Gas: The HPP Model" can be easily modified to produce many synchronous random walks. The random walk is well known to be the microscopic origin of a diffusion process. Thus, instead of experiencing a mass- and momentum-conserving collision, each particle now selects, at random, a new direction of motion among the possible values permitted by the lattice. Since several particles may enter the same site (up to four, on a two-dimensional square lattice), the random change of directions should be such that there are never two or more particles exiting a site in the same direction. This would otherwise violate the exclusion principle. The solution is to shuffle the directions of motion or, more precisely, to perform a random permutation of the velocity vectors, independently at each lattice site and each time step. Figure 20 illustrates this probabilistic evolution rule for a 2D square lattice.

Cellular Automata Modeling of Physical Systems, Figure 20 How the entering particles are deflected at a typical site, as a result of the diffusion rule. The four possible outcomes occur with respective probabilities p0, p1, p2, and p3. The figure shows four particles, but the mechanism is data-blind and any one of the arrows can be removed when fewer entering particles are present

It can be shown that the quantity ρ, defined as the average number of particles at site r and time t, obeys the diffusion equation

∂_t ρ − div(D grad ρ) = 0 ,

where D is the diffusion constant, whose expression is

D = (Δr²/Δt) [ 1/(4(p1 + p2)) − 1/4 ] = (Δr²/Δt) (p0 + p1) / (4[1 − (p0 + p1)])   (19)

where Δt and Δr are the time step and the lattice spacing, respectively. For the one- and three-dimensional cases, a similar approach can be developed [10]. As an example of the use of the present random-walk cellular automata rule, we discuss an application to growth processes. In many cases, growth is governed by an aggregation mechanism: like particles stick to each other as they meet and, as a result, form a complicated pattern with a branching structure. A prototype model of aggregation is the so-called DLA model (diffusion-limited aggregation), introduced by Witten and Sander [46] in the early 1980s. Since its introduction, the DLA model has been investigated in great detail. However, diffusion-limited aggregation is a far-from-equilibrium process which is not described theoretically from first principles alone. Spatial fluctuations that are typical of DLA growth are difficult to take into account and a numerical approach is necessary to complete the analysis. DLA-like processes can be readily modeled by our diffusion cellular automata, provided that an appropriate rule is added to take into account the particle-particle aggregation. The first step is to introduce a rest particle to represent the particles of the aggregate. Therefore, in a two-dimensional system, a lattice site can be occupied by up to four diffusing particles, or by one "solid" particle. Figure 21 shows a two-dimensional DLA-like cluster grown by the cellular automata dynamics. At the beginning of the simulation, one or more rest particles are introduced in the system to act as aggregation seeds. The rest of the system is filled with particles with average concentration ρ. When a diffusing particle becomes a nearest neighbor to a rest particle, it stops and sticks to it by transforming into a rest particle. Since several particles can enter the same site, we may choose to aggregate all of them at once (i. e. a rest particle is actually composed of several moving particles), or to accept the aggregation only when a single particle is present. In addition to this question, the sticking condition is important. If any diffusing particle always sticks to the DLA cluster, the growth is very fast and can be influenced by the underlying lattice anisotropy. It is therefore more appropriate to stick with some probability p_s.
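The random velocity shuffle at the heart of this diffusion rule amounts to rotating, at every site, the four incoming velocity channels by a random quarter-turn r ∈ {0, 1, 2, 3} drawn with probabilities p0, …, p3. A minimal sketch (our own notation; a rotation is one convenient family of permutations):

```python
import numpy as np

def diffusion_step(n, p, rng):
    """One step of the random-walk CA: per-site random rotation of the
    four Boolean velocity channels (E, N, W, S), then streaming on a torus."""
    rot = rng.choice(4, size=n.shape[1:], p=p)   # quarter-turns per site
    out = np.empty_like(n)
    for i in range(4):
        # channel i receives the channel that the local rotation maps onto i
        out[i] = np.choose(rot, [n[(i - r) % 4] for r in range(4)])
    e, nn, w, s = out                            # stream along each direction
    return np.stack([np.roll(e, 1, axis=1), np.roll(nn, -1, axis=0),
                     np.roll(w, -1, axis=1), np.roll(s, 1, axis=0)])
```

Because the shuffle is a permutation of the channels, the exclusion principle and the particle number are preserved exactly, as the rule requires.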


Cellular Automata Modeling of Physical Systems, Figure 21 Two-dimensional cellular automata DLA-like cluster (black), obtained with p_s = 1, an aggregation threshold of 1 particle, and a density of diffusing particles of 0.06 per lattice direction. The gray dots represent the diffusing particles not yet aggregated. The fractal dimension is found to be d_f = 1.78

Reaction-Diffusion Processes

A reaction term can be added on top of the CA diffusion rule. For the sake of illustration, let us consider a process such as

A + B →(K) C   (20)

where A, B, and C are different chemical species, all diffusing in the same solvent, and K is the reaction constant. To account for this reaction, one can consider the following mechanism: at the "microscopic" level of the discrete lattice dynamics, all three species are first governed by a diffusion rule. When an A and a B particle enter the same site at the same time, they disappear and form a C particle. Of course, there are several ways to select the events that will produce a C when more than one A or one B are simultaneously present at a given site. Also, when Cs already exist at this site, the exclusion principle may prevent the formation of new ones. A simple choice is to have A and B react only when they perform a head-on collision and when no Cs are present in the perpendicular directions. Figure 22 displays such a process. Other rules can be considered if we want to enhance the reaction (make it more likely) or to deal with more complex situations (2A + B → C, for instance). A parameter k can be introduced to tune the reaction rate K by controlling the probability that a reaction takes place. Using an appropriate mathematical tool [10], one can show that the idealized microscopic reaction-diffusion behavior implemented by the CA rule obeys the expected partial differential equation

∂_t ρ_A = D ∇²ρ_A − K ρ_A ρ_B   (21)

provided k is correctly chosen [10]. As an example of a CA reaction-diffusion model, we show in Fig. 23 the formation of the so-called Liesegang patterns [22]. Liesegang patterns are produced by precipitation and aggregation in the wake of a moving reaction front. Typically, they are observed in a test tube containing a gel in which a chemical species B (for example AgNO3) reacts with another species A (for example HCl). At the beginning of the experiment, B is uniformly distributed in the gel with concentration b0. The other species A, with concentration a0, is allowed to diffuse into the tube from its open extremity. Provided that the concentration a0 is larger than b0, a reaction front propagates in the tube. As this A + B reaction goes on, formation of consecutive bands of precipitate (AgCl in our example) is observed in the tube, as shown in Fig. 23. Although this figure is from a computer simulation [12], it is very close to the picture of a real experiment. Figure 24 shows the same process but in a different geometry. Species A is added in the middle of a 2D gel and diffuses radially. Rings (a) or spirals (b) result from the interplay between the reaction front and the solidification.
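A deliberately simplified sketch of the reaction step can make the stoichiometry concrete. This is our own simplification, not the article's rule: here A and B react with probability k whenever they occupy the same site, ignoring the head-on collision and perpendicular-C conditions discussed above.

```python
import numpy as np

def reaction_step(a, b, c, k, rng):
    """A + B -> C with probability k wherever an A and a B coincide.
    a, b, c are per-site particle counts (at most one per site here)."""
    fire = (a > 0) & (b > 0) & (rng.random(a.shape) < k)
    return a - fire, b - fire, c + fire
```

Each reaction removes one A and one B and creates one C, so the totals of a + c and b + c are invariants of the dynamics, mirroring the stoichiometry of the A + B → C process of Eq. (20).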

Cellular Automata Modeling of Physical Systems, Figure 22 Automata implementation of the A + B → C reaction process. The reaction takes place with probability k. A Boolean random quantity determines in which direction the C particle moves after its creation


Cellular Automata Modeling of Physical Systems, Figure 23 Example of the formation of Liesegang bands in a cellular automata simulation. The red bands correspond to the precipitate which results from the A + B reaction front (in blue)

Cellular Automata Modeling of Physical Systems, Figure 24 Formation of Liesegang rings in a cellular automata simulation. The red spots correspond to the precipitate created by the A + B reaction front (in blue)

Excitable Media Excitable media are other examples of reaction processes where unexpected space-time patterns are created. As opposed to the reaction-diffusion models discussed above, diffusion is not considered here explicitly. It is assumed that reaction occurs between nearest neighbor cells, making the transport of species unnecessary. The main focus is on the description of chemical waves propagating in the system much faster and differently than any diffusion process would produce. An excitable medium is basically characterized by three states [5]: the resting state, the excited state, and the refractory state. The resting state is a stable state of the system. But a resting state can respond to a local perturbation and become excited. Then, the excited state evolves to a refractory state where it no longer influences its neighbors and, finally, returns to the resting state.

A generic behavior of excitable media is to produce chemical waves of various geometries [26,27]. Ring and spiral waves are typical patterns of excitation. Many chemical systems exhibit an excitable behavior; the Selkov model [38] and the Belousov–Zhabotinsky reaction are examples. Chemical waves play an important role in many biological processes (nervous systems, muscles) since they can mediate the transport of information from one place to another. The Greenberg–Hastings model is an example of a cellular automata model of an excitable medium. This rule, and its generalizations, have been extensively studied [17,20]. The implementation we propose here for the Greenberg–Hastings model is the following: the state φ(r, t) of site r at time t takes its value in the set {0, 1, 2, …, n − 1}. The state φ = 0 is the resting state. The states φ = 1, …, n/2 (n is assumed to be even) correspond to excited states. The rest, φ = n/2 + 1, …, n − 1, are the refractory states.

Cellular Automata Modeling of Physical Systems, Figure 25 Excitable medium: evolution of a configuration with 5% of excited states φ = 1, and 95% of resting states (black), for n = 8 and k = 3

The cellular automaton evolution rule is the following:

1. If φ(r, t) is excited or refractory, then φ(r, t + 1) = φ(r, t) + 1 mod n.
2. If φ(r, t) = 0 (resting state), it remains so unless there are at least k excited sites in the Moore neighborhood of site r. In this case φ(r, t + 1) = 1.
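The two rules above translate directly into a synchronous update. This is a minimal sketch with periodic boundaries (an assumption on our part), with φ, n, and k as in the text:

```python
import numpy as np

def gh_step(phi, n, k):
    """One Greenberg-Hastings step on a torus; phi takes values 0..n-1."""
    excited = (phi >= 1) & (phi <= n // 2)
    # number of excited sites in each Moore neighborhood
    count = sum(np.roll(np.roll(excited, dx, axis=0), dy, axis=1)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                if (dx, dy) != (0, 0))
    nxt = np.where(phi > 0, (phi + 1) % n, 0)   # rule 1: clock advance
    nxt[(phi == 0) & (count >= k)] = 1          # rule 2: excitation
    return nxt
```

With this update an excited or refractory state cycles through 1, 2, …, n − 1 and back to 0, while a resting site fires only when at least k of its eight neighbors are excited.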

The n states play the role of a clock: an excited state evolves through the sequence of all possible states until it returns to 0, which corresponds to a stable situation. The behavior of this rule is quite sensitive to the value of n and the excitation threshold k. Figure 25 shows the evolution of this automaton for a given set of the parameters n and k. The simulation is started with a uniform configuration of resting states, perturbed by some excited sites randomly distributed over the system. Note that if the concentration of perturbation is low enough, excitation dies out rapidly and the system returns to the rest state. Increasing the number of perturbed states leads to the formation of traveling waves, and self-sustained oscillations may appear in the form of ring or spiral waves. The Greenberg–Hastings model has some similarity with the "tube-worms" rule proposed by Toffoli and Margolus [42]. This rule is intended to model the Belousov–Zhabotinsky reaction and is as follows. The state of each site is either 0 (refractory) or 1 (excited), and a local timer (whose value is 3, 2, 1, or 0) controls the refractory period. Each iteration of the rule can be expressed by the following sequence of operations: (i) where the timer is zero, the state is excited; (ii) the timer is decreased by 1 unless it is 0; (iii) a site becomes refractory whenever the timer is equal to 2; (iv) the timer is reset to 3 for the excited sites which

Cellular Automata Modeling of Physical Systems, Figure 26 The tube-worms rule for an excitable medium

have two, or more than four, excited sites in their Moore neighborhood. Figure 26 shows a simulation of this automaton, starting from a random initial configuration of the timers and the excited states. We observe the formation of spiral pairs of excitations. Note that this rule is very sensitive to small modifications (in particular to the order of the operations (i) to (iv)). Another rule which is also similar to the Greenberg–Hastings and Margolus–Toffoli tube-worms models is the


Cellular Automata Modeling of Physical Systems, Figure 27 The forest-fire rule: green sites correspond to a grown tree, black pixels represent burned sites, and yellow indicates a burning tree. The snapshots given here represent three situations after a few hundred iterations. The parameters of the rule are p = 0.3 and f = 6 × 10⁻⁵

so-called forest-fire model. This rule describes the propagation of a fire or, in a different context, may also be used to mimic contagion in the case of an epidemic. Here we describe the forest-fire rule, a probabilistic CA defined on a d-dimensional cubic lattice. Initially, each site is occupied by a tree, a burning tree, or is empty. The state of the system is updated in parallel according to the following rule: (1) a burning tree becomes an empty site; (2) a green tree becomes a burning tree if at least one of its nearest neighbors is burning; (3) at an empty site, a tree grows with probability p; (4) a tree without a burning neighbor becomes a burning tree with probability f (so as to mimic the effect of lightning). Figure 27 illustrates the behavior of this rule in a two-dimensional situation. Provided that the time scales of tree growth and burning down of forest clusters are well separated (i. e. in the limit f/p → 0), this model has self-organized critical states [15]. This means that in the steady state, several physical quantities characterizing the system have a power-law behavior.

Surface Reaction Models

Other important reaction models that can be described by a CA are surface-reaction models, where nonequilibrium phase transitions can be observed. Nonequilibrium phase transitions are an important topic in physics because no general theory is available to describe such systems, and most of the known results are based on numerical simulations. The so-called Ziff model [54] gives an example of the reaction A–B2 on a catalyst surface (for example CO–O2). The system is out of equilibrium because it is an open system in which material continuously flows in and out.

However, after a while, it reaches a stationary state and, depending on some control parameters, may be in different phases. The basic steps are:

• A gas mixture with concentrations X_B2 of B2 and X_A of A sits above a surface on which it can be adsorbed. The surface is divided into elementary cells and each cell can adsorb one atom only.
• The B species can be adsorbed only in atomic form. A molecule B2 dissociates into two B atoms only if two adjacent cells are empty; otherwise the B2 molecule is rejected.
• If two nearest-neighbor cells are occupied by different species, they react chemically and the product of the reaction is desorbed. In the example of the CO–O2 reaction, the desorbed product is a CO2 molecule. This final desorption step is necessary for the product to be recovered and for the catalyst to be regenerated.

However, the gas above the surface is assumed to be continually replenished by fresh material, so that its composition remains constant during the whole evolution. It is found by sequential numerical simulation [54] that a reactive steady state occurs only in a window defined by X1 < X_A < X2, where X1 = 0.389 ± 0.005 and X2 = 0.525 ± 0.001 (provided that X_B2 = 1 − X_A). This situation is illustrated in Fig. 28, though for the corresponding cellular automata dynamics and X_B2 ≠ 1 − X_A. Outside this window of parameters, the steady state is a "poisoned" catalyst of pure A (when X_A > X2) or pure B (when X_A < X1). For X1 < X_A < X2, the coverage fraction varies continuously with X_A and one speaks of a continuous (or second-order) nonequilibrium phase


Cellular Automata Modeling of Physical Systems, Figure 28 Typical microscopic configuration in the stationary state of the CA Ziff model, where there is coexistence of the two species over time. The simulation corresponds to the generalized model described by rules R1, R2, R3, and R4 below. The blue and green dots represent, respectively, the A and B particles, whereas the empty sites are black

transition. At X_A = X2, the coverage fraction varies discontinuously with X_A and one speaks of a discontinuous (or first-order) nonequilibrium phase transition. Figure 30 displays this behavior. The asymmetry of behavior at X1 and X2 comes from the fact that A and B atoms have different adsorption rules: two vacant adjacent sites are necessary for B to stick to the surface, whereas one empty site is enough for A.

In a CA approach the elementary cells of the catalyst are mapped onto the cells of the automaton. In order to model the different processes, each cell j can be in one of four different states, denoted |0⟩, |A⟩, |B⟩, or |C⟩. The state |0⟩ corresponds to an empty cell, |A⟩ to a cell occupied by an atom A, and |B⟩ to a cell occupied by an atom B. The state |C⟩ is artificial and represents a precursor state describing the conditional occupation of the cell by an atom B. Conditional means that during the next evolution step of the automaton, |C⟩ will become |B⟩ or |0⟩ depending on whether a nearest-neighbor cell is empty and ready to receive the second B atom of the molecule B2. This conditional state is necessary to describe the dissociation of B2 molecules on the surface. The main difficulty when implementing the Ziff model with a fully synchronous updating scheme is to ensure that the correct stoichiometry is obeyed. Indeed, since all atoms take a decision at the same time, the same atom could well take part in a reaction with several different neighbors, unless some care is taken. The solution to this problem is to add a vector field to every site in the lattice [53], as shown in Fig. 29. A vector field is a collection of arrows, one at each lattice site, that can point in any of the four directions of the lattice. The directions of the arrows at each time step are assigned randomly. Thus, a two-site process is carried out only on those pairs of sites in which the arrows point toward each other (matching nearest-neighbor pairs, MNN). This concept of reacting matching pairs is a general way to partition the parallel computation into local parts. In the present implementation, the following generalization of the dynamics is included: an empty site remains empty with some probability. One has then two control

Cellular Automata Modeling of Physical Systems, Figure 29 Illustration of rules R2 and R3. The arrows select which neighbor is considered for a reaction. Dark and white particles represent the A and B species, respectively. The shaded region corresponds to cells that are not relevant to the present discussion such as, for instance, cells occupied by the intermediate C species
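The arrow-matching construction can be sketched as follows. This is our own minimal version: it only builds the random arrow field and the masks of matched pairs; applying the reaction rules to those pairs is omitted.

```python
import numpy as np

# Arrow directions: 0 = east, 1 = north, 2 = west, 3 = south.
def matched_pairs(shape, rng):
    """Random arrow field and the matching nearest-neighbor (MNN) pairs:
    two adjacent sites form a pair iff their arrows point at each other."""
    arrows = rng.integers(0, 4, size=shape)
    east_of = np.roll(arrows, -1, axis=1)     # arrow of the eastern neighbor
    north_of = np.roll(arrows, -1, axis=0)    # arrow of the northern neighbor
    pair_e = (arrows == 0) & (east_of == 2)   # site points E, neighbor points W
    pair_n = (arrows == 1) & (north_of == 3)  # site points N, neighbor points S
    return arrows, pair_e, pair_n
```

Since every site carries exactly one arrow, it can belong to at most one matched pair, which is what makes the fully synchronous two-site update well defined.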


R4: If |ψ_j⟩ = |C⟩ then

A DTDS ⟨X, F⟩ is totally transitive if, for any n > 0, the system ⟨X, Fⁿ⟩ is transitive.

Proposition 1 ([27]) Any mixing DTDS is totally transitive.

At the beginning of the eighties, Auslander and Yorke introduced the following definition of chaos [5].

Definition 7 (AY-Chaos) A DTDS is AY-chaotic if it is transitive and sensitive to the initial conditions.

This definition involves two fundamental characteristics: the undecomposability of the system, due to transitivity, and the unpredictability of the dynamical evolution, due to sensitivity. We now introduce a notion which is often referred to as the element of regularity a chaotic dynamical system must exhibit.

Definition 8 (DPO) A DTDS ⟨X, F⟩ has the denseness of periodic points (or, it is regular) if the set of its periodic points is dense in X.

The following is a standard result for compact DTDS.

Proposition 2 If a compact DTDS has DPO then it is surjective.

In his famous book [22], Devaney modified AY-chaos by adding the denseness of periodic points.

Definition 9 (D-Chaos) A DTDS is said to be D-chaotic if it is sensitive, transitive, and regular.

An interesting result states that sensitivity, despite its popular appeal, is redundant in the Devaney definition of chaos.

Stability

All the previous properties can be considered components of a chaotic, and hence unstable, behavior of a DTDS. We now illustrate some properties concerning conditions of stability for a system.

Definition 10 (Equicontinuous Point) A state x ∈ X of a DTDS ⟨X, F⟩ is an equicontinuous point if for any ε > 0 there exists δ > 0 such that for all y ∈ X, d(y, x) < δ implies that for all n ∈ ℕ, d(Fⁿ(y), Fⁿ(x)) < ε.

In other words, a point x is equicontinuous (or Lyapunov stable) if for any ε > 0 there exists a neighborhood of x whose states have orbits which stay within distance ε of the orbit of x. This is a condition of local stability for the system. Associated with this notion involving a single state, we have two notions of global stability based on the "size" of the set of equicontinuous points.

Definition 11 (Equicontinuity) A DTDS ⟨X, F⟩ is said to be equicontinuous if for any ε > 0 there exists δ > 0 such that for all x, y ∈ X, d(y, x) < δ implies that for all n ∈ ℕ, d(Fⁿ(y), Fⁿ(x)) < ε.

Given a DTDS, let E be its set of equicontinuity points. Remark that if a DTDS is equicontinuous then the set E of all its equicontinuity points is the whole of X. The converse is also true in the compact setting. Furthermore, if a system is sensitive then E = ∅; in general, the converse is not true [43].

Definition 12 (Almost Equicontinuity) A DTDS is almost equicontinuous if the set of its equicontinuous points E is residual (i. e., it can be obtained as an infinite intersection of dense open subsets).

It is obvious that equicontinuous systems are almost equicontinuous. In the sequel, almost equicontinuous systems which are not equicontinuous will be called strictly almost equicontinuous. An important result affirms that transitive systems on compact spaces are almost equicontinuous if and only if they are not sensitive [3].


Chaotic Behavior of Cellular Automata

Topological Entropy

Topological entropy is another interesting property which can be taken into account in order to study the degree of chaoticity of a system. It was introduced in [2] as an invariant of topological conjugacy. The notion of topological entropy is based on the complexity of the coverings of the system. Recall that an open covering of a topological space X is a family of open sets whose union is X. The join of two open coverings U and V is U ∨ V = {U ∩ V : U ∈ U, V ∈ V}. The inverse image of an open covering U by a map F : X → X is F⁻¹(U) = {F⁻¹(U) : U ∈ U}. On the basis of these notions, the entropy of a system ⟨X, F⟩ over an open covering U is defined as

H(X, F, U) = lim_{n→∞} (1/n) log |U ∨ F⁻¹(U) ∨ ⋯ ∨ F^{−(n−1)}(U)| ,

where |U| is the cardinality of U.

Definition 13 (Topological Entropy) The topological entropy of a DTDS ⟨X, F⟩ is

h(X, F) = sup{ H(X, F, U) : U is an open covering of X }   (1)

Topological entropy represents the exponential growth of the number of orbit segments which can be distinguished with a certain good, but finite, accuracy. In other words, it measures the uncertainty of the system's evolution when only partial knowledge of the initial state is given. There are close relationships between the entropy and the topological properties we have seen so far. For instance, we have the following.

Proposition 4 ([3,12,28]) In compact DTDS, transitivity and positive entropy imply sensitivity.

Cellular Automata

Consider the set of configurations C, which consists of all functions from ℤ^D into A. The space C is usually equipped with the Tychonoff (or Cantor) metric d defined as

d(a, b) = 2^{−n} for all a, b ∈ C, with

n = min{ ‖v‖_∞ : v ∈ ℤ^D, a(v) ≠ b(v) } ,

where ‖v‖_∞ denotes the maximum of the absolute values of the components of v. The topology induced by d coincides with the product topology induced by the discrete

topology on A. With this topology, C is a compact, perfect and totally disconnected space. ˚ Let N D uE1 ; : : : ; uEs be an ordered set of vectors of Z D and f : As 7! A be a function. Definition 14 (CA) The D-dimensional CA based on the local rule f and the neighborhood frame N is the pair hC i ; F where F : C 7! C is the global transition rule defined as follows: v 2 ZD ; 8c 2 C ; 8E F(c)(E v ) D f (c(E v C uE1 ); : : : ; c(E v C uEs )) : (2) Note that the mapping F is (uniformly) continuous with respect to the Thychonoff metric. Hence, the pair hC i ; F is a proper discrete time dynamical system. Let Z m D f0; 1; : : : ; m  1g be the group of the integers modulo m. Denote by Cm the configuration space C for the special case A D Z m . When Cm is equipped with the natural extensions of the sum and the product operations, it turns out to be a linear space. Therefore, one can exploit the properties of linear spaces to simplify the proofs and the overall presentation. s 7! Z is said to be linear if there A function f : Z m m exist 1 ; : : : ; s 2 Z m such that it can be expressed as: " s # X s i x i 8(x1 ; : : : ; x s ) 2 Z m ; f (x1 ; : : : ; x s ) D iD1

m

where [x]m is the integer x taken modulo m. Definition 15 (Linear CA) A D-dimensional linear CA is a CA hC im ; F whose local rule f is linear. Note that for the linear CA Equ. (2) becomes: " s # X D v 2 Z ; F(c)(E v) D  i c(E v C uE i ) 8c 2 C ; 8E iD1

: m
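To make Definitions 14 and 15 concrete, here is a minimal sketch (not from the article; the names `step` and `distance` are illustrative). It applies a one-dimensional global rule on a finite cyclic window of cells and evaluates the Tychonoff distance truncated to that window. Elementary rule 90 is used because its local rule, $f(x_{-1}, x_0, x_1) = x_{-1} + x_1 \bmod 2$, is linear over $\mathbb{Z}_2$.

```python
def step(config, neighborhood, f):
    """One application of F(c)(v) = f(c(v+u_1), ..., c(v+u_s)),
    restricted to a finite cyclic window of cells."""
    n = len(config)
    return [f(tuple(config[(v + u) % n] for u in neighborhood))
            for v in range(n)]

def distance(a, b):
    """Tychonoff distance d(a, b) = 2^(-n) with n = min{|v| : a(v) != b(v)},
    truncated to the window (offsets are taken cyclically around cell 0)."""
    n = len(a)
    diffs = [abs(v if v <= n // 2 else v - n)
             for v in range(n) if a[v] != b[v]]
    return 0.0 if not diffs else 2.0 ** (-min(diffs))

# Elementary rule 90: the linear local rule x_{-1} + x_1 mod 2
rule90 = lambda xs: (xs[0] + xs[2]) % 2
c = [0, 0, 0, 1, 0, 0, 0]
print(step(c, (-1, 0, 1), rule90))   # [0, 0, 1, 0, 1, 0, 0]
```

On the infinite lattice the metric compares cells outward from the origin; the cyclic window above is only a finite approximation of that setting.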

The Case of Cellular Automata

In this section the results seen so far are specialized to the CA setting, focusing on dimension one. The following result allows a first classification of one-dimensional CA according to their degree of chaoticity.

Theorem 1 ([43]) A one-dimensional CA is sensitive if and only if it is not almost equicontinuous.

In other words, for CA the dichotomy between sensitivity and almost equicontinuity holds unconditionally, and not only under the transitivity assumption. As a consequence, the family of all one-dimensional CA can be partitioned into four classes [43]:

(K1) equicontinuous CA;
(K2) strictly almost equicontinuous CA;


(K3) sensitive CA;
(K4) expansive CA.

This classification almost carries over to higher dimensions. The problem is that there exist CA strictly between the classes K2 and K3 (i.e., non-sensitive CA without any equicontinuity point). Even relaxing the definition of K2 to "CA having some equicontinuity point", the gap persists (see, for instance, [57]). Unfortunately, much like most of the interesting properties of CA, the properties defining the above classification scheme are affected by undecidability.

Theorem 2 ([23]) For each $i = 1, 2, 3$, there is no algorithm to decide if a one-dimensional CA belongs to the class K$i$.

The following conjecture stresses the fact that nothing is known about the decidability of membership in K4.

Conjecture 1 (Folklore) Membership in class K4 is undecidable.

Remark that the above conjecture is clearly false for dimensions greater than 1, since there do not exist expansive CA in dimension strictly greater than 1 [53].

Proposition 5 ([7,11]) Expansive CA are strongly transitive and mixing.

In the CA setting, the notion of total transitivity reduces to simple transitivity. Moreover, there is a strict relation between transitive and sensitive CA.

Theorem 3 ([49]) If a CA is transitive then it is totally transitive.

Theorem 4 ([28]) Transitive CA are sensitive.

As we have already seen, sensitivity is undecidable. Hence, in view of the combinatorial complexity of transitive CA, the following conjectures seem plausible.

Conjecture 2 Transitivity is an undecidable property.

Conjecture 3 ([48]) Strongly transitive CA are (topologically) mixing.

Chaos and Combinatorial Properties

In this section, when referring to a one-dimensional CA, we assume that $u_1 = \min N$ and $u_s = \max N$ (see also Sect. "Definitions"). Furthermore, we call elementary a one-dimensional CA with alphabet $A = \{0, 1\}$ and $N = \{-1, 0, 1\}$ (there exist 256 possible elementary CA, which can be enumerated according to their local rule [64]).
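The enumeration mentioned above can be made explicit: the Wolfram number of an elementary rule is the integer whose binary digit at position $4x_{-1} + 2x_0 + x_1$ equals $f(x_{-1}, x_0, x_1)$. A small decoder sketch (the function name is illustrative):

```python
def elementary_rule(n):
    """Local rule f : {0,1}^3 -> {0,1} of the elementary CA with Wolfram number n."""
    assert 0 <= n <= 255
    table = {(a, b, c): (n >> (4 * a + 2 * b + c)) & 1
             for a in (0, 1) for b in (0, 1) for c in (0, 1)}
    return lambda xs: table[tuple(xs)]

rule110 = elementary_rule(110)
print(rule110((1, 1, 1)), rule110((0, 1, 0)))   # 0 1
```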

In the CA setting, most of the chaos components are related to properties of a combinatorial nature like injectivity, surjectivity and openness. First of all, remark that injectivity and surjectivity are dimension-sensitive properties in the sense of the following theorem.

Theorem 5 ([4,39]) Injectivity and surjectivity are decidable in dimension 1, while they are not decidable in dimension greater than 1.

A one-dimensional CA is said to be a right CA (resp., left CA) if $u_1 > 0$ (resp., $u_s < 0$).

Theorem 6 ([1]) Any surjective and right (or left) CA is topologically mixing.

The previous result can be generalized to dimension greater than 1 in the following sense.

Theorem 7 If for a given surjective $D$-dimensional CA there exists a $(D-1)$-dimensional hyperplane $H$ (as a linear subspace of $\mathbb{Z}^D$) such that:

1. all the neighbor vectors stay on the same side of $H$, and
2. no vector lies on $H$,

then the CA is topologically mixing.

Proof Choose two configurations $c$ and $d$ and a natural number $r$. Let $U$ and $V$ be the two distinct open balls of radius $2^{-r}$ centered at $c$ and $d$, respectively (in a metric space $(X, d)$ the open ball of radius $\delta > 0$ and center $x \in X$ is the set $B_\delta(x) = \{ y \in X \mid d(y, x) < \delta \}$). For any integer $n > 1$, denote by $N_n$ the neighbor frame of the CA $F^n$ and by $c_n \in F^{-n}(c)$ any $n$-preimage of $c$ (which exists by surjectivity). The values $F^n(e)(x)$ for $x \in O$ depend only on the values $e(x)$ for $x \in O + N_n$, where $O = \{ v : \|v\|_\infty \le r \}$. By the hypothesis, there exists an integer $m > 0$ such that for any $n \ge m$ the sets $O$ and $O + N_n$ are disjoint. Therefore, for any $n \ge m$ one can build a configuration $e_n \in \mathcal{C}$ such that $e_n(x) = d(x)$ for $x \in O$, and $e_n(x) = c_n(x)$ for $x \in O + N_n$. Then, for any $n \ge m$, $e_n \in V$ and $F^n(e_n) \in U$. □

Injectivity prevents a CA from being strongly transitive, as stated in the following.

Theorem 8 ([11]) Any strongly transitive CA is not injective.

Recall that a CA with global rule $F$ is open if $F$ is an open function. Equivalently, in the one-dimensional case, every configuration has the same number of predecessors [33].

Theorem 9 ([55]) Openness is decidable in dimension one.

Remark that mixing CA are not necessarily open (consider, for instance, the elementary rule 106, see [44]). The


following conjecture is true when strong transitivity is replaced by expansivity [43].

Conjecture 4 Strongly transitive CA are open.

Recall that the shift map $\sigma : A^{\mathbb{Z}} \to A^{\mathbb{Z}}$ is the one-dimensional linear CA defined by the neighborhood $N = \{+1\}$ and by the coefficient $\lambda_1 = 1$. A configuration of a one-dimensional CA is called jointly periodic if it is periodic both for the CA and for the shift map (i.e., it is also spatially periodic). A CA is said to have the joint denseness of periodic orbits property (JDPO) if it admits a dense set of jointly periodic configurations. Obviously, JDPO is a stronger form of DPO.

Theorem 10 ([13]) Open CA have JDPO.

The common feeling is that (J)DPO holds for a class wider than open CA. Indeed:

Conjecture 5 ([8]) Every surjective CA has (J)DPO.
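Jointly periodic configurations can be exhibited on finite cyclic windows: every such configuration is spatially periodic, so a temporal repeat of the restricted dynamics yields a jointly periodic point. A cycle-detection sketch (illustrative, not from the article):

```python
def orbit_period(step, c):
    """Iterate a CA on a finite cyclic window until the orbit repeats.
    Returns (preperiod, period); preperiod 0 means c itself is periodic."""
    seen, t, cur = {}, 0, tuple(c)
    while cur not in seen:
        seen[cur] = t
        cur = tuple(step(cur))
        t += 1
    return seen[cur], t - seen[cur]

shift = lambda c: c[1:] + c[:1]   # the shift map sigma on a cyclic window
print(orbit_period(shift, (0, 1, 0, 1)))   # (0, 2): a jointly periodic point
```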

If this conjecture were true then, as a consequence of Theorem 5 and Proposition 2, DPO would be decidable in dimension one (and undecidable in greater dimensions). Up to now, Conjecture 5 has been proved true for some restricted classes of one-dimensional surjective CA besides open CA.

Theorem 11 ([8]) Almost equicontinuous surjective CA have JDPO.

Consider for a while a CA whose alphabet is an algebraic group. A configuration $c$ is said to be finite if there exists an integer $h$ such that $c(i) = 0$ for any $i$ with $|i| > h$, where $0$ is the null element of the group. Denote by $s(c) = \sum_i c(i)$ the sum of the values of a finite configuration. A one-dimensional CA $F$ is called number-conserving if for any finite configuration $c$, $s(c) = s(F(c))$.

Theorem 12 ([26]) Number-conserving surjective CA have DPO.

If a CA $F$ is number-conserving, then for any $h \in \mathbb{Z}$ the CA $\sigma^h \circ F$ is number-conserving. As a consequence we have the following.

Corollary 1 Number-conserving surjective CA have JDPO.

Proof Let $F$ be a number-conserving CA. Choose $h \in \mathbb{Z}$ in such a way that the CA $\sigma^h \circ F$ is a (number-conserving) right CA. By a result in [1], both the CA $\sigma^h \circ F$ and $F$ have JDPO. □

In a recent work it is proved that the problem of solving Conjecture 5 can be reduced to the study of mixing CA.

Theorem 13 ([1]) If all mixing CA have DPO then every surjective CA has JDPO.

As a consequence of Theorem 13, if all mixing CA have DPO then all transitive CA have DPO.

Permutivity is another easy-to-check combinatorial property strictly related to chaotic behavior.

Definition 16 (Permutive CA) A function $f : A^s \to A$ is permutive in the variable $a_i$ if for any $(a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_s) \in A^{s-1}$ the function $a \mapsto f(a_1, \ldots, a_{i-1}, a, a_{i+1}, \ldots, a_s)$ is a permutation.

In the one-dimensional case, a function $f$ that is permutive in the leftmost variable $a_1$ (resp., the rightmost variable $a_s$) is called leftmost (resp., rightmost) permutive. CA with either a leftmost or a rightmost permutive local rule share most of the chaos components.

Theorem 14 ([18]) Any one-dimensional CA based on a leftmost (or rightmost) permutive local rule with $u_1 < 0$ (resp., $u_s > 0$) is topologically mixing.

The previous result can be generalized to any dimension in the following sense.

Theorem 15 Let $f$ and $N$ be the local rule and the neighborhood frame, respectively, of a given $D$-dimensional CA. If there exists $i$ such that

1. $f$ is permutive in the variable $a_i$, and
2. the neighbor vector $u_i$ is such that $\|u_i\|_2 = \max\{ \|u\|_2 : u \in N \}$, and
3. all the coordinates of $u_i$ have absolute value $l$, for some integer $l > 0$,

then the given CA is topologically mixing.

Proof Without loss of generality, assume that $u_i = (l, \ldots, l)$. Let $U$ and $V$ be two distinct open balls of equal radius $2^{-r}$, where $r$ is an arbitrary natural number. For any integer $n \ge 1$, denote by $N_n$ the neighbor frame of the CA $F^n$, by $f_n$ the corresponding local rule, and by $N_n(x)$ the set $\{ x + v : v \in N_n \}$ for a given $x \in \mathbb{Z}^D$. Note that $f_n$ is permutive in the variable corresponding to the neighbor vector $n u_i \in N_n$. Choose two configurations $c \in U$ and $d \in V$. Let $m$ be the smallest natural number such that $ml > 2r$. For any $n \ge m$, we are going to build a configuration $d_n \in U$ such that $F^n(d_n) \in V$. Set $d_n(z) = c(z)$ for $z \in O$, where $O = \{ v : \|v\|_\infty \le r \}$. In this way $d_n \in U$. In order to obtain $F^n(d_n) \in V$, it is required that $F^n(d_n)(x) = d(x)$ for each $x \in O$. We complete the configuration $d_n$ by starting with $x = y$, where $y = (-r, \ldots, -r)$. Choose arbitrarily the values $d_n(z)$ for $z \in (N_n(y) \setminus O) \setminus \{ y + n u_i \}$ (note that $O \subseteq N_n(y)$). By permutivity of $f_n$, there exists $a \in A$ such that if one sets $d_n(y + n u_i) = a$, then


$F^n(d_n)(y) = d(y)$. Let now $x = y + e_1$. Choose arbitrarily the values $d_n(z)$ for $z \in (N_n(x) \setminus N_n(y)) \setminus \{ x + n u_i \}$. By the same argument as above, there exists $a \in A$ such that if one sets $d_n(x + n u_i) = a$, then $F^n(d_n)(x) = d(x)$. Proceeding in this way one can complete $d_n$ so as to obtain $F^n(d_n) \in V$. □

Theorem 16 ([16]) Any one-dimensional CA based on a leftmost (or rightmost) permutive local rule with $u_1 < 0$ (resp., $u_s > 0$) has (J)DPO.

Theorem 17 ([54]) Any one-dimensional CA based on a leftmost and rightmost permutive local rule with $u_1 < 0$ and $u_s > 0$ is expansive.

As a consequence of Proposition 5, we have the following result, for which we give a direct proof in order to make clearer the result that follows immediately after.

Proposition 6 Any one-dimensional CA based on a leftmost and rightmost permutive local rule with $u_1 < 0$ and $u_s > 0$ is strongly transitive.

Proof Choose arbitrarily two configurations $c, o \in A^{\mathbb{Z}}$ and an integer $k > 0$. Let $n > 0$ be the first integer such that $nr > k$, where $r = \max\{ -u_1, u_s \}$. We are going to construct a configuration $b \in A^{\mathbb{Z}}$ such that $d(b, c) < 2^{-k}$ and $F^n(b) = o$. Fix $b(x) = c(x)$ for each $x = n u_1, \ldots, n u_s - 1$. In this way $d(b, c) < 2^{-k}$. For each $i \in \mathbb{N}$ we are going to find suitable values $b(n u_s + i)$ in order to obtain $F^n(b)(i) = o(i)$. Let us start with $i = 0$. By the hypothesis, the local rule $f_n$ of the CA $F^n$ is permutive in the rightmost variable, corresponding to the neighbor $n u_s$. Thus, there exists a value $a_0 \in A$ such that, if one sets $b(n u_s) = a_0$, we obtain $F^n(b)(0) = o(0)$. By the same reasoning, there exists a value $a_1 \in A$ such that, if one sets $b(n u_s + 1) = a_1$, we obtain $F^n(b)(1) = o(1)$. Proceeding in this way one can complete the configuration $b$ at every position $n u_s + i$. Finally, since $f_n$ is also permutive in the leftmost variable, corresponding to $n u_1$, one can use the same technique to complete the configuration $b$ at the positions $n u_1 - 1$, $n u_1 - 2, \ldots$, in such a way that for any integer $i < 0$, $F^n(b)(i) = o(i)$. □

The previous result can be generalized as follows. Denote by $e_1, e_2, \ldots, e_D$ the canonical basis of $\mathbb{R}^D$.

Theorem 18 Let $f$ and $N$ be the local rule and the neighborhood frame, respectively, of a given $D$-dimensional CA. If there exists an integer $l > 0$ such that

1. $f$ is permutive in all the $2^D$ variables corresponding to the neighbor vectors $(\pm l, \ldots, \pm l)$, and
2. for each vector $u \in N$, we have $\|u\|_\infty \le l$,

then the CA $F$ is strongly transitive.

Proof For the sake of simplicity, we only treat the case $D = 2$; for higher dimensions the idea of the proof is the same. Let $u_2 = (l, l)$, $u_3 = (-l, l)$, $u_4 = (-l, -l)$, $u_5 = (l, -l)$. Choose arbitrarily two configurations $c, o \in A^{\mathbb{Z}^2}$ and an integer $k > 0$. Let $n > 0$ be the first integer such that $nl > k$. We are going to construct a configuration $b \in A^{\mathbb{Z}^2}$ such that $d(b, c) < 2^{-k}$ and $F^n(b) = o$. Fix $b(x) = c(x)$ for each $x \neq n u_2$ with $\|x\|_\infty \le nl$. In this way $d(b, c) < 2^{-k}$. For each $i \in \mathbb{Z}$ we are going to find suitable values for the configuration $b$ in order to obtain $F^n(b)(i e_1) = o(i e_1)$. Let us start with $i = 0$. By the hypothesis, the local rule $f_n$ of the CA $F^n$ is permutive in the variable corresponding to $n u_2$. Thus, there exists a value $a_{(0,0)} \in A$ such that, if one sets $b(n u_2) = a_{(0,0)}$, we obtain $F^n(b)(0) = o(0)$. Now, choose arbitrary values of $b$ in the positions $(n+1) e_1 + j e_2$ for $j = -n, \ldots, n-1$. By the same reasoning, there exists a value $a_{(0,1)} \in A$ such that, if one sets $b(n u_2 + e_1) = a_{(0,1)}$, we obtain $F^n(b)(e_1) = o(e_1)$. Proceeding in this way, at each step $i$ ($i > 1$) one can complete the configuration $b$ at all the positions $(n+i) e_1 + j e_2$ for $j = -n, \ldots, n$, obtaining $F^n(b)(i e_1) = o(i e_1)$. In a similar way, by using the fact that the local rule $f_n$ of the CA $F^n$ is permutive in the variable corresponding to $n u_3$, for any $i < 0$ one can complete the configuration $b$ at all the positions $(-n+i) e_1 + j e_2$ for $j = -n, \ldots, n$, obtaining $F^n(b)(i e_1) = o(i e_1)$. Now, for each step $j = 1, 2, \ldots$, choose arbitrarily the values of $b$ in the positions $i e_1 + (n+j) e_2$ and $i e_1 - (n+j) e_2$ with $i = -n, \ldots, n-1$. The permutivity of $f_n$ in the variables corresponding to $n u_2$, $n u_3$, $n u_5$ and $n u_4$ permits one to complete the configuration $b$ in the positions $(n+i) e_1 + (n+j) e_2$ for all integers $i \ge 0$, $(-n+i) e_1 + (n+j) e_2$ for all integers $i < 0$, $(n+i) e_1 - (n+j) e_2$ for all integers $i \ge 0$, and $(-n+i) e_1 - (n+j) e_2$ for all integers $i < 0$, so that for each step $j$ we obtain $\forall i \in \mathbb{Z}$, $F^n(b)(i e_1 \pm j e_2) = o(i e_1 \pm j e_2)$. □

CA, Entropy and Decidability

In [34], it is shown that, in the case of CA, the definition of topological entropy can be restated in a simpler form than (1). The space-time diagram $S(c)$ generated by a configuration $c$ of a $D$-dimensional CA is the $(D+1)$-dimensional infinite figure obtained by drawing in sequence the elements of the orbit of initial state $c$ along the temporal axis. Formally, $S(c)$ is the function from $\mathbb{N} \times \mathbb{Z}^D$ to $A$ defined as $\forall t \in \mathbb{N}$, $\forall v \in \mathbb{Z}^D$, $S(c)(t, v) = F^t(c)(v)$. For a given CA, fix a time $t$ and a finite square region of side length $k$ in the lattice. In this way, a finite $(D+1)$-dimensional figure (hyper-rectangle) is identified in all space-time diagrams.
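The number of distinct space-time blocks of width $k$ and depth $t$ (the quantity $N(k, t)$ introduced below) can be sampled empirically for a one-dimensional CA. Since only finitely many initial configurations are tried, the sketch below (names illustrative) yields only a lower bound on $N(k, t)$:

```python
from itertools import product

def count_blocks(step, configs, k, t):
    """Count distinct k-wide, t-deep space-time blocks over the sampled
    initial configurations (a lower bound on N(k, t))."""
    blocks = set()
    for c in configs:
        rows, cur = [], list(c)
        for _ in range(t):
            rows.append(tuple(cur[:k]))
            cur = list(step(cur))
        blocks.add(tuple(rows))
    return len(blocks)

shift = lambda c: c[1:] + c[:1]   # the shift CA on a cyclic window
print(count_blocks(shift, product((0, 1), repeat=4), 2, 2))   # 8
```

For the shift map every $2 \times 2$ block is determined by three free cells, which gives the 8 distinct blocks counted above.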


Chaotic Behavior of Cellular Automata, Figure 1
$N(k, t)$ is the number of distinct blue blocks that can be obtained starting from any initial configuration (orange plane)

Let $N(k, t)$ be the number of distinct finite hyper-rectangles obtained from all possible space-time diagrams of the CA (i.e., $N(k, t)$ is the number of space-time diagrams which are distinct in this finite region). The topological entropy of any given CA can then be expressed as

$$h(\mathcal{C}, F) = \lim_{k \to \infty} \lim_{t \to \infty} \frac{\log N(k, t)}{t} .$$

Although this expression of the CA entropy is simpler than the one for a generic DTDS, the following result holds.

Theorem 19 ([34]) The topological entropy of CA is uncomputable.

Nevertheless, there exist some classes of CA for which it is computable [20,45]. Unfortunately, in most of these cases it is difficult to establish whether a given CA is a member of these classes.

Results for Linear CA: Everything Is Detectable

In the sequel, we assume that a linear CA on $\mathcal{C}_m$ is based on a neighborhood frame $N = \{u_1, \ldots, u_s\}$ whose corresponding coefficients of the local rule are $\lambda_1, \ldots, \lambda_s$. Moreover, without loss of generality, we suppose $u_1 = 0$. In most formulas the coefficient $\lambda_1$ does not appear.

Decidability Results for Chaotic Properties

The next results state that all the chaotic properties introduced above are decidable for linear CA. Moreover, one can use the formulas to build examples of cellular automata having the required properties.

Theorem 20 ([17,19,47])

Sensitivity: a linear CA is sensitive to the initial conditions if there exists a prime number $p$ such that $p \mid m$ and $p \nmid \gcd\{\lambda_2, \ldots, \lambda_s\}$.

Transitivity: a linear CA is topologically transitive if and only if $\gcd\{\lambda_2, \ldots, \lambda_s\} = 1$.

Mixing: a linear CA is topologically mixing if and only if it is topologically transitive.

Strong transitivity: a linear CA is strongly transitive if for each prime $p$ dividing $m$ there exist at least two coefficients $\lambda_i, \lambda_j$ such that $p \nmid \lambda_i$ and $p \nmid \lambda_j$.

Regularity (DPO): a linear CA has denseness of periodic orbits if it is surjective.

Concerning positive expansivity: since in dimension greater than one there are no positively expansive CA, the following theorem characterizes expansivity for linear CA in dimension one. For this situation we consider a local rule $f$ with expression $f(x_{-r}, \ldots, x_r) = \left[ \sum_{i=-r}^{r} a_i x_i \right]_m$.

Theorem 21 ([47]) A linear one-dimensional CA is positively expansive if and only if $\gcd\{m, a_{-r}, \ldots, a_{-1}\} = 1$ and $\gcd\{m, a_1, \ldots, a_r\} = 1$.

Decidability Results for Other Properties

The next result was stated incompletely in [47], since the case of non-sensitive CA without equicontinuity points was not treated, though such CA exist [57].

Theorem 22 Let $F$ be a linear cellular automaton. Then the following properties are equivalent:

1. $F$ is equicontinuous;
2. $F$ has an equicontinuity point;
3. $F$ is not sensitive;
4. for every prime $p$ such that $p \mid m$, $p$ divides $\gcd\{\lambda_2, \ldots, \lambda_s\}$.

Proof 1) ⟹ 2) and 2) ⟹ 3) are obvious. 3) ⟹ 4) follows by negating the formula for sensitive CA in Theorem 20. Let us prove that 4) ⟹ 1). Suppose that $F$ is a linear CA. We decompose $F = G + H$ by separating the term in $\lambda_1$ from the others:

$$H(x)(v) = \lambda_1 x(v), \qquad G(x)(v) = \left[ \sum_{i=2}^{s} \lambda_i x(v + u_i) \right]_m .$$

Let $m = p_1^{\alpha_1} \cdots p_l^{\alpha_l}$ be the decomposition of $m$ into prime factors and let $a = \operatorname{lcm}\{\alpha_i\}$. Condition 4) gives that for all $k$, $p_k$ divides each $\lambda_i$ ($i \ge 2$); hence $m$ divides any product of $a$ factors $\prod_{i=1}^{a} \lambda_{k_i}$. Let $\bar{v}$ be a vector such that for all $i$, $u_i - \bar{v}$ has nonnegative coordinates. Classically, we represent local rules of linear CA by $D$-variable polynomials (this representation, together with the representation of configurations by formal power series, allows one to simplify the computation of images under the iterates of the CA [36]). Let $X_1, \ldots, X_D$


be the variables. For $y = (y_1, \ldots, y_D) \in \mathbb{Z}^D$, we denote by $X^y$ the monomial $\prod_{i=1}^{D} X_i^{y_i}$. We consider the polynomial $P$ associated with $G$ combined with a translation of vector $\bar{v}$: $P = \sum_{i=2}^{s} \lambda_i X^{u_i - \bar{v}}$. The coefficients of $P^a$ are products of $a$ factors $\lambda_i$, hence $[P^a]_m = 0$. This means that the composition of $G$ and the translation of vector $\bar{v}$ is nilpotent, and therefore that $G$ is nilpotent. As $F$ is the sum of $\lambda_1$ times the identity and a nilpotent CA, we conclude that $F$ is equicontinuous. □

The next theorem gives the formulas for some combinatorial properties.

Theorem 23 ([36])

Surjectivity: a linear CA is surjective if and only if $\gcd\{\lambda_1, \ldots, \lambda_s\} = 1$.

Injectivity: a linear CA is injective if and only if for each prime $p$ dividing $m$ there exists a unique coefficient $\lambda_i$ such that $p$ does not divide $\lambda_i$.

Computation of Entropy for Linear Cellular Automata

Let us start by considering the one-dimensional case.

Theorem 24 Consider a one-dimensional linear CA and let $m = p_1^{k_1} \cdots p_h^{k_h}$ be the prime factor decomposition of $m$. The topological entropy of the CA is

$$h(\mathcal{C}, F) = \sum_{i=1}^{h} k_i (R_i - L_i) \log(p_i)$$

where $L_i = \min P_i$ and $R_i = \max P_i$, with $P_i = \{0\} \cup \{ j : \gcd(a_j, p_i) = 1 \}$.

In [50] it is proved that for dimensions greater than one there are only two possible values for the topological entropy: zero or infinity.

Theorem 25 A $D$-dimensional linear CA $\langle \mathcal{C}, F \rangle$ with $D \ge 2$ is either sensitive with $h(\mathcal{C}, F) = \infty$ or equicontinuous with $h(\mathcal{C}, F) = 0$.

By combining Theorems 25 and 20, one can establish whether a $D$-dimensional linear CA with $D \ge 2$ has zero or infinite entropy.

Linear CA, Fractal Dimension and Chaos

In this section we review the relations between strong transitivity and fractal dimension in the special case of linear CA. The idea is that when a system is chaotic, it produces evolutions which are complex even from a (topological) dimension point of view. Any linear CA $F$ can be associated with its W-limit set, a subset of the $(D+1)$-dimensional Euclidean space defined

as follows. Let $t_n$ be a sequence of integers (we call them times) which tends to infinity. A subset $S_F(t_n)$ of the $(D+1)$-dimensional Euclidean space represents the space-time pattern up to time $t_n - 1$:

$$S_F(t_n) = \{ (t, i) : F^t(e_1)_i \neq 0, \ t < t_n \} .$$

A W-limit set for $F$ is defined by $\lim_{n \to \infty} S_F(t_n)/t_n$, if the limit exists, where $S_F(t_n)/t_n$ is the set $S_F(t_n)$ contracted by the rate $\frac{1}{t_n}$, i.e., $S_F(t_n)/t_n$ contains the point $(t/t_n, i/t_n)$ if and only if $S_F(t_n)$ contains the point $(t, i)$. The limit $\lim_{n \to \infty} S_F(t_n)/t_n$ exists when $\liminf_{n \to \infty} S_F(t_n)/t_n$ and $\limsup_{n \to \infty} S_F(t_n)/t_n$ coincide, where

$$\liminf_{n \to \infty} \frac{S_F(t_n)}{t_n} = \left\{ x \in \mathbb{R}^{D+1} : \forall j, \ \exists x_j \in \frac{S_F(t_j)}{t_j}, \ x_j \to x \text{ when } j \to \infty \right\}$$

and

$$\limsup_{n \to \infty} \frac{S_F(t_n)}{t_n} = \left\{ x \in \mathbb{R}^{D+1} : \exists \{t_{n_j}\}, \ \forall j, \ \exists x_{n_j} \in \frac{S_F(t_{n_j})}{t_{n_j}}, \ x_{n_j} \to x \text{ when } j \to \infty \right\}$$

for a subsequence $\{t_{n_j}\}$ of $\{t_n\}$.

For the particular case of linear CA, the W-limit set always exists [30,31,32,56]. In the last ten years, the W-limit set of additive CA has been extensively studied [61,62,63]. It has been proved that for most additive CA it has interesting dimensional properties which completely characterize the set of quiescent configurations [56]. Here we link dimensional properties of the W-limit set with chaotic properties. Correlating dimensional properties of invariant sets with dynamical properties has become over the years a fruitful source of new understanding [52].

Let $X$ be a metric space. The Hausdorff dimension $D_H$ of $V \subseteq X$ is defined as

$$D_H(V) = \sup\left\{ h \in \mathbb{R} \ \middle|\ \lim_{\epsilon \to 0} \inf \sum_i |U_i|^h = \infty \right\}$$

where the infimum is taken over all countable coverings $\{U_i\}$ of $V$ such that the diameter $|U_i|$ of each $U_i$ is less than $\epsilon$ (for more on the Hausdorff dimension, as well as other definitions of fractal dimension, see [24]). Given a CA $F$, we denote by $D_H(F)$ the Hausdorff dimension of its W-limit set.

Proposition 7 ([25,47]) Consider a linear CA $F$ over $\mathbb{Z}_{p^k}$, where $p$ is a prime number. If $1 < D_H(F) < 2$ then $F$ is strongly transitive.
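Stepping back to the decidability results: the gcd-based criteria of Theorems 20, 22 and 23 and the entropy formula of Theorem 24 are directly machine-checkable for one-dimensional linear CA. A sketch follows (the encoding of the local rule as an offset-to-coefficient map and the function names are illustrative; whether $\lambda_1$ participates in the strong-transitivity count is an assumption here):

```python
from functools import reduce
from math import gcd, log

def prime_factorization(m):
    """Return [(p, k), ...] with m equal to the product of the p**k."""
    out, d = [], 2
    while d * d <= m:
        k = 0
        while m % d == 0:
            m //= d
            k += 1
        if k:
            out.append((d, k))
        d += 1
    if m > 1:
        out.append((m, 1))
    return out

def classify(m, coeffs):
    """Apply the gcd criteria to a 1-D linear CA over Z_m.
    coeffs maps each neighbor offset j to its coefficient a_j;
    offset 0 plays the role of lambda_1 (u_1 = 0)."""
    others = [a for j, a in coeffs.items() if j != 0]
    g_others = reduce(gcd, others, 0)
    g_all = reduce(gcd, list(coeffs.values()), 0)
    primes = [p for p, _ in prime_factorization(m)]
    not_div = lambda p: sum(1 for a in coeffs.values() if a % p != 0)
    return {
        "sensitive": any(g_others % p != 0 for p in primes),   # Theorems 20/22
        "transitive": g_others == 1,                           # also: mixing
        "strongly_transitive": all(not_div(p) >= 2 for p in primes),
        "surjective": g_all == 1,                              # Theorem 23
        "injective": all(not_div(p) == 1 for p in primes),     # Theorem 23
    }

def entropy(m, coeffs):
    """Topological entropy via Theorem 24 (natural logarithm)."""
    h = 0.0
    for p, k in prime_factorization(m):
        P = {0} | {j for j, a in coeffs.items() if gcd(a, p) == 1}
        h += k * (max(P) - min(P)) * log(p)
    return h

rule90 = {-1: 1, 0: 0, 1: 1}   # elementary rule 90, linear over Z_2
print(classify(2, rule90)["sensitive"], entropy(2, rule90))
```

For rule 90 the classifier reports a sensitive, transitive, surjective, non-injective CA, and the entropy formula gives $2 \log 2$, in agreement with the range $R_1 - L_1 = 2$ of its nonzero coefficients.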


The converse relation is still an open problem. It would also be an interesting research direction to find similar notions and results for general CA.

Conjecture 6 Consider a linear CA $F$ over $\mathbb{Z}_{p^k}$, where $p$ is a prime number. If $F$ is strongly transitive then $1 < D_H(F) < 2$.

Future Directions

In this chapter we reviewed the chaotic behavior of cellular automata. It is clear from the results seen so far that there are close similarities between the chaotic behavior of dynamical systems on the real interval and that of CA. To complete the picture, it remains only to prove (or disprove) Conjecture 5. Due to its apparent difficulty, this problem promises to keep researchers occupied for some years yet. The study of the decidability of chaotic properties like expansivity, transitivity, mixing, etc. is another research direction which should be further addressed in the near future. It seems that new ideas are necessary, since the proof techniques used up to now have proved unsuccessful. The solution to these problems will be a source of new understanding and will certainly produce new results in connected fields. Finally, remark that most of the results on the chaotic behavior of CA concern dimension one. A lot of work remains to be done to verify what happens in higher dimensions.

Acknowledgments

This work has been supported by the Interlink/MIUR project "Cellular Automata: Topological Properties, Chaos and Associated Formal Languages", by the ANR Blanc project "Sycomore" and by the PRIN/MIUR project "Formal Languages and Automata: Mathematical and Applicative Aspects".

Bibliography

Primary Literature

1. Acerbi L, Dennunzio A, Formenti E (2007) Shifting and lifting of cellular automata. In: Third Conference on Computability in Europe, CiE 2007, Siena, Italy, 18–23 June 2007. Lecture Notes in Computer Science, vol 4497. Springer, Berlin, pp 1–10
2. Adler R, Konheim A, McAndrew J (1965) Topological entropy. Trans Amer Math Soc 114:309–319
3.
Akin E, Auslander E, Berg K (1996) When is a transitive map chaotic? In: Bergelson V, March P, Rosenblatt J (eds) Convergence in Ergodic Theory and Probability. de Gruyter, Berlin, pp 25–40

4. Amoroso S, Patt YN (1972) Decision procedures for surjectivity and injectivity of parallel maps for tessellation structures. J Comp Syst Sci 6:448–464
5. Auslander J, Yorke JA (1980) Interval maps, factors of maps and chaos. Tohoku Math J 32:177–188
6. Banks J, Brooks J, Cairns G, Davis G, Stacey P (1992) On Devaney's definition of chaos. Am Math Mon 99:332–334
7. Blanchard F, Maass A (1997) Dynamical properties of expansive one-sided cellular automata. Israel J Math 99:149–174
8. Blanchard F, Tisseur P (2000) Some properties of cellular automata with equicontinuity points. Ann Inst Henri Poincaré, Probabilités et Statistiques 36:569–582
9. Blanchard F, Kůrka P, Maass A (1997) Topological and measure-theoretic properties of one-dimensional cellular automata. Physica D 103:86–99
10. Blanchard F, Formenti E, Kůrka P (1998) Cellular automata in the Cantor, Besicovitch and Weyl topological spaces. Complex Syst 11:107–123
11. Blanchard F, Cervelle J, Formenti E (2005) Some results about chaotic behavior of cellular automata. Theor Comp Sci 349:318–336
12. Blanchard F, Glasner E, Kolyada S, Maass A (2002) On Li–Yorke pairs. J Reine Angew Math 547:51–68
13. Boyle M, Kitchens B (1999) Periodic points for cellular automata. Indag Math 10:483–493
14. Boyle M, Maass A (2000) Expansive invertible one-sided cellular automata. J Math Soc Jpn 54(4):725–740
15. Cattaneo G, Formenti E, Margara L, Mazoyer J (1997) A shift-invariant metric on S^Z inducing a non-trivial topology. In: Mathematical Foundations of Computer Science 1997. Lecture Notes in Computer Science, vol 1295. Springer, Berlin, pp 179–188
16. Cattaneo G, Finelli M, Margara L (2000) Investigating topological chaos by elementary cellular automata dynamics. Theor Comp Sci 244:219–241
17. Cattaneo G, Formenti E, Manzini G, Margara L (2000) Ergodicity, transitivity, and regularity for linear cellular automata. Theor Comp Sci 233:147–164. A preliminary version of this paper was presented at the Symposium on Theoretical Aspects of Computer Science (STACS'97). LNCS, vol 1200
18. Cattaneo G, Dennunzio A, Margara L (2002) Chaotic subshifts and related languages: applications to one-dimensional cellular automata. Fundam Inform 52:39–80
19. Cattaneo G, Dennunzio A, Margara L (2004) Solution of some conjectures about topological properties of linear cellular automata. Theor Comp Sci 325:249–271
20. D'Amico M, Manzini G, Margara L (2003) On computing the entropy of cellular automata. Theor Comp Sci 290:1629–1646
21. Denker M, Grillenberger C, Sigmund K (1976) Ergodic Theory on Compact Spaces. Lecture Notes in Mathematics, vol 527. Springer, Berlin
22. Devaney RL (1989) An Introduction to Chaotic Dynamical Systems, 2nd edn. Addison-Wesley, Reading
23. Durand B, Formenti E, Varouchas G (2003) On undecidability of equicontinuity classification for cellular automata. Discrete Mathematics and Theoretical Computer Science, vol AB, pp 117–128
24. Edgar GA (1990) Measure, Topology and Fractal Geometry. Undergraduate Texts in Mathematics. Springer, New York
25. Formenti E (2003) On the sensitivity of additive cellular automata in Besicovitch topologies. Theor Comp Sci 301(1–3):341–354


26. Formenti E, Grange A (2003) Number conserving cellular automata II: dynamics. Theor Comp Sci 304(1–3):269–290
27. Furstenberg H (1967) Disjointness in ergodic theory, minimal sets, and a problem in Diophantine approximation. Math Syst Theory (now Theory Comput Syst) 1(1):1–49
28. Glasner E, Weiss B (1993) Sensitive dependence on initial conditions. Nonlinearity 6:1067–1075
29. Guckenheimer J (1979) Sensitive dependence to initial conditions for one-dimensional maps. Commun Math Phys 70:133–160
30. Haeseler FV, Peitgen HO, Skordev G (1992) Linear cellular automata, substitutions, hierarchical iterated systems. In: Fractal Geometry and Computer Graphics. Springer, Berlin
31. Haeseler FV, Peitgen HO, Skordev G (1993) Multifractal decompositions of rescaled evolution sets of equivariant cellular automata: selected examples. Technical report, Institut für Dynamische Systeme, Universität Bremen
32. Haeseler FV, Peitgen HO, Skordev G (1995) Global analysis of self-similarity features of cellular automata: selected examples. Physica D 86:64–80
33. Hedlund GA (1969) Endomorphisms and automorphisms of the shift dynamical system. Math Syst Theory 3:320–375
34. Hurd LP, Kari J, Culik K (1992) The topological entropy of cellular automata is uncomputable. Ergod Theory Dyn Syst 12:255–265
35. Hurley M (1990) Ergodic aspects of cellular automata. Ergod Theory Dyn Syst 10:671–685
36. Ito M, Osato N, Nasu M (1983) Linear cellular automata over Z_m. J Comp Syst Sci 27:127–140
37. Assaf D IV, Gadbois S (1992) Definition of chaos. Am Math Mon 99:865
38. Kannan V, Nagar A (2002) Topological transitivity for discrete dynamical systems. In: Misra JC (ed) Applicable Mathematics in the Golden Age. Narosa, New Delhi
39. Kari J (1994) Reversibility and surjectivity problems of cellular automata. J Comp Syst Sci 48:149–182
40. Kari J (1994) Rice's theorem for the limit set of cellular automata. Theor Comp Sci 127(2):229–254
41. Knudsen C (1994) Chaos without nonperiodicity. Am Math Mon 101:563–565
42. Kolyada S, Snoha L (1997) Some aspects of topological transitivity – a survey. Grazer Mathematische Berichte 334:3–35
43. Kůrka P (1997) Languages, equicontinuity and attractors in cellular automata. Ergod Theory Dyn Syst 17:417–433
44. Kůrka P (2004) Topological and Symbolic Dynamics. Cours Spécialisés, vol 11. Société Mathématique de France, Paris
45. Di Lena P (2006) Decidable properties for regular cellular automata. In: Navarro G, Bertossi L, Kohayakawa Y (eds) Proceedings of the Fourth IFIP International Conference on Theoretical Computer Science. Springer, Santiago de Chile, pp 185–196
46. Li TY, Yorke JA (1975) Period three implies chaos. Am Math Mon 82:985–992
47. Manzini G, Margara L (1999) A complete and efficiently computable topological classification of D-dimensional linear cellular automata over Z_m. Theor Comp Sci 221(1–2):157–177
48. Margara L (1999) On some topological properties of linear cellular automata. In: Kutylowski M, Pacholski L, Wierzbicki T (eds) Mathematical Foundations of Computer Science 1999 (MFCS'99). Lecture Notes in Computer Science, vol 1672. Springer, Berlin, pp 209–219
49. Moothathu TKS (2005) Homogeneity of surjective cellular automata. Discret Contin Dyn Syst 13:195–202
50. Morris G, Ward T (1998) Entropy bounds for endomorphisms commuting with Z^k actions. Israel J Math 106:1–12
51. Nasu M (1995) Textile Systems for Endomorphisms and Automorphisms of the Shift. Memoirs of the American Mathematical Society, vol 114. American Mathematical Society, Providence
52. Pesin YK (1997) Dimension Theory in Dynamical Systems. Chicago Lectures in Mathematics. The University of Chicago Press, Chicago
53. Shereshevsky MA (1993) Expansiveness, entropy and polynomial growth for groups acting on subshifts by automorphisms. Indag Math 4:203–210
54. Shereshevsky MA, Afraimovich VS (1993) Bipermutative cellular automata are topologically conjugate to the one-sided Bernoulli shift. Random Comput Dynam 1(1):91–98
55. Sutner K (1999) Linear cellular automata and de Bruijn automata. In: Delorme M, Mazoyer J (eds) Cellular Automata: A Parallel Model. Mathematics and Its Applications, vol 460. Kluwer, Dordrecht
56. Takahashi S (1992) Self-similarity of linear cellular automata. J Comp Syst Sci 44:114–140
57. Theyssier G (2007) Personal communication
58. Vellekoop M, Berglund R (1994) On intervals, transitivity = chaos. Am Math Mon 101:353–355
59. Walters P (1982) An Introduction to Ergodic Theory. Springer, Berlin
60. Weiss B (1971) Topological transitivity and ergodic measures. Math Syst Theory 5:71–75
61. Willson S (1984) Growth rates and fractional dimensions in cellular automata. Physica D 10:69–74
62. Willson S (1987) Computing fractal dimensions for additive cellular automata. Physica D 24:190–206
63. Willson S (1987) The equality of fractional dimensions for certain cellular automata. Physica D 24:179–189
64. Wolfram S (1986) Theory and Applications of Cellular Automata. World Scientific, Singapore

Books and Reviews Akin E (1993) The general topology of dynamical systems. Graduate Stud. Math 1 Am Math Soc, Providence Akin E, Kolyada S (2003) Li-Yorke sensitivity. Nonlinearity 16:1421– 1433 Block LS, Coppel WA (1992) Dynamics in One Dymension. Springer, Berlin Katok A, Hasselblatt B (1995) Introduction to the Modern Theory of Dynamical Systems. Cambridge University Press, Cambridge Kitchens PB (1997) Symbolic dynamics: One-Sided, Two-Sided and Countable State Markov Shifts. Universitext. Springer, Berlin Kolyada SF (2004) Li-yorke sensitivity and other concepts of chaos. Ukr Math J 56(8):1242–1257 Lind D, Marcus B (1995) An Introduction to Symbolic Dynamics and Coding. Cambidge University Press, Cambidge



Community Structure in Graphs
SANTO FORTUNATO 1, CLAUDIO CASTELLANO 2
1 Complex Networks Lagrange Laboratory (CNLL), ISI Foundation, Torino, Italy
2 SMC, INFM-CNR and Dipartimento di Fisica, “Sapienza” Università di Roma, Roma, Italy

Article Outline

Glossary
Definition of the Subject
Introduction
Elements of Community Detection
Computer Science: Graph Partitioning
Social Science: Hierarchical and k-Means Clustering
New Methods
Testing Methods
The Mesoscopic Description of a Graph
Future Directions
Bibliography

Glossary

Graph A graph is a set of elements, called vertices or nodes, where pairs of vertices are connected by relational links, or edges. A graph can be considered as the simplest representation of a complex system, where the vertices are the elementary units of the system and the edges represent their mutual interactions.
Community A community is a group of graph vertices that “belong together” according to some precisely defined criteria which can be measured. Many definitions have been proposed. A common approach is to define a community as a group of vertices such that the density of edges between vertices of the group is higher than the average edge density in the graph. In the text, the terms module and cluster are also used when referring to a community.
Partition A partition is a split of a graph into subsets, with each vertex assigned to only one of them. This last condition may be relaxed to include the case of overlapping communities, imposing that each vertex is assigned to at least one subset.
Dendrogram A dendrogram, or hierarchical tree, is a branching diagram representing successive divisions of a graph into communities. Dendrograms are frequently used in social network analysis and computational biology, especially in biological taxonomy.
Scalability Scalability expresses the computational complexity of an algorithm. If the running time of a community detection algorithm, working on a graph with n vertices and m edges, is proportional to the product n^α m^β, one says that the algorithm scales as O(n^α m^β). Knowing the scalability allows one to estimate the range of applicability of an algorithm.

Definition of the Subject

Graph vertices are often organized into groups that seem to live fairly independently of the rest of the graph, with which they share but a few edges, whereas the relationships between group members are stronger, as shown by the large number of mutual connections. Such groups of vertices, or communities, can be considered as independent compartments of a graph. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. The task is very hard, though, both conceptually, due to the ambiguity in the definition of community and in the discrimination of different partitions, and practically, because algorithms must find “good” partitions among an exponentially large number of them. Other complications are represented by the possible occurrence of hierarchies, i. e. communities which are nested inside larger communities, and by the existence of overlaps between communities, due to the presence of nodes belonging to several groups. All these aspects are dealt with in some detail and many methods are described, from traditional approaches used in computer science and sociology to recent techniques developed mostly within statistical physics.

Introduction

The origin of graph theory dates back to Euler’s solution [1] of the puzzle of Königsberg’s bridges in 1736. Since then a lot has been learned about graphs and their mathematical properties [2]. In the 20th century they have also become extremely useful as representation of a wide variety of systems in different areas.
Biological, social, technological, and information networks can be studied as graphs, and graph analysis has become crucial to understand the features of these systems. For instance, social network analysis started in the 1930s and has become one of the most important topics in sociology [3,4]. In recent times, the computer revolution has provided scholars with a huge amount of data and computational resources to process and analyze these data. The size of real networks one can potentially handle has also grown considerably, reaching millions or even billions of vertices. The need to deal with such a large number of units has produced a deep change in the way graphs are approached [5,6,7,8,9].


Community Structure in Graphs, Figure 1 A simple graph with three communities, highlighted by the dashed circles

Real networks are not random graphs. The random graph, introduced by P. Erdös and A. Rényi [10], is the paradigm of a disordered graph: in it, the probability of having an edge between a pair of vertices is equal for all possible pairs. In a random graph, the distribution of edges among the vertices is highly homogeneous. For instance, the distribution of the number of neighbors of a vertex, or degree, is binomial, so most vertices have equal or similar degree. In many real networks, instead, there are big inhomogeneities, revealing a high level of order and organization. The degree distribution is broad, with a tail that often follows a power law: therefore, many vertices with low degree coexist with some vertices with large degree. Furthermore, the distribution of edges is not only globally, but also locally inhomogeneous, with high concentrations of edges within special groups of nodes, and low concentrations between these groups. This feature of real networks is called community structure and is the topic of this chapter. In Fig. 1 a schematic example of a graph with community structure is shown. Communities are groups of vertices which probably share common properties and/or play similar roles within the graph. So, communities may correspond to groups of pages of the World Wide Web dealing with related topics [11], to functional modules such as cycles and pathways in metabolic networks [12,13], to groups of related individuals in social networks [14,15], to compartments in food webs [16,17], and so on.
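The homogeneity of the random-graph benchmark is easy to probe numerically. The sketch below (parameter values are arbitrary, chosen only for illustration) samples an Erdös–Rényi graph with numpy and verifies that the vertex degrees concentrate around their common mean, leaving no room for dense subgroups:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.01          # illustrative size and edge probability

# Erdos-Renyi G(n, p): every vertex pair is linked independently with probability p
upper = np.triu(rng.random((n, n)) < p, k=1).astype(int)
A = upper + upper.T        # symmetric adjacency matrix, zero diagonal

degrees = A.sum(axis=1)

# Binomial degree distribution: mean (n-1)p, standard deviation sqrt((n-1)p(1-p))
print(degrees.mean(), degrees.std())
```

With these parameters the mean degree is about 10 and the standard deviation about 3, so almost all vertices have similar degree; the broad, power-law-like distributions of real networks would show a far larger spread.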

Community detection is important for other reasons, too. Identifying modules and their boundaries allows for a classification of vertices, according to their topological position in the modules. So, vertices with a central position in their clusters, i. e. sharing a large number of edges with the other group partners, may have an important function of control and stability within the group; vertices lying at the boundaries between modules play an important role of mediation, leading the relationships and exchanges between different communities. Such a classification seems to be meaningful in social [18,19,20] and metabolic networks [12]. Finally, one can study the graph where vertices are the communities and edges are set between modules if there are connections between some of their vertices in the original graph and/or if the modules overlap. In this way one attains a coarse-grained description of the original graph, which unveils the relationships between modules. Recent studies indicate that networks of communities have a different degree distribution with respect to the full graphs [13]; however, the origin of their structures can be explained by the same mechanism [21]. The aim of community detection in graphs is to identify the modules based only on the topology. The problem has a long tradition and it has appeared in various forms in several disciplines. For instance, in parallel computing it is crucial to know the best way to allocate tasks to processors, so as to minimize the communications between them and speed up the calculation. This can be accomplished by splitting the computer cluster into groups with roughly the same number of processors, such that the number of physical connections between processors of different groups is minimal. The mathematical formalization of this problem is called graph partitioning. The first algorithms for graph partitioning were proposed in the early 1970s.
Clustering analysis is also an important aspect in the study of social networks. The most popular techniques are hierarchical clustering and k-means clustering, where vertices are joined into groups according to their mutual similarity. In a seminal paper, Girvan and Newman proposed a new algorithm, aiming at the identification of edges lying between communities and their successive removal, a procedure that after a few iterations leads to the isolation of modules [14]. The intercommunity edges are detected according to the values of a centrality measure, the edge betweenness, which expresses the importance of the role of the edges in processes where signals are transmitted across the graph following paths of minimal length. The paper triggered intense activity in the field, and many new methods have been proposed in recent years. In particular, physicists entered the game, bringing in their tools and techniques: spin models, optimization, percolation, random walks, synchronization, etc., became ingredients of new original algorithms. Earlier reviews of the topic can be found in [22,23]. Section “Elements of Community Detection” is about the basic elements of community detection, starting from the definition of community. The classical problem of graph partitioning and the methods for clustering analysis in sociology are presented in Sect. “Computer Science: Graph Partitioning” and “Social Science: Hierarchical and k-Means Clustering”, respectively. Section “New Methods” is devoted to a description of the new methods. In Sect. “Testing Methods” the problem of testing algorithms is discussed. Section “The Mesoscopic Description of a Graph” introduces the description of graphs at the level of communities. Finally, Sect. “Future Directions” highlights the perspectives of the field and sorts out promising research directions for the future. This chapter makes use of some basic concepts of graph theory, which can be found in any introductory textbook, like [2]. Some of them are briefly explained in the text.

Elements of Community Detection

The problem of community detection is, at first sight, intuitively clear. However, when one needs to formalize it in detail, things are not so well defined. In the intuitive concept some ambiguities are hidden and there are often many equally legitimate ways of resolving them. Hence the term “Community Detection” actually indicates several rather different problems. First of all, there is no unique way of translating into a precise prescription the intuitive idea of community. Many possibilities exist, as discussed below. Some of these possible definitions allow for vertices to belong to more than one community. It is then possible to look for overlapping or nonoverlapping communities. Another ambiguity has to do with the concept of community structure.
It may be intended as a single partition of the graph or as a hierarchy of partitions, at different levels of coarse-graining. There is then a problem of comparison. Which one is the best partition (or the best hierarchy)? If one could, in principle, analyze all possible partitions of a graph, one would need a sensible way of measuring their “quality” to single out the partitions with the strongest community structure. It may even occur that a graph has no community structure, and one should be able to recognize this. Finding a good method for comparing partitions is not a trivial task and different choices are possible. Last but not least,

the number of possible partitions grows faster than exponentially with the graph size, so that, in practice, it is not possible to analyze them all. Therefore one has to devise smart methods to find “good” partitions in a reasonable time. Again, a very hard problem. Before introducing the basic concepts and discussing the relevant questions, it is important to stress that the identification of topological clusters is possible only if the graphs are sparse, i. e. if the number of edges m is of the order of the number of nodes n of the graph. If m ≫ n, the distribution of edges among the nodes is too homogeneous for communities to make sense.

Definition of Community

The first and foremost problem is how to define precisely what a community is. The intuitive notion presented in the Introduction is related to the comparison of the number of edges joining vertices within a module (“intracommunity edges”) with the number of edges joining vertices of different modules (“intercommunity edges”). A module is characterized by a larger density of links “inside” than “outside”. This notion can however be formalized in many ways. Social network analysts have devised many definitions of subgroups with various degrees of internal cohesion among vertices [3,4]. Many other definitions have been introduced by computer scientists and physicists. In general, the definitions can be classified in three main categories.

• Local definitions. Here the attention is focused on the vertices of the subgraph under investigation and on its immediate neighborhood, disregarding the rest of the graph. These prescriptions come mostly from social network analysis and can be further subdivided in self-referring, when one considers the subgraph alone, and comparative, when the mutual cohesion of the vertices of the subgraph is compared with their cohesion with the external neighbors. Self-referring definitions identify classes of subgraphs like cliques, n-cliques, k-plexes, etc.
They are maximal subgraphs, which cannot be enlarged with the addition of new vertices and edges without losing the property which defines them. The concept of clique is very important and often recurring when one studies graphs. A clique is a maximal subgraph where each vertex is adjacent to all the others. In the literature it is also common to call non-maximal complete subgraphs cliques. Triangles are the simplest cliques, and are frequent in real networks. Larger cliques are rare, so they are not good models of communities. Besides, finding cliques is computationally very demanding: the Bron–Kerbosch method [24] runs in a time growing


exponentially with the size of the graph. The definition of clique is very strict. A softer constraint is represented by the concept of n-clique, which is a maximal subgraph such that the distance of each pair of its vertices is not larger than n. A k-plex is a maximal subgraph such that each vertex is adjacent to all the others except at most k of them. In contrast, a k-core is a maximal subgraph where each vertex is adjacent to at least k vertices within the subgraph. Comparative definitions include that of LS set, or strong community, and that of weak community. An LS set is a subgraph where each node has more neighbors inside than outside the subgraph. Instead, in a weak community, the total degree of the nodes inside the community exceeds the external total degree, i. e. the number of links lying between the community and the rest of the graph. LS sets are also weak communities, but the inverse is not true, in general. The notion of weak community was introduced by Radicchi et al. [25].

• Global definitions. Communities are structural units of the graph, so it is reasonable to think that their distinctive features can be recognized if one analyzes a subgraph with respect to the graph as a whole. Global definitions usually start from a null model, i. e. a graph which matches the original in some of its topological features, but which does not display community structure. After that, the linking properties of subgraphs of the initial graph are compared with those of the corresponding subgraphs in the null model. The simplest way to design a null model is to introduce randomness in the distribution of edges among the vertices. A random graph à la Erdös–Rényi, for instance, has no community structure, as any two vertices have the same probability to be adjacent, so there is no preferential linking involving special groups of vertices.
The most popular null model is that proposed by Newman and Girvan and consists of a randomized version of the original graph, where edges are rewired at random, under the constraint that each vertex keeps its degree [26]. This null model is the basic concept behind the definition of modularity, a function which evaluates the goodness of partitions of a graph into modules (see Sect. “Evaluating Partitions: Quality Functions”). Here a subset of vertices is a community if the number of edges inside the subset exceeds the expected number of internal edges that the subset would have in the null model. A more general definition, where one counts small connected subgraphs (motifs), and not necessarily edges, can be found in [27]. A general class of null models, including that of modularity, has been designed by Reichardt and Bornholdt [28].

• Definitions based on vertex similarity. In this last category, communities are groups of vertices which are similar to each other. A quantitative criterion is chosen to evaluate the similarity between each pair of vertices, connected or not. The criterion may be local or global: for instance one can estimate the distance between a pair of vertices. Similarities can also be extracted from eigenvector components of special matrices, which are usually close in value for vertices belonging to the same community. Similarity measures are at the basis of the method of hierarchical clustering, to be discussed in Sect. “Social Science: Hierarchical and k-Means Clustering”. The main problem in this case is the need to introduce an additional criterion to select meaningful partitions.

It is worth remarking that, in spite of the wide variety of definitions, in many detection algorithms communities are not defined at all, but are a byproduct of the procedure. This is the case of the divisive algorithms described in Sect. “Divisive Algorithms” and of the dynamic algorithms of Sect. “Dynamic Algorithms”.

Evaluating Partitions: Quality Functions

Strictly speaking, a partition of a graph in communities is a split of the graph in clusters, with each vertex assigned to only one cluster. The latter condition may be relaxed, as shown in Sect. “Overlapping Communities”. Whatever the definition of community is, there is usually a large number of possible partitions. It is then necessary to establish which partitions exhibit a real community structure. For that, one needs a quality function, i. e. a quantitative criterion to evaluate how good a partition is. The most popular quality function is the modularity of Newman and Girvan [26]. It can be written in several ways, as

Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(C_i, C_j) \,, \qquad (1)

where the sum runs over all pairs of vertices, A is the adjacency matrix, k_i the degree of vertex i and m the total number of edges of the graph. The element A_{ij} of the adjacency matrix is 1 if vertices i and j are connected, otherwise it is 0. The δ-function yields one if vertices i and j are in the same community, zero otherwise. Because of that, the only contributions to the sum come from vertex pairs belonging to the same cluster: by grouping them together, the sum over the vertex pairs can be rewritten as a sum over the modules

Q = \sum_{s=1}^{n_m} \left[ \frac{l_s}{m} - \left( \frac{d_s}{2m} \right)^2 \right] . \qquad (2)


Here, n_m is the number of modules, l_s the total number of edges joining vertices of module s, and d_s the sum of the degrees of the vertices of s. In Eq. (2), the first term of each summand is the fraction of edges of the graph inside the module, whereas the second term represents the expected fraction of edges that would be there if the graph were a random graph with the same degree for each vertex. In such a case, a vertex could be attached to any other vertex of the graph, and the probability of a connection between two vertices is proportional to the product of their degrees. So, for a vertex pair, the comparison between real and expected edges is expressed by the corresponding summand of Eq. (1). Equation (2) embeds an implicit definition of community: a subgraph is a module if the number of edges inside it is larger than the expected number in modularity’s null model. If this is the case, the vertices of the subgraph are more tightly connected than expected. Basically, if each summand in Eq. (2) is non-negative, the corresponding subgraph is a module. Besides, the larger the difference between real and expected edges, the more “modular” the subgraph. So, large positive values of Q are expected to indicate good partitions. The modularity of the whole graph, taken as a single community, is zero, as the two terms of the only summand in this case are equal and opposite. Modularity is always smaller than one, and can be negative as well. For instance, the modularity of the partition in which each vertex is its own community is always negative. This is a nice feature of the measure, implying that, if there are no partitions with positive modularity, the graph has no community structure. On the contrary, the existence of partitions with large negative modularity values may hint at the existence of subgroups with very few internal edges and many edges lying between them (multipartite structure).
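Equation (2) translates directly into a few lines of code. The following sketch (a minimal illustration written for this text, not any specific published implementation) evaluates Q from the adjacency matrix of a toy graph made of two triangles joined by a single edge:

```python
import numpy as np

def modularity(A, labels):
    """Modularity of Eq. (1): Q = (1/2m) sum_ij [A_ij - k_i k_j / 2m] delta(C_i, C_j)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1)                      # vertex degrees k_i
    two_m = k.sum()                        # 2m, twice the number of edges
    same = np.equal.outer(labels, labels)  # delta(C_i, C_j)
    return ((A - np.outer(k, k) / two_m) * same).sum() / two_m

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3)
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1

print(modularity(A, np.array([0, 0, 0, 1, 1, 1])))  # natural split: Q = 5/14 > 0
print(modularity(A, np.zeros(6, dtype=int)))        # whole graph as one community: Q = 0
print(modularity(A, np.arange(6)))                  # every vertex on its own: Q < 0
```

The three calls illustrate the properties discussed above: the natural split into the two triangles gives Q = 2[3/7 − (7/14)²] = 5/14, the trivial one-community partition gives zero, and the all-singletons partition is negative.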
Modularity has been employed as a quality function in many algorithms, like some of the divisive algorithms of Sect. “Divisive Algorithms”. In addition, modularity optimization is itself a popular method for community detection (see Sect. “Modularity Optimization”). Modularity also allows one to assess the stability of partitions [29] and to transform a graph into a smaller one that preserves its community structure [30]. However, there are some caveats on the use of the measure. The most important concerns the value of modularity for a partition. For which values can one say that there is a clear community structure in a graph? The question is tricky: if two graphs have the same type of modular structure, but different sizes, modularity will be larger for the larger graph. So, modularity values cannot be compared for different graphs. Moreover, one would expect that partitions of random graphs will have modularity values close

to zero, as no community structure is expected there. Instead, it has been shown that partitions of random graphs may attain fairly large modularity values, as the probability that the distribution of edges on the vertices is locally inhomogeneous in specific realizations is not negligible [31]. Finally, a recent analysis has proved that modularity increases if subgraphs smaller than a characteristic size are merged [32]. This fact represents a serious bias when one looks for communities via modularity optimization and is discussed in more detail in Sect. “Modularity Optimization”.

Hierarchies

Graph vertices can have various levels of organization. Modules can display an internal community structure, i. e. they can contain smaller modules, which can in turn include other modules, and so on. In this case one says that the graph is hierarchical (see Fig. 2). For a clear classification of the vertices and their roles inside a graph, it is important to find all modules of the graph as well as their hierarchy. A natural way to represent the hierarchical structure of a graph is to draw a dendrogram, like the one illustrated in Fig. 3. Here, partitions of a graph with twelve vertices are shown. At the bottom, each vertex is its own module. By moving upwards, groups of vertices are successively aggregated. Merges of communities are represented by horizontal lines. The uppermost level represents the whole graph as a single community. Cutting the diagram horizontally at some height, as shown in the figure (dashed

Community Structure in Graphs, Figure 2 Schematic example of a hierarchical graph. Sixteen modules with four vertices each are clearly organized in groups of four


Community Structure in Graphs, Figure 3 A dendrogram, or hierarchical tree. Horizontal cuts correspond to partitions of the graph in communities. Reprinted figure with permission from [26]

line), displays one level of organization of the graph vertices. The diagram is hierarchical by construction: each community belonging to a level is fully included in a community at a higher level. Dendrograms are regularly used in sociology and biology. The technique of hierarchical clustering, described in Sect. “Social Science: Hierarchical and k-Means Clustering”, lends itself naturally to this kind of representation.
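A dendrogram of this kind can be produced with standard agglomerative-clustering routines. The sketch below (an illustration assuming SciPy is available; the vertex-vertex distance matrix is invented for the example) builds the hierarchical tree for six vertices and cuts it horizontally to recover a partition into two communities:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy distance matrix: vertices {0,1,2} and {3,4,5} are close within
# each group (distance 1) and far between groups (distance 4)
D = np.array([[0., 1., 1., 4., 4., 4.],
              [1., 0., 1., 4., 4., 4.],
              [1., 1., 0., 4., 4., 4.],
              [4., 4., 4., 0., 1., 1.],
              [4., 4., 4., 1., 0., 1.],
              [4., 4., 4., 1., 1., 0.]])

# Bottom-up aggregation; Z encodes the dendrogram (each row is one merge)
Z = linkage(squareform(D), method='average')

# Horizontal cut of the dendrogram at height 2 -> two communities
labels = fcluster(Z, t=2, criterion='distance')
print(labels)
```

The cut height plays the role of the dashed line in Fig. 3: raising it above 4 merges everything into a single community, while lowering it below 1 leaves every vertex on its own.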

Community Structure in Graphs, Figure 4 Overlapping communities. In the partition highlighted by the dashed contours, some vertices are shared between more groups

Overlapping Communities

As stated in Sect. “Evaluating Partitions: Quality Functions”, in a partition each vertex is generally attributed only to one module. However, vertices lying at the boundary between modules may be difficult to assign to one module or another, based on their connections with the other vertices. In this case, it makes sense to consider such intermediate vertices as belonging to several groups, which are then called overlapping communities (Fig. 4). Many real networks are characterized by a modular structure with sizeable overlaps between different clusters. In social networks, people usually belong to several communities, according to their personal life and interests: for instance a person may have tight relationships both with the people of their working environment and with other individuals involved in common free time activities. Accounting for overlaps is also a way to better exploit the information that one can derive from topology. Ideally, one could estimate the degree of participation of a vertex in different communities, which corresponds to the likelihood that the vertex belongs to the various groups. Community detection algorithms, instead, often disagree in the classification of peripheral vertices of modules, because they are forced to put them in a single cluster, which may be the wrong one. The problem of community detection is so hard that very few algorithms consider the possibility of having overlapping communities. An interesting method has been recently proposed by G. Palla et al. [13] and is described in Sect. “Clique Percolation”. For standard algorithms, the problem of identifying overlapping vertices could be addressed by checking for the stability of partitions against slight variations in the structure of the graph, as described in [33].

Computer Science: Graph Partitioning

The problem of graph partitioning consists in dividing the vertices into g groups of predefined size, such that the number of edges lying between the groups is minimal. The number of edges running between modules is called cut size. Figure 5 presents the solution of the problem for a graph with fourteen vertices, for g = 2 and clusters of equal size. The specification of the number of modules of the partition is necessary. If one simply imposed a partition with the minimal cut size, and left the number of modules free, the solution would be trivial, corresponding to all vertices ending up in the same module, as this would yield a vanishing cut size. Graph partitioning is a fundamental issue in parallel computing, circuit partitioning and layout, and in the design of many serial algorithms, including techniques to


Community Structure in Graphs, Figure 5 Graph partitioning. The cut shows the partition in two groups of equal size

solve partial differential equations and sparse linear systems of equations. Most variants of the graph partitioning problem are NP-hard, i. e. it is unlikely that the solution can be computed in a time growing as a power of the graph size. There are however several algorithms that can do a good job, even if their solutions are not necessarily optimal [34]. Most algorithms perform a bisection of the graph, which is already a complex task. Partitions into more than two modules are usually attained by iterative bisectioning. The Kernighan–Lin algorithm [35] is one of the earliest methods proposed and is still frequently used, often in combination with other techniques. The authors were motivated by the problem of partitioning electronic circuits onto boards: the nodes contained in different boards need to be linked to each other with the least number of connections. The procedure is an optimization of a benefit function Q, which represents the difference between the number of edges inside the modules and the number of edges lying between them. The starting point is an initial partition of the graph into two clusters of the predefined size: such an initial partition can be random or suggested by some information on the graph structure. Then, subsets consisting of equal numbers of vertices are swapped between the two groups, so as to maximize the increase of Q. To reduce the risk of being trapped in local maxima of Q, the procedure includes some swaps that decrease the function Q. After a series of swaps with positive and negative gains, the partition with the largest value of Q is selected and used as the starting point of a new series of iterations. The Kernighan–Lin algorithm is quite fast, scaling as O(n^2) in worst-case time, n being the number of vertices. The partitions found by the procedure are strongly dependent on the initial configuration, and other algorithms can do better. However, the method is used to improve on the partitions found through other techniques, by using them as starting configurations for the algorithm. Another popular technique is the spectral bisection method, which is based on the properties of the Laplacian matrix. The Laplacian matrix (or simply Laplacian) of a graph is obtained from the adjacency matrix A by placing on the diagonal the degrees of the vertices and by changing the signs of the other elements. The Laplacian has all non-negative eigenvalues and at least one zero eigenvalue, as the sum of the elements of each row and column of the matrix is zero. If a graph is divided into g connected components, the Laplacian has g degenerate eigenvectors with eigenvalue zero and can be written in block-diagonal form, i. e. the vertices can be ordered in such a way that the Laplacian displays g square blocks along the diagonal, with entries different from zero, whereas all other elements vanish. Each block is the Laplacian of the corresponding subgraph, so it has the trivial eigenvector with components (1, 1, 1, …, 1, 1). Therefore, there are g degenerate eigenvectors with equal non-vanishing components on the vertices of a block, whereas all other components are zero. In this way, from the components of the eigenvectors one can identify the connected components of the graph. If the graph is connected, but consists of g subgraphs which are weakly linked to each other, the spectrum will have one zero eigenvalue and g − 1 eigenvalues which are close to zero. If the groups are two, the second lowest eigenvalue will be close to zero and the corresponding eigenvector, also called the Fiedler vector, can be used to identify the two clusters as shown below.
Every partition of a graph with n vertices into two groups can be represented by an index vector s, whose component s_i is +1 if vertex i is in one group and −1 if it is in the other. The cut size R of the partition of the graph into the two groups can be written as

    R = (1/4) s^T L s ,    (3)

where L is the Laplacian matrix and s^T the transpose of the vector s. The vector s can be written as s = Σ_i a_i v_i, where v_i, i = 1, …, n, are the eigenvectors of the Laplacian. If s is properly normalized, then

    R = Σ_i a_i^2 λ_i ,    (4)

where λ_i is the Laplacian eigenvalue corresponding to the eigenvector v_i. It is worth remarking that the sum contains at most n − 1 terms, as the Laplacian has at least one zero eigenvalue. Minimizing R amounts to minimizing the sum on the right-hand side of Eq. (4). This task is still very hard. However, if the second lowest eigenvalue λ_2 is close enough to zero, a good approximation of the minimum can be attained by choosing s parallel to the Fiedler vector v_2: this would reduce the sum to λ_2, which is a small number. But the index vector cannot be perfectly parallel to v_2 by construction, because all its components are equal in modulus, whereas the components of v_2 are not. The best one can do is to match the signs of the components. So, one can set s_i = +1 if the corresponding component of v_2 is positive and s_i = −1 if it is negative. It may happen that the sizes of the two corresponding groups do not match the predefined sizes one wishes to have. In this case, if one aims at a split into n_1 and n_2 = n − n_1 vertices, the best strategy is to order the components of the Fiedler vector from the lowest to the largest values and to put in one group the vertices corresponding to the first n_1 components from the top or the bottom, and the remaining vertices in the second group. If there is a discrepancy between n_1 and the number of positive or negative components of v_2, this procedure yields two partitions: the better solution is the one that gives the smaller cut size. The spectral bisection method is quite fast. The first eigenvectors of the Laplacian can be computed with the Lanczos method [36], which scales as m/(λ_3 − λ_2), where m is the number of edges of the graph. If the eigenvalues λ_2 and λ_3 are well separated, the running time of the algorithm is much shorter than the time required to calculate the complete set of eigenvectors, which scales as O(n^3). The method gives in general good partitions, which can be further improved by applying the Kernighan–Lin algorithm.
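As an illustration, the sign-based split on the Fiedler vector can be sketched in a few lines of pure Python. This is only a toy version (the function names and the power-iteration shortcut are our own choices, not part of the method as published; production codes would use the Lanczos method mentioned above). It also verifies the identity of Eq. (3) on a small graph made of two triangles joined by a single edge:

```python
def laplacian(adj):
    # L = D - A for an undirected graph given as {vertex: [neighbors]}
    n = len(adj)
    L = [[0.0] * n for _ in range(n)]
    for i in adj:
        L[i][i] = len(adj[i])
        for j in adj[i]:
            L[i][j] = -1.0
    return L

def fiedler_vector(adj, iters=3000):
    # Power iteration on c*I - L, deflating the trivial all-ones eigenvector,
    # converges to the eigenvector of the second smallest Laplacian eigenvalue.
    n = len(adj)
    L = laplacian(adj)
    c = 2.0 * max(len(adj[i]) for i in adj)
    v = [(-1.0) ** i * (i + 1) for i in range(n)]   # arbitrary start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]                   # project out (1, 1, ..., 1)
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
v2 = fiedler_vector(adj)
s = [1 if x > 0 else -1 for x in v2]                # index vector from the signs
L = laplacian(adj)
n = len(adj)
# cut size from Eq. (3): R = (1/4) s^T L s; here the bisection cuts one edge
R = 0.25 * sum(s[i] * L[i][j] * s[j] for i in range(n) for j in range(n))
```

The sign split recovers the two triangles, and R equals the single cut edge between them.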
Other methods for graph partitioning include level-structure partitioning, the geometric algorithm, multilevel algorithms, etc. A good description of these algorithms can be found in [34]. Graph partitioning algorithms are not well suited to community detection, because it is necessary to provide as input both the number of groups and their size, about which in principle one knows nothing. Instead, one would like an algorithm capable of producing this information as output. Besides, using iterative bisectioning to split the graph into more pieces is not a reliable procedure.

Social Science: Hierarchical and k-Means Clustering

In social network analysis, one partitions actors/vertices into clusters such that actors in the same cluster are more similar to each other than to actors of different clusters.

The two most used techniques to perform clustering analysis in sociology are hierarchical clustering and k-means clustering. The starting point of hierarchical clustering is the definition of a similarity measure between vertices. After a measure is chosen, one computes the similarity for each pair of vertices, no matter whether they are connected or not. At the end of this process, one is left with a new n × n matrix X, the similarity matrix. Initially, there are n groups, each containing one of the vertices. At each step, the two most similar groups are merged; the procedure continues until all vertices are in the same group. There are different ways to define the similarity between groups out of the matrix X. In single linkage clustering, the similarity between two groups is the minimum element x_ij, with i in one group and j in the other. On the contrary, the maximum element x_ij for vertices of different groups is used in the procedure of complete linkage clustering. In average linkage clustering one computes the average of the x_ij. The procedure can be better illustrated by means of dendrograms, like the one in Fig. 3. One should note that hierarchical clustering does not deliver a single partition, but a set of partitions. There are many possible ways to define a similarity measure for the vertices based on the topology of the network. A possibility is to define a distance between vertices, like

    x_ij = sqrt( Σ_{k ≠ i,j} (A_ik − A_jk)^2 ) .    (5)

This is a dissimilarity measure, based on the concept of structural equivalence. Two vertices are structurally equivalent if they have the same neighbors, even if they are not adjacent themselves. If i and j are structurally equivalent, x_ij = 0. Vertices with large degree and different neighbors are considered very “far” from each other. Another measure related to structural equivalence is the Pearson correlation between columns or rows of the adjacency matrix,

    x_ij = Σ_k (A_ik − μ_i)(A_jk − μ_j) / (n σ_i σ_j) ,    (6)

where the averages μ_i = (Σ_j A_ij)/n and the variances σ_i^2 = Σ_j (A_ij − μ_i)^2/n. An alternative measure is the number of edge- (or vertex-) independent paths between two vertices. Independent paths do not share any edge (vertex), and their number is related to the maximum flow that can be conveyed between the two vertices under the constraint that each edge can carry only one unit of flow (max-flow/min-cut theorem). Similarly, one could consider all paths running between two vertices. In this case, there is the problem that the total number of paths is infinite, but this can be avoided if one performs a weighted sum of the number of paths, where paths of length l are weighted by the factor α^l, with α < 1. In this way the weights of long paths are exponentially suppressed and the sum converges. Hierarchical clustering has the advantage that it does not require preliminary knowledge of the number and size of the clusters. However, it does not provide a way to discriminate between the many partitions obtained by the procedure, and to choose the one or ones that better represent the community structure of the graph. Moreover, the results of the method depend on the specific similarity measure adopted. Finally, it does not always correctly classify all vertices of a community, and in many cases some vertices are missed even if they have a central role in their clusters [22]. Another popular clustering technique in sociology is k-means clustering [37]. Here, the number of clusters is preassigned, say k. The vertices of the graph are embedded in a metric space, so that each vertex is a point and a distance measure is defined between pairs of points in the space. The distance is a measure of dissimilarity between vertices. The aim of the algorithm is to identify k points in this space, or centroids, so that each vertex is associated with one centroid and the sum of the distances of all vertices from their respective centroids is minimal. To achieve this, one starts from an initial distribution of centroids such that they are as far as possible from each other. In the first iteration, each vertex is assigned to the nearest centroid. Next, the centers of mass of the k clusters are computed and become the new set of centroids, which allows for a new classification of the vertices, and so on.
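The assignment/update loop just described can be sketched as follows. This is a minimal illustration, not a full implementation: for brevity, the initial centroids are simply the first k points (the text suggests spreading them far apart instead), and all names are our own:

```python
def kmeans(points, k, iters=100):
    # Naive seeding: take the first k points as the initial centroids.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d2.index(min(d2))].append(p)
        # Update step: centroids move to the centers of mass of their clusters.
        new = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:     # stable centroids: clusters no longer change
            break
        centroids = new
    return centroids, clusters

# Two well-separated groups of points in the plane
points = [(0, 0), (10, 10), (1, 0), (0, 1), (1, 1), (11, 10), (10, 11), (11, 11)]
centroids, clusters = kmeans(points, 2)
```

On this toy input the loop stabilizes after two iterations, with one cluster per group of points.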
After a sufficient number of iterations, the positions of the centroids are stable, and the clusters do not change any more. The solution found is not necessarily optimal, as it strongly depends on the initial choice of the centroids. The result can be improved by performing several runs starting from different initial conditions. The limitation of k-means clustering is the same as that of the graph partitioning algorithms: the number of clusters must be specified at the beginning, as the method is not able to derive it. In addition, the embedding in a metric space can be natural for some graphs, but rather artificial for others.

New Methods

From the previous two sections it is clear that traditional approaches to derive graph partitions have serious limitations.

The most important problem is the need to provide the algorithms with information that one would like to derive from the algorithms themselves, like the number of clusters and their size. Even when these inputs are not necessary, as in hierarchical clustering, there is the question of estimating the goodness of the partitions, so that one can pick the best one. For these reasons, there has been a major effort in recent years to devise algorithms capable of extracting complete information about the community structure of graphs. These methods can be grouped into different categories.

Divisive Algorithms

A simple way to identify communities in a graph is to detect the edges that connect vertices of different communities and remove them, so that the clusters get disconnected from each other. This is the philosophy of divisive algorithms. The crucial point is to find a property of intercommunity edges that allows for their identification. Any divisive method delivers many partitions, which are by construction hierarchical, so that they can be represented with dendrograms.

Algorithm of Girvan and Newman. The most popular algorithm is that proposed by Girvan and Newman [14]. The method is also historically important, because it marked the beginning of a new era in the field of community detection. Here edges are selected according to the values of measures of edge centrality, estimating the importance of edges according to some property or process running on the graph. The steps of the algorithm are:

1. Computation of the centrality of all edges;
2. Removal of the edge with largest centrality;
3. Recalculation of centralities on the running graph;
4. Iteration of the cycle from step 2.

Girvan and Newman focused on the concept of betweenness, which is a variable expressing the frequency of the participation of edges in a process. They considered three alternative definitions: edge betweenness, current-flow betweenness and random walk betweenness. Edge betweenness is the number of shortest paths between all vertex pairs that run along the edge. It is an extension to edges of the concept of site betweenness, introduced by Freeman in 1977 [20]. It is intuitive that intercommunity edges have a large value of the edge betweenness, because many shortest paths connecting vertices of different communities will pass through them (Fig. 6). The betweenness of all edges of the graph can be calculated in a time that scales as O(mn), with techniques based on breadth-first search [26,38].
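For unweighted graphs, edge betweenness can be computed within the O(mn) time quoted above by a BFS from every vertex followed by a dependency accumulation. The sketch below follows the general idea of these techniques (a Brandes-style accumulation; the names are our own). On two triangles joined by a single edge, the bridge collects all nine intercommunity shortest paths and dominates every other edge:

```python
from collections import deque

def edge_betweenness(adj):
    # adj: {vertex: [neighbors]} for an undirected, unweighted graph
    eb = {}
    for v in adj:
        for w in adj[v]:
            eb[tuple(sorted((v, w)))] = 0.0
    for s in adj:
        # BFS from s, recording distances, shortest-path counts and predecessors
        dist, sigma = {s: 0}, {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order, Q = [], deque([s])
        while Q:
            v = Q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    Q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Dependency accumulation, walking the vertices in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[tuple(sorted((v, w)))] += c
                delta[v] += c
    # each unordered vertex pair was counted from both endpoints
    return {e: b / 2.0 for e, b in eb.items()}

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
eb = edge_betweenness(adj)
```

The bridge (2,3) carries the unique shortest path of each of the 9 intercommunity pairs, so its betweenness is 9, while an internal edge like (0,1) only serves its own endpoints.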


Community Structure in Graphs, Figure 6 Edge betweenness is highest for edges connecting communities. In the figure, the thick edge in the middle has a much higher betweenness than all other edges, because all shortest paths connecting vertices of the two communities run through it

Current-flow betweenness is defined by considering the graph as a resistor network, with edges having unit resistance. If a voltage difference is applied between any two vertices, each edge carries some amount of current, which can be calculated by solving Kirchhoff's equations. The procedure is repeated for all possible vertex pairs: the current-flow betweenness of an edge is the average value of the current carried by the edge. The calculation of current-flow betweenness requires the inversion of an n × n matrix (once), followed by obtaining and averaging the current for all pairs of nodes. Each of these two tasks takes a time O(n^3) for a sparse matrix. The random-walk betweenness of an edge says how frequently a random walker running on the graph goes across the edge. Recall that a random walker moving from a vertex follows each edge with equal probability. A pair of vertices, s and t, is chosen at random. The walker starts at s and keeps moving until it hits t, where it stops. One computes the probability that each edge was crossed by the walker, and averages over all possible choices of the vertices s and t. The complete calculation requires a time O(n^3) on a sparse graph. It is possible to show that this measure is equivalent to current-flow betweenness [39]. Calculating edge betweenness is much faster than current-flow or random walk betweenness (O(n^2) versus O(n^3) on sparse graphs). In addition, in practical applications the Girvan–Newman algorithm with edge betweenness gives better results than with the other centrality measures. Numerical studies show that the recalculation step 3 of the Girvan–Newman algorithm is essential to detect meaningful communities. This introduces an additional factor m in the running time of the algorithm: consequently, the edge betweenness version scales as O(m^2 n), or O(n^3) on a sparse graph. Because of that, the algorithm is quite slow, and applicable to graphs with up to n ≈ 10 000 vertices with current computational resources.
In the original version of the Girvan–Newman algorithm [14], the authors had to deal with the whole hierarchy of partitions, as they had no procedure to say which partition is the best. In a subsequent refinement [26], they selected the partition with the largest value of modularity (see Sect. “Evaluating Partitions: Quality Functions”), a criterion that has been frequently used ever since. There have been countless applications of the Girvan–Newman method: the algorithm is now integrated in well-known libraries of network analysis programs.

Algorithm of Tyler et al. Tyler, Wilkinson and Huberman proposed a modification of the Girvan–Newman algorithm, to improve the speed of the calculation [40]. The modification consists in calculating the contribution to edge betweenness only from a limited number of vertex pairs, chosen at random, deriving a sort of Monte Carlo estimate. The procedure induces statistical errors in the values of the edge betweenness. As a consequence, the partitions are in general different for different choices of the sampling pairs of vertices. However, the authors showed that, by repeating the calculation many times, the method gives good results, with a substantial gain of computer time. In practical examples, only vertices lying at the boundary between communities may not be clearly classified, and be assigned sometimes to one group, sometimes to another. The method has been applied to a network of people corresponding through email [40] and to networks of gene co-occurrences [41].

Algorithm of Fortunato et al. An alternative measure of centrality for edges is information centrality. It is based on the concept of efficiency [42], which estimates how easily information travels on a graph according to the length of shortest paths between vertices. The information centrality of an edge is the variation of the efficiency of the graph if the edge is removed. In the algorithm by Fortunato, Latora and Marchiori [43], edges are removed according to decreasing values of information centrality. The method is analogous to that of Girvan and Newman, but slower, as it scales as O(n^4) on a sparse graph.
On the other hand, it gives a better classification of vertices when communities are fuzzy, i.e. with a high degree of interconnectedness.

Algorithm of Radicchi et al. Because of the high density of edges within communities, it is easy to find loops in them, i.e. closed non-intersecting paths. On the contrary, edges lying between communities will hardly be part of short loops. Based on this intuitive idea, Radicchi et al. proposed a new measure, the edge clustering coefficient, such that low values of the measure are likely to correspond to intercommunity edges [25]. The edge clustering coefficient generalizes to edges the notion of clustering coefficient introduced by Watts and Strogatz for vertices [44]. The latter is the number of triangles including a vertex divided by the number of possible triangles that can be formed. The edge clustering coefficient is the number of loops of length g including the edge divided by the number of possible cycles. Usually, loops of length g = 3 or 4 are considered. At each iteration, the edge with smallest clustering coefficient is removed, the measure is recalculated, and so on. The procedure stops when all clusters obtained are LS-sets or “weak” communities (see Sect. “Definition of Community”). Since the edge clustering coefficient is a local measure, involving at most an extended neighborhood of the edge, it can be calculated very quickly. The running time of the algorithm to completion is O(m^4/n^2), or O(n^2) on a sparse graph, so it is much shorter than the running time of the Girvan–Newman method. On the other hand, the method may give poor results when the graph has few loops, as happens in several non-social networks. In this case, in fact, the edge clustering coefficient is small and fairly similar for all edges, and the algorithm may fail to identify the bridges between communities.

Modularity Optimization

If the Newman–Girvan modularity Q (Sect. “Evaluating Partitions: Quality Functions”) is a good indicator of the quality of partitions, the partition corresponding to its maximum value on a given graph should be the best, or at least a very good one. This is the main motivation for modularity maximization, perhaps the most popular class of methods to detect communities in graphs. An exhaustive optimization of Q is impossible, due to the huge number of ways in which it is possible to partition a graph, even when the latter is small. Besides, the true maximum is out of reach, as it has been recently proved that modularity optimization is an NP-hard problem [45], so it is probably impossible to find the solution in a time growing polynomially with the size of the graph. However, there are currently several algorithms able to find fairly good approximations of the modularity maximum in a reasonable time.

Greedy techniques.
The first algorithm devised to maximize modularity was a greedy method of Newman [46]. It is an agglomerative method, where groups of vertices are successively joined to form larger communities such that modularity increases after the merging. One starts from n clusters, each containing a single vertex. Edges are not initially present, they are added one by one during the procedure. However, modularity is always calculated from the full topology of the graph, since one wants to find its partitions. Adding a first edge to the set of disconnected vertices reduces the number of groups from n to n  1, so it delivers a new partition of the graph. The edge is chosen such that this partition gives the maximum increase of modularity with respect to the previous configuration.

All other edges are added based on the same principle. If the insertion of an edge does not change the partition, i.e. the clusters are the same, modularity stays the same. The number of partitions found during the procedure is n, each with a different number of clusters, from n to 1. The largest value of modularity in this subset of partitions is the approximation of the modularity maximum given by the algorithm. The update of the modularity value at each iteration step can be performed in a time O(n + m), so the algorithm runs to completion in a time O((m + n)n), or O(n^2) on a sparse graph, which is fast. In a later paper by Clauset et al. [47], it was shown that the calculation of modularity during the procedure can be performed much more quickly by use of max-heaps, special data structures created using a binary tree. With this refinement, the algorithm scales as O(md log n), where d is the depth of the dendrogram describing the successive partitions found during the execution of the algorithm, which grows as log n for graphs with a strong hierarchical structure. For those graphs, the running time of the method is then O(n log^2 n), which makes it possible to analyze the community structure of very large graphs, up to 10^7 vertices. The greedy algorithm is currently the only algorithm that can be used to estimate the modularity maximum on such large graphs. On the other hand, the approximation it finds is not that good, as compared with other techniques. The accuracy of the algorithm can be considerably improved if one accounts for the size of the groups to be merged [48], or if the hierarchical agglomeration is started from some good intermediate configuration, rather than from the individual vertices [49].

Simulated annealing. Simulated annealing [50] is a probabilistic procedure for global optimization used in different fields and problems.
It consists in performing an exploration of the space of possible states, looking for the global optimum of a function F, say its maximum. Transitions from one state to another occur with probability 1 if F increases after the change, and otherwise with a probability exp(−βΔF), where ΔF is the decrease of the function and β is an index of stochastic noise, a sort of inverse temperature, which increases after each iteration. The noise reduces the risk that the system gets trapped in local optima. At some stage, the system converges to a stable state, which can be an arbitrarily good approximation of the maximum of F, depending on how many states were explored and how slowly β is varied. Simulated annealing was first employed for modularity optimization by Guimerà et al. [31]. Its standard implementation combines two types of “moves”: local moves, where a single vertex is shifted from one cluster to another, taken at random, and global moves, consisting of merges and splits of
communities. In practical applications, one typically combines n^2 local moves with n global ones in one iteration. The method can potentially come very close to the true modularity maximum, but it is slow. Therefore, it can be used for small graphs, with up to about 10^4 vertices. Applications include studies of potential energy landscapes [51] and of metabolic networks [12].

Extremal optimization. Extremal optimization is a heuristic search procedure proposed by Boettcher and Percus [52], in order to achieve an accuracy comparable with simulated annealing, but with a substantial gain in computer time. It is based on the optimization of local variables, expressing the contribution of each unit of the system to the global function under study. This technique was used for modularity optimization by Duch and Arenas [53]. Modularity can indeed be written as a sum over the vertices: the local modularity of a vertex is the value of the corresponding term in this sum. A fitness measure for each vertex is obtained by dividing the local modularity of the vertex by its degree. One starts from a random partition of the graph into two groups. At each iteration, the vertex with the lowest fitness is shifted to the other cluster. The move changes the partition, so the local fitnesses need to be recalculated. The process continues until the global modularity Q cannot be improved any more by the procedure. At this stage, each cluster is considered as a graph on its own and the procedure is repeated, as long as Q increases for the partitions found. The algorithm finds an excellent approximation of the modularity maximum in a time O(n^2 log n), so it represents a good tradeoff between accuracy and speed.

Spectral optimization. Modularity can be optimized using the eigenvalues and eigenvectors of a special matrix, the modularity matrix B, whose elements are

    B_ij = A_ij − k_i k_j / (2m) ,    (7)

where the notation is the same used in Eq. (1). The method [54,55] is analogous to spectral bisection, described in Sect. “Computer Science: Graph Partitioning”. The difference is that here the Laplacian matrix is replaced by the modularity matrix. Between Q and B there is the same relation as between R and L in Eq. (3), so modularity can be written as a weighted sum of the eigenvalues of B, just like Eq. (4). Here one has to look for the eigenvector of B with largest eigenvalue, u1 , and group the vertices according to the signs of the components of u1 , just like in Sect. “Computer Science: Graph Partitioning”. The Kernighan–Lin algorithm can then be used to improve the result. The procedure is repeated for each of the clusters separately, and the number of communities increases as

long as modularity does. The advantage over spectral bisection is that it is not necessary to specify the size of the two groups, because it is determined by taking the partition with largest modularity. The drawback is similar as for spectral bisection, i.e. the algorithm gives the best results for bisections, whereas it is less accurate when the number of communities is larger than two. The situation can be improved by using the other eigenvectors with positive eigenvalues of the modularity matrix. In addition, the eigenvectors with the most negative eigenvalues are important to detect a possible multipartite structure of the graph, as they give the most relevant contribution to the modularity minimum. The algorithm typically runs in a time O(n^2 log n) for a sparse graph, when one computes only the first eigenvector, so it is faster than extremal optimization, and slightly more accurate, especially for large graphs.

Finally, some general remarks on modularity optimization and its reliability. A large value for the modularity maximum does not necessarily mean that a graph has a community structure. Random graphs can also have partitions with large modularity values, even though clusters are not explicitly built in [31,56]. Therefore, the modularity maximum of a graph reveals its community structure only if it is appreciably larger than the modularity maximum of random graphs of the same size [57]. In addition, one assumes that the modularity maximum delivers the “best” partition of the network in communities. However, this is not always true [32]. In the definition of modularity (Eq. (2)) the graph is compared with a random version of it that preserves the degrees of its vertices. If groups of vertices in the graph are more tightly connected than they would be in the randomized graph, modularity optimization would consider them as parts of the same module. But if the groups have fewer than √m internal edges, the expected number of edges running between them in modularity's null model is less than one, and a single interconnecting edge would cause the merging of the two groups in the optimal partition. This holds for every density of edges inside the groups, even in the limit case in which all vertices of each group are connected to each other, i.e. if the groups are cliques. In Fig. 7 a graph is made out of n_c identical cliques, with l vertices each, connected by single edges. It is intuitive to think that the modules of the best partition are the single cliques: instead, if n_c is larger than about l^2, modularity is higher for the partition in which pairs of consecutive cliques are parts of the same module (indicated by the dashed lines in the figure). The problem holds for a wide class of possible null models [58]. Attempts have been made to solve it within the modularity framework [59,60,61].


Community Structure in Graphs, Figure 7 Resolution limit of modularity optimization. The natural community structure of the graph, represented by the individual cliques (circles), is not recognized by optimizing modularity, if the cliques are smaller than a scale depending on the size of the graph. Reprinted figure with permission from [32]
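The effect shown in Fig. 7 is easy to reproduce numerically. The sketch below (our own toy code, with modularity written in the standard per-module form Q = Σ_c [l_c/m − (d_c/2m)^2], where l_c is the number of internal edges and d_c the total degree of module c) builds a ring of n_c = 30 cliques of l = 4 vertices joined by single edges and compares the partition into single cliques with the partition into pairs of consecutive cliques; since n_c > l^2, the pairing wins:

```python
def modularity(edges, membership):
    # Q = sum over modules of [ l_c/m - (d_c/(2m))^2 ]
    m = len(edges)
    lc, dc = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        dc[cu] = dc.get(cu, 0) + 1
        dc[cv] = dc.get(cv, 0) + 1
        if cu == cv:
            lc[cu] = lc.get(cu, 0) + 1
    return sum(lc.get(c, 0) / m - (dc[c] / (2 * m)) ** 2 for c in dc)

l, nc = 4, 30                        # clique size and number of cliques
edges = []
for c in range(nc):
    base = c * l
    for i in range(l):               # edges inside clique c
        for j in range(i + 1, l):
            edges.append((base + i, base + j))
    # single edge connecting clique c to the next clique in the ring
    edges.append((base, ((c + 1) % nc) * l + 1))

single = {v: v // l for v in range(nc * l)}         # one module per clique
pairs = {v: (v // l) // 2 for v in range(nc * l)}   # consecutive cliques merged
q_single, q_pairs = modularity(edges, single), modularity(edges, pairs)
```

Here q_single = 173/210 ≈ 0.824 while q_pairs = 181/210 ≈ 0.862, so modularity optimization prefers the merged pairs despite the cliques being the natural communities.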

Modifications of the measure have also been suggested. Massen and Doye proposed a slight variation of modularity's null model [51]: it is still a graph with the same degree sequence as the original, and with edges rewired at random among the vertices, but one imposes the additional constraint that there can be neither multiple edges between a pair of vertices nor edges joining a vertex with itself (self-edges). Muff, Rao and Caflisch remarked that modularity's null model implicitly assumes that each vertex could be attached to any other, whereas in real cases a cluster is usually connected to few other clusters [62]. Therefore, they proposed a local version of modularity, in which the expected number of edges within a module is not calculated with respect to the full graph, but considering just a portion of it, namely the subgraph including the module and its neighboring modules.

Spectral Algorithms

As discussed above, spectral properties of graph matrices are frequently used to find partitions. Traditional methods are in general unable to predict the number and size of the clusters, which instead must be fed into the procedure. Recent algorithms, reviewed below, are more powerful.

Algorithm of Donetti and Muñoz. An elegant method based on the eigenvectors of the Laplacian matrix has been

Community Structure in Graphs, Figure 8 Spectral algorithm by Donetti and Muñoz. Vertex i is represented by the values of the ith components of Laplacian eigenvectors. In this example, the graph has an ad-hoc division into four communities, indicated by different symbols. The communities are better separated in two dimensions (b) than in one (a). Reprinted figure with permission from [63]

devised by Donetti and Muñoz [63]. The idea is simple: the values of the eigenvector components are close for vertices in the same community, so one can use them as coordinates to represent vertices as points in a metric space. So, if one uses M eigenvectors, one can embed the vertices in an M-dimensional space. Communities appear as groups of points well separated from each other, as illustrated in Fig. 8. The separation becomes the more visible, the larger the number of dimensions/eigenvectors M. The points are grouped in communities by hierarchical clustering (see Sect. “Social Science: Hierarchical and k-Means Clustering”). The final partition is the one with largest modularity. For the similarity measure between vertices, Donetti and Muñoz used both the Euclidean distance and the angle distance. The angle distance between two points is the angle between the vectors going from the origin of the M-dimensional space to either point. Applications show that the best results are obtained with complete-linkage clustering. The algorithm runs to completion in a time O(n^3), which is not fast. Moreover, the number M of eigenvectors that are needed to have a clean separation of the clusters is not known a priori.

Algorithm of Capocci et al. Similarly to Donetti and Muñoz, Capocci et al. used eigenvector components to identify communities [64]. In this case the eigenvectors are those of the normal matrix, which is derived from the adjacency matrix by dividing each row by the sum of its elements. The eigenvectors can be quickly calculated by


performing a constrained optimization of a suitable cost function. A similarity matrix is built by calculating the correlation between eigenvector components: the similarity between vertices i and j is the Pearson correlation coefficient between their corresponding eigenvector components, where the averages are taken over the set of eigenvectors used. The method can be extended to directed graphs. It is useful to estimate vertex similarities, but it does not provide a well-defined partition of the graph.

Algorithm of Wu and Huberman. A fast algorithm by Wu and Huberman identifies communities based on the properties of resistor networks [65]. It is essentially a method for graph bisection, similar to spectral bisection, although partitions into an arbitrary number of communities can be obtained by iterative applications. The graph is transformed into a resistor network where each edge has unit resistance. A unit potential difference is set between two randomly chosen vertices. The idea is that, if there is a clear division of the graph into two communities, there will be a visible gap between the voltage values for vertices at the borders between the clusters. The voltages are calculated by solving Kirchhoff's equations: an exact resolution would be too time consuming, but it is possible to find a reasonably good approximation in linear time for a sparse graph with a clear community structure, so the most time-consuming part of the algorithm is the sorting of the voltage values, which takes a time O(n log n). Any possible vertex pair can be chosen to set the initial potential difference, so the procedure should in principle be repeated for all possible vertex pairs. The authors showed that this is not necessary, and that a limited number of sampling pairs is sufficient to get good results, so the algorithm scales as O(n log n) and is very fast.
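The core of the voltage idea is easy to sketch. Below, Kirchhoff's equations are solved by plain relaxation, where each free vertex repeatedly takes the average voltage of its neighbors (the authors use a faster approximate linear-time scheme instead); vertices are then sorted by voltage and split at the largest gap. The example graph, names and parameters are our own toy choices, two four-cliques joined by a single edge:

```python
def voltages(adj, source, sink, sweeps=500):
    # Unit potential difference between source and sink; every free vertex
    # relaxes to the average of its neighbors (discrete Laplace equation).
    V = {v: 0.5 for v in adj}
    V[source], V[sink] = 1.0, 0.0
    for _ in range(sweeps):
        for v in adj:
            if v != source and v != sink:
                V[v] = sum(V[w] for w in adj[v]) / len(adj[v])
    return V

def split_at_gap(V):
    # Sort by voltage and cut at the largest spacing between consecutive values
    order = sorted(V, key=V.get, reverse=True)
    gaps = [V[order[i]] - V[order[i + 1]] for i in range(len(order) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return set(order[:cut]), set(order[cut:])

# Two 4-cliques {0,...,3} and {4,...,7} joined by the edge (3,4)
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
       4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
V = voltages(adj, source=0, sink=7)
group_a, group_b = split_at_gap(V)
```

The voltage gap across the bridge (from 0.75 down to 0.25) is much larger than any gap inside the cliques, so the cut falls exactly between the two communities.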
An interesting feature of the method is that it can quickly find the natural community of any vertex, without determining the complete partition of the graph. For that, one places the voltage source at the vertex in question and the sink at an arbitrary other vertex. The same feature is present in an older algorithm by Flake et al. [11], where one uses max-flow instead of current flow. Previous work has shown that the eigenvectors of the transfer matrix T can also be used to extract useful information on community structure [66,67]. The element T_ij of the transfer matrix is 1/k_j if i and j are neighbors, where k_j is the degree of j; otherwise it is zero. The transfer matrix governs the process of diffusion on graphs.

Dynamic Algorithms

This section describes methods employing processes running on the graph, focusing on spin-spin interactions, random walks and synchronization.
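The transfer matrix just defined is easy to build explicitly. The short numpy sketch below (the two-triangle toy graph is an assumption for illustration) shows the diffusion it governs: after a few steps, most of the probability mass of a walker is still inside its starting community.

```python
import numpy as np

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

k = A.sum(axis=0)      # vertex degrees
T = A / k              # T[i, j] = 1/k_j if i and j are neighbors, else 0
p = np.zeros(n)
p[0] = 1.0             # walker starts at vertex 0
for _ in range(3):     # three diffusion steps: p(t+1) = T p(t)
    p = T @ p
# p[:3].sum() is the probability that the walker is still in {0,1,2}
```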

Q-state Potts model. The Potts model is among the most popular models in statistical mechanics [68]. It describes a system of spins that can be in q different states. The interaction is ferromagnetic, i. e. it favors spin alignment, so at zero temperature all spins are in the same state. If antiferromagnetic interactions are also present, the ground state of the system may not be the one where all spins are aligned, but a state where different spin values coexist, in homogeneous clusters. If Potts spin variables are assigned to the vertices of a graph with community structure, and the interactions are between neighboring spins, it is likely that the topological clusters can be recovered from the like-valued spin clusters of the system, as there are many more interactions inside communities than outside. Based on this idea, inspired by an earlier paper by Blatt, Wiseman and Domany [69], Reichardt and Bornholdt proposed a method to detect communities that maps the graph onto a q-Potts model with nearest-neighbor interactions [70]. The Hamiltonian of the model, i. e. its energy, is the sum of two competing terms, one favoring spin alignment, one antialignment. The relative weight of these two terms is expressed by a parameter γ, which is usually set to the value of the density of edges of the graph. The goal is to find the ground state of the system, i. e. to minimize the energy. This can be done with simulated annealing [50], starting from a configuration where spins are randomly assigned to the vertices and the number of states q is very high. The procedure is quite fast and the results do not depend on q. The method also allows one to identify vertices shared between communities, from the comparison of partitions corresponding to global and local energy minima. More recently, Reichardt and Bornholdt derived a general framework [28], in which detecting community structure is equivalent to finding the ground state of a q-Potts model spin glass [71].
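A Hamiltonian of this competing-terms form can be written as H(σ) = −Σ_{i<j} (A_ij − γ) δ(σ_i, σ_j), with γ the edge density: aligned spins are rewarded on edges and penalized on missing edges. The sketch below is a minimal illustration on an assumed toy graph; exhaustive enumeration stands in for the simulated annealing that any realistic instance would require.

```python
from itertools import combinations, product

# Toy graph: two triangles joined by one edge; illustrative only.
edges = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)}
n = 6
gamma = 2 * len(edges) / (n * (n - 1))   # edge density of the graph

def energy(spins):
    """H = -sum_{i<j} (A_ij - gamma) * delta(spin_i, spin_j)."""
    return -sum(((i, j) in edges) - gamma
                for i, j in combinations(range(n), 2)
                if spins[i] == spins[j])

# Exhaustive ground-state search (q = n states suffices to represent
# any partition); simulated annealing replaces this on real graphs.
best = min(product(range(n), repeat=n), key=energy)
clusters = {frozenset(u for u in range(n) if best[u] == s) for s in set(best)}
```

Here the ground state puts each triangle in its own spin state, recovering the two communities.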
Their previous method and modularity optimization are recovered as special cases. Overlapping communities can be discovered by comparing partitions with the same (minimal) energy, and hierarchical structure can be investigated by tuning a parameter acting on the density of edges of a reference graph without community structure. Random walk. Using random walks to find communities comes from the idea that a random walker spends a long time inside a community, due to the high density of internal edges and the consequently large number of paths that can be followed. Zhou used random walks to define a distance between pairs of vertices [72]: the distance d_ij between i and j is the average number of edges that a random walker has to cross to reach j starting from i. Close vertices are likely to belong to the same community. Zhou defines the “global attractor” of a vertex i to be the closest vertex to i
(smallest d_ij), whereas the “local attractor” of i is its closest neighbor. Two types of communities are defined, according to local or global attractors: a vertex i has to be put in the same community as its attractor and as all other vertices for which i is an attractor. Communities must be minimal subgraphs, i. e. they cannot include smaller subgraphs which are communities according to the chosen criterion. Applications to real and artificial networks show that the method can find meaningful partitions. In a subsequent paper [73], Zhou introduced a measure of dissimilarity between vertices based on the distance defined above. The measure resembles the definition of distance based on structural equivalence of Eq. (5), where the elements of the adjacency matrix are replaced by the corresponding distances. Graph partitions are obtained with a divisive procedure that, starting from the graph as a single community, performs successive splits based on the criterion that vertices in the same cluster must be less dissimilar than a running threshold, which is decreased during the process. The hierarchy of partitions derived by the method is representative of actual community structures for several real and artificial graphs. In another work [74], Zhou and Lipowsky defined distances with biased random walkers, where the bias is due to the fact that walkers move preferentially towards vertices sharing a large number of neighbors with the starting vertex. A different distance measure between vertices based on random walks was introduced by Latapy and Pons [75]. The distance is calculated from the probabilities that the random walker moves from a vertex to another in a fixed number of steps. Vertices are then grouped into communities through hierarchical clustering. The method is quite fast, running to completion in a time O(n^2 log n) on a sparse graph. Synchronization. Synchronization is another promising dynamic process for revealing communities in graphs.
If oscillators are placed at the vertices, with random initial phases, and have nearest-neighbor interactions, oscillators in the same community synchronize first, whereas full synchronization requires a longer time. So, if one follows the time evolution of the process, states with synchronized clusters of vertices can be quite stable and long-lived, so they can be easily recognized. This was first shown by Arenas, Díaz–Guilera and Pérez–Vicente [76]. They used Kuramoto oscillators [77], which are coupled two-dimensional vectors, each endowed with a natural oscillation frequency. If the interaction coupling exceeds a threshold, the dynamics leads to synchronization. Arenas et al. showed that the time evolution of the system reveals some intermediate time scales, corresponding to topological scales of the graph, i. e. to different levels of organization of the vertices. Hierarchical community structure can be revealed

in this way. Based on the same principle, Boccaletti et al. designed a community detection method based on synchronization [79]. The synchronization dynamics is a variation of Kuramoto’s model, the opinion changing rate (OCR) model [80]. The evolution equations of the model are solved for decreasing values of a parameter that tunes the strength of the interaction coupling between neighboring vertices. In this way, different partitions are recovered: the partition with the largest value of modularity is chosen. The algorithm scales in a time O(mn), or O(n^2) on sparse graphs, and gives good results on practical examples. However, synchronization-based algorithms may not be reliable when communities differ considerably in size.

Clique Percolation

In most of the approaches examined so far, communities have been characterized and discovered, directly or indirectly, by some global property of the graph, like betweenness, modularity, etc., or by some process that involves the graph as a whole, like random walks, synchronization, etc. But communities can also be interpreted as a form of local organization of the graph, so they could be defined from some property of the groups of vertices themselves, regardless of the rest of the graph. Moreover, very few of the algorithms presented so far are able to deal with the problem of overlapping communities (Sect. “Overlapping Communities”). A method that accounts both for the locality of the community definition and for the possibility of having overlapping communities is the Clique Percolation Method (CPM) by Palla et al. [13]. It is based on the observation that the internal edges of a community are likely to form cliques due to their high density. On the other hand, it is unlikely that intercommunity edges form cliques: this idea was already used in the divisive method of Radicchi et al. (see Sect. “Divisive Algorithms”). Palla et al. define a k-clique as a complete graph with k vertices.
Notice that this definition is different from the definition of n-clique (see Sect. “Definition of Community”) used in social science. If it were possible for a clique to move on a graph, in some way, it would probably get trapped inside its original community, as it could not cross the bottleneck formed by the intercommunity edges. Palla et al. introduced a number of concepts to implement this idea. Two k-cliques are adjacent if they share k − 1 vertices. The union of adjacent k-cliques is called a k-clique chain. Two k-cliques are connected if they are part of a k-clique chain. Finally, a k-clique community is the largest connected subgraph obtained by the union of a k-clique and of all k-cliques which are connected to it. Examples of k-clique communities are shown in Fig. 9.
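These definitions translate directly into code. The sketch below finds k-clique communities for k = 3 (triangles) on a small assumed toy graph, by brute-force clique enumeration and a union-find over clique adjacency; the actual implementation by Palla et al. works from maximal cliques instead.

```python
from itertools import combinations

# Toy graph: triangles {0,1,2} and {1,2,6} overlapping in two vertices,
# plus a separate triangle {3,4,5} attached by the single edge (2,3).
edge_list = [(0, 1), (0, 2), (1, 2), (1, 6), (2, 6),
             (2, 3), (3, 4), (3, 5), (4, 5)]
edges = {frozenset(e) for e in edge_list}
n, k = 7, 3

# 1. Enumerate all k-cliques (triangles for k = 3).
cliques = [c for c in combinations(range(n), k)
           if all(frozenset(p) in edges for p in combinations(c, 2))]

# 2. Two k-cliques are adjacent if they share k - 1 vertices; a k-clique
#    community is a connected component of this clique-adjacency graph.
parent = list(range(len(cliques)))
def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x
for a, b in combinations(range(len(cliques)), 2):
    if len(set(cliques[a]) & set(cliques[b])) == k - 1:
        parent[find(a)] = find(b)

# 3. Each community is the union of the vertices of its k-cliques.
communities = {}
for idx, c in enumerate(cliques):
    communities.setdefault(find(idx), set()).update(c)
```

The bridge edge (2, 3) belongs to no triangle, so a rolling triangle cannot cross it: the communities {0, 1, 2, 6} and {3, 4, 5} stay separate, while vertices can in general be shared between communities.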


Community Structure in Graphs, Figure 9 Clique Percolation Method. The example shows communities spanned by adjacent 3-cliques (triangles). Overlapping vertices are shown by the bigger dots. Reprinted figure with permission from [13]

One could say that a k-clique community is identified by making a k-clique “roll” over adjacent k-cliques, where rolling means rotating a k-clique about the k − 1 vertices it shares with any adjacent k-clique. By construction, k-clique communities can share vertices, so they can be overlapping. There may be vertices belonging to non-adjacent k-cliques, which could be reached by different paths and end up in different clusters. In order to find k-clique communities, one searches first for maximal cliques, a task that is known to require a running time that grows exponentially with the size of the graph. However, the authors found that, for the real networks they analyzed, the procedure is quite fast, allowing one to analyze graphs with up to 10^5 vertices in a reasonably short time. The actual scalability of the algorithm depends on many factors, and cannot be expressed in closed form. The algorithm has been extended to the analysis of weighted [81] and directed [82] graphs. It was recently used to study the evolution of community structure in social networks [83]. A special software package, called CFinder, based on the CPM, has been designed by Palla and coworkers and is freely available. The CPM has the same limitation as the algorithm of Radicchi et al.: It assumes that the graph has a large number of cliques, so it may fail to give meaningful partitions for graphs with just a few cliques, like technological networks.

Other Techniques

This section describes some algorithms that do not fit in the previous categories, although some overlap is possible.

Markov Cluster Algorithm (MCL). This method, invented by van Dongen [84], simulates a peculiar process of flow diffusion in a graph. One starts from the stochastic matrix of the graph, which is obtained from the adjacency matrix by dividing each element A_ij by the degree of i. The element S_ij of the stochastic matrix gives the probability that a random walker, sitting at vertex i, moves to j. The sum of the elements of each row of S is one. Each iteration of the algorithm consists of two steps. In the first step, called expansion, the stochastic matrix of the graph is raised to an integer power p (usually p = 2). The entry M_ij of the resulting matrix gives the probability that a random walker, starting from vertex i, reaches j in p steps (diffusion flow). The second step, which has no physical counterpart, consists in raising each single entry of the matrix M to some power α, where α is now real-valued. This operation, called inflation, enhances the weights between pairs of vertices with large values of the diffusion flow, which are likely to be in the same community. Next, the elements of each row are divided by their sum, such that the sum of the elements of the row equals one and a new stochastic matrix is recovered. After some iterations, the process delivers a stable matrix, with some remarkable properties. Its elements are either zero or one, so it is a sort of adjacency matrix. Most importantly, the graph described by the matrix is disconnected, and its connected components are the communities of the original graph. The method is really simple to implement, which is the main reason for its success: as of now, the MCL is one of the most used clustering algorithms in bioinformatics. Due to the matrix multiplication of the expansion step, the algorithm should scale as O(n^3), even if the graph is sparse, as the running matrix quickly becomes dense after a few steps of the algorithm.
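The expansion-inflation loop can be sketched with numpy. The toy graph, the fixed number of iterations and the added self-loops (a common practice in MCL implementations to stabilize the iteration) are assumptions for illustration; real implementations iterate until the matrix stops changing and prune small entries.

```python
import numpy as np

# Toy graph: two triangles joined by the edge (2, 3), with self-loops.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n, p, alpha = 6, 2, 2.0
A = np.eye(n)                            # self-loops (assumed practice)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

M = A / A.sum(axis=1, keepdims=True)     # row-stochastic matrix
for _ in range(30):
    M = np.linalg.matrix_power(M, p)     # expansion: p-step diffusion
    M = M ** alpha                       # inflation: sharpen strong flows
    M /= M.sum(axis=1, keepdims=True)    # renormalize rows

# In the limit, the mass of each row concentrates on the attractor of
# that vertex's community, so the nonzero pattern reveals the clusters.
```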
However, while computing the matrix multiplication, MCL keeps only a maximum number k of non-zero elements per column, where k is usually much smaller than n. So, the actual worst-case running time of the algorithm is O(nk^2) on a sparse graph. A problem of the method is the fact that the final partition is sensitive to the parameter α used in the inflation step. Therefore several partitions can be obtained, and it is not clear which are the most meaningful or representative. Maximum likelihood. Newman and Leicht have recently proposed an algorithm based on traditional tools and techniques of statistical inference [85]. The method consists in deducing the group structure of the graph by checking which possible partition best “fits” the graph topology. The goodness of the fit is measured by the likelihood that the observed graph structure was generated by the particular set of relationships between vertices that defines a partition. The latter is described by two sets of model
parameters, expressing the size of the clusters and the connection preferences among the vertices, i. e. the probabilities that vertices of one cluster are linked to any vertex. The partition corresponding to the maximum likelihood is obtained by iterating a set of coupled equations for the variables, starting from a suitable set of initial conditions. Convergence is fast, so the algorithm can be applied to fairly large graphs, with up to about 10^6 vertices. A nice feature of the method is that it discovers more general types of vertex classes than communities. For instance, multipartite structure can be uncovered, or mixed patterns where multipartite subgraphs coexist with communities, etc. In this respect, it is more powerful than most methods of community detection, which are bound to focus only on proper communities, i. e. subgraphs with more internal than external edges. In addition, since partitions are defined by assigning probability values to the vertices, expressing the extent of their membership in a group, it is possible that some vertices are not clearly assigned to a single group but to several, so the method is able to deal with overlapping communities. The main drawback of the algorithm is the fact that one needs to specify the number of groups at the beginning of the calculation, a number that is often unknown for real networks. It is possible to derive this information self-consistently by maximizing the probability that the data are reproduced by partitions with a given number of clusters. But this procedure involves some degree of approximation, and the results are often not good. L-shell method. This is an agglomerative method designed by Bagrow and Bollt [86]. The algorithm finds the community of any vertex, although the authors also presented a more general procedure to identify the full community structure of the graph.
Communities are defined locally, based on a simple criterion involving the number of edges inside and outside a group of vertices. One starts from a vertex-origin and keeps adding vertices lying on successive shells, where a shell is defined as the set of vertices at a fixed geodesic distance from the origin. The first shell includes the nearest neighbors of the origin, the second the next-to-nearest neighbors, and so on. At each iteration, one calculates the number of edges connecting vertices of the new layer to vertices inside and outside the running cluster. If the ratio of these two numbers (“emerging degree”) exceeds some predefined threshold, the vertices of the new shell are added to the cluster; otherwise the process stops. Because of the local nature of the process, the algorithm is very fast and can identify communities very quickly. By repeating the process starting from every vertex, one can derive a membership matrix M: the element M_ij is one if vertex j belongs to the community of vertex i, otherwise it is zero. The membership matrix can be rewritten by suitably permuting rows and columns based on their mutual distances. The distance between two rows (or columns) is defined as the number of entries whose elements differ. If the graph has a clear community structure, the membership matrix takes a block-diagonal form, where the blocks identify the communities. Unfortunately, the rearrangement of the matrix requires a time O(n^3), so it is quite slow. In a different algorithm, local communities are discovered through greedy maximization of a local modularity measure [87].
Algorithm of Sales–Pardo et al. This is an algorithm designed to detect hierarchical community structure (see Sect. “Hierarchies”), a realistic feature of many natural, social and technological networks, that most algorithms usually neglect. The authors [89] introduce first a similarity measure between pairs of vertices based on Newman– Girvan modularity: basically the similarity between two vertices is the frequency with which they coexist in the same community in partitions corresponding to local optima of modularity. The latter are configurations for which modularity is stable, i. e. it cannot increase if one shifts one vertex from one cluster to another or by merging or splitting clusters. Next, the similarity matrix is put in blockdiagonal form, by minimizing a cost function expressing the average distance of connected vertices from the diag-

Community Structure in Graphs

onal. The blocks correspond to the communities and the recovered partition represents the largest scale organization level. To determine levels at lower scales, one iterates the procedure for each subgraph identified at the previous level, which is considered as an independent graph. The method yields then a hierarchy by construction, as communities at each level are nested within communities at higher levels. The algorithm is not fast, as both the search of local optima for modularity and the rearrangement of the similarity matrix are performed with simulated annealing, but delivers good results for computer generated networks, and meaningful partitions for some social, technological and biological networks. Algorithm by Rosvall and Bergstrom. The modular structure can be considered as a reduced description of a graph to approximate the whole information contained in its adjacency matrix. Based on this idea, Rosvall and Bergstrom [90] envisioned a communication process in which a partition of a network in communities represents a synthesis Y of the full structure that a signaler sends to a receiver, who tries to infer the original graph topology X from it. The best partition corresponds to the signal Y that contains the most information about X. This can be quantitatively assessed by the maximization of the mutual information I(X; Y) [91]. The method is better than modularity optimization, especially when communities are of different size. The optimization of the mutual information is performed by simulated annealing, so the method is rather slow and can be applied to graphs with up to about 104 vertices. Testing Methods When a community detection algorithm is designed, it is necessary to test its performance, and compare it with other methods. Ideally, one would like to have graphs with known community structure and check whether the algorithm is able to find it, or how closely can come to it. In any case, one needs to compare partitions found by the

Community Structure in Graphs, Figure 10 Benchmark of Girvan and Newman. The three pictures correspond to z_in = 15 (a), z_in = 11 (b) and z_in = 8 (c). In (c) the four groups are basically invisible. Reprinted figure with permission from [12]

method with “real” partitions. How can different partitions of the same graph be compared? Danon et al. [92] used a measure borrowed from information theory, the normalized mutual information. One builds a confusion matrix N, whose element N_ij is the number of vertices of the real community i that are also in the detected community j. Since the partitions to be compared may have different numbers of clusters, N is usually not a square matrix. The similarity of two partitions A and B is given by the following expression:

I(A, B) = \frac{-2 \sum_{i=1}^{c_A} \sum_{j=1}^{c_B} N_{ij} \log\left( N_{ij} N / N_{i\cdot} N_{\cdot j} \right)}{\sum_{i=1}^{c_A} N_{i\cdot} \log\left( N_{i\cdot} / N \right) + \sum_{j=1}^{c_B} N_{\cdot j} \log\left( N_{\cdot j} / N \right)} ,   (8)

where c_A (c_B) is the number of communities in partition A (B), N_{i·} is the sum of the elements of N on row i and N_{·j} is the sum of the elements of N on column j. Another useful measure of similarity between partitions is the Jaccard index, which is regularly used in scientometric research. Given two partitions A and B, the Jaccard index is defined as

I_J(A, B) = \frac{n_{11}}{n_{11} + n_{01} + n_{10}} ,   (9)

where n_11 is the number of pairs of vertices which are in the same community in both partitions and n_01 (n_10) denotes the number of pairs of elements which are put in the same community in A (B) and in different communities in B (A). A nice presentation of criteria to compare partitions can be found in [93]. In the literature on community detection, algorithms have generally been tested on two types of graphs: computer-generated graphs and real networks. The most famous computer-generated benchmark is a class of graphs designed by Girvan and Newman [14]. Each graph consists of 128 vertices, arranged in four groups of 32 vertices each: 1–32, 33–64, 65–96 and 97–128. The average degree of each vertex is set to 16. The density of edges inside the groups is tuned by a parameter z_in, expressing the average number of edges shared by each vertex of a group with the other members (internal degree). Naturally, when z_in is close to 16, there is a clear community structure (see Fig. 10a), as most edges join vertices of the same community, whereas when z_in ≤ 8 there are more edges connecting vertices of different communities and the graph looks fuzzy (see Fig. 10c). In this way, one can realize different degrees of mixing between the groups. In this case the test consists in calculating the similarity between the partitions determined by the method under study
and the natural partition of the graph in the four equal-sized groups. The similarity can be calculated by using the measure of Eq. (8), but in the literature a different quantity has typically been used, i. e. the fraction of correctly classified vertices. A vertex is correctly classified if it is in the same cluster with at least 16 of its “natural” partners. If the model partition has clusters given by the merging of two or more natural groups, all vertices of the cluster are considered incorrectly classified. The number of correctly classified vertices is then divided by the total size of the graph, to yield a number between 0 and 1. One usually builds many realizations of the graph for a particular value of z_in and computes the average fraction of correctly classified vertices, which is a measure of the sensitivity of the method. The procedure is then iterated for different values of z_in. Many different algorithms have been compared with each other according to the diagram where the fraction of correctly classified vertices is plotted against z_in. Most algorithms usually do a good job for large z_in and start to fail when z_in approaches 8. The recipe for labeling vertices as correctly or incorrectly classified is somewhat arbitrary, though, and measures like those of Eqs. (8) and (9) are probably more objective. There is also a subtle problem concerning the reliability of the test. Because of the randomness involved in the process of distributing edges among the vertices, it may well be that, in specific realizations of the graph, some vertices share more edges with members of another group than with members of their own. In this case, it is inappropriate to consider the initial partition in four groups as the real partition of the graph. Tests on real networks usually focus on a limited number of examples, for which one has precise information about the vertices and their properties.
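Both comparison measures are straightforward to compute from two label lists; the sketch below implements Eq. (8) and Eq. (9) directly, with illustrative partitions that do not come from any of the cited studies.

```python
import math
from itertools import combinations

def nmi(part_a, part_b):
    """Normalized mutual information of two partitions, Eq. (8).
    part_a, part_b: community labels, one entry per vertex."""
    n = len(part_a)
    labels_a, labels_b = set(part_a), set(part_b)
    conf = {(a, b): 0 for a in labels_a for b in labels_b}  # confusion matrix
    for la, lb in zip(part_a, part_b):
        conf[(la, lb)] += 1
    row = {a: sum(conf[(a, b)] for b in labels_b) for a in labels_a}
    col = {b: sum(conf[(a, b)] for a in labels_a) for b in labels_b}
    num = -2.0 * sum(conf[(a, b)] * math.log(conf[(a, b)] * n / (row[a] * col[b]))
                     for a in labels_a for b in labels_b if conf[(a, b)] > 0)
    den = (sum(row[a] * math.log(row[a] / n) for a in labels_a)
           + sum(col[b] * math.log(col[b] / n) for b in labels_b))
    return num / den

def jaccard(part_a, part_b):
    """Jaccard index of two partitions, Eq. (9)."""
    n11 = n01 = n10 = 0
    for i, j in combinations(range(len(part_a)), 2):
        same_a, same_b = part_a[i] == part_a[j], part_b[i] == part_b[j]
        n11 += same_a and same_b
        n10 += same_a and not same_b
        n01 += same_b and not same_a
    return n11 / (n11 + n01 + n10)

real = [0, 0, 0, 1, 1, 1]      # illustrative "real" partition
found = [0, 0, 1, 1, 2, 2]     # illustrative detected partition
```

Identical partitions score 1 under both measures, and both scores decrease as the detected partition departs from the real one.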
The most popular real network with a known community structure is the social network of Zachary’s karate club (see Fig. 11). This is a social network representing the personal relationships between members of a karate club at an American university. Over the course of two years, the sociologist Wayne Zachary observed the ties between members, both inside and outside the club [94]. At some point, a conflict arose between the club’s administrator (vertex 1) and one of the teachers (vertex 33), which led to the split of the club into two smaller clubs, with some members staying with the administrator and the others following the instructor. Vertices of the two groups are highlighted by squares and circles in Fig. 11. The question is whether the actual social split could be predicted from the network topology. Several algorithms are actually able to identify the two classes, apart from a few intermediate vertices, which may be misclassified (e. g. vertices 3, 10). Other methods are less successful: for instance, the maximum of Newman–Girvan
modularity corresponds to a split of the network in four groups [53,63]. It is fundamental, however, to stress that the comparison of community structures detected by the various methods with the split of Zachary’s karate club rests on a very strong assumption: that the split actually reproduced the separation of the social network in two communities. There is no real argument, beyond common wisdom, supporting this assumption. Two other networks have frequently been used to test community detection algorithms: the network of American college football teams derived by Girvan and Newman [14] and the social network of bottlenose dolphins constructed by Lusseau [95]. Also for these networks the caveat applies: Nothing guarantees that “reasonable” communities, defined on the basis of non-topological information, must coincide with those detected by methods based only on topology.

Community Structure in Graphs, Figure 11 Zachary’s karate club network, an example of a graph with known community structure. Reprinted figure with permission from [26]

The Mesoscopic Description of a Graph

Community detection algorithms have been applied to a huge variety of real systems, including social, biological and technological networks. The partitions found for each system are usually similar, as the algorithms, in spite of their specific implementations, are all inspired by close intuitive notions of community. What are the general properties of these partitions? The analysis of partitions and their properties delivers a mesoscopic description of the graph, where the communities, and not the vertices, are the elementary units of the topology. The term mesoscopic is used because the relevant scale here lies between the scale of the vertices and that of the full graph. A simple question is whether the communities of a graph are usually of about the same size or whether the community sizes have some special distribution. It turns out that the distribution of community sizes is skewed, with a tail that obeys a power law with exponents in the range between 1 and 3 [13,22,23,47]. So, there seems to be no characteristic size for a community: small communities usually coexist with large ones. As an example, Fig. 12 shows the cumulative distribution of community sizes for a recommendation network of the online vendor Amazon.com. Vertices are products, and there is a connection between items A and B if B was frequently purchased by buyers of A.

Community Structure in Graphs, Figure 12 Cumulative distribution of community sizes for the Amazon purchasing network. The partition is derived by greedy modularity optimization. Reprinted figure with permission from [47]

We remind the reader that the cumulative distribution is the integral of the probability distribution: if the cumulative distribution is a power law with exponent α, the probability distribution is also a power law, with exponent α + 1. If communities overlap, one can derive a network where the communities are the vertices and pairs of vertices are connected if their corresponding communities overlap [13]. Such networks seem to have some special properties. For instance, the degree distribution is a particular function, with an initial exponential decay followed by a slower power-law decay. A recent analysis has shown that such a distribution can be reproduced by assuming that the graph grows according to a simple preferential attachment mechanism, where communities with large degree have an enhanced chance to interact/overlap with new communities [21]. Finally, by knowing the community structure of a graph, it is possible to classify vertices according to their roles within their community, which may allow one to infer individual properties of the vertices. A nice classification has been proposed by Guimerá and Amaral [12,96]. The role of a vertex depends on the values of two indices, the z-score and the participation ratio, which determine the position of the vertex within its own module and with respect to the other modules. The z-score compares the internal degree of the vertex in its module with the average internal degree of the vertices in the module. The participation ratio describes how the edges of the vertex are distributed among the modules. Based on these two indices, Guimerá and Amaral distinguish seven roles for a vertex. These roles seem to be correlated to functions of vertices: in metabolic networks, for instance, vertices sharing many edges with vertices of other modules (“connectors”) are often metabolites which are more conserved across species than other metabolites, i. e. they have an evolutionary advantage [12].

Future Directions

The problem of community detection is truly interdisciplinary. It involves scientists of different disciplines both in the design of algorithms and in their applications. The past years have witnessed huge progresses and novelties in this topic. Many methods have been developed, based on various principles. Their scalability has improved by at least one power in the graph size in just a couple of years. Currently partitions in graphs with up to millions of vertices can be found. From this point of view, the limit is close, and future improvements in this sense are unlikely. Algorithms running in linear time are very quick, but their results are often not very good. The major breakthrough introduced by the new methods is the possibility of extracting graph partitions with no preliminary knowledge or inputs about the community structure of the graph. Most new algorithms do not need to know how many communities there are, a major drawback of computer science approaches: they derive this information from the graph topology itself. Similarly, algorithms of new generation are able to select one or a few meaningful partitions, whereas social science approaches usually produce a whole hierarchy of partitions, which they are unable to discriminate. Especially in the last two years, the quality of the output produced by some algorithms has considerably improved. Realistic aspects of community structure, like overlapping and hierarchical communities, are now often taken into account. The main question is: is there at present a good method to detect communities in graphs? The answer depends on what is meant by “good”. Several algorithms give satisfactory results when they are tested as described in

Community Structure in Graphs

Sect. “Testing Methods”: in this respect, they can be considered good. However, if examined in more detail, some methods disclose serious limits and biases. For instance, the most popular method used nowadays, modularity optimization, is likely to give problems in the analysis of large graphs. Most algorithms are likely to fail in some limit; still, one can derive useful indications from them: from the comparison of partitions delivered by different methods one could extract the cores of real communities. The ideal method is one that delivers meaningful partitions and handles overlapping communities and hierarchy, possibly in a short time. No such method exists yet. Finding a good method for community detection is a crucial endeavor in biology, sociology and computer science. In particular, biologists often rely on the application of clustering techniques to classify their data. Due to the bioinformatics revolution, gene regulatory networks, protein–protein interaction networks, metabolic networks, etc., are now much better known than they used to be in the past, and are finally susceptible to solid quantitative investigations. Uncovering their modular structure is an open challenge and a necessary step to discover properties of elementary biological constituents and to understand how biological systems work.

Bibliography

Primary Literature
1. Euler L (1736) Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Petropolitanae 8:128–140
2. Bollobás B (1998) Modern Graph Theory. Springer, New York
3. Wasserman S, Faust K (1994) Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge
4. Scott JP (2000) Social Network Analysis. Sage Publications Ltd, London
5. Barabási AL, Albert R (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–97
6. Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks: from biological nets to the Internet and WWW. Oxford University Press, Oxford
7. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167–256
8. Pastor-Satorras R, Vespignani A (2004) Evolution and structure of the Internet: A statistical physics approach. Cambridge University Press, Cambridge
9. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU (2006) Complex Networks: Structure and Dynamics. Phys Rep 424:175–308
10. Erdös P, Rényi A (1959) On Random Graphs. Publicationes Mathematicae Debrecen 6:290–297
11. Flake GW, Lawrence S, Lee Giles C, Coetzee FM (2002) Self-Organization and Identification of Web Communities. IEEE Comput 35(3):66–71
12. Guimerà R, Amaral LAN (2005) Functional cartography of complex metabolic networks. Nature 433:895–900

13. Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
14. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Nat Acad Sci USA 99(12):7821–7826
15. Lusseau D, Newman MEJ (2004) Identifying the role that animals play in their social networks. Proc R Soc Lond B 271:S477–S481
16. Pimm SL (1979) The structure of food webs. Theor Popul Biol 16:144–158
17. Krause AE, Frank KA, Mason DM, Ulanowicz RE, Taylor WW (2003) Compartments exposed in food-web structure. Nature 426:282–285
18. Granovetter M (1973) The Strength of Weak Ties. Am J Sociol 78:1360–1380
19. Burt RS (1976) Positions in Networks. Soc Force 55(1):93–122
20. Freeman LC (1977) A Set of Measures of Centrality Based on Betweenness. Sociometry 40(1):35–41
21. Pollner P, Palla G, Vicsek T (2006) Preferential attachment of communities: The same principle, but a higher level. Europhys Lett 73(3):478–484
22. Newman MEJ (2004) Detecting community structure in networks. Eur Phys J B 38:321–330
23. Danon L, Duch J, Arenas A, Díaz-Guilera A (2007) Community structure identification. In: Caldarelli G, Vespignani A (eds) Large Scale Structure and Dynamics of Complex Networks: From Information Technology to Finance and Natural Science. World Scientific, Singapore, pp 93–114
24. Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph. Commun ACM 16:575–577
25. Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D (2004) Defining and identifying communities in networks. Proc Nat Acad Sci USA 101(9):2658–2663
26. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113
27. Arenas A, Fernández A, Fortunato S, Gómez S (2007) Motif-based communities in complex networks. arXiv:0710.0059
28. Reichardt J, Bornholdt S (2006) Statistical mechanics of community detection. Phys Rev E 74:016110
29. Massen CP, Doye JPK (2006) Thermodynamics of community structure. arXiv:cond-mat/0610077
30. Arenas A, Duch J, Fernández A, Gómez S (2007) Size reduction of complex networks preserving modularity. New J Phys 9(6):176–180
31. Guimerà R, Sales-Pardo M, Amaral LAN (2004) Modularity from fluctuations in random graphs and complex networks. Phys Rev E 70:025101(R)
32. Fortunato S, Barthélemy M (2007) Resolution limit in community detection. Proc Nat Acad Sci USA 104(1):36–41
33. Gfeller D, Chappelier J-C, De Los Rios P (2005) Finding instabilities in the community structure of complex networks. Phys Rev E 72:056135
34. Pothen A (1997) Graph partitioning algorithms with applications to scientific computing. In: Keyes DE, Sameh A, Venkatakrishnan V (eds) Parallel Numerical Algorithms. Kluwer Academic Press, Boston, pp 323–368
35. Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49:291–307
36. Golub GH, Van Loan CF (1989) Matrix computations. Johns Hopkins University Press, Baltimore


37. MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley. University of California Press, pp 281–297
38. Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177
39. Newman MEJ (2005) A measure of betweenness centrality based on random walks. Soc Netw 27:39–54
40. Tyler JR, Wilkinson DM, Huberman BA (2003) Email as spectroscopy: automated discovery of community structure within organizations. In: Huysman M, Wenger E, Wulf V (eds) Proceedings of the First International Conference on Communities and Technologies. Kluwer Academic Press, Amsterdam
41. Wilkinson DM, Huberman BA (2004) A method for finding communities of related genes. Proc Nat Acad Sci USA 101(1):5241–5248
42. Latora V, Marchiori M (2001) Efficient behavior of small-world networks. Phys Rev Lett 87:198701
43. Fortunato S, Latora V, Marchiori M (2004) A method to find community structures based on information centrality. Phys Rev E 70:056104
44. Watts D, Strogatz SH (1998) Collective dynamics of “small-world” networks. Nature 393:440–442
45. Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D (2007) On finding graph clusterings with maximum modularity. In: Proceedings of the 33rd International Workshop on Graph-Theoretical Concepts in Computer Science (WG’07). Springer, Berlin
46. Newman MEJ (2004) Fast algorithm for detecting community structure in networks. Phys Rev E 69:066133
47. Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111
48. Danon L, Díaz-Guilera A, Arenas A (2006) The effect of size heterogeneity on community identification in complex networks. J Stat Mech Theory Exp 11:P11010
49. Pujol JM, Béjar J, Delgado J (2006) Clustering algorithm for determining community structure in large networks. Phys Rev E 74:016107
50. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
51. Massen CP, Doye JPK (2005) Identifying communities within energy landscapes. Phys Rev E 71:046101
52. Boettcher S, Percus AG (2001) Optimization with extremal dynamics. Phys Rev Lett 86:5211–5214
53. Duch J, Arenas A (2005) Community detection in complex networks using extremal optimization. Phys Rev E 72:027104
54. Newman MEJ (2006) Modularity and community structure in networks. Proc Nat Acad Sci USA 103(23):8577–8582
55. Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74:036104
56. Reichardt J, Bornholdt S (2007) Partitioning and modularity of graphs with arbitrary degree distribution. Phys Rev E 76:015102(R)
57. Reichardt J, Bornholdt S (2006) When are networks truly modular? Phys D 224:20–26
58. Kumpula JM, Saramäki J, Kaski K, Kertész J (2007) Limited resolution in complex network community detection with Potts model approach. Eur Phys J B 56:41–45
59. Arenas A, Fernández A, Gómez S (2007) Multiple resolution of the modular structure of complex networks. arXiv:physics/0703218

60. Ruan J, Zhang W (2007) Identifying network communities with high resolution. arXiv:0704.3759
61. Kumpula JM, Saramäki J, Kaski K, Kertész J (2007) Limited resolution and multiresolution methods in complex network community detection. In: Kertész J, Bornholdt S, Mantegna RN (eds) Noise and Stochastics in Complex Systems and Finance. Proc SPIE 6601:660116
62. Muff S, Rao F, Caflisch A (2005) Local modularity measure for network clusterizations. Phys Rev E 72:056107
63. Donetti L, Muñoz MA (2004) Detecting network communities: a new systematic and efficient algorithm. J Stat Mech Theory Exp P10012
64. Capocci A, Servedio VDP, Caldarelli G, Colaiori F (2004) Detecting communities in large networks. Phys A 352(2–4):669–676
65. Wu F, Huberman BA (2004) Finding communities in linear time: a physics approach. Eur Phys J B 38:331–338
66. Eriksen KA, Simonsen I, Maslov S, Sneppen K (2003) Modularity and extreme edges of the Internet. Phys Rev Lett 90(14):148701
67. Simonsen I, Eriksen KA, Maslov S, Sneppen K (2004) Diffusion on complex networks: a way to probe their large-scale topological structure. Physica A 336:163–173
68. Wu FY (1982) The Potts model. Rev Mod Phys 54:235–268
69. Blatt M, Wiseman S, Domany E (1996) Superparamagnetic clustering of data. Phys Rev Lett 76(18):3251–3254
70. Reichardt J, Bornholdt S (2004) Detecting fuzzy community structure in complex networks. Phys Rev Lett 93(21):218701
71. Mezard M, Parisi G, Virasoro M (1987) Spin glass theory and beyond. World Scientific Publishing Company, Singapore
72. Zhou H (2003) Network landscape from a Brownian particle’s perspective. Phys Rev E 67:041908
73. Zhou H (2003) Distance, dissimilarity index, and network community structure. Phys Rev E 67:061901
74. Zhou H, Lipowsky R (2004) Network Brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities. Lect Notes Comput Sci 3038:1062–1069
75. Latapy M, Pons P (2005) Computing communities in large networks using random walks. Lect Notes Comput Sci 3733:284–293
76. Arenas A, Díaz-Guilera A, Pérez-Vicente CJ (2006) Synchronization reveals topological scales in complex networks. Phys Rev Lett 96:114102
77. Kuramoto Y (1984) Chemical Oscillations, Waves and Turbulence. Springer, Berlin
78. Arenas A, Díaz-Guilera A (2007) Synchronization and modularity in complex networks. Eur Phys J ST 143:19–25
79. Boccaletti S, Ivanchenko M, Latora V, Pluchino A, Rapisarda A (2007) Detecting complex network modularity by dynamical clustering. Phys Rev E 76:045102(R)
80. Pluchino A, Latora V, Rapisarda A (2005) Changing opinions in a changing world: a new perspective in sociophysics. Int J Mod Phys C 16(4):505–522
81. Farkas I, Ábel D, Palla G, Vicsek T (2007) Weighted network modules. New J Phys 9:180
82. Palla G, Farkas IJ, Pollner P, Derényi I, Vicsek T (2007) Directed network modules. New J Phys 9:186
83. Palla G, Barabási A-L, Vicsek T (2007) Quantifying social groups evolution. Nature 446:664–667
84. van Dongen S (2000) Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, The Netherlands


85. Newman MEJ, Leicht E (2007) Mixture models and exploratory analysis in networks. Proc Nat Acad Sci USA 104(23):9564–9569
86. Bagrow JP, Bollt EM (2005) Local method for detecting communities. Phys Rev E 72:046108
87. Clauset A (2005) Finding local community structure in networks. Phys Rev E 72:026132
88. Eckmann J-P, Moses E (2002) Curvature of co-links uncovers hidden thematic layers in the World Wide Web. Proc Nat Acad Sci USA 99(9):5825–5829
89. Sales-Pardo M, Guimerà R, Amaral LAN (2007) Extracting the hierarchical organization of complex systems. arXiv:0705.1679
90. Rosvall M, Bergstrom CT (2007) An information-theoretic framework for resolving community structure in complex networks. Proc Nat Acad Sci USA 104(18):7327–7331
91. Shannon CE, Weaver W (1949) The Mathematical Theory of Communication. University of Illinois Press, Champaign
92. Danon L, Díaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp P09008
93. Gustafsson M, Hörnquist M, Lombardi A (2006) Comparison and validation of community structures in complex networks. Physica A 367:559–576
94. Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthr Res 33:452–473
95. Lusseau D (2003) The emergent properties of a dolphin social network. Proc R Soc Lond B 270(2):S186–S188
96. Guimerà R, Amaral LAN (2005) Cartography of complex networks: modules and universal roles. J Stat Mech Theory Exp P02001

Books and Reviews
Bollobás B (2001) Random Graphs. Cambridge University Press, Cambridge
Chung FRK (1997) Spectral Graph Theory. CBMS Regional Conference Series in Mathematics 92. American Mathematical Society, Providence
Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford
Elsner U (1997) Graph Partitioning: a Survey. Technical Report 9727, Technische Universität Chemnitz, Chemnitz

Comparison of Discrete and Continuous Wavelet Transforms

PALLE E. T. JORGENSEN¹, MYUNG-SIN SONG²
¹ Department of Mathematics, The University of Iowa, Iowa City, USA
² Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, USA

Article Outline
Glossary
Definition of the Subject
Introduction
The Discrete vs. Continuous Wavelet Algorithms
List of Names and Discoveries
History
Tools from Mathematics
A Transfer Operator
Future Directions
Literature
Acknowledgments
Bibliography

This glossary consists of a list of terms used inside the paper in mathematics, in probability, in engineering, and, on occasion, in physics. To clarify the seemingly confusing use of up to four different names for the same idea or concept, we have further added informal explanations spelling out the reasons behind the differences in current terminology from neighboring fields.
DISCLAIMER: This glossary has the structure of four areas. A number of terms are listed line by line, and each line is followed by an explanation. Some “terms” have up to four separate (yet commonly accepted) names.

Glossary

MATHEMATICS: function (measurable), PROBABILITY: random variable, ENGINEERING: signal, PHYSICS: state
Mathematically, functions may map between any two sets, say, from X to Y; but if X is a probability space (typically called Ω), it comes with a σ-algebra B of measurable sets and a probability measure P. Elements E in B are called events, and P(E) the probability of E. Corresponding measurable functions with values in a vector space are called random variables, a terminology which suggests a stochastic viewpoint. The function values of a random variable may represent the outcomes of an experiment, for example the throwing of a die. Yet function theory is widely used in engineering, where functions are typically thought of as signals. In this case, X may be the real line (for time), or R^d. Engineers visualize functions as signals. A particular signal may have a stochastic component, and this feature simply introduces an extra stochastic variable into the “signal”, for example noise. Turning to physics: in our present application, the physical functions will typically be in some L^2-space, and L^2-functions with unit norm represent quantum mechanical “states”.

MATHEMATICS: sequence (incl. vector-valued), PROBABILITY: random walk, ENGINEERING: time series, PHYSICS: measurement
Mathematically, a sequence is a function defined on the integers Z or on subsets of Z, for example the natural numbers N. Hence, if time is discrete, this to the engineer represents a time series, such as a speech signal, or any measurement which depends on time. But we will also allow functions on lattices such as Z^d. In the case d = 2, we may be considering the grayscale numbers which represent exposure in a digital camera. In this case, the function (grayscale) is defined on a subset of Z^2, and is then simply a matrix. A random walk on Z^d is an assignment of a sequential and random motion as a function of time. The randomness presupposes assigned probabilities. But we will use the term “random walk” also in connection with random walks on combinatorial trees.

MATHEMATICS: nested subspaces, PROBABILITY: refinement, ENGINEERING: multiresolution, PHYSICS: scales of visual resolution
While finite or infinite families of nested subspaces are ubiquitous in mathematics, and have been popular in Hilbert space theory for generations (at least since the 1930s), this idea was revived in a different guise in 1986 by Stéphane Mallat, then an engineering graduate student. In its adaptation to wavelets, the idea is now referred to as the multiresolution method.
What made the idea especially popular in the wavelet community was that it offered a skeleton on which various discrete algorithms in applied mathematics could be attached and turned into wavelet constructions in harmonic analysis. In fact, what we now call multiresolutions have come to signify a crucial link between the world of discrete wavelet algorithms, which are popular in computational mathematics and in engineering (signal/image processing, data mining, etc.), on the one side, and continuous wavelet bases in function spaces, especially in L^2(R^d), on the other. Further, the multiresolution idea closely mimics how fractals are analyzed with the use of finite function systems. But in mathematics, or more precisely in operator theory, the underlying idea dates back to work of John von Neumann, Norbert Wiener, and Herman Wold, where nested and closed subspaces in Hilbert space were used extensively in an axiomatic approach to stationary processes, especially for time series. Wold proved that any (stationary) time series can be decomposed into two different parts: the first (deterministic) part can be exactly described by a linear combination of its own past, while the second part is the opposite extreme; it is unitary, in the language of von Neumann. Von Neumann’s version of the same theorem is a pillar in operator theory. It states that every isometry in a Hilbert space H is the unique sum of a shift isometry and a unitary operator, i. e., the initial Hilbert space H splits canonically as an orthogonal sum of two subspaces H_s and H_u in H, one of which carries the shift operator, and the other, H_u, the unitary part. The shift isometry is defined from a nested scale of closed spaces V_n, such that the intersection of these spaces is H_u. Specifically,

$$\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_n \subset V_{n+1} \subset \cdots,$$
$$\bigwedge_n V_n = H_u, \qquad \text{and} \qquad \bigvee_n V_n = H.$$

However, Stéphane Mallat was motivated instead by the notion of scales of resolutions in the sense of optics. This, in turn, is based on a certain “artificial-intelligence” approach to vision and optics, developed earlier by David Marr at MIT, an approach which imitates the mechanism of vision in the human eye. The connection from these developments in the 1980s back to von Neumann is this: each of the closed subspaces V_n corresponds to a level of resolution in such a way that a larger subspace represents a finer resolution. Resolutions are relative, not absolute! In this view, the relative complement of the smaller (or coarser) subspace in the larger space then represents the visual detail which is added in passing from a blurred image to a finer one, i. e., to a finer visual resolution. This view became an instant hit in the wavelet community, as it offered a repository for the fundamental father and mother functions, also called the scaling function φ and the wavelet function ψ. Via a system of translation and scaling operators, these functions then generate nested subspaces, and we recover the scaling identities which initialize the appropriate algorithms. What results is now called the family of pyramid algorithms in wavelet analysis. The approach itself is called the multiresolution approach (MRA) to wavelets. And in the meantime various generalizations (GMRAs) have emerged.
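The pyramid algorithm just mentioned can be made concrete with the simplest filter pair, Haar's. The following minimal Python sketch is ours, for illustration only (the function names are our choices): each analysis pass splits the current approximation into a coarser approximation (the φ, or averaging, branch) and a detail band (the ψ, or differencing, branch), and the synthesis pass undoes the splitting exactly.

```python
from math import sqrt

def haar_pyramid(signal, levels):
    """Analysis pass of the Haar pyramid algorithm.

    Each level maps the current approximation s (of even length) to a
    coarser approximation (pairwise averages) and a detail band (pairwise
    differences), both scaled by 1/sqrt(2) so the transform is orthogonal."""
    s = list(signal)
    details = []
    for _ in range(levels):
        approx = [(s[2 * k] + s[2 * k + 1]) / sqrt(2) for k in range(len(s) // 2)]
        detail = [(s[2 * k] - s[2 * k + 1]) / sqrt(2) for k in range(len(s) // 2)]
        details.append(detail)       # details[0] is the finest band
        s = approx
    return s, details

def haar_reconstruct(approx, details):
    """Synthesis pass: merge each detail band back in, coarsest first."""
    s = list(approx)
    for detail in reversed(details):
        merged = []
        for a, d in zip(s, detail):
            merged.append((a + d) / sqrt(2))   # even-indexed samples
            merged.append((a - d) / sqrt(2))   # odd-indexed samples
        s = merged
    return s

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, bands = haar_pyramid(x, levels=3)
# perfect reconstruction: the synthesis pass inverts the analysis pass
assert all(abs(u - v) < 1e-12 for u, v in zip(haar_reconstruct(coarse, bands), x))
```

Because the Haar pair is orthogonal, the transform also preserves energy (the sum of squares), one concrete payoff of the 1/√2 normalization in the scaling identities.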

In all of this, there was a second “accident” at play: as it turned out, pyramid algorithms in wavelet analysis now lend themselves, via multiresolutions, or nested scales of closed subspaces, to an analysis based on frequency bands. Here we refer to bands of frequencies as they have already been used for a long time in signal processing. One reason for the success in varied disciplines of the same geometric idea is perhaps that it is closely modeled on how we historically have represented numbers in the positional number system. Analogies to the Euclidean algorithm seem especially compelling.

MATHEMATICS: operator, PROBABILITY: process, ENGINEERING: black box, PHYSICS: observable (if selfadjoint)
In linear algebra, students are familiar with the distinction between (linear) transformations T (here called “operators”) and matrices. For a fixed operator T : V → W, there is a variety of matrices, one for each choice of basis in V and in W. In many engineering applications, the transformations are not restricted to be linear, but instead represent some experiment (“black box”, in Norbert Wiener’s terminology), one with an input and an output, usually functions of time. The input could be an external voltage function, the black box an electric circuit, and the output the resulting voltage in the circuit. (The output is a solution to a differential equation.) This context is somewhat different from that of quantum mechanical (QM) operators T : V → V, where V is a Hilbert space. In QM, selfadjoint operators represent observables such as position Q and momentum P, or time and energy.

MATHEMATICS: Fourier dual pair, PROBABILITY: generating function, ENGINEERING: time/frequency, PHYSICS: P/Q
The dual pairs position Q/momentum P and time/energy may be computed with the use of Fourier series or Fourier transforms; in this sense they are examples of Fourier dual pairs. If, for example, time is discrete, then frequency may be represented by numbers in the interval [0, 2π); or in [0, 1) if we enter the factor 2π into the Fourier exponential. Functions of the frequency are then periodic, so the two endpoints are identified. In the case of the interval [0, 1), 0 on the left is identified with 1 on the right. So a low frequency band is an interval centered at 0, while a high frequency band is an interval centered at 1/2. Let a function W on [0, 1) represent a probability assignment. Such functions W are thought of as “filters” in signal processing. We say that W is low-pass if it is 1 at 0, or if it is near 1 for frequencies near 0.


Low-pass filters pass signals with low frequencies and block the others. If instead some filter W is 1 at 1/2, or takes values near 1 for frequencies near 1/2, then we say that W is high-pass; it passes signals with high frequency.

MATHEMATICS: convolution, PROBABILITY: —, ENGINEERING: filter, PHYSICS: smearing
Pointwise multiplication of functions of frequencies corresponds, in the Fourier-dual time domain, to the operation of convolution (or of Cauchy product if the time scale is discrete). The process of modifying a signal with a fixed convolution is called a linear filter in signal processing. The corresponding Fourier-dual frequency function is then referred to as the “frequency response” or the “frequency response function”. More generally, in the continuous case, since convolution tends to improve smoothness of functions, physicists call it “smearing”.

MATHEMATICS: decomposition (e. g., Fourier coefficients in a Fourier expansion), PROBABILITY: —, ENGINEERING: analysis, PHYSICS: frequency components
Calculating the Fourier coefficients is “analysis”, and adding up the pure frequencies (i. e., summing the Fourier series) is called synthesis. But this view carries over more generally to engineering, where there are more operations involved on the two sides, e. g., breaking up a signal into its frequency bands, transforming further, and then adding up the “banded” functions in the end. If the signal out is the same as the signal in, we say that the analysis/synthesis yields perfect reconstruction.

MATHEMATICS: integrate (e. g., inverse Fourier transform), PROBABILITY: reconstruct, ENGINEERING: synthesis, PHYSICS: superposition
Here the terms related to “synthesis” refer to the second half of the kind of signal-processing design outlined in the previous paragraph.
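To make the low-pass/high-pass dichotomy concrete, here is a small sketch of ours (not from the paper; the name W and the normalization |H(ν)|²/2 are our choices, picked so that the two bands sum to 1, i. e., so that W is a probability assignment). It checks that the Haar averaging filter is low-pass (W = 1 at frequency 0), that the differencing filter is high-pass (W = 1 at frequency 1/2), and that together the two bands sum to 1:

```python
from math import cos, sin, pi, sqrt

low  = [1 / sqrt(2),  1 / sqrt(2)]   # Haar averaging filter
high = [1 / sqrt(2), -1 / sqrt(2)]   # Haar differencing filter

def W(h, nu):
    """Band weight W(nu) = |H(nu)|^2 / 2, where
    H(nu) = sum_n h[n] * exp(-2*pi*i*nu*n) is the frequency response."""
    re = sum(c * cos(2 * pi * nu * n) for n, c in enumerate(h))
    im = sum(-c * sin(2 * pi * nu * n) for n, c in enumerate(h))
    return (re * re + im * im) / 2

assert abs(W(low, 0.0) - 1.0) < 1e-12    # low-pass: passes frequency 0 ...
assert abs(W(low, 0.5)) < 1e-12          # ... and blocks frequency 1/2
assert abs(W(high, 0.5) - 1.0) < 1e-12   # high-pass: passes frequency 1/2
assert abs(W(low, 0.3) + W(high, 0.3) - 1.0) < 1e-12   # bands sum to 1
```

For the Haar pair these weights work out to W_low(ν) = cos²(πν) and W_high(ν) = sin²(πν), which makes the "probability assignment" reading of W literal: at every frequency the two bands split the unit mass between them.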
MATHEMATICS: subspace, PROBABILITY: —, ENGINEERING: resolution, PHYSICS: (signals in a) frequency band
For a space of functions (signals), the selection of certain frequencies serves as a way of selecting special signals. When the process of scaling is introduced into the optics of a digital camera, we note that a nested family of subspaces corresponds to a grading of visual resolutions.

MATHEMATICS: Cuntz relations, PROBABILITY: —, ENGINEERING: perfect reconstruction from subbands, PHYSICS: subband decomposition

$$\sum_{i=0}^{N-1} S_i S_i^{*} = 1, \qquad \text{and} \qquad S_i^{*} S_j = \delta_{i,j} 1.$$
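A finite-dimensional sketch of these relations (our construction, for illustration; the genuine Cuntz relations live on an infinite-dimensional space, but for periodic signals of even length the same identities hold exactly for the Haar subband operators): writing S_i for "upsample by 2, then filter with h_i", the two relations become plain matrix identities that can be checked numerically.

```python
from math import sqrt

M = 4                                  # coarse length; signals have length 2*M
h = [[1 / sqrt(2),  1 / sqrt(2)],      # h_0: Haar low-pass filter
     [1 / sqrt(2), -1 / sqrt(2)]]      # h_1: Haar high-pass filter

def S(i):
    """(2M x M) matrix of the subband isometry S_i:
    upsample by 2, then filter with h_i (periodic indexing)."""
    mat = [[0.0] * M for _ in range(2 * M)]
    for k in range(M):
        for t, c in enumerate(h[i]):
            mat[(2 * k + t) % (2 * M)][k] = c
    return mat

def matmul(A, B):
    return [[sum(A[r][t] * B[t][c] for t in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def close(A, B):
    return all(abs(a - b) < 1e-12
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def eye(n):
    return [[float(r == c) for c in range(n)] for r in range(n)]

S0, S1 = S(0), S(1)
St0, St1 = transpose(S0), transpose(S1)   # real filters: adjoint = transpose

# S_i* S_j = delta_{i,j} 1  (each band is an isometry; the bands are orthogonal)
assert close(matmul(St0, S0), eye(M))
assert close(matmul(St1, S1), eye(M))
assert close(matmul(St0, S1), [[0.0] * M for _ in range(M)])

# sum_i S_i S_i* = 1  (perfect reconstruction from the two subbands)
P0, P1 = matmul(S0, St0), matmul(S1, St1)
total = [[P0[r][c] + P1[r][c] for c in range(2 * M)] for r in range(2 * M)]
assert close(total, eye(2 * M))
```

The second relation says each subband channel is an isometry and distinct channels are orthogonal; the first says the two channels together lose nothing, which is exactly the engineer's "perfect reconstruction from subbands".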

MATHEMATICS: inner product, PROBABILITY: correlation, ENGINEERING: transition probability, PHYSICS: probability of transition from one state to another
In many applications, a vector space with an inner product captures perfectly the geometric and probabilistic features of the situation. This can be axiomatized in the language of Hilbert space; and the inner product is the most crucial ingredient in the familiar axiom system for Hilbert space.

MATHEMATICS: f_out = T f_in, PROBABILITY: —, ENGINEERING: input/output, PHYSICS: transformation of states
Systems-theory language for operators T : V → W: vectors in V are input, and those in the range of T are output.

MATHEMATICS: fractal, PROBABILITY: —, ENGINEERING: —, PHYSICS: —
Intuitively, think of a fractal as reflecting similarity of scales, such as is seen in fern-like images that look “roughly” the same at small and at large scales. Fractals are produced from an infinite iteration of a finite set of maps, and this algorithm is perfectly suited to the kind of subdivision which is a cornerstone of the discrete wavelet algorithm. Self-similarity could refer alternately to space and to time. And further versatility is added, in that flexibility is allowed into the definition of “similar”.

MATHEMATICS: —, PROBABILITY: —, ENGINEERING: data mining, PHYSICS: —
The problem of how to handle and make use of large volumes of data is a corollary of the digital revolution. As a result, the subject of data mining itself changes rapidly. Digitized information (data) is now easy to capture automatically and to store electronically. In science, commerce, and industry, data represent collected observations and information: in business, there are data on markets, competitors, and customers. In manufacturing, there are data for optimizing production opportunities and for improving processes. A tremendous potential for data mining exists in medicine, genetics, and energy.
But raw data are not always directly usable, as is evident by inspection. A key to advances is our ability to extract information and knowledge from the data (hence “data mining”), and to understand the phenomena governing data sources. Data mining is now taught in a variety of forms in engineering departments, as well as in statistics and computer science departments.
One of the structures often hidden in data sets is some degree of scale. The goal is to detect and identify one or more natural global and local scales in the data. Once this is done, it is often possible to detect associated similarities of scale, much like the familiar scale-similarity from multidimensional wavelets, and from fractals. Indeed, various adaptations of wavelet-like algorithms have been shown to be useful. These algorithms themselves are useful in detecting scale-similarities, and are applicable to other types of pattern recognition. Hence, in this context, generalized multiresolutions offer another tool for discovering structures in large data sets, such as those stored in the resources of the Internet. Because of the sheer volume of data involved, a strictly manual analysis is out of the question. Instead, sophisticated query processors based on statistical and mathematical techniques are used in generating insights and extracting conclusions from data sets.

Multiresolutions
Haar’s work in 1909–1910 implicitly contained the key idea which got wavelet mathematics started on a roll 75 years later with Yves Meyer, Ingrid Daubechies, Stéphane Mallat, and others, namely the idea of a multiresolution. In that respect Haar was ahead of his time. See Figs. 1 and 2 for details.

$$\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots, \qquad V_0 + W_0 = V_1 .$$

Comparison of Discrete and Continuous Wavelet Transforms, Figure 1
Multiresolution. L^2(R^d)-version (continuous); φ ∈ V_0, ψ ∈ W_0

Comparison of Discrete and Continuous Wavelet Transforms, Figure 2
Multiresolution. l^2(Z)-version (discrete); φ ∈ V_0, ψ ∈ W_0

The word “multiresolution” suggests a connection to optics from physics. So that should have been a hint to mathematicians to take a closer look at trends in signal and image processing! Moreover, even staying within mathematics, it turns out that as a general notion this same idea of a “multiresolution” has long roots in mathematics, even in such modern and pure areas as operator theory and Hilbert-space geometry. Looking even closer at these interconnections, we can now recognize scales of subspaces (so-called multiresolutions) in the classical algorithmic construction of orthogonal bases in inner-product spaces, now taught in lots of mathematics courses under the name of the Gram–Schmidt algorithm. Indeed, a closer look at good old Gram–Schmidt reveals that it is a matrix algorithm; hence new mathematical tools involving non-commutativity!
If the signal to be analyzed is an image, then why not select a fixed but suitable resolution (or a subspace of signals corresponding to a selected resolution), and then do the computations there? The selection of a fixed “resolution” is dictated by practical concerns. That idea was key in turning computation of wavelet coefficients into iterated matrix algorithms. As the matrix operations get large, the computation is carried out in a variety of paths arising from big matrix products. The dichotomy, continuous vs. discrete, is quite familiar to engineers. The industrial engineers typically work with huge volumes of numbers.
Numbers! So why wavelets? Well, what matters to the industrial engineer is not really the wavelets, but the fact that special wavelet functions serve as an efficient way to encode large data sets, that is, encode for computations. And the wavelet algorithms are computational. They work on numbers. Encoding numbers into pictures, images, or graphs of functions comes later, perhaps at the very end of the computation. But without the graphics, I doubt that we would understand any of this half as well as we do now. The same can be said for the many issues that relate to the crucial mathematical concept of self-similarity, as we know it from fractals, and more generally from recursive algorithms.

Definition of the Subject
In this paper we outline several points of view on the interplay between discrete and continuous wavelet transforms, stressing both pure and applied aspects of each. We outline some new links between the two transform technologies based on the theory of representations of generators and relations.
By this we mean a finite system of generators which are represented by operators in Hilbert space. We further outline how these representations yield sub-band filter banks for signal and image processing algorithms. The term "wavelet transform" (WT) means different things to different people: pure and applied mathematicians typically give different answers to the question "What is the WT?" And engineers in turn have their own preferred, quite different approach to WTs. Still there are two main trends in how WTs are used, the continuous WT on one side, and the discrete WT on the other. Here we offer


a user-friendly outline of both, but with a slant toward geometric methods from the theory of operators in Hilbert space. Our paper is organized as follows: for the benefit of diverse reader groups, we begin with the Glossary (Sect. "Glossary"). This is a substantial part of our account, and it reflects the multiplicity of ways the subject is used. The concept of multiresolutions, or multiresolution analysis (MRA), serves as a link between the discrete and continuous theory. In Sect. "List of Names and Discoveries", we summarize how different mathematicians and scientists have contributed to and shaped the subject over the years. The next two sections then offer a technical overview of both the discrete and the continuous WT. This includes basic tools from Fourier analysis and from operators in Hilbert space. In Sect. "Tools from Mathematics" and Sect. "A Transfer Operator", we outline the connections between the separate parts of mathematics and their applications to WTs.

Introduction

While applied problems such as time series, signals, and processing of digital images come from engineering and from the sciences, they have in the past two decades taken on a life of their own as an exciting new area of applied mathematics. While searches in Google on these keywords typically yield sites numbered in the millions, the diversity of applications is wide, and it seems reasonable here to narrow our focus to some of the approaches that are both more mathematical and more recent. For references, see for example [1,6,23,31]. In addition, our own interests (e.g., [20,21,27,28]) have colored the presentation below. Each of the two areas, the discrete side and the continuous theory, is huge as measured by recent journal publications. A leading theme in our article is the independent interest in a multitude of interconnections between the discrete algorithms and their uses in the more mathematical analysis of function spaces (continuous wavelet transforms).
The mathematics involved in the study and the applications of this interaction is, we feel, of benefit to both mathematicians and engineers. See also [20]. An early paper [9] by Daubechies and Lagarias was especially influential in connecting the two worlds, discrete and continuous.
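As a quick illustration of the multiresolution idea of Figs. 1 and 2, here is a numerical sketch of our own (not taken from the references above): realize the Haar spaces $V_j$ as piecewise-constant functions on dyadic intervals of length $2^{-j}$, let $P_j$ be the orthogonal projection onto $V_j$ (block averages), and check the relation $V_0 + W_0 = V_1$ through the projections.

```python
import numpy as np

# Haar multiresolution sketch on [0, 1): V_j = functions constant on
# dyadic intervals of length 2**(-j); P_j = projection onto V_j
# (block averages). A grid of 2**J samples stands in for L^2([0,1)).

def project(samples, j, J):
    block = 2 ** (J - j)                       # samples per dyadic interval
    means = samples.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)

J = 10
x = np.linspace(0.0, 1.0, 2 ** J, endpoint=False)
f = np.sin(2 * np.pi * x) + x ** 2             # an arbitrary test signal

p0 = project(f, 0, J)                          # P_0 f, lives in V_0
p1 = project(f, 1, J)                          # P_1 f, lives in V_1
w0 = p1 - p0                                   # detail component, lives in W_0

# W_0 is orthogonal to V_0 (here: to the constants), so Pythagoras holds:
assert abs(w0.mean()) < 1e-12
assert np.isclose(np.sum(p1 ** 2), np.sum(p0 ** 2) + np.sum(w0 ** 2))
```

The two assertions express exactly the splitting $V_1 = V_0 + W_0$ with orthogonal summands: the refinement $P_1 f$ decomposes into the coarse average $P_0 f$ plus a zero-mean detail.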

The Discrete vs. Continuous Wavelet Algorithms

The Discrete Wavelet Transform

If one stays with function spaces, it is then popular to pick the d-dimensional Lebesgue measure on $\mathbb{R}^d$, $d = 1, 2, \dots$, and pass to the Hilbert space $L^2(\mathbb{R}^d)$ of all square integrable functions on $\mathbb{R}^d$. A wavelet basis refers to a family of basis functions for $L^2(\mathbb{R}^d)$ generated from a finite set of normalized functions $\psi_i$, the index i chosen from a fixed and finite index set I, and from two operations, one called scaling, and the other translation. The scaling is typically specified by a d by d matrix A over the integers $\mathbb{Z}$ such that all the eigenvalues have modulus bigger than one, i.e., lie outside the closed unit disk in the complex plane. The d-lattice is denoted $\mathbb{Z}^d$, and the translations will be by vectors selected from $\mathbb{Z}^d$. We say that we have a wavelet basis if the triple-indexed family

$\psi_{i,j,k}(x) := |\det A|^{j/2}\, \psi_i(A^j x + k)$

forms an orthonormal basis (ONB) for $L^2(\mathbb{R}^d)$ as i varies in I, $j \in \mathbb{Z}$, and $k \in \mathbb{Z}^d$. The word "orthonormal" for a family F of vectors in a Hilbert space $\mathcal{H}$ refers to the norm and the inner product in $\mathcal{H}$: the vectors in an orthonormal family F are assumed to have norm one, and to be mutually orthogonal. If the family is also total (i.e., the vectors in F span a subspace which is dense in $\mathcal{H}$), we say that F is an orthonormal basis (ONB). While there are other popular wavelet bases, for example frame bases and dual bases (see e.g., [2,18] and the papers cited there), the ONBs are the most agreeable, at least from the mathematical point of view. That there are bases of this kind is not at all clear, and the subject of wavelets in this continuous context has gained much from its connections to the discrete world of signal- and image-processing. Here we shall outline some of these connections with an emphasis on the mathematical context. So we will be stressing the theory of Hilbert space, and bounded linear operators acting in Hilbert space $\mathcal{H}$, both individual operators, and families of operators which form algebras.

As was noticed recently, the operators which specify particular subband algorithms from the discrete world of signal-processing turn out to satisfy relations that were found (or rediscovered independently) in the theory of operator algebras, and which go under the name of Cuntz algebras, denoted $\mathcal{O}_N$ if N is the number of bands. For additional details, see e.g., [21]. In symbols, the $C^*$-algebra has generators $(S_i)_{i=0}^{N-1}$, and the relations are

$\sum_{i=0}^{N-1} S_i S_i^* = 1$  (1)

(where 1 is the identity element in $\mathcal{O}_N$) and

$S_i^* S_i = 1$, and $S_i^* S_j = \delta_{i,j}\, 1$.  (2)


In a representation on a Hilbert space, say $\mathcal{H}$, the symbols $S_i$ turn into bounded operators, also denoted $S_i$, and the identity element 1 turns into the identity operator I in $\mathcal{H}$, i.e., the operator $I : h \mapsto h$ for $h \in \mathcal{H}$. In operator language, the two formulas (1) and (2) state that each $S_i$ is an isometry in $\mathcal{H}$, and that the respective ranges $S_i\mathcal{H}$ are mutually orthogonal, i.e., $S_i\mathcal{H} \perp S_j\mathcal{H}$ for $i \neq j$. Introducing the projections $P_i = S_i S_i^*$, we get $P_i P_j = \delta_{i,j} P_i$, and

$\sum_{i=0}^{N-1} P_i = I$.

In the engineering literature this takes the form of programming diagrams: Fig. 3. If the process of Fig. 3 is repeated, we arrive at the discrete wavelet transform (Fig. 4), or, stated in the form of images ($n = 5$), Fig. 5. Selecting a resolution subspace $V_0 = \operatorname{closure\,span}\{\varphi(\cdot - k) \mid k \in \mathbb{Z}\}$, we arrive at a wavelet subdivision $\{\psi_{j,k} \mid j \geq 0,\ k \in \mathbb{Z}\}$, where $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$, and the continuous expansion $f = \sum_{j,k} \langle \psi_{j,k} \mid f\rangle\, \psi_{j,k}$, or the discrete analogue derived from the isometries $S_0^k S_i$, for $k = 0, 1, 2, \dots$ and $i = 1, 2, \dots, N-1$, called the discrete wavelet transform.
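The relations (1)-(2) admit no finite-dimensional representations, but their perfect-reconstruction content can be checked directly on finitely supported sequences. Below is a minimal sketch of our own, using the Haar pair $h = (1/2, 1/2)$, $g = (1/2, -1/2)$ in the normalization of this article; the hard-coded index ranges assume length-two (Haar) filters.

```python
import numpy as np

sqrt2 = np.sqrt(2.0)
h = np.array([0.5, 0.5])      # low-pass filter, sum(h) = 1
g = np.array([0.5, -0.5])     # high-pass filter, g_k = (-1)**k * h[1-k]

def F(c, x):
    """Analysis operator (F_c x)_i = sqrt(2) * sum_j conj(c_{j-2i}) x_j (Haar case)."""
    return np.array([sqrt2 * np.vdot(c, x[2 * i:2 * i + 2])
                     for i in range(len(x) // 2)])

def S(c, y):
    """Synthesis isometry (S_c y)_i = sqrt(2) * sum_j c_{i-2j} y_j (Haar case)."""
    out = np.zeros(2 * len(y))
    for j, yj in enumerate(y):
        out[2 * j:2 * j + 2] += sqrt2 * c * yj
    return out

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 2.0, 6.0])
y, z = F(h, x), F(g, x)                      # averages and local differences

assert np.allclose(S(h, y) + S(g, z), x)     # S_0 F_0 + S_1 F_1 = I (perfect reconstruction)
assert np.allclose(F(h, S(h, y)), y)         # F_0 S_0 = I  (S_0 is an isometry)
assert np.allclose(F(g, S(h, y)), 0.0)       # F_1 S_0 = 0  (orthogonal ranges)
```

The three assertions are exactly the finite-sequence shadows of relations (1) and (2): isometry, orthogonal ranges, and resolution of the identity.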

Comparison of Discrete and Continuous Wavelet Transforms, Figure 3 Perfect reconstruction in a subband filtering as used in signal- and image-processing

Comparison of Discrete and Continuous Wavelet Transforms, Figure 4 Binary decision tree for a sequential selection of filters.

Notational Convention

In algorithms, the letter N is popular, and often used for counting more than one thing. In the present context of the discrete wavelet algorithm (DWA), or DWT, we count two things. One is "the number of times a picture is decomposed via subdivision"; we have used n for this. The other, related but different, number N is the number of subbands: $N = 2$ for the dyadic DWT, and $N = 4$ for the image DWT. The image-processing WT in our present context is the tensor product of the 1-D dyadic WT, so $2 \times 2 = 4$. Caution: not all DWAs arise as tensor products of $N = 2$ models. The wavelets coming from tensor products are called separable. When a particular image-processing scheme is used for generating continuous wavelets, it is not transparent whether we are looking at a separable or inseparable wavelet!

Comparison of Discrete and Continuous Wavelet Transforms, Figure 5 The subdivided squares represent the use of the pyramid subdivision algorithm in image processing, as it is used on pixel squares. At each subdivision step the top left-hand square represents averages of nearby pixel numbers, averages taken with respect to the chosen low-pass filter; while the three directions, horizontal, vertical, and diagonal, represent detail differences, with the three represented by separate bands and filters. So in this model there are four bands, and they may be realized by a tensor product construction applied to dyadic filters in the separate x- and y-directions in the plane. For the discrete WT used in image-processing, we use iteration of four isometries $S_0, S_H, S_V, S_D$ with mutually orthogonal ranges, satisfying the following sum rule: $S_0 S_0^* + S_H S_H^* + S_V S_V^* + S_D S_D^* = I$, with I denoting the identity operator in an appropriate $\ell^2$-space


The Continuous Wavelet Transform

Consider functions f on the real line $\mathbb{R}$. We select the Hilbert space of functions to be $L^2(\mathbb{R})$. To start a continuous WT, we must select a function $\psi \in L^2(\mathbb{R})$ such that, for $r, s \in \mathbb{R}$, the following family of functions

$\psi_{r,s}(x) = r^{-1/2}\, \psi\!\left(\frac{x - s}{r}\right)$

creates an over-complete basis for $L^2(\mathbb{R})$. An over-complete family of vectors in a Hilbert space is often called a coherent decomposition. This terminology comes from quantum optics. What is needed for a continuous WT in the simplest case is the following representation, valid for all $f \in L^2(\mathbb{R})$:

$f(x) = C_\psi^{-1} \iint_{\mathbb{R}^2} \langle \psi_{r,s} \mid f\rangle\, \psi_{r,s}(x)\, \frac{dr\, ds}{r^2}$,

where $C_\psi := \int_{\mathbb{R}} |\hat{\psi}(\omega)|^2\, \frac{d\omega}{|\omega|}$ and $\langle \psi_{r,s} \mid f\rangle = \int_{\mathbb{R}} \overline{\psi_{r,s}(y)}\, f(y)\, dy$. The refinements and implications of this are spelled out in tables in Sect. "Connections to Group Theory".

Comparison of Discrete and Continuous Wavelet Transforms, Figure 6 $n = 2$, Jorgensen

The selection of filters (Fig. 4) is represented by the use of one of the operators $S_i$ in Fig. 3. A planar version of this principle is illustrated in Fig. 6. For a more detailed discussion, see e.g., [3].

Some Background on Hilbert Space

Wavelet theory is the art of finding a special kind of basis in Hilbert space. Let $\mathcal{H}$ be a Hilbert space over $\mathbb{C}$ and denote the inner product $\langle \cdot \mid \cdot \rangle$. For us, it is assumed linear in the second variable. If $\mathcal{H} = L^2(\mathbb{R})$, then

$\langle f \mid g \rangle := \int_{\mathbb{R}} \overline{f(x)}\, g(x)\, dx$.

To clarify the distinction, it is helpful to look at the representations of the Cuntz relations by operators in Hilbert space. We are dealing with representations of the two distinct algebras $\mathcal{O}_2$ and $\mathcal{O}_4$: two frequency subbands vs. four subbands. Note that the Cuntz algebras $\mathcal{O}_2$ and $\mathcal{O}_4$ are given axiomatically, or purely symbolically. It is only when subband filters are chosen that we get representations. This also means that the choice of N is made initially, and the same N is used in different runs of the programs. In contrast, the number of times a picture is decomposed varies from one experiment to the next!

Summary: $N = 2$ for the dyadic DWT. The operators in the representation are $S_0$, $S_1$: one average operator, and one detail operator. The detail operator $S_1$ "counts" local detail variations.

Image-processing: then $N = 4$ is fixed as we run different images through the DWT. The operators are now $S_0$, $S_H$, $S_V$, $S_D$: one average operator, and three detail operators for local detail variations in the three directions in the plane.

If $\mathcal{H} = \ell^2(\mathbb{Z})$, then

$\langle \xi \mid \eta \rangle := \sum_{n \in \mathbb{Z}} \overline{\xi_n}\, \eta_n$.

Let $\mathbb{T} = \mathbb{R}/2\pi\mathbb{Z}$. If $\mathcal{H} = L^2(\mathbb{T})$, then

$\langle f \mid g \rangle := \frac{1}{2\pi} \int_{-\pi}^{\pi} \overline{f(\theta)}\, g(\theta)\, d\theta$.

Functions $f \in L^2(\mathbb{T})$ have Fourier series: setting $e_n(\theta) = e^{in\theta}$,

$\hat{f}(n) := \langle e_n \mid f \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\, d\theta$,

and

$\|f\|^2_{L^2(\mathbb{T})} = \sum_{n \in \mathbb{Z}} |\hat{f}(n)|^2$.
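As a quick numerical sanity check of these conventions (our own sketch, using a random trigonometric polynomial as test function), the coefficient formula recovers the coefficients exactly, and the Parseval identity holds:

```python
import numpy as np

# Test function f(theta) = sum_{|n|<=3} c_n e^{i n theta} with random c_n.
rng = np.random.default_rng(0)
ns = np.arange(-3, 4)
c = rng.normal(size=7) + 1j * rng.normal(size=7)

theta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
f = sum(cn * np.exp(1j * n * theta) for n, cn in zip(ns, c))

# fhat(n) = (1/2 pi) * integral of e^{-i n theta} f(theta); the equispaced
# Riemann sum (a plain mean) is exact for trigonometric polynomials.
fhat = np.array([np.mean(np.exp(-1j * n * theta) * f) for n in ns])
assert np.allclose(fhat, c)

# Parseval: (1/2 pi) * integral |f|^2 = sum |fhat(n)|^2
assert np.isclose(np.mean(np.abs(f) ** 2), np.sum(np.abs(c) ** 2))
```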


Similarly, if $f \in L^2(\mathbb{R})$, then

$\hat{f}(t) := \int_{\mathbb{R}} e^{-ixt} f(x)\, dx$,

and

$\|f\|^2_{L^2(\mathbb{R})} = \frac{1}{2\pi} \int_{\mathbb{R}} |\hat{f}(t)|^2\, dt$.

Let J be an index set. We shall only need to consider the case when J is countable. Let $\{\psi_\alpha\}_{\alpha \in J}$ be a family of nonzero vectors in a Hilbert space $\mathcal{H}$. We say it is an orthonormal basis (ONB) if

$\langle \psi_\alpha \mid \psi_\beta \rangle = \delta_{\alpha,\beta}$ (Kronecker delta)  (3)

and if

$\sum_{\alpha \in J} |\langle \psi_\alpha \mid f \rangle|^2 = \|f\|^2$ holds for all $f \in \mathcal{H}$.  (4)

If only Eq. (4) is assumed, but not Eq. (3), we say that $\{\psi_\alpha\}_{\alpha \in J}$ is a (normalized) tight frame. We say that it is a frame with frame constants $0 < A \leq B < \infty$ if

$A \|f\|^2 \leq \sum_{\alpha \in J} |\langle \psi_\alpha \mid f \rangle|^2 \leq B \|f\|^2$ holds for all $f \in \mathcal{H}$.

Introducing the rank-one operators $Q_\alpha := |\psi_\alpha\rangle\langle\psi_\alpha|$ of Dirac's terminology, see [3], we see that $\{\psi_\alpha\}_{\alpha \in J}$ is an ONB if and only if the $Q_\alpha$'s are projections, and

$\sum_{\alpha \in J} Q_\alpha = I$ (= the identity operator in $\mathcal{H}$).  (5)

It is a (normalized) tight frame if and only if Eq. (5) holds, but with no further restriction on the rank-one operators $Q_\alpha$. It is a frame with frame constants A and B if the operator

$S := \sum_{\alpha \in J} Q_\alpha$

satisfies $A I \leq S \leq B I$ in the order of Hermitian operators. (We say that operators $H_i = H_i^*$, $i = 1, 2$, satisfy $H_1 \leq H_2$ if $\langle f \mid H_1 f \rangle \leq \langle f \mid H_2 f \rangle$ holds for all $f \in \mathcal{H}$.) If h, k are vectors in a Hilbert space $\mathcal{H}$, then the operator $A = |h\rangle\langle k|$ is defined by the identity $\langle u \mid A v \rangle = \langle u \mid h \rangle \langle k \mid v \rangle$ for all $u, v \in \mathcal{H}$.

Wavelets in $L^2(\mathbb{R})$ are generated by simple operations on one or more functions $\psi$ in $L^2(\mathbb{R})$; the operations come in pairs, say scaling and translation, or phase-modulation and translations. If $N \in \{2, 3, \dots\}$ we set

$\psi_{j,k}(x) := N^{j/2}\, \psi(N^j x - k)$ for $j, k \in \mathbb{Z}$.

Increasing the Dimension

In wavelet theory [7], there is a tradition for reserving $\varphi$ for the father function and $\psi$ for the mother function. A 1-level wavelet transform of an $N \times M$ image can be represented as

$f \mapsto \begin{pmatrix} a^1 & h^1 \\ v^1 & d^1 \end{pmatrix}$  (6)

where the subimages $h^1, d^1, a^1$, and $v^1$ each have the dimension N/2 by M/2, and

$a^1 = V_{m-1} \otimes V_{n-1}$: $\varphi^A(x, y) = \varphi(x)\varphi(y) = \sum_i \sum_j h_i h_j\, \varphi(2x - i)\,\varphi(2y - j)$

$h^1 = V_{m-1} \otimes W_{n-1}$: $\psi^H(x, y) = \psi(x)\varphi(y) = \sum_i \sum_j g_i h_j\, \varphi(2x - i)\,\varphi(2y - j)$

$v^1 = W_{m-1} \otimes V_{n-1}$: $\psi^V(x, y) = \varphi(x)\psi(y) = \sum_i \sum_j h_i g_j\, \varphi(2x - i)\,\varphi(2y - j)$

$d^1 = W_{m-1} \otimes W_{n-1}$: $\psi^D(x, y) = \psi(x)\psi(y) = \sum_i \sum_j g_i g_j\, \varphi(2x - i)\,\varphi(2y - j)$  (7)

where $\varphi$ is the father function and $\psi$ is the mother function in the sense of wavelets, the V spaces denote the average spaces, and the W spaces are the difference spaces from multiresolution analysis (MRA) [7]. In the formulas, we have the following two indexed number systems $a := (h_i)$ and $d := (g_i)$: a is for averages, and d is for local differences. They are really the input for the DWT. But they also are the key link between the two transforms, the discrete and the continuous. The link is made up of the following scaling identities:

$\varphi(x) = 2 \sum_{i \in \mathbb{Z}} h_i\, \varphi(2x - i)$,

$\psi(x) = 2 \sum_{i \in \mathbb{Z}} g_i\, \varphi(2x - i)$,

and (low-pass normalization) $\sum_{i \in \mathbb{Z}} h_i = 1$. The scalars $(h_i)$ may be real or complex; they may be finite or infinite in number. If there are four of them, it is called the "four tap", etc. The finite case is best for computations since it corresponds to compactly supported functions. This means that the two functions $\varphi$ and $\psi$ will vanish outside some finite interval on the real line.
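The 1-level decomposition (6)-(7) can be sketched in code with the Haar pair. The implementation below is our own minimal version: the orthonormal $1/\sqrt{2}$ scaling is folded in, and the h/v labeling follows one common row/column convention (conventions for the two detail directions vary between implementations).

```python
import numpy as np

def haar_2d_level1(A):
    """One level of the separable (tensor product) 2-D Haar transform of an
    N x M image with N, M even; returns the N/2 x M/2 subimages a1, h1, v1, d1."""
    s2 = np.sqrt(2.0)
    lo = (A[:, 0::2] + A[:, 1::2]) / s2   # low-pass along one axis
    hi = (A[:, 0::2] - A[:, 1::2]) / s2   # high-pass along the same axis
    a1 = (lo[0::2] + lo[1::2]) / s2       # average subimage   (h with h)
    h1 = (hi[0::2] + hi[1::2]) / s2       # detail subimage    (g with h)
    v1 = (lo[0::2] - lo[1::2]) / s2       # detail subimage    (h with g)
    d1 = (hi[0::2] - hi[1::2]) / s2       # diagonal detail    (g with g)
    return a1, h1, v1, d1

img = np.arange(64.0).reshape(8, 8)
a1, h1, v1, d1 = haar_2d_level1(img)
assert a1.shape == (4, 4)                 # each subimage is N/2 x M/2

# The four subbands together preserve energy (orthonormality of the step)
assert np.isclose((img ** 2).sum(),
                  sum((b ** 2).sum() for b in (a1, h1, v1, d1)))
```

Iterating the same step on the averaged subimage $a^1$ produces the pyramid subdivision of Fig. 5.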

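Since the four-tap case was just mentioned: the Daubechies four-tap filter, scaled to the low-pass normalization $\sum_i h_i = 1$ used here, satisfies the orthogonality relation $\sum_i \overline{h_i}\, h_{i+2k} = \frac{1}{2}\delta_{0,k}$ discussed next, and, equivalently, the quadrature-mirror identity $|m_0(z)|^2 + |m_0(-z)|^2 = 1$ on $\mathbb{T}$ for $m_0(z) = \sum_k h_k z^k$. A numerical sketch of our own:

```python
import numpy as np

s3 = np.sqrt(3.0)
# Daubechies four-tap low-pass coefficients, scaled so that sum(h) = 1
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0

# Orthogonality: sum_i h_i * h_{i+2k} = (1/2) * delta_{0,k} (real coefficients)
for k in range(3):
    s = np.sum(h[: len(h) - 2 * k] * h[2 * k:])
    assert np.isclose(s, 0.5 if k == 0 else 0.0)

# Equivalent QMF identity for m0(z) = sum_k h_k z^k, checked on the circle
z = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False))
m0 = lambda w: sum(hk * w ** k for k, hk in enumerate(h))
assert np.allclose(np.abs(m0(z)) ** 2 + np.abs(m0(-z)) ** 2, 1.0)
```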

The two number systems are further subjected to orthogonality relations, of which

$\sum_{i \in \mathbb{Z}} \overline{h_i}\, h_{i+2k} = \frac{1}{2}\, \delta_{0,k}$  (8)

is the best known. The system h gives the low-pass, and g the high-pass, filter coefficients. In Eq. (7), $a^1$ denotes the first averaged image, which consists of average intensity values of the original image. Note that only the $\varphi$ function, the V spaces, and the h coefficients are used here. Similarly, $h^1$ denotes the first detail image of horizontal components, which consists of intensity differences along the vertical axis of the original image. Note that the $\varphi$ function is used on y and the $\psi$ function on x, the W space for x values and the V space for y values; both the h and g coefficients are used accordingly. The data $v^1$ denote the first detail image of vertical components, which consists of intensity differences along the horizontal axis of the original image. Note that the $\varphi$ function is used on x and the $\psi$ function on y, the W space for y values and the V space for x values; again both the h and g coefficients are used accordingly. Finally, $d^1$ denotes the first detail image of diagonal components, which consists of intensity differences along the diagonal axis of the original image; only the $\psi$ function, the W spaces, and the g coefficients are used here. See [28,33]. The original image is reconstructed from the decomposed image by taking the sum of the averaged image and the detail images and scaling by a scaling factor. This decomposition is not limited to one step: it can be repeated on the averaged image again and again, depending on the size of the image. Once it stops at a certain level, quantization (see [26,32]) is done on the image. This quantization step may be lossy or lossless. Then lossless entropy encoding is done on the decomposed and quantized image.

The relevance of the system of identities Eq. (8) may be summarized as follows. Set

$m_0(z) := \sum_{k \in \mathbb{Z}} h_k z^k$ for all $z \in \mathbb{T}$,

$g_k := (-1)^k\, \overline{h_{1-k}}$ for all $k \in \mathbb{Z}$,

$m_1(z) := \sum_{k \in \mathbb{Z}} g_k z^k$, and

$(S_j f)(z) = \sqrt{2}\, m_j(z)\, f(z^2)$, for $j = 0, 1$, $f \in L^2(\mathbb{T})$, $z \in \mathbb{T}$.

Then the following conditions are equivalent:

(a) The system of Eq. (8) is satisfied.

Comparison of Discrete and Continuous Wavelet Transforms, Figure 7 Matrix representation of the filter operations

(b) The operators $S_0$ and $S_1$ satisfy the Cuntz relations.

(c) We have perfect reconstruction in the subband system of Fig. 4.

Note that the two operators $S_0$ and $S_1$ have equivalent matrix representations. Recalling that, by Parseval's formula, we have $L^2(\mathbb{T}) \simeq \ell^2(\mathbb{Z})$, and representing $S_0$ instead as an $\infty \times \infty$ matrix acting on column vectors $x = (x_j)_{j \in \mathbb{Z}}$, we get

$(S_0 x)_i = \sqrt{2} \sum_{j \in \mathbb{Z}} h_{i-2j}\, x_j$,

and for the adjoint operator $F_0 := S_0^*$, we get the matrix representation

$(F_0 x)_i = \sqrt{2} \sum_{j \in \mathbb{Z}} \overline{h_{j-2i}}\, x_j$,

with the overbar signifying complex conjugation. This is of computational significance: both matrices, the one for $S_0$ and the one for $F_0 := S_0^*$, are slanted. However, the slanting of one is the mirror-image of the other; see Fig. 7.

Significance of Slanting

The slanted matrix representations refer to the corresponding operators in $L^2$. In general, operators in Hilbert function spaces have many matrix representations, one for each orthonormal basis (ONB), but here we are concerned with the ONB consisting of the Fourier frequencies $z^j$, $j \in \mathbb{Z}$. So in our matrix representations for the S operators and their adjoints we will be acting on column vectors, each infinite column representing a vector in the sequence space $\ell^2$. A vector in $\ell^2$ is said to be of finite size if it has only a finite set of non-zero entries. It is the matrix $F_0$ that is effective for iterated matrix computation. Reason: when a column vector x of a fixed size, say 2s, is multiplied, or acted on, by $F_0$, the result is a vector y of half the size, i.e., of size s. So $y = F_0 x$. If we use $F_0$ and $F_1$ together on x, then we get two vectors, each of size s, the other one being $z = F_1 x$, and we can form the combined column vector by stacking y on top of z. In


our application, y represents averages, while z represents local differences: hence the wavelet algorithm. In matrix form,

$\begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} F_0 \\ F_1 \end{pmatrix} x$, i.e., $y = F_0 x$ and $z = F_1 x$.

Connections to Group Theory

The first line in the two tables below is the continuous wavelet transform. It comes from what in physics is called coherent vector decompositions. Both transforms apply to vectors in a Hilbert space $\mathcal{H}$, and $\mathcal{H}$ may vary from case to case. Common to all transforms is vector input and output. If the input agrees with the output, we say that the combined process yields the identity operator $1 : \mathcal{H} \to \mathcal{H}$, also written $1_{\mathcal{H}}$. So, for example, if $(S_i)_{i=0}^{N-1}$ is a finite operator system, an input/output operator identity may take the form

$\sum_{i=0}^{N-1} S_i S_i^* = 1_{\mathcal{H}}$.

The summary of, and variations on, the resolution of the identity operator 1 in $L^2$ or in $\ell^2$, for $\psi$ and $\tilde{\psi}$, where

$\psi_{r,s}(x) = r^{-1/2}\, \psi\!\left(\frac{x-s}{r}\right)$, $C_\psi = \int_{\mathbb{R}} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega < \infty$,

and similarly for $\tilde{\psi}$ and $C_{\psi,\tilde{\psi}} = \int_{\mathbb{R}} \frac{\overline{\hat{\psi}(\omega)}\, \hat{\tilde{\psi}}(\omega)}{|\omega|}\, d\omega$, is given in Table 1. Then the assertions in Table 1 amount to the equations in Table 2.

Comparison of Discrete and Continuous Wavelet Transforms, Table 1 Splitting of a signal into filtered components: average and a scale of detail. Several instances: continuous vs. discrete

Operator representation | Overcomplete basis | Dual bases
Continuous resolution | $C_\psi^{-1} \iint_{\mathbb{R}^2} |\psi_{r,s}\rangle\langle\psi_{r,s}|\, \frac{dr\,ds}{r^2} = 1$ | $C_{\psi,\tilde{\psi}}^{-1} \iint_{\mathbb{R}^2} |\psi_{r,s}\rangle\langle\tilde{\psi}_{r,s}|\, \frac{dr\,ds}{r^2} = 1$
Discrete resolution (corresponding to $r = 2^{-j}$, $s = k 2^{-j}$) | $\sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} |\psi_{j,k}\rangle\langle\psi_{j,k}| = 1$ | $\sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} |\psi_{j,k}\rangle\langle\tilde{\psi}_{j,k}| = 1$
Sequence spaces | Isometries in $\ell^2$: $\sum_{i=0}^{N-1} S_i S_i^* = 1$, where $S_0, \dots, S_{N-1}$ are adjoints to the quadrature mirror filter operators $F_i$, i.e., $S_i = F_i^*$ | Dual operator system in $\ell^2$: $\sum_{i=0}^{N-1} S_i \tilde{S}_i^* = 1$, for a dual operator system $S_0, \dots, S_{N-1}$, $\tilde{S}_0, \dots, \tilde{S}_{N-1}$

Comparison of Discrete and Continuous Wavelet Transforms, Table 2 Application of the operator representation to specific signals, continuous and discrete

$C_\psi^{-1} \iint_{\mathbb{R}^2} |\langle \psi_{r,s} \mid f\rangle|^2\, \frac{dr\,ds}{r^2} = \|f\|^2_{L^2}$ for all $f \in L^2(\mathbb{R})$ | $C_{\psi,\tilde{\psi}}^{-1} \iint_{\mathbb{R}^2} \langle f \mid \psi_{r,s}\rangle \langle \tilde{\psi}_{r,s} \mid g\rangle\, \frac{dr\,ds}{r^2} = \langle f \mid g\rangle$ for all $f, g \in L^2(\mathbb{R})$
$\sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} |\langle \psi_{j,k} \mid f\rangle|^2 = \|f\|^2_{L^2}$ for all $f \in L^2(\mathbb{R})$ | $\sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} \langle f \mid \psi_{j,k}\rangle \langle \tilde{\psi}_{j,k} \mid g\rangle = \langle f \mid g\rangle$ for all $f, g \in L^2(\mathbb{R})$
$\sum_{i=0}^{N-1} \|S_i^* c\|^2 = \|c\|^2$ for all $c \in \ell^2$ | $\sum_{i=0}^{N-1} \langle S_i^* c \mid \tilde{S}_i^* d\rangle = \langle c \mid d\rangle$ for all $c, d \in \ell^2$

A function $\psi$ satisfying the resolution identity is called a coherent vector in mathematical physics. The representation theory for the $(ax + b)$-group, i.e., the matrix group $G = \left\{ \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \mid a \in \mathbb{R}_+,\ b \in \mathbb{R} \right\}$, serves as its underpinning. Then the tables above illustrate how the $\{\psi_{j,k}\}$ wavelet system arises from a discretization of the following unitary representation of G:

$\left(U_{\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}} f\right)(x) = a^{-1/2}\, f\!\left(\frac{x - b}{a}\right)$,

acting on $L^2(\mathbb{R})$. This unitary representation also explains the discretization step in passing from the first line to the second in the tables above. The functions $\{\psi_{j,k} \mid j, k \in \mathbb{Z}\}$ which make up a wavelet system result from the choice of a suitable coherent vector $\psi \in L^2(\mathbb{R})$, and then setting

$\psi_{j,k}(x) = \left(U_{\begin{pmatrix} 2^{-j} & k 2^{-j} \\ 0 & 1 \end{pmatrix}} \psi\right)(x) = 2^{j/2}\, \psi(2^j x - k)$.

Even though this representation lies at the historical origin of the subject of wavelets, the $(ax + b)$-group seems to be now largely forgotten in the next generation of the wavelet community. But Chaps. 1-3 of [7] still serve as a beautiful presentation of this (now much ignored) side of the subject. It also serves as a link to mathematical physics and to classical analysis.

List of Names and Discoveries

Many of the main discoveries summarized below are now lore.

1807 Jean Baptiste Joseph Fourier, mathematics, physics (heat conduction). Expressing functions as sums of sine and cosine waves of frequencies in arithmetic progression (now called Fourier series).

1909 Alfred Haar, mathematics. Discovered, while a student of David Hilbert, an orthonormal basis consisting of step functions, applicable both to functions on an interval and to functions on the whole real line. While it was not realized at the time, Haar's construction was a precursor of what is now known as the Mallat subdivision and multiresolution method, as well as the subdivision wavelet algorithms.

1946 Denes Gabor (Nobel Prize), physics (optics, holography). Discovered basis expansions for what might now be called time-frequency wavelets, as opposed to time-scale wavelets.

1948 Claude Elwood Shannon, mathematics, engineering (information theory). A rigorous formula used by the phone company for sampling speech signals. Quantizing information, entropy; founder of what is now called the mathematical theory of communication.

1976 Claude Garland, Daniel Esteban, signal processing. Discovered subband coding of digital transmission of speech signals over the telephone.

1981 Jean Morlet, petroleum engineer. Suggested the term "ondelettes." J. M. decomposed reflected seismic signals into sums of "wavelets (Fr.: ondelettes) of constant shape," i.e., a decomposition of signals into wavelet shapes, selected from a library of such shapes (now called wavelet series). Received somewhat late recognition for his work. Due to contributions by A. Grossman and Y. Meyer, Morlet's discoveries have now come to play a central role in the theory.

1985 Yves Meyer, mathematics, applications. Mentor for A. Cohen, S. Mallat, and others of the wavelet pioneers, Y. M. discovered infinitely often differentiable wavelets.

1986 Stéphane Mallat, mathematics, signal and image processing. Discovered what is now known as the subdivision and multiresolution method, as well as the subdivision wavelet algorithms. This allowed the effective use of operators in the Hilbert space $L^2(\mathbb{R})$, and the parallel computational use of recursive matrix algorithms.

1987 Ingrid Daubechies, mathematics, physics, and communications. Discovered differentiable wavelets, with the number of derivatives roughly half the length of the support interval. Further found polynomial algorithms for their construction (with coauthor Jeff Lagarias; joint spectral radius formulas).

1989 Albert Cohen, mathematics (orthogonality relations), numerical analysis. Discovered the use of wavelet filters in the analysis of wavelets: the so-called Cohen condition for orthogonality.

1991 Wayne Lawton, mathematics (the wavelet transfer operator). Discovered the use of a transfer operator in the analysis of wavelets: orthogonality and smoothness.

1992 The FBI, using wavelet algorithms in digitizing and compressing fingerprints. C. Brislawn and his group at Los Alamos created the theory and the codes which allowed the compression of the enormous FBI fingerprint file, creating A/D, a new database of fingerprints.


1994 David Donoho, statistics, mathematics. Pioneered the use of wavelet bases and tools from statistics to "denoise" images and signals.

2000 The International Standards Organization. A wavelet-based picture compression standard, called JPEG 2000, for digital encoding of images.

History

While wavelets as they have appeared in the mathematics literature (e.g., [7]) for a long time, starting with Haar in 1909, involve function spaces, the connections to a host of discrete problems from engineering are more subtle. Moreover, the deeper connections between the discrete algorithms and the function spaces of mathematical analysis are of a more recent vintage; see e.g., [31] and [21]. Here we begin with the function spaces. This part of wavelet theory refers to continuous wavelet transforms (details below). It dominated the wavelet literature in the 1980s, and is beautifully treated in the first four chapters of [7] and in [8]. The word "continuous" refers to the continuum of the real line $\mathbb{R}$. Here we consider spaces of functions in one or more real dimensions, i.e., functions on the line $\mathbb{R}$ (signals), the plane $\mathbb{R}^2$ (images), or in higher dimensions $\mathbb{R}^d$, functions of d real variables.

Tools from Mathematics

In our presentation, we will rely on tools from at least three separate areas of mathematics, and we will outline how they interact to form a coherent theory, and how they come together to form a link between what is now called the discrete and the continuous wavelet transform. It is the discrete case that is popular with engineers [1,23,29,30], while the continuous case has come to play a central role in the part of mathematics referred to as harmonic analysis [8]. The three areas are operator algebras, dynamical systems, and basis constructions:

a. Operator algebras.
The theory of operator algebras in turn breaks up into two parts: one is the study of "the algebras themselves" as they emerge from the axioms of von Neumann (von Neumann algebras), and of Gelfand, Kadison, and Segal ($C^*$-algebras). The other has a more applied slant: it involves "the representations" of the algebras. By this we refer to the following: the algebras will typically be specified by generators and by relations, and by a certain norm-completion, in any case by a system of axioms. This holds both for the norm-closed algebras, the so-called $C^*$-algebras, and for the weakly closed algebras, the von Neumann algebras. In fact there is a close connection between the two parts of

the theory: for example, representations of $C^*$-algebras generate von Neumann algebras. To talk about representations of a fixed algebra, say $\mathfrak{A}$, we must specify a Hilbert space and a homomorphism $\pi$ from $\mathfrak{A}$ into the algebra $B(\mathcal{H})$ of all bounded operators on $\mathcal{H}$. We require that $\pi$ sends the identity element in $\mathfrak{A}$ into the identity operator acting on $\mathcal{H}$, and that $\pi(a^*) = (\pi(a))^*$, where the last star now refers to the adjoint operator. It was realized in the last ten years (see for example [3,21,22]) that wavelets, which are basis constructions in harmonic analysis, in signal/image analysis, and in computational mathematics, may be built up from representations of an especially important family of simple $C^*$-algebras, the Cuntz algebras. The Cuntz algebras are denoted $\mathcal{O}_2, \mathcal{O}_3, \dots$, including $\mathcal{O}_\infty$.

b. Dynamical systems. The Cuntz algebras $\mathcal{O}_N$ for $N = 2, 3, \dots$ are relevant to the kind of dynamical systems which are built on branching laws, the case of $\mathcal{O}_N$ representing N-fold branching. The reason for this is that if N is fixed, $\mathcal{O}_N$ includes in its definition an iterated subdivision, but within the context of Hilbert space. For more details, see e.g., [12,13,14,15,16,17,22].

c. Analysis of bases in function spaces. The connection to basis constructions using wavelets is this: the context for wavelets is a Hilbert space $\mathcal{H}$, where $\mathcal{H}$ may be $L^2(\mathbb{R}^d)$ where d is a dimension, $d = 1$ for the line (signals), $d = 2$ for the plane (images), etc. The more successful bases in Hilbert space are the orthonormal bases (ONBs), but until the mid-1980s, there were no ONBs in $L^2(\mathbb{R}^d)$ which were entirely algorithmic and effective for computations. One reason for this is that the tools that had been used for 200 years since Fourier involved basis functions (Fourier wave functions) which were not localized. Moreover, these existing Fourier tools were not friendly to algorithmic computations.
A Transfer Operator
A popular tool for deciding whether a candidate for a wavelet basis is in fact an ONB uses a certain transfer operator. Variants of this operator are used in diverse areas of applied mathematics. It is an operator which involves a weighted average over a finite set of possibilities, and hence it is natural for understanding random walk algorithms. As remarked in, for example, [12,20,21,22], it was also studied in physics, for example by David Ruelle, who used it to prove results on phase transitions for infinite spin systems in quantum statistical mechanics. In fact the transfer operator has many

Comparison of Discrete and Continuous Wavelet Transforms

Comparison of Discrete and Continuous Wavelet Transforms, Figure 8
Julia set with c = 1. These images were generated in Mathematica by the authors for different c values for φ_c(z) = z^2 + c.

Comparison of Discrete and Continuous Wavelet Transforms, Figure 9
Julia set with c = 0.45 - 0.1428i. These images were generated in Mathematica by the authors for different c values for φ_c(z) = z^2 + c.
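The figures above were produced in Mathematica; a comparable escape-time sketch in Python is shown below (the parameter c = -1, grid size and iteration cap are our illustrative choices, not the authors'). Points whose orbit under φ_c(z) = z^2 + c stays bounded approximate the filled Julia set.

```python
import numpy as np

def julia_escape_time(c, size=400, max_iter=100, bound=2.0):
    """Escape-time counts for phi_c(z) = z^2 + c on a grid over [-2, 2]^2.

    Points that never exceed |z| = bound within max_iter steps keep the
    maximal count and approximate the filled Julia set of phi_c.
    """
    xs = np.linspace(-2.0, 2.0, size)
    zs = xs[None, :] + 1j * xs[:, None]      # complex grid
    counts = np.zeros(zs.shape, dtype=int)
    alive = np.ones(zs.shape, dtype=bool)    # not yet escaped
    for _ in range(max_iter):
        zs[alive] = zs[alive] ** 2 + c       # iterate only surviving points
        alive &= ~(np.abs(zs) > bound)
        counts[alive] += 1
    return counts

# c = -1 gives the classical "basilica" Julia set (connected, with interior)
img = julia_escape_time(-1.0, size=200)
```

Rendering `img` with any image viewer (e.g. matplotlib's `imshow`) reproduces pictures of the kind shown in Figs. 8 and 9 for the corresponding c values.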

incarnations (many of them known as Ruelle operators), all of them based on N-fold branching laws. In our wavelet application, the Ruelle operator averages an input over the N branch possibilities, and the weighting is assigned by a chosen scalar function W; the W-Ruelle operator is denoted R_W. In the wavelet setting there is in addition a low-pass filter function m_0 which, in its frequency response formulation, is a function on the d-torus T^d = R^d/Z^d. Since the scaling matrix A has integer entries, A passes to the quotient R^d/Z^d, and the induced transformation r_A : T^d → T^d is an N-fold cover, where N = |det A|, i.e., for every x in T^d there are N distinct points y in T^d solving r_A(y) = x. In the wavelet case, the weight function is W = |m_0|^2. With this choice of W, the ONB problem for a candidate for a wavelet basis in the Hilbert space L^2(R^d) may, as it turns out, be decided by the dimension of a distinguished eigenspace for R_W, the so-called Perron–Frobenius problem. This has worked well for years for the wavelets which have an especially simple algorithm, the wavelets that are initialized by a single function, called the scaling function. These are called the multiresolution analysis (MRA) wavelets, or for short the MRA-wavelets. But there are instances, for example if a problem must be localized in frequency domain, when the MRA-wavelets do not suffice, and the construction will by necessity include more than one scaling function. We are then back to trying to decide whether the output from the discrete algorithm and the O_N representation is an ONB, or whether it has some stability property which will serve the same purpose, in cases where asking for an ONB is not feasible.

Future Directions
The idea of a scientific analysis by subdividing a fixed picture or object into its finer parts is not unique to wavelets. It works best for structures with an inherent self-similarity; this self-similarity can arise from numerical scaling of distances. But there are more subtle non-linear self-similarities. The Julia sets in the complex plane are a case in point [4,5,10,11,24,25]. The simplest Julia sets come from a one-parameter family of quadratic polynomials φ_c(z) = z^2 + c, where z is a complex variable and c is a fixed parameter. The corresponding Julia sets J_c have a surprisingly rich structure. A simple way to understand them is the following: consider the two branches of the inverse, β_± : z ↦ ±√(z − c). Then J_c is the unique minimal non-empty compact subset of C which is invariant under {β_±}. (There are alternative ways of presenting J_c, but this one fits our purpose. The Julia set J of


a holomorphic function, in this case z ↦ z^2 + c, informally consists of those points whose long-time behavior under repeated iteration, or rather iteration of substitutions, can change drastically under arbitrarily small perturbations.) Here "long-time" refers to large n, where φ^(n+1)(z) = φ(φ^(n)(z)), n = 0, 1, ..., and φ^(0)(z) = z. It would be interesting to adapt and modify the Haar wavelet, and the other wavelet algorithms, to the Julia sets. The two papers [13,14] initiate such a development.

Literature
As evidenced by a simple Google check, the mathematical wavelet literature is gigantic in size, and the manifold applications spread over a vast number of engineering journals. While we cannot do justice to this vast literature, we instead offer a collection of the classics [19], edited recently by C. Heil et al.

Acknowledgments
We thank Professors Dorin Dutkay and Judy Packer for helpful discussions. Work supported in part by the U.S. National Science Foundation.

Bibliography
1. Aubert G, Kornprobst P (2006) Mathematical problems in image processing. Springer, New York
2. Baggett L, Jorgensen P, Merrill K, Packer J (2005) A non-MRA C^r frame wavelet with rapid decay. Acta Appl Math 1–3:251–270
3. Bratteli O, Jorgensen P (2002) Wavelets through a looking glass: the world of the spectrum. Birkhäuser, Boston
4. Braverman M (2006) Parabolic Julia sets are polynomial time computable. Nonlinearity 19(6):1383–1401
5. Braverman M, Yampolsky M (2006) Non-computable Julia sets. J Amer Math Soc 19(3):551–578 (electronic)
6. Bredies K, Lorenz DA, Maass P (2006) An optimal control problem in medical image processing. Springer, New York, pp 249–259
7. Daubechies I (1992) Ten lectures on wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, vol 61. SIAM, Philadelphia
8. Daubechies I (1993) Wavelet transforms and orthonormal wavelet bases. Proc Sympos Appl Math, vol 47. Amer Math Soc, Providence, pp 1–33
9. Daubechies I, Lagarias JC (1992) Two-scale difference equations II. Local regularity, infinite products of matrices and fractals. SIAM J Math Anal 23(4):1031–1079
10. Devaney RL, Look DM (2006) A criterion for Sierpiński curve Julia sets. Topology Proc 30(1):163–179 (Spring Topology and Dynamical Systems Conference)
11. Devaney RL, Rocha MM, Siegmund S (2007) Rational maps with generalized Sierpiński gasket Julia sets. Topol Appl 154(1):11–27
12. Dutkay DE (2004) The spectrum of the wavelet Galerkin operator. Integral Equations Operator Theory 4:477–487
13. Dutkay DE, Jorgensen PET (2005) Wavelet constructions in non-linear dynamics. Electron Res Announc Amer Math Soc 11:21–33
14. Dutkay DE, Jorgensen PET (2006) Hilbert spaces built on a similarity and on dynamical renormalization. J Math Phys 47(5):20
15. Dutkay DE, Jorgensen PET (2006) Iterated function systems, Ruelle operators, and invariant projective measures. Math Comp 75(256):1931–1970
16. Dutkay DE, Jorgensen PET (2006) Wavelets on fractals. Rev Mat Iberoam 22(1):131–180
17. Dutkay DE, Roysland K (2007) The algebra of harmonic functions for a matrix-valued transfer operator. arXiv:math/0611539
18. Dutkay DE, Roysland K (2007) Covariant representations for matrix-valued transfer operators. arXiv:math/0701453
19. Heil C, Walnut DF (eds) (2006) Fundamental papers in wavelet theory. Princeton University Press, Princeton
20. Jorgensen PET (2003) Matrix factorizations, algorithms, wavelets. Notices Amer Math Soc 50(8):880–894
21. Jorgensen PET (2006) Analysis and probability: wavelets, signals, fractals. Graduate Texts in Mathematics, vol 234. Springer, New York
22. Jorgensen PET (2006) Certain representations of the Cuntz relations, and a question on wavelets decompositions. In: Operator theory, operator algebras, and applications. Contemp Math, vol 414. Amer Math Soc, Providence, pp 165–188
23. Liu F (2006) Diffusion filtering in image processing based on wavelet transform. Sci China Ser F 49(4):494–503
24. Milnor J (2004) Pasting together Julia sets: a worked out example of mating. Exp Math 13(1):55–92
25. Petersen CL, Zakeri S (2004) On the Julia set of a typical quadratic polynomial with a Siegel disk. Ann Math (2) 159(1):1–52
26. Skodras A, Christopoulos C, Ebrahimi T (2001) JPEG 2000 still image compression standard. IEEE Signal Process Mag 18:36–58
27. Song MS (2006) Wavelet image compression. PhD thesis, University of Iowa
28. Song MS (2006) Wavelet image compression. In: Operator theory, operator algebras, and applications. Contemp Math, vol 414. Amer Math Soc, Providence, pp 41–73
29. Strang G (1997) Wavelets from filter banks. Springer, Singapore, pp 59–110
30. Strang G (2000) Signal processing for everyone. In: Computational mathematics driven by industrial problems (Martina Franca 1999). Lect Notes Math, vol 1739. Springer, Berlin, pp 365–412
31. Strang G, Nguyen T (1996) Wavelets and filter banks. Wellesley-Cambridge Press, Wellesley
32. Usevitch BE (2001) A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000. IEEE Signal Process Mag 18:22–35
33. Walker JS (1999) A primer on wavelets and their scientific applications. Chapman & Hall/CRC

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination
SUI HUANG, STUART A. KAUFFMAN
Institute for Biocomplexity and Informatics, Department of Biological Sciences, University of Calgary, Calgary, Canada

Article Outline
Glossary
Definition of the Subject
Introduction
Overview: Studies of Networks in Systems Biology
Network Architecture
Network Dynamics
Cell Fates, Cell Types: Terminology and Concepts
History of Explaining Cell Types
Boolean Networks as Model for Complex GRNs
Three Regimes of Behaviors for Boolean Networks
Experimental Evidence from Systems Biology
Future Directions and Questions
Bibliography

Glossary
Transcription factor A protein that binds to the regulatory region of a target gene (its promoter or enhancer regions) and thereby controls its expression (transcription of the target gene into an mRNA, which is ultimately translated into the protein encoded by the target gene). Transcription factors account for the temporal and contextual specificity of the expression of genes; for instance, a developmentally regulated gene is expressed only during a particular phase in development and in particular tissues.
Gene regulatory network (GRN) Transcription factors regulate the expression of other transcription factor genes as well as of 'non-regulatory' genes which encode proteins such as metabolic enzymes or structural proteins. A regulatory relationship between two genes is thus formalized as "transcription factor A is the regulator of target gene B", or A → B. The entirety of such regulatory interactions forms a network: the gene regulatory network (GRN). Synonyms: genetic network, gene network, transcriptional network.
Gene network architecture and topology The GRN can be represented as a "directed graph". The latter consists of a set of nodes (= vertices) representing the genes, connected by arrows (directed links or edges)

representing the regulatory interactions, pointing from the regulator to the regulated target gene. [In contrast, in an undirected graph the links are simple lines without arrowheads; the protein-interaction network can be represented as an undirected graph.] The topology of a network is the structure of this graph: an abstract notation, without physicality, of all the potential regulatory interactions between the genes. Topology is usually used to denote the simple interactions captured by the directed graph. For defining the network dynamics, however, additional aspects of the interactions need to be specified, including the modalities or "sign" of an arrow (inhibitory vs. activating regulation), the 'transfer functions' (the relationship between the magnitude of the input and that of the output, i.e. the target gene) and the logical function (notably, in Boolean networks, defining how multiple inputs are related to each other and are integrated in shaping the output). In this article, when all this additional information is implied to be included, the term gene network architecture is used. Thus, the graph topology is a subset of the network architecture.
Gene network dynamics The collective change of the expression levels of the genes in a network; essentially, the change over time of the network state S.
State space Phase space: the abstract space that contains all possible states S of a dynamical system. For (autonomous) gene regulatory networks, each state S is specified by the configuration of the expression levels of each of the N genes of the network; thus a system state S is one point in the N-dimensional state space. As the system changes its state over time, S moves along trajectories in the state space.
Transcriptome The gene expression pattern across the entire genome (or a large portion of it), measured at the level of mRNA abundance. Used as a synonym of "gene expression profile".
The transcriptome can, in a first approximation, be considered a snapshot of the network state S of the GRN (see Gene network dynamics).
Cell type A distinct, whole-cell phenotype characteristic of a mature cell specialized to exert an organ-specific function. Examples of cell types are: liver cell, red blood cell, skin fibroblast, heart muscle cell, fat cell, etc. Cell types are characterized by their distinct cell morphology and their gene expression pattern.
Cell fate A potential developmental outcome of a (stem or progenitor) cell. A cell fate of a stem cell can be the development into a particular mature cell type.
Multipotency The ability of a cell to generate multiple cell types; a hallmark of stem cells. Stem cells are said to be multipotent (see also under Stem cell).


Stem cell A multipotent cell capable of "self-renewal" (division in which both daughter cells have the same degree of multipotency as the mother cell) and of giving rise to multiple cell types. There is a hierarchy of multipotency: a totipotent embryonic stem cell can generate all possible cell types of the body, including extra-embryonic tissues such as the placenta. A pluripotent embryonic stem cell can generate tissues of the three germ layers, i.e., it can produce all cell types of the foetus and the adult. A multipotent (sensu strictiore) stem cell of a tissue (e.g., blood) can give rise to all cell types of that tissue (e.g., a hematopoietic stem cell can produce all the blood cells). A multipotent progenitor cell can give rise to more than one cell type within a tissue (e.g., the granulocyte-monocyte progenitor cell).
Cell lineage The developmental trajectory of a multipotent cell towards one of multiple cell types, e.g., the "myeloid lineage" among blood cells, comprising the white blood cells granulocytes, monocytes, etc. Thus, a cell fate decision is a decision between multiple lineages accessible to a stem or progenitor cell.
Differentiation The process of cell fate decision in a stem or progenitor cell and the subsequent maturation into a mature cell type.

Definition of the Subject
Current studies of complex gene regulatory networks (GRNs), in which thousands of genes regulate each other's expression, have revealed interesting features of the network structure using graph theory methods. But how does the particular network architecture translate into biology? Since individual genes alter their expression level as a consequence of the network interactions, the genome-wide gene expression pattern (transcriptome), which manifests the dynamics of the entire network, changes as a whole in a highly constrained manner. The transcriptome in turn determines the cell phenotype.
Hence, the constraints on the global dynamics of the GRN map directly into the most elementary "biological observable": the existence of distinct cell types in the metazoan body and their development from pluripotent stem cells. In this article, a historical overview of the various levels at which GRNs are studied, from network architecture analysis to dynamics, is first presented. An introduction is given to continuous- and discrete-value models of GRNs, commonly used to understand the dynamics of small genetic circuits and of large genome-wide networks, respectively. This will allow us to explain how the intuitive metaphor of the "epigenetic landscape", a key idea that was proposed by Waddington in the 1940s to explain

the generation of discrete cell fates, formally arises from gene network dynamics. This central idea appears in its modern form, in formal and molecular terms, as the concept that cell types represent attractor states of GRNs, first proposed by Kauffman in 1969. This raises two fundamental questions currently addressed by experimental biologists in the era of "systems biology": (1) are cell types in the metazoan body indeed high-dimensional attractors of GRNs, and (2) is the dynamics of GRNs in the "critical regime", poised between order and chaos? A summary of recent experimental findings on these questions is given, and the broader implications of network concepts for cell fate commitment of stem cells are also briefly discussed. The idea of the epigenetic landscape is key to our understanding of how genes and gene regulatory networks give rise to observable cell behavior, and thus to a formal and integrated view of molecular causation in biology.

Introduction
With the rise of "systems biology" over the past decade, molecular biology is moving away from the paradigm of linear genetic pathways, which have long served as linear chains of causation (e.g., Gene A → Gene B → Gene C → phenotype) in explaining cell behaviors. It has begun to embrace the idea of molecular networks as the integrated information processing system of the cell [93,138]. The departure from the gene-centered, mechanistic 'arrow-arrow' schemes that embody 'proximal causation' [189] towards an integrative view will also entail a change in our paradigm of what an "explanation" means in biology: how do we map the collective behavior of thousands of interacting genes, obtained from molecular dissection, to the "biological observable"?
The latter term, borrowed from the statistical physics idea of the "macroscopic observable", is most prosaically epitomized in whole-cell behavior in metazoan organisms: the capacity of a cell to adopt a large variety of easily recognizable, discretely distinct phenotypes, such as a liver cell vs. a red blood cell, or different functional states, such as the proliferative, the quiescent or the apoptotic state. All these morphologically and functionally distinct phenotypes are produced by the very same set of genes of the genome, as a result of the joint action of the genes. This is achieved by the differential expression ("turning ON or OFF") of individual genes. Thus, in a first approximation, each cell phenotype is determined by a specific configuration of the status of expression of all individual genes across the genome. This genome-wide gene expression pattern or profile, or transcriptome, is the direct output of the GRN and maps almost uniquely into a cell phenotype (see overview in Fig. 1).
A network is the most elementary conceptualization of a complex system which is composed of interacting elements and whose behavior as an entity we wish to understand: it formalizes how a set of distinct elements (nodes) influence each other, as predetermined by a fixed scheme of their interactions (links between the nodes) and the modality of those interactions (rules associated with each node). In this article we focus on the gene regulatory network (GRN), the network formed by the interactions through which genes regulate each other's expression (Fig. 1a), and we ask how these interactions control the global behavior of the network and thereby govern the development of cells into the thousands of cell types found in the metazoan body. Because in a network a node exerts influence on others, we can further formalize networks as directed graphs, that is, the links are arrows pointing from one node to another (Fig. 1b). The information captured in an undirected or directed graph is the network topology (see Glossary). In addition, the arrows have a modality ("sign"), namely activating or inhibiting their target gene. But since each node can receive several inputs ("upstream regulators"), it is more appropriate to combine the modality of interaction with the way the target gene integrates the various inputs to change its expression behavior (= output). Thus, each node can be assigned a function that maps all its inputs in a specific way to the output (Fig. 1, top). For instance, "promoter logic" [209], which may dictate that two stimulating inputs (transcription factors) act synergistically, or that one inhibitory input, when present, overrides all other activating inputs, is one way to represent such an input-integrating function. Here we use network architecture as a term that encompasses both the network topology and the interaction modalities or functions.
The latter add the ingredients to the topology information that are necessary to describe the dynamics (behavior) of the network.

The Core Gene Regulatory Network (GRN) in Mammals
How complex is the effective GRN of higher, multicellular organisms such as mammals? Virtually every cell type in mammals contains the 25,000 genes of the genome [44,158,178], which could potentially interact with each other. However, in a first approximation, we do not need to deal with all 25,000 genes but only with those intrinsic regulators that have a direct influence on other genes. A subset of roughly 5–10% of the genes in the genome encode transcription factors (TFs) [194], a class of DNA-binding proteins that regulate the expression of genes by binding to the promoter regions of their 'target genes' (Fig. 1a). Another few hundred loci in the genome encode microRNAs (miRNAs), which are transcribed to RNA but do not encode proteins. miRNAs regulate the expression of genes by interfering with the mRNAs of their target genes based on sequence complementarity (for reviews see [41,89,141]). Thus, in this discussion we can assume a regulatory network of around 3000 regulator genes rather than the 25,000 genes of the genome. The genes that do not encode TFs are the "effector genes", encoding the work horses of the cell, including metabolic enzymes and structural proteins. In our approximation, we also assume that the effector genes do not directly control other genes (although they may have global effects, such as a change of pH, that affect the expression of some genes). Further, we do not count genes encoding proteins of the signal transduction machinery, since they act to mediate external signals (such as hormones) that can be viewed as perturbations to the network and can in a first approximation be left out when discussing cell-intrinsic processes. Thus, the directed graph describing the genome-wide GRN has a "medusa" structure [112], with a core set of regulators (the medusa head) and a periphery of regulated genes (the arms) which in a first approximation do not feed back to the core.

The Core GRN as a Graph That Governs Cell Phenotype
The next question is: do the 3000 core regulatory genes form a connected graph, or rather independent (detached) "modules"? The idea of modularity would have justified the classical paradigm of independent causative pathways and has in fact actively been promoted, in an attempt to mitigate the discouragement in view of the unfathomable complexity of the genome [85].
Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination, Figure 1
Overview: concepts at different levels for understanding the emergence of biological behavior from genes and networks. a Elementary interaction: regulator gene X_1 interacts with target (regulated) gene X_2, simplified as one link of a graph, and the corresponding standard 'cartoon' it represents. b Notation of gene regulatory network topology as a directed graph. c Schematic of a higher-dimensional gene expression state space, showing that a network state S(t) maps onto one point (denoted by a cross) in the state space. The dashed curve/arrow represents a state space trajectory. d Spotted DNA microarray for measuring gene expression (mRNA) profiles, representing a network state S(t), and e the associated "GEDI map" visualization of the gene expression pattern. The map is obtained using the program GEDI [57,83]. This program places genes that behave similarly (with respect to their expression in a set of microarray measurements) onto the same pixel (= minicluster of genes) of the map. Similar miniclusters are arranged in nearby pixels of the two-dimensional picture, an n × m array of pixels. The assignment of genes to a pixel is achieved by a self-organizing map (SOM) algorithm. The color of each pixel represents the centroid gene expression level (mRNA abundance) of the genes in the minicluster. For a stack of GEDI maps, all genes are forced to be assigned to the same pixels in the different maps; hence the global coherent patterns of each GEDI map allow for a one-glance 'holistic' comparison of gene expression profiles of different conditions S(t) (tissue types, time points in a trajectory, etc.). f Scheme of the branching development of a subset of four different blood cells with their distinct GEDI maps, starting from the multipotent CMP = common myeloid progenitor cell [150]. MEP = Megakaryocyte-Erythroid progenitor cell; GMP = Granulocyte-Monocyte progenitor cell

While a systematic survey that would provide a precise number is still not available, we can, based on patchy knowledge from the study of individual TFs, safely assume that a substantial fraction of TFs control the expression of more than one other TF. Many of them also control entire batteries [50] of effector genes, while perhaps a third subset may be specialized in regulating solely the effector genes. In any case, the core regulatory network of 3000 nodes controls, directly or indirectly, the entire gene expression profile of 25,000 genes, and hence the cell phenotype. Then, assuming that on average each TF controls at least two (typically more) other TFs [128,188], and considering statistical properties of randomly evolved networks (graphs) [36], we can safely assume that the core transcriptional network is a connected graph, or at least that its giant component (largest connected subgraph) covers the vast majority of its nodes. This appears to be the case in the GRNs of simple organisms, for which more data are available for parts of the network, as in yeast [128], C. elegans [54] and sea urchin [50], or for even more limited subnetworks in mammalian systems [32,183], although studies focused on selected subnetworks may be subject to investigation bias. However, recent analyses of DNA binding by 'master transcription factors' in mammalian cells, using systematic (hence in principle unbiased) chromatin immunoprecipitation techniques [177], show that they typically bind to hundreds if not thousands of target genes [48,70,105,155], strongly suggesting a global interconnectedness of the GRN. This, however, does not exclude the possibility that the genome-wide network may exhibit some "modularity" in terms of weakly connected modules that are locally densely connected [166].

Structure of This Article
The goal of this article is to present both to biologists and physicists a set of basic concepts for understanding how the maps of thousands of interacting genes that systems biology researchers are currently assembling ultimately control cell phenotypes. In Sect. "Overview: Studies of Networks in Systems Biology" we present an overview, addressed to experimental biologists, of the history of the analysis of networks in systems biology. In Sect. "Network Architecture" we briefly discuss core issues in studies of network topology, before explaining basic ideas of network dynamics in Sect. "Network Dynamics" based on two-gene circuits. The central concepts of multi-stability will

be explained to biologists, assuming a basic calculus background. Waddington's epigenetic landscape will also be addressed in this connection. In Sect. "Cell Fates, Cell Types: Terminology and Concepts" we discuss the actual "biological observable" that the gene regulatory network controls, introducing to non-biologists central concepts of cell fate regulation. Sect. "History of Explaining Cell Types" offers a historical overview of various explanations for metazoan cell type diversity, including dynamical, network-based concepts as well as the more 'reductionist' explanations that still prevail in current mainstream biology. Here the formal link between network concepts and Waddington's landscape metaphor will be presented. We then turn from small gene circuits to large, complex networks, and in Sect. "Boolean Networks as Model for Complex GRNs" we introduce the model of random Boolean networks, which, despite their simplicity, have provided a useful conceptual framework and paved the way to learning what an integrative understanding of global network dynamics would look like, leading to the first central hypothesis: cell types may be high-dimensional attractors of the complex gene regulatory network. In Sect. "Three Regimes of Behaviors for Boolean Networks" the more fundamental dynamical properties of ordered, critical and chaotic behavior are discussed, leading to the second hypothesis: networks that control living cells may be in the critical regime. In Sect. "Experimental Evidence from Systems Biology" we summarize current experimental findings that lend initial support to these ideas, and in Sect. "Future Directions and Questions" we conclude with an outlook on how these general concepts may impact future biology.
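The multi-stability of small gene circuits discussed above can be previewed with a minimal numerical sketch (the mutual-repression form, the Hill-type terms and all parameter values are our own toy choices, not taken from this article): two genes that repress each other create two coexisting stable states, the simplest caricature of a binary cell fate decision.

```python
def toggle_switch(x0, y0, a=3.0, n=4, dt=0.01, steps=20000):
    """Forward-Euler integration of a toy mutual-repression circuit:
       dx/dt = a / (1 + y**n) - x
       dy/dt = a / (1 + x**n) - y
    (toy parameters; illustrative only)."""
    x, y = x0, y0
    for _ in range(steps):
        dx = a / (1.0 + y ** n) - x
        dy = a / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y

# two different initial conditions settle into two distinct stable
# states (bistability): one with gene x high, one with gene y high
hi_x = toggle_switch(2.0, 0.0)
hi_y = toggle_switch(0.0, 2.0)
```

Which state the circuit reaches depends only on the initial condition, the elementary picture behind the "valleys" of Waddington's epigenetic landscape.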

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination, Table 1
Overview of typical directions and levels in the study of gene regulatory networks

A. Network Architecture
  A1. Determination of network architecture
    - Experimental
    - Theoretical (= network inference)
  A2. Analysis of network architecture
    - Identification/characterization of "interesting" structural features
B. Network Dynamics
  For small networks:
    B1. Modeling: theoretical prediction of the behavior of a real circuit based on (partially known, assumed) architecture + experimental verification
  For complex networks (full architecture not known):
    B2. Theoretical: study of the generic dynamics of simulated ensembles of networks of particular architecture classes
    B3. Experimental: measurement and analysis of high-dimensional state-space trajectories of a real network
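Level B2 in the table, the study of the generic dynamics of simulated network ensembles, can be sketched with a minimal random Boolean network (the values n = 10, k = 2 and the seed are arbitrary illustrative choices of ours): under synchronous updating, every initial state is driven into an attractor cycle, the discrete analogue of the attractors discussed later in this article.

```python
import random

def random_boolean_network(n, k, seed=0):
    """Build a random N-K Boolean network: each of the n genes reads k
    random inputs through a random Boolean function (truth table)."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]

    def step(state):
        # synchronous update: every gene reads its inputs simultaneously
        return tuple(
            tables[i][sum(state[j] << b for b, j in enumerate(inputs[i]))]
            for i in range(n)
        )
    return step

def find_attractor(step, state):
    """Iterate until a state repeats; return the attractor cycle in order."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state)
        t += 1
    start = seen[state]  # first time the repeating state was visited
    return [s for s, when in sorted(seen.items(), key=lambda kv: kv[1])
            if when >= start]

step = random_boolean_network(n=10, k=2, seed=1)
attractor = find_attractor(step, tuple([0] * 10))
```

Repeating this over many random networks and initial states is exactly the ensemble approach: one measures attractor numbers and lengths as a function of n and k rather than studying a single wiring diagram.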


Overview: Studies of Networks in Systems Biology
Cellular networks of biological molecules currently studied by "systems biologists" encompass three large classes: metabolic networks, protein-protein interaction networks and gene regulatory (transcriptional) networks (GRNs). The formalization of these systems into networks is often taken for granted, but it is an important issue. It is noteworthy that metabolic network diagrams [193] represent physical networks in that there is a flow of information or energy in the links; thus, the often used metaphors inspired by man-made transport or communication networks ('bottlenecks', 'hubs', etc.) are more appropriate here than in the other two classes. Moreover, metabolic networks are subject to the constraint of mass preservation at each node, in compliance with Lavoisier's mass conservation in chemistry. In contrast to such flow networks, protein-protein interaction networks and GRNs are abstract notations of "influence networks" in which the nodes influence the behavior of other nodes, directly or indirectly, as represented by the links of the graph. There is no actual flow of matter in the links (although of course information is exchanged), and there is no obvious physical law that constrains the architecture, thus allowing for much richer variations of the network structure. The links are hence abstract entities, representing potential interactions assembled from independent observations that may not coexist in a particular situation. Thus, in this case the network is rather a convenient graphical representation of a collection of potential interactions. Studies of such biomolecular influence networks can fundamentally be divided into two levels (Table 1): (A) network architecture and (B) network dynamics (system behavior).
The study of network architecture in molecular biology can further be divided into (A1) efforts to determine the actual graph structure that represents the specific interactions of the genes for a particular instance (a species) and (A2) the more recent analysis of its structure, e.g., for interesting topological features [2]. The determination of the graph structure of genome-wide GRNs in turn is achieved either (i) by direct experimental demonstration of the physical interactions, which has in the past decade greatly benefited from novel massively parallel technologies, such as chromatin IP, promoter one-hybrid or protein-binding DNA microarrays [35,54,177], or (ii) via theoretical inference based on observed correlations in gene expression behavior from genome-wide gene expression profiling experiments. Such correlations are the consequence of mutual dependencies of expression in the influence network.

The next section provides a concise overview of the study of network architecture, which due to limitations of the available data is mostly an exercise in studying graph topology, and briefly discusses associated problems.

Network Architecture

Network Inference

An emerging challenge is the determination of entire networks, including the connection graph (topology), the direction of interactions ("arrows") and, ideally, the interaction modality and logical function, using systematic inference from genome-wide patterns of gene expression (the "inverse problem"). Gene expression profiles (transcriptomes) represent a snapshot of the mRNA levels in cell or tissue samples. The arrival of DNA microarrays for efficiently measuring gene expression profiles of almost all the genes in the genome has stimulated the development of algorithms that address this daunting inverse problem, and the number of proposed algorithms has recently exploded. Most approaches concentrate on inference based on the more readily available static gene expression profiles, although time-course microarray experiments, in which the time evolution of transcriptomes is monitored at close intervals, would be advantageous, especially for inferring the arrow direction of network links (directed graph) and the interaction functions. However, because of uncertainties and the paucity of experimental data, systematic network inference faces formidable technical and formal challenges, and most theoretical work has been developed and tested based on simulated networks. A fundamental concern is that microarray-based expression levels reflect the average expression of a population of millions of cells that – as recently demonstrated – exhibit vastly diverse behaviors even when they are clonal. Thus, the actual quantity used for inference is not a direct but a convoluted manifestation of network regulation (this issue is discussed in Sect. "Are Cell Types Attractors?").
Moreover, while mRNA levels reflect relatively well the activity status of the corresponding gene promoter, i.e., revealing the regulated activity, they are poor indicators of the regulating activity of a gene, because of the loose relationship between the mRNA level and the effective activity of the transcription factor it encodes. Since the true network architecture (the 'gold standard') is not known, validation of the theoretical approaches remains unsatisfactory. Nevertheless, the recent availability of large numbers of gene expression profiles and the increasing (although not complete) coverage of gene regulation maps for single-cell organisms (notably E. coli and yeast) open the opportunity to directly study the mapping between
gene expression profile and network structure [39,42,61,88,135,190]. Here we refer to [2,131,139] for a survey of inference methods and instead briefly discuss the study of network topology before we move on to network dynamics.

Analyzing Network Structures

Once the network topology is known, even if direction, modality and logic of links are not specified to offer the complete system architecture, it can be analyzed using graph theory tools for the presence of global or local structures (subgraphs) that are "interesting". A potentially interesting feature can be defined as one that cannot be explained by chance occurrence in some randomized version of the graph (null model), i.e., one that departs from what one would expect in a "random" network (see below). Most of the graph-theoretical studies have been stimulated by the protein–protein interaction networks, which have been available for some years, of which the best characterized is that of the yeast S. cerevisiae [29]. Such networks represent non-directed graphs, since the links between the nodes (proteins) have been determined by the identification of physical protein interactions (heterodimer or higher complex formation). Here we provide only a cursory overview of this exploding field, while focusing on conceptual issues. A large array of structural network features, many of them inspired by the study of ecological and social networks [152,160,199], have been found in biomolecular networks. These features include global as well as local features, such as, to mention a few, the scale-free or broad-scale distribution of the connectivity ki of the nodes i [3,10,20], betweenness of node i [107,208], hierarchical organization [164], modularity [102,121,133,166,206], assortativity [140], and enrichment for specific local topology motifs [147] (for a review of this still expanding field see [2,21,29,152]).
The global property of a scale-free distribution of the connectivity ki, which attracted most attention early on and quickly entered the vocabulary of the biologist, means that the probability P(ki) for an arbitrary node (gene) i in the network to have connectivity ki has the form P(ki) ∝ ki^(−γ), where the characteristic constant γ is the power-law exponent – the slope of the line in a P(ki) vs. ki double-logarithmic plot. This distribution implies that there is no characteristic scale, i.e., no stable average value of k: sampling a larger number of nodes N will lead to larger "average" k values. In other words, there is an "unexpectedly" high fraction of nodes which are highly connected ("hubs") while the majority exhibits low connectivity. This property has attracted as much interest as it has stirred controversy, because of the connotation of "universality" of scale-freeness on the one hand, and several methodological concerns on the other hand [21,24,73,76,175,181,193].

The Problem of Choosing the Null Model

In addition to well-articulated caveats due to incompleteness, bias and noise of the data [29,52], an important general methodological issue in the identification of structural features of interest, especially if conclusions on functionality and evolution are drawn, is the choice of the appropriate "null model" – or "negative control" in the lingo of experimentalists. A totally random graph obviously is not an ideal null model, for on its background any bias due to obvious constraints imposed by the physical reality of the molecular network will appear non-random and hence be falsely labeled "interesting", even if one is not interested in the physical constraints but in evidence for functional organization [14,93]. For instance, the fact that gene duplication, a general mechanism of genome growth during evolution, will promote the generation of the network motif in which one regulator regulates two target genes needs to be considered before a "purposeful" enrichment for such a motif due to increased fitness is assumed [93]. Similarly, the rewiring of artificial regulatory networks through reshuffling of cis and trans regions, or the construction of networks based on promoter-sequence information content, reveals constraints that lead to a bias for particular structures in the absence of selection pressure [18,46]. The problem amounts to the practical question of which structural property of the network should be preserved when randomizing an observed graph to generate a null model. Arbitrarily constrained randomization based on preservation of some a priori graph properties [18,140] thus may not suffice.
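The practical question above, namely which structural properties to preserve under randomization, is often answered operationally by degree-preserving "double edge swaps", which shuffle links while keeping every node's connectivity ki fixed. The following is a minimal illustrative sketch of this null-model construction; the toy graph and swap count are arbitrary choices, not taken from the text:

```python
import random

random.seed(0)

def rewire(edges, n_swaps=1000):
    """Degree-preserving randomization of an undirected graph by double edge swaps:
    pick two edges (a,b),(c,d) and rewire them to (a,d),(c,b), rejecting any swap
    that would create a self-loop or a duplicate edge."""
    edges = [tuple(e) for e in edges]
    eset = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = random.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:          # would create a self-loop
            continue
        if frozenset((a, d)) in eset or frozenset((c, b)) in eset:
            continue                        # would create a multi-edge
        eset -= {frozenset((a, b)), frozenset((c, d))}
        eset |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def degrees(edges):
    """Connectivity ki of every node, from an undirected edge list."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

toy_graph = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 4)]
null_instance = rewire(list(toy_graph), n_swaps=200)
```

Every swap preserves all four node degrees, so an ensemble of such rewired graphs keeps the degree distribution (e.g., scale-freeness) fixed while destroying higher-order structure, exactly the kind of constrained null model discussed above.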
However, the question of which feature to keep cannot be satisfactorily answered given the lack of detailed knowledge of all physical processes underlying the genesis and evolution of the network. The more one knows of the latter, the more appropriate constraints can be applied to the randomization in order to expose novel biological features that cannot be (trivially) explained by the neglected elementary construction constraints.

Evolution of Network Structure: Natural Selection or Natural Constraint?

This question goes beyond the technicality of choosing a null model, for it reaches into the realm of a deeper question of evolutionary biology: which features are inevitably linked to the very physical processes through which networks have grown in evolution, and which arose due to natural selection because they confer a survival advantage [15,77,78,93,203]? Often this question is taken for granted and an almighty selection process is assumed that can produce any structure as long as it contributes sufficiently to fitness. This, however, would require that during natural selection the random, mutation-driven reshuffling of nodes and connections has no constraints and that Darwinian evolution explores the entire space of possible architectures. Clearly this is not the case: physical constraints on the rearrangement of DNA (insertions, deletions, duplications, conversion, etc.) [16,170,187] as well as graph-theoretical considerations channel the possibilities for how one graph can transform into another [93,203]. For instance, growth of the network due to the increase of genome size (gene number) by gene duplication can, without the invisible hand of selection, give rise to the widely found scale-free structure [23,186], although it remains to be seen whether this mechanism (and not some more fundamental statistical process) accounts for the ubiquitous (near) scale-free topology. The fact that the scale-free (or at least broad-scale [10]) architecture has dynamical consequences (see Sect. "Architectural Features of Large Networks and Their Dynamics") raises the question whether properties such as robustness may be inherent to some structure that is "self-organized", rather than sculpted by the invisible hand of natural selection. Thus, one certainly cannot simply argue that the scale-free structure has evolved "because of some functional advantage". Instead, it should be recalled here that natural selection can benefit from spontaneous, self-organized structures [116]. This structuralist view [200], in which the physically most likely and not the biologically most functional is realized, needs to be considered when analyzing the anatomy of networks [93].
In summary, the choice of the null model has to be made carefully and requires knowledge of the biochemical and physicochemical processes that underlie genome evolution. This methodological caveat has its counterpart in the identification of "interesting" nucleotide sequence motifs when analyzing genome sequences [167].

Gene-Specific Information: More Functional Analysis Based on Topology

Beyond pure graph-theoretical analysis, there have been attempts to link the topology with functional biological significance. For instance, one question is how the global graph structure changes (such as the size of the giant component) when nodes or links are randomly or selectively (e.g., hubs) removed. It should be noted that the term
"robustness" used in such structural studies [5] refers to networks with the aforementioned connotation of transport or communication function with flow in the links, and thus differs fundamentally from robustness in a dynamical sense in the influence networks that we discuss below. Another approach towards connecting network topology with biological functionality is to employ bioinformatics and consider the biological identity of the genes or proteins represented by the nodes. Then one can ask whether some graph-related node properties (e.g., degree of connectivity, betweenness/centrality, contribution to network entropy, etc.) are correlated with known biological properties of the genes, of which the most prosaic is the "essentiality" of the protein, derived from genetic studies [84,137,207,208]. To mention just a few of the earlier studies of this expanding field, it has been suggested that proteins with large connectivity ("hubs") appear to be enriched for "essential proteins" [104], that hubs evolve more slowly [68], and that they tend to be older in evolution [58] – in accordance with the model of preferential attachment that generates the scale-free distribution of the connectivity ki. However, many of these findings have been contested for statistical or other reasons [27,28,67,106,107]. The conclusions of such functional bioinformatics analyses need to be re-examined when more reliable and complete data become available.

Network Dynamics

While the graph-theoretical studies, including the ramifications into core questions of evolution outlined above, are interesting eo ipso, life is breathed into these networks only with the 'dynamics'. Understanding the system-level dynamics of networks is an essential step in our desire to map the static network diagrams, which are merely "anatomical observables", onto the functional observable of cell behavior.
Dynamics is introduced by considering that a given gene (node) has a time-dependent expression value xi(t), which in a first approximation represents the activity state of gene i (an active gene is expressed and post-translationally activated). Instead of seeing dynamics as a sequence of gene activations, epitomizing the chain of causation of the gene-centric view (as outlined in Sect. "Introduction"), a goal in the study of complex systems is to understand the integrated, "emergent" behavior of systems as a holistic entity. We thus define a system (network) state S(t) as

S(t) = [x1(t), x2(t), ..., xN(t)],

which is determined by the expression values xi of all the N genes of a network, which change over time. It is obvious that not all theoretically possible state configurations S can
be realized, since the individual variables xi cannot change independently. This global constraint on S imposed by the gene regulatory interactions ultimately determines the "system behavior", which is equivalent to whole-cell phenotype changes, such as switching between cell phenotypes. Thus, one key question is: given a particular network architecture, what is the dynamics of S, and does it reproduce typical cell behaviors, or even predict the particular behavior of a specific cell? Before plunging into complex GRNs, let us introduce basic concepts of modeling dynamics using small circuits (Fig. 2; B1 in Table 1).
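To make this question concrete, the following toy sketch, in the spirit of the random Boolean network models discussed later in this article, shows how a fixed architecture constrains the trajectory of S. All choices here (N = 8 genes, K = 2 inputs per gene, the random seed) are arbitrary illustrations: the wiring and rule tables play the role of the architecture, and iterating the update rule traces S(t) until it falls onto an attractor.

```python
import random

random.seed(1)
N, K = 8, 2  # toy sizes: 8 genes, each regulated by K = 2 others

# Architecture: which genes feed into gene i, and a Boolean rule table per gene.
inputs = [random.sample(range(N), K) for _ in range(N)]
tables = [[random.randint(0, 1) for _ in range(2 ** K)] for _ in range(N)]

def step(state):
    """Synchronous update: each gene looks up its rule with its inputs' current values."""
    return tuple(
        tables[i][sum(state[j] << b for b, j in enumerate(inputs[i]))]
        for i in range(N)
    )

def attractor_length(state):
    """Iterate S(t) until a state repeats; the repeating cycle is the attractor."""
    seen = {}
    while state not in seen:
        seen[state] = len(seen)
        state = step(state)
    return len(seen) - seen[state]  # cycle length (1 = fixed point)

start = tuple(random.randint(0, 1) for _ in range(N))
cycle = attractor_length(start)
```

Because the state space is finite (2^N configurations) and the update deterministic, every trajectory must eventually cycle; such recurrent cycles are the discrete analogue of the attractors discussed below.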

Small Gene Circuits

Basic Formalism

The dynamics of the network can be written as a system of ordinary differential equations (ODEs) that describe the rate of change of xi as a function of the state of all the xj:

dx/dt = F(x),   (1)

where x is the state vector x(t) = [x1, x2, ..., xN] for a network of N genes, and F describes the interactions

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination, Figure 2
From gene circuit architecture to dynamics, exemplified on a bistable two-gene circuit. a Circuit architecture: two mutually inhibitory genes X1 and X2 which are expressed at constant rate and inactivated with first-order kinetics ("bistable toggle switch"). b Typical ODE system equations for the circuit in a (see [71,109]). c State space (x1–x2 phase plane). Each dot is an example initial state S0 = [x1(t = 0), x2(t = 0)], with the emanating tick revealing the direction and extent that the state S0 would travel in the next time unit Δt. Solid circles, S1* and S2*, denote stable fixed-points ("attractors"); the empty circle denotes an unstable fixed-point (saddle-node). d, e, f Various schematic representations of the probability P(S) (for a noisy circuit) of finding the system in state S = [x1, x2]. The "elevation" (z-axis over the x1–x2 plane) is calculated as –ln(P); thus, the most probable (= stable) states are the lowest in the emerging landscape. g Waddington's metaphoric epigenetic landscape, in the 1957 version [198]
(including the interaction matrix), defining how the components influence each other. Concretely, for individual genes in small circuits, e.g. of two genes (N = 2):

dx1/dt = f1(x1, x2)
dx2/dt = f2(x1, x2)   (1a)

Here the functions fi are part of the network "architecture" that determines how the inputs onto gene i jointly determine its dynamical behavior – as defined in Sect. "Introduction". An example of f is shown in Fig. 2b. Its form is further specified by system parameters that are constants for the time period of observation. The solution of these system equations is the movement of the state vector x in the N-dimensional gene expression state space spanned by the xi:

x(t) = S(t) = [x1, x2, ..., xN].   (2)
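As a concrete instance of Eq. (1a), the mutual-inhibition toggle switch of Fig. 2 can be integrated numerically. The Hill-type rate laws and all parameter values (a, n, k, step size) below are illustrative assumptions in the generic form of typical toggle-switch models (cf. [71,109]), not the article's own equations:

```python
def simulate(x1, x2, a=2.0, n=4, k=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of a two-gene toggle switch:
        dx1/dt = a / (1 + x2**n) - k*x1   (synthesis repressed by x2, first-order decay)
        dx2/dt = a / (1 + x1**n) - k*x2   (symmetric: repressed by x1)
    All parameter values are illustrative assumptions."""
    for _ in range(steps):
        dx1 = a / (1.0 + x2 ** n) - k * x1
        dx2 = a / (1.0 + x1 ** n) - k * x2
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2

# Two initial states on opposite sides of the separatrix end in different attractors:
s_a = simulate(1.5, 0.1)   # settles near S1* (x1 high, x2 low)
s_b = simulate(0.1, 1.5)   # settles near S2* (x2 high, x1 low)
```

The two runs converge to the two stable fixed points of Fig. 2c, while the symmetric state x1 = x2 lies on the unstable saddle between them.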

The Network State

For a larger number of genes, it is convenient to describe the dynamics of the network with the vector x, which roughly represents the system state S, Eq. (2), that biologists measure using DNA microarrays and that is known as the "gene expression profile" (see Sect. "Definition of the Subject", Fig. 1). We will thus refer to S instead of the vector x in discussing biological system states. Since Eq. (1) is a first-order differential equation, S specifies a system state at time t, which for t = 0 represents an initial condition. In other words, unlike in macroscopic mechanical systems with inertia, in which the velocity dxi/dt of the "particles" is also needed to specify an initial state, in cells a photographic snapshot of all the positions xi, that is S(t) itself, specifies the system state. (This will become important for experimental monitoring of network dynamics.)

Role of Dynamical Models in Biology

Small networks, or more precisely "circuits" of a handful of interacting proteins and genes, have long been the object of modeling efforts in cell biology that use ODEs of the type of Eq. (1a) to model the behavior of the circuit [109,192] (Table 1, B1). In contrast, the objects of interest in the study of complex systems sciences are large, i.e., "complex" networks of thousands of nodes, such as the 3000-node core GRN mentioned earlier; their study has largely been driven by experimental monitoring of S(t) using microarrays, since the function F that represents the architecture of the network is not known (Table 1, B3). What is the actual aim of modeling a biological network? In mathematical modeling of small gene circuits
or of signal transduction pathways, one typically predicts the temporal evolution (time course) of the concentration of the modeled variables x1(t) or x2(t), and characterizes critical points, such as stable steady states or oscillations, in the low-dimensional state space, e.g. in the x1–x2 phase plane, after solving the system equations, as exemplified in Figs. 2 and 3. Unknown parameter values have to be estimated, either based on previous reports or obtained by fitting the modeled xi(t) to the observed time course. Successful prediction of behaviors also serves to validate the assumed circuit architecture. From the stability analysis of the resulting behavior [151], generalizations as to the dynamical robustness (low sensitivity to noise that causes x(t) to randomly fluctuate) and structural robustness (preservation of similar dynamical behavior even if the network architecture is slightly rewired by mutations) can be made [7,34]. Both types of robustness of the system should not be confounded with the robustness sometimes encountered in the analysis of network topology, where preservation of some graph properties (such as global connectedness) upon deletion of nodes or links is examined ("error tolerance") (as discussed in Sect. "Network Architecture"). Although in reality most genes and proteins are embedded in the larger, genome-wide network, small, idealized circuit models – which implicitly accept the absence of many unknown links as well as of inputs from nodes in the global network outside the considered circuit – often surprisingly well predict the observed kinetics of the variables of the circuit. This not only points to an intrinsic structural robustness of this class of natural circuits, in that circuit architectures slightly different from that of the real system can generate the observed dynamics.
But it also suggests that, for some not well-understood reason, local circuits are to some extent dynamically insulated from the larger network in which they are embedded, although they are topologically not detached. Such "functional modularity" may be a property of the particular architecture of the evolved complex GRNs. In fact, in models of complex networks (discussed in the next section), evidence for just such "functional modularity" has been found with respect to network dynamics [116]. In brief preview (see Sect. "Three Regimes of Behaviors for Boolean Networks"), work on random Boolean network models of GRNs has revealed three "regimes" of dynamical behavior: ordered, critical, and chaotic, as described in Sect. "Ordered and Chaotic Regimes of Network Behavior". In the ordered and critical regimes, many or most genes become "frozen" in "active" or "inactive" states, leaving behind functionally isolated "islands" of genes which, despite being connected to other islands
Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination, Figure 3
Expansion of the bistable two-gene circuit by self-stimulation – creating a central metastable attractor. a The bistable circuit of Fig. 2 and associated epigenetic landscape, now as contour graph. b If the two genes exert self-activation, then, for a large range of parameters and additive effect of the two inputs that each gene receives, the epigenetic landscape is altered such that the central unstable fixed-point in a becomes locally stable, S3*, giving rise to robust "tristable" dynamics [99]. c Two specific examples of observed gene circuits that represent the circuit of b. Such network motifs are typically found to regulate the branch points where multipotent progenitor cells make binary cell fate decisions [99,162]. The metastable central attractor S3* can be modeled as representing the metastable bi-potent progenitor cell which is poised in a state of locally stable indeterminacy between the two prospective fate attractors S1* and S2*

through the interactions of the network, are free to vary their activities without impacting the behaviors of other functionally isolated islands of genes.

Complex Networks

Despite a recent spate of publications in which the temporal change of expression of individual genes has been predicted based on small circuit diagrams, such predictions do not provide understanding of the integrated cell behavior, such as the change of cell phenotype, which may involve thousands of genes in the core GRN. Thus, analysis of
genome-wide network states is needed to understand the biological observable. In a first approximation, it is plausible that the state of a cell, such as the particular cell (pheno-)type in a multi-cellular organism, is defined by its genome-wide gene expression profile, or transcriptome T = [x1, x2, ..., xN] with N = 25,000. In fact, microarray analysis of various cells and tissues reveals globally distinct, tissue-specific patterns of gene expression profiles that can easily be discerned, as shown in Fig. 1f. As mentioned above (Sect. "Definition of the Subject"), the gene expression profile across the genome is governed by the core regulatory network of transcription factors (TFs)


which enslave the rest of the genome. Thus, in our approximation the network state S of the core transcriptional network of 3000 or so genes essentially controls the entire (genome-wide) gene expression profile. For clarity of formalization, it is important to note that one genome in principle encodes exactly one fixed network, since the network connections are defined by the specific molecular interactions between the protein structure of TFs and the DNA sequence motif of the cis-regulatory promoter elements they recognize. Both are encoded by the genomic sequence. The often encountered notion that “networks change during development” and that “every cell type has its own network” is in this strict formalism incorrect – the genes absent in one cell type must directly or indirectly have been repressed (and sometimes, continuously kept repressed) by other genes that are expressed. Thus, the genome (of one species) directly maps into one (time-invariant) network architecture which in turn can generate a variety of states S(t). It is the state S that changes in time and is distinct in different cell types or in different functional states within one cell type. Only genetic mutations in the genome will “rewire” the network and change its architecture. If the network state S(t) of the core GRN directly maps into a state of a cell (the biological observable), then one question is: What is the nature of the integrated dynamics of S in complex, irregularly wired GRN and is it compatible with observable cell behavior? The study of the dynamics of complex networks (Table 1, B) may at first glance appear to be impeded by our almost complete ignorance about the architecture and interaction modalities between the genes. However, we cannot simply extrapolate from the mindset of studying small circuits with rate equations to the analysis of large networks. 
This is not only impossible due to the lack of information about the detailed structure of the entire network – the function F in Eq. (1) is unknown – but it may also be numerically hard to do. Yet, despite our ignorance about the network architecture, much can be learned if we reset our focus to the larger picture of the network. In this regard, the study of the dynamics of complex networks can have two distinct goals (see Table 1). One line of research (Table 1, B2) overcomes the lack of information about the network architecture by taking an ensemble approach [112] and asks: What is the typical behavior of a statistical ensemble of networks that is characterized by some architectural feature (e.g., average connectivity k, power-law exponent γ)? This computationally intense approach typically entails the use of discrete-valued gene networks (Sect. "Boolean Networks as Model for Complex GRNs"). As will be discussed below, such analysis has led to the definition of three broad classes of behaviors: chaotic, critical and ordered. The second approach (Table 1, B3) to the dynamics of complex GRNs is closer to experimental biology and exploits the availability of gene expression profile measurements using DNA microarray technology. Such measurements provide snapshots of the state of the network S(t) over N genes, covering almost the entire transcriptome, and thus reveal the direct output of the GRN as a distributed control system. Monitoring S(t) and its change in time during biological processes at the whole-cell level will reveal the constraints imposed on the dynamics of S(t) by network interactions and can, in addition to providing data for the inference problem (Sect. "Network Inference"), expose particular dynamical properties of the transcriptome that can be correlated with the biological observable.

Cell Fates, Cell Types: Terminology and Concepts

In order to appreciate the meaning of the network state S and how it maps to the biological observable, we will now present (to non-biologists) in more detail the most prosaic biological observable of gene network dynamics: cell fate determination during development.

Stem Cells and Progenitor Cells in Multi-cellular Organisms

A hallmark of multi-cellular organisms is the differentiation of the omnipotent zygote (fertilized egg) via totipotent and pluripotent embryonic stem cells and multipotent tissue stem cells into the functionally distinct mature "cell types" of the adult body, such as red blood cells, skin cells, nerve cells, liver cells, etc. This is achieved through a branching (tree-like) scheme of successive specialization into lineages. If the fertilized egg is represented by the main trunk of the "tree of development", then think of cells at the branching points of developmental paths as the stem cells.
One example of a multipotent stem cell is the hematopoietic stem cell (HSC), which is capable of differentiating into the entire palette of blood cells, such as red blood cells, platelets and the variety of white blood cells. The last branch points represent progenitor cells, which have a lesser developmental potential but can still choose between a few cell types (e.g., the common granulocyte-macrophage progenitor (GMP) cell, Fig. 1f). Finally, the outermost branches of the tree represent the mature, terminally differentiated cell types of the body. A cell that can branch into various lineages is said to be "multipotent". It is a "stem cell" when it has the potential to
self-renew by cell division, maintaining its differentiation potential, and to create the large family of cells of an entire tissue (e.g., the hematopoietic stem cell). Thus, progenitor cells, which proliferate but cannot indefinitely self-renew, are strictly speaking not stem cells. The commitment to a particular cell phenotype (a next-generation branch of the tree) is also referred to as a "cell fate", since the cell at the proximal branching point is "fated" to commit to one of its prospective cell types.

Development and Differentiation

The diversification of the embryonic stem cell to yield the spectrum of thousands [94] of cell types in the body occurs in a process of successive branching events, at which multipotent cells commit to one fate and which appear to be binary in nature. Thus, multipotent cells make an either-or decision between typically two lineages – although more complex schemes have been proposed [65]. Moreover, it is generally assumed that during natural development there is only diversification of developmental paths but no confluence from different lineages, although recently exceptions to this rule have been reported for the hematopoietic system [1]. As cells develop towards the outer branches of the "tree of development", they become more and more specialized and progressively lose their competence to proliferate and diversify ("potency"). They also develop the phenotypic features of a mature cell type; for instance, in the case of red blood cells, they adopt the flat, donut-like shape and synthesize haemoglobin. This process is called differentiation. Most cells then also lose their capacity to divide, that is, to enter into the proliferative state. Thus, mature, terminally differentiated cells are typically quiescent or "post-mitotic". The branching scheme of cell types imposes another fundamental property: cell types are discrete entities. They are distinct from each other, i.e., they are well separated in the "phenotype space", and they are stable [171]. There is thus no continuum of phenotypes. As Waddington, a prominent embryologist of the last century, recognized in the 1940s: cell types are "well-recognisable" entities and "intermediates are rare" [198]. In addition to the (quasi-)discreteness between branches of the same level, discreteness between the stages within one developmental path is also apparent: a multipotent stem cell at a branching point is in a discrete stage and can be identified based on molecular markers, isolated and cultured as such. Hence, a "stem cell" is not just a snapshot of an intermediate stage within a continuous process of development, but a discrete metastable entity.

The flow down the developmental paths, from a stem cell to the terminally differentiated state is, despite the pauses at the various metastable stem and progenitor cell levels, essentially unidirectional and, with a few exceptions, irreversible. In some tissues, as is the case with liver, pancreas or endothelium, the mature cells can upon injury revert to a phenotype similar to that of the last immature (progenitor) stage and resume proliferation to restore the lost cell population, upon which they return to the differentiated, quiescent state. Biologists often speak in a somewhat loose manner of cells “switching” their phenotype. This may refer to switching from a progenitor state to a terminally differentiated state (differentiation) or, within a progenitor state, from the quiescent to other functional states, such as the proliferative or the apoptotic state (apoptosis = programmed cell death). In any case, such intra-lineage switching between different functional states also represents discontinuous, quasi-discrete transitions of whole-cell behaviors. The balance between division, differentiation and death in the progenitor or stem cell compartment of a tissue thus consists of state transitions that entail all-or-none decisions. This balance is at the core of organismal development and tissue homeostasis. Now we can come back to the network formalism: if the network state S maps directly into the biological observable, what are the properties of the network architecture that confer its ability to produce the properties of the biological observable outlined in this section: discreteness and robustness of cell types, discontinuity of transitions, successive binary diversification, and directionality of these processes? Addressing these questions is the long-term goal of a theory of the multicellular organism. In the following we describe the status of research toward this goal.
History of Explaining Cell Types

Waddington’s Epigenetic Landscape and Bistable Genetic Circuits

One of the earliest conceptualizations of the discreteness of cell types was the work of C. Waddington, who proposed “epigenetic regulation” in the 1940s, an idea that culminated in the famous figure of the “epigenetic landscape” (Fig. 2g). This metaphor, devoid of any formal basis, let alone any relationship to gene regulation, captures the basic properties of discrete entities and the instability of intermediates. The term ‘epigenetic’ was coined by Waddington to describe distinct biological traits that arise from the interplay of the same set of genes and do not require the assumption of a distinct, “causal” gene to


Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination

be explained [196]. The 1957 version of the epigenetic landscape [198] (Fig. 2g) also implies that multipotent cells are destined to make either-or decisions between two prospective lineages, as embodied in the “watershed” regions (Fig. 2g). Almost at the same time as Waddington, in 1948 Max Delbrück proposed a generic concept of differentiation into two discrete states in a biochemical system that can be described by equations of the form (3), consisting of two mutually inhibiting metabolites, x1(t) and x2(t), that exhibits bistability [53] (for a detailed qualitative explanation, see Fig. 2). The dynamics of such a system is graphically represented in the two-dimensional state space spanned by x1 and x2 (Fig. 2c–f). Bistable dynamics implies that there are two steady states S1 and S2 that satisfy dx1/dt = dx2/dt = 0 and are stable fixed points of the system. For the system equations of Fig. 2b, this behavior is observed for a large range of parameters. In a nutshell, the mutual inhibition renders the balanced steady state S3 (x1 = x2) unstable, so that the system settles down in either the steady state S1 (x1 ≫ x2) or S2 (x1 ≪ x2) when kicked out of this unstable fixed point. These two stable steady states and their associated gene activity patterns are discretely separated in the x1–x2 state space and have been postulated to represent the differentiated state of cells, thus corresponding to Waddington’s valleys. This was the first conceptualization of a cellular differentiated state as a stable fixed point of a non-linear dynamical system. Soon after Monod and Jacob discovered the principle of gene regulation, they also proposed a circuit of the same architecture as Delbrück’s [148], but consisting of two mutually suppressing genes instead of metabolites, to explain differentiation in bacteria as a bistable system.
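The bistable behavior of such a mutual-inhibition circuit can be reproduced numerically. The sketch below uses hypothetical Hill-type rate equations and parameter values (not the specific system of Fig. 2b or Delbrück’s equations): each species represses the other, and two initial conditions on opposite sides of the separatrix x1 = x2 relax to the two different attractors.

```python
def toggle_step(x, dt=0.01, a=2.0, n=4, k=1.0):
    """One Euler step of a hypothetical mutual-repression toggle switch:
        dx1/dt = a / (1 + x2**n) - k*x1
        dx2/dt = a / (1 + x1**n) - k*x2
    (illustrative parameters; any sufficiently steep repression behaves alike).
    """
    x1, x2 = x
    dx1 = a / (1.0 + x2**n) - k * x1
    dx2 = a / (1.0 + x1**n) - k * x2
    return (x1 + dt * dx1, x2 + dt * dx2)

def settle(x0, steps=20000):
    """Integrate long enough for the trajectory to relax to a steady state."""
    x = x0
    for _ in range(steps):
        x = toggle_step(x)
    return x

# Two initial conditions on opposite sides of the separatrix x1 = x2
s1 = settle((1.5, 0.1))   # settles in the attractor with x1 >> x2
s2 = settle((0.1, 1.5))   # settles in the attractor with x1 << x2
print(s1, s2)
```

The same equations started near the balanced state (x1 = x2) would sit on the unstable fixed point; any small asymmetry sends the system into one of the two valleys.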
Bistability, Tristability and Multistability

The central idea of bistability is that the very same system, in this case a gene circuit composed of two genes x1 and x2, can under some conditions produce two distinct stable states separated by an unstable stationary state, and hence can switch in an almost discontinuous manner from one state (S1) to another (S2) (“toggle switch”) (Fig. 2). Bistability is a special (the simplest) case of multi-stability, an elementary (but not necessary) property of systems with non-linear interactions, such as those underlying the gene regulatory influences detailed in Fig. 2b. Which stable state (in this case, S1 or S2) a network occupies depends on the initial condition, the position of S(t = 0) in the x1–x2 state space (Fig. 2c). The two stable steady states, S1 or S2, are called “attractors” since the system in such states will return to the characteristic gene expression pattern in response to small perturbations to the system by enforced change of the values of x1 or x2. Similarly, after an external influence that places the network at an unstable initial state S′(t = 0) within the basin of attraction of attractor S1 (gray area in Fig. 2c), and hence causes the network to settle down in the stable state S1, the network will stay there even after the causative influence that initially put it in the state S′ has disappeared. Thus, attractor states confer memory of the network state. In contrast, larger perturbations beyond a certain threshold will trigger a transition from one attractor to another, thus explaining the observed discontinuous “switch-like” transitions between two stable states in a continuous system. After Delbrück, Monod and Jacob, numerous theoretical [74] and, with the rise of systems biology, experimental works have further explored similar simple circuitries that produce bistable behavior (see refs. in [38]). Such circuits and the predicted switch-like behavior have been found in various biological systems, including gene regulatory circuits in Escherichia coli [157] and mammalian cell differentiation [43,99,127,168], as well as protein signal transduction modules [12,64,205]. Artificial gene regulatory circuits have been constructed using simple recombinant DNA technology to verify model predictions [22,71,123]. Circuit analyses have been expanded to cover more complex circuits in mammalian development. It appears that there is a common theme in the circuits that govern cell differentiation (Sect. “Cell Fates, Cell Types: Terminology and Concepts”): interconnected pairs of mutually regulating genes that are often also self-regulatory, as shown in Fig. 3b [43,99,168]. Such circuit diagrams may be crucial in controlling the binary diversification at developmental branch points (Fig. 1f) [99]. In the case where the two mutually inhibiting genes are also self-stimulatory, as summarized in Fig.
3b, an interesting modification of the bistable dynamics can be obtained. Assuming independent (additive) influence of the self-stimulatory and cross-inhibitory inputs, this circuit will convert the central unstable state (saddle) S3 (Fig. 2, 3a) that separates the two stable steady states into a stable steady state, thus generating tristable behavior [99]. The third, central stable fixed point has, in symmetrical cases, the gene expression configuration S3 (x1 ≈ x2) (Fig. 3b). The promiscuous expression of intermediate-low levels of x1 and x2 in the locally stable state S3 has been associated with a stem or progenitor cell that can differentiate into the cells represented by the attractors S1 (x1 ≫ x2) or S2 (x1 ≪ x2). In fact, the common progenitor cells (Fig. 1f) have been shown to exhibit “promiscuous” expression of genes thought to be specific for either of the
lineages to which they will have to commit, so to speak providing a “preview” or “multi-lineage priming” of the gene expression of their prospective cell fates [47,59,92]. For instance, the common myeloid progenitor (CMP, Fig. 1), which can commit to either the erythroid or the myeloid lineage, expresses both the erythroid-specific transcription factor GATA1 (= x1) and the myeloid-specific transcription factor PU.1 (= x2) at intermediate levels (Fig. 3c). The metastable [GATA1 ≈ PU.1] configuration generates a state of indeterminacy or “suspense” [146] awaiting the signal, be it a biological instruction or stochastic perturbation, that resolves it to become either one of the more stable states [GATA1 ≫ PU.1] or [GATA1 ≪ PU.1] when cells differentiate into the erythroid or myeloid lineage, respectively. Thus, multi-potency and indeterminacy of a progenitor cell can be defined purely dynamically and do not require a much-sought-after “stemness” gene (again, a concept that arose from the gene-centered view) [195]. The metastable state also captures the notion of a “higher potential energy” [79] that drives development and hence may account for the arrow of time in ontogenesis.

A Formalism for Waddington’s Epigenetic Landscape Metaphor

Obviously, the valleys in Waddington’s epigenetic landscape represent the stable steady states (attractors), while the hill-tops embody the unstable fixed points, as shown in Fig. 2. How can Waddington’s metaphor formally be linked to gene network dynamics and the state space structure? For a one-dimensional system dx/dt = f(x), this is easily shown with the following analogy: an “energy” landscape represents the cumulative “work” performed/received when “walking” in the state space against/along the vector field of the “force” f(x) (Fig. 2c). Thus, the “potential energy” is obtained by integrating f(x) (the right-hand side of the system equation, Eq. (1)) over the state space variable x.
V(x) = −∫ f(x) dx   (3)

Here the state space dimension x is given the meaning of a physical space, as that pertaining to a landscape, and the integral V(x) is the sum of the “forces” experienced by the network state S(t) = x(t) over a path in x that drives S(t) to the stable states (Fig. 2c). The negative sign establishes the notion of a potential, in that the system loses energy as it moves towards the stable steady states, which are in the valleys (“lowest energy”). Higher-dimensional (N > 1) systems (Eq. 1) are

in general non-integrable (unless there exists a continuously differentiable (potential) function V(x1, x2, …, xN) for which f1 dx1 + f2 dx2 + ⋯ + fN dxN + dV = 0 is an exact total differential, so that −grad(V) = F(x) with F(x) = [f1(x), f2(x), …, fN(x)]^T). Thus, there is in general no potential function that could represent a “landscape”; however, the idea of some sort of “potential landscape” is widely (and loosely) used, especially in the case of two-dimensional systems, where the third dimension can be depicted as a cartographic elevation V(x1, x2). An elevation function V(x1, x2) can be obtained in stochastic systems where x(t) is subjected to random fluctuations due to gene expression noise [87]. Then V(S) is related to the probability P(S) to find the system in state S = (x1, x2, …), e.g., V(S) = −ln P(S) [118,177]. It should, however, be kept in mind that the “quasi-potential” V is not a true (conservative) potential, since the vector field is not a conservative field. The above treatment of the dynamics of the gene regulatory circuit explains the valleys and hills of Waddington’s landscape but still lacks the “directionality” of the overall flow on the landscape, depicted by Waddington as the slope from the back to the front of his landscape. (This arrow of time of development will briefly be discussed in the outlook Sect. “Future Directions and Questions”.) In summary, the epigenetic landscape that Waddington proposed based on his careful observation of cell behavior, and that he reshaped over decades [103,179,196,197,198], can now be given both a molecular biology correlate (the gene regulatory networks) and a formal framework (the probability landscape of network states S). The landscape idea lies at the heart of the connection between molecular network topology and biological observable.
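For a one-dimensional system, Eq. (3) can be evaluated numerically. The sketch below uses the textbook bistable force f(x) = x − x³ (an illustrative choice, not the chapter’s gene-circuit equations) and accumulates V(x) = −∫f(x)dx on a grid; the two minima of V mark the stable fixed points (valleys), and the local maximum at x = 0 is the unstable fixed point (hilltop).

```python
import numpy as np

# Illustrative bistable "force": stable fixed points at x = +/-1, unstable at x = 0
x = np.linspace(-2.0, 2.0, 4001)
f = x - x**3

# Quasi-potential V(x) = -integral of f(x) dx, accumulated numerically (Eq. 3)
dx = x[1] - x[0]
V = -np.cumsum(f) * dx

# Valleys of the landscape = local minima of V = stable steady states
is_min = (V[1:-1] <= V[:-2]) & (V[1:-1] < V[2:])
valleys = x[1:-1][is_min]
print(valleys)  # two valleys, near x = -1 and x = +1
```

The same construction fails for a generic higher-dimensional F(x), which is why only a quasi-potential such as V(S) = −ln P(S) is available there.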
The Molecular Biology View: “Epigenetic” Marks of Chromatin Modification

Although it is intuitively plausible that stable steady states of the circuits of Delbrück and of Monod and Jacob may represent the stable differentiated state, this explanation of a biological observable in terms of a dynamical system was not popular in the community of experimental molecular biologists and was soon sidelined as molecular biology, and with it the gene-centered view, came to dominate biology. The success in identifying novel genes (and their mutated alleles) and the often straightforward explanation of a phenotype offered by the mere discovery of a (mutant) gene triggered a hunt for such “explanatory genes” to account for biological observables. Gene circuits had to give way to the one-gene–one-trait concept, leading to an era in which a new gene-centered epistemological habit, sometimes referred to as “genetic determinism”, prevailed [180]. Genetic determinism, a particular form of reductionism in biology, was to last for another fifty years after Delbrück’s proposal of bistability. Only as the “low-hanging fruits” of simple genotype–phenotype relationships seemed to have all been picked, and genome-wide gene expression measurements became possible, was the path cleared for the rise of the “systems biology” and biocomplexity that we witness today. In genetic determinism, macroscopic observables are reduced in a qualitative manner to the existence of genes or regulatory interactions which at best “form” a linear chain of events. Such “pathways”, most lucidly manifest in the arrow–arrow schemes of modern cell biology papers, serve as a mechanistic explanation and satisfy the intellectual longing for causation. It is in light of this thinking that the molecular biologists’ explanation of the phenomenon of cell type determination has to be seen. Absent a theory of cell fate diversification, and in view of the stunning stability of cell types, a dogma thus came into existence according to which the type identity of cells, once committed to, is irreversibly carved in stone [161]. Rare transdifferentiation events (switches of cell lineage) were regarded as curiosities. The observation that cell types express type-specific genes was explained by the presence of cell-type-specific transcription factors. For instance, red blood cells express haemoglobin because of the presence of GATA1 – a lineage-specific transcription factor that not only promotes commitment to the erythroid lineage (as discussed above, Sect. “Bistability, Tristability and Multistability”) but also controls haemoglobin expression [149].
Conversely, the absence of gene activity, for instance, the non-expression of liver-specific genes in erythrocytes, was explained by the silencing of the unneeded genes by covalent DNA methylation and histone modifications (methylation, acetylation, etc.) [13,65,69,117,122], which modify chromatin structure and thereby control the access of the transcription machinery to the regulatory sites on the DNA. But who controls the controller? Chromatin modifications [117,122] are thought to confer discrete alterations of gene expression state that are stable and essentially irreversible. This idea of permanent marks on DNA represents the conceptual cousin of mutations (but without altering the DNA sequence) and was thus in line with the spirit of genetic determinism. Accordingly, they were readily adopted as an explanation of the apparently irreversible cell-type-specific gene inactivation and were given the attribute “epigenetic” to contrast them with the genetic changes that involve alteration of DNA sequences. But the enzymes responsible for covalent DNA and chromatin modification are not gene-locus specific, leaving open how the cell-type-specific gene expression pattern is orchestrated. It is important to mention here a disparity in the current usage of the term “epigenetics” [103]. In modern molecular biology, “epigenetics” is almost exclusively used to refer to DNA methylation and covalent histone modifications; this meaning is taken for granted even among authors who comment on the very usage of this term [13,26,103], and who are unaware that memory effects can arise purely dynamically, without a distinct material substrate, as discussed in Sect. “Bistability, Tristability and Multistability”. In contrast, biological physicists use “epigenetic” to describe precisely those phenomena, such as multi-stability (Sect. “Bistability, Tristability and Multistability”), that are found in non-linear systems – a usage that comes closer to Waddington’s original metaphor for illustrating the discreteness and stability of cell types [197,198]. The use of Waddington’s “epigenetic landscape” has recently seen a revival [75,165] in the modern literature in the context of chromatin modifications but remains loosely metaphoric and without a formal basis.

Rethinking the Histone Code

The idea of methylation and histone modifications as a second code beyond the DNA sequence (“histone code”), which cells use to “freeze” their type-specific gene expression pattern, relied on the belief that these covalent modifications act like permanent marks (only erased in the germline when an oocyte is formed). This picture is now beginning to change. First, recent biochemical analyses suggest that the notion of a static, irreversible “histone code” is oversimplified, casting doubt on the view that histone modification is the molecular substrate of “epigenetic memory” [122,126,144,191].
With the accumulating characterization of chromatin-modifying enzymes, notably those controlling histone lysine (de)methylation [122,126,144,191], it is increasingly recognized that the covalent “epigenetic” modifications are bidirectional (reversible) and highly dynamic. Second, cell fate plasticity, most lucidly evident in the long-known but rarely observed transdifferentiation events, or in the artificial reprogramming of cells into embryonic stem cells either by nuclear transfer-mediated cloning [91] or by genetic manipulation [143,156,184,201], confirms that the “epigenetic” marks are reversible – given that the appropriate biochemical context is provided. If what was thought of as permanent molecular marks is actually dynamic and reversible – what then maintains lineage-specific gene expression patterns in an inheritable fashion across cell divisions?


In addition, as mentioned above, there is another, more fundamental question from a conceptual point of view: chromatin-based marking of gene expression status is a generic, not locus-specific, process – the same enzymes can apply or remove the covalent marks on virtually any gene in the genome. They are dumb. So what smart system orchestrates the DNA methylation and histone modification machinery at the tens of thousands of loci in the genome so that the appropriate set of genes is (in)activated to generate the cell-type-specific patterns of gene expression? A system-level view avoids the conundrum caused by the mechanistic, proximal explanation [189] of the gene-centered view. A complex systems approach led to the idea that the genome-wide network of transcriptional regulation can, under some conditions and thanks to self-organization, spontaneously orchestrate the establishment of lineage-specific gene expression profiles [114], as will be discussed below. In fact, the picture of chromatin modification as primum movens that operates “upstream” of the transcription factors (TFs), controlling their access to the regulatory elements in promoter regions, must be revised in light of a series of new observations. Evidence is accumulating that the controller itself is controlled – namely, by the TFs it is thought to control: TFs may actually take the initiative by recruiting the generic chromatin-modifying enzymes to their target loci [51,80,125,144,145,182,185]. It is even possible that a mutual, cooperative dynamical interdependence between TFs and chromatin-modifying enzymes may establish locus-specific, switch-like behavior that commands “chromatin status” changes [56,132]. In fact, an equivalent of the indeterminacy state in which the opposing lineage-specific TFs balance each other (Sect. “Bistability, Tristability and Multistability”, Fig. 3b) is found at the level of chromatin modification, in that some promoters exhibit “bivalent” histone modification in which activating and suppressing histone methylations coexist [25]. Such states are in fact associated with TFs expressed at low levels – in agreement with the central attractor S3 of the tristable model (Fig. 3b). Thus, chromatin modification is at least in part “downstream” of TFs and may act to add additional stability to the dynamical states that arise from the network of transcriptional regulation. If correct, such a relationship would allow us to pass primary responsibility for establishing the observable gene expression patterns back to the transcription factors. With its genome-wide regulatory connections, the GRN is predestined for the task of coordination and distributed information processing. But under what conditions can a complex, apparently randomly wired network of 3000 regulators create, with

such stunning reliability and accuracy, the thousands of distinct, stable gene expression profiles that are associated with a meaningful biological state, such as a cell type? Studies of Boolean networks as toy models over the past 40 years have provided important insights.

Boolean Networks as Model for Complex GRNs

General Comment on Simplifying Models

The small-circuit models discussed in Sect. “Small Gene Circuits” represent arbitrarily cut-out fragments of the genome-wide regulatory network. A cell phenotype, however, is determined by the gene expression profile over thousands of genes. How can we study the entire network of 25,000 genes, or at least the core GRN of 3000 transcription factors, even if most of the details of the interactions remain elusive? The use of random Boolean networks as a generic network model, independent of the specific GRN architecture of a particular species, was proposed by Kauffman in 1969 – around the time at which the very idea of small gene regulatory circuits was presented by Monod and Jacob [148]. In random Boolean networks the dynamics is implemented by assuming discrete-valued genes that are either ON (expressing the encoded protein) or OFF (silenced). The interaction function (Sect. “Introduction”, and F in Sect. “Small Gene Circuits”) that determines how the ON/OFF status of the multiple inputs of a target gene maps into its behavior (output status) is a logical (Boolean) function B, and the network topology is randomly wired, with the exception of some deliberately fixed features. Thus, the work on random Boolean networks allowed the study of the generic behavior of large networks of thousands of genes even before molecular biology could deliver the actual connections and interaction logic of the GRNs of living systems (for a review, see [8]). The lack of detailed knowledge about specific genes and their interaction functions, and the formidable computational cost of modeling genome-wide networks, have warranted a coarse-graining epitomized by the Boolean network approach. More specifically, in the broader picture of system-wide dynamics, the discretization of gene expression levels is also justified because (i) the steep sigmoidal shape of the “transfer functions” discussed above (Fig. 2b), which describe the influence of one gene on another’s expression rate, can be approximated by a step function, and/or (ii) the local dynamics produced by such small gene circuit modules is in fact characterized by discontinuous transitions between discrete states, as shown in Sect. “Network Dynamics”.
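Point (i) can be illustrated with a small numerical sketch (a generic Hill function with a hypothetical steepness n and threshold K, not parameters taken from the chapter): for a steep sigmoid, the continuous transfer function and its Boolean step idealization agree everywhere except in a narrow band around the threshold.

```python
def hill(x, K=1.0, n=8):
    """Steep sigmoidal transfer function for an activating input x."""
    return x**n / (K**n + x**n)

def step(x, K=1.0):
    """Boolean idealization: gene ON (1) above threshold K, OFF (0) below."""
    return 1.0 if x > K else 0.0

# Away from the threshold K the two descriptions agree closely
for xv in (0.2, 0.5, 2.0, 5.0):
    print(xv, round(hill(xv), 4), step(xv))
```

The larger the Hill coefficient n, the narrower the disagreement band, which is the intuition behind replacing continuous expression levels with ON/OFF values.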


In addition, the Boolean network approach offers several advantages over the exhaustive, maximally detailed models favored by engineers who seek to understand a particular instance of a system rather than the typical properties of a class of systems. The simplification opens a new vista onto fundamental principles that may have been obscured by the details since, as philosophers and physicists have repeatedly articulated, there is no understanding without simplification, and “less can be more” [11,30,159]. An important practical advantage of the simplification in the Boolean network approach is the possibility to study statistical ensembles of tens of thousands of network instances, i.e., entire classes of network architectures, and to address the question of how a particular architecture type maps into a particular type of dynamic behavior. Some of the results obtained are in fact valid for both discrete and continuous behavior of the network nodes [17].

Model Formalism for Boolean Networks

In the Kauffman model of random Boolean networks, gene activity values are binary (1 = ON and 0 = OFF) [111,115,116] and time is also discretized. Thus, a Boolean network is a generalized form of cellular automata, but without the aspect of physical space and the particular neighborhood relations. Then, in analogy to continuous systems, a network of N elements i (i = 1, 2, …, N) defines a network state S at any given discrete time step t: S(t) = [x1(t), x2(t), …, xN(t)], where xi is the activity status that now only takes the values 1 or 0. The principles are summarized in Fig. 4. The state space that represents the entire dynamics of the network is finite and contains 2^N states. However, again, not all states are equally likely to be realized and observed, since genes do not behave independently but influence each other’s expression status.
Regulation of gene i by its incoming network connections is modeled by the Boolean function Bi that is associated with each gene i and maps the configuration of the activity status (1 or 0) of its input genes (the upstream regulators of gene i) into the new value of xi for the next time point. Thus, the argument of the Boolean function Bi is the input vector Ri(t) = [x1(t), x2(t), …, xki(t)], where ki is the number of inputs that gene i receives. At each time step, the value of each gene is updated: xi(t+1) = Bi[Ri(t)]. The logical function Bi can be formulated as a “truth table”, which is convenient for large k’s (Fig. 4a). In the widely studied case where each gene has exactly k = 2 inputs, the Boolean function can be one of the set of 2^(2^k) = 16 classical Boolean operators, such as AND, OR, NOTIF, etc. [109,116]. Figure 4b shows the example of the well-studied lactose operon and how its regulatory characteristics can be captured as an AND Boolean function of the two inputs. In the simplest model, all genes are updated in every time step. This synchrony is artificial, for it assumes a central clock in the cell, which is not likely to exist, although some gating of processes by oscillations in the redox potential has been reported [120]. The idealization of synchrony, however, facilitates the study of large Boolean networks, and many of the results that have been found with synchronous Boolean networks carry over to networks with asynchronous updating, which has been implemented in various ways [40,72,81,119]. For synchronous networks the entire network state S can also be viewed as being updated by the updating function U: S(t+1) = U[S(t)], where U summarizes all the N Boolean functions Bi. This facilitates some treatments and is represented in the state transition table, which lists all possible network states in one column and their successors in a second column. This leads to a higher-level, directed graph that represents the entire dynamics of a network, as illustrated below.

Dynamics in Boolean Networks

The state transition table captures the entire dynamical behavior of the network and can conveniently be depicted as a state transition map, a directed graph in which a node represents one of the 2^N possible states S(t) (a box with a string of 1s and 0s in Fig. 4a, representing the gene expression pattern). Such diagrams are particularly illustrative only for N up to about 10, since they display all possible states S of the finite state space [204]. The states are connected by arrows which represent individual state transitions and collectively depict the trajectories in the state space (Fig. 4a, right panel). Since the Boolean functions are deterministic, a state S(t) unambiguously transitions into one successor state S(t+1). In contrast, a state can have multiple predecessors, since two or more different states can be updated into the same state.
Hence, trajectories can converge but not diverge, i.e., there is no branching of trajectories emanating from one state. This property of “losing information about the ancestry” is essential to the robustness of the dynamics of networks. In updating the network states over time, S(t) can be represented as a walk on the directed graph of the state transition map. Because of the finiteness of the state space in discrete networks, S(t) will eventually encounter an already visited state, and because of the absence of divergence, the set of successive states will be the same as in the previous encounter. Thus, no matter what the initial state is, the network will eventually settle down in either a set of


Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination, Figure 4 Principles of Boolean network models of GRNs, exemplified on an N = 4 gene network. a From architecture to dynamics. The four genes i: A, B, C, D interact as indicated by the directed graph (top left panel), and each of the genes is assigned a Boolean function Bi as indicated, with the corresponding “truth table” shown below. A network state is a string of N = 4 binary variables; thus there are 2^N = 2^4 = 16 possible network states S. They collectively establish the entire state space and can be arranged in a state transition map according to the state transitions imposed by the Boolean functions (right panel). The attractor states are colored gray. In the example, there are three point attractors and one limit cycle attractor (of period T = 2). The dotted lines in the state space denote the attractor boundaries. b Example of capturing the regulation at the promoter of the lactose operon as an “AND” Boolean function. Note that there are many ways to define what constitutes an input – in the case shown, the allosteric ligands cAMP (“activator”) and a β-galactoside (“inducer”), such as allolactose, give rise to a two-input Boolean function ‘AND’

cycling states (which form a limit cycle) or in a single stable state that updates onto itself. Accordingly, these states, to which all the trajectories are “attracted”, are the attractors of the network. They are equivalent to stable oscillators or stable fixed points, respectively, in the continuous description of gene circuits (Sect. “Bistability, Tristability and Multistability”). In other words, because of the regulatory interactions between the genes, the system cannot freely choose any gene expression configuration S. Again,

most network states S in the state space are thus unstable and transition to other states to comply with the Boolean rules until the network finds an attractor. And as with the small, continuous systems (Sect. "Bistability, Tristability and Multistability"), the set of states S that "drain" to an attractor state constitutes its basin of attraction. However, unlike continuous systems, Boolean network dynamics do not produce unstable steady states that can represent the indeterminacy of undecided cell states


that correspond to stem cells about to make an either-or decision between two lineages (see Sect. "Bistability, Tristability and Multistability"). Instead, basins of attraction are "disjoint" areas of state space.

Attractors as Cell Types

Kauffman proposed that the high-dimensional attractor states represent cell types in metazoan organisms – thus expanding the early notion of steady state in small circuits to networks of thousands of genes [111,116] (Sect. "Bistability, Tristability and Multistability"). This provides a natural explanation for the stability of the genome-wide expression profile that is characteristic of and defines a cell type, as well as for the stable coordination of genome-wide gene expression oscillations in proliferating cells that undergo the cell division cycle [202]. The correspondence of attractors in large networks with the cell-type-specific transcriptome is a central hypothesis that links the theoretical treatment of the dynamics of complex networks with experimental cell biology.

Use of Boolean Networks to Model Real Network Dynamics

Owing to their simple structure, Boolean networks have been applied in place of differential equations to model real-world networks for which a rudimentary picture of the topology, with only few details about the interaction functions, is known. Here, individual Boolean functions are assigned to the network nodes either based on best guesses, informed by qualitative descriptions from the experimental literature, or randomly. This approach has yielded surprisingly adequate recapitulation of biological behaviors, indicating that the topology itself accounts for a great deal of the dynamical constraints [4,49,60,62,95,130]. Such studies have also been used to evaluate the dynamical regime (discussed in the next section) of real biological networks [19].
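The exhaustive attractor search described above – following every state's trajectory on the state-transition map until it revisits a state – can be sketched in a few lines of Python. The 3-gene network and its update rules below are hypothetical illustrations (not the network of Fig. 4):

```python
# Exhaustive attractor search for a small synchronous Boolean network.
# Hypothetical update rules: A' = B AND C, B' = A, C' = A OR B.
from itertools import product

def step(state):
    a, b, c = state
    return (b & c, a, a | b)

def find_attractors(n=3):
    """Map every network state to its attractor (a sorted tuple of cycling states)."""
    attractor_of = {}
    for s0 in product((0, 1), repeat=n):
        path, s = [], s0
        # Walk until we hit a known state or close a new cycle.
        while s not in attractor_of and s not in path:
            path.append(s)
            s = step(s)
        if s in attractor_of:
            att = attractor_of[s]
        else:
            att = tuple(sorted(path[path.index(s):]))  # the newly found cycle
        for visited in path:
            attractor_of[visited] = att
    return attractor_of

basins = find_attractors()
for att in sorted(set(basins.values())):
    size = sum(1 for a in basins.values() if a == att)
    print(f"attractor {att}: basin size {size}")
```

For this toy rule set, the 8 states partition into two point attractors and one limit cycle of period 2 – the same qualitative picture as the state-transition map of Fig. 4a.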
Three Regimes of Behaviors for Boolean Networks

The simplicity and tractability of the Boolean network formalism have stimulated a broad stream of investigations that has led to important insights into the fundamental properties of the generic dynamics that large networks can generate [116], even before progress in genomics could offer a first glimpse of the actual architecture of a real gene regulatory network [112]. Using the ensemble approach, the architectural parameters that influence the global, long-term behavior of complex networks with N up to 100,000 have been determined [114,116]. As mentioned in the introduction, based on the latest count of the genome size and the idea that

the core transcriptional network essentially governs the global dynamics of gene expression profiles, networks of N ≈ 3000 would have sufficed. Nevertheless, a striking result from the ensemble studies was that for a broad class of architectures, even a complex, irregular (randomly wired) network can produce 'ordered' dynamics with stable patterns of gene expression, thus potentially delivering the "biological observable". In general, the global behavior of ensembles of Boolean networks can be divided into three broad regimes [116]: an ordered and a chaotic regime, and a regime that represents behavior at their common border, the critical regime.

Ordered and Chaotic Regimes of Network Behavior

In networks in the ordered regime, two randomly picked initial states S1(t = 0) and S2(t = 0) that are close to one another in state space, as measured by the Hamming distance H [H(t) = |S1(t) − S2(t)|, the number of genes whose activities differ between the two states S1(t) and S2(t)], will exhibit trajectories that on average quickly converge (that is, the Hamming distance between the two trajectories will on average decrease with time) [55]. The two trajectories will settle down in one fixed-point attractor or a limit cycle attractor with a small period T compared to N, and thus produce very stable system behaviors. Such networks in general have a small number of attractors that typically have small periods T and drain large basins of attraction [116]. Numerical analyses of large ensembles suggest that the average period length scales with √N. The state transition map, as shown for N = 4 in Fig. 4a, shows that trajectories converge onto attractor states from many different directions and are in general rather short [Maliackal, unpublished], so that attractor basins appear compact, with high rotational symmetry and hence "bushy".
It is important to stress here that if a cell type is an attractor, then different cell types are different attractors, and, in the absence of the unstable steady states present in continuous dynamical systems (see Sect. "Bistability, Tristability and Multistability"), differentiation consists of perturbations that move a system state from one attractor into the basin of another attractor, from which it flows to the new attractor state that encodes the gene expression profile of the new cell type. Examination of the bushy basins in the ordered regime makes it clear, as do numerical investigations and experiments (Sect. "Are Cell Types Attractors?"), that multiple pathways can lead from one attractor to another – a property that meets resistance in the community of pathway-centered biologists.


In contrast, in networks in the chaotic regime, two randomly placed initial states S1(t = 0) and S2(t = 0) that are initially close to one another (in terms of Hamming distance) will generate trajectories that on average diverge and either end with high likelihood in two different attractors, or may appear to "wander" aimlessly in state space. This happens because the attractor is a limit cycle attractor with a very long period T – on the scale of 2^N – so that in the worst case a trajectory may visit most if not all 2^N possible network configurations S. For a small network of just N = 200, this is a limit cycle on the order of 2^100 ≈ 10^30 time steps in length. As a point of comparison, the universe is some 10^17 seconds old. Given the hyperastronomic size of this number, this "limit cycle" will appear as an aperiodic and endless stream of uncorrelated state transitions, as if the system were on a "permanent transient". Thus, networks in the chaotic regime are not stable, trajectories tend to diverge (at least initially), and their behavior is sensitive to the initial state. In the state transition map, the small attractors typically receive trajectories with long transients that arrive from a few state space directions; hence, in contrast to the bushy attractors of the ordered regime, the basins have long thin branches and appear "tree-like". The definition of "chaos" for discrete networks given here is distinct from that of (deterministic) chaos in continuous systems, where the time evolution of infinitesimally close initial states can be monitored and shown to diverge. Nevertheless, the degree of chaos in discrete networks, as qualitatively outlined above, is well defined and can be quantified from the slope of the curve in the so-called Derrida plot, which assesses how a large number of random pairs of initial states evolves in one time step [55].
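The one-step Derrida measure can be sketched numerically: sample pairs of states at a fixed small Hamming distance, apply one synchronous update, and compare the average resulting distance. The random-network construction below is a toy assumption (uniform random truth tables, fixed in-degree K), not a model from the article; K = 1 should behave in an ordered way and K = 4 chaotically:

```python
# One-step Derrida-style divergence for random Boolean networks (sketch).
import random

def random_bn(n, k, rng):
    """Each node gets k random inputs and a uniformly random truth table."""
    inputs = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    def step(state):
        out = []
        for i in range(n):
            idx = 0
            for j in inputs[i]:
                idx = (idx << 1) | state[j]
            out.append(tables[i][idx])
        return out
    return step

def derrida_point(step, n, h, trials, rng):
    """Average Hamming distance after one step, starting at distance h."""
    total = 0
    for _ in range(trials):
        s1 = [rng.randint(0, 1) for _ in range(n)]
        s2 = list(s1)
        for j in rng.sample(range(n), h):   # flip h genes
            s2[j] ^= 1
        total += sum(a != b for a, b in zip(step(s1), step(s2)))
    return total / trials

rng = random.Random(0)
n, h, trials = 200, 5, 500
results = {}
for k in (1, 4):
    step = random_bn(n, k, rng)
    results[k] = derrida_point(step, n, h, trials, rng)
    print(f"K={k}: H(t)={h} -> average H(t+1) = {results[k]:.2f}")
```

A slope below 1 (average H(t+1) < H(t)) indicates the ordered regime; above 1, the chaotic regime; a slope of exactly 1 marks criticality.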
More recently, it was shown that the behavior class can be determined from the architecture, without simulating the state transitions and determining the Derrida plot, simply by calculating the expected average sensitivity of all N Boolean functions B_i [173]. In addition, a novel distance measure based on the normalized compression distance (NCD) – which captures the complexity of the difference between two states S(t) better than the Hamming distance used in the Derrida plot – has been proposed to determine the regime of networks [153].

Critical Networks: Life at the Edge of Chaos?

Critical networks are those that exhibit dynamical behavior just at the edge between the ordered and the chaotic regime, and they have been postulated to be optimally poised to govern the dynamics of living cells [114,116]. Ordered behavior would be too rigid, in that most

perturbations of a stable attractor would be absorbed and the network would return to the same attractor state, minimizing the possibility for the network to change its internal equilibrium state in response to external signals (which are modeled as flipping individual genes from ON to OFF or vice versa). Chaotic behavior, on the other hand, would be too sensitive to such perturbations because trajectories diverge – the network would wander off in state space and fail to exhibit robust behavior. Critical networks may represent the optimal mix between stability and responsiveness, conferring both robustness to random perturbations (noise) and adaptability in response to specific signals. Critical networks have several remarkable features. First, consider the need of cells to make the maximum number of reliable discriminations, and to act on them in the presence of noise with maximum reliability. 'Deep' in the ordered regime, convergence of trajectories in state space is high; hence, as explained above (Sect. "Dynamics in Boolean Networks"), information is constantly discarded. In this "lossy" regime, information about past discriminations is easily lost. Conversely, in the chaotic regime, even a small amount of noise will make the system diverge, so that it cannot respond reliably to external signals (perturbations). It seems plausible that the optimal capacity to categorize and act reliably is found in critical networks, or in networks slightly within the ordered regime. Second, it has recently been shown that a measure of the correlation of pairs of genes with respect to their changing activities, the "mutual information" (MI), is maximized for critical networks [154]. The MI measures the mutual dependence of two variables (vectors), such as two genes, based on their expression in a set of states S(t). Consider two genes A and B in a synchronous Boolean network.
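Using the entropy-based definition MI(A,B) = H(x_A) + H(x_B) − H(x_A, x_B), the measure can be computed directly from two binary time series. A minimal Python sketch with hypothetical example series:

```python
# Mutual information between two binary time series from the entropy
# definition MI(A,B) = H(A) + H(B) - H(A,B). Example data are hypothetical.
from collections import Counter
from math import log2

def entropy(symbols):
    """Shannon entropy (in bits) of a sequence of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_information(xa, xb):
    return entropy(xa) + entropy(xb) - entropy(list(zip(xa, xb)))

a = [0, 1, 0, 1, 0, 1, 0, 1]
print(mutual_information(a, a))        # perfectly correlated genes: 1 bit
print(mutual_information(a, [1] * 8))  # an unchanging gene: 0 bits
```

For binary variables, MI is bounded by 1 bit, consistent with the bound quoted in the text.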
The mutual information between x_A and x_B is defined as MI(A,B) = H(x_A) + H(x_B) − H(x_A, x_B). Here, H(x) is the entropy of the variable x, and H(x, y) is the joint entropy of x and y. Mutual information is 0 if either gene A or B is unchanging, or if A and B change in time in an uncorrelated way. Mutual information is greater than 0, and bounded by 1.0, if A and B fluctuate in a correlated way. Thus, critical networks maximize the correlated changing behavior of variables in the genetic network. This new result strongly suggests, at least in the ensemble of random Boolean networks, that critical networks can coordinate the most complex organized behavior. Third, the "basin entropy" of a Boolean network, which characterizes the way the state space is partitioned into disjoint basins of attraction of various sizes (Fig. 4a), also exposes a particular property of critical networks [124]. If the size or "weight", W_i, of a basin of attraction i is the fraction of all 2^N states that flow to that attractor, then the basin entropy is defined as h = −Σ_i W_i log(W_i). The remarkable result is that only critical networks have the property that this basin entropy continues to increase as the size of the network increases. By contrast, ordered and chaotic networks have basin entropies that first increase, but then stop increasing, with network size [124]. If one thinks of basins of attraction and attractors not only as cell fates, or cell types, but as distinct specific cellular programs encoded by the network, then only critical networks appear to have the capacity to expand the diversity of what "a cell can do" with increasing network size. Again, this strongly suggests that critical networks can carry out the most complex coordinated behaviors. Thus, GRNs may have evolved under natural selection (or otherwise – see Sect. "Evolution of Network Structure: Natural Selection or Natural Constraint?") to be critical. Finally, it is noteworthy that while Boolean networks were invented to model GRNs, the variables can equally be interpreted as any kind of two-valued states of components in a cell, and the Boolean network then becomes a causal network concerning events in a cell, including the GRN as a subset of such events. This suggests that not only the GRN but the entire network of processes in cells, including signal transduction and metabolic processes – that is, information, mass and energy flow – may be optimally coordinated if the network is critical. If life is poised to be in the critical regime, then two questions follow: which architectures produce ordered, critical and chaotic behaviors, and are living cells in fact in the critical regime?

Architectural Features of Large Networks and Their Dynamics

As outlined in Sect. "Network Architecture", the recent availability of data on gene regulation in real networks, although far from complete, has triggered the study of complex, irregular network topologies as static graphs.
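Returning briefly to the basin entropy h = −Σ_i W_i log(W_i) defined above: for a network small enough to enumerate all 2^N states, it can be computed exactly by assigning every state to its basin. A minimal Python sketch with hypothetical 3-gene update rules:

```python
# Basin entropy h = -sum_i W_i log2(W_i) by exhaustive enumeration.
# The 3-gene update rules are a hypothetical illustration.
from itertools import product
from math import log2

def step(state):
    a, b, c = state
    return (b, a & c, a ^ b)

def basin_weights(n=3):
    """Fraction W_i of the 2^n states draining to each attractor."""
    attractor_of = {}
    for s0 in product((0, 1), repeat=n):
        path, s = [], s0
        while s not in attractor_of and s not in path:
            path.append(s)
            s = step(s)
        att = attractor_of[s] if s in attractor_of else tuple(sorted(path[path.index(s):]))
        for v in path:
            attractor_of[v] = att
    weights = {}
    for att in attractor_of.values():
        weights[att] = weights.get(att, 0) + 1 / 2 ** n
    return weights

w = basin_weights()
h = -sum(wi * log2(wi) for wi in w.values())
print(f"basin weights {sorted(w.values())}, basin entropy h = {h:.3f} bits")
```

The claim in [124] concerns how h scales with N across a network ensemble; this sketch only shows the computation for a single small network.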
This line of investigation is now beginning to merge with the study of the dynamics of generic Boolean networks. Below we summarize some of the interesting architectural features and their significance for global dynamics in terms of the three regimes. First, studies of generic dynamics in ensembles of Boolean networks have established the following major structure–dynamics relationships: (1) The average input connectivity per node, k. Initial studies on Boolean networks by Kauffman assumed a homogeneous distribution of inputs k. It was found that k = 2 networks are in the ordered/critical regime (given other parameters, see below) [116]. Above

a critical value k_c (which depends on other parameters, see below), networks behave chaotically. Analysis of continuous, linearized models also suggests that, in general, sparsity of connections is more likely to promote ordered dynamics – or stability [142]. (2) The distribution of the connectivity (degree) over the individual network genes. As mentioned in Sect. "Network Architecture", the global topology of many complex molecular networks appears to have a connectivity distribution that approximates a power law. For directed graphs such as the GRN, one needs to distinguish between the distributions of the input and output connectivities. Analyses of the dynamics of random Boolean networks with either scale-free input [66] or output connectivity distribution suggest that this property favors the ordered regime for a given value of the parameter p ("internal homogeneity", see below) [6]. Specifically, if the slope of the scale-free distribution (power-law exponent γ) is greater than 2.5, the corresponding Boolean network is ordered, regardless of the value of the parameter p (see below). For values of p approaching 1.0, γ > 2.0 suffices to assure ordered dynamics. (3) The nature of the Boolean functions is an important aspect of the network architecture that also influences the global dynamics. In the early studies, Kauffman characterized Boolean functions with respect to two important features [116]: (a) "Internal homogeneity p". The parameter p (0.5 < p < 1) is the proportion of either 1s or 0s in the output column of the truth table of the Boolean function (Fig. 4). Thus, a function with p = 0.5 has equal numbers of 1s and 0s among the outputs of all input configurations. Boolean functions with p-values close to 1 or 0 are said to exhibit high internal homogeneity. (b) "Canalizing function".
A Boolean function B_i of target gene i is said to be canalizing if at least one of its inputs has one value (either 1 or 0) that determines the output of gene i (1 or 0), independently of the values of the other components of the input vector. If both values of input j determine both output values of gene i [a "fully canalizing" function, e.g., if x_j(t) = 1 (or 0, respectively), then x_i(t + 1) = 1 (or 0, respectively)], then the other inputs have no influence on the output at all, and the "effective input connectivity" k_i^eff of gene i is smaller than the "nominal" k_i. For instance, among the Boolean functions with k = 2, only two of the 16 possible functions, XOR and XNOR, are not canalizing. Four functions are "fully canalizing", i.e., are effectively k = 1 functions (TRANSFER, COMPLEMENT). From ensemble studies it was found that both a high internal homogeneity p and a high proportion of canalizing functions contribute to ordered behavior [116].

Do Real Networks Have Architecture Features That Suggest They Avoid Chaos?

Only scant data is available for transcriptional networks, and it must be interpreted with due caution, since new data may, given sampling bias and artefacts, especially for nearly scale-free distributions, affect present statistics. In any case, existing data indicate that the average input connectivity of GRNs is in fact rather low, and far from the k ≈ N that would lead to chaotic behavior. Specifically, analyses of available (partial) transcriptional networks suggest that the input degrees approximate an exponential distribution, while the output degree distribution seems to be scale-free, although the numbers of nodes are rather small to reliably identify a power-law distribution [54,82]. GRN data from E. coli, for which the most complete, hand-curated maps of genome-wide gene regulation exist [169], and from partial gene interaction networks from yeast, obtained mostly by chromatin immunoprecipitation/microarray (ChIP-chip) [128], indicate that the average input connectivity (which is exponentially distributed) is below 4 [82,134,188]. A recent work on the worm C. elegans using the yeast one-hybrid system on a limited set of 72 promoters found an average of 4 DNA-binding proteins per promoter [54]. Thus, while such analyses await correction as coverage increases, real GRNs of microbes and lower metazoans are clearly sparse, and hence more likely to be in or near the ordered regime. In contrast to the input, the output connectivity appears to be power-law distributed for yeast and bacteria, and perhaps also for C. elegans, if the low coverage of the data available so far can be trusted [54,82]. The paucity of data points precludes reliable estimates of the power-law exponents.
However, the scale-free property, if confirmed for the entire GRN, may well also contribute to avoiding the chaotic regime and expanding the ordered or critical regime [6]. Nevertheless, it should be recalled here that the deeper meaning of scale-freeness per se and its genesis (whether or not it reflects natural selection for functionality) are not clear – it may be an inevitable manifestation of fundamental statistics rather than a product of natural selection (Sect. "Evolution of Network Structure: Natural Selection or Natural Constraint?"). As for the use of Boolean functions, analysis of a set of 150 experimentally verified regulatory mechanisms of

well-studied promoters revealed an enrichment for canalizing functions – again, in accordance with the architectural criteria associated with ordered dynamics [86]. Similarly, when canalizing functions were randomly imposed onto the published yeast protein–protein interaction network topology to create an architecture whose dynamics was then simulated, ordered behavior was observed [113]. In this connection it is interesting to mention microRNAs (miRNAs), a recently discovered class of non-coding RNAs which act by inhibiting gene expression at the post-transcriptional level through sequence complementarity to mRNA [41,89]. Hence, they suppress a gene independently of the TF constellation at the promoter. Thus, miRNAs epitomize the most simple and powerful molecular realization of a canalizing function. Their existence may therefore shift the network behavior from the chaotic towards the critical or ordered regime. Interestingly, RNA-based gene regulation is believed to have appeared before the metazoan radiation 1000 million years ago, and microRNAs are thought to have evolved in ancestors of Bilateria [141]. In fact, many miRNAs play key roles in cell fate determination during tissue specification in vertebrate development [41,172], and a composite feedback loop circuit involving both microRNAs and TFs has been described in neutrophil differentiation [63]. The shift of network behavior from the chaotic towards the critical or ordered regime may indeed enable the coordination of complex gene expression patterns during the ontogeny of multicellular systems, which requires maximal information processing capacity to ensure the coexistence of stability and diversity. There are, as mentioned in Sect. "Analyzing Network Structures", many more global and local topological features of biomolecular networks that appear to be interesting in the sense defined in Sect. "Network Architecture" (enriched above some null-model graph).
It would be interesting to test how they contribute to producing chaotic, critical or ordered behavior. The impact of most of these topological features on the global dynamics, notably the three regimes of behavior, remains unknown, since most functional interpretations of network motifs have focused on local dynamics [9,136]. The low quality and availability of experimental data on GRN architectures opens at the moment only a minimal window into the dynamical regimes of biological networks. The improvement in coverage and quality of real GRNs to be expected in the coming years is mostly driven by a reductionist agenda whose intrinsic aim is to exhaustively enumerate all the "pathways" that may serve "predictive modeling" of gene behaviors, as detailed in Sect. "Small Gene Circuits". However,


with the concepts introduced here, a framework now exists that warrants a deeper, genome-wide analysis of the relationship between structure and biological function. Such analysis should also address the fundamental dualism between inevitable self-organization (due to intrinsic constraints from physical laws) and natural selection (of random mutants for functional advantages) [46,93,203], to ask whether criticality is self-organized (see Sect. "Evolution of Network Structure: Natural Selection or Natural Constraint?").

Experimental Evidence from Systems Biology

The experimental validation of the central concepts that were erected based on theoretical analysis of network dynamics, notably the ensemble approach of Boolean networks, amounts to addressing the following two questions: 1. Are cell types attractors? 2. Is the dynamical behavior of the genomic GRN in the critical regime? As experimental systems biology begins to reach beyond the systematic characterization of genes and their interactions, it has already been possible to design experiments to obtain evidence addressing these questions.

Are Cell Types Attractors?

Obviously, the observable gene expression profiles S*, as shown in Fig. 1e, are stationary (steady) states and characteristic of a cell type. But "steady state" (dx/dt = 0 in Eq. (1)) does not necessarily imply a stable (self-stabilizing) attractor state that attracts nearby states. In the absence of knowledge of the architecture of the underlying GRN (we do not know the function F in Eq. (1)), we cannot perform the standard formal analysis, such as linear stability analysis around S*, to determine whether a stationary state is stable or not. However, the use of microarray-based expression profiling – not to identify individual genes as in the gene-centered view, but in a novel, integrated manner (Table 1, B3) for the analysis of S(t) – provides a way to address the question.
The qualitative properties of an attractor offer a handle, in that a high-dimensional attractor state S*, be it a fixed point, a small limit cycle or even a strange attractor in the state space, requires that the volume of the states around it contracts onto the attractor, i.e., div(F) < 0 [90].

Convergence of Trajectories

Thus, one consequence is that trajectories emanating from states around (and near) S* converge towards it from most (ideally, all) directions and in all dimensions of the state space. It is technically challenging to sample multiple high-dimensional initial

states near the attractor state and demonstrate that they all converge towards S* over time. However, the fact that the promyeloid precursor cells HL60 (a leukemic cell line) can be triggered to differentiate into mature neutrophil-like cells by an array of distinct chemical agents can be exploited [45]. This historical observation itself, without analysis of S(t) by gene expression profiling, already suggests that the neutrophil state is an attractor, because it reflects robustness and means that the detailed history of how it is reached does not matter. But the advent of microarray-based gene expression profiling technologies has recently opened up the possibility to show that (at least two) distinct trajectories converge towards the gene expression profile of the neutrophil state S_Neutr, as expected if S_Neutr is an attractor state [98]. Thus, HL60 cells were treated with one of two differentiation-inducing reagents, all-trans-retinoic acid (ATRA) and dimethyl sulfoxide (DMSO), two chemically unrelated compounds, and the changes of the transcriptome over time in response to either treatment were measured at multiple time intervals to monitor the two trajectories, S_ATRA(t) and S_DMSO(t), respectively (see Fig. 5a for details). In fact, the two trajectories first diverged to an extent that the two state vectors lost most of their correlation (i.e., the two gene expression profiles S_ATRA(24 h) and S_DMSO(24 h) were maximally different at t = 24 hours after stimulation). But subsequently, they converged to very similar S(t) values when the cells reached the differentiated neutrophil state under both treatments (Fig. 5a). The convergence was not complete, but quite dramatic relative to the maximally divergent state at 24 h, and was contributed by around 2800 of the 3800 genes monitored (for details see [98]).
Thus, it appears that at least the artificial, drug-induced differentiated neutrophil state is so stable that it can apparently orchestrate the expression of thousands of genes to produce the appropriate cell-type-defining expression pattern S* from quite distinct perturbed cellular states. Although only two trajectories have been measured rather than an entire state space volume, the convergence with respect to 2800 state space dimensions is strongly indicative of a high-dimensional attractor state.

Relaxation After Small Perturbations and the Problem of Cell Heterogeneity

A second way to expose the qualitative properties of a high-dimensional attractor S* is to perform a weak perturbation of S* into a state S′ near the edge of (but within) the basin of attraction (where S′ should differ from S* with respect to as many genes x_i as possible), and to observe the return of the network to S*. This more intuitive property of an attractor state was


Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination, Figure 5 Experimental evidence that a cell fate (differentiated state, cell type) is a high-dimensional stable attractor in gene expression state space. Evidence is based on the convergence (a) from different directions of high-dimensional trajectories to the gene expression pattern of the differentiated state, or on the relaxation of states placed near the border of the basin of attraction back to the "center" of the attractor state (b). a Gene expression profile dynamics of HL60 cells treated with ATRA (all-trans retinoic acid) or DMSO (dimethyl sulfoxide) at 0 h to differentiate them into neutrophil-like cells. Selected gene expression profile snapshots along the trajectories are shown as GEDI maps (as in Fig. 1), schematically placed along the state space trajectories. The GEDI maps show the convergence of N ≈ 2800 genes towards very similar patterns at 168 h [98]. b The heterogeneity of a population of clonal (genetically identical) cells is exploited to demonstrate the relaxation towards the attractor state. The histogram (top left panel) from flow cytometric measurement shows the inherent heterogeneity (spread) of cells with respect to expression of the stem cell marker Sca-1 in EML cells, which cannot be attributed solely to measurement or gene expression noise but reflects metastable cell individuality [37]. Two spontaneous "outlier" subfractions, expressing either low (L) or high (H) levels of Sca-1, were sorted using FACS (fluorescence-activated cell sorting) and cultured independently. They represent "small perturbations" of the attractor state. Each sorted subpopulation will, over a period of 5–9 days, restore the parental distribution (right panel, schematic). Gene expression profiling (shown as GEDI maps) reveals that the H and L subpopulations are distinct with respect not only to Sca-1 expression levels but also to those of multiple other genes.
Thus, the two outlier cell fractions are at distinct states S_H and S_L in the high-dimensional state space, of which Sca-1 is only one distinguishing dimension. The restoration of the parental distribution of Sca-1 is accompanied by the two distinct gene expression profiles S_H and S_L of the spontaneously perturbed cells approaching each other (and that of the parental population), indicating a high-dimensional attractor state


difficult to measure because of a phenomenon often neglected by theorists and experimentalists alike: cell population heterogeneity due to "gene expression noise". Microarray measurements, as is the case with many biochemical analysis methods, require the material of millions of cells; thus the measured S(t) is actually a population average. This can be problematic because the population is heterogeneous [33]: the expression level of a gene i can typically differ by as much as 100-fold or more between two individual cells within a clonal (genetically identical) population [37]. Thus, virtually all genes i exhibit a broad (log-normal) histogram (Fig. 5b, inset) when the expression level x_i of an individual gene is measured at the single-cell level across a population [38,101,129]. Such cell-to-cell variability is often explained by "gene expression noise" caused by random temporal fluctuations due to low copy numbers of specific molecules in the cell [108]. However, there may be other, possibly deterministic sources that generate metastable variant cells [33,176]. Such non-genetic, enduring individuality of cells translates into the picture of a cell population forming a cloud of points in state space around the attractor S*, in which each cell represents a single point. Application of a low-dose perturbation intended to allow the system to relax back to the attractor state will then be "interpreted" by individual cells differently: those positioned at the border of the cloud may be kicked out of the basin of attraction and move to another attractor, thus masking the trajectory of relaxation. In fact, single-cell resolution measurements of the response to low-dose stimulation in cell populations confirmed this picture of heterogeneity, in that a differentiation inducer given at low dose to trigger a partial response (weak perturbation) produced a bimodal distribution of gene expression of differentiation markers: some cells differentiated, others did not [38].
However, the spontaneous heterogeneity eo ipso, consisting of transient but persistent variant cells within a population [37], allows us to demonstrate the relaxation to the attractor when single-cell-level manipulation and analysis are performed: physical isolation of the population fractions at the two opposite edges of the cloud (basin of attraction), based on one single state space dimension xk, can substitute for the weak perturbation that places cells at the border of a basin (see Fig. 5b for details). Indeed, such “outlier” population fractions exhibited not only distinct levels of xk expression but also globally distinct gene expression profiles (despite being members of the same clonal population). Cells of both outlier fractions eventually “flowed back” to populate the state space region around the attractor state and restored the original distribution (shape of the cloud) [37]. The time scale of this relaxation (>5 days) was similar to that for the HL60 cells to converge to the attractor of the differentiated neutrophils. This result is summarized in Fig. 5b. Again, the spontaneous regeneration of the distinct gene expression profile of a macroscopically observable cell phenotype, consisting of thousands of genes, supports the notion of a high-dimensional attractor in gene expression space that maps into a distinct, observable cell type.

Are Gene Regulatory Networks in Living Cells Critical?

The second central question we ask in this article is whether GRNs produce a global dynamics that is in the critical regime (Sect. “Three Regimes of Behaviors for Boolean Networks”). This question is not as straightforward to address experimentally. First, the notions of order, chaos and criticality are defined in the models as properties of network ensembles. Second, it is cumbersome to sample a large number of pairs of initial states and monitor their high-dimensional trajectories to determine whether they on average converge, diverge or stay “parallel”. As mentioned earlier (Sect. “Dynamics in Boolean Networks”), one approach for obtaining a first glimpse of whether real networks are critical was recently proposed by Aldana and coworkers ([19] and refs. therein): they “imposed” a dynamics onto real biological networks, for which only the topology but not the interaction functions are known, by treating them as Boolean networks, whereby the Boolean functions were guessed or randomly assigned according to some rules. Such studies suggest that these networks, given their assumed topologies, are in the critical regime. To more directly characterize the observed dynamics of natural systems in terms of the three regimes, several indirect approaches have been taken.
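The convergence/divergence of trajectory pairs that defines the three regimes is easy to probe numerically. The following sketch (a minimal N-K random Boolean network with unbiased random truth tables; all parameter values are illustrative, not taken from the cited studies) flips a single bit of an initial state and tracks the Hamming distance between the two trajectories: for K = 1 the perturbation typically dies out (ordered regime), for K = 4 it spreads to a finite fraction of the network (chaotic regime), with K = 2 marking the critical boundary.

```python
import numpy as np

def hamming_spread(N=200, K=2, steps=30, trials=20, seed=0):
    """Average Hamming distance after `steps` synchronous updates of two
    trajectories that initially differ in a single randomly chosen bit."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        inputs = rng.integers(0, N, size=(N, K))       # wiring: K inputs per node
        tables = rng.integers(0, 2, size=(N, 2 ** K))  # random Boolean functions
        weights = 2 ** np.arange(K)
        s1 = rng.integers(0, 2, size=N)
        s2 = s1.copy()
        s2[rng.integers(0, N)] ^= 1                    # flip one bit
        for _ in range(steps):
            s1 = tables[np.arange(N), (s1[inputs] * weights).sum(axis=1)]
            s2 = tables[np.arange(N), (s2[inputs] * weights).sum(axis=1)]
        total += np.sum(s1 != s2)
    return total / trials

for K in (1, 2, 4):
    print(f"K={K}: mean final Hamming distance = {hamming_spread(K=K):.1f}")
```

For unbiased random Boolean functions, the annealed approximation of Derrida and Pomeau [55] predicts that a small Hamming distance is multiplied by K/2 per update step, which is why K = 2 marks the critical boundary in this simple ensemble.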
These strategies are based on novel measurable quantities computed from several schemes of microarray experiments that are now available but were not originally generated with the intention of answering this question. One first determines in simulated networks whether the quantity in question is also associated with criticality in silico; inference is then made as to which regime of system behavior the observed system resembles. Three such pieces of evidence provide a first hint that living cells may indeed be in the critical regime:

(i) Gene expression profile changes during HeLa cell cycle progression were compared with the detailed temporal structure of the updating of network states in Boolean networks. The discretized real gene expression data of cells progressing in the cell cycle were
compared with that of simulated state cycles of random Boolean networks in the three regimes in terms of the Lempel–Ziv complexity of the time series. This led to the conclusion that the dynamical behavior of thousands of genes was most consistent with either ordered or critical behavior, but not chaotic behavior [174]. (ii) A striking property predicted from the analysis of simulated critical Boolean networks is that if a randomly selected gene is deleted, the number of other genes that change their expression as a consequence of that single-gene deletion (“avalanche size”) is measured, and such single-gene deletion experiments are repeated many times, the avalanche sizes will exhibit a power-law distribution with a slope of τ = 1.5. This specific behavior is only seen in critical networks. Analysis of just such data for over 200 single-deletion mutants in yeast [163], from the experiments reported by Hughes et al. [100], was recently performed. It was found that the distribution of the avalanche sizes not only approached a power law, but that the slope was also close to 1.5 [163]. This result was insensitive to altering the criterion for defining a “change in gene activity” from two-fold to five-fold in calculating the avalanche size. (iii) A more direct determination of the regime of network dynamics was recently reported, in which an analysis analogous to the Derrida plot (see Sect. “Ordered and Chaotic Regimes of Network Behavior”) was performed [153]. Macrophage gene expression profiles were measured at various time points in response to various perturbations (= stimulation of Toll-like receptors with distinct agents), offering a way to measure the time evolution of a large number of similar initial states. Here, instead of the Hamming distance, the normalized compression distance NCD (as mentioned in Sect. “Ordered and Chaotic Regimes of Network Behavior”) was used to circumvent the problems associated with Derrida curves.
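The normalized compression distance is easy to state: with C(x) the compressed length of x, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)). A minimal sketch using zlib as the compressor (the cited study may use a different compressor; this is purely illustrative):

```python
import random
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: close to 0 for near-identical inputs,
    close to 1 for unrelated, incompressible inputs."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

random.seed(0)
a = bytes(random.getrandbits(8) for _ in range(1000))  # one discretized "state"
b = bytes(random.getrandbits(8) for _ in range(1000))  # an unrelated "state"
print(ncd(a, a), ncd(a, b))  # small value vs. value close to 1
```

Because the compressor exploits shared structure between the concatenated inputs, NCD approximates a universal similarity metric without requiring the fixed-length binary encoding that Hamming distance demands.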
The results were consistent with critical dynamics, in that for many pairs of initial states, their distance at time t and at time t + Δt was on average equal; thus neither trajectory convergence nor divergence took place.

Future Directions and Questions

The ideas of a state space representing the dynamics of the network, and of its particular structure that stems from the constraints imposed by the regulatory interactions and can be epitomized as an ‘epigenetic landscape’, serve as a conceptual bridge linking network architecture with the observable biological behavior of a cell. In this framework, the attractors (valleys) are disjoint regions in the state space landscape and represent cell fates and cell types. One profound and bold hypothesis is that for networks to have a landscape with attractors that optimally convey cell lineage identity and robustness of their gene expression profile, yet allow enough flexibility for cell phenotype switches during development, the networks must be poised at the boundary between the chaotic and the ordered regime. Criticality may thus be a universal property of networks that have maximal information-processing capacity. But what can we do with the conceptual framework presented here? And what are the next questions to ask in the near future? Clearly, new functional genomics analysis techniques will soon advance the experimental elucidation of the architecture of the GRNs of various species, including those of metazoan organisms. This will drastically expand the opportunities for theoretical analysis of the architecture of networks and finally afford a much closer look at the dynamics without resorting to simulated network ensembles. However, beyond network analysis in terms of mathematical formalisms, the concepts of integrated network dynamics presented here should also pave the way for a formal rather than descriptive understanding of tissue homeostasis and development, as well as of diseases such as cancer. One corollary of the idea that cell types are attractors is that cancer cells are also attractors, lurking somewhere in state space (near embryonic or stem cell attractors) and normally avoided by the physiological developmental trajectories. They become accessible in pathological conditions and trap cells in them, preventing terminal differentiation (this idea is discussed in detail in [33,96]).
Such biological interpretation of the concepts presented here will require that these concepts be expanded to embrace the following aspects of dynamical biological systems that are currently not well understood but can already be framed:

Attractor Transitions and Development

If cell fates are attractors, then development is a flow in state space whose trajectories represent the developmental trajectories. But how do cells, e. g., a progenitor cell committing to a particular cell fate, move from one attractor to another? Currently, two models are being envisioned for the “flow between attractors” that constitutes developmental paths in the generation of multi-cellular organisms:

(i) In one model, stimulated by the studies in Boolean networks, the cell “jumps” from one attractor to another in response to a distinct perturbation (e. g., a developmental signal), which is represented as the imposed alteration of the expression status of a set of genes (“bit-flipping” in binary Boolean networks). This corresponds to the displacement of the network state S(t) by an external influence to a new position in the basin of another attractor. (ii) The second model posits that the landscape itself changes, and has its roots in the classical modeling of nonlinear dynamical systems: for instance, the attractor of the progenitor cell (valley) may be converted into an unstable steady-state point (hill top), at which point the network (cell) will spontaneously be attracted by either of the two attractors on each side of the newly formed hill. This is exemplified in a model for fate decision in bipotent progenitor cells in hematopoiesis [99]. In this model, a change of the landscape structure ensues from a change in the network architecture caused by the slow alteration of the value of a system parameter that controls the interaction strength in the system equations (see Sect. “Small Gene Circuits”). Thus, the external signal that triggers the differentiation exerts its effect by affecting the system parameters. Increasing the decay rates of x1 and x2 in the example of Fig. 3b will lead to the disappearance of the central attractor and convert the tristable landscape of Fig. 3b to the bistable one of Fig. 3a [99]. Such a qualitative change that occurs during the slow increase or decrease of a system parameter is referred to as a bifurcation. Note that this second model takes a narrower view, assuming that the network under study is not the global, universal and fixed network of the entire genome, as discussed in Sect. “Complex Networks”, but rather a subnetwork that can change its architecture based on the presence or absence of the products of genes that are outside of the subnetwork.
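The bifurcation scenario of model (ii) can be reproduced with a toy version of such a two-gene circuit (mutual inhibition plus self-activation). The equations and parameter values below are a generic textbook form chosen for illustration, not the exact model of [99]: counting the attractors reached from a grid of initial conditions shows that raising the decay rate k collapses the tristable landscape into a bistable one.

```python
import numpy as np

def attractors(k, n=4, theta=0.5, dt=0.05, T=120.0):
    """Integrate dxi/dt = xi^n/(th^n + xi^n) + th^n/(th^n + xj^n) - k*xi
    (self-activation plus mutual inhibition, symmetric in x1, x2) from a
    grid of initial states; return the set of distinct stable states reached."""
    found = set()
    for x1_0 in np.linspace(0.0, 2.4, 5):
        for x2_0 in np.linspace(0.1, 2.5, 5):   # offset grid avoids the symmetry axis
            x = np.array([x1_0, x2_0])
            for _ in range(int(T / dt)):        # plain Euler integration
                act = x ** n / (theta ** n + x ** n)
                inh = theta ** n / (theta ** n + x[::-1] ** n)
                x = x + dt * (act + inh - k * x)
            found.add(tuple(np.round(x, 1)))    # cluster converged end states
    return found

print("k=1.0:", sorted(attractors(1.0)))  # two extreme attractors plus a central one
print("k=2.0:", sorted(attractors(2.0)))  # central attractor gone: bistable
```

Here only the parameter k changes, not the state: the landscape itself is reshaped, which is the defining feature of this second model of attractor transitions.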

Noisy Systems

In a bistable switching system, it is immediately obvious that random fluctuations in xi due to “gene expression noise” may trigger a transition between the attractors and thus explain the stochastic phenotype transitions, as has been recently shown for several micro-organismal systems and discussed for mammalian cell differentiation [87,97,101,108]. The observed heterogeneity of cells in a nominally identical cell population, caused either by “gene expression noise” or by other diversifying processes, such as the random partitioning of molecules at cell divisions, implies that cell states cannot be viewed as deterministic points and trajectories in the state space, but rather as moving “clouds” held together by the attractors. More recently, on the basis of such bistable switches, it has even been proposed and shown in synthetic networks that environmental bias in fluctuation magnitude may explicitly control the switch to the physiologically favorable attractor state, because gene expression noise may be higher (relative to the deterministically controlled expression levels) when cells are in the attractor that dictates a gene expression pattern that is incompatible with a given environment [110]. Such biased gene expression noise may thus drive the flow between the attractors and hence (on average) guide the system through attractor transitions to produce various cell fates in a manner that is commensurate with the development of the tissue. On the other hand, it may also explain the local stochasticity of cell fate decisions observed for many stem and progenitor cells.

Beyond Cell Autonomous Processes

To understand development we need, of course, to open our view beyond the cell-autonomous dynamics of GRNs. Some of the genes expressed in particular states S(t) encode secreted proteins that affect the gene expression, and hence the state S(t), of neighboring cells. Such inter-cell communication establishes a network at a higher level, with its own constrained dynamics, which needs to be incorporated into models of developmental trajectories in gene expression state space.
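The noise-induced attractor transitions invoked under “Noisy Systems” can be captured by the simplest possible caricature: one-dimensional overdamped motion in a double-well potential V(x) = (x² − 1)²/4, integrated with the Euler–Maruyama scheme (all parameter values are illustrative). Weak noise leaves the system trapped in one attractor; stronger noise drives stochastic transitions between the two wells.

```python
import numpy as np

def count_transitions(sigma, T=500.0, dt=0.01, seed=1):
    """Euler-Maruyama integration of dx = (x - x^3) dt + sigma dW,
    a double-well system with attractors at x = -1 and x = +1.
    Counts well-to-well transitions, using a +/-0.5 hysteresis band
    so that small wiggles around the barrier top are not counted."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x, well, flips = 1.0, 1, 0
    for dW in rng.normal(0.0, np.sqrt(dt), size=n):
        x += (x - x ** 3) * dt + sigma * dW
        if x > 0.5 and well == -1:
            well, flips = 1, flips + 1
        elif x < -0.5 and well == 1:
            well, flips = -1, flips + 1
    return flips

print("weak noise  :", count_transitions(sigma=0.15))
print("strong noise:", count_transitions(sigma=0.5))
```

The transition rate falls off exponentially with the barrier height relative to the noise strength (Kramers' escape picture), which is why a modest environmental bias in fluctuation magnitude can strongly favor escape from one attractor over the other.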

Directionality

The concepts of cell fates and cell types as attractors, as well as any mechanism that explains attractor transitions during development and differentiation, do not explain the overall directionality of development, the “arrow of time” in ontogenesis: why is development essentially a one-way process, i. e., irreversible in time, given that the underlying regulatory events, the switching ON or OFF of genes, are fully reversible? Where does the overall slope (from back to front) in Waddington’s epigenetic landscape (Fig. 2g) come from? One idea is that “gene expression noise” may play the role of thermal noise (heat) in thermodynamics in explaining the irreversibility of processes. Alternatively, the network could be specifically wired, perhaps through natural selection, so that attractor transitions are strongly biased in one direction.


Evolution

As discussed several times in this article, when networks are studied in the light of evolution, a central question arises: how do mutations, which essentially rewire the network by altering the nature of regulatory interactions, give rise to the particular architecture of the GRN that we find today? What are the relative roles in shaping particular network architectures of (i) constraints due to physical (graph-theoretical) laws and self-organization vs. (ii) natural selection for functionality? If selection plays a major role, can it select for such features as the global landscape structure, or even for criticality? Or can the latter even “self-organize” without adaptive selection [31]? Such questions go far beyond the current analysis of Darwinian mechanisms of robustness and evolvability of networks. They are at the heart of the quest in biocomplexity research for fundamental principles of life, of which the process of natural selection itself is just a subset. Regulatory networks offer an accessible and formalizable object of study with which to begin to ask these questions.

Bibliography

Primary Literature

1. Adolfsson J, Mansson R, Buza-Vidas N, Hultquist A, Liuba K, Jensen CT, Bryder D, Yang L, Borge OJ, Thoren LA, Anderson K, Sitnicka E, Sasaki Y, Sigvardsson M, Jacobsen SE (2005) Identification of Flt3+ lympho-myeloid stem cells lacking erythro-megakaryocytic potential: a revised road map for adult blood lineage commitment. Cell 121:295–306 2. Aittokallio T, Schwikowski B (2006) Graph-based methods for analysing networks in cell biology. Brief Bioinform 7:243–55 3. Albert R (2005) Scale-free networks in cell biology. J Cell Sci 118:4947–57 4. Albert R, Othmer HG (2003) The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in Drosophila melanogaster. J Theor Biol 223:1–18 5. Albert R, Jeong H, Barabasi AL (2000) Error and attack tolerance of complex networks. Nature 406:378–82 6.
Aldana M, Cluzel P (2003) A natural class of robust networks. Proc Natl Acad Sci USA 100:8710–4 7. Aldana M, Balleza E, Kauffman S, Resendiz O (2007) Robustness and evolvability in genetic regulatory networks. J Theor Biol 245:433–48 8. Aldana M, Coppersmith S, Kadanoff LP (2003) Boolean dynamics with random couplings. In: Kaplan E, Marsden JE, Sreenivasan KR (eds) Perspectives and problems in nonlinear science. A celebratory volume in honor of Lawrence Sirovich. Springer, New York 9. Alon U (2003) Biological networks: the tinkerer as an engineer. Science 301:1866–7 10. Amaral LA, Scala A, Barthelemy M, Stanley HE (2000) Classes of small-world networks. Proc Natl Acad Sci USA 97:11149–52 11. Anderson PW (1972) More is different. Science 177:393–396

12. Angeli D, Ferrell JE Jr., Sontag ED (2004) Detection of multistability, bifurcations, and hysteresis in a large class of biological positive-feedback systems. Proc Natl Acad Sci USA 101: 1822–1827 13. Arney KL, Fisher AG (2004) Epigenetic aspects of differentiation. J Cell Sci 117:4355–63 14. Artzy-Randrup Y, Fleishman SJ, Ben-Tal N, Stone L (2004) Comment on Network motifs: simple building blocks of complex networks and Superfamilies of evolved and designed networks. Science 305:1107; author reply 1107 15. Autumn K, Ryan MJ, Wake DB (2002) Integrating historical and mechanistic biology enhances the study of adaptation. Q Rev Biol 77:383–408 16. Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA (2004) Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14:283–91 17. Bagley RJ, Glass L (1996) Counting and classifying attractors in high dimensional dynamical systems. J Theor Biol 183:269–84 18. Balcan D, Kabakcioglu A, Mungan M, Erzan A (2007) The information coded in the yeast response elements accounts for most of the topological properties of its transcriptional regulation network. PLoS ONE 2:e501 19. Balleza E, Alvarez-Buylla ER, Chaos A, Kauffman A, Shmulevich I, Aldana M (2008) Critical dynamics in genetic regulatory networks: examples from four kingdoms. PLoS One 3:e2456 20. Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–12 21. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5: 101–113 22. Becskei A, Seraphin B, Serrano L (2001) Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion. EMBO J 20:2528–2535 23. Berg J, Lassig M, Wagner A (2004) Structure and evolution of protein interaction networks: a statistical model for link dynamics and gene duplications. BMC Evol Biol 4:51 24. 
Bergmann S, Ihmels J, Barkai N (2004) Similarities and differences in genome-wide expression data of six organisms. PLoS Biol 2:E9 25. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, Jaenisch R, Wagschal A, Feil R, Schreiber SL, Lander ES (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125:315–26 26. Bird A (2007) Perceptions of epigenetics. Nature 447:396–8 27. Bloom JD, Adami C (2003) Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein–protein interactions data sets. BMC Evol Biol 3:21 28. Bloom JD, Adami C (2004) Evolutionary rate depends on number of protein–protein interactions independently of gene expression level: response. BMC Evol Biol 4:14 29. Bork P, Jensen LJ, von Mering C, Ramani AK, Lee I, Marcotte EM (2004) Protein interaction networks from yeast to human. Curr Opin Struct Biol 14:292–9 30. Bornholdt S (2005) Systems biology. Less is more in modeling large genetic networks. Science 310:449–51


31. Bornholdt S, Rohlf T (2000) Topological evolution of dynamical networks: global criticality from local dynamics. Phys Rev Lett 84:6114–7 32. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA (2005) Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 122:947–56 33. Brock A, Chang H, Huang SH Non-genetic cell heterogeneity and mutation-less tumor progression. Manuscript submitted 34. Brown KS, Hill CC, Calero GA, Myers CR, Lee KH, Sethna JP, Cerione RA (2004) The statistical mechanics of complex signaling networks: nerve growth factor signaling. Phys Biol 1:184–195 35. Bulyk ML (2006) DNA microarray technologies for measuring protein-DNA interactions. Curr Opin Biotechnol 17:422–30 36. Callaway DS, Hopcroft JE, Kleinberg JM, Newman ME, Strogatz SH (2001) Are randomly grown graphs really random? Phys Rev E Stat Nonlin Soft Matter Phys 64:041902 37. Chang HH, Hemberg M, Barahona M, Ingber DE, Huang S (2008) Transcriptome-wide noise controls lineage choice in mammalian progenitor cells. Nature 453:544–547 38. Chang HH, Oh PY, Ingber DE, Huang S (2006) Multistable and multistep dynamics in neutrophil differentiation. BMC Cell Biol 7:11 39. Chang WC, Li CW, Chen BS (2005) Quantitative inference of dynamic regulatory pathways via microarray data. BMC Bioinformatics 6:44 40. Chaves M, Sontag ED, Albert R (2006) Methods of robustness analysis for Boolean models of gene control networks. Syst Biol (Stevenage) 153:154–67 41. Chen K, Rajewsky N (2007) The evolution of gene regulation by transcription factors and microRNAs. Nat Rev Genet 8:93–103 42. Chen KC, Wang TY, Tseng HH, Huang CY, Kao CY (2005) A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics 21:2883–90 43. Chickarmane V, Troein C, Nuber UA, Sauro HM, Peterson C (2006) Transcriptional dynamics of the embryonic stem cell switch. PLoS Comput Biol 2:e123 44. Claverie JM (2001) Gene number. What if there are only 30,000 human genes? Science 291:1255–7 45. Collins SJ (1987) The HL-60 promyelocytic leukemia cell line: proliferation, differentiation, and cellular oncogene expression. Blood 70:1233–1244 46. Cordero OX, Hogeweg P (2006) Feed-forward loop circuits as a side effect of genome evolution. Mol Biol Evol 23:1931–6 47. Cross MA, Enver T (1997) The lineage commitment of haemopoietic progenitor cells. Curr Opin Genet Dev 7:609–613 48. Dang CV, O’Donnell KA, Zeller KI, Nguyen T, Osthus RC, Li F (2006) The c-Myc target gene network. Semin Cancer Biol 16:253–64 49. Davidich MI, Bornholdt S (2008) Boolean network model predicts cell cycle sequence of fission yeast. PLoS ONE 3:e1672 50. Davidson EH, Erwin DH (2006) Gene regulatory networks and the evolution of animal body plans. Science 311:796–800 51. de la Serna IL, Ohkawa Y, Berkes CA, Bergstrom DA, Dacwag CS, Tapscott SJ, Imbalzano AN (2005) MyoD targets chromatin remodeling complexes to the myogenin locus prior to forming a stable DNA-bound complex. Mol Cell Biol 25:3997–4009 52. Deane CM, Salwinski L, Xenarios I, Eisenberg D (2002) Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol Cell Proteomics 1:349–56 53. Delbrück M (1949) Discussion. In: Unités biologiques douées de continuité génétique. Colloques Internationaux du Centre National de la Recherche Scientifique. CNRS, Paris 54. Deplancke B, Mukhopadhyay A, Ao W, Elewa AM, Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm L, Reece-Hoyes JS, Hope IA, Tissenbaum HA, Mango SE, Walhout AJ (2006) A gene-centered C. elegans protein-DNA interaction network. Cell 125:1193–205 55. Derrida B, Pomeau Y (1986) Random networks of automata: a simple annealed approximation. Europhys Lett 1:45–49 56. Dodd IB, Micheelsen MA, Sneppen K, Thon G (2007) Theoretical analysis of epigenetic cell memory by nucleosome modification. Cell 129:813–22 57. Eichler GS, Huang S, Ingber DE (2003) Gene Expression Dynamics Inspector (GEDI): for integrative analysis of expression profiles. Bioinformatics 19:2321–2322 58. Eisenberg E, Levanon EY (2003) Preferential attachment in the protein network evolution. Phys Rev Lett 91:138701 59. Enver T, Heyworth CM, Dexter TM (1998) Do stem cells play dice? Blood 92:348–51; discussion 352 60. Espinosa-Soto C, Padilla-Longoria P, Alvarez-Buylla ER (2004) A gene regulatory network model for cell-fate determination during Arabidopsis thaliana flower development that is robust and recovers experimental gene expression profiles. Plant Cell 16:2923–39 61. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, Kasif S, Collins JJ, Gardner TS (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5:e8 62. Faure A, Naldi A, Chaouiya C, Thieffry D (2006) Dynamical analysis of a generic Boolean model for the control of the mammalian cell cycle. Bioinformatics 22:e124–31 63. Fazi F, Rosa A, Fatica A, Gelmetti V, De Marchis ML, Nervi C, Bozzoni I (2005) A minicircuitry comprised of microRNA-223 and transcription factors NFI-A and C/EBPalpha regulates human granulopoiesis. Cell 123:819–31 64. Ferrell JE Jr., Machleder EM (1998) The biochemical basis of an all-or-none cell fate switch in Xenopus oocytes. Science 280:895–8 65. Fisher AG (2002) Cellular identity and lineage choice. Nat Rev Immunol 2:977–82 66. Fox JJ, Hill CC (2001) From topology to dynamics in biochemical networks. Chaos 11:809–815 67. Fraser HB, Hirsh AE (2004) Evolutionary rate depends on number of protein–protein interactions independently of gene expression level. BMC Evol Biol 4:13 68. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW (2002) Evolutionary rate in the protein interaction network. Science 296:750–2 69. Fuks F (2005) DNA methylation and histone modifications: teaming up to silence genes. Curr Opin Genet Dev 15:490–495 70. Gao H, Falt S, Sandelin A, Gustafsson JA, Dahlman-Wright K (2007) Genome-wide identification of estrogen receptor α binding sites in mouse liver. Mol Endocrinol 22:10–22


71. Gardner TS, Cantor CR, Collins JJ (2000) Construction of a genetic toggle switch in Escherichia coli. Nature 403:339–342 72. Gershenson C (2002) Classification of random Boolean networks. In: Standish RK, Bedau MA, Abbass HA (eds) Artificial life, vol 8. MIT Press, Cambridge, pp 1–8 73. Gisiger T (2001) Scale invariance in biology: coincidence or footprint of a universal mechanism? Biol Rev Camb Philos Soc 76:161–209 74. Glass L, Kauffman SA (1972) Co-operative components, spatial localization and oscillatory cellular dynamics. J Theor Biol 34:219–37 75. Goldberg AD, Allis CD, Bernstein E (2007) Epigenetics: a landscape takes shape. Cell 128:635–8 76. Goldstein ML, Morris SA, Yen GG (2004) Problems with fitting to the power-law distribution. Eur Phys J B 41:255–258 77. Goodwin BC, Webster GC (1999) Rethinking the origin of species by natural selection. Riv Biol 92:464–7 78. Gould SJ, Lewontin RC (1979) The spandrels of San Marco and the Panglossian paradigm: a critique of the adaptationist programme. Proc R Soc Lond B Biol Sci 205:581–98 79. Graf T (2002) Differentiation plasticity of hematopoietic cells. Blood 99:3089–101 80. Grass JA, Boyer ME, Pal S, Wu J, Weiss MJ, Bresnick EH (2003) GATA-1-dependent transcriptional repression of GATA-2 via disruption of positive autoregulation and domain-wide chromatin remodeling. Proc Natl Acad Sci USA 100:8811–6 81. Greil F, Drossel B, Sattler J (2007) Critical Kauffman networks under deterministic asynchronous update. New J Phys 9:373 82. Guelzim N, Bottani S, Bourgine P, Kepes F (2002) Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet 31:60–3 83. Guo Y, Eichler GS, Feng Y, Ingber DE, Huang S (2006) Towards a holistic, yet gene-centered analysis of gene expression profiles: a case study of human lung cancers. J Biomed Biotechnol 2006:69141 84. Hahn MW, Kern AD (2005) Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. 
Mol Biol Evol 22:803–6 85. Hartwell LH, Hopfield JJ, Leibler S, Murray AW (1999) From molecular to modular cell biology. Nature 402:C47–52 86. Harris SE, Sawhill BK, Wuensche A, Kauffman SA (2002) A model of transcriptional regulatory networks based on biases in the observed regulation rules. Complexity 7:23–40 87. Hasty J, Pradines J, Dolnik M, Collins JJ (2000) Noise-based switches and amplifiers for gene expression. Proc Natl Acad Sci USA 97:2075–80 88. Haverty PM, Hansen U, Weng Z (2004) Computational inference of transcriptional regulatory networks from expression profiling and transcription factor binding site identification. Nucleic Acids Res 32:179–88 89. He L, Hannon GJ (2004) MicroRNAs: small RNAs with a big role in gene regulation. Nat Rev Genet 5:522–31 90. Hilborn R (1994) Chaos and nonlinear dynamics: An introduction for scientists and engineers, 2 edn. Oxford University Press, New York 91. Hochedlinger K, Jaenisch R (2006) Nuclear reprogramming and pluripotency. Nature 441:1061–7 92. Hu M, Krause D, Greaves M, Sharkis S, Dexter M, Heyworth C, Enver T (1997) Multilineage gene expression precedes commitment in the hemopoietic system. Genes Dev 11:774–85

93. Huang S (2004) Back to the biology in systems biology: what can we learn from biomolecular networks. Brief Funct Genomics Proteomics 2:279–297 94. Huang S (2007) Cell fates as attractors – stability and flexibility of cellular phenotype. In: Endothelial biomedicine, 1st edn, Cambridge University Press, New York, pp 1761–1779 95. Huang S, Ingber DE (2000) Shape-dependent control of cell growth, differentiation, and apoptosis: switching between attractors in cell regulatory networks. Exp Cell Res 261:91–103 96. Huang S, Ingber DE (2006) A non-genetic basis for cancer progression and metastasis: self-organizing attractors in cell regulatory networks. Breast Dis 26:27–54 97. Huang S, Wikswo J (2006) Dimensions of systems biology. Rev Physiol Biochem Pharmacol 157:81–104 98. Huang S, Eichler G, Bar-Yam Y, Ingber DE (2005) Cell fates as high-dimensional attractor states of a complex gene regulatory network. Phys Rev Lett 94:128701 99. Huang S, Guo YP, May G, Enver T (2007) Bifurcation dynamics of cell fate decision in bipotent progenitor cells. Dev Biol 305:695–713 100. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH (2000) Functional discovery via a compendium of expression profiles. Cell 102:109–26 101. Hume DA (2000) Probability in transcriptional regulation and its implications for leukocyte differentiation and inducible gene expression. Blood 96:2323–8 102. Ihmels J, Bergmann S, Barkai N (2004) Defining transcription modules using large-scale gene expression data. Bioinformatics 20:1993–2003 103. Jablonka E, Lamb MJ (2002) The changing concept of epigenetics. Ann N Y Acad Sci 981:82–96 104. Jeong H, Mason SP, Barabasi AL, Oltvai ZN (2001) Lethality and centrality in protein networks. Nature 411:41–42 105. 
Johnson DS, Mortazavi A, Myers RM, Wold B (2007) Genome-wide mapping of in vivo protein-DNA interactions. Science 316:1497–502 106. Jordan IK, Wolf YI, Koonin EV (2003) No simple dependence between protein evolution rate and the number of protein–protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol 3:1 107. Joy MP, Brock A, Ingber DE, Huang S (2005) High-betweenness proteins in the yeast protein interaction network. J Biomed Biotechnol 2005:96–103 108. Kaern M, Elston TC, Blake WJ, Collins JJ (2005) Stochasticity in gene expression: from theories to phenotypes. Nat Rev Genet 6:451–64 109. Kaplan D, Glass L (1995) Understanding Nonlinear Dynamics, 1st edn. Springer, New York 110. Kashiwagi A, Urabe I, Kaneko K, Yomo T (2006) Adaptive response of a gene network to environmental changes by fitness-induced attractor selection. PLoS ONE 1:e49 111. Kauffman S (1969) Homeostasis and differentiation in random genetic control networks. Nature 224:177–8 112. Kauffman S (2004) A proposal for using the ensemble approach to understand genetic regulatory networks. J Theor Biol 230:581–90

557

558

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination

113. Kauffman S, Peterson C, Samuelsson B, Troein C (2003) Random Boolean network models and the yeast transcriptional network. Proc Natl Acad Sci USA 100:14796–9 114. Kauffman SA (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 22:437–467 115. Kauffman SA (1991) Antichaos and adaptation. Sci Am 265:78–84 116. Kauffman SA (1993) The origins of order. Oxford University Press, New York 117. Khorasanizadeh S (2004) The nucleosome: from genomic organization to genomic regulation. Cell 116:259–72 118. Kim KY, Wang J (2007) Potential energy landscape and robustness of a gene regulatory network: toggle switch. PLoS Comput Biol 3:e60 119. Klemm K, Bornholdt S (2005) Stable and unstable attractors in Boolean networks. Phys Rev E Stat Nonlin Soft Matter Phys 72:055101 120. Klevecz RR, Bolen J, Forrest G, Murray DB (2004) A genomewide oscillation in transcription gates DNA replication and cell cycle. Proc Natl Acad Sci USA 101:1200–5 121. Kloster M, Tang C, Wingreen NS (2005) Finding regulatory modules through large-scale gene-expression data analysis. Bioinformatics 21:1172–9 122. Kouzarides T (2007) Chromatin modifications and their function. Cell 128:693–705 123. Kramer BP, Fussenegger M (2005) Hysteresis in a synthetic mammalian gene network. Proc Natl Acad Sci USA 102: 9517–9522 124. Krawitz P, Shmulevich I (2007) Basin entropy in Boolean network ensembles. Phys Rev Lett 98:158701 125. Krysinska H, Hoogenkamp M, Ingram R, Wilson N, Tagoh H, Laslo P, Singh H, Bonifer C (2007) A two-step, PU.1-dependent mechanism for developmentally regulated chromatin remodeling and transcription of the c-fms gene. Mol Cell Biol 27:878–87 126. Kubicek S, Jenuwein T (2004) A crack in histone lysine methylation. Cell 119:903–6 127. Laslo P, Spooner CJ, Warmflash A, Lancki DW, Lee HJ, Sciammas R, Gantner BN, Dinner AR, Singh H (2006) Multilineage transcriptional priming and determination of alternate hematopoietic cell fates. Cell 126:755–66 128. 
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298:799–804 129. Levsky JM, Singer RH (2003) Gene expression and the myth of the average cell. Trends Cell Biol 13:4–6 130. Li F, Long T, Lu Y, Ouyang Q, Tang C (2004) The yeast cellcycle network is robustly designed. Proc Natl Acad Sci USA 101:4781–6 131. Li H, Xuan J, Wang Y, Zhan M (2008) Inferring regulatory networks. Front Biosci 13:263–75 132. Lim HN, van Oudenaarden A (2007) A multistep epigenetic switch enables the stable inheritance of DNA methylation states. Nat Genet 39:269–75 133. Luo F, Yang Y, Chen CF, Chang R, Zhou J, Scheuermann RH (2007) Modular organization of protein interaction networks. Bioinformatics 23:207–14

134. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, Gerstein M (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431:308–12 135. MacCarthy T, Pomiankowski A, Seymour R (2005) Using largescale perturbations in gene network reconstruction. BMC Bioinformatics 6:11 136. Mangan S, Alon U (2003) Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci USA 100:11980–5 137. Manke T, Demetrius L, Vingron M (2006) An entropic characterization of protein interaction networks and cellular robustness. JR Soc Interface 3:843–50 138. Marcotte EM (2001) The path not taken. Nat Biotechnol 19:626–627 139. Margolin AA, Califano A (2007) Theory and limitations of genetic network inference from microarray data. Ann N Y Acad Sci 1115:51–72 140. Maslov S, Sneppen K (2002) Specificity and stability in topology of protein networks. Science 296:910–3 141. Mattick JS (2007) A new paradigm for developmental biology. J Exp Biol 210:1526–47 142. May RM (1972) Will a large complex system be stable? Nature 238:413–414 143. Meissner A, Wernig M, Jaenisch R (2007) Direct reprogramming of genetically unmodified fibroblasts into pluripotent stem cells. Nat Biotechnol 25:1177–1181 144. Mellor J (2006) Dynamic nucleosomes and gene transcription. Trends Genet 22:320–9 145. Metzger E, Wissmann M, Schule R (2006) Histone demethylation and androgen-dependent transcription. Curr Opin Genet Dev 16:513–7 146. Mikkers H, Frisen J (2005) Deconstructing stemness. Embo J 24:2715–9 147. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298:824–7 148. Monod J, Jacob F (1961) Teleonomic mechanisms in cellular metabolism, growth, and differentiation. Cold Spring Harb Symp Quant Biol 26:389–401 149. Morceau F, Schnekenburger M, Dicato M, Diederich M (2004) GATA-1: friends, brothers, and coworkers. Ann N Y Acad Sci 1030:537–54 150. 
Morrison SJ, Uchida N, Weissman IL (1995) The biology of hematopoietic stem cells. Annu Rev Cell Dev Biol 11:35–71 151. Murray JD (1989) Mathematical biology, 2nd edn (1993). Springer, Berlin 152. Newman MEJ (2003) The structure and function of complex networks. SIAM Review 45:167–256 153. Nykter M, Price ND, Aldana M, Ramsey SA, Kauffman SA, Hood L, Yli-Harja O, Shmulevich I (2008) Gene expression dynamics in the macrophage exhibit criticality. Proc Natl Acad Sci USA 105:1897–900 154. Nykter M, Price ND, Larjo A, Aho T, Kauffman SA, Yli-Harja O, Shmulevich I (2008) Critical networks exhibit maximal information diversity in structure-dynamics relationships. Phys Rev Lett 100:058702 155. Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, Murray HL, Volkert TL, Schreiber J, Rolfe PA, Gifford DK, Fraenkel E, Bell GI, Young RA (2004) Control of pancreas and liver gene expression by HNF transcription factors. Science 303:1378–81

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination

156. Okita K, Ichisaka T, Yamanaka S (2007) Generation of germline-competent induced pluripotent stem cells. Nature 448:313–7 157. Ozbudak EM, Thattai M, Lim HN, Shraiman BI, Van Oudenaarden A (2004) Multistability in the lactose utilization network of Escherichia coli. Nature 427:737–740 158. Pennisi E (2003) Human genome. A low number wins the GeneSweep Pool. Science 300:1484 159. Picht P (1969) Mut zur utopie. Piper, München 160. Proulx SR, Promislow DE, Phillips PC (2005) Network thinking in ecology and evolution. Trends Ecol Evol 20:345–53 161. Raff M (2003) Adult stem cell plasticity: fact or artifact? Annu Rev Cell Dev Biol 19:1–22 162. Ralston A and Rossant J (2005) Genetic regulation of stem cell origins in the mouse embryo. Clin Genet 68:106–12 163. Ramo P, Kesseli J, Yli-Harja O (2006) Perturbation avalanches and criticality in gene regulatory networks. J Theor Biol 242:164–70 164. Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of modularity in metabolic networks. Science 297:1551–5 165. Reik W, Dean W (2002) Back to the beginning. Nature 420:127 166. Resendis-Antonio O, Freyre-Gonzalez JA, Menchaca-Mendez R, Gutierrez-Rios RM, Martinez-Antonio A, Avila-Sanchez C, Collado-Vides J (2005) Modular analysis of the transcriptional regulatory network of E. coli. Trends Genet 21:16–20 167. Robins H, Krasnitz M, Barak H, Levine AJ (2005) A relativeentropy algorithm for genomic fingerprinting captures hostphage similarities. J Bacteriol 187:8370–4 168. Roeder I, Glauche I (2006) Towards an understanding of lineage specification in hematopoietic stem cells: a mathematical model for the interaction of transcription factors GATA-1 and PU.1. J Theor Biol 241:852–65 169. Salgado H, Santos-Zavaleta A, Gama-Castro S, Peralta-Gil M, Penaloza-Spinola MI, Martinez-Antonio A, Karp PD, ColladoVides J (2006) The comprehensive updated regulatory network of Escherichia coli K-12. BMC Bioinformatics 7:5 170. 
Samonte RV, Eichler EE (2002) Segmental duplications and the evolution of the primate genome. Nat Rev Genet 3:65–72 171. Sandberg R, Ernberg I (2005) Assessment of tumor characteristic gene expression in cell lines using a tissue similarity index (TSI). Proc Natl Acad Sci USA 102:2052–7 172. Shivdasani RA (2006) MicroRNAs: regulators of gene expression and cell differentiation. Blood 108:3646–53 173. Shmulevich I, Kauffman SA (2004) Activities and sensitivities in boolean network models. Phys Rev Lett 93:048701 174. Shmulevich I, Kauffman SA, Aldana M (2005) Eukaryotic cells are dynamically ordered or critical but not chaotic. Proc Natl Acad Sci USA 102:13439–44 175. Siegal ML, Promislow DE, Bergman A (2007) Functional and evolutionary inference in gene networks: does topology matter? Genetica 129:83–103 176. Smith MC, Sumner ER, Avery SV (2007) Glutathione and Gts1p drive beneficial variability in the cadmium resistances of individual yeast cells. Mol Microbiol 66:699–712 177. Southall TD, Brand AH (2007) Chromatin profiling in model organisms. Brief Funct Genomic Proteomic 6:133–40 178. Southan C (2004) Has the yo-yo stopped? An assessment of human protein-coding gene number. Proteomics 4:1712–26

179. Stern CD (2000) Conrad H. Waddington’s contributions to avian and mammalian development, 1930–1940. Int J Dev Biol 44:15–22 180. Strohman R (1994) Epigenesis: the missing beat in biotechnology? Biotechnology (N Y) 12:156–64 181. Stumpf MP, Wiuf C, May RM (2005) Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci USA 102:4221–4 182. Suzuki M, Yamada T, Kihara-Negishi F, Sakurai T, Hara E, Tenen DG, Hozumi N, Oikawa T (2006) Site-specific DNA methylation by a complex of PU.1 and Dnmt3a/b. Oncogene 25:2477–88 183. Swiers G, Patient R, Loose M (2006) Genetic regulatory networks programming hematopoietic stem cells and erythroid lineage specification. Dev Biol 294:525–40 184. Takahashi K, Yamanaka S (2006) Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126:663–76 185. Tapscott SJ (2005) The circuitry of a master switch: Myod and the regulation of skeletal muscle gene transcription. Development 132:2685–95 186. Taylor JS, Raes J (2004) Duplication and divergence: The evolution of new genes and old ideas. Annu Rev Genet 38: 615–643 187. Teichmann SA, Babu MM (2004) Gene regulatory network growth by duplication. Nat Genet 36:492–6 188. Thieffry D, Huerta AM, Perez-Rueda E, Collado-Vides J (1998) From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli. Bioessays 20:433–40 189. Tinbergen N (1952) Derived activities; their causation, biological significance, origin, and emancipation during evolution. Q Rev Biol 27:1–32 190. Toh H, Horimoto K (2002) Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling. Bioinformatics 18:287–97 191. Trojer P, Reinberg D (2006) Histone lysine demethylases and their impact on epigenetics. Cell 125:213–7 192. 
Tyson JJ, Chen KC, Novak B (2003) Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Curr Opin Cell Biol 15:221–231 193. van Helden J, Wernisch L, Gilbert D, Wodak SJ (2002) Graphbased analysis of metabolic networks. Ernst Schering Res Found Workshop:245–74 194. van Nimwegen E (2003) Scaling laws in the functional content of genomes. Trends Genet 19:479–84 195. Vogel G (2003) Stem cells. ‘Stemness’ genes still elusive. Science 302:371 196. Waddington CH (1940) Organisers and genes. Cambridge University Press, Cambridge 197. Waddington CH (1956) Principles of embryology. Allen and Unwin Ltd, London 198. Waddington CH (1957) The strategy of the genes. Allen and Unwin, London 199. Watts DJ (2004) The “new” science of networks. Ann Rev Sociol 20:243–270 200. Webster G, Goodwin BC (1999) A structuralist approach to morphology. Riv Biol 92:495–8 201. Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, Hochedlinger K, Bernstein BE, Jaenisch R (2007) In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448:318–24

559

560

Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination

202. Whitfield ML, Sherlock G, Saldanha AJ, Murray JI, Ball CA, Alexander KE, Matese JC, Perou CM, Hurt MM, Brown PO, Botstein D (2002) Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell 13:1977–2000 203. Wilkins AS (2007) Colloquium Papers: Between “design” and “bricolage”: Genetic networks, levels of selection, and adaptive evolution. Proc Natl Acad Sci USA 104 Suppl 1:8590–6 204. Wuensche A (1998) Genomic regulation modeled as a network with basins of attraction. Pac Symp Biocomput:89–102 205. Xiong W, Ferrell JE Jr. (2003) A positive-feedback-based bistable ‘memory module’ that governs a cell fate decision. Nature 426:460–465 206. Xu X, Wang L, Ding D (2004) Learning module networks from genome-wide location and expression data. FEBS Lett 578:297–304 207. Yu H, Greenbaum D, Xin Lu H, Zhu X, Gerstein M (2004) Genomic analysis of essentiality within protein networks. Trends Genet 20:227–31 208. Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M (2007) The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol 3:e59 209. Yuh CH, Bolouri H, Davidson EH (2001) Cis-regulatory logic in the endo16 gene: switching from a specification to a differentiation mode of control. Development 128:617–29

Books and Reviews Huang S (2004) Back to the biology in systems biology: what can we learn from biomolecular networks. Brief Funct Genomics Proteomics 2:279–297 Huang S (2007) Cell fates as attractors – stability and flexibility of cellular phenotype. In: Endothelial biomedicine, 1st edn. Cambridge University Press, New York, pp 1761– 1779 Huang S, Ingber DE (2006) A non-genetic basis for cancer progression and metastasis: self-organizing attractors in cell regulatory networks. Breast Dis 26:27–54 Kaneko K (2006) Life: An introduction to complex systems biology, 1edn. Springer, Berlin Kauffman SA (1991) Antichaos and adaptation. Sci Am 265:78–84 Kauffman SA (1993) The origins of order. Oxford University Press, New York Kauffman SA (1996) At home in the universe: the search for the laws of self-organization and complexity. Oxford University Press, New York Laurent M, Kellershohn N (1999) Multistability: a major means of differentiation and evolution in biological systems. Trends Biochem Sci 24:418–422 Wilkins AS (2007) Colloquium papers: Between “design” and “bricolage”: Genetic networks, levels of selection, and adaptive evolution. Proc Natl Acad Sci USA 104 Suppl 1:8590–6

Complexity in Systems Level Biology and Genetics: Statistical Perspectives
DAVID A. STEPHENS
Department of Mathematics and Statistics, McGill University, Montreal, Canada

Article Outline
Glossary
Definition of the Subject
Introduction
Mathematical Representations of the Organizational Hierarchy
Transcriptomics and Functional Genomics
Metabolomics
Future Directions
Bibliography

Glossary
Systems biology – The holistic study of biological structure, function and organization.
Probabilistic graphical model – A probabilistic model defining the relationships between the variables in a model by means of a graph; used to represent the relationships in a biological network or pathway.
MCMC – Markov chain Monte Carlo, a computational method for approximating high-dimensional integrals using Markov chains to sample from probability distributions; commonly used in Bayesian inference.
Microarray – A high-throughput experimental platform for collecting functional gene expression and other genomic data.
Cluster analysis – A statistical method for discovering subgroups in data.
Metabolomics – The study of the metabolic content of tissues.

Definition of the Subject
This chapter identifies the challenges posed to biologists, geneticists and other scientists by advances in technology that have made the observation and study of biological systems increasingly possible. High-throughput platforms have made routine the collection of vast amounts of structural and functional data, have provided insights into the working cell, and have helped to explain the role of genetics in common diseases. Associated with the improvements in technology is the need for statistical procedures that extract the biological information from the available data in a coherent fashion, and perhaps more importantly,

can quantify the certainty with which conclusions can be made. This chapter outlines a biological hierarchy of structures, functions and interactions that can now be observed, and details the statistical procedures that are necessary for analyzing the resulting data. The chapter has five main sections. The first section details the historical connection between statistics and the analysis of biological and genetic data, and summarizes fundamental concepts in biology and genetics. The second section outlines specific mathematical and statistical methods that are useful in the modeling of data arising in bioinformatics. In sections three and four, two particular issues are discussed in detail: functional genomics via microarray analysis, and metabolomics. Section five identifies some future directions for biological research in which statisticians will play a vital role.

Introduction
The observation of biological systems, their processes and interactions, is one of the most important activities in modern science. It has the capacity to provide direct insight into fundamental aspects of biology, genetics and evolution, and indirectly will inform many aspects of public health. Recent advances in technology – high-throughput measurement platforms, imaging – have brought a new era of increasingly precise methods of investigation. In parallel to this, there is an increasingly important focus on statistical methods that allow the information gathered to be processed and synthesized. This chapter outlines key statistical techniques that allow the information gathered to be used in an optimal fashion.
Although its origin is dated rather earlier, the term Systems Biology (see, for example, [1,2,3]) has, since 2000, been used to describe the study of the operation of biological systems using tools from mathematics, statistics and computer science, supplanting computational biology and bioinformatics as an all-encompassing term for quantitative investigation in molecular biology. Most biological systems are hugely complex, involving chemical and mechanical processes operating at different scales. It is important, therefore, that the information gathered is processed coherently, according to self-consistent rules and practices, in the presence of the uncertainty induced by imperfect observation of the underlying system. The most natural framework for coherent processing of information is that of probabilistic modeling.

Statistical Versus Mathematical Modeling
There is a great tradition of mathematical and probabilistic modeling of biology and genetics; see [4] for a thorough review. The mathematization of biology, evolution and heredity began at the end of the nineteenth century and continued through the first half of the twentieth century, by far pre-dating the era of molecular biology and genetics that culminated at the turn of the last millennium with the human genome project. Consequently, the mathematical models of, say, evolutionary processes that were developed by Yule [5] and by Fisher and Wright [6,7,8], and classical models of heredity, could only be experimentally verified and developed many years after their conception. It could also be convincingly argued that, through the work of F. Galton, K. Pearson and R. A. Fisher, modern statistics has its foundation in biology and genetics. In parallel to the statistical and stochastic formulation of models for biological systems, there has been a more recent focus on the construction of deterministic models to describe observed biological phenomena. Such models fall under the broad description Mathematical Biology, and have their roots in applied mathematics and dynamical systems; see, for example, [8,9] for a comprehensive treatment. The distinction between stochastic and deterministic models is important to make, as the objectives and tools used often differ considerably. This chapter restricts attention to stochastic models and the processing of observed data, and thus is perhaps more closely tied to the immediate interests of the scientist, although some of the models utilized will be inspired by mathematical models of the phenomena being observed.

Fundamental Concepts in Biology and Genetics
To facilitate the discussion of statistical methods applied to systems biology, it is necessary to introduce fundamental concepts from molecular biology and genetics; see the classic text [10] for full details. Attention is restricted to eukaryotes: organisms whose cells contain a nucleus in which coding information is encapsulated.
• The cell nucleus is a complex architecture containing several nuclear domains [11] whose organization is not completely understood, but the fundamental activity that occurs within the nucleus is the production and distribution of proteins.
• Deoxyribonucleic acid (DNA) is a long string of nucleotides that encodes biological information, and that is copied or transcribed into ribonucleic acid (RNA), which in turn enables the formation of proteins. Specific segments of the DNA, genes, encode the proteins, although non-coding regions of DNA – for example, promoter regions and transcription factor binding sites – also have important roles. Genetic variation at the nucleotide level, even involving a single nucleotide, can disrupt cellular activity. In humans and most other complex organisms, DNA is arranged into chromosomes, which are duplicated in the process of mitosis. The entire DNA content of an organism is termed the genome.
• Proteins are macromolecules formed by the translation of RNA, comprising amino acids arranged (in primary structure) in a linear fashion into domains with different roles, and physically configured in three-dimensional space. Proteins are responsible for all biological activities that take place in the cell, although a protein may have different roles in different tissues at different times, due to the regulation of transcription.
• Proteins interact with each other in different ways in different contexts, in interaction networks that may be dynamically organized. Genes are also regarded as having indirect interactions, through gene regulatory networks.
• Genetic variation amongst individuals in a population is due to mutation and selection, which can be regarded as stochastic mechanisms. Genetic information in the form of DNA passes from parent to offspring, which propagates genetic variation. Individuals in a population are typically related in evolutionary history. Similarly, proteins can also be thought of as related through evolutionary history.
• Genetic disorders are the result of genetic variation, but the nature of the genetic variation can be large- or small-scale; at the smallest scale, variation in single nucleotides (single nucleotide polymorphisms, SNPs) can contribute to the variation in observed traits.
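The notion of a gene regulatory network above can be made concrete as a directed graph. A minimal sketch follows; the gene names and regulatory edges are invented for illustration and are not taken from the text.

```python
# Minimal sketch: a gene regulatory network as a directed graph.
# Gene names and edges are hypothetical, chosen only to illustrate the structure.
regulates = {
    "gene_a": ["gene_b", "gene_c"],  # gene_a regulates two targets
    "gene_b": ["gene_d"],
    "gene_c": [],
    "gene_d": [],
}

def targets(gene):
    """Genes directly regulated by `gene`."""
    return regulates.get(gene, [])

def regulators(gene):
    """Genes that directly regulate `gene`."""
    return [g for g, ts in regulates.items() if gene in ts]

print(targets("gene_a"))     # ['gene_b', 'gene_c']
print(regulators("gene_d"))  # ['gene_b']
```

Such an adjacency representation is the simplest data structure underlying the network models discussed later in the chapter.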

Broadly, attention is focused on the study of the structure and function of DNA, genes and proteins, and the nature of their interactions. It is useful, if simplistic, to view biological activities in terms of an organizational hierarchy of inter-related chemical reactions at the DNA, protein, nucleus, network and cellular levels. A holistic view of mathematical modeling and statistical inference requires the experimenter to model simultaneously the actions and interactions of all the component features, whilst recognizing that the component features cannot be observed directly, and can only be studied through separate experiments on often widely different platforms. It is the role of the bioinformatician or systems biologist to synthesize the data available from separate experiments in an optimal fashion.

Mathematical Representations of the Organizational Hierarchy
A mathematical representation of a biological system is required that recognizes, first, the complexity of the system; secondly, its potentially temporally changing nature; and thirdly, the inherent uncertainties that are present. It is the last feature that necessitates the use of probabilistic or stochastic modeling. An aphorism commonly ascribed to D. V. Lindley states that "Probability is the language of uncertainty"; probability provides a coherent framework for processing information in the presence of imperfect knowledge, and through the paradigm of Bayesian theory [12] provides the mathematical template for statistical inference and prediction. In the modeling of complex systems, three sorts of uncertainty are typically present:

• Uncertainty of Structure: Imperfect knowledge of the connections between the interacting components is typically present. For example, in a gene regulatory network, it may be possible via the measurement of gene co-expression to establish which genes interact within the network, but it may not be apparent precisely how the organization of regulation operates, that is, which genes regulate the expression of other genes.
• Uncertainty concerning Model Components: In any mathematical or probabilistic model of a biological system, there are model components (differential equations, probability distributions, parameter settings) that must be chosen to facilitate implementation of the model. These components reflect, but are not determined by, structural considerations.
• Uncertainty of Observation: Any experimental procedure carries with it uncertainty induced by the measurement of the underlying system, which is typically subject to random measurement error, or noise. For example, many biological systems rely on imaging technology, and on the extraction of the level of signal of a fluorescent probe, for a representation of the amount of biological material present. In microarray studies (see Sect.
“Microarrays”), comparative hybridization of messenger RNA (mRNA) to a medium is a technique for measuring gene expression that is noisy due to several factors (imaging noise, variation in hybridization) not attributable to a biological cause.

The framework to be built must handle these types of uncertainty, and permit inference about structure and model components.

Models Derived from Differential Equations
A deterministic model reflecting the dynamic relationships often present in biological systems may be based on a system of ordinary differential equations (ODEs)

$$\dot{x}(t) = g(x(t)) \tag{1}$$

where $x(t) = (x_1(t), \ldots, x_d(t))^T$ represents the levels of the $d$ quantities being observed, $\dot{x}(t)$ represents the time derivative, and $g$ is some potentially non-linear system of equations that may be suggested by biological prior knowledge or prior experimentation. The model in Eq. (1) is a classical “Mathematical Biology” model that has been successful in representing forms of organization in many biological systems (see, for example, [13] for general applications). Suppressed in the notation is a dependence on system parameters $\theta$, a $k$-dimensional vector that may be presumed fixed and “tuned” to replicate observed behavior, or estimated from observed data. When data representing a partial observation of the system are available, inferences about $\theta$ can be made, and models defined by ODE systems are of growing interest to statisticians; see, for example, [15,16,17]. Equation (1) can be readily extended to a stochastic differential equation (SDE) system

$$\dot{x}(t) = g(x(t)) + \mathrm{d}z(t) \tag{2}$$
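As a concrete, if toy, illustration of Eqs. (1)–(2), the sketch below integrates a one-dimensional version with an Euler–Maruyama scheme. The drift $g(x) = a - bx$ (constant production, linear degradation) and the noise level are invented for illustration, not taken from the text; setting the noise to zero recovers a simple Euler solver for the deterministic ODE.

```python
import math
import random

def euler_maruyama(g, x0, t_max, dt, sigma, seed=0):
    """Integrate dx = g(x) dt + sigma dW by the Euler-Maruyama scheme.

    With sigma = 0 this reduces to forward Euler for the deterministic
    ODE of Eq. (1).
    """
    rng = random.Random(seed)
    x, t = x0, 0.0
    path = [(t, x)]
    while t < t_max:
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
        x = x + g(x) * dt + sigma * dW
        t += dt
        path.append((t, x))
    return path

# Toy drift: constant production a, linear degradation b * x.
a, b = 2.0, 0.5
g = lambda x: a - b * x

ode_path = euler_maruyama(g, x0=0.0, t_max=20.0, dt=0.01, sigma=0.0)
sde_path = euler_maruyama(g, x0=0.0, t_max=20.0, dt=0.01, sigma=0.3)

# The deterministic path settles near the fixed point a / b = 4.
print(round(ode_path[-1][1], 2))  # → 4.0
```

The stochastic path fluctuates around the same fixed point; inference for the parameters $(a, b, \sigma)$ from a partially observed path is exactly the kind of problem referred to in the surrounding text.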

where $z(t)$ is some stochastic process that renders the solution to Eq. (2) a stochastic process (see, for example, [18] for a comprehensive recent summary of modeling approaches and inference procedures, and a specific application in [19]). The final term $\mathrm{d}z(t)$ represents the infinitesimal stochastic increment in $z(t)$. Such models, although particularly useful for modeling activity at the molecular level, often rely on simplifying assumptions (linearity of $g$, Gaussianity of $z$) and on the relationship structure captured by $g$ being known. Inference for the parameters of the system can be made, but in general requires advanced computational methods (Monte Carlo (MC) and Markov chain Monte Carlo (MCMC)).

Probabilistic Graphical Models
A simple and often directly implementable approach is based on a probabilistic graphical model, comprising a graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ described by a series of nodes $\mathcal{N}$ and edges $\mathcal{E}$, and a collection of random variables $X = (X_1, \ldots, X_d)^T$ placed at the nodes, all of which may be dynamically changing. See, for example, [20] for a recent summary, [14] for mathematical details and [22] for a biological application. The objective of constructing such a model is to identify the joint probability structure of $X$ given the graph $\mathcal{G}$, which is possibly parametrized by parameters $\theta$: $f_X(x \mid \theta, \mathcal{G})$. In many applications, $X$ is not directly observed, but is instead inferred from observed data, $Y$, arising as noisy observations derived from $X$. Again, a $k$-dimensional parameter vector $\phi$ helps to characterize the
stochastic dependence of $Y$ on $X$ by parametrizing the conditional probability density $f_{Y|X}(y \mid x, \phi)$. The joint probability model encapsulating the probabilistic structure of the model is

$$f_{X,Y}(x, y \mid \theta, \phi, \mathcal{G}) = f_X(x \mid \theta, \mathcal{G})\, f_{Y|X}(y \mid x, \phi) \tag{3}$$

The objectives of inference are to learn about $\mathcal{G}$ (the uncertain structural component) and the parameters $(\theta, \phi)$ (the uncertain model parameters and observation components). The graph structure $\mathcal{G}$ is described by $\mathcal{N}$ and $\mathcal{E}$. In holistic models, $\mathcal{G}$ represents the interconnections between interacting modules (genomic modules, transcription modules, regulatory modules, proteomic modules, metabolic modules, etc.) and also the interconnections within modules, in the form of subgraphs. The nodes $\mathcal{N}$ (and hence $X$) represent influential variables in the model structure, and the edges $\mathcal{E}$ represent dependencies. The edge connecting two nodes, if present, may be directed or undirected according to the nature of the influence; a directed edge indicates the direction of causation, an undirected edge indicates a dependence. Causality is a concept distinct from dependence (association, co-variation or correlation), and represents the influence of one node on one or more other nodes (see, for example, [23] for a recent discussion of the distinction with examples, and [24,25] for early influential papers discussing how functional dependence may be learned from real data). A simple causal relationship between three variables $X_1, X_2, X_3$ can be represented

$$X_1 \longrightarrow X_2 \longrightarrow X_3$$

which encodes a conditional independence relationship between $X_1$ and $X_3$ given $X_2$, and a factorization of the joint distribution

$$p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2),$$

whereas the graphs for conditional independence of $X_2$ and $X_3$ given $X_1$ are

$$X_2 \longleftarrow X_1 \longrightarrow X_3 \tag{4}$$

encoding

$$p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1).$$

Similarly, the equivalent graphs

$$X_1 \longrightarrow X_3 \longleftarrow X_2$$

encode

$$p(x_1, x_2, x_3) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1, x_2).$$

Such simple model assumptions are the building blocks for the construction of highly complex graphical representations of biological systems. There is an important difference between analysis based purely on simultaneous observation of all components of the system, which can typically only yield inference on dependencies (say, covariances measured in the joint probability model $p(x)$; see, for example, [26,27,28]), and analysis based on interventions – genomic knock-out experiments, chemical or biological challenges, transcriptional/translational perturbation such as RNA interference (RNAi) – that may yield information on causal links; see, for example, [29,30].

Bayesian Statistical Inference
Given a statistical model for observed data such as Eq. (3), inference for the parameters $(\theta, \phi)$ and the graph structure $\mathcal{G}$ is required. The optimal coherent framework is that of Bayesian statistical inference (see, for example, [31]), which requires computation of the posterior distribution for the unknown (or unobservable) quantities, given by

$$\pi(\theta, \phi, \mathcal{G} \mid x, y) \propto f_{X,Y}(x, y \mid \theta, \phi, \mathcal{G})\, p(\theta, \phi, \mathcal{G}) = L(\theta, \phi, \mathcal{G} \mid x, y)\, p(\theta, \phi, \mathcal{G}) \tag{5}$$

a probability distribution from which can be computed parameter estimates with associated uncertainties, and predictions from the model. The terms L(; ; G jx; y) and p(; ; G ) are termed likelihood and prior probability distribution respectively. The likelihood reflects the observed data, and the prior distribution encapsulates biological prior knowledge about the system under study. If the graph structure is known in advance, the prior distribution for that component can be set to be degenerate. If, as in many cases of probabilistic graphical models, the x are unobserved, then the posterior distribution incorporates

Complexity in Systems Level Biology and Genetics: Statistical Perspectives

them also,

π(θ, φ, G, x | y) ∝ f_{Y|X}(y | x, φ) f_X(x | θ, G) p(θ, φ, G) ,   (6)

yielding a latent or state-space model, otherwise interpreted as a missing data model. The likelihood and prior can often be formulated in a hierarchical fashion to reflect believed causal or conditional independence structures. If a graph G is separable into two sub-graphs G1, G2 conditional on a connecting node, similar to the graph in Eq. (4), then the probability model also factorizes in a similar fashion; for example, X1 might represent the amount of expressed mRNA of a gene that regulates two separate functional modules, and X2 and X3 might be the levels of expression of collections of related proteins. The hierarchical specification also extends to parameters in probability models; a standard formulation of a Bayesian hierarchical model involves specification of conditional independence structures at multiple levels within a graph. The following three-level hierarchical model relates data Y = (Y1, ..., Yp) at level 1, to a population of parameters θ = (θ1, ..., θp)^T at level 2, to hyperparameters ψ at level 3:

Level 3:  ψ
Level 2:  θ1, θ2, ..., θp
Level 1:  Y1, Y2, ..., Yp

(each Yi is linked to its θi, and each θi to ψ), yielding the factorization of the Bayesian full joint distribution as

π(y, θ, ψ) = p(ψ) { ∏_{i=1}^p p(θi | ψ) } { ∏_{i=1}^p p(Yi | θi) } .   (7)

Bayesian Computation

The posterior distribution is, potentially, a high-dimensional multivariate function on a complicated parameter space. The proportionality constant in Eq. (5) takes the form

f_{X,Y}(x, y) = ∫ f_{X,Y}(x, y | θ, φ, G) p(θ, φ, G) dθ dφ dG   (8)

and in Eq. (6) takes the form

f_Y(y) = ∫ f_{X,Y}(x, y | θ, φ, G) p(θ, φ, G) dθ dφ dG dx ,   (9)

and is termed the marginal likelihood or prior predictive distribution for the observable quantities x and y. In formal Bayesian theory, it is the representation of the distribution of the observable quantities through the paradigm of exchangeability that justifies the decomposition in Eq. (8) into likelihood and prior, and justifies, via asymptotic arguments, the use of the posterior distribution for inference (see Chaps. 1–4 in [13] for full details). It is evident from these equations that exact computation of the posterior distribution necessitates high-dimensional integration, and in many cases this cannot be carried out analytically.

Numerical Integration Approaches

Classical numerical integration methods, or analytic approximation methods, are suitable only in low dimensions. Stochastic numerical integration, for example Monte Carlo integration, approximates expectations by using empirical averages of functionals of samples obtained from the target distribution; for a probability distribution π(x), the approximation of E_π[g(X)],

E_π[g(X)] = ∫ g(x) π(x) dx < ∞ ,

is achieved by randomly sampling x1, ..., xN (N large) from π(·), and using the estimate

Ê_π[g(X)] = (1/N) Σ_{i=1}^N g(xi) .

An adaptation of the Monte Carlo method can be used if the functions g and π are not "similar" (in the sense that g is large in magnitude where π is not, and vice versa); importance sampling uses the representation

E_π[g(X)] = ∫ g(x) π(x) dx = ∫ [g(x) π(x) / p(x)] p(x) dx

for some pdf p(·) having common support with π, and constructs an estimate from a sample x1, ..., xN from p(·) of the form

Ê_π[g(X)] = (1/N) Σ_{i=1}^N g(xi) π(xi) / p(xi) .

Under standard regularity conditions, the corresponding estimators converge to the required expectation. Further extensions are also useful:

• Sequential Monte Carlo: Sequential Monte Carlo (SMC) is an adaptive procedure that constructs a sequence of improving importance sampling distributions. SMC is especially useful for inference problems where data are collected sequentially in time, but is also used in standard Monte Carlo problems (see [32]).
• Quasi Monte Carlo: Quasi Monte Carlo (QMC) utilizes uniform but non-random samples to approximate the required expectations. It can be shown that QMC can produce estimators with lower variance than standard Monte Carlo.

Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a stochastic Monte Carlo method for sampling from a high-dimensional probability distribution π(x), and using the samples to approximate expectations with respect to that distribution. An ergodic, discrete-time Markov chain is defined on the support of π in such a way that the stationary distribution of the chain exists and is equal to π. Dependent samples from π are obtained by collecting realized values of the chain after it has reached its stationary phase, and these are then used as the basis of a Monte Carlo strategy. The most common MCMC algorithm is the Metropolis–Hastings algorithm, which proceeds as follows. If the state of the d-dimensional chain {X_t} at iteration t is given by X_t = u, then a candidate state v is generated from conditional density q(u, v) = q(v | u), and accepted as the new state of the chain (that is, X_{t+1} = v) with probability α(u, v) given by

α(u, v) = min{ 1, [π(v) q(v, u)] / [π(u) q(u, v)] } .

A common MCMC approach involves using a Gibbs sampler strategy that performs iterative sampling, with updating from the collection of full conditional distributions

π(x_j | x_{(j)}) = π(x_j | x_1, ..., x_{j−1}, x_{j+1}, ..., x_d) = π(x_1, ..., x_d) / π(x_1, ..., x_{j−1}, x_{j+1}, ..., x_d) ,   j = 1, ..., d ,

rather than updating the components of x simultaneously. There is a vast literature on MCMC theory and applications; see [33,34] for comprehensive treatments. MCMC re-focuses inferential interest from computing posterior analytic functional forms to producing posterior samples. It is an extremely flexible framework for computational inference that carries with it certain well-documented problems, most important amongst them being the assessment of convergence. It is not always straightforward to assess when the Markov chain has reached its stationary phase, so certain monitoring steps are usually carried out.

Bayesian Modeling: Examples

Three models that are especially useful in the modeling of systems biological data are regression models, mixture models, and state-space models. Brief details of each type of model follow.

Regression Models

Linear regression models relate an observed response variable Y to a collection of predictor variables X1, X2, ..., Xd via the model for the ith response

Y_i = β_0 + Σ_{j=1}^d β_j X_{ij} + ε_i = X_i^T β + ε_i ,

say, or in vector form, for Y = (Y1, ..., Yn)^T,

Y = X β + ε ,

where β = (β_0, β_1, ..., β_d)^T is a vector of real-valued parameters, and ε is a vector random variable with zero mean and variance–covariance matrix Σ. The objective of the analysis is to make inference about β, to understand the influence of the predictors on the response, and to perform prediction for Y. The linear regression model (or General Linear Model) is extremely flexible: the design matrix X can be formed from arbitrary, possibly non-linear, basis functions of the predictor variables. By introducing a covariance structure into Σ, it is possible to allow for dependence amongst the components of Y, and for the possibility of modeling repeated-measures, longitudinal or time-series data that might arise from multiple observation of the same experimental units. An extension that is often also useful is to random effect or mixed models that take into account any repeated-measures aspect of the recorded data. If the data on an individual (person, sample, gene, etc.) are Y_i = (Y_{i1}, ..., Y_{id})^T, then

Y_i = X β + Z U_i + ε_i ,   (10)

where Z is a d × p constant design matrix, and U_i is a p × 1 vector of random effects specific to individual i. Typically the random effect vectors are assumed to be drawn from a common population. Similar formulations can be used to construct semi-parametric models that are useful for flexible modeling in regression.
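The Metropolis–Hastings recipe described above can be made concrete on the regression model just introduced. The sketch below samples the posterior of a single regression coefficient with known error variance; the data, the diffuse normal prior and the proposal width are all illustrative assumptions, not prescriptions from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from y = beta*x + eps with true beta = 2 and known sigma = 1
# (invented for illustration).
true_beta, sigma = 2.0, 1.0
x = rng.normal(size=100)
y = true_beta * x + rng.normal(scale=sigma, size=100)

def log_post(beta):
    # Log posterior: N(0, 10^2) prior (an illustrative choice) plus Gaussian likelihood.
    return -0.5 * (beta / 10.0) ** 2 - 0.5 * np.sum((y - beta * x) ** 2) / sigma ** 2

# Random-walk Metropolis-Hastings: the proposal is symmetric, q(u,v) = q(v,u),
# so the acceptance probability reduces to min{1, pi(v)/pi(u)}.
beta_cur, samples = 0.0, []
for t in range(5000):
    beta_prop = beta_cur + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_post(beta_prop) - log_post(beta_cur):
        beta_cur = beta_prop          # accept the candidate state
    samples.append(beta_cur)          # otherwise the chain stays at beta_cur

posterior_mean = float(np.mean(samples[1000:]))   # discard burn-in
```

Monitoring the discarded burn-in portion and the acceptance rate corresponds to the convergence assessment discussed above.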


Mixture Models

A mixture model presumes that the probability distribution of variable Y can be written

f_{Y|θ}(y | θ) = Σ_{k=1}^K ω_k f_k(y | θ_k) ,   (11)

where f_1, f_2, ..., f_K are distinct component densities indexed by parameters θ_1, ..., θ_K, and for all k, 0 < ω_k < 1, with

Σ_{k=1}^K ω_k = 1 .
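As a toy illustration of Eq. (11), the following sketch evaluates and samples a two-component Gaussian mixture; the weights, means and standard deviations are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-component Gaussian mixture: weights w_k, means mu_k, sds sd_k (illustrative values).
w = np.array([0.3, 0.7])
mu = np.array([-2.0, 3.0])
sd = np.array([1.0, 0.5])

def density(y):
    # f(y) = sum_k w_k f_k(y | theta_k), each f_k a univariate normal density.
    comps = np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return float(np.sum(w * comps))

def sample(n):
    # Draw a component label k with probability w_k, then y from component k.
    k = rng.choice(2, size=n, p=w)
    return rng.normal(mu[k], sd[k])

ys = sample(10000)
```

The sampling step makes the clustering interpretation explicit: each draw carries a latent component label, exactly the structure exploited in the cluster-analysis uses of mixtures discussed below.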

The model can be interpreted as one that specifies that, with probability ω_k, Y is drawn from density f_k, for k = 1, ..., K. Hence the model is suitable for modeling in cluster analysis problems. This model can be extended to an infinite mixture model, which has close links with Bayesian non-parametric modeling. A simple infinite mixture/Bayesian non-parametric model is the mixture of Dirichlet processes (MDP) model [35,36]: for parameter α > 0 and distribution function F_0, an MDP model can be specified using the following hierarchical specification: for a sample of size n, we have

Y_i | θ_i ∼ f_{Y|θ}(y | θ_i) ,   i = 1, ..., n
θ_1, ..., θ_n ∼ DP(α, F_0) ,

where DP(α, F_0) denotes a Dirichlet process. The DP(α, F_0) model may be sampled to produce θ_1, θ_2, ..., θ_n using the Polya-urn scheme

θ_1 ∼ F_0
θ_k | θ_1, ..., θ_{k−1} ∼ [α/(α + k − 1)] F_0 + [1/(α + k − 1)] Σ_{j=1}^{k−1} δ_{θ_j} ,

where δ_x is a point mass at x. For θ_k, conditional on θ_1, ..., θ_{k−1}, the Polya-urn scheme either samples θ_k from F_0 (with probability α/(α + k − 1)), or samples θ_k = θ_j for some j = 1, ..., k − 1 (with probability 1/(α + k − 1)). This model therefore induces clustering amongst the θ values, and hence has a structure similar to the finite mixture model: the distinct values of θ_1, ..., θ_n are identified as the cluster "centers" that index the component densities in the mixture model in Eq. (11). The degree of clustering is determined by α; high values of α encourage large numbers of clusters. The MDP model is a flexible model for statistical inference, and is used in a wide range of applications such as density estimation, cluster analysis, functional data analysis and survival analysis. The component densities can be univariate or multivariate, and the model itself can be used to represent the variability in observed data or as a prior density. Inference for such models is typically carried out using MCMC or SMC methods [32,33]. For applications in bioinformatics and functional genomics, see [37,38].

State-Space Models

A state-space model is specified through a pair of equations that relate a collection of states, X_t, to observations Y_t, representing a system and how that system develops over time. For example, the relationship could be modelled as

Y_t = f(X_t, U_t)

X_{t+1} = g(X_t, V_t) ,

where f and g are vector-valued functions, and (U_t, V_t) are random error terms. A linear state-space model takes the form

Y_t = A_t X_t + c_t + U_t
X_{t+1} = B_t X_t + d_t + V_t

for deterministic matrices A_t and B_t and vectors c_t and d_t. The X_t represent the values of unobserved states, and the second equation represents the evolution of these states through time (see [39]). State-space models can be used as models for scalar, vector and matrix-valued quantities. One application is the evolution of a covariance structure, for example, representing dependencies in a biological network. If the network is dynamically changing through time, a model similar to those above is required, but where X_t is a square, positive-definite matrix. For such a network, therefore, a probabilistic model for positive-definite matrices can be constructed from the Wishart/Inverse Wishart distributions [40]. For example, we may have, for t = 1, 2, ...,

Y_t ∼ Normal(0, X_t)
X_{t+1} ∼ InverseWishart(ν_t, X_t) ,

where the degrees of freedom parameter ν_t is chosen to induce desirable properties (stationarity, constant expectation, etc.) in the sequence of X_t matrices.

Transcriptomics and Functional Genomics

A key objective in the study of biological organization is to understand the mechanisms of the transcription of genomic DNA into mRNA that initiates the production of proteins, and hence lies at the center of the functioning of the nuclear engine. In a cell in a particular tissue at a particular time, the nucleus contains the entire mRNA profile (transcriptome) which, if it could be measured, would


provide direct insight into the functioning of the cell. If this profile could be measured in a dynamic fashion, then the patterns of gene regulation for one, several or many genes could be studied. Broadly, if a gene is "active" at any time point, it is producing mRNA transcripts, sometimes at a high rate, sometimes at a lower rate, and understanding the relationships between patterns of up- and down-regulation lies at the heart of uncovering pathways, or networks of interacting genes. Transcriptomics is the study of the entirety of recorded transcripts for a given genome in a given condition. Functional genomics, broadly, is the study of gene function via measured expression levels and how it relates to genome structure and protein expression.

Microarrays

A common biological problem is to detect differential expression levels of a gene in two or more tissue or cell types, as any differences may contribute to the understanding of the cellular organization (pathways, regulatory networks), or may provide a mechanism for discrimination between future unlabeled samples. An important tool for the analysis of these aspects of gene function is the microarray, a medium onto which DNA fragments (or probes) are placed or etched. Test sample mRNA fragments are tagged with a fluorescent marker, and then allowed to bond or hybridize with the matching DNA probes specific to that nucleotide sequence, according to the usual biochemical bonding process. The microarray thus produces a measurement of the mRNA content of the test sample for each of the large number of DNA sequences bound to the microarray as probes. Microarrays typically now contain tens of thousands of probes for simultaneous investigation of gene expression in whole chromosomes, or even whole genomes for simple organisms. The hybridization experiments are carried out under strict protocols, and every effort is made to regularize the production procedures, from the preparation stage through to imaging.
Typically, replicate experiments are carried out. Microarray experiments have made the study of gene expression routine; instantaneous measurements of mRNA levels for large numbers of different genes can be obtained for different tissue or cell types in a matter of hours. The most important aspects of a statistical analysis of gene expression data are, therefore, twofold: the analysis should be readily implementable for large data sets (large numbers of genes, and/or large numbers of samples), and should give representative, robust and reliable results over a wide range of experiments. Since their initial use as experimental platforms, microarrays have become increasingly sophisticated, allowing measurement of different important functional aspects. Arrays containing whole genomes of organisms can be used for investigation of function, copy-number variation, SNP variation, deletion/insertion sites and other forms of DNA sequence variation (see [41] for a recent summary). High-throughput technologies similar in form to printed arrays are now at the center of transcriptome investigation in several different organisms, and are also widely used for genome-wide investigation of common diseases in humans [42,43]. The statistical analysis of such data represents a major computational challenge. In the list below, a description of first and second generation microarrays is given.

• First Generation Microarray Studies: From the mid 1990s, comparative hybridization experiments using microarrays or gene-chips began to be widely used for the investigation of gene expression. The two principal types of array used were cDNA arrays and oligonucleotide arrays.
• cDNA microarrays: In cDNA microarray competitive hybridization experiments, the mRNA levels of genes in a target sample are compared to the mRNA levels of a control sample by attaching fluorescent tags (usually red and green respectively for the two samples) and measuring the relative fluorescence in the two channels. Thus, in a test sample (containing equal amounts of target and control material), differential expression relative to the control is either in terms of up-regulation or down-regulation of the genes in the target sample. Any genes that are up-regulated in the target compared to the control, and hence that have larger amounts of the relevant mRNA, will fluoresce predominantly red, and any that are down-regulated will fluoresce green. Absence of differences in regulation will give equal amounts of red and green, giving a yellow fluor. Relative expression is measured on the log scale

y = log(x_TARGET / x_CONTROL) = log(x_R / x_G) ,   (12)

where x_R and x_G are the fluorescence levels in the RED and GREEN channels respectively.
• Oligonucleotide arrays: The basic concept of oligonucleotide arrays is that the array is produced to interrogate specific target mRNAs or genes by means of a number of oligo probes, usually of length no longer than 25 bases; typically 10–15 probes are used to hybridize to a specific mRNA, with each oligo probe designed to target a specific segment of the mRNA sequence. Hybridization occurs between oligos and test DNA in the usual way. The novel aspect of the oligonucleotide array is the means by which the absolute level of the target mRNA is determined; each perfect match (PM) probe is paired with a mismatch (MM) probe that is identical to the perfect match probe except for the nucleotide in the center of the probe, for which a mismatch nucleotide is substituted, as indicated below.

PM : ATGTATACTATT A TGCCTAGAGTAC
MM : ATGTATACTATT C TGCCTAGAGTAC

The logic is that the target mRNA, which has been fluorescently tagged, will bind perfectly to the PM oligo, and not bind at all to the MM oligo, and hence the absolute amount of the target mRNA present can be obtained as the difference x_PM − x_MM, where x_PM and x_MM are the measurements for the PM and MM oligos respectively.
• Second Generation Microarrays: In the current decade, the number of array platforms has increased greatly. The principle of hybridization of transcripts to probes on a printed array is often still the fundamental biological component, but the design of the new arrays is often radically different. Some of the new types of array are described below (see [44] for a summary).
• ChIP-chip: ChIP-chip (chromatin immunoprecipitation chip) arrays are tiling arrays, with genomic probes systematically covering whole genomes or chromosomes, that are used to relate protein expression to DNA sequence by mapping the binding sites of transcription factors and other DNA-binding proteins. See [45] for an application and details of statistical issues.
• ArrayCGH: Array comparative genome hybridization (ArrayCGH) is another form of tiling array that is used to detect copy number variation (the variation in the numbers of repeated DNA segments) in subgroups of individuals, with the aim of detecting important variations related to common diseases. See [46,47].
• SAGE: Serial Analysis of Gene Expression (SAGE) is a platform for monitoring the patterns of expression of many thousands of transcripts in one sample, which relies on the sequencing of short cDNA tags that correspond to a sequence near one end of every transcript in a tissue sample. See [48,49,50].
• Single Molecule Arrays: Single Molecule Arrays rely on the binding of single mRNA transcripts to

the spots on the array surface, and thus allow for extremely precise measurement of transcript levels; see [51]. Similar technology is used for precise protein measurement and antibody detection. See [52].

Statistical Analysis of Microarray Data

In a microarray experiment, the experimenter has access to expression/expression-profile data, possibly for a number of replicate experiments, for each of a (usually large) number of genes. Conventional statistical analysis techniques and principles (hypothesis testing, significance testing, estimation, simulation methods/Monte Carlo procedures) are used in the analysis of microarray data. The principal biological objectives of a typical microarray analysis are:
• Detection of differential expression: up- or down-regulation of genes in particular experimental contexts, or in particular tissue samples, or cell lines at a given time instant.
• Understanding of temporal aspects of gene regulation: the representation and modeling of patterns of changes in gene regulation over time.
• Discovery of gene clusters: the partitioning of large sets of genes into smaller sets that have common patterns of regulation.
• Inference for gene networks/biological pathways: the analysis of co-regulation of genes, and inference about the biological processes involving many genes concurrently.

There are typically several key issues and models that arise in the analysis of microarray data; such methods are described in detail in [53,54,55,56]. For a Bayesian modeling perspective, see [57].
• Array normalization: Arrays are often imaged under slightly different experimental conditions, and therefore the data are often very different even from replicate to replicate. This is a systematic experimental effect, and therefore needs to be adjusted for in the analysis of differential expression. A misdiagnosis of differential expression may be made purely due to this systematic experimental effect.
• Measurement error: The reported (relative) gene expression levels are in fact only proxies for the true levels of gene expression in the sample. This requires a further level of variability to be incorporated into the model.
• Random effects modeling: It may be necessary to use mixed regression models, where gene-specific random-effects terms are incorporated into the model.


• Multivariate analysis: The covariability of response measurements, in time-course experiments, or between PM and MM measurements in an oligonucleotide array experiment, is best handled using multivariate modeling.
• Testing: One- and two-sample hypothesis testing techniques, based on parametric and non-parametric testing procedures, can be used in the assessment of the presence of differential expression. For detecting more complex (patterns of) differential expression, in more general structured models, the tools of analysis of variance (ANOVA) can be used to identify the chief sources of variability.
• Multiple testing/False discovery: In microarray analysis, a classical statistical analysis using significance testing needs to take into account the fact that a very large number of tests are carried out. Hence significance levels of tests must be chosen to maintain a required family-wise error rate, and to control the false discovery rate.
• Classification: The genetic information contained in a gene expression profile derived from microarray experiments for, say, an individual tissue or tumor type may be sufficient to enable the construction of a classification rule that will enable subsequent classification of new tissue or tumor samples.
• Cluster analysis: Discovery of subsets of sets of genes that have common patterns of regulation can be achieved using the statistical techniques of cluster analysis (see Sect. "Clustering").
• Computer-intensive inference: For many testing and estimation procedures needed for microarray data analysis, simulation-based methods (bootstrap estimation, Monte Carlo and permutation tests, and MCMC) are often necessary, especially when complex Bayesian models are used.
• Data compression/feature extraction: The methods of principal components analysis and extended linear modeling via basis functions can be used to extract the most pertinent features of large microarray data sets.
Experimental design: Statistical experimental design can assist in determining the number of replicates, the number of samples, the choice of time points at which the array data are collected and many other aspects of microarray experiments. In addition, power and sample size assessments can inform the experimenter as to the statistical worth of the microarray experiments that have been carried out.
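The multiple-testing bullet above can be made concrete with the standard Benjamini–Hochberg step-up procedure for controlling the false discovery rate; the p-values in this sketch are fabricated for illustration:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean array marking the p-values rejected at FDR level q
    using the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m        # BH critical values q*i/m
    passed = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])       # largest i with p_(i) <= q*i/m
        reject[order[: k + 1]] = True           # reject that p-value and all smaller ones
    return reject

# Toy example: three strongly "significant" genes among mostly null p-values.
pvals = [0.0001, 0.0004, 0.0019, 0.30, 0.45, 0.60, 0.75, 0.90]
rejected = benjamini_hochberg(pvals, q=0.05)
```

Here the three smallest p-values fall below their BH critical values and are rejected, while the remainder are retained.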

Typically, data derived from both types of microarray are highly noise- and artefact-corrupted. The statistical analysis of such data is therefore quite a challenging process. In many cases, the replicate experiments are very variable. The other main difficulty that arises in the statistical analysis of microarray data is the dimensionality: a vast number of gene expression measurements are available, usually on only a relatively small number of individual observations or samples, and thus it is hard to establish any general distributional models for the expression of a single gene.

Clustering

Cluster analysis is an unsupervised statistical procedure that aims to establish the presence of identifiable subgroups (or clusters) in the data, so that objects belonging to the same cluster resemble each other more closely than objects in different clusters; see [58,59] for comprehensive summaries. In two or three dimensions, clusters can be visualized by plotting the raw data. With more than three dimensions, or in the case of dissimilarity data (see below), analytical assistance is needed. Broadly, clustering algorithms fall into two categories:
• Partitioning Algorithms: A partitioning algorithm divides the data set into K clusters, where K is specified in advance; the algorithm may be run for a range of values of K. Partitioning methods are based on specifying an initial number of groups, and iteratively reallocating observations between groups until some equilibrium is attained. The most famous algorithm is the K-means algorithm, in which the observations are iteratively classified as belonging to one of K groups, with group membership determined by calculating the centroid for each group (the multidimensional version of the mean) and assigning each observation to the group with the closest centroid. The K-means algorithm alternates between calculating the centroids based on the current group memberships, and reassigning observations to groups based on the new centroids. A more robust method uses medoids rather than centroids (that is, medians rather than means in each dimension); more generally, any distance-based allocation algorithm could be used.
• Hierarchical Algorithms: A hierarchical algorithm yields an entire hierarchy of clusterings for the given data set. Agglomerative methods start with each object in the data set in its own cluster, and then successively merge clusters until only one large cluster remains. Divisive methods start by considering the whole data set as one cluster, and then split up clusters until each object is separated. Hierarchical algorithms are discussed in detail in Sect. "Hierarchical Clustering".

Data sets for clustering of N observations can either take the form of an N × p data matrix, where rows contain the different observations and columns contain the different variables, or an N × N dissimilarity matrix, whose (i, j)th element is d_ij, the distance or dissimilarity between observations i and j, which obeys the usual properties of a metric. Typical distance measures between two data points i and j with measurement vectors x_i and x_j are the L1 and L2 Euclidean distances, the grid-based Manhattan distance for discrete variables, and the Hamming distance for binary variables. For ordinal (ordered categorical) or nominal (label) data, other dissimilarities can be defined.

Hierarchical Clustering

Agglomerative hierarchical clustering initially places each of the N items in its own cluster. At the first level, two objects are to be clustered together, and the pair is selected optimally with respect to some objective function, leaving N − 1 clusters, one with two members, the remaining N − 2 each with one. At the next level, the optimal configuration of N − 2 clusters is found by joining two of the existing clusters. This process continues until a single cluster remains containing all N items. At each level of the hierarchy, the merger chosen is the one that leads to the smallest increase in the objective function. Classical versions of the hierarchical agglomeration algorithm are typically used with average, single or complete linkage methods, depending on the nature of the merging mechanism. Such criteria are inherently heuristic, and more formal model-based criteria can also be used. Model-based clustering is based on the assumption that the data are generated by a mixture of underlying probability distributions. Specifically, it is assumed that the population of interest consists of K different sub-populations, and that the density of an observation from the kth sub-population is f_k(y | θ_k) for some unknown vector of parameters θ_k. Model-based clustering is described in more detail in Sect. "Model-Based Hierarchical Clustering".

The principal display plot for a clustering analysis is the dendrogram, which plots all of the individual data objects linked by means of a binary "tree". The dendrogram represents the structure inferred from a hierarchical clustering procedure, which can be used to partition the data into subgroups as required if it is cut at a certain "height" up the tree structure. As with many of the aspects of the clustering procedures described above, it is more of a heuristic graphical representation rather than a formal

inferential summary. However, the dendrogram is readily interpretable, and favored by biologists.

Model-Based Hierarchical Clustering

Another approach to hierarchical clustering is model-based clustering (see for example [60,61]), which is based on the assumption that the data are generated by a mixture of K underlying probability distributions as in Eq. (11). Given the data matrix y = (y_1, ..., y_N)^T, let γ = (γ_1, ..., γ_N) denote the cluster labels, where γ_i = k if the ith data point comes from the kth sub-population. In the classification procedure, the maximum likelihood procedure is used to choose the parameters in the model. Commonly, the assumption is made that the data in the different sub-populations follow multivariate normal distributions, with mean μ_k and covariance matrix Σ_k for cluster k, so that

f_{Y|θ}(y | θ) = Σ_{k=1}^K ω_k f_k(y | μ_k, Σ_k)
             = Σ_{k=1}^K ω_k (2π)^{−p/2} |Σ_k|^{−1/2} exp{ −(1/2) (y − μ_k)^T Σ_k^{−1} (y − μ_k) } ,

where Pr[γ_i = k] = ω_k. If Σ_k = σ² I_p is a p × p matrix, then maximizing the likelihood is the same as minimizing the sum of within-group sums of squares, and corresponds to the case of hyper-spherical clusters with the same variance. Other forms of Σ_k yield clustering methods that are appropriate in different situations. The key to specifying this is the eigendecomposition of Σ_k, given by eigenvalues λ_1, ..., λ_p and eigenvectors v_1, ..., v_p, as in Principal Components Analysis [62]. The eigenvectors of Σ_k specify the orientation of the kth cluster, the largest eigenvalue λ_1 specifies its variance or size, and the ratios of the other eigenvalues to the largest one specify its shape. Further, if Σ_k = σ_k² I_p, the criterion corresponds to hyper-spherical clusters of different sizes, and by fixing the eigenvalue ratios α_j = λ_j/λ_1 for j = 2, 3, ..., p across clusters, other cluster shapes are encouraged.

Model-Based Analysis of Gene Expression Profiles

The clustering problem for vector-valued observations can be formulated using models that represent the gene expression patterns via the extended linear model, that is, a linear model in non-linear basis functions; see, for example, [63,64] for details.
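The equivalence noted above, between classification maximum likelihood under equal spherical covariances and minimizing within-group sums of squares, is exactly what the K-means algorithm optimizes. A minimal sketch, with synthetic two-cluster data invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def kmeans(y, K, iters=50):
    """Plain K-means: alternates assignment to the nearest centroid and centroid
    recomputation, minimizing the within-group sum of squares (the Sigma_k =
    sigma^2 I case of the model-based criterion)."""
    centers = y[rng.choice(len(y), K, replace=False)]   # initialize at random data points
    for _ in range(iters):
        # Assignment step: squared Euclidean distance to each centroid.
        d = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: centroid of each group (keep old center if a group empties).
        centers = np.array([y[labels == k].mean(axis=0) if np.any(labels == k)
                            else centers[k] for k in range(K)])
    return labels, centers

# Two well-separated synthetic "expression profile" clusters in 2 dimensions.
y = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
labels, centers = kmeans(y, K=2)
```

With non-spherical or unequal covariances, the model-based criteria described above generalize this assignment rule through the eigendecomposition of each Σ_k.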



Complexity in Systems Level Biology and Genetics: Statistical Perspectives

Generically, the aim of the statistical model is to capture the behavior of the gene expression ratio $y_t$ as a function of time $t$. The basis of the modeling strategy would be to use models that capture the characteristic behavior of expression profiles likely to be observed under different forms of regulation. A regression framework and model can be adopted. Suppose that $Y_t$ is modeled using a linear model

$$ Y_t = X_t \beta + \varepsilon_t $$

where $X_t$ is (in general) a $1 \times p$ vector of specified functions of $t$, and $\beta$ is a $p \times 1$ parameter vector. In vector representation, the gene expression profile over times $t_1, \ldots, t_T$ can be written $Y = (Y_1, \ldots, Y_T)^T$, with

$$ Y = X\beta + \varepsilon \,. \tag{13} $$

The precise form of the design matrix $X$ will be specified to model the time-variation in signal. Typically the random error terms $\{\varepsilon_t\}$ are taken as independent and identically distributed Normal random variables with variance $\sigma^2$, implying that the conditional distribution of the responses $Y$ is multivariate normal,

$$ Y \mid X, \beta, \sigma^2 \sim N(X\beta, \sigma^2 I_T) \tag{14} $$

where now $X$ is $T \times p$ and $I_T$ is the $T \times T$ identity matrix. In order to characterize the underlying gene expression profile, the parameter vector $\beta$ must be estimated. For this model, the maximum likelihood/ordinary least squares estimates of $\beta$ and $\sigma^2$ are

$$ \hat{\beta}_{ML} = (X^T X)^{-1} X^T y \,, \qquad
\hat{\sigma}^2 = \frac{(y - \hat{y})^T (y - \hat{y})}{T - p} $$

with fitted values $\hat{y} = X \hat{\beta}_{ML} = X (X^T X)^{-1} X^T y$.

Bayesian Analysis in Model-Based Clustering

In a Bayesian analysis of the model in (13), a joint prior distribution $\pi(\beta, \sigma^2)$ is specified for $(\beta, \sigma^2)$, and a posterior distribution conditional on the observed data is computed for the parameters. The calculation proceeds using Eq. (5) (essentially with $G$ fixed):

$$ \pi(\beta, \sigma^2 \mid y, X) =
\frac{L(y; X, \beta, \sigma^2)\, \pi(\beta, \sigma^2)}
{\int L(y; X, \beta, \sigma^2)\, \pi(\beta, \sigma^2)\, d\beta\, d\sigma^2} $$

where $L(y; X, \beta, \sigma^2)$ is the likelihood function. In the linear model context, a conjugate prior specification is used, where

$$ \beta \mid \sigma^2 \sim \text{Normal}(v, \sigma^2 V) \,, \qquad
\sigma^2 \sim \text{IGamma}\!\left(\frac{\alpha}{2}, \frac{\gamma}{2}\right) \tag{15} $$

($v$ is $p \times 1$, $V$ is $p \times p$ positive-definite and symmetric, all other parameters are scalars) and IGamma denotes the inverse Gamma distribution. Using this prior, standard Bayesian calculations show that, conditional on the data,

$$ \beta \mid y, \sigma^2 \sim \text{Normal}(v^*, \sigma^2 V^*) \,, \qquad
\sigma^2 \mid y \sim \text{IGamma}\!\left(\frac{T + \alpha}{2}, \frac{c^* + \gamma}{2}\right) \tag{16} $$

where

$$ V^* = (X^T X + V^{-1})^{-1} $$
$$ v^* = (X^T X + V^{-1})^{-1} (X^T y + V^{-1} v) $$
$$ c^* = y^T y + v^T V^{-1} v - (X^T y + V^{-1} v)^T (X^T X + V^{-1})^{-1} (X^T y + V^{-1} v) \,. \tag{17} $$

In regression modeling, it is usual to consider a centered parametrization for $\beta$ so that $v = 0$, giving

$$ v^* = (X^T X + V^{-1})^{-1} X^T y $$
$$ c^* = y^T y - y^T X (X^T X + V^{-1})^{-1} X^T y
      = y^T \left( I_T - X (X^T X + V^{-1})^{-1} X^T \right) y \,. \tag{18} $$

A critical quantity in a Bayesian clustering procedure is the marginal likelihood, as in Eq. (8), for the data in light of the model:

$$ f_Y(y) = \int\!\!\int f_{Y \mid \beta, \sigma^2}(y \mid \beta, \sigma^2)\,
\pi(\beta \mid \sigma^2)\, \pi(\sigma^2)\, d\beta\, d\sigma^2 \,. $$

Combining the terms above gives

$$ f_Y(y) = \frac{1}{\pi^{T/2}}\,
\frac{\Gamma\!\left(\frac{T + \alpha}{2}\right)}{\Gamma\!\left(\frac{\alpha}{2}\right)}\,
\frac{|V^*|^{1/2}}{|V|^{1/2}}\,
\frac{\gamma^{\alpha/2}}{\{c^* + \gamma\}^{(T + \alpha)/2}} \,. \tag{19} $$

This expression is the marginal likelihood for a single gene expression profile. For a collection of profiles $y_1, \ldots, y_N$ belonging to a single cluster, Eq. (19) can again be evaluated and used as the basis of a dissimilarity measure, as an input into a hierarchical clustering procedure. The marginal likelihood in Eq. (19) can easily be re-expressed for clustered data. The hierarchical clustering method outlined in [64] proceeds by agglomeration of clusters from $N$ to 1, merging the two clusters that lead to the greatest increase in marginal likelihood score at each stage of
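The closed-form marginal likelihood of Eq. (19) is straightforward to evaluate numerically. The following sketch assumes the centered prior ($v = 0$); the design matrix, prior scales and test profile are invented for illustration:

```python
import numpy as np
from math import lgamma, log, pi

def log_marginal_likelihood(y, X, V, alpha, gamma):
    """log f_Y(y) for the conjugate normal/inverse-gamma linear model,
    Eq. (19), with centered prior mean v = 0."""
    T, p = X.shape
    Vinv = np.linalg.inv(V)
    Vstar = np.linalg.inv(X.T @ X + Vinv)
    c = y @ y - y @ X @ Vstar @ X.T @ y            # c* of Eq. (18)
    _, logdet_Vstar = np.linalg.slogdet(Vstar)
    _, logdet_V = np.linalg.slogdet(V)
    return (-0.5 * T * log(pi)
            + lgamma(0.5 * (T + alpha)) - lgamma(0.5 * alpha)
            + 0.5 * (logdet_Vstar - logdet_V)
            + 0.5 * alpha * log(gamma)
            - 0.5 * (T + alpha) * log(c + gamma))

# illustrative quadratic-trend profile observed at T = 10 time points
t = np.linspace(0.0, 1.0, 10)
X = np.column_stack([np.ones_like(t), t, t ** 2])  # T x p design
y = 1.0 + 2.0 * t - 3.0 * t ** 2 + 0.1 * np.sin(20 * t)
lml = log_marginal_likelihood(y, X, V=10.0 * np.eye(3), alpha=1.0, gamma=1.0)
```

In a clustering context this quantity would be evaluated for each candidate merge; profiles (or clusters) with high joint marginal likelihood under a common curve are merged first.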


the hierarchy. This method works for profiles of arbitrary length, potentially with different observation time points; however, it is computationally most efficient when the time points are the same for each profile. The design matrix $X$ is typically expressed via non-linear basis functions, for example truncated polynomial splines, Fourier bases or wavelets. For $T$ large, it is usually necessary to use a projection through a lower number of bases; for example, for a single profile, $X$ becomes $T \times p$ and $\beta$ becomes $p \times 1$, for $T > p$. Using different designs, many flexible models for the expression profiles can be fitted. In some cases, the linear mixed effect formulation in Eq. (10) can be used to construct the spline-based models; in such models, some of the $\beta$ parameters are themselves assumed to be random effects (see [65]).

For example, in harmonic regression, regression on the Fourier bases is carried out. Consider the extended linear model

$$ Y_t = \sum_{j=0}^{p} \beta_j g_j(t) + \varepsilon_t $$

where $g_0(t) = 1$ and

$$ g_j(t) = \begin{cases} \cos(\lambda_j t) & j \text{ odd} \\ \sin(\lambda_j t) & j \text{ even} \end{cases} $$

where $p$ is an even number, $p = 2k$ say, and $\lambda_j$, $j = 1, 2, \ldots, k$, are constants with $\lambda_1 < \lambda_2 < \cdots < \lambda_k$. For fixed $t$, $\cos(\lambda_j t)$ and $\sin(\lambda_j t)$ are also fixed, and this model is a linear model in the parameters $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$. This model can be readily fitted to time-course expression profiles. Figure 1 shows a fit of the model with $k = 2$ to a cluster of profiles extracted, using the method described in [64], from the malaria protozoan Plasmodium falciparum data set described in [66].

Complexity in Systems Level Biology and Genetics: Statistical Perspectives, Figure 1
Cluster of gene expression profiles obtained using Bayesian hierarchical model-based clustering: data from the intraerythrocytic developmental cycle of the protozoan Plasmodium falciparum. Clustering achieved using the harmonic regression model with k = 2. The solid red line is the posterior mean for this cluster, the dotted red lines are pointwise 95% credible intervals for the cluster mean profile, and the dotted blue lines are pointwise 95% credible intervals for the observations.

One major advantage of the Bayesian inferential approach is that any biological prior knowledge that is available can be incorporated in a coherent fashion. For example, the data in Figure 1 illustrate periodic behavior related to the cyclical nature of cellular organization, and thus the choice of the Fourier bases is a natural one.

Choosing the Number of Clusters: Bayesian Information Criterion

A hierarchical clustering procedure gives the sequence by which the clusters are merged (in agglomerative clustering) or split (in divisive clustering) according to the model or distance measure used, but does not give an indication of the number of clusters that are present in the data (under the model specification). This is obviously an important consideration. One advantage of the model-based approach to clustering is that it allows the use of statistical model assessment procedures to assist in the choice of the number of clusters. A common method is to use approximate Bayes factors to compare models of different orders (i.e. models with different numbers of clusters); this gives a systematic means of selecting the parametrization of the model, the clustering method, and also the number of clusters (see [67]). The Bayes factor is the posterior odds for one model against the other, assuming neither is favored a priori. Differences in the Bayesian Information Criterion (BIC) provide a reliable approximation to twice the log Bayes factor; for model $M$ fitted to $n$ data points,

$$ \text{BIC}_M = -2 \log L_M(\hat{\theta}) + d_M \log n $$

where $L_M(\hat{\theta})$ is the maximized likelihood of the data for the model $M$ (so that $-2 \log L_M(\hat{\theta})$ approximates $-2$ times the log of the Bayesian marginal likelihood, as in Eq. (19)), and $d_M$ is the number of parameters estimated in the model. The number of clusters is not considered a parameter for the purposes of computing the BIC. The smaller (more negative) the value of the BIC, the stronger the evidence for the model.
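As a toy illustration of BIC-based model choice, the criterion can be applied to selecting the number of harmonics $k$ in a harmonic regression fit (rather than the number of clusters); the frequencies $\lambda_j = 2\pi j$, the data and the noise level are all invented for the example:

```python
import numpy as np

def harmonic_design(t, k):
    """Design matrix with an intercept plus k cosine/sine pairs,
    using the illustrative frequencies lambda_j = 2*pi*j."""
    cols = [np.ones_like(t)]
    for j in range(1, k + 1):
        cols.append(np.cos(2 * np.pi * j * t))
        cols.append(np.sin(2 * np.pi * j * t))
    return np.column_stack(cols)

def bic_gaussian(y, X):
    """BIC_M = -2 log L_M(theta_hat) + d_M log n for OLS with normal errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    sigma2 = rss / n                       # ML estimate of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    d = p + 1                              # regression coefficients + variance
    return -2.0 * loglik + d * np.log(n)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 48, endpoint=False)
# truth: two harmonics, echoing the k = 2 fit shown in Figure 1
y = 1.0 + np.cos(2 * np.pi * t) + 0.5 * np.sin(4 * np.pi * t) \
    + rng.normal(0, 0.2, t.size)
bics = {k: bic_gaussian(y, harmonic_design(t, k)) for k in range(1, 5)}
best_k = min(bics, key=bics.get)
```

The smallest BIC picks out the generating model order: under-fitting inflates the residual variance, while over-fitting pays the $d_M \log n$ penalty without reducing it.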




Classification via Model-Based Clustering

Any clustering procedure can be used as the first step in the construction of classification rules. Suppose that, on the basis of an appropriate decision procedure, it is known that there are $C$ clusters, and that a set of existing expression profiles $y_1, \ldots, y_N$ have been allocated in turn to the clusters. Let $z_1, \ldots, z_N$ be the cluster allocation labels for the profiles. Now, suppose further that the $C$ clusters can be decomposed into two subsets of sizes $C_0$ and $C_1$, where the subsets represent, perhaps, clusters having some common, known biological function or genomic origin. For example, in a cDNA microarray, it might be known that the clones are distinguishable in terms of the organism from which they were derived. A new objective could be to allocate a novel gene and expression profile to one of the subsets, and to one of the clusters within that subset. Let $y_{ijk}$, for $i = 0, 1$, $j = 1, 2, \ldots, C_i$, $k = 1, 2, \ldots, N_{ij}$, denote the $k$th profile in cluster $j$ in subset $i$. Let $y^*$ denote a new profile to be classified, let $\delta^*$ be the binary classification-to-subset variable, and let $z^*$ be the classification-to-cluster variable for $y^*$. Then, by Bayes' Rule, for $i = 0, 1$,

$$ P(\delta^* = i \mid y^*, y, z) \;\propto\; p(y^* \mid \delta^* = i, y, z)\,
P(\delta^* = i \mid y, z) \,. \tag{20} $$
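Numerically, Eq. (20) is a normalize-the-products computation; in this toy sketch the predictive log-densities and prior subset probabilities are invented for illustration:

```python
import numpy as np

# invented predictive log-densities log p(y* | delta* = i, y, z), i = 0, 1
log_pred = np.array([-12.4, -9.7])
# invented prior allocation probabilities P(delta* = i | y, z),
# e.g. proportional to subset sizes from the clustering output
log_prior = np.log(np.array([0.3, 0.7]))

# Eq. (20): multiply (add on the log scale), then normalize
log_post = log_pred + log_prior
post = np.exp(log_post - log_post.max())   # stabilize before exponentiating
post /= post.sum()                          # P(delta* = i | y*, y, z)
```

Working on the log scale and subtracting the maximum before exponentiating avoids underflow when the predictive densities are tiny, as they typically are for long profiles.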

The two terms in Eq. (20) can be determined on the basis of the clustering output.

Metabolomics

The term metabolome refers to the total metabolite content of an organic sample (tissue, blood, urine, etc.) obtained from a living organism, which represents the products of a higher level of biological interaction than that which occurs within the cell. Metabolomics and metabonomics are the fields of biomedical investigation that combine the application of nuclear magnetic resonance (NMR) spectroscopy with multivariate statistical analysis in studies of the composition of the samples. Metabonomics is often used in reference to the static chemical content of the sample, whereas metabolomics is used to refer to the dynamic evolution of the metabolome. Both involve the measurement of the metabolic response to interventions (see for example [68]), and applications of metabolomics include several in public health and medicine [69,70].

Statistical Methods for Spectral Data

The two principal spectroscopic measurement platforms, NMR and Mass Spectrometry (MS), yield alternative representations of the metabolic spectrum. They produce spectra (or profiles) that consist of several thousands of individual measurements at different resonances or masses. There are several phases of processing of such data: pre-processing using smoothing, alignment and de-noising; peak separation; registration; and signal extraction. For an extensive discussion, see [62]. An NMR spectrum consists of measurements of the intensity or frequency of different biochemical compounds (metabolites), represented by a set of resonances dependent upon the chemical structure, and can be regarded as a linear combination of peaks (nominally of various widths) that correspond to singletons or multiple peaks according to the neighboring chemical environment. A typical spectrum extracted from rat urine is depicted in Fig. 2; see [71]. Two dominant sharp peaks are visible. Features of the spectra that require specific statistical modeling include multiple peaks for a single compound, variation in peak shape, and chemical shifts induced by variation in experimental pH. Signals from different metabolites can be highly overlapped and subject to peak position variation, due primarily to pH variations in the samples, and there are many small scale features (see Fig. 3). Statistical methods of pre-processing NMR spectra for statistical analysis which address the problems outlined above, using, for example, dynamic time warping to achieve alignment of resonance peaks across replicate spectra as a form of spectral registration, form part of the necessary holistic Bayesian framework.

Complexity in Systems Level Biology and Genetics: Statistical Perspectives, Figure 2
A normalized rat urine spectrum. The abscissa is parts per million; the ordinate is intensity after standardization.
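Peak alignment via dynamic time warping, mentioned above, can be illustrated with the textbook dynamic-programming recursion (this is a generic sketch on synthetic Gaussian "peaks", not the specific registration algorithm of the cited work):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping cost between two 1-D signals,
    allowing local stretching of the axis to align shifted peaks."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # match, insertion, or deletion step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# two synthetic "spectra": the same resonance peak, shifted in position
x = np.linspace(0, 1, 100)
s1 = np.exp(-0.5 * ((x - 0.40) / 0.02) ** 2)
s2 = np.exp(-0.5 * ((x - 0.45) / 0.02) ** 2)
d_warp = dtw_distance(s1, s2)          # warping aligns the peaks
d_point = np.abs(s1 - s2).sum()        # rigid pointwise comparison
```

Because the warping path can match the two peaks to each other, the DTW cost is far below the pointwise distance, which penalizes the shift heavily.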


Complexity in Systems Level Biology and Genetics: Statistical Perspectives, Figure 3
Magnified portion of the spectrum showing small scale features

Classical statistical methods for metabolic spectra include the following:

- Principal Components Analysis (PCA) and Regression: a linear data projection method for dimension reduction, feature extraction, and classification of samples in an unsupervised fashion, that is, without reference to labeled cases.
- Partial Least Squares (PLS): a projection method similar to PCA, but implemented in a supervised setting for sample discrimination.
- Clustering: clusters of spectra, or peaks within spectra, can be discovered using similar techniques to those described in Sect. "Clustering".
- Neural Networks: flexible non-linear regression models constructed from simple mathematical functions that are learned from the observation of cases, and well suited to classification. The formulation of a neural network involves three levels of interlinked variables: outputs, inputs, and hidden variables, interpreted as a collection of unobserved random variables that form the hidden link between inputs and outputs.

Bayesian Approaches

The Bayesian framework is a natural one for incorporating genuine biological prior knowledge into the signal reconstruction, and typically useful prior information (about fluid composition, peak location, peak multiplicity) is available. In addition, a hierarchical Bayesian model structure naturally allows construction of plausible models for the spectra across experiments or individuals.

Complexity in Systems Level Biology and Genetics: Statistical Perspectives, Figure 4
Wavelet reconstruction of a region of the spectrum: results under the "Least Asymmetric" wavelet with four vanishing moments, using hard thresholding (HT)
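The hard-thresholding reconstruction of Figure 4 can be sketched in a self-contained way; for simplicity this uses the Haar wavelet rather than the least-asymmetric wavelet of the figure (which would require a wavelet library such as PyWavelets), and it assumes a dyadic-length signal with an invented noise level and threshold:

```python
import numpy as np

def haar_denoise(y, thresh):
    """One full Haar decomposition, hard threshold on the detail
    coefficients, then reconstruction (len(y) must be a power of 2)."""
    coeffs, approx = [], y.astype(float)
    while len(approx) > 1:
        a = (approx[0::2] + approx[1::2]) / np.sqrt(2)
        d = (approx[0::2] - approx[1::2]) / np.sqrt(2)
        coeffs.append(np.where(np.abs(d) > thresh, d, 0.0))  # hard threshold
        approx = a
    # invert the transform level by level
    for d in reversed(coeffs):
        a = approx
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        approx = out
    return approx

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 256)
clean = np.exp(-0.5 * ((x - 0.3) / 0.01) ** 2)   # one sharp "resonance"
noisy = clean + rng.normal(0, 0.05, x.size)
denoised = haar_denoise(noisy, thresh=0.15)       # threshold ~ 3 noise sd
```

Because the orthonormal transform keeps the noise standard deviation unchanged per coefficient, a threshold of a few noise standard deviations removes most noise while retaining the large coefficients that encode the peak.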

- Flexible Bayesian Models: The NMR spectrum can be represented as a noisy signal derived from some underlying and biologically important mechanism. Basis-function approaches (specifically, wavelets) have been much used to represent non-stationary time-varying signals [65,71,72,73]. The sparse representation of the NMR spectrum in terms of wavelet coefficients makes them an excellent tool in data compression, yet these coefficients can still be easily transformed back to the spectral domain to give a natural interpretation in terms of the underlying metabolites. Figure 4 depicts the reconstruction of the rat urine spectrum in the region between 2.5 and 2.8 ppm using wavelet methods; see [71].
- Bayesian Time Series Models for Complex Non-stationary Signals: See for example [74]. The duality between semi-parametric modeling of functions and latent time series models allows a view of the analysis of the underlying NMR spectrum not as a set of pointwise evaluations of a function, but rather as a (time-ordered) series of correlated observations with some identifiable latent structure. Time series models, computed using dynamic calculation (filtering), provide a method for representing the NMR spectra parsimoniously.
- Bayesian Mixture Models: A reasonable generative model for the spectra is one that constructs the spectra from a large number of symmetric peaks of varying size, corresponding to the contributions of different biochemical compounds. This can be approximated using a finite mixture model, where the number, magnitudes and locations of the spectral contributions are unknown. Much recent research has focused on the implementation of computational strategies for Bayesian mixtures; in particular, Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) methods have proved vital. The reconstruction of NMR spectra is a considerably more challenging area than those for which mixture modeling is conventionally used, as many more individual components are required. Flexible semi-parametric mixture models have been utilized in [75,76], whilst fully non-parametric mixture models similar to those described in Sect. "Mixture Models" can also be used [73].

A major advantage of using the fully Bayesian framework is that, once again, all relevant information (the spectral data itself, knowledge of the measurement processes for different experimental platforms, the mechanisms via which multiple peaks and shifts are introduced) can be integrated in a coherent fashion. In addition, prior knowledge about the chemical composition of the samples can be integrated via a prior distribution constructed by inspection of the profiles for training samples. At a higher level of synthesis, the Bayesian paradigm offers a method for integrating metabolomic data with other functional or structural data, such as gene expression or protein expression data. Finally, the metabolic content of tissue changes temporally, so dynamic modeling of the spectra could also be attempted.

Future Directions

Biological data relating to the structure and function of genes, proteins and other biological substances are now available from a wide variety of platforms. Researchers are beginning to develop methods for coherent combination of data from different experimental processes to get an entire picture of biological cause and effect.
For example, the effective combination of gene expression and metabonomic data will be of tremendous utility. A principal challenge is therefore the fusion of expression data derived from different experimental platforms, and the seeking of links with available sequence and ontological information. Such fusion will be critical to the future of statistical analysis of large scale systems biology and bioinformatics data sets. In terms of the public health impact of systems biology and statistical genomics, perhaps the most prominent example is the study of common diseases through high-throughput genotyping of single nucleotide polymorphisms (SNPs). In genome wide association studies, SNP locations that

correlate with disease status or quantitative trait value are sought. In such studies, the key statistical step involves the selection of informative predictors (SNPs or genomic loci) from a large collection of candidates. Many such genome wide studies have been completed or are ongoing (see [42,43,77,78]). Such studies represent huge challenges for statisticians and mathematical modelers, because the data contain many subtle structures and because the amount of information is much greater than that available for typical statistical analysis. Another major challenge to the quantitative analysis of biological data comes in the form of image analysis and extraction. Many high throughput technologies rely on the extraction of information from images, either in static form, or dynamically from a series of images. For example, it is now possible to track the expression level of mRNA transcripts in real time [79,80,81], and to observe mRNA transcripts moving from transcription sites to translation sites (see for example [82]). Imaging techniques can also offer insights into aspects of the dynamic organization of nuclear function by studying the positioning of nuclear compartments and how those compartments reposition themselves in relation to each other through time. The challenges for the statistician are to develop real-time analysis methods for tracking and quantifying the nature and content of such images, and tools from spatial modeling and time series analysis will be required. Finally, flow cytometry can measure characteristics of millions of cells simultaneously, and is a technology that offers many promises for insights into biological organization and public health implications. However, quantitative measurement and analysis methods are still in the early stages of development, but offer much promise (see [83,84]).

Bibliography

1. Kitano H (ed) (2001) Foundations of Systems Biology. MIT Press, Cambridge
2. Kitano H (2002) Computational systems biology. Nature 420(6912):206–210
3. Alon U (2006) An Introduction to Systems Biology. Chapman and Hall, Boca Raton
4. Edwards AWF (2000) Foundations of Mathematical Genetics, 2nd edn. Cambridge University Press, Cambridge
5. Yule GU (1924) A mathematical theory of evolution, based on the conclusions of Dr. J.C. Willis. Philos Trans R Soc Lond Ser B 213:21–87
6. Fisher RA (1922) On the dominance ratio. Proc R Soc Edinburgh 42:321–341
7. Fisher RA (1930) The Genetical Theory of Natural Selection. Clarendon Press, Oxford
8. Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159
9. Murray JD (2002) Mathematical Biology: I. An Introduction. Springer, New York
10. Murray JD (2003) Mathematical Biology: II. Spatial Models and Biomedical Applications. Springer, New York
11. Lewin B (2007) Genes, 9th edn. Jones & Bartlett Publishers, Boston
12. Spector DL (2001) Nuclear domains. J Cell Sci 114(16):2891–2893
13. Bernardo JM, Smith AFM (1994) Bayesian Theory. Wiley, New York
14. Haefner JW (ed) (2005) Modeling Biological Systems: Principles and Applications, 2nd edn. Springer, New York
15. Ramsay JO, Hooker G, Campbell D, Cao J (2007) Parameter estimation for differential equations: a generalized smoothing approach. J R Stat Soc Ser B (Methodology) 69(5):741–796
16. Donnet S, Samson A (2007) Estimation of parameters in incomplete data models defined by dynamical systems. J Stat Plan Inference 137(9):2815–2831
17. Rogers S, Khanin R, Girolami M (2007) Bayesian model-based inference of transcription factor activity. BMC Bioinformatics 8(Suppl 2). doi:10.1186/1471-2105-8-S2-S2
18. Wilkinson DJ (2006) Stochastic Modelling for Systems Biology. Chapman & Hall (CRC), Boca Raton
19. Heron EA, Finkenstädt B, Rand DA (2007) Bayesian inference for dynamic transcriptional regulation; the Hes1 system as a case study. Bioinformatics 23(19):2596–2603
20. Airoldi EM (2007) Getting started in probabilistic graphical models. PLoS Comput Biol 3(12):e252
21. Husmeier D, Dybowski R, Roberts S (eds) (2005) Probabilistic Modelling in Bioinformatics and Medical Informatics. Springer, New York
22. Friedman N (2004) Inferring cellular networks using probabilistic graphical models. Science 303:799–805
23. Opgen-Rhein R, Strimmer K (2007) From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Syst Biol 1:37
24. Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS (2000) Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci USA 97:12182–12186
25. Friedman N, Linial M, Nachman I, Pe'er D (2000) Using Bayesian networks to analyze expression data. J Comput Biol 7:601–620
26. West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA Jr, Marks JR, Nevins JR (2001) Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 98(11):462–467
27. Dobra A, Hans C, Jones B, Nevins J, Yao G, West M (2004) Sparse graphical models for exploring gene expression data. J Multivar Anal 90:196–212
28. Jones B, Carvalho C, Dobra A, Hans C, Carter C, West M (2005) Experiments in stochastic computation for high dimensional graphical models. Stat Sci 20:388–400
29. Markowetz F, Bloch J, Spang R (2005) Non-transcriptional pathway features reconstructed from secondary effects of RNA interference. Bioinformatics 21:4026–4032
30. Eaton D, Murphy KP (2007) Exact Bayesian structure learning from uncertain interventions. Artificial Intelligence & Statistics 2:107–114
31. Robert CP (2007) The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Texts in Statistics, 2nd edn. Springer, New York
32. Doucet A, de Freitas N, Gordon NJ (eds) (2001) Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, New York
33. Robert CP, Casella G (2005) Monte Carlo Statistical Methods. Texts in Statistics, 2nd edn. Springer, New York
34. Gamerman D, Lopes HF (2006) Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Texts in Statistical Science, 2nd edn. Chapman and Hall (CRC), Boca Raton
35. Antoniak CE (1974) Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann Stat 2:1152–1174
36. Escobar MD, West M (1995) Bayesian density estimation and inference using mixtures. J Am Stat Assoc 90(430):577–588
37. Dahl DB (2006) Model-based clustering for expression data via a Dirichlet process mixture model. In: Do KA, Müller P, Vannucci M (eds) Bayesian Inference for Gene Expression and Proteomics. Cambridge University Press, Cambridge, Chap 10
38. Kim S, Tadesse MG, Vannucci M (2006) Variable selection in clustering via Dirichlet process mixture models. Biometrika 93(4):877–893
39. West M, Harrison J (1999) Bayesian Forecasting and Dynamic Models, 2nd edn. Springer, New York
40. Philipov A, Glickman ME (2006) Multivariate stochastic volatility via Wishart processes. J Bus Econ Stat 24(3):313–328
41. Gresham D, Dunham MJ, Botstein D (2008) Comparing whole genomes using DNA microarrays. Nat Rev Genet 9:291–302
42. The Wellcome Trust Case Control Consortium (2007) Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet 39:1329–1337
43. The Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678
44. Liu XS (2007) Getting started in tiling microarray analysis. PLoS Comput Biol 3(10):1842–1844
45. Johnson WE, Li W, Meyer CA, Gottardo R, Carroll JS, Brown M, Liu XS (2006) Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci USA 103(33):12457–12462
46. Freeman JL et al (2006) Copy number variation: New insights in genome diversity. Genome Res 16:949–961
47. Urban AE et al (2006) High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays. Proc Natl Acad Sci USA 103(12):4534–4539
48. Saha S et al (2002) Using the transcriptome to annotate the genome. Nat Biotech 20:508–512
49. Shadeo A et al (2007) Comprehensive serial analysis of gene expression of the cervical transcriptome. BMC Genomics 8:142
50. Robinson SJ, Guenther JD, Lewis CT, Links MG, Parkin IA (2007) Reaping the benefits of SAGE. Methods Mol Biol 406:365–386
51. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309(5740):1567–1569
52. Weiner H, Glökler J, Hultschig C, Büssow K, Walter G (2006) Protein, antibody and small molecule microarrays. In: Müller UR, Nicolau DV (eds) Microarray Technology and Its Applications. Biological and Medical Physics, Biomedical Engineering. Springer, Berlin, pp 279–295
53. Speed TP (ed) (2003) Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC, Boca Raton




54. Parmigiani G, Garett ES, Irizarry RA, Zeger SL (eds) (2003) The Analysis of Gene Expression Data. Statistics for Biology and Health. Springer, New York
55. Wit E, McClure J (2004) Statistics for Microarrays: Design, Analysis and Inference. Wiley, New York
56. Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S (eds) (2005) Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Statistics for Biology and Health. Springer, New York
57. Do KA, Müller P, Vannucci M (2006) Bayesian Inference for Gene Expression and Proteomics. Cambridge University Press, Cambridge
58. Everitt BS, Landau S, Leese M (2001) Cluster Analysis, 4th edn. Hodder Arnold, London
59. Kaufman L, Rousseeuw PJ (2005) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Series in Probability and Statistics, 2nd edn. Wiley, New York
60. Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformation for gene expression data. Bioinformatics 17:977–987
61. McLachlan GJ, Bean RW, Peel D (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18:413–422
62. De Iorio M, Ebbels TMD, Stephens DA (2007) Statistical techniques in metabolic profiling. In: Balding DJ, Bishop M, Cannings C (eds) Handbook of Statistical Genetics, 3rd edn. Wiley, Chichester, Chap 11
63. Heard NA, Holmes CC, Stephens DA, Hand DJ, Dimopoulos G (2005) Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges. Proc Natl Acad Sci USA 102(47):16939–16944
64. Heard NA, Holmes CC, Stephens DA (2006) A quantitative study of gene regulation involved in the immune response of Anopheline mosquitoes: An application of Bayesian hierarchical clustering of curves. J Am Stat Assoc 101(473):18–29
65. Morris JS, Brown PJ, Baggerly KA, Coombes KR (2006) Analysis of mass spectrometry data using Bayesian wavelet-based functional mixed models. In: Do KA, Müller P, Vannucci M (eds) Bayesian Inference for Gene Expression and Proteomics. Cambridge University Press, Cambridge, pp 269–292
66. Bozdech Z, Llinás M, Pulliam BL, Wong ED, Zhu J, DeRisi JL (2003) The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum. PLoS Biol 1(1):E5
67. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795
68. Nicholson JK, Connelly J, Lindon JC, Holmes E (2002) Metabonomics: a platform for studying drug toxicity and gene function. Nat Rev Drug Discov 1:153–161
69. Lindon JC, Nicholson JK, Holmes E, Antti H, Bollard ME, Keun H, Beckonert O, Ebbels TM, Reily MD, Robertson D (2003) Contemporary issues in toxicology: the role of metabonomics in toxicology and its evaluation by the COMET project. Toxicol Appl Pharmacol 187:137
70. Brindle JT, Antti H, Holmes E, Tranter G, Nicholson JK, Bethell HWL, Clarke S, Schofield SM, McKilligin E, Mosedale DE, Grainger DJ (2002) Rapid and noninvasive diagnosis of the presence and severity of coronary heart disease using 1H-NMR-based metabonomics. Nat Med 8:143
71. Yen TJ, Ebbels TMD, De Iorio M, Stephens DA, Richardson S (2008) Analysing real urine spectra with wavelet methods. (in preparation)
72. Brown PJ, Fearn T, Vannucci M (2001) Bayesian wavelet regression on curves with applications to a spectroscopic calibration problem. J Am Stat Assoc 96:398–408
73. Clyde MA, House LL, Wolpert RL (2006) Nonparametric models for proteomic peak identification and quantification. In: Do KA, Müller P, Vannucci M (eds) Bayesian Inference for Gene Expression and Proteomics. Cambridge University Press, Cambridge, pp 293–308
74. West M, Prado R, Krystal A (1999) Evaluation and comparison of EEG traces: Latent structure in non-stationary time series. J Am Stat Assoc 94:1083–1095
75. Ghosh S, Grant DF, Dey DK, Hill DW (2008) A semiparametric modeling approach for the development of metabonomic profile and bio-marker discovery. BMC Bioinformatics 9:38
76. Ghosh S, Dey DK (2008) A unified modeling framework for metabonomic profile development and covariate selection for acute trauma subjects. Stat Med 27(29):3776–3788
77. Duerr RH et al (2006) A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314(5804):1461–1463
78. Sladek R et al (2007) A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445:881–885
79. Longo D, Hasty J (2006) Imaging gene expression: tiny signals make a big noise. Nat Chem Biol 2:181–182
80. Longo D, Hasty J (2006) Dynamics of single-cell gene expression. Mol Syst Biol 2:64
81. Wells AL, Condeelis JS, Singer RH, Zenklusen D (2007) Imaging real-time gene expression in living systems with single-transcript resolution: Image analysis of single mRNA transcripts. CSH Protocols, Cold Spring Harbor
82. Rodriguez AJ, Condeelis JS, Singer RH, Dictenberg JB (2007) Imaging mRNA movement from transcription sites to translation sites. Semin Cell Dev Biol 18(2):202–208
83. Lizard G (2007) Flow cytometry analyses and bioinformatics: Interest in new softwares to optimize novel technologies and to favor the emergence of innovative concepts in cell research. Cytom A 71A:646–647
84. Lo K, Brinkman RR, Gottardo R (2008) Automated gating of flow cytometry data via robust model-based clustering. Cytom Part A 73A(4):321–332

Complex Networks and Graph Theory

GEOFFREY CANRIGHT
Telenor R&I, Fornebu, Norway

Article Outline

Glossary
Definition of the Subject
Introduction
Graphs, Networks, and Complex Networks
Structure of Networks
Dynamical Network Structures
Dynamical Processes on Networks
Graph Visualization
Future Directions
Bibliography

Glossary

Directed/Undirected graph  A set of vertices connected by directed or undirected edges. A directed edge is one-way (A → B), while an undirected edge is two-way, or symmetric: A ↔ B.

Network  For our purposes, a network is defined identically to a graph: it is an abstract object composed of vertices (nodes) joined by (directed or undirected) edges (links). Hence we will use the terms 'graph' and 'network' interchangeably.

Graph topology  The list of nodes i and edges (i, j) or (i → j) defines the topology of the graph.

Graph structure  There is no single agreed definition of what constitutes the "structure" of a graph. To the contrary: this question has been the object of a great deal of research, and that research is still ongoing.

Node degree distribution  One crude measure of a graph's structure. If n_k is the number of nodes having degree k in a graph with N nodes, then the set of n_k is the node degree distribution, which is also often expressed in terms of the frequencies p_k = n_k/N.

Small-worlds graph  A "small-worlds graph" has two properties. First, it has short path lengths (as is typical of random graphs), so that the "world" of the network is truly "small": every node is within a few (or not too many) hops of every other. Second, it has (like real social networks, and unlike random graphs) a significant degree of clustering, meaning that two neighbors of a node have a higher-than-random probability of also being linked to one another.

Graph visualization  The problem of displaying a graph's topology (or part of it) in a 2D image, so as to give the viewer insight into the structure of the graph. This is a hard problem, as it involves both the unsolved problem of what we mean by the structure of a graph, and the combined technological/psychological problem of conveying useful information about a (possibly large) graph via a 2D (or quasi-3D) layout. Clearly, the notion of a good graph visualization depends on the use to which the visualization is to be put; in other words, on the information which is to be conveyed.

Section  Here, a 'bookkeeping' definition. This article introduces the reader to all of the other articles in the Section of the Encyclopedia which is titled "Complex Networks and Graph Theory". Therefore, whenever the word 'Section' (with a capital 'S') is used in this 'roadmap' article, it refers to that Section of the Encyclopedia. To avoid confusion, the various subdivisions of this roadmap article will be called 'parts'.

Definition of the Subject

The basic network concept is very simple: objects, connected by relationships. Because of this simplicity, the concept turns up almost everywhere one looks. The study of networks (or, equivalently, graphs), both theoretical and empirical, has taken off over the last ten years, and shows no sign of slowing down. The field is highly interdisciplinary, with important applications to the Internet and the World Wide Web, to social networks, to epidemiology, to biology, and in many other areas. This introductory article serves as a reader's guide to the 13 articles in the Section of the Encyclopedia which is titled "Complex Networks and Graph Theory". These articles will be discussed in the context of three broad themes: network structure; dynamics of network structure; and dynamical processes running over networks.
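To make the glossary entry concrete: the node degree distribution p_k = n_k/N is straightforward to compute from an edge list. The following sketch uses plain Python on a small made-up graph; the edge list itself is purely illustrative.

```python
from collections import Counter

# Toy undirected graph as an edge list (hypothetical example data).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]

# Node degrees: each undirected edge contributes to both endpoints.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

N = len(degree)                            # number of nodes (here, 5)
n_k = Counter(degree.values())             # n_k: number of nodes with degree k
p_k = {k: n / N for k, n in n_k.items()}   # frequencies p_k = n_k / N
```

Here node 0 has degree 3, nodes 1, 2, and 3 have degree 2, and node 4 has degree 1, so p_3 = 1/5 and p_2 = 3/5; the frequencies sum to one by construction.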
Introduction

In the past ten years or so, the study of graphs has exploded, leaving forever the peaceful sanctum of pure mathematics to become a fundamental concept in a vigorous, ill-defined, interdisciplinary, and important field of study. The most common descriptive term for this field is the "study of complex networks". This term is distinguished from the older, more mathematically bound term "graph theory" in two ways. First, and perhaps most important, this new field is not just theoretical; to the contrary, the curious researcher finds that there is an enormous variety of empirically obtained graphs/networks (we use the terms interchangeably here) available as datasets on the Web. That is, one studies real, measured graphs;
and, not surprisingly, this empirical connection gives the endeavor much more of an applied flavor. The second distinction is perhaps not well motivated by the words used; but the "study of complex networks" typically means studying networks whose structure deviates in important ways from the "classical random graphs" of Erdős and Rényi [10,11]. We note that these two points are related: as one turned to studying real networks [18], one found that they were not described by classical random graphs [24,25]; hence one was forced to look at other kinds of structures. This article is a reader's guide to the other articles in the Section entitled "Complex Networks and Graph Theory". The inclusion of both terms was very deliberate: graph theory gives the mathematical foundation for the messier endeavor called "complex networks", and the two fields have a strong and fruitful interaction. In this article I will describe the 13 other articles of this Section. These 13 articles amply document the interdisciplinary nature of this exciting field: we find represented mathematics, biology, the Web and the Internet, software, epidemiology, and social networks (with the latter field also having its own Section in the Encyclopedia). For this reason, I will not define the parts of this article by field of study, but rather by general themes which run through essentially all studies of networks (at least in principle, and often in fact). In each part of this article I will point out those articles which represent that part's theme to a significant degree. Hence these themes are meant to tie together all of the articles into a simple framework. In the next part, "Graphs, Networks, and Complex Networks", I will concisely present the basic terminology in use. Then, in the part entitled "Structure of Networks", I discuss the knotty question of the structure of a graph; this problem is the first theme, and it is very much unfinished business.
In part “Dynamical Network Structures” we look at network topologies (and structures) which are dynamic rather than static. This is clearly an important theme, since (i) real empirical networks are necessarily dynamic (on some – often short – time scale), and (ii) the study of how networks grow and evolve can be highly useful for understanding how they have the structure we observe today. Then in part “Dynamical Processes on Networks” we look at the very large question of dynamical processes which take place over networks. Examples which illustrate the importance of this topic are epidemic spreading over social and computer networks, and the activation/inhibition processes going on over gene (or neural) networks. Clearly the progress of such dynamical processes may be strongly dependent on the underlying network structure; hence we see that all of these themes
(parts "Structure of Networks" through "Dynamical Processes on Networks") are tightly related. In part "Graph Visualization" we look briefly at an important but somewhat 'orthogonal' theme, namely the problem of graph visualization: given a graph's topology, how can one best present this information visually to a human viewer? Finally, part "Future Directions" offers a very brief, modest, and personal take on the very large question of "future directions" in research on complex networks.

Graphs, Networks, and Complex Networks

Graphs

One of the earliest applications of graph theory may be found in Euler's solution, in 1736, of the problem called 'The seven bridges of Königsberg'. Euler considered the problem of finding a continuous path crossing each of the seven bridges of Königsberg exactly once, and solved the problem by representing each connected piece of land as a vertex (node) of a graph, and each bridge as an undirected (two-way) link. (For a nice presentation of this problem, see http://mathforum.org/isaac/problems/bridges2.html.) This little problem is an excellent example of the power of mathematics to extract understanding via abstraction: the lay person may stare at the bridges, islands, etc., and try various ideas; but reducing the entire problem to an abstract graph, composed only of nodes and links, aids the application of pure reason, leading to a final and utterly convincing solution. To study graphs is to study discrete objects and the relationships between them. Hence graph theory may be regarded as a branch of combinatorics. Erdős and Rényi [10,11] founded the study of "classical random graphs". These graphs are specified by the node number N (which typically is assumed to grow large), and by links laid down at random between these nodes, subject to various constraints. One such constraint is that every node have exactly k links, giving a k-regular random graph.
A more relaxed constraint is simply to specify m links in total (so that the average node degree is ⟨k⟩ = 2m/N). Relaxing further, one may require only that every possible link, out of the set of (N choose 2) = N(N−1)/2 possible links, be included with probability p. This gives the average node degree (now averaged over many graphs) as ⟨k⟩ = 2⟨m⟩/N = p(N−1). While all of these types of classical random graphs are similar "in spirit", those with the fewest constraints are those for which it is easiest to prove things.
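These definitions are easy to check numerically. The sketch below (plain Python; the parameter values are arbitrary) generates a classical G(N, p) random graph and confirms that the mean node degree comes out near p(N − 1):

```python
import random

random.seed(42)  # reproducible sketch

def gnp(N, p):
    """Classical (Erdos-Renyi) random graph G(N, p): each of the
    N(N-1)/2 possible undirected links is included with probability p."""
    edges = []
    for i in range(N):
        for j in range(i + 1, N):
            if random.random() < p:
                edges.append((i, j))
    return edges

N, p = 1000, 0.01
edges = gnp(N, p)
mean_degree = 2 * len(edges) / N   # each link contributes degree to two nodes
# mean_degree should lie close to the expected value p * (N - 1)
```

For these (arbitrary) parameters the expected mean degree is 0.01 × 999 = 9.99, and a single sample fluctuates around that value by a small fraction of a link.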


We are fortunate to have in our Section the article Random Graphs, A Whirlwind Tour of by Fan Chung. In this article, the reader is given a good introduction to classical random graphs, along with a thorough presentation of the more modern theory of random graphs with a new, and more realistic, type of constraint, namely that they should, on average, have a given node degree distribution. This new theory makes the study of random graphs extremely relevant for today's empirically anchored research: many empirical graphs are characterized by their node degree distribution, and many of these in fact have a power-law degree distribution, such that the number n_k of nodes having degree k varies with k as a power law: n_k ∼ k^(−β). This work is useful precisely because a random graph with a given node degree distribution is the "most typical" graph of the set of graphs with that degree distribution. Hence, statements about such random graphs are statements about typical graphs with the same degree distribution, unless and until we know more about the empirical graphs.

Networks

As noted earlier in this introduction, we consider the terms 'network' and 'graph' to be interchangeable. Nevertheless there is a bit more to be said about the term. The 'network' concept motivates and infuses a vigorous and lively research activity that has more or less exploded since (roughly) the work of Watts and Strogatz [24,25]. Much of their work was motivated by the 'small worlds problem'. The latter dates back to the work of Milgram [18] (and even earlier). Milgram posed the question: how far is it from Kansas (or Nebraska) to Boston, Massachusetts, when the distance is measured in 'hops' between people who know one another? Modern language would rephrase this question as follows: is the US acquaintanceship network a 'small world'?
Milgram's answer was 'yes': after disregarding the chains (of letters; that was the mechanism for a 'hop') that never reached the target, the average path length was roughly 5–6 hops: a 'small world'. The explosion of interest in the last 10 years is well documented in the set of references in "Books and Reviews" (below). We also include in this part an introductory discussion of directed graphs. Directed graphs have directed links: it no longer suffices to say "i and j are linked", because there is a directionality in the linking: (i → j) or (j → i) (or both). Some of the mathematical background for understanding directed graphs is provided in the article in this Section by Berman and Shaked-Monderer (Non-negative Matrices and Digraphs). One quickly finds, upon coming into contact with directed graphs, that one is in a rather different world: the mathematics has changed, the structures are different, and one's intuition often fails. Directed graphs are, however, here to stay, and well worth study. We cite two classic examples to demonstrate the extreme relevance of directed graphs. First, there is early and pioneering work by Stuart Kauffman [14,15] on genetic regulatory networks. These form directed graphs, because the links express the fact that gene G1 regulates gene G2 (G1 → G2), a relationship which is by no means symmetric. Understanding gene regulation and expression is a fundamental problem in biology, and we see that the problem may be usefully expressed as one of understanding dynamics on a directed graph. The article in this Section by Huang and Kauffman (Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination) brings us up to date on this exciting problem. A more well-known directed graph, which plays a role for many of us in our daily lives, is the Web graph [1,2,8]. We navigate among Web pages using hyperlinks: one-way pointers taking us, e.g., from page A to page B (A → B). The utility of the Web as a source of information, and as a platform for interaction, is enormous. Hence the Web is well worth intense study, both intellectually and practically. Hyperlinks are useful not only for navigation, but also as an aid in ranking Web pages. This utility was clearly pointed out in two early papers on Web link analysis [7,17]. "Web link analysis" may be viewed most simply as a process which takes (all or part of) the Web graph as input, and which gives 'importance scores' for each Web page as output. The PageRank approach to link analysis of Brin and Page [7] has almost certainly played a significant role in the meteoric rise of the Web search company Google.
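The core PageRank idea, importance scores given by the principal eigenvector of a suitably modified link matrix, can be sketched in a few lines via power iteration. The four-page graph below is a made-up toy example, and the damping factor 0.85 is the value commonly quoted for PageRank; this illustrates the principle only, not any production implementation.

```python
# Hypothetical four-page Web graph: out-links of each page.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = sorted(links)
N = len(pages)
d = 0.85  # damping factor commonly used with PageRank

rank = {p: 1.0 / N for p in pages}
for _ in range(100):  # power iteration toward the principal eigenvector
    new = {p: (1 - d) / N for p in pages}
    for p, outs in links.items():
        share = rank[p] / len(outs)   # each page splits its score evenly
        for q in outs:
            new[q] += d * share       # ...among the pages it points to
    rank = new

# Pages with many (and highly ranked) in-links score highest; here page C,
# which collects links from A, B, and D.
best = max(rank, key=rank.get)
```

Note how the ranking emerges purely from the hyperlink structure: page D, with no in-links at all, receives only the baseline (1 − d)/N score.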
Hence there is both practical and commercial value in understanding the Web graph. We have two complementary papers in this Section which treat this important area: that by Adamic on World Wide Web, Graph Structure, and that by Bjelland et al. on Link Analysis and Web Search.

Complex Networks

Now we come to the last substantial word in the title of this Section: 'complex'. This is a word that (as yet) lacks a precise scientific meaning. There are in fact too many definitions; in other words, there is no generally agreed precise definition. For a good overview of work on this thorny problem we refer the reader to [4].


Fortunately, for the purposes of this Section, we need only a very simple definition: 'complex networks' are those which are not well modeled by classical random graphs (described above, and in the article by Fan Chung). The use of the term 'complex networks' is very widespread, while examples which give a clear definition are less common. We note that the definition given here is also cited by Dorogovtsev in his article in this Section (Growth Models for Networks).

Structure of Networks

We have noted already that there is no single, simple answer to the question "what is the structure of this graph?" With this caveat in mind, we offer the reader a guide to the articles in this Section which address the structure of networks.

Undirected Graphs

We have already noted the article by Fan Chung (Random Graphs, A Whirlwind Tour of), giving an up-to-date overview of the properties of random graphs with a given node degree distribution, the well-studied case being power-law graphs. She cites a number of experimental results which indicate that the experimental exponents β (taken from the power-law degree distributions) fall in one range (2 < β < 3) for social and technological networks, and another, rather distinct range (β < 2) for biological networks. This may be regarded as a (quantitative) structural difference between these two types of network; Chung offers an explanation based on qualitatively distinct growth mechanisms (see the next part of this article). Fortunato and Castellano (Community Structure in Graphs) offer an excellent overview of another broad approach to network structure. Here the idea is to understand structure in terms of substructure (and sub-substructure, etc.). That is: here is a network. Can we identify subgraphs of this network, possibly overlapping, possibly disjoint, that in some sense "belong together"? In other words: how can one identify the community structure of a graph?
The review of Fortunato and Castellano is on the one hand very thorough—and on the other hand makes clear that there is no one agreed answer to this question. There are indeed almost as many answers as there are theoretical approaches; and this problem has received a lot of attention. Fortunato and Castellano note that there is currently a “favorite” approach, defined by finding subgraphs with high modularity [21]. Roughly speaking, a subgraph with high modularity has a higher density of internal links than that found for the same subgraph in a randomized ‘null model’ for the same graph.
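The modularity measure just described can be written down compactly. The sketch below (plain Python, with a made-up six-node graph consisting of two triangles joined by a single bridge edge) computes the standard Newman–Girvan modularity of a given partition:

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a node->community assignment:
    Q = sum over communities c of (e_c/m - (d_c/(2m))^2), where e_c is
    the number of internal edges and d_c the total degree of c."""
    m = len(edges)
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles joined by one bridge edge: a clear two-community split.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
Q = modularity(edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
```

For this toy graph the natural split scores Q = 5/14 ≈ 0.357, while lumping all six nodes into one community gives Q = 0, matching the intuition that the two triangles are the meaningful communities.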

Fortunato and Castellano give a careful discussion of the strengths and weaknesses of this approach to community detection, as well as of many others. Since the work of Watts and Strogatz [24,25], the definition of a ‘small-world graph’ has included two criteria. First, one must have the short average path length which Milgram found in his experiments, and which gives rise to the term ‘small worlds’. However, one also insists that a ‘true’ small-world graph should locally resemble a real social network, in that there is a significant degree of clustering. Roughly speaking, this means that the probability that two of my acquaintances know each other is higher than random. More mathematically, to say that a graph G has high clustering means that the incidence of closed triangles of links in G is higher than expected in a randomized ‘null model’ for G. A closed triangle is a small, simply defined subgraph, and in studying clustering one studies the statistics of the occurrence of this small subgraph. A more general term for this type of small subgraph is a motif [19]. Much as with clustering and triangles, one defines a set of motifs, and then generates a significance profile for any given network, comparing the frequency of each motif in the profile to that of a corresponding random graph. Valverde and Solé ( Motifs in Graphs) offer a stimulating overview of the study of motifs in networks of many types—both directed and undirected. They point out a remarkable consistency of motif significance profiles across networks with very different origins—for example, a software graph and the gene network of a bacterium—and argue that this consistency is best understood in terms of historical accident, rather than in terms of functionality. The article by Liljeros ( Human Sexual Networks) looks at empirical human sexual networks. He addresses the evidence for and against the notion that such networks are power-law (also known as “scale free”). 
This question is important for understanding epidemic spreading over networks—especially in the light of the results of Pastor-Satorras and Vespignani [23], which showed that epidemic spreading on power-law networks is more difficult to stop than was predicted by earlier models using the well-mixed (all-to-all) approximation. Liljeros examines carefully what is known about the structure of human sexual networks, noting the great difficulty inherent in gathering extensive and/or reliable data. The article by He, Siganos, and Faloutsos ( Internet Topology) looks at a very different empirical network, namely the physical Internet, composed (at the lowest level) of routers and cables. The naïve newcomer might assume that the Internet, being an engineered system, is fully mapped out, and hence its topology should be readily
"understood". The article by He et al. presents the reality of the Internet: it is largely self-organized (especially at the higher level of organization, the 'Autonomous System' or AS level); it is far from trivial to experimentally map out this network, even at the AS level; and there is not even agreement on whether or not the AS graph is a power-law graph, which is after all a rather crude measure of the structure of the network. He et al. describe recent work which offers a neat resolution of the conflicting and ambiguous data on this question. They then go on to describe more imaginative models for the structure of the Internet, going beyond simply the degree distribution, and having names like 'Jellyfish' and 'Medusa'.

Directed Graphs

A generic directed graph is immediately distinguished from its undirected counterparts in that a natural unit of substructure is obvious, and virtually always present: the 'strongly connected component' or SCC (termed 'class' by Berman and Shaked-Monderer). That is, even when a directed graph is connected, there are node pairs which cannot reach one another by following directed paths. An SCC C is then a maximal set of nodes satisfying the constraint that every node in C is reachable, via a directed path, from every other node in C. The SCCs form disjoint sets (equivalence classes), and every node is in exactly one SCC. In short, the very notion of 'reachability' is more problematic in directed graphs: all nodes in the same SCC can reach one another, but otherwise, all bets are off! Lada Adamic (World Wide Web, Graph Structure) gives a good overview of what is known empirically about the structure of the Web graph, that abstract and yet very real object in which a node is a Web page, and a link is a (one-way) hyperlink.
The Web graph is highly dynamic, in at least two ways: pages have a finite lifetime, with new ones appearing while old ones disappear; also, many Web pages are dynamic, in that they generate new content when accessed—and can in principle represent an infinite amount of content. Also, of course, the Web is huge. Lada Adamic presents the problems associated with ‘crawling’ the Web to map out its topology. The reader will perhaps not be surprised to learn that the Web graph obeys a power law—both for the indegree distribution and for the outdegree distribution. Adamic discusses several other measures for the structure of the Web graph, including its gross SCC structure (the ‘bow tie’), its diameter, reciprocity, clustering, and motifs. The problem of path lengths and diameter is less straightforward for directed graphs, since the unreachability problem produces infinite or undefined path length for many pairs.
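The strongly connected components discussed above can be found in time linear in the size of the graph; Kosaraju's two-pass depth-first search is one standard method. The sketch below is plain Python on a made-up four-node digraph:

```python
def sccs(nodes, edges):
    """Strongly connected components via Kosaraju's two-pass algorithm."""
    fwd = {n: [] for n in nodes}
    rev = {n: [] for n in nodes}
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    order, seen = [], set()
    def dfs1(n):                       # first pass: record finish order
        seen.add(n)
        for m in fwd[n]:
            if m not in seen:
                dfs1(m)
        order.append(n)
    for n in nodes:
        if n not in seen:
            dfs1(n)

    comps, assigned = [], set()
    def dfs2(n, comp):                 # second pass on the reversed edges
        assigned.add(n)
        comp.add(n)
        for m in rev[n]:
            if m not in assigned:
                dfs2(m, comp)
    for n in reversed(order):
        if n not in assigned:
            comp = set()
            dfs2(n, comp)
            comps.append(comp)
    return comps

# Made-up digraph: {A, B, C} form a directed cycle (one SCC); D is
# reachable from the cycle but cannot reach back, so it is its own SCC.
parts = sccs("ABCD", [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
```

The example makes the reachability point of the text concrete: D can be reached from A, B, and C, but no directed path leads from D back into the cycle, so the two SCCs are {A, B, C} and {D}.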

A nice bonus in the article by Adamic is the discussion of some popular and well-studied subgraphs of the Web graph: query connection graphs, Weblogs, and Wikipedia. Query connection graphs are subgraphs built from a hit list, and can give useful information about the query itself, and about the likelihood that the hit list will be satisfactory. Weblogs and Wikipedia are well known to most Web-literate people; and it is fascinating to see these daily phenomena subjected to careful scientific analysis. The article by Bjelland et al. (Link Analysis and Web Search) has perhaps the closest ties to the mathematical presentation of Berman and Shaked-Monderer (Non-negative Matrices and Digraphs). This is because Web link analysis tends to focus on the principal eigenvector of the graph's adjacency matrix (or of some modification of the adjacency matrix), while Berman and Shaked-Monderer discuss in some detail the spectral properties of this matrix, and give some results about the principal eigenvector. The latter yields importance or authority scores for each page. These scores are the principal output of Web link analysis; and in fact Berman and Shaked-Monderer cite PageRank as an outstanding example of an application of the theory [22]. Bjelland et al. explain the logic leading to the choice of the principal eigenvector, as a way of 'harvesting' information from the huge set of 'collective recommendations' that the hyperlinks constitute. They also present the principal approaches to link analysis (the 'big three'), and place them in a simple logical framework which is completed by the arrival of a new, fourth approach. In addition, Bjelland et al. discuss a number of technical issues related to Web link analysis, which is, after all, a practical and commercial field of endeavor, as well as an object for research and understanding. Jennifer Dunne (Food Webs) presents a rather special type of directed graph taken from biology: the food web.
She offers a very concise definition of this concept in her glossary, which we reproduce here: “the network of feeding interactions among diverse co-occurring species in a particular habitat”. We note that most feeding interactions are one-way: bird B eats insect I, but insect I does not eat bird B. However, food webs are rather special among empirical directed graphs, in that they have a lower incidence of loops than that found in typical directed graphs. Early work indicated (or even assumed) that a food web is loop-free; but this has been shown not to be strictly true. (A simple, but real example of a loop is cannibalism: A eats A; but much longer loops also exist.) The field faces (as many do) a real problem in getting good empirical data, and empirically obtained food webs tend to be small. For example, Table 1 of Dunne presents data for 16 empirical food webs, ranging in size from 25 nodes to 172.


Our second biological application of directed graphs is presented by Huang and Kauffman (Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination). This article presents gene regulatory networks (GRNs). The directed link (G1 → G2) in a GRN expresses the fact that gene G1 regulates (by inhibition or activation) the expression of gene G2, via intermediate proteins (typically transcription factors). The links are thus inherently one-way, although reciprocal links (G1 ↔ G2) do occur. The article of Huang and Kauffman is very comprehensive: the structure of a GRN is only their starting point, as they seek to understand and model the dynamical development process, in which the set of on/off states of the cell genome moves towards an attractor (a steady or quasi-steady state) which represents a mature and stable cell type. Returning to the structure of GRNs, we again face a severe data-extraction problem, which is compounded by the fact that the 'interaction modality' (activation or inhibition, plus or minus) of each link must also be known before one can study the dynamical development of gene expression over the GRN. In short: one needs to know more than just the presence and direction of the links; one needs their type. Huang and Kauffman give a thorough discussion of these problems, and argue for studies using an "ensemble approach". This is much like the random graph approach, in that it takes a set of statistical constraints which are empirically determined, and then studies (often via simulation) a randomly generated ensemble of graphs which satisfy those constraints. In short, the ensemble approach takes the structure as determined by these constraints (node degree distribution, etc.), and then studies random graphs with this structure. Note that both the graph topology and the interaction modalities of the links are randomized in this ensemble approach.
Dynamical Network Structures

We have already had a lot to say about the structure of networks, and yet we have left out one important dimension: time. Empirical networks change over time, so most measurements that map out such a network are taking a snapshot. Now we will look explicitly at studies addressing the dynamical evolution of networks. One classic study is the paper by Barabási and Albert [5]. Here the preferential-attachment (or "rich get richer") model was introduced as an explanation for the ubiquitous power-law degree distributions. In other words, a growth (developmental) model was used to explain properties of snapshots of "mature" networks. In the preferential-attachment model, new nodes which join
a network link to existing nodes with a biased probability distribution—so that the probability of linking to an existing node of degree k is proportional to k. This simple model indeed gives (after long time) a power-law distribution; and the ideas in [5] stimulated a great interest in various growth models for networks. We are fortunate to have, in this Section of the Encyclopedia, the article by Sergey Dorogovtsev entitled  Growth Models for Networks. This article is of course very short compared to the volume [9]—but it offers a good overview of the field, and a thorough updating of that volume, covering a broad range of questions and ideas, including the simple linear preferential attachment model, and numerous variations of it. Also, a distinctly different class of growth models, termed “optimization based models”, is presented, and compared to the class of models involving forms of preferential attachment. In optimization based models, new nodes place links so as to optimize some function of the resulting network properties. We also mention, in the context of growth models, the article by Fan Chung ( Random Graphs, A Whirlwind Tour of). As noted above, she has pointed out a tendency for biological networks to have significantly smaller exponents than technological networks; and she includes a good discussion of both preferential attachment (which tends to give the larger, technological, exponents) and duplication models. The latter involve new nodes “copying” or duplicating (probabilistically) the links of existing nodes. Chung observes that such duplication mechanisms do exist in biology, and also shows that they can give exponents in the appropriate range. 
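The linear preferential-attachment rule can be simulated with a well-known trick: keep a list of every edge endpoint, so that sampling uniformly from that list selects nodes in proportion to their current degree. The following is a minimal plain-Python sketch (one new link per arriving node; the network size is arbitrary):

```python
import random

random.seed(7)  # reproducible sketch

def preferential_attachment(N):
    """Grow a network one node at a time; each new node makes one link,
    choosing its target with probability proportional to current degree.
    Sampling uniformly from the list of all edge endpoints achieves this,
    since a node of degree k appears k times in that list."""
    endpoints = [0, 1]        # start from a single edge 0-1
    edges = [(0, 1)]
    for new in range(2, N):
        target = random.choice(endpoints)   # degree-biased choice
        edges.append((new, target))
        endpoints += [new, target]
    return edges

edges = preferential_attachment(5000)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

# The rich-get-richer bias produces a heavy tail: the best-connected
# node ends up with far more links than the average of roughly 2.
max_deg = max(degree.values())
```

Running this and tabulating the degree distribution (as in the earlier snippet) shows the long tail that the preferential-attachment model is famous for, in contrast to the sharply peaked distribution of a classical random graph with the same mean degree.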
Huang and Kauffman ( Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination) also offer a limited discussion of evolution of the genome—not so much as a topic in itself, but because (again) understanding genome evolution can help in understanding—and even determining—the structure of today’s genome. They mention both duplication and preferential attachment. Also, it is quite interesting to note the parallels between their discussion and that of Valverde and Solé ( Motifs in Graphs)—in that Huang and Kauffman also argue that many observed structures may be due, not to selection pressure or enhanced functionality, but simply to physical constraints on the evolutionary process, and/or to historical accident. The duplication mechanism in fact turns up again in the article by Valverde and Solé, who argue that this growth mechanism can largely account for the high frequency of occurrence of some motifs. Finally, we note that growth is not the only dynamical process operative in networks. Just as the mature body is constantly shedding and regenerating cells, many mature
networks are subject to constant small topology changes. One interesting class of non-growth dynamics is the attack. That is: how robust is a given network against a systematic attack which deletes nodes and/or links? A classic study in this direction is [3], which studied the attack tolerance of scale-free networks. There are many reasons to believe that such networks are very well connected, and this study added more: attacking random nodes had little effect on the functionality of the network (as crudely measured by the network diameter, and by the size of the largest surviving component). Thus such networks may be termed “robust”—but only with regard to this kind of “uninformed” attack. The same study showed that a “smart” attack, removing the nodes in descending order of node degree (i. e., highest degree first), caused the network to break down much more rapidly—thus highlighting once again the crucial role played by the ‘hubs’ in power-law networks. Jennifer Dunne ( Food Webs) reports some studies of the robustness of food webs to attack—where here the ‘attack’ is extinction of a node (species). The dynamics is slightly different from that in the previous paragraph, however, because of the phenomenon of secondary extinction: removal of one species can cause another (or several others) to go extinct as well, if they are dependent on the first for their food supply. Also, food webs are not well modeled by power-law degree distributions. Nevertheless, the cited studies indicate that removing high-degree species again causes considerably more damage (as measured by secondary extinctions and web fragmentation) than removing low-degree species. Dunne also reports studies seeking to simulate the effects of “ecologically plausible” extinction scenarios; here we find that the studied food webs are in fact very robust to such extinction scenarios—a result which (perhaps) confirms our prejudice, that today’s ecosystems are here because they are robust. 
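The contrast between random failure and a 'smart' attack is easy to demonstrate. In the deterministic sketch below (plain Python; the hub-and-cliques graph is made up purely for illustration), deleting a randomly chosen low-degree node barely changes the largest connected component, while deleting the highest-degree hub fragments the network:

```python
def largest_component(nodes, edges):
    """Size of the largest connected component of an undirected graph,
    restricted to the surviving node set."""
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        if u in nbrs and v in nbrs:   # ignore edges touching removed nodes
            nbrs[u].add(v)
            nbrs[v].add(u)
    seen, best = set(), 0
    for n in nodes:
        if n in seen:
            continue
        stack, size = [n], 0
        seen.add(n)
        while stack:                  # iterative depth-first traversal
            x = stack.pop()
            size += 1
            for y in nbrs[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        best = max(best, size)
    return best

# Made-up hub-dominated graph: node 0 is a hub joining three triangles.
edges = [(0, 1), (0, 4), (0, 7),
         (1, 2), (2, 3), (1, 3),
         (4, 5), (5, 6), (4, 6),
         (7, 8), (8, 9), (7, 9)]
nodes = set(range(10))

intact = largest_component(nodes, edges)               # whole network: 10
after_random = largest_component(nodes - {9}, edges)   # a low-degree node
after_targeted = largest_component(nodes - {0}, edges) # the hub
```

Removing peripheral node 9 leaves a connected network of 9 nodes, but removing hub 0 shatters the network into three disconnected triangles, the toy-scale analogue of the 'smart attack' result for power-law networks.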
Dynamical Processes on Networks

Our next theme is also about dynamics: not of the network topology, however, but over it. That is, it is often of interest to study processes which occur on the nodes (changing their state) and which transmit something (information) over the links, so that a change in one node’s state induces a change in other nodes’ states. We can make these ideas less abstract by considering the concrete example of epidemic spreading. The nodes can be individuals (or computers, or mobile phones), and the network is then a network of social contacts (or a computer or phone network). The elementary point that the disease is spread via contact is readily captured by the network model.

A classic study in this regard, which strongly underscored the value of network models, was that (mentioned earlier) of Pastor-Satorras and Vespignani [23]. Here it was shown that classical threshold thinking about epidemic spreading fails when the network is scale-free: the effective threshold is zero. This study, in yet another way, revealed that such networks are extremely well connected, and it stimulated a great deal of thought about prevention strategies.

We have already mentioned the article by Liljeros ( Human Sexual Networks) in this Section, with its evidence for power-law, or nearly power-law, degree distributions in human sexual networks. Liljeros offers a sober and careful discussion of the implications of the theoretical result just cited for understanding and preventing the spread of sexually transmitted diseases on finite networks. One consequence of a power-law node degree distribution, that a few individuals will have an extremely high number of contacts, may seem at first glance implausible or even impossible; and yet, as Liljeros points out, careful empirical studies have tended to support this prediction.

The human sexual network is of course not static; people change partners, and in fact people with many partners tend to change more often. Hence the dynamics of the network topology must be considered, including how it affects the dynamics of the epidemic spreading process going on over that topology. A term which captures this interplay is “concurrence”, which tells not how many partners you have, but how many partners you have “simultaneously”, i.e., within a time window which allows the passing of the disease from one contact to another. Considering concurrence leads to another type of graph, termed a ‘line graph’: a structure in which contact relationships become nodes, which are linked when they are concurrent.
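As a toy illustration of contact-driven spreading, one can iterate a discrete-time SIS (susceptible-infected-susceptible) process over an adjacency structure. This is a sketch only, not the analysis of Pastor-Satorras and Vespignani; the toy network and the rates beta and mu are arbitrary assumptions.

```python
# Minimal synchronous SIS epidemic sketch on an arbitrary contact graph.
import random

def sis_step(adj, infected, beta, mu, rnd):
    """One synchronous SIS update: each infected node infects each susceptible
    neighbor with probability beta, then recovers with probability mu."""
    new = set(infected)
    for u in infected:
        for w in adj[u]:
            if w not in infected and rnd.random() < beta:
                new.add(w)
        if rnd.random() < mu:
            new.discard(u)
    return new

# toy "ring plus hub" contact network: node 0 is linked to everyone
n = 50
adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
for v in range(1, n):
    adj[0].add(v)
    adj[v].add(0)

rnd = random.Random(3)
infected = {1}
for _ in range(100):
    infected = sis_step(adj, infected, beta=0.2, mu=0.1, rnd=rnd)
print("endemic fraction after 100 steps:", len(infected) / n)
```

The hub plays the role a high-degree individual plays in the text: once it is infected, it exposes every other node in a single step.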
The reader will not be surprised to hear that epidemic spreading over directed graphs is qualitatively different from the same process over undirected graphs. The topic has real practical interest because, as pointed out by Kephart and White [16] and by Newman, Forrest and Balthrop [20], the effective network over which computer viruses propagate is typically directed. We invite the interested reader to consult these sources, and to note the striking similarities between the picture of the ‘email address graph’ in [20] and the gross structure of the Web graph [8] (and Figure 1 of Adamic).

Another classic study of dynamical processes on networks is that by Duncan Watts [24,25] on synchronization phenomena over networks. Our Section includes a thorough and up-to-date survey of this problem, primarily from the theoretical side, by Chen et al. ( Synchronization Phenomena on Networks). The field is large, as there exists a wide variety of dynamical processes on nodes, and types of inter-node coupling, which may (or may not) lead to synchronization over the network. Chen et al. focus on three main themes which neatly summarize the field. First, there is the synchronization process itself, and the theory which allows one to predict when synchronization will occur. Next comes the question of how the network structure affects the tendency to synchronize. The third theme is a logical follow-up to the second: if we know something about how structure affects synchronization, can we not find design methods which enhance the tendency to synchronize? This latter theme includes a number of ingenious methods for ‘rewiring’ the network in order to enhance its synchronizability. For example, it is found that a very strong community structure (recall the article by Castellano and Fortunato) inhibits synchronization; accordingly, researchers have studied ‘entangled networks’, which are systematically rewired so as to have essentially no community structure, and which are therefore optimally synchronizable.

Dunne ( Food Webs) discusses the dynamics of species numbers over food webs. That is, the links telling “who eats who” also mediate transfers of biomass, a simple dynamical process. And yet, as we know from the dynamics of the simple two-species Lotka–Volterra equations [13], simple nonlinear rules can give rise to complex behavior. Dunne reports that the behavior does not get simpler as one studies larger networks; one finds the same types of asymptotic behavior: equilibrium, limit cycles, and chaotic dynamics. It is a nontrivial task to study nonlinear dynamical models over tens or hundreds of nodes, and at the same time anchor the theory and simulation in reality.
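The two-species Lotka–Volterra dynamics mentioned above can be sketched in a few lines. This is a toy Euler integration, not taken from [13]; the parameter values, initial conditions, and step size are arbitrary assumptions.

```python
# Toy Euler integration of the two-species Lotka-Volterra predator-prey model:
#   dx/dt = a*x - b*x*y   (prey grows, is eaten)
#   dy/dt = d*x*y - c*y   (predator grows by eating, dies off)
def lotka_volterra(x, y, a=1.0, b=0.5, c=0.5, d=0.2, dt=0.001, steps=20000):
    """Return the trajectory [(x, y), ...] of prey x and predator y."""
    traj = []
    for _ in range(steps):
        dx = (a * x - b * x * y) * dt
        dy = (d * x * y - c * y) * dt
        x, y = x + dx, y + dy
        traj.append((x, y))
    return traj

traj = lotka_volterra(x=2.0, y=1.0)
print("final (prey, predator):", traj[-1])
```

Even this two-node “food web” oscillates rather than settling instantly, which hints at why models with tens or hundreds of interacting species are so hard to analyze.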
One approach to doing this has been to insist that the resulting model satisfy certain stability criteria (hence conforming to reality!); these criteria are framed in terms of ‘species persistence’ (not too many extinctions) and/or ‘population stability’ (limited fluctuations in species mass for all species).

A dynamical process also plays a central role in the discussion of Huang and Kauffman ( Complex Gene Regulatory Networks – from Structure to Biological Observables: Cell Fate Determination). In a simplified but highly nontrivial model, the state of the gene network is modeled by a vector S(t), which takes binary values (0 or 1, i.e., ‘off’ or ‘on’) at each node in the network, at each time t. The interactions between genes are modeled by Boolean truth tables; i.e., a given gene’s (binary) output state is some Boolean function of all of its input states. The resulting ‘Boolean network model’ is at once very simple (discrete time, discrete binary states) and yet very complex, in the sense that it is impossible to predict the behavior of such models without simulating them. Huang and Kauffman describe the three regimes of dynamical behavior found: an ‘ordered’ regime, a ‘chaotic’ regime, and an intermediate ‘critical’ regime. While the ordered regime gives stable behavior, Huang and Kauffman argue that biology favors the critical regime. One gets the flavor of their argument by recalling that the same genome must be able to converge to many different cell types, so its dynamics cannot be too stable; and yet those same cell types must be ‘stable enough’. The discussion of the latter point includes a remarkable recent experimental study which graphically shows the return of two cell populations to the same ‘preferred’ stable attractor, after two distinct perturbations, via two distinct paths in state space.

Valverde and Solé ( Motifs in Graphs) also discuss dynamical processes, in terms of motifs. They give a clear picture for the simple test case of the three-node motif called the ‘Feed-Forward Loop’ or FFL. Here again, as for the gene regulatory networks of Huang and Kauffman, the modality (activation/inhibition) of the three links must be considered, and the resulting possibilities are either ‘coherent’ (non-conflicting) or ‘incoherent’. It is interesting to note that the FFL motif has been studied both via simulation and experimentally, in real gene networks. Valverde and Solé also make contact with the article of Chen et al. ( Synchronization Phenomena on Networks) by briefly discussing the connection between network synchronizability and the distribution of motif types.

Graph Visualization

The problem of graph visualization is a marvelous mixture of art and science. To produce a good graph visualization is to translate the science (what is known factually and analytically about the graph) into a 2D (or quasi-3D) image that the human brain can appreciate. Here the word ‘appreciate’ can include both understanding and the experience of beauty.
Both of these aspects are present in great quantities in the many figures of this Section. We invite the reader interested in visualization to visit and compare the following figures: Figure 2 in Fan Chung’s article; Figures 1 and 3 in Dunne; Figures 2 and 5 in Liljeros; Figure 17 in Chen et al.; and Figures 1, 2, and 6b in Valverde and Solé.

Graph visualization is included in the articles of this Section because it is a vital tool in the study of networks, and also an unfinished research challenge. The article by Vladimir Batagelj ( Complex Networks, Visualization of) offers a fine overview of this large field. This article in fact has a very broad scope, touching upon many visualization problems (molecules, Google maps) which may be called “information visualization”. The connection to networks (no pun intended) is that information is readily understood in terms of connections between units, i.e., as a network.

We note that one natural way of ‘understanding’ a graph is in terms of its communities; and since community substructure offers a way (perhaps even hierarchical) of ‘coarse-graining’ the network, methods for defining communities often lead naturally to methods for graph visualization. One example of this may be found in Figure 10 of Fortunato and Castellano ( Community Structure in Graphs); for other examples, see [12] and [6]. Batagelj describes a wide variety of graph visualization methods; but a careful reading of his article shows that many methods which are useful for large graphs depend on finding and exploiting some kind of community structure. The terms used for approaches in this vein include ‘multilevel algorithms’, ‘clustering’, ‘block modeling’, ‘coarse graining’, ‘partitions’, and ‘hierarchies’.

The lessons learned from Fortunato and Castellano are brought home again in the article by Batagelj: there is no ‘magic bullet’ that gives a universally satisfying answer to the problem of visualizing large networks. To quote Batagelj in regard to an early graph visualization: “Nice as a piece of art, but with an important message: there are big problems with visualization of dense and/or large networks.” A more recent example is a 2008 visualization of the Internet: here there is a ‘natural’ unit of coarse-graining, namely the autonomous system (AS) level (recall the article by He et al. on Internet Topology), so that the network of almost 5 million nodes reduces to ‘only’ about 18 000 ASes. Yet the resulting visualization (Figure 4 of Batagelj) clearly reveals that 18 000 nodes is still ‘large’ relative to human visual processing capacity. For other beautiful and mystifying examples of this same point, I recommend Figures 5 and 7 in the same article.
The difficulty of visualizing large networks is perhaps most succinctly captured in the outline of Batagelj’s stimulating article. Besides the normal introductory parts, we find only two others: ‘Attempts’ and ‘Perspectives’. Yet the article is by no means discouraging—it is rather fascinating and inspiring, and I encourage the reader to read and enjoy it.

Future Directions

The many articles related to “Complex Networks and Graph Theory” have each offered their own view of ‘Future Directions’ for the corresponding field of study. It would therefore be both redundant and presumptuous for me to attempt the same task for all of these fields. Instead I will offer a very short and entirely personal assessment of the ‘Future’ for the broad field of complex networks and graph theory.

I view the field somewhat as a living organism: it is highly dynamic, and still growing vigorously. New ideas continue to pop up, many of which could not be covered in this Section, simply due to practical limitations. Also, there is a fairly free flow of ideas across disciplines. The reader has perhaps already gotten a feeling for this cross-boundary flow, by seeing the same basic ideas crop up in many articles, on problems coming from distinctly different traditional disciplines. In short, I feel that the interdisciplinarity of this field is real, vigorous, healthy, and exciting.

Another aspect contributing to the excellent health of the field is its strong connection to empiricism. Network studies thrive on access to real, empirically obtained graphs; we have seen this over and over again in the discussion of the articles in this Section. From the Web graph with its tens of billions of nodes, to food webs with perhaps a hundred nodes, the science of networks is stimulated, challenged, and enriched by a steady influx of new data.

Finally, the study of complex networks is eminently practical. The path to direct application can be very short; again we cite the Web graph and Google’s PageRank algorithm as an example. The study of gene networks is somewhat farther from immediate application, but the possible benefits from a real understanding of cell and organism development could be enormous. The same holds for the problem of epidemic spreading. These examples are picked out only to illustrate the point; none of the articles, or the topics represented by them, is far removed from practical application. In short: the field is exciting, vigorous, and interdisciplinary, and offers great practical benefits to society. The study of graphs and networks shows no signs of becoming moribund. Hence I will hazard a guess about the future: that the field will continue to grow and inspire excitement for many years to come. It is my hope that many readers of this Section will be infected by this excitement, and will choose to join in the fun.

Bibliography
Primary Literature

1. Adamic LA (1999) The Small World Web. In: Proc 3rd European Conf on Research and Advanced Technology for Digital Libraries, ECDL, London, pp 443–452
2. Albert R, Jeong H, Barabási AL (1999) Diameter of the World-Wide Web. Nature 401:130–131
3. Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature 406:378–382
4. Badii R, Politi A (1997) Complexity: Hierarchical Structures and Scaling in Physics. Cambridge Nonlinear Science Series, vol 6. Cambridge University Press, Cambridge
5. Barabási AL, Albert R (1999) Emergence of Scaling in Random Networks. Science 286:509–512
6. Bjelland J, Canright G, Engø-Monsen K, Remple VP (2008) Topographic Spreading Analysis of an Empirical Sex Workers’ Network. In: Ganguly N, Mukherjee A, Deutsch A (eds) Dynamics on and of Complex Networks. Birkhäuser, Basel. Also at http://delis.upb.de/paper/DELIS-TR-0634.pdf
7. Brin S, Page L (1998) The anatomy of a large-scale hypertextual Web search engine. In: Proceedings of the Seventh International Conference on World Wide Web, pp 107–117
8. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2000) Graph structure in the Web. Comput Netw 33:309–320
9. Dorogovtsev SN, Mendes JFF (2003) Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford
10. Erdős P (1947) Some remarks on the theory of graphs. Bull Amer Math Soc 53:292–294
11. Erdős P, Rényi A (1959) On random graphs, I. Publ Math Debrecen 6:290–297
12. Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99:7821–7826
13. Kaplan D, Glass L (1995) Understanding Nonlinear Dynamics. Springer, New York
14. Kauffman S (1969) Homeostasis and differentiation in random genetic control networks. Nature 224:177–178
15. Kauffman SA (1969) Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 22:437–467
16. Kephart JO, White SR (1991) Directed-graph epidemiological models of computer viruses. In: Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy, pp 343–359
17. Kleinberg JM (1999) Authoritative sources in a hyperlinked environment. J ACM 46(5):604–632
18. Milgram S (1967) The Small World Problem. Psychol Today 2:60–67
19. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network Motifs: Simple Building Blocks of Complex Networks. Science 298:824–827
20. Newman MEJ, Forrest S, Balthrop J (2002) Email networks and the spread of computer viruses. Phys Rev E 66:035101
21. Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113
22. Page L, Brin S, Motwani R, Winograd T (1998) The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University, Stanford
23. Pastor-Satorras R, Vespignani A (2001) Epidemic spreading in scale-free networks. Phys Rev Lett 86:3200–3203
24. Watts DJ (1999) Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton Studies in Complexity. Princeton University Press, Princeton
25. Watts D, Strogatz S (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442

Books and Reviews

Albert R, Barabási AL (2002) Statistical Mechanics of Complex Networks. Rev Mod Phys 74:47–97
Bornholdt S, Schuster HG (2003) Handbook of Graphs and Networks: From the Genome to the Internet. Wiley-VCH, Berlin
Caldarelli G, Vespignani A (2007) Large Scale Structure and Dynamics of Complex Networks: From Information Technology to Finance and Natural Science. Cambridge University Press, Cambridge
Chung F, Lu L (2006) Complex Graphs and Networks. CBMS Regional Conference Series in Mathematics, vol 107. AMS, Providence
da F Costa L, Oliveira ON Jr, Travieso G, Rodrigues FA, Villas Boas PR, Antiqueira L, Viana MP, da Rocha LEC: Analyzing and Modeling Real-World Phenomena with Complex Networks: A Survey of Applications. Working paper: http://arxiv.org/abs/0711.3199
Kauffman SA (1993) The Origins of Order. Oxford University Press, Oxford
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167–256
Newman M, Barabási A, Watts DJ (2006) The Structure and Dynamics of Networks. Princeton Studies in Complexity. Princeton University Press, Princeton


Complex Networks, Visualization of
VLADIMIR BATAGELJ
University of Ljubljana, Ljubljana, Slovenia

Article Outline

Glossary
Definition of the Subject
Introduction
Attempts
Perspectives
Bibliography

Glossary

For basic notions on graphs and networks, see the articles by Wouter de Nooy ( Social Network Analysis, Graph Theoretical Approaches to) and by Vladimir Batagelj ( Social Network Analysis, Large-Scale) in the Social Networks Section. For complementary information on graph drawing in social network analysis, see the article by Linton Freeman ( Social Network Visualization, Methods of).

k-core A set of vertices in a graph is a k-core if each vertex in the set has an internal degree (restricted to the set) of at least k and the set is maximal: no such vertex can be added to it.

Network A network consists of vertices linked by lines, together with additional data about the vertices and/or lines. A network is large if it has at least some hundreds of vertices. Large networks can be stored in computer memory.

Partition A partition of a set is a family of its nonempty subsets such that each element of the set belongs to exactly one of the subsets. The subsets are also called classes or groups.

Spring embedder Another name for the energy-minimization graph drawing method. The vertices are treated as particles with a repulsive force between them, and the lines as springs that attract or repel the vertices when they are too far apart or too close, respectively. The algorithm determines an embedding of the vertices in two- or three-dimensional space that minimizes the ‘energy’ of the system.

Definition of the Subject

The earliest pictures containing graphs were magic figures, connections between different concepts (for example the Sephirot in Jewish Kabbalah), game boards (nine men’s morris, pachisi, patolli, go, xiangqi, and others), road maps

(for example Roman roads in the Tabula Peutingeriana), and genealogical trees of important families [33]. The notion of the graph was introduced by Euler. In the eighteenth and nineteenth centuries, graphs were used mainly for solving recreational problems (Knight’s tour, Eulerian and Hamiltonian problems, map coloring). At the end of the nineteenth century, some applications of graphs to real-life problems appeared (electric circuits, Kirchhoff; molecular graphs, Kekulé). In the twentieth century, graph theory evolved into its own field of discrete mathematics with applications to transportation networks (road and railway systems, metro lines, bus lines), project diagrams, flowcharts of computer programs, electronic circuits, molecular graphs, etc.

In social science the use of graphs was introduced by Jacob Moreno around 1930 as a basis of his sociometric approach. In his book Who Shall Survive? [36], a relatively large network, Sociometric geography of community – map III (435 individuals, 4350 lines), is presented. Linton Freeman wrote a detailed account of the development of social network analysis [20] and of the visualization of social networks [19]. The networks studied in social network analysis until the 1990s were mostly small: some tens of vertices.

Introduction

Through the 1980s, the development of information technology (IT) laid the groundwork for the emerging field of computer graphics. During this time, the first algorithms for graph drawing appeared:

– Trees: Wetherell and Shannon [45].
– Acyclic graphs: Sugiyama [42].
– Energy minimization methods (spring embedders) for general graphs: Eades [17], Kamada and Kawai [30], Fruchterman and Reingold [21].

In energy minimization methods, vertices are considered as particles with a repulsive force between them, and lines as springs that attract or repel the vertices if they are too far apart or too close, respectively.
The algorithms provide a means of determining an embedding of vertices in two- or three-dimensional space that minimizes the ‘energy’ of the system. As early as 1963, William Tutte proposed an algorithm for drawing planar graphs [43], and Donald E. Knuth put forth an algorithm for drawing flowcharts [31].

A well-known example of an early graph visualization was produced by Alden Klovdahl using his program View_Net; see Fig. 1. As nice a piece of art as it was, it held an important message: there are big problems with the visualization of dense, large graphs.
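The energy-minimization idea can be sketched directly. The following is a minimal Fruchterman–Reingold-style layout; the constants and the cooling schedule are illustrative assumptions, not any cited implementation.

```python
# Minimal spring-embedder sketch: pairwise repulsion between all vertices,
# spring-like attraction along edges, with a cooling schedule.
import math
import random

def spring_layout(adj, iters=200, width=1.0, seed=0):
    rnd = random.Random(seed)
    pos = {v: (rnd.random(), rnd.random()) for v in adj}
    k = width / math.sqrt(len(adj))        # ideal edge length
    for it in range(iters):
        disp = {v: [0.0, 0.0] for v in adj}
        for v in adj:                       # repulsion ~ k^2 / d between all pairs
            for w in adj:
                if v == w:
                    continue
                dx = pos[v][0] - pos[w][0]
                dy = pos[v][1] - pos[w][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
        for v in adj:                       # attraction ~ d^2 / k along edges
            for w in adj[v]:
                dx = pos[v][0] - pos[w][0]
                dy = pos[v][1] - pos[w][1]
                d = math.hypot(dx, dy) or 1e-9
                f = d * d / k
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        temp = width * (1.0 - it / iters) * 0.1   # cooling: shrinking max step
        for v in adj:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[v] = (pos[v][0] + dx / d * step, pos[v][1] + dy / d * step)
    return pos

# a 4-cycle should relax into a rough quadrilateral
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
pos = spring_layout(adj)
```

The O(n^2) repulsion loop is exactly what makes naive spring embedders impractical for large graphs, which motivates the multilevel methods discussed later in the article.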


Complex Networks, Visualization of, Figure 1 Klovdahl: Social links in Canberra, Australia

These developments led to the emergence of a new field: graph drawing. In 1992 a group of computer scientists and mathematicians (Giuseppe Di Battista, Peter Eades, Michael Kaufmann, Pierre Rosenstiehl, Kozo Sugiyama, Roberto Tamassia, Ioannis Tollis, and others) started the International Symposium on Graph Drawing, which takes place each year. The proceedings of the conference are published in Springer’s Lecture Notes in Computer Science series [24]. To stimulate new approaches to graph drawing, a graph drawing contest accompanies each conference. Many papers on graph drawing are published in the Journal of Graph Algorithms and Applications [79]. Most of the efforts of the graph drawing community have been spent on problems of drawing special types of graphs (trees, acyclic graphs, planar graphs), on drawing in special styles (straight-line, orthogonal, grid-based, circular, hierarchical), and on deriving bounds on the space (area) required for selected types of drawing.

In the 1990s, further development of IT (GUIs, multimedia, the WWW) made large graph analysis a reality. See, for example, the studies of large organic molecules (PDB [51]), the Internet (CAIDA [52]), and genealogies (White and Jorion [46], FamilySearch [60]). In chemistry, several tools for the dynamic three-dimensional visualization and inspection of molecules were developed (Kinemage [48], RasMol [87], MDL Chime [82]). One of the earliest systems for large networks was SemNet (see Fairchild, Poltrock, Furnas [18]), used to explore knowledge bases represented as directed graphs. In 1991 Tom Sawyer Software [91] was founded; it describes itself as the premier provider of high-performance graph visualization, layout, and analysis systems that enable the user to see and interpret complex information to make better decisions.

Complex Networks, Visualization of, Figure 2 Network of traceroute paths for 29 June 1999

Complex Networks, Visualization of, Figure 3 FAS: The scientific field of Austria

In 1993 at AT&T, the development of the GraphViz tools for graph visualization began (dot, neato, dotty, tcldot, libgraph) [69]. Becker, Eick and Wilks developed the SeeXYZ family of network visualization programs [8]. In 1996 Vladimir Batagelj and Andrej Mrvar started the development of Pajek, a program for large network analysis and visualization [5]. In 1997 at La Sapienza, Rome, the development of GDToolkit [63] started as an extension of LEDA (Library of Efficient Data types and Algorithms), providing implementations of several classical graph-drawing algorithms. The new version, GDT 4.0 (2007), produced in collaboration with the University of Perugia, is LEDA-independent. Graham Wills, a principal investigator at Bell Labs (1992–2001), built the Nicheworks system for the visual analysis of very large weighted network graphs (up to a million vertices).

In the summer of 1998 Bill Cheswick and Hal Burch started work on the Internet Mapping Project at Bell Labs [53]. Its goal was to acquire and save Internet topological data over a long period of time. These data have been used in the study of routing problems and changes, distributed denial-of-service (DDoS) attacks, and graph theory. In the fall of 2000 Cheswick and Burch moved to a spin-off from Lucent/Bell Labs named Lumeta Corporation; Bill Cheswick is now back at AT&T Labs. Figure 2 shows a network obtained from traceroute paths for 29 June 1999, with nearly 100 000 vertices.

In the years 1997–2004 Martin Dodge maintained his Cybergeography Research web pages [58]; the results were published in the book The Atlas of Cyberspace [14]. A newer, very rich site on information visualization is Visual Complexity [95], where many interesting ideas on
network visualizations can be found. These examples, and many others, can also be accessed from the Infovis 1100+ examples of information visualization site [77]. An interesting collection of graph/network visualizations can also be found on the CDs of Gerhard Dirmoser; this collection also contains many artistic examples and other pictures not produced by computers.

In 1997 Harald Katzmair founded FAS research in Vienna, Austria [61], a company providing network analysis services. FAS emphasizes the importance of nice-looking final products (pictures) for customers, using graphical tools to enhance the visual quality of results obtained from network analysis tools. In Fig. 3, a network of Austrian research projects is presented. A similar company, Aguidel [49], was founded in France by Andrei Mogoutov, author of the program Réseau-Lu. Every year since 2002, at the INSNA Sunbelt conference [78], the Viszards group has held a special session in which they present their analyses and visualizations of selected networks or types of networks. Most of the selected networks are large (KEDS, the Internet Movie Database, Wikipedia, Web of Science).

Attempts

The new millennium has seen several attempts to develop programs for drawing large graphs and networks. Most of the following descriptions are taken verbatim from the programs’ web pages.

Stephen Kobourov, with his collaborators from the University of Arizona, developed two graph drawing systems: GRIP (2000) [22,70] and Graphael (2003) [66]. GRIP (Graph dRawing with Intelligent Placement) was designed for drawing large graphs; it uses a multi-dimensional force-directed method together with fast energy-function minimization. It employs a simple recursive coarsening scheme: rather than being placed at random, vertices are placed intelligently, several at a time, at locations close to their final positions.

The Cooperative Association for Internet Data Analysis (CAIDA) [52], co-founded in 1998 by kc claffy, is an independent research group dedicated to investigating both the practical and theoretical aspects of the Internet, in order to promote the engineering and maintenance of a robust, scalable, global Internet infrastructure. They have been focusing primarily on understanding how the Internet is evolving, and on developing a state-of-the-art infrastructure for data measurement that can be shared with the entire research community. Figure 4 represents a macroscopic snapshot of the Internet over two weeks, 1–17 January 2008. The graph reflects 4 853 991 observed IPv4 addresses and 5 682 419 IP links. The network is aggregated into a topology of

Complex Networks, Visualization of, Figure 4 CAIDA: AS core 2008

Autonomous Systems (ASes). The abstracted graph consists of 17 791 ASes (vertices) and 50 333 peering sessions (lines).

Walrus is a tool for interactively visualizing large directed graphs in three-dimensional space. It is best suited to visualizing moderately sized graphs (a few hundred thousand vertices) that are nearly trees. Walrus uses three-dimensional hyperbolic geometry to display graphs under a fisheye-like magnifier; by bringing different parts of a graph into the magnified central region, the user can examine every part of the graph in detail. Walrus was developed by Young Hyun at CAIDA, based on research by Tamara Munzner. Figure 5 presents two examples of visualizations produced with Walrus.

Some promising algorithms for drawing large graphs have been proposed by Ulrik Brandes, Tim Dwyer, Emden Gansner, Stefan Hachul, David Harel, Michael Jünger, Yehuda Koren, Andreas Noack, Stephen North, Christian Pich, and Chris Walshaw [11,16,23,25,26,32,39,44]. They are based either on a multilevel energy-minimization approach or on an algebraic (spectral) approach that reduces to some application of eigenvectors. The multilevel approach speeds up the algorithms. Multilevel algorithms are based on two phases: a coarsening phase, in which a sequence of coarse graphs with decreasing sizes is computed, and a refinement phase, in which successively finer drawings of the graphs are computed, using the drawings of the next coarser graphs and a variant of a suitable force-directed single-level algorithm [25]. The fastest algorithms combine the multilevel approach with a fast approximation of the long-range repulsive forces using nested data structures such as the quadtree or kd-tree.

Katy Börner from Indiana University, with her collaborators, has produced several visualizations of scientometric networks, such as the Backbone of Science [9] and Wikipedia [73]. They use different visual cues to produce information-rich visualizations; see Fig. 6. She also commissioned the Map of Science, based on data (800 000 published papers) from Thomson ISI and produced by Kevin Boyack, Richard Klavans and Bradford Paley [9].

Yifan Hu from the AT&T Labs Information Visualization Group developed a multilevel graph drawing algorithm for the visualization of large graphs [29]. The algorithm was first implemented in 2004 in Mathematica and released in 2005. For demonstration, he applied it to the University of Florida Sparse Matrix Collection [56], which contains over 1500 square matrices. The results are available in the Gallery of Large Graphs [74]. The largest graph (vanHeukelum/cage15) has 5 154 859 vertices and 47 022 346 edges. In Fig. 7, selected pictures from the gallery are presented.
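The coarsening phase of a multilevel method can be illustrated with a greedy edge matching: matched endpoint pairs are merged into single coarse vertices. This is an illustrative sketch of the general idea, not the implementation of any cited system.

```python
# One coarsening step via a greedy maximal edge matching.
def coarsen(adj):
    """Merge matched endpoint pairs; return (coarser adjacency, vertex map)."""
    matched = {}
    for v in sorted(adj):
        if v in matched:
            continue
        for w in sorted(adj[v]):
            if w not in matched and w != v:
                matched[v] = v          # v survives as the coarse vertex
                matched[w] = v          # w collapses into v
                break
        matched.setdefault(v, v)        # unmatched vertex maps to itself
    coarse = {}
    for v in adj:
        cv = matched[v]
        coarse.setdefault(cv, set())
        for w in adj[v]:
            cw = matched[w]
            if cw != cv:                # drop edges internal to a merged pair
                coarse[cv].add(cw)
    return coarse, matched

# a path 0-1-2-3-4-5 coarsens to half as many vertices
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
coarse, vmap = coarsen(adj)
print(len(adj), "->", len(coarse))
```

Applying this step repeatedly yields the sequence of ever-coarser graphs that the refinement phase then lays out from coarsest to finest.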


Complex Networks, Visualization of, Figure 5 Walrus

Complex Networks, Visualization of, Figure 6 Katy Börner: Text analysis


Complex Networks, Visualization of, Figure 7 Examples from the Gallery of Large Graphs

From the examples that we have given, we can see that, in some cases, graph drawing algorithms can reveal symmetries in a given graph, as well as 'structure' ((sub)trees, clusters, planarity, etc.). Challenges remain in devising ways to represent graphs with dense parts. A better approach for dense parts is to display them using a matrix representation. This representation was used in 1999 by Vladimir Batagelj, Andrej Mrvar and Matjaž Zaveršnik in their partitioning approach to visualization

of large graphs [7] and is the basis of systems such as Matrix Zoom by James Abello and Frank van Ham, 2004 [1,2], and MatrixExplorer by Nathalie Henry and Jean-Daniel Fekete, 2006 [27]. A matrix representation is determined by an ordering of the vertices. Several algorithms exist that can produce such orderings; a comparative study of them was published by Chris Mueller [37,84]. In Fig. 8 three orderings of the same matrix are presented. Most ordering algorithms were originally designed for applications in numerical, rather than data, analysis. The orderings can also be determined using clustering or blockmodeling methods [15].

Complex Networks, Visualization of, Figure 8 Matrix representations

An important type of network is the temporal network, in which the presence of vertices and lines changes through time. Visualization of such networks requires special approaches (Sonia [88], SVGanim [90], TecFlow [75]). An interesting approach to the visualization of temporal networks was developed by Ulrik Brandes and his group [12].

Perspectives

In this section we present a collection of ideas on how to approach the visualization of large networks. These ideas are only partially implemented in existing visualization solutions. While the technical problems of graph drawing strive for a single 'best' picture, network analysis is also a part of data analysis. Its goal is to gain insight not only into the structure and characteristics of a given network, but also into how this structure influences processes going on over the network. We usually need several pictures to present the obtained results.

Small graphs can be presented in their totality and in detail within a single view. In a comprehensive view of a large graph, details become lost; conversely, a detailed view can encompass only part of a large graph. The literature on graph drawing is dominated by the 'sheet of paper' paradigm – the solutions and techniques are mainly based on the assumption that the final result is a static picture on a sheet of paper. In this model, to present a large data set we need a large 'sheet of paper', but this has a limit. Figure 9 presents a visualization of a symmetrized subnetwork of 5952 words and 18 008 associations from the Edinburgh Associative Thesaurus [59], prepared by Vladimir Batagelj on a 3 m × 5 m 'sheet of paper' for the Language of Networks exhibition at Ars Electronica, Linz, 2004.

The main tool for dealing with large objects is abstraction. In graphs, abstraction is usually realized using a hierarchy of partitions. By shrinking selected classes of a partition we obtain a smaller, reduced graph. The main operations related to abstraction are:
– Cut-out: display only selected parts (classes of a partition) of a graph;
– Context: display details of selected parts (classes) of a graph and display the rest of the graph in some reduced form;
– Model: display the reduced graph with respect to a given partition;
– Hierarchy: display the tree representing the nesting of graph partitions.

In larger, denser networks there is often too much information to be presented at once. A possible answer is an interactive layout on a computer screen where the user controls what he or she wants to see. The computer screen is a medium that offers many new possibilities: parallel views (global and local); brushing and linking; zooming and panning; temporary elements (additional information about selected elements, labels, legends, markers, etc.); highlighted selections; and others. These features can and should be leveraged maximally to support data-analytic tasks; or, repeating Shneiderman's mantra: overview first, zoom and filter, then details on demand (extended with: relate, history and extract) [40].

When interactively inspecting very large graphs, a serious problem appears: how does one avoid the "lost within the forest" effect? Several solutions can help the user maintain orientation:
– Restart option: returns the user to the starting point;
– Additional orientation elements: can be switched on and off;
– Multiview: presents at least two views (windows):


Complex Networks, Visualization of, Figure 9 Big picture, V. Batagelj, AE’04

– Map view: shows an overall global view which contains the current position and allows 'long' moves (jumps). For very large graphs, a map view can be combined with zooming or fish-eye views.
– Local view: displays a selected portion of the graph.

Additional support can be achieved by implementing trace, backtrack and replay mechanisms and guided tours. An interactive dynamic visualization of a graph on the computer screen need not display the graph in its totality. While inspecting a visualization, the user can select which parts and elements are displayed, and in what way. See, for example, TouchGraph [92].

Closely related to the multiview concept are the associated concepts of glasses, lenses and zooming. Glasses affect the entire window, while lenses affect only a selected region or selected elements. By choosing different glasses, we can obtain different views of the same data, supporting different visualization aims. For example, in Fig. 10 four different glasses (ball-and-stick, space-fill, backbone, ribbons) were applied in the program RasMol to the molecule 1atn.pdb (deoxyribonuclease I complex with actin). Another example of glasses is presented in Fig. 11. The two pictures were produced by James Moody [35]. The graph pictured was obtained by applying spring embedders. It represents the friendships among students in a school. The glasses here are the colorings of its vertices by different partitions: an age partition (left picture) and a race partition (right picture). This explains the four groups in the resulting picture, characterized by younger/older and white/black students.

Figure 12 shows a part of the big picture presented in Fig. 9. The glasses in this case are based on ordering the edges in increasing order of their values and drawing them in that order – stronger edges cover the weaker. The picture emphasizes the strongest substructures; the remaining elements form a background. There are many kinds of glasses in the representation of graphs, for example: fish-eye views, matrix representation, application-field conventions (genealogies, molecules, electric circuits, SBGN), displaying vertices only, selecting the type of labels (long/short name, value), displaying only the important vertices and/or lines, or sizing vertices by core number or betweenness.

An example of a lens is presented in Fig. 13 – contributions of companies to various presidential candidates, from Follow the Oil Money by Greg Michalec and Skye Bender-deMoll [62]. When a vertex is selected, information about that vertex is displayed. Another possible use of a lens would be to temporarily enhance the display of the neighbors of a selected vertex [94] or to display their labels.
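Both kinds of glasses just described are easy to sketch in code: coloring vertices by a chosen partition (as in Fig. 11), and ordering edges by increasing value so that stronger edges are drawn last and cover weaker ones (as in Fig. 12). A minimal illustration; the function names and data representations are invented for this sketch:

```python
def partition_glasses(partition, palette):
    """Color vertices by their partition class (e.g. age or race);
    switching the partition switches the view of the same layout."""
    return {v: palette[c % len(palette)] for v, c in partition.items()}

def edge_drawing_order(edges):
    """Order edges (u, v, value) by increasing value, so that stronger
    edges are drawn last and cover the weaker ones."""
    return sorted(edges, key=lambda e: e[2])
```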


Complex Networks, Visualization of, Figure 10 Glasses: Rasmol displays – BallStick, SpaceFill, Backbone, Ribbons

Complex Networks, Visualization of, Figure 11 Glasses: Display of properties – school


Complex Networks, Visualization of, Figure 12 Part of the big picture

Complex Networks, Visualization of, Figure 13 Lenses: Temporary info about the selected vertex

The “shaking” option used in Pajek to visually identify all vertices from a selected cluster is also a kind of lens; so are the matrix representations of selected clusters in NodeTrix [72]. Additional enhancement of a presentation can be achieved by the use of support elements such as labels, grids, legends, and various forms of help facilities.
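The NodeTrix-style matrix view of a selected cluster can be sketched as follows; this is a minimal illustration, and the function name and data representation are invented, not NodeTrix's actual API:

```python
def cluster_matrix(edges, cluster):
    """Build the adjacency submatrix for the vertices of a selected
    cluster, i.e. the local matrix representation used in hybrid
    node-link/matrix displays such as NodeTrix."""
    idx = {v: i for i, v in enumerate(sorted(cluster))}  # row/column order
    m = [[0] * len(idx) for _ in idx]
    for u, v in edges:
        if u in idx and v in idx:  # keep only edges inside the cluster
            m[idx[u]][idx[v]] = m[idx[v]][idx[u]] = 1
    return m
```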

An important concept connected with zooming is the level of detail, or LOD – subobjects are displayed differently depending on the zoom depth. A nice example of a combination of these techniques is the Google Maps service [65] – see Fig. 14. It combines zooming, glasses (Map, Satellite, Terrain), navigation (left, right, up, down) and lenses (info about points). The maps


Complex Networks, Visualization of, Figure 14 Zoom, glasses, lenses, navigation: Google Maps

Complex Networks, Visualization of, Figure 15 Zoom, glasses, lenses, navigation: Grokker

at different zoom levels provide information at different levels of detail and in different forms. A similar approach could be used for the inspection of a large graph or network by examining selected hierarchical clusterings of its vertices. To produce higher-level 'maps', different methods can be used: k-core representation [4], density contours [83], generalized blockmodeling [15], clustering [71] (Fig. 15), preserving only important vertices and lines, etc. In visualizing 'maps', new graphical elements (many of them still to be invented) can be used (see [13,80], p. 223 in [15]) to preserve or to indicate information about structures at lower levels.

The k-core representation [4,81] is based on the k-core decomposition of a network [6,7] and was developed by Alessandro Vespignani and his collaborators [54]. Figure 16 shows a portion of the web at the .fr domain with


1 million pages. Each node represents a web page and each edge a hyperlink between two pages.

Complex Networks, Visualization of, Figure 16 k-core structure of a portion of the web at the .fr domain

Density contours were introduced by James Moody in 2006. First, a spring-embedder layout of a (valued) network is determined. Next, the vertices and lines are removed and replaced by density contours. Figure 17 shows this process applied to a social science co-citation network. The left side shows the network layout and the bottom right part presents the corresponding density contours.

The basic steps in graph/network visualization are:

graph/network → analysis → layouts → viewer → pictures.

Development of different tools can be based on this scheme, depending on the kind of users (simple, advanced) and the tasks they address (reporting, learning, monitoring, exploration, analysis). In some cases a simple viewer is sufficient (for example an SVG viewer, an X3D viewer, or a special graph-layout viewer); in others a complete network analysis system is needed (such as GEOMI [3,64], ILOG [76], Pajek [50], Tulip [93], yFiles [96]). To visualize a network, layouts are obtained by augmenting the network data with results of analysis and users' decisions. In Pajek's input format there are several layout elements inherited from Pajek's predecessors (see Pajek's manual, pp 69–73 in [50]). Just as in typesetting

text + formatting = formatted text,

so in network visualization

network + layout = picture.

It would be useful to define a common layout format (an extension of GraphML [68]?) so that independent viewer modules can be developed and combined with different layout algorithms. Some useful ideas can be found in the nViZn ("envision") system [89]. To specify layouts, we can borrow from typesetting the notion of style.

Bibliography

Primary Literature

1. Abello J, van Ham F (2004) Matrix zoom: A visual interface to semi-external graphs. IEEE Symposium on Information Visualization, October 10–12 2004, Austin, Texas, USA, pp 183–190
2. Abello J, van Ham F, Krishnan N (2006) ASK-GraphView: A large scale graph visualization system. IEEE Trans Vis Comput Graph 12(5):669–676
3. Ahmed A, Dwyer T, Forster M, Fu X, Ho J, Hong S, Koschützki D, Murray C, Nikolov N, Taib R, Tarassov A, Xu K (2006) GEOMI: GEOmetry for Maximum Insight. In: Healy P, Eades P (eds) Proc 13th Int Symp Graph Drawing (GD 2005). Lecture Notes in Computer Science, vol 3843. Springer, Berlin, pp 468–479
4. Alvarez-Hamelin JI, Dall'Asta L, Barrat A, Vespignani A (2005) Large scale networks fingerprinting and visualization using the k-core decomposition. In: Advances in Neural Information Processing Systems 18, NIPS 2005, December 5–8, 2005, Vancouver, British Columbia, Canada
5. Batagelj V, Mrvar A (2003) Pajek – analysis and visualization of large networks. In: Jünger M, Mutzel P (eds) Graph drawing software. Springer, Berlin, pp 77–103
6. Batagelj V, Zaveršnik M (2002) Generalized cores. arXiv cs.DS/0202039
7. Batagelj V, Mrvar A, Zaveršnik M (1999) Partitioning approach to visualization of large graphs. In: Kratochvíl J (ed) Lecture Notes in Computer Science, vol 1731. Springer, Berlin, pp 90–97
8. Becker RA, Eick SG, Wilks AR (1995) Visualizing network data. IEEE Trans Vis Comput Graph 1(1):16–28
9. Boyack KW, Klavans R, Börner K (2005) Mapping the backbone of science. Scientometrics 64(3):351–374
10. Boyack KW, Klavans R, Paley WB (2006) Map of science. Nature 444:985
11. Brandes U, Pich C (2007) Eigensolver methods for progressive multidimensional scaling of large data. In: Proc 14th Int Symp Graph Drawing (GD '06). Lecture Notes in Computer Science, vol 4372. Springer, Berlin, pp 42–53
12. Brandes U, Fleischer D, Lerner J (2006) Summarizing dynamic bipolar conflict structures. IEEE Trans Vis Comput Graph (special issue on Visual Analytics) 12(6):1486–1499
13. Dickerson M, Eppstein D, Goodrich MT, Meng J (2005) Confluent drawings: Visualizing non-planar diagrams in a planar way. J Graph Algorithms Appl (special issue for GD'03) 9(1):31–52
14. Dodge M, Kitchin R (2001) The atlas of cyberspace. Pearson Education, Addison Wesley, New York
15. Doreian P, Batagelj V, Ferligoj A (2005) Generalized blockmodeling. Cambridge University Press, Cambridge
16. Dwyer T, Koren Y (2005) DIG-COLA: Directed graph layout through constrained energy minimization. INFOVIS 2005, p 9
17. Eades P (1984) A heuristic for graph drawing. Congressus Numerantium 42:149–160


Complex Networks, Visualization of, Figure 17 Density structure

18. Fairchild KM, Poltrock SE, Furnas GW (1988) SemNet: Three-dimensional representations of large knowledge bases. In: Guindon R (ed) Cognitive science and its applications for human-computer interaction. Lawrence Erlbaum, Hillsdale, pp 201–233
19. Freeman LC (2000) Visualizing social networks. J Soc Struct 1(1). http://www.cmu.edu/joss/content/articles/volume1/Freeman/
20. Freeman LC (2004) The development of social network analysis: A study in the sociology of science. Empirical Press, Vancouver
21. Fruchterman T, Reingold E (1991) Graph drawing by force directed placement. Softw Pract Exp 21(11):1129–1164

22. Gajer P, Kobourov S (2001) GRIP: Graph drawing with intelligent placement. In: Proc Graph Drawing 2000. Lecture Notes in Computer Science, vol 1984, pp 222–228
23. Gansner ER, Koren Y, North SC (2005) Topological fisheye views for visualizing large graphs. IEEE Trans Vis Comput Graph 11(4):457–468
24. Graph Drawing. Lecture Notes in Computer Science, vol 894 (1994), 1027 (1995), 1190 (1996), 1353 (1997), 1547 (1998), 1731 (1999), 1984 (2000), 2265 (2001), 2528 (2002), 2912 (2003), 3383 (2004), 3843 (2005), 4372 (2006), 4875 (2007). Springer, Berlin
25. Hachul S, Jünger M (2007) Large-graph layout algorithms at work: An experimental study. J Graph Algorithms Appl 11(2):345–369


26. Harel D, Koren Y (2004) Graph drawing by high-dimensional embedding. J Graph Algorithms Appl 8(2):195–214
27. Henry N, Fekete J-D (2006) MatrixExplorer: A dual-representation system to explore social networks. IEEE Trans Vis Comput Graph 12(5):677–684
28. Herman I, Melançon G, Marshall MS (2000) Graph visualization and navigation in information visualization: A survey. IEEE Trans Vis Comput Graph 6(1):24–43
29. Hu YF (2005) Efficient and high quality force-directed graph drawing. Math J 10:37–71
30. Kamada T, Kawai S (1989) An algorithm for drawing general undirected graphs. Inf Process Lett 31(1):7–15
31. Knuth DE (1963) Computer-drawn flowcharts. Commun ACM 6(9):555–563
32. Koren Y (2003) On spectral graph drawing. COCOON 2003, pp 496–508
33. Kruja E, Marks J, Blair A, Waters R (2001) A short note on the history of graph drawing. In: Proc Graph Drawing 2001. Lecture Notes in Computer Science, vol 2265. Springer, Berlin, pp 272–286
34. Lamping J, Rao R, Pirolli P (1995) A focus+context technique based on hyperbolic geometry for visualizing large hierarchies. CHI '95, pp 401–408
35. Moody J (2001) Race, school integration, and friendship segregation in America. Am J Sociol 107(3):679–716
36. Moreno JL (1953) Who shall survive? Beacon, New York
37. Mueller C, Martin B, Lumsdaine A (2007) A comparison of vertex ordering algorithms for large graph visualization. APVIS 2007, pp 141–148
38. Munzner T (1997) H3: Laying out large directed graphs in 3D hyperbolic space. In: Proceedings of the 1997 IEEE Symposium on Information Visualization, 20–21 October 1997, Phoenix, AZ, pp 2–10
39. Noack A (2007) Energy models for graph clustering. J Graph Algorithms Appl 11(2):453–480
40. Shneiderman B (1996) The eyes have it: A task by data type taxonomy for information visualization. In: IEEE Conference on Visual Languages (VL '96). IEEE CS Press, Boulder
41. Shneiderman B, Aris A (2006) Network visualization by semantic substrates. IEEE Trans Vis Comput Graph 12(5):733–740
42. Sugiyama K, Tagawa S, Toda M (1981) Methods for visual understanding of hierarchical system structures. IEEE Trans Syst Man Cybern 11(2):109–125
43. Tutte WT (1963) How to draw a graph. Proc London Math Soc s3-13(1):743–767
44. Walshaw C (2003) A multilevel algorithm for force-directed graph-drawing. J Graph Algorithms Appl 7(3):253–285
45. Wetherell C, Shannon A (1979) Tidy drawings of trees. IEEE Trans Softw Eng 5:514–520
46. White DR, Jorion P (1992) Representing and computing kinship: A new approach. Curr Anthropol 33(4):454–463
47. Wills GJ (1999) NicheWorks – interactive visualization of very large graphs. J Comput Graph Stat 8(2):190–212

Web Resources

48. 3D Macromolecule analysis and Kinemage home page: http://kinemage.biochem.duke.edu/. Accessed March 2008
49. Aguidel: http://www.aguidel.com/en/. Accessed March 2008
50. Batagelj V, Mrvar A (1996) Pajek – program for analysis and visualization of large networks: http://pajek.imfm.si. Accessed March 2008. Data sets: http://vlado.fmf.uni-lj.si/pub/networks/data/. Accessed March 2008
51. Brookhaven Protein Data Bank: http://www.rcsb.org/pdb/. Accessed March 2008
52. Caida: http://www.caida.org/home/. Accessed March 2008. Walrus gallery: http://www.caida.org/tools/visualization/walrus/gallery1/. Accessed March 2008
53. Cheswick B: Internet mapping project – map gallery: http://www.cheswick.com/ches/map/gallery/. Accessed March 2008
54. Complex Networks Collaboratory: http://cxnets.googlepages.com/. Accessed March 2008
55. Cruz I, Tamassia R (1994) Tutorial on graph drawing. http://graphdrawing.org/literature/gd-constraints.pdf. Accessed March 2008
56. Davis T: University of Florida Sparse Matrix Collection: http://www.cise.ufl.edu/research/sparse/matrices. Accessed March 2008
57. Di Battista G, Eades P, Tamassia R, Tollis IG (1994) Algorithms for drawing graphs: An annotated bibliography. Comput Geom: Theory Appl 4:235–282. http://graphdrawing.org/literature/gdbiblio.pdf. Accessed March 2008
58. Dodge M: Cyber-Geography Research: http://personalpages.manchester.ac.uk/staff/m.dodge/cybergeography/. Accessed March 2008
59. Edinburgh Associative Thesaurus (EAT): http://www.eat.rl.ac.uk/. Accessed March 2008
60. FamilySearch: http://www.familysearch.org/. Accessed March 2008
61. FASresearch, Vienna, Austria: http://www.fas.at/. Accessed March 2008
62. Follow the Oil Money: http://oilmoney.priceofoil.org/. Accessed March 2008
63. GDToolkit – Graph Drawing Toolkit: http://www.dia.uniroma3.it/~gdt/gdt4/index.php. Accessed March 2008
64. GEOMI (Geometry for Maximum Insight): http://www.cs.usyd.edu.au/~visual/valacon/geomi/. Accessed March 2008
65. Google Maps: http://maps.google.com/. Accessed March 2008
66. Graphael: http://graphael.cs.arizona.edu/. Accessed March 2008
67. Graphdrawing home page: http://graphdrawing.org/. Accessed March 2008
68. GraphML File Format: http://graphml.graphdrawing.org/. Accessed March 2008
69. Graphviz: http://graphviz.org/. Accessed March 2008
70. GRIP: http://www.cs.arizona.edu/~kobourov/GRIP/. Accessed March 2008
71. Grokker – Enterprise Search Management and Content Integration: http://www.grokker.com/. Accessed March 2008
72. Henry N, Fekete J-D, Mcguffin M (2007) NodeTrix: Hybrid representation for analyzing social networks: https://hal.inria.fr/inria-00144496. Accessed March 2008
73. Herr BW, Holloway T, Börner K (2007) Emergent mosaic of wikipedian activity: http://www.scimaps.org/dev/big_thumb.php?map_id=158. Accessed March 2008
74. Hu YF: Gallery of Large Graphs: http://www.research.att.com/~yifanhu/GALLERY/GRAPHS/index1.html. Accessed March 2008
75. iCKN: TeCFlow – a temporal communication flow visualizer for social network analysis: http://www.ickn.org/. Accessed March 2008


76. ILOG Diagrams: http://www.ilog.com/. Accessed March 2008
77. Infovis – 1100+ examples of information visualization: http://www.infovis.info/index.php?cmd=search&words=graph&mode=normal. Accessed March 2008
78. INSNA – International Network for Social Network Analysis: http://www.insna.org/. Accessed March 2008
79. Journal of Graph Algorithms and Applications: http://jgaa.info/. Accessed March 2008
80. KartOO visual meta search engine: http://www.kartoo.com/. Accessed March 2008
81. LaNet-vi – Large Network visualization tool: http://xavier.informatics.indiana.edu/lanet-vi/. Accessed March 2008
82. MDL Chime: http://www.mdli.com/. Accessed March 2008
83. Moody J (2007) The network structure of sociological production II: http://www.soc.duke.edu/~jmoody77/presentations/soc_Struc_II.ppt. Accessed March 2008
84. Mueller C: Matrix visualizations: http://www.osl.iu.edu/~chemuell/data/ordering/sparse.html. Accessed March 2008
85. OLIVE, On-line library of information visualization environments: http://otal.umd.edu/Olive/. Accessed March 2008
86. Pad++: Zoomable user interfaces: Portal filtering and 'magic lenses': http://www.cs.umd.edu/projects/hcil/pad++/tour/lenses.html. Accessed March 2008
87. RasMol Home Page: http://www.umass.edu/microbio/rasmol/index2.htm. Accessed March 2008
88. Sonia – Social Network Image Animator: http://www.stanford.edu/group/sonia/. Accessed March 2008
89. SPSS nViZn: http://www.spss.com/research/wilkinson/nViZn/nvizn.html. Accessed March 2008
90. SVGanim: http://vlado.fmf.uni-lj.si/pub/networks/pajek/SVGanim. Accessed March 2008
91. Tom Sawyer Software: http://www.tomsawyer.com/home/index.php. Accessed March 2008
92. TouchGraph: http://www.touchgraph.com/. Accessed March 2008
93. Tulip: http://www.labri.fr/perso/auber/projects/tulip/. Accessed March 2008
94. Viégas FB, Wattenberg M (2007) Many Eyes: http://services.alphaworks.ibm.com/manyeyes/page/Network_Diagram.html. Accessed March 2008
95. Visual complexity: http://www.visualcomplexity.com/vc/. Accessed March 2008
96. yWorks/yFiles: http://www.yworks.com/en/products_yfiles_about.htm. Accessed March 2008

Books and Reviews

Bertin J (1967) Sémiologie graphique. Les diagrammes, les réseaux, les cartes. Mouton/Gauthier-Villars, Paris/La Haye
Brandes U, Erlebach T (eds) (2005) Network analysis: Methodological foundations. LNCS. Springer, Berlin
Carrington PJ, Scott J, Wasserman S (eds) (2005) Models and methods in social network analysis. Cambridge University Press, Cambridge
de Nooy W, Mrvar A, Batagelj V (2005) Exploratory social network analysis with Pajek. Cambridge University Press, Cambridge
di Battista G, Eades P, Tamassia R, Tollis IG (1999) Graph drawing: Algorithms for the visualization of graphs. Prentice Hall, Englewood Cliffs
Jünger M, Mutzel P (eds) (2003) Graph drawing software. Springer, Berlin
Kaufmann M, Wagner D (2001) Drawing graphs, methods and models. Springer, Berlin
Tufte ER (1983) The visual display of quantitative information. Graphics Press, Cheshire
Wasserman S, Faust K (1994) Social network analysis: Methods and applications. Cambridge University Press, Cambridge
Wilkinson L (2000) The grammar of graphics. Statistics and Computing. Springer, Berlin


Computer Graphics and Games, Agent Based Modeling in

Brian Mac Namee
School of Computing, Dublin Institute of Technology, Dublin, Ireland

Article Outline

Glossary
Definition of the Subject
Introduction
Agent-Based Modelling in Computer Graphics
Agent-Based Modelling in CGI for Movies
Agent-Based Modelling in Games
Future Directions
Bibliography

Glossary

Computer generated imagery (CGI) The use of computer generated images for special effects purposes in film production.
Intelligent agent A hardware or (more usually) software-based computer system that enjoys the properties of autonomy, social ability, reactivity and pro-activeness.
Non-player character (NPC) A computer controlled character in a computer game, as opposed to a player controlled character.
Virtual character A computer generated character that populates a virtual world.
Virtual world A computer generated world in which places, objects and people are represented as graphical (typically three-dimensional) models.

Definition of the Subject

As the graphics technology used to create virtual worlds has improved in recent years, more and more importance has been placed on the behavior of virtual characters in applications such as games, movies and simulations set in these virtual worlds. The behavior of these virtual characters should be believable in order to create the illusion that virtual worlds are populated with living characters. This has led to the application of agent-based modeling to the control of virtual characters. Agent-based modeling techniques offer a number of advantages: they remove the requirement to hand-control every agent in a virtual environment, and they allow agents in games to respond to unexpected actions by players or users.

Introduction

Advances in computer graphics technology in recent years have allowed the creation of realistic and believable virtual worlds. However, as such virtual worlds have been developed for applications spanning games, education and movies, it has become apparent that, in order to achieve real believability, virtual worlds must be populated with life-like virtual characters. This is where the application of agent-based modeling has found a niche in the areas of computer graphics and, in a huge way, computer games. Agent-based modeling is a natural solution to the problem of controlling the behaviors of the virtual characters that populate a virtual world. In fact, because virtual characters are embodied and autonomous, these applications require an even stronger notion of agency than many other areas in which agent-based modeling is employed.

Before proceeding any further, and because there are so many competing alternatives, it is worth explicitly stating the definition of an intelligent agent that will inform the remainder of this article. Taken from [83], an intelligent agent is defined as ". . . a hardware or (more usually) software-based computer system that enjoys the following properties:
– autonomy: agents operate without the direct intervention of humans or others, and have some kind of control over their actions and internal state;
– social ability: agents interact with other agents (and possibly humans) via some kind of agent-communication language;
– reactivity: agents perceive their environment (which may be the physical world, a user via a graphical user interface, a collection of other agents, the INTERNET, or perhaps all of these combined), and respond in a timely fashion to changes that occur in it;
– pro-activeness: agents do not simply act in response to their environment; they are able to exhibit goal-directed behavior by taking the initiative."

Virtual characters implemented using agent-based modeling techniques satisfy all of these properties.
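The four properties above can be made concrete with a deliberately tiny sketch of a virtual character's update loop. All class and method names here are invented for illustration; no real game engine API is implied:

```python
class VirtualCharacter:
    """Toy agent illustrating autonomy, social ability, reactivity
    and pro-activeness in a single update step."""

    def __init__(self, name, goal):
        self.name = name
        self.goal = goal   # pro-activeness: the character's own agenda
        self.inbox = []    # social ability: messages from other agents

    def tell(self, message):
        """Receive a message from another agent."""
        self.inbox.append(message)

    def step(self, percepts):
        """One autonomous update: react to the environment and to
        messages, otherwise pursue the character's own goal."""
        if 'threat' in percepts:        # reactivity: respond to events
            return 'flee'
        if self.inbox:                  # social ability: answer messages
            return 'reply:' + self.inbox.pop(0)
        return 'pursue:' + self.goal    # pro-activeness: take initiative
```

Calling `step` repeatedly, without any outside scripting of individual actions, is what makes the character autonomous.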
The characters that populate virtual worlds should be fully autonomous and drive their own behaviors (albeit sometimes following the orders of a director or player). Virtual characters should be able to interact believably with other characters and human participants. This property is particularly strong in the case of virtual characters used in games which by their nature are particularly interactive. It is also imperative that virtual characters appear to perceive their environments and react to events that occur in that environment, especially the actions of other


Computer Graphics and Games, Agent Based Modeling in, Figure 1 The three rules used by Reynolds’ original Boids system to simulate flocking behaviors

characters or human participants. Finally, virtual characters should be pro-active in their behaviors and not always require prompting from a human participant in order to take action.

The remainder of this article will proceed as follows. Firstly, a broad overview of the use of agent-based modeling in computer graphics will be given, focusing in particular on the genesis of the field. Following on from this, the focus will switch to the use of agent-based modeling techniques in two particular application areas: computer generated imagery (CGI) for movies, and computer games. CGI has been used to astounding effect in movies for decades, and in recent times has become heavily reliant on agent-based modeling techniques to generate CGI scenes containing large numbers of computer generated extras. Computer games developers have also been using agent-based modeling techniques effectively for some time for the control of non-player characters (NPCs) in games. There is a particularly fine match between the requirements of computer games and agent-based modeling due to the high levels of interactivity required. Finally, the article will conclude with some suggestions for the future directions in which agent-based modeling technology in computer graphics and games is expected to move.

Agent-Based Modelling in Computer Graphics

The serious use of agent-based modeling in computer graphics first arose in the creation of autonomous groups and crowds – for example, crowds of people in a town square or hotel foyer, or flocks of birds in an outdoor scene. While initially this work was driven by visually unappealing simulation applications such as fire safety testing for buildings [75], focus soon turned to the creation of visually realistic and believable crowds for applications such as movies, games and architectural walk-throughs.

Computer graphics researchers realized that creating scenes featuring large virtual crowds by hand (a task that was becoming important for the applications already mentioned) was laborious and time-consuming, and that agent-based modeling techniques could remove some of the animator's burden. Rather than requiring that animators hand-craft all of the movements of a crowd, agent-based systems could be created in which each character in a crowd (or flock, or swarm) drives its own behavior. In this way the behavior of a crowd emerges from the individual actions of the members of that crowd. Two of the earliest, and seminal, examples of such systems are Craig Reynolds' Boids system [64] and Tu & Terzopoulos' animations of virtual fish [76].

The Boids system simulates the flocking behaviors exhibited in nature by schools of fish or flocks of birds. The system was first presented at the prestigious SIGGRAPH conference (www.siggraph.org) in 1987 and was accompanied by the short movie "Stanley and Stella in: Breaking the Ice". Taking influence from the area of artificial life (or aLife) [52], Reynolds postulated that the individual members of a flock would not be capable of complex reasoning, and so flocking behavior must emerge from simple decisions made by individual flock members. This notion of emergent behavior is one of the key characteristics of aLife systems. In the original Boids system, each virtual agent (represented as a simple particle and known as a boid) used just three rules to control its movement: separation, alignment and cohesion, illustrated in Fig. 1. Based on just these three simple rules, extremely realistic flocking behaviors emerged. This freed animators from the laborious task of hand-scripting the behavior of each creature within the flock and perfectly demonstrates the advantage offered by agent-based modeling techniques for this kind of application.
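The three rules can be sketched directly. The following minimal implementation updates every boid from separation, alignment and cohesion steering; the weights, radii and dict-based representation are illustrative choices for this sketch, not Reynolds' original code:

```python
import math

def boid_step(boids, sep_r=1.0, neigh_r=3.0,
              w_sep=1.5, w_ali=1.0, w_coh=1.0, dt=0.1):
    """One update of the three Boids rules. Each boid is a dict with
    'pos' and 'vel' as (x, y) tuples; returns the updated flock."""
    new = []
    for b in boids:
        sep = [0.0, 0.0]; ali = [0.0, 0.0]; coh = [0.0, 0.0]; n = 0
        for o in boids:
            if o is b:
                continue
            dx = o['pos'][0] - b['pos'][0]
            dy = o['pos'][1] - b['pos'][1]
            d = math.hypot(dx, dy)
            if d < neigh_r:
                n += 1
                ali[0] += o['vel'][0]; ali[1] += o['vel'][1]  # alignment
                coh[0] += o['pos'][0]; coh[1] += o['pos'][1]  # cohesion
                if 0 < d < sep_r:                             # separation
                    sep[0] -= dx / d; sep[1] -= dy / d
        vx, vy = b['vel']
        if n:
            # steer toward the neighbors' average velocity and position
            vx += w_ali * (ali[0] / n - vx) + w_coh * (coh[0] / n - b['pos'][0])
            vy += w_ali * (ali[1] / n - vy) + w_coh * (coh[1] / n - b['pos'][1])
        vx += w_sep * sep[0]; vy += w_sep * sep[1]
        new.append({'pos': (b['pos'][0] + vx * dt, b['pos'][1] + vy * dt),
                    'vel': (vx, vy)})
    return new
```

In a full system the velocity would also be clamped to a maximum speed; the flocking itself arises purely from repeating this local update, with no global script.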
The system created by Tu and Terzopoulos took a more complex approach in that they created complex
models of biological fish. Their models took into account fish physiology, with a complex model of fish muscular structure, along with a perceptual model of fish vision. Using these they created sophisticated simulations in which properties such as schooling and predator avoidance were displayed. The advantage of this approach was that it was possible to create unique, unscripted, realistic simulations without the intervention of human animators. Terzopoulos has since gone on to apply similar techniques to the control of virtual humans [68]. Moving from animals to crowds of virtual humans, the Virtual Reality Lab at the Ecole Polytechnique Fédérale de Lausanne in Switzerland (vrlab.epfl.ch), led by Daniel Thalmann, has been at the forefront of this work for many years. The group currently has a highly evolved system, ViCrowd, for the animation of virtual crowds [62], which it models as a hierarchy moving from individuals to groups to crowds. This hierarchy is used to avoid some of the complications which arise from trying to model large crowds in real time – one of the key goals of ViCrowd. Each of the levels in the ViCrowd hierarchy can be modeled as an agent, and this is done based on beliefs, desires and intentions. The beliefs of an agent represent the information that the agent possesses about the world, including information about places, objects and other agents. An agent’s desires represent the motivations of the agent regarding objectives it would like to achieve. Finally, the intentions of an agent represent the actions that the agent has chosen to pursue. The belief-desire-intention (BDI) model of agency was proposed by Rao and Georgeff [61] and has been used in many other application areas of agent-based modeling. ViCrowd has been used in ambitious applications including the simulation of a virtual city comprising, amongst other things, a train station, a park and a theater [22].
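A BDI deliberation cycle of the kind just described can be sketched, in highly simplified form, as follows. The facts, goals and plan names are illustrative assumptions for exposition, not ViCrowd’s actual representation:

```python
# Toy belief-desire-intention (BDI) step: the agent adopts as its
# intention the first desire (in priority order) for which it has an
# applicable plan, i.e. a plan whose precondition holds in its beliefs.
def bdi_step(beliefs, desires, plans):
    """beliefs: set of facts; desires: goals in priority order;
    plans: goal -> (precondition fact, action name)."""
    for goal in desires:
        precond, action = plans[goal]
        if precond in beliefs:
            return action          # plan applicable: adopt this intention
    return "idle"                  # no applicable plan: default behavior

# Illustrative agent inhabiting a virtual city.
beliefs = {"at_station", "train_due"}
desires = ["catch_train", "visit_park"]
plans = {
    "catch_train": ("train_due", "board_train"),
    "visit_park": ("daytime", "walk_to_park"),
}
```

A real BDI interpreter also revises beliefs from perception each cycle and drops intentions whose plans fail, but the priority-ordered selection above is the core of the model.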
In all of these environments the system was capable of driving the believable behaviors of large groups of characters in real-time. It should be apparent to readers from the examples given thus far that the use of agent-based modeling techniques to control virtual characters gives rise to a range of unique requirements when compared to the use of agent based modeling in other application areas. The key to understanding these is to realize that the goal in designing agents for the control of virtual characters is typically not to design the most efficient or effective agent, but rather to design the most interesting or believable character. Outside of very practical applications such as evacuation simulations, when creating virtual characters, designers are concerned with maintaining what Disney, experts in this field, refer to as the illusion of life [36].

This refers to the fact that the user of a system must believe that virtual characters are living, breathing creatures with goals, beliefs, desires, and, essentially, lives of their own. Thus, it is not so important for a virtual human to always choose the most efficient or cost effective option available to it, but rather to always choose reasonable actions and respond realistically to the success or failure of these actions. With this in mind, and following a similar discussion given in [32], some of the foremost researchers in virtual character research have the following to say about the requirements of agents as virtual characters. Loyall writes [46] that “Believable agents are personality-rich autonomous agents with the powerful properties of characters from the arts.” Coming from a dramatic background it is not surprising that Loyall’s requirements reflect this. Agents should have strong personality and be capable of showing emotion and engaging in meaningful social relationships. According to Blumberg [11], “. . . an autonomous animated creature is an animated object capable of goal-directed and time-varying behavior”. The work of Blumberg and his group is very much concerned with virtual creatures, rather than humans in particular, and his requirements reflect this. Creatures must appear to make choices which improve their situation and display sophisticated and individualistic movements. Hayes–Roth and Doyle focus on the differences between “animate characters” and traditional agents [27]. With this in mind they indicate that agents’ behaviors must be “variable rather than reliable”, “idiosyncratic instead of predictable”, “appropriate rather than correct”, “effective instead of complete”, “interesting rather than efficient”, and “distinctively individual as opposed to optimal”. 
Perlin and Goldberg [59] concern themselves with building believable characters “that respond to users and to each other in real-time, with consistent personalities, properly changing moods and without mechanical repetition, while always maintaining an author’s goals and intentions”. Finally, in characterizing believable agents, Bates [7] is quite forgiving, requiring “only that they not be clearly stupid or unreal”. Such broad, shallow agents must “exhibit some signs of internal goals, reactivity, emotion, natural language ability, and knowledge of agents . . . as well as of the . . . micro-world”. Considering these definitions, Isbister and Doyle [32] identify that the consistent themes running through all of the requirements given above match the general goals of agency – virtual humans must display autonomy, reactivity, goal-driven behavior and social ability – and again support the use of agent-based modeling to drive the behavior of virtual characters.

The Spectrum of Agents

The differences between the systems mentioned in the previous discussion are captured particularly well by the spectrum of agents presented by Aylett and Luck [5]. This positions agent systems on a spectrum based on their capabilities, and serves as a useful tool in differentiating between the various systems available. One end of this spectrum focuses on physical agents, which are mainly concerned with the simulation of believable physical behavior (including sophisticated physiological models of muscle and skeleton systems) and of sensory systems. Interesting work at this end of the spectrum includes Terzopoulos’ highly realistic simulation of fish [76] and his virtual stuntman project [21], which creates virtual actors capable of realistically synthesizing a broad repertoire of lifelike motor skills. Cognitive agents inhabit the other end of the agent spectrum and are mainly concerned with issues such as reasoning, decision making, planning and learning. Systems at this end of the spectrum include Funge’s cognitive modeling approach [26], which uses the situation calculus to control the behavior of virtual characters, and Nareyek’s work on planning agents for simulation [55], both of which will be described later in this article. While the systems mentioned so far sit comfortably at either end of the agent spectrum, many of the most effective inhabit the middle ground. Amongst these are c4 [13], used to great effect to simulate a virtual sheepdog with the ability to learn new behaviors; Improv [59], which augments sophisticated physical human animation with scripted behaviors; and the ViCrowd system [62], which sits on top of a realistic virtual human animation system and uses planning to control agents’ behavior.
Virtual Fidelity

The fact that so many different agent-based modeling systems for the control of virtual humans exist gives rise to the question: why? The answer lies in the notion of virtual fidelity, as described by Badler [6]. Virtual fidelity refers to the fact that virtual reality systems need only remain true to actual reality in so much as this is required by, and improves, the system. The point is illustrated extremely effectively in [47]. The article explains that when game designers are architecting the environments in which games are set, the scale of these environments is not kept true to reality. Rather, to ease players’ movement in these worlds, areas are designed to a much larger scale, compared to character sizes, than in the real world. However, game players do not notice this digression from reality, and in fact respond negatively to environments that are designed to be more true to life, finding them cramped. This is a perfect example of how, although designers stay true to reality for many aspects of environment design, the particular blend of virtual fidelity required by an application can dictate that certain real-world restrictions be ignored in virtual worlds. With regard to virtual characters, virtual fidelity dictates that the set of capabilities which these characters should display is determined by the application which they are to inhabit. So, the requirements of an agent-based modeling system for CGI in movies are very different from those of an agent-based modeling system for controlling the behaviors of game characters.

Agent-Based Modelling in CGI for Movies

With the success of agent-based modeling techniques in graphics firmly established, there was something of a search for application areas to which they could be applied. Fortunately, the success of agent-based modeling techniques in computer graphics was paralleled by an increase in the use of CGI in the movie industry, which offered the perfect opportunity. In many cases CGI techniques were being used to replace traditional methods for creating expensive, or difficult to film, scenes. In particular, scenes involving large numbers of people or animals were deemed no longer financially viable when set in the real world. Creating these scenes using CGI involved painstaking hand animation of each character within a scene, which again was not financially viable. The solution that agent-based modeling offers is to make each character within a scene an intelligent agent that drives its own behavior. In this way, as long as the initial situation is set up correctly, scenes play out without the intervention of animators.
The fact that animating for movies does not need to be performed in real time, and is in no way interactive (there are no human users involved in the scene), makes agent-based modeling a particularly fine match for this application area. Craig Reynolds’ Boids system [64], which simulates the flocking behaviors exhibited in nature by schools of fish or flocks of birds and was discussed previously, is one of the seminal examples of agent-based modeling techniques being used in movie CGI. Reynolds’ approach was first used for CGI in the 1992 film “Batman Returns” [14] to simulate colonies of bats. Reynolds’ technologies have been used in “The Lion King” [4] and “From Dusk Till Dawn” [65]
amongst other films. Reynolds’ approach was so successful, in fact, that he was awarded an Academy Award for his work in 1998. Similar techniques to those utilized in the Boids system have been used in many other films to animate such diverse characters as ants, people and stampeding wildebeest. Two productions released in the same year, “Antz” [17] by Dreamworks and “A Bug’s Life” [44] by Pixar, took great steps in using CGI effects to animate large crowds. For “Antz”, systems were developed which allowed animators to easily create scenes containing large numbers of virtual characters, modeling each as an intelligent agent capable of obstacle avoidance, flocking and other behaviors. Similarly, the creators of “A Bug’s Life” created tools which allowed animators to easily combine pre-defined motions (known as alibis) into behaviors which could be applied to individual agents in scenes composed of hundreds of virtual characters. However, the largest jump in the use of agent-based modeling in movie CGI was made in the recent Lord of the Rings trilogy [33,34,35]. In these films the bar was raised markedly in terms of the sophistication of the virtual characters displayed and the sheer number of characters populating each scene. To achieve the special effects shots required by the makers of these films, the Massive software system was developed by Massive Software (www.massivesoftware.com). This system [2,39] uses agent-based modeling techniques, again inspired by aLife, to create virtual extras that control their own behaviors. It was put to particularly good use in the large-scale battle sequences that feature in all three of the Lord of the Rings films. Some of the sequences in the final film of the trilogy, “The Return of the King”, contain over 200,000 digital characters.
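The agent brains in a system of this kind combine fuzzy degrees of motivation with perceived knowledge of the world. The following minimal sketch illustrates the idea; the membership functions, thresholds and action names are invented for exposition and are not taken from the Massive software:

```python
# Sketch of fuzzy-logic action selection for a battle agent: crisp
# perceptions are mapped to fuzzy degrees in [0, 1], combined into
# motivation strengths, and the strongest motivation wins.
def fuzzy_brain(enemy_distance, own_health):
    """Return the chosen action given two crisp perceptions."""
    # Fuzzy membership degrees (piecewise-linear, illustrative shapes).
    threat = max(0.0, min(1.0, (20.0 - enemy_distance) / 20.0))
    frailty = max(0.0, min(1.0, (50.0 - own_health) / 50.0))
    motivations = {
        "attack":  threat * (1.0 - frailty),   # enemy near, agent healthy
        "flee":    threat * frailty,           # enemy near, agent wounded
        "advance": 1.0 - threat,               # no enemy nearby
    }
    # Defuzzify by picking the strongest motivation.
    return max(motivations, key=motivations.get)
```

Because the memberships vary continuously with distance and health, agents built from slightly different thresholds behave as distinct individuals, which is how a small number of brain archetypes can yield a visually varied crowd.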
In order to create a large battle scene using the Massive software, each virtual extra is represented as an intelligent agent, making its own decisions about which actions it will perform based on its perceptions of the world around it. Agent control is achieved through the use of fuzzy logic based controllers in which the state of an agent’s brain is represented as a series of motivations, along with knowledge it has about the world – such as the state of the terrain it finds itself on, what kinds of other agents are around it and what these other agents are doing. This knowledge about the world is perceived through simple simulated visual, auditory and tactile senses. Based on the information they perceive, agents decide on a best course of action. Designing the brains of these agents is made easier than it might seem at first by the fact that agents are developed for short sequences, and so for a small range of possible tasks. For example, separate agent models would be used for a fighting scene and a celebration scene. To create a large crowd scene using Massive, animators initially set up an environment, populating it with an appropriate cast of virtual characters whose brains are slight variations (based on physical and personality attributes) of a small number of archetypes. The scene then plays itself out with each character making its own decisions, so there is no need for any hand animation of virtual characters. However, directors can view the created scenes and, by tweaking the parameters of the brains of the virtual characters, have a scene play out in the exact way that they require. Since being used to such impressive effect in the Lord of the Rings trilogy (the developers of the Massive system were awarded an Academy Award for their work), the Massive software system has been used in numerous other films such as “I, Robot” [60], “The Chronicles of Narnia: The Lion, the Witch and the Wardrobe” [1] and “Ratatouille” [10], along with numerous television commercials and music videos. While the achievements of using agent-based modeling for movie CGI are extremely impressive, it is worth noting that none of these systems run in real time. Rather, scenes are rendered by banks of high-powered computers, a process that can take hours for relatively simple scenes. For example, the famous Prologue battle sequence in “The Lord of the Rings: The Fellowship of the Ring” took a week to render. When agent-based modeling is applied to the real-time world of computer games, things are very different.

Agent-Based Modelling in Games

Even more so than in movies, agent-based modeling techniques have been used to drive the behaviors of virtual characters in computer games.
As games have become graphically more realistic (and in recent years they have become extremely so) game-players have come to expect that games are set in hugely realistic and believable virtual worlds. This is particularly evident in the widespread use of realistic physics modeling which is now commonplace in games [67]. In games that make strong use of physics modeling, objects in the game world topple over when pushed, float realistically when dropped in water and generally respond as one would expect them to. Players expect the same to be true of the virtual characters that populate virtual game worlds. This can be best achieved by modeling virtual characters as embodied virtual agents. However, there are a number of constraints which have a major
influence on the use of agent-based modeling techniques in games. The first of these constraints stems from the fact that modern games are highly interactive. Players expect to be able to interact with all of the characters they encounter within a game world. These interactions can be as simple as having something to shoot at or someone to race against, or can involve much more sophisticated exchanges in which a player is expected to converse with a virtual character to find out specific information, or to cooperate with a virtual character in order to accomplish some task that is key to the plot of a game. Interactivity raises a massive challenge for practitioners, as there is very little restriction on what the player might do. Virtual characters should respond in a believable way at all times, regardless of how bizarre and unexpected the actions of the player might be. The second challenge comes from the fact that the vast majority of video games must run in real time. This means that computational complexity must be kept to a reasonable level, as there are only a finite number of processor cycles available for AI processing. This problem is magnified by the fact that an enormous amount of CPU power is usually dedicated to graphics processing. Compared to the techniques that can be used for controlling virtual characters in films, some of the techniques used in games are rudimentary due to this real-time constraint. Finally, modern games resemble films in that their creators go to great lengths to include intricate storylines and to control the building of tension in much the way that film script writers do. This means that games are tested heavily in order to ensure that the game proceeds smoothly and that the level of difficulty is finely tuned so as to always hold the interest of a player. In fact, this testing of games has become something of a science in itself [77].
Using autonomous agents gives game characters the ability to do things that are unexpected by the game designers and so upset their well-laid plans. This can often be a barrier to the use of sophisticated techniques such as learning. Unfortunately, there is also a barrier to the discussion of agent-based modeling techniques used in commercial games. Because of the very competitive nature of the games industry, game development houses often consider the details of how their games work to be valuable trade secrets to be kept well guarded. This can make it difficult to uncover the details of how particularly interesting features of a game are implemented. While this situation is improving – more commercial game developers are speaking at games conferences about how their games are developed, and the release of development kits for the creation of game modifications (or mods) allows researchers to plumb the depths of game code – it is still often impossible to find out the implementation details of very new games.

Game Genres

Before discussing the use of agent-based modeling in games any further, it is worth making a short clarification of the kinds of computer games that this article refers to. When discussing modern computer games, or video games, this article does not refer to computer implementations of traditional games such as chess, backgammon or card games such as solitaire. Although these games are of considerable research interest (chess in particular has been the subject of extremely successful research [23]), they are typically not approached using agent-based modeling techniques. Typically, artificial intelligence approaches to such games rely largely on sophisticated searching techniques which allow the computer player to search through a multitude of possible future situations dictated by the moves it will make and the moves it expects its opponent to make in response. Based on this search, and some clever heuristics that indicate what constitutes a good game position for the computer player, the best sequence of moves can be chosen. This searching technique relies on the fact that there are usually a relatively small number of moves that a player can make at any one time in a game. However, the fact that the ancient game of Go has not, to date, been mastered by computer players [80] illustrates the restrictions of such techniques. The common thread linking together the kinds of games that this article focuses on is that they all contain computer-controlled virtual characters that possess a strong notion of agency. Efforts are often made to separate the many different kinds of modern video games that are the focus of this article into a small set of descriptive genres.
Unfortunately, much as in music, film and literature, no categorization can hope to perfectly capture the nuances of all of the available titles. However, a brief mention of some of the more important game genres is worthwhile (a more detailed description of game genres, and of the artificial intelligence requirements of each, is given in [41]). The most popular game genre is without doubt the action game, in which the player must defeat waves of demented foes, typically (for increasingly bizarre motivations) bent upon global destruction. Illustrative examples of the genre include Half-Life 2 (www.half-life2.com) and the Halo series (www.halo3.com). A screenshot of the upcoming action game Rogue Warrior (www.bethsoft.com) is shown in Fig. 2.

Computer Graphics and Games, Agent Based Modeling in, Figure 2 A screenshot of the upcoming action game Rogue Warrior from Bethesda Softworks (image courtesy of Bethesda Softworks)

Strategy games allow players to control large armies in battle against other people or computer-controlled opponents. Players do not have direct control over their armies, but rather issue orders which are carried out by agent-based artificial soldiers. Well-regarded examples of the genre include the Age of Empires (www.ageofempires.com) and Command & Conquer (www.commandandconquer.com) series. Role-playing games (such as the Elder Scrolls (www.elderscrolls.com) series) place game players in expansive virtual worlds across which they must embark on fantastical quests which typically involve a mixture of solving puzzles, fighting opponents and interacting with non-player characters in order to gain information. Figure 3 shows a screenshot of the aforementioned role-playing game The Elder Scrolls IV: Oblivion. Almost every sport imaginable has at this stage been turned into a computer-based sports game. The challenge in developing these games is creating computer-controlled opponents and team mates that play at a level suitable to the human player. Interesting examples include FIFA Soccer 08 (www.fifa08.ea.com) and Forza Motorsport 2 (www.forzamotorsport.net). Finally, many people expected that the rise of massively multi-player online games (MMOGs), in which hundreds of human players can play together in an online world, would sound the death knell for the use of virtual non-player characters in games. Examples of MMOGs include World of Warcraft (www.worldofwarcraft.com) and Battlefield 2142 (www.battlefield.ea.com). However, this has not turned out to be the case, as there are still large

numbers of single-player games being produced, and even MMOGs need computer-controlled characters to fill roles that players do not wish to play. Of course there are many games that simply do not fit into any of these categorizations, but that are still relevant for a discussion of the use of agent-based techniques – for example The Sims (www.thesims.ea.com) and the Microsoft Flight Simulator series (www.microsoft.com/games/flightsimulatorx). However, the categorization still serves to introduce those unfamiliar with the subject to the kinds of games under discussion.

Implementing Agent-Based Modelling Techniques in Games

One of the earliest examples of using agent-based modeling techniques in video games was their application to path planning. The ability of non-player characters (NPCs) to manoeuvre around a game world is one of the most basic competencies required in games. While in very early games it was sufficient to have NPCs move along pre-scripted paths, this soon became unacceptable. Games programmers began to turn to AI techniques which might be applied to solve some of the problems that were arising. The A* path planning algorithm [74] was the first example of such a technique to find widespread use in the games industry. Using the A* algorithm, NPCs can be given the ability to find their own way around an environment. This was put to particularly fine effect early on in real-time strategy games, where the units controlled by players are semi-autonomous and are given orders rather than directly controlled.

Computer Graphics and Games, Agent Based Modeling in, Figure 3 A screenshot from Bethesda Softworks’ role-playing game The Elder Scrolls IV: Oblivion (image courtesy of Bethesda Softworks)

In order to use the A* algorithm, a game world must be divided into a series of cells, each of which is given a rating in terms of the effort that must be expended to cross it. The A* algorithm then performs a search across these cells in order to find the shortest path that will take a game agent from a start position to a goal. Since becoming widely understood amongst the game development community, many interesting additions have been made to the basic A* algorithm. It was not long before three-dimensional versions of the algorithm became commonly used [71]. The basic notion of storing the energy required to cross a cell within a game world has also been extended to augment cells with a wide range of other useful information (such as the level of danger in crossing a cell) that can be used in the search process [63]. The next advance in the kind of techniques used to achieve agent-based modeling in games was the finite state machine (FSM) [30]. An FSM is a simple system in which a finite number of states are connected in a directed graph by transitions between those states. When used for the control of NPCs, the nodes of an FSM indicate the possible actions within a game world that an agent can perform. Transitions indicate how changes in the state of the game world, or in the character’s own attributes (such as health, tiredness etc.), can move the agent from one state to another.

Computer Graphics and Games, Agent Based Modeling in, Figure 4 A simple finite state machine for a soldier NPC in an action game

Figure 4 shows a sample FSM for the control of an NPC in a typical action game. In this example the behaviors of the character are determined by just four states – CHASE, ATTACK, FLEE and EXPLORE. Each of these states provides an action that the agent should take. For example, when in the EXPLORE state the character should wander randomly around the world, while in the FLEE state the character should determine a direction to move in that will take it away from its current enemy, and move in that direction. The links between the states show how the behaviors of the character should move between the various available states. So, for example, if while in the ATTACK state the agent’s health measure becomes low, it will move to the FLEE state and run away from its enemy. FSMs are widely used because they are simple, well understood and extremely efficient, both in terms of processing cycles required and memory usage. There have also been a number of highly successful augmentations to the basic state machine model to make it more effective, such as the introduction of layers of parallel state machines [3], the use of fuzzy logic in finite state machines [19] and the implementation of cooperative group behaviors through state machines [72]. The action game Halo 2 is recognized as having a particularly good implementation of state machine based NPC control [79]. At any time an agent could be in any one of four states: Idle, Guard/Patrol, Attack/Defend and Retreat. Within each of these states, a set of rules was used to select from a small set of appropriate actions for that state – for example, a number of different ways to attack the player. The decisions made by NPCs were influenced by a number of character attributes including strength, speed and cowardliness. Transitions between states were triggered by perceptions made through characters’ simulated senses of vision and hearing, and by internal attributes such as health. The system also allowed for group behaviors, enabling NPCs to hold conversations and cooperate to drive vehicles. However, FSMs are not without their drawbacks. When designing FSMs, developers must envisage every possible situation that might confront an NPC over the course of a game.
While this is quite possible for many games, if NPCs are required to move between many different situations this task can become overwhelming. Similarly, as more and more states are added to an FSM, designing the links between these states can become a mammoth undertaking. Rule-based systems, as defined in [31], are “. . . comprised of a database of associated rules. Rules are conditional program statements with consequent actions that are performed if the specified conditions are satisfied”. Rule-based systems have been applied extensively to control NPCs in games [16], in particular for the control of NPCs in role-playing games. NPC behaviors are scripted using a set of rules which typically indicate how an NPC should respond to particular events within the game world. Borrowed from [82], the listing below shows a snippet of the rules used to control a warrior character in the RPG Baldur’s Gate (www.bioware.com).

    IF
        // If my nearest enemy is not within 3
        !Range(NearestEnemyOf(Myself),3)
        // and is within 8
        Range(NearestEnemyOf(Myself),8)
    THEN
        // 1/3 of the time
        RESPONSE #40
            // Equip my best melee weapon
            EquipMostDamagingMelee()
            // and attack my nearest enemy, checking every 60
            // ticks to make sure he is still the nearest
            AttackReevalutate(NearestEnemyOf(Myself),60)
        // 2/3 of the time
        RESPONSE #60
            // Equip a ranged weapon
            EquipRanged()
            // and attack my nearest enemy, checking every 30
            // ticks to make sure he is still the nearest
            AttackReevalutate(NearestEnemyOf(Myself),30)

The implementation of an NPC using a rule-based system would consist of a large set of such rules, a small subset of which would fire based on the conditions in the world at any given time. Rule-based systems are favored by game developers as they are relatively simple to use and can be exhaustively tested. They also have the advantage that rule sets can be written using simple proprietary scripting systems [9], rather than full programming languages, making them easy to implement. Development companies have also gone so far as to make these scripting languages available to the general public, enabling players to author their own rule sets. Rule-based systems, however, are not without their drawbacks. Authoring extensive rule sets is not a trivial task, and they are usually restricted to simple situations. Also, rule-based systems can be restrictive in that they do not allow sophisticated interplay between NPCs’ motivations, and they require that rule set authors foresee every situation that an NPC might find itself in. Some of the disadvantages of simple rule-based systems can be alleviated by using more sophisticated inference engines. One example uses Dempster–Shafer theory [43], which allows rules to be evaluated by combining multiple sources of (often incomplete) evidence to determine actions.
This goes some way towards supporting the use of rule-based systems in situations where complete knowledge is not available.

ALife techniques have also been applied extensively in the control of game NPCs, as much as a philosophy as any particular technique. The outstanding example of this is The Sims (thesims.ea.com), a surprise hit of 2000 which has gone on to become the best selling PC game of all time. Created by games guru Will Wright, The Sims puts the player in control of the lives of a virtual family in their virtual home. Inspired by ALife, the characters in the game have a set of motivations, such as hunger, fatigue and boredom, and seek out items within the game world that can satisfy these desires. Virtual characters also develop sophisticated social relationships with each other based on common interest, attraction and the amount of time spent together. The original system in The Sims has gone on to be improved in the sequel The Sims 2 and a series of expansion packs.

Some of the more interesting work in developing techniques for the control of game characters (particularly in action games) has focused on developing interesting sensing and memory models for game characters. Players expect, when playing action games, that computer controlled opponents should suffer from the same problems that players do when perceiving the world. So, for example, computer controlled characters should not be able to see through walls or from one floor to the next. At the same time, players expect computer controlled characters to be capable of perceiving events that occur in the world, and so NPCs should respond appropriately to sound events or on seeing the player. One particularly fine example of a sensing model was in the game Thief: The Dark Project, in which players are required to sneak around an environment without alerting guards to their presence [45]. The developers produced a relatively sophisticated sensing model for non-player characters which captured visual effects, such as not being able to see the player if they were in shadows, and which went some way towards modeling acoustics, so that non-player characters could respond reasonably to sound events.
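A guard’s perception check in the style described for Thief: The Dark Project might combine distance, field of view and light level for vision, with distance-attenuated loudness for hearing. The sketch below is an assumption-laden illustration of that general idea, not the game’s actual code: all function names, thresholds and fall-off formulas are invented, and a real implementation would also ray-cast against level geometry.

```python
import math

def can_see(guard_pos, guard_facing_deg, player_pos, light_level,
            fov_deg=90.0, view_dist=20.0, light_threshold=0.3):
    """Rough visual check: the player must be close enough, inside the
    guard's field of view, and sufficiently well lit to be seen."""
    dx = player_pos[0] - guard_pos[0]
    dy = player_pos[1] - guard_pos[1]
    if math.hypot(dx, dy) > view_dist:
        return False
    if light_level < light_threshold:        # hidden in shadow
        return False
    angle = math.degrees(math.atan2(dy, dx)) - guard_facing_deg
    angle = (angle + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return abs(angle) <= fov_deg / 2.0

def can_hear(guard_pos, sound_pos, loudness, threshold=0.05):
    """Rough acoustic check: loudness falls off with squared distance."""
    d2 = (guard_pos[0] - sound_pos[0]) ** 2 + (guard_pos[1] - sound_pos[1]) ** 2
    return loudness / (1.0 + d2) > threshold

# A player directly ahead of the guard is seen in bright light
# but not in deep shadow.
print(can_see((0, 0), 0.0, (5, 0), light_level=0.8))  # True
print(can_see((0, 0), 0.0, (5, 0), light_level=0.1))  # False
```

Memory models of the kind discussed above can be layered on top of this, for example by having a guard remember the last position at which the player was seen or heard.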
2004’s Fable (fable.lionhead.com) took the idea of adding memory to a game to new heights. In this adventure game the player took on the role of a hero from boyhood to manhood. However, every action the player took had an impact on the way in which the game world’s population would react to him or her, as they would remember every action the next time they met the player. This notion of long-term consequences added an extra layer of believability to the game-playing experience.

Serious Games & Academia

It will probably have become apparent to most readers of the previous section that much of the work done in implementing agent-based techniques for the control of NPCs in commercial games is relatively simplistic when compared to the application of these techniques in other areas of more academic focus, such as robotics [54]. The reasons for this have been discussed already and briefly relate to the lack of available processing resources and the requirements of commercial quality control. However, a large amount of very interesting work is taking place in the application of agent-based technologies in academic research, and in particular in the field of serious games. This section will begin by introducing the area of serious games and then go on to discuss interesting academic projects looking at agent-based technologies in games.

The term serious games [53] refers to games designed to do more than just entertain. Rather, serious games, while having many features in common with conventional games, have ulterior motives such as teaching, training, and marketing. Although games have been used for ends apart from entertainment, in particular education, for a long time, the modern serious games movement is set apart from these by the level of sophistication of the games it creates. The current generation of serious games is comparable with mainstream games in terms of the quality of production and the sophistication of design. Serious games offer particularly interesting opportunities for the use of agent-based modeling techniques: they often do not have to live up to the rigorous testing of commercial games, they can require specialized hardware rather than being restricted to commercial games hardware, and often, by the nature of their application domains, they require more in-depth interactions between players and NPCs. The modern serious games movement can be said to have begun with the release of America’s Army (www.americasarmy.com) in 2002 [57].
Inspired by the realism of commercial games such as the Rainbow 6 series (www.rainbow6.com), the United States military developed America’s Army and released it free of charge in order to give potential recruits a flavor of army life. The game was hugely successful and is still being used today as both a recruitment tool and an internal army training tool. Spurred on by the success of America’s Army, the serious games movement began to grow, particularly within academia. A number of conferences sprang up, and notably the Serious Games Summit became a part of the influential Game Developer’s Conference (www.gdconf.com) in 2004. Some other notable offerings in the serious games field include Food Force (www.food-force.com) [18], a game developed by the United Nations World Food Programme in order to promote awareness of the issues surrounding emergency food aid; Hazmat Hotzone [15], a game developed by the Entertainment Technology Centre at Carnegie Mellon University to train fire-fighters to deal with chemical and hazardous materials emergencies; Yourself!Fitness (www.yourselffitness.com) [53], an interactive virtual personal trainer developed for modern games consoles; and Serious Gordon (www.seriousgames.ie) [50], a game developed to aid in teaching food safety in kitchens. A screenshot of Serious Gordon is shown in Fig. 5.

Computer Graphics and Games, Agent Based Modeling in, Figure 5 A screenshot of Serious Gordon, a serious game developed to aid in the teaching of food safety in kitchens

Over the past decade, interest in academic research that is directly focused on artificial intelligence, and in particular agent-based modeling techniques and their application to games (as opposed to the general virtual character/computer graphics work discussed previously), has grown dramatically. One of the first major academic research projects in the area of game-AI was led by John Laird at the University of Michigan in the United States. The SOAR architecture was developed in the early 1980s in an attempt to “develop and apply a unified theory of human and artificial intelligence” [66]. SOAR is essentially a rule based inference system which takes the current state of a problem and matches this to production rules which lead to actions. After initial applications to the kind of simple puzzle worlds which characterized early AI research [42], the SOAR architecture was applied to the task of controlling computer generated forces [37]. This work led to an obvious transfer to the new research area of game-AI [40].

Initially the work of Laird’s group focused on applying the SOAR architecture to the task of controlling NPC opponents in the action game Quake (www.idsoftware.com) [40]. This proved quite successful, leading to opponents which could successfully play against human players, and even begin to plan based on anticipation of what the player was about to do. More recently Laird’s group have focused on the development of a game which requires more involved interactions between the player and the NPCs. Named Haunt 2, this game casts the player in the role of a ghost that must attempt to influence the actions of a group of computer controlled characters inhabiting the ghost’s haunted house [51]. The main issue that arises with the use of the SOAR architecture is that it is enormously resource hungry, with the NPC controllers running on a separate machine to the actual game.

At Trinity College Dublin in Ireland, the author of this article worked on an intelligent agent architecture, the Proactive Persistent Agent (PPA) architecture, for the control of background characters (or support characters) in character-centric games (games that focus on character interactions rather than action, e.g. role-playing games) [48,49]. The key contributions of this work were that it made possible the creation of NPCs capable of behaving believably in a wide range of situations, and that it allowed for the creation of game environments which appeared to have an existence beyond their interactions with players. Agent behaviors in this work were based on models of personality, emotion and relationships to other characters, together with behavioral models that changed according to the current role of an agent. This system was used to develop a stand-alone game and as part of a simulation of areas within Trinity College. A screenshot of this second application is shown in Fig. 6.

Computer Graphics and Games, Agent Based Modeling in, Figure 6 Screenshots of the PPA system simulating parts of a college

At Northwestern University in Chicago the Interactive Entertainment group has also applied approaches from more traditional research areas to the problems facing game-AI. Ian Horswill has led a team attempting to use architectures traditionally associated with robotics for the control of NPCs. In [29] Horswill and Zubek consider how well matched the behavior based architectures often used in robotics are with the requirements of NPC control architectures. The group have demonstrated some of their ideas in a test-bed environment built on top of the game Half-Life [38]. The group also looks at issues around character interaction [85] and the many psychological issues associated with creating virtual characters, asking how we can create virtual game agents that display all of the foibles that make us relate to characters in human stories [28].

Within the same research group a team led by Ken Forbus has extended research previously undertaken in conjunction with the military [24] and applied it to the problem of terrain analysis in computer strategy games [25]. Their goal is to create strategic opponents capable of performing sophisticated reasoning about the terrain in a game world and using this knowledge to identify complex features such as ambush points. This kind of high level reasoning would allow AI opponents to play a much more realistic game, and even to surprise human players from time to time, something that is sorely missing from current strategy games.

As well as this work, which has springboarded from existing applications, a number of projects began expressly to tackle problems in game-AI. Two which particularly stand out are the Excalibur Project, led by Alexander Nareyek [55], and work by John Funge [26]. Both of these projects have attempted to apply sophisticated planning techniques to the control of game characters. Nareyek uses constraint based planning to allow game agents to reason about their world. By using techniques such as local search, Nareyek has attempted to allow these sophisticated agents to perform resource intensive planning within the constraints of a typical computer game environment. Following on from this work, the term anytime agent was coined to describe the process by which agents actively refine original plans based on changing world conditions. In [56] Nareyek describes the directions in which he intends to take this work in future. Funge uses the situational calculus to allow agents to reason about their world. Similarly to Nareyek, he has addressed the problems of a dynamic, ever-changing world, plan refinement and incomplete information. Funge’s work uses an extension to the situational calculus which allows the expression of uncertainty. Since completing this work Funge has gone on to become one of the founders of AiLive (www.ailive.net), a middleware company specializing in AI for games. While the approaches of both of these projects have shown promise within the constrained environments to which they have been applied during research (and work continues on them), it remains to be seen whether such techniques can be successfully applied to a commercial game environment and all of the resource constraints that such an environment entails.

One of the most interesting recent examples of agent-based work in the field of serious games is that undertaken by Barry Silverman and his group at the University of Pennsylvania in the United States [69,70]. Silverman models the protagonists in military simulations for use in training programmes and has taken a very interesting approach in that his agent models are based on established cognitive science and behavioral science research. While Silverman admits that many of the models described in the cognitive science and behavioral science literature are not quantified well enough to be directly implemented, he has adapted a number of well respected models for his purposes.
Silverman’s work is an excellent example of the capabilities that can be explored in a serious games setting rather than a commercial game setting, and as such merits an in-depth discussion. A high-level schematic diagram of Silverman’s approach is shown in Fig. 7, which shows the agent architecture used by Silverman’s system, PMFserv.

The first important component of the PMFserv system is the biology module, which controls biological needs using a metaphor based on the flow of water through a system. Biological concepts such as hunger and fatigue are simulated using a series of reservoirs, tanks and valves which model the way in which resources are consumed by the system. This biological model is used in part to model stress, which has an important impact on the way in which agents make decisions. To model the way in which agent performance changes under pressure, Silverman uses performance moderator functions (PMFs). An example of one of the earliest PMFs used is the Yerkes–Dodson “inverted-U” curve [84], which illustrates that as mental arousal increases, performance initially improves, peaks, and then trails off again. In PMFserv a range of PMFs is used to model the way in which behavior should change depending on stress levels and biological conditions.

The second important module of PMFserv attempts to model how personality, culture and emotion affect the behavior of an agent. In keeping with the rest of the system, PMFserv uses models inspired by cognitive science to model emotions; in this case the well known OCC model [58], which has been used in agent-based applications before [8], is used. The OCC model provides for 11 pairs of opposite emotions, such as pride and shame, and hope and fear. The emotional state of an agent with regard to past, current and future actions heavily influences the decisions that the agent makes. The second portion of the Personality, Culture, Emotion module uses a value tree in order to capture the values of an agent. These values are divided into a Preference Tree, which captures long-term desired states for the world; a Standards Tree, which relates to the actions that an agent believes it can or cannot follow in order to achieve these desired states; and a Goal Tree, which captures short-term goals.

PMFserv also models the relationships between agents (Social Model, Relations, Trust in Fig. 7). The relationship of one agent to another is modeled in terms of three axes. The first is the degree to which the other agent is thought of as a human rather than an inanimate object – in Silverman’s Somalia scenario, for example, locals tend to view American soldiers as objects rather than people. The second axis is the cognitive grouping (ally, foe etc.) to which the other agent belongs, and whether this is also a group to which the first agent has an affinity. Finally, the valence, or strength, of the relationship is stored. Relationships continually change based on actions that occur within the game world. Like the other modules of the system, this model is also based on psychological research [58].

The final important module of the PMFserv architecture is the Cognitive module, which is used to decide on the particular actions that agents will undertake. This module uses inputs from all of the other modules to make these decisions, and so the behavior of PMFserv agents is driven by their stress levels, their relationships to other agents and objects within the game world, and their personality, culture and emotions. The details of the PMFserv cognitive process are beyond the scope of this article, so it will suffice to say that action selection is based on a calculation of the utility of a particular action to an agent, with this calculation modified by the factors listed above.
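The inverted-U idea behind a performance moderator function, and the way such a moderator might scale an action’s utility, is easy to sketch. The code below is a hypothetical illustration of the general technique only, not PMFserv’s actual formulation (Silverman’s models are considerably richer); the Gaussian bump and all parameter values are invented for the example.

```python
import math

def inverted_u(arousal, peak=0.5, width=0.25):
    """Yerkes-Dodson-style moderator (illustrative only): performance
    rises with arousal, peaks at `peak`, then falls off again."""
    return math.exp(-((arousal - peak) ** 2) / (2 * width ** 2))

def moderated_utility(base_utility, arousal):
    """Scale an action's base utility by the agent's current
    performance level, in the spirit of a PMF."""
    return base_utility * inverted_u(arousal)

# Performance peaks at moderate arousal and degrades at the extremes.
for a in (0.1, 0.5, 0.9):
    print(round(inverted_u(a), 3))
```

In a utility-based action selector of the kind described above, every candidate action’s utility would be moderated in this way before the highest-scoring action is chosen.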


Computer Graphics and Games, Agent Based Modeling in, Figure 7 A schematic diagram of the main components of the PMFserv system (with kind permission of Barry Silverman)

The most highly developed example using the PMFserv model is a simulation of the 1993 event in Mogadishu, Somalia, in which a United States military Black Hawk helicopter crashed, as made famous by the book and film “Black Hawk Down” [12]. In this example, which was developed as a military training aid as part of a larger project looking at agent implementations within such systems [78,81], the player took on the role of a US Army Ranger on a mission to secure the helicopter wreck in a modification (or “mod”) of the game Unreal Tournament (www.unreal.com). A screenshot of this simulation is shown in Fig. 8. The PMFserv system was used to control the behaviors of characters within the game world, such as Somali militia and Somali civilians. These characters were imbued with physical attributes, a value system and relationships with other characters and objects within the game environment. The sophistication of PMFserv was apparent in many of the behaviors of the simulation’s NPCs. One particularly good example was the fact that Somali women would offer themselves as human shields for militia fighters. This behavior was never directly programmed into the agents’ make-up, but rather emerged as a result of their values and their assessment of their situation. PMFserv remains one of the most sophisticated current agent implementations and shows what is possible when the shackles of commercial game constraints are thrown off.

Computer Graphics and Games, Agent Based Modeling in, Figure 8 A screenshot of the PMFserv system being used to simulate the Black Hawk Down scenario (with kind permission of Barry Silverman)

Future Directions

There is no doubt that, with the increase in the amount of work being focused on the use of agent-based modeling in computer graphics and games, there will be major developments in the near future. This final section will attempt to predict what some of these might be. The main development that might be expected in all of the areas discussed in this article is an increase in the depth of simulation. The primary driver of this increase in depth will be the development of more sophisticated agent models which can drive ever more sophisticated agent behavior. The PMFserv system described earlier is one example of the kinds of deeper systems currently being developed. In general computer graphics applications this will allow for the creation of more interesting simulations, including previously prohibitive features such as automatic realistic facial expressions and other physical expressions of agents’ internal states. This would be particularly useful in CGI for movies, in which, although agent-based modeling techniques are commonly used for crowd scenes and background characters, main characters are still animated almost entirely by hand.

In the area of computer games it can be expected that many of the techniques being used in movie CGI will filter over to real-time game applications as the processing power of game hardware increases – this is a pattern that has been evident for a number of years. In terms of depth that might be added to the control of game characters, one feature that has been conspicuous mainly by its absence in modern games is genuine learning by game agents. 2001’s Black & White and its sequel Black & White 2 (www.lionhead.com) featured some learning by one of the game’s main characters, which the player could teach in a reinforcement manner [20]. While this was particularly successful in the game, such techniques have not been more widely applied. One interesting academic project in this area is the NERO project (www.nerogame.org), which allows a player to train an evolving army of soldiers and have them battle the armies of other players [73]. It is expected that these kinds of capabilities will become more and more common in commercial games.

One new feature of the field of virtual character control in games is the emergence of specialized middleware. Middleware has had a massive impact in other areas of game development, including character modeling (for example Maya, available from www.autodesk.com) and physics modeling (for example Havok, available from www.havok.com). AI focused middleware for games is now becoming more common, with notable offerings including AI-Implant (www.ai-implant.com) and Kynogon (www.kynogon.com), which perform path finding and state machine based control of characters. It is expected that more sophisticated techniques will over time find their way into such software. To conclude, the great hope for the future is that more and more sophisticated agent-based modeling techniques from other application areas and other branches of AI will find their way into the control of virtual characters.
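The teaching-by-feedback described above for Black & White can be caricatured in a few lines: a creature’s preference for an action is nudged up when the player rewards it and down when the player punishes it. This is a deliberately simplified sketch of reinforcement-style learning in general, with invented names and numbers; it is not Lionhead’s actual algorithm.

```python
# Preferences start neutral; player feedback shifts them over time.
preferences = {"eat_villager": 0.0, "water_crops": 0.0}

def feedback(action, reward, learning_rate=0.5):
    """Nudge the preference for `action` toward the player's feedback
    (+1.0 for a reward/stroke, -1.0 for a punishment/slap)."""
    preferences[action] += learning_rate * (reward - preferences[action])

def choose():
    """Pick the currently most preferred action."""
    return max(preferences, key=preferences.get)

# The player repeatedly punishes one behavior and rewards the other.
for _ in range(3):
    feedback("eat_villager", -1.0)
    feedback("water_crops", +1.0)

print(choose())  # "water_crops"
```

Real systems of this kind must also generalize across situations (deciding *when* an action is appropriate), which is where most of the actual difficulty lies.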

Bibliography

Primary Literature

1. Adamson A (Director) (2005) The Chronicles of Narnia: The Lion, the Witch and the Wardrobe. Motion Picture. http://adisney.go.com/disneypictures/narnia/lb_main.html
2. Aitken M, Butler G, Lemmon D, Saindon E, Peters D, Williams G (2004) The Lord of the Rings: the visual effects that brought middle earth to the screen. International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), Course Notes
3. Alexander T (2003) Parallel-State Machines for Believable Characters. In: Massively Multiplayer Game Development. Charles River Media
4. Allers R, Minkoff R (Directors) (1994) The Lion King. Motion Picture. http://disney.go.com/disneyvideos/animatedfilms/lionking/
5. Aylett R, Luck M (2000) Applying Artificial Intelligence to Virtual Reality: Intelligent Virtual Environments. Appl Artif Intell 14(1):3–32
6. Badler N, Bindiganavale R, Bourne J, Allbeck J, Shi J, Palmer M (1999) Real Time Virtual Humans. In: Proceedings of the International Conference on Digital Media Futures
7. Bates J (1992) The Nature of Characters in Interactive Worlds and the Oz Project. Technical Report CMU-CS-92-200. School of Computer Science, Carnegie Mellon University
8. Bates J (1992) Virtual reality, art, and entertainment. Presence: J Teleoper Virtual Environ 1(1):133–138
9. Berger L (2002) Scripting: Overview and Code Generation. In: Rabin S (ed) AI Game Programming Wisdom. Charles River Media
10. Bird B, Pinkava J (Directors) (2007) Ratatouille. Motion Picture. http://disney.go.com/disneyvideos/animatedfilms/ratatouille/
11. Blumberg B (1996) Old Tricks, New Dogs: Ethology and Interactive Creatures. PhD Thesis, Media Lab, Massachusetts Institute of Technology
12. Bowden M (2000) Black Hawk Down. Corgi Adult
13. Burke R, Isla D, Downie M, Ivanov Y, Blumberg B (2002) Creature Smarts: The Art and Architecture of a Virtual Brain. In: Proceedings of Game-On 2002: the 3rd International Conference on Intelligent Games and Simulation, pp 89–93
14. Burton T (Director) (1992) Batman Returns. Motion Picture. http://www.warnervideo.com/batmanmoviesondvd/
15. Carless S (2005) Postcard From SGS 2005: Hazmat: Hotzone – First-Person First Responder Gaming. Retrieved October 2007, from Gamasutra: www.gamasutra.com/features/20051102/carless_01b.shtml
16. Christian M (2002) A Simple Inference Engine for a Rule Based Architecture. In: Rabin S (ed) AI Game Programming Wisdom. Charles River Media
17. Darnell E, Johnson T (Directors) (1998) Antz. Motion Picture. http://www.dreamworksanimation.com/
18. DeMaria R (2005) Postcard from the Serious Games Summit: How the United Nations Fights Hunger with Food Force. Retrieved October 2007, from Gamasutra: www.gamasutra.com/features/20051104/demaria_01.shtml
19. Dybsand E (2001) A Generic Fuzzy State Machine in C++. In: Rabin S (ed) Game Programming Gems 2. Charles River Media
20. Evans R (2002) Varieties of Learning. In: Rabin S (ed) AI Game Programming Wisdom. Charles River Media
21. Faloutsos P, van de Panne M, Terzopoulos D (2001) The Virtual Stuntman: Dynamic Characters with a Repertoire of Autonomous Motor Skills. Comput Graph 25(6):933–953
22. Farenc N, Musse S, Schweiss E, Kallmann M, Aune O, Boulic R et al (2000) A Paradigm for Controlling Virtual Humans in Urban Environment Simulations. Appl Artif Intell J Special Issue Intell Virtual Environ 14(1):69–91
23. Feng-Hsiung H (2002) Behind Deep Blue: Building the Computer that Defeated the World Chess Champion. Princeton University Press
24. Forbus K, Nielsen P, Faltings B (1991) Qualitative Spatial Reasoning: The CLOCK Project. Artif Intell 51:1–3
25. Forbus K, Mahoney J, Dill K (2001) How Qualitative Spatial Reasoning Can Improve Strategy Game AIs. In: Proceedings of the AAAI Spring Symposium on AI and Interactive Entertainment
26. Funge J (1999) AI for Games and Animation: A Cognitive Modeling Approach. A.K. Peters
27. Hayes-Roth B, Doyle P (1998) Animate Characters. Auton Agents Multi-Agent Syst 1(2):195–230
28. Horswill I (2007) Psychopathology, narrative, and cognitive architecture (or: why NPCs should be just as screwed up as we are). In: Proceedings of AAAI Fall Symposium on Intelligent Narrative Technologies
29. Horswill I, Zubek R (1999) Robot Architectures for Believable Game Agents. In: Proceedings of the 1999 AAAI Spring Symposium on Artificial Intelligence and Computer Games
30. Houlette R, Fu D (2003) The Ultimate Guide to FSMs in Games. In: Rabin S (ed) AI Game Programming Wisdom 2. Charles River Media
31. IGDA (2003) Working Group on Rule-Based Systems Report. International Games Development Association
32. Isbister K, Doyle P (2002) Design and Evaluation of Embodied Conversational Agents: A Proposed Taxonomy. In: Proceedings of the AAMAS02 Workshop on Embodied Conversational Agents: Let’s Specify and Compare Them! Bologna, Italy
33. Jackson P (Director) (2001) The Lord of the Rings: The Fellowship of the Ring. Motion Picture. http://www.lordoftherings.net/
34. Jackson P (Director) (2002) The Lord of the Rings: The Two Towers. Motion Picture. http://www.lordoftherings.net/
35. Jackson P (Director) (2003) The Lord of the Rings: The Return of the King. Motion Picture. http://www.lordoftherings.net/
36. Johnston O, Thomas F (1995) The Illusion of Life: Disney Animation. Disney Editions
37. Jones R, Laird J, Neilsen P, Coulter K, Kenny P, Koss F (1999) Automated Intelligent Pilots for Combat Flight Simulation. AI Mag 20(1):27–42
38. Khoo A, Zubek R (2002) Applying Inexpensive AI Techniques to Computer Games. IEEE Intell Syst Spec Issue Interact Entertain 17(4):48–53
39. Koeppel D (2002) Massive Attack. http://www.popsci.com/popsci/science/d726359b9fa84010vgnvcm1000004eecbccdrcrd.html. Accessed Oct 2007
40. Laird J (2000) An Exploration into Computer Games and Computer Generated Forces. The 8th Conference on Computer Generated Forces and Behavior Representation
41. Laird J, van Lent M (2000) Human-Level AI’s Killer Application: Interactive Computer Games. In: Proceedings of the 17th National Conference on Artificial Intelligence
42. Laird J, Rosenbloom P, Newell A (1984) Towards Chunking as a General Learning Mechanism. The 1984 National Conference on Artificial Intelligence (AAAI), pp 188–192
43. Laramée F (2002) A Rule Based Architecture Using Dempster–Shafer Theory. In: Rabin S (ed) AI Game Programming Wisdom. Charles River Media
44. Lasseter J, Stanton A (Directors) (1998) A Bug’s Life. Motion Picture. http://www.pixar.com/featurefilms/abl/
45. Leonard T (2003) Building an AI Sensory System: Examining the Design of Thief: The Dark Project. In: Proceedings of the 2003 Game Developers’ Conference, San Jose
46. Loyall B (1997) Believable Agents: Building Interactive Personalities. PhD Thesis, Carnegie Mellon University
47. Määta A (2002) Realistic Level Design for Max Payne. In: Proceedings of the 2002 Game Developer’s Conference, GDC 2002
48. Mac Namee B, Cunningham P (2003) Creating Socially Interactive Non Player Characters: The µ-SIC System. Int J Intell Games Simul 2(1)
49. Mac Namee B, Dobbyn S, Cunningham P, O’Sullivan C (2003) Simulating Virtual Humans Across Diverse Situations. In: Proceedings of Intelligent Virtual Agents ’03, pp 159–163
50. Mac Namee B, Rooney P, Lindstrom P, Ritchie A, Boylan F, Burke G (2006) Serious Gordon: Using Serious Games to Teach Food Safety in the Kitchen. The 9th International Conference on Computer Games: AI, Animation, Mobile, Educational & Serious Games CGAMES06, Dublin
51. Magerko B, Laird JE, Assanie M, Kerfoot A, Stokes D (2004) AI Characters and Directors for Interactive Computer Games. The 2004 Innovative Applications of Artificial Intelligence Conference. AAAI Press, San Jose
52. Thalmann MN, Thalmann D (1994) Artificial Life and Virtual Reality. Wiley
53. Michael D, Chen S (2005) Serious Games: Games That Educate, Train, and Inform. Course Technology PTR
54. Muller J (1996) The Design of Intelligent Agents: A Layered Approach. Springer
55. Nareyek A (2001) Constraint Based Agents. Springer
56. Nareyek A (2007) Game AI is Dead. Long Live Game AI! IEEE Intell Syst 22(1):9–11
57. Nieborg D (2004) America’s Army: More Than a Game. Bridging the Gap: Transforming Knowledge into Action through Gaming and Simulation. Proceedings of the 35th Conference of the International Simulation and Gaming Association (ISAGA), Munich
58. Ortony A, Clore GL, Collins A (1988) The Cognitive Structure of Emotions. Cambridge University Press, Cambridge
59. Perlin K, Goldberg A (1996) Improv: A System for Scripting Interactive Actors in Virtual Worlds. In: Proceedings of the ACM Computer Graphics Annual Conference, pp 205–216
60. Proyas A (Director) (2004) I, Robot. Motion Picture. http://www.irobotmovie.com
61. Rao AS, Georgeff MP (1991) Modeling rational agents within a BDI-architecture. In: Proceedings of Knowledge Representation and Reasoning (KR&R-91). Morgan Kaufmann, pp 473–484
62. Musse RS, Thalmann D (2001) A Behavioral Model for Real Time Simulation of Virtual Human Crowds. IEEE Trans Vis Comput Graph 7(2):152–164
63. Reed C, Geisler B (2003) Jumping, Climbing, and Tactical Reasoning: How to Get More Out of a Navigation System. In: Rabin S (ed) AI Game Programming Wisdom 2. Charles River Media
64. Reynolds C (1987) Flocks, Herds and Schools: A Distributed Behavioral Model. Comput Graph 21(4):25–34
65. Rodriguez R (Director) (1996) From Dusk ’Till Dawn. Motion Picture
66. Rosenbloom P, Laird J, Newell A (1993) The SOAR Papers: Readings on Integrated Intelligence. MIT Press
67. Sánchez-Crespo D (2006) GDC: Physical Gameplay in Half-Life 2. Retrieved October 2007, from gamasutra.com: http://www.gamasutra.com/features/20060329/sanchez_01.shtml
68. Shao W, Terzopoulos D (2005) Autonomous Pedestrians. In: Proceedings of SIGGRAPH/EG Symposium on Computer Animation, SCA’05, pp 19–28
69. Silverman BG, Bharathy G, O’Brien K, Cornwell J (2006) Human Behavior Models for Agents in Simulators and Games: Part II: Gamebot Engineering with PMFserv. Presence Teleoper Virtual Environ 15(2):163–185
70. Silverman BG, Johns M, Cornwell J, O’Brien K (2006) Human Behavior Models for Agents in Simulators and Games: Part I: Enabling Science with PMFserv. Presence Teleoper Virtual Environ 15(2):139–162
71. Smith P (2002) Polygon Soup for the Programmer’s Soul: 3D Path Finding. In: Proceedings of the Game Developer’s Conference 2002, GDC2002
72. Snavely P (2002) Agent Cooperation in FSMs for Baseball. In: Rabin S (ed) AI Game Programming Wisdom. Charles River Media
73. Stanley KO, Bryant BD, Karpov I, Miikkulainen R (2006) Real-Time Evolution of Neural Networks in the NERO Video Game. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence, AAAI-2006. AAAI Press, pp 1671–1674
74. Stout B (1996) Smart Moves: Intelligent Path-Finding. Game Dev Mag, Oct
75. Takahashi TS (1992) Behavior Simulation by Network Model. Memoirs of Kougakuin University 73, pp 213–220
76. Terzopoulos D, Tu X, Grzeszczuk R (1994) Artificial Fishes with Autonomous Locomotion, Perception, Behavior and Learning, in a Physical World. In: Proceedings of the Artificial Life IV Workshop. MIT Press
77. Thompson C (2007) Halo 3: How Microsoft Labs Invented a New Science of Play. Retrieved October 2007, from wired.com: http://www.wired.com/gaming/virtualworlds/magazine/15-09/ff_halo
78. Toth J, Graham N, van Lent M (2003) Leveraging gaming in DOD modelling and simulation: Integrating performance and behavior moderator functions into a general cognitive architecture of playing and non-playing characters. Twelfth Conference on Behavior Representation in Modeling and Simulation (BRIMS, formerly CGF), Scottsdale, Arizona
79. Valdes R (2004) In the Mind of the Enemy: The Artificial Intelligence of Halo 2. Retrieved October 2007, from HowStuffWorks.com: http://entertainment.howstuffworks.com/halo2-ai.htm
80. van der Werf E, Uiterwijk J, van den Herik J (2002) Programming a Computer to Play and Solve Ponnuki-Go. In: Proceedings of Game-On 2002: The 3rd International Conference on Intelligent Games and Simulation, pp 173–177
81. van Lent M, McAlinden R, Brobst P (2004) Enhancing the behavioral fidelity of synthetic entities with human behavior models. Thirteenth Conference on Behavior Representation in Modeling and Simulation (BRIMS)
82. Woodcock S (2000) AI Roundtable Moderator’s Report. In: Proceedings of the Game Developer’s Conference 2000 (GDC2000)
83. Wooldridge M, Jennings N (1995) Intelligent Agents: Theory and Practice. Knowl Eng Rev 10(2):115–152
84. Yerkes RW, Dodson JD (1908) The relation of strength of stimulus to rapidity of habit formation. J Comp Neurol Psychol 18:459–482
85. Zubek R, Horswill I (2005) Hierarchical Parallel Markov Models of Interaction. In: Proceedings of the Artificial Intelligence and Interactive Digital Entertainment Conference, AIIDE 2005



Computing in Geometrical Constrained Excitable Chemical Systems
JERZY GORECKI1,2, JOANNA NATALIA GORECKA3
1 Institute of Physical Chemistry, Polish Academy of Science, Warsaw, Poland
2 Faculty of Mathematics and Natural Sciences, Cardinal Stefan Wyszynski University, Warsaw, Poland
3 Institute of Physics, Polish Academy of Science, Warsaw, Poland

Article Outline
Glossary
Definition of the Subject
Introduction
Logic Gates, Coincidence Detectors and Signal Filters
Chemical Sensors Built with Structured Excitable Media
The Ring Memory and Its Applications
Artificial Chemical Neurons with Excitable Medium
Perspectives and Conclusions
Acknowledgments
Bibliography

Glossary
Some of the terms used in our article are the same as in the article of Adamatzky in this volume, and they are explained in the glossary of Reaction-Diffusion Computing.
Activator A substance that increases the rate of a reaction.
Excitability Here we call a dynamical system excitable if it has a single stable state (the rest state) with the following properties: if the rest state is slightly perturbed, then the perturbation uniformly decreases as the system evolves back towards it; however, if the perturbation is sufficiently strong, it may grow by orders of magnitude before the system approaches the rest state. The increase in the variables characterizing the system (usually rapid compared with the time necessary to reach the rest state) is called an excitation. A forest is a classical example of an excitable medium, and a wildfire that burns through it is an excitation. A dynamical system is non-excitable if applied perturbations do not grow and finally decay.
Excitability level A measure of how strong a perturbation has to be to excite the system. For example, for the Ru-catalyzed Belousov–Zhabotinsky reaction, increasing illumination makes the medium less excitable. A decrease in the excitability level can be observed in reduced amplitudes of spikes and in decreased velocities of autowaves.

Firing number The ratio of the number of generated excitations to the number of applied external stimulations. In most cases we define the firing number as the ratio of the number of output spikes to the number of arriving pulses.
Inhibitor A substance that decreases the rate of a reaction or even prevents it.
Medium In this article we consider a chemical medium, fully characterized by the local concentrations of reagents and by external conditions such as temperature or illumination level. The time evolution of the concentrations is governed by a set of reaction-diffusion equations, where the reaction term is an algebraic function of the variables characterizing the system and the non-local coupling is described by the diffusion operator. We are mainly concerned with a two-dimensional medium (i.e., a membrane with a solution of reagents, as used in experiments), but the presented ideas can also be applied to one-dimensional or three-dimensional media.
Refractory period A period of time during which an excitable system is incapable of repeating its response to an applied, strong perturbation. After the refractory period the excitable medium is again ready to produce an excitation in answer to a stimulus.
Spike, autowave In a spatially distributed excitable medium, a local excitation can spread as a pulse of excitation. Usually a propagating pulse of excitation converges to a stationary shape characteristic of the medium, which does not depend on the initialization and propagates with a constant velocity; it is therefore called an autowave.
Subexcitability A system is called subexcitable if the amplitude and size of an initiated pulse of excitation decrease in time. However, if the decay time is comparable with the characteristic time of the system, defined as the ratio of system size to pulse velocity, then pulses in a subexcitable system travel for a sufficiently long distance to carry information.
Subexcitable media can be used to control the amplitude of excitations. Subexcitability is usually related to the system dynamics, but it may also appear as the result of geometrical constraints. For example, a narrow excitable channel surrounded by a non-excitable medium may behave as a subexcitable system, because a propagating pulse dies out due to the diffusion of the activator into the neighborhood.

Definition of the Subject
It has been shown in the article of Adamatzky, Reaction-Diffusion Computing, that an excitable system can


be used as an information processing medium. In such a medium, information is coded in pulses of excitation; the presence of a single excitation or of a group of excitations forms a message. The information processing discussed by Adamatzky is based on a homogeneous excitable medium and the interactions between pulses in such a medium. Here we focus our attention on a quite specific type of excitable medium that has an intentionally introduced structure of regions characterized by different excitability levels. As the simplest case we consider a medium composed of excitable regions, where autowaves can propagate, and non-excitable ones, where excitations rapidly die. Using these two types of medium one can, for example, construct signal channels: stripes of excitable medium where pulses can propagate, surrounded by non-excitable areas thick enough to cancel potential interactions with pulses propagating in neighboring channels. Therefore, the propagation of a pulse along a selected line in an excitable system can be realized in two ways. In a homogeneous excitable medium, it can be done by continuous control of pulse propagation and local feedback with activating and inhibiting factors (Reaction-Diffusion Computing). In a structured excitable medium, the same result can be achieved by creating a proper pattern of excitable and non-excitable regions. The first method gives more flexibility; the second is just simpler and does not require permanent supervision. As we show in this article, a number of devices that perform simple information processing operations, including the basic logic functions, can easily be constructed with a structured excitable medium. Combining these devices as building blocks, we can perform complex signal processing operations. Such an approach seems similar to the development of electronic computing, where early computers were built of simple integrated circuits.
The research on information processing with structured excitable media has been motivated by a few important problems. First, we would like to investigate how the properties of a medium can be efficiently used to construct devices performing given functions, and what tasks are the most suitable for chemical computing. There is also the question of generic designs valid for any excitable medium versus specific ones that use unique features of a particular system (for example, a one-dimensional generator of excitation pulses that can be built with an excitable surface reaction [23]). In information processing with a structured excitable medium, the geometry of the medium is as important as its dynamics, and it seems interesting to know what types of structures are related to specific functions. In the article we present a number of such structures characteristic of particular information processing operations.

Another important motivation for this research comes from biology. Even the simplest biological organisms can process information and make decisions important for their lives without CPUs, clocks, or sequences of commands, as found in the standard von Neumann computer architecture [15]. In biological organisms, even at a very basic cellular level, excitable chemical reactions are responsible for information processing. The cell body, considered as an information processing medium, is highly structured. We believe that analogs of the geometrical structures used for certain types of information processing operations in structured excitable media will be recognized in biological systems, so that we will better understand their role in living organisms. At a higher level, the analogies between information processing with chemical media and signal processing in the brain seem even closer, because the excitable dynamics of calcium in neural tissue is responsible for signal propagation in the nervous system [30]. Excitable chemical channels that transmit signals between processing elements look similar to dendrites and axons. As we show in the article, the biological neuron has its chemical analog, and this allows for the construction of artificial neural networks using chemical processes. Having in mind that neural networks are less vulnerable to random errors than classical algorithms, one can go back from biology to man-made computing and adapt the concepts to a fast excitable medium, for example especially prepared semiconductors (Unconventional Computing, Novel Hardware for). The article is organized in the following way. In the next section we discuss the basic properties of a structured chemical medium that seem useful for information processing. Next we consider binary information coded in propagating pulses of concentration and demonstrate how logic gates can be built.
In the following section we show that a structured excitable medium can acquire information about the distances and directions of incoming stimuli. Next we present a simple realization of a read-write memory cell, discuss its applications in chemical counting devices, and show its importance for programming with pulses of excitation. In the following section we present a chemical realization of artificial neurons that perform multi-argument operations on sets of input pulses. Finally we discuss the perspectives of the field, in particular more efficient methods of information coding and some ideas of self-organization that can produce structured media capable of information processing.

Introduction
Excitability is a widespread behavior of far-from-equilibrium systems [35,39] observed, for example, in chem-


ical reactions (the Belousov–Zhabotinsky (BZ) reaction [45], CO oxidation on Pt [37], combustion of gases [26]) as well as in many other physical (laser action) and biochemical (signaling in neural systems, contraction of cardiovascular tissues) processes [51]. All types of excitable systems share a common property: they have a stable stationary state (the rest state) in which they reside when they are not perturbed. A small perturbation of the rest state results only in a small-amplitude linear response of the system that uniformly decays in time. However, if a perturbation is sufficiently large, then the system can evolve far away from the rest state before finally returning to it. This response is strongly nonlinear and is accompanied by a large excursion of the variables through phase space, which corresponds to an excitation peak (a spike). The system is refractory after a spike, which means that it takes a certain recovery time before another excitation can take place. Excitability is closely related to relaxation oscillations; the two phenomena differ by one bifurcation only [56]. The properties of excitable systems have an important impact on their ability to process information. If an excitable medium is spatially distributed, then an excitation at one point of the medium (usually seen as an area with a high concentration of a certain reagent) may introduce a sufficiently large perturbation to excite the neighboring points as the result of diffusion or energy transport. Therefore, an excitation can propagate in space in the form of a pulse. Unlike mechanical waves, which dissipate the initial energy and finally decay, traveling spikes use the energy of the medium to propagate and dissipate it. In a typical excitable medium, after a sufficiently long time, an excitation pulse converges to a stationary shape, independent of the initial condition, which justifies calling it an autowave.
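The threshold behavior described above can be illustrated numerically. The sketch below, a minimal illustration rather than a model of any particular chemistry, integrates the two-variable FitzHugh–Nagumo equations (a standard caricature of an excitable system, mentioned later in this section) and compares the responses to a small and a large perturbation of the rest state; the parameter values a, b, eps and the perturbation sizes are conventional textbook choices, not values taken from this article.

```python
# Excitable vs. sub-threshold response of the FitzHugh-Nagumo model:
#   du/dt = u - u^3/3 - v,   dv/dt = eps * (u + a - b*v)
# Parameters a, b, eps are conventional choices for the excitable regime.

def fhn_response(kick, a=0.7, b=0.8, eps=0.08, dt=0.01, steps=20000):
    """Perturb the rest state by `kick` and return the maximal activator value."""
    u, v = -1.1994, -0.6243      # rest state (root of the nullcline equations)
    u += kick                    # instantaneous perturbation of the activator
    u_max = u
    for _ in range(steps):
        u += dt * (u - u**3 / 3.0 - v)
        v += dt * eps * (u + a - b * v)
        u_max = max(u_max, u)
    return u_max

small = fhn_response(0.1)   # sub-threshold: the perturbation simply decays
large = fhn_response(1.0)   # supra-threshold: a full excitation spike
print(small, large)
```

The small perturbation decays monotonically back to the rest state, while the large one triggers a spike whose amplitude is orders of magnitude larger than the stimulus, exactly the nonlinear response described above.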
Undamped propagation of signals is especially important if the distances between the emitting and receiving devices are large. The medium's energy comes from the nonequilibrium conditions at which the system is kept. In the case of a batch reactor, the energy of the initial composition of reagents allows for the propagation of pulses even for days [41], but a typical time of an experiment in such conditions is less than an hour. In a continuously fed reactor, pulses can run as long as the reactants are delivered [44]. If the refractory period of the medium is sufficiently long, then the region behind a pulse cannot be excited again for a long time. As a consequence, colliding pulses annihilate. This type of behavior is quite common in excitable systems. Another important feature is the dispersion relation for a train of excitations. Typically, the first pulse is the fastest and the subsequent ones are slower, which is related to the fact that the medium behind the first

pulse has not relaxed completely [63]. For example, this phenomenon is responsible for the stabilization of positions in a train of pulses rotating on a ring-shaped excitable area. However, in some systems [43] an anomalous dispersion relation is observed, and there are selected stable distances between subsequent spikes. Excitable systems characterized by anomalous dispersion can play an important role in information processing, because packages of pulses are stable and thus the information coded in such packages can propagate without dispersion. The mathematical description of excitable chemical media is based on differential equations of reaction-diffusion type, sometimes supplemented by additional equations that describe the evolution of other important properties of the medium, for example the orientation of the surface in the case of CO oxidation on a Pt surface [12,37]. Numerical simulations of pulse propagation in an excitable medium presented in many papers [48,49] use the FitzHugh–Nagumo model, which describes the time evolution of electrical potentials in nerve channels [17,18,54]. The models for systems with the Belousov–Zhabotinsky (BZ) reaction, for example the Rovinsky–Zhabotinsky model [61,62] for the ferroin-catalyzed BZ reaction with an immobilized catalyst, can be derived from "realistic" reaction schemes [16] via different techniques of variable reduction. Experiments with the Ru-catalyzed, photosensitive BZ reaction have become standard in studies of structured excitable media, because the level of excitation can be easily controlled by illumination; see for example [7,27,33]. Light catalyzes the production of bromide, which inhibits the reaction, so non-illuminated regions are excitable and strongly illuminated ones are not. The pattern of excitable (dark) and non-excitable (transparent) fields is simply projected on a membrane filled with the reagents. For example, the labyrinth shown in Fig.
1 has been obtained by illuminating a membrane through a proper mask. The presence of a membrane is important because it stops convection in the solution and reduces the speed and size of spikes, so the studied systems can be smaller. Other methods of forming excitable channels, based on immobilizing a catalyst by imprinting it on a membrane [68] or attaching it by lithography [21,71,72], have also been used, but they seem to be more difficult.

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 1 Pulses of excitation propagating in a labyrinth observed in an experiment with a Ru-catalyzed BZ reaction. The excitable areas are dark, the non-excitable ones light. The source of a train of pulses (the tip of a silver wire) is placed at the point A

Numerical simulations of the Ru-catalyzed BZ reaction can be performed with different variants of the Oregonator model [9,19,20,38]. For example, the three-variable model uses the following equations:

\varepsilon_1 \frac{\partial u}{\partial t} = u(1 - u) - w(u - q) + D_u \nabla^2 u \qquad (1)

\frac{\partial v}{\partial t} = u - v \qquad (2)

\varepsilon_2 \frac{\partial w}{\partial t} = \phi + f v - w(u + q) + D_w \nabla^2 w \qquad (3)

where u, v, and w denote dimensionless concentrations of the following reagents: HBrO2, Ru(4,4'-dm-bpy)3^{3+}, and Br^-, respectively. In the considered system of equations, u is the activator and v is the inhibitor. The set of Oregonator equations given above reduces to the two-variable model cited in the article of Adamatzky (Reaction-Diffusion Computing) if the processes responsible for bromide production are very fast compared to the other reactions (\varepsilon_2 \ll \varepsilon_1 \ll 1). In such cases, the local value of w can be calculated assuming that it corresponds to the stationary solution of the third equation, w = (\phi + f v)/(u + q). If such a w is substituted into the first equation, one obtains the two-variable Oregonator model. In the equations given above, the units of space and time are dimensionless and have been chosen to scale the reaction rates. Here we have also neglected the diffusion of the ruthenium catalytic complex, because it is usually much slower than that of the other reagents. The reaction-diffusion equations describing the time evolution of the system can be solved with standard numerical techniques [57]. The parameter \phi represents the rate of bromide production caused by illumination and is proportional to

the applied light intensity. Therefore, by adjusting the local illumination (or choosing the proper \phi as a function of the space variables in simulations) we create regions with the required level of excitability, for example excitable stripes insulated by a non-excitable neighborhood. Of course, the reagents can freely diffuse between the regions characterized by different illuminations. The structure of the equations describing the system's evolution in time and space gives the name "reaction-diffusion computing" [4] to information processing with an excitable medium. Simulations play an important role in tests of potential information processing devices because, unlike in experiments, the conditions and parameters of the studied systems can be easily adjusted with the required precision and held fixed indefinitely. Most of the information processing devices discussed below were first tested in simulations and then verified experimentally. One of the problems is related to the short time of a typical experiment in batch conditions (a membrane filled with reagents), which does not exceed one hour. In such systems the period in which the conditions can be regarded as stable is usually shorter than 30 minutes, and within this time an experimentalist should prepare the medium, introduce the required illumination, and perform observations. The problem can be solved by using a continuously fed reactor [44], but the experimental setups are more complex. On the other hand, simulations often indicate that the range of parameters in which a given phenomenon appears is very narrow. Fortunately, experiments seem to be more robust than simulations, and the expected effects can be observed despite the inevitable randomness in reagent preparation.
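As a numerical illustration, the sketch below integrates the two-variable reduction of Eqs. (1)–(3), with w set to its stationary value (\phi + f v)/(u + q) and diffusion omitted, i.e., a single well-stirred point of the medium, using simple explicit Euler stepping. The parameter values are those quoted later in the caption of Fig. 3 (f = 1.12, q = 0.002, \varepsilon_1 = 0.08, \phi = 0.007); the perturbation size and the integration times are our own choices for the sketch.

```python
# Point model (no diffusion) of the photosensitive Oregonator, Eqs. (1)-(3),
# in the two-variable reduction w = (phi + f*v) / (u + q):
#   eps1 * du/dt = u*(1 - u) - (phi + f*v)/(u + q) * (u - q)
#        dv/dt  = u - v
# Parameter values follow the Fig. 3 caption; the perturbation is arbitrary.

F, Q, EPS1, PHI = 1.12, 0.002, 0.08, 0.007

def step(u, v, dt):
    w = (PHI + F * v) / (u + Q)          # stationary bromide concentration
    du = (u * (1.0 - u) - w * (u - Q)) / EPS1
    dv = u - v
    return u + dt * du, v + dt * dv

dt = 1e-4
u, v = 0.004, 0.004                      # rough guess near the rest state
for _ in range(50000):                   # relax to the rest state (t = 5)
    u, v = step(u, v, dt)
u_rest = u

u += 0.3                                 # supra-threshold perturbation of HBrO2
u_max = u
for _ in range(200000):                  # follow the spike and recovery (t = 20)
    u, v = step(u, v, dt)
    u_max = max(u_max, u)

print(u_rest, u_max, u)                  # rest, spike amplitude, return to rest
```

With these parameters the rest state has a very low activator concentration; the perturbation sends u close to 1 before the system returns to rest, reproducing the single-spike excitable response described in the Introduction.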
Having in mind the relatively short time of experiments in batch conditions and the low stability of the medium, we believe that applications of a liquid chemical excitable medium such as the Ru-catalyzed BZ reaction as an information processor (wetware) are rather academic and mainly oriented toward the verification of ideas. Practical applications of reaction-diffusion computers will probably be based on another type of medium, such as structured semiconductors (Unconventional Computing, Novel Hardware for). In an excitable reaction-diffusion medium, spikes propagate along the minimum-time path. Historically, one of the first applications of a structured chemical medium in information processing was the solution of the problem of finding the shortest path in a labyrinth [67]. The idea is illustrated in Fig. 1. The labyrinth is built of excitable channels (dark) separated by a non-excitable medium (light) that does not allow for interactions between pulses propagating in different channels. Let us assume that we are interested in the distance between the


right bottom corner (point A) and the left upper corner (point B) of the labyrinth shown. The algorithm that scans all the possible paths between these points and selects the shortest one is automatically executed if the paths in the labyrinth are built of an excitable chemical medium. To see how it works, let us excite the medium at the point A. The excitation spreads out through the labyrinth, separates at the junctions, and spikes enter all possible paths. During the time evolution, pulses of excitation can collide and annihilate, but the one that propagates along the shortest path always has unexcited medium in front of it. Knowing the time difference between the moment when the pulse is initiated at the point A and the moment when it arrives at the point B, and assuming that the speed of a pulse is constant, we can estimate the length of the shortest path linking both points. The algorithm described above is called the "prairie fire" algorithm, and it is automatically executed by an excitable medium. It finds the shortest path in a highly parallel manner, scanning all possible routes at the same time. It is quite remarkable that the time required for finding the shortest path does not depend on the complexity of the labyrinth structure, but only on the distance between the considered points. Although the estimation of the minimum distance separating two points in a labyrinth is relatively easy (within the assumption that corners do not significantly change the pulse speed), it is more difficult to tell where the shortest path lies. To do so, one can trace the pulse that arrived first and plot a line tangential to its velocity. This idea was discussed in [6] in the context of finding the length of the shortest path in a nonhomogeneous chemical medium with obstacles. The method, although relatively easy, requires the presence of an external observer who follows the propagation of pulses.
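In discrete terms, the prairie-fire algorithm is a breadth-first expansion of the excitation wavefront: every lattice site is "excited" exactly once, at a time equal to its shortest distance from the source. A minimal sketch on a hypothetical grid labyrinth (the maze layout, and the grid abstraction itself, are our own illustration, not a system from this article):

```python
# "Prairie fire" on a grid labyrinth: a breadth-first wavefront from A reaches
# every cell at a time equal to its shortest-path distance, so the arrival
# time at B is the length of the shortest A-B path.
from collections import deque

def arrival_time(maze, start, goal):
    """Return the number of steps the wavefront needs to reach `goal`."""
    rows, cols = len(maze), len(maze[0])
    time = {start: 0}
    front = deque([start])
    while front:
        r, c = front.popleft()
        if (r, c) == goal:
            return time[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # the wavefront enters excitable cells not yet excited
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != '#' and (nr, nc) not in time):
                time[(nr, nc)] = time[(r, c)] + 1
                front.append((nr, nc))
    return None  # goal not reachable

maze = ["A....",     # 'A' at (0, 0), 'B' at (4, 4), '#' = non-excitable
        "####.",
        ".....",
        ".####",
        "....B"]
print(arrival_time(maze, (0, 0), (4, 4)))  # -> 16
```

The single queue visits all branches of the labyrinth in parallel, which mirrors the observation above that the search time depends on the A–B distance, not on the complexity of the labyrinth.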
An alternative technique of extracting the shortest path, based on image processing, was demonstrated in [59]. One can also locate the shortest path connecting two points in a labyrinth in a purely chemical way, using the coincidence of excitations generated at the endpoints of the path. Such a method, described in [29], allows one to find the midpoint of a trajectory, and thus to locate the shortest path point by point. The approximately constant speed of excitation pulses in a reaction-diffusion medium allows one to solve some geometrically oriented problems. For example, one can "measure" the number \pi by comparing the time of pulse propagation around a non-excitable circle of radius d with the time of propagation around a square with the same side [69]. Similarly, the constant speed of propagating pulses can be used to obtain a given fraction of an angle. For example, a trisection of an angle can be done if the angle arms are linked with two arc-shaped excitable chan-

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 2 The idea of angle trisection. The angle arms are linked with red arc-shaped excitable channels with the radius ratio r : R = 1 : 3. The channels have been excited at the same time on the right arm of the angle. At the moment when the excitation pulse (a yellow dot) on the smaller arc reaches the second arm, the excitation on the other arc marks a point on a line trisecting the angle (the dashed line)

nels, as shown in Fig. 2. The ratio of the channel radii should be equal to 3. Both channels are excited at the same time at points on the same arm. The position of the excitation on the larger arc, at the moment when the excitation propagating on the shorter arc reaches the other arm, belongs to a line that trisects the angle. The examples given above are based on two properties of a structured excitable medium: the fact that the non-excitability of the neighborhood can restrict the motion of an excitation pulse to a channel, and that the speed of propagation depends on the excitability level of the medium but not on the channel shape. This is true when the channels are wide and the curvature is small. There are also other properties of an excitable medium useful for information processing. A typical shape of a pulse propagating in a stripe of excitable medium is illustrated in Fig. 3. The pulse moves from the left to the right, as the arrow indicates. The profiles in Fig. 3b,c show cross sections of the concentrations of the activator u(x, y) and the inhibitor v(x, y) along the horizontal axis of the stripe at a selected moment of time. The peak of the inhibitor follows the activator maximum and is responsible for the refractory character of the region behind the pulse. Figure 3d illustrates the profile of the activator along a line perpendicular to the stripe axis. The concentration of the activator reaches its maximum in the stripe center and rapidly decreases at the boundary between the excitable and non-excitable areas. Therefore, the width of the excitable channel can be used as a parameter that controls the maximum concentration of the activator in a propagating pulse.
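The trisection construction relies only on the constant pulse speed: both pulses cover the same arc length in the same time, so when the pulse on the inner arc (radius r) has swept the full angle, the pulse on the outer arc (radius R = 3r) has swept one third of it. A few lines of arithmetic (our own illustration of the geometry, with an arbitrary example angle) confirm this:

```python
# Angle trisection with two concentric excitable arcs of radii r and R = 3r.
# Pulses start together on one arm and travel with the same constant speed c,
# so at any time t each pulse has swept the angle c*t / radius.
import math

def outer_angle_when_inner_done(theta, r, c=1.0):
    """Angle swept on the outer arc when the inner pulse reaches the far arm."""
    R = 3.0 * r
    t = r * theta / c        # time for the inner pulse to sweep the full angle
    return c * t / R         # angle swept on the outer arc in that time

theta = math.radians(75.0)   # an arbitrary example angle
phi = outer_angle_when_inner_done(theta, r=2.0)
print(math.degrees(phi))     # approximately 25 degrees, i.e. theta / 3
```

Note that the result is independent of r and of the pulse speed c; only the radius ratio matters, which is why the construction works for any excitable medium with constant pulse velocity.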


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 3 The shape of the excitation pulse in a stripe of excitable medium. The arrow indicates the direction of propagation. Calculations were done for the Oregonator model (Eqs. (1)–(3)) with the following parameter values: f = 1.12, q = 0.002, \varepsilon_1 = 0.08, \varepsilon_2 = 0.00097, \phi_{excitable} = 0.007, \phi_{non-excitable} = 0.075. a The position of a spike on the stripe, b, c concentration of the activator and the inhibitor along the x-axis, d concentration of the activator along the y-axis
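A propagating pulse of the kind shown in Fig. 3 can be reproduced with a few dozen lines of code. The sketch below integrates a one-dimensional excitable reaction-diffusion system, here with FitzHugh–Nagumo kinetics rather than the Oregonator, using an explicit Euler scheme and no-flux boundaries; all parameter values, the grid, and the initial perturbation are our own conventional choices, not taken from this article. A local excitation at the left end develops into a pulse that travels down the channel at a roughly constant speed.

```python
# 1-D excitable reaction-diffusion medium (FitzHugh-Nagumo kinetics):
#   du/dt = u - u^3/3 - v + D * d2u/dx2,   dv/dt = eps * (u + a - b*v)
# Explicit Euler in time, central differences in space, no-flux boundaries.

A, B, EPS, D = 0.7, 0.8, 0.08, 1.0
N, DX, DT = 150, 0.5, 0.02
U_REST, V_REST = -1.1994, -0.6243         # rest state of the local kinetics

u = [U_REST] * N
v = [V_REST] * N
for i in range(5):                        # excite the left end of the channel
    u[i] = 1.0

probe = 50                                # grid point at x = 25
u_probe_max = u[probe]

for _ in range(3000):                     # integrate up to t = 60
    lap = [0.0] * N
    for i in range(N):
        left = u[i - 1] if i > 0 else u[i]         # no-flux boundaries
        right = u[i + 1] if i < N - 1 else u[i]
        lap[i] = (left - 2.0 * u[i] + right) / DX**2
    for i in range(N):
        u[i] += DT * (u[i] - u[i]**3 / 3.0 - v[i] + D * lap[i])
        v[i] += DT * EPS * (u[i] + A - B * v[i])
    u_probe_max = max(u_probe_max, u[probe])

# if the pulse propagated, the distant probe point was strongly excited
print(u_probe_max)
```

By the end of the run the pulse has passed the probe point and the left end of the channel has relaxed back towards the rest state, illustrating the refractory tail that follows the activator peak in Fig. 3b,c.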

Figure 4 shows profiles of concentration in an excitable channel of triangular shape. The two curves in Fig. 4b illustrate the profile of u along the lines 1 and 2, respectively, each measured at the time when the concentration of u on the given line reaches its maximum. It can be seen that the maximum concentration of the activator decreases when a pulse propagates towards the tip of the triangle. This effect can be used to build a chemical signal diode. Let us consider two pieces of excitable medium, one of triangular shape and the other rectangular, separated by a non-excitable gap, as shown in Fig. 4a. It is expected that a perturbation of the rectangular area by a pulse propagating towards the tip of the triangular one is much smaller than the perturbation of the triangular area by a pulse propagating towards the end of the rectangular channel. Us-

ing this effect, a chemical signal diode that transmits pulses in one direction only can be constructed just by selecting the right width of the non-excitable medium separating two pieces of excitable medium: a triangular one and a rectangular one [5,40]. The idea of the chemical signal diode presented in Fig. 4a was, in some sense, generalized by Davydov et al. [46], who considered pulses of excitation propagating on a two-dimensional surface in three-dimensional space. It has been shown that the propagation of spikes on surfaces with rapidly changing curvature can be unidirectional. For example, such an effect occurs when an excitation propagates on the surface of a tube with a variable diameter. In such a case, a spike moving from a segment characterized by a small diameter towards a larger one is stopped.


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 4 The shape of the activator concentration in an excitable medium of triangular shape. a The structure of excitable (dark) and non-excitable (light) regions, b concentration of the activator along the lines marked in a. The profiles correspond to the times when the activator reaches its maximum on a given line

For both of the chemical signal diodes mentioned above, the excitability of the medium is a non-trivial function of more than one space variable. However, a signal diode can also be constructed when the excitability of the medium changes in one direction only, so that in a properly selected coordinate system it is a function of a single space variable. For example, diode behavior in a system where the excitability level is a triangular function of a single space variable has been confirmed in numerical simulations based on the Oregonator model of a photosensitive, Ru-catalyzed BZ reaction (Eqs. (1)–(3)) [76]. If the catalyst is immobilized, then pulses that enter the region of inhomogeneous illumination from the strongly illuminated side are not transmitted, whereas pulses propagating in the other direction can pass through (see Fig. 5a). A similar diode-like behavior resulting from a triangular profile of medium excitability can be expected for the oxidation of CO on a Pt surface. Calculations demonstrate that a triangular profile of temperature (in this case, reduced with respect to the temperature that characterizes the excitable medium) allows for the unidirectional transmission of spikes characterized by a high surface oxygen concentration [23]. However, the realization of a chemical signal diode can be simplified. If the properties of the excitable channels on both sides of the diode are the same, then the diode can be constructed with just two stripes of non-excitable medium characterized by different excitabilities, as illustrated in Fig. 5b. If the symmetry is broken at the level of the input channels, then the construction of a signal diode can be even simpler, and it reduces to a single narrow non-excitable gap with a much lower excitability than that of the neighboring channels (cf. Fig. 5c). In

both cases the numerical simulations based on the Oregonator model have shown that a diode works. The predictions of simulations have been qualitatively confirmed by experimental results [25]. Different realizations of a signal diode show that even for very simple signal processing devices the corresponding geometrical structure of excitable and non-excitable regions is not unique. Alternative constructions of chemical information processing devices seem important because they tell us on the minimum conditions necessary to build a device that performs a given function. In this respect a diode built with a single non-excitable region looks interesting, because such situation may occur at a cellular level, where the conditions inside the cell are different from those around. The diode behavior in the geometry shown on Fig. 5c indicates that the unidirectional propagation of spikes can be forced by a channel in a membrane, transparent to molecules or ions responsible for signal transmission. Wave-number-dependent transmission through a non-excitable barrier is another feature of a chemical excitable medium important for information processing [48]. Let us consider two excitable areas separated by a stripe of non-excitable medium and a pulse of excitation propagating in one of those areas. The perturbation of the area behind the stripe introduced by an arriving pulse depends on the direction of its propagation. If the pulse wavevector is parallel to the stripe then the perturbation is smaller than in the case when it arrives perpendicularly [48]. Therefore, the width of the stripe can be selected such that pulses propagating perpendicularly to the stripe can cross it, whereas a pulse propagating along the stripe

Computing in Geometrical Constrained Excitable Chemical Systems

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 5 The excitability as a function of a space variable in different one-dimensional realizations of a signal diode with the BZ reaction inhibited by light: a a triangular profile of illumination, b the illumination profile in a diode composed of two non-excitable barriers, c the illumination profile in a single-barrier diode with nonsymmetrical excitable inputs. The upper graphs in b and c illustrate the illumination on a membrane used in experiments [25]

does not excite the area on the other side. This feature is frequently used to arrange the geometry of excitable channels so that pulses arriving from one channel do not excite another. Non-excitable barriers in a structured medium can play a more complex role than that described above. The problem of barrier crossing by a periodic train of pulses can be seen as excitation via a periodic perturbation of the medium; it has been studied in detail in [13]. The response of the medium is quite characteristic: the firing number as a function of perturbation strength has a devil's-staircase-like form. In the case of barrier crossing, the strength of the excitation generated behind a barrier by an arriving pulse depends on the character of the non-excitable medium, the barrier width, and the frequency of the incoming signal (usually, due to incomplete relaxation of the medium, the amplitude of spikes decreases with frequency). A typical complex frequency transformation after barrier crossing is illustrated in Fig. 6. Experimental and numerical studies on the firing number of a transmitted signal have been published [10,64,72,74]. Interestingly, the shape of the regions characterized by the same firing number in the space of two parameters, barrier width and signal frequency, is not generic and depends on the type of the excitable medium. Figure 7 compares the firing numbers obtained for the FitzHugh-Nagumo and Rovinsky-Zhabotinsky models. In the first case, trains of pulses with short periods (high frequency) can cross wider barriers than low-frequency trains; for the second model the dependence is reversed.
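The frequency transformation at a barrier can be caricatured by a minimal threshold-and-recovery model (a simplification of my own, not the Oregonator or FitzHugh-Nagumo dynamics used in the cited studies): the medium behind the barrier fires only when its recovery variable, relaxing exponentially since the last transmitted spike, exceeds a threshold set by the barrier width. Even this crude sketch yields a staircase of firing-number plateaus, though only of the form 1/k; the full models show richer fractions such as 2/3 or 4/5:

```python
import math

def firing_number(period, theta=0.8, tau=1.0, n_pulses=1000):
    """Fraction of incoming pulses transmitted through a 'barrier'.

    A pulse arriving at time t is transmitted only if the medium has
    recovered enough since the last transmitted pulse:
        1 - exp(-(t - t_last)/tau) >= theta,
    where theta grows with barrier width (toy assumption).
    """
    t_last = None
    fired = 0
    for n in range(n_pulses):
        t = n * period
        if t_last is None or 1.0 - math.exp(-(t - t_last) / tau) >= theta:
            fired += 1
            t_last = t
    return fired / n_pulses

# Plateaus: slow trains pass completely, faster ones only partially.
for T in (2.0, 1.0, 0.5):
    print(T, firing_number(T))  # 1.0, 0.5 and 0.25, respectively
```

Lowering theta (a thinner barrier) widens the full-transmission plateau, mirroring the dependence on barrier width discussed in the text.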

Logic Gates, Coincidence Detectors and Signal Filters

The simplest application of excitable media in information processing is based on the assumption that the logical FALSE and TRUE variables are represented by the rest state and by the presence of an excitation pulse at a given point of the system, respectively. Within this interpretation a pulse represents a bit of information propagating in space. When the system remains in its rest state, no information is recorded or processed, which looks plausible from a biological point of view. Information coded in excitation pulses is processed in regions of space where pulses interact (via collision and subsequent annihilation, or via a transient local change in the properties of the medium). In this section we demonstrate that the geometry of excitable channels and non-excitable gaps can be tortured (the authors are grateful to Prof. S. Stepney for this expression) to the level at which the system starts to perform the simplest logic operations on pulses. Binary chemical logic gates can then be used as building blocks for devices performing more complex signal processing operations. Information processing with structured excitable media is "unconventional" because it is performed without a clock that sequences the operations, as in the standard von Neumann computer architecture. On the other hand, in the signal processing devices described below the proper timing of signals is important, and it is achieved by selecting the right lengths and geometry of channels. In some cases, for example for the


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 6 Frequency transformation on a barrier – the results for the FitzHugh–Nagumo model [64]. a The comparison of an arriving train of pulses (1) and the transmitted signal (2). b A typical dependence of the firing number as a function of barrier width. The plateaus are labeled with corresponding values of firing number (1, 6/7, 4/5, 3/4, 2/3, 1/2 and 0). At points labeled a–e the following values have been observed: a – 35/36; b – 14/15; c – 10/11; d – 8/9 and e – 5/6

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 7 The firing number as a function of the barrier width d and the interval of time between consecutive pulses (tp ). Labels in the white areas give the firing number, and the gray color marks values of parameters where more complicated transformations of frequency occur. a Results calculated for the FitzHugh–Nagumo model, b the Rovinsky–Zhabotinsky model. Both space and time are in dimensionless units


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 8 The distribution of excitable channels (dark) that form the OR gate. b and c illustrate the time evolution of a single pulse arriving from inputs I1 and I2, respectively

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 9 The distribution of excitable channels (dark) that form the AND gate. b and c illustrate the response of the gate to a single pulse arriving from input I1 and to a pair of pulses, respectively

operations on trains of pulses, the presence of a reference signal, which plays a role similar to a clock, would significantly help to process information [70]. Historically, logic gates were the first devices realized with a structured chemical medium [1,2,14,65,73,75]. The arrangement of channels that executes the logic sum (OR) operation is illustrated in Fig. 8 [48]. The gate is composed of three excitable stripes (marked gray) surrounded by a non-excitable medium. The gaps separating the stripes have been selected such that a pulse arriving perpendicularly to a gap can excite the area on the other side, whereas a pulse propagating parallel to the gap does not generate a perturbation sufficient to excite the medium behind it. If there is no input pulse, the OR gate remains in the rest state and produces no output. A pulse in either of the input channels I1 and I2 can cross the gap separating these channels from the output O, and an output spike appears. The excitation of the output channel generated by a pulse from one of the input channels propagates parallel to the gap separating it from the other input channel, so it does not interfere with the other input, as seen in Fig. 8b and c. The frequency at which the described OR gate operates is limited by the refractory period of the output medium: if signals arrive from both input channels but the time difference between the pulses is smaller than the refractory time, only the first pulse produces an output spike. The gate that returns the logic product (AND) of input signals is illustrated in Fig. 9. The width of the gap separating the output channel O from the input one (I1, I2) is selected such that a single excitation propagating in the input channel does not excite the output (Fig. 9b). However,

if two counterpropagating pulses meet, the resulting perturbation is strong enough to generate an excitation in the output channel (Fig. 9c). Therefore, the output signal appears only when both sides of the input channel have been excited and the pulses of excitation have collided in front of the output channel. The width of the output channel defines the time difference within which input pulses are treated as simultaneous. The structure shown in Fig. 9 can therefore also be used as a detector of time coincidence between pulses. The design of the negation (NOT) gate is shown in Fig. 10. The gaps separating the input channels, the source channel and the output channel can be crossed by a pulse that propagates perpendicularly, but they are impenetrable for pulses propagating parallel to the gaps. The NOT gate should deliver an output signal if the input is in the rest state; therefore, it contains a source of excitation pulses (marked S). If the input is in the rest state, pulses from the source propagate unperturbed and enter

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 10 The distribution of excitable channels (dark) that form the NOT gate


the output channel. If there is an excitation pulse in the input channel, it enters the channel linking the source with the output and annihilates with one of the pulses generated by the source; as a result, no output pulse appears. At first glance the described NOT gate works fine, but if we assume that a single pulse is used in information coding, then the input pulse has to arrive at the right time to block the source. Therefore, additional synchronization of the source is required. If information is coded in trains of pulses, the frequency of the source should match the one used for coding. The structure of excitable channels for the exclusive OR (XOR) gate is illustrated in Fig. 11c [34]. Two input channels bring signals to the central area C. The output channels are linked together via diodes (Fig. 4) that stop possible backward propagation. As in the previous cases, only pulses with wavevectors perpendicular to the gaps can pass through the non-excitable gaps between the central area and both the input and output channels. The shape of the central area has been designed such that an excitation generated by a single input pulse propagates parallel to one of the outputs and perpendicular to the other. As a result, one of the output channels is excited (see Fig. 11a). However, if pulses from both input channels arrive at the same time, then the wavevector of the excitation in the central part is always parallel to the boundaries (Fig. 11b) and no output signal appears. Of course, there is no output signal if neither of the inputs is excited. It is worth noticing that for some geometries of the XOR gate the diodes in the output channels are not necessary, because the backward propagation does not produce a pulse with a wavevector perpendicular to a gap, as seen in the fourth frame of Fig. 11a.
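At the level of pulse timings, the gates above can be abstracted away from the chemistry altogether. The sketch below is an event-level caricature under simplifying assumptions of my own: pulses are reduced to time stamps, the OR output is limited by a refractory period, and the AND/XOR gates use a finite coincidence window. It mirrors the behaviors described for Figs. 8, 9 and 11:

```python
def gate_or(a, b, refractory=1.0):
    """Merge two pulse trains; a pulse closer than the refractory
    period to the previous output pulse is lost (cf. Fig. 8)."""
    out = []
    for t in sorted(a + b):
        if not out or t - out[-1] >= refractory:
            out.append(t)
    return out

def gate_and(a, b, window=0.5):
    """Emit a pulse only where pulses from both inputs coincide
    within the window (cf. Fig. 9)."""
    return [ta for ta in a if any(abs(ta - tb) <= window for tb in b)]

def gate_xor(a, b, window=0.5):
    """Emit the pulses present in exactly one train (cf. Fig. 11)."""
    only_a = [ta for ta in a if all(abs(ta - tb) > window for tb in b)]
    only_b = [tb for tb in b if all(abs(tb - ta) > window for ta in a)]
    return sorted(only_a + only_b)

i1, i2 = [0.0, 10.0], [0.2, 20.0]
print(gate_or(i1, i2))   # [0.0, 10.0, 20.0]: the pulse at 0.2 falls in the refractory period
print(gate_and(i1, i2))  # [0.0]: only the 0.0/0.2 pair coincides
print(gate_xor(i1, i2))  # [10.0, 20.0]: the non-coinciding pulses
```

The refractory and window parameters play the roles of the refractory period of the output medium and of the output-channel width discussed in the text.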
Another interesting example of behavior resulting from the interaction of pulses has been observed in a cross-shaped structure built of excitable regions separated by gaps penetrable for perpendicular pulses [65,66], shown in Fig. 12. The response of the cross-shaped junction to a pair of pulses arriving from two perpendicular directions has been studied as a function of the time difference between the pulses. Of course, if the time difference is large, the pulses propagate independently along their channels. If the time difference is small, the cross-junction acts like the AND gate and the output excitation appears in one of the corner areas. However, for a certain time difference the first arriving pulse is able to redirect the other and force it to follow. The effect is related to incomplete relaxation of the central area of the junction at the moment when the second pulse arrives. Pulse redirection seems to be an interesting effect from the point of view of programming with excitation pulses, but in practice it requires high precision in selecting the right time difference.
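In event terms the junction implements a three-regime response to the inter-pulse delay. The sketch below only names the regimes; the threshold values are invented placeholders, since the true windows depend on the refractory dynamics of the medium:

```python
def junction_outcome(dt, coincidence=0.5, redirect_lo=0.5, redirect_hi=1.5):
    """Toy classification of the cross-junction response to two pulses
    arriving with time difference dt (thresholds are illustrative)."""
    dt = abs(dt)
    if dt <= coincidence:
        return "corner excitation (AND-like)"
    if redirect_lo < dt <= redirect_hi:
        return "second pulse redirected"
    return "independent propagation"

for dt in (0.2, 1.0, 5.0):
    print(dt, junction_outcome(dt))
```

The narrowness of the redirection window (redirect_lo to redirect_hi) is the sketch's counterpart of the "high precision" requirement noted in the text.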

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 11 c the distribution of excitable channels (dark) that form the XOR gate. a and b illustrate the response of the gate to a single pulse arriving from input I1 and to a pair of pulses, respectively

The logic gates described above can also be applied to transform signals composed of many spikes. For example, two trains of pulses can be added together if they pass through an OR gate, and the AND gate creates a signal composed of the coinciding pulses of both trains. It is also easy to use a structured excitable medium to generate a signal that does not contain coinciding pulses [50]. The structure of the corresponding device is illustrated in Fig. 13. All non-excitable gaps are penetrable for perpendicular pulses. If a pulse of excitation arrives from one of the input channels, it excites the segment A and propagates towards the other end of it. If there is no spike in the other input channel, this excitation propagates unperturbed and activates the output channel. However, if a pulse from the other channel arrives, it annihilates with the original pulse and no output signal is generated. The structure illustrated in Fig. 13a can easily be transformed into a device that compares two trains of pulses such that the resulting signal is composed of the spikes of the first signal that do not coincide with the pulses of the second train. Such a device can be constructed simply by neglecting one of the output channels. Figure 13b illustrates the geometry of the device that produces an output signal composed of those pulses arriving at input I1 that have no corresponding spikes in the signal arriving from I2. The coincidence detector can also be used as a frequency filter that transmits periodic signals within a certain frequency interval [22,50]. The idea of such a filter is shown

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 12 The distribution of excitable and non-excitable regions in a cross-shaped junction. Here the excitable regions are gray and the non-excitable black. Consecutive figures illustrate an interesting type of time evolution caused by the interaction of pulses. The two central figures are enlarged in order to show how incomplete relaxation influences the shape of the next pulse

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 13 The distribution of excitable channels in devices that compare trains of pulses. The device shown in a applies the XOR operation to a pair of signals and makes a signal composed of spikes that do not coincide with the other signal. b generates a signal composed of spikes that arrive through I1 and do not coincide with excitation pulses coming from I2

in Fig. 14. The device tests whether the time between subsequent spikes of the train remains in the assumed range. The signal arriving from the input I separates and enters the segment E through the diodes D1 and D2. The segments E and F form a coincidence detector (or an AND gate, cf. Fig. 9). An excitation of the output appears when an excitation coming to E via the segment C and D2 coincides with the subsequent spike of the train arriving directly via D1. The time shift for which coincidences are tested is determined by the difference in the lengths of both paths, and the time resolution depends on the width of the F channel (here l2 - l1). For periodic signals the presented structure works as a frequency filter and transmits signals within the frequency range f± = v/(Δr ± w/2), where Δr is the difference of the distances traveled by the spikes, measured to the point in E placed above the center of the F channel (here Δr = 2 le), w is the width of the output channel, and v is the velocity of a spike in the medium. Typical characteristics of the filter are illustrated in Fig. 14b. The points represent results of numerical simulations and the line shows the filter characteristics calculated from the equation given above. It can be noticed that at the ends of the transmitted frequency band a change in output signal frequency is observed. This unwelcome effect can easily be avoided if a sequence of two


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 14 a The distribution of excitable and non-excitable regions in a band filter. b Typical characteristics of the band filter [22]. The black dots mark periods for which the signal is not transmitted, the empty ones indicate full transmission, the empty diamonds mark the periods of the arriving signal for which every second spike is transmitted

identical filters is used. A filter tuned to a certain frequency will also pass any of its harmonics, because the output signal can be generated by the coincidence of every second, third, etc. pulse in the train.

Chemical Sensors Built with Structured Excitable Media

Even the simplest organisms, without specialized nerve systems or brains, are able to search for the optimum living conditions and sources of food. We should not be astonished by this fact, because even chemical systems can receive and process information arriving from their neighborhood. In this section we describe how simple structures built of excitable media can be used for direction and distance sensing. In order to simplify the conversion between an external stimulus and the information processed by a sensor, we assume that the environment is represented by the same excitable medium as the sensor itself, so a stimulus can directly enter the sensor and be processed. One possible strategy of sensing is based on a one-to-one relationship between the measured variable and the activated sensor channel [53]. An example of a sensor of that type is illustrated in Fig. 15. It is constructed with a highly excitable, black ring surrounded by a number

of coincidence detectors, denoted as X1, X2 and X3. The excitation of a detector appears if a pair of pulses collide on the ring in front of it. Let us consider a homogeneous excitable environment and a spherical pulse of excitation with its source at the point S1. The high excitability of the ring means that excitations on the ring propagate faster than those in the surrounding environment. At a certain moment the pulse arrives at the point P1, which lies on the line connecting S1 and the center of the ring (see Fig. 15b). The arriving pulse creates an excitation on the ring originating from the point P1. This excitation splits into a pair of pulses rotating in opposite directions and, after propagating around the ring, they collide at the point symmetric to P1 with respect to the center of the ring O. The point of collision can be located by an array of coincidence detectors, and thus we obtain information on the wave vector of the arriving pulse. In this method the resolution depends on the number of detectors used, because each of them corresponds to a certain range of measured wave vectors. The fact that a pulse has appeared in a given output channel implies that no other channel of the sensor gets excited. It is interesting that the output information is reversed in space compared with the input: the left sensor channels are excited when an excitation arrives from the right, and vice versa.
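The geometric rule behind this sensor can be stated compactly: pulses entering the ring at bearing θ collide at θ + π, and the coincidence detector nearest that point fires. A small sketch of that rule (the detector placement is an illustrative choice, not the geometry of Fig. 15):

```python
import math

def collision_angle(source_bearing):
    """Pulses entering the ring at angle a collide at a + pi."""
    return (source_bearing + math.pi) % (2 * math.pi)

def excited_detector(source_bearing, detector_angles):
    """Index of the coincidence detector closest to the collision point."""
    c = collision_angle(source_bearing)
    def angular_distance(a, b):
        d = abs(a - b) % (2 * math.pi)
        return min(d, 2 * math.pi - d)
    return min(range(len(detector_angles)),
               key=lambda i: angular_distance(c, detector_angles[i]))

# Three detectors X1, X2, X3 spaced evenly around the ring.
detectors = [math.radians(a) for a in (90, 210, 330)]
print(excited_detector(math.radians(30), detectors))
```

The spatial inversion mentioned in the text follows directly from the θ + π rule: a source on one side excites a detector on the opposite side.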


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 15 The distribution of excitable areas (gray and black) and non-excitable regions (white) in the ring-shaped direction detector. Triangles marked X 1 , X 2 and X 3 show the output channels. b and c illustrate time evolution of pulses generated from sources at different locations. The excited output channel depends on the direction of the source (copied from [53] with the permission of the authors)

The geometry of the direction sensor that sends information as an excitation pulse in one of its detector channels can be simplified further. Two such realizations are illustrated in Fig. 16. The excitable areas that form the sensor are marked black, the excitable surrounding medium where stimuli propagate is gray, and the non-excitable areas are white. Let us consider an excitation generated in the medium M by a source S. It is obvious that if the source is located just above the sensor, then pulses of excitation are generated at the ends of the sensor channel D at the same time; they finally annihilate above the central coincidence channel and generate an output pulse in it. If the source is located at a certain angle with respect to the vertical line, the annihilation point is shifted off the center of D. We have performed a series of numerical simulations to find the relation between the position of the source and the position of the annihilation point in D. In Fig. 16c and 16d the position of the annihilation point is plotted as a function of the angle between the source position and the vertical line. Although the constructions of the two sensors seem very similar, the working range of angles is larger for the sensor shown in Fig. 16b, whereas the sensor shown in Fig. 16a offers slightly better resolution. Using information from two detectors of direction, the position of the source of excitation can easily be located, because it lies on the intersection of the lines representing the detected directions [77]. Another strategy of sensing is based on the observation that the frequencies of pulses excited in a set of sensor channels by an external periodic perturbation contain information on the location of the stimulus. Within this strategy there is no direct relationship between the number of

sensor channels and sensor resolution, and, as we show below, a sensor with a relatively small number of channels can quite precisely estimate the distance separating it from the excitation source. As we have mentioned in the Introduction, the frequency of a chemical signal can change after propagating through a barrier made of non-excitable medium. For a given frequency of arriving spikes, the frequency of excitations behind the barrier depends on the barrier width and on the angle between the normal to the barrier and the wave vector of the arriving pulses. This effect can be used for sensing. The geometrical arrangement of excitable and non-excitable areas in a distance sensor is shown in Fig. 17. The excitable signal channels (in Fig. 17a they are numbered 1-5) are wide enough to ensure stable propagation of spikes. They are separated from one another by parallel non-excitable gaps that do not allow for interference between pulses propagating in neighboring channels. The sensor channels are separated from the excitable medium M by the non-excitable sensor gap G. The width of this gap is very important. If the gap is too wide, then no excitation of the medium M can generate a pulse in the sensor channels. If the gap is narrow, then any excitation in front of the sensor can pass G and create a spike in each sensor channel, so the signals sent out by the sensor channels are identical. However, there is a range of gap widths such that the firing number depends on the wave vector characterizing a pulse at the gap in front of the channel. If the source S is close to the array of sensor channels, then the wave vectors characterizing excitations in front of the various channels are significantly different, and thus the frequencies of excitations in the various channels should differ too. On the other hand, if


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 16 The geometry of excitable (black) and diffusive (white) areas in two realizations of simplified distance sensors. The gray color marks the excitable medium around the sensor, S marks the position of excitation source. c and d show the position of the annihilation point in the channel D as a function of the angle q between the source position and the vertical line. The half-length of D is used as the scale unit

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 17 The distance sensor based on frequency transformation. a The distribution of excitable channels, b the firing numbers in different channels as a function of distance; the black, green, blue, red and violet curves correspond to signals in channels 1-5, respectively


the source of excitations is far away from the gap G, then the wave vectors in front of the different channels are almost identical and the frequencies of excitations should be the same. Therefore, the system illustrated in Fig. 17a can sense the distance separating it from the source of excitations. If this distance is small, the firing numbers in neighboring sensor channels differ, and the differences decrease as the source of excitations moves away. A typical distance dependence of the firing numbers observed in the different channels is illustrated in Fig. 17b. This result has been obtained in numerical simulations based on the Oregonator model. The range of sensed distances depends on the number of sensor channels. A similar sensor, but with 4 sensor channels, was studied in [28]. Comparing the results, we observe that the presence of the 5th channel significantly improves the range of distances for which the sensor operates. On the other hand, the additional channel has almost no effect on the sensor resolution at short distances. The sensor accuracy seems to be a complex function of the width of the sensor gap and the properties of the channels. The firing number of a single channel as a function of the distance between the sensor and the source has a devil's-staircase-like form, with long intervals where the function is constant, corresponding to simple fractions, as illustrated in Fig. 6b. In some ranges of distances the steps in the firing numbers of different channels can coincide, so the resolution in these ranges is poor. In the other regions the firing numbers change rapidly, and even small changes in distance can easily be detected. The signal transformation on a barrier depends on the frequency of the incoming pulses (cf. Fig. 6), so a distance sensor that works well for one frequency may not work for another. In order to function properly for different stimuli, the sensor has to be adapted to the conditions under which it operates.
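The distance dependence can be made plausible with elementary geometry alone: for a point source at distance d in front of an array of channels at lateral offsets x_i, the incidence angle at channel i is arctan(|x_i|/d), so the spread of angles, and hence of the firing numbers they induce, shrinks as the source recedes. The channel offsets below are an illustrative choice, not the dimensions of Fig. 17:

```python
import math

def incidence_angles(offsets, distance):
    """Angle between each channel's normal and the wave vector of a
    pulse arriving from a point source at the given distance."""
    return [math.atan(abs(x) / distance) for x in offsets]

channels = [-2.0, -1.0, 0.0, 1.0, 2.0]   # five sensor channels (cf. Fig. 17a)

def spread(distance):
    a = incidence_angles(channels, distance)
    return max(a) - min(a)

print(spread(1.0) > spread(10.0))   # nearby sources differentiate the channels
```

As d grows, all angles tend to zero together, which is the geometric reason the channel frequencies converge for distant sources.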
In practice such adaptation can be realized by comparing the frequencies of the signals in the detector channels with the frequency in a control channel. For example, the control channel can be separated from the medium M by a barrier so narrow that every excitation of the medium generates a spike. The comparison between the frequency of excitations in the control channel and in the sensor channels can then be used to adjust the sensor gap G. If the frequency of excitations in the sensor channels is the same as in the control channel, then the width of the gap should be increased or its excitability level decreased. On the other hand, if the frequency in a sensor channel is much smaller than in the control channel (or null), then the gap should be made narrower or more excitable. Such an adaptation mechanism allows one to adjust the distance detector to any frequency of arriving excitations.
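The adaptation rule just described is essentially a feedback loop on the gap width. The fragment below is a deliberately schematic sketch: the response curve firing_number_of_gap is invented for illustration (a real sensor would measure the firing number against the control channel), and the loop simply widens the gap while the sensor fires as often as the control channel, narrows it while the sensor stays silent, and stops once the firing number is informative, i.e. strictly between the two extremes:

```python
def firing_number_of_gap(width):
    """Invented monotone response: narrow gaps transmit everything,
    wide gaps nothing, with a useful transition in between."""
    if width <= 0.3:
        return 1.0
    if width >= 0.7:
        return 0.0
    return 1.0 - (width - 0.3) / 0.4

def adapt_gap(width, lo=0.2, hi=0.8, step=0.05, max_iter=100):
    """Adjust the gap until the firing number is in the useful band."""
    for _ in range(max_iter):
        f = firing_number_of_gap(width)
        if lo <= f <= hi:
            return width
        # same frequency as the control channel -> gap too permeable: widen;
        # silent sensor channel -> gap too wide: narrow it
        width += step if f > hi else -step
    return width

w = adapt_gap(1.0)
print(w, firing_number_of_gap(w))
```

The same loop converges from either side of the useful band, which is the point of the adaptation mechanism described in the text.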

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 18 Two snapshots from the experimental realization of the distance sensor with four channels. In the upper figure the source (1 mm thick silver wire) is placed 2 mm away from the sensor; in the bottom one the source is 12 mm away. The firing numbers are given next to the corresponding channels

The fact that the distance detector described above actually works has been confirmed in experiments with a photosensitive Ru-catalyzed BZ reaction. Typical snapshots from two experiments performed with the source placed 2 and 12 mm away from the sensor are shown in Fig. 18. The firing numbers observed in the different sensor channels qualitatively confirm the predictions of the numerical simulations. If the source of excitations is close to the sensor gap, the differences between the firing numbers observed in neighboring channels are large. On the other hand, when the source of excitations is far away from the sensor, the frequencies in the different channels become similar. The range of distances at which the sensor works is measured in centimeters, so it is of the same order as the sensor size.

The Ring Memory and Its Applications

The devices discussed in the previous section can be classified as instant machines [58], capable of performing just the task they have been designed for. A memory in which information coded in excitation pulses can be written, kept, read out and, if necessary, erased significantly increases the information processing potential of structured excitable media. Moreover, due to the fact that the state of the memory can be changed by a spike, such a memory allows for programming with excitation pulses. One possible realization of a chemical memory is based on the observation that


a pulse of excitation can rotate on a ring-shaped excitable area as long as the reactants are supplied and the products removed [41,52,55]. Therefore, a ring with a number of spikes rotating on it can be regarded as a loaded memory cell. Such memory can be erased by counterpropagating pulses. The idea of memory with loading pulses rotating in one direction and erasing pulses in another has been discussed in [49]. If the ring is big then it can be used to memorize a large amount of information because it has many states corresponding to different numbers of rotating pulses. However, in such cases, loading the memory with subsequent pulses may not be reliable because the input can be blocked by the refractory tail left by one of already rotating pulses. The same effect can block the erasing pulses. Therefore, the memory capable of storing just a single bit seems to be more reliable and we consider it in this section. Such memory has two states: if there is a rotating pulse the ring is in the logical TRUE state (we call it loaded); if there is no pulse the state of memory corresponds to the logical FALSE and we call such memory erased. Let us consider the memory illustrated in Fig. 19. The black areas define the memory ring, the output channel O, and the loading channel ML. The memory ring is formed by two L-shaped excitable areas. The areas are separated by gaps and, as we show below, with a special choice of the gaps the symmetry of the ring is broken and unidirectional rotation ensured. The Z-shaped excitable area composed of gray segments inside the ring forms the erasing channel. The widths of all non-excitable gaps separating excitable areas are selected such that a pulse of excitation propagating perpendicularly to the gap excites the active area on the other site of the gap, but the gap is impenetrable for pulses propagating parallel to the gap. The memory cell can be loaded by a spike arriving from the ML channel. 
Such a spike crosses the gap and generates an excitation

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 19 The distribution of excitable channels (dark) that form a memory cell. The cell is composed of the loading channel ML, the memory ring, the erasing channel ME (marked gray) and the output channel O

on the ring rotating counterclockwise. The information that the memory is loaded is periodically sent out as a series of spikes through the output channel O. The rotating pulse does not affect the erasing channel because it always propagates parallel to it. The erasing excitation is generated in the center of the Z-shaped area and it splits into two erasing pulses. These pulses can cross the gaps separating the erasing channel from the memory ring and create a pair of pulses rotating clockwise. A spike that propagates clockwise on the memory ring is not stable: it cannot cross any of the gaps and dies. It also does not produce any output signal. Therefore, if the memory has not been loaded, an erasing excitation does not load it. On the other hand, if the memory is loaded, the clockwise rotating pulses resulting from the excitation of the erasing channel annihilate with the loading pulse and the memory is erased. Two places where erasing pulses can enter the memory ring are used to ensure that at least one of them is fully relaxed, so one of the erasing pulses can always enter the ring.

In order to verify that such a memory works, we have performed a number of simulations using the Rovinsky–Zhabotinsky model of the BZ reaction, and we have also performed experiments. We considered a loaded memory, and at a random time an excitation was generated in the middle of the erasing channel. In all cases such an excitation erased the memory ring. A typical experimental result is illustrated in Fig. 20. Here the channels are 1.5 mm thick and the gaps are 0.1 mm wide. Figure 20a shows the loaded memory and the initiated pair of pulses in the erasing channel. In Fig. 20b one of the pulses from the erasing channel enters the memory ring. The memory ring in front of the left part of the erasing channel is still in the refractory state, so the left erasing pulse has not produced an excitation on the ring. Finally, in Fig. 20c we observe the annihilation of the loading pulse with one of the erasing pulses. Therefore, the state of the memory changed from loaded to unloaded. The experiment was repeated a few times and the results were in qualitative agreement with the simulations: the loaded memory cell kept its information for a few minutes and it was erased after every excitation of the erasing channel.

Figure 21 illustrates two simple, yet interesting, applications of a memory cell. Figure 21a shows a switchable unidirectional channel that can be opened or closed depending on the state of the memory [29]. The channel is constructed with three excitable segments A, B and C separated by signal diodes D1 and D2 (cf. Fig. 4). The mid segment B is also linked with the output of the memory ring M. Here the erasing channels of M are placed outside the memory ring, but their function is exactly the same as in the memory illustrated in Fig. 19. The idea of switch-


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 20 Three snapshots from an experiment with memory erasing. The memory ring is formed by two L-shaped excitable channels, the Z-shaped erasing channel is inside the ring

able channel is similar to the construction of the NOT gate (Fig. 10). If the memory is not loaded, then spike propagation from input I to output O is unperturbed. However, if the memory is loaded, then pulses of excitation periodically enter segment B and annihilate with the transmitted signal. As a result, the channel is either open or blocked depending on the state of the memory. The excitations generated by the memory ring do not spread outside the B segment: on one end their propagation is stopped by the diode D1, on the other by the geometry of the junction between the B channel and the memory output. The state of the memory is controlled by excitation pulses coming from the loading and erasing channels, so switchable channels can be used in devices that are programmable with excitation pulses.

Figure 21b illustrates a self-erasing memory cell that changes its state from loaded to erased after a certain time. In such a memory cell, the output channel is connected with the erasing one. When the memory is loaded, the output signal appears. After some time, determined by the length of the connecting channels, an output pulse returns as an erasing pulse and switches the memory to the unloaded state. This behavior is an example of a simple feedback process common in self-regulating information processing systems.

Using memory cells, signal diodes, and coincidence detectors, one can construct devices which perform more complex signal processing operations. As an example, we present a simple chemical realization of a device that counts arriving spikes and returns their number in any chosen positional representation [27]. Such a counter can be assembled from single digit counters. The construction of a single digit counter depends on the representation used. Here, as an example, we consider the positional representation with base 3. The geometry of a single digit counter is schematically shown in Fig. 22. Its main elements are two memory cells M1 and M2 and two coincidence detectors C1 and C2. At the beginning let us assume that neither of the memory cells is loaded. When the first pulse arrives through the input channel I0, it splits at all junctions and excitations enter segments B0, B1 and B2. The pulse that has propagated through B0 loads the memory cell M1. The pulses that have propagated through B1 and B2 die at the bottom diodes of segments C1 and C2, respectively. Thus, the first input pulse loads the memory M1 and does not change the state of M2. When M1 is loaded, pulses of excitation are periodically sent to segments B0 and C1 via the bottom channel. Now let us consider what happens when the second pulse arrives. It does not pass through B0 because it annihilates with the pulses arriving from the memory M1. The excitations generated by the second pulse can enter B1 and B2. The excitation that propagated through B2 dies at the bottom diode of the segment C2. The pulse that has propagated through B1 enters C1, annihilates with a pulse from memory M1 and activates the coincidence detector. The output pulse from the coincidence detector loads the memory M2. Therefore, after the second input pulse both memories M1 and M2 are loaded. If a third pulse arrives, the segments B0 and B1 are blocked by spikes sent


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 21 Two simple applications of a memory cell. a A switchable unidirectional channel that stops or transmits signals depending on the state of memory. b A self-erasing memory cell that changes its state after a certain time. SEC marks the connection between the memory output and the erasing channel
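The single-bit memory cell and the switchable channel of Fig. 21a lend themselves to a compact event-level summary. The following sketch is purely illustrative (hypothetical Python, abstracting away the reaction-diffusion dynamics; all class and method names are our own), showing the logical behavior described above: a load pulse sets the ring, an erase pulse clears it, and a loaded memory blocks the gated channel.

```python
# Hypothetical event-level model of the excitable-medium memory cell
# (Fig. 19) and the switchable unidirectional channel (Fig. 21a).
# This abstracts the chemistry: it captures only the pulse logic.

class MemoryCell:
    def __init__(self):
        self.loaded = False  # no rotating pulse: logical FALSE (erased)

    def load_pulse(self):
        # A spike from the ML channel starts a counterclockwise rotation.
        # If a pulse is already rotating, the state simply stays loaded.
        self.loaded = True

    def erase_pulse(self):
        # Clockwise pulses from the erasing channel annihilate a rotating
        # pulse; on an empty ring they die at the first gap, so erasing
        # an unloaded cell leaves it unloaded.
        self.loaded = False

    def output(self):
        # A loaded ring periodically emits spikes through channel O.
        return self.loaded


class SwitchableChannel:
    """Unidirectional channel gated by a memory cell, as in Fig. 21a."""
    def __init__(self, memory):
        self.memory = memory

    def transmit(self, spike):
        # Spikes from a loaded memory enter segment B and annihilate
        # the transmitted signal, so the channel is blocked.
        return spike and not self.memory.output()


m = MemoryCell()
channel = SwitchableChannel(m)
assert channel.transmit(True)      # memory erased: channel open
m.load_pulse()
assert not channel.transmit(True)  # memory loaded: channel blocked
m.erase_pulse()
assert channel.transmit(True)      # erased again: open
```

The self-erasing cell of Fig. 21b corresponds to calling `erase_pulse` a fixed delay after each `output` spike, closing the feedback loop described in the text.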

from the memory rings. The generated excitation can enter channel B2, and its collision with a pulse coming from the memory cell M2 activates the output channel of C2. The output signal is directed to the counter responsible for the digit at the next position (I1) and is also used to erase all memory cells. Thus, after the third pulse both memory cells M1 and M2 are erased. The counter shown in Fig. 22 returns a digit in a representation with base 3: here 0 is represented by (M1, M2) = (0, 0), 1 by (1, 0), 2 by (1, 1), and the next pulse changes the state of the memory cells back to (M1, M2) = (0, 0). Of course, using n - 1 memory cells in a single digit counter we can represent digits of a system with base n. A cascade of single digit counters (see Fig. 22b) gives a positional representation of the number of arriving pulses.

Artificial Chemical Neurons with Excitable Medium

In this section we discuss a simple realization of an artificial neuron with a structured excitable medium and show how networks of such neurons can be used in programmable devices. We consider a chemical analog of the McCulloch–Pitts neuron, i.e., a device that produces an output signal if the combined activation exceeds a critical value [32]. The geometry of the chemical neuron is inspired by the structure of a biological neuron [30]. One of its realizations with an excitable medium is illustrated in Fig. 23; another geometry of a neuron has been discussed in [24]. In an artificial chemical neuron, as in real neurons, dendrites (input channels 1-4) transmit weak signals which are added together through the processes of spatial and temporal integration inside the cell body (part C). If the aggregate excitation is larger than the threshold value, the cell body gets excited. This excitation is transmitted as an output signal

down the axon (the output channel), and the amplitude of the output signal does not depend on the value of the integrated inputs but only on the properties of the medium that forms the output channel. In Fig. 23a the axon is not shown; we assume that it is formed by an excitable channel located perpendicularly above the cell body. In the construction discussed in [24] both the dendrites and the axon were on a single plane. We have studied the neuron using numerical simulations based on the Oregonator model; the reaction-diffusion equations have been solved on a square grid. The square shape of the neuron body and of the input channels shown in Fig. 23a allows for a precise definition of the boundary between the excitable and non-excitable parts. The idea behind the chemical neuron is similar to that of the AND gate: perturbations introduced by multiple inputs combine and generate a stronger excitation of the cell body than that resulting from a single input pulse. Therefore, it is intuitively clear that if we are able to adjust the amplitudes of excitations coming from the individual input channels, then the output channel becomes excited only when the required number of excitations arrives from the inputs. In the studied neuron, the amplitudes of spikes in the input channels have been adjusted via the sub-excitability of these channels, which can be controlled by the channel width or by the illumination of the surrounding non-excitable medium. In our simulations we considered different values of φp for non-excitable areas, whereas φa = 0.007, characterizing the excitable regions, has been fixed for the dendrites and the neuron body. The simulation results shown in Fig. 23b indicate that the properties of chemical neurons are very sensitive to changes in φp. The thick line marks the values of φp for which the output signal appears. For the parameters used, when φp is slightly


Computing in Geometrical Constrained Excitable Chemical Systems, Figure 22 The counter of excitation pulses that arrive at the input I0. a shows the geometry of excitable channels (black) in a single digit counter for the positional representation with the base 3. b is a schematic illustration of the cascade of single digit counters that provides a positional representation. The feedback signals from E1, E2 and E3 channels erase the memory of the single digit counters
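The counting scheme of Fig. 22 can be mimicked by a short state-machine sketch (illustrative Python; the real device is a reaction-diffusion system, and the class and function names here are ours). Each digit holds states (M1, M2) = (0,0), (1,0), (1,1), and the third pulse emits a carry and erases both cells.

```python
# Illustrative state-machine model of the base-3 single-digit counter
# of Fig. 22 and of a cascade of such counters (Fig. 22b).

class DigitCounter3:
    def __init__(self):
        self.m1 = False  # memory cell M1
        self.m2 = False  # memory cell M2

    def pulse(self):
        """Process one input spike; return True if a carry is emitted."""
        if not self.m1:
            self.m1 = True          # first pulse loads M1 via B0
            return False
        if not self.m2:
            self.m2 = True          # second pulse coincides at C1, loads M2
            return False
        # Third pulse coincides at C2: carry to I1 and erase both cells.
        self.m1 = self.m2 = False
        return True

    def digit(self):
        return int(self.m1) + int(self.m2)


def count(n_pulses, n_digits=4):
    """Cascade of single-digit counters giving a base-3 representation."""
    digits = [DigitCounter3() for _ in range(n_digits)]
    for _ in range(n_pulses):
        stage, carry = 0, True
        while carry and stage < n_digits:
            carry = digits[stage].pulse()
            stage += 1
    return [d.digit() for d in digits]  # least significant digit first


assert count(5) == [2, 1, 0, 0]   # 5 = 2*1 + 1*3
assert count(9) == [0, 0, 1, 0]   # 9 = 3**2
```

Replacing `DigitCounter3` with a counter holding n - 1 memory cells gives the base-n generalization mentioned in the text.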

Computing in Geometrical Constrained Excitable Chemical Systems, Figure 23 Artificial chemical neuron. a The geometry of excitable and non-excitable areas; b The response of the neuron to different types of excitations as a function of the illumination of non-excitable regions. The numbers given on the left list the excited channels. The values of φp for which the neuron body gets excited are marked by a thick line
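The illumination-controlled threshold behavior can be summarized in a hedged sketch: φp of the non-excitable regions sets how many coincident input pulses are needed to excite the cell body. The window boundaries below are illustrative assumptions interpolated between the two limits reported in the text (any single input fires below about 0.047377; even all four inputs fail above about 0.047397); the intermediate cut-offs are ours.

```python
# Hypothetical sketch of the four-input chemical McCulloch-Pitts neuron.
# Only the two outer boundaries come from the text; the intermediate
# thresholds are invented for illustration.

def neuron_fires(active_inputs, phi_p):
    """Return True if the cell body gets excited."""
    if phi_p < 0.047377:
        required = 1          # any single excitation produces an output
    elif phi_p < 0.047385:
        required = 2          # assumed boundary
    elif phi_p < 0.047391:
        required = 3          # assumed boundary
    elif phi_p < 0.047397:
        required = 4
    else:
        return False          # even all four inputs cannot excite the body
    return active_inputs >= required


assert neuron_fires(1, 0.0470)        # low illumination: any input fires
assert neuron_fires(2, 0.047380)      # acts as a two-input threshold gate
assert not neuron_fires(1, 0.047380)
assert not neuron_fires(4, 0.0474)    # above the upper limit: never fires
```

The narrowness of this φp window (about 2 × 10⁻⁵) is exactly the high sensitivity discussed in the text: tiny parameter changes reprogram the neuron's threshold.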

below 0.047377 any single excitation produces an output signal. On the other hand, if φp > 0.047397 then even the combined excitation of all inputs is not sufficient to excite the neuron. Between those two limiting values we observe all the other thresholds, i.e., an output excitation appearing as the result of two or three combined inputs. Therefore, by applying the proper illumination level, the structure shown in Fig. 23a can work as a four-input McCulloch–Pitts neuron with the required threshold. A similarly high sensitivity of the neuron properties to φa is also observed. This means that the neuron properties can be controlled with tiny changes in the system parameters. The chemical neuron illustrated in Fig. 23a can be used to program signal processing with pulses of excitation. If

we set the illuminations such that any two input pulses produce an output and use one of the inputs to control the device, then, if the control pulse is present, the device performs the OR operation on the other inputs. If there is no control pulse, it computes the disjunction of the conjunctions of all pairs of inputs. The geometry of a network constructed with such neurons can easily be controlled if the switchable channels illustrated in Fig. 21b are used to establish or cut connections between processing elements. The state of the memory that controls a channel may depend on the state of the network through a feedback mechanism, which allows for network training. In the chemical programming described above, pulses of concentration of the same reagent


are used to store and process information, and to program the medium. Therefore, the output signal may be used directly to change the network geometry. External programming factors such as illumination or a temperature field [3] are difficult to apply in three-dimensional structures. Pulse-based programming seems easier, provided that the proper geometry of switchable channels and feedbacks is introduced. At first glance it may seem that a practical realization of a network built of chemical neurons would be difficult. However, the geometry of the considered neuron looks quite similar to the structures of phases formed in multicomponent systems. For example, the diamond structure in an oil-water-surfactant system, which appears spontaneously at certain thermodynamic conditions [11], has the form of centers linked with their four nearest neighbors. If the reactants responsible for excitability are soluble in water but not in oil, then the water-rich phase forms the structure of excitable channels and processing elements simply as a result of the thermodynamic conditions. Within a certain range of parameters, such a structure is thermodynamically stable. This means that the network has an auto-repair ability and robustness against unexpected destruction. The sub-excitability of a channel is related to its diameter, so the required value can be obtained by selecting the right composition of the mixture and the conditions at which the phase transition occurs. Moreover, the structure is three-dimensional, which allows for a higher density of processing elements than can be obtained with classical two-dimensional techniques, for example lithography.

Perspectives and Conclusions

Perspectives

In this article we have described a number of simple devices constructed with structured excitable chemical media which process information coded in excitation pulses. All the considered systems process information in an unconventional (non-von Neumann) way, i.e., without an external clock or synchronizing signal that controls the sequence of operations. On the other hand, in many cases the right timing of the performed operations is hidden in the geometrical distribution and sizes of the excitable regions. The described devices can be used as building blocks for more complex systems that process signals formed of excitation pulses. Some of the discussed devices can be controlled with spikes; therefore, there is room for programming and learning. However, further development of applications of structured excitable media for information processing along this line seems to follow the evolution of classical electronic computing. In our opinion it would be

more interesting to step away from this path and learn more about the potential offered by excitable media. It would be interesting to study new types of excitable media suitable for information processing. Electrical analogs of reaction-diffusion systems (see Unconventional Computing, Novel Hardware for) seem promising, because they are more robust than the wetware based on liquid BZ media. In such media, spikes propagate much faster and the spatial scale can be much reduced compared with typical chemical systems. They seem to be promising candidates for the hardware in geometrically oriented problems and direct image processing [4]. However, two interesting properties of chemical reaction-diffusion systems are lost in their electrical analogs. First, chemical information processing systems, unlike the electronic ones, integrate two functions: chemical reactivity and the ability to process information. Having in mind the excitable oxidation of CO on a Pt surface and the potential application of this medium for information processing [23], we can think of catalysts that are able to monitor their activity and report it. Second, the media described in Unconventional Computing, Novel Hardware for are two-dimensional, whereas most of the systems discussed in this article can be realized in three dimensions. The use of two-dimensional media in chemical experiments is mainly related to the significant difficulties in observing three-dimensional chemical excitations [42]. The potential application of phase transitions for generating channel structures, described in the previous section, should increase interest in information processing with three-dimensional media. Studies on more effective methods of information coding are important for the further development of reaction-diffusion computing.

The simplest translation of chemistry into the language of information science, based on the equivalence between the presence of a spike and the logical TRUE value, is certainly not the most efficient. It can be expected that information interpreted within multivalued logic systems is more suitable for transformations with the use of structured media [47]. As an example, let us consider three-valued logic encoded by excitation pulses in the following way: the lack of a pulse represents the logical FALSE (F), two pulses separated by a time δt correspond to the logical TRUE (T), and one pulse is interpreted as a "nonsense" value (?). Let us assume that two signals coded as described above are directed to the inputs of the AND gate illustrated in Fig. 9 and that they are fully synchronized at the input. The gap between the input and output channels is adjusted such that the activator diffusing from a single traveling pulse is not sufficient to generate a new pulse in the output channel, but the excitation resulting from the


collision of facing pulses at the junction point exceeds the threshold and an output pulse is generated. Moreover, let us assume that the system is described by dynamics for which signals are transformed as illustrated in Fig. 7a (for example, FitzHugh–Nagumo dynamics) and that the time difference between spikes δt is selected such that the second spike can pass the gap. The output signal appears when spikes from both input channels are in coincidence or when the second spike from the same input arrives. As a result, the device performs the following function F1 within three-valued logic [47]:

F1 |  T   F   ?
T  |  T   ?   ?
F  |  ?   F   F
?  |  ?   F   ?
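The three-valued operation described in the text can be tabulated directly. The following sketch (illustrative Python; names are ours) encodes the values as 'F' (no pulse), 'T' (two pulses separated by δt) and '?' (a single "nonsense" pulse), and checks the symmetry expected of colliding pulse trains:

```python
# Lookup-table model of the three-valued function realized by the
# AND-gate geometry with an adjusted gap; values follow the pulse
# coding described in the text.

F1 = {
    ('T', 'T'): 'T', ('T', 'F'): '?', ('T', '?'): '?',
    ('F', 'T'): '?', ('F', 'F'): 'F', ('F', '?'): 'F',
    ('?', 'T'): '?', ('?', 'F'): 'F', ('?', '?'): '?',
}

def gate(a, b):
    """Three-valued output of the gate for synchronized inputs a, b."""
    return F1[(a, b)]

# The operation is symmetric, as expected for colliding pulse trains.
assert all(gate(a, b) == gate(b, a) for a in 'TF?' for b in 'TF?')
assert gate('T', 'T') == 'T'   # coincident pairs yield two output pulses
assert gate('F', '?') == 'F'   # a lone single pulse cannot cross the gap
```

Note that the table is not an AND of classical logic restricted to {T, F}: a lone TRUE input already yields a single output pulse ('?'), because the second spike of the pair passes the gap on its own.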

The same operation, if performed on a classical computer, would require a procedure that measures the time between spikes. The structured excitable medium performs it naturally, provided that the time δt is matched to the properties of the medium and the geometry of the gap. Of course, similar devices that process variables of n-valued logic can also be constructed with structured excitable media.

Conclusions

In this article we have presented a number of examples that should convince the reader that structured excitable media can be used for information processing. Future research will verify whether this branch of computer science is fruitful. It would be important to find new algorithms that can be efficiently executed using a structured excitable medium. However, the ultimate test for the usefulness of these ideas should come from biology. It is commonly accepted that excitable behavior is responsible for information processing and coding in living organisms [8,31,36,60]. We believe that studies on chemical information processing will help us to better understand these problems. And although, at the moment, computing with a homogeneous excitable medium seems to offer more applications than computing with a structured one, we believe the proportions will be reversed in the future. After all, our brains are not made of a single piece of a homogeneous excitable medium.

Acknowledgments

The research on information processing with structured excitable media has been supported by the Polish State Committee for Scientific Research, project 1 P03B 035 27.

Bibliography

Primary Literature

1. Adamatzky A, De Lacy Costello B (2002) Experimental logical gates in a reaction-diffusion medium: The XOR gate and beyond. Phys Rev E 66:046112
2. Adamatzky A (2004) Collision-based computing in Belousov–Zhabotinsky medium. Chaos Soliton Fractal 21(5):1259–1264
3. Adamatzky A (2005) Programming Reaction-Diffusion Processors. In: Banatre J-P, Fradet P, Giavitto J-L, Michel O (eds) LNCS, vol 3566. Springer, pp 47–55
4. Adamatzky A, De Lacy Costello B, Asai T (2005) Reaction-Diffusion Computers. Elsevier, UK
5. Agladze K, Aliev RR, Yamaguchi T, Yoshikawa K (1996) Chemical diode. J Phys Chem 100:13895–13897
6. Agladze K, Magome N, Aliev R, Yamaguchi T, Yoshikawa K (1997) Finding the optimal path with the aid of chemical wave. Phys D 106:247–254
7. Agladze K, Tóth Á, Ichino T, Yoshikawa K (2000) Propagation of Chemical Pulses at the Boundary of Excitable and Inhibitory Fields. J Phys Chem A 104:6677–6680
8. Agmon-Snir H, Carr CE, Rinzel J (1998) The role of dendrites in auditory coincidence detection. Nature 393:268–272
9. Amemiya T, Ohmori T, Yamaguchi T (2000) An Oregonator-class model for photoinduced behavior in the Ru(bpy)3(2+)-catalyzed Belousov–Zhabotinsky reaction. J Phys Chem A 104:336–344
10. Armstrong GR, Taylor AF, Scott SK, Gaspar V (2004) Modelling wave propagation across a series of gaps. Phys Chem Chem Phys 6:4677–4681
11. Babin V, Ciach A (2003) Response of the bicontinuous cubic D phase in amphiphilic systems to compression or expansion. J Chem Phys 119:6217–6231
12. Bertram M, Mikhailov AS (2003) Pattern formation on the edge of chaos: Mathematical modeling of CO oxidation on a Pt(110) surface under global delayed feedback. Phys Rev E 67:036207
13. Dolnik M, Finkeova I, Schreiber I, Marek M (1989) Dynamics of forced excitable and oscillatory chemical-reaction systems. J Phys Chem 93:2764–2774; Finkeova I, Dolnik M, Hrudka B, Marek M (1990) Excitable chemical reaction systems in a continuous stirred tank reactor. J Phys Chem 94:4110–4115; Dolnik M, Marek M (1991) Phase excitation curves in the model of forced excitable reaction system. J Phys Chem 95:7267–7272; Dolnik M, Marek M, Epstein IR (1992) Resonances in periodically forced excitable systems. J Phys Chem 96:3218–3224
14. Epstein IR, Showalter K (1996) Nonlinear Chemical Dynamics: Oscillations, Patterns, and Chaos. J Phys Chem 100:13132–13147
15. Feynman RP, Allen RW, Heywould T (2000) Feynman Lectures on Computation. Perseus Books, New York
16. Field RJ, Körös E, Noyes RM (1972) Oscillations in chemical systems. II. Thorough analysis of temporal oscillation in the bromate-cerium-malonic acid system. J Am Chem Soc 94:8649–8664
17. FitzHugh R (1960) Thresholds and plateaus in the Hodgkin-Huxley nerve equations. J Gen Physiol 43:867–896
18. FitzHugh R (1961) Impulses and physiological states in theoretical models of nerve membrane. Biophys J 1:445–466
19. Field RJ, Noyes RM (1974) Oscillations in chemical systems. IV.


Limit cycle behavior in a model of a real chemical reaction. J Chem Phys 60:1877–1884
20. Gaspar V, Bazsa G, Beck MT (1983) The influence of visible light on the Belousov–Zhabotinskii oscillating reactions applying different catalysts. Z Phys Chem (Leipzig) 264:43–48
21. Ginn BT, Steinbock B, Kahveci M, Steinbock O (2004) Microfluidic Systems for the Belousov–Zhabotinsky Reaction. J Phys Chem A 108:1325–1332
22. Gorecka J, Gorecki J (2003) T-shaped coincidence detector as a band filter of chemical signal frequency. Phys Rev E 67:067203
23. Gorecka J, Gorecki J (2005) On one dimensional chemical diode and frequency generator constructed with an excitable surface reaction. Phys Chem Chem Phys 7:2915–2920
24. Gorecka J, Gorecki J (2006) Multiargument logical operations performed with excitable chemical medium. J Chem Phys 124:084101 (1–5)
25. Gorecka J, Gorecki J, Igarashi Y (2007) One dimensional chemical signal diode constructed with two nonexcitable barriers. J Phys Chem A 111:885–889
26. Gorecki J, Kawczynski AL (1996) Molecular dynamics simulations of a thermochemical system in bistable and excitable regimes. J Phys Chem 100:19371–19379
27. Gorecki J, Yoshikawa K, Igarashi Y (2003) On chemical reactors that can count. J Phys Chem A 107:1664–1669
28. Gorecki J, Gorecka JN, Yoshikawa K, Igarashi Y, Nagahara H (2005) Sensing the distance to a source of periodic oscillations in a nonlinear chemical medium with the output information coded in frequency of excitation pulses. Phys Rev E 72:046201 (1–7)
29. Gorecki J, Gorecka JN (2006) Information processing with chemical excitations – from instant machines to an artificial chemical brain. Int J Unconv Comput 2:321–336
30. Haken H (2002) Brain Dynamics. In: Springer Series in Synergetics. Springer, Berlin
31. Häusser M, Spruston N, Stuart GJ (2000) Diversity and Dynamics of Dendritic Signaling. Science 290:739–744
32. Hertz J, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. Addison-Wesley, Redwood City
33. Ichino T, Igarashi Y, Motoike IN, Yoshikawa K (2003) Different operations on a single circuit: Field computation on an Excitable Chemical System. J Chem Phys 118:8185–8190
34. Igarashi Y, Gorecki J, Gorecka JN (2006) Chemical information processing devices constructed using a nonlinear medium with controlled excitability. Lect Note Comput Science 4135:130–138
35. Kapral R, Showalter K (1995) Chemical Waves and Patterns. Kluwer, Dordrecht
36. Kindzelskii AL, Petty HR (2003) Intracellular calcium waves accompany neutrophil polarization, formylmethionylleucylphenylalanine stimulation, and phagocytosis: a high speed microscopy study. J Immunol 170:64–72
37. Krischer K, Eiswirth M, Ertl GJ (1992) Oscillatory CO oxidation on Pt(110): modelling of temporal self-organization. J Chem Phys 96:9161–9172
38. Krug HJ, Pohlmann L, Kuhnert L (1990) Analysis of the modified complete Oregonator accounting for oxygen sensitivity and photosensitivity of Belousov–Zhabotinskii systems. J Phys Chem 94:4862–4866
39. Kuramoto Y (1984) Chemical oscillations, waves, and turbulence. Springer, Berlin

40. Kusumi T, Yamaguchi T, Aliev RR, Amemiya T, Ohmori T, Hashimoto H, Yoshikawa K (1997) Numerical study on time delay for chemical wave transmission via an inactive gap. Chem Phys Lett 271:355–360
41. Lázár A, Noszticzius Z, Försterling H-D, Nagy-Ungvárai Z (1995) Chemical pulses in modified membranes I. Developing the technique. Physica D 84:112–119; Volford A, Simon PL, Farkas H, Noszticzius Z (1999) Rotating chemical waves: theory and experiments. Physica A 274:30–49
42. Luengviriya C, Storb U, Hauser MJB, Müller SC (2006) An elegant method to study an isolated spiral wave in a thin layer of a batch Belousov–Zhabotinsky reaction under oxygen-free conditions. Phys Chem Chem Phys 8:1425–1429
43. Manz N, Müller SC, Steinbock O (2000) Anomalous dispersion of chemical waves in a homogeneously catalyzed reaction system. J Phys Chem A 104:5895–5897; Steinbock O (2002) Excitable Front Geometry in Reaction-Diffusion Systems with Anomalous Dispersion. Phys Rev Lett 88:228302
44. Maselko J, Reckley JS, Showalter K (1989) Regular and irregular spatial patterns in an immobilized-catalyst Belousov–Zhabotinsky reaction. J Phys Chem 93:2774–2780
45. Mikhailov AS, Showalter K (2006) Control of waves, patterns and turbulence in chemical systems. Phys Rep 425:79–194
46. Morozov VG, Davydov NV, Davydov VA (1999) Propagation of Curved Activation Fronts in Anisotropic Excitable Media. J Biol Phys 25:87–100
47. Motoike IN, Adamatzky A (2005) Three-valued logic gates in reaction-diffusion excitable media. Chaos, Solitons & Fractals 24:107–114
48. Motoike I, Yoshikawa K (1999) Information Operations with an Excitable Field. Phys Rev E 59:5354–5360
49. Motoike IN, Yoshikawa K, Iguchi Y, Nakata S (2001) Real-Time Memory on an Excitable Field. Phys Rev E 63:036220 (1–4)
50. Motoike IN, Yoshikawa K (2003) Information operations with multiple pulses on an excitable field. Chaos, Solitons & Fractals 17:455–461
51. Murray JD (1989) Mathematical Biology. Springer, Berlin
52.
Nagai Y, Gonzalez H, Shrier A, Glass L (2000) Paroxysmal Starting and Stopping of Circulating Pulses in Excitable Media. Phys Rev Lett 84:4248–4251
53. Nagahara H, Ichino T, Yoshikawa K (2004) Direction detector on an excitable field: Field computation with coincidence detection. Phys Rev E 70:036221 (1–5)
54. Nagumo J, Arimoto S, Yoshizawa S (1962) An Active Pulse Transmission Line Simulating Nerve Axon. Proc IRE 50:2061–2070
55. Noszticzius Z, Horsthemke W, McCormick WD, Swinney HL, Tam WY (1987) Sustained chemical pulses in an annular gel reactor: a chemical pinwheel. Nature 329:619–620
56. Plaza F, Velarde MG, Arecchi FT, Boccaletti S, Ciofini M, Meucci R (1997) Excitability following an avalanche-collapse process. Europhys Lett 38:85–90
57. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical Recipes: The Art of Scientific Computing, 3rd edn. Available via http://www.nr.com
58. Rambidi NG, Maximychev AV (1997) Towards a Biomolecular Computer. Information Processing Capabilities of Biomolecular Nonlinear Dynamic Media. BioSyst 41:195–211
59. Rambidi NG, Yakovenchuk D (1999) Finding paths in a labyrinth based on reaction-diffusion media. BioSyst 51:67–72; Rambidi NG, Yakovenchuk D (2001) Chemical reaction-diffusion implementation of finding the shortest paths in a labyrinth. Phys Rev E 63:026607
60. Rambidi NG (2005) Biologically Inspired Information Processing Technologies: Reaction-Diffusion Paradigm. Int J Unconv Comp 1:101–121
61. Rovinsky AB, Zhabotinsky AM (1984) Mechanism and mathematical model of the oscillating bromate-ferroin-bromomalonic acid reaction. J Phys Chem 88:6081–6084
62. Rovinsky AB (1986) Spiral waves in a model of the ferroin catalyzed Belousov–Zhabotinskii reaction. J Phys Chem 90:217–219
63. Sielewiesiuk J, Gorecki J (2002) Chemical Waves in an Excitable Medium: Their Features and Possible Applications in Information Processing. In: Klonowski W (ed) Attractors, Signals and Synergetics. 1st European Interdisciplinary School on Nonlinear Dynamics for System and Signal Analysis Euroattractor 2000, Warsaw, 6–15 June 2000. Pabst, Lengerich, pp 448–460
64. Sielewiesiuk J, Gorecki J (2002) On complex transformations of chemical signals passing through a passive barrier. Phys Rev E 66:016212; Sielewiesiuk J, Gorecki J (2002) Passive barrier as a transformer of chemical signal frequency. J Phys Chem A 106:4068–4076
65. Sielewiesiuk J, Gorecki J (2001) Chemical impulses in the perpendicular junction of two channels. Acta Phys Pol B 32:1589–1603
66. Sielewiesiuk J, Gorecki J (2001) Logical functions of a cross junction of excitable chemical media. J Phys Chem A 105:8189–8195
67. Steinbock O, Toth A, Showalter K (1995) Navigating complex labyrinths – optimal paths from chemical waves. Science 267:868–871
68. Steinbock O, Kettunen P, Showalter K (1995) Anisotropy and Spiral Organizing Centers in Patterned Excitable Media. Science 269:1857–1860
69. Steinbock O, Kettunen P (1996) Chemical clocks on the basis of rotating pulses. Measuring irrational numbers from period ratios. Chem Phys Lett 251:305–308

70. Steinbock O, Kettunen P, Showalter K (1996) Chemical Pulse Logic Gates. J Phys Chem 100:18970–18975 71. Suzuki K, Yoshinobu T, Iwasaki H (1999) Anisotropic waves propagating on two–dimensional arrays of Belousov– Zhabotinsky oscillators. Jpn J Appl Phys 38:L345–L348 72. Suzuki K, Yoshinobu T, Iwasaki H (2000) Unidirectional propagation of chemical waves through microgaps between zones with different excitability. J Phys Chem A 104:6602– 6608 73. Suzuki R (1967) Mathematical analysis and application of iron-wire neuron model. IEEE Trans Biomed Eng 14:114– 124 74. Taylor AF, Armstrong GR, Goodchild N, Scott SK (2003) Propagation of chemical waves across inexcitable gaps. Phys Chem Chem Phys 5:3928–3932 75. Toth A, Showalter K (1995) Logic gates in excitable media. J Chem Phys 103:2058–2066 76. Toth A, Horvath D, Yoshikawa K (2001) Unidirectional wave propagation in one spatial dimension. Chem Phys Lett 345: 471–474 77. Yoshikawa K, Nagahara H, Ichino T, Gorecki J, Gorecka JN, Igarashi Y (2009) On Chemical Methods of Direction and Distance Sensing. Int J Unconv Comput 5:53–65

Books and Reviews Hjelmfelt A, Ross J (1994) Pattern recognition, chaos, and multiplicity in neural networks of excitable systems. Proc Natl Acad Sci USA 91:63–67 Nakata S (2003) Chemical Analysis Based on Nonlinearity. Nova, New York Rambidi NG (1998) Neural network devices based on reaction– diffusion media: an Approach to artificial retina. Supramol Sci 5:765–767 Storb U, Müller SC (2004) Scroll waves. In: Scott A (ed) Encyclopedia of nonlinear sciences. Routledge, Taylor and Francis Group, New York, pp 825–827


Computing with Solitons

DARREN RAND 1, KEN STEIGLITZ 2
1 Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, USA
2 Computer Science Department, Princeton University, Princeton, USA

Article Outline
Glossary
Definition of the Subject
Introduction
Manakov Solitons
Manakov Soliton Computing
Multistable Soliton Collision Cycles
Experiments
Future Directions
Bibliography

Glossary
Integrable This term is generally used in more than one way and in different contexts. For the purposes of this article, a partial differential equation or system of partial differential equations is integrable if it can be solved explicitly to yield solitons (qv).
Manakov system A system of two cubic Schrödinger equations where the self- and cross-phase modulation terms have equal weight.
Nonlinear Schrödinger equation A partial differential equation that has the same form as the Schrödinger equation of quantum mechanics, with a term nonlinear in the dependent variable, and for the purposes of this article, interpreted classically.
Self- and cross-phase modulation Any terms in a nonlinear Schrödinger equation that involve nonlinear functions of the dependent variable of the equation, or nonlinear functions of a dependent variable of another (coupled) equation, respectively.
Solitary wave A solitary wave is a wave characterized by undistorted propagation. Solitary waves do not in general maintain their shape under perturbations or collisions.
Soliton A soliton is a solitary wave which is also robust under perturbations and collisions.
Turing equivalent Capable of simulating any Turing Machine, and hence by Turing's Thesis capable of performing any computation that can be carried out by a sequence of effective instructions on a finite amount of data. A machine that is Turing equivalent is therefore as powerful as any digital computer. Sometimes a device that is Turing equivalent is called "universal."

Definition of the Subject
Solitons are localized, shape-preserving waves characterized by robust collisions. First observed as water waves by John Scott Russell [29] in the Union Canal near Edinburgh and subsequently recreated in the laboratory, solitons arise in a variety of physical systems, as both temporal pulses which counteract dispersion and spatial beams which counteract diffraction. Solitons with two components, vector solitons, are computationally universal due to their remarkable collision properties. In this article, we describe in detail the characteristics of Manakov solitons, a specific type of vector soliton, and their applications in computing.

Introduction
In this section, we review the basic principles of soliton theory and spotlight relevant experimental results. Interestingly, the phenomena of soliton propagation and collision occur in many physical systems despite the diversity of mechanisms that bring about their existence. For this reason, the discussion in this article will treat temporal and spatial solitons interchangeably, unless otherwise noted.

Scalar Solitons
A pulse in optical fiber undergoes dispersion, or temporal spreading, during propagation. This effect arises because the refractive index of the silica glass is not constant, but is rather a function of frequency. The pulse can be decomposed into a frequency range: the shorter the pulse, the broader its spectral width. The frequency dependence of the refractive index will cause the different frequencies of the pulse to propagate at different velocities, giving rise to dispersion. As a result, the pulse develops a chirp, meaning that the individual frequency components are not evenly distributed throughout the pulse. There are two types of dispersion: normal and anomalous. If the longer wavelengths travel faster, the medium is said to have normal dispersion.
If the opposite is true, the medium has anomalous dispersion.
The response of a dielectric such as optical fiber is nonlinear. Most of the nonlinear effects in fiber originate from nonlinear refraction, where the refractive index n depends on the intensity of the propagating field according to the relation

n = n0 + n2 |E|² ,   (1)


where n0 is the linear part of the refractive index, |E|² is the optical intensity, and n2 is the coefficient of nonlinear contribution to the refractive index. Because the material responds almost instantaneously, on the order of femtoseconds, and because the phase shift φ is proportional to n, each component of an intense optical pulse sees a phase shift proportional to its intensity. Since the frequency shift δω = −∂φ/∂t, the leading edge of the pulse is red-shifted (δω < 0), while the trailing edge is blue-shifted (δω > 0), an effect known as self-phase modulation (SPM). As a result, if the medium exhibits normal dispersion, the pulse is broadened; for anomalous dispersion, the pulse is compressed. Under the proper conditions, this pulse compression can exactly cancel the linear, dispersion-induced broadening, resulting in distortionless soliton propagation. For more details, see the book by Agrawal [3]. The idealized mathematical model for this pulse propagation is the nonlinear Schrödinger equation (NLSE):

i ∂u/∂z ± (1/2) ∂²u/∂x² + |u|² u = 0 ,   (2)

where u(z, x) is the complex-valued field envelope, z is a normalized propagation distance and x is normalized time propagating with the group velocity of the pulse. The second and third terms describe dispersion and the intensity-dependent Kerr nonlinearity, respectively. The coefficient of the dispersion term is positive for anomalous dispersion and negative for normal dispersion. Equation (2), known as the scalar NLSE, is integrable; that is, it can be solved analytically, and collisions between solitons are "elastic," in that no change in amplitude or velocity occurs as a result of a collision. Zakharov and Shabat [38] first solved this equation analytically using the inverse scattering method. It describes, for example, the propagation of picosecond or longer pulses propagating in lossless optical fiber.

Two solitons at different wavelengths will collide in an optical fiber due to dispersion-induced velocity differences. A schematic of such a collision is depicted in Fig. 1. The scalar soliton collision is characterized by two phenomena, a position shift and a phase shift, both of which can be understood in the same intuitive way. During collision, there will be a local increase in intensity, causing a local increase in the fiber's refractive index, according to Eq. (1). As a result, both the soliton velocity and phase will be affected during the collision.

From an all-optical signal processing perspective, the phase and position shifts in a soliton collision are not useful. This is because these effects are independent of any soliton properties that are changed by collision; that is, the result of one collision will not affect the result of subsequent collisions. Scalar solitons are therefore not useful for complex logic or computing, which depend on multiple, cascaded interactions. Despite this setback, it was discovered later that a system similar to the scalar NLSE, the Manakov system [19], possesses very rich collisional properties [26] and is integrable as well. Manakov solitons are a specific instance of two-component vector solitons, and it has been shown that collisions of Manakov solitons are capable of transferring information via changes in a complex-valued polarization state [16].

Computing with Solitons, Figure 1 Schematic of a scalar soliton collision, in which amplitude and velocities are unchanged. The two soliton collision effects are a position shift (depicted through the translational shift in the soliton path) and phase shift (not pictured)

Vector Solitons
When several field components, distinguished by polarization and/or frequency, propagate in a nonlinear medium, the nonlinear interaction between them must be considered as well. This interaction between field components results in intensity-dependent nonlinear coupling terms analogous to the self-phase modulation term in the scalar case. Such a situation gives rise to a set of coupled nonlinear Schrödinger equations, and may allow for propagation of vector solitons. For the case of two components propagating in an ideal medium with no higher-order effects and only intensity-dependent nonlinear coupling, the equations become:

i ∂u1/∂z + ∂²u1/∂x² + 2μ(|u1|² + α|u2|²) u1 = 0 ,
i ∂u2/∂z + ∂²u2/∂x² + 2μ(|u2|² + α|u1|²) u2 = 0 ,   (3)


where u1(z, x) and u2(z, x) are the complex-valued pulse envelopes for each component, μ is a nonlinearity parameter, and α describes the ratio between self- and cross-phase modulation contributions to the overall nonlinearity. Only for the special case of α = 1 are Eqs. (3) integrable. First solved using the method of inverse scattering by Manakov [19], Eqs. (3) admit solutions known as Manakov solitons. For nonintegrable cases (α ≠ 1), some analytical solitary-wave solutions are known for specific cases, although in general a numerical approach is required [36]. The specific case of α = 2/3, for example, corresponds to linearly birefringent polarization-maintaining fiber, and will be considered in more detail in Sect. "Experiments".

Due to their multicomponent structure, vector solitons have far richer collision dynamics than their scalar, one-component counterparts. Recall that scalar collisions are characterized by phase and position shifts only. Vector soliton collisions also exhibit these effects, with the added feature of possible intensity redistributions between the component fields [19,26]. This process is shown schematically in Fig. 2. In the collision, two conservation relations are satisfied: (i) the energy in each soliton is conserved and (ii) the energy in each component is conserved. It can be seen that when the amplitude of one component in a soliton increases as a result of the collision, the other component decreases, with the opposite exchange in the second soliton. The experimental observation of this effect will be discussed in Sect. "Experiments". In addition to fundamental interest in such solitons, collisions of vector solitons make possible unique applications, including collision-based logic and universal computation [16,27,34,35], as discussed in Sect. "Manakov Soliton Computing".

Computing with Solitons, Figure 2 Schematic of a vector soliton collision, which exhibits a position shift and phase shift (not pictured), similar to the scalar soliton collision (cf. Fig. 1). Vector soliton collisions also display an energy redistribution among the component fields, shown here as two orthogonal polarizations. Arrows indicate direction of energy redistribution

Manakov Solitons
As mentioned in Sect. "Introduction", computation is possible using vector solitons because of an energy redistribution that occurs in a collision. In this section, we provide the mathematical background of Manakov soliton theory, in order to understand soliton computing and a remarkable way to achieve bistability using soliton collisions, as described in Sects. "Manakov Soliton Computing" and "Multistable Soliton Collision Cycles", respectively. The Manakov system consists of two coupled NLSEs [19]:

i ∂q1/∂z + ∂²q1/∂x² + 2μ(|q1|² + |q2|²) q1 = 0 ,
i ∂q2/∂z + ∂²q2/∂x² + 2μ(|q1|² + |q2|²) q2 = 0 ,   (4)

where q1(x, z) and q2(x, z) are two interacting optical components, μ is a positive parameter representing the strength of the nonlinearity, and x and z are normalized space and propagation distance, respectively. As mentioned in Sect. "Vector Solitons", the Manakov system is a special case of Eqs. (3) with α = 1. The two components can be thought of as components in two polarizations, or, as in the case of a photorefractive crystal, two uncorrelated beams [11]. Manakov first solved Eqs. (4) by the method of inverse scattering [19]. The system admits single-soliton, two-component solutions that can be characterized by a complex number k ≡ kR + i kI, where kR represents the energy of the soliton and kI the velocity, all in normalized units. The additional soliton parameter is the complex-valued polarization state ρ ≡ q1/q2, defined as the (z- and x-independent) ratio between the q1 and q2 components. Figure 3 shows the schematic for a general two-soliton collision, with initial parameters ρ1, k1 and ρL, k2, corresponding to the right-moving and left-moving solitons, respectively. The values of k1 and k2 remain constant during collision, but in general the polarization state changes. Let ρ1 and ρL denote the respective soliton states before impact, and suppose the collision transforms ρ1 into ρR, and ρL into ρ2. It turns out that the state change undergone


by each colliding soliton takes on the very simple form of a linear fractional transformation (also called a bilinear or Möbius transformation). Explicitly, the state of the emerging left-moving soliton is given by [16]:

ρ2 = ( [(1 − g)/ρ1* + ρ1] ρL + g ρ1/ρ1* ) / ( g ρL + (1 − g) ρ1 + 1/ρ1* ) ,   (5)

where the asterisk denotes complex conjugation and

g ≡ (k1 + k1*)/(k2 + k1*) .   (6)

The state of the right-moving soliton is obtained similarly, and is

ρR = ( [(1 − h*)/ρL* + ρL] ρ1 + h* ρL/ρL* ) / ( h* ρ1 + (1 − h*) ρL + 1/ρL* ) ,   (7)

where

h ≡ (k2 + k2*)/(k1 + k2*) .   (8)

We assume here, without loss of generality, that k1R, k2R > 0. Several properties of the linear fractional transformations in Eqs. (5) and (7) are derived in [16], including the characterization of inverse operators, fixed points, and implicit forms. In particular, when viewed as an operator every soliton has an inverse, which will undo the effect of the operator on the state. Note that this requires that the inverse operator have the same k parameter as the original, a condition that will hold in our application of computing in the next section.

These state transformations were first used by Jakubowski et al. [16] to describe logical operations such as NOT. Later, Steiglitz [34] established that arbitrary computation was possible through time gating of Manakov (1+1)-dimensional spatial solitons. We will describe this in Sect. "Manakov Soliton Computing". There exist several candidates for the physical realization of Manakov solitons, including photorefractive crystals [4,5,9,11,30], semiconductor waveguides [17], quadratic media [33], and optical fiber [23,28]. In Sect. "Experiments", we discuss in detail an experiment with vector solitons in linearly birefringent optical fiber.

Computing with Solitons, Figure 3 Schematic of a general two-soliton collision. Each soliton is characterized by a complex-valued polarization state ρ and complex parameter k. Reprinted with permission from [34]. Copyright by the American Physical Society
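The collision transformations are easy to experiment with numerically. The sketch below is illustrative only: the k values and test states are assumptions chosen for demonstration, and the formulas of Eqs. (5) and (7) are rewritten with numerator and denominator multiplied through by the conjugate state so that a state of 0 needs no special handling. The check at the end exploits the linear-fractional (Möbius) character of the map: for a fixed ρ1, the map ρL ↦ ρ2 is a ratio of expressions linear in ρL, and any such transformation preserves the cross-ratio of four points.

```python
# Illustrative implementation of the collision state transformations,
# Eqs. (5)-(8). The soliton parameters k1, k2 and the states below are
# assumptions for demonstration, not values taken from the article.

def g_param(k1, k2):
    # Eq. (6): g = (k1 + k1*) / (k2 + k1*)
    return (k1 + k1.conjugate()) / (k2 + k1.conjugate())

def h_param(k1, k2):
    # Eq. (8): h = (k2 + k2*) / (k1 + k2*)
    return (k2 + k2.conjugate()) / (k1 + k2.conjugate())

def L_map(rho1, rhoL, k1, k2):
    # Eq. (5), multiplied through by rho1* so that rho1 = 0 is regular:
    # outgoing state of the left-moving soliton.
    g = g_param(k1, k2)
    p = rho1 * rho1.conjugate()              # |rho1|^2
    num = ((1 - g) + p) * rhoL + g * rho1
    den = g * rho1.conjugate() * rhoL + (1 - g) * p + 1
    return num / den

def R_map(rho1, rhoL, k1, k2):
    # Eq. (7), multiplied through by rhoL*: outgoing state of the
    # right-moving soliton.
    hs = h_param(k1, k2).conjugate()
    q = rhoL * rhoL.conjugate()              # |rhoL|^2
    num = ((1 - hs) + q) * rho1 + hs * rhoL
    den = hs * rhoL.conjugate() * rho1 + (1 - hs) * q + 1
    return num / den

def cross_ratio(a, b, c, d):
    return ((a - c) * (b - d)) / ((a - b) * (c - d))

# One fixed right-moving soliton acting on four different left-moving states.
k1, k2 = 1 + 1j, 1 - 1j          # assumed: unit energies, velocities +1 / -1
rho1 = 0.3 + 0.2j
pts = [0.1 + 0j, 0.5 + 0.5j, -0.3 + 0.2j, 1.2 - 0.7j]
imgs = [L_map(rho1, w, k1, k2) for w in pts]
```

Cross-ratio invariance is exactly the fingerprint of a Möbius transformation, so the assertion below is a direct numerical confirmation that the collision acts on the polarization state as such an operator.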

Manakov Soliton Computing
We described in the previous section how collisions of Manakov solitons can be described by transformations of a complex-valued state which is the ratio between the two Manakov components. We show in this section that general computation is possible if we use (1+1)-dimensional spatial solitons that are governed by the Manakov equations and if we are allowed to time-gate the beams input to the medium. The result is a dynamic computer without spatially fixed gates or wires, which is unlike most present-day conceptions of a computer that involve integrated circuits, in which information travels between logical elements that are fixed spatially through fabrication on a silicon wafer. We can call such a scheme "nonlithographic," in the sense that there is no architecture imprinted on the medium.

The requirements for computation include cascadability, fanout, and Boolean completeness. The first, cascadability, requires that the output of one device can serve as input to another. Since any useful computation consists of many stages of logic, this condition is essential. The second, fanout, refers to the ability of a logic gate to drive at least two similar gates. Finally, Boolean completeness makes it possible to perform arbitrary computation.

We should emphasize that although the model we use is meant to reflect known physical phenomena, at least in the limit of ideal behavior, the result is a mathematical one. Practical considerations of size and speed are not considered here, nor are questions of error propagation. In this sense the program of this article is analogous to Fredkin and Toffoli [13] for ideal billiard balls, and Shor [31] for quantum mechanics. There are, however, several candidates for physical instantiation of the basic ideas in this paper, as noted in the previous section.

Although we are describing computation embedded in a homogeneous medium, and not interconnected gates in the usual sense of the word, we will nevertheless use the term gates to describe prearranged sequences of soliton collisions that effect logical operations. We will in fact adopt other computer terms to our purpose, such as wiring to represent the means of moving information from one place to another, and memory to store it in certain ways for future use.

We will proceed in the construction of what amounts to a complete computer in the following stages: First we will describe a basic gate that can be used for FANOUT. Then we will show how the same basic configuration can be used for NOT, and finally, NAND. Then we will describe ways to use time gating of the input beams to interconnect signals. The NAND gate, FANOUT, and interconnect are sufficient to implement any computer, and we conclude with a layout scheme for a general-purpose, and hence Turing-equivalent, computer. The general picture of the physical arrangement is shown in Fig. 4.

Computing with Solitons, Figure 4 The general physical arrangement considered in this paper. Time-gated beams of spatial Manakov solitons enter at the top of the medium, and their collisions result in state changes that reflect computation. Each solid arrow represents a beam segment in a particular state. Reprinted with permission from [34]. Copyright by the American Physical Society

Computing with Solitons, Figure 5 Colliding spatial solitons. Reprinted with permission from [34]. Copyright by the American Physical Society

Figure 5 shows the usual picture of colliding solitons, which can work interchangeably for the case of temporal

or spatial solitons. It is convenient for visualization purposes to turn the picture and adjust the scale so the axes are horizontal and vertical, as in Fig. 6.

Computing with Solitons, Figure 6 Convenient representation of colliding spatial solitons. Reprinted with permission from [34]. Copyright by the American Physical Society

We will use binary logic, with two distinguished, distinct complex numbers representing TRUE and FALSE, called 1 and 0, respectively. In fact, it turns out to be possible to use complex 1 and 0 for these two state values, and we will do that throughout this paper, but this is a convenience and not at all a necessity. We will thus use complex polarization states 1 and 0 and logical 1 and 0 interchangeably.

FANOUT
We construct the FANOUT gate by starting with a COPY gate, implemented with collisions between three down-moving, vertical solitons and one left-moving horizontal soliton. Figure 7 shows the arrangement. The soliton state labeled in will carry a logical value, and so be in one of the two states 0 or 1. The left-moving soliton labeled actuator will be in the fixed state 0, as will be the case throughout this paper. The plan is to adjust the (so far) arbitrary states z and y so that out = in, justifying the name COPY. It is reasonable to expect that this might be possible, because there are four degrees of freedom in the two complex numbers z and y, and two complex equations to satisfy: that out be 1 and 0 when in is 1 and 0, respectively.

Computing with Solitons, Figure 7 COPY gate. Reprinted with permission from [34]. Copyright by the American Physical Society

Values that satisfy these four equations in four unknowns were obtained numerically. We will call them zc and yc. It is not always possible to solve these equations; Ablowitz et al. [1] showed that a unique solution is guaranteed to exist in certain parameter regimes. However, explicit solutions have been found for all the cases used in this section, and are given in Table 1.

To be more specific about the design problem, write Eq. (5) as the left-moving product ρ2 = L(ρ1, ρL), and similarly write Eq. (7) as ρR = R(ρ1, ρL). The successive left-moving products in Fig. 7 are L(in, 0) and L(y, L(in, 0)). The out state is then R(z, L(y, L(in, 0))). The stipulation that 0 maps to 0 and 1 maps to 1 is expressed by the following two simultaneous complex equations in two complex unknowns:

R(z, L(y, L(0, 0))) = 0 ,
R(z, L(y, L(1, 0))) = 1 .

It is possible to solve for z as a function of y and then eliminate z from the equations, yielding one complex equation in the one complex unknown y. This is then solved numerically by grid search and successive refinement. There is no need for efficiency here, since we will require solutions in only a small number of cases.
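The grid search just described can be sketched in a few lines. Everything numeric here is an assumption for illustration: the k parameters, the coarseness of the grid, and the joint search over z and y (the text instead eliminates z analytically first). The point is only to show the shape of the computation, not to reproduce the article's numbers.

```python
# Toy version of the COPY-gate design search: find (z, y) on a coarse
# grid minimizing |out(0) - 0| + |out(1) - 1|, where, as in the text,
# out(in) = R(z, L(y, L(in, 0))). The k values below are assumptions.

def L_map(rho1, rhoL, k1, k2):
    # Eq. (5), regularized by multiplying through by rho1*.
    g = (k1 + k1.conjugate()) / (k2 + k1.conjugate())
    p = rho1 * rho1.conjugate()
    return (((1 - g) + p) * rhoL + g * rho1) / \
           (g * rho1.conjugate() * rhoL + (1 - g) * p + 1)

def R_map(rho1, rhoL, k1, k2):
    # Eq. (7), regularized by multiplying through by rhoL*.
    hs = ((k2 + k2.conjugate()) / (k1 + k2.conjugate())).conjugate()
    q = rhoL * rhoL.conjugate()
    return (((1 - hs) + q) * rho1 + hs * rhoL) / \
           (hs * rhoL.conjugate() * rho1 + (1 - hs) * q + 1)

def out_state(z, y, in_state, k1, k2):
    a1 = L_map(in_state, 0j, k1, k2)   # actuator (state 0) hits "in"
    a2 = L_map(y, a1, k1, k2)          # ... then hits y
    return R_map(z, a2, k1, k2)        # ... then hits z; z's continuation

def copy_error(z, y, k1, k2):
    return (abs(out_state(z, y, 0j, k1, k2)) +
            abs(out_state(z, y, 1 + 0j, k1, k2) - 1))

k1, k2 = 1 + 1j, 1 - 1j                # assumed soliton parameters
grid = [x + 1j * v for x in (-1, -0.5, 0, 0.5, 1)
                   for v in (-1, -0.5, 0, 0.5, 1)]
best = min(((copy_error(z, y, k1, k2), z, y)
            for z in grid for y in grid), key=lambda t: t[0])
```

In practice one would iterate, shrinking the grid around the best candidate: exactly the "grid search and successive refinement" the text mentions. The coarse pass here only localizes a starting point.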

Computing with Solitons, Figure 8 FANOUT gate. Reprinted with permission from [34]. Copyright by the American Physical Society

To make a FANOUT gate, we need to recover the input, which we can do using a collision with a soliton in the state which is the inverse of 0, namely 1 [16]. Figure 8 shows the complete FANOUT gate. Notice that we indicate collisions with a dot at the intersection of paths, and require that the continuation of the inverse soliton not intersect the continuation of z that it meets. We indicate that by a broken line, and postpone the explanation of how this "wire crossing" is accomplished. It is immaterial whether the continuation of the inverse operator hits the continuation of y, because it is not used later. We call such solitons garbage solitons.

NOT and ONE Gates
In the same way we designed the complex pair of states (zc, yc) to produce a COPY and FANOUT gate, we can find a pair (zn, yn) to get a NOT gate, mapping 0 to 1 and 1 to 0; and a pair (z1, y1) to get a ONE gate, mapping both 0 and 1 to 1. These (z, y) values are given in Table 1.

Computing with Solitons, Table 1 Parameters for gates when soliton speeds are ±1

gate     z                           y
COPY     0.24896731 − 0.62158212 i   2.28774210 + 0.01318152 i
NOT      0.17620885 + 0.38170630 i   0.07888703 − 1.26450654 i
ONE      0.45501471 − 1.37634227 i   1.43987094 + 0.64061349 i
Z-CONV   0.31838068 − 0.43078735 i   0.04232340 + 2.17536612 i
Y-CONV   1.37286955 + 0.88495501 i   0.58835758 − 0.18026939 i

We should point out that the ONE gate in itself, considered as a one-input, one-output gate, is not invertible, and could never be achieved by using the continuation of one particular soliton through one, or even many collisions. This is because such transformations are always nonsingular linear fractional transformations, which are invertible [16]. The transformation of state from the input to the continuation of z is, however, much more complicated and provides the flexibility we need to get the ONE gate. It turns out that this ONE gate will give us a row in the truth table of a NAND, and is critical for realizing general logic.

Output/Input Converters, Two-Input Gates, and NAND
To perform logic of any generality we must of course be able to use the output of one operation as the input to another. To do this we need to convert logic (0/1) values to some predetermined z and y values, the choice depending on the type of gate we want. This results in a two-input, one-output gate. As an important example, here's how a NAND gate can be constructed. We design a z-converter that converts 0/1 values to appropriate values of z, using the basic three-collision arrangement shown in Fig. 7. For a NAND gate, we map 0 to z1, the z value for the ONE gate, and map 1 to zn, the z value for the NOT gate. Similarly, we construct a y-converter that maps 0 to y1 and 1 to yn. These z- and y-converters are used on the fanout of one of the inputs, and the resulting two-input gate is shown in Fig. 9.

Computing with Solitons, Figure 9 A NAND gate, using converter gates to couple copies of one of its inputs to its z and y parameters. Reprinted with permission from [34]. Copyright by the American Physical Society

Of course these z- and y-converters require z and y values themselves, which are again determined by numerical search (see Table 1). The net effect is that when the left input is 0, the other input is mapped by a ONE gate, and when it is 1 the other input is mapped by a NOT gate. The only way the output can be 0 is if both inputs are 1, thus showing that this is a NAND gate. Another way of looking at this construction is that the 2×2 truth table of (left input)×(right input) has as its 0 row a ONE gate of the columns (1 1), and as its 1 row a NOT gate of the columns (1 0).

The importance of the NAND gate is that it is universal [20]. That is, it can be used with interconnects and fanouts to construct any other logical function. Thus we have shown that with the ability to "wire" we can implement any logic using the Manakov model. We note that other choices of input converters result in direct realizations of other gates. Using input converters that convert 0 and 1 to (zc, yc) and (zn, yn), respectively, results in a truth table with first row (0 1) and second row (1 0), an XOR gate. Converting 0 and 1 to (zc, yc) and (z1, y1), respectively, results in an OR gate, and so on.

Time Gating
We next take up the question of interconnecting the gates described above, and begin by showing how the continuation of the input in the COPY gate can be restored without affecting the other signals. In other words, we show how a simple "wire crossing" can be accomplished in this case. For spatial solitons, the key flexibility in the model is provided by assuming that input beams can be time-gated; that is, turned on and off. When a beam is thus gated, a finite segment of light is created that travels through the medium. We can think of these finite segments as finite light pulses, and we will call them simply pulses in the remainder of this article.
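The row-selection argument behind the converter-based NAND above is small enough to check mechanically. The sketch below is plain truth-table bookkeeping, not a simulation of the soliton collisions: the left input chooses which one-input gate (ONE or NOT) acts on the right input.

```python
# Truth-table check of the converter construction: the left input picks
# which one-input gate acts on the right input.

def one_gate(x):
    return 1            # ONE maps both 0 and 1 to 1

def not_gate(x):
    return 1 - x        # NOT maps 0 to 1 and 1 to 0

def converter_nand(left, right):
    # left = 0 -> converters install the ONE gate's (z, y) parameters;
    # left = 1 -> converters install the NOT gate's (z, y) parameters.
    return one_gate(right) if left == 0 else not_gate(right)

table = {(a, b): converter_nand(a, b) for a in (0, 1) for b in (0, 1)}
```

Swapping in (COPY, NOT) or (COPY, ONE) as the row gates reproduces the XOR and OR tables mentioned in the text, by the same row-selection argument.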
Figure 10a shows the basic three-collision gate implemented with pulses. Assuming that the actuator and data pulses are appropriately timed, the actuator pulse hits all three data pulses, as indicated in the projection below the space-space diagram. The problem is that if we want a later actuator pulse to hit the rightmost data pulse (to invert the state, for example, as in the FANOUT gate), it will also hit the remaining two data pulses because of the way they must be spaced for the earlier three collisions. We can overcome this difficulty by sending the actuator pulse from the left instead of the right. Timed appropriately early, it can be made to miss the first two data pulses, and hit the third, as shown in Fig. 10b.

Computing with Solitons, Figure 10 a When entered from the right and properly timed, the actuator pulse hits all three data pulses, as indicated in the projection at the bottom; b When entered from the left and properly timed, the actuator pulse misses two data pulses and hits only the rightmost data pulse, as indicated in the projection at the bottom. Reprinted with permission from [34]. Copyright by the American Physical Society

It is easy to check that if the velocity of the right-moving actuator solitons is algebraically above that of the data solitons by the same amount that the velocity of the data solitons is algebraically above that of the left-moving actuator solitons, the same state transformations will result. For example, if we choose the velocities of the data and left-moving actuator solitons to be +1 and −1, we should choose the velocity of the right-moving actuator solitons to be +3. This is really a consequence of the fact that the g and h parameters of Eqs. (6) and (8) in the linear fractional transformation depend only on the difference in the velocities of the colliding solitons.
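This velocity bookkeeping can be verified directly from Eqs. (5)-(8). In the sketch below (illustrative values only: unit energies kR = 1 and generic polarization states, none taken from the article), the data soliton's outgoing state is computed twice, once for an actuator arriving from the right at velocity −1 and once from the left at velocity +3, and the two agree because g and h depend only on the velocity difference, here 2 in both cases.

```python
# Check that actuating from the left at velocity +3 and from the right at
# velocity -1 transform a velocity +1 data soliton identically (equal
# energies kR = 1 assumed; states are arbitrary generic choices).

def L_map(rho1, rhoL, k1, k2):
    # Eq. (5), regularized: outgoing state of the slower soliton.
    g = (k1 + k1.conjugate()) / (k2 + k1.conjugate())
    p = rho1 * rho1.conjugate()
    return (((1 - g) + p) * rhoL + g * rho1) / \
           (g * rho1.conjugate() * rhoL + (1 - g) * p + 1)

def R_map(rho1, rhoL, k1, k2):
    # Eq. (7), regularized: outgoing state of the faster soliton.
    hs = ((k2 + k2.conjugate()) / (k1 + k2.conjugate())).conjugate()
    q = rhoL * rhoL.conjugate()
    return (((1 - hs) + q) * rho1 + hs * rhoL) / \
           (hs * rhoL.conjugate() * rho1 + (1 - hs) * q + 1)

rho_data, rho_act = 0.3 - 0.4j, 0.1 + 0.2j   # assumed generic states

# Actuator from the right: data (k = 1 + 1j, v = +1) is the right-mover,
# actuator (k = 1 - 1j, v = -1) the left-mover.
out_a = R_map(rho_data, rho_act, 1 + 1j, 1 - 1j)

# Actuator from the left: actuator (k = 1 + 3j, v = +3) is the right-mover
# and the data soliton (k = 1 + 1j, v = +1) is the relative left-mover.
out_b = L_map(rho_act, rho_data, 1 + 3j, 1 + 1j)
```

In both configurations the relevant collision parameter evaluates to the same complex number, so the data soliton's state transformation is literally the same map.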

Computing with Solitons, Figure 11 The frame of this figure is moving down with the data pulses on the left. A data pulse in memory is operated on with a three-collision gate actuated from the left, and the result deposited to the upper right. Reprinted with permission from [34]. Copyright by the American Physical Society

Wiring
Having shown that we can perform FANOUT and NAND, it remains only to show that we can "wire" gates so that any outputs can be fed to any inputs. The basic method for doing this is illustrated in Fig. 11. We think of data as stored in the down-moving pulses in a column, which we can think of as "memory". The observer moves with this frame, so the data appears stationary. Pulses that are horizontal in the three-collision gates shown in previous figures will then appear to the observer to move upward at inclined angles. It is important to notice that these upward diagonally moving pulses are evanescent in our picture (and hence their paths are shown dashed in the figure). That is, once they are used, they do not remain in the picture with a moving frame and hence cannot interfere with later computations. However, all vertically moving pulses remain stationary in this picture.

Once a diagonal trajectory is used for a three-collision gate, reusing it will in general corrupt the states of all the stationary pulses along that diagonal. However, the original data pulse (gate input) can be restored with a pulse in the state inverse to the actuator, either along the same diagonal as the actuator, provided we allow enough time for the result (the gate output, a stationary z pulse) to be used, or along the other diagonal.

Computing with Solitons, Figure 12 A data pulse is copied to the upper right, this copy is copied to the upper left, and the result put at the top of memory. The original data pulse can then be restored with an inverse pulse and copied to the left in the same way. Reprinted with permission from [34]. Copyright by the American Physical Society


Suppose we want to start with a given data pulse in the memory column and create two copies above it in the memory column. Figure 12 shows a data pulse at the lower left being copied to the upper right with a three-collision COPY gate, initiated with an actuator pulse from the left. This copy is then copied again to the upper left, back to a waiting z pulse in the memory column. After the first copy is used, an inverse pulse can be used along the lower-left-to-upper-right diagonal to restore the original data pulse. The restored data pulse can then be copied to the left in the same way, to a height above the first copy, say, and thus two copies can be created and deposited in memory above the original.

A Second Speed and Final FANOUT and NAND
There is one problem still remaining with a true FANOUT: When an original data pulse in memory is used in a COPY operation for FANOUT, two diagonals are available, one from the lower left to the upper right, and the other from the lower right to the upper left. Thus, two copies can be made, as was just illustrated. However, when a data pulse is deposited in the memory column as a result of a logic operation, the logical operation itself uses at least one diagonal, which leaves at most one free. This makes a FANOUT of the output of a gate impossible with the current scheme. A simple solution to this problem is to introduce another speed, using velocities ±0.5, say, in addition to ±1. This effectively provides four rather than two directions in which a pulse can be operated on, and allows true FANOUT and general interconnections. Figure 13 shows such a FANOUT; the data pulse at the lower left is copied to a position above it using one speed, and to another position, above that, using another. Finally, a complete NAND gate is shown in Fig. 14.
The gate can be thought of as composed of the following steps:

- input 2 is copied to the upper left, and that copy transformed by a z-converter to the upper right, placing the z pulse for the NAND gate at the top of the figure;
- after the copy of input 2 is used, input 2 is restored with an inverse pulse to the upper left;
- input 2 is then transformed to the upper right by a y-converter;
- input 1 is copied to the upper right, to a position collinear with the z- and y-converted versions of the other input;
- a final actuator pulse converts the z pulse at the top to the output of the NAND gate.

Note that the output of the NAND has used two diagonals, which again shows why a second speed is needed if

Computing with Solitons, Figure 13 The introduction of a second speed makes true FANOUT possible. For simplicity, in this and the next figure, data and operator pulses are indicated by solid dots, and the y operator pulses are not shown. The paths of actuator pulses are indicated by dashed lines. Reprinted with permission from [34]. Copyright by the American Physical Society

we are to use the NAND output as an input to subsequent logical operations. The y operator pulses, middle components in the three-collision COPY and converter gates, are not shown in the figure, but room can always be made for them to avoid accidental collisions by adding only a constant amount of space.

Universality

It should be clear now that any sequence of three-collision gates can be implemented in this way, copying data out of the memory column to the upper left or right, and performing NAND operations on any two at a time in the way shown in the previous section. The computation can proceed in a breadth-first manner, with the results of each successive stage being stored above the earlier results. Each additional gate can add only a constant amount of height


and width to the medium, so the total area required is no more than proportional to the square of the number of gates. The "program" consists of down-moving y and z operator pulses, entering at the top with the down-moving data, and actuator pulses that enter from the left or right at two different speeds. In the frame moving with the data, the data and operator pulses are stationary and new results are deposited at the top of the memory column. In the laboratory frame the data pulses leave the medium downward, and new results appear in the medium at positions above the old data, at the positions of newly entering z pulses.

Discussion

We have shown that in principle any computation can be performed by shining time-gated lasers into a completely homogeneous nonlinear optical medium. This result should be viewed as mathematical, and whether the physics of vector soliton collisions can lead to practical computational devices is a subject for future study. With regard to the economy of the model, the question of whether time gating is necessary, or even whether two speeds are necessary, is open. We note that the result described here differs from the universality results for the ideal billiard ball model [13], the Game of Life [7], and Lattice Gases [32], for example, in that no internal mirrors or structures of any kind are used inside the medium. To the author's knowledge, whether internal structure is necessary in these other cases is open. Finally, we remark that the model used is reversible and dissipationless. The fact that some of the gate operations realized are not in themselves reversible is not a contradiction, since extra, "garbage" solitons [13] are produced that save enough state to run the computation backwards.

Computing with Solitons, Figure 14 Implementation of a NAND gate. A second speed will be necessary to use the output. Reprinted with permission from [34]. Copyright by the American Physical Society

Multistable Soliton Collision Cycles

Bistable and multistable optical systems, besides being of some theoretical interest, are of practical importance in offering a natural "flip-flop" for noise-immune storage and logic. We show in this section that simple cycles of collisions of solitons governed by the Manakov equations can have more than one distinct stable set of polarization states, and therefore these distinct equilibria can, in theory, be used to store and process information. The multistability occurs in the polarization states of the beams; the solitons themselves do not change shape and remain the usual sech-shaped solutions of the Manakov equations. This phenomenon depends only on simple soliton collisions in a completely homogeneous medium. The basic configuration considered requires only that the beams form a closed cycle, and can thus be realized in any nonlinear optical medium that supports spatial Manakov solitons. The possibility of using multistable systems of beam collisions broadens the possibilities for practical application of the surprisingly strong interactions that Manakov solitons can exhibit, a phenomenon originally described in [26]. We show here by example that a cycle of three collisions can have two distinct foci surrounded by basins of attraction, and that a cycle of four collisions can have three.

The Basic Three-Cycle and Computational Experiments

Figure 15 shows the simplest example of the basic scheme, a cycle of three beams, entering in states A, B, and C, with intermediate beams a, b, and c. For convenience, we will refer to the beams themselves, as well as their states, as A, B, C, etc. Suppose we start with beam C initially turned off, so that A = a. Beam a then hits B, thereby transforming it to state b. If beam C is then turned on, it will hit A, closing the cycle. Beam a is then changed, changing b, etc., and the cycle of state changes propagates clockwise. The question we ask is whether this cycle converges, and if so, whether


Computing with Solitons, Figure 15 The basic cycle of three collisions. Reprinted with permission from [35]. Copyright by the American Physical Society

it will converge with any particular choice of complex parameters to exactly zero, one, two, or more foci. We answer the question with numerical simulations of this cycle.

A typical computational experiment was designed by fixing the input beams A, B, C, and the parameters k1 and k2, and then choosing points a randomly and independently, with real and imaginary coordinates uniformly distributed in squares of a given size in the complex plane. The cycle described above was then carried out until convergence in the complex numbers a, b, and c was obtained to within 10^-12 in norm. Distinct foci of convergence were stored, and the initial starting points a were categorized by which focus they converged to, thus generating the usual picture of basins of attraction for the parameter a. Typically this was done for 50,000 random initial values of a, effectively filling in the square, for a variety of parameter choices A, B, and C. The following results were observed:

Computing with Solitons, Figure 16 The two foci and their corresponding basins of attraction in the first example, which uses a cycle of three collisions. The states of the input beams are A = 0.8 − i·0.13, B = 0.4 − i·0.13, C = 0.5 + i·1.6; and k = 4 ± i. Reprinted with permission from [35]. Copyright by the American Physical Society

- In cases with one or two clear foci, convergence was obtained in every iteration, almost always within one or two hundred iterations.
- Each experiment yielded exactly one or two foci.
- The bistable cases (two foci) are somewhat less common than the cases with a unique focus, and are characterized by values of k_R between about 3 and 5 when the velocity difference was fixed at 2.

Figure 16 shows a bistable example, with the two foci and their corresponding basins of attraction. The parameter k is fixed in this and all subsequent examples at 4 ± i for the right- and left-moving beams of any given collision, respectively. The second example, shown in Fig. 17, shows that the basins are not always simply connected; a sizable island that maps to the upper focus appears within the basin of the lower focus.

A Tristable Example Using a Four-Cycle

Collision cycles of length four seem to exhibit more complex behavior than those of length three, although it is difficult to draw any definite conclusions because the parameter spaces are too large to be explored exhaustively, and there is at present no theory to predict such highly nonlinear behavior. If one real degree of freedom is varied as a control parameter, we can move from bistable to tristable solutions, with a regime between in which one basin of attraction disintegrates into many small separated fragments. Clearly, this model is complex enough to exhibit many of the well-known features of nonlinear systems. Fortunately, it is not difficult to find choices of parameters that result in very well behaved multistable solutions. For example, Fig. 18 shows such a tristable case. The smallest distance from a focus to a neighboring basin is on the order of 25% of the interfocus distance, indicating that these equilibria will be stable under reasonable noise perturbations.

Computing with Solitons, Figure 17 A second example using a cycle of three collisions, showing that the basins need not be simply connected. The states of the input beams are A = 0.7 − i·0.3, B = 1.1 − i·0.5, C = 0.4 + i·0.81; and k = 4 ± i. Reprinted with permission from [35]. Copyright by the American Physical Society

Computing with Solitons, Figure 18 A case with three stable foci, for a collision cycle of length four. The states of the input beams are A = 0.39 − i·0.45, B = 0.22 − i·0.25, C = 0.0 + i·0.25, D = 0.51 + i·0.48; and k = 4 ± i. Reprinted with permission from [35]. Copyright by the American Physical Society

Discussion

The general phenomenon discussed in this section raises many questions, both of a theoretical and of a practical nature. The fact that there are simple polarization-multistable cycles of collisions in a Manakov system suggests that similar situations occur in other vector systems, such as photorefractive crystals or birefringent fiber. Any vector system with the possibility of a closed cycle of soliton collisions

becomes a candidate for multistability, and there is at this point really no compelling reason to restrict attention to the Manakov case, except for the fact that the explicit state-change relations make numerical study much easier. The simplified picture we used of information traveling clockwise after we begin with a given beam a gives us stable polarization states when it converges, plus an idea of the size of their basins of attraction. It is remarkable that in all cases in our computational experience, except for borderline transitional cases in going from two to three foci in a four-cycle, this circular process converges consistently and quickly. But understanding the actual dynamics and convergence characteristics in a real material requires careful physical modeling. This modeling will depend on the nature of the medium used to approximate the Manakov system, and is left for future work. The implementation of a practical way to switch from one stable state to another is likewise critically dependent on the dynamics of soliton formation and perturbation in the particular material at hand, and must be studied with reference to a particular physical realization.

We remark also that no iron-clad conclusions can be drawn from computational experiments about the numbers of foci in any particular case, or the number possible for a given size cycle, despite the fact that we regularly used 50,000 random starting points. On the other hand, the clear cases that have been found, such as those used as examples, are very characteristic of universal behavior in other nonlinear iterated maps, and are sufficient to establish that bi- and tristability, and perhaps higher-mode multistability, is a genuine mathematical characteristic, and possibly also physically realizable. It strongly suggests experimental exploration. We restricted discussion in this section to the simplest possible structure of a single closed cycle, with three or four collisions.
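The cycle-convergence procedure described above is easy to sketch. The genuine state-change rule is the Manakov collision transformation of Eqs. (5) and (7), which is not reproduced here; the `collide` map below is a deliberately tame, hypothetical stand-in (a damped complex map), so the sketch illustrates only the iteration and basin bookkeeping, not the multistability itself:

```python
import numpy as np

def collide(state, other):
    # Hypothetical stand-in for the Manakov state-change rule: a damped
    # complex map chosen only so that the iteration below converges.
    return state + 0.3 * other / (1 + abs(other) ** 2)

A, B, C = 0.8 - 0.13j, 0.4 - 0.13j, 0.5 + 1.6j  # input-beam states (values of Fig. 16)

def run_cycle(a0, tol=1e-12, max_iter=10_000):
    """Propagate state changes clockwise around the three-cycle until the
    intermediate beam a stops changing, as in the text's experiments."""
    a = a0
    for _ in range(max_iter):
        b = collide(B, a)       # a hits B, transforming it to b
        c = collide(C, b)       # b hits C
        a_new = collide(A, c)   # c hits A, closing the cycle
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    return None

# Classify random starting points a by the focus they converge to.
rng = np.random.default_rng(0)
starts = rng.uniform(-1, 1, (200, 2)) @ np.array([1, 1j])
foci = []
for a0 in starts:
    f = run_cycle(a0)
    if f is not None and not any(abs(f - g) < 1e-6 for g in foci):
        foci.append(f)
print(len(foci))  # the damped stand-in map has a single focus
```

With the true Manakov transformation substituted for `collide`, the same loop reproduces the one- and two-focus experiments reported above.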
The stable solutions of more complicated configurations are the subject of continuing study. A general theory that predicts this behavior is lacking, and it seems at this point unlikely to be forthcoming. This forces us to rely on numerical studies, from which, as we point out above, only certain kinds of conclusions can be drawn. We are fortunate, however, in being able to find cases that look familiar and which are potentially useful, like the bistable three-cycles with well separated foci and simply connected basins of attraction. It is not clear, however, just what algorithms might be used to find equilibria in collision topologies with more than one cycle. It is also intriguing to speculate about how collision configurations with particular characteristics can be designed, how they can be made to interact, and how they might be controlled by pulsed beams. There is


promise that when the ramifications of complexes of vector soliton collisions are more fully understood they might be useful for real computation in certain situations.

Application to Noise-Immune Soliton Computing

Any physical instantiation of a computing technology must be designed to be immune to the effects of noise buildup from logic stage to logic stage. In the familiar computers of today, built with solid-state transistors, the noise immunity is provided by physical state restoration, so that voltage levels representing logical "0" and "1" are restored by bistable circuit mechanisms at successive logic stages. This is state restoration at the physical level. For another example, proposed schemes for quantum computing would be impractical without some means of protecting information stored in qubits from inevitable corruption by the rest of the world. The most common method proposed for accomplishing this is error correction at the software level, state restoration at the logical level.

In the collision-based scheme for computing with Manakov solitons described in Sect. "Manakov Soliton Computing", there is no protection against buildup of error from stage to stage, and some sort of logical state restoration would be necessary in a practical realization. The bistable collision cycles of Manakov solitons described in this section, however, offer a natural computational building block for soliton computation with physical state restoration. This idea is explored in [27]. Figure 19 illustrates the approach with a schematic diagram of a NAND gate, implemented with bistable cycles to represent bits. The input bits are stored in the collision cycles (1) and (2), which have output beams that can be made to collide with

Computing with Solitons, Figure 19 Schematic of NAND gate using bistable collision cycles. Reprinted with permission from [27]. Copyright by Old City Publishing

input beam A of cycle (3), which represents the output bit of the gate. These inputs to the gate, shown as dashed lines, change the state of beam A of the ordinarily bistable cycle (3) so that it becomes monostable. The state of cycle (3) is then steered to a known state. When the input beams are turned off, cycle (3) returns to its normal bistable condition, but with a known input state. Its state then evolves to one of two bits, and the whole system of three collision cycles can be engineered so that the final state of cycle (3) is the NAND of the two bits represented by input cycles (1) and (2). (See [27] for details.)

A computer based on such bistable collision cycles is closer in spirit to present-day ordinary transistor-based computers, with a natural noise immunity and state restoration based on physical bistability. As mentioned in the previous subsection, however, the basic bistable cycle phenomenon awaits laboratory verification, and much remains to be learned about the dynamics, and eventual speed and reliability, of such systems.

Experiments

The computation schemes described in the previous sections obviously rely on the correct mathematical modeling of the physics proposed for realization. We next describe experiments that verify some of the required soliton phenomenology in optical fibers. Specifically, we highlight the experimental observation of temporal vector soliton propagation and collision in a birefringent optical fiber [28]. This is both the first demonstration of temporal vector solitons with two mutually incoherent component fields, and of vector soliton collisions in a Kerr nonlinear medium. Temporal soliton pulses in optical fiber were first predicted by Hasegawa and Tappert [14], followed by the first experimental observation by Mollenauer et al. [24].
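The fundamental soliton behind these experiments is easy to check numerically. In normalized units the scalar NLSE reads i u_z + (1/2) u_tt + |u|² u = 0, with fundamental soliton u(0, t) = sech(t); a short split-step Fourier sketch (grid and step sizes are our illustrative choices):

```python
import numpy as np

# Split-step Fourier propagation of the normalized NLSE
#   i u_z + (1/2) u_tt + |u|^2 u = 0 .
# The fundamental soliton u(0,t) = sech(t) should keep |u| unchanged.
N, T = 1024, 40.0
t = np.linspace(-T / 2, T / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=T / N)
u = 1 / np.cosh(t)

dz, steps = 0.01, 500                   # propagate to z = 5
half = np.exp(-0.25j * k**2 * dz)       # linear half-step in Fourier space
for _ in range(steps):
    u = np.fft.ifft(half * np.fft.fft(u))
    u = u * np.exp(1j * np.abs(u)**2 * dz)   # nonlinear (Kerr) step
    u = np.fft.ifft(half * np.fft.fft(u))
print(np.max(np.abs(u)))  # stays close to 1
```

The same split-step scheme extends component-by-component to the coupled equations used below for the birefringent fiber.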
In subsequent work, Menyuk accounted for the birefringence in polarization-maintaining fiber (PMF) and predicted that vector solitons, in which two orthogonally polarized components trap each other, are stable under the proper operating conditions [21,22]. For birefringent fibers, self-trapping of two orthogonally polarized pulses can occur when XPM-induced nonlinearity compensates the birefringence-induced group velocity difference, causing the pulse in the fiber's fast axis to slow down and the pulse in the slow axis to speed up. The first demonstration of temporal soliton trapping was performed in the subpicosecond regime [15], in which additional ultrashort pulse effects such as Raman scattering are present. In particular, this effect results in a red-shift that is linearly proportional to the propagation distance, as observed in a later temporal


soliton trapping experiment [25]. Recently, soliton trapping in the picosecond regime was observed with equal amplitude pulses [18]. However, vector soliton propagation could not be shown, because the pulses propagated for less than 1.5 dispersion lengths. In other work, phase-locked vector solitons in a weakly birefringent fiber laser cavity with nonlinear coherent coupling between components were observed [12]. The theoretical model for linearly birefringent fiber is the following coupled nonlinear Schrödinger equation (CNLSE):

i\left(\frac{\partial A_x}{\partial z} + \beta_{1x}\frac{\partial A_x}{\partial t}\right) - \frac{\beta_2}{2}\frac{\partial^2 A_x}{\partial t^2} + \gamma\left(|A_x|^2 + \frac{2}{3}|A_y|^2\right)A_x = 0 ,

i\left(\frac{\partial A_y}{\partial z} + \beta_{1y}\frac{\partial A_y}{\partial t}\right) - \frac{\beta_2}{2}\frac{\partial^2 A_y}{\partial t^2} + \gamma\left(|A_y|^2 + \frac{2}{3}|A_x|^2\right)A_y = 0 ,   (9)

where t is the local time of the pulse, z is the propagation distance along the fiber, and A_{x,y} is the slowly varying pulse envelope for each polarization component. The parameter β_{1x,y} is the group velocity associated with each fiber axis, and β₂ represents the group velocity dispersion, assumed equal for both polarizations. In addition, we neglect higher-order dispersion and assume a lossless medium with an instantaneous electronic response, valid for picosecond pulses propagating in optical fiber. The last two terms of Eqs. (9) account for the nonlinearity due to SPM and XPM, respectively. In linearly birefringent optical fiber, a ratio of 2/3 exists between these two terms. When this ratio equals unity, the CNLSE becomes the integrable Manakov system of Eqs. (4). On the other hand, solutions of Eqs. (9) are, strictly speaking, solitary waves, not solitons. However, it was found in [36] that the family of symmetric, single-humped (fundamental or first-order) solutions, to which the current investigation in this section belongs, are all stable. Higher-order solitons, characterized by multiple humps, are unstable. Furthermore, it was shown in [37] that collisions of solitary waves in Eqs. (9) can be described by application of perturbation theory to the integrable Manakov equations, indicating the similarities between the characteristics of these two systems.

Experimental Setup and Design

The experimental setup is shown in Fig. 20. We synchronized two actively mode-locked erbium-doped fiber lasers (EDFLs): EDFL1 at 1.25 GHz repetition rate, and EDFL2 at 5 GHz. EDFL2 was modulated to match the lower

Computing with Solitons, Figure 20 Experimental setup. EDFL: erbium-doped fiber laser; EDFA: erbium-doped fiber amplifier; MOD: modulator; D: tunable delay line; PLC: polarization loop controller; 2:1: fiber coupler; LP: linear polarizer; λ/2: half-wave plate; HB-PMF and LB-PMF: high- and low-birefringence polarization maintaining fiber; PBS: polarization beam splitter; OSA: optical spectrum analyzer. Reprinted with permission from [28]. Copyright by the American Physical Society

repetition rate of EDFL1. Each pulse train, consisting of 2 ps pulses, was amplified in an erbium-doped fiber amplifier (EDFA) and combined in a fiber coupler. To align polarizations, a polarization loop controller (PLC) was used in one arm, and a tunable delay line (TDL) was employed to temporally align the pulses for collision. Once combined, both pulse trains passed through a linear polarizer (LP) and a half-wave plate to control the input polarization to the PMF. Approximately 2 m of high-birefringence (HB) PMF preceded the specially designed 500 m of low-birefringence (LB) PMF used to propagate vector solitons. Although this short length of HB-PMF will introduce some pulse splitting (on the order of 2-3 ps), the birefringent axes of the HB- and LB-PMF were swapped in order to counteract this effect. At the output, each component of the vector soliton was split at a polarization beam splitter, followed by an optical spectrum analyzer (OSA) for measurement.

The design of the LB-PMF required careful control over three characteristic length scales: the (polarization) beat length L_b, the dispersion length L_d, and the nonlinear length L_nl. A beat length L_b = λ/Δn = 50 cm was chosen at a wavelength of 1550 nm, where Δn is the fiber birefringence. According to the approximate stability criterion of [8], this choice allows stable propagation of picosecond vector solitons. By avoiding the subpicosecond regime, ultrashort pulse effects such as intrapulse Raman scattering will not be present. The dispersion D = 2πc|β₂|/λ² = 16 ps/(km nm) and L_d = T₀²/|β₂| ≈ 70 m, where T₀ = T_FWHM/1.763 is a characteristic pulse width related to the


full width at half maximum (FWHM) pulse width. Since L_d ≫ L_b, degenerate four-wave mixing due to coherent coupling between the two polarization components can be neglected [23]. Furthermore, the total propagation distance is greater than 7 dispersion lengths. Polarization instability, in which the fast-axis component is unstable, occurs when L_nl = (γP)⁻¹ is of the same order of magnitude as, or smaller than, L_b, as observed in [6]. The nonlinearity parameter γ = 2πn₂/(λA_eff) = 1.3 (km W)⁻¹, with Kerr nonlinearity coefficient n₂ = 2.6 × 10⁻²⁰ m²/W and measured effective mode area A_eff = 83 μm². In the LB-PMF, the fundamental vector soliton power P ≈ 14 W; thus L_nl = 55 m ≫ L_b, mitigating the effect of polarization instability.
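The length-scale bookkeeping above can be checked directly from the quoted values. A sketch (with the rounded values quoted in the text, the dispersion length comes out near 63 m, consistent with the quoted ≈ 70 m; the last line uses the collision length defined later in this section, assuming a 3 nm separation):

```python
import numpy as np

c = 3e8                      # speed of light, m/s
lam = 1550e-9                # operating wavelength, m
D = 16e-6                    # dispersion: 16 ps/(km nm) = 16e-6 s/m^2
beta2 = D * lam**2 / (2 * np.pi * c)   # |beta_2|, s^2/m (~20 ps^2/km)
T0 = 2e-12 / 1.763           # T_FWHM = 2 ps  ->  characteristic width T0
Ld = T0**2 / beta2           # dispersion length (~70 m in the text)
gamma = 1.3e-3               # nonlinearity: 1.3 (km W)^-1, in (m W)^-1
P = 14.0                     # fundamental vector soliton power, W
Lnl = 1 / (gamma * P)        # nonlinear length, ~55 m
Lb = 0.5                     # beat length, m
Lcoll = 2 * 2e-12 / (D * 3e-9)   # collision length for 3 nm separation
print(round(Ld), round(Lnl), round(Lcoll))
```

The check confirms L_nl ≫ L_b (no polarization instability) and that 500 m of fiber is several dispersion and collision lengths long.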

Vector Soliton Propagation

We first studied propagation of vector solitons using both lasers independently. The wavelength shift for each component is shown in Fig. 21a as a function of the input polarization angle θ, controlled through the half-wave plate. Due to the anomalous dispersion of the fiber at this wavelength, the component in the slow (fast) axis will shift to shorter (longer) wavelengths to compensate the birefringence. The total amount of wavelength shift between components Δλ = λ_x − λ_y = Δβ₁/D = 0.64 nm, where Δβ₁ = |β₁x − β₁y| = 10.3 ps/km is the birefringence-induced group velocity difference and D = 2πc|β₂|/λ² = 16 ps/(km nm) is the dispersion. As θ approaches 0° (90°), the vector soliton approaches the scalar soliton limit, and the fast (slow) axis does not shift in wavelength, as expected. At θ = 45°, a symmetric shift results. For unequal amplitude solitons, the smaller component shifts more in wavelength than the larger component, because the former experiences more XPM. Numerical simulations of Eqs. (9), given by the dashed lines of Fig. 21a, agree very well with the experimental results.

Also shown in Fig. 21 are two cases, θ = 45° and 37°, as well as the numerical prediction. The experimental spectra show some oscillatory features at 5 GHz, which are a modulation of the EDFL2 repetition rate on the optical spectrum. A sample input pulse spectrum from EDFL1 is shown in the inset of Fig. 21, which shows no modulation due to the limited resolution of the OSA. Vector solitons from both lasers produced similar results. In this and all subsequent plots in this section, the slow and fast axis components are depicted by solid and dashed lines, respectively. As the two component amplitudes become more unequal, satellite peaks become more pronounced in the smaller component. These features are also present in the simulations, but are not as dominant (cf. Fig. 21d and e). We attribute this to the input pulse, which is calibrated for the θ = 45° case, because the power threshold for vector soliton formation in this case is largest due to the 2/3 factor between SPM and XPM nonlinear terms in the CNLSE. As the input is rotated towards unequal components, there will be extra power in the input pulse, which will radiate in the form of dispersive waves as the vector soliton forms. Due to the nature of this system, these dispersive waves can be nonlinearly trapped, giving rise to the satellite features in the optical spectra. This effect is not as prevalent in the simulations because the threshold was numerically determined at each input angle θ.

Vector Soliton Collision

To prepare the experiment for a collision, we operated both lasers simultaneously, detuned in wavelength to allow for dispersion-induced walkoff, and adjusted the delay line in such a way that the collision occurred halfway down the fiber. We define a collision length L_coll = 2T_FWHM/(DΔλ), where Δλ is the wavelength separation between the two vector solitons. For our setup, Δλ = 3 nm, and L_coll = 83.3 m. An asymptotic theory of soliton collisions, in which a full collision takes place, requires at least 5 collision lengths. The total fiber length in this experiment is equal to 6 collision lengths, long enough to ensure sufficient separation of solitons before and after collision. In this way, results of our experiments can be compared to the asymptotic theory, even though full numerical simulations will be shown for comparison. To quantify our results, we introduce the quantity R ≡ tan²θ, defined as the amplitude ratio between the slow and fast components.

Recall that in Sect. "Manakov Solitons", we introduced the Manakov equations (Eqs. (4)) and described collision-induced transformations of the polarization state of the soliton, which come about from the asymptotic analysis of the soliton collision. The polarization state is the ratio between the two components, A_x/A_y = cot θ · exp(iΔφ), and is therefore a function of the polarization angle θ and the relative phase Δφ between the two components. In the context of the experiments described in this section, these state transformations (Eqs. (5) and (7)) predict that the resulting energy exchange will be a function of the amplitude ratios R_{1,2}, the wavelength separation Δλ, and the relative phases Δφ_{1,2} between the two components of each soliton, where soliton 1 (2) is the shorter (longer) wavelength soliton.

A word of caution is in order at this point. An interesting consequence of the 2/3 ratio between SPM and XPM, which sets the birefringent fiber model apart from



Computing with Solitons, Figure 21 Arbitrary-amplitude vector soliton propagation. a Wavelength shift vs. angle to the fast axis θ, numerical curves given by dashed lines; b and d experimental results for θ = 45° and 37° with EDFL2, respectively. Inset: input spectrum for EDFL1; c and e corresponding numerical simulations of θ = 45° and 37°, respectively. The slow and fast axis components are depicted by solid and dashed lines, respectively. Reprinted with permission from [28]. Copyright by the American Physical Society

the Manakov model, is the relative phase between the two components. For the Manakov soliton, each component ‘feels’ the same amount of total nonlinearity, because the strengths of both SPM and XPM are equal. Therefore, regardless of the polarization angle, the amount of total nonlinear phase shift for each component is the same (even though the contributions of SPM and XPM phase shifts

are in general not equal). As a result, the relative phase between the two components stays constant during propagation, as does the polarization state. This is not the case for vector solitons in birefringent fiber. For the case of equal amplitudes, each component does experience the same amount of nonlinear phase shift, and therefore the polarization state is constant as a function of propagation


Computing with Solitons, Figure 22 Demonstration of phase-dependent energy-exchanging collisions. a-c Short HB-PMF; d-f long HB-PMF; a,d experiment, without collision; b,e experiment, with collision; c,f simulated collision result with c Δφ₂ = 90° and f Δφ₂ = 50°. Values of the slow-fast amplitude ratio R are given above each soliton. The slow and fast axis components are depicted by solid and dashed lines, respectively. Reprinted with permission from [28]. Copyright by the American Physical Society

distance. However, for arbitrary (unequal) amplitudes, the total phase shift for each component will be different. Consequently, the relative phase will change linearly as a function of propagation distance, and the polarization state will not be constant. As a result, the collision-induced change in polarization state, while being a function of the amplitude ratios R_{1,2} and the wavelength separation Δλ, will also depend upon the collision position due to the propagation dependence of the relative phase Δφ_{1,2}(z). To bypass this complication, we ensure that all collisions occur at the same spatial point in the fiber. Because only one half-wave plate is used in our experiment (see Fig. 20), it was not possible to prepare each vector soliton individually with an arbitrary R. In addition,

due to the wavelength dependence of the half-wave plate, it was not possible to adjust Δφ without affecting R. First, we investigated the phase dependence of the collision. This was done by changing the length of the HB-PMF entering the LB-PMF, while keeping R and Δλ constant. As a result, we could change Δφ_{1,2} due to the birefringence of the HB-PMF. Approximately 0.5 m of HB-PMF was added to ensure that the total amount of temporal pulse splitting did not affect the vector soliton formation. The results are shown in Fig. 22, where Fig. 22a-c and d-f correspond to the short and long HB-PMFs, respectively. Figure 22a and d show the two vector solitons, which propagate independently when no collision occurs; as expected, the two results are similar because the OSA


Computing with Solitons, Figure 23 Additional energy-exchanging collisions. a,d Experiment, without collision; b,e experiment, with collision; c,f simulated collision result, using Δφ₂ = 90° inferred from the experiment of Fig. 22. Values of the slow-fast amplitude ratio R are given above each soliton. The slow and fast axis components are depicted by solid and dashed lines, respectively. Reprinted with permission from [28]. Copyright by the American Physical Society

measurement does not depend on Δφ_{1,2}. The result of the collision is depicted in Fig. 22b and e, along with the corresponding simulation results in Fig. 22c and f. In both of these collisions, an energy exchange between components occurs, and two important relations are satisfied: the total energy in each soliton and in each component is conserved. It can be seen that when one component in a soliton increases as a result of the collision, the other component decreases, with the opposite exchange in the second soliton. The difference between these two collisions is dramatic, in that the energy redistributes in opposite directions. For the simulations, idealized sech pulses for each component were used as initial conditions, and propagation was modeled without accounting for losses. The experimental amplitude ratio was used, and (without loss of generality [16,19,26]) Δφ₂ was

varied while Δφ_1 = 0. Best fits gave Δφ_2 = 90° (Fig. 22c) and 50° (Fig. 22f). Despite the model approximations, experimental and numerical results all agree to within 15%. In the second set of results (Fig. 23), we changed R while keeping all other parameters constant. More specifically, we used the short HB-PMF, with initial phase difference Δφ_2 = 90°, and changed the amplitude ratio. In agreement with theoretical predictions, the same direction of energy exchange is observed as in Fig. 22a–c.

Spatial Soliton Collisions

We mention here analogous experiments with spatial solitons in photorefractive media by Anastassiou et al. In [4], it is shown that energy is transferred in a collision of vector spatial solitons in a way consistent with the predictions


for the Manakov system (although the medium is saturable, and only approximates the Kerr nonlinearity). The experiment in [5] goes one step further, showing that one soliton can be used as an intermediary to transfer energy from a second soliton to a third. We are thus now at a point where the ability of both temporal and spatial vector solitons to process information for computation has been demonstrated.

Future Directions

This article discussed computing with solitons, attempting to address the subject from basic physical principles to applications. Although the nonlinearity of fibers is very weak, their ultralow loss and tight modal confinement make them technologically attractive. By no means, however, are they the only potential material for soliton-based information processing. Others include photorefractive crystals, semiconductor waveguides, quadratic media, and Bose–Einstein condensates, while future materials research may provide new candidate systems.

From a computing perspective, scalar soliton collisions are insufficient. Although measurable phase and position shifts do occur, these phenomena cannot be cascaded to affect future soliton collisions and therefore cannot transfer information from one collision to the next. Meaningful computation using soliton collisions requires a new degree of freedom, that is, a new component. Collisions of vector solitons display interesting energy-exchanging effects between components, which can be exploited for arbitrary computation and bistability.

The vector soliton experiments of Sect. “Experiments” were proof-of-principle ones. The first follow-up experiments with temporal vector solitons in birefringent fiber can be directed towards a full characterization of the collision process. This can be done fairly simply using the experimental setup of Fig. 20, updated to allow independent control of the two vector soliton inputs. This would involve separate polarizers and half-wave plates, followed by a polarization-preserving fiber coupler.

Cascaded collisions of temporal solitons also await experimental study. As demonstrated in photorefractive crystals with a saturable nonlinearity [5], one can show that information can be passed from one collision to the next. Beyond a first demonstration of two collisions lies the prospect of setting up a multi-collision feedback cycle. As discussed in Sect. “Multistable Soliton Collision Cycles”, these collision cycles can be bistable and lead to interesting applications in computation. Furthermore, the recent work of Ablowitz et al. [2] shows theoretically that the useful energy-redistribution properties of vector soliton collisions extend perfectly to the semi-discrete case: that is, to the case where space is discretized but time remains continuous. This models, for example, propagation in an array of coupled nonlinear waveguides [10]. The work suggests alternative physical implementations for soliton switching or computing, and also hints that the phenomenon of soliton information processing is a very general one.

Bibliography

1. Ablowitz MJ, Prinari B, Trubatch AD (2004) Soliton interactions in the vector NLS equation. Inverse Problems 20(4):1217–1237
2. Ablowitz MJ, Prinari B, Trubatch AD (2006) Discrete vector solitons: Composite solitons, Yang–Baxter maps and computation. Studies in Appl Math 116:97–133
3. Agrawal GP (2001) Nonlinear Fiber Optics, 3rd edn. Academic Press, San Diego
4. Anastassiou C, Segev M, Steiglitz K, Giordmaine JA, Mitchell M, Shih MF, Lan S, Martin J (1999) Energy-exchange interactions between colliding vector solitons. Phys Rev Lett 83(12):2332–2335
5. Anastassiou C, Fleischer JW, Carmon T, Segev M, Steiglitz K (2001) Information transfer via cascaded collisions of vector solitons. Opt Lett 26(19):1498–1500
6. Barad Y, Silberberg Y (1997) Phys Rev Lett 78:3290
7. Berlekamp ER, Conway JH, Guy RK (1982) Winning Ways for Your Mathematical Plays, vol 2. Academic Press [Harcourt Brace Jovanovich Publishers], London
8. Cao XD, McKinstrie CJ (1993) J Opt Soc Am B 10:1202
9. Chen ZG, Segev M, Coskun TH, Christodoulides DN (1996) Observation of incoherently coupled photorefractive spatial soliton pairs. Opt Lett 21(18):1436–1438
10. Christodoulides DN, Joseph RI (1988) Discrete self-focusing in nonlinear arrays of coupled waveguides. Opt Lett 13:794–796
11. Christodoulides DN, Singh SR, Carvalho MI, Segev M (1996) Incoherently coupled soliton pairs in biased photorefractive crystals. Appl Phys Lett 68(13):1763–1765
12. Cundiff ST, Collings BC, Akhmediev NN, Soto-Crespo JM, Bergman K, Knox WH (1999) Phys Rev Lett 82:3988
13. Fredkin E, Toffoli T (1982) Conservative logic. Int J Theor Phys 21(3/4):219–253
14. Hasegawa A, Tappert F (1973) Transmission of stationary nonlinear optical pulses in dispersive dielectric fibers I: Anomalous dispersion. Appl Phys Lett 23(3):142–144
15. Islam MN, Poole CD, Gordon JP (1989) Opt Lett 14:1011
16. Jakubowski MH, Steiglitz K, Squier R (1998) State transformations of colliding optical solitons and possible application to computation in bulk media. Phys Rev E 58(5):6752–6758
17. Kang JU, Stegeman GI, Aitchison JS, Akhmediev N (1996) Observation of Manakov spatial solitons in AlGaAs planar waveguides. Phys Rev Lett 76(20):3699–3702
18. Korolev AE, Nazarov VN, Nolan DA, Truesdale CM (2005) Opt Lett 14:132
19. Manakov SV (1973) On the theory of two-dimensional stationary self-focusing of electromagnetic waves. Zh Eksp Teor Fiz 65(2):505–516 [Sov Phys JETP 38:248 (1974)]


20. Mano MM (1972) Computer Logic Design. Prentice-Hall, Englewood Cliffs
21. Menyuk CR (1987) Opt Lett 12:614
22. Menyuk CR (1988) J Opt Soc Am B 5:392
23. Menyuk CR (1989) Pulse propagation in an elliptically birefringent Kerr medium. IEEE J Quant Elect 25(12):2674–2682
24. Mollenauer LF, Stolen RH, Gordon JP (1980) Experimental observation of picosecond pulse narrowing and solitons in optical fibers. Phys Rev Lett 45(13):1095–1098
25. Nishizawa N, Goto T (2002) Opt Express 10:1151–1160
26. Radhakrishnan R, Lakshmanan M, Hietarinta J (1997) Inelastic collision and switching of coupled bright solitons in optical fibers. Phys Rev E 56(2):2213–2216
27. Rand D, Steiglitz K, Prucnal PR (2005) Signal standardization in collision-based soliton computing. Int J Unconv Comp 1:31–45
28. Rand D, Glesk I, Brès CS, Nolan DA, Chen X, Koh J, Fleischer JW, Steiglitz K, Prucnal PR (2007) Observation of temporal vector soliton propagation and collision in birefringent fiber. Phys Rev Lett 98(5):053902
29. Russell JS (1844) Report on waves. In: Report of the 14th Meeting of the British Association for the Advancement of Science. Taylor and Francis, London, pp 331–390
30. Shih MF, Segev M (1996) Incoherent collisions between two-dimensional bright steady-state photorefractive spatial screening solitons. Opt Lett 21(19):1538–1540
31. Shor PW (1994) Algorithms for quantum computation: Discrete logarithms and factoring. In: 35th Annual Symposium on Foundations of Computer Science. IEEE Press, Piscataway, pp 124–134
32. Squier RK, Steiglitz K (1993) 2-d FHP lattice gasses are computation universal. Complex Systems 7:297–307
33. Steblina VV, Buryak AV, Sammut RA, Zhou DY, Segev M, Prucnal P (2000) Stable self-guided propagation of two optical harmonics coupled by a microwave or a terahertz wave. J Opt Soc Am B 17(12):2026–2031
34. Steiglitz K (2000) Time-gated Manakov spatial solitons are computationally universal. Phys Rev E 63(1):016608
35. Steiglitz K (2001) Multistable collision cycles of Manakov spatial solitons. Phys Rev E 63(4):046607
36. Yang J (1997) Physica D 108:92–112
37. Yang J (1999) Multisoliton perturbation theory for the Manakov equations and its applications to nonlinear optics. Phys Rev E 59(2):2393–2405
38. Zakharov VE, Shabat AB (1971) Exact theory of two-dimensional self-focusing and one-dimensional self-modulation of waves in nonlinear media. Zh Eksp Teor Fiz 61(1):118–134 [Sov Phys JETP 34:62 (1972)]

Cooperative Games

ROBERTO SERRANO 1,2
1 Department of Economics, Brown University, Providence, USA
2 IMDEA-Social Sciences, Madrid, Spain

Article Outline

Glossary
Definition of the Subject
Introduction
Cooperative Games
The Core
The Shapley Value
Future Directions
Bibliography

Glossary

Game theory Discipline that studies strategic situations.
Cooperative game Strategic situation involving coalitions, whose formation assumes the existence of binding agreements among players.
Characteristic or coalitional function The most usual way to represent a cooperative game.
Solution concept Mapping that assigns predictions to each game.
Core Solution concept that assigns the set of payoffs that cannot be improved upon by any coalition.
Shapley value Solution concept that assigns the average of marginal contributions to coalitions.

Definition of the Subject

Cooperative game theory It is one of the two counterparts of game theory. It studies the interactions among coalitions of players. Its main question is this: given the sets of feasible payoffs for each coalition, what payoff will be awarded to each player? One can take a positive or normative approach to answering this question, and different solution concepts in the theory lean towards one or the other.
Core It is a solution concept that assigns to each cooperative game the set of payoffs that no coalition can improve upon or block. In a context in which there is unfettered coalitional interaction, the core arises as a good positive answer to the question posed in cooperative game theory. In other words, if a payoff does not belong to the core, one should not expect to see it as the prediction of the theory if there is full cooperation.
Shapley value It is a solution that prescribes a single payoff for each player, which is the average of all marginal contributions of that player to each coalition he or she is a member of. It is usually viewed as a good normative answer to the question posed in cooperative game theory. That is, those who contribute more to the groups that include them should be paid more.

Although there were some earlier contributions, the official date of birth of game theory is usually taken to be 1944, the year of publication of the first edition of the Theory of Games and Economic Behavior by John von Neumann and Oskar Morgenstern [42]. The core was first proposed by Francis Ysidro Edgeworth in 1881 [13], and later reinvented and defined in game-theoretic terms in [14]. The Shapley value was proposed by Lloyd Shapley in his 1953 PhD dissertation [37]. Both the core and the Shapley value have been applied widely to shed light on problems in different disciplines, including economics and political science.

Introduction

Game theory is the study of games, also called strategic situations. These are decision problems with multiple decision makers, whose decisions impact one another. It is divided into two branches: non-cooperative game theory and cooperative game theory.

The actors in non-cooperative game theory are individual players, who may reach agreements only if they are self-enforcing. The non-cooperative approach provides a rich language and develops useful tools to analyze games. One clear advantage of the approach is that it is able to model how specific details of the interaction among individual players may impact the final outcome. One limitation, however, is that its predictions may be highly sensitive to those details. For this reason it is worth also analyzing more abstract approaches that attempt to obtain conclusions that are independent of such details. The cooperative approach is one such attempt, and it is the subject of this article.

The actors in cooperative game theory are coalitions, that is, groups of players. For the most part, two facts, that coalitions can form and that each coalition has a feasible set of payoffs available to its members, are taken as given. Given the coalitions and their sets of feasible payoffs as primitives, the question tackled is the identification of final payoffs awarded to each player.
That is, given a collection of feasible sets of payoffs, one for each coalition, can one predict or recommend a payoff (or set of payoffs) to be awarded to each player? Such predictions or recommendations are embodied in different solution concepts.
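To make the idea of a solution concept concrete, the Shapley value described in the glossary can be computed directly for TU games: it averages each player's marginal contribution over all orderings in which the grand coalition can be assembled. The following is a minimal Python sketch; the function name and the dictionary encoding of the characteristic function are illustrative, not from the original text.

```python
from itertools import permutations
from math import factorial

def shapley_value(players, v):
    """Average marginal contribution of each player over all n! orderings.

    `v` maps a frozenset coalition to its worth; coalitions missing from
    the dictionary (including the empty set) are treated as worth 0.
    """
    value = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            # marginal contribution of p to the coalition formed so far
            value[p] += v.get(joined, 0.0) - v.get(coalition, 0.0)
            coalition = joined
    n_fact = factorial(len(players))
    return {p: value[p] / n_fact for p in players}

# Three-player simple-majority game: any coalition of two or more wins.
v = {frozenset(s): 1.0 for s in [(1, 2), (1, 3), (2, 3), (1, 2, 3)]}
print(shapley_value([1, 2, 3], v))  # symmetry gives each player 1/3
```

By symmetry, each player's marginal contribution is 1 exactly when he arrives second in an ordering, which happens in one third of the orderings.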


Indeed, one can take several approaches to answering the question just posed. From a positive or descriptive point of view, one may want to get a prediction of the likely outcome of the interaction among the players; the resulting payoff is then understood as the natural consequence of the forces at work in the system. Alternatively, one can take a normative or prescriptive approach, set up a number of normative goals, typically embodied in axioms, and try to derive their logical implications. Although authors sometimes disagree on the classification of the different solution concepts according to these two criteria – as we shall see, the understanding of each solution concept is enhanced if one can view it from very distinct approaches – in this article we shall exemplify the positive approach with the core and the normative approach with the Shapley value. While this may oversimplify the issues, it should be helpful to a reader new to the subject.

The rest of the article is organized as follows. Sect. “Cooperative Games” introduces the basic model of a cooperative game, and discusses its assumptions as well as the notion of solution concepts. Sect. “The Core” is devoted to the core, and Sect. “The Shapley Value” to the Shapley value. In each case, some of the main results are described and examples are provided. Sect. “Future Directions” discusses some directions for future research.

Cooperative Games

Representations of Games: The Characteristic Function

Let us begin by presenting the different ways to describe a game. The first two are the usual ways employed in non-cooperative game theory.

The most informative way to describe a game is called its extensive form. It consists of a game tree, specifying the timing of moves for each player and the information available to each of them at the time of making a move. At the end of each path of moves, a final outcome is reached and a payoff vector is specified. For each player, one can define a strategy, i.e., a complete contingent plan of action to play the game. That is, a strategy is a function that specifies a feasible move each time a player is called upon to make a move in the game.

One can abstract from details of the interaction (such as the timing of moves and the information available at each move) and focus on the concept of strategies. That is, one can list the set of strategies available to each player, and arrive at the strategic or normal form of the game. For two players, for example, the normal form is represented in a bimatrix table. One player controls the rows, and the other the columns. Each cell of the bimatrix is occupied with an ordered pair, specifying the payoff to each player if each of them chooses the strategy corresponding to that cell.

One can further abstract from the notion of strategies, which leads to the characteristic function form of representing a game. From the strategic form, one makes assumptions about the strategies used by the complement of a coalition of players to determine the feasible payoffs for the coalition (see, for example, the derivations in [7,42]). This is the representation most often used in cooperative game theory.

Thus, here are the primitives of the basic model in cooperative game theory. Let N = {1, …, n} be a finite set of players. Each non-empty subset of N is called a coalition. The set N is referred to as the grand coalition. For each coalition S, we shall specify a set V(S) ⊆ R^{|S|} containing the |S|-dimensional payoff vectors that are feasible for coalition S. This is called the characteristic function, and the pair (N, V) is called a cooperative game. Note how a reduced-form approach is taken, because one does not explain what strategic choices are behind each of the payoff vectors in V(S). In addition, in this formulation it is implicitly assumed that the actions taken by the complement coalition (those players in N \ S) cannot prevent S from achieving each of the payoff vectors in V(S). There are more general models in which these sorts of externalities across coalitions are considered, but we shall ignore them in this article.

Assumptions on the Characteristic Function

Some of the most common technical assumptions made on the characteristic function are the following:

(1) For each S ⊆ N, V(S) is closed. Denote by ∂V(S) the boundary of V(S); hence, ∂V(S) ⊆ V(S).
(2) For each S ⊆ N, V(S) is comprehensive, i.e., for each x ∈ V(S), {x} − R_+^{|S|} ⊆ V(S).
(3) For each x ∈ R^{|S|}, ∂V(S) ∩ ({x} + R_+^{|S|}) is bounded.
(4) For each S ⊆ N, there exists a continuously differentiable representation of V(S), i.e., a continuously differentiable function g_S : R^{|S|} → R such that V(S) = {x ∈ R^{|S|} : g_S(x) ≤ 0}.
(5) For each S ⊆ N, V(S) is non-leveled, i.e., for every x ∈ ∂V(S), the gradient of g_S at x is positive in all its coordinates.


With the assumptions made, ∂V(S) is the Pareto frontier of V(S), i.e., the set of vectors x_S ∈ V(S) such that there does not exist y_S ∈ V(S) satisfying y_i ≥ x_i for all i ∈ S with at least one strict inequality.

Other assumptions usually made relate the possibilities available to different coalitions. Among them, a very important one is balancedness, which we define next. A collection T of coalitions is balanced if there exists a set of weights w(S) ∈ [0, 1], one for each S ∈ T, such that for every i ∈ N, ∑_{S ∈ T : S ∋ i} w(S) = 1. One can think of these weights as the fraction of time that each player devotes to each coalition he is a member of, with a given coalition representing the same fraction of time for each player. The game (N, V) is balanced if x_N ∈ V(N) whenever x_S ∈ V(S) for every S in a balanced collection T. That is, the grand coalition can always implement any “time-sharing arrangement” that the different subcoalitions may come up with.

The characteristic function defined so far is often referred to as a non-transferable utility (NTU) game. A particular case is the transferable utility (TU) game, in which for each coalition S ⊆ N there exists a real number v(S) such that

V(S) = {x ∈ R^{|S|} : ∑_{i ∈ S} x_i ≤ v(S)}.
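The weight condition in the definition of a balanced collection is easy to verify mechanically. The following Python sketch (the function name and encoding are illustrative) checks the collection of all two-player coalitions of N = {1, 2, 3} with weight 1/2 each, a standard example of a balanced collection that is not a partition:

```python
def is_balanced_collection(players, weights):
    """Check the balancedness condition: for every player i, the weights
    of the coalitions containing i must sum to exactly 1.
    `weights` maps each coalition (a frozenset) to its weight in [0, 1]."""
    return all(
        abs(sum(w for S, w in weights.items() if i in S) - 1.0) < 1e-9
        for i in players
    )

# Each player belongs to exactly two of the three pair coalitions,
# so weight 1/2 on each makes every player's weights sum to 1.
pairs = [frozenset(s) for s in [(1, 2), (1, 3), (2, 3)]]
print(is_balanced_collection([1, 2, 3], {S: 0.5 for S in pairs}))  # True
```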

Abusing notation slightly, we shall denote a TU game by (N, v). In the TU case there is an underlying numéraire (money) that can transfer utility or payoff at a one-to-one rate from one player to any other. Technically, the theory of NTU games is far more complex: it uses convex analysis and fixed-point theorems, whereas the TU theory is based on linear inequalities and combinatorics.

Solution Concepts

Given a characteristic function, i.e., a collection of sets V(S), one for each S, the theory formulates its predictions on the basis of different solution concepts. We shall concentrate on the case in which the grand coalition forms, that is, cooperation is totally successful. Of course, solution concepts can be adapted to take care of the case in which this does not happen.

A solution is a mapping that assigns a set of payoff vectors in V(N) to each characteristic function game (N, V). Thus, a solution in general prescribes a set, which can be empty or a singleton (when it assigns a unique payoff vector as a function of the fundamentals of the problem). The leading set-valued cooperative solution concept is the core, while one of the most used single-valued ones is the Shapley value for TU games.

There are several criteria to evaluate the reasonableness or appeal of a cooperative solution. As outlined above, in a normative approach one can propose axioms, abstract principles that one would like the solution to satisfy, and then pursue their logical consequences. Historically, this was the first argument used to justify the Shapley value. Alternatively, one could start by defending a solution on the basis of its definition alone. In the case of the core, this is especially natural: in a context in which players can freely get together in groups, the prediction should be the set of payoff vectors that cannot be improved upon by any coalition.

One can further enhance one’s positive understanding of a solution concept by proposing games in extensive form or in normal form that are played non-cooperatively by players whose self-enforcing agreements lead to the given solution. This provides non-cooperative foundations, or a non-cooperative implementation, for the cooperative solution in question; it is an important research agenda initiated by John Nash in [25] and referred to as the Nash program (see [34] for a recent survey). Today, there are interesting results of these different kinds for many solution concepts, including axiomatic characterizations and non-cooperative foundations. Thus, one can evaluate the appeal of the axioms and the non-cooperative procedures behind each solution to defend a more normative or positive interpretation in each case.

The Core

The idea of agreements that are immune to coalitional deviations was first introduced to economic theory by Edgeworth in [13], who defined the set of coalitionally stable allocations of an economy under the name “final settlements.” Edgeworth envisioned this concept as an alternative to competitive equilibrium [43], of central importance in economic theory, and was also the first to investigate the connections between the two concepts. Edgeworth’s notion, which today we refer to as the core, was rediscovered and introduced to game theory in [14]. The origins of the core were not axiomatic. Rather, its simple and appealing definition appropriately describes stable outcomes in a context of unfettered coalitional interaction.

The core of the game (N, V) is the set of payoff vectors

C(N, V) = {x ∈ V(N) : there is no S ⊆ N with x_S ∈ V(S) \ ∂V(S)}.

In words, it is the set of feasible payoff vectors for the grand coalition that no coalition can upset. If such a coalition S exists, we shall say that S can improve upon or block x, and x is deemed unstable. That is, in a context where any coalition can get together, when S has a blocking move, coalition S will form and abandon the grand coalition and


its payoff x_S in order to get to a better payoff for each of the members of the coalition, a plan that is feasible for them.
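For TU games, membership in the core reduces to checking finitely many linear inequalities: feasibility for the grand coalition, plus a no-blocking condition for every coalition S, namely that the members of S jointly receive at least v(S). A brute-force Python sketch (exponential in the number of players; function name and game encoding are illustrative):

```python
from itertools import combinations

def in_core(x, players, v):
    """Test whether the payoff vector x (a dict player -> payoff) lies in
    the core of the TU game (players, v): x must be feasible for the grand
    coalition, and no coalition S may satisfy v(S) > sum of x over S.
    Coalitions missing from the dict `v` are treated as worth 0."""
    eps = 1e-9
    if sum(x[i] for i in players) > v.get(frozenset(players), 0.0) + eps:
        return False  # infeasible for the grand coalition N
    for r in range(1, len(players) + 1):
        for S in combinations(players, r):
            if sum(x[i] for i in S) < v.get(frozenset(S), 0.0) - eps:
                return False  # coalition S blocks x
    return True

# Two-player game with v({1}) = v({2}) = 0 and v({1,2}) = 1:
# any nonnegative split of the unit surplus is in the core.
v = {frozenset({1, 2}): 1.0}
print(in_core({1: 0.6, 2: 0.4}, [1, 2], v))  # True
print(in_core({1: 0.6, 2: 0.6}, [1, 2], v))  # False: infeasible for N
```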

Non-Emptiness

The core can prescribe the empty set in some games. A game with an empty core is to be understood as a situation of strong instability, as any payoffs proposed to the grand coalition are vulnerable to coalitional blocking.

Example Consider the following simple-majority 3-player TU game, in which the votes of at least two players make a coalition winning. That is, we represent the situation by the following characteristic function: v(S) = 1 for any S containing at least two members, and v({i}) = 0 for all i ∈ N. Clearly, C(N, v) = ∅: any feasible payoff agreement proposed to the grand coalition will be blocked by at least one coalition.

An important sufficient condition for the non-emptiness of the core of NTU games is balancedness, as shown in [32]:

Theorem 1 (Scarf [32]) Let the game (N, V) be balanced. Then C(N, V) ≠ ∅.

For the TU case, balancedness is not only sufficient; it is also necessary for the non-emptiness of the core:

Theorem 2 (Bondareva [9]; Shapley [39]) Let (N, v) be a TU game. Then (N, v) is balanced if and only if C(N, v) ≠ ∅.

The Connections with Competitive Equilibrium

In economics, the institution of markets and the notion of prices are essential to the understanding of the allocation of goods and the distribution of wealth among individuals. For simplicity of presentation, we shall concentrate on exchange economies and disregard production aspects. That is, we shall assume that the goods in question have already been produced in some fixed amounts, and now they are to be allocated to individuals to satisfy their consumption needs.

An exchange economy is a system in which each agent i in the set N has a consumption set Z_i ⊆ R_+^l of commodity bundles, as well as a preference relation over Z_i and an initial endowment ω_i ∈ Z_i of the commodities. A feasible allocation of goods in the economy is a list of bundles (z_i)_{i ∈ N} such that z_i ∈ Z_i and ∑_{i ∈ N} z_i ≤ ∑_{i ∈ N} ω_i. An allocation is competitive if it is supported by a competitive equilibrium. A competitive equilibrium is a price-allocation pair (p, (z_i)_{i ∈ N}), where p ∈ R^l \ {0}, such that:

- for every i ∈ N, z_i is top-ranked for agent i among all bundles z satisfying pz ≤ pω_i, and
- ∑_{i ∈ N} z_i = ∑_{i ∈ N} ω_i.

In words, this is what the concept expresses. First, at the equilibrium prices, each agent demands z_i, i.e., wishes to purchase this bundle among the set of affordable bundles, the budget set. And second, these demands are such that all markets clear, i.e., total demand equals total supply. Note how the notion of a competitive equilibrium relies on the principle of private ownership (each individual owns his or her endowment, which allows him or her to access markets and purchase things). Moreover, each agent is a price-taker in all markets. That is, no single individual can affect the market prices with his or her actions; prices are fixed parameters in each individual’s consumption decision. The usual justification for the price-taking assumption is that each individual is “very small” with respect to the size of the economy, and hence has no market power.

One difficulty with the competitive equilibrium concept is that it does not explain where prices come from. There is no single agent in the model responsible for coming up with them. Walras in [43] told the story of an auctioneer calling out prices until demand and supply coincide, but in many real-world markets there is no auctioneer. More generally, economists attribute the equilibrium prices to the workings of the forces of demand and supply, but this appears to be simply repeating the definition. So, is there a different way one can explain competitive equilibrium prices?

As it turns out, there is a very robust result that answers this question. We refer to it as the equivalence principle (see, e.g., [6]), by which, under certain regularity conditions, the predictions provided by different game-theoretic solution concepts, when applied to an economy with a large enough set of agents, tend to converge to the set of competitive equilibrium allocations. One of the first results in this tradition was provided by Edgeworth in 1881 for the core. Note how the core of the economy can be defined in the space of allocations, using the same definition as above: namely, a feasible allocation is in the core if it cannot be blocked by any coalition of agents making use of the coalition’s endowments. Edgeworth’s result was generalized later by Debreu and Scarf in [11] for the case in which an exchange economy is replicated an arbitrary number of times (Anderson studies in [1] the more general case of arbitrary sequences of economies, not necessarily replicas). An informal statement of the Debreu–Scarf theorem follows:


Theorem 3 (Debreu and Scarf [11]) Consider an exchange economy. Then:
(i) the set of competitive equilibrium allocations is contained in the core;
(ii) for each non-competitive core allocation of the original economy, there exists a sufficiently large replica of the economy for which the replica of the allocation is blocked.

The first part states a very appealing property of competitive allocations, i.e., their coalitional stability. The second part, known as the core convergence theorem, states that the core “shrinks” to the set of competitive allocations as the economy grows large.

In [3], Aumann models the economy as an atomless measure space and demonstrates the following core equivalence theorem:

Theorem 4 (Aumann [3]) Let the economy consist of an atomless continuum of agents. Then the core coincides with the set of competitive allocations.

For readers who wish to pursue the topic further, [2] provides a recent survey.

Axiomatic Characterizations

The axiomatic foundations of the core were provided much later than the concept was proposed. These characterizations are all inspired by Peleg’s work. They include [26,27] and [36]; the latter paper also provides an axiomatization of competitive allocations in which core convergence insights are exploited. In all these characterizations, the key axiom is that of consistency, also referred to as the reduced game property.

Consistency means that the outcomes prescribed by a solution should be “invariant” to the number of players in the game. More formally, let (N, V) be a game, and let σ be a solution. Let x ∈ σ(N, V). Then the solution σ is consistent if for every S ⊆ N, x_S ∈ σ(S, V_{x,S}), where (S, V_{x,S}) is the reduced game for S given payoffs x, defined as follows. The feasible set for S in this reduced game is the projection of V(N) at x_{N\S}, i.e., what remains after paying those outside of S:

V_{x,S}(S) = {y_S : (y_S, x_{N\S}) ∈ V(N)}.

However, the feasible set of each T ⊂ S, T ≠ S, allows T to make deals with any coalition outside of S, provided that those services are paid at the rate prescribed by x_{N\S}:

V_{x,S}(T) = ∪_{Q ⊆ N\S} {y_T : (y_T, x_Q) ∈ V(T ∪ Q)}.

It can be shown that the core satisfies consistency with respect to this reduced game. Moreover, consistency is the central axiom in the characterization of the core, which, depending on the version one looks at, uses a host of other axioms; see [26,27,36].

Non-cooperative Implementation

To obtain a non-cooperative implementation of the core, the procedure must embody some feature of anonymity, since the core is usually a large set and it contains payoffs where different players are treated very differently. For instance, if the procedure always had a fixed set of moves, the prediction would typically favor the first mover, making it impossible to obtain an implementation of the entire set of payoffs.

The model in [30] builds in this anonymity by assuming that negotiations take place in continuous time, so that anyone can speak at the beginning of the game and at any point in time, instead of having a fixed order. The player who gets to speak first makes a proposal consisting of naming a coalition that contains him and a feasible payoff for that coalition. Next, the players in that coalition get to respond. If they all accept the proposal, the coalition leaves and the game continues among the other players. Otherwise, a new proposal may come from any player in N. It is shown that, if the TU game has a non-empty core (as well as any of its subgames), a class of stationary self-enforcing predictions of this procedure coincide with the core. If a core payoff is proposed to the grand coalition, there are no incentives for individual players to reject it. Conversely, a non-core payoff cannot be sustained because any player in a blocking coalition has an incentive to make a proposal to that coalition, who will accept it (knowing that the alternative, given stationarity, would be to go back to the non-core status quo).
[24] offers a discrete-time version of the mechanism: in this work, the required anonymity is imposed on the solution concept, by looking at the order-independent equilibria of the procedure. The model in [33] sets up a market to implement the core. The anonymity of the procedure stems from the random choice of the broker. The broker announces a vector (x_1, …, x_n) whose components add up to v(N). One can interpret x_i as the price for the productive asset held by player i. Following an arbitrary order, the remaining players either accept or reject these prices. If player i accepts, he sells his asset to the broker for the price x_i and leaves the game. Those who reject get to buy from the broker, at the called-out prices, the portfolio of assets of their choice, if the broker still has them. If a player rejects but does not get to buy the portfolio of assets he would like, because someone else took them before, he can always leave the market with his own asset. The broker's payoff is the worth of the final portfolio of assets that he holds, plus the net monetary transfers that he has received. It is shown in [33] that the prices announced by the broker will always be his top-ranked vectors in the core. If the TU game is such that gains from cooperation increase with the size of coalitions, a beautiful theorem of Shapley in [41] is used to prove that the set of all equilibrium payoffs of this procedure coincides with the core. Core payoffs are here understood as those price vectors at which all arbitrage opportunities in the market have been wiped out. The procedures in [35] also implement the core; they do not rely on the TU assumption, and they allow the order of moves to be changed endogenously by the players. Finally, yet another way to build anonymity into the procedure is to allow the proposals to be made by brokers outside of the set N, as done in [28].

An Application

Consider majority games within a parliament. Suppose there are 100 seats, and decisions are made by simple majority, so that 51 votes are required to pass a piece of legislation. In the first specification, suppose there is one very large party (player 1) with 90 seats, and five small parties with 2 seats each. Given the simple majority rule, this problem can be represented by the following TU characteristic function: v(S) = 1 if S contains player 1, and v(S) = 0 otherwise. The interpretation is that each winning coalition can get the entire surplus, i.e., pass the desired proposal; here a coalition is winning if and only if player 1 belongs to it. For this problem, the core is a singleton: the entire unit of surplus is allocated to player 1, who holds all the power. Any split of the unit surplus of the grand coalition (v(N) = 1) that gives a positive fraction of the surplus to any of the small parties is blocked by the coalition consisting of player 1 alone.
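Blocking in such weighted majority games can be checked mechanically. A small Python sketch (the helper names are ours, not from the text): a payoff vector lies in the core exactly when no coalition is worth more than what it receives.

```python
from itertools import combinations

def v(S, seats, quota=51):
    """Characteristic function of the weighted majority game: 1 if S is winning."""
    return 1.0 if sum(seats[i] for i in S) >= quota else 0.0

def blocked_by(x, seats):
    """Return a blocking coalition for payoff vector x, or None if none exists."""
    players = list(seats)
    for r in range(1, len(players) + 1):
        for S in combinations(players, r):
            if v(S, seats) > sum(x[i] for i in S) + 1e-9:
                return S
    return None

# First specification: one party with 90 seats, five parties with 2 seats each
seats = {1: 90, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2}
print(blocked_by({1: 1.0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}, seats))  # None: in the core
print(blocked_by({1: 0.9, 2: 0.1, 3: 0, 4: 0, 5: 0, 6: 0}, seats))  # (1,): player 1 blocks

# Second specification: 35 seats vs five parties of 13 seats; the core is empty
seats2 = {1: 35, 2: 13, 3: 13, 4: 13, 5: 13, 6: 13}
print(blocked_by({1: 1.0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0}, seats2))  # four small parties block
```

In the second specification every imputation is blocked by some coalition, which is the emptiness of the core discussed below.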
Consider now a second problem, in which player 1, still the large party, has 35 seats, and each of the other five parties has 13 seats. Now the characteristic function is as follows: v(S) = 1 if and only if S either contains player 1 and at least two small parties, or contains at least four of the small parties; v(S) = 0 otherwise. It is easy to see that now the core is empty: any split of the unit surplus is blocked by at least one coalition. For example, the entire unit going to player 1 is blocked by the coalition of all five small parties, which can award 0.2 to each of them. But this arrangement, in which each small party gets 0.2 and player 1 nothing, is blocked as well, because player 1
can bribe two of the small parties (say, players 2 and 3), promise them 1/3 each, and keep the remaining third for itself; and so on. The emptiness of the core describes the fragility of any agreement here, due to the inherent instability of this coalition formation game.

The Shapley Value

Consider again a transferable utility (TU) game in characteristic function form. The number v(S) is referred to as the worth of S, and it expresses S's initial position, e.g., the maximum total amount of surplus, in numéraire (money, or power), that S initially has at its disposal.

Axiomatics

Shapley in [37] is interested in solving, in a fair and unique way, the problem of distributing the surplus among the players while taking into account the worth of each coalition. To do this, he restricts attention to single-valued solutions and resorts to the axiomatic method. He proposes the following axioms on a single-valued solution:

(i) Efficiency: the payoffs must add up to v(N), which means that the entire surplus of the grand coalition is allocated.
(ii) Symmetry: if two players are substitutes, because they contribute the same amount to each coalition, the solution should treat them equally.
(iii) Additivity: the solution to the sum of two TU games must be the sum of what it awards in each of the two games.
(iv) Dummy player: if a player contributes nothing to every coalition, the solution should pay him nothing.

(To be precise, the first axiom deserves a different name. It does imply efficiency, in the economic sense, for superadditive games, i.e., games in which v(S) + v(T) ≤ v(S ∪ T) for every pair of disjoint coalitions S and T. In the absence of superadditivity, though, forming the grand coalition is not necessarily efficient, because a higher aggregate payoff may be obtained from a different coalition structure.) The surprising result in [37] is this:

Theorem 5 (Shapley [37]) There is a unique single-valued solution to TU games satisfying efficiency, symmetry, additivity and dummy.
It is what today we call the Shapley value: the function that assigns to each player i the payoff

Sh_i(N, v) = Σ_{S ⊆ N : i ∈ S} [(|S| − 1)! (|N| − |S|)! / |N|!] · [v(S) − v(S \ {i})] .

That is, the Shapley value awards to each player the average of his marginal contributions to each coalition. In taking this average, all orders of the players are considered to
be equally likely. Let us assume, without loss of generality, that v({i}) = 0 for each player i. What is especially surprising in Shapley's result is that nothing in the axioms (with the possible exception of the dummy axiom) hints at the idea of marginal contributions; marginality is in general the outcome of all the axioms together, including additivity or linearity. Among the axioms utilized by Shapley, additivity is the one with the least normative content: it is simply a mathematical property that keeps the computation of the solution simple. Young in [45] provides a beautiful counterpart to Shapley's theorem. He drops additivity (as well as the dummy player axiom) and instead uses an axiom of marginality: the solution should pay a player the same in two games whenever his or her marginal contributions to coalitions are the same in both games. Marginality is an idea with a strong tradition in economic theory. Young's result is "dual" to Shapley's, in the sense that marginality is assumed and additivity derived as the result:

Theorem 6 (Young [45]) There exists a unique single-valued solution to TU games satisfying efficiency, symmetry and marginality. It is the Shapley value.

Apart from these two, [19] provides further axiomatizations of the Shapley value using the idea of potential and the concept of consistency, as described in the previous section. There is no single way to extend the Shapley value to the class of NTU games. Three main extensions have been proposed: the Shapley λ-transfer value [40], the Harsanyi value [16], and the Maschler–Owen consistent value [23]. They were axiomatized in [5,10,17], respectively.

The Connections with Competitive Equilibrium

As was the case for the core, there is a value equivalence theorem. The result holds in the TU domain (see [4,8,38]). It can be shown that the Shapley value payoffs can be supported by competitive prices.
Furthermore, in large enough economies, the set of competitive payoffs "shrinks" to approximate the Shapley value. However, the result cannot be easily extended to the NTU domain: while it holds for the λ-transfer value, it need not obtain for the other extensions. For further details, the interested reader is referred to [18] and the references therein.
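The marginal-contribution formula above can be evaluated directly on small games. A minimal Python sketch (function and variable names are ours): for each player it averages v(S) − v(S \ {i}) over all coalitions containing i, with the combinatorial weights (|S| − 1)!(|N| − |S|)!/|N|!.

```python
from itertools import combinations
from math import factorial

def shapley(players, v):
    """Shapley value of a TU game; v maps a frozenset coalition to its worth."""
    n = len(players)
    value = {}
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for r in range(n):
            for C in combinations(others, r):
                S = frozenset(C) | {i}
                weight = factorial(len(S) - 1) * factorial(n - len(S)) / factorial(n)
                total += weight * (v(S) - v(S - {i}))  # i's marginal contribution to S
        value[i] = total
    return value

# A 3-player simple majority game: any two players win
v = lambda S: 1.0 if len(S) >= 2 else 0.0
print(shapley([1, 2, 3], v))  # each player gets 1/3, by symmetry and efficiency
```

For games with many players this enumeration grows exponentially; sampling random orders is the usual workaround.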

Non-cooperative Implementation

Reference [15] was the first to propose a procedure providing non-cooperative foundations for the Shapley value. Later, other authors have provided alternative procedures and techniques to the same end, including [20,21,29,44]. We shall concentrate on the description of the procedure proposed by Hart and Mas-Colell in [20]. Generalizing an idea found in [22], which studies the case δ = 0 (see below), Hart and Mas-Colell propose the following non-cooperative procedure. With equal probability, each player i ∈ N is chosen to publicly make a feasible proposal to the others: a vector (x_1, …, x_n) whose components sum to at most v(N). The other players respond in sequence, following a prespecified order. If all accept, the proposal is implemented; otherwise, a random device is triggered. With probability δ, where 0 ≤ δ < 1, the same game continues to be played among the same n players (and thus a new proposer is again chosen at random among them), but with probability 1 − δ the proposer leaves the game: he is paid 0 and his resources are removed, so that in the next period proposals to the remaining n − 1 players cannot add up to more than v(N \ {i}). A new proposer is then chosen at random from the set N \ {i}, and so on. As shown in [20], there exists a unique stationary self-enforcing prediction of this procedure, and it coincides with the Shapley value payoffs for every value of δ. (Stationarity means that strategies cannot be history-dependent.) As δ → 1, the Shapley value payoffs are obtained not only in expectation, but independently of who the proposer is. One way to understand this result, as done in [20], is to check that the rules of the procedure together with stationary behavior agree with Shapley's axioms: the equilibrium relies on immediate acceptance of proposals, stationary strategies treat substitute players similarly, the equations describing the equilibrium have an additive structure, and dummy players receive 0 because no resources are destroyed when they are asked to leave.

It is also worth stressing the important role played in the procedure by players' marginal contributions to coalitions: following a rejection, the proposer runs the risk of being thrown out and the others of losing his resources, which suggests a "price" for them. In [21], the authors study the conditions under which stationarity can be dispensed with in obtaining the result. Also, [29] uses a variant of the Hart and Mas-Colell procedure that replaces the random choice of proposers with a bidding stage, in which players bid for the right to make proposals.

An Application

Consider again the class of majority problems in a parliament consisting of 100 seats. As we shall see, the Shapley value is a good way to understand the power that each party has in the legislature. Let us begin by considering again the problem in which player 1 has 90 seats, while each of the five small parties has 2 seats. It is easy to see that the Shapley value, like the core in this case, awards the entire unit of surplus to player 1: each of the small parties is a dummy player, and hence the Shapley value awards zero to each of them. Consider again the second problem, in which player 1 is a big party with 35 seats, and there are five small parties with 13 seats each. The Shapley value awards 1/3 to the large party and, by symmetry, 2/15 to each of the small parties. To see this, we need to determine when the marginal contributions of player 1 to a coalition are positive. Recall that there are 6! possible orders of the players. If player 1 arrives first or second in the room in which the coalition is forming, his marginal contribution is zero: the coalition was losing before he arrived and remains losing after his arrival. Similarly, his marginal contribution is zero if he arrives fifth or sixth: in this case the coalition is already winning before he arrives, so he adds nothing to it. Only when he arrives third or fourth, which happens a third of the time, does he turn the coalition from losing to winning. This explains his Shapley value share of 1/3. In this game, the Shapley value payoffs roughly correspond to the proportion of seats that each party has. Next, consider a third problem in which there are two large parties, while the other four parties are very small. For example, let each of the large parties have 48 seats (say, players 1 and 2), while each of the four small parties has only one seat. Now the Shapley value payoffs are 0.3 to each of the two large parties, and 0.1 to each of the small ones.
To see this, note that the marginal contribution of a small party is positive only when he comes fourth in line and exactly one of the three preceding parties in the coalition is a large party; this happens in 72 of the 5! orders of the other parties in which he is fourth, i.e., with probability (72/5!) · (1/6) = 1/10. In this case, the competition between the large parties for the votes of the small parties increases the power of the latter quite significantly, relative to the proportion of seats that each of them holds. Finally, consider a fourth problem with two large parties (players 1 and 2) with 46 seats each, one mid-size party (player 3) with 5 seats, and three small parties, each with one seat. First, note that each of the three small parties has become a dummy player: no winning coalition to which one of them belongs becomes losing if he leaves it, and so players 4, 5 and 6 are paid zero by the Shapley value. Second, note that, despite the substantial difference in seats between each large party and the mid-size party, the three are identical in terms of marginal contributions to a winning coalition. Indeed, for i = 1, 2, 3, player i's marginal contribution to a coalition is positive only if he arrives second, third, fourth or fifth, and exactly one of the preceding players in the coalition is one of the other non-dummy players. Note how the Shapley value captures nicely the changes in the allocation of power across the different political scenarios. In this case, the fierce competition between the two large parties for the votes of player 3, the swing party needed to form a majority, explains the equal share of power among the three.
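The pivot-counting arguments above can be mechanized as the Shapley–Shubik power index: for each of the n! orders, the pivotal party is the one whose arrival first makes the accumulated seat count reach the quota. A Python sketch (names are ours) reproduces all four parliamentary scenarios:

```python
from itertools import permutations
from fractions import Fraction

def power(seats, quota=51):
    """Shapley-Shubik power index: each party's share of orders in which it is pivotal."""
    players = list(seats)
    pivots = {i: 0 for i in players}
    for order in permutations(players):
        tally = 0
        for i in order:
            tally += seats[i]
            if tally >= quota:
                pivots[i] += 1  # i turns the growing coalition from losing to winning
                break
    total = sum(pivots.values())  # equals n!, since every order has exactly one pivot
    return {i: Fraction(pivots[i], total) for i in players}

print(power({1: 90, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2}))       # player 1: 1, the rest: 0
print(power({1: 35, 2: 13, 3: 13, 4: 13, 5: 13, 6: 13}))  # 1/3; 2/15 for each small party
print(power({1: 48, 2: 48, 3: 1, 4: 1, 5: 1, 6: 1}))      # 3/10 and 3/10; 1/10 each
print(power({1: 46, 2: 46, 3: 5, 4: 1, 5: 1, 6: 1}))      # 1/3 for players 1-3; 0 for dummies
```

With six parties the 720 orders are enumerated instantly; exact `Fraction` arithmetic makes the 1/3 and 2/15 shares visible without rounding.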

Future Directions

This article has been a first approach to cooperative game theory, and has emphasized two of its most important solution concepts. The literature on these topics is vast, and the interested reader is encouraged to consult the general references listed below. For the future, one should expect progress of the theory into areas that have been less explored, including games with asymmetric information and games with coalitional externalities. In both cases, the characteristic function model must be enriched to take care of the added complexities. Relevant to this encyclopedia are issues of complexity. The complexity of cooperative solution concepts has been studied (see, for instance, [12]). In terms of computational complexity, the Shapley value seems to be easy to compute, while the core is harder, although some classes of games have been identified in which this task is also simple. Finally, one should insist on the importance of novel and fruitful applications of the theory to shed new light on concrete problems. In the case of the core, for example, the insights of core stability in matching markets have been successfully applied by Alvin Roth and his collaborators to the design of matching markets in the "real world" (e.g., the job market for medical interns and hospitals, the allocation of organs from donors to patients, and so on); see [31].

Bibliography

Primary Literature

1. Anderson RM (1978) An elementary core equivalence theorem. Econometrica 46:1483–1487
2. Anderson RM (2008) Core convergence. In: Durlauf S, Blume L (eds) The New Palgrave Dictionary of Economics, 2nd edn. Macmillan, London
3. Aumann RJ (1964) Markets with a continuum of traders. Econometrica 32:39–50

673

674

Cooperative Games

4. Aumann RJ (1975) Values of markets with a continuum of traders. Econometrica 43:611–646
5. Aumann RJ (1985) An axiomatization of the non-transferable utility value. Econometrica 53:599–612
6. Aumann RJ (1987) Game theory. In: Eatwell J, Milgate M, Newman P (eds) The New Palgrave Dictionary of Economics. Norton, New York
7. Aumann RJ, Peleg B (1960) Von Neumann–Morgenstern solutions to cooperative games without side payments. Bull Am Math Soc 66:173–179
8. Aumann RJ, Shapley LS (1974) Values of Non-Atomic Games. Princeton University Press, Princeton
9. Bondareva ON (1963) Some applications of linear programming methods to the theory of cooperative games (in Russian). Problemy Kibernetiki 10:119–139
10. de Clippel G, Peters H, Zank H (2004) Axiomatizing the Harsanyi solution, the symmetric egalitarian solution and the consistent solution for NTU-games. Int J Game Theory 33:145–158
11. Debreu G, Scarf H (1963) A limit theorem on the core of an economy. Int Econ Rev 4:235–246
12. Deng X, Papadimitriou CH (1994) On the complexity of cooperative solution concepts. Math Oper Res 19:257–266
13. Edgeworth FY (1881) Mathematical Psychics. Kegan Paul, London. Reprinted in: Newman P (ed) (2003) F. Y. Edgeworth's Mathematical Psychics and Further Papers on Political Economy. Oxford University Press, Oxford
14. Gillies DB (1959) Solutions to general non-zero-sum games. In: Tucker AW, Luce RD (eds) Contributions to the Theory of Games IV. Princeton University Press, Princeton, pp 47–85
15. Gul F (1989) Bargaining foundations of Shapley value. Econometrica 57:81–95
16. Harsanyi JC (1963) A simplified bargaining model for the n-person cooperative game. Int Econ Rev 4:194–220
17. Hart S (1985) An axiomatization of Harsanyi's non-transferable utility solution. Econometrica 53:1295–1314
18. Hart S (2008) Shapley value. In: Durlauf S, Blume L (eds) The New Palgrave Dictionary of Economics, 2nd edn. Macmillan, London
19. Hart S, Mas-Colell A (1989) Potential, value and consistency. Econometrica 57:589–614
20. Hart S, Mas-Colell A (1996) Bargaining and value. Econometrica 64:357–380
21. Krishna V, Serrano R (1995) Perfect equilibria of a model of n-person non-cooperative bargaining. Int J Game Theory 24:259–272
22. Mas-Colell A (1988) Algunos comentarios sobre la teoría cooperativa de los juegos. Cuadernos Económicos 40:143–161
23. Maschler M, Owen G (1992) The consistent Shapley value for games without side payments. In: Selten R (ed) Rational Interaction: Essays in Honor of John Harsanyi. Springer, New York
24. Moldovanu B, Winter E (1995) Order independent equilibria. Games Econ Behav 9:21–34
25. Nash JF (1953) Two person cooperative games. Econometrica 21:128–140
26. Peleg B (1985) An axiomatization of the core of cooperative games without side payments. J Math Econ 14:203–214

27. Peleg B (1986) On the reduced game property and its converse. Int J Game Theory 15:187–200
28. Pérez-Castrillo D (1994) Cooperative outcomes through noncooperative games. Games Econ Behav 7:428–440
29. Pérez-Castrillo D, Wettstein D (2001) Bidding for the surplus: a non-cooperative approach to the Shapley value. J Econ Theory 100:274–294
30. Perry M, Reny P (1994) A non-cooperative view of coalition formation and the core. Econometrica 62:795–817
31. Roth AE (2002) The economist as engineer: game theory, experimentation and computation as tools for design economics. Econometrica 70:1341–1378
32. Scarf H (1967) The core of an N person game. Econometrica 35:50–69
33. Serrano R (1995) A market to implement the core. J Econ Theory 67:285–294
34. Serrano R (2005) Fifty years of the Nash program, 1953–2003. Investigaciones Económicas 29:219–258
35. Serrano R, Vohra R (1997) Non-cooperative implementation of the core. Soc Choice Welf 14:513–525
36. Serrano R, Volij O (1998) Axiomatizations of neoclassical concepts for economies. J Math Econ 30:87–108
37. Shapley LS (1953) A value for n-person games. In: Tucker AW, Luce RD (eds) Contributions to the Theory of Games II. Princeton University Press, Princeton, pp 307–317
38. Shapley LS (1964) Values of large games VII: a general exchange economy with money. Research Memorandum 4248-PR. RAND Corporation, Santa Monica
39. Shapley LS (1967) On balanced sets and cores. Nav Res Logist Q 14:453–460
40. Shapley LS (1969) Utility comparison and the theory of games. In: La Décision: Agrégation et Dynamique des Ordres de Préférence. CNRS, Paris
41. Shapley LS (1971) Cores of convex games. Int J Game Theory 1:11–26
42. von Neumann J, Morgenstern O (1944) Theory of Games and Economic Behavior. Princeton University Press, Princeton
43. Walras L (1874) Elements of Pure Economics, or the Theory of Social Wealth. English edition: Jaffé W (ed). Reprinted 1984 by Orion Editions, Philadelphia
44. Winter E (1994) The demand commitment bargaining and snowballing of cooperation. Econ Theory 4:255–273
45. Young HP (1985) Monotonic solutions of cooperative games. Int J Game Theory 14:65–72

Books and Reviews

Myerson RB (1991) Game Theory: An Analysis of Conflict. Harvard University Press, Cambridge
Osborne MJ, Rubinstein A (1994) A Course in Game Theory. MIT Press, Cambridge
Peleg B, Sudhölter P (2003) Introduction to the Theory of Cooperative Games. Kluwer, Amsterdam; 2nd edn. Springer, Berlin
Roth AE, Sotomayor M (1990) Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. Cambridge University Press, Cambridge


Cooperative Games (Von Neumann–Morgenstern Stable Sets)

Jun Wako^1, Shigeo Muto^2
^1 Department of Economics, Gakushuin University, Tokyo, Japan
^2 Graduate School of Decision Science and Technology, Tokyo Institute of Technology, Tokyo, Japan

Article Outline

Glossary
Definition of the Subject
Introduction
Stable Sets in Abstract Games
Stable Set and Core
Stable Sets in Characteristic Function Form Games
Applications of Stable Sets in Abstract and Characteristic Function Form Games
Stable Sets and Farsighted Stable Sets in Strategic Form Games
Applications of Farsighted Stable Sets in Strategic Form Games
Future Directions
Bibliography

Glossary

Characteristic function form game A characteristic function form game consists of a set of players and a characteristic function that gives each group of players, called a coalition, a value or a set of payoff vectors that they can gain by themselves. It is a typical representation of cooperative games. For characteristic function form games, several solution concepts have been defined, such as the von Neumann–Morgenstern stable set, core, bargaining set, kernel, nucleolus, and Shapley value.

Abstract game An abstract game consists of a set of outcomes and a binary relation, called domination, on the outcomes. Von Neumann and Morgenstern presented this game form for general applications of stable sets.

Strategic form game A strategic form game consists of a player set, each player's strategy set, and each player's payoff function. It is usually used to represent non-cooperative games.

Imputation An imputation is a payoff vector in a characteristic function form game that satisfies group rationality and individual rationality. The former means that the players divide the full amount that the grand coalition of all players can gain; the latter says that each
player is assigned at least the amount that he/she can gain alone.

Domination Domination is a binary relation defined on the set of imputations, outcomes, or strategy combinations, depending on the form of the given game. In characteristic function form games, an imputation is said to dominate another imputation if there is a coalition of players that can realize their payoffs in the former by themselves, making each of them better off than in the latter. Domination given a priori in abstract games can also be interpreted in the same way. In strategic form games, domination is defined on the basis of commonly beneficial changes of strategies by coalitions.

Internal stability A set of imputations (outcomes, strategy combinations) satisfies internal stability if there is no domination between any two imputations in the set.

External stability A set of imputations (outcomes, strategy combinations) satisfies external stability if any imputation outside the set is dominated by some imputation inside the set.

von Neumann–Morgenstern stable set A set of imputations (outcomes, strategy combinations) is a von Neumann–Morgenstern stable set if it satisfies both internal and external stability.

Farsighted stable set A farsighted stable set is a more sophisticated stable set concept, defined mainly for strategic form games. Given two strategy combinations x and y, we say that x indirectly dominates y if there exist a sequence of coalitions S_1, …, S_p and a sequence of strategy combinations y = x^0, x^1, …, x^p = x such that each coalition S_j can induce strategy combination x^j by a joint move from x^{j−1}, and all members of S_j end up with better payoffs at x^p = x than at x^{j−1}. A farsighted stable set is a set of strategy combinations that is stable both internally and externally with respect to indirect domination. Farsighted stable sets can also be defined in abstract games and characteristic function form games.
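Indirect domination, as defined in the glossary entry above, can be tested by a recursive search over chains of coalitional moves. The following Python sketch is illustrative only: the representation of moves, the helper names, and the prisoner's-dilemma example are our own, not from the article.

```python
def indirectly_dominates(x, y, moves, prefers, _seen=None):
    """x indirectly dominates y: some chain of coalitional moves leads from y to x,
    each moving coalition preferring the final state x to the state it moves from.
    moves[(a, b)] lists the coalitions (tuples of players) that can move a -> b;
    prefers(i, a, b) is True if player i strictly prefers state a to state b."""
    if _seen is None:
        _seen = set()
    if y in _seen:                 # avoid revisiting states (minimal chains are simple paths)
        return False
    _seen.add(y)
    for (a, b), coalitions in moves.items():
        if a != y:
            continue
        for S in coalitions:
            if all(prefers(i, x, y) for i in S):
                if b == x or indirectly_dominates(x, b, moves, prefers, _seen):
                    return True
    return False

# Prisoner's dilemma: states are strategy pairs, payoffs as usual
u = {('C', 'C'): (3, 3), ('C', 'D'): (0, 4), ('D', 'C'): (4, 0), ('D', 'D'): (1, 1)}

def prefers(i, a, b):
    return u[a][i] > u[b][i]

# unilateral moves: a single player switches his own strategy
moves = {}
for s in u:
    for i in (0, 1):
        t = list(s)
        t[i] = 'D' if s[i] == 'C' else 'C'
        moves.setdefault((s, tuple(t)), []).append((i,))

print(indirectly_dominates(('C', 'C'), ('D', 'D'), moves, prefers))  # False

# allow the grand coalition to move jointly from (D,D) to (C,C)
moves.setdefault((('D', 'D'), ('C', 'C')), []).append((0, 1))
print(indirectly_dominates(('C', 'C'), ('D', 'D'), moves, prefers))  # True
```

With unilateral moves alone no chain from (D,D) ends at (C,C) with every mover preferring the endpoint, but once the two players can move jointly, (C,C) indirectly dominates (D,D) in a single step.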
Definition of the Subject

The von Neumann–Morgenstern stable set (hereafter, stable set) is the first solution concept in cooperative game theory, defined by J. von Neumann and O. Morgenstern. Though it was defined for cooperative games in characteristic function form, von Neumann and Morgenstern gave a more general definition of a stable set in abstract games. Later, J. Greenberg and M. Chwe cleared a way to apply the stable set concept to the analysis of
non-cooperative games in strategic and extensive forms. Though the general existence of stable sets in characteristic function form games was disproved by a 10-person game presented by W.F. Lucas, stable sets exist in many important games. In voting games, for example, stable sets exist, and they indicate in detail what coalitions can be formed. The core, on the other hand, can be empty in voting games, even though it is one of the best-known solution concepts in cooperative game theory. The analysis of stable sets is not necessarily straightforward, since it can reveal a variety of possibilities; however, stable sets give us deep insights into players' behavior in economic, political and social situations, such as coalition formation among players.

Introduction

For studies of economic or social situations where players can behave cooperatively, the stable set was defined by von Neumann and Morgenstern [31] as a solution concept for characteristic function form cooperative games. They also defined the stable set in abstract games, so that the concept can be applied to more general games, including non-cooperative situations. Greenberg [9] and Chwe [3] cleared a way to apply the stable set concept to the analysis of non-cooperative games in strategic and extensive forms. The stable set is a set of outcomes satisfying two stability conditions: internal and external stability. Internal stability means that between any two outcomes in the set, there is no group of players such that all of its members prefer one to the other and can realize the preferred outcome. External stability means that for any outcome outside the set, there is a group of players such that all of its members have a commonly preferred outcome in the set and can realize it. Though general existence was disproved by Lucas [18] and Lucas and Rabie [21], the stable set has revealed much interesting behavior of players in economic, political, and social systems.
Von Neumann and Morgenstern (and also Greenberg) assumed only a single move by a group of players. Harsanyi [13] first pointed out that stable sets in characteristic function form games may fail to capture "farsighted" behavior of the players. Harsanyi's work inspired Chwe's [3] contribution to the formal study of foresight in social environments. Chwe paid attention to possible chains of moves, in which a move by one group of players brings about a new move by another group, which in turn causes a third group to move, and so on. The group moving first should then take into account the sequence of moves that may follow, and evaluate the payoffs obtained at the end. By incorporating such sequences of moves, Chwe [3] defined a more sophisticated stable set, which we call a farsighted stable set in what follows. Recent work by Suzuki and Muto [51,52] showed that the farsighted stable set provides more reasonable outcomes than the original (myopic) stable set in important classes of strategic form games. The rest of the article is organized as follows. Section "Stable Sets in Abstract Games" presents the definition of a stable set in abstract games. Section "Stable Set and Core" shows basic relations between the two solution concepts of stable set and core. Section "Stable Sets in Characteristic Function Form Games" gives the definition of a stable set in characteristic function form games. Section "Applications of Stable Sets in Abstract and Characteristic Function Form Games" first discusses general properties of stable sets in characteristic function form games, and then presents applications of stable sets in abstract and characteristic function form games to political and economic systems. Section "Stable Sets and Farsighted Stable Sets in Strategic Form Games" gives the definitions of a stable set and a farsighted stable set in strategic form games. Section "Applications of Farsighted Stable Sets in Strategic Form Games" discusses properties of farsighted stable sets and their applications to social and economic situations. Section "Future Directions" ends the article with remarks, and Section "Bibliography" offers a list of references.

Stable Sets in Abstract Games

An abstract game is a pair (W, ≻) of a set of outcomes W and an irreflexive binary relation ≻ on W, where irreflexivity means that x ≻ x holds for no element x ∈ W. The relation ≻ is interpreted as follows: if x ≻ y holds, then there exists a set of players who can induce x from y by themselves, and all of whom are better off at x. A subset K of W is called a stable set of the abstract game (W, ≻) if the following two conditions are satisfied:

1. Internal stability: for any two elements x, y ∈ K, x ≻ y never holds.
2. External stability: for any element z ∉ K, there exists x ∈ K such that x ≻ z.

We now explain in more detail what the internal and external stability conditions imply. Suppose the players share a common understanding that each outcome inside a stable set is "stable" and that each outcome outside the set is "unstable". Here "stability" means that no group of players has an incentive to deviate from the outcome, and "instability" means that there is at least one group of players that has an incentive to deviate from
it. Then the internal and external stability conditions guarantee that this common understanding is never disproved, and thus continues to prevail. In fact, suppose the set is both internally and externally stable, and pick any outcome in the set. By internal stability, no group of players can become better off by deviating from it to another outcome inside the set. Thus no group of players reaches an agreement to deviate, and each outcome inside the set remains stable. Deviating players might be better off inducing an outcome outside the set; but outcomes outside the set are commonly considered unstable, so deviating players can never expect such an outcome to last. Next pick any outcome outside the set. By external stability, there exists at least one group of players who can become better off by deviating from it and inducing an outcome inside the set. The induced outcome is considered stable, since it is in the set; hence this group of players will deviate, and each outcome outside the set remains unstable.

Stable Set and Core

Another widely known solution concept is the core. For a given abstract game G = (W, ≻), the core of G is the set C = {x ∈ W : there is no y ∈ W with y ≻ x}. From the definition, the core satisfies internal stability. Moreover, the core C of G is contained in every stable set of G, if one exists. To see this, suppose that C ⊄ K for a stable set K, with C ≠ ∅. (If C = ∅, then trivially C ⊆ K.) Pick any element x ∈ C \ K. Since x ∉ K, by external stability there exists y ∈ K with y ≻ x, which contradicts x ∈ C. When the core of a game also satisfies external stability, it has very strong stability and is called the stable core. The stable core is then the unique stable set of the game.
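On a finite abstract game both concepts can be computed by exhaustive search, which also illustrates the containment just proved. A minimal Python sketch (the three-outcome game and the function names are our own illustration):

```python
from itertools import combinations

def core(W, dom):
    """The core: outcomes dominated by nothing. dom is a set of pairs (x, y), x dominates y."""
    return {x for x in W if not any((y, x) in dom for y in W)}

def stable_sets(W, dom):
    """All von Neumann-Morgenstern stable sets of the finite abstract game (W, dom)."""
    W = list(W)
    result = []
    for r in range(len(W) + 1):
        for K in combinations(W, r):
            Ks = set(K)
            internal = all((x, y) not in dom for x in Ks for y in Ks)
            external = all(any((x, z) in dom for x in Ks) for z in W if z not in Ks)
            if internal and external:
                result.append(Ks)
    return result

# a dominates b, b dominates c: domination need not be transitive or complete
W = {'a', 'b', 'c'}
dom = {('a', 'b'), ('b', 'c')}
print(core(W, dom))          # {'a'}
print(stable_sets(W, dom))   # the unique stable set {'a', 'c'}
```

Note that here the core {'a'} is not externally stable (c is not dominated by a), so there is no stable core; the unique stable set must add the outcome c, and it indeed contains the core.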
Stable Sets in Characteristic Function Form Games

An n-person game in characteristic function form with transferable utility is a pair (N, v) of a player set N = {1, 2, …, n} and a characteristic function v on the set 2^N of all subsets of N such that v(∅) = 0. Each subset of N is called a coalition. The game (N, v) is often called a TU-game. A characteristic function form game without transferable utility is called an NTU-game: its characteristic function assigns to each coalition a set of payoff vectors for the players. For NTU-games and their stable sets, refer to Aumann and Peleg [1] and Peleg [35]. In this section we hereafter deal only with TU characteristic function form games, and refer to them simply as characteristic function form games.

Let (N, v) be a characteristic function form game. The characteristic function v assigns a real number v(S) to each coalition S ⊆ N. The value v(S) indicates the worth that coalition S can achieve by itself. An n-dimensional vector x = (x1, x2, …, xn) is called a payoff vector. A payoff vector x is called an imputation if the following two conditions are satisfied:
1. Group rationality: Σ_{i=1}^n x_i = v(N),
2. Individual rationality: x_i ≥ v({i}) for each i ∈ N.
The first condition says that all players cooperate and share the worth v(N) that they can jointly produce. The second condition says that each player must receive at least the amount that he/she can gain alone. Let A be the set of all imputations. Let x, y be any imputations and S be any coalition. We say that x dominates y via S, written x dom_S y, if the following two conditions are satisfied:
1. Coalitional rationality: x_i > y_i for each i ∈ S,
2. Effectivity: Σ_{i∈S} x_i ≤ v(S).
The first condition says that every member of coalition S strictly prefers x to y. The second condition says that coalition S can guarantee the payoff x_i to each member i ∈ S by itself. We say that x dominates y (denoted x dom y) if there exists at least one coalition S such that x dom_S y. It should be noted that the pair (A, dom) is an abstract game as defined in Sect. "Stable Sets in Abstract Games": "dom" is easily seen to be an irreflexive binary relation on A. A stable set and the core of the game (N, v) are defined to be a stable set and the core of the associated abstract game (A, dom), respectively. After von Neumann and Morgenstern introduced the stable set, its general existence was one of the most important open problems in game theory. The problem was eventually solved in the negative: Lucas [18] found the following 10-person characteristic function form game in which no stable set exists.
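The imputation and domination conditions translate directly into code. The sketch below is a hypothetical illustration (using the three-person simple majority game of Example 1 below), not part of the original text:

```python
from itertools import combinations

# Three-person simple majority game: any coalition of 2 or more players has worth 1.
N = (1, 2, 3)
v = {frozenset(S): (1 if len(S) >= 2 else 0)
     for r in range(len(N) + 1) for S in combinations(N, r)}

def is_imputation(x, N, v):
    # Group rationality and individual rationality.
    return (abs(sum(x) - v[frozenset(N)]) < 1e-9
            and all(x[i - 1] >= v[frozenset({i})] for i in N))

def dominates(x, y, N, v):
    """x dom y: x dom_S y for some coalition S (coalitional rationality + effectivity)."""
    for r in range(1, len(N) + 1):
        for S in combinations(N, r):
            if (all(x[i - 1] > y[i - 1] for i in S)
                    and sum(x[i - 1] for i in S) <= v[frozenset(S)]):
                return True
    return False
```

Here (1/2, 1/2, 0) dominates the equal split (1/3, 1/3, 1/3) via S = {1, 2}, while no coalition can profitably enforce the reverse domination.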
A game with no stable set: Consider the following 10-person game:

N = {1, 2, …, 10}, v(N) = 5,
v({1, 3, 5, 7, 9}) = 4,
v({3, 5, 7, 9}) = v({1, 5, 7, 9}) = v({1, 3, 7, 9}) = 3,
v({1, 2}) = v({3, 4}) = v({5, 6}) = v({7, 8}) = v({9, 10}) = 1,
v({3, 5, 7}) = v({1, 5, 7}) = v({1, 3, 7}) = v({3, 5, 9}) = v({1, 5, 9}) = v({1, 3, 9}) = 2,


v({1, 4, 7, 9}) = v({3, 6, 7, 9}) = v({5, 2, 7, 9}) = 2, and v(S) = 0 for all other S ⊆ N.

Though this game has no stable set, it has a nonempty core. A game with no stable set and an empty core was also found by Lucas and Rabie [21]. We remark on a class of games in which a stable core exists. As mentioned before, if a stable set exists it always contains the core; this of course also holds in characteristic function form games. Furthermore, among characteristic function form games there is an interesting class, the convex games, in which the core satisfies external stability, i.e., the core is a stable core. A characteristic function form game (N, v) is a convex game if for any S, T ⊆ N with S ⊆ T and any i ∉ T, v(S ∪ {i}) − v(S) ≤ v(T ∪ {i}) − v(T); that is, the bigger the coalition a player joins, the larger the player's contribution becomes. In convex games the core is large and satisfies external stability. For details, refer to Shapley [45]. Though general existence fails, the stable set provides very useful insights into many economic, political, and social issues. In the following, we present some stable set analyses applied to such issues.

Applications of Stable Sets in Abstract and Characteristic Function Form Games

Symmetric Voting Games

This section deals with applications of stable sets to voting situations. Let us start with a simple example.

Example 1 Suppose there is a committee consisting of three players 1, 2, and 3. Each player has one vote, and decisions are made by simple majority rule; that is, at least two votes are necessary to pass a bill. Before analyzing the players' behavior, we first formulate the situation as a characteristic function form game. Let the player set be N = {1, 2, 3}. Since a coalition of a simple majority of players can pass any bill, we give value 1 to such coalitions. Other coalitions can pass no bill; we thus give them value 0.
Hence the characteristic function is given by

v(S) = 1 if |S| ≥ 2, and v(S) = 0 if |S| ≤ 1,

where |S| denotes the number of players in coalition S. The set of imputations is given by

A = {x = (x1, x2, x3) | x1 + x2 + x3 = 1; x1, x2, x3 ≥ 0}.

One stable set of this game is the set K consisting of the three imputations (1/2, 1/2, 0), (1/2, 0, 1/2), (0, 1/2, 1/2). A brief proof is the following. Since each of the three imputations contains only the two numbers 1/2 and 0, internal stability is trivial. To show external stability, take any imputation x = (x1, x2, x3) outside K. Suppose first x1 < 1/2. Since x ∉ K, at least one of x2 and x3 is less than 1/2; assume x2 < 1/2. Then (1/2, 1/2, 0) dominates x via coalition {1, 2}. Next suppose x1 = 1/2. Since x ∉ K, 0 < x2, x3 < 1/2, and thus (0, 1/2, 1/2) dominates x via coalition {2, 3}. Finally suppose x1 > 1/2. Then x2, x3 < 1/2, and again (0, 1/2, 1/2) dominates x via coalition {2, 3}. This completes the proof of external stability. This three-point stable set indicates that a two-person coalition is formed, and that the players in the coalition share equally the outcome obtained by passing a bill.

This game has three further types of stable sets. First, any set K_c^1 = {x ∈ A | x1 = c} with 0 ≤ c < 1/2 is a stable set. The internal stability of each K_c^1 is trivial. To show external stability, take any imputation x = (x1, x2, x3) ∉ K_c^1. Suppose x1 > c. Define y by y1 = c, y2 = x2 + (x1 − c)/2, y3 = x3 + (x1 − c)/2. Then y ∈ K_c^1 and y dom_{2,3} x. Next suppose x1 < c. Note that at least one of x2 and x3 is less than 1 − c, since c < 1/2; suppose without loss of generality x2 < 1 − c. Since c < 1/2, we have (c, 1 − c, 0) ∈ K_c^1 and (c, 1 − c, 0) dom_{1,2} x. Thus external stability holds. This stable set indicates that player 1 gets a fixed amount c while players 2 and 3 negotiate over how to allocate the remainder 1 − c. Similarly, the sets K_c^2 = {x ∈ A | x2 = c} and K_c^3 = {x ∈ A | x3 = c} with 0 ≤ c < 1/2 are stable sets. The three-person game of Example 1 has no other stable set; see von Neumann and Morgenstern [31]. The former stable set is called a symmetric (or objective) stable set, while the latter types are called discriminatory stable sets.

As a generalization of the above result, symmetric stable sets are found in general n-person simple majority voting games. An n-person characteristic function form game (N, v) with N = {1, 2, …, n} is called a simple majority voting game if

v(S) = 1 if |S| > n/2, and v(S) = 0 if |S| ≤ n/2.

A coalition S with v(S) = 1, i.e., with |S| > n/2, is called a winning coalition. A winning coalition including no smaller winning coalition is called a minimal winning coalition. In simple majority voting games, a minimal winning coalition is a coalition of (n + 1)/2 players if n is odd, or (n + 2)/2 players if n is even. The following theorem holds; see Bott [2] for the proof.

Theorem 1 Let (N, v) be a simple majority voting game. Then the following hold.
(1) If n is odd, then the set

K = ⟨(2/(n+1), …, 2/(n+1), 0, …, 0)⟩,

with (n+1)/2 components equal to 2/(n+1) and (n−1)/2 components equal to 0, is a stable set, where the symbol ⟨x⟩ denotes the set of all imputations obtained from x through permutations of its components.
(2) If n is even, the set

K = ⟨{x ∈ A | x1 = … = x_{n/2} ≥ x_{(n/2)+1} = … = x_n}⟩

is a stable set, where

A = {x = (x1, …, xn) | Σ_{i=1}^n x_i = 1; x1, …, xn ≥ 0}

and ⟨Y⟩ = ∪_{x∈Y} ⟨x⟩.
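Part (1) of Theorem 1 can be spot-checked numerically. The sketch below (for n = 5, an assumption of this sketch) verifies external stability on random imputations, using the fact that domination by an element of K only requires (n+1)/2 players each receiving less than 2/(n+1):

```python
import random

n = 5
share = 2 / (n + 1)    # payoff to each member of a minimal winning coalition
random.seed(0)

def dominated_by_K(x):
    """x is dominated by some permutation in K iff at least (n+1)/2 players
    each receive less than 2/(n+1): that permutation pays them `share` each
    (coalitional rationality), and their coalition S is winning with
    sum over S equal to 1 = v(S) (effectivity)."""
    return sum(1 for xi in x if xi < share) >= (n + 1) // 2

for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    x = [r / sum(raw) for r in raw]     # a random imputation (sums to 1)
    assert dominated_by_K(x)            # external stability on this sample
```

A sampled imputation fails the test only if (n+1)/2 coordinates are at least 2/(n+1), which forces x to lie in K itself — an event of probability zero under continuous sampling.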

It should be noted from (1) of Theorem 1 that when the number of players is odd, a minimal winning coalition is formed, and its members share equally the total profit. On the other hand, when the number of players is even, (2) of Theorem 1 shows that every player may gain a positive profit. This implies that the grand coalition of all players is formed. In negotiating over how to share the profit, two coalitions, each with n/2 players, are formed, and profits are shared equally within each coalition. Since at least n/2 + 1 players are necessary to win when n is even, an n/2-player coalition is the smallest coalition that can prevent its complement from winning. Such a coalition is called a minimal blocking coalition. When n is odd, an (n+1)/2-player minimal winning coalition is also a minimal blocking coalition.

General Voting Games

In this section, we present properties of stable sets and cores in general (not necessarily symmetric) voting games. A characteristic function form game (N, v) is called a simple game if v(S) = 1 or 0 for each nonempty coalition S ⊆ N. A coalition S with v(S) = 1 (resp. v(S) = 0) is a winning coalition (resp. losing coalition). A simple game is called a voting game if it satisfies (1) v(N) = 1, (2) if S ⊆ T, then v(S) ≤ v(T), and (3) if S is winning, then N − S is losing. The first condition implies that the grand coalition N is always winning. The second condition says that a superset of a winning coalition is also winning. The third condition says that there are no two disjoint winning coalitions. It is easily shown that the simple majority voting game studied in the previous section satisfies these conditions. A player has a veto if he/she belongs to every winning coalition. As for cores of voting games, the following theorem holds.

Theorem 2 Let (N, v) be a voting game. Then the core of (N, v) is nonempty if and only if there exists a player with a veto.

Thus the core is not a useful tool for analyzing voting situations with no veto player. In simple majority voting games, no player has a veto, and thus the core is empty. The following theorem shows that stable sets always exist.

Theorem 3 Let (N, v) be a voting game. Let S be a minimal winning coalition and define a set K by

K = {x ∈ A | Σ_{i∈S} x_i = 1; x_i = 0 for all i ∉ S}.

Then K is a stable set.

Thus in voting games, a minimal winning coalition is always formed, and its members gain all the profit. For the proofs of these theorems, see Owen [34]. Further results on stable sets in voting games are found in Bott [2], Griesmer [12], Heijmanns [16], Lucas et al. [20], Muto [26,28], Owen [32], Rosenmüller [36], Shapley [43,44].

Production Market Games

Let us start with a simple example.

Example 2 There are four players, each having one unit of a raw material. Two units of the raw material are necessary for producing one unit of an indivisible commodity. One unit of the commodity is sold at p dollars. The situation is formulated as the following characteristic function form game. The player set is N = {1, 2, 3, 4}. Since two units of the raw material are necessary to produce one unit of the commodity, the characteristic function v is given by

v(S) = 2p if |S| = 4; v(S) = p if |S| = 3 or 2; v(S) = 0 if |S| = 1 or 0.

The set of imputations is

A = {x = (x1, x2, x3, x4) | x1 + x2 + x3 + x4 = 2p; x1, x2, x3, x4 ≥ 0}.

The following set K is one of the stable sets of the game:

K = ⟨{x = (x1, x2, x3, x4) ∈ A | x1 = x2 = x3 ≥ x4}⟩.
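Before the formal argument, the stability claim for K can be spot-checked numerically. The sketch below samples imputations outside K and verifies that each is dominated via coalition {3, 4} by an element of K; the particular dominating imputation used (built from the shift a = (z1 + z2 − 2z3)/4) is an assumption of this sketch:

```python
import random

p = 10.0               # hypothetical commodity price
random.seed(1)

for _ in range(1000):
    raw = sorted((random.random() for _ in range(4)), reverse=True)
    total = sum(raw)
    z = [2 * p * r / total for r in raw]      # random imputation, z1 >= ... >= z4
    if z[0] - z[2] < 1e-9:                    # z (essentially) lies in K: skip
        continue
    a = (z[0] + z[1] - 2 * z[2]) / 4          # shift making y an imputation in K
    y = [z[2] + a, z[2] + a, z[2] + a, z[3] + a]
    assert abs(sum(y) - 2 * p) < 1e-6         # y is an imputation
    assert y[0] == y[1] == y[2] >= y[3] >= 0  # y lies in K
    assert y[2] > z[2] and y[3] > z[3]        # coalitional rationality for {3, 4}
    assert y[2] + y[3] <= p + 1e-9            # effectivity: v({3, 4}) = p
```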


To show internal stability, take two imputations x = (x1, x2, x3, x4) with x1 = x2 = x3 ≥ x4 and y = (y1, y2, y3, y4) in K, and suppose x dominates y. Since x1 = x2 = x3 ≥ p/2 ≥ x4, the domination must hold via a coalition {i, 4} with i = 1, 2, 3. Then, since y ∈ K implies that the largest three elements of y are equal, we obtain the contradiction 2p = Σ_{i=1}^4 x_i > Σ_{i=1}^4 y_i = 2p. To show external stability, take z = (z1, z2, z3, z4) ∉ K, and suppose z1 ≥ z2 ≥ z3 ≥ z4. Then z1 > z3. Define y = (y1, y2, y3, y4) by

y_i = z3 + (z1 + z2 − 2z3)/4 for i = 1, 2, 3, and y4 = z4 + (z1 + z2 − 2z3)/4.

Then y ∈ K and y dom_{3,4} z, since y3 > z3, y4 > z4, and y3 + y4 ≤ p = v({3, 4}). This stable set shows that, in negotiating over how to share the profit of 2p dollars, three players form a coalition and share equally the gain obtained through collaboration. At least two players are necessary to produce the commodity; thus a three-player coalition is the smallest coalition that can prevent its complement from producing the commodity, i.e., a minimal blocking coalition. We would claim that in the market a minimal blocking coalition is formed and that profits are shared equally within the coalition.

An extension of the model was given by Hart [14] and Muto [27]. Hart considered the following production market with n players, each holding one unit of a raw material. To produce one unit of an indivisible commodity, k units of raw materials are necessary. The associated production market game is defined by the player set N = {1, 2, …, n} and the characteristic function v(S) = p · ⌊|S|/k⌋, where ⌊|S|/k⌋ is the number of units of the commodity that coalition S can produce.

Next consider a market in which player 1 holds one unit of a raw material P while players 2 and 3 each hold one unit of another raw material Q, and one unit of each of P and Q is necessary to produce one unit of an indivisible commodity, sold at p dollars. The associated game has player set N = {1, 2, 3} and characteristic function v({1, 2}) = v({1, 3}) = v(N) = p and v(S) = 0 for all other S ⊆ N; its imputation set is A = {x = (x1, x2, x3) | x1 + x2 + x3 = p; x1, x2, x3 ≥ 0}. One stable set of this game is K = {x ∈ A | x2 = x3}. To show internal stability, take x, y ∈ K and suppose x dominates y. The domination can hold only via {1, 2} or {1, 3}. If it holds via {1, 2}, then x1 > y1 and x2 > y2 hold. Thus, since x2 = x3 and y2 = y3, we have the contradiction p = Σ_{i=1}^3 x_i > Σ_{i=1}^3 y_i = p. The domination via {1, 3} leads to the same contradiction. To show external stability, take any imputation z = (z1, z2, z3) ∉ K. Then z2 ≠ z3; without loss of generality let z2 < z3. Define y = (y1, y2, y3) by

y1 = z1 + (z3 − z2)/3, y2 = z2 + (z3 − z2)/3, y3 = z2 + (z3 − z2)/3.

Then y ∈ K and y dom_{1,2} z, since y1 > z1, y2 > z2, and y1 + y2 < p = v({1, 2}). This stable set shows that, in negotiating over how to share the profit of p dollars, players 2 and 3 form a coalition against player 1 and share equally the gain obtained through collaboration. There exist other stable sets in which players 2 and 3 collaborate but do not share the profit equally.
More precisely, the following set

K = {x = (x1, x2, x3) ∈ A | x2 and x3 move towards the same direction}

is a stable set, where "move towards the same direction" means that if x2 increases then x3 increases, and if x2 decreases then x3 decreases. A generalization of the results above is given by the following theorem due to Shapley [42]. Shapley's original theorem is more complicated and holds in more general markets.

Theorem 6 Suppose there are m players, 1, …, m, each holding one unit of raw material P, and n players, m+1, …, m+n, each holding one unit of raw material Q. To produce one unit of an indivisible commodity, one unit of each of the raw materials P and Q is necessary. One unit of the commodity is sold at p dollars. In this market, the following set K is a stable set:

K = {x = (x1, x2, …, x_{m+n}) ∈ A | x1 = … = x_m, x_{m+1} = … = x_{m+n}},

where

A = {x = (x1, …, x_m, x_{m+1}, …, x_{m+n}) | Σ_{i=1}^{m+n} x_i = p · min(m, n); x1, …, x_{m+n} ≥ 0}

is the set of imputations of this game.

This theorem shows that players holding the same raw material form a coalition and share equally the profit gained through collaboration. For further results on stable sets in production market games, refer to Hart [14], Muto [27], Owen [34]. Refer also to Lucas [19], Owen [33], Shapley [41] for further general studies on stable sets.

Assignment Games

The following two sections deal with markets in which indivisible commodities are traded between sellers and buyers, or bartered among agents. The first market is the assignment market, originally introduced by Shapley and Shubik [47]. An assignment market consists of a set of n (≥ 1) buyers B = {1, …, n} and a set of n sellers F = {1′, …, n′}. Each seller k′ ∈ F is endowed with one indivisible commodity to sell, which is called object k′. Thus F also denotes the set of n objects in the market. The objects are differentiated. Each buyer i ∈ B wants to buy at most one of the objects, and places a nonnegative monetary valuation u_{ik′} (≥ 0) on each object k′ ∈ F. The matrix U = (u_{ik′})_{(i,k′)∈B×F} is called the valuation matrix. The sellers place no valuation on the objects. An assignment market is denoted by M(B, F; U). We remark that an assignment market with |B| ≠ |F| can be transformed into one with |B| = |F| by adding dummy buyers resp. sellers, and correspondingly zero rows resp. columns to the valuation matrix U. For each coalition S ⊆ B ∪ F with S ∩ B ≠ ∅ and S ∩ F ≠ ∅, we define the assignment problem P(S) as follows:

P(S): m(S) = max_x Σ_{(i,k′)∈(S∩B)×(S∩F)} u_{ik′} x_{ik′}
      s.t. Σ_{k′∈S∩F} x_{ik′} ≤ 1 for all i ∈ S ∩ B,
           Σ_{i∈S∩B} x_{ik′} ≤ 1 for all k′ ∈ S ∩ F,
           x_{ik′} ≥ 0 for all (i, k′) ∈ (S ∩ B) × (S ∩ F).

The assignment problem P(S) has at least one optimal integer solution (see Simonnard [49]), which gives an optimal matching between the sellers and buyers in S that yields the highest possible surplus in S. Without loss of generality, we assume that the rows and columns of the valuation matrix U are arranged so that the diagonal assignment x* with x*_{ii′} = 1, i = 1, …, n, is one of the optimal solutions to P(B ∪ F). For a given assignment market M(B, F; U), we define the associated assignment game G to be the characteristic function form game (B ∪ F, v). The player set of G is B ∪ F. The characteristic function v is defined as follows: v(S) = m(S) for each S ⊆ B ∪ F with S ∩ B ≠ ∅ and S ∩ F ≠ ∅. Coalitions consisting only of sellers or only of buyers cannot produce any surplus from trade; thus v(S) = 0 for each S with S ⊆ B, S ⊆ F, or S = ∅. The imputation set of G is

A = {(w, p) ∈ ℝ^B × ℝ^F | Σ_{i∈B} w_i + Σ_{k′∈F} p_{k′} = m(B ∪ F); w_i ≥ 0, p_{k′} ≥ 0},

where w_i denotes buyer i's payoff and p_{k′} the price of object k′.

For strategy combinations x and y of a strategic form game G = (N, {X_i}_{i∈N}, {u_i}_{i∈N}), we say that x indirectly dominates y (denoted x ≫ y) if there exist a sequence of strategy combinations x^0 = y, x^1, …, x^p = x and a sequence of coalitions S^1, …, S^p such that, for each j = 1, …, p, x^j is induced from x^{j−1} by a deviation of coalition S^j and u_i(x) > u_i(x^{j−1}) for each i ∈ S^j. We sometimes say that x indirectly dominates y starting with S^1 (denoted x ≫_{S^1} y) to specify the set of players which deviates first from y. We hereupon remark that in the definition of indirect domination we implicitly assume that joint moves by groups of players are neither once-and-for-all nor binding, i.e., some players in a deviating group may later make another move with players within or even outside the group. It should be noted that the indirect domination defined above is borrowed from Chwe [3]; though Harsanyi [13] first proposed the notion of indirect domination, his definition was given for characteristic function form games. When p = 1 in the definition of indirect domination, we simply say that x directly dominates y, denoted x ≻_d y. When we want to specify a deviating coalition, we say that x directly dominates y via coalition S, denoted x ≻_{d,S} y. This direct domination in strategic form games was defined by Greenberg [9]. Let the pairs (X, ≫) and (X, ≻_d) be the abstract games associated with the game G. A farsighted stable set of G is defined to be a stable set of the abstract game (X, ≫) with indirect domination.
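Returning to the assignment problem P(S) defined above: for small markets it can be solved by brute force over all matchings. The sketch below uses a hypothetical 3 × 3 valuation matrix (not one from the text):

```python
from itertools import permutations

# With |B| = |F| = n, an optimal solution of P(B ∪ F) can be taken to be a
# matching, i.e., a permutation assigning each buyer at most one object.
U = [[5, 8, 2],     # hypothetical valuation matrix u_{ik'}
     [7, 9, 6],
     [2, 3, 0]]

def m(valuation):
    """m(B ∪ F): the highest total surplus over all buyer-object matchings."""
    n = len(valuation)
    return max(sum(valuation[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))

best = m(U)   # here the optimum matches buyers 1, 2, 3 with objects 2', 3', 1'
```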
A stable set of the abstract game (X, ≻_d) with direct domination is simply called a stable set of G.

Applications of Farsighted Stable Sets in Strategic Form Games

The existence of farsighted stable sets in general strategic form games remains an open question. Nevertheless, it has turned out through

applications that farsighted stable sets give much sharper insights into players' behavior in economic, political, and social situations than (myopic) stable sets with direct domination. In what follows, we show analyses of farsighted stable sets in the prisoner's dilemma and in two types of duopoly market games in strategic form.

Prisoner's Dilemma

To make the discussion as clear as possible, we focus on the particular version of the prisoner's dilemma shown below. Similar results hold in general prisoner's dilemma games.

Prisoner's Dilemma:

                         Player 2
                      Cooperate   Defect
Player 1  Cooperate     4, 4       0, 5
          Defect        5, 0       1, 1
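The direct and indirect domination relations of this 2 × 2 game can be enumerated exhaustively. The sketch below computes both (indirect domination after Chwe, implemented here as a least fixed point — an implementation choice of this sketch) and then searches for stable sets by brute force:

```python
from itertools import product, combinations

# Outcomes of the pure-strategy prisoner's dilemma above (players indexed 0, 1).
C, D = "C", "D"
X = [(C, C), (C, D), (D, C), (D, D)]
u = {(C, C): (4, 4), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
coalitions = [(0,), (1,), (0, 1)]

def moves(y, S):
    """Outcomes coalition S can induce from y by changing only its own strategies."""
    return [x for x in X if all(x[i] == y[i] for i in range(2) if i not in S)]

# Direct domination: S induces x from y and every member strictly gains.
direct = {(x, y) for y in X for S in coalitions for x in moves(y, S)
          if x != y and all(u[x][i] > u[y][i] for i in S)}

# Indirect domination: x >> y iff some S moves y to z with u_i(x) > u_i(y)
# for all i in S, where z = x or x >> z.  Iterate to a fixed point.
indirect = set(direct)
changed = True
while changed:
    changed = False
    for y, S in product(X, coalitions):
        for z in moves(y, S):
            for x in X:
                if ((x, y) not in indirect and x != y
                        and (z == x or (x, z) in indirect)
                        and all(u[x][i] > u[y][i] for i in S)):
                    indirect.add((x, y))
                    changed = True

def stable_sets(dom):
    """All stable sets of the abstract game (X, dom), by brute force."""
    found = []
    for r in range(1, len(X) + 1):
        for K in map(set, combinations(X, r)):
            internal = not any((a, b) in dom for a in K for b in K)
            external = all(any((a, z) in dom for a in K) for z in set(X) - K)
            if internal and external:
                found.append(K)
    return found

# No stable set exists under direct domination, while {(C, C)} turns out to
# be the unique farsighted stable set.
```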

We first present a farsighted stable set derived when the two players use only pure strategies. In this case, the set of strategy combinations is X = {(Cooperate, Cooperate), (Cooperate, Defect), (Defect, Cooperate), (Defect, Defect)}, where in each combination the former (resp. the latter) is player 1's (resp. 2's) strategy. The direct domination relation of this game is summarized in Fig. 1. From Fig. 1, no stable set (with direct domination) exists in the prisoner's dilemma. However, since (Cooperate, Cooperate) indirectly dominates each of (Cooperate, Defect), (Defect, Cooperate), and (Defect, Defect), and no other set satisfies both stability conditions, the singleton {(Cooperate, Cooperate)} is the unique farsighted stable set with respect to ≫. Hence, if the two players are farsighted and make joint but non-binding moves, the farsighted stable set shows that cooperation of the players results in the unique stable outcome. We now study stable outcomes in the mixed extension of the prisoner's dilemma, i.e., the prisoner's dilemma with mixed strategies. Let X1 = X2 = [0, 1] be the sets of mixed strategies of players 1 and 2, respectively, and let t1 ∈ X1 (resp. t2 ∈ X2) denote the probability that

Cooperative Games (Von Neumann–Morgenstern Stable Sets), Figure 1
Direct dominations in the prisoner's dilemma. Here → denotes a direct domination: "(Cooperate, Cooperate) →_2 (Cooperate, Defect)" means (Cooperate, Defect) ≻_{d,{2}} (Cooperate, Cooperate), and "(Cooperate, Cooperate) →_{1,2} (Defect, Defect)" means (Cooperate, Cooperate) ≻_{d,{1,2}} (Defect, Defect).


player 1 (resp. 2) plays "Cooperate". It is easily seen that the minimax payoffs to players 1 and 2 are both 1 in this game. We call a strategy combination that gives both players at least (resp. more than) their minimax payoffs an individually rational (resp. strictly individually rational) strategy combination. We then have the following theorem.

Theorem 10 Let

T = {(t1, t2) | 1/4 < t1 ≤ 1, t2 = 1} ∪ {(t1, t2) | t1 = 1, 1/4 < t2 ≤ 1},

and define the singleton K1(t1, t2) = {(t1, t2)} for each (t1, t2) ∈ T. Let K2 = {(0, 0), (1, 1/4)} and K3 = {(0, 0), (1/4, 1)}. Then the sets K2, K3, and any K1(t1, t2) with (t1, t2) ∈ T are farsighted stable sets of the mixed extension of the prisoner's dilemma. There are no other types of farsighted stable sets in the mixed extension of the prisoner's dilemma.

This theorem shows that if the two players are farsighted and make joint but non-binding moves in the prisoner's dilemma, then essentially a single Pareto efficient and strictly individually rational strategy combination results as a stable outcome, i.e., K1(t1, t2). We have, however, two exceptional cases, K2 and K3, in which (Defect, Defect) can be stable together with one Pareto efficient point at which one player gains the same payoff as in (Defect, Defect).

n-Person Prisoner's Dilemma

We consider an n-person prisoner's dilemma. Let N = {1, …, n} be the player set. Each player i has two strategies: C (Cooperate) and D (Defect). Let X_i = {C, D} for each i ∈ N. Hereafter we refer to a strategy combination as a state. The set of states is X = ∏_{i∈N} X_i. For each coalition S ⊆ N, let X_S = ∏_{i∈S} X_i and X_{−S} = ∏_{i∈S^c} X_i, where S^c denotes the complement of S with respect to N. Let x_S and x_{−S} denote generic elements of X_S and X_{−S}, respectively. Player i's payoff depends not only on his/her own strategy but also on the number of other cooperators. Player i's payoff function u_i : X → ℝ is given by u_i(x) = f_i(x_i, h), where x ∈ X, x_i ∈ X_i (player i's choice in x), and h is the number of players other than i playing C. We call the strategic form game thus defined an n-person prisoner's dilemma game. To simplify the discussion, we assume that all players are homogeneous and each player has an identical payoff function; that is, the f_i's are identical, and are simply written as f unless confusion arises. We assume the following properties of the function f.

Assumption 1
(1) f(D, h) > f(C, h) for all h = 0, 1, …, n − 1;
(2) f(C, n − 1) > f(D, 0);
(3) f(C, h) and f(D, h) are strictly increasing in h.

Property (1) means that every player prefers playing D to playing C regardless of which strategies the other players play. Property (2) means that if all players play C, then each of them gains a payoff higher than the one in (D, …, D). Property (3) means that if the number of cooperators increases, every player becomes better off regardless of which strategy he/she plays. It follows from Property (1) that (D, …, D) is the unique Nash equilibrium of the game. Here, for x, y ∈ X, we say that y is Pareto superior to x if u_i(y) ≥ u_i(x) for all i ∈ N and u_i(y) > u_i(x) for some i ∈ N. A state x ∈ X is said to be Pareto efficient if there is no y ∈ X that is Pareto superior to x. By Property (2), (C, …, C) is Pareto superior to (D, …, D); together with Property (3), (C, …, C) is Pareto efficient. Given a state x, we say that x is individually rational if u_i(x) ≥ min_{x_{−i}∈X_{−i}} max_{y_i∈X_i} u_i(y_i, x_{−i}) for all i ∈ N; if the strict inequality holds, we say that x is strictly individually rational. It follows from (1) and (3) of Assumption 1 that min_{x_{−i}} max_{y_i} u_i(y_i, x_{−i}) = f(D, 0). The following theorem shows that any strictly individually rational and Pareto efficient state is itself a singleton farsighted stable set; that is, any strictly individually rational and Pareto efficient outcome is stable if the players are farsighted. Refer to Suzuki and Muto [51,52] for the details.

Theorem 11 For the n-person prisoner's dilemma game, if x is a strictly individually rational and Pareto efficient state, then {x} is a farsighted stable set.

Duopoly Market Games

We consider two types of duopoly: Cournot quantity-setting duopoly and Bertrand price-setting duopoly. To simplify the discussion, we consider a simple duopoly model in which the firms' cost functions and the market demand function are both linear.
Similar results, however, hold in more general duopoly models. There are two firms 1 and 2, each producing a homogeneous good with the same marginal cost c > 0. No fixed cost is assumed. (1) Cournot duopoly: The firms' strategic variables are their production levels. Let x1 and x2 be the production levels of firms 1 and 2, respectively. The market price p(x1, x2)


for x1 and x2 is given by

p(x1, x2) = max(a − (x1 + x2), 0), where a > c.

We restrict the domain of production of both firms to 0 ≤ x_i ≤ a − c, i = 1, 2. This is reasonable, since a firm would not overproduce and make a nonpositive profit. When x1 and x2 are produced, firm i's profit is given by

π_i(x1, x2) = (p(x1, x2) − c) x_i.

Thus Cournot duopoly is formulated as the following strategic form game:

G^C = (N, {X_i}_{i=1,2}, {π_i}_{i=1,2}),

where the player set is N = {1, 2}, each player's strategy set is the closed interval X1 = X2 = [0, a − c], and the payoff functions are π_i, i = 1, 2. Let X = X1 × X2. The joint profit of the two firms is maximized when x1 + x2 = (a − c)/2.

(2) Bertrand duopoly: The firms' strategic variables are their price levels. Let D(p) = max(a − p, 0) be the market demand at price p. Then the total profit at p is

Π(p) = (p − c) D(p).

We restrict the domain of the price level p of both firms to c ≤ p ≤ a. This assumption is also reasonable, since a firm would avoid a negative profit. The total profit Π(p) is maximized at p = (a + c)/2, which is called the monopoly price. Let p1 and p2 be the prices of firms 1 and 2, respectively. We assume that if the firms' prices are equal, then they share the total profit equally; otherwise all sales go to the lower-pricing firm of the two. Thus firm i's profit is given by

π_i(p1, p2) = Π(p_i) if p_i < p_j; Π(p_i)/2 if p_i = p_j; 0 if p_i > p_j, for i, j = 1, 2, i ≠ j.

Hence Bertrand duopoly is formulated as the strategic form game

G^B = (N, {Y_i}_{i=1,2}, {π_i}_{i=1,2}),

where N = {1, 2}, Y1 = Y2 = [c, a], and π_i (i = 1, 2) is firm i's payoff function. Let Y = Y1 × Y2.
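As a quick numerical check of this setup (with hypothetical parameter values a = 10, c = 1 — any a > c > 0 would do), the sketch below verifies on a grid that x1 = x2 = (a − c)/3, the textbook Cournot equilibrium, is a mutual best response, and that joint profit is maximized when x1 + x2 = (a − c)/2:

```python
a, c = 10.0, 1.0

def price(x1, x2):
    return max(a - (x1 + x2), 0.0)

def profit(i, x1, x2):
    return (price(x1, x2) - c) * (x1 if i == 1 else x2)

grid = [k * (a - c) / 200 for k in range(201)]   # production levels in [0, a - c]
xe = (a - c) / 3                                 # Cournot-Nash output

# No profitable unilateral deviation from (xe, xe).
assert all(profit(1, x1, xe) <= profit(1, xe, xe) + 1e-9 for x1 in grid)

# Joint profit is maximal on the line x1 + x2 = (a - c)/2.
joint = lambda x1, x2: profit(1, x1, x2) + profit(2, x1, x2)
best = max(joint(x1, x2) for x1 in grid for x2 in grid)
assert abs(best - joint((a - c) / 4, (a - c) / 4)) < 1e-6
```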

It is well known that the Nash equilibrium is uniquely determined in either market: x1 = x2 = (a − c)/3 in the Cournot market, and p1 = p2 = c in the Bertrand market. The following theorem holds for the farsighted stable sets in Cournot duopoly.

Theorem 12 Let (x1, x2) ∈ X be any strategy pair with x1 + x2 = (a − c)/2. Then the singleton {(x1, x2)} is a farsighted stable set. Furthermore, every farsighted stable set is of the form {(x1, x2)} with x1 + x2 = (a − c)/2 and x1, x2 ≥ 0.

As mentioned before, any strategy pair (x1, x2) with x1 + x2 = (a − c)/2 and x1, x2 ≥ 0 maximizes the two firms' joint profit. This suggests that von Neumann–Morgenstern stability together with the firms' farsighted behavior produces joint profit maximization even if the firms' collaboration is not binding. As for Bertrand duopoly, we have the following theorem, which claims that the pair of monopoly prices is itself a farsighted stable set, and that no other farsighted stable set exists. Therefore von Neumann–Morgenstern stability together with the firms' farsighted behavior attains efficiency (from the standpoint of the firms) also in Bertrand duopoly. Refer to Suzuki and Muto [53] for the details.

Theorem 13 Let p = (p1, p2) be the pair of monopoly prices, i.e., p1 = p2 = (a + c)/2. Then the singleton {p} is the unique farsighted stable set.

For studies of stable sets with direct domination in duopoly market games, refer to Muto and Okada [29,30]. Properties of stable sets and of Harsanyi's original farsighted stable sets in pure exchange economies are investigated by Greenberg et al. [10]. For further studies on stable sets and farsighted stable sets in strategic form games, refer to Kaneko [17], Mariotti [24], Xue [55,56], and Diamantoudi and Xue [4].

Future Directions

In this paper, we have reviewed applications of von Neumann–Morgenstern stable sets in abstract games, characteristic function form games, and strategic form games to economic, political, and social systems.
Stable sets give us insights into coalition formation among players in the systems in question. Farsighted stable sets, especially applied to some economic systems, show that players’ farsighted behavior leads to Pareto efficient outcomes even though their collaboration is not binding. The stable set analysis is also applicable to games with infinitely many players. Those analyses show us new approaches to large economic and social systems with infinitely many players. For the details, refer to Hart [15],


Einy et al. [7], Einy and Shitovitz [6], Greenberg et al. [11], Shitovitz and Weber [48], and Rosenmüller and Shitovitz [37]. There is also a study of the linkage between common knowledge of Bayesian rationality and the attainment of stable sets in generalized abstract games; refer to Luo [22,23] for the details. Analyses of social systems applying the concepts of stable sets and farsighted stable sets should further advance theoretical studies of games in which players inherently exhibit both cooperative and non-cooperative behavior. Such studies will in turn have an impact on developments in economics, politics, sociology, and many applied social sciences.

Bibliography

Primary Literature

1. Aumann R, Peleg B (1960) Von Neumann–Morgenstern solutions to cooperative games without side payments. Bull Am Math Soc 66:173–179
2. Bott R (1953) Symmetric solutions to majority games. In: Kuhn HW, Tucker AW (eds) Contribution to the theory of games, vol II. Annals of Mathematics Studies, vol 28. Princeton University Press, Princeton, pp 319–323
3. Chwe MS-Y (1994) Farsighted coalitional stability. J Econ Theory 63:299–325
4. Diamantoudi E, Xue L (2003) Farsighted stability in hedonic games. Soc Choice Welf 21:39–61
5. Ehlers L (2007) Von Neumann–Morgenstern stable sets in matching problems. J Econ Theory 134:537–547
6. Einy E, Shitovitz B (1996) Convex games and stable sets. Games Econ Behav 16:192–201
7. Einy E, Holzman R, Monderer D, Shitovitz B (1996) Core and stable sets of large games arising in economics. J Econ Theory 68:200–211
8. Gale D, Shapley LS (1962) College admissions and the stability of marriage. Am Math Mon 69:9–15
9. Greenberg J (1990) The theory of social situations: an alternative game theoretic approach. Cambridge University Press, Cambridge
10. Greenberg J, Luo X, Oladi R, Shitovitz B (2002) (Sophisticated) stable sets in exchange economies. Games Econ Behav 39:54–70
11. Greenberg J, Monderer D, Shitovitz B (1996) Multistage situations. Econometrica 64:1415–1437
12. Griesmer JH (1959) Extreme games with three values. In: Tucker AW, Luce RD (eds) Contribution to the theory of games, vol IV. Annals of Mathematics Studies, vol 40. Princeton University Press, Princeton, pp 189–212
13. Harsanyi J (1974) An equilibrium-point interpretation of stable sets and a proposed alternative definition. Manag Sci 20:1472–1495
14. Hart S (1973) Symmetric solutions of some production economies. Int J Game Theory 2:53–62
15. Hart S (1974) Formation of cartels in large markets. J Econ Theory 7:453–466

16. Heijmans J (1991) Discriminatory von Neumann–Morgenstern solutions. Games Econ Behav 3:438–452
17. Kaneko M (1987) The conventionally stable sets in noncooperative games with limited observations I: Definition and introductory argument. Math Soc Sci 13:93–128
18. Lucas WF (1968) A game with no solution. Bull Am Math Soc 74:237–239
19. Lucas WF (1990) Developments in stable set theory. In: Ichiishi T et al (eds) Game Theory and Applications. Academic Press, New York, pp 300–316
20. Lucas WF, Michaelis K, Muto S, Rabie M (1982) A new family of finite solutions. Int J Game Theory 11:117–127
21. Lucas WF, Rabie M (1982) Games with no solutions and empty cores. Math Oper Res 7:491–500
22. Luo X (2001) General systems and φ-stable sets: a formal analysis of socioeconomic environments. J Math Econ 36:95–109
23. Luo X (2006) On the foundation of stability. Academia Sinica, Mimeo, available at http://www.sinica.edu.tw/~xluo/pa14.pdf
24. Mariotti M (1997) A model of agreements in strategic form games. J Econ Theory 74:196–217
25. Moulin H (1995) Cooperative Microeconomics: A Game-Theoretic Introduction. Princeton University Press, Princeton
26. Muto S (1979) Symmetric solutions for symmetric constant-sum extreme games with four values. Int J Game Theory 8:115–123
27. Muto S (1982) On Hart production games. Math Oper Res 7:319–333
28. Muto S (1982) Symmetric solutions for (n,k) games. Int J Game Theory 11:195–201
29. Muto S, Okada D (1996) Von Neumann–Morgenstern stable sets in a price-setting duopoly. Econ Econ 81:1–14
30. Muto S, Okada D (1998) Von Neumann–Morgenstern stable sets in Cournot competition. Econ Econ 85:37–57
31. von Neumann J, Morgenstern O (1953) Theory of Games and Economic Behavior, 3rd ed. Princeton University Press, Princeton
32. Owen G (1965) A class of discriminatory solutions to simple n-person games. Duke Math J 32:545–553
33. Owen G (1968) n-Person games with only 1, n−1, and n-person coalitions. Proc Am Math Soc 19:1258–1261
34. Owen G (1995) Game theory, 3rd ed. Academic Press, New York
35. Peleg B (1986) A proof that the core of an ordinal convex game is a von Neumann–Morgenstern solution. Math Soc Sci 11:83–87
36. Rosenmüller J (1977) Extreme games and their solutions. Lecture Notes in Economics and Mathematical Systems, vol 145. Springer, Berlin
37. Rosenmüller J, Shitovitz B (2000) A characterization of vNM-stable sets for linear production games. Int J Game Theory 29:39–61
38. Roth A, Postlewaite A (1977) Weak versus strong domination in a market with indivisible goods. J Math Econ 4:131–137
39. Roth A, Sotomayor M (1990) Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis. Cambridge University Press, Cambridge
40. Quint T, Wako J (2004) On houseswapping, the strict core, segmentation, and linear programming. Math Oper Res 29:861–877


41. Shapley LS (1953) Quota solutions of n-person games. In: Kuhn HW, Tucker AW (eds) Contribution to the theory of games, vol II. Annals of Mathematics Studies, vol 28. Princeton University Press, Princeton, pp 343–359
42. Shapley LS (1959) The solutions of a symmetric market game. In: Tucker AW, Luce RD (eds) Contribution to the theory of games, vol IV. Annals of Mathematics Studies, vol 40. Princeton University Press, Princeton, pp 145–162
43. Shapley LS (1962) Simple games: An outline of the descriptive theory. Behav Sci 7:59–66
44. Shapley LS (1964) Solutions of compound simple games. In: Tucker AW et al (eds) Advances in Game Theory. Annals of Mathematics Studies, vol 52. Princeton University Press, Princeton, pp 267–305
45. Shapley LS (1971) Cores of convex games. Int J Game Theory 1:11–26
46. Shapley LS, Scarf H (1974) On cores and indivisibilities. J Math Econ 1:23–37
47. Shapley LS, Shubik M (1972) The assignment game I: The core. Int J Game Theory 1:111–130
48. Shitovitz B, Weber S (1997) The graph of Lindahl correspondence as the unique von Neumann–Morgenstern abstract stable set. J Math Econ 27:375–387
49. Simonnard M (1966) Linear programming. Prentice-Hall, New Jersey

50. Solymosi T, Raghavan TES (2001) Assignment games with stable core. Int J Game Theory 30:177–185
51. Suzuki A, Muto S (2000) Farsighted stability in prisoner's dilemma. J Oper Res Soc Japan 43:249–265
52. Suzuki A, Muto S (2005) Farsighted stability in n-person prisoner's dilemma. Int J Game Theory 33:431–445
53. Suzuki A, Muto S (2006) Farsighted behavior leads to efficiency in duopoly markets. In: Haurie A et al (eds) Advances in Dynamic Games. Birkhauser, Boston, pp 379–395
54. Wako J (1991) Some properties of weak domination in an exchange market with indivisible goods. Jpn Econ Rev 42:303–314
55. Xue L (1997) Nonemptiness of the largest consistent set. J Econ Theory 73:453–459
56. Xue L (1998) Coalitional stability under perfect foresight. Econ Theory 11:603–627

Books and Reviews
Lucas WF (1992) Von Neumann–Morgenstern stable sets. In: Aumann RJ, Hart S (eds) Handbook of Game Theory with Economic Applications, vol 1. North-Holland, Amsterdam, pp 543–590
Shubik M (1982) Game theory in the social sciences: Concepts and solutions. MIT Press, Boston



Cooperative Multi-hierarchical Query Answering Systems

ZBIGNIEW W. RAS1,2, AGNIESZKA DARDZINSKA3
1 Department of Computer Science, University of North Carolina, Charlotte, USA
2 Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
3 Department of Computer Science, Białystok Technical University, Białystok, Poland

Article Outline
Glossary
Definition of the Subject
Introduction
Multi-hierarchical Decision System
Cooperative Query Answering
Future Directions
Bibliography

Glossary
Autonomous information system  An autonomous information system is an information system existing as an independent entity.
Intelligent query answering  Intelligent query answering is an enhancement of query answering into a sort of intelligent system (capable of being adapted or molded). Such systems should be able to interpret incorrectly posed questions and compose an answer not necessarily reflecting precisely what is directly referred to by the question, but rather reflecting what the intermediary understands to be the intention linked with the question.
Knowledge base  A knowledge base is a collection of rules defined as expressions written in predicate calculus. These rules have the form of associations between conjuncts of values of attributes.
Ontology  Ontology is an explicit formal specification of how to represent objects, concepts and other entities that are assumed to exist in some area of interest, together with the relationships holding among them. Systems that share the same ontology are able to communicate about the domain of discourse without necessarily operating on a globally shared theory. A system commits to an ontology if its observable actions are consistent with the definitions in the ontology.
Semantics  The meaning of expressions written in some language, as opposed to their syntax, which describes how symbols may be combined independently of their meaning.

Definition of the Subject
One way to make a Query Answering System (QAS) intelligent is to assume a hierarchical structure of its attributes. Such systems have been investigated by Cuppens and Demolombe [3], Gal and Minker [4], and Gaasterland et al. [6], and they are called cooperative. Queries submitted to them are built, in a classical way, from values of attributes describing objects in an information system S and from the two-argument functors "and", "or". Instead of "or" we use the symbol "+", and instead of "and" we use the symbol "*". Let us assume that QAS is associated with an information system S. Now, if a query q submitted to QAS fails, then any attribute value listed in q can be generalized, and the number of objects supporting q in S may increase. In cooperative systems, these generalizations are controlled either by users [4] or by methods based on knowledge discovery [12]. Conceptually, a similar approach has been proposed by Lin [11]. He defines a neighborhood of an attribute value, which we can interpret as its generalization (its parent in the corresponding hierarchical attribute structure). When a query fails, the query answering system tries to replace values in the query with new values from their corresponding neighborhoods. QAS for S can also collaborate and exchange knowledge with other information systems; in all such cases, it is called intelligent. In [14,15] the query answering strategy was based on a guided process of knowledge (rules) extraction and knowledge exchange among systems. Knowledge extracted from information systems collaborating with S was used to construct new attributes in S and/or to impute null or hidden values of attributes in S. This way we not only enlarge the set of queries which QAS can successfully answer but also increase the overall number of retrieved objects and their confidence. Some attributes in S can be distinguished. We usually call them decision attributes.
Their values represent concepts which can be defined in terms of the remaining attributes in S, called classification attributes. Query languages for such information systems are built only from values of decision attributes and from the two-argument functors "+", "*" [16]. The semantics of queries is defined in terms of the semantics of values of classification attributes. Precision and recall of QAS are strictly dependent on the support and confidence of the classifiers used to define queries.

Introduction
Responses by QAS to submitted queries do not always contain the information desired and, although they may be logically correct, can sometimes be misleading. Research in the area of intelligent query answering rectifies these problems. The classical approach is based on a cooperative method called relaxation for expanding an information system and the queries related to it [3,4]. The relaxation method expands the scope of a query by relaxing the constraints implicit in the query. This allows QAS to return answers related to the original query, as well as the literal answers, which may be of interest to the user. This paper concentrates on multi-hierarchical decision systems, which are defined as information systems with several distinguished hierarchical attributes called decision attributes. Their values are used to build queries. We give the theoretical framework for modeling such systems and their corresponding query languages. The standard interpretation and the classifier-based interpretation of queries are introduced and used to model the quality (precision, recall) of QAS.
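The functors "+" and "*" act on the extensions of attribute values as set union and intersection. A minimal sketch of this evaluation over a toy dictionary-based information system (the data and value encoding below are illustrative assumptions, not taken from the text):

```python
# Toy information system: object -> {attribute: value}
S = {'x1': {'b': 'b[2]',   'c': 'c[1,1]'},
     'x2': {'b': 'b[2,1]', 'c': 'c[2,2]'},
     'x3': {'b': 'b[1]',   'c': 'c[1,1]'}}

def ext(w):
    """Objects whose value of some attribute equals w (exact match only;
    hierarchical generalization is handled separately by a cooperative QAS)."""
    return {x for x, desc in S.items() if w in desc.values()}

# Evaluate q = b[2] * (c[1,1] + c[2,2]):  '*' is intersection, '+' is union
answer = ext('b[2]') & (ext('c[1,1]') | ext('c[2,2]'))
print(sorted(answer))  # ['x1']
```

Note that exact matching is deliberately naive here: an object described by b[2,1] does not support the query value b[2] under this sketch, which is precisely the kind of failure that cooperative relaxation addresses.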

Multi-hierarchical Decision System
In this section we introduce the notion of a multi-hierarchical decision system S and the query language built from atomic expressions containing only values of the decision attributes in S. Classifier-based semantics and the standard semantics of queries in S are proposed. The set of objects X in S is defined as the interpretation domain for both semantics. Standard semantics identifies all objects in X which should be retrieved by a query. Classifier-based semantics gives weighted sets of objects which are retrieved by queries. The notions of precision and recall of QAS in the proposed setting are introduced. We use only rule-based classifiers to define the classifier-based semantics. By improving the confidence and support of the classifiers we improve the precision and recall of QAS.

Definition 1 By a multi-hierarchical decision system we mean a triple S = (X, A ∪ D, V), where X is a nonempty, finite set of objects, D = {d[i] : 1 ≤ i ≤ k} is a set of hierarchical decision attributes, A is a nonempty finite set of classification attributes, and V = ∪{V_a : a ∈ A ∪ D} is the set of their values. We assume that V_a, V_b are disjoint for any a, b ∈ A ∪ D such that a ≠ b, and that a : X → V_a is a partial function for every a ∈ A ∪ D.

Definition 2 By a set of decision queries (d-queries) for S we mean a least set T_D such that:
- 0, 1 ∈ T_D,
- if w ∈ ∪{V_a : a ∈ D}, then w, ~w ∈ T_D,
- if t1, t2 ∈ T_D, then (t1 + t2), (t1 * t2) ∈ T_D.

Definition 3 A decision query t is called simple if t = t1 * t2 * ... * tn and, for every j ∈ {1, 2, ..., n}, either t_j ∈ ∪{V_a : a ∈ D} or t_j = ~w with w ∈ ∪{V_a : a ∈ D}.

Definition 4 By a set of classification terms (c-terms) for S we mean a least set T_C such that:
- 0, 1 ∈ T_C,
- if w ∈ ∪{V_a : a ∈ A}, then w, ~w ∈ T_C,
- if t1, t2 ∈ T_C, then (t1 + t2), (t1 * t2) ∈ T_C.

Definition 5 A classification term t is called simple if t = t1 * t2 * ... * tn and, for every j ∈ {1, 2, ..., n}, either t_j ∈ ∪{V_a : a ∈ A} or t_j = ~w with w ∈ ∪{V_a : a ∈ A}.

Definition 6 By a classification rule we mean any expression of the form [t1 → t2], where t1 is a simple classification term and t2 is a simple decision query.

Definition 7 The semantics M_S of c-terms in S = (X, A ∪ D, V) is defined in a standard way as follows:
- M_S(0) = ∅, M_S(1) = X,
- M_S(w) = {x ∈ X : a(x) = w} for any w ∈ V_a, a ∈ A,
- M_S(~w) = {x ∈ X : (∃v ∈ V_a)[a(x) = v and v ≠ w]} for any w ∈ V_a, a ∈ A,
- if t1, t2 are c-terms, then M_S(t1 + t2) = M_S(t1) ∪ M_S(t2) and M_S(t1 * t2) = M_S(t1) ∩ M_S(t2).

Now, we introduce the notation used for values of decision attributes. The term d[i] also denotes the first granularity level of the hierarchical decision attribute d[i]. The set {d[i,1], d[i,2], d[i,3], ...} represents the values of attribute d[i] at its second granularity level. The set {d[i,1,1], d[i,1,2], ..., d[i,1,n_i]} represents the values of attribute d[i] at its third granularity level, directly below the node d[i,1]; the value d[i,1] can be refined to any value in this set, if necessary. Similarly, the set {d[i,3,1,3,1], d[i,3,1,3,2], d[i,3,1,3,3], d[i,3,1,3,4]} represents the values of attribute d[i] at its fourth granularity level which are finer than the value d[i,3,1,3].

Now, let us assume that a rule-based classifier is used to extract rules describing simple decision queries in S. We denote this classifier by RC. The definition of the semantics of c-terms is RC-independent, whereas the semantics M_S of d-queries is RC-dependent.

Definition 8 The classifier-based semantics M_S of d-queries in S = (X, A ∪ D, V) is defined as follows. If t is a simple d-query in S and {r_j = [t_j → t] : j ∈ J_t} is the set of all rules defining t which are extracted from S by the classifier RC, then

M_S(t) = {(x, p_x) : (∃j ∈ J_t) x ∈ M_S(t_j)}, where
p_x = Σ{conf(j)·sup(j) : x ∈ M_S(t_j), j ∈ J_t} / Σ{sup(j) : x ∈ M_S(t_j), j ∈ J_t}

and conf(j), sup(j) denote the confidence and the support of the rule [t_j → t], respectively.

Definition 9 An attribute value d[j1, j2, ..., jn] in S = (X, A ∪ D, V) is dependent on d[i1, i2, ..., ik] in S if one of the following conditions holds:
1) n ≤ k and (∀m ≤ n)[i_m = j_m], or
2) n > k and (∀m ≤ k)[i_m = j_m].
Otherwise, d[j1, j2, ..., jn] is called independent from d[i1, i2, ..., ik] in S.

Example 1 The attribute value d[2,3,1,2] is dependent on the attribute value d[2,3,1,2,5,3]. Also, d[2,3,1,2,5,3,2,4] is dependent on d[2,3,1,2,5,3].

Definition 10 Let S = (X, A ∪ {d[1], d[2], ..., d[k]}, V), w ∈ V_d[i], and let IV_d[i] be the set of all attribute values in V_d[i] which are independent from w. The standard semantics N_S of d-queries in S is defined as follows:
- N_S(0) = ∅, N_S(1) = X,
- if w ∈ V_d[i], then N_S(w) = {x ∈ X : d[i](x) = w}, for any 1 ≤ i ≤ k,
- if w ∈ V_d[i], then N_S(~w) = {x ∈ X : (∃v ∈ IV_d[i])[d[i](x) = v]}, for any 1 ≤ i ≤ k,
- if t1, t2 are d-queries, then N_S(t1 + t2) = N_S(t1) ∪ N_S(t2) and N_S(t1 * t2) = N_S(t1) ∩ N_S(t2).

Definition 11 Let S = (X, A ∪ D, V), let t be a d-query in S, let N_S(t) be its meaning under the standard semantics, and let M_S(t) be its meaning under the classifier-based semantics. Assume that N_S(t) = X1 ∪ Y1, where X1 = {x_i : i ∈ I1} and Y1 = {y_i : i ∈ I2}. Assume also that M_S(t) = {(x_i, p_i) : i ∈ I1} ∪ {(z_i, q_i) : i ∈ I3} and {y_i : i ∈ I2} ∩ {z_i : i ∈ I3} = ∅.

By the precision of the classifier-based semantics M_S on a d-query t we mean

Prec(M_S, t) = [Σ{p_i : i ∈ I1} + Σ{(1 − q_i) : i ∈ I3}] / [card(I1) + card(I3)].

By the recall of the classifier-based semantics M_S on a d-query t we mean

Rec(M_S, t) = [Σ{p_i : i ∈ I1}] / [card(I1) + card(I2)].

Example 2 Assume that N_S(t) = {x1, x2, x3, x4} and M_S(t) = {(x1, p1), (x2, p2), (x5, p5), (x6, p6)}. Then:

Prec(M_S, t) = [p1 + p2 + (1 − p5) + (1 − p6)] / 4,
Rec(M_S, t) = [p1 + p2] / 4.

Cooperative Query Answering
There are cases when a classical Query Answering System fails to return any answer to a d-query q although a satisfactory answer can still be found. For instance, assume that in a multi-hierarchical decision system S = (X, A ∪ D, V), where D = {d[1], d[2], ..., d[k]}, there is no single object whose description matches the query q. If a distance measure between objects in S is defined, then by generalizing q we may identify objects in S whose descriptions are closest to the description of q. This problem is similar to the one arising when the granularity of an attribute value used in a query q is finer than the granularity of the corresponding attribute used in S. By replacing such attribute values in q with more general values used in S, we may retrieve objects from S which satisfy q.

Definition 12 The distance δ_S between two attribute values d[j1, j2, ..., jn] and d[i1, i2, ..., im] in S = (X, A ∪ D, V), where j1 = i1 and p ≥ 1, is defined as follows:
1) if [j1, j2, ..., jp] = [i1, i2, ..., ip] and j_{p+1} ≠ i_{p+1}, then δ_S(d[j1, ..., jn], d[i1, ..., im]) = 1/2^{p−1};
2) if n ≤ m and [j1, j2, ..., jn] = [i1, i2, ..., in], then δ_S(d[j1, ..., jn], d[i1, ..., im]) = 1/2^n.
The second condition represents the average case between the best and the worst case.

Example 3 Following the above definition of the distance measure, we get:
1. δ_S(d[2,3,2,4], d[2,3,2,5,1]) = 1/4,
2. δ_S(d[2,3,2,4], d[2,3,2]) = 1/8.

Let us assume that q = q(a[3,1,3,2], b[1], c[2]) is a d-query submitted to S.
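Definition 12 can be turned into a short function; a sketch (representing each attribute value by its tuple of indices, e.g. d[2,3,2,4] as (2,3,2,4), is this snippet's own assumption), which reproduces Example 3:

```python
def delta(u, v):
    """Distance of Definition 12 between two values of one hierarchical
    attribute, each given as its tuple of indices."""
    if u[0] != v[0]:
        raise ValueError("Definition 12 assumes j1 = i1")
    n = min(len(u), len(v))
    p = 0                      # length of the longest common prefix
    while p < n and u[p] == v[p]:
        p += 1
    if p == n:                 # one value refines the other (condition 2)
        return 1 / 2 ** n
    return 1 / 2 ** (p - 1)   # values diverge after p common levels (condition 1)

print(delta((2, 3, 2, 4), (2, 3, 2, 5, 1)))  # 0.25  (= 1/4)
print(delta((2, 3, 2, 4), (2, 3, 2)))        # 0.125 (= 1/8)
```

Both printed values agree with Example 3.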
The notation q(a[3,1,3,2], b[1], c[2]) means that q is built from the atomic attribute values a[3,1,3,2], b[1], c[2] in S. Additionally, we assume that attribute a is not only hierarchical but also ordered. This means that the difference between the values a[3,1,3,2] and a[3,1,3,3] is smaller than the difference between the values a[3,1,3,2] and a[3,1,3,4]. Also, the

Cooperative Multi-hierarchical Query Answering Systems, Table 1
Multi-hierarchical decision system S

X    e     f     g    ...  ...  a          b          c          d
x1   e[1]  f[1]  ...  ...  ...  a[1]       b[2]       c[1,1]     d[3]
x2   e[2]  f[1]  ...  ...  ...  a[1,1]     b[2,1]     c[1,1,1]   d[3,1,2]
x3   e[2]  f[1]  ...  ...  ...  a[1,1,1]   b[2,2,1]   c[2,2]     d[1]
x4   e[1]  f[2]  ...  ...  ...  a[2]       b[2,2]     c[1,1]     d[1,1]
difference between any two elements in {a[3,1,3,1], a[3,1,3,2], a[3,1,3,3], a[3,1,3,4]} is smaller than the difference between a[3,1,3] and a[3,1,2]. Now, we outline a possible strategy which QAS can follow to solve q. Clearly, the best solution for answering q is to identify objects in S which precisely match the d-query submitted by the user. If this fails, we try to identify objects which match the d-query q(a[3,1,3], b[1], c[2]). If we succeed, then we try the d-queries q(a[3,1,3,1], b[1], c[2]) and q(a[3,1,3,3], b[1], c[2]). If we fail, then we should succeed with q(a[3,1,3,4], b[1], c[2]). If we fail with q(a[3,1,3], b[1], c[2]), then we try q(a[3,1], b[1], c[2]), and so on.

To present this cooperative strategy in a more precise way, we use an example and start with a very simple dataset. Namely, we assume that S has four decision attributes which belong to the set {a, b, c, d}. System S contains only the four objects listed in Table 1. Now, assume that the d-query q = a[1,2] * b[2] * c[1,1] * d[3,1,1] is submitted to the multi-hierarchical decision system S (see Table 1). Clearly, q fails in S. Jointly with q, a threshold value for a minimum support can be supplied as part of a d-query. This threshold gives the minimal number of objects that need to be returned as an answer to q. When QAS fails to answer q, the nearest objects satisfying q have to be identified. The algorithm for finding these objects is based on the following steps. If QAS fails to identify a sufficient number of objects satisfying q in S, the generalization process starts. We can generalize either attribute a or attribute d. Since the query value d[3,1,1] has a lower granularity level than a[1,2], we generalize d[3,1,1], getting a new query q1 = a[1,2] * b[2] * c[1,1] * d[3,1]. But q1 still fails in S. Now, we generalize a[1,2], getting a new query q2 = a[1] * b[2] * c[1,1] * d[3,1]. Objects x1, x2 are the only objects in S which support q2.

If the user is only interested in one object satisfying the query q, then we need to identify which object in {x1, x2} has a smaller distance to q. Clearly,

δ_S[q, x1] = δ_S[[a[1,2], b[2], c[1,1], d[3,1,1]], [a[1], b[2], c[1,1], d[3]]] = 1/4 + 0 + 0 + 1/4 = 1/2,
δ_S[q, x2] = δ_S[[a[1,2], b[2], c[1,1], d[3,1,1]], [a[1,1], b[2,1], c[1,1,1], d[3,1,2]]] = 1/4 + 1/4 + 1/8 + 1/8 = 3/4.
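The stepwise generalization q → q1 → q2 outlined above (trim one granularity level off the currently finest query value) can be sketched as follows; the dict-of-index-tuples query encoding is this snippet's own assumption:

```python
def relax_steps(query):
    """Yield successively generalized d-queries; each step trims one level
    from the value with the finest granularity (ties: attribute order)."""
    q = {a: tuple(p) for a, p in query.items()}
    yield dict(q)
    while any(len(p) > 1 for p in q.values()):
        attr = max(q, key=lambda a: len(q[a]))  # deepest value wins ties by order
        q[attr] = q[attr][:-1]
        yield dict(q)

steps = relax_steps({'a': (1, 2), 'b': (2,), 'c': (1, 1), 'd': (3, 1, 1)})
print(next(steps))  # q  = a[1,2]*b[2]*c[1,1]*d[3,1,1]
print(next(steps))  # q1 = a[1,2]*b[2]*c[1,1]*d[3,1]  (d generalized first)
print(next(steps))  # q2 = a[1]*b[2]*c[1,1]*d[3,1]    (then a)
```

Each relaxed query would be re-evaluated against S until the minimum-support threshold is met; matching and support counting are left out of this sketch.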

Since δ_S[q, x1] < δ_S[q, x2], x1 is the winning object. Note that the cooperative strategy only identifies objects satisfying d-queries, i.e., the objects to be returned by QAS to the user; the confidence assigned to these objects depends on the classifier RC.

Future Directions
We have introduced the notions of system-based semantics and user-based semantics of queries. User-based semantics is associated with the indexing of objects by a user, which is time-consuming and unrealistic for very large sets of data. System-based semantics is associated with automatic indexing of objects in X; it strictly depends on the support and confidence of the classifiers, which in turn determine the precision and recall of the query answering system. The quality of classifiers can be improved by a proper enlargement of the set X and of the set of features describing its objects, so that real-life objects from the same semantic domain as X are differentiated in a better way; an example is given in [16]. The quality of a query answering system can be improved by its cooperativeness. Both precision and recall of QAS increase if no-answer queries are replaced by generalized queries, which QAS answers at a higher granularity level than the initial level of the queries submitted by users. Assuming that the system is distributed, the quality of QAS for a multi-hierarchical decision system S can also be improved through collaboration among sites [14,15]. The key concept of an intelligent QAS based on collaboration among sites is to generate global knowledge through knowledge sharing: each site develops knowledge independently, and this knowledge is then used jointly to produce global knowledge. Assume that two sites S1 and S2 accept the same ontology of their attributes and share their knowledge in order to solve a user query successfully. Also, assume that one of the attributes at site S1 is confidential. The confidential data in S1 can be hidden by replacing them with null values.
However, users at S1 may treat them as missing data and reconstruct them with the knowledge extracted from S2 [10]. The vulnerability illustrated in this example shows that security-aware data management is an essential component of any intelligent QAS, needed to ensure data confidentiality.

Bibliography
1. Chmielewski MR, Grzymala-Busse JW, Peterson NW (1993) The rule induction system LERS – a version for personal computers. Found Comput Decis Sci 18(3–4):181–212
2. Chu W, Yang H, Chiang K, Minock M, Chow G, Larson C (1996) CoBase: A scalable and extensible cooperative information system. J Intell Inf Syst 6(2/3):223–259
3. Cuppens F, Demolombe R (1988) Cooperative answering: a methodology to provide intelligent access to databases. Proceedings of the Second International Conference on Expert Database Systems, pp 333–353
4. Gal A, Minker J (1988) Informative and cooperative answers in databases using integrity constraints. Natural Language Understanding and Logic Programming, North Holland, pp 277–300
5. Gaasterland T (1997) Cooperative answering through controlled query relaxation. IEEE Expert 12(5):48–59
6. Gaasterland T, Godfrey P, Minker J (1992) Relaxation as a platform for cooperative answering. J Intell Inf Syst 1(3):293–321
7. Giannotti F, Manco G (2002) Integrating data mining with intelligent query answering. Logics in Artificial Intelligence. Lecture Notes in Computer Science, vol 2424. Springer, Berlin, pp 517–520

8. Godfrey P (1993) Minimization in cooperative response to failing database queries. Int J Coop Inf Syst 6(2):95–149
9. Guarino N (1998) Formal ontology in information systems. IOS Press, Amsterdam
10. Im S, Ras ZW (2007) Protection of sensitive data based on reducts in a distributed knowledge discovery system. Proceedings of the International Conference on Multimedia and Ubiquitous Engineering (MUE 2007), Seoul. IEEE Computer Society, pp 762–766
11. Lin TY (1989) Neighborhood systems and approximation in relational databases and knowledge bases. Proceedings of the Fourth International Symposium on Methodologies for Intelligent Systems, Poster Session Program, Oak Ridge National Laboratory, ORNL/DSRD-24, pp 75–86
12. Muslea I (2004) Machine learning for online query relaxation. Proceedings of KDD-2004, Seattle. ACM, pp 246–255
13. Pawlak Z (1981) Information systems – theoretical foundations. Inf Syst J 6:205–218
14. Ras ZW, Dardzinska A (2004) Ontology-based distributed autonomous knowledge systems. Inf Syst Int J 29(1):47–58
15. Ras ZW, Dardzinska A (2006) Solving failing queries through cooperation and collaboration. Special Issue on Web Resources Access, World Wide Web J 9(2):173–186
16. Ras ZW, Zhang X, Lewis R (2007) MIRAI: Multi-hierarchical, FS-tree based music information retrieval system. In: Kryszkiewicz M et al (eds) Proceedings of RSEISP 2007. LNAI, vol 4585. Springer, Berlin, pp 80–89

Correlated Equilibria and Communication in Games

FRANÇOISE FORGES
Ceremade, Université Paris-Dauphine, Paris, France

Article Outline
Glossary
Definition of the Subject
Introduction
Correlated Equilibrium: Definition and Basic Properties
Correlated Equilibrium and Communication
Correlated Equilibrium in Bayesian Games
Related Topics and Future Directions
Acknowledgment
Bibliography

Glossary
Bayesian game  An interactive decision problem consisting of a set of n players, a set of types for every player, a probability distribution which accounts for the players' beliefs over each others' types, a set of actions for every player and a von Neumann–Morgenstern utility function defined over n-tuples of types and actions for every player.
Nash equilibrium  In an n-person strategic form game, a strategy n-tuple from which unilateral deviations are not profitable.
von Neumann–Morgenstern utility function  A utility function which reflects the individual's preferences over lotteries. Such a utility function is defined over outcomes and can be extended to any lottery by taking the expectation with respect to it.
Pure strategy (or simply strategy)  A mapping which, in an interactive decision problem, associates an action with the information of a player whenever this player can make a choice.
Sequential equilibrium  A refinement of the Nash equilibrium for n-person multistage interactive decision problems, which can be loosely defined as a strategy n-tuple together with beliefs over past information for every player, such that every player maximizes his expected utility given his beliefs and the others' strategies, with the additional condition that the beliefs satisfy (possibly sophisticated) Bayes updating given the strategies.

Strategic (or normal) form game  An interactive decision problem consisting of a set of n players, a set of strategies for every player and a (typically, von Neumann–Morgenstern) utility function defined over n-tuples of strategies for every player.
Utility function  A real-valued mapping over a set of outcomes which reflects the preferences of an individual by associating a utility level (a "payoff") with every outcome.

Definition of the Subject
The correlated equilibrium is a game-theoretic solution concept. It was proposed by Aumann [1,2] in order to capture the strategic correlation opportunities that the players face when they take into account the extraneous environment in which they interact. The notion is illustrated in Sect. "Introduction". A formal definition is given in Sect. "Correlated Equilibrium: Definition and Basic Properties". The correlated equilibrium also appears as the appropriate solution concept if preplay communication is allowed between the players. As shown in Sect. "Correlated Equilibrium and Communication", this property can be given several precise statements according to the constraints imposed on the players' communication, which can go from plain conversation to exchange of messages through noisy channels. Originally designed for static games with complete information, the correlated equilibrium applies to any strategic form game. It is geometrically and computationally more tractable than the better-known Nash equilibrium. The solution concept has been extended to dynamic games, possibly with incomplete information. As an illustration, we define in detail the communication equilibrium for Bayesian games in Sect. "Correlated Equilibrium in Bayesian Games".

Introduction
Example Consider the two-person game known as "chicken", in which each player i can take a "pacific" action (denoted as pi) or an "aggressive" action (denoted as ai):

        p2        a2
p1    (8, 8)    (3, 10)
a1    (10, 3)   (0, 0)

The interpretation is that player 1 and player 2 simultaneously choose an action and then get a payoff, which is determined by the pair of chosen actions according to the previous matrix. If both players are pacific, they both get 8. If both are aggressive, they both get 0. If one player is aggressive and the other is pacific, the aggressive player gets 10 and the pacific one gets 3. This game has two pure Nash equilibria (p1, a2), (a1, p2) and one mixed Nash equilibrium in which both players choose the pacific action with probability 3/5, resulting in the expected payoff 6 for both players. A possible justification for the latter solution is that the players make their choices as a function of independent extraneous random signals. The assumption of independence is strong. Indeed, there may be no way to prevent the players' signals from being correlated. Consider a random signal which has no effect on the players' payoffs and takes three possible values: low, medium or high, occurring each with probability 1/3. Assume that, before the beginning of the game, player 1 distinguishes whether the signal is high or not, while player 2 distinguishes whether the signal is low or not. The relevant interactive decision problem is then the extended game in which the players can base their action on the private information they get on the random signal, while the payoffs only depend on the players' actions. In this game, suppose that player 1 chooses the aggressive action when the signal is high and the pacific action otherwise. Similarly, suppose that player 2 chooses the aggressive action when the signal is low and the pacific action otherwise. We show that these strategies form an equilibrium in the extended game. Given player 2's strategy, assume that player 1 observes a high signal. Player 1 deduces that the signal cannot be low, so that player 2 chooses the pacific action; hence player 1's best response is to play aggressively. Assume now that player 1 is informed that the signal is not high; he deduces that with probability 1/2 the signal is medium (i.e., not low), so that player 2 plays pacific, and with probability 1/2 the signal is low, so that player 2 plays aggressive.
The expected payoff of player 1 is then (1/2)·8 + (1/2)·3 = 5.5 if he plays pacific and (1/2)·10 + (1/2)·0 = 5 if he plays aggressive; hence the pacific action is a best response. The equilibrium conditions for player 2 are symmetric. To sum up, the strategies based on the players' private information form a Nash equilibrium in the extended game in which an extraneous signal is first selected. We shall say that these strategies form a "correlated equilibrium". The corresponding probability distribution over the players' actions is

            p2       a2
   p1      1/3      1/3
   a1      1/3       0                                        (1)

and the expected payoff of every player is 7. This probability distribution can be used directly to make private recommendations to the players before the beginning of the game (see the canonical representation below).
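These incentive constraints can be verified mechanically. The following Python sketch (ours, not part of the original text; helper names are illustrative) checks that no player can profit from disobeying a recommendation drawn according to (1), and recovers the expected payoff 7:

```python
from itertools import product

ACTIONS = ("p", "a")  # pacific, aggressive

# Chicken payoffs from the text, as (player 1's payoff, player 2's payoff).
U = {("p", "p"): (8, 8), ("p", "a"): (3, 10),
     ("a", "p"): (10, 3), ("a", "a"): (0, 0)}

# Distribution (1): probability 1/3 on every profile except (a, a).
q = {("p", "p"): 1/3, ("p", "a"): 1/3, ("a", "p"): 1/3, ("a", "a"): 0.0}

def is_canonical_correlated_equilibrium(q, U, eps=1e-9):
    """Obeying each recommendation must be a conditional best response."""
    for i in (0, 1):
        for rec, dev in product(ACTIONS, repeat=2):
            # Expected gain from playing `dev` whenever `rec` is recommended.
            gain = 0.0
            for s, pay in U.items():
                if s[i] != rec:
                    continue
                deviated = list(s)
                deviated[i] = dev
                gain += q[s] * (U[tuple(deviated)][i] - pay[i])
            if gain > eps:
                return False
    return True

expected_payoffs = tuple(sum(q[s] * U[s][i] for s in U) for i in (0, 1))
```

Conditional on the recommendation p, the opponent plays p or a with probability 1/2 each, so obeying yields 5.5 against 5 for deviating; conditional on a, the opponent plays p for sure and 10 beats 8. Both players therefore expect (8 + 3 + 10)/3 = 7.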

Correlated Equilibrium: Definition and Basic Properties

Definition

A game in strategic form G = (N, (Σ^i)_{i∈N}, (u^i)_{i∈N}) consists of a set of players N together with, for every player i ∈ N, a set of strategies (for instance, a set of actions) Σ^i and a (von Neumann–Morgenstern) utility function u^i : Σ → ℝ, where Σ = ∏_{j∈N} Σ^j is the set of all strategy profiles. We assume that the sets N and Σ^i, i ∈ N, are finite.

A correlation device d = (Ω, q, (P^i)_{i∈N}) is described by a finite set of signals Ω, a probability distribution q over Ω and a partition P^i of Ω for every player i ∈ N. Since Ω is finite, the probability distribution q is just a real vector q = (q(ω))_{ω∈Ω} such that q(ω) ≥ 0 and ∑_{ω∈Ω} q(ω) = 1. From G and d, we define the extended game G_d as follows:

- ω is chosen in Ω according to q;
- every player i is informed of the element P^i(ω) of P^i which contains ω;
- G is played: every player i chooses a strategy σ^i in Σ^i and gets the utility u^i(σ), σ = (σ^j)_{j∈N}.

A (pure) strategy for player i in G_d is a mapping α^i : Ω → Σ^i which is P^i-measurable, namely, such that α^i(ω′) = α^i(ω) if ω′ ∈ P^i(ω). The interpretation is that, in G_d, every player i chooses his strategy σ^i as a function of his private information on the random signal ω, which is selected before the beginning of G. According to Aumann [1], a correlated equilibrium of G is a pair (d, α), which consists of a correlation device d = (Ω, q, (P^i)_{i∈N}) and a Nash equilibrium α = (α^i)_{i∈N} of G_d. The equilibrium conditions of every player i, conditionally on his private information, can be written as

   ∑_{ω′ ∈ P^i(ω)} q(ω′ | P^i(ω)) u^i(α(ω′)) ≥ ∑_{ω′ ∈ P^i(ω)} q(ω′ | P^i(ω)) u^i(σ^i, α^{-i}(ω′)),
       ∀ i ∈ N, ∀ σ^i ∈ Σ^i, ∀ ω ∈ Ω : q(ω) > 0,                                      (2)

where α^{-i} = (α^j)_{j≠i}.

A mixed Nash equilibrium μ = (μ^i)_{i∈N} of G can be viewed as a correlated equilibrium of G. By definition, every μ^i is a probability distribution over Σ^i, the finite set of pure strategies of player i. Let us consider the correlation device d = (Ω, q, (P^i)_{i∈N}) in which Ω = Σ = ∏_{j∈N} Σ^j, q is the product probability distribution induced by the mixed strategies (i.e., q((σ^j)_{j∈N}) = ∏_{j∈N} μ^j(σ^j)) and, for each i, P^i is the partition of Ω generated by Σ^i (i.e., for ω, ω′ ∈ Ω, ω′ ∈ P^i(ω) ⇔ ω′^i = ω^i). Let α^i : Σ → Σ^i be the projection over Σ^i (i.e., α^i(σ) = σ^i). The correlation device d and the strategies α^i defined in this way form a correlated equilibrium. As we shall see below, this correlated equilibrium is "canonical".
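As a concrete check (our illustration, with hypothetical helper names), the mixed Nash equilibrium of "chicken" from the introduction, in which each player is pacific with probability 3/5, induces the product distribution described above; it satisfies the incentive conditions with equality and yields the expected payoff 6:

```python
from itertools import product

ACTIONS = ("p", "a")  # pacific, aggressive

# Chicken payoffs from the introduction.
PAYOFFS = {("p", "p"): (8, 8), ("p", "a"): (3, 10),
           ("a", "p"): (10, 3), ("a", "a"): (0, 0)}

# Product distribution induced by the mixed equilibrium: pacific w.p. 3/5.
mix = {"p": 3 / 5, "a": 2 / 5}
q = {s: mix[s[0]] * mix[s[1]] for s in product(ACTIONS, repeat=2)}

def deviation_gain(q, i, rec, dev):
    """Expected gain for player i from playing `dev` when recommended `rec`."""
    gain = 0.0
    for s, pay in PAYOFFS.items():
        if s[i] != rec:
            continue
        t = list(s)
        t[i] = dev
        gain += q[s] * (PAYOFFS[tuple(t)][i] - pay[i])
    return gain

# Every incentive constraint holds (with equality: a mixed equilibrium
# leaves each player indifferent), and the expected payoff is 6.
all_ok = all(deviation_gain(q, i, r, d) <= 1e-9
             for i in (0, 1) for r in ACTIONS for d in ACTIONS)
expected = sum(q[s] * PAYOFFS[s][0] for s in q)
```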

Canonical Representation

A canonical correlated equilibrium of G is a correlated equilibrium in which Ω = Σ = ∏_{j∈N} Σ^j, while for every player i, the partition P^i of Σ is generated by Σ^i and α^i : Σ → Σ^i is the projection over Σ^i. A canonical correlated equilibrium is thus fully specified by a probability distribution q over Σ. A natural interpretation is that a mediator selects σ = (σ^j)_{j∈N} according to q and privately recommends σ^i to player i, for every i ∈ N. The players are not forced to obey the mediator, but σ is selected in such a way that player i cannot benefit from deviating unilaterally from the recommendation σ^i, i.e., τ^i = σ^i maximizes the conditional expectation of player i's payoff u^i(τ^i, σ^{-i}) given the recommendation σ^i. A probability distribution q over Σ thus defines a canonical correlated equilibrium if and only if it satisfies the following linear inequalities:

   ∑_{σ^{-i} ∈ Σ^{-i}} q(σ^{-i} | σ^i) u^i(σ^i, σ^{-i}) ≥ ∑_{σ^{-i} ∈ Σ^{-i}} q(σ^{-i} | σ^i) u^i(τ^i, σ^{-i}),
       ∀ i ∈ N, ∀ σ^i ∈ Σ^i : q(σ^i) > 0, ∀ τ^i ∈ Σ^i

or, equivalently,

   ∑_{σ^{-i} ∈ Σ^{-i}} q(σ^i, σ^{-i}) u^i(σ^i, σ^{-i}) ≥ ∑_{σ^{-i} ∈ Σ^{-i}} q(σ^i, σ^{-i}) u^i(τ^i, σ^{-i}),
       ∀ i ∈ N, ∀ σ^i, τ^i ∈ Σ^i.                                                     (3)

The equilibrium conditions can also be formulated ex ante:

   ∑_{σ ∈ Σ} q(σ) u^i(σ) ≥ ∑_{σ ∈ Σ} q(σ) u^i(α^i(σ^i), σ^{-i}),   ∀ i ∈ N, ∀ α^i : Σ^i → Σ^i.

The following result is an analog of the "revelation principle" in mechanism design (see, e.g., Myerson [41]): let (α, d) be a correlated equilibrium associated with an arbitrary correlation device d = (Ω, q, (P^i)_{i∈N}). The corresponding "correlated equilibrium distribution", namely, the probability distribution induced over Σ by q and α, defines a canonical correlated equilibrium. For instance, in the introduction, (1) describes a canonical correlated equilibrium.

Duality and Existence

From the linearity of (3), duality theory can be used to study the properties of correlated equilibria, in particular to prove their existence without relying on Nash's [45] theorem and its fixed-point argument (recall that every mixed Nash equilibrium is a correlated equilibrium). Hart and Schmeidler [30] establish the existence of a correlated equilibrium by constructing an auxiliary two-person zero-sum game and applying the minimax theorem. Nau and McCardle [47] derive another elementary proof of existence from an extension of the "no arbitrage opportunities" axiom that underlies subjective probability theory. They introduce jointly coherent strategy profiles, which do not expose the players as a group to arbitrage from an outside observer. They show that a strategy profile is jointly coherent if and only if it occurs with positive probability in some correlated equilibrium. From a technical point of view, both proofs turn out to be similar. Myerson [44] makes further use of the linear structure of correlated equilibria by introducing dual reduction, a technique to replace a finite game by a game with fewer strategies, in such a way that any correlated equilibrium of the reduced game induces a correlated equilibrium of the original game.

Geometric Properties

As (3) is a system of linear inequalities, the set of all correlated equilibrium distributions is a convex polytope. Nau et al. [49] show that if it has "full" dimension (namely, dimension |Σ| − 1), then all Nash equilibria lie on its relative boundary. Viossat [60] characterizes in addition the class of games whose correlated equilibrium polytope contains a Nash equilibrium in its relative interior. Interestingly, this class of games includes two-person zero-sum games but is not defined by "strict competition" properties. In two-person games, all extreme Nash equilibria are also extreme correlated equilibria [13,25]; this result does not hold with more than two players. Finally, Viossat [59] proves that having a unique correlated equilibrium is a robust property, in the sense that the set of n-person games with a unique correlated equilibrium is open. The same is not true for the Nash equilibrium (unless n = 2).
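Since (3) is a finite system of linear inequalities, the correlated equilibrium polytope can be handled with standard linear-algebra tooling. The sketch below (ours; the candidate distribution q_star is supplied for illustration, not taken from the text) builds the incentive constraints of the game of "chicken" and verifies that q_star is a point of the polytope whose total payoff exceeds that of distribution (1); maximizing the welfare ∑_σ q(σ)(u^1(σ) + u^2(σ)) over these constraints with any LP solver would select the best such point:

```python
from fractions import Fraction as F
from itertools import product

ACTIONS = ("p", "a")
U = {("p", "p"): (8, 8), ("p", "a"): (3, 10),
     ("a", "p"): (10, 3), ("a", "a"): (0, 0)}
PROFILES = list(product(ACTIONS, repeat=2))

def incentive_rows():
    """One linear constraint per (player, recommendation, deviation):
    sum_s coeff(s) * q(s) <= 0, as in (3)."""
    rows = []
    for i, rec, dev in product((0, 1), ACTIONS, ACTIONS):
        if rec == dev:
            continue
        row = {}
        for s in PROFILES:
            if s[i] != rec:
                continue
            t = list(s)
            t[i] = dev
            row[s] = U[tuple(t)][i] - U[s][i]
        rows.append(row)
    return rows

def feasible(q):
    """Is q a point of the correlated equilibrium polytope?"""
    return (sum(q.values()) == 1
            and all(v >= 0 for v in q.values())
            and all(sum(c * q[s] for s, c in row.items()) <= 0
                    for row in incentive_rows()))

# A point of the polytope with higher welfare than distribution (1):
q_star = {("p", "p"): F(3, 7), ("p", "a"): F(2, 7),
          ("a", "p"): F(2, 7), ("a", "a"): F(0)}
welfare = sum(q_star[s] * (U[s][0] + U[s][1]) for s in PROFILES)  # 100/7
```

Distribution (1) gives total payoff 14, while q_star gives 100/7 ≈ 14.29, i.e., about 7.14 per player, illustrating that the polytope extends beyond the distributions discussed so far.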


Complexity

From (3), correlated equilibria can be computed by linear programming methods. Gilboa and Zemel [24] show more precisely that the complexity of standard computational problems is "NP-hard" for the Nash equilibrium and polynomial for the correlated equilibrium. Examples of such problems are: "Does the game G have a Nash (resp., correlated) equilibrium which yields a payoff greater than r to every player (for some given number r)?" and "Does the game G have a unique Nash (resp., correlated) equilibrium?". Papadimitriou [50] develops a polynomial-time algorithm for finding correlated equilibria, which is based on a variant of the existence proof of Hart and Schmeidler [30].

Foundations

By reinterpreting the previous canonical representation, Aumann [2] proposes a decision-theoretic foundation for the correlated equilibrium in games with complete information, in which Σ^i, for i ∈ N, stands merely for a set of actions of player i. Let Ω be the space of all states of the world; an element ω of Ω thus specifies all the parameters which may be relevant to the players' choices. In particular, the action profile in the underlying game G is part of the state of the world. A partition P^i describes player i's information on Ω. In addition, every player i has a prior belief, i.e., a probability distribution, q^i over Ω. Formally, the framework is similar to the one above, except that the players possibly hold different beliefs over Ω. Let α^i(ω) denote player i's action at ω; a natural assumption is that player i knows the action he chooses, namely that α^i is P^i-measurable. According to Aumann [2], player i is Bayes-rational at ω if his action α^i(ω) maximizes his expected payoff (with respect to q^i) given his information P^i(ω). Note that this is a separate rationality condition for every player, not an equilibrium condition.
Aumann [2] proves the following result: under the common prior assumption (namely, q^i = q for every i ∈ N), if every player is Bayes-rational at every state of the world, the distribution of the corresponding action profile α is a correlated equilibrium distribution. The key to this decision-theoretic foundation of the correlated equilibrium is that, under the common prior assumption, Bayesian rationality amounts to (2).

If the common prior assumption is relaxed, the previous result still holds, with subjective prior probability distributions, for the subjective correlated equilibrium, which was also introduced by Aumann [1]. The latter solution concept is defined in the same way as above, by considering a device (Ω, (q^i)_{i∈N}, (P^i)_{i∈N}), with a probability distribution q^i for every player i, and by writing (2) in terms of q^i instead of q. Brandenburger and Dekel [10] show that (a refinement of) the subjective correlated equilibrium is equivalent to (correlated) rationalizability, another well-established solution concept which captures players' minimal rationality. Rationalizable strategies reflect that the players commonly know that each of them makes an optimal choice given some belief. Nau and McCardle [48] reconcile objective and subjective correlated equilibrium by proposing the no arbitrage principle as a unified approach to individual and interactive decision problems. They argue that the objective correlated equilibrium concept applies to a game that is revealed by the players' choices, while the subjective correlated equilibrium concept applies to the "true game"; both lead to the same set of jointly coherent outcomes.

Correlated Equilibrium and Communication

As seen in the previous section, correlated equilibria can be achieved in practice with the help of a mediator and emerge in a Bayesian framework embedding the game in a full description of the world. Both approaches require extending the game by taking into account information which is not generated by the players themselves. Can the players reach a correlated equilibrium without relying on any extraneous correlation device, by just communicating with each other before the beginning of the game? Consider the game of "chicken" presented in the introduction. The probability distribution

            p2       a2
   p1       0       1/2
   a1      1/2       0                                        (4)

describes a correlated equilibrium which amounts to choosing one of the two pure Nash equilibria with equal probability. Both players get an expected payoff of 6.5. Can they safely achieve this probability distribution if no mediator tosses a fair coin for them? The answer is positive, as shown by Aumann et al. [3]. Assume that, before playing "chicken", the players independently toss a coin and simultaneously reveal to each other whether heads or tails obtained. Player 1 tells player 2 "h1" or "t1" and, at the same time, player 2 tells player 1 "h2" or "t2". If both players use a fair coin, reveal the result of the toss correctly and play (p1, a2) if both coins fell on the same side (i.e., if (h1, h2) or (t1, t2) is announced) and (a1, p2) otherwise (i.e., if (h1, t2) or (t1, h2) is announced), they get the same effect as a mediator using (4). Furthermore, none of them can gain by unilaterally deviating from the described strategies, even at the randomizing stage: the two relevant outcomes, [(h1, h2) or (t1, t2)] and [(h1, t2) or (t1, h2)], happen with probability 1/2 provided that one of the players reveals the toss of a fair coin. This procedure is known as a "jointly controlled lottery".

An important feature of the previous example is that, in the correlated equilibrium described by (4), the players know each other's recommendation. Hence, they can easily reproduce (4) by exchanging messages that they have selected independently. In the correlated equilibrium described by the probability distribution (1), the private character of recommendations is crucial to guarantee that (p1, p2) be played with positive probability. Hence one cannot hope that a simple procedure of direct preplay communication will suffice to generate (1). However, the fact that direct communication is necessarily public is typical of two-person games.

Given the game G = (N, (Σ^i)_{i∈N}, (u^i)_{i∈N}), let us define a (bounded) "cheap talk" extension ext(G) of G as a game in which T stages of costless, unmediated preplay communication are allowed before G is played. More precisely, let M_t^i be a finite set of messages for player i ∈ N at stage t, t = 1, 2, …, T; at every stage t of ext(G), every player i selects a message in M_t^i; these choices are made simultaneously before being revealed to a subset of players at the end of stage t. The rules of ext(G) thus determine a set of "senders" for every stage t (those players i for whom M_t^i contains more than one message) and a set of "receivers" for every stage t. The players perfectly recall their past messages. After the communication phase, they choose their strategies (e.g., their actions) as in G; they are also rewarded as in G, independently of the preplay phase, which is thus "cheap". Communication has an indirect effect on the final outcome in G, since the players make their decisions as a function of the messages that they have exchanged.
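The jointly controlled lottery described above is easy to make explicit. In this sketch (ours; parameter names are illustrative), each player's behavior at the randomizing stage is summarized by the probability with which he announces heads; as claimed, the induced distribution over the two outcomes remains that of (4) as long as at least one player uses a fair coin:

```python
# Jointly controlled lottery from the text: each player tosses a coin and
# announces the result; matching announcements select (p1, a2), differing
# announcements select (a1, p2).
def outcome_distribution(h1, h2):
    """Distribution over action profiles when player i announces heads
    with probability h_i (h_i = 1/2 for an honest fair coin)."""
    match = h1 * h2 + (1 - h1) * (1 - h2)
    return {("p1", "a2"): match, ("a1", "p2"): 1 - match}

honest = outcome_distribution(0.5, 0.5)   # reproduces distribution (4)
skewed = outcome_distribution(0.9, 0.5)   # a unilateral bias changes nothing
```

With h2 = 1/2, the match probability is h1/2 + (1 − h1)/2 = 1/2 for every h1, which is exactly why unilateral deviations at the randomizing stage are unprofitable.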
Specific additional assumptions are often made on ext(G), as we will see below. Let us fix a cheap talk extension ext(G) of G and a Nash equilibrium of ext(G). As a consequence of the previous definitions, the distribution induced by this Nash equilibrium over Σ defines a correlated equilibrium of G (this can be proved in the same way as the canonical representation of correlated equilibria stated in Sect. "Correlated Equilibrium: Definition and Basic Properties"). The question raised in this section is whether the converse holds.

If the number of players is two, the Nash equilibrium distributions of cheap talk extensions of G form a subset of the correlated equilibrium distributions: the convex hull of Nash equilibrium distributions. Indeed, both players have the same information after any direct exchange of messages. Conversely, by performing repeated jointly controlled lotteries as in the example above, the players can achieve any convex combination (with rational weights) of

Nash equilibria of G as a Nash equilibrium of a cheap talk extension of G. The restriction to probability distributions whose components are rational numbers is only needed insofar as we focus on bounded cheap talk extensions.

Bárány [4] establishes that, if the number of players of G is at least four, every (rational) correlated equilibrium distribution of G can be realized as a Nash equilibrium of a cheap talk extension ext(G), provided that ext(G) allows the players to publicly check the record of communication under some circumstances. The equilibria of ext(G) constructed by Bárány involve a receiver getting the same message from two different senders; the message is nevertheless not public thanks to the assumption on the number of players. At every stage of ext(G), every player can ask for the revelation of all past messages, which are assumed to be recorded. Typically, a receiver can claim that the two senders' messages differ. In this case, the record of communication surely reveals that either one of the senders or the receiver himself has cheated; the deviator can be punished (at his minmax level in G) by the other players.

The punishments in Bárány's [4] Nash equilibria of ext(G) need not be credible threats. Instead of using double senders in the communication protocols, Ben-Porath [5,6] proposes a procedure of random monitoring, which prescribes a given behavior to every player in such a way that unilateral deviations can be detected with probability arbitrarily close to 1. This procedure applies if there are at least three players, which yields an analog of Bárány's result already in this case. If the number of players is exactly three, Ben-Porath [6] needs to assume, as Bárány [4] does, that public verification of the record of communication is possible in ext(G) (see Ben-Porath [7]). However, Ben-Porath concentrates on (rational) correlated equilibrium distributions which allow for strict punishment relative to a Nash equilibrium of G; he constructs sequential equilibria which generate these distributions in ext(G), thus dispensing with incredible threats. At the price of raising the number of players to five or more, Gerardi [22] proves that every (rational) correlated equilibrium distribution of G can be realized as a sequential equilibrium of a cheap talk extension of G which does not require any message recording. For this, he builds communication protocols in which the players base their decisions on majority rule, so that no punishment is necessary.

We have concentrated on two extreme forms of communication: mediated communication, in which a mediator performs lotteries and sends private messages to the players, and cheap talk, in which the players just exchange messages. Many intermediate schemes of communication are obviously conceivable. For instance, Lehrer [36] introduces (possibly multistage) "mediated talk": the players send private messages to a mediator, but the latter can only make deterministic public announcements. Mediated talk captures real-life communication procedures, like elections, especially if it lasts only a few stages. Lehrer and Sorin [37] establish that, whatever the number of players of G, every (rational) correlated equilibrium distribution of G can be realized as a Nash equilibrium of a single-stage mediated talk extension of G. Ben-Porath [5] proposes a variant of cheap talk in which the players not only exchange verbal messages but also "hard" devices such as urns containing balls. This extension is particularly useful in two-person games to circumvent the equivalence between the equilibria achieved by cheap talk and the convex hull of Nash equilibria. More precisely, the result of Ben-Porath [5] stated above holds for two-person games if the players first check together the content of different urns, and then each player draws a ball from an urn chosen by the other player, so as to guarantee that one player only knows the outcome of a lottery while the other one only knows the probabilities of this lottery.

The various extensions of the basic game G considered up to now, with or without a mediator, implicitly assume that the players are fully rational. In particular, they have unlimited computational abilities. By relaxing that assumption, Urbano and Vila [55] and Dodis et al. [12] build on earlier results from cryptography to implement any (rational) correlated equilibrium distribution through unmediated communication, including in two-person games.

As the previous paragraphs illustrate, the players can modify their initial distribution of information by means of many different communication protocols. Gossner [26] proposes a general criterion to classify them: a protocol is "secure" if, under all circumstances, the players can neither mislead nor spy on each other.
For instance, given a cheap talk extension ext(G), a protocol P describes, for every player, a strategy in ext(G) and a way to interpret his information after the communication phase of ext(G). P induces a correlation device d(P) (in the sense of Sect. "Correlated Equilibrium: Definition and Basic Properties"). P is secure if, for every game G and every Nash equilibrium α of G_{d(P)}, the following procedure is a Nash equilibrium of ext(G): communicate according to the strategies described by P in order to generate d(P), and make the final choice in G according to α. Gossner [26] gives a tractable characterization of secure protocols.

Correlated Equilibrium in Bayesian Games

A Bayesian game Γ = (N, (T^i)_{i∈N}, p, (A^i)_{i∈N}, (v^i)_{i∈N}) consists of a set of players N and, for every player i ∈ N, a set of types T^i, a probability distribution p^i over T = ∏_{j∈N} T^j, a set of actions A^i and a (von Neumann–Morgenstern) utility function v^i : T × A → ℝ, where A = ∏_{j∈N} A^j. For simplicity, we make the common prior assumption: p^i = p for every i ∈ N. All sets are assumed finite. The interpretation is that a virtual move of nature chooses t = (t^j)_{j∈N} according to p; player i is only informed of his own type t^i; the players then simultaneously choose an action. We will focus on two possible extensions of Aumann's [1] solution concept to Bayesian games: the strategic form correlated equilibrium and the communication equilibrium. Without loss of generality, the definitions below are given in "canonical form" (see Sect. "Correlated Equilibrium: Definition and Basic Properties").

Strategic Form Correlated Equilibrium

A (pure) strategy of player i in Γ is a mapping σ^i : T^i → A^i, i ∈ N. The strategic form of Γ is a game G(Γ), like the game G considered in Sect. "Correlated Equilibrium: Definition and Basic Properties", with sets of pure strategies Σ^i = (A^i)^{T^i} and utility functions u^i over Σ = ∏_{j∈N} Σ^j computed as expectations with respect to p: u^i(σ) = E[v^i(t, σ(t))], with σ(t) = (σ^i(t^i))_{i∈N}. A strategic form correlated equilibrium, or simply a correlated equilibrium, of a Bayesian game Γ is a correlated equilibrium, in the sense of Sect. "Correlated Equilibrium: Definition and Basic Properties", of G(Γ). A canonical correlated equilibrium of Γ is thus described by a probability distribution Q over Σ, which selects an N-tuple of pure strategies (σ^i)_{i∈N}. This lottery can be thought of as being performed by a mediator who privately recommends σ^i to player i, i ∈ N, before the beginning of Γ, i.e., before (or in any case, independently of) the chance move choosing the N-tuple of types. The equilibrium conditions express that, once he knows his type t^i, player i cannot gain by unilaterally deviating from σ^i(t^i).
Communication Equilibrium

Myerson [41] transforms the Bayesian game Γ into a mechanism design problem by allowing the mediator to collect information from the players before making recommendations to them. Following Forges [15] and Myerson [42], a canonical communication device for Γ consists of a system q of probability distributions q = (q(· | t))_{t∈T} over A. The interpretation is that a mediator invites every player i ∈ N to report his type t^i, then selects an N-tuple of actions a according to q(· | t) and privately recommends a^i to player i. The system q defines a communication equilibrium if none of the players can gain by unilaterally lying about his type or by deviating from the recommended action, namely if

   ∑_{t^{-i} ∈ T^{-i}} p(t^{-i} | t^i) ∑_{a ∈ A} q(a | t) v^i(t, a)
       ≥ ∑_{t^{-i} ∈ T^{-i}} p(t^{-i} | t^i) ∑_{a ∈ A} q(a | s^i, t^{-i}) v^i(t, α^i(a^i), a^{-i}),
       ∀ i ∈ N, ∀ t^i, s^i ∈ T^i, ∀ α^i : A^i → A^i.

Correlated Equilibrium, Communication Equilibrium and Cheap Talk

Every correlated equilibrium of the Bayesian game Γ induces a communication equilibrium of Γ, but the converse is not true, as the following example shows. Consider the two-person Bayesian game in which T^1 = {s1, t1}, T^2 = {t2}, A^1 = {a1, b1}, A^2 = {a2, b2}, p(s1) = p(t1) = 1/2 and payoffs are described by

   Type s1:          a2        b2
            a1    (1, 1)    (0, 0)
            b1    (0, 0)    (0, 0)

   Type t1:          a2        b2
            a1    (0, 0)    (0, 0)
            b1    (0, 0)    (1, 1)
In this game, the communication equilibrium q(a1, a2 | s1) = q(b1, b2 | t1) = 1 yields the expected payoff of 1 to both players. However, the maximal expected payoff of every player in a correlated equilibrium is 1/2. In order to see this, one can derive the strategic form of the game (in which player 1 has four strategies and player 2 has two strategies).

Let us turn to the game in which player 1 can engage in cheap talk with player 2 just after having learned his type. In this new game, the following strategies form a Nash equilibrium: player 1 truthfully reveals his type to player 2 and plays a1 if s1, b1 if t1; player 2 chooses a2 if s1, b2 if t1. These strategies achieve the same expected payoffs as the communication equilibrium.

As in Sect. "Correlated Equilibrium and Communication", one can define cheap talk extensions ext(Γ) of Γ. A wide definition of ext(Γ) involves an ex ante preplay phase, before the players learn their types, and an interim preplay phase, after the players learn their types but before they choose their actions. Every Nash equilibrium of ext(Γ) induces a communication equilibrium of Γ. In order to investigate the converse, namely whether cheap talk can simulate mediated communication in a Bayesian game, two approaches have been developed. The first one (Forges [17], Gerardi [21,22], Vida [58]) proceeds in two steps, by reducing communication equilibria to correlated equilibria before applying the results obtained for strategic form games (see Sect. "Correlated Equilibrium and Communication"). The second approach (Ben-Porath [6], Krishna [33]) directly addresses the question in a Bayesian game.

By developing a construction introduced for particular two-person games (Forges [14]), Forges [17] shows that every communication equilibrium outcome of a Bayesian game Γ with at least four players can be achieved as a correlated equilibrium outcome of a two-stage interim cheap talk extension ext_int(Γ) of Γ. No punishment is necessary in ext_int(Γ): at the second stage, every player gets a message from three senders and uses majority rule if the messages are not identical. Thanks to the underlying correlation device, each receiver is able to privately decode his message. Vida [58] extends Forges [17] to Bayesian games with three or even two players. In the proof, he constructs a correlated equilibrium of a long, but almost surely finite, interim cheap talk extension of Γ, whose length depends both on the signals selected by the correlation device and on the messages exchanged by the players. No recording of messages is necessary to detect and punish a cheating player.

If there are at least four players in Γ, once a communication equilibrium of Γ has been converted into a correlated equilibrium of ext_int(Γ), one can apply Bárány's [4] result to ext_int(Γ) in order to transform the correlated equilibrium into a Nash equilibrium of a further, ex ante, cheap talk preplay extension of Γ. Gerardi [21] modifies this ex ante preplay phase so as to postpone it to the interim stage. This result is especially useful if the initial move of nature in Γ is just a modelling convenience. Gerardi [22] also extends his result for games with at least five players and complete information (see Sect. "Correlated Equilibrium and Communication") to any Bayesian game with full support (i.e., in which all type profiles have positive probability: p(t) > 0 for every t ∈ T) by proving that every (rational) communication equilibrium of Γ can be achieved as a sequential equilibrium of a cheap talk extension of Γ. Ben-Porath [6] establishes that if Γ is a game with three or more players and full support, every (rational) communication equilibrium of Γ which strictly dominates a Nash equilibrium of Γ for every type t^i of every player i ∈ N can be implemented as a Nash equilibrium of an interim cheap talk extension of Γ in which public verification of the past record is possible (see also Ben-Porath [7]). Krishna [33] extends Ben-Porath's [5] result on two-person games (see Sect. "Correlated Equilibrium and Communication") to the incomplete information framework. The


other results mentioned at the end of Sect. "Correlated Equilibrium and Communication" have also been generalized to Bayesian games (see [26,37,56]).

Related Topics and Future Directions

In this brief article, we concentrated on two solution concepts: the strategic form correlated equilibrium, which is applicable to any game, and the communication equilibrium, which we defined for Bayesian games. Other extensions of Aumann's [1] solution concept have been proposed for Bayesian games, such as the agent normal form correlated equilibrium and the (possibly belief invariant) Bayesian solution (see Forges [18,19] for definitions and references). The Bayesian solution is intended to capture the players' rationality in games with incomplete information in the spirit of Aumann [2] (see Nau [46] and Forges [18]). Lehrer et al. [38] open a new perspective on the Bayesian solution and other equilibrium concepts for Bayesian games by characterizing the classes of equivalent information structures with respect to each of them. The comparison of information structures, which goes back to Blackwell [8,9] for individual decision problems, was introduced by Gossner [27] in the context of games, both with complete and incomplete information. In the latter model, information structures basically describe how extraneous signals are selected as a function of the players' types; two information structures are equivalent with respect to an equilibrium concept if, in every game, they generate the same equilibrium distributions over outcomes.

Correlated equilibria, communication equilibria and related solution concepts have been studied in many other classes of games, like multistage games (see, e.g., [15,42]), repeated games with incomplete information (see, e.g., [14,16]) and stochastic games (see, e.g., [53,54]).
The study of correlated equilibrium in repeated games with imperfect monitoring, initiated by Lehrer [34,35], proved to be particularly useful and is still undergoing. Lehrer [34] showed that if players are either fully informed of past actions or get no information (“ standard-trivial” information structure), correlated equilibria are equivalent to Nash equilibria. In other words, all correlations can be generated internally, namely by the past histories, on which players have differential information. The schemes of internal correlation introduced to establish this result are widely applicable and inspired those of Lehrer [36] (see Sect. “Correlated Equilibrium and Communication”). In general repeated games with imperfect monitoring, Renault and Tomala [52] characterize communication equilibria but the amount of correlation that the players can

achieve in a Nash equilibrium is still an open problem (see, e. g., [28,57] for recent advances). Throughout this article, we defined a correlated equilibrium as a Nash equilibrium of an extension of the game under consideration. The solution concept can be strengthened by imposing some refinement, i. e., further rationality conditions, to the Nash equilibrium in this definition (see, e. g., [11,43]). Refinements of communication equilibria have also been proposed (see, e. g., [22,23,42]). Some authors (see, e. g., [39,40,51]) have also developed notions of coalition proof correlated equilibria, which resist not only to unilateral deviations, as in this article, but even to multilateral ones. A recurrent difficulty is that, for many of these stronger solution concepts, a useful canonical representation (as derived in Sect. “Correlated Equilibrium: Definition and Basic Properties”) is not available. Except for two or three references, we deliberately concentrated on the results published in the game theory and mathematical economics literature, while substantial achievements in computer science would fit in this survey. Both streams of research pursue similar goals but rely on different formalisms and techniques. For instance, computer scientists often make use of cryptographic tools which are not familiar in game theory. Halpern [29] gives an idea of recent developments at the interface of computer science and game theory (see in particular the section “implementing mediators”) and contains a number of references. Finally, the assumption of full rationality of the players can also be relaxed. Evolutionary game theory has developed models of learning in order to study the long term behavior of players with bounded rationality. Many possible dynamics are conceivable to represent more or less myopic attitudes with respect to optimization. 
Under appropriate learning procedures, which express for instance that agents want to minimize the regret of their strategic choices, the empirical distribution of actions converges to the set of correlated equilibrium distributions (see, e.g., [20,31,32] for a survey). However, standard procedures, such as the replicator dynamics, may even eliminate all the strategies which have positive probability in a correlated equilibrium (see [61]).

Acknowledgment

The author wishes to thank Elchanan Ben-Porath, Frédéric Koessler, R. Vijay Krishna, Ehud Lehrer, Bob Nau, Indra Ray, Jérôme Renault, Eilon Solan, Sylvain Sorin, Bernhard von Stengel, Tristan Tomala, Amparo Urbano, Yannick Viossat and, especially, Olivier Gossner and Péter Vida, for useful comments and suggestions.

Correlated Equilibria and Communication in Games

Bibliography

Primary Literature
1. Aumann RJ (1974) Subjectivity and correlation in randomized strategies. J Math Econ 1:67–96
2. Aumann RJ (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55:1–18
3. Aumann RJ, Maschler M, Stearns R (1968) Repeated games with incomplete information: an approach to the nonzero sum case. Reports to the US Arms Control and Disarmament Agency, ST-143, Chapter IV, 117–216 (reprinted in: Aumann RJ, Maschler M (1995) Repeated Games of Incomplete Information. MIT Press, Cambridge)
4. Bárány I (1992) Fair distribution protocols or how players replace fortune. Math Oper Res 17:327–340
5. Ben-Porath E (1998) Correlation without mediation: expanding the set of equilibrium outcomes by cheap pre-play procedures. J Econ Theory 80:108–122
6. Ben-Porath E (2003) Cheap talk in games with incomplete information. J Econ Theory 108:45–71
7. Ben-Porath E (2006) A correction to "Cheap talk in games with incomplete information". Mimeo, Hebrew University of Jerusalem, Jerusalem
8. Blackwell D (1951) Comparison of experiments. In: Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley and Los Angeles, pp 93–102
9. Blackwell D (1953) Equivalent comparison of experiments. Ann Math Stat 24:265–272
10. Brandenburger A, Dekel E (1987) Rationalizability and correlated equilibria. Econometrica 55:1391–1402
11. Dhillon A, Mertens JF (1996) Perfect correlated equilibria. J Econ Theory 68:279–302
12. Dodis Y, Halevi S, Rabin T (2000) A cryptographic solution to a game theoretic problem. In: CRYPTO 2000: 20th International Cryptology Conference. Springer, Berlin, pp 112–130
13. Evangelista F, Raghavan TES (1996) A note on correlated equilibrium. Int J Game Theory 25:35–41
14. Forges F (1985) Correlated equilibria in a class of repeated games with incomplete information. Int J Game Theory 14:129–150
15. Forges F (1986) An approach to communication equilibrium. Econometrica 54:1375–1385
16. Forges F (1988) Communication equilibria in repeated games with incomplete information. Math Oper Res 13:191–231
17. Forges F (1990) Universal mechanisms. Econometrica 58:1341–1364
18. Forges F (1993) Five legitimate definitions of correlated equilibrium in games with incomplete information. Theor Decis 35:277–310
19. Forges F (2006) Correlated equilibrium in games with incomplete information revisited. Theor Decis 61:329–344
20. Foster D, Vohra R (1997) Calibrated learning and correlated equilibrium. Games Econ Behav 21:40–55
21. Gerardi D (2000) Interim pre-play communication. Mimeo, Yale University, New Haven
22. Gerardi D (2004) Unmediated communication in games with complete and incomplete information. J Econ Theory 114:104–131
23. Gerardi D, Myerson R (2007) Sequential equilibria in Bayesian games with communication. Games Econ Behav 60:104–134
24. Gilboa I, Zemel E (1989) Nash and correlated equilibria: some complexity considerations. Games Econ Behav 1:80–93
25. Gomez-Canovas S, Hansen P, Jaumard B (1999) Nash equilibria from the correlated equilibria viewpoint. Int Game Theor Rev 1:33–44
26. Gossner O (1998) Secure protocols or how communication generates correlation. J Econ Theory 83:69–89
27. Gossner O (2000) Comparison of information structures. Games Econ Behav 30:44–63
28. Gossner O, Tomala T (2007) Secret correlation in repeated games with signals. Math Oper Res 32:413–424
29. Halpern JY (2007) Computer science and game theory. In: Durlauf SN, Blume LE (eds) The New Palgrave Dictionary of Economics, 2nd edn. Palgrave Macmillan. The New Palgrave Dictionary of Economics Online. http://www.dictionaryofeconomics.com/article?id=pde2008_C000566. Accessed 24 May 2008
30. Hart S, Schmeidler D (1989) Existence of correlated equilibria. Math Oper Res 14:18–25
31. Hart S, Mas-Colell A (2000) A simple adaptive procedure leading to correlated equilibrium. Econometrica 68:1127–1150
32. Hart S (2005) Adaptive heuristics. Econometrica 73:1401–1430
33. Krishna RV (2007) Communication in games of incomplete information: two players. J Econ Theory 132:584–592
34. Lehrer E (1991) Internal correlation in repeated games. Int J Game Theory 19:431–456
35. Lehrer E (1992) Correlated equilibria in two-player repeated games with non-observable actions. Math Oper Res 17:175–199
36. Lehrer E (1996) Mediated talk. Int J Game Theory 25:177–188
37. Lehrer E, Sorin S (1997) One-shot public mediated talk. Games Econ Behav 20:131–148
38. Lehrer E, Rosenberg D, Shmaya E (2006) Signaling and mediation in Bayesian games. Mimeo, Tel Aviv University, Tel Aviv
39. Milgrom P, Roberts J (1996) Coalition-proofness and correlation with arbitrary communication possibilities. Games Econ Behav 17:113–128
40. Moreno D, Wooders J (1996) Coalition-proof equilibrium. Games Econ Behav 17:80–112
41. Myerson R (1982) Optimal coordination mechanisms in generalized principal-agent problems. J Math Econ 10:67–81
42. Myerson R (1986a) Multistage games with communication. Econometrica 54:323–358
43. Myerson R (1986b) Acceptable and predominant correlated equilibria. Int J Game Theory 15:133–154
44. Myerson R (1997) Dual reduction and elementary games. Games Econ Behav 21:183–202
45. Nash J (1951) Non-cooperative games. Ann Math 54:286–295
46. Nau RF (1992) Joint coherence in games with incomplete information. Manage Sci 38:374–387
47. Nau RF, McCardle KF (1990) Coherent behavior in noncooperative games. J Econ Theory 50(2):424–444
48. Nau RF, McCardle KF (1991) Arbitrage, rationality and equilibrium. Theor Decis 31:199–240
49. Nau RF, Gomez-Canovas S, Hansen P (2004) On the geometry of Nash equilibria and correlated equilibria. Int J Game Theory 32:443–453
50. Papadimitriou CH (2005) Computing correlated equilibria in multiplayer games. In: Proceedings of the 37th ACM Symposium on Theory of Computing (STOC), Baltimore, pp 49–56
51. Ray I (1996) Coalition-proof correlated equilibrium: a definition. Games Econ Behav 17:56–79
52. Renault J, Tomala T (2004) Communication equilibrium payoffs in repeated games with imperfect monitoring. Games Econ Behav 49:313–344
53. Solan E (2001) Characterization of correlated equilibrium in stochastic games. Int J Game Theory 30:259–277
54. Solan E, Vieille N (2002) Correlated equilibrium in stochastic games. Games Econ Behav 38:362–399
55. Urbano A, Vila J (2002) Computational complexity and communication: coordination in two-player games. Econometrica 70:1893–1927
56. Urbano A, Vila J (2004a) Computationally restricted unmediated talk under incomplete information. J Econ Theory 23:283–320
57. Urbano A, Vila J (2004b) Unmediated communication in repeated games with imperfect monitoring. Games Econ Behav 46:143–173
58. Vida P (2007) From communication equilibria to correlated equilibria. Mimeo, University of Vienna, Vienna
59. Viossat Y (2005) Is having a unique equilibrium robust? To appear in J Math Econ
60. Viossat Y (2006) The geometry of Nash equilibria and correlated equilibria and a generalization of zero-sum games. Mimeo, S-WoPEc working paper 641, Stockholm School of Economics, Stockholm
61. Viossat Y (2007) The replicator dynamics does not lead to correlated equilibria. Games Econ Behav 59:397–407

Books and Reviews
Forges F (1994) Non-zero sum repeated games and information transmission. In: Megiddo N (ed) Essays in Game Theory in Honor of Michael Maschler. Springer, Berlin, pp 65–95
Mertens JF (1994) Correlated- and communication equilibria. In: Mertens JF, Sorin S (eds) Game Theoretic Methods in General Equilibrium Analysis. Kluwer, Dordrecht, pp 243–248
Myerson R (1985) Bayesian equilibrium and incentive compatibility. In: Hurwicz L, Schmeidler D, Sonnenschein H (eds) Social Goals and Social Organization. Cambridge University Press, Cambridge, pp 229–259
Myerson R (1994) Communication, correlated equilibria and incentive compatibility. In: Aumann R, Hart S (eds) Handbook of Game Theory, vol 2. Elsevier, Amsterdam, pp 827–847
Sorin S (1997) Communication, correlation and cooperation. In: Mas-Colell A, Hart S (eds) Cooperation: Game Theoretic Approaches. Springer, Berlin, pp 198–218


Correlations in Complex Systems

RENAT M. YULMETYEV1,2, PETER HÄNGGI3
1 Department of Physics, Kazan State University, Kazan, Russia
2 Tatar State University of Pedagogical and Humanities Sciences, Kazan, Russia
3 University of Augsburg, Augsburg, Germany

Article Outline

Glossary
Definition of the Subject
Introduction
Correlation and Memory in Discrete Non-Markov Stochastic Processes
Correlation and Memory in Discrete Non-Markov Stochastic Processes Generated by Random Events
Information Measures of Memory in Complex Systems
Manifestation of Strong Memory in Complex Systems
Some Perspectives on the Studies of Memory in Complex Systems
Bibliography

Glossary

Correlation  A correlation describes the degree of relationship between two or more variables. Correlations arise from the impact of random factors and can be characterized by the methods of probability theory.

Correlation function  The correlation function (abbreviated CF) represents a quantitative measure for the compact description of wide classes of correlation in complex systems (CS). The correlation function of two variables in statistical mechanics provides a measure of the mutual order existing between them. It quantifies the way random variables at different positions are correlated. For example, in a spin system it is the thermal average of the scalar product of the spins at two lattice points over all possible orderings.

Memory effects in stochastic processes through correlations  Memory effects (abbreviated ME) appear at a more detailed level of statistical description of correlation, in a hierarchical manner. ME reflect the complicated or hidden character of the creation, propagation and decay of correlation. ME are produced by inherent interactions and statistical after-effects in CS. For statistical systems, ME are induced by the contracted description of the evolution of the dynamic variables of a CS.

Memory functions  Memory functions describe mutual interrelations between the rates of change of random variables on different levels of the statistical description. The role of memory has its roots in the natural sciences since 1906, when the famous Russian mathematician Markov wrote his first paper on the theory of Markov random processes. The theory is based on the notion of an instant loss of memory from the prehistory (memoryless property) of random processes.

Information measures of statistical memory in complex systems  From the physical point of view, the time scales of correlation and memory cannot be treated as arbitrary. Therefore, one can introduce statistical quantifiers for the quantitative comparison of these time scales. They are dimensionless and possess statistical spectra on the different levels of the statistical description.

Definition of the Subject

As commonly used in probability theory and statistics, a correlation (also called the correlation coefficient) measures the strength and the direction of a linear relationship between two random variables. In a more general sense, a correlation or co-relation reflects the deviation of two (or more) variables from mutual independence, although correlation does not imply causation. In this broad sense there are various quantifiers which measure the degree of correlation, suited to the nature of the data. Increasing attention has been paid recently to the study of statistical memory effects in random processes that originate from nature, by means of non-equilibrium statistical physics. The role of memory has its roots in the natural sciences since 1906, when the famous Russian mathematician Markov wrote his first paper on the theory of Markov Random Processes (MRP) [1]. His theory is based on the notion of an instant loss of memory from the prehistory (memoryless property) of random processes.
In contrast, there is an abundance of physical phenomena and processes which can be characterized by statistical memory effects: kinetic and relaxation processes in gases [2] and plasma [3], condensed matter physics (liquids [4], solids [5], and superconductivity [6]), astrophysics [7], nuclear physics [8], quantum [9] and classical [9] physics, to name only a few. At present, we have a whole toolbox of statistical methods available which can be efficiently used for the analysis of memory effects occurring in diverse physical systems. Typical such schemes are the Zwanzig–Mori kinetic equations [10,11], generalized master equations and corresponding statistical quantifiers [12,13,14,15,16,17,18], Lee's recurrence relation method [19,20,21,22,23], the generalized Langevin equation (GLE) [24,25,26,27,28,29], etc.
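Although the toolbox above is analytical, the simplest of these schemes can be illustrated numerically. The sketch below integrates a noise-free GLE with an exponential memory kernel and compares it with the memoryless (Markovian) limit; all parameter values and names are our own illustrative choices, not taken from the text:

```python
import numpy as np

# Noise-free GLE sketch:  dv/dt = -∫_0^t K(t - s) v(s) ds,
# with exponential memory kernel K(t) = (gamma / tau) * exp(-t / tau).
# In the limit tau -> 0 this reduces to the Markovian law dv/dt = -gamma * v.
dt, n = 0.01, 2000
t = np.arange(n) * dt
gamma, tau = 1.0, 0.5
K = (gamma / tau) * np.exp(-t / tau)

v = np.empty(n)
v[0] = 1.0
for i in range(1, n):
    # memory integral by the rectangle rule: dt * sum_j K(t_i - t_j) v(t_j)
    mem = dt * np.dot(K[i:0:-1], v[:i])
    v[i] = v[i - 1] - dt * mem

v_markov = np.exp(-gamma * t)
# Because friction builds up only after the kernel has acted (dv/dt = 0 at t = 0),
# the GLE velocity initially stays above the Markovian exponential.
```

For this particular kernel the GLE is equivalent to a damped-oscillator equation, so the decay can even become oscillatory, a qualitative signature of memory that the Markovian limit cannot produce.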


Here we shall demonstrate that the presence of statistical memory effects is of salient importance for the functioning of diverse natural complex systems. In particular, the presence of large memory time scales in the stochastic dynamics of discrete time series can characterize catastrophic (or, for living systems, pathological) violations of salutary dynamic states of a CS. As an example, we will demonstrate that the emergence of strong memory time scales in the chaotic behavior of complex systems (CS) is accompanied by the likely initiation and existence of catastrophes and crises (earthquakes, financial crises, cardiac and brain attacks, etc.) in many CS, and especially by the existence of pathological states (diseases and illness) in living systems.

Introduction

A common definition [30] of a correlation measure $\rho(X, Y)$ between two random variables $X$ and $Y$ with mean values $E(X)$ and $E(Y)$, fluctuations $\delta X = X - E(X)$ and $\delta Y = Y - E(Y)$, and dispersions $\sigma_X^2 = E(\delta X^2) = E(X^2) - E(X)^2$ and $\sigma_Y^2 = E(\delta Y^2) = E(Y^2) - E(Y)^2$ is given by

$$\rho(X, Y) = \frac{E(\delta X \, \delta Y)}{\sigma_X \sigma_Y} \,,$$

where $E$ is the expected value of the variable. Therefore we can write

$$\rho(X, Y) = \frac{E(XY) - E(X)\,E(Y)}{\left(E(X^2) - E(X)^2\right)^{1/2} \left(E(Y^2) - E(Y)^2\right)^{1/2}} \,.$$

Here, a correlation can be defined only if both of the dispersions are finite and both of them are nonzero. Due to the Cauchy–Schwarz inequality, a correlation cannot exceed 1 in absolute value. Consequently, the correlation attains its maximum of $1$ in the case of an increasing linear relationship, $-1$ in the case of a decreasing linear relationship, and some value in between in all other cases, indicating the degree of linear dependence between the variables. The closer the coefficient is to either $1$ or $-1$, the stronger the correlation between the variables. If the variables are independent, then the correlation equals 0; the converse is not true, however, because the correlation coefficient detects only linear dependencies between two variables. Since the absolute value of the sample correlation must be less than or equal to 1, the simple formula conveniently suggests a single-pass algorithm for calculating sample correlations.

The square of the sample correlation coefficient, which is also known as the coefficient of determination, is the fraction of the variance in $x_i$ that is accounted for by a linear fit of $x_i$ to $y_i$. This is written

$$R_{xy}^2 = 1 - \frac{\sigma_{y|x}^2}{\sigma_y^2} \,,$$

where $\sigma_{y|x}^2$ denotes the square of the error of a linear regression of $x_i$ on $y_i$ in the equation $y = a + bx$,

$$\sigma_{y|x}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - a - b x_i)^2 \,,$$

and $\sigma_y^2$ denotes just the dispersion of $y$. Note that since the sample correlation coefficient is symmetric in $x_i$ and $y_i$, we will obtain the same value for a fit to $y_i$:

$$R_{xy}^2 = 1 - \frac{\sigma_{x|y}^2}{\sigma_x^2} \,.$$

This equation also gives an intuitive idea of the correlation coefficient for random (vector) variables of higher dimension. Just as the above described sample correlation coefficient is the fraction of variance accounted for by the fit of a one-dimensional linear submanifold to a set of two-dimensional vectors $(x_i, y_i)$, so we can define a correlation coefficient for a fit of an $m$-dimensional linear submanifold to a set of $n$-dimensional vectors. For example, if we fit a plane $z = a + bx + cy$ to a set of data $(x_i, y_i, z_i)$, then the correlation coefficient of $z$ to $x$ and $y$ is

$$R^2 = 1 - \frac{\sigma_{z|xy}^2}{\sigma_z^2} \,.$$
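Both remarks above — the single-pass computation of the sample correlation and the identity $R_{xy}^2 = \rho^2$ for a linear fit — can be checked directly. The sketch below uses illustrative data and helper names of our own:

```python
import math

def sample_correlation(pairs):
    """Single pass over (x, y) pairs, accumulating the raw sums that enter
    the expectation-based formula rho = cov(X, Y) / (sigma_X * sigma_Y)."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
    cov = sxy / n - (sx / n) * (sy / n)
    vx = sxx / n - (sx / n) ** 2
    vy = syy / n - (sy / n) ** 2
    return cov / math.sqrt(vx * vy)

def r_squared(pairs):
    """Coefficient of determination 1 - sigma^2_{y|x} / sigma^2_y
    for the least-squares line y = a + b*x."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / \
        sum((x - mx) ** 2 for x, _ in pairs)
    a = my - b * mx
    err = sum((y - a - b * x) ** 2 for x, y in pairs) / n   # sigma^2_{y|x}
    vy = sum((y - my) ** 2 for _, y in pairs) / n           # sigma^2_y
    return 1.0 - err / vy

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]
rho = sample_correlation(data)
# r_squared(data) and rho ** 2 agree, as the identity above requires.
```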
Correlation and Memory in Discrete Non-Markov Stochastic Processes

Here we present a non-Markov approach [31,32] for the study of long-time correlations in the chaotic dynamics of CS. For example, let the variable $x_i$ be defined as the R-R interval, i.e., the time distance between nearest so-called R peaks occurring in a human electrocardiogram (ECG). The generalization consists in taking the non-stationarity of stochastic processes into account, with further application to the analysis of heart-rate variability. We should bear in mind that one of the key points of the spectral approach in the analysis of stochastic processes consists in the use of the normalized time correlation function (TCF)

$$a_0(t) = \frac{\langle\langle A(T) \, A(T+t) \rangle\rangle}{\langle A(T)^2 \rangle} \,. \qquad (1)$$


Here the time $T$ indicates the beginning of a time series, $A(t)$ is the state vector of a complex system as defined below in Eq. (5) at time $t$, $|A(t)|$ is the length of the vector $A(t)$, and the double angular brackets indicate a scalar product of vectors together with an ensemble average. The ensemble averaging is, of course, needed in Eq. (1) when correlation and other characteristic functions are constructed. The average and the scalar product become equivalent when a vector is composed of elements from a discrete-time sampling, as done later. Here a continuous formalism is discussed for convenience; from Sect. "Correlation and Memory in Discrete Non-Markov Stochastic Processes" onward, however, we shall consider only the case of discrete processes. The above definition is valid only for stationary systems. In the non-stationary case Eq. (1) does not hold and must be modified. The concept of the TCF can be generalized to the case of a discrete non-stationary sequence of signals. For this purpose the standard definition of the correlation coefficient in probability theory for two random signals $X$ and $Y$ must be taken into account:

$$\rho = \frac{\langle\langle XY \rangle\rangle}{\sigma_X \sigma_Y} \,, \qquad \sigma_X = \langle |X| \rangle \,, \quad \sigma_Y = \langle |Y| \rangle \,. \qquad (2)$$

In Eq. (2) the multi-component vectors $X$, $Y$ are determined by the fluctuations of the signals $x$ and $y$ respectively, $\sigma_X^2$, $\sigma_Y^2$ represent the dispersions of the signals $x$ and $y$, and the values $|X|$, $|Y|$ represent the lengths of the vectors $X$, $Y$, correspondingly. Therefore, the function

$$a(T, t) = \frac{\langle\langle A(T) \, A(T+t) \rangle\rangle}{\langle |A(T)| \rangle \, \langle |A(T+t)| \rangle} \qquad (3)$$

can serve as the generalization of the concept of the TCF (1) for non-stationary processes $A(T+t)$. The non-stationary TCF (3) obeys the conditions of normalization and attenuation of correlation:

$$a(T, 0) = 1 \,, \qquad \lim_{t \to \infty} a(T, t) = 0 \,.$$

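For a discrete record that is at least approximately stationary, the TCF (1) can be estimated by replacing the ensemble average with a time average over the series. The following sketch (the AR(1) test signal and all names are our own) illustrates the slow decay of correlations in a strongly correlated signal:

```python
import numpy as np

def normalized_tcf(x, max_lag):
    """Estimate a_0(t) for a discrete, approximately stationary series,
    using fluctuations about the mean and a time average in place of the
    ensemble average; a_0(0) = 1 by construction."""
    a = np.asarray(x, dtype=float)
    a = a - a.mean()
    denom = np.dot(a, a) / len(a)
    return np.array([np.dot(a[:len(a) - t], a[t:]) / (len(a) - t) / denom
                     for t in range(max_lag + 1)])

# A strongly correlated AR(1) signal x_i = 0.9 x_{i-1} + noise decays slowly:
# its TCF is close to 0.9**t, whereas white noise decorrelates immediately.
rng = np.random.default_rng(0)
noise = rng.standard_normal(5000)
x = np.empty(5000)
x[0] = 0.0
for i in range(1, 5000):
    x[i] = 0.9 * x[i - 1] + noise[i]

c = normalized_tcf(x, 10)
```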
Let us note that in a real CS the second limit is typically not attained, due to the possible occurrence of nonergodicity (meaning that a time average does not equal its ensemble average). According to Eqs. (1) and (3), for the quantitative description of non-stationarity it is convenient to introduce a function of non-stationarity

$$\gamma(T, t) = \frac{\langle |A(T+t)| \rangle}{\langle |A(T)| \rangle} = \left[ \frac{\sigma^2(T+t)}{\sigma^2(T)} \right]^{1/2} \,. \qquad (4)$$

One can see that this function equals the ratio of the lengths of the vectors of the final and initial states. In the case of a stationary process the dispersion does not vary with time (or its variation is very weak). Therefore the relations

$$\sigma(T+t) = \sigma(T) \,, \qquad \gamma(T, t) = 1 \qquad (5)$$

hold true for a stationary process. Due to condition (5), the function

$$\delta(T, t) = 1 - \gamma(T, t) \qquad (6)$$

is suitable in providing a dynamic parameter of non-stationarity. This dynamic parameter can serve as a quantitative measure of the non-stationarity of the process under investigation. According to Eqs. (4)–(6) it is reasonable to suggest the existence of three different elementary classes of non-stationarity, classified by the magnitude of

$$|\delta(T, t)| = |1 - \gamma(T, t)| \,.$$

With $\varepsilon, \delta > 1$ one deals with a situation of moderate memory strength, and the case where both $\varepsilon, \delta \sim 1$ typically constitutes a more regular and robust random process exhibiting strong memory features.

Manifestation of Strong Memory in Complex Systems

A fundamental role of strong and weak memory in the functioning of the human organism and in seismic phenomena can be illustrated by the situations examined next. We will consider examples of time series for both living and seismic systems. It is necessary to note that a comprehensive analysis of the experimental data includes the calculation and presentation of the corresponding phase portraits in planes of the dynamic orthogonal variables, the autocorrelation time functions, the memory time functions and their frequency power spectra, etc. However, we start out by calculating two statistical quantifiers characterizing two informational measures of memory: the parameters $\varepsilon_1(\omega)$ and $\delta_1(\omega)$.

Figures 1 and 3 present the results of experimental data on pathological states of human cardiovascular systems (CVS). Figure 2 depicts the analysis of seismic observations. Figures 4 and 5 indicate the memory effects for patients with Parkinson's disease (PD), and the last two figures, Figs. 6 and 7, demonstrate the key role of the strength of memory for time series of patients suffering from photosensitive epilepsy, contrasted with signals taken from healthy subjects. All these cases convincingly display the crucial role of statistical memory in the functioning of complex (living and seismic) systems.

A characteristic role of statistical memory can be detected from Fig. 1 for typical representatives of four different CVS groups: (a) a healthy subject, (b) a patient with rhythm driver migration, (c) a patient after myocardial infarction (MI), (d) a patient after MI with subsequent sudden cardiac death (SSCD). All these data were obtained from short time series of the dynamics of RR-intervals from the electric signals of human ECGs. It can be seen here that significant memory effects typically lead to long-time correlations in complex systems. For healthy subjects we observe weak memory effects and large values of the memory measure $\varepsilon_1(\omega = 0) \approx 25$. Strong memory and long memory times (approximately 10 times longer) are observed for the three patient groups: with rhythm driver migration (RDM) (b), after MI (c), and after MI with SSCD (d).

Correlations in Complex Systems, Figure 2  Frequency spectra of the first three points of the first measure of memory (non-Markovity parameters) $\varepsilon_1(\omega)$, $\varepsilon_2(\omega)$, and $\varepsilon_3(\omega)$ for the seismic phenomena: a, b, c long before the strong earthquake (EQ), for the steady state of the Earth, and d, e, f during the strong EQ. Markov and quasi-Markov behavior of seismic signals with manifestation of weak memory is observed only for $\varepsilon_1$ in the state before the strong EQ. All remaining cases b, c, d and e relate to non-Markov processes. Strong non-Markovity and strong memory is typical for case d (state during the strong EQ). In the behavior of $\varepsilon_2(\omega)$ and $\varepsilon_3(\omega)$ one can see a transition from quasi-Markovity (at low frequencies) to strong non-Markovity (at high frequencies). From Fig. 6 in [105]

Figure 2 depicts the strong memory effects present in seismic phenomena. At the transition from the steady state of the Earth ((a), (b) and (c)) to the state of a strong earthquake (EQ) ((d), (e), and (f)) a remarkable amplification of memory effects is highly visible. The term amplification refers to the appearance of strong memory and the prolongation of the memory correlation time in the seismic system. Recent studies show that discrete non-Markov stochastic processes and long-range memory effects play a crucial role in the behavior of seismic systems. An approach permitting us to obtain an algorithm for forecasting strong EQs and to differentiate technogenic explosions from weak EQs can be developed thereupon.

Figure 3 demonstrates an intensification of memory effects by one order of magnitude at the transition from healthy people ((a), (b) and (c)) to patients suffering from myocardial infarction. The figures were calculated from long time series of the RR-interval dynamics of human ECGs. The zero-frequency values $\varepsilon_1(\omega = 0)$ are sharply reduced, by approximately one order of magnitude, for patients as compared to healthy subjects. Figures 4 and 5 illustrate the behavior for patients with Parkinson's disease. Figure 4 shows a time recording of the


Correlations in Complex Systems, Figure 3  The frequency dependence of the first three points of the non-Markovity parameter (NMP) for a healthy person (a), (b), (c) and a patient after myocardial infarction (MI) (d), (e), (f), from the time dynamics of RR-intervals of human ECGs for the case of long time series. In the spectrum of the first point of the NMP, $\varepsilon_1(\omega)$, there is an appreciable low-frequency (long-time) component, which corresponds to quasi-Markov processes. The spectra of $\varepsilon_2(\omega)$ and $\varepsilon_3(\omega)$ fully comply with non-Markov processes within the whole range of frequencies. From Fig. 6 in [106]

pathological tremor velocity in the left index finger of a patient with Parkinson's disease (PD) for eight diverse pathological cases (with or without medication, with or without deep brain stimulation (DBS), for various DBS, medication and time conditions). Figure 5, arranged in accordance with these conditions, displays a wide variety of memory effects in the treatment of PD patients. Due to the large impact of memory effects, this observation permits us to develop an algorithm for the exact diagnosis of Parkinson's disease and the calculation of a quantitative parameter for the quality of treatment. The physical role of strong memory and of long memory correlation times enables us to extract vital information about the states of various patients on the basis of the notions of correlation and memory times.

According to Figs. 6 and 7, specific information about the physiological mechanism of photosensitive epilepsy (PSE) was obtained from the analysis of the strong memory effects via the registration of neuromagnetic responses in recordings of the magnetoencephalogram (MEG) of the human brain. Figure 6 presents the topographic dependence of the first point of the second memory measure $\delta_1(\omega = 0, n)$ for the healthy subjects in the whole group (upper line) vs. patients (lower line) for the red/blue combination of the light stimulus. This topographic dependence depicted in Fig. 6 clearly demonstrates the existence of long-range time correlations. It is accompanied by a sharp increase of the role of statistical memory effects in all MEG sensors, with sensor numbers $n = 1, 2, \ldots, 61$, of the patient with PSE in comparison with healthy people. A sizable difference between the healthy subjects and a subject with PSE occurs. To emphasize the role of strong memory one can continue studying the topographic dependence in terms of a novel informational measure, the index of memory, defined as

$$\mu(n) = \frac{\delta_1^{\text{healthy}}(0, n)}{\delta_1^{\text{patient}}(0, n)} \,; \qquad (62)$$

Correlations in Complex Systems, Figure 4  Pathological tremor velocity in the left index finger of the sixth patient with Parkinson's disease (PD). The registration of Parkinsonian tremor velocity is carried out for the following conditions: a "OFF-OFF" condition (no treatment), b "ON-ON" condition (using deep brain stimulation (DBS) by electromagnetic stimulator and medicaments), c "ON-OFF" condition (DBS only), d "OFF-ON" condition (medicaments (L-Dopa) only), e–h the "15 OFF", "30 OFF", "45 OFF", "60 OFF" conditions – the patient's states 15 (30, 45, 60) minutes after the DBS is switched off, no treatment. Note the scale of the pathological tremor amplitude (see the vertical scale). Such a representation of the time series allows us to note the increase or decrease of pathological tremor. From Fig. 1 in [107]

see Fig. 7. This measure quantifies the detailed memory effects in the individual MEG sensors of the patient with PSE versus the healthy group. A sharp increase of the role of the memory effects in the stochastic behavior of the magnetic signals is clearly detected in sensor numbers $n = 10, 46, 51, 53$ and $59$. The observed MEG sensor locations single out the regions of a protective mechanism against PSE in the human organism: the frontal (sensor 10), occipital (sensors 46, 51 and 53) and right parietal (sensor 59) regions. The early activity in these sensors may reflect a protective mechanism suppressing the cortical hyperactivity due to the chromatic flickering.

We remark that some early steps towards understanding the normal and the various catastrophic states of complex systems have already been taken in many fields of science, such as cardiology, physiology, medicine, neurology, clinical neurophysiology, neuroscience, seismology and so forth. With the underlying systems showing fractal and complicated spatial structures, numerous studies applying linear and nonlinear time series analysis to various complex systems have been discussed by many authors. Specifically, the results obtained show evidence of significant nonlinear structure in the signals registered in control subjects, whereas nonlinearity was not detected for patients and catastrophic states. Moreover, the couplings between distant parts and regions were found to be stronger for the control subjects. These prior findings lead to the hypothesis that real normal complex systems are mostly equipped with significantly nonlinear subsystems, reflecting an inherent mechanism which guards against synchronous excitation by outside impacts or inside disturbances. Such nonlinear mechanisms are likely absent in the occurrence of catastrophic or pathological states of complex systems.
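The index of memory (62) is just a per-sensor ratio of the zero-frequency values of the second memory measure. A minimal sketch of the sensor-by-sensor comparison follows; all numerical values below are hypothetical, purely for illustration:

```python
# Hypothetical delta_1(0, n) values per MEG sensor n for a healthy group
# and for a patient (made-up numbers, for illustration only).
delta_healthy = {5: 0.40, 10: 0.80, 46: 0.90, 51: 0.70, 53: 0.85, 59: 0.75}
delta_patient = {5: 0.38, 10: 0.10, 46: 0.15, 51: 0.10, 53: 0.12, 59: 0.11}

# Index of memory of Eq. (62): per-sensor ratio healthy / patient.
index = {n: delta_healthy[n] / delta_patient[n] for n in delta_healthy}

# Sensors whose index deviates most from 1 show the strongest contrast
# in memory effects between the patient and the healthy group.
ranked = sorted(index, key=lambda n: abs(index[n] - 1.0), reverse=True)
```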


Correlations in Complex Systems, Figure 5  The frequency dependence of the first point of the non-Markovity parameter $\varepsilon_1(\nu)$ for the pathological tremor velocity in the patient. As an example, the sixth patient with Parkinson's disease is chosen. The panels are arranged according to the arrangement of the initial time series. Characteristic low-frequency oscillations are observed in the frequency dependence (a, e–h), which get suppressed under medical influence (b–d). The non-Markovity parameter reflects the Markov and non-Markov components of the initial time signal. The value of the parameter at zero frequency, $\varepsilon_1(0)$, reflects the total dynamics of the initial time signal. The maximal values of the parameter $\varepsilon_1(0)$ correspond to small amplitudes of the pathological tremor velocity. The minimal values of this parameter are characteristic of significant pathological tremor velocities. The comparative analysis of the frequency dependence $\varepsilon_1(\nu)$ allows us to estimate the efficiency of each method of treatment. From Fig. 5 in [107]

Correlations in Complex Systems, Figure 6  The topographic dependence of the first point of the second measure of memory, $\delta_1(\omega = 0, n)$, for the healthy subjects on average in the whole group (upper line) vs. a patient (lower line) for the R/B combination of the light stimulus. One can note the singularly weak memory effects for the healthy subjects on average in sensors No. 5, 23, 14, 11 and 9


Correlations in Complex Systems, Figure 7 The topographic dependence of the memory index ε(n) = ε₁(n; 0) for the whole group of healthy on average vs. the patient for an R/B combination of the light stimulus. Strong memory in the patient vs. the healthy appears clearly in sensors No. 10, 5, 23, 40 and 53

From the physical point of view our results can be used as a toolbox for testing and identifying the presence or absence of various memory effects as they occur in complex systems. The set of our memory quantifiers is uniquely associated with the appearance of memory features in the chaotic behavior of the observed signals. The registration of the behavior of these indicators, as elucidated here, is of beneficial use for detecting catastrophic or pathological states in complex systems. Alternative quantifiers of a different nature exist as well, such as the Lyapunov exponent, the Kolmogorov–Sinai entropy, the correlation dimension, etc., which are widely used in nonlinear dynamics and relevant applications. In the present context, we have found that the employed memory measures are not only convenient for the analysis but are also ideally suited for the identification of anomalous behavior occurring in complex systems. The search for other quantifiers and, foremost, for ways of optimizing such measures when applied to complex discrete-time dynamics presents a real challenge, especially when attempts are made towards the identification and quantification of functioning in complex systems. This work presents initial steps towards an understanding of the basic foundation of anomalous processes in complex systems on the basis of a study of the underlying memory effects and, connected with this, the occurrence of long lasting correlations.

Some Perspectives on the Studies of Memory in Complex Systems

Here we present a few outlooks on the fundamental role of statistical memory in complex systems. This involves the issue of studying cross-correlations. The statistical theory of the stochastic dynamics of cross-correlations can be created on the basis of the mentioned formalism of the projection operator technique in the linear space of random variables. As a result we obtain cross-correlation memory functions (MF’s) revealing the statistical memory effects in complex systems. Some memory quantifiers appear simultaneously which reflect cross-correlations between different parts of a CS. Cross-correlation MF’s can be very useful for the analysis of weak and strong interactions, signifying interrelations between different groups of random variables in a CS. Besides that, cross-correlations can be important for the problem of phase synchronization, offering a unique way of studying synchronization phenomena in CS that is of special importance for aspects of brain and living systems dynamics. Some additional information about strong and weak memory effects can be extracted from the observation of correlations in CS on the scale of random events. Similar effects play a crucial role in the differentiation between stochastic phenomena within astrophysical systems, for example, in galaxies, pulsars, quasars, microquasars, lacertides, black holes, etc. One of the most important areas of application of the developed approach is bispectral and polyspectral analysis for diverse CS. From the mathematical point of view a correct definition of the spectral properties in the functional space of random functions is quite important. A variety of MF’s arises in the quantitative analysis of the fine details of memory effects in a nonlinear manner. The quantitative control of treatment quality in diverse areas of medicine and physiology may be one of the important biomedical applications of the manifestation of strong memory effects. These and other features of memory effects in CS call for an advanced development of brain studies on the basis of EEG and MEG data, of the cardiovascular, locomotor
and respiratory human systems, and in the development of control systems for information flows in living systems. An example is the prediction of strong EQ’s and the clear differentiation between the occurrence of weak EQ’s and technogenic explosions, etc. In conclusion, we hope that the interested reader becomes invigorated by this presentation of correlation and memory analysis of the inherent nonlinear system dynamics of varying complexity. The reader can find further details on how significant memory effects typically cause long time correlations in complex systems by inspecting more closely some of the published items in [42–103]. The relationships between standard fractional and polyfractal processes and long-time correlations in complex systems are explained in detail in [39,40,44,45,46,49,53,54,60,62,64,76,79,83,84,94]. An example of using the Hurst exponent over time for testing the assertion that emerging markets are becoming more efficient can be found in [51]. While over 30 measures of complexity have been proposed in the research literature, one can distinguish [42,55,66,81,89,99] by their specific focus on long-time correlation and memory effects. The papers [48,57] focus on long range correlation processes that are nonlocal in time and hence show memory effects. The statistical characterization of nonstationarities in real-world time series is an important topic in many fields of research, and numerous methods of characterizing nonstationary time series were offered in [59,65,84]. Long-range correlated time series have been widely used in [52,61,63,68,74] for the theoretical description of diverse phenomena. An example of the study of the anatomy of extreme events in a complex adaptive system can be found in [67]. Approaches for modeling long-time and long-range correlations in complex systems from time series are investigated and applied to different examples in [50,56,69,70,73,75,80,82,86,100,101,102].
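The Hurst exponent mentioned above in connection with [51] is commonly estimated with the rescaled-range (R/S) statistic: the slope of log(R/S) against the log of the window size estimates H, with H ≈ 0.5 for uncorrelated noise and H > 0.5 for persistent long-range correlated series. A minimal sketch (our own illustration, not the exact procedure of any cited paper):

```python
import math
import random

def hurst_rs(x, window_sizes):
    """Rescaled-range (R/S) estimate of the Hurst exponent H,
    obtained as the least-squares slope of log(R/S) vs. log(window)."""
    log_rs, log_w = [], []
    for w in window_sizes:
        rs = []
        for start in range(0, len(x) - w + 1, w):
            seg = x[start:start + w]
            m = sum(seg) / w
            # range R of the cumulative deviations from the segment mean
            dev, cum = 0.0, []
            for v in seg:
                dev += v - m
                cum.append(dev)
            r = max(cum) - min(cum)
            # standard deviation S of the segment
            s = math.sqrt(sum((v - m) ** 2 for v in seg) / w)
            if s > 0:
                rs.append(r / s)
        log_rs.append(math.log(sum(rs) / len(rs)))
        log_w.append(math.log(w))
    n = len(log_w)
    mx, my = sum(log_w) / n, sum(log_rs) / n
    return (sum((a - mx) * (b - my) for a, b in zip(log_w, log_rs))
            / sum((a - mx) ** 2 for a in log_w))

random.seed(0)
white = [random.gauss(0, 1) for _ in range(4096)]
H = hurst_rs(white, [16, 32, 64, 128, 256])
# for uncorrelated noise H is close to 0.5 (with a small upward
# finite-sample bias typical of the R/S statistic)
```

Applied to a heartbeat interval series or a seismogram instead of `white`, a value H clearly above 0.5 would indicate the long-time persistence that the works cited above report.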
Detecting scale invariance and its fundamental relationships with statistical structures is one of the most relevant problems among those addressed by correlation analysis [47,71,72,91]. Specific long-range correlations in complex systems are the object of active research due to their implications for the technology of materials and for several fields of scientific knowledge, with the use of quantified histograms [78], the decrease of chaos in heart failure [85], scaling properties of ECG signal fluctuations [87], transport properties in correlated systems [88], etc.

It is demonstrated in [43,92,93] how ubiquitous long-range correlations are in typical and exotic complex statistical systems, with applications to biology, medicine, economics and to time clustering properties [95,98]. Scale-dependent wavelet and spectral measures for assessing cardiac dysfunction have been used in [97]. In recent years the study of an increasing number of natural phenomena that appear to deviate from standard statistical distributions has kindled interest in alternative formulations of statistical mechanics [58,101]. Finally, the papers [77,90] present examples of the deep and multiple interplay between discrete and continuous long-time correlation and memory in complex systems, and of the corresponding modeling of discrete time series on the basis of the physical Zwanzig–Mori kinetic equation for Hamiltonian statistical systems.

Bibliography

Primary Literature

1. Markov AA (1906) Two-dimensional Brownian motion and harmonic functions. Proc Phys Math Soc Kazan Imp Univ 15(4):135–178; in Russian
2. Chapman S, Cowling TG (1958) The mathematical theory of nonuniform gases. Cambridge University Press, Cambridge
3. Albeverio S, Blanchard P, Steil L (1990) Stochastic processes and their applications in mathematics and physics. Kluwer, Dordrecht
4. Rice SA, Gray P (1965) The statistical mechanics of simple liquids. Interscience, New York
5. Kubo R, Toda M, Hashitsume N, Saito N (2003) Statistical physics II: Nonequilibrium statistical mechanics. In: Fulde P (ed) Springer Series in Solid-State Sciences, vol 31. Springer, Berlin, p 279
6. Ginzburg VL, Andryushin E (2004) Superconductivity. World Scientific, Singapore
7. Sachs I, Sen S, Sexton J (2006) Elements of statistical mechanics. Cambridge University Press, Cambridge
8. Fetter AL, Walecka JD (1971) Quantum theory of many-particle physics. McGraw-Hill, New York
9. Chandler D (1987) Introduction to modern statistical mechanics. Oxford University Press, Oxford
10. Zwanzig R (2001) Nonequilibrium statistical mechanics. Cambridge University Press, Cambridge
11. Zwanzig R (1961) Memory effects in irreversible thermodynamics. Phys Rev 124:983–992
12. Mori H (1965) Transport, collective motion and Brownian motion. Prog Theor Phys 33:423–455; Mori H (1965) A continued fraction representation of the time correlation functions. Prog Theor Phys 34:399–416
13. Grabert H, Hänggi P, Talkner P (1980) Microdynamics and nonlinear stochastic processes of gross variables. J Stat Phys 22:537–552
14. Grabert H, Talkner P, Hänggi P (1977) Microdynamics and time-evolution of macroscopic non-Markovian systems. Z Physik B 26:389–395


15. Grabert H, Talkner P, Hänggi P, Thomas H (1978) Microdynamics and time-evolution of macroscopic non-Markovian systems II. Z Physik B 29:273–280
16. Hänggi P, Thomas H (1977) Time evolution, correlations and linear response of non-Markov processes. Z Physik B 26:85–92
17. Hänggi P, Talkner P (1983) Memory index of first-passage time: A simple measure of non-Markovian character. Phys Rev Lett 51:2242–2245
18. Hänggi P, Thomas H (1982) Stochastic processes: Time-evolution, symmetries and linear response. Phys Rep 88:207–319
19. Lee MH (1982) Orthogonalization process by recurrence relations. Phys Rev Lett 49:1072–1072; Lee MH (1983) Can the velocity autocorrelation function decay exponentially? Phys Rev Lett 51:1227–1230
20. Balucani U, Lee MH, Tognetti V (2003) Dynamic correlations. Phys Rep 373:409–492
21. Hong J, Lee MH (1985) Exact dynamically convergent calculations of the frequency-dependent density response function. Phys Rev Lett 55:2375–2378
22. Lee MH (2000) Heisenberg, Langevin, and current equations via the recurrence relations approach. Phys Rev E 61:3571–3578; Lee MH (2000) Generalized Langevin equation and recurrence relations. Phys Rev E 62:1769–1772
23. Lee MH (2001) Ergodic theory, infinite products, and long time behavior in Hermitian models. Phys Rev Lett 87(1–4):250601
24. Kubo R (1966) Fluctuation-dissipation theorem. Rep Progr Phys 29:255–284
25. Kawasaki K (1970) Kinetic equations and time correlation functions of critical fluctuations. Ann Phys 61:1–56
26. Michaels IA, Oppenheim I (1975) Long-time tails and Brownian motion. Physica A 81:221–240
27. Frank TD, Daffertshofer A, Peper CE, Beek PJ, Haken H (2001) H-theorem for a mean field model describing coupled oscillator systems under external forces. Physica D 150:219–236
28. Vogt M, Hernandez R (2005) An idealized model for nonequilibrium dynamics in molecular systems. J Chem Phys 123(1–8):144109
29. Sen S (2006) Solving the Liouville equation for conservative systems: Continued fraction formalism and a simple application. Physica A 360:304–324
30. Prokhorov YV (1999) Probability and mathematical statistics (encyclopedia). Scien Publ Bolshaya Rossiyskaya Encyclopedia, Moscow
31. Yulmetyev R et al (2000) Stochastic dynamics of time correlation in complex systems with discrete time. Phys Rev E 62:6178–6194
32. Yulmetyev R et al (2002) Quantification of heart rate variability by discrete nonstationary non-Markov stochastic processes. Phys Rev E 65(1–15):046107
33. Reed M, Simon B (1972) Methods of mathematical physics. Academic, New York
34. Grabert H (1982) Projection operator technique in nonequilibrium statistical mechanics. In: Höhler G (ed) Springer tracts in modern physics, vol 95. Springer, Berlin
35. Yulmetyev RM (2001) Possibility between earthquake and explosion seismogram differentiation by discrete stochastic non-Markov processes and local Hurst exponent analysis. Phys Rev E 64(1–14):066132
36. Abe S, Suzuki N (2004) Aging and scaling of earthquake aftershocks. Physica A 332:533–538

37. Tirnakli U, Abe S (2004) Aging in coherent noise models and natural time. Phys Rev E 70(1–4):056120
38. Abe S, Sarlis NV, Skordas ES, Tanaka HK, Varotsos PA (2005) Origin of the usefulness of the natural-time representation of complex time series. Phys Rev Lett 94(1–4):170601
39. Stanley HE, Meakin P (1988) Multifractal phenomena in physics and chemistry. Nature 335:405–409
40. Ivanov PCh, Amaral LAN, Goldberger AL, Havlin S, Rosenblum MG, Struzik Z, Stanley HE (1999) Multifractality in human heartbeat dynamics. Nature 399:461–465
41. Mokshin AV, Yulmetyev R, Hänggi P (2005) Simple measure of memory for dynamical processes described by a generalized Langevin equation. Phys Rev Lett 95(1–4):200601
42. Allegrini P et al (2003) Compression and diffusion: A joint approach to detect complexity. Chaos Soliton Fractal 15:517–535
43. Amaral LAN et al (2001) Application of statistical physics methods and concepts to the study of science and technology systems. Scientometrics 51:9–36
44. Arneodo A et al (1996) Wavelet based fractal analysis of DNA sequences. Physica D 96:291–320
45. Ashkenazy Y et al (2003) Magnitude and sign scaling in power-law correlated time series. Physica A Stat Mech Appl 323:19–41
46. Ashkenazy Y et al (2003) Nonlinearity and multifractality of climate change in the past 420,000 years. Geophys Res Lett 30:2146
47. Azbel MY (1995) Universality in a DNA statistical structure. Phys Rev Lett 75:168–171
48. Baldassarri A et al (2006) Brownian forces in sheared granular matter. Phys Rev Lett 96:118002
49. Baleanu D et al (2006) Fractional Hamiltonian analysis of higher order derivatives systems. J Math Phys 47:103503
50. Blesic S et al (2003) Detecting long-range correlations in time series of neuronal discharges. Physica A 330:391–399
51. Cajueiro DO, Tabak BM (2004) The Hurst exponent over time: Testing the assertion that emerging markets are becoming more efficient. Physica A 336:521–537
52. Brecht M et al (1998) Correlation analysis of corticotectal interactions in the cat visual system. J Neurophysiol 79:2394–2407
53. Brouers F, Sotolongo-Costa O (2006) Generalized fractal kinetics in complex systems (application to biophysics and biotechnology). Physica A 368(1):165–175
54. Coleman P, Pietronero L (1992) The fractal structure of the universe. Phys Rep 213:311–389
55. Goldberger AL et al (2002) What is physiologic complexity and how does it change with aging and disease? Neurobiol Aging 23:23–26
56. Grau-Carles P (2000) Empirical evidence of long-range correlations in stock returns. Physica A 287:396–404
57. Grigolini P et al (2001) Asymmetric anomalous diffusion: An efficient way to detect memory in time series. Fractal Complex Geom Pattern Scaling Nat Soc 9:439–449
58. Ebeling W, Frommel C (1998) Entropy and predictability of information carriers. Biosystems 46:47–55
59. Fukuda K et al (2004) Heuristic segmentation of a nonstationary time series. Phys Rev E 69:021108
60. Hausdorff JM, Peng CK (1996) Multiscaled randomness: A possible source of 1/f noise in biology. Phys Rev E 54:2154–2157


61. Herzel H et al (1998) Interpreting correlations in biosequences. Physica A 249:449–459
62. Hoop B, Peng CK (2000) Fluctuations and fractal noise in biological membranes. J Membrane Biol 177:177–185
63. Hoop B et al (1998) Temporal correlation in phrenic neural activity. In: Hughson RL, Cunningham DA, Duffin J (eds) Advances in modelling and control of ventilation. Plenum Press, New York, pp 111–118
64. Ivanova K, Ausloos M (1999) Application of the detrended fluctuation analysis (DFA) method for describing cloud breaking. Physica A 274:349–354
65. Ignaccolo M et al (2004) Scaling in non-stationary time series. Physica A 336:595–637
66. Imponente G (2004) Complex dynamics of the biological rhythms: Gallbladder and heart cases. Physica A 338:277–281
67. Jefferies P et al (2003) Anatomy of extreme events in a complex adaptive system. Physica A 318:592–600
68. Karasik R et al (2002) Correlation differences in heartbeat fluctuations during rest and exercise. Phys Rev E 66:062902
69. Kulessa B et al (2003) Long-time autocorrelation function of ECG signal for healthy versus diseased human heart. Acta Phys Pol B 34:3–15
70. Kutner R, Switala F (2003) Possible origin of the non-linear long-term autocorrelations within the Gaussian regime. Physica A 330:177–188
71. Koscielny-Bunde E et al (1998) Indication of a universal persistence law governing atmospheric variability. Phys Rev Lett 81:729–732
72. Labini F (1998) Scale invariance of galaxy clustering. Phys Rep 293:61–226
73. Linkenkaer-Hansen K et al (2001) Long-range temporal correlations and scaling behavior in human brain oscillations. J Neurosci 21:1370–1377
74. Mercik S et al (2000) What can be learnt from the analysis of short time series of ion channel recordings. Physica A 276:376–390
75. Montanari A et al (1999) Estimating long-range dependence in the presence of periodicity: An empirical study. Math Comp Model 29:217–228
76. Naber M (2004) Time fractional Schrödinger equation. J Math Phys 45:3339–3352
77. Niemann M et al (2008) Usage of the Mori–Zwanzig method in time series analysis. Phys Rev E 77:011117
78. Nigmatullin RR (2002) The quantified histograms: Detection of the hidden unsteadiness. Physica A 309:214–230
79. Nigmatullin RR (2006) Fractional kinetic equations and universal decoupling of a memory function in mesoscale region. Physica A 363:282–298
80. Ogurtsov MG (2004) New evidence for long-term persistence in the sun’s activity. Solar Phys 220:93–105
81. Pavlov AN, Dumsky DV (2003) Return times dynamics: Role of the Poincare section in numerical analysis. Chaos Soliton Fractal 18:795–801
82. Paulus MP (1997) Long-range interactions in sequences of human behavior. Phys Rev E 55:3249–3256
83. Peng C-K et al (1994) Mosaic organization of DNA nucleotides. Phys Rev E 49:1685–1689
84. Peng C-K et al (1995) Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos 5:82–87

85. Poon CS, Merrill CK (1997) Decrease of cardiac chaos in congestive heart failure. Nature 389:492–495
86. Rangarajan G, Ding MZ (2000) Integrated approach to the assessment of long range correlation in time series data. Phys Rev E 61:4991–5001
87. Robinson PA (2003) Interpretation of scaling properties of electroencephalographic fluctuations via spectral analysis and underlying physiology. Phys Rev E 67:032902
88. Rizzo F et al (2005) Transport properties in correlated systems: An analytical model. Phys Rev B 72:155113
89. Shen Y et al (2003) Dimensional complexity and spectral properties of the human sleep EEG. Clinic Neurophysiol 114:199–209
90. Schmitt D et al (2006) Analyzing memory effects of complex systems from time series. Phys Rev E 73:056204
91. Soen Y, Braun F (2000) Scale-invariant fluctuations at different levels of organization in developing heart cell networks. Phys Rev E 61:R2216–R2219
92. Stanley HE et al (1994) Statistical-mechanics in biology – how ubiquitous are long-range correlations. Physica A 205:214–253
93. Stanley HE (2000) Exotic statistical physics: Applications to biology, medicine, and economics. Physica A 285:1–17
94. Tarasov VE (2006) Fractional variations for dynamical systems: Hamilton and Lagrange approaches. J Phys A Math Gen 39:8409–8425
95. Telesca L et al (2003) Investigating the time-clustering properties in seismicity of Umbria-Marche region (central Italy). Chaos Soliton Fractal 18:203–217
96. Turcott RG, Teich MC (1996) Fractal character of the electrocardiogram: Distinguishing heart-failure and normal patients. Ann Biomed Engin 24:269–293
97. Thurner S et al (1998) Receiver-operating-characteristic analysis reveals superiority of scale-dependent wavelet and spectral measures for assessing cardiac dysfunction. Phys Rev Lett 81:5688–5691
98. Vandewalle N et al (1999) The moving averages demystified. Physica A 269:170–176
99. Varela M et al (2003) Complexity analysis of the temperature curve: New information from body temperature. Eur J Appl Physiol 89:230–237
100. Varotsos PA et al (2002) Long-range correlations in the electric signals that precede rupture. Phys Rev E 66:011902
101. Watters PA (2000) Time-invariant long-range correlations in electroencephalogram dynamics. Int J Syst Sci 31:819–825
102. Wilson PS et al (2003) Long-memory analysis of time series with missing values. Phys Rev E 68:017103
103. Yulmetyev RM et al (2004) Dynamical Shannon entropy and information Tsallis entropy in complex systems. Physica A 341:649–676
104. Yulmetyev R, Hänggi P, Gafarov F (2000) Stochastic dynamics of time correlation in complex systems with discrete time. Phys Rev E 62:6178
105. Yulmetyev R, Gafarov F, Hänggi P, Nigmatullin R, Kayumov S (2001) Possibility between earthquake and explosion seismogram differentiation by discrete stochastic non-Markov processes and local Hurst exponent analysis. Phys Rev E 64:066132
106. Yulmetyev R, Hänggi P, Gafarov F (2002) Quantification of heart rate variability by discrete nonstationary non-Markov stochastic processes. Phys Rev E 65:046107
107. Yulmetyev R, Demin SA, Panischev OY, Hänggi P, Timashev SF, Vstovsky GV (2006) Regular and stochastic behavior of Parkinsonian pathological tremor signals. Physica A 369:655

Books and Reviews

Badii R, Politi A (1999) Complexity: Hierarchical structures and scaling in physics. Oxford University Press, New York
Elze H-T (ed) (2004) Decoherence and entropy in complex systems. Selected lectures from DICE 2002. Lecture notes in physics, vol 633. Springer, Heidelberg

Kantz H, Schreiber T (2004) Nonlinear time series analysis. Cambridge University Press, Cambridge
Mallamace F, Stanley HE (2004) The physics of complex systems (new advances and perspectives). IOS Press, Amsterdam
Parisi G, Pietronero L, Virasoro M (1992) Physics of complex systems: Fractals, spin glasses and neural networks. Physica A 185(1–4):1–482
Sprott JC (2003) Chaos and time-series analysis. Oxford University Press, New York
Zwanzig R (2001) Nonequilibrium statistical physics. Oxford University Press, New York


Cost Sharing

MAURICE KOSTER
University of Amsterdam, Amsterdam, Netherlands

Article Outline

Glossary
Definition of the Subject
Introduction
Cooperative Cost Games
Non-cooperative Cost Games
Continuous Cost Sharing Models
Future Directions
Bibliography

Glossary

Core The core of a cooperative cost game ⟨N, c⟩ is the set of all coalitionally stable vectors of cost shares.

Cost function A cost function relates to each level of output of a given production technology the minimal necessary units of input to generate it. It is a non-decreasing function c : X → R₊, where X is the (ordered) space of outputs.

Cost sharing problem A cost sharing problem is an ordered pair (q, c), where q ∈ R₊^N is a profile of individual demands of a fixed and finite group of agents N = {1, 2, ..., n} and c is a cost function.

Game theory The branch of applied mathematics and economics that studies situations where players make decisions in an attempt to maximize their returns. The essential feature is that it provides a formal modeling approach to social situations in which decision makers interact.

Cost sharing rule A cost sharing rule is a mapping that assigns to each cost sharing problem under consideration a vector of non-negative cost shares.

Demand game Strategic game where agents place demands for output strategically.

Demand revelation game Strategic game where agents announce their maximal contribution strategically.

Strategic game An ordered triple G = ⟨N, (A_i)_{i∈N}, (≿_i)_{i∈N}⟩, where
– N = {1, 2, ..., n} is the set of players,
– A_i is the set of available actions for player i,
– ≿_i is a preference relation over the set of possible consequences C of actions.

Definition of the Subject

Throughout we will use a fixed set of agents N = {1, 2, ..., n}, where n is a given natural number. For subsets S, T of N, we write S ⊆ T if each element of S is contained in T; T\S denotes the set of agents in T except those in S. The power set of N is the set of all subsets of N; each coalition S ⊆ N will be identified with the element 1_S ∈ {0, 1}^N, the vector with ith coordinate equal to 1 precisely when i ∈ S. Fix a vector x ∈ R^N and S ⊆ N. The projection of x on R^S is denoted by x_S, and x_{N\S} is sometimes more conveniently denoted by x_{−S}. For any y ∈ R^S, (x_{−S}, y) stands for the vector z ∈ R^N such that z_i = x_i if i ∈ N\S and z_i = y_i if i ∈ S. We denote x(S) = Σ_{i∈S} x_i. The vector in R^S with all coordinates equal to zero is denoted by 0_S. Other notation will be introduced when necessary.

This article focuses on different approaches in the literature through a discussion of a couple of basic and illustrative models, each involving a single facility for the production of a finite set M of outputs, commonly shared by a fixed set N := {1, 2, ..., n} of agents. The feasible set of outputs for the technology is identified with a set X ⊆ R₊^M. It is assumed that the users of the technology may freely dispose of any desired quantity or level of the outputs; each agent i has some demand x_i ∈ X for output. Each profile of demands x ∈ X^N is associated with its cost c(x), i.e. the minimal amount of the idiosyncratic input commodity needed to fulfill the individual demands. This defines the cost function c : X^N → R₊ for the technology, comprising all the production externalities. A cost sharing problem is an ordered pair (x, c) of a demand profile x and a cost function c. The interpretation is that x is produced and the resulting cost c(x) has to be shared by the collective N. Numerous practical applications fit this general description of a cost sharing problem.
In mathematical terms a cost sharing problem is equivalent to a production sharing problem, where output is shared based on the profile of inputs. However, although many concepts are just as meaningful as they are in the cost sharing context, results are not at all easily established using this mathematical duality. In this sense, consider [68] as a warning to the reader, showing that the strategic analysis of cost sharing solutions is quite different from that of surplus sharing solutions. This article will center on cost sharing problems. For further reference on production sharing see [51,67,68,91].

Introduction

In many practical situations managers or policy-makers deal with private or public enterprises with multiple users.


A production technology facilitates its users, causing externalities that have to be shared. Applications are numerous, ranging from environmental issues like pollution and fishing grounds, to sharing multipurpose reservoirs, road systems, communication networks, and the Internet. The essence in all these examples is that a manager cannot directly influence the behavior of the users, but only indirectly by addressing the externalities through some decentralization device. By choosing the right instrument the manager may help to shape and control the nature of the resulting individual and aggregate behavior. This is what is usually understood as the mechanism design or implementation paradigm. The state-of-the-art literature shows for a couple of simple but illustrative cost sharing models that one cannot push these principles too far, as there is often a trade-off between the degree of distributive justice and economic efficiency. This is what makes choosing ‘the’ right solution an ambiguous task, certainly without a profound understanding of the basic allocation principles. First some examples will be discussed.

Example 1 The water-resource management problem of the Tennessee Valley Authority (TVA) in the 1930s is a classic in the cost-sharing literature. It concerns the construction of a dam in a river to create a reservoir, which can be used for different purposes like flood control, hydro-electric power, irrigation, and municipal supply. Each combination of purposes requires a certain dam height, and the accompanying construction costs have to be shared by the purposes. Typical for this type of problem is that up to a certain critical height there are economies of scale, as marginal costs of extra height are decreasing. Afterwards, marginal costs increase due to technological constraints. The problem here is to allocate the construction costs of a specific dam among the relevant purposes.
Example 2 Another illustrative cost sharing problem dating back to the early days of the cost sharing literature [69,70] deals with landing fee schedules at airports, so-called airport problems. These were often established to cover the costs of building and maintaining the runways. The cost of a runway essentially depends on the size of the largest type of airplane that has to be accommodated – a long runway can be used by smaller types as well. Suppose there are m types of airplanes and that c_i is the cost of constructing a landing strip suitable for type i. Moreover, index the types from small to large so that 0 = c_0 < c_1 < c_2 < ... < c_m. In the above terminology the technology can be described by X = {0, 1, 2, ..., m}, and the cost function c : X^N → R₊ is defined by c(x) = c_k, where k = max {x_i | i ∈ N} is the maximal service level required in x. Suppose that in a given

year, N_k is the set of landings of type k airplanes; then the set of users of the runway is N = ∪_k N_k. The problem is now to apportion the full cost c(x) of the runway to the users in N, where x is the demand vector given by x_i = ℓ if i ∈ N_ℓ. Airport problems describe a wide range of cost sharing problems, ranging from sharing the maintenance cost of a ditch system for irrigation projects [1] to sharing the dredging costs in harbors [14].

Example 3 A joint project involves a number of activities for which the estimated durations and precedence relations are known. Delay in each of these components affects the period in which the project can be realized. A cost sharing problem then arises when the joint costs due to the accumulated delay are shared among the individuals causing the delays. See [21].

Example 4 In many applications the production technology is given by a network G = (V, E) with node set V, a set of costly edges E ⊆ V × V, and a cost function c : E → R₊. The demands of the agents are now parts of the infrastructure, i.e. subsets of E. Examples include sharing the cost of infrastructure for the supply of energy and water, or transport systems. For example, the above airport problem can be modeled as such with

V = {1, 2, ..., m} ∪ {β},  E = {(β, 1), (1, 2), ..., (m − 1, m)}.

Graphically, the situation is depicted by the line graph in Fig. 1. Imagine that the runway starts at the special node β, and that the edges depict the different pieces of runway served to the players. An airplane of type k is situated at node k, and needs all edges towards β. The edge to the left of the kth node is called e_k = (k − 1, k), and the corresponding cost is c(e_k) = c_k − c_{k−1}. The demand of an airplane at node k is now described by the edges on the path from node k to β.

Example 5 In more general network design problems, a link facilitates a flow; for instance, in telecommunication it is data flowing through the network, in road systems it is traffic.
[49] discusses a model where a network planner allocates the fixed cost of a network based on the individual

Cost Sharing, Figure 1 Graphical representation of an airport problem
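For the airport problem of Example 2, a classical allocation (the Shapley value of the associated cooperative game, in the tradition of the airport-game literature cited there) charges the incremental cost c_k − c_{k−1} of each runway segment e_k in equal parts to the players that use that segment. A small sketch, with function and variable names of our own choosing:

```python
def airport_shares(demands, seg_costs):
    """Cost shares for an airport problem.

    demands:   dict player -> required type k in {1, ..., m}
    seg_costs: incremental segment costs [c1-c0, c2-c1, ..., cm-c(m-1)]
    Segment e_k is shared equally by all players with demand >= k.
    """
    shares = {i: 0.0 for i in demands}
    for k, cost in enumerate(seg_costs, start=1):
        users = [i for i, x in demands.items() if x >= k]
        for i in users:
            shares[i] += cost / len(users)
    return shares

# Three planes: two of type 1, one of type 2;
# segment costs c1 = 30 and c2 - c1 = 60, so c(x) = c2 = 90.
shares = airport_shares({"a": 1, "b": 1, "c": 2}, [30, 60])
# → {"a": 10.0, "b": 10.0, "c": 70.0}; the shares sum to 90
```

Note that the shares add up to the full cost c(x), so the rule is budget balanced by construction, and the large plane alone carries the segment only it uses.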


demands being flows. [77,124] discuss congested telecommunication networks, where the cost of a link depends on the size of the corresponding flow. These positive network externalities lead to a concentration of flow, and thus to hub-like networks. Economies of scale require cooperation of the users, and the problem now is to share the cost of these so-called hub-like networks.

Example 6 As an insurance against the uncertainty of the future net worths of its constituents, firms are often regulated to hold an amount of riskless investments, i.e. risk capital. Given that the returns of normal investments are higher, the difference with the riskless investments is considered a cost. The sum of the risk capitals of the individual constituents is usually larger than the risk capital of the firm as a whole, and the allocation problem is to apportion this diversification effect observed in risk measurements of financial portfolios. See Denault [28].

Solving Cost Sharing Problems: Cost Sharing Rules

A vector of cost shares for the cost sharing problem (x, c) is an element y ∈ R^N with the property that Σ_{i∈N} y_i = c(x). This equality is also called the budget-balance condition. The central issue addressed in the cost sharing literature is how to determine the appropriate y. The vast majority of the cost sharing literature is devoted to a mechanistic way of sharing joint costs; given a class of cost sharing problems P, a (simple) formula computes the vector of cost shares for each of its elements. This yields a cost sharing rule φ : P → R^N, where φ(P) is the vector of cost shares for each P ∈ P. At this point it should be clear to the reader that many formulas will see to a split of joint costs, and finding ‘the’ solution to cost sharing problems is therefore an ambiguous task. The least we want from a solution is that it is consistent with some basic principles of fairness or justice and, moreover, that it creates the right incentives.
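The budget-balance condition Σ_{i∈N} y_i = c(x) is mechanical to verify for any concrete rule. As a hypothetical illustration (names are ours, and the article discusses many alternative rules), the sketch below implements the simple proportional rule for the special case where the joint cost depends only on aggregate demand:

```python
def proportional_shares(x, c):
    """Proportional cost sharing rule: y_i = (x_i / x(N)) * c(x(N)).

    x: list of individual demands (assumed not all zero)
    c: cost function of the aggregate demand (a simplifying
       assumption; in general c is defined on demand profiles)
    """
    total = sum(x)
    joint_cost = c(total)
    # each agent pays in proportion to its share of total demand
    return [xi / total * joint_cost for xi in x]

# Demands 1, 2, 3 and a convex cost function c(q) = q^2, so c(6) = 36.
y = proportional_shares([1, 2, 3], lambda q: q ** 2)
# the shares are proportional to demands and sum to c(6) = 36
```

By construction sum(y) equals the joint cost, so budget balance holds for every demand profile; the debates in the literature concern which further fairness and incentive properties single out one rule over another.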
Clearly, the desirability of a solution varies with the context in which it is used, and so will the sense of appropriateness. Moreover, the different parties involved in the decision-making process will typically hold different opinions; accountants, economists, production managers, regulators and others all look at the same institutional entity from different perspectives. The existing cost sharing literature explores the boundaries of what can be thought of as desirable features of cost sharing rules. More important than the rules themselves are the properties that each of them is consistent with. Instead of building a theory on single instances of cost sharing problems, the cost sharing literature discusses structural invariance properties over classes of problems. Here the main distinction is made on the basis of the topological properties of the technology: whether the cost sharing problem allows for a discrete or a continuous formulation. For each type of model, divisible or indivisible goods, the state-of-the-art cost sharing literature has developed in two main directions, based on the way individual preferences over combinations of cost shares and (levels of) service are treated. On the one hand, there is a stream of research in which individual preferences are not explicitly modeled and demands are considered inelastic. Roughly, it accommodates the large and fast-growing axiomatic literature (see e.g. [88,129]) and the theory of cooperative cost games [105,130,136,145]. On the other hand, there is the literature on cost sharing models where individual preferences are explicitly modeled and demands are elastic. There the focus is on non-cooperative demand games in which the agents are assumed to choose their demands strategically, see e.g. [56,91,141]. As the interested reader will soon find out, there is no shortage of plausible cost sharing techniques in the literature. Instead of presenting a summary, this article focuses on the most basic and most interesting ones, and in particular on their properties with respect to the strategic interplay of the agents.

Outline The article is organized as follows. Section “Cooperative Cost Games” discusses cost sharing problems from the perspective of cooperative game theory. Basic concepts like the core, the Shapley value, the nucleolus and the egalitarian solution are treated. Section “Non-cooperative Cost Games” introduces the basic concepts of non-cooperative game theory, including dominance relations, preferences, and Nash equilibrium. Demand games and demand revelation games are introduced for discrete technologies with a concave cost function. This part is concluded with two theorems: the strategic characterizations of the Shapley value and of the constrained egalitarian solution as cost sharing solutions, respectively. Section “Continuous Cost Sharing Models” introduces the continuous production model and consists of two parts. First the simple case of a production technology with a homogeneous and perfectly divisible private good is treated. Prevailing cost sharing rules like the proportional, serial, and Shapley–Shubik rules are shortly introduced. We then give a well-known characterization of additive cost sharing rules in terms of corresponding rationing methods, and discuss the related cooperative and strategic games. The second part is devoted to the heterogeneous output model and famous solutions like the Aumann–Shapley, Shapley–Shubik, and serial rules. We finalize with Sect. “Future Directions”, where some future directions of research are spelled out.

Cooperative Cost Games

A discussion of cost sharing solutions and incentives needs a proper framework wherein the incentives are formalized. In the seminal work of von Neumann and Morgenstern [140] the notion of a cooperative game was introduced to model the interaction between actors/players who coordinate their strategies in order to maximize joint profits. Shubik [122] was one of the first to apply this theory in the cost sharing context.

Cooperative Cost Game A cooperative cost game among players in N is a function c : 2^N → ℝ with the property that c(∅) = 0; for non-empty sets S ⊆ N the value c(S) is interpreted as the cost that would arise should the individuals in S work together and serve only their own purposes. The class of all cooperative cost games for N will be denoted by CG. Any general class P of cost sharing problems can be embedded in CG as follows. For the cost sharing problem (x, c) ∈ P among agents in N, define the stand-alone cost game c_x ∈ CG by

c_x(S) := c(x_S, 0_{N\S})   if S ⊆ N, S ≠ ∅ ,
c_x(S) := 0                 if S = ∅ .          (1)

So c_x(S) can be interpreted as the cost of serving only the agents in S.

Example 7 The following numerical example will be frequently referred to. An airport is visited by three airplanes in the set N = {1, 2, 3}, which can be accommodated at cost c1 = 12, c2 = 20, and c3 = 33, respectively. The situation is depicted in Fig. 2. The corresponding cost game c is determined by associating with each coalition S of airplanes the minimum cost of the runway needed to accommodate each of its members. The resulting cost game c is given by the table below. Slightly abusing notation, we write c(i) for c({i}), c(ij) for c({i, j}), and so forth.

S     ∅   1   2   3   12  13  23  123
c(S)  0   12  20  33  20  33  33  33

Cost Sharing, Figure 2 Airport game

Cost Sharing, Figure 3 Minimum cost spanning tree problem

Note that, since we identified coalitions of players in N with elements of 2^N, we may write c to denote the cooperative cost game. By the binary nature of the demands, the cost function for the technology formally is a cooperative cost game. For x = (1, 0, 1) the corresponding cost game c_x is specified by

S       ∅   1   2   3   12  13  23  123
c_x(S)  0   12  0   33  12  33  33  33

Player 2 is a dummy player in this game: for all S ⊆ N\{2} it holds that c_x(S) = c_x(S ∪ {2}).

Example 8 Consider the situation depicted in Fig. 3, where three players, each situated at a different node, want to be connected to the special node β using the indicated costly links. In order to connect themselves to β, a coalition S may use only the links with β and the direct links between its members, and then only if these are paid for. For instance, the minimum cost of connecting player 1 in the left node to β is 10, and the cost of connecting players 1 and 2 to β is 18: the cost of the direct link from 2 plus the indirect link between 1 and 2. The associated cost game is given by

S     ∅   1   2   3   12  13  23  123
c(S)  0   10  10  10  18  20  19  27

Notice that in this case the network technology exhibits positive externalities: the more players want to be connected, the lower the per capita cost. For those applications where the cost c(S) can be determined irrespective of the actions taken by the complement N\S, the interpretation of c implies sub-additivity, i.e. the property that for all S, T ⊆ N with S ∩ T = ∅ it holds that c(S ∪ T) ≤ c(S) + c(T). This is, for instance, an essential feature of the technology underlying natural monopolies (see, e.g., [13,120]). Note that the cost games in Examples 7 and 8 are sub-additive. This is a general property of airport games as well as of minimum cost spanning tree games.
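For small player sets, the stand-alone game of Eq. (1) can simply be tabulated and sub-additivity checked by brute force. The following Python sketch reproduces the airport game of Example 7; the helper names and the dict-over-frozensets representation are our own choices, not the article's.

```python
from itertools import combinations

def airport_game(costs):
    """Stand-alone cost game of an airport problem: a coalition S only needs
    the longest runway among its members, so c(S) = max over i in S of c_i."""
    players = sorted(costs)
    game = {frozenset(): 0}          # c(empty set) = 0 by definition
    for r in range(1, len(players) + 1):
        for S in combinations(players, r):
            game[frozenset(S)] = max(costs[i] for i in S)
    return game

def is_subadditive(game):
    """Check c(S | T) <= c(S) + c(T) for all disjoint non-empty S, T."""
    coalitions = [S for S in game if S]
    return all(game[S | T] <= game[S] + game[T]
               for S in coalitions for T in coalitions if not S & T)

# Example 7: three airplanes with runway costs 12, 20 and 33.
c = airport_game({1: 12, 2: 20, 3: 33})
print(c[frozenset({1, 2})])  # 20
print(is_subadditive(c))     # True
```

The same representation is convenient for the solution concepts discussed below, since all of them are defined coalition-wise.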


Cost Sharing, Figure 4 Non-concave MCST game

Sometimes the benefits of cooperation are even stronger. A game is called concave (or sub-modular) if for all S, T ⊆ N we have

c(S ∪ T) + c(S ∩ T) ≤ c(S) + c(T) .          (2)

At first this seems a very abstract property, but one may show that it is equivalent to the following:

c(S ∪ {i}) − c(S) ≥ c(T ∪ {i}) − c(T)          (3)

for all coalitions S ⊆ T ⊆ N\{i}. This means that the marginal cost of a player i with respect to larger coalitions is non-increasing, i.e. the technology exhibits positive externalities. Concave games are also frequently found in the network literature, see [63,75,93,121].
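Concavity in the sense of Eq. (2) is likewise a finite check for small games. A minimal sketch under the same dict-over-frozensets representation used above (function names are ours):

```python
from itertools import combinations

def is_concave(game, players):
    """Sub-modularity, Eq. (2): c(S | T) + c(S & T) <= c(S) + c(T) for all S, T."""
    subsets = [frozenset(S) for r in range(len(players) + 1)
               for S in combinations(players, r)]
    return all(game[S | T] + game[S & T] <= game[S] + game[T]
               for S in subsets for T in subsets)

# Airport game of Example 7: a coalition pays for the longest runway it needs.
costs = {1: 12, 2: 20, 3: 33}
game = {frozenset(S): (max(costs[i] for i in S) if S else 0)
        for r in range(4) for S in combinations([1, 2, 3], r)}
print(is_concave(game, [1, 2, 3]))  # True
```

Airport games always pass this test, since the maximum of member costs is a sub-modular set function; running the same check on the game of Example 9 would return False.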

Incentives in Cooperative Cost Games

The objective in cooperative games is to share the profits or cost savings of cooperation. As in the general framework, a vector of cost shares for a cost game c ∈ CG is a vector x ∈ ℝ^N such that x(N) = c(N). The question is which cost share vectors make sense if (coalitions of) players have the possibility to opt out, thereby destroying cooperation on a larger scale. In order to ensure that individual players join, a proposed allocation x should at least be individually rational, so that x_i ≤ c(i) for all i ∈ N. In that case no player has a justified claim to reject x as a proposal, since going it alone yields a higher cost. The set of all such elements is called the imputation set. If, in a similar fashion, x(S) ≤ c(S) for all S ⊆ N, then x is called stable; under proposal x no coalition S has a strong incentive to go it alone, as it is not possible to redistribute the cost shares afterwards and make every defector better off. The core of a cost game c, denoted core(c), consists of all stable vectors of cost shares for c. If cooperation on a voluntary basis by the grand coalition N is conceived as a desirable feature, then the core and certainly the imputation set impose reasonable conditions for reaching it. Nevertheless, the core of a game can be empty. Call a collection B of coalitions balanced if there is a vector of positive weights (λ_S)_{S∈B} such that for all i ∈ N

∑_{S∈B, S∋i} λ_S = 1 .

A cost game c is balanced if for each balanced collection B of coalitions

∑_{S∈B} λ_S c(S) ≥ c(N) .

It is the celebrated theorem below which characterizes all games with non-empty cores.

Theorem 1 (Bondareva–Shapley [20,117]) The cost game c is balanced if and only if the core of c is non-empty.

Concave cost games are balanced, see [119]. Concavity is not a necessary condition for non-emptiness of the core, since minimum cost spanning tree games are balanced as well.

Example 9 Although sub-additive, minimum cost spanning tree games are not always concave. Consider the following example due to [17], depicted in Fig. 4. The numbers next to the edges indicate the corresponding cost. We assume a complete graph and that the invisible edges cost 4. Note that in this game every three-player coalition is connected at cost 12, whereas c(34) = 16. Then c(1234) − c(234) = 16 − 12 = 4, whereas c(134) − c(34) = 12 − 16 = −4. So the marginal cost of player 1 is not non-increasing with respect to larger coalitions.

Example 10 Consider the two-player game c defined by c(12) = 10, c(1) = 3, c(2) = 8. Then core(c) = {(x, 10 − x) | 2 ≤ x ≤ 3}. Note that, as opposed to the general case, for two-player games sub-additivity is equivalent to non-emptiness of the core.

Cooperative Solutions

A solution on a subclass A of CG is a mapping φ : A → ℝ^N that assigns to each c ∈ A a vector of cost shares φ(c); here φ_i(c) stands for the charge to player i.

The Separable Cost Remaining Benefit Solution A common practice among civil engineers for allocating the costs of multipurpose reservoirs is the following solution. The separable cost for each player (read: purpose) i ∈ N is given by s_i = c(N) − c(N\{i}), and the remaining benefit by r_i = c(i) − s_i. The separable cost remaining benefit solution charges each player i for the separable cost s_i, and the non-separable costs c(N) − ∑_{j∈N} s_j are then allocated in


proportion to the remaining benefits r_i, leading to the formula

SCRB_i(c) = s_i + ( r_i / ∑_{j∈N} r_j ) [ c(N) − ∑_{j∈N} s_j ] .          (4)

In this formula it is assumed that c is at least sub-additive, to ensure that the r_i's are all positive. For the two-player game c in Example 10 the solution is given by SCRB(c) = (2 + ½(10 − 9), 7 + ½(10 − 9)) = (2½, 7½). In earlier days the solution was also known as ‘the alternate cost avoided method’ or the ‘alternative justifiable expenditure method’. For references see [144].

Shapley Value One of the most popular and oldest solution concepts in the literature on cooperative games is due to Shapley [116], and named the Shapley value. Roughly, it measures the average marginal impact of players. Consider an ordering of the players σ : N → N, so that σ(i) indicates the i-th player in the order. Let σ̄(i) be the set of the first i players according to σ; so σ̄(1) = {σ(1)}, σ̄(2) = {σ(1), σ(2)}, etc. The marginal cost share vector m^σ(c) ∈ ℝ^N is defined by m^σ_{σ(1)}(c) = c(σ(1)) and, for i = 2, 3, …, n,

m^σ_{σ(i)}(c) = c(σ̄(i)) − c(σ̄(i − 1)) .          (5)

So according to m^σ each player is charged with the increase in costs when joining the coalition of players before her. The Shapley value of c is then defined as the average of all n! marginal vectors, i.e.

Φ(c) = (1/n!) ∑_σ m^σ(c) .          (6)

Example 11 Consider the airport game of Example 7. The marginal vectors are given by

σ        (123)      (132)      (213)      (231)      (312)     (321)
m^σ(c)   (12,8,13)  (12,0,21)  (0,20,13)  (0,20,13)  (0,0,33)  (0,0,33)

Hence the Shapley value of the corresponding game is Φ(c) = (4, 8, 21). Following [69,107], for airport games this allocation is easily interpreted as the allocation according to which each player pays an equal share of the cost of only those parts of the runway she uses. Then c(e1) is shared by all three players, c(e2) only by players 2 and 3, and, finally, c(e3) is paid in full by player 3. This interpretation extends to the class of standard fixed tree games, where instead of the lattice structure of the runway there is the cost of a tree network to be shared, see [63].

If a cost game is concave then the Shapley value is in the core, since then each marginal vector specifies a core element, and in particular the Shapley value as a convex combination of these. Reconsider the minimum cost spanning tree game c in Example 9, a non-concave game with non-empty core and Φ(c) = (2⅔, 2⅔, 6⅔, 4). Note that this is not a stable cost allocation, since the coalition {2, 3} would profit by defecting: c(23) = 8 < 9⅓ = Φ2(c) + Φ3(c). [50] show that in general games Φ(c) ∈ core(c) precisely when c is average concave. Although not credited as a core-selector, the classic way to defend the Shapley value is by the following properties.

Symmetry Two players i, j are called symmetric in the cost game c if for all coalitions S not containing i, j it holds that c(S ∪ {i}) = c(S ∪ {j}). A solution φ is symmetric if symmetric players in a cost game c get the same cost shares. If the cost game does not provide any evidence to distinguish between two players, symmetry is the property endorsing equal cost shares.

Dummy A player i in a cost game c is a dummy if c(S ∪ {i}) = c(S) for all coalitions S. A solution φ satisfies dummy if φ_i(c) = 0 for all dummy players i in c. So when a player has no impact on costs whatsoever, she cannot be held responsible.

Additivity A solution is additive if for all cost games c1, c2 it holds that

φ(c1) + φ(c2) = φ(c1 + c2) .          (7)

For accounting reasons, in multipurpose projects it is common procedure to subdivide the costs related to the different activities (players) into cost categories, like salaries, maintenance costs, marketing, et cetera. Each category ℓ is associated with a cost game c_ℓ, where c_ℓ(S) is the total of the category-ℓ costs made for the different activities in S; then c(S) = ∑_ℓ c_ℓ(S) is the joint cost for S. Suppose a solution is applied to each of the cost categories separately; then under an additive solution the aggregate cost share of an activity is independent of the particular cross-section into categories.

Theorem 2 (Shapley [116]) Φ is the unique solution on CG which satisfies the three properties dummy, symmetry, and additivity.

Note that SCRB satisfies dummy and symmetry, but that it does not satisfy additivity. The Shapley value is credited with other virtues, like the following due to [144]. Consider the practical situation that several division managers simultaneously take steps to increase efficiency by decreasing joint costs, but one division manager establishes a greater relative improvement in the sense that its


marginal contribution to the cost associated with all possible coalitions increases. Then it is more than reasonable that this division should not be penalized. In a broader context this envisions the idea that each player in the cost game should be credited with the merits of ‘uniform’ technological advances.

Strong Monotonicity A solution φ is strongly monotonic if for any two cost games c, c′ and all i ∈ N, c(S ∪ {i}) − c(S) ≤ c′(S ∪ {i}) − c′(S) for all S ⊆ N\{i} implies φ_i(c) ≤ φ_i(c′).

Anonymity is the classic property declaring independence of the solution from the names of the actors in the cost sharing problem. See e.g. [3,91,106]. Formally, the definition is as follows. For a given permutation π : N → N and c ∈ CG define πc ∈ CG by πc(S) = c(π(S)) for all S ⊆ N.

Anonymity A solution φ is anonymous if for all permutations π of N and all i ∈ N, φ_{π(i)}(πc) = φ_i(c) for all cost games c.

Theorem 3 (Young [144]) The Shapley value is the unique anonymous and strongly monotonic solution.

[99] introduced the balanced contributions axiom for the model of non-transferable utility games, or games without side-payments, see [118]. Within the present context of CG, a solution φ satisfies the balanced contributions axiom if for any cost game c, any non-empty subset S ⊆ N, and all {i, j} ⊆ S it holds that

φ_i(S, c) − φ_i(S\{j}, c) = φ_j(S, c) − φ_j(S\{i}, c) .          (8)

The underlying idea is the following. Suppose that players agree on using solution φ and that coalition S forms. Then φ_i(S, c) − φ_i(S\{j}, c) is the amount player i gains or loses when S is already formed and player j resigns. The balanced contributions axiom states that such gains and losses from the other player's withdrawal from the coalition should be the same.

Theorem 4 (Myerson [99]) There is a unique solution on CG that satisfies the balanced contributions axiom, and that is Φ.

The balanced contributions property can be interpreted in a bargaining context as well. In the game c and with solution φ, a player i can object against player j to the solution φ(c) when the cost share of j increases as i steps out of the cooperation, i.e. φ_j(N, c) ≤ φ_j(N\{i}, c). In turn, a counter objection by player j to this objection is the assertion that player i would suffer even more were j to end cooperation, i.e. φ_j(N, c) − φ_j(N\{i}, c) ≥ φ_i(N, c) − φ_i(N\{j}, c). The balanced contributions property is equivalent to the requirement that each objection is balanced by a counter objection. For an excellent overview of ideas developed in this spirit, see [74].

Another marginalistic approach is by [44]. Denote for c ∈ CG the game restricted to the players in S ⊆ N by (S, c). Given a function P : CG → ℝ which associates a real number P(N, c) to each cost game c with player set N, the marginal cost of a player i is defined to be D_i P(c) = P(N, c) − P(N\{i}, c). Such a function P with P(∅, c) = 0 is called a potential if ∑_{i∈N} D_i P(N, c) = c(N).

Theorem 5 (Hart & Mas-Colell [44]) There exists a unique potential function P, and for every c ∈ CG the resulting payoff vector DP(N, c) coincides with Φ(c).

Egalitarian Solution The Shapley value is one of the first solution concepts proposed within the framework of cooperative cost games, but not the most trivial one. That would be to neglect all asymmetries between the players and split total costs equally between them. But, as one may expect, egalitarianism in this pure form will not lead to a stable allocation. Just consider the two-player game in Example 10, where pure egalitarianism would dictate the allocation (5, 5), which violates individual rationality for player 1. In order to avoid these problems we can of course propose to look for the most egalitarian allocation within the core (see [7,31]). In this line of thinking, what is needed in Example 10 is a minimal transfer of cost 2 to player 2, leading to the final allocation (3, 7): the constrained egalitarian solution. Although in the former example it was clear what allocation to choose, in general we need a tool to evaluate allocations for their degree of egalitarianism. The papers mentioned above all suggest the use of the Lorenz-order (see, e.g., [8]). More precisely, consider two vectors of cost shares x and x′ such that x(N) = x′(N).
Assume that these vectors are ordered in decreasing order, so that x1 ≥ x2 ≥ … ≥ x_n and x′1 ≥ x′2 ≥ … ≥ x′_n. Then x Lorenz-dominates x′ (read: x is more egalitarian than x′) if for all k = 1, …, n − 1 it holds that

∑_{i=1}^{k} x_i ≤ ∑_{i=1}^{k} x′_i ,          (9)

with at least one strict inequality. That is, x is better for those paying the most.

Example 12 Consider the three allocations of cost 15 among three players, x = (6, 5, 4), x′ = (6, 6, 3), and x″ = (7, 4, 4). Firstly, x Lorenz-dominates x″ since x1 = 6 < 7 = x″1 and x1 + x2 = x″1 + x″2. Secondly, x Lorenz-dominates x′ since x1 = x′1 and x1 + x2 < x′1 + x′2. Notice, however, that on the basis of Eq. (9) alone we cannot judge which of the allocations x′ and x″ is the more egalitarian, since x′1 = 6 < 7 = x″1 but x′1 + x′2 = 6 + 6 > 7 + 4 = x″1 + x″2. The Lorenz-order is only a partial order.

The constrained egalitarian solution is the set of Lorenz-undominated allocations in the core of a game. Due to the partial nature of the Lorenz-order there may be more than one Lorenz-undominated element in the core. And what if the core is empty? The constrained egalitarian solution is obviously not a straightforward solution. The original idea of constrained egalitarianism as in [30] focuses on the Lorenz-core instead of the core. It is shown that there is at most one such allocation, which may exist even when the core of the underlying game is empty. For concave cost games c the allocation is well defined and denoted by E(c). In particular this holds for airport games. Intriguingly, empirical studies [1,2] show there is a tradition in using this solution for this type of problem.

For concave cost games c there exists an algorithm to compute E(c). This method, due to [30], performs the following consecutive steps. First determine the maximal set S1 of players minimizing the per capita cost c(S)/|S|, where |S| is the size of the coalition S. Each of the players in S1 pays c(S1)/|S1|. In the next step determine the maximal set S2 of players in N\S1 minimizing c2(S)/|S|, where c2 is the cost game defined by c2(S) = c(S1 ∪ S) − c(S1). The players in S2 pay c2(S2)/|S2| each. Continue in this way as long as not everybody has been allocated a cost share. Then in at most n steps this procedure results in an allocation of total cost, the constrained egalitarian solution. In short the algorithm is as follows.

Stage 0: Initialization; put S0 = ∅, x* = 0_N, and go to stage t = 1.
Stage t: Determine

S*_t ∈ arg min_{∅ ≠ S ⊆ N\S_{t−1}} [ c(S ∪ S_{t−1}) − c(S_{t−1}) ] / |S| .

Put S_t = S_{t−1} ∪ S*_t and, for i ∈ S*_t,

x*_i := [ c(S_t) − c(S_{t−1}) ] / |S*_t| .

If S_t = N we are finished; put E(c) = x*. Otherwise repeat the stage with t := t + 1.

For example, this algorithm can be used to calculate the constrained egalitarian solution for the airport game in

Cost Sharing, Figure 5 Standard fixed tree

Example 7. In the first step we determine S1 = {1, 2}, together with cost shares 10 for players 1 and 2. Player 3 is allocated the remaining cost in the next step; hence the corresponding final allocation is E(c) = (10, 10, 13).

Example 13 Consider the case where six players share the cost of the tree network in Fig. 5 that connects them to β. The standard fixed tree game c for this network associates with each coalition of players the minimum cost of connecting each member to β, where the coalition may use all the links of the tree. This type of game is known to be concave, so we can use the above algorithm to calculate E(c). In the first step we determine S1 = {1, 3, 4}, and each player herein pays 8. In the second step the game that remains is the one in which the edges e1, e3, e4 connecting S1 have been paid for. Then it easily follows that S2 = {2, 5}, so that players 2 and 5 pay 9 each, leaving 10 as the cost share for player 6. Thus we find E(c) = (8, 9, 8, 8, 9, 10).

Nucleolus Given a cost game c ∈ CG, the excess of a coalition S ⊆ N with respect to a vector x ∈ ℝ^N is defined as e(S, x) = x(S) − c(S); it measures the dissatisfaction of S under proposal x. Arrange the excesses of all coalitions S ≠ N, ∅ in decreasing order and call the resulting vector θ(x) ∈ ℝ^{2^n − 2}. A vector of cost shares x is preferred to a vector y, denoted x ≺ y, whenever θ(x) is smaller than θ(y) in the lexicographic order, i.e. there exists i* such that θ_i(x) = θ_i(y) for i ≤ i* − 1 and θ_{i*}(x) < θ_{i*}(y). Schmeidler [115] showed that in the set of individually rational cost sharing vectors there is a unique element that is maximal with respect to ≺, which is called the nucleolus. This allocation, denoted ν(c), is based on the idea of egalitarianism that the largest complaints of coalitions should consistently be minimized. The concept gained much popularity as a core-selector, i.e. it is


a one-point solution contained in the core whenever the core is non-empty. This contrasts with the constrained egalitarian solution, which might not be well defined, and the Shapley value, which may lie outside the core.

Example 14 Consider in Example 7 the excesses of the different coalitions with respect to the constrained egalitarian solution E(c) = (10, 10, 13) and the nucleolus ν(c) = (6, 7, 20):

S            1    2    3    12   13   23
e(S, E(c))  −2  −10  −20    0  −10  −10
e(S, ν(c))  −6  −13  −13   −7   −7   −6

Then the ordered excess vectors are

θ(10, 10, 13) = (0, −2, −10, −10, −10, −20) ,
θ(6, 7, 20) = (−6, −6, −7, −7, −13, −13) .

Note that indeed ν(c) ≺ E(c), since θ1(6, 7, 20) = −6 < 0 = θ1(10, 10, 13).

The nucleolus of standard fixed tree games may be calculated as a particular home-down allocation, as was pointed out by Maschler et al. [75]. For standard fixed tree games and minimum cost spanning tree games the special structure of the technology makes it possible to calculate the nucleolus in polynomial time, i.e. with a number of calculations bounded by a multiple of n^2 (see [39]). Sometimes one may even express the nucleolus through a nice formula; Legros [66] showed a class of cost sharing problems for which the nucleolus equals the SCRB solution. But in general calculations are hard and involve solving linear programs with a number of inequalities that is exponential in n. [124] suggests using the nucleolus on the cost game corresponding to hub games. Instead of the direct comparison of excesses as above, the literature also discusses weighted excesses, as to model the asymmetries of justifiable complaints within coalitions. For instance, the per capita nucleolus minimizes the maximal excesses divided by the number of players in the coalition (see [105]).
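Both the staged algorithm for E(c) and the excess comparison of Example 14 are easy to reproduce for small concave games. A brute-force Python sketch (helper names are ours; ties in the per-capita minimum go to the larger coalition, matching the "maximal set" in the text):

```python
from itertools import combinations

def egalitarian(game, players):
    """Constrained egalitarian solution of a concave cost game via the staged
    algorithm: repeatedly charge the largest coalition of remaining players
    that minimizes the per-capita incremental cost."""
    x, done = {}, frozenset()
    while done != frozenset(players):
        rest = [i for i in players if i not in done]
        best, best_avg = None, None
        for r in range(1, len(rest) + 1):
            for S in combinations(rest, r):
                avg = (game[done | frozenset(S)] - game[done]) / r
                if best is None or avg < best_avg or (avg == best_avg and r > len(best)):
                    best, best_avg = S, avg
        for i in best:
            x[i] = best_avg
        done |= frozenset(best)
    return x

def ordered_excesses(x, game, players):
    """Decreasing vector of excesses e(S, x) = x(S) - c(S), S != N and S != empty."""
    grand = frozenset(players)
    return sorted((sum(x[i] for i in S) - game[S]
                   for S in game if S and S != grand), reverse=True)

# Airport game of Example 7.
costs = {1: 12, 2: 20, 3: 33}
game = {frozenset(S): (max(costs[i] for i in S) if S else 0)
        for r in range(4) for S in combinations([1, 2, 3], r)}
print(egalitarian(game, [1, 2, 3]))  # {1: 10.0, 2: 10.0, 3: 13.0}
print(ordered_excesses({1: 10, 2: 10, 3: 13}, game, [1, 2, 3]))
# [0, -2, -10, -10, -10, -20]
print(ordered_excesses({1: 6, 2: 7, 3: 20}, game, [1, 2, 3]))
# [-6, -6, -7, -7, -13, -13]
```

Since Python compares lists lexicographically, the final two vectors directly verify that the nucleolus is preferred to E(c) in the ≺-order of Example 14. Computing the nucleolus itself is a harder optimization problem, as the text notes.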

Cost Sharing, Figure 6 Induced cost sharing rules

Cost Sharing Rules Induced by Solutions

Most of the above numerical examples deal with cost sharing problems which have a natural and intuitive representation as a cost game; on such domains of cost sharing problems there is basically no difference between cost sharing rules and solutions. It may seem that the cooperative solutions are restricted to this kind of situation. But recall that each cost sharing problem (x, c) is associated with its stand-alone cost game c_x ∈ CG, as in Eq. (1). Now let φ be a solution on a subclass A ⊆ CG and B a class of cost sharing problems (x, c) for which c_x ∈ A. Then a cost sharing rule μ is defined on B through

μ(x, c) = φ(c_x) .          (10)

The general idea is illustrated in Fig. 6. For example, since the Shapley value is defined on the class of all cost games, it defines a cost sharing rule μ_Φ on the class of all cost sharing problems. The cost sharing rule μ_E is defined on the general class of cost sharing problems with a corresponding concave cost game. Cost sharing rules derived in this way, game-theoretical rules according to [130], will be most useful below.

Non-cooperative Cost Games

Formulating the cost sharing problem through a cooperative cost game assumes inelastic demands of the players. It might well be that for some player the private merits of service do not outweigh the cost share calculated by the planner. She will try to block the payment when no service at no cost is a preferred outcome. Another aspect is that the technology may operate at a sub-optimal level if the benefits of delivered services are not taken into account. Below the focus is on a broader framework with elastic demands, which incorporates preferences of a player defined over combinations of service levels and cost shares. The theory of non-cooperative games provides a proper framework in which we can discuss individual aspirations and efficiency of outcomes on a larger scale.

Strategic Demand Games At the heart of this non-cooperative theory is the notion of a strategic game, which models an interactive decision-making process among a group of players whose decisions may impact the consequences for others. Simultaneously, each player i independently chooses some available action


a_i, and the realized action profile a = (a1, a2, …, a_n) is associated with some consequence f(a). Below we will have in mind demands or offered contributions as actions, and consequences are combinations of service levels with cost shares.

Preferences Over Consequences Denote by A the set of possible action profiles and by C the set of all consequences of action. Throughout we will assume that players have preferences over the different consequences of action. Moreover, such a preference relation can be expressed by a utility function u_i : C → ℝ such that for z, z′ ∈ C it holds that u_i(z) ≤ u_i(z′) if agent i weakly prefers z′ to z. Below, the set of consequences for agent i ∈ N will consist of pairs (x, y), where x is the level of service and y a cost share, so that utilities are specified through multi-variable functions (x, y) ↦ u_i(x, y).

Preferences Over Action Profiles In turn define for each agent i and all a ∈ A, U_i(a) = u_i(f(a)); then U_i assigns to each action profile the utility of its consequence. We will say that the action profile a′ is weakly preferred to a by agent i if U_i(a) ≤ U_i(a′); U_i is called agent i's utility function over action profiles.

Strategic Game and Nash Equilibrium A strategic game is an ordered triple G = ⟨N, (A_i)_{i∈N}, (U_i)_{i∈N}⟩ where
– N = {1, 2, …, n} is the set of players,
– A_i is the set of available actions for player i,
– U_i is player i's utility function over action profiles.

Rational players in a game will choose optimal actions in order to maximize utility. The most commonly used concept in game theory is that of Nash equilibrium, a profile of strategies from which unilateral deviation by a single player does not pay. It can be seen as a steady state of action in which players hold correct beliefs about the actions taken by others and act rationally. An important assumption here is the level at which the players understand the game; usually it is taken as a starting point that players know the complete description of the game, including the action spaces and preferences of others.

Nash Equilibrium (Nash 1950) An action profile a* in a strategic game G = ⟨N, (A_i)_{i∈N}, (U_i)_{i∈N}⟩ is a Nash equilibrium if for every player i it holds that U_i(a*) ≥ U_i(a_i, a*_{−i}) for every a_i ∈ A_i.

The literature discusses several refinements of this equilibrium concept. One that will play a role in the games below is that of strong Nash equilibrium due to Aumann [9]; it is

a Nash equilibrium a* in a strategic game G such that for every S ⊆ N and every action profile a_S there exists a player i ∈ S such that U_i(a_S, a*_{N\S}) ≤ U_i(a*). This means that a strong Nash equilibrium guarantees stability against coordinated deviations, since within the deviating coalition there is at least one agent who does not strictly improve.

Example 15 Consider the following two-player strategic game with N = {1, 2}, A1 = {T, B} and A2 = {L, M, R}. Let the utilities be as in the table below.

      L    M    R
T    5,4  2,1  3,2
B    4,3  5,2  2,5

Here player 1 chooses a row, and player 2 a column. The numbers in the cells summarize the individual utilities corresponding to the action profiles; the first number is the utility of player 1, the second that of player 2. In this game there is a unique Nash equilibrium, the action profile (T, L).

Dominance in Strategic Games  In the game G = ⟨N, (A_i)_{i∈N}, (U_i)_{i∈N}⟩, the action a_i ∈ A_i is weakly dominated by a′_i ∈ A_i if U_i(a_i, a_{−i}) ≤ U_i(a′_i, a_{−i}) for all a_{−i} ∈ A_{−i}, with strict inequality for some profile of actions a_{−i}. If strict inequality holds for all a_{−i}, then a_i is strictly dominated by a′_i. Rational players will not use strictly dominated strategies, and, as far as prediction of play is concerned, these may be eliminated from the set of possible actions. If we do this elimination step for each player, then we may reconsider whether some actions are dominated within the reduced set of action profiles. This step-by-step reduction of action sets is called the procedure of successive elimination of (strictly) dominated strategies. The set of all action profiles surviving this procedure is denoted by D^∞.

Example 16  In Example 15 action M of player 2 is strictly dominated by L and by R. Player 1 has no dominated actions. Now eliminate M from the actions of player 2. Then the reduced game is

        L     R
  T    5,4   3,2
  B    4,3   2,5

Notice that action B for player 1 was not dominated in the original game, for the reason that B was the better of the two actions against M. But if M is never played, T is strictly better than B. Now eliminate B, yielding the reduced game

        L     R
  T    5,4   3,2


In this game, L dominates R; hence the only action profile surviving the successive elimination of strictly dominated strategies is (T, L). A stronger notion than dominance is the following. Call an action a_i ∈ A_i overwhelmed by a′_i ∈ A_i if

max { U_i(a_i, a_{−i}) | a_{−i} ∈ A_{−i} } < min { U_i(a′_i, a_{−i}) | a_{−i} ∈ A_{−i} } .

Then O^∞ is the set of all action profiles surviving the successive elimination of overwhelmed actions. This notion is due to [34,37]. In Example 15 the action M is overwhelmed by L, but not by R. Moreover, the actions remaining in O^∞ are B, T, L, and R.

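The elimination procedure for Example 15 can be sketched in a few lines; the helper names below are ours, not from the text, and the simultaneous-elimination order is one of several equivalent choices:

```python
# Payoff tables of the two-player game of Example 15:
# keys are (row action of player 1, column action of player 2).
U1 = {("T", "L"): 5, ("T", "M"): 2, ("T", "R"): 3,
      ("B", "L"): 4, ("B", "M"): 5, ("B", "R"): 2}
U2 = {("T", "L"): 4, ("T", "M"): 1, ("T", "R"): 2,
      ("B", "L"): 3, ("B", "M"): 2, ("B", "R"): 5}

def strictly_dominated(action, own, other, U, row_player):
    """Is `action` strictly dominated by some other available own action?"""
    def u(a_i, a_j):
        return U[(a_i, a_j)] if row_player else U[(a_j, a_i)]
    return any(all(u(b, a_j) > u(action, a_j) for a_j in other)
               for b in own if b != action)

def iterated_elimination(A1, A2):
    """Successively remove strictly dominated actions until none remain."""
    A1, A2 = list(A1), list(A2)
    while True:
        keep1 = [a for a in A1 if not strictly_dominated(a, A1, A2, U1, True)]
        keep2 = [a for a in A2 if not strictly_dominated(a, A2, A1, U2, False)]
        if keep1 == A1 and keep2 == A2:
            return A1, A2
        A1, A2 = keep1, keep2

print(iterated_elimination(["T", "B"], ["L", "M", "R"]))  # → (['T'], ['L'])
```

The surviving profile (T, L) is exactly the unique Nash equilibrium found above.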
Demand Games
Strategic games in cost sharing problems arise when we assume that the users of the production technology choose their demands strategically, while a cost sharing rule sees to an allocation of the corresponding service costs. The action spaces A_i are simply the demand spaces of the agents, and utilities are specified over combinations of (level of) received service and accompanying cost shares. Hence utilities are defined over consequences of action: u_i(q_i, x_i) denotes i's utility at receiving service level q_i and cost share x_i; u_i is increasing in the received service level q_i and decreasing in the allocated cost x_i. Now assume a cost function c and a cost sharing rule φ. Then given a demand profile a = (a_1, a_2, …, a_n) the cost sharing rule determines a vector of cost shares φ(a, c), and in turn also the corresponding utilities over demands, U_i(a) = u_i(a_i, φ_i(a, c)). Observe that agents influence each other's utility via the cost component. The demand game for this situation is then the strategic game

G(φ, c) = ⟨N, (A_i)_{i∈N}, (U_i)_{i∈N}⟩ .   (11)

Example 17  Consider the airport problem in Example 7. Each player may now request service (1) or not (0). Then the cost function is fully described by the demand of the largest player: c(x) = 33 if player 3 requires service, c(x) = 20 for all profiles with x_2 = 1, x_3 = 0, c(x) = 12 if x = (1, 0, 0), and c(0, 0, 0) = 0. Define the cost sharing rule Φ(x, c) = Φ(c_x), that is, Φ calculates the Shapley value for the underlying cost game c_x as in Eq. (1). Assume that the players' preferences over ordered pairs of service level and cost share are fully described by

u_1(q_1, x_1) = 8q_1 − x_1 ,
u_2(q_2, x_2) = 6q_2 − x_2 ,
u_3(q_3, x_3) = 30q_3 − x_3 .

Here q_i takes values 0 (no service) or 1 (service) and x_i stands for the allocated cost. So player 1 prefers being served at unit cost to not being served at zero cost: u_1(0, 0) = 0 < 7 = u_1(1, 1). The infrastructure is seen as an excludable public good, so those with demand 0 do not get access to the technology. Each player now actively chooses to be served or not, so her action set is A_i = {0, 1}. Recall the definition of c_x as in Eq. (1). Then given a profile of such actions a = (a_1, a_2, a_3) and cost shares Φ(a, c), the utilities of the players in terms of action profiles become U_i(a) = u_i(a_i, Φ_i(a, c)), so that

U_1(a) = 8a_1 − Φ_1(a, c) ,
U_2(a) = 6a_2 − Φ_2(a, c) ,
U_3(a) = 30a_3 − Φ_3(a, c) .

Now that we have provided all details of the demand game G(Φ, c), let us look for (strong) Nash equilibria. Suppose that the action profile a* = (1, 0, 1) is played in the game. Then the complete infrastructure is realized just for players 1 and 3, and the cost allocation is given by (6, 0, 27); the vector of individual utilities is then (2, 0, 3). Now if we consider unilateral deviations from a*, what happens to the individual utilities?

U_1(0, 0, 1) = 0 < 2 = U_1(1, 0, 1) ,
U_2(1, 1, 1) = 6 − Φ_2((1, 1, 1), c) = 6 − 8 = −2 < 0 = U_2(1, 0, 1) ,
U_3(1, 0, 0) = 0 < 3 = U_3(1, 0, 1) .

This means that unilateral deviation does not pay for any player: a* is a Nash equilibrium. The first inequality shows as well why the action profile (0, 0, 1) is not. It is easy to see that the other Nash equilibrium of this game is the action profile (0, 0, 0); no player can afford the completion of the infrastructure just for herself. Notice however that this zero profile is not a strong Nash equilibrium, as players 1 and 3 may well do better by choosing service at the same time, ending up in (1, 0, 1). The latter profile is the unique strong Nash equilibrium of the game. Similar considerations in the demand game G(φ^E, c) induced by the constrained egalitarian solution lead to the unique strong Nash equilibrium (0, 0, 0): nobody wants service. With cost sharing rules as decentralization tools, the literature postulates Nash equilibria of the related demand game as the resulting behavioral mode. This is a delicate step because – as the example above shows – it is easy
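The equilibrium analysis lends itself to a brute-force check. The sketch below (our own, assuming the airport cost data c(1,0,0) = 12, c(·,1,0) = 20, c(·,·,1) = 33 of Example 7) enumerates all eight action profiles of G(Φ, c) and tests unilateral deviations:

```python
from itertools import permutations, product

ALPHA = (8, 6, 30)  # maximal willingness to pay (Example 17)

def cost(served):
    """Airport cost: the largest requesting player determines the cost."""
    if 3 in served: return 33
    if 2 in served: return 20
    if 1 in served: return 12
    return 0

def shapley(served):
    """Shapley cost shares of the served coalition; outsiders pay nothing."""
    shares = {i: 0.0 for i in (1, 2, 3)}
    orders = list(permutations(sorted(served)))
    for order in orders:
        for k in range(len(order)):
            shares[order[k]] += (cost(order[:k+1]) - cost(order[:k])) / len(orders)
    return shares

def utility(i, a):
    served = tuple(j for j in (1, 2, 3) if a[j-1] == 1)
    return ALPHA[i-1] * a[i-1] - shapley(served)[i]

def is_nash(a):
    """No player strictly gains by a unilateral change of her action."""
    return all(utility(i, a) >= utility(i, a[:i-1] + (d,) + a[i:]) - 1e-9
               for i in (1, 2, 3) for d in (0, 1))

nash = [a for a in product((0, 1), repeat=3) if is_nash(a)]
print(nash)  # the two Nash equilibria (0,0,0) and (1,0,1)
```

The enumeration confirms that exactly the profiles (0, 0, 0) and (1, 0, 1) survive.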


to find games with many equilibria, which causes a selection problem. And what can we do if there are no equilibria? This will not be the topic of this text; the interested reader is referred to any standard textbook on game theory, for instance [103,104,109]. If there is a unique equilibrium, then it is taken as the prediction of actual play.

Demand Revelation Games
For a social planner one way to retrieve the level at which to operate the production facility is via a pre-specified demand game. Another way is to ask each of the agents for the maximal amount that she is willing to contribute in order to get service, and then, contingent on the reported amounts, install an adequate level of service together with a suitable allocation of costs. As opposed to demand games, in which a player ensures herself a level of service, in a demand revelation game each player is able to ensure a maximal charge for service. The approach will be discussed under the assumption of a discrete production technology with binary demands, so that the cost function c for the technology is basically the characteristic function of a cooperative game. Moreover, assume that the utilities of the agents in N are quasi-linear and given by

u_i(q_i, x_i) = α_i q_i − x_i   (12)

where q_i ∈ {0, 1} denotes the service level, x_i stands for the cost share, and α_i is a non-negative real number. [93] discusses this framework and assumes that c is concave; [64,148] moreover take c as the joint cost function for the realization of several discrete public goods.

Demand Revelation Mechanisms  Formally, a revelation mechanism M assigns to each profile η of reported maximal contributions a set S(η) of agents receiving service and a vector x(η) of monetary compensations. Here we will require that these monetary compensations are cost shares; given some cost sharing rule φ, the vector x(η) is given by φ(1_{S(η)}, c), where c is the relevant cost function. Moreover, note that by restricting ourselves to cost share vectors we implicitly assume non-positive monetary transfers. The budget balance condition is crucial here; otherwise mechanisms of a different nature must be considered as well, see [23,40,41]. Another way for a planner to determine a suitable service level is to demand pre-payment from the players, and to base the service level on these labeled contributions, see [64,148]. Many mechanisms come to mind, but in order to avoid too much arbitrariness on the planner's side, the more sensible ones will grant the players some control over the outcomes. We postulate the following properties:

• Voluntary Participation (VP)  Each agent i can guarantee herself the welfare level u_i(0, 0) (no service, no payment) by reporting truthfully her maximal willingness to pay, which is α_i under Eq. (12).
• Consumer Sovereignty (CS)  For each agent i a report y_i exists such that she receives service, irrespective of the reports by others.

Now suppose that the planner receives the message η = α, known to her as the profile of true player characteristics. Then for economic reasons she could choose to serve a coalition S of players that maximizes the net benefit at α, π(S, α) = α(S) − c(S). However, problems will arise when some player i is supposed to pay more than α_i, so the planner should be more careful than that. She may want to choose a coalition S with maximal π(S, α) such that φ(1_S, c) ≤ α holds; such a set S is called efficient. But in general the planner cannot tell whether the players reported truthfully or not; what should she do then? One option is that she applies the above procedure, thereby naively holding each reported profile η for the true player characteristics. In other words, she will pick a coalition that solves the following optimization problem:

max_{S⊆N}  π(S, η) = η(S) − c(S)   s.t.  φ(1_S, c) ≤ η .   (13)

Denote such a set by S(η, φ). If this set is unique, then the demand revelation mechanism M(φ) selects S(η, φ) to be served at cost shares determined by x(η) = φ(1_{S(η,φ)}, c). This procedure will be explained through some numerical examples.

Example 18  Consider the airport problem and the utilities of players over service levels and cost shares as in Example 7. Moreover assume the planner uses the Shapley cost sharing rule Φ as in Example 17 and that she receives the true profile of preferences from the players, α = (8, 6, 30). Calculate for each coalition S the net benefit at α:

  S          ∅    1     2     3    12   13   23    N
  π(S, α)    0   −4   −14   −3   −6    5    3    11

Not surprisingly, the net benefits are highest for the grand coalition. But if N were selected by the mechanism, the corresponding cost shares would be Φ(1_N, c) = (4, 8, 21), and player 2 would be supposed to contribute more than she is willing to. The second highest net benefit is generated by serving S = {1, 3}, with cost shares Φ(1_S, c) = (6, 0, 27). Hence {1, 3} is the solution to Eq. (13).
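A minimal sketch of the mechanism M(Φ): enumerate coalitions, impose the cost-share constraint of Eq. (13), and pick the feasible coalition with the largest net benefit (helper names are ours; the airport cost data of Example 7 is assumed):

```python
from itertools import combinations, permutations

def cost(S):
    """Airport cost of Example 7: pay for the longest requested runway piece."""
    if 3 in S: return 33
    if 2 in S: return 20
    if 1 in S: return 12
    return 0

def shapley(S):
    """Shapley cost shares for coalition S (players outside S pay nothing)."""
    shares = {i: 0.0 for i in (1, 2, 3)}
    orders = list(permutations(sorted(S)))
    for order in orders:
        for k in range(len(order)):
            shares[order[k]] += (cost(order[:k+1]) - cost(order[:k])) / len(orders)
    return shares

def mechanism(eta):
    """Serve the coalition maximizing net benefit subject to shares <= reports."""
    best, best_value = frozenset(), 0.0
    for r in range(1, 4):
        for S in combinations((1, 2, 3), r):
            sh = shapley(S)
            if all(sh[i] <= eta[i-1] for i in S):   # constraint of Eq. (13)
                value = sum(eta[i-1] for i in S) - cost(S)
                if value > best_value:
                    best, best_value = frozenset(S), value
    return best

print(mechanism((8, 6, 30)))    # truthful reports: serves {1, 3}
print(mechanism((13, 6, 20)))   # misrepresented reports: serves {1}
```

Under the truthful profile (8, 6, 30) the mechanism indeed serves {1, 3}; under the misreport (13, 6, 20) discussed next, it serves only player 1.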


What happens if some of the players misrepresent their preferences, for instance as in η = (13, 6, 20)? The planner determines the conceived net benefits:

  S          ∅    1     2     3    12   13   23    N
  π(S, η)    0    1   −14  −13   −1    0   −7    6

Again, if the planner served the coalition with the highest net benefit, N, then player 2 would refuse to pay. The second highest net benefit corresponds to the singleton S = {1}, and this player will get service under M(Φ) since η_1 = 13 > 12 = c(1, 0, 0).

Example 19  Consider the same situation, but now with the constrained egalitarian solution φ^E instead of Φ as cost sharing rule. Consider the following table, restricted to the coalitions with non-negative net benefits (all others will not be selected):

  S             ∅          13           23                N
  π(S, α)       0           5            3               11
  φ^E(1_S, c)   (0, 0, 0)   (12, 0, 21)  (0, 33/2, 33/2)  (10, 10, 13)

Here only the empty coalition S = ∅ satisfies the requirement φ^E(1_S, c) = (0, 0, 0) ≤ (8, 6, 30); hence according to the mechanism M(φ^E) nobody will get service. In general, the optimization problem Eq. (13) does not give unique solutions, in which case the planner should specify further what she does. For concave cost functions, consider the following sequence of coalitions:

S^1 = N ,   S^t = { i ∈ S^{t−1} | η_i ≥ φ_i(1_{S^{t−1}}, c) } .

So, starting with the grand coalition N, at each consecutive step those players are removed whose maximal contributions are not consistent with the proposed cost share – until the process settles down. The set of remaining players defines a solution to Eq. (13) and is taken to define S(η, φ).

Strategyproofness  The essence of a demand revelation mechanism M is that its rules are set up in such a way that they provide enough incentives for the players not to lie about their true preferences. We will now discuss the most common non-manipulability properties of revelation mechanisms in the literature. Fix two profiles α′, α ∈ ℝ_+^N, where α corresponds to the true maximal willingnesses to pay. Let (q′, x′) and (q, x) be the allocations implemented by the mechanism M on receiving the messages α′ and α, respectively. The mechanism M is called strategy-proof if it holds for all i ∈ N that α′_{N∖{i}} = α_{N∖{i}} implies u_i(q′_i, x′_i) ≤ u_i(q_i, x_i). So, given the situation that the other agents report truthfully, unilateral deviation by agent i from the true

preference never produces better outcomes for her. Similarly, M is group strategy-proof if deviations by groups of agents do not pay for any of the deviators, i.e., for all T ⊆ N the fact that α′_{N∖T} = α_{N∖T} implies u_i(q′_i, x′_i) ≤ u_i(q_i, x_i) for all i ∈ T. So, under a (group) strategy-proof mechanism there is no incentive to act untruthfully by misrepresenting the true preferences, and this gives a benevolent planner control over the outcome.

Cross-Monotonicity  A cost sharing rule φ is called cross-monotonic if, in case of a concave cost function c, the cost share of an agent i does not increase when other agents demand more service. Formally, if x ≤ x′ and x_i = x′_i, then φ_i(x′, c) ≤ φ_i(x, c); each agent is granted a (fair) share of the positive externality due to an increase in demand by others.

Proposition 1 (Moulin & Shenker [93])  The only group strategy-proof mechanisms M(φ) satisfying VP and CS are those related to cross-monotonic cost sharing rules φ.

There are many different cross-monotonic cost sharing rules, and thus just as many mechanisms that are group strategy-proof. Examples include the mechanisms M(Φ) and M(φ^E), because Φ and φ^E are cross-monotonic. However, the nucleolus is not cross-monotonic and therefore does not induce a strategy-proof mechanism. Above we discussed two instruments a social planner may invoke to implement a desirable outcome without knowing the true preferences of the agents. Basically, the demand revelation games define a so-called direct mechanism: the announcement of a maximal price for service pins down the complete preferences of an agent, so in fact the planner decides upon the service level based on a complete profile of preferences. In case of a cross-monotonic cost sharing rule, truth-telling is a weakly dominant strategy under the induced mechanism; announcing the true maximal willingness to pay is optimal for the agent regardless of the actions of others.
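Cross-monotonicity can be illustrated numerically. The brute-force check below (our own sketch, reusing the airport cost data of Example 7) verifies that enlarging the served coalition never raises the Shapley share of an incumbent member:

```python
from itertools import permutations, combinations

def cost(S):
    """Airport cost (Example 7): the longest requested runway piece."""
    return 33 if 3 in S else 20 if 2 in S else 12 if 1 in S else 0

def shapley(S):
    """Shapley cost shares of the served coalition S."""
    sh = {i: 0.0 for i in (1, 2, 3)}
    orders = list(permutations(S))
    for order in orders:
        for k in range(len(order)):
            sh[order[k]] += (cost(order[:k+1]) - cost(order[:k])) / len(orders)
    return sh

# Check cross-monotonicity on the binary demand lattice: for every pair of
# coalitions S ⊂ T, no member of S pays more under T than under S.
for r in range(1, 3):
    for S in combinations((1, 2, 3), r):
        for T in combinations((1, 2, 3), r + 1):
            if set(S) <= set(T):
                shS, shT = shapley(S), shapley(T)
                assert all(shT[i] <= shS[i] + 1e-9 for i in S)
print("Shapley shares are cross-monotonic on the airport game")
```

For instance, player 1's share falls from 12 (served alone) to 6 (with player 3) to 4 (grand coalition), in line with the definition.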
This is good news for the social planner, as the mechanism is self-organizing: the agents need not form any conjecture about the behavior of others in order to know what to do. In the literature [24] such a mechanism is called straightforward. The demand games define an indirect mechanism, as by reporting a demand the agents do no more than signal their preferences to the planner. Although in general there is a clear distinction between direct and indirect mechanisms, in the model presented in this section they are nevertheless strongly connected. Focus on a demand game G(φ, c); recall that in this game the agents simultaneously and independently decide whether to request service, and costs are shared using the rule φ amongst those agents receiving service. Suppose that for each profile of utility functions as in Eq. (12) the resulting game G(φ, c) has a unique (strong) Nash equilibrium. Then this equilibrium can be taken to define a mechanism: the mechanism elicits u and chooses the unique equilibrium outcome of the reported demand game. This mechanism is equivalent to the demand revelation mechanism. Observe that indeed the strong equilibrium (1, 0, 1) in the game G(Φ, c) of Example 17 corresponds to the players chosen by M(Φ) under truthful reporting. And where no player is served in the strong equilibrium of G(φ^E, c), none of the players is selected by M(φ^E). It is a general result in implementation theory due to [24] that a direct mechanism constructed in this way is (group) strategy-proof provided the underlying space of preferences is rich. It is easily seen that the above sets of preferences meet the requirements. To stress the importance of such a structural property as richness, it is instructive to point at what is yet to come in Sect. "Uniqueness of Nash-Equilibria in P¹-Demand Games". There, the strategic analysis of the demand game induced by the proportional rule shows uniqueness of Nash equilibrium on the domain of preferences L provided costs are convex. However, this domain is not rich, and the direct mechanism defined in the same fashion as above by the Nash equilibrium selection is not strategy-proof.

Efficiency and Strategy-Proof Cost Sharing Mechanisms  Suppose cardinal utility for each agent, so that interpersonal comparison of utility is allowed. Proceeding on the net benefit of a coalition, we may define the value of the grand coalition at α by

v(N, α) = max_{S⊆N} π(S, α) ,   (14)

where π(S, α) is the net benefit of S at α. A coalition S such that v(N, α) = π(S, α) is called efficient. It will be clear that a mechanism M(φ) defined through the optimization problem Eq. (13) will not always implement an efficient coalition of served players, due to the extra constraint on the cost shares. For instance, in Example 7 the value of the grand coalition at α = (8, 6, 30) is given by v(N, α) = α(N) − c(N) = 44 − 33 = 11. At the same profile the outcome implemented by the mechanism M(Φ), serving {1, 3}, gives rise to a total surplus of 38 − 33 = 5 – which is not optimal. The mechanism M(φ^E) performs even worse, as it leads to the stand-alone surplus 0: nobody is served. This observation holds for far more general settings, and, moreover, it is a well-known result from implementation theory that – under non-constant marginal cost – any strategy-proof mechanism based on full coverage of total

costs will not always implement efficient outcomes. For the constant marginal cost case see [67,71]. Then, if there is an unavoidable loss in using demand revelation mechanisms, can we still tell which mechanisms are more efficient? Is it a coincidence that in the above examples the Shapley value performs better than the egalitarian solution? The welfare loss due to M(φ) at a profile of true preferences α is given by

L(φ, α) = v(N, α) − { α(S(φ, α)) − c(S(φ, α)) } .   (15)

For instance, with α = (8, 6, 30) in the above examples we calculate L(Φ, α) = 11 − 5 = 6 and L(φ^E, α) = 11 − 0 = 11. An overall measure of the quality of a cost sharing rule φ in terms of efficiency loss is defined by

ε(φ) = sup_α L(φ, α) .   (16)

Theorem 6 (Moulin & Shenker [93])  Among all mechanisms M(φ) derived from cross-monotonic cost sharing rules φ, the Shapley rule Φ has the unique smallest maximal efficiency loss: ε(Φ) < ε(φ) if φ ≠ Φ.

Notice that this makes a strong case for the Shapley value against the egalitarian solution. The story does, however, not end here. [98] considers a model where the valuations of the agents for the good are independent random variables, drawn from a distribution function F satisfying the monotone hazard condition. This means that the function defined by h(x) = f(x)/(1 − F(x)) is non-decreasing, where f is the density function of F. It is shown that the constrained egalitarian solution maximizes the probability that all members of any given coalition accept the cost shares imputed to them. Moreover, [98] characterized the solution in terms of efficiency. Suppose for the moment that the planner calculated a cost share vector x for the coalition S, and that its members are served conditional on acceptance of the proposed cost shares. The probability that all members of S accept the shares is given by P(x) = ∏_{i∈S} (1 − F(x_i)), and if we assume that the support of F is (0, m), then the expected surplus from such an offer can be calculated as follows:

W(x) = P(x) [ Σ_{i∈S} ∫_{x_i}^{m} u_i / (1 − F(x_i)) dF(u_i) − c(S) ] .   (17)

The finding of [98] is that for log-concave f, i.e. x ↦ ln(f(x)) is concave [5], the mechanism based on the constrained egalitarian solution not only maximizes the probability that a coalition accepts the proposal, but maximizes its expected surplus as well. Formally, the result is the following.


Theorem 7 (Mutuswami (2004))  If the profile of valuations (u_i)_{i∈N} is independently drawn from a common distribution function F with log-concave and differentiable density function f, then W(φ^E(1_S, c)) ≥ W(φ(1_S, c)) for all cross-monotonic solutions φ and all S ⊆ N.

Extension of the Model: Discrete Goods  Suppose the agents consume idiosyncratic goods produced in indivisible units. Given a profile of demands, the cost associated with the joint production must be shared by the users. This model generalizes the binary good model discussed so far, and it is a launch-pad to the continuous framework in the next section. In this discrete good setting [86] characterizes the cost sharing rules which induce strategy-proof social choice functions defined by the equilibria of the corresponding demand game. As it turns out, these rules are basically the sequential stand-alone rules, according to which costs are shared in an incremental fashion with respect to a fixed ordering of the agents. This means that such a rule charges the first agent her stand-alone cost, the second agent the stand-alone cost of the first two users minus the stand-alone cost of the first, et cetera. Here the word 'basically' refers to all discrete cost sharing problems other than those with binary goods. For binary goods, the sufficient condition for strategyproofness is that the underlying cost sharing rule be cross-monotonic, which admits rules other than the sequential ones – like φ^E and Φ.
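The incremental scheme of a sequential stand-alone rule can be sketched directly from its description; the function names are ours, and the airport costs of Example 7 are reused purely as an illustration:

```python
def sequential_stand_alone(order, stand_alone_cost):
    """Charge each agent, in the fixed order, the increment of the stand-alone
    cost of her predecessors-plus-herself over that of her predecessors."""
    shares, served = {}, []
    for i in order:
        shares[i] = stand_alone_cost(served + [i]) - stand_alone_cost(served)
        served.append(i)
    return shares

def c(S):
    """Airport-style stand-alone cost (Example 7 data)."""
    return 33 if 3 in S else 20 if 2 in S else 12 if 1 in S else 0

print(sequential_stand_alone([1, 2, 3], c))  # → {1: 12, 2: 8, 3: 13}
```

With the ordering 1, 2, 3 the rule charges the incremental runway pieces 12, 8, and 13, which together cover the total cost 33.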

Continuous Cost Sharing Models

Continuous Homogeneous Output Model, P¹
This model deals with production technologies for one single perfectly divisible output commodity. Moreover, we will restrict ourselves to private goods. Many ideas below have been studied for public goods as well; for further references see, e.g., [33,72,82]. The demand space of an individual is given by X = ℝ_+. The technology is described by a non-decreasing cost function c : ℝ_+ → ℝ with c(0) = 0, i.e. there are no fixed costs. Given a profile of demands x ∈ ℝ_+^N, the costs c(x(N)) have to be shared. Moreover, the space of cost functions will be restricted to those c that are absolutely continuous; examples include the differentiable and the Lipschitz-continuous functions. Absolute continuity implies that the aggregate cost of production can be calculated as the total of marginal costs:

c(y) = ∫_0^y c′(t) dt .

Denote the set of all such cost functions by C¹ and the related cost sharing problems by P¹. Several cost sharing rules on P¹ have been proposed in the literature.

Average Cost Sharing Rule  This is the most popular and oldest concept in the literature and advocates Aristotle's principle of proportionality:

φ^AV(x, c) = (x / x(N)) · c(x(N))   if x ≠ 0_N ,
φ^AV(x, c) = 0_N                    if x = 0_N .   (18)

Shapley–Shubik Rule  Each cost sharing problem (x, c) ∈ P¹ is related to the stand-alone cost game c_x such that c_x(S) = c(x(S)). The Shapley–Shubik rule is determined by applying the Shapley value to this game:

φ^SS(x, c) = Φ(c_x) .

Serial Rule  This rule, due to Moulin and Shenker [91], determines cost shares by considering particular intermediate cost levels. More precisely, given (x, c) ∈ P¹ it first relabels the agents by increasing demands such that x_1 ≤ x_2 ≤ … ≤ x_n. The intermediate production levels are

y^1 = n x_1 ,  y^2 = x_1 + (n − 1) x_2 ,  … ,  y^k = Σ_{j=1}^{k−1} x_j + (n − k + 1) x_k ,  … ,  y^n = x(N) .

These levels are chosen such that at each new level one agent more is fully served her demand: at y^1 each agent is handed out x_1, at y^2 agent 1 is given x_1 and the rest x_2, etc. The serial cost shares are now given by

φ^SR_i(x, c) = Σ_{ℓ=1}^{i} [c(y^ℓ) − c(y^{ℓ−1})] / (n − ℓ + 1) .

So according to SR each agent pays a fair share of the incremental costs in each stage in which she receives new units.

Example 20  Consider the cost sharing problem (x, c) with x = (10, 20, 30) and c(y) = ½y². First calculate the intermediate production levels y^0 = 0, y^1 = 30, y^2 = 50, and y^3 = 60. Then the cost shares are calculated as follows:

φ^SR_1(x, c) = [c(y^1) − c(y^0)] / 3 = 150 ,
φ^SR_2(x, c) = φ^SR_1(x, c) + [c(y^2) − c(y^1)] / 2 = 150 + (1250 − 450)/2 = 550 ,
φ^SR_3(x, c) = φ^SR_2(x, c) + c(y^3) − c(y^2) = 550 + 550 = 1100 .

The serial rule has attracted much attention lately in the network literature, and has found its way into fair-queuing packet scheduling algorithms in routers [27].

Decreasing Serial Rule  De Frutos [26] proposes serial cost shares where the demands of the agents are served in decreasing order; the result is the decreasing serial rule. Consider a demand vector x ∈ ℝ_+^N such that x_1 ≤ x_2 ≤ … ≤ x_n. Define recursively the numbers ȳ^ℓ for ℓ = 1, 2, …, n by ȳ^ℓ = ℓx_ℓ + x_{ℓ+1} + … + x_n, and put ȳ^{n+1} = 0. Then the decreasing serial rule is defined by

φ^DSR_i(x, c) = Σ_{ℓ=i}^{n} [c(ȳ^ℓ) − c(ȳ^{ℓ+1})] / ℓ .   (19)

Example 21  For the cost sharing problem of Example 20 calculate ȳ^1 = 60, ȳ^2 = 70, ȳ^3 = 90; then

φ^DSR_3(x, c) = [c(ȳ^3) − c(ȳ^4)] / 3 = (4050 − 0)/3 = 1350 ,
φ^DSR_2(x, c) = φ^DSR_3(x, c) + [c(ȳ^2) − c(ȳ^3)] / 2 = 1350 + (2450 − 4050)/2 = 550 ,
φ^DSR_1(x, c) = φ^DSR_2(x, c) + c(ȳ^1) − c(ȳ^2) = 550 + (1800 − 2450) = −100 .

Notice that here the cost share of agent 1 is negative, due to the convexity of c! This may be considered an undesirable feature of the cost sharing rule. Not only are costs increasing in the level of demand; in case of a convex cost function each agent contributes to the negative externality. It seems fairly reasonable to demand a non-negative contribution in those cases, so that no one profits from just being there. The mainstream cost sharing literature includes positivity of cost shares in the very definition of a cost sharing rule. Here we will add it as a specific property:

Positivity  φ is positive if φ(x, c) ≥ 0_N for all (x, c) in its domain.

All earlier discussed cost sharing rules have this property, except for the decreasing serial rule. The decreasing serial rule is far more intuitive in case of economies of scale, i.e. in the presence of a concave cost function: the larger agents are then credited with a lower price per unit of the output good. [47,57] propose variations on the serial rule that coincide with the increasing (decreasing) serial rule in case of a convex (concave) cost function, thereby meeting the positivity requirement.

Marginal Pricing Rule  A popular way of pricing the output of a production facility is marginal cost pricing: the price of the output good is set to cover the cost of producing one extra unit. It is frequently used in the domain of public services and utilities. However, a problem is that for concave cost functions the method leads to budget deficits. An adapted form of marginal cost pricing splits these deficits equally over the agents. The marginal pricing rule is defined by

φ^MP_i(x, c) = x_i c′(x(N)) + (1/n) [ c(x(N)) − x(N) c′(x(N)) ] .   (20)

Note that in case of convex cost functions agents can receive negative cost shares, just as with decreasing serial cost sharing.
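The two serial formulas can be checked against Examples 20 and 21 with a short script (our own sketch; demands are assumed to be pre-sorted in increasing order):

```python
def serial_shares(x, c):
    """Increasing serial rule; x must be sorted in increasing order."""
    n = len(x)
    y = [0.0]
    for k in range(1, n + 1):                      # intermediate levels y^k
        y.append(sum(x[:k - 1]) + (n - k + 1) * x[k - 1])
    return [sum((c(y[l]) - c(y[l - 1])) / (n - l + 1) for l in range(1, i + 1))
            for i in range(1, n + 1)]

def decreasing_serial_shares(x, c):
    """Decreasing serial rule of de Frutos; x sorted in increasing order."""
    n = len(x)
    ybar = [None]
    for l in range(1, n + 1):                      # levels ybar^l
        ybar.append(l * x[l - 1] + sum(x[l:]))
    ybar.append(0.0)                               # ybar^{n+1} = 0
    return [sum((c(ybar[l]) - c(ybar[l + 1])) / l for l in range(i, n + 1))
            for i in range(1, n + 1)]

c = lambda y: 0.5 * y ** 2
print(serial_shares([10, 20, 30], c))              # → [150.0, 550.0, 1100.0]
print(decreasing_serial_shares([10, 20, 30], c))   # → [-100.0, 550.0, 1350.0]
```

The outputs reproduce the shares of Examples 20 and 21, including the negative share of agent 1 under the decreasing serial rule for this convex cost function.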

Additive Cost Sharing and Rationing
The above cost sharing rules for homogeneous production models share the following properties:

Additivity  φ(x, c¹ + c²) = φ(x, c¹) + φ(x, c²) for all relevant cost sharing problems. This property carries the same flavor as the homonymous property for cost games.

Constant Returns  φ(x, c) = θx for linear cost functions c with c(y) = θy for all y. So if the agents do not cause any externality, the fixed marginal cost is taken as the price of the good.

It turns out that the class of all positive cost sharing rules with these properties can be characterized by solutions to rationing problems – the most basic of all models of distributive justice. A rationing problem amongst the agents in N consists of a pair (x, t) ∈ ℝ_+^N × ℝ_+ such that x(N) ≥ t; t is the available amount of some (in)divisible good and x is the profile of demands. The inequality sees to the interpretation of rationing: not every agent may get all she wants. A rationing method r is a solution to rationing problems, assigning to each problem (x, t) a vector of shares r(x, t) ∈ ℝ_+^N such that 0 ≤ r(x, t) ≤ x. The latter restriction is a weak solidarity statement assuring that everybody's demand is rationed in case of shortages. For t ∈ ℝ_+ define the special cost function Γ_t by Γ_t(y) = min{y, t}. The cone generated by these base


Cost Sharing, Figure 7: Intermediate production levels

functions lies dense in the space of all absolutely continuous cost functions c; if we know the values φ(x, Γ_t), then basically we know φ(x, c). Denote by M the class of all cost sharing rules with the properties positivity, additivity, and constant returns.

Theorem 8 (Moulin & Shenker [88,92])  Consider the following mappings associating rationing methods with cost sharing rules and vice versa:

r ↦ φ :  φ(x, c) = ∫_0^{x(N)} c′(t) dr(x, t) ,
φ ↦ r :  r(x, t) = φ(x, Γ_t) .

These define an isomorphism between M and the space of all monotonic rationing methods.

So each monotonic rationing method relates to a cost sharing rule and vice versa. In this way φ^AV is directly linked with the proportional rationing method, φ^SR with the uniform gains method, and φ^SS with the random priority method. Properties of rationing methods lead to properties of cost sharing rules and vice versa [61].

Incentives in Cooperative Production

Stable Allocations, Stand-Alone Core
Suppose again, as in the framework of cooperative cost games, that (coalitions of) agents can decide to leave the cost sharing scheme and organize their own production facility. Under the ability to replicate the technology, the question arises whether cost sharing rules induce stable cost share vectors.

Theorem 9 (Moulin [85])  For concave cost functions c, if φ is an anonymous and cross-monotonic cost sharing rule, then φ(x, c) ∈ core(c_x).

Under increasing returns to scale, this implies that φ^AV and φ^SR are core selectors, but φ^MP is not. [137] associates to each cost sharing problem (x, c) a pessimistic one, (x, c*); here c*(y) reflects the maximal total of marginal costs on [0, x(N)] needed to produce y units:

c*(y) = sup { ∫_T c′(t) dt | T ⊆ [0, x(N)], λ(T) = y }   if y ≤ x(N) ,
c*(y) = c(y)   otherwise .   (21)

Here λ denotes the Lebesgue measure.

Theorem 10 (Koster [60])  For any cost sharing problem (x, c), it holds that core(c_x) = { φ(x, c*) | φ ∈ M }.

In particular this means that for φ ∈ M it holds that φ(x, c) ∈ core(c_x) whenever c is concave, since this implies c* = c. This result appeared earlier as a corollary to Theorem 8, see [88]. [47,57,61] exhibit non-linear cost sharing rules yielding core elements for concave cost functions as well, so additivity is only a sufficient condition in the above statement. For average cost sharing one can show more: φ^AV(x, c) ∈ core(c_x) for all x precisely when the average cost c(y)/y is decreasing in y.

Strategic Manipulation Through Reallocation of Demands
In the cooperative production model there are other ways that agents may use to manipulate the final allocation. In particular, note that the serial procedure gives the larger demanders an advantage in case of positive externalities: as marginal costs decrease, the price paid by the larger agents per unit of output is lower than that paid by the smaller agents. In the other direction, larger demanders are


punished if costs are convex. As the examples below show, this is exactly where the serial idea is vulnerable to misrepresentation of demands: combining demands and redistributing the output afterwards can be beneficial.

Example 22 Consider the cost function c given by c(y) = min{5y, 60 + 2y}. Such cost functions are part of daily life, whenever one has to decide upon telephone or energy supply contracts: usually customers get to choose between a contract with high fixed cost and a low variable cost, and another with low or no fixed cost and a high variable price. Now consider the two cost sharing problems (x, c) and (x′, c) where x = (10, 20, 30), x′ = (0, 30, 30). The cost sharing problem (x′, c) arises from (x, c) if agent 2 places a demand on behalf of agent 1 – without letting agent 3 know. The corresponding average and serial cost shares are given by

ξ^P(x, c) = (30, 60, 90)    ξ^P(x′, c) = (0, 90, 90),
ξ^SR(x, c) = (40, 60, 80)   ξ^SR(x′, c) = (0, 90, 90).

Notice that the total of average cost shares for agents 1 and 2 is the same in both cost sharing problems. But if the serial rule were used, these agents can profit by merging their demands: if agent 2 returns agent 1's demand and requires a payment from agent 1 between €30 and €40, then both agents will have profited by such merging of demand.

Example 23 Consider the five-agent cost sharing problems (x, c) and (x̄, c) with x = (1, 2, 3, 0, 0), x̄ = (1, 2, 1, 1, 1) and convex cost function c(y) = y². (x̄, c) arises out of (x, c) if agent 3 spreads part of her demand over agents 4 and 5. Then

ξ^P(x, c) = (6, 12, 18, 0, 0)    ξ^P(x̄, c) = (6, 12, 6, 6, 6),
ξ^SR(x, c) = (3, 11, 22, 0, 0)   ξ^SR(x̄, c) = (5, 16, 5, 5, 5).

The aggregate of average cost shares for agents 3, 4, and 5 does not change. But notice that according to the serial cost shares there is a clear advantage for these agents: instead of paying 22 in the original problem, the total of their payments now equals 15. Agent 3 may offer agents 4 and 5 a transfer between 0 and 7 for their collaboration and still be better off. In general, in case of a convex cost function the serial rule is vulnerable to manipulation of demands through splitting.

Note that in the above cases the proportional cost sharing rule prescribes the same aggregate cost shares. It is a non-manipulable rule: reshuffling of demands does not lead to different aggregate cost shares. The rule does not discriminate between units – when a unit is produced is irrelevant. This is actually a very special feature of this cost sharing rule, basically not satisfied by any other cost sharing rule.

Theorem 11 Assume that N contains at least three agents. The proportional cost sharing rule is the unique rule that charges nothing for a null demand and meets any one of the following properties:

– Independence of merging and splitting,
– No advantageous reallocation,
– Irrelevance of reallocation.

The second property is even stronger than independence of merging and splitting: agents may redistribute the demands in any preferred way without changing the aggregate cost shares of the agents involved. The third property states that in such cases the cost shares of the other agents do not change. This makes proportional cost sharing compelling in situations where one is not capable of detecting the true demand characteristics of individuals.

Demand Games for P¹

Consider demand games G(ξ, c) as in Eq. (11), Sect. "Demand Games", where now ξ is a cost sharing rule on P¹. These games with uncountable strategy spaces are more complex than the demand games that we studied before. The set of consequences for players is now given by C = R²₊, combinations of levels of production and costs (see Sect. "Strategic Demand Games"). An individual i's preference relation is convex if for the corresponding utility function u_i and all pairs z, z′ ∈ C it holds that

u_i(z) = u_i(z′)  ⟹  u_i(tz + (1 − t)z′) ≥ u_i(z)   for all t ∈ [0, 1].   (22)

This means that a weighted average of two consequences is weakly preferred to both, if these are equivalent. Such utility functions u_i are called quasi-concave. An example of convex preferences are those related to linear utility functions of type u_i(x, y) = αx − y. Moreover, strictly convex preferences are those with strict inequality in Eq. (22) for 0 < t < 1; the corresponding utility functions are strictly quasi-concave. Special classes of preferences are the following.

– L: the class of all convex and continuous preferences with utility functions that are non-decreasing in the service component x, non-increasing in the cost component y, locally non-satiated, and decreasing on (x, c(x)) for x large enough. The latter restriction merely ensures that agents will not place requests for unlimited amounts of the good.


Cost Sharing, Figure 8 Linear, convex preferences, u(x, y) = 2x − y. The contours indicate indifference curves, i.e. sets of type {(x, y) | u(x, y) = k}, the k-level curve of u

– L*: the class of bi-normal preferences in L. Basically, if such a preference is represented by a differentiable utility function u, then the slope dy/dx of the indifference contours is non-increasing in x and non-decreasing in y. For a concise definition see [141]. Examples include Cobb–Douglas utility functions and also those of type u_i(x, y) = α(x) − β(y), where α and β are concave and convex functions, respectively. A typical plot of level curves of such utility functions is in Fig. 9.

Note that this approach differs from the standard literature, where agents have preferences over endowments; here costs are 'negative' endowments. In the latter interpretation the condition can be read as saying that the marginal rate of substitution is non-positive: at equal utility, an increase of the level of output has to be compensated by a decrease in the level of the input good.

Nash-Equilibria of Demand Games in a Simple Case

Consider a production facility shared by three agents N = {1, 2, 3} with cost function c(y) = ½y². Assume that the agents have quasi-linear utilities in L, i.e. u_i(x_i, y_i) = α_i x_i − y_i for all pairs (x_i, y_i) ∈ R²₊. Below the Nash-equilibrium of the serial and the proportional demand game is calculated in two special cases. This numerical example is based on [91].

Proportional Demand Game Consider the corresponding proportional demand game G(ξ^P, c), with utility over actions given by

U_i^P(x) = α_i x_i − ξ^P_i(x, c) = α_i x_i − ½ x_i x(N).   (23)

Cost Sharing, Figure 9 Strictly convex preferences, u(x, y) = √x − e^{0.5y}. The straight line connecting any two points on the same contour lies in the lighter area – with higher utility. Fix the y-value: an increase of x yields higher utility, whereas for fixed x an increase of y causes the utility to decrease

In a Nash-equilibrium x* of G(ξ^P, c) each player i gives a best response to x*₋ᵢ, the action profile of the other agents. That is, player i chooses x*_i ∈ argmax_t U_i^P(t, x*₋ᵢ). The first-order conditions for an interior solution yield

α_i − ½x*(N) − ½x*_i = 0   (24)

for all i ∈ N. Then x*(N) = ½(α₁ + α₂ + α₃) and x*_i = 2α_i − ½(α₁ + α₂ + α₃).

Serial Demand Game Consider the same production facility and the demand game G(ξ^SR, c) corresponding to the serial rule. The utilities over actions are now given by

U_i^SR(x) = α_i x_i − ξ^SR_i(x, c).   (25)

Now suppose x̄ is a Nash equilibrium of this game, and assume without loss of generality that x̄₁ ≤ x̄₂ ≤ x̄₃. Then player 1, with the smallest equilibrium demand, maximizes the expression U₁^SR(t, x̄₂, x̄₃) = α₁t − c(3t)/3 = α₁t − (3/2)t² at x̄₁, from which we may conclude that α₁ = 3x̄₁. In addition, in equilibrium player 2 maximizes U₂^SR(x̄₁, t, x̄₃) = α₂t − (⅓c(3x̄₁) + ½(c(x̄₁ + 2t) − c(3x̄₁))) for t ≥ x̄₁, yielding α₂ = x̄₁ + 2x̄₂. Finally, the equilibrium condition for player 3 implies α₃ = x̄(N). It is then not hard to see that this constitutes the serial equilibrium.
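These first-order computations can be cross-checked numerically. The sketch below (the interior profile α = (4, 5, 6) and all identifiers are our own illustrative choices) rebuilds both equilibria for c(y) = ½y² from the closed forms above and confirms the Nash property by a grid search over unilateral deviations.

```python
def serial_shares(x, c):
    """Serial rule: split cost increments equally among active agents."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    shares = [0.0] * n
    y_prev, s = 0.0, 0.0
    for k, i in enumerate(order):
        y_k = sum(x[j] for j in order[:k]) + (n - k) * x[i]
        s += (c(y_k) - c(y_prev)) / (n - k)
        shares[i] = s
        y_prev = y_k
    return shares

c = lambda y: 0.5 * y * y
alpha = (4.0, 5.0, 6.0)          # illustrative interior profile (ours)
S = sum(alpha)

# closed forms derived above
x_prop = [2 * a - S / 2 for a in alpha]          # (0.5, 2.5, 4.5)
x_ser = [alpha[0] / 3]                           # alpha_1 = 3*x1
x_ser.append((alpha[1] - x_ser[0]) / 2)          # alpha_2 = x1 + 2*x2
x_ser.append(alpha[2] - x_ser[0] - x_ser[1])     # alpha_3 = x(N)

def U_prop(i, x):
    return alpha[i] * x[i] - 0.5 * x[i] * sum(x)

def U_ser(i, x):
    return alpha[i] * x[i] - serial_shares(x, c)[i]

def is_nash(x, U, grid_step=0.001, top=6.0, eps=1e-6):
    """No unilateral grid deviation improves any player's payoff."""
    for i in range(len(x)):
        here = U(i, x)
        t = 0.0
        while t <= top:
            trial = list(x)
            trial[i] = t
            if U(i, trial) > here + eps:
                return False
            t += grid_step
    return True

assert is_nash(x_prop, U_prop)
assert is_nash(x_ser, U_ser)
```

The grid search is of course only a sanity check of the first-order conditions, not a proof of equilibrium.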


Comparison of Proportional and Serial Equilibria (I) Now let us compare the serial and the proportional equilibrium in the following two cases:

(i) α₁ = α₂ = α₃ = α,
(ii) α₁ = α₂ = 2, α₃ = 4.

Case (i): We get x* = (½α, ½α, ½α) and x̄ = (⅓α, ⅓α, ⅓α). The resulting equilibrium payoff vectors are given by

U^P(x*) = (⅛α², ⅛α², ⅛α²) and U^SR(x̄) = (⅙α², ⅙α², ⅙α²).

Not only is the average outcome less efficient than its serial counterpart, it is also Pareto-inferior to the latter.

Case (ii): The proportional equilibrium is a boundary solution, x* = (0, 0, 4), with utility profile U^P(x*) = (0, 0, 8). The serial equilibrium strategies and utilities are given by

x̄ = (⅔, ⅔, 8/3), U^SR(x̄) = (⅔, ⅔, 4).

Notice that the serial equilibrium is now less efficient, but it is not Pareto-dominated by the proportional utility distribution.

Uniqueness of Nash-Equilibria in P¹-Demand Games

In the above demand games there is a unique Nash-equilibrium, which serves as a prediction of actual play. This need not hold in general. The literature discusses strategic characterizations of cost sharing rules in terms of uniqueness of equilibrium in the induced demand games, relative to specific domains of preferences and cost functions. Below we discuss the major findings of [141]. These results concern a broader cost sharing model, with the notion of a cost function as a differentiable function R₊ → R₊. So in this paragraph a cost function may decrease, and fixed costs need not be 0. This change in setup is not crucial to the overall exposition, since the characterizations below are easily interpreted within the context of P¹.

Demand Monotonicity The mapping t ↦ ξ_i((t, x₋ᵢ), c) is non-decreasing; ξ is strictly demand monotonic if this mapping is increasing whenever c is increasing.

Smoothness The mapping x ↦ ξ(x, c) is continuously differentiable for all continuously differentiable c ∈ C¹.

Recall that L* is the domain of all bi-normal preferences.

Theorem 12 (Watts [141]) Fix a differentiable cost function c and a demand monotonic and smooth cost sharing rule ξ.
A cost sharing game G(ξ, c) has a unique equilibrium whenever agents' preferences belong to L* only if, for all x = (x₁, …, x_n):

– every principal minor of the matrix W with rows

w_i = (∂ξ_i/∂x₁, …, ∂ξ_i/∂x_n)   (26)

is non-negative for all i,
– the determinant of the Hessian matrix corresponding to the mapping x ↦ ξ(x, c) is strictly positive.

A sufficient condition for uniqueness of equilibrium is that the principal minors of the matrix W are strictly positive. The impact of this theorem is that one can characterize the class of cost functions yielding unique equilibria if the domain of preferences is L*:

– G(ξ^SR, c), G(ξ^DSR, c): a necessary condition for uniqueness of equilibrium is that c is strictly convex, i.e. c″ > 0. A sufficient condition is that c is increasing and strictly convex. Actually, [141] also shows that the conclusions for the serial rule do not change when L is used instead of L*. As will become clear below, the serial games have unique strategic properties.
– G(ξ^P, c): the necessary and sufficient conditions are those for the serial demand game, together with c′(y) > c(y)/y for all y ≠ 0. Notice that the latter property poses no additional restrictions on cost functions within the framework of P¹.
– G(ξ^SS, c): a necessary condition is c″ > 0. In general it is hard to establish uniqueness if more than 2 players are involved.
– G(ξ^MP, c): even in 2-player games uniqueness is not guaranteed. For instance, for cost functions c(y) = y^α uniqueness is guaranteed only if 1 < α ≤ 3. For c(y) = y⁴ there are preference profiles in L* for which multiple equilibria exist.

Decreasing Returns to Scale The above theorem basically shows that uniqueness of equilibrium in demand games related to P¹ can be achieved for preferences in L* only if costs are convex, i.e. the technology exhibits decreasing returns to scale. The starting point in the literature for characterizing cost sharing rules in terms of their strategic properties is the seminal paper by Moulin and Shenker [91]. Their finding is that on L basically ξ^SR is the only cost sharing rule whose corresponding demand game passes the unique equilibrium test as in Theorem 12.
Call a smooth and strictly demand monotonic cost sharing rule ξ regular if it is anonymous, so that the name of an agent has no impact on her cost share.


Theorem 13 (Moulin & Shenker [91]) Let c be a strictly convex, continuously differentiable cost function, and let ξ be a regular cost sharing rule. The following statements are equivalent:

– ξ = ξ^SR,
– for all profiles (u₁, u₂, …, u_n) of utilities in L, G(ξ, c) has at most one Nash-equilibrium,
– for all profiles (u₁, u₂, …, u_n) of utilities in L, every Nash-equilibrium of G(ξ, c) is also a strong equilibrium, i.e., no coalition can coordinate in order to improve the payoff of all its members.

This theorem makes a strong case for the serial cost sharing rule, especially when one realizes that the serial equilibrium is the unique element surviving successive elimination of strictly dominated strategies. This equilibrium may therefore naturally arise through evolutive or eductive behavior; it is a robust prediction of non-cooperative behavior. Recent experimental studies are in line with this theoretical support, see [22,108]. Proposition 1 in [141] shows how easy it is to construct preferences in L such that regular cost sharing rules other than ξ^SR give rise to multiple equilibria in the corresponding demand game, even in two-agent cost sharing games.

Among the fairness concepts in the distributive literature the most compelling is envy-freeness. An allocation passes the no-envy test if no player prefers the allocation of another player to her own. Formally, the definition is as follows.

No Envy Test Let x be a demand profile and y a vector of cost shares. The allocation (x_i, y_i)_{i∈N} is envy-free if for all i, j ∈ N it holds that u_i(x_i, y_i) ≥ u_i(x_j, y_j).

It is easily seen that the allocations associated with the serial equilibria are all envy-free.

Increasing Returns to Scale As Theorem 12 already shows, uniqueness of equilibrium in demand games for all utility profiles in L* is in general inconsistent with concave cost functions.
Theorem 14 (de Frutos [26]) Let c be a strictly concave, continuously differentiable cost function, and let ξ be a regular cost sharing rule. The following statements are equivalent:

– ξ = ξ^DSR or ξ = ξ^SR,
– for all utility profiles u = (u_i)_{i∈N} in L the induced demand game G(ξ, c) has at most one Nash equilibrium, or

Cost Sharing, Figure 10 Scenario (ii). The indifference curves of agent 1 together with the curve ψ: t ↦ ξ^SR₁((t, x̄₋₁), c). The best response of player 1 against x̄₋₁ = (⅔, 8/3) is the value of x where the graph of ψ is tangent to an indifference curve of u₁

– for all utility profiles u = (u_i)_{i∈N} in L, every Nash equilibrium of the game G(ξ, c) is a strong Nash equilibrium as well.

Moreover, if the curvature of the indifference curves is bigger than that of the curve generated by the cost sharing rule, as in Fig. 10, then the second and third statement are equivalent with ξ = ξ^DSR.

Theorem 15 (Moulin [85]) Assume agents have preferences in L*. The serial cost sharing rule is the unique continuous, cross-monotonic and anonymous cost sharing rule for which the Nash-equilibria of the corresponding demand games all pass the no-envy test.

Comparison of Serial and Proportional Equilibria (II) Just as in the earlier analysis in Sect. "Efficiency and Strategy-Proof Cost Sharing Mechanisms", the performance of cost sharing rules can be measured by the related surpluses in the Nash-equilibria of the corresponding demand games. Assume in this section that the preferences of the agents are quasi-linear in cost shares and represented by functions U_i(x_i, y_i) = u_i(x_i) − y_i. Moreover, assume that u_i is non-decreasing and concave, with u_i(0) = 0. The surplus at demand profile x and utility profile U is then the number Σ_{i∈N} u_i(x_i) − c(x(N)). Define the efficient surplus or value of N relative to c and U by

v(c, U) = sup_{x∈R^N₊} Σ_{i∈N} u_i(x_i) − c(x(N)).   (27)

Denote the set of Nash-equilibria of the demand game G(ξ, c) with profile of preferences U by NE(ξ, c, U). Given c and ξ, the guaranteed (relative) surplus of the cost sharing rule ξ for N is defined by

Γ(c, ξ) = inf_{U, x∈NE(ξ,c,U)} [Σ_{i∈N} u_i(x_i) − c(x(N))] / v(c, U).   (28)

Here the infimum is taken over all utility profiles discussed above. This measure is also called the price of anarchy of the game, see [65]. Let C* be the set of all convex increasing cost functions with lim_{y→∞} c(y)/y = ∞. Then [89] shows that for the serial and the proportional rule the guaranteed surplus is at least 1/n. But sometimes the distinction is eminent. Define the number

δ(y) = y c″(y) / (c′(y) − c′(0)),

which is a kind of elasticity. The theorem below shows that on certain domains of cost functions with bounded δ the serial rule prevails over the proportional rule: for large n the guaranteed surplus of ξ^SR is of order 1/ln(n), that of ξ^AV of order 1/n. More precisely, write K_n = 1 + 1/3 + ⋯ + 1/(2n−1) ≤ 1 + ln(2n)/2; then:

Theorem 16 (Moulin [89]) For any convex increasing cost function c with lim_{y→∞} c(y)/y = ∞ it holds that

– if c′ is concave and inf{δ(y) | y ≥ 0} = p > 0, then

Γ(c, ξ^SR) ≥ p/K_n,   Γ(c, ξ^AV) ≤ 4/(n + 3);

– if c′ is convex and sup{δ(y) | y ≥ 0} = p < ∞, then

Γ(c, ξ^SR) ≥ 1/((2p − 1) K_n),   4/(n + 3) ≤ Γ(c, ξ^AV) ≤ 4(2p − 1)/n.
A Word on Strategy-Proofness in P¹

Recall the discussion on strategyproofness in Sect. "Strategyproofness". The serial demand game has a unique strong Nash equilibrium in case costs are convex and preferences are drawn from L. Suppose the social planner aims at designing a mechanism implementing the outcomes associated with these equilibria. [91] show an efficient way to implement this serial choice function by an indirect mechanism, defined through a multistage game which mimics the way the serial Nash equilibria are calculated. It is easily seen that in this game each agent has a unique dominant strategy, in which demands result from optimization of the true preferences. This gives rise to a strategyproof mechanism. Note that the same approach cannot be used for the proportional rule. The strategic properties of the proportional demand game are weaker than those of the serial demand game in several respects. First of all, it is not hard to find preference profiles in L leading to multiple equilibria. Whereas uniqueness of equilibrium can be repaired by restricting L to L*, the resulting equilibria are in general not strong (unlike their serial counterparts). In the proportional equilibria there is overproduction; see e.g. the example in Sect. "Proportional Demand Game", where a small uniform reduction of demands yields higher utility for all players. Besides, a single-valued Nash equilibrium selection corresponds to a strategyproof mechanism provided the underlying domain of preferences is rich, and L is not. Though richness is not a necessary condition, the proportional rule is not consistent with a strategyproof demand game.

Bayesian P¹-Demand Games

Recall that at the basis of a strategic game lies the assumption that each player knows all the ingredients of the game. However, as [56] argues, production cost and output quality may vary unpredictably as a consequence of the technology and worker quality. Besides, changes in the available resources and demands have unforeseen influences on individual preferences. On top of that, players may have asymmetric information regarding the nature of the uncertainty. [56] studies the continuous homogeneous cost sharing problem within the context of a Bayesian demand game [43], where these uncertainties are taken into account. The qualifications of the serial rule in the stochastic model are roughly the same as in the deterministic framework.

Continuous Heterogeneous Output Model, Pⁿ
The analysis of continuous cost sharing problems for multi-service facilities is far more complex than that of the single-output model. The literature discusses two different models: one where each agent i demands a different good, and one where agents may require mixed bundles of goods. As the reader will notice, the modeling and analysis of solutions differ in abstraction and complexity. In order to concentrate on the main ideas, here we stick to the first model, where goods are identified with agents. This means that a demand profile is a vector x ∈ R^N₊, where x_i denotes the demand of agent i for good i. From now on we deal with technologies described by continuously differentiable cost functions c: R^N₊ → R₊, non-decreasing and with c(0_N) = 0; the class of all such functions is denoted by Cⁿ.

Extensions of Cost Sharing Rules The single-output model is connected to the multi-output model via the homogeneous cost sharing problems. Suppose that for c ∈ Cⁿ there is a function c₀ ∈ C such that c(x) = c₀(x(N)) for all x. Such functions arise, for instance, if we distinguish between the production of blue and red cars: the color of a car does not affect total production costs. Essentially, a homogeneous cost sharing problem (x, c) may be solved as if it were in P¹. If ξ is the compelling solution on P¹, then any cost sharing rule on Pⁿ should determine the same solution on the class of homogeneous problems therein. Formally, the cost sharing rule ξ̄ on Pⁿ extends ξ on P¹ if for all homogeneous cost sharing problems (x, c) it holds that ξ̄(x, c) = ξ(x, c₀). In general a cost sharing rule ξ on P¹ allows for a whole class of extensions. Below we focus on extensions of ξ^SR, ξ^P, ξ^SS.

Measurement of Scale Around the world quantities of goods are measured by several standards: length is expressed in inches or centimeters, volume in gallons or liters, weight in ounces or kilos. Here measurement conversion involves no more than multiplication by a fixed scalar. When such linear scale conversions have no effect on final cost shares, a cost sharing rule is called scale invariant. It is an ordinal rule if this invariance extends to essentially all transformations of scale. Scale invariance captures the important idea that the relative cost shares should not change, whether we are dividing 1 Euro or 1,000 Euros. Ordinality may be desirable, but for many purposes it is too strong as a basic requirement. Formally, a transformation of scale is a mapping f: R^N₊ → R^N₊ such that f(x) = (f₁(x₁), f₂(x₂), …, f_n(x_n)) for all x, where each of the coordinate mappings f_j is differentiable and strictly increasing.
Ordinality A cost sharing rule ξ on Pⁿ is ordinal if for all transformations of scale f and all cost sharing problems (x, c) ∈ Pⁿ it holds that

ξ(x, c) = ξ(f(x), c ∘ f⁻¹).   (29)

Scale Invariance A cost sharing rule ξ on Pⁿ is scale invariant if Eq. (29) holds for all linear transformations f. Under a scale invariant cost sharing rule, final cost shares do not change when the units in which the goods are measured change.

Path-Generated Cost Sharing Rules Many cost sharing rules on Pⁿ calculate the cost shares for (x, c) ∈ Pⁿ as the total of marginal costs along some production path from 0 toward x. Here a path for x is a non-decreasing mapping γ^x: R₊ → R^N₊ such that γ^x(0) = 0 and there is a T ∈ R₊ with γ^x(T) = x. The cost sharing rule generated by the path collection Γ = {γ^x | x ∈ R^N₊} is defined by

ξ^Γ_i(x, c) = ∫₀^∞ ∂_i c(γ^x(t)) (γ^x_i)′(t) dt.   (30)

Special path-generated cost sharing rules are the fixed-path cost sharing rules: a single path γ*: R₊ → R^N₊ with the property that lim_{t→∞} γ*_i(t) = ∞ defines the whole underlying family of paths. More precisely, the fixed-path cost sharing rule generated by γ* is the path-generated rule for the family of paths {γ^x | x ∈ R^N₊} defined by γ^x(t) = γ*(t) ∧ x, the vector with coordinates min{γ*_i(t), x_i}. So the paths are no more than the projections of γ*(t) on the cube [0, x]. Below we will see many examples of (combinations of) such fixed-path methods.

Aumann–Shapley Rule The early characterizations by [15,79] of this rule set off a vastly growing literature on cost sharing models with variable demands. [16] suggested using the Aumann–Shapley rule to determine telephone billing rates in the context of sharing the cost of a telephone system. This extension of proportional cost sharing calculates marginal costs along the path γ^AS(t) = tx for t ∈ [0, 1]:

ξ^AS_i(x, c) = x_i ∫₀¹ ∂_i c(tx) dt.   (31)

The Aumann–Shapley rule can be interpreted as the Shapley value of the non-atomic game in which each unit of the good is a player, see [11]. It is the uniform average of the marginal costs along all increasing paths from 0_N to x. The following is a classic result in the cost sharing literature:

Theorem 17 (Mirman & Tauman [79], Billera & Heath [15]) There is only one additive, positive, and scale invariant cost sharing rule on Pⁿ that extends the proportional rule, and this is ξ^AS.

Example 24 If c is positively homogeneous, i.e. c(αy) = αc(y) for α ≥ 0 and all y ∈ R^N₊, then

ξ^AS_i(x, c) = x_i ∂c/∂x_i(x),

i.e. ξ^AS charges each agent the marginal cost of the ith good at the final production level x, per unit of her demand. The risk measures (cost functions) as in [28] are of this kind.
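Eq. (31) and Example 24 can be checked by direct numerical integration. The sketch below (ours) integrates the Aumann–Shapley formula with a midpoint rule for the positively homogeneous cost c(y) = y₁y₂/(y₁ + y₂) and recovers ξ^AS_i(x, c) = x_i ∂c/∂x_i(x) together with budget balance.

```python
import math

def as_shares(x, grad_c, steps=20000):
    """Aumann-Shapley shares by midpoint-rule integration of Eq. (31)."""
    n = len(x)
    shares = [0.0] * n
    h = 1.0 / steps
    for k in range(steps):
        g = grad_c([(k + 0.5) * h * xi for xi in x])  # gradient at t*x
        for i in range(n):
            shares[i] += x[i] * g[i] * h
    return shares

# positively homogeneous cost c(y) = y1*y2/(y1+y2);
# its gradient is grad_i c(y) = (y_j/(y1+y2))**2 for j != i
def grad_c(y):
    s = y[0] + y[1]
    return [(y[1] / s) ** 2, (y[0] / s) ** 2]

x = [1.0, 2.0]
shares = as_shares(x, grad_c)
exact = [x[0] * (x[1] / 3.0) ** 2, x[1] * (x[0] / 3.0) ** 2]  # x_i * grad_i c(x)
assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(shares, exact))
# budget balance: shares sum to c(x) (Euler's identity for homogeneous c)
assert math.isclose(sum(shares), x[0] * x[1] / (x[0] + x[1]), rel_tol=1e-9)
```

Because the gradient of a degree-1 homogeneous function is constant along rays, the integration is exact here up to rounding; for general costs the midpoint rule only approximates the shares.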


Friedman–Moulin Rule This serial extension [36] calculates marginal costs along the diagonal path γ^FM(t) = t1_N ∧ x:

ξ^FM_i(x, c) = ∫₀^{x_i} ∂_i c(γ^x(t)) dt.   (32)

This fixed-path cost sharing rule is demand monotonic. As far as invariance with respect to the choice of unit is concerned its performance is bad, as it is not a scale invariant cost sharing rule.

Moulin–Shenker Rule This fixed-path cost sharing rule was proposed as an ordinal serial extension by [125]. Suppose that the partial derivatives of c ∈ Cⁿ are bounded away from 0, i.e. there is an a > 0 such that ∂_i c(x) > a for all x ∈ R^N₊. The Moulin–Shenker rule ξ^MS is generated by the path γ^MS solving the system of ordinary differential equations

γ′_i(t) = Σ_{j∈N} ∂_j c(γ(t)) / ∂_i c(γ(t))   if γ_i(t) < x_i,
γ′_i(t) = 0                                    else.   (33)

The interpretation of this path is that at each moment the total expenditure on the production of extra units is the same across agents: if good 2 is twice as expensive as good 1, then the production device γ^MS produces twice as much of good 1. The serial rule embraces the same idea – as long as an agent desires extra production, the corresponding incremental costs are split equally. This makes ξ^MS a natural extension of the serial rule. Call t_i the completion time of production for agent i, i.e. γ^MS_i(t) < x_i if t < t_i and γ^MS_i(t_i) = x_i. Assume without loss of generality that these completion times are ordered such that 0 = t₀ ≤ t₁ ≤ t₂ ≤ ⋯ ≤ t_n; then the Moulin–Shenker rule is given by

ξ^MS_i(x, c) = Σ_{ℓ=1}^{i} [c(γ^MS(t_ℓ)) − c(γ^MS(t_{ℓ−1}))] / (n − ℓ + 1).   (34)
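The definition in Eqs. (33)–(34) suggests a simple discrete scheme: advance the unserved coordinates at speeds inversely proportional to their marginal costs and split each cost increment equally over the active agents. The sketch below does exactly this (the cost function is a hypothetical example of ours with partial derivatives bounded away from 0, not taken from the chapter).

```python
import math

def ms_shares(x, c, grad_c, h=1e-4):
    """Moulin-Shenker sketch: move each unserved coordinate at speed
    1/grad_i c (Eq. (33) up to reparametrization, which leaves the
    shares unchanged) and split every cost increment equally over the
    agents whose demand is not yet filled (Eq. (34))."""
    n = len(x)
    g = [0.0] * n              # current point on the path
    shares = [0.0] * n
    cost = c(g)
    while True:
        active = [i for i in range(n) if g[i] < x[i] - 1e-12]
        if not active:
            return shares
        grad = grad_c(g)
        for i in active:
            g[i] = min(x[i], g[i] + h / grad[i])
        new_cost = c(g)
        for i in active:
            shares[i] += (new_cost - cost) / len(active)
        cost = new_cost

# hypothetical smooth technology: c(y) = y1^2 + y1 + y2
c = lambda y: y[0] ** 2 + y[0] + y[1]
grad_c = lambda y: [2.0 * y[0] + 1.0, 1.0]

s = ms_shares([1.0, 2.0], c, grad_c)
assert math.isclose(sum(s), c([1.0, 2.0]), rel_tol=1e-9)  # budget balance
# for this technology both demands happen to complete simultaneously
# (the path satisfies g2 = g1^2 + g1, so g = (1, 2) is reached at once),
# hence the two shares are equal
assert math.isclose(s[0], s[1], rel_tol=1e-3)
```

The forward-Euler step is crude, but budget balance holds by construction since the cost increments telescope along the path.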

Note that the path varies with the cost function, which is why ξ^MS is a non-additive solution. Such solutions – though intuitive – are in general notoriously hard to analyze. There are two axiomatic characterizations of the Moulin–Shenker rule. The first, by [125], is in terms of the serial principle and the technical condition that a cost sharing rule be a partially differentiable function of the demands. The other characterization, by [60], is more in line with the ordinal character of this serial extension.

Continuity A cost sharing rule ξ on Pⁿ is continuous if q ↦ ξ(q, c) is continuous on R^N₊ for all c.

Continuity is weaker than partial differentiability, as it only requires stability of the solution with respect to small changes in the demands.

Upperbound A cost sharing rule ξ satisfies upperbound if for all (q, c) ∈ Pⁿ and i ∈ N

ξ_i(q, c) ≤ max_{y∈[0,q]} ∂_i c(y).

An upperbound provides each agent with a conservative and rather pessimistic estimate of her cost share, based on the maximal value of the corresponding marginal cost toward the aggregate demand.

Suppose that d is a demand profile smaller than q. A reduced cost sharing problem is defined by (q − d, c_d), where c_d is given by c_d(y) = c(y + d) − c(d). So c_d measures the incremental cost of production beyond the level d.

Self-Consistency A cost sharing rule ξ is self-consistent if for all cost sharing problems (q, c) ∈ Pⁿ with q_{N∖S} = 0_{N∖S} for some S ⊆ N, and d ≤ q such that ξ_i(d, c) = ξ_j(d, c) for all {i, j} ⊆ S, it holds that ξ(q, c)_S = ξ(d, c)_S + ξ(q − d, c_d)_S.

So self-consistency expresses the idea that if the cost shares of agents with non-zero demand differ, this is not due to the part of the problem for which they are charged equally, but due to the asymmetries in the related reduced problem. The property is reminiscent of the step-by-step negotiation property in the bargaining literature, see [54].

Theorem 18 (Koster [60]) There is only one continuous, self-consistent and scale invariant cost sharing rule satisfying upperbound, which is the Moulin–Shenker rule.

Shapley–Shubik Rule For each demand profile x the stand-alone cost game c_x is defined as before. The Shapley–Shubik rule is then just the Shapley value of this game, i.e. ξ^SS(x, c) = Φ(c_x). The Shapley–Shubik rule is ordinal.

A Numerical Example Consider the cost sharing problem (x, c) with N = {1, 2}, x = (5, 10), and c ∈ C² given by c(t₁, t₂) = e^{2t₁+t₂} − 1 on [0, 10] × [0, 10]. We calculate the partial derivatives: ∂₁c(t₁, t₂) = 2e^{2t₁+t₂} = 2∂₂c(t₁, t₂) for all (t₁, t₂) ∈ R²₊. The Aumann–Shapley path is given by γ(t) = (5t, 10t) for t ∈ [0, 1], and

ξ^AS₁(x, c) = ∫₀¹ ∂₁c(5t, 10t) · 5 dt = ∫₀¹ 10 e^{20t} dt = ½(e²⁰ − 1),
ξ^AS₂(x, c) = ∫₀¹ ∂₂c(5t, 10t) · 10 dt = ∫₀¹ 10 e^{20t} dt = ½(e²⁰ − 1).

The Friedman–Moulin rule uses the path

γ^FM(t) = t1_N ∧ x = (t, t) if 0 ≤ t < 5, and (5, t) if 5 ≤ t ≤ 10,

and the corresponding cost shares are calculated as follows:

ξ^FM₁(x, c) = ∫₀⁵ ∂₁c(t, t) dt = ∫₀⁵ 2e^{3t} dt = ⅔(e¹⁵ − 1),
ξ^FM₂(x, c) = ∫₀⁵ ∂₂c(t, t) dt + ∫₅¹⁰ ∂₂c(5, t) dt = ⅓(e¹⁵ − 1) + (e²⁰ − e¹⁵).

Note that both discussed cost sharing rules use one and the same path for all cost sharing problems with demand profile x. This is characteristic for additive cost sharing rules (see e.g. [35,42]).

Now turn to the Moulin–Shenker rule. Since ∂₁c = 2∂₂c everywhere on [0, 10] × [0, 10], according to the solution γ^MS of Eq. (33), until one of the demands is reached two units of good 2 are produced for each produced unit of good 1. In particular there is a parametrization of γ^MS such that γ(t) = (t, 2t) for 0 ≤ t ≤ 5. The corresponding cost shares are equal, since γ reaches both coordinates of x at the same time: ξ^MS₁(x, c) = ξ^MS₂(x, c) = ½c(γ(5)) = ½c(5, 10) = ½(e²⁰ − 1).

Cost Sharing, Figure 11 Paths for ξ^MS, ξ^AS, ξ^FM

Now suppose that the demands are summarized by x* = (10, 10). In order to calculate ξ^MS(x*, c), notice that there is a parametrization γ* of the corresponding path γ^MS such that

γ*(t) = (t, 2t) if 0 ≤ t ≤ 5, and (t, 10) for 5 < t ≤ 10.

Notice that this path extends γ just to complete service for agent 1, so that – like before – agent 2 only contributes while t < 5. The cost shares are then given by

ξ^MS₂(x*, c) = ½c(γ*(5)) = ½c(5, 10) = ½(e²⁰ − 1),
ξ^MS₁(x*, c) = ξ^MS₂(x*, c) + c(γ*(10)) − c(γ*(5)) = e³⁰ − ½(e²⁰ + 1).
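The shares in this worked example can be cross-checked numerically. The sketch below (ours) redoes the Aumann–Shapley and Friedman–Moulin integrals for x = (5, 10) with a midpoint rule and compares them with the closed forms, together with the Moulin–Shenker value.

```python
import math

# cost function of the numerical example: c(t1, t2) = exp(2*t1 + t2) - 1
c = lambda y: math.exp(2 * y[0] + y[1]) - 1.0
d1 = lambda y: 2.0 * math.exp(2 * y[0] + y[1])   # partial wrt good 1
d2 = lambda y: math.exp(2 * y[0] + y[1])         # partial wrt good 2

def integrate(f, a, b, steps=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (k + 0.5) * h) for k in range(steps)) * h

half = 0.5 * (math.exp(20) - 1.0)

# Aumann-Shapley shares for x = (5, 10), path (5t, 10t)
as1 = integrate(lambda t: d1([5 * t, 10 * t]) * 5.0, 0.0, 1.0)
as2 = integrate(lambda t: d2([5 * t, 10 * t]) * 10.0, 0.0, 1.0)
assert math.isclose(as1, half, rel_tol=1e-4)
assert math.isclose(as2, half, rel_tol=1e-4)

# Friedman-Moulin shares for x = (5, 10), diagonal path
fm1 = integrate(lambda t: d1([t, t]), 0.0, 5.0)
fm2 = integrate(lambda t: d2([t, t]), 0.0, 5.0) \
    + integrate(lambda t: d2([5.0, t]), 5.0, 10.0)
assert math.isclose(fm1, 2.0 / 3.0 * (math.exp(15) - 1.0), rel_tol=1e-4)
assert math.isclose(fm2, (math.exp(15) - 1.0) / 3.0
                    + math.exp(20) - math.exp(15), rel_tol=1e-4)

# Moulin-Shenker for x = (5, 10): both demands complete at the same time
# along the path (t, 2t), so each agent pays half of c(5, 10)
ms = 0.5 * c([5.0, 10.0])
assert math.isclose(ms, half, rel_tol=1e-12)
```

Note that the FM shares sum to ξ^FM₁ + ξ^FM₂ = e²⁰ − 1 = c(5, 10), as budget balance requires.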

For x* the cost sharing rules ξ^AS and ξ^FM use essentially the same symmetric path γ(t) = (t, t), so it is easily calculated that ξ^AS(x*, c) = ξ^FM(x*, c) = (⅔(e³⁰ − 1), ⅓(e³⁰ − 1)).

Axiomatic Characterization of Fixed-Path Rules Recall demand monotonicity as a weak incentive constraint for cost sharing rules. Despite the attention that the Aumann–Shapley rule has received, it fails to meet this standard. To see this, consider

c(y) = y₁y₂ / (y₁ + y₂),

Cost Sharing, Figure 12 Paths for ξ^MS, ξ^AS, ξ^FM

and then

ξ^AS₁(x, c) = x₁x₂² / (x₁ + x₂)².


Then the latter expression is not monotonic in x1. One may show that the combination of properties in Theorem 17 is incompatible with demand monotonicity. So what kind of rules are demand monotonic? A classification of all such rules is too complex; we will restrict our attention to the additive rules with the dummy property, which captures the idea that an agent pays nothing if her good is free.
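The failure of demand monotonicity is easy to check numerically (a minimal sketch; the function below is simply the φ1^AS expression above, with illustrative numbers):

```python
def phi1_as(x1, x2):
    # Aumann-Shapley share of agent 1 for the cost function c(y) = y1*y2/(y1 + y2)
    return x1 * x2 ** 2 / (x1 + x2) ** 2

# Raising agent 1's demand from 1 to 3 (with x2 fixed at 1) lowers her share:
share_low = phi1_as(1.0, 1.0)   # 0.25
share_high = phi1_as(3.0, 1.0)  # 0.1875
```

A larger demand by agent 1 thus strictly decreases her cost share, violating demand monotonicity.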

Dummy  If ∂i c(y) = 0 for all y, then φi(x; c) = 0 for all cost sharing problems (x; c) ∈ P^n.

Theorem 19 (Friedman [35])
– A cost sharing rule φ satisfies dummy, additivity and demand monotonicity if and only if it is an (infinite) convex combination of rules generated by fixed paths which do not depend on the cost structure.
– A cost sharing rule φ satisfies dummy, additivity and scale invariance if and only if it is an (infinite) convex combination of rules generated by scale-invariant fixed paths which do not depend on the cost structure.

This theorem has some important implications. The Friedman–Moulin rule is the unique serial extension with the properties additivity, dummy and demand monotonicity. As we mentioned before, φ^FM is not scale invariant. The only cost sharing rules satisfying all four of the above properties are the random order values, i.e. convex combinations of marginal vectors of the stand-alone cost game [142]. φ^SS is the special element in this class of rules, giving equal weight to each marginal vector. Consider the following weak fairness property:

Equal Treatment  Consider (x; c) ∈ P^n. If c_x(S ∪ {i}) = c_x(S ∪ {j}) for all i, j and all S ⊆ N_n \ {i, j}, then φi(x; c) = φj(x; c).

Within the class of random order values, φ^SS is the unique cost sharing rule satisfying equal treatment.

Strategic Properties of Fixed-Path Rules

[34] shows that the fixed-path cost sharing rules essentially have the same strategic properties as the serial rule. The crucial result in this respect is the following.

Theorem 20 (Friedman [34])  Consider the demand game G(φ; c), where φ is a fixed-path cost sharing rule and c ∈ C^n has strictly increasing partial derivatives. Then the corresponding set O^∞ of action profiles surviving the successive elimination of overwhelmed actions consists of a unique element.

As a corollary one may prove that the action profile in O^∞ is actually the unique Nash equilibrium of the game G(φ; c), and that it is strong as well. Moreover, [34] shows that this Nash equilibrium can be reached through some learning dynamics. This means that the demand games induced by φ^FM and φ^MS have strong strategic properties. Notice that the above theorem is only one-way: there are other cost sharing rules, such as the axial serial rule, with the same strategic characteristics.

Future Directions

So far, a couple of standard stylized models have been launched, providing a theoretical basis for defining and studying cost sharing principles at a basic level. The list of references below indicates that this field of research is in full swing, in both theoretical and applied directions. Although it is hard to guess where developments will lead, a couple of future directions can be highlighted.

Informational Issues  So far, most of the literature is devoted to deterministic cost sharing problems, whereas the cost sharing problems we face in practice are shaped by uncertain events. Despite its relevance, such stochastic modeling is still exceptional in the literature; see [56,138]. The models presented here assume that costs are known for every contingent demand profile; certainly within the continuous framework this seems too much to ask for. Retrieving the necessary information is not only hindered by technical constraints, but leads to new costs as well. [48] discusses data envelopment in cost sharing problems; a stochastic framework would be useful for studying such estimated cost sharing problems. Other work focusing on informational coherence in cost sharing problems is [126]. Related work is [4], which discusses mixtures of discrete and continuous cost sharing problems.

Budget Balance  In this overview, the proposed mechanisms are based on cost sharing rules. Another stream in implementation theory – at the other extreme of the spectrum – deals with cost allocation rules with no restrictions on the budget. [89] compares the size of budget deficits relative to the overall efficiency of a mechanism.

Performance  Recall the performance indices measuring the welfare impact of different cost sharing rules. [90] focuses on the continuous homogeneous production situations, with cost functions of specific types. There is still a need for a more


general theory. In particular, this could prove indispensable for analyzing the quality of cost sharing rules in a broader set-up: the heterogeneous and Bayesian cost sharing problems.


Non-linear Cost Sharing Rules  Most of the axiomatic literature is devoted to the analysis of cost sharing rules as linear operators. The additivity property is usually motivated as an accounting convention, but it serves merely as a tool through which certain mathematical representation theorems apply. Beyond this practical motivation, it is void of any ethical content. As [88] underlines, there are hardly any results on non-additive cost sharing rules – one of the reasons being that the mathematical analysis becomes notoriously hard. But, as a growing number of authors acknowledge, the usefulness of these mathematical techniques alone cannot justify the property.

Acknowledgments  I thank Hervé Moulin, who refereed this article, for his useful suggestions.




Bibliography
1. Aadland D, Kolpin V (1998) Shared irrigation cost: an empirical and axiomatical analysis. Math Soc Sci 35:203–218
2. Aadland D, Kolpin V (2004) Environmental determinants of cost sharing. J Econ Behav Organ 53:495–511
3. Albizuri MJ, Zarzuelo JM (2007) The dual serial cost-sharing rule. Math Soc Sci 53:150–163
4. Albizuri MJ, Santos JC, Zarzuelo JM (2003) On the serial cost sharing rule. Int J Game Theory 31:437–446
5. An M (1998) Logconcavity versus logconvexity, a complete characterization. J Econ Theory 80:350–369
6. Archer A, Feigenbaum J, Krishnamurthy A, Sami R, Shenker S (2004) Approximation and collusion in multicast cost sharing. Games Econ Behav 47:36–71
7. Arin J, Iñarra E (2001) Egalitarian solutions in the core. Int J Game Theory 30:187–193
8. Atkinson AB (1970) On the measurement of inequality. J Econ Theory 2:244–263
9. Aumann RJ (1959) Acceptable points in general cooperative n-person games. In: Kuhn HW, Tucker AW (eds) Contributions to the theory of games, vol IV. Princeton University Press, Princeton
10. Aumann RJ, Maschler M (1985) Game theoretic analysis of a bankruptcy problem from the Talmud. J Econ Theory 36:195–213
11. Aumann RJ, Shapley LS (1974) Values of non-atomic games. Princeton University Press, Princeton
12. Baumol W, Bradford D (1970) Optimal departure from marginal cost pricing. Am Econ Rev 60:265–283
13. Baumol W, Panzar J, Willig R (1988) Contestable markets and the theory of industry structure, 2nd edn. Harcourt Brace Jovanovich, San Diego
14. Bergantino A, Coppejans L (1997) A game theoretic approach




to the allocation of joint costs in a maritime environment: A case study. Occasional Papers, vol 44. Department of Maritime Studies and International Transport, University of Wales, Cardiff
15. Billera LJ, Heath DC (1982) Allocation of shared costs: A set of axioms yielding a unique procedure. Math Oper Res 7:32–39
16. Billera LJ, Heath DC, Raanan J (1978) Internal telephone billing rates: A novel application of non-atomic game theory. Oper Res 26:956–965
17. Bird CG (1976) On cost allocation for a spanning tree: A game theoretic approach. Networks 6:335–350
18. Binmore K (2007) Playing for real: A text on game theory. Oxford University Press, Oxford
19. Bjorndal E, Hamers H, Koster M (2004) Cost allocation in a bank ATM network. Math Methods Oper Res 59:405–418
20. Bondareva ON (1963) Some applications of linear programming to the theory of cooperative games. Problemy Kybernetiki 10:119–139 (in Russian)
21. Brânzei R, Ferrari G, Fragnelli V, Tijs S (2002) Two approaches to the problem of sharing delay costs in joint projects. Ann Oper Res 109:359–374
22. Chen Y (2003) An experimental study of serial and average cost pricing mechanisms. J Publ Econ 87:2305–2335
23. Clarke EH (1971) Multipart pricing of public goods. Publ Choice 11:17–33
24. Dasgupta PS, Hammond PJ, Maskin ES (1979) The implementation of social choice rules: Some general results on incentive compatibility. Rev Econ Stud 46:185–216
25. Davis M, Maschler M (1965) The kernel of a cooperative game. Nav Res Logist Q 12:223–259
26. de Frutos MA (1998) Decreasing serial cost sharing under economies of scale. J Econ Theory 79:245–275
27. Demers A, Keshav S, Shenker S (1990) Analysis and simulation of a fair queueing algorithm. J Internetworking 1:3–26
28. Denault M (2001) Coherent allocation of risk capital. J Risk 4:7–21
29. Dewan S, Mendelson H (1990) User delay costs and internal pricing for a service facility. Manag Sci 36:1502–1517
30. Dutta B, Ray D (1989) A concept of egalitarianism under participation constraints. Econometrica 57:615–635
31. Dutta B, Ray D (1991) Constrained egalitarian allocations. Games Econ Behav 3:403–422
32. Flam SD, Jourani A (2003) Strategic behavior and partial cost sharing. Games Econ Behav 43:44–56
33. Fleurbaey M, Sprumont Y (2006) Sharing the cost of a public good without subsidies. Université de Montréal, Cahier
34. Friedman E (2002) Strategic properties of heterogeneous serial cost sharing. Math Soc Sci 44:145–154
35. Friedman E (2004) Paths and consistency in additive cost sharing. Int J Game Theory 32:501–518
36. Friedman E, Moulin H (1999) Three methods to share joint costs or surplus. J Econ Theory 87:275–312
37. Friedman E, Shenker S (1998) Learning and implementation on the Internet. Working paper 199821, Rutgers University, Piscataway
38. González-Rodríguez P, Herrero C (2004) Optimal sharing of surgical costs in the presence of queues. Math Methods Oper Res 59:435–446
39. Granot D, Huberman G (1984) On the core and nucleolus of minimum cost spanning tree games. Math Program 29:323–347


40. Green J, Laffont JJ (1977) Characterization of satisfactory mechanisms for the revelation of preferences for public goods. Econometrica 45:427–438
41. Groves T (1973) Incentives in teams. Econometrica 41:617–631
42. Haimanko O (2000) Partially symmetric values. Math Oper Res 25:573–590
43. Harsanyi J (1967) Games with incomplete information played by Bayesian players. Manag Sci 14:159–182
44. Hart S, Mas-Colell A (1989) Potential, value, and consistency. Econometrica 57:589–614
45. Haviv M (2001) The Aumann–Shapley price mechanism for allocating congestion costs. Oper Res Lett 29:211–215
46. Hougaard JL, Thorlund-Petersen L (2000) The stand-alone test and decreasing serial cost sharing. Econ Theory 16:355–362
47. Hougaard JL, Thorlund-Petersen L (2001) Mixed serial cost sharing. Math Soc Sci 41:51–68
48. Hougaard JL, Tind J (2007) Cost allocation and convex data envelopment. Mimeo, University of Copenhagen, Copenhagen
49. Henriet D, Moulin H (1996) Traffic-based cost allocation in a network. RAND J Econ 27:332–345
50. Iñarra E, Isategui JM (1993) The Shapley value and average convex games. Int J Game Theory 22:13–29
51. Israelsen D (1980) Collectives, communes, and incentives. J Comp Econ 4:99–124
52. Jackson MO (2001) A crash course in implementation theory. Soc Choice Welf 18:655–708
53. Joskow PL (1976) Contributions of the theory of marginal cost pricing. Bell J Econ 7:197–206
54. Kalai E (1977) Proportional solutions to bargaining situations: Interpersonal utility comparisons. Econometrica 45:1623–1630
55. Kaminski M (2000) 'Hydraulic' rationing. Math Soc Sci 40:131–155
56. Kolpin V, Wilbur D (2005) Bayesian serial cost sharing. Math Soc Sci 49:201–220
57. Koster M (2002) Concave and convex serial cost sharing. In: Borm P, Peters H (eds) Chapters in game theory. Kluwer, Dordrecht
58. Koster M (2005) Sharing variable returns of cooperation. CeNDEF Working Paper 05-06, University of Amsterdam, Amsterdam
59. Koster M (2006) Heterogeneous cost sharing, the directional serial rule. Math Methods Oper Res 64:429–444
60. Koster M (2007) The Moulin–Shenker rule. Soc Choice Welf 29:271–293
61. Koster M (2007) Consistent cost sharing. Mimeo, University of Amsterdam
62. Koster M, Tijs S, Borm P (1998) Serial cost sharing methods for multi-commodity situations. Math Soc Sci 36:229–242
63. Koster M, Molina E, Sprumont Y, Tijs ST (2002) Sharing the cost of a network: core and core allocations. Int J Game Theory 30:567–599
64. Koster M, Reijnierse H, Voorneveld M (2003) Voluntary contributions to multiple public projects. J Publ Econ Theory 5:25–50
65. Koutsoupias E, Papadimitriou C (1999) Worst-case equilibria. In: 16th Annual Symposium on Theoretical Aspects of Computer Science, Trier. Springer, Berlin, pp 404–413

66. Legros P (1986) Allocating joint costs by means of the nucleolus. Int J Game Theory 15:109–119
67. Leroux J (2004) Strategy-proofness and efficiency are incompatible in production economies. Econ Lett 85:335–340
68. Leroux J (2006) Profit sharing in unique Nash equilibrium: characterization in the two-agent case. Games Econ Behav 62:558–572
69. Littlechild SC, Owen G (1973) A simple expression for the Shapley value in a special case. Manag Sci 20:370–372
70. Littlechild SC, Thompson GF (1977) Aircraft landing fees: A game theory approach. Bell J Econ 8:186–204
71. Maniquet F, Sprumont Y (1999) Efficient strategy-proof allocation functions in linear production economies. Econ Theory 14:583–595
72. Maniquet F, Sprumont Y (2004) Fair production and allocation of an excludable nonrival good. Econometrica 72:627–640
73. Maschler M (1990) Consistency. In: Ichiishi T, Neyman A, Tauman Y (eds) Game theory and applications. Academic Press, New York, pp 183–186
74. Maschler M (1992) The bargaining set, kernel and nucleolus. In: Aumann RJ, Hart S (eds) Handbook of game theory with economic applications, vol I. North-Holland, Amsterdam
75. Maschler M, Reijnierse H, Potters J (1996) Monotonicity properties of the nucleolus of standard tree games. Report 9556, Department of Mathematics, University of Nijmegen, Nijmegen
76. Maskin E, Sjöström T (2002) Implementation theory. In: Arrow KJ, Sen AK, Suzumura K (eds) Handbook of social choice and welfare, vol I. North-Holland, Amsterdam
77. Matsubayashi N, Umezawa M, Masuda Y, Nishino H (2005) Cost allocation problem arising in hub-spoke network systems. Eur J Oper Res 160:821–838
78. McLean RP, Pazgal A, Sharkey WW (2004) Potential, consistency, and cost allocation prices. Math Oper Res 29:602–623
79. Mirman L, Tauman Y (1982) Demand compatible equitable cost sharing prices. Math Oper Res 7:40–56
80. Monderer D, Shapley LS (1996) Potential games. Games Econ Behav 14:124–143
81. Moulin H (1987) Equal or proportional division of a surplus, and other methods. Int J Game Theory 16:161–186
82. Moulin H (1994) Serial cost-sharing of an excludable public good. Rev Econ Stud 61:305–325
83. Moulin H (1995) Cooperative microeconomics: A game-theoretic introduction. Prentice Hall, London
84. Moulin H (1995) On additive methods to share joint costs. Japan Econ Rev 46:303–332
85. Moulin H (1996) Cost sharing under increasing returns: A comparison of simple mechanisms. Games Econ Behav 13:225–251
86. Moulin H (1999) Incremental cost sharing: Characterization by coalition strategy-proofness. Soc Choice Welf 16:279–320
87. Moulin H (2000) Priority rules and other asymmetric rationing methods. Econometrica 68:643–684
88. Moulin H (2002) Axiomatic cost and surplus-sharing. In: Arrow KJ, Sen AK, Suzumura K (eds) Handbook of social choice and welfare. Handbooks in economics, vol 19. North-Holland, Amsterdam, pp 289–357
89. Moulin H (2006) Efficient cost sharing with cheap residual claimant. Mimeo, Rice University, Houston


90. Moulin H (2007) The price of anarchy of serial, average and incremental cost sharing. Mimeo, Rice University, Houston
91. Moulin H, Shenker S (1992) Serial cost sharing. Econometrica 60:1009–1037
92. Moulin H, Shenker S (1994) Average cost pricing versus serial cost sharing; an axiomatic comparison. J Econ Theory 64:178–201
93. Moulin H, Shenker S (2001) Strategy-proof sharing of submodular cost: budget balance versus efficiency. Econ Theory 18:511–533
94. Moulin H, Sprumont Y (2005) On demand responsiveness in additive cost sharing. J Econ Theory 125:1–35
95. Moulin H, Sprumont Y (2006) Responsibility and cross-subsidization in cost sharing. Games Econ Behav 55:152–188
96. Moulin H, Vohra R (2003) Characterization of additive cost sharing methods. Econ Lett 80:399–407
97. Moulin H, Watts A (1997) Two versions of the tragedy of the commons. Econ Des 2:399–421
98. Mutuswami S (2004) Strategy proof cost sharing of a binary good and the egalitarian solution. Math Soc Sci 48:271–280
99. Myerson RB (1980) Conference structures and fair allocation rules. Int J Game Theory 9:169–182
100. Myerson RB (1991) Game theory: Analysis of conflict. Harvard University Press, Cambridge
101. Nash JF (1950) Equilibrium points in n-person games. Proc Natl Acad Sci 36:48–49
102. O'Neill B (1982) A problem of rights arbitration from the Talmud. Math Soc Sci 2:345–371
103. Osborne MJ (2004) An introduction to game theory. Oxford University Press, New York
104. Osborne MJ, Rubinstein A (1994) A course in game theory. MIT Press, Cambridge
105. Peleg B, Sudhölter P (2004) Introduction to the theory of cooperative games. Theory and Decision Library, Series C. Kluwer, Amsterdam
106. Pérez-Castrillo D, Wettstein D (2006) An ordinal Shapley value for economic environments. J Econ Theory 127:296–308
107. Potters J, Sudhölter P (1999) Airport problems and consistent allocation rules. Math Soc Sci 38:83–102
108. Razzolini L, Reksulak M, Dorsey R (2004) An experimental evaluation of the serial cost sharing rule. Working paper 0402, VCU School of Business, Dept. of Economics, Richmond
109. Ritzberger K (2002) Foundations of non-cooperative game theory. Oxford University Press, Oxford
110. Rosenthal RW (1973) A class of games possessing pure-strategy Nash equilibria. Int J Game Theory 2:65–67
111. Roth AE (ed) (1988) The Shapley value: Essays in honor of Lloyd S Shapley. Cambridge University Press, Cambridge, pp 307–319
112. Samet D, Tauman Y, Zang I (1984) An application of the Aumann–Shapley prices for cost allocation in transportation problems. Math Oper Res 9:25–42
113. Sánchez SF (1997) Balanced contributions axiom in the solution of cooperative games. Games Econ Behav 20:161–168
114. Sandsmark M (1999) Production games under uncertainty. Comput Econ 14:237–253
115. Schmeidler D (1969) The nucleolus of a characteristic function game. SIAM J Appl Math 17:1163–1170
116. Shapley LS (1953) A value for n-person games. Ann Math Study, vol 28. Princeton University Press, Princeton, pp 307–317

117. Shapley LS (1967) On balanced sets and cores. Nav Res Logist Q 14:453–460
118. Shapley LS (1969) Utility comparison and the theory of games. In: Guilbaud GT (ed) La décision: Agrégation et dynamique des ordres de préférence. Editions du Centre National de la Recherche Scientifique, Paris, pp 251–263
119. Shapley LS (1971) Cores of convex games. Int J Game Theory 1:1–26
120. Sharkey W (1982) Suggestions for a game-theoretic approach to public utility pricing and cost allocation. Bell J Econ 13:57–68
121. Sharkey W (1995) Network models in economics. In: Ball MO et al (eds) Network routing. Handbooks in Operations Research and Management Science, vol 8. North-Holland, Amsterdam
122. Shubik M (1962) Incentives, decentralized control, the assignment of joint cost, and internal pricing. Manag Sci 8:325–343
123. Skorin-Kapov D (2001) On cost allocation in hub-like networks. Ann Oper Res 106:63–78
124. Skorin-Kapov D, Skorin-Kapov J (2005) Threshold based discounting network: The cost allocation provided by the nucleolus. Eur J Oper Res 166:154–159
125. Sprumont Y (1998) Ordinal cost sharing. J Econ Theory 81:126–162
126. Sprumont Y (2000) Coherent cost sharing. Games Econ Behav 33:126–144
127. Sprumont Y (2005) On the discrete version of the Aumann–Shapley cost-sharing method. Econometrica 73:1693–1712
128. Sprumont Y, Ambec S (2002) Sharing a river. J Econ Theory 107:453–462
129. Sprumont Y, Moulin H (2005) Fair allocation of production externalities: Recent results. CIREQ working paper 28-2005, Montreal
130. Sudhölter P (1998) Axiomatizations of game theoretical solutions for one-output cost sharing problems. Games Econ Behav 24:42–71
131. Suijs J, Borm P, Hamers H, Koster M, Quant M (2005) Communication and cooperation in public network situations. Ann Oper Res 137:117–140
132. Tauman Y (1988) The Aumann–Shapley prices: A survey. In: Roth A (ed) The Shapley value. Cambridge University Press, Cambridge, pp 279–304
133. Thomas LC (1992) Dividing credit-card costs fairly. IMA J Math Appl Bus Ind 4:19–33
134. Thomson W (1996) Consistent allocation rules. Mimeo, Economics Department, University of Rochester, Rochester
135. Thomson W (2001) On the axiomatic method and its recent applications to game theory and resource allocation. Soc Choice Welf 18:327–386
136. Tijs SH, Driessen TSH (1986) Game theory and cost allocation problems. Manag Sci 32:1015–1028
137. Tijs SH, Koster M (1998) General aggregation of demand and cost sharing methods. Ann Oper Res 84:137–164
138. Timmer J, Borm P, Tijs S (2003) On three Shapley-like solutions for cooperative games with random payoffs. Int J Game Theory 32:595–613
139. van de Nouweland A, Tijs SH (1995) Cores and related solution concepts for multi-choice games. Math Methods Oper Res 41:289–311
140. von Neumann J, Morgenstern O (1944) Theory of games and economic behavior. Princeton University Press, Princeton


141. Watts A (2002) Uniqueness of equilibrium in cost sharing games. J Math Econ 37:47–70
142. Weber RJ (1988) Probabilistic values for games. In: Roth AE (ed) The Shapley value. Cambridge University Press, Cambridge
143. Young HP (1985) Producer incentives in cost allocation. Econometrica 53:757–765
144. Young HP (1985) Monotonic solutions of cooperative games. Int J Game Theory 14:65–72

145. Young HP (1985) Cost allocation: Methods, principles, applications. North-Holland, Amsterdam
146. Young HP (1988) Distributive justice in taxation. J Econ Theory 44:321–335
147. Young HP (1994) Cost allocation. In: Aumann RJ, Hart S (eds) Handbook of game theory, vol II. Elsevier, Amsterdam, pp 1193–1235
148. Young HP (1998) Cost allocation, demand revelation, and core implementation. Math Soc Sci 36:213–229


Curvelets and Ridgelets

JALAL FADILI¹, JEAN-LUC STARCK²
¹ GREYC, CNRS UMR 6072, ENSICAEN, Caen Cedex, France
² Laboratoire Astrophysique des Interactions Multiéchelles, UMR 7158, CEA/DSM-CNRS-Université Paris Diderot, SEDI-SAP, Gif-sur-Yvette Cedex, France

Article Outline

Glossary
Definition of the Subject
Introduction
Ridgelets
Curvelets
Stylized Applications
Future Directions
Bibliography

Glossary

WT1D  The one-dimensional wavelet transform as defined in [53]. See also Numerical Issues When Using Wavelets.
WT2D  The two-dimensional wavelet transform.
Discrete ridgelet transform (DRT)  The discrete implementation of the continuous ridgelet transform.
Fast slant stack (FSS)  An algebraically exact Radon transform of data on a Cartesian grid.
First-generation discrete curvelet transform (DCTG1)  The discrete curvelet transform constructed from the discrete ridgelet transform.
Second-generation discrete curvelet transform (DCTG2)  The discrete curvelet transform constructed by appropriate bandpass filtering in the Fourier domain.
Anisotropic elements  By anisotropic, we mean basis elements with elongated effective support, i.e. length > width.
Parabolic scaling law  A basis element obeys the parabolic scaling law if its effective support is such that width ≈ length².

Definition of the Subject

Despite the fact that wavelets have had a wide impact in image processing, they fail to efficiently represent objects with highly anisotropic elements such as lines or curvilinear structures (e.g. edges). The reason is that wavelets are non-geometrical and do not exploit the regularity of the edge curve.

The ridgelet and curvelet transforms [16,17] were developed as an answer to the weakness of the separable wavelet transform in sparsely representing what appear to be simple building atoms in an image, namely lines, curves and edges. Curvelets and ridgelets take the form of basis elements which exhibit high directional sensitivity and are highly anisotropic [9,18,32,68]. These recent geometric image representations are built upon ideas of multiscale analysis and geometry. They have had an important success in a wide range of image processing applications, including denoising [42,64,68], deconvolution [38,74], contrast enhancement [73], texture analysis [2], detection [44], watermarking [78], component separation [70,71], inpainting [37,39] and blind source separation [6,7]. Curvelets have also proven useful in diverse fields beyond traditional image processing applications, for example seismic imaging [34,42,43], astronomical imaging [48,66,69], and scientific computing and the analysis of partial differential equations [13,14]. Another reason for the success of ridgelets and curvelets is the availability of fast transform algorithms in non-commercial software packages following the philosophy of reproducible research, see [4,75].

Introduction

Sparse Geometrical Image Representation

Multiscale methods have become very popular, especially with the development of wavelets in the last decade. Background texts on the wavelet transform include [23,53,72]. An overview of implementation and practical issues of the wavelet transform can also be found in Numerical Issues When Using Wavelets. Despite the success of the classical wavelet viewpoint, it has been argued that traditional wavelets have some strong limitations that question their effectiveness in dimensions higher than one [16,17].
Wavelets rely on a dictionary of roughly isotropic elements occurring at all scales and locations, do not describe well highly anisotropic elements, and contain only a fixed number of directional elements, independent of scale. Following this reasoning, new constructions have been proposed such as the ridgelets [9,16] and the curvelets [17,18,32,68]. Ridgelets and curvelets are special members of the family of multiscale orientation-selective transforms, which has recently led to a flurry of research activity in the field of computational and applied harmonic analysis. Many other constructions belonging to this family have been investigated in the literature, and go by the name contourlets [27], directionlets [76], bandlets [49,62], grouplets [54], shearlets [47], dual-tree wavelets and wavelet packets [40,46], etc.


Curvelets and Ridgelets, Figure 1  A few ridgelet examples – the second to fourth graphs are obtained from the first ridgelet by simple geometric manipulations, namely rotation, rescaling, and shifting
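To make the ridgelet geometry concrete, the following sketch (an illustration, not code from the article) evaluates a bivariate profile of the form a^{−1/2} ψ((x1 cos θ + x2 sin θ − b)/a) – the ridgelet form defined below – using a Mexican-hat ψ as an assumed example wavelet, and checks that the function is constant along the ridge direction:

```python
import numpy as np

def mexican_hat(t):
    # Zero-mean univariate wavelet (illustrative choice for psi)
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def ridgelet(x1, x2, a=1.0, b=0.0, theta=np.pi / 6):
    # Bivariate ridgelet: a 1D wavelet profile in the ridge coordinate
    # u = (x1*cos(theta) + x2*sin(theta) - b)/a, constant transverse to u.
    u = (x1 * np.cos(theta) + x2 * np.sin(theta) - b) / a
    return mexican_hat(u) / np.sqrt(a)

theta = np.pi / 6
along = np.array([-np.sin(theta), np.cos(theta)])   # direction of the ridge lines
across = np.array([np.cos(theta), np.sin(theta)])   # transverse (wavelet) direction

p = np.array([0.3, 0.4])
at_p = ridgelet(*p, theta=theta)
on_ridge = ridgelet(*(p + 2.5 * along), theta=theta)    # same value as at p
off_ridge = ridgelet(*(p + 1.0 * across), theta=theta)  # wavelet profile changes
```

Moving along the ridge leaves the value unchanged, while moving transverse to it traverses the wavelet profile – exactly the behavior visible in Figure 1.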

Throughout this paper, the term ‘sparsity’ is used and intended in a weak sense. We are aware that practical images and signals may not be supported in a transform domain on a set of relatively small size (sparse set). Instead, they may only be compressible (nearly sparse) in some transform domain. Hence, with a slight abuse of terminology, we will say that a representation is sparse for an image within a certain class, if it provides a compact description of such an image.

Notations

We work throughout in two dimensions, with spatial variable x ∈ R² and ξ a continuous frequency-domain variable. Parentheses (·,·) are used for continuous-domain function evaluations, and brackets [·,·] for discrete-domain array indices. The hat ˆ notation will be used for the Fourier transform.

Ridgelets

The Continuous Ridgelet Transform

The two-dimensional continuous ridgelet transform in R² can be defined as follows [10]. We pick a smooth univariate function ψ: R → R with sufficient decay and satisfying the admissibility condition

  ∫ |ψ̂(ξ)|² / |ξ|² dξ < ∞ ,   (1)

which holds if, say, ψ has a vanishing mean, ∫ ψ(t) dt = 0. We will suppose a special normalization of ψ so that ∫_0^∞ |ψ̂(ξ)|² ξ⁻² dξ = 1.

For each scale a > 0, each position b ∈ R and each orientation θ ∈ [0, 2π), we define the bivariate ridgelet ψ_{a,b,θ}: R² → R by

  ψ_{a,b,θ}(x) = ψ_{a,b,θ}(x1, x2) = a^{−1/2} ψ((x1 cos θ + x2 sin θ − b)/a) .   (2)

A ridgelet is constant along the lines x1 cos θ + x2 sin θ = const; transverse to these ridges it is a wavelet. Figure 1 depicts a few examples of ridgelets; the second to fourth panels are obtained from the ridgelet in the left panel by simple geometric manipulations, namely rotation, rescaling, and shifting.

Given an integrable bivariate function f(x), we define its ridgelet coefficients by

  R_f(a, b, θ) := ⟨f, ψ_{a,b,θ}⟩ = ∫_{R²} f(x) ψ_{a,b,θ}(x) dx .

We have the exact reconstruction formula

  f(x) = ∫_0^{2π} ∫_{−∞}^{+∞} ∫_0^{∞} R_f(a, b, θ) ψ_{a,b,θ}(x) da/a³ db dθ/(4π) ,   (3)

valid almost everywhere for functions which are both integrable and square integrable. This formula is stable, and one can prove a Parseval relation [16].

Ridgelet analysis may be constructed as wavelet analysis in the Radon domain. The rationale behind this is that the Radon transform translates singularities along lines into point singularities, for which the wavelet transform is known to provide a sparse representation. Recall that the Radon transform of an object f is the collection of line integrals indexed by (θ, t) ∈ [0, 2π) × R given by

  R f(θ, t) = ∫_{R²} f(x1, x2) δ(x1 cos θ + x2 sin θ − t) dx1 dx2 ,   (4)

where δ is the Dirac distribution. The ridgelet transform is then precisely the application of a one-dimensional wavelet transform to the slices of the Radon transform on which the angular variable θ is constant and t varies. Thus, the basic strategy for calculating the continuous


ridgelet transform is first to compute the Radon transform R_f(θ, t), and second to apply a one-dimensional wavelet transform to the slices R_f(θ, ·). Several digital ridgelet transforms (DRTs) have been proposed, and we will describe three of them in this section, based on different implementations of the Radon transform.

The RectoPolar Ridgelet Transform

A fast implementation of the Radon transform can be obtained in the Fourier domain, based on the projection-slice theorem. First the 2D FFT of the given image is computed. Then the resulting function in the frequency domain is used to evaluate frequency values on a polar grid of rays passing through the origin and spread uniformly in angle. This conversion from a Cartesian to a polar grid can be obtained by interpolation, a process well known under the name gridding in tomography. Given the polar grid samples, the number of rays corresponds to the number of projections, and the number of samples on each ray corresponds to the number of shifts per angle. Applying a one-dimensional inverse Fourier transform to each ray then yields the Radon projections. This process is known to be inaccurate because of its sensitivity to the interpolation involved, which implies that, for better accuracy, the first 2D FFT should be computed with high redundancy. An alternative solution for the Fourier-based Radon transform exists, where the polar grid is replaced with

a pseudo-polar one. The geometry of this new grid is illustrated in Fig. 2. Concentric circles of linearly growing radius in the polar grid are replaced by concentric squares of linearly growing side length, and the rays are spread uniformly not in angle but in slope. These two changes give a grid vaguely resembling the polar one, but for this grid a direct FFT can be implemented with no interpolation. Applying a 1D inverse FFT along the rays now gives a variant of the Radon transform in which the projection angles are not spaced uniformly. For the pseudo-polar FFT to be stable, it was shown that it should contain at least twice as many samples as the original image. A by-product of this construction is that the transform is organized as a 2D array whose rows contain the projections as a function of angle, so that processing the Radon transform along one axis is easily implemented. More details can be found in [68].

One-Dimensional Wavelet Transform

To complete the ridgelet transform, we must take a one-dimensional wavelet transform (WT1D) along the radial variable in Radon space. We now discuss the choice of the digital WT1D. Experience has shown that compactly supported wavelets can lead to many visual artifacts when used in conjunction with nonlinear processing – such as hard-thresholding of individual wavelet coefficients – particularly for decimated wavelet schemes used at critical sampling. Also, because of the lack of localization of such compactly supported wavelets in the frequency domain, fluctuations in coarse-scale wavelet coefficients can introduce fine-scale fluctuations. A frequency-domain approach must be taken, where the discrete Fourier transform is reconstructed from the inverse Radon transform. These considerations lead to the use of a band-limited wavelet, whose support is compact in the Fourier domain rather than the time domain [28,29,68]. In [68], a specific overcomplete wavelet transform [67,72] has been used.
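The projection-slice theorem that underlies these Fourier-domain Radon implementations is easy to verify in the simplest discrete case – an axis-aligned projection, where no gridding or interpolation is needed. This plain-NumPy sketch (illustrative only; not the pseudo-polar code of [68]) checks that summing an image along one axis equals the inverse 1D DFT of the corresponding central slice of its 2D DFT:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))

# Line integrals of the image along axis 1 (projection onto the first axis).
projection = img.sum(axis=1)

# Central slice of the 2D DFT at k2 = 0, followed by a 1D inverse DFT.
central_slice = np.fft.fft2(img)[:, 0]
from_slice = np.fft.ifft(central_slice).real
```

Ridgelet analysis then amounts to applying a 1D wavelet transform to such projections, angle by angle; the pseudo-polar grid extends this idea to non-axis-aligned slopes without interpolation.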
The wavelet transform algorithm is based on a scaling function $\phi$ such that $\hat{\phi}$ vanishes outside of the interval $[-\nu_c, \nu_c]$. We define the Fourier transform of the scaling function as a re-normalized $B_3$-spline

$\hat{\phi}(\nu) = \frac{3}{2} B_3(4\nu),$

Curvelets and Ridgelets, Figure 2 Illustration of the pseudo-polar grid in the frequency domain for an n by n image (n = 8)

and $\hat{\psi}$ as the difference between two consecutive resolutions:

$\hat{\psi}(2\nu) = \hat{\phi}(\nu) - \hat{\phi}(2\nu) \,.$

Curvelets and Ridgelets

Because $\hat{\psi}$ is compactly supported, the sampling theorem shows that one can easily build a pyramid of $n + n/2 + \cdots + 1 = 2n$ elements; see [72] for details. This WT1D transform enjoys the following useful properties:
• The wavelet coefficients are directly calculated in the Fourier space. In the context of the ridgelet transform, this allows avoiding the computation of the one-dimensional inverse Fourier transform along each radial line.
• Each sub-band is sampled above the Nyquist rate, hence avoiding aliasing, a phenomenon typically encountered by critically sampled orthogonal wavelet transforms [65].
• The reconstruction is trivial. The wavelet coefficients simply need to be co-added to reconstruct the input signal at any given point. In our application, this implies that the ridgelet coefficients simply need to be co-added to reconstruct Fourier coefficients.
This wavelet transform introduces an extra redundancy factor. However, we note that the goal in this implementation is not data compression or efficient coding. Rather, this implementation is useful to the practitioner whose focus is on data analysis, for which it is well known that over-completeness through (almost) translation-invariance can provide substantial advantages.

Assembling all the above ingredients together gives the flowchart of the discrete ridgelet transform (DRT) depicted in Fig. 3. The DRT of an image of size $n \times n$ is an image of size $2n \times 2n$, introducing a redundancy factor equal to 4. We note that, because this transform is made of a chain of steps, each one of which is invertible, the whole transform is invertible, and so has the exact reconstruction property. For the same reason, the reconstruction is stable under perturbations of the coefficients. Last but not least, this discrete transform is computationally attractive: the algorithm presented here has low complexity, since it runs in $O(n^2 \log n)$ flops for an $n \times n$ image.
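The Fourier-domain Radon step described above (2D FFT, resampling along rays through the origin, 1D inverse FFT per ray) can be sketched as follows. This is an illustrative gridding implementation with hypothetical names (`fft_radon`), using plain linear interpolation; it is not the rectopolar algorithm of [68], which avoids interpolation altogether.

```python
# Sketch of a Fourier-domain Radon transform via the projection-slice
# theorem (gridding). Names are illustrative; accuracy is limited by the
# interpolation, exactly as discussed in the text.
import numpy as np
from scipy.ndimage import map_coordinates

def fft_radon(img, n_angles=None):
    """Radon projections of a square image via 2D FFT + polar resampling."""
    n = img.shape[0]
    n_angles = n_angles or n
    F = np.fft.fftshift(np.fft.fft2(img))          # 2D FFT, DC at the center
    c = n // 2
    radii = np.arange(n) - c                       # samples along each ray
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    proj = np.empty((n_angles, n))
    for i, t in enumerate(thetas):
        # coordinates of one ray through the origin of the frequency plane
        rows = c + radii * np.sin(t)
        cols = c + radii * np.cos(t)
        slice_re = map_coordinates(F.real, [rows, cols], order=1)
        slice_im = map_coordinates(F.imag, [rows, cols], order=1)
        ray = slice_re + 1j * slice_im
        # 1D inverse FFT of the central slice gives one Radon projection
        proj[i] = np.real(np.fft.ifft(np.fft.ifftshift(ray)))
    return proj
```

Each row of the output is one Radon projection; applying a 1D wavelet transform to each row would complete a (crude) ridgelet transform.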
The Orthonormal Finite Ridgelet Transform

The orthonormal finite ridgelet transform (OFRT) has been proposed [26] for image compression and filtering. This transform is based on the finite Radon transform [55] and a 1D orthogonal wavelet transform. It is non-redundant and reversible. It would have been a great alternative to the previously described ridgelet transform if the OFRT were not based on an awkward definition of a line. In fact,

Curvelets and Ridgelets, Figure 3 Discrete ridgelet transform flowchart. Each of the 2n radial lines in the Fourier domain is processed separately. The 1-D inverse FFT is calculated along each radial line followed by a 1-D nonorthogonal wavelet transform. In practice, the one-dimensional wavelet coefficients are directly calculated in the Fourier space

Curvelets and Ridgelets, Figure 4 The backprojection of a ridgelet coefficient by the FFT-based ridgelet transform (left), and by the OFRT (right)

a line in the OFRT is defined algebraically rather than geometrically, and so the points on a 'line' can be arbitrarily and randomly spread out in the spatial domain. Figure 4 shows the back-projection of a ridgelet coefficient by the FFT-based ridgelet transform (left) and by the OFRT (right). It is clear that the back-projection of the OFRT is nothing like a ridge function. Because of this specific definition of a line, the thresholding of the OFRT coefficients produces strong artifacts. Figure 5 shows a part of the original image Boat, and its reconstruction after hard thresholding the OFRT of the noise-free Boat. The resulting image is not smoothed as


Curvelets and Ridgelets, Figure 5 Part of original noise-free Boat image (left), and reconstruction after hard thresholding its OFRT coefficients (right)

one would expect; rather, noise has been added to the noise-free image as part of the filtering! Finally, the OFRT presents another limitation: the image size must be a prime number. This last point is however not too restrictive, because we generally use a spatial partitioning when denoising the data, and a prime-number block size can be used. The OFRT is interesting from the conceptual point of view, but still requires work before it can be used for real applications such as denoising.

The Fast Slant Stack Ridgelet Transform

The Fast Slant Stack (FSS) [3] is a Radon transform of data on a Cartesian grid, which is algebraically exact and geometrically more accurate and faithful than the previously described methods. The back-projection of a point in Radon space is exactly a ridge function in the spatial domain (see Fig. 6). The transformation of an $n \times n$ image is a $2n \times 2n$ image. $n$ line integrals with angle in $[-\pi/4, \pi/4]$ are calculated from the image zero-padded on the $y$-axis, and $n$ line integrals with angle in $[\pi/4, 3\pi/4]$ are computed by zero-padding the image on the $x$-axis. For a given angle inside $[-\pi/4, \pi/4]$, $2n$ line integrals are calculated by first shearing the zero-padded image, and then integrating the pixel values along all horizontal lines (resp. vertical lines for angles in $[\pi/4, 3\pi/4]$). The shearing is performed one column at a time (resp. one line at a time) by using the 1D FFT. Figure 7 shows an example of the image shearing step with two different angles ($5\pi/4$ and $-\pi/4$).

A DRT based on the FSS transform has been proposed in [33]. The connection between the FSS and the Linogram has been investigated in [3]. An FSS algorithm is also proposed in [3], based on the 2D fast pseudo-polar Fourier transform, which evaluates the 2D Fourier transform on a non-Cartesian (pseudo-polar) grid, operating in $O(n^2 \log n)$ flops. Figure 8 left exemplifies a ridgelet in the spatial domain obtained from the DRT based on the FSS implementation. Its Fourier transform is shown in Fig. 8 right, superimposed on the DRT frequency tiling [33]. The Fourier transform of the discrete ridgelet lives in an angular wedge. More precisely, the Fourier transform of a discrete ridgelet at scale $j$ lives within a dyadic square of size $\sim 2^j$.

Local Ridgelet Transforms

The ridgelet transform is optimal for finding global lines of the size of the image. To detect line segments, a partitioning must be introduced [9]. The image can be decomposed into overlapping blocks of side length $b$ pixels in such a way that the overlap between two vertically adjacent blocks is a rectangular array of size $b \times b/2$; we use overlap to avoid blocking artifacts. For an $n \times n$ image, we count $2n/b$ such blocks in each direction, and thus the redundancy factor grows by a factor of 4. The partitioning introduces redundancy, as a pixel belongs to 4 neighboring blocks. We present two competing strategies to perform the analysis and synthesis:
1. The block values are weighted by a spatial window $w$ (analysis) in such a way that the co-addition of all blocks reproduces exactly the original pixel value (synthesis).
2. The block values are those of the image pixel values (analysis) but are weighted when the image is reconstructed (synthesis).
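The FSS shearing step (fractional column shifts implemented with the 1D FFT, followed by integration along horizontal lines) can be sketched as below. Function names and the exact shift convention are illustrative assumptions, not the algorithm of [3]:

```python
# Sketch of shearing-based line integration: each column is circularly
# shifted by a fractional amount via an FFT phase ramp, then pixel values
# are summed along horizontal lines (illustrative, not the exact FSS).
import numpy as np

def sheared_line_sums(img, slope):
    """Sum pixel values along lines of a given slope in a zero-padded image."""
    n = img.shape[0]
    padded = np.vstack([img, np.zeros_like(img)])   # zero-pad on the y-axis
    m = padded.shape[0]
    freqs = np.fft.fftfreq(m)
    sheared = np.empty_like(padded)
    for x in range(n):
        shift = slope * (x - n / 2)                 # fractional shift of column x
        col_f = np.fft.fft(padded[:, x])
        sheared[:, x] = np.real(
            np.fft.ifft(col_f * np.exp(-2j * np.pi * freqs * shift)))
    return sheared.sum(axis=1)                      # integrate horizontal lines
```

With slope 0 this reduces to plain row sums; fractional shifts leave each column's total (its DC Fourier coefficient) unchanged, so mass is preserved for any slope.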

Curvelets and Ridgelets, Figure 6 Backprojection of a point at four different locations in the Radon space using the FSS algorithm

Curvelets and Ridgelets

Curvelets and Ridgelets, Figure 7 Slant Stack Transform of an image

Curvelets and Ridgelets, Figure 8 a Example of a ridgelet obtained by the Fast Slant Stack implementation. b Its FFT superimposed on the DRT frequency tiling

Experiments have shown that the second approach leads to better results, especially for restoration problems; see [68] for details. We calculate a pixel value $f[i_1, i_2]$ from its four corresponding block values of half-size $m = b/2$, namely $B_1[k_1, l_1]$, $B_2[k_2, l_1]$, $B_3[k_1, l_2]$ and $B_4[k_2, l_2]$, with $k_1, l_1 > b/2$ and $k_2 = k_1 - m$, $l_2 = l_1 - m$, in the following way:

$f_1 = w(k_2/m)\, B_1[k_1, l_1] + w(1 - k_2/m)\, B_2[k_2, l_1]$
$f_2 = w(k_2/m)\, B_3[k_1, l_2] + w(1 - k_2/m)\, B_4[k_2, l_2]$
$f[i_1, i_2] = w(l_2/m)\, f_1 + w(1 - l_2/m)\, f_2 \,. \quad (5)$


Curvelets and Ridgelets, Figure 9 Local ridgelet transform on bandpass filtered image. At fine scales, curved edges are almost straight lines

where $w(x) = \cos^2(\pi x/2)$ is the window. Of course, one might select any other smooth, non-increasing function satisfying $w(0) = 1$, $w(1) = 0$, $w'(0) = 0$ and obeying the symmetry property $w(x) + w(1 - x) = 1$.

Sparse Representation by Ridgelets

The continuous ridgelet transform provides sparse representations of both smooth functions (in the Sobolev space $W_2^2$) and of perfectly straight lines [11,31]. We have just seen that there are also various DRTs, i.e. expansions with a countable discrete collection of generating elements, which correspond either to frames or to orthobases. It has been shown for these schemes that the DRT achieves near-optimal M-term approximation (that is, the nonlinear approximation of f using the M ridgelet coefficients largest in magnitude) to smooth images with discontinuities along straight lines [16,31]. In summary, ridgelets provide sparse representations for piecewise smooth images away from global straight edges.

Curvelets

The First Generation Curvelet Transform

In image processing, edges are curved rather than straight lines, and ridgelets are not able to efficiently represent such images. However, one can still deploy the ridgelet machinery in a localized way, at fine scales, where curved edges are almost straight lines (see Fig. 9). This is the idea underlying the first generation curvelets (termed here CurveletG1) [18].
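Returning to the overlapping-block reconstruction of Eq. (5): the property $w(x) + w(1-x) = 1$ is what makes the weighted contributions of adjacent blocks co-add exactly to the pixel value. A minimal 1D sketch (illustrative function names; the article's scheme is 2D with four blocks per pixel):

```python
# The smooth blending window used for overlapping blocks.
# w(x) = cos^2(pi x / 2) satisfies w(0)=1, w(1)=0 and w(x)+w(1-x)=1.
import numpy as np

def w(x):
    return np.cos(0.5 * np.pi * x) ** 2

def blend(B1, B2, m):
    """Blend two half-overlapping 1D blocks of length 2*m (illustrative).

    B1 covers samples [0, 2m), B2 covers [m, 3m); on the overlap [m, 2m)
    each sample is a weighted combination, as in Eq. (5).
    """
    out = np.empty(3 * m)
    out[:m] = B1[:m]                          # only B1 covers [0, m)
    t = np.arange(m) / m                      # relative position in overlap
    out[m:2 * m] = w(t) * B1[m:] + w(1 - t) * B2[:m]
    out[2 * m:] = B2[m:]                      # only B2 covers [2m, 3m)
    return out
```

When the two blocks carry the raw pixel values (strategy 2 above), the weights sum to one on the overlap, so the original signal is reproduced exactly.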

First Generation Curvelets Construction The CurveletG1 transform [18,32,68] opens the possibility of analyzing an image with different block sizes, but with a single transform. The idea is to first decompose the image into a set of wavelet bands, and to analyze each band by a local ridgelet transform, as illustrated in Fig. 9. The block size can be changed at each scale level. Roughly speaking, different levels of the multiscale ridgelet pyramid are used to represent different sub-bands of a filter bank output. At the same time, this sub-band decomposition imposes a relationship between the width and length of the important frame elements, so that they are anisotropic and obey approximately the parabolic scaling law width $\approx$ length$^2$. The First Generation Discrete Curvelet Transform (DCTG1) of a continuum function $f(x)$ makes use of a dyadic sequence of scales, and a bank of filters with the property that the bandpass filter $\Delta_j$ is concentrated near the frequencies $[2^{2j}, 2^{2j+2}]$, i.e.

$\Delta_j(f) = \Psi_{2j} * f \,, \qquad \hat{\Psi}_{2j}(\nu) = \hat{\Psi}(2^{-2j}\nu) \,.$

In wavelet theory, one uses a decomposition into dyadic sub-bands $[2^j, 2^{j+1}]$. In contrast, the sub-bands used in the discrete curvelet transform of continuum functions have the nonstandard form $[2^{2j}, 2^{2j+2}]$. This is a nonstandard feature of the DCTG1 well worth remembering (this is where the approximate parabolic scaling law comes into play). The DCTG1 decomposition is the sequence of the following steps:
• Sub-band Decomposition. The object f is decomposed into sub-bands.


Require: Input $n \times n$ image $f[i_1, i_2]$, type of DRT (see above).
1: Apply the à trous isotropic WT2D with J scales.
2: Set $B_1 = B_{\min}$.
3: for $j = 1, \ldots, J$ do
4:  Partition the sub-band $w_j$ with a block size $B_j$ and apply the DRT to each block.
5:  if $j$ modulo 2 = 1 then
6:   $B_{j+1} = 2 B_j$,
7:  else
8:   $B_{j+1} = B_j$.
9:  end if
10: end for
Curvelets and Ridgelets, Algorithm 1 DCTG1

• Smooth Partitioning. Each sub-band is smoothly windowed into "squares" of an appropriate scale (of side length $\sim 2^{-j}$).
• Ridgelet Analysis. Each square is analyzed via the DRT.
In this definition, the two dyadic sub-bands $[2^{2j}, 2^{2j+1}]$ and $[2^{2j+1}, 2^{2j+2}]$ are merged before applying the ridgelet transform.

Digital Implementation It seems that the isotropic "à trous" wavelet transform (see Numerical Issues When Using Wavelets) [72] is especially well adapted to the needs of the digital curvelet transform. The algorithm decomposes an $n \times n$ image $f[i_1, i_2]$ as a superposition of the form

$f[i_1, i_2] = c_J[i_1, i_2] + \sum_{j=1}^{J} w_j[i_1, i_2] \,,$

where $c_J$ is a coarse or smooth version of the original image $f$ and $w_j$ represents 'the details of $f$' at scale $2^{-j}$. Thus, the algorithm outputs $J + 1$ sub-band arrays of size $n \times n$. A sketch of the DCTG1 implementation is given in Algorithm 1. The side length of the localizing windows is doubled at every other dyadic sub-band, hence maintaining the fundamental property of the curvelet transform which says that elements of length about $2^{-j/2}$ serve for the analysis and synthesis of the $j$th sub-band $[2^j, 2^{j+1}]$. Note also that the coarse description of the image $c_J$ is left intact. In the results shown in this paper, we used the default value $B_{\min} = 16$ pixels in our implementation. Figure 10 gives an overview of the organization of the DCTG1 algorithm.
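The sub-band decomposition in step 1 of Algorithm 1 can be sketched with the isotropic à trous (starlet) algorithm. This is an illustrative implementation using the $B_3$-spline filter, with hypothetical function names; it also demonstrates the co-addition reconstruction property:

```python
# Sketch of the isotropic "a trous" (starlet) decomposition used as the
# first step of the DCTG1 (illustrative implementation).
import numpy as np
from scipy.ndimage import correlate1d

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # B3-spline filter taps

def atrous_decompose(img, n_scales):
    """Return [w_1, ..., w_J, c_J]; the outputs co-add back to img."""
    c = img.astype(float)
    bands = []
    for j in range(n_scales):
        # insert 2^j - 1 zeros ("holes") between the filter taps
        step = 2 ** j
        h = np.zeros((len(B3) - 1) * step + 1)
        h[::step] = B3
        # separable smoothing along rows and columns
        smooth = correlate1d(correlate1d(c, h, axis=0, mode='mirror'),
                             h, axis=1, mode='mirror')
        bands.append(c - smooth)                  # detail band at scale j
        c = smooth
    bands.append(c)                               # coarse version c_J
    return bands
```

Because each detail band is defined as the difference of two consecutive smoothings, summing all $J + 1$ output arrays reconstructs the input exactly.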

Curvelets and Ridgelets, Figure 10 First Generation Discrete Curvelet Transform (DCTG1) flowchart. The figure illustrates the decomposition of the original image into sub-bands followed by the spatial partitioning of each sub-band. The ridgelet transform is then applied to each block


Curvelets and Ridgelets, Figure 11 A few first generation curvelets

This implementation of the DCTG1 is also redundant. The redundancy factor is equal to $16J + 1$ whenever $J$ scales are employed. The DCTG1 algorithm enjoys exact reconstruction and stability, because each step of the analysis (decomposition) algorithm is itself invertible. One can show that the computational complexity of the DCTG1 algorithm described here, based on the DRT of Fig. 3, is $O(n^2 (\log n)^2)$ for an $n \times n$ image. Figure 11 shows a few curvelets at different scales, orientations and locations.

Sparse Representation by First Generation Curvelets

The CurveletG1 elements can form either a frame or a tight frame for $L^2(\mathbb{R}^2)$ [17], depending on the WT2D used and the DRT implementation (rectopolar or FSS Radon transform). The frame elements are anisotropic by construction and become successively more anisotropic at progressively higher scales. These curvelets also exhibit directional sensitivity and display oscillatory components across the 'ridge'. A central motivation leading to the curvelet construction was the problem of non-adaptively representing piecewise smooth (e.g. $C^2$) images $f$ which have a discontinuity along a $C^2$ curve. Such a model is the so-called cartoon model of (non-textured) images. With the CurveletG1 tight frame construction, it was shown in [17] that for such $f$, the $M$-term nonlinear approximations $f_M$ of $f$ obey, for each $\beta > 0$,

$\| f - f_M \|^2 \le C_\beta\, M^{-2+\beta} \,, \qquad M \to +\infty \,.$

The M-term approximations in the CurveletG1 are almost rate optimal, much better than M-term Fourier or wavelet approximations for such images; see [53].

The Second Generation Curvelet Transform

Despite these interesting properties, the CurveletG1 construction presents some drawbacks. First, the construction involves a complicated seven-index structure, among which we have parameters for scale, location and orientation. In addition, the parabolic scaling ratio width $\approx$ length$^2$ is not completely true (see Subsect. "First Generation Curvelets Construction"). In fact, CurveletG1 assumes a wide range of aspect ratios. These facts make mathematical and quantitative analysis especially delicate. Second, the spatial partitioning of the CurveletG1 transform uses overlapping windows to avoid blocking effects. This leads to an increase of the redundancy of the DCTG1. The computational cost of the DCTG1 algorithm may also be a limitation for large-scale data, especially if the FSS-based DRT implementation is used. In contrast, the second generation curvelets (CurveletG2) [15,20] exhibit a much simpler and natural indexing structure with three parameters: scale, orientation (angle) and location, hence simplifying mathematical analysis. The CurveletG2 transform also implements a tight frame expansion [20] and has a much lower redundancy. Unlike the DCTG1, the discrete CurveletG2 implementation does not use ridgelets, yielding a faster algorithm [15,20].

Second Generation Curvelets Construction

Continuous Coronization The second generation curvelets are defined at scale $2^{-j}$, orientation $\theta_l$ and position $x_k^{(j,l)} = R_{\theta_l}^{-1}(2^{-j} k_1, 2^{-j/2} k_2)$ by translation and rotation of a mother curvelet $\varphi_j$ as


Curvelets and Ridgelets, Figure 12 a Continuous curvelet frequency tiling. The dark gray area represents a wedge obtained as the product of the radial window (annulus shown in gray) and the angular window (light gray). b The Cartesian grid in space associated to the construction in a, whose spacing also obeys the parabolic scaling by duality. c Discrete curvelet frequency tiling. The window $\hat{u}_{j,l}$ isolates the frequency near a trapezoidal wedge such as the one shown in dark gray. d The wrapping transformation. The dashed line shows the same trapezoidal wedge as in c. The parallelogram contains this wedge and hence the support of the curvelet. After periodization, the wrapped Fourier samples can be collected in the rectangle centered at the origin

$\varphi_{j,l,k}(x) = \varphi_j\big(R_{\theta_l}(x - x_k^{(j,l)})\big) \,, \quad (6)$

where $R_{\theta_l}$ is the rotation by $\theta_l$ radians. $\theta_l$ is the equispaced sequence of rotation angles $\theta_l = 2\pi \cdot 2^{-\lfloor j/2 \rfloor} l$, with integer $l$ such that $0 \le \theta_l < 2\pi$ (note that the number of orientations varies as $1/\sqrt{\mathrm{scale}}$). $k = (k_1, k_2) \in \mathbb{Z}^2$ is the sequence of translation parameters. The waveform $\varphi_j$ is defined by means of its Fourier transform $\hat{\varphi}_j(\nu)$, written in polar coordinates in the Fourier domain:

$\hat{\varphi}_j(r, \theta) = 2^{-3j/4}\, \hat{w}(2^{-j} r)\, \hat{v}\!\left(\frac{2^{\lfloor j/2 \rfloor}\, \theta}{2\pi}\right) . \quad (7)$

The support of $\hat{\varphi}_j$ is a polar parabolic wedge defined by the support of $\hat{w}$ and $\hat{v}$, the radial and angular windows (both smooth, nonnegative and real-valued), applied with scale-dependent window widths in each direction. $\hat{w}$ and $\hat{v}$ must also satisfy the partition of unity property [15]. See the frequency tiling in Fig. 12a.
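Windows of this kind can be checked numerically. The sketch below builds bandpass windows whose squares telescope to a partition of unity, in the difference-of-squared-low-pass form used for the Cartesian coronization; the Gaussian low-pass is an illustrative stand-in for the Meyer-type windows of [15]:

```python
# Sketch of scale-dependent bandpass windows satisfying a partition of
# unity: w_j^2 = H_{j+1}^2 - H_j^2, so the squared windows telescope.
# The Gaussian low-pass h_hat is an illustrative choice, not the one
# used in the curvelet papers.
import numpy as np

def h_hat(nu):
    """Illustrative smooth, nonincreasing 1D low-pass window."""
    return np.exp(-nu ** 2)

def H(j, nu1, nu2):
    """Separable 2D low-pass window at dyadic scale j."""
    return h_hat(2.0 ** -j * nu1) * h_hat(2.0 ** -j * nu2)

def w_hat(j, nu1, nu2):
    """Bandpass window: sqrt(H_{j+1}^2 - H_j^2) (clamped for safety)."""
    return np.sqrt(np.maximum(H(j + 1, nu1, nu2) ** 2
                              - H(j, nu1, nu2) ** 2, 0.0))
```

Summing $H_0^2$ and the squared bandpass windows over many scales telescopes to $H_J^2$, which tends to 1 on any bounded frequency region.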


In the continuous frequency domain, the CurveletG2 coefficients of data $f(x)$ are defined as the inner product

$c_{j,l,k} := \langle f, \varphi_{j,l,k} \rangle = \int_{\mathbb{R}^2} \hat{f}(\nu)\, \hat{\varphi}_j(R_{\theta_l}\nu)\, e^{i x_k^{(j,l)} \cdot \nu}\, d\nu \,. \quad (8)$

This construction implies a few properties: (i) the CurveletG2 defines a tight frame of $L^2(\mathbb{R}^2)$; (ii) the effective length and width of these curvelets obey the parabolic scaling relation (width $= 2^{-j}$) $=$ (length $= 2^{-j/2}$)$^2$; (iii) the curvelets exhibit an oscillating behavior in the direction perpendicular to their orientation. Curvelets as just constructed are complex-valued. It is easy to obtain real-valued curvelets by working on the symmetrized version $\hat{\varphi}_j(r, \theta) + \hat{\varphi}_j(r, \theta + \pi)$.

Discrete Coronization The discrete transform takes as input data defined on a Cartesian grid and outputs a collection of coefficients. The continuous-space definition of the CurveletG2 uses coronae and rotations that are not especially adapted to Cartesian arrays. It is then convenient to replace these concepts by their Cartesian counterparts, that is, concentric squares (instead of concentric circles) and shears (instead of rotations); see Fig. 12c.

The Cartesian equivalent to the radial window $\hat{w}_j(\nu) = \hat{w}(2^{-j}\nu)$ would be a bandpass frequency-localized window, which can be derived from the difference of separable low-pass windows $H_j(\nu) = \hat{h}(2^{-j}\nu_1)\,\hat{h}(2^{-j}\nu_2)$ ($h$ is a 1D low-pass filter):

$\hat{w}_j(\nu) = \sqrt{H_{j+1}^2(\nu) - H_j^2(\nu)} \quad \forall j \ge 0 \,, \qquad \text{and} \quad \hat{w}_0(\nu) = \hat{h}(\nu_1)\,\hat{h}(\nu_2) \,.$

Another possible choice is to select these windows inspired by the construction of Meyer wavelets [20,57]. See [15] for more details about the construction of the Cartesian $\hat{w}_j$'s.

Let us now examine the angular localization. Each Cartesian corona has four quadrants: East, North, West and South. Each quadrant is separated into $2^{\lfloor j/2 \rfloor}$ orientations (wedges) with the same areas. Take for example the East quadrant ($-\pi/4 \le \theta_l < \pi/4$). For the West quadrant, we would proceed by symmetry around the origin, and for the North and South quadrants by exchanging the roles of $\nu_1$ and $\nu_2$. Define the angular window for the $l$th direction as

$\hat{v}_{j,l}(\nu) = \hat{v}\!\left(2^{\lfloor j/2 \rfloor}\, \frac{\nu_2 - \nu_1 \tan\theta_l}{\nu_1}\right) , \quad (9)$

with the sequence of equispaced slopes (and not angles) $\tan\theta_l = 2^{-\lfloor j/2 \rfloor}\, l$, for $l = -2^{\lfloor j/2 \rfloor}, \ldots, 2^{\lfloor j/2 \rfloor} - 1$. We can now define the window which is the Cartesian analog of $\hat{\varphi}_j$ above,

$\hat{u}_{j,l}(\nu) = \hat{w}_j(\nu)\,\hat{v}_{j,l}(\nu) = \hat{w}_j(\nu)\,\hat{v}_{j,0}(S_{\theta_l}\nu) \,, \quad (10)$

where $S_{\theta_l}$ is the shear matrix. From this definition, it can be seen that $\hat{u}_{j,l}$ is supported near the trapezoidal wedge $\{\nu = (\nu_1, \nu_2) \mid 2^j \le \nu_1 \le 2^{j+1},\ -2^{-j/2} \le \nu_2/\nu_1 - \tan\theta_l \le 2^{-j/2}\}$. The collection of $\hat{u}_{j,l}(\nu)$ gives rise to the frequency tiling shown in Fig. 12c. From $\hat{u}_{j,l}(\nu)$, the digital CurveletG2 construction suggests Cartesian curvelets that are translated and sheared versions of a mother Cartesian curvelet $\hat{\varphi}_j^D(\nu) = \hat{u}_{j,0}(\nu)$, i.e. $\varphi_{j,l,k}^D(x) = 2^{3j/4}\, \varphi_j^D(S_{\theta_l}^T x - m)$, where $m = (k_1 2^{-j}, k_2 2^{-j/2})$.

Digital Implementation The goal here is to find a digital implementation of the Second Generation Discrete Curvelet Transform (DCTG2), whose coefficients are now given by

$c_{j,l,k} := \langle f, \varphi_{j,l,k}^D \rangle = \int_{\mathbb{R}^2} \hat{f}(\nu)\, \hat{\varphi}_j^D(S_{\theta_l}^{-1}\nu)\, e^{i S_{\theta_l}^{-T} m \cdot \nu}\, d\nu \,. \quad (11)$

To evaluate this formula with discrete data, one may think of (i) using the 2D FFT to get $\hat{f}$, (ii) forming the windowed frequency data $\hat{f}\hat{u}_{j,l}$, and (iii) applying the inverse Fourier transform. But this necessitates evaluating the FFT at the sheared grid $S_{\theta_l}^T m$, for which the classical FFT algorithm is not valid. Two implementations were then proposed [15], essentially differing in their way of handling the grid:
• A tilted grid mostly aligned with the axes of $\hat{u}_{j,l}(\nu)$, which leads to the Unequispaced FFT (USFFT)-based DCTG2. This implementation uses a nonstandard interpolation. Furthermore, the inverse transform uses conjugate-gradient iteration to invert the interpolation step. This has the drawback of a higher computational burden compared to the wrapping-based implementation that we discuss hereafter. We will not elaborate more about the USFFT implementation, as we never use it in practice. The interested reader may refer to [15] for further details and analysis.
• A grid aligned with the input Cartesian grid, which leads to the wrapping-based DCTG2.
The wrapping-based DCTG2 makes a simpler choice of the spatial grid to translate the curvelets. The curvelet coefficients are essentially the same as in (11), except that $S_{\theta_l}^T m$ is replaced by $m$ with values on a rectangular grid. But again, a difficulty arises because the window $\hat{u}_{j,l}$ does not fit in a rectangle of size $2^j \times 2^{j/2}$ to which an inverse


Require: Input $n \times n$ image $f[i_1, i_2]$, coarsest decomposition scale, curvelets or wavelets at the finest scale.
1: Apply the 2D FFT and obtain Fourier samples $\hat{f}[i_1, i_2]$.
2: for each scale $j$ and angle $l$ do
3:  Form the product $\hat{f}[i_1, i_2]\, \hat{u}_{j,l}[i_1, i_2]$.
4:  Wrap this product around the origin.
5:  Apply the inverse 2D FFT to the wrapped data to get discrete DCTG2 coefficients.
6: end for
Curvelets and Ridgelets, Algorithm 2 DCTG2 via wrapping

FFT could be applied. The wrapping trick consists in periodizing the windowed frequency data $\hat{f}\hat{u}_{j,l}$, and reindexing the samples array by wrapping around a $2^j \times 2^{j/2}$ rectangle centered at the origin; see Fig. 12d to get the gist of the wrapping idea. The wrapping-based DCTG2 algorithm can be summarized as in Algorithm 2.

The DCTG2 implementation can assign either wavelets or curvelets at the finest scale. In the CurveLab toolbox [75], the default choice is set to wavelets at the finest scale, but this can be easily modified directly in the code. We would like to apologize to the expert reader, as many technical details are (deliberately) missing here on the CurveletG2 construction, for instance the low-pass coarse component, window overlapping, and windows over junctions between quadrants. This paper is intended to give an overview of these recent multiscale transforms, and the genuinely interested reader may refer to the original papers of Candès, Donoho, Starck and co-workers for further details (see bibliography).

The computational complexity of the wrapping-based DCTG2 analysis and reconstruction algorithms is that of the FFT, $O(n^2 \log n)$, and in practice, the computation time is that of 6 to 10 2D FFTs [15]. This is a faster algorithm compared to the DCTG1. The fast DCTG2 algorithm has helped to make the use of the curvelet transform more attractive in many application fields (see Sect. "Stylized Applications" for some of them). The DCTG2, as implemented in the CurveLab toolbox [75], has reasonable redundancy, at most 7.8 (much higher in 3D) if curvelets are used at the finest scale. This redundancy can even be reduced down to 4 (and 8 in 3D) if we replace in this implementation the Meyer wavelet construction, which introduces a redundancy factor of 4, by another wavelet pyramidal construction, similar to the one presented in Sect. "One-Dimensional Wavelet Transform", which has a redundancy less than 2 in any dimension. Our experiments have shown that this modification does not alter the results in denoising experiments. The DCTG2 redundancy is anyway much smaller than that of the DCTG1, which is $16J + 1$. As stated earlier, the DCTG2 coefficients are complex-valued, but a real-valued DCTG2 with the same redundancy factor can be easily obtained by properly combining coefficients at orientations $\theta_l$ and $\theta_l + \pi$. The DCTG2 can be extended to higher dimensions [21]. In the same vein as wavelets on the interval [53], the DCTG2 has been recently adapted to handle image boundaries by mirror extension instead of periodization [25]. The latter modification can have immediate implications in image processing applications where the contrast difference at opposite image boundaries may be an issue (see e.g. the denoising experiment discussion reported in Sect. "Stylized Applications").

We would like to make a connection with other multiscale directional transforms directly linked to curvelets. The contourlet tight frame of Do and Vetterli [27] implements the CurveletG2 idea directly on a discrete grid using a perfect reconstruction filter bank procedure. In [51], the authors proposed a modification of the contourlets with a directional filter bank that provides a frequency partitioning close to that of the curvelets, but with no redundancy. Durand [35] recently introduced families of non-adaptive directional wavelets with various frequency tilings, including that of curvelets. Such families are non-redundant, form orthonormal bases for $L^2(\mathbb{R}^2)$, and have an implementation derived from a single nonseparable filter bank structure with nonuniform sampling.
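The wrapping step (step 4 of Algorithm 2) can be sketched as an index-folding operation: frequency samples lying in a parallelogram containing the wedge are accumulated modulo the dimensions of a small rectangle centered at the origin. This is an illustrative helper with hypothetical names, not the CurveLab implementation:

```python
# Sketch of the wrapping trick: fold windowed frequency samples into a
# small rectangle by periodization (indices taken modulo the rectangle
# size). Illustrative helper, not the CurveLab code.
import numpy as np

def wrap_to_rect(indices, values, rows, cols):
    """Fold (r, c) frequency samples into a rows x cols array, modulo size."""
    out = np.zeros((rows, cols), dtype=complex)
    for (r, c), v in zip(indices, values):
        out[r % rows, c % cols] += v
    return out
```

Applying an inverse 2D FFT to the wrapped rectangle then yields the curvelet coefficients for that wedge, as in step 5 of Algorithm 2.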
Sparse Representation by Second Generation Curvelets It has been shown by Candès and Donoho [20] that, with the CurveletG2 tight frame construction, the $M$-term nonlinear approximation error of $C^2$ images except at discontinuities along $C^2$ curves obeys

$\| f - f_M \|^2 \le C M^{-2} (\log M)^3 \,.$

This is an asymptotically optimal convergence rate (up to the $(\log M)^3$ factor), and it holds uniformly over the $C^2$-$C^2$ class of functions. This is a remarkable result, since the CurveletG2 representation is non-adaptive. However, the simplicity due to the non-adaptivity of curvelets has a cost: curvelet approximations lose their near-optimal properties when the image is composed of edges which are not exactly $C^2$. Additionally, if the edges are $C^\alpha$-regular with $\alpha > 2$, then the curvelet convergence rate exponent


Curvelets and Ridgelets, Figure 13 An example of a second generation real curvelet. Left: curvelet in the spatial domain. Right: its Fourier transform

remains 2. Other adaptive geometric representations, such as bandlets, are specifically designed to reach the optimal decay rate $O(M^{-\alpha})$ [49,62].

Stylized Applications

Denoising

Elongated Feature Recovery The ability of ridgelets to sparsely represent piecewise smooth images away from discontinuities along lines has an immediate implication on statistical estimation. Consider a piecewise smooth image $f$ away from line singularities embedded in an additive white noise of standard deviation $\sigma$. The ridgelet-based thresholding estimator is nearly optimal for recovering such functions, with a mean-square error (MSE) decay rate almost as good as the minimax rate [12]. To illustrate these theoretical facts, we simulate a vertical band embedded in white Gaussian noise with large $\sigma$. Figure 14 (top left) represents such a noisy image. The parameters are as follows: the pixel width of the band is 20 and the signal-to-noise ratio (SNR) is set to 0.1. Note that it is not possible to distinguish the band by eye. The wavelet transform (undecimated wavelet transform) is also incapable of detecting the presence of this object; roughly speaking, wavelet coefficients correspond to weighted averages over approximately isotropic neighborhoods (at different scales), and those wavelets clearly do not correlate very well with the very elongated structure (pattern) of the object to be detected.

Curve Recovery Consider now the problem of recovering a piecewise $C^2$ function $f$ apart from a discontinuity along a $C^2$ edge. Again, a simple strategy based on thresholding curvelet tight frame coefficients yields an estimator that achieves an MSE almost of the order $O(\sigma^{4/3})$ uniformly over the $C^2$-$C^2$ class of functions [19]. This is the optimal rate of convergence, as the minimax rate for that class scales as $\sigma^{4/3}$ [19]. Comparatively, wavelet thresholding methods only achieve an MSE of order $O(\sigma)$ and no better. We also note that the statistical optimality of the curvelet thresholding extends to a large class of ill-posed linear inverse problems [19].

In the experiment of Fig. 15, we have added white Gaussian noise to "War and Peace", a drawing from Picasso which contains many curved features. Figure 15 bottom left and right show respectively the restored images by the undecimated wavelet transform and the DCTG1. Curves are more sharply recovered with the DCTG1.

In a second experiment, we compared the denoising performance of several digital implementations of the curvelet transform, namely the DCTG1 with the rectopolar DRT, the DCTG1 with the FSS-based DRT, and the wrapping-based DCTG2. The results are shown in Fig. 16, where the original 512 × 512 Peppers image was corrupted by a Gaussian white noise with $\sigma = 20$ (PSNR = 22 dB). Although the FSS-based DRT is more accurate than the rectopolar DRT, the denoising improvement of the former (PSNR = 31.31 dB) is only 0.18 dB better than the latter (PSNR = 31.13 dB) on Peppers. The difference is almost indistinguishable by eye, but the computation time is 20 times higher for the DCTG1 with the FSS DRT. Consequently, it appears that there is little benefit in using the FSS DRT in the DCTG1 for restoration applications. Denoising using the DCTG2 with the wrapping implementation gives a PSNR of 30.62 dB, which is 0.7 dB less than the DCTG1. But this is the price to pay for a lower redundancy and a much faster transform algorithm. Moreover, the DCTG2 exhibits some artifacts which look like 'curvelet ghosts'. This is a consequence of the fact that the DCTG2 makes central use of the FFT, which has the side effect of treating the image boundaries by periodization.

Curvelets and Ridgelets, Figure 14 Original image containing a vertical band embedded in white noise with relatively large amplitude (left). Denoised image using the undecimated wavelet transform (middle). Denoised image using the DRT based on the rectopolar Radon transform (right)

Curvelets and Ridgelets, Figure 15 The Picasso picture War and Peace (top left), the same image contaminated with a Gaussian white noise (top right). The restored images using the undecimated wavelet transform (bottom left) and the DCTG1 (bottom right)

Curvelets and Ridgelets, Figure 16 Comparison of the denoising performance of several digital implementations of the curvelet transform. Top left: original image. Top right: noisy image, $\sigma = 20$. Bottom left: denoised with DCTG1 using the rectopolar DRT. Bottom middle: denoised with DCTG1 using the FSS DRT. Bottom right: denoised with DCTG2 using the wrapping implementation

Linear Inverse Problems Many problems in image processing can be cast as inverting the linear degradation equation $y = Hf + \varepsilon$, where $f$ is the image to recover, $y$ the observed image, and $\varepsilon$ a white noise of variance $\sigma^2 < +\infty$. The linear mapping $H$ is generally ill-behaved, which entails ill-posedness of the inverse problem. Typical examples of linear inverse problems include image deconvolution, where $H$ is the convolution operator, or image inpainting (recovery of missing data), where $H$ is a binary mask. In the last few years, some authors have attacked the problem of solving linear inverse problems under the umbrella of sparse representations and variational formulations, e.g. for deconvolution [19,24,38,41,74] and inpainting [37,39]. Typically, in this setting, the recovery of $f$ is stated as an optimization problem with a sparsity-promoting regularization on the representation coefficients of $f$, e.g. its wavelet or curvelet coefficients. See [37,38,39,74] for more details.

In Fig. 17, first row, we depict an example of deconvolution on Barbara using the algorithm described in [38] with the DCTG2 curvelet transform. The original, degraded (blurred with an exponential kernel and noisy) and restored images are respectively shown on the left, middle and right. The second row gives an example of inpainting on the Claudia image using the DCTG2 with 50% missing pixels.

Contrast Enhancement The curvelet transform has been successfully applied to image contrast enhancement by Starck et al. [73]. As the curvelet transform captures edges in an image efficiently, it is a good candidate for multiscale edge enhancement. The idea is to modify the curvelet coefficients of the in-

Curvelets and Ridgelets

Curvelets and Ridgelets, Figure 17 Illustration of the use of curvelets (DCTG2 transform) when solving two typical linear inverse problems: deconvolution (first row), and inpainting (second row). First row: deconvolution of Barbara image, original (left), blurred and noisy (middle), restored (right). Second row: inpainting of Claudia image, original (left), masked image (middle), inpainted (right)

put image in order to enhance its edges. The curvelet coefficients are typically modified according to the function displayed in the left plot of Fig. 18. Basically, this plot says that the input coefficients are kept intact (or even shrunk) if they have either low (e. g. below the noise level) or high (strongest edges) values. Intermediate curvelet coefficient values which correspond to the faintest edges are amplified. An example of curvelet-based image enhancement on Saturn image is given in Fig. 18. Morphological Component Separation The idea to morphologically decompose a signal/image into its building blocks is an important problem in signal and image processing. Successful separation of a signal content has a key role in the ability to effectively analyze it, enhance it, compress it, synthesize it, and more. Various approaches have been proposed to tackle this problem. The Morphological Component Analysis method (MCA) [70,71] is a method which allows us to decompose

a single signal into two or more layers, each layer containing only one kind of feature of the input signal or image. The separation can be achieved when each kind of feature is sparsely represented by a given transform in the dictionary of transforms. Furthermore, when a transform sparsely represents one part of the signal/image, it yields a non-sparse representation of the other content types. For instance, lines and Gaussians in an image can be separated using the ridgelet transform and the wavelet transform. Locally oscillating textures can be separated from the piecewise smooth content using the local discrete cosine transform and the curvelet transform [70]. A full description of MCA is given in [70]. The first row of Fig. 19 illustrates a separation result when the input image contains only lines and isotropic Gaussians. Two transforms were amalgamated in the dictionary, namely the à trous WT2D and the DRT. The left, middle and right images in the first row of Fig. 19 represent, respectively, the original image, the reconstructed component from the à trous wavelet coefficients, and


Curvelets and Ridgelets, Figure 18 Curvelet contrast enhancement. Left: enhanced vs original curvelet coefficient. Middle: original Saturn image. Right: result of curvelet-based contrast enhancement
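The piecewise coefficient modification just described can be sketched in code. The gain below is a simplified, hypothetical stand-in for the enhancement function of [73]: the parameters `noise_level`, `c_max` and the exponent `p` are illustrative and not the article's.

```python
import numpy as np

def enhancement_gain(c, noise_level, c_max, p=0.5):
    """Toy multiscale enhancement gain (NOT the exact curve of Starck et al.):
    coefficients below the noise level and above c_max are left intact,
    while intermediate ("faint edge") coefficients are amplified."""
    a = np.abs(c)
    g = np.ones_like(a)
    mid = (a > noise_level) & (a < c_max)
    g[mid] = (c_max / a[mid]) ** p   # gain decays smoothly down to 1 as |c| grows
    return g

# apply the gain to (toy) coefficient magnitudes, band by band
coeffs = np.array([0.5, 3.0, 10.0, 120.0])
enhanced = coeffs * enhancement_gain(coeffs, noise_level=1.0, c_max=100.0)
```

In an actual enhancement pipeline this gain would be applied to the coefficients of each curvelet band before reconstructing the image.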

Curvelets and Ridgelets, Figure 19 First row, from left to right: original image containing lines and Gaussians, separated Gaussian component (wavelets), separated line component (ridgelets). Second row, from left to right: original Barbara image, reconstructed local discrete cosine transform part (texture), and piecewise smooth part (curvelets)


the reconstructed layer from the ridgelet coefficients. The second row of Fig. 19 shows, respectively, the Barbara image, the reconstructed local cosine (textured) component, and the reconstructed curvelet component. In the Barbara example, the dictionary contained the local discrete cosine and DCTG2 transforms.

Future Directions

In this paper, we gave an overview of two important geometrical multiscale transforms, namely ridgelets and curvelets. We illustrated their potential applicability on a wide range of image processing problems. Although these transforms are not adaptive, they are strikingly effective, both theoretically and practically, on piecewise smooth images away from smooth contours. However, in image processing, the geometry of the image and its regularity are generally not known in advance. Therefore, to reach higher sparsity levels, it is necessary to find representations that can adapt themselves to the geometrical content of the image. For instance, geometric transforms such as wedgelets [30] or bandlets [49,50,62] allow one to define an adapted multiscale geometry. These transforms perform a non-linear search for an optimal representation. They offer geometrical adaptivity together with stable algorithms. Recently, Mallat [54] proposed a more biologically inspired procedure named the grouplet transform, which defines a multiscale association field by grouping together pairs of wavelet coefficients.

In imaging science and technology, there is a remarkable proliferation of new data types. Besides the traditional data arrays defined on uniformly sampled Cartesian grids with scalar-valued samples, many novel imaging modalities involve data arrays that are either (or both):

- acquired on specific "exotic" grids, such as in astronomy, medicine and physics. Examples include data defined on spherical manifolds, as in astronomical imaging, catadioptric optical imaging where a sensor overlooks a paraboloidal mirror, etc.
- or with samples taking values in a manifold. Examples include vector fields such as those of polarization data that may arise in astronomy, rigid motions (the special Euclidean group), positive-definite matrices that are encountered in earth science or medical imaging, etc.

The challenge faced with these data is to find multiscale representations which are sufficiently flexible to apply to many data types and yet are defined on the proper grid and respect the manifold structure. An extension of wavelets, curvelets and ridgelets to scalar-valued data on the sphere has been proposed recently by [66]. Construction of wavelets for scalar-valued data defined on graphs and some manifolds was proposed by [22]. The authors in [63] (see references therein) describe multiscale representations for data observed on equispaced grids and taking values in manifolds such as the sphere, the special orthogonal group, the positive-definite matrices, and the Grassmannian manifolds. Nonetheless, many challenging questions are still open in this field: extending the idea of multiscale geometrical representations such as curvelets or ridgelets to manifold-valued data, finding multiscale geometrical representations which are sufficiently general for a wide class of grids, etc. We believe that these directions are among the hottest topics in this field.

Most of the transforms discussed in this paper can efficiently handle smooth or piecewise smooth functions. But sparsely representing textures remains an important open question, mainly because there is no consensus on how to define a texture, although Julesz [45] stated simple axioms about the probabilistic characterization of textures. It has been known for some time now that some transforms can sometimes enjoy reasonably sparse expansions of certain textures, e.g. oscillatory textures in bases such as local discrete cosines [70], brushlets [56], Gabor [53], and complex wavelets [46]. Gabor and wavelets are widely used in the image processing community for texture analysis. But little is known about the decay of Gabor and wavelet coefficients of "texture". If one is interested in synthesis as well as analysis, the Gabor representation may be useless (at least in its strict definition). Restricting themselves to locally oscillating patterns, Demanet and Ying have recently proposed a wavelet-packet construction named WaveAtoms [77]. They showed that WaveAtoms provide an optimally sparse representation of warped oscillatory textures.

Another line of active research in sparse multiscale transforms was initiated by the seminal work of Olshausen and Field [59]. Following in their footsteps, one can push the idea of adaptive sparse representation one step further and require that the dictionary not be fixed but rather optimized to sparsify a set of exemplar signals/images, i.e. patches. Such a learning problem corresponds to finding a sparse matrix factorization, and several algorithms have been proposed for this task in the literature; see [1] for a good overview. Explicit structural constraints such as translation invariance can also be enforced on the learned dictionary [5,58]. These learning-based sparse representations have shown a great improvement over fixed (and even adapted) transforms for a variety of image processing tasks such as denoising and compression [8,36,52], linear inverse problems (image decomposition and inpainting) [61], and texture synthesis [60].
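As a concrete, minimal illustration of the sparse-coding step that underlies such dictionary-based methods, the following sketch solves min_x (1/2)||y - Dx||² + λ||x||₁ by iterative soft thresholding (ISTA). This is a generic textbook scheme with illustrative parameter names, not the algorithm of any particular reference above.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft thresholding: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam=0.1, n_iter=300):
    """Minimal ISTA sketch for sparse coding against a fixed dictionary D:
    minimize 0.5*||y - D x||^2 + lam*||x||_1.
    (Illustrative; real dictionary-learning codes alternate this step
    with a dictionary update.)"""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)           # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

With an overcomplete D and a signal y synthesized from a few atoms, the iterate x becomes a sparse approximate representation of y; monotone descent of the objective is guaranteed by the 1/L step size.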


Bibliography

1. Aharon M, Elad M, Bruckstein AM (2006) The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322
2. Arivazhagan S, Ganesan L, Kumar TS (2006) Texture classification using curvelet statistical and co-occurrence features. In: Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), Hong Kong, vol 2, pp 938–941
3. Averbuch A, Coifman RR, Donoho DL, Israeli M, Waldén J (2001) Fast Slant Stack: A notion of Radon transform for data in a Cartesian grid which is rapidly computible, algebraically exact, geometrically faithful and invertible. SIAM J Sci Comput
4. BeamLab 200 (2003) http://www-stat.stanford.edu/beamlab/
5. Blumensath T, Davies M (2006) Sparse and shift-invariant representations of music. IEEE Trans Speech Audio Process 14(1):50–57
6. Bobin J, Moudden Y, Starck JL, Elad M (2006) Morphological diversity and source separation. IEEE Signal Process Lett 13(7):409–412
7. Bobin J, Starck JL, Fadili J, Moudden Y (2007) Sparsity, morphological diversity and blind source separation. IEEE Trans Image Process 16(11):2662–2674
8. Bryt O, Elad M (2008) Compression of facial images using the K-SVD algorithm. J Vis Commun Image Represent 19(4):270–283
9. Candès EJ (1998) Ridgelets: theory and applications. PhD thesis, Stanford University
10. Candès EJ (1999) Harmonic analysis of neural networks. Appl Comput Harmon Anal 6:197–218
11. Candès EJ (1999) Ridgelets and the representation of mutilated Sobolev functions. SIAM J Math Anal 33:197–218
12. Candès EJ (1999) Ridgelets: Estimating with ridge functions. Ann Statist 31:1561–1599
13. Candès EJ, Demanet L (2002) Curvelets and Fourier integral operators. C R Acad Sci Paris, Série I 336:395–398
14. Candès EJ, Demanet L (2005) The curvelet representation of wave propagators is optimally sparse. Comm Pure Appl Math 58(11):1472–1528
15.
Candès EJ, Demanet L, Donoho DL, Ying L (2006) Fast discrete curvelet transforms. SIAM Multiscale Model Simul 5(3):861–899
16. Candès EJ, Donoho DL (1999) Ridgelets: the key to high-dimensional intermittency? Philos Trans R Soc Lond A 357:2495–2509
17. Candès EJ, Donoho DL (1999) Curvelets – a surprisingly effective nonadaptive representation for objects with edges. In: Cohen A, Rabut C, Schumaker LL (eds) Curve and Surface Fitting: Saint-Malo. Vanderbilt University Press, Nashville
18. Candès EJ, Donoho DL (2000) Curvelets and curvilinear integrals. J Approx Theory 113:59–90
19. Candès EJ, Donoho DL (2000) Recovering edges in ill-posed inverse problems: Optimality of curvelet frames. Ann Stat 30:784–842
20. Candès EJ, Donoho DL (2002) New tight frames of curvelets and optimal representations of objects with piecewise-C2 singularities. Comm Pure Appl Math 57:219–266
21. Candès E, Ying L, Demanet L (2005) 3D discrete curvelet transform. In: Wavelets XI, San Diego. Proc SPIE, vol 5914, 591413; doi:10.1117/12.616205
22. Coifman RR, Maggioni M (2006) Diffusion wavelets. Appl Comput Harmon Anal 21:53–94

23. Daubechies I (1992) Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia
24. Daubechies I, Defrise M, De Mol C (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm Pure Appl Math 57:1413–1541
25. Demanet L, Ying L (2007) Curvelets and wave atoms for mirror-extended images. In: Wavelets XII, San Diego. Proc SPIE, vol 6701, 67010J; doi:10.1117/12.733257
26. Do MN, Vetterli M (2003) The finite ridgelet transform for image representation. IEEE Trans Image Process 12(1):16–28
27. Do MN, Vetterli M (2003) Contourlets. In: Stoeckler J, Welland GV (eds) Beyond Wavelets. Academic Press, San Diego
28. Donoho DL (1997) Fast ridgelet transforms in dimension 2. Stanford University, Stanford
29. Donoho DL (1998) Digital ridgelet transform via rectopolar coordinate transform. Technical Report, Stanford University
30. Donoho DL (1999) Wedgelets: nearly-minimax estimation of edges. Ann Stat 27:859–897
31. Donoho DL (2000) Orthonormal ridgelets and linear singularities. SIAM J Math Anal 31(5):1062–1099
32. Donoho DL, Duncan MR (2000) Digital curvelet transform: strategy, implementation and experiments. In: Szu HH, Vetterli M, Campbell W, Buss JR (eds) Proc Aerosense, Wavelet Applications VII, vol 4056. SPIE, pp 12–29
33. Donoho DL, Flesia AG (2002) Digital ridgelet transform based on true ridge functions. In: Schmeidler J, Welland GV (eds) Beyond Wavelets. Academic Press, San Diego
34. Douma H, de Hoop MV (2007) Leading-order seismic imaging using curvelets. Geophysics 72(6)
35. Durand S (2007) M-band filtering and nonredundant directional wavelets. Appl Comput Harmon Anal 22(1):124–139
36. Elad M, Aharon M (2006) Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 15(12):3736–3745
37. Elad M, Starck JL, Donoho DL, Querre P (2006) Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA).
Appl Comput Harmon Anal 19:340–358
38. Fadili MJ, Starck JL (2006) Sparse representation-based image deconvolution by iterative thresholding. In: Murtagh F, Starck JL (eds) Astronomical Data Analysis IV, Marseille
39. Fadili MJ, Starck JL, Murtagh F (2006) Inpainting and zooming using sparse representations. Comput J
40. Fernandes F, Wakin M, Baraniuk R (2004) Non-redundant, linear-phase, semi-orthogonal, directional complex wavelets. In: IEEE Conf on Acoustics, Speech and Signal Processing, Montreal, Canada, vol 2, pp 953–956
41. Figueiredo M, Nowak R (2003) An EM algorithm for wavelet-based image restoration. IEEE Trans Image Process 12(8):906–916
42. Hennenfent G, Herrmann FJ (2006) Seismic denoising with nonuniformly sampled curvelets. IEEE Comput Sci Eng 8(3):16–25
43. Herrmann FJ, Moghaddam PP, Stolk CC (2008) Sparsity- and continuity-promoting seismic image recovery with curvelet frames. Appl Comput Harmon Anal 24(2):150–173
44. Jin J, Starck JL, Donoho DL, Aghanim N, Forni O (2005) Cosmological non-Gaussian signatures detection: Comparison of statistical tests. Eurasip J 15:2470–2485
45. Julesz B (1962) Visual pattern discrimination. IRE Trans Inform Theory 8(2):84–92


46. Kingsbury NG (1998) The dual-tree complex wavelet transform: a new efficient tool for image restoration and enhancement. In: European Signal Processing Conference, Rhodes, Greece, pp 319–322
47. Labate D, Lim W-Q, Kutyniok G, Weiss G (2005) Sparse multidimensional representation using shearlets. In: Wavelets XI, vol 5914. SPIE, San Diego, pp 254–262
48. Lambert P, Pires S, Ballot J, García RA, Starck JL, Turck-Chièze S (2006) Curvelet analysis of asteroseismic data. I. Method description and application to simulated sun-like stars. Astron Astrophys 454:1021–1027
49. Le Pennec E, Mallat S (2000) Image processing with geometrical wavelets. In: International Conference on Image Processing, Thessaloniki, Greece
50. Le Pennec E, Mallat S (2005) Bandelet image approximation and compression. SIAM Multiscale Model Simul 4(3):992–1039
51. Lu Y, Do MN (2003) CRISP-contourlets: A critically sampled directional multiresolution image representation. In: Wavelets X, San Diego. Proc SPIE, vol 5207, pp 655–665
52. Mairal J, Elad M, Sapiro G (2008) Sparse representation for color image restoration. IEEE Trans Image Process 17(1):53–69
53. Mallat S (1998) A Wavelet Tour of Signal Processing. Academic Press, London
54. Mallat S (2008) Geometrical grouplets. Appl Comput Harmon Anal
55. Matus F, Flusser J (1993) Image representations via a finite Radon transform. IEEE Trans Pattern Anal Mach Intell 15(10):996–1006
56. Meyer FG, Coifman RR (1997) Brushlets: a tool for directional image analysis and image compression. Appl Comput Harmon Anal 4:147–187
57. Meyer Y (1993) Wavelets: Algorithms and Applications. SIAM, Philadelphia
58. Olshausen BA (2000) Sparse coding of time-varying natural images. In: Int Conf Independent Component Analysis and Blind Source Separation (ICA), Barcelona, pp 603–608
59. Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive-field properties by learning a sparse code for natural images. Nature 381(6583):607–609
60.
Peyré G (2007) Non-negative sparse modeling of textures. In: SSVM. Lecture Notes in Computer Science. Springer, pp 628–639
61. Peyré G, Fadili MJ, Starck JL (2007) Learning adapted dictionaries for geometry and texture separation. In: Wavelets XII, San Diego. Proc SPIE, vol 6701, 67011T; doi:10.1117/12.731244
62. Peyré G, Mallat S (2007) A review of bandlet methods for geometrical image representation. Numer Algorithms 44(3):205–234
63. Rahman I, Drori I, Stodden VC, Donoho DL, Schröder P (2005) Multiscale representations for manifold-valued data. Multiscale Model Simul 4(4):1201–1232

64. Saevarsson B, Sveinsson J, Benediktsson J (2003) Speckle reduction of SAR images using adaptive curvelet domain. In: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS '03), vol 6, pp 4083–4085
65. Simoncelli EP, Freeman WT, Adelson EH, Heeger DJ (1992) Shiftable multi-scale transforms [or what's wrong with orthonormal wavelets]. IEEE Trans Inf Theory 38(2):587–607
66. Starck JL, Abrial P, Moudden Y, Nguyen M (2006) Wavelets, ridgelets and curvelets on the sphere. Astron Astrophys 446:1191–1204
67. Starck JL, Bijaoui A, Lopez B, Perrier C (1994) Image reconstruction by the wavelet transform applied to aperture synthesis. Astron Astrophys 283:349–360
68. Starck JL, Candès EJ, Donoho DL (2002) The curvelet transform for image denoising. IEEE Trans Image Process 11(6):131–141
69. Starck JL, Candès EJ, Donoho DL (2003) Astronomical image representation by the curvelet transform. Astron Astrophys 398:785–800
70. Starck JL, Elad M, Donoho DL (2004) Redundant multiscale transforms and their application for morphological component analysis. In: Hawkes P (ed) Advances in Imaging and Electron Physics, vol 132. Academic Press, San Diego, London, pp 288–348
71. Starck JL, Elad M, Donoho DL (2005) Image decomposition via the combination of sparse representation and a variational approach. IEEE Trans Image Process 14(10):1570–1582
72. Starck JL, Murtagh F, Bijaoui A (1998) Image Processing and Data Analysis: The Multiscale Approach. Cambridge University Press, Cambridge
73. Starck JL, Murtagh F, Candès E, Donoho DL (2003) Gray and color image contrast enhancement by the curvelet transform. IEEE Trans Image Process 12(6):706–717
74. Starck JL, Nguyen MK, Murtagh F (2003) Wavelets and curvelets for image deconvolution: a combined approach. Signal Process 83(10):2279–2283
75. The Curvelab Toolbox (2005) http://www.curvelet.org
76.
Velisavljevic V, Beferull-Lozano B, Vetterli M, Dragotti PL (2006) Directionlets: Anisotropic multi-directional representation with separable filtering. IEEE Trans Image Process 15(7):1916–1933
77. Ying L, Demanet L (2007) Wave atoms and sparsity of oscillatory patterns. Appl Comput Harmon Anal 23(3):368–387
78. Zhang Z, Huang W, Zhang J, Yu H, Lu Y (2006) Digital image watermark algorithm in the curvelet domain. In: Proceedings of the International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP '06), Pasadena, pp 105–108


Data and Dimensionality Reduction in Data Analysis and System Modeling

Witold Pedrycz 1,2
1 Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
2 Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Article Outline

Glossary
Definition of the Subject
Introduction
Information Granularity and Granular Computing
Data Reduction
Dimensionality Reduction
Co-joint Data and Dimensionality Reduction
Conclusions
Future Directions
Acknowledgments
Appendix: Particle Swarm Optimization (PSO)
Bibliography

Glossary

Reduction process A suite of activities leading to the reduction of available data and/or reduction of features.

Feature selection An algorithmic process in which a large set of features (attributes) is reduced by choosing a certain relatively small subset of them. The reduction is a combinatorial optimization task which is NP-complete; given this, it is quite often realized in a suboptimal way.

Feature transformation A process of transforming a highly dimensional feature space into a low-dimensional counterpart. These transformations are linear or nonlinear and can be guided by some optimization criterion. The commonly encountered methods utilize Principal Component Analysis (PCA), which is an example of a linear feature transformation.

Data reduction A way of converting a large data set into a representative subset. Typically, the data are grouped into clusters whose prototypes are representatives of the overall data set.

Curse of dimensionality A phenomenon of rapid, exponential increase of computing effort related to the dimensionality of the problem (say, the number of data or the number of features) which prevents us from achieving

an optimal solution. The curse of dimensionality leads to the construction of sub-optimal solutions.

Data mining A host of activities aimed at the discovery of easily interpretable and experimentally sound findings in huge data sets.

Biologically-inspired optimization An array of optimization techniques realizing searches in highly dimensional spaces where the search itself is guided by a collection of mechanisms (operators) inspired by biological search processes. Genetic algorithms, evolutionary methods, particle swarm optimization, and ant colonies are examples of biologically-inspired search techniques.

Definition of the Subject

Data and dimensionality reduction are fundamental pursuits of data analysis and system modeling. With the rapid growth of the sizes of data sets and the diversity of the data themselves, the use of some reduction mechanisms becomes a necessity. Data reduction is concerned with reducing the sizes of data sets in terms of the number of data points. This helps reveal an underlying structure in data by presenting a collection of groups present in the data. Given a number of groups which is very limited, clustering mechanisms become effective in terms of data reduction. Dimensionality reduction is aimed at reducing the number of attributes (features) of the data, which leads to a typically small subset of features or brings the data from a highly dimensional feature space to a new one of far lower dimensionality. A joint reduction process involves both data and feature reduction.

Introduction

In the information age, we are continuously flooded by enormous amounts of data that need to be stored, transmitted, processed and understood. We expect to make sense of data and find general relationships within them. On the basis of data, we anticipate constructing meaningful models (classifiers or predictors).
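To make the data-reduction idea concrete, here is a toy sketch (not from the article) in which a few prototypes obtained by plain k-means clustering stand in for a much larger data set; the data and parameter names are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means: summarize the N points of X by k prototypes.
    A minimal illustration of data reduction by clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assign each point to its nearest prototype
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each prototype to the mean of its group
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels
```

Running this on, say, a thousand two-dimensional points with a small k replaces the data set by k prototypes plus one label per point, which is exactly the kind of reduction discussed above.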
Being faced with this ongoing quest, there is intensive research along this line with a clear-cut objective: to design effective and computationally feasible algorithms to combat the curse of dimensionality that comes associated with the data. Data mining has become one of the dominant developments in data analysis. The term intelligent data analysis is another notion which provides us with a general framework supporting thorough and user-oriented processes of data analysis in which reduction processes play a pivotal role. Interestingly enough, the problem of dimensionality reduction and complexity management is by no means


Data and Dimensionality Reduction in Data Analysis and System Modeling, Figure 1 A general roadmap of reduction processes; note various ways of dealing with data and dimensionality along with a way of providing some evaluation mechanisms (performance indexes)

a new endeavor. It has been around for a number of decades, almost from the very inception of computer science. One may allude here to pattern recognition and data visualization as two areas in which we were faced with inherent data dimensionality. This has led to a number of techniques which are by now regarded as classic and are used quite intensively. There have been a number of approaches deeply rooted in classic statistical analysis. The ideas of principal component analysis, Fisher analysis and the like are techniques of paramount relevance. What has changed quite profoundly over the decades is the magnitude of the problem itself, which has forced us to explore new ideas and optimization techniques involving advanced techniques of global search, including tabu search and biologically-inspired optimization mechanisms. In a nutshell, we can distinguish between reduction processes involving (a) data and (b) features (attributes). Data reduction is concerned with grouping data and revealing their structure in the form of clusters (groups). Clustering is regarded as one of the fundamental techniques within the domain of data reduction. Typically, we start with thousands of data points and arrive at 10–15 clusters. Feature or attribute reduction deals with (a) a transformation of the feature space into another feature space of a far lower dimensionality or (b) a selection of a subset of features that are regarded to be the most essential with respect to a certain predefined objective function. Considering the underlying techniques of feature transformation, we encounter a number of classic linear statistical techniques, such as, e.g., principal component analysis or

more advanced nonlinear mapping mechanisms realized by, e.g., neural networks. The criteria used to assess the quality of the resulting (reduced) feature space give rise to two general categories, namely filters and wrappers. Using filters, we consider some criterion that pertains to the statistical characteristics of the selected attributes and evaluate them in this respect. In contrast, when dealing with wrappers, we are concerned with the effectiveness of the features as a vehicle to carry out classification, so in essence there is a mechanism (e.g., a certain classifier) which effectively evaluates the performance of the selected features with respect to their discriminating capabilities. In addition to feature and data reduction being regarded as two separate processes, we may consider their combinations. A general roadmap of dimensionality reduction is outlined in Fig. 1. While the most essential components have already been described, we note that all reduction processes are guided by various criteria. The reduction activities are established in some formal frameworks of information granules. In the study, we will start with the idea of information granularity and information granules, Sect. "Information Granularity and Granular Computing", in which we demonstrate the fundamental role of this concept, which leads to sound foundations and helps establish a machinery of data and feature reduction. We continue with data reduction (Sect. "Data Reduction"), where we highlight the role of clustering and fuzzy clustering as a processing vehicle leading to the discovery of structural relationships within the data. We also cover several key measures


used for the evaluation of the reduction process, offering some thoughts on fuzzy quantization, which brings a quantitative characterization of the reconstruction process. In this setting, we show that the tandem of encoding and decoding is guided by well-grounded optimization criteria. Dimensionality reduction is covered in Sect. "Dimensionality Reduction". Here we start with linear and nonlinear transformations of the feature space, including standard methods such as the well-known Principal Component Analysis (PCA). In the sequel, we stress a way in which biologically inspired optimization leads to the formation of an optimal subset of features. Some mechanisms of co-joint data and dimensionality reduction are outlined in Sect. "Co-joint Data and Dimensionality Reduction", in which we discuss the role of biclustering in these reduction problems. In this paper, we adhere to standard notation. Individual data are treated as n-dimensional vectors of real numbers, say x1, x2, etc. We consider a collection of N data points which could be arranged in matrix form, where the data occupy consecutive rows of the matrix. Furthermore, we will be using the terms attribute and feature interchangeably.

Information Granularity and Granular Computing

Information granules permeate numerous human endeavors [1,19,20,22,23,24,25,26]. No matter what problem is taken into consideration, we usually express it in a certain conceptual framework of basic entities which we regard to be of relevance to the problem formulation and problem solving. This becomes a framework in which we formulate generic concepts adhering to some level of abstraction, carry out processing, and communicate the results to the external environment. Consider, for instance, image processing. In spite of the continuous progress in the area, a human being assumes a dominant and very much uncontested position when it comes to understanding and interpreting images.
Surely, we do not focus our attention on individual pixels and process them as such but group them together into semantically meaningful constructs – familiar objects we deal with in everyday life. Such objects involve regions consisting of pixels or categories of pixels drawn together because of their proximity in the image, similar texture, color, etc. This remarkable and unchallenged ability of humans rests on our effortless ability to construct information granules, manipulate them and arrive at sound conclusions. As another example, consider a collection of time series. From our perspective we can describe them in a semi-qualitative manner by pointing at specific regions of such signals. Specialists can effortlessly interpret ECG signals. They distinguish some segments of such signals and interpret their combinations. Experts can interpret temporal readings of sensors and assess the status of the monitored system. Again, in all these situations, the individual samples of the signals are not the focal point of the analysis and the ensuing signal interpretation. We always granulate all phenomena (no matter whether they are originally discrete or analog in their nature). Time is another important variable that is subjected to granulation. We use seconds, minutes, days, months, and years. Depending on which specific problem we have in mind and who the user is, the size of the information granules (time intervals) could vary quite dramatically. To high-level management, time intervals of quarters of a year or a few years could be meaningful temporal information granules on the basis of which one develops a predictive model. For those in charge of the everyday operation of a dispatching plant, minutes and hours could form a viable scale of time granulation. For the designer of high-speed integrated circuits and digital systems, the temporal information granules concern nanoseconds, microseconds, and perhaps picoseconds. Even such commonly encountered and simple examples are convincing enough to lead us to ascertain that (a) information granules are the key components of knowledge representation and processing, (b) the level of granularity of information granules (their size, to be more descriptive) becomes crucial to the problem description and the overall strategy of problem solving, and (c) there is no universal level of granularity of information; the size of granules is problem-oriented and user-dependent. What has been said so far touches on a qualitative aspect of the problem. The challenge is to develop a computing framework within which all these representation and processing endeavors could be formally realized. The common platform emerging within this context comes under the name of Granular Computing.
Data and Dimensionality Reduction in Data Analysis and System Modeling

In essence, it is an emerging paradigm of information processing. While we have already noticed a number of important conceptual and computational constructs built in the domains of system modeling, machine learning, image processing, pattern recognition, and data compression, in which various abstractions (and the ensuing information granules) came into existence, Granular Computing becomes innovative and intellectually proactive in several fundamental ways:

- It identifies the essential commonalities between surprisingly diversified problems and technologies, which can be cast into a unified framework usually referred to as a granular world. This is a fully operational processing entity that interacts with the external world (which could be another granular or numeric world) by collecting necessary granular information and returning the outcomes of granular computing. With the emergence of this unified framework of granular processing, we get a better grasp of the role of interaction between the various formalisms and can visualize the way in which they communicate.
- It brings together the existing formalisms of set theory (interval analysis) [8,10,15,21], fuzzy sets [6,26], rough sets [16,17,18], etc. under the same roof by clearly showing that, in spite of their visibly distinct underpinnings (and the ensuing processing), they exhibit some fundamental commonalities. In this sense, Granular Computing establishes a stimulating environment of synergy between the individual approaches.
- By building upon the commonalities of the existing formal approaches, Granular Computing helps build heterogeneous and multifaceted models of processing of information granules by clearly recognizing the orthogonal nature of some of the existing and well-established frameworks (say, probability theory, with its probability density functions, and fuzzy sets, with their membership functions).
- Granular Computing fully acknowledges a notion of variable granularity, whose range can cover detailed numeric entities as well as very abstract and general information granules. It looks at the aspects of compatibility of such information granules and the ensuing communication mechanisms of the granular worlds.

Interestingly, the inception of information granules is highly motivated: we do not form information granules without reason. Information granules arise as an evident realization of the fundamental paradigm of abstraction.

Granular Computing forms a unified conceptual and computing platform. Yet, it directly benefits from the already existing and well-established concepts of information granules formed in the settings of set theory, fuzzy sets, rough sets, and others. The selection of a particular formalism from this list depends upon the problem at hand. While in dimensionality reduction set theory becomes more visible, in data reduction we might encounter both set theory and fuzzy sets.

Data Reduction

In data reduction, we transform large collections of data into a limited, quite small family of representatives which capture the underlying structure (topology). Clustering has become one of the fundamental tools used in this setting. Depending upon the distribution of the data, we may anticipate here the use of set theory or fuzzy sets. As an illustration, consider the two-dimensional data portrayed in Fig. 2. The data in Fig. 2a exhibit a clearly delineated structure with three well-separated clusters. Each cluster can be formalized as a set of elements. The situation illustrated in Fig. 2b is very different: while some clusters are visible, they overlap to some extent and a number of points are "shared" by several clusters. In other words, some data may belong to more than a single cluster, and hence the emergence of fuzzy sets as a formal setting in which the resulting information granules can be formalized. In what follows, we briefly review the essence of fuzzy clustering and then move on to the evaluation of the quality of the results.

Fuzzy C-Means as an Algorithmic Vehicle of Data Reduction Through Fuzzy Clusters

Fuzzy sets can be formed on the basis of numeric data through clustering (grouping). The groups of data give rise to membership functions that convey a global, more abstract view of the available data. In this regard, Fuzzy C-Means (FCM) is one of the commonly used mechanisms of fuzzy clustering [2,3,20]. Let us review its formulation, develop the algorithm, and highlight the main properties of the fuzzy clusters.

Data and Dimensionality Reduction in Data Analysis and System Modeling, Figure 2: Data clustering and the underlying formalism of set theory, suitable for handling well-delineated structures of data (a), and the use of fuzzy sets in capturing the essence of data with significant overlap (b)


Given a collection of n-dimensional data $\{x_k\}$, $k = 1, 2, \ldots, N$, the task of determining its structure – a collection of "c" clusters – is expressed as the minimization of the following objective function (performance index) Q, regarded as a sum of the squared distances between the data and their representatives (prototypes):

$$Q = \sum_{i=1}^{c} \sum_{k=1}^{N} u_{ik}^{m} \, \|x_k - v_i\|^2 . \tag{1}$$

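The minimization of Q in (1) is customarily carried out by alternating updates of the partition matrix and the prototypes. The update formulas below are the classical FCM ones, supplied here as an illustration (the section itself does not spell them out); the function and variable names are illustrative:

```python
def fcm(data, c, m=2.0, iters=100, eps=1e-9):
    """Minimal Fuzzy C-Means sketch: alternate the membership update
    u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)) and the prototype update
    v_i = sum_k u_ik^m x_k / sum_k u_ik^m."""
    n = len(data[0])                                   # feature-space dimension
    N = len(data)
    v = [list(data[(N * i) // c]) for i in range(c)]   # spread initial prototypes
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) or 1e-12
    u = [[0.0] * N for _ in range(c)]
    for _ in range(iters):
        for k, x in enumerate(data):                   # membership update
            dists = [d2(x, vi) ** 0.5 for vi in v]
            for i in range(c):
                u[i][k] = 1.0 / sum((dists[i] / dj) ** (2.0 / (m - 1)) for dj in dists)
        new_v = []
        for i in range(c):                             # prototype update
            w = [u[i][k] ** m for k in range(N)]
            s = sum(w)
            new_v.append([sum(wk * xk[j] for wk, xk in zip(w, data)) / s
                          for j in range(n)])
        shift = max(d2(a, b) for a, b in zip(v, new_v))
        v = new_v
        if shift < eps:
            break
    return v, u
```

On two well-separated planar clusters, the prototypes converge to the cluster centers, and each column of U sums to one, as required of a fuzzy partition matrix.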
Here the $v_i$'s are the n-dimensional prototypes of the clusters, $i = 1, 2, \ldots, c$, and $U = [u_{ik}]$ stands for a partition matrix expressing the allocation of the data to the corresponding clusters; $u_{ik}$ is the membership degree of datum $x_k$ in the ith cluster. The distance between datum $x_k$ and prototype $v_i$ is denoted by $\|\cdot\|$. The fuzzification coefficient $m$ ($> 1.0$) expresses the impact of the membership grades on the individual clusters. A partition matrix satisfies two important and intuitively appealing properties: (a) $0 < \sum_{k=1}^{N} u_{ik} < N$ for $i = 1, 2, \ldots, c$, and (b) $\sum_{i=1}^{c} u_{ik} = 1$ for $k = 1, 2, \ldots, N$.

Under the Bayes decision rule, an object with features x is assigned to the class $c_i$ for which $p(c_i|x) > p(c_j|x)$ for all $j \neq i$. The Bayes error in this case is $p_B = 1 - p(c_i|x)$. This decision rule may be conveniently expressed by means of discriminants assigned to classes: to the class $c_i$ the discriminant $g_i(x)$ is assigned, which by the Bayes formula can be expressed as an increasing function of the product $p(x|c_i) \cdot p(c_i)$, e.g., $g_i(x) = \log p(x|c_i) + \log p(c_i)$; an object with features x is classified to the $c_i$ for which $g_i(x) = \max\{g_j(x) : j = 1, 2, \ldots, k\}$. The most important case occurs when the conditional densities $p(x|c_i)$ are distributed normally, under $N(\mu_i, \Sigma_i)$; under the simplifying assumptions that the prior probabilities are equal and the features are independent, the Bayes decision rule becomes the following recipe: to an object with feature vector x, the class $c_i$ is assigned for which the distance $\|x - \mu_i\|$ from x to the mean $\mu_i$ is the smallest among all distances $\|x - \mu_j\|$ [16]. In this way the nearest neighbor method can be formally introduced and justified. In many real cases, estimation of the densities $p(x|c_i)$ is, for obvious reasons, difficult; nonparametric techniques were proposed to bypass this difficulty.
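Under the simplifying assumptions just quoted (equal priors, independent features), the Bayes rule collapses to the nearest-mean recipe; a minimal sketch, with illustrative names:

```python
def class_means(samples_by_class):
    """Mean feature vector per class, estimated from labeled training data."""
    means = {}
    for label, samples in samples_by_class.items():
        dim = len(samples[0])
        means[label] = [sum(s[j] for s in samples) / len(samples) for j in range(dim)]
    return means

def nearest_mean_classify(x, means):
    """Assign x to the class whose mean is nearest in the Euclidean sense."""
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(means, key=lambda label: d2(x, means[label]))
```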
The idea of a Parzen window (Parzen [58]; see [16]) consists of considering a sample S of objects in a d-space, potentially increasing their number to infinity, and a sequence $R_1, R_2, \ldots, R_n, \ldots$ of d-hypercubes of edge lengths $h_n$, hence of volumes $V_n = h_n^d$, with $R_n$ containing $k_n$ objects from the sample; the estimate $p_n(x)$ for the density induced from $R_n$ is clearly equal to $p_n(x) = (k_n / |S|) / V_n$. Under the additional conditions that $\lim_{n\to\infty} V_n = 0$ and $\lim_{n\to\infty} n \cdot V_n = \infty$, the sequence $p_n(x)$ converges to the density p(x) wherever the latter is continuous [16]. The difficulty with this approach lies in the question of how to choose the regions $R_i$ and their parameters, and here the idea of nearest neighbors returns: the modification rests on the idea that the sampling window should depend on the training sample itself and adapt to its structure. This can be implemented as follows: one centers the region R about the object with features x and resizes it until it absorbs k training objects (the k nearest neighbors); if $k_i$ of them fall into the class $c_i$, then the estimate for the probability is $p(c_i|x) = k_i / k$ [16]. Letting k vary with n so that $\lim_{n\to\infty} k/n = 0$ and $\lim_{n\to\infty} k = \infty$ secures that the estimate sequence $p_n(c_i|x)$ converges in probability to $p(c_i|x)$ at continuity points of the latter [16]. The k-nearest neighbor method finds its justification through considerations like the one recalled above. The 1-nearest neighbor method is the simplest variant of k-nearest neighbors; in the sample space, with the help of a selected metric, it builds neighborhoods of the training objects, splitting the space into cells which together compose the Voronoi tessellation (Voronoi diagram) of the space, see [79]. The basic theorem of Cover and Hart [13] asserts that the error P in classification by the 1-nearest neighbor method is related to the error $P_B$ of the Bayesian decision rule by the inequality $P_B \le P \le P_B \left(2 - \frac{k}{k-1} P_B\right)$, where k is the number of decision classes. Although the result is theoretically valid under the assumption of an infinite sample, it can be regarded as an estimate of the limits of the error of the 1-nearest neighbor method – at most twice the error of the Bayes classifier – also in the case of finite, reasonably large samples. A discussion of analogous results for the k-nearest neighbor method can be found in Ripley [81]. Metrics [see Glossary: "Distance functions (metrics)"] used in nearest neighbor-based techniques can take varied forms; the basic distance function is the Euclidean metric in a d-space, $\rho_E(x, y) = [\sum_{i=1}^{d} (x_i - y_i)^2]^{1/2}$, and its generalization to the class of Minkowski metrics $L_p(x, y) = [\sum_{i=1}^{d} |x_i - y_i|^p]^{1/p}$ for $p \ge 1$, with the special case $L_1(x, y) = \sum_{i=1}^{d} |x_i - y_i|$ (the Manhattan metric) and the limiting case $L_\infty(x, y) = \max\{|x_i - y_i| : i = 1, 2, \ldots, d\}$. These metrics can be modified by scaling factors (weights) applied to the coordinates; e.g., $L_1^w(x, y) = \sum_{i=1}^{d} w_i \cdot |x_i - y_i|$ is the Manhattan metric modified by a non-negative weight vector w [119] and subject to adaptive training. Metrics like the above can be detrimental to the nearest neighbor method in the sense that the nearest neighbors are not invariant with respect to transformations such as translations, shifts, and rotations.
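The nonparametric window estimates and the Minkowski family of metrics described above can be sketched as follows; the function names are illustrative:

```python
def parzen_density(x, sample, h):
    """Hypercube Parzen estimate: the fraction of the sample falling inside the
    cube of edge h centered at x, divided by the cube volume h^d."""
    inside = sum(all(abs(xi - si) <= h / 2 for xi, si in zip(x, s)) for s in sample)
    return (inside / len(sample)) / (h ** len(x))

def knn_posterior(x, labeled_sample, k):
    """k-nearest-neighbor posterior estimate p(c_i | x) ~ k_i / k."""
    d2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(labeled_sample, key=lambda sl: d2(x, sl[0]))[:k]
    counts = {}
    for _, label in neighbors:
        counts[label] = counts.get(label, 0) + 1
    return {label: cnt / k for label, cnt in counts.items()}

def minkowski(x, y, p):
    """L_p metric; p = 1 is Manhattan, p = 2 Euclidean, p = inf the max metric."""
    diffs = [abs(xi - yi) for xi, yi in zip(x, y)]
    return max(diffs) if p == float('inf') else sum(d ** p for d in diffs) ** (1.0 / p)
```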
A remedy for this difficulty was proposed in the form of the tangent distance by Simard, Le Cun and Denker [86]. The idea consists of replacing each training and each test object, represented as a vector x in the feature space $R^k$, with its invariance manifold (see Hastie, Tibshirani and Friedman [32]), consisting of x along with all its images under allowable transformations: translation, scaling of axes, rotation, shear, line thickening. Instead of measuring distances between the representing vectors x, y, one can find the shortest distance between the invariance manifolds induced by x and y; this task can be further simplified by finding for each x the hyperplane tangent at x to its invariance manifold and measuring the shortest distance between these tangents. For a vector x and the matrix T of basic tangent vectors at x to the invariance manifold, the equation of the tangent


Data-Mining and Knowledge Discovery: Case-Based Reasoning, Nearest Neighbor and Rough Sets

hyperplane is $H(x): y = x + Ta$, where $a \in R^k$. The simpler, so-called "one-sided", version of the tangent metric method assumes that tangent hyperplanes are produced only for the training objects x, whereas for a test object $x'$ no invariance manifold is defined; instead, the tangent hyperplane nearest to $x'$ is found, and $x'$ is classified as the x that defined this nearest hyperplane. In this case the distance to the tangent hyperplane is given by $\min_a \rho_E(x', x + Ta)$ when the Euclidean metric is chosen as the underlying distance function. In the "two-sided" variant, the tangent hyperplane at $x'$ is found as well, and $x'$ is classified as the training object x for which the distance between the tangent hyperplanes $H(x)$, $H(x')$ is the smallest among the distances from $H(x')$ to the tangent hyperplanes at the training objects. A simplified variant of the tangent distance method was proposed by Abu-Mostafa (see [32]); it consists of producing all images of the training and test object representations in the feature space, regarding them as, respectively, training and test objects, and applying standard nearest neighbor techniques.

Among the problems related to metrics is the dimensionality problem: in high-dimensional feature spaces, nearest neighbors can be located at large distances from the test point in question, violating the window-choice principle that justifies the method. As pointed out in [32], the median of the radius R of the sphere about the origin containing the nearest neighbor of N objects uniformly distributed in the cube $[-1/2, 1/2]^p$ is equal to $(1/V_p)^{1/p} \cdot (1 - (1/2)^{1/N})^{1/p}$, where $V_p \cdot r^p$ is the volume of the sphere of radius r in p-space, and it approaches the value 0.5 with an increase of p for each N, i.e., a nearest neighbor is asymptotically located on the surface of the cube. A related result of Aggarwal et al. [3] asserts that the expected value of the quotient $k^{1/2} \cdot (\max\{\|x\|_p, \|y\|_p\} - \min\{\|x\|_p, \|y\|_p\}) / \min\{\|x\|_p, \|y\|_p\}$ for vectors x, y uniformly distributed in the cube $(0, 1)^k$ is asymptotically ($k \to \infty$) equal to $C \cdot (2p + 1)^{-1/2}$ for a constant C (these phenomena are collectively known as the dimensionality curse). To cope with this problem, the discriminant adaptive nearest neighbor method was proposed by Hastie and Tibshirani [31]; the method consists of adaptively modifying the metric in neighborhoods of test vectors in order to ensure that the probabilities of the decision classes do not vary much; the direction in which the neighborhood is stretched is the direction of the least change in class probabilities. The idea that the metric used in nearest neighbor finding should depend locally on the training set was also used in the construction of several metrics in the realm of decision systems, i.e., objects described in the attribute-value language. For nominal values, the metric VDM (Value Difference Metric) of Stanfill and Waltz [97] takes into account the conditional probabilities $P(d = v \mid a_i = v_i)$ of a decision value given an attribute value, estimated over the training set Trn, and on this basis constructs in the value set $V_i$ of the attribute $a_i$ the metric $\rho_i(v_i, v_i') = \sum_{v \in V_d} |P(d = v \mid a_i = v_i) - P(d = v \mid a_i = v_i')|$. The global metric is obtained by combining the metrics $\rho_i$ for all attributes $a_i \in A$ according to one of the many-dimensional metrics, e.g., a Minkowski metric. This idea was also applied to numerical attributes by Wilson and Martinez [116] in the metrics IVDM (Interpolated VDM) and WVDM (Windowed VDM); see also [119]. A modification of the WVDM metric, based again on the idea of using probability densities to determine the window size, was proposed as the DBVDM metric [119]. Implementation of the k-nearest neighbor method requires satisfactorily fast methods of finding the k nearest neighbors; the need to accelerate search in the training space led to indexing methods. The method due to Fukunaga and Narendra [24] splits the training sample into a hierarchy of clusters, with the idea that in the search for the nearest neighbor the clusters are examined top-down and clusters evaluated as unfit for the search are pruned from the hierarchy. An alternative is the bottom-up clustering scheme due to Ward [113]. Distances between test objects and clusters are evaluated as distances between test objects and cluster centers, depending on the chosen strategy.
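The VDM construction quoted above, for a single nominal attribute and with the probabilities estimated by counting over the training set, can be sketched as follows (names illustrative):

```python
def vdm_metric(training, decision_values):
    """Value Difference Metric for one nominal attribute:
    rho(v, v') = sum over decisions d of |P(d | a=v) - P(d | a=v')|,
    with the probabilities estimated by counting over the training set.
    `training` is a list of (attribute_value, decision_value) pairs."""
    counts, totals = {}, {}
    for a_val, d_val in training:
        totals[a_val] = totals.get(a_val, 0) + 1
        counts[(a_val, d_val)] = counts.get((a_val, d_val), 0) + 1
    def p(d, v):
        return counts.get((v, d), 0) / totals[v]
    def rho(v1, v2):
        return sum(abs(p(d, v1) - p(d, v2)) for d in decision_values)
    return rho
```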
For features that allow one to map objects onto vectors in a vector space, and clusters in the form of hypercubes, a number of tree-based indexing methods exist, among them k-d trees due to Bentley [8] and quad trees of Finkel and Bentley [20]; for high-dimensional feature spaces, where the dimensionality curse plays a role, more specialized tree-based indexing methods were proposed: X-trees by Berchtold, Keim and Kriegel [9], SR-trees by Katayama and Satoh [34], and TV-trees by Lin, Jagadish and Faloutsos [45]. For general, also non-numerical, features, more general cluster regions are necessary, and a number of indexing methods were proposed: BST due to Kalantari and McDonald [33], GHT by Uhlmann [110], GNAT by Brin [10], SS-trees due to White and Jain [115], and M-trees by Ciaccia, Patella and Zezula [11]. A selection of cluster centers for numerical features may be performed by taking the mean values of the feature values of the vectors in the cluster; in the general case it may be based on minimizing the variance of distances within the cluster [119]. The search for the k nearest neighbors of a query object is done in the tree in depth-first order from the root down.
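As an illustration of the tree-based indexing idea, here is a minimal k-d tree in the spirit of Bentley's construction, with a nearest-neighbor search that prunes branches which cannot contain a closer point; this is a sketch, not the published algorithm:

```python
def build_kdtree(points, depth=0):
    """Recursive k-d tree: split on coordinate (depth mod d) at the median point."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'point': points[mid], 'axis': axis,
            'left': build_kdtree(points[:mid], depth + 1),
            'right': build_kdtree(points[mid + 1:], depth + 1)}

def kdtree_nearest(node, target, best=None):
    """Depth-first search, pruning subtrees that cannot beat the current best."""
    if node is None:
        return best
    d2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if best is None or d2(target, node['point']) < d2(target, best):
        best = node['point']
    diff = target[node['axis']] - node['point'][node['axis']]
    near, far = (node['left'], node['right']) if diff < 0 else (node['right'], node['left'])
    best = kdtree_nearest(near, target, best)
    if diff ** 2 < d2(target, best):      # the far branch may still hold a closer point
        best = kdtree_nearest(far, target, best)
    return best
```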


Applications of the k-nearest neighbor method include satellite image recognition, protein sequence matching, spatial databases, information retrieval, and stock market and weather forecasts; see Aha [4] and Veloso [111].

Case-Based Reasoning

Nearest neighbor techniques can be seen abstractly as tasks of classifying an entity e by recovering, from a base of already classified entities, the entity E nearest to e and adopting the class of E as the class of e. Regarded in this light, classification and, more generally, problem-solving tasks may be carried out as tasks of recovering, from a base of cases already solved, the case most similar to the case currently being solved and adopting its solution, possibly with some modifications, as the solution of the current problem. The distinction between this general idea and nearest neighbor techniques is that in the former, new cases are expected to be used in the classification of subsequent cases. This is the underlying idea of Case-Based Reasoning. The Case-Based Reasoning methodology emerged in response to difficulties encountered by the knowledge-based reasoning systems that dominated the first period of growth of Artificial Intelligence, such as DENDRAL, MYCIN, or PROSPECTOR. These systems relied on knowledge of the subject in the form of decision rules or models of entities; the problems were of a varied nature: the complexity of knowledge extraction, the difficulty of managing large amounts of information, and difficulties with maintaining and refreshing knowledge, see [114]. In spite of the search for improvements and a plethora of ideas and concepts aimed at resolving these difficulties, new ideas based on reasoning by analogy came forth.
Resorting to analogous cases eliminates the need for modeling knowledge, as knowledge now consists of cases along with their solutions and methods; implementation consists in identifying the features determining cases; large volumes of information can be managed; and learning amounts to acquiring new cases [114]. Case-Based Reasoning (CBR) can be traced back to a few sources of origin. On a philosophical level, L. Wittgenstein [117] is quoted in Aamodt and Plaza [2] for his idea of natural concepts and the claim that they be represented as collections of cases rather than features. An interest in the cognitive aspects of learning (due partially to the contemporary growth of interest in theories of language, inspired in large part by the work of N. Chomsky) and in the psychological aspects of learning, exploiting the concept of a situation, was reflected in the machine-learning area by the works of Schank and Abelson [84] and Schank [83], regarded as pioneering the CBR area in [114]. Schank and

Abelson's idea was to use scripts as a tool to describe the memory of situation patterns. The role of memory in reasoning by cases was analyzed by Schank [83], leading to memory organization packets (MOPs), and by Porter in the case memory model. These studies led to the first CBR-based systems: CYRUS by Kolodner [37], MEDIATOR by Simpson [87], and PROTOS by Porter and Bareiss [78], among many others; see [114] for a more complete listing. The methodology of CBR systems was described in [2] as consisting of four distinct processes repeated in cycles: (1) RETRIEVE: matching the case currently being solved against the case base and fetching the cases most similar, according to the adopted similarity measure, to the current case; (2) REUSE: making the solutions of the most similar cases fit to solve the current case; (3) REVISE: revising the fetched solution to adapt it to the current case, taking place after the REUSEd solution has been proposed and evaluated; (4) RETAIN: retaining the REVISEd solution in the case base, along with the case, as a new solved case that can in turn be used in solving future cases. While the process of reasoning by cases can be adapted to model reasoning by nearest neighbors, the difference between the two is clear from Aamodt and Plaza's description of the CBR process: in CBR, the case base is incrementally enlarged and retained cases become its valid members, used in the process of solving new cases; in nearest neighbors, the sharp distinction between training and test objects is kept throughout the classification process. From an implementation point of view, the problem of representing cases is the first to comment upon. According to Kolodner [38], case representation should secure functionality and ease of information acquisition from the case.
The structure of a case is a triple (problem, solution, outcome): the problem is a query along with the state of the world (the situation) at the moment the query is posed; the solution is the proposed change in the state of the world; and the outcome is the new state after the query is answered. The solution may come along with the method by which it was obtained, which is important when revision is necessary. In representing cases, many formalisms were used: frames/objects, rules, semantic nets, predicate calculi/first-order logics, etc. Retrieval of cases from the case base requires a form of case indexing; Kolodner [38] recommended hand-chosen indices as more human-oriented than automated methods, but many automated strategies for indexing were introduced and verified, among them (listed in [114]):


- Checklist-based indexing (Kolodner [38]), which indexes cases by features and dimensions, like the MEDIATOR indexing, which takes into account the types and functions of the objects being disputed and the relationships among disputants;
- Difference-based indexing, which selects the features discerning best among cases, as in CYRUS by Kolodner [37];
- Methods based on inductive learning and rule induction, which extract the features used in indexing;
- Similarity methods, which produce abstractions of cases sharing common sets of features and use the remaining features to discern among cases;
- Explanation-based methods, which select features by inspecting each case and deciding on the set of relevant features.

The retrieval mechanism relies on indexing and memory organization to retrieve cases; the process of selecting the most similar case uses some measure of similarity and a strategy for finding the next matching cases in the case base. Among these strategies the nearest neighbor strategy can be found, based on chosen similarity measures for particular features; an exemplary similarity measure is the one adopted in the system ReMind (Kolodner [38]):

$$sim(I, R) = \frac{\sum_{\text{features } f} w_f \cdot sim_f(f(I), f(R))}{\sum_{\text{features } f} w_f}$$

where $w_f$ is a weight assigned to the feature f, $sim_f$ is the chosen similarity measure on the values of feature f, I is the new case, and R is the retrieved case. Other methods are based on inductive learning, template retrieval, etc.; see [114]. Memory organization should provide a trade-off between the wealth of information stored in the case base and the efficiency of retrieval; two basic case memory models emerged: the dynamic memory model of Schank [83] and Kolodner [37], and the category-exemplar model due to Porter and Bareiss [78].
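The ReMind-style weighted aggregate similarity quoted above can be sketched directly; the feature names, weights, and local measures in the example are illustrative, not taken from the system itself:

```python
def weighted_similarity(new_case, retrieved_case, weights, sims):
    """sim(I, R) = sum_f w_f * sim_f(f(I), f(R)) / sum_f w_f."""
    num = sum(w * sims[f](new_case[f], retrieved_case[f]) for f, w in weights.items())
    return num / sum(weights.values())

# illustrative local similarity measures for two hypothetical features
example_sims = {
    'age': lambda a, b: 1.0 - abs(a - b) / 100.0,   # numeric: scaled difference
    'type': lambda a, b: 1.0 if a == b else 0.0,    # nominal: exact match
}
```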
On the basis of Schank’s MOP (Memory Organization Packets), Episodic Memory Organization Packets in Kolodner [37] and Generalized Episodes (GE) in Koton [41] were proposed; a generalized episode consists of cases, norms and indices where norms are features shared by all cases in the episode and indices are features discerning cases in the episode. The memory is organized as a network of generalized episodes, cases, index names and index values. New cases are dynamically incorporated into new episodes. The category-exemplar model divides cases (exemplars) into categories, according to case features, and indexes cases by case links from

categories to the cases in them; other indices are feature links from features to categories or cases, and difference links from categories to similar cases that differ to a small degree in features. Categories are related within a semantic network. Reuse and adaptation follow two main lines: structural adaptation (see [38]), in which the solution of the retrieved case is directly adapted by adaptation rules, and derivational adaptation (Simpson [87]), in which the methods that produced the retrieved solution are themselves adapted in order to yield a solution for the new case. Various techniques were proposed and applied in the adaptation process; see, e.g., [114]. Applications based on CBR up to the early 1990s are listed in [38]; to mention a few: JUDGE (Bain): a system for sentencing in murder, assault and manslaughter; KICS (Yang and Robertson): a system for deciding about building regulations; CASEY (Koton): a system for heart failure diagnosis; CADET (Sycara): a system for assisting in mechanical design; TOTLEC (Costas and Kashyap): a system for planning in the process of manufacturing design; PLEXUS (Alterman): a system for plan adaptation; CLAVIER (Hennessy and Hinkle): a system implemented at Lockheed to control and modify the autoclave processes in part manufacturing; and CaseLine (Magaldi): a system implemented at British Airways for maintenance and repair of the Boeing fleet.

Complexity Issues

Complexity of Computations in Information Systems

The basic computational problems: (DM) computing the discernibility matrix $M_{(U,A)}$ from an information system (U, A); (MLA) membership in the lower approximation; and (RD) rough definability of sets, are of polynomial complexity: (DM) and (RD) in time $O(n^2)$, (MLA) in time $O(n)$ [90]. The core CORE(U, A) of an information system (U, A) is the set of all indispensable attributes, i.e., $CORE(U, A) = \{a \in A : ind(A) \neq ind(A \setminus \{a\})\}$. As proved in [90], CORE(U, A) consists exactly of those attributes a for which $c_{i,j} = \{a\}$ for some entry $c_{i,j}$ of the discernibility matrix. Thus, finding the core requires $O(n^2)$ time [90]. The reduct membership problem, i.e., checking whether a given set B of attributes is a reduct, requires $O(n^2)$ time [90]. The reduct set problem, i.e., finding the set of all reducts, is polynomially equivalent to the problem of converting a conjunctive form of a monotone Boolean function into the reduced disjunctive form [90]. The number of reducts


in an information system with n attributes can be exponential, with the upper bound of $\binom{n}{\lfloor n/2 \rfloor}$ reducts [89]. The problem of finding a reduct of minimal cardinality is NP-hard [90]; thus, heuristics for reduct finding are in general necessary; they are based on the Johnson algorithm, simulated annealing, etc. Genetic-algorithm-based hybrid algorithms give short reducts in relatively short time [120].

Complexity of Template-Based Computations

Templates (see Subsect. "Similarity" of Sect. "Rough Set Theory. Extensions") were used in rule induction based on similarity relations [53]. Decision problems related to templates are [53]:

TEMPLATE SUPPORT (TS)
Instance: an information system (U, A); natural numbers s, l.
Question: does there exist a template of length l and support s?

OPTIMAL TEMPLATE SUPPORT (OTS)
Input: an information system (U, A); a natural number l.
Output: a template T of length l and maximal support.
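The discernibility-matrix constructions of this section (the matrix itself, the core as the set of singleton entries, and a Johnson-style greedy heuristic that covers each nonempty entry by the most frequently occurring attribute) can be sketched as follows; the representation choices are illustrative:

```python
def discernibility_matrix(universe, attributes):
    """Entry c_{i,j}: the set of attributes with different values on objects i, j.
    `universe` is a list of objects, each a dict attribute -> value."""
    n = len(universe)
    return {(i, j): {a for a in attributes if universe[i][a] != universe[j][a]}
            for i in range(n) for j in range(i + 1, n)}

def core(universe, attributes):
    """CORE(U, A): attributes appearing as singleton entries of the matrix."""
    M = discernibility_matrix(universe, attributes)
    return {next(iter(c)) for c in M.values() if len(c) == 1}

def johnson_reduct(universe, attributes):
    """Greedy (Johnson-style) heuristic for a short reduct: repeatedly take the
    attribute occurring in the most not-yet-covered nonempty matrix entries."""
    entries = [e for e in discernibility_matrix(universe, attributes).values() if e]
    reduct = set()
    while entries:
        freq = {}
        for e in entries:
            for a in e:
                freq[a] = freq.get(a, 0) + 1
        best = max(freq, key=lambda a: freq[a])
        reduct.add(best)
        entries = [e for e in entries if best not in e]
    return reduct
```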

The problem TS is NP-complete and the problem OTS is NP-hard [53].

TEMPLATE QUALITY PROBLEM (TQP)
Instance: an information system (U, A); a natural number k.
Question: does there exist a template of quality greater than k?

In the case quality(T) = support(T) + length(T), the problem TQP can be solved in polynomial time; in the case quality(T) = support(T) · length(T), the problem TQP is conjectured to be NP-hard. In [53] some methods for extracting templates of satisfactory quality are discussed.

Complexity of Discretization

Discretization (see Subsect. "Discretization" of Sect. "Rough Set Theory. Extensions") offers a number of decision problems. In the case of numeric attribute values, the decision problem is to check for an irreducible set of cuts.

IRREDUCIBLE SET OF CUTS (ISC)
Instance: a decision system (U, A, d) with $|A| \ge 2$; a natural number k.
Question: does there exist an irreducible set of cuts of cardinality less than k?

ISC is NP-complete, and its optimization version, i.e., finding an optimal (minimal) set of cuts, is NP-hard [53]; in the case $|A| = 1$, ISC is of complexity $O(|U| \cdot \log |U|)$ [53]. For non-numerical attribute values, the counterpart of a cut system is a partition of the value set. For a partition $P_a$ of an attribute value set $V_a$ into k sets, the rank $r(P_a)$ is k; for a family $P = \{P_a : a \in A\}$ of partitions of all the value sets of attributes in a decision system (U, A, d), the rank of P is $r(P) = \sum_{a \in A} r(P_a)$; the notion of a consistent partition P mimics that of a consistent cut system. The corresponding problem is:

SYMBOLIC VALUE PARTITION PROBLEM (SVPP)
Input: a decision system (U, A, d); a set B of attributes; a natural number k.
Output: a B-consistent partition of rank less than or equal to k.

The problem SVPP is NP-complete [53]. Problems of generating satisfactory cut systems require heuristics; the Maximal Discernibility (MD) heuristic [51] is discussed also in [7,53].

Complexity of Problems Related to k-Nearest Neighbors

In the case of n training objects in d-space, the search for the nearest neighbor (k = 1) requires $O(dn^2)$ time and O(n) space. A parallel implementation [16] is O(1) in time and O(n) in space. This implementation checks, for each of the Voronoi regions induced from the training sample, whether the test object falls inside it (and thus would receive its class label) by checking, for each of the d "faces" of the region, whether the object lies on the region's side of the face. The large complexity associated with the storage of n training objects called for eliminating some "redundant" training objects; the editing technique of Hart [30] consists of editing away, i.e., eliminating from the training set, objects that are surrounded in the Voronoi diagram by objects of the same decision class; the complexity of the editing algorithm is $O(d^3 \cdot n^{\lfloor d/2 \rfloor} \cdot \log n)$. The complexity of problems related to Voronoi diagrams has been researched in many aspects; the Voronoi diagram of n points in 2-space can be constructed in time $O(n \log n)$ [79]; in d-space the complexity is $\Theta(n^{\lfloor d/2 \rfloor})$ (Klee [35]). Such is also the complexity of finding the nearest neighbor in the corresponding space. Other editing techniques, based on graphs, were proposed: Gabriel graphs (Gabriel and Sokal [25]) and relative neighborhood graphs (Toussaint [107]).
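Hart-style editing via the Voronoi diagram is expensive in high dimension; a common simplified stand-in (an assumption here, not the algorithm of [30]) drops training objects whose k nearest other objects all carry the same class, keeping only objects near class boundaries:

```python
def edit_training_set(labeled, k=3):
    """Drop objects whose k nearest other objects all carry the same class;
    such objects lie deep inside their class region and are redundant for 1-NN,
    while objects near a class boundary are kept."""
    d2 = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    kept = []
    for i, (x, label) in enumerate(labeled):
        others = [labeled[j] for j in range(len(labeled)) if j != i]
        neighbors = sorted(others, key=lambda ol: d2(x, ol[0]))[:k]
        if any(other_label != label for _, other_label in neighbors):
            kept.append((x, label))      # near a class boundary: keep it
    return kept
```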


Data-Mining and Knowledge Discovery: Case-Based Reasoning, Nearest Neighbor and Rough Sets

Future Directions
It seems reasonable to include the following among the problems that could occupy researchers' attention, regardless of the paradigm applied:
- Compression of knowledge encoded in the training set, in order to handle large volumes of data.
- Granulation of knowledge as a means of compression. This calls for efficient formal granulation models, based on similarity relations rather than on metrics, that preserve large parts of the information encoded in the original data.
- Search for the most critical decision/classification rules among the bulk induced from data. This search should lead to a deeper insight into relations in the data.
- The missing-values problem, important especially for medical and biological data; its solution should also reveal new, deeper relations and dependencies among attributes.
- Working out methods for analyzing complex data, such as molecular, genetic and medical data that may include signals and images.

Bibliography

Primary Literature
1. Aamodt A (1991) A knowledge intensive approach to problem solving and sustained learning. Dissertation, University of Trondheim, Norway. University Microfilms PUB 92–08460
2. Aamodt A, Plaza E (1994) Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications 7:39–59
3. Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional space. In: Proceedings of the eighth international conference on database theory, London, pp 420–434
4. Aha DW (1998) The omnipresence of case-based reasoning in science and applications. Knowl-Based Syst 11:261–273
5. Bayes T (1763) An essay towards solving a problem in the doctrine of chances. Philos Trans R Soc (London) 53:370–418

6. Bazan JG (1998) A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery, vol 1. Physica, Heidelberg, pp 321–365
7. Bazan JG et al (2000) Rough set algorithms in classification problems. In: Polkowski L, Tsumoto S, Lin TY (eds) Rough set methods and applications. New developments in knowledge discovery in information systems. Physica, Heidelberg, pp 49–88
8. Bentley JL (1975) Multidimensional binary search trees used for associative searching. Commun ACM 18:509–517
9. Berchtold S, Keim D, Kriegel HP (1996) The X-tree: an index structure for high dimensional data. In: Proceedings of the 22nd International Conference on Very Large Databases VLDB'96, Mumbai. Morgan Kaufmann, San Francisco, pp 29–36
10. Brin S (1995) Near neighbor search in large metric spaces. In: Proceedings of the 21st International Conference on Very Large Databases VLDB'95, Zurich. Morgan Kaufmann, San Francisco, pp 574–584
11. Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: Proceedings of the 23rd International Conference on Very Large Databases VLDB'97, Athens. Morgan Kaufmann, San Francisco, pp 426–435
12. Clark P, Evans F (1954) Distance to nearest neighbor as a measure of spatial relationships in populations. Ecology 35:445–453
13. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory IT-13(1):21–27
14. Czyżewski A et al (2004) Musical phrase representation and recognition by means of neural networks and rough sets. In: Transactions on rough sets, vol 1. Lecture Notes in Computer Science, vol 3100. Springer, Berlin, pp 254–278
15. Deja R (2000) Conflict analysis. In: Polkowski L, Tsumoto S, Lin TY (eds) Rough set methods and applications. New developments in knowledge discovery in information systems. Physica, Heidelberg, pp 491–520
16. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
17. Düntsch I, Gediga G (1998) GROBIAN. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 555–557
18. Faucett WM (1955) Compact semigroups irreducibly connected between two idempotents. Proc Am Math Soc 6:741–747
19. Fernandez-Baizan MC et al (1998) RSDM: Rough sets data miner. A system to add data mining capabilities to RDBMS. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 558–561
20. Finkel R, Bentley J (1974) Quad trees: a data structure for retrieval on composite keys. Acta Inf 4:1–9
21. Fix E, Hodges JL Jr (1951) Discriminatory analysis: Nonparametric discrimination: Consistency properties. USAF Sch Aviat Med 4:261–279
22. Fix E, Hodges JL Jr (1952) Discriminatory analysis: Nonparametric discrimination: Small sample performance. USAF Sch Aviat Med 11:280–322
23. Frege G (1903) Grundgesetze der Arithmetik II. Hermann Pohle, Jena


24. Fukunaga K, Narendra PM (1975) A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans Comput 24:750–753
25. Gabriel KR, Sokal RR (1969) A new statistical approach to geographic variation analysis. Syst Zool 18:259–278
26. Greco S, Matarazzo B, Słowiński R (1999) On joint use of indiscernibility, similarity and dominance in rough approximation of decision classes. In: Proceedings of the 5th international conference of the decision sciences institute, Athens, Greece, pp 1380–1382
27. Grzymala-Busse JW (1992) LERS – a system for learning from examples based on rough sets. In: Słowiński R (ed) Intelligent decision support. Handbook of advances and applications of the rough sets theory. Kluwer, Dordrecht, pp 3–18
28. Grzymala-Busse JW (2004) Data with missing attribute values: Generalization of indiscernibility relation and rule induction. In: Transactions on rough sets, vol 1. Lecture Notes in Computer Science, vol 3100. Springer, Berlin, pp 78–95
29. Grzymala-Busse JW, Ming H (2000) A comparison of several approaches to missing attribute values in data mining. In: Lecture Notes in AI, vol 2005. Springer, Berlin, pp 378–385
30. Hart PE (1968) The condensed nearest neighbor rule. IEEE Trans Inf Theory IT-14(3):515–516
31. Hastie T, Tibshirani R (1996) Discriminant adaptive nearest-neighbor classification. IEEE Trans Pattern Anal Mach Intell 18:607–616
32. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer, New York
33. Kalantari I, McDonald G (1983) A data structure and an algorithm for the nearest point problem. IEEE Trans Softw Eng 9:631–634
34. Katayama N, Satoh S (1997) The SR-tree: an index structure for high dimensional nearest neighbor queries. In: Proceedings of the 1997 ACM SIGMOD international conference on management of data, Tucson, AZ, pp 369–380
35. Klee V (1980) On the complexity of d-dimensional Voronoi diagrams. Arch Math 34:75–80
36. Klösgen W, Żytkow J (eds) (2002) Handbook of data mining and knowledge discovery. Oxford University Press, Oxford
37. Kolodner JL (1983) Maintaining organization in a dynamic long-term memory. Cogn Sci 7:243–280
38. Kolodner JL (1993) Case-based reasoning. Morgan Kaufmann, San Mateo
39. Komorowski J, Skowron A et al (1998) The ROSETTA software system. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 572–575
40. Kostek B (2007) The domain of acoustics seen from the rough set perspective. In: Transactions on rough sets, vol VI. Lecture Notes in Computer Science, vol 4374. Springer, Berlin, pp 133–151
41. Koton P (1989) Using experience in learning and problem solving. PhD Dissertation MIT/LCS/TR-441, MIT, Laboratory of Computer Science, Cambridge
42. Kowalczyk W (1998) TRANCE: A tool for rough data analysis, classification and clustering. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 566–568
43. Krawiec K et al (1998) Learning decision rules from similarity based rough approximations. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery, vol 2. Physica, Heidelberg, pp 37–54

44. Leśniewski S (1916) Podstawy Ogólnej Teoryi Mnogości (On the Foundations of Set Theory), in Polish. The Polish Scientific Circle, Moscow; see also a later digest: (1982) Topoi 2:7–52
45. Lin KI, Jagadish HV, Faloutsos C (1994) The TV-tree: an index structure for high dimensional data. VLDB J 3:517–542
46. Lin TY (1997) From rough sets and neighborhood systems to information granulation and computing with words. In: 5th European Congress on Intelligent Techniques and Soft Computing, Aachen. Verlagshaus Mainz, Aachen, pp 1602–1606
47. Lin TY (2005) Granular computing: Examples, intuitions, and modeling. In: Proceedings of IEEE 2005 conference on granular computing GrC05, Beijing, China. IEEE Press, New York, pp 40–44
48. Ling C-H (1965) Representation of associative functions. Publ Math Debrecen 12:189–212
49. Michalski RS et al (1986) The multi-purpose incremental learning system AQ15 and its testing to three medical domains. In: Proceedings of AAAI-86. Morgan Kaufmann, San Mateo, pp 1041–1045
50. Mostert PS, Shields AL (1957) On the structure of semigroups on a compact manifold with a boundary. Ann Math 65:117–143
51. Nguyen SH (1997) Discretization of real valued attributes: Boolean reasoning approach. PhD Dissertation, Warsaw University, Department of Mathematics, Computer Science and Mechanics
52. Nguyen SH, Skowron A (1995) Quantization of real valued attributes: Rough set and Boolean reasoning approach. In: Proceedings 2nd annual joint conference on information sciences, Wrightsville Beach, NC, pp 34–37
53. Nguyen SH (2000) Regularity analysis and its applications in data mining. In: Polkowski L, Tsumoto S, Lin TY (eds) Rough set methods and applications. New developments in knowledge discovery in information systems. Physica, Heidelberg, pp 289–378
54. Nguyen TT (2004) Handwritten digit recognition using adaptive classifier construction techniques. In: Pal SK, Polkowski L, Skowron A (eds) Rough – neural computing. Techniques for computing with words. Springer, Berlin, pp 573–586
55. Novotny M, Pawlak Z (1988) Partial dependency of attributes. Bull Pol Acad Ser Sci Math 36:453–458
56. Novotny M, Pawlak Z (1992) On a problem concerning dependence spaces. Fundam Inform 16:275–287
57. Pal SK, Dasgupta B, Mitra P (2004) Rough-SOM with fuzzy discretization. In: Pal SK, Polkowski L, Skowron A (eds) Rough – neural computing. Techniques for computing with words. Springer, Berlin, pp 351–372
58. Parzen E (1962) On estimation of a probability density function and mode. Ann Math Stat 33(3):1065–1076
59. Patrick EA, Fisher FP (1970) A generalized k-nearest neighbor rule. Inf Control 16(2):128–152
60. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11:341–356
61. Pawlak Z (1985) On rough dependency of attributes in information systems. Bull Pol Acad Ser Sci Tech 33:551–559
62. Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer, Dordrecht
63. Pawlak Z, Skowron A (1993) A rough set approach for decision rules generation. In: Proceedings of IJCAI'93 workshop W12: The management of uncertainty in AI; also: ICS Research Report 23/93, Warsaw University of Technology



64. Pawlak Z, Skowron A (1994) Rough membership functions. In: Yager RR, Fedrizzi M, Kacprzyk J (eds) Advances in the Dempster–Shafer theory of evidence. Wiley, New York, pp 251–271
65. Peters J, Ramanna S (2004) Approximation space for software models. In: Transactions on rough sets, vol I. Lecture Notes in Computer Science, vol 3100. Springer, Berlin, pp 338–355
66. Poincaré H (1902) La Science et l'Hypothèse. Flammarion, Paris
67. Polkowski L (2003) A rough set paradigm for unifying rough set theory and fuzzy set theory. In: Proceedings RSFDGrC03, Chongqing, China, 2003. Lecture Notes in AI, vol 2639. Springer, Berlin, pp 70–78; also: Fundam Inf 54:67–88
68. Polkowski L (2004) Toward rough set foundations. Mereological approach. In: Proceedings RSCTC04, Uppsala, Sweden. Lecture Notes in AI, vol 3066. Springer, Berlin, pp 8–25
69. Polkowski L (2005) Formal granular calculi based on rough inclusions. In: Proceedings of IEEE 2005 conference on granular computing GrC05, Beijing, China. IEEE Press, New York, pp 57–62
70. Polkowski L (2005) Rough-fuzzy-neurocomputing based on rough mereological calculus of granules. Int J Hybrid Intell Syst 2:91–108
71. Polkowski L (2006) A model of granular computing with applications. In: Proceedings of IEEE 2006 conference on granular computing GrC06, Atlanta, USA, May 10–12. IEEE Press, New York, pp 9–16
72. Polkowski L, Araszkiewicz B (2002) A rough set approach to estimating the game value and the Shapley value from data. Fundam Inf 53(3/4):335–343
73. Polkowski L, Artiemjew P (2007) On granular rough computing: Factoring classifiers through granular structures. In: Proceedings RSEISP'07, Warsaw. Lecture Notes in AI, vol 4585, pp 280–290
74. Polkowski L, Skowron A (1994) Rough mereology. In: Proceedings of ISMIS'94. Lecture Notes in AI, vol 869. Springer, Berlin, pp 85–94
75. Polkowski L, Skowron A (1997) Rough mereology: a new paradigm for approximate reasoning. Int J Approx Reason 15(4):333–365
76. Polkowski L, Skowron A (1999) Towards an adaptive calculus of granules. In: Zadeh LA, Kacprzyk J (eds) Computing with words in information/intelligent systems, vol 1. Physica, Heidelberg, pp 201–228
77. Polkowski L, Skowron A, Żytkow J (1994) Tolerance based rough sets. In: Lin TY, Wildberger M (eds) Soft computing: Rough sets, fuzzy logic, neural networks, uncertainty management, knowledge discovery. Simulation Councils Inc., San Diego, pp 55–58
78. Porter BW, Bareiss ER (1986) PROTOS: An experiment in knowledge acquisition for heuristic classification tasks. In: Proceedings of the first international meeting on advances in learning (IMAL), Les Arcs, France, pp 159–174
79. Preparata F, Shamos MI (1985) Computational geometry: an introduction. Springer, New York
80. Rauszer C (1985) An equivalence between indiscernibility relations in information systems and a fragment of intuitionistic logic. Bull Pol Acad Ser Sci Math 33:571–579
81. Ripley BD (1997) Pattern recognition and neural networks. Cambridge University Press, Cambridge
82. Skowron A et al (1994) A system for data analysis. http://logic.mimuw.edu.pl/~rses/

83. Schank RC (1982) Dynamic memory: A theory of reminding and learning in computers and people. Cambridge University Press, Cambridge
84. Schank RC, Abelson RP (1977) Scripts, plans, goals and understanding. Lawrence Erlbaum, Hillsdale
85. Semeniuk-Polkowska M (2007) On conjugate information systems: A proposition on how to learn concepts in humane sciences by means of rough set theory. In: Transactions on rough sets, vol VI. Lecture Notes in Computer Science, vol 4374. Springer, Berlin, pp 298–307
86. Simard P, Le Cun Y, Denker J (1993) Efficient pattern recognition using a new transformation distance. In: Hanson SJ, Cowan JD, Giles CL (eds) Advances in neural information processing systems, vol 5. Morgan Kaufmann, San Mateo, pp 50–58
87. Simpson RL (1985) A computer model of case-based reasoning in problem solving: An investigation in the domain of dispute mediation. Georgia Institute of Technology, Atlanta
88. Skellam JG (1952) Studies in statistical ecology, I, Spatial pattern. Biometrika 39:346–362
89. Skowron A (1993) Boolean reasoning for decision rules generation. In: Komorowski J, Ras Z (eds) Proceedings of ISMIS'93. Lecture Notes in AI, vol 689. Springer, Berlin, pp 295–305
90. Skowron A, Rauszer C (1992) The discernibility matrices and functions in decision systems. In: Słowiński R (ed) Intelligent decision support. Handbook of applications and advances of the rough sets theory. Kluwer, Dordrecht, pp 311–362
91. Skowron A, Stepaniuk J (1996) Tolerance approximation spaces. Fundam Inf 27:245–253
92. Skowron A, Stepaniuk J (2001) Information granules: towards foundations of granular computing. Int J Intell Syst 16:57–85
93. Skowron A, Swiniarski RW (2004) Information granulation and pattern recognition. In: Pal SK, Polkowski L, Skowron A (eds) Rough – neural computing. Techniques for computing with words. Springer, Berlin, pp 599–636
94. Slezak D (2000) Various approaches to reasoning with frequency based decision reducts: a survey. In: Polkowski L, Tsumoto S, Lin TY (eds) Rough set methods and applications. New developments in knowledge discovery in information systems. Physica, Heidelberg, pp 235–288
95. Słowiński R, Stefanowski J (1992) "RoughDAS" and "RoughClass" software implementations of the rough set approach. In: Słowiński R (ed) Intelligent decision support: Handbook of advances and applications of the rough sets theory. Kluwer, Dordrecht, pp 445–456
96. Słowiński R, Stefanowski J (1998) Rough family – software implementation of the rough set theory. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 580–586
97. Stanfill C, Waltz D (1986) Toward memory-based reasoning. Commun ACM 29:1213–1228
98. Mackie M (2006) Stanford encyclopedia of philosophy: Transworld identity. http://plato.stanford.edu/entries/identity-transworld Accessed 6 Sept 2008
99. Stefanowski J (1998) On rough set based approaches to induction of decision rules. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 1. Physica, Heidelberg, pp 500–529
100. Stefanowski J (2007) On combined classifiers, rule induction and rough sets. In: Transactions on rough sets, vol VI. Lecture Notes in Computer Science, vol 4374. Springer, Berlin, pp 329–350
101. Stepaniuk J (2000) Knowledge discovery by application of rough set models. In: Polkowski L, Tsumoto S, Lin TY (eds) Rough set methods and applications. New developments in knowledge discovery in information systems. Physica, Heidelberg, pp 138–233
102. Suraj Z (1998) TAS: Tools for analysis and synthesis of concurrent processes using rough set methods. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery, vol 2. Physica, Heidelberg, pp 587–590
103. Suraj Z (2000) Rough set methods for the synthesis and analysis of concurrent processes. In: Polkowski L, Tsumoto S, Lin TY (eds) Rough set methods and applications. New developments in knowledge discovery in information systems. Physica, Heidelberg, pp 379–490
104. Swiniarski RW (1998) RoughFuzzyLab: A system for data mining and rough and fuzzy sets based classification. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 591–593
105. Swiniarski RW, Skowron A (2004) Independent component analysis, principal component analysis and rough sets in face recognition. In: Transactions on rough sets, vol I. Lecture Notes in Computer Science, vol 3100. Springer, Berlin, pp 392–404
106. Sycara EP (1987) Resolving adversarial conflicts: An approach to integrating case-based and analytic methods. Georgia Institute of Technology, Atlanta
107. Toussaint GT (1980) The relative neighborhood graph of a finite planar set. Pattern Recognit 12(4):261–268
108. Tsumoto S (1998) PRIMEROSE. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 594–597
109. UCI Repository http://www.ics.uci.edu./mlearn/databases/ University of California, Irvine. Accessed 6 Sept 2008
110. Uhlmann J (1991) Satisfying general proximity/similarity queries with metric trees. Inf Process Lett 40:175–179
111. Veloso M (1994) Planning and learning by analogical reasoning. Springer, Berlin
112. Vitoria A (2005) A framework for reasoning with rough sets. In: Transactions on rough sets, vol IV. Lecture Notes in Computer Science, vol 3700. Springer, Berlin, pp 178–276
113. Ward J (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58:236–244
114. Watson I, Marir F (1994) Case-based reasoning: A review. http://www.ai-cbr.org/classroom/cbr-review.html Accessed 6 Sept 2008; see also: Watson I (1994) Knowl Eng Rev 9(4):327–354
115. White DA, Jain R (1996) Similarity indexing with the SS-tree. In: Proceedings of the twelfth international conference on data engineering, New Orleans, LA, pp 516–523
116. Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Artif Intell Res 6:1–34
117. Wittgenstein L (1953) Philosophical investigations. Blackwell, London

118. Wojdyłło P (2004) WaRS: A method for signal classification. In: Pal SK, Polkowski L, Skowron A (eds) Rough – neural computing. Techniques for computing with words. Springer, Berlin, pp 649–688
119. Wojna A (2005) Analogy-based reasoning in classifier construction. In: Transactions on rough sets, vol IV. Lecture Notes in Computer Science, vol 3700. Springer, Berlin, pp 277–374
120. Wróblewski J (1998) Covering with reducts – a fast algorithm for rule generation. In: Lecture Notes in Artificial Intelligence, vol 1424. Springer, Berlin, pp 402–407
121. Wróblewski J (2004) Adaptive aspects of combining approximation spaces. In: Pal SK, Polkowski L, Skowron A (eds) Rough – neural computing. Techniques for computing with words. Springer, Berlin, pp 139–156
122. Yao YY (2000) Granular computing: Basic issues and possible solutions. In: Proceedings of the 5th Joint Conference on Information Sciences I. Assoc Intell Machinery, Atlantic NJ, pp 186–189
123. Yao YY (2005) Perspectives of granular computing. In: Proceedings of IEEE 2005 Conference on Granular Computing GrC05, Beijing, China. IEEE Press, New York, pp 85–90
124. Zadeh LA (1979) Fuzzy sets and information granularity. In: Gupta M, Ragade R, Yager RR (eds) Advances in fuzzy set theory and applications. North-Holland, Amsterdam, pp 3–18
125. Zeeman EC (1965) The topology of the brain and the visual perception. In: Fort MK (ed) Topology of 3-manifolds and selected topics. Prentice Hall, Englewood Cliffs, pp 240–256
126. Ziarko W (1998) KDD-R: Rough set-based data mining system. In: Polkowski L, Skowron A (eds) Rough sets in knowledge discovery 2. Physica, Heidelberg, pp 598–601

Books and Reviews
Avis D, Bhattacharya BK (1983) Algorithms for computing d-dimensional Voronoi diagrams and their duals. In: Preparata FP (ed) Advances in computing research: Computational geometry. JAI Press, Greenwich, pp 159–180
Bocheński JM (1954) Die Zeitgenössischen Denkmethoden. A. Francke, Bern
Dasarathy BV (ed) (1991) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society, Washington
Friedman J (1994) Flexible metric nearest-neighbor classification. Technical Report, Stanford University
Polkowski L (2002) Rough sets. Mathematical foundations. Physica, Heidelberg
Russell SJ, Norvig P (2003) Artificial intelligence. A modern approach, 2nd edn. Prentice Hall Pearson Education, Upper Saddle River
Toussaint GT, Bhattacharya BV, Poulsen RS (1984) Application of Voronoi diagrams to nonparametric decision rules. In: Proceedings of Computer Science and Statistics: The Sixteenth Symposium on the Interface. North Holland, Amsterdam, pp 97–108
Watson I (1997) Applying case-based reasoning. Techniques for enterprise systems. Morgan Kaufmann, Elsevier, Amsterdam



Data-Mining and Knowledge Discovery, Introduction to
PETER KOKOL
Department of Computer Science, University of Maribor, Maribor, Slovenia

Data mining and knowledge discovery is the discipline of analyzing large amounts of data and picking out the relevant information, leading to a knowledge discovery process that extracts meaningful patterns, rules and models from raw data and makes the discovered patterns understandable. Applications include medicine, politics, games, business, marketing, bioinformatics and many other areas of science and engineering. It is an area of research activity that stands at the intellectual intersection of statistics, computer science, machine learning and database management. It deals with very large datasets, tries to make fewer theoretical assumptions than has traditionally been done in statistics, and typically focuses on problems of classification, prediction, description and profiling, clustering, and regression. In such domains, data mining often uses decision trees or neural networks as models and frequently fits them using some combination of techniques such as bagging, boosting/arcing, and racing. Data mining techniques include data visualization, neural network analysis, support vector machines, genetic and evolutionary algorithms, case-based reasoning, etc. Other activities in data mining focus on issues such as causation in large-scale systems; this effort often involves elaborate statistical models and, quite frequently, Bayesian methodology and related computational techniques. Data mining also covers the evaluation of the top-down approach to model building, which starts with an assumed mathematical model and solves dynamical equations, against the bottom-up approach of time series analysis, which takes measured data as input and provides as output a mathematical model of the system; their difference is termed measurement error.
Data mining allows the characterization of chaotic dynamics and involves Lyapunov exponents, fractal dimension and Kolmogorov–Sinai entropy. Data mining comes in two main directions: directed and undirected. Directed data mining tries to categorize or explain some particular target field, while undirected data mining attempts to find patterns or similarities among groups of records without a specific goal. Pedrycz (see Data and Dimensionality Reduction in Data Analysis and System Modeling) points out that data and dimensionality reduction are fundamental pursuits of data analysis and system modeling. With the rapid growth of the size of data sets and the diversity of data themselves, the use of some reduction mechanism becomes a necessity. Data reduction is concerned with reducing the sizes of data sets in terms of the number of data points. This helps reveal an underlying structure in the data by presenting a collection of the groups present in it. Given that the number of groups is very limited, clustering mechanisms become effective for data reduction. Dimensionality reduction is aimed at reducing the number of attributes (features) of the data, which leads to a typically small subset of features or brings the data from a highly dimensional feature space to a new one of far lower dimensionality. A joint reduction process involves both data and feature reduction. Brameier (see Data-Mining and Knowledge Discovery, Neural Networks in) describes neural networks, or more precisely artificial neural networks, as mathematical and computational models inspired by the way biological nervous systems process information. A neural network model consists of a large number of highly interconnected, simple processing nodes or units which operate in parallel and perform functions collectively, roughly as in biological neural networks. Artificial neural networks, like their biological counterparts, are adaptive systems that learn by example. Learning works by adapting free model parameters, i.e., the signal strength of the connections and the signal flow, to external information that is presented to the network. In other words, the information is stored in the weighted connections. Unless stated otherwise, the term "neural network" means "artificial neural network" in the following. In more technical terms, neural networks are nonlinear statistical data modeling tools. Neural networks are generally well suited to solving problem tasks that involve classification of (necessarily numeric) data vectors, pattern recognition and decision-making.
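The two reduction mechanisms Pedrycz distinguishes, grouping data points versus projecting attributes, can be sketched side by side. The following is only an illustrative NumPy sketch (the synthetic data, the single k-means-style assignment/update pass, and all names are assumptions of this sketch, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 points in 5-D that really vary along only 2 directions.
basis = rng.standard_normal((2, 5))
X = rng.standard_normal((200, 2)) @ basis + 0.01 * rng.standard_normal((200, 5))

# Dimensionality reduction: project onto the top-2 principal components.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
X2 = Xc @ Vt[:2].T          # same objects, far fewer attributes

# Data reduction: replace the 200 points by k group prototypes
# (one k-means-style assignment/update pass from data-point seeds).
k = 4
centers = X2[rng.choice(len(X2), k, replace=False)]
labels = np.argmin(((X2[:, None] - centers) ** 2).sum(-1), axis=1)
prototypes = np.array([X2[labels == j].mean(axis=0) for j in range(k)])

print(X2.shape)          # (200, 2): fewer attributes
print(prototypes.shape)  # (4, 2): fewer data points
```

A joint reduction, as mentioned above, would hand the projected data to the clustering step, which is exactly what this sketch does.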
The power and usefulness of neural networks have been demonstrated in numerous application areas, such as image processing, signal processing, biometric identification – including handwritten character, fingerprint, face, and speech recognition – robotic control, industrial engineering, and biomedicine. In many of these tasks neural networks outperform more traditional statistical or artificial intelligence techniques, or may even achieve human-like performance. The most valuable characteristics of neural networks are adaptability and tolerance to noisy or incomplete data. Another important advantage lies in solving problems that do not have an algorithmic solution, or for which an algorithmic solution is too complex or time-consuming to find. Brameier describes his chapter as providing a concise introduction to the two most popular neural network types


used in applications: back-propagation neural networks and self-organizing maps. The former learn high-dimensional non-linear functions from given input-output associations for solving classification and approximation (regression) problems. The latter are primarily used for data clustering and visualization and for revealing relationships between clusters. These models are discussed in the context of alternative learning algorithms and neural network architectures. Polkowski (see Data-Mining and Knowledge Discovery: Case-Based Reasoning, Nearest Neighbor and Rough Sets) describes rough set theory as a formal set of notions aimed at carrying out tasks of reasoning, in particular about the classification of objects, under conditions of uncertainty. Conditions of uncertainty are imposed by incompleteness, imprecision and ambiguity of knowledge. Applications of this technique can be found in many areas, such as satellite image analysis, plant ecology, forestry, conflict analysis, game theory, and cluster identification in medical diagnosis. Originally, the basic notion proposed was that of a knowledge base, understood as a collection of equivalence relations on a universe of objects; each relation induces on the set a partition into equivalence classes. Knowledge so encoded is meant to represent the classification ability. As objects for analysis and classification come most often in the form of data, the useful notion of an information system is commonly used in knowledge representation; the knowledge base in that case is defined as the collection of indiscernibility relations. Exact concepts are defined as unions of indiscernibility classes, whereas inexact concepts are only approximated from below (lower approximations) and from above (upper approximations) by exact ones. Each inexact concept is thus perceived as a pair of exact concepts between which it is sandwiched.
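The lower/upper approximation idea can be illustrated in a few lines. Below is a hypothetical sketch (the function names and the toy information system are invented for illustration): indiscernibility classes are computed from attribute-value vectors, and a concept X is bracketed between the union of classes contained in X and the union of classes intersecting X.

```python
def indiscernibility_classes(objects, attrs, value):
    """Group objects by their value vector on the chosen attributes."""
    classes = {}
    for u in objects:
        classes.setdefault(tuple(value(u, a) for a in attrs), set()).add(u)
    return list(classes.values())

def approximations(classes, X):
    # Lower: classes fully inside X; upper: classes meeting X at all.
    lower = set().union(*([c for c in classes if c <= X] or [set()]))
    upper = set().union(*([c for c in classes if c & X] or [set()]))
    return lower, upper

# Toy information system: 4 objects, one attribute "color".
table = {1: "red", 2: "red", 3: "blue", 4: "blue"}
classes = indiscernibility_classes(table, ["color"], lambda u, a: table[u])
lower, upper = approximations(classes, {1, 2, 3})   # an inexact concept
print(lower)   # {1, 2}: the only class fully inside {1, 2, 3}
print(upper)   # {1, 2, 3, 4}
```

Here {1, 2, 3} is not a union of indiscernibility classes, so it is an inexact concept sandwiched between the exact concepts {1, 2} and {1, 2, 3, 4}.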
Povalej, Verlic and Stiglig (see Discovery Systems) point out that a simple query for "discovery system" on the World Wide Web returns many different types of discovery systems: from knowledge discovery systems in databases, internet-based knowledge discovery, service discovery systems and resource discovery systems, to more specific ones such as drug discovery systems, gene discovery systems, discovery systems for personality profiling, and developmental discovery systems, among others. As illustrated, a variety of discovery systems can be found in many different research areas, but the authors focus on knowledge discovery and knowledge discovery systems from the computer science perspective. A decision tree in data mining or machine learning is a predictive model, or a mapping process from observations about an item to conclusions about its target value. More descriptive names for such tree models are classification tree (discrete outcome) and regression tree (continuous outcome). In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. Podgorelec and Zorman (see Decision Trees) point out that the term "decision trees" has been used for two different purposes: in decision analysis, as a decision support tool for modeling decisions and their possible consequences in order to select the best course of action in situations where one faces uncertainty; and in machine learning or data mining, as a predictive model, that is, a mapping from observations about an item to conclusions about its target value. Their article concentrates on the machine learning view. Orlov, Sipper and Hauptman (see Genetic and Evolutionary Algorithms and Programming: General Introduction and Application to Game Playing) cover genetic and evolutionary algorithms, a family of search algorithms inspired by the process of (Darwinian) evolution in Nature. Common to all the different family members is the notion of solving problems by evolving an initially random population of candidate solutions, through the application of operators inspired by natural genetics and natural selection, such as crossover and mutation, so that in time "fitter" (i.e., better) solutions emerge. The field, whose origins can be traced back to the 1950s and 1960s, has come into its own over the past two decades, proving successful in solving multitudinous problems from highly diverse domains including (to mention but a few): optimization, automatic programming, electronic-circuit design, telecommunications, networks, finance, economics, image analysis, signal processing, music, and art. Berkhin and Dhillon (see Knowledge Discovery: Clustering) observe that data found in scientific and business applications usually do not fit a particular parametrized probability distribution. In other words, the data are complex.
Knowledge discovery starts with an exploration of this complexity in order to find inconsistencies, artifacts, errors, etc. in the data. After the data are cleaned, they are usually still extremely complex. Descriptive data mining deals with comprehending and reducing this complexity. Clustering is a premier methodology in descriptive unsupervised data mining. A cluster could represent an important subset of the data, such as a galaxy in astronomical data or a segment of customers in marketing applications. Clustering is important as a fundamental technology to reduce data complexity and to find data patterns in an unsupervised fashion. It is universally used as a first technology of choice in data exploration. Džeroski, Panov and Zenko (see  Machine Learning, Ensemble Methods in) cover ensemble methods, which are
machine learning methods that construct a set of predictive models and combine their outputs into a single prediction. The purpose of combining several models is to achieve better predictive performance, and it has been shown in a number of cases that ensembles can be more accurate than single models. While some work on ensemble methods was already done in the 1970s, it was not until the 1990s and the introduction of methods such as bagging and boosting that ensemble methods started to be more widely used. Today, they represent a standard machine learning method that has to be considered whenever good predictive accuracy is demanded. Liu and Zhao (see  Manipulating Data and Dimension Reduction Methods: Feature Selection) cover feature selection, the study of algorithms for reducing the dimensionality of data for various purposes. One of the most common purposes is to improve machine learning performance. Other purposes include simplifying data description, streamlining data collection, improving comprehensibility of the learned models, and helping gain insight through learning. The objective of feature selection is to remove irrelevant and/or redundant features and retain only relevant features. Irrelevant features can be removed without affecting learning performance. Redundant features are a type of irrelevant feature. The distinction is that a redundant feature implies the co-presence of another feature; individually, each feature is relevant, but the removal of either one will not affect learning performance. As a plethora of data is generated by every possible means, with exponentially decreasing costs of data storage and computer processing power, data dimensionality increases on a scale beyond imagination, in cases ranging from transactional data to high-throughput data. In many fields, such as medicine, health care, Web search, and bioinformatics, it is imperative to reduce high dimensionality so that efficient data processing and meaningful data analysis can be conducted in order to mine nuggets from high-dimensional, massive data.
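The redundancy notion above can be made concrete with a small sketch. The following greedy, correlation-based filter is an illustrative construction; the function names, threshold, and toy data are assumptions for this example, not taken from the article:

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def drop_redundant(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with an
    already-kept feature (a greedy redundancy filter)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# f2 is a linear copy of f1 and hence redundant.
data = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],
    "f3": [5, 1, 4, 2, 3],
}
print(drop_redundant(data))  # -> ['f1', 'f3']
```

Here f2 is discarded because it is perfectly correlated with f1; in practice the threshold and the correlation measure are design choices.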

Data-Mining and Knowledge Discovery, Neural Networks in

Data-Mining and Knowledge Discovery, Neural Networks in MARKUS BRAMEIER Bioinformatics Research Center, University of Aarhus, Århus, Denmark Article Outline Glossary Definition of the Subject Introduction Neural Network Learning Feedforward Neural Networks Backpropagation Other Learning Rules Other Neural Network Architectures Self-organizing Maps Future Directions Bibliography Glossary Artificial neural network An artificial neural network is a system composed of many simple, but highly interconnected processing nodes or neurons which operate in parallel and collectively. It resembles biological nervous systems in two basic functions: (1) Experiential knowledge is acquired through a learning process and can be retrieved again later. (2) The knowledge is stored in the strength (weights) of the connections between the neurons. Artificial neuron An artificial neuron receives a number of inputs, which may be either external inputs to the neural network or outputs of other neurons. Each input connection is assigned a weight, similar to the synaptic efficacy of a biological neuron. The weighted sum of inputs is compared against an activation level (threshold) to determine the activation value of the neuron. Activation function The activation or transfer function transforms the weighted inputs of a neuron into an output signal. Activation functions often have a “squashing” effect. Common activation functions used in neural networks are: threshold, linear, sigmoid, hyperbolic, and Gaussian. Learning rule The learning rule describes the way a neural network is trained, i. e., how its free parameters undergo changes to fit the network to the training data. Feedforward network Feedforward neural networks are organized in one or more layers of processing units (neurons). In a feedforward neural network the signal
is allowed to flow one way only, i.e., from inputs to outputs. There are no feedback loops, i.e., the outputs of a layer do not affect its inputs. Feedback networks In feedback or recurrent networks, signals may flow in both directions. Feedback networks are dynamic in that they have a state that changes continuously until it reaches an equilibrium point.

Definition of the Subject

Neural networks (NNs) or, more precisely, artificial neural networks (ANNs) are mathematical and computational models that are inspired by the way biological nervous systems process information. A neural network model consists of a large number of highly interconnected, simple processing nodes or units which operate in parallel and perform functions collectively, roughly similar to biological neural networks. ANNs, like their biological counterpart, are adaptive systems that learn by example. Learning works by adapting free model parameters, i.e., the signal strength of the connections and the signal flow, to external information that is presented to the network. In other terms, the information is stored in the weighted connections. Unless stated otherwise, the term “neural network” means “artificial neural network” in the following. In more technical terms, neural networks are non-linear statistical data modeling tools. Neural networks are generally well suited for solving problem tasks that involve classification, pattern recognition and decision making. The power and usefulness of neural networks have been demonstrated in numerous application areas, like image processing, signal processing, biometric identification – including handwritten character, fingerprint, face, and speech recognition – robotic control, industrial engineering, and biomedicine. In many of these tasks neural networks outperform more traditional statistical or artificial intelligence techniques or may even achieve human-like performance.
The most valuable characteristics of neural networks are adaptability and tolerance to noisy or incomplete data. Another important advantage is in solving problems that do not have an algorithmic solution or for which an algorithmic solution is too complex or time-consuming to be found. The first neural network model was developed by McCulloch and Pitts in the 1940s [32]. In 1958 Rosenblatt [41] described the first learning algorithm for a single neuron, the perceptron model. After Rumelhart et al. [46] invented the popular backpropagation learning algorithm for multi-layer networks in 1986, the field of neural networks gained incredible popularity in the 1990s.
Neural networks are regarded as a method of machine learning, the largest subfield of artificial intelligence (AI). Conventional AI mainly focuses on the development of expert systems and the design of intelligent agents. Today neural networks also belong to the more recent field of computational intelligence (CI), which also includes evolutionary algorithms (EAs) and fuzzy logic.

Artificial neural networks copy only a small amount of the biological complexity by using a much smaller number of simpler neurons and connections. Nevertheless, artificial neural networks can perform remarkably complex tasks by applying a similar principle, i. e., the combination of simple and local processing units, each calculating a weighted sum of its inputs and sending out a signal if the sum exceeds a certain threshold.
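A single such unit can be sketched in a few lines. The weights and threshold below are illustrative values chosen so that the unit computes a logical AND; nothing here is prescribed by the article:

```python
def neuron(inputs, weights, threshold):
    """A McCulloch-Pitts-style unit: fire (output 1) if and only if
    the weighted input sum exceeds the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

# With weights (1, 1) and threshold 1.5 the unit computes AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, neuron((x1, x2), (1, 1), 1.5))
```

Lowering the threshold to 0.5 would turn the same unit into a logical OR, which matches the observation below that a single threshold unit can realize simple logic functions.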

Introduction

Herein I provide a concise introduction to the two most popular neural network types used in applications, backpropagation neural networks (BPNNs) and self-organizing maps (SOMs). The former learn high-dimensional non-linear functions from given input–output associations for solving classification and approximation (regression) problems. The latter are primarily used for data clustering and visualization and for revealing relationships between clusters. These models are discussed in the context of alternative learning algorithms and neural network architectures.

Biological Motivation

Artificial neural networks are inspired by biological nervous systems, which are highly distributed and interconnected networks. The human brain is principally composed of a very large number of relatively simple neurons (approx. 100 billion), each of which is connected to several thousand other neurons, on average. A neuron is a specialized cell that consists of the cell body (the soma), multiple spine-like extensions (the dendrites) and a single nerve fiber (the axon). The axon connects to the dendrites of another neuron via a synapse. When a neuron is activated, it transmits an electrical impulse (activation potential) along its axon. At the synapse the electric signal is transformed into a chemical signal, such that a certain number of neurotransmitters cross the synaptic gap to the postsynaptic neuron, where the chemical signal is converted back to an electrical signal to be transported along the dendrites. The dendrites receive signals from the axons of other neurons. One very important feature of neurons is that they react with a delay. A neuron combines the strengths (energies) of all received input signals and sends out its own signal (“fires”) only if the total signal strength exceeds a certain critical activation level. A synapse can either be excitatory or inhibitory.
Input signals from an excitatory synapse increase the activation level of the neuron while inputs from an inhibitory synapse reduce it. The strength of the input signals critically depends on modulations at the synapses. The brain learns basically by adjusting number and strength of the synaptic connections.

History and Overview

The history of artificial neural networks begins with a discrete mathematical model of a biological neural network developed by the pioneers McCulloch and Pitts in 1943 [32]. This model describes neurons as threshold logic units (TLUs) or binary decision units (BDNs) with multiple binary inputs and a single binary output. A neuron outputs 1 (is activated) if the sum of its unweighted inputs exceeds a certain specified threshold, otherwise it outputs 0. Each neuron can only represent simple logic functions like OR or AND, but any boolean function can be realized by combinations of such neurons. In 1958 Rosenblatt [41] extended the McCulloch–Pitts model to the perceptron model. This network was based on a unit called the perceptron, which produces an output depending on the weighted linear combination of its inputs. The weights are adapted by the perceptron learning rule. Another single-layer neural network that is based on the McCulloch–Pitts neuron is the ADALINE (ADAptive LINear Element), which was invented in 1960 by Widrow and Hoff [55] and employs a Least-Mean-Squares (LMS) learning rule. In 1969 Minsky and Papert [33] provided mathematical proofs that single-layer neural networks like the perceptron are incapable of representing functions which are linearly inseparable, including in particular the exclusive-or (XOR) function. This fundamental limitation caused research on neural networks to stagnate for many years, until it was found that a perceptron with more than one layer has far greater processing power. The backpropagation learning method was first described by Werbos in 1974 [53,54], and further developed for multi-layer neural networks by Rumelhart et al. in 1986 [46]. Backpropagation networks are by far the best known and most commonly used neural networks today. Recurrent auto-associative networks were first described independently by Anderson [2] and Kohonen [21] in 1977.
Invented in 1982, the Hopfield network [17] is a recurrent neural network in which all connections are symmetric. All neurons are both input and output neurons
and update their activation values asynchronously and independently from each other. For each new input the network converges dynamically to a new stable state. A Hopfield network may serve as an associative, i.e., content-addressable, memory. The Boltzmann machine by Ackley et al. [1] can be seen as an extension of the Hopfield network. It uses a stochastic instead of a deterministic update rule that simulates the physical principle of annealing. The Boltzmann machine is one of the first neural networks to demonstrate learning of an internal representation (hidden units). It was also in 1982 that Kohonen first published his self-organizing maps [22], a neural network model based on unsupervised learning and competitive learning. SOMs produce a low-dimensional representation (feature map) of high-dimensional input data while preserving their most important topological features. Significant progress was made in the 1990s in the field of neural networks, which attracted a great deal of attention both in research and in many application domains (see Sect. “Definition of the Subject”). Faster computers allowed more complex problems to be solved more efficiently. Hardware implementations of larger neural networks were realized on parallel computers or in neural network chips with multiple units working simultaneously.

Characteristics of Neural Networks

Interesting general properties of neural networks are that they (1) mimic the way the brain works, (2) are able to learn by experience, (3) make predictions without having to know the precise underlying model, (4) have a high fault tolerance, i.e., can still give the correct output for missing, noisy or partially correct inputs, and (5) can work with data they have never seen before, provided that the underlying distribution fits the training data.
Computation in neural networks is local and highly distributed throughout the network such that each node operates by itself, but tries to minimize the overall network (output) error in cooperation with other nodes. Working like a cluster of interconnected processing nodes, neural networks automatically distribute both the problem and the workload among the nodes in order to find and converge to a common solution. Actually, parallel processing was one of the original motivations behind the development of artificial neural networks. Neural networks have the topology of a directed graph. There are only one-way connections between nodes, just like in biological nervous systems. A two-way relationship requires two one-way connections. Many different network architectures are used, often with hundreds or thousands of adjustable parameters. Neural networks are typically organized in layers, each of which consists of a number of interconnected nodes (see Fig. 2 below). Data patterns are presented to the system via the input layer. This non-computing layer is connected to one or more hidden layers where the actual processing is done. The hidden layers then link to an output layer which combines the results of multiple processing units to produce the final response of the network. The network acts as a high-dimensional vector function, taking one vector as input and returning another vector as output. The modeled functions are general and complex enough to solve a large class of non-linear classification and estimation problems. No matter which problem domain a neural network is operating in, the input data always have to be encoded into numbers, which may be continuous or discrete. The high number of free parameters in a neural network and the high degree of collinearity between the neuron outputs render individual parameter settings (weight coefficients) meaningless and make the network largely uninterpretable. High-order interactions between neurons also do not permit simpler substructures in the model to be identified. Therefore, an extraction of the acquired knowledge from such black-box predictors to understand the underlying model is almost impossible.

Neural Network Learning

The human brain learns by practice and experience. The learned knowledge can change if more information is received. Another important element of learning is the ability to infer knowledge, i.e., to make assumptions based on what we know and to apply what we have learned in the past to similar problems and situations. One theory of the physiology of human learning by repetition is that repeated sequences of impulses strengthen connections between neurons and form memory paths.
To retrieve the learned information, nerve impulses follow these paths to the correct information. If we get out of practice, these paths may diminish over time and we forget what we have learned. The information stored in a neural network is contained in its free parameters. In general, the network architecture and connections are held constant and only the connection weights are variable during training. Once the numbers of (hidden) layers and units have been selected, the free parameters (weights) are set to fit the model or function represented by the network to the training data, following a certain training algorithm or learning rule.
A neural network learns, i.e., acquires knowledge, by adjusting the weights of connections between its neurons. This is also referred to as connectionist learning. Training occurs iteratively in multiple cycles during which the training examples are repeatedly presented to the network. During one epoch all data patterns pass through the network once. The three major learning paradigms are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning, in general, means learning by an external teacher using global information. The problem the network is supposed to solve is defined through a given set of training examples. The learning algorithm searches the solution space F, the class of possible functions, for a function f* ∈ F that matches this set of input–output associations (x, y) best. In other words, the mapping implied by the sample data has to be inferred. Training a neural network means determining a set of weights which minimizes its prediction error on the training set. The cost or error function E : F → R measures the error between the desired output values y and the predicted network outputs f(x) over all input vectors x. That means it calculates how far away the current state f of the network is from the optimal solution f*, with E(f*) ≤ E(f) for all f ∈ F. A neural network cannot perfectly learn a mapping if the input data does not contain enough information to derive the desired outputs. It may also not converge if there is not enough data available (see also below). Unsupervised learning uses no external teacher and only local information. It is distinguished from supervised learning by the fact that there is no a priori output. In unsupervised learning we are given some input data x, and the cost function to be minimized can be any function of x and the network output f(x). Unsupervised learning incorporates self-organization, i.e., it organizes the input data using only their inherent properties to reveal their emergent collective properties. A neural network learns offline if the learning phase and the operation (application) phase are separated. A neural network learns online if both happen at the same time. Usually, supervised learning is performed offline, whereas unsupervised learning is performed online. In reinforcement learning, neither inputs x nor outputs y are given explicitly; they are generated by the interactions of an agent within an environment. The agent performs an action y with costs c according to an observation x made in the environment. The aim is to discover a policy or plan for selecting actions that minimizes some measure of the expected total costs.
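For the supervised case, the cost function E can be illustrated with a small squared-error sketch; the sample data and the two candidate functions are fabricated for this example:

```python
def squared_error(f, samples):
    """Total squared error of a candidate function f over a
    training set of (input, target-output) pairs."""
    return sum((y - f(x)) ** 2 for x, y in samples)

# Two candidate models for the same data; training keeps
# the one with the lower cost.
samples = [(0, 0), (1, 2), (2, 4)]     # target mapping: y = 2x
f_good = lambda x: 2 * x
f_bad = lambda x: x + 1
print(squared_error(f_good, samples))  # -> 0
print(squared_error(f_bad, samples))   # -> 2
```

Here f_good plays the role of the optimal solution f*, since no candidate can achieve a lower cost on these samples.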

Overtraining and Generalization

The overall motivation and most desirable property of neural networks is their ability to generalize to new unknown data, i.e., to classify patterns correctly on which they have not been trained. Minimizing the network error on the training examples only does not automatically minimize the real error of the unknown underlying function. This important problem is called overfitting or overtraining. A regular distribution of training examples over the input data space is important. Generalization is reasonable only as long as the data inputs remain inside the range for which the network was trained. If the training set only included vectors from a certain part of the data space, predictions on other parts are random and likely wrong. Overtraining may also occur when the iterative training algorithm is run for too long, and when the network is too complex for the problem to be solved or for the available quantity of data. A larger neural network with more weights models a more complex function and invariably achieves a lower training error, but is prone to overfitting. A network with fewer weights, on the other hand, may not be sufficiently powerful to model the underlying function. A simple heuristic, called early stopping, helps to ensure that the network will generalize well to examples not in the training set. One solution is to check progress during training against an independent data set, the validation set. As training progresses, the training error naturally decreases monotonically and, provided training is minimizing the true error function, the validation error also decreases. However, if the validation error stops dropping or even starts to increase again, this is an indication that the network is starting to overfit the data, and training should be stopped. The weights that produced the minimum validation error are then used for the final model.
In this case of overtraining, the size of the network, i. e., the number of hidden units and/or hidden layers, may be decreased. Neural networks typically involve experimenting with a large number of different configurations, training each one a number of times while observing the validation error. A problem with repeated experimentation is that the validation set is actually part of the training process. One may just find a network by chance that happens to perform well on the validation set. It is therefore normal practice to reserve a third set of examples for testing the final model on this test set. In many cases a sufficient amount of data is not available, however. Then we have to get around this problem by
resampling techniques, like cross-validation. In principle, multiple experiments are conducted, each using a different division of the available data into training and validation sets. This should remove any sampling bias. For small data sets, where splitting the data would leave too few observations for training, leave-one-out validation may be used to determine when to stop training or the optimal network size. Machine-learning techniques like neural networks require both positive and negative training examples for solving classification problems. Because they minimize an overall error, the proportion of positive and negative examples in the training set is critical. Ideally, this proportion should be close to the (usually unknown) real distribution in the data space. Otherwise, the imbalance may bias the network’s decisions, making them more often wrong on unknown data.
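The early-stopping heuristic described above can be reduced to a small helper function; the validation-error sequence and the patience parameter below are illustrative assumptions:

```python
def early_stop(val_errors, patience=3):
    """Return the index of the epoch whose weights should be kept:
    the last validation-error minimum before the error fails to
    improve for `patience` consecutive epochs."""
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Validation error falls, then rises again: stop at its minimum.
errors = [0.9, 0.6, 0.4, 0.35, 0.37, 0.41, 0.5, 0.6]
print(early_stop(errors))  # -> 3
```

In a real training loop, `val_errors` would be produced epoch by epoch on the held-out validation set, and the weights saved at the returned epoch would form the final model.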

Feedforward Neural Networks

In feedforward neural networks the information is passed in only one direction (forward) from the inputs, through the hidden nodes (if any), to the output nodes. There are no connections backwards to neurons of upper layers, i.e., there are no feedback loops or cycles in the network. In feedforward networks with a single layer of weights, the inputs are directly connected to the output units (single-layer neural networks). Multi-layer feedforward networks use additional intermediate layers of hidden units. Neural networks with two or more processing layers may have far greater processing power than networks with only one layer. Single-layer neural networks are only capable of learning linearly separable patterns and functions.

The Perceptron

The most simple kind of feedforward neural network is the perceptron, which consists of a single pseudo input layer and one or more processing nodes in the output layer. All inputs are weighted and fed directly to the output neuron(s) (see Fig. 1). Each node calculates the sum of the products of weights and inputs. If this value is above some threshold (typically 0) the neuron takes the activated value 1, otherwise it outputs 0. Neurons with this kind of activation function are called threshold units. In the literature the term perceptron often refers to networks consisting of only one (output) neuron. More formally, the perceptron is a linear binary classifier that maps a binary n-dimensional input vector x ∈ {0, 1}^n to a binary output value f(w · x) ∈ {0, 1} calculated as

f(s) = 1 if s > T, and 0 otherwise,   (1)

Data-Mining and Knowledge Discovery, Neural Networks in, Figure 1 Principle structure of single-unit perceptron network

where s = w · x = Σ_{i=1}^{n} w_i x_i is the input sum and w is a vector of real-valued weights. The constant threshold T does not depend on any input value. In addition to the network topology, the learning rule is an important component of neural networks. Perceptrons can be trained by a simple learning algorithm, called the delta rule or perceptron learning rule. This realizes a simple stochastic gradient descent where the weights of the network are adjusted depending on the error between the predicted outputs of the network and the example outputs. The delta rule changes the weight vector such that the output error is minimized. McClelland and Rumelhart [30] proved that a neural network using the delta rule can learn associations whenever the inputs are linearly independent. All neurons of a perceptron share the same structure and learning algorithm. Each weight w_ij, representing the influence of input x_i on neuron j, is updated at time t according to the rule:

w_ij(t + 1) = w_ij(t) + Δw_ij   (2)

Δw_ij = α (o_i − y_i) x_ij .   (3)
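A minimal sketch of the delta rule of Eqs. (2) and (3), trained here on the linearly separable AND function; the learning rate, epoch count, and the bias-weight encoding of the threshold are illustrative choices, not values from the article:

```python
def train_perceptron(samples, alpha=0.2, epochs=20):
    """Single-unit perceptron trained with the delta rule:
    w_i <- w_i + alpha * (o - y) * x_i, with a bias weight
    standing in for the threshold T."""
    w = [0.0, 0.0, 0.0]                     # bias, w1, w2
    for _ in range(epochs):
        for x, o in samples:
            x = (1.0,) + x                  # prepend constant bias input
            s = sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if s > 0 else 0
            w = [wi + alpha * (o - y) * xi for wi, xi in zip(w, x)]
    return w

# Logical AND is linearly separable, so training converges.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_samples)
predict = lambda x: 1 if w[0] + w[1] * x[0] + w[2] * x[1] > 0 else 0
print([predict(x) for x, _ in and_samples])  # -> [0, 0, 0, 1]
```

Running the same loop on the XOR samples would never converge, which illustrates the limitation to linearly separable problems discussed below.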

The network learns by updating the weight vector after each iteration (training example) by an amount proportional to the difference between the given output o_i and the calculated output y_i = f(s_i). The learning rate α is a constant with 0 < α < 1 and regulates the learning speed. The training data set is linearly separable in n-dimensional data space if its two classes of vectors x can be separated by an (n − 1)-dimensional hyperplane. If the training examples are not linearly separable, the perceptron learning algorithm is not guaranteed to converge. Linear classifiers, like single-unit perceptrons, are only able to learn, i.e., perfectly classify, linearly separable patterns, because they can only implement a simple decision surface (a single hyperplane) [33]. The same is true for single-layer neural networks with more than one output unit. This makes these linear neural networks unable to learn, for example, the XOR function [33]. Nevertheless, a problem that is thought to be highly complex may still be solved as well by a linear network as by a more powerful (non-linear) neural network.

Single-Layer Neural Networks

In general, the state of a neuron is represented by its activation value. An activation or transfer function f calculates the activation value of a unit from the weighted sum s of its inputs. In the case of the perceptron, f is called a step or threshold function, with the activation value being 1 if the network sum is greater than a constant T, and 0 otherwise (see Eq. (1)). Another common form of non-linear activation function is the logistic or sigmoid function:

f(s) = 1 / (1 + e^{−s}) .   (4)

The sigmoid enables a neural network to compute a continuous output between 0 and 1 instead of a step function. With this choice, a single-layer network is identical to a logistic regression model. If the activation function is linear, i.e., the identity, then this is just a multiple linear regression and the output is proportional to the total weighted sum s.

Multi-Layer Neural Networks

The limitation that non-linearly separable functions cannot be represented by a single-layer network with fixed weights can be overcome by adding more layers. A multi-layer network is a feedforward network with two or more layers of computational units, interconnected such that the neurons’ outputs of one layer serve as inputs only to neurons of the directly subsequent layer (see Fig. 2). The input layer is not considered a real layer with processing neurons. The number of units in the input layer is determined by the problem, i.e., the dimension of the input data space. The number of output units also depends on the output encoding (see Subsect. “Application Issues”). By using hidden layers, the partitioning of the data space can be more effective. In principle, each hidden unit adds one hyperplane to divide the space and discriminate the solution. Only if the outputs of at least two neurons are combined in a third neuron is the XOR problem solvable.

Data-Mining and Knowledge Discovery, Neural Networks in, Figure 2 Principle structure of multi-layer feedforward neural network

Important issues in multi-layer NN design are, thus, the specification of the number of hidden layers and the number of units in these layers (see also Subsect. “Application Issues”). Both numbers determine the complexity of functions that can be modeled. There is no theoretical limitation on the number of hidden layers, but usually one or two are used. The universal approximation theorem for neural networks states that any continuous function that maps intervals of real numbers to an output interval of real numbers can be approximated arbitrarily closely by a multi-layer neural network with only one hidden layer and certain types of non-linear activation functions. This gives, however, no indication about how fast or how likely a solution is found. Networks with two hidden layers may work better for some problems. However, more than two hidden layers usually provide only marginal benefit compared to the significant increase in training time. Any multi-layer network with fixed weights and linear activation function is equivalent to a single-layer (linear) network: in the case of a two-layer linear system, for instance, let all input vectors to the first layer form a matrix X, and let W_1 and W_2 be the weight matrices of the two processing layers. Then the output Y_1 = W_1 · X of the first layer is input to the second layer, which produces the output Y_2 = W_2 · (W_1 · X) = (W_2 · W_1) · X. This is equivalent to a single-layer network with weight matrix W = W_2 · W_1. Only a multi-layer network that is non-linear can provide more computational power. In many applications these networks use a sigmoid function as non-linear activation function. This is the case at least for the hidden units. For the output layer, the sigmoid activation function is usually applied with classification problems, while a linear transfer function is applied with regression problems.
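The claimed equivalence of a two-layer linear network and a single-layer one is easy to check numerically; the weight matrices below are arbitrary illustrative values:

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

W1 = [[1, 2], [3, 4]]   # first-layer weight matrix
W2 = [[0, 1], [1, 1]]   # second-layer weight matrix
X = [[5], [6]]          # one input vector as a column

two_layers = matmul(W2, matmul(W1, X))   # Y2 = W2 (W1 X)
one_layer = matmul(matmul(W2, W1), X)    # Y2 = (W2 W1) X
print(two_layers == one_layer)  # -> True
```

The equality holds for any choice of matrices, by the associativity of matrix multiplication; adding a non-linear activation between the layers breaks it.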

Data-Mining and Knowledge Discovery, Neural Networks in

Error Function and Error Surface

The error function derives the overall network error from the differences between the network's outputs y_ij and the target outputs o_ij over all examples i and output units j. The most common error function is the sum-squared error:

E = (1/2) Σ_i Σ_j (o_ij − y_ij)²    (5)

Neural network training performs a search within the space of solutions, i. e., all possible network configurations, towards a global minimum of the error surface. The global minimum is the best overall solution with the lowest possible error. A helpful concept for understanding NN training is the error surface: the n weight parameters of the network model form the n dimensions of the search space, and for any possible configuration of weights the error is plotted in the (n + 1)th dimension. The objective of training is to find the lowest point on this n-dimensional surface. The error surface is seldom smooth. Indeed, for most problems the surface is quite rugged, with numerous hills and valleys that may cause the search to run into a local minimum, i. e., a suboptimal solution. The speed of learning is the rate of convergence between the current solution and the global minimum. In a linear network with a sum-squared error function, the error surface is a multi-dimensional parabola, i. e., it has only one minimum. In general, it is not possible to determine analytically where the global minimum of the error surface lies. Training is essentially an exploration of the error surface. Because of the probabilistic and often highly non-linear modeling by neural networks, we cannot be sure that the error could not be lower still, i. e., that the minimum we found is the absolute one. Since the shape of the error surface cannot be known a priori, neural network analysis requires a number of independent runs to determine the best solution. When different initial values for the weights are selected, different network models will be derived. From an initially random configuration of the network, i. e., a random point on the error surface, the training algorithm starts seeking the global minimum. Small random values are typically used to initialize the network weights. Although neural networks resulting from different initial weights may have very different parameter settings, their prediction errors usually do not vary dramatically. Training is stopped when a maximum number of epochs has expired or when the network error does not improve any further.

Backpropagation

The best-known and most popular training algorithm for multi-layer networks is backpropagation, short for backwards error propagation and also referred to as the generalized delta rule [46]. The algorithm involves two phases:

Forward pass. During the first phase, the free parameters (weights) of the network are fixed. An example pattern is presented to the network, and the input signals are propagated through the network layers to calculate the network output at the output unit(s).

Backward pass. During the second phase, the model parameters are adjusted. The error signals at the output units, i. e., the differences between calculated and expected outputs, are propagated back through the network layers. In doing so, the error at each processing unit is calculated and used to adjust its connecting weights such that the overall error of the network is reduced by some small amount.

After iteratively repeating both phases for a sufficiently large number of training cycles (epochs), the network converges to a state where its output error is small enough. The backpropagation rule involves the repeated use of the chain rule: the output error of a neuron can be ascribed partly to errors in the weights of its direct inputs and partly to errors in the outputs of higher-level (hidden) nodes [46]. Moreover, backpropagation learning may happen in two different modes. In sequential mode or online mode, weight adjustments are made example by example, i. e., each time an example pattern has been presented to the network. In batch mode or offline mode, adjustments are made epoch by epoch, i. e., only after all example patterns have been presented. Theoretically, the backpropagation algorithm performs gradient descent on the total error only if the weights are updated epoch-wise. There are empirical indications, however, that a pattern-wise update results in faster convergence. The training examples should then be presented in random order, so that the precision of predictions becomes more similar over all inputs.

Backpropagation learning requires a differentiable activation function. Besides adding non-linearity to multi-layer networks, the sigmoid activation function (see Eq. (4)) is often used in backpropagation networks because it has a continuous derivative that can be calculated easily:

f′(s) = f(s) (1 − f(s))    (6)

We further assume that there is only one hidden layer in order to keep the notation and equations clear. A generalization to networks with more than one hidden layer is straightforward.
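The sigmoid activation, its derivative from Eq. (6), and the sum-squared error of Eq. (5) can be written down directly. A NumPy sketch, with the derivative checked against a numerical central difference:

```python
import numpy as np

def sigmoid(s):
    # Eq. (4): f(s) = 1 / (1 + exp(-s))
    return 1.0 / (1.0 + np.exp(-s))

def sigmoid_prime(s):
    # Eq. (6): f'(s) = f(s) * (1 - f(s))
    f = sigmoid(s)
    return f * (1.0 - f)

def sse(o, y):
    # Eq. (5): sum-squared error over targets o and outputs y
    return 0.5 * np.sum((o - y) ** 2)

# Verify Eq. (6) against a numerical derivative at a few points.
s = np.array([-2.0, 0.0, 0.5, 3.0])
h = 1e-6
numeric = (sigmoid(s + h) - sigmoid(s - h)) / (2.0 * h)
assert np.allclose(sigmoid_prime(s), numeric, atol=1e-8)
```

The derivative is largest at s = 0 (where f′ = 0.25) and approaches zero for large |s|, which is why saturated sigmoid units learn slowly.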


The backpropagation rule is a generalization of the delta learning rule (see Eq. (3)) to multi-layer networks with non-linear activation functions. For an input vector x, the output y = f(s) is calculated at each output neuron of the network and compared with the desired target output o, resulting in an error δ. Each weight is adjusted proportionally to its effect on the error. The weight of a connection between a unit i and a unit j is updated depending on the output of i (as input to j) and the error signal at j:

Δw_ij = α δ_j y_i    (7)

For an output node j the error signal (error-surface gradient) is given by:

δ_j = (o_j − y_j) f′(s_j) = (o_j − y_j) y_j (1 − y_j)    (8)

If the error is zero, no changes are made to the connection weight. The larger the absolute error, the more the responsible weight is changed, while the sign of the error determines the direction of change. For a hidden neuron j the error signal is calculated recursively from the signals of all directly connected output neurons k:

δ_j = f′(s_j) Σ_k δ_k w_jk = y_j (1 − y_j) Σ_k δ_k w_jk    (9)

The partial derivative of the error function with respect to the network weights can thus be calculated purely locally, such that each neuron needs information only from neurons directly connected to it. A theoretical foundation of the backpropagation algorithm can be found in [31].

The backpropagation algorithm performs a gradient descent by calculating the gradient vector of the error surface at the current search point. This vector points in the direction of steepest descent. Moving in this direction will decrease the error and will eventually find a new (local) minimum, provided that the step size is adapted appropriately. Small steps slow down learning, i. e., require a larger number of iterations. Large steps may converge faster, but may also overstep the solution or make the algorithm oscillate around a minimum without convergence of the weights. Therefore, the step size is made proportional to the slope δ, i. e., it is reduced when the search point approaches a minimum, and to the learning rate α. The constant α controls the size of the gradient-descent step and is usually set between 0.1 and 0.5. For practical purposes, it is recommended to choose the learning rate as large as possible without causing oscillation.
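The update rules of Eqs. (7)–(9) suffice to train a one-hidden-layer network with pattern-wise updates. A NumPy sketch on the XOR problem discussed earlier (network size, seed, and learning rate are illustrative choices):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# XOR input patterns and target outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
O = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden (4 hidden units)
b1 = np.zeros(4)
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output
b2 = np.zeros(1)
alpha = 0.5                               # learning rate

for epoch in range(10000):                # pattern-wise (online) updates
    for x, o in zip(X, O):
        yh = sigmoid(x @ W1 + b1)         # forward pass: hidden outputs
        y = sigmoid(yh @ W2 + b2)         # forward pass: network output
        delta_out = (o - y) * y * (1 - y)              # Eq. (8)
        delta_hid = yh * (1 - yh) * (W2 @ delta_out)   # Eq. (9)
        W2 += alpha * np.outer(yh, delta_out)          # Eq. (7)
        b2 += alpha * delta_out
        W1 += alpha * np.outer(x, delta_hid)
        b1 += alpha * delta_hid

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
err = 0.5 * np.sum((O - pred) ** 2)       # Eq. (5); typically small after training
```

Note that the hidden-layer error signal in Eq. (9) is computed from the output-layer signals, i. e., the errors are literally propagated backwards through the network.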

Momentum

One possibility to avoid oscillation and to achieve faster convergence is the addition of a momentum term that is proportional to the previous weight change:

Δw_ij(t + 1) = α δ_j y_i + β Δw_ij(t)    (10)

The momentum term increases the step size if the algorithm has taken several steps in the same direction. This gives it the ability to overcome obstacles in the error surface, e. g., to avoid and escape from local minima, and to move faster over larger plateaus. Finding the optimal learning rate α and momentum scale parameter β, i. e., the best trade-off between longer training time and instability, can be difficult and might require many experiments. Global or local adaptation techniques use, for instance, the partial derivative to adapt the learning rate automatically. Examples are the Delta-Bar-Delta rule [18] and the SuperSAB algorithm [51].
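The effect of the momentum term in Eq. (10) can be seen on a toy problem. A sketch comparing gradient descent with and without momentum on the quadratic E(w) = w²/2 (the learning rate and momentum values are illustrative):

```python
# Gradient descent on E(w) = w**2 / 2 (gradient = w), with and without
# a momentum term as in Eq. (10): dw(t+1) = -alpha * grad + beta * dw(t).
alpha = 0.01   # deliberately small learning rate (shallow steps)

def descend(beta, steps=100, w=5.0):
    dw = 0.0
    for _ in range(steps):
        dw = -alpha * w + beta * dw   # momentum accumulates past steps
        w += dw
    return abs(w)

plain = descend(beta=0.0)      # small steps only
momentum = descend(beta=0.9)   # momentum speeds up motion along the slope
```

With this small learning rate, the run with momentum ends much closer to the minimum at w = 0 than the plain run, illustrating the faster traversal of flat regions.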

Other Learning Rules

The backpropagation learning algorithm is computationally efficient in that its time complexity is linear in the number of weight parameters. Measured in epochs, however, its learning speed is comparatively low. This may result in long training times, especially for difficult and complex problems requiring larger networks or larger amounts of training data. Another major limitation is that backpropagation does not always converge. Still, it is a widely used algorithm and has its advantages: it is relatively easy to apply and to configure, and it provides a quick, though not absolutely perfect, solution. Its usually pattern-wise error adjustment is hardly affected by data that contains a larger number of redundant examples. On small data sets, e. g., when there is insufficient information to find a more precise solution, standard backpropagation also generalizes as well as more advanced algorithms. There are many variations of the backpropagation algorithm, such as resilient propagation (Rprop) [42], quick propagation (Quickprop) [13], conjugate gradient descent [6], Levenberg–Marquardt [16], and Delta-Bar-Delta [18], to mention the most popular. All these second-order algorithms are designed to deal with some of the limitations of the standard approach. Some work substantially faster in many problem domains, but require more control parameters than backpropagation, which makes them more difficult to use.


Resilient Propagation

Resilient propagation (Rprop), as proposed in [42,43], is a variant of standard backpropagation with very robust control parameters that are easy to adjust. The algorithm converges faster than the standard algorithm without being less accurate. The size of the weight step Δw_ij taken by standard backpropagation depends not only on the learning rate α, but also on the size of the partial derivative (see Eq. (7)). This may have an unpredictable influence during training that is difficult to control. Therefore, Rprop uses only the sign of the derivative to adjust the weights. It necessarily requires learning by epoch, i. e., all adjustments take place only after each epoch. One iteration of the Rprop algorithm involves two steps, the adjustment of the step size and the update of the weights. The amount of weight change is found by the following update rule:

Δ_ij(t) = η⁺ · Δ_ij(t − 1)   if d_ij(t − 1) · d_ij(t) > 0
Δ_ij(t) = η⁻ · Δ_ij(t − 1)   if d_ij(t − 1) · d_ij(t) < 0
Δ_ij(t) = Δ_ij(t − 1)        otherwise    (11)

with 0 < η⁻ < 1 < η⁺ and derivative d_ij = δ_j y_i. Every time t the derivative term changes its sign, indicating that the last update (at time t − 1) was too large and the algorithm has jumped over a local minimum, the update value (step size) Δ_ij(t − 1) is decreased by the constant factor η⁻. The rule for updating the weights is straightforward:

w_ij(t + 1) = w_ij(t) + Δw_ij(t)    (12)

Δw_ij(t) = −Δ_ij(t)   if d_ij(t) > 0
Δw_ij(t) = +Δ_ij(t)   if d_ij(t) < 0
Δw_ij(t) = 0          otherwise    (13)
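The sign-based scheme of Eqs. (11)–(13) can be sketched per weight. A Python sketch minimizing a simple quadratic, using the standard parameter values quoted below (variable names are illustrative):

```python
# Rprop step for a single weight, following Eqs. (11)-(13).
eta_minus, eta_plus = 0.5, 1.2    # standard decrease / increase factors
delta_min, delta_max = 1e-6, 50.0 # bounds on the step size

def rprop_step(w, grad, grad_prev, step):
    """Adjust the step size from the gradient's sign history, then update w."""
    if grad_prev * grad > 0:            # same sign: accelerate (Eq. 11, eta+)
        step = min(step * eta_plus, delta_max)
    elif grad_prev * grad < 0:          # sign change: jumped over a minimum (eta-)
        step = max(step * eta_minus, delta_min)
    if grad > 0:                        # Eq. (13): move against the gradient sign
        w -= step
    elif grad < 0:
        w += step
    return w, step

# Minimize E(w) = w**2 / 2 (gradient = w), starting from w = 4.
w, step, grad_prev = 4.0, 0.1, 0.0
for _ in range(50):
    grad = w
    w, step = rprop_step(w, grad, grad_prev, step)
    grad_prev = grad
```

The step size grows geometrically while the gradient keeps its sign and is halved after every overshoot, so the weight oscillates into the minimum without any tuning of a global learning rate.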

One advantage of the Rprop algorithm over, for example, Quickprop [13] is its small set of parameters, which hardly requires adaptation. Standard values for the decrease factor η⁻ and the increase factor η⁺ are 0.5 and 1.2, respectively. To avoid too large or too small weight changes, the step size is bounded above by Δ_max and below by Δ_min, set by default to 50 and 10⁻⁶. The same initial value Δ_0 = 0.1 is recommended for all Δ_ij. The choice of the parameter settings is not critical; for most problems no other choice is needed to obtain an optimal or at least nearly optimal solution.

Application Issues

The architecture of a neural network, i. e., the number of (hidden) neurons and layers, is an important decision. If

a neural network is highly redundant and over-parameterized, it might adapt too closely to the data. Thus, there is a trade-off between reducing bias (fitting the training data) and reducing variance (fitting unknown data). The most common procedure is to select a network structure that has more than enough parameters and neurons and then to avoid overfitting through the training procedure alone (see Subsect. "Overtraining and Generalization"). There is no generally best network structure for a particular type of application, only general rules for selecting the architecture: (1) The more complex the relationships between input and output data are, the higher the number of hidden units should be. (2) If the modeled process is separable into multiple stages, more than one hidden layer may be beneficial. (3) An upper bound for the total number of hidden units may be set by the number of data examples divided by the number of input and output units, multiplied by a scaling factor. A simpler rule is to start with one hidden layer and half as many hidden units as there are input and output units. One would expect that for a given data set there is an optimal network size, lying between a minimum of one hidden neuron (high bias, low variance) and a very large number of neurons (low bias, high variance). While this is true for some data sets, in many cases increasing the number of hidden nodes continues to improve prediction accuracy, as long as cross-validation is used to stop training in time.

For classification problems, the neural network assigns to each input case a class label or, more generally, estimates the probability of the case falling into each class. The output classes of a problem are normally represented using one of two techniques: binary encoding and one-out-of-n encoding. A binary encoding is only possible for two-class problems. A single output unit predicts class 1 if its output is above the acceptance threshold and class 0 if its output is below the rejection threshold; otherwise, the output class is undecided. If the network output is to be always defined, both threshold values must be equal (e. g. 0.5). In one-out-of-n encoding, one unit is allocated for each class. A class is selected if the corresponding output is above the acceptance threshold and all other outputs are below the rejection threshold; if this condition is not met, the class is undecided. Alternatively, instead of using thresholds, a winner-takes-all decision may be made such that the unit with the highest output determines the class.

For regression problems, the objective is to estimate the value of a continuous output variable, given the input variables. Particularly important issues in regression are output scaling and interpolation. The most common NN


architectures produce outputs in a limited range. Scaling algorithms may be applied to the training data to ensure that the target outputs lie in the same range. Constraining the network's outputs, however, limits its generalization performance. To overcome this, a linear activation function may be used for the output units. Then there is often no need for output scaling at all, since the units can in principle produce any value.

Other Neural Network Architectures

This section summarizes some alternative NN architectures that are variants or extensions of multi-layer feedforward networks.

Cascade Correlation Networks

Cascade correlation is a neural network architecture with variable size and topology [14]. The initial network has no hidden layer and grows during training by adding new hidden units one at a time. In doing so, a near-minimal network topology is built. In the cascade architecture, the outputs of all existing hidden neurons are fed into each new neuron. In addition, all neurons – including the output neurons – receive all input values. For each new hidden unit, the learning algorithm tries to maximize the correlation between the unit's output and the overall network error using an ordinary learning algorithm, like, e. g., backpropagation. After that, the input-side weights of the new neuron are frozen: it does not change anymore and becomes a permanent feature detector. Cascade correlation networks have several advantages over multi-layer perceptrons: (1) Training time is much shorter, if only because the network size is relatively small. (2) They require little or no adjustment of parameters, especially not in terms of the number of hidden neurons to use. (3) They are more robust, and training is less likely to become stuck in local minima.

Recurrent Networks

A network architecture with cycles is adopted by recurrent or feedback neural networks, in which the outputs of some neurons are fed back as extra inputs. Because past outputs are used to calculate future outputs, the network is said to "remember" its previous state. Recurrent networks are designed to process sequential information, like time-series data. Processing depends on the state of the network at the previous time step; consequently, the response to the current input depends on previous inputs.

Two similar types of recurrent network are extensions of the multi-layer perceptron: Jordan networks [19] feed back all network outputs into the input layer; Elman networks [12] feed back from the hidden units. State or context units are added to the input layer for the feedback connections, which all have constant weight one. At each time step t, an input vector is propagated in a standard feedforward fashion, and then a learning rule (usually backpropagation) is applied. The extra units always maintain a copy of the previous outputs at time step t − 1.

Radial Basis Function Networks

Radial basis function (RBF) networks [8,34,37,38] are another popular variant of two-layer feedforward neural networks, using radial basis functions as activation functions. The idea behind radial basis functions is to approximate the unknown function f(x) by a weighted sum of non-linear basis functions φ, which are often Gaussian functions with a certain standard deviation σ:

f(x) = Σ_i w_i φ(||x − c_i||)    (14)

The basis functions operate on the Euclidean distance between the n-dimensional input vector x and a center vector c_i. Once the center vectors c_i are fixed, the weight coefficients w_i are found by simple linear regression.

The architecture of RBF networks is fixed to two processing layers. Each unit in the hidden layer represents a center vector and a basis function, which realizes a non-linear transformation of the inputs. Each output unit calculates a weighted sum (linear combination) of the non-linear outputs of the hidden layer. Only the connections between the hidden layer and the output layer are weighted. The use of a linear output layer in RBF networks is motivated by Cover's theorem on the separability of patterns. The theorem states that if the transformation from the data (input) space to the feature (hidden) space is non-linear and the dimensionality of the feature space is relatively high compared to that of the data space, then there is a high likelihood that a non-separable pattern classification task in the input space is transformed into a linearly separable one in the feature space.

The center vectors are selected from the training data, either randomly or uniformly distributed over the input space. In principle, as many centers (and hidden units) may be used as there are data examples. Another method is to group the data in space using, for example, k-means clustering, and to select center vectors close to the cluster centers.
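The two stages described above – a fixed non-linear hidden layer followed by linear regression for the output weights – can be sketched on a toy fit of sin(x) (centers, σ, and data are illustrative choices, not from the article):

```python
import numpy as np

# Approximate y = sin(x) on [0, 2*pi] with Eq. (14): Gaussian basis
# functions on the distance to fixed centers, then linear regression.
X = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(X)

centers = np.linspace(0.0, 2.0 * np.pi, 10)  # uniformly placed centers
sigma = 0.7                                  # width of the Gaussians

# Hidden-layer output: one Gaussian feature per center (shape 100 x 10).
H = np.exp(-((X[:, None] - centers[None, :]) ** 2) / (2.0 * sigma ** 2))

# Output weights by linear least squares - the "simple linear regression"
# step, since only the hidden-to-output connections are weighted.
w, *_ = np.linalg.lstsq(H, y, rcond=None)
pred = H @ w
```

Refitting only the linear output weights is what makes RBF networks attractive for online settings: new data require a fast least-squares solve, not a full gradient-based retraining.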


RBF learning is considered a curve-fitting problem in high-dimensional space, i. e., it approximates a surface with the basis functions that best fits and interpolates the training data points. The basis functions are well suited to online learning applications, like adaptive process control: adapting the network to new data and changing data statistics only requires retraining by linear regression, which is fast. RBF networks are more local approximators than multi-layer perceptrons; new training data from one region of the input space have less effect on the learned model and its predictions in other regions.

Self-organizing Maps

A self-organizing map (SOM) or Kohonen map [22,23] applies an unsupervised and competitive learning scheme. That means that the class labels of the data vectors are unknown or not used for training and that each neuron improves through competition with other neurons. It is a non-deterministic machine-learning approach to data clustering that implements a mapping of the high-dimensional input data into a low-dimensional feature space. In doing so, SOMs filter and compress information while preserving the most relevant features of a data set. Complex, non-linear relationships and dependencies between data vectors and between clusters are revealed and transformed into simple geometric distances. Such an abstraction facilitates both the visualization and the interpretation of the clustering result.

Typically, the units of a SOM network are arranged in a two-dimensional regular grid, the topological feature map, which defines a two-dimensional Euclidean distance between units. Each unit is assigned a center vector from the n-dimensional data space and represents a certain cluster. Algorithm 1 describes the basic principle behind SOM training. Starting with an initially random set of center vectors, the algorithm iteratively adjusts them to reflect the clustering of the training data. In doing so, the two-dimensional order of the SOM units is imposed on the input vectors such that more similar clusters (center vectors) in the input space are closer to each other on the two-dimensional grid structure than more different clusters. One can think of the topological map as being folded and distorted into the n-dimensional input space, so as to preserve as much as possible of the original structure of the data.

Algorithm 1 (Self-organizing Map)
1. Initialize the n-dimensional center vector c_i ∈ R^n of each cluster randomly.
2. For each data point p ∈ R^n find the nearest center vector c_w (the "winner") in n-dimensional space according to a distance metric d.
3. Move c_w and all centers c_i within its local neighborhood closer to p:
   c_i(t + 1) = c_i(t) + α(t) · h(r(t)) · (p − c_i(t))
   with learning rate 0 < α < 1 and neighborhood function h depending on a neighborhood radius r.
4. α(t + 1) = α(t) − Δα, where Δα = α_0/t_max;
   r(t + 1) = r(t) − Δr, where Δr = r_0/t_max.
5. Repeat steps 2–4 for each epoch t = 1, …, t_max.
6. Assign each data point to the cluster with the nearest center vector.

Each iteration involves randomly selecting a data point p and moving the closest center vector a bit in the direction of p. Only the distance metric d (usually Euclidean) defined on the data space influences the selection of the closest cluster. The adjustment of centers is applied not just to the winning neuron, but to all neurons of its neighborhood. The neighborhood function is often Gaussian; a simple definition sets h = 1 if the Euclidean distance between the grid coordinates of the winning cluster w and a cluster i is below the radius r, i. e., ||(x_w, y_w) − (x_i, y_i)|| < r, and h = 0 otherwise. Both the learning rate α and the neighborhood radius r decrease monotonically over time. Initially, quite large areas of the network are affected by the neighborhood update, leading to a rather rough topological order. As epochs pass, fewer neurons are altered with lower intensity, and finer distinctions are drawn within areas of the map.

Unlike hierarchical clustering [52] and k-means clustering [29], which are both deterministic – apart from the randomized initialization in k-means clustering – and operate only locally, SOMs are less likely to become stuck in local minima and have higher robustness and accuracy. Once the network has been trained to recognize structures in the data, it can be used as a visualization tool and for exploratory data analysis. The example map in Fig. 3 shows a clustering of time series of gene expression values [48]; clearly higher similarities between neighboring clusters are revealed when comparing the mean vectors. If the neurons in the feature map can be labeled, i. e., if a common meaning can be derived from the vectors in a cluster, the network becomes capable of classification.
If the winning neuron of an unknown input case has not been assigned a class label, labels of clusters in close or direct neighborhood may be considered. Ideally, higher similarities between neighboring data clusters are reflected in similar class labels. Alternatively, the network output is


Data-Mining and Knowledge Discovery, Neural Networks in, Figure 3 6 × 6 SOM example clustering of gene expression data (time series over 24 time points) [48]. The mean expression vector is plotted for each cluster; cluster sizes indicate the number of vectors (genes) in each cluster

undefined in this case. SOM classifiers also make use of the distance of the winning neuron from the input case: if this distance exceeds a certain maximum threshold, the SOM is regarded as undecided. In this way, a SOM can be used for detecting novel data classes.

Future Directions

To date, neural networks are widely accepted as an alternative to classical statistical methods and are frequently used in medicine [4,5,11,25,27,44,45], with many applications related to cancer research [10,26,35,36,49]. In the first place, these comprise diagnostic and prognostic (i. e. classification) tasks, but also image analysis and drug design. Cancer prediction is often based on clustering of gene expression data [15,50] or microRNA expression profiles [28], which may involve both self-organizing maps and multi-layer feedforward neural networks. Another broad application field of neural networks today is bioinformatics and, in particular, the analysis

and classification of gene and protein sequences [3,20,56]. A well-known successful example is the prediction of protein (secondary) structure from sequence [39,40]. Even though NN technology is clearly established today, the current period is rather one of stagnation. This is partly because of a redirection of research to newer and often – but not generally – more powerful paradigms, like the popular support vector machines (SVMs) [9,47], or to more open and flexible methods, like genetic programming (GP) [7,24]. In many applications, for example in bioinformatics, SVMs have already replaced conventional neural networks as the state-of-the-art black-box classifier.

Bibliography

Primary Literature

1. Ackley DH, Hinton GF, Sejnowski TJ (1985) A learning algorithm for Boltzmann machines. Cogn Sci 9:147–169
2. Anderson JA et al (1977) Distinctive features, categorical perception, and probability learning: Some applications of a neural model. Psychol Rev 84:413–451
3. Baldi P, Brunak S (2001) Bioinformatics: The machine learning approach. MIT Press, Cambridge
4. Baxt WG (1995) Applications of artificial neural networks to clinical medicine. Lancet 346:1135–1138
5. Begg R, Kamruzzaman J, Sarkar R (2006) Neural networks in healthcare: Potential and challenges. Idea Group Publishing, Hershey
6. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press, London
7. Brameier M, Banzhaf W (2007) Linear genetic programming. Springer, New York
8. Broomhead DS, Lowe D (1988) Multivariable functional interpolation and adaptive networks. Complex Syst 2:321–355
9. Cristianini N (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, London
10. Dybowski R (2000) Neural computation in medicine: Perspectives and prospects. In: Malmgren H, Borga M, Niklasson L (eds) Proceedings of the Conference on Artificial Neural Networks in Medicine and Biology (ANNIMAB). Springer, Berlin, pp 26–36
11. Dybowski R, Gant V (2001) Clinical applications of artificial neural networks. Cambridge University Press, London
12. Elman JL (1990) Finding structure in time. Cogn Sci 14:179–211
13. Fahlman SE (1989) Faster learning variations on backpropagation: An empirical study. In: Touretzky DS, Hinton GE, Sejnowski TJ (eds) Proceedings of the 1988 Connectionist Models Summer School. Morgan Kaufmann, San Mateo, pp 38–51
14. Fahlman SE, Lebiere C (1990) The cascade-correlation learning architecture. In: Touretzky DS (ed) Advances in Neural Information Processing Systems 2. Morgan Kaufmann, Los Altos
15. Golub TR et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439):531–537
16. Hagan MT, Menhaj M (1994) Training feedforward networks with the Marquardt algorithm. IEEE Trans Neural Netw 5(6):989–993
17. Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558
18. Jacobs RA (1988) Increased rates of convergence through learning rate adaptation. Neural Netw 1:295–307
19. Jordan MI (1986) Attractor dynamics and parallelism in a connectionist sequential machine. In: Proceedings of the Eighth Annual Conference of the Cognitive Science Society. Lawrence Erlbaum, Hillsdale, pp 531–546
20. Keedwell E, Narayanan A (2005) Intelligent bioinformatics: The application of artificial intelligence techniques to bioinformatics problems. Wiley, New York
21. Kohonen T (1977) Associative memory: A system-theoretical approach. Springer, Berlin
22. Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43:59–69
23. Kohonen T (1995) Self-organizing maps. Springer, Berlin
24. Koza JR (1992) Genetic programming: On the programming of computers by means of natural selection. MIT Press, Cambridge
25. Lisboa PJG (2002) A review of evidence of health benefit from artificial neural networks in medical intervention. Neural Netw 15:11–39
26. Lisboa PJG, Taktak AFG (2006) The use of artificial neural networks in decision support in cancer: a systematic review. Neural Netw 19(4):408–415
27. Lisboa PJG, Ifeachor EC, Szczepaniak PS (2001) Artificial neural networks in biomedicine. Springer, Berlin
28. Lu J et al (2005) MicroRNA expression profiles classify human cancers. Nature 435:834–838
29. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol 1. University of California Press, Berkeley, pp 281–297
30. McClelland JL, Rumelhart DE (1986) Parallel distributed processing: Explorations in the microstructure of cognition. MIT Press, Cambridge
31. McClelland J, Rumelhart D (1988) Explorations in parallel distributed processing. MIT Press, Cambridge
32. McCulloch WS, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133
33. Minsky ML, Papert SA (1969/1988) Perceptrons. MIT Press, Cambridge
34. Moody J, Darken CJ (1989) Fast learning in networks of locally-tuned processing units. Neural Comput 1:281–294
35. Naguib RN, Sherbet GV (1997) Artificial neural networks in cancer research. Pathobiology 65(3):129–139
36. Naguib RNG, Sherbet GV (2001) Artificial neural networks in cancer diagnosis, prognosis, and patient management. CRC Press, Boca Raton
37. Poggio T, Girosi F (1990) Regularization algorithms for learning that are equivalent to multi-layer networks. Science 247:978–982
38. Poggio T, Girosi F (1990) Networks for approximation and learning. Proc IEEE 78:1481–1497
39. Qian N, Sejnowski TJ (1988) Predicting the secondary structure of globular proteins using neural network models. J Mol Biol 202:865–884
40. Rost B (2001) Review: Protein secondary structure prediction continues to rise. J Struct Biol 134:204–218
41. Rosenblatt F (1958) The perceptron: A probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386–408
42. Riedmiller M, Braun H (1992) Rprop – a fast adaptive learning algorithm. In: Proceedings of the International Symposium on Computer and Information Science VII
43. Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: The Rprop algorithm. In: Proceedings of the IEEE International Conference on Neural Networks. IEEE Press, Piscataway, pp 586–591
44. Ripley BD, Ripley RM (2001) Neural networks as statistical methods in survival analysis. In: Dybowski R, Gant V (eds) Clinical applications of artificial neural networks. Cambridge University Press, London
45. Robert C et al (2004) Bibliometric overview of the utilization of artificial neural networks in medicine and biology. Scientometrics 59:117–130
46. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536
47. Schölkopf B, Smola AJ (2001) Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge

825

826

Data-Mining and Knowledge Discovery, Neural Networks in

48. Spellman PT et al (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9(12):3273–97 49. Taktak AFG, Fisher AC (2007) Outcome prediction in cancer. Elsevier Science, London 50. Tamayo P et al (1999) Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. PNAS 96(6):2907–2912 51. Tollenaere T (1990) SuperSAB: Fast adaptive backpropagation with good scaling properties. Neural Netw 3:561–573 52. Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244 53. Werbos PJ (1974) Beyond regression: New tools for prediction and analysis in the behavioral science. Ph D Thesis, Harvard University 54. Werbos PJ (1994) The roots of backpropagation. Wiley, New York 55. Widrow B, Hoff ME (1960) Adaptive switching circuits. In: IRE WESCON Convention Record, Institute of Radio Engineers (now IEEE), vol 4. pp 96–104 56. Wu CH, McLarty JW (2000) Neural networks and genome informatics. Elsevier Science, Amsterdam

Books and Reviews Abdi H (1994) A neural network primer. J Biol Syst 2:247–281 Bishop CM (2008) Pattern recognition and machine learning. Springer, Berlin Fausett L (1994) Fundamentals of neural networks: Architectures, algorithms, and applications. Prentice Hall, New York Freeman JA, Skapura DM (1991) Neural networks: Algorithms, applications, and programming techniques. Addison, Reading Gurney K (1997) An Introduction to neural networks. Routledge, London

Hastie T, Tibshirani R, Friedman JH (2003) The elements of statistical learning. Springer, Berlin Haykin S (1999) Neural networks: A comprehensive foundation. Prentice Hall, New York Hertz J, Krogh A, Palmer R (1991) Introduction to the theory of neural computation. Addison, Redwood City Kröse B, van der Smagt P (1996) An introduction to neural networks. University of Amsterdam, Amsterdam Masters T (1993) Practical neural network recipes in C++. Academic Press, San Diego Masters T (1995) Advanced algorithms for neural networks: A C++ sourcebook. Wiley, New York Parks R, Levine D, Long D (1998) Fundamentals of neural network modeling. MIT Press, Cambridge Patterson D (1996) Artif neural networks. Prentice Hall, New York Peretto P (1992) An introduction to the modeling of neural networks. Cambridge University Press, London Ripley BD (1996/2007) Pattern recognition and neural networks. Cambridge University Press, London Smith M (1993) Neural networks for statistical modeling. Van Nostrand Reinhold, New York Wasserman PD (1989) Neural Computing: Theory and Practice. Van Nostrand Reinhold, New York Wasserman PD (1993) Advanced methods in neural computing. Van Nostrand Reinhold, New York De Veaux RD, Ungar LH (1997) A brief introduction to neural networks. Technical Report, Williams College, University of Pennsylvania Hinton GE (1992) How neural networks learn from experience. Sci Am 267:144–151 Lippman RP (1987) An introduction to computing neural networks. IEEE ASSP Mag 4(2):4–22 Reilly D, Cooper L (1990) An overview of neural networks: Early models to real world systems. Neural Electron Netw 2:229–250

Decision Trees

VILI PODGORELEC, MILAN ZORMAN
University of Maribor, Maribor, Slovenia

Article Outline
Glossary
Definition of the Subject
Introduction
The Basics of Decision Trees
Induction of Decision Trees
Evaluation of Quality
Applications and Available Software
Future Directions
Bibliography

Glossary

Accuracy The most important quality measure of an induced decision tree classifier. The most general is the overall accuracy, defined as the percentage of correctly classified instances among all instances (correctly classified and incorrectly classified). The accuracy is usually measured both for the training set and the testing set.

Attribute A feature that describes an aspect of an object (both training and testing) used for a decision tree. An object is typically represented as a vector of attribute values. There are two types of attributes: continuous attributes, whose domain is numerical, and discrete attributes, whose domain is a set of predetermined values. There is one distinguished attribute called the decision class (a dependent attribute). The remaining attributes (the independent attributes) are used to determine the value of the decision class.

Attribute node Also called a test node. An internal node in the decision tree model that is used to determine a branch from this node, based on the value of the corresponding attribute of the object being classified.

Classification A process of mapping instances (i.e. training or testing objects), represented by attribute-value vectors, to decision classes. If the predicted decision class of an object equals the actual decision class of the object, then the classification of the object is accurate. The aim of classification methods is to classify objects with the highest possible accuracy.

Classifier A model built upon the training set and used for classification. The input to a classifier is an object (a vector of known values of the attributes) and the output of the classifier is the predicted decision class for this object.

Decision node A leaf in a decision tree model (also called a decision), containing one of the possible decision classes. It is used to determine the predicted decision class of an object being classified that arrives at the leaf on its path through the decision tree model.

Instance Also called an object (training or testing), represented by an attribute-value vector. Instances are used to describe the domain data.

Induction Inductive inference is the process of moving from concrete examples to general models, where the goal is to learn how to classify objects by analyzing a set of instances (already solved cases) whose classes are known. Instances are typically represented as attribute-value vectors. Learning input consists of a set of such vectors, each belonging to a known class, and the output consists of a mapping from attribute values to classes. This mapping should accurately classify both the given instances (a training set) and other unseen instances (a testing set).

Split selection A method used in the process of decision tree induction for selecting the most appropriate attribute and its splits in each attribute (test) node of the tree. Split selection is usually based on some impurity measure and is considered the most important aspect of decision tree learning.

Training object An object that is used for the induction of a decision tree. In a training object both the values of the attributes and the decision class are known. All the training objects together constitute a training set, which is the source of the "domain knowledge" that the decision tree will try to represent.

Testing object An object that is used for the evaluation of a decision tree. In a testing object the values of the attributes are known but the decision class is unknown to the decision tree. All the testing objects together constitute a testing set, which is used to test an induced decision tree, i.e. to evaluate its quality (regarding classification accuracy).

Training set A prepared set of training objects.

Testing set A prepared set of testing objects.

Definition of the Subject
The term decision trees (abbreviated DT) has been used for two different purposes: in decision analysis, as a decision support tool for modeling decisions and their possible consequences in order to select the best course of action under uncertainty; and in machine learning or data mining, as a predictive model, that is, a mapping from observations about an item to conclusions about its target value. This article concentrates on the machine learning view of DT. More descriptive names for DT models are classification tree (discrete outcome) or regression tree (continuous outcome). In DT structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. The machine learning technique for inducing a DT classifier from data (training objects) is called decision tree learning, or decision trees.
The main goal of classification (and regression) is to build a model that can be used for prediction [16]. In a classification problem, we are given a data set of training objects (a training set), each object having several attributes. There is one distinguished attribute called the decision class; it is a dependent attribute, whose value is to be determined by the induced decision tree. The remaining attributes (the independent attributes; we will refer to them simply as attributes in the following text) are used to determine the value of the decision class. Classification is thus a process of mapping instances (i.e. training or testing objects), represented by attribute-value vectors, to decision classes.
The aim of DT learning is to induce a DT model that is able to accurately predict the decision class of an object based on the values of its attributes. The classification of an object is accurate if the predicted decision class of the object equals the actual decision class of the object. The DT is induced using a training set (a set of objects for which both the attribute values and the decision class are known) and the resulting DT is used to determine decision classes for unseen objects (for which the attribute values are known but the decision class is unknown). A good DT should accurately classify both the given instances (a training set) and other unseen instances (a testing set). DT have a wide range of applications, but they are especially attractive in data mining [16].
Because of their intuitive representation, the resulting classification model is easy to understand, interpret and criticize [5]. DT can be constructed relatively fast compared to other methods [28]. And lastly, the accuracy of classification trees is comparable to that of other classification models [19,28]. Every major data mining tool includes some form of classification tree model construction component [17].

Introduction
In the early 1960s Feigenbaum and Simon presented EPAM (the Elementary Perceiver And Memorizer) [12], a psychological theory of learning and memory implemented as a computer program. EPAM was the first system to use decision trees (then called discrimination nets). It memorized pairs of nonsense syllables by using discrimination nets in which it could find images of syllables it had seen. It stored as its cue only enough information to disambiguate the syllable from others seen at the time the association was formed. Thus, old cues might become inadequate to retrieve data later, so the system could "forget". Originally designed to simulate phenomena in verbal learning, it was later adapted to account for data on the psychology of expertise and concept formation.
Later in the 1960s Hunt et al. built further on this concept and introduced CLS (Concept Learning System) [23], which used heuristic lookahead to construct trees. CLS was a learning algorithm that learned concepts and used them to classify new cases. CLS was the precursor to decision trees; it led to Quinlan's ID3 system. ID3 [38] added the idea of using information content to choose the attribute to split on; it also initially chooses a window (a subset) of the training examples and tries to learn a concept that correctly classifies all of them based on that window; if this fails, it increases the window size. Quinlan later constructed C4.5 [41], an industrial version of ID3. Utgoff's ID5 [53] was an extension of ID3 that allowed many-valued classifications as well as incremental learning. From the early 1990s both the number of researchers and the number of applications of DT have grown tremendously.

Objective and Scope of the Article
In this article an overview of DT is presented, with the emphasis on the variety of induction methods available today. Induction algorithms ranging from the traditional heuristic-based techniques to the most recent hybrids, such as evolutionary and neural network-based approaches, are described. Basic features, advantages and drawbacks of each method are presented.
For readers not very familiar with the field of DT this article should be a good introduction to the topic, whereas for more experienced readers it should broaden their perspective and deepen their knowledge.

The Basics of Decision Trees

Problem Definition
DT are a typical representative of a symbolic machine learning approach used for the classification of objects into decision classes, where an object is represented in the form of an attribute-value vector (attribute_1, attribute_2, ..., attribute_N, decision class). The attribute values describe the features of an object. The attributes are usually identified and selected by the creators of the dataset. The decision class is one special attribute whose value is known for the objects in the learning set and which will be predicted, based on the induced DT, for all further unseen objects. Normally the decision class is a feature that cannot be measured (for example some prediction of the future) or a feature whose measurement is unacceptably expensive, complex, or not known at all. Examples of attributes-decision class objects are: a patient's examination results and the diagnosis, a raster of pixels and the recognized pattern, stock market data and a business decision, past and present weather conditions and the weather forecast. The decision class is always a discrete-valued attribute and is represented by a set of possible values (in the case of regression trees the decision class is a continuous-valued attribute). If the decision should be made for a continuous-valued attribute, the values should be discretized first (by appropriately transforming continuous intervals into corresponding discrete values).

Formal Definition
Let $A_1, \dots, A_n, C$ be random variables where $A_i$ has domain $\mathrm{dom}(A_i)$ and $C$ has domain $\mathrm{dom}(C)$; we assume without loss of generality that $\mathrm{dom}(C) = \{c_1, c_2, \dots, c_j\}$. A DT classifier is a function

$$dt : \mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_n) \mapsto \mathrm{dom}(C)\,.$$

Let $P(A', C')$ be a probability distribution on $\mathrm{dom}(A_1) \times \dots \times \mathrm{dom}(A_n) \times \mathrm{dom}(C)$ and let $t = \langle t.A_1, \dots, t.A_n, t.C \rangle$ be a record randomly drawn from P; i.e., t has probability $P(A', C')$ that $\langle t.A_1, \dots, t.A_n \rangle \in A'$ and $t.C \in C'$. We define the misclassification rate $R_{dt}$ of classifier dt to be $P(dt\langle t.A_1, \dots, t.A_n \rangle \neq t.C)$. In terms of the informal introduction to this article, the training database D is a random sample from P, the $A_i$ correspond to the attributes, and C is the decision class.
A DT is a directed, acyclic graph T in the form of a tree. Each node in a tree has either zero or more outgoing edges.
If a node has no outgoing edges, it is called a decision node (a leaf node); otherwise it is called a test node (or an attribute node). Each decision node N is labeled with one of the possible decision classes $c \in \{c_1, \dots, c_j\}$. Each test node is labeled with one attribute $A_i \in \{A_1, \dots, A_n\}$, called the splitting attribute. Each splitting attribute $A_i$ has a splitting function $f_i$ associated with it. The splitting function $f_i$ determines the outgoing edge from the test node, based on the value of attribute $A_i$ of the object O in question. It is of the form $A_i \in Y_i$, where $Y_i \subseteq \mathrm{dom}(A_i)$; if the value of attribute $A_i$ of the object O is within $Y_i$, then the corresponding outgoing edge from the test node is chosen.
The problem of DT construction is the following. Given a data set $D = \{t_1, \dots, t_d\}$, where the $t_i$ are independent random samples from an unknown probability distribution P, find a decision tree classifier T such that the misclassification rate $R_T(P)$ is minimal.

Inducing the Decision Trees
Inductive inference is the process of moving from concrete examples to general models, where the goal is to learn how to classify objects by analyzing a set of instances (already solved cases) whose decision classes are known. Instances are typically represented as attribute-value vectors. Learning input consists of a set of such vectors, each belonging to a known decision class, and the output consists of a mapping from attribute values to decision classes. This mapping should accurately classify both the given instances and other unseen instances.
A decision tree is a formalism for expressing such mappings [41] and consists of test nodes linked to two or more sub-trees, and leaves or decision nodes labeled with a decision class. A test node computes some outcome based on the attribute values of an instance, where each possible outcome is associated with one of the sub-trees. An instance is classified by starting at the root node of the tree. If this node is a test, the outcome for the instance is determined and the process continues using the appropriate sub-tree. When a leaf is eventually encountered, its label gives the predicted decision class of the instance.
Finding a solution with the help of DT starts by preparing a set of solved cases. The whole set is then divided into (1) a training set, which is used for the induction of a DT classifier, and (2) a testing set, which is used to check the accuracy of the obtained solution. First, all attributes defining each case are described (input data) and among them one attribute is selected that represents the decision class for the given problem (output data). For all input attributes specific value classes are defined.
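The structures just described can be sketched in code. The following minimal Python sketch is our own illustration (the class and function names, and the tiny example tree, are hypothetical, not taken from the article or any particular DT library): it represents test nodes and decision nodes and performs the classification walk from the root to a leaf.

```python
class DecisionNode:
    """A leaf, labeled with one of the possible decision classes."""
    def __init__(self, decision_class):
        self.decision_class = decision_class

class TestNode:
    """An internal (attribute) node; `branches` plays the role of the
    splitting function, mapping an attribute value to an outgoing edge."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # name of the splitting attribute
        self.branches = branches     # dict: attribute value -> subtree

def classify(node, obj):
    """Start at the root; at each test node follow the edge matching the
    object's attribute value; the first leaf reached gives the prediction."""
    while isinstance(node, TestNode):
        node = node.branches[obj[node.attribute]]
    return node.decision_class

# A small hypothetical tree (not the one shown in the figures)
tree = TestNode("visibility", {
    "good": DecisionNode("trip"),
    "poor": DecisionNode("home"),
})
print(classify(tree, {"visibility": "good"}))  # trip
```
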
If an attribute can take only one of a few discrete values, then each value gets its own class; if an attribute can take various numeric values, then some characteristic intervals must be defined, which represent different classes. Each attribute can represent one internal node in a generated DT, also called a test node or an attribute node (Fig. 1). Such a test node has exactly as many outgoing edges as it has different value classes. The leaves of a DT are decisions and represent the value classes of the decision attribute, i.e. the decision classes (Fig. 1).

Decision Trees, Figure 1
An example of a (part of a) decision tree

When a decision has to be made for an unsolved case, we start at the root node of the DT classifier and, moving along attribute nodes, select the outgoing edges whose attribute values match the values of the appropriate attributes of the unsolved case, until a leaf node representing the decision class is reached. An example training database is shown in Table 1 and a sample DT is shown in Fig. 1.
The DT classifier is very easy to interpret. From the tree shown in Fig. 1 we can deduce, for example, the following rules:
• If the chance of precipitation is less than 30% and the visibility is good and the temperature is in the range of 10–27°C, then we should go for a trip,
• If the chance of precipitation is less than 30% and the visibility is good and the temperature is less than 10°C or more than 27°C, then we should stay at home,
• If the chance of precipitation is 30% or more and the wind is moderate or strong, then we should stay at home.

Decision Trees, Table 1
An example training set for object classification

 #  | Color  | Edge   | Dot | Decision class: Shape
  1 | green  | dotted | no  | triangle
  2 | green  | dotted | yes | triangle
  3 | yellow | dotted | no  | square
  4 | red    | dotted | no  | square
  5 | red    | solid  | no  | square
  6 | green  | solid  | yes | triangle
  7 | green  | solid  | yes | square
  8 | yellow | dotted | no  | triangle
  9 | yellow | solid  | no  | square
 10 | red    | solid  | no  | square
 11 | green  | solid  | yes | square
 12 | yellow | dotted | yes | square
 13 | yellow | solid  | no  | square
 14 | red    | dotted | yes | triangle

A DT can be built from a set of training objects with the "divide and conquer" principle. When all objects belong to the same decision class (the value of the output attribute is the same), the tree consists of a single node, a leaf with the appropriate decision. Otherwise an attribute is selected and the set of objects is divided according to the splitting function of the selected attribute. The selected attribute forms an attribute (test) node in the growing DT classifier; for each outgoing edge from that node the inducing procedure is repeated upon the remaining objects of the corresponding division, until a leaf (a decision class) is encountered.
From a geometric point of view, a set of n attributes defines an n-dimensional space, in which each data object represents a point. A division of data objects regarding an attribute's classes corresponds to the definition of decision planes in the same space. Those planes are hyper-planes orthogonal to the selected attribute; a DT thus divides the search space into hyper-rectangles, each of which represents one of the possible decision classes; of course, several hyper-rectangles can also represent the same decision class.

Induction of Decision Trees
In 1986 Quinlan introduced an algorithm for inducing decision trees called ID3 [39,40]. In 1993 ID3 was upgraded with an improved algorithm, C4.5 [41], which is still regarded as the reference model for building a DT based on the traditional statistical approach. Both algorithms, ID3 and C4.5, use the statistical calculation of information gain from a single attribute to build a DT. In this manner the attribute that adds the most information about the decision upon a training set is selected first, followed by the next most informative of the remaining attributes, etc.
The method for constructing a DT, as paraphrased from Quinlan [41], pp. 17–18, is as follows. If there are j classes denoted $\{c_1, c_2, \dots, c_j\}$ and a training set D, then:
• If D contains one or more objects which all belong to a single class $c_i$, then the decision tree is a leaf identifying class $c_i$.
• If D contains no objects, the decision tree is a leaf determined from information other than D.
• If D contains objects that belong to a mixture of classes, then a test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes $\{o_1, o_2, \dots, o_n\}$. D is partitioned into subsets $D_1, D_2, \dots, D_n$, where $D_i$ contains all the objects in D that have outcome $o_i$ of the chosen test. The same method is applied recursively to each subset of training objects.
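The recursive "divide and conquer" scheme paraphrased above can be sketched as follows. This is our own illustrative code, not Quinlan's implementation; objects are assumed to be dicts with a `"class"` key, and the split selection heuristic is passed in as a parameter.

```python
from collections import Counter

def build_tree(objects, attributes, select_split):
    """Recursive DT construction: pure subset -> leaf; empty subset -> leaf
    determined from other information (None here); mixed subset -> choose a
    test attribute, partition the objects by its values, and recurse."""
    if not objects:
        return ("leaf", None)  # class would come from information other than D
    classes = Counter(obj["class"] for obj in objects)
    if len(classes) == 1 or not attributes:
        return ("leaf", classes.most_common(1)[0][0])
    attr = select_split(objects, attributes)  # e.g. highest information gain
    branches = {}
    for value in sorted({obj[attr] for obj in objects}):
        subset = [o for o in objects if o[attr] == value]
        rest = [a for a in attributes if a != attr]
        branches[value] = build_tree(subset, rest, select_split)  # recurse
    return ("test", attr, branches)

data = [{"edge": "solid", "class": "square"},
        {"edge": "dotted", "class": "triangle"}]
build_tree(data, ["edge"], lambda objs, attrs: attrs[0])
# ("test", "edge", {"dotted": ("leaf", "triangle"), "solid": ("leaf", "square")})
```
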

Split Selection
The most important aspect of a traditional DT induction strategy is the way in which a set is split, i.e. how to select an attribute test that determines the distribution of training objects into sub-sets, upon which sub-trees are subsequently built. This process is called split selection. Its aim is to find an attribute and its associated splitting function for each test node in a DT. In the following text some of the most widely used split selection approaches are presented.
Let D be the learning set of objects described with attributes $A_1, \dots, A_n$ and a decision class C. Let n denote the number of training objects in D, $n_i$ the number of training objects of decision class $c_i$, $n_j$ the number of training objects with the jth value of the splitting attribute, and $n_{ij}$ the number of training objects of decision class $c_i$ and the jth value of the splitting attribute. The relative frequencies (probabilities) of training objects in D are as follows: $p_{ij} = n_{ij}/n$ is the relative frequency of training objects of decision class $c_i$ and jth value of the splitting attribute within the training set D, $p_i = n_i/n$ is the relative frequency of training objects of decision class $c_i$ within the training set D, $p_j = n_j/n$ is the relative frequency of training objects with the jth value of the splitting attribute within the training set D, and $p_{i|j} = n_{ij}/n_j$ is the relative frequency of training objects of decision class $c_i$ and jth value of the splitting attribute among all the training objects with the jth value of the splitting attribute within the training set D.

Entropy, Information Gain, Information Gain Ratio
The majority of classical DT induction algorithms (such as ID3, C4.5, ...) are based on calculating entropy from information theory [45] to evaluate splits. In information theory entropy is used to measure the unreliability of a message as the source of information; the more information a message contains, the lower is the entropy. Two splitting criteria are implemented:
• the gain criterion, and
• the gain ratio criterion.
The gain criterion [41] is developed in the following way. For any training set D, $n_i$ is the number of training objects of decision class $c_i$ within the training set D. Then consider the "message" that a randomly selected object belongs to decision class $c_i$. The "message" has probability $p_i = n_i/n$, where n is the total number of training objects in D. The information conveyed by the message (in bits) is given by

$$I = -\log_2 p_i = -\log_2 \frac{n_i}{n}\,.$$

The entropy E of an attribute A of a training object w, with the possible attribute values $a_1, \dots, a_m$ and the probability distribution $p(A(w) = a_i)$, is thus defined as

$$E_A = -\sum_j p_j \cdot \log_2 p_j\,.$$

Let $E_C$ be the entropy of the decision class distribution, $E_A$ the entropy of the values of an attribute A, and $E_{CA}$ the entropy of the combined decision class distribution and attribute values:

$$E_C = -\sum_i p_i \cdot \log_2 p_i$$
$$E_A = -\sum_j p_j \cdot \log_2 p_j$$
$$E_{CA} = -\sum_i \sum_j p_{ij} \cdot \log_2 p_{ij}\,.$$

The expected entropy of the decision class distribution regarding the attribute A is thus defined as

$$E_{C|A} = E_{CA} - E_A\,.$$

This expected entropy $E_{C|A}$ measures the reduction of entropy that is to be expected when A is selected as the splitting attribute. The information gain is thus defined as

$$I_{gain}(A) = E_C - E_{C|A}\,.$$

In each test node the attribute A with the highest value of $I_{gain}(A)$ is selected. The gain criterion [41] selects a test to maximize this information gain. The gain criterion has one significant disadvantage: it is biased towards tests with many outcomes. The gain ratio criterion [41] was developed to avoid this bias; it is defined as

$$I_{gain\,ratio}(A) = \frac{I_{gain}(A)}{E_A}\,.$$

If the split is near trivial, the split information will be small and this ratio will be unstable. Hence, the gain ratio criterion selects a test to maximize the gain ratio, subject to the constraint that the information gain is large.

Gini Index
The gain ratio criterion compares with CART's impurity function approach [5], where impurity is a measure of the class mix of a subset and splits are chosen so that the decrease in impurity is maximized. This approach led to the development of the Gini index [5]. The impurity function approach considers the probability of misclassifying a new sample from the overall population, given that the sample was not part of the training sample T. This probability is called the misclassification rate and is estimated using either the resubstitution estimate (training set accuracy) or the test sample estimate (test set accuracy). The node assignment rule selects i to minimize this misclassification rate. In addition, the Gini index promotes splits that minimize the overall size of the tree:

$$gini(A) = \sum_j p_j \cdot \sum_i p_{i|j}^2 - \sum_i p_i^2\,.$$
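As an illustration, the split selection measures above can be computed directly from class counts. The sketch below is our own code (not from the article); `partition` is assumed to hold one class-count list per attribute value, i.e. `partition[j][i]` corresponds to $n_{ij}$.

```python
import math

def entropy(counts):
    """E = -sum p log2 p over a class-count distribution (0 log 0 := 0)."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def information_gain(partition):
    """Igain(A) = E_C - E_{C|A}, with E_{C|A} the weighted subset entropy."""
    n = sum(sum(p) for p in partition)
    class_totals = [sum(col) for col in zip(*partition)]
    e_c_given_a = sum(sum(p) / n * entropy(p) for p in partition if sum(p))
    return entropy(class_totals) - e_c_given_a

def gain_ratio(partition):
    """Igain(A) / E_A, where E_A is the entropy of the attribute values."""
    e_a = entropy([sum(p) for p in partition])
    return information_gain(partition) / e_a if e_a else 0.0

def gini(partition):
    """gini(A) = sum_j p_j sum_i p_{i|j}^2 - sum_i p_i^2."""
    n = sum(sum(p) for p in partition)
    class_totals = [sum(col) for col in zip(*partition)]
    left = sum(sum(p) / n * sum((c / sum(p)) ** 2 for c in p)
               for p in partition if sum(p))
    return left - sum((c / n) ** 2 for c in class_totals)

# Class counts per color subset as used in the worked example below:
# (2 squares, 3 triangles), (4, 0), (3, 2) out of 14 objects
split = [[2, 3], [4, 0], [3, 2]]
information_gain(split)   # approximately 0.247; the article rounds to 0.246
```
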

Chi-Square Various types of chi-square tests are often used in statistics for testing the significance. The Chisquare (chi-square goodness-of-fit) test compares the expected frequency eij with the actual frequency nij of training objects, which belong to decision class ci and having jth value of the given attribute [55]. It is defined as 2

 (A) D

X X (e i j  n i j )2 i

j

ei j

where n j ni ei j D : n The higher value of 2 means a clearer split. Consequentially, the attribute with the highest 2 value is selected.

J-Measure The J-measure heuristics has been introduced by Smyth and Goodman [48] as an informationtheoretical model of measuring the information content of a rule. The crossover entropy J j is appropriate for selecting a single value of a given attribute A for constructing a rule; it is defined as X p ij j p ij j log : J j (A) D p j pi i

The generalization over all possible values of a given attribute A measures the purity of the attribute X J j (A) : J(A) D j

The J-measure is also used to reduce over-fitting in the process of pre-pruning (see the following sections).

DT Induction Example: Classifying Geometrical Objects

Let us have a number of geometrical objects: squares and triangles. Each geometrical object is described with three features: color describes the color of an object (it can be either green, yellow or red), edge describes the line type of an object's edge (it can be either solid or dotted), and dot describes whether there is a dot in an object. See Table 1 for details. We want to induce a DT that will predict the shape of an unseen object based on the three features. The decision class will thus be shape, having two possible values: square or triangle. The three features represent the three attributes: color (with possible values green, yellow and red), edge (with possible values solid and dotted), and dot (with possible values yes and no). Table 1 represents the training set with 14 training objects; in Fig. 2 the training objects are also visually represented.

Decision Trees, Figure 2 Visual representation of training objects from Table 1

In the training set (Table 1, Fig. 2) there are 14 training objects: five triangles and nine squares. We can calculate the class distributions:

p(square) = 9/14 and p(triangle) = 5/14 .

The entropy of the decision class (shape) distribution is thus

E_shape = -Σ_i p_i log2 p_i = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 .

The entropy can be reduced by splitting the whole training set. For this purpose any of the attributes (color, edge, or dot) can be used as a splitting attribute. Regarding the information gain split selection approach, the entropy of each attribute is calculated first, then the information gain of each attribute is calculated based on these entropies, and finally the attribute with the highest information gain is selected. To demonstrate this, let us calculate the entropy for the attribute color. Using the different possible values of color (green, yellow, and red) the whole training set is split into three subsets (Fig. 3). The entropies are as follows:

E_color=green  = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
E_color=yellow = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0.0
E_color=red    = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971 .

Decision Trees, Figure 3 Splitting the data set based on the attribute color

Consequently, the entropy of the attribute color over the whole training set is

E_color = Σ_i p_i E_i = (5/14) · 0.971 + (4/14) · 0.0 + (5/14) · 0.971 = 0.694 .

The information gain of the attribute color is thus

Igain(color) = E_shape - E_shape|color = 0.940 - 0.694 = 0.246 .

Similarly, the information gain can be calculated for the remaining two attributes, edge and dot. The information gains for all three attributes are thus

Igain(color) = 0.246
Igain(edge)  = 0.151
Igain(dot)   = 0.048 .

As we can see, the attribute color has the highest information gain and is therefore chosen as the splitting attribute at the root node of the DT classifier. The whole process of splitting is recursively repeated on all subsets. The resulting DT classifier is shown in Fig. 4.

Decision Trees, Figure 4 The resulting DT classifier for the classification of geometrical objects

Tests on Continuous Attributes

In the above example all attributes were discrete-valued. For continuous attributes a split threshold needs to be determined whenever the attribute is to be used as a splitting attribute. The algorithm for finding appropriate thresholds for continuous attributes [5,33,41] is as follows: the training objects are sorted on the values of the attribute; denote them in order as {w_1, w_2, ..., w_k}. Any threshold value lying between w_i and w_{i+1} will have the same effect, so there are only k-1 possible splits, all of which are examined. Some approaches for building a DT use discretization of continuous attributes. Two ways are used to select the discretized classes:
- equidistant intervals: the number of classes is selected first and then successive equidistant intervals are determined between the absolute lower and upper bounds, and
- percentiles: again the number of classes is selected first and then successive intervals are determined based on the values of the appropriate attribute in the training set, so that all intervals contain the same number of training objects.
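The entropy and information-gain computations from the worked example above can be sketched in a few lines of Python. The per-color class counts below are chosen to match the subset entropies given in the text (green: 2/3, yellow: 4/0, red: 3/2); the exact rows of Table 1 are not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain of splitting on an attribute, given its value for each object."""
    n = len(labels)
    split_entropy = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy

# 9 squares and 5 triangles, grouped by color as in the example.
shape = (["square"] * 2 + ["triangle"] * 3   # green subset
         + ["square"] * 4                    # yellow subset
         + ["square"] * 3 + ["triangle"] * 2)  # red subset
color = ["green"] * 5 + ["yellow"] * 4 + ["red"] * 5

print(round(entropy(shape), 3))                  # prints 0.94
print(round(information_gain(color, shape), 3))  # prints 0.247
```

Note that the exact gain is 0.2467, which the text truncates to 0.246.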

Dynamic Discretization of Attributes

In the MtDeciT 2.0 tool the authors implemented an algorithm for finding subintervals [58] in which the distribution of training objects is considered and more than two subintervals are possible. The approach is called dynamic discretization of continuous attributes, since the subintervals are determined dynamically during the process of building a DT. This technique first splits the interval into many subintervals, so that every training object's value has its own subinterval. In the second step it merges together neighboring subintervals that are labeled with the same outcome into larger subintervals. In each of the following steps three subintervals are merged together: two "strong" subintervals with one "weak" subinterval, where the "weak" subinterval lies between those two "strong" subintervals. Here strong and weak refer to the number of training objects in the subinterval (Fig. 5). In comparison to the previous two approaches, dynamic discretization returns more natural subintervals, which result in better and smaller DT classifiers. In general we differentiate between two types of dynamic discretization:
- general dynamic discretization, and
- nodal dynamic discretization.

General dynamic discretization uses all available training objects for the definition of subintervals. That is why general dynamic discretization is performed before the start of building a DT. All the subintervals of all attributes are memorized in order to be used later in the process of building the DT. Nodal dynamic discretization performs the definition of subintervals for all continuous attributes which are available in the current node of a DT. Only those training objects that reach the current node are used for setting the subintervals of the continuous attributes.
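The merging idea described above can be sketched as follows. This is a simplified illustration, not the MtDeciT 2.0 implementation: the `weak_limit` criterion for deciding which runs count as "weak" is an assumption made here for the sketch.

```python
def discretize(points, weak_limit=1):
    """Sketch of dynamic discretization. `points` is a list of
    (value, class_label) pairs for one continuous attribute; the result
    is a list of (values, label) runs, i.e. candidate subintervals."""
    points = sorted(points)
    # Steps 1-2: one subinterval per value, then merge equal-labeled neighbors.
    runs = []
    for value, label in points:
        if runs and runs[-1][1] == label:
            runs[-1][0].append(value)
        else:
            runs.append(([value], label))
    # Step 3: absorb a weak run lying between two strong runs of equal label.
    i = 1
    while i < len(runs) - 1:
        prev, cur, nxt = runs[i - 1], runs[i], runs[i + 1]
        if (len(cur[0]) <= weak_limit and prev[1] == nxt[1]
                and len(prev[0]) > weak_limit and len(nxt[0]) > weak_limit):
            runs[i - 1:i + 2] = [(prev[0] + cur[0] + nxt[0], prev[1])]
            i = max(i - 1, 1)
        else:
            i += 1
    return runs
```

For example, a single stray value of class b between two solid runs of class a is absorbed, leaving one subinterval labeled a.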


Decision Trees, Figure 5 Dynamic discretization of a continuous attribute with values between 60 and 100. Shapes represent different attribute values

In a series of tests the authors showed that nodal dynamic discretization produces smaller DTs with higher accuracy than DTs built with general dynamic discretization [56,58]. Nodal dynamic discretization also outperforms classical discretization techniques in the majority of cases.

Oblique Partitioning of Search Space

The algorithms presented so far use univariate partitioning methods, which are attractive because they are straightforward to implement (only one feature is analyzed at a time) and the derived DT is relatively easy to understand. Besides univariate partitioning methods there are also some successful partitioning methods that do not partition the search space axis-parallel based on only one attribute at a time, but form oblique partition boundaries based on a combination of attributes (Fig. 6).

Decision Trees, Figure 6 Given axes that show the attribute values and shapes corresponding to class labels: (i) axis-parallel and (ii) oblique decision boundaries

Oblique partitioning provides a viable alternative to univariate methods. Unlike their univariate counterparts, oblique partitions are formed by combinations of attributes. The general form of an oblique partition is given by

Σ_{i=1}^{d} β_i x_i ≤ C ,

where β_i represents the coefficient of the i-th attribute. Because of their multivariate nature, oblique methods offer far more flexibility in partitioning the search space; this flexibility comes at the price of higher complexity, however. Consider that, given a data set containing n objects described with d attributes, there can be 2 · Σ_{i=0}^{d} C(n-1, i) oblique splits if n > d [50]; each split is a hyper-plane that divides the search space into two non-overlapping halves. For univariate splits the number of potential partitions, n · d, is much lower, but still significant [29]. In short, finding the right oblique partition is a difficult task. Given the size of the search space, choosing the right search method is of critical importance in finding good partitions. Perhaps the most comprehensive reference on this subject is [5] on classification and regression trees (CART). Globally, CART uses the same basic algorithm as Quinlan's C4.5. At the decision node level, however, the algorithm becomes extremely complex. CART starts out with the best univariate split. It then iteratively searches for perturbations in attribute values (one attribute at a time) which maximize some goodness metric. At the end of the procedure, the best oblique and axis-parallel splits found are compared and the better of the two is selected. Although CART provides a powerful and efficient solution to a very difficult problem, it is not without its disadvantages. Because the algorithm is fully deterministic, it has no inherent mechanism for escaping from local optima. As a result, CART has a tendency to terminate its partition search at a given node too early. The most fundamental disadvantage of CART (and of the traditional approach to inducing DTs in general) is that the DT induction process can cause the metrics to produce misleading results. Because traditional DT induction algorithms choose what is locally optimal for each decision node, they inevitably ignore splits that score poorly alone but yield a better solution when used in combination. This problem is illustrated by Fig. 7. The solid lines indicate the splits found by CART. Although each split optimizes the impurity metric, the end product clearly does not reflect the best possible partitions (indicated by the dotted lines). However, when evaluated as individuals, the dotted lines register high impurities and are therefore not chosen. Given this, it is apparent that the sequential nature of DT induction can prevent the induction of trees that reflect the natural structure of the data.

Decision Trees, Figure 7 CART generated splits (solid lines – 1, 2) minimize impurity at each decision node in a way that is not necessarily optimal regarding the natural structure of the data (denoted by dotted lines – 3, 4)

Pruning Decision Trees

Most real-world data contains at least some amount of noise and outliers. Different machine-learning approaches exhibit different levels of sensitivity to noise in data. DTs fall into the category of methods sensitive to noise and outliers. To cure that, additional methods must be applied to a DT in order to reduce its complexity and increase its accuracy. In DTs we call this approach pruning. Though pruning can be applied at different stages, it follows the basic idea also called Ockham's razor. William of Ockham (1285–1349), one of the most influential medieval philosophers, has been given credit for the rule which says that "one should not draw more conclusions about a certain matter than the minimum necessary and that the redundant conclusions should be removed or shaved off". That rule was the basis for a later interpretation in the form of the following statement: "If two different solutions solve the same problem with the same accuracy, then the better of the two is the shorter solution."


Mapping this rule to DT pruning allows us to prune or shave away those branches of the DT which do not decrease the classification accuracy. By doing so, we end up with a DT which is not only smaller, but also has higher (or in the worst-case scenario at least the same) accuracy as the original DT. DT classifiers aim to refine the training sample T into subsets which contain only a single class. However, training samples may not be representative of the population they are intended to represent. In most cases, fitting a DT until all leaves contain data for a single class causes over-fitting; that is, the DT is designed to classify the training sample rather than the overall population, and accuracy on the overall population will be much lower than the accuracy on the training sample. For this reason most DT induction algorithms (C4.5, CART, OC1) use pruning. They all grow trees to maximum size, where each leaf contains single-class data or no test offers any improvement on the mix of classes at that leaf, and then prune the tree to avoid over-fitting. Pruning occurs within C4.5 when the predicted error rate is reduced by replacing a branch with a leaf. CART and OC1 use a proportion of the training sample to prune the tree: the tree is trained on the remainder of the training sample and then pruned until the accuracy on the pruning sample cannot be further improved. In general we differentiate between two pruning approaches: prepruning, which takes place during DT induction, and postpruning, applied to an already-induced DT in order to reduce its complexity. We will describe three representatives of pruning – one prepruning and two postpruning examples.

Prepruning

Prepruning (also called a stopping criterion) is a very simple procedure, performed during the first phase of induction. Early stopping of DT construction is based on a criterion which measures the percentage of the dominant class in the training subset in the node. If that percentage is higher than the preset threshold, then the internal node is transformed into a leaf and marked with the dominant class label. Such early stopping of DT construction reduces the size and complexity of the DT and reduces the possibility of over-fitting, making it more general. However, there is also a danger of oversimplification when the preset threshold is too low. This problem is especially present in training sets where the frequencies of class labels are not balanced. An improper threshold can cause some special training objects, which are very important for solving the classification problem, to be discarded during the prepruning process.
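The stopping criterion described above can be sketched as a one-line check; the default threshold value used here is an illustrative assumption, not a recommended setting.

```python
from collections import Counter

def should_stop(labels, threshold=0.95):
    """Prepruning check: stop expanding a node and turn it into a leaf
    when the dominant class reaches `threshold` of the node's training
    subset. Returns (stop?, dominant_label)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return count / len(labels) >= threshold, label
```

For instance, a node holding 19 squares and 1 triangle stops (19/20 = 0.95) and becomes a leaf labeled square, while a balanced node keeps growing.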

Postpruning

The opposite of prepruning, postpruning operates on a DT that is already constructed. Using simple frequencies or error estimation, postpruning approaches calculate whether or not substituting a subtree with a leaf (pruning, or shaving off according to Ockham's razor) would increase the DT's accuracy or at least reduce its size without negatively affecting classification accuracy. Both of the following approaches are practically the same up to the point where we estimate the error in a node or subtree.

Static Error Estimate Pruning

Static error estimate pruning uses the frequencies of training objects' class labels in nodes to estimate the error caused by replacing an internal node with a leaf. If the error estimate in the node is lower than or equal to the error computed in the subtree, then we can prune the subtree in question. Let C be the set of training objects in node V, and m the number of all possible class labels. N is the number of training objects in C and O is the most frequent class label in C. Let n be the number of training objects in C that belong to class O. The static error estimate (also called the Laplace error estimate) in node V is E(V):

E(V) = (N - n + m - 1) / (N + m) .

Let us compute the subtree error of V using the next expression:

SubtreeError(V) = Σ_{i=1}^{NoOfChildNodes} P_i · Err(V_i) .

Since we do not have information about the class distribution, we use relative frequencies in place of the probabilities P_i. Err(V_i) is the error estimate of child node V_i. Now we can compute the new error estimate Err(V) in node V by using the next expression:

Err(V) = min( E(V), SubtreeError(V) ) .

If E(V) is less than or equal to SubtreeError(V), then we can prune the subtree of V, replace it by a leaf, label it with class O and assign the value of E(V) to Err(V). In the opposite case we only assign the value of SubtreeError(V) to Err(V). Static error estimate pruning is a recursive bottom-up approach, where we start by calculating error estimates in leaves and propagate errors and eventual pruning actions toward parent nodes until the root node is reached. In Fig. 8 we have an unpruned DT with two possible decision classes, O1 and O2.

Decision Trees, Figure 8 Unpruned decision tree with static error estimates

Decision Trees, Figure 9 Pruned decision tree with static error estimates

Inside each node there are frequencies of training objects according to their class label,

written in brackets (first for O1 and second for O2). Beside each internal node (squares) there are two numbers: the top one is the static error estimate in the node and the bottom one is the error estimate in the subtree. The number below each leaf (ellipses) is its error estimate. We can see that the static error estimates are lower than the subtree errors for two nodes: Att3 and Att4. Therefore, we prune those two subtrees and get a smaller but more accurate DT (Fig. 9).

Reduced Error Pruning

Reduced error pruning implements a very similar principle as static error estimate pruning; the difference is in the error estimation. Reduced error pruning uses a pruning set to estimate the errors in nodes and leaves. The pruning set is very similar to the training set, since it contains objects described with attributes and a class label. In the first stage we try to classify each pruning object. When a leaf is reached, we compare the DT's classification with the actual class label and mark the correct/incorrect decision in the leaf. At the end there is a number of errors and a number of correct classifications recorded in each leaf. In a recursive bottom-up approach, we start by calculating error estimates in leaves and propagate errors and eventual pruning actions toward parent nodes until the root node is reached. Let SubtreeError(V) be the sum of errors from all subtrees of node V:

SubtreeError(V) = NoOfErrorsInLeaf                       if V ∈ Leaves
SubtreeError(V) = Σ_{i=1}^{NoOfChildNodes} Err(V_i)      if V ∉ Leaves .


Let C be a set of training objects in node V and O the most frequent class label in C. Let N be the number of all pruning objects that reach the node V and n the number of pruning objects in V that belong to class O. The error estimate in node V is defined by E(V):

E(V) = N - n .

Now we can compare the error estimate in the node with the error estimate in the subtree and compute a new error estimate Err(V) in node V:

Err(V) = min( E(V), SubtreeError(V) ) .

If E(V) is less than or equal to SubtreeError(V), then we can prune the subtree of V, replace it by a leaf labeled with class O and assign the value of E(V) to Err(V). In the opposite case we only assign the value of SubtreeError(V) to Err(V). The drawback of this pruning approach is the fact that it is very hard to construct a pruning set that is capable of "visiting" each node in a DT at least once. If we cannot assure that, the pruning procedure will not be as efficient as other similar approaches.

Deficiencies of Classical Induction

The DT has been shown to be a powerful tool for decision support in different areas. The effectiveness and accuracy of classification of DTs have been a surprise for many experts, and their greatest advantage is the simultaneous suggestion of a decision and the straightforward, intuitive explanation of how the decision was made. Nevertheless, the classical induction approach also contains several deficiencies. One of the most obvious drawbacks of classical DT induction algorithms is poor processing of incomplete, noisy data. If some attribute value is missing, classical algorithms do not perform well on such an object. For example, in Quinlan's algorithms before C4.5, data objects with missing values were left out of the training set – this of course resulted in decreased quality of the obtained solutions (in this way the training set size and the information about the problem's domain were reduced).
Algorithm C4.5 introduced a technique to overcome this problem, but it is still not very effective. In real-world problems, especially in medicine, missing data is very common – therefore, effective processing of such data is of vital importance. The next important drawback of classical induction methods is the fact that they can produce only one DT classifier for a problem (when the same training set is

used). In many real-world situations it would be of great benefit if more DT classifiers were available and a user could choose the most appropriate one for a single case. Just as a training object can have a missing attribute value, the same goes for a testing object – there can be a new case where some data is missing and it is not possible to obtain it (unavailability of some medical equipment, for example, or the invasiveness of a specific test for a patient). In such a case another DT classifier could be chosen, one that does not include the specific attribute test, to make a decision. Let us mention only one more disadvantage of classical induction methods, namely the importance of different errors. Among all possible decisions there are usually some that are more important than the others. Therefore, a goal is to build a DT in such a way that the classification accuracy for those most important decisions is maximized. Once again, this problem is not solved adequately in classical DT induction methods.

Variant Methods

Although algorithms such as ID3, C4.5, and CART make up the foundation of traditional DT induction practice, there is always room for improvement in the accuracy, size, and generalization ability of the generated trees. As would be expected, many researchers have tried to build on the success of these techniques by developing better variations of them. Alternatively, many different methods for the induction of DTs have been introduced by various researchers, which try to overcome the deficiencies noted above. Most of them are based on so-called soft methods, like evolutionary techniques or neural networks; sometimes several methods are combined in a hybrid algorithm. A vast number of techniques have also been developed that help to improve only a part of the DT induction process.
Such techniques include evolutionary algorithms for optimizing split functions in attribute nodes, dynamic discretization of attributes, dynamic subset selection of training objects, etc.

Split Selection Using Random Search

Since random search techniques have proven extremely useful in finding solutions to NP-complete problems [30], they have naturally been applied to DT induction. Heath [20,21] developed a DT induction algorithm called SADT that uses a simulated annealing process to find oblique splits at each decision node. Simulated annealing is a variation of hill climbing which, at the beginning of the process, allows some random downhill moves to be made [42]. As a computational process,


simulated annealing is patterned after the physical process of annealing [25], in which metals are melted (at high temperatures) and then gradually cooled until some solid state is reached. Starting with an initial hyper-plane, SADT randomly perturbs the current solution and determines the goodness of the split by measuring the change in impurity (ΔE). If ΔE is negative (i.e. impurity decreases), the new hyper-plane becomes the current solution; otherwise, the new hyper-plane becomes the current split with probability e^(−ΔE/T), where T is the temperature of the system. Because simulated annealing mimics the cooling of metal, its initially high temperature falls with each perturbation. At the start of the process, the probability of replacing the current hyper-plane is nearly 1. As the temperature cools, it becomes increasingly unlikely that worse solutions are accepted. When processing a given data set, Heath typically grows hundreds of trees, performing several thousand perturbations per decision node. Thus, while SADT has been shown to find smaller trees than CART, it is very expensive from a computational standpoint [21]. A more elegant variation on Heath's approach is the OC1 system [29]. Like SADT, OC1 uses random search to find the best split at each decision node. The key difference is that OC1 rejects the brute-force approach of SADT, using random search only to improve on an existing solution. In particular, it first finds a good split using a CART-like deterministic search routine. OC1 then randomly perturbs this hyper-plane in order to decrease its impurity. This step is a way of escaping the local optima in which deterministic search techniques can be trapped. If the perturbation results in a better split, OC1 resumes the deterministic search on the new hyper-plane; if not, it re-perturbs the partition a user-selectable number of times. When the current solution can be improved no further, it is stored for later reference.
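The annealing acceptance rule at the heart of this random search can be sketched as follows. This is a toy illustration, not the actual SADT or OC1 code: the perturbation and impurity functions are caller-supplied placeholders standing in for hyper-plane perturbation and impurity measurement.

```python
import math
import random

def anneal_step(current, perturb, impurity, temperature):
    """One simulated-annealing step over candidate splits: accept a
    perturbed split if impurity decreases, otherwise accept it with
    probability exp(-dE / T)."""
    candidate = perturb(current)
    d_e = impurity(candidate) - impurity(current)
    if d_e < 0 or random.random() < math.exp(-d_e / temperature):
        return candidate
    return current
```

At high temperature even impurity-increasing splits are frequently accepted; as the temperature falls, the step degenerates into plain hill climbing.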
This procedure is repeated a fixed number of times (using a different initial hyper-plane in each trial). When all trials have been completed, the best split found is incorporated into the decision node.

Incremental Decision Tree Induction

The DT induction algorithms discussed so far grow trees from a complete training set. For serial learning tasks, however, training instances may arrive in a stream over a given time period. In these situations, it may be necessary to continually update the tree in response to the newly acquired data. Rather than building a new DT classifier from scratch, the incremental DT induction approach revises the existing tree to be consistent with each new training instance. Utgoff implemented an incremental version of ID3, called

ID5R [53]. ID5R uses an E-Score criterion to estimate the amount of ambiguity in classifying instances that would result from placing a given attribute as a test in a decision node. Whenever the addition of new training instances does not fit the existing tree, the tree is recursively restructured such that attributes with the lowest E-Scores are moved higher in the tree hierarchy. In general, Utgoff's algorithm yields smaller trees compared to methods like ID3, which batch-process all training data. Techniques similar to ID5R include an incremental version of CART [8]. Incremental DT induction techniques result in frequent tree restructuring when the amount of training data is small, with the tree structure maturing as the data pool becomes larger.

Decision Forests

Regardless of the DT induction method utilized, subtle differences in the composition of the training set can produce significant variances in classification accuracy. This problem is especially acute when cross-validating small data sets with high dimensionality [11]. Researchers have reduced these high levels of variance by using decision forests, composed of multiple trees (rather than just one). Each tree in a forest is unique because it is grown from a different subset of the same data set. For example, Quinlan's windowing technique [41] induces multiple trees, each from a randomly selected subset of the training data (i.e., a window). Another approach was devised by Ho [22], who based each tree on a unique feature subset. Once a forest exists, the results from each tree must be combined to classify a given data instance. Committee-type schemes for accomplishing this range from majority-rules voting [21,37] to different statistical methods [46].

A Combination of Decision Trees and Neural Networks

When DTs and neural networks are compared, one can see that their advantages and drawbacks are almost complementary.
For instance, the knowledge representation of a DT is easily understood by humans, which is not the case for neural networks; DTs have trouble dealing with noise in training data, which is again not the case for neural networks; DTs learn very fast while neural networks learn relatively slowly, etc. Therefore, the idea is to combine DTs and neural networks in order to combine their advantages. In this manner, different hybrid approaches from both fields have been introduced [7,54,56]. Zorman in his MtDecit 2.0 approach first builds a DT that is then used to initialize a neural network [56,57]. Such a network is then trained using the same training objects as the DT. After that the neural network is again converted to a DT that is better than the original DT [2]. The


source DT classifier is converted to a disjunctive normal form – a set of normalized rules. The disjunctive normal form then serves as the source for determining the neural network's topology and weights. The neural network has two hidden layers; the number of neurons on each hidden layer depends on the rules in the disjunctive normal form. The number of neurons in the output layer depends on how many outcomes are possible in the training set. After the transformation, the neural network is trained using back-propagation. The mean square error of such a network converges toward 0 much faster than it would in the case of randomly set weights in the network. Finally, the trained neural network has to be converted into a final DT. The neural network is examined in order to determine the most important attributes that influence its outcomes. A list containing the most important attributes is then used to build the final DT, which in the majority of cases performs better than the source DT. The last conversion usually causes a loss of some knowledge contained in the neural network, but even so most knowledge is transformed into the final DT. If the approach is successful, then the final DT has better classification capabilities than the source DT.

Using Evolutionary Algorithms to Build Decision Trees

Evolutionary algorithms are generally used for very complex optimization tasks [18], for which no efficient heuristic methods exist. Construction of a DT is a complex task, but heuristic methods exist that usually work efficiently and reliably. Nevertheless, there are some reasons justifying an evolutionary approach. Because of the robustness of evolutionary techniques, they can also be successfully used on incomplete, noisy data (which often occurs in real-life data because of measurement errors, unavailability of proper instruments, risk to patients, etc.).
Because of the evolutionary principles used to evolve solutions, solutions can be found which could easily be overlooked otherwise. The possibility of optimizing the DT classifier's topology and adapting the split thresholds is an advantage as well. There have been several attempts to build a DT with the use of evolutionary techniques [6,31,35]; one of the most recent, and also very successful in various applications, is genTrees, developed by Podgorelec and Kokol [35,36,37]. First an initial population of (about one hundred) semi-random DTs is seeded. A random DT classifier is built by randomly choosing attributes and defining a split function. When a pre-selected number of test nodes has been placed into a growing tree, the tree is finalized with decision nodes, which are defined by choosing the most fit decision class in each node regarding the training set. After the initialization the population of decision trees evolves through

many generations of selection, crossover and mutation in order to optimize the fitness function that determines the quality of the evolved DT. According to the fitness function, the best trees (the most fit ones) have the lowest function values – the aim of the evolutionary process is to minimize the value of the local fitness function (LFF) for the best tree. With the combination of selection, which prioritizes better solutions, crossover, which works as a constructive operator toward local optima, and mutation, which works as a destructive operator in order to keep the needed genetic diversity, the search tends to be directed toward the globally optimal solution. The globally optimal solution is the most appropriate DT regarding the specific needs (expressed in the form of the fitness function). As the evolution repeats, higher-quality solutions are obtained regarding the chosen fitness function. One step beyond Podgorelec's approach was introduced by Sprogar with his evolutionary vector decision trees – VEDEC [49]. In his approach the evolution of DTs is similar to the one by Podgorelec, but the functionality of the DT is enhanced: not only is one possible decision class predicted in the leaf, but several possible questions are answered with a vector of decisions. In this manner, for example, a possible treatment for a patient is suggested together with the diagnosis, or several diagnoses are suggested at the same time, which is not possible in ordinary DTs. One of the most recent approaches to the evolutionary induction of DT-like methods is the AREX algorithm developed by Podgorelec and Kokol [34]. In this approach DTs are extended by so-called decision programs that are evolved with the help of automatic programming. In this way a classical attribute test can be replaced by a simple computer program, which greatly improves the flexibility, at the cost of computational resources, however.
The approach introduces a multi-level classification model, based on the differences between objects. In this manner "simple" objects are classified on the first level with simple rules, more "complex" objects are classified on the second level with more complex rules, etc.

Evolutionary Supported Decision Trees

Even though DT induction isn't as complicated from the parameter settings' point of view as some other machine-learning approaches, getting the best out of the approach still requires some time and skill. Manual setting of parameters takes more time than the induction of the tree itself, and even so we have no guarantee that the parameter settings are optimal. With MtDeciT3.1Gen, Zorman presented [59] an implementation of evolutionary supported DTs. This approach


did not change the induction process itself, since the basic principle remained the same. The change was in the evolutionary support, which is in charge of searching the space of induction parameters for the best possible combination. By using genetic operators like selection, crossover and mutation, evolutionary approaches are often capable of finding optimal solutions even in the most complex of search spaces, or at least they offer significant benefits over other search and optimization techniques. Enhanced with the evolutionary option, DTs don't offer us more generalizing power, but we gain better coverage of the method's parameter search space and a significant decrease in time spent compared to manual parameter setting. Better exploitation of a method usually manifests in better results, and the goal is reached. The role of the evolutionary algorithm is to find a combination of parameter settings which enables the induction of DTs that have high overall accuracy and high class accuracies, use as few attributes as possible, and are as small as possible.

Ensemble Methods

An extension of individual classifier models are ensemble methods, which use a set of induced classifiers and combine their outputs into a single classification. The purpose of combining several individual classifiers together is to achieve better classification results. All kinds of individual classifiers can be used to construct an ensemble; however, DTs are often used as one of the most appropriate and effective models. DTs are known to be very sensitive to changes in a learning set. Consequently, even small changes in a learning set can result in a very different DT classifier, which is a basis for the construction of efficient ensembles. Two well-known representatives of ensemble methods are Random forests and Rotation forests, which use DT classifiers, although other classifiers (especially in the case of Rotation forests) could be used as well.
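The committee idea behind such ensembles can be sketched as a simple majority vote. In this sketch each tree is any callable mapping an object to a class label; the lambda classifiers in the usage example are hypothetical stand-ins for induced DT classifiers.

```python
from collections import Counter

def forest_predict(trees, x):
    """Majority-rules voting over an ensemble: each member classifies
    the object x, and the most frequent label wins."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Three stand-in "trees": two vote 'square', one votes 'triangle'.
ensemble = [lambda x: "square", lambda x: "square", lambda x: "triangle"]
print(forest_predict(ensemble, None))  # prints square
```

Weighted voting or the statistical combination schemes mentioned above would replace the plain Counter with weighted tallies.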
Dealing with Missing Values
Like most machine-learning approaches, the DT approach assumes that all data is available at the time of induction and usage of the DT. Real databases rarely fit this assumption, so dealing with incomplete data is an important task. In this section some aspects of dealing with missing values of description attributes are given. The reasons for incomplete data vary: missing values that could not be measured, human errors when recording measurement results, the usage of databases that were not gathered for the purpose of machine learning, etc.

Missing values can cause problems in three different stages of induction and usage of DT:
- split selection in the phase of DT induction,
- partition of the training set in a test node during the phase of DT induction, and
- selecting which edge to follow from a test node when classifying unseen cases during the phase of DT usage.

Though some similarities can be found between the approaches for dealing with missing data in the listed cases, some specific solutions exist which are applicable only to one of the possible situations. Let us first list the possible approaches for dealing with missing values during split selection in the phase of DT induction:
- Ignore – the training object is ignored during evaluation of the attribute with the missing value,
- Reduce – the evaluation (for instance entropy, gain or gain ratio) of the attribute with missing values is changed according to the percentage of training objects with missing values,
- Substitute with DT – a new DT with a reduced number of attributes is induced in order to predict the missing value for the training object in question; this approach is best suited for discrete attributes,
- Replace – the missing value is replaced with the mean (continuous attributes) or most common (discrete attributes) value, and
- New value – missing values for discrete attributes are substituted with a new value ("unknown"), which is then treated the same way as the other possible values of the attribute.
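Two of the strategies above (Replace and Reduce) can be sketched in a few lines; the function names are ours and None marks a missing value:

```python
from collections import Counter

def replace_missing(column, kind):
    """'Replace' strategy: fill missing values with the mean of a continuous
    attribute or the most common value of a discrete one."""
    present = [v for v in column if v is not None]
    if kind == "continuous":
        fill = sum(present) / len(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in column]

def reduce_evaluation(evaluation, column):
    """'Reduce' strategy: scale an attribute's evaluation (e.g. gain) by the
    fraction of training objects whose value is actually known."""
    known = sum(v is not None for v in column)
    return evaluation * known / len(column)

replace_missing([1.0, None, 3.0], "continuous")     # [1.0, 2.0, 3.0]
replace_missing(["a", "b", None, "b"], "discrete")  # ['a', 'b', 'b', 'b']
reduce_evaluation(0.8, [0, None, 1, 1])             # ~0.6 (3 of 4 values known)
```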
The next seven approaches are suited for partitioning the training set in a test node during the phase of DT induction:
- Ignore – the training object is ignored during partition of the test set,
- Substitute with DT – a new DT with a reduced number of attributes is induced in order to predict the missing value for the training object in question; according to the predicted value we can propagate the training object to one of the successor nodes; this approach is best suited for discrete attributes,
- Replace – the missing value is replaced with the mean (continuous attributes) or most common (discrete attributes) value,
- Probability – the training object with the missing value is propagated to one of the edges according to probability, proportional to the strength of the subsets of training objects,
- Fraction – a fraction of the training object with a missing value is propagated to each of the subsets; the size of the fraction is proportional to the strength of each subset of training objects,
- All – the training object with the missing value is propagated to all subsets coming from the current node, and
- New value – training objects with the new value ("unknown") are assigned to a new subset and propagated further, following a new edge marked with the previously mentioned new value.

The last case for dealing with missing attribute values is during the decision-making phase, when we are selecting which edge to follow from a test node while classifying unseen cases:
- Substitute with DT – a new DT with a reduced number of attributes is induced in order to predict the missing value for the object in question; according to the predicted value we can propagate the unseen object to one of the successor nodes; this approach is best suited for discrete attributes,
- Replace – the missing value is replaced with the mean (continuous attributes) or most common (discrete attributes) value,
- All – the unseen object with a missing value is propagated to all subsets coming from the current node; the final decision is subject to voting by all the leaf nodes we ended in,
- Stop – when facing a missing value, the decision-making process is stopped and the decision with the highest probability according to the training subset in the current node is returned, and
- New value – if there exists a special edge marked with the value "unknown", we follow that edge to the next node.

Evaluation of Quality
Evaluation of an induced DT is essential for establishing the quality of the learned model. For the evaluation it is essential to use a set of unseen instances – a testing set.
A testing set is a set of testing objects which have not been used in the process of inducing the DT and can therefore be used to objectively evaluate its quality. Quality is a complex term that can include several measures of efficiency. The most general measure of a DT's quality is the overall accuracy, defined as the percentage of correctly classified objects out of all objects (correctly classified and incorrectly classified).
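These quality measures can be computed directly from a testing set's confusion counts; a minimal sketch, in which the function name and example outcomes are ours, covering the overall accuracy together with the two-class measures (sensitivity and specificity) discussed later in this section:

```python
def evaluate(actual, predicted, positive="pos"):
    """Overall accuracy plus the two-class measures sensitivity and specificity."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),   # T / (T + F)
        "sensitivity": tp / (tp + fn),        # TP / (TP + FN)
        "specificity": tn / (tn + fp),        # TN / (TN + FP)
    }

# Made-up testing-set outcomes: 4 of 6 objects classified correctly.
actual    = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]
metrics = evaluate(actual, predicted)
```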

Accuracy can thus be calculated as

    ACC = T / (T + F)

where T stands for "true" cases (i.e. correctly classified objects) and F stands for "false" cases (i.e. incorrectly classified objects). The above measure is used to determine the overall classification accuracy. In many cases, the accuracy of each specific decision class is even more important than the overall accuracy. The separate class accuracy of the ith decision class is calculated as

    ACC_{k,i} = T_i / (T_i + F_i)

and the average accuracy over all decision classes is calculated as

    ACC_k = (1/M) Σ_{i=1}^{M} T_i / (T_i + F_i)

where M represents the number of decision classes. In many real-world situations there are exactly two possible decision classes (i.e. positive and negative cases). In such a case the common measures are sensitivity and specificity:

    Sens = TP / (TP + FN),    Spec = TN / (TN + FP)

where TP stands for "true positives" (positive cases correctly classified as positive), TN stands for "true negatives" (negative cases correctly classified as negative), FP stands for "false positives" (negative cases incorrectly classified as positive), and FN stands for "false negatives" (positive cases incorrectly classified as negative).

Besides accuracy, various other measures of efficiency can be used to determine the quality of a DT. Which measures are used depends greatly on the dataset and the problem domain. They include complexity (i.e. the size and structure of an induced DT), generality (the difference in accuracy between training objects and testing objects), the number of used attributes, etc.

Applications and Available Software
DT are one of the most popular data-mining models because they are easy to induce, understand, and interpret. High popularity also means many available commercial and free applications. Let us mention a few. We have to start with Quinlan's C4.5 and C5.0/See5 (the MS Windows/XP/Vista version of C5.0), the classic DT tool. It is commercial and available at http://www.rulequest.com/see5-info.html.


Another popular commercial tool is CART 5, based on the original classification and regression trees algorithm. It is implemented in an intuitive Windows-based environment and is commercially available at http://www.salford-systems.com/1112.php. The Weka (Waikato Environment for Knowledge Analysis) environment contains a collection of visualization tools and algorithms for data analysis and predictive modeling, among them also some variants of DT. Besides its graphical user interface for easy access, one of its great advantages is the possibility of extending the tool with one's own implementations of data-analysis algorithms in Java. A free version is available at http://www.cs.waikato.ac.nz/ml/weka/. OC1 (Oblique Classifier 1) is a decision tree induction system written in C and designed for applications where the instances have numeric attribute values. OC1 builds DT that contain linear combinations of one or more attributes at each internal node; these trees then partition the space of examples with both oblique and axis-parallel hyperplanes. It is available for free at http://www.cs.jhu.edu/~salzberg/announce-oc1.html. There exist many other implementations; some are available as part of larger commercial packages like SPSS, Matlab, etc.

Future Directions
DT have reached the stage of being one of the fundamental classification tools used on a daily basis, both in academia and in industry. As the complexity of stored data grows and information systems become more and more sophisticated, the need for efficient, reliable and interpretable intelligent methods is growing rapidly. DT have been around for a long time and have matured and proved their quality. Although the present state of research regarding DT gives reasons to think there is nothing left to explore, we can expect major developments in some trends of DT research that are still open. It is our belief that DT will have a great impact on the development of hybrid intelligent systems in the near future.
Bibliography

Primary Literature
1. Babic SH, Kokol P, Stiglic MM (2000) Fuzzy decision trees in the support of breastfeeding. In: Proceedings of the 13th IEEE Symposium on Computer-Based Medical Systems CBMS'2000, Houston, pp 7–11
2. Banerjee A (1994) Initializing neural networks using decision trees. In: Proceedings of the International Workshop on Computational Learning Theory and Natural Learning Systems, Cambridge, pp 3–15
3. Bonner G (2001) Decision making for health care professionals: use of decision trees within the community mental health setting. J Adv Nursing 35:349–356
4. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
5. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
6. Cantu-Paz E, Kamath C (2000) Using evolutionary algorithms to induce oblique decision trees. In: Proceedings of the Genetic and Evolutionary Computation Conference GECCO-2000, Las Vegas, pp 1053–1060
7. Craven MW, Shavlik JW (1996) Extracting tree-structured representations of trained networks. In: Advances in Neural Information Processing Systems, vol 8. MIT Press, Cambridge
8. Crawford S (1989) Extensions to the CART algorithm. Int J Man-Mach Stud 31(2):197–217
9. Cremilleux B, Robert C (1997) A theoretical framework for decision trees in uncertain domains: Application to medical data sets. In: Lecture Notes in Artificial Intelligence, vol 1211. Springer, London, pp 145–156
10. Dantchev N (1996) Therapeutic decision trees in psychiatry. Encephale-Revue Psychiatr Clinique Biol Therap 22(3):205–214
11. Dietterich TG, Kong EB (1995) Machine learning bias, statistical bias and statistical variance of decision tree algorithms. Technical report, Oregon State University, Corvallis
12. Feigenbaum EA, Simon HA (1962) A theory of the serial position effect. Br J Psychol 53:307–320
13. Freund Y (1995) Boosting a weak learning algorithm by majority. Inf Comput 121:256–285
14. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Machine Learning: Proceedings of the Thirteenth International Conference. Morgan Kaufmann, San Francisco, pp 148–156
15. Gambhir SS (1999) Decision analysis in nuclear medicine. J Nucl Med 40(9):1570–1581
16. Gehrke J (2003) Decision trees. In: Ye N (ed) The Handbook of Data Mining. Lawrence Erlbaum, Mahwah
17. Goebel M, Gruenwald L (1999) A survey of data mining software tools. SIGKDD Explor 1(1):20–33
18. Goldberg DE (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading
19. Hand D (1997) Construction and assessment of classification rules. Wiley, Chichester
20. Heath D et al (1993) k-DT: A multi-tree learning method. In: Proceedings of the Second International Workshop on Multistrategy Learning, Harpers Ferry, pp 138–149
21. Heath D et al (1993) Learning oblique decision trees. In: Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence IJCAI-93, pp 1002–1007
22. Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844
23. Hunt EB, Marin J, Stone PT (1966) Experiments in Induction. Academic Press, New York, pp 45–69
24. Jones JK (2001) The role of data mining technology in the identification of signals of possible adverse drug reactions: Value and limitations. Curr Ther Res-Clin Exp 62(9):664–672
25. Kirkpatrick S et al (1983) Optimization by simulated annealing. Science 220(4598):671–680
26. Kokol P, Zorman M, Stiglic MM, Malcic I (1998) The limitations of decision trees and automatic learning in real world medical decision making. In: Proceedings of the 9th World Congress on Medical Informatics MEDINFO'98, 52, pp 529–533
27. Letourneau S, Jensen L (1998) Impact of a decision tree on chronic wound care. J Wound Ostomy Conti Nurs 25:240–247
28. Lim T-S, Loh W-Y, Shih Y-S (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach Learn 48:203–228
29. Murthy KVS (1997) On growing better decision trees from data. PhD dissertation, Johns Hopkins University, Baltimore
30. Neapolitan R, Naimipour K (1996) Foundations of Algorithms. DC Heath, Lexington
31. Nikolaev N, Slavov V (1998) Inductive genetic programming with decision trees. Intell Data Anal Int J 2(1):31–44
32. Ohno-Machado L, Lacson R, Massad E (2000) Decision trees and fuzzy logic: A comparison of models for the selection of measles vaccination strategies in Brazil. In: Proceedings of the AMIA Symposium 2000, Los Angeles, pp 625–629
33. Paterson A, Niblett TB (1982) ACLS Manual. Intelligent Terminals, Edinburgh
34. Podgorelec V (2001) Intelligent systems design and knowledge discovery with automatic programming. PhD thesis, University of Maribor
35. Podgorelec V, Kokol P (1999) Induction of medical decision trees with genetic algorithms. In: Proceedings of the International ICSC Congress on Computational Intelligence Methods and Applications CIMA. Academic Press, Rochester
36. Podgorelec V, Kokol P (2001) Towards more optimal medical diagnosing with evolutionary algorithms. J Med Syst 25(3):195–219
37. Podgorelec V, Kokol P (2001) Evolutionary decision forests – decision making with multiple evolutionary constructed decision trees. In: Problems in Applied Mathematics and Computational Intelligence. WSES Press, pp 97–103
38. Quinlan JR (1979) Discovering rules by induction from large collections of examples. In: Michie D (ed) Expert Systems in the Micro Electronic Age. University Press, Edinburgh, pp 168–201
39. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106
40. Quinlan JR (1987) Simplifying decision trees. Int J Man-Mach Stud 27:221–234
41. Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco
42. Rich E, Knight K (1991) Artificial Intelligence, 2nd edn. McGraw-Hill, New York
43. Sanders GD, Hagerty CG, Sonnenberg FA, Hlatky MA, Owens DK (2000) Distributed decision support using a web-based interface: prevention of sudden cardiac death. Med Decis Making 19(2):157–166
44. Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227
45. Shannon C, Weaver W (1949) The Mathematical Theory of Communication. University of Illinois Press, Champaign
46. Shlien S (1992) Multiple binary decision tree classifiers. Pattern Recognit Lett 23(7):757–763
47. Sims CJ, Meyn L, Caruana R, Rao RB, Mitchell T, Krohn M (2000) Predicting cesarean delivery with decision tree models. Am J Obstet Gynecol 183:1198–1206
48. Smyth P, Goodman RM (1991) Rule induction using information theory. In: Piatetsky-Shapiro G, Frawley WJ (eds) Knowledge Discovery in Databases. AAAI Press, Cambridge, pp 159–176
49. Sprogar M, Kokol P, Hleb S, Podgorelec V, Zorman M (2000) Vector decision trees. Intell Data Anal 4(3–4):305–321
50. Tou JT, Gonzalez RC (1974) Pattern Recognition Principles. Addison-Wesley, Reading
51. Tsien CL, Fraser HSF, Long WJ, Kennedy RL (1998) Using classification tree and logistic regression methods to diagnose myocardial infarction. In: Proceedings of the 9th World Congress on Medical Informatics MEDINFO'98, 52, pp 493–497
52. Tsien CL, Kohane IS, McIntosh N (2000) Multiple signal integration by decision tree induction to detect artifacts in the neonatal intensive care unit. Artif Intell Med 19(3):189–202
53. Utgoff PE (1989) Incremental induction of decision trees. Mach Learn 4(2):161–186
54. Utgoff PE (1989) Perceptron trees: a case study in hybrid concept representations. Connect Sci 1:377–391
55. White AP, Liu WZ (1994) Bias in information-based measures in decision tree induction. Mach Learn 15:321–329
56. Zorman M, Hleb S, Sprogar M (1999) Advanced tool for building decision trees MtDecit 2.0. In: Arabnia HR (ed) Proceedings of the International Conference on Artificial Intelligence ICAI'99, Las Vegas
57. Zorman M, Kokol P, Podgorelec V (2000) Medical decision making supported by hybrid decision trees. In: Proceedings of the ICSC Symposia on Intelligent Systems & Applications ISA'2000. ICSC Academic Press, Wollongong
58. Zorman M, Podgorelec V, Kokol P, Peterson M, Lane J (2000) Decision tree's induction strategies evaluated on a hard real world problem. In: Proceedings of the 13th IEEE Symposium on Computer-Based Medical Systems CBMS'2000, Houston, pp 19–24
59. Zorman M, Sigut JF, de la Rosa SJL, Alayón S, Kokol P, Verlič M (2006) Evolutionary built decision trees for supervised segmentation of follicular lymphoma images. In: Proceedings of the 9th IASTED International Conference on Intelligent Systems and Control, Honolulu, pp 182–187

Books and Reviews
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont
Han J, Kamber M (2006) Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco
Hand D, Mannila H, Smyth P (2001) Principles of Data Mining. MIT Press, Cambridge
Kantardzic M (2003) Data Mining: Concepts, Models, Methods, and Algorithms. Wiley, San Francisco
Mitchell TM (1997) Machine Learning. McGraw-Hill, New York
Quinlan JR (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco
Ye N (ed) (2003) The Handbook of Data Mining. Lawrence Erlbaum, Mahwah


Dependency and Granularity in Data-Mining
SHUSAKU TSUMOTO, SHOJI HIRANO
Department of Medical Informatics, Shimane University, School of Medicine, Enya-cho, Izumo City, Shimane, Japan

Article Outline
Definition of the Subject
Introduction
Contingency Table from Rough Sets
Rank of Contingency Table (2 × 2)
Rank of Contingency Table (m × n)
Rank and Degree of Dependence
Degree of Granularity and Dependence
Conclusion
Acknowledgment
Bibliography

Definition of the Subject
The degree of granularity of a contingency table is closely related to the degree of dependence of contingency tables. We investigate these relations from the viewpoint of determinantal divisors and determinants. The results on determinantal divisors suggest that the divisors provide information on the degree of dependence between the matrix of all elements and its submatrices, and that an increase in the degree of granularity may lead to an increase in dependency. However, another approach shows that a constraint on the sample size of a contingency table is very strong, which leads to an evaluation formula in which an increase in the degree of granularity gives a decrease in dependency.

Introduction
Independence (dependence) is a very important concept in data mining, especially for feature selection. In rough sets [2], if two attribute-value pairs, say [c = 0] and [d = 0], are independent, their supporting sets, denoted by C and D, do not have an overlapping region (C ∩ D = ∅), which means that an attribute independent of a given target concept may not appear in the classification rule for the concept. This idea is also frequently used in other rule discovery methods: let us consider deterministic rules, described as if–then rules, which can be viewed as classic propositions (C → D). From the set-theoretical point of view, the set of examples supporting the conditional part of a deterministic rule, denoted by C, is a subset of the set whose examples belong to the consequence part, denoted by D. That is, the relation C ⊆ D holds, and deterministic rules are supported only by positive examples in a dataset [4]. When such a subset relation is not satisfied, indeterministic rules can be defined as if–then rules with probabilistic information [6]. From the set-theoretical point of view, C is not a subset of D but closely overlaps with it; that is, the relations C ∩ D ≠ ∅ and |C ∩ D|/|C| ≥ δ hold in this case. (The threshold δ is the degree of closeness of the overlapping sets, which will be given by domain experts; for more information, please refer to Sect. "Rank of Contingency Table (2 × 2)".) Thus, probabilistic rules are supported by a large number of positive examples and a small number of negative examples. On the other hand, in a probabilistic context, independence of two attributes means that one attribute (a1) will not influence the occurrence of the other attribute (a2), which is formulated as p(a2 | a1) = p(a2).

Although independence is a very important concept, it has not been fully and formally investigated as a relation between two attributes. Tsumoto introduced linear algebra into the formal analysis of a contingency table [5], with the following interesting results. First, a contingency table can be viewed as a comparison between two attributes with respect to information granularity. Second, linear algebra is the key to the analysis of this table: a contingency table can be viewed as a matrix, and several operations and ideas of matrix theory can be introduced into its analysis. Especially for the degree of independence, rank plays a very important role in extracting a probabilistic model from a given contingency table.

This paper presents a further investigation into the degree of independence of a contingency matrix. Intuitively and empirically, when two attributes have many values, the dependence between the two attributes becomes low. However, from the results on determinantal divisors, it seems that the divisors provide information on the degree of dependence between the matrix of all elements and its submatrices, and that an increase in the degree of granularity may lead to an increase in dependency. The key to resolving this conflict is to consider constraints on the sample size. In this paper we show that the constraint on the sample size of a contingency table is very strong, which leads to an evaluation formula in which an increase in the degree of granularity gives a decrease in dependency.

The paper is organized as follows: Sect. "Contingency Table from Rough Sets" gives preliminaries. Sect. "Rank of Contingency Table (2 × 2)" discusses the former results. Sect. "Rank of Contingency Table (m × n)" gives the relations between rank and submatrices of a matrix. Finally, Sect. "Degree of Granularity and Dependence" concludes this paper.

Contingency Table from Rough Sets

Notations
In the subsequent sections, the following notation, introduced in [3], is adopted. Let U denote a nonempty, finite set called the universe and A a nonempty, finite set of attributes, that is, a : U → V_a for a ∈ A, where V_a is called the domain of a. Then, a decision table is defined as an information system A = (U, A ∪ {D}), where {D} is a set of given decision attributes. The atomic formulas over B ⊆ A ∪ {D} and V are expressions of the form [a = v], called descriptors over B, where a ∈ B and v ∈ V_a. The set F(B, V) of formulas over B is the least set containing all atomic formulas over B and closed with respect to disjunction, conjunction and negation. For each f ∈ F(B, V), f_A denotes the meaning of f in A, that is, the set of all objects in U with property f, defined inductively as follows:

1. If f is of the form [a = v], then f_A = {s ∈ U | a(s) = v}.
2. (f ∧ g)_A = f_A ∩ g_A; (f ∨ g)_A = f_A ∪ g_A; (¬f)_A = U − f_A.

Definition 1 Let R1 and R2 denote multinomial attributes in an attribute space A, where R1 takes the values A1, …, An and R2 takes the values B1, …, Bm. A contingency table is a table of the set described by the following formulas: |[R1 = Aj]_A|, |[R2 = Bi]_A|, |[R1 = Aj ∧ R2 = Bi]_A|, |[R1 = A1 ∧ R1 = A2 ∧ … ∧ R1 = An]_A|, |[R2 = B1 ∧ R2 = B2 ∧ … ∧ R2 = Bm]_A| and |U| (i = 1, 2, 3, …, m and j = 1, 2, 3, …, n). This table is arranged into the form shown in Table 1, where |[R1 = Aj]_A| = Σ_{i=1}^{m} x_ij = x·j, |[R2 = Bi]_A| = Σ_{j=1}^{n} x_ij = xi·, |[R1 = Aj ∧ R2 = Bi]_A| = x_ij, and |U| = N = x··.

Dependency and Granularity in Data-Mining, Table 1 Contingency table (m × n)

        A1     A2     …    An     Sum
B1      x11    x12    …    x1n    x1·
B2      x21    x22    …    x2n    x2·
…       …      …      …    …      …
Bm      xm1    xm2    …    xmn    xm·
Sum     x·1    x·2    …    x·n    x·· = |U| = N

Example 1 Let us consider the information table shown in Table 2.

Dependency and Granularity in Data-Mining, Table 2 A small dataset

     a  b  c  d  e
1    1  0  0  0  1
2    0  0  1  1  1
3    0  1  2  2  0
4    1  1  1  2  1
5    0  0  2  1  0

The relationship between b and e can be examined by using the corresponding contingency table. First, the frequencies of four elementary relations, called marginal distributions, are counted: [b = 0], [b = 1], [e = 0], and [e = 1]. Then, the frequencies of four kinds of conjunction are counted: [b = 0] ∧ [e = 0], [b = 0] ∧ [e = 1], [b = 1] ∧ [e = 0], and [b = 1] ∧ [e = 1]. Then, the following contingency table is obtained (Table 3).

Dependency and Granularity in Data-Mining, Table 3 Corresponding contingency table

       b=0   b=1   Sum
e=0    1     1     2
e=1    2     1     3
Sum    3     2     5

From this table, the accuracy and coverage for [b = 0] → [e = 0] are obtained as 1/(1 + 2) = 1/3 and 1/(1 + 1) = 1/2.

One of the important observations from granular computing is that a contingency table shows the relations between two attributes with respect to the intersection of their supporting sets. For example, in Table 3, both b and e have two different partitions of the universe, and the table gives the relation between b and e with respect to the intersection of the supporting sets. It is easy to see that this idea can be extended to an n × n contingency table, which can be viewed as an n × n matrix. When the two attributes have different numbers of equivalence classes, the situation may be a little more complicated. But in this case, thanks to linear algebra, we only have to consider the attribute with the smaller number of equivalence classes; the surplus equivalence classes of the attribute with the larger number can be projected onto the other partitions. In other words, an m × n matrix or contingency table includes a projection from one attribute to the other.

Rank of Contingency Table (2 × 2)

Preliminaries
Definition 2 A corresponding matrix C_{T_{a,b}} is defined as a matrix the elements of which are equal to the value of the


corresponding contingency table T_{a,b} of two attributes a and b, excluding the marginal values.

Definition 3 The rank of a table is defined as the rank of its corresponding matrix. The maximum value of the rank is equal to the size of the (square) matrix, denoted by r.

Example 2 Let the table given in Table 3 be defined as T_{b,e}. Then, C_{T_{b,e}} is:

    ( 1  1 )
    ( 2  1 )

Since the determinant det(C_{T_{b,e}}) is not equal to 0, the rank of C_{T_{b,e}} is equal to 2. This is the maximum value (r = 2), so b and e are statistically dependent.

Independence when the Table is 2 × 2
From the application of linear algebra, several results are obtained (the proofs are omitted). First, it is assumed that a contingency table is given with m = 2 and n = 2 in Table 1. Then the corresponding matrix C_{T_{R1,R2}} is given as:

    ( x11  x12 )
    ( x21  x22 )

Proposition 1 The determinant det(C_{T_{R1,R2}}) is equal to |x11·x22 − x12·x21|.

Proposition 2 The rank will be:

    rank = 2, if det(C_{T_{R1,R2}}) ≠ 0
    rank = 1, if det(C_{T_{R1,R2}}) = 0.

If the rank of C_{T_{b,e}} is equal to 1, then according to the theorems of linear algebra one row (column) can be represented by the other row (column). That is,

Proposition 3 Let r1 and r2 denote the rows of the corresponding matrix of a given 2 × 2 table, C_{T_{b,e}}. That is,

    r1 = (x11, x12),   r2 = (x21, x22).

Then, r1 can be represented by r2: r1 = k·r2, where k is given as:

    k = x11/x21 = x12/x22 = x1·/x2·.

From this proposition, the following theorem is obtained.

Theorem 1 If the rank of the corresponding matrix is 1, then the two attributes in the given contingency table are statistically independent. Thus:

    rank = 2: dependent
    rank = 1: statistically independent.
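Theorem 1's rank criterion can be checked mechanically. The sketch below builds the corresponding matrix from the attribute columns of Table 2 and computes its rank by plain Gaussian elimination; the helper names are ours:

```python
def contingency(xs, ys):
    """Corresponding matrix of the contingency table of two attribute columns
    (rows indexed by the values of ys, columns by the values of xs)."""
    xvals, yvals = sorted(set(xs)), sorted(set(ys))
    return [[sum(x == xv and y == yv for x, y in zip(xs, ys)) for xv in xvals]
            for yv in yvals]

def rank(matrix, eps=1e-9):
    """Rank of a matrix via Gaussian elimination."""
    m = [[float(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if abs(m[i][col]) > eps), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

b = [0, 0, 1, 1, 0]   # columns b and e of Table 2
e = [1, 1, 0, 1, 0]
print(contingency(b, e))        # [[1, 1], [2, 1]], as in Example 2
print(rank(contingency(b, e)))  # 2 -> b and e are statistically dependent
```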

Rank of Contingency Table (m × n)

In the case of a general square matrix, the results for the 2 × 2 contingency table can be extended. It is especially important to observe that conventional statistical independence holds only when the rank of the corresponding matrix is equal to 1. Let us consider the contingency table of c and d in Table 2. Its corresponding matrix is:

    ( 1  0  0 )
    ( 0  1  1 )
    ( 0  1  1 )

whose determinant is equal to 0, and it is clear that its rank is 2. It is interesting to see that if the case [d = 0] is removed, then the rank of the corresponding matrix is equal to 1 and the two remaining rows are equal. Thus, if the value space of d is restricted to {1, 2}, then c and d are statistically independent. This relation is called contextual independence [1], which is related to conditional independence.

However, another type of weak independence can be observed: let us consider the contingency table of a and c, which is obtained as Table 4.

Dependency and Granularity in Data-Mining, Table 4 Contingency table for a and c

       a=0   a=1   Sum
c=0    0     1     1
c=1    1     1     2
c=2    2     0     2
Sum    3     2     5

Its corresponding matrix is:

    C_{T_{a,c}} = ( 0  1 )
                  ( 1  1 )
                  ( 2  0 )

Since the corresponding matrix is not square, the determinant is not defined, but it is easy to see that the rank of this matrix is two. In this case, removing any attribute-value pair from the table will not generate statistical independence. Finally, the relation between rank and independence in an n × n contingency table is obtained.

Theorem 2 Let the corresponding matrix of a given contingency table be a square n × n matrix. If the rank of the corresponding matrix is 1, then the two attributes in the given contingency table are statistically independent. If the rank of the corresponding matrix is n, then the two attributes are dependent. Otherwise, the two attributes are contextually dependent, which means that several conditional probabilities can be represented by a linear


combination of conditional probabilities. Thus:

    rank = n: dependent
    rank = 2, …, n − 1: contextually dependent
    rank = 1: statistically independent.

A point x ∈ X is periodic, if F^n(x) = x for some n > 0. The least positive n with this property is called the period of x. The orbit of x is the set O(x) = {F^n(x) : n > 0}. A set Y ⊆ X is positively invariant, if F(Y) ⊆ Y, and strongly invariant, if F(Y) = Y. A point x ∈ X is equicontinuous (x ∈ E_F) if the family of maps F^n is equicontinuous at x, i.e. x ∈ E_F iff

    (∀ε > 0)(∃δ > 0)(∀y ∈ B_δ(x))(∀n > 0)(d(F^n(y), F^n(x)) < ε).

The system (X, F) is almost equicontinuous if E_F ≠ ∅, and equicontinuous, if

    (∀ε > 0)(∃δ > 0)(∀x ∈ X)(∀y ∈ B_δ(x))(∀n > 0)(d(F^n(y), F^n(x)) < ε).

For an equicontinuous system E_F = X. Conversely, if E_F = X and if X is compact, then F is equicontinuous; this need not be true in the non-compact case. A system (X, F) is sensitive (to initial conditions), if

    (∃ε > 0)(∀x ∈ X)(∀δ > 0)(∃y ∈ B_δ(x))(∃n > 0)(d(F^n(y), F^n(x)) ≥ ε).

A sensitive system has no equicontinuous point. However, there exist systems with no equicontinuity points which are not sensitive. A system (X, F) is positively expansive, if

    (∃ε > 0)(∀x ≠ y ∈ X)(∃n ≥ 0)(d(F^n(x), F^n(y)) ≥ ε).

A positively expansive system on a perfect space is sensitive. A system (X, F) is (topologically) transitive, if for


Dynamics of Cellular Automata in Non-compact Spaces

any nonempty open sets U, V ⊆ X there exists n ≥ 0 such that F^n(U) ∩ V ≠ ∅. If X is perfect and the system has a dense orbit, then it is transitive. Conversely, if (X, F) is topologically transitive and X is compact, then (X, F) has a dense orbit. A system (X, F) is mixing if for any nonempty open sets U, V ⊆ X there exists k > 0 such that F^n(U) ∩ V ≠ ∅ for every n ≥ k. An ε-chain (from x_0 to x_n) is a sequence of points x_0, …, x_n ∈ X such that d(F(x_i), x_{i+1}) < ε for 0 ≤ i < n. A system (X, F) is chain-transitive if for any ε > 0 and any x, y ∈ X there exists an ε-chain from x to y. A strongly invariant closed set Y ⊆ X is stable if

  ∀ε > 0, ∃δ > 0, ∀x ∈ X, (d(x, Y) < δ ⟹ ∀n > 0, d(F^n(x), Y) < ε).

A strongly invariant closed stable set Y ⊆ X is an attractor if

The cylinder set of a set of words U ⊆ A^+ located at l ∈ Z is the set [U]_l = ⋃_{u∈U} [u]_l. A subshift is a nonempty subset Σ ⊆ A^Z such that Σ = Σ_D := {x ∈ A^Z : ∀u ⊑ x, u ∉ D} for some set D ⊆ A^+ of forbidden words. A subshift Σ_D is of finite type (SFT) if D is finite. A subshift is uniquely determined by its language

  L(Σ) := ⋃_{n≥0} L_n(Σ), where L_n(Σ) := {u ∈ A^n : ∃x ∈ Σ, u ⊑ x}.

A cellular automaton is a map F : A^Z → A^Z defined by F(x)_i = f(x_{[i−r, i+r]}), where r ≥ 0 is a radius and f : A^{2r+1} → A is a local rule. In particular, the shift map σ : A^Z → A^Z is defined by σ(x)_i := x_{i+1}. A local rule extends to the map f : A^* → A^* by f(u)_i = f(u_{[i, i+2r]}), so that |f(u)| = max{|u| − 2r, 0}.
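As an illustration of this definition, here is a minimal sketch (my own, not from the article) of the global map F restricted to spatially periodic configurations, where a single period suffices to represent u^∞:

```python
def ca_step(u, f, r):
    """One step of the global map F on the spatially periodic
    configuration u^inf: F(x)_i = f(x_{i-r}, ..., x_{i+r}).
    u is one period of the configuration, given as a tuple."""
    n = len(u)
    return tuple(f(tuple(u[(i + j) % n] for j in range(-r, r + 1)))
                 for i in range(n))

# Example: the local rule of ECA90, f(a, b, c) = (a + c) mod 2, radius r = 1.
xor_rule = lambda w: (w[0] + w[2]) % 2
print(ca_step((0, 0, 0, 1, 0, 0, 0, 0), xor_rule, 1))
# -> (0, 0, 1, 0, 1, 0, 0, 0)
```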

  ∃δ > 0, ∀x ∈ X, (d(x, Y) < δ ⟹ lim_{n→∞} d(F^n(x), Y) = 0).

Definition 3 Let F : A^Z → A^Z be a CA.

Theorem 1 (Knudsen [10]) Let (X, F) be a dynamical system and Y ⊆ X a dense, F-invariant subset.

(1) A word u ∈ A^+ is m-blocking if |u| ≥ m and there exists an offset d ≤ |u| − m such that ∀x, y ∈ [u]_0, ∀n > 0, F^n(x)_{[d, d+m)} = F^n(y)_{[d, d+m)}.
(2) A set U ⊆ A^+ is spreading if [U] is F-invariant and there exists n > 0 such that F^n([U]) ⊆ σ^{−1}([U]) ∩ σ([U]).

(1) (X; F) is sensitive iff (Y; F) is sensitive. (2) (X; F) is transitive iff (Y; F) is transitive.

The following results will be useful in the sequel.

Recall that a space X is separable, if it has a countable dense set.

Proposition 4 (Formenti, Kůrka [5]) Let F : A^Z → A^Z be a CA and let U ⊆ A^+ be an invariant set. Then Ω_F([U]) is a subshift iff U is spreading.

Theorem 2 (Blanchard, Formenti, and K˚urka [2]) Let (X; F) be a dynamical system on a non-separable space. If (X; F) is transitive, then it is sensitive.

Theorem 5 (Hedlund [6]) Let F : A^Z → A^Z be a CA with local rule f : A^{2r+1} → A. Then F is surjective iff f : A^* → A^* is surjective iff |f^{−1}(u)| = |A|^{2r} for each u ∈ A^+.
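Hedlund's balance condition can be tested mechanically on small examples. The sketch below is my own: it only checks words up to a fixed length, so a pass is merely evidence of surjectivity, while any failure certifies non-surjectivity.

```python
from itertools import product

def is_balanced(f, alphabet, r, max_len=4):
    """Check Hedlund's balance condition on words up to max_len:
    a surjective CA gives every u in A^+ exactly |A|^(2r)
    preimages under the extended local rule."""
    target = len(alphabet) ** (2 * r)
    for n in range(1, max_len + 1):
        for u in product(alphabet, repeat=n):
            pre = sum(1 for w in product(alphabet, repeat=n + 2 * r)
                      if tuple(f(w[i:i + 2 * r + 1]) for i in range(n)) == u)
            if pre != target:
                return False
    return True

A = (0, 1)
xor = lambda w: (w[0] + w[2]) % 2      # local rule of ECA90, surjective
prod3 = lambda w: w[0] * w[1] * w[2]   # local rule of ECA128, not surjective

print(is_balanced(xor, A, 1))    # True
print(is_balanced(prod3, A, 1))  # False: (1,) has a single preimage 111
```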

Cellular Automata

Submeasures

For a finite alphabet A, denote by |A| the number of its elements, by A^* := ⋃_{n≥0} A^n the set of words over A, and by A^+ := ⋃_{n>0} A^n = A^* \ {λ} the set of nonempty words. The length of a word u ∈ A^n is denoted by |u| := n. We say that u ∈ A^* is a subword of v ∈ A^* (u ⊑ v) if there exists k such that v_{k+i} = u_i for all i < |u|. We denote by u_{[i,j)} = u_i … u_{j−1} and u_{[i,j]} = u_i … u_j the subwords of u associated to intervals. We denote by A^Z the set of A-configurations, i.e., doubly-infinite sequences of letters of A. For any u ∈ A^+ we have a periodic configuration u^∞ ∈ A^Z defined by (u^∞)_{k|u|+i} = u_i for k ∈ Z and 0 ≤ i < |u|. The cylinder of a word u ∈ A^* located at l ∈ Z is the set [u]_l = {x ∈ A^Z : x_{[l, l+|u|)} = u}.

A set W ⊆ X is inward if F(W) ⊆ W°. In compact spaces, attractors are exactly the Ω-limits Ω_F(W) = ⋂_{n>0} F^n(W) of inward sets.

A pseudometric on a set X is a map d : X × X → [0, ∞) which satisfies the following conditions:
1. d(x, y) = d(y, x),
2. d(x, z) ≤ d(x, y) + d(y, z).
If moreover d(x, y) > 0 for x ≠ y, then we say that d is a metric. There is a standard method to create pseudometrics from submeasures. A bounded submeasure (with bound M ∈ R^+) is a map φ : P(Z) → [0, M] which satisfies the following conditions:
1. φ(∅) = 0,
2. φ(U) ≤ φ(U ∪ V) ≤ φ(U) + φ(V) for U, V ⊆ Z.


A bounded submeasure φ on Z defines a pseudometric d_φ : A^Z × A^Z → [0, ∞) by d_φ(x, y) := φ({i ∈ Z : x_i ≠ y_i}). The Cantor, Besicovitch and Weyl pseudometrics on A^Z are defined by the following submeasures:

  φ_C(U) := 2^{−min{|i| : i ∈ U}},
  φ_B(U) := limsup_{l→∞} |U ∩ [−l, l)| / (2l),
  φ_W(U) := limsup_{l→∞} sup_{k∈Z} |U ∩ [k, k+l)| / l.

The Cantor Space

The Cantor metric on A^Z is defined by d_C(x, y) = 2^{−k}, where k = min{|i| : x_i ≠ y_i}, so that d_C(x, y) < 2^{−k} iff x_{[−k,k]} = y_{[−k,k]}. We denote by C_A = (A^Z, d_C) the metric space of two-sided configurations with the metric d_C. The cylinders are clopen sets in C_A. All Cantor spaces (with different alphabets) are homeomorphic. The Cantor space is compact, totally disconnected and perfect, and conversely, every space with these properties is homeomorphic to a Cantor space. The literature on CA dynamics in Cantor spaces is vast; in this section we just recall some results and definitions which will be used later.

Theorem 6 (Kůrka [11]) Let (C_A, F) be a CA with radius r.
(1) (C_A, F) is almost equicontinuous iff there exists an r-blocking word for F.
(2) (C_A, F) is equicontinuous iff all sufficiently long words are r-blocking.

Denote by E_F the set of equicontinuous points of F. The sets of equicontinuous directions and almost equicontinuous directions of a CA (C_A, F) (see Sablik [15]) are defined by

  E(F) = {p/q : p ∈ Z, q ∈ N^+, E_{F^q σ^p} = A^Z},
  A(F) = {p/q : p ∈ Z, q ∈ N^+, E_{F^q σ^p} ≠ ∅}.
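A small illustration (mine, not from the article) of the Cantor metric, with configurations given as functions Z → A and the minimum searched over a finite window:

```python
def cantor_distance(x, y, width=32):
    """Cantor distance d_C(x, y) = 2^(-min{|i| : x_i != y_i}),
    with x, y given as functions Z -> A, scanned on [-width, width]."""
    for k in range(width + 1):
        if x(k) != y(k) or x(-k) != y(-k):
            return 2.0 ** (-k)
    return 0.0   # configurations agree on the whole window scanned

zeros = lambda i: 0
one_at_5 = lambda i: 1 if abs(i) == 5 else 0

print(cantor_distance(zeros, one_at_5))   # 2**-5 = 0.03125
```

Two configurations are thus close exactly when they agree on a large central block, which is why the cylinders are clopen in C_A.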

The Periodic Space

Definition 7 The periodic space P_A = {x ∈ A^Z : ∃n > 0, σ^n(x) = x} over an alphabet A consists of shift-periodic configurations with the Cantor metric d_C.

All periodic spaces (with different alphabets) are homeomorphic. The periodic space is not compact, but it is totally disconnected and perfect. It is dense in C_A. If (C_A, F) is a CA, then F(P_A) ⊆ P_A. We denote by F_P : P_A → P_A the restriction of F to P_A, so (P_A, F_P) is a (non-compact) dynamical system. Every F_P-orbit is finite, so every point x ∈ P_A is F_P-eventually periodic.

Theorem 8 Let F be a CA over alphabet A.
(1) (C_A, F) is surjective iff (P_A, F_P) is surjective.
(2) (C_A, F) is equicontinuous iff (P_A, F_P) is equicontinuous.
(3) (C_A, F) is almost equicontinuous iff (P_A, F_P) is almost equicontinuous.
(4) (C_A, F) is sensitive iff (P_A, F_P) is sensitive.
(5) (C_A, F) is transitive iff (P_A, F_P) is transitive.

Proof (1a) Let F be surjective, let y ∈ P_A and σ^n(y) = y. There exist z ∈ F^{−1}(y) and integers i < j such that z_{[in−r, in+r)} = z_{[jn−r, jn+r)}. Then x = (z_{[in−r, jn−r)})^∞ ∈ P_A and F_P(x) = y, so F_P is surjective.
(1b) Let F_P be surjective and u ∈ A^+. Then u^∞ has an F_P-preimage and therefore u has a preimage under the local rule. By the Hedlund theorem, (C_A, F) is surjective.
(2a) Since P_A ⊆ C_A, the equicontinuity of F trivially implies the equicontinuity of F_P.
(2b) Let F_P be equicontinuous. There exists m > r such that if x, y ∈ P_A and x_{[−m,m]} = y_{[−m,m]}, then F^n(x)_{[−r,r]} = F^n(y)_{[−r,r]} for all n ≥ 0. We claim that all words of length 2m+1 are (2r+1)-blocking with offset m−r. If not, then for some x, y ∈ A^Z with x_{[−m,m]} = y_{[−m,m]} there exists n > 0 such that F^n(x)_{[−r,r]} ≠ F^n(y)_{[−r,r]}. For the periodic configurations x′ = (x_{[−m−nr, m+nr]})^∞ and y′ = (y_{[−m−nr, m+nr]})^∞ we get F^n(x′)_{[−r,r]} ≠ F^n(y′)_{[−r,r]}, contradicting the assumption. By Theorem 6, F is C-equicontinuous.
(3a) If (C_A, F) is almost equicontinuous, then there exists an r-blocking word u, and u^∞ ∈ P_A is an equicontinuous configuration for (P_A, F_P).
(3b) The proof is analogous to that of (2b).
(4) and (5) follow from Theorem 1 of Knudsen. □

The Toeplitz Space

Definition 9 Let A be an alphabet.
(1) The Besicovitch pseudometric on A^Z is defined by

  d_B(x, y) = limsup_{l→∞} |{j ∈ [−l, l) : x_j ≠ y_j}| / (2l).


(2) The Weyl pseudometric on A^Z is defined by

  d_W(x, y) = limsup_{l→∞} max_{k∈Z} |{j ∈ [k, k+l) : x_j ≠ y_j}| / l.

Clearly d_B(x, y) ≤ d_W(x, y), and

  d_B(x, y) < ε ⟺ ∃l_0 ∈ N, ∀l ≥ l_0, |{j ∈ [−l, l] : x_j ≠ y_j}| < (2l+1)ε,
  d_W(x, y) < ε ⟺ ∃l_0 ∈ N, ∀l ≥ l_0, ∀k ∈ Z, |{j ∈ [k, k+l) : x_j ≠ y_j}| < lε.

Both d_B and d_W are symmetric and satisfy the triangle inequality, but they are not metrics: distinct configurations x, y ∈ A^Z can have zero distance. We construct a set of regular quasi-periodic configurations on which d_B and d_W coincide and are metrics.

Definition 10
(1) The period of k ∈ Z in x ∈ A^Z is r_k(x) := inf{p > 0 : ∀n ∈ Z, x_{k+np} = x_k}. We set r_k(x) = ∞ if the defining set is empty.
(2) x ∈ A^Z is quasi-periodic if r_k(x) < ∞ for all k ∈ Z.
(3) A periodic structure for a quasi-periodic configuration x is a sequence of positive integers p = (p_i)_i.

Proof (1) We must show d_W(x, y) ≤ d_B(x, y). Let p^x, p^y be the periodic structures for x and y, and let p_i = k_i^x p_i^x = k_i^y p_i^y be the lowest common multiple of p_i^x and p_i^y. Then p = (p_i)_i is a periodic structure for both x and y. For each i > 0 and each k ∈ Z we have

  |{j ∈ [k−p_i, k+p_i) : x_j ≠ y_j}| ≤ 2k_i^x q_i^x + 2k_i^y q_i^y + |{j ∈ [−p_i, p_i) : x_j ≠ y_j}|,

so that

  d_W(x, y) ≤ lim_{i→∞} max_{k∈Z} |{j ∈ [k−p_i, k+p_i) : x_j ≠ y_j}| / (2p_i)
           ≤ lim_{i→∞} ( 2k_i^x q_i^x / (2k_i^x p_i^x) + 2k_i^y q_i^y / (2k_i^y p_i^y) + |{j ∈ [−p_i, p_i) : x_j ≠ y_j}| / (2p_i) )
           = d_B(x, y).

(2) Since x ≠ y, there exists i such that for some k ∈ [0, p_i) and for all n ∈ Z we have x_{k+np_i} = x_k ≠ y_k = y_{k+np_i}. It follows that d_B(x, y) ≥ 1/p_i. □

Definition 12 The Toeplitz space T_A over A consists of all regular quasi-periodic configurations with the metric d_B = d_W.

Toeplitz sequences are constructed by filling in periodic parts successively. For an alphabet A put Ã = A ∪ {∗}.

Definition 13
(1) The p-skeleton S_p(x) ∈ Ã^Z of x ∈ A^Z is defined by S_p(x)_k = x_k if ∀n ∈ Z, x_{k+np} = x_k, and S_p(x)_k = ∗ otherwise.
(2) The sequence of gaps of x ∈ Ã^Z is the unique increasing integer sequence (t_i).

We show that u^∞ is T-equicontinuous. For a given ε > 0 set δ = ε/(4m − 2r + 1). If d_T(y, x) < δ, then there exists l_0 such that for all l ≥ l_0, |{i ∈ [−l, l] : x_i ≠ y_i}| < (2l+1)δ. For k(2m+1) ≤ j < (k+1)(2m+1), F^n(y)_j can differ from F^n(x)_j only if y differs from x in some i ∈ [k(2m+1) − (m−r), (k+1)(2m+1) + (m−r)). Thus a change x_i ≠ y_i can cause at most 2m + 1 + 2(m−r) = 4m − 2r + 1 changes F^n(y)_j ≠ F^n(x)_j. We get

  |{i ∈ [−l, l) : F^n(x)_i ≠ F^n(y)_i}| ≤ 2lδ(4m − 2r + 1) ≤ 2lε.

This shows that F_T is almost equicontinuous. In the general case A(F) ≠ ∅, we get that F_T^q σ^p is almost equicontinuous for some p ∈ Z, q ∈ N^+. Since σ^p is T-equicontinuous, F_T is almost equicontinuous and therefore (T_A, F_T) is almost equicontinuous.
(3) The proof is the same as in (2) with the only modification that all u ∈ A^m are (2r+1)-blocking.
(4) The proof of Proposition 8 from [2] works in this case too.
(5) The proof of Proposition 12 of [3] works in this case also.
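For shift-periodic configurations the limsup in d_B is an exact average over a common period, and d_B = d_W there. A minimal sketch (my own helper, assuming both arguments are given as one period of a periodic configuration):

```python
from math import gcd

def besicovitch_periodic(u, v):
    """d_B = d_W between the periodic configurations u^inf and v^inf:
    for shift-periodic points the density of differences is an exact
    average over one common period lcm(|u|, |v|)."""
    n = len(u) * len(v) // gcd(len(u), len(v))
    return sum(u[i % len(u)] != v[i % len(v)] for i in range(n)) / n

print(besicovitch_periodic((0, 1), (0, 1, 0, 1)))  # 0.0: same configuration
print(besicovitch_periodic((0,), (0, 0, 0, 1)))    # 0.25
```

The first call shows why d_B is only a pseudometric on representations: different periods can describe the same point of A^Z.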


The Besicovitch Space

On A^Z we have an equivalence x ∼_B y iff d_B(x, y) = 0. Denote by B_A the set of equivalence classes of ∼_B and by π_B : A^Z → B_A the projection. The factor of d_B is a metric on B_A. This is the Besicovitch space on alphabet A. Using prefix codes, it can be shown that every two Besicovitch spaces (with different alphabets) are homeomorphic. By Proposition 11, each equivalence class contains at most one quasi-periodic sequence.

Proposition 18 T_A is dense in B_A.

The proof of Proposition 9 of [3] works also for regular quasi-periodic sequences.

Theorem 19 (Blanchard, Formenti and Kůrka [2]) The Besicovitch space is pathwise connected, infinite-dimensional, homogeneous and complete. It is neither separable nor locally compact.

The properties of path-connectedness and infinite-dimensionality are proved analogously as in Proposition 15. To prove that B_A is neither separable nor locally compact, Sturmian configurations have been used in [2]. The completeness of B_A has been proved by Marcinkiewicz [14]. Every cellular automaton F : A^Z → A^Z is uniformly continuous with respect to d_B, so it preserves the equivalence ∼_B: if d_B(x, y) = 0, then d_B(F(x), F(y)) = 0. Thus a cellular automaton F defines a uniformly continuous map F_B : B_A → B_A.

Theorem 20 (Blanchard, Formenti and Kůrka [2]) Let F be a CA on A.
(1) (C_A, F) is surjective iff (B_A, F_B) is surjective.
(2) If A(F) ≠ ∅, then (B_A, F_B) is almost equicontinuous.
(3) If E(F) ≠ ∅, then (B_A, F_B) is equicontinuous.
(4) If (B_A, F_B) is sensitive, then (C_A, F) is sensitive.
(5) No cellular automaton (B_A, F_B) is positively expansive.
(6) If (C_A, F) is chain-transitive, then (B_A, F_B) is chain-transitive.

Theorem 21 (Blanchard, Cervelle and Formenti [3])
(1) No CA (B_A, F_B) is transitive.
(2) A CA (B_A, F_B) has either a unique fixed point and no other periodic point, or it has uncountably many periodic points.
(3) If a surjective CA has a blocking word, then the set of its F_B-periodic points is dense in B_A.

The Generic Space

For a configuration x ∈ A^Z and a word v ∈ A^+ set

  Φ̲_v(x) = liminf_{n→∞} |{i ∈ [−n, n) : x_{[i, i+|v|)} = v}| / (2n),
  Φ̄_v(x) = limsup_{n→∞} |{i ∈ [−n, n) : x_{[i, i+|v|)} = v}| / (2n).

For every v ∈ A^+, Φ̲_v, Φ̄_v : A^Z → [0, 1] are continuous in the Besicovitch topology. In fact we have

  |Φ̄_v(x) − Φ̄_v(y)| ≤ d_B(x, y) · |v|,
  |Φ̲_v(x) − Φ̲_v(y)| ≤ d_B(x, y) · |v|.

Define the generic space (over the alphabet A) as

  G_A = {x ∈ A^Z : ∀v ∈ A^*, Φ̲_v(x) = Φ̄_v(x)}.

It is a closed subspace of B_A. For v ∈ A^* denote by Φ_v : G_A → [0, 1] the common value of Φ̲_v and Φ̄_v. Using prefix codes, one can show that all generic spaces (with different alphabets) are homeomorphic. The generic space contains all uniquely ergodic subshifts, in particular all Sturmian sequences and all regular Toeplitz sequences. Thus the proofs in Blanchard, Formenti and Kůrka [2] can be applied to the generic space too. In particular, the generic space is homogeneous. If we regard the alphabet A = {0, …, m−1} as the group Z_m = Z/mZ, then for every x ∈ G_A there is an isometry H_x : G_A → G_A defined by H_x(y) = x + y. Moreover, G_A is pathwise connected, infinite-dimensional and complete (as a closed subspace of the full Besicovitch space). It is neither separable nor locally compact. If F : A^Z → A^Z is a cellular automaton, then F(G_A) ⊆ G_A. Thus the restriction of F_B to G_A defines a dynamical system (G_A, F_G). See also Pivato for a similar approach.

Theorem 22 Let F : A^Z → A^Z be a CA.
(1) (C_A, F) is surjective iff (G_A, F_G) is surjective.
(2) If A(F) ≠ ∅, then (G_A, F_G) is almost equicontinuous.
(3) If E(F) ≠ ∅, then (G_A, F_G) is equicontinuous.
(4) If (G_A, F_G) is sensitive, then (C_A, F) is sensitive.
(5) If F is C-chain transitive, then F is G-chain transitive.

The proofs are the same as the proofs of the corresponding properties in [2].

The Space of Measures

By a measure we mean a Borel shift-invariant probability measure on the Cantor space A^Z (see Ergodic Theory of Cellular Automata). This is a countably additive function μ on the Borel sets of A^Z which assigns 1 to the full space and satisfies μ(U) = μ(σ^{−1}(U)).


A measure μ on A^Z is determined by its values on cylinders, μ(u) := μ([u]_n), which do not depend on n ∈ Z. Thus a measure can be identified with a map μ : A^* → [0, 1] subject to the bilateral Kolmogorov compatibility conditions

  Σ_{a∈A} μ(ua) = Σ_{a∈A} μ(au) = μ(u),   μ(λ) = 1.

Define the distance of two measures by

  d_M(μ, ν) := Σ_{u∈A^+} |μ(u) − ν(u)| · |A|^{−2|u|}.

This is a metric which yields the topology of weak convergence on the compact space M_A := M_σ(A^Z) of shift-invariant Borel probability measures. A CA F : A^Z → A^Z with local rule f determines a continuous and affine map F_M : M_A → M_A by

  (F_M(μ))(u) = Σ_{v∈f^{−1}(u)} μ(v).
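The action of F_M on cylinder values can be illustrated on words of a fixed length. The following sketch is mine (the name `push_forward` is invented); it computes (F_M μ)(u) for |u| = n from the values of μ on words of length n + 2r:

```python
from itertools import product

def push_forward(mu, f, alphabet, r, n):
    """(F_M mu)(u) = sum of mu(v) over v in f^{-1}(u), |v| = n + 2r.
    mu maps words (tuples) of length n + 2r to probabilities."""
    out = {}
    for v in product(alphabet, repeat=n + 2 * r):
        u = tuple(f(v[i:i + 2 * r + 1]) for i in range(n))
        out[u] = out.get(u, 0.0) + mu[v]
    return out

# Uniform Bernoulli measure on words of length 3, pushed through ECA90.
A = (0, 1)
mu = {v: 1 / 8 for v in product(A, repeat=3)}
xor = lambda w: (w[0] + w[2]) % 2
nu = push_forward(mu, xor, A, 1, 1)
print(nu)   # {(0,): 0.5, (1,): 0.5}: the uniform measure is F_M-invariant
```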

Moreover, F and Fσ determine the same dynamical system on M_A: F_M = (Fσ)_M. For x ∈ G_A denote by Φ^x : A^* → [0, 1] the function Φ^x(v) = Φ_v(x). For every x ∈ G_A, Φ^x is a shift-invariant Borel probability measure. The map Φ : G_A → M_A is continuous with respect to the Besicovitch and weak topologies. In fact we have

  d_M(Φ^x, Φ^y) ≤ Σ_{u∈A^+} d_B(x, y) · |u| · |A|^{−2|u|}
              = d_B(x, y) · Σ_{n>0} n · |A|^{−n}
              = d_B(x, y) · |A| / (|A| − 1)^2.

By a theorem of Kamae [9], Φ is surjective: every shift-invariant Borel probability measure has a generic point. It follows from the ergodic theorem that if μ is a σ-invariant measure, then μ(G_A) = 1 and for every v ∈ A^*, the measure of v is the integral of its density Φ_v:

  μ(v) = ∫ Φ_v(x) dμ.

If F is a CA, we have a commutative diagram Φ F_G = F_M Φ:

        F_G
  G_A -----> G_A
   |          |
   Φ          Φ
   v          v
  M_A -----> M_A
        F_M

Theorem 23 Let F be a CA over A.
(1) (C_A, F) is surjective iff (M_A, F_M) is surjective.
(2) If (G_A, F_G) has a dense set of periodic points, then (M_A, F_M) has a dense set of periodic points.
(3) If A(F) ≠ ∅, then (M_A, F_M) is almost equicontinuous.
(4) If E(F) ≠ ∅, then (M_A, F_M) is equicontinuous.

Proof (1) See Kůrka [13] for a proof.
(2) This holds since (M_A, F_M) is a factor of (G_A, F_G).
(3) It suffices to prove the claim for the case that F is almost equicontinuous. In this case there exists a blocking word u ∈ A^+, and the Dirac measure δ_u defined by

  δ_u(v) = 1/|u| if v ⊑ u, and δ_u(v) = 0 if v ⋢ u,

is equicontinuous for (M_A, F_M).
(4) If (C_A, F) is equicontinuous, then all sufficiently long words are blocking and there exists d > 0 such that for all n > 0 and all x, y ∈ A^Z with x_{[−n−d, n+d]} = y_{[−n−d, n+d]} we have F^k(x)_{[−n,n]} = F^k(y)_{[−n,n]} for all k > 0. Thus there are maps g_k : A^* → A^* such that |g_k(u)| = max{|u| − 2d, 0} and for every x ∈ A^Z we have F^k(x)_{[−n,n]} = g_k(x_{[−n−d, n+d]}), where f is the local rule of F. We get

  d_M(F_M^k(μ), F_M^k(ν))
    = Σ_{n=1}^∞ Σ_{u∈A^n} | Σ_{v∈f^{−k}(u)} (μ(v) − ν(v)) | · |A|^{−2n}
    = Σ_{n=1}^∞ Σ_{u∈A^n} | Σ_{v∈g_k^{−1}(u)} (μ(v) − ν(v)) | · |A|^{−2n}
    ≤ Σ_{n=1}^∞ Σ_{v∈A^{n+2d}} |μ(v) − ν(v)| · |A|^{−2n}
    ≤ |A|^{4d} · d_M(μ, ν). □

The Weyl Space Define the following equivalence relation on AZ : x W y iff dW (x; y) D 0. Denote by WA the set of equivalence classes of W and by W : AZ ! WA the projection. The factor of dW is a metric on WA . This is the Weyl space on alphabet A. Using prefix codes, it can be shown that every two Weyl spaces (with different alphabets) are homeomorphic. The Toeplitz space is not dense in the Weyl space (see Blanchard, Cervelle and Formenti [3]). Theorem 24 (Blanchard, Formenti and K˚urka [2]) The Weyl space is pathwise connected, infinite-dimensional and


homogeneous. It is neither separable nor locally compact. It is not complete.

Every cellular automaton F : A^Z → A^Z is continuous with respect to d_W, so it preserves the equivalence ∼_W: if d_W(x, y) = 0, then d_W(F(x), F(y)) = 0. Thus a cellular automaton F defines a continuous map F_W : W_A → W_A. The shift map σ : W_A → W_A is again an isometry, so in W_A many topological properties are preserved if F is composed with a power of the shift. This is true for example for equicontinuity, almost equicontinuity and sensitivity. If π : W_A → B_A is the (continuous) projection and F a CA, then the following diagram commutes:

        F_W
  W_A -----> W_A
   |          |
   π          π
   v          v
  B_A -----> B_A
        F_B

Theorem 25 (Blanchard, Formenti and Kůrka [2]) Let F be a CA on A.
(1) (C_A, F) is surjective iff (W_A, F_W) is surjective.
(2) If A(F) ≠ ∅, then (W_A, F_W) is almost equicontinuous.
(3) If E(F) ≠ ∅, then (W_A, F_W) is equicontinuous.
(4) If (C_A, F) is chain-transitive, then (W_A, F_W) is chain-transitive.

Theorem 26 (Blanchard, Cervelle and Formenti [3]) No CA (W_A, F_W) is transitive.

Theorem 27 Let Σ be a subshift attractor of finite type for F (in the Cantor space). Then there exists δ > 0 such that for every x ∈ W_A satisfying d_W(x, Σ) < δ, F^n(x) ∈ Σ for some n > 0.

Thus a subshift attractor of finite type is a W-attractor. Example 2 shows that it need not be a B-attractor. Example 3 shows that the assertion need not hold if Σ is not of finite type.

Proof Let U ⊆ A^Z be a C-clopen set such that Σ = Ω_F(U), and let U be a union of cylinders of words of length q. Set Ω̃_σ(U) = ⋂_{n∈Z} σ^n(U). By a generalization of a theorem of Hurd [7] (see Topological Dynamics of Cellular Automata), there exists m > 0 such that Σ = F^m(Ω̃_σ(U)). If d_W(x, Σ) < 1/q then there exists l > 0

Dynamics of Cellular Automata in Non-compact Spaces, Figure 1 The product ECA128

such that for every k ∈ Z there exists a nonnegative j < l such that σ^{k+j}(x) ∈ U. It follows that there exists n > 0 such that F^n(x) ∈ Ω̃_σ(U) and therefore F^{n+m}(x) ∈ Σ. □

Examples

Example 1 The identity rule Id(x) = x. (B_A, Id_B) and (W_A, Id_W) are chain-transitive (since both B_A and W_A are connected). However, (C_A, Id) is not chain-transitive. Thus the converses of Theorem 20(6) and of Theorem 25(4) do not hold.

Example 2 The product rule ECA128: F(x)_i = x_{i−1} · x_i · x_{i+1}. (C_A, F), (B_A, F_B) and (W_A, F_W) are almost equicontinuous, and the configuration 0^∞ is equicontinuous in all these versions. By Theorem 27, {0^∞} is a W-attractor. However, contrary to the mistaken Proposition 9 in [2], {0^∞} is not a B-attractor. For a given 0 < ε < 1 define x ∈ A^Z by x_i = 1 iff 3^n(1 − ε) < |i| ≤ 3^n for some n ≥ 0. Then d_B(x, 0^∞) = ε, but x is a fixed point, since d_B(F(x), x) = lim_{n→∞} 2n/3^n = 0 (see Fig. 1).

Example 3 The traffic rule ECA184: F(x)_i = 1 iff x_{[i−1,i]} = 10 or x_{[i,i+1]} = 11. No F^q σ^p is C-almost equicontinuous, so A(F) = ∅. However, if d_W(x, 0^∞) < δ, then d_W(F^n(x), 0^∞) < δ for every n > 0, since F conserves the number of letters 1 in a configuration. Thus 0^∞ is a point of equicontinuity in (T_A, F_T), (B_A, F_B), and (W_A, F_W). This shows that item (2) of Theorems 17, 20 and 25 cannot be converted. The maximal C-attractor Ω_F = {x ∈ A^Z : ∀n > 0, 1(10)^n 0 ⋢ x} is not an SFT. We show that it does not W-attract points from any of its neighborhoods. For a given even integer q > 2 define x ∈ A^Z with x_i = 1 for i = qn + 1, n ≥ 0 (see Fig. 2, where q = 8).

Example 4 The sum rule ECA90: F(x)_i = (x_{i−1} + x_{i+1}) mod 2.


Dynamics of Cellular Automata in Non-compact Spaces, Figure 2 The traffic ECA184

Dynamics of Cellular Automata in Non-compact Spaces, Figure 3 The sum ECA90

Both (B_A, F_B) and (W_A, F_W) are sensitive (Cattaneo et al. [4]). For a given n > 0 define a configuration z by z_i = 1 iff i = k2^n for some k ∈ Z. Then F^{2^{n−1}}(z) = (01)^∞. For any x ∈ A^Z we have d_W(x, x + z) = 2^{−n}, but d_W(F^{2^{n−1}}(x), F^{2^{n−1}}(x + z)) = 1/2. The same argument works for (B_A, F_B).
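The elementary rules used in these examples are easy to experiment with. A minimal sketch (my own, using Wolfram's rule numbering and one spatial period to represent a periodic configuration):

```python
def eca(rule, u):
    """One step of the elementary CA with Wolfram number `rule`
    on the spatially periodic configuration u^inf (period = len(u))."""
    n = len(u)
    return tuple((rule >> (4 * u[(i - 1) % n] + 2 * u[i] + u[(i + 1) % n])) & 1
                 for i in range(n))

x = (0, 1, 1, 0, 1, 0, 0, 0)
y = eca(184, x)
print(y, sum(y))   # ECA184 conserves the number of 1s (here 3)
print(eca(128, (0, 1, 1, 1, 0)))   # ECA128: a 1 survives only inside a 111-block
# -> (0, 0, 1, 0, 0)
```

The conservation of 1s by ECA184 is exactly the property used in Example 3 to show that 0^∞ is a point of equicontinuity in the Besicovitch and Weyl settings.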

Acknowledgments We thank Marcus Pivato and Francois Blanchard for careful reading of the paper and many valuable suggestions. The research was partially supported by the Research Program Project “Sycomore” (ANR-05-BLAN-0374).

Example 5 The shift ECA170: F(x)_i = x_{i+1}. Since the system has fixed points 0^∞ and 1^∞, it has an uncountable number of periodic points. However, the periodic points are not dense in B_A ([3]).

Future Directions

One of the promising research directions is the connection between the generic space and the space of Borel probability measures, which is based on the factor map Φ. In particular, Lyapunov functions based on particle weight functions (see Kůrka [12]) work both for the measure space M_A and the generic space G_A. The potential of Lyapunov functions for the classification of attractors has not yet been fully explored. This holds also for the connections between attractors in different topologies. While the theory of attractors is well established in compact spaces, in non-compact spaces there are several possible approaches. Finally, the comparison of entropy properties of CA in different topologies may be revealing for the classification of CA. There is an even more general approach to different topologies for CA based on the concept of a submeasure on Z. Since each submeasure defines a pseudometric, it would be interesting to know whether CA are continuous with respect to any of these pseudometrics, and whether some dynamical properties of CA can be derived from the properties of the defining submeasures.

Bibliography

Primary Literature
1. Besicovitch AS (1954) Almost periodic functions. Dover, New York
2. Blanchard F, Formenti E, Kůrka P (1999) Cellular automata in the Cantor, Besicovitch and Weyl spaces. Complex Syst 11(2):107–123
3. Blanchard F, Cervelle J, Formenti E (2005) Some results about the chaotic behaviour of cellular automata. Theor Comput Sci 349(3):318–336
4. Cattaneo G, Formenti E, Margara L, Mazoyer J (1997) A shift-invariant metric on S^Z inducing a nontrivial topology. Lecture Notes in Computer Science, vol 1295. Springer, Berlin
5. Formenti E, Kůrka P (2007) Subshift attractors of cellular automata. Nonlinearity 20:105–117
6. Hedlund GA (1969) Endomorphisms and automorphisms of the shift dynamical system. Math Syst Theory 3:320–375
7. Hurd LP (1990) Recursive cellular automata invariant sets. Complex Syst 4:119–129
8. Iwanik A (1988) Weyl almost periodic points in topological dynamics. Colloquium Mathematicum 56:107–119
9. Kamae T (1973) Subsequences of normal sequences. Isr J Math 16(2):121–149
10. Knudsen C (1994) Chaos without nonperiodicity. Am Math Mon 101:563–565
11. Kůrka P (1997) Languages, equicontinuity and attractors in cellular automata. Ergod Theory Dyn Syst 17:417–433
12. Kůrka P (2003) Cellular automata with vanishing particles. Fundamenta Informaticae 58:1–19


13. Kůrka P (2005) On the measure attractor of a cellular automaton. Discret Continuous Dyn Syst 2005(suppl):524–535
14. Marcinkiewicz J (1939) Une remarque sur les espaces de M. Besicovitch. C R Acad Sci Paris 208:157–159
15. Sablik M (2006) Étude de l'action conjointe d'un automate cellulaire et du décalage: une approche topologique et ergodique. PhD thesis, Université de la Méditerranée

Books and Reviews
Besicovitch AS (1954) Almost periodic functions. Dover, New York
Kitchens BP (1998) Symbolic dynamics. Springer, Berlin
Kůrka P (2003) Topological and symbolic dynamics. Cours spécialisés, vol 11. Société Mathématique de France, Paris
Lind D, Marcus B (1995) An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge


Embodied and Situated Agents, Adaptive Behavior in
STEFANO NOLFI
Institute of Cognitive Sciences and Technologies, National Research Council (CNR), Rome, Italy

Article Outline
Glossary
Definition of the Subject
Introduction
Embodiment and Situatedness
Behavior and Cognition as Complex Adaptive Systems
Adaptive Methods
Evolutionary Robotics Methods
Discussion and Conclusion
Bibliography

Glossary
Phylogenesis Indicates the variations of the genetic characteristics of a population of artificial agents throughout generations.
Ontogenesis Indicates the variations which occur in the phenotypical characteristics of an artificial agent (i.e. in the characteristics of the control system or of the body of the agent) while it interacts with the environment.
Embodied agent Indicates an artificial system (simulated or physical) which has a body (characterized by physical properties such as shape, dimension, weight, etc.), actuators (e.g. motorized wheels, motorized articulated joints), and sensors (e.g. touch sensors or vision sensors). For a more restricted definition see the concluding section of the paper.
Situated agent Indicates an artificial system which is located in a physical environment (simulated or real) with which it interacts on the basis of the laws of physics. For a more restricted definition see the concluding section of the paper.
Morphological computation Indicates the ability of the body of an agent (with certain specific characteristics) to control its interaction with the environment so as to produce a given desired behavior.

Definition of the Subject
Adaptive behavior concerns the study of how organisms develop their behavioral and cognitive skills through a synthetic methodology which consists in designing artificial agents which are able to adapt to their environment autonomously. These studies are important both

from a modeling point of view (i.e. for making progress in our understanding of intelligence and adaptation in natural beings) and from an engineering point of view (i.e. for making progress in our ability to develop artefacts displaying effective behavioral and cognitive skills).

Introduction
Adaptive behavior research concerns the study of how organisms can develop behavioral and cognitive skills by adapting to the environment and to the task they have to fulfill autonomously (i.e. without human intervention). This goal is achieved through a synthetic methodology, i.e. through the synthesis of artificial creatures which: (i) have a body, (ii) are situated in an environment with which they interact, and (iii) have characteristics which vary during an adaptation process. In the rest of the paper we will use the term "agent" to indicate artificial creatures which possess the first two features described above and the term "adaptive agent" to indicate artificial creatures which also possess the third feature. The agents and the environment might be simulated or real. In the former case the characteristics of the agents' body, motor, and sensory system, the characteristics of the environment, and the rules that regulate the interactions between all the elements are simulated on a computer. In the latter case, the agents consist of physical entities (mobile robots) situated in a physical environment with which they interact on the basis of the physical laws. The adaptive process which regulates how the characteristics of the agents (and possibly of the environment) change might consist of a population-based evolutionary process and/or of a developmental/learning process. In the former case, the characteristics of the agents do not vary during their "lifetime" (i.e. during the time in which the agents interact with the environment) but phylogenetically, while individual agents "reproduce".
In the latter case, the characteristics of the agents vary ontogenetically, while they interact with the environment. The criteria which determine how variations are generated and/or whether or not variations are retained can be task-dependent and/or task-independent, i.e. they might be based on an evaluation of whether the variations increase or decrease the agents' ability to display a behavior which is adapted to the task/environment, or might be based on task-independent criteria (i.e. general criteria which do not reward directly the exhibition of the requested skill). The paper is organized as follows. In Sect. "Embodiment and Situatedness" we briefly introduce the notions of embodiment and situatedness and their implications. In Sect. "Behavior and Cognition as Complex Adaptive Systems" we claim that behavior and cognition in embodied and situated adaptive agents should be characterized as a complex adaptive system. In Sect. "Adaptive Methods" we briefly describe the methods which can be used to synthesize embodied and situated adaptive agents. Finally, in Sect. "Discussion and Conclusion", we draw our conclusions.

Embodiment and Situatedness
The notions of embodiment and situatedness have been introduced [8,9,12,34,48] to characterize systems (e.g. natural organisms and robots) which have a physical body and which are situated in a physical environment with which they interact. In this and in the following sections we will briefly discuss the general implications of these two fundamental properties. This analysis will be further extended in the concluding section, where we will argue for the necessity of distinguishing between a weak and a strong notion of embodiment and situatedness.

One first important implication of being embodied and situated is that these agents and their parts are characterized by their physical properties (e.g. weight, dimension, shape, elasticity, etc.), are subjected to the laws of physics (e.g. inertia, friction, gravity, energy consumption, deterioration, etc.), and interact with the environment through the exchange of energy and physical material (e.g. forces, sound waves, light waves, etc.). Their physical nature also implies that they are quantitative in state and time [49]. The fact that these agents are quantitative in state implies, for example, that the joints which connect the parts of a robotic arm can assume any possible position within a given range. The fact that these agents are quantitative in time implies, for example, that the effects of the application of a force to a joint depend on the time duration of its application.
A second important implication is that the information measured by the sensors is a function not only of the environment but also of the relative position of the agent in the environment. This implies that the motor actions performed by an agent, by modifying the agent/environment relation or the environment itself, co-determine the agent's sensory experiences. A third important implication is that the information measured by the sensors about the external environment is egocentric (it depends on the current position and orientation of the agent in the environment), local (it only concerns the locally observable portion of the environment), incomplete (due to visual occlusion, for example), and subject to noise. Similar characteristics apply to the motor actions produced by the agent's effectors.
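As a toy illustration of this loop (entirely invented, not taken from the article), consider an agent whose action modifies the very agent/environment relation that its next sensory reading is based on. With a simple threshold control rule, a tiny difference in the initial relation is amplified into qualitatively different behavior:

```python
def act(sensor):
    # Nonlinear (threshold) control rule: approach when the stimulus is
    # sensed on one side, retreat when it is sensed on the other
    return 0.1 if sensor > 0.0 else -0.1

def run(relation, steps=50):
    # 'relation' is a toy 1-D agent/environment relation (e.g. a bearing)
    for _ in range(steps):
        sensor = relation          # egocentric reading of the relation
        relation += act(sensor)    # the action co-determines the next reading
    return relation

left = run(+0.001)    # stimulus fractionally on one side ...
right = run(-0.001)   # ... or fractionally on the other
print(left, right)    # the two histories diverge to opposite sides
```

The point of the sketch is only that the agent's own actions select its future sensory experiences, so behavior cannot be read off from the control rule alone.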

It is important to notice that these characteristics do not only represent constraints but also opportunities to be exploited. Indeed, as we will see in the next section, the exploitation of some of these characteristics might allow embodied and situated agents to solve their adaptive problems through solutions which are robust and parsimonious (i. e. minimal) with respect to the complexity of the agent's body and control system.

Behavior and Cognition as Complex Adaptive Systems

In embodied and situated agents, behavioral and cognitive skills are dynamical properties which unfold in time and which arise from the interaction between the agent's nervous system, body, and environment [3,11,19,29,31] and from the interaction between the dynamical processes occurring within the agent's control system, within the agent's body, and within the environment [4,15,45]. Moreover, behavioral and cognitive skills typically display a multi-level and multi-scale organization involving bottom-up and top-down influences between entities at different levels of organization. These properties imply that behavioral and cognitive skills in embodied and situated agents can properly be characterized as complex adaptive systems [29]. These aspects and the complex-system nature of behavior and cognition will be illustrated in more detail in the next subsections, also with the help of examples. The theoretical and practical implications of these aspects for developing artificial agents able to exhibit effective behavioral and cognitive skills will be discussed in the forthcoming sections.

Behavior and Cognition as Emergent Dynamical Properties

Behavior and cognition are dynamical properties which unfold in time and which emerge from frequent nonlinear interactions between the agent's control system, its body, and the external environment [11].
At any time step, the environment and the agent/environment relation co-determine the sensory state and the motor reaction of the agent which, in turn, co-determine how the environment and/or the agent/environment relation vary. Sequences of these interactions, occurring at a fast time rate, lead to a dynamical process – behavior – which extends over a significantly larger time span than the individual interactions (Fig. 1). Since the interactions between the agent's control system, the agent's body, and the external environment are nonlinear (i. e. small variations in sensory states might lead to significantly different motor actions) and dynamical (i. e.


Embodied and Situated Agents, Adaptive Behavior in, Figure 1 A schematic representation of the relation between the agent's control system, the agent's body, and the environment. The behavioral and cognitive skills displayed by the agent are the emergent result of the bi-directional interactions (represented with full arrows) between the three constituting elements – agent's control system, agent's body, and environment. The dotted arrows indicate that the three constituting elements might be dynamical systems on their own. In this case, the agent's behavioral and cognitive skills result not only from the dynamics originating from the agent/body/environment interactions but also from the combination of, and the interaction between, the dynamical processes occurring within the agent's body, within the agent's control system, and within the environment (see Sect. "Embodiment and Situatedness")

Embodied and Situated Agents, Adaptive Behavior in, Figure 2 A schematization of the passive walking machine developed by McGeer [24]. The machine includes two passive knee joints and a passive hip joint

small variations in the action performed at time t might significantly impact later interactions at time t + x), the relation between the rules that govern the interactions and the behavioral and cognitive skills originating from the interactions tends to be very indirect. Behavioral and cognitive skills thus emerge from the interactions between the three foundational elements and cannot be traced back to any of the three elements taken in isolation. Indeed, the behavior displayed by an embodied and situated agent can hardly be predicted or inferred by an external observer, even on the basis of a complete knowledge of the interacting elements and of the rules governing the interactions. A clear example of how behavioral skills might emerge from the interaction between the agent's body and the environment is constituted by the passive walking machines developed in simulation by McGeer [24] – two-dimensional bipedal machines able to walk down a four-degree

slope with no motors and no control system (Fig. 2). The walking behavior arises from the fact that the physical forces resulting from gravity and from the collision between the machine and the slope produce a movement of the robot, and from the fact that the robot's movements produce a variation of the agent/environment relation which in turn produces a modification of the physical forces to which the machine will be subjected in the next time step. The sequence of bi-directional effects between the robot's body and the environment can lead to a stable dynamical process – the walking behavior. The type of behavior which arises from the robot/environment interaction depends on the characteristics of the environment, the physical laws which regulate the interaction between the body and the environment, and the characteristics of the body. The first two factors can be considered as fixed, but the third factor, the body structure, can be adapted to achieve a given function. Indeed, in the case of this biped robot, the author carefully selected the leg length, the leg mass, and the foot size to obtain the desired walking behavior. In more general terms, this example shows how the role of regulating the interaction between the robot and the environment in the appropriate way can be played not only by the control system but also by the body itself, provided that the characteristics of the body have been shaped so as to favor the exhibition of the desired behavior. This property, i. e. the ability of the body to control its interaction with the environment, has been named "morphological computation" [35]. For related work which demonstrates how effective walking machines can be obtained by integrating passive walking techniques with simple control mechanisms, see [6,13,50]. For related works which show the role of elastic materials and elastic actuators for morphological computation, see [23,40].

To illustrate how behavioral and cognitive skills might emerge from the interactions between the agent's body, the agent's control system, and the environment, we describe a simple experiment in which a small wheeled robot situated in an arena surrounded by walls has been evolved to find and remain close to a cylindrical object. The Khepera robot [26] is provided with eight infrared sensors and two motors controlling the two corresponding wheels (Fig. 3). From the point of view of an external observer, solving this problem requires robots able to: (a) explore the environment until an obstacle is detected, (b) discriminate whether the detected obstacle is a wall or a cylindrical object, and (c) approach or avoid the object depending on its type. Some of these behaviors (e. g. the wall-avoidance behavior) can be obtained through simple control mechanisms, but others require non-trivial control mechanisms. Indeed, a detailed analysis of the sensory patterns experienced by the robot indicated that the task of discriminating the two objects is far from trivial, since the two classes of sensory patterns experienced by robots close to a wall and close to a cylindrical object largely overlap.

Embodied and Situated Agents, Adaptive Behavior in, Figure 3 Left: The agent situated in the environment. The agent is a Khepera robot [26]. The environment consists of an arena of 60 × 35 cm containing a cylindrical object placed in a randomly selected location. Right: Angular trajectories of an evolved robot close to a wall (top graph) and close to a cylinder (bottom graph). The picture was obtained by placing the robot at a random position in the environment, leaving it free to move for 500 time steps each lasting 100 ms, and recording its relative movements with respect to the two types of objects for distances smaller than 45 mm. The x-axis and the y-axis indicate the relative angle (in degrees) and distance (in mm) between the robot and the corresponding object. For the sake of clarity, arrows are used to indicate the relative direction, but not the amplitude, of movements

The attempt to solve this problem through an evolutionary adaptive method (see Sect. "Adaptive Methods") in which the free parameters (i. e. the parameters which regulate the fine-grained interaction between the robot and the environment) are varied randomly, and in which variations are retained or discarded on the basis of an evaluation of the overall ability of the robot (i. e. on the basis of the time spent by the robot close to the cylindrical object), demonstrated how adaptive robots can find solutions which are robust and parsimonious in terms of control mechanisms [28]. Indeed, in all replications of this experiment, the evolved robots solve the problem by moving forward, by avoiding walls, and by oscillating back and forth and left and right close to the cylindrical object (Fig. 3, right). All these behaviors result from sequences of interactions between the robot and the environment mediated by four types of simple control rules which consist in: turning left when the right infrared sensors are activated, turning right when the left infrared sensors are activated, moving back when the frontal infrared sensors are activated, and moving forward when the frontal infrared sensors are not activated.

To understand how these simple control rules can produce the required behaviors and the required arbitration between behaviors, we should consider that the same motor responses produce different effects in different agent/environment situations. For example, the execution of a left-turning action close to a cylindrical object and the subsequent modification of the robot/object relative position produce a new sensory state which triggers a right-turning action. Then, the execution of the latter action and the subsequent modification of the robot/object relative position produce a new sensory state which triggers a left-turning action. The combination and the alternation of these left- and right-turning actions over time produce an attractor in the agent/environment dynamics (Fig. 3, right, bottom graph) which allows the robot to remain close to the cylindrical object.
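The four control rules just listed can be sketched as a reactive controller. The grouping of the eight infrared sensors into left/front/right readings, the threshold, and the wheel speeds below are illustrative assumptions, not details of the evolved neural controller:

```python
def reactive_controller(ir_left, ir_front, ir_right, threshold=0.5):
    """Qualitative sketch of the four evolved control rules.

    ir_* are infrared readings in [0, 1]; the grouping of the Khepera's
    eight sensors into left/front/right and the threshold value are
    illustrative assumptions, not taken from the original experiment.
    Returns (left_wheel_speed, right_wheel_speed).
    """
    if ir_front > threshold:
        return (-1.0, -1.0)   # move back when frontal sensors are activated
    if ir_right > threshold:
        return (-0.5, 0.5)    # turn left when right sensors are activated
    if ir_left > threshold:
        return (0.5, -0.5)    # turn right when left sensors are activated
    return (1.0, 1.0)         # move forward otherwise
```

Note that such a qualitative sketch deliberately omits the quantitative aspects (how sharply the robot turns for a given sensor pattern) on which, as discussed in the text, the emergence of the behavioral attractor near the cylinder actually depends.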
Embodied and Situated Agents, Adaptive Behavior in, Figure 4 Left: The e-puck robot developed at EPFL, Switzerland http://www.e-puck.org/. Center: The environment, which has a size of 52 × 60 cm. The light produced by the light bulb located on the left side of the central corridor cannot be perceived from the other two corridors. Right: The motor trajectory produced by the robot during a complete lap of the environment

On the other hand, the execution of a left-turning action close to a wall and the subsequent modification of the robot/wall relative position produce a new sensory state which triggers the reiteration of the same motor action. The execution of a sequence of left-turning actions then leads to the avoidance of the object and to a modification of the robot/environment relation which finally leads to the perception of a sensory state which triggers a move-forward behavior (Fig. 3, right, top graph). Before concluding the description of this experiment, it is important to notice that, although the rough classification of the robot's motor responses into four different types of actions is useful to describe qualitatively the strategy with which these robots solve the problem, the quantitative aspects which characterize the robot's motor reactions (e. g. how sharply a robot turns given a certain pattern of activation of the infrared sensors) are crucial for determining whether the robot will be able to solve the problem or not. Indeed, small differences in the robot's motor responses tend to cumulate in time and might prevent the robot from producing successful behavior (e. g. might prevent the robot from producing a behavioral attractor close to cylindrical objects).

This experiment clearly exemplifies some important aspects which characterize all adaptive behavioral systems, i. e. systems which are embodied and situated and which have been designed or adapted so as to exploit the properties that emerge from the interaction between their control system, their body, and the external environment. In particular, it demonstrates how required behavioral and cognitive skills (i. e. object categorization skills) might emerge from the fine-grained interaction between the robot's control system, body, and the external environment without the need for dedicated control mechanisms. Moreover, it demonstrates how the relation between the control rules which mediate the interaction between the robot's body and the environment and the behavioral skills exhibited by the agent is rather indirect. This means, for example, that an external human observer can hardly predict the behaviors which will be produced by the robot before observing the robot interacting with the environment, even on the basis of a complete description of the characteristics of the body, of the control rules, and of the environment.

Behavior and Cognition as Phenomena Originating from the Interaction Between Coupled Dynamical Processes

Up to this point we restricted our analysis to the dynamics originating from the interactions between the agent's control system, the agent's body, and the environment. However, the body of an agent, its control system, and the environment

might have their own dynamics (dotted arrows in Fig. 1). For the sake of clarity, we will refer to the dynamical processes occurring within the agent's control system, within the agent's body, or within the environment as internal dynamics, and to the dynamics originating from the agent/body/environment interactions as external dynamics. In cases in which the agent's body, the agent's control system, or the environment have their own dynamics, behavior should be characterized as a property emerging from the combination of several coupled dynamical processes. The existence of several concurrent dynamical processes represents an important opportunity to exploit emergent features. Indeed, behavioral and cognitive skills might emerge not only from the external dynamics, as we showed in the previous section, but also from the internal dynamical processes or from the interaction between different dynamical processes.

As an example which illustrates how complex cognitive skills can emerge from the interaction between a simple agent/body/environment dynamic and a simple internal dynamic, consider the case of a wheeled robot placed in a maze environment (Fig. 4) which has been trained to: (a) produce a wall-following behavior which allows the robot to periodically visit and re-visit all environmental areas, (b) identify a target object constituted by a black disk which is placed in a randomly selected position of the environment for a limited time duration, and (c) recognize the location in which the target object was previously found every time the robot re-visits the corresponding location [15]. The robot has infrared sensors (which provide information about nearby obstacles), light sensors (which provide information about the light gradient generated by the light bulb placed in the central corridor), ground sensors (which detect the color of the ground), two motors (which control the desired speed of the two corresponding wheels), and one additional output unit which should be turned on when the robot re-visits the environmental area in which the black disk was previously found. The robot's controller consists of a three-layer neural network which includes a layer of sensory neurons (which encode the state of the corresponding sensors), a layer of motor neurons (which encode the state of the actuators), and a layer of internal neurons which consist of leaky integrators operating at tunable time scales [3,15]. The free parameters of the robot's neural controller (i. e. the connection weights and the time constants of the internal neurons, which regulate the rate at which these neurons change their state over time) were adapted through an evolutionary technique [31].

By analyzing the evolved robots, the authors observed that they are able to generate a spatial representation of the environment and of their location in it while they are situated in the environment itself. Indeed, while the robot travels by performing different laps of the environment (see Fig. 4, right), the states of the two internal neurons converge on a periodic limit-cycle dynamic in which different states correspond to different locations of the robot in the environment (Fig. 5). As we mentioned above, the ability to generate this form of representation, which allows the robot to solve its adaptive problem, originates from the coupling between a simple internal dynamics and a simple robot/body/environment dynamics. The former dynamics is characterized by the fact that the state of the two internal neurons tends to move slowly toward different fixed-point attractors of the robot's internal dynamics, which correspond to the different types of sensory states exemplified in Fig. 5.
The latter dynamics originates from the fact that different types of sensory states last for different time durations and alternate in a given order while the robot moves in the environment. The interaction between these two dynamical processes leads to a transient dynamics of the agent's internal state which moves slowly toward the current fixed-point attractor without ever fully reaching it (thus preserving information about previously experienced sensory states, the time duration of these states, and the order in which they have been experienced). The coupling between the two dynamical processes originates from the fact that the free parameters which regulate the agent/environment dynamics (e. g. the trajectory and the speed with which the robot moves in the environment) and the agent's internal dynamics (e. g. the direction and the speed with which the internal neurons change their state) have been co-adapted and co-shaped during the adaptive process.
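A standard discrete-time update for such leaky-integrator neurons (a common textbook formulation; the exact equations and parameter values of the cited work may differ) is:

```python
def leaky_integrator_step(y, net_input, tau, dt=0.1):
    """One Euler step of a leaky-integrator neuron:
    tau * dy/dt = -y + net_input.
    The time constant tau (an adapted free parameter in the experiment
    described above) sets how quickly the state tracks its input."""
    return y + (dt / tau) * (-y + net_input)

# A slow neuron retains information about past inputs far longer than
# a fast one: after the input switches off, the fast neuron has almost
# completely forgotten it while the slow one has not.
y_fast, y_slow = 0.0, 0.0
for t in range(100):
    i = 1.0 if t < 50 else 0.0   # input present, then removed
    y_fast = leaky_integrator_step(y_fast, i, tau=0.2)
    y_slow = leaky_integrator_step(y_slow, i, tau=20.0)
print(y_fast, y_slow)
```

Because of this memory property, the time constants co-evolved with the connection weights determine how much of the robot's sensory history is implicitly carried in the internal state.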

Embodied and Situated Agents, Adaptive Behavior in, Figure 5 The state of the two internal neurons (i1 and i2) of the robot recorded for 330 s while the robot performs about 5 laps of the environment. The s, a, b, c, and d labels indicate the internal states corresponding to five different positions of the robot in the environment shown in Fig. 4. The other labels indicate the position of the fixed point attractors in the robot’s internal dynamics corresponding to five types of sensory states experienced by the robot when it detects: a light in its frontal side (LF), a light on its rear side (LR), an obstacle on its right and frontal side (OFR), an obstacle on its right side (OR), no obstacles and no lights (NO)
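The transient mechanism described above can be sketched in isolation. In the toy model below (contexts, attractor targets, and drift rate are all invented for illustration), a slow state drifting toward whichever fixed point the current sensory context selects, without ever reaching it, retains information about the order and duration of past contexts:

```python
def drift(y, target, rate=0.05):
    # Move a fraction of the way toward the fixed point selected by the
    # current sensory context; the attractor is never reached, so y keeps
    # a trace of earlier contexts and of how long they lasted
    return y + rate * (target - y)

def follow(route, y=0.0):
    # route: sequence of (attractor target, duration in steps)
    for target, duration in route:
        for _ in range(duration):
            y = drift(y, target)
    return y

# The same two contexts experienced in a different order leave the
# internal state at a different value
y1 = follow([(1.0, 10), (-1.0, 5)])
y2 = follow([(-1.0, 5), (1.0, 10)])
print(y1, y2)
```

This is the sense in which the robot's internal state can encode its location: different paths through the same set of sensory contexts leave distinguishable internal states.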

For related works which show how navigation and localization skills might emerge from the coupling between an agent's internal and external dynamics, see [45]. For other works addressing other behavioral/cognitive capabilities, see [4] for what concerns categorization, [16,41] for what concerns selective attention, and [44] for what concerns language and compositionality.

Behavior and Cognition as Phenomena with a Multi-Level and Multi-Scale Organization

Another fundamental feature that characterizes behavior is the fact that it is a multi-layer system with different levels of organization extending over different time scales [2,19]. More precisely, as exemplified in Fig. 6, the behavior of an agent or of a group of agents involves both lower- and higher-level behaviors which extend over shorter or longer time spans, respectively. Lower-level behaviors arise from a few agent/environment interactions and short-term internal dynamical processes. Higher-level behaviors, instead, arise from the combination and interaction of lower-level behaviors and/or from long-term internal dynamical processes.


Embodied and Situated Agents, Adaptive Behavior in, Figure 6 A schematic representation of the multi-level and multi-scale organization of behavior. The behaviors represented in the inner circles are elementary behaviors which arise from fine-grained interactions between the control system, the body, and the environment, and which extend over limited time spans. The behaviors represented in the outer circles are higher-level behaviors which arise from the combination and interaction between lower-level behaviors and which extend over longer time spans. The arrows which go from higher-level behaviors toward lower levels indicate that the behaviors currently exhibited by the agent later affect the lower-level behaviors and/or the fine-grained interactions between the constituting elements (agent's control system, agent's body, and the environment)

The multi-level and multi-scale organization of agents' behavior plays important roles: it is one of the factors which allow agents to produce functionally useful behavior without necessarily developing dedicated control mechanisms [8,9,29], it might favor the development of new behavioral and/or cognitive skills thanks to the recruitment of pre-existing capabilities [22], and it allows agents to generalize their skills to new task/environmental conditions [29].

An exemplification of how the multi-level and multi-scale organization of behavior allows agents to generalize their skills to new environmental conditions is represented by the experiments carried out by Baldassarre et al. [2], in which the authors evolved the control system of a group of robots assembled into a linear structure (Fig. 7) for the ability to move in a coordinated manner and to display a coordinated light-approaching behavior. Each robot [27] consists of a mobile base (chassis) and a main body (turret) that can rotate with respect to the chassis along the vertical axis. The chassis has two drive mechanisms that control the two corresponding tracks and toothed wheels. The turret has one gripper, which allows robots to assemble together and to grasp objects, and a motor controlling the rotation of the turret with respect to the chassis. Robots are provided with light sensors and with a traction sensor, placed at the turret-chassis junction, that detects the intensity and the direction of the force that the turret exerts on the chassis (along the plane orthogonal to the vertical axis). Given that the orientations of individual robots might vary and given that the target light might be out of sight, robots need to coordinate to choose a common direction of movement and to change their direction as soon as one or a few robots start to detect a light gradient. Evolved individuals show the ability to negotiate a common direction of movement and to approach

Embodied and Situated Agents, Adaptive Behavior in, Figure 7 Left: Four robots assembled into a linear structure. Right: A simulation of the robots shown in the left part of the figure


light targets as soon as a light gradient is detected. By testing the evolved robots in different conditions, the authors observed that they are able to generalize their skills to new conditions and also to spontaneously produce new behaviors which had not been rewarded during the evolutionary process. More precisely, groups of assembled robots display a capacity to generalize their skills with respect to the number of robots which are assembled together and to the shape formed by the assembled robots. Moreover, when the evolved controllers are embodied in eight robots assembled so as to form a circular structure and situated in the maze environment shown in Fig. 8, the robots display an ability to collectively avoid obstacles, to rearrange their shape so as to pass through narrow passages, and to explore the environment. The ability to display all these behavioral skills allows the robots to reach the light target even in large maze environments, i. e. even in environmental conditions which are rather different from the conditions that they experienced during the training process (Fig. 8).

Embodied and Situated Agents, Adaptive Behavior in, Figure 8 The behavior produced by eight robots assembled into a circular structure in a maze environment including walls and cylindrical objects (represented with gray lines and circles). The robots start in the central portion of the maze and reach the light target located in the bottom-left side of the environment (represented with an empty circle) by exhibiting a combination of coordinated-movement behaviors, collective obstacle-avoidance, and collective light-approaching behaviors. The irregular lines, which represent the trajectories of the individual robots, show how the shape of the assembled robots changes during motion by adapting to the local structure of the environment

By analyzing the behavior displayed by the evolved robots tested in the maze environment, a complex multi-level organization can be observed. The simplest behaviors that can be identified consist of low-level individual behaviors which extend over short time spans:

1. A move-forward behavior which consists of the individual's ability to move forward when the robot is coordinated with the rest of the team, is oriented toward the direction of the light gradient (if any), and does not collide with obstacles. This behavior results from the combination of: (a) a control rule which produces a move-forward action when the perceived traction has a low intensity and when the difference between the intensity of the light perceived on the left and the right side of the robot is low, and (b) the sensory effects of the execution of the selected move-forward action, mediated by the external environment, which does not produce a variation of the state of the sensors as long as the conditions that should be satisfied to produce this behavior hold.
2. A conformistic behavior which consists of the individual's ability to conform its orientation with that of the rest of the team when the two orientations differ significantly. This behavior results from the combination of: (a) a control rule that makes the robot turn toward the direction of the traction when its intensity is significant, and (b) the sensory effects produced by the execution of this action, mediated by the external environment, that lead to a progressive reduction of the intensity of the traction until the orientation of the robot conforms with the orientation of the rest of the group.
3. A phototaxis behavior which consists of the individual's ability to orient toward the direction of the light target. This behavior results from the combination of: (a) a control rule that makes the robot turn toward the direction in which the intensity of the light gradient is higher, and (b) the sensory effects produced by the execution of this action, mediated by the external environment, that lead to a progressive reduction of the difference in the light intensity detected on the two sides of the robot until the orientation of the robot conforms with the direction of the light gradient.
4.
An obstacle-avoidance behavior which consists of the individual's ability to change its direction of motion when the execution of a motor action produces a collision with an obstacle. This behavior results from the combination of: (a) the same control rule which leads to behavior #2 and which makes the robot turn toward the direction of the perceived traction (which in this case is caused by the collision with the obstacle, while in the case of behavior #2 it is caused by the forces exerted by the other assembled robots), and (b) the sensory effects produced by the execution of the turning action mediated by the


external environment, which make the robot turn until collisions no longer prevent the execution of a move-forward behavior.

The combination and the interaction between these behaviors produce the following higher-level collective behaviors, which extend over a longer time span:

5. A coordinated-motion behavior which consists in the ability of the robots to negotiate a common direction of movement and to keep moving along such direction by compensating further misalignments originating during motion. This behavior emerges from the combination and the interaction of the conformistic behavior (which plays the main role when robots are misaligned) and the move-forward behavior (which plays the main role when robots are aligned).
6. A coordinated-light-approaching behavior which consists in the ability of the robots to move in a coordinated manner toward a light target. This behavior emerges from the combination of the conformistic, the move-forward, and the phototaxis behaviors (the latter being triggered when the robots detect a light gradient). The relative importance of the three control rules which lead to the three corresponding behaviors depends both on the strength of the corresponding triggering conditions (i. e. the extent of the lack of traction forces, the intensity of traction forces, and the intensity of the light gradient, respectively) and on priority relations among behaviors (i. e. the fact that the conformistic behavior tends to play a stronger role than the phototaxis behavior).
7. A coordinated-obstacle-avoidance behavior which consists in the ability of the robots to turn in a coordinated manner to avoid nearby obstacles. This behavior arises as the result of the combination of the obstacle-avoidance, the conformistic, and the move-forward behaviors.

The combination and the interaction between these behaviors lead to the following higher-level collective behaviors, which extend over still longer time spans:

8.
A collective-exploration behavior which consists in the ability of the robots to visit different areas of the environment when the light target cannot be detected. This behavior emerges from the combination of the coordinated-motion behavior and the coordinated-obstacle-avoidance behavior, which ensure that the assembled robots can move in the environment without getting stuck and without entering limit-cycle trajectories.
9. A shape-rearrangement behavior which consists in the ability of the assembled robots to dynamically adapt their shape to the current structure of the environment so as to pass through narrow passages, especially when the

passages to be negotiated are in the direction of the light gradient. This behavior emerges from the combination and the interaction between the coordinated-motion and coordinated-light-approaching behaviors, mediated by the effects produced by relative differences in motion between robots resulting from the execution of different motor actions and/or from differences in the collisions. The fact that the shape of the assembled robots adapts to the current environmental structure so as to facilitate the overcoming of narrow passages can be explained by considering that collisions produce a modification of the shape which affects in particular the relative position of the colliding robots.

The combination and the interaction of all these behaviors lead to a still higher-level behavior:

10. A collective-navigation behavior which consists in the ability of the assembled robots to navigate toward the light target by producing coordinated movements, exploring the environment, passing through narrow passages, and producing a coordinated-light-approaching behavior (Fig. 8).

This analysis illustrates two important mechanisms which explain the remarkable generalization abilities of these robots. The first mechanism consists in the fact that the control rules which regulate the interaction between the agent and the environment so as to produce certain behavioral skills in certain environmental conditions will produce different but related behavioral skills in other environmental conditions. In particular, the control rules which generate behaviors #5 and #6, for which the robots were evolved in an environment without obstacles, also produce behavior #7 in an environment with obstacles.
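A rough sketch of how one assembled robot might arbitrate among the move-forward, conformistic, and phototaxis rules described above, including the priority of the conformistic rule over phototaxis, is given below. The sensor encodings, thresholds, and gains are invented for illustration and are not taken from the evolved controllers:

```python
def assembled_robot_controller(traction_dir, traction_mag,
                               light_left, light_right,
                               traction_threshold=0.3):
    """Illustrative sketch of one assembled robot's rule set.

    traction_dir: direction of the pull exerted on this robot's chassis,
    in radians relative to its heading; traction_mag: intensity of that
    pull; light_*: light intensity on the two sides of the robot.
    Returns a turning command (negative = left, positive = right).
    """
    if traction_mag > traction_threshold:
        # Conformistic rule: turn toward the pull of the group. The same
        # rule also yields obstacle avoidance, since a collision produces
        # a traction force as well.
        return traction_dir
    light_gradient = light_right - light_left
    if abs(light_gradient) > 0.0:
        # Phototaxis rule: turn toward the stronger light, but only when
        # the higher-priority conformistic rule is not triggered
        return 0.5 * light_gradient
    return 0.0  # move-forward rule: keep heading when coordinated
```

The collective behaviors #5 to #10 are not coded anywhere in such a rule set; in the experiment they arise only through the physical coupling between the assembled bodies and the environment.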
The second mechanism consists in the fact that the development of certain behaviors at a given level of organization which extend over a given time span will automatically lead to the exhibition of related higher-level behaviors extending over longer time spans which originate from the interactions between the former behaviors (even if these higher-level behaviors have not been rewarded during the adaptation process). In particular, the combination and the interaction of behaviors #5, #6, and #7 (which have been rewarded during the evolutionary process or which arise from the same control rules which lead to the generation of rewarded behaviors) automatically lead to the production of behaviors #8, #9, and #10 (which have not been rewarded). Obviously, there is no guarantee that the new behaviors obtained as a result of these generalization processes will play useful functions. However, the fact that these behaviors are related to the other functional behavioral skills implies that the probability that these new behaviors will play useful functions is significant. In principle, these generalization mechanisms can also be exploited by agents during their adaptive process to generate behavioral skills which play new functionalities and which emerge from the combination and the interaction between pre-existing behavioral skills playing different functions.

On the Top-Down Effect from Higher to Lower Levels of Organization

In the previous sections we have discussed how the interactions between the agents' body, the agents' control system, and the environment lead to behavioral and cognitive skills, and how such skills have a multi-level and multi-scale organization in which the interaction between lower-level skills leads to the emergence of higher-level skills. However, higher-level skills also affect lower-level skills, down to the fine-grained interaction between the constituting elements (agents' body, agents' control system, and environment). More precisely, the behaviors which originate from the interaction between the agent and the environment and from the interaction between lower-level behaviors later affect the lower-level behaviors and the interactions from which they originate. These bi-directional influences between different levels of organization can lead to circular causality [20], where high-level processes act as independent entities which constrain the lower-level processes from which they originate. One of the most important effects of this top-down influence consists in the fact that the behavior exhibited by an agent constrains the type of sensory patterns that the agent will experience later on (i.e. constrains the fine-grained agent/environment interactions which determine the behavior that will later be exhibited by the agent).
Since the complexity of the problem faced by an agent depends on the sensory information experienced by the agent itself, these top-down influences can be exploited in order to turn hard problems into simple ones. One neat demonstration of this type of phenomenon is given by the experiments conducted by Marocco and Nolfi [32], in which a simulated finger robot with six degrees of freedom, provided with sensors of its joint positions and with coarse touch sensors, is asked to discriminate between cubic and spherical objects varying in size. The problem is not trivial since, in general terms, the sensory patterns experienced by the robot do not provide clear regularities for discriminating between the two types of objects. However, the type of sensory states which are experienced by the agent also depends on the behavior previously exhibited by the agent itself: agents exhibiting different behaviors might face simpler or harder problems. By evolving the robots in simulation for the ability to solve this problem and by analyzing the complexity of the problem faced by robots of successive generations, the authors observed that the evolved robots manage to solve their adaptive problem on the basis of simple control rules which allow the robot to approach the object and to move following the surface of the object from left to right, independently of the object's shape. The exhibition of this behavior in interaction with objects characterized by a smooth or irregular surface (in the case of spherical or cubic objects, respectively) ensures that the same control rules lead to two types of behavior depending on the type of object: following the surface of the object and then moving away from it, in the case of spherical objects, and following the surface of the object until getting stuck in a corner, in the case of cubic objects. The exhibition of these two behaviors allows the agent to experience rather different proprioceptive states as a consequence of having interacted with a spherical or a cubic object, states which nicely encode the regularities necessary to differentiate the two types of objects. For other examples which show how adaptive agents can exploit the fact that behavioral and cognitive processes arising from the interaction between lower-level behaviors or between the constituting elements later affect these lower-level processes, see [4,28,39].

Adaptive Methods

In this section we briefly review the methods through which artificial embodied and situated agents can develop their skills autonomously while they interact, at different levels of organization, with the environment and possibly with other agents. These methods are inspired by the adaptive processes observed in nature: evolution, maturation, development, and learning.
We will focus in particular on self-organized adaptive methodologies in which the role of the experimenter/designer is reduced to a minimum and in which the agents are free to develop their own strategy for solving their adaptive problems within a large space of potentially alternative solutions. This choice is motivated by the following considerations:

(a) These methods allow agents to identify the behavioral and cognitive skills which should be possessed, combined, and integrated so as to solve the given problem. In other words, these methods can come up with effective ways of decomposing the overall required skill into a collection of simpler lower-level skills. Indeed,

as we showed in the previous section, evolutionary adaptive techniques can discover ways of decomposing the high-level requested skill into lower-level behavioral and cognitive skills, and can thus find solutions which are effective and parsimonious thanks to the exploitation of properties emerging from the interaction between lower-level processes and skills, and thanks to the recruitment of previously developed skills for performing new functions. In other words, these methods release the designer from the burden of deciding how the overall skill should be divided into a set of simpler skills and how these skills should be integrated. More importantly, these methods can come up with solutions exploiting emergent properties which would be hard to design [17,31].

(b) These methods allow agents to identify how a given behavioral or cognitive skill can be produced, i.e. the appropriate fine-grained characteristics of the agent's body structure and of the control rules regulating the agent/environment interaction. As for the previous aspect, the advantage of using adaptive techniques lies not only in the fact that the experimenter is released from the burden of designing the fine-grained characteristics of the agents, but also in the fact that adaptation might prove more effective than human design, due to the inability of an external observer to foresee the effects of the large number of non-linear interactions occurring at different levels of organization.

(c) These methods allow agents to adapt to variations of the task, of the environment, and of the social conditions.

Current approaches, in this respect, can be grouped into two families, illustrated in the following subsections: Evolutionary Robotics methods and Developmental Robotics methods.
Evolutionary Robotics Methods

Evolutionary Robotics [14,31] is a method which allows the creation of embodied and situated agents able to adapt to their task/environment autonomously through an adaptive process inspired by natural evolution [18] and, possibly, through the combination of evolutionary, developmental, and learning processes. The basic idea goes as follows (Fig. 9). An initial population of different artificial genotypes, each encoding the control system (and possibly the morphology) of an agent, is randomly created. Each genotype is translated into a corresponding phenotype (i.e. a corresponding agent) which is then left free to act (move, look around, manipulate the environment, etc.) while its performance

Embodied and Situated Agents, Adaptive Behavior in, Figure 9 A schematic representation of the evolutionary process. The stripes with black and white squares represent individual genotypes. The rectangular boxes indicate the genomes of a population at a certain generation. The small robots placed inside the square on the right part of the figure represent a group of robots situated in an environment, which interact with the environment and among themselves

(fitness) with respect to a given task is automatically evaluated. In cases in which this methodology is applied to collective behaviors, agents are evaluated in groups which might be heterogeneous or homogeneous (i.e. composed of agents which do or do not differ, respectively, with respect to their genetic and phenotypic characteristics). The fittest individuals (those having the highest fitness) are allowed to reproduce by generating copies of their genotypes with the addition of changes introduced by genetic operators (e.g. mutations, exchange of genetic material). This process is repeated for a number of generations until an individual or a group of individuals is born which satisfies the performance level set by the user. The process that determines how a genotype (i.e. typically a string of binary values) is turned into a corresponding phenotype (i.e. a robot with a given morphology and control system) might consist of a simple one-to-one mapping or of a complex developmental process. In the former case, many of the characteristics of the phenotypical individual (e.g. the shape of the body, the number and position of the sensors and of the actuators, and in some cases the architecture of the neural controller) are pre-determined and fixed, and the genotype encodes a vector of


free parameters (e.g. the connection weights of the neural controller [31]). In the latter case, the genotype might encode a set of rules that determine how the body structure and the control system of the individual grow during an artificial developmental process. Through this type of indirect developmental mapping, most of the characteristics of the phenotypical robot can be encoded in the genotype and subjected to the evolutionary adaptive process [31,36]. Finally, in some cases the adaptation process might involve both an evolutionary process that regulates how the characteristics of the robots vary phylogenetically (i.e. across generations) and a developmental/learning process which regulates how the characteristics of the robots vary ontogenetically (i.e. during the phase in which the robots act in the environment) [30]. Evolutionary methods can be used to allow agents to develop the requested behavioral and cognitive skills from scratch (i.e. starting from agents which do not have any behavioral or cognitive capability) or in an incremental manner (i.e. starting from pre-evolved robots which already have some behavioral capability, for example the ability to solve a simplified version of the adaptive problem). The fitness function which determines whether an individual will reproduce or not might also include, in addition to a component that scores the performance of the agent with respect to the given task, additional task-independent components. These additional components can lead to the development of behavioral skills which are not necessarily functional in themselves but which can favor the development of functional skills later on [37].
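As a concrete illustration of the direct (one-to-one) encoding scheme described above, the following sketch evolves a vector of real numbers interpreted as the connection weights of a fixed feed-forward controller. The network sizes, fitness function, and evolutionary parameters are illustrative assumptions; in a real experiment the controller would be evaluated by letting it drive a (simulated) robot, not by the toy scoring function used here.

```python
import math
import random

def decode(genotype, n_in=4, n_hid=3, n_out=2):
    """One-to-one mapping: read the genotype as the connection weights
    (and biases) of a fixed-architecture feed-forward controller."""
    w1_end = (n_in + 1) * n_hid
    w1 = [genotype[i * (n_in + 1):(i + 1) * (n_in + 1)] for i in range(n_hid)]
    w2 = [genotype[w1_end + i * (n_hid + 1): w1_end + (i + 1) * (n_hid + 1)]
          for i in range(n_out)]
    def controller(sensors):
        h = [math.tanh(sum(w * s for w, s in zip(row[:-1], sensors)) + row[-1])
             for row in w1]
        return [math.tanh(sum(w * x for w, x in zip(row[:-1], h)) + row[-1])
                for row in w2]
    return controller

def evaluate(controller):
    """Hypothetical placeholder fitness: reward motor outputs close to a
    fixed target for a fixed sensory pattern (maximum fitness is 0)."""
    left, right = controller([0.5, -0.2, 0.1, 0.9])
    return -(left - 1.0) ** 2 - (right + 1.0) ** 2

def evolve(pop_size=20, genome_len=(4 + 1) * 3 + (3 + 1) * 2,
           generations=50, elite=5, sigma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population by fitness; the fittest genotypes reproduce
        # through copies perturbed by Gaussian mutations (elitism keeps
        # the parents themselves unchanged).
        scored = sorted(pop, key=lambda g: evaluate(decode(g)), reverse=True)
        parents = scored[:elite]
        pop = [[w + rng.gauss(0, sigma) for w in rng.choice(parents)]
               for _ in range(pop_size)]
        pop[:elite] = parents
    return max(pop, key=lambda g: evaluate(decode(g)))

best = evolve()
```

The same loop accommodates indirect encodings by swapping `decode` for a developmental genotype-to-phenotype mapping, and collective setups by evaluating groups of (homogeneous or heterogeneous) genotypes together.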
Evolutionary methods can allow agents to develop low-level behavioral and cognitive skills which have been previously identified by the designer/experimenter and which might later be combined and integrated in order to realize the high-level requested skill, or to develop the high-level requested skill directly. In the former case the adaptive process leads to the identification of the fine-grained features of the agent (e.g. number and type of sensors, body shape, architecture and connection weights of the neural controller) which, by interacting among themselves and with the environment, will produce the required skill. In the latter case, the adaptive process leads to the identification of the lower-level skills (at different levels of organization) which are necessary to produce the required high-level skill, the identification of the way in which these lower-level skills should be combined and integrated, and (as in the former case) the identification of the fine-grained features of the agent which, in interaction with the physical and social environment, will produce the required behavioral or cognitive skills.

Developmental Robotics Methods

Developmental Robotics [1,10,21], also known as epigenetic robotics, is a method for developing embodied and situated agents that adapt to their task/environment autonomously through processes inspired by biological development and learning. Evolutionary and developmental robotics methods share the same fundamental assumptions but differ in the way in which they are realized and in the type of situations to which they are typically applied. Concerning the former aspect, unlike evolutionary robotics methods, which operate on 'long' phylogenetic time scales, developmental methods typically operate on 'short' ontogenetic time scales. Concerning the latter aspect, unlike evolutionary methods, which are usually used to develop behavioral and cognitive skills from scratch, developmental methods are typically adopted to model the development of complex behavioral and cognitive skills from simpler pre-existing skills which represent prerequisites for the development of the required skills. At the present stage, developmental robotics does not consist of a well-defined methodology [1,21] but rather of a collection of approaches and methods, often addressing complementary aspects, which hopefully will be integrated into a single methodology in the future. Below we briefly summarize some of the most important methodological aspects of the developmental robotics approach.

The Incremental Nature of the Developmental Process

Development should be characterized as an incremental process in which pre-existing structures and behavioral skills constitute important prerequisites and constraints for the development of more complex structures and behavioral skills, and in which the complexity of the internal and external characteristics increases during development.
One crucial aspect of the developmental approach therefore consists in the identification of the initial characteristics and skills which should enable the bootstrapping of the developmental process: the layering of new skills on top of existing ones [10,25,38]. Another important aspect consists in shaping the developmental process so as to ensure that the progressive increase in the complexity of the task matches the current competency of the system, and so as to drive the developmental process toward the progressive acquisition of the skills which represent the prerequisites for further developments. The progressive increase in complexity might concern not only the complexity of the task or of the required skills but also the complexity of single components of the robot/environment interaction such as, for example, the number of frozen/unfrozen degrees of freedom [5].

The Social Nature of the Developmental Process

Development should involve social interaction with human subjects and with other developing robots. Social interactions (e.g. scaffolding, tutelage, mimicry, emulation, and imitation) in fact play an important role not only in the development of social skills [7] but also as facilitators of the development of individual cognitive and behavioral skills [47]. Moreover, other types of social interactions (i.e. alignment processes or social games) might lead to the development of cognitive and/or behavioral skills which are generated by a collection of individuals and which could not be developed by a single individual robot [43].

Exploitation of the Interaction Between Concurrent Developmental Processes

Development should involve the exploitation of properties originating from the interaction and the integration of several co-occurring processes. Indeed, the co-development of different skills at the same time can favor the acquisition of the corresponding skills and of additional abilities arising from the combination and the integration of the developed skills. For example, the development of an ability to anticipate the sensory consequences of one's own actions might facilitate the concurrent development of other skills such as categorical perception skills [46]. The development of an ability to pay attention to new situations (curiosity) and to look for new experiences after some time (boredom) might improve the learning of a given functional skill [33,42]. The co-development of behavioral and linguistic skills might favor the acquisition of the corresponding skills and the development of semantic combinatoriality skills (Sugita and Tani [44]).
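Curiosity- and boredom-like drives of the kind mentioned above are often operationalized as intrinsic rewards derived from the prediction error of a learned forward model: a situation is rewarding while it is still unpredictable and becomes "boring" once the model has learned it. The following toy sketch illustrates only this basic effect; the linear dynamics, learning rule, and constants are illustrative assumptions, not the formulations of the cited models [33,42].

```python
# A forward model predicts the consequence of an action; the absolute
# prediction error serves as an intrinsic "curiosity" reward. As the
# model improves, the reward for the same situation fades ("boredom"),
# which would push an agent toward situations it cannot yet predict.

def environment(state, action):
    # Simple deterministic toy dynamics (an illustrative assumption).
    return 0.8 * state + action

class ForwardModel:
    """Linear predictor trained online with a delta (LMS) rule."""
    def __init__(self, lr=0.2):
        self.a, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, state, action):
        return self.a * state + self.b * action

    def update(self, state, action, outcome):
        err = outcome - self.predict(state, action)
        self.a += self.lr * err * state
        self.b += self.lr * err * action
        return abs(err)  # intrinsic reward = prediction error

model = ForwardModel()
state = 1.0
rewards = []
for step in range(30):
    action = 0.5                      # the agent keeps revisiting the
    nxt = environment(state, action)  # same kind of situation...
    rewards.append(model.update(state, action, nxt))
    state = nxt
# ...so early steps are "interesting" (large error) while later the
# same situation becomes "boring" (error near zero).
```

A curiosity-driven agent would use such a reward to select actions, abandoning well-predicted situations in favor of ones where the prediction error (or, in more refined formulations, the learning progress) is still high.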
Discussion and Conclusion

In this paper we described how artificial agents which are embodied and situated can develop behavioral and cognitive skills autonomously while they interact with their physical and social environment. After having introduced the notions of embodiment and situatedness, we illustrated how the behavioral and cognitive skills displayed by adaptive agents can be properly characterized as complex systems with multi-level and multi-scale properties, resulting from a large number of interactions at different levels of organization and involving both bottom-up processes (in which the interactions between elements at lower levels of organization lead to higher-level properties) and top-down processes (in

which properties at a certain level of organization later affect lower-level properties or processes). Finally, we briefly introduced the methods which can be used to synthesize adaptive embodied and situated agents. The complex-system nature of adaptive agents which are embodied and situated has important implications which constrain the organization of these systems and the dynamics of the adaptive process through which they develop their skills. Concerning the organization of these systems, it implies that agents' behavioral and/or cognitive skills (at any stage of the adaptive process) cannot be traced back to any one of the three foundational elements (i.e. the body of the agent, the control system of the agent, and the environment) in isolation but should rather be characterized as properties which emerge from the interactions between these three elements and from the interactions between the behavioral and cognitive properties emerging from the former interactions at different levels of organization. Moreover, it implies that 'complex' behavioral or cognitive skills might emerge from the interaction between simple properties or processes. Concerning agents' adaptive process, it implies that the development of new 'complex' skills does not necessarily require the development of new 'complex' morphological features or new 'complex' control mechanisms. Indeed, new 'complex' skills might arise from the addition of new 'simple' features or new 'simple' control rules which, in interaction with the pre-existing features and processes, might produce the required new behavioral or cognitive skills. The study of adaptive behavior in artificial agents which has been reviewed in this paper has important implications both from an engineering point of view (i.e. for progressing in our ability to develop effective machines) and from a modeling point of view (i.e. for understanding the characteristics of biological organisms).
In particular, from an engineering point of view, progress in our ability to develop adaptive embodied and situated agents can lead to the development of machines playing useful functionalities. From a modeling point of view, progress in our ability to model and analyze artificial adaptive agents can improve our understanding of the general mechanisms behind animal and human intelligence. For example, the comprehension of the complex-system nature of behavioral and cognitive skills illustrated in this paper can allow us to better define the notions of embodiment and situatedness, which represent two foundational concepts in the study of natural and artificial intelligence. Indeed, although possessing a body and being in a physical environment certainly represent prerequisites for considering an agent embodied and situated, a more useful definition of embodiment (or of true embodiment) can be given in terms of the extent to which a given agent exploits its body characteristics to solve its adaptive problem (i.e. the extent to which its body structure is adapted to the problem to be solved or, in other words, the extent to which its body performs morphological computation). Similarly, a more useful definition of situatedness (or true situatedness) can be given in terms of the extent to which an agent exploits its interaction with the physical and social environment, and the properties originating from this interaction, to solve its adaptive problem. For the sake of clarity, we can refer to the former definitions of the terms (i.e. possessing a physical body and being situated in a physical environment) as embodiment and situatedness in a weak sense, and to the latter definitions as embodiment and situatedness in a strong sense.

Bibliography

1. Asada M, MacDorman K, Ishiguro H, Kuniyoshi Y (2001) Cognitive developmental robotics as a new paradigm for the design of humanoid robots. Robot Auton Syst 37:185–193
2. Baldassarre G, Parisi D, Nolfi S (2006) Distributed coordination of simulated robots based on self-organisation. Artif Life 12(3):289–311
3. Beer RD (1995) A dynamical systems perspective on agent-environment interaction. Artif Intell 72:173–215
4. Beer RD (2003) The dynamics of active categorical perception in an evolved model agent. Adapt Behav 11:209–243
5. Berthouze L, Lungarella M (2004) Motor skill acquisition under environmental perturbations: on the necessity of alternate freezing and freeing. Adapt Behav 12(1):47–63
6. Bongard JC, Paul C (2001) Making evolution an offer it can't refuse: Morphology and the extradimensional bypass. In: Keleman J, Sosik P (eds) Proceedings of the Sixth European Conference on Artificial Life.
Lecture Notes in Artificial Intelligence, vol 2159. Springer, Berlin
7. Breazeal C (2003) Towards sociable robots. Robot Auton Syst 42(3–4):167–175
8. Brooks RA (1991) Intelligence without reason. In: Mylopoulos J, Reiter R (eds) Proceedings of the 12th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo
9. Brooks RA (1991) Intelligence without reason. In: Proceedings of the 12th International Joint Conference on Artificial Intelligence, Sydney, Australia, pp 569–595
10. Brooks RA, Breazeal C, Irie R, Kemp C, Marjanovic M, Scassellati B, Williamson M (1998) Alternate essences of intelligence. In: Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), Madison, Wisconsin, pp 961–976
11. Chiel HJ, Beer RD (1997) The brain has a body: Adaptive behavior emerges from interactions of nervous system, body and environment. Trends Neurosci 20:553–557
12. Clark A (1997) Being there: Putting brain, body and world together again. MIT Press, Cambridge

13. Endo I, Yamasaki F, Maeno T, Kitano H (2002) A method for co-evolving morphology and walking patterns of biped humanoid robot. In: Proceedings of the IEEE Conference on Robotics and Automation, Washington, D.C.
14. Floreano D, Husbands P, Nolfi S (2008) Evolutionary robotics. In: Siciliano B, Khatib O (eds) Handbook of Robotics. Springer, Berlin
15. Gigliotta O, Nolfi S (2008) On the coupling between agent internal and agent/environmental dynamics: Development of spatial representations in evolving autonomous robots. Adapt Behav 16:148–165
16. Goldenberg E, Garcowski J, Beer RD (2004) May we have your attention: Analysis of a selective attention task. In: Schaal S, Ijspeert A, Billard A, Vijayakumar S, Hallam J, Meyer J-A (eds) From Animals to Animats 8: Proceedings of the Eighth International Conference on the Simulation of Adaptive Behavior. MIT Press, Cambridge
17. Harvey I (2000) Robotics: Philosophy of mind using a screwdriver. In: Gomi T (ed) Evolutionary Robotics: From Intelligent Robots to Artificial Life, vol III. AAI Books, Ontario
18. Holland J (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
19. Keijzer F (2001) Representation and behavior. MIT Press, London
20. Kelso JAS (1995) Dynamic patterns: The self-organization of brain and behavior. MIT Press, Cambridge
21. Lungarella M, Metta G, Pfeifer R, Sandini G (2003) Developmental robotics: a survey. Connect Sci 15:151–190
22. Marocco D, Nolfi S (2007) Emergence of communication in embodied agents evolved for the ability to solve a collective navigation problem. Connect Sci 19(1):53–74
23. Massera G, Cangelosi A, Nolfi S (2007) Evolution of prehension ability in an anthropomorphic neurorobotic arm. Front Neurorobot 1(4):1–9
24. McGeer T (1990) Passive walking with knees. In: Proceedings of the IEEE Conference on Robotics and Automation, vol 2, pp 1640–1645
25. Metta G, Sandini G, Natale L, Panerai F (2001) Development and robotics. In: Proceedings of the IEEE-RAS International Conference on Humanoid Robots, pp 33–42
26. Mondada F, Franzi E, Ienne P (1993) Mobile robot miniaturisation: A tool for investigation in control algorithms. In: Proceedings of the Third International Symposium on Experimental Robotics, Kyoto, Japan
27. Mondada F, Pettinaro G, Guignard A, Kwee I, Floreano D, Deneubourg J-L, Nolfi S, Gambardella LM, Dorigo M (2004) Swarm-bot: A new distributed robotic concept. Auton Robots 17(2–3):193–221
28. Nolfi S (2002) Power and limits of reactive agents. Neurocomputing 49:119–145
29. Nolfi S (2005) Behaviour as a complex adaptive system: On the role of self-organization in the development of individual and collective behaviour. Complexus 2(3–4):195–203
30. Nolfi S, Floreano D (1999) Learning and evolution. Auton Robots 1:89–113
31. Nolfi S, Floreano D (2000) Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines. MIT Press/Bradford Books, Cambridge
32. Nolfi S, Marocco D (2002) Active perception: A sensorimotor account of object categorization. In: Hallam B, Floreano D, Hallam J, Hayes G, Meyer J-A (eds) From Animals to Animats 7: Proceedings of the VII International Conference on Simulation of Adaptive Behavior. MIT Press, Cambridge, pp 266–271
33. Oudeyer P-Y, Kaplan F, Hafner V (2007) Intrinsic motivation systems for autonomous mental development. IEEE Trans Evol Comput 11(2):265–286
34. Pfeifer R, Bongard J (2007) How the Body Shapes the Way We Think. MIT Press, Cambridge
35. Pfeifer R, Iida F, Gómez G (2006) Morphological computation for adaptive behavior and cognition. In: International Congress Series, vol 1291, pp 22–29
36. Pollack JB, Lipson H, Funes P, Hornby G (2001) Three generations of coevolutionary robotics. Artif Life 7:215–223
37. Prokopenko M, Gerasimov V, Tanev I (2006) Evolving spatiotemporal coordination in a modular robotic system. In: Rocha LM, Yaeger LS, Bedau MA, Floreano D, Goldstone RL, Vespignani A (eds) Artificial Life X: Proceedings of the Tenth International Conference on the Simulation and Synthesis of Living Systems. MIT Press, Boston
38. Scassellati B (2001) Foundations for a theory of mind for a humanoid robot. PhD thesis, Department of Electrical Engineering and Computer Science, MIT, Boston
39. Scheier C, Pfeifer R, Kuniyoshi Y (1998) Embedded neural networks: exploiting constraints. Neural Netw 11:1551–1596
40. Schmitz A, Gómez G, Iida F, Pfeifer R (2007) On the robustness of simple speed control for a quadruped robot. In: Proceedings of the International Conference on Morphological Computation, Venice, Italy
41. Slocum AC, Downey DC, Beer RD (2000) Further experiments in the evolution of minimally cognitive behavior: From perceiving affordances to selective attention. In: Meyer J, Berthoz A, Floreano D, Roitblat H, Wilson S (eds) From Animals to Animats 6: Proceedings of the Sixth International Conference on Simulation of Adaptive Behavior. MIT Press, Cambridge
42. Schmidhuber J (2006) Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts. Connect Sci 18(2):173–187
43. Steels L (2003) Evolving grounded communication for robots. Trends Cogn Sci 7(7):308–312
44. Sugita Y, Tani J (2005) Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adapt Behav 13(1):33–52
45. Tani J, Fukumura N (1997) Self-organizing internal representation in learning of navigation: A physical experiment by the mobile robot Yamabico. Neural Netw 10(1):153–159
46. Tani J, Nolfi S (1999) Learning to perceive the world as articulated: An approach for hierarchical learning in sensory-motor systems. Neural Netw 12:1131–1141
47. Tani J, Nishimoto R, Namikawa J, Ito M (2008) Co-developmental learning between human and humanoid robot using a dynamic neural network model. IEEE Trans Syst Man Cybern B Cybern 38:1
48. Varela FJ, Thompson E, Rosch E (1991) The Embodied Mind: Cognitive Science and Human Experience. MIT Press, Cambridge
49. van Gelder TJ (1998) The dynamical hypothesis in cognitive science. Behav Brain Sci 21:615–628
50. Vaughan E, Di Paolo EA, Harvey I (2004) The evolution of control and adaptation in a 3D powered passive dynamic walker. In: Pollack J, Bedau M, Husbands P, Ikegami T, Watson R (eds) Proceedings of the Ninth International Conference on the Simulation and Synthesis of Living Systems. MIT Press, Cambridge


Entropy

CONSTANTINO TSALLIS 1,2
1 Centro Brasileiro de Pesquisas Físicas, Rio de Janeiro, Brazil
2 Santa Fe Institute, Santa Fe, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Some Basic Properties
Boltzmann–Gibbs Statistical Mechanics
On the Limitations of Boltzmann–Gibbs Entropy and Statistical Mechanics
The Nonadditive Entropy S_q
A Connection Between Entropy and Diffusion
Standard and q-Generalized Central Limit Theorems
Future Directions
Acknowledgments
Bibliography

Glossary

Absolute temperature Denoted T.

Clausius entropy Also called thermodynamic entropy. Denoted S.

Boltzmann–Gibbs entropy Basis of Boltzmann–Gibbs statistical mechanics. This entropy, denoted S_BG, is additive. Indeed, for two probabilistically independent subsystems A and B, it satisfies S_BG(A+B) = S_BG(A) + S_BG(B).

Nonadditive entropy It usually refers to the basis of nonextensive statistical mechanics. This entropy, denoted S_q, is nonadditive for q ≠ 1. Indeed, for two probabilistically independent subsystems A and B, it satisfies S_q(A+B) ≠ S_q(A) + S_q(B) (q ≠ 1). For historical reasons, it is frequently (but inadequately) referred to as nonextensive entropy.

q-logarithmic and q-exponential functions Denoted ln_q x (with ln_1 x = ln x) and e_q^x (with e_1^x = e^x), respectively.

Extensive system So called for historical reasons. A more appropriate name would be additive system. It is a system which, in one way or another, relies on or is connected to the (additive) Boltzmann–Gibbs entropy. Its basic dynamical and/or structural quantities are expected to be of the exponential form. In the sense of complexity, it may be considered a simple system.

Nonextensive system So called for historical reasons. A more appropriate name would be nonadditive system. It is a system which, in one way or another, relies on or is connected to a (nonadditive) entropy such as S_q (q ≠ 1). Its basic dynamical and/or structural quantities are expected to asymptotically be of the power-law form. In the sense of complexity, it may be considered a complex system.

Definition of the Subject

Thermodynamics and statistical mechanics are among the most important formalisms in contemporary physics. They have overwhelming and intertwined applications in science and technology. They essentially rely on two basic concepts, namely energy and entropy. The mathematical expression that is used for the first one is well known to be nonuniversal; indeed, it depends on whether we are, say, in classical, quantum, or relativistic regimes. The second concept, and very specifically its connection with the microscopic world, has been considered during well over one century as essentially unique and universal as a physical concept. Although some mathematical generalizations of the entropy have been proposed during the last forty years, they have frequently been considered as mere practical expressions for disciplines such as cybernetics and control theory, with no particular physical interpretation. What we have witnessed during the last two decades is the growth, among physicists, of the belief that it is not necessarily so. In other words, the physical entropy would basically rely on the microscopic dynamical and structural properties of the system under study. For example, for systems microscopically evolving with strongly chaotic dynamics, the connection between the thermodynamical entropy and the thermostatistical entropy would be the one found in standard textbooks. But, for more complex systems (e.g., for weakly chaotic dynamics), it becomes either necessary, or convenient, or both, to extend the traditional connection.
The present article presents the ubiquitous concept of entropy, useful even for systems for which no energy can be defined at all, from a standpoint reflecting a nonuniversal conception of the connection between the thermodynamic and the thermostatistical entropies. Consequently, both the standard entropy and its recent generalizations, as well as the corresponding statistical mechanics, are here presented on an equal footing.

Introduction

The concept of entropy (from the Greek ἐν τρέπω, en trepo, at turn, at transformation) was first introduced in 1865 by the German physicist and mathematician Rudolf Julius Emanuel Clausius, in order to mathematically complete the formalism of classical thermodynamics [55], one of the most important theoretical achievements of contemporary physics. The term was so coined to make a parallel to energy (from the Greek ἐνεργός, energos, at work), the other fundamental concept of thermodynamics. Clausius' connection was given by

$$dS = \frac{\delta Q}{T} \,, \qquad (1)$$

where $\delta Q$ denotes an infinitesimal transfer of heat. In other words, 1/T acts as an integrating factor for $\delta Q$. In fact, it was only in 1909 that thermodynamics was finally given a logically consistent axiomatic formulation, by the Greek mathematician Constantin Caratheodory. In 1872, some years after Clausius' proposal, the Austrian physicist Ludwig Eduard Boltzmann introduced a quantity, which he noted H, defined in terms of microscopic quantities:

$$H \equiv \iiint f(v) \ln[f(v)]\, dv \,, \qquad (2)$$

where $f(v)\, dv$ is the number of molecules in the velocity-space interval dv. Using Newtonian mechanics, Boltzmann showed that, under some intuitive assumptions (Stoßzahlansatz, or molecular chaos hypothesis) regarding the nature of molecular collisions, H does not increase with time. Five years later, in 1877, he identified this quantity with the Clausius entropy through $-kH \equiv S$, where k is a constant. In other words, he established that

$$S = -k \iiint f(v) \ln[f(v)]\, dv \,, \qquad (3)$$

later on generalized into

$$S = -k \iint f(q,p) \ln[f(q,p)]\, dq\, dp \,, \qquad (4)$$

where (q, p) is called the $\mu$-space and constitutes the phase space (coordinate q and momentum p) corresponding to one particle. Boltzmann's genius insight – the first ever mathematical connection of the macroscopic world with the microscopic one – was, for well over three decades, highly controversial, since it was based on the hypothesis of the existence of atoms. Only a few selected scientists, like the English chemist and physicist John Dalton, the Scottish physicist and mathematician James Clerk Maxwell, and the American physicist, chemist and mathematician Josiah Willard Gibbs, believed in the reality of atoms and molecules. A large part of the scientific establishment was, at the time, strongly against such an idea. The intricate evolution of Boltzmann's lifelong epistemological struggle, which ended tragically with his suicide in 1906, may be considered a neat illustration of Thomas Kuhn's paradigm shift, and the corresponding reaction of the scientific community, as described in The Structure of Scientific Revolutions.

There are in fact two important formalisms in contemporary physics where the mathematical theory of probabilities enters as a central ingredient. These are statistical mechanics (with the concept of entropy as a functional of probability distributions) and quantum mechanics (with the physical interpretation of wave functions and measurements). In both cases, contrasting viewpoints and passionate debates have taken place over more than one century, and continue still today. This is no surprise, after all. If it is undeniable that energy is a very deep and subtle concept, entropy is even more so. Indeed, energy concerns the world of (microscopic) possibilities, whereas entropy concerns the world of the probabilities of those possibilities, a step further in epistemological difficulty.

In his celebrated 1902 book Elementary Principles in Statistical Mechanics, Gibbs introduced the modern form of the entropy for classical systems, namely

$$S = -k \int d\Gamma\, f(q,p) \ln[C f(q,p)] \,, \qquad (5)$$

where $\Gamma$ represents the full phase space of the system, thus containing all coordinates and all momenta of its elementary particles, and C is introduced to take into account the finite size and the physical dimensions of the smallest admissible cell in $\Gamma$-space. The constant k is known today to be a universal one, called the Boltzmann constant, given by $k = 1.3806505(24) \times 10^{-23}$ Joule/Kelvin. The studies of the German physicist Max Planck along Boltzmann's and Gibbs' lines, after the appearance of quantum mechanical concepts, eventually led to the expression

$$S = k \ln W \,, \qquad (6)$$

which he coined the Boltzmann entropy. This expression is carved on the stone of Boltzmann's grave at the Central Cemetery of Vienna. The quantity W is the total number of microstates of the system that are compatible with our macroscopic knowledge of it. It is obtained from Eq. (5) under the hypothesis of a uniform distribution, i.e., equal probabilities. The Hungarian–American mathematician and physicist Johann von Neumann extended the concept of BG entropy in two steps – in 1927 and 1932, respectively – in order to also cover quantum systems. The following expression, frequently referred to as the von Neumann entropy,


resulted:

$$S = -k\, \mathrm{Tr}\, \rho \ln \rho \,, \qquad (7)$$

$\rho$ being the density operator (with $\mathrm{Tr}\, \rho = 1$). Another important step was taken in 1948 by the American electrical engineer and mathematician Claude Elwood Shannon. Having in mind the theory of digital communications, he explored the properties of the discrete form

$$S = -k \sum_{i=1}^{W} p_i \ln p_i \,, \qquad (8)$$

frequently referred to as the Shannon entropy (with $\sum_{i=1}^{W} p_i = 1$). This form can be recovered from Eq. (5) for the particular case in which the phase-space density is $f(q,p) = \sum_{i=1}^{W} p_i\, \delta(q - q_i)\, \delta(p - p_i)$. It can also be recovered from Eq. (7) when $\rho$ is diagonal. We may generically refer to Eqs. (5), (6), (7) and (8) as the BG entropy, noted $S_{BG}$. It is a measure of the disorder of the system or, equivalently, of our degree of ignorance or lack of information about its state. To illustrate a variety of properties, the discrete form (8) is particularly convenient.

Some Basic Properties

Non-negativity It can be easily verified that, in all cases, $S_{BG} \geq 0$, the zero value corresponding to certainty, i.e., $p_i = 1$ for one of the W possibilities and zero for all the others. To be more precise, this is exactly so whenever $S_{BG}$ is expressed either in the form (7) or in the form (8). However, this property of non-negativity may no longer hold if it is expressed in the form (5). This violation is one of the mathematical manifestations that, at the microscopic level, the state of any physical system exhibits its quantum nature.

Expansibility Also $S_{BG}(p_1, p_2, \ldots, p_W, 0) = S_{BG}(p_1, p_2, \ldots, p_W)$, i.e., zero-probability events do not modify our information about the system.

Maximal value $S_{BG}$ is maximized at equal probabilities, i.e., for $p_i = 1/W\ \forall i$. Its value is that of Eq. (6). This corresponds to the Laplace principle of indifference, or principle of insufficient reason.

Concavity If we have two arbitrary probability distributions $\{p_i\}$ and $\{p'_i\}$ for the same set of W possibilities, we can define the intermediate probability distribution $p''_i = \lambda p_i + (1-\lambda)\, p'_i$ ($0 < \lambda < 1$). It straightforwardly follows that $S_{BG}(\{p''_i\}) \geq \lambda\, S_{BG}(\{p_i\}) + (1-\lambda)\, S_{BG}(\{p'_i\})$. This property is essential for thermodynamics, since it eventually leads to thermodynamic stability, i.e., to robustness with regard to energy fluctuations. It also leads to the tendency of the entropy to attain, as time evolves, its maximal value compatible with our macroscopic knowledge of the system, i.e., with the possibly known values of the macroscopic constraints.

Lesche stability or experimental robustness B. Lesche introduced in 1982 [107] the definition of an interesting property, which he called stability. It reflects the experimental robustness that a physical quantity is expected to exhibit: similar experiments should yield similar numerical results for the physical quantities. Let us consider two probability distributions $\{p_i\}$ and $\{p'_i\}$, assumed to be close, in the sense that $\sum_{i=1}^{W} |p_i - p'_i| < \delta$, $\delta > 0$ being a small number. An entropic functional $S(\{p_i\})$ is said to be stable, or experimentally robust, if, for any given $\epsilon > 0$, a $\delta > 0$ exists such that $|S(\{p_i\}) - S(\{p'_i\})| / S_{\max} < \epsilon$, where $S_{\max}$ is the maximal value that the functional can attain ($\ln W$ in the case of $S_{BG}$). This implies that $\lim_{\delta \to 0} \lim_{W \to \infty} (S(\{p_i\}) - S(\{p'_i\}))/S_{\max} = 0$. As we shall see soon, this property is much stronger than it seems at first sight. Indeed, it provides a (necessary but not sufficient) criterion for classifying entropic functionals as physically admissible or not. It can be shown that $S_{BG}$ is Lesche-stable (or experimentally robust).

Entropy production If we start the (deterministic) time evolution of a generic classical system from an arbitrarily chosen point in its $\Gamma$ phase space, it typically follows a quite erratic trajectory which, in many cases, gradually visits the entire (or almost entire) phase space. By making partitions of this $\Gamma$-space, and counting the frequency of visits to the various cells (and related symbolic quantities), it is possible to define probability sets. Through them, we can calculate a sort of time evolution of $S_{BG}(t)$. If the system is chaotic (sometimes called strongly chaotic), i.e., if its sensitivity to the initial conditions increases exponentially with time, then $S_{BG}(t)$ increases linearly with t in the appropriate asymptotic limits. This rate of increase of the entropy is called the Kolmogorov–Sinai entropy rate and, for a large class of systems, it coincides (Pesin identity or Pesin theorem) with the sum of the positive Lyapunov exponents. These exponents characterize the exponential divergences, along various directions in the $\Gamma$-space, of a small discrepancy in the initial condition of a trajectory. It turns out, however, that the Kolmogorov–Sinai entropy rate is, in general, quite inconvenient to compute for arbitrary nonlinear dynamical systems. In practice, another quantity is used instead [102], usually referred to as the entropy production per unit time, which we note $K_{BG}$. Its definition is as


follows. We first make a partition of the $\Gamma$-space into many W cells ($i = 1, 2, \ldots, W$). In one of them, arbitrarily chosen, we randomly place M initial conditions (i.e., an ensemble). As time evolves, the occupancy of the W cells determines the set $\{M_i(t)\}$, with $\sum_{i=1}^{W} M_i(t) = M$. This set enables the definition of a probability set with $p_i(t) \equiv M_i(t)/M$, which in turn determines $S_{BG}(t)$. We then define the entropy production per unit time as follows:

$$K_{BG} \equiv \lim_{t \to \infty} \lim_{W \to \infty} \lim_{M \to \infty} \frac{S_{BG}(t)}{t} \,. \qquad (9)$$

To date, no theorem guarantees that this quantity coincides with the Kolmogorov–Sinai entropy rate. However, many numerical studies of various chaotic systems strongly suggest so. The same turns out to occur with what is frequently referred to in the literature as a Pesin-like identity. For instance, if we have a one-dimensional dynamical system, its sensitivity to the initial conditions $\xi \equiv \lim_{\Delta x(0) \to 0} \Delta x(t)/\Delta x(0)$ is typically given by

$$\xi(t) = e^{\lambda t} \,, \qquad (10)$$

where $\Delta x(t)$ is the discrepancy in the one-dimensional phase space of two trajectories initially differing by $\Delta x(0)$, and $\lambda$ is the Lyapunov exponent ($\lambda > 0$ corresponds to strong sensitivity to the initial conditions, or strong chaos, and $\lambda < 0$ corresponds to strong insensitivity to the initial conditions). The so-called Pesin-like identity amounts, if $\lambda \geq 0$, to

$$K_{BG} = \lambda \,. \qquad (11)$$

Additivity and extensivity If we consider a system A + B constituted by two probabilistically independent subsystems A and B, i.e., if we consider $p_{ij}^{A+B} = p_i^A\, p_j^B$, we immediately obtain from Eq. (8) that

$$S_{BG}(A+B) = S_{BG}(A) + S_{BG}(B) \,. \qquad (12)$$

In other words, the BG entropy is additive [130]. If our system is constituted by N probabilistically independent identical subsystems (or elements), we clearly have $S_{BG}(N) \propto N$. It frequently happens, however, that the N elements are not exactly independent but only asymptotically so in the $N \to \infty$ limit. This is the usual case of many-body Hamiltonian systems involving only short-range interactions, where the concept of short range will be discussed in detail later on. For such systems, $S_{BG}$ is only asymptotically additive, i.e.,

$$0 < \lim_{N \to \infty} \frac{S_{BG}(N)}{N} < \infty \,. \qquad (13)$$

An entropy $S(\{p_i\})$ of a specific system is said to be extensive if it satisfies

$$0 < \lim_{N \to \infty} \frac{S(N)}{N} < \infty \,, \qquad (14)$$

where no hypothesis at all is made about the possible independence or weak or strong correlations between the elements of the system whose entropy S we are considering. Equation (13) amounts to saying that the additive entropy $S_{BG}$ is extensive for weakly correlated systems such as the already mentioned many-body short-range-interacting Hamiltonian ones. It is important to clearly realize that additivity and extensivity are independent properties. An additive entropy such as $S_{BG}$ is extensive for simple systems such as the ones just mentioned, but it turns out to be nonextensive for other, more complex, systems that will be focused on later on. For many of these more complex systems, it is the nonadditive entropy $S_q$ (to be analyzed later on) which turns out to be extensive for a nonstandard value of q (i.e., $q \neq 1$).
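The properties above lend themselves to a quick numerical check. The following sketch (ours, not from the article; k = 1 throughout, and all function names are our own) evaluates the discrete form (8), its maximal value (6), and the additivity property (12) for a product distribution:

```python
import math

def s_bg(p, k=1.0):
    """Boltzmann-Gibbs entropy, Eq. (8): S = -k sum_i p_i ln p_i.
    Uses the convention 0 ln 0 = 0, consistent with expansibility."""
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0.0)

W = 4
uniform = [1.0 / W] * W                 # equal probabilities
peaked = [0.85, 0.05, 0.05, 0.05]

# Maximal value, Eq. (6): S_BG = k ln W at equal probabilities
print(s_bg(uniform), math.log(W))       # both equal ln 4
print(s_bg(peaked) < s_bg(uniform))     # True: any other distribution is lower

# Additivity, Eq. (12): joint distribution of independent subsystems
pA, pB = [0.5, 0.3, 0.2], [0.7, 0.3]
pAB = [a * b for a in pA for b in pB]   # p_ij = p_i^A p_j^B
print(s_bg(pAB), s_bg(pA) + s_bg(pB))   # equal
```

The additivity check works because $\ln(p_i^A p_j^B) = \ln p_i^A + \ln p_j^B$ splits the double sum into the two marginal entropies.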

Boltzmann–Gibbs Statistical Mechanics

Physical systems (classical, quantum, relativistic) can be theoretically described in very many ways: through microscopic, mesoscopic, or macroscopic equations, reflecting either stochastic or deterministic time evolutions, or even both types simultaneously. Those systems whose time evolution is completely determined by a well-defined Hamiltonian with appropriate boundary conditions and admissible initial conditions are the main subject of an important branch of contemporary physics, named statistical mechanics. This remarkable theory (or formalism, as it is sometimes called), which for large systems satisfactorily matches classical thermodynamics, was primarily introduced by Boltzmann and Gibbs. The physical system can be in all types of situations. Two paradigmatic such situations correspond to isolation, and to thermal contact with a large reservoir called a thermostat. Their stationary state ($t \to \infty$) is usually referred to as thermal equilibrium. Both situations have been formally considered by Gibbs within his mathematical formulation of statistical mechanics, and they respectively correspond to the so-called micro-canonical and canonical ensembles (other ensembles do exist, such as the grand-canonical ensemble, appropriate for situations in which the total number of elements of the system is not fixed; this is however out of the scope of the present article).

The stationary state of the micro-canonical ensemble is determined by $p_i = 1/W$ ($\forall i$, where i runs over all possible microscopic states), which corresponds to the extremization of $S_{BG}$ with a single (and trivial) constraint, namely

$$\sum_{i=1}^{W} p_i = 1 \,. \qquad (15)$$

To obtain the stationary state for the canonical ensemble, the thermostat being at temperature T, we must (typically) add one more constraint, namely

$$\sum_{i=1}^{W} p_i E_i = U \,, \qquad (16)$$

where $\{E_i\}$ are the energies of all the possible states of the system (i.e., the eigenvalues of the Hamiltonian with the appropriate boundary conditions). The extremization of $S_{BG}$ with the two constraints above straightforwardly yields

$$p_i = \frac{e^{-\beta E_i}}{Z} \,, \qquad (17)$$

$$Z \equiv \sum_{j=1}^{W} e^{-\beta E_j} \,, \qquad (18)$$

with the partition function Z and the Lagrange parameter $\beta = 1/kT$. This is the celebrated BG distribution for thermal equilibrium (or Boltzmann weight, or Gibbs state, as it is also called), which has been at the basis of an enormous number of successes (in fluids, magnets, superconductors, superfluids, Bose–Einstein condensation, conductors, chemical reactions, percolation, among many other important situations). The connection with classical thermodynamics, and its Legendre-transform structure, occurs through relations such as

$$\frac{1}{T} = \frac{\partial S}{\partial U} \,, \qquad (19)$$

$$F \equiv U - TS = -\frac{1}{\beta} \ln Z \,, \qquad (20)$$

$$U = -\frac{\partial \ln Z}{\partial \beta} \,, \qquad (21)$$

$$C \equiv T \frac{\partial S}{\partial T} = \frac{\partial U}{\partial T} = -T \frac{\partial^2 F}{\partial T^2} \,, \qquad (22)$$

where F, U and C are the Helmholtz free energy, the internal energy, and the specific heat, respectively. BG statistical mechanics historically appeared as the first connection between the microscopic and the macroscopic descriptions of the world, and it constitutes one of the cornerstones of contemporary physics. The establishment resisted heavily before accepting the validity and power of

Boltzmann’ s revolutionary ideas. In 1906 Boltzmann dramatically committed suicide, after 34 years that he had first proposed the deep ideas that we are summarizing here. At that early 20th century, few people believed in Boltzmann’s proposal (among those few, we must certainly mention Albert Einstein), and most physicists were simply unaware of the existence of Gibbs and of his profound contributions. It was only half a dozen years later that the emerging new generation of physicists recognized their respective genius (thanks in part to various clarifications produced by Paul Ehrenfest, and also to the experimental successes related with Brownian motion, photoelectric effect, specific heat of solids, and black-body radiation). On the Limitations of Boltzmann–Gibbs Entropy and Statistical Mechanics Historical Background As any other human intellectual construct, the applicability of the BG entropy, and of the statistical mechanics to which it is associated, naturally has restrictions. The understanding of present developments of both the concept of entropy, and its corresponding statistical mechanics, demand some knowledge of the historical background. Boltzmann was aware of the relevance of the range of the microscopic interactions between atoms and molecules. He wrote, in his 1896 Lectures on Gas Theory [41], the following words: When the distance at which two gas molecules interact with each other noticeably is vanishingly small relative to the average distance between a molecule and its nearest neighbor—or, as one can also say, when the space occupied by the molecules (or their spheres of action) is negligible compared to the space filled by the gas—then the fraction of the path of each molecule during which it is affected by its interaction with other molecules is vanishingly small compared to the fraction that is rectilinear, or simply determined by external forces. [ . . . ] The gas is “ideal” in all these cases. Also Gibbs was aware. 
In his 1902 book [88], he wrote: In treating of the canonical distribution, we shall always suppose the multiple integral in equation (92) [the partition function, as we call it nowadays] to have a finite value, as otherwise the coefficient of probability vanishes, and the law of distribution becomes illusory. This will exclude certain cases, but not such apparently, as will affect the value of our results with respect to their bearing on thermodynamics. It


will exclude, for instance, cases in which the system or parts of it can be distributed in unlimited space [...]. It also excludes many cases in which the energy can decrease without limit, as when the system contains material points which attract one another inversely as the squares of their distances. [...] For the purposes of a general discussion, it is sufficient to call attention to the assumption implicitly involved in the formula (92).

The extensivity/additivity of $S_{BG}$ has been challenged over the last century by many physicists. Let us mention just a few. In his 1936 Thermodynamics [82], Enrico Fermi wrote:

The entropy of a system composed of several parts is very often equal to the sum of the entropies of all the parts. This is true if the energy of the system is the sum of the energies of all the parts and if the work performed by the system during a transformation is equal to the sum of the amounts of work performed by all the parts. Notice that these conditions are not quite obvious and that in some cases they may not be fulfilled. Thus, for example, in the case of a system composed of two homogeneous substances, it will be possible to express the energy as the sum of the energies of the two substances only if we can neglect the surface energy of the two substances where they are in contact. The surface energy can generally be neglected only if the two substances are not very finely subdivided; otherwise, it can play a considerable role.

Laszlo Tisza wrote, in his Generalized Thermodynamics [178]:

The situation is different for the additivity postulate P a2, the validity of which cannot be inferred from general principles. We have to require that the interaction energy between thermodynamic systems be negligible. This assumption is closely related to the homogeneity postulate P d1. From the molecular point of view, additivity and homogeneity can be expected to be reasonable approximations for systems containing many particles, provided that the intermolecular forces have a short-range character.

Corroborating the above, virtually all textbooks of quantum mechanics contain the mechanical calculations corresponding to a particle in a square well, the harmonic oscillator, the rigid rotator, a spin 1/2 in the presence of a magnetic field, and the Hydrogen atom. In the textbooks of statistical mechanics we can find the thermostatistical calculations of all these systems... except the Hydrogen atom! Why? Because the long-range electron-proton

interaction produces an energy spectrum which leads to a divergent partition function. This is but a neat illustration of the above Gibbs' alert.

A Remark on the Thermodynamics of Short- and Long-Range Interacting Systems

We consider here a simple d-dimensional classical fluid, constituted by N point particles, governed by the Hamiltonian

$$\mathcal{H} = K + V = \sum_{i=1}^{N} \frac{p_i^2}{2m} + \sum_{i \neq j} V(r_{ij}) \,, \qquad (23)$$

where the potential V(r), if it is attractive at short distances, has no singularity at the origin, or an integrable one, and whose asymptotic behavior at infinity is given by $V(r) \sim -B/r^{\alpha}$, with $B > 0$ and $\alpha \geq 0$. One such example is the d = 3 Lennard–Jones fluid, for which $V(r) = A/r^{12} - B/r^{6}$ ($A > 0$), i.e., repulsive at short distances and attractive at long distances; in this case $\alpha = 6$. Another example could be Newtonian gravitation with a phenomenological short-distance cutoff (i.e., $V(r) \to \infty$ for $r \leq r_0$, with $r_0 > 0$); in this case, $\alpha = 1$. The full $\Gamma$-space of such a system has 2dN dimensions. The total potential energy is expected to scale (assuming a roughly homogeneous distribution of the particles) as

$$U_{pot}(N) \propto -B N \int_{1}^{\infty} dr\, r^{d-1}\, r^{-\alpha} \,, \qquad (24)$$

where the integral starts appreciably contributing above a typical cutoff, here taken to be unity. This integral is finite [$= B/(\alpha - d)$] for $\alpha/d > 1$ (short-range interactions), and diverges for $0 \leq \alpha/d \leq 1$ (long-range interactions). In other words, the energy cannot be generically characterized by Eq. (24), and we must turn to a different and more powerful estimation. Given the finiteness of the size of the system, an appropriate one is, in all cases, given by

$$U_{pot}(N) \propto -B N \int_{1}^{N^{1/d}} dr\, r^{d-1}\, r^{-\alpha} = -\frac{B}{d}\, N N^{\star} \,, \qquad (25)$$

where

$$N^{\star} \equiv \frac{N^{1-\alpha/d} - 1}{1 - \alpha/d} \sim \begin{cases} \dfrac{1}{\alpha/d - 1} & \text{if } \alpha/d > 1 \,, \\[6pt] \ln N & \text{if } \alpha/d = 1 \,, \\[6pt] \dfrac{N^{1-\alpha/d}}{1 - \alpha/d} & \text{if } 0 < \alpha/d < 1 \,. \end{cases} \qquad (26)$$
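The three regimes of Eq. (26) are easy to probe numerically. A small sketch (ours; the function name is arbitrary) implementing the q-logarithm $\ln_q x \equiv (x^{1-q} - 1)/(1-q)$, so that $N^{\star} = \ln_{\alpha/d} N$:

```python
import math

def ln_q(x, q):
    """q-logarithm: ln_q x = (x**(1 - q) - 1)/(1 - q), with ln_1 x = ln x."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

for ratio in (2.0, 1.0, 0.5):           # alpha/d > 1, = 1, < 1
    print(ratio, [round(ln_q(N, ratio), 3) for N in (10**2, 10**4, 10**6)])
# ratio 2.0: saturates toward 1/(alpha/d - 1) = 1        (short-range)
# ratio 1.0: grows as ln N                               (marginal)
# ratio 0.5: grows as N**(1 - alpha/d)/(1 - alpha/d)     (long-range)
```

For instance, $\ln_{1/2} 10^6 = (10^3 - 1)/(1/2) = 1998$, growing as $\sqrt{N}$, while $\ln_2 N$ stays below 1 for every N.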


Notice that $N^{\star} = \ln_{\alpha/d} N$, where the q-logarithm function $\ln_q x \equiv (x^{1-q} - 1)/(1-q)$ ($x > 0$; $\ln_1 x = \ln x$) will be shown to play an important role later on. Satisfactorily enough, Eqs. (26) recover the characterization of Eq. (24) in the limit $N \to \infty$, but they have the great advantage of providing, for finite N, a finite value. This fact will now be shown to enable the proper scaling of the macroscopic quantities in the thermodynamic limit ($N \to \infty$), for all values of $\alpha/d \geq 0$.

Let us address the thermodynamical consequences of the microscopic interactions being short- or long-ranged. To present a slightly more general illustration, we shall assume from now on that our homogeneous and isotropic classical fluid is made of magnetic particles. Its Gibbs free energy is then given by

$$G(N,T,p,H) = U(N,T,p,H) - T\, S(N,T,p,H) + p\, V(N,T,p,H) - H\, M(N,T,p,H) \,, \qquad (27)$$

where (T, p, H) correspond respectively to the temperature, pressure and external magnetic field, V is the volume and M the magnetization. If the interactions are short-ranged (i.e., if $\alpha/d > 1$), we can divide this equation by N and then take the $N \to \infty$ limit. We obtain

$$g(T,p,H) = u(T,p,H) - T\, s(T,p,H) + p\, v(T,p,H) - H\, m(T,p,H) \,, \qquad (28)$$

where $g(T,p,H) \equiv \lim_{N\to\infty} G(N,T,p,H)/N$, and analogously for the other variables of the equation. If the interactions were instead long-ranged (i.e., if $0 \leq \alpha/d \leq 1$), all these quantities would be divergent, hence thermodynamically meaningless. Consequently, the generically correct procedure, i.e., for all $\alpha/d \geq 0$, must conform to the following lines:

$$\lim_{N\to\infty} \frac{G(N,T,p,H)}{N N^{\star}} = \lim_{N\to\infty} \frac{U(N,T,p,H)}{N N^{\star}} - \lim_{N\to\infty} \frac{T}{N^{\star}}\, \frac{S(N,T,p,H)}{N} + \lim_{N\to\infty} \frac{p}{N^{\star}}\, \frac{V(N,T,p,H)}{N} - \lim_{N\to\infty} \frac{H}{N^{\star}}\, \frac{M(N,T,p,H)}{N} \,, \qquad (29)$$

hence

$$g(T^{\star}, p^{\star}, H^{\star}) = u(T^{\star}, p^{\star}, H^{\star}) - T^{\star} s(T^{\star}, p^{\star}, H^{\star}) + p^{\star} v(T^{\star}, p^{\star}, H^{\star}) - H^{\star} m(T^{\star}, p^{\star}, H^{\star}) \,, \qquad (30)$$

where the definitions of $T^{\star}$ and all the other variables are self-explanatory (e.g., $T^{\star} \equiv T/N^{\star}$). In other words, in order to have finite thermodynamic equations of states, we must in general express them in the $(T^{\star}, p^{\star}, H^{\star})$ variables. If $\alpha/d > 1$, this procedure recovers the usual equations of states, and the usual extensive (G, U, S, V, M) and intensive (T, p, H) thermodynamic variables. But, if $0 \leq \alpha/d \leq 1$, the situation is more complex, and we realize that three, instead of the traditional two, classes of thermodynamic variables emerge. We may call them extensive (S, V, M, N), pseudo-extensive (G, U) and pseudo-intensive (T, p, H) variables. All the energy-type thermodynamical variables (G, F, U) give rise to pseudo-extensive ones, whereas those which appear in the usual Legendre thermodynamical pairs give rise to pseudo-intensive ones (T, p, H, $\mu$) and extensive ones (S, V, M, N). See Figs. 1 and 2.

The possibly long-range interactions within Hamiltonian (23) refer to the dynamical variables themselves. There is another important class of Hamiltonians, where the possibly long-range interactions refer to the coupling constants between localized dynamical variables. Such is, for instance, the case of the following classical Hamiltonian:

$$\mathcal{H} = K + V = \sum_{i=1}^{N} \frac{L_i^2}{2I} - \sum_{i \neq j} \frac{J_x\, s_i^x s_j^x + J_y\, s_i^y s_j^y + J_z\, s_i^z s_j^z}{r_{ij}^{\alpha}} \qquad (\alpha \geq 0) \,, \qquad (31)$$

where $\{L_i\}$ are the angular momenta, I the moment of inertia, $\{(s_i^x, s_i^y, s_i^z)\}$ are the components of classical rotators, $(J_x, J_y, J_z)$ are coupling constants, and $r_{ij}$ runs over all distances between sites i and j of a d-dimensional lattice. For example, for a simple hypercubic lattice

with unit crystalline parameter, we have $r_{ij} = 1, 2, 3, \ldots$ if d = 1, $r_{ij} = 1, \sqrt{2}, 2, \ldots$ if d = 2, $r_{ij} = 1, \sqrt{2}, \sqrt{3}, 2, \ldots$ if d = 3, and so on. For such a case, we have that

$$N^{\star} \equiv \sum_{i=2}^{N} r_{1i}^{-\alpha} \,, \qquad (32)$$

which has in fact the same asymptotic behaviors as indicated in Eq. (26). In other words, here again $\alpha/d > 1$ corresponds to short-range interactions, and $0 \leq \alpha/d \leq 1$ corresponds to long-range ones. The correctness of the present generalized thermodynamical scalings has already been specifically checked in many physical systems, such as a ferrofluid-like model [97], Lennard–Jones-like fluids [90], magnetic systems [16,19,59,158], anomalous diffusion [66], and percolation [85,144]. Let us mention that, for the $\alpha = 0$ models (i.e., mean-field models), it is widespread in the literature to divide the potential term of the Hamiltonian by N in order to make it extensive by force. Although mathematically admissible (see [19]), this is obviously very unsatisfactory in principle, since it implies a microscopic coupling constant which depends on N. What we have described here is the thermodynamically proper way of eliminating the mathematical difficulties emerging in the models in the presence of long-range interactions. Last but not least, we verify a point which is crucial for the developments below, namely that the entropy S is expected to be extensive no matter the range of the interactions.

Entropy, Figure 1 For long-range interactions ($0 \leq \alpha/d \leq 1$) we have three classes of thermodynamic variables, namely the pseudo-intensive (scaling with $N^{\star}$), pseudo-extensive (scaling with $N N^{\star}$) and extensive (scaling with N) ones. For short-range interactions ($\alpha/d > 1$) the pseudo-intensive variables become intensive (independent of N), and the pseudo-extensive ones merge with the extensive ones, all of them now being extensive (scaling with N), thus recovering the traditional two textbook classes of thermodynamic variables

Entropy, Figure 2 The so-called extensive systems ($\alpha/d > 1$ for the classical ones) typically involve absolutely convergent series, whereas the so-called nonextensive systems ($0 \leq \alpha/d < 1$ for the classical ones) typically involve divergent series. The marginal systems ($\alpha/d = 1$ here) typically involve conditionally convergent series, which therefore depend on the boundary conditions, i.e., typically on the external shape of the system. Capacitors constitute a notorious example of the $\alpha/d = 1$ case. The model usually referred to in the literature as the Hamiltonian Mean Field (HMF) one lies on the $\alpha = 0$ axis ($\forall d > 0$). The model usually referred to as the d-dimensional $\alpha$-XY model [19] lies on the vertical axis at abscissa d ($\forall \alpha \geq 0$)

The Nonadditive Entropy Sq

Introduction and Basic Properties

The possibility was introduced in 1988 [183] (see also [42,112,157,182]) to generalize the BG statistical mechanics on the basis of an entropy Sq which generalizes SBG . This entropy is defined as follows: P q 1 W iD1 p i Sq k (33) (q 2 R; S1 D SBG ): q1 For equal probabilities, this entropy takes the form S q D k lnq W

(S1 D k ln W) ;

(34)

where the q-logarithmic function has already been defined. Remark With the same or different prefactor, this entropic form has been successively and independently introduced in many occasions during the last decades. J. Havrda and F. Charvat [92] were apparently the first to ever introduce this form, though with a different prefactor (adapted to binary variables) in the context of cybernetics and information theory. I. Vajda [207], further studied this form, quoting Havrda and Charvat. Z. Daroczy [74] rediscovered this form (he quotes neither Havrda–Charvat nor Vajda). J. Lindhard and V. Nielsen [108] rediscovered this form (they quote none of the predecessors) through the property of entropic composability. B.D. Sharma and D.P. Mittal [163] introduced a two-parameter form which reproduces both Sq and Renyi entropy [145] as particular cases. A. Wehrl [209] mentions the form of Sq in p. 247, quotes Daroczy, but ignores Havrda–Charvat, Vajda, Lindhard–Nielsen, and Sharma–Mittal. Myself I rediscovered this form in 1985 with the aim of generalizing Boltzmann–Gibbs statistical mechanics, but quote none of the predecessors in the 1988 paper [183]. In fact, I started knowing the whole story quite a few years later thanks to S.R.A. Salinas and R.N. Silver, who were the first to provide me with the corresponding informations. Such rediscoveries can by no means be considered as particularly surprising. Indeed, this happens in science more frequently than usually realized. This point is lengthily and colorfully developed by S.M. Stigler [167]. In p. 284, a most interesting example is described, namely that of the celebrated

Entropy

normal distribution. It was first introduced by Abraham De Moivre in 1733, then by Pierre Simon de Laplace in 1774, then by Robert Adrain in 1808, and finally by Carl Friedrich Gauss in 1809, no less than 76 years after its first publication! This distribution is universally called Gaussian because of the remarkable insights of Gauss concerning the theory of errors, applicable in all experimental sciences. A less glamorous illustration of the same phenomenon, but nevertheless interesting in the present context, is that of the Renyi entropy [145]. According to I. Csiszar [64], p. 73, the Renyi entropy had already been essentially introduced by Paul-Marcel Schutzenberger [161].
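Before listing the properties of S_q, here is a minimal numerical sketch of Eqs. (33)–(34); the function names are mine, not the article's:

```python
import math

def S_q(p, q, k=1.0):
    """Tsallis entropy, Eq. (33): S_q = k (1 - sum_i p_i^q) / (q - 1)."""
    if abs(q - 1.0) < 1e-12:  # Boltzmann-Gibbs limit S_1 = -k sum p_i ln p_i
        return -k * sum(pi * math.log(pi) for pi in p if pi > 0)
    return k * (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def ln_q(x, q):
    """q-logarithm: ln_q x = (x^(1-q) - 1) / (1 - q), with ln_1 x = ln x."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

W = 8
p_equal = [1.0 / W] * W
# Eq. (34): for equal probabilities, S_q = k ln_q W
assert abs(S_q(p_equal, 2.0) - ln_q(W, 2.0)) < 1e-12
# q -> 1 recovers S_BG = k ln W
assert abs(S_q(p_equal, 1.0 + 1e-8) - math.log(W)) < 1e-6
```

The q → 1 check is done with a value slightly off 1 to exercise the generic branch rather than the explicit BG limit.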

The entropy defined in Eq. (33) has the following main properties:

(i) S_q is nonnegative (∀q);

(ii) S_q is expansible (∀q > 0);

(iii) S_q attains its maximal (minimal) value k ln_q W for q > 0 (for q < 0);

(iv) S_q is concave (convex) for q > 0 (for q < 0);

(v) S_q is Lesche-stable (∀q > 0) [2];

(vi) S_q yields a finite upper bound of the entropy production per unit time for a special value of q, whenever the sensitivity to the initial conditions exhibits an upper bound which asymptotically increases as a power of time. For example, many D = 1 nonlinear dynamical systems have a vanishing maximal Lyapunov exponent λ₁ and exhibit a sensitivity ξ to the initial conditions which is (upper) bounded by

$$\xi = e_q^{\lambda_q t}\,, \tag{35}$$

with λ_q > 0, q < 1, the q-exponential function e_q^x being the inverse of ln_q x. More explicitly (see Fig. 3),

$$e_q^{x} \equiv \begin{cases} [1+(1-q)x]^{1/(1-q)} & \text{if } 1+(1-q)x>0\,,\\ 0 & \text{otherwise}\,. \end{cases} \tag{36}$$

Such systems have a finite entropy production per unit time, which satisfies a q-generalized Pesin-like identity, namely, for the construction described in Sect. "Introduction",

$$K_q \equiv \lim_{t\to\infty}\,\lim_{W\to\infty}\,\lim_{M\to\infty}\frac{S_q(t)}{t}=\lambda_q\,. \tag{37}$$

The situation is in fact considerably richer than briefly described here. For further details, see [27,28,29,30,93,116,117,146,147,148,149,150,151,152].

(vii) S_q is nonadditive for q ≠ 1. Indeed, for independent subsystems A and B, it can be straightforwardly proved that

$$\frac{S_q(A+B)}{k}=\frac{S_q(A)}{k}+\frac{S_q(B)}{k}+(1-q)\,\frac{S_q(A)}{k}\,\frac{S_q(B)}{k}\,, \tag{38}$$

or, equivalently,

$$S_q(A+B)=S_q(A)+S_q(B)+\frac{(1-q)}{k}\,S_q(A)\,S_q(B)\,, \tag{39}$$

which makes explicit that (1 − q) → 0 plays the same role as k → ∞. Property (38), occasionally referred to in the literature as pseudo-additivity, can be called subadditivity (superadditivity) for q > 1 (q < 1).

(viii) $S_q = k\, D_q \sum_{i=1}^{W} p_i^{x}\big|_{x=1}$, where the 1909 Jackson differential operator is defined as follows:

$$D_q f(x) \equiv \frac{f(qx)-f(x)}{qx-x} \qquad (D_1 f(x)=\mathrm{d}f(x)/\mathrm{d}x)\,. \tag{40}$$

(ix) A uniqueness theorem has been proved by Santos [159], which generalizes, for arbitrary q, that of Shannon [162]. Let us assume that an entropic form S({p_i}) satisfies the following properties:

(a) S({p_i}) is a continuous function of {p_i}; (41)

(b) S(p_i = 1/W, ∀i) monotonically increases with the total number of possibilities W; (42)

(c) $\frac{S(A+B)}{k}=\frac{S(A)}{k}+\frac{S(B)}{k}+(1-q)\frac{S(A)}{k}\frac{S(B)}{k}$ if $p_{ij}^{A+B}=p_i^{A}\,p_j^{B}\ \forall(i,j)$, with k > 0; (43)

(d) $S(\{p_i\})=S(p_L,p_M)+p_L^{q}\,S(\{p_i/p_L\})+p_M^{q}\,S(\{p_i/p_M\})$, with $p_L\equiv\sum_{L\ \text{terms}}p_i$, $p_M\equiv\sum_{M\ \text{terms}}p_i$ (L + M = W), and $p_L+p_M=1$. (44)

Then and only then [159] S({p_i}) = S_q({p_i}).

(x) Another (equivalent) uniqueness theorem was proved by Abe [1], which generalizes, for arbitrary q, that of Khinchin [100]. Let us assume that an entropic form S({p_i}) satisfies the following properties:

(a) S({p_i}) is a continuous function of {p_i}; (45)

(b) S(p_i = 1/W, ∀i) monotonically increases with the total number of possibilities W; (46)

(c) $S(p_1,p_2,\dots,p_W,0)=S(p_1,p_2,\dots,p_W)$; (47)

(d) $\frac{S(A+B)}{k}=\frac{S(A)}{k}+\frac{S(B|A)}{k}+(1-q)\frac{S(A)}{k}\frac{S(B|A)}{k}$, where $S(A+B)\equiv S(\{p_{ij}^{A+B}\})$, $S(A)\equiv S\bigl(\bigl\{\sum_{j=1}^{W_B}p_{ij}^{A+B}\bigr\}\bigr)$, and the conditional entropy $S(B|A)\equiv\sum_{i=1}^{W_A}p_i^{q}\,S(\{p_{ij}^{A+B}/p_i\})\big/\sum_{i=1}^{W_A}p_i^{q}$ (with k > 0). (48)

Then and only then [1] S({p_i}) = S_q({p_i}).

Entropy, Figure 3 — The q-exponential and q-logarithm functions in typical representations: a Linear-linear representation of e_q^x; b Linear-linear representation of e_q^{−x}; c Log-log representation of y(x) = e_q^{−a_q x}, solution of dy/dx = −a_q y^q with y(0) = 1; d Linear-linear representation of S_q = ln_q W (value of the entropy for equal probabilities)
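The inverse relation between e_q^x and ln_q x, and the nonadditivity rule (39), can be checked in a few lines (a sketch; the function names are my own):

```python
import math

def ln_q(x, q):
    return math.log(x) if q == 1 else (x ** (1 - q) - 1) / (1 - q)

def exp_q(x, q):
    """Eq. (36): e_q^x = [1 + (1-q)x]^(1/(1-q)) when the bracket is positive, else 0."""
    if q == 1:
        return math.exp(x)
    base = 1 + (1 - q) * x
    return base ** (1.0 / (1 - q)) if base > 0 else 0.0

def S_q(p, q, k=1.0):
    return k * (1 - sum(pi ** q for pi in p)) / (q - 1)

# e_q^x and ln_q x are inverses on their common domain
for q in (0.5, 1.5, 2.5):
    for x in (-0.5, 0.1, 0.5):
        assert abs(ln_q(exp_q(x, q), q) - x) < 1e-9

# Nonadditivity, Eq. (39): S_q(A+B) = S_q(A) + S_q(B) + (1-q) S_q(A) S_q(B) / k
q = 1.7
pA, pB = [0.3, 0.7], [0.2, 0.5, 0.3]
pAB = [a * b for a in pA for b in pB]  # joint distribution of independent A and B
lhs = S_q(pAB, q)
rhs = S_q(pA, q) + S_q(pB, q) + (1 - q) * S_q(pA, q) * S_q(pB, q)
assert abs(lhs - rhs) < 1e-12
```

Note that the x values are kept inside the support of e_q^x for q > 1 (x < 1/(q − 1)), where the inverse relation holds.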


Additivity Versus Extensivity of the Entropy

It is of great importance to distinguish additivity from extensivity. An entropy S is additive [130] if its value for a system composed of two independent subsystems A and B satisfies S(A+B) = S(A) + S(B) (hence, for N independent equal subsystems or elements, we have S(N) = N S(1)). Therefore, S_BG is additive, and S_q (q ≠ 1) is nonadditive. A substantially different matter is whether a given entropy S is extensive for a given system. An entropy is extensive if and only if 0 < lim_{N→∞} S(N)/N < ∞. What matters for satisfactorily matching thermodynamics is extensivity, not additivity. For systems whose elements are nearly independent (i.e., essentially weakly correlated), S_BG is extensive and S_q is nonextensive. For systems whose elements are strongly correlated in a special manner, S_BG is nonextensive, whereas S_q is extensive for a special value of q ≠ 1 (and nonextensive for all the others). Let us illustrate these facts with some simple examples of equal probabilities. If W(N) ∼ A μ^N (A > 0, μ > 1, and N → ∞), the entropy which is extensive is S_BG. Indeed, S_BG(N) = k ln W(N) ∼ (ln μ)N ∝ N (it is equally trivial to verify that S_q(N) is nonextensive for any q ≠ 1). If W(N) ∼ B N^ρ (B > 0, ρ > 0, and N → ∞), the entropy which is extensive is S_{1−(1/ρ)}. Indeed, S_{1−(1/ρ)}(N) ∼ k ρ B^{1/ρ} N ∝ N (it is equally trivial to verify that S_BG(N) ∝ ln N, hence nonextensive). If W(N) ∼ C ν^{N^γ} (C > 0, ν > 1, γ ≠ 1, and N → ∞), then S_q(N) is nonextensive for any value of q. Therefore, in such a complex case, one must in principle refer to some other kind of entropic functional in order to match the extensivity required by classical thermodynamics. Various nontrivial abstract mathematical models can be found in [113,160,186,198,199] for which S_q (q ≠ 1) is extensive. Moreover, a physical realization is also available now [60,61] for a many-body quantum Hamiltonian, namely the ground state of the following one:

$$\mathcal{H}=-\sum_{i=1}^{N-1}\left[(1+\gamma)\,S_i^{x}S_{i+1}^{x}+(1-\gamma)\,S_i^{y}S_{i+1}^{y}\right]-2\lambda\sum_{i=1}^{N}S_i^{z}\,, \tag{49}$$

where λ is a transverse magnetic field, and (S_i^x, S_i^y, S_i^z) are Pauli matrices; for |γ| = 1 we have the Ising model, for 0 < |γ| < 1 we have the anisotropic XY model, and, for γ = 0, we have the isotropic XY model. The two former share the same symmetry and consequently belong to the same critical universality class (the Ising universality class, which corresponds to a so-called central charge c = 1/2), whereas the latter one belongs to a different universality class (the XX one, which corresponds to a central charge c = 1). At temperature T = 0 and N → ∞, this model exhibits a second-order phase transition as a function of λ. For the Ising model, the critical value is λ = 1, whereas, for the XX model, the entire line 0 ≤ λ ≤ 1 is critical. Since the system is at its ground state (assuming a vanishingly small magnetic field component in the x−y plane), it is a pure state (i.e., its density matrix ρ_N is such that Tr ρ_N² = 1, ∀N), hence the entropy S_q(N) (∀q > 0) is strictly zero. However, the situation is drastically different for any L-sized block of the infinite chain. Indeed, ρ_L ≡ Tr_{N−L} ρ_N is such that Tr ρ_L² < 1, i.e., it is a mixed state, hence it has a nonzero entropy. The block entropy S_q(L) ≡ lim_{N→∞} S_q(N, L) monotonically increases with L for all values of q. And it does so linearly for

$$q=\frac{\sqrt{9+c^{2}}-3}{c}\,, \tag{50}$$

where c is the central charge which emerges in quantum field theory [54]. In other words, 0 < lim_{L→∞} S_{(\sqrt{9+c^2}-3)/c}(L)/L < ∞. Notice that q increases from zero to unity when c increases from zero to infinity; q = √37 − 6 ≃ 0.083 for c = 1/2 (Ising model), q = √10 − 3 ≃ 0.16 for c = 1 (isotropic XY model), q = 1/2 for c = 4 (dimension of space-time), and q = (√685 − 3)/26 ≃ 0.89 for c = 26, related to string theory [89]. The possible physical interpretation of the limit c → ∞ is still unknown, although it could correspond to some sort of mean field approach.

Nonextensive Statistical Mechanics

To generalize BG statistical mechanics for the canonical ensemble, we optimize S_q with constraint (15) and also

$$\sum_{i=1}^{W}P_i\,E_i=U_q\,, \tag{51}$$

where

$$P_i\equiv\frac{p_i^{q}}{\sum_{j=1}^{W}p_j^{q}}\qquad\Bigl(\sum_{i=1}^{W}P_i=1\Bigr) \tag{52}$$

is the so-called escort distribution [33]. It follows that $p_i=P_i^{1/q}\big/\sum_{j=1}^{W}P_j^{1/q}$. There are various converging reasons for it being appropriate to impose the energy constraint with the {P_i} instead of with the original {p_i}. The full discussion of this delicate point is beyond the present scope. However, some of these intertwined reasons are explored in [184]. By imposing Eq. (51), we follow [193], which in turn reformulates the results presented in [71,183]. The passage from one to the other of the various existing formulations of the above optimization problem is discussed in detail in [83,193]. The entropy optimization yields, for the stationary state,

$$p_i=\frac{e_q^{-\beta_q(E_i-U_q)}}{\bar{Z}_q}\,, \tag{53}$$

with

$$\beta_q\equiv\frac{\beta}{\sum_{j=1}^{W}p_j^{q}}\,, \tag{54}$$

and

$$\bar{Z}_q\equiv\sum_{i=1}^{W}e_q^{-\beta_q(E_i-U_q)}\,, \tag{55}$$

β being the Lagrange parameter associated with the constraint (51). Equation (53) makes explicit that the probability distribution is, for fixed β_q, invariant with regard to the arbitrary choice of the zero of energies. The stationary state (or (meta)equilibrium) distribution (53) can be rewritten as follows:

$$p_i=\frac{e_q^{-\beta_q' E_i}}{Z_q'}\,, \tag{56}$$

with

$$Z_q'\equiv\sum_{j=1}^{W}e_q^{-\beta_q' E_j}\,, \tag{57}$$

and

$$\beta_q'\equiv\frac{\beta_q}{1+(1-q)\,\beta_q U_q}\,. \tag{58}$$

The form (56) is particularly convenient for many applications where comparison with experimental or computational data is involved. Also, it makes clear that p_i asymptotically decays like $1/E_i^{1/(q-1)}$ for q > 1, and has a cutoff for q < 1, instead of the exponential decay with E_i for q = 1. The connection to thermodynamics is established in what follows. It can be proved that

$$\frac{1}{T}=\frac{\partial S_q}{\partial U_q}\,, \tag{59}$$

with T ≡ 1/(kβ). Also we prove, for the free energy,

$$F_q\equiv U_q-T S_q=-\frac{1}{\beta}\ln_q Z_q\,, \tag{60}$$

where

$$\ln_q Z_q=\ln_q\bar{Z}_q-\beta U_q\,. \tag{61}$$

This relation takes into account the trivial fact that, in contrast with what is usually done in BG statistics, the energies {E_i} are here referred to U_q in (53). It can also be proved that

$$U_q=-\frac{\partial}{\partial\beta}\ln_q Z_q\,, \tag{62}$$

as well as relations such as

$$C_q\equiv T\,\frac{\partial S_q}{\partial T}=\frac{\partial U_q}{\partial T}=-T\,\frac{\partial^{2}F_q}{\partial T^{2}}\,. \tag{63}$$

In fact the entire Legendre transformation structure of thermodynamics is q-invariant, which is both remarkable and welcome.

A Connection Between Entropy and Diffusion

We review here one of the main common aspects of entropy and diffusion. We shall present on equal footing both the BG and the nonextensive cases [13,138,192,216]. Let us extremize the entropy

$$S_q=k\,\frac{1-\int_{-\infty}^{\infty}\mathrm{d}(x/\sigma)\,[p(x)]^{q}}{q-1} \tag{64}$$

with the constraints

$$\int_{-\infty}^{\infty}\mathrm{d}x\,p(x)=1 \tag{65}$$

and

$$\langle x^{2}\rangle_q\equiv\frac{\int_{-\infty}^{\infty}\mathrm{d}x\,x^{2}\,[p(x)]^{q}}{\int_{-\infty}^{\infty}\mathrm{d}x\,[p(x)]^{q}}=\sigma^{2}\,, \tag{66}$$

σ > 0 being some fixed value having the same physical dimensions as the variable x. We straightforwardly obtain


the following distribution:

$$p_q(x)=\begin{cases}\dfrac{1}{\sigma}\left[\dfrac{q-1}{\pi(3-q)}\right]^{1/2}\dfrac{\Gamma\bigl(\frac{1}{q-1}\bigr)}{\Gamma\bigl(\frac{3-q}{2(q-1)}\bigr)}\left[1+\dfrac{q-1}{3-q}\,\dfrac{x^{2}}{\sigma^{2}}\right]^{-1/(q-1)} & \text{if } 1<q<3\,,\\[2ex] \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-x^{2}/2\sigma^{2}} & \text{if } q=1\,,\\[2ex] \dfrac{1}{\sigma}\left[\dfrac{1-q}{\pi(3-q)}\right]^{1/2}\dfrac{\Gamma\bigl(\frac{5-3q}{2(1-q)}\bigr)}{\Gamma\bigl(\frac{2-q}{1-q}\bigr)}\left[1-\dfrac{1-q}{3-q}\,\dfrac{x^{2}}{\sigma^{2}}\right]^{1/(1-q)} & \text{for } |x|<\sigma\left[\frac{3-q}{1-q}\right]^{1/2}\!,\ \text{and zero otherwise, if } q<1\,. \end{cases} \tag{67}$$

These distributions are frequently referred to as q-Gaussians. For q > 1, they asymptotically exhibit a power-law tail (q ≥ 3 is not admissible because the norm (65) cannot be satisfied); for q < 1, they have a compact support. For q = 1, the celebrated Gaussian is recovered; for q = 2, the Cauchy–Lorentz distribution is recovered; finally, for q → −∞, the uniform distribution within the interval [−1, 1] is recovered. For q = (3+m)/(1+m), m being an integer (m = 1, 2, 3, …), we recover the Student's t-distributions with m degrees of freedom [79]. For q = (n−4)/(n−2), n being an integer (n = 3, 4, 5, …), we recover the so-called r-distributions with n degrees of freedom [79]. In other words, q-Gaussians are analytical extensions of the Student's t- and r-distributions. In some communities they are also referred to as the Barenblatt form. For q < 5/3, they have a finite variance which monotonically increases as q varies from −∞ to 5/3; for 5/3 ≤ q < 3, the variance diverges. Let us now make a connection of the above optimization problem with diffusion. We focus on the following quite general diffusion equation:

$$\frac{\partial^{\delta}p(x,t)}{\partial t^{\delta}}=\frac{\partial}{\partial x}\left[\frac{\partial U(x)}{\partial x}\,p(x,t)\right]+D\,\frac{\partial^{\alpha}[p(x,t)]^{2-q}}{\partial|x|^{\alpha}}\qquad(0<\delta\le1;\ 0<\alpha\le2;\ q<3;\ t\ge0)\,, \tag{68}$$

with a generic nonsingular potential U(x), and a generalized diffusion coefficient D which is positive (negative) for q < 2 (2 < q < 3). Several particular instances of this equation have been discussed in the literature (see [40,86,106,131,188] and references therein). For example, the stationary state for α = 2, ∀δ, and any confining potential (i.e., lim_{|x|→∞} U(x) = ∞) is given by [106]

$$p(x,\infty)=\frac{e_q^{-\bar{\beta}\,[U(x)-U(0)]}}{\bar{Z}}\,, \tag{69}$$

$$\bar{Z}\equiv\int_{-\infty}^{\infty}\mathrm{d}x\;e_q^{-\bar{\beta}\,[U(x)-U(0)]}\,, \tag{70}$$

$$\frac{1}{\bar{\beta}}\equiv kT\propto|D|\,, \tag{71}$$

which precisely is the distribution obtained within nonextensive statistical mechanics through extremization of S_q. Also, the solution for α = 2, δ = 1, U(x) = k₁x + (k₂/2)x² (∀k₁, and k₂ ≥ 0), and p(x, 0) = δ(x) is given by [188]

$$p_q(x,t)=\frac{e_q^{-\beta(t)\,[x-x_M(t)]^{2}}}{Z_q(t)}\,, \tag{72}$$

$$\frac{\beta(t)}{\beta(0)}=\left[\frac{Z_q(0)}{Z_q(t)}\right]^{2}=\left[\Bigl(1-\frac{1}{K_2}\Bigr)e^{-t/\tau}+\frac{1}{K_2}\right]^{-2/(3-q)}\,, \tag{73}$$

$$\frac{1}{\tau}\equiv(3-q)\,k_2\,, \tag{74}$$

$$K_2\equiv\frac{k_2}{2(2-q)\,D\,\beta(0)\,[Z_q(0)]^{q-1}}\,, \tag{75}$$

$$x_M(t)\equiv e^{-k_2 t}\left[x_M(0)+\frac{k_1}{k_2}\right]-\frac{k_1}{k_2}\,. \tag{76}$$

In the limit k₂ → 0, Eq. (73) becomes

$$Z_q(t)=\Bigl\{[Z_q(0)]^{3-q}+2(2-q)(3-q)\,D\,\beta(0)\,[Z_q(0)]^{2}\,t\Bigr\}^{1/(3-q)}\,, \tag{77}$$

which, in the t → ∞ limit, yields

$$\frac{1}{\beta(t)}\propto[Z_q(t)]^{2}\propto t^{2/(3-q)}\,. \tag{78}$$

In other words, x² scales like t^μ, with

$$\mu=\frac{2}{3-q}\,; \tag{79}$$

hence, for q > 1 we have μ > 1 (i.e., superdiffusion; in particular, q = 2 yields μ = 2, i.e., ballistic diffusion), for q < 1 we have μ < 1 (i.e., subdiffusion; in particular, q → −∞ yields μ = 0, i.e., localization), and, naturally, for q = 1 we obtain normal diffusion. Four systems are known for which results have been found that are consistent with prediction (79). These are the motion of Hydra viridissima [206], defect turbulence [73], simulation of a silo drainage [22], and molecular dynamics of a many-body long-range-interacting classical system of rotators (the α−XY model) [143]. For the first three, it has been found (q, μ) ≃ (3/2, 4/3). For the latter one, relation (79) has been verified for various situations corresponding to μ > 1. Finally, for the particular case δ = 1 and U(x) = 0, Eq. (68) becomes

$$\frac{\partial p(x,t)}{\partial t}=D\,\frac{\partial^{\alpha}[p(x,t)]^{2-q}}{\partial|x|^{\alpha}}\qquad(0<\alpha\le2;\ q<3)\,. \tag{80}$$
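The 1 < q < 3 branch of Eq. (67), as reconstructed above, can be sanity-checked numerically: q = 2 must reduce to the Cauchy–Lorentz density, the function must integrate to one, and Eq. (79) gives the anomalous-diffusion exponent. A minimal sketch (the function names are mine, and the Gamma-function prefactor is my reading of the garbled original):

```python
import math

def q_gaussian(x, q, sigma=1.0):
    """1 < q < 3 branch of Eq. (67): normalized q-Gaussian."""
    pref = (1.0 / sigma) * math.sqrt((q - 1) / (math.pi * (3 - q))) \
        * math.gamma(1.0 / (q - 1)) / math.gamma((3 - q) / (2 * (q - 1)))
    return pref * (1 + (q - 1) / (3 - q) * (x / sigma) ** 2) ** (-1.0 / (q - 1))

def mu(q):
    """Eq. (79): <x^2> scales like t^mu with mu = 2/(3 - q)."""
    return 2.0 / (3.0 - q)

# q = 2 reduces to the Cauchy-Lorentz distribution 1/(pi (1 + x^2))
assert abs(q_gaussian(0.7, 2.0) - 1.0 / (math.pi * (1 + 0.7 ** 2))) < 1e-12
# crude midpoint-rule check that the q = 1.5 density integrates to one
dx, hi = 0.01, 100.0
total = sum(q_gaussian(-hi + (i + 0.5) * dx, 1.5) * dx for i in range(int(2 * hi / dx)))
assert abs(total - 1.0) < 1e-2
# q = 2 gives ballistic diffusion (mu = 2), q = 1 normal diffusion (mu = 1)
assert mu(2.0) == 2.0 and mu(1.0) == 1.0
```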

The diffusion constant D just rescales time t. Only two parameters are therefore left, namely α and q. The linear case (i.e., q = 1) has two types of solutions: Gaussians for α = 2, and Lévy (or α-stable) distributions for 0 < α < 2. The case α = 2 corresponds to the Central Limit Theorem, where the N → ∞ attractor of the sums of N independent random variables with finite variance is precisely a Gaussian. The case 0 < α < 2 corresponds to the sometimes so-called Lévy–Gnedenko Central Limit Theorem, where the N → ∞ attractor of the sums of N independent random variables with infinite variance (and appropriate asymptotics) is precisely a Lévy distribution with index α. The nonlinear case (i.e., q ≠ 1) has solutions that are q-Gaussians for α = 2, and one might conjecture that, similarly, interesting solutions exist for 0 < α < 2. Furthermore, in analogy with the q = 1 case, one expects corresponding q-generalized Central Limit Theorems to exist [187]. This is precisely what we present in the next Section.

Standard and q-Generalized Central Limit Theorems

The q-Product

A generalization of the product, called the q-product, has recently been introduced (independently and virtually simultaneously) [43,125]. It is defined, for x ≥ 0 and y ≥ 0, as follows:

$$x\otimes_q y\equiv\begin{cases}[x^{1-q}+y^{1-q}-1]^{1/(1-q)} & \text{if } x^{1-q}+y^{1-q}>1\,,\\ 0 & \text{otherwise}\,. \end{cases} \tag{81}$$

It has, among others, the following properties: it recovers the standard product as a particular instance, i.e.,

$$x\otimes_1 y=x\,y\,; \tag{82}$$

it is commutative, i.e.,

$$x\otimes_q y=y\otimes_q x\,; \tag{83}$$

it is additive under the q-logarithm, i.e.,

$$\ln_q(x\otimes_q y)=\ln_q x+\ln_q y \tag{84}$$

(whereas, we remind, $\ln_q(xy)=\ln_q x+\ln_q y+(1-q)(\ln_q x)(\ln_q y)$); it has a (2 − q)-duality/inverse property, i.e.,

$$1/(x\otimes_q y)=(1/x)\otimes_{2-q}(1/y)\,; \tag{85}$$

it is associative, i.e.,

$$x\otimes_q(y\otimes_q z)=(x\otimes_q y)\otimes_q z=x\otimes_q y\otimes_q z=(x^{1-q}+y^{1-q}+z^{1-q}-2)^{1/(1-q)}\,; \tag{86}$$

it admits a unity, i.e.,

$$x\otimes_q 1=x\,; \tag{87}$$

and, for q ≥ 1, also a zero, i.e.,

$$x\otimes_q 0=0\qquad(q\ge1)\,. \tag{88}$$

The q-Fourier Transform

We shall introduce the q-Fourier transform of a quite generic function f(x) (x ∈ ℝ) as follows [140,189,202,203,204,205]:

$$F_q[f](\xi)\equiv\int_{-\infty}^{\infty}\mathrm{d}x\;e_q^{\,i x\xi}\otimes_q f(x)=\int_{-\infty}^{\infty}\mathrm{d}x\;e_q^{\,i x\xi\,[f(x)]^{q-1}}f(x)\,, \tag{89}$$

where we have primarily focused on the case q ≥ 1. In contrast with the q = 1 case (standard Fourier transform), this integral transformation is nonlinear for q ≠ 1. It has a remarkable property, namely that the q-Fourier transform of a q-Gaussian is another q-Gaussian:

$$F_q\bigl[N_q\sqrt{\beta}\,e_q^{-\beta x^{2}}\bigr](\xi)=e_{q_1}^{-\beta_1\xi^{2}}\,, \tag{90}$$

with

$$N_q\equiv\begin{cases}\sqrt{\dfrac{q-1}{\pi}}\;\dfrac{\Gamma\bigl(\frac{1}{q-1}\bigr)}{\Gamma\bigl(\frac{3-q}{2(q-1)}\bigr)} & \text{if } 1<q<3\,,\\[2ex] \dfrac{1}{\sqrt{\pi}} & \text{if } q=1\,,\\[2ex] \dfrac{3-q}{2}\,\sqrt{\dfrac{1-q}{\pi}}\;\dfrac{\Gamma\bigl(\frac{3-q}{2(1-q)}\bigr)}{\Gamma\bigl(\frac{1}{1-q}\bigr)} & \text{if } q<1\,, \end{cases} \tag{91}$$
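Digressing briefly from the transform: the q-product of Eq. (81) and its properties (82)–(84) and (87) are easy to verify numerically (a sketch with my own function names):

```python
def q_product(x, y, q):
    """Eq. (81): [x^(1-q) + y^(1-q) - 1]^(1/(1-q)) when the bracket is positive."""
    if q == 1:
        return x * y
    base = x ** (1 - q) + y ** (1 - q) - 1
    return base ** (1.0 / (1 - q)) if base > 0 else 0.0

def ln_q(x, q):
    return (x ** (1 - q) - 1) / (1 - q)

q = 0.5
x, y = 2.0, 3.0
# Eq. (84): additive under the q-logarithm
assert abs(ln_q(q_product(x, y, q), q) - (ln_q(x, q) + ln_q(y, q))) < 1e-12
# Eq. (83): commutativity, and Eq. (87): unity
assert abs(q_product(x, y, q) - q_product(y, x, q)) < 1e-12
assert abs(q_product(x, 1.0, q) - x) < 1e-12
```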


and

$$q_1=z(q)\equiv\frac{1+q}{3-q}\,, \tag{92}$$

$$\beta_1=\frac{3-q}{8\,\beta^{2-q}\,N_q^{2(1-q)}}\,. \tag{93}$$

Equation (93) can be re-written as $\beta^{2-q}\beta_1=(3-q)\big/\bigl(8N_q^{2(1-q)}\bigr)\equiv K(q)$, which, for q = 1, recovers the well-known Heisenberg-uncertainty-principle-like relation β β₁ = 1/4. If we iterate n times the relation z(q) in Eq. (92), we obtain the following algebra:

$$q_n(q)=\frac{2q+n(1-q)}{2+n(1-q)}\qquad(n=0,\pm1,\pm2,\dots)\,, \tag{94}$$

which can be conveniently re-written as

$$\frac{2}{1-q_n(q)}=\frac{2}{1-q}+n\qquad(n=0,\pm1,\pm2,\dots) \tag{95}$$

(see Fig. 4). We easily verify that q_n(1) = 1 (∀n), q_{±∞}(q) = 1 (∀q), as well as

$$\frac{1}{q_{n+1}}=2-q_{n-1}\,. \tag{96}$$

This relation connects the so-called additive duality q → (2 − q) and multiplicative duality q → 1/q, which frequently emerge in all types of calculations in the literature. Moreover, we see from Eq. (95) that multiple values of q are expected to emerge in connection with diverse properties of nonextensive systems, i.e., in systems whose basic entropy is the nonadditive one S_q. Such is the case of the so-called q-triplet [185], observed for the first time in the magnetic field fluctuations of the solar wind, as revealed by the analysis of the data sent to NASA by the spacecraft Voyager 1 [48].

Entropy, Figure 4 — The q-dependence of q_n(q) ≡ q_{2,n}(q)

q-Independent Random Variables

Two random variables X [with density f_X(x)] and Y [with density f_Y(y)] having zero q-mean values (e.g., if f_X(x) and f_Y(y) are even functions) are said to be q-independent, with q₁ given by Eq. (92), if

$$F_q[X+Y](\xi)=F_q[X](\xi)\otimes_{q_1}F_q[Y](\xi)\,, \tag{97}$$

i.e., if

$$\int_{-\infty}^{\infty}\mathrm{d}z\;e_q^{\,i z\xi}\otimes_q f_{X+Y}(z)=\left[\int_{-\infty}^{\infty}\mathrm{d}x\;e_q^{\,i x\xi}\otimes_q f_X(x)\right]\otimes_{(1+q)/(3-q)}\left[\int_{-\infty}^{\infty}\mathrm{d}y\;e_q^{\,i y\xi}\otimes_q f_Y(y)\right]\,, \tag{98}$$

with

$$f_{X+Y}(z)=\int_{-\infty}^{\infty}\mathrm{d}x\int_{-\infty}^{\infty}\mathrm{d}y\;h(x,y)\,\delta(x+y-z)=\int_{-\infty}^{\infty}\mathrm{d}x\;h(x,z-x)=\int_{-\infty}^{\infty}\mathrm{d}y\;h(z-y,y)\,, \tag{99}$$

where h(x, y) is the joint density. Clearly, q-independence means independence for q = 1 (i.e., h(x, y) = f_X(x) f_Y(y)), and implies a special correlation for q ≠ 1. Although the full understanding of this correlation is still in progress, q-independence appears to be consistent with scale-invariance.

q-Generalized Central Limit Theorems

It is beyond the scope of the present survey to provide the details of the complex proofs of the q-generalized central limit theorems. We shall restrict ourselves to the presentation of their structure. Let us start by introducing a notation which is important for what follows. A distribution is said to be a (q, α)-stable distribution L_{q,α}(x) if its q-Fourier transform $\mathcal{L}_{q,\alpha}(\xi)$ is of the form

$$\mathcal{L}_{q,\alpha}(\xi)=a\,e_{q_1}^{-b\,|\xi|^{\alpha}}\qquad[a>0;\ b>0;\ 0<\alpha\le2;\ q_1=(q+1)/(3-q)]\,. \tag{100}$$
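The algebra of Eqs. (92) and (94)–(96) is plain arithmetic and can be checked directly (a sketch; the function name is mine):

```python
def q_n(q, n):
    """Eq. (94): q_n(q) = (2q + n(1-q)) / (2 + n(1-q))."""
    return (2 * q + n * (1 - q)) / (2 + n * (1 - q))

q = 1.4
# Eq. (92): one iteration gives q1 = (1+q)/(3-q)
assert abs(q_n(q, 1) - (1 + q) / (3 - q)) < 1e-12
# Eq. (95): 2/(1 - q_n) = 2/(1 - q) + n
for n in (-3, -1, 2, 4):
    assert abs(2 / (1 - q_n(q, n)) - (2 / (1 - q) + n)) < 1e-9
# Eq. (96): 1/q_(n+1) = 2 - q_(n-1), linking the dualities q -> 2-q and q -> 1/q
for n in (-2, 0, 1, 3):
    assert abs(1 / q_n(q, n + 1) - (2 - q_n(q, n - 1))) < 1e-12
```

The n values are chosen to avoid the isolated poles of Eq. (94) (here n = 5 for q = 1.4).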

Consistently, L_{1,2} are Gaussians, L_{1,α} are Lévy distributions, and L_{q,2} are q-Gaussians. We are seeking the N → ∞ attractor associated with the sum of N identical and distinguishable random variables, each of them associated with one and the same arbitrary symmetric distribution f(x). The random variables are independent for q = 1, and correlated in a special manner for q ≠ 1. To obtain the N → ∞ invariant distribution, i.e. the attractor, the sum must be rescaled, i.e., divided by N^δ, where

$$\delta=\frac{1}{\alpha(2-q)}\,. \tag{101}$$

For (α, q) = (2, 1), we recover the traditional 1/√N rescaling of Brownian motion. At the present stage, the theorems have been established for q ≥ 1 and are summarized in Table 1. The case q < 1 is still open at the time at which these lines are being written. Two q < 1 cases have been preliminarily explored numerically in [124] and in [171]. The numerics seemed to indicate that the N → ∞ limits would be q-Gaussians for both models. However, it has been analytically shown [94] that this is not exactly so. The limiting distributions are numerically amazingly close to q-Gaussians, but they are in fact different. Very recently, another simple scale-invariant model has been introduced [153], whose attractor has been analytically shown to be a q-Gaussian. These q ≠ 1 theorems play, for the nonadditive entropy S_q and nonextensive statistical mechanics, the same grounding role that the well-known q = 1 theorems play for the additive entropy S_BG and BG statistical mechanics. In particular, interestingly enough, the ubiquity of Gaussians and of q-Gaussians in natural, artificial and social systems may be understood on an equal footing.

Entropy, Table 1 — The attractors corresponding to the four basic cases, where the N variables being summed are q-independent (i.e., globally correlated) with q₁ = (1+q)/(3−q); σ_Q² ≡ (∫dx x² [f(x)]^Q)/(∫dx [f(x)]^Q), with Q ≡ 2q − 1. The attractor for (q, α) = (1, 2) is a Gaussian G(x) = L_{1,2} (standard Central Limit Theorem); for q = 1 and 0 < α < 2, it is a Lévy distribution L_α = L_{1,α} (the so-called Lévy–Gnedenko limit theorem); for α = 2 and q ≠ 1, it is a q-Gaussian G_q = L_{q,2} (the q-Central Limit Theorem [203]); finally, for q ≠ 1 and 0 < α < 2, it is a generic (q, α)-stable distribution L_{q,α} [204,205]. See [140,189] for typical illustrations of the four types of attractors. The distribution L_α(x) remains, for 1 < α < 2, close to a Gaussian for |x| up to about x_c(1, α), where it makes a crossover to a power-law. The distribution G_q(x) remains, for q > 1, close to a Gaussian for |x| up to about x_c(q, 2), where it makes a crossover to a power-law. The distribution L_{q,α}(x) remains, for q > 1 and α < 2, close to a Gaussian for |x| up to about x_c^{(1)}(q, α), where it makes a crossover to a power-law (intermediate regime), which lasts further up to about x_c^{(2)}(q, α), where it makes a second crossover to another power-law (distant regime):

• σ_Q < ∞ (α = 2), q = 1 [independent]: G(x) [with the same σ₁ as f(x)].
• σ_Q < ∞ (α = 2), q ≠ 1 (i.e., Q ≠ 1) [globally correlated]: G_q(x) = G_{(3q₁−1)/(1+q₁)}(x) [with the same σ_Q as f(x)]; G_q(x) ∼ G(x) if |x| ≪ x_c(q, 2), and G_q(x) ∼ C_{q,2}/|x|^{2/(q−1)} if |x| ≫ x_c(q, 2), for q > 1, with lim_{q→1} x_c(q, 2) = ∞.
• σ_Q → ∞, q = 1: L_α(x) [with the same |x| → ∞ behavior as f(x)]; L_α(x) ∼ G(x) if |x| ≪ x_c(1, α), and L_α(x) ∼ C_{1,α}/|x|^{1+α} if |x| ≫ x_c(1, α), with lim_{α→2} x_c(1, α) = ∞.
• σ_Q → ∞, q ≠ 1: L_{q,α}(x) (α < 2) [with the same |x| → ∞ behavior as f(x)]; L_{q,α} ∼ C_{q,α}^{(intermediate)}/|x|^{1+α} if x_c^{(1)}(q, α) ≪ |x| ≪ x_c^{(2)}(q, α), and L_{q,α} ∼ C_{q,α}^{(distant)}/|x|^{(1+α)/(1+α(q−1))} if |x| ≫ x_c^{(2)}(q, α).

Future Directions

The concept of entropy permeates virtually all quantitative sciences. The future directions could therefore be very varied. If we restrict ourselves, however, to the evidence presently available, the main lines along which evolution occurs are the following.

Networks — Many of the so-called scale-free networks, among others, systematically exhibit a degree distribution p(k) (probability of a node having k links) of the form

$$p(k)\propto\frac{1}{(k_0+k)^{\gamma}}\qquad(\gamma>0;\ k_0>0)\,, \tag{102}$$

or, equivalently,

$$p(k)\propto e_q^{-k/\kappa}\qquad(q\ge1;\ \kappa>0)\,, \tag{103}$$

with γ = 1/(q−1) and k₀ = κ/(q−1) (see Figs. 5 and 6). This is not surprising since, if we associate to each link an "energy" (or cost) and to each node half of the "energy" carried by its links (the other half being associated with the other nodes to which any specific node is linked), the distribution of energies optimizing S_q precisely coincides with the degree distribution. If, for any reason, we consider k as the modulus of a d-dimensional vector k, the optimization of the functional S_q[p(k)] may lead to p(k) ∝ k^δ e_q^{−k/κ}, where k^δ plays the role of a density of states, δ(d) being either zero (which reproduces Eq. (103)), positive, or negative. Several examples [12,39,76,91,165,172,173,212,213] already exist in the literature; in particular, the Barabasi–Albert universality class γ = 3 corresponds to q = 4/3. A deeper understanding of this connection might enable the systematic calculation of several meaningful properties of networks.

Entropy, Figure 5 — Snapshot of a nongrowing dynamic network with N = 256 nodes (see details in [172], by courtesy of the author)

Entropy, Figure 6 — Nongrowing dynamic network: a Cumulative degree distribution for typical values of the number N of nodes; b Same data as in a in the convenient representation linear q-log versus linear, with Z_q(k) ≡ ln_q[P_q(>k)] ≡ ([P_q(>k)]^{1−q} − 1)/(1−q) (the optimal fitting with a q-exponential is obtained for the value of q with the highest linear correlation r, as indicated in the inset; here this is q_c = 1.84, which corresponds to the slope −1.19 in a). See details in [172,173]

Nonlinear dynamical systems, self-organized criticality, and cellular automata — Various interesting phenomena emerge in both low- and high-dimensional weakly chaotic deterministic dynamical systems, either dissipative or conservative. Among these phenomena we have the sensitivity to the initial conditions and the entropy production, which have been briefly addressed in Eq. (37) and related papers. But there is much more, such as relaxation, escape, glassy states, and distributions associated with the stationary state [14,15,31,46,62,67,68,77,103,111,122,123,154,170,174,176,177,179]. Also, recent numerical indications suggest the validity of a dynamical version of the q-generalized central limit theorem [175]. The possible connections between all these various properties are still in their infancy.

Long-range-interacting many-body Hamiltonians — A wide class of long-range-interacting N-body classical Hamiltonians exhibits collective states whose Lyapunov spectrum has a maximal value that vanishes in the N → ∞ limit. As such, they constitute natural

candidates for studying whether the concepts derived from the nonadditive entropy S_q are applicable. A variety of properties have been calculated, through molecular dynamics, for various systems, such as Lennard–Jones-like fluids, XY and Heisenberg ferromagnets, gravitational-like models, and others. One or more long-standing quasi-stationary states (infinitely long-standing in the limit N → ∞) are typically observed before the terminal entrance into thermal equilibrium. Properties such as the distribution of velocities and angles, correlation functions, Lyapunov spectrum, metastability, glassy states, aging, time-dependence of the temperature in isolated systems, energy whenever thermal contact with a large thermostat at a given temperature is allowed, diffusion, order parameter, and others, are typically focused on. An ongoing debate exists, also involving Vlasov-like equations, Lynden–Bell statistics, among others. The breakdown of ergodicity that emerges in various situations makes the whole discussion rich and complex. The activity of present-day research in this area is illustrated in papers such as [21,26,45,53,56,57,63,104,119,121,126,127,132,133,134,135,136,142,169,200]. A quite remarkable molecular-dynamics result has been obtained for a paradigmatic long-range Hamiltonian: the distribution of time-averaged velocities differs markedly from that found for the ensemble-averaged velocities, and has been shown to be numerically consistent with a q-Gaussian [137], as shown in Fig. 7. This result provides strong support to a conjecture made long ago: see Fig. 4 on p. 8 of [157].

Entropy, Figure 7 — Distribution of velocities for the HMF model at the quasi-stationary state (whose duration appears to diverge when N → ∞). The blue curves indicate a Gaussian, for comparison. See details in [137]

Stochastic differential equations — Quite generic Fokker–Planck equations are currently being studied. Aspects such as fractional derivatives, nonlinearities, and space-dependent diffusion coefficients are being focused on, as well as their connections to entropic forms and associated generalized Langevin equations [20,23,24,70,128,168,214]. Quite recently, computational (see Fig. 8) and experimental (see Fig. 9) verifications of Lutz' 2003 prediction [110] have been exhibited [81], namely the q-Gaussian form of the velocity distribution of cold atoms in dissipative optical lattices, with q = 1 + 44E_R/U₀ (E_R and U₀ being energy parameters of the optical lattice). These experimental verifications are at variance with some of those exhibited previously [96], namely double-Gaussians. Although it is naturally possible that the experimental conditions were not exactly equivalent, this interesting question remains open at the present time. A hint might be hidden in the recent results [62] obtained for a quite different problem, namely the size distributions of avalanches; indeed, at a critical state, a q-Gaussian shape was obtained, whereas, at a noncritical state, a double-Gaussian was observed.

Entropy, Figure 8 — Quantum Monte Carlo simulations in [81]: a Velocity distribution (superimposed with a q-Gaussian); b Index q (superimposed with Lutz's prediction [110]). [By courtesy of the authors]

Entropy, Figure 9 — Experiments in [81]: a Velocity distribution (superimposed with a q-Gaussian); b Index q as a function of the frequency; c Velocity distribution (superimposed with a q-Gaussian; the red curve is a Gaussian); d Tail of the velocity distribution (superimposed with the asymptotic power-law of a q-Gaussian). [By courtesy of the authors]
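Returning to the Networks item above: the equivalence between Eqs. (102) and (103), with γ = 1/(q − 1) and k₀ = κ/(q − 1), can be checked numerically (a sketch; the function names and the sample value of κ are mine):

```python
def p_zipf(k, gamma, k0):
    """Eq. (102): p(k) proportional to 1/(k0 + k)^gamma (normalization omitted)."""
    return 1.0 / (k0 + k) ** gamma

def p_qexp(k, q, kappa):
    """Eq. (103): p(k) proportional to e_q^(-k/kappa) = [1 + (q-1) k/kappa]^(-1/(q-1))."""
    return (1 + (q - 1) * k / kappa) ** (-1.0 / (q - 1))

# With gamma = 1/(q-1) and k0 = kappa/(q-1), the two forms agree up to a constant factor.
q, kappa = 4.0 / 3.0, 5.0  # q = 4/3: Barabasi-Albert class, i.e., gamma = 3
gamma, k0 = 1 / (q - 1), kappa / (q - 1)
ratio0 = p_zipf(0.0, gamma, k0) / p_qexp(0.0, q, kappa)
for k in (1.0, 10.0, 100.0):
    ratio = p_zipf(k, gamma, k0) / p_qexp(k, q, kappa)
    assert abs(ratio - ratio0) < 1e-9 * ratio0  # constant ratio: same shape
```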

Quantum entanglement and quantum chaos — The nonlocal nature of quantum physics implies phenomena that are somewhat analogous to those originating from classical long-range interactions. Consequently, a variety of studies are being developed in connection with the entropy S_q [3,36,58,59,60,61,155,156,195]. The same happens with some aspects of quantum chaos [11,180,210,211].

Astrophysics, geophysics, economics, linguistics, cognitive psychology, and other interdisciplinary applications — Applications are available, and presently being pursued, in many areas of physics (plasmas, turbulence, nuclear collisions, elementary particles, manganites), but also


in interdisciplinary sciences such as astrophysics [38,47,48,49,78,84,87,101,109,129,196], geophysics [4,5,6,7,8,9,10,62,208], economics [25,50,51,52,80,139,141,197,215], linguistics [118], cognitive psychology [181], and others.

Global optimization, image and signal processing — Optimizing algorithms and related techniques for signal and image processing are currently being developed using the entropic concepts presented in this article [17,35,72,75,95,105,114,120,164,166,191].

Superstatistics and other generalizations — The methods discussed here have been generalized along a variety of lines. These include Beck–Cohen superstatistics [32,34,65,190], crossover statistics [194,196], and spectral statistics [201]. Also, a huge variety of entropies has been introduced which generalize the BG entropy in different manners, or even focus on other possibilities. Their number being nowadays over forty, we mention here just a few of them: see [18,44,69,98,99,115].

Acknowledgments Among the very many colleagues to whom I am deeply grateful for profound and long-lasting comments over many years, it is a must to explicitly thank S. Abe, E.P. Borges, E.G.D. Cohen, E.M.F. Curado, M. Gell-Mann, R.S. Mendes, A. Plastino, A.R. Plastino, A.K. Rajagopal, A. Rapisarda and A. Robledo.

Bibliography 1. Abe S (2000) Axioms and uniqueness theorem for Tsallis entropy. Phys Lett A 271:74–79 2. Abe S (2002) Stability of Tsallis entropy and instabilities of Renyi and normalized Tsallis entropies: A basis for q-exponential distributions. Phys Rev E 66:046134 3. Abe S, Rajagopal AK (2001) Nonadditive conditional entropy and its significance for local realism. Physica A 289:157–164 4. Abe S, Suzuki N (2003) Law for the distance between successive earthquakes. J Geophys Res (Solid Earth) 108(B2):2113 5. Abe S, Suzuki N (2004) Scale-free network of earthquakes. Europhys Lett 65:581–586 6. Abe S, Suzuki N (2005) Scale-free statistics of time interval between successive earthquakes. Physica A 350:588–596 7. Abe S, Suzuki N (2006) Complex network of seismicity. Prog Theor Phys Suppl 162:138–146 8. Abe S, Suzuki N (2006) Complex-network description of seismicity. Nonlinear Process Geophys 13:145–150 9. Abe S, Sarlis NV, Skordas ES, Tanaka H, Varotsos PA (2005) Optimality of natural time representation of complex time series. Phys Rev Lett 94:170601 10. Abe S, Tirnakli U, Varotsos PA (2005) Complexity of seismicity and nonextensive statistics. Europhys News 36:206–208

11. Abul AY-M (2005) Nonextensive random matrix theory approach to mixed regular-chaotic dynamics. Phys Rev E 71:066207 12. Albert R, Barabasi AL (2000) Phys Rev Lett 85:5234–5237 13. Alemany PA, Zanette DH (1994) Fractal random walks from a variational formalism for Tsallis entropies. Phys Rev E 49:R956–R958 14. Ananos GFJ, Tsallis C (2004) Ensemble averages and nonextensivity at the edge of chaos of one-dimensional maps. Phys Rev Lett 93:020601 15. Ananos GFJ, Baldovin F, Tsallis C (2005) Anomalous sensitivity to initial conditions and entropy production in standard maps: Nonextensive approach. Euro Phys J B 46:409–417 16. Andrade RFS, Pinho STR (2005) Tsallis scaling and the longrange Ising chain: A transfer matrix approach. Phys Rev E 71:026126 17. Andricioaei I, Straub JE (1996) Generalized simulated annealing algorithms using Tsallis statistics: Application to conformational optimization of a tetrapeptide. Phys Rev E 53:R3055–R3058 18. Anteneodo C, Plastino AR (1999) Maximum entropy approach to stretched exponential probability distributions. J Phys A 32:1089–1097 19. Anteneodo C, Tsallis C (1998) Breakdown of exponential sensitivity to initial conditions: Role of the range of interactions. Phys Rev Lett 80:5313–5316 20. Anteneodo C, Tsallis C (2003) Multiplicative noise: A mechanism leading to nonextensive statistical mechanics. J Math Phys 44:5194–5203 21. Antoniazzi A, Fanelli D, Barre J, Chavanis P-H, Dauxois T, Ruffo S (2007) Maximum entropy principle explains quasi-stationary states in systems with long-range interactions: The example of the Hamiltonian mean-field model. Phys Rev E 75:011112 22. Arevalo R, Garcimartin A, Maza D (2007) A non-standard statistical approach to the silo discharge. Eur Phys J Special Topics 143:191–197 23. Assis PC Jr, da Silva LR, Lenzi EK, Malacarne LC, Mendes RS (2005) Nonlinear diffusion equation, Tsallis formalism and exact solutions. J Math Phys 46:123303 24. 
Assis PC Jr, da Silva PC, da Silva LR, Lenzi EK, Lenzi MK (2006) Nonlinear diffusion equation and nonlinear external force: Exact solution. J Math Phys 47:103302 25. Ausloos M, Ivanova K (2003) Dynamical model and nonextensive statistical mechanics of a market index on large time windows. Phys Rev E 68:046122 26. Baldovin F, Orlandini E (2006) Incomplete equilibrium in longrange interacting systems. Phys Rev Lett 97:100601 27. Baldovin F, Robledo A (2002) Sensitivity to initial conditions at bifurcations in one-dimensional nonlinear maps: Rigorous nonextensive solutions. Europhys Lett 60:518–524 28. Baldovin F, Robledo A (2002) Universal renormalizationgroup dynamics at the onset of chaos in logistic maps and nonextensive statistical mechanics. Phys Rev E 66:R045104 29. Baldovin F, Robledo A (2004) Nonextensive Pesin identity. Exact renormalization group analytical results for the dynamics at the edge of chaos of the logistic map. Phys Rev E 69:R045202 30. Baldovin F, Robledo A (2005) Parallels between the dynamics at the noise-perturbed onset of chaos in logistic maps and the dynamics of glass formation. Phys Rev E 72:066213

31. Baldovin F, Moyano LG, Majtey AP, Robledo A, Tsallis C (2004) Ubiquity of metastable-to-stable crossover in weakly chaotic dynamical systems. Physica A 340:205–218 32. Beck C, Cohen EGD (2003) Superstatistics. Physica A 322:267– 275 33. Beck C, Schlogl F (1993) Thermodynamics of Chaotic Systems. Cambridge University Press, Cambridge 34. Beck C, Cohen EGD, Rizzo S (2005) Atmospheric turbulence and superstatistics. Europhys News 36:189–191 35. Ben A Hamza (2006) Nonextensive information-theoretic measure for image edge detection. J Electron Imaging 15: 013011 36. Batle J, Plastino AR, Casas M, Plastino A (2004) Inclusion relations among separability criteria. J Phys A 37:895–907 37. Batle J, Casas M, Plastino AR, Plastino A (2005) Quantum entropies and entanglement. Intern J Quantum Inf 3:99–104 38. Bernui A, Tsallis C, Villela T (2007) Deviation from Gaussianity in the cosmic microwave background temperature fluctuations. Europhys Lett 78:19001 39. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D-U (2006) Phys Rep 424:175–308 40. Bologna M, Tsallis C, Grigolini P (2000) Anomalous diffusion associated with nonlinear fractional derivative FokkerPlanck-like equation: Exact time-dependent solutions. Phys Rev E 62:2213–2218 41. Boltzmann L (1896) Vorlesungen über Gastheorie. Part II, ch I, paragraph 1. Leipzig, p 217; (1964) Lectures on Gas Theory (trans: Brush S). Univ. California Press, Berkeley 42. Boon JP, Tsallis C (eds) (2005) Nonextensive Statistical Mechanics: New Trends, New Perspectives. Europhysics News 36(6):185–231 43. Borges EP (2004) A possible deformed algebra and calculus inspired in nonextensive thermostatistics. Physica A 340:95– 101 44. Borges EP, Roditi I (1998) A family of non-extensive entropies. Phys Lett A 246:399–402 45. Borges EP, Tsallis C (2002) Negative specific heat in a Lennard–Jones-like gas with long-range interactions. Physica A 305:148–151 46. 
Borges EP, Tsallis C, Ananos GFJ, Oliveira PMC (2002) Nonequilibrium probabilistic dynamics at the logistic map edge of chaos. Phys Rev Lett 89:254103 47. Burlaga LF, Vinas AF (2004) Multiscale structure of the magnetic field and speed at 1 AU during the declining phase of solar cycle 23 described by a generalized Tsallis PDF. J Geophys Res Space – Phys 109:A12107 48. Burlaga LF, Vinas AF (2005) Triangle for the entropic index q of non-extensive statistical mechanics observed by Voyager 1 in the distant heliosphere. Physica A 356:375–384 49. Burlaga LF, Ness NF, Acuna MH (2006) Multiscale structure of magnetic fields in the heliosheath. J Geophys Res Space – Phys 111:A09112 50. Borland L (2002) A theory of non-gaussian option pricing. Quant Finance 2:415–431 51. Borland L (2002) Closed form option pricing formulas based on a non-Gaussian stock price model with statistical feedback. Phys Rev Lett 89:098701 52. Borland L, Bouchaud J-P (2004) A non-Gaussian option pricing model with skew. Quant Finance 4:499–514

53. Cabral BJC, Tsallis C (2002) Metastability and weak mixing in classical long-range many-rotator system. Phys Rev E 66:065101(R) 54. Calabrese P, Cardy J (2004) JSTAT – J Stat Mech Theory Exp P06002 55. Callen HB (1985) Thermodynamics and An Introduction to Thermostatistics, 2nd edn. Wiley, New York 56. Chavanis PH (2006) Lynden-Bell and Tsallis distributions for the HMF model. Euro Phys J B 53:487–501 57. Chavanis PH (2006) Quasi-stationary states and incomplete violent relaxation in systems with long-range interactions. Physica A 365:102–107 58. Canosa N, Rossignoli R (2005) General non-additive entropic forms and the inference of quantum density operators. Physica A 348:121–130 59. Cannas SA, Tamarit FA (1996) Long-range interactions and nonextensivity in ferromagnetic spin models. Phys Rev B 54:R12661–R12664 60. Caruso F, Tsallis C (2007) Extensive nonadditive entropy in quantum spin chains. In: Abe S, Herrmann HJ, Quarati P, Rapisarda A, Tsallis C (eds) Complexity, Metastability and Nonextensivity. American Institute of Physics Conference Proceedings, vol 965. New York, pp 51–59 61. Caruso F, Tsallis C (2008) Nonadditive entropy reconciles the area law in quantum systems with classical thermodynamics. Phys Rev E 78:021101 62. Caruso F, Pluchino A, Latora V, Vinciguerra S, Rapisarda A (2007) Analysis of self-organized criticality in the Olami–Feder–Christensen model and in real earthquakes. Phys Rev E 75:055101(R) 63. Campa A, Giansanti A, Moroni D (2002) Metastable states in a class of long-range Hamiltonian systems. Physica A 305:137–143 64. Csiszar I (1978) Information measures: A critical survey. In: Transactions of the Seventh Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, and the European Meeting of Statisticians, 1974. Reidel, Dordrecht 65. Cohen EGD (2005) Boltzmann and Einstein: Statistics and dynamics – An unsolved problem. Boltzmann Award Lecture at Statphys-Bangalore-2004. Pramana 64:635–643 66. 
Condat CA, Rangel J, Lamberti PW (2002) Anomalous diffusion in the nonasymptotic regime. Phys Rev E 65:026138 67. Coraddu M, Meloni F, Mezzorani G, Tonelli R (2004) Weak insensitivity to initial conditions at the edge of chaos in the logistic map. Physica A 340:234–239 68. Costa UMS, Lyra ML, Plastino AR, Tsallis C (1997) Power-law sensitivity to initial conditions within a logistic-like family of maps: Fractality and nonextensivity. Phys Rev E 56:245–250 69. Curado EMF (1999) General aspects of the thermodynamical formalism. Braz J Phys 29:36–45 70. Curado EMF, Nobre FD (2003) Derivation of nonlinear FokkerPlanck equations by means of approximations to the master equation. Phys Rev E 67:021107 71. Curado EMF, Tsallis C (1991) Generalized statistical mechanics: connection with thermodynamics. Phys J A 24:L69-L72; [Corrigenda: 24:3187 (1991); 25:1019 (1992)] 72. Cvejic N, Canagarajah CN, Bull DR (2006) Image fusion metric based on mutual information and Tsallis entropy. Electron Lett 42:11


73. Daniels KE, Beck C, Bodenschatz E (2004) Defect turbulence and generalized statistical mechanics. Physica D 193:208–217 74. Daroczy Z (1970) Information and Control 16:36 75. de Albuquerque MP, Esquef IA, Mello ARG, de Albuquerque MP (2004) Image thresholding using Tsallis entropy. Pattern Recognition Lett 25:1059–1065 76. de Meneses MDS, da Cunha SD, Soares DJB, da Silva LR (2006) In: Sakagami M, Suzuki N, Abe S (eds) Complexity and Nonextensivity: New Trends in Statistical Mechanics. Prog Theor Phys Suppl 162:131–137 77. de Moura FABF, Tirnakli U, Lyra ML (2000) Convergence to the critical attractor of dissipative maps: Log-periodic oscillations, fractality and nonextensivity. Phys Rev E 62:6361–6365 78. de Oliveira HP, Soares ID, Tonini EV (2004) Role of the nonextensive statistics in a three-degrees of freedom gravitational system. Phys Rev D 70:084012 79. de Souza AMC, Tsallis C (1997) Student’s t- and r-distributions: Unified derivation from an entropic variational principle. Physica A 236:52–57 80. de Souza J, Moyano LG, Queiros SMD (2006) On statistical properties of traded volume in financial markets. Euro Phys J B 50:165–168 81. Douglas P, Bergamini S, Renzoni F (2006) Tunable Tsallis distributions in dissipative optical lattices. Phys Rev Lett 96:110601 82. Fermi E (1936) Thermodynamics. Dover, New York, p 53 83. Ferri GL, Martinez S, Plastino A (2005) Equivalence of the four versions of Tsallis’ statistics. J Stat Mech P04009 84. Ferro F, Lavagno A, Quarati P (2004) Non-extensive resonant reaction rates in astrophysical plasmas. Euro Phys J A 21:529– 534 85. Fulco UL, da Silva LR, Nobre FD, Rego HHA, Lucena LS (2003) Effects of site dilution on the one-dimensional long-range bond-percolation problem. Phys Lett A 312:331–335 86. Frank TD (2005) Nonlinear Fokker–Planck Equations – Fundamentals and Applications. Springer, Berlin 87. Gervino G, Lavagno A, Quarati P (2005) CNO reaction rates and chemical abundance variations in dense stellar plasma. 
J Phys G 31:S1865–S1868 88. Gibbs JW (1902) Elementary Principles in Statistical Mechanics – Developed with Especial Reference to the Rational Foundation of Thermodynamics. C Scribner, New York; Yale University Press, New Haven, 1948; Ox Bow Press, Woodbridge, Connecticut, 1981 89. Ginsparg P, Moore G (1993) Lectures on 2D Gravity and 2D String Theory. Cambridge University Press, Cambridge; hep-th/9304011, p 65 90. Grigera JR (1996) Extensive and non-extensive thermodynamics. A molecular dynamic test. Phys Lett A 217:47–51 91. Hasegawa H (2006) Nonextensive aspects of small-world networks. Physica A 365:383–401 92. Havrda J, Charvat F (1967) Kybernetika 3:30 93. Hernandez H-S, Robledo A (2006) Fluctuating dynamics at the quasiperiodic onset of chaos, Tsallis q-statistics and Mori's q-phase thermodynamics. Phys A 370:286–300 94. Hilhorst HJ, Schehr G (2007) A note on q-Gaussians and non-Gaussians in statistical mechanics. J Stat Mech P06003 95. Jang S, Shin S, Pak Y (2003) Replica-exchange method using the generalized effective potential. Phys Rev Lett 91:058305

96. Jersblad J, Ellmann H, Stochkel K, Kastberg A, Sanchez L-P, Kaiser R (2004) Non-Gaussian velocity distributions in optical lattices. Phys Rev A 69:013410 97. Jund P, Kim SG, Tsallis C (1995) Crossover from extensive to nonextensive behavior driven by long-range interactions. Phys Rev B 52:50–53 98. Kaniadakis G (2001) Non linear kinetics underlying generalized statistics. Physica A 296:405–425 99. Kaniadakis G, Lissia M, Scarfone AM (2004) Deformed logarithms and entropies. Physica A 340:41–49 100. Khinchin AI (1953) Uspekhi Matem. Nauk 8:3 (Silverman RA, Friedman MD, trans. Math Found Inf Theory. Dover, New York) 101. Kronberger T, Leubner MP, van Kampen E (2006) Dark matter density profiles: A comparison of nonextensive statistics with N-body simulations. Astron Astrophys 453:21–25 102. Latora V, Baranger M (1999) Kolmogorov-Sinai entropy rate versus physical entropy. Phys Rev Lett 82:520–523 103. Latora V, Baranger M, Rapisarda A, Tsallis C (2000) The rate of entropy increase at the edge of chaos. Phys Lett A 273:97–103 104. Latora V, Rapisarda A, Tsallis C (2001) Non-Gaussian equilibrium in a long-range Hamiltonian system. Phys Rev E 64:056134 105. Lemes MR, Zacharias CR, Dal Pino A Jr (1997) Generalized simulated annealing: Application to silicon clusters. Phys Rev B 56:9279–9281 106. Lenzi EK, Anteneodo C, Borland L (2001) Escape time in anomalous diffusive media. Phys Rev E 63:051109 107. Lesche B (1982) Instabilities of Rényi entropies. J Stat Phys 27:419–422 108. Lindhard J, Nielsen V (1971) Studies in statistical mechanics. Det Kongelige Danske Videnskabernes Selskab Matematiskfysiske Meddelelser (Denmark) 38(9):1–42 109. Lissia M, Quarati P (2005) Nuclear astrophysical plasmas: Ion distributions and fusion rates. Europhys News 36:211–214 110. Lutz E (2003) Anomalous diffusion and Tsallis statistics in an optical lattice. Phys Rev A 67:051402(R) 111. Lyra ML, Tsallis C (1998) Nonextensivity and multifractality in low-dimensional dissipative systems. 
Phys Rev Lett 80:53–56 112. Mann GM, Tsallis C (eds) (2004) Nonextensive Entropy – Interdisciplinary Applications. Oxford University Press, New York 113. Marsh JA, Fuentes MA, Moyano LG, Tsallis C (2006) Influence of global correlations on central limit theorems and entropic extensivity. Physica A 372:183–202 114. Martin S, Morison G, Nailon W, Durrani T (2004) Fast and accurate image registration using Tsallis entropy and simultaneous perturbation stochastic approximation. Electron Lett 40(10):20040375 115. Masi M (2005) A step beyond Tsallis and Renyi entropies. Phys Lett A 338:217–224 116. Mayoral E, Robledo A (2004) Multifractality and nonextensivity at the edge of chaos of unimodal maps. Physica A 340:219–226 117. Mayoral E, Robledo A (2005) Tsallis’ q index and Mori’s q phase transitions at edge of chaos. Phys Rev E 72:026209 118. Montemurro MA (2001) Beyond the Zipf–Mandelbrot law in quantitative linguistics. Physica A 300:567–578 119. Montemurro MA, Tamarit F, Anteneodo C (2003) Aging in an infinite-range Hamiltonian system of coupled rotators. Phys Rev E 67:031106

120. Moret MA, Pascutti PG, Bisch PM, Mundim MSP, Mundim KC (2006) Classical and quantum conformational analysis using Generalized Genetic Algorithm. Phys A 363:260–268 121. Moyano LG, Anteneodo C (2006) Diffusive anomalies in a long-range Hamiltonian system. Phys Rev E 74:021118 122. Moyano LG, Majtey AP, Tsallis C (2005) Weak chaos in large conservative system – Infinite-range coupled standard maps. In: Beck C, Benedek G, Rapisarda A, Tsallis C (eds) Complexity, Metastability and Nonextensivity. World Scientific, Singapore, pp 123–127 123. Moyano LG, Majtey AP, Tsallis C (2006) Weak chaos and metastability in a symplectic system of many long-range-coupled standard maps. Euro Phys J B 52:493–500 124. Moyano LG, Tsallis C, Gell-Mann M (2006) Numerical indications of a q-generalised central limit theorem. Europhys Lett 73:813–819 125. Nivanen L, Le Mehaute A, Wang QA (2003) Generalized algebra within a nonextensive statistics. Rep Math Phys 52:437– 444 126. Nobre FD, Tsallis C (2003) Classical infinite-range-interaction Heisenberg ferromagnetic model: Metastability and sensitivity to initial conditions. Phys Rev E 68:036115 127. Nobre FD, Tsallis C (2004) Metastable states of the classical inertial infinite-range-interaction Heisenberg ferromagnet: Role of initial conditions. Physica A 344:587–594 128. Nobre FD, Curado EMF, Rowlands G (2004) A procedure for obtaining general nonlinear Fokker-Planck equations. Physica A 334:109–118 129. Oliveira HP, Soares ID (2005) Dynamics of black hole formation: Evidence for nonextensivity. Phys Rev D 71:124034 130. Penrose O (1970) Foundations of Statistical Mechanics: A Deductive Treatment. Pergamon Press, Oxford, p 167 131. Plastino AR, Plastino A (1995) Non-extensive statistical mechanics and generalized Fokker-Planck equation. Physica A 222:347–354 132. Pluchino A, Rapisarda A (2006) Metastability in the Hamiltonian Mean Field model and Kuramoto model. Physica A 365:184–189 133. 
Pluchino A, Rapisarda A (2006) Glassy dynamics and nonextensive effects in the HMF model: the importance of initial conditions. In: Sakagami M, Suzuki N, Abe S (eds) Complexity and Nonextensivity: New Trends in Statistical Mechanics. Prog Theor Phys Suppl 162:18–28 134. Pluchino A, Latora V, Rapisarda A (2004) Glassy dynamics in the HMF model. Physica A 340:187–195 135. Pluchino A, Latora V, Rapisarda A (2004) Dynamical anomalies and the role of initial conditions in the HMF model. Physica A 338:60–67 136. Pluchino A, Rapisarda A, Latora V (2005) Metastability and anomalous behavior in the HMF model: Connections to nonextensive thermodynamics and glassy dynamics. In: Beck C, Benedek G, Rapisarda A, Tsallis C (eds) Complexity, Metastability and Nonextensivity. World Scientific, Singapore, pp 102–112 137. Pluchino A, Rapisarda A, Tsallis C (2007) Nonergodicity and central limit behavior in long-range Hamiltonians. Europhys Lett 80:26002 138. Prato D, Tsallis C (1999) Nonextensive foundation of Levy distributions. Phys Rev E 60:2398–2401

139. Queiros SMD (2005) On non-Gaussianity and dependence in financial in time series: A nonextensive approach. Quant Finance 5:475–487 140. Queiros SMD, Tsallis C (2007) Nonextensive statistical mechanics and central limit theorems II – Convolution of q-independent random variables. In: Abe S, Herrmann HJ, Quarati P, Rapisarda A, Tsallis C (eds) Complexity, Metastability and Nonextensivity. American Institute of Physics Conference Proceedings, vol 965. New York, pp 21–33 141. Queiros SMD, Moyano LG, de Souza J, Tsallis C (2007) A nonextensive approach to the dynamics of financial observables. Euro Phys J B 55:161–168 142. Rapisarda A, Pluchino A (2005) Nonextensive thermodynamics and glassy behavior. Europhys News 36:202–206; Erratum: 37:25 (2006) 143. Rapisarda A, Pluchino A (2005) Nonextensive thermodynamics and glassy behaviour in Hamiltonian systems. Europhys News 36:202–206; Erratum: 37:25 (2006) 144. Rego HHA, Lucena LS, da Silva LR, Tsallis C (1999) Crossover from extensive to nonextensive behavior driven by longrange d D 1 bond percolation. Phys A 266:42–48 145. Renyi A (1961) In: Proceedings of the Fourth Berkeley Symposium, 1:547 University California Press, Berkeley; Renyi A (1970) Probability theory. North-Holland, Amsterdam 146. Robledo A (2004) Aging at the edge of chaos: Glassy dynamics and nonextensive statistics. Physica A 342:104–111 147. Robledo A (2004) Universal glassy dynamics at noise-perturbed onset of chaos: A route to ergodicity breakdown. Phys Lett A 328:467–472 148. Robledo A (2004) Criticality in nonlinear one-dimensional maps: RG universal map and nonextensive entropy. Physica D 193:153–160 149. Robledo A (2005) Intermittency at critical transitions and aging dynamics at edge of chaos. Pramana-J Phys 64:947–956 150. Robledo A (2005) Critical attractors and q-statistics. Europhys News 36:214–218 151. Robledo A (2006) Crossover from critical to chaotic attractor dynamics in logistic and circle maps. 
In: Sakagami M, Suzuki N, Abe S (eds) Complexity and Nonextensivity: New Trends in Statistical Mechanics. Prog Theor Phys Suppl 162:10–17 152. Robledo A, Baldovin F, Mayoral E (2005) Two stories outside Boltzmann-Gibbs statistics: Mori's q-phase transitions and glassy dynamics at the onset of chaos. In: Beck C, Benedek G, Rapisarda A, Tsallis C (eds) Complexity, Metastability and Nonextensivity. World Scientific, Singapore, p 43 153. Rodriguez A, Schwammle V, Tsallis C (2008) Strictly and asymptotically scale-invariant probabilistic models of N correlated binary random variables having q-Gaussians as N → ∞ limiting distributions. J Stat Mech P09006 154. Rohlf T, Tsallis C (2007) Long-range memory elementary 1D cellular automata: Dynamics and nonextensivity. Physica A 379:465–470 155. Rossignoli R, Canosa N (2003) Violation of majorization relations in entangled states and its detection by means of generalized entropic forms. Phys Rev A 67:042302 156. Rossignoli R, Canosa N (2004) Generalized disorder measure and the detection of quantum entanglement. Physica A 344:637–643 157. Salinas SRA, Tsallis C (eds) (1999) Nonextensive Statistical Mechanics and Thermodynamics. Braz J Phys 29(1)


158. Sampaio LC, de Albuquerque MP, de Menezes FS (1997) Nonextensivity and Tsallis statistics in magnetic systems. Phys Rev B 55:5611–5614 159. Santos RJV (1997) Generalization of Shannon's theorem for Tsallis entropy. J Math Phys 38:4104–4107 160. Sato Y, Tsallis C (2006) In: Bountis T, Casati G, Procaccia I (eds) Complexity: An unifying direction in science. Int J Bif Chaos 16:1727–1738 161. Schutzenberger PM (1954) Contributions aux applications statistiques de la theorie de l'information. Publ Inst Statist Univ Paris 3:3 162. Shannon CE (1948) A Mathematical Theory of Communication. Bell Syst Tech J 27:379–423; 27:623–656; (1949) The Mathematical Theory of Communication. University of Illinois Press, Urbana 163. Sharma BD, Mittal DP (1975) J Math Sci 10:28 164. Serra P, Stanton AF, Kais S, Bleil RE (1997) Comparison study of pivot methods for global optimization. J Chem Phys 106:7170–7177 165. Soares DJB, Tsallis C, Mariz AM, da Silva LR (2005) Preferential attachment growth model and nonextensive statistical mechanics. Europhys Lett 70:70–76 166. Son WJ, Jang S, Pak Y, Shin S (2007) Folding simulations with novel conformational search method. J Chem Phys 126:104906 167. Stigler SM (1999) Statistics on the table – The history of statistical concepts and methods. Harvard University Press, Cambridge 168. Silva AT, Lenzi EK, Evangelista LR, Lenzi MK, da Silva LR (2007) Fractional nonlinear diffusion equation, solutions and anomalous diffusion. Phys A 375:65–71 169. Tamarit FA, Anteneodo C (2005) Relaxation and aging in long-range interacting systems. Europhys News 36:194–197 170. Tamarit FA, Cannas SA, Tsallis C (1998) Sensitivity to initial conditions and nonextensivity in biological evolution. Euro Phys J B 1:545–548 171. Thistleton W, Marsh JA, Nelson K, Tsallis C (2006) unpublished 172. Thurner S (2005) Europhys News 36:218–220 173. Thurner S, Tsallis C (2005) Nonextensive aspects of self-organized scale-free gas-like networks. Europhys Lett 72:197–204 174. 
Tirnakli U, Ananos GFJ, Tsallis C (2001) Generalization of the Kolmogorov–Sinai entropy: Logistic – like and generalized cosine maps at the chaos threshold. Phys Lett A 289:51–58 175. Tirnakli U, Beck C, Tsallis C (2007) Central limit behavior of deterministic dynamical systems. Phys Rev E 75:040106(R) 176. Tirnakli U, Tsallis C, Lyra ML (1999) Circular-like maps: Sensitivity to the initial conditions, multifractality and nonextensivity. Euro Phys J B 11:309–315 177. Tirnakli U, Tsallis C (2006) Chaos thresholds of the z-logistic map: Connection between the relaxation and average sensitivity entropic indices. Phys Rev E 73:037201 178. Tisza L (1961) Generalized Thermodynamics. MIT Press, Cambridge, p 123 179. Tonelli R, Mezzorani G, Meloni F, Lissia M, Coraddu M (2006) Entropy production and Pesin-like identity at the onset of chaos. Prog Theor Phys 115:23–29 180. Toscano F, Vallejos RO, Tsallis C (2004) Random matrix ensembles from nonextensive entropy. Phys Rev E 69:066131 181. Tsallis AC, Tsallis C, Magalhaes ACN, Tamarit FA (2003) Human and computer learning: An experimental study. Complexus 1:181–189

182. Tsallis C Regularly updated bibliography at http://tsallis.cat. cbpf.br/biblio.htm 183. Tsallis C (1988) Possible generalization of Boltzmann–Gibbs statistics. J Stat Phys 52:479–487 184. Tsallis C (2004) What should a statistical mechanics satisfy to reflect nature? Physica D 193:3–34 185. Tsallis C (2004) Dynamical scenario for nonextensive statistical mechanics. Physica A 340:1–10 186. Tsallis C (2005) Is the entropy Sq extensive or nonextensive? In: Beck C, Benedek G, Rapisarda A, Tsallis C (eds) Complexity, Metastability and Nonextensivity. World Scientific, Singapore 187. Tsallis C (2005) Nonextensive statistical mechanics, anomalous diffusion and central limit theorems. Milan J Math 73:145–176 188. Tsallis C, Bukman DJ (1996) Anomalous diffusion in the presence of external forces: exact time-dependent solutions and their thermostatistical basis. Phys Rev E 54:R2197–R2200 189. Tsallis C, Queiros SMD (2007) Nonextensive statistical mechanics and central limit theorems I – Convolution of independent random variables and q-product. In: Abe S, Herrmann HJ, Quarati P, Rapisarda A, Tsallis C (eds) Complexity, Metastability and Nonextensivity. American Institute of Physics Conference Proceedings, vol 965. New York, pp 8–20 190. Tsallis C, Souza AMC (2003) Constructing a statistical mechanics for Beck-Cohen superstatistics. Phys Rev E 67:026106 191. Tsallis C, Stariolo DA (1996) Generalized simulated annealing. Phys A 233:395–406; A preliminary version appeared (in English) as Notas de Fisica/CBPF 026 (June 1994) 192. Tsallis C, Levy SVF, de Souza AMC, Maynard R (1995) Statistical-mechanical foundation of the ubiquity of Levy distributions in nature. Phys Rev Lett 75:3589–3593; Erratum: (1996) Phys Rev Lett 77:5442 193. Tsallis C, Mendes RS, Plastino AR (1998) The role of constraints within generalized nonextensive statistics. Physica A 261:534–554 194. Tsallis C, Bemski G, Mendes RS (1999) Is re-association in folded proteins a case of nonextensivity? 
Phys Lett A 257:93–98 195. Tsallis C, Lloyd S, Baranger M (2001) Peres criterion for separability through nonextensive entropy. Phys Rev A 63:042104 196. Tsallis C, Anjos JC, Borges EP (2003) Fluxes of cosmic rays: A delicately balanced stationary state. Phys Lett A 310:372–376 197. Tsallis C, Anteneodo C, Borland L, Osorio R (2003) Nonextensive statistical mechanics and economics. Physica A 324:89–100 198. Tsallis C, Mann GM, Sato Y (2005) Asymptotically scale-invariant occupancy of phase space makes the entropy Sq extensive. Proc Natl Acad Sci USA 102:15377–15382 199. Tsallis C, Mann GM, Sato Y (2005) Extensivity and entropy production. In: Boon JP, Tsallis C (eds) Nonextensive Statistical Mechanics: New Trends, New Perspectives. Europhys News 36:186–189 200. Tsallis C, Rapisarda A, Pluchino A, Borges EP (2007) On the non-Boltzmannian nature of quasi-stationary states in long-range interacting systems. Physica A 381:143–147 201. Tsekouras GA, Tsallis C (2005) Generalized entropy arising from a distribution of q-indices. Phys Rev E 71:046144 202. Umarov S, Tsallis C (2007) Multivariate generalizations of the q-central limit theorem. cond-mat/0703533 203. Umarov S, Tsallis C, Steinberg S (2008) On a q-central limit theorem consistent with nonextensive statistical mechanics. Milan J Math 76. doi:10.1007/s00032-008-0087-y 204. Umarov S, Tsallis C, Gell-Mann M, Steinberg S (2008) Symmetric (q, α)-stable distributions. Part I: First representation. cond-mat/0606038v2 205. Umarov S, Tsallis C, Gell-Mann M, Steinberg S (2008) Symmetric (q, α)-stable distributions. Part II: Second representation. cond-mat/0606040v2 206. Upadhyaya A, Rieu J-P, Glazier JA, Sawada Y (2001) Anomalous diffusion and non-Gaussian velocity distribution of Hydra cells in cellular aggregates. Physica A 293:549–558 207. Vajda I (1968) Kybernetika 4:105 (in Czech) 208. Varotsos PA, Sarlis NV, Tanaka HK, Skordas ES (2005) Some properties of the entropy in the natural time. Phys Rev E 71:032102 209. Wehrl A (1978) Rev Modern Phys 50:221

210. Weinstein YS, Lloyd S, Tsallis C (2002) Border between regular and chaotic quantum dynamics. Phys Rev Lett 89:214101 211. Weinstein YS, Tsallis C, Lloyd S (2004) On the emergence of nonextensivity at the edge of quantum chaos. In: Elze H-T (ed) Decoherence and Entropy in Complex Systems. Lecture Notes in Physics, vol 633. Springer, Berlin, pp 385–397 212. White DR, Kejzar N, Tsallis C, Farmer JD, White S (2005) A generative model for feedback networks. Phys Rev E 73:016119 213. Wilk G, Wlodarczyk Z (2004) Acta Phys Pol B 35:871–879 214. Wu JL, Chen HJ (2007) Fluctuation in nonextensive reaction-diffusion systems. Phys Scripta 75:722–725 215. Yamano T (2004) Distribution of the Japanese posted land price and the generalized entropy. Euro Phys J B 38:665–669 216. Zanette DH, Alemany PA (1995) Thermodynamics of anomalous diffusion. Phys Rev Lett 75:366–369


Ergodic Theory of Cellular Automata
MARCUS PIVATO
Department of Mathematics, Trent University, Peterborough, Canada

Article Outline
Glossary
Definition of the Subject
Introduction
Invariant Measures for CA
Limit Measures and Other Asymptotics
Measurable Dynamics
Entropy
Future Directions and Open Problems
Acknowledgments
Bibliography

Glossary
Configuration space and the shift Let M be a finitely generated group or monoid (usually abelian). Typically, M = N := {0, 1, 2, ...} or M = Z := {..., −1, 0, 1, 2, ...}, or M = N^E, Z^D, or Z^D × N^E for some D, E ∈ N. In some applications, M could be nonabelian (although usually amenable), but to avoid notational complexity we will generally assume M is abelian and additive, with operation '+'. Let A be a finite set of symbols (called an alphabet). Let A^M denote the set of all functions a : M → A, which we regard as M-indexed configurations of elements in A. We write such a configuration as a = [a_m]_{m∈M}, where a_m ∈ A for all m ∈ M, and refer to A^M as configuration space. Treat A as a discrete topological space; then A is compact (because it is finite), so A^M is compact in the Tychonoff product topology. In fact, A^M is a Cantor space: it is compact, perfect, totally disconnected, and metrizable. For example, if M = Z^D, then the standard metric on A^{Z^D} is defined by d(a, b) = 2^{−Δ(a,b)}, where Δ(a, b) := min{|z| : a_z ≠ b_z}. Any v ∈ M determines a continuous shift map σ^v : A^M → A^M defined by σ^v(a)_m = a_{m+v} for all a ∈ A^M and m ∈ M. The set {σ^v}_{v∈M} is then a continuous M-action on A^M, which we denote simply by "σ". If a ∈ A^M and U ⊆ M, then we define a_U ∈ A^U by a_U := [a_u]_{u∈U}. If m ∈ M, then strictly speaking, a_{m+U} ∈ A^{m+U}; however, it will often be convenient to 'abuse notation' and treat a_{m+U} as an element of A^U in the obvious way.
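These definitions can be made concrete with a small sketch (purely illustrative, not from the article): configurations in A^Z are approximated here by finite windows, stored as dicts from coordinates to symbols, which is enough to demonstrate the shift map σ^v and the Cantor metric d(a, b) = 2^(−Δ(a,b)). The function names are ours.

```python
# Illustrative sketch: finite-window approximations of configurations in A^Z,
# the shift map sigma^v, and the Cantor metric d(a,b) = 2^(-Delta(a,b)).
# Representing a configuration as a dict {coordinate: symbol} over a finite
# window is an assumption made here purely for demonstration.

def shift(a, v):
    """Shift map sigma^v: (sigma^v a)_m = a_{m+v}, restricted to the window."""
    return {m: a[m + v] for m in a if (m + v) in a}

def cantor_dist(a, b):
    """d(a,b) = 2^(-Delta(a,b)), where Delta(a,b) = min{|z| : a_z != b_z}."""
    diffs = [abs(z) for z in a if a[z] != b[z]]
    return 0.0 if not diffs else 2.0 ** (-min(diffs))

# Two configurations over the window [-3..3] with alphabet A = {0, 1}:
a = {z: (z % 2) for z in range(-3, 4)}
b = dict(a); b[2] = 1 - b[2]          # differ only at coordinate z = 2
print(cantor_dist(a, b))              # 2^(-2) = 0.25
print(shift(a, 1)[0] == a[1])         # the shift moves coordinate 1 to position 0
```

Note how the metric matches the topology described above: two configurations are close exactly when they agree on a large window around the origin.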

Cellular automata Let H ⊆ M be some finite subset, and let φ : A^H → A be a function (called a local rule). The cellular automaton (CA) determined by φ is the function Φ : A^M → A^M defined by Φ(a)_m = φ(a_{m+H}) for all a ∈ A^M and m ∈ M. Curtis, Hedlund and Lyndon showed that cellular automata are exactly the continuous transformations of A^M which commute with all shifts (see Theorem 3.4 in [58]). We refer to H as the neighborhood of Φ.
For example, if M = ℤ, then typically H := [−ℓ...r] := {−ℓ, 1−ℓ, ..., r−1, r} for some left radius ℓ ≥ 0 and right radius r ≥ 0. If ℓ ≤ 0, then φ can either define a CA on A^ℕ or define a one-sided CA on A^ℤ. If M = ℤ^D, then typically H ⊆ [−R...R]^D, for some radius R ≥ 0. Normally we assume that ℓ, r, and R are chosen to be minimal. Several specific classes of CA will be important to us:
Linear CA Let (A, +) be a finite abelian group (e.g. A = ℤ/p, where p ∈ ℕ; usually p is prime). Then Φ is a linear CA (LCA) if the local rule φ has the form

φ(a_H) := Σ_{h∈H} φ_h(a_h) , for all a_H ∈ A^H , (1)

where φ_h : A → A is an endomorphism of (A, +), for each h ∈ H. We say that Φ has scalar coefficients if, for each h ∈ H, there is some scalar c_h ∈ ℤ such that φ_h(a_h) := c_h · a_h; then φ(a_H) := Σ_{h∈H} c_h a_h. For example, if A = (ℤ/p, +), then all endomorphisms are scalar multiplications, so all LCA have scalar coefficients. If c_h = 1 for all h ∈ H, then Φ has local rule φ(a_H) := Σ_{h∈H} a_h; in this case, Φ is called an additive cellular automaton; see ▸Additive Cellular Automata.
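As a concrete illustration (an illustrative sketch of ours, not from the article), the additive CA with local rule φ(a₀, a₁) = a₀ + a₁ (mod 2) can be simulated on a finite window; truncating the infinite configuration means the window shrinks at the boundary each step. The names `step` and `xor_rule` are our own:

```python
def step(config, phi, radius_l, radius_r):
    """Apply local rule phi over the neighborhood [-radius_l .. radius_r]
    to a finite window; the output window shrinks at the boundary."""
    n = len(config)
    return [phi(tuple(config[i - radius_l : i + radius_r + 1]))
            for i in range(radius_l, n - radius_r)]

def xor_rule(window):
    # additive local rule: phi(a_H) = sum of a_h (mod 2)
    return sum(window) % 2

config = [0, 0, 0, 1, 0, 0, 0]        # finite approximation of a configuration
row1 = step(config, xor_rule, 0, 1)   # neighborhood H = [0..1]
# row1 == [0, 0, 1, 1, 0, 0]
```

Iterating `step` produces the familiar mod-2 Pascal's triangle pattern of the additive CA.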

Affine CA If (A, +) is a finite abelian group, then an affine CA is one with a local rule φ(a_H) := c + Σ_{h∈H} φ_h(a_h), where c is some constant and where φ_h : A → A are endomorphisms of (A, +). Thus, Φ is an LCA if c = 0.
Permutative CA Suppose Φ : A^ℤ → A^ℤ has local rule φ : A^{[−ℓ...r]} → A. Fix b = [b_{1−ℓ}, ..., b_{r−1}, b_r] ∈ A^{(−ℓ...r]}. For any a ∈ A, define [a b] := [a, b_{1−ℓ}, ..., b_{r−1}, b_r] ∈ A^{[−ℓ...r]}. We then define the function φ_b : A → A by φ_b(a) := φ([a b]). We say that Φ is left-permutative if φ_b : A → A is a permutation (i.e. a bijection) for all b ∈ A^{(−ℓ...r]}. Likewise, given b = [b_{−ℓ}, ..., b_{r−1}] ∈ A^{[−ℓ...r)} and c ∈ A, define [b c] := [b_{−ℓ}, b_{1−ℓ}, ..., b_{r−1}, c] ∈ A^{[−ℓ...r]}, and define φ^b : A → A by φ^b(c) := φ([b c]); then Φ is right-permutative if φ^b : A → A is a permutation for all b ∈ A^{[−ℓ...r)}. We say Φ


is bipermutative if it is both left- and right-permutative. More generally, if M is any monoid, H ⊆ M is any neighborhood, and h ∈ H is any fixed coordinate, then we define h-permutativity for a CA on A^M in the obvious fashion.
For example, suppose (A, +) is an abelian group and Φ is an affine CA on A^ℤ with local rule φ(a_H) = c + Σ_{h=−ℓ}^{r} φ_h(a_h). Then Φ is left-permutative iff φ_{−ℓ} is an automorphism, and right-permutative iff φ_r is an automorphism. If A = ℤ/p, and p is prime, then every nontrivial endomorphism is an automorphism (because it is multiplication by a nonzero element of ℤ/p, which is a field), so in this case, every affine CA is permutative in every coordinate of its neighborhood (and in particular, bipermutative). If A ≠ ℤ/p, however, then not all affine CA are permutative.
Permutative CA were introduced by Hedlund [58], §6, and are sometimes called permutive CA. Right-permutative CA on A^ℕ are also called toggle automata. For more information, see Sect. 7 of ▸Topological Dynamics of Cellular Automata.
Subshifts A subshift is a closed, σ-invariant subset X ⊆ A^M. For any U ⊆ M, let X_U := {x_U : x ∈ X} ⊆ A^U. We say X is a subshift of finite type (SFT) if there is some finite U ⊆ M such that X is entirely described by X_U, in the sense that X = {x ∈ A^M : x_{U+m} ∈ X_U for all m ∈ M}.
In particular, if M = ℤ, then a (two-sided) Markov subshift is an SFT X ⊆ A^ℤ determined by a set X_{{0,1}} ⊆ A^{{0,1}} of admissible transitions; equivalently, X is the set of all bi-infinite directed paths in a digraph whose vertices are the elements of A, with an edge a ⇝ b iff (a, b) ∈ X_{{0,1}}. If M = ℕ, then a one-sided Markov subshift is a subshift of A^ℕ defined in the same way.
If D ≥ 2, then an SFT in A^{ℤ^D} can be thought of as the set of admissible 'tilings' of ℝ^D by Wang tiles corresponding to the elements of X_U. (Wang tiles are unit squares (or (hyper)cubes) with various 'notches' cut into their edges (or (hyper)faces) so that they can only be juxtaposed in certain ways.)
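The left- and right-permutativity conditions defined above are finite checks on the local rule: for each fixed choice of the other neighborhood entries, the induced single-symbol map must be a bijection of A. A minimal sketch (our own function names; the two sample rules are illustrative, not from the article):

```python
from itertools import product

def is_left_permutative(phi, alphabet, width):
    """phi takes a tuple of `width` symbols (neighborhood, leftmost first).
    Left-permutativity: for every fixed right part b, the map
    a -> phi((a,) + b) must be a bijection of the alphabet."""
    for b in product(alphabet, repeat=width - 1):
        images = {phi((a,) + b) for a in alphabet}
        if len(images) != len(alphabet):
            return False
    return True

# additive rule: permutative in every coordinate, hence left-permutative
xor3 = lambda w: sum(w) % 2
# majority rule: not permutative (fixing b = (0,0) collapses both inputs to 0)
majority = lambda w: int(sum(w) >= 2)
```

The right-permutative check is symmetric: fix the left part and vary the last symbol.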
A subshift X ⊆ A^{ℤ^D} is strongly irreducible (or topologically mixing) if there is some R ∈ ℕ such that, for any disjoint finite subsets V, U ⊆ ℤ^D separated by a distance of at least R, and for any u ∈ X_U and v ∈ X_V, there is some x ∈ X such that x_U = u and x_V = v.
Measures For any finite subset U ⊆ M, and any b ∈ A^U, let ⟨b⟩ := {a ∈ A^M : a_U = b} be the cylinder set determined by b. Let 𝔅 be the sigma-algebra on A^M generated by all cylinder sets. A (probability) measure μ on A^M is a countably additive function μ : 𝔅 → [0, 1] such that μ[A^M] = 1. A measure on A^M is entirely determined by its values on cylinder sets. We will be mainly concerned with the following classes of measures:
Bernoulli measure Let β₀ be a probability measure on A. The Bernoulli measure induced by β₀ is the measure β on A^M such that, for any finite subset U ⊆ M, and any a ∈ A^U, β[⟨a⟩] = ∏_{u∈U} β₀(a_u).
Invariant measure Let μ be a measure on A^M, and let Φ : A^M → A^M be a cellular automaton. The measure Φμ is defined by Φμ(B) = μ(Φ^{−1}(B)), for any B ∈ 𝔅. We say that μ is Φ-invariant (or that Φ is μ-preserving) if Φμ = μ.
Uniform measure Let A := |A|. The uniform measure η on A^M is the Bernoulli measure such that, for any finite subset U ⊆ M, and any b ∈ A^U, if U := |U|, then η[⟨b⟩] = 1/A^U.
The support of a measure μ is the smallest closed subset X ⊆ A^M such that μ[X] = 1; we denote this by supp(μ). We say μ has full support if supp(μ) = A^M; equivalently, μ[C] > 0 for every cylinder subset C ⊆ A^M.
Notation Let CA(A^M) denote the set of all cellular automata on A^M. If X ⊆ A^M, then let CA(X) be the subset of all Φ ∈ CA(A^M) such that Φ(X) ⊆ X. Let Meas(A^M) be the set of all probability measures on A^M, and let Meas(A^M, Φ) be the subset of Φ-invariant measures. If X ⊆ A^M, then let Meas(X) be the set of probability measures μ with supp(μ) ⊆ X, and define Meas(X, Φ) in the obvious way.
Font conventions Upper-case calligraphic letters (A, B, C, ...) denote finite alphabets or groups. Upper-case bold letters (A, B, C, ...) denote subsets of A^M (e.g. subshifts), lower-case bold letters (a, b, c, ...) denote elements of A^M, and Roman letters (a, b, c, ...) are elements of A or ordinary numbers. Lower-case sans-serif letters (..., m, n, p) are elements of M; upper-case hollow letters (U, V, W, ...) are subsets of M.
Upper-case Greek letters (Φ, Ψ, ...) are functions on A^M (e.g. CA, block maps), and lower-case Greek letters (φ, μ, ...) are other functions (e.g. local rules, measures). Marked titles (e.g. ▸Topological Dynamics of Cellular Automata) indicate cross-references to related entries in the Encyclopedia; these are listed at the end of this article.


Definition of the Subject

Loosely speaking, a cellular automaton (CA) is the 'discrete' analogue of a partial differential evolution equation: it is a spatially distributed, discrete-time, symbolic dynamical system governed by a local interaction rule which is invariant in space and time. In a CA, 'space' is discrete (usually the D-dimensional lattice, ℤ^D) and the local state space at each point in space is also discrete (a finite 'alphabet', usually denoted by A).
A measure-preserving dynamical system (MPDS) is a dynamical system equipped with an invariant probability measure. Any MPDS can be represented as a stationary stochastic process (SSP) and vice versa; 'chaos' in the MPDS can be quantified via the information-theoretic 'entropy' of the corresponding SSP. An MPDS Φ on a state space X also defines a unitary linear operator Φ* on the Hilbert space L²(X); the spectral properties of Φ* encode information about the global periodic structure and long-term informational asymptotics of Φ. Ergodic theory is the study of MPDSs and SSPs, and lies at the interface between dynamics, probability theory, information theory, and unitary operator theory. Please refer to the Glossary for precise definitions of 'CA', 'MPDS', etc.

Introduction

The study of CA as symbolic dynamical systems began with Hedlund [58], and the study of CA as MPDSs began with Coven and Paul [24] and Willson [144]. (Further historical details will unfold below, where appropriate.) The ergodic theory of CA is important for several reasons:
• CA are topological dynamical systems (▸Topological Dynamics of Cellular Automata; ▸Chaotic Behavior of Cellular Automata). We can gain insight into the topological dynamics of a CA by identifying its invariant measures, and then studying the corresponding measurable dynamics.
• CA are often proposed as stylized models of spatially distributed systems in statistical physics, for example as microscale models of hydrodynamics, or of atomic lattices (▸Cellular Automata Modeling of Physical Systems). In this context, the distinct invariant measures of a CA correspond to distinct 'phases' of the physical system (▸Phase Transitions in Cellular Automata).
• CA can also act as information-processing systems (▸Cellular Automata, Universality of; ▸Cellular Automata as Models of Parallel Computation). Ergodic theory studies the 'informational' aspect of dynamical systems, so it is particularly suited to explicitly 'informational' dynamical systems like CA.

Article Roadmap
In Sect. "Invariant Measures for CA", we characterize the invariant measures for various classes of CA. Then, in Sect. "Limit Measures and Other Asymptotics", we investigate which measures are 'generic' in the sense that they arise as the attractors for some large class of initial conditions. In Sect. "Measurable Dynamics" we study the mixing and spectral properties of CA as measure-preserving dynamical systems. Finally, in Sect. "Entropy", we look at entropy. These sections are logically independent, and can be read in any order.

Invariant Measures for CA

The Uniform Measure vs. Surjective Cellular Automata
The uniform measure η plays a central role in the ergodic theory of cellular automata, because of the following result.

Theorem 1 Let M = ℤ^D × ℕ^E, let Φ ∈ CA(A^M) and let η be the uniform measure on A^M. Then (Φ preserves η) ⇔ (Φ is surjective).

Proof sketch "⟹" If Φ preserves η, then Φ must map supp(η) onto itself. But supp(η) = A^M; hence Φ is surjective.
"⟸" The case D = 1 follows from a result of W.A. Blankenship and Oscar S. Rothaus, which first appeared in Theorem 5.4 in [58]. The Blankenship–Rothaus Theorem states that, if Φ ∈ CA(A^ℤ) is surjective and has neighborhood [−ℓ...r], then for any k ∈ ℕ and any a ∈ A^k, the Φ-preimage of the cylinder set ⟨a⟩ is a disjoint union of exactly A^{r+ℓ} cylinder sets of length k + r + ℓ; it follows that η[Φ^{−1}⟨a⟩] = A^{r+ℓ}/A^{k+r+ℓ} = A^{−k} = η⟨a⟩. This result was later reproved by Kleveland (see Theorem 5.1 in [74]). The special case A = {0, 1} also appeared in Theorem 2.4 in [131]. The case D ≥ 2 follows from the multidimensional version of the Blankenship–Rothaus Theorem, which was proved by Maruoka and Kimura (see Theorem 2 in [93]) (their proof assumes that D = 2 and that Φ has a 'quiescent' state, but neither hypothesis is essential). Alternately, "⟸" follows from recent, more general results of Meester, Burton, and Steif; see Example 9 below. □

Example 2 Let M = ℤ or ℕ and consider CA on A^M.
(a) Say that Φ is bounded-to-one if there is some B ∈ ℕ such that every a ∈ A^M has at most B preimages. Then (Φ is bounded-to-one) ⇔ (Φ is surjective).
(b) Any posexpansive CA on A^M is surjective (see Subsect. "Posexpansive and Permutative CA" below).
(c) Any left- or right-permutative CA on A^ℤ (or right-permutative CA on A^ℕ) is surjective. This includes, for example, most linear CA.
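The 'balance' behind the Blankenship–Rothaus argument in the proof of Theorem 1 (every output word of length k has exactly A^{r+ℓ} preimage words of length k + r + ℓ when Φ is surjective) can be checked by brute force on short words. A sketch with two illustrative rules of ours (the additive rule is surjective; the majority rule is not, and its preimage counts are unbalanced):

```python
from itertools import product

def preimage_counts(phi, alphabet, width, k):
    """For each output word of length k, count the input words of length
    k + width - 1 that phi maps onto it (sliding the neighborhood)."""
    counts = {w: 0 for w in product(alphabet, repeat=k)}
    for u in product(alphabet, repeat=k + width - 1):
        image = tuple(phi(u[i:i + width]) for i in range(k))
        counts[image] += 1
    return counts

xor2 = lambda w: sum(w) % 2             # surjective (permutative)
majority3 = lambda w: int(sum(w) >= 2)  # not surjective

balanced = preimage_counts(xor2, (0, 1), 2, 4)
# every length-4 word has exactly 2 = A^{r+l} preimage words
```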


Hence, in any of these cases, Φ preserves the uniform measure.

Proof For (a), see Theorem 5.9 in [58], or Corollary 8.1.20, p. 271 in [81]. For (b), see Proposition 2.2 in [9], in the case A^ℕ; their argument also works for A^ℤ. Part (c) follows from (b) because any permutative CA is posexpansive (Proposition 11 below). There is also a simple direct proof for a right-permutative CA on A^ℕ: using right-permutativity, you can systematically construct a preimage of any desired image sequence, one entry at a time. See Theorem 6.6 in [58] for the proof in A^ℤ. □

The surjectivity of a one-dimensional CA can be determined in finite time using certain combinatorial tests (▸Topological Dynamics of Cellular Automata). However, for D ≥ 2, it is formally undecidable whether an arbitrary CA on A^{ℤ^D} is surjective (▸Tiling Problem and Undecidability in Cellular Automata). This problem is sometimes referred to as the Garden of Eden problem, because an element of A^{ℤ^D} with no Φ-preimage is called a Garden of Eden (GOE) configuration for Φ (because it could only ever occur at the 'beginning of time'). However, it is known that a CA is surjective if it is 'almost injective' in a certain sense, which we now specify.

Let (M, +) be any monoid, and let Φ ∈ CA(A^M) have neighborhood H ⊆ M. If B ⊆ M is any subset, then we define

B̄ := B + H = {b + h : b ∈ B, h ∈ H} , and ∂B := B̄ ∩ B^∁ .

If B is finite, then so is B̄ (because H is finite). If Φ has local rule φ : A^H → A, then φ induces a function Φ_B̄ : A^B̄ → A^B in the obvious fashion. A B-bubble (or B-diamond) is a pair b, b′ ∈ A^B̄ such that:

b ≠ b′ , b_∂B = b′_∂B , and Φ_B̄(b) = Φ_B̄(b′) .

Suppose a, a′ ∈ A^M are two configurations such that

a_B̄ = b , a′_B̄ = b′ , and a_{B̄^∁} = a′_{B̄^∁} .

Then it is easy to verify that Φ(a) = Φ(a′). We say that a and a′ form a mutually erasable pair (because Φ 'erases' the difference between a and a′). Figure 1 is a schematic representation of this structure in the case D = 1 (hence the term 'diamond'). If D = 2, then a and a′ are like two membranes which are glued together everywhere except for a B-shaped 'bubble'. We say that Φ is pre-injective if any (and thus, all) of the following three conditions hold:
• Φ admits no bubbles.
• Φ admits no mutually erasable pairs.

Ergodic Theory of Cellular Automata, Figure 1: A 'diamond' in A^ℤ

• For any c ∈ A^M, if a, a′ ∈ Φ^{−1}{c} are distinct, then a and a′ must differ in infinitely many locations.

For example, any injective CA is pre-injective (because a mutually erasable pair for Φ gives two distinct Φ-preimages for some point). More to the point, however, if B is finite, and Φ admits a B-bubble (b, b′), then we can embed N disjoint copies of B into M, and thus, by making various choices between b and b′ on different translates, we obtain a configuration with 2^N distinct Φ-preimages (where N is arbitrarily large). But if some configurations in A^M have such a large number of preimages, then other configurations in A^M must have very few preimages, or even none. This leads to the following result:

Theorem 3 (Garden of Eden) Let M be a finitely generated amenable group (e.g. M = ℤ^D). Let Φ ∈ CA(A^M).
(a) Φ is surjective if and only if Φ is pre-injective.
(b) Let X ⊆ A^M be a strongly irreducible SFT such that Φ(X) ⊆ X. Then Φ(X) = X if and only if Φ|_X is pre-injective.

Proof (a) The case M = ℤ² was originally proved by Moore [103] and Myhill [104]; see ▸Cellular Automata and Groups. The case M = ℤ was implicit in Hedlund (Lemma 5.11, and Theorems 5.9 and 5.12 in [58]). The case when M is a finite-dimensional group was proved by Machì and Mignosi [92]. Finally, the general case was proved by Ceccherini-Silberstein, Machì, and Scarabotti (see Theorem 3 in [20]); see ▸Cellular Automata and Groups.
(b) The case M = ℤ is Corollary 8.1.20 in [81] (actually this holds for any sofic subshift); see also Fiorenzi [38]. The general case is Corollary 4.8 in [39]. □

Corollary 4 (Incompressibility) Suppose M is a finitely generated amenable group and Φ ∈ CA(A^M). If Φ is injective, then Φ is surjective.

Remark 5 (a) A cellular network is a CA-like system defined on an infinite, locally finite digraph, with different local rules at different nodes. By assuming a kind of 'amenability' for this digraph, and then imposing some weak global statistical symmetry conditions on the local rules, Gromov (see Theorem 8.F' in [51]) has generalized


the GOE Theorem 3 to a large class of such cellular networks (which he calls 'endomorphisms of symbolic algebraic varieties'). See also [19].
(b) In the terminology suggested by Gottschalk [46], the Incompressibility Corollary 4 says that the group M is surjunctive; Gottschalk claims that 'surjunctivity' was first proved for all residually finite groups by Lawton (unpublished); see ▸Cellular Automata and Groups. For a recent direct proof (not using the GOE theorem), see Weiss (see Theorem 1.6 in [143]). Weiss also defines sofic groups (a class containing both residually finite groups and amenable groups) and shows that Corollary 4 holds whenever M is a sofic group (see Theorem 3.2 in [143]); see also ▸Cellular Automata and Groups.
(c) If X ⊆ A^M is an SFT such that Φ(X) ⊆ X, then Corollary 4 holds as long as X is 'semi-strongly irreducible'; see Fiorenzi (see Corollary 4.10 in [40]).

Invariance of Maxentropy Measures
If X ⊆ A^{ℤ^D} is any subshift with topological entropy h_top(X, σ), and μ ∈ Meas(X, σ) has measurable entropy h(μ, σ), then in general, h(μ, σ) ≤ h_top(X, σ); we say μ is a measure of maximal entropy (or maxentropy measure) if h(μ, σ) = h_top(X, σ). (See Example 75(a) for definitions.) Every subshift admits one or more maxentropy measures. If D = 1 and X ⊆ A^ℤ is an irreducible subshift of finite type (SFT), then Parry (see Theorem 10 in [110]) showed that X admits a unique maxentropy measure μ_X (now called the Parry measure); see Theorem 8.10, p. 194 in [142] or Sect. 13.3, pp. 443–444 in [81]. Theorem 1 is then a special case of the following result:

Theorem 6 (Coven, Paul, Meester and Steif) Let X ⊆ A^{ℤ^D} be an SFT having a unique maxentropy measure μ_X, and let Φ ∈ CA(X). Then Φ preserves μ_X if and only if Φ(X) = X.

Proof The case D = 1 is Corollary 2.3 in [24]. The case D ≥ 2 follows from Theorem 2.5(iii) in [95], which states: if X and Y are SFTs, and Φ : X → Y is a factor mapping, and μ is a maxentropy measure on X, then Φ(μ) is a maxentropy measure on Y. □

For example, if X ⊆ A^ℤ is an irreducible SFT and μ_X is its Parry measure, and Φ(X) = X, then Theorem 6 says Φ(μ_X) = μ_X, as observed by Coven and Paul (see Theorem 5.1 in [24]).
Unfortunately, higher-dimensional SFTs do not, in general, have unique maxentropy measures. Burton and Steif [14] provided a plethora of examples of such nonuniqueness, but they also gave a sufficient condition for uniqueness of the maxentropy measure, which we now explain.

Let X ⊆ A^{ℤ^D} be an SFT and let U ⊆ ℤ^D. For any x ∈ X, let x_U := [x_u]_{u∈U} be its 'projection' to A^U, and let X_U := {x_U : x ∈ X} ⊆ A^U. Let V := U^∁ ⊆ ℤ^D. For any u ∈ A^U and v ∈ A^V, let [u v] denote the element of A^{ℤ^D} such that [u v]_U = u and [u v]_V = v. Let

X(u) := {v ∈ A^V : [u v] ∈ X}

be the set of all "X-admissible completions" of u (thus, X(u) ≠ ∅ ⇔ u ∈ X_U). If μ ∈ Meas(A^{ℤ^D}), and u ∈ A^U, then let μ^(u) denote the conditional measure on A^V induced by u. If U is finite, then μ^(u) is just the restriction of μ to the cylinder set ⟨u⟩. If U is infinite, then the precise definition of μ^(u) involves a 'disintegration' of μ into 'fibre measures' (we will suppress the details). Let μ_U be the projection of μ onto A^U. If supp(μ) ⊆ X, then supp(μ_U) ⊆ X_U, and for any u ∈ A^U, supp(μ^(u)) ⊆ X(u).
We say that μ is a Burton–Steif measure on X if: (1) supp(μ) = X; and (2) for any U ⊆ ℤ^D whose complement U^∁ is finite, and for μ_U-almost any u ∈ X_U, the measure μ^(u) is uniformly distributed on the (finite) set X(u).
For example, if X = A^{ℤ^D}, then the only Burton–Steif measure is the uniform Bernoulli measure. If X ⊆ A^ℤ is an irreducible SFT, then the only Burton–Steif measure is the Parry measure. If r > 0 and B := [−r...r]^D ⊆ ℤ^D, and X is an SFT determined by a set of admissible words X_B ⊆ A^B, then it is easy to check that any Burton–Steif measure μ on X must be a Markov random field with interaction range r.

Theorem 7 (Burton and Steif) Let X ⊆ A^{ℤ^D} be a subshift of finite type.
(a) Any maxentropy measure on X is a Burton–Steif measure.
(b) If X is strongly irreducible, then any Burton–Steif measure on X is a maxentropy measure for X.

Proof (a) and (b) are Propositions 1.20 and 1.21 of [15], respectively. For a proof in the case when X is a symmetric nearest-neighbor subshift of finite type, see Propositions 1.19 and 4.1 of [14], respectively. □

Any subshift admits at least one maxentropy measure, so any SFT admits at least one Burton–Steif measure. Theorems 6 and 7 together imply:

Corollary 8 If X ⊆ A^{ℤ^D} is an SFT which admits a unique Burton–Steif measure μ_X, then μ_X is the unique maxentropy measure for X. Thus, if Φ ∈ CA(A^M) and Φ(X) = X, then Φ(μ_X) = μ_X.
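The Parry measure of an irreducible one-dimensional SFT is computable from the Perron eigendata of its transition matrix: if λ is the Perron eigenvalue and r the right Perron eigenvector, the maxentropy Markov measure has transition probabilities P_ij = A_ij r_j / (λ r_i). A minimal pure-Python sketch (our own function names), using the golden-mean shift (no '11') as the example:

```python
def perron(matrix, iters=200):
    """Approximate the Perron eigenvalue and eigenvector of a
    nonnegative irreducible matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)             # converges to the Perron eigenvalue
        v = [x / lam for x in w]
    return lam, v

# Golden-mean shift: transitions over {0,1} with '11' forbidden.
A = [[1, 1], [1, 0]]
lam, r = perron(A)
# Parry (maxentropy) Markov transition matrix
P = [[A[i][j] * r[j] / (lam * r[i]) for j in range(2)] for i in range(2)]
```

Here λ is the golden ratio, and the topological entropy of the shift is log₂ λ.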


Example 9 If X = A^{ℤ^D}, then we get Theorem 1, because the unique Burton–Steif measure on A^{ℤ^D} is the uniform Bernoulli measure.

Remark If X ⊆ A^M is a subshift admitting a unique maxentropy measure μ, and supp(μ) = X, then Weiss (see Theorem 4.2 in [143]) has observed that X automatically satisfies the Incompressibility Corollary 4. In particular, this applies to any SFT having a unique Burton–Steif measure.

Periodic Invariant Measures
If P ∈ ℕ, then a sequence a ∈ A^ℤ is P-periodic if σ^P(a) = a. If A := |A|, then there are exactly A^P such sequences, and a measure μ on A^ℤ is called P-periodic if μ is supported entirely on these P-periodic sequences. More generally, if M is any monoid and 𝒫 ⊆ M is any submonoid, then a configuration a ∈ A^M is 𝒫-periodic if σ^p(a) = a for all p ∈ 𝒫. (For example, if M = ℤ and 𝒫 := Pℤ, then the 𝒫-periodic configurations are the P-periodic sequences.) Let A^{M/𝒫} denote the set of 𝒫-periodic configurations. If P := |M/𝒫|, then |A^{M/𝒫}| = A^P. A measure μ is called 𝒫-periodic if supp(μ) ⊆ A^{M/𝒫}.

Proposition 10 Let Φ ∈ CA(A^M). If 𝒫 ⊆ M is any submonoid and |M/𝒫| is finite, then there exists a 𝒫-periodic, Φ-invariant measure.

Proof sketch If Φ ∈ CA(A^M), then Φ(A^{M/𝒫}) ⊆ A^{M/𝒫}. Thus, if μ is 𝒫-periodic, then Φ^t(μ) is 𝒫-periodic for all t ∈ ℕ. Thus, the Cesàro limit of the sequence {Φ^t(μ)}_{t=1}^∞ is 𝒫-periodic and Φ-invariant. This Cesàro limit exists because A^{M/𝒫} is finite. □

These periodic measures have finite (hence discrete) support, but by convex-combining them, it is easy to obtain (nonergodic) Φ-invariant measures with countable, dense support. When studying the invariant measures of CA, we usually regard these periodic measures (and their convex combinations) as somewhat trivial, and concentrate instead on invariant measures supported on aperiodic configurations.
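Proposition 10 is effective: on the finite set of P-periodic configurations, Φ acts as a map of a finite set, so every orbit eventually cycles, and the uniform measure on such a cycle is one Φ-invariant periodic measure (a discrete shortcut to the Cesàro limit in the proof sketch). A sketch for the additive XOR CA acting on one period of a P-periodic configuration (names are ours):

```python
def step_periodic(config):
    """One step of the additive CA a_i -> a_i + a_{i+1} (mod 2)
    on a P-periodic configuration, represented by one period."""
    P = len(config)
    return tuple((config[i] + config[(i + 1) % P]) % 2 for i in range(P))

def invariant_cycle_measure(config):
    """Iterate until the orbit cycles; the uniform measure on the cycle
    is Phi-invariant, since Phi permutes the cycle."""
    seen, orbit, c = {}, [], config
    while c not in seen:
        seen[c] = len(orbit)
        orbit.append(c)
        c = step_periodic(c)
    cycle = orbit[seen[c]:]
    p = 1.0 / len(cycle)
    return {x: p for x in cycle}

mu = invariant_cycle_measure((1, 0, 0, 1, 0))
```

Pushing `mu` forward through `step_periodic` returns `mu` exactly, which is the invariance asserted by Proposition 10.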

Posexpansive and Permutative CA

Let B ⊆ M be a finite subset, and let B := A^B. If Φ ∈ CA(A^M), then we define a continuous function Φ_B^ℕ : A^M → B^ℕ by

Φ_B^ℕ(a) := [a_B, Φ(a)_B, Φ²(a)_B, Φ³(a)_B, ...] ∈ B^ℕ . (2)

Clearly, Φ_B^ℕ ∘ Φ = σ ∘ Φ_B^ℕ. We say that Φ is B-posexpansive if Φ_B^ℕ is injective. Equivalently, for any a, a′ ∈ A^M, if a ≠ a′, then there is some t ∈ ℕ such that Φ^t(a)_B ≠ Φ^t(a′)_B. We say Φ is positively expansive (or posexpansive) if Φ is B-posexpansive for some finite B (it is easy to see that this is equivalent to the usual definition of positive expansiveness for a topological dynamical system). For more information see Sect. 8 of ▸Topological Dynamics of Cellular Automata.
Thus, if X := Φ_B^ℕ(A^M) ⊆ B^ℕ, then X is a compact, shift-invariant subset of B^ℕ, and Φ_B^ℕ : A^M → X is an isomorphism from the system (A^M, Φ) to the one-sided subshift (X, σ), which is sometimes called the canonical factor or column shift of Φ. The easiest examples of posexpansive CA are one-dimensional, permutative automata.

Proposition 11
(a) Suppose Φ ∈ CA(A^ℕ) has neighborhood [r...R], where 0 ≤ r < R. Let B := [0...R) and let B := A^B. Then (Φ is right-permutative) ⇔ (Φ is B-posexpansive, and Φ_B^ℕ(A^ℕ) = B^ℕ).
(b) Suppose Φ ∈ CA(A^ℤ) has neighborhood [−L...R], where −L < 0 < R. Let B := [−L...R), and let B := A^B. Then (Φ is bipermutative) ⇔ (Φ is B-posexpansive, and Φ_B^ℕ(A^ℤ) = B^ℕ).

Thus, one-sided, right-permutative CA and two-sided, bipermutative CA are both topologically conjugate to the one-sided full shift (B^ℕ, σ), where B is an alphabet with |A|^{R+L} symbols (setting L = 0 in the one-sided case).

Proof Suppose a ∈ A^M (where M = ℕ or ℤ). Draw a picture of the spacetime diagram for Φ. For any t ∈ ℕ, and any b ∈ B^{[0...t)}, observe how (bi)permutativity allows you to reconstruct a unique a_{[−tL...tR)} ∈ A^{[−tL...tR)} such that b = (a_B, Φ(a)_B, Φ²(a)_B, ..., Φ^{t−1}(a)_B). By letting t → ∞, we see that the function Φ_B^ℕ is a bijection between A^M and B^ℕ. □

Remark 12
(a) The idea of Proposition 11 is implicit in Theorem 6.7 in [58], but it was apparently first stated explicitly by Shereshevsky and Afraĭmovich (see Theorem 1 in [130]). It was later rediscovered by Kleveland (see Corollary 7.3 in [74]) and Fagnani and Margara (see Theorem 3.2 in [35]).
(b) Proposition 11(b) has been generalized to higher dimensions by Allouche and Skordev (see Proposition 1 in [3]), who showed that, if Φ ∈ CA(A^{ℤ^D}) is permutative in the 'corner' entries of its neighborhood, then Φ is conjugate to a full shift (K^ℕ, σ), where K is an uncountable, compact space.

Proposition 11 is quite indicative of the general case. Posexpansiveness occurs only in one-dimensional CA, in
which it takes a very specific form. To explain this, suppose (M, ·) is a group with finite generating set G ⊆ M. For any r > 0, let B(r) := {g₁·g₂ ⋯ g_r : g₁, ..., g_r ∈ G}. The dimension (or growth degree) of (M, ·) is defined dim(M, ·) := lim sup_{r→∞} log|B(r)| / log(r); see [50] or [49]. It can be shown that this number is independent of the choice of generating set G, and is always an integer. For example, dim(ℤ^D, +) = D. If X ⊆ A^M is a subshift, then we define its topological entropy h_top(X) with respect to dim(M) in the obvious fashion (see Example 75(a)).

Theorem 13 Let Φ ∈ CA(A^M).
(a) If M = ℤ^D × ℕ^E with D + E ≥ 2, then Φ cannot be posexpansive.
(b) If M is any group with dim(M) ≥ 2, and X ⊆ A^M is any subshift with h_top(X) > 0, and Φ(X) ⊆ X, then the system (X, Φ) cannot be posexpansive.
(c) Suppose M = ℤ or ℕ, and Φ has neighborhood [L...R] ⊆ M. Let L̄ := max{0, −L}, R̄ := max{0, R} and B := [−L̄...R̄). If Φ is posexpansive, then Φ is B-posexpansive.

Proof (a) is Corollary 2 in [127]; see also Theorem 4.4 in [37]. Part (b) follows by applying Theorem 1.1 in [128] to the natural extension of (X, Φ). (c) The case M = ℤ is Proposition 7 in [75]. The case M = ℕ is Proposition 2.3 in [9]. □

Proposition 11 says bipermutative CA on A^ℤ are conjugate to full shifts. Using his formidable theory of textile systems, Nasu extended this to all posexpansive CA on A^ℤ.

Theorem 14 (Nasu) Let Φ ∈ CA(A^ℤ) and let B ⊆ ℤ. If Φ is B-posexpansive, then Φ_B^ℕ(A^ℤ) ⊆ B^ℕ is a one-sided SFT which is conjugate to a one-sided full shift C^ℕ for some alphabet C with |C| ≥ 3.

Proof sketch The fact that X := Φ_B^ℕ(A^ℤ) is an SFT follows from Theorem 10 in [75] or Theorem 10.1 in [76]. Next, Theorem 3.12(1) on p. 49 of [105] asserts that, if Ψ is any surjective endomorphism of an irreducible, aperiodic SFT Y ⊆ A^ℤ, and (Y, Ψ) is itself conjugate to an SFT, then (Y, Ψ) is actually conjugate to a full shift (C^ℕ, σ) for some alphabet C with |C| ≥ 3. Let Y := A^ℤ and invoke Kůrka's result. For a direct proof not involving textile systems, see Theorem 4.9 in [86]. □

Remark 15
(a) See Theorem 60(d) for an 'ergodic' version of Theorem 14.
(b) In contrast to Proposition 11, Nasu's Theorem 14 does not say that Φ_B^ℕ(A^ℤ) itself is a full shift, only that it is conjugate to one.
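The one-symbol-at-a-time reconstruction in the proof sketch of Proposition 11 can be made concrete for the right-permutative XOR CA on A^ℕ (neighborhood [0...1], so B = {0}): the column b_t = Φ^t(a)_0 satisfies b_t = Σ_k C(t,k) a_k (mod 2), which is triangular in a_t, so the column trace determines the configuration. A sketch with hypothetical helper names of ours:

```python
from math import comb

def column_trace(prefix):
    """b_t = Phi^t(a)_0 for the XOR CA Phi(a)_i = a_i + a_{i+1} (mod 2);
    a prefix of length T determines the first T column entries."""
    trace, row = [], list(prefix)
    while row:
        trace.append(row[0])
        row = [(row[i] + row[i + 1]) % 2 for i in range(len(row) - 1)]
    return trace

def reconstruct(trace):
    """Right-permutativity makes b_t = sum_k C(t,k) a_k (mod 2)
    triangular with unit diagonal, so a_t is recovered step by step."""
    a = []
    for t, b in enumerate(trace):
        partial = sum(comb(t, k) * a[k] for k in range(t)) % 2
        a.append((b - partial) % 2)
    return a

prefix = [1, 0, 1, 1, 0, 0, 1]
```

Since `reconstruct(column_trace(prefix)) == prefix`, the column map Φ_B^ℕ is injective on this rule, which is exactly B-posexpansiveness.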

If (X, μ, Ψ) is a measure-preserving dynamical system (MPDS) with sigma-algebra 𝔛, then a one-sided generator is a finite partition P ⊆ 𝔛 such that ⋁_{t=0}^∞ Ψ^{−t}P ⊇ 𝔛. If P has C elements, and C is a finite set with |C| = C, then P induces an essentially injective function p : X → C^ℕ such that p ∘ Ψ = σ ∘ p. Thus, if ν := p(μ), then (X, μ, Ψ) is measurably isomorphic to the (one-sided) stationary stochastic process (C^ℕ, ν, σ). If Ψ is invertible, then a (two-sided) generator is a finite partition P ⊆ 𝔛 such that ⋁_{t=−∞}^∞ Ψ^{−t}P ⊇ 𝔛. The Krieger Generator Theorem says every finite-entropy, invertible MPDS has a generator; indeed, if h(μ, Ψ) ≤ log₂(C), then (X, μ, Ψ) has a generator with C or fewer elements. If |C| = C, then once again, P induces a measurable isomorphism from (X, μ, Ψ) to a two-sided stationary stochastic process (C^ℤ, ν, σ), for some stationary measure ν on C^ℤ.

Corollary 16 (Universal Representation) Let M = ℕ or ℤ, and let Φ ∈ CA(A^M) have neighborhood H ⊆ M. Suppose that either
• M = ℕ, Φ is right-permutative, and H = [r...R] for some 0 ≤ r < R, and then let C := |A|^R; or
• M = ℤ, Φ is bipermutative, and H = [−L...R], and then let C := |A|^{L̄+R̄}, where L̄ := max{0, −L} and R̄ := max{0, R}; or
• M = ℤ and Φ is positively expansive, and h_top(A^M, Φ) = log₂(C) for some C ∈ ℕ.
(a) Let (X, μ, Ψ) be any MPDS with a one-sided generator having at most C elements. Then there exists ν ∈ Meas(A^M, Φ) such that the system (A^M, ν, Φ) is measurably isomorphic to (X, μ, Ψ).
(b) Let (X, μ, Ψ) be an invertible MPDS, with measurable entropy h(μ, Ψ) ≤ log₂(C). Then there exists ν ∈ Meas(A^M, Φ) such that the natural extension of the system (A^M, ν, Φ) is measurably isomorphic to (X, μ, Ψ).

Proof Under each of the three hypotheses, Proposition 11 or Theorem 14 yields a topological conjugacy Θ : (C^ℕ, σ) → (A^M, Φ), where C is a set of cardinality C.
(a) As discussed above, there is a measure λ on C^ℕ such that (C^ℕ, λ, σ) is measurably isomorphic to (X, μ, Ψ). Thus, ν := Θ[λ] is a Φ-invariant measure on A^M, and (A^M, ν, Φ) is isomorphic to (C^ℕ, λ, σ) via Θ.
(b) As discussed above, there is a measure λ on C^ℤ such that (C^ℤ, λ, σ) is measurably isomorphic to (X, μ, Ψ). Let λ̄ be the projection of λ to C^ℕ; then (C^ℕ, λ̄, σ) is a one-sided stationary process. Thus, ν := Θ[λ̄] is a Φ-invariant measure on A^M, and (A^M, ν, Φ) is isomorphic to (C^ℕ, λ̄, σ) via Θ. Thus, the natural extension of (A^M, ν, Φ) is isomorphic to the natural extension

of (C^ℕ, λ̄, σ), which is (C^ℤ, λ, σ), which is in turn isomorphic to (X, μ, Ψ). □

Remark 17 The Universal Representation Corollary implies that studying the measurable dynamics of the CA Φ with respect to some arbitrary Φ-invariant measure will generally tell us nothing whatsoever about Φ. For these measurable dynamics to be meaningful, we must pick a measure on A^M which is somehow 'natural' for Φ. First, this measure should be shift-invariant (because one of the defining properties of CA is that they commute with the shift). Second, we should seek a measure which has maximal Φ-entropy or is distinguished in some other way. (In general, the measures given by the Universal Representation Corollary will neither be σ-invariant, nor have maximal entropy for Φ.)

If Φ_ℕ ∈ CA(A^ℕ), and Φ_ℤ ∈ CA(A^ℤ) is the CA obtained by applying the same local rule to all coordinates in ℤ, then Φ_ℤ can never be posexpansive: if B = [−B...B], and a, a′ ∈ A^ℤ are any two sequences which agree on [−B...∞) but differ somewhere in (−∞...−B), then Φ^t(a)_B = Φ^t(a′)_B for all t ∈ ℕ, because the local rule of Φ only propagates information to the left. Thus, in particular, the posexpansive CA on A^ℤ are completely unrelated to the posexpansive CA on A^ℕ. Nevertheless, posexpansive CA on A^ℕ behave quite similarly to those on A^ℤ.

Theorem 18 Let Φ ∈ CA(A^ℕ) have neighborhood [r...R], where 0 ≤ r < R, and let B := [0...R). Suppose Φ is posexpansive. Then:
(a) X := Φ_B^ℕ(A^ℕ) ⊆ B^ℕ is a topologically mixing SFT.
(b) The topological entropy of Φ is log₂(k) for some k ∈ ℕ.
(c) If η is the uniform measure on A^ℕ, then Φ_B^ℕ(η) is the Parry measure on X. Thus, η is the maxentropy measure for Φ.

Proof See Corollary 3.7 and Theorems 3.8 and 3.9 in [9] or Theorem 4.8(1,2,4) in [86]. □

Remark
(a) See Theorem 58 for an 'ergodic' version of Theorem 18.
(b) The analog of Nasu's Theorem 14 (i.e. conjugacy to a full shift) is not true for posexpansive CA on A^ℕ. See [13] for a counterexample.
(c) If Φ : A^ℕ → A^ℕ is invertible, then we define the function Φ_B^ℤ : A^ℕ → B^ℤ by extending the definition of Φ_B^ℕ to negative times. We say that Φ is expansive if Φ_B^ℤ is bijective for some finite B ⊆ ℕ. Expansiveness is a much weaker condition than positive expansiveness. Nevertheless, the analog of Theorem 18(a) is true: if Φ : A^ℕ → A^ℕ is invertible and expansive, then (A^ℕ, Φ) is conjugate to a (two-sided) subshift of finite type; see Theorem 1.3 in [106].
(c) If ˚ : AN !AN is invertible, then we define the function ˚BZ : AN !BZ by extending the definition of ˚BN to negative times. We say that ˚ is expansive if ˚BZ is bijective for some finite B  N. Expansiveness is a much weaker condition than positive expansiveness. Nevertheless, the analog of Theorem 18(a) is true: if ˚ : AN !AN is invertible and expansive, then BZ is conjugate to a (two-sided) subshift of finite type; see Theorem 1.3 in [106].

Measure Rigidity in Algebraic CA Theorem 1 makes the uniform measure  a ‘natural’ invariant measure for a surjective CA ˚ . However, Proposition 10 and Corollary 16 indicate that there are many other (unnatural) ˚-invariant measures as well. Thus, it is natural to seek conditions under which the uniform measure  is the unique (or almost unique) measure which is ˚-invariant, shift-invariant, and perhaps ‘nondegenerate’ in some other sense – a phenomenon which is sometimes called measure rigidity. Measure rigidity has been best understood when ˚ is compatible with an underlying algebraic structure on AM . Let ? : AM  AM !AM be a binary operation (‘multiplication’) and let 1 : AM !AM be an unary operation (‘inversion’) such that (AM ; ?) is a group, and suppose both operations are continuous and commute with all M-shifts; then (AM ; ?) is called a group shift. For example, if (A; ) is itself a finite group, and AM is treated as a Cartesian product and endowed with componentwise multiplication, then (AM ; ) is a group shift. However, not all group shifts arise in this manner; see [70,71,72,73,124]. If (AM ; ?) is a group shift, then a subgroup shift is a closed, shift-invariant subgroup G  AM (i. e. G is both a subshift and a subgroup). If (G; ?) is a subgroup shift, then the Haar measure on G is the unique probability measure G on G which is invariant under translation by all elements of G. That is, if g 2 G, and U  G is any measurable subset, and U ? g :D fu ? g; u 2 Ug, then G [U ? g] D G [U]. In particular, if G D AM , then G is just the uniform Bernoulli measure on AM . The Haar measure is a maxentropy measure on G (see Subsect. “Invariance of Maxentropy Measures”). If (AM ; ?) is a group shift, and G AM is a subgroup shift, and ˚ 2 CA(AM ), then ˚ is called an endomorphic (or algebraic) CA on G if ˚(G) G and ˚ : G!G is an endomorphism of (G; ?) as a topological group. Let ECA(G; ?) denote the set of endomorphic CA on G. 
For example, suppose (A; C) is abelian, and let (G; ?) :D (AM ; C) with the product group structure; then the endomorphic CA on AM are exactly the linear CA. However, if (A; ) is a nonabelian group, then endomorphic CA on (AM ; ) are not the same as multiplicative CA. Even in this context, CA admit many nontrivial invariant measures. For example, it is easy to check the following: Proposition 19 Let AM be a group shift and let ˚ 2 ECA(AM ; ?). Let G AM be any ˚-invariant subgroup shift; then the Haar measure on G is ˚ -invariant.
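The invariance of the uniform measure under such maps can be probed concretely: for a bipermutative additive rule with neighborhood {0, 1}, every output word of length k has exactly |A| preimage words of length k+1, which is precisely why the uniform Bernoulli measure is preserved. A minimal finite check (the Ledrappier-style rule over ℤ/2 and the window size are our illustrative choices, not taken from the text):

```python
from itertools import product

def preimage_counts(local_rule, alphabet, k):
    """Count length-(k+1) words mapping onto each length-k word under a
    neighborhood-{0,1} local rule applied at every position."""
    counts = {}
    for w in product(alphabet, repeat=k + 1):
        image = tuple(local_rule(w[i], w[i + 1]) for i in range(k))
        counts[image] = counts.get(image, 0) + 1
    return counts

# Ledrappier-style additive rule on A = Z/2 (illustrative assumption):
counts = preimage_counts(lambda x, y: (x + y) % 2, (0, 1), 6)

# Every image word has exactly |A| = 2 preimages, so the uniform measure of
# each cylinder is preserved: mu(Phi^-1[b]) = 2 * 2^-(k+1) = 2^-k = mu[b].
assert set(counts.values()) == {2} and len(counts) == 2 ** 6
```

The count is constant because the rule is right-permutative: fixing the image word and the first letter of a preimage determines all remaining letters uniquely.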


For example, if (A; C) is any nonsimple abelian group, and (AM ; C) has the product group structure, then AM admits many nontrivial subgroup shifts; see [73]. If ˚ is any linear CA on AM with scalar coefficients, then every subgroup shift of AM is ˚-invariant, so Proposition 19 yields many nontrivial ˚-invariant measures. To isolate  as a unique measure, we must impose further restrictions. The first nontrivial results in this direction were by Host, Maass, and Martínez [61]. Let h(˚; ) be the entropy of ˚ relative to the measure  (see Sect. “Entropy” for definition). Proposition 20 Let A :D Z/p , where p is prime. Let ˚ 2 CA(AZ ) be a linear CA with neighborhood f0; 1g, and let  2 Meas(AZ ; ˚;  ). If  is -ergodic, and h(˚; ) > 0, then  is the Haar measure  on AZ . Proof See Theorem 12 in [61].



A similar idea is behind the next result, only with the roles of ˚ and  reversed. If  is a measure on AN , and b 2 A[1:::1) , then we define the conditional measure (b) on A by (b) (a) :D [x0 D ajx[1:::1) D b], where x is a -random sequence. For example, if  is a Bernoulli measure, then (b) (a) D [x0 D a], independent of b; if  is a Markov measure, then (b) (a) D [x0 D ajx1 D b1 ]. Proposition 21 Let (A; ) be any finite (possibly nonabelian) group, and let ˚ 2 CA(AN ) have multiplicative local rule  : Af0;1g !A defined by (a0 ; a1 ) :D a0  a1 . Let  2 Meas(AZ ; ˚;  ). If  is ˚-ergodic, then there is some subgroup C  A such that, for every b 2 A[1:::1] , supp((b) ) is a right coset of C , and (b) is uniformly distributed on this coset. Proof See Theorem 3.1 in [113].



Example 22 Let ˚ and  be as in Proposition 21. Let  be the Haar measure on AN (a)  has complete connections if supp((b) ) D A for -almost all b 2 A[1:::1) . Thus, if  has complete connections in Proposition 21, then  D . (b1) Suppose h(;  ) > h0 :D maxflog2 jC j ; C a proper subgroup of Ag. Then  D . (b2) In particular, suppose A D (Z/p ; C), where p is prime; then h0 D 0. Thus, if ˚ has local rule (a0 ; a1 ) :D a0 C a1 , and  is any -invariant, ˚ -ergodic measure with h(;  ) > 0, then  D . This is closely analogous to Proposition 20, but ‘dual’ to it, because the roles of ˚ and  are reversed in the ergodicity and entropy hypotheses. (c) If C  A is a subgroup, and  is the Haar measure on the subgroup shift C N  AN , then  satisfies the condi-

tions of Proposition 21. Other, less trivial possibilities also exist (see Examples 3.2(b,c) in [113]). If  is a measure on AZ , and X; Y  AZ , then we say X  essentially equals Y and write X  Y if [X Y] D 0. If n 2 N, then let n o In () :D X  AZ ;  n (X)  X be the sigma-algebra of subsets of AZ which are ‘essentially’  n -invariant. Thus,  is  -ergodic if and only if I1 () is trivial (i. e. contains only sets of measure zero or one). We say  is totally  -ergodic if I n () is trivial for all n 2 N. Let (AZ ; ) be any group shift. The identity element e of (AZ ; ) is a constant sequence. Thus, if ˚ 2 ECA(AZ ; ) is surjective, then ker(˚) :D fa 2 AZ ; ˚(a) D eg is a finite, shift-invariant subgroup of AZ (i. e. a finite collection of  -periodic sequences). Proposition 23 Let (AZ ; ) be a (possibly nonabelian) group shift, and let ˚ 2 ECA(AZ ; ) be bipermutative, with neighborhood f0; 1g. Let  2 Meas(AZ ; ˚;  ). Suppose that: (IE)  is totally ergodic for  ; (H) h(˚; ) > 0; and (K) ker(˚) contains no nontrivial  -invariant subgroups. Then  is the Haar measure on AZ . Proof See Theorem 5.2 in [113].



Example 24 If A := ℤ/p and (A^ℤ, +) is the product group, then Φ is a linear CA and condition (K) is automatically satisfied, so Proposition 23 becomes a special case of Proposition 20.

If Φ ∈ ECA(A^ℤ, ·), then we have an increasing sequence of finite, shift-invariant subgroups ker(Φ) ⊆ ker(Φ²) ⊆ ker(Φ³) ⊆ ⋯. If K(Φ) := ⋃_{n=1}^∞ ker(Φⁿ), then K(Φ) is a countable, shift-invariant subgroup of (A^ℤ, ·).

Theorem 25 Let (A^ℤ, +) be an abelian group shift, and let G ⊆ A^ℤ be a subgroup shift. Let Φ ∈ ECA(G, +) be bipermutative, and let μ ∈ Meas(G; Φ, σ). Suppose:
(I) I_{kP}(μ) = I_1(μ), where P is the lowest common multiple of the σ-periods of all elements in ker(Φ), and k ∈ ℕ is any common multiple of all prime factors of |A|;
(H) h(Φ, μ) > 0.
Furthermore, suppose that either:
(E1) μ is ergodic for the ℕ × ℤ action (Φ, σ);
(K1) every infinite, σ-invariant subgroup of K(Φ) ∩ G is dense in G;
or:
(E2) μ is σ-ergodic;


(K2) Every infinite, (˚;  )-invariant subgroup of K(˚) \ G is dense in G. Then  is the Haar measure on G. Proof See Theorems 3.3 and 3.4 of [122], or Théorèmes V.4 and V.5 on p. 115 of [120]. In the special case when G is irreducible and has topological entropy log2 (p) (where p is prime), Sobottka has given a different and simpler proof, by using his theory of ‘quasigroup shifts’ to establish an isomorphism between ˚ and a linear CA on Z/p , and then invoking Theorem 7. See Theorems 7.1 and 7.2 of [136], or Teoremas IV.3.1 and IV.3.2 on pp. 100–101 of [134].  Example 26 (a) Let A :D Z/p , where p is prime. Let ˚ 2 CA(AZ ) be linear, and suppose that  2 Meas(AZ ; ˚;  ) is (˚;  )-ergodic, h(˚; ) > 0, and I p(p1) () D I1 (). Setting k D p and P D p  1 in Theorem 25, we conclude that  is the Haar measure on AZ . For ˚ with neighborhood f0; 1g, this result first appeared as Theorem 13 in [61]. (b) If (AZ ; ) is abelian, then Proposition 23 is a special case of Theorem 25 [hypothesis (IE) of the former implies hypotheses (I) and (E2) of the latter, while (K) implies (K2)]. Note, however, that Proposition 23 also applies to nonabelian groups. An algebraic Z D -action is an action of ZD by automorphisms on a compact abelian group G. For example, if D G AZ is an abelian subgroup shift, then  is an algebraic ZD -action. The invariant measures of algebraic ZD -actions have been studied in Schmidt (see Sect. 29 in [124]), Silberger (see Sect. 7 in [132]), and Einsiedler [28,29]. If ˚ 2 CA(G), then a complete history for ˚ is a sequence (g t ) t2Z 2 GZ such that ˚(g t ) D g tC1 for all t 2 D DC1 be the set of Z. Let ˚ Z (G)  GZ (AZ )Z Š AZ Z all complete histories for ˚; then ˚ (G) is a subshift of DC1 AZ . If ˚ 2 ECA[G], then ˚ Z (G) is itself an abelian subgroup shift, and the shift action of Z DC1 on ˚ Z (G) is thus an algebraic Z DC1 -action. Any (˚;  )-invariant measure on G extends in the obvious way to a -invariant measure on ˚ Z (G). 
Thus, any result about the invariant measures (or rigidity) of algebraic ℤ^{D+1}-actions can be translated immediately into a result about the invariant measures (or rigidity) of endomorphic cellular automata.

Proposition 27 Let G ⊆ A^{ℤ^D} be an abelian subgroup shift and let Φ ∈ ECA(G). Suppose μ ∈ Meas(G; Φ, σ) is (Φ, σ)-totally ergodic, and has entropy dimension d ∈ [1…D] (see Subsect. “Entropy Geometry and Expansive Subdynamics”). If the system (G, μ; Φ, σ) admits no factors whose d-dimensional measurable entropy is zero, then there is a Φ-invariant subgroup shift G₀ ⊆ G and some

element x ∈ G such that μ is the translated Haar measure on the 'affine' subset G₀ + x.

Proof This follows from Corollary 2.3 in [29].



If we remove the requirement of 'no zero-entropy factors', and instead require G and Φ to satisfy certain technical algebraic conditions, then μ must be the Haar measure on G (see Theorem 1.2 in [29]). These strong hypotheses are probably necessary, because in general, the system (G, σ, Φ) admits uncountably many distinct nontrivial invariant measures, even if (G, σ, Φ) is irreducible, meaning that G contains no proper, infinite, Φ-invariant subgroup shifts:

Proposition 28 Let G ⊆ A^{ℤ^D} be an abelian subgroup shift, let Φ ∈ ECA(G), and suppose (G, σ, Φ) is irreducible. For any s ∈ [0, 1), there exists a (Φ, σ)-ergodic measure μ ∈ Meas(G; Φ, σ) such that h(μ, Φⁿ ∘ σᶻ) = s · h_top(G; Φⁿ ∘ σᶻ) for every n ∈ ℕ and z ∈ ℤ^D.

Proof This follows from Corollary 1.4 in [28].



Let μ ∈ Meas(A^M; σ) and let H ⊂ M be a finite subset. We say that μ is H-mixing if, for any H-indexed collection {U_h}_{h∈H} of measurable subsets of A^M,

  lim_{n→∞} μ[ ⋂_{h∈H} σ^{nh}(U_h) ] = ∏_{h∈H} μ[U_h] .

For example, if |H| = H, then any H-multiply σ-mixing measure (see Subsect. “Mixing and Ergodicity”) is H-mixing.

Proposition 29 Let G ⊆ A^{ℤ^D} be an abelian subgroup shift and let Φ ∈ ECA(G) have neighborhood H (with |H| ≥ 2). Suppose (G, σ, Φ) is irreducible, and let μ ∈ Meas(A^{ℤ^D}; Φ, σ). Then μ is H-mixing if and only if μ is the Haar measure of G.

Proof This follows from [124], Corollary 29.5, p. 289 (note that Schmidt uses 'almost minimal' to mean 'irreducible'). A significant generalization of Proposition 29 appears in [115].

The Furstenberg Conjecture

Let 𝕋¹ = ℝ/ℤ be the circle group, which we identify with the interval [0, 1). Define the maps ×2, ×3 : 𝕋¹ → 𝕋¹ by ×2(t) = 2t (mod 1) and ×3(t) = 3t (mod 1). Clearly, these maps commute, and preserve the Lebesgue measure on 𝕋¹. Furstenberg [44] speculated that the only nonatomic (×2)- and (×3)-invariant measure on 𝕋¹ was the Lebesgue measure. Rudolph [119] showed that, if μ is a (×2, ×3)-invariant measure and not Lebesgue, then the


systems (𝕋¹, μ, ×2) and (𝕋¹, μ, ×3) have zero entropy; this was later generalized in [60,69]. It is not known whether any nonatomic measures exist on 𝕋¹ which satisfy Rudolph's conditions; this is considered an outstanding problem in abstract ergodic theory.

To see the connection between Furstenberg's Conjecture and cellular automata, let A = {0, 1, 2, 3, 4, 5}, and define the surjection π : A^ℕ → 𝕋¹ by mapping each a ∈ A^ℕ to the element of [0, 1) having a as its base-6 expansion. That is:

  π(a₀, a₁, a₂, …) := Σ_{n=0}^∞ a_n / 6^{n+1} .
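Because multiplication by 2 or 3 acts on base-6 digits through a local rule with a one-step carry (the maps λ₂, λ₃ defined just below), the semiconjugacy between these CA and the circle maps can be verified on finite expansions. A quick sketch, exact for numbers of the form k/6^m (the digit maps are as reconstructed from the text; helper names are ours):

```python
# Local rules lambda2(a0,a1) = [2*a0 + floor(a1/3)]_6 and
# lambda3(a0,a1) = [3*a0 + floor(a1/2)]_6 implement multiplication
# by 2 and 3 (with carries) on base-6 digit sequences.

def ca_step(digits, p):            # one step of Phi_p on a finite digit string
    d = list(digits) + [0]         # pad: these expansions end in zeros
    div = {2: 3, 3: 2}[p]          # carry term: a1//3 for p=2, a1//2 for p=3
    return [(p * d[i] + d[i + 1] // div) % 6 for i in range(len(digits))]

def to_digits(k, m):               # first m base-6 digits of k/6^m in [0, 1)
    return [(k // 6 ** (m - 1 - i)) % 6 for i in range(m)]

k, m = 137, 3                      # x = 137/216 = 0.345 (base 6)
assert ca_step(to_digits(k, m), 2) == to_digits(2 * k % 6 ** m, m)   # pi.Phi_2 = (x2).pi
assert ca_step(to_digits(k, m), 3) == to_digits(3 * k % 6 ** m, m)   # pi.Phi_3 = (x3).pi

# Phi_2 . Phi_3 = sigma: multiplying by 6 shifts the digit sequence.
assert ca_step(ca_step(to_digits(k, 4), 3), 2) == to_digits(6 * k % 6 ** 4, 4)
```

The final assertion is the identity Φ₂ ∘ Φ₃ = σ discussed below, checked at one point.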

The map π is injective everywhere except on the countable set of sequences ending in [000…] or [555…] (on this set, π is 2-to-1). Furthermore, π defines a semiconjugacy from ×2 and ×3 into two CA on A^ℕ. Let H := {0, 1}, and define local maps λ₂, λ₃ : A^H → A as follows:

  λ₂(a₀, a₁) = [2a₀ + ⌊a₁/3⌋]₆  and  λ₃(a₀, a₁) = [3a₀ + ⌊a₁/2⌋]₆ ,

where [a]₆ is the least residue of a, mod 6. If Φ_p ∈ CA(A^ℕ) has local map λ_p (for p = 2, 3), then it is easy to check that Φ_p corresponds to multiplication by p in base-6 notation. In other words, π ∘ Φ_p = (×p) ∘ π for p = 2, 3. If λ is the Lebesgue measure on 𝕋¹, then π(η) = λ, where η is the uniform Bernoulli measure on A^ℕ. Thus, η is Φ₂- and Φ₃-invariant, and Furstenberg's Conjecture asserts that η is the only nonatomic measure on A^ℕ which is both Φ₂- and Φ₃-invariant. The shift map σ : A^ℕ → A^ℕ corresponds to multiplication by 6 in base-6 notation. Hence, Φ₂ ∘ Φ₃ = σ. From this it follows that a measure μ is (Φ₂, Φ₃)-invariant if and only if μ is (Φ₂, σ)-invariant if and only if μ is (σ, Φ₃)-invariant. Thus, Furstenberg's Conjecture equivalently asserts that η is the only stationary, Φ₃-invariant nonatomic measure on A^ℕ, and Rudolph's result asserts that η is the only such nonatomic measure with nonzero entropy; this is analogous to the 'measure rigidity' results of Subsect. “Measure Rigidity in Algebraic CA”. The existence of zero-entropy, (σ, Φ₃)-invariant, nonatomic measures remains an open question.

Remark 30 (a) There is nothing special about 2 and 3; the same results hold for any pair of prime numbers. (b) Lyons [85] and Rudolph and Johnson [68] have also established that a wide variety of (×2)-invariant probability measures on 𝕋¹ will weak* converge, under the iteration of ×3, to the Lebesgue measure (and vice versa). In

the terminology of Subsect. “Asymptotic Randomization by Linear Cellular Automata”, these results immediately translate into equivalent statements about the 'asymptotic randomization' of initial probability measures on A^ℕ under the iteration of Φ₂ or Φ₃.

Domains, Defects, and Particles

Suppose Φ ∈ CA(A^ℤ), and there is a collection of Φ-invariant subshifts P₁, P₂, …, P_N ⊆ A^ℤ (called phases). Any sequence a can be expressed as a finite or infinite concatenation

  a = [… a₋₂ d₋₂ a₋₁ d₋₁ a₀ d₀ a₁ d₁ a₂ …] ,

where each domain a_k is a finite word (or half-infinite sequence) which is admissible to phase P_n for some n ∈ [1…N], and where each defect d_k is a (possibly empty) finite word (note that this decomposition may not be unique). Thus, Φ(a) = a′, where

  a′ = [… a′₋₂ d′₋₂ a′₋₁ d′₋₁ a′₀ d′₀ a′₁ d′₁ a′₂ …] ,

and, for every k ∈ ℤ, a′_k belongs to the same phase as a_k. We say that Φ has stable phases if, for any such a and a′ in A^ℤ, it is the case that, for all k ∈ ℤ, |d′_k| ≤ |d_k|. In other words, the defects do not grow over time. However, they may propagate sideways; for example, d′_k may be slightly to the right of d_k, if the domain a′_k is larger than a_k, while the domain a′_{k+1} is slightly smaller than a_{k+1}. If a_k and a_{k+1} belong to different phases, then the defect d_k is sometimes called a domain boundary (or 'wall', or 'edge particle'). If a_k and a_{k+1} belong to the same phase, then the defect d_k is sometimes called a dislocation (or 'kink'). Often P_n = {p}, where p = […ppp…] is a constant sequence, or each P_n consists of the σ-orbit of a single periodic sequence. More generally, the phases P₁, …, P_N may be subshifts of finite type. In this case, most sequences in A^ℤ can be fairly easily and unambiguously decomposed into domains separated by defects. However, if the phases are more complex (e. g. sofic shifts), then the exact definition of a 'defect' is actually fairly complicated – see [114] for a rigorous discussion.
Example 31 Let A D f0; 1g and let H D f1; 0; 1g. Elementary cellular automaton (ECA) #184 is the CA ˚ : AZ !AZ with local rule  : AH !A given as follows: (a1 ; a0 ; a1 ) D 1 if a0 D a1 D 1, or if a1 D 1 and a0 D 0. On the other hand, (a1 ; a0 ; a1 ) D 0 if a1 D a0 D 0, or if a1 D 0 and a0 D 1. Heuristically, each ‘1’ represents a ‘car’ moving cautiously to the right on a single-lane road. During each iteration, each car will


advance to the site in front of it, unless that site is already occupied, in which case the car will remain stationary. ECA#184 exhibits one stable phase P, given by the 2-periodic sequence [… 0101.0101 …] and its translate [… 1010.1010 …] (here the decimal point indicates the zeroth coordinate), and Φ acts on P like the shift. The phase P admits two dislocations of width 2. The dislocation d₀ = [00] moves uniformly to the right, while the dislocation d₁ = [11] moves uniformly to the left. In the traffic interpretation, P represents freely flowing traffic, d₀ represents a stretch of empty road, and d₁ represents a traffic jam.

Example 32 Let A := ℤ/N, and let H := [−1…1]. The one-dimensional, N-color cyclic cellular automaton (CCA_N) Φ : A^ℤ → A^ℤ has local rule φ : A^H → A defined:

  φ(a) := a₀ + 1, if there is some h ∈ H with a_h = a₀ + 1 ;
  φ(a) := a₀, otherwise

(here, addition is mod N). The CCA has phases P₀, P₁, …, P_{N−1}, where P_a = {[…aaa…]} for each a ∈ A. A domain boundary between P_a and P_{a−1} moves with constant velocity towards the P_{a−1} side. All other domain boundaries are stationary.

In a particle cellular automaton (PCA), A = {∅} ⊔ P, where P is a set of 'particle types' and ∅ represents a vacant site. Each particle p ∈ P is assigned some (constant) velocity vector v(p) ∈ (−H) (where H is the neighborhood of the automaton). Particles propagate with constant velocity through M until two particles try to simultaneously enter the same site in the lattice, at which point the outcome is determined by a collision rule: a stylized 'chemical reaction equation'. For example, an equation “p₁ + p₂ ⇝ p₃” means that, if particle types p₁ and p₂ collide, they coalesce to produce a particle of type p₃. On the other hand, “p₁ + p₂ ⇝ ∅” means that the two particles annihilate on contact.
Formally, given a set of velocities and collision rules, the local rule φ : A^H → A is defined:

  φ(a) := p, if there is a unique h ∈ H and p ∈ P with a_h = p and v(p) = −h ;
  φ(a) := q, if {p ∈ P : a_{−v(p)} = p} = {p₁, p₂, …, p_n} and p₁ + ⋯ + p_n ⇝ q ;
  φ(a) := ∅, otherwise.
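Anticipating Examples 33 and 34 below, the defect/particle correspondence can be tested mechanically: simulate ECA#184, project each configuration to particles via the block map ζ(a₀, a₁) = 1 − a₀ − a₁ (sending d₀ = [00] to a right-mover and d₁ = [11] to a left-mover), and compare with the ballistic annihilation dynamics. A minimal sketch on a ring (periodic boundaries are our simplification):

```python
from itertools import product

def eca184_step(a):          # Phi: rule 184, the "traffic" CA of Example 31
    n = len(a)
    return [1 if (a[i] == 1 and a[(i + 1) % n] == 1)
                 or (a[(i - 1) % n] == 1 and a[i] == 0) else 0
            for i in range(n)]

def bam_step(b):             # Psi: +1 moves right, -1 moves left, collisions annihilate
    n = len(b)
    out = []
    for i in range(n):
        if b[(i - 1) % n] == 1 and b[i] != -1 and b[(i + 1) % n] != -1:
            out.append(1)
        elif b[(i + 1) % n] == -1 and b[(i - 1) % n] != 1 and b[i] != 1:
            out.append(-1)
        else:
            out.append(0)
    return out

def zeta(a):                 # block map of Example 34(a): [00] -> +1, [11] -> -1
    n = len(a)
    return [1 - a[i] - a[(i + 1) % n] for i in range(n)]

# Exhaustive check of the factor identity zeta . Phi = Psi . zeta on an 8-ring:
for bits in product((0, 1), repeat=8):
    a = list(bits)
    assert zeta(eca184_step(a)) == bam_step(zeta(a))
```

Since both sides depend on the same four-cell window at each site, the exhaustive check over one ring size already verifies the identity for all configurations.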

Example 33 The one-dimensional ballistic annihilation model (BAM) contains two particle types: P = {±1}, with the following rules:

  v(+1) = +1 ,  v(−1) = −1 ,  and  (+1) + (−1) ⇝ ∅ .

(This CA is sometimes also called Just Gliders.) Thus, a_z = +1 if the cell z contains a particle moving to the right with velocity +1, whereas a_z = −1 if the cell z contains a particle moving left with velocity −1, and a_z = ∅ if cell z is vacant. Particles move with constant velocity until they collide with oncoming particles, at which point both particles are annihilated. If B := {±1, ∅} and H = [−1…1] ⊂ ℤ, then we can represent the BAM using Ψ ∈ CA(B^ℤ) with local rule η : B^H → B defined:

  η(b₋₁, b₀, b₁) := +1, if b₋₁ = +1 and b₀, b₁ ∈ {+1, ∅} ;
  η(b₋₁, b₀, b₁) := −1, if b₁ = −1 and b₋₁, b₀ ∈ {−1, ∅} ;
  η(b₋₁, b₀, b₁) := ∅, otherwise.

Particle CA can be seen as 'toy models' of particle physics or microscale chemistry. More interestingly, however, one-dimensional PCA often arise as factors of coalescent-domain CA, with the 'particles' tracking the motion of the defects.

Example 34 (a) Let A := {0, 1} and let Φ ∈ CA(A^ℤ) be ECA#184. Let B := {±1, 0}, and let Ψ ∈ CA(B^ℤ) be the BAM. Let G := {0, 1}, and let ζ : A^ℤ → B^ℤ be the block map with local rule ζ : A^G → B defined:

  ζ(a₀, a₁) := 1 − a₀ − a₁ =
    +1, if [a₀, a₁] = [0, 0] = d₀ ;
    −1, if [a₀, a₁] = [1, 1] = d₁ ;
     0, otherwise.

Then ζ ∘ Φ = Ψ ∘ ζ; in other words, the BAM is a factor of ECA#184, and tracks the motion of the dislocations.

(b) Again, let Ψ ∈ CA(B^ℤ) be the BAM. Let A = ℤ/3, and let Φ ∈ CA(A^ℤ) be the 3-color CCA. Let G := {0, 1}, and let ζ : A^ℤ → B^ℤ be the block map with local rule ζ : A^G → B defined ζ(a₀, a₁) := (a₀ − a₁) mod 3. Then ζ ∘ Φ = Ψ ∘ ζ; in other words, the BAM is a factor of CCA₃, and tracks the motion of the domain boundaries.

Thus, it is often possible to translate questions about coalescent-domain CA into questions about particle CA, which are generally easier to study. For example, the invariant measures of the BAM have been completely characterized.

Proposition 35 Let B = {±1, 0}, and let Ψ : B^ℤ → B^ℤ be the BAM.
(a) The sets R := {0, +1}^ℤ and L := {0, −1}^ℤ are Ψ-invariant, and Ψ acts as a right-shift on R and as a left-shift on L.


(b) Let LC :D f0; 1gN and R :D f0; 1gN , and let ˚ X :D a 2 AZ ; 9 z 2 Z such that a(1:::z] 2 R and  a[z:::1) 2 LC : Then X is  -invariant. For any x 2 X,  acts as a right shift on a(1:::z) , and as a left-shift on x(z:::1) . (The boundary point z executes some kind of random walk.) (c) Any  -invariant measure on AZ can be written in a unique way as a convex combination of four measures ı0 , , , and , where: ı0 is the point mass on the ‘vacuum’ configuration [: : : 0 0 0 : : :], is any shift-invariant measure on R,  is any shift-invariant measure on L, and  is a measure on X. Furthermore, there exist shift-invariant measures  and C on R and LC , respectively, such that, for -almost all x 2 X, x(1:::z] is  -distributed and x[z:::1) is C -distributed. Proof (a) and (b) are obvious; (c) is Theorem 1 in [8].  Remark 36 (a) Proposition 35(c) can be immediately translated into a complete characterization of the invariant measures of ECA#184, via the factor map  in Example 34(a); see [8], Theorem 3. Likewise, using the factor map in Example 34(b) we get a complete characterization of the invariant measures for CCA3 . (b) Proposition 48 and Corollaries 49 and 50 describe the limit measures of the BAM, CCA3 , and ECA#184. Also, Blank [10] has characterized invariant measures for a broad class of multilane, multi-speed traffic models (including ECA#184); see Remark 51(b). (c) K˚urka [78] has defined, for any ˚ 2 CA(AZ ), a construction similar to the set X in Proposition 35(b). For any n 2 N and z 2 Z, let Sz;n be the set of fixed points of ˚ n ı  z ; then Sz;n is a subshift of finite type, which K˚urka calls a signal subshift with velocity v D z/n. (For example, if ˚ is the BAM, then R D S1;1 and L D S1;1 .) Now, suppose that z1 /n1 > z2 /n2 >    > z J /n J . 
The join of the signal subshifts S_{z₁,n₁}, S_{z₂,n₂}, …, S_{z_J,n_J} is the set S of all infinite sequences [a₁ a₂ … a_J], where for all j ∈ [1…J], a_j is a (possibly empty) finite word or (half-)infinite sequence admissible to the subshift S_{z_j,n_j}. (For example, if S is the join of S_{1,1} = R and S_{−1,1} = L from Proposition 35(a), then S = L ∪ X ∪ R.) It follows that S ⊇ Φ(S) ⊇ Φ²(S) ⊇ ⋯. If we define Φ^∞(S) := ⋂_{t=0}^∞ Φᵗ(S), then Φ^∞(S) ⊆ Φ^∞(A^ℤ), where Φ^∞(A^ℤ) := ⋂_{t=0}^∞ Φᵗ(A^ℤ) is the omega limit set of Φ (see Proposition 5 in [78]). The support of any Φ-invariant measure must be contained in Φ^∞(A^ℤ), so invariant measures may be closely related to the joins of signal subshifts. See Topological Dynamics of Cellular Automata for more information. In the case of the BAM, it is not hard to check that Φ^∞(S) = S = Φ^∞(A^ℤ); this suggests an alternate proof of Proposition 35(c). It would be interesting to know whether a conclusion analogous to Proposition 35(c) holds for other Φ ∈ CA(A^ℤ) such that Φ^∞(A^ℤ) is a join of signal subshifts.

Limit Measures and Other Asymptotics

Asymptotic Randomization by Linear Cellular Automata

The results of Subsect. “Measure Rigidity in Algebraic CA” suggest that the uniform Bernoulli measure η is the 'natural' measure for algebraic CA, because η is the unique invariant measure satisfying any one of several collections of reasonable criteria. In this section, we will see that η is 'natural' in quite another way: it is the unique limit measure for linear CA from a large set of initial conditions.

If {μ_n}_{n=1}^∞ is a sequence of measures on A^M, then this sequence weak* converges to the measure μ_∞ (“wk*-lim_{n→∞} μ_n = μ_∞”) if, for all cylinder sets B ⊆ A^M, lim_{n→∞} μ_n[B] = μ_∞[B]. Equivalently, for all continuous functions f : A^M → ℂ, we have

  lim_{n→∞} ∫_{A^M} f dμ_n = ∫_{A^M} f dμ_∞ .

The Cesàro average (or Cesàro limit) of {μ_n}_{n=1}^∞ is wk*-lim_{N→∞} (1/N) Σ_{n=1}^N μ_n, if this limit exists.

Let μ ∈ Meas(A^M) and let Φ ∈ CA(A^M). For any t ∈ ℕ, the measure Φᵗμ is defined by Φᵗμ(B) = μ(Φ⁻ᵗ(B)), for any measurable subset B ⊆ A^M. We say that Φ asymptotically randomizes μ if the Cesàro average of the sequence {Φⁿμ}_{n=1}^∞ is η. Equivalently, there is a subset J ⊆ ℕ of density 1, such that

  wk*-lim_{J∋j→∞} Φʲμ = η .
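For a nearest-neighbor-style XOR CA over ℤ/2, this Cesàro convergence can be computed exactly at a single site, because the time-t iterate has coefficients given by row t of Pascal's triangle mod 2 (the Lucas-theorem machinery discussed below): the one-site marginal is the parity of m_t independent Bernoulli(p) bits, where m_t is the number of odd entries in row t. A sketch, with the initial density p = 0.1 chosen arbitrarily:

```python
# Exact one-site marginal of Phi^t(mu) for the Ledrappier-type XOR CA
# phi(a)_i = a_i + a_(i+1) mod 2 and mu = Bernoulli(p).  The parity of m
# i.i.d. Bernoulli(p) bits equals 1 with probability (1 - (1-2p)^m)/2.

def marginal(t, p):
    row = 1                          # row t of Pascal's triangle mod 2,
    for _ in range(t):               # stored as a bitmask
        row ^= row << 1
    m = bin(row).count("1")          # (by Lucas: m = 2^(number of 1-bits of t))
    return (1 - (1 - 2 * p) ** m) / 2

T, p = 1024, 0.1
cesaro = sum(marginal(t, p) for t in range(T)) / T
# cesaro is about 0.47 here, versus the initial density 0.1, and it creeps
# toward 1/2 (the uniform measure) as T grows: asymptotic randomization.
```

Only the sparse times t with very low binary weight keep the marginal far from 1/2, which is why the convergence holds along a density-1 set of times rather than outright.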

The uniform measure η is the measure of maximal entropy on A^M. Thus, asymptotic randomization is a kind of 'Second Law of Thermodynamics' for CA.

Let (A, +) be a finite abelian group, and let Φ be a linear cellular automaton (LCA) on A^M. Recall that Φ has scalar coefficients if there is some finite H ⊂ M, and integer coefficients {c_h}_{h∈H}, so that Φ has a local rule of the form

  φ(a_H) := Σ_{h∈H} c_h a_h .   (3)
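A rule of the form (3) can be manipulated as a polynomial in the shift (the 'Part I' machinery developed below), so iterating the CA amounts to taking polynomial powers. A sketch on a ring, in one dimension (all helper names are ours):

```python
def poly_mul(F, G, m):          # Laurent polynomials as {exponent: coeff mod m}
    H = {}
    for i, a in F.items():
        for j, b in G.items():
            H[i + j] = (H.get(i + j, 0) + a * b) % m
    return {e: c for e, c in H.items() if c}

def apply_lca(F, x, m):         # rule (3) on a ring: a'_i = sum_h c_h * x_(i+h)
    n = len(x)
    return [sum(c * x[(i + e) % n] for e, c in F.items()) % m
            for i in range(n)]

F, m = {-1: 1, 1: 1}, 2         # nearest-neighbor XOR CA: F(x) = x^-1 + x
x = [0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
y, Ft = list(x), {0: 1}
for _ in range(5):
    y = apply_lca(F, y, m)      # five steps of the CA ...
    Ft = poly_mul(Ft, F, m)
assert y == apply_lca(Ft, x, m) # ... equal one step of the rule with F(x)^5
```

The surviving exponents of F⁵ mod 2 are exactly the odd binomial coefficients of row 5, illustrating the Pascal's-triangle structure discussed later in this section.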


An LCA Φ is proper if Φ has scalar coefficients as in Eq. (3), and if, furthermore, for any prime divisor p of |A|, there are at least two h, h′ ∈ H such that c_h ≢ 0 ≢ c_{h′} mod p. For example, if A = ℤ/n for some n ∈ ℕ, then every LCA on A^M has scalar coefficients; in this case, Φ is proper if, for every prime p dividing n, at least two of these coefficients are coprime to p. In particular, if A = ℤ/p for some prime p, then Φ is proper as long as |H| ≥ 2. Let PLCA(A^M) be the set of proper linear CA on A^M. If μ ∈ Meas(A^M), recall that μ has full support if μ[B] > 0 for every cylinder set B ⊆ A^M.
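The properness condition is a finite check on the coefficient list; a small helper (ours, not from the text) makes the definition concrete:

```python
def prime_factors(n):
    """Distinct prime divisors of n, by trial division."""
    fs, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            fs.add(d)
            n //= d
        d += 1
    if n > 1:
        fs.add(n)
    return fs

def is_proper(coeffs, n):
    """Is the scalar-coefficient LCA with coefficients `coeffs` over A = Z/n
    proper?  For every prime p | n, at least two coefficients must be
    nonzero mod p."""
    return all(sum(1 for c in coeffs if c % p != 0) >= 2
               for p in prime_factors(n))

# The nearest-neighbor XOR rule over Z/2 is proper; over Z/6 the rule with
# coefficients (2, 3) is not, since only one coefficient survives mod 2.
assert is_proper([1, 1], 2)
assert not is_proper([2, 3], 6)
```

For cyclic A this captures the definition exactly; for general abelian A the coefficients must be interpreted per prime divisor of |A| as in the text.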

Theorem 37 Let (A, +) be a finite abelian group, let M := ℤ^D × ℕ^E for some D, E ≥ 0, and let Φ ∈ PLCA(A^M). Let μ be any Bernoulli measure or Markov random field on A^M having full support. Then Φ asymptotically randomizes μ.

History Theorem 37 was first proved for simple one-dimensional LCA randomizing Bernoulli measures on A^ℤ, where A was a cyclic group. In the case A = ℤ/2, Theorem 37 was independently proved for the nearest-neighbor XOR CA (having local rule φ(a₋₁, a₀, a₁) = a₋₁ + a₁ mod 2) by Miyamoto [100] and Lind [82]. This result was then generalized to A = ℤ/p for any prime p by Cai and Luo [16]. Next, Maass and Martínez [87] extended the Miyamoto/Lind result to the binary Ledrappier CA (local rule φ(a₀, a₁) = a₀ + a₁ mod 2). Soon after, Ferrari et al. [36] considered the case when A was an abelian group of order p^k (p prime), and proved Theorem 37 for any Ledrappier CA (local rule φ(a₀, a₁) = c₀a₀ + c₁a₁, where c₀, c₁ ≢ 0 mod p) acting on any measure on A^ℤ having full support and 'rapidly decaying correlations' (see Part II(a) below). For example, this includes any Markov measure on A^ℤ with full support. Next, Pivato and Yassawi [116] generalized Theorem 37 to any PLCA acting on any fully supported N-step Markov chain on A^ℤ or any nontrivial Bernoulli measure on A^{ℤ^D × ℕ^E}, where A = ℤ/p^k (p prime). Finally, Pivato and Yassawi [117] proved Theorem 37 in full generality, as stated above.

The proofs of Theorem 37 and its variations all involve two parts:
Part I. A careful analysis of the local rule of Φᵗ (for all t ∈ ℕ), showing that the neighborhood of Φᵗ grows large as t → ∞ (and in some cases, contains large 'gaps').
Part II. A demonstration that the measure μ exhibits 'rapidly decaying correlations' between widely separated elements of M; hence, when these elements are combined using Φᵗ, it is as if we are summing independent random variables.

Part I Any linear CA with scalar coefficients can be written as a 'Laurent polynomial of shifts'. That is, if Φ has local rule (3), then for any a ∈ A^M,

  Φ(a) := Σ_{h∈H} c_h σ^h(a)  (where we add configurations componentwise).

We indicate this by writing “Φ = F(σ)”, where F ∈ ℤ[x₁^{±1}, x₂^{±1}, …, x_D^{±1}] is the D-variable Laurent polynomial defined:

  F(x₁, …, x_D) := Σ_{(h₁,…,h_D)∈H} c_h x₁^{h₁} x₂^{h₂} ⋯ x_D^{h_D} .

For example, if Φ is the nearest-neighbor XOR CA, then Φ = σ⁻¹ + σ¹ = F(σ), where F(x) = x⁻¹ + x. If Φ is a Ledrappier CA, then Φ = c₀ Id + c₁ σ¹ = F(σ), where F(x) = c₀ + c₁x. It is easy to verify that, if F and G are two such polynomials, and Φ = F(σ) while Ψ = G(σ), then Φ ∘ Ψ = (F·G)(σ), where F·G is the product of F and G in the polynomial ring ℤ[x₁^{±1}, x₂^{±1}, …, x_D^{±1}]. In particular, this means that Φᵗ = Fᵗ(σ) for all t ∈ ℕ. Thus, iterating an LCA is equivalent to computing the powers of a polynomial. If A = ℤ/p, then we can compute the coefficients of Fᵗ modulo p. If p is prime, then this can be done using a result of Lucas [84], which provides a formula for the binomial coefficient (a choose b) in terms of the base-p expansions of a and b. For example, if p = 2, then Lucas' theorem says that Pascal's triangle, modulo 2, looks like a 'discrete Sierpinski triangle', made out of 0's and 1's. (This is why fragments of the Sierpinski triangle appear frequently in the spacetime diagrams of linear CA on A = ℤ/2, a phenomenon which has inspired much literature on 'fractals and automatic sequences in cellular automata'; see [2,4,5,6,7,52,53,54,55,56,57,94,138,139,140,145,146,147,148,149].) Thus, Lucas' Theorem, along with some combinatorial lemmas about the structure of base-p expansions, provides the machinery for Part I.

Part II There are two approaches to analyzing probability measures on A^M; one using renewal theory, and the other using harmonic analysis.

II(a) Renewal theory This approach was developed by Maass, Martínez and their collaborators. Loosely speaking, if μ ∈ Meas(A^ℤ; σ) has sufficiently large support and sufficiently rapid decay of correlations (e. g. a Markov chain), and a ∈ A^ℤ is a μ-random sequence, then we can treat a as if there is a sparse, randomly distributed set of 'renewal times' when the normal stochastic evolution of a is interrupted by independent, random 'errors'. By judicious use of Part I described above, one can use this 'renewal process' to make it seem as though Φᵗ is summing independent random variables. For example, if (A, +) is an abelian group of order p^k where p is prime, and μ ∈ Meas(A^ℤ; σ) has complete connections (see Example 22(a)) and summable decay (which means that a certain sequence of coefficients (measuring long-range correlation) decays fast enough that its sum is finite), and Φ ∈ CA(A^ℤ) is a Ledrappier CA, then Ferrari et al. (see Theorem 1.3 in [36]) showed that Φ asymptotically randomizes μ. (For example, this applies to any N-step Markov chain with full support on A^ℤ.) Furthermore, if A = ℤ/p × ℤ/p, and Φ ∈ CA(A^ℤ) has linear local rule φ((x₀, y₀), (x₁, y₁)) = (y₀, x₀ + y₁), then Maass and Martínez [88] showed that Φ randomizes any Markov measure with full support on A^ℤ. Maass and Martínez again handled Part II using renewal theory. However, in this case, Part I involves some delicate analysis of the (noncommutative) algebra of the matrix-valued coefficients; unfortunately, their argument does not generalize to other LCA with noncommuting, matrix-valued coefficients. (However, Proposition 8 of [117] suggests a general strategy for dealing with such LCA.)

II(b) Harmonic analysis: This approach to Part II was implicit in the early work of Lind [82] and Cai and Luo [16], but was developed in full generality by Pivato and Yassawi [116,117,118]. We regard A^M as a direct product of copies of the group (A,+), and endow it with the product group structure; then (A^M,+) is a compact abelian topological group. A character on (A^M,+) is a continuous group homomorphism χ : A^M → T, where T := {c ∈ C ; |c| = 1} is the unit circle group. If μ is a measure on A^M, then the Fourier coefficients of μ are defined by μ̂[χ] := ∫_{A^M} χ dμ, for every character χ. If χ : A^M → T is any character, then there is a unique finite subset K ⊆ M (called the support of χ) and a unique collection of nontrivial characters χ_k : A → T for all k ∈ K, such that

   χ(a) = ∏_{k∈K} χ_k(a_k) ,  ∀ a ∈ A^M .   (4)

We define rank[χ] := |K|. The measure μ is called harmonically mixing if, for all ε > 0, there is some R such that, for all characters χ, (rank[χ] ≥ R) ⟹ (|μ̂[χ]| < ε). The set Hm(A^M) of harmonically mixing measures on A^M is quite inclusive. For example, if μ is any (N-step) Markov chain with full support on A^Z, then μ ∈ Hm(A^Z) (see Propositions 8 and 10 in [116]), and if ν ∈ Meas(A^Z) is absolutely continuous with respect to this μ, then ν ∈ Hm(A^Z) also (see Corollary 9 in [116]). If A = Z/p (p prime), then any nontrivial Bernoulli measure on A^M is harmonically mixing (see Proposition 6 in [116]). Furthermore, if μ ∈ Meas(A^Z, σ) has complete connections and summable decay, then μ ∈ Hm(A^Z) (see Theorem 23 in [61]).

If M := Meas(A^M; C) is the set of all complex-valued measures on A^M, then M is a Banach algebra (i.e. it is a vector space under the obvious definitions of addition and scalar multiplication for measures, a Banach space under the total variation norm, and, since A^M is a topological group, a ring under convolution). Then Hm(A^M) is an ideal in M, is closed under the total variation norm, and is dense in the weak* topology on M (see Propositions 4 and 7 in [116]). Finally, if μ is any Markov random field on A^M which is locally free (which roughly means that the boundary of any finite region does not totally determine the interior of that region), then μ ∈ Hm(A^M) (see Theorem 1.3 in [118]). In particular, this implies:

Proposition 38 If (A,+) is any finite group, and μ ∈ Meas(A^M) is any Markov random field with full support, then μ is harmonically mixing.

Proof This follows from Theorem 1.3 in [118]. It is also a special case of Theorem 15 in [117]. □

If χ is a character, and Φ is a LCA, then χ∘Φ^t is also a character, for any t ∈ N (because it is a composition of two continuous group homomorphisms). We say Φ is diffusive if there is a subset J ⊆ N of density 1 such that, for every character χ of A^M,

   lim_{J∋j→∞} rank[χ∘Φ^j] = ∞ .

Proposition 39 Let (A,+) be any finite abelian group and let M be any monoid. If μ is harmonically mixing and Φ is diffusive, then Φ asymptotically randomizes μ.

Proof See Theorem 12 in [117]. □

Proposition 40 Let (A,+) be any abelian group and let M := Z^D × N^E for some D, E ≥ 0. If Φ ∈ PLCA(A^M), then Φ is diffusive.

Proof The proof uses Lucas' theorem, as described in Part I above. See Theorem 15 in [116] for the case A = Z/p with p prime. See Theorem 6 in [117] for the case when A is any cyclic group. That proof easily extends to any finite abelian group A: write A as a product of cyclic groups and decompose Φ into separate automata over these cyclic factors. □

Proof of Theorem 37 Combine Propositions 38, 39, and 40. □
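To see why Bernoulli measures are harmonically mixing when A = Z/2, note that the unique nontrivial character of Z/2 is a ↦ (-1)^a, so a rank-r character has Fourier coefficient ((1-p) - p)^r = (1-2p)^r, which decays to 0 as r → ∞. A brute-force sketch (my own illustration, not from the text):

```python
# Sketch: Fourier coefficients of a Bernoulli(p) measure on (Z/2)^K under a
# rank-r character chi(a) = prod_k (-1)^(a_k).  The coefficient factors as
# ((1-p) - p)^r = (1-2p)^r, so it vanishes as the rank grows: exactly the
# harmonic-mixing mechanism.
from itertools import product

def fourier_coeff(p, r):
    """mu_hat[chi] by brute-force summation over all of (Z/2)^r."""
    total = 0.0
    for a in product([0, 1], repeat=r):
        weight = 1.0
        for bit in a:
            weight *= p if bit else (1 - p)
        total += weight * (-1) ** sum(a)
    return total

p = 0.3
for r in range(1, 8):
    assert abs(fourier_coeff(p, r) - (1 - 2 * p) ** r) < 1e-12
```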

Ergodic Theory of Cellular Automata

Remark (a) Proposition 40 can be generalized: we do not need the coefficients of Φ to be integers, but merely a collection of automorphisms of A which commute with one another (so that Lucas' theorem from Part I is still applicable). See Theorem 9 in [117].
(b) For simplicity, we stated Theorem 37 for measures with full support; however, Proposition 39 actually applies to many Markov random fields without full support, because harmonic mixing only requires 'local freedom' (see Theorem 1.3 in [118]). For example, the support of a Markov chain on A^Z is a Markov subshift. If A = Z/p (p prime), then Proposition 39 yields asymptotic randomization of the Markov chain as long as the transition digraph of the underlying Markov subshift admits at least two distinct paths of length 2 between any pair of vertices in A. More generally, if M = Z^D, then the support of any Markov random field on A^{Z^D} is an SFT, which we can regard as the set of all tilings of R^D by a certain collection of Wang tiles. If A = Z/p (p prime), then Proposition 39 yields asymptotic randomization of the Markov random field as long as the underlying Wang tiling is flexible enough that any hole can always be filled in at least two ways; see Sect. 1 in [118].

Remark 41 (Generalizations and Extensions) (a) Pivato and Yassawi (see Thm 3.1 in [118]) proved a variation of Theorem 37 where diffusion (of Φ) is replaced with a slightly stronger condition called dispersion, so that harmonic mixing (of μ) can be replaced with a slightly weaker condition called dispersion mixing (DM). It is unknown whether all proper linear CA are dispersive, but a very large class are (including, for example, Φ = Id + σ). Any uniformly mixing measure with positive entropy is DM (see Theorem 5.2 in [118]); this includes, for example, any mixing quasimarkov measure (i.e. the image of a Markov measure under a block map; these are the natural measures supported on sofic shifts).
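Lucas' theorem, the engine behind Proposition 40, states that C(n,k) mod p is the product of the binomial coefficients of the base-p digits of n and k. A minimal sketch (my own, using only the standard library):

```python
# Lucas' theorem: C(n, k) mod p equals the product of C(n_i, k_i) mod p over
# the base-p digits n_i, k_i of n and k (with C(n_i, k_i) = 0 if k_i > n_i).
from math import comb

def binom_mod_p_lucas(n, k, p):
    result = 1
    while n or k:
        ni, ki = n % p, k % p
        result = (result * comb(ni, ki)) % p   # comb(ni, ki) == 0 when ki > ni
        n //= p
        k //= p
    return result

# Consequence used for Phi = Id + sigma over Z/p: for t = p^m, every interior
# binomial C(p^m, k) vanishes mod p, so Phi^(p^m) = Id + sigma^(p^m).
p, m = 3, 4
assert all(binom_mod_p_lucas(p**m, k, p) == 0 for k in range(1, p**m))
assert all(binom_mod_p_lucas(n, k, 7) == comb(n, k) % 7
           for n in range(30) for k in range(n + 1))
```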
Quasimarkov measures are not, in general, harmonically mixing (see Sect. 2 in [118]), but this result shows they are still asymptotically randomized by most linear CA.
(b) Suppose G ⊆ A^{Z^D} is a σ-transitive subgroup shift (see Subsect. "Measure Rigidity in Algebraic CA" for the definition), and let Φ ∈ PLCA(G). If G satisfies an algebraic condition called the follower lifting property (FLP) and μ is any Markov random field with supp(μ) = G, then Maass, Martínez, Pivato, and Yassawi [89] have shown that Φ asymptotically randomizes μ to a maxentropy measure on G. Furthermore, if D = 1, then this maxentropy measure is the Haar measure on G. In particular, if A is an abelian group of prime-power order, then any transitive Markov subgroup G ⊆ A^Z satisfies the FLP, so this result holds for any multistep Markov measure on G. See also [90] for the special case when Φ ∈ CA(A^Z) has local rule φ(x0, x1) = x0 + x1. In the special case when Φ has local rule φ(x0, x1) = c0·x0 + c1·x1 + a, the result has been extended to measures with complete connections and summable decay; see Teorema III.2.1, p. 71 in [134], or Theorem 1 in [91].
(c) All the aforementioned results concern asymptotic randomization of initial measures with nonzero entropy. Is nonzero entropy either necessary or sufficient for asymptotic randomization? First, let X_N ⊆ A^Z be the set of N-periodic points (see Subsect. "Periodic Invariant Measures") and suppose supp(μ) ⊆ X_N. Then the Cesàro limit of {Φ^t(μ)}_{t∈N} will also be a measure supported on X_N, so μ∞ cannot be the uniform measure on A^Z. Nor, in general, will μ∞ be the uniform measure on X_N; this follows from Jen's (1988) exact characterization of the limit cycles of linear CA acting on X_N. What if μ is a quasiperiodic measure, such as the unique σ-invariant measure on a Sturmian shift? There exist quasiperiodic measures on (Z/2)^Z which are not asymptotically randomized by the Ledrappier CA (see Sect. 15 in [112]). But it is unknown whether this extends to all quasiperiodic measures or all linear CA. There is also a measure μ on A^Z which has zero σ-entropy, yet is still asymptotically randomized by Φ (see Sect. 8 in [118]). Loosely speaking, μ is a Toeplitz measure with a very low density of 'bit errors'. Thus, μ is 'almost' deterministic (so it has zero entropy), but by sufficiently increasing the density of 'bit errors', we can introduce just enough randomness to allow asymptotic randomization to occur.
(d) Suppose (G,·) is a nonabelian group and Φ : G^Z → G^Z has multiplicative local rule φ(g) := g_{h1}^{n1} · g_{h2}^{n2} ⋯ g_{hJ}^{nJ}, for some {h1, ..., hJ} ⊆ Z (possibly not distinct) and n1, ..., nJ ∈ N.
If G is nilpotent, then G can be decomposed into a tower of abelian group extensions; this induces a structural decomposition of Φ into a tower of skew products of 'relative' linear CA. This strategy was first suggested by Moore [102], and was developed by Pivato (see Theorem 21 in [111]), who proved a version of Theorem 37 in this setting.
(e) Suppose (Q,⋆) is a quasigroup – that is, ⋆ is a binary operation such that for any q, r, s ∈ Q, (q⋆r = q⋆s) ⟺ (r = s) ⟺ (r⋆q = s⋆q). Any finite associative quasigroup has an identity, and any associative quasigroup with an identity is a group. However, there are also many nonassociative finite quasigroups. If we define a 'multiplicative' CA Φ : Q^Z → Q^Z with local rule φ : Q^{0,1} → Q given by φ(q0, q1) = q0 ⋆ q1, then it is easy to see that Φ is bipermutative if and only if (Q,⋆) is


a quasigroup. Thus, quasigroups seem to provide the natural algebraic framework for studying bipermutative CA; this was first proposed by Moore [101], and later explored by Host, Maass, and Martínez (see Sect. 3 in [61]), Pivato (see Sect. 2 in [113]), and Sobottka [134,135,136]. Note that Q^Z is a quasigroup under componentwise ⋆-multiplication. A quasigroup shift is a subshift X ⊆ Q^Z which is also a subquasigroup; it follows that Φ(X) ⊆ X. If X and Φ satisfy certain strong algebraic conditions, and μ ∈ Meas(X, σ) has complete connections and summable decay, then the sequence {Φ^t μ}_{t=1}^∞ Cesàro-converges to a maxentropy measure μ∞ on X (thus, if X is irreducible, then μ∞ is the Parry measure; see Subsect. "Invariance of Maxentropy Measures"). See Theorem 6.3(i) in [136], or Teorema IV.5.3, p. 107 in [134].

Hybrid Modes of Self-Organization

Most cellular automata do not asymptotically randomize; instead, they seem to weak* converge to limit measures concentrated on small (i.e. low-entropy) subsets of the statespace A^M – a phenomenon which can be interpreted as a form of 'self-organization'. Exact limit measures have been computed for a few CA. For example, let A = {0, 1, 2} and let Φ ∈ CA(A^{Z^D}) be the Greenberg–Hastings model (a simple model of an excitable medium). Durrett and Steif [26] showed that, if D ≥ 2 and μ is any Bernoulli measure on A^{Z^D}, then μ∞ := wk*-lim_{t→∞} Φ^t μ

exists; μ∞-almost all points are 3-periodic for Φ, and although μ∞ is not a Bernoulli measure, the system (A^{Z^D}, μ∞, σ) is measurably isomorphic to a Bernoulli system. In other cases, the limit measure cannot be exactly computed, but can still be estimated. For example, let A = {±1}, θ ∈ (0,1), and R > 0, and let Φ ∈ CA(A^Z) be the (R,θ)-threshold voter CA (where each cell computes the fraction of its radius-R neighbors which disagree with its current sign, and negates its sign if this fraction is at least θ). Durrett and Steif [27] and Fisch and Gravner [43] have described the long-term behavior of Φ in the limit as R → ∞. If θ < 1/2, then every initial condition falls into a two-periodic orbit (and if θ < 1/4, then every cell simply alternates its sign). Let μ be the uniform Bernoulli measure on A^Z; if 1/2 < θ, then for any finite subset B ⊂ Z, if R is large enough, then 'most' initial conditions (relative to μ) converge to orbits that are fixed inside B. Indeed, there is a critical value θ_c ≈ 0.6469076 such that, if θ_c < θ and R is large enough, then 'most' initial conditions (for μ) are already fixed inside B; see also [137] for an analysis of behavior at the critical value. In still other cases, the limit measure is known to exist, but is still mysterious; this is true for the Cesàro

limit measures of Coven CA (see Theorem 1 in [87]). However, for most CA, it is difficult even to show that limit measures exist. Except for the linear CA of Subsect. "Asymptotic Randomization by Linear Cellular Automata", there is no large class of CA whose limit measures have been exactly characterized. Often, it is much easier to study the dynamical asymptotics of CA at a purely topological level. If Φ ∈ CA(A^M), then A^M ⊇ Φ(A^M) ⊇ Φ²(A^M) ⊇ ⋯. The limit set of Φ is the nonempty subshift Φ^∞(A^M) := ∩_{t=1}^∞ Φ^t(A^M). For any a ∈ A^M, the omega-limit set of a is the set ω(a, Φ) of all cluster points of the Φ-orbit {Φ^t(a)}_{t=1}^∞. A closed subset X ⊆ A^M is a (Conley) attractor if there exists a clopen subset U ⊇ X such that Φ(U) ⊆ U and X = ∩_{t=1}^∞ Φ^t(U). It follows that ω(u, Φ) ⊆ X for all u ∈ U. For example, Φ^∞(A^M) is an attractor (let U := A^M). The topological attractors of CA were analyzed by Hurley [63,65,66], who discovered severe constraints on the possible attractor structures a CA could exhibit; see Sect. 9 of Topological Dynamics of Cellular Automata, and Cellular Automata, Classification of. Within pure topological dynamics, attractors and (omega) limit sets are the natural formalizations of the heuristic notion of 'self-organization'. The corresponding formalization in pure ergodic theory is the weak* limit measure. However, both weak* limit measures and topological attractors fail to adequately describe the sort of self-organization exhibited by many CA. Thus, several 'hybrid' notions of self-organization have been developed, which combine topological and measurable criteria. These hybrid notions are more flexible and inclusive than purely topological notions. However, they do not require the explicit computation (or even the existence) of weak* limit measures, so in practice they are much easier to verify than purely ergodic notions.
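On a finite ring, the nested-image definition of the limit set can be explored by brute force. The following is my own toy example (not from the text), using the non-surjective local rule φ(x0, x1) = x0 AND x1 with periodic boundary, for which the finite 'limit set' collapses to the two uniform fixed points:

```python
# Brute-force sketch of the nested images A ⊇ Phi(A) ⊇ Phi²(A) ⊇ ... over
# all configurations of a ring of 6 cells, for the toy CA with local rule
# phi(x0, x1) = x0 AND x1.
from itertools import product

def step(a):
    n = len(a)
    return tuple(a[i] & a[(i + 1) % n] for i in range(n))

current = set(product([0, 1], repeat=6))
while True:
    nxt = {step(a) for a in current}
    if nxt == current:
        break
    assert nxt < current      # images are strictly nested until they stabilize
    current = nxt

# On this finite ring the 'limit set' is just the two uniform fixed points.
assert current == {(0,) * 6, (1,) * 6}
```

This is only a finite analogue: on the infinite lattice the limit set of a CA is a subshift and can be far richer.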
Milnor–Hurley μ-attractors If X ⊆ A^M is a closed subset, then for any a ∈ A^M, we define d(a, X) := inf_{x∈X} d(a, x). If Φ ∈ CA(A^M), then the basin (or realm) of X is the set

   Basin(X) := {a ∈ A^M ; lim_{t→∞} d(Φ^t(a), X) = 0} = {a ∈ A^M ; ω(a, Φ) ⊆ X} .

Suppose Φ(X) ⊆ X. If μ ∈ Meas(A^M), then X is a μ-attractor if μ[Basin(X)] > 0; we call X a lean μ-attractor if, in addition, μ[Basin(X)] > μ[Basin(Y)] for any proper closed subset Y ⊊ X. Finally, a μ-attractor X is minimal if μ[Basin(Y)] = 0 for any proper closed subset Y ⊊ X. For example, if X is a μ-attractor, and (X, Φ) is minimal as


a dynamical system, then X is a minimal μ-attractor. This concept was introduced by Milnor [96,97] in the context of smooth dynamical systems; its ramifications for CA were first explored by Hurley [64,65]. If μ ∈ Meas(A^{Z^D}, σ), then μ is weakly σ-mixing if, for any measurable sets U, V ⊆ A^{Z^D}, there is a subset J ⊆ Z^D of density 1 such that lim_{J∋j→∞} μ[σ^j(U) ∩ V] = μ[U]·μ[V] (see Subsect. "Mixing and Ergodicity"). For example, any Bernoulli measure is weakly mixing. A subshift X ⊆ A^{Z^D} is σ-minimal if X contains no proper nonempty subshifts. For example, if X is just the σ-orbit of some σ-periodic point, then X is σ-minimal.

Proposition 42 Let Φ ∈ CA(A^M), let μ ∈ Meas(A^M, σ), and let X be a μ-attractor.
(a) If μ is σ-ergodic, and X ⊆ A^M is a subshift, then μ[Basin(X)] = 1.
(b) If M is countable, and X is a σ-minimal subshift with μ[Basin(X)] = 1, then X is lean.
(c) Suppose M = Z^D and μ is weakly σ-mixing.
(i) If X is a minimal μ-attractor, then X is a subshift, so μ[Basin(X)] = 1, and thus X is the only lean μ-attractor of Φ.
(ii) If X is a Φ-periodic orbit which is also a lean μ-attractor, then X is minimal, μ[Basin(X)] = 1, and X contains only constant configurations.

Proof (a) If X is σ-invariant, then Basin(X) is also σ-invariant; hence μ[Basin(X)] = 1 because μ is σ-ergodic.
(b) Suppose Y ⊊ X were a proper closed subset with μ[Basin(Y)] = 1. For any m ∈ M, it is easy to check that Basin(σ^m[Y]) = σ^m[Basin(Y)]. Thus, if Ỹ := ∩_{m∈M} σ^m(Y), then Basin(Ỹ) = ∩_{m∈M} σ^m[Basin(Y)], so μ[Basin(Ỹ)] = 1 (because M is countable). Thus, Ỹ is nonempty, and is a subshift of X. But X is σ-minimal, so Ỹ = X, which means Y = X. Thus, X is a lean μ-attractor.
(c) In the case when μ is a Bernoulli measure, (c)[i] is Theorem B in [64] or Proposition 2.7 in [65], while (c)[ii] is Theorem A in [65]. Hurley's proofs easily extend to the case when μ is weakly σ-mixing.
The only property we require of μ is this: for any nontrivial measurable sets U, V ⊆ A^{Z^D}, and any z ∈ Z^D, there are some x, y ∈ Z^D with z = x - y, such that μ[σ^y(U) ∩ V] > 0 and μ[σ^x(U) ∩ V] > 0. This is clearly true if μ is weakly mixing (because if J ⊆ Z^D has density 1, then J ∩ (z + J) ≠ ∅ for any z ∈ Z^D).

Proof sketch for (c)[i] If X is a (minimal) μ-attractor, then so is σ^y(X), and Basin[σ^y(X)] = σ^y(Basin[X]). Thus, weak mixing yields x, y ∈ Z^D such that Basin[σ^x(X)] ∩ Basin[X] and Basin[σ^y(X)] ∩ Basin[X] are both nontrivial. But the basins of distinct minimal μ-attractors must be disjoint; thus σ^x(X) = X = σ^y(X). But x - y = z, so this means σ^z(X) = X. This holds for all z ∈ Z^D, so X is a subshift, so (a) implies μ[Basin(X)] = 1. □

Section 4 of [64] contains several examples showing that the minimal topological attractor of Φ can be different from its minimal μ-attractor. For example, a CA can have different minimal μ-attractors for different choices of μ. On the other hand, there is a CA possessing a minimal topological attractor but with no minimal μ-attractors for any Bernoulli measure μ.

Hilmy–Hurley Centers Let a ∈ A^M. For any closed subset X ⊆ A^M, we define

   ρ_a[X] := liminf_{N→∞} (1/N) ∑_{n=1}^N 1_X(Φ^n(a)) .

(Thus, if μ is a Φ-ergodic measure on A^M, then Birkhoff's Ergodic Theorem asserts that ρ_a[X] = μ[X] for μ-almost all a ∈ A^M.) The center of a is the set

   Cent(a, Φ) := ∩ {closed subsets X ⊆ A^M ; ρ_a[X] = 1} .

Thus, Cent(a, Φ) is the smallest closed subset such that ρ_a[Cent(a, Φ)] = 1. If X ⊆ A^M is closed, then the well of X is the set

   Well(X) := {a ∈ A^M ; Cent(a, Φ) ⊆ X} .

If μ ∈ Meas(A^M), then X is a μ-center if μ[Well(X)] > 0; we call X a lean μ-center if, in addition, μ[Well(X)] > μ[Well(Y)] for any proper closed subset Y ⊊ X. Finally, a μ-center X is minimal if μ[Well(Y)] = 0 for any proper closed subset Y ⊊ X. This concept was introduced by Hilmy [59] in the context of smooth dynamical systems; its ramifications for CA were first explored by Hurley [65].

Proposition 43 Let Φ ∈ CA(A^M), let μ ∈ Meas(A^M, σ), and let X be a μ-center.
(a) If μ is σ-ergodic, and X ⊆ A^M is a subshift, then μ[Well(X)] = 1.
(b) If M is countable, and X is a σ-minimal subshift with μ[Well(X)] = 1, then X is lean.
(c) Suppose M = Z^D and μ is weakly σ-mixing. If X is a minimal μ-center, then X is a subshift, X is the only lean μ-center, and μ[Well(X)] = 1.

Proof (a) and (b) are very similar to the proofs of Proposition 42(a,b).


(c) is proved for Bernoulli measures as Theorem B in [65]. The proof is quite similar to Proposition 42(c)[i]; again, we only need μ to be weakly mixing. □

Section 4 of [65] contains several examples of minimal μ-centers which are not μ-attractors. In particular, the analogue of Proposition 42(c)[ii] is false for μ-centers.

Kůrka–Maass μ-limit Sets If Φ ∈ CA(A^M) and μ ∈ Meas(A^M, σ), then Kůrka and Maass define the μ-limit set of Φ:

   Λ(Φ, μ) := ∩ {closed subsets X ⊆ A^M ; lim_{t→∞} Φ^t μ(X) = 1} .

It suffices to take this intersection only over cylinder sets X. By doing this, we see that Λ(Φ, μ) is a subshift of A^M, and is defined by the following property: for any finite B ⊆ M and any word b ∈ A^B, b is admissible to Λ(Φ, μ) if and only if liminf_{t→∞} Φ^t μ[b] > 0.

Proposition 44 Let Φ ∈ CA(A^M) and μ ∈ Meas(A^M, σ).
(a) If wk*-lim_{t→∞} Φ^t μ = ν, then Λ(Φ, μ) = supp(ν).

Suppose M = Z.
(b) If Φ is surjective and has an equicontinuous point, and μ has full support on A^Z, then Λ(Φ, μ) = A^Z.
(c) If Φ is left- or right-permutative and μ is connected (see below), then Λ(Φ, μ) = A^Z.

Proof For (a), see Proposition 2 in [79]. For (b,c), see Theorems 2 and 3 in [77]; for earlier special cases of these results, see also Propositions 4 and 5 in [79]. □

Remark 45 (a) In Proposition 44(c), the measure μ is connected if there is some constant C > 0 such that, for any finite word b ∈ A* and any a ∈ A, we have μ[ba] ≥ C·μ[b] and μ[ab] ≥ C·μ[b]. For example, any Bernoulli, Markov, or N-step Markov measure with full support is connected. Also, any measure with 'complete connections' (see Example 22(a)) is connected.
(b) Proposition 44(a) shows that μ-limit sets are closely related to the weak* limits of measures. Recall from Subsect. "Asymptotic Randomization by Linear Cellular Automata" that the uniform Bernoulli measure η is the weak* limit of a large class of initial measures under the action of linear CA. Presumably the same result should hold for a much larger class of permutative CA, but so far this is unproven, except in some special cases [see Remarks 41(d,e)]. Proposition 44(a,c) implies that the limit measure of a permutative CA (if it exists) must have full support – hence it cannot be 'too far' from η.

Kůrka's Measure Attractors Let M_inv := Meas(A^M, σ) have the weak* topology, and define Φ* : M_inv → M_inv by Φ*(μ) = μ∘Φ^{-1}. Then Φ* is continuous, so we can treat (M_inv, Φ*) itself as a compact topological dynamical system. The "weak* limit measures" of Φ are simply the attracting fixed points of (M_inv, Φ*). However, even if the Φ*-orbit of a measure μ does not weak* converge to a fixed point, we can still consider the omega-limit set of μ. In particular, the limit set Φ*^∞(M_inv) is the union of the omega-limit sets of all σ-invariant initial measures under Φ*. Kůrka defines the measure attractor of Φ:

   MeasAttr(Φ) := cl( ∪ {supp(μ) ; μ ∈ Φ*^∞(M_inv)} ) ⊆ A^M ,

where cl denotes topological closure.

A configuration a ∈ A^{Z^D} is densely recurrent if any word which occurs in a does so with nonzero frequency: formally, for any finite B ⊆ Z^D,

   limsup_{N→∞} #{z ∈ [-N...N]^D ; a_{B+z} = a_B} / (2N+1)^D > 0 .

If X ⊆ A^{Z^D} is a subshift, then the densely recurrent subshift of X is the closure D of the set of all densely recurrent points in X. If μ ∈ M_inv(X), then the Birkhoff Ergodic Theorem implies that supp(μ) ⊆ D; see Proposition 8.8, p. 164 in [1]. From this it follows that M_inv(X) = M_inv(D). On the other hand, D = cl( ∪ {supp(μ) ; μ ∈ M_inv(D)} ). In other words, densely recurrent subshifts are the only subshifts which are 'covered' by their own set of shift-invariant measures.

Proposition 46 Let Φ ∈ CA(A^{Z^D}). Let D be the densely recurrent subshift of Φ^∞(A^{Z^D}). Then D = MeasAttr(Φ), and Φ*^∞(M_inv) = Meas(D, σ).

Proof The case D = 1 is Proposition 13 in [78]. The same proof works for D ≥ 2. □

Synthesis The various hybrid modes of self-organization are related as follows:

Proposition 47 Let Φ ∈ CA(A^M).
(a) Let μ ∈ Meas(A^M, σ) and let X ⊆ A^M be any closed set.
[i] If X is a topological attractor and μ has full support, then X is a μ-attractor.
[ii] If X is a μ-attractor, then X is a μ-center.
[iii] Suppose M = Z^D, and that μ is weakly σ-mixing. Let Y be the intersection of all topological attractors of Φ. If Φ has a minimal μ-attractor X, then X ⊆ Y.


[iv] If μ is σ-ergodic, then Λ(Φ, μ) ⊆ ∩ {X ⊆ A^M ; X a subshift and μ-attractor} ⊆ Φ^∞(A^{Z^D}).
[v] Thus, if μ is σ-ergodic and has full support, then Λ(Φ, μ) ⊆ ∩ {X ⊆ A^M ; X a subshift and a topological attractor}.
[vi] If X is a subshift, then (Λ(Φ, μ) ⊆ X) ⟺ (ω(Φ*, μ) ⊆ M_inv(X)).

(b) Let M = Z^D. Let B be the set of all Bernoulli measures on A^{Z^D}, and for any β ∈ B, let X_β be the minimal β-attractor for Φ (if it exists). Then there is a comeager subset A ⊆ A^{Z^D} such that ∪_{β∈B} X_β ⊆ ∩_{a∈A} ω(a, Φ).
(c) MeasAttr(Φ) = cl( ∪ {Λ(Φ, μ) ; μ ∈ M_inv(A^M)} ).
(d) If M = Z^D, then MeasAttr(Φ) ⊆ Φ^∞(A^{Z^D}).

Proof (a)[i]: If U is a clopen subset with Φ^∞(U) = X, then U ⊆ Basin(X); thus, 0 < μ[U] ≤ μ[Basin(X)], where the "<" holds because μ has full support and U is a nonempty open set.
(a)[ii]: This follows because Basin(X) ⊆ Well(X), so μ[Well(X)] ≥ μ[Basin(X)] > 0.
(a)[iii] is Proposition 3.3 in [64]. (Again, Hurley states and proves this in the case when μ is a Bernoulli measure, but his proof only requires weak mixing.)
(a)[iv]: Let X be a subshift and a μ-attractor; we claim that Λ(Φ, μ) ⊆ X. Proposition 42(a) says μ[Basin(X)] = 1. Let B ⊆ M be any finite set. If b ∈ A^B \ X_B, then

   {a ∈ A^M ; ∃ T ∈ N such that ∀ t ≥ T, Φ^t(a)_B ≠ b} ⊇ Basin(X) .

It follows that the left-hand set has μ-measure 1, which implies that lim_{t→∞} Φ^t μ⟨b⟩ = 0 – hence b is a forbidden word in Λ(Φ, μ). Thus, all the words forbidden in X are also forbidden in Λ(Φ, μ); hence Λ(Φ, μ) ⊆ X. (The case M = Z of (a)[iv] appears as Prop. 1 in [79] and as Prop. II.27, p. 67 in [120]; see also Cor. II.30 in [120] for a slightly stronger result.)
(a)[v] follows from (a)[iv] and (a)[i].
(a)[vi] is Proposition 1 in [77] or Proposition 10 in [78]; the argument is fairly similar to (a)[iv]. (Kůrka assumes M = Z, but this is not necessary.)
(b) is Proposition 5.2 in [64].

(c) Let X ⊆ A^M be a subshift and let M_inv := M_inv(A^M). Then

   (MeasAttr(Φ) ⊆ X)
   ⟺ (supp(ν) ⊆ X , ∀ ν ∈ Φ*^∞(M_inv))
   ⟺ (ν ∈ M_inv(X) , ∀ ν ∈ Φ*^∞(M_inv))
   ⟺ (Φ*^∞(M_inv) ⊆ M_inv(X))
   ⟺ (ω(Φ*, μ) ⊆ M_inv(X) , ∀ μ ∈ M_inv)
   (*)⟺ (Λ(Φ, μ) ⊆ X , ∀ μ ∈ M_inv)
   ⟺ ( ∪ {Λ(Φ, μ) ; μ ∈ M_inv} ⊆ X ) ,

where (*) is by (a)[vi]. It follows that MeasAttr(Φ) = cl( ∪ {Λ(Φ, μ) ; μ ∈ M_inv(A^M)} ).
(d) follows immediately from Proposition 46. □

Examples and Applications The most natural examples of these hybrid modes of self-organization arise in the particle cellular automata (PCA) introduced in Subsect. "Domains, Defects, and Particles". The long-term dynamics of a PCA involves a steady reduction in particle density, as particles coalesce or annihilate one another in collisions. Thus, presumably, for almost any initial configuration a ∈ A^Z, the sequence {Φ^t(a)}_{t=1}^∞ should converge to the subshift Z of configurations containing no particles (or at least, no particles of certain types), as t → ∞. Unfortunately, this presumption is generally false if we interpret 'convergence' in the strict topological dynamical sense: occasional particles will continue to wander near the origin at arbitrarily large times in the future orbit of a (albeit with diminishing frequency), so ω(a, Φ) will not be contained in Z. However, the presumption becomes true if we instead employ one of the more flexible hybrid notions introduced above. For example, most initial probability measures μ should converge, under iteration of Φ, to a measure concentrated on configurations with few or no particles; hence we expect that Λ(Φ, μ) ⊆ Z. As discussed in Subsect. "Domains, Defects, and Particles", a result about self-organization in a PCA can sometimes be translated into an analogous result about self-organization in an associated coalescent-domain CA.

Proposition 48 Let A = {0, ±1} and let Φ ∈ CA(A^Z) be the Ballistic Annihilation Model (BAM) from Example 33. Let R := {0, +1}^Z and L := {0, -1}^Z.
(a) If μ ∈ Meas(A^Z, σ), then ν = wk*-lim_{t→∞} Φ^t(μ) exists, and has one of three forms: either

ν ∈ Meas(R, σ), or ν ∈ Meas(L, σ), or ν = δ_0, the point mass on the sequence 0 = [...000...].
(b) Thus, the measure attractor of Φ is R ∪ L (note that R ∩ L = {0}).
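The BAM dynamics of Proposition 48 can be sketched directly. The following is my own finite truncation (a segment of n cells; particles leaving the segment simply vanish), in which +1 moves right at unit speed, -1 moves left, and opposite particles annihilate when they cross or land on the same cell:

```python
# Minimal sketch of the Ballistic Annihilation Model (BAM) on a finite
# segment.  +1 = right-mover, -1 = left-mover, 0 = vacancy.

def bam_step(a):
    n = len(a)
    b = list(a)
    # crossing collision: a right-mover immediately left of a left-mover
    for i in range(n - 1):
        if b[i] == 1 and b[i + 1] == -1:
            b[i] = b[i + 1] = 0
    arrivals = [[] for _ in range(n)]
    for i, s in enumerate(b):
        if s != 0 and 0 <= i + s < n:
            arrivals[i + s].append(s)
    # on-site collision: +1 and -1 arriving at the same cell annihilate
    return [cell[0] if len(cell) == 1 else 0 for cell in arrivals]

a = [1, 0, 0, -1]
a = bam_step(a)          # the two particles approach each other
assert a == [0, 1, -1, 0]
a = bam_step(a)          # ... and annihilate
assert a == [0, 0, 0, 0]
```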


(c) In particular, if μ is a Bernoulli measure on A^Z with μ[+1] = μ[-1], then ν = δ_0.
(d) Let μ be a Bernoulli measure on A^Z.
[i] If μ[+1] > μ[-1], then R is a μ-attractor – i.e. μ[Basin(R)] > 0.
[ii] If μ[+1] < μ[-1], then L is a μ-attractor.
[iii] If μ[+1] = μ[-1], then {0} is not a μ-attractor, because μ[Basin{0}] = 0. However, Λ(Φ, μ) = {0}.

Proof (a) is Theorem 6 of [8], and (b) follows from (a). (c) follows from Theorem 2 of [42]. (d)[i,ii] were first observed by Gilman (see Sect. 3, pp. 111–112 in [45]), and later by Kůrka and Maass (see Example 4 in [80]). (d)[iii] follows immediately from (c): the statement Λ(Φ, μ) = {0} is equivalent to asserting that lim_{t→∞} Φ^t μ[±1] = 0, which is a consequence of (c). Another proof of (d)[iii] is Proposition 11 in [80]; see also Example 3 in [79] or Prop. II.32, p. 70 in [120]. □

Corollary 49 Let A = Z/3, let Φ ∈ CA(A^Z) be the CCA3 (see Example 32), and let μ be the uniform Bernoulli measure on A^Z. Then wk*-lim_{t→∞} Φ^t(μ) = (1/3)(δ_0 + δ_1 + δ_2), where δ_a is the point mass on the sequence [...aaa...] for each a ∈ A.

Proof Combine Proposition 48(c) with the factor map η in Example 34(b). See Theorem 1 of [42] for details. □

Corollary 50 Let A = {0, 1}, and let Φ ∈ CA(A^Z) be ECA#184 (see Example 31).
(a) MeasAttr(Φ) = R ∪ L, where R ⊆ A^Z is the set of sequences not containing [11], and L ⊆ A^Z is the set of sequences not containing [00].
(b) If μ is the uniform Bernoulli measure on A^Z, then wk*-lim_{t→∞} Φ^t(μ) = (1/2)(δ_0 + δ_1), where δ_0 and δ_1 are the point masses on [...010.101...] and [...101.010...].
(c) Let μ be a Bernoulli measure on A^Z.
[i] If μ[0] > μ[1], then R is a μ-attractor – i.e. μ[Basin(R)] > 0.
[ii] If μ[0] < μ[1], then L is a μ-attractor.

Proof sketch Let η be the factor map from Example 34(a). To prove (a), apply η to Proposition 48(b); see Example 26, Sect. 9 in [78] for details. To prove (b), apply η to Proposition 48(c); see Proposition 12 in [80] for details. To prove (c), apply η to Proposition 48(d)[i,ii]. □
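Corollary 50 can be illustrated numerically. The following is my own finite sketch (assuming a ring of cells with periodic boundary and the standard truth table for rule 184): starting below car density 1/2, the jams ([11] blocks) disappear, while the number of cars is conserved.

```python
# Sketch of ECA#184 (the "traffic rule") on a ring: a car (1) advances iff
# the cell ahead is empty.

def eca184_step(a):
    n = len(a)
    out = []
    for i in range(n):
        l, c, r = a[(i - 1) % n], a[i], a[(i + 1) % n]
        # a cell stays occupied iff its car is blocked; it becomes occupied
        # iff the left neighbor's car moves in
        out.append(1 if (c == 1 and r == 1) or (c == 0 and l == 1) else 0)
    return out

a = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]    # density 4/10 < 1/2, with a jam
for _ in range(len(a)):
    a = eca184_step(a)
has_jam = any(a[i] == 1 and a[(i + 1) % len(a)] == 1 for i in range(len(a)))
assert not has_jam                     # free-flowing: no adjacent cars
assert sum(a) == 4                     # the car number is conserved
```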

the measure attractors and μ-attractors of CCA3 and ECA#184.
(b) Recall that ECA#184 is a model of single-lane traffic, where each car is either stopped or moving rightwards at unit speed. Blank [10] has extended Corollary 50(c) to a much broader class of CA models of multi-lane, multi-speed traffic. For any such model, let R ⊆ A^Z be the set of 'free-flowing' configurations where each car has enough space to move rightwards at its maximum possible speed. Let L ⊆ A^Z be the set of 'jammed' configurations where the cars are so tightly packed that the jammed clusters can propagate (leftwards) through the cars at maximum speed. If μ is any Bernoulli measure, then μ[Basin(R)] = 1 if the μ-average density of cars is less than 1/2, whereas μ[Basin(L)] = 1 if the density is greater than 1/2 (Theorems 1.2 and 1.3 in [10]). Thus, L ⊔ R is a (non-lean) μ-attractor, although not a topological attractor (Lemma 2.13 in [10]).

Example 52 A cyclic addition and ballistic annihilation model (CABAM) contains the same 'moving' particles ±1 as the BAM (Example 33), but also has one or more 'stationary' particle types. Let N ∈ N with N ≥ 3, and let P = {1, 2, ..., N-1} ⊆ Z/N, where we identify N-1 with -1, modulo N. It will be convenient to represent the 'vacant' state ∅ as 0; thus, A = Z/N. The particles +1 and -1 have velocities and collisions as in the BAM, namely: v(+1) = 1,

v(-1) = -1, and (+1) + (-1) ↝ ∅.

We set v(p) = 0 for all p ∈ [2 ... N-2], and employ the following collision rule:

   If p_{-1} + p_0 + p_1 ≡ q (mod N), then p_{-1} + p_0 + p_1 ↝ q .   (5)

(Here, any one of p_{-1}, p_0, p_1, or q could be 0, signifying vacancy.) For example, if N = 5 and a (rightward-moving) type +1 particle strikes a (stationary) type 3 particle, then the +1 particle is annihilated and the 3 particle turns into a (stationary) 4 particle. If another +1 particle hits the 4 particle, then both are annihilated, leaving a vacancy (0). Let B = Z/N, and let Ψ ∈ CA(B^Z) be the CABAM. Then the set of fixed points of Ψ is F = {f ∈ B^Z ; f_z ≠ ±1, ∀ z ∈ Z}. Note that, if b ∈ Basin[F] – that is, if ω(b, Ψ) ⊆ F – then in fact lim_{t→∞} Ψ^t(b) exists and is a Ψ-fixed point.

Proposition 53 Let B = Z/N, let Ψ ∈ CA(B^Z) be the CABAM, and let μ be the uniform Bernoulli measure on B^Z. If N ≥ 5, then F is a 'global' μ-attractor – that is, μ[Basin(F)] = 1. However, if N ≤ 4, then μ[Basin(F)] = 0.

Proof See Theorem 1 of [41]. □
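The collision arithmetic (5) is easy to tabulate. A small sketch of my own, for N = 5:

```python
# Sketch of the CABAM collision rule (5) for N = 5: colliding particles
# merge into their sum mod N, with 0 denoting vacancy.
N = 5

def collide(*particles):
    return sum(particles) % N

assert collide(+1, 3) == 4          # +1 strikes a stationary 3 -> a 4
assert collide(+1, 4) == 0          # +1 strikes a 4 -> mutual annihilation
assert collide(+1, 2, N - 1) == 2   # +1 and -1 (= N-1) hit a stationary 2
```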




Let A = Z/N and let Φ ∈ CA(A^Z) be the N-color CCA from Example 32. Then the set of fixed points of Φ is F = {f ∈ A^Z ; f_z - f_{z+1} ≠ ±1, ∀ z ∈ Z}. Note that, if a ∈ Basin[F], then in fact lim_{t→∞} Φ^t(a) exists and is a Φ-fixed point.

Corollary 54 Let A = Z/N, let Φ ∈ CA(A^Z) be the N-color CCA, and let μ be the uniform Bernoulli measure on A^Z. If N ≥ 5, then F is a 'global' μ-attractor – that is, μ[Basin(F)] = 1. However, if N ≤ 4, then μ[Basin(F)] = 0.

Proof sketch Let B = Z/N and let Ψ ∈ CA(B^Z) be the N-particle CABAM. Construct a factor map η : A^Z → B^Z with local rule η(a0, a1) := (a0 - a1) mod N, similar to Example 34(b). Then Ψ∘η = η∘Φ, and the Ψ-particles track the Φ-domain boundaries. Now apply η to Proposition 53. □

Example 55 Let A = {0, 1} and let H = {-1, 0, 1}. Elementary Cellular Automaton #18 is the one-dimensional CA with local rule φ : A^H → A given by φ[100] = 1 = φ[001], and φ(a) = 0 for all other a ∈ A^H. Empirically, ECA#18 has one stable phase: the odd sofic shift S, the subshift of sequences in which any pair of consecutive ones is separated by an odd number of zeroes. Thus, a defect is any word of the form 1 0^{2m} 1 (where 0^{2m} represents 2m zeroes) for any m ∈ N. Defects can be arbitrarily large, they can grow and move arbitrarily quickly, and they can coalesce across arbitrarily large distances; thus, it is impossible to construct a particle CA which tracks the motion of these defects. Nevertheless, in computer simulations, one can visually follow the moving defects through time, and they appear to perform random walks. Over time, the density of defects decreases as they randomly collide and annihilate. This was empirically observed by Grassberger [47,48] and Boccara et al. [11]. Lind (see Sect. 5 in [82]) conjectured that this gradual elimination of defects causes almost all initial conditions to converge, in some sense, to S under application of Φ.
Eloranta and Nummelin [34] proved that the defects of Φ individually perform random walks. However, the motions of neighboring defects are highly correlated; they are not independent random walks, so one cannot use standard results about stochastic interacting particle systems to conclude that the defect density converges to zero. To solve problems like this, Kůrka [77] developed a theory of 'particle weight functions' for CA. Let A* be the set of all finite words in the alphabet A. A particle weight function is a bounded function

p : A* → ℕ, so that, for any a ∈ A^ℤ, we interpret

  #_p(a) := Σ_{r=0}^{∞} Σ_{z∈ℤ} p(a_{[z…z+r]})   and
  δ_p(a) := Σ_{r=0}^{∞} lim_{N→∞} (1/2N) Σ_{z=−N}^{N} p(a_{[z…z+r]})

to be, respectively, the 'number of particles' and the 'density of particles' in configuration a (clearly, if #_p(a) is finite, then δ_p(a) = 0). The function p can count the single-letter 'particles' of a PCA, or the short-length 'domain boundaries' found in ECA#184 and the CCA of Examples 31 and 32. However, p can also track the arbitrarily large defects of ECA#18. For example, define p₁₈(1 0^{2m} 1) = 1 (for any m ∈ ℕ), and define p₁₈(a) = 0 for all other a ∈ A*. Let Z_p := {a ∈ A^ℤ : #_p(a) = 0} be the set of vacuum configurations. (For example, if p = p₁₈ as above, then Z_p is just the odd sofic shift S.) If the iteration of a CA Φ decreases the number (or density) of particles, then one expects Z_p to be a limit set for Φ in some sense. Indeed, if μ ∈ M^σ_inv := Meas(A^ℤ; σ), then we define λ_p(μ) := ∫_{A^ℤ} δ_p dμ. If Φ is 'p-decreasing' in a certain sense, then λ_p acts as a Lyapunov function for the dynamical system (M^σ_inv, Φ). Thus, with certain technical assumptions, we can show that, if μ ∈ M^σ_inv is connected, then the (Φ, μ)-limit set is contained in Z_p (see Theorem 8 in [77]). Furthermore, under certain conditions, MeasAttr(Φ) ⊆ Z_p (see Theorem 7 in [77]). Using this machinery, Kůrka proved:

Proposition 56  Let Φ : A^ℤ → A^ℤ be ECA#18, and let S ⊂ A^ℤ be the odd sofic shift. If μ ∈ Meas(A^ℤ; σ) is connected, then the (Φ, μ)-limit set is contained in S.

Proof  See Example 6.3 of [77].
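On a finite word, the particle count #_{p₁₈} is just the number of (possibly overlapping) occurrences of the patterns 1 0^{2m} 1. A small sketch (our own helper, not from [77]):

```python
import re

def count_defects_p18(word):
    """#_{p18}(a) on a finite binary word: number of (possibly
    overlapping) factors of the form 1 0^{2m} 1, m >= 0 -- the
    particle weight tracking the defects of ECA#18."""
    total = 0
    for m in range(0, len(word) // 2 + 1):
        pattern = '1' + '0' * (2 * m) + '1'
        # lookahead counts overlapping occurrences
        total += len(re.findall('(?=' + pattern + ')', word))
    return total

print(count_defects_p18('110001001'))  # '11' and '1001' each occur once -> 2
print(count_defects_p18('101'))        # odd gap: admissible to S -> 0
```

Words admissible to the odd shift S always return 0, matching Z_{p₁₈} = S.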



Measurable Dynamics

If Φ ∈ CA(A^M) and μ ∈ Meas(A^M; Φ), then the triple (A^M, μ, Φ) is a measure-preserving dynamical system (MPDS), and thus amenable to the methods of classical ergodic theory.

Mixing and Ergodicity

If Φ ∈ CA(A^M), then the topological dynamical system (A^M, Φ) is topologically transitive (or topologically ergodic) if, for any open subsets U, V ⊆ A^M, there exists t ∈ ℕ such that U ∩ Φ^{−t}(V) ≠ ∅. Equivalently, there exists some a ∈ A^M whose orbit O(a) := {Φ^t(a)}_{t=0}^{∞} is dense in A^M. If μ ∈ Meas(A^M; Φ), then the system (A^M, μ, Φ) is ergodic if, for any nontrivial measurable U, V ⊆ A^M, there exists some t ∈ ℕ such that


μ[U ∩ Φ^{−t}(V)] > 0. The system (A^M, μ, Φ) is totally ergodic if (A^M, μ, Φ^n) is ergodic for every n ∈ ℕ. The system (A^M, μ, Φ) is (strongly) mixing if, for any nontrivial measurable U, V ⊆ A^M,

  lim_{t→∞} μ[U ∩ Φ^{−t}(V)] = μ[U]·μ[V].   (6)

The system (A^M, μ, Φ) is weakly mixing if the limit (6) holds as n→∞ along an increasing subsequence {t_n}_{n=1}^{∞} of density one, i.e. such that lim_{n→∞} t_n/n = 1. For any M ∈ ℕ, we say (A^M, μ, Φ) is M-mixing if, for any measurable U₀, U₁, …, U_M ⊆ A^M,

  lim_{|t_n − t_m|→∞, ∀ n≠m ∈ [0…M]} μ[⋂_{m=0}^{M} Φ^{−t_m}(U_m)] = ∏_{m=0}^{M} μ[U_m]   (7)

(thus, 'strong' mixing is 1-mixing). We say (A^M, μ, Φ) is multimixing (or mixing of all orders) if (A^M, μ, Φ) is M-mixing for all M ∈ ℕ. We say (A^M, μ, Φ) is a Bernoulli endomorphism if its natural extension is measurably isomorphic to a system (B^ℤ, β, σ), where β ∈ Meas(B^ℤ; σ) is a Bernoulli measure. We say (A^M, μ, Φ) is a Kolmogorov endomorphism if its natural extension is a Kolmogorov (or "K") automorphism.

Theorem 57  Let Φ ∈ CA(A^M), let μ ∈ Meas(A^M; Φ), and let X = supp(μ). Then X is a compact, Φ-invariant set. Furthermore: (μ, Φ) is Bernoulli ⟹ (μ, Φ) is Kolmogorov ⟹ (μ, Φ) is multimixing ⟹ (μ, Φ) is mixing ⟹ (μ, Φ) is weakly mixing ⟹ (μ, Φ) is totally ergodic ⟹ (μ, Φ) is ergodic ⟹ the system (X, Φ) is topologically transitive ⟹ Φ : X → X is surjective.

Theorem 58  Let Φ ∈ CA(A^ℕ) be posexpansive (see Subsect. "Posexpansive and Permutative CA"). Then (A^ℕ, Φ) has topological entropy log₂(k) for some k ∈ ℕ, Φ preserves the uniform measure μ, and (A^ℕ, μ, Φ) is a uniformly distributed Bernoulli endomorphism on an alphabet of cardinality k.

Proof  Extend the argument of Theorem 18. See Corollary 3.10 in [9] or Theorem 4.8(5) in [86]. □

Example 59  Suppose Φ ∈ CA(A^ℕ) is right-permutative, with neighborhood [r…R], where 0 ≤ r < R. Then h_top(Φ) = log₂(|A|^R), so Theorem 58 says that (A^ℕ, μ, Φ) is a uniformly distributed Bernoulli endomorphism on the alphabet B := A^R. In this case, it is easy to see this directly: if Φ_B^ℕ : A^ℕ → B^ℕ is as in Eq. (2), then β := Φ_B^ℕ(μ) is the uniform Bernoulli measure on B^ℕ, and Φ_B^ℕ is an isomorphism from (A^ℕ, μ, Φ) to (B^ℕ, β, σ).
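Permutativity is a purely combinatorial property of the local rule, so it can be checked by brute force. A minimal sketch (our own helper, not from the article):

```python
from itertools import product

def is_right_permutative(phi, alphabet, nbhd_size):
    """A local rule phi: A^n -> A is right-permutative iff, for every
    fixed choice of the first n-1 inputs, a |-> phi(..., a) is a
    bijection of the alphabet."""
    for prefix in product(alphabet, repeat=nbhd_size - 1):
        images = {phi(*prefix, a) for a in alphabet}
        if len(images) != len(alphabet):
            return False
    return True

xor_rule = lambda a0, a1: (a0 + a1) % 2            # additive rule: permutative
maj_rule = lambda a0, a1, a2: int(a0 + a1 + a2 >= 2)  # majority: not permutative

print(is_right_permutative(xor_rule, (0, 1), 2))  # True
print(is_right_permutative(maj_rule, (0, 1), 3))  # False
```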

Theorem 60  Let Φ ∈ CA(A^ℤ) have neighborhood [L…R]. Suppose that either (a) 0 ≤ L < R and Φ is right-permutative; or (b) L < R ≤ 0 and Φ is left-permutative; or (c) L < R and Φ is bipermutative; or (d) Φ is posexpansive. Then Φ preserves the uniform measure μ, and (A^ℤ, μ, Φ) is a Bernoulli endomorphism.

Proof  For cases (a) and (b), see Theorem 2.2 in [125]. For case (c), see Theorem 2.7 in [125] or Corollary 7.3 in [74]. For (d), extend the argument of Theorem 14; see Theorem 4.9 in [86]. □

Remark  Theorem 60(c) can be extended to some higher-dimensional permutative CA using Proposition 1 in [3]; see Remark 12(b).

Theorem 61  Let Φ ∈ CA(A^ℤ) have neighborhood [L…R]. Suppose that either (a) Φ is surjective and 0 < L ≤ R; or (b) Φ is surjective and L ≤ R < 0; or (c) Φ is right-permutative and R ≠ 0; or (d) Φ is left-permutative and L ≠ 0. Then Φ preserves μ, and (A^ℤ, μ, Φ) is a Kolmogorov endomorphism.

Proof  Cases (a) and (b) are Theorem 2.4 in [125]. Cases (c) and (d) are from [129]. □

Corollary 62  Any CA satisfying the hypotheses of Theorem 61 is multimixing.

Proof  This follows from Theorems 57 and 61. See also Theorem 3.2 in [131] for a direct proof that any CA satisfying hypotheses (a) or (b) is 1-mixing, and Theorem 6.6 in [74] for a proof that any CA satisfying hypotheses (c) or (d) is multimixing. □

Let Φ ∈ CA(A^{ℤ^D}) have neighborhood H. An element x ∈ H is extremal if ⟨x, x⟩ > ⟨x, h⟩ for all h ∈ H \ {x}. We say Φ is extremally permutative if Φ is permutative in some extremal coordinate.

Theorem 63  Let Φ ∈ CA(A^{ℤ^D}) and let μ be the uniform measure. If Φ is extremally permutative, then (A^{ℤ^D}, μ, Φ) is mixing.

Proof  See Theorem A in [144] for the case D = 2 and A = ℤ/2. Willson described Φ as 'linear' in an extremal coordinate (which is equivalent to permutative when A = ℤ/2), and then concluded that Φ was 'ergodic'; however, he did this by explicitly showing that Φ was mixing. His proof technique easily generalizes to any extremally permutative CA on any alphabet, and any D ≥ 1. □


Theorem 64  Let A = ℤ/m. Let Φ ∈ CA(A^{ℤ^D}) have linear local rule φ : A^H → A given by φ(a_H) = Σ_{h∈H} c_h·a_h, where c_h ∈ ℤ for all h ∈ H. Let μ be the uniform measure on A^{ℤ^D}. The following are equivalent:
(a) Φ preserves μ and (A^{ℤ^D}, μ, Φ) is ergodic.
(b) (A^{ℤ^D}, Φ) is topologically transitive.
(c) gcd{c_h}_{0≠h∈H} is coprime to m.
(d) For all prime divisors p of m, there is some nonzero h ∈ H such that c_h is not divisible by p.
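Criterion (c) is mechanical to check. A small sketch (our own helper; `coeffs` maps the nonzero neighborhood sites h to their coefficients c_h):

```python
from math import gcd
from functools import reduce

def is_ergodic_linear_ca(coeffs, m):
    """Criterion (c) of Theorem 64: a linear CA over Z/m with rule
    sum_h c_h a_h is ergodic for the uniform measure iff the gcd of
    the coefficients at nonzero sites is coprime to m."""
    g = reduce(gcd, coeffs.values(), 0)
    return gcd(g, m) == 1

# ECA#90: phi(a) = a_{-1} + a_{+1} over Z/2 -- ergodic
print(is_ergodic_linear_ca({-1: 1, +1: 1}, 2))   # True
# phi(a) = 2 a_{-1} + 2 a_{+1} over Z/4 -- not ergodic
print(is_ergodic_linear_ca({-1: 2, +1: 2}, 4))   # False
```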

Proof  Theorem 3.2 in [18]; see also [17]. For a different proof in the case D = 2, see Theorem 6 in [123]. □

Spectral Properties

If μ ∈ Meas(A^M), then let L² = L²(A^M, μ) be the set of measurable functions f : A^M → ℂ such that ‖f‖₂ := (∫_{A^M} |f|² dμ)^{1/2} is finite. If Φ ∈ CA(A^M) and μ ∈ Meas(A^M; Φ), then Φ defines a unitary linear operator Φ* : L² → L² by Φ*(f) = f∘Φ for all f ∈ L². If f ∈ L², then f is an eigenfunction of Φ, with eigenvalue c ∈ ℂ, if Φ*(f) = c·f. By definition of Φ*, any eigenvalue must be an element of the unit circle 𝕋 := {c ∈ ℂ : |c| = 1}. Let S_Φ ⊆ 𝕋 be the set of all eigenvalues of Φ, and for any s ∈ S_Φ, let E_s(Φ) := {f ∈ L² : Φ* f = s·f} be the corresponding eigenspace. For example, if f is constant μ-almost everywhere, then f ∈ E₁(Φ). Let E(Φ) := ⊕_{s∈S_Φ} E_s(Φ). Note that S_Φ is a group. Indeed, if s₁, s₂ ∈ S_Φ, and f₁ ∈ E_{s₁} and f₂ ∈ E_{s₂}, then (f₁f₂) ∈ E_{s₁s₂} and (1/f₁) ∈ E_{1/s₁}. Thus, S_Φ is called the spectral group of Φ. If s ∈ S_Φ, then heuristically, an s-eigenfunction is an 'observable' of the dynamical system (A^M, μ, Φ) which exhibits quasiperiodically recurrent behavior. Thus, the spectral properties of Φ characterize the 'recurrent aspect' of its dynamics (or the lack thereof). For example:

• (A^M, μ, Φ) is ergodic ⟺ E₁(Φ) contains only constant functions ⟺ dim[E_s(Φ)] = 1 for all s ∈ S_Φ.
• (A^M, μ, Φ) is weakly mixing (see Subsect. "Mixing and Ergodicity") ⟺ E(Φ) contains only constant functions ⟺ (A^M, μ, Φ) is ergodic and S_Φ = {1}.

We say (A^M, μ, Φ) has discrete spectrum if L² is spanned by E(Φ). In this case, (A^M, μ, Φ) is measurably isomorphic to an MPDS defined by translation on a compact abelian group (e.g. an irrational rotation of a torus, an odometer, etc.).

If μ ∈ Meas(A^M; σ), then there is a natural unitary M-action on L², where σ^m_*(f) = f∘σ^m. A character of M is a monoid homomorphism χ : M → 𝕋. The set M̂ of all characters is a group under pointwise multiplication, called the dual group of M. If f ∈ L² and χ ∈ M̂, then f is called a χ-eigenfunction of (A^M, μ, σ) if σ^m_*(f) = χ(m)·f for all m ∈ M; in this case, χ is called an eigencharacter. The spectral group of (A^M, μ, σ) is then the subgroup S_σ ⊆ M̂ of all eigencharacters. For any χ ∈ S_σ, let E_χ(σ) be the corresponding eigenspace, and let E(σ) := ⊕_{χ∈S_σ} E_χ(σ).

• (A^M, μ, σ) is ergodic ⟺ E₁(σ) contains only constant functions ⟺ dim[E_χ(σ)] = 1 for all χ ∈ S_σ.
• (A^M, μ, σ) is weakly mixing ⟺ E(σ) contains only constant functions ⟺ (A^M, μ, σ) is ergodic and S_σ = {1}.

(A^M, μ, σ) has discrete spectrum if L² is spanned by E(σ). In this case, the system (A^M, μ, σ) is measurably isomorphic to an action of M by translations on a compact abelian group.

Example 65  Let M = ℤ; then any character χ : ℤ → 𝕋 has the form χ(n) = cⁿ for some c ∈ 𝕋, so a χ-eigenfunction is just an eigenfunction with eigenvalue c. In this case, the aforementioned spectral properties for the ℤ-action by shifts are equivalent to the corresponding spectral properties of the CA Φ = σ. Bernoulli measures and irreducible Markov chains are weakly mixing. On the other hand, several important classes of symbolic dynamical systems have discrete spectrum, including Sturmian shifts, constant-length substitution shifts, and regular Toeplitz shifts; see ▶ Dynamics of Cellular Automata in Non-compact Spaces.

Proposition 66  Let Φ ∈ CA(A^M), and let μ ∈ Meas(A^M; Φ, σ) be σ-ergodic.
(a) E(σ) ⊆ E(Φ).
(b) If (A^M, μ, σ) has discrete spectrum, then so does (A^M, μ, Φ).
(c) Suppose μ is Φ-ergodic. If (A^M, μ, σ) is weakly mixing, then so is (A^M, μ, Φ).

Proof  (a) Suppose χ ∈ M̂ and f ∈ E_χ. Then f∘Φ ∈ E_χ also, because for all m ∈ M, f∘Φ∘σ^m = f∘σ^m∘Φ = χ(m)·f∘Φ. But if (A^M, μ, σ) is ergodic, then dim[E_χ(σ)] = 1; hence f∘Φ must be a scalar multiple of f. Thus, f is also an eigenfunction for Φ.
(b) follows from (a).
(c) By reversing the roles of Φ and σ in (a), we see that E(Φ) ⊆ E(σ). But if (A^M, μ, σ) is weakly mixing, then E(σ) = {constant functions}. Thus, (A^M, μ, Φ) is also weakly mixing. □

Example 67  (a) Let μ be any Bernoulli measure on A^M. If μ is Φ-invariant and Φ-ergodic, then (A^M, μ, Φ) is weakly mixing (because (A^M, μ, σ) is weakly mixing).


(b) Let P ∈ ℕ and suppose μ is a Φ-invariant measure supported on the set X_P of P-periodic sequences (see Proposition 10). Then (A^ℤ, μ, σ) has discrete spectrum (with rational eigenvalues). But X_P is finite, so the system (X_P, Φ) is also periodic; hence (A^ℤ, μ, Φ) also has discrete spectrum (with rational eigenvalues).
(c) Downarowicz [25] has constructed an example of a regular Toeplitz shift X ⊂ A^ℤ and Φ ∈ CA(A^ℤ) (not the shift) such that Φ(X) ⊆ X. Any regular Toeplitz shift is uniquely ergodic, and the unique shift-invariant measure μ has discrete spectrum; thus, (A^ℤ, μ, Φ) also has discrete spectrum.

Aside from Examples 67(b,c), the literature contains no examples of discrete-spectrum invariant measures for CA; this is an interesting area for future research.

Entropy

Let Φ ∈ CA(A^M). For any finite B ⊂ M, let B := A^B, let Φ_B^ℕ : A^M → B^ℕ be as in Eq. (2), and let X := Φ_B^ℕ(A^M) ⊆ B^ℕ; then define

  H_top(B, Φ) := h_top(X) = lim_{T→∞} (1/T) log₂(#X_{[0…T)}).

If μ ∈ Meas(A^M; Φ), let ν := Φ_B^ℕ(μ); then ν is a σ-invariant measure on B^ℕ. Define

  H_μ(B, Φ) := h_ν(σ) = −lim_{T→∞} (1/T) Σ_{b∈B^{[0…T)}} ν[b] log₂(ν[b]).

The topological entropy of (A^M, Φ) and the measurable entropy of (A^M, Φ, μ) are then defined

  h_top(Φ) := sup_{finite B⊂M} H_top(B, Φ)   and   h_μ(Φ) := sup_{finite B⊂M} H_μ(B, Φ).

The famous Variational Principle states that h_top(Φ) = sup{h_μ(Φ) : μ ∈ Meas(A^M; Φ)}; see Sect. 10 of ▶ Topological Dynamics of Cellular Automata. If M has more than one dimension (e.g. M = ℤ^D or ℕ^D for D ≥ 2), then most CA on A^M have infinite entropy. Thus, entropy is mainly of interest in the case M = ℤ or ℕ. Coven [23] was the first to compute the topological entropy of a CA; he showed that h_top(Φ) = 1 for a large class of left-permutative, one-sided CA on {0,1}^ℕ (which have since been called Coven CA). Later, Lind [83] showed how to construct CA whose topological entropy is any element of a countable dense subset of ℝ₊, consisting of logarithms of certain algebraic numbers.
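The definition of H_top(B, Φ) can be explored numerically by counting distinct columns. As a sanity check (our own illustration, under the assumption of a single-site window B = {0}): for the left shift σ on {0,1}^ℤ, every binary T-word occurs as a column, so log₂(#columns)/T should approach h_top(σ) = 1.

```python
import random
from math import log2

def step_shift(row):
    """The left shift sigma on a finite window (drop the first cell)."""
    return row[1:]

def column_words(num_samples, T, seed=0):
    """Monte-Carlo sample of the T-step traces seen at site 0 under the
    shift; #distinct traces approximates #X_[0...T) for B = {0}."""
    rng = random.Random(seed)
    words = set()
    for _ in range(num_samples):
        row = [rng.randint(0, 1) for _ in range(T + 1)]
        trace = []
        for _ in range(T):
            trace.append(row[0])
            row = step_shift(row)
        words.add(tuple(trace))
    return words

T = 8
words = column_words(5000, T)
print(len(words), log2(len(words)) / T)  # ~256 distinct columns, entropy ~1
```

With 5000 samples, essentially all 2⁸ = 256 possible columns are observed, so the estimate is close to 1 bit per time step.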

Theorems 14 and 18(b) above characterize the topological entropy of posexpansive CA. However, Hurd et al. [62] showed that there is no algorithm which can compute the topological entropy of an arbitrary CA; see ▶ Tiling Problem and Undecidability in Cellular Automata.

Measurable entropy has also been computed for a few special classes of CA. For example, if Φ ∈ CA(A^ℤ) is bipermutative with neighborhood {0, 1} and μ ∈ Meas(A^ℤ; Φ, σ) is σ-ergodic, then h_μ(Φ) = log₂(K) for some integer K ≤ |A| (see Thm 4.1 in [113]). If μ is the uniform measure, and Φ is posexpansive, then Theorems 58 and 60 above characterize h_μ(Φ). Also, if Φ satisfies the conditions of Theorem 61, then h_μ(Φ) > 0, and furthermore, all factors of the MPDS (A^ℤ, μ, Φ) also have positive entropy.

However, unlike abstract dynamical systems, CA come with an explicit spatial 'geometry'. The most fruitful investigations of CA entropy are those which have interpreted entropy in terms of how information propagates through this geometry.

Lyapunov Exponents

Wolfram [150] suggested that the propagation speed of 'perturbations' in a one-dimensional CA Φ could transform 'spatial' entropy [i.e. h(σ)] into 'temporal' entropy [i.e. h(Φ)]. He compared this propagation speed to the 'Lyapunov exponent' of a smooth dynamical system: it determines the exponential rate of divergence between two initially close Φ-orbits (see pp. 172, 261 and 514 in [151]). Shereshevsky [126] formalized Wolfram's intuition and proved the conjectured entropy relationship; his results were later improved by Tisseur [141].

Let Φ ∈ CA(A^ℤ), let a ∈ A^ℤ, and let z ∈ ℤ. Define

  W⁺_z(a) := {w ∈ A^ℤ : w_{[z…∞)} = a_{[z…∞)}}   and   W⁻_z(a) := {w ∈ A^ℤ : w_{(−∞…z]} = a_{(−∞…z]}}.

Thus, we obtain each w ∈ W⁺_z(a) (respectively W⁻_z(a)) by 'perturbing' a somewhere to the left (resp. right) of coordinate z. Next, for any t ∈ ℕ, define

  Λ̃⁺_t(a) := min{z ∈ ℕ : Φ^t(W⁺_0(a)) ⊆ W⁺_z(Φ^t[a])}   and
  Λ̃⁻_t(a) := min{z ∈ ℕ : Φ^t(W⁻_0(a)) ⊆ W⁻_{−z}(Φ^t[a])}.

Thus, Λ̃^±_t measures the farthest distance which any perturbation of a at coordinate 0 could have propagated by time t. Next, define Λ^±_t(a) := max_{z∈ℤ} Λ̃^±_t[σ^z(a)]. Then


Shereshevsky [126] defined the (maximum) Lyapunov exponents

  λ⁺(Φ, a) := lim_{t→∞} (1/t) Λ⁺_t(a)   and   λ⁻(Φ, a) := lim_{t→∞} (1/t) Λ⁻_t(a),

whenever these limits exist. Let G(Φ) := {g ∈ A^ℤ : λ^±(Φ, g) both exist}. The subset G(Φ) is 'generic' within A^ℤ in a very strong sense, and the Lyapunov exponents detect 'chaotic' topological dynamics.

Proposition 68  Let Φ ∈ CA(A^ℤ).
(a) Let μ ∈ Meas(A^ℤ; σ). Suppose that either: [i] μ is also Φ-invariant; or: [ii] μ is σ-ergodic and supp(μ) is a Φ-invariant subset. Then μ(G) = 1.
(b) The set G and the functions λ^±(Φ, ·) are (Φ, σ)-invariant. Thus, if μ is either Φ-ergodic or σ-ergodic, then there exist constants λ^±_μ(Φ) ≥ 0 such that λ^±(Φ, g) = λ^±_μ(Φ) for all g ∈ G.
(c) If Φ is posexpansive, then there is a constant c > 0 such that λ^±(Φ, g) ≥ c for all g ∈ G.
(d) Let μ be the uniform Bernoulli measure. If Φ is surjective, then h_top(Φ) ≤ [λ⁺_μ(Φ) + λ⁻_μ(Φ)]·log₂|A|.

Proof  (a) follows from the fact that, for any a ∈ A^ℤ, the sequence [Λ^±_t(a)]_{t∈ℕ} is subadditive in t. Condition [i] is Theorem 1 in [126], and follows from Kingman's subadditive ergodic theorem. Condition [ii] is Proposition 3.1 in [141]. (b) is clear by definition of λ^±. (c) is Theorem 5.2 in [37]. (d) is Proposition 5.3 in [141]. □

For any Φ-ergodic μ ∈ Meas(A^ℤ; Φ, σ), Shereshevsky (see Theorem 2 in [126]) showed that h_μ(Φ) ≤ [λ⁺_μ(Φ) + λ⁻_μ(Φ)]·h_μ(σ). Tisseur later improved this estimate. For any T ∈ ℕ, let

  Ĩ⁺_T(a) := min{z ∈ ℕ : ∀ t ∈ [1…T], Φ^t(W⁺_z(a)) ⊆ W⁺_0(Φ^t[a])}   and
  Ĩ⁻_T(a) := min{z ∈ ℕ : ∀ t ∈ [1…T], Φ^t(W⁻_{−z}(a)) ⊆ W⁻_0(Φ^t[a])}.

Next, for any μ ∈ Meas(A^ℤ; σ), define Î^±_T(μ) := ∫_{A^ℤ} Ĩ^±_T(a) dμ[a]. Tisseur then defined the average Lyapunov exponents: I^±_μ(Φ) := lim inf_{T→∞} Î^±_T(μ)/T.
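The growth rate Λ⁺_t(a)/t can be estimated in simulation by tracking how far a single flipped cell propagates. The sketch below (our own illustration; it perturbs one configuration at one site rather than taking the maximum over all translates, so it only probes a lower bound) uses ECA#30:

```python
import random

def eca_step(row, rule=30):
    """One step of an elementary CA with fixed zero boundary."""
    n = len(row)
    out = []
    for i in range(n):
        l = row[i - 1] if i > 0 else 0
        c = row[i]
        r = row[i + 1] if i < n - 1 else 0
        out.append((rule >> (l << 2 | c << 1 | r)) & 1)
    return out

def lyapunov_plus_estimate(rule=30, n=601, T=200, seed=1):
    """Flip one cell and track the rightmost site where the two
    orbits disagree after T steps, in cells per step."""
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n)]
    b = list(a)
    b[0] ^= 1  # perturbation at coordinate 0
    for _ in range(T):
        a, b = eca_step(a, rule), eca_step(b, rule)
    diffs = [i for i in range(n) if a[i] != b[i]]
    return max(diffs) / T

print(lyapunov_plus_estimate())
```

Because rule 30 is left-permutative (its output depends bijectively on the leftmost input), the rightmost damage front advances exactly one cell per step, so the estimate returns 1.0.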

Theorem 69  Let Φ ∈ CA(A^ℤ) and let μ ∈ Meas(A^ℤ; σ).
(a) If supp(μ) is Φ-invariant, then I⁺_μ(Φ) ≤ λ⁺_μ(Φ) and I⁻_μ(Φ) ≤ λ⁻_μ(Φ), and one or both inequalities are sometimes strict.

(b) If μ is σ-ergodic and Φ-invariant, then h_μ(Φ) ≤ [I⁺_μ(Φ) + I⁻_μ(Φ)]·h_μ(σ), and this inequality is sometimes strict.
(c) If supp(μ) contains Φ-equicontinuous points, then I⁺_μ(Φ) = I⁻_μ(Φ) = h_μ(Φ) = 0.

Proof  See [141]: (a) is Proposition 3.2 and Example 6.1; (b) is Theorem 5.1 and Example 6.2; and (c) is Proposition 5.2. □

Directional Entropy

Milnor [98,99] introduced directional entropy to capture the intuition that information in a CA propagates in particular directions with particular 'velocities', and that different CA 'mix' information in different ways. Classical entropy is unable to detect this informational anisotropy. For example, if A = {0, 1} and Φ ∈ CA(A^ℤ) has local rule φ(a₀, a₁) = a₀ + a₁ (mod 2), then h_top(Φ) = 1 = h_top(σ), despite the fact that Φ vigorously 'mixes' information together and propagates any 'perturbation' outwards in an expanding cone, whereas σ merely shifts information to the left in a rigid and essentially trivial fashion.

If Φ ∈ CA(A^{ℤ^D}), then a complete history for Φ is a sequence (a_t)_{t∈ℤ} ∈ (A^{ℤ^D})^ℤ ≅ A^{ℤ^{D+1}} such that Φ(a_t) = a_{t+1} for all t ∈ ℤ. Let X^Hist := X^Hist(Φ) ⊆ A^{ℤ^{D+1}} be the subshift of all complete histories for Φ, and let σ be the ℤ^{D+1} shift action on X^Hist; then (X^Hist, σ) is conjugate to the natural extension of the system (Y, Φ, σ), where Y := Φ^∞(A^M) := ⋂_{t=1}^{∞} Φ^t(A^{ℤ^D}) is the omega-limit set of Φ. If μ ∈ Meas(A^{ℤ^D}; Φ, σ), then supp(μ) ⊆ Y, and μ extends to a σ-invariant measure μ̃ on X^Hist in the obvious way.

Let v⃗ = (v₀, v₁, …, v_D) ∈ ℝ × ℝ^D ≅ ℝ^{D+1}. For any bounded open subset B ⊂ ℝ^{D+1} and T > 0, let B(Tv⃗) := {b + tv⃗ : b ∈ B and t ∈ [0, T]} be the 'sheared cylinder' in ℝ^{D+1} with cross-section B and length T|v⃗| in the direction v⃗, and let B̄(Tv⃗) := B(Tv⃗) ∩ ℤ^{D+1}. Let X^Hist_{B̄(Tv⃗)} := {x_{B̄(Tv⃗)} : x ∈ X^Hist(Φ)}. We define

  H_top(Φ, B, v⃗) := lim sup_{T→∞} (1/T) log₂[#X^Hist_{B̄(Tv⃗)}],   and
  H_μ(Φ, B, v⃗) := −lim sup_{T→∞} (1/T) Σ_{x∈X^Hist_{B̄(Tv⃗)}} μ̃[x] log₂(μ̃[x]).

We then define the v⃗-directional topological entropy and v⃗-directional μ-entropy of Φ by

  h_top(Φ, v⃗) := sup_{B ⊂ ℝ^{D+1} open & bounded} H_top(Φ, B, v⃗),   (8)

  h_μ(Φ, v⃗) := sup_{B ⊂ ℝ^{D+1} open & bounded} H_μ(Φ, B, v⃗).   (9)

Proposition 70  Let Φ ∈ CA(A^{ℤ^D}) and let μ ∈ Meas(A^{ℤ^D}; Φ, σ).
(a) Directional entropy is homogeneous. That is, for any v⃗ ∈ ℝ^{D+1} and r > 0, h_top(Φ, rv⃗) = r·h_top(Φ, v⃗) and h_μ(Φ, rv⃗) = r·h_μ(Φ, v⃗).
(b) If v⃗ = (t, z) ∈ ℤ × ℤ^D, then h_top(Φ, v⃗) = h_top(Φ^t∘σ^z) and h_μ(Φ, v⃗) = h_μ(Φ^t∘σ^z).
(c) There is an extension of the ℤ × ℤ^D-system (X^Hist, Φ, σ) to an ℝ × ℝ^D-system (X̃, Φ̃, σ̃) such that, for any v⃗ = (t, u⃗) ∈ ℝ × ℝ^D, we have h_top(Φ, v⃗) = h_top(Φ̃^t∘σ̃^{u⃗}). For any μ ∈ Meas(A^{ℤ^D}; Φ, σ), there is an extension μ̃ ∈ Meas(X̃; Φ̃, σ̃) such that for any v⃗ = (t, u⃗) ∈ ℝ × ℝ^D, we have h_μ(Φ, v⃗) = h_{μ̃}(Φ̃^t∘σ̃^{u⃗}).

Proof  (a, b) follow from the definition. (c) is Proposition 2.1 in [109]. □

Remark 71  Directional entropy can actually be defined for any continuous ℤ^{D+1}-action on a compact metric space, and in particular, for any subshift of A^{ℤ^{D+1}}. The directional entropy of a CA Φ is then just the directional entropy of the subshift X^Hist(Φ). Proposition 70 holds for any subshift.

Directional entropy is usually infinite for multidimensional CA (for the same reason that classical entropy is usually infinite). Thus, most of the analysis has been for one-dimensional CA. For example, Kitchens and Schmidt (see Sect. 1 in [72]) studied the directional topological entropy of one-dimensional linear CA, while Smillie (see Proposition 1.1 in [133]) computed the directional topological entropy for ECA#184. If Φ is linear, then the function v⃗ ↦ h_top(Φ, v⃗) is piecewise linear and convex, but if Φ is ECA#184, it is neither. If v⃗ has rational entries, then Proposition 70(a,b) shows that h(Φ, v⃗) is a rational multiple of the classical entropy of some composite CA, which can be computed through classical methods. However, if v⃗ is irrational, then h(Φ, v⃗) is quite difficult to compute using the formulae (8) and (9), and Proposition 70(c), while theoretically interesting, is not very computationally useful. Can we compute h(Φ, v⃗) as the limit of h(Φ, v⃗_k), where {v⃗_k}_{k=1}^{∞} is a sequence of rational vectors tending to v⃗? In other words, is directional entropy continuous as a function of v⃗? What other properties does h(Φ, v⃗) have as a function of v⃗?

Theorem 72  Let Φ ∈ CA(A^ℤ) and let μ ∈ Meas(A^ℤ; Φ).
(a) The function ℝ² ∋ v⃗ ↦ h_μ(Φ, v⃗) ∈ ℝ is continuous.

(b) Suppose there is some (t, z) ∈ ℕ × ℤ with t ≥ 1, such that Φ^t∘σ^z is posexpansive. Then the function ℝ² ∋ v⃗ ↦ h_top(Φ, v⃗) ∈ ℝ is convex, and thus Lipschitz-continuous.
(c) However, there exist other Φ ∈ CA(A^ℤ) for which the function ℝ² ∋ v⃗ ↦ h_top(Φ, v⃗) ∈ ℝ is not continuous.
(d) Suppose Φ has neighborhood [ℓ…r] ⊂ ℤ. If v⃗ = (t, x) ∈ ℝ², then let z_ℓ := x − ℓt and z_r := x + rt. Let L := log₂|A|.
[i] Suppose z_ℓ·z_r ≥ 0. Then h_μ(Φ, v⃗) ≤ max{|z_ℓ|, |z_r|}·L. Furthermore:
  • If Φ is right-permutative, and |z_ℓ| ≤ |z_r|, then h_μ(Φ, v⃗) = |z_r|·L.
  • If Φ is left-permutative, and |z_r| ≤ |z_ℓ|, then h_μ(Φ, v⃗) = |z_ℓ|·L.
[ii] Suppose z_ℓ ≤ 0 ≤ z_r. Then h_μ(Φ, v⃗) ≤ |z_r − z_ℓ|·L. Furthermore, if Φ is bipermutative in this case, then h_μ(Φ, v⃗) = |z_r − z_ℓ|·L.

Proof  (a) is Corollary 3.3 in [109], while (b) is Théorème III.11 and Corollaire III.12, pp. 79–80 in [120]. (c) is Proposition 1.2 in [133]. (d) summarizes the main results of [21]. See also Example 6.2 in [99] for an earlier analysis of permutative CA in the case r = −ℓ = 1; see also Example 6.4 in [12] and Sect. 1 in [72] for the special case when Φ is linear. □

Remark 73  (a) In fact, the conclusion of Theorem 72(b) holds as long as Φ has any posexpansive directions (even irrational ones). A posexpansive direction is analogous to an expansive subspace (see Subsect. "Entropy Geometry and Expansive Subdynamics"), and is part of Sablik's theory of 'directional dynamics' for one-dimensional CA; see Remark 84(b) below. Using this theory, Sablik has also shown that h_μ(Φ, v⃗) = 0 = h_top(Φ, v⃗) whenever v⃗ is an equicontinuous direction for Φ, whereas h_μ(Φ, v⃗) ≠ 0 ≠ h_top(Φ, v⃗) whenever v⃗ is a right- or left-posexpansive direction for Φ. See Sect. III.4.5–III.4.6, pp. 86–88 in [120].
(b) Courbage and Kamiński have defined a 'directional' version of the Lyapunov exponents introduced in Subsect. "Lyapunov Exponents". If Φ ∈ CA(A^ℤ), a ∈ A^ℤ and v⃗ = (t, z) ∈ ℕ × ℤ, then λ^±_{v⃗}(Φ, a) := λ^±(Φ^t∘σ^z, a), where λ^± are defined as in Subsect. "Lyapunov Exponents". If v⃗ ∈ ℝ² is irrational, then the definition of λ^±_{v⃗}(Φ, a) is somewhat more subtle. For any Φ and a, the function ℝ² ∋ v⃗ ↦ λ^±_{v⃗}(Φ, a) ∈ ℝ is homogeneous and continuous (see Lemma 2 and Proposition 3 in [22]). If μ ∈ Meas(A^ℤ; Φ, σ) is σ-ergodic, then


λ^±_{v⃗}(Φ, ·) is constant μ-almost everywhere, and is related to h_μ(Φ, v⃗) through an inequality exactly analogous to Theorem 69(b); see Theorem 1 in [22].

Cone Entropy

For any v⃗ ∈ ℝ^{D+1}, any angle θ > 0, and any N > 0, we define

  K(Nv⃗, θ) := {z ∈ ℤ^{D+1} : |z| ≤ N|v⃗| and z·v⃗/(|z||v⃗|) ≥ cos(θ)}.

Geometrically, this is the set of all ℤ^{D+1}-lattice points in a cone of length N|v⃗| which subtends an angle of 2θ around an axis parallel to v⃗, and which has its apex at the origin. If Φ ∈ CA(A^{ℤ^D}), then let X^Hist(Nv⃗, θ) := {x_{K(Nv⃗,θ)} : x ∈ X^Hist(Φ)}. If μ ∈ Meas(A^{ℤ^D}; Φ), and μ̃ is the extension of μ to X^Hist, then the cone entropy of (Φ, μ) in direction v⃗ is defined

  h^cone_μ(Φ, v⃗) := −lim_{θ↘0} lim_{N→∞} (1/N) Σ_{x∈X^Hist(Nv⃗,θ)} μ̃[x] log₂(μ̃[x]).

Park [107,108] attributes this concept to Doug Lind. Like directional entropy, cone entropy can be defined for any continuous ℤ^{D+1}-action, and is generally infinite for multidimensional CA. However, for one-dimensional CA, Park has proved:

Theorem 74  If Φ ∈ CA(A^ℤ), μ ∈ Meas(A^ℤ; Φ) and v⃗ ∈ ℝ², then h^cone_μ(Φ, v⃗) = h_μ(Φ, v⃗).



Entropy Geometry and Expansive Subdynamics

Directional entropy is the one-dimensional version of a multidimensional 'entropy density' function, which was introduced by Milnor [99] to address the fact that classical and directional entropy are generally infinite for multidimensional CA. Milnor's ideas were then extended by Boyle and Lind [12], using their theory of expansive subdynamics.

Let X ⊆ A^{ℤ^{D+1}} be a subshift, and let μ ∈ Meas(X; σ). For any bounded B ⊂ ℝ^{D+1}, let B̄ := B ∩ ℤ^{D+1}, let X_B := X_{B̄}, and then define

  H_X(B) := log₂|X_B|   and   H_μ(B) := −Σ_{x∈X_B} μ[x] log₂(μ[x]).

The topological entropy dimension dim(X) is the smallest d ∈ [0…D+1] having some constant c > 0 such that, for any finite B ⊂ ℝ^{D+1}, H_X(B) ≤ c·diam[B]^d. The measurable entropy dimension dim(μ) is defined similarly, only with H_μ in place of H_X. Note that dim(μ) ≤ dim(X), because H_μ(B) ≤ H_X(B) for all B ⊂ ℝ^{D+1}.

For any bounded B ⊂ ℝ^{D+1} and 'scale factor' s > 0, let sB := {sb : b ∈ B}. For any radius r > 0, let (sB)_r := {x ∈ ℝ^{D+1} : d(x, sB) ≤ r}. Define the d-dimensional topological entropy density of B by

  h^d_X(B) := sup_{r>0} lim sup_{s→∞} H_X[(sB)_r]/s^d.   (10)

Define the d-dimensional measurable entropy density h^d_μ(B) similarly, only using H_μ instead of H_X. Note that, for any d < dim(X) [respectively, d < dim(μ)], h^d_X(B) [resp. h^d_μ(B)] will be infinite, whereas for any d > dim(X) [resp. d > dim(μ)], h^d_X(B) [resp. h^d_μ(B)] will be zero; hence dim(X) [resp. dim(μ)] is the unique value of d for which the function h^d_X [resp. h^d_μ] defined in Eq. (10) could be nontrivial.

Example 75  (a) If d = D+1, and B is a unit cube centered at the origin, then h^{D+1}_X(B) (resp. h^{D+1}_μ(B)) is just the classical (D+1)-dimensional topological (resp. measurable) entropy of X (resp. μ) as a (D+1)-dimensional subshift (resp. random field).
(b) However, the most important case for Milnor [99] (and us) is when X = X^Hist(Φ) for some Φ ∈ CA(A^{ℤ^D}). In this case, dim(μ) ≤ dim(X) ≤ D < D+1. In particular, if d = 1, then for any v⃗ ∈ ℝ^{D+1}, if B := {rv⃗ : r ∈ [0, 1]}, then h¹_X(B) = h_top(Φ, v⃗) and h¹_μ(B) = h_μ(Φ, v⃗) are the directional entropies of Subsect. "Directional Entropy".

For any d ∈ [0…D+1], let λ^d be the d-dimensional Hausdorff measure on ℝ^{D+1}, normalized so that, if P ⊂ ℝ^{D+1} is any d-plane (i.e. a d-dimensional linear subspace of ℝ^{D+1}), then λ^d restricts to the d-dimensional Lebesgue measure on P.

Theorem 76  Let X ⊆ A^{ℤ^{D+1}} be a subshift, and let μ ∈ Meas(X; σ). Let d = dim(X) (or dim(μ)) and let h^d be h^d_X (or h^d_μ). Let B, C ⊂ ℝ^{D+1} be compact sets. Then:
(a) h^d(B) is well-defined and finite.
(b) If B ⊆ C, then h^d(B) ≤ h^d(C).
(c) h^d(B ∪ C) ≤ h^d(B) + h^d(C).
(d) h^d(B + v⃗) = h^d(B) for any v⃗ ∈ ℝ^{D+1}.
(e) h^d(sB) = s^d·h^d(B) for any s > 0.
(f) There is some constant c such that h^d(B) ≤ c·λ^d(B) for all compact B ⊂ ℝ^{D+1}.
(g) If d ∈ ℕ, then for any d-plane P ⊂ ℝ^{D+1}, there is some H^d(P) ≥ 0 such that h^d(B) = H^d(P)·λ^d(B) for any compact subset B ⊂ P with λ^d(∂B) = 0.
(h) There is a constant H̄^d_X < ∞ such that H^d_X(P) ≤ H̄^d_X for all d-planes P.


Ergodic Theory of Cellular Automata, Figure 2  Example 78(b)[ii]: A left-permutative CA Φ is quasi-invertible. In this picture, [ℓ…r] = [−1…2], and L is a line of slope −1/3. If x ∈ X and we know the entries of x in a neighborhood of L, then we can reconstruct the rest of x as shown. Entries above L are directly computed using the local rule of Φ. Entries below L are interpolated via left-permutativity. In both cases, the reconstruction occurs in consecutive diagonal lines, whose order is indicated by shading from darkest to lightest in the figure

Proof  See Theorems 1 and 2 and Corollary 1 in [99], or see Theorems 6.2, 6.3, and 6.13 in [12]. □

Example 77  Let Φ ∈ CA(A^{ℤ^D}) and let X := X^Hist(Φ). If P := {0} × ℝ^D, then H^D(P) is the classical D-dimensional entropy of the omega limit set Y := Φ^∞(A^{ℤ^D}); heuristically, this measures the asymptotic level of 'spatial disorder' in Y. If P ⊂ ℝ^{D+1} is some other D-plane, then H^D(P) measures some combination of the 'spatial disorder' of Y with the dynamical entropy of Φ.

Let d ∈ [1…D+1], and let P ⊂ ℝ^{D+1} be a d-plane. For any r > 0, let P(r) := {z ∈ ℤ^{D+1} : d(z, P) < r}. We say P is expansive for X if there is some r > 0 such that, for any x, y ∈ X, (x_{P(r)} = y_{P(r)}) ⟺ (x = y). If P is spanned by d rational vectors, then P ∩ ℤ^{D+1} is a rank-d sublattice L ⊆ ℤ^{D+1}, and P is expansive if and only if the induced L-action on X is expansive. However, if P is 'irrational', then expansiveness is a more subtle concept; see Sect. 2 in [12] for more information.

If Φ ∈ CA(A^{ℤ^D}) and X = X^Hist(Φ), then Φ is quasi-invertible if X admits an expansive D-plane P (this is a natural extension of Milnor's (1988, §7) definition in terms of 'causal cones'). Heuristically, if we regard ℤ^{D+1} as 'spacetime' (in the spirit of special relativity), then P can be seen as 'space', and any direction transversal to P can be interpreted as the flow of 'time'.

Example 78
(a) If Φ is invertible, then it is quasi-invertible, because {0} × R^D is an expansive D-plane (recall that the zeroth coordinate is time).
(b) Let Φ ∈ CA(A^Z), so that X ⊆ A^{Z^2}. Let Φ have neighborhood [ℓ…r], with ℓ ≤ 0 ≤ r, and let L ⊂ R^2 be a line with slope S through the origin (Fig. 2).
[i] If Φ is right-permutative, and 0 < S ≤ 1/(ℓ+1), then L is expansive for X.
[ii] If Φ is left-permutative, and −1/(r+1) ≤ S < 0, then L is expansive for X.
[iii] If Φ is bipermutative, and −1/(r+1) ≤ S < 0 or 0 < S ≤ 1/(ℓ+1), then L is expansive for X.
[iv] If Φ is posexpansive (see Subsect. "Posexpansive and Permutative CA"), then the 'time' axis L = R × {0} is expansive for X.
Hence, in any of these cases, Φ is quasi-invertible. (Presumably, something similar is true for multidimensional permutative CA.)

Proposition 79 Let Φ ∈ CA(A^{Z^D}), let X = X_Hist(Φ), let μ ∈ Meas(X; σ), and let H_μ^d and H_X^d be as in Theorem 76(g,h).
(a) If H_X^D({0} × R^D) = 0, then H_X^D ≡ 0.
(b) Let d ∈ [1…D], and suppose that X admits an expansive d-plane. Then:
[i] dim(X) ≤ d;
[ii] there is a constant H̄^d < ∞ such that H_X^d(P) ≤ H̄^d for all d-planes P;
[iii] if H_X^d(P) = 0 for some expansive d-plane P, then H̄^d = 0.
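For concreteness (an illustrative sketch, not taken from the source), the mechanism behind Example 78(b) can be seen in the elementary XOR rule Φ(x)_i = x_i + x_{i+1} (mod 2), which has neighborhood [0…1] and is right-permutative: with the left input fixed, the map from the right input to the output is a bijection, so the right neighbor can be solved for. Iterating this recovers a whole row of the spacetime diagram from one cell plus the row beneath it, which is why suitably sloped lines determine the entire configuration:

```python
def local_rule(a: int, b: int) -> int:
    """XOR local rule with neighborhood [0..1]."""
    return (a + b) % 2

def is_right_permutative(f, alphabet=(0, 1)) -> bool:
    """Check: for each fixed left input a, the map b -> f(a, b) is a bijection."""
    return all(len({f(a, b) for b in alphabet}) == len(alphabet)
               for a in alphabet)

def recover_right_neighbor(a: int, out: int) -> int:
    """Invert the rule in its rightmost coordinate: find b with f(a, b) = out."""
    for b in (0, 1):
        if local_rule(a, b) == out:
            return b
    raise ValueError("rule is not right-permutative at this input")

def reconstruct_row(x0: int, below: list) -> list:
    """Given x_0 and the image row y = Phi(x) (length n), recover x_0..x_n."""
    xs = [x0]
    for y in below:
        xs.append(recover_right_neighbor(xs[-1], y))
    return xs
```

For example, the row [1, 0, 1, 1, 0] has image [1, 1, 0, 1] under this rule, and `reconstruct_row(1, [1, 1, 0, 1])` recovers it from its leftmost cell alone.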


Ergodic Theory of Cellular Automata

Proof (a) is Corollary 3 in [99], (b)[i] is Corollary 1.4 in [128], and (b)[ii] is Theorem 6.19(2) in [12]. For (b)[iii], see Theorem 6.3(4) in [12] for "H_X^d ≡ 0", and Theorem 6.19(1) in [12] for "H̄^d = 0". □
If d ∈ [1…D+1], then a d-frame in R^{D+1} is a d-tuple F := (v_1, …, v_d), where v_1, …, v_d ∈ R^{D+1} are linearly independent. Let Frame(D+1, d) be the set of all d-frames in R^{D+1}; then Frame(D+1, d) is an open subset of R^{D+1} × ⋯ × R^{D+1} =: R^{(D+1)×d}. Let

    Expans(X; d) := {F ∈ Frame(D+1, d) ; span(F) is expansive for X}.

Then Expans(X; d) is an open subset of Frame(D+1, d), by Lemma 3.4 in [12]. A connected component of Expans(X; d) is called an expansive component for X. For any F ∈ Frame(D+1, d), let [F] be the d-dimensional parallelepiped spanned by F, and let h_X^d(F) := h_X^d([F]) = H_X^d(span(F)) · λ^d([F]), where the last equality is by Theorem 76(g). The next result is a partial extension of Theorem 72(b).

Proposition 80 Let X ⊆ A^{Z^{D+1}} be a subshift, suppose d := dim(X) ∈ N, and let C ⊆ Expans(X; d) be an expansive component. Then the function h_X^d : C → R is convex in each of its d distinct R^{D+1}-valued arguments. Thus, h_X^d is Lipschitz-continuous on C.

Proof See Theorem 6.9(1,4) in [12]. □
For measurable entropy, we can say much more. Recall that a d-linear form is a function ω : R^{(D+1)×d} → R which is linear in each of its d distinct R^{D+1}-valued arguments and antisymmetric.

Theorem 81 Let X ⊆ A^{Z^{D+1}} be a subshift and let μ ∈ Meas(X; σ). Suppose d := dim(μ) ∈ N, and let C ⊆ Expans(X; d) be an expansive component for X. Then there is a d-linear form ω : R^{(D+1)×d} → R such that h_μ^d agrees with ω on C.

Proof Theorem 6.16 in [12]. □
If H̄_μ^d ≠ 0, then Theorem 81 means that there is an orthogonal (D+1−d)-frame W := (w_{d+1}, …, w_{D+1}) (transversal to all frames in C) such that, for any d-frame V := (v_1, …, v_d) ∈ C,

    h_μ^d(V) = det(v_1, …, v_d, w_{d+1}, …, w_{D+1}).    (11)

Thus, the d-plane orthogonal to {w_{d+1}, …, w_{D+1}} is the d-plane which maximizes H_μ^d; this is the d-plane manifesting the most rapid decay of correlation with distance.

On the other hand, span(W) is the (D+1−d)-plane along which correlations decay the most slowly. Also, if V ∈ C, then Eq. (11) implies that C cannot contain any frame spanning span(V) with reversed orientation (e.g. an odd permutation of V), because entropy is nonnegative.

Example 82 Let Φ ∈ CA(A^{Z^D}) be quasi-invertible, and let P be an expansive D-plane for X := X_Hist(Φ) (see Example 78). The D-frames spanning P fall into two expansive components (related by orientation-reversal); let C be the union of these two components. Let μ ∈ Meas(A^{Z^D}; Φ), and extend μ to a σ-invariant measure on X. In this case, Theorem 81 is equivalent to Theorem 4 in [99], which says there is a vector w ∈ R^{D+1} such that, for any D-frame V = (v_1, …, v_D) ∈ C, h_μ^D(V) = |det(v_1, …, v_D, w)|. Thus, H_μ^D(P) is maximized when P is the hyperplane orthogonal to w. Heuristically, w points in the direction of minimum correlation decay (or maximum 'causality'), the direction which could most properly be called 'time' for the MPDS (Φ, μ).
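A small numeric sketch (illustrative only; the vector w below is an arbitrary choice) of the determinant formula from Example 82, in the case D = 1: among unit 1-frames v, the entropy form |det(v, w)| is maximized when v is orthogonal to w, and it scales linearly along each ray, as a 1-linear form must.

```python
def h(v, w):
    """Entropy form of Example 82 for D = 1: |det(v, w)| with v, w in R^2."""
    return abs(v[0] * w[1] - v[1] * w[0])

w = (3.0, 4.0)  # hypothetical 'time' direction (arbitrary choice), |w| = 5

# Among unit vectors v, |det(v, w)| = |w| * |sin(angle between v and w)|,
# so it is maximized (value |w|) when v is orthogonal to w and vanishes
# when v is parallel to w.
v_orth = (-w[1] / 5.0, w[0] / 5.0)   # unit vector orthogonal to w
v_par = (w[0] / 5.0, w[1] / 5.0)     # unit vector parallel to w
```

Doubling the frame doubles the parallelepiped and hence the entropy: h((2v), w) = 2 h(v, w).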

Theorem 81 yields the following generalization of the Variational Principle:

Theorem 83 Let X ⊆ A^{Z^{D+1}} be a subshift and suppose d := dim(X) ∈ N.
(a) If F ∈ Expans(X; d), then there exists μ ∈ Meas(X; σ) such that h_X^d(F) = h_μ^d(F).
(b) Let C ⊆ Expans(X; d) be an expansive component for X. There exists some μ ∈ Meas(X; σ) such that h_X^d = h_μ^d on C if and only if h_X^d is a d-linear form on C.

Proof Proposition 6.24 and Theorem 6.25 in [12]. □
Remark 84 (a) If G ⊆ A^{Z^D} is an abelian subgroup shift and Φ ∈ ECA(G), then X_Hist(Φ) is a subgroup shift of A^{Z^{D+1}}, which can be viewed as an algebraic Z^{D+1}-action (see discussion prior to Proposition 27). In this context, the expansive subspaces of X_Hist(Φ) have been completely characterized by Einsiedler et al. (see Theorem 8.4 in [33]). Furthermore, certain dynamical properties (such as positive entropy, completely positive entropy, or Bernoullicity) are common amongst all elements of each expansive component of X_Hist(Φ) (see Theorem 9.8 in [33]); this sort of 'commonality' within expansive components was earlier emphasized by Boyle and Lind (see [12]). If X_Hist(Φ) has entropy dimension 1 (e.g. if Φ is a one-dimensional linear CA), the structure of X_Hist(Φ) has been thoroughly analyzed by Einsiedler and Lind [30]. Finally, if G_1 and G_2 are subgroup shifts, and Φ_k ∈ ECA(G_k) and μ_k ∈ Meas(G_k; Φ_k, σ) for k = 1, 2, with dim(μ_1) = dim(μ_2) = 1, then Einsiedler and Ward [32] have given conditions for the measure-preserving systems (G_1, μ_1, Φ_1, σ) and (G_2, μ_2, Φ_2, σ) to be disjoint.

(b) Boyle and Lind's 'expansive subdynamics' concerns expansiveness along certain directions in the space-time diagram of a CA. Recently, M. Sablik has developed a theory of directional dynamics, which explores other topological dynamical properties (such as equicontinuity and sensitivity to initial conditions) along spatiotemporal directions in a CA; see [120], Chapitre II, or [121].

Future Directions and Open Problems

1. We now have a fairly good understanding of the ergodic theory of linear and/or 'abelian' CA. The next step is to extend these results to CA with nonlinear and/or nonabelian algebraic structures. In particular:
(a) Almost all the measure rigidity results of Subsect. "Measure Rigidity in Algebraic CA" are for endomorphic CA on abelian group shifts, except for Propositions 21 and 23. Can we extend these results to CA on nonabelian group shifts or other permutative CA?
(b) Likewise, the asymptotic randomization results of Subsect. "Asymptotic Randomization by Linear Cellular Automata" are almost exclusively for linear CA with scalar coefficients, and for M = Z^D × N^E. Can we extend these results to LCA with noncommuting, matrix-valued coefficients? (The problem is: if the coefficients do not commute, then the 'polynomial representation' and Lucas' theorem become inapplicable.) Also, can we obtain similar results for multiplicative CA on nonabelian groups? (See Remark 41(d).) What about other permutative CA? (See Remark 41(e).) Finally, what if M is a nonabelian group? (For example, Lind and Schmidt (unpublished) [31] have recently investigated algebraic actions of the discrete Heisenberg group.)

2. Cellular automata are often seen as models of spatially distributed computation. Meaningful 'computation' could possibly occur when a CA interacts with a highly structured initial configuration (e.g. a substitution sequence), whereas such computation is probably impossible in the roiling cauldron of noise arising from a mixing, positive entropy measure (e.g. a Bernoulli measure or Markov random field). Yet almost all the results in this article concern the interaction of CA with such mixing, positive-entropy measures. We are starting to understand the topological dynamics of CA acting on non-mixing and/or zero-entropy symbolic dynamical systems (e.g. substitution shifts, automatic shifts, regular Toeplitz shifts, and quasisturmian shifts);

see ▶ Dynamics of Cellular Automata in Non-compact Spaces. However, almost nothing is known about the interaction of CA with the natural invariant measures on these systems. In particular:
(a) The invariant measures discussed in Sect. "Invariant Measures for CA" all have nonzero entropy (see, however, Example 67(c)). Are there any nontrivial zero-entropy measures for interesting CA?
(b) The results of Subsect. "Asymptotic Randomization by Linear Cellular Automata" all concern the asymptotic randomization of initial measures with nonzero entropy, except for Remark 41(c). Are there similar results for zero-entropy measures?
(c) Zero-entropy systems often have an appealing combinatorial description via cutting-and-stacking constructions, Bratteli diagrams, or finite state machines. Likewise, CA admit a combinatorial description (via local rules). How do these combinatorial descriptions interact?

3. As we saw in Subsect. "Domains, Defects, and Particles", and also in Propositions 48–56, emergent defect dynamics can be a powerful tool for analyzing the measurable dynamics of CA. Defects in one-dimensional CA generally act like 'particles', and their 'kinematics' is fairly well-understood. However, in higher dimensions, defects can be much more topologically complicated (e.g. they can look like curves or surfaces), and their evolution in time is totally mysterious. Can we develop a theory of multidimensional defect dynamics?

4. Almost all the results about mixing and ergodicity in Subsect. "Mixing and Ergodicity" are for one-dimensional (mostly permutative) CA and for the uniform measure on A^Z. Can similar results be obtained for other CA and/or measures on A^Z? What about CA in A^{Z^D} for D ≥ 2?

5. Let μ be a (Φ, σ)-invariant measure on A^M. Proposition 66 suggests an intriguing correspondence between certain spectral properties (namely, weak mixing and discrete spectrum) for the system (A^M, μ, σ) and those for the system (A^M, μ, Φ). Does a similar correspondence hold for other spectral properties, such as continuous spectrum, Lebesgue spectral type, spectral multiplicity, rigidity, or mild mixing?

6. Let X ⊆ A^{Z^{D+1}} be a subshift admitting an expansive D-plane P ⊂ R^{D+1}. As discussed in Subsect. "Entropy Geometry and Expansive Subdynamics", if we regard Z^{D+1} as 'spacetime', then we can treat P as 'space', and a transversal direction as 'time'. Indeed, if P is spanned by rational vectors, then the Curtis–Hedlund–Lyndon theorem implies that X is isomorphic to the history shift of some invertible Φ ∈ CA(A^{Z^D}) acting on some Φ-invariant subshift Y ⊆ A^{Z^D} (where we embed Z^D in P). If P is irrational, then this is not the case; however, X still seems very much like the history shift of a spatially distributed symbolic dynamical system, closely analogous to a CA, except with a continually fluctuating 'spatial distribution' of state information, and perhaps with occasional nonlocal interactions. For example, Proposition 79(b)[i] implies that dim(X) ≤ D, just as for a CA. How much of the theory of invertible CA can be generalized to such systems?

I will finish with the hardest problem of all. Cellular automata are tractable mainly because of their homogeneity: CA are embedded in a highly regular spatial geometry (i.e. a lattice or other Cayley digraph) with the same local rule everywhere. However, many of the most interesting spatially distributed symbolic dynamical systems are not nearly this homogeneous. For example:

• CA are often proposed as models of spatially distributed physical systems. Yet in many such systems (e.g. living tissues, quantum 'foams'), the underlying geometry is not a flat Euclidean space, but a curved manifold. A good discrete model of such a manifold can be obtained through a Voronoi tessellation of sufficient density; a realistic symbolic dynamical model would be a CA-like system defined on the dual graph of this Voronoi tessellation.

• As mentioned in question #3, defects in multidimensional CA may have the geometry of curves, surfaces, or other embedded submanifolds (possibly with varying nonzero thickness). To model the evolution of such a defect, we could treat it as a CA-like object whose underlying geometry is an (evolving) manifold, and whose local rules (although partly determined by the local rule of the original CA) are spatially heterogeneous (because they are also influenced by incoming information from the ambient 'nondefective' space).

• The CA-like system arising in question #6 has a D-dimensional planar geometry, but the distribution of 'cells' within this plane (and, presumably, the local rules between them) are constantly fluctuating.

More generally, any topological dynamical system on a Cantor space can be represented as a cellular network: a CA-like system defined on an infinite digraph, with different local rules at different nodes. Gromov [51] has generalized the Garden of Eden Theorem 3 to this setting (see Remark 5(a)). However, other than Gromov's work, basically nothing is known about such systems.
Can we generalize any of the theory of cellular automata to cellular networks? Is it possible to develop a nontrivial ergodic theory for such systems?
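As a minimal sketch of the 'cellular network' idea (the graph and the rules below are invented purely for illustration), one can realize a CA-like system on an arbitrary digraph with a different local rule at each node:

```python
def step(state, neighbors, rules):
    """One synchronous update of a cellular network.

    state:     dict node -> symbol
    neighbors: dict node -> ordered list of in-neighbors
    rules:     dict node -> local rule mapping a tuple of neighbor symbols
               (including the node itself, by the convention used here)
               to a new symbol
    """
    return {v: rules[v](tuple(state[u] for u in neighbors[v]))
            for v in state}

# Toy example: a 3-node digraph where node 'c' uses a different rule
# than the others (spatial heterogeneity).
neighbors = {'a': ['a', 'b'], 'b': ['b', 'c'], 'c': ['c', 'a']}
xor = lambda t: (t[0] + t[1]) % 2
maj = lambda t: max(t)            # heterogeneous rule at node 'c'
rules = {'a': xor, 'b': xor, 'c': maj}

state = {'a': 1, 'b': 0, 'c': 0}
state = step(state, neighbors, rules)
```

A classical CA is recovered as the special case where the digraph is a lattice and every node carries the same rule; the open problem above asks how much of the CA theory survives without that homogeneity.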

Acknowledgments I would like to thank François Blanchard, Mike Boyle, Maurice Courbage, Doug Lind, Petr Kůrka, Servet Martínez, Kyewon Koh Park, Mathieu Sablik, Jeffrey Steif, and Marcelo Sobottka, who read draft versions of this article and made many invaluable suggestions, corrections, and comments. (Any errors which remain are mine.) To Reem.

Bibliography
1. Akin E (1993) The general topology of dynamical systems. Graduate Studies in Mathematics, vol 1. American Mathematical Society, Providence
2. Allouche JP (1999) Cellular automata, finite automata, and number theory. In: Cellular automata (Saissac, 1996), Math Appl, vol 460. Kluwer, Dordrecht, pp 321–330
3. Allouche JP, Skordev G (2003) Remarks on permutive cellular automata. J Comput Syst Sci 67(1):174–182
4. Allouche JP, von Haeseler F, Peitgen HO, Skordev G (1996) Linear cellular automata, finite automata and Pascal's triangle. Discret Appl Math 66(1):1–22
5. Allouche JP, von Haeseler F, Peitgen HO, Petersen A, Skordev G (1997) Automaticity of double sequences generated by one-dimensional linear cellular automata. Theoret Comput Sci 188(1-2):195–209
6. Barbé A, von Haeseler F, Peitgen HO, Skordev G (1995) Coarse-graining invariant patterns of one-dimensional two-state linear cellular automata. Internat J Bifur Chaos Appl Sci Eng 5(6):1611–1631
7. Barbé A, von Haeseler F, Peitgen HO, Skordev G (2003) Rescaled evolution sets of linear cellular automata on a cylinder. Internat J Bifur Chaos Appl Sci Eng 13(4):815–842
8. Belitsky V, Ferrari PA (2005) Invariant measures and convergence properties for cellular automaton 184 and related processes. J Stat Phys 118(3-4):589–623
9. Blanchard F, Maass A (1997) Dynamical properties of expansive one-sided cellular automata. Israel J Math 99:149–174
10. Blank M (2003) Ergodic properties of a simple deterministic traffic flow model. J Stat Phys 111(3-4):903–930
11. Boccara N, Naser J, Roger M (1991) Particle-like structures and their interactions in spatiotemporal patterns generated by one-dimensional deterministic cellular automata. Phys Rev A 44(2):866–875
12. Boyle M, Lind D (1997) Expansive subdynamics. Trans Amer Math Soc 349(1):55–102
13. Boyle M, Fiebig D, Fiebig UR (1997) A dimension group for local homeomorphisms and endomorphisms of one-sided shifts of finite type. J Reine Angew Math 487:27–59
14. Burton R, Steif JE (1994) Non-uniqueness of measures of maximal entropy for subshifts of finite type. Ergodic Theory Dynam Syst 14(2):213–235
15. Burton R, Steif JE (1995) New results on measures of maximal entropy. Israel J Math 89(1-3):275–300
16. Cai H, Luo X (1993) Laws of large numbers for a cellular automaton. Ann Probab 21(3):1413–1426
17. Cattaneo G, Formenti E, Manzini G, Margara L (1997) On ergodic linear cellular automata over Z_m. In: STACS 97 (Lübeck),


Lecture Notes in Computer Science, vol 1200. Springer, Berlin, pp 427–438
18. Cattaneo G, Formenti E, Manzini G, Margara L (2000) Ergodicity, transitivity, and regularity for linear cellular automata over Z_m. Theoret Comput Sci 233(1-2):147–164
19. Ceccherini-Silberstein T, Fiorenzi F, Scarabotti F (2004) The Garden of Eden theorem for cellular automata and for symbolic dynamical systems. In: Random walks and geometry. de Gruyter, Berlin, pp 73–108
20. Ceccherini-Silberstein TG, Machì A, Scarabotti F (1999) Amenable groups and cellular automata. Ann Inst Fourier (Grenoble) 49(2):673–685
21. Courbage M, Kamiński B (2002) On the directional entropy of Z^2-actions generated by cellular automata. Studia Math 153(3):285–295
22. Courbage M, Kamiński B (2006) Space-time directional Lyapunov exponents for cellular automata. J Stat Phys 124(6):1499–1509
23. Coven EM (1980) Topological entropy of block maps. Proc Amer Math Soc 78(4):590–594
24. Coven EM, Paul ME (1974) Endomorphisms of irreducible subshifts of finite type. Math Syst Theory 8(2):167–175
25. Downarowicz T (1997) The royal couple conceals their mutual relationship: a noncoalescent Toeplitz flow. Israel J Math 97:239–251
26. Durrett R, Steif JE (1991) Some rigorous results for the Greenberg–Hastings model. J Theoret Probab 4(4):669–690
27. Durrett R, Steif JE (1993) Fixation results for threshold voter systems. Ann Probab 21(1):232–247
28. Einsiedler M (2004) Invariant subsets and invariant measures for irreducible actions on zero-dimensional groups. Bull London Math Soc 36(3):321–331
29. Einsiedler M (2005) Isomorphism and measure rigidity for algebraic actions on zero-dimensional groups. Monatsh Math 144(1):39–69
30. Einsiedler M, Lind D (2004) Algebraic Z^d-actions of entropy rank one. Trans Amer Math Soc 356(5):1799–1831 (electronic)
31. Einsiedler M, Rindler H (2001) Algebraic actions of the discrete Heisenberg group and other non-abelian groups. Aequationes Math 62(1-2):117–135
32. Einsiedler M, Ward T (2005) Entropy geometry and disjointness for zero-dimensional algebraic actions. J Reine Angew Math 584:195–214
33. Einsiedler M, Lind D, Miles R, Ward T (2001) Expansive subdynamics for algebraic Z^d-actions. Ergodic Theory Dynam Syst 21(6):1695–1729
34. Eloranta K, Nummelin E (1992) The kink of cellular automaton Rule 18 performs a random walk. J Stat Phys 69(5-6):1131–1136
35. Fagnani F, Margara L (1998) Expansivity, permutivity, and chaos for cellular automata. Theory Comput Syst 31(6):663–677
36. Ferrari PA, Maass A, Martínez S, Ney P (2000) Cesàro mean distribution of group automata starting from measures with summable decay. Ergodic Theory Dynam Syst 20(6):1657–1670
37. Finelli M, Manzini G, Margara L (1998) Lyapunov exponents versus expansivity and sensitivity in cellular automata. J Complexity 14(2):210–233
38. Fiorenzi F (2000) The Garden of Eden theorem for sofic shifts. Pure Math Appl 11(3):471–484

39. Fiorenzi F (2003) Cellular automata and strongly irreducible shifts of finite type. Theoret Comput Sci 299(1-3):477–493
40. Fiorenzi F (2004) Semi-strongly irreducible shifts. Adv Appl Math 32(3):421–438
41. Fisch R (1990) The one-dimensional cyclic cellular automaton: a system with deterministic dynamics that emulates an interacting particle system with stochastic dynamics. J Theoret Probab 3(2):311–338
42. Fisch R (1992) Clustering in the one-dimensional three-color cyclic cellular automaton. Ann Probab 20(3):1528–1548
43. Fisch R, Gravner J (1995) One-dimensional deterministic Greenberg–Hastings models. Complex Systems 9(5):329–348
44. Furstenberg H (1967) Disjointness in ergodic theory, minimal sets, and a problem in Diophantine approximation. Math Systems Theory 1:1–49
45. Gilman RH (1987) Classes of linear automata. Ergodic Theory Dynam Syst 7(1):105–118
46. Gottschalk W (1973) Some general dynamical notions. In: Recent advances in topological dynamics (Proc Conf Topological Dynamics, Yale Univ, New Haven, 1972; in honor of Gustav Arnold Hedlund), Lecture Notes in Math, vol 318. Springer, Berlin, pp 120–125
47. Grassberger P (1984) Chaos and diffusion in deterministic cellular automata. Phys D 10(1-2):52–58 [cellular automata (Los Alamos, 1983)]
48. Grassberger P (1984) New mechanism for deterministic diffusion. Phys Rev A 28(6):3666–3667
49. Grigorchuk RI (1984) Degrees of growth of finitely generated groups and the theory of invariant means. Izv Akad Nauk SSSR Ser Mat 48(5):939–985
50. Gromov M (1981) Groups of polynomial growth and expanding maps. Inst Hautes Études Sci Publ Math 53:53–73
51. Gromov M (1999) Endomorphisms of symbolic algebraic varieties. J Eur Math Soc (JEMS) 1(2):109–197
52. von Haeseler F, Peitgen HO, Skordev G (1992) Pascal's triangle, dynamical systems and attractors. Ergodic Theory Dynam Syst 12(3):479–486
53. von Haeseler F, Peitgen HO, Skordev G (1993) Cellular automata, matrix substitutions and fractals. Ann Math Artificial Intelligence 8(3-4):345–362 [theorem proving and logic programming (1992)]
54. von Haeseler F, Peitgen HO, Skordev G (1995) Global analysis of self-similarity features of cellular automata: selected examples. Phys D 86(1-2):64–80 [chaos, order and patterns: aspects of nonlinearity, the "gran finale" (Como, 1993)]
55. von Haeseler F, Peitgen HO, Skordev G (1995) Multifractal decompositions of rescaled evolution sets of equivariant cellular automata. Random Comput Dynam 3(1-2):93–119
56. von Haeseler F, Peitgen HO, Skordev G (2001) Self-similar structure of rescaled evolution sets of cellular automata. I. Internat J Bifur Chaos Appl Sci Eng 11(4):913–926
57. von Haeseler F, Peitgen HO, Skordev G (2001) Self-similar structure of rescaled evolution sets of cellular automata. II. Internat J Bifur Chaos Appl Sci Eng 11(4):927–941
58. Hedlund GA (1969) Endomorphisms and automorphisms of the shift dynamical system. Math Syst Theory 3:320–375
59. Hilmy H (1936) Sur les centres d'attraction minimaux des systèmes dynamiques. Compositio Mathematica 3:227–238
60. Host B (1995) Nombres normaux, entropie, translations. Israel J Math 91(1-3):419–428
61. Host B, Maass A, Martínez S (2003) Uniform Bernoulli measure in dynamics of permutative cellular automata with algebraic local rules. Discret Contin Dyn Syst 9(6):1423–1446
62. Hurd LP, Kari J, Culik K (1992) The topological entropy of cellular automata is uncomputable. Ergodic Theory Dynam Syst 12(2):255–265
63. Hurley M (1990) Attractors in cellular automata. Ergodic Theory Dynam Syst 10(1):131–140
64. Hurley M (1990) Ergodic aspects of cellular automata. Ergodic Theory Dynam Syst 10(4):671–685
65. Hurley M (1991) Varieties of periodic attractor in cellular automata. Trans Amer Math Soc 326(2):701–726
66. Hurley M (1992) Attractors in restricted cellular automata. Proc Amer Math Soc 115(2):563–571
67. Jen E (1988) Linear cellular automata and recurring sequences in finite fields. Comm Math Phys 119(1):13–28
68. Johnson A, Rudolph DJ (1995) Convergence under ×q of ×p invariant measures on the circle. Adv Math 115(1):117–140
69. Johnson ASA (1992) Measures on the circle invariant under multiplication by a nonlacunary subsemigroup of the integers. Israel J Math 77(1-2):211–240
70. Kitchens B (2000) Dynamics of Z^d actions on Markov subgroups. In: Topics in symbolic dynamics and applications (Temuco, 1997), London Math Soc Lecture Note Ser, vol 279. Cambridge Univ Press, Cambridge, pp 89–122
71. Kitchens B, Schmidt K (1989) Automorphisms of compact groups. Ergodic Theory Dynam Syst 9(4):691–735

72. Kitchens B, Schmidt K (1992) Markov subgroups of (Z/2Z)^{Z^2}. In: Symbolic dynamics and its applications (New Haven, 1991), Contemp Math, vol 135. Amer Math Soc, Providence, pp 265–283
73. Kitchens BP (1987) Expansive dynamics on zero-dimensional groups. Ergodic Theory Dynam Syst 7(2):249–261
74. Kleveland R (1997) Mixing properties of one-dimensional cellular automata. Proc Amer Math Soc 125(6):1755–1766
75. Kůrka P (1997) Languages, equicontinuity and attractors in cellular automata. Ergodic Theory Dynam Syst 17(2):417–433
76. Kůrka P (2001) Topological dynamics of cellular automata. In: Codes, systems, and graphical models (Minneapolis, 1999), IMA Vol Math Appl, vol 123. Springer, New York, pp 447–485
77. Kůrka P (2003) Cellular automata with vanishing particles. Fund Inform 58(3-4):203–221
78. Kůrka P (2005) On the measure attractor of a cellular automaton. Discret Contin Dyn Syst (suppl):524–535
79. Kůrka P, Maass A (2000) Limit sets of cellular automata associated to probability measures. J Stat Phys 100(5-6):1031–1047
80. Kůrka P, Maass A (2002) Stability of subshifts in cellular automata. Fund Inform 52(1-3):143–155 [special issue on cellular automata]
81. Lind D, Marcus B (1995) An introduction to symbolic dynamics and coding. Cambridge University Press, Cambridge
82. Lind DA (1984) Applications of ergodic theory and sofic systems to cellular automata. Phys D 10(1-2):36–44 [cellular automata (Los Alamos, 1983)]
83. Lind DA (1987) Entropies of automorphisms of a topological Markov shift. Proc Amer Math Soc 99(3):589–595
84. Lucas E (1878) Sur les congruences des nombres eulériens et les coefficients différentiels des fonctions trigonométriques suivant un module premier. Bull Soc Math France 6:49–54

85. Lyons R (1988) On measures simultaneously 2- and 3-invariant. Israel J Math 61(2):219–224
86. Maass A (1996) Some dynamical properties of one-dimensional cellular automata. In: Dynamics of complex interacting systems (Santiago, 1994), Nonlinear Phenom Complex Systems, vol 2. Kluwer, Dordrecht, pp 35–80
87. Maass A, Martínez S (1998) On Cesàro limit distribution of a class of permutative cellular automata. J Stat Phys 90(1-2):435–452
88. Maass A, Martínez S (1999) Time averages for some classes of expansive one-dimensional cellular automata. In: Cellular automata and complex systems (Santiago, 1996), Nonlinear Phenom Complex Systems, vol 3. Kluwer, Dordrecht, pp 37–54
89. Maass A, Martínez S, Pivato M, Yassawi R (2006) Asymptotic randomization of subgroup shifts by linear cellular automata. Ergodic Theory Dynam Syst 26(4):1203–1224
90. Maass A, Martínez S, Pivato M, Yassawi R (2006) Attractiveness of the Haar measure for the action of linear cellular automata in abelian topological Markov chains. In: Dynamics and Stochastics: Festschrift in honour of Michael Keane, Lecture Notes–Monograph Series of the IMS, vol 48. Institute for Mathematical Statistics, Beachwood, pp 100–108
91. Maass A, Martínez S, Sobottka M (2006) Limit measures for affine cellular automata on topological Markov subgroups. Nonlinearity 19(9):2137–2147, http://stacks.iop.org/0951-7715/19/2137
92. Machì A, Mignosi F (1993) Garden of Eden configurations for cellular automata on Cayley graphs of groups. SIAM J Discret Math 6(1):44–56
93. Maruoka A, Kimura M (1976) Condition for injectivity of global maps for tessellation automata. Inform Control 32(2):158–162
94. Mauldin RD, Skordev G (2000) Random linear cellular automata: fractals associated with random multiplication of polynomials. Japan J Math (NS) 26(2):381–406
95. Meester R, Steif JE (2001) Higher-dimensional subshifts of finite type, factor maps and measures of maximal entropy. Pacific J Math 200(2):497–510
96. Milnor J (1985) Correction and remarks: "On the concept of attractor". Comm Math Phys 102(3):517–519
97. Milnor J (1985) On the concept of attractor. Comm Math Phys 99(2):177–195
98. Milnor J (1986) Directional entropies of cellular automaton-maps. In: Disordered systems and biological organization (Les Houches, 1985), NATO Adv Sci Inst Ser F Comput Syst Sci, vol 20. Springer, Berlin, pp 113–115
99. Milnor J (1988) On the entropy geometry of cellular automata. Complex Syst 2(3):357–385
100. Miyamoto M (1979) An equilibrium state for a one-dimensional life game. J Math Kyoto Univ 19(3):525–540
101. Moore C (1997) Quasilinear cellular automata. Phys D 103(1-4):100–132 [lattice dynamics (Paris, 1995)]
102. Moore C (1998) Predicting nonlinear cellular automata quickly by decomposing them into linear ones. Phys D 111(1-4):27–41
103. Moore EF (1963) Machine models of self-reproduction. Proc Symp Appl Math 14:17–34
104. Myhill J (1963) The converse of Moore's Garden-of-Eden theorem. Proc Amer Math Soc 14:685–686


105. Nasu M (1995) Textile systems for endomorphisms and automorphisms of the shift. Mem Amer Math Soc 114(546):viii+215
106. Nasu M (2002) The dynamics of expansive invertible onesided cellular automata. Trans Amer Math Soc 354(10):4067–4084 (electronic)
107. Park KK (1995) Continuity of directional entropy for a class of Z^2-actions. J Korean Math Soc 32(3):573–582
108. Park KK (1996) Entropy of a skew product with a Z^2-action. Pacific J Math 172(1):227–241
109. Park KK (1999) On directional entropy functions. Israel J Math 113:243–267
110. Parry W (1964) Intrinsic Markov chains. Trans Amer Math Soc 112:55–66
111. Pivato M (2003) Multiplicative cellular automata on nilpotent groups: structure, entropy, and asymptotics. J Stat Phys 110(1-2):247–267
112. Pivato M (2005) Cellular automata versus quasisturmian shifts. Ergodic Theory Dynam Syst 25(5):1583–1632
113. Pivato M (2005) Invariant measures for bipermutative cellular automata. Discret Contin Dyn Syst 12(4):723–736
114. Pivato M (2007) Spectral domain boundaries in cellular automata. Fundamenta Informaticae 77 (special issue); available at: http://arxiv.org/abs/math.DS/0507091
115. Pivato M (2008) Module shifts and measure rigidity in linear cellular automata. Ergodic Theory Dynam Syst (to appear)
116. Pivato M, Yassawi R (2002) Limit measures for affine cellular automata. Ergodic Theory Dynam Syst 22(4):1269–1287
117. Pivato M, Yassawi R (2004) Limit measures for affine cellular automata. II. Ergodic Theory Dynam Syst 24(6):1961–1980
118. Pivato M, Yassawi R (2006) Asymptotic randomization of sofic shifts by linear cellular automata. Ergodic Theory Dynam Syst 26(4):1177–1201
119. Rudolph DJ (1990) ×2 and ×3 invariant measures and entropy. Ergodic Theory Dynam Syst 10(2):395–406
120. Sablik M (2006) Étude de l'action conjointe d'un automate cellulaire et du décalage: Une approche topologique et ergodique. PhD thesis, Université de la Méditerranée, Faculté des sciences de Luminy, Marseille
121. Sablik M (2008) Directional dynamics for cellular automata: A sensitivity to initial conditions approach. Theoret Comput Sci 400(1-3):1–18
122. Sablik M (2008) Measure rigidity for algebraic bipermutative cellular automata. Ergodic Theory Dynam Syst 27(6):1965–1990
123. Sato T (1997) Ergodicity of linear cellular automata over Z_m. Inform Process Lett 61(3):169–172
124. Schmidt K (1995) Dynamical systems of algebraic origin. Progress in Mathematics, vol 128. Birkhäuser, Basel
125. Shereshevsky MA (1992) Ergodic properties of certain surjective cellular automata. Monatsh Math 114(3-4):305–316
126. Shereshevsky MA (1992) Lyapunov exponents for one-dimensional cellular automata. J Nonlinear Sci 2(1):1–8
127. Shereshevsky MA (1993) Expansiveness, entropy and polynomial growth for groups acting on subshifts by automorphisms. Indag Math (NS) 4(2):203–210
128. Shereshevsky MA (1996) On continuous actions commuting with actions of positive entropy. Colloq Math 70(2):265–269

129. Shereshevsky MA (1997) K-property of permutative cellular automata. Indag Math (NS) 8(3):411–416
130. Shereshevsky MA, Afraĭmovich VS (1992/93) Bipermutative cellular automata are topologically conjugate to the one-sided Bernoulli shift. Random Comput Dynam 1(1):91–98
131. Shirvani M, Rogers TD (1991) On ergodic one-dimensional cellular automata. Comm Math Phys 136(3):599–605
132. Silberger S (2005) Subshifts of the three dot system. Ergodic Theory Dynam Syst 25(5):1673–1687
133. Smillie J (1988) Properties of the directional entropy function for cellular automata. In: Dynamical systems (College Park, 1986–87), Lecture Notes in Math, vol 1342. Springer, Berlin, pp 689–705
134. Sobottka M (2005) Representación y aleatorización en sistemas dinámicos de tipo algebraico. PhD thesis, Universidad de Chile, Facultad de ciencias físicas y matemáticas, Santiago
135. Sobottka M (2007) Topological quasi-group shifts. Discret Continuous Dyn Syst 17(1):77–93
136. Sobottka M (2007, to appear) Right-permutative cellular automata on topological Markov chains. Discret Continuous Dyn Syst. Available at http://arxiv.org/abs/math/0603326
137. Steif JE (1994) The threshold voter automaton at a critical point. Ann Probab 22(3):1121–1139
138. Takahashi S (1990) Cellular automata and multifractals: dimension spectra of linear cellular automata. Phys D 45(1-3):36–48 [cellular automata: theory and experiment (Los Alamos, NM, 1989)]
139. Takahashi S (1992) Self-similarity of linear cellular automata. J Comput Syst Sci 44(1):114–140
140. Takahashi S (1993) Cellular automata, fractals and multifractals: space-time patterns and dimension spectra of linear cellular automata. In: Chaos in Australia (Sydney, 1990). World Sci Publishing, River Edge, pp 173–195
141. Tisseur P (2000) Cellular automata and Lyapunov exponents. Nonlinearity 13(5):1547–1560
142. Walters P (1982) An introduction to ergodic theory. Graduate Texts in Mathematics, vol 79. Springer, New York
143. Weiss B (2000) Sofic groups and dynamical systems. Sankhyā Ser A 62(3):350–359
144. Willson SJ (1975) On the ergodic theory of cellular automata. Math Syst Theory 9(2):132–141
145. Willson SJ (1984) Cellular automata can generate fractals. Discret Appl Math 8(1):91–99
146. Willson SJ (1984) Growth rates and fractional dimensions in cellular automata. Phys D 10(1-2):69–74 [cellular automata (Los Alamos, 1983)]
147. Willson SJ (1986) A use of cellular automata to obtain families of fractals. In: Chaotic dynamics and fractals (Atlanta, 1985), Notes Rep Math Sci Eng, vol 2. Academic Press, Orlando, pp 123–140
148. Willson SJ (1987) Computing fractal dimensions for additive cellular automata. Phys D 24(1-3):190–206
149. Willson SJ (1987) The equality of fractional dimensions for certain cellular automata. Phys D 24(1-3):179–189
150. Wolfram S (1985) Twenty problems in the theory of cellular automata. Physica Scripta 9:1–35
151. Wolfram S (1986) Theory and Applications of Cellular Automata. World Scientific, Singapore

999

1000

Evolutionary Game Theory

William H. Sandholm
Department of Economics, University of Wisconsin, Madison, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Normal Form Games
Static Notions of Evolutionary Stability
Population Games
Revision Protocols
Deterministic Dynamics
Stochastic Dynamics
Local Interaction
Applications
Future Directions
Acknowledgments
Bibliography

Glossary

Deterministic evolutionary dynamic A deterministic evolutionary dynamic is a rule for assigning population games to ordinary differential equations describing the evolution of behavior in the game. Deterministic evolutionary dynamics can be derived from revision protocols, which describe choices (in economic settings) or births and deaths (in biological settings) on an agent-by-agent basis.

Evolutionarily stable strategy (ESS) In a symmetric normal form game, an evolutionarily stable strategy is a (possibly mixed) strategy with the following property: a population in which all members play this strategy is resistant to invasion by a small group of mutants who play an alternative mixed strategy.

Normal form game A normal form game is a strategic interaction in which each of n players chooses a strategy and then receives a payoff that depends on all players' choices of strategy. In a symmetric two-player normal form game, the two players choose from the same set of strategies, and payoffs only depend on own and opponent's choices, not on a player's identity.

Population game A population game is a strategic interaction among one or more large populations of agents. Each agent's payoff depends on his own choice of strategy and the distribution of others' choices of strategies. One can generate a population game from a normal form game by introducing random matching; however, many population games of interest, including congestion games, do not take this form.

Replicator dynamic The replicator dynamic is a fundamental deterministic evolutionary dynamic for games. Under this dynamic, the percentage growth rate of the mass of agents using each strategy is proportional to the excess of the strategy's payoff over the population's average payoff. The replicator dynamic can be interpreted biologically as a model of natural selection, and economically as a model of imitation.

Revision protocol A revision protocol describes both the timing and the results of agents' decisions about how to behave in a repeated strategic interaction. Revision protocols are used to derive both deterministic and stochastic evolutionary dynamics for games.

Stochastically stable state Game-theoretic models of stochastic evolution in games are often described by irreducible Markov processes. In these models, a population state is stochastically stable if it retains positive weight in the process's stationary distribution as the level of noise in agents' choices approaches zero, or as the population size approaches infinity.

Definition of the Subject

Evolutionary game theory studies the behavior of large populations of agents who repeatedly engage in strategic interactions. Changes in behavior in these populations are driven either by natural selection via differences in birth and death rates, or by the application of myopic decision rules by individual agents.

The birth of evolutionary game theory is marked by the publication of a series of papers by mathematical biologist John Maynard Smith [137,138,140]. Maynard Smith adapted the methods of traditional game theory [151,215], which were created to model the behavior of rational economic agents, to the context of biological natural selection. He proposed his notion of an evolutionarily stable strategy (ESS) as a way of explaining the existence of ritualized animal conflict.

Maynard Smith's equilibrium concept was provided with an explicit dynamic foundation through a differential equation model introduced by Taylor and Jonker [205]. Schuster and Sigmund [189], following Dawkins [58], dubbed this model the replicator dynamic, and recognized the close links between this game-theoretic dynamic and dynamics studied much earlier in population ecology [132,214] and population genetics [73]. By the 1980s, evolutionary game theory was a well-developed and firmly established modeling framework in biology [106].


Towards the end of this period, economists realized the value of the evolutionary approach to game theory in social science contexts, both as a method of providing foundations for the equilibrium concepts of traditional game theory, and as a tool for selecting among equilibria in games that admit more than one. Especially in its early stages, work by economists in evolutionary game theory hewed closely to the interpretation set out by biologists, with the notion of ESS and the replicator dynamic understood as modeling natural selection in populations of agents genetically programmed to behave in specific ways. But it soon became clear that models of essentially the same form could be used to study the behavior of populations of active decision makers [50,76,133,149,167,191]. Indeed, the two approaches sometimes lead to identical models: the replicator dynamic itself can be understood not only as a model of natural selection, but also as one of imitation of successful opponents [35,188,216].

While the majority of work in evolutionary game theory has been undertaken by biologists and economists, closely related models have been applied to questions in a variety of fields, including transportation science [143,150,173,175,177,197], computer science [72,173,177], and sociology [34,62,126,225,226]. Some paradigms from evolutionary game theory are close relatives of certain models from physics, and so have attracted the attention of workers in this field [141,201,202,203]. All told, evolutionary game theory provides a common ground for workers from a wide range of disciplines.

Introduction

This article offers a broad survey of the theory of evolution in games. Section "Normal Form Games" introduces normal form games, a simple and commonly studied model of strategic interaction. Section "Static Notions of Evolutionary Stability" presents the notion of an evolutionarily stable strategy, a static definition of stability proposed for this normal form context.
Section “Population Games” defines population games, a general model of strategic interaction in large populations. Section “Revision Protocols” offers the notion of a revision protocol, an individual-level description of behavior used to define the population-level processes of central concern. Most of the article concentrates on these population-level processes: Section “Deterministic Dynamics” considers deterministic differential equation models of game dynamics; Section “Stochastic Dynamics” studies stochastic models of evolution based on Markov processes; and Sect. “Local Interaction” presents deterministic and
stochastic models of local interaction. Section "Applications" records a range of applications of evolutionary game theory, and Sect. "Future Directions" suggests directions for future research. Finally, Sect. "Bibliography" offers an extensive list of primary references.

Normal Form Games

In this section, we introduce a very simple model of strategic interaction: the symmetric two-player normal form game. We then define some of the standard solution concepts used to analyze this model, and provide some examples of games and their equilibria. With this background in place, we turn in subsequent sections to evolutionary analysis of behavior in games.

In a symmetric two-player normal form game, each of the two players chooses a (pure) strategy from the finite set S, which we write generically as $S = \{1, \ldots, n\}$. The game's payoffs are described by the matrix $A \in \mathbf{R}^{n \times n}$. Entry $A_{ij}$ is the payoff a player obtains when he chooses strategy i and his opponent chooses strategy j; this payoff does not depend on whether the player in question is called player 1 or player 2.

The fundamental solution concept of noncooperative game theory is Nash equilibrium [151]. We say that the pure strategy $i \in S$ is a symmetric Nash equilibrium of A if

  $A_{ii} \geq A_{ji}$ for all $j \in S$.   (1)

Thus, if his opponent chooses a symmetric Nash equilibrium strategy i, a player can do no better than to choose i himself. A stronger requirement on strategy i demands that it be superior to all other strategies regardless of the opponent's choice:

  $A_{ik} > A_{jk}$ for all $j, k \in S$.   (2)

When condition (2) holds, we say that strategy i is strictly dominant in A.

Example 1 The game below, with strategies C ("cooperate") and D ("defect"), is an instance of a Prisoner's Dilemma:

  $\begin{array}{c|cc} & C & D \\ \hline C & 2 & 0 \\ D & 3 & 1 \end{array}$

(To interpret this game, note that $A_{CD} = 0$ is the payoff to cooperating when one's opponent defects.) Since 1 > 0, defecting is a symmetric Nash equilibrium of this game. In fact, since 3 > 2 and 1 > 0, defecting is even a strictly
dominant strategy. But since 2 > 1, both players are better off when both cooperate than when both defect.

In many instances, it is natural to allow players to choose mixed (or randomized) strategies. When a player chooses a mixed strategy from the simplex $X = \{x \in \mathbf{R}^n_+ : \sum_{i \in S} x_i = 1\}$, his behavior is stochastic: he commits to playing pure strategy $i \in S$ with probability $x_i$. When either player makes a randomized choice, we evaluate payoffs by taking expectations: a player choosing mixed strategy x against an opponent choosing mixed strategy y garners an expected payoff of

  $x' A y = \sum_{i \in S} \sum_{j \in S} x_i A_{ij} y_j$.   (3)
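Conditions (1) and (2) and the expected payoff formula (3) are straightforward to check by machine; here is a minimal illustrative sketch using the Prisoner's Dilemma payoffs of Example 1:

```python
# Prisoner's Dilemma of Example 1; row = own strategy, column = opponent's (C = 0, D = 1)
A = [[2.0, 0.0],
     [3.0, 1.0]]

def expected_payoff(x, y):
    """Expected payoff x'Ay of mixed strategy x against mixed strategy y, Eq. (3)."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

# Condition (1): D is a symmetric Nash equilibrium since A_DD >= A_CD.
d_is_nash = A[1][1] >= A[0][1]

# Condition (2): D is strictly dominant since A_Dk > A_Ck for every opponent strategy k.
d_is_dominant = all(A[1][k] > A[0][k] for k in range(2))

# Expected payoff of uniform randomization against itself: (2 + 0 + 3 + 1)/4 = 1.5.
u = expected_payoff([0.5, 0.5], [0.5, 0.5])
```

For larger matrices the same two comprehensions check (1) and (2) unchanged; only the strategy labels are specific to this example.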

In biological contexts, payoffs are fitnesses, and represent levels of reproductive success relative to some baseline level; Eq. (3) reflects the idea that in a large population, expected reproductive success is what matters. In economic contexts, payoffs are utilities: a numerical representation of players' preferences under which Eq. (3) captures players' choices between uncertain outcomes [215].

The notion of Nash equilibrium extends easily to allow for mixed strategies. Mixed strategy x is a symmetric Nash equilibrium of A if

  $x' A x \geq y' A x$ for all $y \in X$.   (4)

In words, x is a symmetric Nash equilibrium if its expected payoff against itself is at least as high as the expected payoff obtainable by any other strategy y against x. Note that we can represent the pure strategy $i \in S$ using the mixed strategy $e_i \in X$, the ith standard basis vector in $\mathbf{R}^n$. If we do so, then definition (4) restricted to such strategies is equivalent to definition (1).

We illustrate these ideas with a few examples.

Example 2 Consider the Stag Hunt game:

  $\begin{array}{c|cc} & H & S \\ \hline H & h & h \\ S & 0 & s \end{array}$

Each player in the Stag Hunt game chooses between hunting hare (H) and hunting stag (S). A player who hunts hare always catches one, obtaining a payoff of h > 0. But hunting stag is only successful if both players do so, in which case each obtains a payoff of s > h. Hunting stag is potentially more profitable than hunting hare, but requires a coordinated effort.

In the Stag Hunt game, H and S (or, equivalently, $e_H$ and $e_S$) are symmetric pure Nash equilibria. This game also has a symmetric mixed Nash equilibrium, namely $x^* = (x^*_H, x^*_S) = (\frac{s-h}{s}, \frac{h}{s})$. If a player's opponent chooses this mixed strategy, the player's expected payoff is h whether he chooses H, S, or any mixture between the two; in particular, $x^*$ is a best response against itself.

To distinguish between the two pure equilibria, we might focus on the one that is payoff dominant, in that it achieves the higher joint payoff. Alternatively, we can concentrate on the risk dominant equilibrium [89], which utilizes the strategy preferred by a player who thinks his opponent is equally likely to choose either option (that is, against an opponent playing mixed strategy $(x_H, x_S) = (\frac{1}{2}, \frac{1}{2})$). In the present case, since s > h, equilibrium S is payoff dominant. Which strategy is risk dominant depends on further information about payoffs. If s > 2h, then S is risk dominant. But if s < 2h, H is risk dominant: evidently, payoff dominance and risk dominance need not agree.

Example 3 In the Hawk–Dove game [139], the two players are animals contesting a resource of value v > 0. The players choose between two strategies: display (D) or escalate (E). If both display, the resource is split; if one escalates and the other displays, the escalator claims the entire resource; if both escalate, then each player is equally likely to claim the entire resource or to be injured, suffering a cost of c > v in the latter case. The payoff matrix for the Hawk–Dove game is therefore

  $\begin{array}{c|cc} & D & E \\ \hline D & \frac{1}{2}v & 0 \\ E & v & \frac{1}{2}(v - c) \end{array}$

This game has no symmetric Nash equilibrium in pure strategies. It does, however, admit the symmetric mixed equilibrium $x^* = (x^*_D, x^*_E) = (\frac{c-v}{c}, \frac{v}{c})$. (In fact, it can be shown that every symmetric normal form game admits at least one symmetric mixed Nash equilibrium [151].)

In this example, our focus on symmetric behavior may seem odd: rather than randomizing symmetrically, it seems more natural for players to follow an asymmetric Nash equilibrium in which one player escalates and the other displays. But the symmetric equilibrium is the most relevant one for understanding natural selection in populations whose members are randomly matched in pairwise contests; see Sect. "Static Notions of Evolutionary Stability".

Example 4 Consider the class of Rock–Paper–Scissors games:

  $\begin{array}{c|ccc} & R & P & S \\ \hline R & 0 & -l & w \\ P & w & 0 & -l \\ S & -l & w & 0 \end{array}$

Here w > 0 is the benefit of winning the match and l > 0 the cost of losing; ties are worth 0 to both players. We call this game good RPS if w > l, so that the benefit of winning the match exceeds the cost of losing, standard RPS if w = l, and bad RPS if w < l. Regardless of the values of w and l, the unique symmetric Nash equilibrium of this game, $x^* = (x^*_R, x^*_P, x^*_S) = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$, requires uniform randomization over the three strategies.

Static Notions of Evolutionary Stability

In introducing game-theoretic ideas to the study of animal behavior, Maynard Smith advanced this fundamental principle: that the evolutionary success of (the genes underlying) a given behavioral trait can depend on the prevalences of all traits. It follows that natural selection among the traits can be modeled as random matching of animals to play normal form games [137,138,139,140].

Working in this vein, Maynard Smith offered a stability concept for populations of animals sharing a common behavioral trait – that of playing a particular mixed strategy in the game at hand. Maynard Smith's concept of evolutionary stability, influenced by the work of Hamilton [87] on the evolution of sex ratios, defines such a population as stable if it is resistant to invasion by a small group of mutants carrying a different trait.

Suppose that a large population of animals is randomly matched to play the symmetric normal form game A. We call mixed strategy $x \in X$ an evolutionarily stable strategy (ESS) if

  $x' A((1-\varepsilon)x + \varepsilon y) > y' A((1-\varepsilon)x + \varepsilon y)$ for all $\varepsilon \leq \bar\varepsilon(y)$ and $y \neq x$.   (5)

To interpret condition (5), imagine that a population of animals programmed to play mixed strategy x is invaded by a group of mutants programmed to play the alternative mixed strategy y. Equation (5) requires that regardless of the choice of y, an incumbent's expected payoff from a random match in the post-entry population exceeds that of a mutant so long as the size of the invading group is sufficiently small.

The definition of ESS above can also be expressed as a combination of two conditions:

  $x' A x \geq y' A x$ for all $y \in X$;   (4)

  For all $y \neq x$, $[x' A x = y' A x]$ implies that $[x' A y > y' A y]$.   (6)

Condition (4) is familiar: it requires that the incumbent strategy x be a best response to itself, and so is none other than our definition of symmetric Nash equilibrium. Condition (6) requires that if a mutant strategy y is an alternative best response against the incumbent strategy x, then the incumbent earns a higher payoff against the mutant than the mutant earns against itself.

A less demanding notion of stability can be obtained by allowing the incumbent and the mutant in condition (6) to perform equally well against the mutant:

  For all $y \in X$, $[x' A x = y' A x]$ implies that $[x' A y \geq y' A y]$.   (7)

If x satisfies conditions (4) and (7), it is called a neutrally stable strategy (NSS) [139].

Let us apply these stability notions to the games introduced in the previous section. Since every ESS and NSS must be a Nash equilibrium, we need only consider whether the Nash equilibria of these games satisfy the additional stability conditions, (6) and (7).

Example 5 In the Prisoner's Dilemma game (Example 1), the dominant strategy D is an ESS.

Example 6 In the Stag Hunt game (Example 2), each pure Nash equilibrium is an ESS. But the mixed equilibrium $(x^*_H, x^*_S) = (\frac{s-h}{s}, \frac{h}{s})$ is not an ESS: if mutants playing either pure strategy enter the population, they earn a higher payoff than the incumbents in the post-entry population.

Example 7 In the Hawk–Dove game (Example 3), the mixed equilibrium $(x^*_D, x^*_E) = (\frac{c-v}{c}, \frac{v}{c})$ is an ESS. Maynard Smith used this and other examples to explain the existence of ritualized fighting in animals. While an animal who escalates always obtains the resource when matched with an animal who merely displays, a population of escalators is unstable: it can be invaded by a group of mutants who display, or who escalate less often.

Example 8 In Rock–Paper–Scissors games (Example 4), whether the mixed equilibrium $x^* = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$ is evolutionarily stable depends on the relative payoffs to winning and losing a match. In good RPS (w > l), $x^*$ is an ESS; in standard RPS (w = l), $x^*$ is an NSS but not an ESS; and in bad RPS (w < l), $x^*$ is neither an ESS nor an NSS. The last case shows that neither evolutionarily nor neutrally stable strategies need exist in a given game.

The definition of an evolutionarily stable strategy has been extended to cover a wide range of strategic settings, and has been generalized in a variety of directions. Prominent
among these developments are set-valued versions of ESS: in rough terms, these concepts consider a set of mixed strategies $Y \subseteq X$ to be stable if no population playing a strategy in the set can be invaded successfully by a population of mutants playing a strategy outside the set. [95] provides a thorough survey of the first 15 years of research on ESS and related notions of stability; key references on set-valued evolutionary solution concepts include [15,199,206].

Maynard Smith's notion of ESS attempts to capture the dynamic process of natural selection using a static definition. The advantage of this approach is that his definition is often easy to check in applications. Still, more convincing models of natural selection should be explicitly dynamic models, building on techniques from the theories of dynamical systems and stochastic processes. Indeed, this thoroughgoing approach can help us understand whether and when the ESS concept captures the notion of robustness to invasion in a satisfactory way.

The remainder of this article concerns explicitly dynamic models of behavior. In addition to being dynamic rather than static, these models will differ from the one considered in this section in two other important ways as well. First, rather than looking at populations whose members all play a particular mixed strategy, the dynamic models consider populations in which different members play different pure strategies. Second, instead of maintaining a purely biological point of view, our dynamic models will be equally well suited to studying behavior in animal and human populations.

Population Games

Population games provide a simple and general framework for studying strategic interactions in large populations whose members play pure strategies. The simplest population games are generated by random matching in normal form games, but the population game framework allows for interactions of a more intricate nature.
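As an aside before developing the population-game machinery: the invasion condition (5) from the previous section is easy to spot-check numerically. The sketch below uses hypothetical parameter values (Hawk–Dove with v = 2, c = 4; bad RPS with w = 1, l = 2) and confirms that the mixed Hawk–Dove equilibrium resists every mutant on a coarse grid, while the bad RPS equilibrium is invadable:

```python
def resists_invasion(A, x, y, eps=0.01):
    """Condition (5) at a fixed eps: incumbent x out-earns mutant y in the
    post-entry population z = (1 - eps) x + eps y."""
    n = len(x)
    z = [(1 - eps) * x[i] + eps * y[i] for i in range(n)]
    Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
    payoff = lambda s: sum(s[i] * Az[i] for i in range(n))
    return payoff(x) > payoff(y)

# Hawk-Dove (Example 3) with v = 2, c = 4: x* = ((c-v)/c, v/c) = (1/2, 1/2).
v, c = 2.0, 4.0
HD = [[v / 2, 0.0],
      [v, (v - c) / 2]]
x_hd = [(c - v) / c, v / c]
mutants = [[p / 10, 1 - p / 10] for p in range(11) if p != 5]
hd_stable = all(resists_invasion(HD, x_hd, y) for y in mutants)

# Bad RPS (w = 1, l = 2): the equilibrium (1/3, 1/3, 1/3) can be invaded,
# e.g. by mutants playing the pure strategy R.
w, l = 1.0, 2.0
RPS = [[0.0, -l, w],
       [w, 0.0, -l],
       [-l, w, 0.0]]
x_rps = [1 / 3, 1 / 3, 1 / 3]
rps_invaded = not resists_invasion(RPS, x_rps, [1.0, 0.0, 0.0])
```

A grid check of this kind is only a spot check, of course; the analytical conditions (4), (6), and (7) are what actually define ESS and NSS.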
We focus here on games played by a single population (i.e., games in which all agents play equivalent roles). We suppose that there is a unit mass of agents, each of whom chooses a pure strategy from the set $S = \{1, \ldots, n\}$. The aggregate behavior of these agents is described by a population state $x \in X$, with $x_j$ representing the proportion of agents choosing pure strategy j. We identify a population game with a continuous vector-valued payoff function $F : X \to \mathbf{R}^n$. The scalar $F_i(x)$ represents the payoff to strategy i when the population state is x.

Population state $x^*$ is a Nash equilibrium of F if no agent can improve his payoff by unilaterally switching strategies. More explicitly, $x^*$ is a Nash equilibrium if

  $x^*_i > 0$ implies that $F_i(x^*) \geq F_j(x^*)$ for all $j \in S$.   (8)

Example 9 Suppose that the unit mass of agents are randomly matched to play the symmetric normal form game A. At population state x, the (expected) payoff to strategy i is the linear function $F_i(x) = \sum_{j \in S} A_{ij} x_j$; the payoffs to all strategies can be expressed concisely as $F(x) = Ax$. It is easy to verify that $x^*$ is a Nash equilibrium of the population game F if and only if $x^*$ is a symmetric Nash equilibrium of the symmetric normal form game A.

While population games generated by random matching are especially simple, many games that arise in applications are not of this form. In the biology literature, games outside the random matching paradigm are known as playing the field models [139].

Example 10 Consider the following model of highway congestion [17,143,166,173]. A pair of towns, Home and Work, are connected by a network of links. To commute from Home to Work, an agent must choose a path $i \in S$ connecting the two towns. The payoff the agent obtains is the negation of the delay on the path he takes. The delay on the path is the sum of the delays on its constituent links, while the delay on a link is a function of the number of agents who use that link. Population games embodying this description are known as congestion games.

To define a congestion game, let $\Phi$ be the collection of links in the highway network. Each strategy $i \in S$ is a route from Home to Work, and so is identified with a set of links $\Phi_i \subseteq \Phi$. Each link $\lambda$ is assigned a cost function $c_\lambda : \mathbf{R}_+ \to \mathbf{R}$, whose argument is link $\lambda$'s utilization level $u_\lambda$:

  $u_\lambda(x) = \sum_{i \in \rho(\lambda)} x_i$, where $\rho(\lambda) = \{i \in S : \lambda \in \Phi_i\}$.

The payoff of choosing route i is the negation of the total delay on the links in this route:

  $F_i(x) = -\sum_{\lambda \in \Phi_i} c_\lambda(u_\lambda(x))$.

Since driving on a link increases the delays experienced by other drivers on that link (i.e., since highway congestion involves negative externalities), cost functions in models of highway congestion are increasing; they are typically convex as well. Congestion games can also be used to model positive externalities, like the choice between different technological standards; in this case, the cost functions are decreasing in the utilization levels.
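A minimal congestion game sketch, with two parallel single-link routes and hypothetical increasing costs $c_1(u) = u$ and $c_2(u) = 2u$: at the state x = (2/3, 1/3) both routes have delay 2/3, so condition (8) holds and the state is a Nash equilibrium, while an even split is not:

```python
# Two parallel routes; hypothetical link cost functions (delays) c1(u) = u, c2(u) = 2u.
costs = [lambda u: u, lambda u: 2.0 * u]

def payoffs(x):
    """F_i(x) = -c_i(x_i): each route is a single link used only by its own traffic."""
    return [-costs[i](x[i]) for i in range(len(x))]

# Equal-delay state: 2/3 of commuters on the cheaper link, 1/3 on the other.
F_eq = payoffs([2 / 3, 1 / 3])     # both entries equal -2/3

# An even split is not an equilibrium: route 1 is strictly faster,
# so agents on route 2 would want to switch.
F_half = payoffs([0.5, 0.5])
```

With more routes sharing links, `payoffs` would sum each route's link costs over its link set, exactly as in the formula for $F_i(x)$ above.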


Revision Protocols

We now introduce foundations for our models of evolutionary dynamics. These foundations are built on the notion of a revision protocol, which describes both the timing and results of agents' myopic decisions about how to continue playing the game at hand [24,35,96,175,217]. Revision protocols will be used to derive both the deterministic dynamics studied in Sect. "Deterministic Dynamics" and the stochastic dynamics studied in Sect. "Stochastic Dynamics"; similar ideas underlie the local interaction models introduced in Sect. "Local Interaction".

Definition

Formally, a revision protocol is a map $\rho : \mathbf{R}^n \times X \to \mathbf{R}^{n \times n}_+$ that takes the payoff vectors $\pi$ and population states x as arguments, and returns nonnegative matrices as outputs. For reasons to be made clear below, scalar $\rho_{ij}(\pi, x)$ is called the conditional switch rate from strategy i to strategy j.

To move from this notion to an explicit model of evolution, let us consider a population consisting of $N < \infty$ members. (A number of the analyses to follow will consider the limit of the present model as the population size N approaches infinity – see Sects. "Mean Dynamics", "Deterministic Approximation", and "Stochastic Stability via Large Population Limits".) In this case, the set of feasible social states is the finite set $X^N = X \cap \frac{1}{N}\mathbf{Z}^n = \{x \in X : Nx \in \mathbf{Z}^n\}$, a grid embedded in the simplex X.

A revision protocol $\rho$, a population game F, and a population size N define a continuous-time evolutionary process – a Markov process $\{X^N_t\}$ – on the finite state space $X^N$. A one-size-fits-all description of this process is as follows. Each agent in the society is equipped with a "stochastic alarm clock". The times between rings of an agent's clock are independent, each with a rate R exponential distribution. The ringing of a clock signals the arrival of a revision opportunity for the clock's owner. If an agent playing strategy $i \in S$ receives a revision opportunity, he switches to strategy $j \neq i$ with probability $\rho_{ij}/R$. If a switch occurs, the population state changes accordingly, from the old state x to a new state y that accounts for the agent's change in strategy.

While this interpretation of the evolutionary process can be applied to any revision protocol, simpler interpretations are sometimes available for protocols with additional structure. The examples to follow illustrate this point.
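The alarm-clock description can be simulated directly. The sketch below is illustrative, with assumed ingredients: Prisoner's Dilemma payoffs as in Example 1, the pairwise proportional imitation protocol of Example 11 below, N = 100, and rate bound R = 1. Since exponential clocks with a common rate are statistically equivalent to handing each successive revision opportunity to a uniformly drawn agent, the simulation simply loops over opportunities:

```python
import random

def simulate(N=100, events=100_000, seed=0):
    """Finite-population process {X_t^N} for the Prisoner's Dilemma under
    pairwise proportional imitation, rho_ij = x_j [pi_j - pi_i]_+ (R = 1)."""
    rng = random.Random(seed)
    A = [[2.0, 0.0], [3.0, 1.0]]          # strategies: 0 = C, 1 = D
    counts = [N // 2, N - N // 2]
    for _ in range(events):
        x = [k / N for k in counts]
        pi = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
        i = 0 if rng.random() < x[0] else 1   # the agent whose clock rings
        j = 1 - i
        # switch with probability rho_ij / R = x_j [pi_j - pi_i]_+
        if rng.random() < x[j] * max(pi[j] - pi[i], 0.0):
            counts[i] -= 1
            counts[j] += 1
    return counts

final = simulate()
```

Here defection always earns one unit more than cooperation, so the imitation process drifts monotonically toward, and is eventually absorbed at, the all-D state.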

Examples

Imitation Protocols and Natural Selection Protocols In economic contexts, revision protocols of the form

  $\rho_{ij}(\pi, x) = x_j \hat\rho_{ij}(\pi, x)$   (9)

are called imitation protocols [35,96,216]. These protocols can be given a very simple interpretation: when an agent receives a revision opportunity, he chooses an opponent at random and observes her strategy. If our agent is playing strategy i and the opponent strategy j, the agent switches from i to j with probability proportional to $\hat\rho_{ij}$. Notice that the value of the population share $x_j$ is not something the agent need know; this term in (9) accounts for the agent's observing a randomly chosen opponent.

Example 11 Suppose that after selecting an opponent, the agent imitates the opponent only if the opponent's payoff is higher than his own, doing so in this case with probability proportional to the payoff difference:

  $\rho_{ij}(\pi, x) = x_j [\pi_j - \pi_i]_+$.

This protocol is known as pairwise proportional imitation [188].

Protocols of form (9) also appear in biological contexts [144,153,158], where in these cases we refer to them as natural selection protocols. The biological interpretation of (9) supposes that each agent is programmed to play a single pure strategy. An agent who receives a revision opportunity dies, and is replaced through asexual reproduction. The reproducing agent is a strategy j player with probability $\rho_{ij}(\pi, x) = x_j \hat\rho_{ij}(\pi, x)$, which is proportional both to the number of strategy j players and to some function of the prevalences and fitnesses of all strategies. Note that this interpretation requires the restriction

  $\sum_{j \in S} \rho_{ij}(\pi, x) \leq 1$.

Example 12 Suppose that payoffs are always positive, and let

  $\rho_{ij}(\pi, x) = \dfrac{x_j \pi_j}{\sum_{k \in S} x_k \pi_k}$.   (10)

Understood as a natural selection protocol, (10) says that the probability that the reproducing agent is a strategy j player is proportional to $x_j \pi_j$, the aggregate fitness of strategy j players. In economic contexts, we can interpret (10) as an imitative protocol based on repeated sampling. When an agent's clock rings he chooses an opponent at random. If the opponent is playing strategy j, the agent imitates him with probability proportional to $\pi_j$. If the agent does not imitate this opponent, he draws a new opponent at random and repeats the procedure.
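The equivalence between this repeated-sampling story and formula (10) can be checked by simulation. In the sketch below the state and (positive) payoff values are hypothetical; accepting a sampled opponent with probability $\pi_j / \max_k \pi_k$ leaves the odds of ending at strategy j proportional to $x_j \pi_j$, which is protocol (10) after normalization:

```python
import random

def repeated_sampling(x, pi, rng):
    """Draw opponents at random; imitate with probability proportional to payoff."""
    pi_max = max(pi)
    while True:
        j = rng.choices(range(len(x)), weights=x)[0]
        if rng.random() < pi[j] / pi_max:
            return j

rng = random.Random(0)
x, pi = [0.5, 0.3, 0.2], [1.0, 2.0, 4.0]   # hypothetical state and payoffs
n = 100_000
freq = [0.0, 0.0, 0.0]
for _ in range(n):
    freq[repeated_sampling(x, pi, rng)] += 1.0 / n

# Protocol (10) in closed form, for comparison with the empirical frequencies.
total = sum(xj * pj for xj, pj in zip(x, pi))
rho = [xj * pj / total for xj, pj in zip(x, pi)]
```

This is ordinary rejection sampling: the target weights $x_j \pi_j$ are realized without the agent ever observing the population shares directly.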


Direct Evaluation Protocols In the previous examples, only strategies currently in use have any chance of being chosen by a revising agent (or of being the programmed strategy of the newborn agent). Under other protocols, agents' choices are not mediated through the population's current behavior, except indirectly via the effect of behavior on payoffs. These direct evaluation protocols require agents to directly evaluate the payoffs of the strategies they consider, rather than to indirectly evaluate them as under an imitative procedure.

Example 13 Suppose that choices are made according to the logit choice rule:

  $\rho_{ij}(\pi, x) = \dfrac{\exp(\eta^{-1} \pi_j)}{\sum_{k \in S} \exp(\eta^{-1} \pi_k)}$.   (11)

The interpretation of this protocol is simple. Revision opportunities arrive at unit rate. When an opportunity is received by an i player, he switches to strategy j with probability $\rho_{ij}(\pi, x)$, which is proportional to an exponential function of strategy j's payoffs. The parameter $\eta > 0$ is called the noise level. If $\eta$ is large, choice probabilities under the logit rule are nearly uniform. But if $\eta$ is near zero, choices are optimal with probability close to one, at least when the difference between the best and second best payoff is not too small.

Additional examples of revision protocols can be found in the next section, and one can construct new revision protocols by taking linear combinations of old ones; see [183] for further discussion.

Deterministic Dynamics

Although antecedents of this approach date back to the early work of Brown and von Neumann [45], the use of differential equations to model evolution in games took root with the introduction of the replicator dynamic by Taylor and Jonker [205], and remains a vibrant area of research; Hofbauer and Sigmund [108] and Sandholm [183] offer recent surveys. In this section, we derive a deterministic model of evolution: the mean dynamic generated by a revision protocol and a population game. We study this deterministic model from various angles, focusing in particular on local stability of rest points, global convergence to equilibrium, and nonconvergent limit behavior.

While the bulk of the literature on deterministic evolutionary dynamics is consistent with the approach we take here, we should mention that other specifications exist, including discrete time dynamics [5,59,131,218], and dynamics for games with continuous strategy sets [41,42,77,100,159,160] and for Bayesian population

games [62,70,179]. Also, deterministic dynamics for extensive form games introduce new conceptual issues; see [28,30,51,53,55] and the monograph of Cressman [54].

Mean Dynamics

As described earlier in Sect. "Definition", a revision protocol $\rho$, a population game F, and a population size N define a Markov process $\{X^N_t\}$ on the finite state space $X^N$. We now derive a deterministic process – the mean dynamic – that describes the expected motion of $\{X^N_t\}$. In Sect. "Deterministic Approximation", we will describe formally the sense in which this deterministic process provides a very good approximation of the behavior of the stochastic process $\{X^N_t\}$, at least over finite time horizons and for large population sizes. But having noted this result, we will focus in this section on the deterministic process itself.

To compute the expected increment of $\{X^N_t\}$ over the next dt time units, recall first that each of the N agents receives revision opportunities via a rate R exponential distribution, and so expects to receive $R \, dt$ opportunities during the next dt time units. If the current state is x, the expected number of revision opportunities received by agents currently playing strategy i is approximately $N x_i R \, dt$. Since an i player who receives a revision opportunity switches to strategy j with probability $\rho_{ij}/R$, the expected number of such switches during the next dt time units is approximately $N x_i \rho_{ij} \, dt$. Therefore, the expected change in the number of agents choosing strategy i during the next dt time units is approximately

  $N \left( \sum_{j \in S} x_j \rho_{ji}(F(x), x) - x_i \sum_{j \in S} \rho_{ij}(F(x), x) \right) dt$.   (12)

Dividing expression (12) by N and eliminating the time differential dt yields a differential equation for the rate of change in the proportion of agents choosing strategy i:

  ẋ_i = Σ_{j∈S} x_j ρ_ji(F(x), x) − x_i Σ_{j∈S} ρ_ij(F(x), x).   (M)

Equation (M) is the mean dynamic (or mean field) generated by revision protocol ρ in population game F. The first term in (M) captures the inflow of agents to strategy i from other strategies, while the second captures the outflow of agents to other strategies from strategy i.

Examples

We now describe some examples of mean dynamics, starting with ones generated by the revision protocols from Sect. "Examples". To do so, we let

  F̄(x) = Σ_{i∈S} x_i F_i(x)

denote the average payoff obtained by the members of the population, and define the excess payoff to strategy i,

  F̂_i(x) = F_i(x) − F̄(x),

to be the difference between strategy i's payoff and the population's average payoff.

Example 14  In Example 11, we introduced the pairwise proportional imitation protocol ρ_ij(π, x) = x_j [π_j − π_i]_+. This protocol generates the mean dynamic

  ẋ_i = x_i F̂_i(x).   (13)

Equation (13) is the replicator dynamic [205], the best-known dynamic in evolutionary game theory. Under this dynamic, the percentage growth rate ẋ_i/x_i of each strategy currently in use is equal to that strategy's current excess payoff; unused strategies always remain so. There are a variety of revision protocols other than pairwise proportional imitation that generate the replicator dynamic as their mean dynamic; see [35,96,108,217].

Example 15  In Example 12, we assumed that payoffs are always positive, and introduced the protocol ρ_ij(π, x) ∝ x_j π_j, which we interpreted both as a model of biological natural selection and as a model of imitation with repeated sampling. The resulting mean dynamic,

  ẋ_i = x_i F_i(x) / Σ_{k∈S} x_k F_k(x) − x_i = x_i F̂_i(x) / F̄(x),   (14)

is the Maynard Smith replicator dynamic [139]. This dynamic differs from the standard replicator dynamic (13) only by a change of speed, with motion under (14) being relatively fast when average payoffs are relatively low. (In multipopulation models, the two dynamics are less similar, and convergence under one does not imply convergence under the other – see [183,216].)

Example 16  In Example 13 we introduced the logit choice rule ρ_ij(π, x) ∝ exp(η⁻¹ π_j). The corresponding mean dynamic,

  ẋ_i = exp(η⁻¹ F_i(x)) / Σ_{k∈S} exp(η⁻¹ F_k(x)) − x_i,   (15)

is called the logit dynamic [82]. If we take the noise level η to zero, then the probability with which a revising agent chooses the best response approaches one whenever the best response is unique. At such points, the logit dynamic approaches the best response dynamic [84]:

  ẋ ∈ B^F(x) − x,   (16)

where

  B^F(x) = argmax_{y∈X} y′F(x)

defines the (mixed) best response correspondence for game F. Note that unlike the other dynamics we consider here, (16) is defined not by an ordinary differential equation, but by a differential inclusion, a formulation proposed in [97].

Example 17  Consider the protocol ρ_ij(π, x) = [π_j − Σ_{k∈S} x_k π_k]_+. When an agent's clock rings, he chooses a strategy at random; if that strategy's payoff is above average, the agent switches to it with probability proportional to its excess payoff. The resulting mean dynamic,

  ẋ_i = [F̂_i(x)]_+ − x_i Σ_{k∈S} [F̂_k(x)]_+,

is called the Brown–von Neumann–Nash (BNN) dynamic [45]; see also [98,176,194,200,217].

Example 18  Consider the revision protocol ρ_ij(π, x) = [π_j − π_i]_+. When an agent's clock rings, he selects a strategy at random. If the new strategy's payoff is higher than his current strategy's payoff, he switches strategies with probability proportional to the difference between the two payoffs. The resulting mean dynamic,

  ẋ_i = Σ_{j∈S} x_j [F_i(x) − F_j(x)]_+ − x_i Σ_{j∈S} [F_j(x) − F_i(x)]_+,   (17)

is called the Smith dynamic [197]; see also [178].

We summarize these examples of revision protocols and mean dynamics in Table 1. Figure 1 presents phase diagrams for the five basic dynamics when the population is randomly matched to play standard Rock–Paper–Scissors (Example 4). In the phase diagrams, colors represent speed of motion: within each diagram, motion is fastest in the red regions and slowest in the blue ones.

Evolutionary Game Theory, Figure 1  Five basic deterministic dynamics in standard Rock–Paper–Scissors. Colors represent speeds: red is fastest, blue is slowest

The phase diagram of the replicator dynamic reveals closed orbits around the unique Nash equilibrium x* = (⅓, ⅓, ⅓). Since this dynamic is based on imitation (or on reproduction), each face and each vertex of the simplex X is an invariant set: a strategy initially absent from the population will never subsequently appear.
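As a concrete illustration (a sketch not present in the original text), the replicator and Smith mean dynamics for random matching in standard Rock–Paper–Scissors can be simulated with a simple Euler scheme; the step size, horizon, and initial state below are arbitrary choices:

```python
# Mean dynamics for random matching in standard Rock-Paper-Scissors.
# A sketch; the Euler step size and time horizon are assumed, not from the text.

A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # standard RPS: win 1, lose 1

def payoffs(x):
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

def replicator(x):
    F = payoffs(x)
    Fbar = sum(x[i] * F[i] for i in range(3))        # average payoff
    return [x[i] * (F[i] - Fbar) for i in range(3)]  # x_i times excess payoff

def smith(x):
    F = payoffs(x)
    pos = lambda u: max(u, 0.0)
    return [sum(x[j] * pos(F[i] - F[j]) for j in range(3))
            - x[i] * sum(pos(F[j] - F[i]) for j in range(3))
            for i in range(3)]

def euler(dynamic, x, dt=0.01, steps=2000):
    for _ in range(steps):
        v = dynamic(x)
        x = [x[i] + dt * v[i] for i in range(3)]
    return x

x0 = [0.5, 0.3, 0.2]
print("replicator:", euler(replicator, x0))  # cycles around the Nash equilibrium
print("smith:", euler(smith, x0))            # spirals in toward (1/3, 1/3, 1/3)
```

Both dynamics leave the simplex invariant, since the inflow and outflow terms of (M) cancel when summed over strategies.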

Evolutionary Game Theory, Table 1  Five basic deterministic dynamics

  Revision protocol                            Mean dynamic                                                              Name and source
  ρ_ij = x_j [π_j − π_i]_+                     ẋ_i = x_i F̂_i(x)                                                          Replicator [205]
  ρ_ij = exp(η⁻¹π_j) / Σ_{k∈S} exp(η⁻¹π_k)    ẋ_i = exp(η⁻¹F_i(x)) / Σ_{k∈S} exp(η⁻¹F_k(x)) − x_i                       Logit [82]
  ρ_ij = 1{j = argmax_{k∈S} π_k}               ẋ ∈ B^F(x) − x                                                            Best response [84]
  ρ_ij = [π_j − Σ_{k∈S} x_k π_k]_+             ẋ_i = [F̂_i(x)]_+ − x_i Σ_{j∈S} [F̂_j(x)]_+                                BNN [45]
  ρ_ij = [π_j − π_i]_+                         ẋ_i = Σ_{j∈S} x_j [F_i(x) − F_j(x)]_+ − x_i Σ_{j∈S} [F_j(x) − F_i(x)]_+   Smith [197]
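The role of the noise level η in the logit protocol of Table 1 can be illustrated numerically (this sketch is not from the original text; the payoff vector is hypothetical):

```python
import math

def logit_choice(payoffs, eta):
    """Logit choice probabilities: proportional to exp(payoff / eta), eta > 0."""
    # Subtract the maximum payoff for numerical stability; ratios are unchanged.
    m = max(payoffs)
    weights = [math.exp((p - m) / eta) for p in payoffs]
    total = sum(weights)
    return [w / total for w in weights]

payoffs = [1.0, 0.9, 0.2]  # hypothetical payoffs; strategy 0 is the best response

# Large noise level: choice probabilities are nearly uniform.
print(logit_choice(payoffs, eta=100.0))

# Small noise level: the best response is chosen with probability close to one,
# since the gap between the best and second best payoff (0.1) dwarfs eta.
print(logit_choice(payoffs, eta=0.01))
```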

The other four dynamics pictured are based on direct evaluation, allowing agents to select strategies that are currently unused. In these cases, the Nash equilibrium is the sole rest point, and attracts solutions from all initial conditions. (In the case of the logit dynamic, the rest point happens to coincide with the Nash equilibrium only because of the symmetry of the game; see [101,104].) Under the logit and best response dynamics, solution trajectories quickly change direction and then accelerate when the best response to the population state changes; under the BNN and especially the Smith dynamic, solutions approach the Nash equilibrium in a less angular fashion.

Evolutionary Justification of Nash Equilibrium

One of the goals of evolutionary game theory is to justify the prediction of Nash equilibrium play. For this justification to be convincing, it must be based on a model that makes only mild assumptions about agents' knowledge about one another's behavior. This sentiment can be captured by introducing two desiderata for revision protocols:

  (C)  Continuity: ρ is Lipschitz continuous.
  (SD) Scarcity of data: ρ_ij depends only on π_i, π_j, and x_j.

Continuity (C) asks that revision protocols depend continuously on their inputs, so that small changes in aggregate behavior do not lead to large changes in players' responses. Scarcity of data (SD) demands that the conditional switch rate from strategy i to strategy j depend only on the payoffs of these two strategies, so that agents need only know those facts that are most germane to the decision at hand [183]. (The dependence of ρ_ij on x_j is included to allow for dynamics based on imitation.) Protocols that respect these two properties do not make unrealistic demands on the amount of information that agents in an evolutionary model possess.

Our two remaining desiderata impose restrictions on mean dynamics ẋ = V^F(x), linking the evolution of aggregate behavior to incentives in the underlying game.

  (NS) Nash stationarity: V^F(x) = 0 if and only if x ∈ NE(F).
  (PC) Positive correlation: V^F(x) ≠ 0 implies that V^F(x)′F(x) > 0.

Nash stationarity (NS) is a restriction on stationary states: it asks that the rest points of the mean dynamic be precisely the Nash equilibria of the game being played. Positive correlation (PC) is a restriction on disequilibrium adjustment: it requires that away from rest points, strategies' growth rates be positively correlated with their payoffs. Condition (PC) is among the weakest of the many conditions linking growth rates of evolutionary dynamics to payoffs in the underlying game; for alternatives, see [76,110,149,162,170,173,200].

In Table 2, we report how the five basic dynamics fare under the four criteria above. For the purposes of justifying the Nash prediction, the most important row in the table is the last one, which reveals that the Smith dynamic satisfies all four desiderata at once: while the revision protocol for the Smith dynamic (see Example 18) requires only limited information on the part of the agents who employ it, this information is enough to ensure that rest points of the dynamic and Nash equilibria coincide.

In fact, the dynamics introduced above can be viewed as members of families of dynamics that are based on similar revision protocols and that have similar qualitative properties. For instance, the Smith dynamic is a member of the family of pairwise comparison dynamics [178], under which agents only switch to strategies that outperform

their current choice. For this reason, the exact functional forms of the previous examples are not essential to establishing the properties noted above.

Evolutionary Game Theory, Table 2  Families of deterministic evolutionary dynamics and their properties; yes indicates that a weaker or alternate form of the property is satisfied

  Dynamic         Family                     (C)   (SD)  (NS)  (PC)
  Replicator      Imitation                  yes   yes   no    yes
  Best response                              no    yes   yes   yes
  Logit           Perturbed best response    yes   yes   no    no
  BNN             Excess payoff              yes   no    yes   yes
  Smith           Pairwise comparison        yes   yes   yes   yes

In interpreting these results, it is important to remember that Nash stationarity only concerns the rest points of a dynamic; it says nothing about whether a dynamic will converge to Nash equilibrium from an arbitrary initial state. The question of convergence is addressed in Sects. "Global Convergence" and "Nonconvergence". There we will see that in some classes of games, general guarantees of convergence can be obtained, but that there are some games in which no reasonable dynamic converges to equilibrium.

Local Stability

Before turning to the global behavior of evolutionary dynamics, we address the question of local stability. As we noted at the onset, an original motivation for introducing game dynamics was to provide an explicitly dynamic foundation for Maynard Smith's notion of ESS [205]. Some of the earliest papers on evolutionary game dynamics [105,224] established that being an ESS is a sufficient condition for asymptotic stability under the replicator dynamic, but that it is not a necessary condition. It is curious that this connection obtains despite the fact that ESS is a stability condition for a population whose members all play the same mixed strategy, while (the usual version of) the replicator dynamic looks at populations of agents choosing among different pure strategies.

In fact, the implications of ESS for local stability are not limited to the replicator dynamic. Suppose that the symmetric normal form game A admits a symmetric Nash equilibrium that places positive probability on each strategy in S. One can show that this equilibrium is an ESS if and only if the payoff matrix A is negative definite with respect to the tangent space of the simplex:

  z′Az < 0 for all z ∈ TX = { z ∈ Rⁿ : Σ_{i∈S} z_i = 0 }.   (18)
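As a numerical illustration (not part of the original text), condition (18) can be checked by sampling tangent vectors z and evaluating the quadratic form z′Az; the good Rock–Paper–Scissors payoffs below (win 2, lose 1) are an assumed example:

```python
import random

# Good Rock-Paper-Scissors (win 2, lose 1) -- an assumed example. Condition (18)
# requires z'Az < 0 for every nonzero z whose components sum to zero.
A = [[0, -1, 2], [2, 0, -1], [-1, 2, 0]]

def quad_form(A, z):
    return sum(z[i] * A[i][j] * z[j] for i in range(3) for j in range(3))

random.seed(0)
for _ in range(1000):
    # Sample a nonzero tangent vector: components sum to zero by construction.
    a, b = random.uniform(-1, 1), random.uniform(-1, 1)
    z = [a, b, -a - b]
    if any(abs(zi) > 1e-12 for zi in z):
        assert quad_form(A, z) < 0
print("A is negative definite on the tangent space (sampled check)")
```

Only the symmetric part of A matters for the quadratic form, which is why the skew (cyclic) part of the RPS payoffs drops out of the check.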

Condition (18) and its generalizations imply local stability of equilibrium not only under the replicator dynamic, but also under a wide range of other evolutionary dynamics: see [52,98,99,102,111,179] for further details. The papers cited above use linearization and Lyapunov function arguments to establish local stability. An alternative approach to local stability analysis, via index theory, allows one to establish restrictions on the stability properties of all rest points at once – see [60].

Global Convergence

While analyses of local stability reveal whether a population will return to equilibrium after a small disturbance, they do not tell us whether the population will approach equilibrium from an arbitrary disequilibrium state. To establish such global convergence results, we must restrict attention to classes of games defined by certain interesting payoff structures. These structures appear in applications, lending strong support for the Nash prediction in the settings where they arise.

Potential Games

A potential game [17,106,143,166,173,181] is a game that admits a potential function: a scalar valued function whose gradient describes the game's payoffs. In a full potential game F : Rⁿ₊ → Rⁿ (see [181]), all information about incentives is captured by the potential function f : Rⁿ₊ → R, in the sense that

  ∇f(x) = F(x) for all x ∈ Rⁿ₊.   (19)

If F is smooth, then it is a full potential game if and only if it satisfies full externality symmetry:

  ∂F_i/∂x_j (x) = ∂F_j/∂x_i (x) for all i, j ∈ S and x ∈ Rⁿ₊.   (20)

That is, the effect on the payoff to strategy i of adding new strategy j players always equals the effect on the payoff to strategy j of adding new strategy i players.

Example 19  Suppose a single population is randomly matched to play the symmetric normal form game


A ∈ R^{n×n}, generating the population game F(x) = Ax. We say that A exhibits common interests if the two players in a match always receive the same payoff. This means that A_ij = A_ji for all i and j, or, equivalently, that the matrix A is symmetric. Since DF(x) = A, this is precisely what we need for F to be a full potential game. The full potential function for F is f(x) = ½ x′Ax, which is one-half of the average payoff function F̄(x) = Σ_{i∈S} x_i F_i(x) = x′Ax. Games with common interests define a fundamental model from population genetics; in that setting, the common interest assumption reflects the shared fate of two genes that inhabit the same organism [73,106,107].

Example 20  In Example 10, we introduced congestion games, a basic model of network congestion. To see that these games are potential games, observe that an agent taking path j ∈ S affects the payoffs of agents choosing path i ∈ S through the marginal increases in congestion on the links λ ∈ Φ_i ∩ Φ_j that the two paths have in common. Since the marginal effect of an agent taking path i on the payoffs of agents choosing path j is identical, full externality symmetry (20) holds:

  ∂F_i/∂x_j (x) = − Σ_{λ∈Φ_i∩Φ_j} c_λ′(u_λ(x)) = ∂F_j/∂x_i (x).

In congestion games, the potential function takes the form

  f(x) = − Σ_λ ∫₀^{u_λ(x)} c_λ(z) dz,

and so is typically unrelated to aggregate payoffs,

  F̄(x) = Σ_{i∈S} x_i F_i(x) = − Σ_λ u_λ(x) c_λ(u_λ(x)).

However, potential is proportional to aggregate payoffs if the cost functions c_λ are all monomials of the same degree [56,173].

Population state x is a Nash equilibrium of the potential game F if and only if it satisfies the Kuhn–Tucker first order conditions for maximizing the potential function f on the simplex X [17,173]. Furthermore, it is simple to verify that any dynamic ẋ = V^F(x) satisfying positive correlation (PC) ascends the potential function:

  d/dt f(x_t) = ∇f(x_t)′ ẋ_t = F(x_t)′ V^F(x_t) ≥ 0.

It then follows from classical results on Lyapunov functions that any dynamic satisfying positive correlation (PC) converges to a connected set of rest points. If the dynamic

also satisfies Nash stationarity (NS), these sets consist entirely of Nash equilibria. Thus, in potential games, very mild conditions on agents' adjustment rules are sufficient to justify the prediction of Nash equilibrium play.

In the case of the replicator dynamic, one can say more. On the interior of the simplex X, the replicator dynamic for the potential game F is a gradient system for the potential function f (i. e., it always ascends f in the direction of maximum increase). However, this is only true after one introduces an appropriate Riemannian metric on X [123,192]. An equivalent statement of this result, due to [2], is that the replicator dynamic is the gradient system for f under the usual Euclidean metric if we stretch the state space X onto the radius 2 sphere. This stretching is accomplished using the Akin transformation H_i(x) = 2√x_i, which emphasizes changes in the use of rare strategies relative to changes in the use of common ones [2,4,185]. (There is also a dynamic that generates the gradient system for f on X under the usual metric: the so-called projection dynamic [130,150,185].)

Example 21  Consider evolution in 123 Coordination:

       1  2  3
  1    1  0  0
  2    0  2  0
  3    0  0  3
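The ascent property can be checked numerically for this game (a sketch not present in the original; the initial state and step size are arbitrary choices, and the potential f(x) = ½(x₁² + 2x₂² + 3x₃²) is the one from Example 19 applied to 123 Coordination):

```python
# Replicator dynamic in 123 Coordination: the potential
# f(x) = (x1^2 + 2*x2^2 + 3*x3^2)/2 should be nondecreasing along solutions.
# Step size and initial state are assumed choices.

def F(x):                      # payoffs: F_i(x) = i * x_i for this diagonal game
    return [(i + 1) * x[i] for i in range(3)]

def f(x):                      # potential function
    return 0.5 * sum((i + 1) * x[i] ** 2 for i in range(3))

def replicator_step(x, dt=0.001):
    p = F(x)
    pbar = sum(x[i] * p[i] for i in range(3))
    return [x[i] + dt * x[i] * (p[i] - pbar) for i in range(3)]

x = [0.5, 0.3, 0.2]
values = [f(x)]
for _ in range(5000):
    x = replicator_step(x)
    values.append(f(x))

# Potential ascends along the trajectory (up to floating-point rounding).
assert all(values[k + 1] >= values[k] - 1e-9 for k in range(len(values) - 1))
print("potential increased from", values[0], "to", values[-1])
```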

Figure 2a presents a phase diagram of the replicator dynamic on its natural state space X, drawn atop a contour plot of the potential function f(x) = ½((x₁)² + 2(x₂)² + 3(x₃)²). Evidently, all solution trajectories ascend this function and converge to one of the seven symmetric Nash equilibria, with trajectories from all but a measure zero set of initial conditions converging to one of the three pure equilibria.

Figure 2b presents another phase diagram for the replicator dynamic, this time after the solution trajectories and the potential function have been transported to the surface of the radius 2 sphere using the Akin transformation. In this case, solutions cross the level sets of the potential function orthogonally, moving in the direction that increases potential most quickly.

Stable Games

A population game F is a stable game [102] if

  (y − x)′(F(y) − F(x)) ≤ 0 for all x, y ∈ X.   (21)

If the inequality in (21) always holds strictly, then F is a strictly stable game.


Example 22  The symmetric normal form game A is symmetric zero-sum if A is skew-symmetric (i. e., if A = −A′), so that the payoffs of the matched players always sum to zero. (An example is provided by the standard Rock–Paper–Scissors game (Example 4).) Under this assumption, z′Az = 0 for all z ∈ Rⁿ; thus, the population game generated by random matching in A, F(x) = Ax, is a stable game that is not strictly stable.

Example 23  Suppose that A satisfies the interior ESS condition (18). Then (22) holds strictly, so F(x) = Ax is a strictly stable game. Examples satisfying this condition include the Hawk–Dove game (Example 3) and any good Rock–Paper–Scissors game (Example 4).

Example 24  A war of attrition [33] is a symmetric normal form game in which strategies represent amounts of time committed to waiting for a scarce resource. If the two players choose times i and j > i, then the j player obtains the resource, worth v, while both players pay a cost of c_i: once the first player leaves, the other seizes the resource immediately. If both players choose time i, the resource is split, so payoffs are v/2 − c_i each. It can be shown that for any resource value v ∈ R and any increasing cost vector c ∈ Rⁿ, random matching in a war of attrition generates a stable game [102].

Evolutionary Game Theory, Figure 2 The replicator dynamic in 123 Coordination. Colors represent the value of the game’s potential function

If F is smooth, then F is a stable game if and only if it satisfies self-defeating externalities:

  z′DF(x)z ≤ 0 for all z ∈ TX and x ∈ X,   (22)

where DF(x) is the derivative of F : X → Rⁿ at x. This condition requires that the improvements in the payoffs of strategies to which revising agents are switching are always exceeded by the improvements in the payoffs of strategies which revising agents are abandoning.
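As a numerical illustration (not from the original text), the stable game condition (21) can be checked for the war of attrition of Example 24; the resource value v and the increasing cost vector c below are assumed values:

```python
import random

# War of attrition (Example 24): payoff to time i against time j is
#   -c[i] if i < j,   v - c[j] if i > j,   v/2 - c[i] if i == j.
# Random matching should generate a stable game: the quadratic form
# (y - x)'A(y - x) is <= 0 for all states x, y. v and c are assumed values.

v = 1.0
c = [0.0, 0.4, 0.9, 1.5]   # increasing waiting costs
n = len(c)

A = [[(-c[i] if i < j else v - c[j] if i > j else v / 2 - c[i])
      for j in range(n)] for i in range(n)]

def random_state():
    w = [random.random() for _ in range(n)]
    s = sum(w)
    return [wi / s for wi in w]

random.seed(1)
for _ in range(1000):
    x, y = random_state(), random_state()
    z = [y[i] - x[i] for i in range(n)]
    q = sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))
    assert q <= 1e-12
print("war of attrition passes the stable-game check (21)")
```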

The flavor of the self-defeating externalities condition (22) suggests that obedience of incentives will push the population toward some "central" equilibrium state. In fact, the set of Nash equilibria of a stable game is always convex, and in the case of strictly stable games, equilibrium is unique. Moreover, it can be shown that the replicator dynamic converges to Nash equilibrium from all interior initial conditions in any strictly stable game [4,105,224], and that the direct evaluation dynamics introduced above converge to Nash equilibrium from all initial conditions in all stable games, strictly stable or not [98,102,104,197]. In each case, the proof of convergence is based on the construction of a Lyapunov function that solutions of the relevant dynamic descend. The Lyapunov functions for the five basic dynamics are presented in Table 3. Interestingly, the convergence results for direct evaluation dynamics are not restricted to the dynamics listed in Table 3, but extend to other dynamics in the same families (cf. Table 2). But compared to the conditions for convergence in potential games, the conditions for convergence in stable games demand additional structure on the adjustment process [102].

Perturbed Best Response Dynamics in Supermodular Games

Supermodular games are defined by the property

Evolutionary Game Theory, Table 3  Lyapunov functions for five basic deterministic dynamics in stable games

  Dynamic         Lyapunov function for stable games
  Replicator      H_{x*}(x) = Σ_{i∈S(x*)} x*_i log(x*_i / x_i)
  Logit           G̃(x) = max_{y∈int(X)} ( y′F̂(x) − η Σ_{i∈S} y_i log y_i ) + η Σ_{i∈S} x_i log x_i
  Best response   G(x) = max_{i∈S} F̂_i(x)
  BNN             Γ(x) = ½ Σ_{i∈S} [F̂_i(x)]²₊
  Smith           Ψ(x) = ½ Σ_{i∈S} Σ_{j∈S} x_i [F_j(x) − F_i(x)]²₊

that higher choices by one's opponents (with respect to the natural ordering on S = {1, …, n}) make one's own higher strategies look relatively more desirable. Let the matrix Σ ∈ R^{(n−1)×n} satisfy Σ_ij = 1 if j > i and Σ_ij = 0 otherwise, so that Σx ∈ R^{n−1} is the "decumulative distribution function" corresponding to the "density function" x. The population game F is a supermodular game if it exhibits strategic complementarities:

  If Σy ≥ Σx, then F_{i+1}(y) − F_i(y) ≥ F_{i+1}(x) − F_i(x) for all i < n and x, y ∈ X.   (23)

If F is smooth, condition (23) is equivalent to

  ∂(F_{i+1} − F_i)/∂(e_{j+1} − e_j) (x) ≥ 0 for all i, j < n and x ∈ X.   (24)

Example 25  Consider this model of search with positive externalities. A population of agents choose levels of search effort in S = {1, …, n}. The payoff to choosing effort i is

  F_i(x) = m(i) b(a(x)) − c(i),

where a(x) = Σ_{k≤n} k x_k is the aggregate search effort, b is some increasing benefit function, m is an increasing multiplier function, and c is an arbitrary cost function. Notice that the benefits from searching are increasing in both own search effort and in the aggregate search effort. It is easy to check that F is a supermodular game.

Complementarity condition (23) implies that the agents' best response correspondence is monotone in the stochastic dominance order, which in turn ensures the existence of minimal and maximal Nash equilibria [207]. One can take advantage of the monotonicity of best responses in studying evolutionary dynamics by appealing to the theory of monotone dynamical systems [196]. To do so, one needs to focus on dynamics that respect the monotonicity of best responses and that also are smooth, so that the

theory of monotone dynamical systems can be applied. It turns out that the logit dynamic satisfies these criteria; so does any perturbed best response dynamic defined in terms of stochastic payoff perturbations. In supermodular games, these dynamics define cooperative differential equations; consequently, solutions of these dynamics from almost every initial condition converge to an approximate Nash equilibrium [104].

Imitation Dynamics in Dominance Solvable Games

Suppose that in the population game F, strategy i is strictly dominated by strategy j: F_i(x) < F_j(x) for all x ∈ X. Consider the evolution of behavior under the replicator dynamic (13). Since for this dynamic we have

  d/dt (x_i/x_j) = (ẋ_i x_j − ẋ_j x_i) / (x_j)²
                 = (x_i F̂_i(x) x_j − x_j F̂_j(x) x_i) / (x_j)²
                 = (x_i/x_j) (F̂_i(x) − F̂_j(x)),

solutions from every interior initial condition converge to the face of the simplex where the dominated strategy is unplayed [3]. It follows that the replicator dynamic converges in games with a strictly dominant strategy, and by iterating this argument, one can show that this dynamic converges to equilibrium in any game that can be solved by iterative deletion of strictly dominated strategies. In fact, this argument is not specific to the replicator dynamic, but can be shown to apply to a range of dynamics based on imitation [110,170]. Even in games which are not dominance solvable, arguments of a similar flavor can be used to restrict the long run behavior of imitative dynamics to better-reply closed sets [162]; see Sect. "Convergence to Equilibria and to Better-Reply Closed Sets" for a related discussion.
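A small simulation (not part of the original text) illustrates the elimination of a strictly dominated strategy under the replicator dynamic; the two-strategy payoff matrix is an assumed example:

```python
# Replicator dynamic when strategy 1 is strictly dominated by strategy 2.
# Assumed payoff matrix: F_2(x) - F_1(x) = 1 at every state, so the ratio
# x_1/x_2 decays exponentially along interior solutions.

A = [[2, 0], [3, 1]]   # row 2 exceeds row 1 by exactly 1 in every column

def replicator_step(x, dt=0.01):
    F = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    Fbar = x[0] * F[0] + x[1] * F[1]
    return [x[i] + dt * x[i] * (F[i] - Fbar) for i in range(2)]

x = [0.9, 0.1]          # the dominated strategy starts with 90% of the population
for _ in range(5000):   # time horizon T = 50
    x = replicator_step(x)

print("share of the dominated strategy:", x[0])
assert x[0] < 1e-3      # the dominated strategy is (numerically) eliminated
```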


While the analysis here has focused on imitative dynamics, it is natural to expect that elimination of dominated strategies will extend to any reasonable evolutionary dynamic. But we will see in Sect. "Survival of Dominated Strategies" that this is not the case: the elimination of dominated strategies that obtains under imitative dynamics is the exception, not the rule.

Nonconvergence

The previous section revealed that when certain global structural conditions on payoffs are satisfied, one can establish global convergence to equilibrium under various classes of evolutionary dynamics. Of course, if these conditions are not met, convergence cannot be guaranteed. In this section, we offer examples to illustrate some of the possibilities for nonconvergent limit behavior.

Conservative Properties of the Replicator Dynamic in Zero-Sum Games

In Sect. "Stable Games", we noted that in strictly stable games, the replicator dynamic converges to Nash equilibrium from all interior initial conditions. To prove this, one shows that interior solutions descend the function

  H_{x*}(x) = Σ_{i∈S(x*)} x*_i log(x*_i / x_i),

until converging to its minimizer, the unique Nash equilibrium x*. Now, random matching in a symmetric zero-sum game generates a population game that is stable, but not strictly stable (Example 22). In this case, for each interior Nash equilibrium x*, the function H_{x*} is a constant of motion for the replicator dynamic: its value is fixed along every interior solution trajectory.

Example 26  Suppose that agents are randomly matched to play the symmetric zero-sum game A, given by

       1   2   3   4
  1    0   1   0  −1
  2   −1   0   1   0
  3    0  −1   0   1
  4    1   0  −1   0

The Nash equilibria of F(x) = Ax are the points on the line segment NE connecting states (½, 0, ½, 0) and (0, ½, 0, ½), a segment that passes through the barycenter x* = (¼, ¼, ¼, ¼). Figure 3 shows solutions to the replicator dynamic that lie on the level set H_{x*}(x) = 0.58. Evidently, each of these solutions forms a closed orbit.

Evolutionary Game Theory, Figure 3  Solutions of the replicator dynamic in a zero-sum game. The solutions pictured lie on the level set H_{x*}(x) = 0.58

Although solution trajectories of the replicator dynamic do not converge in zero-sum games, it can be proved that the time average of each solution trajectory converges to Nash equilibrium [190]. The existence of a constant of motion is not the only conservative property enjoyed by replicator dynamics for symmetric zero-sum games: these dynamics are also volume preserving after an appropriate change of speed or change of measure [5,96].

Games with Nonconvergent Dynamics

The conservative properties described in the previous section have been established only for the replicator dynamic (and its distant relative, the projection dynamic [185]). Inspired by Shapley [193], many researchers have sought to construct games in which large classes of evolutionary dynamics fail to converge to equilibrium.

Example 27  Suppose that players are randomly matched to play the following symmetric normal form game [107,109]:

       1   2   3   4
  1    0   ε  −1   0
  2    0   0   ε  −1
  3   −1   0   0   ε
  4    ε  −1   0   0

When ε = 0, the payoff matrix A^ε = A^0 is symmetric, so F^0 is a potential game with potential function f(x) = ½ x′A^0x = −x₁x₃ − x₂x₄. The function f attains its minimum of −¼ at states v = (½, 0, ½, 0) and w = (0, ½, 0, ½), has a saddle point with value −⅛ at the Nash equilibrium x* = (¼, ¼, ¼, ¼), and attains its maximum of 0 along the closed path of Nash equilibria Γ consisting of edges e₁e₂, e₂e₃, e₃e₄, and e₄e₁.

Evolutionary Game Theory

Let ẋ = V^F(x) be an evolutionary dynamic that satisfies Nash stationarity (NS) and positive correlation (PC), and that is based on a revision protocol that is continuous (C). If we apply this dynamic to game F^0, then the foregoing discussion implies that all solutions to ẋ = V^{F^0}(x) whose initial conditions ξ satisfy f(ξ) > −⅛ converge to Γ. The Smith dynamic for F^0 is illustrated in Fig. 4a.

Now consider the same dynamic for the game F^ε, where ε > 0. By continuity (C), the attractor Γ of V^{F^0} continues to an attractor Γ^ε of V^{F^ε} whose basin of attraction approximates that of Γ under ẋ = V^{F^0}(x) (Fig. 4b). But since the unique Nash equilibrium of F^ε is the barycenter x*, it follows that solutions from most initial conditions converge to an attractor far from any Nash equilibrium.

Other examples of games in which many dynamics fail to converge include monocyclic games [22,83,97,

106], Mismatching Pennies [91,116], and the hypnodisk game [103]. These examples demonstrate that there is no evolutionary dynamic that converges to Nash equilibrium regardless of the game at hand. This suggests that in general, analyses of long run behavior should not restrict attention to equilibria alone.

Chaotic Dynamics

We have seen that deterministic evolutionary game dynamics can follow closed orbits and approach limit cycles. We now show that they also can behave chaotically.

Example 28  Consider evolution under the replicator dynamic when agents are randomly matched to play the symmetric normal form game below [13,195], whose lone interior Nash equilibrium is the barycenter x* = (¼, ¼, ¼, ¼):

       1    2    3    4
  1    0  −12    0   22
  2   20    0    0  −10
  3  −21   −4    0   35
  4   10   −2    2    0

Figure 5 presents a solution to the replicator dynamic for this game from initial condition x₀ = (0.24, 0.26, 0.25, 0.25). This solution spirals clockwise about x*. Near the rightmost point of each circuit, where the value of x₃ gets close to zero, solutions sometimes proceed along an "outside" path on which the value of x₃ surpasses 0.6. But they sometimes follow an "inside" path on which x₃ remains below 0.4, and at other times do something in between. Which

Evolutionary Game Theory, Figure 4  Solutions of the Smith dynamic in: a the potential game F^0; b the perturbed potential game F^ε, ε = 1/10

Evolutionary Game Theory, Figure 5 Chaotic behavior under the replicator dynamic


of these alternatives occurs is difficult to predict from approximate information about the previous behavior of the system.

While the game in Example 28 has a complicated payoff structure, in multipopulation contexts one can find chaotic evolutionary dynamics in very simple games [187].

Survival of Dominated Strategies

In Sect. "Imitation Dynamics in Dominance Solvable Games", we saw that dynamics based on imitation eliminate strictly dominated strategies along solutions from interior initial conditions. While this result seems unsurprising, it is actually extremely fragile: [25,103] prove that dynamics that satisfy continuity (C), Nash stationarity (NS), and positive correlation (PC) and that are not based exclusively on imitation must fail to eliminate strictly dominated strategies in some games. Thus, evolutionary support for a basic rationality criterion is more tenuous than the results for imitative dynamics suggest.

Example 29  Figure 6a presents the Smith dynamic for "bad RPS with a twin":

       R   P   S   T
  R    0  −2   1   1
  P    1   0  −2  −2
  S   −2   1   0   0
  T   −2   1   0   0

The Nash equilibria of this game are the states on the line segment NE = {x* ∈ X : x* = (⅓, ⅓, c, ⅓ − c)}, which is a repellor under the Smith dynamic. Under this dynamic, strategies gain players at rates that depend on their payoffs, but lose players at rates proportional to their current usage levels. It follows that when the dynamic is not at rest, the proportions of players choosing strategies 3 and 4 become equal, so that the dynamic approaches the plane P = {x ∈ X : x₃ = x₄} on which the twins receive equal weight. Since the usual three-strategy version of bad RPS exhibits cycling, solutions here on the plane P approach a closed orbit away from any Nash equilibrium.

Figure 6b presents the Smith dynamic in "bad RPS with a feeble twin",

       R      P      S     T
  R    0     −2      1     1
  P    1      0     −2    −2
  S   −2      1      0     0
  T   −2−ε   1−ε    −ε    −ε

with ε = 1/10. Evidently, the attractor from Fig. 6a moves slightly to the left, reflecting the fact that the payoff to

Evolutionary Game Theory, Figure 6 The Smith dynamic in two games

Twin has gone down. But since the new attractor is in the interior of X, the strictly dominated strategy Twin is always played with probabilities bounded far away from zero. Stochastic Dynamics In Sect. “Revision Protocols” we defined the stochastic evolutionary process fX tN g in terms of a simple model of myopic individual choice. We then turned to the study of deterministic dynamics, which we claimed could be used to approximate the stochastic process fX tN g over finite time spans and for large population sizes. In this section, we turn our attention to the stochastic process fX tN g itself. After offering a formal version of the deterministic approximation result, we investigate the long run behav-

Evolutionary Game Theory

ior of fX tN g, focusing on the questions of convergence to equilibrium and selection among multiple stable equilibria. Deterministic Approximation In Sect. “Revision Protocols”, we defined the Markovian evolutionary process fX tN g from a revision protocol , a population game F, and a finite population size N. In Sect. “Mean Dynamics”, we argued that the expected motion of this process is captured by the mean dynamic x˙ i D ViF (x) D

X

x j ji (F(x); x)  x i

j2S

X

i j (F(x); x):

j2S

(M) The basic link between the Markov process fX tN g and its mean dynamic (M) is provided by Kurtz’s Theorem [127], variations and extensions of which have been offered in a number of game-theoretic contexts [24,29,43, 44,175,204]. Consider the sequence of Markov processes N ffX tN g t0 g1 NDN 0 , supposing that the initial conditions X 0 converge to x0 2 X. Let fx t g t0 be the solution to the mean dynamic (M) starting from x0 . Kurtz’s Theorem tells us that for each finite time horizon T < 1 and error bound " > 0, we have that ! ˇ N ˇ lim P sup ˇX  x t ˇ < " D 1: (25) N!1

t2[0;T]

t

Thus, when the population size N is large, nearly all sample paths of the Markov process $\{X_t^N\}$ stay within $\varepsilon$ of a solution of the mean dynamic (M) through time T. By choosing N large enough, we can ensure that with probability close to one, $X_t^N$ and $x_t$ differ by no more than $\varepsilon$ for all times t between 0 and T (Fig. 7). The intuition for this result comes from the law of large numbers. At each revision opportunity, the increment in the process $\{X_t^N\}$ is stochastic. Still, at most population states the expected number of revision opportunities that arrive during the brief time interval $I = [t, t + dt]$ is large – in particular, of order $N\,dt$. Since each opportunity leads to an increment of the state of size $\frac{1}{N}$, the size of the overall change in the state during time interval I is of order dt. Thus, during this interval there are a large number of revision opportunities, each following nearly the same transition probabilities, and hence having nearly the same expected increments. The law of large numbers therefore suggests that the change in $\{X_t^N\}$ during this interval should be almost completely determined by the expected motion of $\{X_t^N\}$, as described by the mean dynamic (M).
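A rough illustration of this approximation: the sketch below simulates a finite-population process under pairwise proportional imitation (whose mean dynamic is a slowed replicator dynamic) and compares it with an Euler solution of that mean dynamic. The 2×2 payoff matrix, rates, population size, and horizon are all illustrative assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative 2-strategy game (payoffs are assumptions of this sketch).
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
K = 4.0   # bound on payoff differences, keeps switch probabilities in [0, 1]

def simulate(N, T, dt):
    """Finite-N process: each agent gets a revision opportunity with
    probability dt per interval; a reviser imitates a randomly sampled
    opponent with probability proportional to the payoff advantage."""
    n0 = int(0.3 * N)                     # agents playing strategy 0
    traj = []
    for _ in range(int(T / dt)):
        traj.append(n0 / N)
        F = A @ np.array([n0 / N, 1 - n0 / N])
        for _ in range(rng.binomial(N, dt)):
            i = 0 if rng.random() < n0 / N else 1    # reviser's strategy
            j = 1 - i
            xj = n0 / N if j == 0 else 1 - n0 / N    # chance of sampling j
            if rng.random() < xj * max(F[j] - F[i], 0.0) / K:
                n0 += 1 if j == 0 else -1
    return np.array(traj)

def mean_dynamic(x, T, dt):
    """Euler solution of the mean dynamic: here the replicator dynamic,
    slowed by the factor 1/K used in the protocol above."""
    traj = []
    for _ in range(int(T / dt)):
        traj.append(x)
        F = A @ np.array([x, 1 - x])
        x = x + dt * x * (1 - x) * (F[0] - F[1]) / K
    return np.array(traj)

stoch = simulate(N=10_000, T=5.0, dt=0.01)
det = mean_dynamic(0.3, T=5.0, dt=0.01)
print(np.max(np.abs(stoch - det)))   # deviation shrinks as N grows
```

For N in the thousands the two trajectories are already visually indistinguishable, in line with equation (25).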

Evolutionary Game Theory, Figure 7 Deterministic approximation of the Markov process $\{X_t^N\}$

Convergence to Equilibria and to Better-Reply Closed Sets

Stochastic models of evolution can also be used to address directly the question of convergence to equilibrium [61,78,117,118,125,143,172,219]. Suppose that a society of agents is randomly matched to play an (asymmetric) normal form game that is weakly acyclic in better replies: from each strategy profile, there exists a sequence of profitable unilateral deviations leading to a Nash equilibrium. If agents switch to strategies that do at least as well as their current one against the choices of random samples of opponents, then the society will eventually escape any better-response cycle, ultimately settling upon a Nash equilibrium. Importantly, many classes of normal form games are weakly acyclic in better replies: these include potential games, dominance solvable games, certain supermodular games, and certain aggregative games, in which each agent's payoffs only depend on opponents' behavior through a scalar aggregate statistic. Thus, in all of these cases, simple stochastic better-reply procedures are certain to lead to Nash equilibrium play.

Outside these classes of games, one can narrow down the possibilities for long run behavior by looking at better-reply closed sets: that is, subsets of the set of strategy profiles that cannot be escaped without a player switching to an inferior strategy (cf. [16,162]). Stochastic better-reply procedures must lead to a cluster of population states corresponding to a better-reply closed set; once the society enters such a cluster, it never departs.
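A minimal sketch of such a stochastic better-reply procedure, using a 3×3 common-interest game chosen for illustration (common-interest games are potential games, hence weakly acyclic in better replies):

```python
import random

# A 3x3 common-interest coordination game: both players receive the
# diagonal payoff when they match, zero otherwise.
PAY = [[3, 0, 0],
       [0, 2, 0],
       [0, 0, 1]]

def payoff(a):
    return PAY[a[0]][a[1]]        # identical for both players here

def better_replies(player, a):
    cur = payoff(a)
    out = []
    for s in range(3):
        b = list(a); b[player] = s
        if payoff(tuple(b)) > cur:
            out.append(s)
    return out

def is_nash(a):
    return not better_replies(0, a) and not better_replies(1, a)

random.seed(1)
a = (1, 2)                        # an arbitrary non-equilibrium profile
while not is_nash(a):
    p = random.randrange(2)       # a randomly chosen player revises
    br = better_replies(p, a)
    if br:                        # switch to a randomly chosen better reply
        b = list(a); b[p] = random.choice(br)
        a = tuple(b)

print(a)   # a pure Nash equilibrium (a diagonal profile in this game)
```

Because every better reply strictly increases the common payoff (the potential), the loop must terminate at a pure Nash equilibrium.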


Stochastic Stability and Equilibrium Selection

To this point, we used stochastic evolutionary dynamics to provide foundations for deterministic dynamics and to address the question of convergence to equilibrium. But stochastic evolutionary dynamics introduce an entirely new possibility: that of obtaining unique long-run predictions of play, even in games with multiple locally stable equilibria. This form of analysis, which we consider next, was pioneered by Foster and Young [74], Kandori, Mailath, and Rob [119], and Young [219], building on mathematical techniques due to Freidlin and Wentzell [75].

Stochastic Stability

To minimize notation, let us describe the evolution of behavior using a discrete-time Markov chain $\{X_k^{N,\varepsilon}\}_{k=0}^{\infty}$ on $X^N$, where the parameter $\varepsilon > 0$ represents the level of "noise" in agents' decision procedures. The noise ensures that the Markov chain is irreducible and aperiodic: any state in $X^N$ can be reached from any other, and there is positive probability that a period passes without a change in the state. Under these conditions, the Markov chain $\{X_k^{N,\varepsilon}\}$ admits a unique stationary distribution, $\mu^{N,\varepsilon}$, a measure on the state space $X^N$ that is invariant under the Markov chain:

$$\sum_{x \in X^N} \mu^{N,\varepsilon}(x)\, P\left(X_{k+1}^{N,\varepsilon} = y \,\middle|\, X_k^{N,\varepsilon} = x\right) = \mu^{N,\varepsilon}(y) \quad \text{for all } y \in X^N.$$

The stationary distribution describes the long run behavior of the process $\{X_k^{N,\varepsilon}\}$ in two distinct ways. First, $\mu^{N,\varepsilon}$ is the limiting distribution of $\{X_k^{N,\varepsilon}\}$:

$$\lim_{k \to \infty} P\left(X_k^{N,\varepsilon} = y \,\middle|\, X_0^{N,\varepsilon} = x\right) = \mu^{N,\varepsilon}(y) \quad \text{for all } x, y \in X^N.$$

Second, $\mu^{N,\varepsilon}$ almost surely describes the limiting empirical distribution of $\{X_k^{N,\varepsilon}\}$:

$$P\left(\lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K-1} \mathbf{1}\{X_k^{N,\varepsilon} \in A\} = \mu^{N,\varepsilon}(A)\right) = 1 \quad \text{for any } A \subseteq X^N.$$

Thus, if most of the mass in the stationary distribution $\mu^{N,\varepsilon}$ were placed on a single state, then this state would provide a unique prediction of long run behavior. With this motivation, consider a sequence of Markov chains $\{\{X_k^{N,\varepsilon}\}_{k=0}^{\infty}\}_{\varepsilon \in (0,\bar\varepsilon)}$ parametrized by noise levels $\varepsilon$ that approach zero. Population state $x \in X^N$ is said to be stochastically stable if it retains positive weight in the stationary distributions of these Markov chains as $\varepsilon$ becomes arbitrarily small:

$$\lim_{\varepsilon \to 0} \mu^{N,\varepsilon}(x) > 0.$$

When the stochastically stable state is unique, it offers a unique prediction of play that is relevant over sufficiently long time spans.

Bernoulli Arrivals and Mutations

Following the approach of many early contributors to the literature, let us consider a model of stochastic evolution based on Bernoulli arrivals of revision opportunities and best responses with mutations. The former assumption means that during each discrete time period, each agent has a fixed probability in $(0, 1]$ of receiving an opportunity to update his strategy. This assumption differs from the one we proposed in Sect. "Revision Protocols"; the key new implication is that all agents may receive revision opportunities simultaneously. (Models that assume this directly generate similar results.) The latter assumption posits that when an agent receives a revision opportunity, he plays a best response to the current strategy distribution with probability $1 - \varepsilon$, and chooses a strategy at random with probability $\varepsilon$.

Example 30 Suppose that a population of N agents is randomly matched to play the Stag Hunt game (Example 2):

        H    S
H   (   h    h  )
S   (   0    s  )

Since $s > h > 0$, hunting hare and hunting stag are both symmetric pure equilibria; the game also admits the symmetric mixed equilibrium $x^* = (x_H^*, x_S^*) = (\frac{s-h}{s}, \frac{h}{s})$.

Since s > h > 0, hunting hare and hunting stag are both symmetric pure equilibria; the game also admits the sym ; x  ) D ( sh ; h ). metric mixed equilibrium x  D (x H S s s  If more than fraction x H of the agents hunt hare, then hare is the unique best response, while if more than fraction x S of the agents hunt stag, then stag is the unique best response. Thus, under any deterministic dynamic that respects payoffs, the mixed equilibrium x  divides the state space into two basins of attraction, one for each of the two pure equilibria. Now consider our stochastic evolutionary process. If the noise level " is small, this process typically behaves like a deterministic process, moving quickly toward one of the two pure states, e H D (1; 0) or e S D (0; 1), and remaining there for some time. But since the process is ergodic, it will eventually leave the pure state it reaches first, and in fact will switch from one pure state to the other infinitely often. To determine the stochastically stable state, we must compute and compare the “improbabilities” of these tran-

Evolutionary Game Theory

sitions. If the current state is e H , a transition to e S requires mutations to cause roughly Nx S agents to switch to the suboptimal strategy S, sending the population into the basin of attraction of e S ; the probability of this event is of  order " N x S . Similarly, to transit from e S to e H , mutations  D N(1  x  ) to switch from S to must cause roughly Nx H S  H; this probability of this event is of order " N(1x S ) . Which of these rare events is more likely ones depends on whether x S is greater than or less than 12 . If  s > 2h, so that x S < 12 , then " N x S is much smaller than  " N(1x S ) when " is small; thus, state e S is stochastically stable (Fig. 8a). If instead s < 2h, so that x S > 12 , then   " N(1x S ) < " N x S , so e H is stochastically stable (Fig. 8b). These calculations show that risk dominance – being the optimal response against a uniformly randomizing opponent – drives stochastic stability 2  2 games. In particular, when s < 2h, so that risk dominance and payoff dominance disagree, stochastic stability favors the former over the latter. This example illustrates how under Bernoulli arrivals and mutations, stochastic stability analysis is based on mutation counting: that is, on determining how many simultaneous mutations are required to move from each equilibrium into the basin of attraction of each other equilibrium. In games with more than two strategies, completing the argument becomes more complicated than in the example above: the analysis, typically based on the tree-analysis techniques of [75,219], requires one to account for the relative difficulties of transitions between all pairs of equilibria. [68] develops a streamlined method of computing the stochastically stable state based on radius-coradius calcu-

Evolutionary Game Theory, Figure 8 Equilibrium selection via mutation counting in Stag Hunt games

lations; while this approach is not always sufficiently fine to yield a complete analysis, in the cases where it works it can be considerably simpler to apply than the tree-analysis method. These techniques have been employed successfully to variety of classes of games, including pure coordination games, supermodular games, games satisfying “bandwagon” properties, and games with equilibria that satisfy generalizations of risk dominance [68,120,121,134]. A closely related literature uses stochastic stability as a basis for evaluating traditional solution concepts for extensive form games [90,115,122,128,152,168,169]. A number of authors have shown that variations on the Bernoulli arrivals and mutations model can lead to different equilibrium selection results. For instance, [165,211] show that if choices are determined from the payoffs from a single round of matching (rather than from expected payoffs), the payoff dominant equilibrium rather than the risk dominant equilibrium is selected. If choices depend on strategies’ relative performances rather than their absolute performances, then long run behavior need not resemble a Nash equilibrium at all [26,161,171,198]. Finally, if the probability of mutation depends on the current population state, then any recurrent set of the unperturbed process (e. g., any pure equilibrium of a coordination game) can be selected in the long run if the mutation rates are specified in an appropriate way [27]. This last result suggests that mistake probabilities should be provided with an explicit foundation, a topic we take up in Sect. “Poisson Arrivals and Payoff Noise”. Another important criticism of the stochastic stability literature concerns the length of time needed for its predic-

1019

1020

Evolutionary Game Theory

tions to become relevant [31,67]. If the population size N is large and the mutation rate " is small, then the probability "c N that a transition between equilibria occurs during given period is miniscule; the waiting time between transitions is thus enormous. Indeed, if the mutation rate falls over time, or if the population size grows over time, then ergodicity may fail, abrogating equilibrium selection entirely [163,186]. These analyses suggest that except in applications with very long time horizons, the unique predictions generated by analyses of stochastic stability may be inappropriate, and that modelers would do better to focus on history-dependent predictions of the sort provided by deterministic models. At the same time, there are frameworks in which stochastic stability becomes relevant much more quickly. The most important of these are local interaction models, which we discuss in Sect. “Local Interaction”. Poisson Arrivals and Payoff Noise Combining the assumption of Bernoulli arrivals of revision opportunities with that of best responses with mutations creates a model in which the probabilities of transitions between equilibria are easy to compute: one can focus on events in which large numbers of agents switch to a suboptimal strategy at once, each doing so with the same probability. But the simplicity of this argument also highlights the potency of the assumptions behind it. An appealing alternative approach is to model stochastic evolution using Poisson arrivals of revision opportunities and payoff noise [29,31,38,39,63,135,145,209,210,222]. (One can achieve similar effects by looking at models defined in terms of stochastic differential equations; see [18,48,74,79,113].) By allowing revision opportunities to arrive in continuous time, as we did in Sect. “Revision Protocols”, we ensure that agents do not receive opportunities simultaneously, ruling out the simultaneous mass revisions that drive the Bernoulli arrival model. 
(One can accomplish the same end using a discrete time model by assuming that one agent updates during each period; the resulting process is a random time change away from the Poisson arrivals model.) Under Poisson arrivals, transitions between equilibria occur gradually, as the population works its way out of basins of attraction one agent at a time. In this context, the mutation assumption becomes particularly potent, ensuring that the probabilities of suboptimal choices do not vary with their payoff consequences. Under the alternative assumption of payoff noise, one supposes that agents play best responses to payoffs that are subject to random perturbations drawn from a fixed multivariate distribution. In this case, suboptimal choices are much more likely near basin boundaries, where the payoffs of second-best strategies are not much less than those of optimal ones, than they are at stable equilibria, where payoff differences are larger.

Evidently, assuming Poisson arrivals and payoff noise means that stochastic stability cannot be assessed by way of mutation counting. To determine the unlikelihood of escaping from an equilibrium's basin of attraction, one must not only account for the "width" of the basin of attraction (i.e., the number of suboptimal choices needed to escape it), but also for its "depth" (the unlikelihood of each of these choices). In two-strategy games this is not difficult to accomplish: in this case the evolutionary process is a birth-and-death chain, and its stationary distribution can be expressed using an explicit formula. Beyond this case, one can employ the Freidlin and Wentzell [75] machinery, although doing so tends to be computationally demanding. This computational burden is lighter in models that retain Poisson arrivals, but replace perturbed optimization with decision rules based on imitation and mutation [80]. Because agents imitate successful opponents, the population spends the vast majority of periods on the edges of the simplex, implying that the probabilities of transitions between vertices can be determined using birth-and-death chain methods [158]. As a consequence, one can reduce the problem of finding the stochastically stable state in an $n$-strategy coordination game to that of computing the limiting stationary distribution of an $n$-state Markov chain.

Stochastic Stability via Large Population Limits

The approach to stochastic stability followed thus far relies on small noise limits: that is, on evaluating the limit of the stationary distributions $\mu^{N,\varepsilon}$ as the noise level $\varepsilon$ approaches zero.
Binmore and Samuelson [29] argue that in the contexts where evolutionary models are appropriate, the amount of noise in agents' decisions is not negligible, so that taking the low noise limit may not be desirable. At the same time, evolutionary models are intended to describe behavior in large populations, suggesting an alternative approach: that of evaluating the limit of the stationary distributions $\mu^{N,\varepsilon}$ as the population size N grows large.

In one respect, this approach complicates the analysis. When N is fixed and $\varepsilon$ varies, each stationary distribution $\mu^{N,\varepsilon}$ is a measure on the fixed state space $X^N = \{x \in X : Nx \in \mathbb{Z}^n\}$. But when $\varepsilon$ is fixed and N varies, the state space $X^N$ varies as well, and one must introduce notions of weak convergence of probability measures in order to define stochastic stability.

But in other respects taking large population limits can make analysis simpler. We saw in Sect. "Deterministic Approximation" that by taking the large population limit, we can approximate the finite-horizon sample paths of the stochastic evolutionary process $\{X_t^{N,\varepsilon}\}$ by solutions to the mean dynamic (M). Now we are concerned with infinite horizon behavior, but it is still reasonable to hope that the large population limit will again reduce some of our computations to calculus problems.

As one might expect, this approach is easiest to follow in the two-strategy case, where for each fixed population size N, the evolutionary process $\{X_t^{N,\varepsilon}\}$ is a birth-and-death chain. When one takes the large population limit, the formulas for waiting times and for the stationary distribution can be evaluated using integral approximations [24,29,39,222]. Indeed, the approximations so obtained take an appealingly simple form [182]. The analysis becomes more complicated beyond the two-strategy case, but certain models have proved amenable to analysis. For instance, [80] characterizes large population stochastic stability in models based on imitation and mutation. Imitation ensures that the population spends nearly all periods on the edges of the simplex X, and the large population limit makes evaluating the probabilities of transitions along these edges relatively simple.

If one supposes that agents play best responses to noisy payoffs, then one must account directly for the behavior of the process $\{X_t^{N,\varepsilon}\}$ in the interior of the simplex. One possibility is to combine the deterministic approximation results from Sect. "Deterministic Approximation" with techniques from the theory of stochastic approximation [20,21] to show that the large N limiting stationary distribution is concentrated on attractors of the mean dynamic. By combining this idea with convergence results for deterministic dynamics from Sect. "Global Convergence", Ref. [104] shows that the limiting stationary distribution must be concentrated around equilibrium states in potential games, stable games, and supermodular games.
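For the two-strategy case discussed above, the stationary distribution of the birth-and-death chain can be computed directly from the detailed balance conditions. The sketch below does this for best responses with mutations in the Stag Hunt of Example 30, under the simplifying assumption that a single uniformly chosen agent revises each period; the parameter values are illustrative.

```python
import numpy as np

# Stag Hunt from Example 30: hare pays h regardless; stag pays s times
# the fraction of stag hunters. Values are illustrative; s < 2h makes
# hare risk dominant.
h, s, N, eps = 2.0, 3.0, 100, 0.01

def p_stag(n):
    """Prob. a reviser picks S: best response w.p. 1 - eps, uniform w.p. eps."""
    br_stag = 1.0 if s * n / N > h else 0.0
    return (1 - eps) * br_stag + eps / 2

# Birth-and-death chain on n = number of stag hunters, assuming a single
# uniformly chosen agent revises each period.
up = lambda n: (N - n) / N * p_stag(n)        # a hare hunter switches to S
down = lambda n: n / N * (1 - p_stag(n))      # a stag hunter switches to H

# Stationary distribution from the detailed balance conditions.
log_mu = np.zeros(N + 1)
for n in range(N):
    log_mu[n + 1] = log_mu[n] + np.log(up(n)) - np.log(down(n + 1))
mu = np.exp(log_mu - log_mu.max())
mu /= mu.sum()

print(mu[:3].sum(), mu[-3:].sum())   # mass near all-hare vs. near all-stag
```

With $s < 2h$ the computed distribution piles its mass near the all-hare state, in line with the risk dominance selection result of Example 30.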
The results in [104] do not address the question of equilibrium selection. However, for the specific case of logit evolution in potential games, a complete characterization of the large population limit of the process $\{X_t^{N,\varepsilon}\}$ has been obtained [23]. By combining deterministic approximation results, which describe the usual behavior of the process within basins of attraction, with a large deviations analysis, which characterizes the rare escapes from basins of attraction, one can obtain a precise asymptotic formula for the large N limiting stationary distribution. This formula accounts both for the typical procession of the process along solutions of the mean dynamic, and for the rare sojourns of the process against this deterministic flow.

Local Interaction

All of the game dynamics considered so far have been based implicitly on the assumption of global interaction: each agent's payoffs depend directly on all agents' actions. In many contexts, one expects to the contrary that interactions will be local in nature: for instance, agents may live in fixed locations and interact only with neighbors. In addition to providing a natural fit for these applications, local interaction models respond to some of the criticisms of the stochastic stability literature. At the same time, once one moves beyond relatively simple cases, local interaction models become exceptionally complicated, and so lend themselves to methods of analysis very different from those considered thus far.

Stochastic Stability and Equilibrium Selection Revisited

In Sect. "Stochastic Stability and Equilibrium Selection", we saw that the prediction of risk dominant equilibrium play provided by stochastic stability models is subverted by the waiting-time critique: namely, that the length of time required before this equilibrium is reached may be extremely long. Ellison [67,68] shows that if interactions are local, then selection of the risk dominant equilibrium persists, and waiting times are no longer an issue.

Example 31 In the simplest local interaction model, a population of N agents are located at N distinct positions around a circle. During each period of play, each agent plays the Stag Hunt game (Examples 2 and 30) with his two nearest neighbors, following the same action against both of his opponents. If we suppose that $s \in (h, 2h)$, so that hunting hare is the risk dominant strategy, then by definition, an agent whose neighbors play different strategies finds it optimal to choose H himself.

Now suppose that there are Bernoulli arrivals of revision opportunities, and that decisions are based on best responses and rare mutations. To move from the all-S state to the all-H state, it is enough that a single agent mutates from S to H.
This one mutation begins a chain reaction: the mutating agent's neighbors respond optimally by switching to H themselves; they are followed in this by their own neighbors; and the contagion continues until all agents choose H. Since a single mutation is always enough to spur the transition from all-S to all-H, the expected wait before this transition is small, even when the population is large.

In contrast, the transition back from all-H to all-S is extremely unlikely. Even if all but one of the agents simultaneously mutate to S, the contagion process described above will return the population to the all-H state. Thus, while the transition from all-S to all-H occurs quickly, the reverse transition takes even longer than in the global interaction setting.

The local interaction approach to equilibrium selection has been advanced in a variety of directions: by allowing agents to choose their locations [69], or to pay a cost to choose different strategies against different opponents [86], and by basing agents' decisions on the attainment of aspiration levels [11], or on imitation of successful opponents [9,10]. A portion of this literature initiated by Blume develops connections between local interaction models in evolutionary game theory and models from statistical mechanics [36,37,38,124,141]. These models provide a point of departure for research on complex spatial dynamics in games, which we consider next.

Complex Spatial Dynamics

The local interaction models described above address the questions of convergence to equilibrium and selection among multiple equilibria. In the cases where convergence and selection results obtain, behavior in these models is relatively simple, as most periods are spent with most agents coordinating on a single strategy. A distinct branch of the literature on evolution and local interaction focuses on cases with complex dynamics, where instead of settling quickly into a homogeneous, static configuration, behavior remains in flux, with multiple strategies coexisting for long periods of time.

Example 32 Cooperating is a dominated strategy in the Prisoner's Dilemma, and is not played in equilibrium in finitely repeated versions of this game. Nevertheless, a pair of Prisoner's Dilemma tournaments conducted by Axelrod [14] were won by the strategy Tit-for-Tat, which cooperates against cooperative opponents and defects against defectors. Axelrod's work spawned a vast literature aiming

to understand the persistence of individually irrational but socially beneficial behavior. To address this question, Nowak and May [153,154,155,156,157] consider a population of agents who are repeatedly matched to play the Prisoner's Dilemma

        C    D
C   (   1    0  )
D   (   g    ε  )

where the greedy payoff g exceeds 1 and $\varepsilon > 0$ is small. The agents are positioned on a two-dimensional grid. During each period, each agent plays the Prisoner's Dilemma with the eight agents in his (Moore) neighborhood. In the simplest version of the model, all agents simultaneously update their strategies at the end of each period. If an agent's total payoff that period is as high as that of any of his neighbors, he continues to play the same strategy; otherwise, he switches to the strategy of the neighbor who obtained the highest payoff.

Since defecting is a dominant strategy in the Prisoner's Dilemma, one might expect the local interaction process to converge to a state at which all agents defect, as would be the case in nearly any model of global interaction. But while an agent is always better off defecting himself, he also is better off the more of his neighbors cooperate; and since evolution is based on imitation, cooperators tend to have more cooperators as neighbors than do defectors.

In Figs. 9–11, we present snapshots of the local interaction process for choices of the greedy payoff g from each of three distinct parameter regions. If $g > \frac{5}{3}$ (Fig. 9), the process quickly converges to a configuration containing a few rectangular islands of cooperators in a sea of defectors, with the exact configuration depending on the initial conditions. If instead $g < \frac{8}{5}$ (Fig. 10), the process moves towards a configuration in which agents other than those in a "web" of defectors cooperate. But for $g \in (\frac{8}{5}, \frac{5}{3})$ (Fig. 11), the system evolves in a complicated fashion, with clusters of cooperators and of defectors forming, expanding, disappearing, and reforming. But while the configuration of behavior never stabilizes, the proportion of cooperators appears to settle down to about 0.30.

Evolutionary Game Theory, Figure 9 Local interaction in a Prisoner's Dilemma; greedy payoff g = 1.7. In Figs. 9–11, agents are arrayed on a 100 × 100 grid with periodic boundaries (i.e., a torus). Initial conditions are random with 75% cooperators and 25% defectors. Agents update simultaneously, imitating the neighbor who earned the highest payoff. Blue cells represent cooperators who also cooperated last period, green cells represent new cooperators; red cells represent defectors who also defected last period, yellow cells represent new defectors. (Figs. 9–11 created using VirtualLabs [92])

Evolutionary Game Theory, Figure 10 Local interaction in a Prisoner's Dilemma; greedy payoff g = 1.55

Evolutionary Game Theory, Figure 11 Local interaction in a Prisoner's Dilemma; greedy payoff g = 1.65

The specification of the dynamics considered above, based on simultaneous updating and certain imitation of the most successful neighbor, presents a relatively favorable environment for cooperative behavior. Nevertheless, under Poisson arrivals of revision opportunities, or probabilistic decision rules, or both, cooperation can persist for very long periods of time for values of g significantly larger than 1 [154,155].

The literature on complex spatial dynamics in evolutionary game models is large and rapidly growing, with the evolution of behavior in the spatial Prisoner's Dilemma being the single most-studied environment. While analyses are typically based on simulations, analytical results have been obtained in some relatively simple settings [71,94]. Recent work on complex spatial dynamics has considered games with three or more strategies, including Rock–Paper–Scissors games, as well as public good contribution games and Prisoner's Dilemmas with voluntary participation. Introducing more than two strategies can lead to qualitatively novel dynamic phenomena, including large-scale spatial cycles and traveling waves [93,202,203]. In addition to simulations, the analysis of complex spatial dynamics is often based on approximation techniques from non-equilibrium statistical physics, and much of the research on these dynamics has appeared in the physics literature. [201] offers a comprehensive survey of work on this topic.
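A compact simulation in the spirit of the Nowak–May model described above can be sketched as follows; the payoff values, seed, and run length are illustrative assumptions, and ties among strictly better neighbors are broken arbitrarily while ties with the incumbent favor the incumbent, as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Spatial Prisoner's Dilemma in the spirit of Nowak and May. Payoffs per
# match: mutual C pays 1, D against C pays g, C against D pays 0, and
# mutual D pays a small e (parameter values are illustrative).
g, e, n, T = 1.65, 0.01, 100, 50

grid = (rng.random((n, n)) < 0.75).astype(int)   # 1 = cooperate, 0 = defect
shifts = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def roll(a, dx, dy):
    return np.roll(np.roll(a, dx, axis=0), dy, axis=1)

def payoffs(grid):
    """Total payoff of each cell against its 8 Moore neighbors (torus)."""
    total = np.zeros(grid.shape)
    for dx, dy in shifts:
        opp = roll(grid, dx, dy)
        total += grid * opp + (1 - grid) * (g * opp + e * (1 - opp))
    return total

def step(grid):
    """All cells simultaneously imitate a strictly better-performing
    neighbor, if any; ties favor the incumbent strategy."""
    pay = payoffs(grid)
    best_pay, best_strat = pay.copy(), grid.copy()
    for dx, dy in shifts:
        npay, nstrat = roll(pay, dx, dy), roll(grid, dx, dy)
        better = npay > best_pay
        best_pay = np.where(better, npay, best_pay)
        best_strat = np.where(better, nstrat, best_strat)
    return best_strat

for _ in range(T):
    grid = step(grid)

print(grid.mean())   # fraction of cooperators after T rounds
```

For g in the intermediate band identified above, both strategies typically remain present as the configuration churns.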


Applications

Evolutionary game theory was created with biological applications squarely in mind. In the prehistory of the field, Fisher [73] and Hamilton [87] used game-theoretic ideas to understand the evolution of sex ratios. Maynard Smith [137,138,139,140] introduced his definition of ESS as a way of understanding ritualized animal conflicts. Since these early contributions, evolutionary game theory has been used to study a diverse array of biological questions, including mate choice, parental investment, parent-offspring conflict, social foraging, and predator-prey systems. For overviews of research on these and other topics in biology, see [65,88].

The early development of evolutionary game theory in economics was motivated primarily by theoretical concerns: the justification of traditional game-theoretic solution concepts, and the development of methods for equilibrium selection in games with multiple stable equilibria. More recently, evolutionary game theory has been applied to concrete economic environments, in some instances as a means of contending with equilibrium selection problems, and in others to obtain an explicitly dynamic model of the phenomena of interest. Of course, these applications are most successful when the behavioral assumptions that underlie the evolutionary approach are appropriate, and when the time horizon needed for the results to become relevant corresponds to the one germane to the application at hand.

Topics in economic theory studied using the methods of evolutionary game theory range from behavior in markets [1,6,7,8,12,19,64,112,129,212], to bargaining and hold-up problems [32,46,57,66,164,208,220,221,222], to externality and implementation problems [47,49,136,174,177,180], to questions of public good provision and collective action [146,147,148].
The techniques described here are being applied with increasing frequency to problems of broader social science interest, including residential segregation [40,62,142,222,223,225,226] and cultural evolution [34,126], and to the study of behavior in transportation and computer networks [72,143,150,173,175,177,197]. A proliferating branch of research extends the approaches described in this article to address the evolution of structure and behavior in social networks; a number of recent books [85,114,213] offer detailed treatments of work in this domain.

Future Directions

Evolutionary game theory is a maturing field; many basic theoretical issues are well understood, but many difficult questions remain. It is tempting to say that stochastic and local interaction models offer the more open terrain for further explorations. But while it is true that we know less about these models than about deterministic evolutionary dynamics, even our knowledge of the latter is limited: while dynamics on one- and two-dimensional state spaces, and for games satisfying a few interesting structural assumptions, are well understood, the dynamics of behavior in the vast majority of many-strategy games are not.

The prospects for further applications of the tools of evolutionary game theory are brighter still. In economics, and in other social sciences, the analysis of mathematical models has too often been synonymous with the computation and evaluation of equilibrium behavior. The questions of whether and how equilibrium will come to be are often ignored, and the possibility of long-term disequilibrium behavior left unmentioned. For settings in which its assumptions are tenable, evolutionary game theory offers a host of techniques for modeling the dynamics of economic behavior. The exploitation of the possibilities for a deeper understanding of human social interactions has hardly begun.

Acknowledgments

The figures in Sects. "Deterministic Dynamics" and "Local Interaction" were created using Dynamo [184] and VirtualLabs [92], respectively. I am grateful to Caltech for its hospitality as I completed this article, and I gratefully acknowledge financial support under NSF Grant SES-0617753.

Bibliography

1. Agastya M (2004) Stochastic stability in a double auction. Games Econ Behav 48:203–222
2. Akin E (1979) The geometry of population genetics. Springer, Berlin
3. Akin E (1980) Domination or equilibrium. Math Biosci 50:239–250
4. Akin E (1990) The differential geometry of population genetics and evolutionary games. In: Lessard S (ed) Mathematical and statistical developments of evolutionary theory. Kluwer, Dordrecht, pp 1–93
5. Akin E, Losert V (1984) Evolutionary dynamics of zero-sum games. J Math Biol 20:231–258
6. Alós-Ferrer C (2005) The evolutionary stability of perfectly competitive behavior. Econ Theory 26:497–516
7. Alós-Ferrer C, Ania AB, Schenk-Hoppé KR (2000) An evolutionary model of Bertrand oligopoly. Games Econ Behav 33:1–19
8. Alós-Ferrer C, Kirchsteiger G, Walzl M (2006) On the evolution of market institutions: The platform design paradox. Unpublished manuscript, University of Konstanz
9. Alós-Ferrer C, Weidenholzer S (2006) Contagion and efficiency. J Econ Theory, forthcoming. University of Konstanz and University of Vienna

Evolutionary Game Theory

10. Alós-Ferrer C, Weidenholzer S (2006) Imitation, local interactions, and efficiency. Econ Lett 93:163–168 11. Anderlini L, Ianni A (1996) Path dependence and learning from neighbors. Games Econ Behav 13:141–177 12. Ania AB, Tröger T, Wambach A (2002) An evolutionary analysis of insurance markets with adverse selection. Games Econ Behav 40:153–184 13. Arneodo A, Coullet P, Tresser C (1980) Occurrence of strange attractors in three-dimensional Volterra equations. Phys Lett 79A:259–263 14. Axelrod R (1984) The evolution of cooperation. Basic Books, New York 15. Balkenborg D, Schlag KH (2001) Evolutionarily stable sets. Int J Game Theory 29:571–595 16. Basu K, Weibull JW (1991) Strategy sets closed under rational behavior. Econ Lett 36:141–146 17. Beckmann M, McGuire CB, Winsten CB (1956) Studies in the economics of transportation. Yale University Press, New Haven 18. Beggs AW (2002) Stochastic evolution with slow learning. Econ Theory 19:379–405 19. Ben-Shoham A, Serrano R, Volij O (2004) The evolution of exchange. J Econ Theory 114:310–328 20. Benaïm M (1998) Recursive algorithms, urn processes, and the chaining number of chain recurrent sets. Ergod Theory Dyn Syst 18:53–87 21. Benaïm M, Hirsch MW (1999) On stochastic approximation algorithms with constant step size whose average is cooperative. Ann Appl Probab 30:850–869 22. Benaïm M, Hofbauer J, Hopkins E (2006) Learning in games with unstable equilibria. Unpublished manuscript, Université de Neuchâtel, University of Vienna and University of Edinburgh 23. Benaïm M, Sandholm WH (2007) Logit evolution in potential games: Reversibility, rates of convergence, large deviations, and equilibrium selection. Unpublished manuscript, Université de Neuchâtel and University of Wisconsin 24. Benaïm M, Weibull JW (2003) Deterministic approximation of stochastic evolution in games. Econometrica 71:873–903 25. Berger U, Hofbauer J (2006) Irrational behavior in the Brown-von Neumann-Nash dynamics. Games Econ Behav 56:1–6 26. 
Bergin J, Bernhardt D (2004) Comparative learning dynamics. Int Econ Rev 45:431–465 27. Bergin J, Lipman BL (1996) Evolution with state-dependent mutations. Econometrica 64:943–956 28. Binmore K, Gale J, Samuelson L (1995) Learning to be imperfect: The ultimatum game. Games Econ Behav 8:56–90 29. Binmore K, Samuelson L (1997) Muddling through: Noisy equilibrium selection. J Econ Theory 74:235–265 30. Binmore K, Samuelson L (1999) Evolutionary drift and equilibrium selection. Rev Econ Stud 66:363–393 31. Binmore K, Samuelson L, Vaughan R (1995) Musical chairs: Modeling noisy evolution. Games Econ Behav 11:1–35 32. Binmore K, Samuelson L, Peyton Young H (2003) Equilibrium selection in bargaining models. Games Econ Behav 45:296– 328 33. Bishop DT, Cannings C (1978) A generalised war of attrition. J Theor Biol 70:85–124 34. Bisin A, Verdier T (2001) The economics of cultural transmission and the dynamics of preferences. J Econ Theory 97:298– 319

35. Björnerstedt J, Weibull JW (1996) Nash equilibrium and evolution by imitation. In: Arrow KJ et al. (eds) The Rational Foundations of Economic Behavior. St. Martin’s Press, New York, pp 155–181 36. Blume LE (1993) The statistical mechanics of strategic interaction. Games Econ Behav 5:387–424 37. Blume LE (1995) The statistical mechanics of best response strategy revision. Games Econ Behav 11:111–145 38. Blume LE (1997) Population games. In: Arthur WB, Durlauf SN, Lane DA (eds) The economy as an evolving complex system II. Addison–Wesley, Reading, pp 425–460 39. Blume LE (2003) How noise matters. Games Econ Behav 44:251–271 40. Bøg M (2006) Is segregation robust? Unpublished manuscript, Stockholm School of Economics 41. Bomze IM (1990) Dynamical aspects of evolutionary stability. Monatshefte Mathematik 110:189–206 42. Bomze IM (1991) Cross entropy minimization in uninvadable states of complex populations. J Math Biol 30:73–87 43. Börgers T, Sarin R (1997) Learning through reinforcement and the replicator dynamics. J Econ Theory 77:1–14 44. Boylan RT (1995) Continuous approximation of dynamical systems with randomly matched individuals. J Econ Theory 66:615–625 45. Brown GW, von Neumann J (1950) Solutions of games by differential equations. In: Kuhn HW, Tucker AW (eds) Contributions to the theory of games I, volume 24 of Annals of Mathematics Studies. Princeton University Press, Princeton, pp 73–79 46. Burke MA, Peyton Young H (2001) Competition and custom in economic contracts: A case study of Illinois agriculture. Am Econ Rev 91:559–573 47. Cabrales A (1999) Adaptive dynamics and the implementation problem with complete information. J Econ Theory 86:159–184 48. Cabrales A (2000) Stochastic replicator dynamics. Int Econ Rev 41:451–481 49. Cabrales A, Ponti G (2000) Implementation, elimination of weakly dominated strategies and evolutionary dynamics. Rev Econ Dyn 3:247–282 50. 
Crawford VP (1991) An “evolutionary” interpretation of Van Huyck, Battalio, and Beil’s experimental results on coordination. Games Econ Behav 3:25–59 51. Cressman R (1996) Evolutionary stability in the finitely repeated prisoner’s dilemma game. J Econ Theory 68:234–248 52. Cressman R (1997) Local stability of smooth selection dynamics for normal form games. Math Soc Sci 34:1–19 53. Cressman R (2000) Subgame monotonicity in extensive form evolutionary games. Games Econ Behav 32:183–205 54. Cressman R (2003) Evolutionary dynamics and extensive form games. MIT Press, Cambridge 55. Cressman R, Schlag KH (1998) On the dynamic (in)stability of backwards induction. J Econ Theory 83:260–285 56. Dafermos S, Sparrow FT (1969) The traffic assignment problem for a general network. J Res Nat Bureau Stand B 73:91–118 57. Dawid H, Bentley MacLeod W (2008) Hold-up and the evolution of investment and bargaining norms. Games Econ Behav 62:26–52 58. Dawkins R (1976) The selfish gene. Oxford University Press, Oxford


59. Dekel E, Scotchmer S (1992) On the evolution of optimizing behavior. J Econ Theory 57:392–407 60. Demichelis S, Ritzberger K (2003) From evolutionary to strategic stability. J Econ Theory 113:51–75 61. Dindoš M, Mezzetti C (2006) Better-reply dynamics and global convergence to Nash equilibrium in aggregative games. Games Econ Behav 54:261–292 62. Dokumacı E, Sandholm WH (2007) Schelling redux: An evolutionary model of residential segregation. Unpublished manuscript, University of Wisconsin 63. Dokumacı E, Sandholm WH (2007) Stochastic evolution with perturbed payoffs and rapid play. Unpublished manuscript, University of Wisconsin 64. Droste E, Hommes, Tuinstra J (2002) Endogenous fluctuations under evolutionary pressure in Cournot competition. Games Econ Behav 40:232–269 65. Dugatkin LA, Reeve HK (eds) (1998) Game theory and animal behavior. Oxford University Press, Oxford 66. Ellingsen T, Robles J (2002) Does evolution solve the hold-up problem? Games Econ Behav 39:28–53 67. Ellison G (1993) Learning, local interaction, and coordination. Econometrica 61:1047–1071 68. Ellison G (2000) Basins of attraction, long run equilibria, and the speed of step-by-step evolution. Rev Econ Stud 67:17–45 69. Ely JC (2002) Local conventions. Adv Econ Theory 2:1(30) 70. Ely JC, Sandholm WH (2005) Evolution in Bayesian games I: Theory. Games Econ Behav 53:83–109 71. Eshel I, Samuelson L, Shaked A (1998) Altruists, egoists, and hooligans in a local interaction model. Am Econ Rev 88:157–179 72. Fischer S, Vöcking B (2006) On the evolution of selfish routing. Unpublished manuscript, RWTH Aachen 73. Fisher RA (1930) The genetical theory of natural selection. Clarendon Press, Oxford 74. Foster DP, Peyton Young H (1990) Stochastic evolutionary game dynamics. Theor Popul Biol 38:219–232; corrigendum 51:77–78 (1997) 75. Freidlin MI, Wentzell AD (1998) Random perturbations of dynamical systems, 2nd edn. Springer, New York 76. Friedman D (1991) Evolutionary games in economics. 
Econometrica 59:637–666 77. Friedman D, Yellin J (1997) Evolving landscapes for population games. Unpublished manuscript, UC Santa Cruz 78. Friedman JW, Mezzetti C (2001) Learning in games by random sampling. J Econ Theory 98:55–84 79. Fudenberg D, Harris C (1992) Evolutionary dynamics with aggregate shocks. J Econ Theory 57:420–441 80. Fudenberg D, Imhof LA (2006) Imitation processes with small mutations. J Econ Theory 131:251–262 81. Fudenberg D, Imhof LA (2008) Monotone imitation dynamics in large populations. J Econ Theory 140:229–245 82. Fudenberg D, Levine DK (1998) Theory of learning in games. MIT Press, Cambridge 83. Gaunersdorfer A, Hofbauer J (1995) Fictitious play, shapley polygons, and the replicator equation. Games Econ Behav 11:279–303 84. Gilboa I, Matsui A (1991) Social stability and equilibrium. Econometrica 59:859–867 85. Goyal S (2007) Connections: An introduction to the economics of networks. Princeton University Press, Princeton

86. Goyal S, Janssen MCW (1997) Non-exclusive conventions and social coordination. J Econ Theory 77:34–57 87. Hamilton WD (1967) Extraordinary sex ratios. Science 156:477–488 88. Hammerstein P, Selten R (1994) Game theory and evolutionary biology. In: Aumann RJ, Hart S (eds) Handbook of Game Theory, vol 2, chap 28. Elsevier, Amsterdam, pp 929–993 89. Harsanyi JC, Selten R (1988) A general theory of equilibrium selection in games. MIT Press, Cambridge 90. Hart S (2002) Evolutionary dynamics and backward induction. Games Econ Behav 41:227–264 91. Hart S, Mas-Colell A (2003) Uncoupled dynamics do not lead to Nash equilibrium. Am Econ Rev 93:1830–1836 92. Hauert C (2007) Virtual Labs in evolutionary game theory. Software http://www.univie.ac.at/virtuallabs. Accessed 31 Dec 2007 93. Hauert C, De Monte S, Hofbauer J, Sigmund K (2002) Volunteering as Red Queen mechanism for cooperation in public goods games. Science 296:1129–1132 94. Herz AVM (1994) Collective phenomena in spatially extended evolutionary games. J Theor Biol 169:65–87 95. Hines WGS (1987) Evolutionary stable strategies: A review of basic theory. Theor Popul Biol 31:195–272 96. Hofbauer J (1995) Imitation dynamics for games. Unpublished manuscript, University of Vienna 97. Hofbauer J (1995) Stability for the best response dynamics. Unpublished manuscript, University of Vienna 98. Hofbauer J (2000) From Nash and Brown to Maynard Smith: Equilibria, dynamics and ESS. Selection 1:81–88 99. Hofbauer J, Hopkins E (2005) Learning in perturbed asymmetric games. Games Econ Behav 52:133–152 100. Hofbauer J, Oechssler J, Riedel F (2005) Brown-von Neumann-Nash dynamics: The continuous strategy case. Unpublished manuscript, University of Vienna 101. Hofbauer J, Sandholm WH (2002) On the global convergence of stochastic fictitious play. Econometrica 70:2265–2294 102. Hofbauer J, Sandholm WH (2006) Stable games. Unpublished manuscript, University of Vienna and University of Wisconsin 103. 
Hofbauer J, Sandholm WH (2006) Survival of dominated strategies under evolutionary dynamics. Unpublished manuscript, University of Vienna and University of Wisconsin 104. Hofbauer J, Sandholm WH (2007) Evolution in games with randomly disturbed payoffs. J Econ Theory 132:47–69 105. Hofbauer J, Schuster P, Sigmund K (1979) A note on evolutionarily stable strategies and game dynamics. J Theor Biol 81:609–612 106. Hofbauer J, Sigmund K (1988) Theory of evolution and dynamical systems. Cambridge University Press, Cambridge 107. Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge 108. Hofbauer J, Sigmund K (2003) Evolutionary game dynamics. Bull Am Math Soc (New Series) 40:479–519 109. Hofbauer J, Swinkels JM (1996) A universal Shapley example. Unpublished manuscript, University of Vienna and Northwestern University 110. Hofbauer J, Weibull JW (1996) Evolutionary selection against dominated strategies. J Econ Theory 71:558–573 111. Hopkins E (1999) A note on best response dynamics. Games Econ Behav 29:138–150


112. Hopkins E, Seymour RM (2002) The stability of price dispersion under seller and consumer learning. Int Econ Rev 43:1157–1190 113. Imhof LA (2005) The long-run behavior of the stochastic replicator dynamics. Ann Appl Probab 15:1019–1045 114. Jackson MO Social and economic networks. Princeton University Press, Princeton, forthcoming 115. Jacobsen HJ, Jensen M, Sloth B (2001) Evolutionary learning in signalling games. Games Econ Behav 34:34–63 116. Jordan JS (1993) Three problems in learning mixed-strategy Nash equilibria. Games Econ Behav 5:368–386 117. Josephson J (2008) Stochastic better reply dynamics in finite games. Econ Theory, 35:381–389 118. Josephson J, Matros A (2004) Stochastic imitation in finite games. Games Econ Behav 49:244–259 119. Kandori M, Mailath GJ, Rob R (1993) Learning, mutation, and long run equilibria in games. Econometrica 61:29–56 120. Kandori M, Rob R (1995) Evolution of equilibria in the long run: A general theory and applications. J Econ Theory 65:383– 414 121. Kandori M, Rob R (1998) Bandwagon effects and long run technology choice. Games Econ Behav 22:84–120 122. Kim Y-G, Sobel J (1995) An evolutionary approach to pre-play communication. Econometrica 63:1181–1193 123. Kimura M (1958) On the change of population fitness by natural selection. Heredity 12:145–167 124. Kosfeld M (2002) Stochastic strategy adjustment in coordination games. Econ Theory 20:321–339 125. Kukushkin NS (2004) Best response dynamics in finite games with additive aggregation. Games Econ Behav 48:94–110 126. Kuran T, Sandholm WH (2008) Cultural integration and its discontents. Rev Economic Stud 75:201–228 127. Kurtz TG (1970) Solutions of ordinary differential equations as limits of pure jump Markov processes. J Appl Probab 7:49–58 128. Kuzmics C (2004) Stochastic evolutionary stability in extensive form games of perfect information. Games Econ Behav 48:321–336 129. Lahkar R (2007) The dynamic instability of dispersed price equilibria. 
Unpublished manuscript, University College London 130. Lahkar R, Sandholm WH The projection dynamic and the geometry of population games. Games Econ Behav, forthcoming 131. Losert V, Akin E (1983) Dynamics of games and genes: Discrete versus continuous time. J Math Biol 17:241–251 132. Lotka AJ (1920) Undamped oscillation derived from the law of mass action. J Am Chem Soc 42:1595–1598 133. Mailath GJ (1992) Introduction: Symposium on evolutionary game theory. J Econ Theory 57:259–277 134. Maruta T (1997) On the relationship between risk-dominance and stochastic stability. Games Econ Behav 19:221–234 135. Maruta T (2002) Binary games with state dependent stochastic choice. J Econ Theory 103:351–376 136. Mathevet L (2007) Supermodular Bayesian implementation: Learning and incentive design. Unpublished manuscript, Caltech 137. Maynard Smith J (1972) Game theory and the evolution of fighting. In: Maynard Smith J On Evolution. Edinburgh University Press, Edinburgh, pp 8–28 138. Maynard Smith J (1974) The theory of games and the evolution of animal conflicts. J Theor Biol 47:209–221

139. Maynard Smith J (1982) Evolution and the theory of games. Cambridge University Press, Cambridge 140. Maynard Smith J, Price GR (1973) The logic of animal conflict. Nature 246:15–18 141. Miękisz J (2004) Statistical mechanics of spatial evolutionary games. J Phys A 37:9891–9906 142. Möbius MM (2000) The formation of ghettos as a local interaction phenomenon. Unpublished manuscript, MIT 143. Monderer D, Shapley LS (1996) Potential games. Games Econ Behav 14:124–143 144. Moran PAP (1962) The statistical processes of evolutionary theory. Clarendon Press, Oxford 145. Myatt DP, Wallace CC (2003) A multinomial probit model of stochastic evolution. J Econ Theory 113:286–301 146. Myatt DP, Wallace CC (2007) An evolutionary justification for thresholds in collective-action problems. Unpublished manuscript, Oxford University 147. Myatt DP, Wallace CC (2008) An evolutionary analysis of the volunteer’s dilemma. Games Econ Behav 62:67–76 148. Myatt DP, Wallace CC (2008) When does one bad apple spoil the barrel? An evolutionary analysis of collective action. Rev Econ Stud 75:499–527 149. Nachbar JH (1990) “Evolutionary” selection dynamics in games: Convergence and limit properties. Int J Game Theory 19:59–89 150. Nagurney A, Zhang D (1997) Projected dynamical systems in the formulation, stability analysis and computation of fixed demand traffic network equilibria. Transp Sci 31:147–158 151. Nash JF (1951) Non-cooperative games. Ann Math 54:287–295 152. Nöldeke G, Samuelson L (1993) An evolutionary analysis of backward and forward induction. Games Econ Behav 5:425–454 153. Nowak MA (2006) Evolutionary dynamics: Exploring the equations of life. Belknap/Harvard, Cambridge 154. Nowak MA, Bonhoeffer S, May RM (1994) More spatial games. Int J Bifurc Chaos 4:33–56 155. Nowak MA, Bonhoeffer S, May RM (1994) Spatial games and the maintenance of cooperation. Proc Nat Acad Sci 91:4877–4881 156. Nowak MA, May RM (1992) Evolutionary games and spatial chaos. Nature 359:826–829 157. 
Nowak MA, May RM (1993) The spatial dilemmas of evolution. Int J Bifurc Chaos 3:35–78 158. Nowak MA, Sasaki A, Taylor C, Fudenberg D (2004) Emergence of cooperation and evolutionary stability in finite populations. Nature 428:646–650 159. Oechssler J, Riedel F (2001) Evolutionary dynamics on infinite strategy spaces. Econ Theory 17:141–162 160. Oechssler J, Riedel F (2002) On the dynamic foundation of evolutionary stability in continuous models. J Econ Theory 107:141–162 161. Rhode P, Stegeman M (1996) A comment on “learning, mutation, and long run equilibria in games”. Econometrica 64:443– 449 162. Ritzberger K, Weibull JW (1995) Evolutionary selection in normal form games. Econometrica 63:1371–1399 163. Robles J (1998) Evolution with changing mutation rates. J Econ Theory 79:207–223 164. Robles J (2008) Evolution, bargaining and time preferences. Econ Theory 35:19–36


165. Robson A, Vega-Redondo F (1996) Efficient equilibrium selection in evolutionary games with random matching. J Econ Theory 70:65–92 166. Rosenthal RW (1973) A class of games possessing pure strategy Nash equilibria. Int J Game Theory 2:65–67 167. Samuelson L (1988) Evolutionary foundations of solution concepts for finite, two-player, normal-form games. In: Vardi MY (ed) Proc. of the Second Conference on Theoretical Aspects of Reasoning About Knowledge (Pacific Grove, CA, 1988), Morgan Kaufmann Publishers, Los Altos, pp 211–225 168. Samuelson L (1994) Stochastic stability in games with alternative best replies. J Econ Theory 64:35–65 169. Samuelson L (1997) Evolutionary games and equilibrium selection. MIT Press, Cambridge 170. Samuelson L, Zhang J (1992) Evolutionary stability in asymmetric games. J Econ Theory 57:363–391 171. Sandholm WH (1998) Simple and clever decision rules in a model of evolution. Econ Lett 61:165–170 172. Sandholm WH (2001) Almost global convergence to p-dominant equilibrium. Int J Game Theory 30:107–116 173. Sandholm WH (2001) Potential games with continuous player sets. J Econ Theory 97:81–108 174. Sandholm WH (2002) Evolutionary implementation and congestion pricing. Rev Econ Stud 69:81–108 175. Sandholm WH (2003) Evolution and equilibrium under inexact information. Games Econ Behav 44:343–378 176. Sandholm WH (2005) Excess payoff dynamics and other wellbehaved evolutionary dynamics. J Econ Theory 124:149–170 177. Sandholm WH (2005) Negative externalities and evolutionary implementation. Rev Econ Stud 72:885–915 178. Sandholm WH (2006) Pairwise comparison dynamics. Unpublished manuscript, University of Wisconsin 179. Sandholm WH (2007) Evolution in Bayesian games II: Stability of purified equilibria. J Econ Theory 136:641–667 180. Sandholm WH (2007) Pigouvian pricing and stochastic evolutionary implementation. J Econ Theory 132:367–382 181. Sandholm WH (2007) Large population potential games. 
Unpublished manuscript, University of Wisconsin 182. Sandholm WH (2007) Simple formulas for stationary distributions and stochastically stable states. Games Econ Behav 59:154–162 183. Sandholm WH Population games and evolutionary dynamics. MIT Press, Cambridge, forthcoming 184. Sandholm WH, Dokumacı E (2007) Dynamo: Phase diagrams for evolutionary dynamics. Software http://www.ssc.wisc.edu/~whs/dynamo 185. Sandholm WH, Dokumacı E, Lahkar R The projection dynamic and the replicator dynamic. Games Econ Behav, forthcoming 186. Sandholm WH, Pauzner A (1998) Evolution, population growth, and history dependence. Games Econ Behav 22:84–120 187. Sato Y, Akiyama E, Doyne Farmer J (2002) Chaos in learning a simple two-person game. Proc Nat Acad Sci 99:4748–4751 188. Schlag KH (1998) Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits. J Econ Theory 78:130–156 189. Schuster P, Sigmund K (1983) Replicator dynamics. J Theor Biol 100:533–538 190. Schuster P, Sigmund K, Hofbauer J, Wolff R (1981) Selfregulation of behaviour in animal societies I: Symmetric contests. Biol Cybern 40:1–8 191. Selten R (1991) Evolution, learning, and economic behavior. Games Econ Behav 3:3–24 192. Shahshahani S (1979) A new mathematical framework for the study of linkage and selection. Mem Am Math Soc 211 193. Shapley LS (1964) Some topics in two person games. In: Dresher M, Shapley LS, Tucker AW (eds) Advances in game theory. vol 52 of Annals of Mathematics Studies. Princeton University Press, Princeton, pp 1–28 194. Skyrms B (1990) The Dynamics of Rational Deliberation. Harvard University Press, Cambridge 195. Skyrms B (1992) Chaos in game dynamics. J Log Lang Inf 1:111–130 196. Smith HL (1995) Monotone Dynamical Systems: An introduction to the theory of competitive and cooperative systems. American Mathematical Society, Providence, RI 197. Smith MJ (1984) The stability of a dynamic model of traffic assignment – an application of a method of Lyapunov. Transp Sci 18:245–252 198. Stegeman M, Rhode P (2004) Stochastic Darwinian equilibria in small and large populations. Games Econ Behav 49:171–214 199. Swinkels JM (1992) Evolutionary stability with equilibrium entrants. J Econ Theory 57:306–332 200. Swinkels JM (1993) Adjustment dynamics and rational play in games. Games Econ Behav 5:455–484 201. Szabó G, Fáth G (2007) Evolutionary games on graphs. Phys Rep 446:97–216 202. Szabó G, Hauert C (2002) Phase transitions and volunteering in spatial public goods games. Phys Rev Lett 89:11801(4) 203. Tainaka K-I (2001) Physics and ecology of rock-paper-scissors game. In: Marsland TA, Frank I (eds) Computers and games, Second International Conference (Hamamatsu 2000), vol 2063 in Lecture Notes in Computer Science. Springer, Berlin, pp 384–395 204. Tanabe Y (2006) The propagation of chaos for interacting individuals in a large population. Math Soc Sci 51:125–152 205. Taylor PD, Jonker L (1978) Evolutionarily stable strategies and game dynamics. Math Biosci 40:145–156 206. Thomas B (1985) On evolutionarily stable sets. J Math Biol 22:105–115 207. Topkis D (1979) Equilibrium points in nonzero-sum n-person submodular games. SIAM J Control Optim 17:773–787 208. Tröger T (2002) Why sunk costs matter for bargaining outcomes: An evolutionary approach. J Econ Theory 102:28–53 209. Ui T (1998) Robustness of stochastic stability. Unpublished manuscript, Bank of Japan 210. van Damme E, Weibull JW (2002) Evolution in games with endogenous mistake probabilities. J Econ Theory 106:296–315 211. Vega-Redondo F (1996) Evolution, games, and economic behaviour. Oxford University Press, Oxford 212. Vega-Redondo F (1997) The evolution of Walrasian behavior. Econometrica 65:375–384 213. Vega-Redondo F (2007) Complex social networks. Cambridge University Press, Cambridge 214. Volterra V (1931) Lecons sur la Theorie Mathematique de la Lutte pour la Vie. Gauthier–Villars, Paris 215. von Neumann J, Morgenstern O (1944) Theory of games and economic behavior. Prentice–Hall, Princeton


216. Weibull JW (1995) Evolutionary game theory. MIT Press, Cambridge 217. Weibull JW (1996) The mass action interpretation. Excerpt from “The work of John Nash in game theory: Nobel Seminar, December 8, 1994”. J Econ Theory 69:165–171 218. Weissing FJ (1991) Evolutionary stability and dynamic stability in a class of evolutionary normal form games. In: Selten R (ed) Game Equilibrium Models I. Springer, Berlin, pp 29–97 219. Peyton Young H (1993) The evolution of conventions. Econometrica 61:57–84 220. Peyton Young H (1993) An evolutionary model of bargaining. J Econ Theory 59:145–168 221. Peyton Young H (1998) Conventional contracts. Rev Econ Stud 65:773–792

222. Peyton Young H (1998) Individual strategy and social structure. Princeton University Press, Princeton 223. Peyton Young H (2001) The dynamics of conformity. In: Durlauf SN, Peyton Young H (eds) Social dynamics. Brookings Institution Press/MIT Press, Washington/Cambridge, pp 133– 153 224. Zeeman EC (1980) Population dynamics from game theory. In: Nitecki Z, Robinson C (eds) Global theory of dynamical systems (Evanston, 1979). number 819 in Lecture Notes in Mathematics. Springer, Berlin, pp 472–497 225. Zhang J (2004) A dynamic model of residential segregation. J Math Sociol 28:147–170 226. Zhang J (2004) Residential segregation in an all-integrationist world. J Econ Behav Organ 24:533–550


Evolution in Materio

Evolution in Materio SIMON HARDING1 , JULIAN F. MILLER2 1 Department of Computer Science, Memorial University, St. John’s, Canada 2 Department of Electronics, University of York, Heslington, UK Article Outline Glossary Definition of the Subject Introduction Evolutionary Algorithms Evolution in Materio: Historical Background Evolution in Materio: Defining Suitable Materials Evolution in Materio Is Verified with Liquid Crystal Evolution in Materio Using Liquid Crystal: Implementational Details The Computational Power of Materials Future Directions Bibliography Glossary Evolutionary algorithm A computer algorithm loosely inspired by Darwinian evolution. Generate-and-test The process of generating a potential solution to a computational problem and testing it to see how good a solution it is. The idea behind it is that no human ingenuity is employed to make good solutions more likely. Genotype A string of information that encodes a potential solution instance of a problem and allows its suitability to be assessed. Evolution in materio The method of applying computer controlled evolution to manipulate or configure a physical system. Liquid crystal Substances that have properties between those of a liquid and a crystal. Definition of the Subject Evolution in materio refers to the use of computers running search algorithms, called evolutionary algorithms, to find the values of variables that should be applied to material systems so that they carry out useful computation. Examples of such variables might be the location and magnitude of voltages that need to be applied to a particular physical system. Evolution in materio is a methodology for programming materials that utilizes physical effects that the human programmer need not be aware of.

It is a general methodology for obtaining analogue computation that is specific to the desired problem domain. Although a form of this methodology was hinted at in the work of Gordon Pask in the 1950s it was not convincingly demonstrated until 1996 by Adrian Thompson, who showed that physical properties of a digital chip could be exploited by computer controlled evolution. This article describes the first demonstration that such a method can be used to obtain specific analogue computation in a non-silicon based physical material (liquid crystal). The work is important for a number of reasons. Firstly, it proposes a general method for building analogue computational devices. Secondly it explains how previously unknown physical effects may be utilized to carry out computations. Thirdly, it presents a method that can be used to discover useful physical effects that can form the basis of future computational devices. Introduction Physical Computation Classical computation is founded on a mathematical model of computation based on an abstract (but physically inspired) machine called a Turing Machine [1]. A Turing machine is a machine that can write or erase symbols on a possibly infinite one dimensional tape. Its actions are determined by a table of instructions that determine what the machine will write on the tape (by moving one square left or right) given its state (stored in a state register) and the symbol on the tape. Turing showed that the calculations that could be performed on such a machine accord with the notion of computation in mathematics. The Turing machine is an abstraction (partly because it uses a possibly infinite tape) and to this day it is still not understood what limitations or extensions to the computational power of Turing’s model might be possible using real physical processes. 
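The abstract machine just described is easy to render concrete. The sketch below simulates a Turing machine from an instruction table mapping (state, symbol) to (symbol to write, head move, next state); the particular table shown, which flips a string of bits and halts, is an invented example for illustration, not one from the article.

```python
# A minimal simulation of a Turing machine: a tape of symbols, a state
# register, and an instruction table (state, symbol) -> (write, move, state).

def run_turing_machine(table, tape, state="start", blank="_", max_steps=10_000):
    tape, head = list(tape), 0
    for _ in range(max_steps):
        if state == "halt":
            return "".join(tape).strip(blank)
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
        # Grow the "possibly infinite" tape lazily in either direction.
        if head < 0:
            tape.insert(0, blank)
            head = 0
        elif head >= len(tape):
            tape.append(blank)
    raise RuntimeError("machine did not halt within max_steps")

# An invented instruction table: scan right, flipping 0 <-> 1, halt on blank.
flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(flip_bits, "1011_"))  # -> 0100
```

Even this toy machine exhibits the essential point of the model: the computation is fully determined by the finite instruction table, with the tape as unbounded storage.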
Von Neumann and others at the Institute for Advanced Study at Princeton devised a design for a computer based on the ideas of Turing that has formed the foundation of modern computers. Modern computers are digital in operation. Although they are made of physical devices (i. e. transistors), computations are made on the basis of whether a voltage is above or below some threshold. Prior to the invention of digital computers there have been a variety of analogue computing machines. Some of these were purely mechanical (e. g. an abacus, a slide-rule, Charles Babbage’s difference engine, Vannevar Bush’s Differential Analyzer) but later computing machines were built using operational amplifiers [2]. There are many aspects of computation that were deliberately ignored by Turing in his model of computation.

Evolution in Materio

For instance, speed, programmability, parallelism, openness and adaptivity are not considered. The speed at which an operation can be performed is clearly an important issue, since it would be of little use to have a machine that can calculate any computable function but takes an arbitrarily large amount of time to do so. Programmability is another issue of great importance. Writing programs directly in the form of instruction tables that could be used with a device based on a Turing machine is extremely tedious; this is why many high-level computer languages have been devised. The general issue of how to subdivide a computer program into a number of parallel executing processes so that the intended computation is carried out as quickly as possible is still unsolved. Openness refers to systems that can interact with an external environment during their operation. Openness is exhibited strongly in biological systems, where new resources can be added or removed either by an external agency or by the actions taken by the system itself. Adaptivity refers to the ability of systems to change their characteristics in response to an environment. In addition to these aspects, the extent to which the underlying physics affects both the abstract notion of computation and its tractability has been brought to prominence through the discovery of quantum computation, where Deutsch pointed out that Turing machines implicitly use assumptions based on physics [3]. He also showed that through 'quantum parallelism' certain computations could be performed much more quickly than on classical computers. Other forms of physical computation that have recently been explored are: reaction-diffusion systems [4], DNA computing [5,6] and synthetic biology [7].
In the UK a number of Grand Challenges in computing research have been proposed [8]; in particular 'Journeys in Non-Classical Computation' [9,10] seeks to explore, unify and generalize many diverse non-classical computational paradigms to produce a mature science of computation. Toffoli argued that 'Nothing Makes Sense in Computing Except in the Light of Evolution' [11]. He argues firstly that a necessary but not sufficient condition for a computation to have taken place is when a novel function is produced from a fixed and finite repertoire of components (i.e., logic gates, protein molecules). He suggests that a sufficient condition requires intention. That is to say, we cannot argue that computation has taken place unless a system has arisen for a higher purpose (this is why he insists on intention as a prerequisite for computation); otherwise almost everything is carrying out some form of computation (which is not a helpful point of view). Thus a Turing machine does not carry out computations unless

it has been programmed to do so, and since natural evolution constructs organisms that have an increased chance of survival (the higher 'purpose') we can regard them as carrying out computations. It is in this sense that Toffoli points to the fundamental role of evolution in the definition of a computation, as it has provided animals with the ability to have intention. This brings us to one of the fundamental questions in computation: how can we program a physical system to perform a particular computation? The dominant method used to answer this question has been to construct logic gates and from these build a von Neumann machine (i.e., a digital computer). The mechanism that has been used to devise a computer program to carry out a particular computation is the familiar top-down design process, where ultimately the computation is represented using Boolean operations. According to Conrad this process leads us to pay "The Price of Programmability" [12], whereby in conventional programming and design we proceed by excluding many of the processes that might lead us to a solution of the problem at hand. Natural evolution does not do this. It is noteworthy that natural evolution has constructed systems of extraordinary sophistication, complexity and computational power. We argue that it is not possible to construct computational systems of such power using a conventional methodology, and that complex software systems that directly utilize physical effects will require some form of search process akin to natural evolution, together with a way of manipulating the properties of materials. We suggest that some form of evolution ought to be an appropriate methodology for arriving at physical systems that compute. In this chapter we discuss work that has adopted this methodology. We call it evolution in materio.
Evolutionary Algorithms

Firstly we propose that to overcome the limitations of a top-down design process, we should use a more unconstrained design technique that is more akin to a process of generate-and-test. However, a guided search method is also required that spends more time in areas of the search space that confer favorable traits for computation. One such approach is the use of evolutionary algorithms. These algorithms are inspired by the Darwinian concepts of survival of the fittest and the genetic inheritance of information. Using a computer, a population of randomly generated solutions is systematically tested, selected and modified until a solution has been found [13,14,15]. As in nature, a genetic algorithm optimizes a population of individuals by selecting the ones that are best suited to solving a problem and allowing their genetic make-up
to propagate into future generations. It is typically guided only by the evolutionary process and often contains very limited domain-specific knowledge. Although these algorithms are bio-inspired, it is important that any analogies drawn with nature are considered only as analogies. Their lack of specialization for a problem makes genetic algorithms ideal search techniques where little is known about a problem. As long as a suitable representation is chosen, along with a fitness function that allows for ease of movement around a search space, a GA can search vast problem spaces rapidly. Another feature of their behavior is that, provided the genetic representation chosen is sufficiently expressive, the algorithm can explore potential solutions that are unconventional. A human designer normally has a set of predefined rules and strategies that they adopt to solve a problem. These preconceptions may prevent trying a new method, and may prevent the designer from using a better solution. A genetic algorithm does not necessarily require such domain knowledge. Evolutionary algorithms have been shown to be competitive with, or to surpass, human-designed solutions in a number of different areas. The largest conference on evolutionary computation, GECCO, has an annual session on evolutionary approaches that have produced human-competitive scientific and technological results. Moreover, the increase in computational power of computers makes such results increasingly likely. Many different versions of genetic algorithms exist. Variations in representations and genetic operators change the performance characteristics of the algorithm, and, depending on the problem, people employ a variety of modifications of the basic algorithm. However, all the algorithms follow a similar basic set of steps.
Firstly, the numbers or physical variables that are required to define a potential solution have to be identified and encoded into a data representation that can be manipulated inside a computer program. This is referred to as the encoding step. The representation chosen is of crucial importance, as it is possible to inadvertently choose overly constrained representations that limit the portion of the space of potential solutions the evolutionary algorithm will consider. Generally the encoded information is referred to as a genotype, and genotypes are sometimes divided into a number of separate strings called chromosomes. Each entry in the chromosome string is an allele, and one or more of these make up a gene. The second step is to create inside the computer a number of independently generated genotypes whose alleles have been chosen with uniform probability from the allowed set of values. This collection of genotypes is called a population.

In its most basic form, an individual genotype is a single chromosome made of 1s and 0s. However, it is also common to use integer and floating-point numbers if they are more appropriate for the task at hand. Combinations of different representations can also be used within the same chromosome, and that is the approach used in the work described in this article. Whatever representation is used, it should be able to adequately describe the individual and provide a mechanism whereby its characteristics can be transferred to future generations without loss of information. Each of these individuals is then decoded into its phenotype, the outward, physical manifestation of the individual, and tested to see how well the candidate solution solves the problem at hand. This is usually returned as a number that is referred to as the fitness of the genotype. Typically it is this phase of a genetic algorithm that is the most time consuming. The next stage is to select what genetic information will proceed to the next generation. In nature the fitness function and selection are essentially the same – individuals that are better suited to the environment survive to reproduce and pass on their genes. In the genetic algorithm a procedure is applied to determine what information gets to proceed. Genetic algorithms are often generational – all of the old population is removed before moving to the next generation, whereas in nature this process is much less algorithmic. However, to increase the continuity of information between generations, some versions of the algorithm use elitism, where the fittest individuals are always selected for promotion to the next generation. This ensures that good solutions are not lost from the population, but it may have the side effect of causing the genetic information in the population to converge too quickly, so that the search stagnates on a sub-optimal solution. To generate the next population, a procedure analogous to sexual reproduction occurs.
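The generational cycle just described – encode, evaluate, select with elitism, then reproduce – can be sketched in a few lines. The one-point crossover and bit-flip mutation used here anticipate the operators the text goes on to describe; the bit-counting ("OneMax") fitness and all parameter values are placeholders, not taken from the article.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=60,
           elite=2, mut_rate=0.05, seed=1):
    """Minimal generational GA over binary chromosomes (illustrative)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:elite]                       # elitism: keep the best
        while len(nxt) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(ranked, 2), key=fitness)
            p2 = max(rng.sample(ranked, 2), key=fitness)
            cut = rng.randrange(1, length)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            # per-allele bit-flip mutation
            child = [b ^ 1 if rng.random() < mut_rate else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Placeholder task: maximize the number of 1s in the chromosome.
best = evolve(fitness=sum)
print(sum(best))  # climbs close to 20 after 60 generations
```

In evolution in materio the only change to this loop is the fitness function: instead of scoring a bit string in software, the genotype is downloaded as a physical configuration and the material's measured response is scored.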
For example, two individuals will be selected and will then have their genetic information combined together to produce the genotype for the offspring. This process is called recombination or crossover. The genotype is split into sections at randomly selected points called crossover points. A "simple" GA has only one of these points; however, it is possible to perform this operation at multiple points. Sections of the two chromosomes are then put together to form a new individual. This individual shares some of the characteristics of both parents. There are many different ways to choose which members of the population to breed with each other; the aim in general is to try to ensure that fit individuals get to reproduce with other fit
individuals. Individuals can be selected with a probability proportional to their relative fitness, or selected through some form of tournament, which may choose two or more chromosomes at random from the population and select the fittest. In natural recombination, errors occur when the DNA is split and combined together. Also, errors in the DNA of a cell can occur at any time under the influence of a mutagen, such as radiation, a virus or a toxic chemical. The genetic algorithm also has mutations: a number of alleles are selected at random and modified in some way. In a binary GA the bit may be flipped; in a real-numbered GA a random value may be added to or subtracted from the previous allele. Although GAs often have both mutation and crossover, it is possible to use mutation alone. A mutation-only approach has in some cases been demonstrated to work, and crossover is often seen as a macro-mutation operator – effectively changing large sections of a chromosome. After the previous operations have been carried out, the new individuals in the population are retested and their new fitness scores calculated. Eventually this process leads to an increase in the average fitness of the population, and so the population moves closer toward a solution. This cycle of test, select and reproduce is continued until a solution is found (or some other termination condition is reached), at which point the algorithm stops. The performance of a genetic algorithm is normally measured in terms of the number of evaluations required to find a solution of a given quality.

Evolution in Materio: Historical Background

It is arguable that 'evolution in materio' began in 1958 in the work of Gordon Pask, who worked on experiments to grow neural structures using electrochemical assemblages [16,17,18,19]. Gordon Pask's goal was to create a device sensitive to either sound or magnetic fields that could perform some form of signal processing – a kind of ear.
He realized he needed a system that was rich in structural possibilities, and chose to use a metal solution. Using electric currents, wires can be made to self-assemble in an acidic aqueous metal-salt solution (e.g., ferrous sulphate). Changing the electric currents can alter the structure of these wires and their positions – the behavior of the system can be modified through external influence. Pask used an array of electrodes suspended in a dish containing the metal-salt solution, and by applying currents (either transient or slowly changing) was able to build iron wires that responded differently to two different frequencies of sound – 50 Hz and 100 Hz.

Evolution in Materio, Figure 1 Pask's experimental set-up for growing dendritic wires in ferrous sulphate solution [17]

Pask had developed a system whereby he could manually train the wire formation in such a way that no complete specification had to be given – a complete paradigm shift from previous engineering techniques, which would have dictated the position and behavior of every component in the system. His training technique relied on making changes to a set of resistors and updating the values with given probabilities – in effect a test–randomly modify–test cycle. We would today recognize this algorithm as some form of evolutionary hill-climbing strategy, with the test stage as the fitness evaluation. In 1996 Adrian Thompson started what we might call the modern era of evolution in materio. He was investigating whether it was possible to build working electronic circuits using unconstrained evolution (effectively, generate-and-test) on a re-configurable electronic silicon chip called a Field Programmable Gate Array (FPGA). Carrying out evolution by defining configurations of actual hardware components is known as intrinsic evolution. This is quite possible using FPGAs, which are devices with a two-dimensional array of logic functions that a configuration bit string defines and connects together. Thompson had set himself the task of evolving a digital circuit that could discriminate between an applied 1 kHz or 10 kHz signal [20,21]. He found that computer controlled evolution of the configuration bit strings could relatively easily solve this problem. However, when he analyzed the successful circuits he found to his surprise that they worked by utilizing subtle electrical properties of the silicon. Despite painstaking analysis and simulation work he was unable to explain how, or what property was being utilized. This lack of knowledge of how the system works, of course, prevents humans from designing systems that are intended to exploit these subtle and complex physical characteristics. However, it does not prevent exploitation through artificial evolution.
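Thompson's discrimination task suggests a simple shape for a fitness function: reward configurations whose mean output differs as much as possible between the two applied frequencies. The sketch below uses a made-up software stand-in (`fake_output`) in place of measuring a real device; none of the names or numbers here come from the original experiments.

```python
# Toy fitness in the spirit of Thompson's tone-discrimination task:
# a good configuration drives the output high for one input frequency
# and low for the other, so we score the separation of mean responses.
def discrimination_fitness(output, config, f_low=1_000, f_high=10_000):
    low = output(config, f_low)
    high = output(config, f_high)
    return abs(sum(high) / len(high) - sum(low) / len(low))

# Placeholder "device": an invented deterministic response model whose
# behavior depends on the configuration bits (stand-in for hardware).
def fake_output(config, freq, samples=100):
    gain = sum(config) / len(config)
    level = gain if freq >= 5_000 else 1.0 - gain
    return [level] * samples

good = [1, 1, 1, 1]   # hypothetical configuration bit strings
poor = [1, 1, 0, 0]
print(discrimination_fitness(fake_output, good) >
      discrimination_fitness(fake_output, poor))   # -> True
```

In the real experiments the `output` function is replaced by applying the two signals to the configured chip (or material) and sampling its response; evolution then maximizes this separation.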
Since then a number of researchers have demonstrated the viability of intrinsic evolution in silicon devices [21,22,23,24,25,26,27]. The term evolution in materio was first coined by Miller and Downing [28]. They argued that the lesson that should be drawn from the work of [21] is that evolution may be used to exploit the properties of a wider range of materials than silicon. In summary, evolution in materio can be described as: exploitation, using an unconstrained evolutionary algorithm, of the non-linear properties of a malleable or programmable material to perform a desired function by altering its physical or electrical configuration. Evolution in materio is a subset of a research field known as evolvable hardware. It aims to exploit properties of physical systems with far fewer preconditions and constraints than is usual, and it deliberately tries to avoid paying Conrad's 'Price of Programmability'. However, to get access to physically rich systems, we may have to discard devices designed with human programming in mind. Such devices are often based on abstract idealizations of processes occurring in the physical world. For example, FPGAs are considered digital, but they are fundamentally analogue devices that have been constrained to behave in certain, human-understandable ways. This means that intrinsically complex physical processes are carefully manipulated to represent extremely simple effects (e.g., a rapid switch from one voltage level to another). Unconstrained evolution, as demonstrated by Thompson, allows the analogue properties of such devices to be effectively utilized. We would expect physically rich systems to exhibit non-linear properties – they will be complex systems – because physical systems generally have huge numbers of parts interacting in complex ways. Arguably, humans have difficulty working with complex systems, and the use of evolution enables us to potentially overcome these limitations when dealing with such systems.
When systems are abstracted, the relationship to the physical world becomes more distant. This is highly convenient for human designers who do not wish to understand, or work with, hidden or subtle properties of materials. Exploitation through evolution reduces the need for abstraction, as it appears evolution is capable of discovering and utilizing any physical effects it can find. The aim of this new methodology in computation is to evolve special-purpose computational processors. By directly exploiting physical systems and processes, one should be able to build extremely fast and efficient computational devices. It is our view that computer controlled evolution is a universal methodology for doing this. Of course, von Neumann machines (i.e., digital computers) are individually universal, and this is precisely what confers their great utility in modern technology; however, this universality comes at a price. They ignore the rich computational possibilities of materials and try to create operations that are close to a mathematical abstraction. Evolution in materio is a universal methodology for producing specific, highly tuned computational devices. It is important not to underestimate the real practical difficulties associated with using an unconstrained design process. Firstly, the evolved behavior of the material may be extremely sensitive to the specific properties of the material sample, so each piece would require individual training. Thompson originally experienced this difficulty; however, in later work he showed that it was possible to evolve the configuration of FPGAs so that they produced reliable behavior in a variety of environmental conditions [29]. Secondly, the evolutionary algorithm may utilize physical aspects of any part of the training set-up. Both of these difficulties have already been experienced [21,23]. A third problem can be thought of as "the wiring problem": the means to supply huge amounts of configuration data to a tiny sample. This problem is a very fundamental one. It suggests that if we wish to exploit the full physical richness of materials we might have to allow the material to grow its own wires and be self-wiring. This has profound implications for intrinsic evolution, as artificial hardware evolution requires complete reconfigurability; this implies that one would have to be able to "wipe clean" the evolved wiring and start again with a new artificial genotype. This might be possible by using nanoparticles that assemble into nanowires. These considerations bring us to an important issue in evolution in materio.
Namely, the problem of choosing suitable materials that can be exploited by computer controlled evolution.

Evolution in Materio: Defining Suitable Materials

The obvious characteristic required of a candidate material is the ability to reconfigure it in some way. Liquid crystal, clay, salt solutions, etc. can be readily configured either electrically or mechanically; their physical state can be adjusted, and readjusted, by applying a signal or force. In contrast (excluding its electrical properties), the physical properties of an FPGA would remain unchanged during configuration. It is also desirable to be able to bulk-configure the system. It would be infeasible to configure every molecule in the material, so the material should support the ability to be reconfigured over large areas using a small amount of configuration data.


The material needs to perform some form of transformation (or computation) on incident signals that we apply. To do this, the material will have to interfere with the incident signal and perform a modification to it. We will need to be able to observe this modification in order to extract the result of the computation. To perform a nontrivial computation, the material should be capable of performing complex operations upon the signal. Such capabilities would be maximized if the system exhibited nonlinear behavior when interacting with input signals. In summary, we can say that for a material to be useful for evolution in materio it should have the following properties:
- It modifies incident signals in observable ways.
- The components of the system (i.e., the molecules within the material) interact with each other locally such that non-linear effects occur at either the local or the global level.
- It is possible to configure the state of the material locally.
- It is possible to observe the state of the material – either as a whole or in one or more locations.
- For practical reasons, the material should be reconfigurable, and changes in state should be temporary or reversible.
Miller and Downing [28] identified a number of physical systems that have some, if not all, of these desirable properties. They identified liquid crystal as the most promising in this regard, as it is digitally writable, reconfigurable and works at a molecular level. Most interestingly, it is an example of mesoscopic organization. Some people have argued that it is within such systems that emergent, organized behavior can occur [30]. Liquid crystals also exhibit the phenomenon of self-assembly. They form a class of substances that are being designed and developed in a field of chemistry called Supramolecular Chemistry [31]. This is a new and exciting branch of chemistry that can be characterized as 'the designed chemistry of the intermolecular bond'.
Supramolecular chemicals are in a permanent process of being assembled and disassembled. It is interesting to consider that conceptually liquid crystals appear to sit on the 'edge of chaos' [32], in that they are fluids (chaotic) that can be ordered under certain circumstances.

Liquid Crystal

Liquid crystal (LC) is commonly defined as a substance that can exist in a mesomorphic state [33,34]. Mesomorphic states have a degree of molecular order that lies between that of a solid crystal (long-range positional and orientational) and a liquid, gas or amorphous solid (no long-range order). In LC there is long-range orientational order but no long-range positional order. LC tends to be transparent in the visible and near infrared and quite absorptive in UV. There are three distinct types of LC: lyotropic, polymeric and thermotropic. Lyotropic LC is obtained when an appropriate amount of material is dissolved in a solvent. Most commonly this is formed by water and amphiphilic molecules: molecules with a hydrophobic part (water insoluble) and a hydrophilic part (strongly interacting with water). Polymeric LC is basically a polymer version of the aromatic LC discussed; it is characterized by high viscosity and includes vinyls and Kevlar. Thermotropic LC (TLC) is the most common form and is widely used. TLC exhibits various liquid crystalline phases as a function of temperature. The molecules can be depicted as rod-like and interact with each other in distinctive ordered structures. TLC exists in three main forms: nematic, cholesteric and smectic. In nematic LC the molecules are positionally arranged randomly but they all share a common alignment axis. Cholesteric LC (or chiral nematic) is like nematic LC but with a chiral orientation. In smectic LC there is typically a layered, positionally disordered structure, with three types, A, B and C. In type A the molecules are oriented in alignment with the natural physical axes (i.e., normal to the glass container), whereas in type C the common molecular axis of orientation is at an angle to the container. LC molecules are typically dipolar; thus the organization of the molecular dipoles gives another order of symmetry to the LC. Normally the dipoles would be randomly oriented. However, in some forms the natural molecular dipoles are aligned with one another. This gives rise to ferroelectric and ferrielectric forms. There is a vast range of different types of liquid crystal, and LC of different types can be mixed.
LC can be doped (as in Dye-Doped LC) to alter its light absorption characteristics. Dye-Doped LC film has been made that is optically addressable and can undergo very large changes in refractive index [35]. There are Polymer-Dispersed Liquid Crystals, which can have tailored, electrically controlled light-refractive properties. Another interesting form of LC being actively investigated is Discotic LC. These have the form of disordered stacks (one-dimensional fluids) of disc-shaped molecules on a two-dimensional lattice. Although discotic LC is an electrical insulator, it can be made to conduct by doping with oxidants [36]. The oxidants are incorporated into the fluid hydrocarbon chain matrix (between disks). LC is widely known as useful in electronic displays; however, there are in fact many non-display applications too. There are many applications of LC (especially ferroelectric LC) to electrically controlled light modulation: phase modulation, optical correlation, optical interconnects and switches, wavelength filters, and optical neural networks. In the latter case a ferroelectric LC is used to encode the weights in a neural network [37].

Conducting and Electroactive Polymers

Conducting polymer composites have been made that rapidly change their microwave reflection coefficient when an electric field is applied. When the field is removed, the composite reverts to its original state. Experiments have shown that the composite can change from one state to the other in the order of 100 ms [38]. Also, some polymers exhibit electrochromism: these substances change their reflectance when a voltage is applied, which can be reversed by a change in voltage polarity [39]. Electroactive polymers [40] are polymers that change their volume with the application of an electric field. They are particularly interesting as voltage-controlled artificial muscle. Organic semiconductors also look promising, especially when some damage is introduced. Further details of the electronic properties of polymers and organic crystals can be found in [41].

Voltage Controlled Colloids

Colloids are suspensions of particles of sub-micron sizes in a liquid. The phase behavior of colloids is not fully understood. Simple colloids can self-assemble into crystals, while multi-component suspensions can exhibit a rich variety of crystalline structures. There are also electrorheological fluids: suspensions of extremely fine non-conducting particles in an electrically insulating fluid. The viscosity of these fluids can be changed in a reversible way, by large factors, in response to an applied electric field, in times of the order of milliseconds [42]. Colloids can also be made in which the particles are charged, making them easily manipulable by suitable applied electric fields.
Even if the particles are not charged, they may be moved through the action of applied fields using a phenomenon known as dielectrophoresis: the motion of polarized but electrically uncharged particles in non-uniform electric fields [43]. In work that echoes the methods of Pask nearly four decades ago, dielectrophoresis has been used to grow tiny gold wires through a process of self-assembly [44].

Langmuir–Blodgett Films

Langmuir–Blodgett films are molecular monolayers of organic material that can be transferred to a solid substrate [45]. They usually consist of hydrophilic heads

Evolution in Materio, Figure 2 Kirchhoff–Lukasiewicz Machine

and hydrophobic tails attached to the substrate. Multiple monolayers can be built, and films can be built with very accurate and regular thicknesses. By arranging an electrode layer above the film it seems feasible that the local electronic properties of the layers could be altered. These look like feasible systems whose properties might be exploitable through computer controlled evolution of the voltages.

Kirchhoff–Lukasiewicz Machines

Work by Mills [46,47] also demonstrates the use of materials in computation. He has designed an 'Extended Analog Computer' (EAC) that is a physical implementation of a Kirchhoff–Lukasiewicz Machine (KLM) [46]. The machines are composed of logical function units connected to a conductive medium, typically a conductive polymer sheet. The logical units implement Lukasiewicz logic – a type of multi-valued logic [47]. Figure 2 shows how the Lukasiewicz Logic Arrays (LLA) are connected to the conductive polymer. The LLA bridge areas of the sheet together. The logic units measure the current at one point, perform a transformation and then apply a current source to the other end of the bridge. Computation is performed by applying current sinks and sources to the conductive polymer and reading the output from the LLAs. Different computations can be performed, determined by the location of the applied signals in the conducting sheet and the configuration of the LLAs. Hence, computation is performed by an interaction of the physics described by Kirchhoff's laws and the Lukasiewicz logic units. Together they form a physical device that can solve certain kinds of partial differential equations. Using this form of analogue computation, a large
number of these equations can be solved in nanoseconds – much faster than on a conventional computer. The speed of computation depends on the materials used and how they are interfaced to digital computers, but it is expected that silicon implementations will be capable of finding tens of millions of solutions to the equations per second. Examples of computation so far implemented in this system include robot control, control of a cyclotron beam [48], models of biological systems (including neural networks) [49] and radiosity-based image rendering. One of the most interesting features of these devices is the programming method. It is very difficult to understand the actual processes used by the system to perform computation, and until recently most of the reconfiguration has been done manually. This is difficult, as the system is not amenable to traditional software development approaches. However, evolutionary algorithms can be used to automatically define the parameters of the LLAs and the placement of current sinks and sources. By defining a suitable fitness function, the configuration of the EAC can be evolved – which removes the need for human interaction and for knowledge of the underlying system. Although such KLMs are clearly using the physical properties of a material to perform computation, the physical state of the material is not reconfigured (i.e., programmed); only the currents in the sheet are changed.

Evolution in Materio Is Verified with Liquid Crystal

Harding [50] has verified Miller's intuition about the suitability of liquid crystal as an evolvable material by demonstrating that it is relatively easy to configure liquid crystal to perform various forms of computation. In 2004, Harding constructed an analogue processor that utilizes the physical properties of liquid crystal for computation. He evolved the configuration of the liquid crystal to discriminate between two square waves of many different frequencies.
This demonstrated, for the first time, that the principle of using computer-controlled evolution was a viable and powerful technique for using non-silicon materials for computation. The analogue processor consists of a passive liquid crystal display mounted on a reconfigurable circuit, known as an evolvable motherboard. The motherboard allows signals and configuration voltages to be routed to physical locations in the liquid crystal. Harding has shown that many different devices can be evolved in liquid crystal, including:

- Tone discriminator. A device was evolved in liquid crystal that could differentiate many different frequencies of square wave. The results were competitive with, if not superior to, those evolved in the FPGA.
- Logic gates. A variety of two-input logic gates were evolved, showing that liquid crystal could behave in a digital fashion. This indicates that liquid crystal is capable of universal computation.
- Robot controller. An obstacle avoidance system for a simple exploratory robot was evolved. The results were highly competitive, with solutions taking fewer evaluations to find compared to other work on evolved robot controllers.

One of the surprising findings in this work has been that it turns out to be relatively easy to evolve the configuration of liquid crystal to solve tasks; i.e., only 40 generations of a modest population of configurations are required to evolve a very good frequency discriminator, compared to the thousands of generations required to evolve a similar circuit on an FPGA. This work has shown that evolving such devices in liquid crystal is easier than when using conventional components, such as FPGAs. The work is a clear demonstration that evolutionary design can produce solutions that are beyond the scope of human design.

Evolution in Materio Using Liquid Crystal: Implementational Details

An evolvable motherboard (EM) [23] is a circuit that can be used to investigate intrinsic evolution. The EM is a reconfigurable circuit that rewires a circuit under computer control. Previous EMs have been used to evolve circuits containing electronic components [23,51]; however, they can also be used to evolve in materio by replacing the standard components with a candidate material. An EM is connected to an Evolvatron. This is essentially a PC that is used to control the evolutionary process. The Evolvatron also has digital and analog I/O, and can be used to provide test signals and record the response of the material under evolution.
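A fitness function for the two-frequency discrimination task might, for example, reward separation between the device's mean responses to the two input signals. The following is an illustrative sketch only; the published experiments used their own fitness formulation and measurement procedure, and the sample voltages below are invented.

```python
def discriminator_fitness(response_low, response_high):
    """Score a candidate configuration on the frequency discrimination task.

    `response_low` / `response_high` are output voltages sampled while a
    low- or high-frequency square wave is applied. A good discriminator
    drives the mean output in opposite directions for the two inputs, so
    fitness here is simply the separation of the two mean responses.
    (Hypothetical formulation, not the one used in the published work.)
    """
    mean_low = sum(response_low) / len(response_low)
    mean_high = sum(response_high) / len(response_high)
    return abs(mean_high - mean_low)

# A configuration averaging 0.2 V for the low input and 3.1 V for the high
# input separates the classes far better than one giving 1.5 V vs 1.6 V.
good = discriminator_fitness([0.2, 0.1, 0.3], [3.0, 3.2, 3.1])
poor = discriminator_fitness([1.5, 1.5], [1.6, 1.6])
```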
The Liquid Crystal Evolvable Motherboard (LCEM) is a circuit that uses four cross-switch matrix devices to dynamically configure the circuits connecting to the liquid crystal. The switches are used to wire the 64 connections on the LCD to one of 8 external connections. The external connections are: input voltages, grounding, signals and connections to measurement devices. Each of the external connectors can be wired to any of the connections to the LCD. The external connections of the LCEM are connected to the Evolvatron's analogue inputs and outputs. One connection was assigned for the incident signal, one for measurement and the others for fixed voltages. The value of the


Evolution in Materio, Figure 3 Equipment configuration

Evolution in Materio, Figure 4 The LCEM

fixed voltages is determined by the evolutionary algorithm, but is constant throughout each evaluation. In these experiments the liquid crystal glass sandwich was removed from the display controller on which it was originally mounted, and placed on the LCEM. The display has a large number of connections (in excess of 200); however, because of PCB manufacturing constraints we are limited in the size of the connections we can make, and hence in their number. The LCD is therefore roughly positioned over the pads on the PCB, with many of the PCB pads touching more than one of the connectors on the LCD. This means that we are applying configuration voltages to several areas of LC at the same time. Unfortunately, neither the internal structure nor the electrical characteristics of the LCD are known. This raises

Evolution in Materio, Figure 5 Schematic of LCEM

the possibility that a configuration may be applied that would damage the device. The wires inside the LCD are made of an extremely thin material that could easily be burnt out if too much current flows through them. To guard against this, each connection to the LCD is made through a 4.7 kΩ resistor in order to provide protection against short circuits and to help limit the current in the LCD. The current supplied to the LCD is limited to 100 mA. The software controlling the evolution is also responsible for avoiding configurations that may endanger the device (such as short circuits). It is important to note that, other than the control circuitry for the switch arrays, there are no other active components on the motherboard – only analog switches, smoothing capacitors, resistors and the LCD are present.
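The series resistors make the protection easy to check with Ohm's law. Assuming a worst-case configuration voltage of 5 V (the actual voltage range is not stated here, so this figure is an assumption), the current through any single connection stays far below the 100 mA supply limit:

```python
# Worst-case current through one LCD connection via its 4.7 kΩ series
# resistor. The 5 V figure is assumed for illustration; the 4.7 kΩ
# resistor and 100 mA limit come from the text.
V_MAX = 5.0        # volts (assumed worst case)
R_SERIES = 4700.0  # ohms
I_LIMIT = 0.100    # amps (stated 100 mA supply limit)

i_per_pin = V_MAX / R_SERIES  # ≈ 1.06 mA per connection
```

Even if many connections conducted at once, the per-pin current is roughly two orders of magnitude below the supply limit.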


Stability and Repeatability Issues

When the liquid crystal display is observed while solving a problem, it is seen that some regions of the display go dark, indicating that the local molecular direction has been changed. This means that the configuration of the liquid crystal is changing while signals are being applied. To draw an analogy with circuit design, the incident signals would be changing component values or changing the circuit topology, which would have an effect on the behavior of the system. This is likely to be detrimental to the measured performance of the circuit. When a solution is evolved, the fitness function automatically measures its stability over the period of the evaluation. Changes made by the incident signals can be considered part of the genotype-phenotype mapping. Solutions that cannot cope with their initial configurations being altered will achieve a low score. However, the fitness function cannot measure the behavior beyond the end of the evaluation time. Therein lies the difficulty: in evolution in materio, long-term stability cannot be guaranteed. Another issue concerns repeatability. When a configuration is applied to the liquid crystal, the molecules are unlikely to go back to exactly where they were when this configuration was tried previously. Assuming that there is a strong correlation between genotype and phenotype, it is likely that evolution will cope with this extra noise. However, if evolved devices are to be useful, one needs to be sure that previously evolved devices will function in the same way as they did when originally evolved. In [27] it is noted that the behavior of circuits evolved intrinsically can be influenced by previous configurations; therefore their behavior (and hence fitness) depends not only on the configuration of the currently evaluated individual but also on those that came before. It is worth noting that this is precisely what happens in natural evolution.
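One way to make a fitness function "measure stability over the period of the evaluation," as described above, is to score the error at every sample in the evaluation window rather than only at the end, so a configuration that drifts pays for its drift. A minimal sketch (not the formulation used in the liquid-crystal experiments):

```python
def stable_fitness(samples, target):
    """Penalize instability by averaging error over the whole evaluation.

    `samples`: output values recorded across the evaluation window.
    `target`: desired constant output. A configuration that starts correct
    but drifts scores worse than one that holds the target throughout.
    (Illustrative sketch; the published work used its own fitness measure.)
    """
    errors = [abs(s - target) for s in samples]
    return -sum(errors) / len(errors)  # higher (closer to 0) is better

steady = stable_fitness([1.0, 1.0, 1.0, 1.0], target=1.0)  # holds target
drifts = stable_fitness([1.0, 0.9, 0.6, 0.2], target=1.0)  # decays over time
```

Of course, as the text notes, no such measure can say anything about behavior beyond the evaluation window.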
For example, in a circuit, capacitors may still hold charge from a previously tested circuit. This charge would then affect the circuit's operation; however, if the circuit were tested again with no stored charge, a different behavior and a different fitness score would be expected. Not only does this affect the ability to evolve circuits, it also means that some circuits are not valid: without the influence of the previously evaluated circuits, the current solution may not function as expected. It is expected that such problems will have analogies in evolution in materio. The configurations are likely to be highly sensitive to initial conditions (i.e., conditions introduced by previous configurations).

Dealing with Environmental Issues

A major problem when working with intrinsic evolution is

separating out the computation allegedly being carried out by the target device from that actually done by the material being used. For example, whilst trying to evolve an oscillator, Bird and Layzell discovered that evolution was using part of the circuit as a radio antenna, picking up emissions from the environment [22]. Layzell also found that evolved circuits were sensitive to whether or not a soldering iron was plugged in (not even switched on) in another part of the room [23]! An evolved device is not useful if it is highly sensitive to its environment in unpredictable ways, and it will not always be clear what environmental effects the system is using. It would be unfortunate to evolve a device for use in a spacecraft, only to find that it fails to work once out of range of a local radio tower! To minimize these risks, we will need to check the operation of evolved systems under different conditions. We will need to test the behavior of a device using a different setup in a different location. It will be important to know whether a particular configuration only works with one particular sample of a given material.

The Computational Power of Materials

In [52], Lloyd argued that the theoretical computing power of a kilogram of material is far more than is possible with a kilogram of traditional computer. He notes that computers are subject to the laws of physics, and that these laws place limits on the maximum speed at which they can operate and the amount of information they can process. Lloyd shows that if we were able to fully exploit a material, we would get an enormous increase in computing power. For example, with 1 kg of matter we should be able to perform roughly 5 × 10^50 operations per second, and store 10^31 bits. Amazingly, contemporary quantum computers do operate near these theoretical limits [52]. A small amount of material also contains a large number of components (regardless of whether we consider the molecular or atomic scale).
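Lloyd's operations-per-second figure follows from the Margolus–Levitin bound: a system of energy E can perform at most roughly 2E/(πħ) operations per second, with E = mc² for 1 kg of matter (the 10^31-bit memory bound comes from the matter's entropy). The first figure is easy to reproduce:

```python
import math

# Reproduce Lloyd's bound on operations per second for 1 kg of matter:
# max rate ≈ 2E / (pi * hbar), with E = m * c^2, as applied in Lloyd (2000).
c = 2.998e8        # speed of light, m/s
hbar = 1.055e-34   # reduced Planck constant, J s
m = 1.0            # mass, kg

energy = m * c ** 2                        # ≈ 9.0e16 J
ops_per_sec = 2 * energy / (math.pi * hbar)
# ops_per_sec ≈ 5.4e50, matching the "roughly 5 × 10^50" quoted above
```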
This leads to some interesting thoughts. If we can exploit materials at this level, we would be able to do a vast amount of computation in a small volume. A small size also hints at low power consumption, as less energy has to be spent to perform an operation. Many components also provide a mechanism for reliability through redundancy. A particularly interesting observation, especially when considered in terms of non-von Neumann computation, is the massive parallelism we may be able to achieve. The reason that systems such as quantum, DNA and chemical computation can operate so quickly is that many operations are performed at the same time. A programmable material might be capable of performing vast numbers of tasks simultaneously, and therefore provide a computational advantage. In commercial terms, small is often synonymous with low cost. It may be possible to construct devices using cheaply available materials. Reliability may not be an issue, as the systems could be evolved to be massively fault tolerant using their intrinsic redundancy. Evolution is capable of producing novel designs. Koza has already rediscovered circuits that infringe on recent patents, and his genetic programming method has 'invented' brand new circuit designs [53]. Evolving in materio could produce many novel designs, and indeed, given the infancy of programmable materials, all designs may be unique and hence patentable.

Future Directions

The work described here concerning liquid crystal computational devices is at an early stage. We have merely demonstrated that it is possible to evolve configurations of voltages that allow a material to perform desired computations. Any application that ensues from this work is unlikely to be a replacement for a simple electronic circuit. We can design and build those very successfully. What we have difficulty with is building complex, fault-tolerant systems for performing complex computation. Nature appears to have managed to do this: it used a simple process of repetitive test and modify, and it did this in a universe of unimaginable physical complexity. If nature can exploit the physical properties of a material and its surroundings through evolution, then so should we. There are many important issues that remain to be addressed. Although we have made some suggestions about materials worthy of investigation, it is at present unclear which materials are most suitable. An experimental platform needs to be constructed that allows many materials to be tried and investigated. The use of microelectrode arrays in a small-volume container would allow this.
This would also have the virtue of allowing internal signals in the materials to be inspected and potentially understood. We need materials that are rapidly configurable. They must not be fragile or sensitive to minute changes in physical setup. They must be capable of maintaining themselves in a stable configuration. The materials should be complex and allow us to carry out difficult computations more easily than by conventional means. One would like materials that can be packaged into small volumes and that are relatively easy to interface with. So far, material systems have been configured by applying a constant configuration pattern; however, this may not be appropriate for all systems. It may be necessary to put the physical

system under some form of responsive control, in order to program it and then keep its behavior stable. We may or may not know whether a particular material can be used to perform some form of computation. However, we can treat our material as a "black box" and, using evolution as a search technique, automatically discover what computations, if any, our black box can perform. The first step is to build an interface that will allow us to communicate with a material. Then we will use evolution to find a configuration we can apply using this platform, and then attempt to find a mapping from a given problem to an input suitable for that material, and a mapping from the material's response to an output. If this is done correctly, we might be able to tell automatically whether a material can perform computation, and then classify the computation. When we evolve in materio, using mappings evolved in software, how can we tell when the material is giving us any real benefit? The lesson of evolution in materio has been that the evolved systems can be very difficult to analyze, and the principal obstacle to the analysis is the problem of separating out the computational role that each component plays in the evolved system. These issues are by no means just a problem for evolution in materio. They may be an inherent part of complex evolved systems. Certainly, the understanding of biological systems is providing immense challenges to scientists. The single most important aspect that suggests that evolution in materio has a future is that natural evolution has produced immensely sophisticated material computational systems. It would seem foolish to ignore this and merely try to construct computational devices that operate according to one paradigm of computation (i.e., Turing). Oddly enough, it is precisely the sophistication of the latter that allows us to attempt the former.

Bibliography

Primary Literature

1. Turing AM (1936) On computable numbers, with an application to the Entscheidungsproblem. Proc Lond Math Soc 42(2):230–265
2. Bissell C (2004) A great disappearing act: the electronic analogue computer. In: IEEE Conference on the History of Electronics, 28–30 June
3. Deutsch D (1985) Quantum theory, the Church–Turing principle and the universal quantum computer. Proc Royal Soc Lond A 400:97–117
4. Adamatzky A, Costello BDL, Asai T (2005) Reaction-Diffusion Computers. Elsevier, Amsterdam
5. Adleman LM (1994) Molecular computation of solutions to combinatorial problems. Science 266(11):1021–1024
6. Amos M (2005) Theoretical and Experimental DNA Computation. Springer, Berlin


7. Weiss R, Basu S, Hooshangi S, Kalmbach A, Karig D, Mehreja R, Netravali I (2003) Genetic circuit building blocks for cellular computation, communications, and signal processing. Nat Comput 2(1):47–84
8. UK Computing Research Committee (2005) Grand challenges in computer research. http://www.ukcrc.org.uk/grand_challenges/
9. Stepney S, Braunstein SL, Clark JA, Tyrrell A, Adamatzky A, Smith RE, Addis T, Johnson C, Timmis J, Welch P, Milner R, Partridge D (2005) Journeys in non-classical computation I: A grand challenge for computing research. Int J Parallel Emerg Distrib Syst 20(1):5–19
10. Stepney S, Braunstein S, Clark J, Tyrrell A, Adamatzky A, Smith R, Addis T, Johnson C, Timmis J, Welch P, Milner R, Partridge D (2006) Journeys in non-classical computation II: Initial journeys and waypoints. Int J Parallel Emerg Distrib Syst 21(2):97–125
11. Toffoli T (2005) Nothing makes sense in computing except in the light of evolution. Int J Unconv Comput 1(1):3–29
12. Conrad M (1988) The price of programmability. In: The Universal Turing Machine, pp 285–307
13. Goldberg D (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading
14. Holland J (1992) Adaptation in Natural and Artificial Systems, 2nd edn. MIT Press, Cambridge
15. Mitchell M (1996) An Introduction to Genetic Algorithms. MIT Press, Cambridge
16. Pask G (1958) Physical analogues to the growth of a concept. In: Mechanization of Thought Processes. Symposium 10, National Physical Laboratory, pp 765–794
17. Pask G (1959) The natural history of networks. In: Proceedings of International Tracts in Computer Science and Technology and their Application, vol 2, pp 232–263
18. Cariani P (1993) To evolve an ear: epistemological implications of Gordon Pask's electrochemical devices. Syst Res 3:19–33
19. Pickering A (2002) Cybernetics and the mangle: Ashby, Beer and Pask. Soc Stud Sci 32:413–437
20. Thompson A, Harvey I, Husbands P (1996) Unconstrained evolution and hard consequences. In: Sanchez E, Tomassini M (eds) Towards Evolvable Hardware: The evolutionary engineering approach. LNCS, vol 1062. Springer, Berlin, pp 136–165
21. Thompson A (1996) An evolved circuit, intrinsic in silicon, entwined with physics. In: ICES, pp 390–405
22. Bird J, Layzell P (2002) The evolved radio and its implications for modelling the evolution of novel sensors. In: Proceedings of Congress on Evolutionary Computation, pp 1836–1841
23. Layzell P (1998) A new research tool for intrinsic hardware evolution. In: Proceedings of the Second International Conference on Evolvable Systems: From Biology to Hardware. LNCS, vol 1478. Springer, Berlin, pp 47–56
24. Linden DS, Altshuler EE (2001) A system for evolving antennas in-situ. In: 3rd NASA/DoD Workshop on Evolvable Hardware. IEEE Computer Society, pp 249–255
25. Linden DS, Altshuler EE (1999) Evolving wire antennas using genetic algorithms: A review. In: 1st NASA/DoD Workshop on Evolvable Hardware. IEEE Computer Society, pp 225–232
26. Stoica A, Zebulum RS, Guo X, Keymeulen D, Ferguson MI, Duong V (2003) Silicon validation of evolution-designed circuits. In: Proceedings, NASA/DoD Conference on Evolvable Hardware, pp 21–25

27. Stoica A, Zebulum RS, Keymeulen D (2000) Mixtrinsic evolution. In: Proceedings of the Third International Conference on Evolvable Systems: From Biology to Hardware (ICES2000). LNCS, vol 1801. Springer, Berlin, pp 208–217
28. Miller JF, Downing K (2002) Evolution in materio: Looking beyond the silicon box. In: Proceedings of NASA/DoD Evolvable Hardware Workshop, pp 167–176
29. Thompson A (1998) On the automatic design of robust electronics through artificial evolution. In: Sipper M, Mange D, Pérez-Uribe A (eds) Evolvable Systems: From Biology to Hardware, vol 1478. Springer, New York, pp 13–24
30. Laughlin RB, Pines D, Schmalian J, Stojkovic BP, Wolynes P (2000) The middle way. Proc Natl Acad Sci 97(1):32–37
31. Lindoy LF, Atkinson IM (2000) Self-assembly in Supramolecular Systems. Royal Society of Chemistry
32. Langton C (1991) Computation at the edge of chaos: Phase transitions and emergent computation. In: Emergent Computation. MIT Press, pp 12–37
33. Demus D, Goodby JW, Gray GW, Spiess HW, Vill V (eds) (1998) Handbook of Liquid Crystals, vol 4. Wiley-VCH, p 2180
34. Khoo IC (1995) Liquid Crystals: physical properties and nonlinear optical phenomena. Wiley
35. Khoo IC, Slussarenko S, Guenther BD, Shih MY, Chen P, Wood WV (1998) Optically induced space-charge fields, dc voltage, and extraordinarily large nonlinearity in dye-doped nematic liquid crystals. Opt Lett 23(4):253–255
36. Chandrasekhar S (1998) Columnar, discotic nematic and lamellar liquid crystals: Their structure and physical properties. In: Handbook of Liquid Crystals, vol 2B. Wiley-VCH, pp 749–780
37. Crossland WA, Wilkinson TD (1998) Nondisplay applications of liquid crystals. In: Handbook of Liquid Crystals, vol 1. Wiley-VCH, pp 763–822
38. Wright PV, Chambers B, Barnes A, Lees K, Despotakis A (2000) Progress in smart microwave materials and structures. Smart Mater Struct 9:272–279
39. Mortimer RJ (1997) Electrochromic materials. Chem Soc Rev 26:147–156
40. Bar-Cohen Y (2001) Electroactive Polymer (EAP) Actuators as Artificial Muscles – Reality, Potential and Challenges. SPIE Press
41. Pope M, Swenberg CE (1999) Electronic Processes of Organic Crystals and Polymers. Oxford University Press, Oxford
42. Hao T (2005) Electrorheological Fluids: The Non-aqueous Suspensions. Elsevier Science
43. Khusid B, Acrivos A (1996) Effects of interparticle electric interactions on dielectrophoresis in colloidal suspensions. Phys Rev E 54(5):5428–5435
44. Hermanson KD, Lumsdon SO, Williams JP, Kaler EW, Velev OD (2001) Science 294:1082–1086
45. Petty MC (1996) Langmuir–Blodgett Films: An Introduction. Cambridge University Press, Cambridge
46. Mills JW (1995) Polymer processors. Technical Report TR580, Department of Computer Science, University of Indiana
47. Mills JW, Beavers MG, Daffinger CA (1989) Lukasiewicz logic arrays. Technical Report TR296, Department of Computer Science, University of Indiana
48. Mills JW (1995) Programmable VLSI extended analog computer for cyclotron beam control. Technical Report TR441, Department of Computer Science, University of Indiana


49. Mills JW (1995) The continuous retina: Image processing with a single sensor artificial neural field network. Technical Report TR443, Department of Computer Science, University of Indiana
50. Harding S, Miller JF (2004) Evolution in materio: A tone discriminator in liquid crystal. In: Proceedings of the Congress on Evolutionary Computation 2004 (CEC 2004), vol 2, pp 1800–1807
51. Crooks J (2002) Evolvable analogue hardware. MEng project report, The University of York
52. Lloyd S (2000) Ultimate physical limits to computation. Nature 406:1047–1054
53. Koza JR (1999) Human-competitive machine intelligence by means of genetic algorithms. In: Booker L, Forrest S, Mitchell M, Riolo R (eds) Festschrift in honor of John H Holland. Center for the Study of Complex Systems, Ann Arbor, pp 15–22

Books and Reviews

Analog Computer Museum and History Center: Analog Computer Reading List. http://dcoward.best.vwh.net/analog/readlist.htm
Bringsjord S (2001) In Computation, Parallel is Nothing, Physical Everything. Minds and Machines 11(1)

Feynman RP (2000) Feynman Lectures on Computation. Perseus Books Group
Fifer S (1961) Analogue computation: theory, techniques, and applications. McGraw-Hill, New York
Greenwood GW, Tyrrell AM (2006) Introduction to Evolvable Hardware: A Practical Guide for Designing Self-Adaptive Systems. Wiley-IEEE Press
Hey AJG (ed) (2002) Feynman and Computation. Westview Press
Penrose R (1989) The Emperor's New Mind: Concerning Computers, Minds, and the Laws of Physics. Oxford University Press, Oxford
Piccinini G: The Physical Church–Turing Thesis: Modest or Bold? http://www.umsl.edu/~piccininig/CTModestorBold5.htm
Raichman N, Ben-Jacob E, Segev R (2003) Evolvable Hardware: Genetic Search in a Physical Realm. Phys A 326:265–285
Sekanina L (2004) Evolvable Components: From Theory to Hardware Implementations, 1st edn. Springer, Heidelberg
Siegelmann HT (1999) Neural Networks and Analog Computation: Beyond the Turing Limits. Birkhäuser, Boston
Sienko T, Adamatzky A, Rambidi N, Conrad M (2003) Molecular Computing. MIT Press
Thompson A (1999) Hardware Evolution: Automatic Design of Electronic Circuits in Reconfigurable Hardware by Artificial Evolution, 1st edn. Springer, Heidelberg


Evolving Cellular Automata

MARTIN CENEK, MELANIE MITCHELL
Computer Science Department, Portland State University, Portland, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Cellular Automata
Computation in CAs
Evolving Cellular Automata with Genetic Algorithms
Previous Work on Evolving CAs
Coevolution
Other Applications
Future Directions
Acknowledgments
Bibliography

Glossary

Cellular automaton (CA) Discrete-space and discrete-time spatially extended lattice of cells connected in a regular pattern. Each cell stores its state and a state-transition function. At each time step, each cell applies the transition function to update its state based on its local neighborhood of cell states. The update of the system is performed in synchronous steps, i.e., all cells update simultaneously.

Cellular programming A variation of genetic algorithms designed to simultaneously evolve state transition rules and local neighborhood connection topologies for non-homogeneous cellular automata.

Coevolution An extension to the genetic algorithm in which candidate solutions and their "environment" (typically test cases) are evolved simultaneously.

Density classification A computational task for binary CAs: the desired behavior for the CA is to iterate to an all-1s configuration if the initial configuration has a majority of cells in state 1, and to an all-0s configuration otherwise.

Genetic algorithm (GA) A stochastic search method inspired by the Darwinian model of evolution. A population of candidate solutions is evolved by reproduction with variation, followed by selection, for a number of generations.

Genetic programming A variation of genetic algorithms that evolves genetic trees.

Genetic tree Tree-like representation of a transition function, used by the genetic programming algorithm.

Lookup table (LUT) Fixed-length table representation of a transition function.

Neighborhood Pattern of connectivity specifying to which other cells each cell is connected.

Non-homogeneous cellular automaton A CA in which each cell can have its own distinct transition function and local neighborhood connection pattern.

Ordering A computational task for one-dimensional binary CAs with fixed boundaries: the desired behavior is for the CA to iterate to a final configuration in which all initial 0 states migrate to the left-hand side of the lattice and all initial 1 states migrate to the right-hand side of the lattice.

Particle Periodic, temporally coherent boundary between two regular domains in a set of successive CA configurations. Particles can be interpreted as carrying information about the neighboring domains. Collisions between particles can be interpreted as the processing of information, with the resulting information carried by new particles formed by the collision.

Regular domain Region defined by a set of successive CA configurations that can be described by a simple regular language.

Synchronization A computational task for binary CAs: the desired behavior for the CA is to iterate to a temporal oscillation between two configurations: all cells in state 1 and all cells in state 0.

Transition function Maps a local neighborhood of cell states to an update state for the center cell of that neighborhood.

Definition of the Subject

Evolving cellular automata refers to the application of evolutionary computation methods to evolve cellular automata transition rules. This has been used as one approach to automatically "programming" cellular automata to perform desired computations, and as an approach to model the evolution of collective behavior in complex systems.
Introduction

In recent years, the theory and application of cellular automata (CAs) has experienced a renaissance, due to advances in the related fields of reconfigurable hardware, sensor networks, and molecular-scale computing systems. In particular, architectures similar to CAs can be used to construct physical devices such as field configurable


gate arrays for electronics, networks of robots for environmental sensing, and nano-devices embedded in interconnect fabric used for fault-tolerant nanoscale computing. Such devices consist of networks of simple components that communicate locally without centralized control. Two major areas of research on such networks are (1) programming: how to construct and configure the locally connected components such that they will collectively perform a desired task; and (2) computation theory: what types of tasks are such networks able to perform efficiently, and how does the configuration of components affect the computational capability of these networks? This article describes research into one particular automatic programming method: the use of genetic algorithms (GAs) to evolve cellular automata to perform desired tasks. We survey some of the leading approaches to evolving CAs with GAs, and discuss some of the open problems in this area.

Cellular Automata

A cellular automaton (CA) is a spatially extended lattice of locally connected simple processors (cells). CAs can be used both to model physical systems and to perform parallel distributed computations. In a CA, each cell maintains a discrete state and a transition function that maps the cell's current state to its next state. This function is often represented as a lookup table (LUT). The LUT stores all possible configurations of a cell's local neighborhood, which consists of its own current state and the states of its neighboring cells. Change of state is performed in discrete time steps: the entire lattice is updated synchronously. There are many possible definitions of a neighborhood, but here we will define a neighborhood as the cell to be updated and the cells adjacent to it at a distance of radius r.
The number of entries in the LUT will be s^N, where s is the number of possible states and N is the total number of cells in the neighborhood: (2r + 1)^d for a square-shaped neighborhood in a d-dimensional lattice, also known as a Moore neighborhood. CAs are typically given periodic boundary conditions, which treat the lattice as a torus. To transform a cell's state, the values of the cell's state and those of its neighbors are encoded as a lookup index into the LUT, which stores a value representing the cell's new state (Fig. 1: left) [8,16,59]. For the scope of this article, we will focus on homogeneous binary CAs, which means that all cells in the CA have the same LUT and each cell has one of two possible states, s ∈ {0, 1}. Figure 1 shows the mechanism of updates in a homogeneous one-dimensional two-state CA with a neighborhood radius r = 1.
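These formulas give concrete numbers. For binary (s = 2) one-dimensional CAs, a radius-1 neighborhood yields a LUT of 2^3 = 8 entries (the 256 possible rules are Wolfram's "elementary" CAs), while radius-3 rules need 2^7 = 128 entries, so a search over such rules ranges over a space of 2^128 possibilities:

```python
def lut_entries(s, r, d):
    """Number of LUT entries: s^N, with N = (2r + 1)^d neighborhood cells."""
    n = (2 * r + 1) ** d
    return s ** n

# Binary 1-D CAs: radius 1 gives 2^3 = 8 LUT entries; radius 3 gives
# 2^7 = 128 entries, hence a rule space of size 2^128.
assert lut_entries(2, 1, 1) == 8
assert lut_entries(2, 3, 1) == 128
# Binary 2-D Moore neighborhood (r = 1, d = 2): 2^9 = 512 entries.
assert lut_entries(2, 1, 2) == 512
```

The exponential growth of the rule space with radius is what makes automatic search methods such as GAs attractive.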

Evolving Cellular Automata, Figure 1 Left top: A one-dimensional neighborhood of three cells (radius 1): center cell, west neighbor, and east neighbor. Left middle: A sample lookup table in which all possible neighborhood configurations are listed, along with the update state for the center cell in each neighborhood. Left bottom: Mechanism of update in a one-dimensional binary CA of length 13: t0 is the configuration at time 0 and t1 is the configuration at the next time step. Right: The sequence of synchronous updates starting at the initial state t0 and ending at state t9
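The update mechanism of Figure 1 can be sketched in a few lines: each cell's neighborhood is read as a binary index into the LUT, and all cells update synchronously. The radius-1 local majority rule used here to fill the LUT is chosen purely for illustration.

```python
def ca_step(config, lut, r=1):
    """One synchronous update of a 1-D binary CA with periodic boundaries.

    `lut` maps each neighborhood, read as a binary number, to the center
    cell's next state, exactly as in the lookup table of Figure 1.
    """
    n = len(config)
    nxt = []
    for i in range(n):
        neighborhood = [config[(i + j) % n] for j in range(-r, r + 1)]
        index = int("".join(map(str, neighborhood)), 2)
        nxt.append(lut[index])
    return nxt

# LUT for the radius-1 majority-vote rule: for each of the 8 neighborhoods
# 000..111, output the majority bit among the three cells.
majority_lut = [0, 0, 0, 1, 0, 1, 1, 1]
stepped = ca_step([0, 1, 1, 1, 0, 0, 1, 0], majority_lut)
```

Under this rule the isolated 1 in the example lattice is erased after a single step, while the solid block of 1s persists.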

CAs were invented in the 1940s by Stanislaw Ulam and John von Neumann. Ulam used CAs as a mathematical abstraction to study the growth of crystals, and von Neumann used them as an abstraction of a physical system with the concepts of a cell, state and transition function in order to study the logic of self-reproducing systems [8,11,55]. Von Neumann's seminal work on CAs had great significance. Science after the industrial revolution was primarily concerned with energy, force and motion, but the concept of CAs shifted the focus to information processing, organization, programming, and most importantly, control [8]. The universal computational ability of CAs was realized early on, but harnessing this power continues to intrigue scientists [8,11,32,55].

Computation in CAs

In the early 1970s, John Conway published a description of his deceptively simple Game of Life CA [18]. Conway proved that the Game of Life, like von Neumann's self-reproducing automaton, has the power of a universal Turing machine: any program that can be run on a Turing machine can be simulated by the Game of Life with the appropriate initial configuration of states. This initial configuration (IC) encodes both the input and the program to be run on that input. It is interesting that so simple a CA as the Game of Life (as well as even simpler CAs – see chapter 11 in [60]) has the power of a universal computer. However, the actual application of CAs as universal

Evolving Cellular Automata

Evolving Cellular Automata, Figure 2 Two space-time diagrams illustrating the behavior of the "naïve" local majority voting rule, with lattice size N = 149, neighborhood radius r = 3, and number of time steps M = 149. Left: initial configuration has a majority of 0s. Right: initial configuration has a majority of 1s. Individual cells are colored black for state 1 and white for state 0. (Reprinted from [37] with permission of the author.)
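The "naïve" local majority voting LUT used in Figure 2 is easy to build directly. The sketch below (illustrative, not from the article) constructs it for radius 3 and checks two properties that are certain by construction: exactly half of the 128 neighborhoods have a 1-majority, and the uniform configurations are fixed points.

```python
# Sketch (illustrative): the local majority voting LUT for a binary CA
# with radius r = 3, i.e. 7-cell neighborhoods and 2**7 = 128 LUT entries.
# Entry i is 1 exactly when the 7-bit neighborhood i contains >= 4 ones.

R = 3
NEIGH = 2 * R + 1  # 7 cells per neighborhood
majority_lut = [1 if bin(i).count("1") > NEIGH // 2 else 0
                for i in range(2 ** NEIGH)]

# Exactly half of the 128 neighborhoods have a majority of 1s
assert sum(majority_lut) == 64
# The all-0s (all-1s) neighborhood maps to 0 (1), so uniform lattices
# are fixed points -- yet, as Figure 2 shows, non-uniform lattices
# typically never reach them under this rule.
assert majority_lut[0] == 0 and majority_lut[-1] == 1
```

Running a CA with this LUT from random initial configurations reproduces the block-and-stripe stalemates seen in Figure 2 rather than a uniform answer.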

computers is, in general, impractical due to the difficulty of encoding a given program and input as an IC, as well as very long simulation times. An alternative use of CAs as computers is to design a CA to perform a particular computational task. In this case, the initial configuration is the input to the program, the transition function corresponds to the program performing the specific task, and some set of final configurations is interpreted as the output of the computation. The intermediate configurations comprise the actual computation being done. Examples of tasks for which CAs have been designed include location management in mobile computing networks [50], classification of initial configuration densities [38], pseudo-random number generation [51], multiagent synchronization [47], image processing [26], simulation of growth patterns of material microstructures [5], chemical reactions [35], and pedestrian dynamics [45]. The problem of designing a CA to perform a task requires defining a cell's local neighborhood and boundary conditions, and constructing a transition function for cells that produces the desired input-output mapping. Given a CA's states, neighborhood radius, boundary conditions, and initial configuration, it is the LUT values that must be set by the "programmer" so that the computation will be performed correctly over all inputs. In order to study the application of genetic algorithms to designing CAs, substantial experimentation has been done using the density classification (or majority classification) task. Here, "density" refers to the fraction of 1s in the initial configuration. In this task, a binary-state CA must iterate to an all-1s configuration if the initial configuration has a majority of cells in state 1, and iterate to an all-0s configuration otherwise. The maximum time allowed for completing this computation is a function of the lattice size. One "naïve" solution for designing the LUT for this task would be local majority voting: set the output bit to 1 for all neighborhood configurations with a majority of 1s, and to 0 otherwise. Figure 2 gives two space-time diagrams illustrating the behavior of this LUT in a one-dimensional binary CA with N = 149 and r = 3, where N denotes the number of cells in the lattice, and r is the neighborhood radius. Each diagram shows an initial configuration of 149 cells (horizontal) iterating over 149 time steps (vertical, down the page). The left-hand diagram has an initial configuration with a majority of 0 (white) cells, and the right-hand diagram has an initial configuration with a majority of 1 (black) cells. In neither case does the CA produce the "correct" global behavior: an all-0s configuration for the left diagram and an all-1s configuration for the right diagram. This illustrates the general fact that human intuition often fails when trying to produce a desired emergent collective behavior by manipulating individual bits of the lookup table, each of which reflects only the settings of a local neighborhood.

Evolving Cellular Automata with Genetic Algorithms

Genetic algorithms (GAs) are a family of stochastic search algorithms, inspired by the Darwinian model of evolution, that have proved successful for solving various difficult problems [3,4,36]. A GA works as follows: (1) A population of individuals ("chromosomes") representing candidate solutions to a given problem is initially generated at random. (2) The fitness of each individual is calculated as a function of its quality as a solution. (3) The fittest individuals are then selected to be the parents of a new generation of candidate solutions. Offspring are created from parents via copying, random mutation, and crossover. Once a new generation of individuals is created, the process returns to step two. This entire process is iterated for some number of generations, and the result is (hopefully) one or more highly fit individuals that are good solutions to the given problem. GAs have been used by a number of groups to evolve LUTs for binary CAs [2,10,14,15,34,43,47,51]. The individuals in the GA population are LUTs, typically encoded as binary strings. Figure 3 shows the mechanism for encoding a LUT and indexing it by a particular neighborhood configuration. For example, the decimal value of the neighborhood 11010 is 26, so the updated value for this neighborhood's center cell is retrieved from position 26 in the LUT, in this case updating the cell's value to 1. The fitness of a LUT is a measure of how well the corresponding CA performs a given task after a fixed number of time steps, starting from a number of test initial configurations. For example, given the density classification task, the fitness of a LUT is calculated by running the corresponding CA on some number k of random initial configurations, and returning the fraction of those k on which the CA produces the correct final configuration (all 1s for initial configurations with majority 1s, all 0s otherwise). The set of random test ICs is typically regenerated at each generation. For LUTs represented as bit strings, crossover is applied to two parents by randomly selecting a crossover point, so that each child inherits one segment of bits from each parent. Next, each child is subject to mutation, in which individual bits of the genome are complemented with a very low probability. An example of the reproduction process is illustrated in Fig. 4 for a lookup

Evolving Cellular Automata, Figure 4 Reproduction applied to Parent1 and Parent2, producing Child1 and Child2. The one-point crossover is performed at a randomly selected crossover point (bit 3), and a mutation is performed on bits 2 and 5 in Child1 and Child2, respectively

table representation with r = 1. Here, one of the two children is chosen for survival at random and placed in an offspring population. This process is repeated until the offspring population is filled. Before a new evolutionary cycle begins, the newly created population of offspring replaces the previous population of parents.

Previous Work on Evolving CAs

Von Neumann's self-reproducing automaton was the first construction showing that CAs can perform universal computation [55], meaning that CAs are capable, in principle, of performing any desired computation. However, in general it was unknown how to effectively "program" CAs to perform computations, or what information-processing dynamics CAs could best use to accomplish a task. In the 1980s and 1990s, a number of researchers attempted to determine how the generic dynamical behavior of a CA might be related to its ability to perform computations [18,19,21,33,58]. In particular, Langton defined a parameter on CA LUTs, λ, that he claimed correlated with

Evolving Cellular Automata, Figure 3 Lookup table encoding for a 1D CA with neighborhood radius r = 2. Each permutation of neighborhood values is encoded as an offset into the LUT. The LUT bit represents the new value for the center cell of the neighborhood. The binary string (LUT) encodes an individual's chromosome used by evolution
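The LUT indexing shown in Figure 3 amounts to reading the neighborhood as a binary number. A minimal sketch (illustrative, not the article's code), reproducing the text's example of neighborhood 11010 mapping to position 26:

```python
# Sketch (illustrative): the LUT encoding of Figure 3. A radius-2
# neighborhood of five bits is read as a 5-bit binary number, which is
# the offset into the 32-entry LUT holding the center cell's next state.

def lut_index(neighborhood):
    # e.g. [1, 1, 0, 1, 0] -> binary 11010 -> decimal 26
    idx = 0
    for bit in neighborhood:
        idx = (idx << 1) | bit
    return idx

assert lut_index([1, 1, 0, 1, 0]) == 26   # the article's example
assert lut_index([0, 0, 0, 0, 0]) == 0    # first LUT entry
assert lut_index([1, 1, 1, 1, 1]) == 31   # last LUT entry
```

Under this encoding a LUT for radius r is simply a bit string of length 2^(2r+1), which is exactly the chromosome the GA operates on.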

Evolving Cellular Automata, Figure 5 Analysis of a GA-evolved CA for the density classification task. Left: The original space-time diagram containing particle strategies in a CA evolved by the GA. The regions of regular domains are all white, all black, or have a checkerboard pattern. Right: Space-time diagram after regular domains are filtered out. (Reprinted from [37] with permission of the author.)

computational ability. In Langton's work, λ is a function of the state-update values in the LUT; for binary CAs, λ is defined as the fraction of 1s in the state-update values.

Computation at the Edge of Chaos

Packard [40] was the first to use a genetic algorithm to evolve CA LUTs, in order to test the hypothesis that LUTs with a critical value of λ will have maximal computational capability. Langton had shown that generic CA behavior seemed to undergo a sequence of phase transitions – from simple to "complex" to chaotic – as λ was varied. Both Langton and Packard believed that the "complex" region was necessary for non-trivial computation in CAs; thus the phrase "computation at the edge of chaos" was coined [33,40]. Packard's experiments indicated that CAs evolved by GAs to perform the density classification task indeed tended to exhibit critical λ values. However, this conclusion was not replicated in later work [38]. Correlations between λ (or other statistics of LUTs) and computational capability in CAs have been hinted at in further work, but have not been definitively established. A major problem is the difficulty of quantifying "computational capability" in CAs beyond the general (and not very practical) capability of universal computation.

Computation via CA "Particles"

While Mitchell, Hraber, and Crutchfield were not able to replicate Packard's results on λ, they were able to show that genetic algorithms can indeed evolve CAs to perform computations [38]. Using earlier work by Hanson and Crutchfield on characterizing computation in CAs [20,21],

Das, Mitchell and Crutchfield gave an information-processing interpretation of the dynamics exhibited by the evolved CAs in terms of regular domains and particles [21]. This work was extended by Das, Crutchfield, Mitchell, and Hanson [14] and by Hordijk, Crutchfield and Mitchell [24]. In particular, these groups showed that when regular domains – patterns described by simple regular languages – are filtered out of CA space-time behavior, the boundaries between these domains come to the forefront and can be interpreted as information-carrying "particles". These particles can characterize non-trivial computation carried out by CAs [15,21]. The information-carrying role of particles becomes clear when applied to CAs evolved by the GA for the density classification task. Figure 5, left, shows typical behavior of the best CAs evolved by the GA. The CA contains three regular domains: all white (0*), all black (1*), and checkerboard ((01)*). Figure 5, right, shows the particles remaining after the regular domains are filtered out. Each particle has an origin and velocity, and carries information about the neighboring regions [37]. Hordijk et al. [24] showed that a small set of particles and their interactions can explain the computational behavior (i.e., the fitness) of the evolved cellular automata. Crutchfield et al. [13] describe how the analysis of evolved CAs in terms of particles can also explain how the GA evolved CAs with high fitness. Land and Belew [31] proved that no two-state homogeneous CA can perform the density classification task perfectly. However, the maximum possible performance for CAs on this task is not known. The density classification task remains a popular benchmark for studying the evolution of CAs with GAs,

since the task requires collective behavior: the decision about the global density of the IC must be based on information from each local neighborhood only. Das et al. [14] also used GAs to evolve CAs to perform a global synchronization task, which requires that, starting from any initial configuration, all cells of the CA synchronize their states (to all 1s or all 0s) and in the next time step all cells change state to the opposite value. Again, this behavior requires global coordination based on local communication. Das et al. showed that an analysis in terms of particles and their interactions was also possible for this task.

Genetic Programming

Andre et al. [2] applied genetic programming (GP), a variation of GAs, to the density classification task. GP methodology also uses a population of evolving candidate solutions, and the principles of reproduction and survival are the same for both GP and GAs. The main difference between the two methods is the encoding of individuals in the population. Unlike the binary strings used in GAs, individuals in a GP population have tree structures, made up of function and terminal nodes. The function nodes (internal nodes) are operators from a pre-defined function set, and the terminal nodes (leaves) represent operands from a terminal set. The fitness value is obtained by evaluating the tree on a set of test initial configurations. The crossover operator is applied to two parents by swapping randomly selected sub-trees, and the mutation operation is performed on a single node by creating a new node or by changing its value (Fig. 6) [29,30]. The GP algorithm evolved CAs whose performance is slightly higher than that of the best CAs evolved by a traditional GA. Unlike traditional GAs, which use crossover and mutation to evolve fixed-length genome solutions, GP trees evolve to different sizes and shapes, and subtrees can be substituted out and added to the function set as automatically defined functions.
According to Andre et al., this allows GP to better explore the "regularities, symmetries, homogeneities, and modularities of the problem domain" [2]. The best CAs evolved by GP exhibited more complex particles and particle interactions than the CAs found by the EvCA group [13,24]. It is unclear whether the improved results were due to the GP representation or to the increased population sizes and computation time used by Andre et al.
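For the bit-string encoding used by the GA (in contrast to GP's trees), the reproduction step described earlier and pictured in Fig. 4 can be sketched as follows. The string length, seed, and mutation rate here are illustrative assumptions, not values from the article.

```python
# Sketch (illustrative): GA reproduction on bit-string LUT chromosomes --
# one-point crossover followed by low-probability bit-flip mutation.
import random

def one_point_crossover(p1, p2, point):
    # Each child inherits one segment of bits from each parent
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(bits, rate, rng):
    # Complement each bit independently with a small probability
    return [b ^ 1 if rng.random() < rate else b for b in bits]

rng = random.Random(42)                       # seed chosen arbitrarily
parent1 = [1, 0, 1, 1, 0, 0, 1, 0]
parent2 = [0, 1, 1, 0, 1, 1, 0, 1]
point = rng.randrange(1, len(parent1))        # random crossover point
child1, child2 = one_point_crossover(parent1, parent2, point)
child1 = mutate(child1, 0.02, rng)
child2 = mutate(child2, 0.02, rng)
```

In the full algorithm this pairing is repeated, with parents chosen by fitness, until the offspring population is filled.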

Parallel Cellular Machines

The field of evolving CAs has grown in several directions. One important area is evolving non-homogeneous cellular automata [22,47,48,54]. Each cell of a non-homogeneous CA contains two independently evolving chromosomes. One represents the LUT for the cell (different cells can have different LUTs), and the second represents the neighborhood connections for the cell. Both the LUTs and the cells' connectivity can be evolved at the same time. Since a task is performed by a collection of cells with different LUTs, there is no single best-performing individual; the fitness is a measure of the collective behavior of the cells' LUTs and their neighborhood assignments [46,48]. One of the many tasks studied by Sipper was the global ordering task [47]. Here, the CA has fixed rather than periodic boundaries, so the "left" and "right" ends of the CA lattice are defined. Ordering a given IC pattern means placing all its 0s on the left, followed by all its 1s on the right; the initial density of the IC has to be preserved in the final configuration. Sipper designed a cellular programming algorithm to co-evolve multiple LUTs and their neighborhood topologies. Cellular programming carries out the same steps as the conventional GA (initialization, evaluation, reproduction, replacement), but each cell reproduces only with its local neighbors. The LUTs and connectivity chromosomes of the locally connected sites are the only potential parents for the reproduction and replacement of a cell's LUT and connectivity table, respectively. The cells' limited connectivity results in a genetically diverse population: even if the current population contains a cell with a high-fitness LUT, that LUT cannot be directly inherited by a given cell unless the two are connected. The connectivity chromosome causes a spatial isolation that allows evolution to explore multiple CA rules as parts of a collective solution [47,48]. Sipper exhaustively tested all homogeneous CAs with r = 1 on the ordering task, and found that the best-performing rule (rule 232) correctly ordered 71% of 1000 randomly generated ICs. The cellular programming algorithm evolved a non-homogeneous CA that outperformed this best homogeneous CA. The evolutionary search identified multiple rules that the non-homogeneous CA used as components of the final solution. The rules composing the collective CA solution could be classified as either preserving the state or repairing an incorrect ordering of the neighborhood bits. The (untested) hypothesis is that the cellular programming algorithm can discover multiple important rules (partial traits) that compose a more complex collective behavior.

Coevolution

Coevolution is an extension of the GA, introduced by Hillis [23], inspired by host-parasite coevolution in nature.

Evolving Cellular Automata, Figure 6 An example of the encoding of individuals in a GP population, similar to the one used in [2]. The function set here consists of the logical operators {and, or, not, nand, nor, xor}. The terminal set represents the states of cells in a CA neighborhood, here {Center, East, West, EastOfEast, WestOfWest, EastOfEastOfEast, WestOfWestOfWest}. The figure shows the reproduction of Parent1 and Parent2 by crossover with subsequent mutation to produce Child1 and Child2
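Evaluating a GP tree of the kind shown in Figure 6 is a straightforward recursion over function and terminal nodes. The sketch below is illustrative (it is not the code of [2]); trees are represented as nested tuples, and the example tree and neighborhood values are made up for demonstration.

```python
# Sketch (illustrative): evaluating a GP tree whose internal nodes are
# logical operators and whose leaves name cells of the CA neighborhood,
# as in Figure 6. A tree is a nested tuple: (op, subtree, subtree) or a
# terminal string such as "Center".

OPS = {
    "and":  lambda a, b: a & b,
    "or":   lambda a, b: a | b,
    "xor":  lambda a, b: a ^ b,
    "nand": lambda a, b: 1 - (a & b),
    "nor":  lambda a, b: 1 - (a | b),
    "not":  lambda a: 1 - a,
}

def eval_tree(tree, neigh):
    # neigh maps terminal names ("Center", "East", ...) to cell states
    if isinstance(tree, str):            # terminal node: a cell's state
        return neigh[tree]
    op, *args = tree                     # function node: apply operator
    return OPS[op](*(eval_tree(a, neigh) for a in args))

tree = ("or", ("and", "Center", "East"), ("not", "West"))
state = {"Center": 1, "East": 0, "West": 1}
print(eval_tree(tree, state))
```

Such a tree, evaluated at every cell, plays the same role as a LUT: it maps each neighborhood configuration to the center cell's next state.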

The main idea is that randomly generated test cases will not continually challenge evolving candidate solutions. Coevolution solves this problem by evolving two populations – candidate solutions and test cases – also referred to as hosts and parasites. The hosts obtain high fitness by performing well on many of the parasites, whereas the parasites obtain high fitness by being difficult for the hosts. Simultaneously coevolving both populations engages hosts and parasites in a mutual competition to achieve increasingly better results [7,17,56]. Successful applications of coevolutionary learning include discovery of minimal sorting networks, training artificial neural networks for robotics, function induction from data, and evolving game strategies [9,23,41,44,56,57]. Coevolution also improved upon GA results on evolving CA rules for density classification [28].
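The mutual fitness coupling between the two populations can be sketched as follows. This is an illustrative simplification, not code from any of the cited studies; `solves(host, parasite)` is a hypothetical stand-in for running a host's CA on a parasite IC and checking the final configuration.

```python
# Sketch (illustrative): host/parasite fitness in coevolution. Hosts
# (candidate solutions) score by the fraction of parasites (test cases)
# they handle; parasites score by the fraction of hosts that fail on them.

def coevolution_fitness(hosts, parasites, solves):
    # solves(h, p) -> 1 if host h succeeds on test case p, else 0
    host_fit = [sum(solves(h, p) for p in parasites) / len(parasites)
                for h in hosts]
    # A parasite is fitter the more hosts it defeats
    para_fit = [sum(1 - solves(h, p) for h in hosts) / len(hosts)
                for p in parasites]
    return host_fit, para_fit

# Toy check: a host that solves everything has fitness 1.0, and a
# parasite that defeats no host has fitness 0.0.
hf, pf = coevolution_fitness(["h"], ["p1", "p2"], lambda h, p: 1)
```

Selection then acts on each population with its own fitness, so the test cases keep adapting to the current weaknesses of the solutions.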

In the context of evolving CAs, the LUT candidate solutions are hosts, and the ICs are parasites. The fitness of a host is the fraction of ICs from the parasite population that it classifies correctly. The fitness of a parasite is a function of the number of hosts that fail to classify it correctly. Pagie et al. and Mitchell et al., among others, have found that embedding the host and parasite populations in a spatial grid, where hosts and parasites compete and evolve locally, significantly improves the performance of coevolution on evolving CAs [39,41,42,57].

Other Applications

The examples described in the previous sections illustrate the power and versatility of genetic algorithms used to evolve desired collective behavior in CAs. The following are some

additional examples of applications of CAs evolved by GAs. CAs are most commonly used for modeling physical systems. CAs evolved by GAs have modeled multi-phase fluid flow in porous material [61]: a 3D CA represented a pore model, and the GA evolved the permeability characteristics of the model to match the fluid flow pattern collected from sample data. Another example is the modeling of physical properties of material microstructures [5]. An alternative definition of CAs (effector automata) represented a 2D cross-section of a material, and the rule table specified the next location of the neighborhood's center cell. The results show that the GA evolved rules that reconstructed microstructures in a sample superalloy. Network theory and topology studies for distributed sensor networks rely on connectivity and communication among their components. Evolving CAs for location management in mobile computing networks is an application in this field [50]. The cells of a mobile network are mapped to CA cells, where each cell is either a reporting or a non-reporting cell. Subrata and Zomaya's study used three network datasets that assigned unique communication costs to each cell. A GA evolved rules that designate each cell as reporting or not while minimizing the communication costs in the network. The results show that the GA found optimal or near-optimal rules for determining which cells in a network are reporting. Sipper also hinted at applying his cellular programming algorithm to non-homogeneous CAs with non-standard topology to evolve network topology assignments [47]. Chopra and Bender applied GAs to evolve CAs to predict protein secondary structure [10]. A 1D CA with r = 5 represents interactions among local fragments of a protein chain, and a GA evolved the weights of the neighboring fragments that determine the shape of the secondary protein structure.
The algorithm achieved superior results in comparison with some other protein-secondary-structure prediction algorithms. Built-In Self-Test (BIST) is a test method widely used in the design and production of hardware components. A combination of a selfish gene algorithm (a GA variant) and CAs was used to program the BIST architecture [12]. The individual CA cells correspond to the circuitry's input terminals, and the transition function serves as a test pattern generator. The GA identified CA rules that produce input test sequences that detect circuitry faults. The results achieved are comparable with previously proposed GA-based methods, but with lower overhead. Computer vision is a fast-growing research area in which CAs have been used for low-level image processing. The cellular programming algorithm has evolved non-homogeneous CAs to perform image thinning, finding and enhancing an object's rectangular boundaries, image shrinking, and edge detection [47].

Future Directions

Initial work on evolving two-dimensional CAs with GAs was done by Sipper [47] and by Jiménez-Morales, Crutchfield, and Mitchell [27]. An extension of domain-particle analysis to 2D CAs is needed in order to analyze the information processing of such CAs and to identify the epochs of innovation in evolutionary learning. Spatially extended coevolution was successfully used to evolve high-performance CAs for density classification. Parallel cellular machines also used spatial embedding of their components and found better-performing CAs than the homogeneous CAs evolved by a traditional GA. The hypothesis is that spatially extended search techniques succeed more often than non-spatial techniques because spatial embedding enforces greater genetic diversity and, in the case of coevolution, more effective competition between hosts and parasites. This hypothesis deserves more detailed investigation. Additional important research topics include the study of error resiliency and of the effect of noise on both the information processing in CAs and the evolution of CAs. How successful is evolutionary learning in a noisy environment? What is the impact of failing CA components on information processing and evolutionary adaptation? Similarly, to make CAs more realistic as models of physical systems, evolving CAs with asynchronous cell updates is an important topic for future research. A number of groups have shown that CAs and similar decentralized spatially extended systems using asynchronous updates can have very different behavior from those using synchronous updates (e.g., [1,6,25,49,53]). An additional topic for future research is the effect of connectivity network structure on the behavior and computational capability of CAs. Some work along these lines has been done by Teuscher [52].
Acknowledgments

This work has been funded by the Center on Functional Engineered Nano Architectonics (FENA), through the Focus Center Research Program of the Semiconductor Industry Association.

Bibliography

1. Alba E, Giacobini M, Tomassini M, Romero S (2002) Comparing synchronous and asynchronous cellular genetic algorithms. In: Guervos MJJ et al (eds) Parallel problem solving from nature, PPSN VII, Seventh International Conference. Springer, Berlin, pp 601–610
2. Andre D, Bennett FH III, Koza JR (1996) Evolution of intricate long-distance communication signals in cellular automata using genetic programming. In: Artificial life V: Proceedings of the fifth international workshop on the synthesis and simulation of living systems. MIT Press, Cambridge
3. Ashlock D (2006) Evolutionary computation for modeling and optimization. Springer, New York
4. Back T (1996) Evolutionary algorithms in theory and practice. Oxford University Press, New York
5. Basanta D, Bentley PJ, Miodownik MA, Holm EA (2004) Evolving cellular automata to grow microstructures. In: Genetic programming: 6th European Conference, EuroGP 2003, Essex, UK, April 14–16, 2003, Proceedings. Springer, Berlin, pp 77–130
6. Bersini H, Detours V (2002) Asynchrony induces stability in cellular automata based models. In: Proceedings of the IVth conference on artificial life. MIT Press, Cambridge, pp 382–387
7. Bucci A, Pollack JB (2002) Order-theoretic analysis of coevolution problems: Coevolutionary statics. In: GECCO 2002 Workshop on Understanding Coevolution: Theory and Analysis of Coevolutionary Algorithms, vol 1. Morgan Kaufmann, San Francisco, pp 229–235
8. Burks A (1970) Essays on cellular automata. University of Illinois Press, Urbana
9. Cartlidge J, Bullock S (2004) Combating coevolutionary disengagement by reducing parasite virulence. Evol Comput 12(2):193–222
10. Chopra P, Bender A (2006) Evolved cellular automata for protein secondary structure prediction imitate the determinants for folding observed in nature. In Silico Biol 7(0007):87–93
11. Codd EF (1968) Cellular automata. ACM Monograph series, New York
12. Corno F, Reorda MS, Squillero G (2000) Exploiting the selfish gene algorithm for evolving cellular automata. IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00) 06:6577
13. Crutchfield JP, Mitchell M, Das R (2003) The evolutionary design of collective computation in cellular automata. In: Crutchfield JP, Schuster PK (eds) Evolutionary dynamics – Exploring the interplay of selection, neutrality, accident, and function. Oxford University Press, New York, pp 361–411
14. Das R, Crutchfield JP, Mitchell M, Hanson JE (1995) Evolving globally synchronized cellular automata. In: Eshelman L (ed) Proceedings of the sixth international conference on genetic algorithms. Morgan Kaufmann, San Francisco, pp 336–343
15. Das R, Mitchell M, Crutchfield JP (1994) A genetic algorithm discovers particle-based computation in cellular automata. In: Davidor Y, Schwefel HP, Männer R (eds) Parallel problem solving from nature-III. Springer, Berlin, pp 344–353
16. Farmer JD, Toffoli T, Wolfram S (1984) Cellular automata: Proceedings of an interdisciplinary workshop. Elsevier Science, Los Alamos
17. Funes P, Sklar E, Juille H, Pollack J (1998) Animal-animat coevolution: Using the animal population as fitness function. In: Pfeiffer R, Blumberg B, Wilson JA, Meyer S (eds) From animals to animats 5: Proceedings of the fifth international conference on simulation of adaptive behavior. MIT Press, Cambridge, pp 525–533
18. Gardner M (1970) Mathematical games: The fantastic combinations of John Conway's new solitaire game "Life". Sci Am 223:120–123
19. Grassberger P (1983) Chaos and diffusion in deterministic cellular automata. Physica D 10(1–2):52–58
20. Hanson JE (1993) Computational mechanics of cellular automata. PhD thesis, University of California at Berkeley
21. Hanson JE, Crutchfield JP (1992) The attractor-basin portrait of a cellular automaton. J Stat Phys 66:1415–1462
22. Hartman H, Vichniac GY (1986) Inhomogeneous cellular automata (INCA). In: Bienenstock E, Fogelman F, Weisbuch G (eds) Disordered systems and biological organization, vol F20. Springer, Berlin, pp 53–57
23. Hillis WD (1990) Co-evolving parasites improve simulated evolution as an optimization procedure. Physica D 42:228–234
24. Hordijk W, Crutchfield JP, Mitchell M (1996) Embedded-particle computation in evolved cellular automata. In: Toffoli T, Biafore M, Leão J (eds) Physics and computation 1996. New England Complex Systems Institute, Cambridge, pp 153–158
25. Huberman BA, Glance NS (1993) Evolutionary games and computer simulations. Proc Natl Acad Sci 90:7716–7718
26. Ikebe M, Amemiya Y (2001) VMoS cellular-automaton circuit for picture processing. In: Miki T (ed) Brainware: Bio-inspired architectures and its hardware implementation. FLSI Soft Computing, vol 6, chapter 6. World Scientific, Singapore, pp 135–162
27. Jiménez-Morales F, Crutchfield JP, Mitchell M (2001) Evolving two-dimensional cellular automata to perform density classification: A report on work in progress. Parallel Comput 27(5):571–585
28. Juillé H, Pollack JB (1998) Coevolutionary learning: A case study. In: Proceedings of the fifteenth international conference on machine learning (ICML-98). Morgan Kaufmann, San Francisco, pp 24–26
29. Koza JR (1992) Genetic programming: On the programming of computers by means of natural selection. MIT Press, Cambridge
30. Koza JR (1994) Genetic programming II: Automatic discovery of reusable programs. MIT Press, Cambridge
31. Land M, Belew RK (1995) No perfect two-state cellular automata for density classification exists. Phys Rev Lett 74(25):5148–5150
32. Langton C (1986) Studying artificial life with cellular automata. Physica D 10D:120
33. Langton C (1990) Computation at the edge of chaos: Phase transitions and emergent computation. Physica D 42:12–37
34. Lohn JD, Reggia JA (1997) Automatic discovery of self-replicating structures in cellular automata. IEEE Trans Evol Comput 1(3):165–178
35. Madore BF, Freedman WL (1983) Computer simulations of the Belousov-Zhabotinsky reaction. Science 222:615–616
36. Mitchell M (1996) An introduction to genetic algorithms. MIT Press, Cambridge
37. Mitchell M (1998) Computation in cellular automata: A selected review. In: Gramss T, Bornholdt S, Gross M, Mitchell M, Pellizzari T (eds) Nonstandard computation. VCH, Weinheim, pp 95–140
38. Mitchell M, Hraber PT, Crutchfield JP (1993) Revisiting the edge of chaos: Evolving cellular automata to perform computations. Complex Syst 7:89–130
39. Mitchell M, Thomure MD, Williams NL (2006) The role of space in the success of coevolutionary learning. In: Rocha LM, Yaeger LS, Bedau MA, Floreano D, Goldstone RL, Vespignani A (eds) Artificial life X: Proceedings of the tenth international conference on the simulation and synthesis of living systems. MIT Press, Cambridge, pp 118–124
40. Packard NH (1988) Adaptation toward the edge of chaos. In: Kelso JAS, Mandell AJ, Shlesinger M (eds) Dynamic patterns in complex systems. World Scientific, Singapore, pp 293–301
41. Pagie L, Hogeweg P (1997) Evolutionary consequences of coevolving targets. Evol Comput 5(4):401–418
42. Pagie L, Mitchell M (2002) A comparison of evolutionary and coevolutionary search. Int J Comput Intell Appl 2(1):53–69
43. Reynaga R, Amthauer E (2003) Two-dimensional cellular automata of radius one for density classification task ρ = 1/2. Pattern Recogn Lett 24(15):2849–2856
44. Rosin C, Belew R (1997) New methods for competitive coevolution. Evol Comput 5(1):1–29
45. Schadschneider A (2001) Cellular automaton approach to pedestrian dynamics – theory. In: Pedestrian and evacuation dynamics. Springer, Berlin, pp 75–86
46. Sipper M (1994) Non-uniform cellular automata: Evolution in rule space and formation of complex structures. In: Brooks RA, Maes P (eds) Artificial life IV. MIT Press, Cambridge, pp 394–399
47. Sipper M (1997) Evolution of parallel cellular machines: The cellular programming approach. Springer, Heidelberg
48. Sipper M, Ruppin E (1997) Co-evolving architectures for cellular machines. Physica D 99:428–441
49. Sipper M, Tomassini M, Capcarrere M (1997) Evolving asynchronous and scalable non-uniform cellular automata. In: Proceedings of the international conference on artificial neural networks and genetic algorithms (ICANNGA97). Springer, Vienna, pp 382–387
50. Subrata R, Zomaya AY (2003) Evolving cellular automata for location management in mobile computing networks. IEEE Trans Parallel Distrib Syst 14(1):13–26
51. Tan SK, Guan SU (2007) Evolving cellular automata to generate nonlinear sequences with desirable properties. Appl Soft Comput 7(3):1131–1134
52. Teuscher C (2006) On irregular interconnect fabrics for self-assembled nanoscale electronics. In: 2nd IEEE international workshop on defect and fault tolerant nanoscale architectures, NANOARCH'06. ACM Press, New York, pp 60–67
53. Teuscher C, Capcarrere MS (2003) On fireflies, cellular systems, and evolware. In: Tyrrell AM, Haddow PC, Torresen J (eds) Evolvable systems: From biology to hardware, Proceedings of the 5th international conference, ICES2003. Lecture Notes in Computer Science, vol 2602. Springer, Berlin, pp 1–12
54. Vichniac GY, Tamayo P, Hartman H (1986) Annealed and quenched inhomogeneous cellular automata. J Stat Phys 45:875–883
55. von Neumann J (1966) Theory of self-reproducing automata. University of Illinois Press, Champaign
56. Wiegand PR, Sarma J (2004) Spatial embedding and loss of gradient in cooperative coevolutionary algorithms. Parallel Probl Solving Nat 1:912–921
57. Williams N, Mitchell M (2005) Investigating the success of spatial coevolution. In: Proceedings of the 2005 conference on genetic and evolutionary computation. Washington DC, pp 523–530
58. Wolfram S (1984) Universality and complexity in cellular automata. Physica D 10D:1
59. Wolfram S (1986) Theory and application of cellular automata. World Scientific Publishing, Singapore
60. Wolfram S (2002) A new kind of science. Wolfram Media, Champaign
61. Yu T, Lee S (2002) Evolving cellular automata to model fluid flow in porous media. In: 2002 NASA/DoD conference on evolvable hardware (EH '02). IEEE Computer Society, Los Alamitos, pp 210

Evolving Fuzzy Systems

PLAMEN ANGELOV
Intelligent Systems Research Laboratory, Digital Signal Processing Research Group, Communication Systems Department, Lancaster University, Lancaster, UK

Article Outline
Glossary
Definition of the Subject
Introduction
Evolving Clustering
Evolving TS Fuzzy Systems
Evolving Fuzzy Classifiers
Evolving Fuzzy Controllers
Application Case Studies
Future Directions
Acknowledgments
Bibliography

Glossary
Evolving system In the context of this article the term "evolving" is used in the sense of the self-development of a system (in terms of both its structure and its parameters) based on the stream of data coming to the system on-line and in real-time from the environment and from the system itself. The system is assumed to be described mathematically by a set of fuzzy rules of the form:

Rule_i: IF (Input_1 is close to prototype_1^i) AND … AND (Input_n is close to prototype_n^i) THEN (Output_i = Inputs^T ConseqParams_i)   (1)

In this sense, this definition strictly follows the meaning of the English word "evolving" as described in [34], p. 294, namely "unfolding; developing; being developed, naturally and gradually". Contrast this with the definition of "evolutionary" in the same source, which is "development of more complicated forms of life (plants, animals) from earlier and simpler forms". The terms evolutionary or genetic are also associated with phenomena (and with operators that mimic them) such as chromosome crossover, mutation, selection and reproduction, parents and offspring [32]. Evolving (fuzzy and neuro-fuzzy) systems do not deal with such phenomena; they consider instead a gradual development of the underlying (fuzzy or neuro-fuzzy) system structure.
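As a minimal, purely illustrative sketch (the class and attribute names here are assumptions, not from the article), a rule of the form (1) with Gaussian "is close to" memberships could be represented as:

```python
import math
from dataclasses import dataclass

@dataclass
class FuzzyRule:
    """One rule of form (1): antecedent prototypes plus linear consequent parameters."""
    prototypes: list   # one focal point per input
    radius: float      # spread of the membership functions
    conseq: list       # [a0, a1, ..., an] consequent parameters

    def firing(self, inputs):
        # "Input_j is close to prototype_j": Gaussian closeness, combined by product (a t-norm)
        tau = 1.0
        for x, p in zip(inputs, self.prototypes):
            tau *= math.exp(-0.5 * ((x - p) / self.radius) ** 2)
        return tau

    def output(self, inputs):
        # consequent part: Inputs^T ConseqParams plus a bias term
        return self.conseq[0] + sum(a * x for a, x in zip(self.conseq[1:], inputs))

rule = FuzzyRule(prototypes=[0.2, 0.8], radius=0.3, conseq=[0.1, 1.0, -0.5])
print(rule.firing([0.2, 0.8]))  # fires fully at its own prototype
```

The firing degree is 1 exactly at the rule's prototype and decays with distance from it, which is the sense in which a rule "covers" a region of the data space.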

Fuzzy system structure The structure of a fuzzy (or neuro-fuzzy) system is constituted of a set of fuzzy rules (1). Each fuzzy rule is composed of an antecedent (IF) part and a consequent (THEN) part, both expressed linguistically. The antecedent part consists of a number of fuzzy sets that are linked with fuzzy logic aggregators such as conjunction, disjunction or, more rarely, negation [43]. In the example above, a conjunction (logical AND) is used; it can be described mathematically by so-called t-norms (or, for disjunction, t-conorms) applied to membership functions. The most popular membership functions are Gaussian, triangular and trapezoidal [73]. The consequent part of the fuzzy rules in the so-called Takagi–Sugeno (TS) form is represented by mathematical functions (usually linear). The structure of the TS fuzzy system can also be represented as a neural network with a specific (five-layer) composition (Fig. 1); therefore, these systems are also called neuro-fuzzy (NF). The number of fuzzy rules and of inputs (which in the case of classification problems are also called features or attributes) is also a part of the structure. The first layer consists of neurons corresponding to the membership functions of a particular fuzzy set. This layer takes the inputs, x, and gives as output the degree, μ, to which these fuzzy descriptors are satisfied. The second layer represents the antecedent parts of the fuzzy rules: it takes as inputs the membership function values and gives as output the firing level, τ_i, of the ith rule. The third layer of the network takes as inputs the firing levels of the respective rules and gives as output the normalized firing level, λ_i, computed as a "center of gravity" [43] of the τ_i. As an alternative one can use the "winner takes all" operator, which is usually used in classification, while the "center of gravity" is preferred for time-series prediction and for general system modeling and control. The fourth layer aggregates the antecedent and the consequent part that represents the local sub-systems (singletons or hyper-planes). Finally, the last (fifth) layer forms the total output of the NF system; it performs a weighted summation of the local sub-systems.
Fuzzy system parameters Parameters of the NF system of TS type include the center, c, and spread, σ, of the Gaussians, or the parameters of the triangular (or trapezoidal) membership functions. An example of a Gaussian-type membership function can be given as:

μ = e^(−(1/2)(d/r)²)   (2)

where d denotes the distance between a data sample (a point in the data space) and a prototype/cluster center (the focal point of a fuzzy set); r is the radius of the cluster (the spread of the membership function). Note that the distance can be of Euclidean (the most typical example), Mahalanobis [33], cosine, etc., form. These parameters are associated with the antecedent part of the system. Consequent-part parameters are the coefficients of the (usually) linear functions, singleton coefficients, or coefficients of more complex functions (e.g. exponential) if such are used.

Evolving Fuzzy Systems, Figure 1 Structure of the (neuro-fuzzy) system of TS type

For the linear case the output of the ith local sub-system is:

y_i = a_i0 + a_i1 x_1 + … + a_in x_n   (3)

where a denotes the parameters of the consequent part; x denotes the inputs (features); i is the index of the ith fuzzy rule; n is the number (dimensionality) of the inputs (features).
Potential Potential is a mathematical measure of the data density. It is calculated at a data point, z, and represents numerically the accumulated proximity (density) of the data surrounding this data point. It resembles the probability distribution used in so-called Parzen windows [33] and is described in [26,72] by a Gaussian-like function:

P(z) = e^(−σ²/(2r²))   (4)

where z = [x; y] denotes the joint (input/output) vector and σ_k² = (1/(k−1)) Σ_{i=1}^{k−1} d²(z_k, z_i) is the variance of the data in terms of the cluster center. In [3,9] the Cauchy function is used, which has the same properties as the Gaussian but is suitable for recursive calculations:

P(z) = 1 / (1 + σ²)   (5)

Age of a cluster or fuzzy rule The age of the (evolving) cluster is defined through the accumulated time of appearance of the samples that form the cluster which supports that fuzzy rule:

A^i = k − (1/S_k^i) Σ_{l=1}^{S_k^i} k_l   (6)

where k denotes the current time instant and S_k^i denotes the support of the cluster, that is, the number of data samples (points) that are in the zone of influence of the cluster (formed by its radius). The support is derived by simply counting the data samples at the moment of their arrival (when they are first read) and assigning each to the nearest cluster [10]. The values of A vary from 0 to k, and the derivative of A with respect to time is always less than or equal to 1 [17]. An "old" cluster (fuzzy rule) is one that has not been updated recently; a "young" cluster (fuzzy rule) is one that contains predominantly new or recent samples. The (first and second) derivatives of the age are very informative and useful for the detection of data "shift" and "drift" [17].

Definition of the Subject
Evolving Fuzzy Systems (EFS) are a class of Fuzzy Rule-based (FRB) and Neuro-Fuzzy (NF) systems that have both their parameters and their underlying structure self-adapting, self-developing and self-learning from the data in on-line mode and, possibly, in real-time. The concept was conceived at the beginning of this century [2,5]. Parallel investigations have led to similar developments in neural networks (NN) [41,42]. EFS have the significant advantage, compared to evolving NN, of being linguistically tractable and transparent. EFS have been instrumental in the emergence of new branches of evolving clustering algorithms [3], evolving classifiers [16,51], evolving time-series predictors [9,47], evolving fuzzy controllers [4], evolving fault detectors [30] etc. Over the last years EFS have demonstrated a wide range of applications, spanning from robotics [76] and defense [24] to biomedical [70] and industrial-process [29] data processing in real-time, new generations of self-calibrating, self-adapting sensors [52,53], speech [37] and image processing [56] etc. EFS have the potential to revolutionize such areas as autonomous systems [66], intelligent sensors [45], and early cancer detection and diagnosis; they are instrumental in raising the so-called machine intelligence quotient [74] by developing systems that self-adapt in real-time to a dynamically changing environment and to internal changes that occur in the system itself (e.g. wearing, contamination, performance degradation, faults etc.). Although the terms intelligent and artificial intelligence have been used often during the last several decades, the technical systems that claim to have such features are in reality far from true intelligence. One of the main reasons is that true intelligence is evolving; it is not fixed. EFS are the first mathematical constructs that combine the approximate reasoning typical for humans, represented by the fuzzy inference, with a dynamically evolving structure and the respective formal, mathematically sound learning mechanisms to implement it.

Introduction
Fuzzy Sets and Fuzzy Logic were introduced by Lotfi Zadeh in 1965 in his seminal paper [71].
During the last decades of the previous century there was an increase in the various applications of fuzzy logic-based systems, mainly due to the introduction of fuzzy logic controllers (FLC) by Ebrahim Mamdani in 1975 [54], the introduction of the fuzzily blended linear-systems construct called Takagi–Sugeno (TS) fuzzy systems in 1985 [65], and the theoretical proof that FRB systems are universal approximators (that is, any arbitrary non-linear function in the [0; 1] range can be asymptotically approximated by an FRB system [68]). Historically, FRB systems were first designed based entirely or predominantly on human expert knowledge [54,71]. This offers advantages and was at that time a novel technique for incorporating uncertain, subjective information, preferences, experience and intuition, which are difficult or impossible to describe

otherwise. However, it poses enormous difficulties for the process of designing and for the routine use of these systems, especially in real industrial environments and in on-line and real-time modes. TS fuzzy systems made possible the development of efficient algorithms for their design not only in off-line, but also in on-line mode [14]. This is facilitated by their dual nature: they combine a fuzzy linguistic premise (antecedent) part with a functional (usually linear) consequent part [65]. With the invention of the concept of EFS [2,5] the design problem was completely automated and data-driven. This means that EFS self-develop their model, and respectively their system structure, as well as adapt their parameters, "from scratch" and on the fly, using experimental data and efficient recursive learning mechanisms. Human expert knowledge is not compulsory, not limiting and not essential (especially when it is difficult to obtain in real-time). This does not mean that such knowledge is prohibited or impossible to use; on the contrary, the concept of EFS makes possible the use of such knowledge in the initialization stages, and even during the learning process itself, but this is optional, not essential. Examples of EFS are intelligent sensors for oil refineries [52,53], autonomous self-localization algorithms used by mobile robots [75,76], smart agents for machine health monitoring and prognosis in the car industry [30], smart systems for the automatic classification of images in the CD production process [51] etc. This is a new, promising area of research, and new applications in different branches of industry are emerging.

Evolving Clustering
Data clustering, and fuzzy clustering in particular, are methods for grouping the data based on their similarity, density in the data space, and proximity.
Partitioning of the data into clusters can be done off-line (using a batch set of data and performing iterative computations over this set, minimizing a certain criterion/cost function) or on-line, incrementally. Examples of incremental clustering approaches are the self-organizing maps (SOM) conceived by Teuvo Kohonen in the early 1980s [44], the adaptive resonance theory (ART) conceived by Stephen Grossberg in the same period [25], etc. Clustering is a type of unsupervised learning technique, in which correct examples are not provided. Usually the number of clusters is pre-specified: in SOM the number of nodes of the map is pre-defined; the number of neighbors, k, in the k-nearest-neighbor method [33] is also supposed to be provided; and the number C in the fuzzy c-means (FCM) fuzzy clustering algorithm by Jim Bezdek should also be provided [22]. Usually these approaches rely on a threshold and are very sensitive


to the specific values of this threshold. Most of the existing approaches are also mean-based (i.e. they use the mean of all data or the means of groups of data). The problem is that the mean is a virtual (non-existing and possibly infeasible) point in the data space. In contrast, the evolving clustering method eClustering, conceived in the last decade [3], does not need the number of clusters, the threshold, or any other parameter to be pre-specified. It is parameter-free and starts "from scratch" to cluster the data based on their density distribution alone. It is based on the recursive calculation of the potential (5). eClustering is prototype-based (some of the data points are used as prototypes of cluster centers). The procedure starts from scratch, assuming that the first available data point is the center of a cluster. This assumption is temporary, and if a priori knowledge exists the procedure can start with an initial set of cluster centers that will be further refined. The coordinates of the first cluster center are formed from the coordinates of the first data point, and its potential is set to the ideal value, P_1(z_1) → 1. Starting from the next data point, which is read in real-time, the following steps are performed for each new data point:

- calculate its potential, P_k(z_k);
- update the potential of the existing cluster centers (because their potential has been affected by adding the new data point);
- compare the potential of the new data point with the potential of the previously existing centers.

On the basis of this comparison and of the membership in the existing clusters, one of the following actions is taken: add a new cluster center based on the new data point; OR remove the cluster that describes the new point well and replace it with a cluster formed around the new point, which brings an increment to the potential; OR ignore the point (do not change the cluster structure). The process is illustrated in Fig.
2 for the data of NOx emissions from a car exhaust [13]. One can see that the clustering evolves: the number of clusters increases from two to three, and their positions and radii change. Note that in this experiment only two inputs (features), normalized to the range [0; 1], are used, namely the engine output torque, x_1, in N·m and the pressure in the second cylinder, x_2, in Pa. For more details on eClustering, please consult the papers from the bibliography, especially [2,3,9,16].

Evolving TS Fuzzy Systems
TS fuzzy systems [as illustrated in Fig. 1 and described in a very general form in Eq. (1)] were first introduced in

Evolving Fuzzy Systems, Figure 2 The Evolving Clustering method applied to data concerning NOx emissions; a top plot – after 43 samples are read (after 43 s, since the sampling rate is 1 sample/second, i.e. 1 Hz); b bottom plot – after 124 samples are read (after 124 s)

1985 [65] in the form:

R_i: IF (x_1 is A_1^i) AND … AND (x_n is A_n^i) THEN (y_i = a_i0 + a_i1 x_1 + … + a_in x_n)   (7)

where R_i denotes the ith fuzzy rule (i = 1, …, R); R is the number of fuzzy rules; x = [x_1, x_2, …, x_n]^T is the input vector; A_j^i denotes the antecedent fuzzy sets, j ∈ {1, …, n}; y_i is the output of the ith linear sub-system; and a_il are its parameters, l ∈ {0, …, n}. The structure of the TS system (the number of fuzzy rules), the antecedent part of the rules, the number of inputs etc. are supposed to be known and fixed. The data may be provided to the TS system in an off-line or an on-line manner.


Different data-driven techniques were developed to identify the model that is best in terms of certain (local or global) error-minimization criteria, for example using a (recursive) least-squares technique [65] or genetic algorithms [7,62]. The overall output is found as a weighted sum of the local outputs produced by the fuzzy rules:

y = Σ_{i=1}^{R} λ_i y_i   (8)

where the weights, λ_i, represent the normalized firing levels of the respective fuzzy rules and can be determined by:

λ_i = ( Π_{j=1}^{n} μ_j^i(x_j) ) / ( Σ_{l=1}^{R} Π_{j=1}^{n} μ_j^l(x_j) )   (9)
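Equations (8)–(9) amount to the following computation (a sketch with Gaussian memberships and two made-up rules, not the article's data):

```python
import math

def ts_output(x, rules):
    """Takagi-Sugeno inference per Eqs. (8)-(9).

    Each rule is (centers, radius, coeffs) with coeffs = [a_i0, a_i1, ..., a_in].
    """
    # firing level of each rule: product (t-norm) of Gaussian memberships
    taus = []
    for centers, r, _ in rules:
        tau = 1.0
        for xj, cj in zip(x, centers):
            tau *= math.exp(-0.5 * ((xj - cj) / r) ** 2)
        taus.append(tau)
    total = sum(taus)
    y = 0.0
    for tau, (_, _, coeffs) in zip(taus, rules):
        lam = tau / total                        # normalized firing level, Eq. (9)
        y_i = coeffs[0] + sum(a * xj for a, xj in zip(coeffs[1:], x))
        y += lam * y_i                           # weighted sum, Eq. (8)
    return y

rules = [([0.0], 1.0, [0.0, 1.0]),   # around x = 0: y = x
         ([1.0], 1.0, [1.0, 0.0])]   # around x = 1: y = 1
print(ts_output([0.5], rules))
```

Between the two prototypes the output is a smooth blend of the two local linear models, which is the "fuzzily blended linear systems" character of the TS construct mentioned in the Introduction.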

In vector form the above equations can be represented as:

y = ψ^T θ   (10)

where ψ = [λ_1 x_e^T, λ_2 x_e^T, …, λ_R x_e^T]^T is the vector of weighted extended inputs; x_e = [1, x^T]^T is the extended input vector; θ = [θ_1^T, θ_2^T, …, θ_R^T]^T is the vector of parameters; and

θ_i = | a_01^i  a_02^i  …  a_0m^i |
      | a_11^i  a_12^i  …  a_1m^i |
      | …       …       …  …      |
      | a_n1^i  a_n2^i  …  a_nm^i |

are the parameters of the m local linear sub-systems. The assumption that the structure of the TS fuzzy system has to be known a priori was questioned for the first time in [2,5], and ultimately in [9] with the proposal of evolving TS (eTS) systems. In [15] a further extension of the eTS system was proposed, namely that it can have many outputs; in this way the multi-input-multi-output eTS systems (MIMO-eTS) were introduced. eTS is a very flexible and powerful tool for time-series prediction, prognosis, modeling of non-stationary phenomena, intelligent sensors etc. The algorithm for its learning from streaming data in real-time has two basic phases, which can both be performed very quickly (in the one time step between the arrival of two data samples – the current one and the next one). The learning mechanism proposed in [9] is computationally very efficient because it is fully recursive. The two phases are:
(a) partitioning the data space and, based on it, forming and updating the fuzzy rule-base structure;

(b) learning the parameters of the consequent part of the fuzzy rules.
Note that the partitioning of the data space serves a different purpose in eTS identification than in eClustering. In eTS there are outputs, and the aim is to find such a (perhaps overlapping) clustering of the joint input-output data space that fragments the input-output relationship into locally valid, simpler (possibly linear) dependences. In eClustering the aim is to cluster the input data space into distinctive regions. Other than that, the first phase of eTS model identification is the same as the procedure in the eClustering method described above. The second phase of the learning is parameter identification. It can be performed using a fuzzily weighted version [9] of the well-known recursive least squares (RLS) method [50]. One can perform either local (11) or global (12) identification by minimizing different cost functions [9]:

J_L = Σ_{i=1}^{R} (Y − X^T θ_i)^T Λ_i (Y − X^T θ_i)   (11)

J_G = (Y − ψ^T θ)^T (Y − ψ^T θ)   (12)

where Λ_i denotes a diagonal matrix formed from the firing levels of the ith rule.
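A minimal sketch of one weighted RLS step of the kind used to minimize a criterion like (11), for a single local model with extended input x_e = [1, x] (the synthetic noiseless data, the initialization C = 1000·I and the constant weight λ = 1 are illustrative assumptions, not the article's algorithm):

```python
def wrls_update(theta, C, x_e, y, lam):
    """One fuzzily weighted RLS step for a local linear sub-model (cf. Eq. (11)).

    theta: [a0, a1] parameter estimates; C: 2x2 covariance matrix;
    x_e: extended input [1, x]; y: target; lam: firing level of the rule.
    """
    # Cx = C @ x_e
    Cx = [C[0][0] * x_e[0] + C[0][1] * x_e[1], C[1][0] * x_e[0] + C[1][1] * x_e[1]]
    denom = 1.0 + lam * (x_e[0] * Cx[0] + x_e[1] * Cx[1])
    gain = [lam * Cx[0] / denom, lam * Cx[1] / denom]
    err = y - (theta[0] * x_e[0] + theta[1] * x_e[1])
    theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
    # rank-one downdate of the covariance: C <- C - gain (x_e^T C)
    xC = [x_e[0] * C[0][0] + x_e[1] * C[1][0], x_e[0] * C[0][1] + x_e[1] * C[1][1]]
    C = [[C[i][j] - gain[i] * xC[j] for j in range(2)] for i in range(2)]
    return theta, C

theta, C = [0.0, 0.0], [[1000.0, 0.0], [0.0, 1000.0]]
for k in range(200):
    x = (k % 20) / 10.0
    y = 2.0 + 3.0 * x            # local model to be recovered: a0 = 2, a1 = 3
    theta, C = wrls_update(theta, C, [1.0, x], y, lam=1.0)
print(theta)
```

With noiseless data the estimates converge to the true local parameters; in eTS the same kind of update runs per rule, weighted by that rule's firing level, which is what makes the consequent learning fully recursive and one-pass.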

When a local cost function is used, the result is a better local approximation of the overall non-linear function by the local linear sub-models; the pay-off, however, is a poorer overall approximation. This is compensated by a simpler and computationally more efficient procedure (with a locally valid cost function, covariance matrices of much smaller size can be used, which require much less memory and much less time for the computations) [9].

Evolving Fuzzy Classifiers
Classification is a problem that has been well studied, and a large number of conventional approaches exist to address it [33]. Most of them, however, are designed to operate in batch mode and do not change their structure on-line (they do not capture new patterns that may be present in the streaming data once the classifier is built). Off-line pre-trained classifiers may be good for certain scenarios, but they need to be redesigned or retrained if the circumstances change. There are also so-called incremental (or on-line) classifiers, which work on a "sample-by-sample" basis and only require the features of the current sample plus a small amount of aggregated information (a rule-base and a small number of variables needed for the recursive calculations). They do not require the whole history of the data


stream (all previously seen data samples). Sometimes they are also called one-pass (each sample is processed only once and is then discarded from the memory). FRB systems have been successfully applied to a range of classification tasks including, but not limited to, decision making, fault detection, pattern recognition and image processing [46]. FRB systems have become one of the alternative frameworks for classifier design, together with the more established Bayesian classifiers, decision trees [33], neural network-based classifiers [57] and support vector machines (SVM) [67]. The task of the classifier is to map the set of features of the sample data onto the set of class labels. A particular advantage of FRB classifiers is that they are linguistic in form while also being proven universal approximators [68]. In the framework of the concept of evolving fuzzy systems, a family of evolving fuzzy classifiers, eClass, was proposed in [16,17,51]. The first type of evolving fuzzy classifier, eClass0, has the typical structure of a fuzzy classifier [46], which differs from structure (1) by the consequent part only:

R_i: IF (Feature_1 is close to prototype_1^i) AND … AND (Feature_n is close to prototype_n^i) THEN (ClassLabel_i)   (13)

The output of eClass0, in the same way as that of typical fuzzy classifiers [46], provides the label of the class (0, 1 etc.) directly. In this sense it is not a TS fuzzy system, but is closer to the Mamdani-type fuzzy systems [54]. The main difference of eClass0 from the typical classifiers [46] is its ability to evolve – to expand the set of fuzzy rules – which enables it to capture new data patterns and to adapt to possibly changing characteristics of the streaming data [16,17]. The inference in eClass0 is produced using the so-called "winner takes all" rule [33,46]:

Label = Label_i*,  i* = arg max_{i=1}^{R} ( Π_{j=1}^{n} μ_j^i(x_j) )   (14)
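The winner-takes-all inference of Eq. (14) can be sketched as follows (the Gaussian memberships and the two example rules are assumptions for illustration):

```python
import math

def eclass0_label(x, rules):
    """eClass0-style inference, Eq. (14): return the label of the rule whose
    product of membership degrees over all features is the highest."""
    def firing(rule):
        prototypes, radius, _label = rule
        tau = 1.0
        for xj, pj in zip(x, prototypes):
            tau *= math.exp(-0.5 * ((xj - pj) / radius) ** 2)
        return tau
    return max(rules, key=firing)[2]

# two rules, each carrying a class label in its consequent
rules = [([0.0, 0.0], 0.5, "classA"), ([1.0, 1.0], 0.5, "classB")]
print(eclass0_label([0.9, 0.8], rules))
```

A sample near a rule's prototype inherits that rule's class label directly, which is why eClass0 is fast to train and linguistically transparent.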

It is much easier and faster to build and train eClass0 in real-time, but the results of the classification can be significantly improved if the classifier structure is assumed to be of TS type. eClass is designed for on-line applications, with an evolving (self-developing) FRB structure. The antecedents of the FRB are formed from the data stream around highly descriptive focal points (prototypes) per class in the input-output space. The feature vector, x, is augmented with the class label, L, to form a joint input/output vector z = [x^T, L]^T. The eClustering algorithm is

applied per class (not over all the data). In this way, information granules (primitive forms of knowledge) [36] are formed in real-time around descriptive data samples, represented linguistically by fuzzy sets. This on-line algorithm works similarly to adaptive control [19] and estimation [39]: in the period between two samples, two phases are performed: (1) class prediction (classification); (2) classifier update, or evolution. During the first phase the class label is not known and is being predicted; during the second phase it is known and is used as supervisory information to update the classifier (including the evolution of its structure as well as the update of its parameters). An alternative structure of the fuzzy classifier, eClass1, is based on the TS-type fuzzy system, which has a consequent part of functional type as described in (7). The architecture of eClass1 differs significantly from the architecture of eClass0 and from the typical FRB [46]: it performs a regression over the features. Having in mind that the classification surface in a data stream changes dynamically, the goal of the evolving fuzzy classifier eClass1 is to evolve a rule-base that takes these changes into account by adapting the parameters of the FRB (spreads, consequent parameters) as well as the focal points and the size of the rule-base. The output of each rule is a real value (not an integer as in the typical fuzzy classifiers), which, if normalized, represents the possibility that a data sample is of a certain class [16,17]:

ȳ_i = y_i / Σ_{l=1}^{R} y_l   (15)

The overall output of the classifier is then taken as a weighted average (not winner-takes-all as in typical fuzzy classifiers) of the normalized outputs of the fuzzy rules:

y = Σ_{i=1}^{R} [ ( Π_{j=1}^{n} μ_j^i ) / ( Σ_{l=1}^{R} Π_{j=1}^{n} μ_j^l ) ] ȳ_i   (16)

This output is then used to discriminate between the classes. If the problem has two classes (A and B) then the target values are, obviously, 0 for one of the two classes (e.g. Class A) and 1 for the other one (Class B), or vice versa. To discriminate in this case one can simply use a threshold of 0.5: all the outputs that are above 0.5 are classified as one class, while all the outputs below 0.5 are classified as the other, e.g.:

IF (y > 0.5) THEN (Class B) ELSE (Class A)   (17)
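For the two-class case, (16)–(17) reduce to a thresholded weighted average; a simplified sketch (the per-rule normalization of (15) is omitted here, and the rule outputs and firing levels are invented numbers):

```python
def classify_two_class(rule_outputs, firing_levels, threshold=0.5):
    """Two-class discrimination in the spirit of Eqs. (16)-(17): blend the
    real-valued rule outputs by normalized firing levels, then threshold."""
    total = sum(firing_levels)
    y = sum(f / total * yi for f, yi in zip(firing_levels, rule_outputs))
    # targets: 0 for Class A, 1 for Class B (either assignment works, cf. Eq. (17))
    return ("B" if y > threshold else "A"), y

label, y = classify_two_class([0.1, 0.9], [0.8, 0.2])
print(label, y)
```

The blended output y acts as a possibility-like score between the two targets, so the 0.5 threshold is simply the midpoint between them.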


Evolving Fuzzy Systems, Figure 3 Indirect learning-based control scheme

When the problem has more than two classes, one can apply MIMO eTS, where each of the K outputs corresponds to the possibility that a data sample belongs to a certain class (as discussed above). It is interesting to note that it is also possible to use MIMO eTS for a two-class problem; in order to do this, one needs vectors that represent the target outputs, for example y = [1 0] for Class A and y = [0 1] for Class B, or vice versa. In eClass1-MIMO the label is determined by the highest value of the discriminator, y_l:

Label = Label_i*,  i* = arg max_{l=1}^{K} y_l   (18)

where K denotes the number of classes.

Evolving Fuzzy Controllers
Fuzzy logic controllers have been applied to a range of industrial processes [48,59] around the world, including in home appliances [21]. The structure of the controller, however, is often decided in an ad hoc manner [54,59], and the parameters are tuned off-line using various numerical techniques, such as genetic algorithms [27,32,63]. In reality, however, even if a validation test has been made beforehand, there is no guarantee that a controller designed in this way will perform satisfactorily if the object of control or its environment changes [4]. The reasons could be aging, wearing, a change in the mode of operation, the development of a fault, seasonal changes in the environment etc. An effective mechanism for tackling such problems, known from classical control theory, is adaptation [19]. It is well developed for linear models and controllers [38], but not for the general (very often highly non-linear, complex and uncertain) case [2]. Adaptive control theory assumes a linear model with a fixed structure and adapts the parameters only [19,38]. The concept of evolving fuzzy systems has been applied to the control problem in [2,4] in terms of self-developing the structure of the controller from experimental data in a data-driven manner, based on the indirect adaptive learning scheme proposed initially by Psaltis [60] and developed further using NN by Anderson [1]. The indirect learning (IL) control scheme is based on approximating the inverse dynamics of the plant. It is a model-free concept: it feeds back the integrated (or one-step-delayed) output signal instead of feeding back the error between the plant output and the reference signal, as represented in Fig. 3. Figure 3 represents only the basic concept of the approach. There are two phases, and the switching between them can be represented by an imaginary switch knob, K. When the knob is in position "1" the controller is used and we are in the phase "Control"; when it is in position "2" the controller learns and self-develops, and we are in the phase "Learning". During the supervisory learning phase, the true output signal, y_{k+1}, at the time instant k+1 is fed back and the knob is in position "2". The controller also receives a signal that is the delayed true output, y_k, and it produces as output the value of the control signal, u_k. During the control phase (when the knob is in position "1") the input is determined entirely by the reference signal (ref), used as an alternative to the predicted next-step output, y_{k+1}. In this way, the controller, already trained in the previous learning phases, produces such a control signal, u_k, as brings the output of the plant at the next time step, y_{k+1}, close to the reference signal (ref).

Evolving Fuzzy Systems, Figure 4 Fuzzy sets for different fractions of the crude that contribute to the different quality of the end product (naphtha in this case) can be extracted automatically in real-time from the data stream using eSensor

The IL scheme was taken further in [2,4] by implementing the controller as an evolving FRB system of TS type. The original works realized the controller as a NN that was trained off-line, based on a batch set of training data, on control-action and output triplets of the form [y_k, y_{k+1}, u_k] for k = 1, 2, …, N, where N denotes the number of training data samples. However, learning techniques for NN are iterative, and therefore the training of the NN controller as described in [1,60] is performed off-line. Additionally, NN suffer from the important disadvantage, in comparison to FRB systems, that they are not transparent. In [2,4] the basic scheme of IL control is taken further by adding a disturbance and by using eTS to realize the controller. This scheme was implemented on temperature-control problems.

Application Case Studies
Self-Calibrating Intelligent Sensors for Process Industries
So-called intelligent or inferential sensors have been adopted by the process industries (chemical, petro-chemical, manufacturing) for several decades [31]. The main reason is that they provide accurate real-time estimates of parameters that are otherwise difficult to measure, or can substitute for expensive measurements, such as gas emissions, biomass, melt index etc. They use as inputs the available ("hard") sensors for easy-to-measure physical variables, such as temperatures, pressures, and flows

which are also cheaper. The main disadvantage of the currently existing inferential or “soft” sensors is that significant efforts must be made, based on batch sets of data, to develop and maintain the mathematical models that support them (neural networks, statistical models, etc.). Any process change outside the conditions used for off-line model development can lead to significant performance deterioration which, in turn, requires maintenance and recalibration. Evolving fuzzy systems offer an effective opportunity to develop “soft” sensors that are more flexible, self-calibrating, and thus more “intelligent” [45]. Several applications of EFS-based soft sensors, in particular for oil refineries [52,53] and propylene production [12], were reported. An important advantage of the evolving sensors is that they extract human-interpretable knowledge in the form of linguistic fuzzy rules. For example, Fig. 4 illustrates membership functions of the fuzzy sets (with values in the range [0,1] on the vertical axis) that describe the density of the crude, d, in grams per liter (g/l) on the horizontal axis. The evolving fuzzy sensor (eSensor) implemented in the oil refinery at Santa Cruz, Tenerife, Spain predicts in real-time the temperature T_hn (in degrees Celsius) at which the heavy naphtha (hn) evaporates 95% liquid volume according to the ASTM D86-04b standard, based on real-time measurements of:

- The pressure of the tower, p, measured in kg/cm² (gauge)
- The amount of the product taken off, P, represented in %
- The density of the crude, d, in g/l (illustrated in Fig. 4)


Evolving Fuzzy Systems, Figure 5 Fuzzy rule describing a Landmark (underpass at Lancaster University campus) that was discovered automatically by eClustering using video streaming data

- Temperature of the column overhead, T_co, in °C
- Temperature of the naphtha extraction, T_ne, in °C

An expert in the area of oil-refining processes can easily visually distinguish between the heavy crude and the light crude represented by the respective membership functions derived automatically from the data in real-time.

Adaptive Real-Time Classifiers (Image, Landmark, Robotic)

Presently, the data to be processed in industry, in defense, and in other real-life applications are not only huge in volume, but very often arrive in the form of data streams [28]. This requires not only precise classifiers, but also dynamically evolvable classifiers. For example, in mobile robotics, an autonomous vehicle produces a video stream that needs to be processed while the vehicle operates in a completely unknown environment [75,76]. The evolving clustering method was used to automatically generate a fuzzy rule base that describes the landmarks discovered without any prior learning, “on the fly”, by a mobile robot Pioneer 3DX [58] exploring a completely unknown environment [76]. The mobile robot uses its on-board pan-tilt zoom camera to produce a video stream. The frames were grabbed and processed by the eClustering algorithm based on a 3-dimensional color vector (R, G, B). Fuzzy rules of the form shown in Fig. 5 were extracted from the data automatically. Note that the landmarks that were identified automatically represented real objects on the route of the mobile robot, such as the underpass of Lancaster University from the example. They were identified as distinctive from the surrounding background by eClustering. The fuzzy rule base (Fig. 5) evolved “from scratch” based on the video information and the data distribution only.

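The cluster-creation behaviour described above can be sketched in a few lines (a simplified, distance-threshold variant run on synthetic colour vectors; the real eClustering algorithm uses a recursively computed potential/density measure, and the radius value here is purely illustrative):

```python
import numpy as np

def evolve_clusters(pixels, radius=80.0):
    """One-pass evolving clustering: start 'from scratch' and create a new
    cluster (i.e., a new fuzzy rule) whenever a sample is far from every
    existing cluster centre; otherwise update the winning centre recursively."""
    centers, counts = [], []
    for x in pixels:
        if not centers:
            centers.append(x.astype(float))
            counts.append(1)
            continue
        d = [np.linalg.norm(x - c) for c in centers]
        j = int(np.argmin(d))
        if d[j] > radius:                # novel colour region -> new cluster
            centers.append(x.astype(float))
            counts.append(1)
        else:                            # recursive, sample-by-sample update
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    return np.array(centers)

# synthetic video frames: dark background pixels vs. a bright landmark
rng = np.random.default_rng(1)
background = rng.normal([40, 50, 40], 5, size=(300, 3))   # (R, G, B) vectors
landmark = rng.normal([200, 180, 160], 5, size=(100, 3))
centers = evolve_clusters(np.vstack([background, landmark]))
print(len(centers))   # two clusters emerge, one per colour population
```

Each cluster centre found this way would serve as the focal point of one fuzzy rule, in the spirit of the landmark example above.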
Predictive Models (Air-Conditioning, Financial Time-Series, Benchmark Data Sets)

There are different techniques that can be used for predictive models, such as ARMAX models [50], neural networks [34], and non-evolving (fixed-structure) fuzzy rule-based models [65,73]. Evolving fuzzy systems, however, additionally offer the capability to have a predictor that evolves following the dynamic changes of the data by gradually adapting not only its parameters, but also its structure. In [6] the problem of predicting the characteristic temperature difference across a coil in a heat exchanger of an air-conditioning unit installed in a real building in Iowa, USA is considered. The evolving fuzzy rule-based model develops its structure and parameters based on the data of the flow rate entering the coil, the moisture content of the air entering the coil, the temperature of the chilled water, and the control signal to the valve, as illustrated in Fig. 6. The model proved to work satisfactorily in all season conditions due to its ability to adapt to the changes in the environment (different seasons), as demonstrated in Fig. 6c. When pre-trained (based on 400 samples) and then fixed in both structure and parameters, the performance deteriorated unacceptably in changing seasonal conditions, as seen in Fig. 6a. A partial re-training significantly improved the results (Fig. 6b), but the model was still less valid when the season changed (at around sample 912) after the model structure evolution had stopped (at sample 1000). When the model structure evolution continued uninterrupted, the result was a satisfactory performance in all seasons, as seen in Fig. 6c.

Evolving Fuzzy Systems, Figure 6 A predictive model of the characteristic temperature difference across a coil in a heat exchanger of an air-conditioning unit installed in a real building in Iowa, USA. a An off-line pre-trained model used in two different seasons (summer and spring); b Evolving FRB model trained and used during the summer (up to sample 1000) and then having its structure fixed during the spring season; c Evolving FRB model left to evolve (self-develop) during the whole period of usage (both spring and summer seasons)

Fault Detection and Prognostics

Evolving clustering and eTS fuzzy systems were applied at Ford Motor Company to machine health monitoring and prognosis in [30]. The ability of the evolving clustering method to form new clusters that represent different operating modes of a machine was exploited, and different types of faults (incipient or drastic) were automatically identified based on the difference in the cluster formation. A prediction of the direction of movement of the cluster centers was used for prediction of possible faults and of the end-of-life of the machine.

Speech Signal Reconstruction

The eTS fuzzy system is used in [37] for error concealment in next-generation Voice over Internet Protocol (VoIP) communication receivers. It is used in combination with parametric speech coders of the analysis-by-synthesis type. eTS MIMO [15] is used to predict the missing values of the line spectral pairs (LSP) that allow one to reconstruct the packets lost in transmission. The eTS fuzzy model used ten inputs (current LSP parameter values) and ten outputs (LSP values predicted one step/20 milliseconds ahead). This research was a joint work between Lancaster University and Nokia-UK and aimed at the development of next-generation intelligent decoders at the receiver that will be able to conceal lost packets with a size of 80 to 160 ms without significant deterioration of the quality of service (QoS) in VoIP transmission.

Future Directions

The area of evolving fuzzy systems is in its infancy, yet it has already demonstrated remarkable success in addressing some of the most vibrant issues of the development, application and implementation of truly intelligent systems in a wide variety of branches of industry and real-life problems [11]. It opens the door for future developments in the areas of autonomous systems, early cancer diagnosis and prognosis of its progression, and even the identification of structural changes in biological cells that correspond to the evolution of the disease. In the area of intelligent self-maintaining sensors, the process industry can benefit from more flexible and smarter solutions. The problems that are yet to be addressed, and that can mark the future development of this vibrant area, are: (1) collaboration aspects between two or more evolving fuzzy system-based intelligent systems (autonomous robots, intelligent sensors, etc.); (2) further flexibility of the systems in terms of real-time self-analysis, optimal feature and input selection, rule aggregation mechanism adaptation, etc.; (3) even more flexible system structure architectures, such as hierarchical and decentralized ones; (4) more robust learning algorithms that take care of missing data, different sampling intervals, etc. From a broader perspective, the future developments of this discipline will influence and are closely related to similar developments in the area of communication networks (self-adaptive networks [64]), self-validating soft sensors [61], autonomous aerial, ground-based, and underwater vehicles [20,40,49], etc. The area is closely related to developments in the areas of neural networks [34,48], so-called autonomous mental development [23], cognitive psychology [55], and mining data streams [28]. One


can also expect more hardware implementations (the first hardware implementation of eClustering was reported in 2005 [8]). From the point of view of mathematical fundamentals and learning, it is also closely related to adaptive filter theory [69], and the recent developments in particle filters [17] will certainly influence the future, more efficient techniques that will be developed in this emerging and highly promising branch of research.

Acknowledgments

The author would like to thank Mr. Xiaowei Zhou for his assistance in producing the illustrative material, Dr. Jose Macias Hernandez for kindly providing real data from the oil refinery CEPSA, Santa Cruz, Tenerife, Spain, Dr. Richard Buswell, Loughborough University and ASHRAE (RP-1020) for the real air-conditioning data, and Dr. Edwin Lughofer from Johannes Kepler University of Linz, Austria for providing real data from car engines.

Bibliography

Primary Literature

1. Andersen HC, Teng FC, Tsoi AC (1994) Single Net Indirect Learning Architecture. IEEE Trans Neural Netw 5:1003–1005 2. Angelov P (2002) Evolving Rule-based Models: A Tool for Design of Flexible Adaptive Systems. Springer, Heidelberg 3. Angelov P (2004) An Approach for Fuzzy Rule-base Adaptation using On-line Clustering. Int J Approx Reason 35(3):275–289 4. Angelov PP (2004) A Fuzzy Controller with Evolving Structure. Inf Sci 161:21–35 5. Angelov P, Buswell R (2001) Evolving Rule-based Models: A Tool for Intelligent Adaptation. In: Proc of the Joint 9th IFSA World Congress and 20th NAFIPS Intern Conf, Vancouver, 25–28 July 2001. IEEE Press, USA, pp 1062–1066 6. Angelov P, Buswell R (2002) Identification of Evolving Rule-based Models. IEEE Trans Fuzzy Syst 10(5):667–677 7. Angelov P, Buswell R (2003) Automatic Generation of Fuzzy Rule-based Models from Data by Genetic Algorithms. Inf Sci 150(1/2):17–31 8. Angelov P, Everett M (2005) EvoMap: On-Chip Implementation of Intelligent Information Modelling using EVOlving MAPping.
Lancaster University, Lancaster, pp 1–15 9. Angelov P, Filev D (2004) An approach to on-line identification of evolving Takagi–Sugeno models. IEEE Trans Syst Man Cybern part B Cybern 34(1):484–498 10. Angelov P, Filev D (2005) Simpl_eTS: A Simplified Method for Learning Evolving Takagi–Sugeno Fuzzy Models. In: Proc of The 2005 IEEE Intern. Conf. on Fuzzy Systems FUZZ-IEEE – 2005, Reno 2005, pp 1068–1073 11. Angelov P, Filev D, Kasabov N, Cordon O (eds) (2006) Evolving Fuzzy Systems. Proc of the 2nd Int Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 Sept 2006. pp 1–350 12. Angelov P, Kordon A, Zhou X (2008) Adaptive Inferential Sensors based on Evolving Fuzzy Models: An Industrial Case Study. IEEE Trans Fuzzy Syst (under review)

13. Angelov P, Lughofer E, Klement PE (2005) Two Approaches for Data – Driven Design of Evolving Fuzzy Systems: eTS and FLEXFIS. In: Proc of The 2005 North American Fuzzy Information Processing Society, NAFIPS Annual Conference, Ann Arbor, June 2005, pp 31–35 14. Angelov P, Victor J, Dourado A, Filev D (2004) On-line evolution of Takagi–Sugeno Fuzzy Models. In: Proc of the 2nd IFAC Workshop on Advanced Fuzzy and Neural Control, Oulu, 16– 17 Sept 2004, pp 67–72 15. Angelov P, Xydeas C, Filev D (2004) On-line Identification of MIMO Evolving Takagi–Sugeno Fuzzy Models. In: Proc of the Intern. Joint Conf. on Neural Networks and Intern. Conf. on Fuzzy Systems, IJCNN-FUZZ-IEEE, Budapest, 25–29 July 2004, pp 55–60 16. Angelov P, Zhou X, Klawonn F (2007) Evolving Fuzzy Rulebased Classifiers. In: Proc of the First 2007 IEEE International Conference on Computational Intelligence Applications for Signal and Image Processing – a part of the IEEE Symposium Series on Computational Intelligence, SSCI-2007, Honolulu, 1–5 April 2007, pp 220–225 17. Angelov P, Zhou X, Lughofer E, Filev D (2007) Architectures of Evolving Fuzzy Rule-based Classifiers. In: Proc of the 2007 IEEE International Conference on Systems, Man, and Cybernetics, Montreal, 7–10 Oct 2007, pp 2050–2055 18. Arulampalam MS, Maskell S, Gordon N (2002) A Tutorial on Particle Filters for On-line Nonlinear Non-Gaussian Bayesian Tracking. IEEE Trans Signal Process 50(2):174–188 19. Astroem KJ, Wittenmark B (1994) Adaptive Control. Prentice Hall, Upper Saddle River 20. Azimi-Sadjadi MR, Yao D, Jamshidi AA, Dobeck GJ (2002) Underwater Target Classification in Changing Environments Using an Adaptive Feature Mapping. IEEE Trans Neural Netw 13(5):1099–1111 21. Badami VV, Chbat NW (1998) Home appliances get smart. IEEE Spectrum 35(8):36–43 22. Bezdek J (1974) Cluster Validity with Fuzzy Sets. J Cybern 3(3):58–71 23. Bonarini A, Lazaric A, Restelli M, Vitali P (2006) Self-Development Framework for Reinforcement Learning Agents. 
In: Proc of the 5th Intern. Conf. on Development and Learning, ICDL06 New Delhi 24. Carline D, Angelov PP, Clifford R (2005) Agile Collaborative Autonomous Agents for Robust Underwater Classification Scenarios. In: the Proceedings of the Underwater Defense Technology Conference, Amsterdam, June 2005 25. Carpenter GA, Grossberg S (2003) Adaptive Resonance Theory. In: Arbib MA (ed) The Handbook of Brain Theory and Neural Networks, 2nd edn. MIT Press, Cambridge, pp 87–90 26. Chiu SL (1994) Fuzzy model identification based on cluster estimation. J Intell Fuzzy Syst 2:267–278 27. Cordon O, Gomide F, Herrera F, Hoffmann F, Magdalena L (2004) Ten years of genetic fuzzy systems: Current framework and new trends. Fuzzy Sets Syst 141(1):5–31 28. Domingos P, Hulten G (2001) Catching up with the data: Research issues in mining data streams. In: Proc of the Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara 29. Filev D, Larson T, Ma L (2000) Intelligent Control for Automotive Manufacturing – Rule-based Guided Adaptation. In: Proc of the IEEE Conference on Industrial Electronics, IECON-2000, Nagoya Oct 2000, pp 283–288




30. Filev D, Tseng F (2006) Novelty detection-based Machine Health Prognostics. In: Proc of the 2006 Int Symposium on Evolving Fuzzy Systems. IEEE Press, USA, pp 193–199 31. Fortuna L, Graziani S, Rizzo A, Xibilia MG (2007) Soft Sensors for Monitoring and Control of Industrial Processes. Springer, London 32. Goldberg DE (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading 33. Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, Heidelberg 34. Hornby AS (1974) Oxford Advanced Learner's Dictionary. Oxford University Press, Oxford 35. Huang G-B, Saratchandran P, Sundarajan N (2005) A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans Neural Netw 16(1):57–67 36. Ishibuchi H, Nakashima T, Nii M (2004) Classification and Modeling with Linguistic Granules: Advanced Information Processing. Springer, Berlin 37. Jones E, Angelov P, Xydeas C (2006) Recovery of LSP Coefficients in VoIP Systems using Evolving Takagi–Sugeno Fuzzy MIMO Models. In: Proc of the 2006 Intern Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 Sept 2006, pp 208–214 38. Kailath T, Sayed AH, Hassibi B (2000) Linear Estimation. Prentice Hall, Upper Saddle River 39. Kalman RE (1960) A New Approach to Linear Filtering and Prediction Problems. Transactions of the American Society of Mechanical Engineering, ASME, Ser. D. J Basic Eng 8:34–45 40. Kanakakis V, Valavanis KP, Tsourveloudis NC (2004) Fuzzy-Logic Based Navigation of Underwater Vehicles. J Intell Robotic Syst 40:45–88 41. Kasabov N (2001) Evolving fuzzy neural networks for on-line supervised/unsupervised, knowledge-based learning. IEEE Trans Syst Man Cybern part B Cybern 31:902–918 42. Kasabov N, Song Q (2002) DENFIS: Dynamic Evolving Neural-Fuzzy Inference System and Its Application for Time-Series Prediction. IEEE Trans Fuzzy Syst 10(2):144–154 43.
Klir G, Folger T (1988) Fuzzy Sets, Uncertainty and Information. Prentice Hall, Englewood Cliffs 44. Kohonen T (1995) Self-Organizing Maps. Series in Inf Sci, vol 30. Springer, Heidelberg 45. Kordon A (2006) Inferential Sensors as Potential Application Area of Intelligent Evolving Systems. 2006 International Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 September 2006, keynote presentation 46. Kuncheva L (2000) Fuzzy Classifiers. Physica, Heidelberg 47. Leng G, McGinnity TM, Prasad G (2005) An approach for online extraction of fuzzy rules using a self-organizing fuzzy neural network. Fuzzy Sets Syst 150(2):211–243 48. Lin F-J, Lin C-H, Shen P-H (2001) Self-constructing fuzzy neural network speed controller for permanent-magnet synchronous motor drives. IEEE Trans Fuzzy Syst 9(5):751–759 49. Liu PX, Meng MQ-X (2004) On-line Data-Driven Fuzzy Clustering with Applications to Real-time Robotic Tracking. IEEE Trans Fuzzy Syst 12(4):516–523 50. Ljung L (1987) System Identification: Theory for the User. Prentice-Hall, New Jersey

51. Lughofer E, Angelov P, Zhou X (2007) Evolving Single- and Multi-Model Fuzzy Classifiers with FLEXFIS-Class. In: Proc of the 2007 IEEE International Conference on Fuzzy Systems, London, 23–26 July 2007, pp 363–368 52. Macias J, Angelov P, Zhou X (2006) Predicting quality of the crude oil distillation using evolving Takagi–Sugeno fuzzy models. In: Proc of the 2006 International Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 Sept 2006, pp 201–207 53. Macias-Hernandez JJ, Angelov P, Zhou X (2007) Soft Sensor for Predicting Crude Oil Distillation Side Streams using Takagi–Sugeno Evolving Fuzzy Models. In: Proc of the 2007 IEEE Int Conf on Syst, Man, and Cybernetics, Montreal, 7–10 Oct 2007, pp 3305–3310 54. Mamdani EH, Assilian S (1975) An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller. Int J Man-Mach Stud 7:1–13 55. Massaro DW (1991) Integration versus Interactive Activation: The Joint Influence of Stimulus and Context in Perception. Cogn Psychol 23:558–614 56. Memon MA, Angelov P, Ahmed H (2006) An Approach to Real-Time Color-based Object Tracking. In: Proc 2006 International Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 Sept 2006, pp 81–87 57. Nauck D, Kruse R (1997) A Neuro-fuzzy method to learn fuzzy classification rules from data. Fuzzy Sets Syst 89:277–288 58. Pioneer-3DX (2004) User Guide. ActiveMedia Robotics, Amherst 59. Procyk TJ, Mamdani EH (1979) A linguistic self-organizing process controller. Automatica 15:15–30 60. Psaltis D, Sideris A, Yamamura AA (1988) A Multilayered Neural Network Controller. IEEE Control Syst Mag 8:17–21 61. Qin SJ, Yue H, Dunia R (1997) Self-validating inferential sensors with application to air emission monitoring. Ind Eng Chem Res 36:1675–1685 62. Setnes M, Roubos H (2000) GA-fuzzy modeling and classification: complexity and performance. IEEE Trans Fuzzy Syst 8(5):509–522 63.
Shimojima K, Fukuda T, Hasegawa Y (1995) Self-Tuning Modeling with Adaptive Membership Function, Rules, and Hierarchical Structure based on Genetic Algorithm. Fuzzy Sets Syst 71:295–309 64. Sifalakis M, Hutchison D (2004) From Active Networks to Cognitive Networks. In: Proc of the ICRC Dagstuhl Seminar 04411 on Service Management and Self-Organization in IP-based Networks, Wadern, October 2004 65. Takagi T, Sugeno M (1985) Fuzzy identification of systems and its application to modeling and control. IEEE Trans Syst Man Cybern B – Cybern 15:116–132 66. Valavanis K (2006) Unmanned Vehicle Navigation and Control: A Fuzzy Logic Perspective. In: Proc of the 2006 International Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 Sept 2006, pp 200–207 67. Vapnik VN (1998) The Statistical Learning Theory. Springer, Berlin 68. Wang L-X (1992) Fuzzy Systems are Universal Approximators. In: Proc of the First IEEE International Conference on Fuzzy Systems, FUZZ-IEEE – 1992, San Diego, pp 1163–1170 69. Widrow B, Stearns S (1985) Adaptive Signal Processing. Prentice Hall, Englewood Cliffs 70. Xydeas C, Angelov P, Chiao S, Reoullas M (2006) Advances in EEG Signals Classification via Dependent HMM models and
Evolving Fuzzy Classifiers. Int J Comput Biol Medicine, special issue on Intell Technol Bio-Inform Medicine 36(10):1064–1083 71. Yager R (2006) Learning Methods for Intelligent Evolving Systems. In: Proc 2006 International Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 Sept 2006, pp 3–7 72. Yager RR, Filev DP (1993) Learning of Fuzzy Rules by Mountain Clustering. In: Proc of the SPIE Conf on Application of Fuzzy Logic Technology, Boston, pp 246–254 73. Yager RR, Filev DP (1994) Essentials of Fuzzy Modeling and Control. Wiley, New York 74. Zadeh LA (1993) Soft Computing. Introductory Lecture for the 1st European Congress on Fuzzy and Intelligent Technologies EUFIT'93, Aachen, pp vi–vii 75. Zhou X-W, Angelov P (2006) Real-Time Joint Landmark Recognition and Classifier Generation by an Evolving Fuzzy System. In: Proc of the 2006 IEEE World Congress on Computational Intelligence, WCCI-2006, Vancouver, 16–21 July 2006, pp 6314–6321 76. Zhou X, Angelov P (2007) An approach to autonomous self-localization of a mobile robot in completely unknown environment using evolving fuzzy rule-based classifier. In: Proc of the First 2007 IEEE Int Symposium on Computational Intelligence Applications for Defense and Security – a part of the IEEE Symposium Series on Computational Intelligence, SSCI-2007, Honolulu, 1–5 April 2007, pp 131–138

Books and Reviews

Angelov P, Xydeas C (2006) Fuzzy Systems Design: Direct and Indirect Approaches. Int J Soft Comput, special issue on New Trends in Fuzzy Modeling part I: Novel Approaches 10(9):836–849 Angelov P, Zhou X (2006) Evolving fuzzy systems from data streams in real-time. In: Proc 2006 International Symposium on Evolving Fuzzy Systems, Ambleside, 7–9 Sept 2006, pp 29–35 Bentley PJ (2000) Evolving Fuzzy Detectives: An Investigation into the Evolution of Fuzzy Rules. In: Suzuki, Roy, Ovaska, Furuhashi, Dote (eds) Soft Computing in Industrial Applications. Springer, London Chan Z, Kasabov N (2004) Evolutionary computation for on-line and off-line parameter tuning of evolving fuzzy neural networks. Int J Comput Intell Appl 4(3):309–319 Fayyad UM, Piatetsky-Shapiro G, Smyth P (1996) From Data Mining to Knowledge Discovery: An Overview, Advances in Knowledge Discovery and Data Mining. MIT Press, Boston Fritzke B (1994) Growing cell structures – a self-organizing network for unsupervised and supervised learning. Neural Netw 7(9):1441–1460 Futschik M, Reeve A, Kasabov N (2003) Evolving connectionist systems for knowledge discovery from gene expression data of cancer tissue. Artif Intell Med 28:165–189

Höppner F, Klawonn F (2000) Obtaining interpretable fuzzy models from fuzzy clustering and fuzzy regression. In: Proc of the 4th Intern Conf on Knowledge-based Intelligent Engineering Systems (KES), Brighton, pp 162–165 Huang G-B, Saratchandran P, Sundarajan N (2005) A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans Neural Netw 16(1):57–67 Huang L, Song Q, Kasabov N (2005) Evolving Connectionist Systems Based Role Allocation of Robots for Soccer Playing. In: Proc of the Joint 2005 International Symposium on Intelligent Control and 13th Mediterranean Conference on Control and Automation, ISIC – MED – 2005, Limassol, 27–29 June 2005 Jang JSR (1993) ANFIS: Adaptive Network-based Fuzzy Inference Systems. IEEE Trans Syst Man Cybern B – Cybern 23(3):665–685 Juang C-F, Lin X-T (1999) A recurrent self-organizing neural fuzzy inference network. IEEE Trans Neural Netw 10:828–845 Kasabov N (2006) Adaptation and Interaction in Dynamical Systems: Modelling and Rule Discovery Through Evolving Connectionist Systems. Appl Soft Comput 6(3):307–322 Kasabov N (2006) Evolving connectionist systems: Brain-, gene-, and quantum inspired computational intelligence. Springer, London Kasabov N, Chan Z, Song Q, Greer D (2005) Evolving neuro-fuzzy systems with evolutionary parameter self-optimisation. In: Do Adaptive Smart Systems exist? Series Study in Fuzziness, vol 173. Physica, Heidelberg Kim K, Baek J, Kim E, Park M (2005) TSK Fuzzy model based on-line identification. In: Proc of the 11th International Fuzzy Systems Association, IFSA World Congress, Beijing, pp 1435–1439 Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proc of the 7th International Conference on Machine Learning (ICML). Morgan Kaufman, Stanford University, pp 487–494 Marin-Blazquez JG, Shen Q (2002) From approximative to descriptive fuzzy classifiers. IEEE Trans Fuzzy Syst 10(4):484–497 Marshall MR, Song Q, Ma TM, MacDonell S, Kasabov N (2005) Evolving Connectionist System versus Algebraic Formulae for Prediction of Renal Function from Serum Creatinine. Kidney Int 6:1944–1954 Ozawa S, Pang S, Kasabov N (2004) A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier. Lecture Notes in Artificial Intelligence LNAI, vol 3157. Springer, Berlin, pp 231–240 Pang S, Ozawa S, Kasabov N (2005) Incremental Linear Discriminant Analysis for Classification of Data Streams. IEEE Trans Syst Man Cybern B – Cybern 35(5):905–914 Platt J (1991) A resource-allocating network for function interpolation. Neural Comput 3(2):213–225 Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101


Extreme Value Statistics

Extreme Value Statistics
MARIO NICODEMI
Department of Physics, University of Warwick, Coventry, UK

Article Outline

Glossary
Definition of the Subject
Introduction
The Extreme Value Distributions
The Generalized Extreme Value Distribution
Domains of Attraction and Examples
Future Directions
Bibliography

Glossary

Random variable When a coin is tossed, two random outcomes are permitted: head or tail. These outcomes can be mapped to numbers in a process which defines a 'random variable': for instance, 'head' and 'tail' could be mapped respectively to +1 and −1. More generally, any function mapping the outcomes of a random process to real numbers is called a random variable [15]. More technically, a random variable is any function from a probability space to some measurable space, i.e., the space of admitted values of the variable, e.g., the real numbers with the Borel σ-algebra. The amount of rainfall in a day or the daily price variation of a stock are two more examples. It is worth stressing that, formally, the outcome of a given random experiment is not a random variable: the random variable is the function describing all the possible outcomes as numbers. Finally, two random variables are said to be independent when the outcome of either of them has no influence on the other.

Probability distribution The probability of either outcome, 'head' or 'tail', in tossing a coin is 50%. Similarly, a discrete random variable, X, with values {x1, x2, ...} has an associated discrete probability distribution of occurrence {p1, p2, ...}. More generally, for a random variable on the real numbers, X, the corresponding probability distribution [15] is the function returning the probability of finding a value of X within a given interval [x1, x2] (where x1 and x2 are real numbers): Pr[x1 ≤ X ≤ x2]. In particular, the random variable, X, is fully characterized by its cumulative distribution function, F(x), which is F(x) = Pr[X < x] for any x in R.
The probability distribution density, f(x), can often be defined as the derivative of F(x): f(x) = dF(x)/dx.
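As a purely illustrative numerical check of these definitions, the cumulative distribution function can be estimated from samples of a random variable as the fraction of observations falling below x (hypothetical exponential data, for which the exact form F(x) = 1 − e^(−x) is known):

```python
import numpy as np

# Empirical estimate of the cumulative distribution function F(x) = Pr[X < x]
# from samples of an exponential random variable, where F(x) = 1 - exp(-x).
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=100_000)

def F_empirical(x):
    """Fraction of samples falling below x."""
    return np.mean(X < x)

for x in (0.5, 1.0, 2.0):
    print(f"F({x}) = {F_empirical(x):.3f}, exact {1 - np.exp(-x):.3f}")
```

With 100,000 samples the empirical and exact values agree to about two decimal places.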

The probability distribution of two independent random variables, X and Y, is the product of the distributions, F_X and F_Y, of X and Y: F(x, y) ≡ Pr[X < x, Y < y] = Pr[X < x] · Pr[Y < y] = F_X(x) · F_Y(y).

Expected value The expected value [15] of a random variable is its average outcome over many independent experiments. Consider, for instance, a discrete random variable, X, with values in the set {x1, x2, ...} and the corresponding probabilities for each of these values {p1, p2, ...}. In probability theory, the expected, or average, value of X (denoted E(X)) is just the sum E(X) = Σ_i x_i p_i. For instance, if you have an asset which can give two returns {x1, x2} with probabilities {p1, p2}, its expected return is x1 p1 + x2 p2. In case we have a random variable defined on the real numbers and F(x) is its probability distribution function, the expected value of X is E(X) = ∫ x dF. As for some F(x) the above integral may not exist, the 'expected value' of a random variable is not always defined.

Variance and moments The variance [15] of a probability distribution is a measure of the average deviations from the mean of the related random variable. In probability theory, the variance is usually defined as the mean squared deviation, E((X − E(X))²), i.e., the expected value of (X − E(X))². The square root of the variance is named the standard deviation and is a more sensible measure of the fluctuations of X around E(X). Like E(X), for some distributions the variance may not exist. In general, the expected value of the kth power of X, E(X^k), is called the kth moment of the distribution.

The Central limit theorem The Central Limit Theorem [15] is a very important result in probability theory stating that the sum of N independent identically-distributed random variables, with finite average and variance, has a Gaussian probability distribution in the limit N → ∞, irrespective of the underlying distributions of the random variables.
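The theorem is easy to verify numerically (a small sketch using uniform variables, which have mean 1/2 and variance 1/12):

```python
import numpy as np

# Standardized sums of N i.i.d. uniform random variables approach a standard
# Gaussian: the mean tends to 0, the variance to 1, and about 68% of the
# standardized sums fall within one standard deviation of the mean.
rng = np.random.default_rng(0)
N, trials = 200, 20_000
X = rng.uniform(0.0, 1.0, size=(trials, N))          # mean 1/2, variance 1/12
S = (X.sum(axis=1) - N * 0.5) / np.sqrt(N / 12.0)    # standardized sums

print(f"mean = {S.mean():.3f}, variance = {S.var():.3f}")   # close to 0 and 1
print(f"within one sd: {np.mean(np.abs(S) < 1.0):.3f}")     # close to 0.683
```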
The domain of attraction of the Gaussian as a limit distribution is, thus, very large and can explain why the Gaussian is so frequently encountered. The theorem is, in practice, very useful since many real random processes have a finite average and variance and are approximately independent.

Definition of the Subject

Extreme value theory is concerned with the statistical properties of the extreme events related to a random variable (see Fig. 1), and the understanding and applications of


their probability distributions. The methods and the practical use of such a theory have been developed in the last 60 years, though many complex real-life problems have only recently been tackled. Many disciplines use the tools of extreme value theory, including meteorology, hydrology, ocean wave modeling, and finance, to name just a few. For example, in economics, extreme value theory is currently used by actuaries to evaluate and price insurance against the probability of rare but financially catastrophic events. Another application is the estimation of Value at Risk. In hydrology, the theory is applied by environmental risk agencies to calculate, for example, the height of sea-walls to prevent flooding. Similarly, extreme value theory is also used to set strength boundaries in engineering materials, as well as for material fatigue and reliability in buildings (e.g., bridges, oil rigs), and for estimating pollution levels. This paper aims to give a simple, self-contained introduction to the motivations and basic ideas behind the development of extreme value theory, and briefly covers a few more technical topics such as extreme r-order statistics and the generalization of extreme value distribution theory. We refer to textbooks on probability theory, such as [15] (or, for simplicity, to the Glossary), for the definitions of the basic notions of probability used here. The extensive research on extreme value statistics is reviewed in excellent books published over the past years, e.g., [6,11,12,13,14,17,18,21,24]; we provide here only an overview of the basic concepts and tools. To make this paper as self-contained as possible, the Glossary gives a beginner's introduction to all the elementary notions of probability theory encountered in the following sections.

Extreme Value Statistics, Figure 1 We show three samples, 1 to 3, each with N realizations of a random variable X_i (i ∈ {1, ..., N}). Extreme value theory is concerned with the statistical properties of occurrence of the extreme values in those samples, such as the maxima (circled points)

Introduction

Extreme events, exceeding the typical expected value of a random variable, can have substantial relevance to problems arising in disciplines as diverse as science, engineering, and economics. Extreme value theory is a sub-field of applied statistics, developed early on [13,14,17,18] by mathematicians such as Fisher, Tippett, Gnedenko, and, in particular, Emil Julius Gumbel, dealing precisely with the problems related to extreme events (see Fig. 1). One of its key points is the so-called 'three types theorem', relating the properties of the probability distribution of the underlying stochastic variable to its extreme value distributions, i.e., the limiting distributions for the extreme (minimum or maximum) value of a large collection of random observations. Interestingly, for a comparatively large class of random variables, the theory shows that only a few species of limit extreme value distributions are found. In some respects, the 'three types theorem' can be considered the analog of the well-known central limit theorem applying to ordinary sums, or averages, of random variables. From a practical point of view it is just as important, since it opens a way to estimate the asymptotic distribution of extreme values without any a priori knowledge of, or guess about, the parent distribution. In this way, we have solid ground for estimating the parameters of the limit distributions along with their confidence intervals, an issue crucial, for instance, to proper risk assessment. In finance, for example, market regulators and financial institutions face the important and complex task of estimating and managing risk. Assessing the probability of rare and extreme events is, of course, a crucial issue, and reliable measures of risk are needed to minimize undesirable effects on portfolios from large fluctuations in market conditions, e.g., exchange rates or prices of assets. Similar issues about risk and reliability are faced in insurance and banking, which are deeply concerned with unusually large
fluctuations. Extreme value theory provides the solid theoretical foundation needed for the statistical modeling of such events and the proper computation of risk and related confidence intervals. The study of natural and environmental hazards is also strongly concerned with extreme events; for instance, reported applications of extreme event theory in hydrology and meteorology concern flood frequency analysis, estimation of precipitation probabilities, and extreme tide levels. Predictions of events such as strong heat waves, rainfall, and the occurrence of huge sea waves are deeply grounded in such a theory as well. Analogous problems are found in telecommunications and transport systems, such as traffic data analysis, Internet traffic, and queuing modeling; problems from material science and pollutant hazards to health add examples from a very long list of related phenomena. It is impossible to summarize here the huge, often technical, literature on all these topics, and we refer to the general books cited in the bibliography. To give an idea of the variety of applications of the theory, we mention only a few more examples from yet another class of disciplines, the physical sciences. In physics, for instance, the equilibrium low-temperature properties of disordered systems are characterized by the statistics of extremely low-energy states. Several problems in this class, including the Random Energy Model and models for decaying Burgers turbulence, have been connected to extreme value distributions [3]. In GaAs films, extreme values in Gaussian 1/f correlations of voltage fluctuations were shown to follow one of the limit distributions of extreme value theory, the Gumbel asymptote [1]. Hierarchically correlated random variables representing the energies of directed polymers [9] and the maximal heights of growing self-affine surfaces [20] exhibit extreme value statistics as well. The Fisher–Tippett–Gumbel asymptote is involved in the distribution of extreme height fluctuations for Edwards–Wilkinson relaxation of fluctuating interfaces on small-world-coupled interacting systems [16]. A connection was also established between the energy level density of a gas of non-interacting bosons and the distribution laws of extreme value statistics [7]. In the case of systems with correlated variables, the application of extreme value theory is far from trivial. A theorem states that the statistics of maxima of stationary Gaussian sequences, with suitable correlations, asymptotically converge to a Gumbel distribution [2]. Evidence supporting similar results was derived from numerical simulations and analysis of long-term correlated exponentially distributed signals [10]. In general, however, the scenario is nontrivial. For instance, in physics a variant of the Gumbel distribution was observed in turbulence [4] and

derived in the two-dimensional XY model [5], systems where correlations play an important role. Similarly, correlated extreme value statistics were discovered in the Sneppen depinning model [8]. In models for fluctuating correlated interfaces, such as the Edwards–Wilkinson and Kardar–Parisi–Zhang equations, an exact solution for the distribution of maximal heights was recently derived, and it turns out to be an Airy function [19]. After this brief picture of the field, we illustrate next the general properties of extreme value distributions of independent random variables.

The Extreme Value Distributions

Extreme value distributions are the limit distributions of extremes (either maxima or minima) of a set of random variables (see Fig. 1). For definiteness, we will deal here with maxima, as minima can be seen as 'maxima' of a set of variables with opposite signs. Consider a set {X_1, X_2, …, X_N} of N independent identically distributed random variables, X_i, with a cumulative distribution function F(x) ≡ Pr{X_i ≤ x}. The maximum of the set, Y_N = Max{X_1, X_2, …, X_N}, has a distribution function, H_N(x), which is simply related to F, since by definition of Y_N we have:

    H_N(x) ≡ Pr{Y_N ≤ x} = Pr{X_1 ≤ x, X_2 ≤ x, …, X_N ≤ x}
           = Pr{X_1 ≤ x} · Pr{X_2 ≤ x} · … · Pr{X_N ≤ x} = F^N(x) .   (1)

In the limit of large samples, N → ∞, it is possible to show that, under some general hypotheses on F described below, we can find a suitable sequence of scaling constants a_N and b_N such that the scaled variable y_N = (Y_N − b_N)/a_N has a non-degenerate probability distribution function H(y). Specifically, as N → ∞, the distribution Pr{y_N ≤ y} has a non-trivial, well defined limit H(y):

    Pr{y_N ≤ y} = Pr{(Y_N − b_N)/a_N ≤ y} = Pr{Y_N ≤ a_N y + b_N}
                = F^N(a_N y + b_N) → H(y)   for N → ∞ .   (2)

For a given underlying distribution, F, finding the precise sequence of scaling constants a_N and b_N required in Eq. (2) is a non-trivial technical problem in the mathematics of extreme values [18], which we briefly discuss in the next sections. Such an issue is overshadowed by the simplicity of the result of the 'three types theorem',
which states that there are only three types (apart from a scaling transformation of the variable) of limiting distribution H(y) (see Fig. 2):

I)   Gumbel type:

         H(y) = exp[−exp(−y)]   (3)

     with −∞ < y < ∞;

II)  Fréchet type:

         H(y) = exp[−y^(−α)]   (4)

     where α is a fixed exponent and 0 < y < ∞ (with H(y) = 0 for y < 0);

III) Weibull type:

         H(y) = exp[−(−y)^α]   (5)

     where α is a fixed exponent and −∞ < y < 0 (with H(y) = 1 for y > 0).

Extreme Value Statistics, Figure 2
As an example, we plot in this figure the Gumbel density distribution, h(x) = dH(x)/dx, from Eq. (3), and the Fréchet and Weibull density distributions from Eqs. (4) and (5), for α = 2

The Generalized Extreme Value Distribution

In the extreme value statistics literature, the three types of limiting distributions, Gumbel, Fréchet and Weibull, are often represented as a single family including all of them, the so-called generalized extreme value distribution:

    H(y; μ, σ, ξ) = exp{ −[1 + ξ (y − μ)/σ]^(−1/ξ) }   (6)

with support in the interval where 1 + ξ(y − μ)/σ > 0, as otherwise H is either zero or one. Of the three parameters μ, σ, ξ of Eq. (6), ξ is called the 'shape' parameter, and it is especially important as it selects the specific type of asymptote:

I)   The case ξ = 0 corresponds to the Gumbel asymptote, since it is easy to show that

         lim_{ξ→0} H(y; μ, σ, ξ) = exp[ −exp( −(y − μ)/σ ) ] ;   (7)

II)  Similarly, the case ξ > 0 corresponds to the Fréchet asymptote of Eq. (4), where the exponent is α = 1/ξ;

III) And, finally, the case ξ < 0 corresponds to the Weibull asymptote of Eq. (5), where the exponent is α = −1/ξ.

The parameters μ and σ of Eq. (6) are called the 'location' and the 'scale' parameters, since they are related to the moments of the generalized extreme value distribution of Eq. (6). Figure 3 plots the effects of changes in μ and σ on the form of H(y) from Eq. (6) in the Gumbel case, ξ = 0. It is possible to show [18] that the kth moment is finite only if ξ < 1/k. The mean, which exists only if ξ < 1, can be expressed in the following general form:

    E(y) = μ + (σ/ξ) [Γ(1 − ξ) − 1] ,   (8)

where Γ(x) is the Gamma function. In the Gumbel limit, ξ → 0, the above result simplifies to E(y) = μ + σ γ_E, where γ_E = 0.577… is the Euler constant. Analogously, the variance, existing for ξ < 1/2, can be written as:

    E( [y − E(y)]² ) = (σ²/ξ²) [Γ(1 − 2ξ) − Γ²(1 − ξ)] ,   (9)

which in the ξ → 0 limit becomes E([y − E(y)]²) = σ² π²/6.

The r Largest Order Statistics

The results on the distributions of the maximum (or minimum) discussed above can be extended to the set of the rth largest values of an ensemble. Consider a set {X_1, X_2, …, X_N} of N identically distributed random variables which, for simplicity of notation, are arranged in order of magnitude: X_1 < X_2 < … < X_N. As before, F(x) ≡ Pr{X_i ≤ x} is their common cumulative distribution function. The statistics of X_N and X_1 are the distributions of, respectively, the maximum and the minimum seen before. Similarly, X_r (with r ∈ {1, …, N}) is called the r (largest) order statistic.

The r order statistic has a distribution function, H_r(x), simply related to F:

    H_r(x) ≡ Pr{X_r ≤ x} = Σ_{i=r}^{N} [N! / ((N − i)! i!)] F^i(x) [1 − F(x)]^(N−i) ,   (10)

i.e., the probability that at least r of the N variables do not exceed x. The theory of the generalized extreme value distribution can be extended to the r order statistic. In fact, in the limit N → ∞, if a suitable sequence of scaling constants, a_N and b_N, can be found such that the scaled maximum variable y_N = (X_N − b_N)/a_N has the limit distribution function H(y) given in Eq. (6), then the r order statistic has a limit distribution which can be easily expressed in terms of H(y) [18]. In order to describe a broader panorama of the available results in extreme value theory, we give here some details on the more general case of the limit probability distribution density of the vector of the first r largest values, (y_1, y_2, …, y_r) = ((X_N − b_N)/a_N, (X_{N−1} − b_N)/a_N, …, (X_{N−r+1} − b_N)/a_N). Such a limit distribution density can be shown to be [18]:

    h(y_1, …, y_r) = σ^(−r) exp{ −[1 + ξ (y_r − μ)/σ]^(−1/ξ)
                     − (1/ξ + 1) Σ_{i=1}^{r} ln[1 + ξ (y_i − μ)/σ] } .   (11)

Most of the other results of the previous sections can be generalized to the r order statistics, as shown for instance in [18].

Extreme Value Statistics, Figure 3
In this figure we show the effects of the μ and σ parameters on the appearance of the generalized extreme value density distribution, h(x) = dH(x)/dx, from Eq. (6), in the case ξ = 0, i.e., the Gumbel type. In the upper panel, we plot h(x) for σ = 1 and μ = 0, 1, 2. In the lower panel, we plot h(x) for μ = 0 and σ = 1/4, 1/2, 1

Domains of Attraction and Examples

The problem of finding the domains of attraction of the classes of limiting distributions is a complex, partially still open, topic in extreme value theory [18]. Even in the case of independent identically distributed random variables, understanding which asymptote a given distribution, F, converges to, and which is the right sequence of scaling constants, a_N and b_N, can be a non-trivial task. As the extreme events of a random variable are characterized by the tail of F, a simple approximate approach for guessing the domain of attraction F falls into is to consider its behavior for large x. We summarize below a few well-known examples of broad validity which can guide practical applications of the theory in the case where the random variables are independent and identically distributed.

E I)  Many common distributions, F(x), have exponential tails in x, very important examples being the Gaussian and the exponential distributions. In this case, their extreme value statistic is the Gumbel asymptote. A more formal condition for F to belong to the domain of attraction of the Gumbel limiting distribution was established by von Mises. Take a function F and denote by x_max the largest value in its support, i.e., where F(x_max) = 1 (the point x_max can also be infinite). Consider the derivative f(x) = dF(x)/dx and the rate at which F(x) approaches 1 as x → x_max: if, when x → x_max,

    (d/dx) [ (1 − F(x)) / f(x) ] → 0 ,   (12)

then Pr{y_N ≤ y} tends to the Gumbel asymptote given in Eq. (3).
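As a numerical sketch (not part of the original article), the von Mises condition of Eq. (12) can be evaluated for the standard Gaussian, which belongs to the Gumbel domain of attraction, using only the Python standard library:

```python
import statistics

# Evaluate g(x) = (1 - F(x)) / f(x) for the standard Gaussian and a
# centered finite-difference estimate of dg/dx, which should tend to 0
# as x grows (Eq. (12)).
nd = statistics.NormalDist()   # standard Gaussian

def g(x):
    return (1.0 - nd.cdf(x)) / nd.pdf(x)

def dg(x, h=1e-4):
    return (g(x + h) - g(x - h)) / (2.0 * h)

derivs = [dg(x) for x in (2.0, 4.0, 6.0)]
print(derivs)   # magnitudes shrink toward 0
```

For the Gaussian, (1 − F)/f is the Mills ratio, which behaves like 1/x at large x, so its derivative decays like −1/x².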

The above criterion can be rephrased more colloquially: the Gumbel type is the limiting distribution when 1 − F(x) decays faster than any power law for x → x_max. Beyond the Gaussian and exponential distributions, the lognormal, the Gamma, the Weibull, the Benktander type I and II, and many more common distributions, with x_max either finite or infinite, belong to this class.

E II)  Distributions such as the Pareto, Cauchy, Student, and Burr have the Fréchet asymptote. More generally, when x_max is infinite and F has a power law tail for x → ∞,

    1 − F(x) ≃ x^(−α) ,   (13)

with an exponent α > 0, then the domain of attraction of the extreme value statistics is the Fréchet type given in Eq. (4), with precisely the same exponent α as in Eq. (13).

E III)  Finally, when x_max is finite and F has a power law behavior for x → x_max,

    1 − F(x) ≃ (x_max − x)^α ,   (14)

with an exponent α > 0, then the domain of attraction of the extreme value statistics is the Weibull type of Eq. (5), with the same exponent α as in Eq. (14). The Uniform and Beta distributions have, for instance, the Weibull asymptote.
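The three examples E I)–E III) can be illustrated in one simulation (a sketch, not part of the original article; parent distributions and sizes are choices made here). Each maximum is drawn by inverting F^N as in Eq. (1), and the medians of the scaled maxima are compared with the medians of the limit laws, Eqs. (3)–(5):

```python
import math
import random
import statistics

random.seed(2)
N = 10**6
trials = 20_000
alpha = 2.0   # tail exponent of the Pareto parent chosen here

gumbel_y, frechet_y, weibull_y = [], [], []
for _ in range(trials):
    u = random.random() ** (1.0 / N)   # CDF value reached by the maximum
    gumbel_y.append(-math.log(1.0 - u) - math.log(N))                   # Exp(1) parent
    frechet_y.append((1.0 - u) ** (-1.0 / alpha) / N ** (1.0 / alpha))  # Pareto parent
    weibull_y.append(-N * (1.0 - u))                                    # Uniform(0,1) parent, x_max = 1

med_gumbel = statistics.median(gumbel_y)    # Eq. (3): median = -ln ln 2 ~ 0.3665
med_frechet = statistics.median(frechet_y)  # Eq. (4), alpha = 2: (ln 2)^(-1/2) ~ 1.2011
med_weibull = statistics.median(weibull_y)  # Eq. (5), alpha = 1: -ln 2 ~ -0.6931
print(med_gumbel, med_frechet, med_weibull)
```

Medians are used instead of means because the Fréchet law with α = 2 has infinite variance, so its sample mean converges slowly.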

Extremes of Correlated Random Variables

When the underlying random variables are not independent, as in many cases of practical relevance ranging from meteorology to finance, the problem of identifying the form, or even the existence, of the limiting distribution is, in general, open. The existing broad technical literature on the topic [18] shows that the three types, Gumbel, Fréchet and Weibull, summarized in the generalized extreme value distribution of Eq. (6), often arise here as well. For instance, a recent theorem has shown that in the case of stationary Gaussian sequences with suitable correlations the distribution of maxima asymptotically follows the Gumbel type [2]. Analysis of numerical simulations of long-term correlated exponentially distributed signals has given evidence supporting similar conclusions [10]. Sometimes, when considering 'time' series of N correlated variables, the approximate rule of thumb that N must be much bigger than the 'correlation length' of the sequence is used as a guide to decide whether Eq. (6) is likely to be the right asymptote.
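As an illustrative sketch (not part of the original article; the AR(1) model and all parameters are choices made here), block maxima of a short-range correlated stationary Gaussian sequence can be compared with those of an iid Gaussian sample, in the spirit of the theorem of [2] quoted above:

```python
import math
import random
import statistics

# Stationary Gaussian AR(1) sequence: x_t = phi * x_{t-1} + eps_t.
# With blocks much longer than the correlation length, block maxima
# behave much like those of iid Gaussians with the same variance.
random.seed(3)
phi = 0.5
sigma_x = 1.0 / math.sqrt(1.0 - phi ** 2)   # stationary standard deviation
block, n_blocks = 5_000, 200

def block_maxima_ar1():
    x = random.gauss(0.0, sigma_x)          # start in the stationary state
    out = []
    for _ in range(n_blocks):
        m = x
        for _ in range(block):
            x = phi * x + random.gauss(0.0, 1.0)
            m = max(m, x)
        out.append(m / sigma_x)             # in units of the marginal std
    return out

def block_maxima_iid():
    return [max(random.gauss(0.0, 1.0) for _ in range(block))
            for _ in range(n_blocks)]

corr_mean = statistics.fmean(block_maxima_ar1())
iid_mean = statistics.fmean(block_maxima_iid())
print(corr_mean, iid_mean)   # nearly equal average block maxima
```

That the two averages nearly coincide reflects the rule of thumb in the text: the block length (5000) is far larger than the correlation length of the sequence.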

Some of the examples mentioned in the Introduction can help, however, in delineating the strong limits of the above approximate criteria and the lack of a general picture. For instance, in the XY model for magnetic systems used in statistical physics, in the Kosterlitz–Thouless low-temperature phase the magnetization has a distribution which is a generalized Gumbel [5], but not the one in Eq. (3), a result expected to hold in a broad class of systems. In models for fluctuating interfaces developing correlations, described by Edwards–Wilkinson and Kardar–Parisi–Zhang like equations, it has been derived that the exact distribution of maximal heights is an Airy function [19]. These examples show the variety of situations which can arise in practical cases and indicate that the theorems derived for independent identically distributed variables must be applied with caution.

Future Directions

In the sections above, we reviewed at an introductory level the mathematics of extreme value theory, with a special focus on the 'three types theorem' on the limiting distributions. We also discussed their domains of attraction and many examples of random extreme events. We have not covered, instead, other important, though more technical and still evolving, topics such as the theoretical approach to the problem of 'exceedances over thresholds' and the methodology for estimating from real sample data the parameters of extreme distributions, such as maximum likelihood and Bayesian methods. These are covered, for instance, in the general references listed in the bibliography. Indeed, there is a number of excellent textbooks on these topics, ranging from the original book by E.J. Gumbel [17] to more recent volumes illustrating in detail extreme value theory in the formal framework of the theory of probability [11,13,14,18]. Volumes more focused on applications to finance and insurance are, e.g., [6,11,12,21], while applications to climate, hydrology and meteorology research are found in [6,12,21,24]. Finally, there is a number of more technical review papers on the topic, including [10,22,23].

Bibliography
1. Antal T, Droz M, Györgyi G, Rácz Z (2001) Phys Rev Lett 87:240601
2. Berman SM (1964) Ann Math Stat 35:502
3. Bouchaud J-P, Mézard M (1997) J Phys A 30:7997
4. Bramwell ST, Holdsworth PCW, Pinton J-F (1998) Nature (London) 396:552
5. Bramwell ST, Christensen K, Fortin J-Y, Holdsworth PCW, Jensen HJ, Lise S, Lopez JM, Nicodemi M, Pinton J-F, Sellitto M (2000) Phys Rev Lett 84:3744

6. Bunde A, Kropp J, Schellnhuber H-J (eds) (2002) The science of disasters: climate disruptions, heart attacks, and market crashes. Springer, Berlin
7. Comtet A, Leboeuf P, Majumdar SN (2007) Phys Rev Lett 98:070404
8. Dahlstedt K, Jensen HJ (2001) J Phys A 34:11193
9. Dean DS, Majumdar SN (2001) Phys Rev E 64:046121
10. Eichner JF, Kantelhardt JW, Bunde A, Havlin S (2006) Phys Rev E 73:016130
11. Embrechts P, Klüppelberg C, Mikosch T (1997) Modelling extremal events. Springer, Berlin
12. Finkenstadt B, Rootzen H (2004) Extreme values in finance, telecommunications, and the environment. Chapman and Hall/CRC Press, London
13. Galambos J (1978) The asymptotic theory of extreme order statistics. Wiley, New York
14. Galambos J, Lechner J, Simiu E (eds) (1994) Extreme value theory and applications. Kluwer, Dordrecht
15. Gnedenko BV (1998) Theory of probability. CRC, Boca Raton
16. Guclu H, Korniss G (2004) Phys Rev E 69:065104(R)

17. Gumbel EJ (1958) Statistics of extremes. Columbia University Press, New York
18. Leadbetter MR, Lindgren G, Rootzen H (1983) Extremes and related properties of random sequences and processes. Springer, New York
19. Majumdar SN, Comtet A (2004) Phys Rev Lett 92:225501; (2005) J Stat Phys 119:777
20. Raychaudhuri S, Cranston M, Przybyla C, Shapir Y (2001) Phys Rev Lett 87:136101
21. Reiss RD, Thomas M (2001) Statistical analysis of extreme values: with applications to insurance, finance, hydrology, and other fields. Birkhäuser, Basel
22. Smith RL (2003) Statistics of extremes, with applications in environment, insurance and finance. In: Finkenstadt B, Rootzen H (eds) Extreme values in finance, telecommunications, and the environment. Chapman and Hall/CRC Press, London, chap 1
23. Smith RL, Tawn JA, Yuen HK (1990) Statistics of multivariate extremes. Int Stat Rev 58:47
24. von Storch H, Zwiers FW (2001) Statistical analysis in climate research. Cambridge University Press, Cambridge

Fair Division*
STEVEN J. BRAMS
Department of Politics, New York University, New York, USA

* Adapted from Barry R. Weingast and Donald Wittman (eds) Oxford Handbook of Political Economy (Oxford University Press, 2006) by permission of Oxford University Press.

Article Outline

Glossary
Definition of the Subject
Introduction
Single Heterogeneous Good
Several Divisible Goods
Indivisible Goods
Conclusions
Future Directions
Bibliography

Glossary

Efficiency An allocation is efficient if there is no other allocation that is better for one player and at least as good for all the other players.

Envy-freeness An allocation is envy-free if each player thinks it receives at least a tied-for-largest portion and so does not envy the portion of any other player.

Equitability An allocation is equitable if each player values the portion that it receives the same as every other player values its portion.

Definition of the Subject

Cutting a cake, dividing up the property in an estate, determining the borders in an international dispute – such allocation problems are ubiquitous. Fair division treats all these problems and many more through a rigorous analysis of procedures for allocating goods, or deciding who wins on what issues, in a dispute.

Introduction

The literature on fair division has burgeoned in recent years, with five academic books [1,13,23,28,32] and one popular book [15] providing overviews. In this review, I will give a brief survey of three different literatures: (i) the division of a single heterogeneous good (e.g., a cake with different flavors or toppings); (ii) the division, in whole or part, of several divisible goods; and (iii) the allocation of several indivisible goods. In each case, I assume the different people, called players, may have different preferences for the items being divided.

For (i) and (ii), I will describe and illustrate procedures for dividing divisible goods fairly, based on different criteria of fairness. For (iii), I will discuss problems that arise in allocating indivisible goods, illustrating trade-offs that must be made when different criteria of fairness cannot all be satisfied simultaneously.

Single Heterogeneous Good

The metaphor I use for a single heterogeneous good is a cake, with different flavors or toppings, that cannot be cut into pieces that have exactly the same composition. Unlike a sponge or layer cake, different players may like different pieces – even if the pieces have the same physical size – because the pieces are not homogeneous.

Some of the cake-cutting procedures that have been proposed are discrete, whereby players make cuts with a knife – usually in a sequence of steps – but the knife is not allowed to move continuously over the cake. Moving-knife procedures, on the other hand, permit such continuous movement and allow players to call “stop” at any point at which they want to make a cut or mark.

There are now about a dozen procedures for dividing a cake among three players, and two procedures for dividing a cake among four players, such that each player is assured of getting a most-valued or tied-for-most-valued piece, and there is an upper bound on the number of cuts that must be made [16]. When a cake is so divided, no player will envy another player, resulting in an envy-free division.

In the literature on cake-cutting, two assumptions are commonly made:

1. The goal of each player is to maximize the minimum-size piece (maximin piece) that he or she can guarantee for himself or herself, regardless of what the other players do. To be sure, a player might do better by not following such a maximin strategy; this will depend on the strategy choices of the other players. However, all players are assumed to be risk-averse: They never choose strategies that might yield them more-valued pieces if they entail the possibility of giving them less than their maximin pieces.

2. The preferences of the players over the cake are continuous. Consider a procedure in which a knife moves across a cake from left to right and, at any moment, the piece of the cake to the left of the knife is A and the piece to the right is B. The continuity assumption enables one to use the intermediate-value theorem to say the following: If, for some position of the knife, a player views piece A as being more valued than piece B, and
for some other position he or she views piece B as being more valued than piece A, then there must be some intermediate position such that the player values the two pieces exactly the same.

Only two 3-person procedures [2,30], and no 4-person procedure, make an envy-free division with the minimal number of cuts (n − 1 cuts if there are n players). A cake so cut ensures that each player gets a single connected piece, which is especially desirable in certain applications (e.g., land division).

For two players, the well-known procedure of “I cut the cake, you choose a piece,” or “cut-and-choose,” leads to an envy-free division if the players choose maximin strategies. The cutter divides the cake 50-50 in terms of his or her preferences. (Physically, the two pieces may be of different size, but the cutter values them the same.) The chooser takes the piece he or she values more and leaves the other piece for the cutter (or chooses randomly if the two pieces are tied in his or her view). Clearly, these strategies ensure that each player gets at least half the cake, as he or she values it, proving that the division is envy-free. But this procedure does not satisfy certain other desirable properties [7,22]. For example, if the cake is, say, half vanilla, which the cutter values at 75 percent, and half chocolate, which the chooser values at 75 percent, a “pure” vanilla-chocolate division would be better for the cutter than the divide-and-choose division, which gives him or her exactly 50 percent of the value of the cake.

The moving-knife equivalent of “I cut, you choose” is for a knife to move continuously across the cake, say from left to right. Assume that the cake is cut when one player calls “stop.” If each of the players calls “stop” when he or she perceives the knife to be at a 50-50 point, then the first player to call “stop” will produce an envy-free division if he or she gets the left piece and the other player gets the right piece. (If both players call “stop” at the same time, the pieces can be randomly assigned to the two players.) To be sure, if the player who would truthfully call “stop” first knows the other player’s preference and delays calling “stop” until just before the knife would reach the other player’s 50-50 point, the first player can obtain a greater-than-50-percent share on the left. However, the possession of such information by the cutter is not generally assumed in justifying cut-and-choose, though it does not undermine an envy-free division.

Surprisingly, to go from two players making one cut to three players making two cuts cannot be done by a discrete procedure if the division is to be envy-free.¹ The 3-person discrete procedure that uses the fewest cuts is one discovered independently by John L. Selfridge and John H. Conway about 1960; it is described in, among other places, Brams and Taylor (1996) and Robertson and Webb (1998) and requires up to five cuts. Although there is no discrete 4-person envy-free procedure that uses a bounded number of cuts, Brams, Taylor, and Zwicker (1997) and Barbanel and Brams (2004) give moving-knife procedures that require up to 11 and 5 cuts, respectively. The Brams–Taylor–Zwicker (1997) procedure is arguably simpler because it requires fewer simultaneously moving knives. Peterson and Su (2002) give a 4-person envy-free moving-knife procedure for chore division, whereby each player thinks he or she receives the least undesirable chores, that requires up to 16 cuts.

¹ [28], pp. 28–29; additional information on the minimum numbers of cuts required to give envy-freeness is given in [19] and [29].

To illustrate ideas, I describe next the Barbanel–Brams [2] 3-person, 2-cut envy-free procedure, which is based on the idea of squeezing a piece by moving two knives simultaneously. The Barbanel–Brams [2] 4-person, 5-cut envy-free procedure also uses this idea, but it is considerably more complicated and will not be described here. The latter procedure, however, is not as complex as Brams and Taylor’s [12] general n-person discrete procedure. Their procedure illustrates the price one must pay for an envy-free procedure that works for all n, because it places no upper bound on the number of cuts that are required to produce an envy-free division; this is also true of other n-person envy-free procedures [25,27]. While the number of cuts needed depends on the players’ preferences over the cake, it is worth noting that Su’s [31] approximate envy-free procedure uses the minimal number of cuts at a cost of only small departures from envy-freeness.²

² See [10,20,26] for other approaches, based on bidding, to the housemates problem discussed in [31]. On approximate solutions to envy-freeness, see [33]. For recent results on pie-cutting, in which radial cuts are made from the center of a pie to divide it into wedge-shaped pieces, see [3,8].

I next describe the Barbanel–Brams 3-person, 2-cut envy-free procedure, called the squeezing procedure [2]. I refer to players by number – player 1, player 2, and so on – calling even-numbered players “he” and odd-numbered players “she.” Although cuts are made by two knives in the end, initially one player makes “marks,” or virtual cuts, on the line segment defining the cake; these marks may subsequently be changed by another player before the real cuts are made.

Squeezing procedure. A referee moves a knife from left to right across a cake. The players are instructed to call “stop” when the knife reaches the 1/3 point for each. Let the first player to call “stop” be player 1. (If two or three
players call “stop” at the same time, randomly choose one.) Have player 1 place a mark at the point where she calls “stop” (the right boundary of piece A in the diagram below), and a second mark to the right that bisects the remainder of the cake (the right boundary of piece B below). Thereby player 1 indicates the two points that, for her, trisect the cake into pieces A, B, and C, which will be assigned after possible modifications. A B C /–––––j–––––j–––––/ 1 1 Because neither player 2 nor player 3 called “stop” before player 1 did, each of players 2 and 3 thinks that piece A is at most 1/3. They are then asked whether they prefer piece B or piece C. There are three cases to consider: 1. If players 2 and 3 each prefer a different piece – one player prefers piece B and the other piece C – we are done: Players 1, 2, and 3 can each be assigned a piece that they consider to be at least tied for largest. 2. Assume players 2 and 3 both prefer piece B. A referee places a knife at the right boundary of B and moves it to the left. At the same time, player 1 places a knife at the left boundary of B and moves it to the right in such a way that the value of the cake traversed on the left (by B’s knife) and on the right (by the referee’s knife) are equal for player 1. Thereby pieces A and C increase equally in player 1’s eyes. At some point, piece B will be diminished sufficiently to a new piece, labeled B0 – in either player 2’s or player 3’s eyes – to tie with either piece A0 or C0 , the enlarged A and C pieces. Assume player 2 is the first, or tied for the first, to call “stop” when this happens; then give player 3 piece B0 , which she still thinks is the most valued or the tied-for-most-valued piece. Give player 2 the piece he thinks ties for the most value with piece B0 (say, piece A0 ), and give player 1 the remaining piece (piece C0 ), which she thinks ties for the most value with the other enlarged piece (A0 ). 
Clearly, each player will think that he or she received at least a tied-for-most-valued piece.

3. Assume players 2 and 3 both prefer piece C. A referee places a knife at the right boundary of B and moves it to the right. Meanwhile, player 1 places a knife at the left boundary of B and moves it to the right in such a way as to maintain the equality, in her view, of pieces A and B as they increase. At some point, piece C will be diminished sufficiently – to C′ – to tie, in either player 2’s or player 3’s eyes, with either piece A′ or B′, the enlarged A and B pieces. Assume player 2 is the first, or tied for the first, to call “stop” when this happens; then give player 3 piece C′, which she still thinks is the most-valued or the tied-for-most-valued piece. Give player 2 the piece he thinks ties for the most value with piece C′ (say, piece A′), and give player 1 the remaining piece (piece B′), which she thinks ties for the most value with the other enlarged piece (A′). Clearly, each player will think that he or she received at least a tied-for-most-valued piece.

Note that who moves a knife or knives varies, depending on what stage is reached in the procedure. In the beginning, I assume a referee moves a single knife, and the first player to call “stop” (player 1) then trisects the cake. But at the next stage of the procedure, in cases (2) and (3), it is a referee and player 1 that move two knives simultaneously, “squeezing” what players 2 and 3 consider to be the most-valued piece until it eventually ties, for one of them, with one of the two other pieces.

Several Divisible Goods

Most disputes – divorce, labor-management, merger-acquisition, and international – involve only two parties, but they frequently involve several homogeneous goods that must be divided, or several issues that must be resolved.3 As an example of the latter, consider an executive negotiating an employment contract with a company. The issues before them are (1) bonus on signing, (2) salary, (3) stock options, (4) title and responsibilities, (5) performance incentives, and (6) severance pay [14]. The procedure I describe next, called adjusted winner (AW), is a 2-player procedure that has been applied to disputes ranging from interpersonal to international ([15]).4 It works as follows. Two parties in a dispute, after perhaps long and arduous bargaining, reach agreement on (i) what issues need to be settled and (ii) what winning and losing means for each side on each issue. For example, if the executive wins on the bonus, it will presumably be some amount that the company considers too high but, nonetheless, is willing to pay.
On the other hand, if the executive loses on the bonus, the reverse will hold.

3 Dividing several homogeneous goods is very different from cake-cutting. Cake-cutting is most applicable to a problem like land division, in which hills, dales, ponds, and trees form an incongruous mix, making it impossible to give all of one thing (e.g., trees) to one player. By contrast, in property division it is possible to give all of one good to one player. Under certain conditions, 2-player cake division, and the procedure to be discussed next (adjusted winner), are equivalent [22].

4 A website for AW can be found at http://www.nyu.edu/projects/adjustedwinner. Procedures applicable to more than two players are discussed in [13,15,23,32].




Thus, instead of trying to negotiate a specific compromise on the bonus, the company and the executive negotiate upper and lower bounds, the lower one favoring the company and the upper one favoring the executive. The same holds true on other issues being decided, including non-monetary ones like title and responsibilities. Under AW, each side will always win on some issues. Moreover, the procedure guarantees that both the company and the executive will get at least 50% of what they desire, and often considerably more. To implement AW, each side secretly distributes 100 points across the issues in the dispute according to the importance it attaches to winning on each. For example, suppose that the company and the executive distribute their points as follows, illustrating that the company cares more about the bonus than the executive (it would be a bad precedent for it to go too high), whereas the reverse is true for severance pay (the executive wants to have a cushion in the event of being fired):

Issues                          Company   Executive
1. Bonus                           10         5
2. Salary                          35        40
3. Stock Options                   15        20
4. Title and Responsibilities      15        10
5. Performance Incentives          15         5
6. Severance Pay                   10        20
Total                             100       100

The side placing more points on an issue wins it initially. Notice that whereas the company wins a total of 10 + 15 + 15 = 40 of its points, the executive wins a whopping 40 + 20 + 20 = 80 of its points. This outcome is obviously unfair to the company. Hence, a so-called equitability adjustment is necessary to equalize the points of the two sides. This adjustment transfers points from the initial winner (the executive) to the loser (the company). The key to the success of AW – in terms of a mathematical guarantee that no win-win potential is lost – is to make the transfer in a certain order (for a proof, see [13], pp. 85–94). That is, of the issues initially won by the executive, look for the one on which the two sides are in closest agreement, as measured by the quotient of the winner’s points to the loser’s points. Because the winner-to-loser quotient on the issue of salary is 40/35 ≈ 1.14, and this is smaller than on any other issue on which the executive wins (the next-smallest quotient is 20/15 ≈ 1.33, on stock options), some of this issue must be transferred to the company.

But how much? The point totals of the company and the executive will be equal when the company’s winning points on issues 1, 4, and 5, plus x percent of its points on salary (left side of the equation below), equal the executive’s winning points on issues 2, 3, and 6, minus x percent of its points on salary (right side of the equation):

40 + 35x = 80 − 40x
75x = 40 .

Solving for x gives x = 8/15 ≈ 0.533. This means that the executive will lose about 53% on salary (i.e., win about 47%), and the company will win about 53%, which is almost a 50-50 compromise between the low and high figures they negotiated earlier, only slightly favoring the company. This compromise ensures that both the company and the executive will end up with exactly the same total number of points after the equitability adjustment:

40 + 35(0.533) = 80 − 40(0.533) ≈ 58.7 .

On all other issues, either the company or the executive gets its way completely (and its winning points), as it should since it valued these issues more than the other side. Thus, AW is essentially a winner-take-all procedure, except on the one issue on which the two sides are closest and which, therefore, is the one subject to the equitability adjustment. On this issue a split will be necessary, which will be easier if the issue is a quantitative one, like salary, than a more qualitative one, like title and responsibilities.5 Still, it should be possible to reach a compromise on an issue like title and responsibilities that reflects the percentages the relative winner and relative loser receive (53% and 47% on salary in the example). This is certainly easier than trying to reach a compromise on each and every issue, which is also less efficient than resolving them all at once according to AW.6 In the example, each side ends up with, in toto, almost 59% of what it desires, which will surely foster greater satisfaction than would a 50-50 split down the middle on each issue.
In fact, assuming the two sides are truthful, there is no better split for both, which makes the AW settlement efficient. In addition, it is equitable, because each side gets exactly the same amount above 50%, with this figure increasing the greater the differences in the two sides’ valuations of the issues. In effect, AW makes optimal trade-offs by awarding issues to the side that most values them, except as modified by the equitability adjustment that ensures that both sides do equally well (in their own subjective terms, which may not be monetary). On the other hand, if the two sides have unequal claims or entitlements – as specified, for example, in a contract – AW can be modified to give each side shares of the total proportional to its specified claims.

5 AW may require the transfer of more than one issue, but at most one issue must be divided in the end.

6 A procedure called proportional allocation (PA) awards issues to the players in proportion to the points they allocate to them. While inefficient, PA is less vulnerable to strategic manipulation than AW, with which it can be combined ([13], pp. 75–80).

Can AW be manipulated to benefit one side? It turns out that exploitation of the procedure by one side is practically impossible unless that side knows exactly how the other side will allocate its points. In the absence of such information, attempts at manipulation can backfire miserably, with the manipulator ending up with less than the minimum 50 points its honesty guarantees it [13,15].

While AW offers a compelling resolution to a multi-issue dispute, it requires careful thought to delineate what the issues being divided are, and tough bargaining to determine what winning and losing means on each. More specifically, because the procedure is an additive point scheme, the issues need to be made as independent as possible, so that winning or losing on one does not substantially affect how much one wins or loses on others. To the degree that this is not the case, it becomes less meaningful to use the point totals to indicate how well each side does. The half dozen issues identified in the executive-compensation example overlap to an extent and hence may not be viewed as independent (after all, might not the bonus be considered part of salary?). On the other hand, they might reasonably be thought of as different parts of a compensation package, over which the disputants have different preferences that they express with points. In such a situation, losing on the issues you care less about than the other side will be tolerable if it is balanced by winning on the issues you care more about.
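The two AW steps – provisional winner-take-all assignment, then an equitability transfer in order of increasing winner-to-loser quotient – can be sketched compactly. The following Python implementation is illustrative only (the function name, issue labels, and tie-breaking rule are mine, not from [13,15]); it assumes every issue receives positive points from both sides:

```python
# Illustrative sketch of the adjusted-winner (AW) procedure described above.
# All identifiers are hypothetical, not from the source.

def adjusted_winner(points_a, points_b):
    """Return (share_a, total_a, total_b); share_a[i] is the fraction of
    issue i awarded to side A (side B receives the rest)."""
    issues = list(points_a)
    # Step 1: provisionally award each issue to the side placing more
    # points on it (ties here go to A; any tie-break preserves the guarantee).
    share_a = {i: 1.0 if points_a[i] >= points_b[i] else 0.0 for i in issues}

    def totals():
        ta = sum(points_a[i] * share_a[i] for i in issues)
        tb = sum(points_b[i] * (1.0 - share_a[i]) for i in issues)
        return ta, tb

    # Step 2 (equitability adjustment): transfer issues from the side that
    # is ahead, in order of increasing winner-to-loser quotient, splitting
    # at most the last issue transferred.
    ta, tb = totals()
    a_ahead = ta > tb
    wpts, lpts = (points_a, points_b) if a_ahead else (points_b, points_a)
    won = [i for i in issues if (share_a[i] == 1.0) == a_ahead]
    for i in sorted(won, key=lambda i: wpts[i] / lpts[i]):
        ta, tb = totals()
        gap = ta - tb if a_ahead else tb - ta
        if gap <= 0:
            break
        # Moving fraction x of issue i narrows the gap by (wpts+lpts)*x.
        x = min(gap / (wpts[i] + lpts[i]), 1.0)
        share_a[i] += -x if a_ahead else x
    return (share_a, *totals())

# The executive-compensation example from the table above.
company = {"bonus": 10, "salary": 35, "stock": 15,
           "title": 15, "incentives": 15, "severance": 10}
executive = {"bonus": 5, "salary": 40, "stock": 20,
             "title": 10, "incentives": 5, "severance": 20}
share, t_co, t_ex = adjusted_winner(company, executive)
print(round(share["salary"], 3), round(t_co, 1), round(t_ex, 1))
# prints: 0.533 58.7 58.7
```

Only salary is split: the company (the initial loser) receives the 8/15 ≈ 0.533 share of that issue, and both sides finish at the same ≈ 58.7 points.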
Indivisible Goods

The challenge of dividing up indivisible goods, such as a car, a boat, or a house in a divorce, is daunting, though sometimes such goods can be shared (usually at different times). The main criteria I invoke are efficiency (there is no other division better for everybody, or better for some players and not worse for the others) and envy-freeness (each player likes its allocation at least as much as those that the other players receive, so it does not envy anybody else). But because efficiency, by itself, is not a criterion of fairness (an efficient allocation could be one in which one player gets everything and the others nothing), I also consider other criteria of fairness besides envy-freeness, including Rawlsian and utilitarian measures of welfare (to be defined).

I present two paradoxes, from a longer list of eight in [4],7 that highlight difficulties in creating “fair shares” for everybody. But they by no means render the task impossible. Rather, they show how dependent fair division is on the fairness criteria one deems important and the trade-offs one considers acceptable. Put another way, achieving fairness requires some consensus on the ground rules (i.e., criteria), and some delicacy in applying them (to facilitate trade-offs when the criteria conflict).

I make five assumptions. First, players rank indivisible items but do not attach cardinal utilities to them. Second, players cannot compensate each other with side payments (e.g., money) – the division is only of the indivisible items. Third, players cannot randomize among different allocations, which is a way that has been proposed for “smoothing out” inequalities caused by the indivisibility of items. Fourth, all players have positive values for every item. Fifth, a player prefers one set S of items to a different set T if (i) S has as many items as T and (ii) for every item t in T and not in S, there is a distinct item s in S and not in T that the player prefers to t. For example, if a player ranks four items in order of decreasing preference, 1 2 3 4, I assume that it prefers

- the set {1,2} to {2,3}, because {1} is preferred to {3}; and
- the set {1,3} to {2,4}, because {1} is preferred to {2} and {3} is preferred to {4},

whereas the comparison between sets {1,4} and {2,3} could go either way.

Paradox 1. A unique envy-free division may be inefficient.

Suppose there is a set of three players, {A, B, C}, who must divide a set of six indivisible items, {1, 2, 3, 4, 5, 6}. Assume the players rank the items from best to worst as follows:

A: 1 2 3 4 5 6
B: 4 3 2 1 5 6
C: 5 1 2 6 3 4

The unique envy-free allocation to (A, B, C) is ({1,3}, {2,4}, {5,6}), or for simplicity (13, 24, 56), whereby A and B get their best and 3rd-best items, and C gets its best and 4th-best items.
Clearly, A prefers its allocation to that of B (which comprises A’s 2nd-best and 4th-best items) and to that of C (which comprises A’s two worst items). Likewise, B and C prefer their allocations to those of the other two players. Consequently, the division (13, 24, 56) is envy-free: All players prefer their allocations to those of the other two players, so no player is envious of any other.

7 For a more systematic treatment of conflicts in fairness criteria and trade-offs that are possible, see [5,6,9,11,18,21].




Compare this division with (12, 34, 56), whereby A and B receive their two best items, and C receives, as before, its best and 4th-best items. This division Pareto-dominates (13, 24, 56), because two of the three players (A and B) prefer the former allocation, whereas both allocations give player C the same two items (56). It is easy to see that (12, 34, 56) is Pareto-optimal or efficient: No player can do better with some other division without some other player or players doing worse, or at least not better. This is apparent from the fact that the only way A or B, which get their two best items, can do better is to receive an additional item from one of the two other players, but this will necessarily hurt the player who then receives fewer than its present two items. Whereas C can do better without receiving a third item if it receives item 1 or item 2 in place of item 6, this substitution would necessarily hurt A, which will do worse if it receives item 6 for item 1 or 2.

The problem with efficient allocation (12, 34, 56) is that it is not assuredly envy-free. In particular, C will envy A’s allocation of 12 (2nd-best and 3rd-best items for C) if it prefers these two items to its present allocation of 56 (best and 4th-best items for C). In the absence of information about C’s preferences for subsets of items, therefore, we cannot say that efficient allocation (12, 34, 56) is envy-free.8

But the real bite of this paradox stems from the fact that not only is inefficient division (13, 24, 56) envy-free, but it is uniquely so – there is no other division, including an efficient one, that guarantees envy-freeness. To show this in the example, note first that an envy-free division must give each player its best item; if not, then a player might prefer a division, like envy-free division (13, 24, 56) or efficient division (12, 34, 56), that does give each player its best item, rendering the division that does not do so envy-possible or envy-ensuring.
Second, even if each player receives its best item, this allocation cannot be the only item it receives, because then the player might envy any player that receives two or more items, whatever these items are.

8 Recall that an envy-free division of indivisible items is one in which, no matter how the players value subsets of items consistent with their rankings, no player prefers any other player’s allocation to its own. If a division is not envy-free, it is envy-possible if a player’s allocation may make it envious of another player, depending on how it values subsets of items, as illustrated for player C by division (12, 34, 56). It is envy-ensuring if it causes envy, independent of how the players value subsets of items. In effect, a division that is envy-possible has the potential to cause envy. By comparison, an envy-ensuring division always causes envy, and an envy-free division never causes envy.
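Paradox 1 can also be verified exhaustively by machine. The sketch below is illustrative only (the data layout and helper names such as `beats` are mine, not from the source): it encodes assumption five as a pairwise-matching test between bundles, enumerates all 90 divisions giving each player two items, and confirms that (13, 24, 56) is the only assuredly envy-free one, while (12, 34, 56) Pareto-dominates it.

```python
from itertools import combinations

# Rankings from the Paradox 1 example, best to worst.
RANKS = {"A": [1, 2, 3, 4, 5, 6],
         "B": [4, 3, 2, 1, 5, 6],
         "C": [5, 1, 2, 6, 3, 4]}
# POS[p][item] = rank position (0 = best) of each item for player p.
POS = {p: {item: r for r, item in enumerate(order)}
       for p, order in RANKS.items()}

def beats(player, s, t):
    """True if the player prefers bundle s to bundle t for *every*
    valuation consistent with its ranking (assumption five): each item
    in t-but-not-s is outranked by a distinct item in s-but-not-t."""
    s_only = sorted(set(s) - set(t), key=POS[player].get)
    t_only = sorted(set(t) - set(s), key=POS[player].get)
    return all(POS[player][a] < POS[player][b]
               for a, b in zip(s_only, t_only))

def divisions():
    """All divisions giving each of A, B, C exactly two items."""
    items = {1, 2, 3, 4, 5, 6}
    for a in combinations(sorted(items), 2):
        rest = items - set(a)
        for b in combinations(sorted(rest), 2):
            yield {"A": a, "B": b, "C": tuple(sorted(rest - set(b)))}

def envy_free(div):
    # Assuredly envy-free: every player's bundle beats both others.
    return all(beats(p, div[p], div[q])
               for p in div for q in div if p != q)

ef = [d for d in divisions() if envy_free(d)]
print(ef)  # only the division (13, 24, 56) survives

# The efficient division (12, 34, 56) Pareto-dominates it: A and B
# strictly prefer their new bundles and C keeps the same one.
dominant = {"A": (1, 2), "B": (3, 4), "C": (5, 6)}
envyfree = {"A": (1, 3), "B": (2, 4), "C": (5, 6)}
print(all(dominant[p] == envyfree[p] or beats(p, dominant[p], envyfree[p])
          for p in "ABC"))  # prints True
```

Note that `beats` reproduces the text’s partial order: for a player ranking 1 2 3 4, the bundles {1,4} and {2,3} come out incomparable, exactly the “could go either way” case.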

By this reasoning, then, the only possible envy-free divisions in the example are those in which each player receives two items, including its top choice. It is easy to check that no efficient division is envy-free. Similarly, one can check that no inefficient division, except (13, 24, 56), is envy-free, making this division uniquely envy-free.

Paradox 2. Neither the Rawlsian maximin criterion nor the utilitarian Borda-score criterion may choose a unique efficient and envy-free division.

Unlike the example illustrating paradox 1, efficiency and envy-freeness are compatible in the following example:

A: 1 2 3 4 5 6
B: 5 6 2 1 4 3
C: 3 6 5 4 1 2

There are three efficient divisions in which (A, B, C) each get two items: (i) (12, 56, 34); (ii) (12, 45, 36); (iii) (14, 25, 36). Only (iii) is envy-free: Whereas C might prefer B’s 56 allocation in (i), and B might prefer A’s 12 allocation in (ii), no player prefers another player’s allocation in (iii).

Now consider the following Rawlsian maximin criterion to distinguish among the efficient divisions: Choose a division that maximizes the minimum rank of items that players receive, making a worst-off player as well off as possible.9 Because (ii) gives a 5th-best item to B, whereas (i) and (iii) give players, at worst, a 4th-best item, the latter two divisions satisfy the Rawlsian maximin criterion. Between these two, (i), which is envy-possible, is arguably better than (iii), which is envy-free: (i) gives the two players that do not get a 4th-best item their two best items, whereas (iii) does not give B its two best items.10

Now consider what a modified Borda count would also give the players under each of the three efficient divisions. Awarding 6 points for obtaining a best item, 5 points for obtaining a 2nd-best item, . . . , 1 point for obtaining a worst item in the example, (ii) and (iii) give the players a total of 30 points, whereas (i) gives the players a total of 31 points.11 This criterion, which I call the utilitarian Borda-score criterion, gives the nod to division (i); the Borda scores provide a measure of the overall utility or welfare of the players. Thus, neither the Rawlsian maximin criterion nor the utilitarian Borda-score criterion guarantees the selection of the unique efficient and envy-free division, (iii).

9 This is somewhat different from Rawls’s (1971) proposal to maximize the utility of the player with minimum utility, so it might be considered a modified Rawlsian criterion. I introduce a rough measure of utility next with a modified Borda count.

10 This might be considered a second-order application of the maximin criterion: If, for two divisions, players rank the worst item any player receives the same, consider the player that receives a next-worst item in each, and choose the division in which this item is ranked higher. This is an example of a lexicographic decision rule, whereby alternatives are ordered on the basis of a most important criterion; if that is not determinative, a next-most important criterion is invoked, and so on, to narrow down the set of feasible alternatives.

11 The standard scoring rules for the Borda count in this 6-item example would give 5 points to a best item, 4 points to a 2nd-best item, . . . , 0 points to a worst item. I depart slightly from this standard scoring rule to ensure that each player obtains some positive value for all items, including its worst choice, as assumed earlier.

Conclusions

The squeezing procedure I illustrated for dividing up a cake among three players ensures efficiency and envy-freeness, but it does not satisfy equitability. Whereas adjusted winner satisfies efficiency, envy-freeness, and equitability for two players dividing up several divisible goods, all these properties cannot be guaranteed if there are more than two players. Finally, the two paradoxes relating to the fair division of indivisible goods, which are independent of the procedure used, illustrate new difficulties – that no division may satisfy either maximin or utilitarian notions of welfare and, at the same time, be efficient and envy-free.

Future Directions

Patently, fair division is a hard problem, whatever the things being divided are. While some conflicts are ineradicable, as the paradoxes demonstrate, the trade-offs that best resolve these conflicts are by no means evident. Understanding these may help to ameliorate, if not solve, practical problems of fair division, ranging from the splitting of the marital property in a divorce to determining who gets what in an international dispute.

Bibliography

1. Barbanel JB (2005) The Geometry of Efficient Fair Division. Cambridge University Press, New York
2. Barbanel JB, Brams SJ (2004) Cake Division with Minimal Cuts: Envy-Free Procedures for 3 Persons, 4 Persons, and Beyond. Math Soc Sci 48(3):251–269
3. Barbanel JB, Brams SJ (2007) Cutting a Pie Is Not a Piece of Cake. Am Math Month (forthcoming)
4. Brams SJ, Edelman PH, Fishburn PC (2001) Paradoxes of Fair Division. J Philos 98(6):300–314
5. Brams SJ, Edelman PH, Fishburn PC (2004) Fair Division of Indivisible Items. Theory Decis 55(2):147–180
6. Brams SJ, Fishburn PC (2000) Fair Division of Indivisible Items Between Two People with Identical Preferences: Envy-Freeness, Pareto-Optimality, and Equity. Soc Choice Welf 17(2):247–267
7. Brams SJ, Jones MA, Klamler C (2006) Better Ways to Cut a Cake. Not AMS 35(11):1314–1321

8. Brams SJ, Jones MA, Klamler C (2007) Proportional Pie Cutting. Int J Game Theory 36(3–4):353–367
9. Brams SJ, Kaplan TR (2004) Dividing the Indivisible: Procedures for Allocating Cabinet Ministries in a Parliamentary System. J Theor Politics 16(2):143–173
10. Brams SJ, Kilgour MD (2001) Competitive Fair Division. J Political Econ 109(2):418–443
11. Brams SJ, King DR (2004) Efficient Fair Division: Help the Worst Off or Avoid Envy? Ration Soc 17(4):387–421
12. Brams SJ, Taylor AD (1995) An Envy-Free Cake Division Protocol. Am Math Month 102(1):9–18
13. Brams SJ, Taylor AD (1996) Fair Division: From Cake-Cutting to Dispute Resolution. Cambridge University Press, New York
14. Brams SJ, Taylor AD (1999a) Calculating Consensus. Corp Couns 9(16):47–50
15. Brams SJ, Taylor AD (1999b) The Win-Win Solution: Guaranteeing Fair Shares to Everybody. W.W. Norton, New York
16. Brams SJ, Taylor AD, Zwicker WS (1995) Old and New Moving-Knife Schemes. Math Intell 17(4):30–35
17. Brams SJ, Taylor AD, Zwicker WS (1997) A Moving-Knife Solution to the Four-Person Envy-Free Cake Division Problem. Proc Am Math Soc 125(2):547–554
18. Edelman PH, Fishburn PC (2001) Fair Division of Indivisible Items Among People with Similar Preferences. Math Soc Sci 41(3):327–347
19. Even S, Paz A (1984) A Note on Cake Cutting. Discret Appl Math 7(3):285–296
20. Haake CJ, Raith MG, Su FE (2002) Bidding for Envy-Freeness: A Procedural Approach to n-Player Fair Division Problems. Soc Choice Welf 19(4):723–749
21. Herreiner D, Puppe C (2002) A Simple Procedure for Finding Equitable Allocations of Indivisible Goods. Soc Choice Welf 19(2):415–430
22. Jones MA (2002) Equitable, Envy-Free, and Efficient Cake Cutting for Two People and Its Application to Divisible Goods. Math Mag 75(4):275–283
23. Moulin HJ (2003) Fair Division and Collective Welfare. MIT Press, Cambridge
24. Peterson E, Su FE (2000) Four-Person Envy-Free Chore Division. Math Mag 75(2):117–122
25. Pikhurko O (2000) On Envy-Free Cake Division. Am Math Month 107(8):736–738
26. Potthoff RF (2002) Use of Linear Programming to Find an Envy-Free Solution Closest to the Brams–Kilgour Gap Solution for the Housemates Problem. Group Decis Negot 11(5):405–414
27. Robertson JM, Webb WA (1997) Near Exact and Envy-Free Cake Division. Ars Comb 45:97–108
28. Robertson J, Webb W (1998) Cake-Cutting Algorithms: Be Fair If You Can. AK Peters, Natick
29. Shishido H, Zeng DZ (1999) Mark-Choose-Cut Algorithms for Fair and Strongly Fair Division. Group Decis Negot 8(2):125–137
30. Stromquist W (1980) How to Cut a Cake Fairly. Am Math Month 87(8):640–644
31. Su FE (1999) Rental Harmony: Sperner’s Lemma in Fair Division. Am Math Month 106:922–934
32. Young HP (1994) Equity in Theory and Practice. Princeton University Press, Princeton
33. Zeng DZ (2000) Approximate Envy-Free Procedures. In: Game Practice: Contributions from Applied Game Theory. Kluwer Academic Publishers, Dordrecht, pp 259–271




Field Theoretic Methods
UWE CLAUS TÄUBER
Department of Physics, Center for Stochastic Processes in Science and Engineering, Virginia Polytechnic Institute and State University, Blacksburg, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Correlation Functions and Field Theory
Discrete Stochastic Interacting Particle Systems
Stochastic Differential Equations
Future Directions
Acknowledgments
Bibliography

Glossary

Absorbing state State from which, once reached, an interacting many-particle system cannot depart, not even through the aid of stochastic fluctuations.

Correlation function Quantitative measure of the correlation of random variables; usually set to vanish for statistically independent variables.

Critical dimension Borderline dimension dc above which mean-field theory yields reliable results, while for d ≤ dc fluctuations crucially affect the system’s large-scale behavior.

External noise Stochastic forcing of a macroscopic system induced by random external perturbations, such as thermal noise from a coupling to a heat bath.

Field theory A representation of physical processes through continuous variables, typically governed by an exponential probability distribution.

Generating function Laplace transform of the probability distribution; all moments and correlation functions follow through appropriate partial derivatives.

Internal noise Random fluctuations in a stochastic macroscopic system originating from its internal kinetics.

Langevin equation Stochastic differential equation describing time evolution that is subject to fast random forcing.

Master equation Evolution equation for a configurational probability obtained by balancing gain and loss terms through transitions into and away from each state.

Mean-field approximation Approximative analytical approach to an interacting system with many degrees of freedom wherein spatial and temporal fluctuations as well as correlations between the constituents are neglected.

Order parameter A macroscopic density corresponding to an extensive variable that captures the symmetry and thereby characterizes the ordered state of a thermodynamic phase in thermal equilibrium. Nonequilibrium generalizations typically address appropriate stationary values in the long-time limit.

Perturbation expansion Systematic approximation scheme for an interacting and/or nonlinear system that involves a formal expansion about an exactly solvable simplification by means of a power series with respect to a small coupling.

Definition of the Subject

Traditionally, complex macroscopic systems are often described in terms of ordinary differential equations for the temporal evolution of the relevant (usually collective) variables. Some natural examples are particle or population densities, chemical reactant concentrations, and magnetization or polarization densities; others involve more abstract concepts such as an apt measure of activity, etc. Complex behavior often entails (diffusive) spreading, front propagation, and spontaneous or induced pattern formation. In order to capture these intriguing phenomena, a more detailed level of description is required, namely the inclusion of spatial degrees of freedom, whereupon the above quantities all become local density fields. Stochasticity, i.e., randomly occurring propagation, interactions, or reactions, frequently represents another important feature of complex systems. Such stochastic processes generate internal noise that may crucially affect even long-time and large-scale properties.
In addition, other system variables, provided they fluctuate on time scales that are fast compared to the characteristic evolution times for the relevant quantities of interest, can be (approximately) accounted for within a Langevin description in the form of external additive or multiplicative noise.

A quantitative mathematical analysis of complex spatio-temporal structures and, more generally, cooperative behavior in stochastic interacting systems with many degrees of freedom typically relies on the study of appropriate correlation functions. Field-theoretic, i.e., spatially continuous, representations both for random processes defined through a master equation and for Langevin-type stochastic differential equations have been developed since the 1970s. They provide a general framework for the computation of correlation functions, utilizing powerful tools that were originally developed in quantum many-body as well as quantum and statistical field theory. These methods allow us to construct systematic approximation schemes, e.g., perturbative expansions with respect to some parameter (presumed small) that measures the strength of fluctuations. They also form the basis of more sophisticated renormalization group methods, which represent an especially potent device to investigate scale-invariant phenomena.

Introduction

Stochastic Complex Systems

Complex systems consist of many interacting components. As a consequence of either these interactions and/or the kinetics governing the system’s temporal evolution, correlations between the constituents emerge that may induce cooperative phenomena such as (quasi-)periodic oscillations, the formation of spatio-temporal patterns, and phase transitions between different macroscopic states. These are characterized in terms of some appropriate collective variables, often termed order parameters, which describe the large-scale and long-time system properties. The time evolution of complex systems typically entails random components: either the kinetics itself follows stochastic rules (certain processes occur with given probabilities per unit time), or we project our ignorance of various fast microscopic degrees of freedom (or our lack of interest in their detailed dynamics) into their treatment as stochastic noise.

An exact mathematical analysis of nonlinear stochastic systems with many interacting degrees of freedom is usually not feasible. One therefore has to resort to either computer simulations of corresponding stochastic cellular automata, or approximative treatments. A first step, which is widely used and often provides useful qualitative insights, consists of ignoring spatial and temporal fluctuations, and just studying equations of motion for ensemble-averaged order parameters. In order to arrive at closed equations, additional simplifications tend to be necessary, namely the factorization of correlations into powers of the mean order parameter densities. Such approximations are called mean-field theories; familiar examples are rate equations for chemical reaction kinetics or Landau–Ginzburg theory for phase transitions in thermal equilibrium. Yet in some situations mean-field approximations are insufficient to obtain a satisfactory quantitative description (see, e.g., the recent work collected in [1,2]).
Let us consider an illuminating example.

Example: Lotka–Volterra Model

In the 1920s, Lotka and Volterra independently formulated a mathematical model to describe emerging periodic oscillations, respectively in coupled autocatalytic chemical reactions and in the Adriatic fish population (see, e.g., [3]). We shall formulate the model in the language of population dynamics, and treat it as a stochastic system with two species A (the ‘predators’) and B (the ‘prey’), subject to the following reactions: predator death A → ∅, with rate μ; prey proliferation B → B + B, with rate σ; predation interaction A + B → A + A, with rate λ. Obviously, for λ = 0 the two populations decouple; while the predators face extinction, the prey population will explode. The average predator and prey population densities a(t) and b(t) are then governed by the linear differential equations ȧ(t) = −μ a(t) and ḃ(t) = σ b(t), whose solutions are exponentials. Interesting competition arises as a consequence of the nonlinear process governed by the rate λ. In an exact representation of the system’s temporal evolution, we would now need to know the probability of finding an A–B pair at time t. Moreover, in a spatial Lotka–Volterra model, defined say on a d-dimensional lattice on which the individual particles can move via nearest-neighbor hopping, the predation reaction should occur only if both predators and prey occupy the same or adjacent sites. The evolution equations for the mean densities a(t) and b(t) would then have to be amended by the terms ±λ ⟨a(x,t) b(x,t)⟩, respectively. Here a(x,t) and b(x,t) represent local concentrations, the brackets denote the ensemble average, and ⟨a(x,t) b(x,t)⟩ represents A–B cross correlations. In the rate equation approximation, it is assumed that the local densities are uncorrelated, whereupon ⟨a(x,t) b(x,t)⟩ factorizes to ⟨a(x,t)⟩⟨b(x,t)⟩ = a(t) b(t). This yields the famous deterministic Lotka–Volterra equations

  ȧ(t) = λ a(t) b(t) − μ a(t) ,  ḃ(t) = σ b(t) − λ a(t) b(t) .  (1)
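As a quick numerical illustration, Eq. (1) can be integrated directly; the sketch below also monitors the conserved quantity K(t) = λ[a(t) + b(t)] − σ ln a(t) − μ ln b(t) discussed below. All rate values and initial densities here are arbitrary illustrative choices, not taken from the text.

```python
import math

# Integrate the deterministic Lotka-Volterra equations (1) with a classical
# RK4 scheme and verify that K(t) stays constant along the oscillation.
# Rates lam, sigma, mu and the initial densities are illustrative.

def lv_rhs(a, b, lam=1.0, sigma=1.0, mu=0.5):
    """Right-hand side of Eq. (1)."""
    return lam * a * b - mu * a, sigma * b - lam * a * b

def rk4_step(a, b, dt):
    """One fourth-order Runge-Kutta step."""
    k1a, k1b = lv_rhs(a, b)
    k2a, k2b = lv_rhs(a + 0.5 * dt * k1a, b + 0.5 * dt * k1b)
    k3a, k3b = lv_rhs(a + 0.5 * dt * k2a, b + 0.5 * dt * k2b)
    k4a, k4b = lv_rhs(a + dt * k3a, b + dt * k3b)
    return (a + dt * (k1a + 2 * k2a + 2 * k3a + k4a) / 6,
            b + dt * (k1b + 2 * k2b + 2 * k3b + k4b) / 6)

def K(a, b, lam=1.0, sigma=1.0, mu=0.5):
    """First integral of the mean-field dynamics."""
    return lam * (a + b) - sigma * math.log(a) - mu * math.log(b)

a, b = 0.3, 0.8
K0 = K(a, b)
drift = 0.0
for _ in range(20_000):          # integrate up to t = 20 with dt = 1e-3
    a, b = rk4_step(a, b, 1e-3)
    drift = max(drift, abs(K(a, b) - K0))

print(drift)   # remains tiny: K(t) is conserved along the closed orbit
```

The densities circle the mean-field fixed point (a*, b*) = (σ/λ, μ/λ) on a closed orbit whose shape is fixed entirely by the initial condition, as the text notes.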

Within this mean-field approximation, the quantity K(t) = λ[a(t) + b(t)] − σ ln a(t) − μ ln b(t) (essentially the system’s Lyapunov function) is a constant of motion, K̇(t) = 0. This results in regular nonlinear population oscillations, whose frequency and amplitude are fully determined by the initial conditions, a rather unrealistic feature. Moreover, Eqs. (1) are known to be unstable with respect to various model modifications (as discussed in [3]). In contrast with the rate equation predictions, the original stochastic spatial Lotka–Volterra system displays much richer behavior (a recent overview is presented


Field Theoretic Methods

Field Theoretic Methods, Figure 1 Snapshots of the time evolution (left to right) of activity fronts emerging in a stochastic Lotka–Volterra model simulated on a 512 × 512 lattice, with periodic boundary conditions and site occupation numbers restricted to 0 or 1. For the chosen reaction rates (σ = 4.0, μ = 0.1, and λ = 2.2), the system is in the species coexistence phase, and the corresponding mean-field fixed point is a focus. The red, blue, and black dots respectively represent predators A, prey B, and empty sites ∅. Reproduced with permission from [4]

Field Theoretic Methods, Figure 2 Static correlation functions (a) C_AA(x) (note the logarithmic scale), and (b) C_AB(x), measured in simulations on a 1024 × 1024 lattice without any restrictions on the site occupations. The reaction rates were σ = 0.1, μ = 0.1, and λ was varied from 0.5 (blue triangles, upside down) and 0.75 (green triangles) to 1.0 (red squares). Reproduced with permission from [5]
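Lattice simulations of the kind shown in these figures can be sketched in a few lines. The random-sequential update rules below (at most one particle per site, reactions attempted with probabilities proportional to the rates) are a simplified guess at such a scheme rather than the exact algorithm of [4], and the lattice size and sweep count are deliberately small.

```python
import random

# Illustrative sketch of a stochastic lattice Lotka-Volterra simulation:
# 0 = empty site, 1 = predator A, 2 = prey B, one particle per site at most.
# Rules and parameters are assumptions of this sketch, not the published code.
L_SIZE = 32
SIGMA, MU, LAM = 4.0, 0.1, 2.2
RMAX = max(SIGMA, MU, LAM)          # reaction probability = rate / RMAX

random.seed(2)
grid = [[random.choice([0, 1, 2]) for _ in range(L_SIZE)] for _ in range(L_SIZE)]

def random_neighbor(x, y):
    """Pick one of the four nearest neighbors, with periodic boundaries."""
    dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    return (x + dx) % L_SIZE, (y + dy) % L_SIZE

for _ in range(100 * L_SIZE * L_SIZE):           # 100 Monte Carlo sweeps
    x, y = random.randrange(L_SIZE), random.randrange(L_SIZE)
    if grid[x][y] == 1:                           # predator A
        if random.random() < MU / RMAX:           # death A -> empty
            grid[x][y] = 0
        else:
            nx, ny = random_neighbor(x, y)
            if grid[nx][ny] == 2 and random.random() < LAM / RMAX:
                grid[nx][ny] = 1                  # predation A + B -> A + A
    elif grid[x][y] == 2:                         # prey B
        nx, ny = random_neighbor(x, y)
        if grid[nx][ny] == 0 and random.random() < SIGMA / RMAX:
            grid[nx][ny] = 2                      # offspring B -> B + B

n_a = sum(row.count(1) for row in grid)
n_b = sum(row.count(2) for row in grid)
print(n_a, n_b)
```

Tracking n_a and n_b over many sweeps, or recording the full grid as an image per sweep, reproduces qualitatively the erratic oscillations and traveling activity fronts described in the text.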

in [4]): The predator–prey coexistence phase is governed, for sufficiently large values of the predation rate, by an incessant sequence of ‘pursuit and evasion’ wave fronts that form quite complex dynamical patterns, as depicted in Fig. 1, which shows snapshots taken in a two-dimensional lattice Monte Carlo simulation where each site could at most be occupied by a single particle. In finite systems, these correlated structures induce erratic population oscillations whose features are independent of the initial configuration. Moreover, if the local prey ‘carrying capacity’ is limited (corresponding to restricting the maximum occupation number per lattice site), there appears an extinction threshold for the predator population that separates the active coexistence regime, through a continuous phase transition, from a state wherein at long times t → ∞ only prey survive. With respect to the predator population, this represents an absorbing state: once all A particles

have vanished, they cannot be produced by the stochastic kinetics. A quantitative characterization of the emerging spatial structures utilizes equal-time correlation functions such as C_AA(x − x′; t) = ⟨a(x,t) a(x′,t)⟩ − a(t)² and C_AB(x − x′; t) = ⟨a(x,t) b(x′,t)⟩ − a(t) b(t), computed at some large time t in the (quasi-)stationary state. These are shown in Fig. 2 as measured in computer simulations for a stochastic Lotka–Volterra model (here, no restrictions on the site occupation numbers of the A or B particles were implemented). The A–A (and B–B) correlations decay essentially exponentially with distance x, C_AA(x) ∝ C_BB(x) ∝ e^{−|x|/ξ}, with roughly equal correlation lengths ξ for the predators and prey. The cross-correlation function C_AB(x) displays a maximum at six lattice spacings; these positive correlations indicate the spatial extent of the emerging activity fronts (prey followed by the predators). At closer distance, the A and B particles become anti-correlated (C_AB(x) < 0 for |x| < 3): prey would not survive close encounters with the predators. In a similar manner, one can address temporal correlations. These appear prominently in the space-time plot of Fig. 3, obtained for a Monte Carlo run on a one-dimensional lattice (no site occupation restrictions), indicating localized population explosion and extinction events.

Field Theoretic Methods, Figure 3 Space-time plot (space horizontal, with periodic boundary conditions; time vertical, proceeding downward) showing the temporal evolution of a one-dimensional stochastic Lotka–Volterra model on 512 lattice sites, without any restrictions on the site occupation numbers (red: predators, blue: prey, magenta: sites occupied by both species; rates: σ = 0.1, μ = 0.1, λ = 0.1). Reproduced with permission from [5]

Correlation Functions and Field Theory

The above example demonstrates that stochastic fluctuations and correlations induced by the dynamical interactions may lead to important features that are not adequately described by mean-field approaches. We thus require tools that allow us to systematically account for fluctuations in the mathematical description of stochastic complex systems and to evaluate characteristic correlations. Such a toolbox is provided through field theory representations that are conducive to the identification of underlying symmetries and have proven useful starting points for the construction of various approximation schemes. These methods were originally devised and elaborated in the theory of (quantum and classical) many-particle systems and quantum fields ([6,7,8,9,10,11,12,13] represent a sample of recent textbooks).

Generating Functions

The basic structure of these field theories rests in a (normalized) exponential probability distribution P[S_i] for the N relevant variables S_i, i = 1, …, N: ∫ ∏_{i=1}^N dS_i P[S_i] = 1, where the integration extends over the allowed range of values for the S_i; i.e.,

  P[S_i] = Z^{−1} exp(−A[S_i]) ,  Z = ∫ ∏_{i=1}^N dS_i exp(−A[S_i]) .  (2)

In canonical equilibrium statistical mechanics, A[S_i] = H[S_i]/k_B T is essentially the Hamiltonian, and the normalization is the partition function Z. In Euclidean quantum field theory, the action A[S_i] is given by the Lagrangian. All observables O should be functions of the basic degrees of freedom S_i; their ensemble average thus becomes

  ⟨O[S_i]⟩ = ∫ ∏_{i=1}^N dS_i O[S_i] P[S_i] = Z^{−1} ∫ ∏_{i=1}^N dS_i O[S_i] exp(−A[S_i]) .  (3)

If we are interested in n-point correlations, i.e., expectation values of products of the variables S_i, it is useful to define a generating function

  W[j_i] = ⟨exp( ∑_{i=1}^N j_i S_i )⟩ ,  (4)

with W[j_i = 0] = 1. Notice that W[j_i] formally is just the Laplace transform of the probability distribution P[S_i]. The correlation functions can now be obtained via partial derivatives of W[j_i] with respect to the sources j_i:

  ⟨S_{i1} ⋯ S_{in}⟩ = ∂/∂j_{i1} ⋯ ∂/∂j_{in} W[j_i] |_{j_i = 0} .  (5)

Connected correlation functions or cumulants can be found by similar partial derivatives of the logarithm of the generating function:

  ⟨S_{i1} ⋯ S_{in}⟩_c = ∂/∂j_{i1} ⋯ ∂/∂j_{in} ln W[j_i] |_{j_i = 0} ;  (6)

e.g., ⟨S_i⟩_c = ⟨S_i⟩, and ⟨S_i S_j⟩_c = ⟨S_i S_j⟩ − ⟨S_i⟩⟨S_j⟩ = ⟨(S_i − ⟨S_i⟩)(S_j − ⟨S_j⟩)⟩.

Perturbation Expansion

For a Gaussian action, i.e., a quadratic form A_0[S_i] = ½ ∑_{ij} S_i A_{ij} S_j (for simplicity we assume real variables S_i), one may readily compute the corresponding generating function W_0[j_i]. After diagonalizing the symmetric N × N matrix A_{ij}, completing the squares, and evaluating the ensuing Gaussian integrals, one obtains

  Z_0 = (2π)^{N/2} / √(det A) ,  W_0[j_i] = exp( ½ ∑_{i,j=1}^N j_i (A^{−1})_{ij} j_j ) ,  ⟨S_i S_j⟩_0 = (A^{−1})_{ij} .  (7)

Thus, the two-point correlation functions in the Gaussian ensemble are given by the elements of the inverse harmonic coupling matrix. An important special property of the Gaussian ensemble is that all n-point functions with odd n vanish, whereas those with even n factorize into sums over all possible pairings of the variables S_i into products of two-point functions (A^{−1})_{ij} (Wick’s theorem). For example, the four-point function reads ⟨S_i S_j S_k S_l⟩_0 = (A^{−1})_{ij}(A^{−1})_{kl} + (A^{−1})_{ik}(A^{−1})_{jl} + (A^{−1})_{il}(A^{−1})_{jk}. Let us now consider a general action, isolate the Gaussian contribution, and label the remainder as the nonlinear, anharmonic, or interacting part, A[S_i] = A_0[S_i] + A_int[S_i]. We then observe that

  Z = Z_0 ⟨exp(−A_int[S_i])⟩_0 ,  ⟨O[S_i]⟩ = ⟨O[S_i] exp(−A_int[S_i])⟩_0 / ⟨exp(−A_int[S_i])⟩_0 ,  (8)

where the index 0 indicates that the expectation values are computed in the Gaussian ensemble. The nonlinear terms in Eq. (8) may now be treated perturbatively by expanding the exponentials in the numerator and denominator with respect to the interacting part A_int[S_i]:

  ⟨O[S_i]⟩ = ⟨O[S_i] ∑_{ℓ=0}^∞ (−A_int[S_i])^ℓ / ℓ!⟩_0 / ⟨∑_{ℓ=0}^∞ (−A_int[S_i])^ℓ / ℓ!⟩_0 .  (9)

If the interaction terms are polynomial in the variables S_i, Wick’s theorem reduces the calculation of n-point functions to a summation of products of Gaussian two-point functions. Since the number of contributing terms grows factorially with the order ℓ of the perturbation expansion, graphical representations in terms of Feynman diagrams become very useful for the classification and evaluation of the different contributions to the perturbation series. Basically, they consist of lines representing the Gaussian two-point functions (‘propagators’) that are connected to vertices stemming from the (polynomial) interaction terms; for details, see, e.g., [6,7,8,9,10,11,12,13].

Continuum Limit and Functional Integrals

Discrete spatial degrees of freedom are already contained in the above formal description: for example, on a d-dimensional lattice with N^d sites, the index i for the fields S_i merely needs to entail the site labels, and the total number of degrees of freedom is just N = N^d times the number of independent relevant quantities. Upon discretizing time, these prescriptions can be extended, in effectively one additional dimension, to systems with temporal evolution. We may at last take the continuum limit by letting N → ∞, while the lattice constant and elementary time step tend to zero in such a manner that macroscopic dynamical features are preserved. Formally, this replaces sums over lattice sites and time steps with spatial and temporal integrations; the action A[S_i] becomes a functional of the fields S_i(x,t); partial derivatives turn into functional derivatives; and functional integrations ∫ ∏_{i=1}^N dS_i → ∫ D[S_i] are to be inserted in the previous expressions. For example, Eqs. (3), (4) and (6) become

  ⟨O[S_i]⟩ = Z^{−1} ∫ D[S_i] O[S_i] exp(−A[S_i]) ,  (10)

  W[j_i] = ⟨exp( ∫ d^dx ∫ dt ∑_i j_i(x,t) S_i(x,t) )⟩ ,  (11)

  ⟨∏_{j=1}^n S_{i_j}(x_j, t_j)⟩_c = ∏_{j=1}^n δ/δj_{i_j}(x_j, t_j) ln W[j_i] |_{j_i = 0} .  (12)

Thus we have arrived at a continuum field theory. Nevertheless, we may follow the procedures outlined above; specifically, the perturbation expansion expressions (8) and (9) still hold, yet with arguments S_i(x,t) that are now fields depending on continuous space-time parameters. More than thirty years ago, Janssen and De Dominicis independently derived a mapping of the stochastic kinetics defined through nonlinear Langevin equations onto a field theory action ([14,15]; reviewed in [16]). Almost simultaneously, Doi constructed a Fock space representation, and therefrom a stochastic field theory for classical interacting particle systems, from the master equation describing the corresponding stochastic processes [17,18]. His approach was further developed by several authors into a powerful method for the study of internal noise and correlation effects in reaction–diffusion systems ([19,20,21,22,23]; for recent reviews, see [24,25]). We shall see below that the field-theoretic representations of both classical master and Langevin equations require two independent fields


for each stochastic variable. Otherwise, the computation of correlation functions and the construction of perturbative expansions works precisely as sketched above. But the underlying causal temporal structure induces important specific features, such as the absence of ‘vacuum diagrams’ (closed response loops): the denominator in Eq. (2) is simply Z = 1. (For unified and more detailed descriptions of both versions of dynamic stochastic field theories, see [26,27].)
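The Gaussian identities of Eq. (7) and Wick's theorem are easy to verify by direct sampling; the 2 × 2 coupling matrix below is an arbitrary illustrative choice.

```python
import math, random

# Monte Carlo check of Eq. (7) and of Wick's theorem for a two-variable
# Gaussian ensemble P[S] ~ exp(-1/2 S.A.S) with an illustrative matrix A.
random.seed(0)

A = [[2.0, 0.5], [0.5, 1.0]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
C = [[A[1][1] / det, -A[0][1] / det],     # C = A^{-1} is the predicted
     [-A[1][0] / det, A[0][0] / det]]     # two-point function <S_i S_j>_0

# Cholesky factor of C, so that S = L z has covariance C for iid normal z
l11 = math.sqrt(C[0][0])
l21 = C[1][0] / l11
l22 = math.sqrt(C[1][1] - l21 ** 2)

n = 200_000
m12 = m1122 = 0.0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    s1 = l11 * z1
    s2 = l21 * z1 + l22 * z2
    m12 += s1 * s2
    m1122 += s1 * s1 * s2 * s2
m12 /= n
m1122 /= n

wick = C[0][0] * C[1][1] + 2 * C[0][1] ** 2   # Wick pairing of <S1^2 S2^2>_0
print(abs(m12 - C[0][1]))                     # ~ 0: <S1 S2>_0 = (A^-1)_12
print(abs(m1122 - wick))                      # ~ 0: four-point factorizes
```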

Discrete Stochastic Interacting Particle Systems

We first outline the mapping of stochastic interacting particle dynamics, as defined through a master equation, onto a field theory action [17,18,19,20,21,22,23]. Let us denote the configurational probability for a stochastically evolving system to be in state α at time t with P(α; t). Given the transition rates W_{α→β}(t) from states α to β, a master equation essentially balances the transitions into and out of each state:

  ∂P(α; t)/∂t = ∑_{β≠α} [ W_{β→α}(t) P(β; t) − W_{α→β}(t) P(α; t) ] .  (13)

The dynamics of many complex systems can be cast into the language of ‘chemical’ reactions, wherein certain particle species (upon encounter, say) transform into different species with fixed (time-independent) reaction rates. The ‘particles’ considered here could be atoms or molecules in chemistry, but also individuals in population dynamics (as in our example in Sect. “Example: Lotka–Volterra Model”), or appropriate effective degrees of freedom governing the system’s kinetics, such as domain walls in magnets, etc. To be specific, we envision our particles to propagate via unbiased random walks (diffusion) on a d-dimensional hypercubic lattice, with the reactions occurring according to prescribed rules when particles meet on a lattice site. This stochastic interacting particle system is then at any time fully characterized by the number of particles n_A, n_B, … of each species A, B, … located on any lattice site. The following describes the construction of an associated field theory action. As important examples, we briefly discuss annihilation reactions and absorbing state phase transitions.

Master Equation and Fock Space Representation

The formal procedures are best explained by means of a simple example; thus consider the irreversible binary annihilation process A + A → A, happening with rate λ. In terms of the occupation numbers n_i of the lattice sites i, we can construct the master equation associated with these on-site reactions as follows. The annihilation process locally changes the occupation numbers by one; the transition rate from a state with n_i particles at site i to n_i − 1 particles is W_{n_i → n_i − 1} = λ n_i (n_i − 1), whence

  ∂P(n_i; t)/∂t = λ (n_i + 1) n_i P(n_i + 1; t) − λ n_i (n_i − 1) P(n_i; t)  (14)

represents the master equation for this reaction at site i. As an initial condition, we can for example choose a Poisson distribution P(n_i) = n̄_0^{n_i} e^{−n̄_0}/n_i! with mean initial particle density n̄_0. In order to capture the complete stochastic dynamics, we just need to add similar contributions describing other processes, and finally sum over all lattice sites i. Since the reactions all change the site occupation numbers by integer values, a Fock space representation (borrowed from quantum mechanics) turns out particularly useful. To this end, we introduce the harmonic oscillator or bosonic ladder operator algebra [a_i, a_j] = 0 = [a_i†, a_j†], [a_i, a_j†] = δ_ij, from which we construct the particle number eigenstates |n_i⟩, namely a_i |n_i⟩ = n_i |n_i − 1⟩, a_i† |n_i⟩ = |n_i + 1⟩, a_i† a_i |n_i⟩ = n_i |n_i⟩. (Notice that a different normalization than in ordinary quantum mechanics has been employed here.) A general state with n_i particles on sites i is obtained from the ‘vacuum’ configuration |0⟩, defined via a_i |0⟩ = 0, through the product |{n_i}⟩ = ∏_i a_i†^{n_i} |0⟩. To implement the stochastic kinetics, we introduce a formal state vector as a linear combination of all possible states, weighted by the time-dependent configurational probability:

  |Φ(t)⟩ = ∑_{{n_i}} P({n_i}; t) |{n_i}⟩ .  (15)
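The single-site master equation (14) can also be integrated numerically. The truncation of the state space at a maximum occupation number is an assumption of this sketch, as are the rate and the initial mean density.

```python
import math

# Explicit Euler integration of the master equation (14) for A + A -> A,
# truncated at n <= NMAX (an assumption of this sketch), starting from a
# Poisson initial distribution of mean N0.  Parameter values illustrative.
LAM, NMAX, N0 = 0.5, 80, 3.0

P = [math.exp(-N0) * N0 ** n / math.factorial(n) for n in range(NMAX + 1)]

def step(P, dt):
    Q = P[:]
    for n in range(NMAX + 1):
        gain = LAM * (n + 1) * n * P[n + 1] if n < NMAX else 0.0
        loss = LAM * n * (n - 1) * P[n]
        Q[n] += dt * (gain - loss)
    return Q

dt, T = 1e-4, 2.0
for _ in range(int(T / dt)):
    P = step(P, dt)

norm = sum(P)
mean = sum(n * p for n, p in enumerate(P))
print(norm)   # total probability is conserved by the gain/loss structure
print(mean)   # mean occupation has decayed below its initial value 3
```

The gain and loss terms cancel pairwise when summed over n, so the scheme conserves probability exactly up to floating-point error; this mirrors the ⟨P|H = 0 condition discussed below.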

Simple manipulations then transform the linear time evolution according to the master equation into an ‘imaginary-time’ Schrödinger equation

  ∂|Φ(t)⟩/∂t = −H |Φ(t)⟩ ,  |Φ(t)⟩ = e^{−Ht} |Φ(0)⟩  (16)

governed by a stochastic quasi-Hamiltonian (rather, the Liouville time evolution operator). For on-site reaction processes, H_reac = ∑_i H_i(a_i†, a_i) is a sum of local contributions; e.g., for the binary annihilation reaction, H_i(a_i†, a_i) = −λ (1 − a_i†) a_i† a_i². It is a straightforward exercise to construct the corresponding expressions within


this formalism for the generalization kA → ℓA,

  H_i(a_i†, a_i) = −λ ( a_i†^ℓ − a_i†^k ) a_i^k ,  (17)

and for nearest-neighbor hopping with rate D between adjacent sites ⟨ij⟩,

  H_diff = D ∑_{⟨ij⟩} ( a_i† − a_j† ) ( a_i − a_j ) .  (18)

The two contributions for each process may be interpreted as follows: the first term in Eq. (17) corresponds to the actual process, and describes how many particles are annihilated and (re-)created in each reaction. The second term encodes the ‘order’ of each reaction: for a kth-order process, the number operator a_i† a_i appears to the kth power, but in the normal-ordered form a_i†^k a_i^k. These procedures are readily adjusted for reactions involving multiple particle species. We merely need to specify the occupation numbers on each site and correspondingly introduce additional ladder operators b_i, c_i, … for each new species, with [a_i, b_i†] = 0 = [a_i, c_i†], etc. For example, consider the reversible reaction kA + ℓB ⇌ mC with forward rate λ and backward rate σ; the associated reaction Hamiltonian reads

  H_reac = −∑_i [ λ ( c_i†^m − a_i†^k b_i†^ℓ ) a_i^k b_i^ℓ + σ ( a_i†^k b_i†^ℓ − c_i†^m ) c_i^m ] .  (19)

Similarly, for the Lotka–Volterra model of Sect. “Example: Lotka–Volterra Model”, one finds

  H_reac = −∑_i [ μ ( 1 − a_i† ) a_i + σ ( b_i† − 1 ) b_i† b_i + λ ( a_i† − b_i† ) a_i† a_i b_i ] .  (20)

Note that all the above quasi-Hamiltonians are non-Hermitean operators, which naturally reflects the creation and destruction of particles. Our goal is to compute averages and correlation functions with respect to the configurational probability P({n_i}; t). Returning to a single-species system (again, the generalization to many particle species is obvious), this is accomplished with the aid of the projection state ⟨P| = ⟨0| ∏_i e^{a_i}, for which ⟨P|0⟩ = 1 and ⟨P| a_i† = ⟨P|, since [e^{a_i}, a_j†] = e^{a_i} δ_ij. For the desired statistical averages of observables (which must all be expressible as functions of the occupation numbers {n_i}), one obtains

  ⟨O(t)⟩ = ∑_{{n_i}} O({n_i}) P({n_i}; t) = ⟨P| O({a_i† a_i}) |Φ(t)⟩ .  (21)

For example, as a consequence of probability conservation, 1 = ⟨P|Φ(t)⟩ = ⟨P| e^{−Ht} |Φ(0)⟩. Thus necessarily ⟨P| H = 0; upon commuting e^{∑_i a_i} with H, the creation operators are shifted, a_i† → 1 + a_i†, whence this condition is fulfilled provided H_i(a_i† → 1, a_i) = 0, which is indeed satisfied by our above explicit expressions (17) and (18). Through this prescription, we may replace a_i† a_i → a_i in all averages; e.g., the particle density becomes a(t) = ⟨a_i(t)⟩. In the bosonic operator representation above, we have assumed that no restrictions apply to the particle occupation numbers n_i on each site. If n_i ≤ 2s + 1, one may instead employ a representation in terms of spin s operators. For example, particle exclusion systems with n_i = 0 or 1 can thus be mapped onto non-Hermitean spin 1/2 ‘quantum’ systems (for recent overviews, see [28,29]). Specifically in one dimension, such representations in terms of integrable spin chains have been very fruitful. An alternative approach uses the bosonic theory, but incorporates the site occupation restrictions through exponentials in the

number operators, e^{−a_i† a_i} [30].

Continuum Limit and Field Theory

As a next step, we follow an established route in quantum many-particle theory [8] and proceed towards a field theory representation through constructing the path integral equivalent to the ‘Schrödinger’ dynamics (16), based on coherent states, which are right eigenstates of the annihilation operator, a_i |ψ_i⟩ = ψ_i |ψ_i⟩, with complex eigenvalues ψ_i. Explicitly, |ψ_i⟩ = exp( −½ |ψ_i|² + ψ_i a_i† ) |0⟩, and these coherent states satisfy the overlap formula ⟨ψ_j|ψ_i⟩ = exp( −½ |ψ_i|² − ½ |ψ_j|² + ψ_j* ψ_i ) and the (over-)completeness relation ∫ ∏_i d²ψ_i |{ψ_i}⟩⟨{ψ_i}| ∝ 1. Upon splitting the temporal evolution (16) into infinitesimal increments, standard procedures (elaborated in detail in [25]) eventually yield an expression for the configurational average

  ⟨O(t)⟩ ∝ ∫ ∏_i dψ_i* dψ_i O({ψ_i}) e^{−A[ψ_i*, ψ_i; t]} ,  (22)

which is of the form (3), with the action

  A[ψ_i*, ψ_i; t_f] = ∑_i [ −ψ_i(t_f) + ∫_0^{t_f} dt ( ψ_i* ∂ψ_i/∂t + H_i(ψ_i*, ψ_i) ) − n̄_0 ψ_i*(0) ] ,  (23)


where the first term originates from the projection state, and the last one stems from the initial Poisson distribution. Through this procedure, in the original quasi-Hamiltonian the creation and annihilation operators a_i† and a_i are simply replaced with the complex numbers ψ_i* and ψ_i. Finally, we proceed to the continuum limit, ψ_i(t) → ψ(x,t), ψ_i*(t) → ψ̂(x,t). The ‘bulk’ part of the action then becomes

  A[ψ̂, ψ] = ∫ d^dx ∫ dt [ ψ̂ ( ∂/∂t − D∇² ) ψ + H_reac(ψ̂, ψ) ] ,  (24)

where the discrete hopping contribution (18) has naturally turned into a continuum diffusion term. We have thus arrived at a microscopic field theory for stochastic reaction–diffusion processes, without invoking any assumptions on the form or correlations of the internal reaction noise. Note that we require two independent fields ψ̂ and ψ to capture the stochastic dynamics. Actions of the type (24) may serve as a basis for further systematic coarse-graining, for constructing a perturbation expansion as outlined in Sect. “Perturbation Expansion”, and perhaps for a subsequent renormalization group analysis [25,26,27]. We remark that it is often useful to perform a shift in the field ψ̂ about the mean-field solution, ψ̂(x,t) = 1 + ψ̃(x,t), for occasionally the resulting field theory action allows the derivation of an equivalent Langevin dynamics; see Sect. “Stochastic Differential Equations” below.

Annihilation Processes

Let us consider our simple single-species example kA → ℓA. The reaction part of the corresponding field theory action reads

  H_reac(ψ̂, ψ) = −λ ( ψ̂^ℓ − ψ̂^k ) ψ^k ;  (25)

see Eq. (17). It is instructive to study the classical field equations, namely δA/δψ = 0, which is always solved by ψ̂ = 1, reflecting probability conservation, and δA/δψ̂ = 0, which upon inserting ψ̂ = 1 yields

  ∂ψ(x,t)/∂t = D∇²ψ(x,t) − (k − ℓ) λ ψ(x,t)^k ,  (26)

i.e., the mean-field equation for the local particle density ψ(x,t), supplemented with a diffusion term. For k = 1, the particle density grows (k < ℓ) or decays (k > ℓ) exponentially. The solution of the rate equation for k > 1,

  a(t) = ⟨ψ(x,t)⟩ = [ a(0)^{1−k} + (k − ℓ)(k − 1) λ t ]^{−1/(k−1)} ,

implies a divergence within a finite time for k < ℓ, and an algebraic decay ∼ (λt)^{−1/(k−1)} for k > ℓ. The full field theory action, which was derived from the master equation defining the very stochastic process, provides a means of systematically including fluctuations in the mathematical treatment. Through a dimensional analysis, we can determine the (upper) critical dimension below which fluctuations become sufficiently strong to alter these power laws. Introducing an inverse length scale κ, [x] ∼ κ^{−1}, and applying diffusive temporal scaling, [Dt] ∼ κ^{−2}, with [ψ̂(x,t)] ∼ κ^0 and [ψ(x,t)] ∼ κ^d in d spatial dimensions, the reaction rate in terms of the diffusivity scales according to [λ/D] ∼ κ^{2−(k−1)d}. In large dimensions, the kinetics is reaction-limited, and at least qualitatively correctly described by the mean-field rate equation. In low dimensions, the dynamics becomes diffusion-limited, and the annihilation reactions generate depletion zones and spatial particle anti-correlations that slow down the density decay. The nonlinear coupling λ/D becomes dimensionless at the boundary critical dimension d_c(k) = 2/(k − 1) that separates these two distinct regimes. Thus in physical dimensions, intrinsic stochastic fluctuations are relevant only for pair and triplet annihilation reactions. By means of a renormalization group analysis (for details, see [25]) one finds for k = 2 and d < d_c(2) = 2: a(t) ∼ (Dt)^{−d/2} [21,22], as confirmed by exact solutions in one dimension. Precisely at the critical dimension, the mean-field decay laws acquire logarithmic corrections, namely a(t) ∼ (Dt)^{−1} ln(Dt) for k = 2 at d_c(2) = 2, and a(t) ∼ [(Dt)^{−1} ln(Dt)]^{1/2} for k = 3 at d_c(3) = 1. Annihilation reactions between different species (e.g., A + B → ∅) may introduce additional correlation effects, such as particle segregation and the confinement of active dynamics to narrow reaction zones [23]; a recent overview can be found in [25].
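The quoted mean-field decay law can be confirmed by integrating the homogeneous version of Eq. (26); the rate and initial density below are illustrative.

```python
# Check of the mean-field solution
#   a(t) = [a(0)^(1-k) + (k-ell)(k-1)*lam*t]^(-1/(k-1))
# against RK4 integration of da/dt = -(k-ell)*lam*a^k (Eq. (26) without
# the diffusion term).  Rates and initial density are illustrative.

def decay_error(k, ell, lam=0.8, a0=1.5, t_end=5.0, dt=1e-4):
    f = lambda x: -(k - ell) * lam * x ** k
    a, t, worst = a0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        k1 = f(a); k2 = f(a + 0.5 * dt * k1)
        k3 = f(a + 0.5 * dt * k2); k4 = f(a + dt * k3)
        a += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
        exact = (a0 ** (1 - k) + (k - ell) * (k - 1) * lam * t) ** (-1.0 / (k - 1))
        worst = max(worst, abs(a - exact))
    return worst

e_pair = decay_error(2, 1)      # pair reaction 2A -> A
e_triplet = decay_error(3, 1)   # triplet reaction 3A -> A
print(e_pair, e_triplet)        # both essentially zero
```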
Active to Absorbing State Phase Transitions

Competition between particle production and decay processes leads to even richer scenarios, and can induce genuine nonequilibrium transitions that separate ‘active’ phases (wherein the particle densities remain nonzero in the long-time limit) from ‘inactive’ stationary states (where the concentrations ultimately vanish). A special but abundant case are absorbing states where, owing to the absence of any agents, stochastic fluctuations cease entirely, and no particles can be regenerated [31,32]. These occur in a variety of systems in nature ([33,34] contain extensive discussions of various model systems); examples are chemical reactions involving an inert state ∅, wherefrom no reactants A are released anymore, or stochastic


population dynamics models, combining diffusive migration of a species A with asexual reproduction A → 2A (with rate σ), spontaneous death A → ∅ (at rate μ), and lethal competition 2A → A (with rate λ). In the inactive state, where no population members A are left, clearly all processes terminate. Similar effective dynamics may be used to model certain nonequilibrium physical systems, such as the domain wall kinetics in Ising chains with competing Glauber and Kawasaki dynamics. Here, spin flips ↑↑↓↓ → ↑↑↑↓ and ↑↑↓↑ → ↑↑↑↑ may be viewed as domain wall (A) hopping and pair annihilation 2A → ∅, whereas spin exchange ↑↑↓↓ → ↑↓↑↓ represents a branching process A → 3A. Notice that the para- and ferromagnetic phases respectively map onto the active and inactive ‘particle’ states. The ferromagnetic state becomes absorbing if the spin flip rates are taken at zero temperature. The reaction quasi-Hamiltonian corresponding to the stochastic dynamics of the aforementioned population dynamics model reads

  H_reac(ψ̂, ψ) = ( 1 − ψ̂ ) ( σψ̂ − μ ) ψ − λ ( 1 − ψ̂ ) ψ̂ ψ² .  (27)

The associated rate equation is the Fisher–Kolmogorov equation (see Murray 2002 [3]),

  ȧ(t) = (σ − μ) a(t) − λ a(t)² ,  (28)

which yields both an inactive and an active phase: for σ < μ we have a(t → ∞) → 0, whereas for σ > μ the density eventually saturates at a_s = (σ − μ)/λ. The explicit time-dependent solution a(t) = a(0) a_s / [ a(0) + [a_s − a(0)] e^{−(σ−μ)t} ] shows that both stationary states are approached exponentially in time. They are separated by a continuous nonequilibrium phase transition at σ = μ, where the temporal decay becomes algebraic, a(t) = a(0)/[1 + λ a(0) t] → 1/(λt) as t → ∞, independent of the initial density a(0). As in second-order equilibrium phase transitions, however, critical fluctuations are expected to invalidate the mean-field power laws in low dimensions d < d_c.
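A minimal numerical check of this explicit solution of Eq. (28), with illustrative rate values:

```python
import math

# Compare the closed-form solution of the Fisher-Kolmogorov rate
# equation (28), a(t) = a(0) a_s / (a(0) + [a_s - a(0)] exp(-(sigma-mu) t)),
# with an RK4 integration of da/dt = (sigma - mu) a - lam a^2.
SIGMA, MU, LAM = 1.0, 0.4, 0.5
A_S = (SIGMA - MU) / LAM                 # saturation density a_s = 1.2

def f(x):
    return (SIGMA - MU) * x - LAM * x * x

def exact(t, a0):
    return a0 * A_S / (a0 + (A_S - a0) * math.exp(-(SIGMA - MU) * t))

a0, dt = 0.1, 1e-4
a, t, worst = a0, 0.0, 0.0
for _ in range(200_000):                 # integrate up to t = 20
    k1 = f(a); k2 = f(a + 0.5 * dt * k1)
    k3 = f(a + 0.5 * dt * k2); k4 = f(a + dt * k3)
    a += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    t += dt
    worst = max(worst, abs(a - exact(t, a0)))

print(worst)            # closed form and numerics agree
print(abs(a - A_S))     # the density has saturated near a_s
```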

If we now shift the field ψ̂ about its stationary value 1 and rescale according to ψ̂(x,t) = 1 + √(λ/σ) S̃(x,t) and ψ(x,t) = √(σ/λ) S(x,t), the (bulk) action becomes

  A[S̃, S] = ∫ d^dx ∫ dt [ S̃ ( ∂/∂t + D ( r − ∇² ) ) S − u ( S̃ − S ) S̃ S + λ S̃² S² ] ,  (29)

with D r = μ − σ measuring the deviation from the transition. Thus, the three-point vertices have been scaled to identical coupling strengths u = √(σλ), which in fact represents the effective coupling of the perturbation expansion. Its scaling dimension is [u] ∼ κ^{2−d/2}, whence we infer the upper critical dimension d_c = 4. The four-point vertex ∝ λ, with [λ] ∼ κ^{2−d}, is then found to be irrelevant in the renormalization group sense, and can be dropped for the computation of universal, asymptotic scaling properties. The action (29) with λ = 0 is known as Reggeon field theory [35]; it satisfies a characteristic symmetry, namely invariance under so-called rapidity inversion, S(x,t) ↔ −S̃(x,−t). Remarkably, it has moreover been established that the field theory action (29) describes the scaling properties of critical directed percolation clusters [36,37,38]. The fluctuation-corrected universal power laws governing the vicinity of the phase transition can be extracted by renormalization group methods (reviewed for directed percolation in [39]). Table 1 compares the analytic results obtained in an ε expansion about the critical dimension (ε = 4 − d) with the critical exponent values measured in Monte Carlo computer simulations [33,34]. According to a conjecture originally formulated by Janssen and Grassberger, any continuous nonequilibrium phase transition from an active to an absorbing state in a system governed by Markovian stochastic dynamics that is decoupled from any other slow variable, and in the absence of special additional symmetries or quenched randomness, should in fact fall in the directed percolation universality class [38,40]. This statement has indeed been confirmed in a large variety of model systems (many exam-

Field Theoretic Methods, Table 1 Comparison of the values for the critical exponents of the directed percolation universality class measured in Monte Carlo simulations with the analytic renormalization group results within the ε = 4 − d expansion: ξ denotes the correlation length, t_c the characteristic relaxation time, a_s the saturation density in the active state, and a_c(t) the critical density decay law

  Scaling exponent          d = 1        d = 2       d = 4 − ε
  ξ ∼ |τ|^{−ν}              ν ≈ 1.100    ν ≈ 0.735   ν = 1/2 + ε/16 + O(ε²)
  t_c ∼ ξ^z ∼ |τ|^{−zν}     z ≈ 1.576    z ≈ 1.73    z = 2 − ε/12 + O(ε²)
  a_s ∼ |τ|^β               β ≈ 0.2765   β ≈ 0.584   β = 1 − ε/6 + O(ε²)
  a_c(t) ∼ t^{−α}           α ≈ 0.160    α ≈ 0.46    α = 1 − ε/4 + O(ε²)


ples are listed in [33,34]). It even pertains to multi-species generalizations [41], and applies for instance to the predator extinction threshold in the stochastic Lotka–Volterra model with restricted site occupation numbers mentioned in Sect. “Example: Lotka–Volterra Model” [4].

Stochastic Differential Equations

This section explains how dynamics governed by Langevin-type stochastic differential equations can be represented through a field-theoretic formalism [14,15,16]. Such a description is especially useful to capture the effects of external noise on the temporal evolution of the relevant quantities under consideration, which encompasses the case of thermal noise induced by the coupling to a heat bath in thermal equilibrium at temperature T. The underlying assumption in this approach is that there exists a natural separation of time scales between the slow variables S_i and all other degrees of freedom, which in comparison fluctuate rapidly and are therefore summarily gathered in zero-mean noise terms ζ_i, assumed to be uncorrelated in space and time,

  \langle \zeta_i(x,t) \rangle = 0 , \qquad
  \langle \zeta_i(x,t)\, \zeta_j(x',t') \rangle = 2 L_{ij}[S_i]\, \delta(x - x')\, \delta(t - t') .   (30)

Here, the noise correlator 2 L_{ij}[S_i] may be a function of the slow system variables S_i, and may also contain operators such as spatial derivatives. A general set of coupled Langevin-type stochastic differential equations then takes the form

  \frac{\partial S_i(x,t)}{\partial t} = F_i[S_i](x,t) + \zeta_i(x,t) ,   (31)

where we may decompose the ‘systematic forces’ into reversible terms of microscopic origin and relaxational contributions that are induced by the noise and drive the system towards its stationary state (see below), i.e., F_i[S_i] = F_i^{rev}[S_i] + F_i^{rel}[S_i]. Both ingredients may contain nonlinear terms as well as mode couplings between different variables. Again, we first introduce the abstract formalism, and then proceed to discuss relaxation to thermal equilibrium as well as some examples for nonequilibrium Langevin dynamics.

Field Theory Representation of Langevin Equations

The shortest and most general route towards a field theory representation of the Langevin dynamics (31) with noise correlations (30) starts with one of the most elaborate ways to expand unity, namely through a product of functional delta functions (for the sake of compact notation, we immediately employ a functional integration language, but in the end all the path integrals are defined through appropriate discretizations in space and time):

  1 = \int \prod_i \mathcal{D}[S_i] \prod_{(x,t)}
        \delta\!\left( \frac{\partial S_i(x,t)}{\partial t} - F_i[S_i](x,t) - \zeta_i(x,t) \right)
    = \int \prod_i \mathcal{D}[i\tilde{S}_i]\, \mathcal{D}[S_i]\,
        \exp\!\left[ - \int \! d^d x \int \! dt \sum_i \tilde{S}_i
        \left( \frac{\partial S_i}{\partial t} - F_i[S_i] - \zeta_i \right) \right] .   (32)

In the second line we have used the Fourier representation of the (functional) delta distribution by means of the purely imaginary auxiliary variables \tilde{S}_i (also called Martin–Siggia–Rose response fields [42]). Next we require the explicit form of the noise probability distribution that generates the correlations (30); for simplicity, we may employ the Gaussian

  W[\zeta_i] \propto \exp\!\left[ - \frac{1}{4} \int \! d^d x \int \! dt
      \sum_{ij} \zeta_i(x,t)\, (L^{-1})_{ij}\, \zeta_j(x,t) \right] .   (33)

Inserting the identity (32) and the probability distribution (33) into the desired stochastic noise average of any observable O[S_i], we arrive at

  \langle O[S_i] \rangle \propto \int \prod_i \mathcal{D}[i\tilde{S}_i]\, \mathcal{D}[S_i]\,
      \exp\!\left[ - \int \! d^d x \int \! dt \sum_i \tilde{S}_i
      \left( \frac{\partial S_i}{\partial t} - F_i[S_i] \right) \right] O[S_i]
    \times \int \prod_i \mathcal{D}[\zeta_i]\,
      \exp\!\left[ - \int \! d^d x \int \! dt \sum_i
      \left( \frac{1}{4}\, \zeta_i \sum_j (L^{-1})_{ij}\, \zeta_j - \tilde{S}_i \zeta_i \right) \right] .   (34)

Subsequently evaluating the Gaussian integrals over the noise ζ_i yields at last

  \langle O[S_i] \rangle = \int \prod_i \mathcal{D}[S_i]\, O[S_i]\, \mathcal{P}[S_i] , \qquad
  \mathcal{P}[S_i] \propto \int \prod_i \mathcal{D}[i\tilde{S}_i]\, e^{-\mathcal{A}[\tilde{S}_i, S_i]} ,   (35)

with the statistical weight governed by the Janssen–De Dominicis ‘response’ functional [14,15]

  \mathcal{A}[\tilde{S}_i, S_i] = \int \! d^d x \int_0^{t_f} \! dt \sum_i
      \left[ \tilde{S}_i \left( \frac{\partial S_i}{\partial t} - F_i[S] \right)
      - \tilde{S}_i \sum_j L_{ij}\, \tilde{S}_j \right] .   (36)

It should be noted that in the above manipulations, we have omitted the functional determinant from the variable change {ζ_i} → {S_i}. This step can be justified through applying a forward (Itô) discretization (for technical details, see [16,27,43]). Normalization implies \int \prod_i \mathcal{D}[i\tilde{S}_i]\, \mathcal{D}[S_i]\, e^{-\mathcal{A}[\tilde{S}_i, S_i]} = 1. The first term in the action (36) encodes the temporal evolution according to the systematic terms in the Langevin Equations (31), whereas the second term specifies the noise correlations (30). Since the auxiliary fields appear only quadratically, they could be eliminated via completing the squares and Gaussian integration. This results in the equivalent Onsager–Machlup functional, which however contains squares of the nonlinear terms and the inverse of the noise correlator operators; the form (36) is therefore usually more convenient for practical purposes. The Janssen–De Dominicis functional (36) takes the form of a (d + 1)-dimensional statistical field theory with again two independent sets of fields S_i and \tilde{S}_i. It may serve as a starting point for systematic approximation schemes including perturbative expansions and subsequent renormalization group treatments. Causality is properly incorporated in this formalism, which has important technical implications [16,27,43].
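The forward (Itô) discretization invoked here is also the basis of the simplest numerical scheme for the Langevin equation (31), the Euler–Maruyama method. The following Python sketch (an illustration added to this text; the force and noise strength are arbitrary choices, not taken from the article) integrates a single-variable Langevin equation and checks the noiseless limit against the exact exponential relaxation.

```python
import numpy as np

def euler_maruyama(F, L, s0, dt, n_steps, rng):
    """Forward (Ito) discretization of dS/dt = F[S] + zeta,
    with noise correlator <zeta(t) zeta(t')> = 2 L delta(t - t')."""
    s = np.empty(n_steps + 1)
    s[0] = s0
    for n in range(n_steps):
        # Gaussian increment with variance 2 L dt, as dictated by Eq. (30)
        noise = rng.normal(0.0, np.sqrt(2.0 * L * dt))
        s[n + 1] = s[n] + F(s[n]) * dt + noise
    return s

# Noiseless check: with F[S] = -r S and L = 0 the scheme must
# reproduce exponential relaxation S(t) = S0 exp(-r t).
r = 1.0
path = euler_maruyama(lambda s: -r * s, L=0.0, s0=1.0, dt=1e-3,
                      n_steps=1000, rng=np.random.default_rng(0))
assert abs(path[-1] - np.exp(-r * 1.0)) < 1e-3
```

With L > 0 the same update generates sample paths whose statistics reproduce the noise average that the response functional (36) encodes.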

Thermal Equilibrium and Relaxational Critical Dynamics

Consider the dynamics of a system that, following some external perturbation, relaxes towards thermal equilibrium governed by the canonical Boltzmann distribution at fixed temperature T,

  \mathcal{P}_{eq}[S_i] = \frac{1}{Z(T)} \exp\!\left( - \mathcal{H}[S_i] / k_B T \right) .   (37)

The relaxational term in the Langevin Equation (31) can then be specified as

  F_i^{rel}[S_i] = - \lambda_i\, \frac{\delta \mathcal{H}[S_i]}{\delta S_i} ,   (38)

with Onsager coefficients λ_i; for nonconserved fields, λ_i is a positive relaxation rate. On the other hand, if the variable S_i is a conserved quantity (such as the energy density), there is an associated continuity equation ∂S_i/∂t + ∇·J_i = 0, with a conserved current that is typically given by a gradient of the field S_i: J_i = −D_i ∇S_i + …; as a consequence, the fluctuations of the fields S_i will relax diffusively with diffusivity D_i, and λ_i → −D_i ∇² becomes a spatial Laplacian.

In order for P(t) → P_eq as t → ∞, the stochastic Langevin dynamics needs to satisfy two conditions, which can be inferred from the associated Fokker–Planck equation [27,44]. First, the reversible probability current is required to be divergence-free in the space spanned by the fields S_i:

  \int \! d^d x \sum_i \frac{\delta}{\delta S_i(x)}
      \left( F_i^{rev}[S_i]\, e^{-\mathcal{H}[S_i]/k_B T} \right) = 0 .   (39)

This condition severely constrains the reversible force terms. For example, for a system whose microscopic time evolution is determined through the Poisson brackets Q_{ij}(x,x') = {S_i(x), S_j(x')} = −Q_{ji}(x',x) (to be replaced by commutators in quantum mechanics), one finds for the reversible mode-coupling terms [44]

  F_i^{rev}[S_i](x) = - \int \! d^d x' \sum_j
      \left[ Q_{ij}(x,x')\, \frac{\delta \mathcal{H}[S_i]}{\delta S_j(x')}
      - k_B T\, \frac{\delta Q_{ij}(x,x')}{\delta S_j(x')} \right] .   (40)

Second, the noise correlator in Eq. (30) must be related to the Onsager relaxation coefficients through the Einstein relation

  L_{ij} = k_B T\, \lambda_i\, \delta_{ij} .   (41)

To provide a specific example, we focus on the case of purely relaxational dynamics (i.e., reversible force terms are absent entirely), with the (mesoscopic) Hamiltonian given by the Ginzburg–Landau–Wilson free energy that describes second-order phase transitions in thermal equilibrium for an n-component order parameter S_i, i = 1, …, n [6,7,8,9,10,11,12,13]:

  \mathcal{H}[S_i] = \int \! d^d x \sum_{i=1}^{n}
      \left[ \frac{r}{2}\, [S_i(x)]^2 + \frac{1}{2}\, [\nabla S_i(x)]^2
      + \frac{u}{4!}\, [S_i(x)]^2 \sum_{j=1}^{n} [S_j(x)]^2 \right] ,   (42)

where the control parameter r ∝ T − T_c changes sign at the critical temperature T_c, and the positive constant u governs the strength of the nonlinearity. If we assume that the order parameter itself is not conserved under the dynamics, the associated response functional reads

  \mathcal{A}[\tilde{S}_i, S_i] = \int \! d^d x \int \! dt \sum_i \tilde{S}_i
      \left[ \frac{\partial S_i}{\partial t}
      + \lambda_i\, \frac{\delta \mathcal{H}[S_i]}{\delta S_i}
      - k_B T\, \lambda_i\, \tilde{S}_i \right] .   (43)

This case is frequently referred to as model A critical dynamics [45]. For a diffusively relaxing conserved field, termed model B in the classification of [45], one has instead

  \mathcal{A}[\tilde{S}_i, S_i] = \int \! d^d x \int \! dt \sum_i \tilde{S}_i
      \left[ \frac{\partial S_i}{\partial t}
      - D_i \nabla^2\, \frac{\delta \mathcal{H}[S_i]}{\delta S_i}
      + k_B T\, D_i \nabla^2\, \tilde{S}_i \right] .   (44)

Consider now the external fields h_i that are thermodynamically conjugate to the mesoscopic variables S_i, i.e., \mathcal{H}(h_i) = \mathcal{H}(h_i = 0) - \int d^d x \sum_i h_i(x)\, S_i(x). For the simple relaxational models (43) and (44), we may thus immediately relate the dynamic susceptibility to two-point correlation functions that involve the auxiliary fields \tilde{S}_i [43], namely

  \chi_{ij}(x - x', t - t') = \left. \frac{\delta \langle S_i(x,t) \rangle}
      {\delta h_j(x',t')} \right|_{h_i = 0}
    = k_B T\, \lambda_i\, \left\langle S_i(x,t)\, \tilde{S}_j(x',t') \right\rangle   (45)

for nonconserved fields, while for model B dynamics

  \chi_{ij}(x - x', t - t')
    = - k_B T\, D_i\, \left\langle S_i(x,t)\, \nabla^2 \tilde{S}_j(x',t') \right\rangle .   (46)

Finally, in thermal equilibrium the dynamic response and correlation functions are related through the fluctuation-dissipation theorem [43]:

  \chi_{ij}(x - x', t - t') = \Theta(t - t')\, \frac{\partial}{\partial t'}
      \left\langle S_i(x,t)\, S_j(x',t') \right\rangle .   (47)
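The content of the Einstein relation (41) can be made concrete in the simplest nonconserved case, a single Gaussian mode with H = r S²/2 (a hypothetical toy example added here for illustration, not part of the original text): the Langevin dynamics with noise strength L = k_B T λ must relax to the Boltzmann distribution (37), whose variance is k_B T / r.

```python
import numpy as np

# Overdamped Langevin dynamics dS/dt = -lam * r * S + zeta with
# <zeta zeta'> = 2 kT lam delta(t - t'), i.e. the Einstein relation L = kT lam.
kT, lam, r = 1.0, 0.5, 2.0
dt, n_steps = 1e-2, 200_000
rng = np.random.default_rng(42)
noise = rng.normal(0.0, np.sqrt(2.0 * kT * lam * dt), size=n_steps)

s, samples = 0.0, []
for n in range(n_steps):
    s += -lam * r * s * dt + noise[n]
    if n > n_steps // 10:          # discard the initial transient
        samples.append(s)

var = np.var(samples)
# Boltzmann statistics for H = r S^2 / 2 predicts <S^2> = kT / r = 0.5
assert abs(var - kT / r) < 0.05
```

The tolerance is generous because the samples are strongly correlated on the relaxation time 1/(λ r); a mismatched noise strength (violating Eq. (41)) would drive the variance away from the Boltzmann value.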

Driven Diffusive Systems and Interface Growth

We close this section by listing a few intriguing examples of Langevin systems that describe genuine out-of-equilibrium dynamics. First, consider a driven diffusive lattice gas (an overview is provided in [46]), namely a particle system with conserved total density subject to biased diffusion in a specified (‘parallel’) direction. The coarse-grained Langevin equation for the scalar density fluctuations thus becomes spatially anisotropic [47,48],

  \frac{\partial S(x,t)}{\partial t}
    = D \left( \nabla_\perp^2 + c\, \nabla_\parallel^2 \right) S(x,t)
    + \frac{D g}{2}\, \nabla_\parallel S(x,t)^2 + \zeta(x,t) ,   (48)

and similarly for the conserved noise with ⟨ζ⟩ = 0,

  \langle \zeta(x,t)\, \zeta(x',t') \rangle
    = - 2 D \left( \nabla_\perp^2 + \tilde{c}\, \nabla_\parallel^2 \right)
      \delta(x - x')\, \delta(t - t') .   (49)

Notice that the drive term ∝ g breaks both the system's spatial reflection symmetry and the Ising symmetry S → −S. In one dimension, Eq. (48) coincides with the noisy Burgers equation [49], and since in this case (only) the condition (39) is satisfied, it effectively represents a system with equilibrium dynamics. The corresponding Janssen–De Dominicis response functional reads

  \mathcal{A}[\tilde{S}, S] = \int \! d^d x \int \! dt
      \left[ \tilde{S} \left( \frac{\partial S}{\partial t}
      - D \left( \nabla_\perp^2 + c\, \nabla_\parallel^2 \right) S \right)
      + D\, \tilde{S} \left( \nabla_\perp^2 + \tilde{c}\, \nabla_\parallel^2 \right) \tilde{S}
      - \frac{D g}{2}\, \tilde{S}\, \nabla_\parallel S^2 \right] .   (50)

It describes a ‘massless’ theory; hence we expect the system to generically display scale-invariant features, without the need to tune to a special point in parameter space. The large-scale scaling properties can be analyzed by means of the dynamic renormalization group [47,48].

Another famous example of generic scale invariance emerging in a nonequilibrium system is curvature-driven interface growth, as captured by the Kardar–Parisi–Zhang equation [50],

  \frac{\partial S(x,t)}{\partial t} = D \nabla^2 S(x,t)
    + \frac{D g}{2}\, [\nabla S(x,t)]^2 + \zeta(x,t) ,   (51)

with again ⟨ζ⟩ = 0 and the noise correlations

  \langle \zeta(x,t)\, \zeta(x',t') \rangle = 2 D\, \delta(x - x')\, \delta(t - t') .   (52)

(For more details and intriguing variants, see e.g. [51,52,53].) The associated field theory action

  \mathcal{A}[\tilde{S}, S] = \int \! d^d x \int \! dt
      \left[ \tilde{S} \left( \frac{\partial S}{\partial t} - D \nabla^2 S
      - \frac{D g}{2}\, [\nabla S]^2 \right) - D\, \tilde{S}^2 \right]   (53)

encodes surprisingly rich behavior, including a kinetic roughening transition separating two distinct scaling regimes in dimensions d > 2 [51,52,53].
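As a numerical illustration of Eq. (51), the following sketch (added here; the discretization and all parameter values are illustrative choices, not from the article) integrates the KPZ equation on a one-dimensional periodic lattice with a simple finite-difference Euler scheme and monitors the interface width w(t), which grows with time in the roughening regime.

```python
import numpy as np

def kpz_step(s, D, g, dt, dx, rng):
    """One Euler step of dS/dt = D lap(S) + (D g / 2)(grad S)^2 + zeta
    on a periodic 1d lattice, with <zeta zeta'> = 2 D delta(x-x') delta(t-t')."""
    lap = (np.roll(s, 1) - 2.0 * s + np.roll(s, -1)) / dx**2
    grad = (np.roll(s, -1) - np.roll(s, 1)) / (2.0 * dx)
    noise = rng.normal(0.0, np.sqrt(2.0 * D * dt / dx), size=s.size)
    return s + dt * (D * lap + 0.5 * D * g * grad**2) + noise

rng = np.random.default_rng(1)
s = np.zeros(256)                  # initially flat interface
width = []
for t in range(4000):
    s = kpz_step(s, D=0.5, g=1.0, dt=0.01, dx=1.0, rng=rng)
    if t in (100, 3999):
        width.append(s.std())      # interface width w(t)

# the interface roughens: the width grows between the two sample times
assert width[0] < width[1]
```

Extracting the asymptotic growth and roughness exponents reliably requires much larger systems and disorder averaging; the sketch only demonstrates the qualitative roughening encoded in the action (53).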


Field Theoretic Methods

Future Directions

The rich phenomenology in many complex systems is only inadequately captured within widely used mean-field approximations, wherein both statistical fluctuations and correlations induced by the subunits' interactions or the system's kinetics are neglected. Modern computational techniques, empowered by recent vast improvements in data storage and clock frequencies as well as by the development of clever algorithms, are clearly invaluable in the theoretical study of model systems displaying the hallmark features of complexity. Yet in order to gain a deeper understanding and to maintain control over the typically rather large parameter space, numerical investigations need to be supplemented by analytical approaches. The field-theoretic methods described in this article represent a powerful set of tools to systematically include fluctuations and correlations in the mathematical description of complex stochastic dynamical systems composed of many interacting degrees of freedom. They have already been very fruitful in studying the intriguing physics of highly correlated and strongly fluctuating many-particle systems. Aside from many important quantitative results, they have provided the basis for our fundamental understanding of the emergence of universal macroscopic features.

At the time of writing, the transfer of field-theoretic methods to problems in chemistry, biology, and other fields such as sociology has certainly been initiated, but is still limited to rather few and isolated case studies. This is understandable, since becoming acquainted with the intricate technicalities of the field theory formalism requires considerable effort. Also, whereas it is straightforward to write down the actions corresponding to stochastic processes defined via microscopic classical discrete master equations or mesoscopic Langevin equations, it is usually not that easy to properly extract the desired information about large-scale structures and long-time asymptotics. Yet if successful, one tends to gain insights that are not accessible by any other means. I therefore anticipate that the now well-developed methods of quantum and statistical field theory, with their extensions to stochastic dynamics, will find ample successful applications in many different areas of complexity science. Naturally, further approximation schemes and other methods tailored to the questions at hand will have to be developed, and novel concepts be devised. I look forward to learning about and hopefully also participating in these exciting future developments.

Acknowledgments

The author would like to acknowledge financial support through US National Science Foundation grant NSF DMR-0308548. This article is dedicated to the victims of the terrible events at Virginia Tech on April 16, 2007.

Bibliography

1. Lindenberg K, Oshanin G, Tachiya M (eds) (2007) J Phys: Condens Matter 19(6): Special issue containing articles on Chemical kinetics beyond the textbook: fluctuations, many-particle effects and anomalous dynamics; see http://www.iop.org/EJ/toc/0953-8984/19/6
2. Alber M, Frey E, Goldstein R (eds) (2007) J Stat Phys 128(1/2): Special issue on Statistical physics in biology; see http://springerlink.com/content/j4q1ln243968/
3. Murray JD (2002) Mathematical biology, vols I, II, 3rd edn. Springer, New York
4. Mobilia M, Georgiev IT, Täuber UC (2007) Phase transitions and spatio-temporal fluctuations in stochastic lattice Lotka–Volterra models. J Stat Phys 128:447–483. Several movies with Monte Carlo simulation animations can be accessed at http://www.phys.vt.edu/~tauber/PredatorPrey/movies/
5. Washenberger MJ, Mobilia M, Täuber UC (2007) Influence of local carrying capacity restrictions on stochastic predator-prey models. J Phys: Condens Matter 19:065139, 1–14
6. Ramond P (1981) Field theory – a modern primer. Benjamin/Cummings, Reading
7. Amit DJ (1984) Field theory, the renormalization group, and critical phenomena. World Scientific, Singapore
8. Negele JW, Orland H (1988) Quantum many-particle systems. Addison-Wesley, Redwood City
9. Parisi G (1988) Statistical field theory. Addison-Wesley, Redwood City
10. Itzykson C, Drouffe JM (1989) Statistical field theory. Cambridge University Press, Cambridge
11. Le Bellac M (1991) Quantum and statistical field theory. Oxford University Press, Oxford
12. Zinn-Justin J (1993) Quantum field theory and critical phenomena. Clarendon Press, Oxford
13. Cardy J (1996) Scaling and renormalization in statistical physics. Cambridge University Press, Cambridge
14. Janssen HK (1976) On a Lagrangean for classical field dynamics and renormalization group calculations of dynamical critical properties. Z Phys B 23:377–380
15. De Dominicis C (1976) Techniques de renormalisation de la théorie des champs et dynamique des phénomènes critiques. J Physique (France) Colloq 37:C247–C253
16. Janssen HK (1979) Field-theoretic methods applied to critical dynamics. In: Enz CP (ed) Dynamical critical phenomena and related topics. Lecture Notes in Physics, vol 104. Springer, Heidelberg, pp 26–47
17. Doi M (1976) Second quantization representation for classical many-particle systems. J Phys A: Math Gen 9:1465–1477
18. Doi M (1976) Stochastic theory of diffusion-controlled reactions. J Phys A: Math Gen 9:1479–1495
19. Grassberger P, Scheunert M (1980) Fock-space methods for identical classical objects. Fortschr Phys 28:547–578
20. Peliti L (1985) Path integral approach to birth-death processes on a lattice. J Phys (Paris) 46:1469–1482
21. Peliti L (1986) Renormalisation of fluctuation effects in the A + A → A reaction. J Phys A: Math Gen 19:L365–L367
22. Lee BP (1994) Renormalization group calculation for the reaction kA → ∅. J Phys A: Math Gen 27:2633–2652
23. Lee BP, Cardy J (1995) Renormalization group study of the A + B → ∅ diffusion-limited reaction. J Stat Phys 80:971–1007
24. Mattis DC, Glasser ML (1998) The uses of quantum field theory in diffusion-limited reactions. Rev Mod Phys 70:979–1002
25. Täuber UC, Howard MJ, Vollmayr-Lee BP (2005) Applications of field-theoretic renormalization group methods to reaction-diffusion problems. J Phys A: Math Gen 38:R79–R131
26. Täuber UC (2007) Field theory approaches to nonequilibrium dynamics. In: Henkel M, Pleimling M, Sanctuary R (eds) Ageing and the glass transition. Lecture Notes in Physics, vol 716. Springer, Berlin, pp 295–348
27. Täuber UC, Critical dynamics: a field theory approach to equilibrium and nonequilibrium scaling behavior. To be published by Cambridge University Press, Cambridge. For completed chapters, see http://www.phys.vt.edu/~tauber/utaeuber.html
28. Schütz GM (2000) Exactly solvable models for many-body systems far from equilibrium. In: Domb C, Lebowitz JL (eds) Phase transitions and critical phenomena, vol 19. Academic Press, London
29. Stinchcombe R (2001) Stochastic nonequilibrium systems. Adv Phys 50:431–496
30. Van Wijland F (2001) Field theory for reaction-diffusion processes with hard-core particles. Phys Rev E 63:022101, 1–4
31. Chopard B, Droz M (1998) Cellular automaton modeling of physical systems. Cambridge University Press, Cambridge
32. Marro J, Dickman R (1999) Nonequilibrium phase transitions in lattice models. Cambridge University Press, Cambridge
33. Hinrichsen H (2000) Nonequilibrium critical phenomena and phase transitions into absorbing states. Adv Phys 49:815–958
34. Ódor G (2004) Phase transition universality classes of classical, nonequilibrium systems. Rev Mod Phys 76:663–724
35. Moshe M (1978) Recent developments in Reggeon field theory. Phys Rep 37:255–345
36. Obukhov SP (1980) The problem of directed percolation. Physica A 101:145–155
37. Cardy JL, Sugar RL (1980) Directed percolation and Reggeon field theory. J Phys A: Math Gen 13:L423–L427
38. Janssen HK (1981) On the nonequilibrium phase transition in reaction-diffusion systems with an absorbing stationary state. Z Phys B 42:151–154
39. Janssen HK, Täuber UC (2005) The field theory approach to percolation processes. Ann Phys (NY) 315:147–192
40. Grassberger P (1982) On phase transitions in Schlögl's second model. Z Phys B 47:365–374
41. Janssen HK (2001) Directed percolation with colors and flavors. J Stat Phys 103:801–839
42. Martin PC, Siggia ED, Rose HA (1973) Statistical dynamics of classical systems. Phys Rev A 8:423–437
43. Bausch R, Janssen HK, Wagner H (1976) Renormalized field theory of critical dynamics. Z Phys B 24:113–127
44. Chaikin PM, Lubensky TC (1995) Principles of condensed matter physics. Cambridge University Press, Cambridge
45. Hohenberg PC, Halperin BI (1977) Theory of dynamic critical phenomena. Rev Mod Phys 49:435–479
46. Schmittmann B, Zia RKP (1995) Statistical mechanics of driven diffusive systems. In: Domb C, Lebowitz JL (eds) Phase transitions and critical phenomena, vol 17. Academic Press, London
47. Janssen HK, Schmittmann B (1986) Field theory of long time behaviour in driven diffusive systems. Z Phys B 63:517–520
48. Leung KT, Cardy JL (1986) Field theory of critical behavior in a driven diffusive system. J Stat Phys 44:567–588
49. Forster D, Nelson DR, Stephen MJ (1977) Large-distance and long-time properties of a randomly stirred fluid. Phys Rev A 16:732–749
50. Kardar M, Parisi G, Zhang YC (1986) Dynamic scaling of growing interfaces. Phys Rev Lett 56:889–892
51. Barabási AL, Stanley HE (1995) Fractal concepts in surface growth. Cambridge University Press, Cambridge
52. Halpin-Healy T, Zhang YC (1995) Kinetic roughening phenomena, stochastic growth, directed polymers and all that. Phys Rep 254:215–414
53. Krug J (1997) Origins of scale invariance in growth processes. Adv Phys 46:139–282


Firing Squad Synchronization Problem in Cellular Automata
Hiroshi Umeo
University of Osaka Electro-Communication, Osaka, Japan

Article Outline
Glossary
Definition of the Subject
Introduction
Firing Squad Synchronization Problem
Variants of the Firing Squad Synchronization Problem
Firing Squad Synchronization Problem on Two-dimensional Arrays
Summary and Future Directions
Bibliography

Glossary

Cellular automaton: A cellular automaton is a discrete computational model studied in mathematics, computer science, economics, biology, physics, chemistry, etc. It consists of a regular array of cells; each cell is a finite state automaton. The array can be in any finite number of dimensions. Time (steps) is also discrete, and the state of a cell at time t (t ≥ 1) is a function of the states of a finite number of cells (called its neighborhood) at time t − 1. Each cell has the same rule set for updating its next state, based on the states in its neighborhood. At every step the rules are applied to the whole array synchronously, yielding a new configuration.

Time-space diagram: A time-space diagram is frequently used to represent signal propagation in one-dimensional cellular space. Usually, time is drawn on the vertical axis and space on the horizontal axis. The trajectories of individual propagating signals are expressed in this diagram by sloping lines; the slope of a line represents the propagation speed of the signal. Time-space diagrams that show the position of individual signals in time and in space are very useful for understanding cellular algorithms, signal propagation, and crossings in the cellular space.

Definition of the Subject

The firing squad synchronization problem (FSSP for short) is formalized in terms of the model of cellular automata. Figure 1 shows a finite one-dimensional cellular array consisting of n cells, denoted by C_i, where 1 ≤ i ≤ n.

Firing Squad Synchronization Problem in Cellular Automata, Figure 1 One-dimensional cellular automaton

All cells (except the end cells) are identical finite state automata. The array operates in lock-step mode such that the next state of each cell (except the end cells) is determined by both its own present state and the present states of its right and left neighbors. All cells (soldiers), except the left end cell, are initially in the quiescent state at time t = 0 and have the property whereby the next state of a quiescent cell having quiescent neighbors is the quiescent state. At time t = 0 the left end cell (general) is in the fire-when-ready state, which is an initiation signal to the array. The firing squad synchronization problem is stated as follows: given an array of n identical cellular automata, including a general on the left end which is activated at time t = 0, one wants to give the description (state set and next-state transition function) of the automata so that, at some future time, all of the cells will simultaneously and, for the first time, enter a special firing state. The set of states and the next-state transition function must be independent of n. Without loss of generality, it is assumed that n ≥ 2. The tricky part of the problem is that the same kind of soldiers, each having a fixed number of states, must be synchronized regardless of the length n of the array. The problem is interesting in itself as a mathematical puzzle, and it is a good example of a recursive divide-and-conquer strategy operating in parallel. It has been referred to as achieving macro-synchronization in a micro-synchronization system, and as realizing global synchronization using only local information exchange [10].

Introduction

Cellular automata are considered to be a nice model of complex systems in which an infinite one-dimensional array of finite state machines (cells) updates itself in a synchronous manner according to a uniform local rule.
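The lock-step update mode defined above is straightforward to mechanize. The following Python sketch (an illustration added to this text; the three-symbol rule table is a toy wave-propagation rule, not an FSSP solution) implements one synchronous update of a bounded one-dimensional array from a rule table mapping (left, self, right) triples to next states.

```python
# Q: quiescent, G: general, W: boundary symbol (not an internal state)
def step(cells, rules):
    """One synchronous update of a bounded 1d cellular automaton.
    'rules' maps (left, center, right) -> next state; unlisted
    neighborhoods leave the cell unchanged, which in particular
    realizes the quiescent-stability property from the definition."""
    padded = ['W'] + cells + ['W']
    return [rules.get((padded[i - 1], padded[i], padded[i + 1]), padded[i])
            for i in range(1, len(padded) - 1)]

# Toy rule table: the general's signal spreads right one cell per step.
rules = {('G', 'Q', 'Q'): 'G', ('G', 'Q', 'W'): 'G'}

cells = ['G'] + ['Q'] * 7
for _ in range(7):
    cells = step(cells, rules)
assert cells == ['G'] * 8                       # wave reached the right end

# Quiescent stability: an all-quiescent array never changes.
assert step(['Q'] * 8, rules) == ['Q'] * 8
```

An actual FSSP solution plugs a much richer rule table (e.g., Waksman's 16-state or Mazoyer's 6-state table) into exactly this update loop.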
A comprehensive study is made of a synchronization problem that gives a finite-state protocol for synchronizing large-scale cellular automata. Synchronization of a general network is a computing primitive of parallel and distributed computation. Synchronization in cellular automata has been known as the firing squad synchronization problem since its inception; it was originally proposed by J. Myhill in Moore [29] to synchronize all parts of self-reproducing cellular automata. The problem has been studied extensively for more than 40 years [1–80].

The present article first examines the state transition rule sets for the famous firing squad synchronization algorithms that give a finite-state protocol for synchronizing large-scale cellular automata, focusing on the fundamental synchronization algorithms operating in optimum steps on one-dimensional cellular arrays. The algorithms discussed herein are Goto's first algorithm [12], the eight-state algorithm of Balzer [1], the seven-state algorithm of Gerken [9], the six-state algorithm of Mazoyer [25], the 16-state algorithm of Waksman [74], and a number of revised versions thereof. In addition, the article surveys current optimum-time synchronization algorithms and compares their transition rule sets with respect to the number of internal states of each finite state automaton, the number of transition rules realizing the synchronization, and the number of state changes on the array. It also presents a survey and comparison of the quantitative and qualitative aspects of the optimum-time synchronization algorithms developed thus far for one-dimensional cellular arrays. It then provides several variants of the firing squad synchronization problem, including fault-tolerant synchronization protocols, one-bit communication protocols, non-optimum-time algorithms, and partial solutions. Finally, a survey of two-dimensional firing squad synchronization algorithms is presented. Several new results and viewpoints are also given.

Firing Squad Synchronization Problem

A Brief History of the Development of Firing Squad Synchronization Algorithms

The problem known as the firing squad synchronization problem was devised in 1957 by J.
Myhill, and first appeared in print in a paper by E.F. Moore [29]. The problem has been widely circulated and has attracted much attention. The firing squad synchronization problem first arose in connection with the need to simultaneously turn on all parts of a self-reproducing machine. The problem was first solved by J. McCarthy and M. Minsky [28], who presented a non-optimum-time synchronization scheme that operates in 3n + O(1) steps for synchronizing n cells. In 1962, the first optimum-time, i.e. (2n − 2)-step, synchronization algorithm was presented by Goto [12], with each cell having several thousands of states. Waksman [74]

presented a 16-state optimum-time synchronization algorithm. Afterward, Balzer [1] and Gerken [9] developed an eight-state algorithm and a seven-state algorithm, respectively, thus decreasing the number of states required for the synchronization. In 1987, Mazoyer [25] developed a six-state synchronization algorithm which, at present, is the algorithm having the fewest states.

Firing Squad Synchronization Algorithm

Section “Firing Squad Synchronization Algorithm” briefly sketches the design scheme for the firing squad synchronization algorithm according to Waksman [74], in which the first transition rule set was presented. The following description is quoted from Waksman [74]:

“The code book of the state transitions of machines is so arranged to cause the array to progressively divide itself into 2^k equal parts, where k is an integer and an increasing function of time. The end machines in each partition assume a special state so that when the last partition occurs, all the machines have for both neighbors machines at this state. This is made the only condition for any machine to assume terminal state.”

Figure 2 is a time-space diagram for Waksman's optimum-step firing squad synchronization algorithm. The general at time t = 0 emits an infinite number of signals which propagate at speed 1/(2^{k+1} − 1), where k is a positive integer. These signals meet with a reflected signal at the half point, quarter points, etc., denoted by ˇ in Fig. 2. It is noted that the cells indicated by ˇ are synchronized. By increasing the number of synchronized cells exponentially, eventually all of the cells are synchronized.
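The geometry behind the first halving can be checked with a short computation (an illustrative sketch added here, not from the original article, using a continuum view of the signals): the unit-speed signal leaves cell 1 at t = 0, reflects at cell n at time n − 1, and then meets the speed-1/3 signal; solving 1 + t/3 = 2n − 1 − t gives t = 3(n − 1)/2 at cell (n + 1)/2, i.e. exactly the array midpoint.

```python
from fractions import Fraction

def meeting(n):
    """Meeting time and place of the reflected unit-speed signal and
    the speed-1/3 signal, both emitted by the general at cell 1."""
    # slow signal: x = 1 + t/3 ; reflected fast signal: x = 2n - 1 - t
    t = Fraction(3 * (n - 1), 2)
    x = 1 + t / 3
    assert x == 2 * n - 1 - t      # both trajectories pass through (t, x)
    return t, x

t, x = meeting(11)
assert (t, x) == (15, 6)           # midpoint of an 11-cell array
for n in range(3, 50, 2):          # odd lengths: the midpoint is a cell
    assert meeting(n)[1] == Fraction(n + 1, 2)
```

The subsequent signals of speed 1/(2^{k+1} − 1) repeat the same construction within each half, which is what produces the quarter points, eighth points, and so on.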
Firing Squad Synchronization Problem in Cellular Automata, Figure 2 Time-space diagram for Waksman's optimum-time firing squad synchronization algorithm

Complexity Measures and Properties in Firing Squad Synchronization Algorithms

Time Complexity

Any solution to the firing squad synchronization problem can easily be shown to require (2n − 2) steps for synchronizing n cells, since signals on the array can propagate no faster than one cell per step, and the time from the general's instruction until the synchronization must be at least 2n − 2. See Balzer [1], Goto [12] and Waksman [74] for a proof. The next two theorems show the optimum time complexity for synchronizing n cells on one-dimensional arrays.

Theorem 1 ([1,12,74]) Synchronization of n cells in fewer than (2n − 2) steps is impossible.

Theorem 2 ([1,12,74]) Synchronization of n cells in exactly (2n − 2) steps is possible.

Number of States

The following three distinct states are required in order to define any cellular automaton that can solve the firing squad synchronization problem: the quiescent state, the general state, and the firing state. The boundary state for C_0 and C_{n+1} is not generally counted as an internal state. Balzer [1] implemented a search strategy in order to prove that there exists no four-state solution; he showed that no four-state optimum-time solution exists. Sanders [40] studied a similar problem on a parallel computer, showed that Balzer's backtrack heuristic was not correct, rendering the proof incomplete, and gave a proof based on a computer simulation for the non-existence of a four-state solution. Balzer [1] also showed that there exists no five-state optimum-time solution satisfying certain special conditions. It is noted that Balzer's special conditions do not hold for Mazoyer's six-state solution, the solution with the fewest states known at present. The question that remains is: “What is the minimum number of states for an optimum-time solution of the problem?” At present, that number is five or six. Section “Partial Solutions” gives some four- and five-state partial solutions that can synchronize infinitely many, but not all, array lengths.

Theorem 3 ([1,40]) There is no four-state CA that can synchronize n cells.

Berthiaume, Bittner, Perković, Settle and Simon [2] considered the state lower bound on ring-connected cellular automata. It is shown that there exists no three-state solution and no four-state symmetric solution for rings.

Theorem 4 ([2]) There is no four-state symmetric optimum-time solution for ring-connected cellular automata.

Number of Transition Rules

Any k-state transition table for the synchronization has at most (k − 1)k² entries, arranged in (k − 1) matrices of size k × k. The number of transition rules reflects the complexity of synchronization algorithms.

Transition Rule Sets for Optimum-Time Firing Squad Synchronization Algorithms

Section “Transition Rule Sets for Optimum-Time Firing Squad Synchronization Algorithms” implements on a computer most of the transition rule sets for the synchronization algorithms mentioned above, and checks whether these rule sets yield successful firing configurations at exactly t = 2n − 2 steps for any n such that 2 ≤ n ≤ 10,000.

Waksman's 16-State Algorithm

Waksman [74] proposed a 16-state firing squad synchronization algorithm which, together with an unpublished algorithm by Goto [12], is referred to as the first optimum-time synchronization algorithm in the world. Waksman presented the first set of transition rules, described in terms of a state transition table defined on the following state set D consisting of 16 states: D = {Q, T, P_0, P_1, B_0, B_1, R_0, R_1, A_{000}, A_{001}, A_{010}, A_{011}, A_{100}, A_{101}, A_{110}, A_{111}}, where Q is a quiescent state, T is a firing state, P_0 and P_1 are prefiring states, B_0 and B_1 are states for signals propagating at various speeds, R_0 and R_1 are trigger states which cause the B_0 and B_1 states to move in the left or right direction, and A_{ijk}, i, j, k ∈ {0, 1}, are control states which generate the state R_0 or R_1 either with a unit delay or without any delay. The state P_0 also acts as the initial general.

USN Transition Rule Set

Firing Squad Synchronization Problem in Cellular Automata, Figure 3 USN transition table consisting of 202 rules that realize Waksman's synchronization algorithm. The symbol represents the boundary state

Cellular automata researchers have reported that some errors are included in Waksman's transition table. A computer simulation made in Umeo, Sogabe and Nomura [64] reveals this to be true; they corrected the errors included in the original Waksman transition rule set. The correction procedures can be found in Umeo, Sogabe and Nomura [64]. This subsection gives a complete list of the transition rules which yield successful synchronizations for any n. Figure 3 is the complete list, which consists of 202 transition rules; the list is referred to as the USN transition rule set. In the correction, a ninety-three percent reduction in the number of transition rules is realized compared to Waksman's original list. The computer simulation based on the table of Fig. 3 gives the following observation. Figure 4 shows

snapshots of the Waksman’s 16-state optimum-time synchronization algorithm on 21 cells. Observation 3.1 ([54,64]) The set of rules given in Fig. 3 is the smallest transition rule set for Waksman’s optimumtime firing squad synchronization algorithm. Balzer’s Eight-State Algorithm Balzer [1] constructed an eight-state, 182-rule synchronization algorithm and

1097

1098

Firing Squad Synchronization Problem in Cellular Automata

Gerken’s Seven-State Algorithm Gerken [9] constructed a seven-state, 118-rule synchronization algorithm. In the computer examination, no errors were found, however, 13 rules were found to be redundant. Figure 6 gives a list of the transition rules for Gerken’s algorithm and snapshots for synchronization operations on 28 cells. The 13 redundant rules are marked by shaded squares in the table. The symbols “>”, “/”, “. . . ” and “#” represent the general, quiescent, firing and boundary states, respectively. The symbol “. . . ” is replaced by “F” in the configuration (right) at time t D 54. Mazoyer’s Six-State Algorithm Mazoyer [25] proposed a six-state, 120-rule synchronization algorithm, the structure of which differs greatly from the previous three algorithms discussed above. The computer examination revealed no errors and only one redundant rule. Figure 7 presents a list of transition rules for Mazoyer’s algorithm and snapshots of configurations on 28 cells. In the transition table, the letters “G”, “L”, “F” and “X” represent the general, quiescent, firing and boundary states, respectively.

Firing Squad Synchronization Problem in Cellular Automata, Figure 4 Snapshots of the Waksman’s 16-state optimum-time synchronization algorithm on 21 cells

the structure of which is completely identical to that of Waksman [74]. A computer examination made by Umeo, Hisaoka and Sogabe [54] revealed no errors, however, 17 rules were found to be redundant. Figure 5 gives a list of transition rules for Balzer’s algorithm and snapshots for synchronization operations on 28 cells. Those redundant rules are indicated by shaded squares. In the transition table, the symbols “M”, “L”, “F” and “X” represent the general, quiescent, firing and boundary states, respectively. Noguchi [34] also constructed an eight-state, 119-rule optimum-time synchronization algorithm.

Goto’s Algorithm
The first synchronization algorithm, presented by Goto [12], was not published as a journal paper. According to Prof. Goto, the original note Goto [12] is now unavailable, and the only existing material that treats the algorithm is Goto [13]. Goto’s study presents one figure (Fig. 3.8 in Goto [13]) demonstrating how the algorithm works on 13 cells, with a very short description in Japanese. Umeo [50] reconstructed Goto’s algorithm based on this figure; Mazoyer [27] also reconstructed the algorithm. The algorithm that Umeo [50] reconstructed is a non-recursive algorithm consisting of a marking phase and a 3n-step synchronization phase. In the first phase, by printing special markers in the cellular space, the entire cellular space is divided into many smaller subspaces, the lengths of which increase exponentially with a common ratio of two, that is, 2^j for any integer j such that 1 ≤ j ≤ ⌊log₂ n⌋ − 1. The marking is made from both the left and right ends. In the second phase, each subspace is synchronized using a well-known conventional 3n-step simple synchronization algorithm. A time-space diagram of the reconstructed algorithm is shown in Fig. 8.

Gerken’s 155-State Algorithm
Gerken [9] constructed two kinds of optimum-time synchronization algorithms. One, the seven-state algorithm, has been discussed in the previous subsection; the other is a 155-state algorithm having Θ(n log n) state-change complexity. The transition table given in Gerken [9] is described in terms of a two-layer construction with 32 states and 347 rules. An expansion of


Firing Squad Synchronization Problem in Cellular Automata, Figure 5 Transition table for Balzer’s eight-state protocol (left) and snapshots of synchronization operations on 28 cells (right)

the transition table into a single-layer format yields a 155-state table consisting of 2371 rules. Figure 9 shows a configuration on 28 cells.

Firing Squad Synchronization Problem in Cellular Automata, Figure 6 Transition table for Gerken’s seven-state protocol (left) and snapshots of synchronization operations on 28 cells (right)

State-Change Complexity
Vollmar [73] introduced the state-change complexity in order to measure the efficiency of cellular algorithms and showed that Ω(n log n) state changes are required for the synchronization of n cells in (2n − 2) steps.

Theorem 5 ([73]) Ω(n log n) state changes are necessary for synchronizing n cells in (2n − 2) steps.

Theorem 6 ([9,54]) Each of the optimum-time synchronization algorithms developed by Balzer [1], Gerken [9], Mazoyer [25] and Waksman [74] has O(n²) state-change complexity.
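Vollmar’s measure can be computed directly from a recorded space-time diagram: a cell is charged one unit at each step at which its state actually changes. A minimal sketch (the `history` format, a list of configurations indexed by time, is an assumption for illustration):

```python
def state_changes(history):
    """Vollmar's state-change complexity of one run: the number of
    (cell, step) pairs at which a cell's state actually changes.
    history[t][i] is the state of cell i at time t."""
    return sum(a != b
               for prev, cur in zip(history, history[1:])
               for a, b in zip(prev, cur))

# A cell is only charged when it changes state:
print(state_changes([[0, 0], [0, 1], [1, 1]]))   # two changes in total
```

Applied to full runs of the algorithms above, this count grows like n² for Waksman-style schemes and like n log n for Gerken’s 155-state scheme.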

Theorem 7 ([9]) Gerken’s 155-state synchronization algorithm has Θ(n log n) state-change complexity.

It has been shown that any 3n-step thread-like synchronization algorithm has Θ(n log n) state-change complexity, and that such a 3n-step thread-like synchronization algorithm can be used for subspace synchronization in Goto’s time-optimum synchronization algorithm. Umeo, Hisaoka and Sogabe [54] have shown that:

Theorem 8 ([54]) Goto’s time-optimum synchronization algorithm, as reconstructed by Umeo [50], has Θ(n log n) state-change complexity.

Figure 10 shows a comparison of the state-change complexities of several optimum-time synchronization algorithms.


Firing Squad Synchronization Problem in Cellular Automata, Figure 7 Transition table for Mazoyer’s six-state protocol (left) and its snapshots of configurations on 28 cells (right)

A Comparison of Quantitative Aspects of Optimum-Time Synchronization Algorithms Section “A Comparison of Quantitative Aspects of Optimum-Time Synchronization Algorithms” presents a table based on a quantitative comparison of optimum-time synchronization algorithms and their transition tables discussed above with respect to the number of internal states of each finite state automaton, the number of transition rules realizing the synchronization, and the number of state-changes on the array.

Firing Squad Synchronization Problem in Cellular Automata, Table 1 Quantitative comparison of transition rule sets for optimum-time firing squad synchronization algorithms. The symbol * shows the correction and reduction of transition rules made in Umeo, Hisaoka and Sogabe [54]. The symbol ** indicates the number of states and rules obtained after the expansion of the original two-layer construction

Algorithm       | # of states    | # of transition rules | State-change complexity
Goto [12]       | many thousands | –                     | Θ(n log n)
Waksman [74]    | 16             | 202 (3216)            | O(n²)
* Balzer [1]    | 8              | 165 (182)             | O(n²)
Noguchi [34]    | 8              | 119                   | O(n²)
* Gerken [9]    | 7              | 105 (118)             | O(n²)
* Mazoyer [25]  | 6              | 119 (120)             | O(n²)
** Gerken [9]   | 155 (32)       | 2371 (347)            | Θ(n log n)

Firing Squad Synchronization Problem in Cellular Automata, Table 2 A qualitative comparison of optimum-time firing squad synchronization algorithms

Algorithm      | One-/two-sided | Recursive/non-recursive | # of signals
Goto [12]      | –              | non-recursive           | finite
Waksman [74]   | two-sided      | recursive               | infinite
Balzer [1]     | two-sided      | recursive               | infinite
Noguchi [34]   | two-sided      | recursive               | infinite
Gerken [9]     | two-sided      | recursive               | infinite
Mazoyer [25]   | one-sided      | recursive               | infinite
Gerken [9]     | two-sided      | recursive               | finite

One-Sided vs. Two-Sided Recursive Algorithms
Firing squad synchronization algorithms have been designed on the basis of a parallel divide-and-conquer strategy that calls itself recursively in parallel. The recursive calls are implemented by generating many Generals that synchronize smaller divided areas of the cellular space. Initially, a General G0 located at the left end synchronizes the whole cellular space consisting of n cells. In Fig. 11 (left), G1 synchronizes the subspace between G1 and the right end of the array. The ith General G_i, i = 2, 3, …, synchronizes the cellular space between G_{i−1} and G_i. Thus, all of the Generals generated by G0 are located at the left end of the divided cellular spaces to be synchronized. On the other hand, in Fig. 11 (right), the General G0 generates Generals G_i, i = 1, 2, 3, …. Each G_i synchronizes the divided space between G_i and G_{i+1}, and each G_i, i = 2, 3, …, performs the same operations as G0. Thus, in Fig. 11 (right) one finds Generals located at either end of the subspace for which they are responsible. If all of the recursive calls for the synchronization are issued by Generals located at one end (both ends) of the partitioned cellular spaces for which the General works, the synchronization algorithm is said to have the one-sided (two-sided) recursive property. A synchronization algorithm with the one-sided (two-sided) recursive property is referred to as a one-sided (two-sided) recursive synchronization algorithm. Figure 11 illustrates time-space diagrams for one-sided (Fig. 11 (left)) and two-sided (Fig. 11 (right)) recursive synchronization algorithms, both operating in optimum 2n − 2 steps. It is noted that the optimum-time synchronization algorithms developed by Balzer [1], Gerken [9], Noguchi [34] and Waksman [74] are two-sided, and the algorithm proposed by Mazoyer [25] is the only synchronization algorithm with the one-sided recursive property.
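In the classical two-sided schemes, the points at which new Generals appear can be computed exactly in a continuous idealization: the slow signals of speed 1/(2^j − 1) described under “Number of Signals” below meet the reflected unit-speed signal at distance 2(n − 1)/2^j from the left end. A sketch in exact rational arithmetic (the real transition tables additionally handle the integer rounding):

```python
from fractions import Fraction

def meeting_points(n, levels=3):
    """Where the reflected unit-speed signal meets the slow signals of
    speed 1/(2^j - 1), on cells 0 .. n-1 (continuous idealization)."""
    points = []
    for j in range(2, levels + 2):
        v = Fraction(1, 2 ** j - 1)         # speed of the j-th slow signal
        # slow signal position: v*t; reflected signal position: 2(n-1) - t
        t = Fraction(2 * (n - 1)) / (1 + v)  # solve v*t = 2(n-1) - t
        points.append((t, v * t))
    return points

for t, x in meeting_points(17):
    print(f"meet at cell {x} at time {t}")
```

For n = 17 the meeting cells are 8, 4, 2: the midpoint of the whole array, then of its left half, and so on, which is exactly where the recursion posts its new Generals.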

Observation 3.2 ([54]) The optimum-time synchronization algorithms developed by Balzer [1], Gerken [9], Noguchi [34] and Waksman [74] are two-sided ones. The algorithm proposed by Mazoyer [25] is a one-sided one.

A more general design scheme for one-sided recursive optimum-time synchronization algorithms can be found in Mazoyer [24].

Recursive vs. Non-Recursive Algorithms
As shown in the previous section, the optimum-time synchronization algorithms developed by Balzer [1], Gerken [9], Mazoyer [25], Noguchi [34] and Waksman [74] are recursive ones. On the other hand, the overall structure of the reconstructed Goto algorithm is non-recursive; however, the divided subspaces are synchronized using a recursive (3n + O(1))-step synchronization algorithm.

Number of Signals
Waksman [74] devised an efficient way to cause a general cell to generate an infinite family of signals with propagation speeds 1/1, 1/3, 1/7, …, 1/(2^k − 1), where k is any natural number. These signals play an important role in dividing the array into two, four, eight, …, equal parts synchronously. The same set of signals is used in Balzer [1]. Gerken [9]


Firing Squad Synchronization Problem in Cellular Automata, Figure 8 Time-space diagram for Goto’s algorithm as reconstructed by Umeo [50]

had a similar idea in the construction of his seven-state algorithm. Thus, an infinite set of signals with different propagation speeds is used in the first three algorithms. On the other hand, finite sets of signals with propagation speeds {1/5, 1/2, 1/1} and {1/3, 1/2, 3/5, 1/1} are used in Gerken’s 155-state algorithm and the reconstructed Goto algorithm, respectively.

A Comparison of Qualitative Aspects of Optimum-Time Synchronization Algorithms
Section “A Comparison of Qualitative Aspects of Optimum-Time Synchronization Algorithms” presents a table

Firing Squad Synchronization Problem in Cellular Automata, Figure 9 Snapshots of Gerken’s 155-state algorithm on 28 cells

based on a qualitative comparison of optimum-time synchronization algorithms with respect to the one-/two-sided recursive properties and the number of signals used for simultaneous space divisions.

Firing Squad Synchronization Problem in Cellular Automata, Figure 10 A comparison of state-change complexities in optimum-time synchronization algorithms

Firing Squad Synchronization Problem in Cellular Automata, Figure 11 One-sided recursive synchronization scheme (left) and two-sided recursive synchronization scheme (right)

Variants of the Firing Squad Synchronization Problem

Generalized Firing Squad Synchronization Problem
Section “Generalized Firing Squad Synchronization Problem” considers a generalized firing squad synchronization problem which allows the general to be located anywhere on the array. It has been shown to be impossible to synchronize any array of length n in fewer than n − 2 + max(k, n − k + 1) steps, where the general is located on C_k. Moore and Langdon [30], Szwerinski [45] and Varshavsky, Marakhovsky and Peschansky [69] developed generalized optimum-time synchronization algorithms with 17, 10 and 10 internal states, respectively, that can synchronize any array of length n in exactly n − 2 + max(k, n − k + 1) steps. Recently, Settle and Simon [43] and Umeo, Hisaoka, Michisaka, Nishioka and Maeda [56] have proposed a 9-state generalized synchronization algorithm operating in optimum time. Figure 12 shows snapshots of synchronization configurations based on the rule set of Varshavsky, Marakhovsky and Peschansky [69].
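The generalized lower bound is elementary to evaluate; a small sketch:

```python
def generalized_fire_time(n, k):
    """Minimum synchronization time with the general on cell k
    (1-indexed from the left) of an n-cell array."""
    return n - 2 + max(k, n - k + 1)

# A general at either end gives the classical optimum 2n - 2:
print(generalized_fire_time(10, 1), generalized_fire_time(10, 10))   # 18 18
# A centred general is fastest:
print(min(generalized_fire_time(10, k) for k in range(1, 11)))       # 14
```

The formula is symmetric under k ↔ n − k + 1, reflecting that the array can be synchronized from either direction.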

Theorem 9 ([30,43,45,56,69]) There exists a cellular automaton that can synchronize any one-dimensional array of length n in optimum n − 2 + max(k, n − k + 1) steps, where the general is located on the kth cell from the left end.

Non-Optimum-Time 3n-Step Synchronization Algorithms
The non-optimum-time 3n-step algorithm is a simple and straightforward one that exploits a parallel divide-and-conquer strategy based on an efficient use of 1/1- and 1/3-speed signals. Minsky and McCarthy [28] gave the idea for designing the 3n-step synchronization algorithm, and Fischer [8] implemented it, yielding a 15-state realization. Yunès [76] developed two seven-state synchronization algorithms, thus decreasing the number of internal states of each cellular automaton. This section presents a new symmetric six-state 3n-step firing squad synchronization algorithm developed in Umeo, Maeda and Hongyo [61]. Six is the smallest number of states known at present in the class of 3n-step synchronization algorithms. Figure 13 shows the 6-state transition table and snapshots of the synchronization on 14 cells. In the transition table, the symbols “P”, “Q”, “F” and “*” represent the general, quiescent, firing and boundary states, respectively. Yunès [79] also developed a symmetric 6-state 3n-step solution.

Theorem 10 ([61,79]) There exists a symmetric 6-state cellular automaton that can synchronize any n cells in 3n + O(log n) steps.

A non-trivial, new symmetric six-state 3n-step generalized firing squad synchronization algorithm is also presented in Umeo, Maeda and Hongyo [61]. Figure 14 gives a list of transition rules for the 6-state generalized synchronization algorithm and snapshots of configurations on 15 cells. The symbol “M” is the general state.

Theorem 11 ([61]) There exists a symmetric 6-state cellular automaton that can solve the generalized firing squad synchronization problem in max(k, n − k + 1) + 2n + O(log n) steps.
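The ~3n bound of these algorithms can be seen from the recursion behind the divide-and-conquer scheme: the 1/3-speed signal and the reflected 1/1-speed signal meet at the centre after about 3n/2 steps, and the two halves are then synchronized in parallel. A continuous-approximation sketch (ignoring the rounding and the O(log n) or constant overheads that the real rule tables pay):

```python
def divide_and_conquer_time(n):
    """Idealized timing of the 3n-step scheme: ~3(n-1)/2 steps to cut
    an n-cell segment at its centre, then recurse on both halves in
    parallel (so only one half contributes to the elapsed time)."""
    t = 0.0
    while n > 1:
        t += 3 * (n - 1) / 2   # 1/3-speed signal meets the reflected one
        n /= 2                 # both halves are handled simultaneously
    return t

print(divide_and_conquer_time(1024) / 1024)   # close to 3
```

Summing the geometric series 3/2 · (n + n/2 + n/4 + …) ≈ 3n confirms the name of the algorithm family.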

Firing Squad Synchronization Problem in Cellular Automata, Figure 12 Snapshots of the three Russians’ 10-state generalized optimum-time synchronization algorithm on 22 cells

In addition, the state-change complexity of 3n-step firing squad synchronization algorithms has been studied. It has been shown that the six-state algorithms presented above have O(n²) state-change complexity, while the thread-like 3n-step algorithms developed so far have O(n log n) state-change complexity. Table 3 presents a quantitative comparison of the 3n-step synchronization algorithms developed so far.


Firing Squad Synchronization Problem in Cellular Automata, Figure 13 Transition table for the symmetric six-state protocol (left) and snapshots of the synchronization algorithm on 14 cells (right)

Firing Squad Synchronization Problem in Cellular Automata, Figure 14 Transition table for the generalized symmetric six-state protocol (left) and snapshots of the synchronization algorithm on 15 cells with a General on C5 (right)

Delayed Firing Squad Synchronization Algorithm
This section introduces a freezing-thawing technique that yields a delayed synchronization algorithm for one-dimensional arrays. The technique is very useful in the design of time-efficient synchronization algorithms for one- and two-dimensional arrays in Umeo [52], Yunès [77] and Umeo and Uchino [65]. A similar technique was used by Romani [37] for tree synchronization. The technique is stated in the following theorem.

Theorem 12 ([52]) Let t0, t1, t2 and Δt be any integers such that t0 ≥ 0, t0 ≤ t1 ≤ t0 + n − 1, t1 ≤ t2 and Δt = t2 − t1. Assume that a usual optimum-time synchronization operation is started at time t = t0 by generating a special signal at the left end of a one-dimensional array, and that the right end cell of the array receives additional special signals from outside at times t = t1 and t = t2, respectively. Then, there exists a one-dimensional cellular automaton that can synchronize the array of length n at time t = t0 + 2n − 2 + Δt.

The array operates as follows:
1. Start an optimum-time firing squad synchronization algorithm at time t = t0 at the left end of the array. A 1/1-speed signal is propagated in the right direction to wake up cells in the quiescent state; this signal is referred to as the wake-up signal. A freezing signal is supplied from outside at time t = t1 at the right end of the array. The signal is propagated in the left direction at maximum speed, that is, one cell per step, and freezes the configuration progressively. Any cell that receives the freezing signal from its right neighbor stops its state-changes and transmits the freezing signal to its left neighbor. A frozen cell keeps its state until a thawing signal arrives.

2. A special signal supplied from outside at time t = t2 is used as a thawing signal that thaws the frozen configuration. The thawing signal forces a frozen cell to resume its state-change procedures immediately. See Fig. 15 (left). The signal is also transmitted toward the left end at speed 1/1.

The reader can see how these three signals work. The entire configuration can be frozen for Δt steps, and the synchronization of the array is delayed by Δt steps. It is easily seen that the freezing signal can be replaced by the reflected wake-up signal, which is generated at the right end cell at time t = t0 + n − 1. See Fig. 15. The scheme is referred to as the freezing-thawing technique.

Firing Squad Synchronization Problem in Cellular Automata, Table 3 A comparison of 3n-step firing squad synchronization algorithms

Algorithm           | # States | # Rules | Time complexity                   | State-change complexity | General's position | Type   | Notes               | Ref.
Minsky and McCarthy | 13       | –       | 3n + n log n + c                  | O(n log n)              | left               | thread | 0 ≤ n < ∞           | [28]
Fischer             | 15       | –       | 3n − 4                            | O(n log n)              | left               | thread | –                   | [8]
Yunès               | 7        | 105     | 3n ± 2n log n + c                 | O(n log n)              | left               | thread | 0 ≤ n < ∞           | [76]
Yunès               | 7        | 107     | 3n ± 2n log n + c                 | O(n log n)              | left               | thread | 0 ≤ n < ∞           | [76]
Settle and Simon    | 6        | 134     | 3n + 1                            | O(n²)                   | right              | plane  | –                   | [43]
Settle and Simon    | 7        | 127     | 2n − 2 + k                        | O(n²)                   | arbitrary          | plane  | –                   | [43]
Umeo et al.         | 6        | 78      | 3n + O(log n)                     | O(n²)                   | left               | plane  | –                   | [61]
Umeo et al.         | 6        | 115     | max(k, n − k + 1) + 2n + O(log n) | O(n²)                   | arbitrary          | plane  | –                   | [61]
Umeo and Yanagihara | 5        | 67      | 3n − 3                            | O(n²)                   | left/right         | plane  | n = 2^k, k = 1, 2, … | [67]
Yunès               | 6        | 105     | 3n + ⌈log n⌉ − 3                  | O(n log n)              | left               | thread | –                   | [79]

Fault-Tolerant Firing Squad Synchronization Problem
Consider a one-dimensional array of cells, some of which are defective. At time t = 0, the left end cell C1 is in the fire-when-ready state, which is the initialization signal for the array. The fault-tolerant firing squad synchronization problem for cellular automata with defective cells is to determine a description of cells that ensures all intact cells enter the fire state at exactly the same time and for the first time. The fault-tolerant firing squad synchronization problem has been studied in Kutrib and Vollmar [22,23], Umeo [52] and Yunès [77].
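Returning briefly to the freezing-thawing technique: the delayed fire time promised by Theorem 12 is simple arithmetic to check. A minimal sketch, whose example matches the Δt = 5, n = 11 run shown in Fig. 15:

```python
def delayed_fire_time(n, t0, t1, t2):
    """Fire time under the freezing-thawing technique (Theorem 12):
    the configuration is frozen for dt = t2 - t1 steps, shifting the
    usual optimum fire time t0 + 2n - 2 by exactly dt."""
    assert 0 <= t0 <= t1 <= t0 + n - 1 and t1 <= t2
    return t0 + 2 * n - 2 + (t2 - t1)

# The run of Fig. 15: n = 11, delay dt = 5, firing at step 25.
print(delayed_fire_time(11, 0, 10, 15))   # 25
```

With t1 = t2 the delay vanishes and the classical optimum time 2n − 2 is recovered.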

Cellular Automata with Defective Cells
• Intact and Defective Cells: Each cell has its own self-diagnosis circuit that diagnoses itself before operation. A run of consecutive defective (intact) cells is referred to as a defective (intact) segment. Any defective or intact cell can detect whether its neighbor cells are defective or not. Cellular arrays are assumed to have an intact segment at their left and right ends. No new defects occur in any cell during its operational lifetime.
• Signal Propagation in a Defective Segment: It is assumed that any cell in a defective segment can only transmit a signal to its right or left neighbor, depending on the direction from which the signal entered the defective segment. The speed of a signal in any defective segment is fixed at 1/1, that is, one cell per step. In defective segments, both the information carried by a signal and the direction in which it propagates are preserved without modification. Thus, any defective segment can be viewed as two one-way pipelines, each transferring one state at 1/1 speed in either direction. Note that, from the standard state-transition viewpoint of a usual CA, each cell in a defective segment can still change its internal state in this restricted manner.

The array consists of p defective segments and (p + 1) intact segments, denoted by D_j and I_i, respectively, where p is any positive integer. Let n_i and m_j be the numbers of cells in the ith intact and jth defective segments, where i and j are any integers such that 1 ≤ i ≤ p + 1 and 1 ≤ j ≤ p. Let n be the number of cells of the array, so that n = (n1 + m1) + (n2 + m2) + … + (np + mp) + n_{p+1}.

Fault-Tolerant Firing Squad Synchronization Algorithms
Umeo [52] studied synchronization algorithms for arrays in which there are locally more intact cells than defective ones, i.e., n_i ≥ m_i for any i such that


Firing Squad Synchronization Problem in Cellular Automata, Figure 15 Time-space diagram for the delayed firing squad synchronization scheme based on the freezing-thawing technique (left) and a delayed (by Δt = 5 steps) configuration of Balzer’s optimum-time firing squad synchronization algorithm on n = 11 cells (right)

1 ≤ i ≤ p. First, consider the case p = 1, where the array has one defective segment and n1 ≥ m1. Figure 16 illustrates a simple synchronization scheme. The fault-tolerant synchronization algorithm for one defective segment is stated as follows:

Theorem 13 ([52]) Let M be any cellular array of length n with one defective and two intact segments such that n1 ≥ m1, where n1 and m1 denote the number of cells in the first intact and the defective segment, respectively. Then, M is synchronizable in optimum 2n − 2 time.

The synchronization scheme above can be generalized to arrays with multiple defective segments. Figure 17 shows the synchronization scheme for a cellular array with three defective segments. Details of the algorithm can be found in Umeo [52].

Theorem 14 ([52]) Let p be any positive integer and M

be any cellular array of length n with p defective segments, where n_i ≥ m_i and n_i + m_i ≥ p − i for any i such that 1 ≤ i ≤ p. Then, M is synchronizable in 2n − 2 + p steps.

Partial Solutions
The original firing squad synchronization problem is defined so as to synchronize all cells of a one-dimensional array. This section considers partial FSSP solutions, which synchronize arrays of infinitely many lengths, but not all lengths. The first partial solution was given by Umeo and Yanagihara [67]. They proposed a five-state solution that can synchronize any one-dimensional cellular array of length n = 2^k in 3n − 3 steps, for any positive integer k. Figure 18 shows the five-state transition table, consisting of 67 rules, and its snapshots for n = 8 and 16. In the transition table, the symbols “R”, “Q”, “F” and “*” represent the general, quiescent, firing and boundary states, respectively.


Firing Squad Synchronization Problem in Cellular Automata, Figure 16 Time-space diagram for optimum-time firing squad synchronization algorithm with one defective segment

Firing Squad Synchronization Problem in Cellular Automata, Figure 17 Time-space diagram for optimum-time firing squad synchronization algorithm with three defective segments

Theorem 15 ([67]) There exists a 5-state cellular automaton that can synchronize any array of length n = 2^k in 3n − 3 steps, where k is any positive integer.

Surprisingly, Yunès [80] and Umeo, Kamikawa and Yunès [58] proposed four-state synchronization protocols which are based on an algebraic property of Wolfram’s two-state cellular automata.

Theorem 16 ([58,80]) There exists a 4-state cellular automaton that can synchronize any array of length n = 2^k in non-optimum 2n − 1 steps, where k is any positive integer.

Figure 19 shows the four-state transition table given by Yunès [80], which is based on Wolfram’s Rule 60. It consists of 32 rules. Snapshots for n = 16 cells are also illustrated in Fig. 19. In the transition table, the symbols “G”, “Q”, “F” and “*” represent the general, quiescent, firing and boundary states, respectively. Figure 20 shows the four-state transition table given by Umeo, Kamikawa and Yunès [58], which is based on Wolfram’s Rule 150. It has 32 transition rules; snapshots for n = 16 are given in the figure. In this transition table, too, the symbols “G”, “Q”, “F” and “*” represent the general, quiescent, firing and boundary states, respectively. Umeo, Kamikawa and Yunès [58] also proposed a different, but similar-looking, 4-state protocol based on Wolfram’s Rule 150. Figure 21 shows this four-state transition table, consisting of 37 rules, and its snapshots for n = 9 and 17. The four-state protocol has some desirable properties: the algorithm operates in optimum time and its transition rule set is symmetric. The state of the general can be either “G” or “A”, and its initial position can be at the left or the right end.


Firing Squad Synchronization Problem in Cellular Automata, Figure 18 Transition table for the five-state protocol (left) and its snapshots of configurations on 8 and 16 cells (right)
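Why powers of two are special for the Rule-60-based protocols can be seen from the additive (XOR) behaviour of Rule 60: starting from a single active cell, cell j at time t holds the binomial coefficient C(t, j) mod 2, and every entry of row 2^k − 1 of Pascal’s triangle is odd. A bare-bones sketch of this property (not Yunès’ actual four-state table, which additionally encodes the firing and boundary behaviour):

```python
def rule60_step(row):
    # Rule 60: next state of cell i is row[i-1] XOR row[i] (left edge sees 0)
    return [(row[i - 1] if i > 0 else 0) ^ row[i] for i in range(len(row))]

n = 16                      # a power of two
row = [1] + [0] * (n - 1)   # single active cell playing the general
for _ in range(n - 1):
    row = rule60_step(row)
print(row)                  # all cells reach state 1 simultaneously
```

For n not a power of two, row n − 1 of Pascal’s triangle mod 2 contains zeros, which is why these four-state constructions only solve the partial problem.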

Theorem 17 ([58]) There exists a symmetric 4-state cellular automaton that can synchronize any array of length n = 2^k + 1 in optimum 2n − 2 steps, where k is any positive integer.

Yunès [80] has given a state lower bound for partial solutions:

Theorem 18 ([80]) There is no 3-state partial solution.

Thus, the 4-state partial solutions given above are optimal in state-number complexity among partial solutions.

Synchronization Algorithm for One-Bit Communication Cellular Automata
In the study of cellular automata, the amount of bit information exchanged at one step between neighboring cells has been assumed to be O(1) bits. An O(1)-bit communication CA is a conventional cellular automaton in which the number of communication bits exchanged at one step between neighboring cells is O(1); such inter-cell bit exchange, however, is hidden behind the conventional automata-theoretic finite-state description. The 1-bit inter-cell communication model, by contrast, is a cellular automaton in which inter-cell communication is restricted to 1-bit data, referred to as the 1-bit CA model (CA_1bit). The number of internal states of the CA_1bit is assumed to be finite in the usual sense. The next state of each cell is determined by the present state of that cell and two binary 1-bit inputs from its left and right neighbor cells. Thus, the CA_1bit can be thought of as one of the weakest and simplest models in


Firing Squad Synchronization Problem in Cellular Automata, Figure 19 Transition table for the four-state protocol based on Wolfram’s Rule 60 (left) and its snapshots of configurations on 16 cells (right)

a variety of CA’s. A precise definition of the CA_1bit can be found in Umeo [51] and Umeo and Kamikawa [57]. Mazoyer [26] and Nishimura and Umeo [32] each designed an optimum-time synchronization algorithm on the CA_1bit, based on Balzer’s algorithm and Waksman’s algorithm, respectively. Figure 22 shows a configuration of the 1-bit synchronization algorithm on 15 cells based on the design of Nishimura and Umeo [32]. Each cell has 78 internal states and 208 transition rules. The small black right- and left-pointing triangles indicate a 1-signal transfer in the right or left direction, respectively, between neighboring cells. A symbol in a cell shows the internal state of the cell.

Theorem 19 ([26,32]) There exists a CA_1bit that can synchronize n cells in optimum 2n − 2 steps.

Firing Squad Synchronization Problem in Cellular Automata, Figure 20 Transition table for the four-state protocol based on Wolfram’s Rule 150 (left) and its snapshots of configurations on 16 cells (right)

Synchronization Algorithms for Multi-Bit Communication Cellular Automata
Section “Synchronization Algorithms for Multi-Bit Communication Cellular Automata” studies a trade-off between internal states and communication bits in firing squad synchronization protocols for k-bit communication-restricted cellular automata (CA_kbit) and proposes several time-optimum, state-efficient, bit-transfer-based synchronization protocols. It is shown that there exists a 1-state CA_5bit that can synchronize any n cells in optimum 2n − 2 steps. The result is interesting, since it is known that there exists no 4-state synchronization algorithm on conventional O(1)-bit communication cellular automata. A bit-transfer complexity is also introduced to measure the efficiency of synchronization protocols. It is shown that Ω(n log n) bit transfers are a lower bound for synchronizing n cells in (2n − 2) steps. In addition, each of the optimum-time/non-optimum-time synchronization protocols presented has O(n²) bit-transfer complexity. Most of the results presented here are from Umeo, Yanagihara and Kanazawa [68]. A computational relation between the conventional CA and the CA_kbit is stated as follows:

Lemma 20 ([57]) Let N be any s-state conventional cellular automaton with time complexity T(n). Then, there exists a CA_1bit which can simulate N in kT(n) steps, where k is a positive constant integer such that k = ⌈log₂ s⌉.
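The constant k in Lemma 20 is just the number of bits needed to serialize one of s states over a 1-bit link; a one-line sketch:

```python
from math import ceil, log2

def slowdown_factor(s):
    """k = ceil(log2 s): steps needed to ship one of s states
    across a 1-bit inter-cell link (as in Lemma 20)."""
    return ceil(log2(s))

# e.g. a 6-state CA can be simulated with a factor-3 slowdown,
# a 16-state CA with a factor-4 slowdown:
print(slowdown_factor(6), slowdown_factor(16))   # 3 4
```

Lemma 21 removes the slowdown entirely once k bits per step are available, since a whole state fits in one exchange.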


Firing Squad Synchronization Problem in Cellular Automata, Figure 21 Transition table for the four-state protocol (left) and its snapshots of configurations on 9 and 17 cells (right)

Firing Squad Synchronization Problem in Cellular Automata, Table 4 A comparison of optimum-time/non-optimum-time firing squad synchronization protocols for multi-bit communication cellular automata

Synchronization protocol | Communication bits transferred | # of states | # of transition rules | Time complexity | One-/two-sided recursiveness
P1                       | 1  | 54 | 207 | 2n − 1 | One-sided
P2                       | 2  | 6  | 60  | 2n − 2 | One-sided
P3                       | 3  | 4  | 76  | 2n − 2 | One-sided
P4                       | 4  | 3  | 87  | 2n − 2 | One-sided
P4′                      | 4  | 2  | 88  | 2n − 2 | One-sided
P5                       | 5  | 1  | 114 | 2n − 2 | One-sided
Mazoyer [26]             | 1  | 58 | –   | 2n − 2 | Two-sided
Mazoyer [26]             | 12 | 3  | –   | 2n − 2 | –
Nishimura and Umeo [32]  | 1  | 78 | 208 | 2n − 2 | Two-sided

Lemma 21 ([68]) Let N be any s-state conventional cellular automaton. Then, there exists an s-state CA_kbit which can simulate N in real time, where k is a positive integer such that k = ⌈log₂ s⌉.

The following theorems show a trade-off between internal states and communication bits, and present state-efficient synchronization protocols P_i, 1 ≤ i ≤ 5. In some sense, no internal state is necessary to synchronize the whole array, as is shown in Theorem 27. The protocol design is based on the 6-state Mazoyer algorithm given in Mazoyer [25].

Theorem 22 ([68]) There exists a 54-state CA_{1-bit} with protocol P1 that can synchronize any n cells in optimum 2n − 1 steps.


Theorem 25 ([68]) There exists a 3-state CA_{4-bit} with protocol P4 that can synchronize any n cells in optimum 2n − 2 steps.

Theorem 26 ([68]) There exists a 2-state CA_{4-bit} with protocol P4 that can synchronize any n cells in optimum 2n − 2 steps.

Theorem 27 ([68]) There exists a 1-state CA_{5-bit} with protocol P5 that can synchronize any n cells in optimum 2n − 2 steps.

Figures 24 and 25 illustrate snapshots of the 3-state (2n − 2)-step synchronization protocol P4 operating on CA_{4-bit} (left) and of the 1-state protocol P5 operating on CA_{5-bit} (right). Let BT(n) be the total number of transferred bits needed for synchronizing n cells on CA_{k-bit}. Using a technique similar to the one developed by Vollmar [73], a lower bound BT(n) = Ω(n log n) on the bit-transfer complexity of synchronizing n cells on CA_{k-bit} can be established. In addition, it is shown that each of the synchronization protocols P_i, 1 ≤ i ≤ 5, including the 2-state variant of P4, has an O(n²) bit-transfer complexity.

Theorem 28 ([68]) Ω(n log n) bit-transfer is a lower bound for synchronizing n cells on CA_{k-bit} in (2n − 2) steps.

Theorem 29 ([68]) Each of the optimum-time/non-optimum-time synchronization protocols P_i, 1 ≤ i ≤ 5, including the 2-state variant of P4, has an O(n²) bit-transfer complexity.

Firing Squad Synchronization Problem on Two-Dimensional Arrays

Firing Squad Synchronization Problem in Cellular Automata, Figure 22 A configuration of the optimum-time synchronization algorithm with 1-bit inter-cell communication on 15 cells

Theorem 23 ([68]) There exists a 6-state CA_{2-bit} with protocol P2 that can synchronize any n cells in optimum 2n − 2 steps.

Theorem 24 ([68]) There exists a 4-state CA_{3-bit} with protocol P3 that can synchronize any n cells in optimum 2n − 2 steps.

Figure 23 illustrates snapshots of the 6-state (2n − 2)-step synchronization protocol P2 operating on CA_{2-bit} (left) and of the 4-state protocol P3 operating on CA_{3-bit} with 24 cells (right).

Figure 26 shows a finite two-dimensional (2-D) cellular array consisting of m × n cells. The array operates in lock-step mode in such a way that the next state of each cell (except the border cells) is determined by both its own present state and the present states of its north, south, east and west neighbors. All cells (soldiers), except the north-west corner cell (the general), are initially in the quiescent state at time t = 0, with the property that the next state of a quiescent cell with quiescent neighbors is the quiescent state again. At time t = 0, the north-west corner cell C_{1,1} is in the fire-when-ready state, which is the initiation signal for the synchronization. The firing squad synchronization problem is to determine a description (state set and next-state function) of cells that ensures all cells enter the fire state at exactly the same time and for the first time. The set of states and the transition function must be independent of m and n.
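The lock-step update mode defined above can be made concrete with a small simulation harness. The rule below is only a toy wake-up wave, not a firing-squad solution; it illustrates the neighborhood semantics and the fact that the general's signal needs (m − 1) + (n − 1) steps to reach the far corner:

```python
def step(grid, rule):
    # One synchronous update: the next state of each cell depends on its own state
    # and its north, south, east, west neighbors ('#' denotes the border).
    m, n = len(grid), len(grid[0])
    def at(i, j):
        return grid[i][j] if 0 <= i < m and 0 <= j < n else '#'
    return [[rule(at(i, j), at(i - 1, j), at(i + 1, j), at(i, j + 1), at(i, j - 1))
             for j in range(n)] for i in range(m)]

def wake(c, north, south, east, west):
    # Toy rule: a quiescent cell 'Q' wakes up ('A') when any neighbor is awake.
    if c == 'Q' and any(x in ('G', 'A') for x in (north, south, east, west)):
        return 'A'
    return c

m, n = 4, 6
grid = [['Q'] * n for _ in range(m)]
grid[0][0] = 'G'          # the general at C_{1,1}
t = 0
while any('Q' in row for row in grid):
    grid = step(grid, wake)
    t += 1
assert t == (m - 1) + (n - 1)   # the wake-up wave reaches C_{m,n} last
```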


Firing Squad Synchronization Problem in Cellular Automata, Figure 23 Snapshots for the 6-state protocol P2 operating on CA_{2-bit} with 24 cells (left) and for the 4-state protocol P3 operating on CA_{3-bit} with 24 cells (right)

Several synchronization algorithms on 2-D arrays have been proposed by Beyer [3], Grasselli [14], Shinahr [44], Szwerinski [45], Umeo, Maeda and Fujiwara [59], and Umeo, Hisaoka and Akiguchi [53]. Most of the synchronization algorithms for 2-D arrays are based on mapping schemes which map synchronized configurations for 1-D arrays onto 2-D arrays. Section "Firing Squad Synchronization Problem on Two-Dimensional Arrays" presents several such mapping schemes that yield time-efficient 2-D synchronization algorithms.

Orthogonal Mapping: A Simple Linear-Time Algorithm

In this section, a very simple synchronization algorithm is

provided for 2-D arrays. The overall structure of the algorithm is as follows:
1. First, synchronize the cells of the first column using a usual optimum-step 1-D algorithm with a general at one end; this requires 2m − 2 steps.
2. Then, start the row synchronization operation on each row simultaneously. An additional 2n − 2 steps are required for the row synchronization.
In total, the time complexity is 2(m + n) − 4 steps. The implementation is referred to as orthogonal mapping. It is shown that s + 2 states are enough for the implementation of the algorithm above, where s is the number of internal states of the 1-D base algorithm. Figure 27 shows


Firing Squad Synchronization Problem in Cellular Automata, Figure 24 Snapshots for the 3-state (2n − 2)-step synchronization protocol P4 operating on CA_{4-bit} with 24 cells

snapshots of the 8-state synchronization algorithm running on a rectangular array of size 4 × 6.

Theorem 30 There exists an (s + 2)-state protocol for synchronizing any m × n rectangular array in 2(m + n) − 4 steps, where s is the number of states of any optimum-time 1-D synchronization protocol.

L-shaped Mapping: Shinahr's Optimum-Time Algorithm

The first optimum-time 2-D synchronization algorithm was developed by Shinahr [44] and Beyer [3]. The rectangular array of size m × n is regarded as min(m, n) rotated

Firing Squad Synchronization Problem in Cellular Automata, Figure 25 Snapshots for the 1-state (2n − 2)-step synchronization protocol P5 operating on CA_{5-bit} with 18 cells

(90° in clockwise direction) L-shaped 1-D arrays, which are synchronized independently using the generalized firing squad synchronization algorithm. The configuration of the generalized synchronization on a 1-D array can be mapped onto the 2-D array; see Fig. 28. Thus, an m × n array synchronization problem is reduced to min(m, n) independent 1-D generalized synchronization problems: P(m, m + n − 1), P(m − 1, m + n − 3), …, P(1, n − m + 1) in the case m ≤ n, and P(m, m + n − 1), P(m − 1, m + n − 3), …, P(m − n + 1, m − n + 1) in the case m > n, where P(k, ℓ) means the 1-D generalized synchronization problem for ℓ cells with a general on the kth cell from the left end. Beyer [3] and Shinahr [44]


have shown that the optimum-time complexity for synchronizing any m × n array is m + n + max(m, n) − 3 steps. Shinahr [44] has also given a 28-state implementation.

Theorem 31 ([3,44]) There exists an optimum-time 28-state protocol for synchronizing any m × n rectangular array in m + n + max(m, n) − 3 steps.

Diagonal Mapping I: Six-State Linear-Time Algorithm

The proposal is a simple and state-efficient mapping scheme that enables us to embed any 1-D firing squad synchronization algorithm with a general at one end onto two-dimensional arrays without introducing additional states. Consider a 2-D array of size m × n, where m, n ≥ 2. First, divide the mn cells of the array into m + n − 1 groups g_k, 1 ≤ k ≤ m + n − 1, defined as follows:

g_k = {C_{i,j} | (i − 1) + (j − 1) = k − 1} ,

i.e., g_1 = {C_{1,1}}, g_2 = {C_{1,2}, C_{2,1}}, g_3 = {C_{1,3}, C_{2,2}, C_{3,1}}, …, g_{m+n−1} = {C_{m,n}}.

Firing Squad Synchronization Problem in Cellular Automata, Figure 26 A two-dimensional cellular automaton

Figure 29 shows the division of the two-dimensional array of size m × n into m + n − 1 groups. Let M be any 1-D CA that synchronizes ℓ cells in T(ℓ) steps. Assume that M has m + n − 1 cells, denoted by C_i, where 1 ≤ i ≤ m + n − 1. Then, consider a one-to-one correspondence between the ith group g_i and the ith cell C_i of M such that g_i ↔ C_i, where 1 ≤ i ≤ m + n − 1. One can construct a 2-D CA N such that all cells in g_i simulate the ith cell C_i in real time, and N synchronizes any m × n array at time t = T(m + n − 1) if and only if M synchronizes a 1-D array of length m + n − 1 at time t = T(m + n − 1). It is noted that the set of internal states of the N so constructed is the same as that of M. Thus an m × n 2-D array synchronization problem is reduced to one 1-D synchronization problem with the general at the left end. The algorithm obtained is slightly slower than the optimum ones, but the number of internal states is considerably smaller. Figure 30 shows snapshots of the proposed 6-state linear-time firing squad synchronization algorithm on rectangular arrays. For the details of the construction of the transition rule set, see Umeo, Maeda, Hisaoka and Teraoka [60].

Theorem 32 ([60]) Let A be any s-state firing synchronization algorithm operating in T(ℓ) steps on a 1-D array of ℓ cells. Then, there exists a 2-D s-state cellular automaton that can synchronize any m × n rectangular array in T(m + n − 1) steps.
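The grouping underlying Theorem 32 is easy to state in code; the following sketch (our own illustration, not from [60]) builds the diagonal groups g_k and checks the counting used in the construction:

```python
def diagonal_groups(m, n):
    # g_k = { C_{i,j} : (i-1) + (j-1) = k-1 }, 1 <= k <= m+n-1, with 1-based indices.
    groups = {k: [] for k in range(1, m + n)}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            groups[(i - 1) + (j - 1) + 1].append((i, j))
    return groups

g = diagonal_groups(3, 4)
assert len(g) == 3 + 4 - 1                 # one group per cell of the 1-D automaton
assert g[1] == [(1, 1)]                    # g_1 contains only the general's cell
assert g[2] == [(1, 2), (2, 1)]
assert g[3 + 4 - 1] == [(3, 4)]            # the last group is the far corner
assert sum(len(cells) for cells in g.values()) == 3 * 4
```

All cells of g_i copy the state of the ith cell of the 1-D algorithm, which is why the 2-D automaton needs no extra states and fires at time T(m + n − 1).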

Firing Squad Synchronization Problem in Cellular Automata, Figure 27 Snapshots of the synchronization process on a 4 × 6 array
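The 2(m + n) − 4 bound of Theorem 30 is pure bookkeeping over the two phases of the orthogonal mapping; the sketch below only tallies the firing step under the assumptions stated in the text (it does not simulate the underlying state machine):

```python
def orthogonal_firing_time(m, n):
    # Phase 1: the first column fires together at step 2m - 2.
    # Phase 2: each row then runs an optimum 1-D synchronizer for 2n - 2 more steps.
    column_phase = 2 * m - 2
    row_phase = 2 * n - 2
    return column_phase + row_phase

assert orthogonal_firing_time(4, 6) == 2 * (4 + 6) - 4   # the 4 x 6 array of Fig. 27
```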


Firing Squad Synchronization Problem in Cellular Automata, Figure 28 An optimum-time synchronization scheme for rectangular arrays

Theorem 33 ([60]) There exists a 6-state 2-D CA that can synchronize any m × n rectangular array in 2(m + n) − 4 steps.

Theorem 34 ([60]) There exists a 6-state 2-D CA that can synchronize any m × n rectangular array containing isolated rectangular holes in 2(m + n) − 4 steps.

Theorem 35 ([60]) There exists a 6-state firing squad synchronization algorithm that can synchronize any 3-D m × n × ℓ solid array in 2(m + n + ℓ) − 6 steps.

Diagonal Mapping II: Twelve-State Time-Optimum Algorithm

The second diagonal mapping scheme in this section enables us to embed a special class of 1-D generalized synchronization algorithms onto two-dimensional arrays without introducing additional states. An m × n 2-D array synchronization problem is reduced to one 1-D generalized synchronization problem: P(m, m + n − 1). Divide the mn cells into m + n − 1 groups g_k defined as follows,

Firing Squad Synchronization Problem in Cellular Automata, Figure 29 A correspondence between 1-D and 2-D arrays

where k is any integer such that −(m − 1) ≤ k ≤ n − 1:

g_k = {C_{i,j} | j − i = k} ,   −(m − 1) ≤ k ≤ n − 1 .
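For comparison with the first diagonal mapping, the groups of this second scheme can be sketched the same way (again our own illustration):

```python
def diagonal_groups_2(m, n):
    # g_k = { C_{i,j} : j - i = k }, -(m-1) <= k <= n-1, with 1-based indices;
    # group g_k plays the role of one cell of the 1-D problem P(m, m+n-1).
    groups = {k: [] for k in range(-(m - 1), n)}
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            groups[j - i].append((i, j))
    return groups

g = diagonal_groups_2(3, 4)
assert len(g) == 3 + 4 - 1
assert g[-(3 - 1)] == [(3, 1)]               # k = -(m-1): bottom-left corner
assert g[0] == [(1, 1), (2, 2), (3, 3)]      # main diagonal, holding the general
assert g[4 - 1] == [(1, 4)]                  # k = n-1: top-right corner
```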

Figure 31 shows the correspondence between 1-D and 2-D arrays.

Property A: Let S_i^t denote the state of C_i at step t. A generalized firing algorithm is said to have property A if any state S_i^t appearing in the area A can be computed from its left and right neighbor states S_{i−1}^{t−1} and S_{i+1}^{t−1}, but never depends on its own previous state S_i^{t−1}. Figure 32 shows the area A in the time-space diagram for the generalized optimum-step firing squad synchronization algorithm. Any one-dimensional generalized firing squad synchronization algorithm with property A can easily be embedded onto two-dimensional arrays without introducing additional states.

Theorem 36 ([53]) Let M be any s-state generalized synchronization algorithm with property A operating in


Firing Squad Synchronization Problem in Cellular Automata, Figure 30 Snapshots of the proposed 6-state linear-time firing squad synchronization algorithm on rectangular arrays

T(k, ℓ) steps on a 1-D array of ℓ cells with a general on the kth cell from the left end. Then, based on M, one can construct a 2-D s-state cellular automaton that can synchronize any m × n rectangular array in T(m, m + n − 1) steps.

It has been shown in Umeo, Hisaoka and Akiguchi [53] that there exists a 12-state implementation of the generalized optimum-time synchronization algorithm having property A. One thus gets a 12-state optimum-time synchronization algorithm for rectangular arrays. Figure 33 shows snapshots of the proposed 12-state optimum-time firing squad synchronization algorithm operating on a 7 × 9 array.

Theorem 37 ([53]) There exists a 12-state firing squad synchronization algorithm that can synchronize any m × n rectangular array in optimum m + n + max(m, n) − 3 steps.

Rotated L-Shaped Mapping: Time-Optimum Algorithm

In this section we present an optimum-time synchronization algorithm based on a rotated L-shaped mapping. The synchronization scheme is quite different from previous designs; it uses the freezing-thawing technique. Without loss of generality, it is assumed that m ≤ n. A rectangular array of size m × n is regarded as m rotated (90° in counterclockwise direction) L-shaped 1-D arrays. Each L-shaped array is denoted by L_i, 1 ≤ i ≤ m; see Fig. 34. Each L_i consists of three segments of length i, n − m, and i, respectively. Each segment can be synchronized by the freezing-thawing technique. Figure 35 shows a time-space diagram for synchronizing L_i. The wake-up signals for the three segments of L_i are generated at times t = m + 2(m − i) − 1, 3m − i − 2, and n + 2(m − i) − 1, respectively, and the synchronization operations on the segments are delayed accordingly for Δt_{ij}, 1 ≤ j ≤ 3.

Fluctuations, Importance of: Complexity in the View of Stochastic Processes

Article Outline
Glossary
Definition of the Subject
Introduction
Stochastic Processes
Stochastic Time Series Analysis
Applications: Processes in Time
Applications: Processes in Scale
Future Directions
Further Reading
Acknowledgment
Bibliography

Glossary

Complexity in time: Complex structures may be characterized by non-regular time behavior of a describing variable q ∈ R^d. The challenge is to understand or to model the time dependence of q(t), which may be achieved by a differential equation (d/dt) q(t) = … or by the discrete dynamics q(t + τ) = f(q(t), …) fixing the evolution in the future. Of special interest are not only nonlinear equations leading to chaotic dynamics but also those which include general noise terms.

Complexity in space: Complex structures may be characterized by their spatial disorder. The disorder on a selected scale l may be measured at the location x by some scale-dependent quantity q(l, x), like wavelets, increments and so on. The challenge is to understand or to model the features of the disorder variable q(l, x) on different scales l. If the moments of q show power behavior, ⟨q(l, x)^n⟩ ∝ l^{ξ(n)}, the complex structures are called fractals. Well-known examples of spatially complex structures are turbulence and financial market data. In the first case the complexity of velocity fluctuations over different distances l is investigated; in the second case the complexity of price changes over different time steps (time scales) is of interest.

Fokker–Planck equation: The evolution of a stochastic variable x(t), given its value x′ at an earlier time t′ < t, is described in a statistical manner by the conditional probability distribution p(x, t | x′, t′). The conditional probability is subject to a Fokker–Planck equation, also known as the second Kolmogorov equation, if

∂/∂t p(x, t | x′, t′) = − Σ_{i=1}^{d} ∂/∂x_i [ D_i^{(1)}(x, t) p(x, t | x′, t′) ] + (1/2) Σ_{i,j=1}^{d} ∂²/(∂x_i ∂x_j) [ D_{ij}^{(2)}(x, t) p(x, t | x′, t′) ]

holds. Here D^{(1)} and D^{(2)} are the drift vector and the diffusion matrix, respectively.

Kramers–Moyal coefficients: Knowing for a stochastic process the conditional probability distribution p(x(t), t | x′, t′) for all t and t′, the Kramers–Moyal coefficients can be estimated as nth-order moments of the conditional probability distribution. In this way also the drift and diffusion coefficients of the Fokker–Planck equation can be obtained from the empirically measured conditional probability distributions.

Langevin equation: The time evolution of a variable x(t) is described by a Langevin equation if for x(t) it holds that

(d/dt) x(t) = D^{(1)}(x, t) + √(D^{(2)}(x, t)) Γ(t) .

Using Itô's interpretation, the deterministic part of the differential equation is equal to the drift term, and the noise amplitude is equal to the square root of the diffusion term of a corresponding Fokker–Planck equation. Note that for vanishing noise a purely deterministic dynamics is included in this description.

Stochastic process in scale: For the description of complex systems with spatial or scale disorder, usually a measure of the disorder on different scales, q(l, x), is used. A stochastic process in scale is a description of the l-evolution of q(l, x) by means of stochastic equations. As a special case the single event q(l, x) follows a Langevin equation, whereas the probability p(q(l)) follows a Fokker–Planck equation.

Definition of the Subject

Measurements of time signals of complex systems of the inanimate and the animate world, like turbulent fluid motions, traffic flow or human brain activity, yield fluctuating time series. In recent years, methods have been devised which allow for a detailed analysis of such data. In

particular, methods for parameter-free estimation of the underlying stochastic equations have been proposed. The present article gives an overview of the achievements obtained so far in analyzing stochastic data and describes results obtained for a variety of complex systems, ranging from electrical nonlinear circuits and fluid turbulence to traffic flow and financial market data. The systems will be divided into two classes, namely systems with complexity in time and systems with complexity in scale.

Introduction

The central theme of the present article is exhibited in Fig. 1. Given a fluctuating, sequentially measured set of experimental data, one can pose the question whether it is possible to determine underlying trends and to assess the characteristics of the fluctuations generating the experimental traces. This question becomes especially important for nonlinear systems, which can only partly be analyzed by the evaluation of the power spectra obtained from a Fourier representation of the data. In recent years it has become evident that for a wide class of stochastic processes the posed question can be answered in an affirmative way. A common method has been developed which can deal with stochastic processes (Langevin processes, Lévy processes) in time as well as in scale. In the first case, one faces the analysis of temporal disorder, whereas in the second case one considers scale disorder, which is an inherent feature of turbulent fluid motion and, quite interestingly, can also be detected in financial time series. This scale disorder is often linked to fractal scaling behavior and can be treated by a stochastic ansatz in a more general way. In the present article we shall give an overview of the developed methods for analyzing stochastic data in time and scale. Furthermore, we list complex systems ranging from electrical nonlinear circuits, fluid turbulence and finance to biological data like heart-beat data or data of human tremor, for which a successful application of the data analysis method has been performed. In addition, we shall focus on results obtained from some exemplary applications of the method to electronics, traffic flow, turbulence, and finance.

Complexity in Time

Complex systems are composed of a large number of subsystems behaving in a collective manner. In systems far from equilibrium, collectivity arises due to selforganization [1,2,3]. It results in the formation of temporal, spatial, spatio-temporal and functional structures.

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 1 Stochastic time series, data generated numerically and measured in a Rayleigh–Bénard experiment: Is it possible to disentangle turbulent trends from chances?

The investigation of systems like lasers, hydrodynamic instabilities and chemical reactions has shown that selforganization can be described in terms of order parameters u_i(t) which obey a set of stochastic differential equations of the form

du_i = N_i[u_1, …, u_n] dt + Σ_j g_ij(u_1, …, u_n) dW_j ,   (1)

where the W_j are independent Wiener processes. Although the state vector q(t) of the complex system under consideration is high-dimensional, its long-time behavior is entirely governed by the dynamics of typically few order parameters:

q(t) = Q(u_1, …, u_n) .   (2)

This fact allows one to perform a macroscopic treatment of complex systems on the basis of the order parameter concept [1,2,3]. For hydrodynamic instabilities in the laminar flow regime like Rayleigh–Bénard convection or the Taylor– Couette experiment thermal fluctuations are usually small and can be neglected. However, in nonlinear optics and, especially, in biological systems the impact of noise has been shown to be of great importance. In principle, the order parameter equations (1) can be derived from basic equations characterizing the system under consideration close to instabilities [1,2]. However if the basic equations are not available, as is the case e. g. for systems considered in biology or medicine, the order parameter concept yields a top-down approach to complexity [3].
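Order parameter equations of type (1) are usually explored numerically with the Euler–Maruyama scheme. The sketch below integrates a single order parameter with an illustrative pitchfork drift N(u) = u − u³ (our choice, not taken from the text); with the noise switched off it relaxes to the stable fixed point u = 1:

```python
import random

def euler_maruyama(N, g, u0, dt, steps, rng):
    # Ito-sense Euler-Maruyama step for du = N(u) dt + g(u) dW, dW ~ Normal(0, dt).
    u = u0
    for _ in range(steps):
        u = u + N(u) * dt + g(u) * rng.gauss(0.0, dt ** 0.5)
    return u

rng = random.Random(1)
u_final = euler_maruyama(lambda u: u - u ** 3, lambda u: 0.0, 0.1, 0.01, 2000, rng)
assert abs(u_final - 1.0) < 1e-3   # deterministic limit: relaxation to u = 1
```

With g(u) set to a positive constant, the trajectory fluctuates around the fixed point, which is the situation analyzed later via drift and diffusion estimates.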


In this top-down approach the analysis of experimental time series becomes a central issue. Methods of nonlinear time series analysis (cf. the monograph of Kantz and Schreiber [4]) have been widely applied to analyze complex systems. However, the developed methods aim at the understanding of deterministic systems and can only be successful if the stochastic forces are weak. Apparently, these methods have to be extended to include stochastic processes.

Complexity in Scale

In the case of selfsimilar structures, complexity is commonly investigated by a local measure q(l, x) characterizing the structure on the scale l at the location x. Selfsimilarity means that in a certain range of l the processes

q(l, x) ,   λ^{−ξ} q(λl, λx)   (3)

should have the same statistics. More precisely, the probability distribution of the quantity q takes the form

f(q, l) = (1/l^ξ) F(q/l^ξ)   (4)

with a universal function F(Q). Furthermore, the moments exhibit scaling behavior:

⟨q^n(l)⟩ = ∫ dq q^n (1/l^ξ) F(q/l^ξ) = Q_n l^{nξ} .   (5)

Such type of behavior has been termed fractal scaling behavior. There are many experimental examples of systems, like turbulent fields or surface roughness, just to mention two, for which such a simple picture of a complex structure is only a rough first approximation. In fact, especially for turbulence, where q(l, x) is taken as a velocity increment, it has been argued that multifractal behavior is more appropriate, where the nth-order moments scale according to

⟨q^n(l)⟩ = Q_n l^{ξ(n)} ,   (6)

where the scaling indices ξ(n) are nonlinear functions of the order n:

ξ(n) = n ξ_0 + n² ξ_1 + n³ ξ_2 + ⋯ .   (7)

Such a behavior can formally be obtained by the assumption that the probability distribution f(q, l) has the following form:

f(q, l) = ∫ dα p(α; l) (1/l^α) F(q/l^α) .   (8)

This formula is based on the assumption that in a turbulent flow regions with different scaling indices α exist, where p(α; l) gives a measure of the scaling indices α at scale l. The major shortcoming of the fractal and multifractal approach to complexity in scale is the fact that it only addresses the statistics of the measure q(l, x) at a single scale l. In general one has to expect dependencies between the measures q(l, x) and q(l′, x) from different scales. Thus the question, which we will address in the following, can be posed: are there methods which lead to a more comprehensive characterization of the scale disorder by the general joint statistics

f(q_N, l_N; q_{N−1}, l_{N−1}; …; q_1, l_1; q_0, l_0)   (9)

of the local measure q at multiple scales l_i?

Stochastic Data Analysis

Processes in time and scale can be analyzed in a similar way if we generalize the time process given in (1) to a stochastic process equation evolving in scale,

q(l + dl) = q(l) + N(q, l) dl + g(q, l) dW(l) ,   (10)

where dW belongs to a random process. The aim of data analysis is to disentangle deterministic dynamics and the impact of fluctuations. Loosely speaking, this amounts to detecting trends and chances in data series of complex systems. A complete analysis of experimental data, which is generated by the interplay of deterministic dynamics and dynamical noise, has to address the following issues:

- Identification of the order parameters
- Extracting the deterministic dynamics
- Evaluating the properties of fluctuations

The outline of the present article is as follows. First, we shall summarize the description of stochastic processes, focusing mainly on Markovian processes. Second, we discuss the approach developed to analyze stochastic processes. The last parts are devoted to applications of the data analysis method to processes in time and processes in scale.

Stochastic Processes

In the following we consider the class of systems which are described by a multivariate state vector X(t) contained in a d-dimensional state space {x}. The evolution of the state vector X(t) is assumed to be governed by a deterministic part and by noise:

(d/dt) X(t) = N(X(t), t) + F(X(t), t) .   (11)


N denotes a nonlinear function depending on the stochastic variable X(t) and, additionally, may explicitly depend on time t. (Note that time t can also be considered as a general variable and replaced, for example, by a scale variable l like in (86).) Because the function N can be nonlinear, systems exhibiting chaotic time evolution in the deterministic case are also included in the class of stochastic processes (11). The second part, F(X(t), t), fluctuates on a fast time scale. We assume that the d components F_i can be represented in the form

F_i(X(t), t) = Σ_{j=1}^{d} g_ij(X(t), t) Γ_j(t) .   (12)

The quantities Γ_j(t) are considered to be random functions whose statistical characteristics are well-defined. It is evident that these properties significantly determine the dynamical behavior of the state vector X(t). Formally, our approach also includes purely deterministic processes, taking F = 0.
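The decomposition (12) turns d independent noise sources into correlated forces: the covariance of F is g gᵀ. A small sanity check (with an arbitrary illustrative matrix g) makes this explicit:

```python
import random

g = [[1.0, 0.0],
     [0.5, 0.5]]                       # illustrative coupling matrix g_ij
rng = random.Random(0)
samples = 100_000
cov = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(samples):
    gamma = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # independent sources
    F = [sum(g[i][j] * gamma[j] for j in range(2)) for i in range(2)]
    for i in range(2):
        for j in range(2):
            cov[i][j] += F[i] * F[j] / samples

ggT = [[sum(g[i][k] * g[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
# The empirical covariance of F approaches g g^T = [[1.0, 0.5], [0.5, 0.5]].
assert all(abs(cov[i][j] - ggT[i][j]) < 0.02 for i in range(2) for j in range(2))
```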

b) Generalized Langevin Equations: Lévy Noise A more general class is formed by the discrete time evolution laws [5,6], X(t iC1 ) D X(t i ) C N(X(t i ); t i )  C g(X(t i ); t i )  1/˛ ˛;ˇ (t i )

where the increment ˛;ˇ (t i ) is a fluctuating quantity distributed according to the Lévy stable law characterized by the Lévy parameters ˛, ˇ, [7]. As is well-known, only the Fourier-transform of this distribution can be given: Z 1 h˛;ˇ D dk Z(k; ˛; ˇ) ei kx 2 ˛ (1iˇ

Z(k; ˛; ˇ) D eijkj ˚ D tan  ˚ D

Discrete Time Evolution It is convenient to consider the temporal evolution (11) of the state vector X(t) on a time scale, which is large compared to the time scale of the fluctuations  j (t). As we shall briefly indicate below, a stochastic process related to the evolution Equation (11) can be modeled by stochastic evolution laws relating the state vectors X(t) at times t i ; t iC1 D t i C ; t iC2 D t i C 2 ; : : : for small but finite values of . In the present article we shall deal with the class of proper Langevin processes and generalized Langevin processes, which are defined by the following discrete time evolutions. a) Proper Langevin Equations: White Noise The discrete time evolution of a proper Langevin process is given by X(t iC1 ) D X(t i ) C N(X(t i ); t i )  C g(X(t i ); t i ) 

p (t i )

(13)

where the stochastic increment (t i ) is a fluctuating quantity characterized by a Gaussian distribution with zero mean, h (t i )i D 0 Pd 2 1 1 h( ) D p e 2 D p e ˛D1 d d ( 2) ( 2)

2˛ 2

(14)

Furthermore, the increments are statistically independent for different times h (t i ) (t j )i D ı i j

(15)

(16)

sign(k)˚)

(17)

˛ 2

˛ ¤ 1;

2 ln jkj ˛ D 1 : 

The Gaussian distribution is contained in the class of Lévy stable distributions (˛ D 2, ˇ D 0). Formally, ˛ can be taken from the interval 0 < ˛  2. However, for applications it seems reasonable to choose 1 < ˛  2 in order that the first moment of the noise source exists. The consideration of this type of statistics for the noise variables  is based on the central limit theorem, as discussed in a subsection below. The discrete Langevin (13) and generalized Langevin Equations (16) have to be considered in the limit ! 0. They are the basis of all further treatments. A central point is that if one assumes the noise sources to be independent of the variable X(t i ) the discrete time evolution equations define a Markov process, whose generator, i. e. the conditional probability distribution or short time propagator can be established on the basis of (13), (16). In the following we shall discuss, how the discrete time processes can be related to the stochastic evolution equation (11). Discrete Time Approximation of Stochastic Evolution Equations In order to motivate the discrete time approximations (13), (16) we integrate the evolution law (11) over a finite but small time increment : Z tC X(t C ) D X(t) C dt 0 N(X(t 0 ); t 0 ) t

Z

tC

C t

dt 0 g(X(t 0 ); t 0 ) (t 0 )

Fluctuations, Importance of: Complexity in the View of Stochastic Processes

 X(t) C N(X(t); t) Z tC dt 0 g(X(t 0 ); t 0 ) (t 0 ) : C

stochastic integrals (18)

Z

tC

dW(t; ) D

dt 0 (t 0 ) :

(19)

t

These are the quantities, for which a statistical characterization can be given. We shall pursue this problem in the next subsection. However, looking at (18) we encounter the difficulty that the integrals over the noise forces may involve functions of the state vector within the time interval (t; t C ). The interpretation of such integrals for wildly fluctuating, stochastic quantities (t) is difficult. The simplest way is to formulate an interpretation of these terms leading to different interpretations of the stochastic evolution equation (11). We formulate the widely used definitions due to Itô and Stratonovich. In the Itô sense, the integral is interpreted as Z

tC

dt 0 g(X(t 0 ); t 0 ) (t 0 ) D g(X(t); t)

t

Z

tC

dt 0 (t 0 )

t

D g(X(t); t) dW(t; ) : (20) The Stratonovich definition is Z

tC t

dt 0 g(X(t 0 ); t 0 ) (t 0 ) X(t C ) C X(t) ;t C Dg 2 X(t C ) C X(t) Dg ;t C 2

Z tC dt 0 (t 0 ) 2 t

dW(t; ) : (21) 2

Since from experiments one obtains probability distributions of stochastic processes which are related to stochastic Langevin equations, we are free to choose a certain interpretation of the process. In the following we shall always adopt the Itô interpretation. In this case, the drift vector D1 (x; t) D N(x; t) coincides with the nonlinear vector field N(x; t). Limit Theorems, Wiener Process, Lévy Process In the following we shall discuss possibilities to characterize the

tC

dt 0  (t 0 )

(22)

t

t

The time interval is chosen to be larger than the time scale of the fluctuations of  j (t). It involves the rapidly fluctuating quantities  j (t) and is denoted as a stochastic integral [8,9,10,11]. If we assume the matrix g to be independent on time t and state vector X(t), we arrive at the integrals

Z

W(t C )  W(t) D

Γ(t) is a rapidly fluctuating quantity of zero mean. In order to characterize the properties of this force one can resort to the limit theorems of statistical mechanics [7]. The central limit theorem states that if the quantities ξ_j, j = 1, …, n, are statistically independent variables of zero mean and variance σ², then the sum

ξ = (1/√n) Σ_{j=1}^{n} ξ_j   (23)

tends to a Gaussian random variable with variance σ² for large values of n. The limiting probability distribution h(ξ) is then a Gaussian distribution with variance σ²:

h(ξ) = (1/√(2πσ²)) e^{−ξ²/(2σ²)} .   (24)

As is well known, there is a generalization of the central limit theorem which applies to random variables whose second moment does not exist. It states that the distribution of the sum over identically distributed random variables ξ_j,

(1/n^{1/α}) Σ_{j=1}^{n} ξ_j = ξ_{α,β} ,   (25)

tends to a random variable ξ_{α,β}, which is distributed according to the Lévy-stable distribution h_{α,β}(ξ). The Lévy-stable distributions can only be given by their Fourier transforms, cf. Eq. (17). In order to evaluate the integral (22) using the limit theorems, it is convenient to represent the stochastic force Γ(t) as a sum over N δ-kicks occurring at discrete times t_j:

Γ(t) = Σ_j ξ_j (Δt)^{1/α} δ(t − t_j) .   (26)

Thereby, Δt is the time difference between the occurrence of two kicks. Then we obtain

dW(t; τ) = Σ_{t_j ∈ [t, t+τ]} ξ_j (Δt)^{1/α} = (N Δt)^{1/α} (1/N^{1/α}) Σ_j ξ_j = τ^{1/α} ξ(t) .   (27)

An application of the central limit theorem shows that if the quantities ξ_j are identically distributed independent variables, the integral

ξ(t) = (1/τ^{1/α}) ∫_t^{t+τ} dt' Γ(t')   (28)

can be considered to be a random variable ξ(t) which in the limit N → ∞ tends to a stable random variable.
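The n^{−1/2} scaling of Eq. (23) can be checked numerically. The following sketch is our own illustration (uniformly distributed kicks, arbitrary parameter choices), not part of the original article:

```python
import math
import random

# Illustrative sketch of the central limit theorem scaling, Eq. (23)/(24):
# sums of n independent zero-mean variables of variance sigma^2, divided
# by sqrt(n), approach a Gaussian with the same variance sigma^2.
rng = random.Random(42)
sigma = 1.0
a = sigma * math.sqrt(3.0)  # uniform on [-a, a] has variance a^2 / 3 = sigma^2

def scaled_sum(n):
    """xi = n^(-1/2) * sum_{j=1}^{n} xi_j, cf. Eq. (23)."""
    return sum(rng.uniform(-a, a) for _ in range(n)) / math.sqrt(n)

samples = [scaled_sum(100) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

# The empirical variance stays at sigma^2, independent of n, while the
# histogram of `samples` approaches the Gaussian of Eq. (24).
print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")
```

For α < 2 (heavy-tailed kicks), the analogous scaled sums of Eq. (25) converge to Lévy-stable variables instead.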

Fluctuations, Importance of: Complexity in the View of Stochastic Processes

Thus, for the case α = 2, i.e. for the case where the second moments of the random kicks exist, the stochastic variable dW(t; τ) can be represented by the increments

dW(t; τ) = √τ ξ(t) ,   (29)

where ξ(t) is a Gaussian distributed random variable. For the more general case, dW(t; τ) is a stochastic variable

dW(t; τ) = τ^{1/α} ξ_{α,β}(t) ,   (30)

where the distribution of ξ_{α,β} is the Lévy distribution (17).

Statistical Description of Stochastic Processes

In the previous subsection we have discussed processes described by stochastic equations. In the present subsection we shall summarize the corresponding statistical description. Such a description is achieved by introducing suitable statistical averages, which we shall denote by the brackets ⟨…⟩. For stationary processes in time, one usually deals with time averages. For nonstationary processes, averages are defined as ensemble averages, i.e. averages over an ensemble of experimental (or numerical) realizations of the stochastic process (11). For processes in scale, the average is an ensemble average.

Probability Distributions

The set of stochastic evolution equations (11), or its finite time representations (13), (16), defines a Markov process. We consider the joint probability density

f(x_n, t_n; …; x_1, t_1; x_0, t_0) ,   (31)

which is related to the joint probability to find the system at times t_i in the volumes V_i in phase space. If we take times t_i which are separated by the small time increment τ = t_{i+1} − t_i, the probability density can be related to the discrete time representation of the stochastic process (13), (16) according to

f(x_n, t_n; …; x_1, t_1; x_0, t_0) = ⟨δ(x_n − X(t_n)) … δ(x_0 − X(t_0))⟩ ,   (32)

where the brackets indicate the statistical average, which may be a time average (for stationary processes) or an ensemble average.

Markov Processes

An important subclass of stochastic processes are Markov processes. For these processes the joint probability distribution f(x_n, t_n; …; x_1, t_1; x_0, t_0) can be constructed from the knowledge of the conditional probability distributions

p(x_{i+1}, t_{i+1} | x_i, t_i) = f(x_{i+1}, t_{i+1}; x_i, t_i) / f(x_i, t_i)   (33)

according to

f(x_n, t_n; …; x_1, t_1; x_0, t_0) = p(x_n, t_n | x_{n−1}, t_{n−1}) … p(x_1, t_1 | x_0, t_0) f(x_0, t_0) .   (34)

Here the Markov property of a process for multiply conditioned probabilities,

p(x_i, t_i | x_{i−1}, t_{i−1}; …; x_0, t_0) = p(x_i, t_i | x_{i−1}, t_{i−1}) ,   (35)

is used. As a consequence, the knowledge of the transition probabilities together with the initial probability distribution f(x_0, t_0) suffices to define the N-times probability distribution. It is straightforward to prove the Chapman–Kolmogorov equation

p(x_j, t_j | x_i, t_i) = ∫ dx_k p(x_j, t_j | x_k, t_k) p(x_k, t_k | x_i, t_i) .   (36)

This relation is valid for all times t_i < t_k < t_j. In the following we shall show that the transition probabilities p(x_{j+1}, t + τ | x_j, t) can be determined for small time differences τ. This defines the so-called short time propagators.

Short Time Propagator of Langevin Processes

It is straightforward to determine the short time propagator from the finite time approximation (13) of the Langevin equation. We shall denote these propagators by p(x_{j+1}, t + τ | x_j, t), in contrast to the finite time propagators (33), for which the time interval t_{i+1} − t_i is large compared to τ. We first consider the case of Gaussian noise. The variables ξ(t_i) are Gaussian random vectors with probability distribution

h[ξ] = (1/√((2π)^d)) exp(−ξ·ξ/2) .   (37)

The finite time interpretation of the Langevin equation can be rewritten in the form

ξ(t_i) = (1/τ^{1/2}) [g(X(t_i), t_i)]^{−1} [X(t_{i+1}) − X(t_i) − τ N(X(t_i))] .   (38)
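Read forward, relation (38) is nothing but the discrete Langevin step (13). As a minimal sketch (our illustration, not from the article), this update rule can be used to integrate a one-dimensional Ornstein–Uhlenbeck process with N(x) = −γx and constant g:

```python
import math
import random

# Sketch (our illustration): the finite-time Langevin update underlying Eq. (38),
#   X(t_{i+1}) = X(t_i) + tau * N(X(t_i)) + tau^(1/2) * g * xi(t_i),
# for a 1D Ornstein-Uhlenbeck process N(x) = -gamma * x with constant g.
rng = random.Random(1)
gamma, g, tau = 1.0, 0.5, 1e-3

x, xs = 0.0, []
for i in range(1_000_000):
    xi = rng.gauss(0.0, 1.0)                       # Gaussian kick, Eq. (37)
    x = x + tau * (-gamma * x) + math.sqrt(tau) * g * xi
    if i > 50_000:                                 # discard the initial transient
        xs.append(x)

var = sum(v * v for v in xs) / len(xs)
print(f"empirical stationary variance ~ {var:.3f}")  # analytic value: g^2/(2*gamma) = 0.125
```

The agreement of the empirical stationary variance with g²/(2γ) is a quick consistency check of the discretization.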


Relation (38), in turn, defines the transition probability distribution

p(x_{i+1}, t_{i+1} | x_i, t_i) dx_{i+1} = h[ξ = ξ(t_i)] J(x_i, t_i) dx_{i+1} ,   (39)

where J is the determinant of the Jacobian

J_{αβ} = ∂ξ_α(t_i) / ∂x_{i+1,β} ,   (40)

and [g]^{−1} denotes the inverse of the matrix g (which is assumed to exist). For the following it will be convenient to define the so-called diffusion matrix D^(2)(x_i, t_i),

D^(2)(x_i, t_i) = g(x_i, t_i) g^T(x_i, t_i) .   (41)

We are now able to explicitly state the short time propagator of the process (13):

p(x_{i+1}, t_i + τ | x_i, t_i) = (1/√((2πτ)^d Det[D^(2)])) e^{−S(x_{i+1}, x_i, t_i; τ)} .   (42)

We have defined the quantity S(x_{i+1}, x_i, t_i; τ) according to

S(x_{i+1}, x_i, t_i; τ) = (τ/2) [(x_{i+1} − x_i)/τ − D^(1)(x_i, t_i)] · [D^(2)(x_i, t_i)]^{−1} [(x_{i+1} − x_i)/τ − D^(1)(x_i, t_i)] .   (43)

As we see, the short time propagator, which yields the transition probability density from state x_i to state x_{i+1} in the finite but small time interval τ, is a normal distribution.

Short Time Propagator of Lévy Processes

It is now straightforward to determine the short time propagator for Lévy processes. We have to replace the Gaussian distribution by the (multivariate) Lévy distribution h_{α,β}(ξ). As a consequence, we obtain the conditional probability, i.e. the short time propagator, for Lévy processes:

p(x_{i+1}, t_i + τ | x_i, t_i) = (1/Det[g(x_i, t_i)]) h_{α,β}( (1/τ^{1/α}) [g(x(t_i), t_i)]^{−1} [x(t_{i+1}) − x(t_i) − τ N(x(t_i))] ) .   (44)

Joint Probability Distribution and Markovian Properties

Due to the statistical independence of the random vectors ξ(t_i), ξ(t_j) for i ≠ j, we obtain the joint probability distribution as a product of the distributions h(ξ):

h(ξ_N; …; ξ_0) = h(ξ_N) h(ξ_{N−1}) … h(ξ_0) .   (45)

Furthermore, we observe that under the assumption that the random vector ξ(t_i) is independent of the variables X(t_j) for all j ≤ i, we can construct the N-time probability distribution

f(x_n, t_n; …; x_1, t_1; x_0, t_0) = p(x_n, t_n | x_{n−1}, t_{n−1}) … p(x_1, t_1 | x_0, t_0) f(x_0, t_0) .   (46)

However, this is the definition of a Markov chain. Thereby, the transition probabilities are the short time propagators, i.e. the representation (46) is valid in the short time limit τ → 0. The probability distribution (46) is the discrete approximation of the path integral representation of the stochastic process under consideration [8].

Let us summarize: The statistical description of the Langevin equation based on the n-times joint probability distribution leads to the representation in terms of the conditional probability distribution. This representation is the definition of a Markov process. Depending on the assumptions on the statistics of the fluctuating forces, different processes arise. If the fluctuating forces are assumed to be Gaussian, the short time propagator is Gaussian and, as a consequence, solely defined by the drift vector and the diffusion matrix. If the fluctuating forces are assumed to be Lévy distributed, more complicated short time propagators arise.

Let us add the following remarks:

a) The assumption of Gaussianity of the statistics is not necessary. One can consider fluctuating forces with non-Gaussian probability distributions. In this case the probability distributions have to be characterized by higher order moments or, more explicitly, by their cumulants. At this point we remind the reader that for non-Gaussian distributions infinitely many cumulants exist.

b) The Markovian property, i.e. the fact that the propagator p(x_i, t_i | x_{i−1}, t_{i−1}) does not depend on states x_k at times t_k < t_{i−1}, is usually violated for physical systems at small time differences τ ≲ T_mar, due to the fact that the noise sources become correlated. This point has already been emphasized in the famous work of A. Einstein on Brownian motion, which is one of the first works on stochastic processes [12]. This time scale is denoted as the Markov–Einstein scale. It seems to be a highly interesting quantity, especially for nonequilibrium systems like turbulence [13,14] and earthquake signals [15].

Finite Time Propagators

Up to now, we have considered the short time propagators p(x_i, t_i | x_{i−1}, t_{i−1}) for infinitesimal time differences


t_i − t_{i−1} = τ. However, one is interested in the conditional probability distributions for finite time intervals, p(x, t | x', t'), t − t' ≫ τ.

Fokker–Planck Equation

The conditional probability distribution p(x, t | x', t'), t − t' ≫ τ, can be obtained from the solution of the Fokker–Planck equation (also known as the second Kolmogorov equation [16]):

∂/∂t p(x, t | x', t') = − Σ_{i=1}^{d} ∂/∂x_i [D^(1)_i(x, t) p(x, t | x', t')] + (1/2) Σ_{i,j=1}^{d} ∂²/(∂x_i ∂x_j) [D^(2)_ij(x, t) p(x, t | x', t')] .   (47)

D^(1) and D^(2) are drift vector and diffusion matrix, respectively. Under consideration of Itô's definition of stochastic integrals, the coefficients D^(1), D^(2) of the Fokker–Planck equation (47) and the functionals N, g of the Langevin equation (11), (12) are related by

D^(1)_i(x, t) = N_i(x, t) ,   (48)

D^(2)_ij(x, t) = Σ_{l=1}^{d} g_il(x, t) g_jl(x, t) .   (49)

They are defined according to

D^(1)_i(x, t) = lim_{τ→0} (1/τ) ⟨X_i(t+τ) − x_i⟩|_{X(t)=x} = lim_{τ→0} (1/τ) ∫ dx' p(x', t+τ | x, t) (x'_i − x_i) ,   (50)

D^(2)_ij(x, t) = lim_{τ→0} (1/τ) ⟨(X_i(t+τ) − x_i)(X_j(t+τ) − x_j)⟩|_{X(t)=x} = lim_{τ→0} (1/τ) ∫ dx' p(x', t+τ | x, t) (x'_i − x_i)(x'_j − x_j) .   (51)

These expressions demonstrate that drift vector and diffusion matrix can be determined as the first and second moments of the conditional probability distribution p(x', t+τ | x, t) in the small time limit.

Fractional Fokker–Planck Equations

The finite time propagators or conditional probability distributions of stochastic processes containing Lévy noise lead to fractional diffusion equations. For a discussion of this topic we refer the reader to [5,6].
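The limiting procedure in Eqs. (50), (51) can be made concrete with the exactly known conditional moments of an Ornstein–Uhlenbeck process (our illustration; parameter values are arbitrary):

```python
import math

# Sketch (our illustration): the limits in Eqs. (50), (51) evaluated with the
# exact conditional moments of an Ornstein-Uhlenbeck process
#   dX = -gamma * X dt + g dW,
# for which <X(t+tau)|x> = x e^{-gamma tau} and
# Var[X(t+tau)|x] = g^2/(2 gamma) * (1 - e^{-2 gamma tau}).
gamma, g, x = 1.0, 0.5, 0.7

def D1_estimate(tau):
    """(1/tau) <X(t+tau) - x | X(t)=x>, cf. Eq. (50)."""
    return x * (math.exp(-gamma * tau) - 1.0) / tau

def D2_estimate(tau):
    """(1/tau) <(X(t+tau) - x)^2 | X(t)=x>, cf. Eq. (51)."""
    var = g * g / (2.0 * gamma) * (1.0 - math.exp(-2.0 * gamma * tau))
    bias = x * (math.exp(-gamma * tau) - 1.0)
    return (var + bias * bias) / tau

for tau in (0.5, 0.1, 0.01, 0.001):
    print(tau, round(D1_estimate(tau), 4), round(D2_estimate(tau), 4))
# As tau -> 0 the estimates converge to D1(x) = -gamma*x = -0.7 and D2 = g^2 = 0.25,
# in accordance with Eqs. (48), (49).
```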

Master Equation

The most general equation specifying a Markov process for a continuous state vector X(t) takes the form

∂/∂t p(x, t | x_0, t_0) = ∫ dx' w(x, x'; t) p(x', t | x_0, t_0) − ∫ dx' w(x', x; t) p(x, t | x_0, t_0) ,   (52)

where w denotes the transition probabilities.

Measurement Noise

We can now go a step ahead and include measurement noise. Due to measurement noise, the observed state vector, which we shall now denote by Y(t_i), is related to the stochastic variable X(t_i) by an additional noise term η(t_i):

Y(t_i) = X(t_i) + η(t_i) .   (53)

We assume that the stochastic variables η(t_i) have zero mean, are statistically independent, and obey the probability density h(η). Then the probability distribution for the variables Y(t_i) is given by

g(y_n, t_n; …; y_0, t_0) = ∫ … ∫ dη_n … dη_0 f(y_n − η_n, t_n; …; y_0 − η_0, t_0) h(η_n) … h(η_0) .   (54)

The short time propagator has recently been determined for the Ornstein–Uhlenbeck process, a process with linear drift term and constant diffusion. The analysis of data sets spoilt by measurement noise is currently under investigation [17,18].

Stochastic Time Series Analysis

The ultimate goal of nonlinear time series analysis applied to deterministic systems is to extract the underlying nonlinear dynamical system directly from measured time series in the form of a system of differential equations, cf. [4]. The role played by dynamic fluctuations has not been fully appreciated. Mostly, noise has been considered as a random variable additively superimposed on a trajectory generated by a deterministic dynamical system, i.e. as extrinsic or measurement noise. The problem of dynamical noise, i.e. fluctuations which interfere with the deterministic dynamical evolution, has not been addressed in full detail. The natural extension of nonlinear time series analysis to (continuous) Markov processes is the estimation of short time propagators from time series. During recent years, it has become evident that such an approach is


feasible. In fact, noise may help in the estimation of the deterministic ingredients of the dynamics. Due to dynamical noise the system explores a larger part of phase space, and thus measurements of signals yield considerably more information about the dynamics as compared to the purely deterministic case, where the trajectories quickly converge to attractors providing only limited information. The analysis of data sets of stochastic systems exhibiting Markov properties has to be performed along the following lines:

• Disentangling measurement and dynamical noise
• Evaluating Markovian properties
• Determination of short time propagators
• Reconstruction of data

Since the methods for disentangling measurement and dynamical noise are currently under intense investigation, see Subsect. "Measurement Noise", our focus is on the three remaining issues.

Evaluating Markovian Properties

In principle it is a difficult task to decide on Markovian properties by an inspection of experimental data. The main point is that Markovian properties usually are violated for small time increments τ, as has already been pointed out above and in [12]. There are at least two reasons for this fact. First, the dynamical noise sources become correlated at small time differences. If we consider Gaussian noise sources, one usually observes an exponential decay of correlations,

⟨Γ_i(t) Γ_j(t')⟩ = δ_ij e^{−|t−t'|/T_mar} / T_mar .   (55)

Markovian properties can only be expected to hold for time increments τ > T_mar. Second, measurement noise can spoil Markovian properties [19]. Thus, the estimation of the Markovian time scale T_mar is a necessary step for stochastic data analysis. Several methods have been proposed to test Markov properties.

Direct Evaluation of Markovian Properties

A direct way is to use the definition of a Markov process (35) and to consider the higher order conditional probability distributions,

p(x_3, t_3 | x_2, t_2; x_1, t_1) = f(x_3, t_3; x_2, t_2; x_1, t_1) / f(x_2, t_2; x_1, t_1) = p(x_3, t_3 | x_2, t_2) .   (56)

This procedure is feasible if large data sets are available. Due to the different conditioning, the two probabilities are typically based on different numbers of events. As an appropriate method to statistically establish the similarity of the distributions in (56), the Wilcoxon test has been proposed; for details see [20,21]. In principle, higher order conditional probability distributions should be considered in a similar way. However, the validity of relation (56) is a strong hint for Markovianity.

Evaluation of the Chapman–Kolmogorov Equation

An indirect way is to use the Chapman–Kolmogorov equation (36), whose validity is a necessary condition for Markovianity. The method is based on a comparison between the conditional pdf

p(x_k, t_k | x_i, t_i)   (57)

taken from experiment and the one calculated by the Chapman–Kolmogorov equation,

p̃(x_k, t_k | x_i, t_i) = ∫ dx_j p(x_k, t_k | x_j, t_j) p(x_j, t_j | x_i, t_i) ,   (58)

where t_j is an intermediate time, t_i < t_j < t_k. A refined method can be based on an iteration of the Chapman–Kolmogorov equation, i.e. considering several intermediate times. If the Chapman–Kolmogorov equation is not fulfilled, deviations are enhanced by each iteration.

Direct Estimation of Stochastic Forces

Probably the most direct way is the determination of the stochastic forces from data. If the drift vector field D^(1)(x, t) has been established, as discussed below, the fluctuating forces can be estimated according to

g(x(t), t) Γ(t) = dx(t)/dt − D^(1)(x(t), t) .   (59)

The correlations of this force can then be examined directly, see also [22] and Subsect. "Noisy Circuits".

Differentiating Between Stochastic Process and Noise Data

Looking at the joint statistics of increments extracted from given data, it could be shown that the nesting of increments and the resulting statistics can be used to differentiate between noise-like data sets and those resulting from stochastic time processes [24].
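For a chain with discrete states, the consistency check based on the Chapman–Kolmogorov equation reduces to matrix algebra. A minimal sketch (our illustration, with an arbitrary two-state transition matrix):

```python
# Sketch (our illustration): for a homogeneous discrete-state Markov chain,
# the n-step propagator is the n-th power of the one-step transition matrix,
# and the Chapman-Kolmogorov equation (36) becomes a matrix product identity.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[0.9, 0.1],
     [0.4, 0.6]]                       # one-step transition matrix, rows sum to 1

P2 = matmul(P, P)                      # two-step propagator
P4_direct = matmul(P2, P2)             # four steps in one go
P4_chained = matmul(matmul(P2, P), P)  # same interval split at intermediate times

# Chapman-Kolmogorov holds: both ways of composing propagators agree.
assert all(abs(P4_direct[i][j] - P4_chained[i][j]) < 1e-12
           for i in range(2) for j in range(2))
print("Chapman-Kolmogorov relation satisfied:", P4_direct)
```

For empirical, possibly non-Markovian data, the analogous comparison of Eq. (57) with Eq. (58) would generally fail, which is exactly what the test exploits.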


Estimating the Short Time Propagator

A crucial step in the stochastic analysis is the assessment of the short time propagator of the continuous Markov process. This gives access to the deterministic part of the dynamics as well as to the fluctuations.

Gaussian Noise

We shall first consider the case of Gaussian white noise. As we have already indicated, drift vector and diffusion matrix are defined as the first and second moments of the conditional probability distribution or short time propagator, Eqs. (50), (51). We shall now describe an operational approach which allows one to estimate drift vector and diffusion matrix from data and has been successfully applied to a variety of stochastic processes. We shall discuss the case where averages are taken with respect to an ensemble of experimental realizations of the stochastic process under consideration, in order to include nonstationary processes. Replacing the ensemble averages by time averages for statistically stationary processes is straightforward. For illustration we show in Fig. 2 the estimated functions of the drift terms obtained from the analysis of the data of Fig. 1. The procedure is as follows:

• The data is represented in a d-dimensional phase space.
• The phase space is partitioned into a set of finite, but small, d-dimensional volume elements.
• For each bin (denoted by α), located at the point x_α of the partition, we consider the quantity

x(t_j + τ) = x(t_j) + τ D^(1)(x(t_j), t_j) + √τ g(x(t_j), t_j) ξ(t_j) .   (60)

Thereby, the considered points x(t_j) are taken from the bin located at x_α. Since we consider time dependent processes, this has to be done for each time step t_j separately.
• Estimation of the drift vector: The drift vector assigned to the bin located at x_α is determined as the small-τ limit

D^(1)(x, t) = lim_{τ→0} (1/τ) M^(1)(x, t, τ)   (61)

of the conditional moment

M^(1)(x_α, t_j, τ) = (1/N_α) Σ_{x(t_j)∈α} [x(t_j + τ) − x(t_j)] .   (62)

The sum is over all N_α points contained in the bin α.

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 2 Estimated drift terms from the numerical and experimental data of Fig. 1. In part a the exact function of the numerical model is also shown as a solid curve; in part b the line D^(1) = 0 is shown to visualize the multiple fixed points; after [23]
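The bin-conditioned moment estimators of this procedure can be sketched on a simple synthetic example (our illustration; the process, parameter values and bin width are hypothetical choices, not taken from the article):

```python
import math
import random

# Sketch (our illustration) of the bin-conditioned moment estimators:
# a 1D Ornstein-Uhlenbeck path with known drift D1(x) = -x and diffusion
# D2 = g^2 = 0.25 is simulated, and both quantities are recovered from the
# increments x(t_j + tau) - x(t_j) conditioned on a small bin.
rng = random.Random(7)
gamma, g, tau = 1.0, 0.5, 1e-3

x, path = 0.0, []
for _ in range(1_000_000):
    x += tau * (-gamma * x) + math.sqrt(tau) * g * rng.gauss(0.0, 1.0)
    path.append(x)

def moments(bin_center, half_width=0.05):
    """Estimate D1 and D2 in one bin from conditional increment moments."""
    incs = [path[j + 1] - path[j] for j in range(len(path) - 1)
            if abs(path[j] - bin_center) < half_width]
    m1 = sum(incs) / len(incs)
    d1 = m1 / tau                                              # cf. Eqs. (61), (62)
    d2 = sum((dx - m1) ** 2 for dx in incs) / len(incs) / tau  # second conditional moment
    return d1, d2

for xc in (-0.3, 0.0, 0.3):
    d1, d2 = moments(xc)
    print(f"bin at {xc:+.1f}:  D1 ~ {d1:+.3f} (exact {-xc:+.3f}),  D2 ~ {d2:.3f} (exact 0.250)")
```

The drift estimate is noisier than the diffusion estimate, since the deterministic displacement per step is of order τ while the stochastic one is of order √τ; this is the practical side of the small-τ limit discussed in the text.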

Proof: The drift vector assigned to the bin α located at x_α is approximated by the conditional expectation value

M^(1)(x_α, t, τ) = (τ/N_α) Σ_{x(t_j)∈α} D^(1)(x_j, t_j) + (√τ/N_α) Σ_{x(t_j)∈α} g(x(t_j), t_j) ξ(t_j) .   (63)

Thereby, the sum is over all points x(t_j) located in the bin assigned to x_α. Assuming that D^(1)(x, t) and g(x, t) do not vary significantly over the bin, the second contribution drops out since

(1/N_α) Σ_{x(t_j)∈α} ξ(t_j) → 0 .   (64)

• Estimation of the diffusion matrix: The diffusion matrix can be estimated by the small-τ limit

D^(2)(x, t) = lim_{τ→0} (1/τ) M^(2)(x, t, τ)   (65)

of the conditional second moment

M^(2)(x_α, t, τ) = (1/N_α) Σ_j {[x(t_j + τ) − x(t_j)] − τ D^(1)(x_j, t_j)} {[x(t_j + τ) − x(t_j)] − τ D^(1)(x_j, t_j)}^T .   (66)

Proof: We consider the quantity

M^(2)(x_α, t, τ) = (τ/N_α) Σ_{x(t_k)∈α} Σ_{x(t_j)∈α} g(x(t_k), t_k) ξ(t_k) [g(x(t_j), t_j) ξ(t_j)]^T .   (67)

If the bin size is small compared to the scale on which the matrix g(x, t) varies significantly, we can replace g(x(t_k), t_k) by g(x_α, t_k), such that

M^(2)(x_α, t, τ) = τ g(x_α, t_k) [ (1/N_α) Σ_{x(t_j)∈α} Σ_{x(t_k)∈α} ξ(t_k) ξ(t_j)^T ] g^T(x_α, t_k) = τ g(x_α, t_k) g^T(x_α, t_k) .   (68)

Thereby, we have used the assumption of the statistical independence of the fluctuations,

(1/N_α) Σ ξ(t_k) ξ(t_j)^T = δ_kj E .   (69)

• Higher order cumulants: In a similar way one may estimate higher conditional moments M^(n), which in the small time limit converge to the so-called Kramers–Moyal coefficients. The estimation of these quantities allows one to answer the question whether the noise sources actually are Gaussian distributed random variables.

Technical Aspects

The above procedure of estimating drift vector and diffusion matrix explicitly shows the properties which limit the accuracy of the determined quantities. First of all, the bin size influences the results. The bin size should allow for a reasonable number of events such that the sums converge; however, it should be reasonably small in order to allow for an accurate resolution of drift vector and diffusion matrix. Second, the data should allow for the estimation of the conditional moments in the limit τ → 0 [25,26]. Here, a finite Markov–Einstein coherence length may cause problems. Furthermore, measurement noise can spoil the possibility of performing this limit. From the investigation of the Fokker–Planck equation much is known about the τ-dependence of the conditional moments. This may be used for further improved estimations, as has been discussed in [27]. Furthermore, as we shall discuss below, extended estimation procedures have been devised which overcome the problems related to the small-τ limit.

Lévy Processes

A procedure to analyze Lévy processes along the same lines has been proposed in [28]. An important point here is the determination of the Lévy parameter α.

Selfconsistency

After determining the drift vector and the characteristics of the noise sources from data, it is straightforward to generate synthetic data sets by iterating the corresponding stochastic evolution equations. Subsequently, their statistical properties can be compared with the properties of the real-world data. This yields a self-consistent check of the obtained results.

Estimation of Drift and Diffusion from Sparsely Sampled Time Series

As we have discussed, the results from an analysis of data sets can be reconsidered self-consistently. This fact can be used to extend the procedure to data sets with an insufficient amount of data, or to sparsely sampled time series, for which the estimation of the conditional moments M^(i)(x, t, τ) and the subsequent limiting procedure τ → 0 cannot be performed accurately. In this case, one may proceed as follows. In a first step one obtains a zeroth order approximation of drift vector D^(1)(x) and diffusion matrix D^(2)(x). Based on this estimate one performs, in a second step, a suitable ansatz for the drift vector and the diffusion matrix containing a set of free parameters σ,

D^(1)(x; σ) , D^(2)(x; σ) ,   (70)


defining a class of Langevin equations. Each Langevin equation defines a joint probability distribution

f(x_n, t_n; …; x_1, t_1; σ) .   (71)

This joint probability distribution can be compared with the experimental one, f(x_n, t_n; …; x_1, t_1; exp). The best representative of the class of Langevin equations for the reconstruction of the experimental data is then obtained by minimizing a suitably defined distance between the two distributions:

Dist{ f(x_n, t_n; …; x_1, t_1; σ) , f(x_n, t_n; …; x_1, t_1; exp) } = Min .   (72)

A reasonable choice is the so-called Kullback–Leibler distance between two distributions, defined as

K = ∫ dx f(x_n, t_n; …; x_1, t_1; exp) ln [ f(x_n, t_n; …; x_1, t_1; exp) / f(x_n, t_n; …; x_1, t_1; σ) ] .   (73)

Recently, it has been shown how the iteration procedure can be obtained from maximum likelihood arguments. For more details, we refer the reader to [29,30]. A technical question concerns the determination of the minimum. In [31] the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm for constrained problems has been used for the solution of the optimization problem.

Applications: Processes in Time

The method outlined in the previous section has been used for revealing nonlinear deterministic behavior in a variety of problems ranging from physics, meteorology and biology to medicine. In most of these cases, alternative procedures with strong emphasis on deterministic features have only been partly successful due to their inappropriate treatment of dynamical fluctuations. The following list (with some exemplary citations) gives an overview of the investigated phenomena, which range from technical applications over many particle systems to biological and geophysical systems.

• Chatter in cutting processes [32,33]
• Identification of bifurcations towards drifting solitary structures in a gas-discharge system [34,35,36,37,38]
• Electric circuits [17,39,40]
• Wind energy converters [41,42,43,44]
• Traffic flow [45]
• Inverse grading of granular flows [46]
• Heart rhythms [47,48,49,50]
• Tremor data [39]

• Epileptic brain dynamics [51]
• Meteorological data like El Niño [52,53,54]
• Earthquake prediction [15]

The main advantage of the stochastic data analysis method is its independence of modeling assumptions. It is purely data driven and based on the mathematical features of Markov processes. As mentioned above, these properties can be verified and validated self-consistently. Before we proceed to consider some exemplary applications, we would like to add the following comment. The described analysis method cleans data from dynamical and measurement noise and provides the drift vector field, i.e. one obtains the underlying deterministic dynamical system. In turn, this system can be analyzed by the methods of nonlinear time series analysis: one can determine proper embedding, Lyapunov exponents, dimensions, fixed points, stable and unstable limit cycles, etc. [55]. We want to point out that the determination of these quantities from uncleaned data usually is flawed by the presence of dynamical noise.

Synthetic Data: Potential Systems, Limit Cycles, Chaos

The above method of disentangling drift and dynamical noise has been tested on synthetically generated data. The investigated systems include the pitchfork bifurcation, the Van der Pol oscillator as an example of a nonlinear oscillator, as well as the noisy Lorenz equations as an example of a system exhibiting chaos [56,57]. Furthermore, it has been shown how one can analyze processes which additionally contain a time periodic forcing [58]. This is of high interest for analyzing systems exhibiting the phenomenon of stochastic resonance. Quite recently, stochastic systems with time delay have been considered [59,60]. The results of these investigations may be summarized as follows: provided there is enough data and the data is well sampled, it is possible to extract the underlying deterministic dynamics and the strength of the fluctuations accurately. Figure 3 summarizes what can be achieved for the example of the noisy Lorenz model. For a detailed discussion we refer the reader to [57].

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 3 Time series of the stochastic Lorenz equation, from top to bottom: a original time series, b deterministic time series, c time series obtained from an integration of the reconstructed vector field, d reconstructed time series including noise. For details cf. [57]

Noisy Circuits

Next, we present the application of the method to data sets from experimental systems. As a first example, a chaotic electric circuit has been chosen. Its dynamics is formed by a damped oscillator with nonlinear energy support and additional dynamic noise terms. In this case, well defined electric quantities are measured for which the dynamic equations are known. The measured time series are analyzed according to the numerical algorithm described above. Afterwards, the numerically determined results and the expected results according to the system's equations are compared. The dynamic equations of the electric circuit are given by the following equations, where the deterministic part is known as the Shinriki oscillator [61]:

Ẋ₁ = [1/(R_NIC C₁) − 1/(R₁ C₁)] X₁ − (1/C₁) f(X₁ − X₂) + [1/(R_NIC C₁)] Γ(t)   (74)
   = g₁(X₁, X₂) + h₁ Γ(t) ,   (75)
Ẋ₂ = (1/C₂) f(X₁ − X₂) − (1/(R₃ C₂)) X₃ = g₂(X₁, X₂, X₃) ,   (76)
Ẋ₃ = (1/L) (X₂ − X₃) = g₃(X₂, X₃) .   (77)

X₁, X₂ and X₃ denote voltage terms, R_i are values of resistors, L and C stand for inductivity and capacity values. The function f(X₁ − X₂) denotes the characteristic of the nonlinear element. The quantities X_i, characterizing the stochastic variables of the Shinriki oscillator with dynamical noise, were measured by means of a 12 bit A/D converter. Our analysis is based on the measurement of 100,000 data points [39]. The attractor of the noise free dynamics is shown in Fig. 4.

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 4 Trajectory of the Shinriki oscillator in phase space without noise; after [40]

The measured three-dimensional time series were analyzed as outlined above. The determined deterministic dynamics – expressed by the deterministic part of the evolution equations – corresponds to a vector field in the three-dimensional state space. For presentation of the results, cuts of lower dimension have been generated. Part a of Fig. 5 illustrates the vector field (g₁(x₁, x₂), g₂(x₁, x₂, x₃ = 0)) of the reconstructed deterministic parts affiliated with (75), (76). Furthermore, the one-dimensional curve g₁(x₁, x₂ = 0) is drawn in part b. In addition to the numerically determined results found by data analysis, the expected vector field and curve (75), (76) are shown for comparison. A good agreement can be recognized.


Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 5 Cuts of the function D^(1)(x) reconstructed from experimental data of the electric circuit, in comparison with the expected functions according to the known differential equations (75), (76). In part a the cut (g₁(X₁, X₂), g₂(X₁, X₂, X₃ = 0)) is shown as a two-dimensional vector field. Thick arrows represent values determined by data analysis, thin arrows the theoretically expected values. In areas of the state space where the trajectory did not show up during the measurement, no estimated values for the functions are obtained. Part b shows the one-dimensional cut g₁(X₁, X₂ = 0). Crosses represent values estimated numerically by data analysis; the affiliated theoretical curve is printed as well; after [39]

Based on the reconstructed deterministic functions it is possible to reconstruct also the noisy part from the data set, see Subsect. "Direct Estimation of Stochastic Forces". This has been performed for the three-dimensionally embedded data, as well as for the case of two-dimensional embedding. From these reconstructed noise data, the autocorrelation was estimated. As shown in Fig. 6, correlated noise is obtained for wrong embedding, indicating the violation of Markovian properties. In fact, such an approach can be used to verify the correct embedding of nonlinear noisy dynamical systems. We emphasize that, provided sufficient data is available, this check of correct embedding can also be performed locally in phase space, to find out where crossing of trajectories of the corresponding deterministic system takes place. This procedure can be utilized to find the correct local embedding of data.

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 6 Autocorrelation function of the reconstructed dynamical noise: a correct three-dimensional embedding, showing δ-correlated noise; b the dynamics projected into the two-dimensional phase space X₁(t), X₃(t), showing finite time correlations; after [17]

The electronic circuit of the Shinriki oscillator has also been investigated with two further perturbations. In [17] the reconstruction of the deterministic dynamics in the presence of additional measurement noise has been addressed. In [40] the Langevin noise has been replaced by a high frequency periodic source, as shown in Fig. 7. Even for this case, reasonable (correct) estimations of the deterministic part can be achieved.

Many Particle Physics – Traffic Flow

Far from equilibrium, interacting many particle systems exhibit collective macroscopic phenomena like pattern formation, which can be described by the concept of order


Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 7 a Trajectory for the Shinriki oscillator in the phase space with a sinusoidal force. b The corresponding trajectory in the phase space, compare Fig. 5b; after [40]

parameters. In the following we shall exemplify, for the case of traffic flow, how complex behavior can be analyzed by means of the proposed method, leading to stochastic equations for the macroscopic description of the system. Traffic flow certainly is a collective phenomenon for which a huge amount of data is available. Measured quantities are the velocity v and the current q = ρv of cars passing a fixed line on the highway, where ρ denotes the vehicle density. Theoretical models of traffic flow are based on the so-called fundamental diagram, a type of material law for traffic flow relating current and velocity of the traffic flow,

q = Q(v) .  (78)

The special form of this relation has been much debated in recent years.

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 8 Deterministic dynamics of the two-dimensional traffic states (q, v) given by the drift vector for a the right lane of the freeway, with vans, and b all three lanes. Bold dots indicate stable fixed points and open dots saddle points, respectively; after [45]

It is tempting to describe the dynamics by the following set of stochastic difference equations

v_{N+1} = G(v_N, q_N) + ξ_N ,   q_{N+1} = F(v_N, q_N) + η_N .  (79)

Here, v_{N+1}, q_{N+1} are velocity and current of the (N+1)th car traversing the line. ξ_N and η_N are noise terms with zero mean, which may depend on the variables v and q. The drift vector field D^(1) = [D^(1)_1(v, q), D^(1)_2(v, q)] has been determined in [45]. We point out that the meaning of the drift vector field does not depend on the assumption of ideal noise sources, as has been discussed above in Subsect. “Estimating the Short Time Propagator” and Subsect. “Noisy Circuits”. The obtained drift vector field is depicted in Fig. 8.
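A bin-wise estimate of such a drift vector field can be sketched as follows; the linear drift matrix and the noise amplitude below are illustrative assumptions, standing in for the measured (v, q) series:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic two-dimensional stochastic dynamics; the drift matrix A and the
# noise amplitude are illustrative stand-ins for the measured (v, q) data
n = 100_000
A = np.array([[-0.10, 0.02],
              [0.05, -0.20]])
z = np.zeros((n, 2))
for i in range(n - 1):
    z[i + 1] = z[i] + A @ z[i] + 0.1 * rng.standard_normal(2)
dz = np.diff(z, axis=0)

# bin-wise conditional average of the increments: the drift vector field
nb = 10
edges = [np.linspace(z[:, k].min(), z[:, k].max(), nb + 1) for k in (0, 1)]
ix = np.clip(np.digitize(z[:-1, 0], edges[0]) - 1, 0, nb - 1)
iy = np.clip(np.digitize(z[:-1, 1], edges[1]) - 1, 0, nb - 1)
drift = np.zeros((nb, nb, 2))
count = np.zeros((nb, nb), dtype=int)
for a, b, d in zip(ix, iy, dz):
    drift[a, b] += d
    count[a, b] += 1
mask = count > 100                      # keep only well-visited regions
drift[mask] /= count[mask, None]

# consistency check: a global least-squares fit recovers the drift matrix
A_est = np.linalg.lstsq(z[:-1], dz, rcond=None)[0].T
print("visited bins:", mask.sum())
print(np.round(A_est, 3))
```

As in Fig. 8, estimates exist only in those regions of the state space that the trajectory visited sufficiently often.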


The phase space exhibits interesting behavior. For the traffic data involving all three lanes there are three fixed points: two sinks separated by a saddle point. Furthermore, the arrows representing the drift vector field indicate the existence of an invariant manifold

q = Q(v) ,  (80)

which has to be interpreted as the above-mentioned fundamental diagram of traffic flow. It is interesting to see the differences caused by separating the traffic dynamics into that of cars and that of vans. This has been achieved approximately by considering different lanes of the highway. In Figs. 8a and 9a the dynamics of the right lane, dominated by vans, is shown. It can clearly be seen that up to a speed of about 80 km/h a metastable plateau is present, corresponding to a quasi-interaction-free dynamics of vans. Up to now the information contained in the noise terms has not led to clear new insights. For the discussion of further examples we refer the reader to the literature. In particular, we want to mention the analysis of the segregational dynamics of single particles with different sizes in avalanches [46], which can be treated along similar lines.

Applications: Processes in Scale

In the following we shall consider complex behavior in scale, as discussed in the introduction, by the method of stochastic processes. In order to describe scale complexity statistically in a comprehensive way, one has to study joint probability distributions

f(q_N, l_N; q_{N−1}, l_{N−1}; … ; q_1, l_1; q_0, l_0)  (81)

of the local measure q at multiple scales l_i. To grasp relations across several scales (like those given by “coherent” structures), all q_i are taken at common locations x. In the case of statistical independence of q at the different scales l_i, this joint probability density factorizes:

f(q_N, l_N; q_{N−1}, l_{N−1}; … ; q_0, l_0) = f(q_N, l_N) ⋯ f(q_0, l_0) .  (82)
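Whether a factorization like Eq. (82) holds can be checked directly from data. A minimal sketch, using a Brownian path as a synthetic signal and two assumed scales, estimates the increments at common locations and tests for statistical dependence:

```python
import numpy as np

rng = np.random.default_rng(2)

# synthetic "spatial" signal: a Brownian path as the simplest test case; the
# two scales l1, l2 are illustrative assumptions
n = 2**20
u = np.cumsum(rng.standard_normal(n))

def increments(l):
    return u[l:] - u[:-l]                 # q(l, x) = u(x + l) - u(x)

l1, l2 = 16, 64
q1 = increments(l1)[: n - l2]             # increments taken at common locations x
q2 = increments(l2)[: n - l2]

# factorization f(q1, l1; q2, l2) = f(q1, l1) f(q2, l2) would imply corr = 0
c = np.corrcoef(q1, q2)[0, 1]
print("corr(q(l1), q(l2)) =", round(c, 3))

# conditional mean of q1 given q2: flat under independence, tilted otherwise
cond = []
for lo, hi in [(-np.inf, -8.0), (8.0, np.inf)]:
    sel = (q2 > lo) & (q2 < hi)
    cond.append(q1[sel].mean())
    print("mean of q1 for q2 in", (lo, hi), "->", round(cond[-1], 2))
```

For nested Brownian increments one finds corr = sqrt(l1/l2) = 0.5, so even this simplest signal does not factorize across scales; the joint (or conditional) statistics of Eq. (81) are genuinely needed.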

In this case multifractal scaling (6) is a sufficient characterization of systems with scale complexity, provided the scaling property holds. If there is no factorization, one can use the joint probability distribution Eq. (81) to define a stochastic process in scale. Thus, one can identify the scale l with the time t and try to obtain a representation of the spatial disorder in the form of a stochastic process similar to a stochastic process in time (see Subsect. “Statistical Description of Stochastic Processes” and Subsect. “Finite Time Propagators”). For such problems our method has been used as an alternative description of multifractal behavior. The present method has the advantage of relating the random variables across different scales by introducing a conditional probability distribution or, in fact, a two-scale probability distribution. Scaling properties are not a prerequisite. Provided that the Markov property holds, which can be tested experimentally, a complete statistical characterization of the process across scales is obtained. The method has been used to characterize the complexity of data sets for the following systems (with some exemplary citations):

• Turbulent flows [20,62,63,64]
• Passive scalars in turbulent flows [21]
• Financial data [65,66,67,68]
• Surface roughness [69,70,71,72,73,74]
• Earthquakes [15]
• Cosmic background radiation [75]

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 9 Corresponding potentials for the deterministic dynamics given by the drift coefficients of a one-dimensional projection of the results in Fig. 8 [45]

The stochastic analysis of scale-dependent complex systems aims to achieve an n-scale characterization, from which the question arose whether it is possible to derive from these stochastic processes methods for generating synthetic time series. Based on the knowledge of the n-scale statistics, a way to estimate the n-point statistics has been proposed, which makes it possible to generate synthetic time series with the same n-scale statistics [76]. It is even possible to extend given data sets, an interesting subject for forecasting. Other approaches have been proposed in [70,77].

Turbulence and Financial Market Data

In the following we present results from investigations of fully developed turbulence and from data of the financial market. The complexity of turbulent fields still has not been understood in detail. Although the basic equations, namely the Navier-Stokes equations, have been known for more than 150 years, a general solution of these equations for high Reynolds numbers, i.e. for turbulence, is not known. Even with the use of powerful computers no rigorous solutions can be obtained. Thus for a long time there has been the challenge to understand at least the complexity of an idealized turbulent situation, which is taken to be isotropic and homogeneous. The main problem is to formulate statistical laws based on the treatment of the deterministic evolution laws of fluid dynamics. A first approach is due to Kolmogorov [78], who formulated a phenomenological theory characterizing properties of turbulent fluid motions by statistical quantities. The central observable of Kolmogorov’s theory is the so-called longitudinal velocity increment (which we label here with q) of a turbulent velocity field u(x, t), defined according to

q_x(l, t) = (l/|l|) · [u(x + l, t) − u(x, t)] .  (83)

A statistical description is given in terms of the probability distribution

f(q, l, t, x) = ⟨δ(q − q_x(l, t))⟩ .  (84)

For stationary, homogeneous and isotropic turbulence this probability distribution is independent of the reference point x and the time t and, due to isotropy, depends only on the scale l = |l|. As a consequence, the central statistical quantity is the probability distribution f(q, l). Turbulent fields have been considered from the viewpoint of self-similarity, addressing the scaling behavior of the probability distribution f(q, l) and of its nth-order moments, the so-called structure functions ⟨q^n⟩. Multifractal scaling properties of the velocity increments, mentioned already in Eq. (7), are identical to the well-known intermittency problem¹ of turbulence, which manifests itself in the occurrence of heavy-tailed statistics, i.e. an unexpectedly high probability of extreme events. Kolmogorov and Oboukhov proposed the so-called intermittency correction [79]

⟨q(l, x)^n⟩ ∝ l^{ζ_n}  with  ζ_n = n/3 − μ n(n − 3)/18  (85)

with 0.25 < μ < 0.5 (for further details see [80]).

¹ Here it should be noted that the term “intermittency” is used frequently in physics for different phenomena and may cause confusion. This turbulent intermittency is not equal to the intermittency of chaos. There are also different intermittency phenomena introduced for turbulence: the intermittency due to the nonlinear scaling, the intermittency of switches between turbulent and laminar flow for non-locally-isotropic fully developed turbulent flows, and the intermittency due to the statistics of small-scale turbulence, which we discuss here as heavy-tailed statistics.

The form of ζ_n has been heavily debated during the last decades. For isotropic turbulence the central issue is to reveal the mechanism which leads to this anomalous statistics (see [80,81]). A completely different point of view on the properties of turbulent fields is gained by interpreting the probability distribution as the distribution of a random process in the scale l [62,63]. It is tempting to hypothesize the existence of a stochastic process, see Subsect. “Stochastic Data Analysis”,

q(l + dl) = q(l) + N(q, l) dl + g(q, l) dW(l) ,  (86)

where dW(l) is an increment of a random process. This type of stochastic equation indicates how the velocity increment of a snapshot of the turbulent field changes as a function of the scale l. In this respect, the process q(l) can be considered to be a stochastic cascade process in the “time” l. This concept of complexity in scale can be carried over to other systems, like the roughness of surfaces or financial data. In the latter case the scale variable l is replaced by the time distance or time scale τ.

Anomalous Statistics

A direct consequence of multifractal scaling, related to the nonlinear behavior of the scaling exponents ζ_n, is the fact


Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 10 Comparison of the numerical solution of the Fokker–Planck equation (solid lines) for the pdfs f(q(x), l) with the pdfs obtained directly from the experimental data (bold symbols). The scales l are (from top to bottom): l = L0, 0.6 L0, 0.35 L0, 0.2 L0 and 0.1 L0. The distribution at the largest scale L0 was parametrized (dashed line) and used as initial condition for the Fokker–Planck equation (L0 is the correlation length of the turbulent velocity signal). The pdfs have been shifted in the vertical direction for clarity of presentation, and all pdfs have been normalized to their standard deviations; after [20]

that the shape of the probability distribution f(q, l) has to change as a function of scale. A self-similar behavior of the form (4) would lead to fractal scaling behavior, as outlined in Eq. (5). Using experimental data from a turbulent flow, this change of the shape of the pdf becomes obvious. In Fig. 10 we present f(q, l) for a data set measured on the center line of a free jet with a Reynolds number of 2.7 × 10^4, see [20]. Note that for large scales (l ≈ L0) the distributions become nearly Gaussian. As the scale is decreased, the probability densities become more and more heavy tailed. Quite astonishingly, the anomalous statistical features of data from the financial market are similar to the just discussed intermittency of turbulence [82]. The following analysis is based on a data set Y(t) which consists of 1 472 241 quotes for US dollar-German Mark exchange rates from the years 1992 and 1993. Many of the features we will discuss here are also found in other financial data, like for instance quotes of stocks, see [83]. A central issue is the understanding of the statistics of price changes over a certain time interval τ, which determines losses and gains. The changes of a time series of quotations Y(t) are commonly measured by returns r(τ, t) := Y(t + τ)/Y(t), by logarithmic returns, or by increments q(τ, t) := Y(t + τ) − Y(t) [84]. The moments of these quantities often exhibit power-law behavior similar to the just discussed Kolmogorov scaling for turbulence, cf. [85,86,87]. For the probability distributions one additionally observes an increasing tendency towards heavy-tailed probability distributions for small τ (see Fig. 11). This represents the high-frequency dynamics of the financial market. The identification of the underlying process leading to these heavy-tailed probability density functions of price changes is a prominent puzzle (see [86,87,88,89,90,91]), as it is for turbulence.

Stochastic Cascade Process for Scale

The occurrence of the heavy-tailed probability distributions on small scales will be discussed as a consequence of a stochastic process evolving in scale, using the above-mentioned methods. Guided by the finding that the statistics change with scale, as shown in Figs. 10 and 11, we consider the evolution of the quantity q(l, x), or q(τ, t), with the scale variable l or τ, respectively. For a single fixed scale l we get the scale-dependent disorder by the statistics of q(l, x). The complete stochastic information of the disorder on all length scales is given by the joint probability density function

f(q_1, … , q_n) ,  (87)

where we set q_i = q(l_i, x). Without loss of generality we take l_i < l_{i+1}. This joint probability may be seen in analogy to the joint probabilities of statistical mechanics (thermodynamics), describing in the most general way the occupation probabilities of the microstates of n particles, where q is a six-dimensional phase-state vector (space and momentum). Next, the question is whether it is possible to simplify the joint probability by conditional probabilities:

f(q_1, … , q_n) = p(q_1 | q_2, … , q_n) · p(q_2 | q_3, … , q_n) ⋯ p(q_{n−1} | q_n) f(q_n) ,  (88)

where the multiply conditioned probabilities are given by

p(q_i | q_{i+1}, … , q_n) = p(q_i | q_{i+1}) .  (89)


Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 11 Probability densities (pdfs) f(q(t), τ) of the price changes q(τ, t) = Y(t + τ) − Y(t) for the time delays τ = 5120, 10240, 20480, 40960 s (from bottom to top). Symbols: results obtained from the analysis of the middle prices of bid-ask quotes for the US dollar-German Mark exchange rates from October 1st, 1992 until September 30th, 1993. Full lines: results of a numerical iteration of the Fokker–Planck equation (95); the probability distribution for τ = 40960 s (dashed line) was taken as the initial condition. The pdfs have been shifted in the vertical direction for clarity of presentation, and all pdfs have been normalized to their standard deviations; after [66]

Eq. (89) is nothing other than the condition for a Markov process evolving from state q_{i+1} to state q_i, i.e. from scale l_{i+1} to l_i, as it has been introduced above, see Eq. (35). A “single particle approximation” would correspond to the representation

f(q_1, … , q_n) = f(q_1) f(q_2) ⋯ f(q_n) .  (90)

According to Eqs. (88) and (89), Eq. (90) holds if p(q_i | q_{i+1}) = f(q_i). Only in this case is the knowledge of f(q_i) sufficient to characterize the complexity of the whole system. Otherwise an analysis of the single-scale probability distribution f(q_i) is an incomplete description of the complexity of the whole system. This is the deficiency of the approach characterizing complex structures by means of fractality or multiaffinity (cf. for multiaffinity [92], for turbulence [80,81], for the financial market [85]). The scaling analysis of moments as indicated for turbulence in Eq. (85) provides a complete knowledge of any joint n-scale probability density only if Eq. (90) is valid. These remarks underline the necessity to investigate these conditional probabilities, which can be done in a straightforward manner from given experimental or numerical data. For the case of turbulence as well as for financial data we see that p(q_i | q_{i+1}) does not coincide with f(q_i), as shown for turbulence data in Fig. 12. If p(q_i | q_{i+1}) = f(q_i), no dependency on q_{i+1} would be detected.

Fluctuations, Importance of: Complexity in the View of Stochastic Processes, Figure 12 Comparison of the numerical solution of the Fokker–Planck equation for the conditional pdf p(q, l | q0, l0), denoted in this figure as p(v, r | v0, r0), with the experimental data. a: Contour plots of p(v, r | v0, r0) for r0 = L and r = 0.6 L, where L denotes the integral length. Dashed lines: numerical solution of (94), solid lines: experimental data. b and c: Cuts through p(v, r | v0, r0) for v0 = +1 and v0 = −1, respectively. Open symbols: experimental data, solid lines: numerical solution of the Fokker–Planck equation; after [20]


The next point of interest is whether the Markov properties are fulfilled. Therefore, doubly conditioned probabilities were extracted from the data and compared to the singly conditioned ones. For financial data as well as for turbulence we found evidence that the Markov property is fulfilled if the step size is larger than a Markov-Einstein length [20,66]. For turbulence it could be shown that the Markov-Einstein length coincides with the Taylor length marking the small-scale end of the inertial range [14]. (An extensive discussion of the analysis of financial and turbulent data can be found in [20,65,66,93].) Based on the fact that the multiply conditioned probabilities are equal to the singly conditioned probabilities, and taking this as a verification of the Markov properties, we can proceed according to Subsect. “Estimating the Short Time Propagator” and estimate from given data sets the stochastic equations underlying the cascade process. The evolution of the conditional probability density p(q, l | q0, l0), starting from a selected scale l0, follows

−l ∂/∂l p(q, l | q0, l0) = Σ_{k=1}^{∞} (−∂/∂q)^k [ (1/k!) D^(k)(q, l) p(q, l | q0, l0) ] .  (91)
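The comparison of singly and doubly conditioned probabilities can be illustrated with synthetic data; here we use nested increments of a Brownian path, for which the Markov property in scale holds exactly (the path and the chosen scales are illustrative assumptions, not the cited experimental data):

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic signal (a Brownian path) and three nested scales -- illustrative
# assumptions standing in for measured turbulence or exchange-rate data
n = 2**21
u = np.cumsum(rng.standard_normal(n))

def inc(l):
    """Increments q(l, x) = u(x + l) - u(x) at common reference points x."""
    return u[l:n - 64 + l] - u[:n - 64]

q1, q2, q3 = inc(8), inc(16), inc(64)        # scales l1 < l2 < l3

# compare singly and doubly conditioned means of q1; their equality indicates
# p(q1 | q2, q3) = p(q1 | q2), i.e. the Markov property in scale (Eq. (89))
sel2 = np.abs(q2 - 4.0) < 1.0                # condition on q2 near 4
single = q1[sel2].mean()
means = []
for b in (-8.0, 8.0):                        # additionally condition on q3 near b
    sel3 = sel2 & (np.abs(q3 - b) < 2.0)
    means.append(q1[sel3].mean())
    print("q3 ~", b, "->", round(means[-1], 2), "  singly conditioned:", round(single, 2))
```

For a non-Markovian cascade, the doubly conditioned mean would shift systematically with the value of q3.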

(The minus sign on the left-hand side is introduced because we consider processes running towards smaller scales l; furthermore, we multiply the stochastic equation by l, which leads to a new parametrization of the cascade by the variable ln(1/l), a simplification for a process whose moments obey scaling laws.) This equation is known as the Kramers-Moyal expansion [8]. As outlined in Subsect. “Finite Time Propagators” and Subsect. “Estimating the Short Time Propagator”, the Kramers-Moyal coefficients D^(k)(q, l) are now defined as the limit Δl → 0 of the conditional moments M^(k)(q, l, Δl):

D^(k)(q, l) = lim_{Δl→0} M^(k)(q, l, Δl) ,  (92)

M^(k)(q, l, Δl) := (l/Δl) ∫_{−∞}^{+∞} (q̃ − q)^k p(q̃, l − Δl | q, l) dq̃ .  (93)

Thus, for the estimation of the D^(k) coefficients it is only necessary to estimate the conditional probabilities p(q̃, l − Δl | q, l). For a general stochastic process, all Kramers-Moyal coefficients are different from zero. According to Pawula's theorem, however, the Kramers-Moyal expansion stops after the second term, provided that the fourth-order coefficient D^(4)(q, l) vanishes. In that case the Kramers-Moyal expansion reduces to a Fokker-Planck equation:

−l ∂/∂l p(q, l | q0, l0) = [ −∂/∂q D^(1)(q, l) + (1/2) ∂²/∂q² D^(2)(q, l) ] p(q, l | q0, l0) .  (94)

D^(1) is denoted as the drift term, D^(2) as the diffusion term, now for the cascade process. The probability density function f(q, l) has to obey the same equation:

−l ∂/∂l f(q, l) = [ −∂/∂q D^(1)(q, l) + (1/2) ∂²/∂q² D^(2)(q, l) ] f(q, l) .  (95)
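The estimation of D^(1) and D^(2) via the conditional moments can be sketched numerically. The following sketch (all coefficients are illustrative assumptions, not values estimated from the cited experiments) simulates a cascade Langevin process in the logarithmic scale variable and recovers drift and diffusion by bin-wise conditional averaging:

```python
import numpy as np

rng = np.random.default_rng(4)

# assumed "true" cascade coefficients (illustrative values): drift
# D1(q) = -q and quadratic diffusion D2(q) = a + b q^2 with a = b = 0.5
def D1(q): return -q
def D2(q): return 0.5 + 0.5 * q**2

ds, steps, ntraj = 0.001, 1500, 2000      # s = ln(l0/l) plays the role of time
q = 0.6 * rng.standard_normal(ntraj)
Q, dQ = [], []
for _ in range(steps):
    step = D1(q) * ds + np.sqrt(D2(q) * ds) * rng.standard_normal(ntraj)
    Q.append(q.copy()); dQ.append(step)
    q = q + step
Q = np.concatenate(Q); dQ = np.concatenate(dQ)

# conditional moments, cf. Eq. (93): M(k) ~ <(dq)^k | q> / ds for small ds
bins = np.linspace(-1.5, 1.5, 31)
centers = 0.5 * (bins[1:] + bins[:-1])
idx = np.digitize(Q, bins) - 1
d1 = np.array([dQ[idx == i].mean() / ds for i in range(30)])
d2 = np.array([(dQ[idx == i] ** 2).mean() / ds for i in range(30)])

slope = np.polyfit(centers, d1, 1)[0]         # should recover the D1 slope -1
b_fit, _, a_fit = np.polyfit(centers, d2, 2)  # should recover D2 = a + b q^2
print("D1 slope:", round(slope, 2),
      " D2 fit: a =", round(a_fit, 2), "b =", round(b_fit, 2))
```

The quadratic shape of the recovered diffusion coefficient is the fingerprint of multiplicative noise discussed below.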

The novel point of our analysis here is that, knowing the evolution equation (91), the n-increment statistics f(q_1, … , q_n) can be retrieved as well. Definitely, information like the scaling behavior of the moments of q(l, x) can also be extracted from the knowledge of the process equations. Multiplying (91) by q^n and successively integrating over q, an equation for the moments is obtained:

−l ∂/∂l ⟨q^n⟩ = Σ_{k=1}^{n} [ n! / (k!(n − k)!) ] ⟨ D^(k)(q, l) q^{n−k} ⟩ .  (96)

Scaling, i.e. multiaffinity as described in Eq. (7), is obtained if D^(k)(q, l) ∝ q^k, see [71,94]. We summarize: by the described procedure we were able to reconstruct stochastic processes in scale directly from given data. Knowing these processes, one can perform numerical solutions in order to obtain a self-consistent check of the procedure (see [20,66]). In Figs. 10, 11 and 12 the numerical solutions are shown by solid (dashed) curves. The heavy-tailed structure of the single-scale probabilities as well as the conditional probabilities is well described by this approach based on a Fokker-Planck equation. Further improvements can be achieved by the optimization procedures mentioned in Subsect. “Estimation of Drift and Diffusion from Sparsely Sampled Time Series”.

New Insights

The fact that the complexity of financial market data as well as turbulent data can be expressed by a Markovian process in scale has the consequence that the conditional probabilities involving only two different scales are sufficient for the general n-scale statistics. This indicates that three- or four-point correlations are sufficient for the formulation of the n-point statistics. The finding of a finite length scale above which the Markov properties are fulfilled [14] has led to a new interpretation of the Taylor length for turbulence, which so far had no specific physical meaning. For financial as well as for turbulent data it has been found that the diffusion term is quadratic in the state space of the scale-resolved variable. With respect to the corresponding Langevin equation, the multiplicative nature of the noise term becomes evident, which causes heavy-tailed probability densities and multifractal scaling. The scale dependency of the drift and diffusion terms corresponds to a non-stationary process in the scale variables τ and l, respectively. From this we conclude that Lévy statistics for one fixed scale, i.e. for the statistics of q(l, x) for fixed l, cannot be an adequate class for the statistical description of financial and turbulent data. Comparing the maxima of the distributions at small scales in Figs. 10 and 11, one finds a less sharp tip for the turbulence data. This finding is in accordance with a comparably larger additive contribution in the diffusion term D^(2) = a + b q² for the turbulence data. Knowing that D^(2) has an additive term and a quadratic q dependence, it is clear that for small q values, i.e. for the tip of the distribution, the additive term dominates. Taking this result in combination with the Langevin equation, we see that for small q values Gaussian noise is present, which leads to a Gaussian tip of the probability distribution, as found in Fig. 10. A further consequence of the additive term in D^(2) is that the structure functions as given by Eq. (96) are not independent, and a general scaling solution does not exist.
This fact has been confirmed by optimizing the coefficients [31]. This nonscaling behavior of turbulence seems to be present also at higher Reynolds numbers. It has been found that the additive term becomes smaller but still remains relevant [96]. The cascade processes encoded in the functions D^(1) and D^(2) seem to depend on the Reynolds number, which might indicate that turbulence is less universal than commonly thought. As has been outlined above, it is straightforward to extend the analysis to higher dimensions. For the case of longitudinal and transversal increments it has been found that the cascade processes for these two different directions of the complex velocity field run with different speeds. The factor is 3/2 [64,96] and again is not in accordance with the proposed multifractal scaling property of turbulence.
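The connection between the quadratic diffusion term and heavy tails can be illustrated numerically; the values of a and b below are illustrative assumptions, not the coefficients estimated from the turbulence or exchange-rate data:

```python
import numpy as np

rng = np.random.default_rng(5)

# Langevin cascade in the logarithmic scale variable s = ln(l0/l), with
# multiplicative noise D2(q) = a + b*q^2 (a, b are illustrative values)
a, b = 0.02, 0.25
ds, ntraj = 0.001, 100_000
q = 0.3 * rng.standard_normal(ntraj)       # nearly Gaussian at the largest scale

def kurt(x):
    x = x - x.mean()
    return (x**4).mean() / (x**2).mean() ** 2   # flatness; 3 for a Gaussian

k0 = kurt(q)
for _ in range(2000):                      # run down to smaller scales, s = 2
    q += -q * ds + np.sqrt((a + b * q**2) * ds) * rng.standard_normal(ntraj)
kf = kurt(q)

print("flatness, large scale:", round(k0, 2), " -> small scale:", round(kf, 2))
```

The multiplicative part b q² drives the flatness above the Gaussian value 3 as the cascade proceeds, while the additive part a keeps the tip of the distribution Gaussian, in line with the discussion above.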

The investigation of the advection of a passive scalar in a turbulent flow along the lines of the present method has revealed the interesting result that the Markov properties are fulfilled, but that higher-order Kramers-Moyal coefficients are non-negligible [21]. This indicates that for passive scalars non-Gaussian noise is present, which can be attributed to the existence of shock-like structures in the distribution of the passive scalar.

Future Directions

The description of complex systems on the basis of stochastic processes, which include nonlinear dynamics, seems to be a promising approach for the future. The challenge will be to extend this understanding to more complicated processes, like Lévy processes, processes with non-white noise, or higher-dimensional processes, just to mention some. As has been shown in this contribution, for these cases it should be possible to derive general methods of data series analysis from precise mathematical results, too. Besides the further improvement of the method, we are convinced that there is still a wide range of further applications. Advanced sensor techniques enable scientists to collect huge data sets measured with high precision. Based on the stochastic approach we have presented here, it is no longer necessary to put much effort into noise reduction; on the contrary, the involved noise can help to derive a better characterization and thus a better understanding of the system considered. Thus there seem to be many applications in the inanimate and the animate world, ranging from technical applications over socio-economic systems to biomedical applications. An interesting feature will be the extraction of higher-order correlation aspects, like the question of the cause-and-effect chain, which may be unfolded by asymmetric determinism and noise terms reconstructed from data.

Further Reading

For further reading we suggest the publications [1,2,3,4,8,9,10,11].
Acknowledgment The scientific results reported in this review have been worked out in close collaboration with many colleagues and students. We mention St. Barth, F. Böttcher, F. Ghasemi, I. Grabec, J. Gradisek, M. Haase, A. Kittel, D. Kleinhans, St. Lück, A. Nawroth, Chr. Renner, M. Siefert, and S. Siegert.


Bibliography
1. Haken H (1983) Synergetics, An Introduction. Springer, Berlin
2. Haken H (1987) Advanced Synergetics. Springer, Berlin
3. Haken H (2000) Information and Self-Organization: A Macroscopic Approach to Complex Systems. Springer, Berlin
4. Kantz H, Schreiber T (1997) Nonlinear Time Series Analysis. Cambridge University Press, Cambridge
5. Yanovsky VV, Chechkin AV, Schertzer D, Tur AV (2000) Physica A 282:13
6. Schertzer D, Larchevéque M, Duan J, Yanovsky VV, Lovejoy S (2001) J Math Phys 42:200
7. Gnedenko BV, Kolmogorov AN (1954) Limit distributions of sums of independent random variables. Addison-Wesley, Cambridge
8. Risken H (1989) The Fokker-Planck Equation. Springer, Berlin
9. Gardiner CW (1983) Handbook of Stochastic Methods. Springer, Berlin
10. van Kampen NG (1981) Stochastic processes in physics and chemistry. North-Holland Publishing Company, Amsterdam
11. Hänggi P, Thomas H (1982) Stochastic processes: time evolution, symmetries and linear response. Phys Rep 88:207
12. Einstein A (1905) Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Ann Phys 17:549
13. Friedrich R, Zeller J, Peinke J (1998) A Note on Three Point Statistics of Velocity Increments in Turbulence. Europhys Lett 41:153
14. Lück S, Renner Ch, Peinke J, Friedrich R (2006) The Markov Einstein coherence length: a new meaning for the Taylor length in turbulence. Phys Lett A 359:335
15. Tabar MRR, Sahimi M, Ghasemi F, Kaviani K, Allamehzadeh M, Peinke J, Mokhtari M, Vesaghi M, Niry MD, Bahraminasab A, Tabatabai S, Fayazbakhsh S, Akbari M (2007) Short-Term Prediction of Medium- and Large-Size Earthquakes Based on Markov and Extended Self-Similarity Analysis of Seismic Data. In: Bhattacharyya P, Chakrabarti BK (eds) Modelling Critical and Catastrophic Phenomena in Geoscience. Lecture Notes in Physics, vol 705. Springer, Berlin, pp 281–301
16. Kolmogorov AN (1931) Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math Ann 140:415
17. Siefert M, Kittel A, Friedrich R, Peinke J (2003) On a quantitative method to analyze dynamical and measurement noise. Europhys Lett 61:466
18. Böttcher F, Peinke J, Kleinhans D, Friedrich R, Lind PG, Haase M (2006) On the proper reconstruction of complex dynamical systems spoilt by strong measurement noise. Phys Rev Lett 97:090603
19. Kleinhans D, Friedrich R, Wächter M, Peinke J (2007) Markov properties under the influence of measurement noise. Phys Rev E 76:041109
20. Renner C, Peinke J, Friedrich R (2001) Experimental indications for Markov properties of small scale turbulence. J Fluid Mech 433:383
21. Tutkun M, Mydlarski L (2004) Markovian properties of passive scalar increments in grid-generated turbulence. New J Phys 6:49
22. Marcq P, Naert A (2001) A Langevin equation for turbulent velocity increments. Phys Fluids 13:2590
23. Langner M, Peinke J, Rauh A (2008) A Langevin analysis with application to a Rayleigh-Bénard convection experiment. Exp Fluids (submitted)
24. Wächter M, Kouzmitchev A, Peinke J (2004) Increment definitions for scale-dependent analysis of stochastic data. Phys Rev E 70:055103(R)
25. Ragwitz M, Kantz H (2001) Indispensable finite time corrections for Fokker-Planck equations from time series. Phys Rev Lett 87:254501
26. Ragwitz M, Kantz H (2002) Comment on: Indispensable finite time correlations for Fokker-Planck equations from time series data: Reply. Phys Rev Lett 89:149402
27. Friedrich R, Renner C, Siefert M, Peinke J (2002) Comment on: Indispensable finite time correlations for Fokker-Planck equations from time series data. Phys Rev Lett 89:149401
28. Siegert S, Friedrich R (2001) Modeling nonlinear Lévy processes by data analysis. Phys Rev E 64:041107
29. Kleinhans D, Friedrich R, Nawroth AP, Peinke J (2005) An iterative procedure for the estimation of drift and diffusion coefficients of Langevin processes. Phys Lett A 346:42
30. Kleinhans D, Friedrich R (2007) Note on Maximum Likelihood estimation of drift and diffusion functions. Phys Lett A 368:194
31. Nawroth AP, Peinke J, Kleinhans D, Friedrich R (2007) Improved estimation of Fokker-Planck equations through optimisation. Phys Rev E 76:056102
32. Gradisek J, Grabec I, Siegert S, Friedrich R (2002) Stochastic dynamics of metal cutting: Bifurcation phenomena in turning. Mech Syst Signal Process 16(5):831
33. Gradisek J, Siegert S, Friedrich R, Grabec I (2002) Qualitative and quantitative analysis of stochastic processes based on measured data I: Theory and applications to synthetic data. J Sound Vib 252(3):545
34. Purwins HG, Amiranashvili S (2007) Selbstorganisierte Strukturen im Strom. Phys J 6(2):21
35. Bödeker HU, Röttger M, Liehr AW, Frank TD, Friedrich R, Purwins HG (2003) Noise-covered drift bifurcation of dissipative solitons in planar gas-discharge systems. Phys Rev E 67:056220
36. Purwins HG, Bödeker HU, Liehr AW (2005) In: Akhmediev N, Ankiewicz A (eds) Dissipative Solitons. Springer, Berlin
37. Bödeker HU, Liehr AW, Frank TD, Friedrich R, Purwins HG (2004) Measuring the interaction law of dissipative solitons. New J Phys 6:62
38. Liehr AW, Bödeker HU, Röttger M, Frank TD, Friedrich R, Purwins HG (2003) Drift bifurcation detection for dissipative solitons. New J Phys 5:89
39. Friedrich R, Siegert S, Peinke J, Lück S, Siefert M, Lindemann M, Raethjen J, Deuschl G, Pfister G (2000) Extracting model equations from experimental data. Phys Lett A 271:217
40. Siefert M, Peinke J (2004) Reconstruction of the Deterministic Dynamics of Stochastic Systems. Int J Bifurc Chaos 14:2005
41. Anahua E, Lange M, Böttcher F, Barth S, Peinke J (2004) Stochastic Analysis of the Power Output for a Wind Turbine. DEWEK 2004, Wilhelmshaven, 20–21 October 2004
42. Anahua E, Barth S, Peinke J (2006) Characterization of the wind turbine power performance curve by stochastic modeling. EWEC 2006, BL3.307, Athens, February 27–March 2
43. Anahua E, Barth S, Peinke J (2007) Characterisation of the power curve for wind turbines by stochastic modeling. In: Peinke J, Schaumann P, Barth S (eds) Wind Energy – Proceedings of the Euromech Colloquium. Springer, Berlin, pp 173–177

Fluctuations, Importance of: Complexity in the View of Stochastic Processes


Food Webs

JENNIFER A. DUNNE 1,2
1 Santa Fe Institute, Santa Fe, USA
2 Pacific Ecoinformatics and Computational Ecology Lab, Berkeley, USA

Article Outline

Glossary
Definition of the Subject
Introduction: Food Web Concepts and Data
Early Food Web Structure Research
Food Web Properties
Food Webs Compared to Other Networks
Models of Food Web Structure
Structural Robustness of Food Webs
Food Web Dynamics
Ecological Networks
Future Directions
Bibliography

Glossary

Connectance (C) The proportion of possible links in a food web that actually occur. There are many algorithms for calculating connectance. The simplest and most widely used algorithm, sometimes referred to as “directed connectance,” is links per species squared (L/S^2), where S^2 represents all possible directed feeding interactions among S species, and L is the total number of actual feeding links. Connectance ranges from 0.03 to 0.3 in food webs, with a mean of 0.10 to 0.15.

Consumer-resource interactions A generic way of referring to a wide variety of feeding interactions, such as predator-prey, herbivore-plant or parasite-host interactions. Similarly, “consumer” refers generically to anything that consumes or preys on something else, and “resource” refers to anything that is consumed or preyed upon. Many taxa are both consumers and resources within a particular food web.

Food web The network of feeding interactions among diverse co-occurring species in a particular habitat.

Trophic species (S) Defined within the context of a particular food web, a trophic species is comprised of a set of taxa that share the same set of consumers and resources. A particular trophic species is represented by a single node in the network, and that node is topologically distinct from all other nodes. “Trophic species” is a convention introduced to minimize bias due to uneven resolution in food web data and to focus analysis and modeling on functionally distinct network components. S is used to denote the number of trophic species in a food web. The terms “trophic species,” “species,” and “taxa” will be used somewhat interchangeably throughout this article to refer to nodes in a food web. “Original species” will be used specifically to denote the taxa found in the original dataset, prior to trophic species aggregation.

Definition of the Subject

Food webs refer to the networks of feeding (“trophic”) interactions among species that co-occur within particular habitats. Research on food webs is one of the few subdisciplines within ecology that seeks to quantify and analyze direct and indirect interactions among diverse species, rather than focusing on particular types of taxa. Food webs ideally represent whole communities including plants, bacteria, fungi, invertebrates and vertebrates. Feeding links represent transfers of biomass and encompass a variety of trophic strategies including detritivory, herbivory, predation, cannibalism and parasitism. At the base of every food web are one or more types of autotrophs, organisms such as plants or chemoautotrophic bacteria, which produce complex organic compounds from an external energy source (e.g., light) and simple inorganic carbon molecules (e.g., CO2). Food webs also have a detrital component—non-living particulate organic material that comes from the body tissues of organisms. Feeding-mediated transfers of organic material, which ultimately trace back to autotrophs or detritus via food chains of varying lengths, provide the energy, organic carbon and nutrients necessary to fuel metabolism in all other organisms, referred to as heterotrophs. While food webs have been a topic of interest in ecology for many decades, some aspects of contemporary food web research fall within the scope of the broader cross-disciplinary research agenda focused on complex, “real-world” networks, both biotic and abiotic [2,83,101].
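The glossary’s directed connectance is straightforward to compute from a list of feeding links. A minimal Python sketch (the function name and the 6-taxon, 12-link web are hypothetical, in the spirit of the article’s Fig. 2):

```python
def directed_connectance(n_species, links):
    """Directed connectance C = L / S^2, where `links` is a set of
    (resource, consumer) pairs and S^2 counts all possible directed
    feeding interactions, including cannibalistic self-links."""
    return len(set(links)) / n_species ** 2

# Hypothetical 6-taxon web with 12 links; taxa 1 and 2 are basal
# (never consumers), and taxa 3, 5, 6 are cannibalistic.
links = {(1, 3), (2, 3), (1, 4), (3, 4), (3, 3), (4, 5),
         (5, 5), (2, 5), (5, 6), (6, 6), (4, 6), (3, 6)}
print(directed_connectance(6, links))  # 12 / 36, i.e. about 0.33
```

Note that S^2 (rather than S(S-1)) is the denominator because cannibalistic links count as possible interactions.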
Using the language of graph theory and the framework of network analysis, species are represented by vertices (nodes) and feeding links are represented by edges (links) between vertices. As with any other network, the structure and dynamics of food webs can be quantified, analyzed and modeled. Links in food webs are generally considered directed, since biomass flows from a resource species to a consumer species (A → B). However, trophic links are sometimes treated as undirected, since any given trophic interaction alters the population and biomass dynamics of both the consumer and resource species (A ↔ B). The types of questions explored in food web research range from “Do food webs from different habitats display universal topological characteristics, and how does their structure compare to that of other types of networks?” to “What factors promote different aspects of stability of complex food webs and their components given internal dynamics and external perturbations?” Two fundamental measures used to characterize food webs are S, the number of species or nodes in a web, and C, connectance—the proportion of possible feeding links that are actually realized in a web (C = L/S^2, where L is the number of observed directed feeding links, and S^2 is the number of possible directed feeding interactions among S taxa). This article focuses on research that falls at the intersection of food webs and complex networks, with an emphasis on network structure augmented by a brief discussion of dynamics. This is a subset of a wide variety of ecological research that has been conducted on feeding interactions and food webs. Refer to the “Books and Reviews” in the bibliography for more information about a broader range of research related to food webs.

Introduction: Food Web Concepts and Data

The concept of food chains (e.g., grass is eaten by grasshoppers, which are eaten by mice, which are eaten by owls; A → B → C → D) goes back at least several hundred years, as evidenced by two terrestrial and aquatic food chains briefly described by Carl Linnaeus in 1749 [42]. The earliest description of a food web may be the mostly detrital-based feeding interactions observed by Charles Darwin in 1832 on the island of St. Paul, which had only two bird species (Darwin 1839, as reported by Egerton [42]):

By the side of many of these [tern] nests a small flying-fish was placed; which, I suppose, had been brought by the male bird for its partner . . . quickly a large and active crab (Craspus), which inhabits the crevices of the rock, stole the fish from the side of the nest, as soon as we had disturbed the birds.
Not a single plant, not even a lichen, grows on this island; yet it is inhabited by several insects and spiders. The following list completes, I believe, the terrestrial fauna: a species of Feronia and an acarus, which must have come here as parasites on the birds; a small brown moth, belonging to a genus that feeds on feathers; a staphylinus (Quedius) and a woodlouse from beneath the dung; and lastly, numerous spiders, which I suppose prey on these small attendants on, and scavengers of the waterfowl.

The earliest known diagrams of generalized food chains and food webs appeared in the late 1800s, and diagrams of specific food webs began appearing in the early 1900s, for example the network of insect predators and parasites on cotton-feeding weevils (“the boll weevil complex,” [87]). By the late 1920s, diagrams and descriptions of terrestrial and marine food webs were becoming more common (e.g., Fig. 1 from [103]; see also [48,104]). Charles Elton introduced the terms “food chain” and “food cycle” in his classic early textbook, Animal Ecology [43]. By the time Eugene Odum published a later classic textbook, Fundamentals of Ecology [84], the term “food web” was starting to replace “food cycle.” From the 1920s to the 1980s, dozens of system-specific food web diagrams and descriptions were published, as well as some webs that were more stylized (e.g., [60]) and that quantified link flows or species biomasses.

In 1977, Joel Cohen published the first comparative studies of empirical food web network structure, using up to 30 food webs collected from the literature [23,24]. To standardize the data, he transformed the diagrams and descriptions of webs in the literature into binary matrices with m rows and n columns [24]. Each column is headed by the number of one of the consumer taxa in a particular web, and each row is headed by the number of one of the resource taxa for that web. If w_ij represents the entry in the ith row and the jth column, it equals 1 if consumer j eats resource i, or 0 if j does not eat i. This matrix-based representation of data is still often used, particularly in a full S by S format (where S is the number of taxa in the web), but for larger datasets a compressed two- or three-column notation for observed links is more efficient (Fig. 2). By the mid-1980s, those 30 initial webs had expanded into a 113-web catalog [30], which included webs mostly culled from the literature, dating back to the 1923 Bear Island food web ([103], Fig. 1). However, it was apparent that there were many problems with the data.
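The full-matrix and two-column notations just described are interchangeable. A sketch of the conversion in Python, assuming the convention above that w_ij = 1 when consumer j eats resource i, with taxa numbered from 1 (the 4-taxon example web is hypothetical):

```python
def links_to_matrix(n_taxa, pairs):
    """Build the full S-by-S binary matrix from two-column data, where
    each pair is (consumer, resource): w[i][j] = 1 iff consumer j
    eats resource i (taxa numbered 1..S)."""
    w = [[0] * n_taxa for _ in range(n_taxa)]
    for consumer, resource in pairs:
        w[resource - 1][consumer - 1] = 1
    return w

def matrix_to_links(w):
    """Recover the two-column (consumer, resource) list from the matrix."""
    return sorted((j + 1, i + 1)
                  for i, row in enumerate(w)
                  for j, cell in enumerate(row) if cell)

# Hypothetical web: consumer 3 eats resources 1 and 2; consumer 4 eats 3.
pairs = [(3, 1), (3, 2), (4, 3)]
w = links_to_matrix(4, pairs)
assert matrix_to_links(w) == pairs  # round trip recovers the link list
```

The three-column range notation of Fig. 2d would compress consecutive resource numbers further; the round trip above covers only the plain two-column case.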
Most of the 113 food webs had very low diversity compared to the biodiversity known to be present in ecosystems, with a range of only 5 to 48 species in the original datasets and 3 to 48 trophic species. This low diversity was largely due to very uneven resolution and inclusion of taxa in most of these webs. The webs were put together in many different ways and for various purposes that did not include comparative, quantitative research. Many types of organisms were aggregated, underrepresented, or missing altogether, and in a few cases animal taxa had no food chains connecting them to basal species. In addition, cannibalistic links were purged when the webs were compiled into the 113-web catalog. To many ecologists, these food webs looked like little more than idiosyncratic cartoons of much richer and more complex species interactions found in natural systems, and they appeared to be an extremely unsound foundation on which to build understanding and theory [86,92].

Food Webs, Figure 1 A diagram of a terrestrial Arctic food web, with a focus on nitrogen cycling, for Bear Island, published in 1923 [103]

Food Webs, Figure 2 Examples of formats for standardized notation of binary food web data. A hypothetical web with 6 taxa and 12 links is used. Numbers 1–6 correspond to the different taxa. a Partial matrix format: the 1s or 0s inside the matrix denote the presence or absence of a feeding link between a consumer (whose numbers 3–6 head columns) and a resource (whose numbers 1–6 head rows); b Full matrix format: similar to a, but all 6 taxa are listed at the heads of columns and rows; c Two-column format: a consumer’s number appears in the first column, and one of its resource’s numbers appears in the second column; d Three-column format: similar to c, but where there is a third number, the second and third numbers refer to a range of resource taxa. In this hypothetical web, taxa numbers 1 and 2 are basal taxa (i.e., taxa that do not feed on other taxa—autotrophs or detritus), and taxa numbers 3, 5, and 6 have cannibalistic links to themselves

Another catalog of “small” webs emerged in the late 1980s, a set of 60 insect-dominated webs with 2 to 87 original species (mean = 22) and 2 to 54 trophic species (mean = 12) [102]. Unlike the 113-web catalog, these webs are
highly taxonomically resolved, mostly to the species level. However, they are still small due to their focus, in most cases, on insect interactions in ephemeral microhabitats such as phytotelmata (i.e., plant-held aquatic systems such as water in tree holes or pitcher plants) and singular detrital sources (e.g., dung paddies, rotting logs, animal carcasses). Thus, while the 113-web catalog presented food webs for communities at fairly broad temporal and spatial scales but with low and uneven resolution, the 60-web catalog presented highly resolved but very small spatial and temporal slices of broader communities. These two very different catalogs were compiled into ECOWeB, the “Ecologists Co-Operative Web Bank,” a machine-readable database of food webs that was made available by Joel Cohen in 1989 [26]. The two catalogs, both separately and together as ECOWeB, were used in many studies of regularities in food web network structure, as discussed in the next Sect. “Early Food Web Structure Research”.

Food Webs, Figure 3 Food web of Little Rock Lake, Wisconsin [63]. 997 feeding links among 92 trophic species are shown. Image produced with FoodWeb3D, written by R.J. Williams, available at the Pacific Ecoinformatics and Computational Ecology Lab (www.foodwebs.org)

A new level of detail, resolution and comprehensiveness in whole-community food web characterization was presented in two seminal papers in 1991. Gary Polis [92] published an enormous array of data for taxa found in the Coachella Valley desert (California). Over two decades, he collected taxonomic and trophic information on at least 174 vascular plant species, 138 vertebrate species, 55 spider species, thousands of insect species including parasitoids, and unknown numbers of microorganisms, acari, and nematodes. He did not compile a complete food web including all of that information, but instead reported a number of detailed subwebs (e.g., a soil web, a scorpion-focused web, a carnivore web, etc.), each of which was more diverse than most of the ECOWeB webs. On the basis of the subwebs and a simplified, aggregated 30-taxa web of the whole community, he concluded that “. . . most catalogued webs are oversimplified caricatures of actual communities
. . . [they are] grossly incomplete representations of communities in terms of both diversity and trophic connections.”

At about the same time, Neo Martinez [63] published a detailed food web for Little Rock Lake (Wisconsin) that he compiled explicitly to test food web theory and patterns (see Sect. “Early Food Web Structure Research”). By piecing together diversity and trophic information from multiple investigators actively studying various types of taxa in the lake, he was able to put together a relatively complete and highly resolved food web of 182 taxa, most identified to the genus, species, or ontogenetic life-stage level, including fishes, copepods, cladocera, rotifers, diptera and other insects, mollusks, worms, porifera, algae, and cyanobacteria. In later publications, Martinez modified the original dataset slightly into one with 181 taxa. The 181-taxa web aggregates into a 92 trophic-species web, with nearly 1000 links among the taxa (Fig. 3). This dataset, and the accompanying analysis, set a new standard for food web empiricism and analysis. It still stands as the best whole-community food web compiled, in terms of even, detailed, comprehensive resolution.

Since 2000, the use of the ECOWeB database for comparative analysis and modeling has mostly given way to a focus on a smaller set of more recently published food webs [10,37,39,99,110]. These webs, available through www.foodwebs.org or from individual researchers, are compiled for particular, broad-scale habitats such as St. Mark’s Estuary [22], Little Rock Lake [63], the island of St. Martin [46], and the Northeast U.S. Marine Shelf [61].
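The trophic-species aggregation behind reductions like 181 taxa to 92 trophic species groups taxa that share identical consumer and resource sets. A minimal Python sketch (the function name and three-taxon web are hypothetical):

```python
from collections import defaultdict

def trophic_species(taxa, links):
    """Group taxa that share both the same resource set and the same
    consumer set; each group becomes one trophic-species node.
    `links` is a set of (resource, consumer) pairs."""
    resources = {t: frozenset(r for r, c in links if c == t) for t in taxa}
    consumers = {t: frozenset(c for r, c in links if r == t) for t in taxa}
    groups = defaultdict(list)
    for t in taxa:
        groups[(resources[t], consumers[t])].append(t)
    return list(groups.values())

# Hypothetical web: 'a' and 'b' both eat nothing and are both eaten
# only by 'c', so they collapse into a single trophic species.
print(trophic_species(['a', 'b', 'c'], {('a', 'c'), ('b', 'c')}))
# [['a', 'b'], ['c']]
```

The quadratic scan over `links` is fine for webs of a few hundred taxa; a single pass building both maps would be used for larger datasets.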


Most of the food webs used in contemporary comparative research are still problematic—while they generally are more diverse and/or evenly resolved than the earlier webs, most could still be resolved more highly and evenly. Among several issues, organisms such as parasites are usually left out (but see [51,59,67,74]), microorganisms are either missing or highly aggregated, and there is still a tendency to resolve vertebrates more highly than lower-level organisms. An important part of future food web research is the compilation of more inclusive, evenly resolved, and well-defined datasets. Meanwhile, the careful selection and justification of datasets to analyze is an important part of current research that all too often is ignored.

How exactly are food web data collected? In general, the approach is to compile as complete a species list as possible for a site, and then to determine the diets of each species present at that site. However, researchers have taken a number of different approaches to compiling food webs. In some cases, researchers base their food webs on observations they make themselves in the field. For example, ecologists in New Zealand have characterized the structure of stream food webs by taking samples from particular patches in the streams, identifying the species present in those samples, taking several individuals of each species present, and identifying their diets through gut-content analysis [106]. In other cases, researchers compile food web data by consulting with experts and conducting literature searches. For example, Martinez [63] compiled the Little Rock Lake (WI) food web by drawing on the expertise of more than a dozen biologists who were specialists on various types of taxa and who had been working at Little Rock Lake for many years. Combinations of these two approaches can also come into play—for example, a researcher might compile a relatively complete species list through field-based observations and sampling, and then assign trophic habits to those taxa through a combination of observation, consulting with experts, and searching the literature and online databases.

It is important to note that most of the webs used for comparative research can be considered “cumulative” webs. Contemporary food web data range from time- and space-averaged or “cumulative” (e.g., [63]) to more finely resolved in time (e.g., seasonal webs—[6]) and/or space (e.g., patch-scale webs—[106]; microhabitat webs—[94]). The generally implicit assumption underlying cumulative food web data is that the set of species in question co-exist within a habitat and individuals of those species have the opportunity over some span of time and space to interact directly. To the degree possible, such webs document who eats whom among all species within a macrohabitat, such as a lake or meadow, over multiple seasons or years, including interactions that are low frequency or represent a small proportion of consumption. Such cumulative webs are used widely in comparative research to look at whether there are regularities in food web structure across habitats (see Sect. “Food Webs Compared to Other Networks” and Sect. “Models of Food Web Structure”). More narrowly defined webs at finer scales of time or space, or that utilize strict evidence standards (e.g., recording links only through gut-content sampling), have been useful for characterizing how such constraints influence perceived structure within habitats [105,106], but are not used as much to look for cross-system regularities in trophic network structure.

Early Food Web Structure Research

The earliest comparative studies of food web structure were published by Joel Cohen in 1977. Using data from the first 30-web catalog, one study focused on the ratio of predators to prey in food webs [23], and the other investigated whether food webs could be represented by single-dimension interval graphs [24], a topic which continues to be of interest today (see Sect. “Food Webs Compared to Other Networks”). In both cases, he found regularities: (1) a ratio of prey to predators of 3/4 regardless of the size of the web, and (2) most of the webs are interval, such that all species in a food web can be placed in a fixed order on a line such that each predator’s set of prey forms a single contiguous segment of that line.

The prey-predator ratio paper proved to be the first salvo in a quickly growing set of papers that suggested that a variety of food web properties were “scale-invariant.” In its strong sense, scale invariance means that certain properties have constant values as the size (S) of food webs changes. In its weak sense, scale invariance refers to properties not changing systematically with changing S.
Other scale-invariant patterns identified include constant proportions of top species (Top, species with no predators), intermediate species (Int, species with both predators and prey), and basal species (Bas, species with no prey), collectively called “species scaling laws” [12], and constant proportions of T-I, I-B, T-B, and I-I links between T, I, and B species, collectively called “link scaling laws” [27]. Other general properties of food webs were thought to include: food chains are short [31,43,50,89]; cycling/looping is rare (e.g., A → B → C → A; [28]); compartments, or subwebs with many internal links that have few links to other subwebs, are rare [91]; omnivory, or feeding at more than one trophic level, is uncommon [90]; and webs tend to be interval, with instances of intervality decreasing as S increases [24,29,116]. Most of these patterns were reported for the 113-web catalog [31], and some of the regularities were also documented in a subset of the 60 insect-dominated webs [102].

Another related, prominent line of early comparative food web research was inspired by Bob May’s work from the early 1970s showing that simple, abstract communities of interacting species will tend to transition sharply from local stability to instability as the complexity of the system increases—in particular, as the number of species (S), the connectance (C) or the average interaction strength (i) increase beyond critical values [69,70]. He formalized this as a criterion that ecological communities near equilibrium will tend to be stable if i(SC)^(1/2) < 1. This mathematical analysis flew in the face of the intuition of many ecologists (e.g., [44,50,62,84]) who felt that increased complexity (in terms of greater numbers of species and links between them) in ecosystems gives rise to stability.

May’s criterion and the general question of how diversity is maintained in communities provided a framework within which to analyze some readily accessible empirical data, namely the numbers of links and species in food webs. Assuming that average interaction strength (i) is constant, May’s criterion suggests that communities can be stable given increasing diversity (S) as long as connectance (C) decreases. This can be empirically demonstrated using food web data in three similar ways, by showing that (1) C hyperbolically declines as S increases, so that the product SC remains constant; (2) the ratio of links to species (L/S), also referred to as link or linkage density, remains constant as S increases; or (3) L plotted as a function of S on a log-log graph, producing a power-law relation of the form L = αS^β, displays an exponent of β = 1 (the slope of the regression), indicating a linear relationship between L and S.
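Both May’s criterion and the power-law test in (3) are easy to make concrete. A Python sketch with illustrative (S, L) values, not data from the catalogs (the function names are hypothetical):

```python
import math

def fit_power_law(S_vals, L_vals):
    """Least-squares fit of log L = log(alpha) + beta * log(S);
    returns (alpha, beta). beta = 1 implies constant L/S, while
    beta = 2 implies constant connectance C = L/S^2."""
    xs = [math.log(s) for s in S_vals]
    ys = [math.log(l) for l in L_vals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    alpha = math.exp(my - beta * mx)
    return alpha, beta

def may_stable(i, S, C):
    """May's criterion: near-equilibrium stability is expected
    if i * sqrt(S * C) < 1."""
    return i * math.sqrt(S * C) < 1

# Illustrative webs obeying constant connectance, L = 0.1 * S^2:
S_vals = [25, 50, 100, 172]
L_vals = [0.1 * s ** 2 for s in S_vals]
alpha, beta = fit_power_law(S_vals, L_vals)
print(round(beta, 2))              # 2.0, i.e. constant C = 0.1
print(may_stable(0.5, 100, 0.1))   # False: 0.5 * sqrt(10) > 1
```

With real catalog data the points scatter, and beta lands between the two idealized exponents, as discussed below.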
These relationships were demonstrated across food webs in a variety of studies (see detailed review in [36]), culminating with support from the 113-web catalog and the 60 insect-dominated web catalog. Cohen and colleagues identified the “link-species scaling law” of L/S ≈ 2 using the 113-web catalog (i.e., there are two links per species on average in any given food web, regardless of its size) [28,30], and SC was reported as “roughly independent of species number” in a subset of the 60 insect-dominated webs [102].

However, these early conclusions about patterns of food web structure began to crumble with the advent of improved data and new analysis methods that focused on the issues of species aggregation, sampling effort, and sampling consistency [36]. Even before there was access to improved data, Tom Schoener [93] set the stage for critiques of the conventional paradigm in his Ecological Society of America MacArthur Award lecture, in which he explored the ramifications of a simple conceptual model based on notions of “generality” (what Schoener referred to as “generalization”) and “vulnerability.” He adopted the basic notion underlying the “link-species scaling law”: that how many different taxa something can eat is constrained, which results in the number of resource taxa per consumer taxon (generality) holding relatively steady with increasing S. However, he further hypothesized that the ability of resource taxa to defend against consumers is also constrained, such that the number of consumer taxa per resource taxon (vulnerability) should increase with increasing S. A major consequence of this conceptual model is that total links per species (L/S, which includes links to resources and consumers) and most other food web properties should display scale dependence, not scale invariance. A statistical reanalysis of a subset of the 113-web catalog supported this contention, as well as the basic assumptions of his conceptual model about generality and vulnerability.

Shortly thereafter, more comprehensive, detailed datasets, like the ones for Coachella Valley [92] and Little Rock Lake [63], began to appear in the literature. These and other new datasets provided direct empirical counterpoints to many of the prevailing notions about food webs: their connectance and links per species were much higher than expected from the “link-species scaling law,” food chains could be quite long, omnivory and cannibalism and looping could be quite frequent, etc. In addition, analyses such as the one by Martinez [63], in which he systematically aggregated the Little Rock Lake food web taxa and links to generate small webs that looked like the earlier data, demonstrated that “most published food web patterns appear to be artifacts of poorly resolved data.” Comparative studies incorporating newly available data further undermined the whole notion of “scale invariance” of most properties, particularly L/S (e.g., [65,66]).
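Schoener’s generality and vulnerability reduce to simple averages over a link list. A Python sketch (the function name and the small web are hypothetical):

```python
def generality_vulnerability(links):
    """Mean number of resource taxa per consumer (generality) and mean
    number of consumer taxa per resource (vulnerability), computed
    from a set of (resource, consumer) pairs."""
    consumers = {c for _, c in links}
    resources = {r for r, _ in links}
    return len(links) / len(consumers), len(links) / len(resources)

# Hypothetical web: 3 consumers, 4 resources ('bug' is both), 6 links.
links = {('plant1', 'bug'), ('plant2', 'bug'), ('plant3', 'bug'),
         ('plant1', 'slug'), ('plant2', 'slug'), ('bug', 'bird')}
g, v = generality_vulnerability(links)
print(g, v)  # 2.0 1.5
```

Under Schoener’s hypothesis, generality computed this way should stay roughly flat across webs of increasing S, while vulnerability should rise.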
For many researchers, the array of issues brought to light by the improved data and more sophisticated analyses was enough for them to turn their back on structural food web research. A few hardy researchers sought to build new theory on top of the improved data. “Constant connectance” was suggested as an alternative hypothesis to constant L/S (the “link-species scaling law”), based on a comparative analysis of the relationship of L to S across a subset of available food webs including Little Rock Lake [64]. The mathematical difference between constant C and constant L/S can be simply stated using a log-log graph of links as a function of species (Fig. 4). If a power law exists of the form L = αS^β, then in the case of the link-species scaling law β = 1, which means that L = αS and L/S = α, indicating constant L/S. In the case of constant connectance, β = 2 and thus L = αS² and L/S² = α, indicating constant C (= L/S²). Constant connectance means that L/S increases as a fixed proportion of S. One ecological interpretation of constant connectance is that consumers are likely to exploit an approximately constant fraction of available prey species, so as diversity increases, links per species increases [108]. Given the L = αS^β framework, β = 2 was reported for a set of 15 webs derived from an English pond [108], and β = 1.9 for a set of 50 Adirondack lakes [65], suggesting connectance may be constant across webs within a habitat or type of habitat. Across habitats, the picture is less clear. While β = 2 was reported for a small subset of the 5 most “credible” food webs then available from different habitats [64], several analyses of both the old ECOWeB data and the more reliable newer data suggest that the exponent lies somewhere between 1 and 2, suggesting that C declines non-linearly with S (Fig. 4, [27,30,36,64,79,93]). For example, Schoener’s reanalysis of the 113-web catalog suggested that β = 1.5, indicating that L^(2/3) is proportional to S. A recent analysis of 19 recent trophic-species food webs with S of 25 to 172 also reported β = 1.5, with much scatter in the data (Fig. 4). A recent analysis has provided a possible mechanistic basis for the observed constrained variation in C (≈ 0.03 to 0.3 in cumulative community webs) as well as the scaling of C with S implied by β intermediate between 1 and 2 [10]. A simple diet breadth model based on optimal foraging theory predicts both of these patterns across food webs as an emergent consequence of individual foraging behavior of consumers. In particular, a contingency model of optimal foraging is used to predict mean diet breadth for S animal species in a food web, based on three parameters for an individual of species j: (1) the net energy gain from consumption of an individual of species i, (2) the encounter rate of individuals of species i, and (3) the handling time spent attacking an individual of species i. This allows estimation of C for the animal portion of food webs, once data aggregation and cumulative sampling, well-known features of empirical datasets, are taken into account. The model does a good job of predicting the values of C observed in empirical food webs and associated patterns of C across food webs.

Food Webs, Figure 4 The relationship of links to species for 19 trophic-species food webs from a variety of habitats (black circles). The solid line shows the log-log regression for the empirical data, the dashed line shows the prediction for constant connectance, and the dotted line shows the prediction for the link-species scaling law (reproduced from [36], Fig. 1)
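The difference between these scaling hypotheses can be checked numerically: given (S, L) pairs across webs, the exponent β in L = αS^β is the slope of a least-squares fit on log-transformed data. A minimal sketch (the function name and the synthetic data are illustrative, not taken from the studies cited):

```python
import math

def fit_power_law(species_counts, link_counts):
    """Fit L = a * S**b by least squares on log-transformed data.

    Returns (a, b); b = 1 corresponds to the link-species scaling law,
    b = 2 to constant connectance.
    """
    xs = [math.log(s) for s in species_counts]
    ys = [math.log(l) for l in link_counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b

# Webs with constant connectance C = 0.1 obey L = 0.1 * S**2, so b = 2:
S = [25, 50, 100, 172]
L = [0.1 * s ** 2 for s in S]
a, b = fit_power_law(S, L)
```

With noisy empirical data the fitted exponent would fall between the two idealized cases, as the comparative analyses discussed above report.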

Food Web Properties

Food webs have been characterized by a variety of properties or metrics, several of which have been mentioned previously (Sect. “Early Food Web Structure Research”). Many of these properties are quantifiable just using the basic network structure (“topology”) of feeding interactions. These types of topological properties have been used to evaluate simple models of food web structure (Sect. “Models of Food Web Structure”). Any number of properties can be calculated on a given network—ecologists tend to focus on properties that are meaningful within the context of ecological research, although other properties such as path length (Path) and clustering coefficient (Cl) have been borrowed from network research [109]. Examples of several types of food web network structure properties, with common abbreviations and definitions, follow.

Fundamental Properties: These properties characterize very simple, overall attributes of food web network structure.
S: number of nodes in a food web
L: number of links in a food web
L/S: links per species
C, or L/S²: connectance, or the proportion of possible links that are realized

Types of Taxa: These properties characterize what proportion or percentage of taxa within a food web fall into particular topologically defined roles.
Bas: percentage of basal taxa (taxa without resources)
Int: percentage of intermediate taxa (taxa with both consumers and resources)
Top: percentage of top taxa (taxa with no consumers)
Herb: percentage of herbivores plus detritivores (taxa that feed on autotrophs or detritus)
Can: percentage of cannibals (taxa that feed on their own taxa)


Omn: percentage of omnivores (taxa that feed on taxa at different trophic levels)
Loop: percentage of taxa that are in loops, food chains in which a taxon occurs twice (e.g., A → B → C → A)

Network Structure: These properties characterize other attributes of network structure, based on how links are distributed among taxa.
TL: trophic level averaged across taxa. Trophic level represents how many steps energy must take to get from an energy source to a taxon. Basal taxa have TL = 1, and obligate herbivores have TL = 2. TL can be calculated using many different algorithms that take into account the multiple food chains that can connect higher-level organisms to basal taxa (Williams and Martinez 2004).
ChLen: mean food chain length, averaged over all species
ChSD: standard deviation of ChLen
ChNum: log number of food chains
LinkSD: normalized standard deviation of links (# links per taxon)
GenSD: normalized standard deviation of generality (# resources per taxon)
VulSD: normalized standard deviation of vulnerability (# consumers per taxon)
MaxSim: mean across taxa of the maximum trophic similarity of each taxon to other taxa
Ddiet: the level of diet discontinuity—the proportion of triplets of taxa with an irreducible gap in feeding links over the number of possible triplets [19]—a local estimate of intervality
Cl: clustering coefficient (probability that two taxa linked to the same taxon are linked)
Path: characteristic path length, the mean shortest set of links (where links are treated as undirected) between species pairs

The previous properties (most of which are described in [110] and [39]) each provide a single metric that characterizes some aspect of food web structure. There are other properties, such as degree distribution, which are not single-number properties.
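The fundamental and taxa-type properties above can be computed directly from a list of directed consumer-resource links. A minimal sketch (the function name and the link representation are assumptions for illustration, not from the original analyses, which also aggregate taxa into trophic species first):

```python
def web_properties(links):
    """Basic topological properties of a food web.

    `links` is a set of (consumer, resource) pairs; connectance C is
    the directed convention L/S**2 used in the comparative studies.
    """
    taxa = {t for pair in links for t in pair}
    consumers = {c for c, r in links}            # taxa that have resources
    resources = {r for c, r in links}            # taxa that have consumers
    S, L = len(taxa), len(links)
    return {
        "S": S,
        "L": L,
        "L/S": L / S,
        "C": L / S ** 2,                          # connectance
        "Top": 100 * len(taxa - resources) / S,   # no consumers
        "Bas": 100 * len(taxa - consumers) / S,   # no resources
        "Int": 100 * len(consumers & resources) / S,
        "Can": 100 * sum(1 for c, r in links if c == r) / S,
    }
```

For a three-species chain with one omnivorous link, `web_properties({("B", "A"), ("C", "B"), ("C", "A")})` gives S = 3, L = 3, and C = 1/3.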
“Degree” refers to the number of links that connect to a particular node, and the degree distribution of a network describes (in the format of a function or a graph) the total number of nodes in a network that have a given degree for each level of degree (Subsect. “Degree Distribution”). In food web analysis, LinkSD, GenSD, and VulSD characterize the variability of different aspects of degree distribution. Many food web structure properties are correlated with each other, and vary in predictable ways with S and/or C. This provides opportunities for topological modeling that are discussed below (Sect. “Models of Food Web Structure”).
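As a sketch of how such variability measures can be computed, the following normalizes each taxon's generality (resources per taxon) and vulnerability (consumers per taxon) by L/S before taking the standard deviation; the helper name and the exact normalization convention are assumptions based on the description above:

```python
from collections import Counter
from statistics import pstdev

def gen_vul_sd(links):
    """Normalized standard deviations of generality and vulnerability.

    Counts of resources and consumers per taxon are divided by the mean
    links per species (L/S) so that webs of different sizes and
    connectances can be compared.
    """
    taxa = sorted({t for pair in links for t in pair})
    gen = Counter(c for c, r in links)   # resources per taxon
    vul = Counter(r for c, r in links)   # consumers per taxon
    ls = len(links) / len(taxa)          # mean links per species
    g = [gen[t] / ls for t in taxa]
    v = [vul[t] / ls for t in taxa]
    return pstdev(g), pstdev(v)
```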

In addition to these types of metrics based on networks with unweighted links and nodes, it is possible to calculate a variety of metrics for food webs with nodes and/or links that are weighted by measures such as biomass, numerical abundance, frequency, interaction strength, or body size [11,33,67,81]. However, few food web datasets are “enriched” with such quantitative data, and it remains to be seen whether such approaches are primarily a tool for richer description of particular ecosystems or whether they can give rise to novel generalities, models and predictions. One potential generality was suggested by a study of interaction strengths in seven soil food webs, where interaction strength reflects the size of the effects of species on each other’s dynamics near equilibrium. Interaction strengths appear to be organized such that long loops contain many weak links, a pattern which enhances stability of complex food webs [81].

Food Webs Compared to Other Networks

Small-World Properties

How does the structure of food webs compare to that of other kinds of networks? One common way that various networks have been compared is in terms of whether they are “small-world” networks. Small-world networks are characterized by two of the properties described previously, characteristic path length (Path) and clustering coefficient (Cl) [109]. Most real-world networks appear to have high clustering, like what is seen on some types of regular spatial lattices (such as a planar triangular lattice, where many of a node’s neighbors are neighbors of one another), but have short path lengths, like what is seen on “random graphs” (i.e., networks in which links are distributed randomly). Food webs do display short path lengths that are similar to what is seen in random webs (Table 1, [16,37,78,113]). On average, taxa are about two links from other taxa in a food web (“two degrees of separation”), and path length decreases with increasing connectance [113].
However, clustering tends to be quite low in many food webs, closer to the clustering expected on a random network (Table 1). This relatively low clustering in food webs appears consistent with their small size compared to most other kinds of networks studied, since the ratio of clustering in empirical versus comparable random networks increases linearly with the size of the network (Fig. 5).

Food Webs, Table 1 Topological properties of empirical and random food webs, listed in order of increasing connectance. Path refers to characteristic path length, and Cl refers to the clustering coefficient. Path_r and Cl_r refer to the mean Path and Cl for 100 random webs with the same S and C. Modified from [37]

Food Web              S    C (L/S²)  L/S    Path  Path_r  Cl    Cl_r  Cl/Cl_r
Grassland             61   0.026     1.59   3.74  3.63    0.11  0.03  3.7
Scotch Broom          85   0.031     2.62   3.11  2.82    0.12  0.04  3.0
Ythan Estuary 1       124  0.038     4.67   2.34  2.39    0.15  0.04  3.8
Ythan Estuary 2       83   0.057     4.76   2.20  2.19    0.16  0.06  2.7
El Verde Rainforest   155  0.063     9.74   2.20  1.95    0.12  0.07  1.4
Canton Creek          102  0.067     6.83   2.27  2.01    0.02  0.07  0.3
Stony Stream          109  0.070     7.61   2.31  1.96    0.03  0.07  0.4
Chesapeake Bay        31   0.071     2.19   2.65  2.40    0.09  0.09  1.0
St. Marks Seagrass    48   0.096     4.60   2.04  1.94    0.14  0.11  1.3
St. Martin Island     42   0.116     4.88   1.88  1.85    0.14  0.13  1.1
Little Rock Lake      92   0.118     10.84  1.89  1.77    0.25  0.12  2.1
Lake Tahoe            172  0.131     22.59  1.81  1.74    0.14  0.13  1.1
Mirror Lake           172  0.146     25.13  1.76  1.72    0.14  0.15  0.9
Bridge Brook Lake     25   0.171     4.28   1.85  1.68    0.16  0.19  0.8
Coachella Valley      29   0.312     9.03   1.42  1.43    0.43  0.32  1.3
Skipwith Pond         25   0.315     7.88   1.33  1.41    0.33  0.33  1.0

Food Webs, Figure 5 Trends in clustering coefficient across networks. The ratio of clustering in empirical networks (Cl_empirical) to clustering in random networks with the same number of nodes and links (Cl_random) is shown as a function of the size of the network (number of nodes). Reproduced from [37], Fig. 1

Degree Distribution

In addition to small-world properties, many real-world networks appear to display power-law degree distributions [2]. Whereas regular graphs have the same number of links per node, and random graphs display a Poisson degree distribution, many empirical networks, both biotic and abiotic, display a highly skewed power-law (“scale-free”) degree distribution, where most nodes have few links and a few nodes have many links. However, some empirical networks display less-skewed distributions such as exponential distributions [4]. Most empirical food webs display exponential or uniform degree distributions, not power-law distributions [16,37], and it has been suggested that normalized degree distributions in food webs follow universal functional forms [16], although there is quite a bit of scatter when a wide range of data are considered (Fig. 6, [37]). Variable degree distributions, like what is seen in individual food webs, could result from simple mechanisms. For example, exponential and uniform food web degree distributions are generated by a model that combines (1) random immigration to local webs from a randomly linked regional set of taxa, and (2) random extinctions in the local webs [5]. The general lack of power-law degree distributions in food webs may result partly from the small size and large connectance of such networks, which limits the potential for highly skewed distributions. Many of the networks displaying power-law degree distributions are much larger and much more sparsely connected than food webs.
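A degree distribution of this kind can be tabulated directly from a link list; the sketch below returns a cumulative distribution with degrees normalized by the mean links per species, roughly in the spirit of the normalized comparisons cited above (the function name and conventions are illustrative):

```python
from collections import Counter

def cumulative_degree_distribution(links):
    """Cumulative distribution of links per species.

    `links` is a set of directed (consumer, resource) pairs; each link
    adds one to the degree of both endpoints (a cannibal link counts
    once). Returns (k / mean_k, fraction of taxa with degree >= k)
    pairs, so webs of different sizes can be overlaid.
    """
    deg = Counter()
    for c, r in links:
        deg[c] += 1
        if c != r:
            deg[r] += 1
    S = len(deg)
    mean_k = sum(deg.values()) / S
    return [(k / mean_k, sum(1 for d in deg.values() if d >= k) / S)
            for k in sorted(set(deg.values()))]
```

On a log-log plot, points from a power-law web would fall on a straight line, while points from typical food webs bend downward roughly exponentially.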


Food Webs, Figure 6 Log-log overlay plot of the cumulative distributions of links per species in 16 food webs. The link data are normalized by the average number of links/species in each web. If the distributions followed a power law, the data would tend to follow a straight line. Instead, they follow a roughly exponential shape. Reproduced from [37], Fig. 3

Other Properties

Assortative mixing, or the tendency of nodes with similar degree to be linked to each other, appears to be a pervasive phenomenon in a variety of social networks [82]. However, other kinds of networks, including technological and biological networks, tend to show disassortative mixing, where nodes with high degree tend to link to nodes with low degree. Biological networks, and particularly two food webs examined, show strong disassortativity [82]. Some of this may relate to a finite-size effect in systems like food webs that have limits on how many links are recorded between pairs of nodes. However, in food webs it may also result from the stabilizing effects of having feeding specialists linked to feeding generalists, as has been suggested for plant-animal pollination and frugivory (fruit-eating) networks ([7], Sect. “Ecological Networks”).

Another aspect of structure that has been directly compared across several types of networks including food webs are “motifs,” defined as “recurring, significant patterns of interconnections” [77]. A variety of networks (transcriptional gene regulation, neuron connectivity, food webs, two types of electronic circuits, the World Wide Web) were scanned for all possible subgraphs that could be constructed out of 3 or 4 nodes (13 and 199 possible subgraphs, respectively). Subgraphs that appeared significantly more often in empirical webs than in their randomized counterparts (i.e., networks with the same number of nodes and links, and the same degree for each node, but with links otherwise randomly distributed) were identified. For the seven food webs examined, there were two “consensus motifs” shared by most of the webs—a three-node food chain, and a four-species diamond where a predator has two prey, which in turn prey on the same species (Fig. 7). The four-node motif was shared by two other types of networks (neuron connectivity, one type of electronic circuit), and nothing shared the three-node chain. The WWW and food web networks appear most dissimilar to other types of networks (and to each other) in terms of significant motifs.

Food Webs, Figure 7 The two 3- or 4-node network motifs found to occur significantly more often than expected in most of seven food webs examined. There is one significant 3-node motif (out of 13 possible motifs), a food chain of the form A eats B eats C. There is one significant 4-node motif (out of 199 possible motifs), a trophic diamond (“bi-parallel”) of the form A eats B and C, which both eat D

Complex networks can be decomposed into minimum spanning trees (MST). A MST is a simplified version of a network created by removing links to minimize the distance between nodes and some destination. For example, a food web can be turned into a MST by adding an “environment” node that all basal taxa link to, tracing the shortest food chain from each species to the environment node, and removing links that do not appear in the shortest chains. Given this algorithm, a MST removes links that occur in loops and retains a basic backbone that has a tree-like structure. In a MST, the quantity A_i is defined as the number of nodes in a subtree rooted at node i, and can be regarded as the transportation rate through that node. C_i is defined as the integral of A_i (i.e., the sum of A_i for all nodes in the subtree rooted at node i) and can be regarded as the transportation cost at node i. These properties can be used to plot C_i versus A_i for each node in a network, or to plot whole-system C_o versus A_o across multiple networks, to identify whether scaling relationships of the form C(A) ∝ A^η are present, indicating self-similarity in the structure of the MST (see [18] for review). In a food web MST, the most efficient configuration is a star, where every species links directly to the environment node, resulting in an exponent of 1, and the least efficient configuration is a single chain, where resources have to pass through each species in a line,
resulting in an exponent of 2. It has been suggested that food webs display a universal exponent of 1.13 [18,45], reflecting an invariant functional food web property relating to very efficient resource transportation within an ecosystem. However, analyses based on a larger set of webs (17 webs versus 7) suggest that exponents for C_i as a function of A_i range from 1.09 to 1.26 and are thus not universal, that the exponents are quite sensitive to small changes in food web structure, and that the observed range of exponent values would be similarly constrained in any network with only 3 levels, as is seen in food web MSTs [15].

Models of Food Web Structure

An important area of research on food webs has been whether their observed structure, which often appears quite complex, can emerge from simple rules or models. As with other kinds of “real-world” networks, models that assign links among nodes randomly, according to fixed probabilities, fail to reproduce the network structure of empirically observed food webs [24,28,110]. Instead, several models that combine stochastic elements with simple link assignment rules have been proposed to generate and predict the network structure of empirical food webs. The models share a basic formulation [110]. There are two empirically quantifiable parameters: (1) S, the number of trophic species in a food web, and (2) C, the connectance of a food web, defined as links per species squared, L/S². Thus, S specifies the number of nodes in a network, and C specifies the number of links in a network with S nodes. Each species is assigned a “niche value” n_i drawn randomly and uniformly from the interval [0,1]. The models differ in the rules used to distribute links among species. The link distribution rules follow in the order the models were introduced in the literature:

Cascade Model (Cohen and Newman [28]): Each species has the fixed probability P = 2CS/(S − 1) of consuming species with niche values less than its own.
This creates a food web with hierarchical feeding, since it does not allow feeding on taxa with the same niche value (cannibalism) or taxa with higher niche values (looping/cycling). This formulation [110] is a modified version of the original cascade model that allows L/S, equivalent to the CS term in the probability statement above, to vary as a tunable parameter, rather than be fixed as a constant [28].

Niche Model (Williams and Martinez [110], Fig. 8): Each species consumes all species within a segment of the [0,1] interval whose size r_i is calculated using the feeding range width algorithm described below. The range's center c_i is set at a random value drawn uniformly from the interval [r_i/2, n_i], or from [r_i/2, 1 − r_i/2] if n_i > 1 − r_i/2, which places c_i equal to or lower than the niche value n_i and keeps the r_i segment within [0,1]. The c_i rule relaxes the strict feeding hierarchy of the cascade model and allows for the possibility of cannibalism and looping. Also, the r_i rule ensures that species feed on a contiguous range of species, necessarily creating interval graphs (i.e., species can be lined up along a single interval such that all of their resource species are located in contiguous segments along the interval). Feeding range width algorithm: the value of r_i = x·n_i, where 0 < x < 1 is randomly drawn from the probability density function p(x) = β(1 − x)^(β−1) (the beta distribution), with β = 1/(2C) − 1 chosen to obtain a C close to the desired C.

Food Webs, Figure 8 Graphical representation of the niche model: Species i feeds on 4 taxa including itself and one with a higher niche value

Nested-Hierarchy Model (Cattin et al. [19]): Like the niche model, the number of prey items for each species is drawn randomly from a beta distribution that constrains C close to a target value. Once the number of prey items for each species is set, those links are assigned in a multistep process. First, a link is randomly assigned from species i to a species j with a lower niche value. If j is fed on by other species, the next feeding links for i are selected randomly from the pool of prey species fed on by a set of consumer species defined as follows: they share at least one prey species and at least one of them feeds on j. If more links need to be distributed, they are then randomly assigned to species without predators and with niche values < n_i, and finally to those with niche values ≥ n_i. These rules were chosen to relax the contiguity rule of the niche model and to allow for trophic habit overlap among taxa in a manner which the authors suggest evokes phylogenetic constraints.

Generalized Cascade Model (Stouffer et al.
[99]): Species i feeds on species j if n_j ≤ n_i, with a probability drawn from the interval [0,1] using the beta or an exponential distribution. This model combines the beta distribution introduced in the niche model with the hierarchical, non-contiguous feeding of the cascade model.
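The niche model's link distribution rules are explicit enough to sketch in a few lines. The following is an illustrative implementation of the rules as described (it omits the rejection steps for disconnected or trophically identical species that the published analyses apply afterwards):

```python
import random

def niche_model(S, C, rng=random):
    """Generate one niche-model web as a set of (consumer, prey) index pairs.

    Each species i gets a niche value n[i] uniform on [0,1]; its feeding
    range has width r = x * n[i] with x ~ beta(1, b), b = 1/(2C) - 1
    (so the expected fraction eaten is 2C), and a center c drawn
    uniformly from [r/2, min(n[i], 1 - r/2)], keeping the range in [0,1].
    """
    b = 1.0 / (2.0 * C) - 1.0
    n = sorted(rng.random() for _ in range(S))        # niche values
    links = set()
    for i in range(S):
        r = n[i] * rng.betavariate(1.0, b)            # feeding range width
        c = rng.uniform(r / 2.0, min(n[i], 1.0 - r / 2.0))  # range center
        lo, hi = c - r / 2.0, c + r / 2.0
        for j in range(S):
            if lo <= n[j] <= hi:
                links.add((i, j))                     # i consumes j
    return links
```

Because each consumer eats a contiguous segment of the sorted niche values, every generated web is interval by construction, which is the feature the nested-hierarchy and generalized cascade variants relax.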


These models have been evaluated with respect to their relative fit to empirical data in a variety of ways. In a series of six papers published from 1985 to 1990 with the common title “A stochastic theory of community food webs,” the cascade model was proposed as a means of explaining “the phenomenology of observed food web structure, using a minimum of hypotheses” [31]. This was not the first simple model proposed for generating food web structure [25,88,89,116], but it was the most well-developed model. Cohen and colleagues also examined several model variations, most of which performed poorly. While the cascade model appeared to generate structures that qualitatively fit general patterns in the data from the 113-web catalog, subsequent statistical analyses suggested that the fit between the model and that early data was poor [93,96,97]. Once improved data began to emerge, it became clear that some of the basic assumptions built into the cascade model, such as no looping and minimal overlap and no clustering of feeding habits across taxa, are violated by common features of multi-species interactions. The niche model was introduced in 2000, along with a new approach to analysis: numerical simulations to compare statistically the ability of the niche model, the cascade model, and one type of random network model to fit empirical food web data [110]. Because of stochastic variation in how species and links are distributed in any particular model web, analysis begins with the generation of hundreds to thousands of model webs with the same S and similar C as an empirical food web of interest. Model webs that fall within 3% of the target C are retained. Model-generated webs occasionally contain species with no links to other species, or species that are trophically identical. Either those webs are thrown out, or those species are eliminated and replaced, until every model web has no disconnected or identical species.
Also, each model web must contain at least one basal species. These requirements ensure that model webs can be sensibly comparable to empirical trophic-species webs. Once a set of model webs is generated with the same S and C as an empirical web, model means and standard deviations are calculated for each food web property of interest, which can then be compared to empirical values. Raw error, the difference between the value of an empirical property and a model mean for that property, is normalized by dividing it by the standard deviation of the property's simulated distribution. This approach allows assessment not only of whether a model over- or under-estimates empirical properties, as indicated by the raw error, but also to what degree a model's mean deviates from the empirical value. Normalized errors within ±2

are considered to indicate a good fit between the model prediction and the empirical value. This approach has also been used to analyze network motifs [77] (Subsect. “Other Properties”). The initial niche model analyses examined seven more recent, diverse food webs (S = 24 to 92) and up to 12 network structure properties for each web [110]. The random model (links are distributed randomly among nodes) performs poorly, with an average normalized error (ANE) of 27.1 (SD = 202). The cascade model performs better, with an ANE of −3.0 (SD = 14.1). The niche model performs an order of magnitude better than the cascade model, with an ANE of 0.22 (SD = 1.8). Only the niche model falls within the ±2 ANE considered to show a good fit to the data. Not surprisingly, there is variability in how all three models fit different food webs and properties. For example, the niche model generally overestimates food-chain length. Specific mismatches are generally attributable either to limitations of the models or biases in the data [110]. A separate test of the niche and cascade models with three marine food webs, a type of habitat not included in the original analysis, obtained similar results [39]. These analyses demonstrate that the structure of food webs is far from random and that simple link distribution rules can yield apparently complex network structure, similar to that observed in empirical data. In addition, the analyses suggest that food webs from a variety of habitats share a fundamentally similar network structure, and that the structure is scale-dependent in predictable ways with S and C. The nested-hierarchy model [19] and generalized cascade model [99], variants of the niche model, do not appear to improve on the niche model, and in fact may be worse at representing several aspects of empirical network structure.
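The normalized-error procedure described above reduces to a few lines once model webs have been generated and their properties measured (the property names here are placeholders):

```python
from statistics import mean, pstdev

def normalized_errors(empirical, model_values):
    """Normalized error per property: (empirical - model mean) / model SD.

    `empirical` maps property name -> observed value; `model_values`
    maps property name -> list of values over many simulated webs.
    Values within +/-2 are read as a good model-data fit.
    """
    return {prop: (emp - mean(model_values[prop])) / pstdev(model_values[prop])
            for prop, emp in empirical.items()}
```

The average of the absolute or signed normalized errors over all properties and webs then gives a single summary figure comparable to the ANE values quoted above.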
Although the nested-hierarchy model breaks the intervality of the niche model and uses a complicated-sounding set of link distribution rules to try to mimic phylogenetic constraints on trophic structure, it “generates webs characterized by the same universal distributions of numbers of prey, predators, and links” as the niche model [99]. Both the niche and nested-hierarchy models have a beta distribution at their core. The beta distribution is reasonably approximated by an exponential distribution for C < 0.12 [99], and thus reproduces the exponential degree distributions observed in many empirical webs, particularly those with average or less-than-average C [37]. The generalized cascade model was proposed as a simplified model that would return the same distributions of taxa and links as the niche and nested-hierarchy models. It is defined using only two criteria: (1) taxa form a totally ordered set—this is fulfilled by the arrangement of taxa along a single “niche” interval or dimension, and (2) each species
has an exponentially decaying probability of preying on a given fraction of species with lower niche values [99]. Although the generalized cascade model does capture a central tendency of successful food web models, only some food web properties are derivable from link distributions (e.g., Top, Bas, Can, VulSD, GenSD, Clus). There are a variety of food web structure properties of interest that are not derivable from degree distributions (e.g., Loop, Omn, Herb, TL, food-chain statistics, intervality statistics). The accurate representation of these types of properties may depend on additional factors, for example the contiguous feeding ranges specified by the niche model but absent from the cascade, nested-hierarchy, and generalized cascade models. While it is known that empirical food webs are not interval, until recently it was not clear how non-interval they are. Intervality is a brittle property that is broken by a single gap in a single feeding range (i.e., a single missing link in a food web), and trying to arrange species in a food web into their most interval ordering is a computationally challenging problem. A more robust measure of intervality in food webs has been developed, in conjunction with the use of simulated annealing to estimate the most interval ordering of empirical food webs [100]. This analysis suggests that complex food webs “do exhibit a strong bias toward contiguity of prey, that is, toward intervality” when compared to several alternative “null” models, including the generalized cascade model. Thus, the intervality assumption of the niche model, initially critiqued as a flaw of the model [19], helps to produce a better fit to empirical data than the non-interval alternate models.

Structural Robustness of Food Webs

A series of papers have examined the response of a variety of networks, including the Internet and WWW web pages [1] and metabolic and protein networks [52,53], to the simulated loss of nodes.
In each case, the networks, all of which display highly skewed power-law degree distributions, appear very sensitive to the targeted loss of highly-connected nodes but relatively robust to random loss of nodes. When highly-connected nodes are removed from scale-free networks, the average path length tends to increase rapidly, and the networks quickly fragment into isolated clusters. In essence, paths of information flow in highly skewed networks are easily disrupted by the loss of nodes that are directly connected to an unusually large number of other nodes. In contrast, random networks with much less skewed Poisson degree distributions display similar responses to targeted loss of highly-connected nodes versus random node loss [101].
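This kind of node-removal experiment can be sketched for a food web: remove taxa in some order, and after each primary removal let any non-basal consumer with no surviving resources go secondarily extinct, cascading until no further losses occur. An illustrative sketch (the link representation and function name are assumptions):

```python
def secondary_extinctions(links, removal_order):
    """Simulate primary removals and cascading secondary extinctions.

    `links` is a set of (consumer, resource) pairs. Taxa with no
    resources at the start are treated as basal and never starve.
    Returns, after each primary removal, the cumulative number of
    species lost (primary plus secondary).
    """
    resources = {}
    for c, r in links:
        resources.setdefault(c, set()).add(r)
    taxa = {t for pair in links for t in pair}
    basal = taxa - set(resources)
    alive = set(taxa)
    losses = []
    for victim in removal_order:
        alive.discard(victim)
        changed = True
        while changed:                       # propagate cascading starvation
            changed = False
            for t in list(alive):
                if t not in basal and not (resources.get(t, set()) & alive):
                    alive.discard(t)
                    changed = True
        losses.append(len(taxa) - len(alive))
    return losses
```

Running this with removal orders sorted by degree (most-connected first) versus random orders, and recording the fraction of primary removals at which total losses pass 50% of S, reproduces the kind of robustness comparison discussed below.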

Within ecology, species deletions on small (S < 14) hypothetical food web networks as well as a subset of the 113-web catalog have been used to examine the reliability of network flow, or the probability that sources (producers) are connected to sinks (consumers) in food webs [54]. The structure of the empirical webs appears to conform to reliable flow patterns identified using the hypothetical webs, but that result is based on low-diversity, poorly resolved data. The use of more highly resolved data with node knock-out algorithms to simulate the loss of species allows assessment of potential secondary extinctions in complex empirical food webs. Secondary extinctions result when the removal of taxa leaves one or more consumers without any of their resource taxa. Even though most food webs do not have power-law degree distributions, they show patterns of robustness similar to those of other networks: removal of highly-connected species results in much higher rates of secondary extinctions than random loss of species ([38,39,95], Fig. 9). In addition, loss of high-degree species results in more rapid fragmentation of the webs [95]. Protecting basal taxa from primary removal increases the robustness of the web (i.e., fewer secondary extinctions occur) ([38], Fig. 9). While removing species with few links generally results in few secondary extinctions, in a quarter of the food webs examined, removing low-degree species results in secondary extinctions comparable to or greater than those seen with removal of high-degree species [38]. This tends to occur in webs with relatively high C.

Beyond the differential impacts of various sequences of species loss in food webs, food web “structural robustness” can be defined as the fraction of primary species loss that induces some level of total species loss (primary removals plus secondary extinctions) for a particular trophic-species web. Analysis of R50 (i.e., the proportion of species that must be removed to achieve 50% total species loss) across multiple food webs shows that robustness increases approximately logarithmically with increasing connectance ([38,39], Figs. 9 and 10). In essence, from a topological perspective, food webs with more densely interconnected taxa are better protected from species loss, since greater species loss is required before consumers lose all of their resources.

It is also potentially important from a conservation perspective to identify the particular species whose loss would result in the greatest number of secondary extinctions. The loss of a particular highly-connected species may or may not result in secondary extinctions. One way to identify critical taxa is to reduce the topological structure of empirical food webs to linear pathways that define the essential chains of energy delivery in each web. A particular species can be said to “dominate” other species if it passes energy to them along a chain in the dominator tree. The higher the number of species that a particular species dominates, the greater the secondary extinctions that may result from its removal [3]. This approach has the advantage of going beyond assessment of direct interactions to include indirect interactions.

As in food webs, the order of pollinator loss affects potential plant extinction patterns in plant-pollinator networks [75] (see Sect. “Ecological Networks”). Loss of plant diversity associated with targeted removal of highly-connected pollinators is not as extreme as comparable secondary extinctions in food webs, which may be due to pollinator redundancy and the nested topology of those networks.

While the order in which species go locally extinct clearly affects the potential for secondary extinctions in ecosystems, the focus on high-degree, random, or even dominator species does not provide insight into ecologically plausible species-loss scenarios, whether the focus is on human perturbations or natural dynamics. The issue of what realistic natural extinction sequences might look like has been explored using a set of pelagic-focused food webs for 50 Adirondack lakes [49] with up to 75 species [98]. The geographic nestedness of species composition across the lakes is used to derive an ecologically plausible extinction sequence scenario, with the most geographically restricted taxa the most likely to go extinct. This sequence is corroborated by the pH tolerances of the species. Species removal simulations show that the food webs are highly robust, in terms of secondary extinctions, to the “realistic” extinction order and highly sensitive to the reverse order. This suggests that nested geographical distribution patterns coupled with local food web interaction patterns appear to buffer the effects of likely species losses. This highlights important aspects of community organization that may help to minimize biodiversity loss in the face of a naturally changing environment. However, anthropogenic disturbances may disrupt the inherent buffering of how taxa are organized geographically and trophically, reducing the robustness of ecosystems.

Food Webs

Food Webs, Figure 9 Secondary extinctions resulting from primary species loss in 4 food webs ordered by increasing connectance (C). The y-axis shows cumulative secondary extinctions as a fraction of initial S, and the x-axis shows primary removals of species as a fraction of initial S. 95% error bars for the random removals fall within the size of the symbols and are not shown. For the most-connected (circles), least-connected (triangles), and random removal (plus symbols) sequences, the data series end at the black diagonal dashed line, where primary removals plus secondary extinctions equal S and the web disappears. For the most-connected species removals with basal species preserved (black dots), the data points end when only basal species remain. The shorter red diagonal lines show the points at which 50% of species are lost through combined primary removals and secondary extinctions (“robustness” or R50)

Food Webs, Figure 10 The proportion of primary species removals required to induce a total loss (primary removals plus secondary extinctions) of 50% of the species in each of 16 food webs (“robustness,” see the shorter red line of Fig. 9 for visual representation) as a function of the connectance of each web. Logarithmic fits to the three data sets are shown, with a solid line for the most-connected deletion order, a long dashed line for the most-connected-with-basal-species-preserved deletion order, and a short dashed line for the random deletion order. The maximum possible y value is 0.50. The equations for the fits are: y = 0.162 ln(x) + 0.651 for most-connected species removals, y = 0.148 ln(x) + 0.691 for most-connected species removals with basal species preserved, and y = 0.067 ln(x) + 0.571 for random species removals. Reproduced from [38], Fig. 2

Food Web Dynamics

Analysis of the topology of food webs has proven very useful for exploring basic patterns and generalities of “who eats whom” in ecosystems. This approach seeks to identify “the most universal, high-level, persistent elements of organization” [35] in trophic networks, and to leverage understanding of such organization for thinking about ecosystem robustness. However, food webs are inherently dynamical systems, since feeding interactions involve biomass flows among species whose “stocks” can be characterized by numbers of individuals and/or aggregate population biomass. All of these stocks and flows change through time in response to direct and indirect trophic and other types of interactions. Determining the interplay among network structure, network dynamics, and various aspects of stability such as persistence, robustness, and resilience in complex “real-world” networks is one of the great current challenges in network research [101]. It is particularly important in the study of ecosystems, since they face a variety of anthropogenic perturbations such as climate change, habitat loss, and invasions, and since humans depend on them for a variety of “ecosystem services” such as supply of clean water and pollination of crops [34]. Because it is nearly impossible to compile detailed, long-term empirical data for the dynamics of more than two interacting species, most research on species interaction dynamics relies on analytical or simulation modeling.
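The species knock-out simulations discussed under robustness above (secondary extinctions and R50) are simple enough to sketch in code. This is a minimal sketch that assumes a web stored as a mapping from each consumer to its set of resources, with basal taxa simply absent as keys; the function names and the toy web are illustrative, not the cited studies' actual implementations:

```python
# Minimal sketch of a species-removal (knock-out) robustness simulation.

def species(web):
    """All taxa appearing as consumers or resources."""
    return set(web) | {r for res in web.values() for r in res}

def secondary_extinctions(web, removed):
    """Cascade: iteratively drop consumers that have lost all resources."""
    removed = set(removed)
    alive = species(web) - removed
    changed = True
    while changed:
        changed = False
        for s in list(alive):
            res = web.get(s, set())
            if res and not (res - removed):  # consumer with no remaining resources
                alive.discard(s)
                removed.add(s)
                changed = True
    return removed

def r50(web, order):
    """Fraction of primary removals needed to reach >= 50% total species loss."""
    S = len(species(web))
    gone = set()
    for k, target in enumerate(order, start=1):
        gone = secondary_extinctions(web, gone | {target})
        if len(gone) >= 0.5 * S:
            return k / S
    return 1.0  # the full sequence never reached 50% loss
```

For a toy chain web in which two plants support a herbivore that supports a carnivore, removing both plants cascades to all four species, giving R50 = 0.5. For comparison, the logarithmic fits reported in Fig. 10 predict robustness of roughly 0.162 ln(C) + 0.651 at connectance C for the most-connected deletion order [38].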
Most modeling studies of trophic dynamics have focused narrowly on predator-prey or parasite-host interactions. However, as the previous sections should make clear, in natural ecosystems such interaction dyads are embedded in diverse, complex networks, where many additional taxa and their direct and indirect interactions can play important roles in the stability of focal species as well as the stability or persistence of the broader community. Moving beyond the two-species population-dynamics modeling paradigm, there is a strong tradition of research that looks at interactions among 3–8 species, exploring dynamics and simple variations in structure in slightly more complex systems (see reviews in [40,55]). However, these interaction modules still present a drastic simplification of the diversity and structure of natural ecosystems. Other dynamical approaches have focused on higher-diversity model systems [69], but ignore network structure in order to conduct analytically tractable analyses.

Researchers are increasingly integrating dynamics with complex food web structure in modeling studies that move beyond small modules. The Lotka–Volterra cascade model [20,21,32] was an early incarnation of this type of integration. As its name suggests, the Lotka–Volterra cascade model runs classic L–V dynamics, including a non-saturating linear functional response, on sets of species interactions structured according to the cascade model [28]. The cascade model was also used to generate the structural framework for a dynamical food web model with a linear functional response [58] used to study the effects of prey switching on ecosystem stability. Improving on aspects of biological realism of both dynamics and structure, a bioenergetic dynamical model with nonlinear functional responses [119] was used in conjunction with empirically defined food web structure among 29 species to simulate the biomass dynamics of a marine fisheries food web [117,118]. This type of nonlinear bioenergetic dynamical modeling approach has been integrated with niche-model network structure and used to study more complex networks [13,14,68,112]. A variety of types of dynamics are observed in these nonlinear models, including equilibrium, limit-cycle, and chaotic dynamics, which may or may not be persistent over short or long time scales (Fig. 11). Other approaches model ecological and evolutionary dynamics to assemble species into networks, rather than imposing a particular structure on them. These models, which typically employ an enormous number of parameters, are evaluated as to whether they generate plausible persistence, diversity, and network structure (see review by [72]).
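The bioenergetic family of models mentioned above can be sketched for the simplest case of one producer and one consumer, with logistic producer growth and a saturating (Type II) functional response; all parameter values below are illustrative placeholders, not taken from the cited studies:

```python
import numpy as np

# Highly simplified sketch of a nonlinear bioenergetic consumer-resource model
# (producer biomass B_r, consumer biomass B_c), Euler-integrated.
# r: producer growth rate, K: carrying capacity, x: consumer metabolic rate,
# y: maximum consumption rate relative to metabolism, e: assimilation efficiency,
# B0: half-saturation density of the functional response.
def simulate(B_r=0.5, B_c=0.3, r=1.0, K=1.0, x=0.2, y=2.0, e=0.85,
             B0=0.5, dt=0.01, steps=5000):
    traj = []
    for _ in range(steps):
        F = B_r / (B0 + B_r)                                   # Type II functional response
        dBr = r * B_r * (1.0 - B_r / K) - x * y * F * B_c / e  # logistic growth minus consumption
        dBc = -x * B_c + x * y * F * B_c                       # metabolic loss plus assimilated intake
        B_r = max(B_r + dBr * dt, 0.0)
        B_c = max(B_c + dBc * dt, 0.0)
        traj.append((B_r, B_c))
    return np.array(traj)
```

Depending on parameter choices, this kind of model produces the equilibrium, damped-oscillation, or limit-cycle behavior shown in Fig. 11; the networked versions discussed in the text run one such equation per species over cascade- or niche-model link structures.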
All of these approaches are generally used to examine stability, characterized in a diversity of ways, in ecosystems with complex structure and dynamics [71,85]. While it is basically impossible to empirically validate models of integrated structure and dynamics for complex ecological networks, in some situations it is possible to draw interesting connections between models and data at more aggregated levels. This provides opportunities to move beyond the merely heuristic role that such models generally play. For example, nonlinear bioenergetic models of population dynamics parametrized by biological rates allometrically scaled to populations’ average body masses have been run on various types of model food web structures [14]. This approach has allowed the comparison of trends in two different measures of food web stability, and how they relate to consumer-resource body-size ratios and to initial network structure. One measure of stability is the fraction of original species that display persistent dynamics, i.e., the fraction of species that do not go extinct in the model when it is run over many time steps (“species persistence”). Another measure of stability is how variable the densities of all of the persistent species are (“population stability”): greater variability across all the species indicates decreased stability in terms of population dynamics.

Brose and colleagues [14] ran the model using different hypothetical consumer-resource body-size ratios that range from 10^-2 (consumers are 100 times smaller than their resources) to 10^5 (consumers are 100,000 times larger than their resources) (Fig. 12). Species persistence increases dramatically with increasing body-size ratios, until inflection points are reached at which persistence shifts to high levels (~0.80) (Fig. 12a). However, population stability decreases with increasing body-size ratios until inflection points are reached that show the lowest stability, and then increases again beyond those points (Fig. 12b). In both cases, the inflection points correspond to empirically observed consumer-resource body-size ratios, both for webs parametrized to represent invertebrate-dominated webs and for webs parametrized to represent ectotherm-vertebrate-dominated webs. Thus, across thousands of observations, invertebrate consumer-resource body-size ratios are ~10^1, and ectotherm vertebrate consumer-resource body-size ratios are ~10^2, which correspond to the model’s inflection points for species persistence and population stability (Fig. 12). It is interesting to note that high species persistence is coupled to low population stability, i.e., an aspect of increased stability of the whole system (species persistence) is linked to an aspect of decreased stability of components of that system (population stability). It is also interesting to note that in this formulation, using initial cascade versus niche model structure had little impact on species persistence or population stability [14], although other formulations show increased persistence when dynamics are initiated with niche model versus other structures [68]. How structure influences dynamics, and vice versa, is an open question.

Food Webs, Figure 11 Five different types of population dynamics shown as time series of population density (from [40], Fig. 10.1). The types of dynamics shown include a stable equilibrium, damped oscillations, limit cycles, amplified oscillations, and chaotic oscillations

Food Webs, Figure 12 a shows the fraction of species that display persistent dynamics as a function of consumer-resource body-size ratios for model food webs parametrized for invertebrates (gray line) and ectotherm vertebrates (black line). The inflection points for shifts to high-persistence dynamics are indicated by red arrows for both curves, and those inflection points correspond to empirically observed consumer-resource body-size ratios for invertebrate-dominated webs (10^1: consumers are on average 10 times larger than their resources) and ectotherm-vertebrate-dominated webs (10^2: consumers are on average 100 times larger than their resources). b shows results for population stability, the mean of how variable species population biomasses are in persistent webs. In this case, the inflection points for shifts to low population stability are indicated by red arrows, and those inflection points also correspond to the empirically observed body-size ratios for consumers and resources. Figure adapted from [14]

Ecological Networks

This article has focused on food webs, which generally concern classic predator-herbivore-primary producer feeding interactions. However, the basic concept of food webs can be extended to a broader framework of “ecological networks” that is more inclusive of different components of ecosystem biomass flow, and that takes into consideration different kinds of species interactions that are not classic “predator-prey” interactions. Three examples are mentioned here.

First, parasites have typically been given short shrift in traditional food webs, although exceptions exist (e.g., [51,67,74]). This is changing as it becomes clear that parasites are ubiquitous, often have significant impacts on predator-prey dynamics, and may be the dominant trophic habit in most food webs, potentially altering our understanding of structure and dynamics [59]. The dynamical models described previously have been parametrized with more conventional, non-parasite consumers in mind. An interesting open question is how altering dynamical model parameters such as metabolic rate, functional response, and consumer-resource body-size ratios to reflect parasite characteristics will affect our understanding of food web stability.

Second, the role of detritus, or dead organic matter, in food webs has yet to be adequately resolved in either structural or dynamical approaches. Detritus has typically been included as one or several separate nodes in many binary-link and flow-weighted food webs. In some cases it is treated as an additional “primary producer,” while in other cases both primary producers and detritivores connect to it. Researchers must think much more carefully about how to include detritus in all kinds of ecological studies [80], given that it plays a fundamental role in most ecosystems and has particular characteristics that differ from other food web nodes: it is non-living organic matter, all species contribute to detrital pools, it is a major resource for many species, and the forms it takes are extremely heterogeneous (e.g., suspended organic matter in water columns; fecal material; rotting trees; dead animal bodies; small bits of plants and molted cuticle, skin, and hair mixed in soil; etc.).

Third, there are many interactions that species participate in that go beyond strictly trophic interactions. Plant-animal mutualistic networks, particularly pollination and seed dispersal or “frugivory” networks, have received the most attention thus far. They are characterized as “bipartite” (two-level) graphs, with links from animals to plants, but no links among plants or among animals [7,9,56,57,73,107]. While both pollination and seed dispersal do involve a trophic interaction, with animals gaining nutrition from plants during the interactions, unlike in classic predator-prey relationships a positive benefit is conferred upon both partners in the interaction. The evolutionary and ecological dynamics of such mutualistic relationships place unique constraints on the network structure of these interactions and the dynamical stability of such networks. For example, plant-animal mutualistic networks are highly nested and thus asymmetric, such that generalist plants and generalist animals tend to interact among themselves, but specialist species (whether plants or animals) also tend to interact with the most generalist species [7,107]. When simple dynamics are run on these types of “coevolutionary” bipartite networks, it appears that the asymmetric structure enhances long-term species coexistence and thus biodiversity maintenance [9].

Future Directions

Food web research of all kinds has expanded greatly over the last several years, and there are many opportunities for exciting new work at the intersection of ecology and network structure and dynamics. In terms of empiricism, there is still a paucity of detailed, evenly resolved community food webs in every habitat type. Current theory, models, and applications need to be tested against more diverse, more complete, and more highly quantified data. In addition, there are many types of datasets that could be compiled which would support novel research. For example, certain kinds of fossil assemblages may allow the compilation of detailed paleo food webs, which in turn could allow examination of questions about how and why food web structure does or does not change over deep time or in response to major extinction events. Another example is data illustrating the assembly of food webs [41] in particular habitats over ecological time. In particular, areas undergoing rapid successional dynamics would be excellent candidates, such as an area covered by volcanic lava flows, a field exposed by a retreating glacier, a hillside denuded by an earth slide, or a forest burned in a large fire. This type of data would allow empirically-based research on the topological dynamics of food webs.

Another empirical frontier is the integration of multiple kinds of ecological interaction data into networks with multiple kinds of links, for example networks that combine mutualistic interactions such as pollination and antagonistic interactions such as predator-prey relationships. In addition, more spatially explicit food web data can be compiled across microhabitats or linked macrohabitats [8]. Most current food web data is effectively aspatial, even though trophic interactions occur within a spatial context. More could also be done to collect food web data based on specific instances of trophic interactions. This was done for the insects that live inside the stems of grasses in British fields. The web includes multiple grass species, grass herbivores, their parasitoids, hyper-parasitoids, and hyper-hyper-parasitoids [67]. Dissection of over 160,000 grass stems allowed detailed quantification of the frequency with which the species (S = 77 insect plus 10 grass species) and different trophic interactions (L = 126) were observed.

Better empiricism will support improved and novel analysis, modeling, and theory development and testing. For example, while food webs appear fundamentally different in some ways from other kinds of “real-world” networks (e.g., they don’t display power-law degree distributions), they also appear to share a common core network structure that is scale-dependent with species richness and connectance in predictable ways, as suggested by the success of the niche and related models. Some of the disparity with other kinds of networks, and the shared structure across food webs, may be explicable through finite-size effects or other methodological or empirical constraints or artifacts. However, aspects of these patterns may reflect attributes of ecosystems that relate to particular ecological, evolutionary, or thermodynamic mechanisms underlying how species are organized in complex bioenergetic networks of feeding interactions. Untangling artifacts from attributes [63] and determining potential mechanisms underlying robust phenomenological patterns (e.g., [10]) is an important area of ongoing and future research. As a part of this, there is much work to be done to continue to integrate structure and dynamics of complex ecological networks. This is critical for gaining a more comprehensive understanding of the conditions that underlie and promote different aspects of stability, at different levels of organization, in response to external perturbations and to endogenous short- and long-term dynamics.

As the empiricism, analysis, and modeling of food web structure and dynamics improve, food web network research can play a more central and critical role in conservation and management [76]. It is increasingly apparent that an ecological network perspective, which encompasses direct and indirect effects among interacting taxa, is critical for understanding, predicting, and managing the impacts of species loss and invasion, habitat conversion, and climate change. Far too often, critical issues of ecosystem management have been decided on extremely limited knowledge of one or a very few taxa. For example, this has been an ongoing problem in fisheries science. The narrow focus of most research driving fisheries management decisions has resulted in overly optimistic assessments of sustainable fishing levels. Coupled with climate stressors, over-fishing appears to be driving steep, rapid declines in the diversity of common predator target species, and probably of many other kinds of associated taxa [114]. Until we acknowledge that species of interest to humans are embedded within complex networks of interactions that can produce unexpected effects through the interplay of direct and indirect effects, we will continue to experience negative outcomes from our management decisions [118]. An important part of minimizing and mitigating human impacts on ecosystems also involves research that explicitly integrates human and natural dynamics.
Network research provides a natural framework for analyzing and modeling the complex ways in which humans interact with and impact the world’s ecosystems, whether through local foraging or large-scale commercial harvesting driven by global economic markets. These and other related research directions will depend on efficient management of increasingly dispersed and diversely formatted ecological and environmental data. Ecoinformatic tools—the technologies and practices for gathering, synthesizing, analyzing, visualizing, storing, retrieving, and otherwise managing ecological knowledge and information—are playing an increasingly important role in the study of complex ecosystems, including food web research [47]. Indeed, ecology provides an excellent testbed for developing, implementing, and testing new information technologies in a biocomplexity research context (e.g., Semantic Prototypes in Research Ecoinformatics/SPiRE: spire.umbc.edu/us/; Science Environment for Ecological Knowledge/SEEK: seek.ecoinformatics.org;


Webs on the Web/WoW: www.foodwebs.org). Synergistic ties between ecology, physics, computer science and other disciplines will dramatically increase the efficacy of research that takes advantage of such interdisciplinary approaches, as is currently happening in food web and related research.

20. 21.

22.

Bibliography Primary Literature 1. Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature 406:378–382 2. Albert R, Barabási AL (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–97 3. Allesina S, Bodini A (2004) Who dominates whom in the ecosystem? Energy flow bottlenecks and cascading extinctions. J Theor Biol 230:351–358 4. Amaral LAN, Scala A, Berthelemy M, Stanley HE (2000) Classes of small-world networks. Proc Nat Acad Sci USA 97:11149– 11152 5. Arii K, Parrott L (2004) Emergence of non-random structure in local food webs generated from randomly structured regional webs. J Theor Biol 227:327–333 6. Baird D, Ulanowicz RE (1989) The seasonal dynamics of the Chesapeake Bay ecosystem. Ecol Monogr 59:329–364 7. Bascompte J, Jordano P, Melián CJ, Olesen JM (2003) The nested assembly of plant-animal mutualistic networks. Proc Natl Acad Sci USA 100:9383–9387 8. Bascompte J, Melián CJ, Sala E (2005) Interaction strength combinations and the overfishing of a marine food web. Proc Natl Acad Sci 102:5443–5447 9. Bascompte J, Jordano P, Olesen JM (2006) Asymmetric coevolutionary networks facilitate biodiversity maintenance. Science 312:431–433 10. Beckerman AP, Petchey OL, Warren PH (2006) Foraging biology predicts food web complexity. Proc Natl Acad Sci USA 103:13745–13749 11. Bersier L-F, Banašek-Richter C, Cattin M-F (2002) Quantitative descriptors of food web matrices. Ecology 83:2394–2407 12. Briand F, Cohen JE (1984) Community food webs have scaleinvariant structure. Nature 398:330–334 13. Brose U, Williams RJ, Martinez ND (2003) Comment on “Foraging adaptation and the relationship between food-web complexity and stability”. Science 301:918b 14. Brose U, Williams RJ, Martinez ND (2006) Allometric scaling enhances stability in complex food webs. Ecol Lett 9:1228– 1236 15. Camacho J, Arenas A (2005) Universal scaling in food-web structure? Nature 435:E3-E4 16. 
Camacho J, Guimerà R, Amaral LAN (2002) Robust patterns in food web structure. Phys Rev Lett 88:228102 17. Camacho J, Guimerà R, Amaral LAN (2002) Analytical solution of a model for complex food webs. Phys Rev Lett E 65:030901 18. Cartoza CC, Garlaschelli D, Caldarelli G (2006) Graph theory and food webs. In: Pascual M, Dunne JA (eds) Ecological Networks: Linking Structure to Dynamics in Food Webs. Oxford University Press, New York, pp 93–117 19. Cattin M-F, Bersier L-F, Banašek-Richter C, Baltensperger M,

23. 24. 25. 26.

27. 28.

29.

30.

31. 32.

33.

34. 35. 36.

37.

38.

39.

40.

41.

Gabriel J-P (2004) Phylogenetic constraints and adaptation explain food-web structure. Nature 427:835–839 Chen X, Cohen JE (2001) Global stability, local stability and permanence in model food webs. J Th Biol 212:223–235 Chen X, Cohen JE (2001) Transient dynamics and food web complexity in the Lotka–Volterra cascade model. Proc Roy Soc Lond B 268:869–877 Christian RR, Luczkovich JJ (1999) Organizing and understanding a winter’s seagrass foodweb network through effective trophic levels. Ecol Model 117:99–124 Cohen JE (1977) Ratio of prey to predators in community food webs. Nature 270:165–167 Cohen JE (1977) Food webs and the dimensionality of trophic niche space. Proc Natl Acad Sci USA 74:4533–4563 Cohen JE (1978) Food Webs and Niche Space. Princeton University Press, NJ Cohen JE (1989) Ecologists Co-operative Web Bank (ECOWeB™). Version 1.0. Machine Readable Data Base of Food Webs. Rockefeller University, NY Cohen JE, Briand F (1984) Trophic links of community food webs. Proc Natl Acad Sci USA 81:4105–4109 Cohen JE, Newman CM (1985) A stochastic theory of community food webs: I. Models and aggregated data. Proc R Soc Lond B 224:421–448 Cohen JE, Palka ZJ (1990) A stochastic theory of community food webs: V. Intervality and triangulation in the trophic niche overlap graph. Am Nat 135:435–463 Cohen JE, Briand F, Newman CM (1986) A stochastic theory of community food webs: III. Predicted and observed length of food chains. Proc R Soc Lond B 228:317–353 Cohen JE, Briand F, Newman CM (1990) Community Food Webs: Data and Theory. Springer, Berlin Cohen JE, Luczak T, Newman CM, Zhou Z-M (1990) Stochastic structure and non-linear dynamics of food webs: qualitative stability in a Lotka–Volterra cascade model. Proc R Soc Lond B 240:607–627 Cohen JE, Jonsson T, Carpenter SR (2003) Ecological community description using the food web, species abundance, and body size. 
Proc Natl Acad Sci USA 100:1781–1786 Daily GC (ed) (1997) Nature’s services: Societal dependence on natural ecosystems. Island Press, Washington DC Doyle J, Csete M (2007) Rules of engagement. Nature 446:860 Dunne JA (2006) The network structure of food webs. In: Pascual M, Dunne JA (eds) Ecological Networks: Linking Structure to Dynamics in Food Webs. Oxford University Press, New York, pp 27–86 Dunne JA, Williams RJ, Martinez ND (2002) Food web structure and network theory: the role of connectance and size. Proc Natl Acad Sci USA 99:12917–12922 Dunne JA, Williams RJ, Martinez ND (2002) Network structure and biodiversity loss in food webs: robustness increases with connectance. Ecol Lett 5:558–567 Dunne JA, Williams RJ, Martinez ND (2004) Network structure and robustness of marine food webs. Mar Ecol Prog Ser 273:291–302 Dunne JA, Brose U, Williams RJ, Martinez ND (2005) Modeling food-web dynamics: complexity-stability implications. In: Belgrano A, Scharler U, Dunne JA, Ulanowicz RE (eds) Aquatic Food Webs: An Ecosystem Approach. Oxford University Press, New York, pp 117–129 Dunne JA, Williams RJ, Martinez ND, Wood RA, Erwing

1173

1174

Food Webs

42. 43. 44. 45. 46. 47.

48.

49. 50. 51. 52.

53. 54. 55. 56.

57.

58.

59. 60. 61. 62. 63.

64. 65.

DE (2008) Compilation and network analyses of Cambrian food webs. PLoS Biology 5:e102. doi:10.1371/journal.pbio. 0060102 Egerton FN (2007) Understanding food chains and food webs, 1700–1970. Bull Ecol Soc Am 88(1):50–69 Elton CS (1927) Animal Ecology. Sidgwick and Jackson, London Elton CS (1958) Ecology of Invasions by Animals and Plants. Chapman & Hall, London Garlaschelli D, Caldarelli G, Pietronero L (2003) Universal scaling relations in food webs. Nature 423:165–168 Goldwasser L, Roughgarden JA (1993) Construction of a large Caribbean food web. Ecology 74:1216–1233 Green JL, Hastings A, Arzberger P, Ayala F, Cottingham KL, Cuddington K, Davis F, Dunne JA, Fortin M-J, Gerber L, Neubert M (2005) Complexity in ecology and conservation: mathematical, statistical, and computational challenges. Bioscience 55:501–510 Hardy AC (1924) The herring in relation to its animate environment. Part 1. The food and feeding habits of the herring with special reference to the East Coast of England. Fish Investig Ser II 7:1–53 Havens K (1992) Scale and structure in natural food webs. Science 257:1107–1109 Hutchinson GE (1959) Homage to Santa Rosalia, or why are there so many kinds of animals? Am Nat 93:145–159 Huxham M, Beany S, Raffaelli D (1996) Do parasites reduce the chances of triangulation in a real food web? Oikos 76:284–300 Jeong H, Tombor B, Albert R, Oltvia ZN, Barabási A-L (2000) The large-scale organization of metabolic networks. Nature 407:651–654 Jeong H, Mason SP, Barabási A-L, Oltavia ZN (2001) Lethality and centrality in protein networks. Nature 411:41 Jordán F, Molnár I (1999) Reliable flows and preferred patterns in food webs. Ecol Ecol Res 1:591–609 Jordán F, Scheuring I (2004) Network ecology: topological constraints on ecosystem dynamics. Phys Life Rev 1:139–229 Jordano P (1987) Patterns of mutualistic interactions in pollination and seed dispersal: connectance, dependence asymmetries, and coevolution. 
Am Nat 129:657–677 Jordano P, Bascompte J, Olesen JM (2003) Invariant properties in co-evolutionary networks of plant-animal interactions. Ecol Lett 6:69–81 Kondoh M (2003) Foraging adaptation and the relationship between food-web complexity and stability. Science 299:1388–1391 Lafferty KD, Dobson AP, Kurls AM (2006) Parasites dominate food web links. Proc Nat Acad Sci USA 103:11211–11216 Lindeman RL (1942) The trophic-dynamic aspect of ecology. Ecology 23:399–418 Link J (2002) Does food web theory work for marine ecosystems? Mar Ecol Prog Ser 230:1–9 MacArthur RH (1955) Fluctuation of animal populations and a measure of community stability. Ecology 36:533–536 Martinez ND (1991) Artifacts or attributes? Effects of resolution on the Little Rock Lake food web. Ecol Monogr 61:367– 392 Martinez ND (1992) Constant connectance in community food webs. Am Nat 139:1208–1218 Martinez ND (1993) Effect of scale on food web structure. Science 260:242–243

66. Martinez ND (1994) Scale-dependent constraints on food-web structure. Am Nat 144:935–953 67. Martinez ND, Hawkins BA, Dawah HA, Feifarek BP (1999) Effects of sampling effort on characterization of food-web structure. Ecology 80:1044–1055 68. Martinez ND, Williams RJ, Dunne JA (2006) Diversity, complexity, and persistence in large model ecosystems. In: Pascual M, Dunne JA (eds) Ecological Networks: Linking Structure to Dynamics in Food Webs. Oxford University Press, New York, pp 163–185 69. May RM (1972) Will a large complex system be stable? Nature 238:413–414 70. May RM (1973) Stability and Complexity in Model Ecosystems. Princeton University Press, Princeton. Reprinted in 2001 as a “Princeton Landmarks in Biology” edition 71. McCann KS (2000) The diversity-stability debate. Nature 405:228–233 72. McKane AJ, Drossel B (2006) Models of food-web evolution. In: Pascual M, Dunne JA (eds) Ecological Networks: Linking Structure to Dynamics in Food Webs. Oxford University Press, New York, pp 223–243 73. Memmott J (1999) The structure of a plant-pollinator network. Ecol Lett 2:276–280 74. Memmott J, Martinez ND, Cohen JE (2000) Predators, parasitoids and pathogens: species richness, trophic generality and body sizes in a natural food web. J Anim Ecol 69:1–15 75. Memmott J, Waser NM, Price MV (2004) Tolerance of pollination networks to species extinctions. Proc R Soc Lond B 271:2605–2611 76. Memmott J, Alonso D, Berlow EL, Dobson A, Dunne JA, Solé R, Weitz J (2006) Biodiversity loss and ecological network structure. In: Pascual M, Dunne JA (eds) Ecological Networks: Linking Structure to Dynamics in Food Webs. Oxford University Press, New York, pp 325–347 77. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network motifs: simple building blocks of complex networks. Science 298:824–827 78. Montoya JM, Solé RV (2002) Small world patterns in food webs. J Theor Biol 214:405–412 79.
Montoya JM, Solé RV (2003) Topological properties of food webs: from real data to community assembly models. Oikos 102:614–622 80. Moore JC, Berlow EL, Coleman DC, de Ruiter PC, Dong Q, Hastings A, Collin Johnson N, McCann KS, Melville K, Morin PJ, Nadelhoffer K, Rosemond AD, Post DM, Sabo JL, Scow KM, Vanni MJ, Wall DH (2004) Detritus, trophic dynamics and biodiversity. Ecol Lett 7:584–600 81. Neutel AM, Heesterbeek JAP, de Ruiter PC (2002) Stability in real food webs: weak links in long loops. Science 296:1120–1123 82. Newman MEJ (2002) Assortative mixing in networks. Phys Rev Lett 89:208701 83. Newman M, Barabási A-L, Watts DJ (eds) (2006) The Structure and Dynamics of Networks. Princeton University Press, Princeton 84. Odum E (1953) Fundamentals of Ecology. Saunders, Philadelphia 85. Pascual M, Dunne JA (eds) (2006) Ecological Networks: Linking Structure to Dynamics in Food Webs. Oxford University Press, New York

Food Webs

86. Paine RT (1988) Food webs: road maps of interactions or grist for theoretical development? Ecology 69:1648–1654 87. Pierce WD, Cushman RA, Hood CE (1912) The insect enemies of the cotton boll weevil. US Dept Agric Bull 100:9–99 88. Pimm SL (1982) Food Webs. Chapman and Hall, London. Reprinted in 2002 as a 2nd edition by University of Chicago Press 89. Pimm SL (1984) The complexity and stability of ecosystems. Nature 307:321–326 90. Pimm SL, Lawton JH (1978) On feeding on more than one trophic level. Nature 275:542–544 91. Pimm SL, Lawton JH (1980) Are food webs divided into compartments? J Anim Ecol 49:879–898 92. Polis GA (1991) Complex desert food webs: an empirical critique of food web theory. Am Nat 138:123–155 93. Schoener TW (1989) Food webs from the small to the large. Ecology 70:1559–1589 94. Schoenly K, Beaver R, Heumier T (1991) On the trophic relations of insects: a food web approach. Am Nat 137:597–638 95. Solé RV, Montoya JM (2001) Complexity and fragility in ecological networks. Proc R Soc Lond B 268:2039–2045 96. Solow AR (1996) On the goodness of fit of the cascade model. Ecology 77:1294–1297 97. Solow AR, Beet AR (1998) On lumping species in food webs. Ecology 79:2013–2018 98. Srinivasan UT, Dunne JA, Harte J, Martinez ND (2007) Response of complex food webs to realistic extinction sequences. Ecology 88:671–682 99. Stouffer DB, Camacho J, Guimerà R, Ng CA, Amaral LAN (2005) Quantitative patterns in the structure of model and empirical food webs. Ecology 86:1301–1311 100. Stouffer DB, Camacho J, Amaral LAN (2006) A robust measure of food web intervality. Proc Natl Acad Sci USA 103:19015–19020 101. Strogatz SH (2001) Exploring complex networks. Nature 410:268–275 102. Sugihara G, Schoenly K, Trombla A (1989) Scale invariance in food web properties. Science 245:48–52 103. Summerhayes VS, Elton CS (1923) Contributions to the ecology of Spitzbergen and Bear Island. J Ecol 11:214–286 104.
Summerhayes VS, Elton CS (1928) Further contributions to the ecology of Spitzbergen and Bear Island. J Ecol 16:193–268 105. Thompson RM, Townsend CR (1999) The effect of seasonal variation on the community structure and food-web attributes of two streams: implications for food-web science. Oikos 87:75–88 106. Thompson RM, Townsend CR (2005) Food web topology varies with spatial scale in a patchy environment. Ecology 86:1916–1925 107. Vázquez DP, Aizen MA (2004) Asymmetric specialization: a pervasive feature of plant-pollinator interactions. Ecology 85:1251–1257 108. Warren PH (1989) Spatial and temporal variation in the structure of a freshwater food web. Oikos 55:299–311 109. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442 110. Williams RJ, Martinez ND (2000) Simple rules yield complex food webs. Nature 404:180–183 111. Williams RJ, Martinez ND (2004) Trophic levels in complex food webs: theory and data. Am Nat 163:458–468 112. Williams RJ, Martinez ND (2004) Diversity, complexity, and persistence in large model ecosystems. Santa Fe Institute Working Paper 04-07-022 113. Williams RJ, Berlow EL, Dunne JA, Barabási A-L, Martinez ND (2002) Two degrees of separation in complex food webs. Proc Natl Acad Sci USA 99:12913–12916 114. Worm B, Sandow M, Oschlies A, Lotze HK, Myers RA (2005) Global patterns of predator diversity in the open oceans. Science 309:1365–1369 115. Yodzis P (1980) The connectance of real ecosystems. Nature 284:544–545 116. Yodzis P (1984) The structure of assembled communities II. J Theor Biol 107:115–126 117. Yodzis P (1998) Local trophodynamics and the interaction of marine mammals and fisheries in the Benguela ecosystem. J Anim Ecol 67:635–658 118. Yodzis P (2000) Diffuse effects in food webs. Ecology 81:261–266 119. Yodzis P, Innes S (1992) Body-size and consumer-resource dynamics. Am Nat 139:1151–1173

Books and Reviews

Belgrano A, Scharler U, Dunne JA, Ulanowicz RE (eds) (2005) Aquatic Food Webs: An Ecosystem Approach. Oxford University Press, Oxford Berlow EL, Neutel A-M, Cohen JE, De Ruiter P, Ebenman B, Emmerson M, Fox JW, Jansen VAA, Jones JI, Kokkoris GD, Logofet DO, McKane AJ, Montoya J, Petchey OL (2004) Interaction strengths in food webs: issues and opportunities. J Anim Ecol 73:585–598 Borer ET, Anderson K, Blanchette CA, Broitman B, Cooper SD, Halpern BS (2002) Topological approaches to food web analyses: a few modifications may improve our insights. Oikos 99:397–401 Christensen V, Pauly D (1993) Trophic Models of Aquatic Ecosystems. ICLARM, Manila Cohen JE, Beaver RA, Cousins SH, De Angelis DL, et al (1993) Improving food webs. Ecology 74:252–258 Cohen JE, Briand F, Newman CM (1990) Community Food Webs: Data and Theory. Springer, Berlin DeAngelis DL, Post WM, Sugihara G (eds) (1983) Current Trends in Food Web Theory. ORNL-5983, Oak Ridge Natl Laboratory Drossel B, McKane AJ (2003) Modelling food webs. In: Bornholdt S, Schuster HG (eds) Handbook of Graphs and Networks: From the Genome to the Internet. Wiley-VCH, Berlin Hall SJ, Raffaelli DG (1993) Food webs: theory and reality. Adv Ecol Res 24:187–239 Lawton JH (1989) Food webs. In: Cherett JM (ed) Ecological Concepts. Blackwell Scientific, Oxford Lawton JH, Warren PH (1988) Static and dynamic explanations for patterns in food webs. Trends Ecol Evol 3:242–245 Martinez ND (1995) Unifying ecological subdisciplines with ecosystem food webs. In: Jones CG, Lawton JH (eds) Linking Species and Ecosystems. Chapman and Hall, New York Martinez ND, Dunne JA (1998) Time, space, and beyond: scale issues in food-web research. In: Peterson D, Parker VT (eds) Ecological Scale: Theory and Applications. Columbia University Press, New York May RM (1983) The structure of food webs. Nature 301:566–568


May RM (2006) Network structure and the biology of populations. Trends Ecol Evol 21:394–399 Montoya JM, Pimm SL, Solé RV (2006) Ecological networks and their fragility. Nature 442:259–264 Moore J, de Ruiter P, Wolters V (eds) (2005) Dynamic Food Webs: Multispecies Assemblages, Ecosystem Development and Environmental Change. Academic Press, Elsevier, Amsterdam Pimm SL, Lawton JH, Cohen JE (1991) Food web patterns and their consequences. Nature 350:669–674 Polis GA, Winemiller KO (eds) (1996) Food Webs: Integration of Patterns & Dynamics. Chapman and Hall, New York

Polis GA, Power ME, Huxel GR (eds) (2003) Food Webs at the Landscape Level. University of Chicago Press, Chicago Post DM (2002) The long and short of food-chain length. Trends Ecol Evol 17:269–277 Strong DR (ed) (1988) Food web theory: a ladder for picking strawberries. Special Feature. Ecology 69:1647–1676 Warren PH (1994) Making connections in food webs. Trends Ecol Evol 9:136–141 Woodward G, Ebenman B, Emmerson M, Montoya JM, Olesen JM, Valido A, Warren PH (2005) Body size in ecological networks. Trends Ecol Evol 20:402–409

Fuzzy Logic

Fuzzy Logic
Lotfi A. Zadeh
Department of EECS, University of California, Berkeley, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Conceptual Structure of Fuzzy Logic
The Basics of Fuzzy Set Theory
The Concept of Granulation
The Concepts of Precisiation and Cointensive Precisiation
The Concept of a Generalized Constraint
Principal Contributions of Fuzzy Logic
A Glimpse of What Lies Beyond Fuzzy Logic
Bibliography

Glossary

Cointension A qualitative measure of proximity of meanings/input-output relations.
Extension principle A principle which relates to propagation of generalized constraints.
f-validity Fuzzy validity.
Fuzzy if-then rule A rule of the form: if X is A then Y is B. In general, A and B are fuzzy sets.
Fuzzy logic (FL) A precise logic of imprecision, uncertainty and approximate reasoning.
Fuzzy logic gambit Exploitation of tolerance for imprecision through deliberate m-imprecisiation followed by mm-precisiation.
Fuzzy set A class with a fuzzy boundary.
Generalized constraint A constraint of the form X isr R, where X is the constrained variable, R is the constraining relation and r is an indexical variable which defines the modality of the constraint, that is, its semantics. In general, generalized constraints have elasticity.
Generalized constraint language A language generated by combination and propagation of generalized constraints.
Graduation Association of a scale of degrees with a fuzzy set.
Granuland Result of granulation.
Granular variable A variable which takes granules as values.
Granulation Partitioning of an object/set into granules.
Granule A clump of attribute values drawn together by indistinguishability, equivalence, similarity, proximity or functionality.
Linguistic variable A granular variable with linguistic labels of granular values.
m-precision Precision of meaning.
mh-precisiand An m-precisiand which is described in a natural language (human-oriented).
mm-precisiand An m-precisiand which is described in a mathematical language (machine-oriented).
p-validity Provable validity.
Precisiand Result of precisiation.
Precisiend Object of precisiation.
v-precision Precision of value.

Definition of the Subject

Viewed in a historical perspective, fuzzy logic is closely related to its precursor – fuzzy set theory [70]. Conceptually, fuzzy logic has a much broader scope and a much higher level of generality than traditional logical systems, among them the classical bivalent logic, multivalued logics, modal logics, probabilistic logics, etc. The principal objective of fuzzy logic is formalization – and eventual mechanization – of two remarkable human capabilities. First, the capability to converse, communicate, reason and make rational decisions in an environment of imprecision, uncertainty, incompleteness of information, partiality of truth and partiality of possibility. And second, the capability to perform a wide variety of physical and mental tasks – such as driving a car in city traffic and summarizing a book – without any measurements and any computations. A concept which has a position of centrality in fuzzy logic is that of a fuzzy set. Informally, a fuzzy set is a class with a fuzzy boundary, implying a gradual transition from membership to nonmembership. A fuzzy set is precisiated through graduation, that is, through association with a scale of grades of membership. Thus, membership in a fuzzy set is a matter of degree. Importantly, in fuzzy logic everything is or is allowed to be graduated, that is, be a matter of degree. Furthermore, in fuzzy logic everything is or is allowed to be granulated, with a granule being a clump of attribute-values drawn together by indistinguishability, equivalence, similarity, proximity or functionality. Graduation and granulation form the core of fuzzy logic. Graduated granulation is the basis for the concept of a linguistic variable – a variable whose values are words rather than numbers [73]. The concept of a linguistic variable is employed in almost all applications of fuzzy logic.


During much of its early history, fuzzy logic was an object of controversy stemming in part from the pejorative connotation of the term “fuzzy”. In reality, fuzzy logic is not fuzzy. Basically, fuzzy logic is a precise logic of imprecision and uncertainty. An important milestone in the evolution of fuzzy logic was the development of the concept of a linguistic variable and the machinery of fuzzy if-then rules [73,90]. Another important milestone was the conception of possibility theory [79]. Possibility theory and probability theory are complementary. A further important milestone was the development of the formalism of computing with words (CW) [93]. Computing with words opens the door to a wide-ranging enlargement of the role of natural languages in scientific theories. In the following, fuzzy logic is viewed in a nontraditional perspective. In this perspective, the cornerstones of fuzzy logic are graduation, granulation, precisiation and the concept of a generalized constraint. The concept of a generalized constraint serves to precisiate the concept of granular information. Granular information is the basis for granular computing (GrC) [2,29,37,81,91,92]. In granular computing the objects of computation are granular variables, with a granular value of a granular variable representing imprecise and/or uncertain information about the value of the variable. In effect, granular computing is the computational facet of fuzzy logic. GrC and CW are closely related. In coming years, GrC and CW are likely to play increasingly important roles in the evolution of fuzzy logic and its applications.

Introduction

Science deals not with reality but with models of reality. In large measure, scientific progress is driven by a quest for better models of reality. In the real world, imprecision, uncertainty and complexity have a pervasive presence.
In this setting, construction of better models of reality requires a better understanding of how to deal effectively with imprecision, uncertainty and complexity. To a significant degree, development of fuzzy logic has been, and continues to be, motivated by this need. In essence, logic is concerned with formalization of reasoning. Correspondingly, fuzzy logic is concerned with formalization of fuzzy reasoning, with the understanding that precise reasoning is a special case of fuzzy reasoning. Humans have many remarkable capabilities. Among them there are two that stand out in importance. First, the capability to converse, communicate, reason and make rational decisions in an environment of imprecision, uncertainty, incompleteness of information, partiality of truth and partiality of possibility. And second, the capability to perform a wide variety of physical and mental tasks – such as driving a car in heavy city traffic and summarizing a book – without any measurements and any computations. In large measure, fuzzy logic is aimed at formalization, and eventual mechanization, of these capabilities. In this perspective, fuzzy logic plays the role of a bridge from natural to machine intelligence.

There are many misconceptions about fuzzy logic. A common misconception is that fuzzy logic is fuzzy. In reality, fuzzy logic is not fuzzy. Fuzzy logic deals precisely with imprecision and uncertainty. In fuzzy logic, the objects of deduction are, or are allowed to be, fuzzy, but the rules governing deduction are precise. In summary, fuzzy logic is a precise system of reasoning, deduction and computation in which the objects of discourse and analysis are associated with information which is, or is allowed to be, imprecise, uncertain, incomplete, unreliable, partially true or partially possible.

For illustration, here are a few simple examples of reasoning in which the objects of reasoning are fuzzy. First, consider the familiar example of deduction in Aristotelian, bivalent logic.

all men are mortal
Socrates is a man
Socrates is mortal

In this example, there is no imprecision and no uncertainty. In an environment of imprecision and uncertainty, an analogous example – an example drawn from fuzzy logic – is

most Swedes are tall
Magnus is a Swede
it is likely that Magnus is tall

with the understanding that Magnus is an individual picked at random from a population of Swedes. To deduce the answer from the premises, it is necessary to precisiate the meaning of “most” and “tall,” with “likely” interpreted as a fuzzy probability which, as a fuzzy number, is equal to “most”.
This simple example points to a basic characteristic of fuzzy logic, namely, in fuzzy logic precisiation of meaning is a prerequisite to deduction. In the example under consideration, deduction is contingent on precisiation of “most”, “tall” and “likely”. The issue of precisiation has a position of centrality in fuzzy logic. In fuzzy logic, deduction is viewed as an instance of question-answering. Let I be an information set consisting


of a system of propositions p_1, …, p_n, I = S(p_1, …, p_n). Usually, I is a conjunction of p_1, …, p_n. Let q be a question. A question-answering schema may be represented as

I
q
ans(q/I)

where ans(q/I) denotes the answer to q given I. The following examples are instances of the deduction schema.

I: most Swedes are tall
   Magnus is a Swede
q: what is the probability that Magnus is tall?
ans(q/I) is likely, likely = most   (1)

I: most Swedes are tall
q: what fraction of Swedes are not tall?
ans(q/I) is (1 − most)   (2)

I: most Swedes are tall
q: what fraction of Swedes are short?
ans(q/I)   (3)

I: most Swedes are tall
q: what is the average height of Swedes?
ans(q/I)   (4)

I: a box contains balls of various sizes
   most are small
   there are many more small balls than large balls
q: what is the probability that a ball drawn at random is neither large nor small?
ans(q/I)   (5)

In these examples, rules of deduction in fuzzy logic must be employed to compute ans(q/I). For (1) and (2) deduction is simple. For (3)–(5) deduction requires the use of what is referred to as the extension principle [70,75]. This principle is discussed in Sect. “The Concept of a Generalized Constraint”.

A less simple example of deduction involves interpolation of an imprecisely specified function. Interpolation of imprecisely specified functions, or interpolative deduction for short, plays a pivotal role in many applications of fuzzy logic, especially in the realm of control. For simplicity, assume that f is a function from reals to reals, Y = f(X). Assume that what is known about f is a collection of input-output pairs of the form

*f = ((*a_1, *b_1), …, (*a_n, *b_n)),

where *a is an abbreviation of “approximately a”. Such a collection is referred to as a fuzzy graph of f [74]. A fuzzy graph of f may be interpreted as a summary of f. In many applications, a fuzzy graph is described as a collection of fuzzy if-then rules of the form

if X is *a_i then Y is *b_i,   i = 1, …, n.

Let *a be a value of X. Viewing *f as an information set, I, interpolation of f may be expressed as a question-answering schema

I: *f
q: *f(*a)
ans(q/(*f, *a))

A very simple example of interpolative deduction is the following. Assume that *a_1, *a_2, *a_3 are labeled small, medium and large, respectively. A fuzzy graph of f may be expressed as a calculus of fuzzy if-then rules:

*f: if X is small then Y is small
    if X is medium then Y is large
    if X is large then Y is small.

Given a value of X, X = a, the question is: what is *f(*a)? The so-called Mamdani rule [39,74,78] provides an answer to this question. More generally, in an environment of imprecision and uncertainty, fuzzy if-then rules may be of the form

if X is *a_i then usually (Y is *b_i),   i = 1, …, n.
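The small/medium/large fuzzy graph above can be exercised numerically. Below is a minimal sketch of this kind of interpolative deduction with a crisp input, in the spirit of the Mamdani rule; the triangular membership functions and the universe [0, 10] are illustrative assumptions, not taken from the text.

```python
# Sketch of Mamdani-style interpolation over the fuzzy graph
#   if X is small then Y is small
#   if X is medium then Y is large
#   if X is large then Y is small
# The triangular granules below are hypothetical.

def tri(a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

small, medium, large = tri(-5, 0, 5), tri(0, 5, 10), tri(5, 10, 15)
rules = [(small, small), (medium, large), (large, small)]

def infer(x, y):
    """Grade of y in the interpolated output *f(x):
    sup over rules of min(antecedent firing degree, consequent grade)."""
    return max(min(ant(x), cons(y)) for ant, cons in rules)

# At x = 2.5 the small and medium rules each fire at 0.5, so the output
# blends 'small' and 'large', each clipped at 0.5.
print(infer(2.5, 0.0))   # 0.5
print(infer(2.5, 10.0))  # 0.5
```

Each rule contributes its consequent clipped at the degree to which its antecedent matches the input; the output fuzzy set is the disjunction (max) of these contributions.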

Rules of the form “if X is *a_i then usually (Y is *b_i)” endow fuzzy logic with the capability to model complex causal dependencies, especially in the realms of economics, social systems, forecasting and medicine. What is not widely recognized is that fuzzy logic is more than an addition to the methods of dealing with imprecision, uncertainty and complexity. In effect, fuzzy logic represents a paradigm shift. More specifically, it is traditional to associate scientific progress with progression from perceptions to numbers. What fuzzy logic adds to this capability are four basic capabilities:

(a) Nontraditional. Progression from perceptions to precisiated words.
(b) Nontraditional. Progression from unprecisiated words to precisiated words.
(c) Countertraditional. Progression from numbers to precisiated words.
(d) Nontraditional. Computing with words (CW)/NL-computation.


These capabilities open the door to a wide-ranging enlargement of the role of natural languages in scientific theories. Our brief discussion of deduction in the context of fuzzy logic is intended to clarify the nature of problems which fuzzy logic is designed to address. The principal concepts and techniques which form the core of fuzzy logic are discussed in the following. Our discussion draws on the concepts and ideas introduced in [102].

Conceptual Structure of Fuzzy Logic

There are many logical systems, among them the classical, Aristotelian, bivalent logic, multivalued logics, modal logics, probabilistic logic, dynamic logic, etc. What differentiates fuzzy logic from such logical systems is that fuzzy logic is much more than a logical system. The point of departure in fuzzy logic is the concept of a fuzzy set. Informally, a fuzzy set is a class with a fuzzy boundary, implying that, in general, transition from membership to nonmembership in a fuzzy set is gradual rather than abrupt. A set is a class with a crisp boundary (Fig. 1). A set, A, in a space U, U = {u}, is precisiated through association with a characteristic function which maps U into {0, 1}. More generally, a fuzzy set, A, is precisiated through graduation, that is, through association with A of a membership function, μ_A – a mapping from U to a grade of membership space, G, with μ_A(u) representing the grade of membership of u in A. In other words, membership in a fuzzy set is a matter of degree. A familiar example of graduation is the association of the Richter scale with the class of earthquakes. A fuzzy set is basic if G is the unit interval. More generally, G may be a partially ordered set. L-fuzzy sets [23] fall into this category. A basic fuzzy set is of Type 1. A fuzzy set, A, is of Type 2 if μ_A(u) is a fuzzy set of Type 1. Recursively, a fuzzy set, A, is of Type n if μ_A(u) is a fuzzy set of Type n − 1, n = 2, 3, … [75]. Fuzzy sets of Type 2 have become an object of growing attention in the literature of fuzzy logic [40]. Unless stated to the contrary, a fuzzy set is assumed to be of Type 1 (basic).

Note. A clarification is in order. Consider a concatenation of two words, A and B, with A modifying B, e.g. A is an adjective and B is a noun. Usually, A plays the role of an s-modifier, that is, a modifier which specializes B in the sense that AB is a subset of B, as in convex set.
In some instances, however, A plays the role of a g-modifier, that is, a modifier which generalizes B. In this sense, fuzzy in fuzzy set, fuzzy logic and, more generally, in fuzzy B, is a g-modifier. Examples: fuzzy topology, fuzzy measure, fuzzy arithmetic, fuzzy stability, etc. Many misconceptions about fuzzy logic are rooted in incorrect interpretation of fuzzy as an s-modifier.

Fuzzy Logic, Figure 1 The concepts of a set and a fuzzy set are derived from the concept of a class through precisiation. A fuzzy set has a fuzzy boundary. A fuzzy set is precisiated through graduation

What is important to note is that (a) μ_A(u) is name-based (extensional) if u is the name of an object in U, e.g., middle-aged(Vera); (b) μ_A(u) is attribute-based (intensional) if the grade of membership is a function of an attribute of u, e.g., age; and (c) μ_A(u) is perception-based if u is a perception of an object, e.g., Vera’s grade of membership in the class of middle-aged women, based on her appearance, is 0.8. It should be observed that a class of objects, A, has two basic attributes: (a) the boundary of A; and (b) the cardinality (count) or, more generally, the measure of A (Fig. 1). In this perspective, fuzzy set theory is, in the main, boundary-oriented, while probability theory is, in the main, measure-oriented. Fuzzy logic is, in the main, both boundary- and measure-oriented. The concept of a membership function is the centerpiece of fuzzy set theory.

With the concept of a fuzzy set as the point of departure, different directions may be pursued, leading to various facets of fuzzy logic. More specifically, the following are the principal facets of fuzzy logic: the logical facet, FLl; the fuzzy-set-theoretic facet, FLs; the epistemic facet, FLe; and the relational facet, FLr (Fig. 2). The logical facet of FL, FLl, is fuzzy logic in its narrow sense. FLl may be viewed as a generalization of multivalued logic. The agenda of FLl is similar in spirit to the agenda of classical logic [17,22,25,44,45]. The fuzzy-set-theoretic facet, FLs, is focused on fuzzy sets. The theory of fuzzy sets is central to fuzzy logic. Historically, the theory of fuzzy sets [70] preceded fuzzy logic [77]. The theory of fuzzy sets may be viewed as an entry to generalizations of various branches of mathematics, among them fuzzy topology, fuzzy measure theory, fuzzy graph theory, fuzzy algebra and fuzzy differential equations. Note that fuzzy X is a fuzzy-set-theory-based or, more generally, fuzzy-logic-based generalization of X. The epistemic facet of FL, FLe, is concerned with knowledge representation, semantics of natural languages and information analysis. In FLe, a natural language is viewed as a system for describing perceptions. An important branch of FLe is possibility theory [15,79,83]. Another important branch of FLe is the computational theory of perceptions [93,94,95]. The relational facet, FLr, is focused on fuzzy relations and, more generally, on fuzzy dependencies. The concept of a linguistic variable – and the associated calculi of fuzzy if-then rules – play pivotal roles in almost all applications of fuzzy logic [1,3,6,8,12,13,18,26,28,32,36,40,48,52,65,66,67,69].

Fuzzy Logic, Figure 2 Principal facets of fuzzy logic (FL). The nucleus of fuzzy logic is the concept of a fuzzy set

Fuzzy Logic, Figure 3 The cornerstones of a nontraditional view of fuzzy logic

Fuzzy Logic, Figure 4 Precisiation of middle-age through graduation

The cornerstones of fuzzy logic are the concepts of graduation, granulation, precisiation and generalized constraints (Fig. 3). These concepts are discussed in the following.

The Basics of Fuzzy Set Theory

The grade of membership, μ_A(u), of u in A may be interpreted in various ways. It is convenient to employ the proposition p: Vera is middle-aged, as an example, with middle-age represented as a fuzzy set shown in Fig. 4. Among the various interpretations of p are the following. Assume that q is the proposition: Vera is 43 years old; and r is the proposition: the grade of membership of Vera in the fuzzy set of middle-aged women is 0.8.

(a) The truth value of p given r is 0.8.
(b) The possibility of q given p and r is 0.8.
(c) The degree to which the concept of middle-age has to be stretched to apply to Vera is (1 − 0.8).


(d) 80% of voters in a voting group vote for p given q and r.
(e) The probability of p given q and r is 0.8.

Of these interpretations, (a)–(c) are most compatible with intuition. If A and B are fuzzy sets in U, then their intersection (conjunction), A ∩ B, and union (disjunction), A ∪ B, are defined as

μ_{A∩B}(u) = μ_A(u) ∧ μ_B(u),   u ∈ U
μ_{A∪B}(u) = μ_A(u) ∨ μ_B(u),   u ∈ U

where ∧ = min and ∨ = max. More generally, conjunction and disjunction are defined through the concepts of t-norm and t-conorm, respectively [48]. When U is a finite set, U = {u_1, …, u_n}, it is convenient to represent A as a union of fuzzy singletons, μ_A(u_i)/u_i, i = 1, …, n. Specifically,

A = μ_A(u_1)/u_1 + … + μ_A(u_n)/u_n

in which + denotes disjunction. More compactly,

A = Σ_i μ_A(u_i)/u_i,   i = 1, …, n.

When U is a continuum, A may be expressed as

A = ∫_U μ_A(u)/u.

A basic concept in fuzzy set theory is that of a level set [70], commonly referred to as an α-cut [48]. Specifically, if A is a fuzzy set in U, then an α-cut, A_α, is defined as (Fig. 5)

A_α = {u | μ_A(u) ≥ α},   0 < α ≤ 1.
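These definitions are easy to exercise numerically. Below is a minimal sketch on a finite universe of ages; the trapezoidal membership functions for "middle-aged" and "young" are assumptions chosen so that a 43-year-old receives grade 0.8, echoing the Vera example, and are not the functions of Fig. 4.

```python
# Sketch: grades of membership, min/max intersection and union, and an
# alpha-cut, on a finite universe of ages. The membership functions are
# hypothetical trapezoids.

def mu_middle_aged(age):
    """Full membership on [45, 55]; linear ramps on [35, 45] and [55, 65]."""
    if 45 <= age <= 55:
        return 1.0
    if 35 < age < 45:
        return (age - 35) / 10.0
    if 55 < age < 65:
        return (65 - age) / 10.0
    return 0.0

def mu_young(age):
    """Full membership up to 25; linear ramp down to 0 at 45."""
    if age <= 25:
        return 1.0
    if age < 45:
        return (45 - age) / 20.0
    return 0.0

U = range(20, 71, 5)  # finite universe {20, 25, ..., 70}

# Intersection via min (a t-norm) and union via max (a t-conorm).
mu_inter = {u: min(mu_middle_aged(u), mu_young(u)) for u in U}
mu_union = {u: max(mu_middle_aged(u), mu_young(u)) for u in U}

def alpha_cut(mu, universe, alpha):
    """A_alpha = {u | mu_A(u) >= alpha} is an ordinary (crisp) set."""
    return {u for u in universe if mu(u) >= alpha}

print(mu_middle_aged(43))                 # 0.8
print(alpha_cut(mu_middle_aged, U, 0.5))  # {40, 45, 50, 55, 60}
```

Note that each α-cut is a crisp set, so a fuzzy set can equivalently be viewed as the nested family of its α-cuts.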

sup_GCL C(p*, p) ≥ sup_SCL C(p*, p)

where C(p*, p) is the cointension of a precisiand p* with the precisiend p, and the suprema are taken over precisiands expressed in the generalized constraint language (GCL) and in a standard, bivalent-logic-based constraint language (SCL), respectively.


This obvious inequality has an important implication. Specifically, as a meaning representation language, fuzzy logic dominates bivalent logic. As a very simple example consider the proposition p: Speed limit is 65 mph. Realistically, what is the meaning of p? The inequality implies that employment of fuzzy logic for precisiation of p would lead to a precisiand whose cointension is at least as high – and generally significantly higher – than the cointension which is achievable through the use of bivalent logic. More concretely, assume that A tells B that the speed limit is 65 mph, with the understanding that 65 mph should be interpreted as “approximately 65 mph”. B asks A to precisiate what is meant by “approximately 65 mph”, and stipulates that no imprecise numbers and no probabilities should be used in precisiation. With this restriction, A is not capable of formulating a realistic meaning of “approximately 65 mph”. Next, B allows A to use imprecise numbers but no probabilities. A is still unable to formulate a realistic definition. Next, B allows A to employ imprecise numbers but no imprecise probabilities. Still, A is unable to formulate a realistic definition. Finally, B allows A to use imprecise numbers and imprecise probabilities. This allows A to formulate a realistic definition of “approximately 65 mph”. This simple example is intended to demonstrate the need for the vocabulary of fuzzy logic to precisiate the meaning of terms and concepts which involve imprecise probabilities.

In addition to serving as a basis for precisiation of meaning, GCL serves another important function – the function of a deductive question-answering system [100]. In this role, what matters are the rules of deduction. In GCL, the rules of deduction coincide with the rules which govern constraint propagation and counterpropagation. Basically, these are the rules which govern generation of a generalized constraint from other generalized constraints [100,101].
The principal rule of deduction in fuzzy logic is the so-called extension principle [70,75]. The extension principle can assume a variety of forms, depending on the generalized constraints to which it applies. A basic form, which involves possibilistic constraints, is the following; an analogous principle applies to probabilistic constraints. Let X be a variable which takes values in U, and let f be a function from U to V. The point of departure is a possibilistic constraint on f(X) expressed as

f(X) is A,

where A is a fuzzy relation in V which is defined by its membership function μA(v), v ∈ V. Let g be a function from U to W. The possibilistic constraint on f(X) induces a possibilistic constraint on g(X), which may be expressed as

g(X) is B,

where B is a fuzzy relation. The question is: what is B? The extension principle reduces the problem of computation of B to the solution of a variational problem. Specifically,

f(X) is A ⟹ g(X) is B,

where

μB(w) = sup_u μA(f(u))   subject to   w = g(u).

The structure of the solution is depicted in Fig. 21. Basically, the possibilistic constraint on f(X) counterpropagates to a possibilistic constraint on X; then, the possibilistic constraint on X propagates to a possibilistic constraint on g(X).

There is a version of the extension principle – referred to as the fuzzy-graph extension principle – which plays an important role in control and systems analysis [74,78,90]. More specifically, let f be a function from reals to reals, Y = f(X). Let f* and X* be the granulands of f and X, respectively, with f* having the form of a fuzzy graph (Sect. "The Concept of Granulation"):

f* = A1 × Bj(1) + ... + Am × Bj(m),

Fuzzy Logic, Figure 21 Structure of the extension principle
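Over a finite, discretized universe the basic extension principle reduces to a maximization, which can be sketched in Python as follows. The universe, the functions f and g, and the membership function μA below are illustrative assumptions, not taken from the article.

```python
# Sketch of the basic extension principle on a discrete universe U:
# given "f(X) is A", the induced constraint "g(X) is B" has
#   mu_B(w) = sup_u mu_A(f(u))  subject to  w = g(u).

def extend(U, f, g, mu_A):
    """Counterpropagate the constraint through f, then propagate through g."""
    B = {}
    for u in U:
        w = g(u)
        B[w] = max(B.get(w, 0.0), mu_A(f(u)))  # sup over all u with g(u) = w
    return B

# Illustrative data: f(u) = u^2, g(u) = u // 2, and A = "small" on V.
U = [0, 1, 2, 3, 4]
mu_A = lambda v: 1.0 if v <= 1 else 0.2   # assumed membership of "small"
B = extend(U, lambda u: u * u, lambda u: u // 2, mu_A)
print(B)  # {0: 1.0, 1: 0.2, 2: 0.2}
```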


Fuzzy Logic, Figure 22 Fuzzy-graph extension principle. B = f*(A)

where the Ai, i = 1, ..., m, and the Bj, j = 1, ..., n, are granules of X and Y, respectively; × denotes Cartesian product; and + denotes disjunction (Fig. 22). In this instance, the extension principle may be expressed as follows:

X is A
f is (A1 × Bj(1) + ... + Am × Bj(m))
------------------------------------
Y is (m1 ∧ Bj(1) + ... + mm ∧ Bj(m))

where the mi are matching coefficients, defined as [78]

mi = sup(A ∩ Ai),   i = 1, ..., m,

and ∧ denotes conjunction (min). In the special case where X is a number, a, the possibilistic constraint on Y may be expressed as

Y is (μA1(a) ∧ Bj(1) + ... + μAm(a) ∧ Bj(m)).

In this form, the extension principle plays a key role in the Mamdani–Assilian fuzzy logic controller [39].

Deduction

Assume that we are given an information set, I, which consists of a system of propositions (p1, ..., pn). I will be referred to as the initial information set. The canonical problem of deductive question-answering is that of computing an answer to a question q, ans(q|I), given I [100,101]. The first step is to ask: what information is needed to answer q? Suppose that the needed information consists of the values of the variables X1, ..., Xn.

Thus,

ans(q|I) = g(X1, ..., Xn),

where g is a known function. Using GCL as a meaning precisiation language, one can express the initial information set as a generalized constraint on X1, ..., Xn. In the special case of possibilistic constraints, the constraint on the Xi may be expressed as

f(X1, ..., Xn) is A,

where A is a fuzzy relation. At this point, what we have is (a) a possibilistic constraint induced by the initial information set, and (b) an answer to q expressed as ans(q|I) = g(X1, ..., Xn), with the understanding that the possibilistic constraint on f propagates to a possibilistic constraint on g. To compute the induced constraint on g, what is needed is the extension principle of fuzzy logic [70,75].

As a simple illustration of deduction, it is convenient to use an example which was considered earlier.

Initial information set p: most Swedes are tall
Question q: what is the average height of Swedes?

What information is needed to compute the answer to q? Let P be a population of n Swedes, Swede1, ..., Sweden. Let hi be the height of Swedei, i = 1, ..., n. Knowing the hi, one can express the answer to q as

ans(q|p) = av(h) = (1/n)(h1 + ... + hn).


Turning to the constraint induced by p, we note that the mm-precisiand of p may be expressed as the possibilistic constraint

(1/n) ΣCount(tall.Swedes) is most,

where ΣCount(tall.Swedes) is the number of tall Swedes in P, with the understanding that tall.Swedes is a fuzzy subset of P. Using this definition of ΣCount [86], one can write the expression for the constraint on the hi as

(1/n)(μtall(h1) + ... + μtall(hn)) is most.

At this point, application of the extension principle leads to a solution which may be expressed as

μav(h)(v) = sup_h μmost((1/n)(μtall(h1) + ... + μtall(hn))),   h = (h1, ..., hn),

subject to

v = (1/n)(h1 + ... + hn).
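A small Python sketch may help fix ideas. It evaluates the constraint induced by p for one hypothetical height profile; the membership functions for "tall" and "most" are assumptions for illustration. Computing ans(q|p) itself would additionally require the sup over all profiles h, i.e., solving the variational problem above.

```python
# Illustrative evaluation of the constraint
#   (1/n)(mu_tall(h1) + ... + mu_tall(hn)) is most
# for one hypothetical height profile. mu_tall and mu_most are assumed ramps.

def mu_tall(h_cm):
    """Assumed degree of tallness: linear ramp from 170 cm to 185 cm."""
    return min(1.0, max(0.0, (h_cm - 170.0) / 15.0))

def mu_most(r):
    """Assumed degree to which proportion r counts as 'most': ramp 0.5 to 0.8."""
    return min(1.0, max(0.0, (r - 0.5) / 0.3))

def compatibility(heights):
    """Degree to which a profile h = (h1, ..., hn) satisfies the constraint."""
    sigma_count = sum(mu_tall(h) for h in heights) / len(heights)  # relative SigmaCount
    return mu_most(sigma_count)

print(compatibility([185, 185, 185, 170]))  # relative SigmaCount 0.75, degree ~ 0.83
```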

In summary, the Generalized Constraint Language is, by construction, maximally expressive. Importantly, what this implies is that, in realistic settings, fuzzy logic, viewed as a modeling language, has a significantly higher level of power and generality than modeling languages based on standard constraints or, equivalently, on bivalent logic and bivalent-logic-based probability theory.

Principal Contributions of Fuzzy Logic

As was stated earlier, fuzzy logic is much more than an addition to existing methods for dealing with imprecision, uncertainty and complexity. In effect, fuzzy logic represents a paradigm shift. The structure of the shift is shown in Fig. 23. Contributions of fuzzy logic range from contributions to basic sciences to applications involving various types of systems and products. The principal contributions are summarized in the following.

Fuzzy Logic as the Basis for Generalization of Scientific Theories

One of the principal contributions of fuzzy logic to basic sciences relates to what is referred to as FL-generalization. By construction, fuzzy logic has a much more general conceptual structure than bivalent logic. A key element in the transition from bivalent logic to fuzzy logic is the generalization of the concept of a set to a fuzzy set. This generalization is the point of departure for FL-generalization.

Fuzzy Logic, Figure 23 Fuzzy logic as a paradigm shift

More specifically, FL-generalization of any theory, T, involves an addition to T of concepts drawn from fuzzy logic. In the limit, as more and more concepts drawn from fuzzy logic are added to T, the foundation of T is shifted from bivalent logic to fuzzy logic. By construction, FL-generalization results in an upgraded theory, T+, which is at least as rich and, in general, significantly richer than T.

As an illustration, consider probability theory, PT – a theory which is bivalent-logic-based. Among the basic concepts drawn from fuzzy logic which may be added to PT are the following [96]:

set + fuzzy set
event + fuzzy event
relation + fuzzy relation
probability + fuzzy probability
random set + fuzzy random set
independence + fuzzy independence
stationarity + fuzzy stationarity
random variable + fuzzy random variable
etc.

As a theory, PT+ is much richer than PT. In particular, it provides a basis for construction of models which are much closer to reality than those that can be constructed through the use of PT. This applies, in particular, to computation with imprecise probabilities.

A number of scientific theories have already been FL-generalized to some degree, and many more are likely to be FL-generalized in coming years. Particularly worthy of note are the following FL-generalizations:

control → fuzzy control [12,18,20,69,72]
linear programming → fuzzy linear programming [19,103]


probability theory → fuzzy probability theory [96,98,101]
measure theory → fuzzy measure theory [14,62]
topology → fuzzy topology [38,68]
graph theory → fuzzy graph theory [34,41]
cluster analysis → fuzzy cluster analysis [7,27]
Prolog → fuzzy Prolog [21,42]
etc.

FL-generalization is the basis for an important rationale for the use of fuzzy logic. It is conceivable that eventually the foundations of many scientific theories may be shifted from bivalent logic to fuzzy logic.

Linguistic Variables and Fuzzy If-Then Rules

The most visible, the best understood and the most widely used contribution of fuzzy logic is the concept of a linguistic variable and the associated machinery of fuzzy if-then rules [90]. The machinery of linguistic variables and fuzzy if-then rules is unique to fuzzy logic. This machinery has played and is continuing to play a pivotal role in the conception and design of control systems and consumer products. However, its applicability is much broader. A key idea which underlies the machinery of linguistic variables and fuzzy if-then rules is the use of information compression. In fuzzy logic, information compression is achieved through the use of graduated (fuzzy) granulation.

The Concepts of Precisiation and Cointension

The concepts of precisiation and cointension play pivotal roles in fuzzy logic [101]. In fuzzy logic, a differentiation is made between two concepts of precision: precision of value, v-precision, and precision of meaning, m-precision. Furthermore, a differentiation is made between precisiation of meaning which is (a) human-oriented, or mh-precisiation for short, and (b) machine-oriented, or mm-precisiation for short. It is understood that mm-precisiation is mathematically well defined. The object of precisiation, p, and the result of precisiation, p*, are referred to as the precisiend and the precisiand, respectively. Informally, cointension is defined as a measure of closeness of the meanings of p and p*. Precisiation is cointensive if the meaning of p* is close to the meaning of p. One of the important features of fuzzy logic is its high power of cointensive precisiation. What this implies is that better models of reality can be achieved through the use of fuzzy logic.

Cointensive precisiation has an important implication for science. In large measure, science is bivalent-logic-based. In consequence, in science it is traditional to define concepts in a bivalent framework, with no degrees of truth allowed. The problem is that, in reality, many concepts in science are fuzzy, that is, are a matter of degree. For this reason, bivalent-logic-based definitions of scientific concepts are, in many cases, not cointensive. To formulate cointensive definitions of fuzzy concepts it is necessary to employ fuzzy logic. As was noted earlier, one of the principal contributions of fuzzy logic is its high power of cointensive precisiation. The significance of this capability is underscored by the fact that it has always been, and continues to be, a basic objective of science to precisiate and clarify what is imprecise and unclear.

Computing with Words (CW), NL-Computation and Precisiated Natural Language (PNL)

Much of human knowledge is expressed in natural language. Traditional theories of natural language are based on bivalent logic. The problem is that natural languages are intrinsically imprecise. Imprecision of natural languages is rooted in imprecision of perceptions. A natural language is basically a system for describing perceptions. Perceptions are intrinsically imprecise, reflecting the bounded ability of human sensory organs, and ultimately the brain, to resolve detail and store information. This imprecision of perceptions is passed on to natural languages. Bivalent logic is intolerant of imprecision, partiality of truth and partiality of possibility. For this reason, bivalent logic is intrinsically unsuited to serve as a foundation for theories of natural language.
As the logic of imprecision and approximate reasoning, fuzzy logic is a much better choice [71,80,84,85,86,88]. Computing with words (CW), NL-computation and precisiated natural language (PNL) are closely related formalisms [93,94,95,97]. In conventional modes of computation, the objects of computation are mathematical constructs. By contrast, in computing with words the objects of computation are propositions and predicates drawn from a natural language. A key idea which underlies computing with words involves representing the meaning of propositions and predicates as generalized constraints. Computing with words opens the door to a wide-ranging enlargement of the role of natural languages in scientific theories [30,31,36,58,59,61].
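As a minimal sketch of the computing-with-words idea: words denote fuzzy sets, and operations on words are carried out on the underlying fuzzy sets via the extension principle. Here "small + small" is computed by sup-min convolution over a discrete universe; the membership grades assigned to "small" are assumptions for illustration.

```python
# Computing with words, minimal sketch: the word "small" denotes a fuzzy set
# over a discrete universe, and the sum of two such words is computed with the
# sup-min form of the extension principle:
#   mu_Z(z) = sup over x + y = z of min(mu_X(x), mu_Y(y)).

small = {0: 1.0, 1: 0.8, 2: 0.4, 3: 0.1}   # assumed meaning of "small"

def fuzzy_add(A, B):
    """Extension-principle addition of two discrete fuzzy sets."""
    Z = {}
    for x, ma in A.items():
        for y, mb in B.items():
            z = x + y
            Z[z] = max(Z.get(z, 0.0), min(ma, mb))  # sup of min over x + y = z
    return Z

total = fuzzy_add(small, small)  # the fuzzy set denoted by "small + small"
print(total[2])  # max(min(1.0, 0.4), min(0.8, 0.8), min(0.4, 1.0)) = 0.8
```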


Computational Theory of Perceptions

Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and any computations. In performing such tasks, humans employ perceptions. To endow machines with this capability, what is needed is a formalism in which perceptions can play the role of objects of computation. The fuzzy-logic-based computational theory of perceptions (CTP) serves this purpose [94,95]. A key idea in this theory is that of computing not with perceptions per se, but with their descriptions in a natural language. Representing perceptions as propositions drawn from a natural language opens the door to application of computing with words to computation with perceptions. The computational theory of perceptions is of direct relevance to the achievement of human-level machine intelligence.

Possibility Theory

Possibility theory is a branch of fuzzy logic [15,79]. Possibility theory and probability theory are distinct theories. Possibility theory may be viewed as a formalization of perception of possibility, whereas probability theory is rooted in perception of likelihood. In large measure, possibility theory and probability theory are complementary rather than competitive. Possibility theory is of direct relevance to knowledge representation, semantics of natural languages, decision analysis and computation with imprecise probabilities.

Computation with Imprecise Probabilities

Most real-world probabilities are perceptions of likelihood. As such, real-world probabilities are intrinsically imprecise [75]. Until recently, the issue of imprecise probabilities was accorded little attention in the literature of probability theory. More recently, the problem of computation with imprecise probabilities has become an object of rapidly growing interest [63,99]. Typically, imprecise probabilities occur in an environment of imprecisely defined variables, functions, relations, events, etc. Existing approaches to computation with imprecise probabilities do not address this reality. To address it, what is needed is fuzzy logic and, more particularly, computing with words and the computational theory of perceptions. A step in this direction was taken in the 2002 paper "Toward a perception-based theory of probabilistic reasoning with imprecise probabilities" [96], followed by the 2005 paper "Toward a generalized theory of uncertainty (GTU) – an outline" [98] and the 2006 paper "Generalized theory of uncertainty (GTU) – principal concepts and ideas" [101].

Fuzzy Logic as a Modeling Language

Science deals not with reality but with models of reality. More often than not, reality is fuzzy. For this reason, construction of realistic models of reality calls for the use of fuzzy logic rather than bivalent logic. Fuzzy logic is a logic of imprecision, uncertainty and approximate reasoning [82]. It is natural to employ fuzzy logic as a modeling language when the objects of modeling are not well defined [102]. But what is somewhat paradoxical is that in many of its practical applications fuzzy logic is used as a modeling language for systems which are precisely defined. The explanation is that, in general, precision carries a cost. In those cases in which there is a tolerance for imprecision, a reduction in cost may be achieved through imprecisiation, e.g., data compression, information compression and summarization. The result of imprecisiation is an object of modeling which is not precisely defined. A fuzzy modeling language comes into play at this point. This is the key idea which underlies the fuzzy logic gambit. The fuzzy logic gambit is widely used in the design of consumer products – a realm in which cost is an important consideration.

A Glimpse of What Lies Beyond Fuzzy Logic

Fuzzy logic has a much higher level of generality than traditional logical systems – a generality which has the effect of greatly enhancing the problem-solving capability of fuzzy logic compared with that of bivalent logic. What lies beyond fuzzy logic? Of relevance to this question is the so-called incompatibility principle [73]. Informally, this principle asserts that as the complexity of a system increases, a point is reached beyond which high precision and high relevance become incompatible. The concepts of mm-precisiation and cointension suggest an improved version of the principle: as the complexity of a system increases, a point is reached beyond which high cointension and mm-precision become incompatible.
What this means in plain words is that in the realm of complex systems – such as economic systems – it may be impossible to construct models which are both realistic and precise. As an illustration, consider the following problem. Assume that A asks a cab driver to take him to address B. There are two versions of this problem: (a) A asks the driver to take him to B the shortest way; and (b) A asks the driver to take him to B the fastest way. Based on his experience, the driver decides on the route to take to B. In the case of (a), a GPS system can suggest a route that is provably the shortest way, that is, it can come up with a provably valid (p-valid) solution. In the case of (b), the uncertainties involved preclude the possibility of constructing a model of the system which is cointensive and mm-precise, implying that a p-valid solution does not exist. The driver's solution, based on his experience, has what may be called fuzzy validity (f-validity). Thus, in the case of (b) no p-valid solution exists; what exists is an f-valid solution.

In fuzzy logic, mm-precisiation is a prerequisite to computation. A question which arises is: what can be done when cointensive mm-precisiation is infeasible? To deal with such problems, what is needed may be referred to as extended fuzzy logic (FL+). In this logic, mm-precisiation is optional rather than mandatory. Very briefly, what is admissible in FL+ is f-validity. Admissibility of f-validity opens the door to construction of concepts prefixed with f, e.g., f-theorem, f-proof, f-principle, f-triangle, f-continuity, f-stability, etc. An example is f-geometry. In f-geometry, figures are drawn by hand with a spray can. An example of an f-theorem in f-geometry is the f-version of the theorem: the medians of a triangle are concurrent. The f-version of this theorem reads: the f-medians of an f-triangle are f-concurrent. An f-theorem can be proved in two ways: (a) empirically, that is, by drawing triangles with a spray can and verifying that the medians intersect at an f-point; or (b) by constructing an f-analogue of its traditional proof. At this stage, extended fuzzy logic is merely an idea, but it is an idea which has the potential for being a point of departure for construction of theories with important applications to the solution of real-world problems.

Bibliography

Primary Literature

1. Aliev RA, Fazlollahi B, Aliev RR, Guirimov BG (2006) Fuzzy time series prediction method based on fuzzy recurrent neural network. In: Neural information processing. Lecture notes in computer science, vol 4233. Springer, Berlin, pp 860–869
2. Bargiela A, Pedrycz W (2002) Granular computing: An introduction. Kluwer Academic Publishers, Boston
3. Bardossy A, Duckstein L (1995) Fuzzy rule-based modelling with application to geophysical, biological and engineering systems. CRC Press, New York
4. Bellman RE, Zadeh LA (1970) Decision-making in a fuzzy environment. Manag Sci B 17:141–164
5. Belohlavek R, Vychodil V (2006) Attribute implications in a fuzzy setting. In: Ganter B, Kwuida L (eds) ICFCA 2006. Lecture notes in artificial intelligence, vol 3874. Springer, Heidelberg, pp 45–60
6. Bezdek J, Pal S (eds) (1992) Fuzzy models for pattern recognition – methods that search for structures in data. IEEE Press, New York
7. Bezdek J, Keller JM, Krishnapuram R, Pal NR (1999) Fuzzy models and algorithms for pattern recognition and image processing. In: Zimmermann H (ed) Kluwer, Dordrecht
8. Bouchon-Meunier B, Yager RR, Zadeh LA (eds) (2000) Uncertainty in intelligent and information systems. In: Advances in fuzzy systems – applications and theory, vol 20. World Scientific, Singapore
9. Colubi A, Santos Domínguez-Menchero J, López-Díaz M, Ralescu DA (2001) On the formalization of fuzzy random variables. Inf Sci 133(1–2):3–6
10. Cresswell MJ (1973) Logic and languages. Methuen, London
11. Dempster AP (1967) Upper and lower probabilities induced by a multivalued mapping. Ann Math Stat 38:325–329
12. Driankov D, Hellendoorn H, Reinfrank M (1993) An introduction to fuzzy control. Springer, Berlin
13. Dubois D, Prade H (1980) Fuzzy sets and systems – theory and applications. Academic Press, New York
14. Dubois D, Prade H (1982) A class of fuzzy measures based on triangular norms. Int J General Syst 8:43–61
15. Dubois D, Prade H (1988) Possibility theory. Plenum Press, New York
16. Dubois D, Prade H (1994) Non-standard theories of uncertainty in knowledge representation and reasoning. Knowl Eng Rev 9(4):399–416
17. Esteva F, Godo L (2007) Towards the generalization of Mundici's gamma functor to IMTL algebras: the linearly ordered case. In: Algebraic and proof-theoretic aspects of non-classical logics, pp 127–137
18. Filev D, Yager RR (1994) Essentials of fuzzy modeling and control. Wiley-Interscience, New York
19. Gasimov RN, Yenilmez K (2002) Solving fuzzy linear programming problems with linear membership functions. Turk J Math 26:375–396
20. Gerla G (2001) Fuzzy control as a fuzzy deduction system. Fuzzy Sets Syst 121(3):409–425
21. Gerla G (2005) Fuzzy logic programming and fuzzy control. Studia Logica 79(2):231–254
22. Godo LL, Esteva F, García P, Agustí J (1991) A formal semantical approach to fuzzy logic. In: International Symposium on Multiple-Valued Logic, ISMVL'91, pp 72–79
23. Goguen JA (1967) L-fuzzy sets. J Math Anal Appl 18:145–157
24. Goodman IR, Nguyen HT (1985) Uncertainty models for knowledge-based systems. North-Holland, Amsterdam
25. Hajek P (1998) Metamathematics of fuzzy logic. Kluwer, Dordrecht
26. Hirota K, Sugeno M (eds) (1995) Industrial applications of fuzzy technology in the world. In: Advances in fuzzy systems – applications and theory, vol 2. World Scientific, Singapore
27. Höppner F, Klawonn F, Kruse R, Runkler T (1999) Fuzzy cluster analysis. Wiley, Chichester
28. Jamshidi M, Titli A, Zadeh LA, Boverie S (eds) (1997) Applications of fuzzy logic – towards high machine intelligence quotient systems. In: Environmental and intelligent manufacturing systems series, vol 9. Prentice Hall, Upper Saddle River
29. Jankowski A, Skowron A (2007) Toward rough-granular computing. In: Proceedings of the 11th International Conference on Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC'07), Toronto, pp 1–12
30. Kacprzyk J, Zadeh LA (eds) (1999) Computing with words in information/intelligent systems, part 1: Foundations. Physica, Heidelberg
31. Kacprzyk J, Zadeh LA (eds) (1999) Computing with words in information/intelligent systems, part 2: Applications. Physica, Heidelberg
32. Kandel A, Langholz G (eds) (1994) Fuzzy control systems. CRC Press, Boca Raton
33. Klir GJ (2006) Uncertainty and information: Foundations of generalized information theory. Wiley-Interscience, Hoboken
34. Kóczy LT (1992) Fuzzy graphs in the evaluation and optimization of networks. Fuzzy Sets Syst 46(3):307–319
35. Lambert K, Van Fraassen BC (1970) Meaning relations, possible objects and possible worlds. In: Philosophical problems in logic, pp 1–19
36. Lawry J, Shanahan JG, Ralescu AL (eds) (2003) Modelling with words – learning, fusion, and reasoning within a formal linguistic representation framework. Springer, Heidelberg
37. Lin TY (1997) Granular computing: From rough sets and neighborhood systems to information granulation and computing in words. In: European Congress on Intelligent Techniques and Soft Computing, September 8–12, pp 1602–1606
38. Liu Y, Luo M (1997) Fuzzy topology. In: Advances in fuzzy systems – applications and theory, vol 9. World Scientific, Singapore
39. Mamdani EH, Assilian S (1975) An experiment in linguistic synthesis with a fuzzy logic controller. Int J Man-Machine Stud 7:1–13
40. Mendel J (2001) Uncertain rule-based fuzzy logic systems – introduction and new directions. Prentice Hall, Upper Saddle River
41. Mordeson JN, Nair PS (2000) Fuzzy graphs and fuzzy hypergraphs. In: Studies in fuzziness and soft computing. Springer, Heidelberg
42. Mukaidono M, Shen Z, Ding L (1989) Fundamentals of fuzzy prolog. Int J Approx Reas 3(2):179–193
43. Nguyen HT (1993) On modeling of linguistic information using random sets. In: Fuzzy sets for intelligent systems. Morgan Kaufmann Publishers, San Mateo, pp 242–246
44. Novak V, Perfilieva I, Mockor J (1999) Mathematical principles of fuzzy logic. Kluwer, Boston/Dordrecht
45. Novak V (2006) Which logic is the real fuzzy logic? Fuzzy Sets Syst 157:635–641
46. Ogura Y, Li S, Kreinovich V (2002) Limit theorems and applications of set-valued and fuzzy set-valued random variables. Springer, Dordrecht
47. Orlov AI (1980) Problems of optimization and fuzzy variables. Znaniye, Moscow
48. Pedrycz W, Gomide F (2007) Fuzzy systems engineering: Toward human-centric computing. Wiley, Hoboken
49. Perfilieva I (2007) Fuzzy transforms: a challenge to conventional transforms. In: Hawkes PW (ed) Advances in imaging and electron physics, vol 147. Elsevier Academic Press, San Diego, pp 137–196
50. Puri ML, Ralescu DA (1993) Fuzzy random variables. In: Fuzzy sets for intelligent systems. Morgan Kaufmann Publishers, San Mateo, pp 265–271
51. Ralescu DA (1995) Cardinality, quantifiers and the aggregation of fuzzy criteria. Fuzzy Sets Syst 69:355–365
52. Ross TJ (2004) Fuzzy logic with engineering applications, 2nd edn. Wiley, Chichester
53. Rossi F, Codognet P (2003) Special issue on soft constraints. Constraints 8(1)
54. Rutkowska D (2002) Neuro-fuzzy architectures and hybrid learning. In: Studies in fuzziness and soft computing. Springer, Heidelberg
55. Rutkowski L (2008) Computational intelligence. Springer / Polish Scientific Publishers PWN, Warsaw
56. Schum D (1994) Evidential foundations of probabilistic reasoning. Wiley, New York
57. Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton
58. Trillas E (2006) On the use of words and fuzzy sets. Inf Sci 176(11):1463–1487
59. Türksen IB (2007) Meta-linguistic axioms as a foundation for computing with words. Inf Sci 177(2):332–359
60. Wang PZ, Sanchez E (1982) Treating a fuzzy subset as a projectable random set. In: Gupta MM, Sanchez E (eds) Fuzzy information and decision processes. North-Holland, Amsterdam, pp 213–220
61. Wang P (2001) Computing with words. In: Albus J, Meystel A, Zadeh LA (eds) Wiley, New York
62. Wang Z, Klir GJ (1992) Fuzzy measure theory. Springer, New York
63. Walley P (1991) Statistical reasoning with imprecise probabilities. Chapman & Hall, London
64. Wygralak M (2003) Cardinalities of fuzzy sets. In: Studies in fuzziness and soft computing. Springer, Berlin
65. Yager RR, Zadeh LA (eds) (1992) An introduction to fuzzy logic applications in intelligent systems. Kluwer Academic Publishers, Norwell
66. Yen J, Langari R, Zadeh LA (eds) (1995) Industrial applications of fuzzy logic and intelligent systems. IEEE, New York
67. Yen J, Langari R (1998) Fuzzy logic: Intelligence, control and information, 1st edn. Prentice Hall, New York
68. Ying M (1991) A new approach for fuzzy topology (I). Fuzzy Sets Syst 39(3):303–321
69. Ying H (2000) Fuzzy control and modeling – analytical foundations and applications. IEEE Press, New York
70. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
71. Zadeh LA (1972) A fuzzy-set-theoretic interpretation of linguistic hedges. J Cybern 2:4–34
72. Zadeh LA (1972) A rationale for fuzzy control. J Dyn Syst Meas Control G 94:3–4
73. Zadeh LA (1973) Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans Syst Man Cybern SMC-3:28–44
74. Zadeh LA (1974) On the analysis of large scale systems. In: Gottinger H (ed) Systems approaches and environment problems. Vandenhoeck and Ruprecht, Göttingen, pp 23–37
75. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning, Part I. Inf Sci 8:199–249; Part II. Inf Sci 8:301–357; Part III. Inf Sci 9:43–80
76. Zadeh LA (1975) Calculus of fuzzy restrictions. In: Zadeh LA, Fu KS, Tanaka K, Shimura M (eds) Fuzzy sets and their applications to cognitive and decision processes. Academic Press, New York, pp 1–39
77. Zadeh LA (1975) Fuzzy logic and approximate reasoning. Synthese 30:407–428
78. Zadeh LA (1976) A fuzzy-algorithmic approach to the definition of complex or imprecise concepts. Int J Man-Machine Stud 8:249–291
79. Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1:3–28
80. Zadeh LA (1978) PRUF – a meaning representation language for natural languages. Int J Man-Machine Stud 10:395–460
81. Zadeh LA (1979) Fuzzy sets and information granularity. In: Gupta M, Ragade R, Yager R (eds) Advances in fuzzy set theory

and applications. North-Holland Publishing Co., Amsterdam, pp 3–18
82. Zadeh LA (1979) A theory of approximate reasoning. In: Hayes J, Michie D, Mikulich LI (eds) Machine intelligence 9. Halstead Press, New York, pp 149–194
83. Zadeh LA (1981) Possibility theory and soft data analysis. In: Cobb L, Thrall RM (eds) Mathematical frontiers of the social and policy sciences. Westview Press, Boulder, pp 69–129
84. Zadeh LA (1982) Test-score semantics for natural languages and meaning representation via PRUF. In: Rieger B (ed) Empirical semantics. Brockmeyer, Bochum, pp 281–349
85. Zadeh LA (1983) Test-score semantics as a basis for a computational approach to the representation of meaning. Proceedings of the Tenth Annual Conference of the Association for Literary and Linguistic Computing. Oxford University Press
86. Zadeh LA (1983) A computational approach to fuzzy quantifiers in natural languages. Comput Math 9:149–184
87. Zadeh LA (1984) Precisiation of meaning via translation into PRUF. In: Vaina L, Hintikka J (eds) Cognitive constraints on communication. Reidel, Dordrecht, pp 373–402
88. Zadeh LA (1986) Test-score semantics as a basis for a computational approach to the representation of meaning. Lit Linguist Comput 1:24–35
89. Zadeh LA (1986) Outline of a computational approach to meaning and knowledge representation based on the concept of a generalized assignment statement. In: Thoma M, Wyner A (eds) Proceedings of the International Seminar on Artificial Intelligence and Man-Machine Systems. Springer, Heidelberg, pp 198–211
90. Zadeh LA (1996) Fuzzy logic and the calculi of fuzzy rules and fuzzy graphs. Multiple-Valued Logic 1:1–38
91. Zadeh LA (1997) Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst 90:111–127
92. Zadeh LA (1998) Some reflections on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent systems. Soft Comput 2:23–25
93. Zadeh LA (1999) From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions. IEEE Trans Circuits Syst 45:105–119
94. Zadeh LA (2000) Outline of a computational theory of perceptions based on computing with words. In: Sinha NK, Gupta MM, Zadeh LA (eds) Soft computing & intelligent systems: Theory and applications. Academic Press, London, pp 3–22
95. Zadeh LA (2001) A new direction in AI – toward a computational theory of perceptions. AI Magazine 22(1):73–84
96. Zadeh LA (2002) Toward a perception-based theory of probabilistic reasoning with imprecise probabilities. J Stat Plan Inference 105:233–264
97. Zadeh LA (2004) Precisiated natural language (PNL). AI Magazine 25(3):74–91
98. Zadeh LA (2005) Toward a generalized theory of uncertainty (GTU) – an outline. Inf Sci 172:1–40

99. Zadeh LA (2005) From imprecise to granular probabilities. Fuzzy Sets Syst 154:370–374 100. Zadeh LA (2006) From search engines to question answering systems – The problems of world knowledge, relevance, deduction and precisiation. In: Sanchez E (ed) Fuzzy logic and the semantic web, Chapt 9. Elsevier, pp 163–210 101. Zadeh LA (2006) Generalized theory of uncertainty (GTU)– principal concepts and ideas. Comput Stat Data Anal 51:15–46 102. Zadeh LA (2008) Is there a need for fuzzy logic? Inf Sci 178:(13)2751–2779 103. Zimmermann HJ (1978) Fuzzy programming and linear programming with several objective functions. Fuzzy Sets Syst 1:45–55

Books and Reviews Aliev RA, Fazlollahi B, Aliev RR (2004) Soft computing and its applications in business and economics. In: Studies in fuzziness and soft computing. Springer, Berlin Dubois D, Prade H (eds) (1996) Fuzzy information engineering: A guided tour of applications. Wiley, New York Gupta MM, Sanchez E (1982) Fuzzy information and decision processes. North-Holland, Amsterdam Hanss M (2005) Applied fuzzy arithmetic: An introduction with engineering applications. Springer, Berlin Hirota K, Czogala E (1986) Probabilistic sets: Fuzzy and stochastic approach to decision, control and recognition processes, ISR. Verlag TUV Rheinland, Köln Jamshidi M, Titli A, Zadeh LA, Boverie S (1997) Applications of fuzzy logic: Towards high machine intelligence quotient systems. In: Environmental and intelligent manufacturing systems series. Prentice Hall, Upper Saddle River Kacprzyk J, Fedrizzi M (1992) Fuzzy regression analysis. In: Studies in fuzziness. Physica 29 Kosko B (1997) Fuzzy engineering. Prentice Hall, Upper Saddle River Mastorakis NE (1999) Computational intelligence and applications. World Scientific Engineering Society Pal SK, Polkowski L, Skowron (2004) A rough-neural computing: Techniques for computing with words. Springer, Berlin Ralescu AL (1994) Applied research in fuzzy technology, international series in intelligent technologies. Kluwer Academic Publishers, Boston Reghis M, Roventa E (1998) Classical and fuzzy concepts in mathematical logic and applications. CRC-Press, Boca Raton Schneider M, Kandel A, Langholz G, Chew G (1996) Fuzzy expert system tools. Wiley, New York Türksen IB (2005) Ontological and epistemological perspective of fuzzy set theory. Elsevier Science and Technology Books Zadeh LA, Kacprzyk J (1992) Fuzzy logic for the management of uncertainty. Wiley Zhong N, Skowron A, Ohsuga S (1999) New directions in rough sets, data mining, and granular-soft computing. In: Lecture Notes in Artificial Intelligence. Springer, New York

Fuzzy Logic, Type-2 and Uncertainty

ROBERT I. JOHN¹, JERRY M. MENDEL²
¹ Centre for Computational Intelligence, School of Computing, De Montfort University, Leicester, United Kingdom
² Signal and Image Processing Institute, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, USA

Article Outline
Glossary
Definition of the Subject
Introduction
Type-2 Fuzzy Systems
Generalized Type-2 Fuzzy Systems
Interval Type-2 Fuzzy Sets and Systems
Future Directions
Bibliography

Glossary
Type-1 fuzzy sets The underlying component of fuzzy logic, in which uncertainty is represented by a number between zero and one.
Type-2 fuzzy sets Fuzzy sets in which the uncertainty is represented by a type-1 fuzzy set.
Interval type-2 fuzzy sets Type-2 fuzzy sets in which the uncertainty is represented by a type-1 fuzzy set whose membership grades are all unity.

Definition of the Subject
Type-2 fuzzy logic was first defined by Zadeh in 1975 and is an increasingly popular area for research and applications. The reason is that it tackles the fundamental problem with type-1 fuzzy logic: its inability to handle the many uncertainties in real systems. Type-2 fuzzy systems are conceptually and mathematically more difficult to understand and implement, but the proven applications show that the effort is worth it, and type-2 fuzzy systems are at the forefront of fuzzy logic research and applications. These systems rely on the notion of a type-2 fuzzy set, whose membership grades are themselves type-1 fuzzy sets.

Introduction
Fuzzy sets [1] have, over the past forty years, laid the basis for a successful method of modeling uncertainty, vagueness and imprecision in a way that no other technique has been able. The use of fuzzy sets in real computer systems is extensive, particularly in consumer products and control applications. Fuzzy logic (a logic based on fuzzy sets) is now a mainstream technique in everyday use across the world. The number of applications is large and growing, in a variety of areas: for example, heat exchange, warm water pressure, aircraft flight control, robot control, car speed control, power systems, nuclear reactor control, fuzzy memory devices and the fuzzy computer, control of a cement kiln, focusing of a camcorder, climate control for buildings, shower control and mobile robots. The use of fuzzy logic is not limited to control: successful applications have been reported in train scheduling, system modeling, computing, stock tracking on the Nikkei stock exchange and information retrieval.

Type-1 fuzzy sets represent uncertainty using a number in [0, 1], whereas type-2 fuzzy sets represent uncertainty by a function; this is discussed in more detail later in the article. Essentially, the more imprecise or vague the data, the greater the improvement type-2 fuzzy sets offer over type-1 fuzzy sets. Figure 1 shows the view taken here of the relationships between levels of imprecision, data and technique. As the level of imprecision increases, type-2 fuzzy logic provides a powerful paradigm for potentially tackling the problem. Problems that contain crisp, precise data do not, in reality, exist. However, some problems can be tackled effectively using mathematical techniques where the assumption is that the data are precise. Other problems (for example, in control) use imprecise terminology that can often be effectively modeled using type-1 fuzzy sets. Perceptions, it is argued here, are at a higher level of imprecision, and type-2 fuzzy sets can effectively model this imprecision.

Fuzzy Logic, Type-2 and Uncertainty, Figure 1 Relationships between imprecision, data and fuzzy technique


The reason for this lies in some of the problems associated with type-1 fuzzy logic systems. Although successful in the control domain, they have not delivered as well in systems that attempt to replicate human decision making. It is our view that this is because a type-1 fuzzy logic system (FLS) has some uncertainties which cannot be modeled properly by type-1 fuzzy logic. The sources of uncertainty in type-1 FLSs are:

• The meanings of the words that are used in the antecedents and consequents of rules can be uncertain (words mean different things to different people).
• Consequents may have a histogram of values associated with them, especially when knowledge is extracted from a group of experts who do not all agree.
• Measurements that activate a type-1 FLS may be noisy and therefore uncertain.
• The data that are used to tune the parameters of a type-1 FLS may also be noisy.

These uncertainties all essentially have to do with the uncertainty contained in a type-1 fuzzy set. A type-1 fuzzy set can be defined in the following way. Let X be a universal set defined in a specific problem, with a generic element denoted by x. A fuzzy set A in X is a set of ordered pairs

A = { (x, μ_A(x)) | x ∈ X } ,

where μ_A : X → [0, 1] is called the membership function of A, and μ_A(x) represents the degree of membership of the element x in A. The key points to draw from this definition of a fuzzy set are:

• The members of a fuzzy set are members to some degree, known as a membership grade or degree of membership.
• A fuzzy set is fully determined by its membership function.
• The membership grade is the degree of belonging to the fuzzy set; the larger the number (in [0, 1]), the stronger the degree of belonging.
• The translation from x to μ_A(x) is known as fuzzification.
• A fuzzy set is either continuous or discrete.
• Graphical representation of membership functions is very useful.

For example, the fuzzy set 'Tall' might be represented as shown in Fig. 2, where someone of height five feet has a membership grade of zero while someone of height seven feet is tall to degree one, with heights in between having membership grades between zero and one. The example shown is linear but, of course, it could be any function. Fuzzy sets offer a practical way of modeling what one might refer to as 'fuzziness'. The real world is characterized by the fact that much of it is imprecise in one form or another. For a clear exposition (important to the notion of, and argument for, type-2 sets) two ideas of 'fuzziness' can be considered important: imprecision and vagueness (linguistic uncertainty).

Imprecision
As has already been discussed, in many physical systems measurements are never precise (a physical property can always be measured more accurately). There is imprecision inherent in measurement. Fuzzy numbers are one way of capturing this imprecision by having a fuzzy set represent a real number, where the numbers in an interval near to the number are in the fuzzy set to some degree. So, for example, the fuzzy number 'About 35' might look like the fuzzy set in Fig. 3, where the numbers closer to 35 have membership nearer unity than those further away from 35.

Fuzzy Logic, Type-2 and Uncertainty, Figure 2 The fuzzy set 'Tall'

Fuzzy Logic, Type-2 and Uncertainty, Figure 3 The fuzzy number 'About 35'
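The two figures can be mirrored directly in code. A minimal Python sketch (the shapes and spreads are assumed for illustration; they are not taken from the article):

```python
def mu_tall(height_ft):
    """Type-1 membership grade for 'Tall': 0 at five feet, 1 at seven
    feet, linear in between (an assumed, illustrative shape)."""
    return min(1.0, max(0.0, (height_ft - 5.0) / 2.0))

def mu_about_35(x):
    """Fuzzy number 'About 35': a triangle peaking at 35 with an
    assumed spread of 5 on either side."""
    return max(0.0, 1.0 - abs(x - 35.0) / 5.0)

print(mu_tall(6.0))       # → 0.5
print(mu_about_35(37.5))  # → 0.5
```

Grades nearer one indicate stronger membership; any monotone shape could replace the linear one.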


Vagueness or Linguistic Uncertainty
Another use of fuzzy sets is where words are used to capture imprecise notions, loose concepts or perceptions. We use words in our everyday language whose intended meaning we, and our audience, understand, but which cannot be precisely defined. For example, when a bank is considering a loan application, somebody may be assessed as a good risk in terms of being able to repay the loan. Within the particular bank this notion of a good risk is well understood. It is not a black and white decision as to whether someone is a good risk or not; they are a good risk to some degree.

Type-2 Fuzzy Systems
A type-1 fuzzy system uses type-1 fuzzy sets in the antecedent and/or the consequent of type-1 fuzzy if-then rules, and a type-2 fuzzy system deploys type-2 fuzzy sets in the antecedent and/or the consequent of type-2 fuzzy rules. Fuzzy systems usually have the following features:

• The fuzzy sets, as defined by their membership functions. These fuzzy sets are the basis of a fuzzy system; they capture the underlying properties or knowledge in the system.
• The if-then rules that combine the fuzzy sets, in a rule set or knowledge base.
• The fuzzy composition of the rules. Any fuzzy system that has a set of if-then rules has to combine the rules.
• Optionally, the defuzzification of the solution fuzzy set. In many (most) fuzzy systems there is a requirement that the final output be a 'crisp' number. However, for certain fuzzy paradigms the output of the system is a fuzzy set, or its associated word. This solution set is 'defuzzified' to arrive at a number.

Type-1 fuzzy sets are, in fact, crisp and not at all fuzzy, and are two dimensional: a domain value x is simply mapped to a number in [0, 1], the membership grade. The methods for combining type-1 fuzzy sets in rules are also precise in nature. Type-2 fuzzy sets, in contrast, are three dimensional. The membership grade for a given value in a type-2 fuzzy set is a type-1 fuzzy set. A formal definition of a type-2 fuzzy set is the following. A type-2 fuzzy set, Ã, is characterized by a type-2 membership function μ_Ã(x, u), where x ∈ X and u ∈ J_x ⊆ [0, 1], and J_x is called the primary membership, i.e.

Ã = { ((x, u), μ_Ã(x, u)) | ∀x ∈ X, ∀u ∈ J_x ⊆ [0, 1] } .   (1)

Fuzzy Logic, Type-2 and Uncertainty, Figure 4 A typical FOU of a type-2 set
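A discrete type-2 set in the sense of Eq. (1) can be stored as a nested mapping from each domain value x to its secondary membership function (a type-1 set on [0, 1]); the numbers below are made up for illustration:

```python
# x -> {u: secondary grade}: each inner dict is a type-1 fuzzy set on [0, 1]
A2 = {
    1.0: {0.2: 0.5, 0.4: 1.0},
    2.0: {0.5: 1.0, 0.7: 0.6, 0.9: 0.2},
}

def primary_membership(A, x):
    """J_x: the primary memberships u at x with nonzero secondary grade."""
    return sorted(A[x])

def fou(A):
    """Footprint of uncertainty, here as {x: (min J_x, max J_x)}."""
    return {x: (min(Jx), max(Jx)) for x, Jx in A.items()}

print(primary_membership(A2, 2.0))  # → [0.5, 0.7, 0.9]
print(fou(A2))                      # → {1.0: (0.2, 0.4), 2.0: (0.5, 0.9)}
```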

A useful way of looking at a type-2 fuzzy set is by considering its footprint of uncertainty (FOU). This is a two-dimensional view of a type-2 fuzzy set; see Fig. 4 for a simple example. The shaded area represents the union of all the J_x.

An effective way to compare type-1 and type-2 fuzzy sets is by use of a simple example. Suppose, for a particular application, we wish to describe the imprecise concept of 'tallness'. One approach would be to use a type-1 fuzzy set tall_1. Now suppose we are only considering three members of this set: Michael Jordan, Danny Devito and Robert John. For the type-1 fuzzy approach one might say that Michael Jordan is tall_1 to degree 0.95, Danny Devito to degree 0.4 and Robert John to degree 0.6. This can be written as

tall_1 = 0.95/Michael Jordan + 0.4/Danny Devito + 0.6/Robert John .

A type-2 fuzzy set (tall_2) that models the concept of 'tallness' could be

tall_2 = High_1/Michael Jordan + Low_1/Danny Devito + Medium_1/Robert John ,

where High_1, Low_1 and Medium_1 are type-1 fuzzy sets. Figure 5 shows what the sets High_1, Low_1 and Medium_1 might look like if represented graphically. As can be seen, the x axis takes values between 0 and 1, as does the y axis (μ). Type-1 sets have an x axis representing the domain, in this case the height of an individual; type-2 sets employ type-1 sets as the membership grades. These fuzzy sets of type-2 therefore allow for the idea that a fuzzy approach does not necessarily have membership grades in [0, 1]: the degree of membership of a member is itself a type-1 fuzzy set. As can be seen from the simple example, there is an inherent extra fuzziness offered by type-2 fuzzy sets over and above a type-1 approach. So, a type-2 fuzzy set could be called a fuzzy-fuzzy set.

Fuzzy Logic, Type-2 and Uncertainty, Figure 5 The Fuzzy Sets High_1, Low_1 and Medium_1

Real situations do not allow for precise numbers in [0, 1]. In a control application, for instance, can we say that a particular temperature, t, belongs to the type-1 fuzzy set hot_1 with a membership grade of precisely x? No. Firstly, it is highly likely that the membership could just as well be x − 0.1, for example. Different experts would attach different membership grades and, indeed, the same expert might well give different values on different days! On top of this uncertainty there is always some uncertainty in the measurement of t. So we have a situation where an uncertain measurement is matched precisely to another uncertain value! Type-2 fuzzy sets, on the other hand, for appropriate applications, allow this uncertainty to be modeled by using not precise membership grades but imprecise type-1 fuzzy sets.

So that type-2 sets can be used in a fuzzy system (in a similar manner to type-1 fuzzy sets), a method is required for computing the intersection (AND) and union (OR) of two type-2 sets. Suppose we have two type-2 fuzzy sets Ã and B̃ in X, and μ_Ã(x) and μ_B̃(x) are secondary membership functions of Ã and B̃ respectively, represented as

μ_Ã(x) = f(u_1)/u_1 + f(u_2)/u_2 + ⋯ + f(u_n)/u_n = Σ_i f(u_i)/u_i ,

μ_B̃(x) = g(w_1)/w_1 + g(w_2)/w_2 + ⋯ + g(w_m)/w_m = Σ_j g(w_j)/w_j ,

where the functions f and g are membership functions of the fuzzy grades, and {u_i, i = 1, 2, …, n} and {w_j, j = 1, 2, …, m} are the members of the fuzzy grades.

Intersection of Type-2 Fuzzy Sets
The intersection (∩) of two type-2 fuzzy sets Ã and B̃, corresponding to Ã AND B̃, is given by

Ã ∩ B̃ ⇔ μ_{Ã∩B̃}(x) = μ_Ã(x) ⊓ μ_B̃(x) = Σ_{i,j} ( f(u_i) ∧ g(w_j) ) / ( u_i ∧ w_j ) .

Union of Type-2 Fuzzy Sets
The union (∪) of two type-2 fuzzy sets Ã and B̃, corresponding to Ã OR B̃, is given by

Ã ∪ B̃ ⇔ μ_{Ã∪B̃}(x) = μ_Ã(x) ⊔ μ_B̃(x) = Σ_{i,j} ( f(u_i) ∧ g(w_j) ) / ( u_i ∨ w_j ) ,

where ⊓ denotes meet, ⊔ denotes join, ∧ is minimum and ∨ is maximum. Thus the join and meet allow us to combine type-2 fuzzy sets where we wish to 'AND' or 'OR' two type-2 fuzzy sets. Join and meet are the building blocks for type-2 fuzzy relations and for type-2 fuzzy inferencing with type-2 if-then rules. Type-2 fuzzy if-then rules (type-2 rules) are similar to type-1 fuzzy if-then rules. An example type-2 if-then rule is given by

IF x is Ã and y is B̃ THEN z is C̃ .   (2)
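The join and meet of discrete secondary membership functions can be computed directly from their definitions; an illustrative sketch, assuming the minimum t-norm and made-up secondary sets:

```python
from collections import defaultdict

def combine(f, g, primary_op):
    """Join/meet of two discrete secondary membership functions.

    f and g map primary memberships u in [0, 1] to secondary grades.
    Secondary grades combine by minimum (the assumed t-norm); primary
    memberships combine by primary_op (max for join/OR, min for
    meet/AND).  When several (u, w) pairs land on the same primary
    value, the largest resulting grade is kept."""
    out = defaultdict(float)
    for u, fu in f.items():
        for w, gw in g.items():
            key = primary_op(u, w)
            out[key] = max(out[key], min(fu, gw))
    return dict(out)

mu_A = {0.4: 0.6, 0.6: 1.0}      # secondary MF of A~ at some x (made up)
mu_B = {0.5: 1.0, 0.7: 0.8}      # secondary MF of B~ at the same x

join = combine(mu_A, mu_B, max)  # A~ OR B~
meet = combine(mu_A, mu_B, min)  # A~ AND B~
print(join)  # → {0.5: 0.6, 0.7: 0.8, 0.6: 1.0}
print(meet)  # → {0.4: 0.6, 0.5: 1.0, 0.6: 0.8}
```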

Obviously the rule could have a more complex antecedent connected by AND. Also, the consequent of the rule could be type-1 or, indeed, crisp. Type-2 output processing can be done in a number of ways through type-reduction (e.g. Centroid, Centre of Sums, Height, Modified Height and Center-of-Sets), which produces a type-1 fuzzy set, followed by defuzzification of that set.

Generalized Type-2 Fuzzy Systems
So far we have been discussing type-2 fuzzy systems where the secondary membership function can take any form; these are known as generalized type-2 fuzzy sets. Historically these have been difficult to work with because the complexity of the calculations is too high for real applications. Recent developments [2,3,4] mean that it is now possible to develop type-2 fuzzy systems where, for example, the secondary membership functions are triangular in shape. This is relatively new but offers an exciting opportunity to capture the uncertainty in real applications. We expect the interest in this area to grow considerably, but for


the purposes of this article we will concentrate on the detail of interval type-2 fuzzy systems, where the secondary membership function is always unity.

Interval Type-2 Fuzzy Sets and Systems
As of this date, interval type-2 fuzzy sets (IT2 FSs) and interval type-2 fuzzy logic systems (IT2 FLSs) are the most widely used, because it is easy to compute with them. IT2 FSs are also known as interval-valued FSs, for which there is a very extensive literature (e.g., [1], see the many references in this article, and [5,6,16,28]). This section focuses on IT2 FSs and IT2 FLSs.

Interval Type-2 Fuzzy Sets
An IT2 FS Ã is characterized as (much of the background material in this sub-section is taken from [23]; see also [17]):

Ã = ∫_{x∈X} ∫_{u∈J_x⊆[0,1]} 1/(x, u) = ∫_{x∈X} [ ∫_{u∈J_x⊆[0,1]} 1/u ] / x ,   (3)

where x, the primary variable, has domain X; u ∈ U, the secondary variable, has domain J_x at each x ∈ X; J_x is called the primary membership of x and is defined below in (9); and the secondary grades of Ã all equal 1. Note that for continuous X and U, (3) means Ã : X → {[a, b] : 0 ≤ a ≤ b ≤ 1}. The bracketed term in (3) is called the secondary MF, or vertical slice, of Ã, and is denoted μ_Ã(x), i.e.

μ_Ã(x) = ∫_{u∈J_x⊆[0,1]} 1/u ,   (4)

so that Ã can be expressed in terms of its vertical slices as

Ã = ∫_{x∈X} μ_Ã(x)/x .   (5)

Uncertainty about Ã is conveyed by the union of all the primary memberships, which is called the footprint of uncertainty (FOU) of Ã (see Fig. 6), i.e.

FOU(Ã) = ∪_{x∈X} J_x = { (x, u) : u ∈ J_x ⊆ [0, 1] } .   (6)

Fuzzy Logic, Type-2 and Uncertainty, Figure 6 FOU (shaded), LMF (dashed), UMF (solid) and an embedded FS (wavy line) for IT2 FS Ã

The upper membership function (UMF) and lower membership function (LMF) of Ã are two type-1 MFs that bound the FOU (Fig. 6). UMF(Ã) is associated with the upper bound of FOU(Ã) and is denoted μ̄_Ã(x), ∀x ∈ X, and LMF(Ã) is associated with the lower bound of FOU(Ã) and is denoted μ_Ã(x), ∀x ∈ X, i.e.

UMF(Ã) ≡ μ̄_Ã(x) , ∀x ∈ X ,   (7)

LMF(Ã) ≡ μ_Ã(x) , ∀x ∈ X .   (8)

Note that J_x is an interval set, i.e.

J_x = [ μ_Ã(x), μ̄_Ã(x) ] .   (9)

This set is discrete when U is discrete and is continuous when U is continuous. Using (9), the FOU(Ã) in (6) can also be expressed as

FOU(Ã) = ∪_{x∈X} [ μ_Ã(x), μ̄_Ã(x) ] .   (10)

A very compact way to describe an IT2 FS is [22]:

Ã = 1/FOU(Ã) ,   (11)

where this notation means that the secondary grade equals 1 for all elements of FOU(Ã).

For continuous universes of discourse X and U, an embedded IT2 FS Ã_e is

Ã_e = ∫_{x∈X} [1/u]/x , u ∈ J_x .   (12)

Note that (12) means Ã_e : X → {u : 0 ≤ u ≤ 1}. The set Ã_e is embedded in Ã such that at each x it only has one secondary variable (i.e., one primary membership whose secondary grade equals 1). Examples of Ã_e are 1/μ̄_Ã(x) and 1/μ_Ã(x), ∀x ∈ X. In this notation it is understood that the secondary grade equals 1 at all elements in μ_Ã(x) or μ̄_Ã(x).

For discrete universes of discourse X and U, in which x has been discretized into N values and at each of these values u has been discretized into M_i values, an embedded IT2


FS Ã_e has N elements, where Ã_e contains exactly one element from J_{x_1}, J_{x_2}, …, J_{x_N}, namely u_1, u_2, …, u_N, each with a secondary grade equal to 1, i.e.,

Ã_e = Σ_{i=1}^{N} [1/u_i]/x_i ,   (13)

where u_i ∈ J_{x_i}. Set Ã_e is embedded in Ã, and there are a total of n_A = ∏_{i=1}^{N} M_i embedded IT2 FSs.

Associated with each Ã_e is an embedded T1 FS A_e, where

A_e = ∫_{x∈X} u/x , u ∈ J_x .   (14)

The set A_e, which acts as the domain for Ã_e (i.e., Ã_e = 1/A_e), is the union of all the primary memberships of the set Ã_e in (12). Examples of A_e are μ̄_Ã(x) and μ_Ã(x), ∀x ∈ X. When the universes of discourse X and U are continuous, there is an uncountable number of embedded IT2 and T1 FSs in Ã. Because such sets are only used for theoretical purposes, and are not used for computational purposes, this poses no problem.

For discrete universes of discourse X and U, an embedded T1 FS A_e has N elements, one each from J_{x_1}, J_{x_2}, …, J_{x_N}, namely u_1, u_2, …, u_N, i.e.,

A_e = Σ_{i=1}^{N} u_i/x_i .   (15)

Set A_e is the union of all the primary memberships of set Ã_e, and there are a total of ∏_{i=1}^{N} M_i embedded T1 FSs.

Theorem 1 (Representation Theorem (RT) [19] Specialized to an IT2 FS [22]) For an IT2 FS, for which X and U are discrete, Ã is the union of all of its embedded IT2 FSs. Equivalently, the domain of Ã is equal to the union of all of its embedded T1 FSs, so that Ã can be expressed as

Ã = Σ_{j=1}^{n_A} Ã_e^j = 1/FOU(Ã) = 1/ Σ_{j=1}^{n_A} A_e^j = 1/ Σ_{j=1}^{n_A} { Σ_{i=1}^{N} u_i^j/x_i } = 1/ ∪_{x∈X} { μ_Ã(x), …, μ̄_Ã(x) } ,   (16)

and if X is a continuous universe, then the infinite set {μ_Ã(x), …, μ̄_Ã(x)} is replaced by the interval set [μ_Ã(x), μ̄_Ã(x)].

This RT is arguably the most important result in IT2 FS theory, because it can be used as the starting point for solving all problems involving IT2 FSs. It expresses an IT2 FS in terms of T1 FSs, so that all results for problems that use IT2 FSs can be obtained using T1 FS mathematics [22]. Its use leads to the structure of the solution to a problem, after which efficient computational methods must be found to implement that structural solution. Table 1 summarizes set theoretic operations and uncertainty measures, all of which were computed using the RT. Additional results for similarity measures are in [25].

Interval Type-2 Fuzzy Logic Systems
An interval type-2 fuzzy logic system (IT2 FLS) is a FLS that uses at least one IT2 FS. It contains five inter-connected components (fuzzifier, rules, inference engine, type-reducer and defuzzifier), as shown in Fig. 7 (the background material in this sub-section is taken from [23]). The IT2 FLS can be viewed as a mapping from inputs to outputs (the path in Fig. 7, from "Crisp Inputs" to "Crisp Outputs"), and this mapping can be expressed quantitatively as y = f(x). An IT2 FLS is also known as an interval type-2 fuzzy logic controller (IT2 FLC) [7], an interval type-2 fuzzy expert system, or an interval type-2 fuzzy model.

The inputs to the IT2 FLS prior to fuzzification may be certain (e.g., perfect measurements) or uncertain (e.g., noisy measurements); T1 or IT2 FSs can be used to model the latter measurements. The IT2 FLS works as follows: the crisp inputs are first fuzzified into either type-0 (known as singleton fuzzification), type-1 or IT2 FSs, which then activate the inference engine and the rule base to produce output IT2 FSs. These IT2 FSs are then processed by a type-reducer (which combines the output sets and then performs a centroid calculation), leading to an interval T1 FS called the type-reduced set. A defuzzifier then defuzzifies the type-reduced set to produce crisp outputs.

Rules are the heart of a FLS, and may be provided by experts or can be extracted from numerical data. In either case, rules can be expressed as a collection of IF-THEN statements. A multi-input multi-output (MIMO) rule base can be considered as a group of multi-input single-output (MISO) rule bases; hence, it is only necessary to concentrate on a MISO rule base. Consider an IT2 FLS having p inputs x_1 ∈ X_1, …, x_p ∈ X_p and one output y ∈ Y. We assume there are M rules, where the ith rule has the form

R^i : IF x_1 is F̃_1^i and … and x_p is F̃_p^i, THEN y is G̃^i ,  i = 1, …, M .   (17)

This rule represents a T2 relation between the input space, X_1 × ⋯ × X_p, and the output space, Y, of the IT2 FLS. Associated with the p antecedent IT2 FSs F̃_k^i are the IT2 MFs μ_{F̃_k^i}(x_k) (k = 1, …, p), and associated with the consequent IT2 FS G̃^i is its IT2 MF μ_{G̃^i}(y).
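The centroid bounds [c_l, c_r] of Table 1 are normally located with the iterative Karnik–Mendel (KM) algorithms; as an illustrative sketch (not the KM procedure itself), the same switch points can be found by brute force for a small discrete IT2 FS:

```python
def centroid_interval(x, lmf, umf):
    """Centroid [c_l, c_r] of a discrete IT2 FS by exhaustively trying
    every switch point k; the KM algorithms locate k iteratively and far
    more efficiently.  x must be sorted ascending; lmf/umf are the
    lower/upper membership grades at each x."""
    n = len(x)

    def centroid(theta):
        return sum(xi * t for xi, t in zip(x, theta)) / sum(theta)

    # c_l: upper grades left of the switch, lower grades to the right
    cl = min(centroid([umf[i] if i <= k else lmf[i] for i in range(n)])
             for k in range(-1, n))
    # c_r: lower grades left of the switch, upper grades to the right
    cr = max(centroid([lmf[i] if i <= k else umf[i] for i in range(n)])
             for k in range(-1, n))
    return cl, cr

cl, cr = centroid_interval([0.0, 1.0, 2.0], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
print(cl, cr)  # → 0.75 1.25
```

The interval [c_l, c_r] widens as the gap between the lower and upper membership functions grows, which is exactly how the FOU's uncertainty shows up after type-reduction.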

Fuzzy Logic, Type-2 and Uncertainty, Table 1 Results for IT2 FSs

Set Theoretic Operations [22]
Union: Ã ∪ B̃ = 1/ ∪_{x∈X} [ μ_Ã(x) ∨ μ_B̃(x), μ̄_Ã(x) ∨ μ̄_B̃(x) ]
Intersection: Ã ∩ B̃ = 1/ ∪_{x∈X} [ μ_Ã(x) ∧ μ_B̃(x), μ̄_Ã(x) ∧ μ̄_B̃(x) ]
Complement: complement of Ã = 1/ ∪_{x∈X} [ 1 − μ̄_Ã(x), 1 − μ_Ã(x) ]

Uncertainty Measures [9,21,24]
Centroid: C_Ã = [c_l(Ã), c_r(Ã)], where
  c_l(Ã) = ( Σ_{i=1}^{L} x_i μ̄_Ã(x_i) + Σ_{i=L+1}^{N} x_i μ_Ã(x_i) ) / ( Σ_{i=1}^{L} μ̄_Ã(x_i) + Σ_{i=L+1}^{N} μ_Ã(x_i) ) ,
  c_r(Ã) = ( Σ_{i=1}^{R} x_i μ_Ã(x_i) + Σ_{i=R+1}^{N} x_i μ̄_Ã(x_i) ) / ( Σ_{i=1}^{R} μ_Ã(x_i) + Σ_{i=R+1}^{N} μ̄_Ã(x_i) ) ,
  with L and R computed using the KM Algorithms in Table 2.
Cardinality: P_Ã = [p_l(Ã), p_r(Ã)] = [p(μ_Ã(x)), p(μ̄_Ã(x))], where p(B) = |X| Σ_{i=1}^{N} μ_B(x_i)/N
Fuzziness: F_Ã = [f_1(Ã), f_2(Ã)] = [f(A_{e1}), f(A_{e2})], where f(A) = h( Σ_{i=1}^{N} g(μ_A(x_i)) )

cls{ x | μ(x) > 0 }  for α = 0 ,   (21)
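For a fuzzy set discretized over a finite domain, the α-cut in (21) reduces to a simple filter (a sketch; the closure only matters for continuous domains):

```python
def alpha_cut(mu, alpha):
    """α-cut of a discrete fuzzy set mu given as {x: grade}.
    For α > 0 it is {x | mu(x) >= α}; for α = 0 the support
    {x | mu(x) > 0} is used instead."""
    if alpha > 0:
        return {x for x, g in mu.items() if g >= alpha}
    return {x for x, g in mu.items() if g > 0}

mu = {1: 0.2, 2: 0.6, 3: 1.0, 4: 0.5, 5: 0.0}
print(sorted(alpha_cut(mu, 0.5)))  # → [2, 3, 4]
print(sorted(alpha_cut(mu, 0.0)))  # → [1, 2, 3, 4]
```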

where cls{A} denotes the closure of the set {A}. Tanaka, Okuda and Asai [120] make the following remarkable observation:

sup_x μ_D(x) = sup_α [ α ∧ max_{x∈C_α} μ_G(x) ] .

For illustration, consider two fuzzy sets, C̃ and G̃, depicted in Fig. 2. In case (i), α = α_1, and C_{α_1} is the interval between the end-points of the α-cut, [C⁻(α_1), C⁺(α_1)]. The maximum of μ_G in this interval is shown in the example. In this case, α_1 < max_{C_{α_1}} μ_G(x), so [α_1 ∧ max_{C_{α_1}} μ_G(x)] = α_1. In case (ii), α_2 > max_{C_{α_2}} μ_G(x), so [α_2 ∧ max_{C_{α_2}} μ_G(x)] =

Fuzzy Optimization

max_{C_{α_2}} μ_G(x). In case (iii), α* = max_{C_{α*}} μ_G(x). It should be apparent from Fig. 2 that α* = sup_x μ_D. In case (iii), α = α* is also sup_α [α ∧ max_{C_α} μ_G(x)]. For any α < α* we have case (i), where [α_1 ∧ max_{C_{α_1}} μ_G(x)] = α_1 < α*; and for any α > α* we have case (ii), where [α_2 ∧ max_{C_{α_2}} μ_G(x)] = max_{C_{α_2}} μ_G(x) < α*. The formal proof, which follows the reasoning illustrated here pictorially, is omitted for brevity's sake; the interested reader is referred to Tanaka, Okuda, and Asai [120]. This result allows the following reformulation of the fuzzy mathematical problem, with memberships μ_i over constraints i and variables x_j over j:

max_x μ_D(x) = max_{x≥0} min_i μ_i( Σ_j a_ij x_j ) = max_α [ α ∧ max_{x∈C_α} μ_G(x) ] .
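On a discrete grid the Tanaka–Okuda–Asai identity sup_x μ_D(x) = sup_α [α ∧ max_{C_α} μ_G(x)] can be checked numerically; a small sketch with assumed, illustrative membership functions:

```python
# Grid check of sup_x min(mu_C, mu_G) = sup_alpha [alpha ^ max_{C_alpha} mu_G]
xs = [i / 100 for i in range(101)]
mu_C = lambda x: max(0.0, 1.0 - abs(x - 0.3) / 0.3)  # assumed triangular constraint
mu_G = lambda x: x                                    # assumed increasing goal

# Left side: best joint satisfaction of goal and constraint (mu_D = min)
lhs = max(min(mu_C(x), mu_G(x)) for x in xs)

# Right side: sweep alpha, take the best goal value inside each alpha-cut
alphas = [i / 100 for i in range(1, 101)]

def best_in_cut(a):
    cut = [x for x in xs if mu_C(x) >= a]
    return max((mu_G(x) for x in cut), default=0.0)

rhs = max(min(a, best_in_cut(a)) for a in alphas)
print(abs(lhs - rhs) < 1e-9)  # → True
```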

Zimmermann
Just two years after Tanaka, Okuda, and Asai [120] suggested the use of α-cuts to solve the fuzzy mathematical problem, Zimmermann published a linear programming equivalent to the α-cut formulation. Beginning with the crisp programming problem

max z = cx ,  subject to  Ax ≤ b ,  x ≥ 0 ,   (23)

the decision maker introduces flexibility in the constraints, and sets a target value Z for the objective function:

cx ≳ Z ,  Ax ≲ b ,  x ≥ 0 .   (24)

A linear membership function μ_i is defined for each flexible right hand side, including the goal Z, as

μ_i( Σ_j a_ij x_j ) =

  1                                    if Σ_j a_ij x_j ≤ b_i ,
  1 − ( Σ_j a_ij x_j − b_i ) / p_i     if b_i < Σ_j a_ij x_j ≤ b_i + p_i ,
  0                                    if Σ_j a_ij x_j > b_i + p_i ,

where p_i is the maximum tolerable violation of the ith right hand side.
0. Then (44) reduces to

Since both x and h are variables, this is a non-linear programming problem. When h is fixed, it becomes linear, and we can use the simplex method or an interior point algorithm to solve for a given value of h. If the decision maker wishes to maximize h, the linear programming method must be applied iteratively.

Fuzzy Max
Dubois and Prade [19] suggested that the concept of "fuzzy max" could be applied to constraints with fuzzy parameters. The "fuzzy max" concept was used to solve possibilistic linear programs with triangular possibilistic coefficients by Tanaka, Ichihashi, and Asai [121]. Ramik and Rimanek [101] applied the same technique to L-R fuzzy numbers. For consistency, we will discuss the fuzzy max technique with respect to trapezoidal numbers. The fuzzy max, illustrated in Fig. 6 where C = max[A, B], is the extended maximum operator between real numbers, defined by the extension principle (8) as

μ_C(c) = max_{ {a,b : c = max(a,b)} } min[ μ_Ã(a), μ_B̃(b) ] .

Using fuzzy max, we can define an inequality relation as

Ã ⪰ B̃ ⇔ max(Ã, B̃) = Ã .   (46)


Fuzzy Optimization, Figure 6 Illustration of Fuzzy Max

Fuzzy Optimization, Figure 7 Illustration of Ã ⪰ B̃

Applying (46) to a fuzzy inequality constraint:

f(x, ã) ⪰ g(x, b̃)  ⇔  max( f(x, ã), g(x, b̃) ) = f(x, ã) .   (47)

Observe that the inequality relation defined by (46) yields only a partial ordering; that is, it is sometimes the case that neither Ã ⪰ B̃ nor B̃ ⪰ Ã holds. To improve this, Tanaka, Ichihashi, and Asai introduce a level h, corresponding to the decision maker's degree of optimism. They define an h-level set of Ã ⪰ B̃ as

Ã ⪰_h B̃  ⇔  max[Ã]_α ≥ max[B̃]_α  and  min[Ã]_α ≥ min[B̃]_α ,  ∀α ∈ [1 − h, 1] .   (48)

This definition of Ã ⪰ B̃ requires that two of Dubois' inequalities from Subsect. "Fuzzy Relations", (ii) and (iii), hold at the same time, and is illustrated in Fig. 7. Tanaka, Ichihashi, and Asai [118] suggest a similar treatment for a fuzzy objective function. A problem with the single objective function, Maximize z(x, c̃), becomes a multi-objective problem with objective functions

maximize  { inf( z(x, c̃)_α ) , sup( z(x, c̃)_α ) } ,  α ∈ [0, 1] .   (49)

Clearly, since α can assume an infinite number of values, (49) has an infinite number of parameters. Since (49) is not tractable, Inuiguchi, Ichihashi, and Tanaka [44] suggest the following approximation using a finite set of α_i ∈ [0, 1]:

maximize  { min( z(x, c̃) )_{α_i} , max( z(x, c̃) )_{α_i} } ,  i = 1, 2, …, p .   (50)
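For trapezoidal numbers, the h-level comparison in (48) reduces to endpoint checks over a grid of α-cuts; an illustrative sketch with made-up trapezoids:

```python
def alpha_cut_trap(t, alpha):
    """α-cut [left, right] of a trapezoidal number t = (a, b, c, d)."""
    a, b, c, d = t
    return (a + alpha * (b - a), d - alpha * (d - c))

def h_dominates(A, B, h, steps=100):
    """A ⪰_h B in the sense of (48): both the max and the min of the
    α-cuts of A must dominate those of B for every α in [1 − h, 1]
    (checked here on a finite grid of α values)."""
    for i in range(steps + 1):
        alpha = (1 - h) + h * i / steps
        a_lo, a_hi = alpha_cut_trap(A, alpha)
        b_lo, b_hi = alpha_cut_trap(B, alpha)
        if a_hi < b_hi or a_lo < b_lo:
            return False
    return True

A = (1, 4, 6, 7)
B = (0, 2, 5, 8)   # wider than A: dominated only near the core
print(h_dominates(A, B, h=0.5))  # → True
print(h_dominates(A, B, h=1.0))  # → False
```

Larger h (more optimism in this sense) demands dominance over a wider band of α-cuts, so the ordering becomes harder to satisfy.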

Jamison and Lodwick
Jamison and Lodwick [50,74] develop a method for dealing with possibilistic right hand sides that is a possibilistic generalization of the recourse models in stochastic optimization. Violations of constraints are allowable, at a cost determined a priori by the decision maker. Jamison and Lodwick choose the utility (that is, valuation) of a given interval of possible values to be its expected average, a concept defined by Yager [139]. The expected average (EA) of a possibilistic distribution ã is defined to be

EA(ã) = (1/2) ∫₀¹ ( ã⁻(α) + ã⁺(α) ) dα .   (51)

It should be noted that the expected average of a crisp value a is the value itself, since ã⁻(α) = ã⁺(α) = a gives EA(a) = (1/2) ∫₀¹ (a + a) dα = a ∫₀¹ dα = a. Jamison and Lodwick start from the following possibilistic linear program:

max z = cᵀx ,  subject to  Ax ≤ b̂ ,  x ≥ 0 .   (52)

By subtracting a penalty term from the objective function, they transform (52) into the following possibilistic non-linear program:

max z = cᵀx + pᵀ max(0, Ax − b̂) ,  x ≥ 0 .   (53)
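The expected average (51) is easy to approximate numerically; a sketch using the midpoint rule and a made-up trapezoidal possibilistic number:

```python
def expected_average(lo, hi, n=100_000):
    """EA(ã) = ½ ∫₀¹ (ã⁻(α) + ã⁺(α)) dα, approximated with the midpoint
    rule; lo/hi return the α-cut endpoints ã⁻(α) and ã⁺(α)."""
    total = 0.0
    for i in range(n):
        a = (i + 0.5) / n
        total += lo(a) + hi(a)
    return 0.5 * total / n

# Trapezoidal possibilistic number (1, 2, 4, 7): α-cuts [1 + α, 7 − 3α].
ea = expected_average(lambda a: 1 + a, lambda a: 7 - 3 * a)
print(ea)  # closed form for a trapezoid (a, b, c, d) is (a+b+c+d)/4 = 3.5
```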


The "max" in the penalty is taken component-wise, and each p_i < 0 is the cost per unit violation of the right-hand side of constraint i. The utility of the objective function, taken to be its expected average, is then optimized. The possibilistic programming problem becomes

$$ \max\ z = c^T x + p^T EA\big( \max(0, Ax - \hat{b}) \big), \quad x \in [0, U] . \qquad (54) $$

A closed-form objective function (for the purpose of differentiating when solving) is achieved in [72] by replacing max(0, Ax − b̂) with

$$ \frac{\sqrt{(Ax - \hat{b})^2 + \varepsilon^2} + (Ax - \hat{b})}{2} . $$
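A sketch of this kind of smooth surrogate for max(0, y), assuming the form (√(y² + ε²) + y)/2 with a small smoothing constant ε (ε is our notation for the constant implied by the reconstruction above):

```python
import math

# Sketch: differentiable surrogate for max(0, y), assumed form
# (sqrt(y^2 + eps^2) + y) / 2 with smoothing constant eps.

def smooth_max0(y, eps=1e-3):
    return (math.sqrt(y * y + eps * eps) + y) / 2.0

# The surrogate tracks max(0, y) to within eps/2, the worst case at y = 0:
for y in (-2.0, 0.0, 2.0):
    assert abs(smooth_max0(y) - max(0.0, y)) <= 1e-3 / 2
```

The approximation error is largest exactly at the kink y = 0, where it equals ε/2, and decays rapidly away from it, which is why a small ε suffices in practice.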

Jamison and Lodwick's method can be extended [50] to account for possibilistic values for A, b, c, and even the penalty coefficients p, with the following formulation:

$$ EA\,\tilde{f}(x) = \frac{1}{2} \int_0^1 \Big\{ \hat{c}^-(\alpha)^T x + \hat{c}^+(\alpha)^T x - \hat{p}^+(\alpha)^T \max\big(0, \hat{A}^-(\alpha)x - \hat{b}^+(\alpha)\big) - \hat{p}^-(\alpha)^T \max\big(0, \hat{A}^+(\alpha)x - \hat{b}^-(\alpha)\big) \Big\}\, d\alpha . \qquad (55) $$

This approach differs significantly from the others we have examined in several regards. First, many of the approaches we have seen incorporate the objective function(s) as goals into the constraints, in the Bellman and Zadeh tradition. Jamison and Lodwick, on the other hand, incorporate the constraints into the objective function. Bellman and Zadeh create a symmetry between constraints and objective, while Jamison and Lodwick temper the objective with the constraints. A second distinction of the expected average approach is the nature of the solution. The other formulations we have examined to this point produce either (1) a crisp solution for a particular value of α (namely, the maximal value of α), or (2) a fuzzy/possibilistic solution which encompasses all possible α values. The Jamison and Lodwick approach provides a crisp solution, via the expected average utility, which encompasses all α values. This may be a desirable quality to a decision maker who wants to account for all possibility levels and still reach a crisp solution.

Luhandjula

Luhandjula's [84] formulation of the possibilistic mathematical program depends upon his concept

of "more possible" values. He first defines a possibility distribution Π_X with respect to constraint F as

$$ \Pi_X = \mu_F(u) , $$

where μ_F(u) is the degree to which the constraint F is satisfied when u is the value assigned to the solution X. Then the set of more possible values for X, denoted by V_p(X), is given by

$$ V_p(X) = \Pi_X^{-1}\Big( \max_u \Pi_X(u) \Big) . $$

In other words, V_p(X) contains the elements of U which are most compatible with the restrictions defined by Π_X. It follows from intuition, and from Luhandjula's formal proof [84], that when Π_X is convex, V_p(X) is a real-valued interval, and when Π_X is strongly convex, V_p(X) is a single real number. Luhandjula considers the mathematical program

$$ \max\ \tilde{z} = \hat{c}x \quad \text{subject to } \hat{A}_i x \le \hat{b}_i,\ x \ge 0 . \qquad (56) $$

By replacing the possibilistic numbers ĉ, Â_i, and b̂_i with their more possible values V_p(ĉ), V_p(Â_i), and V_p(b̂_i), respectively, Luhandjula arrives at a deterministic equivalent of (56):

$$ \max\ z = kx \quad \text{subject to } k_j \in V_p(\hat{c}_j), \quad \sum_j t_{ij} x_j \le s_i, \quad t_{ij} \in V_p(\hat{a}_{ij}), \quad s_i \in V_p(\hat{b}_i), \quad x \ge 0 . \qquad (57) $$

This formulation varies significantly from the other approaches considered thus far: the possibility of each possibilistic component is maximized individually, whereas other formulations require that each possibilistic component c̃_j, Ã_ij, and b̃_i achieve the same possibility level defined by α. This formulation also has a distinct disadvantage over the others we have considered, since to date there is no proposed computational method for determining the "more possible" values V_p, so there is no way to solve the deterministic mathematical program.

Programming with Fuzzy and Possibilistic Components

Sometimes the values of an optimization problem's components are ambiguous and the decision makers are vague
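For an explicitly given possibility distribution, the set of most possible values can at least be approximated by a grid argmax; a minimal sketch, assuming a trapezoidal distribution (all names here are illustrative, not from the source):

```python
# Sketch: "more possible" values V_p computed on a grid for an explicit
# possibility distribution. For a convex (here, trapezoidal) distribution
# the result is the core interval, matching Luhandjula's convex case.

def more_possible(pi, grid, tol=1e-9):
    """Grid points whose possibility attains max_u pi(u)."""
    values = [pi(u) for u in grid]
    m = max(values)
    return [u for u, v in zip(grid, values) if v >= m - tol]

def trapezoid(a, b, c, d):
    """Trapezoidal possibility with support (a, d) and core [b, c]."""
    def pi(u):
        if u < a or u > d:
            return 0.0
        if u < b:
            return (u - a) / (b - a)
        if u <= c:
            return 1.0
        return (d - u) / (d - c)
    return pi

grid = [i / 10 for i in range(0, 51)]          # 0.0, 0.1, ..., 5.0
vp = more_possible(trapezoid(1, 2, 3, 4), grid)
# vp spans the core [2, 3]: an interval, as the convex case predicts
```

This is only a pointwise numeric device; as the text notes, no computational method has been proposed for determining V_p within the mathematical program itself.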


(or flexible) regarding feasibility requirements. This section explores a couple of approaches for dealing with such fuzzy/possibilistic problems. One type of mixed programming problem that arises (see [20,91]) is a mathematical program with possibilistic constraint coefficients â_ij whose possible values are defined by fuzzy numbers of the form ã_ij:

$$ \max\ cx \quad \text{subject to } \hat{a}'_i x' \subseteq \tilde{b}_i, \quad x' = (1, x^T)^T \ge 0 . \qquad (58) $$

Zadeh [142] defines the set-inclusion relation M̃ ⊆ Ñ as μ_M̃(r) ≤ μ_Ñ(r) for all r ∈ ℝ. Recall that Dubois [18] interprets the set-inclusive constraint â'_i x' ⊆ b̃_i as a fuzzy extension of the crisp equality constraint. Mixed programming, however, interprets the set-inclusive constraint to mean that the region in which â'_i x' can possibly occur is restricted to b̃_i, a region which is tolerable to the decision maker. Therefore, the left side of (58) is possibilistic, and the right side is fuzzy. Negoita [91] defines the fuzzy right-hand side as follows:

$$ \tilde{b}_i = \{ r \mid r \ge b_i \} . \qquad (59) $$

As a result, we can interpret â'_i x' ⊆ b̃_i as an extension of an inequality constraint. The set-inclusive constraint (58) is reduced to

$$ a_i^+(\alpha)\, x \le b_i^+(\alpha), \qquad a_i^-(\alpha)\, x \ge b_i^-(\alpha) \qquad (60) $$

for all α ∈ (0, 1]. If we abide by Negoita's definition (59) of b̃, then b_i^+(α) = +∞ for all values of α, so we can drop the first constraint in (60). Nonetheless, we still have an infinitely (continuum) constrained program, with two constraints for each value of α ∈ (0, 1]. Inuiguchi observes [44] that if the left-hand sides of the membership functions for a_i0, a_i1, …, a_in, b_i are identical for all i, and the right-hand sides of the membership functions for a_i0, a_i1, …, a_in, b_i are identical for all i, the constraints reduce to the finite set

$$ a_{i,\alpha} \ge b_{i,\alpha}, \quad a_{i,\gamma} \ge b_{i,\gamma}, \quad a_{i,\beta} \le b_{i,\beta}, \quad a_{i,\delta} \le b_{i,\delta} . \qquad (61) $$

As per our usual notation, (γ, δ) is the support of the fuzzy number and (α, β) is its core. In application, constraint formulation (61) has limited utility because of the narrowly defined sets of membership functions it admits. For example, if a_i0, a_i1, …, a_in, b_i are defined by trapezoidal fuzzy numbers, they must all have the same spread, and therefore the same slope, on the right-hand side; and they must all have the same spread, and therefore the same slope, on the left-hand side if (61) is to be implemented. Recall that in this kind of mixed programming the a_ij are possibilistic, reflecting a lack of information about their values, and the b_i are fuzzy, reflecting the decision maker's degree of satisfaction with their possible values. It is possible that n possibilistic components and one fuzzy component will share identically shaped distributions, but it is not something likely to happen with great frequency.

Delgado, Verdegay and Vila

Delgado, Verdegay, and Vila [11] propose the following formulation for dealing with ambiguity in the constraint coefficients and right-hand sides, as well as vagueness in the inequality relationship:

$$ \max\ cx \quad \text{subject to } \hat{A}x \,\tilde{\le}\, \hat{b},\ x \ge 0 . \qquad (62) $$

In addition to (62), membership functions μ_{a_ij} are defined for the possible values of each possibilistic element of Â, membership functions μ_{b_i} are defined for the possible values of each possibilistic element of b̂, and membership function μ_i gives the degree to which the fuzzy constraint i is satisfied. Stated another way, μ_i is the membership function of the fuzzy inequality. The uncertainty in the ã_ij and the b̃_i is due to ambiguity concerning the actual value of the parameter, while the uncertainty in the ≤̃ is due to the decision maker's flexibility regarding the necessity of satisfying the constraints in full. Delgado, Verdegay, and Vila do not define the fuzzy inequality, but leave that specification to the decision maker. Any ranking index that preserves the ranking of fuzzy numbers when multiplied by a positive scalar is allowed. For instance, one could select any of Dubois' four inequalities from Subsect. "Fuzzy Relations". Once the ranking index is selected, the problem is solved parametrically, as in Verdegay's earlier work [130] (see Subsect. "Verdegay"). To illustrate this approach, let us choose Dubois' pessimistic inequality (i), which interprets Ã ≤ B̃ to mean ∀x ∈ A, ∀y ∈ B, x ≤ y; this is equivalent to a⁺ ≤ b⁻. Then (62) becomes

$$ \max\ cx \quad \text{subject to } \sum_j a_{ij}^+(\alpha)\, x_j \le b_i^-(\alpha), \quad x \ge 0 . \qquad (63) $$
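The level-wise reduction of the set-inclusion constraint can be illustrated by sampling α for trapezoidal numbers; a sketch with illustrative names (not the source's code):

```python
# Sketch: checking the set-inclusion constraint (60) by sampling alpha.
# Trapezoids are given as (support-left, core-left, core-right,
# support-right); x is a nonnegative scalar multiplier.

def cut(trap, alpha):
    g, a, b, d = trap                    # support (g, d), core [a, b]
    return (g + alpha * (a - g), d - alpha * (d - b))

def included(a_trap, b_trap, x, steps=100):
    """a~ * x subset-of b~, checked at sampled alpha levels."""
    for i in range(steps + 1):
        al = i / steps
        alo, ahi = cut(a_trap, al)
        blo, bhi = cut(b_trap, al)
        if not (alo * x >= blo and ahi * x <= bhi):
            return False
    return True

print(included((1, 2, 2, 3), (1, 3, 5, 7), 2.0))   # inclusion holds
print(included((1, 2, 2, 3), (1, 3, 5, 7), 3.0))   # fails at low alpha
```

When the left (respectively right) membership sides all share one slope, checking only the support and core levels suffices, which is the content of the finite reduction (61).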


Fuzzy/Robust Programming

The approaches covered so far in this section have the same α-cut define both the level of ambiguity in the coefficients and the level at which the decision maker's requirements are satisfied. These α, however, mean very different things. The fuzzy α represents the level at which the decision maker's requirements are satisfied. The possibilistic α, on the other hand, represents the likelihood that the parameters will take on values which attain that level. The solution is interpreted to mean that for any value α ∈ (0, 1] there is a possibility α of obtaining a solution that satisfies the decision maker to degree α. Using the same α value for both the possibilistic and fuzzy components of the problem is convenient, but does not necessarily provide a meaningful model of reality.

A recent model [126] based on Markowitz's mean-variance approach to portfolio optimization (see [89,117]) differentiates between the fuzzy α and the possibilistic α. Markowitz introduced the efficient combination: one which has the minimum risk for a return greater than or equal to a given level, or one which has the maximum return for a risk less than or equal to a given level. The decision maker can move among these efficient combinations, along the efficient frontier, according to her/his degree of risk aversion. Similarly, in mixed possibilistic and fuzzy programming, one might wish to allow a trade-off between the potential reward of the outcome and the reliability of the outcome, with the weights of the two competing objectives determined by the decision maker's risk aversion. The desire is to obtain an objective function like the following:

$$ \max\ \big[ \text{reward} + (\text{risk aversion}) \times (\text{reliability}) \big] . \qquad (64) $$

The reward variable is the α-level associated with the fuzzy constraints and goal(s); it tells the decision maker how satisfactory the solution is. The reliability variable is the α-level associated with the possibilistic parameters; it tells the decision maker how likely it is that the solution will actually be satisfactory. To avoid confusion, let us refer to the fuzzy constraint membership parameter as α and the possibilistic parameter membership level as β. In addition, let λ ∈ [0, 1] be an indicator of the decision maker's valuation of reward and risk avoidance, with 0 indicating that the decision maker cares exclusively about reward, and 1 indicating that only risk avoidance is important. Using this notation, the desired objective is

$$ \max\ (1 - \lambda)\alpha + \lambda\beta . \qquad (65) $$

Suppose we begin with the mixed problem:

$$ \max\ \hat{c}^T x \quad \text{subject to } \hat{A}x \,\tilde{\le}\, \hat{b},\ x \ge 0 . \qquad (66) $$

Incorporating fuzziness from soft constraints in the tradition of Zimmermann, and incorporating a pessimistic view of possibility, results in the following formulation:

$$ \max\ (1 - \lambda)\alpha + \lambda\beta \qquad (67) $$

subject to

$$ \alpha \le \frac{g}{d_0} + \sum_j \frac{u_j}{d_0}\, x_j + \sum_j \frac{u_j - w_j}{d_0}\, x_j\, \beta $$

$$ \alpha \le \frac{b_i}{d_i} - \sum_j \frac{v_{ij}}{d_i}\, x_j - \sum_j \frac{z_{ij} - v_{ij}}{d_i}\, x_j\, \beta $$

$$ x \ge 0, \qquad \alpha, \beta \in [0, 1] . \qquad (68) $$

The last terms in each of the constraints contain βx, so the system is nonlinear. It can fairly easily be reformulated as an optimization program with a linear objective function and quadratic constraints, but the feasible set is non-convex, so finding the solution to this mathematical programming problem is very difficult.

Possibilistic, Interval, Cloud, and Probabilistic Optimization Utilizing IVPM

This section is taken from [79,80] and begins by defining what is meant by an IVPM. This generalization of a probability measure includes probability measures, possibility/necessity measures, intervals, and clouds (see [95]), which allows a mixture of uncertainty types within one constraint (in)equality. The previous mixed methods were restricted to a single type of uncertainty for any particular (in)equality and are unable to handle cases in which a mixture of fuzzy and possibilistic parameters occurs in the same constraint (in)equality. The IVPM set function may be thought of as a method for giving a partial representation of an unknown probability measure. Throughout, arithmetic operations involving set functions are in terms of interval arithmetic [90], and the set of all intervals contained in [0, 1] is denoted Int_{[0,1]} = {[a, b] | 0 ≤ a ≤ b ≤ 1}. Moreover, S is used to denote the universal set, and 𝒜 denotes a set of subsets of the universal set. In particular, 𝒜 is a collection of subsets on which a structure has been imposed, as will be seen, and a generic set of the structure 𝒜 is denoted by A.

Definition 18 (Weichselberger [135]) Given a measurable space (S, 𝒜), an interval-valued function i_m : 𝒜 → Int_{[0,1]} is called an R-probability if:


(a) i_m(A) = [i_m^-(A), i_m^+(A)] ⊆ [0, 1] with i_m^-(A) ≤ i_m^+(A),
(b) ∃ a probability measure Pr on 𝒜 such that ∀A ∈ 𝒜, Pr(A) ∈ i_m(A).

By an R-probability field we mean the triple (S, 𝒜, i_m).

Definition 19 (Weichselberger [135]) Given an R-probability field R = (S, 𝒜, i_m), the set

$$ M(R) = \{ \Pr \mid \Pr \text{ is a probability measure on } \mathcal{A} \text{ such that } \forall A \in \mathcal{A},\ \Pr(A) \in i_m(A) \} $$

is called the structure of R.

Definition 20 (Weichselberger [135]) An R-probability field R = (S, 𝒜, i_m) is called an F-probability field if ∀A ∈ 𝒜:

(a) i_m^+(A) = sup{ Pr(A) | Pr ∈ M(R) },
(b) i_m^-(A) = inf{ Pr(A) | Pr ∈ M(R) }.

It is interesting to note that, given a measurable space (S, 𝒜) and a set of probability measures P, defining i_m^+(A) = sup{ Pr(A) | Pr ∈ P } and i_m^-(A) = inf{ Pr(A) | Pr ∈ P } gives an F-probability, and that P is a subset of the structure. The following examples show how intervals, possibility distributions, clouds, and (of course) probability measures can define R-probability fields on B, the Borel sets on the real line.

Example 21 (An interval defines an F-probability field) Let I = [a, b] be a non-empty interval on the real line. On the Borel sets define

$$ i_m^+(A) = \begin{cases} 1 & \text{if } I \cap A \ne \emptyset \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad i_m^-(A) = \begin{cases} 1 & \text{if } I \subseteq A \\ 0 & \text{otherwise.} \end{cases} $$

Then i_m(A) = [i_m^-(A), i_m^+(A)] defines an F-probability field R = (ℝ, B, i_m). To see this, simply let P be the set of all probability measures on B such that Pr(I) = 1. This example also illustrates that any set A, not just an interval I, can be used to define an F-probability field.
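Example 21's interval-valued set function is easy to sketch when the events A are themselves intervals (illustrative code, not from the source):

```python
# Sketch of Example 21: the interval-valued set function induced by an
# interval I, evaluated on interval events A. Both are (lo, hi) pairs.

def i_m(I, A):
    """[i_m^-(A), i_m^+(A)] for interval I and interval event A."""
    intersects = I[0] <= A[1] and A[0] <= I[1]     # I intersects A
    contained = A[0] <= I[0] and I[1] <= A[1]      # I is a subset of A
    return (1 if contained else 0, 1 if intersects else 0)

# Any probability measure with Pr(I) = 1 has Pr(A) inside i_m(A):
print(i_m((1, 2), (0, 3)))     # I inside A      -> (1, 1)
print(i_m((1, 2), (1.5, 5)))   # overlap only    -> (0, 1)
print(i_m((1, 2), (3, 4)))     # disjoint        -> (0, 0)
```

The three cases correspond to Pr(A) being pinned to 1, free in [0, 1], and pinned to 0, respectively, for every probability measure concentrated on I.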

Example 22 (A probability measure is an F-probability field) Let Pr be a probability measure over (S, 𝒜). Define

$$ i_m(A) = [\Pr(A), \Pr(A)] . $$

This definition of a probability as an IVPM is equivalent to having total knowledge about a probability distribution over S.

The concept of a cloud was introduced by Neumaier in [95] as follows:

Definition 23 A cloud over a set S is a mapping c such that:

1) ∀s ∈ S, c(s) = [n̲(s), p̄(s)] with 0 ≤ n̲(s) ≤ p̄(s) ≤ 1,
2) (0, 1) ⊆ ∪_{s∈S} c(s) ⊆ [0, 1].

In addition, a random variable X taking values in S is said to belong to cloud c (written X ∈ c) iff

3) ∀α ∈ [0, 1], Pr(n̲(X) ≥ α) ≤ 1 − α ≤ Pr(p̄(X) > α).

Property 3) above defines when a random variable belongs to a cloud. That every cloud contains a random variable X is proved in Section 5 of [96]. This is a significant result, as will be seen, since among other things it means that clouds can be used to define IVPMs. Clouds are closely related to possibility theory. It is shown in [51] that possibility distributions can be constructed which satisfy the following consistency definition.

Definition 24 ([51]) Let p : S → [0, 1] be a possibility distribution function with associated possibility measure Pos and necessity measure Nec. Then p is said to be consistent with random variable X if, for all measurable sets A, Nec(A) ≤ Pr(X ∈ A) ≤ Pos(A).

Possibility distributions constructed in a consistent manner are able to bound (unknown) probabilities of interest. The reason this is significant is twofold. First, possibility and necessity distributions are easier to construct, since the axioms they satisfy are more general. Second, the algebra on possibility and necessity pairs is much simpler, being a min/max algebra akin to the min/max algebra of interval arithmetic (see, for example, [76]); in particular, it avoids the convolutions which are requisite for probabilistic arithmetic.

The concept of a cloud can be stated in terms of certain pairs of consistent possibility distributions, as shown by the following proposition (which means that clouds may be considered as pairs of consistent possibilities – possibility and necessity pairs, see [51]).

Proposition 25 Let p̄, p̲ be a pair of regular possibility distribution functions over a set S such that ∀s ∈ S, p̄(s) + p̲(s) ≥ 1. Then the mapping c(s) = [n̲(s), p̄(s)], where


n̲(s) = 1 − p̲(s) (i.e., the dual necessity distribution function), is a cloud. In addition, if X is a random variable taking values in S and the possibility measures associated with p̄, p̲ are consistent with X, then X belongs to cloud c. Conversely, every cloud defines such a pair of possibility distribution functions, and their associated possibility measures are consistent with every random variable belonging to c.

Proof (see [52,78,79])

Example 26 (A cloud defines an R-probability field) Let c be a cloud over the real line. Let Pos₁, Nec₁, Pos₂, Nec₂ be the possibility measures and their dual necessity measures relating to p̄(s) and p̲(s) (where p̄ and p̲ are as in Proposition 25). Define

$$ i_m(A) = \big[ \max\{ \mathrm{Nec}_1(A), \mathrm{Nec}_2(A) \},\ \min\{ \mathrm{Pos}_1(A), \mathrm{Pos}_2(A) \} \big] . $$

Neumaier [96] proved that every cloud contains a random variable X. Since consistency requires that Pr(X ∈ A) ∈ i_m(A), the result that every cloud contains a random variable establishes consistency. Thus every cloud defines an R-probability field, because the inf and sup of the probabilities are bounded by the lower and upper bounds of i_m(A).

Example 27 (A possibility distribution defines an R-probability field) Let p : S → [0, 1] be a possibility distribution function, and let Pos be the associated possibility measure and Nec the dual necessity measure. Define i_m(A) = [Nec(A), Pos(A)]. Defining a second possibility distribution p̲(x) = 1 ∀x means that the pair p, p̲ defines a cloud for which i_m(A) defines the R-probability. Since a cloud defines an R-probability field, this possibility distribution in turn generates an R-probability.

Note that the above example means that every pair of a possibility p(x) and a necessity n(x) such that n(x) ≤ p(x) has an associated F-probability field. This is because such a pair defines a cloud, and in every cloud there exists a probability distribution. The F-probability field can then be constructed from an inf/sup over all such enclosed probabilities that are less than or equal to the bounding necessity/possibility distributions.

The application of these concepts to mixed fuzzy and possibilistic optimization is as follows. Suppose the optimization problem is to maximize f(x⃗, a⃗) subject to g(x⃗, b⃗) = 0 (where a⃗ and b⃗ are parameters). Assume a⃗ and b⃗ are vectors of independent uncertain parameters, each with an associated IVPM.
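Example 27's construction i_m(A) = [Nec(A), Pos(A)] can be sketched numerically for a triangular possibility distribution (illustrative code; the distribution and the grid resolution are assumptions):

```python
# Sketch of Example 27: i_m(A) = [Nec(A), Pos(A)] for interval events A,
# with Pos computed by grid search over a triangular possibility 1/2/3.

def p(s):                                # triangular possibility 1/2/3
    if 1 <= s <= 2:
        return s - 1
    if 2 < s <= 3:
        return 3 - s
    return 0.0

grid = [1 + i / 1000 for i in range(2001)]   # sample points in [1, 3]

def pos(A):                              # Pos(A) = sup of p over A
    lo, hi = A
    return max((p(s) for s in grid if lo <= s <= hi), default=0.0)

def nec(A):                              # Nec(A) = 1 - Pos(complement)
    lo, hi = A
    return 1 - max((p(s) for s in grid if s < lo or s > hi), default=0.0)

def i_m(A):
    return (nec(A), pos(A))

print(i_m((0, 4)))      # event contains the support -> (1.0, 1.0)
print(i_m((1.5, 2.5)))  # event around the core      -> approx (0.5, 1.0)
```

Note that Nec(A) ≤ Pos(A) always holds here, so each i_m(A) is a genuine probability interval, consistent with Definition 24.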
Assume the constraint may be violated at a cost p⃗ > 0, so that the problem becomes one to maximize

$$ h(\vec{x}, \vec{a}, \vec{b}) = f(\vec{x}, \vec{a}) - \vec{p}^{\,T} \big| g(\vec{x}, \vec{b}) \big| . $$

Given the independence assumption, form an IVPM i_{a⃗b⃗} on the product space for the joint distribution (see the example below and [52,78,79]) and calculate the interval-valued expected value (see [48,140]) with respect to this IVPM. The interval-valued expected value (there is a lower expected value and an upper expected value) is denoted

$$ \int_{\mathbb{R}} h(\vec{x}, \vec{a}, \vec{b})\, di_{\vec{a}\vec{b}} . \qquad (69) $$

To optimize (69) requires an ordering of intervals: a valuation function denoted by v : Int_ℝ → ℝⁿ. One such ordering uses the midpoint of the interval, on the principle that, in the absence of additional data, the midpoint is the best estimate for the true value; for this function (midpoint and width), v : Int_ℝ → ℝ². Thus, for I = [a, b], this particular valuation function is v(I) = ((a + b)/2, b − a). Next a utility function on vectors in ℝⁿ is required, denoted u : ℝⁿ → ℝ. A utility function operating on the midpoint-and-width valuation is u : ℝ² → ℝ; a particular utility is a weighted sum of the midpoint and width, u(c, d) = αc + βd. Using the valuation and utility functions, the optimization problem is:

$$ \max_x\ u\left( v\left( \int_{\mathbb{R}} h(x, a, b)\, di_{ab} \right) \right) . \qquad (70) $$

Thus there are four steps to obtaining a real-valued optimization problem from an IVPM problem. The first step is to obtain the interval-valued function h(x, a, b). The second step is to obtain the IVPM expected value ∫_ℝ h(x⃗, a⃗, b⃗) di_{a⃗b⃗}. The third step is to apply the valuation v : Int_ℝ → ℝⁿ. The fourth step is to apply the utility function u : ℝⁿ → ℝ.

Example 28 Consider the problem

$$ \max\ f(x, a) = 8x_1 + 7x_2 \quad \text{subject to: } g_1(x, b) = 3x_1 + [1, 3]x_2 - 4 = 0, \quad g_2(x, b) = \tilde{2}x_1 + 5x_2 - 1 = 0, \quad \vec{x} \in [0, 2], $$

where 2̃ = 1/2/3, that is, 2̃ is a triangular possibilistic number with support [1, 3] and modal value 2. For p⃗ = (1, 1)ᵀ,

$$ h(x, a, b) = 5x_1 - \tilde{2}x_1 + [3, 5]x_2 - 5 , $$

so that

$$ \int_{\mathbb{R}} h(x, a, b)\, di_{ab} = 5x_1 + \left[ \int_0^1 (\alpha - 3)\, d\alpha,\ \int_0^1 (-1 - \alpha)\, d\alpha \right] x_1 + [3, 5]x_2 - 5 = 5x_1 + \left[ -\tfrac{5}{2}, -\tfrac{3}{2} \right] x_1 + [3, 5]x_2 - 5 . $$

Since the constant −5 will not affect the optimization, it will be removed (then added at the end), so that

$$ v\left( \int_{\mathbb{R}} h(x, a, b)\, di_{ab} \right) = v\left( \left[ \tfrac{5}{2}, \tfrac{7}{2} \right] x_1 + [3, 5]x_2 \right) = (3, 1)x_1 + (4, 2)x_2 . $$

Let u(y⃗) = Σ_{i=1}^n y_i, which for the context of this problem yields

$$ \max_x\ z = u\left( v\left( \int_{\mathbb{R}} h(x, a, b)\, di_{ab} \right) \right) = \max_{\vec{x} \in [0,2]} (4x_1 + 6x_2) - 5 = 20 - 5 = 15, \qquad x_1 = 2,\ x_2 = 2 . $$

Example 29 (see [124]) Consider

$$ \max\ z = \hat{2}x_1 - \bar{0}x_2 + [3, 5]x_3 $$
$$ \hat{4}x_1 + [1, 5]x_2 - 2x_3 - [0, 2] = 0 $$
$$ 6x_1 - \bar{2}x_2 + 9x_3 - 9 = 0 $$
$$ 2x_1 - [1, 4]x_2 - \hat{8}x_3 + \bar{5} = 0 $$
$$ 0 \le x_1 \le 3, \quad 1 \le x_2, x_3 \le 2 . $$

Here 0̄, 2̄, and 5̄ are probability distributions. Note the mixture of three uncertainty types in the third constraint equation. Using the same approach as in the previous example, the optimal values are:

$$ z = 3.9179, \qquad x_1 = 0, \quad x_2 = 0.4355, \quad x_3 = 1.0121 . $$
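The interval expected-value arithmetic in Example 28 can be double-checked numerically; a sketch with illustrative names (not from the source):

```python
# Numeric check of the interval expected value in Example 28. The EA
# bounds of -2~ come from integrating the alpha-cut endpoints of the
# negated triangular number 1/2/3.

n = 100000
lo = sum(((i + 0.5) / n - 3) for i in range(n)) / n    # integral of (alpha - 3)
hi = sum((-1 - (i + 0.5) / n) for i in range(n)) / n   # integral of (-1 - alpha)

def v(interval):
    """Midpoint-and-width valuation of an interval (a, b)."""
    a, b = interval
    return ((a + b) / 2, b - a)

u = sum                                   # additive utility u(y) = sum(y_i)

c1 = u(v((5 + lo, 5 + hi)))               # score of the x1 coefficient interval
c2 = u(v((3, 5)))                         # score of the x2 coefficient interval

# A linear objective over a box is maximized at a vertex:
best = max(c1 * x1 + c2 * x2 for x1 in (0, 2) for x2 in (0, 2))
print(best - 5)                           # re-attach the dropped constant -5
```

The computed scores reproduce the coefficients (3, 1) and (4, 2) of the example, and the final value matches the optimum reported there.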

In [124] it is shown that these mixed problems arising from linear programs remain linear programs. Thus, the complexity of mixed problems is equivalent to that of linear programming.

Future Directions

The applications of fuzzy optimization seem to be headed toward "industrial strength" problems; each year a greater number of applications appear. Given the greater attention being paid to the semantics of fuzzy optimization, and as fuzzy optimization becomes more widely used in applications, associated algorithms that are more sophisticated, robust, and efficient will need to be developed to handle these more complex problems. It would be interesting to develop modeling languages like GAMS [6], MODLER [34], or AMPL [30] that support fuzzy data structures.

From the theoretical side, the flexibility that fuzzy optimization has in working with uncertain data that is fuzzy, flexible, and/or possibilistic (or a mixture of these via IVPMs) means that fuzzy optimization provides an ample approach to optimization under uncertainty. Further research into the development of more robust methods that use fuzzy Banach spaces would certainly provide a deeper theoretical foundation for fuzzy optimization. Methods that work in fuzzy Banach spaces have the advantage that the problem remains fuzzy throughout: only when one needs to make a decision or implement the solution does one map the solution to a real number (defuzzify). The methods that map fuzzy optimization problems to real-number equivalents defuzzify first and then optimize, whereas fuzzy optimization in fuzzy Banach spaces keeps the solution fuzzy and defuzzifies as a last step. Continued development of clear input and output semantics for fuzzy optimization will greatly aid its applicability and relevance. When fuzzy optimization is used in, for example, an assembly-line scheduling problem and the solution is a fuzzy three, how does one convey this solution to the assembly-line manager? Lastly, continued research into handling dependencies in an efficient way would amplify the usefulness and applicability of fuzzy optimization.

Bibliography 1. Asai K, Tanaka H (1973) On the fuzzy – mathematical programming, Identification and System Estimation Proceedings of the 3rd IFAC Symposium, The Hague, 12–15 June 1973, pp 1050–1051 2. Audin J-P, Frankkowska H (1990) Set-Valued Analysis. Birkhäuser, Boston 3. Baudrit C, Dubois D, Fargier H (2005) Propagation of uncertainty involving imprecision and randomness. ISIPTA 2005, Pittsburgh, pp 31–40 4. Bector CR, Chandra S (2005) Fuzzy Mathematical Programming and Fuzzy Matrix Games. Springer, Berlin 5. Bellman RE, Zadeh LA (1970) Decision-Making in a Fuzzy Environment. Manag Sci B 17:141–164 6. Brooke A, Kendrick D, Meeraus A (2002) GAMS: A User’s Guide. Scientific Press, San Francisco (the latest updates can be obtained from http://www.gams.com/)

1235

1236

Fuzzy Optimization

7. Buckley JJ (1988) Possibility and necessity in optimization. Fuzzy Sets Syst 25(1):1–13 8. Buckley JJ (1988) Possibilistic linear programming with triangular fuzzy numbers. Fuzzy Sets Syst 26(1):135–138 9. Buckley JJ (1989) Solving possibilistic linear programming problems. Fuzzy Sets Syst 31(3):329–341 10. Buckley JJ (1989) A generalized extension principle. Fuzzy Sets Syst 33:241–242 11. Delgado M, Verdegay JL, Vila MA (1989) A general model for fuzzy linear programming. Fuzzy Sets Syst 29:21–29 12. Delgado M, Kacprzyk J, Verdegay J-L, Vila MA (eds) (1994) Fuzzy Optimization: Recent Advances. Physica, Heidelberg 13. Demster AP (1967) Upper and lower probabilities induced by multivalued mapping. Ann Math Stat 38:325–339 14. Dempster MAH (1969) Distributions in interval and linear programming. In: Hansen ER (ed) Topics in Interval Analysis. Oxford Press, Oxford, pp 107–127 15. Diamond P (1991) Congruence classes of fuzzy sets form a Banach space. J Math Anal Appl 162:144–151 16. Diamond P, Kloeden P (1994) Metric Spaces of Fuzzy Sets. World Scientific, Singapore 17. Diamond P, Kloeden P (1994) Robust Kuhn–Tucker conditions and optimization under imprecision. In: Delgado M, Kacprzyk J, Verdegay J-L, Vila MA (eds) Fuzzy Optimization: Recent Advances. Physica, Heidelberg, pp 61–66 18. Dubois D (1987) Linear programming with fuzzy data. In: Bezdek JC (ed) Analysis of Fuzzy Information, vol III: Applications in Engineering and Science. CRC Press, Boca Raton, pp 241–263 19. Dubois D, Prade H (1980) Fuzzy Sets and Systems: Theory and Applications. Academic Press, New York 20. Dubois D, Prade H (1980) Systems of linear fuzzy constraints. Fuzzy Sets Syst 3:37–48 21. Dubois D, Prade H (1981) Additions of interactive fuzzy numbers. IEEE Trans Autom Control 26(4):926–936 22. Dubois D, Prade H (1983) Ranking fuzzy numbers in the setting of possibility theory. Inf Sci 30:183–224 23. 
Dubois D, Prade H (1986) New results about properties and semantics of fuzzy set-theoretical operators. In: Wang PP, Chang SK (eds) Fuzzy Sets. Plenum Press, New York, pp 59–75 24. Dubois D, Prade H (1987) Fuzzy numbers: An overview. Tech. Rep. no 219, (LSI, Univ. Paul Sabatier, Toulouse, France). Mathematics and Logic. In: James Bezdek C (ed) Analysis of Fuzzy Information, vol 1, chap 1. CRC Press, Boca Raton, pp 3–39 25. Dubois D, Prade H (1988) Possibility Theory an Approach to Computerized Processing of Uncertainty. Plenum Press, New York 26. Dubois D, Prade H (2005) Fuzzy elements in a fuzzy set. Proceedings of the 11th International Fuzzy System Association World Congress, IFSA 2005, Beijing, July 2005, pp 55–60 27. Dubois D, Moral S, Prade H (1997) Semantics for possibility theory based on likelihoods. J Math Anal Appl 205:359–380 ˘ I, Tu¸s A (2007) Interactive fuzzy linear programming 28. Ertugrul and an application sample at a textile firm. Fuzzy Optim Decis Making 6:29–49 29. Fortin J, Dubois D, Fargier H (2008) Gradual numbers and their application to fuzzy interval analysis. IEEE Trans Fuzzy Syst 16:2, pp 388–402 30. Fourer R, Gay DM, Kerninghan BW (1993) AMPL: A Modeling Language for Mathematical Programming. Scientific Press,

31. 32.

33.

34. 35. 36.

37.

38. 39.

40.

41.

42.

43.

44.

45.

46.

47.

48.

San Francisco, CA (the latest updates can be obtained from http://www.ampl.com/) Fullér R, Keresztfalvi T (1990) On generalization of Nguyen’s theorem. Fuzzy Sets Syst 41:371–374 Ghassan K (1982) New utilization of fuzzy optimization method. In: Gupta MM, Sanchez E (eds) Fuzzy Information and Decision Processes. North-Holland, Netherlands, pp 239–246 Gladish B, Parra M, Terol A, Uría M (2005) Management of surgical waiting lists through a possibilistic linear multiobjective programming problem. Appl Math Comput 167:477–495 Greenberg H (1992) MODLER: Modeling by Object-Driven Linear Elemental Relations. Ann Oper Res 38:239–280 Hanss M (2005) Applied Fuzzy Arithmetic. Springer, Berlin Hsu H, Wang W (2001) Possibilistic programming in production planning of assemble-to-order environments. Fuzzy Sets Syst 119:59–70 Inuiguchi M (1992) Stochastic Programming Problems Versus Fuzzy Mathematical Programming Problems. Jpn J Fuzzy Theory Syst 4(1):97–109 Inuiguchi M (1997) Fuzzy linear programming: what, why and how? Tatra Mt Math Publ 13:123–167 Inuiguchi M (2007) Necessity measure optimization in linear programming problems with fuzzy polytopes. Fuzzy Sets Syst 158:1882–1891 Inuiguchi M (2007) On possibility/fuzzy optimization. In: Melin P, Castillo O, Aguilar LT, Kacprzyk J, Pedrycz W (eds) Foundations of Fuzzy Logic and Soft Computing: 12th International Fuzzy System Association World Congress, IFSA 2007, Cancun, June 2007, Proceedings. Springer, Berlin, pp 351–360 Inuiguchi M, Ramik J (2000) Possibilistic linear programming: A brief review of fuzzy mathematical programming and a comparison with stochastic programming in portfolio selection problem. Fuzzy Sets Syst 111:97–110 Inuiguchi M, Sakawa M (1997) An achievement rate approach to linear programming problems with an interval objective function. J Oper Res Soc 48:25–33 Inuiguchi M, Tanino T (2004) Fuzzy linear programming with interactive uncertain parameters. 
Reliab Comput 10(5):512– 527 Inuiguchi M, Ichihashi H, Tanaka H (1990) Fuzzy programming: A survey of recent developments. In: Slowinski R, Teghem J (eds) Stochastic versus Fuzzy Approaches to Multiobjective Mathematical Programming Under Uncertainty. Kluwer, Netherlands, pp 45–68 Inuiguchi M, Ichihashi H, Kume Y (1992) Relationships Between Modality Constrained Programming Problems and Various Fuzzy Mathematical Programming Problems. Fuzzy Sets Syst 49:243–259 Inuiguchi M, Ichihashi H, Tanaka H (1992) Fuzzy Programming: A Survey of Recent Developments. In: Slowinski R, Teghem J (eds) Stochastic versus Fuzzy Approaches to Multiobjective Mathematical Programming under Uncertainty.Springer, Berlin, pp 45–68 Inuiguchi M, Sakawa M, Kume Y (1994) The usefulness of possibilistic programming in production planning problems. Int J Prod Econ 33:49–52 Jamison KD (1998) Modeling Uncertainty Using Probabilistic Based Possibility Theory with Applications to Optimization. Ph D Thesis, University of Colorado Denver, Department

Fuzzy Optimization

49.

… of Mathematical Sciences. http://www-math.cudenver.edu/graduate/thesis/jamison.pdf
49. Jamison KD (2000) Possibilities as cumulative subjective probabilities and a norm on the space of congruence classes of fuzzy numbers motivated by an expected utility functional. Fuzzy Sets Syst 111:331–339
50. Jamison KD, Lodwick WA (2001) Fuzzy linear programming using penalty method. Fuzzy Sets Syst 119:97–110
51. Jamison KD, Lodwick WA (2002) The construction of consistent possibility and necessity measures. Fuzzy Sets Syst 132(1):1–10
52. Jamison KD, Lodwick WA (2004) Interval-valued probability measures. UCD/CCM Report No. 213, March 2004
53. Joubert JW, Luhandjula MK, Ncube O, le Roux G, de Wet F (2007) An optimization model for the management of a South African game ranch. Agric Syst 92:223–239
54. Kacprzyk J, Orlovski SA (eds) (1987) Optimization Models Using Fuzzy Sets and Possibility Theory. D Reidel, Dordrecht
55. Kacprzyk J, Orlovski SA (1987) Fuzzy optimization and mathematical programming: A brief introduction and survey. In: Kacprzyk J, Orlovski SA (eds) Optimization Models Using Fuzzy Sets and Possibility Theory. D Reidel, Dordrecht, pp 50–72
56. Kasperski A, Zielinski P (2007) Using gradual numbers for solving fuzzy-valued combinatorial optimization problems. In: Melin P, Castillo O, Aguilar LT, Kacprzyk J, Pedrycz W (eds) Foundations of Fuzzy Logic and Soft Computing: 12th International Fuzzy System Association World Congress, IFSA 2007, Cancun, June 2007, Proceedings. Springer, Berlin, pp 656–665
57. Kaufmann A, Gupta MM (1985) Introduction to Fuzzy Arithmetic – Theory and Applications. Van Nostrand Reinhold, New York
58. Kaymak U, Sousa JM (2003) Weighting of constraints in fuzzy optimization. Constraints 8:61–78 (also in the 2001 Proceedings of the IEEE Fuzzy Systems Conference)
59. Klir GJ, Yuan B (1995) Fuzzy Sets and Fuzzy Logic. Prentice Hall, Upper Saddle River
60. Lai Y, Hwang C (1992) Fuzzy Mathematical Programming. Springer, Berlin
61. Lin TY (2005) A function theoretic view of fuzzy sets: New extension principle. In: Filev D, Ying H (eds) Proceedings of NAFIPS05
62. Liu B (1999) Uncertainty Programming. Wiley, New York
63. Liu B (2000) Dependent-chance programming in fuzzy environments. Fuzzy Sets Syst 109:97–106
64. Liu B (2001) Fuzzy random chance-constrained programming. IEEE Trans Fuzzy Syst 9:713–720
65. Liu B (2002) Theory and Practice of Uncertainty Programming. Physica, Heidelberg
66. Liu B, Iwamura K (2001) Fuzzy programming with fuzzy decisions and fuzzy simulation-based genetic algorithm. Fuzzy Sets Syst 122:253–262
67. Lodwick WA (1990) Analysis of structure in fuzzy linear programs. Fuzzy Sets Syst 38:15–26
68. Lodwick WA (1990) A generalized convex stochastic dominance algorithm. IMA J Math Appl Bus Ind 2:225–246
69. Lodwick WA (1999) Constrained Interval Arithmetic. CCM Report 138, Feb 1999. CCM, Denver
70. Lodwick WA (ed) (2004) Special Issue on Linkages Between Interval Analysis and Fuzzy Set Theory. Reliab Comput 10

71. Lodwick WA (2007) Interval and fuzzy analysis: A unified approach. In: Advances in Imaging and Electron Physics, vol 148. Elsevier, San Diego, pp 75–192
72. Lodwick WA, Bachman KA (2005) Solving large-scale fuzzy and possibilistic optimization problems: Theory, algorithms and applications. Fuzzy Optim Decis Making 4(4):257–278 (also UCD/CCM Report No. 216, June 2004)
73. Lodwick WA, Inuiguchi M (eds) (2007) Special Issue on Optimization Under Fuzzy and Possibilistic Uncertainty. Fuzzy Sets Syst 158(17), 1 Sept 2007
74. Lodwick WA, Jamison KD (1997) A computational method for fuzzy optimization. In: Bilal A, Madan G (eds) Uncertainty Analysis in Engineering and Sciences: Fuzzy Logic, Statistics, and Neural Network Approach, chap 19. Kluwer, Norwell
75. Lodwick WA, Jamison KD (eds) (2003) Special Issue on the Interfaces Between Fuzzy Set Theory and Interval Analysis. Fuzzy Sets Syst 135(1), 1 April 2003
76. Lodwick WA, Jamison KD (2003) Estimating and validating the cumulative distribution of a function of random variables: Toward the development of distribution arithmetic. Reliab Comput 9:127–141
77. Lodwick WA, Jamison KD (2005) Theory and semantics for fuzzy and possibilistic optimization. In: Proceedings of the 11th International Fuzzy System Association World Congress, IFSA 2005, Beijing, July 2005
78. Lodwick WA, Jamison KD (2006) Interval-valued probability in the analysis of problems that contain a mixture of fuzzy, possibilistic and interval uncertainty. In: Demirli K, Akgunduz A (eds) 2006 Conference of the North American Fuzzy Information Processing Society, 3–6 June 2006, Montréal, Canada, paper 327137
79. Lodwick WA, Jamison KD (2008) Interval-valued probability in the analysis of problems containing a mixture of fuzzy, possibilistic, probabilistic and interval uncertainty. Fuzzy Sets Syst 2008
80. Lodwick WA, Jamison KD (2007) The use of interval-valued probability measures in optimization under uncertainty for problems containing a mixture of fuzzy, possibilistic, and interval uncertainty. In: Melin P, Castillo O, Aguilar LT, Kacprzyk J, Pedrycz W (eds) Foundations of Fuzzy Logic and Soft Computing: 12th International Fuzzy System Association World Congress, IFSA 2007, Cancun, Mexico, June 2007, Proceedings. Springer, Berlin, pp 361–370
81. Lodwick WA, Jamison KD (2007) Theoretical and semantic distinctions of fuzzy, possibilistic, and mixed fuzzy/possibilistic optimization. Fuzzy Sets Syst 158(17):1861–1872
82. Lodwick WA, McCourt S, Newman F, Humpheries S (1999) Optimization methods for radiation therapy plans. In: Borgers C, Natterer F (eds) IMA Series in Applied Mathematics – Computational Radiology and Imaging: Therapy and Diagnosis. Springer, New York, pp 229–250
83. Lodwick WA, Neumaier A, Newman F (2001) Optimization under uncertainty: Methods and applications in radiation therapy. Proc 10th IEEE Int Conf Fuzzy Syst 2001 3:1219–1222
84. Luhandjula MK (1986) On possibilistic linear programming. Fuzzy Sets Syst 18:15–30
85. Luhandjula MK (1989) Fuzzy optimization: An appraisal. Fuzzy Sets Syst 30:257–282
86. Luhandjula MK (2004) Optimisation under hybrid uncertainty. Fuzzy Sets Syst 146:187–203


Fuzzy Optimization

87. Luhandjula MK (2006) Fuzzy stochastic linear programming: Survey and future research directions. Eur J Oper Res 174:1353–1367
88. Luhandjula MK, Ichihashi H, Inuiguchi M (1992) Fuzzy and semi-infinite mathematical programming. Inf Sci 61:233–250
89. Markowitz H (1952) Portfolio selection. J Finance 7:77–91
90. Moore RE (1979) Methods and Applications of Interval Analysis. SIAM, Philadelphia
91. Negoita CV (1981) The current interest in fuzzy optimization. Fuzzy Sets Syst 6:261–269
92. Negoita CV, Ralescu DA (1975) Applications of Fuzzy Sets to Systems Analysis. Birkhäuser, Boston
93. Negoita CV, Sularia M (1976) On fuzzy mathematical programming and tolerances in planning. Econ Comput Cybern Stud Res 3(31):3–14
94. Neumaier A (2003) Fuzzy modeling in terms of surprise. Fuzzy Sets Syst 135(1):21–38
95. Neumaier A (2004) Clouds, fuzzy sets and probability intervals. Reliab Comput 10:249–272
96. Neumaier A (2005) Structure of clouds. (submitted; downloadable at http://www.mat.univie.ac.at/~neum/papers.html)
97. Ogryczak W, Ruszczynski A (1999) From stochastic dominance to mean-risk models: Semideviations as risk measures. Eur J Oper Res 116:33–50
98. Nguyen HT (1978) A note on the extension principle for fuzzy sets. J Math Anal Appl 64:369–380
99. Ralescu D (1977) Inexact solutions for large-scale control problems. In: Proceedings of the 1st International Congress on Mathematics at the Service of Man, Barcelona
100. Ramik J (1986) Extension principle in fuzzy optimization. Fuzzy Sets Syst 19:29–35
101. Ramik J, Rimanek J (1985) Inequality relation between fuzzy numbers and its use in fuzzy optimization. Fuzzy Sets Syst 16:123–138
102. Ramik J, Vlach M (2002) Fuzzy mathematical programming: A unified approach based on fuzzy relations. Fuzzy Optim Decis Making 1:335–346
103. Ramik J, Vlach M (2002) Generalized Concavity in Fuzzy Optimization and Decision Analysis. Kluwer, Boston
104. Riverol C, Pilipovik MV (2007) Optimization of the pyrolysis of ethane using fuzzy programming. Chem Eng J 133:133–137
105. Riverol C, Pilipovik MV, Carosi C (2007) Assessing the water requirements in refineries using possibilistic programming. Chem Eng Process 45:533–537
106. Rommelfanger HJ (1994) Some problems of fuzzy optimization with T-norm based extended addition. In: Delgado M, Kacprzyk J, Verdegay J-L, Vila MA (eds) Fuzzy Optimization: Recent Advances. Physica, Heidelberg, pp 158–168
107. Rommelfanger HJ (1996) Fuzzy linear programming and applications. Eur J Oper Res 92:512–527
108. Rommelfanger HJ (2004) The advantages of fuzzy optimization models in practical use. Fuzzy Optim Decis Making 3:293–309
109. Rommelfanger HJ, Slowinski R (1998) Fuzzy linear programming with single or multiple objective functions. In: Slowinski R (ed) Fuzzy Sets in Decision Analysis, Operations Research and Statistics. The Handbooks of Fuzzy Sets. Kluwer, Netherlands, pp 179–213
110. Roubens M (1990) Inequality constraints between fuzzy numbers and their use in mathematical programming. In: Slowinski R, Teghem J (eds) Stochastic versus Fuzzy Approaches

to Multiobjective Mathematical Programming Under Uncertainty. Kluwer, Netherlands, pp 321–330
111. Russell B (1924) Vagueness. Aust J Philos 1:84–92
112. Sahinidis N (2004) Optimization under uncertainty: State-of-the-art and opportunities. Comput Chem Eng 28:971–983
113. Saito S, Ishii H (1998) Existence criteria for fuzzy optimization problems. In: Takahashi W, Tanaka T (eds) Proceedings of the International Conference on Nonlinear Analysis and Convex Analysis, Niigata, Japan, 28–31 July 1998. World Scientific Press, Singapore, pp 321–325
114. Shafer G (1976) A Mathematical Theory of Evidence. Princeton University Press, Princeton
115. Shafer G (1987) Belief functions and possibility measures. In: Bezdek JC (ed) Analysis of Fuzzy Information. Mathematics and Logic, vol 1. CRC Press, Boca Raton, pp 51–84
116. Sousa JM, Kaymak U (2002) Fuzzy Decision Making in Modeling and Control. World Scientific Press, Singapore
117. Steinbach M (2001) Markowitz revisited: Mean-variance models in financial portfolio analysis. SIAM Rev 43(1):31–85
118. Tanaka H, Asai K (1984) Fuzzy linear programming problems with fuzzy numbers. Fuzzy Sets Syst 13:1–10
119. Tanaka H, Okuda T, Asai K (1973) Fuzzy mathematical programming. Trans Soc Instrum Control Eng 9(5):607–613 (in Japanese)
120. Tanaka H, Okuda T, Asai K (1974) On fuzzy mathematical programming. J Cybern 3(4):37–46
121. Tanaka H, Ichihashi H, Asai K (1984) A formulation of fuzzy linear programming problems based on comparison of fuzzy numbers. Control Cybern 13(3):185–194
122. Tanaka H, Ichihashi H, Asai K (1985) Fuzzy decisions in linear programming with trapezoidal fuzzy parameters. In: Kacprzyk J, Yager R (eds) Management Decision Support Systems Using Fuzzy Sets and Possibility Theory. Springer, Heidelberg, pp 146–159
123. Tang J, Wang D, Fung R (2001) Formulation of general possibilistic linear programming problems for complex industrial systems. Fuzzy Sets Syst 119:41–48
124. Thipwiwatpotjani P (2007) An algorithm for solving optimization problems with interval-valued probability measures. CCM Report No. 259, December 2007. CCM, Denver
125. Untiedt E (2006) Fuzzy and possibilistic programming techniques in the radiation therapy problem: An implementation-based analysis. Masters Thesis, University of Colorado Denver, Department of Mathematical Sciences, 5 July 2006
126. Untiedt E (2007) A robust model for mixed fuzzy and possibilistic programming. Project report for Math 7593: Advanced Linear Programming. Spring, Denver
127. Untiedt E (2007) Using gradual numbers to analyze non-monotonic functions of fuzzy intervals. CCM Report No. 258, December 2007. CCM, Denver
128. Untiedt E, Lodwick WA (2007) On selecting an algorithm for fuzzy optimization. In: Melin P, Castillo O, Aguilar LT, Kacprzyk J, Pedrycz W (eds) Foundations of Fuzzy Logic and Soft Computing: 12th International Fuzzy System Association World Congress, IFSA 2007, Cancun, Mexico, June 2007, Proceedings. Springer, Berlin, pp 371–380
129. Vasant PM, Barsoum NN, Bhattacharya A (2007) Possibilistic optimization in planning decision of construction industry. Int J Prod Econ (to appear)


130. Verdegay JL (1982) Fuzzy mathematical programming. In: Gupta MM, Sanchez E (eds) Fuzzy Information and Decision Processes. North-Holland, Amsterdam, pp 231–237
131. Vila MA, Delgado M, Verdegay JL (1989) A general model for fuzzy linear programming. Fuzzy Sets Syst 30:21–29
132. Wang R, Liang T (2005) Applying possibilistic linear programming to aggregate production planning. Int J Prod Econ 98:328–341
133. Wang S, Zhu S (2002) On fuzzy portfolio selection problems. Fuzzy Optim Decis Making 1:361–377
134. Wang Z, Klir GJ (1992) Fuzzy Measure Theory. Plenum Press, New York
135. Weichselberger K (2000) The theory of interval-probability as a unifying concept for uncertainty. Int J Approx Reason 24:149–170
136. Werners B (1988) Aggregation models in mathematical programming. In: Mitra G (ed) Mathematical Models for Decision Support. NATO ASI Series, vol F48. Springer, Berlin
137. Werners B (1995) Fuzzy linear programming – Algorithms and applications. Addendum to the Proceedings of ISUMA-NAFIPS'95, College Park, Maryland, 17–20 Sept 1995, pp A7–A12
138. Whitmore GA, Findlay MC (1978) Stochastic Dominance: An Approach to Decision-Making Under Risk. Lexington Books, Lexington
139. Yager RR (1980) On choosing between fuzzy subsets. Kybernetes 9:151–154

140. Yager RR (1981) A procedure for ordering fuzzy subsets of the unit interval. Inf Sci 24:143–161
141. Yager RR (1986) A characterization of the extension principle. Fuzzy Sets Syst 18:205–217
142. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
143. Zadeh LA (1968) Probability measures of fuzzy events. J Math Anal Appl 23:421–427
144. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning. Inf Sci, Part I: 8:199–249; Part II: 8:301–357; Part III: 9:43–80
145. Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1:3–28
146. Zeleny M (1994) Fuzziness, knowledge and optimization: New optimality concepts. In: Delgado M, Kacprzyk J, Verdegay J-L, Vila MA (eds) Fuzzy Optimization: Recent Advances. Physica, Heidelberg, pp 3–20
147. Zimmermann HJ (1974) Optimization in fuzzy environment. Paper presented at the International XXI TIMS and 46th ORSA Conference, San Juan, Puerto Rico, Oct 1974
148. Zimmermann HJ (1976) Description and optimization of fuzzy systems. Int J General Syst 2(4):209–216
149. Zimmermann HJ (1978) Fuzzy programming and linear programming with several objective functions. Fuzzy Sets Syst 1:45–55
150. Zimmermann HJ (1983) Fuzzy mathematical programming. Comput Oper Res 10:291–298


Fuzzy Probability Theory

MICHAEL BEER
National University of Singapore, Kent Ridge, Singapore

Article Outline

Glossary
Definition of the Subject
Introduction
Mathematical Environment
Fuzzy Random Quantities
Fuzzy Probability
Representation of Fuzzy Random Quantities
Future Directions
Bibliography

Fuzzy Probability Theory, Figure 1: Normalized fuzzy set with α-level sets and support

Glossary

Fuzzy set and fuzzy vector
Let X represent a universal set and x the elements of X. Then

Ã = {(x, μ_A(x)) | x ∈ X},  μ_A(x) ≥ 0 ∀ x ∈ X        (1)

is referred to as a fuzzy set Ã on X. μ_A(x) is the membership function (characteristic function) of the fuzzy set Ã and represents the degree with which the elements x belong to Ã. If

sup_{x∈X} μ_A(x) = 1,        (2)

the membership function and the fuzzy set Ã are called normalized; see Fig. 1. In the case of a limitation to the Euclidean space X = Rⁿ and normalized fuzzy sets, the fuzzy set Ã is also referred to as a fuzzy vector, denoted by x̃ with its membership function μ(x), or, in the one-dimensional case, as a fuzzy variable x̃ with μ(x).

α-Level set and support
The crisp sets

A_{α_k} = {x ∈ X | μ_A(x) ≥ α_k}        (3)

extracted from the fuzzy set Ã for real numbers α_k ∈ (0, 1] are called α-level sets. These comply with the inclusion property

A_{α_k} ⊆ A_{α_i}  ∀ α_i, α_k ∈ (0, 1] with α_i ≤ α_k.        (4)

The largest α-level set, A_{α_k → +0}, is called the support S(Ã); see Fig. 1.

σ-Algebra
A family M(X) of sets A_i on the universal set X is referred to as a σ-algebra S(X) on X if

X ∈ S(X),        (5)

A_i ∈ S(X) ⇒ A_iᶜ ∈ S(X),        (6)

and if for every sequence of sets A_i

A_i ∈ S(X), i = 1, 2, … ⇒ ⋃_{i=1}^{∞} A_i ∈ S(X).        (7)

In this definition, A_iᶜ is the complementary set of A_i with respect to X, the family M(X) of sets A_i refers to subsets and systems of subsets of the power set P(X) on X, and the power set P(X) is the set of all subsets A_i of X.

Definition of the Subject

Fuzzy probability theory is an extension of probability theory for dealing with mixed probabilistic/non-probabilistic uncertainty. It provides a theoretical basis for modeling uncertainty that is only partly characterized by randomness and that defies a pure probabilistic modeling due to a lack of trustworthiness or precision of the data or a lack of pertinent information. The fuzzy probabilistic model is settled between the probabilistic model and non-probabilistic uncertainty models. The significance of fuzzy probability theory lies in treating the elements of a population not as crisp quantities but as set-valued quantities or granules in an uncertain fashion, which largely complies with reality in most everyday situations. Probabilistic and non-probabilistic uncertainty can thus be transferred adequately and separately to the results of a subsequent analysis. This enables best-case and worst-case estimates in terms of probability that take account of variations within the inherent non-probabilistic uncertainty.

The development of fuzzy probability theory was initiated by H. Kwakernaak with the introduction of fuzzy random variables in [47] in 1978. Subsequent developments have been reported in different directions and from different perspectives, including differences in terminology. The usefulness of the theory has been underlined by various applications beyond mathematics and information science, in particular in engineering. The application fields are not limited and may be extended increasingly, for example, to medicine, biology, psychology, economics, the financial sciences, the social sciences, and even to law.

Introduction

Probably the most popular example of fuzzy probability is the evaluation of a survey on the subjective assessment of temperature. A group of test persons are asked, under equal conditions, to give a statement on the current temperature as realistically as possible. If the scatter of the statements is considered as random, the mean value of the statements provides a reasonable statistical point estimate for the actual temperature. The statements are, however, naturally given in an uncertain form. The test persons express their perception in a form such as "about 25°C", "possibly 27°C", or "between 24°C and 26°C", or they may even offer only linguistic assessments such as warm, very warm, or pleasant. This uncertainty is non-probabilistic but has to be taken into account in the statistical evaluation. It is transferred to the estimated mean value, which is no longer obtained as a crisp number but as a value range or a set of values corresponding to the possibilities within the range of uncertainty of the statements. If the uncertain statements are initially quantified as fuzzy values, the mean value is obtained as a fuzzy value, too; and the probability of certain events is also computed as a fuzzy quantity, referred to as fuzzy probability.

This example is a typical materialization of the following general real-world situation. The numerical representation of a physical quantity with the aid of crisp numbers x ∈ R, or sets thereof, is frequently affected by uncertainty regarding the trustworthiness of measured, or otherwise specified, values.
The perceptions of physical quantities may appear, for example, as imprecise, diffuse, vague, dubious, or ambiguous. Underlying reasons for this phenomenon include the limited precision of any measurement (digital or analog), indirect measurements via auxiliary quantities in conjunction with a more or less trustworthy model to eventually determine the value wanted, measurements under weakly specified or arbitrarily changing boundary conditions, and the specification of values by experts in a linguistic manner. The type of the associated uncertainty is non-frequentative and improper for subjective probabilistic modeling; hence, it is non-probabilistic. This uncertainty is unavoidable and can always be made evident by an appropriate choice of scale.
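The temperature-survey example above can be made concrete with a short sketch (my own illustration, not part of the original article; the triangular shapes and all numbers are invented). Each statement "about t°C" is modeled as a triangular fuzzy number, and the fuzzy mean is obtained by interval arithmetic on the α-level sets defined in the Glossary:

```python
# Sketch: fuzzy mean of imprecise temperature statements (illustrative only).
# A statement is a triangular fuzzy number (left, peak, right); its alpha-cut
# is the interval [l + a*(p - l), r - a*(r - p)].

def alpha_cut(tri, a):
    l, p, r = tri
    return (l + a * (p - l), r - a * (r - p))

def fuzzy_mean(statements, alphas):
    """For each alpha, average the interval bounds over all statements
    (the mean of intervals is the interval of the means)."""
    cuts = {}
    for a in alphas:
        lows, highs = zip(*(alpha_cut(s, a) for s in statements))
        cuts[a] = (sum(lows) / len(lows), sum(highs) / len(highs))
    return cuts

# Invented survey data: "about 25", "possibly 27", "between 24 and 26".
statements = [(24.0, 25.0, 26.0), (25.0, 27.0, 29.0), (24.0, 25.0, 26.0)]
cuts = fuzzy_mean(statements, alphas=[0.0, 0.5, 1.0])
# At alpha = 1 only the peaks remain, so the fuzzy mean collapses to the
# crisp mean of the peaks; at alpha = 0 the mean interval is widest.
```

The nesting of the resulting α-cuts mirrors the inclusion property of Eq. (4): the estimated mean is a fuzzy value whose width reflects the imprecision of the statements.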

If a set of uncertain perceptions of a physical quantity is present in the form of a random sample, then the overall uncertainty possesses a mixed probabilistic/non-probabilistic character. Whilst the scatter of the realizations of the physical quantity possesses a probabilistic character (frequentative or subjective), each particular realization from the population may, additionally, exhibit non-probabilistic uncertainty. Consequently, a realistic modeling in those cases must involve both probabilistic and non-probabilistic uncertainty. This modeling, without distorting or ignoring information, is the mission of fuzzy probability theory. A pure probabilistic modeling would introduce unwarranted information in the form of a distribution function that cannot be justified and would thus diminish the trustworthiness of the probabilistic results.

Mathematical Environment

Fuzzy probability is part of the framework of generalized information theory [38] and falls under the umbrella of granular computing [50,62]. It represents a special case of imprecise probabilities [15,78] with ties to concepts of random sets [52]. This is underlined by Walley's summary of the semantics of imprecise probabilities with the term indeterminacy, which arises from ignorance about facts, events, or dependencies. Within the class of mathematical models covered by the term imprecise probabilities, see [15,38,78], fuzzy probability theory is related to concepts known as upper and lower probabilities [28], sets of probability measures [24], distribution envelopes [7], interval probabilities [81], and the p-box approach [23]. Similarities also exist with respect to evidence theory (or Dempster–Shafer theory) [20,70] as a theory of infinitely monotone Choquet capacities [39,61]. The relationship to the latter is associated with the interpretation of the measures plausibility and belief, with the special cases of possibility and necessity, as upper and lower probabilities, respectively [8].
Fuzzy probability shares the common feature of all imprecise probability models: the uncertainty of an event is characterized with a set of possible measure values in terms of probability, or with bounds on probability. Its distinctive feature is that set-valued information, and hence the probability of associated events, is described with the aid of uncertain sets according to fuzzy set theory [83,86]. This represents a marriage between fuzzy methods and probabilistics, with fuzziness and randomness as special cases, which justifies the denotation fuzzy randomness. Fuzzy probability theory enables the consideration of a fuzzy set of possible probabilistic models over the range of imprecision of the knowledge about the underlying randomness. The associated fuzzy probabilities provide weighted bounds on probability, the weights of which are obtained as the membership values of the fuzzy sets. Based on α-discretization [86] and the representation of fuzzy sets as sets of α-level sets, the relationship of fuzzy probability theory to various concepts of imprecise probabilities becomes obvious. For each α-level, a common interval probability, crisp bounds on probability, or a classical set of probability measures, respectively, is obtained. The α-level sets of fuzzy events can be treated as random sets. Further, a relationship of these random sets to evidence theory can be constructed if a respective basic probability assignment is selected; see [21]. Consistency with evidence theory is obtained if the focal sets are specified as fuzzy elementary events and if the basic probability assignment follows a discrete uniform distribution over the fuzzy elementary events.

The model of fuzzy randomness, with its two components of fuzzy methods and probabilistics, can utilize a distinction between aleatory and epistemic uncertainty with respect to the sources of uncertainty [29]. This is particularly important in view of practical applications. Irreducible uncertainty as a property of the system, associated with fluctuations/variability, may be summarized as aleatory uncertainty and described probabilistically; reducible uncertainty as a property of the analyst, or of the analyst's perception, associated with a lack of knowledge or precision, may be understood as epistemic uncertainty and described with fuzzy sets. The model of fuzzy randomness then combines, without mixing, both components in the form of a fuzzy set of possible probabilistic models over some particular range of imprecision. This distinction is retained throughout any subsequent analysis and reflected in the results.
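A minimal numerical sketch of such weighted bounds follows (my own illustration; the Gaussian model and all numbers are assumptions, not taken from the article). If the mean of a Gaussian model is a triangular fuzzy number, each α-level set of the mean induces crisp bounds on P(X ≤ x₀), and the membership level α itself is the weight of those bounds:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    # Standard closed form of the Gaussian CDF via the error function.
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def fuzzy_probability(x0, mu_tri, sigma, alphas):
    """For each alpha, the alpha-cut of the fuzzy mean gives an interval of
    admissible Gaussian models; P(X <= x0) is monotone decreasing in mu, so
    the probability bounds are attained at the interval endpoints."""
    l, p, r = mu_tri
    out = {}
    for a in alphas:
        mu_lo, mu_hi = l + a * (p - l), r - a * (r - p)
        out[a] = (norm_cdf(x0, mu_hi, sigma), norm_cdf(x0, mu_lo, sigma))
    return out

# Fuzzy mean "about 10" modeled as (9, 10, 11), crisp sigma = 2 (invented).
fp = fuzzy_probability(x0=12.0, mu_tri=(9.0, 10.0, 11.0), sigma=2.0,
                       alphas=[0.0, 1.0])
# alpha = 1 recovers the single crisp probability P(X <= 12 | mu = 10);
# alpha = 0 gives the widest bounds on that probability.
```

For each α the result is an ordinary interval probability, which is exactly the per-level correspondence to other imprecise probability concepts noted above.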
The development of fuzzy probability was initiated with the introduction of fuzzy random variables by Kwakernaak [47,48] in 1978/79. Subsequent milestones were set by Puri and Ralescu [63], Kruse and Meyer [46], Wang and Zhang [80], and Krätschmer [43]. The developments show differences in terminology, concepts, and in the associated consideration of measurability; and the investigations are ongoing [11,12,14,17,35,36,37,45,73]. Krätschmer [43] showed that the different concepts can be unified to a certain extent. Generally, it can be noted that α-discretization is utilized as a helpful instrument. An overview with specific comments on the different developments is provided in [43] and [57]. Investigations were pursued on independent and dependent fuzzy random variables, for which parameters were defined with particular focus on variance and covariance [22,32,40,41,60]. Fuzzy random processes

were examined to reveal properties of limit theorems and martingales associated with fuzzy randomness [44,65]; see also [71,79], and for a survey [49]. Particular interest was devoted to the strong law of large numbers [10,33,34]. Further, the differentiation and the integration of fuzzy random quantities were investigated in [51,64]. Driven by the motivation for the establishment of fuzzy probability theory, considerable effort was made in the modeling and statistical evaluation of imprecise data. Fundamental achievements were reported by Kruse and Meyer [46], by Bandemer and Näther [2,5], and by Viertl [75]. Classical statistical methods were extended in order to take account of statistical fluctuations/variability and imprecision simultaneously, and the specific features associated with the imprecision of the data were investigated. Ongoing research is reported, for example, in [58,74] in view of evaluating measurements, in [51,66] for decision making, and in [16,42,59] for regression analysis. Methods for evaluating imprecise data with the aid of generalized histograms are discussed in [9,77]. Also, the application of resampling methods is pursued; bootstrap concepts are utilized for statistical estimation [31] and hypothesis testing [26] based on imprecise data. Another method for hypothesis testing is proposed in [27], which employs fuzzy parameters in order to describe a fuzzy transition between rejection and acceptance. Bayesian methods have also been extended by the inclusion of fuzzy variables to take account of imprecise data; see [75] for basic considerations. A contribution to Bayesian statistics with imprecise prior distributions is presented in [76]. This leads to imprecise posterior distributions and imprecise predictive distributions, and may be used to deduce imprecise confidence intervals. A combination of the Bayesian theorem with kriging based on imprecise data is described in [3].
A Bayesian test of fuzzy hypotheses is discussed in [72], while in [67] the application of a fuzzy Bayesian method for decision making is presented. In view of practical applications, probability distribution functions are defined for fuzzy random quantities [54,75,77,85], despite some drawbacks [6]. These distribution functions can easily be formulated and used for further calculations, but they do not uniquely describe a fuzzy random quantity. This theoretical lack is, however, generally without effect in practical applications, so that stochastic simulations may be performed according to the distribution functions. Alternative simulation methods were proposed based on parametric [13] and non-parametric [6,55] descriptions of imprecision. The approach according to [55] enables a direct generation of fuzzy realizations based on a new concept for an incremental representation of fuzzy random quantities. This method is designed to


simulate and predict fuzzy time series; it circumvents the problems of artificial uncertainty growth or bias of the non-probabilistic uncertainty that frequently accompany numerical simulations. This variety of theoretical developments provides reasonable margins for the formulation of fuzzy probability theory but does not allow the definition of a unique concept. Choices have to be made within the elaborated margins depending on the envisaged application and environment. For the subsequent sections these choices are made in view of a broad spectrum of possible applications, for example, in civil/mechanical engineering [54]. These choices concern the following three main issues. First, measurability has to be ensured according to a sound concept. According to [43], the concepts of measurable bounding functions [47,48], of measurable mappings of α-level sets [63], and of measurable fuzzy-valued mappings [17,36] are available; or the unifying concept proposed in [43] itself, which utilizes a special topology on the space of fuzzy realizations, may be selected. From a practical point of view, the concept of measurable bounding functions is most reasonable due to its analogy to traditional probability theory. On this basis, a fuzzy random quantity can be regarded as a fuzzy set of traditional, crisp random quantities, each one carrying a certain membership degree. Each of these crisp random quantities is then measurable in the traditional fashion, and their membership degrees can be transferred to the respective measure values. The set of the obtained measure values, including their membership degrees, then represents a fuzzy probability. Second, a concept for the integration of a fuzzy-valued function has to be selected from the different available approaches [86]. This is associated with the computation of the probability of a fuzzy event.
An evaluation in a mean sense, weighted by the membership function of the fuzzy event, is suggested in [84]; this leads to a scalar value for the probability and is associated with the interpretation that an event may occur partially. An approach for calculating the probability of a fuzzy event as a fuzzy set is proposed in [82]; the resulting fuzzy probability then represents a set of measure values with associated membership degrees. This complies with the interpretation that the occurrence of an event is binary, but it is not clearly indicated whether the event has occurred or not. The imprecision is associated with the observation rather than with the event. The latter approach corresponds with the practical situation in many cases, provides useful information in the form of the imprecision reflected in the measure values, and follows the selected concept of measurability. It is thus taken as a basis for further consideration.
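The two evaluations can be contrasted in a small sketch (illustrative code with an invented discrete sample; the membership function and the data are my assumptions, not taken from the article). The mean-type evaluation yields a single scalar, the expected membership, while the set-valued evaluation attaches to each membership level α the ordinary probability of the crisp α-level event:

```python
# Sketch: probability of a fuzzy event, two evaluations (invented sample).
# Sample of equally likely crisp outcomes and a triangular membership
# function for the fuzzy event "x is roughly 5".

sample = [2.0, 4.0, 5.0, 5.5, 8.0]

def mu(x, l=3.0, p=5.0, r=7.0):
    if x <= l or x >= r:
        return 0.0
    return (x - l) / (p - l) if x <= p else (r - x) / (r - p)

# Evaluation 1: scalar probability as the mean membership E[mu(X)],
# i.e. the event is allowed to occur "partially".
p_scalar = sum(mu(x) for x in sample) / len(sample)

# Evaluation 2: set-valued probability; for each alpha the crisp
# alpha-level event {x : mu(x) >= alpha} has an ordinary probability,
# giving measure values weighted by their membership level alpha.
def p_level(alpha):
    return sum(1 for x in sample if mu(x) >= alpha) / len(sample)

p_fuzzy = {a: p_level(a) for a in (0.25, 0.5, 0.75, 1.0)}
# p_level is non-increasing in alpha: a higher membership threshold
# retains fewer outcomes.
```

The second evaluation returns a set of probability values with membership weights, matching the interpretation chosen in the text that the event itself is binary but its observation is imprecise.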

Third, the meaning of the distance between fuzzy sets as realizations of a fuzzy random quantity must be defined; this is of particular importance for the definition of the variance and of further parameters of fuzzy random quantities. An approach that follows a set-theoretical point of view and leads to a crisp distance measure is presented in [40]. It is proposed to apply the Hausdorff metric to the α-level sets of a fuzzy quantity and to average the results over the membership scale. Consequently, parameters of a fuzzy random quantity which are associated with a distance between fuzzy realizations reflect the variability within the imprecision merely in an integrated form. For example, the variance of a fuzzy random variable is obtained as a crisp value. In contrast to this, the application of standard algorithms for operations on fuzzy sets, such as the extension principle [4,39,86], leads to fuzzy distances between fuzzy sets. Parameters, including variances, of fuzzy random quantities are then obtained as fuzzy sets of possible parameter values. This corresponds to the interpretation of fuzzy random quantities as fuzzy sets of traditional, crisp random quantities. The latter approach is thus pursued further.

These three selections comply basically with the definitions in [46] and [47]; see also [57]. Among all the possible choices, this set-up ensures the most plausible settlement of fuzzy probability theory within the framework of imprecise probabilities [15,38,78], with ties to evidence theory and random set approaches [21,30]. Fuzzy probability is obtained in the form of weighted plausible bounds on probability. Moreover, in view of practical applications, the treatment of fuzzy random quantities as fuzzy sets of traditional random quantities enables the utilization of established probabilistic methods as kernel solutions in the environment of a fuzzy analysis.
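This nesting of a crisp probabilistic kernel inside an α-level fuzzy analysis can be sketched as follows (a toy illustration: the exponential model, the fuzzy parameter, and the crude grid search standing in for global optimization are all my assumptions, not the article's):

```python
import random

def mc_kernel(lam, n=20000, rng=random.Random(0)):
    """Crisp probabilistic kernel: Monte Carlo estimate of P(X > 1)
    for X ~ Exponential(lam). The true value is exp(-lam)."""
    return sum(1 for _ in range(n) if rng.expovariate(lam) > 1.0) / n

def fuzzy_analysis(lam_tri, alphas, grid=11):
    """Outer fuzzy analysis: for each alpha-cut of the fuzzy rate parameter,
    search the cut for the extreme kernel outputs (a grid search stands in
    here for global optimization over the alpha-level set)."""
    l, p, r = lam_tri
    result = {}
    for a in alphas:
        lo, hi = l + a * (p - l), r - a * (r - p)
        vals = [mc_kernel(lo + i * (hi - lo) / (grid - 1)) for i in range(grid)]
        result[a] = (min(vals), max(vals))
    return result

# Fuzzy rate "about 1" as the triangular number (0.5, 1.0, 1.5) -- invented.
bounds = fuzzy_analysis((0.5, 1.0, 1.5), alphas=[0.0, 1.0])
# Since P(X > 1) = exp(-lam) is monotone in lam, the alpha = 0 bounds
# bracket the crisp alpha = 1 estimate, which lies near exp(-1).
```

The structure, an outer loop over α-levels with repeated calls to an unmodified crisp simulation kernel, is the point of the sketch; in practice the inner search would be a proper global optimization over each α-level set.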
For example, sophisticated methods of Monte Carlo simulation [25,68,69] may be combined with a generally applicable fuzzy analysis based on a global optimization approach using α-discretization [53]. If certain restrictive conditions are met, numerically efficient methods from interval mathematics [1] may be employed for the α-level mappings instead of a global optimization approach; see [56]. Finally, the selected concept enables best-case and worst-case studies within the range of possible probabilistic models.

Fuzzy Random Quantities

With the above selections, the definitions from traditional probability theory can be adopted and extended to deal with imprecise outcomes from a random experiment. Let Ω be the space of random elementary events ω ∈ Ω


Fuzzy Probability Theory

Fuzzy Probability Theory, Figure 2 Fuzzy random variable

and the universe on which the realizations are observed be the n-dimensional Euclidean space X = ℝⁿ. Then, a membership scale μ is introduced perpendicular to the hyperplane Ω × X. This enables the specification of fuzzy sets on X for given elementary events ω from Ω without interaction between Ω and μ. That is, randomness induced by Ω and fuzziness described by μ – only in the x-direction – are not mixed with one another. Let F(X) be the set of all fuzzy quantities on X = ℝⁿ; that is, F(X) denotes the collection of all fuzzy sets Ã on X = ℝⁿ, with Ã according to Eq. (1). Then, the imprecise result of the mapping

  X̃ : Ω → F(X)   (8)

is referred to as the fuzzy random quantity X̃. In contrast to real-valued random quantities, a fuzzy realization x̃(ω) ∈ F(X), or x̃(ω) ⊆ X, is now assigned to each elementary event ω ∈ Ω; see Fig. 2. These fuzzy realizations may be understood as a numerical representation of granules. Generally, a fuzzy random quantity can be discrete or continuous with respect to both fuzziness and randomness. The further consideration refers to the continuous case, from which the discrete case may be derived. Without a limitation in generality, the fuzzy realizations may be restricted to normalized fuzzy quantities, thus representing fuzzy vectors. Further restrictions can be defined in view of a convenient numerical treatment if the application field allows for it. This concerns, for example, a restriction to connected and compact α-level sets A_α = x_α of the fuzzy realizations, a restriction to convex fuzzy sets as fuzzy realizations (a fuzzy set Ã is convex if all its α-level sets A_α are convex sets), or a restriction to fuzzy realizations with only one element x_i carrying the membership μ(x_i) = 1 as in [47].

For the treatment of the fuzzy realizations, a respective algorithm for operations on fuzzy sets has to be selected. Following the literature and the above selection, the standard basis is employed with the min-operator as a special case of a t-norm and the associated max-operator as a special case of a t-conorm [18,19,86]. This leads to the min-max operator and the extension principle [4,39,86].

With the interpretation of a fuzzy random quantity as a fuzzy set of real-valued random quantities, according to the above selection, the following relationship to traditional probability theory is obtained. Let x_ji be a realization of a real-valued random quantity X_j and x̃_i be a fuzzy realization of a fuzzy random quantity X̃, with x_ji and x̃_i assigned to the same elementary event ω_i. If x_ji ∈ x̃_i, then x_ji is called contained in x̃_i. If, for all elementary events ω_i ∈ Ω, i = 1, 2, …, the x_ji are contained in the x̃_i, then the set of the x_ji, i = 1, 2, …, constitutes an original X_j of the fuzzy random quantity X̃; see Fig. 2. The original X_j is referred to as completely contained in X̃, X_j ∈ X̃. Each real-valued random quantity X that is completely contained in X̃ is an original X_j of X̃ and carries the membership degree

  μ(X_j) = max[α | x_ji ∈ x_iα ∀i] .   (9)

That is, in the Ω-direction, each original X_j must be consistent with the fuzziness of X̃. Consequently, the fuzzy random quantity X̃ can be represented as the fuzzy set of


all originals X_j contained in X̃,

  X̃ = {(X_j, μ(X_j)) | x_ji ∈ x̃_i ∀i} .   (10)

Each fuzzy random quantity X̃ contains at least one real-valued random quantity X as an original X_j of X̃. Each fuzzy random quantity X̃ that possesses precisely one original is thus a real-valued random quantity X. That is, real-valued random quantities are a special case of fuzzy random quantities. This enables a simultaneous treatment of real-valued random quantities and fuzzy random quantities within the same theoretical environment and with the same numerical algorithms. Or, vice versa, it enables the utilization of theoretical results and established numerical algorithms from traditional probability theory within the framework of fuzzy probability theory.

If α-discretization is applied to the fuzzy random quantity X̃, random α-level sets X_α are obtained,

  X_α = {X = X_j | μ(X_j) ≥ α} .   (11)

Their realizations are α-level sets x_iα of the respective fuzzy realizations x̃_i of the fuzzy random quantity X̃. A fuzzy random quantity can thus, alternatively, be represented by the set of its α-level sets,

  X̃ = {(X_α, μ(X_α)) | μ(X_α) = α ∀ α ∈ (0, 1]} .   (12)

In the one-dimensional case and with the restriction to connected and compact α-level sets of the realizations, the random α-level sets X_α of the fuzzy random variable X̃ become closed random intervals [X_αl, X_αr].

Fuzzy Probability

Fuzzy probability is derived as a fuzzy set of probability measures for events whose occurrence depends on the behavior of a fuzzy random quantity. These events are referred to as fuzzy random events with the following characteristics. Let X̃ be a fuzzy random quantity according to Eq. (8) with the realizations x̃, and let S(X) be a σ-algebra of sets A_i defined on X. Then, the event

  Ẽ_i : X̃ hits A_i   (13)

is referred to as a fuzzy random event; it occurs if a fuzzy realization x̃ of the fuzzy random quantity X̃ hits the set A_i. The associated probability of occurrence of Ẽ_i is referred to as the fuzzy probability P̃(A_i). It is obtained as the fuzzy set of the probabilities of occurrence of the events

  E_ij : X_j ∈ A_i   (14)

associated with all originals X_j of the fuzzy random quantity X̃ with their membership values μ(X_j). Specifically,

  P̃(A_i) = {(P(X_j ∈ A_i), μ(P(X_j ∈ A_i))) | X_j ∈ X̃, μ(P(X_j ∈ A_i)) = μ(X_j) ∀j} .   (15)

Each of the involved probabilities P(X_j ∈ A_i) is a traditional, real-valued probability associated with the traditional probability space [X, S, P] and complying with all established theorems and properties of traditional probability. For a formal closure of fuzzy probability theory, the membership scale μ is incorporated in the probability space to constitute the fuzzy probability space denoted by [X, S, P, μ] or [X, S, P̃].

The evaluation of the fuzzy random event Eq. (13) hinges on the question whether a fuzzy realization x̃_k hits the set A_i or not. Due to the fuzziness of the x̃, these events appear as fuzzy events Ẽ_ik : x̃_k hits A_i with the following three options for occurrence; see Fig. 3:
- The fuzzy realization x̃_k lies completely inside the set A_i: the event Ẽ_ik has occurred.
- The fuzzy realization x̃_k lies only partially inside A_i: the event Ẽ_ik may have occurred or not occurred.
- The fuzzy realization x̃_k lies completely outside the set A_i: the event Ẽ_ik has not occurred.

Fuzzy Probability Theory, Figure 3 Fuzzy event x̃_k hits A_i in the one-dimensional case

The fuzzy probability P̃(A_i) takes account of all three options within the range of fuzziness. The fuzzy random quantity X̃ is discretized into a set of random α-level sets X_α according to Eq. (11), and the events Ẽ_i and Ẽ_ik, respectively, are evaluated α-level by α-level. In this evaluation, the event E_ikα : x_kα hits A_i admits the following two "extreme" interpretations of occurrence:
- E_ikαl : "x_kα is contained in A_i: x_kα ⊆ A_i", and
- E_ikαr : "x_kα and A_i possess at least one element in common: x_kα ∩ A_i ≠ ∅".

Consequently, the events E_ikαl are associated with the smallest probability

  P_αl(A_i) = P(X_α ⊆ A_i) ,   (16)


Fuzzy Probability Theory, Figure 4 Events for determining P˛l (Ai ) and P˛r (Ai ) in the one-dimensional case

and the events E_ikαr correspond to the largest probability

  P_αr(A_i) = P(X_α ∩ A_i ≠ ∅) .   (17)

The probabilities P_αl(A_i) and P_αr(A_i) are bounds on the probability P̃(A_i) on the respective α-level associated with the random α-level set X_α of the fuzzy random quantity X̃; see Fig. 4. As all elements of X_α are originals X_j of X̃, the probability that an X_j ∈ X_α hits A_i is bounded according to

  P_αl(A_i) ≤ P(X_j ∈ A_i) ≤ P_αr(A_i)   ∀ X_j ∈ X_α .   (18)

This enables a computation of the probability bounds directly from the real-valued probabilities P(X_j ∈ A_i) associated with the originals X_j,

  P_αl(A_i) = min_{X_j ∈ X_α} P(X_j ∈ A_i) ,   (19)

  P_αr(A_i) = max_{X_j ∈ X_α} P(X_j ∈ A_i) .   (20)

If the fuzzy random quantity X̃ represents a fuzzy set of continuous real-valued random quantities and if the membership functions of all fuzzy realizations x̃_i of X̃ are at least segmentally continuous, then the probability bounds P_αl(A_i) and P_αr(A_i) determine closed connected intervals [P_αl(A_i), P_αr(A_i)]. In this case, the fuzzy probability P̃(A_i) is obtained as a continuous and convex fuzzy set, which may be specified uniquely with the aid of α-discretization,

  P̃(A_i) = {(P_α(A_i), μ(P_α(A_i))) | P_α(A_i) = [P_αl(A_i), P_αr(A_i)], μ(P_α(A_i)) = α ∀ α ∈ (0, 1]} .   (21)

The properties of the fuzzy probability P̃(A_i) result from the properties of the traditional probability measure in conjunction with fuzzy set theory. For example, a complementary relationship may be derived for P̃(A_i) as follows. The equivalence

  (X_α ⊆ A_i) ⇔ (X_α ∩ A_i^C = ∅) ,   (22)

with A_i^C being the complementary set of A_i with respect to the universe X, leads to

  P(X_α ⊆ A_i) = P(X_α ∩ A_i^C = ∅)   (23)

for each α-level. If the event X_α ∩ A_i^C = ∅ is expressed in terms of its complementary event X_α ∩ A_i^C ≠ ∅, Eq. (23) can be rewritten as

  P(X_α ⊆ A_i) = 1 − P(X_α ∩ A_i^C ≠ ∅) .   (24)

This leads to the relationships

  P_αl(A_i) = 1 − P_αr(A_i^C) ,   (25)

  P_αr(A_i) = 1 − P_αl(A_i^C) ,   (26)

and

  P̃(A_i) = 1 − P̃(A_i^C) .   (27)

In the special case that the set A_i contains only one element, A_i = x_i, the fuzzy probability P̃(A_i) changes to P̃(x_i). The event X_α ∩ A_i ≠ ∅ is then replaced by x_i ∈ X_α, and X_α ⊆ A_i becomes X_α = x_i. The probability P_αl(x_i) = P(X_α = x_i) may take values greater than zero only if a realization of X_α exists that possesses exactly one element X_α = t with t = x_i and if this element t represents a realization of a discrete original of X_α. Otherwise, P_αl(x_i) = 0, and the fuzziness of P̃(x_i) is exclusively specified by P_αr(x_i).

In the one-dimensional case with

  A_i = {x | x ∈ X, x_1 ≤ x ≤ x_2}   (28)

the fuzzy probability P̃(A_i) can be represented in a simplified manner. If the random α-level sets X_α are restricted to be closed random intervals [X_αl, X_αr], the associated fuzzy random variable X̃ can be completely described by means of the bounding real-valued random quantities X_αl and X_αr; see Fig. 4. X_αl and X_αr represent specific originals X_j of X̃. This enables the specification of the probability bounds for each α-level according to

  P_αl(A_i) = max[0, P(X_αr = t_r | x_2, t_r ∈ X, t_r ≤ x_2) − P(X_αl = t_l | x_1, t_l ∈ X, t_l < x_1)] ,   (29)

and

  P_αr(A_i) = P(X_αl = t_l | x_2, t_l ∈ X, t_l ≤ x_2) − P(X_αr = t_r | x_1, t_r ∈ X, t_r < x_1) .   (30)
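The one-dimensional bounds of Eqs. (29) and (30) can be sketched for a concrete interval random variable. The choice of bounding originals X_αl = X − d and X_αr = X + d with X standard normal, and the half-width d, are illustrative assumptions, not part of the article's formulation.

```python
# Hedged sketch of Eqs. (29)-(30) for random alpha-level intervals
# [X - d, X + d] with X standard normal; d is an illustrative assumption.
from math import erf, sqrt

def cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def interval_probability_bounds(x1, x2, d):
    """Bounds on P(A_i) for A_i = [x1, x2] from the bounding originals
    X_l = X - d and X_r = X + d (normals with shifted means)."""
    # Eq. (29): probability that the whole interval realization lies in [x1, x2]
    p_lower = max(0.0, cdf(x2, mu=d) - cdf(x1, mu=-d))
    # Eq. (30): probability that the interval realization overlaps [x1, x2]
    p_upper = cdf(x2, mu=-d) - cdf(x1, mu=d)
    return p_lower, p_upper

p_l, p_r = interval_probability_bounds(-1.0, 1.0, 0.3)
# For d = 0 both bounds collapse to the crisp probability P(-1 <= X <= 1).
```

Note that P(X_r ≤ x) = Φ(x − d) and P(X_l ≤ x) = Φ(x + d), which is why the shifted means ±d appear in the two bounds.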

From the above framework, the special case of real-valued random quantities X may be reobtained as a fuzzy random quantity X̃ that contains precisely one original X_j = X_1,

  X_1 = (X_{j=1}, μ(X_{j=1}) = 1) ,   (31)

see Eq. (10). Then, all X_α contain only the sole original X_1, and both X_α ∩ A_i ≠ ∅ and X_α ⊆ A_i reduce to X_1 ∈ A_i. That is, the event X_1 hits A_i no longer provides options for interpretation reflected as fuzziness,

  P_αl(A_i) = P_αr(A_i) = P(A_i) = P(X_1 ∈ A_i) .   (32)

In the above special one-dimensional case, Eqs. (29) and (30), the setting X = X_αl = X_αr leads to

  P_αl(A_i) = P_αr(A_i) = P(X_1 = t | x_1, x_2, t ∈ X, x_1 ≤ t ≤ x_2) .   (33)

Further properties and computation rules for fuzzy probabilities may be derived from traditional probability theory in conjunction with fuzzy set theory.

Representation of Fuzzy Random Quantities

The fuzzy probability P̃(A_i) may be computed for each arbitrary set A_i ∈ S(X). If – as a special family of sets S(X) – the Borel σ-algebra S_0(ℝⁿ) of the ℝⁿ is selected, the concept of the probability distribution function may be applied to fuzzy random quantities. That is, the system S_0(ℝⁿ) of the open sets

  A_i0 = {t = (t_1, …, t_k, …, t_n) | x = x_i, x, t ∈ ℝⁿ, t_k < x_k, k = 1, …, n}   (34)

on X = ℝⁿ is considered; S_0(ℝⁿ) is a Boolean set algebra. The concept of fuzzy probability according to Sect. "Fuzzy Probability" applied to the sets from Eq. (34) leads to fuzzy probability distribution functions; see Fig. 5. The fuzzy probability distribution function F̃(x) of the fuzzy random quantity X̃ on X = ℝⁿ is obtained as the set of the fuzzy probabilities P̃(A_i0) with A_i0 according to Eq. (34) for all x_i ∈ X,

  F̃(x) = {P̃(A_i0) ∀ x_i ∈ X} .   (35)

It is a fuzzy function. Bounds for the functional values F̃(x) are specified for each α-level depending on x = x_i in Eq. (34) and in compliance with Eqs. (19) and (20),

  F_αl(x = (x_1, …, x_n)) = 1 − max_{X_j ∈ X_α} P(X_j = t = (t_1, …, t_n) | x, t ∈ X = ℝⁿ, ∃ t_k ≥ x_k, 1 ≤ k ≤ n) ,   (36)

Fuzzy Probability Theory, Figure 5 Fuzzy probability density function f̃(x) and fuzzy probability distribution function F̃(x) of a continuous fuzzy random variable X̃


  F_αr(x = (x_1, …, x_n)) = max_{X_j ∈ X_α} P(X_j = t = (t_1, …, t_n) | x, t ∈ X = ℝⁿ, t_k < x_k, k = 1, …, n) .   (37)

For the determination of F_αl(x) the relationship in Eq. (25) is used. If F_αl(x) and F_αr(x) form closed connected intervals [F_αl(x), F_αr(x)] – see Sect. "Fuzzy Probability" for the conditions – the functional values F̃(x) are determined based on Eq. (21),

  F̃(x) = {(F_α(x), μ(F_α(x))) | F_α(x) = [F_αl(x), F_αr(x)], μ(F_α(x)) = α ∀ α ∈ (0, 1]} .   (38)

In this case, the functional values of the fuzzy probability distribution function F̃(x) are continuous and convex fuzzy sets.

In correspondence with Eq. (15), the fuzzy probability distribution function F̃(x) of X̃ represents the fuzzy set of the probability distribution functions F_j(x) of all originals X_j of X̃ with the membership values μ(F_j(x)),

  F̃(x) = {(F_j(x), μ(F_j(x))) | X_j ∈ X̃, μ(F_j(x)) = μ(X_j) ∀j} .   (39)

Each original X_j determines precisely one trajectory F_j(x) within the bunch F̃(x) of weighted functions F_j(x) ∈ F̃(x). In the one-dimensional case and with the restriction to closed random intervals [X_αl, X_αr] for each α-level, the fuzzy probability distribution function F̃(x) is determined by

  F_αl(x) = P(X_αr = t_r | x, t_r ∈ X, t_r < x) ,   (40)

  F_αr(x) = P(X_αl = t_l | x, t_l ∈ X, t_l < x) .   (41)
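Eqs. (40) and (41) say that the lower envelope of the fuzzy distribution function is the distribution of the right interval bound X_αr and the upper envelope that of the left bound X_αl. A minimal numerical sketch, again under the illustrative assumption X_αl = X − d, X_αr = X + d with X standard normal:

```python
# Sketch of Eqs. (40)-(41): envelopes of the fuzzy distribution function for
# one alpha-level of an interval random variable (assumed normal, half-width d).
from math import erf, sqrt

def cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def fuzzy_cdf_bounds(x, d):
    """[F_alpha_l(x), F_alpha_r(x)] for the alpha-level with half-width d."""
    f_lower = cdf(x, mu=d)    # Eq. (40): distribution of the right bound X_r
    f_upper = cdf(x, mu=-d)   # Eq. (41): distribution of the left bound X_l
    return f_lower, f_upper

f_l, f_r = fuzzy_cdf_bounds(0.0, 0.5)
# At x = 0 the two envelopes are symmetric around the crisp value F(0) = 0.5.
```

Evaluating the pair over a grid of x values traces the "bunch" of trajectories F_j(x) bounded by the two envelopes.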

In correspondence with traditional probability theory, fuzzy probability density functions f̃(t) or f̃(x) are defined in association with the F̃(x); see Fig. 5. The f̃(t) or f̃(x) are fuzzy functions which – in the continuous case with respect to randomness – are integrable for each original X_j of X̃ and satisfy the relationship

  F_j(x) = ∫_{t_1=−∞}^{t_1=x_1} … ∫_{t_k=−∞}^{t_k=x_k} … ∫_{t_n=−∞}^{t_n=x_n} f_j(t) dt ,   (42)

with t = (t_1, …, t_n) ∈ X. For each original X_j the integration of the associated trajectory f_j(x) ∈ f̃(x) leads to the respective trajectory F_j(x) ∈ F̃(x).

For the description of a fuzzy random quantity X̃, parameters in the form of fuzzy quantities p̃_t(X̃) may be

used. These fuzzy parameters may represent any type of parameters known from real-valued random quantities, such as moments, weighting factors for different distribution types in a compound distribution, or functional parameters of the distribution functions. The fuzzy parameter p̃_t(X̃) of the fuzzy random quantity X̃ is the fuzzy set of the parameter values p_t(X_j) of all originals X_j with the membership values μ(p_t(X_j)),

  p̃_t(X̃) = {(p_t(X_j), μ(p_t(X_j))) | X_j ∈ X̃, μ(p_t(X_j)) = μ(X_j) ∀j} .   (43)

For each α-level, bounds are given for the fuzzy parameter p̃_t(X̃) by

  p_t,αl(X̃) = min_{X_j ∈ X_α} p_t(X_j) ,   (44)

  p_t,αr(X̃) = max_{X_j ∈ X_α} p_t(X_j) .   (45)

If the fuzzy random quantity X̃ represents a fuzzy set of continuous real-valued random quantities, if all fuzzy realizations x̃_i of X̃ are connected sets, and if the parameter p_t is defined on a continuous scale, then the fuzzy parameter p̃_t(X̃) is determined by its α-level sets

  p_t,α(X̃) = [p_t,αl(X̃), p_t,αr(X̃)] ,   (46)

  p̃_t(X̃) = {(p_t,α(X̃), μ(p_t,α(X̃))) | μ(p_t,α(X̃)) = α ∀ α ∈ (0, 1]} ,   (47)

and represents a continuous and convex fuzzy set.

If a fuzzy random quantity X̃ is described by more than one fuzzy parameter p̃_t(X̃), interactive dependencies are generally present between the different fuzzy parameters. If this interaction is neglected, a fuzzy random quantity X̃_hull is obtained, which covers the actual fuzzy random quantity X̃ completely. That is, for all realizations of X̃_hull and X̃ the following holds,

  x̃_i,hull ⊇ x̃_i   ∀i .   (48)
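The α-level bounds on a fuzzy parameter, Eqs. (44) and (45), can be sketched with a small example. The exponential family of originals and the grid of rate values standing in for the α-cut are illustrative assumptions.

```python
# Illustrative sketch of Eqs. (44)-(45): alpha-level bounds of a fuzzy
# parameter (here the mean) via min/max over a finite family of originals X_j.
def mean_of_exponential(rate):
    """p_t(X_j) for an exponential original X_j with the given rate."""
    return 1.0 / rate

def fuzzy_parameter_bounds(rates_in_alpha_cut):
    """[p_t_alpha_l, p_t_alpha_r] over originals with rates in the alpha-cut."""
    means = [mean_of_exponential(r) for r in rates_in_alpha_cut]
    return min(means), max(means)

lo, hi = fuzzy_parameter_bounds([0.5 + 0.05 * i for i in range(11)])  # rates 0.5..1.0
# lo = 1.0 (rate 1.0), hi = 2.0 (rate 0.5)
```

Repeating this per α-level yields the nested intervals of Eq. (46) and hence the fuzzy parameter of Eq. (47).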

Fuzzy parameters and fuzzy probability distribution functions do not enable a unique reproduction of fuzzy realizations based on the above description. But they are sufficient to compute fuzzy probabilities correctly for any events defined according to Eq. (13). The presented concept of fuzzy probability can be extended to fuzzy random functions and processes.

Future Directions

Fuzzy probability theory provides a powerful key to solving a broad variety of practical problems that defy an appropriate treatment with traditional probabilistics due


to imprecision of the information for model specification. Fuzzy probabilities reflect aleatory and epistemic uncertainty of the underlying problem simultaneously and separately, and provide extended information and decision aids. These features can be utilized in all application fields of traditional probability theory and beyond. Respective developments can be observed, primarily, in information science and, increasingly, in engineering. Potential for further fruitful applications exists, for example, in psychology, economics, finance, medicine, biology, the social sciences, and even in law. In all cases, fuzzy probability theory is considered not as a replacement for traditional probabilistics but as a beneficial supplement for an appropriate model specification according to the available information in each particular case.

The focus of further developments is seen on both theory and applications. Future theoretical developments may pursue a measure-theoretic clarification of the embedding of fuzzy probability theory in the framework of imprecise probabilities under the umbrella of generalized information theory. This is associated with the ambition to unify the variety of available fuzzy probabilistic concepts and eventually to formulate a consistent generalized fuzzy probability theory. Another important issue for future research is the mathematical description and treatment of dependencies within the fuzziness of fuzzy random quantities, such as non-probabilistic interaction between fuzzy realizations, between fuzzy parameters, and between fuzzy probabilities of certain events. In parallel to theoretical modeling, further effort is worthwhile toward a consistent concept for the statistical evaluation of imprecise data, including the analysis of probabilistic and non-probabilistic dependencies in the data.

In view of applications, the further development of fuzzy probabilistic simulation methods is of central importance. This concerns both theory and numerical algorithms for the direct generation of fuzzy random quantities – in a parametric and in a non-parametric fashion. Representations and computational procedures for fuzzy random quantities must be developed with a focus on high numerical efficiency to enable the solution of real-world problems. For a spread into practice, it is further essential to elaborate options and potentials for the interpretation and evaluation of fuzzy probabilistic results such as fuzzy mean values or fuzzy failure probabilities. The most promising potential for utilization is seen in worst-case investigations in terms of probability, in sensitivity analysis with respect to non-probabilistic uncertainty, and in decision-making based on mixed probabilistic/non-probabilistic information.

In summary, fuzzy probability theory and its further developments significantly contribute to an improved uncertainty modeling according to reality.

Bibliography

Primary Literature

1. Alefeld G, Herzberger J (1983) Introduction to interval computations. Academic Press, New York 2. Bandemer H (1992) Modelling uncertain data. Akademie-Verlag, Berlin 3. Bandemer H, Gebhardt A (2000) Bayesian fuzzy kriging. Fuzzy Sets Syst 112:405–418 4. Bandemer H, Gottwald S (1995) Fuzzy sets, fuzzy logic, fuzzy methods with applications. Wiley, Chichester 5. Bandemer H, Näther W (1992) Fuzzy data analysis. Kluwer, Dordrecht 6. Beer M (2007) Model-free sampling. Struct Saf 29:49–65 7. Berleant D, Zhang J (2004) Representation and problem solving with distribution envelope determination (denv). Reliab Eng Syst Saf 85(1–3):153–168 8. Bernardini A, Tonon F (2004) Aggregation of evidence from random and fuzzy sets. Special Issue of ZAMM. Z Angew Math Mech 84(10–11):700–709 9. Bodjanova S (2000) A generalized histogram. Fuzzy Sets Syst 116:155–166 10. Colubi A, Domínguez-Menchero JS, López-Díaz M, Gil MA (1999) A generalized strong law of large numbers. Probab Theor Relat Fields 14:401–417 11. Colubi A, Domínguez-Menchero JS, López-Díaz M, Ralescu DA (2001) On the formalization of fuzzy random variables. Inf Sci 133:3–6 12. Colubi A, Domínguez-Menchero JS, López-Díaz M, Ralescu DA (2002) A de[0, 1]-representation of random upper semicontinuous functions. Proc Am Math Soc 130:3237–3242 13. Colubi A, Fernández-García C, Gil MA (2002) Simulation of random fuzzy variables: an empirical approach to statistical/probabilistic studies with fuzzy experimental data. IEEE Trans Fuzzy Syst 10:384–390 14. Couso I, Sanchez L (2008) Higher order models for fuzzy random variables. Fuzzy Sets Syst 159:237–258 15. de Cooman G (2002) The society for imprecise probability: theories and applications. http://www.sipta.org 16. Diamond P (1990) Least squares fitting of compact set-valued data.
J Math Anal Appl 147:351–362 17. Diamond P, Kloeden PE (1994) Metric spaces of fuzzy sets: theory and applications. World Scientific, Singapore 18. Dubois D, Prade H (1980) Fuzzy sets and systems theory and applications. Academic Press, New York 19. Dubois D, Prade H (1985) A review of fuzzy set aggregation connectives. Inf Sci 36:85–121 20. Dubois D, Prade H (1986) Possibility theory. Plenum Press, New York 21. Fellin W, Lessmann H, Oberguggenberger M, Vieider R (eds) (2005) Analyzing uncertainty in civil engineering. Springer, Berlin 22. Feng Y, Hu L, Shu H (2001) The variance and covariance of fuzzy random variables and their applications. Fuzzy Sets Syst 120(3):487–497


23. Ferson S, Hajagos JG (2004) Arithmetic with uncertain numbers: rigorous and (often) best possible answers. Reliab Eng Syst Saf 85(1–3):135–152 24. Fetz T, Oberguggenberger M (2004) Propagation of uncertainty through multivariate functions in the framework of sets of probability measures. Reliab Eng Syst Saf 85(1–3):73–87 25. Ghanem RG, Spanos PD (1991) Stochastic finite elements: a spectral approach. Springer, New York; Revised edition 2003, Dover Publications, Mineola 26. González-Rodríguez G, Montenegro M, Colubi A, Ángeles Gil M (2006) Bootstrap techniques and fuzzy random variables: Synergy in hypothesis testing with fuzzy data. Fuzzy Sets Syst 157(19):2608–2613 27. Grzegorzewski P (2000) Testing statistical hypotheses with vague data. Fuzzy Sets Syst 112:501–510 28. Hall JW, Lawry J (2004) Generation, combination and extension of random set approximations to coherent lower and upper probabilities. Reliab Eng Syst Saf 85(1–3):89–101 29. Helton JC, Johnson JD, Oberkampf WL (2004) An exploration of alternative approaches to the representation of uncertainty in model predictions. Reliab Eng Syst Saf 85(1–3):39–71 30. Helton JC, Oberkampf WL (eds) (2004) Special issue on alternative representations of epistemic uncertainty. Reliab Eng Syst Saf 85(1–3):1–369 31. Hung W-L (2001) Bootstrap method for some estimators based on fuzzy data. Fuzzy Sets Syst 119:337–341 32. Hwang C-M, Yao J-S (1996) Independent fuzzy random variables and their application. Fuzzy Sets Syst 82:335–350 33. Jang L-C, Kwon J-S (1998) A uniform strong law of large numbers for partial sum processes of fuzzy random variables indexed by sets. Fuzzy Sets Syst 99:97–103 34. Joo SY, Kim YK (2001) Kolmogorovs strong law of large numbers for fuzzy random variables. Fuzzy Sets Syst 120:499–503 35. Kim YK (2002) Measurability for fuzzy valued functions. Fuzzy Sets Syst 129:105–109 36. Klement EP, Puri ML, Ralescu DA (1986) Limit theorems for fuzzy random variables. 
Proc Royal Soc A Math Phys Eng Sci 407:171–182 37. Klement EP (1991) Fuzzy random variables. Ann Univ Sci Budapest Sect Comp 12:143–149 38. Klir GJ (2006) Uncertainty and information: foundations of generalized information theory. Wiley-Interscience, Hoboken 39. Klir GJ, Folger TA (1988) Fuzzy sets, uncertainty, and information. Prentice Hall, Englewood Cliffs 40. Körner R (1997) Linear models with random fuzzy variables. Phd thesis, Bergakademie Freiberg, Fakultät für Mathematik und Informatik 41. Körner R (1997) On the variance of fuzzy random variables. Fuzzy Sets Syst 92:83–93 42. Körner R, Näther W (1998) Linear regression with random fuzzy variables: extended classical estimates, best linear estimates, least squares estimates. Inf Sci 109:95–118 43. Krätschmer V (2001) A unified approach to fuzzy random variables. Fuzzy Sets Syst 123:1–9 44. Krätschmer V (2002) Limit theorems for fuzzy-random variables. Fuzzy Sets Syst 126:253–263 45. Krätschmer V (2004) Probability theory in fuzzy sample space. Metrika 60:167–189 46. Kruse R, Meyer KD (1987) Statistics with vague data. Reidel, Dordrecht

47. Kwakernaak H (1978) Fuzzy random variables I. definitions and theorems. Inf Sci 15:1–19 48. Kwakernaak H (1979) Fuzzy random variables II. algorithms and examples for the discrete case. Inf Sci 17:253–278 49. Li S, Ogura Y, Kreinovich V (2002) Limit theorems and applications of set valued and fuzzy valued random variables. Kluwer, Dordrecht 50. Lin TY, Yao YY, Zadeh LA (eds) (2002) Data mining, rough sets and granular computing. Physica, Germany 51. López-Díaz M, Gil MA (1998) Reversing the order of integration in iterated expectations of fuzzy random variables, and statistical applications. J Stat Plan Inference 74:11–29 52. Matheron G (1975) Random sets and integral geometry. Wiley, New York 53. Möller B, Graf W, Beer M (2000) Fuzzy structural analysis using alpha-level optimization. Comput Mech 26:547–565 54. Möller B, Beer M (2004) Fuzzy randomness – uncertainty in civil engineering and computational mechanics. Springer, Berlin 55. Möller B, Reuter U (2007) Uncertainty forecasting in engineering. Springer, Berlin 56. Muhanna RL, Mullen RL, Zhang H (2007) Interval finite element as a basis for generalized models of uncertainty in engineering mechanics. J Reliab Comput 13(2):173–194 57. Gil MA, López-Díaz M, Ralescu DA (2006) Overview on the development of fuzzy random variables. Fuzzy Sets Syst 157(19):2546–2557 58. Näther W, Körner R (2002) Statistical modelling, analysis and management of fuzzy data, chapter on the variance of random fuzzy variables. Physica, Heidelberg, pp 25–42 59. Näther W (2006) Regression with fuzzy random data. Comput Stat Data Analysis 51:235–252 60. Näther W, Wünsche A (2007) On the conditional variance of fuzzy random variables. Metrika 65:109–122 61. Oberkampf WL, Helton JC, Sentz K (2001) Mathematical representation of uncertainty. In: AIAA non-deterministic approaches forum, number AIAA 2001–1645. AIAA, Seattle 62. Pedrycz W, Skowron A, Kreinovich V (eds) (2008) Handbook of granular computing. Wiley, New York 63. 
Puri ML, Ralescu D (1986) Fuzzy random variables. J Math Anal Appl 114:409–422 64. Puri ML, Ralescu DA (1983) Differentials of fuzzy functions. J Math Anal Appl 91:552–558 65. Puri ML, Ralescu DA (1991) Convergence theorem for fuzzy martingales. J Math Anal Appl 160:107–122 66. Rodríguez-Muñiz L, López-Díaz M, Gil MA (2005) Solving influence diagrams with fuzzy chance and value nodes. Eur J Oper Res 167:444–460 67. Samarasooriya VNS, Varshney PK (2000) A fuzzy modeling approach to decision fusion under uncertainty. Fuzzy Sets Syst 114:59–69 68. Schenk CA, Schuëller GI (2005) Uncertainty assessment of large finite element systems. Springer, Berlin 69. Schuëller GI, Spanos PD (eds) (2001) Proc int conf on monte carlo simulation MCS 2000. Swets and Zeitlinger, Monaco 70. Shafer G (1976) A mathematical theory of evidence. Princeton University Press, Princeton 71. Song Q, Leland RP, Chissom BS (1997) Fuzzy stochastic fuzzy time series and its models. Fuzzy Sets Syst 88:333–341 72. Taheri SM, Behboodian J (2001) A bayesian approach to fuzzy hypotheses testing. Fuzzy Sets Syst 123:39–48




Fuzzy Sets Theory, Foundations of
JANUSZ KACPRZYK
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Article Outline

Glossary
Definition of the Subject
Introduction
Fuzzy Sets – Basic Definitions and Properties
Fuzzy Relations
Linguistic Variable, Fuzzy Conditional Statement, and Compositional Rule of Inference
The Extension Principle
Fuzzy Numbers
Fuzzy Events and Their Probabilities
Defuzzification of Fuzzy Sets
Fuzzy Logic – Basic Issues
Bellman and Zadeh's General Approach to Decision Making Under Fuzziness
Concluding Remarks
Bibliography

Glossary

Fuzzy set A mathematical tool that can formally characterize an imprecise concept. Whereas elements can either belong or not belong to a conventional set, elements of a fuzzy set can belong to it to some extent, from zero (which stands for full nonbelongingness) to one (which stands for full belongingness), through all intermediate values.

Fuzzy relation A mathematical tool that can formally characterize relations between variables which are imprecisely specified, notably by using natural language, for instance: similar, much greater than, almost equal, etc.

Extension principle Makes it possible to extend relations, algorithms, etc. defined for variables that take on nonfuzzy (e.g. real) values to those that take on fuzzy values.

Linguistic variable, fuzzy conditional statement, compositional rule of inference Make it possible to use variables which take on linguistic (instead of numeric) values, to represent relations between such variables by using fuzzy conditional statements, and to use them in inference by means of the compositional rule of inference.

Fuzzy event and its probability Make it possible to formally define events which are imprecisely specified, like "high temperature", and to calculate their probabilities, for instance the probability of "high temperature tomorrow".

Fuzzy logic Provides formal means for the representation of, and inference based on, imprecisely specified premises and rules of inference. It can be understood in different ways: basically as fuzzy logic in a narrow sense, being some type of multivalued logic, and as fuzzy logic in a broad sense, being a way to formalize inference based on imprecisely specified premises and rules of inference.

Definition of the Subject

We provide a brief exposition of basic elements of Zadeh's [95] fuzzy sets theory. We discuss basic properties, operations on fuzzy sets, fuzzy relations and their compositions, linguistic variables, the extension principle, fuzzy arithmetic, fuzzy events and their probabilities, fuzzy logic, fuzzy dynamic systems, etc. We also outline Bellman and Zadeh's [8] general approach to decision making in a fuzzy environment, which is a point of departure for virtually all fuzzy decision making, optimization, control, etc. models.

Introduction

This paper is meant to briefly expose a novice reader to the basic elements of the theory of fuzzy sets and fuzzy systems, viewed for our purposes as an effective and efficient means and calculus for dealing with imprecision in data, information and knowledge, and for providing tools and techniques to handle such imprecision. Our exposition will be only as formal as necessary, and of a more intuitive and constructive character, so that fuzzy tools and techniques can be useful for the multidisciplinary audience of this encyclopedia.

For the readers requiring or interested in a deeper exposition of fuzzy sets and related concepts, we will recommend many relevant references, mainly books. However, as the number of books and volumes on this topic and its applications in a variety of fields is huge, we will recommend only some of them, mostly the better-known ones. For the newest literature entries the readers should consult the most recent catalogs of major scientific publishers who offer books and edited volumes on fuzzy sets/logic and their applications.

Our discussion will, on the other hand, proceed in the pure fuzzy setting, and we will not discuss possibility theory (which is related to fuzzy sets theory). The reader interested in possibility theory is referred to, e.g., Dubois and Prade [29,30] or their article in this encyclopedia.


We will consecutively discuss the idea of a fuzzy set, basic properties of fuzzy sets, operations on fuzzy sets, some extensions of the basic concept of a fuzzy set, fuzzy relations and their compositions, linguistic variables, fuzzy conditional statements and the compositional rule of inference, the extension principle, fuzzy arithmetic, fuzzy events and their probabilities, fuzzy logic, fuzzy dynamic systems, etc. We also outline Bellman and Zadeh's [8] general approach to decision making in a fuzzy environment, which is a point of departure for virtually all fuzzy decision making, optimization, control, etc. models.

Fuzzy Sets – Basic Definitions and Properties

Fuzzy sets theory, introduced by Zadeh in 1965 [95], is a simple yet very powerful, effective and efficient means to represent and handle imprecise information (of vagueness type), exemplified by tall buildings, large numbers, etc. We will present fuzzy sets theory as a calculus of imprecision, not as a new set theory in the mathematical sense.

The Idea of a Fuzzy Set

From our point of view, the main purpose of a (conventional) set in mathematics is to formally characterize some concept (or property). For instance, the concept of "integer numbers which are greater than or equal to three and less than or equal to ten" may be uniquely represented just by showing all integer numbers that satisfy this condition, that is, given by the following set:

    {x ∈ I : 3 ≤ x ≤ 10} = {3, 4, 5, 6, 7, 8, 9, 10}

where I is the set of integers. Notice that we need to specify first a universe of discourse (universe, universal set, referential, reference set, etc.) that contains all those elements which are relevant for the particular concept, e.g., the set of integers I in our example.

A conventional set, say A, may be equated with its characteristic function, defined as

    φ_A : X → {0, 1}   (1)

which associates with each element x of a universe of discourse X = {x} a number φ_A(x) ∈ {0, 1} such that: φ_A(x) = 0 means that x ∈ X does not belong to the set A, and φ_A(x) = 1 means that x belongs to the set A. Therefore, for the set verbally defined as integer numbers which are greater than or equal to three and less than or equal to ten, its equivalent set A = {3, 4, 5, 6, 7, 8, 9, 10}, listing all the respective integer numbers, may be represented by its characteristic function

    φ_A(x) = 1 for x ∈ {3, 4, 5, 6, 7, 8, 9, 10}, and 0 otherwise.

Notice that in a conventional set there is a clear-cut differentiation between elements belonging to the set and not, i.e. the transition from belongingness to nonbelongingness is clear-cut and abrupt. However, a serious difficulty arises when we try to formalize by means of a set vague concepts which are commonly encountered in everyday discourse and widely used by humans, e.g., the statement "integer numbers which are more or less equal to six". Evidently, a (conventional) set cannot adequately characterize such an imprecise concept, because an abrupt and clear-cut differentiation between the elements belonging and not belonging to the set is artificial here.

This has led Zadeh [95] to the idea of a fuzzy set, which is a class of objects with unsharp boundaries, i.e. in which the transition from belongingness to nonbelongingness is not abrupt; thus, elements of a fuzzy set may belong to it to partial degrees, from full belongingness to full nonbelongingness, through all intermediate values. Notice that this is presumably the most natural and simple way to formally define the imprecision of meaning.

We should therefore start again with a universe of discourse (universe, universal set, referential, reference set, etc.) containing all elements relevant for the (imprecise) concept we wish to formally represent. Then, the characteristic function φ_A : X → {0, 1} is replaced by a membership function defined as

    μ_A : X → [0, 1]   (2)

such that μ_A(x) ∈ [0, 1] is the degree to which an element x ∈ X belongs to the fuzzy set A: from μ_A(x) = 0 for full nonbelongingness to μ_A(x) = 1 for full belongingness, through all intermediate (0 < μ_A(x) < 1) values.

Consider now, as an example, the concept of integer numbers which are more or less six. Then x = 6 certainly belongs to this set, so that μ_A(6) = 1; the numbers five and seven belong to this set almost surely, so that μ_A(5) and μ_A(7) are very close to one; and the more a number differs from six, the lower its μ_A(.). Finally, the numbers below one and above ten do not belong to this set, so that their μ_A(.) = 0. This may be sketched as in Fig. 1, though we should bear in mind that although in our example the membership function is evidently defined for the integer numbers (x's) only, it is depicted in a continuous form to be more illustrative. In practice the membership function is usually assumed to be piecewise linear, as shown in Fig. 2 (for the same fuzzy set as in Fig. 1, i.e. the fuzzy set of integer numbers which are more or less six). To specify the membership function we then need four numbers only: a, b, c, and d, e.g., a = 2, b = 5, c = 7, and d = 10 in Fig. 2.
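The piecewise-linear membership function of Fig. 2 can be sketched in code; the helper below is a hypothetical illustration (not from the original text), with a = 2, b = 5, c = 7, d = 10 as in the example:

```python
def trapezoid(a, b, c, d):
    """Return a piecewise-linear membership function: rising on [a, b],
    equal to 1 on [b, c], falling on [c, d], and 0 outside (cf. Fig. 2)."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:                    # rising edge
            return (x - a) / (b - a)
        return (d - x) / (d - c)     # falling edge
    return mu

# "integer numbers which are more or less six": a=2, b=5, c=7, d=10
mu_six = trapezoid(2, 5, 7, 10)
print(mu_six(6))    # full belongingness
print(mu_six(3.5))  # partial belongingness
print(mu_six(11))   # full nonbelongingness
```

Only the four numbers a, b, c, d need to be stored, which is exactly the practical appeal of the piecewise-linear form.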


Fuzzy Sets Theory, Foundations of, Figure 1 Membership function of a fuzzy set, integer numbers which are more or less six

Fuzzy Sets Theory, Foundations of, Figure 2 Membership function of a fuzzy set, integer numbers which are more or less six

Notice that the particular form of a membership function is subjective, as opposed to the objective form of a characteristic function. However, this may be viewed as quite natural, since the underlying concepts are subjective indeed; e.g., the set of integer numbers which are more or less six depends on an individual opinion. Unfortunately, this inherent subjectivity of the membership function may lead to some problems in many formal models whose users would rather have a limit to the scope of subjectivity. We will comment on this issue later on.

We will now define formally a fuzzy set in a form that is very often used. A fuzzy set A in a universe of discourse X = {x}, written A in X, is defined as a set of pairs

    A = {(μ_A(x), x)}   (3)

where μ_A : X → [0, 1] is the membership function of A and μ_A(x) ∈ [0, 1] is the grade of membership (or a membership grade) of an element x ∈ X in the fuzzy set A. Needless to say, our definition of a fuzzy set (3) is clearly equivalent to the definition via the membership function (2), because a function may be represented by the set of pairs (argument, value of the function for this argument).

For our purposes, however, the definition (3) is more set-theoretic-like, which will often be more convenient. So, in this paper we will practically equate fuzzy sets with their membership functions, saying, e.g., "a fuzzy set μ_A(x)", and also very often we will equate fuzzy sets with their labels, saying, e.g., "a fuzzy set large numbers", with the understanding that the label large numbers is equivalent to the fuzzy set mentioned, written A = large numbers. However, we will use the notation μ_A(x) for the membership function of a fuzzy set A in X, and not the abbreviated notation A(x) used in some more technical papers, to be consistent with our more set-theoretic-like convention.

For practical reasons, it is very often assumed (also in this paper) that all the universes of discourse are finite, e.g., X = {x₁, …, xₙ}. In such a case the pair (μ_A(x), x) is denoted by μ_A(x)/x, which is called a fuzzy singleton. Then, the fuzzy set A in X is written as

    A = {(μ_A(x), x)} = {μ_A(x)/x}
      = μ_A(x₁)/x₁ + … + μ_A(xₙ)/xₙ = ∑_{i=1}^{n} μ_A(xᵢ)/xᵢ   (4)

where "+" and "∑" are meant in the set-theoretic sense. By convention, the pairs μ_A(x)/x with μ_A(x) = 0 are omitted here. A conventional (nonfuzzy) set may obviously be written in the fuzzy-sets notation introduced above; for instance, the (nonfuzzy) set of integer numbers greater than or equal to three and less than or equal to ten may be written as

    A = 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + 1/8 + 1/9 + 1/10.
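For a finite universe, the singleton notation maps naturally onto a dictionary from elements to membership grades, with zero-grade pairs omitted by convention. The grades for "more or less six" below are illustrative, not taken from the figures:

```python
# "integer numbers which are more or less six", with illustrative grades
A = {2: 0.1, 3: 0.3, 4: 0.7, 5: 0.9, 6: 1.0, 7: 0.9, 8: 0.7, 9: 0.3, 10: 0.1}

# a conventional (nonfuzzy) set written in the same notation:
# all grades equal one, e.g. the integers from three to ten
B = {x: 1.0 for x in range(3, 11)}

def grade(fs, x):
    """Membership grade of x; pairs with grade 0 are simply absent."""
    return fs.get(x, 0.0)

print(grade(A, 6))
print(grade(A, 12))   # omitted pair, i.e. grade zero
print(sorted(B))
```

The crisp set B is just the special case where every stored grade equals one.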

The family of all fuzzy sets defined in X is denoted by A; evidently, it also includes the empty fuzzy set to be defined by (9), i.e. A = ∅ such that μ_A(x) = 0 for each x ∈ X, and the whole universe of discourse X, written as X = 1/x₁ + … + 1/xₙ.

The concept of a fuzzy set as defined above has been the point of departure for the theory of fuzzy sets (or fuzzy sets theory), which will be briefly sketched below. We will again follow a more intuitive and less formal presentation, which is better suited for this encyclopedia.

Some Extensions of the Concept of Zadeh's Fuzzy Set

The concept of Zadeh's [95] fuzzy set introduced in the previous section is by far the simplest and most natural way to fuzzify the concept of a (conventional) set, and clearly provides what we need to represent and handle imprecision. However, its underlying elements are the most straightforward possible. This concerns above all the membership function. Therefore, it is quite natural that some extensions of this basic concept have been presented. We will just briefly mention some of them.

First, it is quite easy to notice that though the definition of a fuzzy set by a membership function of the type μ_A : X → [0, 1] is the simplest and most straightforward one, allowing for a gradual transition from belongingness to nonbelongingness, it can readily be extended. The same role is namely played by a generalized membership function of the type

    μ_A : X → L   (5)

where L is some (partially) ordered set, e.g., a lattice. This obvious but powerful extension was introduced by Goguen [37] as an L-fuzzy set, where L stands for a lattice. Notice that by using a lattice as the set of values of the membership function we can accommodate situations in which we encounter elements of the universe of discourse which are not comparable.

Another quite obvious extension, already mentioned but not developed by Zadeh [95,101], is the concept of a type 2 fuzzy set. The rationale behind this concept is obvious: One can easily imagine that the values of the grades of membership of the particular elements of a universe of discourse are fuzzy sets themselves. And further, these fuzzy sets may have grades of membership which are type 2 fuzzy sets, which leads to type 3 fuzzy sets, and one can continue, arriving at type n fuzzy sets.

The next natural extension is that, instead of assuming that the degrees of membership are real numbers from the unit interval, one can go a step further and replace these real numbers from [0, 1] by intervals with endpoints belonging to the unit interval. This leads to interval-valued fuzzy sets, which are attributed to Dubois and Gorzałczany (cf. Klir and Yuan [53]). Notice that by using intervals as values of degrees of membership we significantly increase our ability to represent imprecision.

A more radical extension of the concept of Zadeh's fuzzy set is the so-called intuitionistic fuzzy set introduced by Atanassov [1,2]. An intuitionistic fuzzy set A′ in a universe of discourse X is defined as

    A′ = {⟨x, μ_{A′}(x), ν_{A′}(x)⟩ | x ∈ X}   (6)

where:
- the degree of membership is μ_{A′} : X → [0, 1],
- the degree of non-membership is ν_{A′} : X → [0, 1],
- and the condition 0 ≤ μ_{A′}(x) + ν_{A′}(x) ≤ 1 holds for each x ∈ X.

Obviously, each (conventional) fuzzy set A in X corresponds to the following intuitionistic fuzzy set A′ in X:

    A′ = {⟨x, μ_A(x), 1 − μ_A(x)⟩ | x ∈ X}.   (7)

For each intuitionistic fuzzy set A′ in X, we call

    π_{A′}(x) = 1 − μ_{A′}(x) − ν_{A′}(x), for each x ∈ X   (8)

the intuitionistic fuzzy index (or hesitation margin) of x in A′. The intuitionistic fuzzy index expresses a lack of knowledge of whether an element x ∈ X belongs to an intuitionistic fuzzy set A′ or not. Notice that the concept of an intuitionistic fuzzy set is a substantial departure from the concept of a (conventional) fuzzy set, as it assumes that the degrees of membership and non-membership need not sum up to one, as is the case in virtually all traditional set theories and their extensions. For more information, we refer the reader to Atanassov's [3] book.

We will not use these extensions in this short introductory article, and the interested readers are referred to the source literature cited.

Basic Definitions and Properties Related to Fuzzy Sets

We will now provide a brief account of basic definitions and properties related to fuzzy sets, and illustrate them with simple examples.

A fuzzy set A is said to be empty, written A = ∅, if and only if

    μ_A(x) = 0, for each x ∈ X   (9)

and since we omit the pairs 0/x, an empty fuzzy set is really void in the notation (4), as there are no singletons on the right-hand side.

Two fuzzy sets A and B defined in the same universe of discourse X are said to be equal, written A = B, if and only if

    μ_A(x) = μ_B(x), for each x ∈ X.   (10)

Example 1 Suppose that X = {1, 2, 3} and

    A = 0.1/1 + 0.5/2 + 1/3
    B = 0.2/1 + 0.5/2 + 1/3
    C = 0.1/1 + 0.5/2 + 1/3

Then A = C, but A ≠ B and B ≠ C.
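Example 1 can be checked mechanically with the dictionary representation of finite fuzzy sets (a sketch, with the grades taken from the example):

```python
def equal_crisp(A, B, universe):
    """Classic equality (10): identical membership grades everywhere."""
    return all(A.get(x, 0.0) == B.get(x, 0.0) for x in universe)

X = {1, 2, 3}
A = {1: 0.1, 2: 0.5, 3: 1.0}
B = {1: 0.2, 2: 0.5, 3: 1.0}
C = {1: 0.1, 2: 0.5, 3: 1.0}

print(equal_crisp(A, C, X))  # A = C
print(equal_crisp(A, B, X))  # A differs from B at x = 1
print(equal_crisp(B, C, X))  # B differs from C at x = 1
```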


It is easy to see that this classic definition of the equality of two fuzzy sets by (10) is rigid and clear-cut, contradicting in a sense our intuitive feeling that the equality of fuzzy sets should be softer, not abrupt, i.e. should rather hold to some degree, from zero to one. We will show below one of the possible definitions of such an equality to a degree.

Two fuzzy sets A and B defined in X are said to be equal to a degree e(A, B) ∈ [0, 1], written A =_e B, and the degree of equality e(A, B) may be defined in many ways, exemplified by those given below (cf. Bandler and Kohout [6]). First, to simplify, we denote:

Case 1: A = B in the sense of (10);

Case 2: A ≠ B in the sense of (10), and T = {x ∈ X : μ_A(x) ≠ μ_B(x)};

Case 3: A ≠ B in the sense of (10), and there exists an x ∈ X such that μ_A(x) = 0 and μ_B(x) ≠ 0, or μ_A(x) ≠ 0 and μ_B(x) = 0;

Case 4: A ≠ B in the sense of (10), and there exists an x ∈ X such that μ_A(x) = 0 and μ_B(x) = 1, or μ_A(x) = 1 and μ_B(x) = 0.

Now, the following degrees of equality of two fuzzy sets A and B may be defined:

    e₁(A, B) = …
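As an illustration of an equality to a degree, one simple choice is 1 − max_x |μ_A(x) − μ_B(x)|; this particular formula is an assumption for illustration, not necessarily Bandler and Kohout's e₁:

```python
def equal_degree(A, B, universe):
    """A degree of equality in [0, 1]: one minus the largest pointwise
    difference of membership grades (an illustrative choice only)."""
    return 1.0 - max(abs(A.get(x, 0.0) - B.get(x, 0.0)) for x in universe)

X = {1, 2, 3}
A = {1: 0.1, 2: 0.5, 3: 1.0}
B = {1: 0.2, 2: 0.5, 3: 1.0}

print(equal_degree(A, A, X))  # identical sets: degree one
print(equal_degree(A, B, X))  # nearly equal sets: degree close to one
```

Unlike (10), this comparison degrades gracefully as the two sets drift apart.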

In the case of n > 1 fuzzy goals G₁, …, Gₙ defined in Y, m > 1 fuzzy constraints C₁, …, C_m defined in X, and a function f : X → Y, y = f(x), we analogously have

    μ_D(x) = μ_{G′₁}(x) ∧ … ∧ μ_{G′ₙ}(x) ∧ μ_{C₁}(x) ∧ … ∧ μ_{C_m}(x), for each x ∈ X.   (93)

The maximizing decision is defined as in (92), i.e.

    μ_D(x*) = max_{x ∈ X} μ_D(x).   (95)

Example 19 Let X = {1, 2, 3, 4}, Y = {2, 3, …, 10}, and y = 2x + 1. If now μ_D(x) = μ_{G′}(x) ∧ μ_C(x) …

The basic conceptual fuzzy decision-making model can be used in many specific areas, notably in fuzzy optimization, which will be covered elsewhere in this volume. The models of decision making under fuzziness developed above can also be extended to the case of multiple criteria, multiple decision makers, and multiple stages. We will present the last extension, the multistage decision-making (control) case, which makes it possible to account for dynamics.
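A minimal sketch of the scheme above: the fuzzy decision is the min (∧) of constraint and goal membership grades, and the maximizing decision is its argmax over X. The membership values below are illustrative, not taken from the text:

```python
# fuzzy constraint C on X and fuzzy goal G' (induced on X through y = f(x))
X = [1, 2, 3, 4]
mu_C  = {1: 1.0, 2: 0.8, 3: 0.5, 4: 0.2}   # illustrative grades
mu_Gp = {1: 0.2, 2: 0.6, 3: 1.0, 4: 0.9}

# fuzzy decision: mu_D(x) = mu_G'(x) ∧ mu_C(x)
mu_D = {x: min(mu_Gp[x], mu_C[x]) for x in X}

# maximizing decision: the x* attaining max over X of mu_D(x)
x_star = max(X, key=lambda x: mu_D[x])
print(mu_D)
print(x_star)
```

Here x* = 2 balances a fairly good goal grade against a still-loose constraint, which is exactly the compromise the min aggregation encodes.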

Multistage Decision Making (Control) Under Fuzziness

In this case it is convenient to use control-related notation and terminology. In particular, decisions will be referred to as controls; the discrete time moments at which decisions are to be made, as control stages; and the input-output (or cause-effect) relationship, as a system under control. The essence of multistage control in a fuzzy environment may be portrayed as in Fig. 9.

Fuzzy Sets Theory, Foundations of, Figure 9 Essence of the multistage control in a fuzzy environment (under fuzziness)

First, suppose that the control space is U = {u} = {c₁, …, c_m} and the state space is X = {x} = {s₁, …, sₙ}. Initially we are in some initial state x₀ ∈ X. We apply a control u₀ ∈ U subjected to a fuzzy constraint μ_{C^0}(u₀). We attain a state x₁ ∈ X via a known cause-effect relationship (i.e. S); a fuzzy goal μ_{G^1}(x₁) is imposed on x₁. Next, we apply a control u₁ subjected to a fuzzy constraint μ_{C^1}(u₁), and attain a state x₂ on which a fuzzy goal μ_{G^2}(x₂) is imposed, etc.

Suppose for simplicity that the system under control is deterministic and that its temporal evolution is governed by a state transition equation

    f : X × U → X   (96)

such that

    x_{t+1} = f(x_t, u_t),  t = 0, 1, …   (97)

where x_t, x_{t+1} ∈ X = {s₁, …, sₙ} are the states at control stages t and t + 1, respectively, and u_t ∈ U = {c₁, …, c_m} is the control at t. At each t, u_t ∈ U is subjected to a fuzzy constraint μ_{C^t}(u_t), and on the state attained, x_{t+1} ∈ X, a fuzzy goal is imposed; t = 0, 1, …. The initial state x₀ ∈ X is assumed to be known and given in advance. The termination time (planning, or control, horizon), i.e. the maximum number of control stages, is denoted by N ∈ {1, 2, …}, and may be finite or infinite. The performance (goodness) of the multistage control process under fuzziness is evaluated by the fuzzy decision

    μ_D(u₀, …, u_{N−1} | x₀) = μ_{C^0}(u₀) ∧ μ_{G^1}(x₁) ∧ … ∧ μ_{C^{N−1}}(u_{N−1}) ∧ μ_{G^N}(x_N).   (98)

In most cases, however, a slightly simplified form of the fuzzy decision (98) is used: It is assumed that all the subsequent controls u₀, u₁, …, u_{N−1} are subjected to the fuzzy constraints μ_{C^0}(u₀), μ_{C^1}(u₁), …, μ_{C^{N−1}}(u_{N−1}), while the fuzzy goal is imposed on the final state x_N only, via μ_{G^N}(x_N). In such a case the fuzzy decision becomes

    μ_D(u₀, …, u_{N−1} | x₀) = μ_{C^0}(u₀) ∧ … ∧ μ_{C^{N−1}}(u_{N−1}) ∧ μ_{G^N}(x_N).   (99)

The multistage control problem in a fuzzy environment is now formulated as: find an optimal sequence of controls u₀*, …, u*_{N−1}, u_t* ∈ U, t = 0, 1, …, N − 1, such that

    μ_D(u₀*, …, u*_{N−1} | x₀) = max_{u₀, …, u_{N−1} ∈ U} μ_D(u₀, …, u_{N−1} | x₀).   (100)

Usually it is more convenient to express the solution, i.e. the controls to be applied, as a control policy a_t : X → U such that u_t = a_t(x_t), t = 0, 1, …, i.e. the control to be applied at t is expressed as a function of the state at t.

The above basic formulation of multistage control in a fuzzy environment may readily be extended with respect to:
- the type of the termination time (fixed and specified, implicitly specified, fuzzy, and infinite), and
- the type of the system under control (deterministic, stochastic, and fuzzy).

For a detailed analysis of the resulting problems and their solutions (by employing dynamic programming, branch-and-bound, genetic algorithms, etc.) we refer the reader to Kacprzyk's [42,44] books.

Concluding Remarks

We provided a brief survey of basic elements of Zadeh's [95] fuzzy sets theory, mainly of basic properties of fuzzy sets, operations on fuzzy sets, fuzzy relations and their compositions, linguistic variables, the extension principle, fuzzy arithmetic, fuzzy events and their probabilities, fuzzy logic, and Bellman and Zadeh's [8] general approach to decision making in a fuzzy environment. Various aspects of fuzzy sets theory will be expanded in other papers in this part of the volume.

Bibliography
1. Atanassov KT (1983) Intuitionistic fuzzy sets. VII ITKR Session. Central Sci.-Techn. Library of Bulg. Acad. of Sci., Sofia, pp 1697/84 (in Bulgarian)
2. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20:87–96
3. Atanassov KT (1999) Intuitionistic fuzzy sets: Theory and applications. Springer, Heidelberg
4. Bandemer H, Gottwald S (1995) Fuzzy sets, fuzzy logic, fuzzy methods, with applications. Wiley, Chichester
5. Bandemer H, Näther W (1992) Fuzzy data analysis. Kluwer, Dordrecht
6. Bandler W, Kohout LJ (1980) Fuzzy power sets and fuzzy implication operators. Fuzzy Sets Syst 4:13–30
7. Bellman RE, Giertz M (1973) On the analytic formalism of the theory of fuzzy sets. Inform Sci 5:149–157
8. Bellman RE, Zadeh LA (1970) Decision making in a fuzzy environment. Manag Sci 17:141–164
9. Belohlávek R, Vychodil V (2005) Fuzzy equational logic. Springer, Heidelberg
10. Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Plenum, New York
11. Black M (1937) Vagueness: An exercise in logical analysis. Philos Sci 4:427–455
12. Black M (1963) Reasoning with loose concepts. Dialogue 2:1–12
13. Black M (1970) Margins of precision. Cornell University Press, Ithaca
14. Buckley JJ (2004) Fuzzy statistics. Springer, Heidelberg
15. Buckley JJ (2005) Fuzzy probabilities. New approach and applications, 2nd edn. Springer, Heidelberg
16. Buckley JJ (2005) Simulating fuzzy systems. Springer, Heidelberg
17. Buckley JJ (2006) Fuzzy probability and statistics. Springer, Heidelberg
18. Buckley JJ, Eslami E (2002) An introduction to fuzzy logic and fuzzy sets. Springer, Heidelberg
19. Calvo T, Mayor G, Mesiar R (2002) Aggregation operators. New trends and applications. Springer, Heidelberg


20. Carlsson C, Fullér R (2002) Fuzzy reasoning in decision making and optimization. Springer, Heidelberg
21. Castillo O, Melin P (2008) Type-2 fuzzy logic: Theory and applications. Springer, Heidelberg
22. Cox E (1994) The fuzzy system handbook. A practitioner's guide to building, using, and maintaining fuzzy systems. Academic, New York
23. Cross V, Sudkamp T (2002) Similarity and compatibility in fuzzy set theory. Assessment and applications. Springer, Heidelberg
24. Delgado M, Kacprzyk J, Verdegay JL, Vila MA (eds) (1994) Fuzzy optimization: Recent advances. Physica, Heidelberg
25. Dompere KK (2004) Cost-benefit analysis and the theory of fuzzy decisions. Fuzzy value theory. Springer, Heidelberg
26. Dompere KK (2004) Cost-benefit analysis and the theory of fuzzy decisions. Identification and measurement theory. Springer, Heidelberg
27. Driankov D, Hellendoorn H, Reinfrank M (1993) An introduction to fuzzy control. Springer, Berlin
28. Dubois D, Prade H (1980) Fuzzy sets and systems: Theory and applications. Academic, New York
29. Dubois D, Prade H (1985) Théorie des possibilités. Applications à la représentation des connaissances en informatique. Masson, Paris
30. Dubois D, Prade H (1988) Possibility theory: An approach to computerized processing of uncertainty. Plenum, New York
31. Dubois D, Prade H (1996) Fuzzy sets and systems (re-edition on CD-ROM of [28]). Academic, New York
32. Fullér R (2000) Introduction to neuro-fuzzy systems. Springer, Heidelberg
33. Gaines BR (1977) Foundations of fuzzy reasoning. Int J Man-Mach Stud 8:623–668
34. Gil Aluja J (2004) Fuzzy sets in the management of uncertainty. Springer, Heidelberg
35. Gil-Lafuente AM (2005) Fuzzy logic in financial analysis. Springer, Heidelberg
36. Glöckner I (2006) Fuzzy quantifiers. A computational theory. Springer, Heidelberg
37. Goguen JA (1967) L-fuzzy sets. J Math Anal Appl 18:145–174
38. Goguen JA (1969) The logic of inexact concepts. Synthese 19:325–373
39. Goodman IR, Nguyen HT (1985) Uncertainty models for knowledge-based systems. North-Holland, Amsterdam
40. Hájek P (1998) Metamathematics of fuzzy logic. Kluwer, Dordrecht
41. Hanss M (2005) Applied fuzzy arithmetic. An introduction with engineering applications. Springer, Heidelberg
42. Kacprzyk J (1983) Multistage decision making under fuzziness. Verlag TÜV Rheinland, Cologne
43. Kacprzyk J (1992) Fuzzy sets and fuzzy logic. In: Shapiro SC (ed) Encyclopedia of artificial intelligence, vol 1. Wiley, New York, pp 537–542
44. Kacprzyk J (1996) Multistage fuzzy control. Wiley, Chichester
45. Kacprzyk J, Fedrizzi M (eds) (1988) Combining fuzzy imprecision with probabilistic uncertainty in decision making. Springer, Berlin
46. Kacprzyk J, Orlovski SA (eds) (1987) Optimization models using fuzzy sets and possibility theory. Reidel, Dordrecht
47. Kandel A (1986) Fuzzy mathematical techniques with applications. Addison-Wesley, Reading

48. Kaufmann A, Gupta MM (1985) Introduction to fuzzy mathematics – theory and applications. Van Nostrand Reinhold, New York 49. Klement EP, Mesiar R, Pap E (2000) Triangular norms. Springer, Heidelberg 50. Klir GJ (1987) Where do we stand on measures of uncertainty, ambiguity, fuzziness, and the like? Fuzzy Sets Syst 24:141–160 51. Klir GJ, Folger TA (1988) Fuzzy sets, uncertainty and information. Prentice-Hall, Englewood Cliffs 52. Klir GJ, Wierman M (1999) Uncertainty-based information. Elements of generalized information theory, 2nd end. Springer, Heidelberg 53. Klir GJ, Yuan B (1995) Fuzzy sets and fuzzy logic: Theory and application. Prentice-Hall, Englewood Cliffs 54. Kosko B (1992) Neural networks and fuzzy systems. PrenticeHall, Englewood Cliffs 55. Kruse R, Meyer KD (1987) Statistics with vague data. Reidel, Dordrecht 56. Kruse R, Gebhard J, Klawonn F (1994) Foundations of fuzzy systems. Wiley, Chichester 57. Kuncheva LI (2000) Fuzzy classifier design. Springer, Heidelberg 58. Li Z (2006) Fuzzy chaotic systems modeling, control, and applications. Springer, Heidelberg 59. Liu B (2007) Uncertainty theory. Springer, Heidelberg 60. Ma Z (2006) Fuzzy database modeling of imprecise and uncertain engineering information. Springer, Heidelberg 61. Mamdani EH (1974) Application of fuzzy algorithms for the control of a simple dynamic plant. Proc IEE 121:1585–1588 62. Mareš M (1994) Computation over fuzzy quantities. CRC, Boca Raton 63. Mendel J (2000) Uncertain rule-based fuzzy logic systems: Introduction and new directions. Prentice Hall, New York 64. Mendel JM, John RIB (2002) Type-2 fuzzy sets made simple. IEEE Trans Fuzzy Syst 10(2):117–127 65. Mordeson JN, Nair PS (2001) Fuzzy mathematics. An introduction for engineers and scientists, 2nd edn. Springer, Heidelberg 66. Mukaidono M (2001) Fuzzy logic for beginners. World Scientific, Singapore 67. Negoi¸ta CV, Ralescu DA (1975) Application of fuzzy sets to system analysis. Birkhäuser/Halstead, Basel/New York 68. 
Nguyen HT, Waler EA (205) A first course in fuzzy logic, 3rd end. CRC, Boca Raton 69. Nguyen HT, Wu B (2006) Fundamentals of statistics with fuzzy data. Springer, Heidelberg 70. Novák V (1989) Fuzzy sets and their applications. Hilger, Bristol, Boston 71. Novák V, Perfilieva I, Moˇckoˇr J (1999) Mathematical principles of fuzzy logic. Kluwer, Boston 72. Peeva K, Kyosev Y (2005) Fuzzy relational calculus. World Scientific, Singapore 73. Pedrycz W (1993) Fuzzy control and fuzzy systems, 2nd edn. Research Studies/Wiley, Taunton/New York 74. Pedrycz W (1995) Fuzzy sets engineering. CRC, Boca Raton 75. Pedrycz W (ed) (1996) Fuzzy modelling: Paradigms and practice. Kluwer, Boston 76. Pedrycz W, Gomide F (1998) An introduction to fuzzy sets: Analysis and design. MIT Press, Cambridge 77. Petry FE (1996) Fuzzy databases. Principles and applications. Kluwer, Boston

Fuzzy Sets Theory, Foundations of

78. Piegat A (2001) Fuzzy modeling and control. Springer, Heidelberg 79. Ruspini EH (1991) On the semantics of fuzzy logic. Int J Approx Reasining 5:45–88 80. Rutkowska D (2002) Neuro-fuzzy architectures and hybrid learning. Springer, Heidelberg 81. Rutkowski L (2004) Flexible neuro-fuzzy systems. Structures, learning and performance evaluation. Kluwer, Dordrecht 82. Seising R (2007) The fuzzification of systems. The genesis of fuzzy set theory and its initial applications – developments up to the 1970s. Springer, Heidelberg 83. Smithson M (1989) Ignorance and uncertainty. Springer, Berlin 84. Sousa JMC, Kaymak U (2002) Fuzzy decision making in modelling and control. World Scientific, Singapore 85. Thole U, Zimmermann H-J, Zysno P (1979) On the suitability of minimum and product operator for the intersection of fuzzy sets. Fuzzy Sets Syst 2:167–180 86. Turk¸sen IB (1991) Measurement of membership functions and their acquisition. Fuzzy Sets Syst 40:5–38 87. Türksen IB (2006) An ontlogical and epistemological perspective of fuzzy set theory. Elsevier, New York 88. Wang Z, Klir GJ (1992) Fuzzy measure theory. Kluwer, Boston 89. Wygralak M (1996) Vaguely defined objects. Representations, fuzzy sets and nonclassical cardinality theory. Kluwer, Dordrecht 90. Wygralak M (2003) Cardinalities of fuzzy sets. Springer, Heidelberg 91. Yager RR (1983) Quantifiers in the formulation of multiple objective decision functions. Inf Sci 31:107–139 92. Yager RR, Filev DP (1994) Essentials of fuzzy modeling and control. Wiley, New York 93. Yager RR, Kacprzyk J (eds) (1996) The ordered weighted averaging operators: Theory, methodology and applications. Kluwer, Boston 94. Yazici A, George R (1999) Fuzzy database modeling. Springer, Heidelberg 95. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353

96. Zadeh LA (1968) Probability measures of fuzzy events. J Math Anal Appl 23:421–427 97. Zadeh LA (1973) Outline of a new approach to the analysis of complex systems and decision processes. IEEE Trans Syst, Man Cybern SMC-2:28–44 98. Zadeh LA (1975) Fuzzy logic and approximate reasoning. Synthese 30:407–428 99. Zadeh LA (1975) The concept of a linguistic variable and its application to approximate reasoning. Inf Sci (Part I) 8:199– 249, (Part II) 8:301–357, (Part III) 9:43–80 100. Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1:3–28 101. Zadeh LA (1983) A computational approach to fuzzy quantifiers in natural languages. Comput Math Appl 9:149– 184 102. Zadeh LA (1985) Syllogistic reasoning in fuzzy logic and its application to usuality and reasoning with dispositions. IEEE Trans Syst Man Cybern SMC-15:754–763 103. Zadeh LA (1986) Fuzzy probabilities. Inf Process Manag 20:363–372 104. Zadeh LA, Kacprzyk J (eds) (1992) Fuzzy logic for the management of uncertainty. Wiley, New York 105. Zadeh LA, Kacprzyk J (eds) (1999) Computing with words in information/intelligent systems. 1 Foundations. Springer, Heidelberg 106. Zadeh LA, Kacprzyk J (eds) (1999) Computing with words in information/intelligent systems. 2 Applications. Springer, Heidelberg 107. Zang H, Liu D (2006) Fuzzy modeling and fuzzy control. Birkhäuser, New York 108. Zimmermann H-J (1976) Description and optimization of fuzzy systems. Int J Gen Syst 2:209–215 109. Zimmermann H-J (1987) Fuzzy sets, decision making, and expert systems. Kluwer, Dordrecht 110. Zimmermann H-J (1996) Fuzzy set theory and its applications, 3rd edn. Kluwer, Boston 111. Zimmermann H-J, Zysno P (1980) Latent connectives in human decision making. Fuzzy Sets Syst 4:37–51

1273

1274

Fuzzy System Models Evolution from Fuzzy Rulebases to Fuzzy Functions

Fuzzy System Models Evolution from Fuzzy Rulebases to Fuzzy Functions

I. Burhan Türkşen
Head, Department of Industrial Engineering, TOBB-ETÜ (Economics and Technology University of the Union of Turkish Chambers and Commodity Exchanges), Ankara, Republic of Turkey

Article Outline

Glossary
Definition of the Subject
Introduction
Type 1 Fuzzy System Models of the Past
Future of Fuzzy System Models
Case Study Applications
Experimental Design
Conclusions and Future Directions
Bibliography

Glossary

Z-FRB  Zadeh's linguistic fuzzy rule base.
TS-FR  Takagi–Sugeno fuzzy rule base.
c  the number of rules in the rule base.
nv  the number of input variables in the system.
X = (x₁, x₂, …, x_nv)  the input vector.
x_j  the jth input (explanatory) variable, for j = 1, …, nv.
A_ji  the linguistic label associated with the jth input variable of the antecedent in the ith rule.
B_i  the consequent linguistic label of the ith rule.
R_i  the ith rule, with membership function μ_i(x_j) : x_j → [0,1].
A_i  the multidimensional type 1 fuzzy set representing the ith antecedent part of the rules, defined by the membership function μ_i(x) : x → [0,1].
a_i = (a_{i,1}, …, a_{i,nv})  the regression coefficient vector associated with the ith rule.
b_i  the scalar associated with the ith rule in the regression equation.
SFF-LSE  "Special Fuzzy Functions" generated by Least Squares Estimation; the estimate of y_i is obtained as Y_i = β_{i0} + β_{i1} μ_i + β_{i2} X.
SFF-SVM  "Special Fuzzy Functions" estimated by Support Vector Machines.
y  the dependent variable, assumed to be a linear function.
β_j, j = 0, 1, …, nv  the coefficients indicating how a change in one of the independent variables affects the dependent variable.
X = (x_{j,k} | j = 1, …, nv; k = 1, …, nd)  the set of observations in a training data set.
m  the level of fuzziness, m = 1.1, …, 2.5.
c  the number of clusters, c = 1, …, 10.
J  the objective function to be minimized.
‖·‖_A  a norm that specifies a distance-based similarity between the data vector x_k and a fuzzy cluster center.
A = I  the Euclidean norm.
A = COV⁻¹  the Mahalanobis norm.
COV  the covariance matrix.
(m*, c*)  the optimal pair of fuzziness level and number of clusters.
v_{X|Y,i} = (x_{1,i}, x_{2,i}, …, x_{nv,i}, y_i)  the cluster centers for m = m* and each cluster i = 1, …, c*.
v_{X,i} = (x_{1,i}, x_{2,i}, …, x_{nv,i})  the cluster centers of the "input space" for m = m* and c = 1, …, c*.
μ_{ik}(x_k)  the normalized membership value of the kth data sample in the ith cluster, i = 1, …, c*.
μ_i = (μ_{ik} | i = 1, …, c*; k = 1, …, nd)  the membership values of x in the ith cluster.
X′_i, X″_i, X‴_i  potential augmented input matrices in SFF-LSE.
f(x⃗_k) = ŷ_k = ⟨w⃗, x⃗_k⟩ + b  the linear Support Vector Regression (SVR) equation.
l_ε = |y_k − f(x⃗_k)|_ε = max{0, |y − f(x)| − ε}  the ε-insensitive loss function.
w⃗, b  the weight vector and the bias term.
C > 0  the trade-off between the empirical error and the complexity term.
ξ_k ≥ 0 and ξ*_k ≥ 0  the slack variables.
α_k and α*_k  the Lagrange multipliers.
K⟨x⃗_{k′}, x⃗_k⟩  the kernel mapping of the input vectors.
ŷ_{ik′} = f̂(x⃗_{ik′}; α_i, α*_i) = Σ_{k=1}^{nd} (α*_{ik} − α_{ik}) K⟨x⃗_{ik′}, x⃗_{ik}⟩ + b_i  the output value of the kth data sample in the ith cluster with SFF-SVM.
Ã = {(x, (u, f_x(u))) | x ∈ X, u ∈ J_x ⊆ [0,1]}  type 2 fuzzy set, Ã.
f_x(u) : J_x → [0,1], ∀u ∈ J_x ⊆ [0,1], ∀x ∈ X  the secondary membership function.
f_x(u) = 1, ∀x ∈ X, ∀u ∈ J_x, J_x ⊆ [0,1]  interval valued type 2 membership function.

In [27], the domain of the primary membership is discrete and the secondary membership values are fixed to 1. Thus, the proposed method utilizes discrete interval valued type 2 fuzzy sets in order to represent the linguistic values assigned to each fuzzy variable in each rule. These fuzzy sets can be mathematically defined as follows:


Ã = ∫_{x∈X} [ Σ_{u∈J_x} 1/u ] / x , x ∈ X : Discrete Interval Valued Type 2 Fuzzy Sets (DIVT2FS).

…if ξ_k > 0, then ξ*_k = 0 must also be true. In the same sense, the two slack variables ξ_k and ξ*_k can never both be non-zero.

Special Fuzzy Functions with SVM (FF-SVM) Method

Just as one can build ordinary least squares estimates of the Special Fuzzy Functions when the relationship between the input variables and the output variable can be defined linearly in the original input space, one may also build support vector regression models to estimate the parameters of non-linear Special Fuzzy Functions. The augmented input matrix is determined from the FCM algorithm such that there is one SVR in SFF-SVM for each cluster, as in the SFF-LSE model. One may choose any membership transformation depending on the input dataset. Then one applies the support vector regression (SVR) algorithm instead of LSE to


each augmented matrix, which is comprised of the originally selected input variables and the membership values and/or their transformations. The support vector machines' optimization algorithm is applied to the augmented matrix of each cluster (rule) i, i = 1, …, c*, to optimize the Lagrange multipliers, α_{ik} and α*_{ik}, and to find the candidate support vectors, k = 1, …, nd. Hence, using SFF-SVM, one finds the Lagrange multipliers of each kth training data sample, one for each cluster i. Then the output value of the k′th data sample in the ith cluster is estimated using Eq. (27) as follows:

ŷ_{ik′} = f̂(x⃗_{ik′}; α_i, α*_i) = Σ_{k=1}^{nd} (α*_{ik} − α_{ik}) K⟨x⃗_{ik′}, x⃗_{ik}⟩ + b_i .   (27)

Here ŷ_{ik′} is the estimated output of the k′th vector in the ith cluster, obtained from the support vector regression function with the Lagrange multipliers of the ith cluster. The augmented kernel matrix denotes the kernel mapping of the augmented input matrix (as described in the SFF-LSE approach), where the membership values and their transformations are used as additional input variables. After the optimization algorithm finds the optimum Lagrange multipliers, one can estimate the output value of each data point in each cluster using Eq. (27). The inference structure of SFF-SVM is adapted from the Special Fuzzy Functions with least squares: one estimates a single output for a data point (see Eq. (15)) by taking the membership-value-weighted average of its output values calculated for each cluster using Eq. (27).
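The pipeline just described — fuzzy clustering, one local model per cluster fitted on an augmented matrix of inputs plus memberships, and membership-weighted inference — can be sketched in a few lines. This is an illustrative reconstruction, not the article's code: plain least squares stands in for the SVR step, and the function names (`fcm_memberships`, `fit_fuzzy_functions`, `predict`) are mine.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """FCM-style normalized memberships of each sample in each cluster."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)       # shape (nd, c)

def fit_fuzzy_functions(X, y, centers, m=2.0):
    """One least-squares 'fuzzy function' per cluster, fitted on the
    augmented matrix [1, mu_i, X] (memberships enter as regressors)."""
    U = fcm_memberships(X, centers, m)
    return [np.linalg.lstsq(
                np.column_stack([np.ones(len(X)), U[:, i], X]),
                y, rcond=None)[0]
            for i in range(len(centers))]

def predict(X, centers, betas, m=2.0):
    """Membership-weighted average of the per-cluster local models."""
    U = fcm_memberships(X, centers, m)
    yhat = np.zeros(len(X))
    for i, beta in enumerate(betas):
        Xi = np.column_stack([np.ones(len(X)), U[:, i], X])
        yhat += U[:, i] * (Xi @ beta)
    return yhat

# toy data: an exactly linear target, which the local models recover
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y = 1.0 + 2.0 * X[:, 0]
centers = np.array([[0.1], [1.1]])
betas = fit_fuzzy_functions(X, y, centers)
yhat = predict(X, centers, betas)
```

On this consistent toy problem each cluster's regression fits the data exactly, so the weighted average reproduces y; with real data the per-cluster models differ and the memberships blend them.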

Future of Fuzzy System Models

In the future, fuzzy system models are expected to be structured by type 2 fuzzy sets. For this purpose, we next present basic definitions.

Basic Definitions

Definition 1  A type 2 fuzzy set Ã on a universe of discourse x is a fuzzy set which is characterized by a fuzzy membership function, μ̃_A(x), where μ̃_A(x) is a mapping as shown below:

μ̃_A(x) : X → [0,1]^{[0,1]} .   (28)

Then the type 2 fuzzy set, Ã, can be characterized as follows:

Ã = {(x, (u, f_x(u))) | x ∈ X, u ∈ J_x ⊆ [0,1]} ,   (29)

where u is defined as the primary membership value, J_x ⊆ [0,1] is the domain of u, and f_x(u) is the secondary membership function. An alternative definition of the type 2 fuzzy set Ã, used by Mendel [28] and inspired by Mizumoto and Tanaka [30], can be given as follows. Given that x is a continuous universe of discourse, the same type 2 fuzzy set can be defined as:

Ã = ∫_{x∈X} [ ∫_{u∈J_x} f_x(u)/u ] / x .   (30)

And for the discrete case, the same type 2 fuzzy set can be defined as:

Ã = Σ_{x∈X} [ Σ_{u∈J_x} f_x(u)/u ] / x .   (31)

The secondary membership function, f_x(u), can be defined as follows:

Definition 2  The secondary membership function, f_x(u), is a function that maps membership values of the universe of discourse x onto the unit interval [0,1]. Thus, f_x(u) can be characterized as follows:

f_x(u) : J_x → [0,1] , ∀u ∈ J_x ⊆ [0,1] , ∀x ∈ X .   (32)

With the secondary membership function defined as above, the membership function of the type 2 fuzzy set Ã, μ̃_A(x), can then be written for the continuous and discrete cases, respectively, as follows:

μ̃_A(x) = ∫_{u∈J_x} f_x(u)/u , ∀x ∈ X ,   (33)

μ̃_A(x) = Σ_{u∈J_x} f_x(u)/u , ∀x ∈ X .   (34)

An Interval Valued Type 2 Fuzzy Set (IVT2FS), which is a special case of a type 2 fuzzy set, can be defined as follows:

Definition 3  Let Ã be a linguistic label with a type-2 membership function on the universe of discourse of the base variable x, μ̃_A(x) : X → f_x(u)/u, u ∈ J_x, J_x ⊆ [0,1]. The following condition needs to be satisfied in order to consider μ̃_A(x) an interval valued type 2 membership function:

f_x(u) = 1 , ∀x ∈ X , ∀u ∈ J_x , J_x ⊆ [0,1] .   (35)

Thus, the interval valued type 2 membership function is a mapping as shown below:

μ̃_A(x) : X → 1/u , u ∈ J_x , J_x ⊆ [0,1] .   (36)

General Structure of Type 2 Fuzzy System Models

In a series of papers, Mendel, Karnik and Liang [22,23,24,25] extended traditional type 1 inference methods such that these methods can process type 2 fuzzy sets. These studies were explained thoroughly by Mendel in [28]. The classical Zadeh and Takagi–Sugeno type 1 models are modified as type 2 fuzzy rule bases (T2Z-FR and T2TS-FR), respectively, as follows:

ALSO_{i=1}^{c} [ IF AND_{j=1}^{NV} (x_j ∈ X_j isr Ã_ji) THEN y ∈ Y isr B̃_i ] ,   (37)

ALSO_{i=1}^{c} [ IF AND_{j=1}^{NV} (x_j ∈ X_j isr Ã_ji) THEN y_i = a_i x^T + b_i ] .   (38)

Mendel, Karnik and Liang [22,23,24,25] assumed that the antecedent variables are separable (i.e., non-interactive). After formulating the inference for a full type 2 fuzzy system model, Karnik et al. [22,23,24,25] simplified their proposed methods for the interval valued case. In order to identify the structure of the IVT2FS, it was assumed that the membership functions are Gaussian. A clustering method was utilized to identify the mean parameters for the Gaussian functions; however, the clustering method was not specified. It was assumed that the standard error parameters of the Gaussian membership functions are exactly known. The number of rules was assigned as eight due to the nature of their application, but the problem of finding the suitable number of rules was not discussed in the paper.

Liang and Mendel [26] proposed another method to identify the structure of an IVT2FS. It was suggested to initialize the inference parameters and to use a steepest-descent (or other optimization) method in order to tune these parameters of an IVT2FS. Two different approaches for the initialization phase were suggested in [26]. The partially dependent approach utilizes a type 1 fuzzy system model to provide a baseline for the type 2 fuzzy system model design. The totally independent approach starts by assigning random values to initialize the inference parameters. Liang and Mendel [26] indicated that the main challenge in their proposed tuning method is to determine the active branches.

Mendel [28] indicated that several structure identification methods, such as one-pass, least-squares, back-propagation (steepest descent), singular value-QR decomposition, and iterative design methods, can be utilized in order to find the most suitable inference parameters of type 2 fuzzy system models. Mendel [28] provided an excellent summary of the pros and cons of each method. Several other researchers, such as Starczewski and Rutkowski [37], John and Czarnecki [20,21], and Chen and Kawase [10], have worked on T2-FSM. Starczewski and Rutkowski [37] proposed a connectionist structure to implement interval valued type 2 fuzzy structure and inference. It was indicated that methods such as back propagation, recursive least squares, or Kalman algorithm-based methods can be used to determine the inference parameters of the structure. John and Czarnecki [20,21] extended the ANFIS structure such that it can process type 2 fuzzy sets.

All of the above methods assume non-interactivity between the antecedent variables. Thus, the general steps of the inference can be listed as: fuzzification, aggregation of the antecedents, implication, aggregation of the consequents, type reduction, and defuzzification. With the general structure of type 2 fuzzy system models and inference techniques in place, we next propose discrete interval valued type 2 rule base structures.

Discrete Interval Valued Type 2 Fuzzy Sets (DIVT2FS)

In Discrete Interval Valued Type 2 Fuzzy Sets (DIVT2FS) [48] the domain of the primary membership is discrete and the secondary membership values are fixed to 1. Thus, the proposed method utilizes discrete interval valued type 2 fuzzy sets in order to represent the linguistic values assigned to each fuzzy variable in each rule. These fuzzy sets can be mathematically defined as follows:

Ã = ∫_{x∈X} [ Σ_{u∈J_x} 1/u ] / x ,   (39)

where x ∈ X.

Game Theory and Strategic Complexity

…One can check that more complicated deviations are also worse. The second part of the definition needs to be checked as well, so we need to ensure that a player cannot do as well, in terms of payoff, by moving to a less complex strategy, namely a one-state machine. A one-state machine that always plays C will get the worst possible payoff, since the other machine will keep playing D against it. A one-state machine that plays D will get a payoff of 4 in periods 2, 4, …, or a total payoff of 4δ/(1 − δ²), as against 3δ/(1 − δ). The second is strictly greater for δ > 1/3. This machine gives a payoff close to 3 per stage for δ close to 1. As δ → 1, the payoff of each player goes to 3, the cooperative outcome.

The paper by Abreu and Rubinstein obtains a basic result on the characterization of payoffs obtained as NEC in the infinitely repeated Prisoners' Dilemma. We recall that the "Folk Theorem" for repeated games tells us that all outcome paths that give a payoff per stage strictly greater for each player than the minmax payoff for that player in the stage game can be sustained by Nash equilibrium strategies. Using endogenous complexity, one can obtain a refinement: now only payoffs on a so-called "cross" are sustainable as NEC.

This result is obtained from two observations. First, in any NEC of a two-player game, the number of states in the players' machines must be equal. This follows from the following intuitive reasoning (we refer readers to the original paper for the proofs). Suppose we fix the machine used by one of the players (say Player 1), so that to the other player it becomes part of the "environment". For Player 2 to calculate a best response or an optimal strategy to Player 1's given machine, it is clearly not necessary to partition past histories more finely than the other player has done in obtaining her strategy; therefore the number of states in Player 2's machine need not (and therefore will not, if there are complexity costs) exceed the number in Player 1's machine in equilibrium. The same holds true in the other direction, so the number of states must be equal. (This does not hold for more than two players.) Another way of interpreting this result is that it restates the result from Markov decision processes on the existence of an optimal "stationary" policy (that is, a policy depending only on the states of the environment, which are here the same as the states of the other player's machine). See also Piccione [40].
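The discounted-payoff comparison in the deviation argument above can be checked numerically. A small sketch, where the per-period streams follow the text (3 per stage from period 2 onward for the conforming machine, 4 in even periods only for the one-state defector):

```python
def discounted(stream, delta, periods=2000):
    """Present value of a per-period payoff function stream(t), t = 1, 2, ..."""
    return sum(stream(t) * delta ** (t - 1) for t in range(1, periods + 1))

def conform(t):
    # conforming machine: payoff 3 in every period from period 2 on
    return 3 if t >= 2 else 0

def deviate(t):
    # one-state machine playing D: payoff 4 in periods 2, 4, 6, ... only
    return 4 if t >= 2 and t % 2 == 0 else 0

for delta in (0.2, 0.5, 0.9):
    v_c = discounted(conform, delta)
    v_d = discounted(deviate, delta)
    # closed forms from the text: 3*delta/(1-delta) vs 4*delta/(1-delta**2)
    assert abs(v_c - 3 * delta / (1 - delta)) < 1e-6
    assert abs(v_d - 4 * delta / (1 - delta ** 2)) < 1e-6
    # conforming is strictly better exactly when delta > 1/3
    assert (v_c > v_d) == (delta > 1 / 3)
```

The inequality 3δ/(1 − δ) > 4δ/(1 − δ²) reduces to 3(1 + δ) > 4, i.e. δ > 1/3, which the numerical check confirms on both sides of the threshold.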
Thus there is a one-to-one correspondence between the states of the two machines. (Since the number of states is finite and the game is infinitely repeated, the machine must visit at least one of the states infinitely often for each player.) One can strengthen this further to establish a one-to-one correspondence between actions. Suppose Player 1's machine has a_t¹ = a_s¹, where these denote the actions taken at two distinct periods and states by Player 1, with a_t² ≠ a_s² for Player 2. Since the states at t and s are distinct for Player 1 and the actions taken are the same, the transitions must be different following the two distinct states. But then Player 1 does not need two distinct states; he can drop one and condition the transition after, say, s on the different action used by Player 2. (Recall the transition is a function of the state and the opponent's action.) But then

Player 1 would be able to obtain the same payoff with a less complex machine, so the original one could not have been a NEC machine. Therefore the actions played must be some combination of (C, C) and (D, D) (the correspondence is between the two Cs and the two Ds) or some combination of (C, D) and (D, C). (By combination, we mean combination over time; for example, (C, C) is played, say, 10 times for every 3 plays of (D, D).) In the payoff space, sustainable payoffs are either on the line joining (3,3) and (0,0) or on the line joining the payoffs on the other diagonal; hence the evocative name chosen to describe the result: the cross of the two diagonals.

While this is certainly a selection of equilibrium outcomes, it does not go as far as we would wish. We would hope that some equilibrium selection argument might deliver us the cooperative outcome (3,3) uniquely (even in the limit as δ → 1), instead of the actual result obtained. There is work that does this, but it uses evolutionary arguments for equilibrium selection (see Binmore and Samuelson [9]). An alternative learning argument for equilibrium selection is used by Maenner [32]. In his model, a player tries to infer what machine is being used by his opponent and chooses the simplest automaton that is consistent with the observed pattern of play as his model of his opponent. A player then chooses a best response to this inference. It turns out complexity is not sufficient to pin down an inference, and one must use optimistic or pessimistic rules to select among the simplest inferences. One of these gives only (D, D) repeated, whilst the other reproduces the Abreu–Rubinstein NEC results.

Piccione and Rubinstein [41] show that the NEC profile of 2-player repeated extensive form games is unique if the stage game is one of perfect information. This unique equilibrium involves all players playing their one-shot myopic non-cooperative actions at every stage.
This is a strong selection result and involves stage game strategies not being observable (only the path of play is) as well as the result on the equilibrium numbers of states being equal in the two players’ machines. In repeated games with more than two players or with more than two actions at each stage the multiplicity problem may be more acute than just not being able to select uniquely a “cooperative outcome”. In some such games complexity by itself may not have any bite and the Folk Theorem may survive even when the players care for the complexity of their strategies. (See Bloise [12] who shows robust examples of two-player repeated games with three actions at each stage such that every individually rational payoff can be sustained as a NEC if players are sufficiently patient.)


Exogenous Complexity

We now consider the different approach taken by Neyman [35,36], Ben Porath [6,7], Zemel [50] and others. We shall confine ourselves to the papers by Neyman and Zemel on the Prisoners' Dilemma, without discussing the more general results these authors and others have obtained. Neyman's approach treats complexity as exogenous: Player i is restricted to use strategies/automata with the number of states not exceeding m_i. He also considers finitely repeated games, unlike the infinitely repeated games we have discussed up to this point. With the stage game being the Prisoners' Dilemma and the number of repetitions being T (for convenience, this includes the first time the game is played), we can write the game being considered as G^T(m₁, m₂). Note that without the complexity restrictions, the finitely repeated Prisoners' Dilemma has a unique Nash equilibrium outcome path (and a unique subgame perfect equilibrium): (D, D) in all stages. Thus sustaining cooperation in this setting means obtaining non-equilibrium behavior, though behavior that is frequently observed in real life. This approach therefore is an example of bounded rationality being used to explain observed behavior that is not predicted in equilibrium.

If the complexity restrictions are severe, it turns out that (C, C) in each period is an equilibrium. For this, we need 2 ≤ m₁, m₂ ≤ T − 1. To see this, consider the grim trigger strategy mentioned earlier, representable as a two-state automaton, and let T = 3. Here the output function is λ(q₁) = C, λ(q₂) = D, and the transitions are μ(q₁, C) = q₁, μ(q₁, D) = q₂, μ(q₂, C or D) = q₂. If each player uses this strategy, (C, C) will be observed. Such a pair of strategies is clearly not a Nash equilibrium: given Player 1's strategy, Player 2 can do better by playing D in stage 3.
But if Player 2 defects in the second stage, by choosing a two-state machine with μ(q₁, C) = q₂ (so that it switches to defection after the first stage), he will gain 1 in the second stage and lose 3 in the third stage as compared to the machine listed above, so he is worse off. Defecting in stage 3, on the other hand, requires an automaton with three states: two states in which C is played and one in which D is played. The transitions in state q₁ will be similar, but if q₂ is the second cooperative state, the transition from q₂ to the defect state will take place no matter whether the other player plays C or D. However, automata with three states violate the constraint that the number of states be no more than 2, so the profitable deviation is out of reach. Whilst this is easy to see, it is not clear what happens when the complexity bounds are high. Neyman shows the following result: for any integer k, there exists a T₀ such that for T ≥ T₀ and T^{1/k} ≤ m₁, m₂ ≤ T^k, there is a mixed strategy

equilibrium of G^T(m₁, m₂) in which the expected average payoff to each player is at least 3 − 1/k. The basic idea is that rather than playing (C, C) at each stage, players are required to play a complex sequence of C and D, and keeping track of this sequence uses up a sufficient number of states in the automaton so that profitable deviations again hit the constraint on the number of states. But since D cannot be avoided on the equilibrium path, only something close to (C, C) each period can be obtained rather than (C, C) all the time.

Zemel's paper adds a clever little twist to this argument by introducing communication. In his game, each player at each stage chooses one of the two actions, C or D as before, and a message to be communicated. The message does not directly affect payoffs as the choice of C or D does. The communication requirements are now made sufficiently stringent, and deviation from them is considered a deviation, so that once again the states "left over" to count up to N are inadequate in number and (C, C) can once again be played in each stage/period. This is an interesting explanation of the rigid "scripts" that many have observed to be followed, for example, in negotiations.

Neyman [36] surveys his own work and that of Ben Porath [6,7]. He also generalizes his earlier work on the finitely repeated Prisoners' Dilemma to show how small the complexity bounds would have to be in order to obtain outcomes outside the set of (unconstrained) equilibrium payoffs in the finitely repeated, normal-form game (just as (C, C) is not part of an unconstrained equilibrium outcome path in the Prisoners' Dilemma). Essentially, if the complexity permitted grows exponentially or faster with the number of repetitions, the equilibrium payoff sets of the constrained and the unconstrained games coincide. For sub-exponential growth, a version of the Folk theorem is proved for two-person games.
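The small-bound example above (grim trigger in G³(2, 2)) can be verified by brute force. In this sketch (my own illustration) the stage payoffs are inferred from the text's arithmetic: mutual cooperation pays 3, defecting against C pays 4 to the defector and 0 to the victim, and mutual defection pays 0 (consistent with "gain 1 in the second stage and lose 3 in the third"); the machine encoding is mine:

```python
from itertools import product

# Stage payoffs (assumed, see lead-in): rows indexed by (own, opponent) action.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 4),
          ('D', 'C'): (4, 0), ('D', 'D'): (0, 0)}

def play(m1, m2, T=3):
    """Run two Moore machines (init state, outputs, transitions) for T stages."""
    (s1, out1, tr1), (s2, out2, tr2) = m1, m2
    total = [0, 0]
    for _ in range(T):
        a1, a2 = out1[s1], out2[s2]
        p = PAYOFF[(a1, a2)]
        total[0] += p[0]; total[1] += p[1]
        # transitions depend on own state and the opponent's action
        s1, s2 = tr1[s1][a2], tr2[s2][a1]
    return total

# grim trigger: cooperate until the opponent defects, then defect forever
grim = (0, ['C', 'D'], [{'C': 0, 'D': 1}, {'C': 1, 'D': 1}])

def all_two_state_machines():
    for init, outs, t0c, t0d, t1c, t1d in product(
            (0, 1), product('CD', repeat=2), (0, 1), (0, 1), (0, 1), (0, 1)):
        yield (init, list(outs), [{'C': t0c, 'D': t0d}, {'C': t1c, 'D': t1d}])

best = max(play(grim, m)[1] for m in all_two_state_machines())
assert best == play(grim, grim)[1] == 9   # no 2-state machine beats cooperating

# the profitable deviation (defect only in stage 3) needs a third state
end_defector = (0, ['C', 'C', 'D'],
                [{'C': 1, 'D': 1}, {'C': 2, 'D': 2}, {'C': 2, 'D': 2}])
assert play(grim, end_defector)[1] == 10  # 3 + 3 + 4, out of reach with m = 2
```

Enumerating all 128 two-state machines confirms the text's claim: against grim trigger, nothing within the complexity bound improves on 9, while the three-state end-defector earns 10.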
The first result says: for every game G in strategic form, with m_i the bound on the complexity of player i's strategy and T the number of times G is played, there exists a constant c such that if m_i ≥ exp(cT), then E(G^T) = E(G^T(m₁, m₂)), where E(·) is the set of equilibrium payoffs in the game concerned. The second result, which generalizes the Prisoners' Dilemma result already stated, considers a sequence of triples (m₁(n), m₂(n), T(n)) for a two-player strategic form game, with m₂ ≥ m₁, and shows that the lim inf of the set of equilibrium payoffs of the automata game as n → ∞ includes essentially the strictly individually rational payoffs of the stage game if m₁(n) → ∞ and (log m₁(n))/T(n) → 0 as n → ∞. Thus a version of the


Folk theorem holds provided the complexity of the players' machines does not grow too fast with the number of repetitions.

Complexity and Bargaining

Complexity and the Unanimity Game

The well-known alternating offers bargaining model of Rubinstein has two players alternating in making proposals and responding to proposals. Each period or unit of time consists of one proposal and one response. If the response is "reject", the player who rejects makes the next proposal, but in the following period. Since there is discounting with discount factor δ per period, a rejection has a cost. The unanimity game we consider is a multiperson generalization of this bargaining game, with n players arranged in a fixed order, say 1, 2, 3, …, n. Player 1 makes a proposal on how to divide a pie of size unity among the n people; Players 2, 3, …, n respond sequentially, either accepting or rejecting. If everyone accepts, the game ends. If someone rejects, Player 2 now gets to make a proposal, but in the next period. The responses to Player 2's proposal are made sequentially by Players 3, 4, 5, …, n, 1. If Player i gets a share x_i in an eventual agreement at time t, his payoff is δ^{t−1} x_i.

Avner Shaked had shown in 1986 that the unanimity game has the disturbing feature that all individually rational outcomes (that is, outcomes with non-negative payoffs for each player) can be supported as subgame perfect equilibria. Thus the sharp result of Rubinstein [43], who found a unique subgame perfect equilibrium in the two-player game, stood in complete contrast with the multiplicity of subgame perfect equilibria in the multiplayer game. Shaked's proof involved complex changes in the expectations of the players if a deviation from the candidate equilibrium were to be observed. For example, in the three-player game with common discount factor δ, the three extreme points (1, 0, 0), (0, 1, 0), (0, 0, 1) sustain one another in the following way.
Suppose Player 1 is to propose (0, 1, 0), which is not a very sensible offer for him or her to propose, since it gives everything to the second player. If Player 1 deviates and proposes, say, ((1 − δ)/2, δ, (1 − δ)/2), then it might be reasoned that Player 2 would have no incentive to reject, because in any case he or she cannot get more than 1 in the following period, and Player 3 would surely prefer a positive payoff to 0. However, there is a counter-argument. In the subgame following Player 1's deviation, Player 3's expectations have been raised, so that he (and everyone else, including Player 1) now expects the outcome to be (0, 0, 1) instead of the earlier expected outcome. For sufficiently

high discount factor, Player 3 would reject Player 1's insufficiently generous offer. Thus Player 1 would have no incentive to deviate. Player 1 is thus in a bind; if he offers Player 2 less than δ and offers Player 3 more in the deviation, the expectation that the outcome next period will be (0, 1, 0) remains unchanged, so now Player 2 rejects his offer. So no deviation is profitable, because each deviation generates an expectation of future outcomes, an expectation that is confirmed in equilibrium. (This is what equilibrium means.) Summarizing, (0, 1, 0) is sustained as follows: Player 1 offers (0, 1, 0), Player 2 accepts any offer of at least 1 and Player 3 any offer of at least 0. If one of them rejects Player 1's offer, the next player in order offers (0, 1, 0) and the others accept. If any proposer, say Player 1, deviates from the offer (0, 1, 0) to (x_1, x_2, x_3), the player with the lower of {x_2, x_3} rejects. Suppose it is Player i who rejects. In the following period, the offer made gives 1 to Player i and 0 to the others, and this is accepted. Various attempts were made to get around the continuum-of-equilibria problem in bargaining games with more than two players; most of them involved changing the game. (See [15,16] for a discussion of this literature.) An alternative to changing the game is to introduce a cost for additional complexity, in the belief that players who value simplicity will end up choosing simple, that is history-independent, strategies. This seems a promising approach because it is clear from Shaked's construction that the large number of equilibria results from the players choosing history-dependent strategies. In fact, if strategies are restricted to be history-independent (also referred to as stationary or Markov), then it can be shown (see Herrero [27]) that the subgame perfect equilibrium is unique and induces equal division of the pie as δ → 1.
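The stationary equilibrium just mentioned can be computed explicitly. The sketch below is my reconstruction (the function name is hypothetical): in a stationary profile, a responder who would propose after k − 1 further rejections must be offered the discounted value of his own proposer share, giving x_k = δ^{k−1} x_1 and hence x_1 = (1 − δ)/(1 − δ^n). Every share tends to 1/n as δ → 1, which is Herrero's equal-division limit, and n = 2 recovers Rubinstein's proposer share 1/(1 + δ).

```python
def stationary_shares(n, delta):
    """Shares in the stationary (history-independent) equilibrium of the
    n-player unanimity game, listed in order of proposal rights.

    A responder who would propose after k-1 further rejections must be
    offered the discounted value of his own proposer share, so
    x_k = delta**(k-1) * x_1, with the shares summing to one.
    """
    x1 = (1 - delta) / (1 - delta ** n)
    return [delta ** (k - 1) * x1 for k in range(1, n + 1)]

# n = 2 recovers Rubinstein's split; as delta -> 1, each share tends to 1/n.
print(stationary_shares(3, 0.99))
```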
The two papers [15,16] in fact seek to address the issue of complex strategies with players having a preference for simplicity, just as in Abreu and Rubinstein. However, now we have a game of more than two players, and a single extensive-form game rather than a repeated game as in Abreu–Rubinstein. It was natural that the framework had to be broadened somewhat to take this into account. For each of the n players playing the unanimity game, we define a machine or an implementation of the strategy as follows. A stage of the game is defined to be n periods, such that if a stage were to be completed, each player would play each role at most once. A role could be proposer, (n−1)th responder, (n−2)th responder, …, down to first responder (the last role would occur in the period before the player concerned had to make another proposal). An outcome of a stage is defined as

a sequence of offers and responses, for example e = (x, A, A, R, y, R, z, A, R, b, A, A, A) in a four-player game, where (x, y, z, b) are the proposals made in the four periods and A and R refer to accept and reject respectively. From the point of view of the first player to propose (for convenience, call him Player 1), he makes an offer x, which is accepted by Players 2 and 3 but rejected by Player 4. Now it is Player 2's turn to offer, but this offer, y, is rejected by the first responder, Player 3. Player 1 gets to play as second responder in the next period, where he rejects Player 3's proposal z. In the last period of this stage, a proposal b is made by Player 4 and everyone accepts (including Player 1 as first responder). Any partial history within a stage is denoted by s. For example, when Player 2 makes an offer, he does so after a partial history s = (x, A, A, R). Let the set of possible outcomes of a stage be denoted by E and the set of possible partial histories by S. Let Q_i denote the set of states used in the ith player's machine M_i. The output mapping is given by λ_i : S × Q_i → Λ, where Λ is the set of possible actions (that is, the set of possible proposals, plus accept and reject). The transition between states now takes place at the end of each stage, so the transition mapping is given by μ_i : E × Q_i → Q_i. As before, in the Abreu–Rubinstein setup, there is an initial state q_initial,i specified for each player. There is also a termination state F, which indicates agreement. Once in the termination state, players play the null action and make transitions back to this state. Note that our formulation of a strategy naturally uses a Mealy machine. The output mapping λ_i(·,·) has two arguments, the state of the machine and the input s, which lists the outcomes of previous moves within the stage. The transitions take place at the end of the stage.
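As a concrete illustration, the machine just described can be sketched as a small data structure. The encoding below is hypothetical (the formal objects are the tuples of states, initial state, output map and transition map), but it shows the Mealy-machine shape: output depends on the current state and the partial stage history, while transitions fire only on complete stage outcomes.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class BargainingMachine:
    """A Mealy machine for one player of the unanimity game.

    output(partial_history, state) -> action (an offer tuple, "A" or "R"),
    i.e. the map S x Q_i -> actions; transition(stage_outcome, state) ->
    next state, i.e. E x Q_i -> Q_i, applied only at the end of a stage.
    """
    states: frozenset
    initial: str
    output: Callable
    transition: Callable

# A one-state machine: demand the whole pie when proposing, accept anything
# as responder -- the kind of simple strategy complexity costs favor.
always_accept = BargainingMachine(
    states=frozenset({"q0"}),
    initial="q0",
    output=lambda s, q: (1.0, 0.0, 0.0) if len(s) == 0 else "A",
    transition=lambda e, q: "q0",
)
```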
The benefit of using this formulation is that the continuation game is the same at the beginning of each stage. In Chatterjee and Sabourian [16], we investigate the effects of modifying this formulation, including having a separate sub-machine to play each role. The different formulations can all implement the same strategies, but the complexities in terms of various measures could differ. We refer the reader to that paper for details, but emphasize that in the general unanimity game the results from other formulations are similar to those from the one developed here, though they could differ for special cases, like three-player games. We now consider a machine game, where players first choose machines and then the machines play the unanimity game, in analogy with Abreu–Rubinstein. Using the same lexicographic utility, with complexity coming after bargaining payoffs, what do we find for Nash equilibria of the machine game?

As it turns out, the addition of complexity costs in this setting has some bite, but not much. In particular, any division of the pie can be sustained in some Nash equilibrium of the machine game. Perpetual disagreement can, in fact, be sustained by a stationary machine, that is, one that makes the same offers and responses each time, irrespective of past history. Nor can we prove, for general n-player games, that the equilibrium machines will be one-state. (A three-player counterexample exists in [16]; it does not appear possible to generate such examples in games lasting fewer than thirty periods.) For two-player games, the result that machines must be one-state in equilibrium can be shown neatly ([16]); another illustration that, in this particular area, there is a substantial increase in analytical difficulty in going from two to three players. One reason why complexity does not appear important here is that the definition of complexity used is too restrictive. Counting the number of states is fine only so long as we do not consider how complex a response might be for partial histories within a stage. The next attempt at a solution is based on this observation. We devise the following definition of complexity: given two machines with the same states, if one machine makes the same response to different partial stage histories and the other makes different responses, then the second is more complex (given that the machines are identical in all other respects). We refer to this notion as response complexity. (In [15] the concept of response complexity is in fact stated in terms of the underlying strategy rather than in terms of machines.) It captures the intuition that counting states is not enough; two machines could have the same number of states, for example because each generated the same number of distinct offers, but the complexity of responses in one machine could be much lower than that in the other.
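A minimal way to operationalize this comparison is sketched below (the encoding of strategies as history-response tables is my own, hypothetical device, not the papers' formalism): flag the states in which a machine's response varies with the partial stage history, so that two machines with identical state sets can be ranked by response complexity.

```python
def history_sensitive_states(responses):
    """responses maps (partial_history, state) -> action.  Returns the
    states whose response depends on the partial stage history -- the
    source of extra response complexity when all else is equal."""
    actions_by_state = {}
    for (history, state), action in responses.items():
        actions_by_state.setdefault(state, set()).add(action)
    return {q for q, acts in actions_by_state.items() if len(acts) > 1}

# Both machines have a single state; the second responds differently to
# two partial histories and so is the more response-complex of the two.
m_simple = {(("x", "A"), "q0"): "A", (("y", "R"), "q0"): "A"}
m_complex = {(("x", "A"), "q0"): "A", (("y", "R"), "q0"): "R"}
```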
Note that this notion would only arise in extensive-form games; in normal-form games, counting states could be an adequate measure of complexity. Nor is this notion of complexity derivable from notions of transition complexity, due to Banks and Sundaram, for example, which also apply in normal-form games. The main result of Chatterjee and Sabourian [15] is that this new aspect of complexity enables us to limit the amount of delay that can occur in equilibrium and hence to infer that only one-state machines are equilibrium machines. The formal proofs, using two different approaches, are available in Chatterjee and Sabourian [15,16]. We mention the basic intuition behind these results. Suppose, in the three-player game, there is an agreement in period 4 (that is, in the second stage). Why doesn't this agreement


take place in period 1 instead? It must be because, if the same offer and responses were seen in period 1, some player would reject the offer. But of course, he or she does not have to do so, because the required offer never happens. But a strategy that accepts the offer in period 4 and rejects it off the equilibrium path in period 1 must be more complex, by our definition, than one that always accepts it whenever it might happen, on or off the expected path. Repeated application of this argument by backwards induction gives the result. (The details are more complicated but are in the papers cited above.) Note that this uses the definition under which two machines might have the same number of states and yet one could be simpler than the other. It is interesting, as mentioned earlier, that for two players one can obtain an analogous result without invoking the response-complexity criterion, but from three players on this criterion is essential. The above result (equilibrium machines have one state each and there are no delays beyond the first stage) is still not enough to refine the set of equilibria to a single allocation. In order to do this, we consider machines that can make errors/trembles in output. As the error goes to zero, we are left with perfect equilibria of our game. With one-state machines, the only subgame perfect equilibria are the ones that give equal division of the pie as δ → 1. Thus a combination of two techniques, one essentially recognizing that players can make mistakes and the other that players prefer simpler strategies if the payoffs are the same as those given by a more complex strategy, resolves the problem of multiplicity of equilibria in the multiperson bargaining game. As we mentioned before, the introduction of errors ensures that the equilibrium strategies are credible at every history.
We could also take the more direct (and easier) route of obtaining the uniqueness result with complexity costs by considering NEC strategies that are subgame perfect in the underlying game (PEC), as done in [15]. Then, since a history-independent subgame perfect equilibrium of the game is unique and any NEC automaton profile has one state and hence is history-independent, it follows immediately that any PEC is unique and induces equal division as δ → 1.

Complexity and Repeated Negotiations

In addition to standard repeated games or standard bargaining games, multiplicity of equilibria often appears in dynamic repeated interactions, where a repeated game is superimposed on an alternating-offers bargaining game. For instance, consider two firms, in an ongoing vertical relationship, negotiating the terms of a merger. Such situations have been analyzed in several "negotiation models" by Busch and Wen [13], Fernandez and Glazer [18] and Haller and Holden [25]. These models can be interpreted as combining the features of both repeated and alternating-offers bargaining games. In each period, one of the two players first makes an offer on how to divide the total available periodic (flow) surplus; if the offer is accepted, the game ends with the players obtaining the corresponding payoffs in the current and every subsequent period. If the offer is rejected, they play some normal form game to determine their flow payoffs for that period, and the game then moves on to the next period, in which the same play continues with the players' bargaining roles reversed. One can think of the normal form game played in the event of a rejection as a "threat game" in which a player takes actions that could punish the other player by reducing his total payoffs. Without the bargaining, the game would be a standard repeated normal form game. Even with bargaining and the prospect of permanent exit, the negotiation model still admits a large number of equilibria, like standard repeated games. Some of these equilibria involve delay in agreement (even perpetual disagreement) and inefficiency, while some are efficient. Lee and Sabourian [31] apply complexity considerations to this model. As in Abreu and Rubinstein [1] and others, the players choose among automata, and the equilibrium notions are NEC and PEC. One important difference, however, is that in this paper the authors do not assume the automata to be finite. Also, the paper introduces a new machine specification that formally distinguishes between the two roles, proposer and responder, played by each player in a given period. Complexity considerations select only efficient equilibria in the negotiation model when players are sufficiently patient.
First, it is shown that if an agreement occurs in some finite period as a NEC outcome, then it must occur within the first two periods of the game. This is because, if a NEC induced an agreement beyond the first two periods, one of the players would be able to drop the last period's state of his machine without affecting the outcome of the game. Second, given sufficiently patient players, every PEC in the negotiation model that induces perpetual disagreement is at least long-run almost efficient; that is, the game must reach a finite date from which the continuation game is almost efficient. Thus, these results take the study of complexity in repeated games a step beyond the previous literature, in which complexity or bargaining alone produced only limited selection results. While, as we discussed above, many inefficient equilibria survive complexity refinement,

Lee and Sabourian [31] demonstrate that complexity and bargaining in tandem ensure efficiency in repeated interactions. Complexity considerations also allow Lee and Sabourian to highlight the role of transaction costs in the negotiation game. Transaction costs take the form of paying a cost to enter the bargaining stage of the negotiation game. In contrast to the efficiency result in the negotiation game with complexity costs, Lee and Sabourian also show that introducing transaction costs into the negotiation game dramatically alters the selection result from efficiency to inefficiency. In particular, they show that, for any discount factor and any transaction cost, every PEC in the costly negotiation game induces perpetual disagreement if the stage game normal form (after any disagreement) has a unique Nash equilibrium.

Complexity, Market Games and the Competitive Equilibrium

There has been a long tradition in economics of trying to provide a theory of how a competitive market with many buyers and sellers operates. The concept of competitive (Walrasian) equilibrium (see Debreu [17]) is a simple description of such markets. In such an equilibrium each trader rationally chooses the amount he wants to trade taking the prices as given, and the prices are set (or adjust) to ensure that the total amount demanded equals the total amount supplied. The important feature of the set-up is that agents assume they cannot influence (set) the prices, and this is often justified by appealing to the idea that each individual agent is small relative to the market. There are conceptual as well as technical problems associated with such a justification. First, if no agent can influence the prices, then who sets them?
Second, even in a large but finite market, a change in the behavior of a single individual agent may affect the decisions of some others, which in turn might influence the behavior of yet other agents, and so on; thus the market as a whole may end up being affected by the decision of a single individual. Game-theoretic analyses of markets have tried to address these issues (e.g. see [21,47]). This has turned out to be a difficult task because the strategic analysis of markets, in contrast to the simple and elegant model of competitive equilibrium, tends to be complex and intractable. In particular, dynamic market games have many equilibria, in which a variety of different kinds of behavior are sustained by threats and counter-threats. More than 60 years ago Hayek [26] noted that competitive markets are simple mechanisms in which economic agents need to know only their own endowments, preferences and technologies and the vector of prices at which trade takes place. In such environments, economic agents maximizing utility subject to constraints make efficient choices in equilibrium. Below we report some recent work which suggests that the converse might also be true: if rational agents have, at least at the margin, an aversion to complex behavior, then their maximizing behavior will result in simple behavioral rules and thereby in a perfectly competitive equilibrium (Gale and Sabourian [22]).

Homogeneous Markets

In a seminal paper, Rubinstein and Wolinsky [46], henceforth RW, considered a market for a single indivisible good in which a finite number of homogeneous buyers and homogeneous sellers are matched in pairs and bargain over the terms of trade. In their set-up, each seller has one unit of an indivisible good and each buyer wants to buy at most one unit of the good. Each seller's valuation of the good is 0 and each buyer's valuation is 1. Time is divided into discrete periods, and at each date buyers and sellers are matched randomly in pairs, with one member of the pair randomly chosen to be the proposer and the other the responder. In any such match the proposer offers a price p ∈ [0, 1] and the responder accepts or rejects the offer. If the offer is accepted, the two agents trade at the agreed price p and leave the game, the seller receiving a payoff p and the buyer a payoff 1 − p. If the offer is rejected, the pair return to the market and the process continues. RW further assume that there is no discounting, to capture the idea that there is no friction (cost of waiting) in the market. Assuming that the number of buyers and sellers is not the same, RW showed that this dynamic matching and bargaining game has, in addition to a perfectly competitive outcome, a large set of other subgame perfect equilibrium outcomes, a result reminiscent of the Folk Theorem for repeated games.
To see the intuition for this, consider the case in which there is one seller s and many buyers. Since there are more buyers than sellers, the price of 1, at which the seller receives all the surplus, is the unique competitive equilibrium price; furthermore, since there are no frictions, p = 1 seems to be the most plausible price. RW's precise result, however, establishes that for any price p* ∈ [0, 1] and any buyer b* there is a subgame perfect equilibrium that results in s and b* trading at p*. The idea behind the result is to construct an equilibrium strategy profile such that buyer b* is identified as the intended recipient of the good at a price p*. This means that the strategies are such


that (i) when s meets b*, whichever of the two is chosen as the proposer offers price p* and the responder accepts; (ii) when s is the proposer in a match with some buyer b ≠ b*, s offers the good at a price of p = 1 and b rejects; and (iii) when a buyer b ≠ b* is the proposer, he offers to buy the good at a price of p = 0 and s rejects. These strategies produce the required outcome. Furthermore, the equilibrium strategies make use of the following punishment strategies to deter deviations. If the seller s deviates by proposing to a buyer b a price p ≠ p*, b rejects this offer and the play continues with b becoming the intended recipient of the item at a price of zero. Thus, after b's rejection the strategies are the same as those given earlier, with the price zero in place of p* and buyer b in place of buyer b*. Similarly, if a buyer b deviates by offering a price p ≠ p*, the seller rejects, another buyer b′ ≠ b is chosen to be the intended recipient, and the price at which the unit is traded changes to 1. Further deviations from these punishment strategies are treated in exactly the same way. The strong impression left by RW is that indeterminacy of equilibrium is a robust feature of dynamic market games and that, in particular, there is no reason to expect the outcome to be perfectly competitive. However, the strategies required to support the family of equilibria in RW are quite complex. In particular, when a proposer deviates, the strategies are tailor-made so that the responder is rewarded for rejecting the deviating proposal. This requires coordinating on a large amount of information, so that at every information set the players know (and agree on) what constitutes a deviation. In fact, RW show that if the amount of information available to the agents is strictly limited, so that agents cannot recall the history of past play, then the only equilibrium outcome is the competitive one. This suggests that the competitive outcome may result if agents use simple strategies.
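The punishment logic of rules (i)–(iii) can be summarized as a response function whose state is the pair (intended recipient b*, price p*). The encoding below is my sketch, not RW's notation; any proposal other than the prescribed one triggers the punishment transition.

```python
def response(responder, proposer, price, state, buyers):
    """RW-style response rule; state = (b_star, p_star).

    Returns (reply, new_state), where "A" accepts and "R" rejects, and
    any off-path proposal triggers the punishment transition.
    """
    b_star, p_star = state
    if proposer == "seller":
        if responder == b_star and price == p_star:
            return "A", state                  # rule (i): trade at p_star
        if responder != b_star and price == 1.0:
            return "R", state                  # rule (ii): on-path rejection
        # seller deviation: reject; responder becomes recipient at price 0
        return "R", (responder, 0.0)
    # a buyer is proposing to the seller
    if proposer == b_star and price == p_star:
        return "A", state                      # rule (i), buyer proposing
    if proposer != b_star and price == 0.0:
        return "R", state                      # rule (iii): on-path rejection
    # buyer deviation: reject; new intended recipient, price moves to 1
    new_recipient = next(b for b in buyers if b != proposer)
    return "R", (new_recipient, 1.0)
```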
Furthermore, the equilibrium strategies described in RW to support non-competitive outcomes are particularly unattractive because they require all players, including those buyers who do not end up trading, to follow complex non-stationary strategies in order to support a non-competitive outcome. But buyers who do not trade and receive zero payoff on the equilibrium path could always obtain at least zero by following a less complex strategy than the ones specified in RW's construction. Thus, RW's construction of non-competitive equilibria is not robust if players prefer, at least at the margin, a simpler strategy to a more complex one. Following the above observation, Sabourian [47], henceforth S, addresses the role of complexity (simplicity) in sustaining a multiplicity of non-competitive equilibria in RW's model. The concept of complexity in S is similar

to that in Chatterjee and Sabourian [15]. It is defined by a partial ordering on the set of individual strategies (or automata) that, informally, satisfies the following: if two strategies are otherwise identical, except that in some role the second strategy uses more information than that available in the current period of bargaining while the first uses only the information available in the current period, then the second strategy is said to be more complex than the first. S also introduces complexity costs lexicographically into the RW game and shows that any PEC is history-independent and induces the competitive outcome, in the sense that all trades take place at the unique competitive price of 1. Informally, S's conclusion in the case of a single seller s and many buyers follows from three steps. First, since trading at the competitive price of 1 is the worst outcome for a buyer and the best outcome for the seller, by appealing to complexity-type reasoning it can be shown that in any NEC a trader's response to a price offer of 1 is always history-independent: he either always rejects 1 or always accepts 1. For example, if in the case of a buyer this were not so, then, since accepting 1 is a worst possible outcome, he could economize on complexity and obtain at least the same payoff by adopting another strategy that is otherwise the same as the equilibrium strategy except that it always rejects 1. Second, in any non-competitive NEC in which s receives a payoff of less than 1, there cannot be an agreement at a price of 1 between s and a buyer at any history. For example, if at some history a buyer is offered p = 1 and he accepts, then by the first step the buyer should accept p = 1 whenever it is offered; but this is a contradiction, because it means that the seller can guarantee himself an equilibrium payoff of one by waiting until he has a chance to make a proposal to this buyer.
Third, in any non-competitive PEC the continuation payoffs of all buyers are positive at every history. This follows immediately from the previous step, because if there is no trade at p = 1 at any history, each buyer can always obtain a positive payoff by offering the seller slightly more than the seller can obtain in any subgame. Finally, because of competition between the buyers (there is one seller and many buyers), in any subgame perfect equilibrium there must be a buyer with a zero continuation payoff after some history. To illustrate the basic intuition for this claim, let m be the worst continuation payoff for s at any history, and suppose there exists a subgame at which s is the proposer in a match with a buyer b and the continuation payoff of s at this subgame is m. Then, if at this subgame s proposes m + ε (ε > 0), b must reject (otherwise s would obtain more than m at this subgame, contradicting the fact that his continuation payoff there is m). Since the total surplus

is 1, b must obtain at least 1 − m − ε in the continuation game in order for rejecting s's offer to be optimal, and since s gets at least m, this implies that the continuation payoff of every buyer b′ ≠ b after b's rejection is less than ε. The result follows by making ε arbitrarily small (and by appealing to finiteness). But the last two claims contradict each other unless the equilibrium is competitive. This establishes the result for the case in which there is one seller and many buyers. The case of a market with more than one seller is established by induction on the number of sellers. The matching technology in the above model is random. RW also consider another market game in which the matching is endogenous: at each date each seller (the short side of the market) chooses his trading partner. Here they show that non-competitive outcomes and multiplicity of equilibria survive even when the players discount the future. By strengthening the notion of complexity, S also shows that in the endogenous matching model of RW the competitive outcome is the only equilibrium if complexity considerations are present. These results suggest that perfectly competitive behavior may result if agents have, at least at the margin, preferences for simple strategies. Unfortunately, both RW and S have too simple a market set-up; for example, it is assumed that the buyers are all identical, and similarly for the sellers, and each agent trades at most one unit of the good. Do the conclusions extend to richer models of trade?

Heterogeneous Markets

There are good reasons to think that it may be too difficult (or even impossible) to establish a similar set of conclusions as in S in a richer framework. For example, consider a heterogeneous market for a single indivisible good, where buyers (and sellers) have a range of valuations of the good, each buyer wants at most one unit of the good and each seller has one unit of the good for sale. In this case the analysis of S will not suffice.
First, in the homogeneous market of RW, except for the special case where the number of buyers equals the number of sellers, the competitive equilibrium price is either 0 or 1, and all of the surplus goes to one side of the market. S's selection result crucially uses this property of the competitive equilibrium. By contrast, in a heterogeneous market there will in general be agents receiving positive payoffs on both sides of the market in a competitive equilibrium. Therefore, one cannot justify the competitive outcome simply by focusing on extreme outcomes in which one party obtains no surplus from trade. Second, in a homogeneous market, individually rational trade is by definition efficient. This may not be the case in a heterogeneous market (an inefficient trade

between an inframarginal and an extramarginal agent can be individually rational). Third, in a homogeneous market, the set of competitive prices remains constant, independently of the set of agents remaining in the market. In a heterogeneous market this need not be so, and in some cases the new competitive interval may not even intersect the old one. The change in the competitive interval of prices as a result of trade exacerbates the problems associated with using an induction hypothesis, because here future prices may be conditioned on past trades even if prices are restricted to competitive ones. Despite these difficulties, Gale and Sabourian [22], henceforth GS, show that the conclusions of S can be extended to the case of a heterogeneous market in which each agent trades at most one unit of the good. GS, however, focus on deterministic sequential matching models in which one pair of agents is matched at each date and the pair leave the market if they reach an agreement. In particular, they start by considering exogenous matching processes in which the identities of the proposer and responder at each date are an exogenous and deterministic function of the set of agents remaining in the market and the date. The main result of the paper is that a PEC is always competitive in such a heterogeneous market, thus supporting the view that a competitive equilibrium may arise in a finite market where complex behavior is costly. The notion of complexity in GS is similar to that in S [15]. However, in the GS set-up with heterogeneous buyers and sellers, the set of remaining agents changes depending on who has traded and left the market and who remains, and this affects the market conditions. (In the homogeneous case, only the number of remaining agents matters.) Therefore, the definition of complexity in GS is relative to a given set of remaining agents.
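To see how the competitive price interval depends on the set of remaining agents, the standard computation for a unit-demand, unit-supply market can be sketched as follows (function name mine): sort buyer valuations down and seller costs up, count the efficient trades, and read the price bounds off the marginal and first excluded traders. Re-running it after removing a matched pair shows how trade can shift the interval.

```python
def competitive_prices(values, costs):
    """Interval (lo, hi) of competitive prices when each buyer demands,
    and each seller supplies, one unit of an indivisible good."""
    b = sorted(values, reverse=True)   # buyer valuations, highest first
    s = sorted(costs)                  # seller costs, lowest first
    k = 0
    while k < min(len(b), len(s)) and b[k] >= s[k]:
        k += 1                         # k = number of efficient trades
    lo = max(s[k - 1] if k > 0 else 0.0,
             b[k] if k < len(b) else 0.0)
    hi = min(b[k - 1] if k > 0 else float("inf"),
             s[k] if k < len(s) else float("inf"))
    return lo, hi

# RW's homogeneous case: one seller (cost 0) and several buyers (value 1)
# gives the unique competitive price of 1.
print(competitive_prices([1, 1, 1], [0]))         # -> (1, 1)
# A heterogeneous market has a whole interval of competitive prices.
print(competitive_prices([10, 8, 3], [2, 5, 9]))  # -> (5, 8)
```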
GS also discuss an alternative notion of complexity that is independent of the set of remaining agents; such a definition may be too strong and may result in the equilibrium set being empty. To prove their result, GS first establish two very useful restrictions on the strategies that form a NEC (similar to the no-delay result in Chatterjee and Sabourian [15]). First, they show that if along the equilibrium path a pair of agents k and ℓ trade at a price p, with k as the proposer and ℓ as the responder, then k and ℓ always trade at p, irrespective of the previous history, whenever the two agents are matched in the same way with the same set of remaining agents. To show this, consider first the case of the responder ℓ. It must be that at every history with the same set of remaining agents, ℓ always accepts p from k. Otherwise, ℓ could economize on complexity by choosing another strategy that is otherwise identical to his equilibrium strategy except that it always accepts p from k, without sacrificing any payoff: such a change of behavior is clearly simpler than sometimes accepting and sometimes rejecting the offer, and moreover it results either in agent k proposing p and ℓ accepting, so that the payoff to agent ℓ is the same as from the equilibrium strategy, or in agent k not offering p, in which case the change in the strategy is not observed and the play of the game is unaffected by the deviation. Furthermore, it must also be that at every history with the same set of remaining agents, agent k proposes p in any match with ℓ. Otherwise, k could economize on complexity by choosing another strategy that is otherwise identical to his equilibrium strategy except that it always proposes p to ℓ, without sacrificing any payoff on the equilibrium path: such a change of behavior is clearly simpler, and k's payoff is not affected because either agents k and ℓ are matched, k proposes p and ℓ (by the previous argument) accepts, so that the payoff to agent k is the same as from the equilibrium strategy, or agents k and ℓ are not matched with k as the proposer, in which case the change in the strategy is not observed and the play of the game is unaffected by the deviation. GS establish a second restriction, again with the same set of remaining agents: in any NEC, for any pair of agents k and ℓ, player ℓ's response to k's offer (on or off the equilibrium path) is always the same. Suppose not; then ℓ sometimes accepts an offer p from k and sometimes rejects it (with the same set of remaining agents). By the first restriction, it must then be that if such an offer is made by k to ℓ on the equilibrium path, it is rejected.
But then ℓ could economize on complexity by always rejecting p from k, without sacrificing any payoff on the equilibrium path: such a change of behavior is clearly simpler, and ℓ's payoff is unaffected because the new behavior coincides with what the equilibrium strategy prescribes on the equilibrium path. By appealing to these two properties of NEC and to the competitive nature of the market, GS establish, using a complicated induction argument, that every PEC induces a competitive outcome in which each trade occurs at the same competitive price.

The matching model we have described so far is deterministic and exogenous. The selection result of GS, however, extends to richer deterministic matching models. In particular, GS also consider a semi-endogenous sequential matching model in which the choice of partners is endogenous but the identity of the proposer at any date is exogenous. Their result extends to this variation, with an endogenous choice of responders. A more radical departure would be to consider the case where at any date any agent can choose his partner and make a proposal. Such a totally endogenous model of trade generates new conceptual problems. In a recent working paper, Gale and Sabourian [24] consider a continuous-time version of such a matching model and show that complexity considerations allow one to select a competitive outcome in the case of totally endogenous matching. Since the selection result holds for all these different matching models, we can conclude that complexity considerations inducing a competitive outcome is a robust finding in deterministic matching and bargaining market games with heterogeneous agents.

Random matching is commonly used in economic models because of its tractability. The basic framework of GS, however, does not extend to such a setting if either the buyers or the sellers are not identical, for two reasons. First, in a random framework there is in general more than one outcome path that can occur in equilibrium with positive probability; as a result, introducing complexity lexicographically may not be enough to induce agents to behave in a simple way (they would have to be complex enough to play optimally along all paths that occur with positive probability). Second, Gale and Sabourian [23] show that subgame perfect equilibria in Markov strategies are not necessarily perfectly competitive in the random matching model with heterogeneous agents. Since the definition of complexity in GS is such that Markov strategies are the least complex ones, it follows that with random matching the complexity definition used in GS is not sufficient to select a competitive outcome.
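The complexity measure at work in this literature is, roughly, the number of states of the smallest automaton implementing a strategy, so that Markov (history-independent) strategies are minimally complex. A minimal sketch of this counting, using the repeated Prisoners' Dilemma for concreteness (the Moore-machine encoding and the strategy names are illustrative, not from GS):

```python
# Strategies encoded as Moore machines: each state carries an action and
# a transition on the opponent's last move; complexity = number of states.

def complexity(machine):
    """Number of states of the automaton implementing the strategy."""
    return len(machine)

# "Always defect" needs a single state: a history-independent (Markov)
# strategy, hence minimally complex.
always_defect = {
    "d": {"action": "D", "next": {"C": "d", "D": "d"}},
}

# Grim trigger conditions on one bit of history (has the opponent ever
# defected?), which costs a second state.
grim_trigger = {
    "coop":   {"action": "C", "next": {"C": "coop", "D": "punish"}},
    "punish": {"action": "D", "next": {"C": "punish", "D": "punish"}},
}

assert complexity(always_defect) == 1
assert complexity(grim_trigger) == 2
```

Economizing on complexity, in the arguments above, amounts to exhibiting a strategy with the same equilibrium payoff that is implemented by an automaton with strictly fewer states.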
Complexity and Off-The-Equilibrium Path Play

The concept of PEC (or NEC) used in S, GS and elsewhere requires that each player's strategy/automaton have minimal complexity amongst all strategies/automata that are best responses to the equilibrium strategies/automata of the others. Although these concepts treat complexity very mildly, there are other ways of introducing complexity into the equilibrium concept. One extension of the above set-up is to treat complexity as a (small) positive fixed cost of choosing a more complex strategy, and to define a Nash (subgame perfect) equilibrium with fixed positive complexity costs accordingly. All the selection results based on lexicographic complexity in the papers we discuss in this survey also hold for positive small complexity costs. This is not surprising, because with positive costs complexity has at least as much bite as in the lexicographic case: there is at least as much refinement of the equilibrium


concept under the former as under the latter. In particular, under a NEC (or a PEC), in considering complexity players ignore any payoff consequences off the equilibrium path; the trade-off is between the equilibrium payoffs of two strategies and their complexities. As a result, these concepts put more weight on complexity costs than on being “prepared” for off-the-equilibrium-path moves: although complexity costs are insignificant, they take priority over optimal behavior after deviations. (See [16] for a discussion.) A different approach would be to assume that complexity is a less significant criterion than off-the-equilibrium payoffs. In the extreme case, one would require agents to choose minimally complex strategies among the set of strategies that are best responses both on and off the equilibrium path (see Kalai and Neme [28]). An alternative way of illustrating the differences between these approaches is to introduce two kinds of vanishingly small perturbations into the underlying game. One perturbation imposes a small but positive cost of choosing a more complex strategy. The other introduces a small but positive probability of making an error (an off-the-equilibrium-path move). Since a PEC requires each agent to choose a minimally complex strategy within the set of best responses, the limit points of Nash equilibria of the perturbed game correspond to the concept of PEC if we first let the probability of making an off-the-equilibrium-path move go to zero and then let the cost of choosing a more complex strategy go to zero (this is what Chatterjee and Sabourian [15] do).
On the other hand, if in terms of the above limiting argument we first let the cost of choosing a more complex strategy go to zero and then let the probability of making an off-the-equilibrium-path move go to zero, then any limit corresponds to the equilibrium definition in Kalai and Neme [28], where agents choose minimally complex strategies among the set of strategies that are best responses both on and off the equilibrium path. Most of the results on refinement and endogenous complexity reported in this paper (for example, Abreu and Rubinstein [1], Chatterjee and Sabourian [15], Gale and Sabourian [22], and Lee and Sabourian [31]) hold only for the concept of NEC and its variations, and thus depend crucially on assuming that complexity costs are more important than off-the-equilibrium payoffs. This is because these results always appeal to an argument that economizes on complexity whenever that complexity is not used off the equilibrium path. Therefore, they may be a good predictor of what may happen only if complexity costs are more significant than the perturbations that induce off-the-equilibrium-path behavior. The one exception is the selection result in S [47]. Here, although the result we have reported is stated for NEC and its variations, it turns out that the selection of the competitive equilibrium does not in fact depend on the relative importance of complexity costs and off-the-equilibrium-path payoffs. It remains true even when strategies are required to be least complex amongst those that are best responses at every information set. This is because in S's analysis complexity is used only to show that every agent's response to the price offer of 1 is always the same, irrespective of the past history of play. This conclusion holds irrespective of the relative importance of complexity costs and off-the-equilibrium payoffs because trading at the price of 1 is the best outcome that any seller can achieve at any information set (including those off the equilibrium path) and the worst outcome for any buyer. Therefore, irrespective of the order, a strategy of sometimes accepting a price of 1 and sometimes rejecting it cannot be part of an equilibrium for a buyer (a similar argument applies for a seller), because the buyer can economize on complexity by always rejecting the offer without sacrificing any payoff on or off the equilibrium path (accepting p = 1 is the worst possible outcome).

Discussion and Future Directions

The use of finite automata as a model of players in a game has been criticized as inadequate, especially because, as the number of states becomes smaller, it becomes more and more difficult for the small automaton to do routine calculations, let alone the best-response calculations necessary for game-theoretic equilibria. Some of the papers we have explored address other aspects of complexity that arise from the concrete nature of the games under consideration. Alternative models of complexity have also been suggested, such as computational complexity and communication complexity.
While our work and the earlier work on which it builds focus on equilibrium, an alternative approach might ask whether simplicity evolves in some reasonable learning model. Maenner [32] has undertaken such an investigation with the infinitely repeated Prisoners' Dilemma (studied in the equilibrium context by Abreu and Rubinstein) and provides an argument for “learning to be simple”. On the other hand, there are arguments for increasing complexity in competitive games [42]. It is an open question, therefore, whether simplicity could arise endogenously through learning, though it seems to be a feature of most human preferences and aesthetics (see [11]).

The broader research program of explicitly considering complexity in economic settings might be a very fruitful one. Auction mechanisms are designed with an eye towards how complex they are – simplicity is a desideratum. The complexity of contracting has given rise to a whole literature on incomplete contracts, in which some models postulate a fixed cost per contingency described in the contract. All this is apart from the popular literature on complexity, which seeks to understand complex, adaptive systems from biology. The use of formal complexity measures such as those considered in this survey, and the research we describe, might throw some light on whether incompleteness of contracts, or simplicity of mechanisms, is an assumption or a result (of explicitly considering the choice of a level of complexity).

Acknowledgments

We wish to thank an anonymous referee and Jihong Lee for valuable comments that improved the exposition of this chapter. We would also like to thank St. John's College, Cambridge and the Pennsylvania State University for funding Dr Chatterjee's stay in Cambridge at the time this chapter was written.

Bibliography
1. Abreu D, Rubinstein A (1988) The structure of Nash equilibria in repeated games with finite automata. Econometrica 56:1259–1282
2. Anderlini L (1990) Some notes on Church's thesis and the theory of games. Theory Decis 29:19–52
3. Anderlini L, Sabourian H (1995) Cooperation and effective computability. Econometrica 63:1337–1369
4. Aumann RJ (1981) Survey of repeated games. In: Essays in game theory and mathematical economics in honor of Oskar Morgenstern. Bibliographisches Institut, Mannheim/Vienna/Zurich, pp 11–42
5. Banks J, Sundaram R (1990) Repeated games, finite automata and complexity. Games Econ Behav 2:97–117
6. Ben Porath E (1986) Repeated games with bounded complexity. Mimeo, Stanford University
7. Ben Porath E (1993) Repeated games with finite automata. J Econ Theory 59:17–32
8. Binmore KG (1987) Modelling rational players I. Econ Philos 3:179–214
9. Binmore KG, Samuelson L (1992) Evolutionary stability in repeated games played by finite automata. J Econ Theory 57:278–305
10. Binmore KG, Piccione M, Samuelson L (1998) Evolutionary stability in alternating-offers bargaining games. J Econ Theory 80:257–291
11. Birkhoff GD (1933) Aesthetic measure. Harvard University Press, Cambridge
12. Bloise G (1998) Strategic complexity and equilibrium in repeated games. Unpublished doctoral dissertation, University of Cambridge
13. Busch L-A, Wen Q (1995) Perfect equilibria in a negotiation model. Econometrica 63:545–565
14. Chatterjee K (2002) Complexity of strategies and multiplicity of Nash equilibria. Group Decis Negot 11:223–230
15. Chatterjee K, Sabourian H (2000) Multiperson bargaining and strategic complexity. Econometrica 68:1491–1509
16. Chatterjee K, Sabourian H (2000) N-person bargaining and strategic complexity. Mimeo, University of Cambridge and the Pennsylvania State University
17. Debreu G (1959) Theory of value. Yale University Press, New Haven/London
18. Fernandez R, Glazer J (1991) Striking for a bargain between two completely informed agents. Am Econ Rev 81:240–252
19. Fudenberg D, Maskin E (1990) Evolution and repeated games. Mimeo, Harvard/Princeton
20. Fudenberg D, Tirole J (1991) Game theory. MIT Press, Cambridge
21. Gale D (2000) Strategic foundations of general equilibrium: Dynamic matching and bargaining games. Cambridge University Press, Cambridge
22. Gale D, Sabourian H (2005) Complexity and competition. Econometrica 73:739–770
23. Gale D, Sabourian H (2006) Markov equilibria in dynamic matching and bargaining games. Games Econ Behav 54:336–352
24. Gale D, Sabourian H (2008) Complexity and competition II: endogenous matching. Mimeo, New York University/University of Cambridge
25. Haller H, Holden S (1990) A letter to the editor on wage bargaining. J Econ Theory 52:232–236
26. Hayek F (1945) The use of knowledge in society. Am Econ Rev 35:519–530
27. Herrero M (1985) A strategic theory of market institutions. Unpublished doctoral dissertation, London School of Economics
28. Kalai E, Neme A (1992) The strength of a little perfection. Int J Game Theory 20:335–355
29. Kalai E, Stanford W (1988) Finite rationality and interpersonal complexity in repeated games. Econometrica 56:397–410
30. Klemperer P (ed) (2000) The economic theory of auctions. Elgar, Northampton
31. Lee J, Sabourian H (2007) Coase theorem, complexity and transaction costs. J Econ Theory 135:214–235
32. Maenner E (2008) Adaptation and complexity in repeated games. Games Econ Behav 63:166–187
33. Miller GA (1956) The magical number seven, plus or minus two: Some limits on our capacity to process information. Psychol Rev 63:81–97
34. Neme A, Quintas L (1995) Subgame perfect equilibrium of repeated games with implementation cost. J Econ Theory 66:599–608
35. Neyman A (1985) Bounded complexity justifies cooperation in the finitely-repeated Prisoners' Dilemma. Econ Lett 19:227–229
36. Neyman A (1997) Cooperation, repetition and automata. In: Hart S, Mas-Colell A (eds) Cooperation: Game-theoretic approaches. NATO ASI Series F, vol 155. Springer, Berlin, pp 233–255
37. Osborne M, Rubinstein A (1990) Bargaining and markets. Academic, New York
38. Osborne M, Rubinstein A (1994) A course in game theory. MIT Press, Cambridge
39. Papadimitriou CH (1992) On games with a bounded number of states. Games Econ Behav 4:122–131


40. Piccione M (1992) Finite automata equilibria with discounting. J Econ Theory 56:180–193
41. Piccione M, Rubinstein A (1993) Finite automata play a repeated extensive game. J Econ Theory 61:160–168
42. Robson A (2003) The evolution of rationality and the Red Queen. J Econ Theory 111:1–22
43. Rubinstein A (1982) Perfect equilibrium in a bargaining model. Econometrica 50:97–109
44. Rubinstein A (1986) Finite automata play the repeated Prisoners' Dilemma. J Econ Theory 39:83–96
45. Rubinstein A (1998) Modeling bounded rationality. MIT Press, Cambridge
46. Rubinstein A, Wolinsky A (1990) Decentralized trading, strategic behaviour and the Walrasian outcome. Rev Econ Stud 57:63–78
47. Sabourian H (2003) Bargaining and markets: Complexity and the competitive outcome. J Econ Theory 116:189–228
48. Selten R (1965) Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit. Z gesamte Staatswiss 121:301–324
49. Shaked A (1986) A three-person unanimity game. Talk given at the Los Angeles national meetings of the Institute of Management Sciences and the Operations Research Society of America. Mimeo, University of Bonn
50. Zemel E (1989) Small talk and cooperation: A note on bounded rationality. J Econ Theory 49:1–9


Genetic and Evolutionary Algorithms and Programming: General Introduction and Application to Game Playing

MICHAEL ORLOV, MOSHE SIPPER, AMI HAUPTMAN
Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel

Article Outline

Glossary
Definition of the Subject
Introduction
Evolutionary Algorithms
A Touch of Theory
Extensions of the Basic Methodology
Lethal Applications
Evolutionary Games
Future Directions
Bibliography

Glossary

Evolutionary algorithms/evolutionary computation A family of algorithms inspired by the workings of evolution by natural selection, whose basic structure is to:
1. produce an initial population of individuals, these latter being candidate solutions to the problem at hand
2. evaluate the fitness of each individual in accordance with the problem whose solution is sought
3. while termination condition not met do
   (a) select fitter individuals for reproduction
   (b) recombine (crossover) individuals
   (c) mutate individuals
   (d) evaluate fitness of modified individuals
   end while

Genome/chromosome An individual's makeup in the population of an evolutionary algorithm is known as a genome, or chromosome. It can take on many forms, including bit strings, real-valued vectors, character-based encodings, and computer programs. The representation issue – namely, defining an individual's genome (well) – is critical to the success of an evolutionary algorithm.

Fitness A measure of the quality of a candidate solution in the population. Also known as fitness function. Defining this function well is critical to the success of an evolutionary algorithm.

Selection The operator by which an evolutionary algorithm selects (usually probabilistically) higher-fitness individuals to contribute genetic material to the next generation.

Crossover One of the two main genetic operators applied by an evolutionary algorithm, wherein two (or more) candidate solutions (parents) are combined in some pre-defined manner to form offspring.

Mutation One of the two main genetic operators applied by an evolutionary algorithm, wherein one candidate solution is randomly altered.

Definition of the Subject

Evolutionary algorithms are a family of search algorithms inspired by the process of (Darwinian) evolution in nature. Common to all the different family members is the notion of solving problems by evolving an initially random population of candidate solutions, through the application of operators inspired by natural genetics and natural selection, such that in time fitter (i.e., better) solutions emerge. The field, whose origins can be traced back to the 1950s and 1960s, has come into its own over the past two decades, proving successful in solving multitudinous problems from highly diverse domains, including (to mention but a few): optimization, automatic programming, electronic-circuit design, telecommunications, networks, finance, economics, image analysis, signal processing, music, and art.

Introduction

The first approach to artificial intelligence, the field which encompasses evolutionary computation, is arguably due to Turing [31]. Turing asked the famous question: “Can machines think?” Evolutionary computation, as a subfield of AI, may be the most straightforward answer to such a question. In principle, it might be possible to evolve an algorithm possessing the functionality of the human brain (this has already happened at least once: in nature). In a sense, nature is greatly inventive. One often wonders how so many magnificent solutions to the problem of existence came to be.
From the intricate mechanisms of cellular biology to the sandy camouflage of flatfish; from the social behavior of ants to the diving speed of the peregrine falcon – nature has created versatile solutions, at varying levels, to the problem of survival. Many ingenious solutions were invented (and still are), without any obvious intelligence directly creating them. This is perhaps the main motivation behind evolutionary algorithms: creating the settings for a dynamic environment in which solutions can be created and improved in the course of time, advancing in new directions, with minimal direct intervention. The gain to problem solving is obvious.

Evolutionary Algorithms

In the 1950s and the 1960s several researchers independently studied evolutionary systems with the idea that evolution could be used as an optimization tool for engineering problems. Central to all the different methodologies is the notion of solving problems by evolving an initially random population of candidate solutions, through the application of operators inspired by natural genetics and natural selection, such that in time fitter (i.e., better) solutions emerge [9,16,19,28]. This thriving field goes by the name of evolutionary algorithms or evolutionary computation, and today it encompasses two main branches – genetic algorithms [9] and genetic programming [19] – in addition to less prominent (though important) offshoots, such as evolutionary programming [10] and evolution strategies [26].

A genetic algorithm (GA) is an iterative procedure operating on a population of individuals, each represented by a finite string of symbols, known as the genome, encoding a possible solution in a given problem space. This space, referred to as the search space, comprises all possible solutions to the problem at hand. Generally speaking, the genetic algorithm is applied to spaces which are too large to be exhaustively searched. The symbol alphabet used is often binary, but may also be character-based, real-valued, or any other representation most suitable to the problem at hand.

The standard genetic algorithm proceeds as follows: an initial population of individuals is generated at random or heuristically. At every evolutionary step, known as a generation, the individuals in the current population are decoded and evaluated according to some predefined quality criterion, referred to as the fitness, or fitness function. To form a new population (the next generation), individuals are selected according to their fitness.
Many selection procedures are available, one of the simplest being fitness-proportionate selection, where individuals are selected with a probability proportional to their relative fitness. This ensures that the expected number of times an individual is chosen is approximately proportional to its relative performance in the population. Thus, high-fitness (good) individuals stand a better chance of reproducing, while low-fitness ones are more likely to disappear.

Selection alone cannot introduce any new individuals into the population, i.e., it cannot find new points in the search space; these are generated by genetically-inspired operators, of which the most well known are crossover and mutation. Crossover is performed with probability pcross (the crossover probability, or crossover rate) between two selected individuals, called parents, by exchanging parts of their genomes (i.e., encodings) to form one or two new individuals, called offspring. In its simplest form, substrings are exchanged after a randomly selected crossover point. This operator tends to enable the evolutionary process to move toward promising regions of the search space. The mutation operator is introduced to prevent premature convergence to local optima, by randomly sampling new points in the search space. It is carried out by flipping bits at random, with some (small) probability pmut.

Genetic algorithms are stochastic iterative processes that are not guaranteed to converge. The termination condition may be specified as some fixed, maximal number of generations or as the attainment of an acceptable fitness level. Figure 1 presents the standard genetic algorithm in pseudo-code format.

[Figure 1: Pseudo-code of the standard genetic algorithm]

Let us consider the following simple example, demonstrating the GA's workings. The population consists of four individuals, which are binary-encoded strings (genomes) of length 10. The fitness value equals the number of ones in the bit string, with pcross = 0.7 and pmut = 0.05. More typical values of the population size and the genome length are in the range 50–1000. Note that fitness computation in this case is extremely simple, since no complex decoding or evaluation is necessary. The initial (randomly generated) population might look as shown in Table 1.

Table 1 The initial population

Label  Genome      Fitness
p1     0000011011  4
p2     1110111101  8
p3     0010000010  2
p4     0011010000  3

Using fitness-proportionate selection we must choose four individuals (two sets of parents), with probabilities proportional to their relative fitness values. In our example, suppose that the two parent pairs are {p2, p4} and {p1, p2} (note that individual p3 did not get selected, as our procedure is probabilistic). Once a pair of parents is selected, crossover is effected between them with probability pcross, resulting in two offspring. If no crossover is effected (with probability 1 − pcross), then the offspring are exact copies of each parent. Suppose, in our example, that crossover takes place between parents p2 and p4 at the (randomly chosen) third bit position:

111|0111101
001|1010000

This results in offspring p1' = 1111010000 and p2' = 0010111101. Suppose no crossover is performed between parents p1 and p2, forming offspring that are exact copies of p1 and p2. Our interim population (after crossover) is thus as depicted in Table 2.

Next, each of these four individuals is subject to mutation with probability pmut per bit. For example, suppose offspring p2' is mutated at the sixth position and offspring p4' is mutated at the ninth bit position. Table 3 describes the resulting population. The resulting population is that of the next generation (i.e., pi'' equals pi of the next generation). As can be seen,

Table 2 The interim population

Label  Genome      Fitness
p1'    1111010000  5
p2'    0010111101  6
p3'    0000011011  4
p4'    1110111101  8

Table 3 The resulting population

Label  Genome      Fitness
p1''   1111010000  5
p2''   0010101101  5
p3''   0000011011  4
p4''   1110111111  9

the transition from one generation to the next is through the application of selection, crossover, and mutation. Moreover, note that the best individual's fitness has gone up from eight to nine, and that the average fitness (computed over all individuals in the population) has gone up from 4.25 to 5.75. Iterating this procedure, the GA will eventually find a perfect string, i.e., one with the maximal fitness value of ten.

Another prominent branch of the evolutionary computation tree is that of genetic programming, introduced by Cramer [7], and transformed into a field in its own right in large part due to the efforts of Koza [19]. Basically, genetic programming (GP) is a GA (genetic algorithm) with individuals in the population being programs instead of bit strings. In GP we evolve a population of individual LISP expressions¹, each comprising functions and terminals. The functions are usually arithmetic and logic operators that receive a number of arguments as input and compute a result as output; the terminals are zero-argument functions that serve both as constants and as sensors, the latter being a special type of function that queries the domain environment. The main mechanism behind GP is precisely that of a GA, namely, the repeated cycling through four operations applied to the entire population: evaluate, select, crossover, mutate. However, the evaluation of a single individual in GP is usually more complex than with a GA, since it involves running a program. Moreover, crossover and mutation need to be made to work on trees (rather than simple bit strings), as shown in Fig. 2.

A Touch of Theory

Evolutionary computation is mostly an experimental field. However, over the years there have been some notable theoretical treatments of the field, gaining valuable insights into the properties of evolving populations.
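The toy example just traced can be reproduced end to end by a short program. The following is an illustrative sketch (the function names are ours), using the parameter values from the text — pcross = 0.7, pmut = 0.05, fitness = number of ones:

```python
import random

GENOME_LEN, POP_SIZE = 10, 4
P_CROSS, P_MUT = 0.7, 0.05

def fitness(genome):
    # Fitness = number of ones in the bit string, as in the example.
    return sum(genome)

def select(population):
    # Fitness-proportionate (roulette-wheel) selection of two parents;
    # the tiny epsilon guards against an all-zero population.
    weights = [fitness(g) + 1e-9 for g in population]
    return random.choices(population, weights=weights, k=2)

def crossover(a, b):
    if random.random() < P_CROSS:
        cut = random.randrange(1, GENOME_LEN)    # one-point crossover
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]                            # exact copies of the parents

def mutate(genome):
    # Flip each bit independently with probability P_MUT.
    return [bit ^ (random.random() < P_MUT) for bit in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]
for _ in range(200):                             # generation loop
    if max(fitness(g) for g in population) == GENOME_LEN:
        break                                    # perfect string found
    offspring = []
    while len(offspring) < POP_SIZE:
        p1, p2 = select(population)
        c1, c2 = crossover(p1, p2)
        offspring += [mutate(c1), mutate(c2)]
    population = offspring[:POP_SIZE]
```

With a realistic population size (50–1000, as noted above) rather than four individuals, the same loop reliably reaches the all-ones string within a few dozen generations.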
Holland [17] introduced the notion of schemata, which are abstract properties of binary-encoded individuals, and analyzed the growth of different schemata when fitness-proportionate selection, point mutation and one-point crossover are employed. Holland's approach has since been enhanced and more rigorous analysis performed; however, there have not been many practical consequences for existing evolutionary techniques, since most of the successful methods are usually much more complex in many aspects. Moreover, the schematic analysis suffers from the important approximation of an infinite population size, while in reality schemata can vanish. Note also that the No Free Lunch theorem states that “. . . for any [optimization] algorithm, any elevated performance over one class of problems is exactly paid for in performance over another class” [32].

¹ Languages other than LISP have been used, although LISP is still by far the most popular within the genetic programming domain.

[Figure 2: Genetic operators in genetic programming. LISP programs are depicted as trees. Crossover (top): two sub-trees (marked in bold) are selected from the parents and swapped. Mutation (bottom): a sub-tree (marked in bold) is selected from the parent individual and removed; a new sub-tree is grown instead.]

Extensions of the Basic Methodology

We have reviewed the basic evolutionary computation methods. More advanced techniques are used to tackle complex problems, where an approach of a single population with homogeneous individuals does not suffice. One such advanced approach is coevolution [24]. Coevolution refers to the simultaneous evolution of two or more species with coupled fitness. Such coupled evolution favors the discovery of complex solutions whenever complex solutions are required. Simplistically speaking, one can say that coevolving species can either compete (e.g., to obtain exclusivity on a limited resource) or cooperate (e.g., to gain access to some hard-to-attain resource).

In a competitive coevolutionary algorithm the fitness of an individual is based on direct competition with individuals of other species, which in turn evolve separately in their own populations. Increased fitness of one of the species

implies a diminution in the fitness of the other species. This evolutionary pressure tends to produce new strategies in the populations involved, so as to maintain their chances of survival. This arms race ideally increases the capabilities of each species until they reach an optimum. Cooperative (also called symbiotic) coevolutionary algorithms involve a number of independently evolving species which together form complex structures well suited to solving a problem. The fitness of an individual depends on its ability to collaborate with individuals from other species. In this way, the evolutionary pressure stemming from the difficulty of the problem favors the development of cooperative strategies and individuals.

Single-population evolutionary algorithms often perform poorly – manifesting stagnation, convergence to local optima, and computational costliness – when confronted with problems presenting one or more of the following features: 1) the sought-after solution is complex, 2) the problem or its solution is clearly decomposable, 3) the genome encodes different types of values, 4) there are strong interdependencies among the components of the solution, and 5) the ordering of the components drastically affects fitness [24]. Cooperative coevolution effectively addresses these issues, consequently widening the range of applications of evolutionary computation.
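In skeletal form, a cooperative coevolutionary loop looks as follows. This is a toy sketch (the decomposable objective, the population sizes, and the use of truncation selection are our own illustrative choices, not from the literature cited above): each species evolves one half of a bit-string solution and is evaluated jointly with the other species' current best individual.

```python
import random

def joint_quality(left, right):
    # Toy decomposable objective: ones in the assembled full solution.
    return sum(left) + sum(right)

def evolve_step(pop, partner):
    # Coupled fitness: score each individual by the full solution it
    # forms together with the other species' representative.
    ranked = sorted(pop, key=lambda ind: joint_quality(ind, partner),
                    reverse=True)
    survivors = ranked[: len(pop) // 2]          # truncation selection
    children = [[b ^ (random.random() < 0.1)     # bit-flip mutation
                 for b in random.choice(survivors)]
                for _ in range(len(pop) - len(survivors))]
    return survivors + children                  # refill with mutants

pop_a = [[random.randint(0, 1) for _ in range(5)] for _ in range(8)]
pop_b = [[random.randint(0, 1) for _ in range(5)] for _ in range(8)]
for _ in range(50):
    rep_a = max(pop_a, key=lambda ind: sum(ind))  # species representatives
    rep_b = max(pop_b, key=lambda ind: sum(ind))
    pop_a = evolve_step(pop_a, rep_b)
    pop_b = evolve_step(pop_b, rep_a)
```

An improvement in one population raises the bar for the other through the shared evaluation; this is the dynamic-fitness effect described above for cooperating species.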

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing

Consider, for instance, the evolution of neural networks [33]. A neural network consists of simple units called neurons, each having several inputs and a single output. The inputs are assigned weights, and a weighted sum of the inputs exceeding a certain threshold causes the neuron to fire an output signal. Neurons are usually connected using a layered topology. When we approach the task of evolving a neural network possessing some desired property naively, we will probably think of some linearized representation of a neural network, encoding both the neuron locations in the network and their weights. However, evolving such a network with a simple evolutionary algorithm might prove quite a frustrating task, since much information is encoded in each individual, and it is not homogeneous, which presents us with the difficult target of evolving the individuals as single entities. On the other hand, this task can be dealt with more sagely by evolving two independently encoded populations of neurons and network topologies. Stanley and Miikkulainen [30] evaluate the fitness of an individual in one of the populations using the individuals of the other. In addition to the simplification of individuals in each population, the fitness is now dynamic, and an improvement in the evolution of topologies triggers a corresponding improvement in the population of neurons, and vice versa.

Lethal Applications

In this section we review a number of applications that – though possibly not killer (death being in the eye of the beholder...) – are most certainly lethal. These come from a sub-domain of evolutionary algorithms which has been gaining momentum over the past few years: human-competitive machine intelligence. Koza et al. [20] recently affirmed that the field of evolutionary algorithms "now routinely delivers high-return human-competitive machine intelligence", meaning, according to [20]:

– Human-competitive: Getting machines to produce human-like results, e.g., a patentable invention, a result publishable in the scientific literature, or a game strategy that can hold its own against humans.
– High-return: Defined by Koza et al. as a high artificial-to-intelligence ratio (A/I), namely, the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular system.
– Routine: The successful handling of new problems once the method has been jump-started.

– Machine intelligence: To quote Arthur Samuel, getting "machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence."

Indeed, as of 2004 the major annual event in the field of evolutionary algorithms – GECCO (Genetic and Evolutionary Computation Conference; see www.sigevo.org) – boasts a prestigious competition that awards prizes to human-competitive results. As noted at www.human-competitive.org: "Techniques of genetic and evolutionary computation are being increasingly applied to difficult real-world problems – often yielding results that are not merely interesting, but competitive with the work of creative and inventive humans." We now describe some winners of the HUMIES competition at www.human-competitive.org.

Lohn et al. [22] won a Gold Medal in the 2004 competition for an evolved X-band antenna design and flight prototype to be deployed on NASA's Space Technology 5 (ST5) spacecraft: The ST5 antenna was evolved to meet a challenging set of mission requirements, most notably the combination of wide beamwidth for a circularly-polarized wave and wide bandwidth. Two evolutionary algorithms were used: one used a genetic algorithm style representation that did not allow branching in the antenna arms; the second used a genetic programming style tree-structured representation that allowed branching in the antenna arms. The highest performance antennas from both algorithms were fabricated and tested, and both yielded very similar performance. Both antennas were comparable in performance to a hand-designed antenna produced by the antenna contractor for the mission, and so we consider them examples of human-competitive performance by evolutionary algorithms [22].

Preble et al. [25] won a Gold Medal in the 2005 competition for designing photonic crystal structures with large band gaps. Their result is "an improvement of 12.5% over the best human design using the same index contrast platform." Recently, Kilinç et al. [18] were awarded the Gold Medal in the 2006 competition for designing oscillators using evolutionary algorithms, the evolved oscillators possessing characteristics surpassing those of existing human-designed analogs.

Evolutionary Games

Evolutionary games is the application of evolutionary algorithms to the evolution of game-playing strategies

for various games, including chess, backgammon, and Robocode.

Motivation and Background

Ever since the dawn of artificial intelligence in the 1950s, games have been part and parcel of this lively field. In 1957, a year after the Dartmouth Conference that marked the official birth of AI, Alex Bernstein designed a program for the IBM 704 that played two amateur games of chess. In 1958, Allen Newell, J. C. Shaw, and Herbert Simon introduced a more sophisticated chess program (beaten in thirty-five moves by a ten-year-old beginner in its last official game played in 1960). Arthur L. Samuel of IBM spent much of the fifties working on game-playing AI programs, and by 1961 he had a checkers program that could play rather decently. In 1961 and 1963 Donald Michie described a simple trial-and-error learning system for learning how to play Tic-Tac-Toe (or Noughts and Crosses) called MENACE (for Matchbox Educable Noughts and Crosses Engine). These are but examples of highly popular games that have been treated by AI researchers since the field's inception. Why study games? This question was answered by Susan L. Epstein, who wrote:

There are two principal reasons to continue to do research on games. ... First, human fascination with game playing is long-standing and pervasive. Anthropologists have catalogued popular games in almost every culture. ... Games intrigue us because they address important cognitive functions. ... The second reason to continue game-playing research is that some difficult games remain to be won, games that people play very well but computers do not. These games clarify what our current approach lacks. They set challenges for us to meet, and they promise ample rewards [8].

Studying games may thus advance our knowledge in both cognition and artificial intelligence, and, last but not least, games possess a competitive angle which coincides with our human nature, thus motivating both researcher and student alike. Even more strongly, Laird and van Lent [21] proclaimed that,

... interactive computer games are the killer application for human-level AI. They are the application that will soon need human-level AI, and they can provide the environments for research on the right kinds of problems that lead to the type of the incremental and integrative research needed to achieve human-level AI [21].

Evolving Game-Playing Strategies

Recently, evolutionary algorithms have proven a powerful tool that can automatically design successful game-playing strategies for complex games [2,3,13,14,15,27,29].

1. Chess (endgames): Evolve a player able to play endgames [13,14,15,29]. While endgames typically contain but a few pieces, the problem of evaluation is still hard, as the pieces are usually free to move all over the board, resulting in complex game trees – both deep and with high branching factors. Indeed, in the chess lore much has been said and written about endgames.
2. Backgammon: Evolve a full-fledged player for the non-doubling-cube version of the game [2,3,29].
3. Robocode: A simulation-based game in which robotic tanks fight to destruction in a closed arena (robocode.alphaworks.ibm.com). The programmers implement their robots in the Java programming language, and can test their creations either by using a graphical environment in which battles are held, or by submitting them to a central web site where online tournaments regularly take place. Our goal here has been to evolve Robocode players able to rank high in the international league [27,29].

A strategy for a given player in a game is a way of specifying which choice the player is to make at every point in the game from the set of allowable choices at that point, given all the information that is available to the player at that point [19]. The problem of discovering a strategy for playing a game can thus be viewed as one of seeking a computer program. Depending on the game, the program might take as input the entire history of past moves or just the current state of the game. The desired program then produces the next move as output. For some games one might evolve a complete strategy that addresses every situation encountered. This proved to work well with Robocode, which is a dynamic game, with relatively few parameters and little need for past history. Another approach is to couple a current-state evaluator (e.g., a board evaluator) with a next-move generator.
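The evaluator-plus-generator coupling can be sketched generically: generate every legal successor state, score each with the (evolved) evaluator, and pick a best-scoring move, breaking ties stochastically. The interface names below are illustrative assumptions, not the article's actual code.

```python
import random

def choose_move(position, legal_moves, apply_move, evaluate, rng=None):
    """Single-level lookahead sketch: score every successor position with an
    evaluation function and return a highest-scoring move; ties are broken
    stochastically."""
    rng = rng or random.Random()
    scored = [(evaluate(apply_move(position, mv)), mv) for mv in legal_moves]
    best_score = max(score for score, _ in scored)
    return rng.choice([mv for score, mv in scored if score == best_score])

# toy usage: positions are integers, a move adds its value, and the
# evaluator prefers positions near 2 -- so move 2 is selected
move = choose_move(0, [1, 2, 3], lambda p, m: p + m, lambda p: -abs(p - 2))
```

In the chess setting described below, `position` would be an endgame board, `legal_moves` the output of a legal-move generator, and `evaluate` the evolved individual.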
One can go on to create a minimax tree, which consists of all possible moves, counter-moves, counter-counter-moves, and so on; for real-life games, such a tree's size quickly becomes prohibitive. The approach we used with backgammon and chess is to derive a very shallow, single-level tree, and evolve smart evaluation functions. Our artificial player is thus obtained by combining an evolved board

evaluator with a simple program that generates all next-move boards (such programs can easily be written for backgammon and chess). In what follows we describe the six items necessary in order to employ genetic programming: program architecture, set of terminals, set of functions, fitness measure, control parameters, and manner of designating the result and terminating the run.

Example: Chess

As our purpose is to create a schema-based program that analyzes single nodes thoroughly, in a way reminiscent of human thinking, we did not perform deep lookahead. We evolved individuals represented as LISP programs. Each such program receives a chess endgame position as input and, according to its sensors (terminals) and functions, returns an evaluation of the board, in the form of a real number. Our chess endgame players consist of an evolved LISP program, together with a piece of software that generates all possible (legal) next moves and feeds them to the program. The next move with the highest score is selected (ties are broken stochastically). The player also identifies when the game is over (either by a draw or a win).

Program Architecture

As most chess players would agree, playing a winning position (e.g., with material advantage) is very different from playing a losing position, or an even one. For this reason, each individual contains not one but three separate trees: an advantage tree, an even tree, and a disadvantage tree. These trees are used according to the current status of the board. The disadvantage tree is smaller, since achieving a stalemate and avoiding exchanges requires less complicated reasoning. Most terminals and functions were used in all trees. The structure of three trees per individual was preserved mainly for reasons of simplicity. It is actually possible to coevolve three separate populations of trees, without binding them to form a single individual before the end of the experiment.
This would require a different experimental setting, and is one of our future-work ideas.

Terminals and Functions

While evaluating a position, an expert chess player considers various aspects of the board. Some are simple, while others require a deep understanding of the game. Chase and Simon found that experts recalled meaningful chess formations better than novices [6]. This led them to hypothesize that chess skill depends on a large knowledge base, indexed through thousands of familiar chess patterns.

We assumed that complex aspects of the game board are comprised of simpler units, which require less game knowledge and are to be combined in some way. Our chess programs use terminals, which represent those relatively simple aspects, and functions, which incorporate no game knowledge but supply methods of combining those aspects. As we used strongly typed GP [23], all functions and terminals were assigned one or more of two data types: Float and Boolean. We also included a third data type, named Query, which could be used as either of the former two. We also used ephemeral random constants (ERCs).

The Terminal Set

We developed most of our terminals by consulting several high-ranking chess players. The terminal set examined various aspects of the chessboard, and may be divided into three groups:

Float values, created using the ERC mechanism. ERCs were chosen at random to be one of the following six values: ±1 · {1/2, 1/3, 1/4} · MAX (where MAX was empirically set to 1000), and the inverses of these numbers. This guaranteed that when a value was returned after some group of features had been identified, it was distinct enough to engender the outcome.

Simple terminals, which analyzed relatively simple aspects of the board, such as the number of possible moves for each king, and the number of attacked pieces for each player. These terminals were derived by breaking relatively complex aspects of the board into simpler notions. More complex terminals belonged to the next group (see below). For example, a player should capture his opponent's piece if it is not sufficiently protected, meaning that the number of attacking pieces the player controls is greater than the number of pieces protecting the opponent's piece, and the material value of the defending pieces is equal to or greater than the player's. Adjudicating these considerations is not simple, and therefore a terminal that performs this entire computational feat by itself belongs to the next group of complex terminals.
The simple terminals comprising this second group were derived by refining the logical resolution of the previous paragraph's reasoning: Is an opponent's piece attacked? How many of the player's pieces are attacking that piece? How many pieces are protecting a given opponent's piece? What is the material value of pieces attacking and defending a given opponent's piece? All these questions were embodied as terminals within the second group. The ability to easily embody such reasoning within the GP setup, as functions and terminals, is a major asset of GP. (The highest-ranking player we consulted was Boris Gutkin, ELO 2400, International Master, and a fully qualified chess teacher.)
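Read literally, the ERC definition above yields constants of the form ±MAX/2, ±MAX/3, ±MAX/4 and their inverses. The sketch below rests on that assumed reading; the article does not give the actual sampling procedure.

```python
import random

MAX = 1000.0  # empirically set magnitude, per the text

def make_erc(rng):
    """Ephemeral random constant sketch: the value is sampled once, when the
    tree node is created, and frozen thereafter -- evolution may copy or
    delete the node but never re-samples it."""
    magnitude = rng.choice([MAX / 2, MAX / 3, MAX / 4])
    magnitude = rng.choice([magnitude, 1.0 / magnitude])  # "the inverses of these numbers"
    value = rng.choice([1.0, -1.0]) * magnitude
    return lambda board=None: value  # a Float terminal: ignores the board, returns a constant

erc = make_erc(random.Random(7))
```

Because the magnitudes are widely spaced, a returned value identifies fairly unambiguously which feature group triggered it, matching the "distinct enough to engender the outcome" rationale above.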


Other terminals were also derived in a similar manner. See Table 4 for a complete list of simple terminals. Note that some of the terminals are inverted – we would like terminals always to return positive (or true) values, since these values represent a favorable position. This is why we used, for example, a terminal evaluating the player's king's distance from the edges of the board (generally a favorable feature for endgames), while using a terminal evaluating the proximity of the opponent's king to the edges (again, a positive feature).

Complex terminals: these are terminals that check the same aspects of the board a human player would. Some prominent examples include: the terminal OppPieceCanBeCaptured, considering the capture of a piece; checking if the current position is a draw, a mate, or a stalemate (especially important for non-even boards); checking if there is a mate in one or two moves (this is the most complex terminal); the material value of the position; and comparing the material value of the position to that of the original board – this is important since it is easier to consider change than to evaluate the board in an absolute manner. See Table 5 for a full list of complex terminals.

Since some of these terminals are hard to compute, and most appear more than once in the individual's trees, we used a memoization scheme to save time [1]: after the first calculation of each terminal, the result is stored, so that further calls to the same terminal (on the same board) do not repeat the calculation. Memoization greatly reduced the evolutionary run time.

The Function Set

The function set included the If function and simple Boolean functions. Although our tree returns a real number, we omitted arithmetic functions, for several reasons. First, a large part of contemporary research in the field of machine learning and game theory (in particular for perfect-information games) revolves around inducing logical rules for learning games (for example, see [4,5,11]). Second, according to the players we consulted, while evaluating positions involves considering various aspects of the board, some more important than others, performing logical operations on these aspects seems natural, while performing mathematical operations does not. Third, we observed that numeric functions sometimes returned extremely large values, which interfered with subtle calculations. Therefore the scheme we used was a (carefully ordered) series of Boolean queries, each returning a fixed value (either an ERC or a numeric terminal; see below). See Table 6 for the complete list of functions.

Fitness Evaluation

As we used a competitive evaluation scheme, the fitness of an individual was determined by its

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing, Table 4: Simple terminals for evolving chess endgame players (Opp: opponent, My: player)

B=NotMyKingInCheck(): Is the player's king not being checked?
B=IsOppKingInCheck(): Is the opponent's king being checked?
F=MyKingDistEdges(): The player's king's distance from the edges of the board
F=OppKingProximityToEdges(): The opponent's king's proximity to the edges of the board
F=NumMyPiecesNotAttacked(): The number of the player's pieces that are not attacked
F=NumOppPiecesAttacked(): The number of the opponent's attacked pieces
F=ValueMyPiecesAttacking(): The material value of the player's pieces which are attacking
F=ValueOppPiecesAttacking(): The material value of the opponent's pieces which are attacking
B=IsMyQueenNotAttacked(): Is the player's queen not attacked?
B=IsOppQueenAttacked(): Is the opponent's queen attacked?
B=IsMyFork(): Is the player creating a fork?
B=IsOppNotFork(): Is the opponent not creating a fork?
F=NumMovesMyKing(): The number of legal moves for the player's king
F=NumNotMovesOppKing(): The number of illegal moves for the opponent's king
F=MyKingProxRook(): Proximity of the player's king and rook(s)
F=OppKingDistRook(): Distance between the opponent's king and rook(s)
B=MyPiecesSameLine(): Are two or more of the player's pieces protecting each other?
B=OppPiecesNotSameLine(): Are two or more of the opponent's pieces protecting each other?
B=IsOppKingProtectingPiece(): Is the opponent's king protecting one of his pieces?
B=IsMyKingProtectingPiece(): Is the player's king protecting one of his pieces?

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing, Table 5: Complex terminals for evolving chess endgame players (Opp: opponent, My: player). Some of these terminals perform lookahead, while others compare with the original board

F=EvaluateMaterial(): The material value of the board
B=IsMaterialIncrease(): Did the player capture a piece?
B=IsMate(): Is this a mate position?
B=IsMateInOne(): Can the opponent mate the player after this move?
B=OppPieceCanBeCaptured(): Is it possible to capture one of the opponent's pieces without retaliation?
B=MyPieceCannotBeCaptured(): Is it not possible to capture one of the player's pieces without retaliation?
B=IsOppKingStuck(): Do all legal moves for the opponent's king advance it closer to the edges?
B=IsMyKingNotStuck(): Is there a legal move for the player's king that advances it away from the edges?
B=IsOppKingBehindPiece(): Is the opponent's king two or more squares behind one of his pieces?
B=IsMyKingNotBehindPiece(): Is the player's king not two or more squares behind one of his pieces?
B=IsOppPiecePinned(): Is one or more of the opponent's pieces pinned?
B=IsMyPieceNotPinned(): Are all the player's pieces not pinned?

success against its peers. We used the random-two-ways method, in which each individual plays against a fixed number of randomly selected peers. Each of these encounters entailed a fixed number of games, each starting from a randomly generated position in which no piece was attacked. The score for each game was derived from the outcome of the game. Players that managed to mate their opponents received more points than those that achieved only a material advantage. Draws were rewarded by a score of low value and losses entailed no points at all. The final fitness for each player was the sum of all points earned in the entire tournament for that generation.

Control Parameters and Run Termination

We used the standard reproduction, crossover, and mutation operators. The major parameters were: population size – 80, generation count – between 150 and 250, reproduction probability – 0.35, crossover probability – 0.5, and mutation probability – 0.15 (including ERC).
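The random-two-ways tournament can be sketched as follows. The concrete point values and the `play_game` interface are assumptions; the article fixes only their ordering (mate > material advantage > draw > loss).

```python
import random

POINTS = {"mate": 3.0, "advantage": 2.0, "draw": 0.5, "loss": 0.0}  # assumed values

def tournament_fitness(population, play_game, rng, n_opponents=5, n_games=3):
    """Competitive fitness sketch: each individual meets a fixed number of
    randomly selected peers, playing a fixed number of games per encounter;
    its fitness is the sum of points over all its games that generation."""
    fitness = [0.0] * len(population)
    for i, ind in enumerate(population):
        peers = [j for j in range(len(population)) if j != i]
        for j in rng.sample(peers, n_opponents):
            for _ in range(n_games):
                outcome = play_game(ind, population[j])  # "mate"/"advantage"/"draw"/"loss"
                fitness[i] += POINTS[outcome]
    return fitness
```

Note that fitness here is relative: a player's score depends on which peers it happens to meet, so fitness values are comparable within a generation but not across generations.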

Results

We pitted our top evolved chess-endgame players against two very strong external opponents: 1) a program we wrote ('Master'), based upon consultation with several high-ranking chess players (the highest being Boris Gutkin, ELO 2400, International Master); and 2) CRAFTY – a world-class chess program, which finished second in the 2004 World Computer Speed Chess Championship (www.cs.biu.ac.il/games/). Speed chess (blitz) involves a time limit per move, which we imposed both on CRAFTY and on our players. We thus sought to evolve players that are not only good, but that play both well and fast. Results are shown in Table 7. As can be seen, GP-EndChess manages to hold its own, and even win, against these top players. For more details on GP-EndChess see [13,29].

Deeper analysis of the strategies developed [12] revealed several important shortcomings, most of which stemmed from the fact that they used deep knowledge and little search (typically, they developed only one level of the search tree). Simply increasing the search depth would not solve the problem, since the evolved programs examine

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing, Table 6: Function set of the GP chess player individual (B: Boolean, F: Float)

F=If3(B1, F1, F2): If B1 is non-zero, return F1, else return F2
B=Or2(B1, B2): Return 1 if at least one of B1, B2 is non-zero, 0 otherwise
B=Or3(B1, B2, B3): Return 1 if at least one of B1, B2, B3 is non-zero, 0 otherwise
B=And2(B1, B2): Return 1 only if B1 and B2 are non-zero, 0 otherwise
B=And3(B1, B2, B3): Return 1 only if B1, B2, and B3 are non-zero, 0 otherwise
B=Smaller(B1, B2): Return 1 if B1 is smaller than B2, 0 otherwise
B=Not(B1): Return 0 if B1 is non-zero, 1 otherwise
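The functions in Table 6 translate almost one-for-one into code. A sketch (the names mirror the table; the surrounding GP machinery, tree creation, and type checking are omitted):

```python
def If3(b1, f1, f2):
    """If B1 is non-zero, return F1, else return F2."""
    return f1 if b1 != 0 else f2

def Or2(b1, b2):
    """Return 1 if at least one of B1, B2 is non-zero, 0 otherwise."""
    return 1 if b1 != 0 or b2 != 0 else 0

def And2(b1, b2):
    """Return 1 only if B1 and B2 are non-zero, 0 otherwise."""
    return 1 if b1 != 0 and b2 != 0 else 0

def Smaller(b1, b2):
    """Return 1 if B1 is smaller than B2, 0 otherwise."""
    return 1 if b1 < b2 else 0

def Not(b1):
    """Return 0 if B1 is non-zero, 1 otherwise."""
    return 0 if b1 != 0 else 1

# An evolved tree is then a nested expression over these primitives and the
# terminals, e.g. (a hypothetical fragment, not an actual evolved player):
#   If3(And2(is_opp_king_in_check, num_opp_pieces_attacked), 500.0, 0.25)
```

Because every query resolves to a fixed Float (an ERC or a numeric terminal), the whole tree reduces to the carefully ordered cascade of Boolean tests described above.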

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing, Table 7: Percent of wins, advantages, and draws for best GP-EndChess player in tournament against two top competitors

Opponent  %Wins  %Advs  %Draws
Master    6.00   2.00   68.00
CRAFTY    2.00   4.00   72.00

each board very thoroughly, and scanning many boards would increase time requirements prohibitively. And so we turned to evolution to find an optimal way to overcome this problem: how to add more search at the expense of less knowledgeable (and thus less time-consuming) node evaluators, while attaining better performance. In [15] we evolved the search algorithm itself, focusing on the Mate-in-N problem: find a key move such that even with the best possible counter-plays, the opponent cannot avoid being mated in (or before) move N. We showed that our evolved search algorithms successfully solve several instances of the Mate-in-N problem, for the hardest ones developing 47% fewer game-tree nodes than CRAFTY. Improvement is thus not over the basic alpha-beta algorithm, but over a world-class program using all standard enhancements [15].

Finally, in [14], we examined a strong evolved chess-endgame player, focusing on the player's emergent capabilities and tactics in the context of a chess match. Using a number of methods we analyzed the evolved player's building blocks and their effect on play level. We concluded that evolution has found combinations of building blocks that are far from trivial and cannot be explained through simple combination – thereby indicating the possible emergence of complex strategies.

Example: Robocode

Program Architecture

A Robocode player is written as an event-driven Java program. A main loop controls the tank activities, which can be interrupted on various occasions, called events. The program is limited to four lines of code, as we were aiming for the HaikuBot category, one of the divisions of the international league, which imposes a four-line code limit. The main loop contains one line of code that directs the robot to start turning the gun (and the mounted radar) to the right. This ensures that within the first gun cycle, an enemy tank will be spotted by the radar, triggering a ScannedRobotEvent.
Within the code for this event, three additional lines of code were added, each controlling a single actuator, and using a single numerical input that was supplied by a genetic programming-evolved sub-program. The first line instructs the tank to move to a distance specified by the first evolved argument. The second line instructs the tank to turn to an azimuth specified by the second evolved argument. The third line instructs the gun (and radar) to turn to an azimuth specified by the third evolved argument (Fig. 3).

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing, Figure 3: Robocode player's code layout (HaikuBot division)

Terminal and Function Sets

We divided the terminals into three groups according to their functionality [27], as shown in Table 8:

1. Game-status indicators: A set of terminals that provide real-time information on the game status, such as last enemy azimuth, current tank position, and energy levels.
2. Numerical constants: Two terminals, one providing the constant 0, the other being an ERC (ephemeral random constant). This latter terminal is initialized to a random real numerical value in the range [-1, 1], and does not change during evolution.
3. Fire command: This special function is used to save one line of code by not implementing the fire actuator in a dedicated line.

Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing, Table 8: Robocode representation. a Terminal set. b Function set (F: Float)

a) Terminal set:
Energy(): Returns the remaining energy of the player
Heading(): Returns the current heading of the player
X(): Returns the current horizontal position of the player
Y(): Returns the current vertical position of the player
MaxX(): Returns the horizontal battlefield dimension
MaxY(): Returns the vertical battlefield dimension
EnemyBearing(): Returns the current enemy bearing, relative to the current player's heading
EnemyDistance(): Returns the current distance to the enemy
EnemyVelocity(): Returns the current enemy's velocity
EnemyHeading(): Returns the current enemy heading, relative to the current player's heading
EnemyEnergy(): Returns the remaining energy of the enemy
Constant(): An ERC (ephemeral random constant) in the range [-1, 1]
Random(): Returns a random real number in the range [-1, 1]
Zero(): Returns the constant 0

b) Function set:
Add(F, F): Add two real numbers
Sub(F, F): Subtract two real numbers
Mul(F, F): Multiply two real numbers
Div(F, F): Divide first argument by second, if denominator non-zero; otherwise return zero
Abs(F): Absolute value
Neg(F): Negative value
Sin(F): Sine function
Cos(F): Cosine function
ArcSin(F): Arcsine function
ArcCos(F): Arccosine function
IfGreater(F, F, F, F): If first argument greater than second, return value of third argument, else return value of fourth argument
IfPositive(F, F, F): If first argument is positive, return value of second argument, else return value of third argument
Fire(F): If argument is positive, execute fire command with argument as firepower and return 1; otherwise, do nothing and return 0

Fitness Measure

We explored two different modes of learning: using a fixed external opponent as teacher, and coevolution – letting the individuals play against each other; the former proved better. However, performance was measured not against a single external opponent but against three, these adversaries being downloaded from the HaikuBot league (robocode.yajags.com). The fitness value of an individual equals its average fractional score (over three battles).

Control Parameters and Run Termination

The major evolutionary parameters [19] were: population size – 256, generation count – between 100 and 200, selection method – tournament, reproduction probability – 0, crossover probability – 0.95, and mutation probability – 0.05. An evolutionary run terminates when fitness is observed to level off. Since the game is highly nondeterministic, a lucky individual might attain a higher fitness value than better overall individuals. In order to obtain a more

accurate measure for the evolved players, we let each of them do battle for 100 rounds against 12 different adversaries (one at a time). The results were used to extract the top player – to be submitted to the international league.

Results

We submitted our top player to the HaikuBot division of the international league. At its very first tournament it came in third, later climbing to first place of 28 (robocode.yajags.com/20050625/haiku-1v1.html). All other 27 programs, defeated by our evolved strategy, were written by humans. For more details on GP-Robocode see [27,29].

Backgammon: Major Results

We pitted our top evolved backgammon players against Pubeval, a free, public-domain board evaluation function written by Tesauro. The program – which plays well – has become the de facto yardstick used by the growing community of backgammon-playing program developers. Our top evolved player was able to attain a win percentage of 62.4% in a tournament against Pubeval, about 10% higher (!) than the previous top method. Moreover, several evolved strategies were able to surpass the 60% mark, and most of them outdid all previous works. For more details on GP-Gammon see [2,3,29].

Future Directions

Evolutionary computation is a fast-growing field. As shown above, difficult, real-world problems are being tackled on a daily basis, both in academia and in industry. In the future we expect major developments in the underlying theory. Partly spurred by these, we also expect major new application areas to succumb to evolutionary algorithms, and many more human-competitive results. Expecting such pivotal breakthroughs may perhaps seem a bit of an overreach, but one must always keep in mind evolutionary computation's success in nature.

Bibliography

1. Abelson H, Sussman GJ, Sussman J (1996) Structure and Interpretation of Computer Programs, 2nd edn. MIT Press, Cambridge
2. Azaria Y, Sipper M (2005) GP-Gammon: Genetically programming backgammon players.
Genet Program Evolvable Mach 6(3):283–300. doi:10.1007/s10710-005-2990-0 3. Azaria Y, Sipper M (2005) GP-Gammon: Using genetic programming to evolve backgammon players. In: Keijzer M, Tettamanzi A, Collet P, van Hemert J, Tomassini M (eds) Proceedings of 8th European Conference on Genetic Programming (EuroGP2005). Lecture Notes in Computer Science, vol 3447. Springer, Heidelberg, pp 132–142. doi:10.1007/b107383


Genetic and Evolutionary Algorithms and Programming: General Introduction and Appl. to Game Playing

4. Bain M (1994) Learning logical exceptions in chess. PhD thesis, University of Strathclyde, Glasgow, Scotland. citeseer.ist.psu.edu/bain94learning.html
5. Bonanno G (1989) The logic of rational play in games of perfect information. Papers 347, California Davis – Institute of Governmental Affairs. http://ideas.repec.org/p/fth/caldav/347.html
6. Charness N (1991) Expertise in chess: The balance between knowledge and search. In: Ericsson KA, Smith J (eds) Toward a General Theory of Expertise: Prospects and Limits. Cambridge University Press, Cambridge
7. Cramer NL (1985) A representation for the adaptive generation of simple sequential programs. In: Grefenstette JJ (ed) Proceedings of the 1st International Conference on Genetic Algorithms. Lawrence Erlbaum Associates, Mahwah, pp 183–187
8. Epstein SL (1999) Game playing: The next moves. In: Proceedings of the Sixteenth National Conference on Artificial Intelligence. AAAI Press, Menlo Park, pp 987–993
9. Fogel DB (2006) Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, 3rd edn. Wiley-IEEE Press, Hoboken
10. Fogel LJ, Owens AJ, Walsh MJ (1966) Artificial Intelligence Through Simulated Evolution. Wiley, New York
11. Fürnkranz J (1996) Machine learning in computer chess: The next generation. Int Comput Chess Assoc J 19(3):147–161. citeseer.ist.psu.edu/furnkranz96machine.html
12. Hauptman A, Sipper M (2005) Analyzing the intelligence of a genetically programmed chess player. In: Late Breaking Papers at the 2005 Genetic and Evolutionary Computation Conference, distributed on CD-ROM at GECCO-2005, Washington DC
13. Hauptman A, Sipper M (2005) GP-EndChess: Using genetic programming to evolve chess endgame players. In: Keijzer M, Tettamanzi A, Collet P, van Hemert J, Tomassini M (eds) Proceedings of 8th European Conference on Genetic Programming (EuroGP2005). Lecture Notes in Computer Science, vol 3447. Springer, Heidelberg, pp 120–131. doi:10.1007/b107383
14. Hauptman A, Sipper M (2007) Emergence of complex strategies in the evolution of chess endgame players. Adv Complex Syst 10(1):35–59. doi:10.1142/s0219525907001082
15. Hauptman A, Sipper M (2007) Evolution of an efficient search algorithm for the mate-in-n problem in chess. In: Ebner M, O'Neill M, Ekárt A, Vanneschi L, Esparcia-Alcázar AI (eds) Proceedings of 10th European Conference on Genetic Programming (EuroGP2007). Lecture Notes in Computer Science, vol 4455. Springer, Heidelberg, pp 78–89. doi:10.1007/978-3-540-71605-1_8
16. Holland JH (1975) Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. University of Michigan Press, Ann Arbor (2nd edn. MIT Press, Cambridge, 1992)
17. Holland JH (1992) Adaptation in Natural and Artificial Systems, 2nd edn. MIT Press, Cambridge

18. Kilinç S, Jain V, Aggarwal V, Cam U (2006) Catalogue of variable frequency and single-resistance-controlled oscillators employing a single differential difference complementary current conveyor. Frequenz: J RF-Eng Telecommun 60(7–8):142–146
19. Koza JR (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge
20. Koza JR, Keane MA, Streeter MJ, Mydlowec W, Yu J, Lanza G (2003) Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Kluwer, Norwell
21. Laird JE, van Lent M (2000) Human-level AI's killer application: Interactive computer games. In: AAAI-00: Proceedings of the 17th National Conference on Artificial Intelligence. MIT Press, Cambridge, pp 1171–1178
22. Lohn JD, Hornby GS, Linden DS (2005) An evolved antenna for deployment on NASA's Space Technology 5 mission. In: O'Reilly UM, Yu T, Riolo R, Worzel B (eds) Genetic Programming Theory and Practice II, Genetic Programming, vol 8, chap 18. Springer, pp 301–315. doi:10.1007/0-387-23254-0_18
23. Montana DJ (1995) Strongly typed genetic programming. Evol Comput 3(2):199–230. doi:10.1162/evco.1995.3.2.199
24. Peña-Reyes CA, Sipper M (2001) Fuzzy CoCo: A cooperative-coevolutionary approach to fuzzy modeling. IEEE Trans Fuzzy Syst 9(5):727–737. doi:10.1109/91.963759
25. Preble S, Lipson M, Lipson H (2005) Two-dimensional photonic crystals designed by evolutionary algorithms. Appl Phys Lett 86(6):061111. doi:10.1063/1.1862783
26. Schwefel HP (1995) Evolution and Optimum Seeking. Wiley, New York
27. Shichel Y, Ziserman E, Sipper M (2005) GP-Robocode: Using genetic programming to evolve robocode players. In: Keijzer M, Tettamanzi A, Collet P, van Hemert J, Tomassini M (eds) Genetic Programming: 8th European Conference, EuroGP 2005, Lausanne, Switzerland, March 30–April 1, 2005. Lecture Notes in Computer Science, vol 3447. Springer, Berlin, pp 143–154. doi:10.1007/b107383
28. Sipper M (2002) Machine Nature: The Coming Age of Bio-Inspired Computing. McGraw-Hill, New York
29. Sipper M, Azaria Y, Hauptman A, Shichel Y (2007) Designing an evolutionary strategizing machine for game playing and beyond. IEEE Trans Syst Man Cybern Part C: Appl Rev 37(4):583–593
30. Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evol Comput 10(2):99–127. doi:10.1162/106365602320169811
31. Turing AM (1950) Computing machinery and intelligence. Mind 59(236):433–460. http://links.jstor.org/sici?sici=0026-4423(195010)2:59:2362.0.CO;2-5
32. Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82. doi:10.1109/4235.585893
33. Yao X (1999) Evolving artificial neural networks. Proc IEEE 87(9):1423–1447. doi:10.1109/5.784219

Genetic-Fuzzy Data Mining Techniques

Tzung-Pei Hong1, Chun-Hao Chen2, Vincent S. Tseng2
1 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
2 Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan

Article Outline

Glossary
Definition of the Subject
Introduction
Data Mining
Fuzzy Sets
Fuzzy Data Mining
Genetic Algorithms
Genetic-Fuzzy Data Mining Techniques
Future Directions
Bibliography

Glossary

Data mining  Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Common techniques include mining association rules, mining sequential patterns, clustering, and classification, among others.

Fuzzy set theory  Fuzzy set theory was first proposed by Zadeh in 1965. It is primarily concerned with quantifying and reasoning using natural language, in which words can have ambiguous meanings. It is widely used in a variety of fields because of its simplicity and similarity to human reasoning.

Fuzzy data mining  The concept of fuzzy sets can be used in data mining to handle quantitative or linguistic data. Basically, fuzzy data mining first uses membership functions to transform each quantitative value into a fuzzy set in linguistic terms and then uses a fuzzy mining process to find fuzzy association rules.

Genetic algorithms  Genetic algorithms (GAs) were first proposed by Holland in 1975. They have become increasingly important to researchers solving difficult problems because they can provide feasible solutions in a limited amount of time. Each possible solution is encoded as a chromosome (individual) in a population.

According to the principle of survival of the fittest, GAs generate the next population by genetic operations such as crossover, mutation, and reproduction.

Genetic-fuzzy data mining  Genetic algorithms have been widely used for solving optimization problems. If a fuzzy mining problem can be converted into an optimization problem, then GA techniques can easily be adopted to solve it; the resulting methods are called genetic-fuzzy data-mining techniques. They are usually used to automatically mine both appropriate membership functions and fuzzy association rules from a set of transaction data.

Definition of the Subject

Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values. However, transactions with quantitative values are commonly seen in real-world applications. Fuzzy data-mining algorithms have thus been proposed for extracting interesting linguistic knowledge from transactions stored as quantitative values. They usually integrate fuzzy-set concepts and mining algorithms to find interesting fuzzy knowledge from a given transaction data set. Most of them mine fuzzy knowledge under the assumption that a set of membership functions [8,23,24,35,36,50] is known in advance for the problem to be solved. The given membership functions may, however, have a critical influence on the final mining results: different membership functions may infer different knowledge. Automatically deriving an appropriate set of membership functions for a fuzzy mining problem is thus very important, for at least two reasons. First, a set of appropriate membership functions may not be definable by experts, because doing so requires much time and money, and experts are not always available. Second, data and concepts change over time, so some mechanism is needed to automatically adapt the membership functions to such changes when necessary. The fuzzy mining problem can thus be extended to finding both appropriate membership functions and fuzzy association rules from a set of transaction data. Recently, genetic algorithms have been widely used for solving optimization problems. If the fuzzy mining problem can be converted into an optimization problem, then the GA techniques can easily be adopted to solve it. The resulting methods are called genetic-fuzzy data-mining techniques. They are usually used to automatically mine both


Genetic-Fuzzy Data Mining Techniques, Figure 1 A KDD Process

appropriate membership functions and fuzzy association rules from a set of transaction data. Some existing approaches are introduced here. These techniques dynamically adapt membership functions by genetic algorithms according to some criteria, use them to fuzzify the quantitative transactions, and find fuzzy association rules by fuzzy mining approaches.

Introduction

Most enterprises have databases that contain a wealth of potentially accessible information. The unlimited growth of data, however, inevitably leads to a situation in which accessing desired information from a database becomes difficult. Knowledge discovery in databases (KDD) has thus become a process of considerable interest in recent years, as the amounts of data in many databases have grown tremendously large. KDD means the application of nontrivial procedures for identifying effective, coherent, potentially useful, and previously unknown patterns in large databases [16]. The KDD process [16] is shown in Fig. 1: data are first collected from a single source or multiple sources; these data are then preprocessed by methods such as sampling, feature selection or reduction, and data transformation; after that, data-mining techniques are used to find useful patterns, which are then interpreted and evaluated to form human knowledge. In particular, data mining plays a critical role in the KDD process. It involves applying specific algorithms for extracting patterns or rules from data sets in a particular representation. Because of its importance, many researchers in the database and machine-learning fields are interested in this topic, since it offers opportunities to discover useful information and important relevant patterns in large databases, thus helping decision-makers easily analyze the data and make good decisions regarding the domains concerned. For example, implicitly useful knowledge may exist in a large database containing millions of records of customers' purchase orders over recent years. This knowledge can be found by appropriate data-mining approaches, easily answering questions such as "What are the most important trends in customers' purchase behavior?"

Most mining approaches were proposed for binary transaction data. However, in real applications quantitative data also exist and should be considered. Fuzzy set theory is being used more and more frequently in intelligent systems because of its simplicity and similarity to human reasoning [51]. The theory has been applied in fields such as manufacturing, engineering, diagnosis, and economics, among others [45,52]. Several fuzzy learning algorithms for inducing rules from given sets of data have been designed and used to good effect within specific domains [7,22]. As for fuzzy data mining, many algorithms have also been proposed [8,23,24,35,36,50]. Most of these fuzzy data-mining algorithms assume the membership functions are already known. The given membership functions may, however, have a critical influence on the final mining results. Developing effective and efficient approaches to derive both appropriate membership functions and fuzzy association rules automatically is thus worth studying. Genetic algorithms are widely used for finding membership functions in different fuzzy applications. In this article, we discuss genetic-fuzzy data-mining approaches which can mine both appropriate membership functions and fuzzy association rules [9,10,11,12,25,26,27,31,32,33]. The genetic-fuzzy mining problems can be divided into four kinds according to the type of fuzzy mining problem and the way items are processed. The types of fuzzy mining problems include Single-minimum-Support Fuzzy Mining (SSFM) and Multiple-minimum-Support Fuzzy Mining (MSFM). The ways of processing items include processing all the items together (the integrated approach) and processing them individually (the divide-and-conquer approach). Each of them will be described in detail in the following sections. But first, the basic concepts of data mining are described below.


Data Mining

Data-mining techniques have been used in different fields in recent years to discover interesting information from databases. Depending on the type of database processed, mining approaches may be classified as working on transaction databases, temporal databases, relational databases, multimedia databases, and data streams, among others. Depending on the class of knowledge derived, mining approaches may be classified as finding association rules, classification rules, clustering rules, and sequential patterns, among others. Finding association rules in transaction databases is the most common task in data mining. It was initially applied to market-basket analysis to find relationships among purchased items. An association rule can be expressed in the form A → B, where A and B are sets of items, such that the presence of A in a transaction implies the presence of B. Two measures, support and confidence, are evaluated to determine whether a rule should be kept. The support of a rule is the fraction of the transactions that contain all the items in A and B. The confidence of a rule is the conditional probability of the occurrence of the items in A and B given the occurrence of the items in A. The support and the confidence of an interesting rule must be larger than or equal to a user-specified minimum support and minimum confidence, respectively. To achieve this purpose, Agrawal and his coworkers proposed several mining algorithms based on the concept of large itemsets to find association rules in transaction data [1,2,3,4]. They divided the mining process into two phases. In the first phase, candidate itemsets are generated and counted by scanning the transaction data. If the count of an itemset appearing in the transactions is larger than a predefined threshold (called the minimum support), the itemset is considered a large itemset. Itemsets containing only one item are processed first; large itemsets containing single items are then combined to form candidate itemsets containing two items, and this process is repeated until all large itemsets have been found. In the second phase, association rules are induced from the large itemsets found in the first phase: all possible association combinations for each large itemset are formed, and those with calculated confidence values larger than a predefined threshold (called the minimum confidence) are output as association rules. In addition to the above approach, many others have been proposed for finding association rules.

Most mining approaches focus on binary-valued transaction data. Transaction data in real-world applications, however, usually consist of quantitative values, and many sophisticated data-mining approaches have thus been proposed to deal with various types of data [6,46,53]. This also presents a challenge to workers in this research field. In addition to methods for mining association rules from transactions of binary values, Srikant et al. also proposed a method [46] for mining association rules from transactions with quantitative attributes. Their method first determines the number of partitions for each quantitative attribute and then maps all possible values of each attribute into a set of consecutive integers. It then finds large itemsets whose support values are greater than the user-specified minimum-support levels. For example, "If Age is [20, 29], then Number of Cars is [0, 1]" is a quantitative association rule with a support value of 60% and a confidence value of 66.6%: if the age of a person is between 20 and 29 years old, then he/she has zero or one car, with 66.6% confidence. Of course, different partition approaches for discretizing the quantitative values may influence the final quantitative association rules; some studies have thus discussed and addressed this problem [6,53]. Recently, fuzzy sets have also been used in data mining to handle quantitative data, owing to their ability to deal with the interval-boundary problem. The theory of fuzzy sets is introduced below.

Fuzzy Sets

Fuzzy set theory was first proposed by Zadeh in 1965 [51]. It is primarily concerned with quantifying and reasoning using natural language, in which words can have ambiguous meanings. It is widely used in a variety of fields because of its simplicity and similarity to human reasoning [13,45,52]. For example, the theory has been applied in fields such as manufacturing, engineering, diagnosis, and economics, among others [19,29,37]. Fuzzy set theory can be thought of as an extension of traditional crisp sets, in which each element must either be in or not in a set. Formally, the process by which individuals from a universal set X are determined to be either members or nonmembers of a crisp set can be defined by a characteristic or discrimination function [51]. For a given crisp set A, this function assigns a value $\mu_A(x)$ to every $x \in X$ such that

$$\mu_A(x) = \begin{cases} 1 & \text{if and only if } x \in A \\ 0 & \text{if and only if } x \notin A \,. \end{cases}$$

The function thus maps elements of the universal set to the set containing 0 and 1. This kind of function can be generalized such that the values assigned to the elements of the universal set fall within specified ranges, referred to as the membership grades of these elements in the set. Larger values denote higher degrees of set membership. Such a function is called a membership function, $\mu_A(x)$, by which a fuzzy set A is usually defined. This function is represented by

$$\mu_A : X \to [0, 1] \,,$$

where [0, 1] denotes the interval of real numbers from 0 to 1, inclusive. The function can also be generalized to any real interval instead of [0, 1]. A special notation is often used in the literature to represent fuzzy sets. Assume that $x_1$ to $x_n$ are the elements in fuzzy set A, and $\mu_1$ to $\mu_n$ are, respectively, their grades of membership in A. A is then represented as follows:

$$A = \mu_1/x_1 + \mu_2/x_2 + \cdots + \mu_n/x_n \,.$$

An α-cut of a fuzzy set A is a crisp set $A_\alpha$ that contains all the elements in the universal set X whose membership grades in A are greater than or equal to a specified value of α. This definition can be written as

$$A_\alpha = \{ x \in X \mid \mu_A(x) \ge \alpha \} \,.$$

The scalar cardinality of a fuzzy set A defined on a finite universal set X is the summation of the membership grades of all the elements of X in A. Thus,

$$|A| = \sum_{x \in X} \mu_A(x) \,.$$
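These definitions can be made concrete with a small sketch for a finite fuzzy set (the universe and the membership grades below are illustrative assumptions, not taken from the article):

```python
# A finite fuzzy set represented as a mapping from elements of the
# universal set X to membership grades in [0, 1].
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}

def alpha_cut(fuzzy_set, alpha):
    """Crisp set of all elements whose membership grade is >= alpha."""
    return {x for x, grade in fuzzy_set.items() if grade >= alpha}

def scalar_cardinality(fuzzy_set):
    """Sum of the membership grades of all elements of X in the set."""
    return sum(fuzzy_set.values())

print(sorted(alpha_cut(A, 0.5)))        # ['x2', 'x3']
print(round(scalar_cardinality(A), 6))  # 1.9
```

The dictionary representation mirrors the notation $A = \mu_1/x_1 + \cdots + \mu_n/x_n$: each key is an element and each value its grade.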

Three basic and commonly used operations on fuzzy sets are complementation, union, and intersection, as proposed by Zadeh. They are described as follows.

1. The complementation of a fuzzy set A is denoted by $\neg A$, and the membership function of $\neg A$ is given by
$$\mu_{\neg A}(x) = 1 - \mu_A(x) \quad \forall x \in X \,.$$

2. The intersection of two fuzzy sets A and B is denoted by $A \cap B$, and the membership function of $A \cap B$ is given by
$$\mu_{A \cap B}(x) = \min\{\mu_A(x), \mu_B(x)\} \quad \forall x \in X \,.$$

3. The union of two fuzzy sets A and B is denoted by $A \cup B$, and the membership function of $A \cup B$ is given by
$$\mu_{A \cup B}(x) = \max\{\mu_A(x), \mu_B(x)\} \quad \forall x \in X \,.$$

Note that there are other calculation formulas for the complementation, union, and intersection, but the above are the most popular.

Fuzzy Data Mining

As mentioned above, fuzzy set theory is a natural way to process quantitative data. Several fuzzy learning algorithms for inducing rules from given sets of data have thus been designed and used to good effect within specific domains [7,22,41]. Fuzzy data mining approaches have also been developed to find knowledge expressed in linguistic terms from quantitative transaction data; such knowledge is expected to be easy to understand. A fuzzy association rule is shown in Fig. 2. Instead of the quantitative intervals used in quantitative association rules, linguistic terms are used to represent the knowledge. In the rule "If a middle amount of bread is bought, then a high amount of milk is bought", bread and milk are items, and middle and high are linguistic terms. The rule means that if the quantity of the purchased item bread is middle, then there is a high possibility that milk is also purchased, in a high quantity. Many approaches have been proposed for mining fuzzy association rules [8,23,24,35,36,50]. Most of them set a single minimum-support threshold for all the items or itemsets and identify the association relationships among transactions. In real applications, however, different items may have different criteria for judging their importance, and multiple minimum-support thresholds have thus been proposed. We can accordingly divide fuzzy data mining approaches into two types, namely the Single-minimum-Support Fuzzy Mining (SSFM) [8,23,24,35,50] and the Multiple-minimum-Support Fuzzy Mining (MSFM) [36] problems. For the SSFM problem, Chan and Au proposed the FAPACS algorithm to mine fuzzy association rules [8]. They first transformed quantitative attribute values into linguistic terms and then used adjusted difference analysis to find interesting associations among attributes.

Genetic-Fuzzy Data Mining Techniques, Figure 2 A fuzzy association rule
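Before such rules can be mined, each quantitative value must be transformed into a fuzzy set of linguistic terms through membership functions. A minimal sketch of this fuzzification step, assuming triangular membership functions (the term names and breakpoints below are made-up illustrations, not taken from the article):

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic terms for a purchased quantity.
terms = {
    "low":    (0, 0, 5),
    "middle": (2, 6, 10),
    "high":   (7, 11, 11),
}

def fuzzify(quantity):
    """Transform a quantitative value into a fuzzy set of linguistic terms."""
    return {t: triangular(quantity, *p) for t, p in terms.items()}

print(fuzzify(8))  # {'low': 0.0, 'middle': 0.5, 'high': 0.25}
```

A quantity of 8 thus partially belongs to both middle and high, which is exactly the interval-boundary problem that crisp partitions cannot express.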


Genetic-Fuzzy Data Mining Techniques, Figure 3 The concept of fuzzy data mining for the SSFM problem
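The counting step of the SSFM process in Fig. 3 is commonly based on a fuzzy support measure. A sketch under two common assumptions (grades within a transaction combined with the min operator and summed over all transactions; the transactions and grades below are made up):

```python
# Fuzzified transactions: each maps (item, linguistic term) -> membership grade.
transactions = [
    {("bread", "middle"): 0.8, ("milk", "high"): 0.6},
    {("bread", "middle"): 0.5, ("milk", "high"): 0.9},
    {("bread", "middle"): 0.0, ("milk", "high"): 0.7},
]

def fuzzy_support(itemset, transactions):
    """Fuzzy support of an itemset: combine grades within each transaction
    by min, sum over transactions, and normalize by the transaction count."""
    total = sum(min(t.get(term, 0.0) for term in itemset) for t in transactions)
    return total / len(transactions)

sup = fuzzy_support([("bread", "middle"), ("milk", "high")], transactions)
print(round(sup, 4))  # (0.6 + 0.5 + 0.0) / 3 = 0.3667
```

An itemset would then be kept as a large itemset when this value meets the user-specified minimum support, mirroring the crisp Apriori criterion.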

Kuok et al. proposed a mining approach for fuzzy association rules in which, instead of the minimum supports and minimum confidences used in most mining approaches, significance factors and certainty factors are used to derive large itemsets and fuzzy association rules [35]. At nearly the same time, Hong et al. proposed a fuzzy mining algorithm to mine fuzzy rules from quantitative transaction data [23]. Basically, these fuzzy mining algorithms first use membership functions to transform each quantitative value into a fuzzy set in linguistic terms and then use a fuzzy mining process to find fuzzy association rules. Yue et al. then extended the above concept to find fuzzy association rules with weighted items from transaction data [50], adopting Kohonen self-organizing maps to derive fuzzy sets for numerical attributes. In general, the basic concept of fuzzy data mining for the SSFM problem is shown in Fig. 3: the process first transforms quantitative transactions into a fuzzy representation according to the predefined membership functions; the transformed data are then used to generate large itemsets; finally, the generated large itemsets are used to derive fuzzy association rules.

As for the MSFM problem, Lee et al. proposed a mining algorithm which uses multiple minimum supports to mine fuzzy association rules [36]. They assumed that items have different minimum supports and used the maximum constraint; that is, the minimum support for an itemset is set to the maximum of the minimum supports of the items contained in the itemset. Under this constraint, the characteristic of level-by-level processing is kept, so that the original Apriori algorithm can easily be extended to finding large itemsets. In addition to the maximum constraint, other constraints, such as the minimum constraint, can also be used with different rationales.

In addition to the above fuzzy mining approaches, fuzzy data mining with taxonomy or fuzzy taxonomy has been developed, and fuzzy web mining is another application. Fuzzy data mining is also strongly related to fuzzy control, fuzzy clustering, and fuzzy learning. In most fuzzy mining approaches, the membership functions are defined in advance. Membership functions are, however, very crucial to the final mined results. Below, we describe how the genetic algorithm can be combined with fuzzy data mining to make the entire process more complete. The concept of the genetic algorithm is first briefly introduced in the next section.

Genetic Algorithms

Genetic Algorithms (GAs) [17,20] have become increasingly important to researchers solving difficult problems, since they can provide feasible solutions in a limited amount of time [21]. They were first proposed by Holland in 1975 [20] and have been successfully applied to the fields of optimization [17,39,40,43], machine learning [17,39], neural networks [40], fuzzy logic controllers [43], and so on. GAs are developed mainly from the ideas and techniques of genetic and evolutionary theory [20]. According to the principle of survival of the fittest, they generate the next population by several operations, with each individual in the population representing a possible solution. There are three principal operations in a genetic algorithm:

1. The crossover operation generates offspring from two chosen individuals in the population by exchanging some bits between them. The offspring thus inherit some characteristics from each parent.
2. The mutation operation generates offspring by randomly changing one or several bits in an individual. The offspring may thus possess characteristics different from those of their parents. Mutation keeps the search from being confined to local regions of the search space and increases the probability of finding global optima.
3. The selection operation chooses some offspring for survival according to predefined rules. This keeps the population size within a fixed constant and puts good offspring into the next generation with high probability.

When applying genetic algorithms to a problem, the first step is to define a representation that describes the problem states; the most common choice is the bit-string representation. An initial population of individuals, called chromosomes, is then defined, and the three genetic operations (crossover, mutation, and selection) are performed to generate the next generation. Each chromosome in the population is evaluated by a fitness function to determine its goodness. This procedure is repeated until a user-specified termination criterion is satisfied. The entire GA process is shown in Fig. 4.

Genetic-Fuzzy Data Mining Techniques, Figure 4 The entire GA process

Genetic-Fuzzy Data Mining Techniques

In the previous sections on fuzzy data mining, several approaches were introduced in which the membership functions were assumed to be known in advance. The given membership functions may, however, have a critical influence on the final mining results. Although many approaches for learning membership functions have been proposed [14,42,44,47,48], most of them are used for classification or control problems. Several strategies have been proposed for learning membership functions in classification or control problems by genetic algorithms, among them:

1. Learning membership functions first, then rules;
2. Learning rules first, then membership functions;
3. Simultaneously learning rules and membership functions;
4. Iteratively learning rules and membership functions.

For fuzzy mining problems, much research has likewise combined the genetic algorithm with fuzzy concepts to discover both suitable membership functions and useful fuzzy association rules from quantitative values. Most of this work adopts the first strategy: the membership functions are first learned, and the fuzzy association rules are then derived based on the obtained membership functions. This is done because the number of association rules is often large in mining problems and cannot easily be coded in a chromosome. In this article, we introduce several genetic-fuzzy data mining algorithms that can mine both appropriate membership functions and fuzzy association rules [9,10,11,12,25,26,27,31,32,33]. The genetic-fuzzy mining problems can be divided into four kinds according to the types of fuzzy mining problems and the ways of processing items. The types of fuzzy mining problems include Single-minimum-Support Fuzzy Mining (SSFM) and Multiple-minimum-Support Fuzzy Mining (MSFM), as mentioned above. The ways of processing items include processing all the items together (the integrated approach) and processing them individually (the divide-and-conquer approach).

The integrated genetic-fuzzy approaches encode all membership functions of all items (or attributes) into a single chromosome (also called an individual). A genetic algorithm is then used to derive a set of appropriate membership functions according to the designed fitness function, and the best set of membership functions is finally used to mine fuzzy association rules. The divide-and-conquer genetic-fuzzy approaches, on the other hand, encode the membership functions of each item into its own chromosome; that is, the chromosomes in a population are maintained for only one item. The membership functions can thus be found for one item after another, or at the same time via parallel processing. In general, the chromosomes in the divide-and-conquer genetic-fuzzy approaches are much shorter than those in the integrated approaches, since the former focus only on individual items; however, the former have more application limitations than the latter. This will be explained later. The four kinds of problems are thus the Integrated Genetic-Fuzzy problem for items with a Single Minimum Support (IGFSMS) [9,11,26,31,32,33], the Integrated Genetic-Fuzzy problem for items with Multiple Minimum Supports (IGFMMS) [12], the Divide-and-Conquer Genetic-Fuzzy problem for items with a Single Minimum Support (DGFSMS) [10,25,27], and the Divide-and-Conquer Genetic-Fuzzy problem for items with Multiple Minimum Supports (DGFMMS). The classification is shown in Table 1.

Genetic-Fuzzy Data Mining Techniques, Table 1 The four different genetic-fuzzy data mining problems

                             Single minimum support   Multiple minimum supports
Integrated approach          IGFSMS Problem           IGFMMS Problem
Divide-and-conquer approach  DGFSMS Problem           DGFMMS Problem
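The difference between the two rows of Table 1 is essentially a difference of chromosome encoding. A minimal sketch (the items, triangles and flat real-number encoding are illustrative assumptions, not the exact encodings of the cited papers):

```python
# Sketch of the two encodings: an "integrated" chromosome concatenates the
# membership functions of every item, while the divide-and-conquer scheme
# keeps one short chromosome (and one population) per item.
# Triangles are (a, b, c) vertex triples; items and values are made up.

item_mfs = {
    "milk":  [(0, 2, 5), (2, 5, 8), (5, 8, 11)],
    "bread": [(0, 1, 3), (1, 3, 5), (3, 5, 7)],
}

# Integrated approach: all items' membership functions in one chromosome.
integrated_chromosome = [v for triangles in item_mfs.values()
                         for t in triangles for v in t]

# Divide-and-conquer approach: one chromosome per item.
per_item_chromosomes = {item: [v for t in triangles for v in t]
                        for item, triangles in item_mfs.items()}

print(len(integrated_chromosome))  # 2 items × 3 terms × 3 vertices = 18
print({k: len(v) for k, v in per_item_chromosomes.items()})
```

With many items the integrated chromosome grows linearly in the number of items, which is exactly why the divide-and-conquer variants keep one short chromosome per item.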

Each of the four kinds of genetic-fuzzy data mining problems will be introduced in the following sections.

The Integrated Genetic-Fuzzy Problem for Items with a Single Minimum Support (IGFSMS)

Many approaches have been published for solving the IGFSMS problem [9,11,26,31,32,33]. For example, Hong et al. proposed a genetic-fuzzy data-mining algorithm for extracting both association rules and membership functions from quantitative transactions [26]. They proposed a GA-based framework for searching for membership functions suitable for given mining problems, and then use the final best set of membership functions to mine fuzzy association rules. The proposed framework, shown in Fig. 5, consists of two phases: mining membership functions and mining fuzzy association rules. In the first phase, the framework maintains a population of sets of membership functions and uses the genetic algorithm to automatically derive the resulting one. It first transforms each set of membership functions into a fixed-length string. A chromosome is then evaluated by the number of large 1-itemsets and the suitability of its membership functions. The fitness value of a chromosome Cq is defined as

    f(Cq) = |L1| / suitability(Cq),

where |L1| is the number of large 1-itemsets obtained by using the set of membership functions in Cq. Using the number of large 1-itemsets achieves a trade-off between execution time and rule interestingness: a larger number of large 1-itemsets will usually result, with high probability, in a larger number of itemsets of all lengths, and thus usually implies more interesting association rules, while evaluation by 1-itemsets is much faster than evaluation by all itemsets or by interesting association rules. Of course, the number of all itemsets or of interesting association rules can also be used in the fitness function; a discussion of different choices of fitness functions can be found in [9].
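A sketch of this fitness evaluation follows. The decoding of a chromosome into fuzzy supports and the numeric values are stand-ins (the exact encodings are in the cited papers); only the formula f(Cq) = |L1|/suitability(Cq) comes from the text above:

```python
# Hedged sketch of f(C_q) = |L1| / suitability(C_q).

def count_large_1_itemsets(fuzzy_supports, min_support):
    """|L1|: linguistic terms whose fuzzy support reaches the threshold."""
    return sum(1 for s in fuzzy_supports.values() if s >= min_support)

def fitness(fuzzy_supports, suitability, min_support=0.25):
    l1 = count_large_1_itemsets(fuzzy_supports, min_support)
    return l1 / suitability

# Hypothetical fuzzy supports obtained with one chromosome's membership
# functions; keys are (item, linguistic term) pairs.
supports = {("milk", "Middle"): 0.31, ("milk", "High"): 0.12,
            ("bread", "Low"): 0.40}
print(fitness(supports, suitability=4.37))  # two large 1-itemsets / 4.37
```

Note that only |L1| requires a pass over the database; the suitability term is computed from the chromosome alone, which is what the cluster-based speedup below exploits.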

Genetic-Fuzzy Data Mining Techniques

Genetic-Fuzzy Data Mining Techniques, Figure 5 A genetic-fuzzy framework for the IGFSMS problem

The suitability measure is used to reduce the occurrence of bad kinds of membership functions. The two bad kinds are shown in Fig. 6: in the first, the membership functions are too redundant; in the second, they are too separate. Two factors, called the overlap factor and the coverage factor, are used to avoid these bad shapes. The overlap factor is designed to avoid the first bad case (too redundant), and the coverage factor the second (too separate). Each factor has a formula for evaluating its value from a chromosome. After fitness evaluation, the approach chooses appropriate chromosomes for mating, gradually creating good offspring sets of membership functions. The offspring membership function sets then undergo recursive evolution until a good set of membership functions has been obtained. In the second phase, the final best membership functions are gathered to mine fuzzy association rules; the fuzzy mining algorithm proposed in [24] is adopted for this purpose.
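The exact overlap- and coverage-factor formulas appear in [26]; the simplified sketch below only illustrates the idea, penalizing (i) adjacent triangles that overlap (the "too redundant" case of Fig. 6) and (ii) membership functions that fail to cover the item's range (the "too separate" case):

```python
# Illustrative sketch only; not the formulas of [26].

def overlap_amount(t1, t2):
    """Length of the overlap between two triangles given as (a, b, c)."""
    return max(0.0, min(t1[2], t2[2]) - max(t1[0], t2[0]))

def overlap_factor(triangles):
    # Larger values = more redundant membership functions.
    ts = sorted(triangles)
    return sum(overlap_amount(ts[i], ts[i + 1]) for i in range(len(ts) - 1))

def coverage_factor(triangles, lo, hi):
    # Ratio of the item's range to the covered length; 1.0 means full
    # coverage, larger values flag the "too separate" case.
    covered, x = 0.0, lo
    for a, _, c in sorted(triangles):
        if c > x:
            covered += c - max(x, a)
            x = c
    return (hi - lo) / covered if covered else float("inf")

mfs = [(0, 2, 5), (2, 5, 8), (5, 8, 11)]
print(overlap_factor(mfs), coverage_factor(mfs, 0, 11))  # → 6 1.0
```

A suitability value can then be formed by combining the two factors, so that chromosomes with either bad shape receive a worse fitness.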

The calculation of large 1-itemsets, however, still takes a lot of time, especially when the database cannot be loaded entirely into main memory. An enhanced approach, called the cluster-based fuzzy-genetic mining algorithm, was thus proposed [11] to speed up the evaluation process while keeping nearly the same solution quality as in [26]. That approach also maintains a population of sets of membership functions and uses the genetic algorithm to derive the best one. Before fitness evaluation, a clustering technique is first applied: the k-means approach gathers similar chromosomes into groups, using the coverage factor and the overlap factor as the two clustering attributes. For example, the coverage and overlap factors for ten chromosomes are shown in Table 2, where the column "Suitability" represents the pair (coverage factor, overlap factor).

Genetic-Fuzzy Data Mining Techniques, Figure 6 The two bad types of membership functions

Genetic-Fuzzy Data Mining Techniques, Table 2 The coverage and the overlap factors of ten chromosomes

Chromosome  Suitability     Chromosome  Suitability
C1          (4.00, 0.00)    C6          (4.50, 0.00)
C2          (4.24, 0.50)    C7          (4.45, 0.00)
C3          (4.37, 0.00)    C8          (4.37, 0.53)
C4          (4.66, 0.00)    C9          (4.09, 8.33)
C5          (4.37, 0.33)    C10         (4.87, 0.00)

The k-means clustering approach is then executed to divide the ten chromosomes into k clusters. In this example, assume the parameter k is set at 3. The three clusters found are shown in Table 3.

Genetic-Fuzzy Data Mining Techniques, Table 3 The three clusters found in the example

Cluster    Chromosomes            Representative chromosome
Cluster1   C1, C2, C5, C8         C5
Cluster2   C3, C4, C6, C7, C10    C4
Cluster3   C9                     C9

The representative chromosomes of the three clusters are C5 (4.37, 0.33), C4 (4.66, 0) and C9 (4.09, 8.33). All the chromosomes in a cluster use the number of large 1-itemsets derived from the representative chromosome of that cluster, together with their own suitability values, to calculate their fitness values. Since the number of database scans decreases, the evaluation cost is reduced: in this example, the number of large 1-itemsets needs to be calculated only three times, once for each of the representative chromosomes C4, C5 and C9. The evaluation results are used to choose appropriate chromosomes for mating in the next generation, and the offspring membership function sets undergo recursive evolution until a good set of membership functions has been obtained. Finally, the derived membership functions are used to mine fuzzy association rules.

Kaya and Alhajj also proposed several genetic-fuzzy data mining approaches to derive membership functions and fuzzy association rules [31,32,33]. In [31], the pro-

posed approach tries to derive membership functions that yield a maximum profit within an interval of user-specified minimum support values, and then uses the derived membership functions to mine fuzzy association rules. The concept of their approaches is shown in Fig. 7. As shown in Fig. 7a, the approach first derives membership functions from the given quantitative transaction database by genetic algorithms; the final membership functions are then used to mine fuzzy association rules. Figure 7b shows the concept of maximizing the number of large itemsets over the given minimum support interval, which is used as the fitness function. Kaya and Alhajj also extended the approach to mine fuzzy weighted association rules [32]. Furthermore, since a single fitness function is not always easy to define for GA applications, multiobjective genetic algorithms have also been developed [28,30]; that is, more than one criterion is used in the evaluation, and a set of solutions, namely the nondominated points (also called the Pareto-optimal surface), is given to users instead of only the single best solution obtained by ordinary genetic algorithms. Kaya and Alhajj thus proposed an approach based on multiobjective genetic algorithms to learn membership functions, which were then used to generate interesting fuzzy association rules [33]. Three objective functions, namely strongness, interestingness and comprehensibility, were used in their approach to find the Pareto-optimal surface. In addition to the above approaches for the IGFSMS problem, other work is still in progress.
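The cluster-based evaluation of [11] described earlier can be sketched with the Table 2 data. A minimal k-means stands in for the clustering step (the real approach uses k-means on the same two attributes, but the implementation below is only an illustration; cluster contents may vary with initialization):

```python
# Sketch: cluster chromosomes by (coverage factor, overlap factor) so that
# only one representative per cluster pays for a database scan.
import math
import random

points = {  # (coverage factor, overlap factor) pairs from Table 2
    "C1": (4.00, 0.00), "C2": (4.24, 0.50), "C3": (4.37, 0.00),
    "C4": (4.66, 0.00), "C5": (4.37, 0.33), "C6": (4.50, 0.00),
    "C7": (4.45, 0.00), "C8": (4.37, 0.53), "C9": (4.09, 8.33),
    "C10": (4.87, 0.00),
}

def kmeans(data, k, iters=50, seed=1):
    random.seed(seed)
    centers = random.sample(list(data.values()), k)
    clusters = {}
    for _ in range(iters):
        clusters = {i: [] for i in range(k)}
        for name, p in data.items():
            i = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(name)
        centers = [tuple(sum(data[n][d] for n in ms) / len(ms)
                         for d in (0, 1)) if ms else centers[i]
                   for i, ms in clusters.items()]
    return clusters

clusters = kmeans(points, k=3)
for members in clusters.values():
    print(sorted(members))  # each cluster shares one representative's |L1|
```

Every chromosome in a cluster then reuses the representative's |L1| value, so only k database scans are needed per generation instead of one per chromosome.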

The Integrated Genetic-Fuzzy Problem for Items with Multiple Minimum Supports (IGFMMS)

The previous subsection shows that much research has focused on integrated genetic-fuzzy approaches for items with a single minimum support. Different items, however, may need different criteria to judge their importance. For example, some items in a set may be expensive and are thus seldom bought because of their high cost. Besides, the support values of


Genetic-Fuzzy Data Mining Techniques, Figure 7 The concept of Kaya and Alhaji’s approaches

these items are low. A manager may, however, still be interested in these products because of their high profits. In such cases, the above single-minimum-support approaches are not suitable. Chen et al. thus proposed another genetic-fuzzy data mining approach [12], an extension of the approach in [26], to solve this problem. The approach combines clustering, fuzzy and genetic concepts to derive minimum support values and membership functions for the items; the final minimum support values and membership functions are then used to mine fuzzy association rules. The genetic-fuzzy mining framework for the IGFMMS problem is shown in Fig. 8. The framework is divided into two phases: the first searches for suitable minimum support values and membership functions of items, and the second uses the final best set of minimum support values and membership functions to mine fuzzy association rules. The framework maintains a population of sets of minimum support values and membership functions, and uses the genetic algorithm to automatically derive the resulting one. A genetic algorithm requires a population of feasible solutions to be initialized and updated during the evolution process. As mentioned above, each individual within the population is a set of minimum support values and

isosceles-triangular membership functions. Each membership function corresponds to a linguistic term of a certain item. In this approach, the initial set of chromosomes is generated from initialization information derived by applying k-means clustering to the transactions, with the frequencies and quantitative values of items in the transactions as the two main clustering factors. The initialization information includes an appropriate number of linguistic terms, the range of possible minimum support values, and the membership functions of each item. All the items in the same cluster are considered to have similar characteristics and are assigned similar initialization values when the population is initialized. The approach then generates and encodes each set of minimum support values and membership functions into a fixed-length string according to the initialization information. Since the minimum support values of items may differ, they are hard to assign directly; as an alternative, they can be determined according to the required number of rules. It is, however, very time-consuming to obtain the rules for each chromosome. As mentioned above, a larger number of large 1-itemsets will usually result, with high probability, in a larger number of all itemsets, which will thus usually imply more interesting as-


Genetic-Fuzzy Data Mining Techniques, Figure 8 A genetic-fuzzy framework for the IGFMMS problem

sociation rules. The evaluation by 1-itemsets is faster than that by all itemsets or by interesting association rules, so using the number of large 1-itemsets can achieve a trade-off between execution time and rule interestingness [26]. A criterion should thus be specified to reflect the user's preference on the derived knowledge. In this approach, the required number of large 1-itemsets, RNL, is used for this purpose. It is the number of linguistic large 1-itemsets that a user wants to obtain from an item, defined as the number of linguistic terms of the item multiplied by a predefined percentage that reflects the user's preference on the number of large 1-itemsets. It is used to measure the degree of closeness between the number of derived large 1-itemsets and the required number. For example, assume there are three linguistic terms for an item and the predefined percentage p is set at 80%. The RNL value is then set as ⌊3 × 0.8⌋, which is 2. The fitness function is then composed of the suitability of the membership functions and the closeness to the RNL value. The minimum support values and membership functions can


Genetic-Fuzzy Data Mining Techniques, Figure 9 A genetic-fuzzy framework for the DGFSMS problem

thus be derived by the GA and are then used to mine fuzzy association rules by a fuzzy mining approach for multiple minimum supports, such as the one in [36].

The Divide-and-Conquer Genetic-Fuzzy Problem for Items with a Single Minimum Support (DGFSMS)

The advantages of the integrated genetic-fuzzy approaches are that they are simple, easy to use, and impose few constraints on the fitness functions; criteria other than the number of large 1-itemsets can also be used. However, if the number of items is large, the integrated approaches may need a long time to find a near-optimal solution because the chromosomes are very long. Recently, the divide-and-conquer strategy has been used to very good effect in the evolutionary computation community, and many algorithms based on it have been proposed for different applications [5,15,34,49]. When the number of large 1-itemsets is used in fitness evaluation, the divide-and-conquer strategy becomes a good choice, since each item can then be processed individually. Hong et al. thus used a GA-based framework with the divide-and-conquer strategy to search for membership functions suitable for the mining problem [27]. The framework, shown in Fig. 9, is divided into two phases: mining membership functions and mining fuzzy association rules. Assume the number of items is m. In the phase of mining membership functions, the framework maintains m populations of membership functions, one population per item. Each chromosome in a population represents a possible set of membership functions for that item.


Genetic-Fuzzy Data Mining Techniques, Figure 10 A genetic-fuzzy framework for the DGFMMS problem

The chromosomes in the same population are of the same length. The fitness of each set of membership functions is evaluated by the fuzzy supports of the linguistic terms in the large 1-itemsets and by the suitability of the derived membership functions. The offspring sets of membership functions undergo recursive evolution until a good set of membership functions has been obtained. Next, in the phase of mining fuzzy association rules, the sets of membership functions for all the items are gathered together and used to mine the fuzzy association rules from the given quantitative database. An enhanced approach [10], which combines the clustering and divide-and-conquer techniques, was also proposed to speed up the evaluation process. The clustering idea is similar to that used for the IGFSMS problem [11], except that the center value of each membership function is also used as a clustering attribute; the clustering process is thus executed according to the coverage factors, the overlap factors and the center values of the chromosomes. For example, if each item has three membership functions, then five attributes in total (one coverage factor, one overlap factor and three center values) are used to form appropriate clusters. Note that the number of linguistic terms for each item is predefined in the above approaches; it may also be adjusted automatically and dynamically [25].

The Divide-and-Conquer Genetic-Fuzzy Problem for Items with Multiple Minimum Supports (DGFMMS)

This problem may be thought of as the combination of the IGFMMS and the DGFSMS problems. The framework for the DGFMMS problem can thus easily be designed from the previous frameworks for IGFMMS and DGFSMS; it is shown in Fig. 10.
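The one-population-per-item arrangement shared by the DGFSMS and DGFMMS frameworks can be sketched as follows. The chromosome encoding, fitness and genetic operators below are deliberately crude placeholders (the articles' real operators and fitness terms differ); only the structure of m independent per-item populations is the point:

```python
# Hedged sketch of the divide-and-conquer arrangement: one GA population per
# item, each evolved independently (or in parallel). Everything inside
# evolve_item is a placeholder, not the cited papers' design.
import random

def evolve_item(item, pop_size=10, generations=20):
    """Evolve membership-function parameters for a single item."""
    random.seed(len(item))  # deterministic toy seeding
    # A chromosome here is just three triangle centers (illustrative only).
    population = [sorted(random.uniform(0, 10) for _ in range(3))
                  for _ in range(pop_size)]

    def fitness(ch):  # placeholder: prefer evenly spread centers
        return -abs((ch[1] - ch[0]) - (ch[2] - ch[1]))

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [sorted((a + b) / 2
                           for a, b in zip(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

items = ["milk", "bread", "juice"]
best = {item: evolve_item(item) for item in items}  # one population per item
print({k: [round(c, 2) for c in v] for k, v in best.items()})
```

Because each population only encodes one item, chromosomes stay short and the m evolutions can run in parallel, which is the efficiency argument made above for the divide-and-conquer problems.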


The proposed framework in Fig. 10 is divided into two phases: mining minimum supports and membership functions, and mining fuzzy association rules. In the first phase, the clustering approach is first used to derive initialization information, which is then used to obtain better initial populations, as in IGFMMS. The framework then maintains m populations of minimum supports and membership functions, one population per item. Next, in the phase of mining fuzzy association rules, the minimum support values and membership functions for all the items are gathered together and used to mine interesting fuzzy association rules from the given quantitative database.

Future Directions

In this article, we have introduced some genetic-fuzzy data mining techniques and their classification. The concept of fuzzy sets is used to handle quantitative transactions, and genetic computation is used to find appropriate membership functions. The genetic-fuzzy mining problems are divided into four kinds according to the types of fuzzy mining problems and the ways of processing items. The types of fuzzy mining problems include Single-minimum-Support Fuzzy-Mining (SSFM) and Multiple-minimum-Support Fuzzy-Mining (MSFM); the ways of processing items include processing all the items together (integrated approach) and processing them individually (divide-and-conquer approach). Each of the four kinds of problems has been described, together with some approaches to it. Data mining is especially important because the amounts of data in the information era are extremely large. The topic will continue to grow, though in a variety of forms. Some possible future research directions for genetic-fuzzy data mining are listed as follows.

• Applying multiobjective genetic algorithms to the genetic-fuzzy mining problems: In this article, the mined knowledge (number of large itemsets or number of rules) and the suitability of membership functions are the two important factors used in genetic-fuzzy data mining. Analyzing the relationship between these two factors is thus an interesting and important task. Besides, multiobjective genetic algorithms can be used to consider both factors at the same time.

• Analyzing the effects of different shapes of membership functions and different genetic operators: Different shapes of membership functions may yield different genetic-fuzzy data mining results and may be evaluated in the future. Different genetic operators may also be tried to obtain better results than those of the above approaches.

• Enhancing the performance of the fuzzy-rule mining phase: The final goal of the genetic-fuzzy mining techniques introduced in this article is to mine appropriate fuzzy association rules. The phase of mining fuzzy association rules is, however, very time-consuming, so improving the process of mining interesting fuzzy rules is worth studying. Possible approaches include modifying existing approaches, combining them with other techniques, or defining new evaluation criteria.

• Developing visual tools for genetic-fuzzy mining approaches: Another interesting direction for future work is to develop visual tools for presenting genetic-fuzzy mining results, which can help a decision maker easily understand the results or obtain useful information quickly. Such tools may, for example, show the derived membership functions and illustrate the interesting fuzzy association rules.

Bibliography
1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the international conference on very large data bases, pp 487–499
2. Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. Trans IEEE Knowl Data Eng 5(6):914–925
3. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the ACM SIGMOD conference, Washington DC, USA
4. Agrawal R, Srikant R, Vu Q (1997) Mining association rules with item constraints. In: Proceedings of the third international conference on knowledge discovery in databases and data mining, Newport Beach, California, August 1997
5. Au WH, Chan KCC, Yao X (2003) A novel evolutionary data mining algorithm with applications to churn prediction. Trans IEEE Evol Comput 7(6):532–545
6. Aumann Y, Lindell Y (1999) A statistical theory for quantitative association rules. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, pp 261–270
7. Casillas J, Cordón O, del Jesus MJ, Herrera F (2005) Genetic tuning of fuzzy rule deep structures preserving interpretability and its interaction with fuzzy rule set reduction. Trans IEEE Fuzzy Syst 13(1):13–29
8. Chan KCC, Au WH (1997) Mining fuzzy association rules. In: Proceedings of the conference on information and knowledge management, Las Vegas, pp 209–215
9. Chen CH, Hong TP, Tseng VS (2007) A comparison of different fitness functions for extracting membership functions used in fuzzy data mining. In: Proceedings of the IEEE symposium on foundations of computational intelligence, pp 550–555
10. Chen CH, Hong TP, Tseng VS (2007) A modified approach to speed up genetic-fuzzy data mining with divide-and-conquer strategy. In: Proceedings of the IEEE congress on evolutionary computation (CEC), pp 1–6


11. Chen CH, Tseng VS, Hong TP (2008) Cluster-based evaluation in fuzzy-genetic data mining. Trans IEEE Fuzzy Syst 16(1):249–262
12. Chen CH, Hong TP, Tseng VS, Lee CS (2008) A genetic-fuzzy mining approach for items with multiple minimum supports. Soft Computing (accepted, to appear)
13. Chen J, Mikulcic A, Kraft DH (2000) An integrated approach to information retrieval with fuzzy clustering and fuzzy inferencing. In: Pons O, Vila MA, Kacprzyk J (eds) Knowledge management in fuzzy databases. Physica, Heidelberg
14. Cordón O, Herrera F, Villar P (2001) Generating the knowledge base of a fuzzy rule-based system by the genetic learning of the data base. Trans IEEE Fuzzy Syst 9(4):667–674
15. Darwen PJ, Yao X (1997) Speciation as automatic categorical modularization. Trans IEEE Evol Comput 1(2):101–108
16. Frawley WJ, Piatetsky-Shapiro G, Matheus CJ (1991) Knowledge discovery in databases: an overview. In: Proceedings of the AAAI workshop on knowledge discovery in databases, pp 1–27
17. Goldberg DE (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Boston
18. Grefenstette JJ (1986) Optimization of control parameters for genetic algorithms. Trans IEEE Syst Man Cybern 16(1):122–128
19. Heng PA, Wong TT, Rong Y, Chui YP, Xie YM, Leung KS, Leung PC (2006) Intelligent inferencing and haptic simulation for Chinese acupuncture learning and training. Trans IEEE Inf Technol Biomed 10(1):28–41
20. Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
21. Homaifar A, Guan S, Liepins GE (1993) A new approach on the traveling salesman problem by genetic algorithms. In: Proceedings of the fifth international conference on genetic algorithms
22. Hong TP, Lee YC (2001) Mining coverage-based fuzzy rules by evolutional computation. In: Proceedings of the IEEE international conference on data mining, pp 218–224
23. Hong TP, Kuo CS, Chi SC (1999) Mining association rules from quantitative data. Intell Data Anal 3(5):363–376
24. Hong TP, Kuo CS, Chi SC (2001) Trade-off between time complexity and number of rules for fuzzy mining from quantitative data. Int J Uncertain Fuzziness Knowl-Based Syst 9(5):587–604
25. Hong TP, Chen CH, Wu YL, Tseng VS (2004) Finding active membership functions in fuzzy data mining. In: Proceedings of the workshop on foundations of data mining at the fourth IEEE international conference on data mining
26. Hong TP, Chen CH, Wu YL, Lee YC (2006) A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions. Soft Comput 10(11):1091–1101
27. Hong TP, Chen CH, Wu YL, Lee YC (2008) Genetic-fuzzy data mining with divide-and-conquer strategy. Trans IEEE Evol Comput 12(2):252–265
28. Ishibuchi H, Yamamoto T (2004) Fuzzy rule selection by multi-objective genetic local search algorithms and rule evaluation measures in data mining. Fuzzy Sets Syst 141:59–88
29. Ishibuchi H, Yamamoto T (2005) Rule weight specification in fuzzy rule-based classification systems. Trans IEEE Fuzzy Syst 13(4):428–435

30. Jin Y (2006) Multi-objective machine learning. Springer, Berlin
31. Kaya M, Alhajj R (2003) A clustering algorithm with genetically optimized membership functions for fuzzy association rules mining. In: Proceedings of the IEEE international conference on fuzzy systems, pp 881–886
32. Kaya M, Alhajj R (2004) Genetic algorithms based optimization of membership functions for fuzzy weighted association rules mining. In: Proceedings of the international symposium on computers and communications, vol 1, pp 110–115
33. Kaya M, Alhajj R (2004) Integrating multi-objective genetic algorithms into clustering for fuzzy association rules mining. In: Proceedings of the fourth IEEE international conference on data mining, pp 431–434
34. Khare VR, Yao X, Sendhoff B, Jin Y, Wersing H (2005) Co-evolutionary modular neural networks for automatic problem decomposition. In: Proceedings of the 2005 IEEE congress on evolutionary computation, vol 3, pp 2691–2698
35. Kuok C, Fu A, Wong M (1998) Mining fuzzy association rules in databases. SIGMOD Record 27(1):41–46
36. Lee YC, Hong TP, Lin WY (2004) Mining fuzzy association rules with multiple minimum supports using maximum constraints. In: Lecture notes in computer science, vol 3214. Springer, Heidelberg, pp 1283–1290
37. Liang H, Wu Z, Wu Q (2002) A fuzzy based supply chain management decision support system. In: Proceedings of the world congress on intelligent control and automation, vol 4, pp 2617–2621
38. Mamdani EH (1974) Applications of fuzzy algorithms for control of simple dynamic plants. Proc IEE 121(12):1585–1588
39. Michalewicz Z (1994) Genetic algorithms + data structures = evolution programs. Springer, New York
40. Mitchell M (1996) An introduction to genetic algorithms. MIT Press, Cambridge MA
41. Rasmani KA, Shen Q (2004) Modifying weighted fuzzy subsethood-based rule models with fuzzy quantifiers. In: Proceedings of the IEEE international conference on fuzzy systems, vol 3, pp 1679–1684
42. Roubos H, Setnes M (2001) Compact and transparent fuzzy models and classifiers through iterative complexity reduction. Trans IEEE Fuzzy Syst 9(4):516–524
43. Sanchez E et al (1997) Genetic algorithms and fuzzy logic systems: soft computing perspectives (advances in fuzzy systems – applications and theory, vol 7). World Scientific, River Edge
44. Setnes M, Roubos H (2000) GA-fuzzy modeling and classification: complexity and performance. Trans IEEE Fuzzy Syst 8(5):509–522
45. Siler W, Buckley JJ (2004) Fuzzy expert systems and fuzzy reasoning. Wiley, New York
46. Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of the 1996 ACM SIGMOD international conference on management of data, Montreal, Canada, June 1996, pp 1–12
47. Wang CH, Hong TP, Tseng SS (1998) Integrating fuzzy knowledge by genetic algorithms. Trans IEEE Evol Comput 2(4):138–149
48. Wang CH, Hong TP, Tseng SS (2000) Integrating membership functions and fuzzy rule sets from multiple knowledge sources. Fuzzy Sets Syst 112:141–154


49. Yao X (2003) Adaptive divide-and-conquer using populations and ensembles. In: Proceedings of the 2003 international conference on machine learning and applications, pp 13–20
50. Yue S, Tsang E, Yeung D, Shi D (2000) Mining fuzzy association rules with weighted items. In: Proceedings of the IEEE international conference on systems, man and cybernetics, pp 1906–1911

51. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
52. Zhang H, Liu D (2006) Fuzzy modeling and fuzzy control. Springer, New York
53. Zhang Z, Lu Y, Zhang B (1997) An effective partitioning-combining algorithm for discovering quantitative association rules. In: Proceedings of the Pacific-Asia conference on knowledge discovery and data mining, pp 261–270

Gliders in Cellular Automata

CARTER BAYS
Department of Computer Science and Engineering, University of South Carolina, Columbia, USA

Article Outline

Glossary
Definition of the Subject
Introduction
Other GL Rules in the Square Grid
Why Treat All Neighbors the Same?
Gliders in One Dimension
Two Dimensional Gliders in Non-Square Grids
Three and Four Dimensional Gliders
Future Directions
Bibliography

Glossary

Game of life  A particular cellular automaton (CA) discovered by John Conway in 1968.
Neighbor  A neighbor of cell x is typically a cell that is in close proximity to (frequently touching) cell x.
Oscillator  A periodic shape within a specific CA rule.
Glider  A translating oscillator that moves across the grid of a CA.
Generation  The discrete time unit which depicts the evolution of a CA.
Rule  Determines how each individual cell within a CA evolves.

Definition of the Subject

A cellular automaton is a structure comprising a grid with individual cells that can have two or more states; these cells evolve in discrete time units according to a rule, which usually involves the neighbors of each cell.

Introduction

Although cellular automata have origins dating from the 1950s, interest in the topic was given a boost during the 1980s by the research of Stephen Wolfram, which culminated in 2002 with the publication of his massive tome "A New Kind of Science" [11]. Widespread popular interest was created when John Conway's "game of life" cellular automaton was first revealed to the public in a 1970 Scientific American article [8]. The single feature of his game that probably caused this intense interest was

undoubtedly the discovery of “gliders” (translating oscillators). Not surprisingly, gliders are present in many other cellular automata rules; the purpose of this article is to examine some of these rules and their associated gliders. Cellular automata (CA) can be constructed in one, two, three or more dimensions and can best be explained by giving a two dimensional example. Start with an infinite grid of squares. Each individual square has eight touching neighbors; typically these neighbors are treated the same (a Moore neighborhood), whether they touch a candidate square on a side or at a corner. (An exception is one dimensional CA, where position usually plays a role). We now fill in some of the squares; we shall say that these squares are alive. Discrete time units called generations evolve; at each generation we apply a rule to the current configuration in order to arrive at the configuration for the next generation; in our example we shall use the rule below. (a) If a live cell is touching two or three live cells (called neighbors), then it remains alive next generation, otherwise it dies. (b) If a non-living cell is touching exactly three live cells, it comes to life next generation. Figure 1 depicts the evolution of a simple configuration of filled-in (live) cells for the above rule. There are many notations for describing CA rules; these can differ depending upon the type of CA. For CA of more than one dimension, and in our present discussion, we shall utilize the following notation, which is standard for describing CA in two dimensions with Moore neighborhoods. Later we shall deal with one dimension. We write a rule as E1 ; E2 ; : : : /F1 ; F2 : : : where the Ei (“environment”) specify the number of live neighbors required to keep a living cell alive, and the Fi (“fertility”) give the number required to bring a non-living cell to life. The Ei and Fi will be listed in ascending order; hence if i > j then Ei > E j etc. 
Thus the rule for the CA given above is 2,3/3. This rule, discovered by John Horton Conway, was examined in several articles in Scientific American and elsewhere, beginning with the seminal article in 1970 [8]. It is popularly known as Conway’s game of life. Of course it is not really a game in the usual sense, as the outcome is determined as soon as we pick a starting configuration. Note that the shape in Fig. 1 repeats, with a period of two. A repeating form such as this is called an oscillator. Stationary forms can be considered oscillators with




growth. (We say that rules with expansive growth are unstable.) We can easily find gliders for many unstable rules; for example, Fig. 4 illustrates some simple constructs for rule 2/2. Note that it is practically impossible NOT to create gliders with this rule! Hence we shall only look at gliders for rules that stabilize (i.e. exhibit bounded growth) and eventually yield only zero or more oscillators. We call such rules GL (game of life) rules. Stability can be a rather murky concept, since there may be some carefully constructed forms within a GL rule that grow without bounds. Typically, such forms would never appear in random configurations. Hence, we shall informally define a GL rule as follows:

• All neighbors must be touching the candidate cell and all are treated the same (a Moore neighborhood).

Gliders in Cellular Automata, Figure 1
Top: Each cell in a grid has eight neighbors. The cells containing n are neighbors of the cell containing the X. Any cell in the grid can be either dead or alive. Bottom: Here we have outlined a specific area of what is presumably a much larger grid. At the left we have installed an initial shape. Shaded cells are alive; all others are dead. The number within each cell gives the quantity of live neighbors for that cell. (Cells containing no numbers have zero live neighbors.) Depicted are three generations, starting with the configuration at generation one. Generations two and three show the result when we apply the following cellular automata rule: live cells with exactly two or three live neighbors remain alive (otherwise they die); dead cells with exactly three live neighbors come to life (otherwise they remain dead). Let us now evaluate the transition from generation one to generation two. In our diagram, cell a is dead. Since it does not have exactly three live neighbors, it remains dead. Cell b is alive, but it needs exactly two or three live neighbors to remain alive; since it has only one, it dies. Cell c is dead; since it has exactly three live neighbors, it comes to life. And cell d has two live neighbors; hence it will remain alive. And so on. Notice that the form repeats every two generations. Such forms are called oscillators.

a period of one. In Figs. 2 and 3 we show several oscillators that move across the grid as they change from generation to generation. Such forms are called translating oscillators, or more commonly, gliders. Conway’s rule popularized the term; in fact a flurry of activity began during which a great many shapes were discovered and exploited. These shapes were named whimsically – “blinker” (Fig. 1), “boat”, “beehive” and an unbelievable myriad of others. Most translating oscillators were given names other than the simple moniker glider – there were “lightweight spaceships”, “puffer trains”, etc. For this article, we shall call all translating oscillators gliders. Of course rule 2,3/3 is not the only CA rule (even though it is the most interesting). Configurations under some rules always die out, and other rules lead to explosive

• There must exist at least one translating oscillator (a glider).
• Random configurations must eventually stabilize.

This definition is a bit simplistic; for a more formal definition of a GL rule refer to [5]. Conway’s rule 2,3/3 is the original GL rule and is unquestionably the most famous CA rule known. A challenge put forth by Conway was to create a configuration that would generate an ever increasing quantity of live cells. This challenge was met by William Gosper in 1970 – back when computing time was expensive and computers were slow by today’s standards. He devised a form that spit out a continuous stream of gliders – a “glider gun”, so to speak. Interestingly, his gun configuration was displayed not as nice little squares, but as rather primitive typewritten output (Fig. 5); this emphasizes the limited resources available in 1970 for seeking out such complex structures. Soon a cottage industry developed – all kinds of intricate initial configurations were discovered and exploited; such research continues to this day.

Other GL Rules in the Square Grid
The rule 2,4,5/3 is also a GL rule and sports the glider shown in Fig. 6. It has not been seriously investigated and will probably not reveal the vast array of interesting forms that exist under 2,3/3. Interestingly, 2,3/3,8 appears to be a GL rule which, not surprisingly, supports many of the constructs of 2,3/3. This ability to obtain new GL rules by adding terms with high neighbor counts onto known GL rules seems easy to exploit – particularly in higher dimensions or in grids with large neighbor counts such as the triangular grid, which has a neighbor count of 12.
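Both GL criteria can be probed mechanically. The following self-contained sketch (our own illustrative code, not from the article) verifies that Conway's glider translates diagonally, and contrasts a GL rule with the unstable rule 2/2 on the same tiny seed:

```python
from collections import Counter

def step(cells, env, fert):
    """One generation of rule env/fert, Moore neighborhood, unbounded grid."""
    counts = Counter((r + dr, c + dc) for (r, c) in cells
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc)
    return {p for p, n in counts.items()
            if (n in env and p in cells) or (n in fert and p not in cells)}

def run(cells, env, fert, gens):
    for _ in range(gens):
        cells = step(cells, env, fert)
    return cells

# Criterion 1: a glider exists.  Conway's glider repeats after four
# generations, shifted one cell diagonally:
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
assert run(glider, {2, 3}, {3}, 4) == {(r + 1, c + 1) for (r, c) in glider}

# Criterion 2: configurations must stabilize.  A two-cell seed dies at
# once under 2,3/3, but keeps breeding under the unstable rule 2/2:
seed = {(0, 0), (0, 1)}
assert run(seed, {2, 3}, {3}, 1) == set()
assert len(run(seed, {2}, {2}, 10)) > len(seed)
```

A real stability test would of course run large random soups for many generations; the seed here merely illustrates the qualitative difference.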


Gliders in Cellular Automata, Figure 2
Here we see a few of the small gliders that exist for 2,3/3. The form at the top – the original glider – was discovered by John Conway in 1968. The remaining forms were found shortly thereafter. Soon after Conway discovered rule 2,3/3 he started to give his various shapes rather whimsical names; that practice continues to this day. Hence, the name glider was given only to the simple shape at the top; the other gliders illustrated were called (from top to bottom) lightweight spaceship, middleweight spaceship and heavyweight spaceship. The numbers give the generation; each of the gliders shown has a period of four. The exact movement of each is depicted by its shifting position in the various small enclosing grids.

Gliders in Cellular Automata, Figure 3
The rule 2,3/3 is rich with oscillators – both stationary and translating (i.e. gliders). Here are but two of many hundreds of gliders that exist under this rule. The top form has a period of five and the bottom conglomeration, a period of four.

Gliders in Cellular Automata, Figure 4 Gliders exist under a large number of rules, but almost all such rules are unstable. For example the rule 2/2 exhibits rapid unbounded growth, and almost any starting configuration will yield gliders; e. g. just two live cells will produce two gliders going off in opposite directions. But almost any small form will quickly grow without bounds. The form at the bottom left expands to the shape at the right after only 10 generations. The generation is given with each form




Gliders in Cellular Automata, Figure 6
There are a large number of interesting rules that can be written for the square grid, and rule 2,3/3 is undoubtedly the most fascinating – but it is not the only GL rule. Here we depict a glider that has been found for the rule 2,4,5/3. And since that rule stabilizes, it is a valid GL rule. Unfortunately it is not as interesting as 2,3/3, because its glider is not as likely to appear in random (and other) configurations – hence limiting the ability of 2,4,5/3 to produce interesting moving configurations. Note that the period is seven, indicated in parentheses.

Gliders in Cellular Automata, Figure 5
A fascinating challenge was proposed by Conway in 1970 – he offered $50 to the first person who could devise a form for 2,3/3 that would generate an infinite number of living cells. One such form could be a glider gun – a construct that would create an endless stream of gliders. The challenge was soon met by William Gosper, then a student at MIT. His glider gun is illustrated here. At the top, testifying to the primitive computational power of the time, is an early illustration of Gosper’s gun. At the bottom we see the gun in action, sending out a new glider every thirty generations (here it has sent out two gliders). Since 1970 there have been numerous such guns that generate all kinds of forms – some gliders and some stationary oscillators. Naturally in the latter case the generator must translate across the grid, leaving its intended stationary debris behind.

Why Treat All Neighbors the Same?
By allowing only Moore neighborhoods in two (and higher) dimensions we greatly restrict the number of rules that can be written. And certainly we could consider specialized neighborhoods – e.g. treat as neighbors only those cells that touch on sides, or only those touching the left two corners and nowhere else, or any touching cell but with the stipulation that two or more live neighbors of a subject cell must not touch each other, etc. But here we are only exploring gliders. Consider the following rule for finding the next generation.

1) A living cell dies.
2) A dead cell comes to life if and only if its left side touches a live cell.

If we start, say, with a single cell we will obtain a glider of one cell that moves to the right one cell each generation! Such rules are easy to construct, as are more complex glider-producing positional rules, so we shall not investigate them further. Yet as we shall see, neighbor position is an important consideration in one dimensional CA.

Gliders in One Dimension
One dimensional cellular automata differ from CA in higher dimensions in that the restrictive grid (essentially a single line of cells) limits the number of rules that can be applied. Hence, many 1D CA involve neighborhoods that extend beyond the immediate two touching neighbors of the cell whose next-generation status we wish to evaluate, or utilize more than the two states (alive, dead). For our discussion about gliders, we shall only look at the simplest rules – those involving just the two adjacent neighbors and two states. Unlike 2D (and higher) dimensions, we usually consider the relative position of the neighbors when giving a rule. Since three cells (left, center, right) are involved in determining the next-generation state of the central cell, there are 2³ = 8 possible initial states. And since each initial state causes a particular outcome (i.e. the cell in the middle lives or dies next generation), we thus have 2⁸ = 256 possible rules. The behavior of these 256 rules has been extensively studied by Wolfram [11], who also introduced a very convenient shorthand that completely describes each rule (Fig. 7).
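Wolfram's numbering scheme is straightforward to implement. The sketch below (illustrative only, not from the article) builds the eight-entry lookup table from a rule number and confirms that rule six turns a single live cell into a glider heading west (cf. Fig. 8):

```python
def eca_step(cells, rule):
    """One generation of an elementary (2-state, nearest-neighbor) 1D CA.
    cells: set of live cell coordinates; rule: Wolfram rule number 0-255."""
    # Pattern (left, center, right) is read as a 3-bit number; that bit of
    # the rule number gives the center cell's next state (111 -> bit 7, etc.).
    table = {(l, c, r): (rule >> (l * 4 + c * 2 + r)) & 1
             for l in (0, 1) for c in (0, 1) for r in (0, 1)}
    candidates = {x + d for x in cells for d in (-1, 0, 1)}
    return {x for x in candidates
            if table[(int(x - 1 in cells), int(x in cells), int(x + 1 in cells))]}

# Under rule six a single live cell drifts west, two cells every two steps:
g = {0}
for _ in range(2):
    g = eca_step(g, 6)
assert g == {-2}
```

The same function runs rule 110; starting it from a random strip reproduces the textures of Figs. 9 through 11.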


Gliders in Cellular Automata, Figure 7
The one dimensional rules six and 110 are depicted by the diagram shown. There are eight possible states involving a center cell and its two immediate neighbors. The next-generation state of the center cell depends upon the current configuration; each possible current state is given. The rule is specified by the binary number formed by the next-generation states of the center cell. This notation is standard for the simplest 1D CA and was introduced by Wolfram (see [11]), who also converts the binary representation to its decimal equivalent. There are 256 possible rules, but most are not as interesting as rule 110. Rule six is one of many that generate nothing but gliders (see Fig. 8).

Gliders in Cellular Automata, Figure 8 Rule six (along with many others) creates nothing but gliders. At the upper left, we have several generations starting with a single live cell (top). (For 1D CA each successive generation moves vertically down one level on the page.) At the lower left is an enlargement of the first few generations. By following the diagram for rule six in Fig. 7, the reader can see exactly how this configuration evolves. At the top right, we start with a random configuration; at the lower right we have enlarged the small area directly under the large dot. Very quickly, all initial random configurations lead solely to gliders heading west

As we add to the complexity of defining 1D CA we greatly increase the number of possible rules. For example, just by having three states instead of two, we note that instead of 2³ = 8 possible initial states there are now 3³ = 27 (Fig. 12), and we can create 3²⁷ unique rules – more than seven trillion! Wolfram observed that even with more complex 1D rules, the fundamental behavior of all rules is typified by the simplest rules [11]. Gliders in 1D CA are very common (Figs. 8 and 9), but true GL rules are not, because most gliders for stable rules exist against a uniform patterned background (Figs. 9 through 11) instead of a grid of non-living cells.

Two Dimensional Gliders in Non-Square Grids
Although most 2D CA research involves a square grid, the triangular tessellation has been investigated somewhat. Here we have 12 touching neighbors; as with the square grid, they are all treated equally (Fig. 13). The increased number of neighbors allows for the possibility of more GL rules (and hence several gliders). Figure 14 shows many of these gliders and their various GL rules. The GL rule 2,7,8/3 supports two rather unusual gliders (Figs. 15 and 16) and to date is the only known GL rule other than Conway’s original 2,3/3 game of life that exhibits glider guns. Figure 17 shows starting configurations for two of these guns and Fig. 18 exhibits the evolution of the two guns

Gliders in Cellular Automata, Figure 9 Evolution of rule 110 for the first 500 generations, given a random starting configuration. With 1D CA, we can depict a great many generations on a 2D display screen




Gliders in Cellular Automata, Figure 12
There are 3³ = 27 possible configurations when we have three states instead of two. Each configuration would yield some specific outcome as in Fig. 7; thus there would be three possible outcomes for each state, and hence 3²⁷ distinct rules.
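The rule counts follow from simple counting, and are easy to verify (illustrative arithmetic only):

```python
# Two states, nearest neighbors: 2^3 = 8 neighborhood patterns,
# and 2 possible outcomes for each pattern:
assert 2 ** (2 ** 3) == 256

# Three states: 3^3 = 27 patterns, 3 outcomes each -
# more than seven trillion distinct rules:
assert 3 ** (3 ** 3) == 7_625_597_484_987
```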

Gliders in Cellular Automata, Figure 10 Rule 110 at generations 2000–2500. The structures that move vertically are stationary oscillators; slanted structures can be considered gliders. Unlike higher dimensions, where gliders move in an unobstructed grid with no other live cells in the immediate vicinity, many 1D gliders reside in an environment of oscillating cells (the background pattern). The black square outlines an area depicted in the next figure

Gliders in Cellular Automata, Figure 13 Each cell in the triangular grid has 12 touching neighbors. The subject central cells can have two orientations, E and O

Gliders in Cellular Automata, Figure 11 An area from the previous figure enlarged. One can carefully trace the evolution from one generation to the next. The background pattern repeats every seven generations

after 800 generations. Due to the extremely unusual behavior of the period-80 2,7,8/3 glider (Fig. 16), it is highly likely that other guns exist. The hexagonal grid supports the GL rule 3/2, along with the GL rules 3,5/2, 3,5,6/2 and 3,6/2, which all behave in a manner very similar to 3/2. The glider for all four of these rules is shown in Fig. 19. It is possible that no other distinct hexagonal GL rules exist, because with only six touching neighbors, the set of interesting rules is quite limited. Moreover the fertility portion of the rule must start with two, and rules of the form ∗/2,3 are unstable. Thus,


Gliders in Cellular Automata, Figure 14 Most of the known GL rules and their gliders are illustrated. The period for each is given in parentheses

Gliders in Cellular Automata, Figure 16
Here we depict the large 2,7,8/3 glider. Perhaps flamboyant would be a better description, for this glider spews out much debris as it moves along. It has a period of 80 and its exact motion can be traced by observing its position relative to the black dot. Note that the debris tossed behind does not interfere with the 81st generation, where the entire process repeats 12 cells to the right. By carefully positioning two of these gliders, one can (without too much effort) construct a situation where the debris from both gliders interacts in a manner that produces another glider. This was the method used to discover the two guns illustrated in Figs. 17 and 18.

any other hexagonal GL rules must be of the form ∗/2,4; ∗/2,4,5; etc. (i.e. only seven other fertility combinations). A valid GL rule has also been found for at least one pentagonal grid (Fig. 19). Since there are several topologically unique pentagonal tessellations (see [10]), other pentagonal gliders will probably be found, especially when all the variants of the pentagonal grid are investigated.

Three and Four Dimensional Gliders

Gliders in Cellular Automata, Figure 15
The small 2,7,8/3 glider is shown. This glider also exists for the GL rule 2,7/3. The small horizontal dash is for positional reference.

In 1987, the first GL rules in three dimensions were discovered [1,7]. The first gliders found, and their rules, are depicted in Fig. 20. It turns out that the 2D rule 2,3/3 is in many ways contained in the 3D GL rule 5,6,7/6. (Note the similarity between the glider at the bottom of Fig. 20 and the one at the top of Fig. 2.) During the ensuing years, several other 3D gliders were found (Figs. 21 and 22). Most of these gliders were unveiled by employing small random but symmetric initial configurations. The large number of live cells in these 3D gliders implies that they are uncommon random occurrences under their respective GL rules; hence it is highly improbable that a plethora of interesting forms (e.g. glider guns) such as those for the 2D rule 2,3/3 exists in three dimensions.
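The 2D mechanics generalize to any dimension with no change other than the neighborhood size. A hypothetical sketch (our own code; the published 3D glider coordinates are not reproduced here) of one generation with a Moore neighborhood in d dimensions:

```python
from collections import Counter
from itertools import product

def step_nd(cells, env, fert, dim):
    """One generation of rule env/fert with a Moore neighborhood in `dim`
    dimensions; cells is a set of coordinate tuples of length `dim`."""
    # In dim dimensions each cell has 3^dim - 1 touching neighbors
    # (8 in 2D, 26 in 3D, 80 in 4D).
    offsets = [d for d in product((-1, 0, 1), repeat=dim) if any(d)]
    counts = Counter(tuple(x + o for x, o in zip(cell, off))
                     for cell in cells for off in offsets)
    return {c for c, n in counts.items()
            if (n in env and c in cells) or (n in fert and c not in cells)}

# Sanity check in 3D under 5,6,7/6: in a 2x2x1 block every cell has only
# 3 live neighbors and no dead cell reaches 6, so the block vanishes:
block = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
assert step_nd(block, {5, 6, 7}, {6}, 3) == set()
```

With `dim=2` and rule 2,3/3 the function reproduces ordinary game-of-life behavior, which is one way to see the analogy between 2,3/3 and 5,6,7/6.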




Gliders in Cellular Automata, Figure 19
GL rules are supported in pentagonal and hexagonal grids. The pentagonal grid (left) is called the Cairo Tiling, supposedly named after some paving tiles in that city. There are many different topologically distinct pentagonal grids; the Cairo Tiling is but one. At the right are gliders for the hexagonal rules 3/2 and 3/2,4,5. The 3/2 glider also works for 3,5/2, 3,5,6/2 and 3,6/2. All four of these rules are GL rules. The rule 3/2,4,5 is unfortunately disqualified (barely) as a GL rule because very large random blobs will grow without bounds. The periods of the gliders are given in parentheses.

Gliders in Cellular Automata, Figure 17
The GL rule 2,7,8/3 is of special interest in that it is the only known GL rule besides Conway’s rule that supports glider guns – configurations that spew out an endless stream of gliders. In fact, there are probably several such configurations under that rule. Here we illustrate two guns; the top one generates period 18 (small) gliders and the bottom one creates period 80 (large) gliders. Unlike Gosper’s 2,3/3 gun, these guns translate across the grid in the direction indicated. In keeping with the fanciful jargon for names, translating glider guns are also called “rakes”.

Gliders in Cellular Automata, Figure 20
The first three dimensional GL rules were found in 1987; these are the original gliders that were discovered. The rule 5,6,7/6 is analogous to the 2D rule 2,3/3 (see [1]). Note the similarity between this glider and the one at the top of Fig. 2.

Gliders in Cellular Automata, Figure 18 After 800 generations, the two guns from Fig. 17 will have produced the output shown. Motion is in the direction given by the arrows. The gun at the left yields period 18 gliders, one every 80 generations, and the gun at the right produces a period 80 glider every 160 generations

The 3D grid of dense packed spheres has also been investigated somewhat; here each sphere touches exactly 12 neighbors. What is pleasing about this configuration is that each neighbor is identical in the manner that it touches the subject cell, unlike the square and cubic grids, where some neighbors touch on their sides and others at their corners. The gliders for spherical rule 3/3 are shown in Fig. 23. This rule is a borderline GL rule, as random finite configurations appear to stabilize, but infinite ones apparently do not.

Future Directions
Gliders are an important by-product of many cellular automata rules. They have made possible the construction


Gliders in Cellular Automata, Figure 21
Several more 3D GL rules were discovered between 1990 and 1994; they are illustrated here. The 8/5 gliders were originally investigated under the rule 6,7,8/5.

of extremely complicated forms – most notably within the universe of Conway’s rule, 2,3/3. (Figs. 24 and 25 illustrate a remarkable example of this complexity.) Needless to say, many questions remain unanswered. Can a glider gun be constructed for some three dimensional rule? This would most likely be rule 5,6,7/6, which is the three dimensional analog of 2,3/3 [7], but so far no example has been found. The area of cellular automata research is more-or-less in its infancy – especially when we look beyond the square grid. Even higher dimensions have been given a glance; Fig. 26 shows just one of several gliders that are known to exist in four dimensions. Since each cell has 80 touching neighbors, it will come as no surprise that there are a large number of 4D GL rules. But there remains much work to be done in lower dimensions as well. Consider simple one dimensional cellular automata with four possible states. It will be a long time before all of the roughly 10³⁸ possible rules have been investigated!
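The closing estimate can be checked directly: a four-state, nearest-neighbor 1D CA has 4³ = 64 neighborhood patterns and 4 possible outcomes for each, hence 4⁶⁴ rules.

```python
# Four states, nearest neighbors: 4^(4^3) = 4^64 = 2^128 rules,
# which is roughly 3.4 * 10^38:
n_rules = 4 ** (4 ** 3)
assert n_rules == 2 ** 128
assert 10 ** 38 < n_rules < 10 ** 39
```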

Gliders in Cellular Automata, Figure 22 By 2004, computational speed had greatly increased, so another effort was made to find 3D gliders under GL rules; these latest discoveries are illustrated here




Gliders in Cellular Automata, Figure 23 Some work has been done with the 3D grid of dense packed spheres. Two gliders have been discovered for the rule 3/3, which almost qualifies as a GL rule

Gliders in Cellular Automata, Figure 24
The discovery of the glider in 2,3/3, along with the development of several glider guns, has made possible the construction of many extremely complex forms. Here we see a Turing machine, developed in 2001 by Paul Rendell. Figure 25 enlarges a small portion of this structure.

Bibliography
1. Bays C (1987) Candidates for the Game of Life in Three Dimensions. Complex Syst 1:373–400
2. Bays C (1987) Patterns for Simple Cellular Automata in a Universe of Dense Packed Spheres. Complex Syst 1:853–875
3. Bays C (1994) Cellular Automata in the Triangular Tessellation. Complex Syst 8:127–150
4. Bays C (1994) Further Notes on the Game of Three Dimensional Life. Complex Syst 8:67–73
5. Bays C (2005) A Note on the Game of Life in Hexagonal and Pentagonal Tessellations. Complex Syst 15:245–252

Gliders in Cellular Automata, Figure 25 We have enlarged a tiny portion at the upper left of the Turing machine shown in Fig. 24. One can see the complex interplay of gliders, glider guns, and various other stabilizing forms

Gliders in Cellular Automata, Figure 26
Some work (not much) has been done in four dimensions. Here is an example of a glider for the GL rule 11,12/12,13. Many more 4D gliders exist.

6. Bays C (2007) The Discovery of Glider Guns in a Game of Life for the Triangular Tessellation. J Cell Autom 2(4):345–350
7. Dewdney AK (1987) The game Life acquires some successors in three dimensions. Sci Am 256:16–22
8. Gardner M (1970) The fantastic combinations of John Conway’s new solitaire game ‘Life’. Sci Am 223:120–123
9. Preston K Jr, Duff MJB (1984) Modern Cellular Automata. Plenum Press, New York
10. Sugimoto T, Ogawa T (2000) Tiling problem of convex pentagon[s]. Forma 15:75–79
11. Wolfram S (2002) A New Kind of Science. Wolfram Media, Champaign, IL


Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach
SALVATORE GRECO¹, BENEDETTO MATARAZZO¹, ROMAN SŁOWIŃSKI²,³
¹ Faculty of Economics, University of Catania, Catania, Italy
² Poznań University of Technology, Institute of Computing Science, Poznań, Poland
³ Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Article Outline
Glossary
Definition of the Subject
Introduction: Granular Computing and Ordered Data
Philosophical Basis of DRSA
Granular Computing
Dominance-Based Rough Set Approach
Fuzzy Set Extensions of the Dominance-Based Rough Set Approach
Variable-Consistency Dominance-Based Rough Set Approach (VC-DRSA)
Dominance-Based Rough Approximation of a Fuzzy Set
Monotonic Rough Approximation of a Fuzzy Set Versus Classical Rough Set
Dominance-Based Rough Set Approach to Case-Based Reasoning
An Algebraic Structure for Dominance-Based Rough Set Approach
Conclusions
Future Directions
Bibliography

Glossary
Case-based reasoning  Case-based reasoning is a paradigm in machine learning whose idea is that a new problem can be solved by noticing its similarity to a set of problems previously solved. Case-based reasoning regards the inference of some proper conclusions related to a new situation by the analysis of similar cases from a memory of previous cases. Very often similarity between two objects is expressed on a graded scale, and this justifies the application of fuzzy sets in this context. Fuzzy case-based reasoning is a popular approach in this domain.
Decision rule  A decision rule is a logical statement of the type “if . . . , then . . . ”, where the premise (condition

part) specifies values assumed by one or more condition attributes and the conclusion (decision part) specifies an overall judgment.
Dominance-based rough set approach (DRSA)  DRSA permits approximation of a set in universe U based on available ordinal information about objects of U. The decision rules induced within DRSA are also based on ordinal properties of the elementary conditions in the premise and in the conclusion, such as “if property f_i1 is present in degree at least α_i1 and . . . property f_ip is present in degree at least α_ip, then property f_iq is present in degree at least α_iq”.
Fuzzy sets  Differently from ordinary sets, in which an object either belongs or does not belong to a given set, in a fuzzy set an object belongs to the set in some degree. Formally, in universe U a fuzzy set X is characterized by its membership function μ_X : U → [0, 1], such that for any y ∈ U, y certainly does not belong to set X if μ_X(y) = 0, y certainly belongs to X if μ_X(y) = 1, and y belongs to X with a given degree of certainty represented by the value of μ_X(y) in all other cases.
Granular computing  Granular computing is a general computation theory for using granules such as subsets, classes, objects, clusters, and elements of a universe to build an efficient computational model for complex applications with huge amounts of data, information, and knowledge. Granulation of an object a leads to a collection of granules, with a granule being a clump of points (objects) drawn together by indiscernibility, similarity, proximity, or functionality. In human reasoning and concept formulation, the granules and the values of their attributes are fuzzy rather than crisp. In this perspective, fuzzy information granulation may be viewed as a mode of generalization, which can be applied to any concept, method, or theory.
Ordinal properties and monotonicity  Ordinal properties in the description of objects are related to graduality of the presence or absence of a property. In this context, it is meaningful to say that a property is more present in one object than in another object. It is important that the ordinal descriptions are handled properly, that is, without introducing any operation – such as sums, averages, or fuzzy operators like the t-norm or t-conorm of Łukasiewicz – that takes into account cardinal properties of the data not present in the considered descriptions and would therefore give meaningless results. Monotonicity is strongly related to ordinal properties. It regards relationships between degrees of presence or absence of properties in the objects, like “the more present is property f_i, the more present is property f_j”, or “the more present is property f_i, the




more absent is property f_j”. The graded presence or absence of a property can be meaningfully represented using fuzzy sets. More precisely, the degree of presence of property f_i in object y ∈ U is the value given to y by the membership function of the set of objects having property f_i.
Rough set  A rough set in universe U is an approximation of a set based on available information about objects of U. The rough approximation is composed of two ordinary sets called the lower and upper approximation. The lower approximation is a maximal subset of objects which, according to the available information, certainly belong to the approximated set, and the upper approximation is a minimal subset of objects which, according to the available information, possibly belong to the approximated set. The difference between the upper and lower approximation is called the boundary.

Definition of the Subject
This article describes the dominance-based rough set approach (DRSA) to granular computing and data mining. DRSA was first introduced as a generalization of the rough set approach for dealing with multicriteria decision analysis, where preference order is important. The ordering is also important, however, in many other problems of data analysis. Even when an ordering seems absent, the presence or the absence of a property can be represented in ordinal terms, because if two properties are related, the presence, rather than the absence, of one property should make more (or less) probable the presence of the other property. This is even more apparent when the presence or the absence of a property is graded or fuzzy, because in this case, the more credible the presence of a property, the more (or less) probable the presence of the other property. Since the presence of properties, possibly fuzzy, is the basis of any granulation, DRSA can be seen as a general basis for granular computing.
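The lower and upper approximations defined in the glossary can be made concrete. The following is a schematic sketch of the classical, indiscernibility-based rough approximation only (not the dominance-based machinery developed below); the toy medical data and all names in it are our own hypothetical illustration:

```python
from collections import defaultdict

def rough_approximation(universe, target, describe):
    """Lower/upper approximation of target (a subset of universe).
    Objects with equal descriptions form one granule and are indiscernible."""
    granules = defaultdict(set)
    for obj in universe:
        granules[describe(obj)].add(obj)
    lower, upper = set(), set()
    for g in granules.values():
        if g <= target:        # granule wholly inside target: certain members
            lower |= g
        if g & target:         # granule overlapping target: possible members
            upper |= g
    return lower, upper

# Hypothetical toy data: (blood pressure, temperature) descriptions.
patients = {"p1": ("low", "high"), "p2": ("low", "high"),
            "p3": ("normal", "high"), "p4": ("normal", "normal")}
sick = {"p1", "p3"}            # p1 and p2 are indiscernible, yet only p1 is sick
lo, up = rough_approximation(patients, sick, lambda p: patients[p])
assert lo == {"p3"} and up == {"p1", "p2", "p3"}
```

The boundary `up - lo` = {p1, p2} contains exactly the objects whose membership cannot be settled from the available attributes; DRSA replaces the equality-based granules used here with dominance cones over ordered attribute values.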
After presenting the main ideas of DRSA for granular computing and its philosophical basis, the article introduces the basic concepts of DRSA, followed by its extensions in a fuzzy context and in probabilistic terms. This prepares the ground for treating the rough approximation of a fuzzy set, which is the core of the subject. It is also explained why the classical rough set approach is a specific case of DRSA. The article continues with presentation of DRSA for case-based reasoning, where the main ideas of DRSA for granular computing are fruitfully applied. Finally, some basic formal properties of the whole approach are presented in terms of an algebra modeling the logic of DRSA.

Introduction: Granular Computing and Ordered Data Granular computing originated in the research of Lin [37, 38,39,40,41,42] and Zadeh [57,58,59,60] and gained considerable interest in the last decade. The basic components of granular computing are granules, such as subsets, classes, objects, clusters, and elements of a universe. Granulation of an object a leads to a collection of granules, with a granule being a clump of points (objects) drawn together by indiscernibility, similarity, proximity, or functionality. In human reasoning and concept formulation, the granules and the values of their attributes are fuzzy rather than crisp. In this perspective, fuzzy information granulation may be viewed as a mode of generalization, which can be applied to any concept, method, or theory. Moreover, the theory of fuzzy granulation provides a basis for computing with words, due to the observation that in a natural language, words play the role of labels of fuzzy granules. Since fuzzy granulation plays a central role in fuzzy logic and in its applications, and rough set theory can be considered as a crisp granulation of set theory, it is interesting to study the relationship between fuzzy sets and rough sets from this point of view. Moreover, noticing that fuzzy granulation that leads to fuzzy logic underlies all applications of granulation, hybridization of fuzzy sets and rough sets can lead to a more general theory of granulation with a potential of application to any domain of human investigation. This explains the interest in putting together rough sets and fuzzy sets. Recently, it has been shown that a proper way of handling graduality in rough set theory is to use the DRSA [29]. This implies that DRSA is also a proper way of handling granulation within rough set theory. Let us explain this point in detail. The rough set approach has been proposed to approximate some relationships existing between concepts. 
For example, in medical diagnosis the concept of “disease Y” can be represented in terms of such concepts as “low blood pressure” and “high temperature”, or “muscle pain” and “headache”. The classical rough approximation is based on a very coarse representation, that is, for each aspect characterizing a concept (“low blood pressure”, “high temperature”, “muscle pain”, etc.), only its presence or its absence is considered relevant. In this case, the rough approximation involves a very primitive idea of monotonicity related to a scale with only two values: “presence” and “absence”. Monotonicity gains importance when a finer representation of the concepts is considered. A representation is finer when, for each aspect characterizing a concept, not only its presence or its absence is taken into account, but

Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach

also the degree of its presence or absence is considered relevant. Graduality is typical for fuzzy set philosophy [56] and, therefore, a joint consideration of rough sets and fuzzy sets is worthwhile. In fact, rough sets and fuzzy sets capture two basic complementary aspects of monotonicity: rough sets deal with relationships between different concepts, and fuzzy sets deal with the expression of the different dimensions in which the concepts are considered. For this reason, many approaches have been proposed to combine fuzzy sets with rough sets (see e. g. [3,5,45,49,51]). Our combination of rough sets and fuzzy sets presents some important advantages with respect to the other approaches, which are discussed below. The main preoccupation in almost all the studies combining rough sets with fuzzy sets was related to a fuzzy extension of Pawlak’s definition of lower and upper approximations using fuzzy connectives [10,34]. In fact, there is no rule for the choice of the “right” connective, so this choice is always arbitrary to some extent. Another drawback of fuzzy extensions of rough sets involving fuzzy connectives is that they are based on cardinal properties of membership degrees. In consequence, the result of these extensions is sensitive to order-preserving transformations of the membership degrees. For example, consider the t-conorm of Łukasiewicz as a fuzzy connective; it may be used in the definition of both the fuzzy lower approximation (to build a fuzzy implication) and the fuzzy upper approximation (as a fuzzy counterpart of a union). The t-conorm of Łukasiewicz is defined as

T*(α, β) = min(α + β, 1),   α, β ∈ [0, 1].

T*(α, β) can be interpreted as follows. Given two fuzzy propositions p and q, and putting v(p) = α and v(q) = β, T*(α, β) can be interpreted as v(p ∨ q), the truth value of the proposition p ∨ q. Let us consider the following values of the arguments:

α = 0.5,   β = 0.3,   γ = 0.2,   δ = 0.1,

and their order-preserving transformation:

α′ = 0.4,   β′ = 0.3,   γ′ = 0.2,   δ′ = 0.05.

The values of the t-conorm in the two cases are as follows:

T*(α, δ) = 0.6,   T*(α′, δ′) = 0.45,
T*(β, γ) = 0.5,   T*(β′, γ′) = 0.5.

One can see that the order of the results has changed after the order-preserving transformation of the arguments: T*(α, δ) > T*(β, γ), but T*(α′, δ′) < T*(β′, γ′). This means that the Łukasiewicz t-conorm takes into account not only the ordinal properties of the truth values, but also their cardinal properties. A natural question

arises: is it reasonable to expect from truth values a cardinal content instead of an ordinal one only? Or, in other words, is it realistic to claim that a human is able to say in a meaningful way not only that (a) “proposition p is more credible than proposition q”, but even something like (b) “proposition p is two times more credible than proposition q”? It is much safer to consider information of type (a), because information of type (b) is rather meaningless for a human. Since the fuzzy generalization of rough set theory using DRSA takes into account only the ordinal properties of fuzzy membership degrees, it is the proper way of fuzzy generalization of rough set theory. Moreover, the classical rough set approach [46,47] can be seen as a specific case of our general model. This is important for several reasons. In particular, this interpretation of DRSA gives an insight into fundamental properties of the classical rough set approach and permits its further generalization. Rough set theory [46,47] relies on the idea that some knowledge (data, information) is available about objects of a universe of discourse U. Thus, a subset of U is defined using the available knowledge about the objects and not on the basis of information about membership or non-membership of the objects in the subset. For example, knowledge about patients suffering from a certain disease may contain information about body temperature, blood pressure, etc. All patients described by the same information are indiscernible in view of the available knowledge, and form groups of similar objects. These groups are called elementary sets, and can be considered as basic granules of the available knowledge about patients. Elementary sets can be combined into compound concepts. For example, elementary sets of patients can be used to represent a set of patients suffering from a certain disease. Any union of elementary sets is called a crisp set, while other sets are referred to as rough sets.
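The order reversal produced by the Łukasiewicz t-conorm is easy to check numerically. The following sketch in plain Python simply recomputes the example values given earlier:

```python
def lukasiewicz_tconorm(a, b):
    """Łukasiewicz t-conorm: T*(a, b) = min(a + b, 1)."""
    return min(a + b, 1.0)

# Membership degrees from the example and their order-preserving transformation
alpha, beta, gamma, delta = 0.5, 0.3, 0.2, 0.1
alpha_, beta_, gamma_, delta_ = 0.4, 0.3, 0.2, 0.05

# Before: T*(alpha, delta) = 0.6 > T*(beta, gamma) = 0.5
# After:  T*(alpha_, delta_) = 0.45 < T*(beta_, gamma_) = 0.5
print(round(lukasiewicz_tconorm(alpha, delta), 2),
      round(lukasiewicz_tconorm(beta, gamma), 2))    # 0.6 0.5
print(round(lukasiewicz_tconorm(alpha_, delta_), 2),
      round(lukasiewicz_tconorm(beta_, gamma_), 2))  # 0.45 0.5
```

The ranking of the two pairs flips even though the transformation preserves the order of every individual degree, which is exactly the cardinal (non-ordinal) behavior discussed above.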
Each rough set has boundary line objects, i. e. objects which, in view of the available knowledge, cannot be classified with certainty as members of the set or of its complement. Therefore, in the rough set approach, any set is associated with a pair of crisp sets, called the lower and the upper approximation. Intuitively, in view of the available information, the lower approximation consists of all objects which certainly belong to the set and the upper approximation contains all objects which possibly belong to the set. The difference between the upper and the lower approximation constitutes the boundary region of the rough set. Analogously, for a partition of


universe U into classes, one may consider the rough approximation of the partition. It has appeared to be particularly useful for the analysis of classification problems, which are the most common decision problems. The rough set approach operates on an information table composed of a set U of objects described by a set Q of attributes. If in the set Q disjoint sets C and D of condition and decision attributes are distinguished, then the information table is called a decision table. It is often assumed, without loss of generality, that D is a singleton {d}, and thus decision attribute d makes a partition of set U into decision classes corresponding to its values. Data collected in such a decision table correspond to a multiple attribute classification problem. The classical Indiscernibility-Based Rough Set Approach (IRSA) is naturally adapted to the analysis of this type of decision problem, because the set of objects can be identified with examples of classification and it is possible to extract all the essential knowledge contained in the decision table using indiscernibility or similarity relations. However, as pointed out by the authors (see e. g. [17,20,26,54]), IRSA cannot extract all the essential knowledge contained in the decision table if background knowledge about monotonic relationships between the evaluation of objects on condition attributes and their assignment to decision classes has to be taken into account. Such background knowledge is typical for data describing various phenomena, as well as for data describing multiple criteria decision problems (see e. g. [9]), e. g., “the larger the mass and the smaller the distance, the larger the gravity”, “the more a tomato is red, the more it is ripe” or “the better the school marks of a pupil, the better his overall classification”.
The monotonic relationships, typical for multiple criteria decision problems, follow from the preferential ordering of the value sets of attributes (scales of criteria), as well as the preferential ordering of decision classes. In order to take into account the ordinal properties of the considered attributes and the monotonic relationships between condition and decision attributes, a number of methodological changes to the original rough set theory were necessary. The main change was the replacement of the indiscernibility relation with a dominance relation, which permits approximation of ordered sets. The dominance relation is a very natural and rational concept within multiple criteria decision analysis. The dominance-based rough set approach (DRSA) has been proposed and characterized by the authors (see e. g. [17,20,24,25,26,54]). Let us mention that ordered value sets of attributes and a kind of order dependency among attributes have also been considered by specialists in relational databases (see, e. g., [13]). There is, however, a striking difference between the consideration of orders in database queries and the consideration of orders in knowledge discovery. More precisely, assuming ordered domains of attributes, knowledge discovery tends to discover monotonic relationships between ordered attributes, e. g., if a student is at least medium in networks, and at least good in databases, then his overall evaluation is at least medium. On the other hand, assuming an order of attribute value sets and order dependency in databases, one can exploit this information for a more efficient answer to a query; e. g., when the dates of bank checks and their numbers are ordered, there is an order dependency between these two attributes, because on day x a check cannot hold a number smaller than checks from day x − 1, which permits us to prune the search tree and make the search more efficient. Looking at DRSA from a granular computing perspective, one can observe that DRSA permits us to deal with ordered data by considering a specific type of information granule defined by means of dominance-based constraints having a syntax of the type “x is at least R” or “x is at most R”, where R is a qualifier from a properly ordered scale. In the evaluation space, such granules are dominance cones. In this sense, the contribution of DRSA consists of:

• extending the paradigm of granular computing to problems involving ordered data,
• specifying a proper syntax and modality of information granules (the dominance-based constraints, which should be adjoined to other modalities of information constraints, such as possibilistic, veristic, and probabilistic [60]),
• defining a methodology dealing properly with this type of information granule, resulting in a theory of computing with words and reasoning about data in the case of ordered data.

Let us observe that other modalities of information constraints, such as veristic, possibilistic, and probabilistic, also have to deal with ordered values (with qualifiers relative to grades of truth, possibility, and probability).
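A dominance-based granule of the kind just described ("x is at least R" on each considered attribute) can be sketched in a few lines of code. The student data and the ordered scale below are hypothetical, chosen only to mirror the student example in the text:

```python
# Assumed ordered qualitative scale (larger index = better)
SCALE = {"bad": 0, "medium": 1, "good": 2, "very good": 3}

# Hypothetical students and their evaluations on two ordered attributes
students = {
    "s1": {"networks": "good", "databases": "very good"},
    "s2": {"networks": "medium", "databases": "good"},
    "s3": {"networks": "bad", "databases": "very good"},
    "s4": {"networks": "good", "databases": "medium"},
}

def in_upward_cone(evals, bounds):
    """True iff the object is 'at least R' on every constrained attribute."""
    return all(SCALE[evals[a]] >= SCALE[r] for a, r in bounds.items())

# Granule: "at least medium in networks AND at least good in databases"
granule = {s for s, e in students.items()
           if in_upward_cone(e, {"networks": "medium", "databases": "good"})}
print(sorted(granule))  # ['s1', 's2']
```

In the evaluation space, `granule` is exactly the upward dominance cone spanned by the profile (medium, good): s3 and s4 fall outside it because each fails one of the lower bounds.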
Therefore, granular computing with ordered data, and DRSA as a proper way of reasoning about ordered data, are very important for the future development of the whole domain of granular computing. Indeed, the DRSA approach proposed in [15,16] avoids an arbitrary choice of fuzzy connectives and meaningless operations on membership degrees. It exploits only the ordinal character of the membership degrees and proposes a methodology of fuzzy rough approximation that infers the most cautious conclusion from the available imprecise information. In particular, any approximation of knowledge about concept Y using knowledge about concept X is based on positive or negative relationships between premises and conclusions, i. e.:

(i) “The more x is X, the more it is Y” (positive relationship),
(ii) “The more x is X, the less it is Y” (negative relationship).

The following simple relationships illustrate (i) and (ii):

• “The larger the market share of a company, the greater its profit” (positive relationship), and
• “The greater the debt of a company, the smaller its profit” (negative relationship).

These relationships have the form of gradual decision rules [4]. Examples of such decision rules are: “if a car is speedy with credibility at least 0.8 and it has high fuel consumption with credibility at most 0.7, then it is a good car with credibility at least 0.9”, and “if a car is speedy with credibility at most 0.5 and it has high fuel consumption with credibility at least 0.8, then it is a good car with credibility at most 0.6”. It is worth noting that the syntax of gradual decision rules is based on monotonic relationships between degrees of credibility, which can also be found in dominance-based decision rules induced from preference-ordered data. This explains why one can build a fuzzy rough approximation using DRSA. Finally, the fuzzy rough approximation taking into account monotonic relationships can be applied to case-based reasoning [28]. In this perspective, it is interesting to consider monotonicity of the type “the more similar y is to x, the more credible it is that y belongs to the same class as x”. Application of DRSA in this context leads to decision rules similar to the gradual decision rules: “the more object z is similar to a referent object x w.r.t. condition attribute s, the more z is similar to a referent object x w.r.t. decision attribute t”, or, equivalently but more technically,

s(z, x) ≥ α  ⇒  t(z, x) ≥ α,

where functions s and t measure the credibility of similarity with respect to condition attribute and decision attribute, respectively. When there are multiple condition and decision attributes, functions s and t aggregate similarity with respect to these attributes. The decision rules induced on the basis of DRSA do not need the aggregation of the similarity with respect to different attributes into one comprehensive similarity. This is important because it permits us to avoid using aggregation operators (weighted average, min, etc.) which are always arbitrary to some extent. Moreover, the DRSA

decision rules permit us to consider different thresholds for degrees of credibility in the premise and in the conclusion. The article is organized as follows. The next section presents the philosophical basis of DRSA granular computing. Section “Dominance-Based Rough Set Approach” recalls the main steps of DRSA for multiple criteria classification or, in general, for ordinal classification. Section “Fuzzy Set Extensions of the Dominance-Based Rough Set Approach” presents a review of fuzzy set extensions of DRSA based on fuzzy connectives. In Sect. “Variable-Consistency Dominance-Based Rough Set Approach (VC-DRSA)”, a “probabilistic” version of DRSA, the variable-consistency dominance-based rough set approach, is presented. Dominance-based rough approximation of a fuzzy set is presented in Sect. “Dominance-Based Rough Approximation of a Fuzzy Set”. The explanation that IRSA is a particular case of DRSA is given in Sect. “Monotonic Rough Approximation of a Fuzzy Set versus Classical Rough Set”. Section “Dominance-Based Rough Set Approach to Case-Based Reasoning” is devoted to DRSA for case-based reasoning. In Sect. “An Algebraic Structure for Dominance-Based Rough Set Approach” an algebra modeling the logic of DRSA is presented. Section “Conclusions” contains conclusions, and Sect. “Future Directions” presents some issues for further development.

Philosophical Basis of DRSA Granular Computing

It is interesting to analyze the relationships between DRSA and granular computing from the point of view of the philosophical basis of rough set theory proposed by Pawlak. Since, according to Pawlak [48], rough set theory refers to some ideas of Gottlob Frege (vague concepts), Gottfried Leibniz (indiscernibility), George Boole (reasoning methods), Jan Łukasiewicz (multi-valued logic), and Thomas Bayes (inductive reasoning), it is meaningful to give an account of the DRSA generalization of rough sets, justifying it with reference to some of these main ideas recalled by Pawlak.
The identity of indiscernibles is a principle of analytic ontology first explicitly formulated by Gottfried Leibniz in his Discourse on Metaphysics, Sect. 9 [43]. Two objects x and y are said to be indiscernible if x and y have the same properties. The principle of identity of indiscernibles states that

if x and y are indiscernible, then x = y. (II1)

This can also be expressed as: if x ≠ y, then x and y are discernible, i. e. there is at least one property that x has and y does not, or vice versa. The converse of the principle of


identity of indiscernibles is called the indiscernibility of identicals and states that if x = y, then x and y are indiscernible, i. e. they have the same properties. This is equivalent to saying that if there is at least one property that x has and y does not, or vice versa, then x ≠ y. The conjunction of both principles is often referred to as “Leibniz’s law”. Rough set theory is based on a weaker interpretation of Leibniz’s law, whose objective is the ability to classify objects falling under the same concept. This reinterpretation of Leibniz’s law is based on a reformulation of the principle of identity of indiscernibles as follows:

if x and y are indiscernible, then x and y belong to the same class. (II2)

Let us observe that the word “class” in the previous sentence can be considered a synonym of “granule”. Thus, from the point of view of granular computing, (II2) can be rewritten as

if x and y are indiscernible, then x and y belong to the same granule of classification. (II2’)

Notice also that the principle of indiscernibility of identicals cannot be reformulated in analogous terms. In fact, such an analogous reformulation would amount to stating that if x and y belong to the same class, then x and y are indiscernible. This principle is too strict, however, because there can be two discernible objects x and y belonging to the same class. Thus, within rough set theory, the principle of indiscernibility of identicals should continue to hold in its original formulation (i. e. if x = y, then x and y are indiscernible). It is worthwhile to observe that the relaxation of the consequent of the implication from (II1) to (II2) implies an implicit relaxation of the antecedent as well. In fact, one could say that two objects are identical if they have the same properties only if one were able to take into account all conceivable properties. Due to human limitations this is not the case; therefore, (II2) can be properly reformulated as

if x and y are indiscernible taking into account a given set of properties, then x and y belong to the same class. (II2’’)

This weakening of the antecedent of the implication means also that the objects indiscernible with respect to a given set of properties can be seen as a granule, such that, finally, (II2’’) can be rewritten in terms of granulation as

if x and y belong to the same granule with respect to a given set of properties, then x and y belong to the same classification granule. (II2’’’)

For this reason, rough set theory needs a still weaker form of the principle of identity of indiscernibles. Such a principle can be formulated using the idea of vagueness due to Gottlob Frege. According to Frege, “the concept must have a sharp boundary – to the concept without a sharp boundary there would correspond an area that had not a sharp boundary-line all around”. Therefore, following this intuition, the principle of identity of indiscernibles can be further reformulated as

if x and y are indiscernible, then x and y should belong to the same class. (II3)

In terms of granular computing, (II3) can be rewritten as

if x and y belong to the same granule with respect to a given set of properties, then x and y should belong to the same classification granule. (II3’)

This reformulation of the principle of identity of indiscernibles implies that there is an inconsistency in the statement that x and y are indiscernible and x and y belong to different classes. Thus, Leibniz’s principle of identity of indiscernibles and Frege’s intuition about vagueness underlie the basic idea of the rough set concept proposed by Pawlak. The above reconstruction of the basic idea of Pawlak’s rough set should be completed, however, by referring to another basic idea. This is the idea of George Boole that a property is either satisfied or not satisfied. It is quite natural to weaken this principle by admitting that a property can be satisfied to some degree. This idea of graduality can be attributed to Jan Łukasiewicz and his proposal of many-valued logic where, in addition to the well-known truth values “true” and “false”, other truth values representing partial degrees of truth are present. Łukasiewicz’s idea of graduality has been reconsidered, generalized and fully exploited by Zadeh [56] within fuzzy set theory, where graduality concerns membership in a set. In this sense, any proposal of putting rough sets and fuzzy sets together can be seen as a reconstruction of the rough set concept, where Boole’s idea of binary logic is abandoned in favor of Łukasiewicz’s idea of many-valued logic, such that Leibniz’s principle of identity of indiscernibles and Frege’s intuition about vagueness are combined with the idea that a property is satisfied to some degree. Putting aside, for the moment, Frege’s intuition about vagueness, but taking into account the concept of graduality, the principle of identity of indiscernibles can be reformulated as follows:


if the grade of each property for x is greater than or equal to the grade for y, then x belongs to the considered class in a grade at least as high as y. (II4)

Taking into account the paradigm of granular computing, (II4) can be rewritten as

if x belongs to the granules defined by the considered properties more than y, because the grade of each property for x is greater than or equal to the grade for y, then x belongs to the considered classification granule in a grade at least as high as y. (II4’)

Considering the concept of graduality together with Frege’s intuition about vagueness, one can reformulate the principle of identity of indiscernibles as follows:

if the grade of each property for x is greater than or equal to the grade for y, then x should belong to the considered class in a grade at least as high as y. (II5)

In terms of granular computing, (II5) can be rewritten as

if x belongs to the granules defined by the considered properties more than y, because the grade of each property for x is greater than or equal to the grade for y, then x should belong to the considered classification granule in a grade at least as high as y. (II5’)

The formulation (II5’) of the principle of identity of indiscernibles is perfectly concordant with the rough set concept defined within the dominance-based rough set approach [20]. DRSA has been proposed by the authors to deal with ordinal properties of data related to preferences in decision problems [26,54]. The fundamental feature of DRSA is that it handles the monotonicity of the comprehensive evaluation of objects with respect to preferences relative to the evaluation of these objects on particular attributes. For example, the more preferred a car is with respect to such attributes as maximum speed, acceleration, fuel consumption, and price, the better its comprehensive evaluation. The type of monotonicity considered within DRSA is also meaningful for problems where relationships between different aspects of a phenomenon described by data are to be taken into account, even if preferences are not considered. Indeed, monotonicity concerns, in general, mutual trends existing between different variables, like distance and gravity in physics, or inflation rate and interest rate

in economics. Whenever a relationship between different aspects of a phenomenon is discovered, this relationship can be represented by a monotonicity with respect to some specific measures of the considered aspects. Formulation (II5) of the principle of identity of indiscernibles refers to this type of monotonic relationship. So, in general, monotonicity permits us to translate into a formal language a primitive intuition of relationships between different concepts of our knowledge, corresponding to the principle of identity of indiscernibles formulated as (II5’).

Dominance-Based Rough Set Approach

This section presents the main concepts of the dominance-based rough set approach (for a more complete presentation see, for example, [17,20,26,54]). Information about objects is represented in the form of an information table. The rows of the table are labeled by objects, whereas columns are labeled by attributes, and entries of the table are attribute values. Formally, an information system (table) is the 4-tuple S = ⟨U, Q, V, φ⟩, where U is a finite set of objects, Q is a finite set of attributes, V = ⋃_{q∈Q} V_q where V_q is the set of values of attribute q, and φ: U × Q → V is a total function such that φ(x, q) ∈ V_q for every q ∈ Q, x ∈ U, called an information function [47]. The set Q is, in general, divided into a set C of condition attributes and a set D of decision attributes. Condition attributes with value sets ordered according to decreasing or increasing preference are called criteria. For criterion q ∈ Q, ⪰_q is a weak preference relation on U such that x ⪰_q y means “x is at least as good as y with respect to criterion q”. It is supposed that ⪰_q is a complete preorder, i. e. a strongly complete and transitive binary relation, defined on U on the basis of the evaluations φ(·, q). Without loss of generality, the preference is supposed to increase with the value of φ(·, q) for every criterion q ∈ C, such that for all x, y ∈ U, x ⪰_q y if and only if φ(x, q) ≥ φ(y, q).
Furthermore, it is supposed that the set of decision attributes D is a singleton {d}. The values of decision attribute d make a partition of U into a finite number of decision classes, Cl = {Cl_t, t = 1, …, n}, such that each x ∈ U belongs to one and only one class Cl_t ∈ Cl. It is supposed that the classes are preference-ordered, i. e. for all r, s ∈ {1, …, n} such that r > s, the objects from Cl_r are preferred to the objects from Cl_s. More formally, if ⪰ is a comprehensive weak preference relation on U, i. e. if for all x, y ∈ U, x ⪰ y means “x is at least as good as y”, it is supposed that [x ∈ Cl_r, y ∈ Cl_s, r > s] ⇒ [x ⪰ y and not y ⪰ x]. The above assumptions are typical for consideration of ordinal classification problems (also called multiple criteria sorting problems). The sets to be approximated are called the upward union and downward union of classes, respectively:

Cl_t^≥ = ⋃_{s≥t} Cl_s,   Cl_t^≤ = ⋃_{s≤t} Cl_s,   t = 1, …, n.

The statement x ∈ Cl_t^≥ means “x belongs to at least class Cl_t”, while x ∈ Cl_t^≤ means “x belongs to at most class Cl_t”. Let us remark that Cl_1^≥ = Cl_n^≤ = U, Cl_n^≥ = Cl_n and Cl_1^≤ = Cl_1. Furthermore, for t = 2, …, n,

Cl_{t−1}^≤ = U − Cl_t^≥   and   Cl_t^≥ = U − Cl_{t−1}^≤.
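The upward and downward unions and their complementarity can be sketched directly in code. The class assignments below are illustrative, not taken from the article:

```python
# Hypothetical assignment of objects to preference-ordered classes 1..n
classes = {"x1": 3, "x2": 2, "x3": 2, "x4": 1, "x5": 3}
U = set(classes)
n = 3

def upward(t):    # Cl_t^>= : objects in class t or better
    return {x for x, c in classes.items() if c >= t}

def downward(t):  # Cl_t^<= : objects in class t or worse
    return {x for x, c in classes.items() if c <= t}

# Boundary cases and the complementarity Cl_{t-1}^<= = U - Cl_t^>=
assert upward(1) == U and downward(n) == U
for t in range(2, n + 1):
    assert downward(t - 1) == U - upward(t)

print(sorted(upward(2)))  # ['x1', 'x2', 'x3', 'x5']
```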

The key idea of the rough set approach is the representation (approximation) of knowledge generated by decision attributes by “granules of knowledge” generated by condition attributes. In DRSA, where condition attributes are criteria and decision classes are preference-ordered, the represented knowledge is a collection of upward and downward unions of classes, and the “granules of knowledge” are sets of objects defined using a dominance relation. x dominates y with respect to P ⊆ C (shortly, x P-dominates y), denoted by x D_P y, if for every criterion q ∈ P, φ(x, q) ≥ φ(y, q). The relation of P-dominance is reflexive and transitive, that is, it is a partial preorder. Given a set of criteria P ⊆ C and x ∈ U, the “granules of knowledge” used for approximation in DRSA are:

• the set of objects dominating x, called the P-dominating set, D_P^+(x) = {y ∈ U : y D_P x},
• the set of objects dominated by x, called the P-dominated set, D_P^−(x) = {y ∈ U : x D_P y}.

Note that the “granules of knowledge” defined above have the form of upward (positive) and downward (negative) dominance cones in the evaluation space. Let us recall that the dominance principle (or Pareto principle) requires that an object x dominating object y on all considered criteria (i. e. x having evaluations at least as good as y on all considered criteria) should also dominate y on the decision (i. e. x should be assigned to at least as good a decision class as y). This principle is the only objective principle that is widely agreed upon in multiple criteria comparisons of objects. Given P ⊆ C, the inclusion of an object x ∈ U in the upward union of classes Cl_t^≥, t = 2, …, n, is inconsistent with the dominance principle if one of the following conditions holds:

• x belongs to class Cl_t or better but is P-dominated by an object belonging to a class worse than Cl_t, i. e. x ∈ Cl_t^≥ but D_P^+(x) ∩ Cl_{t−1}^≤ ≠ ∅,
• x belongs to a class worse than Cl_t but P-dominates an object belonging to class Cl_t or better, i. e. x ∉ Cl_t^≥ but D_P^−(x) ∩ Cl_t^≥ ≠ ∅.

If, given a set of criteria P ⊆ C, the inclusion of x ∈ U in Cl_t^≥, where t = 2, …, n, is inconsistent with the dominance principle, then x belongs to Cl_t^≥ with some ambiguity. Thus, x belongs to Cl_t^≥ without any ambiguity with respect to P ⊆ C if x ∈ Cl_t^≥ and there is no inconsistency with the dominance principle. This means that all objects P-dominating x belong to Cl_t^≥, i. e. D_P^+(x) ⊆ Cl_t^≥. Furthermore, x possibly belongs to Cl_t^≥ with respect to P ⊆ C if one of the following conditions holds:

• according to decision attribute d, x belongs to Cl_t^≥,
• according to decision attribute d, x does not belong to Cl_t^≥, but it is inconsistent in the sense of the dominance principle with an object y belonging to Cl_t^≥.

In terms of ambiguity, x possibly belongs to Cl_t^≥ with respect to P ⊆ C if x belongs to Cl_t^≥ with or without ambiguity. Because of the reflexivity of the dominance relation D_P, the above conditions can be summarized as follows: x possibly belongs to class Cl_t or better, with respect to P ⊆ C, if among the objects P-dominated by x there is an object y belonging to class Cl_t or better, i. e. D_P^−(x) ∩ Cl_t^≥ ≠ ∅.

The P-lower approximation of Cl_t^≥, denoted by P̲(Cl_t^≥), and the P-upper approximation of Cl_t^≥, denoted by P̄(Cl_t^≥), are defined as follows (t = 1, …, n):

P̲(Cl_t^≥) = {x ∈ U : D_P^+(x) ⊆ Cl_t^≥},
P̄(Cl_t^≥) = {x ∈ U : D_P^−(x) ∩ Cl_t^≥ ≠ ∅}.

Analogously, one can define the P-lower and P-upper approximations of Cl_t^≤ as follows (t = 1, …, n):

P̲(Cl_t^≤) = {x ∈ U : D_P^−(x) ⊆ Cl_t^≤},
P̄(Cl_t^≤) = {x ∈ U : D_P^+(x) ∩ Cl_t^≤ ≠ ∅}.

The P-lower and P-upper approximations so defined satisfy the following inclusion properties, for each t ∈ {1, …, n} and for all P ⊆ C:

P̲(Cl_t^≥) ⊆ Cl_t^≥ ⊆ P̄(Cl_t^≥),   P̲(Cl_t^≤) ⊆ Cl_t^≤ ⊆ P̄(Cl_t^≤).


The P-lower and P-upper approximations of Cl_t^≥ and Cl_t^≤ have an important complementarity property, according to which

P̲(Cl_t^≥) = U − P̄(Cl_{t−1}^≤) and P̄(Cl_t^≥) = U − P̲(Cl_{t−1}^≤), t = 2, …, n;
P̲(Cl_t^≤) = U − P̄(Cl_{t+1}^≥) and P̄(Cl_t^≤) = U − P̲(Cl_{t+1}^≥), t = 1, …, n − 1.

The P-boundaries of Cl_t^≥ and Cl_t^≤, denoted by Bn_P(Cl_t^≥) and Bn_P(Cl_t^≤) respectively, are defined as follows (t = 1, …, n):

Bn_P(Cl_t^≥) = P̄(Cl_t^≥) − P̲(Cl_t^≥),   Bn_P(Cl_t^≤) = P̄(Cl_t^≤) − P̲(Cl_t^≤).
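As a minimal sketch of how these approximations behave, the following code computes the P-dominating and P-dominated sets and the P-lower and P-upper approximations of an upward union for a small invented decision table (object evaluations and classes are hypothetical; both criteria are gain-type):

```python
# Hypothetical decision table: object -> ((evaluations on criteria q1, q2), class)
objects = {
    "x1": ((8, 7), 3),
    "x2": ((6, 6), 2),
    "x3": ((7, 8), 1),  # dominates x2 yet sits in a worse class: inconsistency
    "x4": ((5, 4), 1),
    "x5": ((6, 9), 3),
}

def dominates(x, y):
    """x P-dominates y: x is at least as good as y on every criterion."""
    return all(a >= b for a, b in zip(objects[x][0], objects[y][0]))

def d_plus(x):   # P-dominating set D_P^+(x) = {y : y D_P x}
    return {y for y in objects if dominates(y, x)}

def d_minus(x):  # P-dominated set D_P^-(x) = {y : x D_P y}
    return {y for y in objects if dominates(x, y)}

def upward_union(t):  # Cl_t^>=
    return {x for x, (_, c) in objects.items() if c >= t}

def lower(t):  # P-lower approximation: D_P^+(x) included in Cl_t^>=
    cl = upward_union(t)
    return {x for x in cl if d_plus(x) <= cl}

def upper(t):  # P-upper approximation: D_P^-(x) intersects Cl_t^>=
    cl = upward_union(t)
    return {x for x in objects if d_minus(x) & cl}

def boundary(t):  # Bn_P(Cl_t^>=)
    return upper(t) - lower(t)

print(sorted(lower(2)))     # ['x1', 'x5']
print(sorted(upper(2)))     # ['x1', 'x2', 'x3', 'x5']
print(sorted(boundary(2)))  # ['x2', 'x3']
```

Here x2 drops out of the lower approximation because it is P-dominated by x3, which belongs to a worse class, and x3 in turn enters the upper approximation because it P-dominates x2; the two objects form the boundary region.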

Because of the complementarity property, Bn_P(Cl_t^≥) = Bn_P(Cl_{t−1}^≤), for t = 2, …, n. The dominance-based rough approximations of upward and downward unions of classes can serve to induce “if …, then …” decision rules. It is meaningful to consider the following five types of decision rules:

1) Certain D≥-decision rules: if x_{q1} ⪰_{q1} r_{q1} and x_{q2} ⪰_{q2} r_{q2} and … x_{qp} ⪰_{qp} r_{qp}, then x certainly belongs to Cl_t^≥, where, for each w_q, z_q ∈ X_q, “w_q ⪰_q z_q” means “w_q is at least as good as z_q”.
2) Possible D≥-decision rules: if x_{q1} ⪰_{q1} r_{q1} and x_{q2} ⪰_{q2} r_{q2} and … x_{qp} ⪰_{qp} r_{qp}, then x possibly belongs to Cl_t^≥.
3) Certain D≤-decision rules: if x_{q1} ⪯_{q1} r_{q1} and x_{q2} ⪯_{q2} r_{q2} and … x_{qp} ⪯_{qp} r_{qp}, then x certainly belongs to Cl_t^≤, where, for each w_q, z_q ∈ X_q, “w_q ⪯_q z_q” means “w_q is at most as good as z_q”.
4) Possible D≤-decision rules: if x_{q1} ⪯_{q1} r_{q1} and x_{q2} ⪯_{q2} r_{q2} and … x_{qp} ⪯_{qp} r_{qp}, then x possibly belongs to Cl_t^≤.
5) Approximate D≥≤-decision rules: if x_{q1} ⪰_{q1} r_{q1} and … x_{qk} ⪰_{qk} r_{qk} and x_{q(k+1)} ⪯_{q(k+1)} r_{q(k+1)} and … x_{qp} ⪯_{qp} r_{qp}, then x ∈ Cl_s^≥ ∩ Cl_t^≤, where s < t.

The rules of types 1) and 3) represent certain knowledge extracted from the decision table, while the rules of types 2) and 4) represent possible knowledge. Rules of type 5) represent doubtful knowledge.

Fuzzy Set Extensions of the Dominance-Based Rough Set Approach

The concept of dominance can be refined by introducing gradedness through the use of fuzzy sets. Here are basic

definitions of fuzzy connectives [10,34]. For each proposition p, one can consider its truth value v(p), ranging from v(p) = 0 (p is definitely false) to v(p) = 1 (p is definitely true); for all intermediate values, the greater v(p), the more credible is the truth of p. A negation is a non-increasing function N: [0,1] → [0,1] such that N(0) = 1 and N(1) = 0. Given proposition p, N(v(p)) states the credibility of the negation of p. A t-norm T and a t-conorm T* are two functions T: [0,1] × [0,1] → [0,1] and T*: [0,1] × [0,1] → [0,1] such that, given two propositions p and q, T(v(p), v(q)) represents the credibility of the conjunction of p and q, and T*(v(p), v(q)) represents the credibility of the disjunction of p and q. A t-norm T and a t-conorm T* must satisfy the following properties:

T(α, β) = T(β, α) and T*(α, β) = T*(β, α), for all α, β ∈ [0,1];
T(α, β) ≤ T(γ, δ) and T*(α, β) ≤ T*(γ, δ), for all α, β, γ, δ ∈ [0,1] such that α ≤ γ and β ≤ δ;
T(α, T(β, γ)) = T(T(α, β), γ) and T*(α, T*(β, γ)) = T*(T*(α, β), γ), for all α, β, γ ∈ [0,1];
T(1, α) = α and T*(0, α) = α, for all α ∈ [0,1].

A negation is strict iff it is strictly decreasing and continuous. A negation N is involutive iff, for all α ∈ [0,1], N(N(α)) = α. A strong negation is an involutive strict negation. If N is a strong negation, then (T, T*, N) is a de Morgan triplet iff N(T*(α, β)) = T(N(α), N(β)).

A fuzzy implication is a function I: [0,1] × [0,1] → [0,1] such that, given two propositions p and q, I(v(p), v(q)) represents the credibility of the implication of q by p. A fuzzy implication must satisfy the following properties (see [10]):

I(α, β) ≥ I(γ, β) for all α, β, γ ∈ [0,1] such that α ≤ γ;
I(α, β) ≥ I(α, γ) for all α, β, γ ∈ [0,1] such that β ≥ γ;
I(0, α) = 1 and I(α, 1) = 1 for all α ∈ [0,1];
I(1, 0) = 0.

An implication I→_{N,T*} is a T*-implication if there is a t-conorm T* and a strong negation N such that I→_{N,T*}(α, β) = T*(N(α), β). A fuzzy similarity relation on


Granular Computing and Data Mining for Ordered Data: The Dominance-Based Rough Set Approach

the universe U is a fuzzy binary relation (i.e. a function R: U × U → [0,1]) that is reflexive (R(x,x) = 1 for all x ∈ U), symmetric (R(x,y) = R(y,x) for all x, y ∈ U) and transitive (given a t-norm T, T(R(x,y), R(y,z)) ≤ R(x,z) for all x, y, z ∈ U).

Let ⪰_q be a fuzzy weak preference relation on U with respect to criterion q ∈ C, i.e. ⪰_q: U × U → [0,1], such that, for all x, y ∈ U, ⪰_q(x,y) represents the credibility of the proposition "x is at least as good as y with respect to criterion q". Suppose that ⪰_q is a fuzzy partial T-preorder, i.e. that it is reflexive (⪰_q(x,x) = 1 for each x ∈ U) and T-transitive (T(⪰_q(x,y), ⪰_q(y,z)) ≤ ⪰_q(x,z) for each x, y, z ∈ U) (see [10]). Using the fuzzy weak preference relations ⪰_q, q ∈ C, a fuzzy dominance relation on U (denotation D_P(x,y)) can be defined, for all P ⊆ C, as follows:

D_P(x,y) = T_{q∈P}(⪰_q(x,y)).

Given (x,y) ∈ U × U, D_P(x,y) represents the credibility of the proposition "x is at least as good as y with respect to each criterion q from P". Since the fuzzy weak preference relations ⪰_q are supposed to be partial T-preorders, the fuzzy dominance relation D_P is also a partial T-preorder. Furthermore, let Cl = {Cl_t, t = 1,…,n} be a set of fuzzy classes in U such that, for each x ∈ U, Cl_t(x) represents the membership function of x to Cl_t. It is supposed, as before, that the classes of Cl are increasingly ordered, i.e. that for all r, s ∈ {1,…,n} such that r > s, the objects from Cl_r have a better comprehensive evaluation than the objects from Cl_s.
On the basis of the membership functions of the fuzzy classes Cl_t, fuzzy membership functions of two other sets can be defined as follows:

1) the upward union fuzzy set Cl_t^≥, whose membership function Cl_t^≥(x) represents the credibility of the proposition "x is at least as good as the objects in Cl_t":

Cl_t^≥(x) = 1, if there is s ∈ {1,…,n} such that Cl_s(x) > 0 and s > t; Cl_t(x) otherwise;

2) the downward union fuzzy set Cl_t^≤, whose membership function Cl_t^≤(x) represents the credibility of the proposition "x is at most as good as the objects in Cl_t":

Cl_t^≤(x) = 1, if there is s ∈ {1,…,n} such that Cl_s(x) > 0 and s < t; Cl_t(x) otherwise.

The P-lower and the P-upper approximations of Cl_t^≥ with respect to P ⊆ C are fuzzy sets in U, whose membership functions, denoted by P[Cl_t^≥(x)] and P̄[Cl_t^≥(x)] respectively, are defined as:

P[Cl_t^≥(x)] = T_{y∈U} (T*(N(D_P(y,x)), Cl_t^≥(y))),
P̄[Cl_t^≥(x)] = T*_{y∈U} (T(D_P(x,y), Cl_t^≥(y))).

P[Cl_t^≥(x)] represents the credibility of the proposition "for all y ∈ U, y does not dominate x with respect to criteria from P or y belongs to Cl_t^≥", while P̄[Cl_t^≥(x)] represents the credibility of the proposition "there is at least one y ∈ U dominated by x with respect to criteria from P which belongs to Cl_t^≥".

The P-lower and P-upper approximations of Cl_t^≤ with respect to P ⊆ C, denoted by P[Cl_t^≤(x)] and P̄[Cl_t^≤(x)] respectively, can be defined analogously as:

P[Cl_t^≤(x)] = T_{y∈U} (T*(N(D_P(x,y)), Cl_t^≤(y))),
P̄[Cl_t^≤(x)] = T*_{y∈U} (T(D_P(y,x), Cl_t^≤(y))).

P[Cl_t^≤(x)] represents the credibility of the proposition "for all y ∈ U, x does not dominate y with respect to criteria from P or y belongs to Cl_t^≤", while P̄[Cl_t^≤(x)] represents the credibility of the proposition "there is at least one y ∈ U dominating x with respect to criteria from P which belongs to Cl_t^≤".

Let us remark that, using the definition of the T*-implication, it is possible to rewrite the definitions of P[Cl_t^≥(x)], P̄[Cl_t^≥(x)], P[Cl_t^≤(x)] and P̄[Cl_t^≤(x)] in the following way:

P[Cl_t^≥(x)] = T_{y∈U} (I→_{T*,N}(D_P(y,x), Cl_t^≥(y))),
P̄[Cl_t^≥(x)] = N(T_{y∈U} (I→_{T*,N}(D_P(x,y), N(Cl_t^≥(y))))),
P[Cl_t^≤(x)] = T_{y∈U} (I→_{T*,N}(D_P(x,y), Cl_t^≤(y))),
P̄[Cl_t^≤(x)] = N(T_{y∈U} (I→_{T*,N}(D_P(y,x), N(Cl_t^≤(y))))).

The following results can be proved:

1) for each x ∈ U and for each t ∈ {1,…,n},

P[Cl_t^≥(x)] ≤ Cl_t^≥(x) ≤ P̄[Cl_t^≥(x)],
P[Cl_t^≤(x)] ≤ Cl_t^≤(x) ≤ P̄[Cl_t^≤(x)];

2) if (T, T*, N) constitute a de Morgan triplet and if N[Cl_t^≥(x)] = Cl_{t−1}^≤(x) for each x ∈ U and t = 2,…,n, then

P[Cl_t^≥(x)] = N(P̄[Cl_{t−1}^≤(x)]), P̄[Cl_t^≥(x)] = N(P[Cl_{t−1}^≤(x)]), t = 2,…,n;
P[Cl_t^≤(x)] = N(P̄[Cl_{t+1}^≥(x)]), P̄[Cl_t^≤(x)] = N(P[Cl_{t+1}^≥(x)]), t = 1,…,n−1;


3) for all P ⊆ R ⊆ C, for all x ∈ U and for each t ∈ {1,…,n},

P[Cl_t^≥(x)] ≤ R[Cl_t^≥(x)], P̄[Cl_t^≥(x)] ≥ R̄[Cl_t^≥(x)],
P[Cl_t^≤(x)] ≤ R[Cl_t^≤(x)], P̄[Cl_t^≤(x)] ≥ R̄[Cl_t^≤(x)].

Results 1) to 3) can be read as fuzzy counterparts of the following results well known within the classical rough set approach: 1) (inclusion property) says that Cl_t^≥ and Cl_t^≤ include their P-lower approximations and are included in their P-upper approximations; 2) (complementarity property) says that the P-lower (P-upper) approximation of Cl_t^≥ is the complement of the P-upper (P-lower) approximation of its complementary set Cl_{t−1}^≤ (an analogous property holds for Cl_t^≤ and Cl_{t+1}^≥); 3) (monotonicity with respect to sets of attributes) says that, when enlarging the set of criteria, the membership to the lower approximation does not decrease and the membership to the upper approximation does not increase.

Greco, Inuiguchi, and Słowiński [14] proposed, moreover, the following fuzzy rough approximations based on dominance, which go in line with the fuzzy rough approximation by Dubois and Prade [3,5] concerning classical rough sets:

P[Cl_t^≥(x)] = inf_{y∈U} I(D_P(y,x), Cl_t^≥(y)),
P̄[Cl_t^≥(x)] = sup_{y∈U} T(D_P(x,y), Cl_t^≥(y)),
P[Cl_t^≤(x)] = inf_{y∈U} I(D_P(x,y), Cl_t^≤(y)),
P̄[Cl_t^≤(x)] = sup_{y∈U} T(D_P(y,x), Cl_t^≤(y)).

Using fuzzy rough approximations based on DRSA, one can induce decision rules having the same syntax as the decision rules obtained from crisp DRSA. In this case, however, each decision rule has a fuzzy credibility.

Variable-Consistency Dominance-Based Rough Set Approach (VC-DRSA)

The definitions of rough approximations introduced in Sect. "Dominance-Based Rough Set Approach" are based on a strict application of the dominance principle. However, when defining non-ambiguous objects, it is reasonable to accept a limited proportion of negative examples, particularly for large data tables.
Such an extended version of DRSA is called the variable-consistency DRSA model (VC-DRSA) [31]. For any P ⊆ C, x ∈ U belongs to Cl_t^≥ without any ambiguity at consistency level l ∈ (0, 1] if x ∈ Cl_t^≥ and at least l·100% of all objects y ∈ U dominating x with respect to P also belong to Cl_t^≥, i.e., for t = 2,…,n,

|D_P^+(x) ∩ Cl_t^≥| / |D_P^+(x)| ≥ l.

The level l is called the consistency level because it controls the degree of consistency with respect to objects qualified as belonging to Cl_t^≥ without any ambiguity. In other words, if l < 1, then at most (1 − l)·100% of all objects y ∈ U dominating x with respect to P do not belong to Cl_t^≥ and thus contradict the inclusion of x in Cl_t^≥. Analogously, for any P ⊆ C, x ∈ U belongs to Cl_t^≤ without any ambiguity at consistency level l ∈ (0, 1] if x ∈ Cl_t^≤ and at least l·100% of all the objects y ∈ U dominated by x with respect to P also belong to Cl_t^≤, i.e., for t = 1,…,n−1,

|D_P^−(x) ∩ Cl_t^≤| / |D_P^−(x)| ≥ l.

The concept of non-ambiguous objects at some consistency level l leads naturally to the corresponding definition of P-lower approximations of the unions of classes Cl_t^≥ and Cl_t^≤, respectively:

P^l(Cl_t^≥) = {x ∈ Cl_t^≥ : |D_P^+(x) ∩ Cl_t^≥| / |D_P^+(x)| ≥ l}, t = 2,…,n;
P^l(Cl_t^≤) = {x ∈ Cl_t^≤ : |D_P^−(x) ∩ Cl_t^≤| / |D_P^−(x)| ≥ l}, t = 1,…,n−1.

Given P ⊆ C and consistency level l, the corresponding P-upper approximations of Cl_t^≥ and Cl_t^≤, denoted by P̄^l(Cl_t^≥) and P̄^l(Cl_t^≤) respectively, can be defined as complements of P^l(Cl_{t−1}^≤) and P^l(Cl_{t+1}^≥) with respect to U:

P̄^l(Cl_t^≥) = U − P^l(Cl_{t−1}^≤), t = 2,…,n;
P̄^l(Cl_t^≤) = U − P^l(Cl_{t+1}^≥), t = 1,…,n−1.

P̄^l(Cl_t^≥) can be interpreted as the set of all the objects belonging to Cl_t^≥, possibly ambiguous at consistency level l. Analogously, P̄^l(Cl_t^≤) can be interpreted as the set of all the objects belonging to Cl_t^≤, possibly ambiguous at consistency level l. The P-boundaries (P-doubtful regions) of Cl_t^≥ and Cl_t^≤ at consistency level l are defined as:

Bn_P^l(Cl_t^≥) = P̄^l(Cl_t^≥) − P^l(Cl_t^≥), t = 2,…,n;
Bn_P^l(Cl_t^≤) = P̄^l(Cl_t^≤) − P^l(Cl_t^≤), t = 1,…,n−1.
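The effect of the consistency level can be shown on a small example. The following sketch is illustrative only (the data and function names are invented): it computes the VC-DRSA P-lower approximation of an upward union, tolerating a fraction of negative examples among the objects dominating x.

```python
# Hypothetical sketch (invented data): VC-DRSA P-lower approximation of
# Cl_t>= at consistency level l.
def dominates(x, y):
    return all(a >= b for a, b in zip(x, y))

def vc_lower_upward(X, cls, t, l):
    n = len(X)
    up = {i for i in range(n) if cls[i] >= t}
    result = set()
    for i in up:
        dom = [j for j in range(n) if dominates(X[j], X[i])]   # D_P^+(i)
        if sum(1 for j in dom if j in up) / len(dom) >= l:
            result.add(i)
    return result

X = [(3, 3), (2, 2), (2, 2), (1, 1)]
cls = [2, 2, 1, 1]                           # object 2 contradicts object 1
strict = vc_lower_upward(X, cls, 2, 1.0)     # classical DRSA (l = 1)
tolerant = vc_lower_upward(X, cls, 2, 0.5)   # accept up to half negatives
```

At l = 1 the inconsistent object is excluded from the lower approximation; at l = 0.5 it is admitted.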


The variable-consistency model of the dominance-based rough set approach provides some degree of flexibility in assigning objects to lower and upper approximations of the unions of decision classes. The following properties can be easily proved: for 0 < l′ < l ≤ 1,

P^l(Cl_t^≥) ⊆ P^{l′}(Cl_t^≥) and P̄^l(Cl_t^≥) ⊇ P̄^{l′}(Cl_t^≥), t = 2,…,n;
P^l(Cl_t^≤) ⊆ P^{l′}(Cl_t^≤) and P̄^l(Cl_t^≤) ⊇ P̄^{l′}(Cl_t^≤), t = 1,…,n−1.

The following two basic types of variable-consistency decision rules can be considered:

1. D≥-decision rules with the following syntax: "if f(x,q1) ≥ r_{q1} and f(x,q2) ≥ r_{q2} and … f(x,qp) ≥ r_{qp}, then x ∈ Cl_t^≥" with confidence α (i.e. in fraction α of considered cases), where P = {q1,…,qp} ⊆ C, (r_{q1},…,r_{qp}) ∈ V_{q1} × V_{q2} × … × V_{qp} and t = 2,…,n;
2. D≤-decision rules with the following syntax: "if f(x,q1) ≤ r_{q1} and f(x,q2) ≤ r_{q2} and … f(x,qp) ≤ r_{qp}, then x ∈ Cl_t^≤" with confidence α, where P = {q1,…,qp} ⊆ C, (r_{q1},…,r_{qp}) ∈ V_{q1} × V_{q2} × … × V_{qp} and t = 1,…,n−1.

The variable-consistency model is inspired by the variable precision model proposed by Ziarko [61,62] within the classical indiscernibility-based rough set approach.

Dominance-Based Rough Approximation of a Fuzzy Set

This section shows how the dominance-based rough set approach can be used for rough approximation of fuzzy sets. A fuzzy information base is the 3-tuple B = ⟨U, F, φ⟩, where U is a finite set of objects (universe), F = {f_1, f_2,…,f_m} is a finite set of properties, and φ: U × F → [0,1] is a function such that φ(x, f_h) ∈ [0,1] expresses the credibility that object x has property f_h. Each object x from U is described by a vector

Des_F(x) = [φ(x, f_1),…,φ(x, f_m)],

called the description of x in terms of the degrees to which it has properties from F; it represents the available information about x. Obviously, x ∈ U can be described in terms of any non-empty subset E ⊆ F, and in this case

Des_E(x) = [φ(x, f_h) : f_h ∈ E].

For any E ⊆ F, the dominance relation D_E can be defined as follows: for all x, y ∈ U, x dominates y with respect

to E (denotation x D_E y) if, for any f_h ∈ E,

φ(x, f_h) ≥ φ(y, f_h).

Given E ⊆ F and x ∈ U, let

D_E^+(x) = {y ∈ U : y D_E x},  D_E^−(x) = {y ∈ U : x D_E y}.

Let us consider a fuzzy set X in U, with its membership function μ_X: U → [0,1]. For each cutting level α ∈ [0,1] and for ⋆ ∈ {≥, >}, the E-lower and the E-upper approximations of X^{⋆α} = {y ∈ U : μ_X(y) ⋆ α} with respect to E ⊆ F (denotation E(X^{⋆α}) and Ē(X^{⋆α}), respectively) can be defined as:

E(X^{⋆α}) = {x ∈ U : D_E^+(x) ⊆ X^{⋆α}},
Ē(X^{⋆α}) = {x ∈ U : D_E^−(x) ∩ X^{⋆α} ≠ ∅}.

Rough approximations E(X^{⋆α}) and Ē(X^{⋆α}) can be expressed in terms of unions of granules D_E^+(x) as follows:

E(X^{⋆α}) = ∪_{x∈U} {D_E^+(x) : D_E^+(x) ⊆ X^{⋆α}},
Ē(X^{⋆α}) = ∪_{x∈U} {D_E^+(x) : D_E^−(x) ∩ X^{⋆α} ≠ ∅}.
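These cut-level approximations can be sketched directly. The following example is illustrative only (the property degrees and membership values are invented); object 2 dominates object 1 in property degrees while falling below the cut, which produces a nonempty boundary:

```python
# Illustrative sketch (invented degrees): E-lower and E-upper approximations
# of the cut X^{>=alpha} of a fuzzy set, via dominance on property degrees.
def dominates(x, y):
    return all(a >= b for a, b in zip(x, y))

phi = [(0.9, 0.8), (0.7, 0.8), (0.8, 0.9), (0.2, 0.1)]   # phi(x, f_h)
mu = [0.9, 0.8, 0.3, 0.1]                                # membership in X
alpha = 0.5
n = len(phi)
cut = {i for i in range(n) if mu[i] >= alpha}            # X^{>=alpha} = {0, 1}
d_plus = {i: {j for j in range(n) if dominates(phi[j], phi[i])} for i in range(n)}
d_minus = {i: {j for j in range(n) if dominates(phi[i], phi[j])} for i in range(n)}
lower = {i for i in range(n) if d_plus[i] <= cut}        # object 2 spoils object 1
upper = {i for i in range(n) if d_minus[i] & cut}
```

Here the lower approximation shrinks to {0} while the upper approximation grows to {0, 1, 2}, so the boundary region is {1, 2}.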

Analogously, for each cutting level α ∈ [0,1] and for ⋆ ∈ {≤, <}, the E-lower and E-upper approximations of the cuts X^{⋆α} can be defined in terms of the granules D_E^−(x); approximations can also be taken with respect to a reference object x, denoted E_(x)(X^{≥α}), Ē_(x)(X^{≥α}), E_(x)(X^{>α}), and so on. Observe that the lower approximation of X^{≥α} with respect to x contains all the objects y ∈ U such that any object w, being similar to x at least as much as y is similar to x w.r.t. all the considered features E ⊆ F, also belongs to X^{≥α}. Thus, the data from the fuzzy pairwise information base B confirm that if w is similar to x not less than y ∈ E_(x)(X^{≥α}) is similar to x w.r.t. all the considered features E ⊆ F, then w belongs to X^{≥α}. In other words, x is a reference object and


y ∈ E_(x)(X^{≥α}) is a limit object which belongs "certainly" to set X with credibility at least α; the limit is understood such that all objects w that are similar to x w.r.t. the considered features at least as much as y is similar to x also belong to X with credibility at least α. Analogously, the upper approximation of X^{≥α} with respect to x contains all objects y ∈ U such that there is at least one object w, being similar to x at most as much as y is similar to x w.r.t. all the considered features E ⊆ F, which belongs to X^{≥α}. Thus, the data from the fuzzy pairwise information base B confirm that if w is similar to x not less than y ∈ Ē_(x)(X^{≥α}) is similar to x w.r.t. all the considered features E ⊆ F, then it is possible that w belongs to X^{≥α}. In other words, x is a reference object and y ∈ Ē_(x)(X^{≥α}) is a limit object which belongs "possibly" to set X with credibility at least α; the limit is understood such that all objects z ∈ U similar to x not less than y w.r.t. the considered features possibly belong to X^{≥α}. For each x ∈ U and α ∈ [0,1] and ⋆ ∈ {

… |N2| for specificity, then any payoff vector in the core assigns $1 to each player of the scarce type; that is, players with a RH glove each receive 0 while players with a LH glove each receive $1.

Not all games have nonempty cores, as the following example illustrates.

Example 2 (A simple majority game with an empty core) Let N = {1, 2, 3} and define the function v as follows:

v(S) = 0 if |S| = 1, and v(S) = 1 otherwise.
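The emptiness of the core of this game can be checked mechanically; the following sketch (illustrative, not from the article) sums the pairwise core inequalities and compares them with the feasible worth:

```python
# Illustrative check: summing the pairwise core inequalities of the
# three-player simple majority game overcommits the feasible worth v(N) = 1.
from itertools import combinations

def v(S):
    return 0.0 if len(S) <= 1 else 1.0

N = (1, 2, 3)
# u_i + u_j >= 1 for each of the three pairs implies 2*u(N) >= 3,
# i.e. u(N) >= 1.5, while feasibility requires u(N) <= v(N) = 1.
required = sum(v(S) for S in combinations(N, 2)) / 2
assert required > v(N)        # hence the core is empty
```

The same computation shows why raising v(N) to $3/2 restores a nonempty core, as noted below.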

It is easy to see that the core of the game is empty. For if a payoff vector u were in the core, then it must hold that for any i ∈ N, u_i ≥ 0, and for any i, j ∈ N, u_i + u_j ≥ 1. Moreover, feasibility dictates that u_1 + u_2 + u_3 ≤ 1. This is impossible; thus, the core is empty. Before leaving this example, let us ask whether it would be possible to subsidize the players by increasing the payoff to the total player set N and, by doing so, ensure that the core of the game with a subsidy is nonempty. We leave it to the reader to verify that if v(N) were increased to $3/2 (or more), the new game would have a nonempty core.

Let (N, v) be a game and let i, j ∈ N. Then players i and j are substitutes if, for all groups S ⊆ N with i, j ∉ S, it holds that

v(S ∪ {i}) = v(S ∪ {j}).

Let (N, v) be a game and let u ∈ R^N be a payoff vector for the game. If for all players i and j who are substitutes it holds that u_i = u_j, then u has the equal treatment property. Note that if there is a partition of N into T subsets, say N_1,…,N_T, where all players in each subset N_t are substitutes for each other, then we can represent u by a vector ū ∈ R^T where, for each t, it holds that ū_t = u_i for all i ∈ N_t.

Essential Superadditivity

We wish to treat games where the worth of a group of players is independent of the total player set in which it is embedded, and where an option open to the members of a group is to partition themselves into smaller groups; that is, we treat games that are essentially superadditive. This is built into our definition of feasibility above, (1). An alternative approach, which would still allow us to treat situations where it is optimal for players to form groups smaller than the total player set, would be to assume that v is the "superadditive cover" of some other worth function v′. Given a not-necessarily-superadditive function v′, for each group S define v(S) by:

v(S) = max Σ_k v′(S_k),   (4)

Market Games and Clubs

where the maximum is taken over all partitions {S_k} of S; the function v is the superadditive cover of v′. Then the notion of feasibility requiring that a payoff vector u is feasible only if

u(N) ≤ v(N)   (5)

gives a set of feasible payoff vectors equivalent to that of the game (N, v′) with the definition of feasibility given by (1). The following proposition may be well known and is easily proven. This result was already well understood in Gillies [27], and applications have appeared in a number of papers in the theoretical literature of game theory; see, for example (for ε = 0), Aumann and Dreze [6] and Kaneko and Wooders [33]. It is also well known in club theory and the theory of economies with many players and local public goods.

Proposition 1 Given ε ≥ 0, let (N, v′) be a game. A payoff vector u ∈ R^N is in the weak, respectively uniform, ε-core of (N, v′) if and only if it is in the weak, respectively uniform, ε-core of the superadditive cover game, say (N, v), where v is defined by (4).

A Market

In this section we introduce the definition, from Shapley and Shubik [60], of a market. Unlike Shapley and Shubik, however, we do not assume concavity of utility functions. A market is taken to be an economy where all participants have continuous utility functions over a finite set of commodities that are all linear in one commodity, thought of as an "idealized" money. Money can be consumed in any amount, possibly negative. For later convenience we will consider an economy where there is a finite set of types of participants and all participants of the same type have the same endowments and preferences.

Consider an economy with T + 1 types of commodities. Denote the set of participants by

N = {(t, q) : t = 1,…,T and q = 1,…,n_t}.

Assume that all participants of the same type (t, q), q = 1,…,n_t, have the same utility function, given by

û_t(y, ξ) = u_t(y) + ξ,

where y ∈ R_+^T and ξ ∈ R. Let a^{tq} ∈ R_+^T be the endowment of the (t, q)-th participant of the first T commodities. The total endowment is given by Σ_{(t,q)∈N} a^{tq}.
For simplicity and without loss of generality, we can assume that no participant is endowed with any nonzero amount of the

(T + 1)-th good, the "money" or medium of exchange. One might think of utilities as being measured in money. It is because of the transferability of money that utilities are called "transferable".

Remark 1 Instead of assuming that money can be consumed in negative amounts, one might assume that endowments of money are sufficiently large so that no equilibrium allocates any participant a negative amount of money. For further discussion of transferable utility see, for example, Bergstrom and Varian [9] or Kaneko and Wooders [34].

Given a group S ⊆ N, an S-allocation of commodities is a set

{(y^{tq}, ξ^{tq}) ∈ R_+^T × R : Σ_{(t,q)∈S} y^{tq} ≤ Σ_{(t,q)∈S} a^{tq} and Σ_{(t,q)∈S} ξ^{tq} ≤ 0};

that is, an S-allocation is a redistribution of the commodities owned by the members of S among themselves, together with monetary transfers adding up to no more than zero. When S = N, an S-allocation is called simply an allocation.

With the price of the (T + 1)-th commodity set equal to 1, a competitive outcome is a price vector p ∈ R^T, listing prices for the first T commodities, and an allocation {(y^{tq}, ξ^{tq}) ∈ R^T × R : (t, q) ∈ N} for which

(a) u_t(y^{tq}) − p·(y^{tq} − a^{tq}) ≥ u_t(ŷ) − p·(ŷ − a^{tq}) for all ŷ ∈ R_+^T and all (t, q) ∈ N;
(b) Σ_{(t,q)∈N} y^{tq} = Σ_{(t,q)∈N} a^{tq};
(c) ξ^{tq} = −p·(y^{tq} − a^{tq}) for all (t, q) ∈ N; and
(d) Σ_{(t,q)∈N} ξ^{tq} = 0.   (6)
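A tiny numerical example may make the conditions concrete. The following sketch uses hypothetical numbers, not from the article: a two-type quasi-linear economy with one real good plus money, in which a price of 1 clears the market.

```python
# Hypothetical numbers: a competitive outcome in a two-type quasi-linear
# economy with one real good plus money, candidate price p = 1.
import math

u = {1: lambda y: 2.0 * math.sqrt(y), 2: lambda y: y}   # u_t over the good
a = {1: 0.0, 2: 2.0}                                    # endowments of the good
p = 1.0

# Type 1 maximizes u_1(y) - p*(y - a_1): first-order condition 1/sqrt(y) = p,
# so y = 1; type 2 is indifferent at p = 1 and absorbs the rest.
y = {1: 1.0, 2: 1.0}
assert abs(sum(y.values()) - sum(a.values())) < 1e-12   # markets clear
xi = {t: -p * (y[t] - a[t]) for t in (1, 2)}            # money transfers
assert abs(sum(xi.values())) < 1e-12                    # transfers balance
payoff = {t: u[t](y[t]) + xi[t] for t in (1, 2)}        # competitive payoffs
```

In this example the competitive payoffs are $1 for type 1 and $2 for type 2, illustrating how money transfers make utility "transferable".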

Given a competitive outcome with allocation {(y^{tq}, ξ^{tq}) ∈ R_+^T × R : (t, q) ∈ N} and price vector p, the competitive payoff to the (t, q)-th participant is u_t(y^{tq}) − p·(y^{tq} − a^{tq}). A competitive payoff vector is given by

(u_t(y^{tq}) − p·(y^{tq} − a^{tq}) : (t, q) ∈ N).

In the following we will assume that, for each t, all participants of type t have the same endowment; that is, for each t, it holds that a^{tq} = a^{tq′} for all q, q′ = 1,…,n_t. In this case, every competitive payoff has the equal treatment property:

u_t(y^{tq}) − p·(y^{tq} − a^{tq}) = u_t(y^{tq′}) − p·(y^{tq′} − a^{tq′})

for all q, q′ and for each t. It follows that a competitive payoff vector can be represented by a vector in R^T with one component for each player type.

It is easy to generate a game from the data of an economy. For each group of participants S ⊆ N, define

v(S) = max Σ_{(t,q)∈S} û_t(y^{tq}, ξ^{tq}),

where the maximum is taken over the set of S-allocations. Let (N, v) denote a game derived from a market. Under the assumption of concavity of the utility functions of the participants in an economy, Shapley and Shubik [60] show that a competitive outcome for the market exists and that the competitive payoff vectors are in the core of the game. (Since [22], such results have been obtained in substantially more general models of economies.)

Market-Game Equivalence

To facilitate exposition of the theory of games with many players and the equivalence of markets and games, we consider games derived from a common underlying structure and with a fixed number of types of players, where all players of the same type are substitutes for each other.

Pregames

Let T be a positive integer, to be interpreted as a number of player types. A profile s = (s_1,…,s_T) ∈ Z_+^T, where Z_+^T is the T-fold Cartesian product of the non-negative integers Z_+, describes a group of players by the numbers of players of each type in the group. Given a profile s, define the norm or size of s by

‖s‖ = Σ_t s_t,

simply the total number of players in a group of players described by s. A subprofile of a profile n ∈ Z_+^T is a profile s satisfying s ≤ n. A partition of a profile s is a collection of subprofiles {s^k} of s, not all necessarily distinct, satisfying

Σ_k s^k = s.

A partition of a profile is analogous to a partition of a set, except that all members of a partition of a set are distinct.

Let Ψ be a function from the set of profiles Z_+^T to R_+ with Ψ(0) = 0. The value Ψ(s) is interpreted as the total payoff a group of players with profile s can achieve from collective activities of the group membership and is called the worth of the profile s. Given Ψ, define a worth function Ψ*, called the superadditive cover of Ψ, by

Ψ*(s) = max Σ_k Ψ(s^k),

where the maximum is taken over the set of all partitions {s^k} of s. The function Ψ is said to be superadditive if the worth functions Ψ and Ψ* are equal.

We define a pregame as a pair (T, Ψ) where Ψ: Z_+^T → R_+. As we will now discuss, a pregame can be used to generate multiple games. To generate a game from a pregame, it is only required to specify a total player set N and the numbers of players of each of T types in the set. Then the pregame can be used to assign a worth to every group of players contained in the total player set, thus creating a game.

A game determined by the pregame (T, Ψ), which we will typically call a game or a game with side payments, is a pair [n; (T, Ψ)] where n is a profile. A subgame of a game [n; (T, Ψ)] is a pair [s; (T, Ψ)] where s is a subprofile of n. With any game [n; (T, Ψ)] we can associate a game (N, v) in the form introduced earlier as follows. Let

N = {(t, q) : t = 1,…,T and q = 1,…,n_t}

be a player set for the game. For each subset S ⊆ N define the profile of S, denoted by prof(S) ∈ Z_+^T, by its components

prof(S)_t = |S ∩ {(t′, q) : t′ = t and q = 1,…,n_t}|,

and define

v(S) = Ψ(prof(S)).

Then the pair (N, v) satisfies the usual definition of a game with side payments. For any S ⊆ N, define

v*(S) = Ψ*(prof(S)).

The game (N, v*) is the superadditive cover of (N, v). A payoff vector for a game (N, v) is a vector u ∈ R^N. For each nonempty subset S of N define

u(S) = Σ_{(t,q)∈S} u_{tq}.

A payoff vector u is feasible for S if

u(S) ≤ v*(S) = Ψ*(prof(S)).

If S = N we simply say that the payoff vector u is feasible if

u(N) ≤ v*(N) = Ψ*(prof(N)).
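For tiny profiles, the superadditive cover can be computed by brute-force recursion over subprofiles. The following sketch is illustrative only: the worth function psi is an invented example (Ψ is written psi), and the enumeration is exponential, so it is usable only for very small profiles.

```python
# Hypothetical sketch: the superadditive cover psi*(s) = max over partitions
# {s^k} of s of the sum of psi(s^k), by naive recursion over subprofiles.
from functools import lru_cache
from itertools import product

def superadditive_cover(psi):
    """Return psi* for a worth function psi defined on profiles (tuples in Z_+^T)."""
    @lru_cache(maxsize=None)
    def cover(s):
        best = psi(s)                     # the trivial partition {s}
        for g in product(*[range(c + 1) for c in s]):
            if 0 < sum(g) < sum(s):       # proper nonzero subprofile
                rest = tuple(a - b for a, b in zip(s, g))
                best = max(best, psi(g) + cover(rest))
        return best
    return cover

# A non-superadditive toy worth: only an exact (1, 1) pair produces $1.
def psi(s):
    return 1.0 if s == (1, 1) else 0.0

cover = superadditive_cover(psi)
```

For this toy worth function, cover((2, 2)) equals 2.0, reflecting the partition of (2, 2) into two matched pairs.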


Note that our definition of feasibility is consistent with essential superadditivity: a group can realize at least as large a total payoff as it can achieve in any partition of the group, and one way to achieve this payoff is by partitioning into smaller groups. A payoff vector u satisfies the equal-treatment property if u_{tq} = u_{tq′} for all q, q′ ∈ {1,…,n_t} and for each t = 1,…,T.

Let [n; (T, Ψ)] be a game and let β be a collection of subprofiles of n. The collection β is a balanced collection of subprofiles of n if there are positive real numbers γ_s for s ∈ β such that Σ_{s∈β} γ_s s = n. The numbers γ_s are called balancing weights. Given a real number ε ≥ 0, the game [n; (T, Ψ)] is ε-balanced if for every balanced collection β of subprofiles of n it holds that

Ψ*(n) ≥ Σ_{s∈β} γ_s (Ψ(s) − ε‖s‖),   (7)

where the balancing weights for β are given by γ_s for s ∈ β. This definition extends that of Bondareva [13] and Shapley [56] to games with player types. Roughly, a game is (ε-)balanced if allowing "part-time" groups does not improve the total payoff (by more than ε per player). A game [n; (T, Ψ)] is totally balanced if every subgame [s; (T, Ψ)] is balanced.

The balanced cover game generated by a game [n; (T, Ψ)] is a game [n; (T, Ψ^b)] where

1. Ψ^b(s) = Ψ(s) for all s ≠ n, and
2. Ψ^b(n) ≥ Ψ(n), with Ψ^b(n) as small as possible consistent with the nonemptiness of the core of [n; (T, Ψ^b)].

From the Bondareva–Shapley Theorem it follows that Ψ^b(n) = Ψ*(n) if and only if the game [n; (T, Ψ)] is balanced (ε-balanced, with ε = 0). For later convenience, the notion of the balanced cover of a pregame is introduced. Let (T, Ψ) be a pregame. For each profile s, define

Ψ^b(s) = max_β Σ_{g∈β} γ_g Ψ(g),   (8)

where the maximum is taken over all balanced collections β of subprofiles of s with weights γ_g for g ∈ β. The pair (T, Ψ^b) is called the balanced cover pregame of (T, Ψ). Since a partition of a profile is a balanced collection, it is immediately clear that Ψ^b(s) ≥ Ψ(s) for every profile s.

Premarkets

In this section, we introduce the concept of a premarket and re-state results from Shapley and Shubik [60] in the context of pregames and premarkets. Let L + 1 be a number of types of commodities and let {û_t(y, ξ) : t = 1,…,T} denote a finite number of functions, called utility functions, of the form

û_t(y, ξ) = u_t(y) + ξ,

where y ∈ R_+^L and ξ ∈ R. (Such functions are commonly called quasi-linear in the literature of economics.) Let {a^t ∈ R_+^L : t = 1,…,T} be interpreted as a set of endowments. We assume that u_t(a^t) ≥ 0 for each t. For t = 1,…,T we define c^t = (u_t(·), a^t) as a participant type and let C = {c^t : t = 1,…,T} be the set of participant types. Observe that from the data given by C we can construct a market by specifying a set of participants N and a function from N to C assigning endowments and utility functions – types – to each participant in N. A premarket is a pair (T, C).

Let (T, C) be a premarket and let s = (s_1,…,s_T) ∈ Z_+^T. We interpret s as representing a group of economic participants with s_t participants having the utility function and endowment given by c^t for t = 1,…,T; that is, for each t there are s_t participants in the group of type c^t. Observe that the data of a premarket give us sufficient information to generate a pregame. In particular, given a profile s = (s_1,…,s_T) listing numbers of participants of each of T types, define

W(s) = max Σ_t s_t u_t(y^t),

where the maximum is taken over the set {y^t ∈ R_+^L : t = 1,…,T and Σ_t s_t y^t = Σ_t s_t a^t}. Then the pair (T, W) is a pregame generated by the premarket. The following theorem is an extension to premarkets, or a restatement, of a result due to Shapley and Shubik [60].

Theorem 1 Let (T, C) be a premarket derived from economic data in which all utility functions are concave. Then the pregame generated by the premarket is totally balanced.

Direct Markets and Market-Game Equivalence

Shapley and Shubik [60] introduced the notion of a direct market derived from a totally balanced game. In the direct market, each player is endowed with one unit of a commodity (himself) and all players in the economy have the same utility function. In interpretation, we might think of


this as a labor market or as a market for productive factors (as in [50], for example) where each player owns one unit of a commodity. For games with player types as in this essay, we take the player types of the game as the commodity types of a market and assign all players in the market the same utility function, derived from the worth function of the game.

Let (T, Ψ) be a pregame and let [n; (T, Ψ)] be a derived game. Let

N = {(t, q) : t = 1,…,T and q = 1,…,n_t for each t}

denote the set of players in the game, where all participants {(t′, q) : q = 1,…,n_{t′}} are of type t′ for each t′ = 1,…,T. To construct the direct market generated by a derived game [n; (T, Ψ)], we take the commodity space as R_+^T and suppose that each participant in the market of type t is endowed with one unit of the t-th commodity, and thus has endowment 1_t = (0,…,0,1,0,…,0) ∈ R_+^T, where "1" is in the t-th position. The total endowment of the economy is then given by Σ_t n_t 1_t = n.

For any vector y ∈ R_+^T define

u(y) = max Σ_{s≤n} γ_s Ψ(s),   (9)

the maximum running over all {γ_s ≥ 0 : s ∈ Z_+^T, s ≤ n} satisfying

Σ_{s≤n} γ_s s = y.   (10)

As noted by Shapley and Shubik [60], but for our types case, it can be verified that the function u is concave and one-homogeneous. This does not depend on the balancedness of the game [n; (T, Ψ)]. Indeed, one may think of u as the "balanced cover of [n; (T, Ψ)] extended to R_+^T". Note also that u is superadditive, independent of whether the pregame (T, Ψ) is superadditive. We leave it to the interested reader to verify that if Ψ were not necessarily superadditive and Ψ* is the superadditive cover of Ψ, then it holds that max Σ_{s≤n} γ_s Ψ(s) = max Σ_{s≤n} γ_s Ψ*(s).

Taking the utility function u as the utility function of each player (t, q) ∈ N, where N is now interpreted as the set of participants in a market, we have generated a market, called the direct market, denoted by [n, u; (T, Ψ)], from the game [n; (T, Ψ)]. Again, the following extends a result of Shapley and Shubik [60] to pregames.

Theorem 2 Let [n, u; (T, Ψ)] denote the direct market generated by a game [n; (T, Ψ)] and let [n; (T, u)] denote the game derived from the direct market. Then, if [n; (T, Ψ)] is a totally balanced game, it holds that [n; (T, u)] and [n; (T, Ψ)] are identical.

Remark 2 If the game [n; (T, Ψ)] and every subgame [s; (T, Ψ)] has a nonempty core – that is, if the game is ‘totally balanced’ – then the game [n; (T, u)] generated by the direct market is the initially given game [n; (T, Ψ)]. If, however, the game [n; (T, Ψ)] is not totally balanced, then u(s) ≥ Ψ(s) for all profiles s ≤ n. But, whether or not [n; (T, Ψ)] is totally balanced, the game [n; (T, u)] is totally balanced and coincides with the totally balanced cover of [n; (T, Ψ)].

Remark 3 Another approach to the equivalence of markets and games is taken by Garratt and Qin [26], who define a class of direct lottery markets. While a player can participate in only one coalition, both ownership of coalitions and participation in coalitions are determined randomly. Each player is endowed with one unit of probability, his own participation. Players can trade their endowments at market prices. The core of the game is equivalent to the equilibrium of the direct lottery market.

Equivalence of Markets and Games with Many Players

The requirement of Shapley and Shubik [60] that utility functions be concave is restrictive. It rules out, for example, situations such as economies with indivisible commodities. It also rules out club economies; for a given club structure of the set of players – in the simplest case, a partition of the total player set into groups where collective activities only occur within these groups – it may be that utility functions are concave over the set of alternatives available within each club, but utility functions need not be concave over all possible club structures. This rules out many examples; we provide a simple one below. To obtain the result that, with many players, games derived from pregames are market games, we need some further assumption on pregames.
If there are many substitutes for each player, then the simple condition that per capita payoffs are bounded – that is, given a pregame (T, Ψ), that there exists some constant K such that Ψ(s)/‖s‖ < K for all profiles s – suffices. If, however, there may be ‘scarce types’, that is, players of some type(s) become negligible in the population, then a stronger assumption of ‘small group effectiveness’ is required. We discuss these two conditions in the next section.

Small Group Effectiveness and Per Capita Boundedness

This section discusses conditions limiting gains to group size and their relationships. The first condition, per capita boundedness, was introduced in Wooders [83] for NTU as well as TU games.

Market Games and Clubs

PCB A pregame (T, Ψ) satisfies per capita boundedness (PCB) if

$$\mathrm{PCB}: \quad \sup_{s \in \mathbb{Z}^T_+} \frac{\Psi(s)}{\|s\|} \ \text{ is finite} \,,$$

or, equivalently,

$$\sup_{s \in \mathbb{Z}^T_+} \frac{\Psi^*(s)}{\|s\|} \ \text{ is finite} \,. \tag{11}$$

It is known that, under the apparently mild conditions of PCB and essential superadditivity, games with many players of each of a finite number of player types and a fixed distribution of player types in general have nonempty approximate cores; Wooders [81,83]. (Forms of these assumptions were subsequently also used in Shubik and Wooders [69,70], Kaneko and Wooders [35], and Wooders [89,91], among others.) Moreover, under the same conditions, approximate cores have the property that most players of the same type are treated approximately equally ([81,94]; see also Shubik and Wooders [69]). These results, however, require some assumption ruling out ‘scarce types’ of players – for example, situations where there are only a few players of some particular type and these players can have great effects on total feasible payoffs. Following are two examples. The first illustrates that PCB does not control limiting properties of the per capita payoff function when some player types are scarce.

Example 3 ([94]) Let T = 2 and let (T, Ψ) be the pregame given by

$$\Psi(s_1, s_2) = \begin{cases} s_1 + s_2 & \text{when } s_1 > 0 \,, \\ 0 & \text{otherwise} \,. \end{cases}$$

The function Ψ obviously satisfies PCB. But there is a problem in defining lim Ψ(s₁, s₂)/(s₁ + s₂) as s₁ + s₂ tends to infinity, since the limit depends on how it is approached. Consider the sequence (s₁^ν, s₂^ν) = (0, ν); then lim Ψ(s₁^ν, s₂^ν)/(s₁^ν + s₂^ν) = 0. Now suppose in contrast that (s₁^ν, s₂^ν) = (1, ν); then lim Ψ(s₁^ν, s₂^ν)/(s₁^ν + s₂^ν) = 1. This illustrates why, to obtain the result that games with many players are market games, either it must be required that there are no scarce types or some assumption limiting the effects of scarce types must be made. We return to this example in the next section. The next example illustrates that, with only PCB, uniform approximate cores of games with many players derived from pregames may be empty.

Example 4 ([94]) Consider a pregame (T, Ψ) where T = {1, 2} and Ψ is the superadditive cover of the function ψ₀ defined by:

$$\psi_0(s) \overset{\mathrm{def}}{=} \begin{cases} \|s\| & \text{if } s_1 = 2 \,, \\ 0 & \text{otherwise} \,. \end{cases}$$

Thus, if a profile s = (s₁, s₂) has s₁ = 2, then the worth of the profile according to ψ₀ is equal to the total number of players it represents, s₁ + s₂, while all other profiles s have worth of zero. In the superadditive cover game the worth of a profile s is 0 if s₁ < 2 and otherwise is equal to s₂ plus the largest even number less than or equal to s₁. Now consider a sequence of profiles (s^ν) where s₁^ν = 3 and s₂^ν = ν for all ν. Given ε > 0, for all sufficiently large player sets the uniform ε-core is empty. Take, for example, ε = 1/4. If the uniform ε-core were nonempty, it would have to contain an equal-treatment payoff vector.¹ For the purpose of demonstrating a contradiction, suppose that u = (u₁, u₂) represents an equal-treatment payoff vector in the uniform ε-core of [s^ν; (T, Ψ)]. The following inequalities must hold:

$$3u_1 + \nu u_2 \le \nu + 2 \,, \qquad 2u_1 + \nu u_2 \ge \nu + 2 \,, \qquad \text{and} \qquad u_1 \ge \tfrac{3}{4} \,,$$

which is impossible. A payoff vector which assigns each player zero is, however, in the weak ε-core for any ε > 1/(ν + 3). But it is not very appealing, in situations such as this, to ignore a relatively small group of players (in this case, the players of type 1) who can have a large effect on per capita payoffs. This leads us to the next concept.

To treat the scarce types problem, Wooders [88,89,90] introduced the condition of small group effectiveness (SGE). SGE is appealing technically since it resolves the scarce types problem. It is also economically intuitive and appealing; the condition defines a class of economies that, when there are many players, generate competitive markets. Informally, SGE dictates that almost all gains to collective activities can be realized by relatively small groups of players. Thus, SGE is exactly the sort of assumption required to ensure that multiple, relatively small coalitions, firms, jurisdictions, or clubs, for example, are optimal or near-optimal in large economies.

¹ It is well known and easily demonstrated that the uniform ε-core of a TU game is nonempty if and only if it contains an equal-treatment payoff vector. This follows from the fact that the uniform ε-core is a convex set.
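The superadditive cover used in Example 4 can be computed by a small recursion over splits of a profile; the sketch below (helper names are ours) confirms the description above, e.g. that the profile (3, ν) has cover worth ν + 2:

```python
from functools import lru_cache

def psi0(s1, s2):
    # Example 4's worth function: the total number of players when
    # exactly two type-1 players are present, zero otherwise
    return s1 + s2 if s1 == 2 else 0

@lru_cache(maxsize=None)
def cover(s1, s2):
    # superadditive cover: best total worth over all splits of the profile
    best = psi0(s1, s2)
    for a in range(s1 + 1):
        for b in range(s2 + 1):
            if (a, b) not in ((0, 0), (s1, s2)):
                best = max(best, cover(a, b) + cover(s1 - a, s2 - b))
    return best

print(cover(3, 5))  # 7 = s2 plus the largest even number <= s1
```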


A pregame (T, Ψ) satisfies small group effectiveness, SGE, if:

SGE: For each real number ε > 0, there is an integer η₀(ε) such that for each profile s there is some partition {s^k} of s with ‖s^k‖ ≤ η₀(ε) for each subprofile s^k, satisfying

$$\Psi^*(s) - \sum_k \Psi(s^k) \le \varepsilon \|s\| \,; \tag{12}$$

given ε > 0 there is a group size η₀(ε) such that the loss from restricting collective activities to groups containing fewer than η₀(ε) members is at most ε per capita [88].²

SGE also has the desirable feature that if there are no ‘scarce types’ – types of players that appear in vanishingly small proportions – then SGE and PCB are equivalent.

Theorem 3 ([91] With ‘thickness’, SGE = PCB) (1) Let (T, Ψ) be a pregame satisfying SGE. Then the pregame satisfies PCB. (2) Let (T, Ψ) be a pregame satisfying PCB. Then, given any positive real number ρ, construct a new pregame (T, Ψ_ρ) where the domain of Ψ_ρ is restricted to profiles s such that, for each t = 1, …, T, either s_t/‖s‖ > ρ or s_t = 0. Then (T, Ψ_ρ) satisfies SGE on its domain.

It can also be shown that small groups are effective for the attainment of nearly all feasible outcomes, as in the above definition, if and only if small groups are effective for improvement – any payoff vector that can be significantly improved upon can be improved upon by a small group (see Proposition 3.8 in [89]).

Remark 4 Under a stronger condition of strict small group effectiveness, which dictates that η₀(ε) in the definition of small group effectiveness can be chosen independently of ε, stronger results can be obtained than those presented in this section and the next. We refer to Winter and Wooders [80] for a treatment of this case.

Remark 5 (On the importance of taking into account scarce types) Recall the quotation from von Neumann and Morgenstern and the discussion following the quotation. The assumption of per capita boundedness has significant consequences but is quite innocuous – ruling out the possibility of average utilities becoming infinite as economies grow large does not seem restrictive. But with only per capita boundedness, even the formation of small coalitions can have significant impacts on aggregate outcomes.

² Exactly the same definition applies to situations with a compact metric space of player types, cf. Wooders [84,88].

With small group effectiveness, however, there is no problem of either large or small coalitions acting together – large coalitions cannot do significantly better than relatively small coalitions. Roughly, the property of large games we next introduce is that relatively small groups of players make only “asymptotically negligible” contributions to per capita payoffs of large groups. A pregame (T, Ψ) satisfies asymptotic negligibility if, for any sequence of profiles {f^ν} where ‖f^ν‖ → ∞ as ν → ∞,

$$\frac{f^\nu}{\|f^\nu\|} = \frac{f^{\nu'}}{\|f^{\nu'}\|} \ \text{ for all } \nu \text{ and } \nu' \,, \quad \text{and} \quad \lim_{\nu \to \infty} \frac{\Psi^*(f^\nu)}{\|f^\nu\|} \ \text{ exists} \,, \tag{13}$$

then for any sequence of profiles {ℓ^ν} with

$$\lim_{\nu \to \infty} \frac{\|\ell^\nu\|}{\|f^\nu\|} = 0 \,, \tag{14}$$

it holds that

$$\lim_{\nu \to \infty} \frac{\Psi^*(f^\nu + \ell^\nu)}{\|f^\nu + \ell^\nu\|} \ \text{ exists, and } \ = \lim_{\nu \to \infty} \frac{\Psi^*(f^\nu)}{\|f^\nu\|} \,. \tag{15}$$

Theorem 4 ([89,95]) A pregame (T, Ψ) satisfies SGE if and only if it satisfies PCB and asymptotic negligibility.

Intuitively, asymptotic negligibility ensures that vanishingly small percentages of players have vanishingly small effects on aggregate per capita worths. It may seem paradoxical that SGE, which highlights the importance of relatively small groups, is equivalent to asymptotic negligibility. To gain some intuition, however, think of a marriage model where only two-person marriages are allowed. Obviously two-person groups are (strictly) effective, but also, in large player sets, no two persons can have a substantial effect on aggregate per capita payoffs.

Remark 6 Without some assumption ensuring essential superadditivity, at least as incorporated into our definition of feasibility, nonemptiness of approximate cores of large games cannot be expected; superadditivity assumptions (or the close relative, essential superadditivity) are heavily relied upon in all papers on large games cited. In the context of economies, superadditivity is a sort of monotonicity of preferences or production functions assumption; that is, superadditivity of Ψ implies that for all s, s′ ∈ ℤ^T_+, it holds that Ψ(s + s′) ≥ Ψ(s) + Ψ(s′). Our assumption of small group effectiveness, SGE, admits nonmonotonicities. For example, suppose that ‘two is company, three or more is a crowd’, by supposing there is only one commodity and by setting ψ(2) = 2 and ψ(n) = 0 for n ≠ 2. The reader can verify, however, that this example satisfies small group effectiveness, since ψ*(n) = n if n is even and ψ*(n) = n − 1 otherwise. Within the context of pregames, requiring the superadditive cover payoff to be approximately realizable by partitions of the total player set into relatively small groups is the weakest form of superadditivity required for the equivalence of games with many players and concave markets.

Derivation of Markets from Pregames Satisfying SGE

With SGE and PCB in hand, we can now derive a premarket from a pregame and relate these concepts. To construct a limiting direct premarket from a pregame, we first define an appropriate utility function. Let (T, Ψ) be a pregame satisfying SGE. For each vector x in ℝ^T_+ define

$$U(x) \overset{\mathrm{def}}{=} \|x\| \lim_{\nu \to \infty} \frac{\Psi(f^\nu)}{\|f^\nu\|} \,, \tag{16}$$

where the sequence {f^ν} satisfies

$$\lim_{\nu \to \infty} \frac{f^\nu}{\|f^\nu\|} = \frac{x}{\|x\|} \quad \text{and} \quad \|f^\nu\| \to \infty \,. \tag{17}$$

Theorem 5 ([84,91]) Assume the pregame (T, Ψ) satisfies small group effectiveness. Then for any x ∈ ℝ^T_+ the limit (16) exists. Moreover, U(·) is well-defined, concave and 1-homogeneous, and the convergence is uniform in the sense that, given ε > 0, there is an integer η such that for all profiles s with ‖s‖ ≥ η it holds that

$$\left| U\!\left(\frac{s}{\|s\|}\right) - \frac{\Psi(s)}{\|s\|} \right| \le \varepsilon \,.$$

From Wooders [91] (Theorem 4), if arbitrarily small percentages of players of any type that appears in games generated by the pregame are ruled out, then the above result holds under per capita boundedness [91] (Theorem 6). As noted in the introduction to this paper, for the TU case, the concavity of the limiting utility function for the model of Wooders [83] was first noted by Aumann [5]. Concavity is shown to hold with a compact metric space of player types in Wooders [84], and the argument is simplified to the finite types case in Wooders [91]. Theorem 5 follows from the facts that the function U is superadditive and 1-homogeneous on its domain. Since U is concave, it is continuous on the interior of its domain; this follows from PCB. Small group effectiveness ensures that the function U is continuous on its entire domain ([91], Lemma 2).

In interpretation, T denotes a number of types of players/commodities and U denotes a utility function on ℝ^T_+. Observe that when U is restricted to profiles (in ℤ^T_+), the pair (T, U) is a pregame with the property that every game [n; (T, U)] has a nonempty core; thus, we will call (T, U) the premarket generated by the pregame (T, Ψ). That every game derived from (T, U) has a nonempty core is a consequence of the Shapley and Shubik [60] result that market games derived from markets with concave utility functions are totally balanced. It is interesting to note that, as discussed in Wooders (Section 6 in [91]), if we restrict the number of commodities to equal the number of player types, then the utility function U is uniquely determined. (If one allowed more commodities, then one would effectively have ‘redundant assets’.) In contrast, for games and markets of fixed, finite size, as demonstrated in Shapley and Shubik [62], even if we restrict the number of commodities to equal the number of player types, given any nonempty, compact, convex subset of payoff vectors in the core, it is possible to construct utility functions so that this subset coincides with the set of competitive payoffs. Thus, in the Shapley and Shubik approach, equivalence of the core and the set of price-taking competitive outcomes for the direct market is only an artifact of the method used there of constructing utility functions from the data of a game, and is quite distinct from the equivalence of the core and the set of competitive payoff vectors as it is usually understood (that is, in the sense of Debreu and Scarf [22] and Aumann [4]; see also Kalai and Zemel [31,32], which characterize the core in multi-commodity flow games).

Theorem 6 ([91]) Let (T, Ψ) be a pregame satisfying small group effectiveness and let (T, U) denote the derived direct market pregame. Then (T, U) is a totally balanced market game. Moreover, U is one-homogeneous, that is, U(λx) = λU(x) for any non-negative real number λ.

Cores and Approximate Cores

The concept of the core was clearly important in the work of Shapley and Shubik [59,60,62] and is also important for the equivalence of games with many players and market games. Thus, we discuss the related results of nonemptiness of approximate cores and convergence of approximate cores to the core of the ‘limit’ – the game where all players have utility functions derived from a pregame and


large numbers of players. First, some terminology is required. A vector p is a subgradient at x of the concave function U if U(y) − U(x) ≤ p · (y − x) for all y. One might think of a subgradient as a bounding hyperplane. To avoid any confusion it might be helpful to note that, as Mas-Colell [46] remarks: “Strictly speaking, one should use the term subgradient for convex functions and supergradient for concave. But this is cumbersome” (pp. 29–30 in [46]). For ease of notation, equal-treatment payoff vectors for a game [n; (T, Ψ)] will typically be represented as vectors in ℝ^T. An equal-treatment payoff vector, or simply a payoff vector when the meaning is clear, is a point x in ℝ^T. The tth component of x, x_t, is interpreted as the payoff to each player of type t. The feasibility of an equal-treatment payoff vector x ∈ ℝ^T for the game [n; (T, Ψ)] can be expressed as:

$$\Psi^*(n) \ge x \cdot n \,.$$

Let [n; (T, Ψ)] be a game determined by a pregame (T, Ψ), let ε be a non-negative real number, and let x ∈ ℝ^T be an (equal-treatment) payoff vector. Then x is in the equal-treatment ε-core of [n; (T, Ψ)], or simply “in the ε-core” when the meaning is clear, if x is feasible for [n; (T, Ψ)] and

$$\Psi(s) \le x \cdot s + \varepsilon \|s\| \ \text{ for all subprofiles } s \text{ of } n \,.$$

Thus, the equal-treatment ε-core is the set

$$C(n; \varepsilon) \overset{\mathrm{def}}{=} \{ x \in \mathbb{R}^T_+ : \Psi^*(n) \ge x \cdot n \ \text{ and } \ \Psi(s) \le x \cdot s + \varepsilon \|s\| \ \text{ for all subprofiles } s \text{ of } n \} \,. \tag{18}$$

It is well known that the ε-core of a game with transferable utility is nonempty if and only if the equal-treatment ε-core is nonempty. Continuing with the notation above, for any s ∈ ℝ^T_+, let Π(s) denote the set of subgradients to the function U at the point s;

$$\Pi(s) \overset{\mathrm{def}}{=} \{ \pi \in \mathbb{R}^T : \pi \cdot s = U(s) \ \text{ and } \ \pi \cdot s' \ge U(s') \ \text{ for all } s' \in \mathbb{R}^T_+ \} \,. \tag{19}$$
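For a concrete game, the defining conditions of (18) can be checked by direct enumeration of subprofiles. A small sketch (the function and the game below are our own illustration; the worth function chosen is superadditive, so it equals its own cover):

```python
from itertools import product

def in_equal_treatment_eps_core(x, psi, n, eps):
    # feasibility: psi is superadditive here, so psi(n) is the cover's worth
    if sum(xi * ni for xi, ni in zip(x, n)) > psi(*n) + 1e-9:
        return False
    # no subprofile s of n may improve on x by more than eps per member
    for s in product(*(range(ni + 1) for ni in n)):
        members = sum(s)
        payoff = sum(xi * si for xi, si in zip(x, s))
        if members and psi(*s) > payoff + eps * members + 1e-9:
            return False
    return True

psi = lambda s1, s2: min(s1, s2)  # two-type matching game: worth = mixed pairs
print(in_equal_treatment_eps_core((1.0, 0.0), psi, (2, 3), 0.0))  # True
print(in_equal_treatment_eps_core((0.5, 0.5), psi, (2, 3), 0.0))  # False
```

Paying the whole surplus to the scarce type is in the exact core of the matching game, while equal division is infeasible, as the two calls illustrate.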

The elements in Π(s) can be interpreted as equal-treatment core payoffs to a limiting game with the mass of players of type t given by s_t. The core payoff to a player is simply the value of the one unit of a commodity (himself and all his attributes, including endowments of resources) that he owns in the direct market generated by a game. Thus Π(·) is called the limiting core correspondence for the pregame (T, Ψ). Of course Π(·) is also the limiting core correspondence for the pregame (T, U).

Let Π̂(n) ⊂ ℝ^T denote the equal-treatment core of the market game [n; (T, u)]:

$$\hat{\Pi}(n) \overset{\mathrm{def}}{=} \{ \pi \in \mathbb{R}^T : \pi \cdot n = u(n) \ \text{ and } \ \pi \cdot s \ge u(s) \ \text{ for all } s \in \mathbb{Z}^T_+ \,, \ s \le n \} \,. \tag{20}$$

Given any player profile n and derived games [n; (T, Ψ)] and [n; (T, u)], it is interesting to observe the distinction between the equal-treatment core of the game [n; (T, u)], denoted by Π̂(n) and defined by (20), and the set Π(n) (that is, Π(x) with x = n). The definitions of Π(n) and Π̂(n) are the same except that the qualification “s ≤ n” in the definition of Π̂(n) does not appear in the definition of Π(n). Since Π(n) is the limiting core correspondence, it takes into account arbitrarily large coalitions. For this reason, for any x ∈ Π(n) and x̂ ∈ Π̂(n) it holds that x · n ≥ x̂ · n. A simple example may be informative.

Example 5 Let (T, Ψ) be a pregame where T = 1 and Ψ(n) = n − 1/n for each n ∈ ℤ_+, and let [n; (T, Ψ)] be a derived game. Then Π(n) = {1} while Π̂(n) = {1 − 1/n²}.

The following theorem extends to games derived from pregames a result due to Shapley and Shubik [62].

Theorem 7 ([62]) Let [n; (T, Ψ)] be a game derived from a pregame and let [n, u; (T, Ψ)] be the direct market generated by [n; (T, Ψ)]. Then the equal-treatment core Π̂(n) of the game [n; (T, u)] is nonempty and coincides with the set of competitive price vectors for the direct market [n, u; (T, Ψ)].

Remark 7 Let (T, Ψ) be a pregame satisfying PCB. In the development of the theory of large games as models of competitive economies, the following function on the space of profiles plays an important role:

$$\lim_{r \to \infty} \frac{\Psi^*(r f)}{r} \,;$$

see, for example, Wooders [81] and Shubik and Wooders [69]. For the purposes of comparison, we introduce another definition of a limiting utility function. For each vector x in ℝ^T_+ with rational components, let r(x) be the smallest integer such that r(x)x is a vector of integers. Then, for each rational vector x, we can define

$$\hat{U}(x) \overset{\mathrm{def}}{=} \lim_{\nu \to \infty} \frac{\Psi^*(\nu\, r(x)\, x)}{\nu\, r(x)} \,.$$

Since Ψ* is superadditive and satisfies per capita boundedness, the above limit exists and Û(·) is well-defined.
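Example 5 can be checked directly; a sketch, assuming the reading Ψ(n) = n − 1/n (so that the per capita worth 1 − 1/n² rises toward the limiting payoff 1):

```python
def psi(s):
    # Example 5's one-type pregame
    return s - 1.0 / s if s > 0 else 0.0

def equal_treatment_core_payoff(n):
    # pi must satisfy pi * n = psi(n) and pi * s >= psi(s) for all s <= n
    pi = psi(n) / n  # = 1 - 1/n**2
    assert all(pi * s >= psi(s) - 1e-12 for s in range(1, n + 1))
    return pi

print(round(equal_treatment_core_payoff(10), 6))  # 0.99
```

The finite-game core payoff 1 − 1/n² lies below the limiting core payoff 1, illustrating the inequality x · n ≥ x̂ · n noted above.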


Also, Û(x) has a continuous extension to any closed subset strictly in the interior of ℝ^T_+. The function Û(x), however, may be discontinuous at the boundaries of ℝ^T_+. For example, suppose that T = 2 and

$$\Psi^*(k, n) = \begin{cases} k + n & \text{when } k > 0 \,, \\ 0 & \text{otherwise} \,. \end{cases}$$

The function Ψ* obviously satisfies PCB but does not satisfy SGE. To see the continuity problem, consider the sequences {x^ν} and {y^ν} of vectors in ℝ²_+ where x^ν = (1/ν, 1) and y^ν = (0, 1). Then lim_{ν→∞} x^ν = (0, 1) = lim_{ν→∞} y^ν, but lim_{ν→∞} Û(x^ν) = 1 while lim_{ν→∞} Û(y^ν) = 0. SGE is precisely the condition required to avoid this sort of discontinuity, ensuring that the function U is continuous on the boundaries of ℝ^T_+.

Before turning to the next section, let us provide some additional interpretation for Π̂(n). Suppose a game [n; (T, Ψ)] is one generated by an economy, as in Shapley and Shubik [59] or Owen [50], for example. Players of different types may have different endowments of private goods. An element π in Π̂(n) is an equal-treatment payoff vector in the core of the balanced cover game generated by [n; (T, Ψ)] and can be interpreted as listing prices for player types, where π_t is the price of a player of type t; this price is a price for the player himself, including his endowment of private goods.
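The boundary discontinuity driven by the scarce type can be seen numerically (helper names are ours): per capita worth is 1 along any profile containing a type-1 player, but 0 on the boundary where type 1 is absent:

```python
def psi_star(k, n):
    # PCB holds (per capita worth <= 1) but SGE fails: a single
    # type-1 player is essential to any positive worth
    return k + n if k > 0 else 0

def per_capita(k, n):
    return psi_star(k, n) / (k + n)

# profiles approaching the boundary keep per capita worth 1; on it, worth is 0
print(per_capita(1, 10**6))  # 1.0
print(per_capita(0, 10**6))  # 0.0
```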

Nonemptiness and Convergence of Approximate Cores of Large Games

The next proposition is an immediate consequence of the convergence of games to markets shown in Wooders [89,91] and can also be obtained as a consequence of Theorem 5 above.

Proposition 2 (Nonemptiness of approximate cores) Let (T, Ψ) be a pregame satisfying SGE. Let ε be a positive real number. Then there is an integer η₁(ε) such that any game [n; (T, Ψ)] with ‖n‖ ≥ η₁(ε) has a nonempty uniform ε-core. (Note that no assumption of superadditivity is required, but only because our definition of feasibility is equivalent to feasibility for superadditive covers.)

The following result was stated in Wooders [89]. For more recent results see Wooders [94].

Theorem 8 ([89] Uniform closeness of (equal-treatment) approximate cores to the core of the limit game) Let (T, Ψ) be a pregame satisfying SGE and let Π(·) be as defined above. Let δ > 0 and ρ > 0 be positive real numbers. Then there is a real number ε* with 0 < ε* and an integer η₀(δ, ρ, ε*) with the following property: for each positive ε ∈ (0, ε*] and each game [f; (T, Ψ)] with ‖f‖ > η₀(δ, ρ, ε*) and f_t/‖f‖ ≥ ρ for each t = 1, …, T, if C(f; ε) is nonempty then both

$$\mathrm{dist}[C(f; \varepsilon), \Pi(f)] < \delta \quad \text{and} \quad \mathrm{dist}[C(f; \varepsilon), \hat{\Pi}(f)] < \delta \,,$$

where ‘dist’ is the Hausdorff distance with respect to the sum norm on ℝ^T.

Note that this result applies to games derived from diverse economies, including economies with indivisibilities, nonmonotonicities, local public goods, clubs, and so on. Theorem 8 motivates the question of whether approximate cores of games derived from pregames satisfying small group effectiveness treat most players of the same type nearly equally. The following result, from Wooders [81,89,93], answers this question.

Theorem 9 Let (T, Ψ) be a pregame satisfying SGE. Then given any real numbers λ > 0 and γ > 0 there is a positive real number ε* and an integer μ such that for each ε ∈ [0, ε*] and for every profile n ∈ ℤ^T_+ with ‖n‖ > μ, if x ∈ ℝ^N is in the uniform ε-core of the game [n; (T, Ψ)] with player set N = {(t, q) : t = 1, …, T and, for each t, q = 1, …, n_t}, then, for each t ∈ {1, …, T} with

$$\frac{n_t}{\|n\|} \ge \frac{\gamma}{2} \,,$$

it holds that

$$\bigl| \{ (t, q) : |x_{tq} - z_t| > \lambda \} \bigr| < \gamma\, n_t \,,$$

where, for each t = 1, …, T,

$$z_t = \frac{1}{n_t} \sum_{q=1}^{n_t} x_{tq} \,,$$

the average payoff received by players of type t.

Shapley Values of Games with Many Players

Let (N, v) be a game. The Shapley value of a superadditive game is the payoff vector whose ith component is given by

$$\mathrm{SH}(v; i) = \frac{1}{|N|} \sum_{J=0}^{|N|-1} \binom{|N|-1}{J}^{-1} \sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = J}} \bigl[ v(S \cup \{i\}) - v(S) \bigr] \,.$$
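The formula can be implemented directly for small player sets; an illustrative sketch (exponential in |N|, so only for tiny games):

```python
from itertools import combinations
from math import comb

def shapley_value(players, v):
    # SH(v; i): average of i's marginal contributions v(S ∪ {i}) − v(S),
    # with each coalition size J weighted equally, as in the formula above
    n = len(players)
    value = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for J in range(n):
            for S in combinations(others, J):
                total += (v(set(S) | {i}) - v(set(S))) / comb(n - 1, J)
        value[i] = total / n
    return value

# three-player majority game: any coalition of at least two players is worth 1
sv = shapley_value([1, 2, 3], lambda S: 1.0 if len(S) >= 2 else 0.0)
print(sv)  # by symmetry each player receives 1/3
```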

To state the next theorem, we require one additional definition. Let (T, Ψ) be a pregame. The pregame satisfies


boundedness of marginal contributions (BMC) if there is a constant M such that

$$|\Psi(s + 1_t) - \Psi(s)| \le M$$

for all profiles s and all t = 1, …, T, where 1_t = (0, …, 0, 1, 0, …, 0) with the “1” in the tth place. Informally, this condition bounds marginal contributions, while SGE bounds average contributions. That BMC implies SGE is shown in Wooders [89]. The following result restricts the main theorem of Wooders and Zame [96] to the case of a finite number of types of players.

Theorem 10 ([96]) Let (T, Ψ) be a superadditive pregame satisfying boundedness of marginal contributions. For each ε > 0 there is a number δ(ε) > 0 and an integer η(ε) with the following property: if [n; (T, Ψ)] is a game derived from the pregame for which n_t > η(ε) for each t, then the Shapley value of the game is in the (weak) ε-core.

Similar results hold within the context of private goods exchange economies (cf. Shapley [55], Shapley and Shubik [60], Champsaur [17], Mas-Colell [43], Cheng [18], and others). Some of these results are for economies without money, but all treat private goods exchange economies with divisible goods and concave, monotone utility functions. Moreover, they all treat either replicated sequences of economies or convergent sequences of economies. That games satisfying SGE are asymptotically equivalent to balanced market games clarifies the contribution of the above result. In the context of the prior results developed in this paper, the major shortcoming of the theorem is that it requires BMC. This author conjectures that the above result, or a close analogue, could be obtained with the milder condition of SGE, but this has not been demonstrated.

Economies with Clubs

By a club economy we mean an economy where participants form groups – called clubs – for the purposes of collective consumption and/or production with the group members. The groups may possibly overlap. A club structure of the participants in the economy is a covering of the set of players by clubs. Provided that utility functions are quasi-linear, such an economy generates a game of the sort discussed in this essay. The worth of a group of players is the maximum total worth that the group can achieve by forming clubs. The most general model of clubs in the literature at this point is Allouch and Wooders [1]. Yet, if one were to assume that

utility functions were all quasi-linear and the set of possible types of participants were finite, the results of this paper would apply. In the simplest case, the utility of an individual depends on the club profile (the numbers of participants of each type) in his club. The total worth of a group of players is the maximum that it can achieve by splitting into clubs. The results presented in this section immediately apply. When there are many participants, club economies can be represented as markets, and the competitive payoff vectors for the market are approximated by equal-treatment payoff vectors in approximate cores. Approximate cores converge to equal-treatment and competitive equilibrium payoffs. A more general model making these points is treated in Shubik and Wooders [65]. For recent reviews of the literature, see Conley and Smith [19] and Kovalenkov and Wooders [38].³ Coalition production economies may also be viewed as club economies. We refer the reader to Böhm [12], Sondermann [73], Shubik and Wooders [70], and, for a more recent treatment and further references, Sun, Trockel and Yang [74].

Let us conclude this section with some historical notes. Club economies came to the attention of the economics profession with the publication of Buchanan [14]. The author pointed out that people care about the numbers of other people with whom they share facilities such as swimming pool clubs. Thus, there may be congestion, leading people to form multiple clubs. Interestingly, much of the recent literature on club economies with many participants and their competitive properties has roots in an older paper, Tiebout [77]. Tiebout conjectured that if public goods are ‘local’ – that is, subject to exclusion and possibly congestion – then large economies are ‘market-like’.

A first paper treating club economies with many participants was Pauly [51], who showed that, when all players have the same preferred club size, the core of the economy is nonempty if and only if all participants in the economy can be partitioned into groups of the preferred size. Wooders [82] modeled a club economy as one with local public goods and demonstrated that, when individuals within a club (jurisdiction) are required to pay the same share of the costs of public good provision, outcomes in the core permit heterogeneous clubs if and only if all types of participants in the same club have the same demands for local public goods and for congestion. Since

³ Other approaches to economies with clubs/local public goods include Casella and Feinstein [15], Demange [23], Haimanko, Le Breton and Weber [28], and Konishi, Le Breton and Weber [37]. Recent research has treated clubs as networks.


these early results, the literature on clubs has grown substantially.

With a Continuum of Players

Since Aumann [4] much work has been done on economies with a continuum of players. It is natural to question whether the asymptotic equivalence of markets and games reported in this article holds in a continuum setting. Some such results have been obtained. First, let N = [0, 1] be the unit interval with Lebesgue measure and suppose there is a partition of N into a finite set of subsets N₁, …, N_T where, in interpretation, a point in N_t represents a player of type t. Let Ψ be given. Observe that Ψ determines a payoff for any finite group of players, depending on the numbers of players of each type. If we can aggregate partitions of the total player set into finite coalitions, then we have defined a game with a continuum of players and finite coalitions. For a partition of the continuum into finite groups to ‘make sense’ economically, it must preserve the relative scarcities given by the measure. This was done in Kaneko and Wooders [35]. To illustrate their idea of measurement-consistent partitions of the continuum into finite groups, think of a census form that requires each three-person household to label the players in the household #1, #2, or #3. When checking the consistency of its figures, the census taker would expect the number of people labeled #1 in three-person households to equal the numbers labeled #2 and #3. For consistency, the census taker may also check that the number of first persons in three-person households in a particular region is equal to the number of second persons and third persons in three-person households in that region. It is simple arithmetic. This consistency should also hold for k-person households for any k. Measurement consistency is the same idea with the word “number” replaced by “proportion” or “measure”.
One can immediately apply the results reported above to the special case of TU games of Kaneko and Wooders [35] and conclude that games satisfying small group effectiveness and with a continuum of players have nonempty cores and that the payoff function for the game is one-homogeneous. (We note that there have been a number of papers investigating cores of games with a continuum of players that have come to the conclusion that nonemptiness of exact cores does not hold, even with balancedness assumptions; cf. Weber [78,79].) The results of Wooders [91] show that the continuum economy must be representable by one where all players have the same concave, continuous, one-homogeneous utility functions. Market games with a continuum of players and a finite set of types are also investigated in Azrieli and Lehrer [3], who confirm these conclusions.

Other Related Concepts and Results

In an unpublished 1972 paper due to Edward Zajac [97], which has motivated a large literature on ‘subsidy-free pricing’, cost sharing, and related concepts, the author writes: “A fundamental idea of equity in pricing is that ‘no consumer group should pay higher prices than it would pay by itself. . . ’. If a particular group is paying a higher price than it would pay if it were severed from the total consumer population, the group feels that it is subsidizing the total population and demands a price reduction”. The “dual” of the cost allocation problem is the problem of surplus sharing and subsidy-free pricing.⁴ Tauman [75] provides an excellent survey. Some recent works treating cost allocation and subsidy-free pricing include Moulin [47,48]. See also the recent notion of “Walras’ core” in Qin, Shapley and Shimomura [52]. Another related area of research has been into whether games with many players satisfy some notion of the Law of Demand of consumer theory (or the Law of Supply of producer theory). Since games with many players resemble market games, which have the property that an increase in the endowment of a commodity leads to a decrease in its price, such a result should be expected. Indeed, for games with many players, a Law of Scarcity holds – if the number of players of a particular type is increased, then core payoffs to players