Evolvable Systems: From Biology to Hardware: 9th International Conference, ICES 2010, York, UK, September 6-8, 2010, Proceedings (Lecture Notes in Computer Science, 6274)
ISBN 3642153224, 9783642153228


Table of Contents:
Title Page
Preface
Organization
Table of Contents
Session 1: Evolving Digital Circuits
Measuring the Performance and Intrinsic Variability of Evolved Circuits
Introduction
CMOS Variability
Causes of Device Variability
Intrinsic Parameter Fluctuations
Modelling Intrinsic Variability
Experiment Details
Measuring Speed and Power Consumption
Measuring Intrinsic Variability
Results
Conclusions and Future Work
References
An Efficient Selection Strategy for Digital Circuit Evolution
Introduction
Cartesian Genetic Programming
Benchmark Problems
The Proposed Modification of CGP
Results
Experimental Setup
Evolution from a Random Population
Post-synthesis Optimization
Analysis
Conclusions
References
Introducing Flexibility in Digital Circuit Evolution: Exploiting Undefined Values in Binary Truth Tables
Introduction
Digital Circuit Evolution
Genotype Encoding
Fitness Evaluation
The Evolutionary Algorithm
Problem Space
Quotient and Remainder Hardware Divider
Finite State Machine Logic
Distributed Don’t Cares
Implementation of Don’t Care Flexibility
Simple Don’t Care Bitmask
Extended Don’t Care Method
Evolved Data
Test Structure and Parameters
Success of Evolving 2-bit Hardware Divider
Success of Evolving FSM Next State Logic
Success of Evolving Distributed Don’t Cares Circuit
Efficiency of Evolved Circuits
Conclusion
References
Evolving Digital Circuits Using Complex Building Blocks
Introduction
Related Works
Description of Molecule Functionality
CGP and Its Extension to MolCGP
Evolution Strategy and Population Size Experiments
Discussion
Applying MolCGP to Benchmark Problems
Discussion
Examining an Evolved Solution to the 2-bit Multiplier Problem
Conclusions and Further Work
References
Session 2: Artificial Development
Fault Tolerance of Embryonic Algorithms in Mobile Networks
Introduction
Background
EmbryoWare Architecture
Case Study: Coordinated Data Sensing and Logging
Performance Evaluation under Node Mobility
Connection Aware versus Connection Unaware Sensing
Conclusions and Further Work
References
Evolution and Analysis of a Robot Controller Based on a Gene Regulatory Network
Introduction
The Artificial Developmental System
Representation and Gene Regulation
Evolution of the Genotype
Developing Organisms That Control Robots
E-Puck, Player/Stage and GRN
Mapping Sensory Inputs
Deriving Motor Command Signals
Evolution and Analysis of a GRN Based Robot Controller
Fitness Function
Assessing Task Based Performance
Measuring Performance Using Cross-Correlation
Test and Analysis on Different Maps
Test and Analysis on an E-Puck Robot
Discussion
References
A New Method to Find Developmental Descriptions for Digital Circuits
Introduction
Circuit Structure and the Developmental Program
Circuit Structure
Developmental Program
Applying Evolution to Find the Developmental Program
User Interface and Problem Statement
Evolutionary Algorithm
Fitness Function
Results
Conclusion and Future Work
References
Sorting Network Development Using Cellular Automata
Introduction
Cellular Automata
Sorting Networks and Their Design
Development of Sorting Networks Using Cellular Automata
Absolute Encoding
Relative Encoding
Evolutionary System Setup
Experimental Results and Discussion
Results from the Absolute Encoding
Results from the Relative Encoding
Conclusions
References
Session 3: GPU Platforms for Bio-inspired Algorithms
Markerless Articulated Human Body Tracking from Multi-view Video with GPU-PSO
Introduction
Related Work
Articulated Human Body Pose Estimation from Video
Particle Swarm Optimisation
Parallel PSO Implementation within the CUDA™ Architecture
Parallelising PSO Using CUDA™
Pose Estimation Algorithm
Body Model
PSO Parametrisation of the Articulated Pose
Search Hierarchy
Fitness Function
Experiments
Conclusions
References
Evolving Object Detectors with a GPU Accelerated Vision System
Motivation
Evolutionary Computer Vision
GPU Accelerated Image Processing
Representation
Experiments
Conclusion
References
Systemic Computation Using Graphics Processors
Introduction
Systemic Computation
GPU Implementation of Systemic Computation
Consumer: Performing System Interactions
Producer: Finding Matching Systems
Architecture Testing and Evaluation
Genetic Algorithm Optimization of Binary Knapsack
Experiments
Conclusion
References
Session 4: Implementations and Applications of Neural Networks
An Efficient, High-Throughput Adaptive NoC Router for Large Scale Spiking Neural Network Hardware Implementations
Introduction
Motivation and Previous Works
Adaptive NoC Router
Adaptive Arbitration Policy
Adaptive Routing Scheme
Performance Analysis
Methodology
Performance Results
Evaluation
Summary and Discussion
References
Performance Evaluation and Scaling of a Multiprocessor Architecture Emulating Complex SNN Algorithms
Introduction
SNN Algorithm and Multiprocessor Architecture
SNN Model Mapping and Implementation
Performance Figures
Proposed Architecture Improvements
Conclusion
References
Evolution of Analog Circuit Models of Ion Channels
Introduction
Methods
Circuit Evolution
Neurophysiological Data
Results
Discussion and Conclusion
References
HyperNEAT for Locomotion Control in Modular Robots
Introduction
Generative Encoding Description
Experimental Set-Up
Modules and Organism
Control
Modular Differentiation
Task and Evolution
Results and Analysis
Conclusion and Future Work
References
Session 5: Test, Repair and Reconfiguration Using Evolutionary Algorithms
The Use of Genetic Algorithm to Reduce Power Consumption during Test Application
Introduction
Power Consumption Metrics
Low Power Approaches
Complexity of the Problem
Problems Related to Power Dissipation Estimation
Motivation for the Research
Proposed Optimization Method
Encoding of the Problem
Fitness Function
Selection Operators
Initialization of the Population
Experimental Results
Problem Size
Impact of GA Parameters
Scalability of the Solution
Comparison with Other Approaches
Conclusions
References
Designing Combinational Circuits with an Evolutionary Algorithm Based on the Repair Technique
Introduction
The Repair Technique
The Stalling Effect
The Principle of the Repair Technique
Gates Used in the Repair Component
The Evolutionary Algorithm Based on the Repair Technique
Experiments
Multiplier
Adder
Discussions
Conclusion
References
Bio-inspired Self-testing Configurable Circuits
Introduction
Bio-inspired Properties
Cellular Architecture
Cloning
Cicatrization
Regeneration
Configurable Molecule
Configuration Layer
Application Layer
Bio-inspired Mechanisms
Configuration Test
Structural Configuration
Functional Configuration
Cloning
Control Test
Processing Test
Applications
Arithmetic and Logic Unit
Image Processing Array
Conclusion
References
Evolutionary Design of Reconfiguration Strategies to Reduce the Test Application Time
Introduction
Previous Work
Reconfiguration Before Test Application
Search Algorithm
Example
Possible Implementation Scenarios
Proposed Method
Notation
Circuit Configuration
Fitness Function
Mutation
Evolutionary Algorithm
Experimental Results
Discussion
Conclusions
References
Session 6: Applications of Evolutionary Algorithms in Hardware
Extrinsic Evolution of Fuzzy Systems Applied to Disease Diagnosis
Introduction
Fuzzy Systems
Fuzzy Systems for Disease Diagnosis
Specification of the Search Space
HW/SW Implementation
Hardware Architecture
Fuzzification Module
Results
Speedup
Conclusion
References
Automatic Code Generation on a MOVE Processor Using Cartesian Genetic Programming
Introduction
Transport Triggered Architecture
Generating MOVE Code Using CGP
Experiment
Parameters
Results
Further Analysis of Solutions
Conclusions and Future Work
References
Coping with Resource Fluctuations: The Run-time Reconfigurable Functional Unit Row Classifier Architecture
Introduction
The Reconfigurable Functional Unit Row Architecture
Organization of the FUR Architecture
Reconfigurable FUR Architecture
Evolution of FUR Architecture
Experiments and Results
Benchmarks
Accuracy and Overfitting Analyses
Reconfigurable FUR Architecture Results
Conclusion
References
Session 7: Reconfigurable Hardware Platforms
A Self-reconfigurable FPGA-Based Platform for Prototyping Future Pervasive Systems
Introduction
Related Works on Self-adaptation and Context
Description of the Prototyping FPGA-Based Platform
Hardware Part of the Prototype
Software Part of the Prototype
Management of the Platform: The Allocation Controller
Access from Software Threads to the Accelerators
Test Applications
Histogram Equalization
Optical Character Recognition
Results and Analysis of the Platform
Size of the Operators and Reconfiguration Time
Speed-Up of OCR Thanks to Hardware Support
Using the AC for Self-healing
Analysis of the Platform
Conclusions
References
The X2 Modular Evolutionary Robotics Platform
Introduction
The X2 Robotic System
Core Module
Configuration
Simulation
Robot Control and Experimental Configuration
Controller Model
Experimental Configuration 1
Experimental Configuration 2
Evolution
Results
Evolution Runs
Locomotion Results
Hardware System Status
Discussion
Conclusion
References
Ubichip, Ubidule, and MarXbot: A Hardware Platform for the Simulation of Complex Systems
Introduction
Ubichip
Ubicell Layer
Self-reconfiguration Layer
Dynamic Routing Layer
Ubidule
Ubidule's Mother-Board
Ubichip Daughter Board
The Marxbot Robotic Platform
Modularity
Ubichip Compatibility
Battery Management
Complex Systems Simulations
Neurogenetic and Synaptogenic Networks on the Ubichip
Collective Robotics for Target Localization
Conclusions
References
Implementation of a Power-Aware Dynamic Fault Tolerant Mechanism on the Ubichip Platform
Introduction
A Power-Aware Fault Tolerant System
Background
Power Awareness
System Description
A Reconfigurable Framework: PERPLEXUS
Ubichip
The Ubicell
Inter-cell Connection
Self Reconfiguration
Ubimanager
Implementation
Implementation Results
Cell Count, Area Overhead
Timing Observation
Conclusion
Concluding Remarks
Future Work
References
Session 8: Applications of Evolution to Technology
Automatic Synthesis of Lossless Matching Networks
Introduction
TPG Sensitivity Computation
Proposed Evolutionary Algorithm
Representation
Crossover and Mutation
Fitness Computation
Algorithm Overview
Numerical Results
Conclusions
References
A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device
Introduction
Hierarchical MEMS Design
Problem Definition
Multi-objective Evolutionary Algorithm Filter Design Synthesis
Results and Comparison
Conclusions and Future Work
References
From Binary to Continuous Gates – and Back Again
NMR and Binary Gates
Functions of NMR and Continuous Gates
Evolving Robust Continuous Gates and Circuits
Continuous NAND Gate with sin/sin Characteristics
Continuous NAND Gate with sin/sinc Characteristics
Evolving XOR Circuits Using NAND Gates
Truly Continuous Gates
Conclusions and Next Steps
Experimental Setup
References
Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits
Introduction
Background
Search Methodology
Static GA Operators
Adapting GA Operators
Self-adapting GA Operators
Evaluating Quantum Circuit Evolution
Experimental Platform
Comparison Analysis
Conclusions and Perspectives
References
Session 9: Novel Methods in Evolutionary Design
Imitation Programming
Introduction
Background
Designing Unorganised Machines through Imitation
Experimentation
Asynchrony
A Comparison with Evolution
Conclusions
References
EvoFab: A Fully Embodied Evolutionary Fabricator
Introduction
EvoFab: An Evolutionary Fabricator
Proof of Concept: Interactive Evolution of Shape
Discussion
Fabrication and Epigenetic Traits
Material Use and Conservation
Design Domains
References
Evolving Physical Self-assembling Systems in Two-Dimensions
Introduction
Background
Three-Level Approach and Evolution
Level One: Definition of Rule Set
Level Two: Virtual Execution of Rule Set
Level Three: Physical Realisation of Rule Set
Evolving Self-assembly Rule Sets
Experiments and Results
Level One: Definition of Rule Set for Experiments
Level Two: Virtual Execution of Rule Set for Experiments
Experimental Set-up.
Experimental Results.
Level Three: Physical Realisation of Rule Set for Experiments
Experimental Set-up.
Experimental Results.
Conclusions
References
Author Index

Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany

6274

Gianluca Tempesti Andy M. Tyrrell Julian F. Miller (Eds.)

Evolvable Systems: From Biology to Hardware
9th International Conference, ICES 2010
York, UK, September 6-8, 2010
Proceedings


Volume Editors

Gianluca Tempesti
University of York, Department of Electronics
Intelligent Systems Group
York YO10 5DD, UK
E-mail: [email protected]

Andy M. Tyrrell
University of York, Department of Electronics
Intelligent Systems Group
York YO10 5DD, UK
E-mail: [email protected]

Julian F. Miller
University of York, Department of Electronics
Intelligent Systems Group
York YO10 5DD, UK
E-mail: [email protected]

Library of Congress Control Number: 2010932609
CR Subject Classification (1998): C.2, D.2, F.1, F.3, J.3, I.2
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues

ISSN 0302-9743
ISBN-10 3-642-15322-4 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-15322-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com

© Springer-Verlag Berlin Heidelberg 2010
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper (06/3180)

Preface

Biology has inspired electronics from the very beginning: the machines that we now call computers are deeply rooted in biological metaphors. Pioneers such as Alan Turing and John von Neumann openly declared their aim of creating artificial machines that could mimic some of the behaviors exhibited by natural organisms. Unfortunately, technology had not progressed enough to allow them to put their ideas into practice.

The 1990s saw the introduction of programmable devices, both digital (FPGAs) and analogue (FPAAs). These devices, by allowing the functionality and the structure of electronic devices to be easily altered, enabled researchers to endow circuits with some of the same versatility exhibited by biological entities and sparked a renaissance in the field of bio-inspired electronics with the birth of what is generally known as evolvable hardware. Ever since, the field has progressed along with the technological improvements and has expanded to take into account many different biological processes, from evolution to learning, from development to healing. Of course, the application of these processes to electronic devices is not always straightforward (to say the least!), but rather than being discouraged, researchers in the community have shown remarkable ingenuity, as demonstrated by the variety of approaches presented at this conference and included in these proceedings.

Held without interruption since 1995, ICES has become the leading conference in the field of evolvable hardware and systems. The 9th ICES conference, held in York, UK, in September 2010, built on the success of its predecessors and brought together some of the leading researchers who combine biologically inspired concepts with hardware. The 33 papers included in this volume, accepted for oral presentation and publication following a rigorous review process by a selected Programme Committee, represent a good sample of some of the best research in the field and clearly illustrate the range of approaches that fall under the label of bio-inspired hardware, defined as electronic hardware that tries to draw inspiration from (and not, it is worth pointing out, to imitate) the world of biology to find solutions for the problems facing the design of computing systems.

So a heartfelt note of thanks goes to the authors of the papers presented in these proceedings, who submitted material of remarkably high quality and contributed to make ICES 2010 a successful conference. This success was also a result of the outstanding work from the Organizing Committee, from the Local Chairs, Steve Smith and James Walker, who were instrumental in arranging the details of the venue and all the intricate details involved in running a conference, to the Publicity Chairs, Andy Greensted and Michael Lones, who handled the interface with the world by setting up a great website and by making sure that the conference was advertised widely through the community.


We wish to show our particular gratitude to our Programme Committee: due to some unforeseen circumstances, we were forced to set a deadline for reviews that was considerably shorter than usual, and the committee did a magnificent job in providing us with their invaluable feedback within a very short time. And we should not forget the contribution of the Steering Committee members, whose oversight and commitment through the years ensures that the ICES series of conferences has a bright future ahead. Last but not least, we wish to thank our three outstanding Keynote Speakers, Steve Furber, Hod Lipson, and Andrew Turberfield, who stimulated thought and inspired us with their presentations.

Of course, the papers in these proceedings represent just a few examples of how bio-inspired approaches are being applied to electronic hardware: analogies between the world of computer engineering and that of biology can be drawn, explicitly or implicitly, on many levels. By showcasing the latest developments in the field and by providing a forum for discussion and for the exchange of ideas, ICES 2010 represented, we hope, a small but significant step towards the fulfillment of some of our ambitions for this developing field and contributed novel ideas that will find fertile ground in our community and beyond.

September 2010

Gianluca Tempesti Andy Tyrrell Julian Miller

Organization

ICES2010 was organized by the Intelligent Systems Group of the Department of Electronics, University of York, UK.

Executive Committee

General Chair: Gianluca Tempesti
Programme Chairs: Andy Tyrrell, Julian Miller
Local Arrangements: Stephen Smith, James A. Walker
Publicity: Andrew Greensted, Michael Lones

Steering Committee

Pauline C. Haddow, Norwegian University of Science and Technology, Norway
Tetsuya Higuchi, AIST, Japan
Julian Miller, The University of York, UK
Jim Torresen, The University of Oslo, Norway
Andy Tyrrell, The University of York, UK (Chair)

Programme Committee

Andrew Adamatzky, Burçin Aktan, Tughrul Arslan, Elhadj Benkhelifa, Peter Bentley, Michal Bidlo, Stefano Cagnoni, Carlos A. Coello, Ronald F. DeMara, Rolf Drechsler, Marc Ebner, R. Tim Edwards, Stuart J. Flockton, John Gallagher, Takashi Gomi, Garrison Greenwood, Pauline C. Haddow, David M. Halliday, Alister Hamilton, Morten Hartmann, Inman Harvey, James Hereford, Arturo Hernandez-Aguirre, Jean-Claude Heudin, Masaya Iwata, Tatiana Kalganova, Paul Kaufmann, Krzysztof Kepa, Didier Keymeulen, Gul Muhammad Khan, Gregory Larchev, Per Kristian Lehre, Wenjian Luo, Jordi Madrenas, Trent McConaghy, Bob McKay, Maizura Mokhtar, J. Manuel Moreno Arostegui, Pierre-André Mudry, Masahiro Murakawa, Nadia Nedjah, Andres Perez-Uribe, Marek A. Perkowski, Jean-Marc Philippe, Tony Pipe, Lucian Prodan, Omer Qadir, Daniel Roggen, Joël Rossier, Eduardo Sanchez, Cristina Santini, Gilles Sassatelli, Thorsten Schnier, Lukáš Sekanina, Giovanni Squillero, Till Steiner, Susan Stepney, Uwe Tangen, Christof Teuscher, Jon Timmis, Yann Thoma, Adrian Thompson, Jim Torresen, Martin Trefzer, Gunnar Tufte, Andres Upegui, Fabien Vannel, Moritoshi Yasunaga, Xin Yao, Tina Yu

Table of Contents

Session 1: Evolving Digital Circuits

Measuring the Performance and Intrinsic Variability of Evolved Circuits
    James Alfred Walker, James A. Hilder, and Andy M. Tyrrell ..... 1

An Efficient Selection Strategy for Digital Circuit Evolution
    Zbyšek Gajda and Lukáš Sekanina ..... 13

Introducing Flexibility in Digital Circuit Evolution: Exploiting Undefined Values in Binary Truth Tables
    Ricky D. Ledwith and Julian F. Miller ..... 25

Evolving Digital Circuits Using Complex Building Blocks
    Paul Bremner, Mohammad Samie, Gabriel Dragffy, Tony Pipe, James Alfred Walker, and Andy M. Tyrrell ..... 37

Session 2: Artificial Development

Fault Tolerance of Embryonic Algorithms in Mobile Networks
    David Lowe, Amir Mujkanovic, Daniele Miorandi, and Lidia Yamamoto ..... 49

Evolution and Analysis of a Robot Controller Based on a Gene Regulatory Network
    Martin A. Trefzer, Tüze Kuyucu, Julian F. Miller, and Andy M. Tyrrell ..... 61

A New Method to Find Developmental Descriptions for Digital Circuits
    Mohammad Ebne-Alian and Nawwaf Kharma ..... 73

Sorting Network Development Using Cellular Automata
    Michal Bidlo, Zdenek Vasicek, and Karel Slany ..... 85

Session 3: GPU Platforms for Bio-inspired Algorithms

Markerless Articulated Human Body Tracking from Multi-view Video with GPU-PSO
    Luca Mussi, Spela Ivekovic, and Stefano Cagnoni ..... 97

Evolving Object Detectors with a GPU Accelerated Vision System
    Marc Ebner ..... 109

Systemic Computation Using Graphics Processors
    Marjan Rouhipour, Peter J. Bentley, and Hooman Shayani ..... 121

Session 4: Implementations and Applications of Neural Networks

An Efficient, High-Throughput Adaptive NoC Router for Large Scale Spiking Neural Network Hardware Implementations
    Snaider Carrillo, Jim Harkin, Liam McDaid, Sandeep Pande, and Fearghal Morgan ..... 133

Performance Evaluation and Scaling of a Multiprocessor Architecture Emulating Complex SNN Algorithms
    Giovanny Sánchez, Jordi Madrenas, and Juan Manuel Moreno ..... 145

Evolution of Analog Circuit Models of Ion Channels
    Theodore W. Cornforth, Kyung-Joong Kim, and Hod Lipson ..... 157

HyperNEAT for Locomotion Control in Modular Robots
    Evert Haasdijk, Andrei A. Rusu, and A.E. Eiben ..... 169

Session 5: Test, Repair and Reconfiguration Using Evolutionary Algorithms

The Use of Genetic Algorithm to Reduce Power Consumption during Test Application
    Jaroslav Skarvada, Zdenek Kotasek, and Josef Strnadel ..... 181

Designing Combinational Circuits with an Evolutionary Algorithm Based on the Repair Technique
    Houjun Liang, Wenjian Luo, Zhifang Li, and Xufa Wang ..... 193

Bio-inspired Self-testing Configurable Circuits
    André Stauffer and Joël Rossier ..... 202

Evolutionary Design of Reconfiguration Strategies to Reduce the Test Application Time
    Jiří Šimáček, Lukáš Sekanina, and Lukáš Stareček ..... 214

Session 6: Applications of Evolutionary Algorithms in Hardware

Extrinsic Evolution of Fuzzy Systems Applied to Disease Diagnosis
    Joël Rossier and Carlos Pena ..... 226

Automatic Code Generation on a MOVE Processor Using Cartesian Genetic Programming
    James Alfred Walker, Yang Liu, Gianluca Tempesti, and Andy M. Tyrrell ..... 238

Coping with Resource Fluctuations: The Run-time Reconfigurable Functional Unit Row Classifier Architecture
    Tobias Knieper, Paul Kaufmann, Kyrre Glette, Marco Platzner, and Jim Torresen ..... 250

Session 7: Reconfigurable Hardware Platforms

A Self-reconfigurable FPGA-Based Platform for Prototyping Future Pervasive Systems
    Jean-Marc Philippe, Benoît Tain, and Christian Gamrat ..... 262

The X2 Modular Evolutionary Robotics Platform
    Kyrre Glette and Mats Hovin ..... 274

Ubichip, Ubidule, and MarXbot: A Hardware Platform for the Simulation of Complex Systems
    Andres Upegui, Yann Thoma, Héctor F. Satizábal, Francesco Mondada, Philippe Rétornaz, Yoan Graf, Andres Perez-Uribe, and Eduardo Sanchez ..... 286

Implementation of a Power-Aware Dynamic Fault Tolerant Mechanism on the Ubichip Platform
    Kotaro Kobayashi, Juan Manuel Moreno, and Jordi Madrenas ..... 299

Session 8: Applications of Evolution to Technology

Automatic Synthesis of Lossless Matching Networks
    Leonardo Bruno de Sá, Pedro da Fonseca Vieira, and Antonio Mesquita ..... 310

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device
    Michael Farnsworth, Elhadj Benkhelifa, Ashutosh Tiwari, and Meiling Zhu ..... 322

From Binary to Continuous Gates – and Back Again
    Matthias Bechmann, Angelika Sebald, and Susan Stepney ..... 335

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits
    Cristian Ruican, Mihai Udrescu, Lucian Prodan, and Mircea Vladutiu ..... 348

Session 9: Novel Methods in Evolutionary Design

Imitation Programming
    Larry Bull ..... 360

EvoFab: A Fully Embodied Evolutionary Fabricator
    John Rieffel and Dave Sayles ..... 372

Evolving Physical Self-assembling Systems in Two-Dimensions
    Navneet Bhalla, Peter J. Bentley, and Christian Jacob ..... 381

Author Index ..... 393

Measuring the Performance and Intrinsic Variability of Evolved Circuits

James Alfred Walker, James A. Hilder, and Andy M. Tyrrell
Intelligent Systems Group, Department of Electronics, University of York,
Heslington, York, YO10 5DD, UK
{jaw500,jah128,amt}@ohm.york.ac.uk

Abstract. This paper presents a comparison between conventional and multi-objective Cartesian Genetic Programming evolved designs for a 2-bit adder and a 2-bit multiplier. Each design is converted from a gate-level schematic to a transistor level implementation, through the use of an open-source standard cell library, and simulated in NGSPICE in order to generate industry standard metrics, such as propagation delay and dynamic power. Additionally, a statistical intrinsic variability analysis is performed, in order to see how each design is affected by intrinsic variability when fabricated at a cutting-edge technology node. The results show that the evolved design for the 2-bit adder is slower and consumes more power than the conventional design. The evolved design for the 2-bit multiplier was found to be faster but consumed more power than the conventional design, and that it was also more tolerant to the effects of intrinsic variability in both timing and power. This provides evidence that in the future, evolutionary-based approaches could be a feasible alternative for optimising designs at cutting-edge technology nodes, where traditional design methodologies are no longer appropriate, provided that speed and power information from the standard cell library is used.

1 Introduction

The construction of digital logic circuits has often been used as a method to evaluate the performance of non-standard computing techniques such as algorithms inspired by Darwinian evolution. Cartesian Genetic Programming (CGP), originally developed by Miller and Thomson, is a design technique which has been used to evolve novel logic-circuit topologies and has demonstrated efficiency in computation time and resources over other biologically-inspired methods such as Koza's Genetic Programming [9,8]. CGP differs from conventional Genetic Programming in its representation of a program, which is a directed graph as opposed to a tree. A key benefit of this representation is the implicit re-use of nodes, whereby a node can be connected to the output of any previous node within the graph. The CGP genotype is a fixed-length list of integers encoding both the node-function and its connections within the directed graph. Each node within the directed graph represents a particular function, such as a logic gate, and is encoded by a number of genes; one gene encodes the functionality and the remaining genes encode the inputs to the function. The nodes take feed-forward inputs from either previous nodes in the graph or a terminal input.

In Miller's conventional CGP approach, an evolutionary run would terminate once a circuit which met the target boolean output-functionality was found. This approach has been successfully used to create numerous novel topologies for building-block logic circuits such as full-adders and multipliers; however, the resultant circuits are often significantly larger than optimally-efficient designs need to be. Whilst the circuits are functionally correct in terms of binary output, they will often contain more gates or transistors than conventional human designs, and longer paths between input and output through gates and transistors. In standard logic design, one of the primary goals is to minimise both the circuit area and delay; fewer large circuits can be fabricated on a single wafer, which results in increased cost, and longer delays result in a decrease in the maximum operating frequency of the device.

In previous work [7], the conventional CGP algorithm is augmented with a stage which further optimises circuits once a functionally-correct design has been found. To achieve this goal a two-tiered fitness function is used; the first tier is the conventional boolean-error score based on the binary Hamming distance between the observed output and the target truth-table. Each circuit found which is fully functionally correct is then rated for performance over a number of different criteria and sorted into Pareto-fronts using the Non-dominated Sorting Genetic Algorithm II (NSGA-II) [5]. The fitness criteria used are based on the total gate count of the circuit and the longest gate path, along with the total transistor count of the circuit (which will generally be proportional to the circuit area in a fabricated design), and the longest transistor path, which aims to give an approximation of worst-case transition delay for the circuit. A similar procedure has been followed in recently published work by Wang and Lee, which uses an adapted CGP algorithm that attempts to optimise gate count and gate path-lengths once functionally correct circuits have been found. Their solution, implemented in hardware on a Xilinx FPGA, does not however consider other important circuit parameters such as transistor count, treating all gates as equal [16].

Although the previous approaches [7,16] have found optimal designs, the fitness criteria used only monitor structural changes to the designs and do not give any feedback about other parameters of the design, such as speed or power consumption, which are crucial in order to enable an evolved design to be feasible and used in industry. Ideally, the optimisation process would have access to these figures, but running large circuits through an analogue circuit simulator such as NGSPICE in order to generate these figures is extremely costly in terms of time and would only be feasible on extremely large scale high-performance computing resources. Commercial design tools normally operate with standard cell libraries (gate level) that have been characterised in order to have access to the speed and power figures for a certain working range, thereby removing the need for an analogue simulator during the evaluation of a large circuit. However, one downfall of commercial design tools is that they are currently not capable of assessing how intrinsic variability will affect the design at cutting-edge technology nodes. Recently the scale of transistors has approached the level where the precise placement of individual dopant atoms will affect the output characteristics of the transistor. As these intrinsic variations become more abundant, higher failure rates and lower yields will be observed from conventional designs. Coping with intrinsic variability has been recognised as one of the major unsolved challenges faced by the semiconductor industry [1,4].

In this paper, a conventional design and an evolved design for a 2-bit adder and a 2-bit multiplier (taken from [7]) are implemented at the transistor level and run through the analogue circuit simulator NGSPICE, in order to generate industry standard metrics for the designs, such as propagation delay and dynamic power, and to perform a comparison between the designs based on these metrics. Additionally, a statistical intrinsic variability analysis will be performed on the designs in order to see how intrinsic variability would affect the designs if they were to be fabricated at a cutting-edge technology node. It will also be interesting to see if either design shows any signs of variability tolerance over the other.

The structure of this paper is as follows: Section 2 discusses the causes and impact of transistor variability, and outlines the methods used to extract accurate data models which incorporate random variations. Section 3 describes the process of converting the conventional and evolved designs from the gate level to the transistor level and defines the performance metrics used. Section 4 provides details of the design comparison based on the performance and intrinsic variability analysis. The conclusions and proposals for future work are summarised in Section 5.
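The two-tier ranking described above can be pictured with a short sketch. This is an illustration of the scheme, not the code from [7]; the circuit attributes (evaluate, gate_count, and so on) are assumed for the example, and a plain non-dominated sort stands in for full NSGA-II (which also uses crowding distance):

def hamming_error(circuit, truth_table):
    """Tier 1: number of output bits differing from the target truth table."""
    return sum(bin(circuit.evaluate(x) ^ y).count("1") for x, y in truth_table)

def structural_objectives(circuit):
    """Tier 2 (fully functional circuits only): all four objectives minimised."""
    return (circuit.gate_count, circuit.longest_gate_path,
            circuit.transistor_count, circuit.longest_transistor_path)

def dominates(a, b):
    """True if objective vector a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_fronts(circuits):
    """Successive non-dominated fronts on the structural objectives."""
    remaining, fronts = list(circuits), []
    while remaining:
        front = [c for c in remaining
                 if not any(dominates(structural_objectives(o),
                                      structural_objectives(c))
                            for o in remaining if o is not c)]
        fronts.append(front)
        remaining = [c for c in remaining if c not in front]
    return fronts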

2 CMOS Variability

CMOS devices form the backbone of almost all modern digital circuits. Integrated circuits are assembled from complementary pairs of PMOS and NMOS transistors optimised for high speed and low-power consumption. For many years, the cyclical process of reducing transistor channel length has resulted in devices both faster and lower in power consumption than the previous generation, with modern microprocessors boasting in excess of one billion transistors and gate lengths of under 50nm [14]. The International Technology Roadmap for Semiconductors (ITRS) published by the Semiconductor Industry Association projects an annual reduction of 11% in gate length, resulting in reduced operating voltages and a decrease in the gate delay of 10% per year [17]. This projected improvement is under threat from the problem of decreased yield caused by heightened variability as devices shrink.

2.1 Causes of Device Variability

The precision of individual device and interconnect parameters has traditionally been dependent on constraints within the manufacturing process, and has been considered deterministic in nature. As channel lengths shrink below 50nm, unavoidable stochastic variability due to the actual location of individual dopant atoms within the device channel is becoming increasingly significant. This is illustrated to scale in figures 1(a) and 1(b), which show that as devices get smaller (22nm to 4.2nm), the ratio of device size to constituent-atom size becomes less favourable, therefore the variable constitution at the atomic scale has an increased effect on device behaviour. Many advances have been made to reduce the loss of precision caused by the manufacturing process, however the fundamental quantum-mechanical limitations cannot be overcome, and their impact will increase as the technology shrinks further [1].

Fig. 1. Future transistors illustrated at the atomic scale ((a) 22nm MOSFET, due c.2009; (b) 4.2nm MOSFET, due c.2023) and intrinsic parameter fluctuations within a simulated 35nm device (c) [2]

Device variability occurs in both the spatial and temporal domains, and each includes both deterministic and stochastic fluctuations. Spatial variability occurs when the produced device shape differs from the intended design, including uneven doping profiles, non-uniformity in layer thickness and poly-crystalline surfaces. This variability is found at all levels: over the lifetime of a fabrication system, across a wafer of chips, between cells within a VLSI chip, and between individual devices within that cell. Temporal variability includes the effects of electromigration, gate-oxide breakdown and the distribution of negative-bias temperature instability (NBTI). Such temporal variability has been estimated, and can be combined to give an expected lifetime calculation for an individual device, or simulated to determine the compound effect across a whole chip [3,13]. Whilst deterministic variability can be accurately estimated using specific design techniques, intrinsic parameter fluctuations can only be modelled statistically and cannot be reduced with improvements in the manufacturing process [2,10].

2.2 Intrinsic Parameter Fluctuations

Intrinsic variability is caused by the atomic-level differences in devices that could be considered identical in layout, construction and environment. Summarised below are the principal sources of intrinsic variability, as illustrated in figure 1(c).

Random Dopant Fluctuations (RDF) are unavoidable variations caused by the precise number and position of dopant atoms within the silicon lattice, which exist even with a tightly controlled implant and annealing process. This uncertainty results in substantial variability in the device threshold voltage, sub-threshold slope and drive current, with the most significant variations caused by atoms near the surface and channel of the device [1].

Line Edge Roughness (LER) is the deviation in the horizontal plane of a fabricated feature boundary from its ideal form. LER has both a deterministic nature, caused by imperfections in the mask-manufacturing, photo-resist and etching processes, and also a stochastic nature due to the discrete nature of molecules used within the photo-resist layer, resulting in a random roughness on the edges of blocks etched onto the wafer [2].

Surface Roughness (SR) is the vertical deviation of the actual surface compared to the ideal form. The shrinking of surface layers, in particular the oxide layer, results in variations in the parasitic capacitances between terminals which can add to VT variations [11].

Poly-Silicon Grain Boundary Variability (PSGB) is the variation due to the random arrangement of grains within the gate material due to their poly-crystalline structure. Implanted ions can penetrate through the poly-silicon and insulator into the device channel, resulting in localised stochastic variations [6].

2.3 Modelling Intrinsic Variability

To accurately model the effects of intrinsic parameter fluctuations it is necessary to use statistical 3D simulation methods with a fine-grained discretisation. The Device Modelling Group (DMG) within the University of Glasgow [1,2] has become one of the leading research centres for 3D device modelling using their atomistic simulator, which adapts conventional 3D device modelling tools to incorporate the intrinsic effects described above. To categorise a particular transistor, a large number of current-voltage (I-V) curves are extracted and then used to calibrate a sub-set of parameters to create a model library representing the device. For the experiments described in this paper, a library of 200 different NMOS and PMOS models, based on a 35nm × 35nm Toshiba device, has been used.

To use these models within an open source implementation of the Berkeley SPICE (Simulation Program with Integrated Circuit Emphasis, see http://bwrc.eecs.berkeley.edu/Classes/icbook/SPICE/) circuit simulator, known as NGSPICE (http://ngspice.sourceforge.net/), the DMG has developed a tool, randomspice, which replaces the transistors within a template netlist with models selected randomly from the library. To allow transistors with different widths to be simulated, subcircuits of random transistors connected in parallel are assembled. To estimate the impact of variability, randomspice creates a set of output netlists which are then processed by NGSPICE. Randomspice can also create a single netlist in which only uniform 35nm transistor models are used, without the parameter fluctuations, allowing the variable output to be compared to a uniform ideal output.
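The substitution step at the heart of this flow is easy to picture with a toy sketch. The paper does not describe randomspice's actual interface, so the placeholder tokens, model names and function names below are illustrative assumptions, not the real tool:

import random

# Toy re-creation of the model-substitution idea behind randomspice;
# placeholder tokens and model names are assumptions for illustration.
NMOS_MODELS = [f"nmos_var_{i}" for i in range(200)]  # assumed naming
PMOS_MODELS = [f"pmos_var_{i}" for i in range(200)]

def randomise_netlist(template_lines, rng):
    """Replace each placeholder model reference with a model drawn
    uniformly at random from the variability-enhanced library."""
    out = []
    for line in template_lines:
        if "NMOS_PLACEHOLDER" in line:
            line = line.replace("NMOS_PLACEHOLDER", rng.choice(NMOS_MODELS))
        elif "PMOS_PLACEHOLDER" in line:
            line = line.replace("PMOS_PLACEHOLDER", rng.choice(PMOS_MODELS))
        out.append(line)
    return out

def make_batch(template_lines, n_runs=1000, seed=1):
    """One randomised netlist per Monte Carlo run, ready for NGSPICE."""
    rng = random.Random(seed)
    return [randomise_netlist(template_lines, rng) for _ in range(n_runs)]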

3 Experiment Details

The conventional designs for the 2-bit adder and 2-bit multiplier used in this paper are the standard designs taught to students in most digital electronics courses. The evolved designs are taken from previous published work [7]. These designs were found to be optimal for the structural criteria specified in the paper, namely, they contained the minimal number of gates and transistors compared to the length of the longest gate and transistor paths. However, the speed and power consumption of the designs was not analysed as part of the fitness criteria. In order to assess the designs on these criteria, they need to be converted to a transistor level schematic and simulated in NGSPICE. The conventional and evolved designs for the 2-bit multiplier are shown in figure 2.

Fig. 2. Conventional (a) and evolved (b) designs for a 2-bit multiplier

Converting the designs requires the use of a standard cell library (SCL). SCLs are the industry standard building blocks for constructing large circuits and consist of a number of transistor level implementations of logic and memory functions. In this paper, a number of standard cell layouts from the open-source vsclib library [12] have been used and are shown in figure 3. In order to use the uniform and variability enhanced 35nm models and randomspice discussed in section 2.3, the standard cell layouts and transistor sizes have been translated from their original 130nm process to the 35nm process. In order to convert the conventional and evolved gate-level designs for the 2-bit adder and 2-bit multiplier to transistor level schematics, it is simply a case of replacing each gate with its corresponding transistor implementation from the scaled-down 35nm vsclib.

Fig. 3. Cells used from the open-source vsclib

Once the gate level designs for the conventional and evolved 2-bit adder and 2-bit multiplier have been translated to the transistor level, an input, supply and load stage are added to the transistor definitions to form the complete netlist, as illustrated in figure 4. This arrangement allows the voltage and current at the inputs, supply, and load to be measured, and allows realistic circuit loads to be connected to produce feasible results. The input signals for testing the designs are created using piece-wise linear (PWL) sources to approximate a transistor response with a given rise/fall time. One input is held logic high for a clock cycle, then low for a clock cycle, and then high for a final clock cycle, whilst the remaining three inputs are all held logic high. This process is repeated for each of the inputs. A NGSPICE transient analysis is used to observe the voltages and currents over a period of 15 clock cycles for the 2-bit adder and 12 clock cycles for the 2-bit multiplier.
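The gate-to-cell substitution and the PWL stimuli can be sketched in a few lines. This is an illustration under stated assumptions: the cell names, port ordering and timing values are placeholders, not the actual vsclib identifiers or the values used in the paper:

GATE_TO_CELL = {          # hypothetical mapping onto scaled 35nm cells
    "AND2": "and2_35nm",
    "XOR2": "xor2_35nm",
    "NAND2": "nand2_35nm",
}

def gate_netlist_to_spice(gates):
    """gates: list of (gate_type, output_net, input_nets) tuples, each
    expanded into a subcircuit instance of the corresponding cell."""
    lines = []
    for i, (gtype, out, ins) in enumerate(gates):
        nets = " ".join(list(ins) + [out])
        lines.append(f"X{i} {nets} vdd gnd {GATE_TO_CELL[gtype]}")
    return lines

def pwl_source(name, node, levels, t_clk=1e-9, t_edge=1e-11, vdd=1.0):
    """PWL source holding each logic level in `levels` for one clock
    cycle, with a finite rise/fall time approximating a transistor
    response (e.g. levels=[1, 0, 1] for the toggled input)."""
    pts, t = [(0.0, levels[0] * vdd)], 0.0
    for prev, nxt in zip(levels, levels[1:]):
        t += t_clk
        pts += [(t, prev * vdd), (t + t_edge, nxt * vdd)]
    flat = " ".join(f"{ti:.4g} {vi:.2f}" for ti, vi in pts)
    return f"V{name} {node} 0 PWL({flat})"

print(gate_netlist_to_spice([("AND2", "p0", ["a0", "b0"])]))
print(pwl_source("in_a0", "a0", [1, 0, 1]))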

3.1 Measuring Speed and Power Consumption

In order to assess whether the evolved designs are faster or lower power than the conventional designs, measure statements were used in the NGSPICE simulation to calculate the propagation delay and the dynamic power of each design. The propagation delay is defined as the time taken from an input reaching the 50% threshold to an output reaching the 50% threshold. As the designs have multiple outputs, it is the slowest time taken for an output to reach the 50% threshold over all input transitions that is used, as this is the delay that would determine the operating frequency of the design. The dynamic power of each design is defined as the integral of supply voltage × supply current for the region of the clock cycle that the design is switching.

Fig. 4. The testbench used to evaluate the designs in NGSPICE

This switching region is defined from the point when an input starts to switch (rise or fall) to the point when the slowest output has finished switching (falling or rising) and reached a stable state. Once again, as the designs have multiple outputs, it is the output transition(s) that consume the most power that are used.
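Both measurements can also be reproduced offline from exported waveform arrays. The sketch below is a minimal post-processing illustration, not the NGSPICE .measure statements used in the paper:

import numpy as np

def crossing_time(t, v, threshold):
    """First time v crosses threshold, by linear interpolation between
    the two samples straddling it."""
    i = np.nonzero(np.diff(np.sign(v - threshold)))[0][0]
    frac = (threshold - v[i]) / (v[i + 1] - v[i])
    return t[i] + frac * (t[i + 1] - t[i])

def propagation_delay(t, v_in, v_out, vdd=1.0):
    """Time from the input 50% crossing to the output 50% crossing."""
    return crossing_time(t, v_out, 0.5 * vdd) - crossing_time(t, v_in, 0.5 * vdd)

def dynamic_power(t, v_supply, i_supply, t_start, t_stop):
    """As defined in the text: the integral of supply voltage x supply
    current over the switching region (from the input starting to switch
    until the slowest output settles)."""
    m = (t >= t_start) & (t <= t_stop)
    return np.trapz(v_supply[m] * i_supply[m], t[m])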

3.2 Measuring Intrinsic Variability

In order to measure the effects of intrinsic variability, a batch of NGSPICE simulations is performed for both the conventional and evolved designs using a randomised set of 35nm variability enhanced models from randomspice. The speed and power consumption metrics described in the previous section are then calculated using the data from the entire batch of runs, and non-parametric statistics are generated to describe how intrinsic variability statistically affects these performance metrics. If a design shows a significant reduction in variability for either of the performance metrics, it is said to be more variability tolerant than the other design.
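A minimal sketch of the non-parametric summary used here, assuming the per-run delay or power values have already been extracted from the batch:

import numpy as np

def nonparametric_summary(samples):
    """Median, inter-quartile range (the middle 50% of the distribution)
    and full range over a batch of per-run measurements."""
    s = np.asarray(samples)
    q1, med, q3 = np.percentile(s, [25, 50, 75])
    return {"median": med, "iqr": q3 - q1, "range": s.max() - s.min()}

def iqr_reduction(design_a, design_b):
    """Relative IQR narrowing of design_a over design_b, as quoted in
    the results (e.g. 0.39 for the evolved multiplier's delay)."""
    return 1.0 - (nonparametric_summary(design_a)["iqr"] /
                  nonparametric_summary(design_b)["iqr"])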

4 Results

To perform a statistical analysis of the effects of intrinsic variability on the conventional and evolved designs for the 2-bit adder and 2-bit multiplier, 1,000 randomspice simulations are performed for each design and delay and power measurements are calculated for the batch of simulations. The results of the conventional and evolved designs are shown in figure 5, which shows a comparison of the propagation delay and dynamic power for the worst case output when manipulating each input of the designs. The figure also clearly highlights the critical paths for timing and power for both the conventional and evolved designs, on which the worst case propagation delay and dynamic power figures in Table 1 are based. Additionally, the structural information also in Table 1 was taken from [7] and was used for the objectives when optimising the evolved designs.

Fig. 5. Statistical intrinsic variability analysis of the conventional and evolved designs for a 2-bit adder ((a) conventional, (b) evolved) and a 2-bit multiplier ((c) conventional, (d) evolved). Each point of the scatter plot for each design represents the propagation delay and dynamic power from a single NGSPICE simulation, whilst each cloud of points shows the variation in propagation delay and dynamic power when manipulating an input for each design. The plots above and to the right of each scatter plot show the kernel density estimates of each distribution in terms of propagation delay and dynamic power.

Table 1. Metrics from the CGP objectives for the conventional and evolved 2-bit adder and 2-bit multiplier designs compared with the NGSPICE measurements

                                        2-bit Adder                2-bit Multiplier
        Metric                     Conventional    Evolved     Conventional    Evolved
CGP     Gate Count                      10            10             8             7
        Transistor Count                64            60            54            35
        Longest Gate Path                4             4             3             2
        Longest Transistor Path         12            11             9             5
NGSPICE Propagation Delay          2.98e-11      3.79e-11      4.56e-11      3.28e-11
        Dynamic Power               3.36e-7       1.05e-6      3.73e-15      4.39e-15

From the results, it can be seen that the evolved design for the adder is 27% slower and consumes over three times the power (312% of the conventional figure), whereas the evolved design for the multiplier is 28% faster but consumes 17% more power than the conventional design. The improvement in delay of the evolved multiplier corresponds to the reduction in path length between the two designs, whereas for the adder, the path lengths between the two designs are similar, so it is surprising to see the evolved design is so much slower. Both evolved designs consumed more power than the conventional designs, which is surprising considering both evolved designs have a reduction in either gate or transistor count. However, this highlights the fact that the evolved designs were not specifically optimised for power and that some sort of power measure should be incorporated into the CGP objectives.

Interestingly, on comparing the statistics of the timing and power distributions for both the evolved and conventional designs, it can be seen that the evolved design for the 2-bit adder shows a greater amount of variability than the conventional design, but the 2-bit multiplier has less variability than the conventional design in both distributions. The evolved design for the 2-bit multiplier shows a reduction of 39% in the inter-quartile range (IQR, defined as the middle 50% of the distribution) and a 27% reduction in the range of the timing distribution for the critical path. Also, the power distribution of the critical path of the evolved design for the 2-bit multiplier shows a reduction of 25% in the IQR and a 17% reduction in the range. Therefore, it can be said that the evolved design for the 2-bit multiplier is more variability tolerant than the conventional design, in addition to it being faster. However, this could be attributed to the evolved design consuming more power than the conventional design. This highlights the fact that the optimisation process used in [7] could be a feasible option for designing variability tolerant circuits at cutting-edge technology nodes, when traditional design methodologies are no longer appropriate, providing the objectives used reflect more accurately the speed and power consumption of the designs.
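As a quick cross-check, the headline figures follow directly from the Table 1 values:

# Ratios computed from Table 1 (conventional, evolved), same units.
adder_delay = (2.98e-11, 3.79e-11)
adder_power = (3.36e-7, 1.05e-6)
mult_delay = (4.56e-11, 3.28e-11)
mult_power = (3.73e-15, 4.39e-15)

ratio = lambda conv, evo: evo / conv
print(f"adder delay x{ratio(*adder_delay):.2f}")  # 1.27 -> ~27% slower
print(f"adder power x{ratio(*adder_power):.2f}")  # 3.13 -> ~3.1x (the 312% figure)
print(f"mult delay  x{ratio(*mult_delay):.2f}")   # 0.72 -> ~28% faster
print(f"mult power  x{ratio(*mult_power):.2f}")   # 1.18 -> ~17% more power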

5 Conclusions and Future Work

This paper has presented a comparison between conventional and evolved designs for a 2-bit adder and a 2-bit multiplier based on performance metrics and a statistical intrinsic variability analysis obtained from a batch of 1,000 NGSPICE simulations. The results show that the evolved design for the 2-bit adder was slower and consumed more power than the conventional design, and the evolved design for the 2-bit multiplier was faster but consumed more power than the conventional design. The results for the 2-bit multiplier show some correlation to the original objectives used in the optimisation process; however, no correlation can be seen for the 2-bit adder results. This partly supports the claims made in [7] that by optimising designs post-evolution using multiple objectives that consider the gate and transistor counts and path lengths, it is possible to produce designs that can be fabricated and that show real-world improvements in circuit area and operating speed (in some cases). However, it highlights the fact that in future work, the objectives used in the optimisation process from [7] need to be expanded to include power and delay measurements from each standard cell. This would enable the optimisation process to perform a similar role to some aspects of commercial design tools.

The statistical intrinsic variability analysis showed that the evolved design for the 2-bit multiplier is also more tolerant to the effects of intrinsic variability than the conventional design in both timing and power. This shows that the optimisation process could be a feasible alternative for optimising designs at cutting-edge technology nodes where traditional design methodologies are no longer appropriate (as they cannot account for the effects of intrinsic variability), providing the measures suggested above are incorporated.

In future work, it is intended to expand the objectives used in the optimisation process from [7] to consider the effects of intrinsic variability on both timing and power. Additionally, the standard cells themselves could first be optimised for performance and variability tolerance using the approach from [15]. The optimisation process from [7] would then appear as the next design tool in a conventional tool chain that operates at a higher level of abstraction.

Acknowledgements

The authors would like to thank all partners of the Nano-CMOS project, especially the Device Modelling Group at the University of Glasgow for providing the variability-enhanced models and the randomspice application. Nano-CMOS is funded by the EPSRC under grant No. EP/E001610/1.

References

1. Asenov, A.: Random dopant induced threshold voltage lowering and fluctuations in sub-50 nm MOSFETs: a statistical 3D 'atomistic' simulation study. Nanotechnology 10, 153–158 (1999)


2. Asenov, A.: Variability in the next generation CMOS technologies and impact on design. In: International Conference on CMOS Variability (2007)
3. Bernstein, J.B., et al.: Electronic circuit reliability modeling. Microelectronics Reliability 46, 1957–1979 (2006)
4. Bernstein, K., et al.: High-performance CMOS variability in the 65-nm regime and beyond. Advanced Silicon Technology 50 (2006)
5. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6, 181–197 (2002)
6. Eccleston, W.: The effect of polysilicon grain boundaries on MOS based devices. Microelectronic Engineering 48, 105–108 (1999)
7. Hilder, J.A., Walker, J.A., Tyrrell, A.M.: Use of a multi-objective fitness function to improve Cartesian genetic programming circuits. In: NASA/ESA Conference on Adaptive Hardware and Systems, AHS-2010 (2010)
8. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
9. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
10. Mizuno, M., De, V. (eds.): Design for variability in logic, memory and microprocessor. In: Proc. VLSI Circuits, Kyoto, Japan (2007)
11. Moroz, V.: Design for manufacturability: OPC and stress variations. In: International Conference on CMOS Variability (2007)
12. Petley, G.: VLSI and ASIC technology standard cell library design, http://www.vlsitechnology.org
13. Rubio, J., et al.: Physically based modelling of damage, amorphization and recrystallization for predictive device-size process simulation. Materials Science and Engineering B, 114–115 (2004)
14. Streetman, B.G., Banerjee, S.: Solid State Electronic Devices. Prentice-Hall, Englewood Cliffs (2000)
15. Walker, J.A., Sinnott, R., Stewart, G., Hilder, J.A., Tyrrell, A.M.: Optimising electronic standard cell libraries for variability tolerance through the Nano-CMOS grid. Philosophical Transactions of the Royal Society A (2010)
16. Wang, J., Lee, C.: Evolutionary design of combinational logic circuits using VRA processor. IEICE Electronics Express 6, 141–147 (2009)
17. Wyon, C.: Future technology for advanced MOS devices. Nuclear Instruments and Methods in Physics Research B 186 (2002)

An Efficient Selection Strategy for Digital Circuit Evolution

Zbyšek Gajda and Lukáš Sekanina

Brno University of Technology, Faculty of Information Technology
Božetěchova 2, 612 66 Brno, Czech Republic
[email protected], [email protected]

Abstract. In this paper, we propose a new modification of Cartesian Genetic Programming (CGP) that enables digital circuits to be optimized more significantly than standard CGP. We argue that considering a fully functional, but not necessarily smallest-discovered, individual as the parent for the new population can decrease the number of harmful mutations and so improve the exploration of the search space. This phenomenon was confirmed on common benchmarks such as combinational multipliers and the LGSynth91 circuits.

1 Introduction

Cartesian Genetic Programming (CGP) exhibits many interesting features, especially for circuit design. When CGP is applied to reduce the number of gates in digital circuits, it starts with a fitness function that evaluates the circuit behavior only. Once one of the candidate circuits conforms to the behavioral specification, the number of gates becomes important and is reflected in the fitness value. This method, which will be called the standard CGP in this paper, is widely adopted in the literature [1, 2, 3, 4].

We have shown in our previous work [5] that area-efficient digital circuits can be evolved even if the requirement on gate reduction is not specified explicitly. The method is based on modifying the selection mechanism and fitness function of the standard CGP. In this paper, we provide further experimental evidence for this phenomenon. In addition to testing the method using popular benchmarks such as multipliers, we will perform an experimental evaluation using the LGSynth91 benchmark circuits. We hypothesize that the neutral search and the redundancy of CGP's encoding (as demonstrated in [6, 7, 8]) are primarily responsible for this phenomenon. We argue that considering fully functional but not necessarily smallest-discovered individuals as parents improves the search space exploration in comparison with the standard CGP.

The rest of the paper is organized as follows. Section 2 surveys the basic (standard) version of CGP. Benchmark problems are presented in Section 3. The proposed modification of CGP is formulated in Section 4. The results of experiments are summarized in Section 5. Section 6 deals with the analysis of the results on the basis of measuring non-destructive mutations. Finally, conclusions are given in Section 7.

2 Cartesian Genetic Programming

Cartesian Genetic Programming is a widely-used method for the evolution of digital circuits [9, 1]. In CGP, a candidate entity (circuit) is modeled as an array of nc (columns) × nr (rows) programmable nodes (gates). The number of inputs, ni, and outputs, no, is fixed. Each node input can be connected either to the output of a node placed in the previous l columns or to one of the program inputs. The l-back parameter, in effect, defines the level of connectivity and thus reduces or extends the search space. For example, if l = 1 only neighboring columns may be connected; if nr = 1 and l = nc then full connectivity is enabled. Feedback is not allowed. Each node is programmed to perform one of the na-input functions defined in the set Γ (nf denotes |Γ|). Each node is encoded using na + 1 integers, where the first na values are the indexes of the input connections and the last value is the function code. Every individual is encoded using nc · nr · (na + 1) + no integers. Figure 1 shows an example of a candidate circuit and its chromosome.

[Figure 1: a candidate circuit of six gates over three inputs and two outputs, encoded by the chromosome 1,2,1; 1,2,2; 4,2,5; 3,4,3; 6,1,2; 0,5,5; 7,6]

Fig. 1. An example of a candidate circuit in CGP and its chromosome: l = 3, nc = 3, nr = 2, ni = 3, no = 2, na = 2, Γ = {NOR (1), XOR (2), AND (3), NAND (4), NOT (5)}

CGP operates with a population of 1 + λ individuals (typically, λ is between 1 and 20). The initial population is constructed either randomly or by a heuristic procedure. Every new population consists of the best individual of the previous population and its λ offspring. The offspring individuals are created using a point mutation operator which modifies h randomly selected genes of the chromosome, where h is a user-defined value. There is one important rule for the selection of the parent: when two or more individuals could serve as the parent, the individual which has not served as the parent in the previous generation is selected as the new parent. This strategy is important because it ensures the diversity of the population [7]. The algorithm is terminated when the maximum number of generations is exhausted or a sufficiently working solution is obtained.

Because we will deal with digital circuit evolution, let us consider the fitness function for that case only. The goal is to obtain a perfectly working circuit (all assignments to the inputs have to be tested) with the number of gates as low as possible. Additional criteria can be included; however, we will not deal with them in this paper. The most effective strategy for the fitness calculation proposed so far defines the fitness value of a candidate circuit as [3]:

fit_1 = \begin{cases} b & \text{when } b < n_o 2^{n_i}, \\ b + (n_c n_r - z) & \text{otherwise,} \end{cases}    (1)

where b is the number of correct output bits obtained as the response for all possible assignments to the inputs, z denotes the number of gates utilized in a particular candidate circuit, and nc · nr is the total number of available gates. It can be seen that the last term, nc · nr − z, is considered only if the circuit behavior is perfect, i.e. b = bmax = no · 2^ni. We can observe that the evolution first has to discover a perfectly working solution while the size of the circuit is not important; only then is the number of gates optimized.

The encoding used in CGP is redundant since there may be genes that are entirely inactive. These genes do not influence the phenotype, and hence the fitness. This phenomenon is often referred to as neutrality. The role of neutrality has been investigated in detail [10, 6, 7]. For example, it was found that the most evolvable representations occur when the genotype is extremely large and over 95% of the genes are inactive [7]. Collins, on the other hand, has shown that for some specific problems the neutrality-based search is not the best solution [11]. Miller has also identified that the problem of bloat is insignificant for CGP [12].
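As a quick illustration (ours, not the paper's code), Eq. (1) translates directly into a few lines of Python:

def fit1(b, z, nc, nr, ni, no):
    # Standard CGP fitness (Eq. 1): behaviour first, then gate count.
    b_max = no * 2 ** ni
    return b if b < b_max else b + (nc * nr - z)

For example, a perfect 2b×2b multiplier (ni = no = 4) using all 7 of 7 available gates scores fit1(64, 7, 7, 1, 4, 4) = 64, and every gate it sheds afterwards adds one fitness point.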

3 Benchmark Problems

The design of small multipliers is the most popular benchmark problem for gate-level circuit evolution. Because the direct CGP approach is not scalable, it works only for 4-bit multipliers (i.e. 8-input/8-output circuits) and smaller. Table 1 summarizes the best known results for various multipliers according to [1, 2]. CGP was used with two-input gates, l = nc, λ = 4, h = 3; the remaining parameters are given in Table 1. CGP was seeded using conventional designs. The fitness function was constructed according to equation (1). CGP is capable of creating innovative designs for this class of circuits. However, it is important to initialize the CGP parameters carefully. For example, in order to reduce the search space the function set should contain just the logic functions that are important for multipliers (the solutions denoted as Best CGP in Table 1 were obtained using Γ = {x AND y, x XOR y, (NOT x) AND y}). However, the gate (NOT x) AND y is not usually considered a single gate in digital design; its implementation requires two gates, AND and NOT. Hence we also include 'Recalc. CGP' in Table 1, which is the result recalculated when one considers (NOT x) AND y as two gates in the multipliers shown in [2].

Table 1. The number of two-input gates in multipliers according to [1, 2]

Multiplier | Best conv. | Best CGP | Recalc. CGP | nr × nc | Max. gener.
2b×2b | 8 | 7 | 9 | 1 × 7 | 10k
3b×2b | 17 | 13 | 14 | 1 × 17 | 200k
3b×3b | 30 | 23 | 25 | 1 × 35 | 20M
4b×3b | 47 | 37 | 44 | 1 × 56 | 200M
4b×4b | 64 | 57 | 67 | 1 × 67 | 700M

For further comparison of the standard CGP and the proposed method, we have selected 16 circuits from the LGSynth91 benchmark suite [13] (see Table 4). In this case we have utilized CGP in the post-synthesis phase, i.e. CGP is employed to reduce the number of gates in already synthesized circuits. In this paper, we have used the ABC tool to perform the (conventional) synthesis [14]. Each circuit is represented as a netlist of gates in the BLIF format (Berkeley Logic Interchange Format).

4 The Proposed Modification of CGP

From the perspective of this paper, the fitness function and selection strategy are the most interesting features of the standard CGP. Because the (1 + λ) strategy is used, the highest-scored individual p (whose fitness value will be denoted fp) is always preserved. In the standard CGP, the result of evolution is then simply the highest-scored individual of the last generation.

Consider a situation in which a fully working circuit has already been obtained (b = bmax) and the number of gates is now being optimized. If the mutation operator creates an individual x with fitness value fx and fx ≥ fp, then x will become the new parental solution p (assuming that there is no better result of mutation in the population). However, if the mutation operator creates an individual y with fitness value fy and (fy < fp) ∧ (fy ≥ bmax), then p will be selected as the parent for the new population and y will be discarded (assuming that the fitness values of the other solutions are lower than fy). In this way, many new fully functional solutions, albeit slightly worse than the parent, are lost. We will demonstrate in Section 5 that accepting an individual y for which (fy < fp) ∧ (fy ≥ bmax) holds as a new parent is beneficial for an efficient search process.

The new selection strategy and fitness function are proposed only for the situation when the number of gates is being optimized, i.e. the fitness value of the best individual is higher than or equal to bmax. Otherwise, the algorithm works as the standard CGP. As the best individual found so far will not be copied to the new population automatically, it is necessary to store it in an auxiliary variable. Let β denote the best discovered solution and let fβ be its fitness value. In the first population, β is initialized using p.

Assume that x1 . . . xλ are individuals (with fitness values fx1 . . . fxλ) created from the parental solution p using the mutation operator, and fβ ≥ bmax (i.e. we are in the gate reduction phase). Because the best individual β and the parental individual p are not always identical, we have to determine their new instances β′ and p′ separately. The best-discovered solution is defined as:

\beta' = \begin{cases} \beta & \text{when } f_\beta \ge f_{x_i},\ i = 1 \ldots \lambda, \\ x_j & \text{otherwise,} \end{cases}    (2)

where xj is the highest-scored individual for which fxj > fβ holds. If multiple individuals in {x1 . . . xλ} have fitness higher than fβ, the best of them is chosen, with ties broken randomly. The new parental individual is defined as:

p' = \begin{cases} p & \text{when } \forall i, i = 1 \ldots \lambda : f_{x_i} < b_{max}, \\ x_j & \text{otherwise,} \end{cases}    (3)

where xj is a randomly selected individual from those in {x1 . . . xλ} whose fitness score is higher than or equal to bmax. In other words, the new parent must be a fully functional solution; however, the number of gates is not important for its selection. Note that the result of evolution is no longer p but β. The proposed strategy will be denoted fit2.
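One generation of the proposed selection can be sketched in Python as follows (our illustration under the definitions above; offspring is assumed to be a list of (individual, fitness) pairs):

import random

def select(beta, f_beta, parent, offspring, b_max):
    # Eq. 2: keep the best-discovered solution in an auxiliary variable.
    better = [(x, f) for (x, f) in offspring if f > f_beta]
    if better:
        best_f = max(f for _, f in better)
        beta, f_beta = random.choice([(x, f) for (x, f) in better
                                      if f == best_f])
    # Eq. 3: any fully functional offspring may become the new parent,
    # regardless of its gate count; otherwise the old parent is kept.
    functional = [x for (x, f) in offspring if f >= b_max]
    if functional:
        parent = random.choice(functional)
    return beta, f_beta, parent

Note that the result reported at the end of a run is beta, while parent is free to wander among fully functional, possibly larger, circuits.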

5 Results

5.1 Experimental Setup

CGP is used according to its definition in Section 2. In this paper, we always use nr = 1 and l = nc. The initial population is generated either randomly or using a solution obtained from a conventional synthesis method. If CGP is applied as a post-synthesis optimizer, then the number of gates of the result of conventional synthesis is denoted m (it is assumed that each of the gates has up to γ inputs). CGP will then operate with the parameters nc = m, nr = 1, l = nc, na = γ. In all experiments λ = 14, γ = 2 and h is between 1 and 14 (the mean value is 7). We have used Γ′ = {and, or, not, nand, nor, xor, identity, const1, const0}, where not and identity are unary functions (taking the first input of the gate) and constk is a constant generator with the value k. Each experiment is repeated ten times with a limit of 100 million generations. In all experiments the standard fitness function of CGP (denoted fit1) is compared with the method presented in Section 4 (denoted fit2).

5.2 Evolution from a Random Population

In the first experiment, we have evolved multipliers with up to four-bit operands from a randomly generated initial population. Following the recommendations of [7], we intentionally allowed relatively long chromosomes to be used by CGP. The nc values were set on the basis of the ABC synthesis (see the seed values in Table 3). Table 2 summarizes the number of gates (the best and mean values), the mean number of generations needed to reach bmax, and the success rate for fit1 and fit2.

Table 2. The best-obtained and mean number of gates for the multiplier benchmarks when CGP starts from a randomly generated initial population

Circuit | Alg. | nc | gates (best) | gates (mean) | mean # gener. | succ. runs
2b×2b | fit1 | 7 | 7 | 7 | 2,738 | 100%
2b×2b | fit2 | 7 | 7 | 7 | 2,777 | 100%
3b×2b | fit1 | 16 | 13 | 13 | 651,297 | 100%
3b×2b | fit2 | 16 | 13 | 13 | 741,758 | 100%
3b×3b | fit1 | 57 | 25 | 27.7 | 476,812 | 100%
3b×3b | fit2 | 57 | 23 | 23.4 | 625,682 | 100%
4b×3b | fit1 | 125 | 46 | 52.7 | 2,714,891 | 100%
4b×3b | fit2 | 125 | 37 | 43.1 | 4,271,179 | 100%
4b×4b | fit1 | 269 | 110 | 128.3 | 29,673,418 | 90%
4b×4b | fit2 | 269 | 60 | 109.4 | 37,573,311 | 70%

As the design of the 2b×2b and 3b×2b multipliers is easy for CGP, we will mainly analyse the results for the larger problem instances (here and in the next sections). It can be seen that fit2 gives better results than fit1; however, the mean number of generations is higher for fit2. We have obtained an almost identical minimum number of gates when compared with [2] (also in Table 1, Best CGP), even though CGP is randomly initialized and a non-problem-specific set of gates is utilized.

5.3 Post-synthesis Optimization

The second set of experiments compares fit1 and fit2 when CGP is applied to reduce the number of gates in already functional circuits. We compared three approaches to seeding the initial population in the case of multipliers. The multipliers produced by the ABC tool are taken as seeds in the first group of experiments (denoted 'seed: ABC' in Table 3). The second group of experiments is seeded using the best multipliers reported in [2] (denoted 'seed: Tab. 1' in Table 3). The seeds of the third group of experiments are created manually as combinational carry-save multipliers according to [15] (denoted 'seed: CM' in Table 3).

Table 3 shows that fit2 can produce more compact designs than fit1 (see the 'best' column). The mean number of gates is given at generations 1M, 2M, 5M, 10M, 20M, 50M and 100M (M = 10^6). It can be seen that the best solution improves over time. The best-evolved multiplier (4b×4b) is composed of 56 gates (taken from Γ′, which does not consider the AND gate with one input inverted as a single gate). The best circuit presented in [2] consists of 57 gates taken from Γ (i.e., 67 gates when Γ′ is used). We can also express the implementation cost in terms of the transistors used. While the 56-gate multiplier is composed of 400 transistors, the multiplier reported in [2] consists of 438 transistors. It is assumed that the number of transistors required to create a particular gate is as follows: nand (4 tr.), nor (4 tr.), or (6 tr.), and (6 tr.), not (2 tr.) and xor (10 tr.) [15].
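Using the per-gate transistor costs just quoted, the implementation cost is a simple weighted sum; the short sketch below (with a purely hypothetical gate inventory, not the actual composition of the evolved multiplier) shows the calculation:

TRANSISTORS = {"nand": 4, "nor": 4, "or": 6, "and": 6, "not": 2, "xor": 10}

def transistor_count(inventory):
    # inventory maps gate type -> number of gates of that type
    return sum(TRANSISTORS[g] * n for g, n in inventory.items())

print(transistor_count({"and": 10, "xor": 20, "nand": 5}))  # -> 280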

Table 3. The best-obtained and mean number of gates in generations 1M–100M for the multiplier benchmarks when CGP is seeded by functional solutions of different types

seed: ABC
Circuit | Alg. | seed | best | 1M | 2M | 5M | 10M | 20M | 50M | 100M
2b×2b | fit1 | 17 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
2b×2b | fit2 | 17 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
3b×2b | fit1 | 16 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13
3b×2b | fit2 | 16 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13
3b×3b | fit1 | 57 | 26 | 38.2 | 36.1 | 34.3 | 32.6 | 31 | 29.8 | 28.7
3b×3b | fit2 | 57 | 23 | 31.5 | 28.8 | 27.2 | 25 | 24.5 | 24.2 | 23.5
4b×3b | fit1 | 125 | 54 | 93.2 | 88.3 | 79.3 | 75.6 | 71.6 | 66.6 | 64.4
4b×3b | fit2 | 125 | 37 | 80 | 68 | 55.9 | 49.9 | 46.9 | 44.1 | 41.1
4b×4b | fit1 | 269 | 140 | 212.4 | 190.6 | 178.9 | 170.9 | 165.2 | 158.5 | 152.4
4b×4b | fit2 | 269 | 68 | 218.2 | 182.2 | 151.3 | 136.5 | 121.2 | 107 | 93.3

seed: Tab. 1
Circuit | Alg. | seed | best | 1M | 2M | 5M | 10M | 20M | 50M | 100M
2b×2b | fit1 | 9 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
2b×2b | fit2 | 9 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
3b×2b | fit1 | 14 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13
3b×2b | fit2 | 14 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13
3b×3b | fit1 | 25 | 23 | 25 | 25 | 24.7 | 23.9 | 23.5 | 23.2 | 23.1
3b×3b | fit2 | 25 | 23 | 25 | 25 | 24.7 | 24.4 | 24.2 | 23.5 | 23.1
4b×3b | fit1 | 44 | 36 | 38.5 | 37.8 | 37.1 | 36.8 | 36.8 | 36.4 | 36.3
4b×3b | fit2 | 44 | 35 | 37.9 | 37.1 | 36.5 | 36.4 | 36.2 | 36.2 | 36.1
4b×4b | fit1 | 67 | 57 | 59.6 | 58.8 | 58 | 57.8 | 57.5 | 57.3 | 57.1
4b×4b | fit2 | 67 | 56 | 59.5 | 59.2 | 58.7 | 58.3 | 57.2 | 56.8 | 56.8

seed: CM
Circuit | Alg. | seed | best | 1M | 2M | 5M | 10M | 20M | 50M | 100M
2b×2b | fit1 | 8 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
2b×2b | fit2 | 8 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7
3b×2b | fit1 | 17 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13
3b×2b | fit2 | 17 | 13 | 13 | 13 | 13 | 13 | 13 | 13 | 13
3b×3b | fit1 | 30 | 23 | 28 | 28 | 28 | 27.8 | 27.6 | 26.5 | 25.8
3b×3b | fit2 | 30 | 23 | 28 | 28 | 27.6 | 26.8 | 25 | 24.4 | 23.4
4b×3b | fit1 | 45 | 37 | 43 | 43 | 43 | 42.4 | 41.9 | 40.6 | 39.2
4b×3b | fit2 | 45 | 37 | 43 | 43 | 42.6 | 42.2 | 41.5 | 39.9 | 38.4
4b×4b | fit1 | 64 | 59 | 62.9 | 62.6 | 62.6 | 62.3 | 61.5 | 60.6 | 60.2
4b×4b | fit2 | 64 | 59 | 62.9 | 62.9 | 62.8 | 62.4 | 62 | 61.3 | 60.8

Table 4 gives the best-obtained and mean number of gates for the LGSynth91 benchmark circuits when CGP is seeded by already working circuits. The working circuits (of the size given by nc) were obtained using ABC, initialized with the original LGSynth91 circuits (in the BLIF format) and mapped onto the two-input gates of Γ′. The 'exp. gates' column is the estimated number of two-input gates (after conventional synthesis) given in [13]. It can be seen that fit2 is more successful than fit1. In general, CGP gives better results than 'exp. gates' because it does not employ any deterministic synthesis algorithm; all the optimizations are done implicitly, without any structural biases.

[Figure 2: two plots of the number of gates versus generations (0 to 1e8) for fit1 and fit2, for the 4b×4b multiplier seeded by ABC]

Fig. 2. a) The number of gates of the parent individual (from the best run for the 4b×4b multiplier); b) the mean number of gates of the best-obtained individuals β (from 10 runs for the 4b×4b multiplier)

Table 4. The best-obtained and mean number of gates for the LGSynth91 benchmarks when CGP starts from the initial solution (of size nc) synthesized using ABC

Circuit | ni | no | exp. gates | nc (seed) | gates fit1 (best) | gates fit2 (best) | gates fit1 (mean) | gates fit2 (mean)
9symml | 9 | 1 | 43 | 216 | 53 | 23 | 68.5 | 25.5
C17 | 5 | 2 | 6 | 6 | 6 | 6 | 6 | 6
alu2 | 10 | 6 | 335 | 422 | 134 | 73 | 149 | 89.4
alu4 | 14 | 8 | 681 | 764 | 329 | 274 | 358 | 279
b1 | 3 | 4 | 13 | 11 | 4 | 4 | 4 | 4
cm138a | 6 | 8 | 17 | 19 | 16 | 16 | 16 | 16
cm151a | 12 | 2 | 33 | 34 | 24 | 23 | 24 | 23
cm152a | 11 | 1 | 17 | 24 | 22 | 21 | 22.1 | 21.8
cm42a | 4 | 10 | 27 | 20 | 17 | 17 | 17 | 17
cm82a | 5 | 3 | — | 12 | 10 | 10 | 10 | 10
cm85a | 11 | 3 | 38 | 41 | 23 | 22 | 24.1 | 22
decod | 5 | 16 | 22 | 34 | 30 | 26 | 30 | 26.1
f51m | 8 | 8 | 43 | 146 | 29 | 26 | 32.9 | 27.3
majority | 5 | 1 | 9 | 10 | 8 | 8 | 8 | 8
x2 | 10 | 7 | 42 | 60 | 27 | 27 | 29.6 | 27.4
z4ml | 7 | 4 | 20 | 40 | 15 | 15 | 15 | 15

Figure 2a shows the number of gates of the parent individual p at every 1000th generation during the evolution of the 4b×4b multiplier using fit1 and fit2 (taken from the best runs; seeded by ABC). It can be seen that the parent is different from the best-obtained solution for fit2 (the curve is not monotonic). We can also observe that fit1 provides a better result than fit2 in the early stages of the evolution. However, fit2 outperforms fit1 when more generations are allowed for evolution. Figure 2b shows the mean number of gates of the best-obtained individuals (averaged over 10 independent runs).

6 Analysis

We have seen so far that selecting the parent individual solely on the basis of its functionality (and thus neglecting the number of gates) provides slightly better results at the end of evolution (when the goal is to reduce the phenotype size) than the standard CGP. Why does this approach work? Recall that the fitness landscape is rugged and neutral in the case of digital circuit evolution using CGP [6, 8]. Hence, relatively simple mutation-based search algorithms are more successful than sophisticated search algorithms and genetic operators such as those developed in the fields of genetic algorithms and estimation of distribution algorithms.

In the standard CGP, generating the offspring individuals is biased towards the best individual discovered so far: the best individual is changed only if a better or equally-scored solution is found. In the proposed method, changes of the parent individual are more frequent, because the only requirement for a candidate individual to qualify as the parent is to be fully functional. Hence we consider the proposed algorithm to be more explorative than the standard CGP. Our hypothesis is that if a high degree of redundancy is present in the genotype, the proposed method will generate more functionally correct individuals than the standard CGP. And because the fitness landscape is rugged and neutral, the proposed method is more efficient in finding compact circuit implementations than the standard CGP.

In order to verify this hypothesis we have measured the number of mutations that lead to functionally correct circuits. When CGP is seeded with a working circuit, we have in fact measured the number of neutral and useful mutations. Figure 3 compares the results for fit1 and fit2 in the experiments reported in Table 2 and Table 3. The y-axis is labeled MNM, which stands for 'Millions of Non-destructive Mutations'. For small multipliers (2b×2b, 3b×2b) fit1 always yields a higher MNM, which contradicts our hypothesis. However, we have already noted that these really small multipliers are not interesting, because the problem is easy and an optimal solution can be discovered very quickly. In the case of more difficult circuits, fit2 provides a higher MNM in most cases, especially when sufficient redundancy is available (see Fig. 3 a, d). When the best resulting multipliers of [2] are used to seed the initial population, fit1 is always higher than fit2 (see Fig. 3 b). This is consistent with the theory that CGP (with almost zero redundancy in the genotype) has become stuck at a local optimum, leaving fit2 no room to work.

The number of non-destructive mutations was counted every 1000 generations and the resulting value was plotted as a single point in Fig. 4a (3b×3b multiplier) and Fig. 4b (4b×4b multiplier). The best run seeded using ABC is shown in both cases. It is evident that, on average, significantly more correct individuals have been generated for fit2. It can also be seen that while fit1 tends to create a relatively stable number of correct individuals over time (the dispersion is approx. 200 individuals for the 4b×4b multiplier), great differences are observable in the number of correct individuals for fit2 (the dispersion is approx. 1000 individuals for the 4b×4b multiplier). That also supports the idea that the search of fit1 is biased.

[Figure 3: four bar charts of MNM for fit1 and fit2 over the multiplier sizes 2×2 to 4×4, for a) Seed: ABC, b) Seed: Table 1, c) Random Initial Population, d) Seed: Comb. Mult.]

Fig. 3. Millions of Non-destructive Mutations (MNM) for different experiments (mean values given)

[Figure 4: scatter plots of the number of non-destructive mutations per 1000 generations versus generations (0 to 1e8) for fit1 and fit2, for the 3b×3b and 4b×4b multipliers seeded by ABC]

Fig. 4. The number of non-destructive mutations per 1000 generations for: a) 3b×3b multiplier; b) 4b×4b multiplier

7 Conclusions

In this paper, we have shown that selecting the parent individual on the basis of its functionality instead of its compactness leads to smaller phenotypes at the end of evolution. The method is especially useful for the optimization of nontrivial circuits when sufficient redundancy is available in terms of available gates and sufficient time is allowed for evolution. In future work we plan to test the proposed method for reducing the phenotype size in symbolic regression problems.

Acknowledgments This work was partially supported by the grant Natural Computing on Unconventional Platforms GP103/10/1517, the BUT FIT grant FIT-10-S-1 and the research plan Security Oriented Research in Information Technology MSM 0021630528.

References

[1] Miller, J.F., Job, D., Vassilev, V.K.: Principles in the Evolutionary Design of Digital Circuits – Part I. Genetic Programming and Evolvable Machines 1(1), 8–35 (2000)
[2] Vassilev, V., Job, D., Miller, J.: Towards the Automatic Design of More Efficient Digital Circuits. In: Proc. of the 2nd NASA/DoD Workshop on Evolvable Hardware, pp. 151–160. IEEE Computer Society, Los Alamitos (2000)
[3] Kalganova, T., Miller, J.F.: Evolving more efficient digital circuits by allowing circuit layout evolution and multi-objective fitness. In: The First NASA/DoD Workshop on Evolvable Hardware, pp. 54–63. IEEE Computer Society, Los Alamitos (1999)
[4] Gajda, Z., Sekanina, L.: Reducing the number of transistors in digital circuits using gate-level evolutionary design. In: 2007 Genetic and Evolutionary Computation Conference, pp. 245–252. ACM, New York (2007)
[5] Gajda, Z., Sekanina, L.: When does cartesian genetic programming minimize the phenotype size implicitly? In: Genetic and Evolutionary Computation Conference. ACM, New York (2010) (accepted)
[6] Vassilev, V.K., Miller, J.F.: The advantages of landscape neutrality in digital circuit evolution. In: Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.) ICES 2000. LNCS, vol. 1801, pp. 252–263. Springer, Heidelberg (2000)
[7] Miller, J.F., Smith, S.L.: Redundancy and Computational Efficiency in Cartesian Genetic Programming. IEEE Transactions on Evolutionary Computation 10(2), 167–174 (2006)
[8] Miller, J.F., Job, D., Vassilev, V.K.: Principles in the Evolutionary Design of Digital Circuits – Part II. Genetic Programming and Evolvable Machines 1(3), 259–288 (2000)
[9] Miller, J., Thomson, P.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
[10] Yu, T., Miller, J.F.: Neutrality and the evolvability of boolean function landscape. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tetamanzi, A.G.B., Langdon, W.B. (eds.) EuroGP 2001. LNCS, vol. 2038, pp. 204–217. Springer, Heidelberg (2001)
[11] Collins, M.: Finding needles in haystacks is harder with neutrality. In: GECCO 2005: Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pp. 1613–1618. ACM, New York (2005)
[12] Miller, J.: What bloat? Cartesian genetic programming on boolean problems. In: 2001 Genetic and Evolutionary Computation Conference Late Breaking Papers, pp. 295–302 (2001)
[13] Yang, S.: Logic Synthesis and Optimization Benchmarks User Guide, Version 3.0 (1991)
[14] Berkeley Logic Synthesis and Verification Group: ABC: A System for Sequential Synthesis and Verification
[15] Weste, N., Harris, D.: CMOS VLSI Design: A Circuits and Systems Perspective, 3rd edn. Addison-Wesley, Reading (2004)

Introducing Flexibility in Digital Circuit Evolution: Exploiting Undefined Values in Binary Truth Tables

Ricky D. Ledwith and Julian F. Miller

Dept. of Electronics, The University of York, York, UK
[email protected], [email protected]

Abstract. Evolutionary algorithms can be used to evolve novel digital circuit solutions. This paper proposes the use of flexible target truth tables, allowing evolution more freedom where output values are undefined. This concept is applied to three test circuits with different distributions of “don’t care” values. Two strategies are introduced for utilising the undefined output values within the evolutionary algorithm. The use of flexible desired truth tables is shown to significantly improve the success of the algorithm in evolving these circuits. In addition, we show that this flexibility allows evolution to develop more hardware-efficient solutions than using a fully-defined truth table.

Keywords: Genetic Programming (GP), Evolutionary Algorithms, Cartesian Genetic Programming (CGP), Evolvable Hardware, “Don’t Care” Logic.

1 Introduction

The design of digital circuits using evolutionary algorithms has attracted interest [1, 2, 3, 14, 15]. In this paper the evolutionary design of digital combinational circuits is considered using the established technique of Cartesian Genetic Programming (CGP) [4]. However, for the first time as far as the authors are aware, this paper takes account of unspecified logic terms. These unspecified values are referred to as “don’t cares”, and often occur in the design of finite state machines and in logic synthesis for machine learning [5].

In CGP, genotypes are represented as lists of integers mapped to directed graphs, as opposed to the more typical tree mapping structure. This provides a general framework for solving a range of problems, which has been proven effective in multiple areas, including the evolution of combinational digital circuits. The evolution of digital circuits utilises a version of CGP where the behaviour of nodes is characterised by Boolean logic equations. A genotype is mapped to a phenotype by realisation of the digital circuit constructed from the nodes (and connections) encoded within the genotype. Since not all of the nodes will have connections that influence the outputs, either directly or indirectly, some of the nodes do not contribute to the resulting circuit. This introduces a level of neutrality to CGP, whereby multiple genotypes are mapped to the same phenotype and hence have equal fitness values.

In this paper extrinsic evolution is employed, whereby circuit phenotypes are evaluated in software. An assemble-and-test approach is used, where the phenotype circuit is constructed from its components and simulated. The binary truth table of the assembled circuit is then compared with the desired circuit truth table. The fitness function performs this comparison, with the fitness being the number of correct output bits in the table. Extrinsic evolution is accepted by many to be most suited to digital circuit evolution, as it has the advantage of providing symbolic solutions that can be implemented on a variety of devices. This method is used by Miller et al. in [1] and [3].

Limitations of this system arise when attempting to evolve a circuit for which there are outputs whose value is not specified for a given input pattern. Since an assemble-and-test strategy is being used, the entire truth table must be encoded and provided to the program at run-time to be available for the comparison tests. This requires each output value to be specified for all possible input combinations, and hence “don’t care” values must be assigned a value. Arbitrarily selecting a value for these situations restricts the evolution of the circuit by forcing the program to evolve solutions which satisfy the entire truth table, including those values which are unspecified in the real world. This investigation looks at the potential improvements that can be achieved with the use of “don’t care” logic in the desired truth table, by modifying the fitness function to allow this flexibility. Small test circuits are studied in this paper to provide a first investigation of the utility of “don’t care” values in the evolutionary design of digital circuits.

This paper is organised as follows. Section 2 details how digital circuits are encoded and evolved using CGP. Section 3 introduces example application areas of “don’t care” logic, and provides the test problems for use in evolution trials. Section 4 describes the changes to CGP that allow exploitation of undefined truth table values. Results of the changes on the evolution of the example circuits are given in Section 5, and conclusions are drawn in Section 6.

2 Digital Circuit Evolution

2.1 Genotype Encoding

The digital circuit encoding used in this paper has been developed and improved over a number of years by Miller et al., as seen in [1][3]. A digital circuit is considered as a specific case of the general acyclic directed graph model used in Cartesian Genetic Programming [4]. A graph in CGP is seen as a rectangular array of nodes, characterised by the number of columns, the number of rows, and levels-back. The number of nodes available to the algorithm is the product of the graph dimensions, i.e. the number of columns times the number of rows. The levels-back parameter specifies the maximum number of columns to the left of a node from which its inputs can originate. It also controls how many columns from the far right-hand side of the grid the outputs can be taken from. Nodes cannot be connected to nodes within the same column. The graph has a feed-forward structure, whereby a node may not receive inputs from nodes in columns to its right. Fig. 1 displays these values diagrammatically, showing an example of a 5 by 4 array with levels-back of 3, where node 21 receives inputs from nodes 10 and 12, both within 3 columns to its left.

[Figure 1: a 5 × 4 array of numbered nodes fed by 4 inputs; a dotted box spanning 3 columns to the left of node 21 marks its levels-back range]

Fig. 1. Visual representation of an example array of nodes as used in CGP. Example has 4 inputs, 5 columns, 4 rows, levels-back value of 3 (shown as dotted box relative to node 21).

Each individual node is described by its inputs, output and function. The output from each node, and the provided input data, is sequentially indexed from zero, as seen in Fig. 1. All nodes utilised in this paper require 2 inputs, and their single-output functions are described by the Boolean logic equations in Table 1. The allowed node functions were selected fairly arbitrarily; provided they are kept constant over all tests, this is sufficient for comparisons to be made. All possible functions for a node are independently indexed. This separate sequential integer indexing for outputs and functions allows a single node to be fully described by its output index and 3 integer values: input1, input2, function. The genotype encoding maps the 2-dimensional graph to a flat list of integers. Node output indexing is sequential within this list, beginning with the first integer index after the inputs. This removes the need to index each node within the genotype encoding, since the index is inherent in the node's location within the list. The outputs are specified at the end of the genotype as a list of integers specifying the node outputs to be used.

Table 1. Allowed node functions, subset of those used by Miller [1]

AND: $a \cdot b$ | OR: $a + b$ | XOR: $a \oplus b$ | $a \cdot \bar{b}$ | $a \oplus \bar{b}$

2.2 Fitness Evaluation

To calculate the fitness of a genotype, the evolved circuit’s outputs are compared with the desired outputs as specified in a truth table. To perform this comparison, the CGP program makes efficient use of the processor by carrying out comparisons on multiple lines of a truth table simultaneously. This technique was introduced by R. Poli [6], and considers a 32-bit processor as 32 individual 1-bit processors for simple logic functions. Since bit comparison can be achieved using simple logic functions (see Section 4.1), this technique can be exploited to carry out comparisons of up to 32 lines of a truth table in just a few single-cycle operations. The genotype fitness is then defined as the total number of correct output bits in the resulting phenotype. For this to be achieved, the desired truth table must be provided in a 32-bit representation within the configuration file which describes the intended system.

2.3 The Evolutionary Algorithm

A form of the (1 + λ)-ES evolutionary algorithm discussed by Bäck et al. [7] is used throughout this paper. This strategy has also been used by Miller et al. [1][3] and has been shown to produce good results. The algorithm implements neutral search, whereby if a parent and offspring have equal fitness, the offspring is always chosen in the interests of finding neutral solutions. Neutral search has been shown to be crucial to the efficiency of CGP [4][8]. The algorithm can be described by the following steps:

1. Randomly initialize a population of λ valid genotypes, adhering to the constraints discussed in Section 2.1.
2. Evaluate the fitness of each genotype.
3. Identify the fittest genotype, giving priority to offspring if parent and offspring have equal fitness. Copy the fittest genotype into the new population to become the new parent.
4. Produce (λ − 1) offspring to fill the population by creating mutated versions of the parent genotype.
5. Destroy the old population and return to step 2 using the new population, unless a perfect solution or the maximum number of generations has been reached.
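A minimal Python sketch of the packed comparison described in Section 2.2 (our illustration; the word-packing layout is an assumption rather than the authors' program) is:

def fitness_32bit(actual, desired):
    # Each 32-bit word packs one output column for 32 truth-table rows,
    # so a single XOR compares 32 rows at once.
    correct = 0
    for a, d in zip(actual, desired):
        mismatches = (a ^ d) & 0xFFFFFFFF   # 1 wherever the bits disagree
        correct += 32 - bin(mismatches).count("1")
    return correct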

3 Problem Space

This investigation into the use of “don’t care” logic will be tested by attempting to evolve circuits for three problem areas.

3.1 Quotient and Remainder Hardware Divider

Division in microprocessors is most often performed by algorithms such as “shift and subtract” [9] or SRT (Sweeney, Robertson, and Tocher). Faster algorithms can also be used, such as Newton–Raphson and Goldschmidt, both of which are implemented in some AMD processors [10]. This paper, however, looks at developing a simple divider implemented entirely in hardware by standard logic gates. This circuit is selected as it demonstrates a clearly apparent and understandable source of undefined outputs, since calculations involving a division by zero are mathematically undefined. The divider takes the form of a quotient and remainder divider, with a single status output for the divide-by-zero (DIV/0) error. For 2 inputs A and B, where B is non-zero, this circuit will compute outputs Q and R to satisfy the following equation:

$A = B \cdot Q + R, \quad R < B.$    (1)

For the case where B is equal to zero the solution is undefined and the status output D goes active (defined as ‘1’ for this case). At this point all of the bits in both output buses Q and R are undefined. As an initial investigation into the potential performance gains of utilizing “don’t care” logic, and in order to keep the complexity of the tests low, this paper only considers a 2-bit divider. The 2-bit divider has 4 single-bit inputs (A1, A0, B1, B0) and 5 single-bit outputs (Q1, Q0, R1, R0, D). The efficiency of evolution making use of “don’t care” logic will be compared against using fully-defined logic.

3.2 Finite State Machine Logic

“Don’t care” states often arise when designing next state and output logic for a finite state machine (FSM). Each state in the FSM must be assigned a binary value, and hence if the number of states is not an exact power of two there will be unused binary values. These unused values will result in entire “don’t care” rows in the truth table. The FSM used in this paper is of a Mealy structure, where the output(s) depend on both the current state and the current input pattern. The logic to be designed is required to produce both the next state value and the output. The design for the FSM was chosen from the benchmarks for the 1991 International Workshop on Logic Synthesis, referred to as the LGSynth91 benchmarks [11]. The selected FSM benchmark dk27 has 7 states, 1 input and 2 outputs. The state assignment is therefore 3-bit, and one value is unused (chosen as 000). To keep the complexity low, the 2 outputs in the dk27 circuit were flattened into a single-bit output. With the single-bit input and 3-bit state assignment this results in a circuit with 4 inputs and 4 outputs, and two rows of “don’t care” values.

3.3 Distributed Don’t Cares

The previous test cases both result in clusters of “don’t care” values, where all or most of a row is undefined for specific input patterns. In order to ensure the experimental results are reflective of a range of circuits, this test case comprises a truth table designed under the constraint of a maximum of one “don’t care” value per truth table row. The circuit was chosen to have 4 inputs and 4 outputs to match the finite state machine. The outputs were randomly generated, with ones and zeros having equal probability. The “don’t care” states were also generated randomly, with the probability of a “don’t care” within any row being 50%, and equal probabilities for each output. The resulting truth table is shown in Table 2.

Table 2. Truth table for the distributed “don’t care” circuit, showing a maximum of one undefined output per row

A B C D | W X Y Z
0 0 0 0 | 1 X 1 1
0 0 0 1 | 0 0 0 1
0 0 1 0 | 1 1 X 1
0 0 1 1 | 1 X 1 1
0 1 0 0 | 1 1 1 0
0 1 0 1 | X 0 1 0
0 1 1 0 | X 0 0 1
0 1 1 1 | 0 X 0 1
1 0 0 0 | 0 1 0 X
1 0 0 1 | X 1 1 0
1 0 1 0 | 1 0 0 1
1 0 1 1 | 1 1 1 0
1 1 0 0 | 0 0 1 0
1 1 0 1 | X 0 1 1
1 1 1 0 | 1 0 X 1
1 1 1 1 | 1 0 0 0

4 Implementation of Don’t Care Flexibility

4.1 Simple Don’t Care Bitmask

In order to implement “don’t care” logic, it was necessary to add a method of describing undefined states. To maintain efficient fitness evaluation, no changes were made to the 32-bit truth table representation. Instead, an additional section was added describing a 32-bit bitmask for each value in the table. In this bitmask, a value of ‘1’ indicates the truth table value is valid and fixed, and a value of ‘0’ indicates flexibility (an undefined value). Before the comparison between the actual and desired truth table values is carried out, both undergo a logical AND operation with the bitmask. This process ensures that all undefined states appear as ‘0’ in both the actual and desired truth tables, and hence match. This method allows for minimal changes to the fitness evaluation code, and thus minimises extra computational time. The fitness comparison for a single value thus changes from that in equation (2) to that in equation (3), where A is the actual output from the phenotype under evaluation, D is the desired output, and b the bitmask:

$f = \overline{A \oplus D}$    (2)

$f = \overline{(A \wedge b) \oplus (D \wedge b)}$    (3)

Extra efficiency can be gained if it is ruled that all undefined values are assigned the value of ‘0’ in the desired truth table; the logical AND with the bitmask is then not required for the desired truth table, resulting in equation (4). This comparison requires only one additional logical operation over the original, and hence should not slow the fitness evaluation by more than one clock cycle per 32-bit comparison:

$f = \overline{(A \wedge b) \oplus D}$    (4)

4.2 Extended Don’t Care Method

The simple “don’t care” method allows evolution the flexibility of exploiting all of the undefined states. The concept can, however, be extended further to allow evolution even more control over exactly how to utilise the undefined outputs. This is achieved by appending additional genes to the chromosome, describing how to interpret each of the available undefined outputs. A simple binary gene representing whether or not to use each “don’t care” was first considered; however, this method would restrict evolution to the values encoded in the configuration file truth table. The extended version instead uses genes with 3 possible values: 0, 1, or 2. A value of zero or one specifies that the desired output should be interpreted as a ‘0’ or ‘1’ respectively; this effectively removes the “don’t care” from the desired truth table and replaces it with a zero or one. A value of two indicates that the desired output should be considered a “don’t care” state, and treated as in the simple method. The fitness evaluation is then the same as for the simple method; however, the desired truth table row and “don’t care” bitmask must be constructed for each evaluation using the “don’t care” genes in the current chromosome.
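Both strategies can be sketched in Python (our illustration; the word-packing layout and helper names are assumptions, not the authors' code). The first function applies the masked comparison of Eq. (4); the second rebuilds the desired table and bitmask from the extended "don't care" genes of Section 4.2 before that comparison is made:

def fitness_dont_care(actual, desired, bitmask):
    # Eq. 4: undefined outputs are '0' in `desired` and '0' in `bitmask`,
    # so masked-out bits always compare as correct.
    correct = 0
    for a, d, b in zip(actual, desired, bitmask):
        mismatches = ((a & b) ^ d) & 0xFFFFFFFF
        correct += 32 - bin(mismatches).count("1")
    return correct

def build_target(desired, bitmask, dc_positions, dc_genes):
    # Extended method: each undefined output owns a gene in {0, 1, 2};
    # `dc_positions` lists the (word, bit) location of every undefined output.
    d, b = list(desired), list(bitmask)
    for (w, bit), gene in zip(dc_positions, dc_genes):
        if gene == 2:
            continue                  # leave it as a don't care
        b[w] |= 1 << bit              # the bit becomes fixed...
        if gene == 1:
            d[w] |= 1 << bit          # ...at '1' (gene 0 keeps it at '0')
    return d, b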

5 Evolved Data

5.1 Test Structure and Parameters

The size of the node array was not kept constant for each test case, since the differing complexities require different array sizes. However, the maximum number of generations was fixed at 100,000 for all tests. For each test circuit the mutation rate was varied, with 100 runs executed at each mutation rate using: the fully defined truth table, the truth table with “don’t care” bitmask using the simple strategy, and the truth table with “don’t care” bitmask using the extended strategy.

5.2 Success of Evolving 2-bit Hardware Divider

The following parameters were used for the evolution of the 2-bit hardware divider detailed in Section 3.1: the number of rows and the number of columns were both 4, as was levels-back. The resulting genotype contains 53 genes, and therefore the minimum mutation rate at which mutations occur is 2% (1 gene per generation). The mutation rate was increased from 2% in steps of 2.0% until all runs failed to reach a perfect solution. At each mutation rate, 100 runs were executed using the fully defined truth table and each strategy for the incompletely defined truth table. Fig. 2 clearly shows the improved performance of evolution using the flexible truth table compared with the fully defined truth table. It also demonstrates the superior performance of the simple strategy compared to the extended version for this circuit.

5.3 Success of Evolving FSM Next State Logic

The FSM next state and output logic is detailed in Section 3.2. The CGP grid was 6×6 with levels-back equal to 6. The resulting genotype contains 112 genes, and hence the minimum mutation rate is 1%. The mutation rate was increased in steps of 1.0% until all runs failed to reach a perfect solution. Once again, for each mutation rate, 100 runs were executed using the fully defined truth table and each strategy for the incompletely defined truth table.

The results are displayed in Fig. 3, which also shows the improved performance of evolution using the flexible truth table compared with the fully defined truth table. Once again, the simple “don’t care” strategy outperforms the extended version for this circuit.

[Figure 2: percentage of runs achieving perfect solutions (out of 100) versus mutation rate (0–20%) for the fully-defined, simple “don’t care” and extended “don’t care” truth tables]

Fig. 2. Graph of the number of perfect solutions reached (out of 100 runs) by using standard and “don’t care” truth tables for the 2-bit hardware divider

[Figure 3: percentage of runs achieving perfect solutions (out of 100) versus mutation rate (0–9%) for the three truth-table strategies]

Fig. 3. Graph of the number of perfect solutions reached (out of 100 runs) by using standard and “don’t care” truth tables for the FSM next state and output logic

[Figure 4: percentage of runs achieving perfect solutions (out of 100) versus mutation rate (0–9%) for the three truth-table strategies]

Fig. 4. Graph of the number of perfect solutions reached (out of 100 runs) by using standard and “don’t care” truth tables for the distributed “don’t care” circuit

5.4 Success of Evolving Distributed Don’t Cares Circuit

Since this circuit was designed to mimic the complexity of the FSM logic, the same experimental parameters were used. The mutation rate was also varied from 1% upwards in steps of 1.0%. The results are displayed in Fig. 4, which once again supports the previous results of improved performance using the flexible truth table compared with the fully defined truth table. The simple “don’t care” strategy also outperforms the extended version for this circuit.

5.5 Efficiency of Evolved Circuits

Whilst it is advantageous to consider the computational benefits of the flexible truth table, perhaps more exciting is to consider the hardware efficiency of the evolved solutions. To enable evolution to continue beyond the initial perfect solution and attempt to reduce hardware requirements, the genotype fitness for perfect circuits must be modified. A simple modification defines the fitness for perfect genotypes as the maximum fitness plus the number of redundant nodes (nodes which do not contribute to the outputs). This causes the algorithm to continue executing until the maximum number of generations is reached, attempting to reduce the number of active nodes. This algorithm was executed on each test case with the parameters and varying mutation rates given in the previous sections. The array size was also varied, up to a maximum of 100 available nodes.
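A sketch of this modified fitness (our illustration; the chromosome layout of Section 2.1 — integer triples followed by the output genes — is assumed) is:

def count_active_nodes(chrom, ni, no, arity=2):
    # Walk back from the output genes, marking every reachable node.
    n_node_genes = len(chrom) - no
    active, stack = set(), [g for g in chrom[n_node_genes:] if g >= ni]
    while stack:
        idx = stack.pop()
        if idx in active:
            continue
        active.add(idx)
        base = (idx - ni) * (arity + 1)
        stack.extend(g for g in chrom[base:base + arity] if g >= ni)
    return len(active)

def fitness_with_size(correct_bits, max_fitness, n_nodes, chrom, ni, no):
    # Section 5.5: perfect circuits earn a bonus for every redundant node,
    # pushing evolution to shrink the active circuit.
    if correct_bits < max_fitness:
        return correct_bits
    return max_fitness + (n_nodes - count_active_nodes(chrom, ni, no))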

Conventional methods such as the Karnaugh map allow minimised Boolean equations to be obtained from a desired truth table (see [13] for a good explanation). Karnaugh maps cannot utilise the XOR operator, and as such the circuits evolved in the previous sections are expected to require fewer gates regardless of the “don’t care” modifications. However, with this in mind, the Karnaugh map can still be used to establish a benchmark for the hardware requirements of the test circuits. A Karnaugh map was constructed for each output of each circuit, and the sum-of-products Boolean equations obtained. Considering only the use of 2-input gates, the number of gates required to synthesise each circuit is shown in Table 3.

Table 3. Number of 2-input gates required to synthesise the test circuits from Karnaugh-map-minimised sum-of-products

Circuit | Number of 2-input gates required
2-bit Divider | 16
FSM Logic | 41
Distributed Don’t Cares | 35

Hardware divider: The most hardware-efficient solution for the divider was found to require 8 gates, a hardware saving of 50% compared with that found by conventional methods in Table 3. This solution used the simple “don’t care” strategy. Without the “don’t care” modification, the most efficient solution required 10 gates, so a hardware saving of 20% was achieved over standard CGP.

Finite State Machine Logic: The most hardware-efficient design for the finite state machine next state and output logic required 14 gates. This solution was also found using the simple “don’t care” strategy, and gives a hardware saving of 26% over the most efficient solution without truth table flexibility, which required 19 gates.

Distributed Don’t Cares: Once again the simple strategy outperformed the extended version for finding efficient solutions, with the fewest gates required being 15. Without any truth table flexibility a solution requiring 18 gates was achieved, giving a hardware saving of 17% for the “don’t care” strategy.

Clearly, the extended strategy for “don’t care” utilisation does not offer any benefits over the simple version for finding efficient circuits. The use of flexible truth tables does, however, have a clear advantage over standard CGP, resulting in at least a 17% reduction in hardware for all three test circuits.

6 Conclusion

The motivation behind introducing flexibility in the desired truth table has been discussed in this paper, and a method for implementing this technique using a “don’t care” bitmask has been shown. Two strategies have been introduced for making use of the available undefined states, although the simple strategy outperformed the extended version for all test circuits presented. Using three circuits with incompletely defined truth tables, the use of unfixed output values has been demonstrated to increase the performance of CGP, as well as to produce more hardware-efficient designs.

Allowing “don’t care” logic in the truth table can be thought of as increasing the potential number of perfect truth tables, and hence perfect phenotypes. Since CGP already has a many-to-one genotype–phenotype mapping, increasing the number of perfect phenotypes significantly increases the number of perfect-fitness genotypes. This greatly increases the level of neutrality in the search space, and therefore agrees with the findings of Miller et al. [4][8].

The chosen test circuits were kept small in order to keep complexity and required processing time low. Now that the technique has been demonstrated, it could be extended to larger circuits. Note that every additional “don’t care” state in a binary truth table doubles the total number of truth tables which satisfy the requirement; for instance, a specification with 10 undefined bits admits 2^10 = 1024 distinct fully-defined truth tables. This implies that with larger circuits, and possibly higher numbers of “don’t cares”, the potential benefits of this technique could be even greater.

References

1. Miller, J.F., Job, D., Vassilev, V.K.: Principles in the Evolutionary Design of Digital Circuits – Part I. Journal of Genetic Programming and Evolvable Machines 1, 8–35 (2000)
2. Perez, E.I., Coello, C.C.: Extracting and re-using design patterns from genetic algorithms using case-based reasoning. Engineering Optimization 35(2), 121–141 (2003)
3. Miller, J.F., Thomson, P., Fogarty, T.: Designing Electronic Circuits Using Evolutionary Algorithms. Arithmetic Circuits: A Case Study. In: Quagliarella, D., Periaux, J., Poloni, C., Winter, G. (eds.) Genetic Algorithms and Evolution Strategies in Engineering and Computer Science, pp. 105–131. Wiley, Chichester (1997)
4. Miller, J.F., Thomson, P.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
5. Perkowski, M., Foote, D., Chen, Q., Al-Rabadi, A., Jozwiak, L.: Learning hardware using multiple-valued logic – Part 1: introduction and approach. IEEE Micro 22(3), 41–51 (2002)
6. Poli, R.: Sub-machine-code GP: New results and extensions. In: Langdon, W.B., Fogarty, T.C., Nordin, P., Poli, R. (eds.) EuroGP 1999. LNCS, vol. 1598, pp. 65–82. Springer, Heidelberg (1999)
7. Bäck, T., Hoffmeister, F., Schwefel, H.P.: A survey of evolution strategies. In: Belew, R., Booker, L. (eds.) Proceedings of the 4th International Conference on Genetic Algorithms, pp. 2–9. Morgan Kaufmann, San Francisco (1991)
8. Miller, J.F., Smith, S.L.: Redundancy and Computational Efficiency in Cartesian Genetic Programming. IEEE Trans. on Evolutionary Computation 10, 167–174 (2006)
9. Shaw, R.F.: Arithmetic Operations in a Binary Computer. The Review of Scientific Instruments 21(8) (1950)
10. Oberman, S.F.: Floating Point Division and Square Root Algorithms and Implementation in the AMD-K7 Microprocessor. In: Proc. IEEE Symposium on Computer Arithmetic, pp. 106–115 (1999)
11. Yang, S.: Logic synthesis and optimisation benchmark user guide version 3. MCNC (1991)
12. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
13. Holder, M.E.: A modified Karnaugh map technique. IEEE Transactions on Education 48(1), 206–207 (2005)
14. Sekanina, L.: Evolutionary Design of Digital Circuits: Where Are Current Limits? In: Proceedings of the First NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2006), pp. 171–178. IEEE CS, Los Alamitos (2006)
15. Stomeo, E., Kalganova, T., Lambert, C.: Generalized Disjunction Decomposition for Evolvable Hardware. IEEE Trans. Syst., Man, and Cyb. Part B 36(5), 1024–1043 (2006)

Evolving Digital Circuits Using Complex Building Blocks

Paul Bremner¹, Mohammad Samie¹, Gabriel Dragffy¹, Tony Pipe¹, James Alfred Walker², and Andy M. Tyrrell²

¹ Bristol Robotics Laboratory, University of the West of England, Bristol, BS16 1QY
² Intelligent Systems Group, Department of Electronics, University of York, Heslington, York, YO10 5DD

Abstract. This work is a study of the viability of using complex building blocks (termed molecules) within the evolutionary computation paradigm of CGP, extending it to MolCGP. Increasing the complexity of the building blocks increases the design space that is to be explored to find a solution; thus, experiments were undertaken to find out whether this change affects the optimum parameter settings required. It was observed that the same degree of neutrality and the same (greedy) 1+4 evolution strategy gave optimum performance. The Computational Effort used to solve a series of benchmark problems was calculated, and compared with that used for the standard implementation of CGP. Significantly less Computational Effort was exerted by MolCGP in 3 out of the 4 benchmark problems tested. Additionally, one of the evolved solutions to the 2-bit multiplier problem was examined, and it was observed that functionality present in the molecules was exploited by evolution in a way that would be highly unlikely when using standard design techniques.

1 Introduction

A proposed approach to tackling the issue of fault-tolerance, and hence reliability issues for digital systems, is a bio-inspired prokaryotic cell array. Under this paradigm a circuit is made up of interconnected, identical cells that are configured, using bit strings of genes, to fulfill the necessary routing and functional properties to make up a digital system [1]. The cells in our proposed array have been designed with a great deal of functionality. A consequence of this is that it is a complex task to specify genes to fully exploit the functionality of the cells, when implementing digital systems using standard digital design techniques. An alternative to a deterministic technique of gene specification, Genetic Programming, has therefore been investigated. Genetic Programming (GP) has been shown to be capable of producing novel [2][3] and compact [4] solutions to digital design problems; often these result in circuits that are unlikely to be conceived using standard design techniques. It therefore seems possible that some form of GP might be used to produce circuits, using the proposed cells, that would exploit their functionality in ways that a deterministic technique might not. Cartesian Genetic Programming, developed


by Miller and Thomson [5], is a method that could be readily adapted to allow the use of cells within the evolution process. In our case, the standard 2-input logic gates that are normally used as nodes in CGP for digital circuit evolution will be replaced by a cut-down version of the proposed cells. The proposed cells are able to process 8 inputs to perform a variety of routing and functional roles. The routing functionality of the cells is not suitable for inclusion within the framework of CGP, so a cut-down version dubbed a molecule will be used; hence the name of the proposed technique, Molecular Cartesian Genetic Programming (MolCGP). However, a key issue with taking this approach is the efficiency with which a solution might be found. CGP has been shown to produce solutions to a number of benchmark problems with a useful degree of efficiency; by increasing the complexity of the nodes, the amount of design space that must be explored similarly increases. As a consequence, in addition to the efficacy with which MolCGP exploits the functionality of the molecules, the efficiency with which it is able to solve benchmark digital problems will be investigated.

2 Related Works

It has been shown that Evolutionary Algorithms can be used to successfully evolve digital circuits [2][3]. Miller and Thomson proposed Cartesian Genetic Programming as one such method of approaching this problem. It differs from the original Genetic Programming (GP) technique proposed by Koza [6] in that a program is represented as an acyclic directed graph rather than a tree-like structure, each node in the graph representing a digital function. They demonstrated the ability to evolve novel solutions to 2-bit and 3-bit multipliers, and the even 4-bit parity problem. Although the technique affords exploration of the design space in ways that produce solutions beyond the remit of traditional design techniques, the building blocks of those solutions are restricted to the range of functions that are defined for the nodes. The work presented here proposes to extend CGP by using nodes with more functional flexibility. Other work has also sought to expand on the capabilities of CGP. Sekanina proposed the use of unconventional functions in the form of polymorphic logic gates [7]. He found that it was possible to evolve multi-functional digital circuits using a combination of standard and polymorphic logic gates as node functions. Thus, by expanding the range of available node functions, a wider range of design space can be successfully explored. Walker and Miller looked to improve the capabilities of CGP by allowing the automatic creation of multi-node modules, which could be inserted into the graph in place of a standard node during mutation [8]. Thus, the function set could be increased by adding other useful functions beyond the base primitives; this facilitated more efficient evolution, particularly in the case of modular designs. Haddow et al. have also attempted to find a method to produce the configuration bits for a Look Up Table (LUT) based cell array [9]. However, their technique is totally different from that presented here: they use a set of evolved growth


rules to propagate configuration bits to the array rather than evolving the configuration bits directly. Thus, they have shifted the complexity from the genotype to the method of conversion from genotype to phenotype. This produces some interesting results, but they have not attempted to use their technique to evolve digital circuits.

3 Description of Molecule Functionality

A potential design paradigm for fault-tolerant digital systems is the bio-inspired prokaryotic array. This sort of system is made up of an array of identical cells, capable of being configured to perform a variety of functions [1]. The functional part of the cells, for the array that we are currently developing, is made up of two functional units. Each unit can be configured to operate independently, or cooperatively, to process 8 input lines, realising a variety of processing and routing functions on the data. The configuration of these cells is carried out using a bit string of genes. In order to constrain the design space of the system to a degree whereby a GP method can operate with some efficiency, the routing and cooperative functionality of the cells has been ignored. Hence, each cell has been broken down into two molecules, each of which is a cut-down version of a functional unit. Similarly, a segment of the complete gene string is used to control the functionality of the molecule. The cell and gene string have been decomposed in such a way that cells (and their requisite genes) could be reconstructed from the evolved molecules. Each molecule has four inputs and two outputs. The function realised at the primary output (PO) is driven by an 8-bit-wide LUT, so it can produce any arbitrary three-input boolean function; its inputs (PI1-3) are in1, in2 and either in3, in4, in1 or logic 0 (these last two result in a 2-input function). The secondary output (SO) is primarily for use as a carry output when the molecule is configured as a full or half adder; otherwise, it either routes PI3, or produces in1.in2 + PI3.(in1 ⊕ in2). Although the functionality of the secondary output is relatively limited, it is allowed as a valid connection: it is a fixed part of the cell design, and any functionality available should be allowed to be exploited by the evolution. A schematic of the molecule is shown in Fig. 1. The functionality of the molecule is controlled by an 11-bit-long binary string, the first 8 bits of which constitute the LUT; the other 3 bits control which value is passed as PI3, and the function executed by SO. When PI3 is selected in such a way that results in a 2-input function, only half of the LUT bits are used; which bits of the LUT are used is determined by whether in1 or logic 0 is selected.
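A minimal sketch of how such an 11-bit gene string could be decoded and evaluated is given below (Python). The bit ordering, the coding of the control genes, and the folding of the adder carry mode into the secondary output are all assumptions made for illustration:

def eval_molecule(gene, in1, in2, in3, in4):
    """gene is an 11-character bit string: bits 0-7 form the LUT,
    bits 8-10 are the control genes C1-C3 (coding assumed here)."""
    lut = gene[:8]
    c1, c2, c3 = (int(b) for b in gene[8:])
    # C1/C2 select the third primary input PI3; choosing in1 or
    # logic 0 collapses the LUT to a 2-input function.
    pi3 = (in3, in4, in1, 0)[c1 * 2 + c2]
    # Primary output: the LUT addressed by (in1, in2, PI3).
    po = int(lut[in1 * 4 + in2 * 2 + pi3])
    # Secondary output: either route PI3, or the carry-style function
    # in1.in2 + PI3.(in1 XOR in2), selected by C3.
    so = pi3 if c3 == 0 else (in1 & in2) | (pi3 & (in1 ^ in2))
    return po, so

print(eval_molecule("11101000101", in1=1, in2=0, in3=1, in4=0))  # -> (0, 1)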

4 CGP and Its Extension to MolCGP

Miller and Thomson developed Cartesian Genetic Programming in order to facilitate the evolution of digital circuits [5]. Molecular Cartesian Genetic Programming (MolCGP) is an extension of CGP using more complex nodes than in

the original implementation. In CGP a digital circuit is represented as a directed graph, each node representing some form of digital processing element. It is described as Cartesian because the nodes are laid out in a grid, so the Cartesian coordinates of a node are used to identify the connections of the edges of the graph. A benefit of this type of representation is that the outputs of a given node can be connected to any other, allowing implicit reuse of the processing performed (Fig. 2). CGP has been shown to be most efficient when only a single column (or row) of nodes is used, rather than a grid of nodes as suggested in the original implementation [10]; this single-dimension approach is followed here. Additionally, the graph is acyclic as all the functions to be evolved are feed-forward, combinational logic. Thus, a node may only have input connections from preceding nodes in the graph and program inputs; the outputs are further restricted in that they may not be connected (directly) to program inputs.

Fig. 1. A schematic of the molecule. LUT1-8 comprise the LUT part of the gene string; C1-3 are the control genes that define the remaining functionality.

Fig. 2. Acyclic Directed Graph, 3 nodes each with 2 inputs and 1 output, 2 program inputs (A,B) and one program output (C)

The genotype in MolCGP, as in CGP, is made up of a number of sets of integers, one set for each node in the graph. The genotype length is fixed: a specified number of nodes is defined for every member of the population. However, the genotype-phenotype mapping is such that each node need not necessarily contribute to the value produced at any of the outputs. Thus, although the genotype is bounded, the phenotype is of variable length. The unconnected nodes represent redundant genetic information that may be expressed if a mutation results in their inclusion in an input-to-output path. Therefore a single point mutation on the genotype can have a dramatic effect on the phenotype; an example of this is shown in Fig. 3. In order to ensure these neutral mutations


influence the evolution, new population members that have the same fitness as the parent are deemed fitter than the parent. This phenomenon is often referred to as neutrality, as mutations in the redundant sections of the genome have no effect on the fitness of the individual; it has been shown to be beneficial to the operation of CGP [10][5][11]. A degree of redundancy as high as 95% is suggested by Miller and Smith [11] as providing the optimum increase in performance. The optimum number of nodes for producing similar levels of improvement in efficiency in MolCGP is investigated in section 5.
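To make the genotype-phenotype distinction concrete, the set of expressed nodes can be found with a backward trace from the program outputs. The sketch below assumes a genome laid out as a list of per-node records, each holding four (source, output) connection pairs, consistent with the encoding described below:

def active_nodes(genome, outputs, num_inputs):
    """Trace back from the program outputs to find which nodes are
    expressed in the phenotype; every other node is neutral, so
    mutating it cannot change the fitness."""
    active, stack = set(), [src for src, _ in outputs]
    while stack:
        src = stack.pop()
        if src >= num_inputs and src not in active:
            active.add(src)
            # Each node input is a (source, output-index) connection pair.
            stack.extend(s for s, _ in genome[src - num_inputs]["inputs"])
    return active

# Two program inputs (indices 0-1), three nodes (indices 2-4), one program
# output taken from node 4; node 3 turns out to be neutral.
genome = [{"inputs": [(0, 0), (1, 0), (0, 0), (1, 0)]},
          {"inputs": [(2, 0), (2, 1), (0, 0), (1, 0)]},
          {"inputs": [(2, 0), (0, 0), (1, 0), (1, 0)]}]
print(active_nodes(genome, outputs=[(4, 0)], num_inputs=2))  # -> {2, 4}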

Fig. 3. A point mutation changes which node output C is connected to. Redundant nodes are indicated by dotted lines.

In CGP, each node consists of one number representing the function of the node, and the remaining numbers defining the sources of the input connections, the number of these connections is dependent on the arity of the function that the node implements; in MolCGP there are always 4 inputs. In CGP the function of each node is represented by an integer, allowing functions to be drawn from a predefined list; in [5] this list consisted of a range of primitive boolean functions as well as some MUX nodes to allow inversion of various inputs. MolCGP is an extension of CGP in that the functionality of a node is defined by a bit string, which allows generation of arbitrary 3 input logic functions from the primary output, and the full range of possible functionality from the secondary output, through mutation of this bit string; thus the nodes in MolCGP are significantly more flexible in the functions they are able to implement. Additionally, nodes in CGP only have one output, nodes in MolCGP have 2 outputs, hence each connection gene is, instead, a pair of numbers, defining the node connected to, and the output of that node (in a similar manner to the genotype used in ECGP and MCGP proposed by Walker and Miller [8]). Typically, CGP uses only mutation as a genetic operator, and that concept is followed in MolCGP. A number of nodes are mutated for each generation, defined by a mutation rate that is a percentage of the number of nodes specified for the current population; a mutation rate of 3% was found to give good performance, and is used throughout the work presented here. For each node mutated, either the function, or one of the connection gene pairs, may be mutated. If the function is mutated, a random number of bits in the function gene-string are flipped. If a connection is mutated, a new random, valid, connection pair is generated.
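A sketch of this mutation operator, using the same assumed genome layout as in the previous sketch, is given below (the even split between function and connection mutation is an assumption; the text does not specify the ratio):

import random

def mutate(genome, num_inputs, rate=0.03):
    """Point-mutate roughly rate * len(genome) nodes: for each, either
    flip a random number of bits in the 11-bit function string, or
    redraw one of the four (source, output) connection pairs."""
    for _ in range(max(1, int(rate * len(genome)))):
        idx = random.randrange(len(genome))
        node = genome[idx]
        if random.random() < 0.5:
            bits = list(node["function"])          # e.g. "11101000101"
            for _ in range(random.randint(1, len(bits))):
                p = random.randrange(len(bits))
                bits[p] = "0" if bits[p] == "1" else "1"
            node["function"] = "".join(bits)
        else:
            # A valid source is any program input or any preceding node;
            # the second element selects that node's PO (0) or SO (1).
            conn = random.randrange(4)
            node["inputs"][conn] = (random.randrange(num_inputs + idx),
                                    random.randrange(2))
    return genome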


The fitness function used is simply the total Hamming distance between the bit strings produced at the two outputs when the truth table of the given problem is evaluated, and the bit strings specified by that truth table. Thus a lower fitness score is better, and evolution is stopped when a fitness of zero is reached.
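A compact sketch of this evaluation, assuming each output column of the truth table is packed bitwise into an integer:

def fitness(candidate_outputs, target_outputs, num_rows):
    """Total Hamming distance over all output columns of the truth
    table; zero means a perfect circuit, so lower is better."""
    mask = (1 << num_rows) - 1
    return sum(bin((got ^ want) & mask).count("1")
               for got, want in zip(candidate_outputs, target_outputs))

# Two outputs over a 4-row truth table; one bit of the second is wrong.
print(fitness([0b1010, 0b0110], [0b1010, 0b0111], num_rows=4))  # -> 1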

5 Evolution Strategy and Population Size Experiments

As a consequence of the increased complexity of the nodes, the design space to be explored is a great deal larger, and mutations on the node functions are likely to have a greater effect on the fitness of an individual. It therefore seems prudent to investigate whether the evolution strategy (1+4, i.e., each new population consists of the best individual from the previous generation and 4 offspring produced by mutating it) and the genome redundancy used by [5] are appropriate for MolCGP. In order to investigate this, a 2-bit multiplier was chosen as a sample program to be evolved. It is sufficiently complex that the effects of parameter changes should be observable, while not being so complex that solutions take a very long time to evolve. As a measure of efficiency, to allow direct comparison between the different parameter settings, Individuals Processed to find a Solution (IPS) is used. IPS is calculated using equation (1), where M is the number of individuals in a population, and i is the median number of generations to find a solution. IPS can be seen to have some similarities to Computational Effort (CE) proposed by Koza [6]; it is used instead due to the inaccuracies of CE for low run, high generation experiments [12]. Each parameter set is used for 50 independent runs, and a Box & Whisker plot is generated for analysis.

IPS = M × i    (1)

For example, a 1+4 strategy (M = 5) with a median of 2,000 generations gives an IPS of 10,000.

To test evolution strategies (λ + x) the number of nodes is set at 50. To test genotype lengths the evolution strategy is set as 1+4. In both cases the evolution is always run until success.

5.1 Discussion

It is clear from Fig. 4 that increasing the number of nodes, and therefore the redundancy, has (as in standard CGP) a beneficial effect on the efficiency of evolution. Pairwise Mann-Whitney U-tests and Kolmogorov-Smirnov tests were carried out on the data, and showed that the observed improvement in efficiency is largely not significant (at the 5% level) beyond 20 nodes. This is contrary to the findings in [11], where, as node numbers were increased, so too did the efficiency of evolution (for all values tested). A potential reason for this is that far fewer nodes than in standard CGP are required for the 95% neutrality suggested by Miller; the precise degree of neutrality present is non-trivial to calculate, given the implicit neutrality in nodes, i.e., nodes expressed in the phenotype that do not actually contribute to the functionality. To try to approximate the neutrality in the genome (explicit neutrality) the number of nodes was severely restricted

and evolution run until success; a solution can be found with as few as 5 nodes, implying that with 20 nodes there is at least 75% neutrality. In addition, there is a trade-off to be made between the apparent improvement in IPS and the complexity of the individual: individuals in populations with more nodes are likely to have larger phenotypes than those with fewer nodes [11], and thus tend to take longer to process. Therefore, the overall processing time is not necessarily improved by an increased number of nodes. Consequently, selecting the correct number of nodes for a given problem appears critical to shorter evolution times. It is clear from Fig. 5 that a 1+4 strategy gives, as suggested for standard CGP, maximum efficiency. Pairwise Mann-Whitney U-tests and Kolmogorov-Smirnov tests were carried out on the data, and showed that the observed improvement in efficiency is only significant (at the 5% level) between the smaller and larger population sizes. However, although the observed improvements in efficiency between the low population size experiments are not significant, the Box & Whisker plot shows that the variance increases as the strategy deviates from 1+4; thus, a 1+4 evolutionary strategy will give consistently more efficient evolution.

Fig. 4. Box & Whisker Plot of Variable Numbers of Nodes in Each Individual. 50 Independent Runs Performed for Each Value.

Fig. 5. Box & Whisker Plot of Variable Evolution Strategy. 50 Independent Runs Performed for Each Value.

6 Applying MolCGP to Benchmark Problems

In order to test the efficacy of MolCGP, 4 benchmark problems that are commonly used to test new techniques have been attempted: the 4- and 8-bit even parity problems, and the 2- and 3-bit multipliers [13]. Despite the inaccuracies of CE as a measure for MolCGP, it is used as a standard measure for many GP approaches; therefore, to allow comparison of performance on benchmark problems, in particular with standard CGP, it is used in this section. Walker et al. state that CE is a point statistic and that, in order to improve the validity of comparisons with other techniques (mitigating the inaccuracies of CE for our high generation, low run approach), a confidence interval (CI) should be calculated [13]. These intervals are calculated here using the methodology presented in [13], and these values are included in Table 1, along with the values for standard CGP taken from [12] and used for comparison. In all cases, 50 independent runs were conducted, with an evolution strategy of 1 + 4, a mutation rate of 3%, and a genotype length of 100 nodes.
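For reference, CE follows Koza's formulation; a sketch of the basic calculation is given below (Python; the generation-indexing convention and z = 0.99 follow common usage, and the confidence-interval correction of [13] is not shown):

import math

def computational_effort(success_gens, M, z=0.99):
    """success_gens: for each run, the generation at which it succeeded,
    or None if it never did. Returns Koza's CE: the minimum over i of
    M * (i + 1) * R(z), where R(z) is the number of runs needed to see
    at least one success by generation i with probability z."""
    runs = len(success_gens)
    best = math.inf
    for i in range(max(g for g in success_gens if g is not None) + 1):
        p = sum(1 for g in success_gens if g is not None and g <= i) / runs
        if p > 0:
            r = 1 if p >= 1 else math.ceil(math.log(1 - z) / math.log(1 - p))
            best = min(best, M * (i + 1) * r)
    return best

print(computational_effort([5, 12, None, 7], M=5))  # -> 260

The inaccuracy noted above stems from the success probability being estimated from a small number of runs, which is why the CI bounds in Table 1 matter.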

6.1 Discussion

Looking at the multiplier problems, it can be seen that CGP outperforms MolCGP by a factor of approximately two for the simpler 2-bit multiplier problem, but the reverse is true for the 3-bit multiplier, where a five-fold decrease in CE can be seen for MolCGP. The results for the even parity problems are even more dramatic; however, the function set for CGP is limited to AND, OR, NAND, and NOR, which are known to require very complex solutions when using only these functions [2], thus the vast improvement is, perhaps, partly attributable to this. However, it is clear that despite the increased design space that is being explored by MolCGP, significant improvements in CE are demonstrated, especially for more complex problems. A caveat to this finding is that one of the limitations of CE as a performance indicator is that the calculation does not take into account the complexity of each individual solution. The nodes in CGP typically require 1 or 2 bitwise operations on the input data; molecules require many times more than this.

Table 1. The computational effort (in number of generations) for the 4 benchmark problems tested, for both MolCGP and CGP. Also included are the true CE confidence interval (CI) lower and upper bounds.

Benchmark Problem    MolCGP: CI_lower / CE / CI_upper       CGP: CI_lower / CE / CI_upper
2-Bit Multiplier     53,814 / 73,282 / 109,313              24,675 / 33,602 / 50,123
3-Bit Multiplier     4,283,402 / 5,832,962 / 8,700,865      16,448,737 / 24,152,005 / 33,867,501
Even 4-Bit Parity    12,021 / 24,071 / 31,325               106,546 / 151,683 / 210,235
Even 8-Bit Parity    83,687 / 120,324 / 167,575             22,902,612 / 31,187,842 / 46,522,022

7 Examining an Evolved Solution to the 2-bit Multiplier Problem

In order to investigate how the resources of the nodes are being utilised, and how the solution differs from what might have been created by a human designer, one of the evolved solutions to the 2-bit multiplier problem has been examined. The solution was evolved using only 10 nodes, as each node is so complicated that a larger solution would be very difficult to analyse meaningfully; although reducing the neutrality increased the number of individuals that had to be processed, a solution was still found fairly quickly (in less than a minute). Fig. 6 shows how the nodes were connected in the evolved solution; it clearly shows that (as expected) node outputs are being reused. Three of the nodes have both their primary and secondary outputs connected, resulting in 65% of the available resources being used. The functionality of each node is shown in Table 2. The outputs C3, C2 and C1 are solved in a way that follows fairly closely what would be produced using a Karnaugh map. C3 and C2 use more nodes than is actually necessary, in some places taking advantage of the input-routing nature of the secondary outputs (SO), in others combining inputs in redundant ways. However, nodes still perform logically obvious operations. What is particularly interesting is the use of the functionality of the secondary outputs to calculate C1 in a way that is very different from standard design techniques. Using a Karnaugh map, the function deduced for C1 (the sum of minterms) is shown in equation (2), where a prime denotes logical complement; it requires 6 nodes (as no previous functionality can be directly reused). Alternatively, the multiplier can be constructed using half-adders; the function for C1 is then that shown in equation (3), which requires 3 nodes (output 4SO can be reused). There are 4 nodes unique to C1, but they did not result in anything like equation (3); instead, equation (2) is produced using a convoluted combination of nodes (verified through extensive Boolean algebra not reproduced here). This exploitation of the unusual functionality of the secondary outputs, combined with the three-input logic function of the primary outputs, in a way that standard design techniques would not lead to, highlights the benefit of using an evolutionary technique to produce circuits for the proposed array; i.e., exploration of areas of the design space that would not normally be used, giving rise to the potential for more efficient solutions to be evolved than could be designed. However, the circuit produced is not as efficient as one constructed from half-adders (which requires 6 nodes); this is due to some undesired exploitation of the routing capabilities of SO, and some redundant recombination of inputs. Thus, alteration of the fitness evaluation to include parsimony should increase functional exploitation, and minimise routing exploitation and redundant nodes; adding parsimony to the fitness function is one of the ideas discussed in section 8.

Fig. 6. Connectivity of the 10 nodes in the examined 2-bit multiplier solution

A1'.A0.B1 + A0.B1.B0' + A1.B1'.B0 + A1.A0'.B0    (2)

(A0.B1) ⊕ (A1.B0)    (3)
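Since equations (2) and (3) must both compute C1, the two forms can be checked exhaustively against the 2-bit product itself; a short verification sketch (Python):

from itertools import product

# Confirm that the minterm form (2) and the half-adder form (3)
# agree, and that both equal bit 1 of the 2-bit product.
for a1, a0, b1, b0 in product((0, 1), repeat=4):
    eq2 = ((1 - a1) & a0 & b1) | (a0 & b1 & (1 - b0)) | \
          (a1 & (1 - b1) & b0) | (a1 & (1 - a0) & b0)
    eq3 = (a0 & b1) ^ (a1 & b0)
    c1 = (((a1 << 1) | a0) * ((b1 << 1) | b0) >> 1) & 1
    assert eq2 == eq3 == c1
print("equations (2) and (3) agree on all 16 input combinations")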


Table 2. Functions of Nodes in the Examined 2-Bit Multiplier Solution. A prime denotes logical complement; nPO and nSO denote the primary and secondary outputs of node n.

Node   PO Function                                    SO Function
0      0                                              A1.B1
1      B1'.B0' + B1.B1' + A1.B0.B1                    B1' + A1.A0.B0'
2      A1.B0                                          A1.A0 + B0.(A1 ⊕ A0)
3      0PO.A0                                         0PO'
4      B0.A0                                          B0.A0
5      3SO' + 4SO                                     3SO.4SO
6      5SO.B1                                         5SO.3PO + B1.(5SO ⊕ 3PO)
7      3PO'.4PO'.2SO + 3PO'.4PO.2SO' + 3PO.4PO.2SO    3PO.4PO + 2SO.(3PO ⊕ 4PO)
8      0PO + 5PO'                                     0
9      1PO'.A0.7SO' + 7SO.A0'                         7SO

8 Conclusions and Further Work

In this paper MolCGP, an extension of CGP, has been presented, and its capabilities investigated. It has been shown that, for a set of standard benchmark problems, it is able to find solutions with a practicable amount of computational effort, particularly when compared to standard CGP; thus demonstrating that it is a potentially valuable technique for evolving useful digital circuits on a bio-inspired prokaryotic cell array. In order to facilitate scaling MolCGP to more complex problems (than those presented here), further development of the algorithm to improve the efficiency of evolution will be investigated. One possible approach for this is automatic module acquisition, as described in [8]. In this approach, collections of nodes with a useful function (a module) are added to the list of possible functional mutations of any given node; thus a mutation could replace a node with a module instead of making a standard functional change. This facilitates further exploitation of the modular nature of many digital circuits. Owing to the multi-functional nature of the nodes in MolCGP, useful node functions (i.e., gene strings) could also be acquired, and added to the list of possible mutations. Having established that MolCGP is capable of evolving useful digital circuits, it will be developed to maximise the exploitation of the functionality of molecules. Upon examining one of the evolved solutions to the 2-bit multiplier problem, it can be seen that the functionality of the nodes is being relatively well exploited, and in such a way as would not normally be carried out by a human designer. This exploitation gives rise to the potential to evolve solutions that are more efficient than those that would typically be designed. Hence, in order to capitalise on this, and minimise routing exploitation and redundant nodes, the fitness function will be modified to include parsimony. An approach for doing so with CGP is suggested in [4]: successfully evolved solutions (including conventionally designed solutions) are allowed to evolve further to see whether more compact solutions can be found. Alternatively, parsimony could be included in the fitness function from the beginning, resulting in multi-objective evolution. Both approaches will be investigated. Should sufficiently efficient solutions be able to


be evolved, they will be examined for possible design techniques that can exploit the functionality of the nodes.

Acknowledgments. This research work is supported by the Engineering and Physical Sciences Research Council of the United Kingdom under Grant Number EP/F062192/1.

References

1. Samie, M., Dragffy, G., Popescu, A., Pipe, T., Melhuish, C.: Prokaryotic bio-inspired model for embryonics. In: NASA/ESA Conference on Adaptive Hardware and Systems, pp. 163–170 (2009)
2. Miller, J.F., Job, D., Vassilev, V.K.: Principles in the evolutionary design of digital circuits - part I. Genetic Programming and Evolvable Machines 1(1-2), 7–35 (2000)
3. Coello Coello, C.A., Aguirre, A.H.: Design of combinational logic circuits through an evolutionary multiobjective optimization approach. Artif. Intell. Eng. Des. Anal. Manuf. 16(1), 39–53 (2002)
4. Vassilev, V.K., Job, D., Miller, J.F.: Towards the automatic design of more efficient digital circuits. In: EH 2000: Proceedings of the 2nd NASA/DoD Workshop on Evolvable Hardware, vol. 151 (2000)
5. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
6. Koza, J.: Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge (1996)
7. Sekanina, L.: Evolutionary design of gate-level polymorphic digital circuits. In: Rothlauf, F., Branke, J., Cagnoni, S., Corne, D.W., Drechsler, R., Jin, Y., Machado, P., Marchiori, E., Romero, J., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2005. LNCS, vol. 3449, pp. 185–194. Springer, Heidelberg (2005)
8. Walker, J.A., Miller, J.F.: The automatic acquisition, evolution and reuse of modules in cartesian genetic programming. IEEE Trans. Evolutionary Computation 12(4), 397–417 (2008)
9. Haddow, P.C., Tufte, G., van Remortel, P.: Shrinking the genotype: L-systems for evolvable hardware? In: Liu, Y., Tanaka, K., Iwata, M., Higuchi, T., Yasunaga, M. (eds.) ICES 2001. LNCS, vol. 2210, pp. 128–139. Springer, Heidelberg (2001)
10. Yu, T., Miller, J.F.: Neutrality and the evolvability of boolean function landscape. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tettamanzi, A.G.B., Langdon, W.B. (eds.) EuroGP 2001. LNCS, vol. 2038, pp. 204–217. Springer, Heidelberg (2001)
11. Miller, J.F., Smith, S.L.: Redundancy and computational efficiency in cartesian genetic programming. IEEE Transactions on Evolutionary Computation 10(2), 167–174 (2006)
12. Walker, J.A.: The Automatic Acquisition, Evolution and Re-use of Modules in Cartesian Genetic Programming. PhD Thesis
13. Walker, M., Edwards, H., Messom, C.: Confidence intervals for computational effort comparisons. In: Ebner, M., O'Neill, M., Ekárt, A., Vanneschi, L., Esparcia-Alcázar, A.I. (eds.) EuroGP 2007. LNCS, vol. 4445, pp. 23–32. Springer, Heidelberg (2007)

Fault Tolerance of Embryonic Algorithms in Mobile Networks

David Lowe¹, Amir Mujkanovic¹, Daniele Miorandi², and Lidia Yamamoto³

¹ Centre for Real-Time Information Networks, University of Technology Sydney, Australia, [email protected], [email protected]
² CREATE-NET, v. alla Cascata 56/D, 38123, Povo, Trento, IT, [email protected]
³ Computer Science Department, University of Basel, Switzerland, [email protected]

Abstract. In previous work the authors have described an approach for building distributed self–healing systems – referred to as EmbryoWare – that, in analogy to Embryonics in hardware, is inspired by cellular development and differentiation processes. The approach uses “artificial stem cells” that autonomously differentiate into the node types needed to obtain the desired system–level behaviour. Each node has a genome that contains the full service specification, as well as rules for the differentiation process. This approach has inherent self-healing behaviours that naturally give rise to fault tolerance. Previous evaluations of this fault tolerance have however focused on individual node failures. A more systemic fault modality arises when the nodes become mobile, leading to regular changes in the network topology and hence the potential introduction of local node type faults. In this paper we evaluate the extent to which the existing fault tolerance copes with the class of faults arising from node mobility and associated network topology changes. We present simulation results that demonstrate a significant relationship between network stability, node speed, and node sensing rates.

1 Introduction

In this paper, we consider the issue of fault-tolerance in self–healing distributed networks that incorporate mobile devices and hence rapidly changing network topologies. Inspired by related work on Embryonics [1, 2], in our earlier work [3] we proposed EmbryoWare, an “embryonic software” architecture for robust and self-healing distributed systems. Like Embryonics, the EmbryoWare approach is based on the assumption that each node in the system contains a genome that includes a complete specification of the service to be performed, as well as a set of differentiation rules meant to ensure that each node differentiates into the node type required to provide the required overall system–level behaviour. A particular feature of both Embryonics and EmbryoWare is that there is no


distinction between the fault¹ handling behavior and the normal behavior of a node. The ability of a node to restore from a faulty to a normal state is a side-effect of the system's normal process of differentiating into the locally correct node type. Therefore, no special fault-handling routines are needed, which can make the system potentially more robust to unforeseen disruptions. In [3] we examined the general behaviour and performance of the EmbryoWare approach and demonstrated its validity as well as its inherent robustness and self–healing ability. That previous work however focused on individual node failures with a uniform probability distribution of failures occurring in any node. There exists the likelihood of more complex patterns of node failure. One of the more significant of these occurs when we have mobile nodes, leading to regular changes in the network topology. When the topology changes the local neighbourhood for nodes is affected. Given that nodes differentiate into different types based, in part, on the sensed information from nodes in their local neighbourhood, when this neighbourhood changes it can mean that the node types are no longer correct. This can be interpreted as the introduction of faults into the system. An example of this situation would be an ad hoc network of mobile devices (such as cell phones) that form a distributed processing network. As devices move, they establish and then lose temporary connections, and hence the network topology is constantly changing. This has implications for ensuring the validity of the system–level functionalities – particularly where the correct behaviour of each node is dependent upon the behaviours in its neighbourhood. In this paper we evaluate the fault–tolerance behaviour of EmbryoWare under mobility, by measuring the extent to which the patterns in EmbryoWare can be maintained in a valid state in spite of mobility. In particular, we are interested in the relationships between the rate of fault generation (which will correspond to the speed of the nodes and hence the rate of change in the network topology) and those factors that affect the rate at which faults are addressed. In essence we are considering how quickly the nodes in an embryonic system can re-differentiate to ensure that the individual nodes are in a valid state. In section 2 we discuss the background to our approach and related work. Then in section 3 we provide a brief overview of the basic EmbryoWare architecture and the changes we have made to incorporate node mobility into our simulations. We then describe our analysis approach and results in section 4. Finally, in section 5 we describe our conclusions and future work.

2 Background

The motivation for our work comes from the increasing utilisation of distributed services, i.e. services whose outcomes depend on the interaction of different components possibly running on different processors. Distributed services typically

¹ We refer to a fault as any circumstance in which a node is not operating in a steady state but rather a state in which subsequent sensing is likely to lead to a differentiation of the node type. This should be distinguished from a node failure, where the node has failed to operate correctly due to some other operational reason.


require complex design with regard to the distribution and coordination of the system components. They are also prone to errors related to possible faults in one (or more) of the nodes where the components execute. This is particularly significant for applications that reside on open, uncontrolled, rapidly evolving and large–scale environments, where the resources used for providing the service may not be on dedicated servers (as is the case in many grid or cloud computing applications) but rather utilise spare resources, such as those present in users' desktops or even mobile devices. (Examples of such scenarios are the various projects making use of the BOINC or similar platforms².) Other examples of distributed applications where each node takes on specific functionality include: peer-to-peer file sharing; distributed databases and network file systems; distributed simulation engines and multiplayer games; pervasive computing [4] and amorphous computing [5]. With all of these applications there is a clear need to employ mechanisms that enhance robustness and reliability, ensuring the system's ability to detect faults and recover automatically, restoring system–level functionalities in the shortest possible time. In this work, we deal with problems arising when the topology changes due to node mobility. While as of today the vast majority of distributed services are meant to run over static nodes, the increasing penetration of powerful mobile devices (smartphones) has the potential of boosting the adoption of similar approaches in the mobile computing field. Even when the devices themselves are not mobile there still exists the potential for changes to the network topology due to approaches such as intelligent routing. We report the following examples of applications, which help in better positioning our work. Example 1 Wireless Grid Computing: One example of the kind of applications our framework applies to is the so–called wireless grid computing [6, 7, 8]. This applies the same principles underpinning grid computing research to mobile phones. Sharing the load for performing heavyweight computational tasks across a plurality of devices can provide advantages in terms of completion time and load balancing. The possibility that the network topology can change dynamically introduces an additional level of complexity with respect to grid computing scenarios, due to the need to ensure that tasks will get completed even in the presence of disconnections. Example 2 Distributed Sensing Platforms: Current state-of-the-art smartphones are sensor–rich. They typically include at least a camera (video and image sensor), a microphone (audio sensor) and short–range communication capabilities (such as Bluetooth and WiFi). Smartphones carried around by users could therefore be used as a distributed wireless sensing platform [9, 10]. Such a platform could be used to gather environmental information. An example is the distributed search engine considered in [11]. Example 3 Mobile Data Sharing: As smartphones are commonly equipped with some form of short–range wireless communications, they could be used to exchange data and content in a peer–to–peer fashion [12,13,14]. Going beyond pure

² http://boinc.berkeley.edu/


flooding–based strategies (à la Gnutella) requires the introduction of distributed indexing/caching services, which should be able to ensure some system–level performance (related, e.g., to the ability of locating and retrieving given content) even in the presence of device mobility. We are particularly interested in distributed services whereby the desired system–level behaviour (or: system–level configuration, meaning the mapping of devices to 'types', where different node types carry out different behaviours) can be expressed in terms of spatial constraints between the nodes and their types. An example could be "A node of type A has to be no more than two hops away from a node of type B" or "Any node of type C shall have no more than two 3–hop neighbours of type D". Robustness in distributed computing systems is a well–studied topic. Classical fault–tolerance techniques include the use of redundancy (letting multiple nodes perform the same job) and/or the definition of a set of rules triggering a system reconfiguration after a fault has been detected [15]. In many cases however it is not feasible to pre–engineer all possible failure patterns and the consequent self-healing actions to be taken for restoring global functionalities. In previous work by two of the authors [16], we considered the potential for using bottom-up approaches inspired by embryology for the automated creation and evolution of software. In these approaches, complexity emerges from interactions among simpler units. It was argued that this approach can also inherently introduce self–healing as one of the constituent properties without the need to introduce separate fault–handling behaviours. The ability of a node to restore from a faulty to a normal state is a side-effect of the system's normal process of differentiating into the locally correct node type.

3 EmbryoWare Architecture

EmbryoWare [3] applies concepts inspired by cellular development to the design of self–healing distributed software systems, leveraging off previous research conducted in the evolvable hardware domain. Such approaches, which gave rise to the embryonics research field [1, 2], are based on the use of "artificial stem cells" [17, 18], in the form of totipotent entities that can differentiate – based on sensing of the state of neighbouring cells – into any component needed to obtain the desired system–level behaviour. In general, we define an embryonic system as a system composed of networked entities that:

1. Are able to sense the state (or: type) expressed by neighbouring entities, i.e., those immediate neighbours with which direct communication is possible, or those entities for which information is provided by immediate neighbours;
2. Are able to differentiate their behaviour into a given type, depending on the type expressed by neighbouring entities and according to a set of well-defined rules;
3. Are able to replicate to neighbouring entities (i) the definition of all types and (ii) the set of differentiation rules.


Fig. 1. EmbryoWare Architecture, showing two neighbouring nodes

Our specific architecture is shown in Figure 1 for the case of two neighbouring nodes. Nodes are organised in a network, and each node contains the following components:

– Genome: defines the behaviour of the system as a whole, and determines the type to be expressed based on local context (i.e., neighbour cell types).
– Sensing agent: component that periodically communicates with neighbours regarding their current type. We consider in this work pull sensing, in which each node periodically polls its neighbours to inquire about their currently expressed type (as distinct from push sensing, in which each node 'pushes' information on its type to its neighbours).
– Replication agent: component that periodically polls the neighbours about the presence of a genome; if a genome is not present then the current genome is copied to the "empty" cell.
– Differentiation agent: component that periodically decides, based on the cell's current type and the knowledge about the types of the neighbouring cells, which functions should be performed by the node.

In our earlier work we discussed some possible design choices and considered the overall system performance – including the impact of network characteristics such as latency and dropped data packets [3]. However, whilst the algorithms themselves are independent of the network topology, we did not measure the impact of mobile nodes, and hence of a changing network topology. When the topology changes the local neighbourhood for nodes is affected, and this can mean that the node types are no longer correct. This can be interpreted as the introduction of faults into the system, and hence has significant implications for the ongoing validity of the system. A sketch of how these four components could fit together on a single node is given below.
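The following sketch (Python) shows one way the agents could interact; the node interface and method names are assumptions for illustration, not the EmbryoWare API:

import time

def node_loop(node, sensing_period):
    """The periodic behaviour of one node, sketched to show how the
    components interact rather than any particular implementation."""
    while node.alive:
        # Sensing agent (pull): poll each neighbour for its current type.
        neighbour_types = [n.poll_type() for n in node.neighbours()]
        # Replication agent: copy the genome into any "empty" neighbour.
        for n in node.neighbours():
            if not n.has_genome():
                n.receive_genome(node.genome)
        # Differentiation agent: the genome maps local context to a type.
        node.type = node.genome.differentiate(node.type, neighbour_types)
        time.sleep(sensing_period)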

3.1 Case Study: Coordinated Data Sensing and Logging

The following example scenario will be used throughout this paper: a number of mobile wireless sensor devices are deployed over an area for the purpose of


environmental monitoring. Each device collects sensor information from its surroundings, and the data collected must be logged within the local neighbourhood (to minimise longer range communication overheads). This means that each monitoring node should be within only a few hops of a logging node. In this case study we set this distance to two hops. When a monitoring node, through sensing its neighbourhood, discovers that it is not within two hops of a logger, it will probabilistically differentiate into a logger. The differentiation behaviours are given in Algorithm 1, and the pattern that results is illustrated in Figure 2. It is worth remarking that this specific example could be regarded as a clustering problem, in which cluster heads need to be at a maximum distance of four hops. Similar problems have received attention in the ad hoc network community, in particular related to the problem of computing the connected dominating set (CDS) [19]. This problem could be addressed in a traditional way by, e.g., first computing the CDS of the original network and then computing the CDS on the resultant overlay. However we believe that the EmbryoWare solution is much simpler, more compact, and able to handle faults in an intrinsic way. A comparison with existing cluster construction algorithms is a good topic for future work.

– Stem cell: with probability P_TtoM: Type ← Monitor
– Monitor cell: no 2-hop logger ⇒
    with probability P_MtoL: Type ← Logger
    with probability P_MtoT: Type ← Stem
– Logger cell: 2-hop logger ⇒
    with probability P_MLtoT: Type ← Stem
    with probability P_LtoT: Type ← Stem

Algorithm 1. Differentiation behaviour for Genome for simple environment logging application
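One way to realise the monitor-cell rule of Algorithm 1 is sketched below (Python; the graph representation, helper names, and probability values are illustrative assumptions):

import random

def has_logger_within(graph, node, types, hops=2):
    """Breadth-first search of the neighbourhood, up to 'hops' links away."""
    frontier, seen = {node}, {node}
    for _ in range(hops):
        frontier = {m for n in frontier for m in graph[n]} - seen
        if any(types[m] == "logger" for m in frontier):
            return True
        seen |= frontier
    return False

def step_monitor(graph, node, types, p_m_to_l=0.1, p_m_to_t=0.05):
    """A monitor with no logger within two hops probabilistically
    becomes a logger (or reverts to a stem cell)."""
    if types[node] == "monitor" and not has_logger_within(graph, node, types):
        r = random.random()
        if r < p_m_to_l:
            types[node] = "logger"
        elif r < p_m_to_l + p_m_to_t:
            types[node] = "stem"

graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
types = {n: "monitor" for n in graph}
step_monitor(graph, 0, types)  # node 0 may now be a logger, stem, or monitor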

In the subsequent sections, we will evaluate the impact on the system validity (i.e. the ability to return to a correct state from a state that includes faults), in the case of a time–varying network topology due to node mobility, of different choices for the sensing period, i.e., the time elapsed between consecutive polls of a node's neighbours. Furthermore, we will consider two options related to the timing of when a node becomes aware of a change in the topology. The baseline behaviour would be that nodes operate completely independently except for the periodic sensing. In our earlier work, with a fixed topology, this sensing only gave information on the current type of neighbouring nodes. With mobile nodes becoming a possibility, the sensing will give not only information on node types but also node connections – i.e. the local neighbourhood topology. This means that if the topology changes due to node movement (or failure) then each node will only become aware of that, and respond to it through appropriate differentiation, after its next sensing operation. We refer to this sensing behavior


as connection unaware. The alternative is for the node to maintain a continuous awareness of its connections to other nodes (through relevant lower-level communications mechanisms, such as the loss or gain of a carrier signal and/or reception of appropriate beacon messages); it could then become aware of a changed topology much sooner than the next sensing cycle. In this situation it would be able to react much more quickly. We call this mode of operation connection aware. The implications of these two different sensing behaviours will be analysed in the following section.

Fig. 2. Example differentiated node pattern: The red (darker) circles represent monitoring nodes and the green (lighter) circles are logging nodes. Nodes with a black centre are currently in an invalid state.

4 Performance Evaluation under Node Mobility

We now evaluate the impact of mobility on the fault-tolerance properties of the scenario described in Section 3.1. Initially, the overall system may be in a valid state (i.e. all monitoring nodes within 2 hops of a logger). However, as nodes move, and the topology changes, the validity of the system can be affected. Consider the cluster of monitoring (red) nodes around (1, 7) in Figure 2. If these nodes were to move upwards then they would become isolated from the associated logging node at (1, 5), and hence they would be in a fault state. This fault would persist until the nodes were able to sense the lack of a neighbourhood logger, and one of the nodes in this cluster differentiated into a logger. A key performance characteristic to evaluate the system's self-healing ability is the percentage of time that the system is in an invalid state (i.e. a fault in the system is persisting). Two factors will affect this: the frequency with which faults arise, and the speed with which they are then corrected. The former should be predominantly related to the rate of change in the topology, and hence the speed at which the nodes are moving. The latter will be related to the speed with which the fault is detected, and hence the sensing behaviour.


Understanding the relationship between system validity, node speed, and sensing behaviour is important insofar as it allows us to appropriately tune the behaviours of the system. Sensing the state of neighbours (or, as discussed above, the existence of network connections) incurs both processing and bandwidth overheads. If we have a sensing behaviour that performs more rapid sensing than is necessary, then we are wasting resources. To evaluate the extent to which each of these factors plays a role we extended the Matlab simulations from our previous work in order to incorporate node mobility. The basic algorithms for implementing the embryonic behaviours are outlined in [3]. These were modified in several ways. Firstly, the nodes have been made mobile. They have an initial location and a random (uniform distribution) velocity that only changes when the node reaches the edge of the containing area (a lossless reflection). All nodes continuously move, with connections existing between nodes only when they are within a specified range of each other. The node network shown in Figure 2 was generated using N = 40 nodes initially randomly distributed in a 10m × 10m area, with nodes being connected when they are within 2m of each other. We then undertook two main fault-tolerance evaluations – using each of the two primary sensing behaviours described above. To evaluate the fault-tolerance, we varied the maximum node velocity over the range 0...2m/s, and the sensing period over the range 0.05...0.8secs. For each pair of velocity and sensing period values, we ran 10 simulation passes, with each pass running for an initial period to allow node replication to occur, and then a 60sec evaluation period where we measured the proportion of time during which no fault was present in the network.

4.1 Connection Aware versus Connection Unaware Sensing

The first set of analyses was carried out for the two sensing behaviours described previously. Figure 3 graphs the overall system fault rate (i.e. the percentage of time for which the system contains at least one faulty node, i.e. a node that is not within 2 hops of a logging node and hence needs to differentiate to return to a valid node type) against node speed and sensing period for the two cases discussed above (i.e. where the nodes do, and do not, retain awareness of the existence and loss of network connections). As can be seen from these results, in both cases there is a noticeable, though expected, increase in the percentage of time that the system contains at least one faulty node as the node mobility increases. Of interest is that this increase is gradual and relatively linear, and there does not appear to be a point at which the ability of the system to recover collapses. This is an important observation insofar as it has implications for the range of node speeds that can be tolerated. Somewhat more surprising is the result with regard to variations in the sensing period. In the "connection aware" case, variations in the sensing period appear to have only a marginal effect on fault recovery. This can be explained as follows: when a connection between two nodes is broken because of node movement, the direct neighbouring nodes will become aware of this immediately and any


information that either node obtained from the other node is removed from its list of sensed data. This means that the node differentiation can then occur immediately, rather than needing to wait for the next sensing period. The details of the implementation of this are given in Algorithm 2. The only occasions when an immediate re-differentiation does not occur are where the directly impacted nodes are still in a valid state, and it is nodes further away in the neighbourhood that are the only ones that enter a faulty state. In this case the re-differentiation that corrects the fault must wait for a sensing cycle to occur. Overall, this particular behaviour leads to a more rapid response to changes in the network topology and a relative independence of the sensing period, but does require that all nodes retain constant awareness of their connectivity to nearby nodes (often this would be available through the presence of a carrier signal), with the associated resource overheads that this implies. In the "connection unaware" case, there is a slightly stronger relationship with the sensing period. As can be seen, as the sensing period gets longer, the percentage of time that the system contains faulty nodes increases.

Fig. 3. Fault recovery: Results showing the percentage of time that the system contains at least one faulty node for varying node maximum speed and varying sensing times: (a) where the nodes retain awareness of the existence or failure of network connections; and (b) where the nodes do not monitor the state of the network connections. (Generated by Matlab files GenomeTester v3j.m and GenomeTester v3k.m)

for all i ∈ Nodes do
    if Node i's movement leaves region then
        reverse Node i's velocity
    update Node i's location
for all i, j ∈ Nodes do
    calculate distance(i, j)
    if distance(i, j) < commRange then
        connected(i, j) = true
for all i, j ∈ Nodes do
    if !connected(i, j) then
        delete Node i's sensed data obtained from Node j

Algorithm 2. Node movement
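Algorithm 2 translates fairly directly into Python/numpy, as sketched below (the simulations reported here were written in Matlab; the array shapes, time step, and parameter values are assumptions):

import numpy as np

def mobility_step(pos, vel, dt=0.05, area=10.0, comm_range=2.0):
    """Move the nodes, reflect losslessly at the area boundary, and
    rebuild the connectivity matrix; stale sensed data would then be
    dropped for any pair that is no longer connected."""
    pos += vel * dt
    out_low, out_high = pos < 0, pos > area
    vel[out_low | out_high] *= -1
    pos[out_low] *= -1
    pos[out_high] = 2 * area - pos[out_high]
    # Nodes are connected when within communication range of each other.
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    conn = d < comm_range
    np.fill_diagonal(conn, False)
    return pos, vel, conn

rng = np.random.default_rng(0)
pos = rng.uniform(0, 10, (40, 2))   # 40 nodes in a 10m x 10m area
vel = rng.uniform(-2, 2, (40, 2))   # speeds up to 2 m/s per axis
pos, vel, connected = mobility_step(pos, vel)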


Fig. 4. Connection unaware sensing: Results showing the average percentage of time that a node is faulty for varying node maximum speed and varying sensing times, where the nodes do not monitor the state of the network connections. (Generated by Matlab file GenomeTester v3k.m)

We can understand this relationship more clearly by looking not only at the time for which the whole system is valid (i.e. no nodes at all in a faulty state), but at the average validity of each individual node. Figure 4 shows these results. As can be seen, there is a much more significant relationship to the sensing period. Several other observations arise from this data. Firstly, it appears that there is a baseline fault rate that even extremely rapid sensing cannot improve – for example, with the system configuration used in these simulations³, at a node maximum speed of 0.25m/s, it does not appear possible to reduce the average percentage of time that nodes are in a fault state below 1% irrespective of how quickly the sensing occurs. We believe that this is an artifact of the algorithmic sequencing in our simulation – though even if this is the case, similar behaviours would be likely to emerge in real-time code executing on live mobile devices. A second observation arising from the data shown in Figure 4 is the increasing volatility of the average node fault rate as the sensing period increases. The processes being evaluated are inherently stochastic, both in terms of the speed and associated movement of the nodes (and hence the changes to network topology), and in terms of the node differentiation decisions. At low sensing periods the baseline fault rate (as discussed above) tends to dominate the behaviour. At slower sensing rates however the delay in returning to a valid state from a fault state appears to be significantly more variable. This may be an issue that needs to be taken into account with applications that cannot afford extended periods of unavailability of individual nodes – though it is worth acknowledging that embryonic systems are designed explicitly so that they do not rely on the behaviour, or indeed even availability, of individual nodes.

³ Relevant factors in the configuration are likely to be area size, number of nodes and hence node density, and the probabilities that affect the differentiation behaviours.

5 Conclusions and Further Work

In this paper we report performance measurements regarding the fault tolerance of a distributed processing architecture, based on embryonic principles, in which the nodes of the system are mobile. The node mobility inherently leads to constant changes in the network topology of the system, and hence to changes in the local neighbourhood of individual nodes. This in turn can lead to those nodes being temporarily in a fault state. This fault state is inherently rectified by the self-healing differentiation processes in the nodes, but this process does take time. We have evaluated the relationship between node speed, node sensing period, and fault recovery. Interestingly, we found that rather than reaching a "knee" in the performance curve, above which the system performance collapsed and became unable to recover from the increasing number of faults, the relationship between node speed and fault recovery was relatively linear. This is likely to be an important finding for the dynamic adaptation of the sensing periods in the nodes, in order to ensure that the performance remains above a specified level. We have also shown that the fault recovery performance becomes much less dependent upon the sensing period if nodes are able to continuously monitor the existence (or loss) of network connections. This monitoring is unlikely to be feasible in systems involving, for example, sensor networks where the communication is intentionally very sporadic in order to minimise resource utilisation (most commonly power and/or bandwidth). However, in other domains where the connection is maintained (or at least there is a constant carrier) this finding will be significant, in that it indicates that a much lower sensing rate, and hence lower processing and bandwidth overheads, will be tolerable. One aspect that we have not considered, and which is a fruitful source for future investigation, is the possibility of replacing (or even supplementing) state sensing with proactive state broadcasting. In this scenario, when a node changes its state it would broadcast its changed state to its neighbours. This may circumvent the need for monitoring of the connection (as described in the previous paragraph) as a simpler way of making the performance less dependent on the sensing period. However, this could also introduce excessive messaging when mobility is high, and a compromise would have to be found. Our measurements are performed over a particular case study: the logging scenario. Ideally, one would like to know the general fault-tolerance properties of the EmbryoWare approach. For this purpose, as future work, it would be interesting to evaluate several different cases and see whether they share common fault handling patterns.


Evolution and Analysis of a Robot Controller Based on a Gene Regulatory Network

Martin A. Trefzer, Tüze Kuyucu, Julian F. Miller, and Andy M. Tyrrell

Department of Electronics, University of York, UK
{mt540,tk519,jfm7,amt}@ohm.york.ac.uk

Abstract. This paper explores the application of an artificial developmental system (ADS) to the field of evolutionary robotics by investigating the capability of a gene regulatory network (GRN) to specify a general purpose obstacle avoidance controller, both in simulation and on a real robot. Experiments are carried out using the e-puck robot platform. It is further proposed to use cross-correlation between inputs and outputs in order to assess the quality of robot controllers more accurately than by observing their behaviour alone.

1 Introduction

Biological development encompasses a variety of complex dynamic systems and processes at different levels, ranging from chemical reactions at the molecular level, to single cells or groups of cells dedicated to specific tasks, to complex multicellular organisms that are capable of adapting to changing environments and exhibit remarkable capabilities of scalability, robustness and damage recovery [1]. It is remarkable how this large number of complex mechanisms work together in nature over long periods of time in an effective and productive manner. This makes biological development a source of inspiration for research into modelling its principles and applying them to engineered real-time systems. Current research in the area of artificial developmental systems (ADS), gene regulatory networks (GRNs) and artificial life (ALife) concentrates both on studying developmental processes from a complex dynamic systems point of view and on their versatility in providing an indirect mapping mechanism between genotype and phenotype. In the first case, the properties of ADSs, particularly GRNs, are investigated by identifying transient states and attractors of such systems [2,3]. Hence, these approaches offer a more theoretical approach to modelling biological development. In the second case, there are examples where GRNs are utilised to grow neural networks or nervous systems for artificial agents [4]. Research into growing large, complex organisms that can represent a variety of things, such as patterns [5], morphogenesis in general [6,7] or designs [8], also fits into the second category. A third research thread seeks to exploit inherent properties of ADSs, such as the ongoing interaction between cell/organism and environment, multicellularity, chemical based gene regulation and homoeostasis, in order to achieve adaptive, robust and scalable control mechanisms for robotic systems.

Fig. 1. (a) Mechanisms of the GRN; (b) structure of genes and GRN. Protein interaction and regulatory feedback mechanisms of the ADS are shown on the left. On the right, it is illustrated how genes are divided into precondition and postcondition. Proteins can occur in both pre- and postcondition, whereas molecules can only occur in the precondition.

Research is undertaken into emergent, autonomous, collaborative behaviours [9,10,11] and modular robotics [12]. In [13] it is shown that GRNs are a viable architecture for the on-line, real-time control of a robot. This paper introduces a GRN based robot controller, similar to the one presented in [13]. It is investigated whether the chemical regulation based GRN mechanisms of the ADS introduced in [14] are suitable to specify a general purpose obstacle avoidance controller for the e-puck robot platform (http://www.e-puck.org/). Evolutionary experiments have been conducted in simulation using the Player/Stage simulator and are validated on a real robot. Furthermore, the paper proposes using the cross-correlation between the inputs and outputs of the GRN controller to assess its quality and ability to adapt beyond observing behaviour alone. Here, the term adaptivity refers to a controller's ability to automatically calibrate itself to perform the task on which it was trained in both known and unknown environments (in simulation and hardware).

2 The Artificial Developmental System

The GRN based model for artificial development used in this paper is based on the one that was introduced, and is described in more detail, in [14]. The design considerations of the original ADS are retained, namely the use of data structures and operations in the GRN core that are suitable for embedded systems (i.e. booleans and integers, with no division), and keeping the mechanisms of the ADS as close as possible to their biological counterparts within the boundaries of the chosen data types. Whilst not crucial for the experiments in this paper, the choice of the data structures imposes no loss of generality and is therefore unchanged. However, some improvements are made to the ADS for this paper: first, a dedicated diffusion layer is added, and only chemicals that are released to this layer by the cells are subject to diffusion. Chemicals need to be absorbed by the cells from the diffusion layer before they affect gene regulation. This is motivated by natural development. Second, a genetic representation that allows for variable-length GRNs is used in this paper, which allows for a more flexible and compact encoding of the genes, as shown in Figure 1(b). An overview of the mechanisms of the ADS is provided in the following sections. The term chemicals refers to both proteins and molecules. As the experiments in this paper are performed using one single cell, the description of cell signalling mechanisms and growth is omitted. A more detailed description of the ADS can be found in [14].

2.1 Representation and Gene Regulation

The core of the developmental model is represented by a GRN, as shown in Figure 1(a). The genotype is implemented as a string of symbols that encode the start and end of genes, the separation of pre- and postcondition within genes, binding sites and chemicals, as shown in Figure 1(b). Genes interact through chemicals and form a regulatory network. There is at least one major difference between the artificial model and biology: in the ADS used here, a binding site matches exactly one chemical, whereas in natural genes binding sites are defined by certain upstream and downstream gene sequences that accept a number of proteins to bind and transcribe their genetic code. The binding sites in natural DNA therefore allow for smooth binding, i.e. the probability that a certain chemical (transcription factor) binds to the DNA is given by how well the binding site of the chemical matches that of the DNA. The current GRN works with four proteins (A...D) and eight molecules (a...h). Proteins are directly produced by the GRN, whereas molecules are only a product of a gene function, as a result of a measurement or interaction performed by a protein. In addition to gene regulation, proteins implement dedicated functions and mechanisms of the ADS: protein A (structuring/functional) defines the cell type, B (sensory) translates sensory inputs into molecules, C (diffusion) manages chemical diffusion and D (plasmodesmata) controls chemical sharing/exchange between adjacent cells and growth. Note that the additional roles of chemicals for robot control are described in Sections 2.3 and 3.
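One regulation step for this exact-match binding could be sketched as below. This is illustrative only: the dictionary-of-integer-levels representation, the presence test (level greater than zero) and the unit production amount are assumptions, not details taken from [14].

    # Chemical levels are non-negative integers; proteins 'A'..'D' and
    # molecules 'a'..'h'. A gene is a (precondition, postcondition) pair.
    def regulation_step(genes, levels):
        produced = {}
        for pre, post in genes:
            # A gene is expressed only if every binding site in its
            # precondition exactly matches a chemical that is present.
            if all(levels.get(chem, 0) > 0 for chem in pre):
                for protein in post:          # only proteins can be produced
                    produced[protein] = produced.get(protein, 0) + 1
        for protein, amount in produced.items():
            levels[protein] = levels.get(protein, 0) + amount
        return levels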

2.2 Evolution of the Genotype

The genotype is derived from a genome that is evolved using a 1+4 evolutionary strategy (ES). The genome is represented by a string of integers, and mutation takes place by replacing integers with new random values at a rate of 2% of the genome length. The GRN is obtained by mapping the string of integers to GRN symbols using the modulus operation on the genome. Variable-length genes are achieved via (in)active flags encoded in the genes.
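A minimal sketch of this scheme is given below; the symbol alphabet size, the integer value range and the fitness function are placeholders, not values taken from the paper.

    import random

    MUTATION_RATE = 0.02   # 2% of the genome positions per mutation
    N_SYMBOLS = 16         # size of the GRN symbol alphabet (assumption)

    def mutate(genome):
        # Replace randomly chosen integers with new random values
        child = list(genome)
        for i in range(len(child)):
            if random.random() < MUTATION_RATE:
                child[i] = random.randrange(2**16)
        return child

    def to_grn_symbols(genome):
        # The GRN is obtained via the modulus operation on the integer genome
        return [g % N_SYMBOLS for g in genome]

    def one_plus_four_es(parent, fitness, generations):
        best, best_fit = parent, fitness(to_grn_symbols(parent))
        for _ in range(generations):
            for child in (mutate(best) for _ in range(4)):   # 1 + 4 ES
                f = fitness(to_grn_symbols(child))
                if f >= best_fit:
                    best, best_fit = child, f
        return best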

2.3 Developing Organisms That Control Robots

The application in this paper is to control a robot via a GRN. Therefore, the GRN has to be able to process input signals from the robot’s infra-red (IR) range sensors and the outputs have to be translated into motor commands.


Since the GRN operates on chemical concentrations, this is achieved by mapping the distance measurements to input chemical concentrations and by computing the speed and turning rate of the robot from output chemical concentrations. Molecules are suitable for representing the sensory inputs to the GRN, since they affect gene regulation and can be consumed, but not directly produced. Hence, it is not possible for the GRN to directly generate input signals which are not actually present in the environment. However, molecules can be indirectly produced via the sensory protein B (Section 2.1). Contrary to the molecules, the GRN is able to quickly change the levels of the proteins (A-D), as they can be both consumed and directly produced; hence, proteins naturally represent the outputs of the system. Thus, values for the speed and turning rate of the robot are calculated from protein levels. Furthermore, as proteins occur in the precondition, they provide feedback of the states of the outputs to the GRN, which can be exploited by the organism for adaptation and self-regulation. Since one GRN with a sufficient number of proteins is able to process the inputs of one robot, a single-cell organism is used to control the robot in the experiments described.

3 E-Puck, Player/Stage and GRN

The experiments presented in this paper are carried out using the e-puck robot platform. Evolution of the ADS that controls the robot, as well as testing on different maps, is performed using the open-source robot simulation platform Player/Stage (http://playerstage.sourceforge.net/). Verification of the controller is carried out on a real e-puck robot. As described in Section 2.3, sensory inputs and motor signals are mapped to chemical concentrations which can be processed by the GRN. Due to the 16-bit processor available on the e-puck, the maximum protein level is 65535, and the mapping functions for the input and output signals are designed in such a way that the full protein value range is utilised. An important and still open question is how the time scales of development and the robot should be related to each other. In biology, for instance, neural networks operate at a greater speed than gene regulation, which inherently constrains those systems to certain tasks. In the case of engineering and computer science, those boundaries do not exist and are therefore subject to research. In this paper, one developmental step of the controller (GRN) corresponds to one sensor/motor update cycle of 10 Hz. The latter value is given by the e-puck robot and is set accordingly in simulation.

3.1 Mapping Sensory Inputs

The e-puck provides 8 IR distance sensors, which are positioned around the outside of the robot at 10°, 45°, 90°, 150°, −150°, −90°, −45° and −10°. The range of the IR sensors is theoretically about 10 cm for the real robot. However, measuring and calibrating the actual IR sensor ranges of the e-puck used for these experiments shows linear behaviour only up to a maximum range of 5 cm, which is assumed as an approximation for the Player/Stage simulation, whereas the different sensors are assigned different maximum ranges in the case of the real e-puck, according to the measurements taken (see max_range below). For simplicity, linear behaviour is also assumed in the case of the e-puck, despite the fact that an accurate calibration would have to take the exponential characteristics of the IR diodes into account. This leads to the following equation for mapping IR sensor readings to input chemical levels:

  chem_level_i = 65535 × (1 − sensor_i / max_range_i)   if sensor_i < max_range_i
  chem_level_i = 0                                      if sensor_i ≥ max_range_i     (1)

with max_range_i = 0.05 for all i in the uncalibrated case, and max_range_0..7 = 0.03, 0.05, 0.05, 0.05, 0.005, 0.03, 0.015, 0.005 in the calibrated case.
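Equation (1) can be transcribed directly; the sketch below uses the calibrated per-sensor ranges quoted above (readings in metres), and the function name is illustrative.

    def ir_to_chem_level(sensor, max_range):
        # Equation (1): closer obstacles produce higher chemical levels
        if sensor < max_range:
            return int(65535 * (1.0 - sensor / max_range))
        return 0

    CALIBRATED_RANGES = [0.03, 0.05, 0.05, 0.05, 0.005, 0.03, 0.015, 0.005]

    def map_sensors(sensors):   # sensors: the eight IR readings in metres
        return [ir_to_chem_level(s, r) for s, r in zip(sensors, CALIBRATED_RANGES)]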

3.2 Deriving Motor Command Signals

Both the e-puck and its simulation model provide an interface that enables the speed and turning rate to be set. While the speed is a unitless value between −1 and 1 (maximum reverse/forward speed), the turning rate is expected in radians. Hence, computing values for the speed and turning rate from the output chemical levels can be achieved in a straightforward manner:

  newspeed = 0.15 × ((protein_A − protein_B) / 65535) + 0.05     (2a)
  newturnrate = 3.0 × ((protein_C − protein_D) / 65535)          (2b)

where the maximum speeds between −0.1 and +0.2, the forward speed bias of +0.05 and the possible turning rates between −171° and +171° are arbitrarily chosen. The factors (protein_{A,C} − protein_{B,D})/65535 are normalised to [−1, 1].
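Equations (2a) and (2b) map directly onto code; a minimal transcription (the function name is illustrative) is:

    def motor_commands(protein_a, protein_b, protein_c, protein_d):
        # Protein differences normalised to [-1, 1] by the 16-bit maximum
        newspeed = 0.15 * ((protein_a - protein_b) / 65535.0) + 0.05
        newturnrate = 3.0 * ((protein_c - protein_d) / 65535.0)   # radians
        return newspeed, newturnrate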

4 Evolution and Analysis of a GRN Based Robot Controller

The task is to optimise a GRN based controller for an e-puck robot via an EA. This experiment is carried out using a simulation model of the e-puck in Player/Stage. The aim is to achieve obstacle avoidance and area coverage in the map shown in Figure 2(a). A relatively basic map is chosen, since the aim is to achieve a low-level controller, which interacts directly with the robot's sensors and actuators rather than operating on a higher abstraction level using predefined actions or behaviours. The size of the map is 1.6 m × 1.6 m, with x/y coordinates between −0.8 m and +0.8 m.

4.1 Fitness Function

The fitness is the averaged score of three rounds of a maximum of 1000 time steps (= developmental steps) each. The chemical levels are initialised to 0 before the first round. For subsequent rounds, the state of the developmental system (chemical levels) is retained, in order to allow the ADS to adapt to the environment. The fitness is calculated as shown in Algorithm 1. Note that rounds are only terminated when hitting a wall during optimisation, but not when assessing the behavioural performance in different environments later on.

Algorithm 1. Pseudo-code of the fitness function used
  for three rounds do
    reset score
    reset previous distance
    randomise starting position and angle of the e-puck with
      x, y in the range of −0.65..−0.75 m (lower left corner)
      angle in the range of 0..360°
    for 1000 time steps do
      perform sensor reading
      map distance values to molecule levels (a-h)
      perform one developmental step
      calculate new speed and turning rate from protein levels (A-D)
      send motor commands
      // stimulating covering distance:
      if current distance to starting point > previous distance to starting point then
        score = score + distance
      end if
      // stimulating obstacle avoidance:
      if robot bumps into obstacle or wall then
        end this round (and the chance to increase score)
      end if
    end for
    add score to fitness
  end for
  divide fitness by number of rounds

4.2 Assessing Task Based Performance

In the case of a robot controller, it is possible to qualitatively assess its performance by observing the behaviour of the robot for a period of time and counting the number of times it fails to avoid walls or obstacles. The ability of the robot to explore the map and reach the opposite end of the map can be observed by tracking its path. This can easily be achieved in simulation by enabling path highlighting, which is a feature of Player/Stage. In the case of the real robot this becomes more difficult, as generally either a video recording or a tracking system is required. A controller with good performance is re-run for 6000 time steps and the resulting path of the robot is shown in Figure 2(a). The starting point of the robot is in the lower left corner of the map.

Fig. 2. The maze that is used to evolve the GRN robot controller, the protein levels and the correlation matrix: (a) Cave 1; (b) course of protein levels; (c) correlation matrix; (d) course of correlation. a-h are inputs, A-D are outputs.

At the beginning of the run, the robot bumps into walls twice, indicated by the star symbols. After that, it manages to navigate through the cave with no further collisions. Also, it can be seen from Figure 2(a) that the robot roams the entire cave by following the wall on its left-hand side. However, the controller achieves slightly more than just wall-following, as it automatically starts turning and returns to the left wall in case it loses track of it. As can be seen from the tracks at the turning point in Figure 2(a), where the robot turns left in one round and right in another, it is not default behaviour to stop and always turn right when approaching a wall. From this it can be concluded that the GRN achieves control of the robot in a manner that satisfies the requirements of the fitness function: the robot avoids walls and navigates as far away as possible from the starting point. The fact that the robot hits a wall only twice at the beginning of the run suggests some kind of adaptivity of the GRN based controller.


Hence, the controller's ability to adapt is further investigated in Section 5.

4.3 Measuring Performance Using Cross-Correlation

Although tracking the path of the robot and counting the number of collisions are suitable to verify whether the evolved controller satisfies the behavioural requirements of the fitness function, this provides no information about the complexity of the states and the dynamics of the supposedly adaptive, GRN based controller. It would be particularly useful to have information about how the controller makes use of the input sensor data and in what way the inputs are related to the outputs, i.e. the actions of the robot, since a common problem [15] (although not analysed and published very often) with evolved controllers is that they are likely to ignore inputs to the system but still manage to find partially optimal solutions. In this paper, it is proposed to use cross-correlation as a measure of dependency between sensory inputs and motor outputs. Cross-correlation is a measure of similarity of two continuous functions, which in general also considers a time-lag applied to one of them. In this case, it is assumed that the time-shift between input and output is 0. In order to obtain values in a bounded range, normalised cross-correlation is used for the experiments presented:

  (f ∗ g) = (1 / (n − 1)) · Σ_t [ ((f(t) − f̄) · (g(t) − ḡ)) / (σ_f · σ_g) ],     (3)

where f̄, ḡ are the mean values, σ_f, σ_g are the standard deviations and n is the number of samples of the time series. Note that the usage of mean and standard deviation might be problematic, as the statistical distribution of the samples is unknown. However, using Equation 3 is convenient, as the output value range is −1...1, where −1/1 denote maximum negative/positive correlation and 0 means the signals are uncorrelated. The measured input chemical levels (a-h) and output chemical levels (A-D) for 6000 time steps are shown in Figure 2(b), and the development of the cross-correlation of the chemical levels is shown in Figure 2(d). As can be seen from Figures 2(b) and 4(c), input sensors f, g, h (front, left) show almost constant activity, a, b, c (front, right) show only occasional peaks and d, e (rear) are almost always zero. This corresponds to the observed behaviour where the robot follows the left wall and only occasionally encounters a wall on its right-hand side. In order to answer the question whether the inputs are actually considered by the controller when generating the output chemical levels A, B, C, D (which define the speed and turning rate of the robot according to Equation 2), the course of the cross-correlation values over time (at each point in time from 0...t) for each input/output chemical pair is shown in Figure 2(d). At the beginning of the run, the cross-correlation values keep changing before settling to particular values (although there are still slight adjustments taking place at later iterations, e.g. in the case of a, c, h). Again, this suggests that, to a certain extent, an adaptation process is taking place at the beginning of the run.
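Equation 3 can be computed directly from the logged chemical levels. A minimal sketch, assuming NumPy, equally sampled time series of the same length, and no guard against zero variance:

    import numpy as np

    def normalised_xcorr(f, g):
        # Equation (3) at zero time-lag; returns a value in [-1, 1]
        f = np.asarray(f, dtype=float)
        g = np.asarray(g, dtype=float)
        n = len(f)
        num = np.sum((f - f.mean()) * (g - g.mean()))
        return num / ((n - 1) * f.std(ddof=1) * g.std(ddof=1))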

Fig. 3. Comparison of the behaviour of the GRN controller in different maps: (a) Cave 2; (b) U-Maze; (c) distributed obstacles; (d) cross-correlation for Cave 2; (e) cross-correlation for the U-Maze; (f) cross-correlation for the distributed obstacles

However, this needs to be further consolidated in the following sections, where the controller is tested on different maps and on a real robot. For a better overview, the cross-correlation matrix at time step 5000 is shown in Figure 2(c), including the differences A − B (∝ speed) and C − D (∝ turning rate). Looking at the correlation matrix (black = maximum absolute correlation, white = no correlation), it can be confirmed that there is a correlation between speed and turning rate and the sensors at the front and the sides. It is interesting to see that A − B appears to be correlated to B, but not A, and C − D appears to be more correlated to C than to D. This suggests that the evolved controller keeps A and D relatively constant and achieves changes in speed and turning rate by adjusting the chemical levels of B and C.

5 Test and Analysis on Different Maps

In order to investigate whether the controller exhibits, at least to a certain extent, adaptive behaviour, it is tested in simulation on three different maps, shown in Figure 3(a,b,c). As can be seen from the recorded tracks, the controller successfully navigates the robot through the three maps, which feature different characteristics: the first map (Figure 3(a)) is similar to the one in which the controller was evolved, hence it is expected that the robot does not collide with walls in this case. Since the primary behaviour of the evolved controller is wall-following, the robot explores the additional branches present in this map, rather than going straight to the opposite side of the map.

Fig. 4. Results from experiments with the real e-puck: (a) trace of the real e-puck; (b) correlation matrix; (c) e-puck sensors

The second map (Figure 3(b)) represents a simple U-maze. The important features of this map are the straight edges and the sharp 90° turns, which pose challenges for the controller. It is observed that, although the robot manages to navigate through the entire map, it hits the wall in all cases where it approaches the wall at a right angle. In those cases, the controller is unable to decide in which direction to turn. The third map (Figure 3(c)) is significantly different from the one used for the evolution of the controller. Despite this, the robot successfully navigates around the obstacles and explores the map. It is interesting to see that in this case the robot bumps into obstacles only at the beginning of the run (the starting point is in the lower right corner of the map) and manages to successfully avoid all obstacles as time goes on. This is again a hint that an adaptation process is actually taking place. When comparing the cross-correlation matrices in Figures 3(d,e,f), it can be observed that the cross-correlation values at which the controller settles at time step 5000 are different for different environments. Particularly in the case of the third map (Figure 3(c)), the correlation between the outputs and sensors c, d, e, f has increased. This indicates that those sensors play a more important role in the case of the third map. The results show that the cross-correlation matrix looks different for different maps, which indicates that the controller indeed features different states of operation, depending on the environment.

6 Test and Analysis on an E-Puck Robot

In order to show the relevance of the presented experiments for real-world applications, the evolved GRN robot controller is tested on an e-puck robot. The only modification made for the real robot is the use of the calibrated maximum sensor range values, rather than the same value for each sensor, as described in Section 3 and Equation 1. The results obtained with the real robot are shown in Figure 4. For visualisation, the path of the robot has been manually traced in Figure 4(a) using Player/Stage. It is observed that the e-puck is trapped for about the first 2500 out of 6000 time steps in the lower-right corner of the map shown in Figure 4(a), before it successfully resumes its primary wall-following behaviour, from then on without getting stuck in similar situations again, and navigates through the map.


The fact that this behaviour is then similar to the one observed in simulation (see Figure 3(b)) after it manages to escape the corner indicates that there might be some kind of adaptation to the new environment (the real e-puck) taking place. It can be seen from the cross-correlation matrix that the controller settles in a state that looks similar to the one from simulation, but the development of the cross-correlation values over time is significantly noisier. In order to quantitatively compare the cross-correlation matrices, it will be necessary, as part of future work, to define a distance measure or to visualise the state space given by the cross-correlation matrices.

7 Discussion

This paper, which is part of a project funded by EPSRC (EP/E028381/1), has explored the application of an ADS to the field of evolutionary robotics by investigating the capability of a GRN to control an e-puck robot. A GRN controller has been successfully evolved that exhibits a general ability to avoid obstacles in different maps as well as when transferred to a real robot. It has been shown that GRN based controllers have the potential to adapt to different environments, due to the fact that the robot successfully managed to navigate through previously unknown maps and could be transferred to a real robot without further modification of the controller. Hence, it is concluded that GRNs are a suitable approach for real-time robot control and can cope with variations caused by changing environments and the sensor noise of a real robot. The results further suggest that it is possible to specify a general purpose obstacle avoidance behaviour via a GRN. It is proposed that cross-correlation between inputs and outputs is a suitable measure to quantitatively assess the quality of robot controllers (particularly evolved ones) beyond observing whether the robot exhibits the desired behaviour. It has been shown that the cross-correlation settles at different values for different environments. On the one hand, this simply confirms that the level of activity and the importance of the sensors change for different environments. On the other hand, in conjunction with the observation that the robot still exhibits the desired behaviour, different cross-correlation matrices for different environments indicate that the controller features different stable states of operation, and show the ability of the controller to autonomously adapt to a certain extent. As the experiments show, this is the case for different maps in simulation and when transferring the controller to a real robot. However, it is an open question and subject to future work how to investigate whether this emergent adaptivity is a general, inherent property of 'soft' controllers, rather than of ones based on thresholds and decisions, or whether it is a specific feature of GRN based, developmental controllers like the one introduced in this paper. One of the greatest challenges in evolutionary computation (EC) is the design of the fitness function. This is particularly true in the case of behavioural fitness functions and real-world systems, which can be extremely noisy, i.e. good solutions have a significant probability of being discarded during the optimisation process simply because of unlucky initial conditions at one iteration of the EA.


Therefore, we will explore the possibility of including the cross-correlation measure in the fitness function, in order to provide an additional quality measure which is independent of the behaviour. Even if the robot does not solve the task, it will be possible to emphasise correlation between inputs and outputs, which will prevent evolution from ignoring the inputs and might offer a means to overcome sub-minimally competent controllers, particularly at the beginning of the optimisation process.

References
1. Wolpert, L., Beddington, R., Jessell, T., Lawrence, P., Meyerowitz, E., Smith, J.: Principles of Development. Oxford University Press, Oxford (2002)
2. Kauffman, S.A.: Metabolic stability and epigenesis in randomly constructed genetic nets. Journal of Theoretical Biology 22, 437–467 (1969)
3. De Jong, H.: Hybrid modeling and simulation of genetic regulatory networks: a qualitative approach. In: ERCIM News, pp. 267–282. Springer, Heidelberg (2003)
4. Astor, J.C.: A Developmental Model for the Evolution of Artificial Neural Networks: Design, Implementation and Evaluation. Artificial Life 6, 189–218 (1998)
5. Miller, J.: Evolving developmental programs for adaptation, morphogenesis, and self-repair. In: Banzhaf, W., Ziegler, J., Christaller, T., Dittrich, P., Kim, J.T. (eds.) ECAL 2003. LNCS (LNAI), vol. 2801, pp. 256–265. Springer, Heidelberg (2003)
6. Eggenberger, P.: Evolving morphologies of simulated 3d organisms based on differential gene expression. In: Fourth European Conference on Artificial Life, pp. 205–213. The MIT Press, Cambridge (1997)
7. Bentley, P., Kumar, S.: Three ways to grow designs: A comparison of embryogenies for an evolutionary design problem. In: Proc. of the Genetic and Evolutionary Computation Conf., Orlando, Florida, USA, pp. 35–43. Morgan Kaufmann, San Francisco (1999)
8. Hornby, G.: Generative representations for evolving families of designs. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O'Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 209–217. Springer, Heidelberg (2003)
9. Quick, T., Nehaniv, C.L., Dautenhahn, K., Roberts, G.: Evolving Embodied Genetic Regulatory Network-driven Control Systems. In: Banzhaf, W., Ziegler, J., Christaller, T., Dittrich, P., Kim, J.T. (eds.) ECAL 2003. LNCS (LNAI), vol. 2801, pp. 266–277. Springer, Heidelberg (2003)
10. Floreano, D., Mondada, F.: Evolution of Homing Navigation in a Real Mobile Robot. IEEE Trans. on Systems, Man, and Cybernetics, Part B, 396–407 (1996)
11. Ziegler, J., Banzhaf, W.: Evolving Control Metabolisms for a Robot. Artificial Life 7, 171–190 (2001)
12. Groß, R., Bonani, M., Mondada, F., Dorigo, M.: Autonomous self-assembly in swarm-bots. IEEE Trans. Robot., 1115–1130 (2006)
13. Kumar, S.: A Developmental Genetics-inspired Approach to Robot Control. In: Proc. of the Workshops on Genetic and Evolutionary Computation (GECCO), pp. 304–309. ACM Press, New York (2005)
14. Trefzer, M.A., Kuyucu, T., Miller, J.F., Tyrrell, A.M.: A Model for Intrinsic Artificial Development Featuring Structural Feedback and Emergent Growth. In: Proc. of the IEEE Congress on Evolutionary Computation (CEC), Norway (2009)
15. Tarapore, D., Lungarella, M., Gomez, G.: Quantifying patterns of agent-environment interaction. Robotics and Autonomous Systems 54(2), 150–158 (2006)

A New Method to Find Developmental Descriptions for Digital Circuits

Mohammad Ebne-Alian and Nawwaf Kharma

Computational Intelligence Lab, Electrical and Computer Engineering Department, Concordia University, Montreal, Québec, Canada
[email protected], [email protected]

Abstract. In this paper we present a new method to find developmental descriptions for gate-level feed-forward combinatorial circuits. In contrast to the traditional description of FPGA circuits, in which an external bit stream explicitly describes the internal architecture and the connections of the circuit, developmental descriptions form the circuit by synchronously running an identical developmental program in each building block of the circuit. Unlike some previous works, the connections are all local here. Evolution is used to find the developmental code for the given problem. We use an innovative fitness function to increase the performance of evolution in its search for solutions, and also relax the position and order of the inputs and output(s) of the circuit to increase the density of the solutions in the search space. The results show that the chance of finding a solution can be increased by up to 375% compared to the use of the traditional fitness function. The preliminary studies show that this method is capable of describing basic circuits and is easily scalable for modular circuits.

Keywords: Developmental Program, Evolutionary Hardware Design, Fitness Function, Scalability.

1 Introduction

Evolvable hardware design (EHW) uses evolutionary algorithms (EAs) to find optimum designs of digital circuits in terms of surface, speed and fault tolerance. EAs can also exploit the physical characteristics of the underlying chip to improve its performance [1][2][3]. Miller [4][5] showed that EHW is also capable of finding innovative designs which outperform traditional human designs in terms of the resources used. While EHW can address issues like efficient surface usage, fault tolerance and innovation, it suffers from an intrinsic drawback of evolutionary algorithms: the solution is usually not scalable. This means that having the solution to a smaller instance of a problem usually does not help to find the solution to a larger instance any faster. Instead, the runtime of the EA usually grows exponentially with a linear increase in problem size. A solution to overcome the scalability issue in EAs is to break the direct mapping between the genotype and the phenotype. If the genotype has a one-to-one mapping to the phenotype, searching for more complex individuals is equal to searching a larger and probably higher-dimensional space. This eventually causes EAs to fail to find solutions to large problems unless a very efficient encoding exists.


Developmental programs that grow into a final circuit do not have this problem. The size of the circuit is not bounded by the size of the developmental program (DP), and it is possible to have one DP grow into fully functional circuits of vastly different sizes. In approaches like CGP [6][7], although the solution is a developmental code that specifies the connections between the cells, an external module is still needed to perform the routing between cells on a physical configurable circuit. We try to eliminate this step by making all the connections local (i.e. cells are only allowed to connect to their immediate neighbor cells). If a primary input or an intermediate signal needs to be routed to a cell far away, the neighborhood cells themselves should form a router to pass that input or signal to the destination cell. Each cell by itself should decide either to be a router or to perform a logical operation on its inputs. In this paper we present a method to implement any combinatorial digital circuit at gate level on a grid of configurable hardware elements. The main contribution of this work is that the resulting circuit includes sufficient information to build the functional circuit, including the gate arrangement and the routing. Keeping in mind that a considerable amount of the resources on configurable hardware (e.g. FPGAs), as well as of the circuit compilation time, is dedicated to routing and connections, this property of our method is attractive for practical problems. We also try to improve the traditional fitness function used in EHW (for example the fitness function used in [5] and [8], or the basic component of the fitness function in [9]) to move toward the optimum solution more efficiently. The improvement to the fitness function is described in detail in Section 3.3.

2 Circuit Structure and the Developmental Program

2.1 Circuit Structure

A circuit here is a two-dimensional array of configurable cells. The inputs are provided through the left-most cells and the outputs are read from the right-most cells. This means that the direction of the signals is from left to right in a high-level abstract view (Fig. 1.a). To implement this, each cell[i][j] (the cell in row i and column j of the circuit) can only accept inputs from cell[i-1][j-1], cell[i][j-1] or cell[i+1][j-1] (Fig. 1.b). This limit on the connections enables the circuit to form without the external routing module needed in approaches such as CGP. In CGP, each cell (m) in the row can be connected to any cell (n) as long as n < m; while that description is enough for the circuit to be implemented, a routing mechanism is still needed for the circuit to physically connect the cell inputs to the other cell outputs. The circuit resulting from our method has no such demand. This means that after each cell sets its own function and its input connections to the adjacent cells, the routing is already done, without the need for any external, central routing mechanism. Each cell in the circuit has an identical developmental program and 5 properties, each of which can be set to an integer. Fig. 2 shows an abstract view of one cell. The cells at the borders of the circuit are named border cells and all their properties are set to -1. For all other cells, the initial value of all properties is 0. Table 1 lists the cell properties and their possible assigned values for non-border cells, and Table 2 lists the equivalent cell function for each value of the "function" property.


Fig. 1. (a) The grid of cells in a circuit; (b) Potential inputs for the cell (i,j); (c) Naming of the neighbors

Fig. 2. An abstract view of one cell, containing the developmental program (Rule#1, Rule#2, ..., Rule#n) and the five properties (Function, Input1, Input2, row, col)

Fig. 3. The structure of one rule

Table 1. The cell properties and their valid values
  Parameter:      Function | Input 1 | Input 2 | row      | col
  Maximum Value:  0-7      | 0-2     | 0-2     | No limit | No limit

Table 2. The output of each cell based on its function value
  "function" value   Cell's output
  0                  0
  1                  Input1
  2                  ~Input1
  3                  Input1 AND Input2
  4                  Input1 OR Input2
  5                  Input1 XOR Input2
  6                  Input1 XNOR Input2
  7                  Input1 NAND Input2
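The function encoding of Table 2 maps directly onto a small lookup; a minimal sketch (the function name is illustrative, inputs are assumed to be bits in {0, 1}):

    def cell_output(function, in1, in2):
        # Table 2: the boolean output of a cell for function values 0-7
        outputs = [
            0,                  # 0: constant 0
            in1,                # 1: route Input1
            1 - in1,            # 2: ~Input1
            in1 & in2,          # 3: AND
            in1 | in2,          # 4: OR
            in1 ^ in2,          # 5: XOR
            1 - (in1 ^ in2),    # 6: XNOR
            1 - (in1 & in2),    # 7: NAND
        ]
        return outputs[function]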

2.2 Developmental Program

The developmental program is stored in the genome. The circuit size is fixed at the beginning and there is no growth in terms of increasing the number of cells in the circuit. The genome is simply a variable number of ordered IF-THEN rules (Fig. 3). The IF part can check any property of any neighboring cell. Based on the value of that property, the rule can set or update any property of the calling cell. The general format of a rule, as shown in Fig. 3, is as follows:


IF the property p of the neighbor n has the relation r to the value a
THEN either assign the value a, or perform the action s on the value b, and assign the result to property p' of the cell

in which p and p' can be any property of a cell (e.g. function, first input connection, etc.), n is the index of the neighbor (0 to 7, for any of the 8 adjacent cells in Fig. 1.c), r is one of the possible relations from Table 3, and a and b are the possible values for p and p', respectively. The list of possible actions on the parameter b is given in Table 4. Only the parameters n, p, r, a, s, b, p' are stored in the genome. For example, the third rule in Table 5 (1 0 1 -1 3 0 0) reads as follows: if the function of neighbor 1 is equal to -1, then the row property of the cell should be set to 0. It is important to keep in mind that the row and col properties of the cell follow the very same regime as the other properties of the cell; i.e. they are initialized to 0 and are only changed by the developmental program. It is possible for them to take any value at the end of the development of the circuit, and they do not necessarily hold the coordinates of the cell in the circuit.

Table 3. The possible relations to be used in each rule of the genome
  Value of "r" in the rule:  0  1  2
  Corresponding relation:    ≠  =  <


Table 4. The possible actions in the THEN part of each rule Value of “s” in the rule Corresponding action

0, 1, 2 Assign b

3 Assign a

4 Assign a+1

5 Assign a-1
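One rule application, following Tables 3 and 4 literally, could be sketched as follows. Note this is illustrative only: the relation for r = 2 is assumed to be "less than" (the symbol was lost in extraction), cells and neighbors are assumed to be lists of the five properties, and the worked example in the text suggests the action encoding may differ from the tables as printed.

    RELATIONS = {0: lambda x, a: x != a,
                 1: lambda x, a: x == a,
                 2: lambda x, a: x < a}   # relation for r = 2 assumed to be '<'

    def apply_rule(rule, cell, neighbors):
        # rule = (n, p, r, a, s, b, p2); properties are indexed 0..4
        n, p, r, a, s, b, p2 = rule
        if RELATIONS[r](neighbors[n][p], a):
            if s in (0, 1, 2):
                cell[p2] = b
            elif s == 3:
                cell[p2] = a
            elif s == 4:
                cell[p2] = a + 1
            elif s == 5:
                cell[p2] = a - 1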

There are 4 pre-written rules in the genome which affect the row and col properties of the cell. These rules are manually designed and added to the genome (Table 5). They aim to simulate the protein gradient along the embryo of multicellular organisms at the axis-specification step [14]. The rest of the rules in the genome are generated randomly using a uniformly distributed random generator, and are tuned in the course of evolution. The number of rules in a genome is limited to 25, plus the 4 pre-written rules.

Table 5. The 4 pre-defined rules in the genome
  Rule index   Rule
  1            1 3 3 -2 3 4 0
  2            3 4 3 -2 4 4 0
  3            1 0 1 -1 3 0 0
  4            3 0 1 -1 4 0 0

During the development of the circuit, cells update their structure synchronously. A developmental step is composed of updating all the columns of the circuit, starting from the leftmost column and moving one column to the right until the rightmost column is reached.


Updating each column is done by updating the topmost cell in the column and then moving to the next cell below, until the lowest cell in the column is reached. A solution is a genome (i.e. a rule base) which leads the desired behavior to emerge in the circuit after going through a certain number of developmental steps. The number of developmental steps needed for this is determined by the evolution, as is the genome itself.
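The update order described above could be sketched as below, reusing apply_rule from the earlier sketch. The neighbors_of helper and its index order for the 8 adjacent cells (Fig. 1.c) are assumptions; border cells stay fixed at -1.

    def neighbors_of(circuit, i, j):
        # The 8 adjacent cells; the index order of Fig. 1.c is assumed here
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]
        return [circuit[i + di][j + dj] for di, dj in offsets]

    def developmental_step(circuit, genome):
        # Sweep columns left to right and, within each column, cells top
        # to bottom, applying every rule of the genome to each cell
        rows, cols = len(circuit), len(circuit[0])
        for j in range(1, cols - 1):        # border cells are not updated
            for i in range(1, rows - 1):
                for rule in genome:
                    apply_rule(rule, circuit[i][j], neighbors_of(circuit, i, j))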

3 Applying Evolution to Find the Developmental Program

3.1 User Interface and Problem Statement

We apply evolution as a tool to find the solution to the given circuit design problem. As explained in Section 2.2, the solution is a developmental program of the format described in that section, the necessary number of steps for the circuit development, as well as the size of the circuit. Note that the developmental program itself does not provide or care about the size of the circuit: any developmental program can be run on a circuit of any size. It is evolution's task to find the appropriate circuit size for the developmental program. To define a specific problem, the user has to state the number of inputs, the number of outputs, and the mapping between the input patterns and the output(s). The latter is done by telling the program the set of minterms created on each output pin. No information about the circuit's possible internal architecture is provided by the user. For example, the following lines define a full adder:

  Number of inputs: 3; Number of outputs: 2
  output[0] = {3, 5, 6, 7} //carry
  output[1] = {1, 2, 4, 7} //sum

Evolution also gives the exact position of each input and output signal on the circuit. Unlike some previous works, in which the user had to fix the position and the order of the input and output signals, evolution is free to find the optimum placement of the I/O signals on the circuit. It is easy to realize that relaxing the I/O interface in this manner increases the density of the solutions in the search space. The simplest evidence of this is that the horizontal flip of a solution circuit is now a solution circuit itself, which would not be the case if the inputs were fixed. The inputs are always provided on the left border (cells [i][0]) and the outputs are read from the right border of the circuit (cells [i][N-1]). Fig. 5 shows a sample full adder found by the program for the above description. It is important to remember that the program does not directly find the circuit in Fig. 5, but a generative code which produces the circuit after going through the developmental process.

3.2 Evolutionary Algorithm

Evolution starts by creating a fixed-size population of random individuals. The population size was 500 in most of our experiments. As Halavati explains in [10], for the evolution of cooperative rule-base systems for static problems in which all the training instances are available, the Pittsburgh approach [11], with each individual carrying the whole rule base, works better than the Michigan approach [12], in which each individual is only one rule and the whole population together forms the rule-base system.


Each individual here is therefore a complete circuit, including the developmental program, the number of developmental steps for that program and the size of the circuit. To create a random individual, we first create a random-sized circuit with the following restrictions (Eq. 1): Number of inputs + 2 …

… if we provide the values 1/(M − L) and 1/(H − M), the computation of the membership value can then be executed with only a floating-point subtractor, a comparator, a multiplier and some multiplexers and selection logic, as shown in Figure 6. Note that, as mentioned above, the whole fuzzification module is pipelined and thus needs some delay elements (the gray blocks in the figure) in order to synchronize the required data at the corresponding pipeline levels.
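As an illustration of the division-free evaluation described above: a triangular membership function with corner points L < M < H is assumed here (the shape and boundary handling are assumptions, since the original formula was lost in extraction), with the reciprocals precomputed so that only subtractions, comparisons and multiplications remain.

    def membership(x, L, M, H, inv_ML, inv_HM):
        # inv_ML = 1/(M - L) and inv_HM = 1/(H - M) are precomputed
        if x <= L or x >= H:
            return 0.0                  # outside the support of the fuzzy set
        if x <= M:
            return (x - L) * inv_ML     # rising edge: 0 at L up to 1 at M
        return (H - x) * inv_HM         # falling edge: 1 at M down to 0 at H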

6 Results

The modeling problem involves discriminating between two categories of patients based on their gene-expression profiles. It admits a relatively high number of variables and, consequently, a huge search space. An initial, exploratory set of software-based evolutionary fuzzy modeling runs and the subsequent analysis showed that many different systems were capable of satisfactorily solving the pursued discrimination problem. Furthermore, we observed that there exist many, radically different, pools of genes that may lead to highly accurate models (i.e., 100% classification with very few rules and variables). This fact, besides being unusual for a fuzzy modeling project, obliged us to redefine our main modeling goal. We thus focused our experiments on detecting highly frequent models and genes across a large number of fuzzy modeling runs in order to unveil common patterns, which implied performing many evolutionary runs. In addition, we took advantage of the multiple evolutionary runs to perform cross-validation analysis. Finally, in order not to be blocked by an excessive computational time, we applied our hardware diagnosis-finding system to this problem. Concretely, the sample database consisted of 1016 biomarker values for 32 patients suffering (or not) from cancer. To assess overfitting in the data, we conducted our evolutionary runs with only 31 patients and used the remaining one for cross-validation. We thus executed 3200 successive runs, i.e. 100 runs with each one of the patients used as the cross-validator.


We considered only the resulting systems that achieved a 100% correct classification on the 31 patients and also a correct prediction on the left-out case (68% of the 3200 runs). The evolutionary runs were conducted with a population size of 300 individuals, an elitism value of 1 (the best individual is copied to the next generation), a rank-based probabilistic selection of the ancestors participating in the crossover, and a probability of mutation of 1/400 for each bit of the genome (793 bits). The fitness was defined as (specificity + 0.8·sensitivity)/1.8 if the system is not yet a perfect classifier, and as 1 + f(size) otherwise. Note that in about 400 generations each run gave rise to perfect classifiers, and the remaining generations were thus used to reduce the size of the system (the number of used rules and biomarkers). This procedure gave us 2176 perfect classifiers, each of them containing specific biomarker combinations (with 97% of them using fewer than 5 biomarkers). From this data we could study the frequency of appearance of specific biomarkers, or specific pairs of biomarkers, across all the selected runs. This gave us some hints about the significance of the different biomarkers in the diagnosis of the disease, but this discussion lies beyond the scope of this paper.
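A minimal sketch of this fitness definition follows; how f(size) rewards smaller rule bases is not specified above, so it is passed in as a parameter, and the function name is illustrative.

    def fuzzy_fitness(sensitivity, specificity, f_of_size):
        if sensitivity == 1.0 and specificity == 1.0:
            # Perfect classifier: remaining generations minimise system size
            return 1.0 + f_of_size
        return (specificity + 0.8 * sensitivity) / 1.8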

7 Speedup

We have defined our system using different kinds of pipelining and parallelization so that its execution is really fast. At the HW/SW design level, we have several parallel data flows using specific threads and hardware resources, and each of them is structurally pipelined (see Figure 4). One level below, as shown in Figure 5, we have designed our computational cores to also be pipelined, comprising a fuzzification stage, an implication stage and an aggregation/defuzzification stage. Moreover, as can also be seen in this figure, each of these stages contains some parallelism in itself (e.g. the membership functions, respectively the rules computation, as well as the multiplications in the defuzzification block, are all computed concurrently). Finally, at the lowest level of the system, we also implemented some pipelining behaviors (see Figures 5 and 6). With the efficient use of the above-mentioned hardware acceleration techniques, we implemented and exhaustively tested our system with the setup described in the previous section. It exhibited a speedup of about 150 with respect to a standard C++ software implementation. To give an order of magnitude: for this specific modelling project, the computation of 3200 evolutionary runs, each of them consisting of 1000 generations with 300 individuals (i.e. 300 fuzzy systems encoded following the explanations given in Section 3.1), for the 1016-biomarker, 32-patient database, takes about 9.3 hours on our system, a time that compares very well with the roughly two months required to perform the same computation with the software-only implementation.

8 Conclusion

In addition to finding fuzzy systems, based on a database of biomarker samples, that are able to give a valid diagnosis for an unknown patient, the computational speedup exhibited by our implementation enables us to also use our system to generate statistically representative measures.


Indeed, when repeating the evolutionary process of finding a valid fuzzy system several thousand times, we end up with several thousand sets of biomarkers allowing a correct disease diagnosis. Analyzing these sets then enables us to propose to the biologists the most significant biomarkers for a specific disease, out of all the ones that have been sampled within the database. This information can then be used by biologists to give them some hints about particular relations between specific biomarkers and the disease itself. Moreover, this frequency-based analysis enables us to further reduce the size of the database, diminishing the number of biomarkers that have to be measured to correctly detect the presence of the disease in a patient. To summarize, we can say that the evolutionary process used within our system enables us to quite "easily" find simple and accurate fuzzy systems for the diagnosis of specific diseases. Moreover, the use of fuzzy logic to realize the diagnosis systems yields linguistically meaningful predictive systems that are understandable by biologists. Finally, the hardware implementation of the computational core of our system, and the great resulting speedup, allows statistically representative information to be obtained in a tractable time (as opposed to a full software implementation). The combination of evolutionary techniques, fuzzy logic systems and specific hardware design has thus proven its great potential in answering several kinds of complex biological questions.

References 1. Zadeh, L.A.: Fuzzy sets. Information and Control 8, 338–353 (1965) 2. Zadeh, L.A.: The concept of a linguistic variable and its applications to approximate reasoning. Information Science, Parts I 8,199–249, II 8, 301–357, III 9, 43–80 (1975) 3. Mamdani, E.H.: Application of fuzzy algorithms for control of a simple dynamic plant. Proc. of the IEE 121(12), 1585–1588 (1974) 4. Mamdani, E.H., Assilian, S.: An experiment in linguistic synthesis with a fuzzy logic controller. Int. Journal of Man-Machine Studies 7(1), 1–13 (1975) 5. Sugeno, M., Kang, G.T.: Structure identification of fuzzy model. Fuzzy Sets and Systems 28(1), 15–33 (1988) 6. Takagi, Y., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Systems, Man and Cybernetics 15, 116–132 (1985) 7. Zadeh, L.A.: A fuzzy-set-theoretic interpretation of linguistic hedges. Cybernetics and Systems 2(3), 4–34 (1972) 8. Yager, R.R., Filev, D.P.: Essentials of fuzzy modeling and control. John Wiley & Sons, New York (1994) 9. Mendel, J.M.: Fuzzy logic systems for engineering: A tutorial. Proc. of the IEEE 83(3), 345–377 (1995) 10. Pena-Reyes, C.-A., Sipper, M.: Fuzzy CoCo: Balancing accuracy and interpretability of fuzzy models by means of coevolution. In: Accuracy Improvements in Linguistic Fuzzy Modeling. Studies in Fuzziness and Soft Computing, vol. 129, pp. 119–146 (2003)

Automatic Code Generation on a MOVE Processor Using Cartesian Genetic Programming James Alfred Walker, Yang Liu, Gianluca Tempesti, and Andy M. Tyrrell Intelligent Systems Group, Department of Electronics, University of York, Heslington, York, YO10 5DD, UK {jaw500,yl520,gt512,amt}@ohm.york.ac.uk

Abstract. This paper presents for the first time the application of Cartesian Genetic Programming to the evolution of machine code for a simple implementation of a MOVE processor. The effectiveness of the algorithm is demonstrated by evolving machine code for a 4-bit multiplier with three different levels of parallelism. The results show that 100% successful solutions were found by CGP and by further optimising the size of the solutions, it is possible to find efficient implementations of the 4-bit multiplier that have the potential to be “human competitive”. Further analysis of the results revealed that the structure of some solutions followed a known general design methodology.

1

Introduction

In the past decade, evolvable hardware has attracted interest from both circuits design and evolutionary computation. Generally there are two main branches of applications: optimising the elementary parameters of a circuit [2], and more interestingly, creating a circuit from smaller compositional units [4,11]. In the latter, Cartesian Genetic Programming (CGP)[5] has demonstrated great potential to create combinatorial circuits. However, previous evolutionary digital circuit designs are limited by the fine-grained nature of the devices. Gate level programmability provides the system with high flexibility but also enlarges the overall search space and thus increases the degree of complexity tending to reduce the overall functionality that can be achieved. Lifting the granularity from gate level to processor architecture level provides another form of evolutionary medium (substrate). A typical coarse-grained programmable machine is a general purpose processor (GPP). At this level, Nordin et al proposed a graph-based genetic programming technique, known as Linear Genetic Programming (LGP), to automatically generate machine code for GPPs [7]. The behaviour of a processor is expressed by executing the evolved instructions. However, except for conventional computer applications, GPPs are not always feasible due to performance reasons or the overall manufacturing cost. Usually, a highly customised computing architecture is more suitable for specific domains. Between the fine-grained and coarse-grained architectures, another architecture exists, which is known as Application Specific Instruction Processors (ASIP). G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 238–249, 2010. c Springer-Verlag Berlin Heidelberg 2010 

Automatic Code Generation on a MOVE Processor

239

As the name suggests, an ASIP usually performs application oriented functionalities and the instructions are coded for particular operations. In order to reduce the complexity of the decoder logic and the length of the execution pipeline, an operation of an ASIP is usually an atomic control of the internal data flow of the processor. In this paper, a simple implementation of an ASIP, called MOVE, is evolved focussing on automatic machine code generation using Cartesian Genetic Programming (CGP). Section 2 reviews the transport triggered architecture (TTA). Section 3 describes the CGP algorithm for code generation. Section 4 presents a demonstration of how to create a particular function using a sequence of evolved instructions. Finally, Section 5 concludes the paper and proposes the future work.

2

Transport Triggered Architecture

The transport triggered architecture (TTA) was created in the 1980’s, as a development of the Very Long Instruction Word (VLIW) architecture ([1]). A MOVE processor is an instantiation of the TTA. It has been used in other bio-inspired systems, such as the POEtic Project [9,6], because of its simplicity and modularity. It has also been adopted by the SABRE1 project [3], as high modularity provides intrinsic fault tolerance. In this section, the basic components and typical features of the TTA are briefly reviewed. The TTA usually contains an instruction decoder, an interconnection network and a number of functional units (FU), as shown in Fig. 1. Functional units and the decoder are connected through data and address buses. A single data/address bus pair is called a slot. A processor can have multiple slots to allow parallelism. An input/output interface of a functional unit is called a port. Ports are usually globally addressable. The connection of a port to a slot is called a socket and the number of conjunctions in a socket is flexible. In the TTA, the instruction decoder is a unit that fetches code from the memory, decodes the instructions and controls the interconnecting network. The

Fig. 1. Transport triggered architecture [6] 1

Self-healing cellular Architectures for Biologically-inspired highly Reliable Electronic systems.

240

J.A. Walker et al.

Fig. 2. Instruction format

decoder has a program counter (PC) which points to the address of next instruction in the memory. The structure of a decoder is generally very simple, because instructions of a TTA only contain two types of information: the source port (SRC) address (or intermediate value) and the destination port (DST) address, as shown in Fig. 2. A source address and a destination address together indicate a data flow from one port to another. From this perspective, the TTA has similar characteristics to both control flow and data flow machines. Compared with conventional RISC, CISC or VLIW architectures, which are also called operation triggered architectures (OTA), TTA only has one instruction, which is: move destination, source. Some destination ports only store the incoming data, whilst others receive the data and also trigger the operation. Operations are not explicitly expressed in the instructions, but implicitly referred to by the address of the destination port. Once the decoder retrieves the addresses from an instruction, it will open the corresponding source and destination ports. In order to illustrate the implicit operations, we compare the TTA and RISC instruction formats. Most single RISC instructions can be represented by: opcode, result, operand0, operand1. It can be decomposed into 3 separate move operations, as shown in Table 1. The ports are presented by the name of a functional unit with subscripts. The triggered inputs of the functions are denoted xxxt and the non-triggered inputs are denoted by a number, for example, xxx0 . The outputs of the functions are denoted as xxxr for the result. There are some interesting features which distinguish TTA from conventional architectures. Firstly, a TTA processor does not need to explicitly transport the result from a FU to a general purpose register, unless the data is still useful after the next operation on the same FU has started. As shown on the right column in Table 1, a result of an FU can be directly transported into another FU as an operand. Therefore, there is a high probability that a TTA processor will use Table 1. Examples of RISC and TTA instructions. The adder, subtractor and registers are denoted by addx , subx , and r1 to r3, respectively) RISC

TTA

add r2, r2, r1

move move move move move move

sub r3, r3, r2

Optimised TTA add0 , r1 addt , r2 r2, addr sub0 , r2 subt , r3 r3, subr

move add0 , r1 move addt , r2 move sub0 , addr move subt , r3 move r3, subr

Automatic Code Generation on a MOVE Processor

241

fewer general purpose registers than conventional OTA processors. Secondly, the granularity of functional units is highly flexible. The bitwidth and the number of data buses can range from 1 up to hundreds, and the functionality can range from a simple boolean operation up to a complicated integration transform. However, the instruction format and the bus control mechanism remains the same.

3

Generating MOVE Code Using CGP

There are various approaches that use GP techniques to evolve machine code for processors. For example, LGP is deliberately designed for OTA machines [7]. As mentioned in section 2, most RISC instructions can be decomposed into 2 or 3 MOVE instructions. Due to the intrinsic relationship of LGP and its target OTA machines, it is appropriate to use LGP to generate a RISC code and then translate this into a series of MOVE instructions. However, it may need an extra optimiser to bypass the redundant use of general purpose registers during or after the translation. Alternatively, CGP does not favour in any specific hardware architecture. Therefore, it is possible to apply CGP directly to a TTA. CGP was originally developed by Miller and Thomson [5] for the purpose of evolving digital circuits. It represents a program as a directed graph (that for feed-forward functions is acyclic). The benefit of this type of representation is that it allows the implicit re-use of nodes, as a node can be connected to the output of any previous node in the graph, thereby allowing the repeated re-use of sub-graphs. This is an advantage over tree-based GP representations (without ADFs) where identical sub-trees have to be constructed independently. Originally, CGP used a program topology defined by a rectangular grid of nodes with a user defined number of rows and columns. However, later work on CGP showed that it was more effective when the number of rows is chosen to be one [12]. This one-dimensional topology is used in this paper. In CGP, the genotype is a fixed length representation consisting of a list of integers which encode the function and connections of each node in the directed graph. However, CGP uses a genotype-phenotype mapping that does not require all of the nodes to be connected to each other, this results in the program (phenotype) being bounded but having variable length. Thus there maybe genes that are entirely inactive, having no influence on the phenotype, and hence the fitness. Such inactive genes therefore have a neutral effect on genotype fitness. This phenomenon is often referred to as neutrality. The influence of neutrality in CGP has been investigated in detail [5,12] and has been shown to be extremely beneficial to the efficiency of the evolutionary process on a range of problems. In this paper, the CGP genotype decodes to a phenotype that resembles a linear string of MOVE instructions. This is similar to the idea of applying CGP to the lawnmower problem [10]. However, the main difference between the work in [10] and this paper is how the linear string of instructions is constructed. In [10], the terminals to the CGP program represent the control instructions (such as move forward, turn right, etc) and the function set consisted of program nodes and other manipulative functions (such as vector addition) to assemble

242

J.A. Walker et al.

Fig. 3. A CGP genotype and corresponding phenotype for a string of MOVE instructions. The inactive areas of the genotype and phenotype are shown in grey dashes.

the control instructions. In this paper, the terminals to the CGP program don’t actually represent anything, they simply act as a method for terminating the current branch of execution. It is the function of each node that performs the MOVE operations (using values from a lookup table) and it is how the nodes are decoded that produces the linear string of instructions. The CGP genotype is decoded using a recursive approach, which starts at the output terminal and iterates backwards through the graph, ensuring that the first input connection of the node is always evaluated before the second node input. An example is shown in Fig. 3. Once the linear string of instructions has been constructed, it is evaluated by iterating through the sequence and performing each MOVE instruction, which will transfer values between the input, function, and output registers. Once all of the instructions have been performed, it is the value in the output register that is compared with the perfect solution.

4

Experiment

To demonstrate the approach described in section 3, the algorithm is applied to a “4-bit multiplier” problem. The aim of the CGP algorithm is to find an efficient solution to the 4-bit multiplier problem using a MOVE architecture. In order to achieve this, a two stage fitness function is used. Initially, the fitness function evaluates all possible input combinations and performs a summation of the number of instances where the output of the evolved solution differs from that of the perfect solution. Once a solution is found, the fitness function minimises the length of the instruction sequence produced by evolution whilst keeping the functionality of the solution fixed. This is similar to the approach used in [4] for finding efficient logic circuits.

Automatic Code Generation on a MOVE Processor

243

Table 2. The parameters used by CGP for the 4-bit multiplier problem Parameter

Value

Population size 5 Genotype length (nodes/genes) 200/600 Mutation rate (% of genes) 2 Run length (generations) 10,000,000 Number of runs 50

4.1

Parameters

The parameters used for the experiment are shown in Table 2. The CGP algorithm uses the (1 + 4) evolutionary strategy (a population size of 5) that is normally associated with the technique [5,12,10,4]. Each CGP node was allowed two inputs, so that the implicit re-use of nodes in the genotype was possible. The mutation rate was chosen based on previous experience and no crossover operator is used. The run length was chosen to allow the CGP algorithm enough time to find a solution and then optimise the solution size. The permitted MOVE operations for a single slot, which the CGP algorithm is allowed to use for the 4-bit multiplier with this function set are shown in Table 3. The function set used consists of the functional units: addition (two inputs, one output), shift left with carry (one input, two outputs), shift right with carry (one input, two outputs) and a multiplexer (three inputs, one output). All functional units have a triggered input and possibly other non-triggered inputs depending on the functional unit. In addition to the function set, there are also three input registers (the third providing a constant value of ”0”) and an output register. As the 4-bit multiplier produces a 8-bit output, all input, output and function registers are 8-bit. In addition to the notations used to describe the MOVE operations, as mentioned in Section 2, xxxc is used to present the carry out. 4.2

Results

Three versions of the CGP algorithm were run on the 4-bit multiplier problem in order to investigate whether performing MOVE operations in parallel was beneficial to both evolution and the overall solution size. In terms of the number of slots implemented in the MOVE processor, the three different variants of CGP are denoted as sequential for a single-slot, parallel2 for a double-slot and parallel3 for a triple-slot. The parallel2 and parallel3 algorithms were implemented with extended function sets that allowed some of the permitted MOVE operations to occur in parallel. For example, move operations to FUs with two input ports in parallel2 and move operations to FUs with two or three input ports in parallel3. In future work, it is intended to parallelise all possible combinations of permitted MOVE operations.

244

J.A. Walker et al. Table 3. The permitted move operations for the function set Addition (add) in0 in1 in2 addr shlr shlc shrr shrc muxr in0 in1 in2 addr shlr shlc shrr shrc muxr addr

→ → → → → → → → → → → → → → → → → → →

add0 add0 add0 add0 add0 add0 add0 add0 add0 addt addt addt addt addt addt addt addt addt out0

Shift Left (shl) in0 in1 in2 addr shlr shlc shrr shrc muxr shlr shlc

→ → → → → → → → → → →

shlt shlt shlt shlt shlt shlt shlt shlt shlt out0 out0

Shift Right (shr) in0 in1 in2 addr shlr shlc shrr shrc muxr shrr shrc

→ → → → → → → → → → →

shrt shrt shrt shrt shrt shrt shrt shrt shrt out0 out0

Multiplexer (mux) in0 in1 in2 addr shlr shlc shrr shrc muxr in0 in1 in2 addr shlr shlc shrr shrc muxr in0 in1 in2 addr shlr shlc shrr shrc muxr muxr

→ → → → → → → → → → → → → → → → → → → → → → → → → → → →

mux0 mux0 mux0 mux0 mux0 mux0 mux0 mux0 mux0 mux1 mux1 mux1 mux1 mux1 mux1 mux1 mux1 mux1 muxt muxt muxt muxt muxt muxt muxt muxt muxt out0

All three versions of the CGP algorithm were capable of finding 100% successful solutions to the 4-bit multiplier problem within the generation limit. Fig. 4 is a box and whisker plot showing the time taken for all three algorithms to find a successful solution. From the figure, it can be seen that the sequential algorithm performs best on average and that as more MOVE operations are performed in parallel the performance of the algorithm degrades. This could be attributed to the fact that the search space increases in proportion to the number of parallel MOVE operations, as the function sets scale from 69 MOVE operations for the sequential algorithm to 190 and 638 MOVE operations for the parallel2 and parallel3 algorithms respectively. This highlights the trade-off between improving the run-time performance of the solution and increasing the solution complexity. Fig. 5 shows a comparison between the size of the solution when it was first discovered and the size of the efficient solution after optimisation. From Fig. 5(a), it can be seen that on average, allowing parallel MOVE operations decreases the

Automatic Code Generation on a MOVE Processor

245

Fig. 4. The number of generations required to find a solution for the sequential, parallel2 and parallel3 algorithms

(a) Before optimisation

(b) After optimisation

Fig. 5. The number of instructions per solution before (a) and after (b) optimisation for sequential, parallel2 and parallel3

size of the solution found. From Fig. 5(b), it can be seen that all three algorithms reduce the size of the solutions found in Fig. 5(a) between 13.7 and 16.6 times on average, whilst still maintaining the trend that parallelism decreases the size of the solution. This allows for the evolution of efficient and feasible designs for the 4-bit multiplier on the MOVE architecture using both sequential and parallel MOVE operations. The most efficient solutions found for the sequential, parallel2 and parallel3 versions of CGP are shown in Table 4.

246

J.A. Walker et al.

Table 4. The best efficient solutions evolved by the sequential, parallel2 and parallel3 versions of CGP Clock

Sequential

Cycle

Slot 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

in0 in2 in1 shrr muxr shrr shrr muxr shrr shlr addr shrr muxr shlr shlr in1 addr muxr shlr addr

→ → → → → → → → → → → → → → → → → → → →

mux1 mux0 shrt muxt add0 shrt muxt shlt shrt addt add0 muxt shlt shlt addt muxt shlt add0 addt out0

Clock Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14

4.3

Parallel2 Slot 1 in1 in0 shrr shrc addr shrc shlr shrr muxr shlr shrc shlr muxr shrr in0 muxr in1 muxr

→ → → → → → → → → → → → → → → → → →

shrt add0 shrt mux0 shlt muxt add0 shrt mux0 shlt muxt add0 mux0 muxt add0 mux0 muxt out0

Slot 2 |

in0 → addt

| addr → mux1 | muxr → addt | addr → mux1 | muxr → addt | addr → mux1 | muxr → addt | addr → mux1

Parallel3 Slot 1 in0 in1 shlc shlr muxr shrr shlr shlr muxr shrr shlr addr muxr muxr

→ → → → → → → → → → → → → →

shrt shlt mux0 add0 mux0 shrt shlt add0 mux0 shrt add0 addt mux0 out0

Slot 2

Slot 3

| in1 → mux1 | shrc → muxt | muxr → addt | addr → mux1 | shrr → muxt | muxr → addt | addr → mux1 | shrr → muxt | muxr → addt | addr → mux1 | shrr → muxt

Further Analysis of Solutions

In order to determine whether it is possible to extract any general design methodology from the evolved solutions, two results of the sequential algorithm are

Automatic Code Generation on a MOVE Processor

247

Table 5. A Comparison between the structure of an efficient and a left-shift solution discovered by sequential CGP Clock Cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Efficient Solution

Left-shift Solution

in0 in2 in1 shrr muxr shrr shrr muxr shrr shlr addr shrr muxr shlr shlr in1 addr muxr shlr addr

in0 shlr shlr shlr shlr in1 shlr addr shlr shlc muxr shlr addr shlr shlc muxr shlr addr shlr shlc muxr shlr addr shlr shlc muxr

→ → → → → → → → → → → → → → → → → → → →

mux1 mux0 shrt muxt add0 shrt muxt shlt shrt addt add0 muxt shlt shlt addt muxt shlt add0 addt out0

→ → → → → → → → → → → → → → → → → → → → → → → → → →

shlt shlt shlt shlt shlt add0 addt mux1 mux0 muxt shlt addt mux1 mux0 muxt shlt addt mux1 mux0 muxt shlt addt mux1 mux0 muxt out0

compared in Table 5. The left column shows the most efficient solution discovered by the sequential algorithm. However, it is hard to determine whether it follows a general design methodology due to its ad hoc structure. The right column of Table 5, shows another solution discovered by the sequential algorithm. Although the number of instructions is larger than the efficient solution, repetitive patterns (also referred to as building blocks) can be observed throughout the solution. On examining the structure of the solution, it was found to follow a general design methodology known as the left-shift algorithm [8], which is one form of the shift-and-add multiplication algorithms. This general design methodology can be used to generate larger multipliers (i.e. 32-bit) for a MOVE processor. Alternatively, it may be possible to implement some of the discovered building blocks in the CGP function set in order to improve the scalability of the algorithm on larger multipliers. This will be investigated in future work. Generally, a repetitive pattern in a CGP design implies CGP node re-use, which usually facilitates the evolution speed. As previously mentioned, the CGP algorithm did not involve any loop branches (creating a cyclic graph) due to

248

J.A. Walker et al.

the nature of representation. However, node re-use is effectively equivalent to a “for-loop” structure in higher level programming languages because at runtime a for-loop is also executed sequentially. The practical difference between node re-use and a for-loop only resides in their respective static forms, namely the size of the code. The size of the code is also affected by the number of slots available in the MOVE processor. This is very similar to the VLIW architecture. The total program storage is calculated by multiplying the total number of slots by the number of long word instructions. For instance, in Table 4, slots 2 and 3 in Parallel3 are free at the first clock cycle. However, additional “NOP” instructions have to be inserted into the free slots. Therefore, the actual size of the code for the three algorithms in Table 4 is 20, 36 and 42. Although Parallel3 runs faster than the others, it occupies a larger memory space. Speed and memory space are two significant criteria on which to evaluate a piece of machine code.

5

Conclusions and Future Work

This paper has presented for the first time the application of evolving machine code on a MOVE architecture using CGP. The results show that CGP is capable of evolving machine code that consists of sequential and parallel operations for the 4-bit multiplier. It has also been shown that by modifying the fitness function once a solution is found, it was also possible to discover efficient solutions that could potentially be classed as “human competitive”. In order to further our exploration in generating more effective code, there are a number of directions for future work. Firstly, a multi-objective optimisation technique will be introduced to comprehensively evaluate the result, as this would allow us to optimise the solutions for both performance and memory footprint. Secondly, the scalability of the CGP approach will be investigated on larger multipliers (e.g. 8-bit, 16-bit, 32-bit) and other complicated problems, in order to assess the computational feasibility of the approach for “real world” problems. Finally, the implementation of conditional loops should be investigated, as it can drastically affect the size of the code, especially when the number of loops is very large. Also, as MOVE is a highly customised processor, some performance-critical code (for example, a repetitive pattern of instructions) may also be transformed to a hardware functional unit to speed up the execution time.

References 1. Corporaal, H.: Microprocessor Architectures: From VLIW to TTA. John Wiley & Sons, Inc., New York (1998) 2. Hilder, J., Walker, J., Tyrrell, A.: Optimising variability tolerant standard cell libraries. In: IEEE Congress on Evolutionary Computation, CEC (2009) 3. Liu, Y., Timmis, J., Qadir, O., Tempesti, G., Tyrrell, A.: A developmental and immune-inspired dynamic task allocation algorithm for microprocessor array systems. In: Hart, E. (ed.) ICARIS 2010. LNCS, vol. 6209, pp. 199–212. Springer, Heidelberg (2010)

Automatic Code Generation on a MOVE Processor

249

4. Miller, J.F., Job, D., Vassilev, V.K.: Principles in the evolutionary design of digital circuits - part I. Genetic Programming and Evolvable Machines 1(1), 8–35 (2000) 5. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000) 6. Mudry, P.A.: A hardware-software codesign framework for cellular computing. Ph.D. thesis, EPFL (2009) 7. Nordin, P.: Evolutionary Program Induction of Binary Machine Code and its Applications. Ph.D. thesis, Universitat Dortmund am Fachereich Informatik (1997) 8. Parhami, B.: Computer Arithmetic: Algorithms and Hardware Designs. Oxford University Press, New York (2000) 9. Rossier, J., Thoma, Y., Mudry, P.A., Tempesti, G.: MOVE processors that selfreplicate and differentiate. In: Ijspeert, A.J., Masuzawa, T., Kusumoto, S. (eds.) BioADIT 2006. LNCS, vol. 3853, pp. 160–175. Springer, Heidelberg (2006) 10. Walker, J.A., Miller, J.F.: Embedded cartesian genetic programming and the lawnmower and hierarchical-if-and-only-if problems. In: Proceedings of the 2006 Genetic and Evolutionary Computation Conference (GECCO). ACM, New York (2006) 11. Walker, J.A., Miller, J.F.: The automatic acquisition, evolution and reuse of modules in cartesian genetic programming. IEEE Transactions on Evolutionary Computation 12, 397–417 (2008) 12. Yu, T., Miller, J.F.: Neutrality and the evolvability of boolean function landscape. In: Miller, J., Tomassini, M., Lanzi, P.L., Ryan, C., Tetamanzi, A.G.B., Langdon, W.B. (eds.) EuroGP 2001. LNCS, vol. 2038, pp. 204–217. Springer, Heidelberg (2001)

Coping with Resource Fluctuations: The Run-time Reconfigurable Functional Unit Row Classifier Architecture Tobias Knieper1 , Paul Kaufmann1 , Kyrre Glette2 , Marco Platzner1 , and Jim Torresen2 1

University of Paderborn, Department of Computer Science, Warburger Str. 100, 33098 Paderborn, Germany {tknieper,paul.kaufmann,platzner}@upb.de 2 University of Oslo, Department of Informatics, P.O. Box 1080 Blindern, 0316 Oslo, Norway {kyrrehg,jimtoer}@ifi.uio.no

Abstract. The evolvable hardware paradigm facilitates the construction of autonomous systems that can adapt to environmental changes and degrading effects in the computational resources. Extending these scenarios, we study the capability of evolvable hardware classifiers to adapt to intentional run-time fluctuations in the available resources, i.e., chip area, in this work. To that end, we leverage the Functional Unit Row (FUR) architecture, a coarse-grained reconfigurable classifier, and apply it to two medical benchmarks, the Pima and Thyroid data sets from the UCI Machine Learning Repository. We show that FUR’s classification performance remains high during changes of the utilized chip area and that performance drops are quickly compensated for. Additionally, we demonstrate that FUR’s recovery capability benefits from extra resources.

1

Introduction

Evolvable hardware (EHW) denotes the combination of evolutionary algorithms with reconfigurable hardware technology to construct self-adaptive and selfoptimizing hardware systems. The term evolvable hardware was coined by de Garis [1] and Higuchi [2] in 1993. While the majority of EHW related work focus on the evolution of functional correct circuits or circuits with a high functional quality, some authors investigates the robustness of EHW. The related literature spans this area from offline evolution of fault-tolerant circuits able to withstand defects in silicon [3] without increasing circuit’s size significantly [4] or compensating supply voltage drifts [5] by recurrent re-evolution after a series of deteriorating events as the wide-band temperature changes or radiation beams treatments [6,7]. Evolvable hardware has a variety of applications, one of which are classifier systems. A number of studies report on the use of EHW for classification applications such as character recognition [8], prosthetic hand control [9], sonar G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 250–261, 2010. c Springer-Verlag Berlin Heidelberg 2010 

Coping with Resource Fluctuations: The Run-time Reconfigurable FUR

251

return classification [10,11], and face image recognition [10]. These studies have demonstrated that EHW classifiers can outperform traditional classifiers such as artificial neural networks (ANNs) in terms of classification accuracy. For the electromyographic (EMG) signal classification, it has been showed that EHW approaches can perform close to the modern state-of-the-art classification methods such as support vector machines (SVMs) [9]. In this work we focus on robust EHW-based classifiers. The novelty is that we investigate classifier systems able to cope with changing resources at run-time and evaluate their classification performance while changing the size of the utilized chip area. To this end, we leverage the Functional Unit Row (FUR) architecture, a scalable and run-time reconfigurable classifier architecture introduced by Glette et al. [12]. During optimization, we increase and decrease the number of pattern matching elements included in FUR and study the development of the resulting classification accuracy and, specifically, the recovery capability of FUR. In contrast to most previous work that studies self-adaptation in response to stimuli from outside the system, we explicitly build our analysis on the assumption of resource competition between different tasks run inside an adaptable system. The paper is structured as follows: Section 2 presents the FUR architecture for classification tasks, its reconfigurable variant and the applied evolutionary optimization method. Benchmarks together with an overfitting analysis as well as the experiments with the reconfigurable FUR architecture are shown in Section 3. Section 4 concludes the paper and gives an outlook on future work.

2

The Reconfigurable Functional Unit Row Architecture

The Functional Unit Row (FUR) architecture for classification tasks was first presented by Glette in [12]. It is an architecture tailored to online evolution combined with fast reconfiguration. To facilitate online evolution, the classifier architecture is implemented as a circuit whose behavior and connections can be controlled through configuration registers, similar to the approach of Sekanina [7]. By writing the genome bitstream produced by a GA to these registers, one obtains the phenotype circuit which can then be evaluated. In [13], it was shown that the partial reconfiguration capabilities of FPGAs can be used to change the architecture’s footprint. The amenability of FUR to partial reconfiguration is an important precondition for our work. In the following, we present the organization of the FUR architecture, the principle of the reconfigurable FUR architecture, and the applied evolutionary technique. For details about the implementation of FUR we refer to [12]. 2.1

Organization of the FUR Architecture

Fig. 1 shows the overall organization of the FUR architecture. The FUR architecture is rather generic and can be used together with different basic pattern matching primitives [9,10]. It combines multiple pattern matching elements into

T. Knieper et al.

input

252

CCs



decision 

max

...

CDM



Fig. 1. The Functional Unit Row (FUR) Architecture is hierarchically partitioned for every category into Category Detection Modules (CDMs). For an input vector, a CDM calculates the likeliness for a previously trained category by summing up positive answers from basic pattern matching elements: the Category Classifiers (CCs). The CDM with most activated CCs defines the FUR’s decision.

a single module with graded output detecting one specific category. A majority voter decides for a specific category by identifying the module with the highest number of activated pattern matching elements. More specifically, for C categories the FUR architecture consists of C Category Detection Modules (CDMs). A majority vote on the outputs of the CDMs defines the FUR architecture decision. In case of a tie, the CDM with the lower index wins. Each CDM contains M Category Classifiers (CCs), basic pattern matching elements evolved from different randomly initialized configurations and trained to detect CDM’s category. A CDM counts the number of activated CCs for a given input vector, thus the CDM output varies between 0 and M . The architecture becomes specific with the implementation of the CCs. In our case we define a single CC as a row of Functional Units (FUs), shown in Fig. 2. The FU outputs are connected to an AND gate such that in order for a CC to be activated all FU outputs have to be 1. Each FU row is evolved from an initial random bitstream, which ensures a variation in the evolved CCs. The number of FU rows defines the resolution of the corresponding CDM. input pattern FU2

...



FUn AND

...

FU1

Fig. 2. Category Classifier (CC): n Functional Units (FUs) are connected to an n-input AND gate. Multiple CCs with a subsequent counter for activated CCs define a CDM.

a c

a>c

constant configuration input selection

253

MUX

input pattern

MUX

Coping with Resource Fluctuations: The Run-time Reconfigurable FUR

function selection

Fig. 3. Functional Unit (FU): The data MUX selects which of the input data to feed to the functions “>” and “≤”. The constant c is given by the configuration lines. Finally, a result MUX selects which of the function results to output.

The FUs are reconfigurable by writing the architecture’s register elements. As depicted in Fig. 3, each FU behavior is controlled by configuration lines connected to the configuration registers. Each FU has all input bits to the system available at its inputs, but only one data element (e.g., one byte) is selected. This data is then fed to the available functions. While any number and type of functions could be imagined, Fig. 3 illustrates only two functions for clarity. In addition, the unit is configured with a constant value, c. This value and the input data element are used by the function to compute the output of the unit. Based on the data elements of the input, the functions available to the FU elements are greater than and less than or equal. Through experiments these functions have shown to work well, and intuitively this allows for discriminating signals by looking at the different amplitudes. 2.2

Reconfigurable FUR Architecture

The notion of Evolvable Hardware bases on circuit optimization and reconfiguration. EHW-type adaptable systems improve their behavior in response to system internal and external stimuli, offering an alternative to classically engineered adaptable systems. While the adaptation to environmental changes represents the main research line within the EHW community, the ability to balance resources dynamically between multiple concurrent applications is still a rather unexplored topic. One the one hand, an EHW module might run as one out of several applications sharing a system’s restricted reconfigurable resources. Depending on the current requirements, the system might decide to switch between multiple applications or run them concurrently, albeit with reduced logic footprints and reduced performance. We are interested in scalable EHW modules and architectures that can cope with such changing resource profiles. On the other hand, the ability to deal with fluctuating resources can be used to support the optimization process, for example by assigning more resources when the speed of adaptation is crucial. The FUR architecture precisely fits this requirement as its structure can be changed (disregarding the register-reconfigurable FUs) along three dimensions, namely the number of

254

T. Knieper et al.

– categories, – FU rows in a category, and – FUs in a FU row. In this work we assume the numbers of categories and FUs in a FU row as constants reconfiguring the number of FU rows in a CDM. This is illustrated in Fig. 4. For a sequence I = {i1 , i2 , . . . , ik } we evolve a FUR architecture having i1 FUs per CDM, then switching to i2 FUs per CDM and re-evolving the architecture without flushing the configuration evolved so far. The key insights we want to gain by this investigation are the sensitivity of the FUR architecture measured in the classification accuracy to changes in the resources and the time for re-establishing near asymptotic accuracy quality. CDM 1

CC 1

CC 2

CC 2

CDM C ...1 CC CC 2

...

...

CC M

CC M

CC M

CC M+1

CC M+1

CC M+1

...

...

...

add CCs

CC 1

...

...

remove CCs

CDM 2

Fig. 4. Reconfigurable Functional Unit Row Architecture: The FUR architecture is configured by the number of categories, FU rows and FUs per FU row. In our work we fix the number of categories and FUs per FU rows while changing the number of FU rows per CDM.

2.3

Evolution of FUR Architecture

To evolve a FUR classifier we employ a 1 + 4 ES scheme. In contrast to previous work [12], we do not use incremental evolution evolving CDMs and FU rows separately but evolve the complete FUR architecture in a single ES run. The mutation operator is configured to mutate three genes in every FU row. In preparation to the experiments on the reconfigurable FUR architecture we investigate FUR’s general performance by evaluating it on a set of useful FU rows per CDM and FUs per FU row configurations. The performance is calculated by a 12-fold Cross Validation (CV) scheme.

3

Experiments and Results

In this section we present two kinds of results. Initially, we analyze FURs behavior by successively testing a range of parameter combinations. Combined with an overfitting analysis we are then able to picture FUR’s complete behavior for

Coping with Resource Fluctuations: The Run-time Reconfigurable FUR

255

a given benchmark. Afterwards, we select a good-performing configuration to investigate FUR’s performance, when being reconfigured during run-time. For this experiment we define multiple FUR architecture configurations with varying number of FU rows and plot the accuracy development, when switching between the configurations. 3.1

Benchmarks

For our investigations we rely on the UCI machine learning repository [14] and specifically, on the Pima and the Thyroid benchmarks. Pima, or the Pima Indians Diabetes data set is collected by the John Hopkins University in Baltimore, MD, USA and consists of 768 samples with eight feature values each, divided into a class of 500 samples representing negative tested individuals and a class of 268 samples representing positive tested individuals. The data of the Thyroid benchmark represents samples of regular individuals and individuals suffering hypo- and hyperthyroidism. Thus, the samples are divided into 6.666, 166 and 368 samples representing regular, subnormal and hyper-function individuals. A sample consists of 22 feature values. The Pima and the Thyroid benchmarks don’t rely on high classification speeds of EHW hardware classifiers, however, these benchmarks have been selected because of their pronounced effects in the run-time reconfiguration experiment revealing FUR’s characteristics. 3.2

Accuracy and Overfitting Analyses

We implement FUR’s parameter analysis by a grid search over the number of FU rows and number of FUs. For a single (i, j)-tuple, where i denotes the number

0.78

Pima (30,8): training vs. test accuracy

0.76

test accuracy

0.74 0.72 0.7 0.68 0.66 0.64 0.62 0.6 0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

training accuracy

Fig. 5. Overfitting analysis: In this example the test and training accuracies would be roughly 0.76 and 0.76, respectively.

T. Knieper et al.

0.88 0.86 0.84 0.82 0.8 0.78 0.76 0.74

0.95 0.9 accuracy

accuracy

256

0.85 0.8 0.75 0.7 0.65 0.6

0

10

20

30

40

50

FURows

60

70

80 2

4

6

8

10

12

14

16

18

20

0 FUs

10

20

30

40

50

FURows

70

80 2

4

6

8

10

14

16

18

20

FUs

(b)

1

1

0.99

0.99

accuracy

accuracy

(a)

60

12

0.98 0.97

0.98 0.97

0.96

0.96

0.95

0.95

0.94

0.94

0

10

20

30 FURows

40

50

60

(c)

70

80 2

4

6

8

10

12

14

16

18

20

0 FUs

10

20

30 FURows

40

50

60

70

80 2

4

6

8

10

12

14

16

18

20

FUs

(d)

Fig. 6. Pima and Thyroid overfitting analysis: Best generalization and the according termination training accuracies for the Pima (a) (b) and the Thyroid (c) (d) benchmarks, respectively.

of FU rows and j the number of FUs, we evolve a FUR classifier by running the evolutionary algorithm for 100.000 generations. As we employ a 12-fold cross validation scheme, the evolution is repeated 12 times while alternating the training and test data sets. During the evolution we log for every increase in the training accuracy FUR’s performance on the test data set. The test accuracies are not used while the evolution runs. To detect the test accuracy where the FUR architecture starts to approximate the training set tightly and to contemporary lose its ability to generalize, we average the test accuracies logged during the evolutionary runs and select the termination training accuracy according to the highest average test accuracy. This is shown in Fig. 5 for the Pima benchmark and the (30, 8) configuration. The test accuracy, drawn along the y-axis, rises in relation to the training accuracy, drawn along the x-axis, until the training accuracy reaches 0.76. After this point the test accuracy degrades gradually. Consequently, we note 0.76 and 0.76 as the best combination of test and termination training accuracies. To cover the interesting parameter areas and keep the computational effort low we evaluate the Pima and Thyroid benchmarks for 2, 4, 6, . . . , 20 FUs per FU row and for 2, 4, 6, 8, 10, 14, 16, 20, 25, 30, 35, 40, 50, 60, 70, 80 FU rows. Fig. 6 shows the results for both benchmarks. In the horizontal level the diagrams span the parameter area of FU rows and FUs. The accuracy for each parameter tuple is drawn along the z-axis with a projection of equipotential accuracy lines on the horizontal level. While the test accuracies for the Pima benchmark, presented

Coping with Resource Fluctuations: The Run-time Reconfigurable FUR

257

in Fig. 6(a) are largely independent from the number of FUs and FU rows with small islands of improved behavior around the (8, 8 − 10) configurations, the Thyroid benchmark presented in Fig. 6(c) has an performance loss in regions with a large number of FUs and few FU rows. Tables 1 and 2 compare FUR’s results for the Pima and the Thyroid benchmarks to related work. Additionally, we use the data mining tool RapidMiner [15] to create numbers for standard and state-of-the-art algorithms and their modern implementations. To this, we evaluate in a 12-fold cross validation manner the algorithms: Decision Trees (DTs), k-th Nearest Neighbor (kNN), Multi-layer Perceptrons (MLPs), Linear Discriminant Analysis (LDA), Support Vector Machines (SVMs) and Classification and Regression Trees (CART). For the Pima benchmark our architecture outperforms any other method. It forms together with SVMs, LDA, Shared Kernel Models and kNNs a group of best performing algorithms within a 3% margin. The accuracy range of the Thyroid-benchmark is much smaller because of the irregular category data size proportions and a single dominant category amounting for 92.5% of the data. In this benchmark our architecture lies 0.66% behind the best algorithm.

Table 1. Pima benchmark: Error rates and standard deviation in %. We use the data mining toolbox RapidMiner [15] to evaluate the algorithms marked by “*”. Preliminary, we identify good performing algorithm parameters by a grid search. Remaining results are taken from [16]. Algorithm Error Rate FUR 21.35 SVM* 22.79 LDA* 23.18 Shared Kernel Models 23.27 kNN* 23.56 GP with OS, |pop|=1.000 24.47 CART* 25.00 DT* 25.13 GP with OS, |pop|=100 25.13 MLP* 25.26 Enhanced GP 25.80 – 24.20 Simple GP 26.30 ANN 26.41 – 22.59 EP / kNN 27.10 Enhanced GP (Eggermont et al.) 27.70 – 25.90 GP 27.85 – 23.09 GA / kNN 29.60 GP (de Falco et al.) 30.36 – 24.84 Bayes 33.40

± Standard Deviation 4.84 4.64 2.56 3.07 3.69 3.61 4.30 4.95 4.50

1.91 – 2.26

1.29 – 1.49 0.29 – 1.30

258

T. Knieper et al.

Table 2. Thyroid benchmark: Error rates and standard deviation in %. We use the data mining toolbox RapidMiner [15] to evaluate the algorithms marked by “*”. Preliminary, we identify good performing algorithm parameters by a grid search. Remaining results are taken from [16]. Algorithm DT* CART* CART PVM Logical Rules FUR GP with OS GP BP + local adapt. rates ANN BP + genetic opt. GP Quickprop RPROP GP (Gathercole et al.) SVM* MLP* ANN PGPC GP (Brameier et al.) kNN*

3.3

Error Rate 0.29 0.42 0.64 0.67 0.70 1.03 1.24 1.44 – 0.89 1.50 1.52 1.60 1.60 – 0.73 1.70 2.00 2.29 – 1.36 2.35 2.38 2.38 – 1.81 2.74 5.10 – 1.80 5.96

± Standard Deviation 0.18 0.27

0.51 0.62

0.44

Reconfigurable FUR Architecture Results

In our second experiment we investigate the question of FUR classification behavior under changes in the available resources while being under optimization. We execute for both benchmarks a single experiment where we configure a FUR architecture with 4 FUs per FU row and change the number of FUs every 40.000 generations. We split the data set into disjoint training and test sets analog to the previously used 12-fold cross validation scheme and start the training of the FUR classifier with 40 FU rows. Then, we gradually change the number of employed FU rows to 38, 20, 4, 3, 2, 1, 20, 30, 40 executing altogether 400.000 generations. Fig. 7 shows the results for the Pima benchmark. We observe the following: – The training accuracy drops significantly for almost any positive and negative change in the number of FU rows and recovers subsequently. – While the asymptotic training accuracy is lower when using only few FU rows, the test accuracy tends to reach for any FU row configuration the usual accuracy rate. This behavior is visible from generation 120.000 to 280.000 in Fig. 7 and is confirmed by previous results showed in Fig. 6 (a).

Coping with Resource Fluctuations: The Run-time Reconfigurable FUR

259

– The recovery rate of the test accuracy depends on the amount of FU rows. While for periods with few FU rows the recovery rate is slow, for periods with 20 and more FU rows the evolutionary process manages to recover the test accuracy much faster. Interestingly, the rise of the training accuracy for generations 280.000 to 320.000 results in a falling test accuracy. This could be a statistical effect, where the test accuracy varies in some interval as the classifier is evolved from a random initialized configuration. – The test accuracy is mostly located between 0.6 and 0.7, independent of the changes in the number of FU rows. Thus, and this is the main observation, the FUR architecture shows to a large extent a robust test accuracy behavior under reconfiguration for the Pima benchmark. 0.9 40 38

0.7

20

# FU rows

accuracy

0.8

0.6

4 3 2 1

# FU rows train test 0.5 0

50000

100000

150000

200000 250000 Generations

300000

350000

400000

Fig. 7. The Reconfigurable Pima benchmark: Changing classifier’s resources (number of FU rows) during the optimization run.

Figure 8 presents the results for the Thyroid benchmark. We observe the following: – The training accuracy, similar to the Pima results, drops significantly when changing the number of FU rows. – As anticipated by previous results showed in Fig. 6 (c), the test accuracy drops for FUR architecture configurations with very few FU rows. This can be observed in Fig. 8 at generations 120.000 to 280.000. – Because of the uneven distribution of category data sizes the test accuracy deviation is smaller and follows more tightly the development of the training accuracy.

260

T. Knieper et al. 1 0.99

40 38

0.98 0.97

0.95 20

# FU rows

accuracy

0.96

0.94 0.93 0.92 0.91

# FU rows train test

4 3 2 1

0.9 0

50000

100000

150000

200000 250000 Generations

300000

350000

400000

Fig. 8. Reconfigurable Thyroid benchmark: Changing classifier’s resources (number of FU rows) during the optimization run.

– Analog to the observations made by the Pima benchmark, more FU rows increase the test accuracy recovery rate. – The main result is that reconfigurations of the FUR architecture are quickly compensated in the test accuracy. The limitation in the case of the Thyroid benchmark is a minimum amount of FU rows to leverage robust behavior. In summary, as long as the FUR configuration contains enough FU rows, FUR’s test accuracy behavior is stable during reconfigurations. Additionally, more FU rows leverage faster convergence.

4

Conclusion

In this work we propose to leverage the FUR classifier architecture for creating evolvable hardware systems that can cope with fluctuating resources. We describe this reconfigurable FUR architecture and experimentally evaluate it on two medical benchmarks. First, we analyze the overfitting behavior and show that the FUR architecture performs similar or better than state-of-the-art classification algorithms. Then we demonstrate that FUR’s generalization performance is robust to changes in the available resources as long as a certain amount of FU rows is present in the system. Furthermore, FUR’s capability to recover from a change in the available resources benefits from additional FU rows.

Coping with Resource Fluctuations: The Run-time Reconfigurable FUR

261

References 1. de Garis, H.: Evolvable Hardware: Genetic Programming of a Darwin Machine. In: Intl. Conf. of Artificial Neural Nets and Genetic Algorithms, pp. 441–449. Springer, Heidelberg (1993) 2. Higuchi, T., Niwa, T., Tanaka, T., Iba, H., de Garis, H., Furuya, T.: Evolving Hardware with Genetic Learning: a First Step Towards Building a Darwin Machine. In: From Animals to Animats, pp. 417–424. MIT Press, Cambridge (1993) 3. Miller, J., Hartmann, M.: Untidy Evolution: Evolving Messy Gates for Fault Tolerance. In: Liu, Y., Tanaka, K., Iwata, M., Higuchi, T., Yasunaga, M. (eds.) ICES 2001. LNCS, vol. 2210, pp. 14–25. Springer, Heidelberg (2001) 4. Haddow, P.C., Hartmann, M., Djupdal, A.: Addressing the Metric Challenge: Evolved versus Traditional Fault Tolerant Circuits. In: Adaptive Hardware and Systems (AHS), pp. 431–438. IEEE, Los Alamitos (2007) 5. Sekanina, L.: Evolutionary Design of Gate-Level Polymorphic Digital Circuits. In: Rothlauf, F., Branke, J., Cagnoni, S., Corne, D.W., Drechsler, R., Jin, Y., Machado, P., Marchiori, E., Romero, J., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2005. LNCS, vol. 3449, pp. 185–194. Springer, Heidelberg (2005) 6. Stoica, A., Zebulum, R.S., Keymeulen, D., Daud, T.: Transistor-Level Circuit Ex´ periments Using Evolvable Hardware. In: Mira, J., Alvarez, J.R. (eds.) IWINAC 2005. LNCS, vol. 3562, pp. 366–375. Springer, Heidelberg (2005) 7. Sekanina, L.: Evolutionary Functional Recovery in Virtual Reconfigurable Circuits. Journal of Emerging Technologies in Computing Systems 3(2) (2007) 8. Higuchi, T., Iwata, M., Kajitani, I., Iba, H., Hirao, Y., Manderick, B., Furuya, T.: Evolvable Hardware and its Applications to Pattern Recognition and FaultTolerant Systems. In: Sanchez, E., Tomassini, M. (eds.) Towards Evolvable Hardware 1995. LNCS, vol. 1062, pp. 118–135. Springer, Heidelberg (1996) 9. Glette, K., Gruber, T., Kaufmann, P., Torresen, J., Sick, B., Platzner, M.: Comparing Evolvable Hardware to Conventional Classifiers for Electromyographic Prosthetic Hand Control. In: Adaptive Hardware and Systems (AHS), pp. 32–39. IEEE, Los Alamitos (2008) 10. Yasunaga, M., Nakamura, T., Yoshihara, I.: Evolvable Sonar Spectrum Discrimination Chip Designed by Genetic Algorithm. In: Systems, Man and Cybernetics, vol. 5, pp. 585–590. IEEE, Los Alamitos (1999) 11. Glette, K., Torresen, J.: A Flexible On-Chip Evolution System Implemented on a Xilinx Virtex-II Pro Device. In: Moreno, J.M., Madrenas, J., Cosp, J. (eds.) ICES 2005. LNCS, vol. 3637, pp. 66–75. Springer, Heidelberg (2005) 12. Glette, K., Torresen, J., Yasunaga, M.: An Online EHW Pattern Recognition System Applied to Face Image Recognition. In: Giacobini, M. (ed.) EvoWorkshops 2007. LNCS, vol. 4448, pp. 271–280. Springer, Heidelberg (2007) 13. Torresen, J., Senland, G., Glette, K.: Partial reconfiguration applied in an on-line evolvable pattern recognition system. In: NORCHIP 2008, pp. 61–64. IEEE, Los Alamitos (2008) 14. Asuncion, A., Newman, D.: UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences (2007) 15. Mierswa, I., Wurst, M., Klinkenberg, R., Scholz, M., Euler, T.: YALE: Rapid Prototyping for Complex Data Mining Tasks. In: Intl. Conf. on Knowledge Discovery and Data Mining (KDD), pp. 935–940 (2006) 16. Winkler, S.M., Affenzeller, M., Wagner, S.: Using Enhanced Genetic Programming Techniques for Evolving Classifiers in the Context of Medical Diagnosis. In: Genetic Programming and Evolvable Machines, vol. 10(2), pp. 111–140. 
Kluwer Academic Publishers, Dordrecht (2009)

A Self-reconfigurable FPGA-Based Platform for Prototyping Future Pervasive Systems Jean-Marc Philippe, Benoˆıt Tain, and Christian Gamrat CEA, LIST, Embedded Computing Laboratory, Point Courrier 94, Gif-sur-Yvette, F-91191 France [email protected]

Abstract. The progress in hardware technologies lead to the possibility to embed more and more computing power in portable, low-power and low-cost electronic systems. Currently almost any everyday device such as cell phones, cars or PDAs uses at least one programmable processing element. It is forecasted that these devices will be more and more interconnected in order to form pervasive systems, enabling the users to compute everywhere at every time. This paper presents a FPGA-based self-reconfigurable platform for prototyping such future pervasive systems. The goal of this platform is to provide a generic template enabling the exploration of self-adaptation features at all levels of the computing framework (i.e. application, software, runtime architecture and hardware points of view) using a real implementation. Self-adaptation is provided to the platform by a set of closed loops comprising observation, control and actuators. Based on these loops (providing the platform with introspection), the platform can manage multiple applications (that may use parallelism) together with multiple areas able to be loaded on-demand with hardware accelerators during runtime. It can also be provided with self-healing using a model of itself. Finally, the accelerators implemented in hardware can learn how to perform their computation from a software golden model. Focusing on the low-level part of the computing framework, the paper aims at demonstrating the interest of self-adaptation combined with collaboration between hardware and software to cope with the constraints raised by future applications and systems.

1

Introduction

Thanks to the continuous technology shrink, computer designers are able to embed more and more computing power in almost every object of everyday life. Additionally, these objects are meant to be more and more interconnected, letting people entering the ubiquitous or pervasive computing era (many computers per 

This work was supported and funded by the European Commission under Project ÆTHER No. FP6-2004-IST-4-027611.

G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 262–273, 2010. c Springer-Verlag Berlin Heidelberg 2010 

A Self-reconfigurable FPGA-Based Platform for Prototyping

263

person) after the mainframe era (many people, one computer) and the personal computer era (one computer per person) [1]. For example, these communicating resources can be found in modern cars (which can use more than 60 connected embedded CPUs) as well as in cell phones or laptops and even in clothes (e.g. wearable computing). The pervasive systems formed by these networked devices provide the users with invisible services that enable them to compute everywhere and at every time. Based on that, one can observe the evolution of already existing applications and the emergence of new applications requiring more and more portable computing power (such as mobile television). These applications also put a lot of constraints on the underlying computing resources: besides the necessary high computing power, low-power as well as fault-tolerance and the ability to compute highly heterogeneous data flows are seen as important for future computing devices. One solution to face these different constraints is to take advantage on the dynamic adaptability of modern reconfigurable architectures. Being able to change on the fly the working parameters and structure of a computing device enables it to be adapted to a lot of application domains as well as to the different needs of the surrounding users [2]. Unfortunately, the high numbers of both constraints and possible states of the computing system make it difficult, and even impossible to manage with traditional control mechanisms. A possible solution to this problem is to embed into the system the necessary abilities and knowledge to enable it to manage itself: a way to observe its state and its environment, a way to take decisions and a way to apply these decisions in order to change its state to better fit the environment. This basic behavior is know as self-adaptation. Based on the fact that the pervasive systems are based on a very high number of computing resources, it is obvious to also provide these resources with the ability to share information or even tasks. This collaborative behavior is also seen as very important [3]. The exploration of the different techniques enabling both embedded self-adaptation and collaboration is a very complex research subject. This paper presents a self-reconfigurable FPGA-based platform used to prototype solutions based on closed loops for making hardware and software collaborate seamlessly so as to ease the work of pervasive system designers. The rest of this paper is organized as follows. Section 2 presents different works based on self-adaptation as well as the concept behind the paper. The proposed self-reconfigurable platform is introduced in Section 3 from the hardware and software points of view. Section 4 deals with the chosen test applications for evaluating the platform. Before concluding we present the experimental results in Section 5.

2

Related Works on Self-adaptation and Context

Self-adaptation can be defined as the ability of a system to adapt to its environment by allowing its components to monitor this environment and change their behavior in order to preserve or improve the operation of the system according

264

J.-M. Philippe, B. Tain, and C. Gamrat

to some defined criteria. This definition is related with either the modification of some parameters that define the working point of the device (e.g. the power supply voltage and the clock frequency) or the modification of the structure of the architecture (both at software and hardware levels). For example, self-adaptation can be implemented in order to provide the architecture with fault-tolerance by enabling the monitoring of temperature for the detection of transient hot spots that may damage some parts of a chip [4]. Preventing chip damages as well as self-repairing some runtime defects (e.g. caused by electromigration) is a promising idea for future computing architectures that can monitor some of the variables that characterize their state [5]. At a higher level, a controller can observe the task the architecture has to do so as to select and download the more efficient partial bitstream to implement the requested computation [6]. More advanced self-X features can also be implemented by introducing self-placement and self-routing properties to an architecture which can autonomously modify its structure in order to achieve fault detection and fault recovery [7]. In the ÆTHER project, a general model of a basic computing entity that aims to be networked with other entities of the same type to form complete systems was introduced [3]. Each of these entities is meant to be self-adaptive, which implies that they can change their own behavior to react to changes in their environment or to respect some given constraints. As shown in Fig. 1, the Self-Adaptive Networked Entity, or SANE in short, is a self-contained component composed of mainly four parts. The first one is the computing engine, dedicated to data processing. It can be adapted to the wide range of algorithms that the SANE system is able to compute. The second part is the observer which is responsible for monitoring the computing process and some runtime parameters related to the environment as well as the chip. This observation process enables the SANE to be aware of itself, of its environment, and of its computing performance related to the loaded task. The role of the controller part is to take all the decisions related to the ongoing computation task. The closed loop composed of the monitoring process associated with an adaptation controller provides the SANE with the self-adaptation ability. The last part of the SANE is the communication interface, dedicated to

Processed data

Data

Computing Engine

Observer

Controller

Communication Interface

Goals, constraints Implementations of tasks Collaboration

Fig. 1. Functional view of the SANE

A Self-reconfigurable FPGA-Based Platform for Prototyping

265

collaboration between the SANEs. The collaboration process is done through a publish/discover mechanism that allows a SANE to publish its abilities and to discover the computing environment formed by the other SANE in its neighborhood. This mechanism enables the SANEs to exchange their tasks or just to clone their states to other SANEs [8].

3

Description of the Prototyping FPGA-Based Platform

In order to study the properties of the above mentioned SANE model, a generic physical prototype was implemented. This section describes both the hardware and software sides of the platform prototype. It also gives an overview on the chosen task allocation mechanism (one possible service of the adaptation controller of the SANE) for hiding hardware complexity from the application point of view. 3.1

Hardware Part of the Prototype

The platform is based on a Virtex-4 FPGA (Xilinx ML402 board) which has self-reconfiguration abilities thanks to the Internal Configuation Access Port (ICAP) of some Xilinx FPGAs. The platform is partitioned into one static area containing a Microblaze (32-bit RISC core) for controlling the platform and four dynamically and partially reconfigurable (DPR) areas as it is shown in Fig. 2. On the current implementation, each area is composed of 3192 LUT, 20 DSP and 20 RAMB16 blocks (maximum available resources for one operator).

Fig. 2. High-level view of the platform including the static area (Microblaze), four dynamic hardware areas and the floorplan of the platform on Xilinx PlanAhead

266

J.-M. Philippe, B. Tain, and C. Gamrat

Fig. 3. Standardized interface for all operators

The Xilinx Partial Reconfiguration Early Access tools were used with both EDK (Embedded Design Kit)and PlanAhead for generating the static and partial bitstreams. All the input and output ports of the different hardware accelerators cross the boundary between the static part and the dynamic part using bus macros which are used to lock the routing between dynamic and static regions. For reusability, they were encapsulated into an interface core (see Fig. 2) which allows the designer to create a new operator without caring about bus macros since they are already placed in this interface core. It also provides a standardized interface between the Microblaze and the operators since it is based on FSL (Fast Simplex Link). The interface also provides a direct connection of the operator to external devices of the board (such as a VGA camera in the histogram equalization application shown in section 4) as it is shown in Fig. 3. The data transmission protocol consists of a data valid signal for indicating that the data present on the link is ready to be read. Different SANE prototypes can be linked using the Ethernet connection available on the board. 3.2

Software Part of the Prototype

The software part of the platform is managed by the Petalinux distribution of μCLinux [9]. μCLinux is a port of the Linux kernel to support embedded processors without memory management unit (MMU) (Microblaze and Petalinux have the MMU support since respectively version 7 and 0.30 with 2.6 Linux kernel). From an application programming perspective, Petalinux offers an interface almost identical to standard Linux, including command shells, C library support and Unix system calls, C and C++ compilers as well as execution of software applications with POSIX threads, and the use of the μCLinux ICAP driver for dynamically setting the configuration of the four DPR areas from the software. 3.3

Management of the Platform: The Allocation Controller

The accesses to the ICAP (for internal self-reconfiguration) as well as the allocation of the DPR areas to the different applications are controlled an application called Allocation Controller (AC). It manages the configuration of the platform based on its exclusive access to the ICAP: the reconfigurations are done by the AC based on both requests from the different computing applications and the

A Self-reconfigurable FPGA-Based Platform for Prototyping

267

Fig. 4. High-level structure of the management of the platform. The AC exchanges commands with the applications and configures the different hardware areas.

availability and configuration of the different DPR areas. For this purpose, the AC has an internal model of the platform which provides it with self-awareness. This internal representation (which indicates in its first version if a DPR area is free and the identifier of the loaded partial bitstream) is updated when new accelerators are requested. The AC sends the identifiers of the allocated DPR areas to the requesting applications, enabling their computing threads to seamlessly use dedicated accelerators loaded on demand by the AC. For this purpose, the AC has both an access to the ICAP and to a local bitstream repository which is located in the DDR memory of the board (see Fig. 4). The communication between applications and the AC use semaphores (to lock the computing resources) and a shared memory. A semaphore on the AC allows only one communication with the AC at a time. In case two threads (possibly from different applications) request a resource to the AC, the second thread is suspended until the AC has finished the first allocation and released the semaphore (the current implementation is based on a first come first served algorithm). There are also semaphores for the management of the shared memory, and for the hardware reconfigurable slots (e.g. when all DPR areas are in use, requesting computing threads are suspended). The shared memory is used by the computing applications and the AC to exchange request and allocation commands. Before creating a computing thread, the application sends a request command composed of the identifier of the requested hardware accelerator to be loaded by writing to the shared memory. The allocation controller compares the identifier of the requested accelerator and the identifier of the accelerators implemented on the FPGA. If the identifiers are different the AC reconfigures one area with the appropriate bitstream. Then the AC answers the requesting thread by sending the identifier of the assigned hardware area. When the thread has finished, the requesting application releases the DPR area by sending a command to the AC. The commands are composed of 32-bit words (4 octets) comprising the command identifier that is sent by the applications to the AC (request an area to be loaded by a given operator or release the area when the corresponding computing thread has finished). The second octet stores the identifier of an hardware area. This octet is written by the AC to send the identifier of the assigned DPR

268

J.-M. Philippe, B. Tain, and C. Gamrat

area to the applications and by the application to release a hardware area when the computing thread has finished. The third octet is used to store the requested operator identifier. It is written by the application and read by the AC. Finally, the fourth octet is used to store the pid of the requesting application (not used at this time). 3.4

Access from Software Threads to the Accelerators

Once the allocation is performed, the communication between the computing threads and the DPR areas is done directly through the FSL with nputfsl and ngetfsl macros. The μCLinux FSL drivers are not used since they take a lot of CPU cycles. Due to the static nature of the nputfsl and ngetfsl macros regarding the FSL identifier to be used, one C computing function per hardware area was needed for the applications. The right function is chosen depending on the hardware resource allocated by the AC.

4

Test Applications

Two applications were implemented to illustrate the possibilities of the platform. The first one is a simple image contrast processing function and the second one is an optical character recognition (OCR) application customized to use both software recognition and hardware self-adaptable accelerators. 4.1

Histogram Equalization

The first test application is an image enhancement application based on histogram equalization which is used to increase the global contrast of an image by allowing a better distribution of the pixel intensities on the histogram (see Fig. 5). The external data ports of the histogram equalization operator on the ML402 board are linked to a daughter board with a camera which are part of the Xilinx Video Starter Kit.

(a)

(b)

(c)

Fig. 5. Pictures showing the original and the badly contrasted pictures at the input of the operator and the enhanced picture at the ouput: (a) Original picture, (b) Badly contrasted picture, (c) Picture after histogram equalization)

A Self-reconfigurable FPGA-Based Platform for Prototyping

269

The histogram equalization needs one hardware area on the platform. When started, the application requests a DPR area to the AC and then performs proper initialization of the loaded operator through the related FSL. When the user stops the application, it sends a command to the AC to release the hardware resource. This application also features a closed loop based on the observation of the mean of the pixel intensities of the input picture. As shown in Fig. 5, when the mean of pixel intensities is sufficiently high, the histogram equalization is not used (picture a) and when this mean reaches a level under a user-defined threshold, histogram equalization is performed to enhance input pictures (pictures b and c). 4.2

Optical Character Recognition

Optical Character Recognition (OCR) is used in pervasive applications to provide the computing system with data to process in a user-friendly way (e.g. in future healthcare environments, for collecting important information on business cards, for online translation of foreign languages, etc.). OCR is popular since it enables a computing system to use the same information human beings process: printed letters and numbers (contrary to RFID tags and barcodes which are not human readable). Another advantage is that using printed information to identify properties of objects is very cheap and uses already available information. The GNU Ocrad OCR application (version 0.17) [10], which was used as a basis for the application, is composed of steps such as pre-processing for enhancing the input picture (binary threshold, picture transformations such as crop, scale, etc.) and analyzing the page layout to find columns, detecting text blocks and then for each text block, detecting text lines. For each text line, Ocrad detects the characters and finally recognize them. The possibility of using hardware accelerators was added to Ocrad for research purposes. The hardware accelerator is composed of an operator that computes the cross-correlation between an input character and a set of masks corresponding to the different letters. If the maximum cross-correlation is above a certain threshold, the letter is recognized. Another modification of the original Ocrad application was the parallelization at the character-level using POSIX threads (for each character in a line, the character recognition function is called by a pthread).

5

Results and Analysis of the Platform

This section presents the measurements that were realized using the abovepresented platform as well as an analysis on memory requirements. 5.1

Size of the Operators and Reconfiguration Time

Table 1 gives the physical resources required for the two hardware operators used in the test applications. The given percentages are given compared to the total amount of available resources in one of the DPR areas. Depending on the routing and the placement of the operator on each of the four DPR areas, the

270

J.-M. Philippe, B. Tain, and C. Gamrat Table 1. Hardware resources required for the two operators

LUT FF SLICE M/L DSP16 RAMB16

Histogram Equalization 1607 (50%) 1385 (43%) 491 (62%) 0 2 (10%)

OCR 876 (27%) 959 (30%) 293 (37%) 6 (30%) 2 (10%)

size of the partial bitstreams may slightly vary. However, for the different partial bitstreams generated by PlanAhead, the size is around 170 kB. The reconfiguration time is considered as the amount of time between the moment the AC receives a request command from one application through the shared memory and the moment it writes the identifier of the allocated area in the shared memory. This time also includes the retrieving of the requested partial bitstream from the on-board memory and its writing to the allocated area through the ICAP. The mean reconfiguration time was measured to be 13 ms for a 170kB partial bitstream, which means that the effective reconfiguration bandwidth is around 104Mbits/s (transfer of the bitstream from the DDR memory to the configuration memory of the chip). 5.2

Speed-Up of OCR Thanks to Hardware Support

In this custom version of OCR, the set of masks is built during runtime : the hardware learns from the software how to recognize letters. The algorithm is the following: each computing thread always calls an hardware accelerator first and then the original software version if no letter is recognized by the hardware. At the beginning of the execution, the set of masks is empty. When the first input letter needs to be recognized, the accelerator is called and returns the fact that no letter was recognized so the software version is called. It recognizes the letter and the system uses the input picture of the character as the corresponding mask for future executions (it is stored in the set). During execution, new letters are added to the set. If the cross-correlation is high enough, the character is recognized by the hardware and the software version is not called (see Fig. 6) thus speeding up character recognition (see Fig. 7). Fig. 7 shows the accelerator self-adaptation through the learning process. It was obtained by measuring the execution times of the OCR application on a text of five different lines for three configurations of the platform (pure software, one hardware accelerator, four hardware accelerators). One can notice that for the first line, the software version and the hardware version with only one accelerator have the same execution time. For the second line, the execution time of the hardware version with one hardware accelerator is more than two times

A Self-reconfigurable FPGA-Based Platform for Prototyping

271

Fig. 6. Algorithm of the hardware accelerator based on cross correlation between the input character and a set of masks

Fig. 7. Execution times of the OCR application for different configurations of the platform. The lines come from a standard text in English and are different but comparable in size. The first line enables the hardware accelerators to learn how to recognize letters from the software application: at the beginning of the second line, most of the letters are in the set of masks. For the second line and the following ones, the execution time is decreased.

faster than the software version. In fact, during the execution, the number of masks used by the hardware accelerator increases thanks to the learning process. This implies that the probability of hardware recognition also increases, so as the recognition speed. This learning property provided by a closed loop between the software golden model and the hardware accelerators can be applied to other languages (or to image recognition algorithms), since only pictures of letters are stored in the set of masks. By changing the golden model of computation (i.e. the software version of OCR), the computation can be changed and the hardware accelerators can evolve to a new configuration.

272

5.3

J.-M. Philippe, B. Tain, and C. Gamrat

Using the AC for Self-healing

Another prototyped closed loop is related to self-healing. By providing the hardware areas with an observer which probes their state, the AC is also used as a self-healing enabler since it enables the platform to recover from an external corruption of one of the hardware areas. As a demonstration example, while the system is running, one of the used hardware area can be reconfigured with a blank bitstream via the JTAG interface of the board (so that μCLinux is not aware of this modification). By reading the loaded operator identifier and by using internal timeouts, the AC can be aware of a problem and automatically reconfigure the problematic area with a fresh requested operator. This property is particularly interesting to prototype other self-healing mechanisms and to either simulate or physically implement some failures on the chip to assess their efficiency. For example, the AC can read back any loaded partial configuration so as to compare it with a reference using a hash function. If the checksum is not correct, the AC can refresh the configuration and test it again. After several different recovering techniques, it can invalidate the DPR area if all tests fail. 5.4

Analysis of the Platform

The experiments showed that the main issues to solve concern memory requirements. Due to the use of semaphores for synchronization, the memory footprint of all applications noticeably increased. Future work will consist in finding some other ways to implement synchronization mechanisms to improve the efficiency of the platform. The other memory issue is directly linked to the static nature of both the place and route process of FPGAs and the software interface to the operators. For each hardware operator, four partial bitstreams were generated and need to be stored in the memory. As already written, from the software point of view, four software functions needed to be written per application since nputfsl and ngetfsl macros only take static FSL identifier as a parameter. This issue is directly linked with FPGA tools and the way they manage the place and route process. Different challenges regarding this problem are tackled by other work such as the Erlangen Slot Machine [11] or through online routing [12]. But this management is out of the scope of our current research on closed loop prototyping since these are more seen as self-adaptation enablers for reconfigurable hardware.

6

Conclusions

In this paper, we presented a self-reconfigurable platform based on an FPGA that aims at prototyping solutions to the issues raised by future pervasive applications (e.g. providing systems with self-adaptation as studied in the ÆTHER FP6 project [13]). It is used to study how to include mechanisms to simplify the use of hardware possibilities from traditional applications. This is done through an allocation mechanism based on a shared memory which enables the running applications to request hardware accelerators to an allocation controller. The

A Self-reconfigurable FPGA-Based Platform for Prototyping

273

platform also features a low-cost self-healing mechanism thanks to the AC. This platform is used to prototype self-adaptation concepts studied in the ÆTHER project such as hardware adaptation using reconfigurable architectures, monitoring and control loops, task delegation mechanisms, application deployment and information exchange between hardware entities thanks to publish / discovery mechanisms. It was one of the main blocks of the final ÆTHER demonstration where a number of such boards exchanged information and computing tasks through Ethernet and WiFi, enabling this system to optimize its behavior.

References 1. Krikke, J.: T-engine: Japan’s ubiquitous computing architecture is ready for prime time. IEEE Pervasive Computing 4(2), 4–9 (2005) 2. Satyanarayanan, M.: Pervasive computing: vision and challenges. IEEE Personal Communications 8(4), 10–17 (2001) 3. Danek, M., Philippe, J.-M., Bartosinski, R., Honzk, P., Gamrat, C.: Self-Adaptive Networked Entities for Building Pervasive Computing Architectures. In: Hornby, G.S., Sekanina, L., Haddow, P.C. (eds.) ICES 2008. LNCS, vol. 5216, pp. 94–105. Springer, Heidelberg (2008) 4. Mukherjee, R., Mondal, S., Ogrenci Memik, S.K.: Thermal Sensor Allocation and Placement for Reconfigurable Systems. In: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, ICCAD (2006) 5. Sylvester, D., Blaauw, D., Karl, E.M.: ElastIC: An Adaptive Self-Healing Architecture for Unpredictable Silicon. In: IEEE Design & Test of Computers, November 2006, pp. 484–490 (2006) 6. Lagger, A., Upegui, A., Sanchez, E., Gonzalez, I.: Self-Reconfigurable Pervasive Platform for Cryptographic Application. In: Proceedings of the International Conference on Field Programmable Logic and Applications, FPL 2006 (2006) 7. Soto Vargas, J., Moreno, J.M., Madrenas, J., Cabestany, J.: Implementation of a Dynamic Fault-Tolerance Scaling Technique on a Self-Adaptive Hardware Architecture. In: Proceedings of the International Conference on Reconfigurable Computing and FPGAs, pp. 445–450 (2009) 8. Jesshope, C.R., Philippe, J.-M., van Tol, M.: An Architecture and Protocol for the Management of Resources in Ubiquitous and Heterogeneous Systems Based on the SVP Model of Concurrency. In: Berekovi´c, M., Dimopoulos, N., Wong, S. (eds.) SAMOS 2008. LNCS, vol. 5114, pp. 218–228. Springer, Heidelberg (2008) 9. Williams, J.: Embedded Linux as a platform for dynamically self-reconfiguring systems-on-chip. In: The International Conference on Engineering of Reconfigurable Systems and Algorithm (2005) 10. Diaz Diaz, A.: Ocrad - The GNU OCR, http://www.gnu.org/software/ocrad/ 11. Majer, M., Teich, J., Ahmadinia, A., Bobda, C.: The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-based Computer. Journal of VLSI Signal Processing Systems 47(1), 15–31 (2007) 12. Paulsson, K., Hbner, M., Becker, J., Philippe, J.-M., Gamrat, C.: On-line Routing of Reconfigurable Functions for Future Self-Adaptive Systems - Investigations within the AETHER Project. In: Proceedings of the International Conference on Field Programmable Logic and Applications (FPL 2008), pp. 415–422 (2008) 13. The AETHER project web page. The AETHER consortium (2006), http://www.aether-ist.org

The X2 Modular Evolutionary Robotics Platform Kyrre Glette and Mats Hovin University of Oslo, Department of Informatics, P.O. Box 1080 Blindern, 0316 Oslo, Norway {kyrrehg,matsh}@ifi.uio.no

Abstract. We present a configurable modular robotic system which is suitable for prototyping of various robotic concepts and a corresponding simulator which enables evolution of both morphology and control systems. The modular design has an emphasis on industrial configurations requiring solidity and precision, rather than rapid (self-)reconfiguration and a multitude of building blocks. As an initial validation, a three-axis industrial manipulator design has been constructed. Evolutionary experiments have been conducted using the simulator, resulting in plausible locomotion behavior for two experimental configurations.

1

Introduction

The construction of a robotic system by assembly of several instances of a general base module (and some auxiliary modules) may have multiple advantages over custom built systems. By having general components, one may construct a variety of robotic configurations with different functionality without having to redesign the whole system each time. This saves design effort, as well as enabling robot builders without in-depth knowledge about electronics and mechanical design, to assemble robotic systems. Such a system is ideal in the case of short-term student projects where focus is on robot behavior rather than the underlying hardware details. In addition, the reuse of parts from design to design is inherent in the modular robot principle, allowing for potential cost savings. Another cost saving factor could come from the production of several identical parts, allowing for savings both in purchase and production processes. However, few demonstrations of real cost savings have so far been shown [1], although some approaches, such as the Molecubes project, attempt to address this issue [2]. Several modular robotic systems offer fast reconfiguration possibilities, from simple manual connection mechanisms [3,2] to more advanced active connection mechanisms allowing self-reconfiguration [4,5,6]. On the other hand, modular robots are also emerging in industry, such as the UR-6-85-5-A industrial robot from Universal Robots [7]. Here, a more fixed structure is employed, while several advantages, such as flexibility, low cost, and ease of development are retained. Modular robotics can have several advantages of being combined with evolutionary methods [8,9]. Firstly, the general robot modules scan be suitable as G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 274–285, 2010. c Springer-Verlag Berlin Heidelberg 2010 

The X2 Modular Evolutionary Robotics Platform

275

building blocks for an evolutionary design process, giving the algorithm flexibility in terms of constructing various morphologies. Secondly, constructing a robot from modular building blocks, especially when these are fine-grained, may pose a complex design challenge. In such cases an evolutionary search may be able to find solutions which would be hard to find by traditional manual design methods. Furthermore, a multi-objective evolutionary approach could be helpful in finding solutions which take into account not only performance but also factors such as cost and robustness. An efficient simulation of the robotic system is of high importance for the flexibility and efficiency of an evolutionary search, however, in many cases the gap between the simulator and the real world, often referred to as the reality gap [10], may reduce the validity of the evolved simulated solutions. In this paper we present the early results of a developed modular robotic system, coupled with a simulator and an evolutionary framework. The robot design consists of one actuated core module and some static auxiliary modules. At the time being we have not focused on self-reconfiguration or active connection mechanisms between parts, but rather envisioning the system as a tool for prototyping of robotic concepts. This gives a longer lifetime for a given configuration, allowing for solid, fixed connection mechanisms. Moreover, we would like to have a design which is relatively easy to build, thus, in contrast to the abovementioned systems, we avoid having a custom designed printed circuit board. Instead we employ cheap and commonly available off-the-shelf components as much as possible, with the exception of the 3D-printed housing. The modules have been designed with an introductory robotics course in mind, emphasizing a revolute joint which allows simple kinematic calculations. However, it is envisioned that the modular system will also allow for more advanced experiments, such as robotic locomotion, with an increased ease of use and production than earlier approaches by the authors [11]. It is also envisioned that the rigid body structure of the proposed robot will be combined with soft polymer parts, and taking such aspects into account in the simulation by utilizing advanced features of the physics engine. The utilization of such features has been explored by Rieffel et al. in [12], and has also been partially been implemented in the simulation and explored by the authors in [13]. The paper is structured as follows. Section 2 describes the robotic system, while Section 3 describes the setup for some preliminary experiments. The experimental results are given in Section 4 and discussed in Section 5. Finally, Section 6 concludes the paper.

2

The X2 Robotic System

The X2 robotic system is modular and consists of an actuated core module and some static auxiliary modules. Each core module is independently controlled and has several connection possibilities, which allows for flexible configuration of the robot structure.

276

K. Glette and M. Hovin

Fig. 1. Exploded view of the X2 core module, showing the inner rotational core in yellow and the outer shell in white. The motor can be seen on the underside and the microcontroller board and rotational encoder on top.

2.1

Core Module

The core module is the main building block of the system, and consists of two main parts: an inner core and an outer shell, which together function as a revolute joint. In addition, the module contains a motor, a rotational sensor, and control electronics. See Fig. 1. The inner core of the core module can have a rotational movement which is actuated by the side-mounted motor through a belt. One reason for choosing an inner rotational core is the high stability offered by such a design, with high sideways support from the shell. In addition the motor axle is offloaded and does not have to directly drive a plastic part. From our experiences, driving plastic parts with high local forces may result in tearing of the plastic material. The belt also offers a gear ratio and is in a sense an alternative to a prohibitively expensive harmonic drive gearing mechanism. Further robotic modules can be attached to either the inner core or the shell in order to obtain a revolute joint functionality. For motor control an Arduino ATmega328 microcontroller board is fitted, performing sensor input, motor actuation and communications with a host. For feedback to the control loop an optical incremental rotation encoder is fitted, belt-driven by the inner core. The planetary geared 12V DC servo motor is controlled via a motor driver board. Communications to the host interface is performed over USB cable, using a USB to TTL converter chip. Each module has its own USB and power cable sockets. The core module structural parts are mainly produced by rapid prototyping technology, in this case as direct outputs from a 3D printer. Some auxiliary parts are molded in silicone from 3D-printed molds, such as a flexible pen holder tip and motor sleeves. The rest of the components, like ball bearings, belts, and the electronics, are commercial off-the-shelf parts.

The X2 Modular Evolutionary Robotics Platform

2.2

277

Configuration

Because of the modular design a large number of structural configurations are possible, limited mainly by the number of parts available. In addition, since each module is individually powered and controlled, a high number of modules could result in an impractical number of cables. Modules are connected with a number of screws for solidity, while still making reconfiguration possible with a little effort.

Fig. 2. The X2 configured as a standard three-axis manipulator arm. Left: equipped with tool. Upper middle: closeup of the microcontroller board and the rotation encoder. Lower middle: closeup of cable connectors and DC motor. Right: experimental configuration, see Sec. 3.2.

The initial configuration is the standard three-axis manipulator arm shown on the left in Fig. 2, inspired by industrial manipulators. This configuration is being used for educational purposes in a robotics course and is the only configuration actually built so far. The setup includes a base for fixing the robot to the ground, an extension arm module and a flexible silicone tip for holding a pen, or a hard tip for holding other tools such as a milling tool. One possible configuration could be a four-legged setup as shown in Fig. 3. This robot is planned built for experiments on walking and climbing behavior, but exists at the time of writing only as a simulation model. The configuration allows for up to three degrees of freedom for each of the four legs, and needs in this case twelve core motorized modules, plus some helper modules. 2.3

Simulation

A robot simulator has been developed, based on the PhysX [14] physics library and using OpenGL for visualization, see Fig. 3 for a screenshot. The PhysX library is primarily developed for real-time applications such as computer games, and some features (cloth, soft bodies, fluids) can be hardware accelerated by a graphics processing unit (GPU) through the CUDA framework. At the moment the X2 robot does not utilize any of the abovementioned features, but it is

278

K. Glette and M. Hovin

Fig. 3. The X2 configured as a four-legged robot. Image from simulator.

planned to include support for soft bodies in order to simulate soft (silicone molded) parts of the robot. The cloth feature is supported for the simulation of artificial muscles, and is described in [13]. The modules are simulated as dynamic rigid bodies, and are constrained by revolute joints which are limited to rotation along one single axis. Simple primitives, such as boxes, capsules, and spheres, are combined in order to simulate the shapes of the modules, however a polygonal mesh can be loaded and visualized on top of these primitives in order to improve the visual presentation.

3

Robot Control and Experimental Configuration

This section describes the experimental control system and evolvable robot configurations. 3.1

Controller Model

control value

For the following robot configurations, a relatively simple trigonometry-based function has been chosen for controlling the joint movements. An illustration of a period of the controller curve for one joint can be seen in Fig. 4. The attack parameter decides the time between t0 and t1 , pause0 the time between t1 and t2 , and decay the time between t2 and t3 , and pause1 the time between t3 and t4 . The

t0

t1

t2 time

t3

t4

Fig. 4. Example period of the controller output

The X2 Modular Evolutionary Robotics Platform

279

controller then repeats the same curve in a cyclic fashion, with a given frequency. All joints share the same frequency, but have different curve parameters, as well as individual phase shifts, φ. The frequency for the joint controllers are kept low in the simulator in order to not exceed the angular speed of the real robot joints, as well as to encourage static locomotion. We believe the chance of successfully transferring the control to the real robot is higher when avoiding dynamic locomotion, such as jumping behavior. 3.2

Experimental Configuration 1

Initially, we would like to investigate the possibilities of evolving locomotion behavior in the simulation phase, with transfer to a real robotic setup, and a second-phase evolutionary tuning, in mind. Therefore, for the first experiment a fixed and simple morphology has been chosen as it will be closest to the existing hardware setup and thus the first possibility for real-life validation. The configuration is based on the educational configuration as described in Sec. 2.2. However, the robot is fixed to a small platform and the tip is equipped with a high friction object for making it possible for the robot to pull itself, including the platfrom, forward – see Fig. 2 for the real configuration and Fig. 5 for the simulator configuration. In addition, only two of the joints are enabled, which makes it impossible for the robot arm to turn around the vertical axis. Evolution is in this case performed on a basic controller curve for each joint, and the initial angle and amplitude of the movement. This can be described in the genome encoding with 5 controller curve parameters, and the minimum and maximum angle of the movement, as follows (number of bits in parentheses): attack (8) pause0 (8) decay(8) pause1 (8) φ(8) min.angle(8) max.angle(4) This is then multiplied by the number of joints and decoded to floating-point values from a binary encoding. Appropriate ranges for each of the parameters have been chosen in order to avoid unstable configurations. The total length of the genome counts 107 bits: unused (3) joint0 (52) joint1 (52)

(a)

(b)

Fig. 5. Simulator screenshot of configuration 1, shown with a polygonal mesh representation (a) and the underlying simulation primitives (b).

280

3.3

K. Glette and M. Hovin

Experimental Configuration 2

As a second experiment, we would like to investigate the capabilities of the simulator by introducing a more complex morphology and possibilities for evolution controlling morphology parameters in addition to control. This is a more interesting scenario in terms of being able to evolve morphology in the first phase of the evolution, as well as investigating the scalability of the evolutionary algorithm. The configuration is based on the four-legged configuration described in Sec. 2.2, however the length of the tip as well as the length of the arm between the second and third joints are adjustable. The tip and arm length are equal for all legs for stability reasons, but to challenge the evolutionary search no symmetry is taken into account for the joint controllers, giving a total of 8 individual controller parameter sets. The joint coding follows the same style as in the first configuration, however the bit precision is changed in some places: attack (6) pause0 (6) decay(6) pause1 (6) φ(8) min.angle(6) max.angle(7) The total genome can then be described as follows, counting 379 bits: unused (3) arm length(8) tip length(8) joint0 (45) ... joint7 (45) 3.4

Evolution

For both of the robotic configurations, the fitness function is the average speed at which the robotic phenotypes are able to move along one axis. The phenotypes are evaluated during 3000 simulation steps for the first configuration and 4000 for the second, where one simulation step corresponds to 1/60 s of simulated time. Moving backwards and falling over gives zero fitness score. For the evolutionary runs, the GAlib library [15] has been employed, running the ”simple” genetic algorithm (GA) as described in [16]. The evolution runs have been run for 250 generations, with a population size of 50 and a two-point crossover probability of 0.4 for all experiments. The bit flip mutation probability has been set to the reciprocal of the number of bits in the genome.

4

Results

This section describes the experimental results and the current status of the hardware development. 4.1

Evolution Runs

Simulated evolution runs were carried out using the settings as described in 3.4. The fitness curves are plotted in Fig. 6. The entire evolutionary process took 3 hours 56 minutes for the first configuration and 22 hours 11 minutes for the second configuration. This corresponds to an average individual evaluation time of 6.4 seconds per individual for the second configuration, as opposed to the 66.7 seconds of simulated time. Note that this number is somewhat improved because some evaluations could be cut off at an early stage due to falling or similar behavior.

The X2 Modular Evolutionary Robotics Platform 1.8

5

1.6

4.5

281

4

1.4

3.5 1.2

Fitness

0.8

2.5 2

0.6 1.5 0.4

1

0.2

0.5 elite population average

elite population average

0

0 0

50

100

150

200

250

0

50

100

Generations

150

200

250

Generations

(a)

(b)

Fig. 6. Best and average population fitness plots for the first(a) and second (b) configuration. Note the different scale on the vertical axes.

4.2

Locomotion Results

The best individuals from each of the simulated evolution runs have been evaluated qualitatively as well as measuring the motor controllers’ output and positions of selected parts during the locomotion process. For the first configuration, a successful forward pulling motion was achieved, and a plot of the best controller can be seen in Fig. 7. In the phase where the actual forward movement took place, the tip was pushed to the ground in such a way that most of the platform was elevated from the ground while moving. One can also observe from the figure that a slight backwards movement took place at the end of each advancement, which was due to a slight backwards push in the process of lifting the tip.

2 1.5

control value / position

Fitness

3 1

1 0.5 0 -0.5 -1 lower joint upper joint x pos. y pos.

-1.5 -2 10

12

14

16

18 time (s)

20

22

24

26

28

Fig. 7. Controller plot of evolved solution for the first configuration. Upper joint refers to the angular control values (from the initial starting angle) for the joint nearest the tip. High values signify the limbs moving towards the ground. x and y position values have been scaled with different constants for readability.

282

K. Glette and M. Hovin right front leg left front leg right rear leg left rear leg

50

60

70

80

90

100

110

120

time (s)

Fig. 8. Second configuration leg tip heights over time, scaled with different factors and translated for readability. In the actual gait some legs were lifted higher than others.

For the second configuration the best individual managed to move forward in a cyclical manner, however the movement seemed somewhat unstable, and in some cases a slight change in the initial conditions could cause the robot to fall over. A plot of leg positions can be seen in Fig. 8. The evolved morphology parameters, tip length and arm length, are summarized in Tab. 1. Table 1. Evolved arm and tip lengths description range best ind. arm length [12.0,18.0] 15.5 tip length [8.5,17.0] 15.6

4.3

Hardware System Status

Currently, 3 core modules have been manufactured, as can be seen in Fig. 2, and simple tests have been carried out to verify the functionality of the actuation. Furthermore, the first experimental configuration has been assembled with the necessary auxiliary modules. However, the motor control system and communications have not yet been fully implemented on the microcontroller, and as such the evolved behavior from the simulator cannot yet be tested on the hardware system.

5

Discussion

By observing the elite fitness curves in Fig. 6, it seems like the advances are more frequent in the evolution of the first configuration, which is expectable since the fitness landscape is expected to be significantly less difficult than for the second configuration. This is also strenghtened by the observation that the average fitness improves more over time for the first configuration. It is however

The X2 Modular Evolutionary Robotics Platform

283

a bit unexpected to see that the final evolved solution seems to be suboptimal in the sense that there is a slight backwards movement for each cycle. This may be caused by the evolutionary search getting stuck in a local optimum, however, inaccurate friction simulation may possibly also play a role. The best solution obtained for the second configuration seems slightly awkward and suboptimal by visual inspection, however, a relatively fast movement is obtained. The evolved controllers were not entirely symmetrical with respect to the left and right legs, but still, by looking at the peaks in Fig. 8, one can discern similarities to a crawler gait as described for instance in [17]. It is therefore interesting to observe, that even when purposefully not building in symmetry into the controllers, a variation of the static crawler gait is obtained. When observing the evolved tip and arm lengths of the second configurations, one can see that while they are high, the maximum allowed values are not chosen. The reason for this could be that while longer limbs offer potentially faster locomotion, very long limbs make it hard to find a control system which can keep balance. In order to evolve more stable solutions, an individual could be tested under varying conditions, such as walking in slopes and traversing obstacles. While the proposed control system is simplistic, it does not seem to be a major limitation for the current experiments. However, for further studies in evolving locomotion it would be interesting to look into the use of central pattern generators such as in [9], as well as a tighter coupling between the morphology and the control system, and the addition of sensor inputs. We have so far not been able to test the evolved solutions on the real robotic system, however it is expected that the reality gap will be present to at least some extent. Even when directing the search towards static locomotion there may be issues such as the distribution of mass, friction, and more which could perturb the transfered results significantly. Further research should seek to investigate the relation between the simulated models and the real world performance. Furthermore, we would like to look into more aspects of evolving morphology, both in terms of growing bodies (for instance from L-systems) and introducing soft parts in both the simulation and the real robotic system. The introduction of soft parts seems particularly interesting since the PhysX engine allows for acceleration of these features through GPUs. The proposed robotic system has the advantage of a solid design suitable for industrial-like applications, coupled with being easy to build, given that one has access to a 3D printer. This is of particular interest with regard to student projects, where modules can be assembled quickly with very little electronics work. While the cost issue is addressed through the use of simple off-the-shelf electronical and mechanical parts, a current challenge is the amount of plastic material used for the shell. Material for 3D printers is at the moment very expensive, and the size of the X2 parts prohibits mass production through 3D printing. Although this may change in the future, when 3D printers are more commonplace, at the moment one solution could be to modify the shapes so that it would be possible to mold or mill them. However this is a complicated process and it therefore seems like reducing the size of the core module is a more viable option.

284

6

K. Glette and M. Hovin

Conclusion

We have developed a modular robotic system and a corresponding simulation environment with the possibility for artificial evolution of morphology and control. The design focuses more on solidness for industrial-like applications than rapid (self-)reconfiguration. Evolutionary experiments have been conducted with the simulator and static locomotion behavior has been achieved, however some practical work remains before the evolved solutions can be tested on the real robotic system. While the current design addresses production cost in several ways, it is still necessary to reduce the material cost associated with 3D printing, and we will therefore attempt to design a smaller core module. Future work also includes evolution of more advanced control and morphology, including soft parts.

References 1. Yim, M., Shen, W., Salemi, B., Rus, D., Moll, M., Lipson, H., Klavins, E., Chirikjian, G.: Modular self-reconfigurable robot systems [grand challenges of robotics]. IEEE Robotics & Automation Magazine 14(1), 43–52 (2007) 2. Zykov, V., Chan, A., Lipson, H.: Molecubes: An open-source modular robotics kit. In: Proc. IROS (2007) 3. Moeckel, R., Jaquier, C., Drapel, K., Upegui, A., Ijspeert, A.: YaMoR and Bluemove – an autonomous modular robot with Bluetooth interface for exploring adaptive locomotion. In: Proceedings CLAWAR 2005, pp. 685–692 (2005) 4. Duff, D., Yim, M., Roufas, K.: Evolution of polybot: A modular reconfigurable robot. In: Proc. of the Harmonic Drive Intl. Symposium, Nagano, Japan (November 2001) 5. Kamimura, A., Kurokawa, H., Yoshida, E., Murata, S., Tomita, K., Kokaji, S.: Automatic locomotion design and experiments for a modular robotic system. IEEE/ASME Transactions on Mechatronics 10(3), 314–325 (2005) 6. Sproewitz, A., Billard, A., Dillenbourg, P., Ijspeert, A.: Roombots–Mechanical Design of Self-Reconfiguring Modular Robots for Adaptive Furniture. In: Proceedings of the 2009 IEEE international conference on Robotics and Automation, Institute of Electrical and Electronics Engineers Inc., pp. 2735–2740 (2009) 7. Universal Robots: UR-6-85-5-A product sheet, http://www.universal-robots.com/Produkter/Produktblad.aspx 8. Hornby, G., Lipson, H., Pollack, J.: Generative representations for the automated design of modular physical robots. IEEE transactions on Robotics and Automation 19(4), 703–719 (2003) 9. Marbach, D., Ijspeert, A.: Online optimization of modular robot locomotion. In: 2005 IEEE International Conference Mechatronics and Automation, vol. 1 (2005) 10. Jakobi, N., Husbands, P., Harvey, I.: Noise and the reality gap: The use of simulation in evolutionary robotics. In: Mor´ an, F., Merelo, J.J., Moreno, A., Chacon, P. (eds.) ECAL 1995. LNCS, vol. 929, pp. 704–720. Springer, Heidelberg (1995) 11. Garder, L.M., Hovin, M.E.: Robot gaits evolved by combining genetic algorithms and binary hill climbing. In: GECCO 2006: Proceedings of the 8th annual conference on Genetic and evolutionary computation, pp. 1165–1170. ACM, New York (2006)

The X2 Modular Evolutionary Robotics Platform

285

12. Rieffel, J., Saunders, F., Nadimpalli, S., Zhou, H., Hassoun, S., Rife, J., Trimmer, B.: Evolving soft robotic locomotion in PhysX. In: GECCO 2009: Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference, pp. 2499–2504. ACM, New York (2009) 13. Glette, K., Hovin, M.: Evolution of Artificial Muscle-Based Robotic Locomotion in PhysX. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS (to appear, 2010) 14. NVIDIA: PhysX SDK, http://developer.nvidia.com/object/physx.html 15. Wall, M.: GAlib: A C++ library of genetic algorithm components, http://lancet.mit.edu/ga/ 16. Goldberg, D.: Genetic Algorithms in search, optimization, and machine learning. Addison-Wesley, Reading (1989) 17. Hornby, G., Fujita, M., Takamura, S., Yamamoto, T., Hanagata, O.: Autonomous evolution of gaits with the Sony quadruped robot. In: Proceedings of the Genetic and Evolutionary Computation Conference, vol. 2, pp. 1297–1304 (1999)

Ubichip, Ubidule, and MarXbot: A Hardware Platform for the Simulation of Complex Systems Andres Upegui1 , Yann Thoma1 , H´ector F. Satiz´ abal1 , Francesco Mondada2 , 2 1 Philippe R´etornaz , Yoan Graf , Andres Perez-Uribe1, and Eduardo Sanchez1 1

REDS, HEIG-VD, HES-SO, Yverdon, Switzerland [email protected] 2 MOBOTS, EPFL Lausanne, Switzerland [email protected]

Abstract. This paper presents the final hardware platform developed in the Perplexus project. This platform is composed of a reconfigurable device called the ubichip, which is embedded on a pervasive platform called the ubidule, and can also be integrated on the marXbot robotic platform. The whole platform is intended to provide a hardware platform for the simulation of complex systems, and some examples of them are presented at the end of the paper. Keywords: Reconfigurable computing, bio-inspired systems, collective robotics, pervasive systems, complex systems.

1 Introduction

The simulation of complex systems has gained increasing importance in recent years. Such simulations are generally bounded by initial constraints artificially imposed by the programmer (i.e., the modeller). These artificial constraints aim to mimic the physical constraints to which real complex systems are exposed. Our approach relies on the principle that, in order to model real complex systems such as biological or social systems, models must not be artificially constrained but physically constrained by the environment. Biological systems, for instance, evolve in dynamic physical environments which are constantly changing because of their intrinsic properties and their interaction with the system. The number of parts in a real complex system and their interconnection, for instance, is neither random nor regular, but follows a set of implicit building rules imposed by physical constraints and the environment in which the system evolves. These constraints are a key element for the emergence of behaviours that are unpredictable by analytical methods. Such emergence has a direct impact on the self-organising properties of complex systems and vice versa, given that there is no clear causality relation between these two properties.


Within the framework of the Perplexus project (PERvasive computing framework for modeling comPLEX virtually-Unbounded Systems, an FP6 European project, http://www.perplexus.org), our main goal has been to develop a scalable pervasive platform made of custom reconfigurable devices endowed with bio-inspired capabilities. This platform enables the simulation of large-scale complex systems and the study of emergent complex behaviours in a virtually unbounded wireless network of computing modules. The hardware platform was designed to model complex systems in a more realistic manner thanks to two main aspects: (1) a rich interaction with the environment through sensory elements, and (2) the replacement of artificial constraints, imposed by the programmer, by physical constraints, imposed by the hardware platform and its interaction with the environment. The network of modules is composed of a set of ubiquitous computing modules called ubidules, each of which contains two ubidule bio-inspired chips (ubichips) capable of implementing bio-inspired mechanisms such as growth, learning, and evolution. The ubichip thus constitutes the core of the Perplexus modelling platform and can be integrated on the ubidule or on the marXbot robotic platform, which was also developed within the framework of this project. This paper presents the complete Perplexus modelling hardware platform. Sections 2, 3, and 4 describe the main hardware components of the project: the ubichip, the ubidule, and the marXbot robot, respectively. Section 5 then gives an overview of several applications where the platform has been used for modelling different types of complex systems with applications to engineering. Finally, section 6 summarises the opportunities offered by the platform.

2 Ubichip

The ubichip is the core device of the whole hardware platform, providing the reconfigurability support for implementing dynamic hardware architectures. Real complex systems are dynamic: their internal components and interactions are constantly changing according to their interaction with the world and to their intrinsic dynamics. This dynamic aspect is precisely the main feature of the ubichip. The ubichip is a reconfigurable digital circuit that allows the implementation of complex systems with dynamic topologies. Fine-grained dynamic partial reconfiguration makes it easy to modify the system from an external processor, while built-in self-reconfiguration mechanisms also permit modifying it internally in a completely autonomous and distributed way. Moreover, dynamic routing allows internal connections in the circuit to be created and destroyed. Previous work in this field is the POEtic tissue [11], a reconfigurable hardware platform for rapidly prototyping bio-inspired systems, developed in the framework of the European project POEtic. The limitations exhibited by the POEtic tissue suggested several architectural and configurability improvements that led us to the ubichip architecture, which is better suited for supporting the complex systems that we want to model with our devices.


Fig. 1. Composition of a Macrocell

The reconfigurable array of the ubichip consists of a two-dimensional regular array of reconfigurable cells called macrocells. A macrocell is composed of a self-replication (SR) unit, a dynamic routing (DR) unit, and four ubicells, the latter being the basic computing units of the ubichip. Figure 1 depicts a top-level view of a macrocell, which is composed of three layers: a ubicell array layer, a dynamic routing layer, and a self-reconfiguration layer. General Purpose Inputs-Outputs (GPIOs) of the reconfigurable array are implemented in the form of dynamic routing units that, instead of being connected to ubicells, are directly connected to input and output pins of the circuit. These GPIOs thus allow the array to be extended to form a multi-chip array, connected through dynamically created paths. The next subsections briefly describe the functionality of each layer.

2.1 Ubicell Layer

A ubicell is the basic computing unit of the ubichip; it contains four 4-input LUTs and four flip-flops. The ubicell has two basic operating modes: native mode and SIMD mode. In native mode, a ubicell can be configured in different basic "classical" modes such as counter, FSM, shift-register, four independent registered and combinatorial LUTs, adder, subtracter, etc. One particular configuration mode of a ubicell is very useful in the modelling of complex systems: the 64-bit LFSR mode. In this mode, the 64 configuration flip-flops normally used for storing the four 4-input LUT configurations can be reused as a pseudo-random number generator of reasonable quality. This novel feature allows processes such as probabilistic functions and pseudo-random event triggers to be included at a very low cost in terms of reconfigurable resources (a software sketch of this mode is given at the end of this subsection). This configuration mode has been used in different models implemented on the architecture, such as ontogenetic neural networks [14] and evolutionary games [7].

In SIMD mode (single instruction, multiple data), the ubicell layer can be configured as an array of processing elements in order to perform vectorised parallel computation. In this mode, each ubicell can be configured as a minimal 4-bit processor, and four ubicells can be put together to form a 16-bit processor. A centralised sequencer reads a program and sends instructions to be executed in parallel by the processing units. This configuration mode has been used for modelling complex systems such as neural networks [2] and culture dissemination models.
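To make the LFSR mode concrete, the following Python sketch emulates reusing a 64-bit register as a Fibonacci LFSR bit source. The tap positions are an assumption here (a standard maximal-length polynomial, x^64 + x^63 + x^61 + x^60 + 1); the actual taps wired into the ubicell are not stated in the paper.

```python
# Software sketch of the ubicell's 64-bit LFSR mode: the 64 configuration
# flip-flops of the four 4-input LUTs are clocked as a Fibonacci LFSR and
# one pseudo-random bit is produced per cycle. The taps are an assumed
# maximal-length set, not taken from the ubichip documentation.
def lfsr64(state: int):
    assert state != 0, "an all-zero state would never change"
    mask = (1 << 64) - 1
    while True:
        # feedback = XOR of bits 64, 63, 61, 60 (1-indexed)
        fb = ((state >> 63) ^ (state >> 62) ^ (state >> 60) ^ (state >> 59)) & 1
        state = ((state << 1) | fb) & mask
        yield state & 1          # one pseudo-random bit per clock

rng = lfsr64(0xACE1BEEFDEADF00D)
print([next(rng) for _ in range(16)])   # e.g. driving a probabilistic event
```

Such a generator costs essentially nothing on the ubichip because the storage already exists as configuration memory; only the feedback network is added.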

2.2 Self-reconfiguration Layer

Our self-reconfiguration layer allows a given macrocell to access the configuration bit-string of a neighbouring macrocell. In this way, a macrocell can read the configuration of its neighbour, modify it, and reinsert it in order to modify the neighbour's functionality. Another possibility is to recover the configuration bit-string of its neighbour and send it to another (remote) macrocell, which will use it to configure its own neighbour. This is what we call replication. Now consider the case of two neighbouring initial macrocells A0 and B0. Let A0 read the configuration of B0, copying it to create B1, and then let B0 do the same, reading A0 and copying it to A1. If we consider the tuple [AB] as a cell, we have an initial cell [A0 B0] that has created an exact copy of itself, [A1 B1]; that is self-replication. Replicating a single macrocell is not very practical, since the functionality implemented on a single macrocell may be very limited. To overcome this limitation, we propose the THESEUS mechanism (standing for THeseus-inspired Embedded SElf-replication Using Self-reconfiguration) [13]. THESEUS includes a set of building flags that allow a larger functional block composed of several macrocells to be replicated. The building flags describe how to build the complete block: a macrocell can access its neighbour's configuration and, through it, a chain of other macrocells' configurations along a predefined building path described by the building flags, as sketched below. This reconfigurability feature is one of the first steps towards the modelling of more realistic complex systems. Following our approach, the implemented complex system can grow and prune itself through self-replication and self-destruction. In parallel, the system building blocks can be dynamically connected and disconnected, driven by processes executed internally to each block, thanks to the dynamic routing mechanism described in the next subsection.
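The following toy Python sketch illustrates the chain-following idea behind THESEUS: building flags link the macrocells of a block into a path, and replication walks that path, copying each configuration to the same relative position at the destination. The grid representation and flag encoding are illustrative assumptions; on the ubichip the flags are per-macrocell configuration bits and the copy is a serial bit-stream transfer.

```python
# Toy model of THESEUS-style block replication. grid maps a macrocell
# position to its configuration bit-string; flags give the direction of
# the next macrocell on the building path (None terminates the block).
STEP = {"E": (1, 0), "N": (0, 1), "W": (-1, 0), "S": (0, -1)}

def replicate(grid, flags, src, dst):
    """Copy the block whose building path starts at src to start at dst."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    cell = src
    while cell is not None:
        grid[(cell[0] + dx, cell[1] + dy)] = grid[cell]    # copy configuration
        flags[(cell[0] + dx, cell[1] + dy)] = flags[cell]  # copy building flag
        d = flags[cell]
        cell = (cell[0] + STEP[d][0], cell[1] + STEP[d][1]) if d else None

# A two-macrocell block [A0 B0] replicating itself to [A1 B1]
grid = {(0, 0): "cfg_A", (1, 0): "cfg_B"}
flags = {(0, 0): "E", (1, 0): None}
replicate(grid, flags, src=(0, 0), dst=(0, 2))
print(sorted(grid))   # [(0, 0), (0, 2), (1, 0), (1, 2)]
```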

2.3 Dynamic Routing Layer

As explained before, real complex systems constantly modify their topology. The brain, ecological systems, and social networks are just some examples in which neurons, species, or people constantly modify their interaction channels. In complex systems theory, this can be represented as graph links being created and destroyed. Dynamic routing offers the possibility of implementing such dynamic-topology systems on a hardware substrate, in order to model these changing interactions in a more direct way.

The basic idea of the algorithm is to construct paths between sources and targets by dynamically configuring multiplexers, and by letting the data follow the same path for each pair of source and target. A path-creation phase executes a distributed breadth-first search looking for the shortest path, as sketched below. Sources and targets can decide to connect to their corresponding unit at any time by launching a routing process. Considering the high silicon overhead of routing matrices on reconfigurable circuits, which is especially high for dynamic routing, we adopted a solution requiring a small silicon overhead while being flexible enough to deal with the changing topology of our complex networks. Our dynamic routing algorithm is an improvement of the one implemented in the POEtic chip [10]. The risk of congestion has been reduced by means of three features: (1) the new algorithm exploits existing paths by reusing them, (2) an 8-neighbourhood (instead of the 4-neighbourhood of POEtic) dramatically reduces the congestion risk relative to the amount of logic required, and (3) paths can be destroyed in order to remove unused connections and reuse the resources later. Finally, while in POEtic the circuit execution was frozen during a routing process, in the ubichip the creation of a new path lets the system run without interruption.
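As an illustration of the path-creation phase, the Python sketch below runs a breadth-first search over a grid with the ubichip's 8-neighbourhood and returns a shortest path of free routing cells between a source and a target. The grid abstraction and API are assumptions for illustration; on-chip, the search is executed in a distributed fashion by the routing units themselves.

```python
from collections import deque

# Shortest-path search with 8-neighbourhood, as used for path creation.
NBRS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

def route(free, src, dst):
    """free: set of usable (x, y) cells; returns list of cells src..dst or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:          # walk parents back to the source
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dx, dy in NBRS:
            nxt = (cell[0] + dx, cell[1] + dy)
            if nxt in free and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None                              # congestion: no path available

print(route({(x, y) for x in range(5) for y in range(5)}, (0, 0), (4, 2)))
```

Breadth-first expansion guarantees the returned path is shortest in multiplexer hops, which is why the hardware uses the same strategy.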

3 Ubidule

The ubidule platform is composed of two electronic boards mainly featuring two ubichips, an ARM processor running Linux, a 3.4 Mgate FPGA, and support for several peripherals. One of the major features of the ubidule platform is its modularity and flexibility: it is easily customisable for each of the target applications, making it a complete and efficient modelling platform. For the sake of modularity, the ubidule has been split into two boards: a mother board containing the CPU, the FPGA, and peripheral support, and a daughter board containing two ubichips, which can also be integrated on the marXbot robot.

3.1 Ubidule's Mother-Board

Fig. 2. Schematic of the Ubidule platform

Figure 2 depicts the schematic of the ubidule platform. A mini-PCI socket supports a ubichip daughter board including two ubichips and the resources they require for running in native and SIMD modes. Even if the current board contains two ubichips, it can be scaled up to contain up to four ubichips without modifying the current addressing scheme. A second mini-PCI socket supports a CPU board containing an ARM processor, either an XScale PXA270 or PXA320, and enough memory resources for running a GNU/Linux operating system. This CPU board constitutes the first step toward the desired flexibility and modularity of our ubidules, by providing the advantages of a powerful processor, a well-supported operating system, gcc tools, and a number of software resources such as application programs, services, and drivers. Figure 3 shows the ubidule board.

Fig. 3. Ubidule platform (top view)

Fig. 4. Ubichip daughter board

3.2 Ubichip Daughter Board

The ubichip daughter board mainly contains two ubichips with their respective external SRAM memories. Figure 4 shows a top view of the board. When using the ubichips in native mode, both ubichips can be used as one extended configurable array: GPIOs can be configured in both chips in order to connect dynamic routing units from one ubichip to the other. The ubichip daughter board can also be directly inserted in the marXbot robot, where both ubichips can be configured from the robot's i.MX microcontroller through the serial configuration interface. Nevertheless, since the ubichips draw power from the marXbot power supply, the daughter board provides the possibility of powering only a single ubichip.

4 The MarXbot Robotic Platform

To extend the exploration of complex systems to real-world applications, we decided to embed the ubichip in a robotic platform. We therefore designed the marXbot mobile robot, taking care of several specific aspects: a large number of robots (more than 20), ease of experimentation, the ability to embed the ubidule as one module, and the possibility to run long experiments. Because this design effort was not feasible within the Perplexus project alone, we designed the marXbot robot in synergy with the Swarmanoid project (http://www.swarmanoid.org). This section presents the particular features of the marXbot (Figures 5 and 6).

Fig. 5. A complete marXbot robot in production (Photo Basilio Noris)

Fig. 6. A group of marXbots during experimentation

4.1 Modularity

The marXbot robot is a flexible mobile robotic platform. It is made of stacked modules with a diameter of about 17 cm. The modularity of the marXbot robot is based on a common CAN bus and a Li-ion battery based power supply, both shared by all modules. In the examples presented in this paper, three main modules have been used:

– The base module includes the wheels, the tracks (together called Treels), proximity sensors, an RFID reader/writer, accelerometers, gyroscopes, and the battery connection. The Treels provide mobility to the marXbot. They consist of two 2 W motors, each associated with a rubber track and a wheel. The motors are driven by dedicated electronic boards situated on each side of the battery (one for each motor). The maximum speed of the marXbot is 30 cm/s. The base of the marXbot includes infrared sensors that act as virtual bumpers and ground detectors. Those sensors have a range of a few centimetres and are distributed around the robot: 24 are directed outwards and 8 are directed to the ground. In addition, 4 contact ground sensors are placed under the lowest part of the robot. The base of the marXbot also embeds an RFID reader and writer with an antenna situated on the bottom of the robot, close to the ground.

– The scanner module allows a distance map of the obstacles surrounding the robot to be built [3]. Our design is based on 4 infrared Sharp distance sensors mounted on a rotating platform. These sensors have a limited range and a dead zone close to the device, so we couple two sensors of different ranges (40–300 mm and 200–1500 mm) to cover distances up to 1500 mm. The platform rotates continuously to make 360° scans. To maximise the lifetime of the scanner, the fixed part transfers energy by induction to the rotating part; the two parts exchange data using infrared light.

– The top module includes the cameras, an RGB LED beacon, the i.MX31 processor, and its peripherals such as the WiFi board and the SD card reader. Two cameras can be mounted: a front camera and an omnidirectional camera on top, both equipped with 3 Mpixel imagers. The RGB LED beacon can display a high-intensity (1 W) colour light; combined with the cameras, this provides a localised and easily interpreted communication system. The i.MX31 processor runs Linux and accesses standard peripherals such as WiFi, USB, and flash storage.

4.2 Ubichip Compatibility

A ubidule extension module has been designed to ensure the embodiment of the ubidule in the marXbot. It provides the following functionalities:

– Mechanical adaptation between the four screws of the marXbot extension system and the four fixation screws of the ubidule.
– A step-up power supply generating 7.5 V, 12 W for the ubidule.
– A microcontroller ensuring the transparent translation of messages between USB (ubidule) and CAN (marXbot).

The final solution has therefore been to develop a ubichip extension module, without screen and user interface, to be placed within the marXbot robot as a sandwich module. From a control point of view, all microcontrollers within the marXbot robot can be controlled using the ASEBA framework [4]. This framework transmits event messages over the CAN bus to exchange commands, read sensors, etc. We have implemented an ASEBA node in the ubidule, making it compatible with the software architecture of the marXbot. This allows full control of the marXbot from the ubidule.

4.3 Battery Management

The exploration of complex systems requires the use of groups of robots over long periods, for instance under the control of genetic algorithms. Because of the battery-based power supply of the robots, long experiments are problematic. The marXbot is therefore powered by a 3.7 V, 10 Ah lithium-polymer battery which is hot-swappable. The hot-swapping capability is provided by a supercapacitor which maintains the power supply of the robot for 10 s during battery exchange. A battery exchanger (figure 7) has been designed to automatically extract the battery from a running marXbot and replace it with a charged one in under 10 seconds.


Fig. 7. The battery station able to exchange the battery of the marXbot during operation in 10 seconds

5 Complex Systems Simulations

In this section we briefly describe two examples of complex systems models that exploit different aspects of the Perplexus hardware platform. Subsection 5.1 describes an ontogenetic neural network that exploits the ubichip's self-reconfiguration and dynamic routing mechanisms, and subsection 5.2 describes a collective foraging task set up on the marXbots.

5.1 Neurogenetic and Synaptogenic Networks on the Ubichip

Given its dynamic routing mechanisms, the ubichip is a promising digital hardware platform for implementing connective systems with dynamic topologies, in our case developmental artificial neural networks. The current implementation of the model considers the initial existence of a set of unconnected 4-input neurons, whose dendrites (inputs) and axons (outputs) are connected to dynamic routing units previously configured to act as targets and sources, respectively. The connectivity pattern is then generated during the neural network's lifetime. We use a simplified neuron model whose implementation on the ubichip requires only six macrocells. Each dendrite includes the logic required for creating and destroying a synapse in a probabilistic way, and is implemented in a single macrocell. Two more macrocells implement the soma (the cell body of the neuron), the axon (the computation of the activation function and the neuron output), and the management of the dynamic routing address modification.

Figure 8 illustrates the complete ontogenetic process with a series of screenshots obtained from the ubimanager tool. Initially, a single neuron is configured on the ubichip (top left screenshot). A replication process can be triggered on this neuron, which can then be copied somewhere else in the circuit. The first step is to select where it will be copied and to create a dynamic routing unit at that location (top centre). Then the configuration is sent serially from the initial location to the destination, where it is used to create an exact copy of the initial neuron after a certain number of clock cycles (top right). The two resulting neurons can again replicate, both of them simultaneously, so new target locations are selected (bottom left) and two newly created neurons are obtained (bottom centre). At the end, we obtain a circuit fully populated with neurons (bottom right), which in parallel have also performed a probabilistic synaptogenic process interconnecting their dendrites and axons.

Fig. 8. Sequence of screen-shots of the dynamic routing layer during the development of a neurogenetic and synaptogenic network with 16 4-input neurons

We have experimented with two types of developmental processes: random and ambient-driven networks. During the development of random networks, neurogenetic and synaptogenic processes are triggered randomly. The development of ambient-driven networks considers the existence of a set of input stimuli that increase the probability of the most active neurons getting connected; moreover, existing unused synapses can be removed (a toy sketch of this rule follows). The resulting networks exhibit similarities with topologies observed in biological neural networks [14].
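The ambient-driven rule lends itself to a compact illustration. The Python sketch below is a reader's toy model of the process described above, combining activity-dependent synapse creation with pruning of unused synapses; the probability values and the activity measure are illustrative assumptions, not parameters from the paper.

```python
import random

def develop(activity, synapses, base_p=0.05, gain=0.4, prune_p=0.2):
    """One developmental step over a fully-addressable set of neurons."""
    n = len(activity)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if (i, j) not in synapses:
                # creation probability grows with joint pre/post activity
                if random.random() < base_p + gain * activity[i] * activity[j]:
                    synapses.add((i, j))
            elif activity[i] * activity[j] == 0 and random.random() < prune_p:
                synapses.discard((i, j))        # prune an unused synapse
    return synapses

synapses = set()
for _ in range(10):                             # 16 neurons, as in Fig. 8
    activity = [random.choice([0.0, 1.0]) for _ in range(16)]
    synapses = develop(activity, synapses)
print(len(synapses), "synapses")
```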

5.2 Collective Robotics for Target Localization

We have used the marXbot platform to test a novel approach to the localization of targets by a population of foragers. The control of the population of robots is performed in a distributed way. In this implementation, each robot has two possible states: "work" and "search". In the "work" state, robots perform a certain foraging task and are distributed over the arena; in the work presented here, this is a dummy foraging task consisting of navigating the arena while avoiding obstacles. The main interest is in the "search" state, in which a robot tries to reach a specific target region of the arena. This target region could be a battery charging station, an area for garbage disposal, or the exit of a maze. Whatever the robot may search for, the goal is to exploit the collective knowledge, given that there are other robots that can estimate how far they are from the target region and will somehow help the searching robot to achieve its goal.

The proposed target localization avoids the use of global positioning systems, which might be difficult to deploy in unknown or hostile environments, and also avoids the use of odometry, which is sensitive to accumulated errors over long running periods. Our approach uses colour LEDs and omnidirectional cameras to indicate to other robots the shortest path to a desired target, based on the principle of disseminating the information gathered by the robots through the population. The proposed coordination scheme is completely distributed and uses state communication [1] in an intrinsic way, i.e., robots transmit some information about their internal state, but they are not aware of whether other robots receive this information or not. This simplifies the communication and endows the system with an intrinsic robustness. The application runs both on real marXbot robots and in the Enki simulator [5]. Figure 9 shows a simulation window and figure 10 shows a set of robots on an arena running the foraging task. The rich sensory elements present in the marXbot robots make it an excellent modelling platform for the simulation of complex systems that interact with the environment.

Fig. 9. Simulated arena in which the experiments run the foraging task

Fig. 10. Set of marXbots running a foraging task and searching for specific zones of the arena

We performed a series of five experiments with increasing information about the target position [8]. Each of the five experiments adopts a different strategy for finding the target, and the whole set of robots was constantly searching for targets. The strategies were: (1) random search, (2) the use of static landmarks, (3) the use of static landmarks plus the robots as landmarks, (4) the use of only the robots as landmarks, and (5) the use of a gradient of colours, built by landmark propagation, mimicking a social localization (sketched below). This incremental comparison showed that the "social" approach, whereby the navigation of the population is guided by a gradient of colours, improved the performance in finding a target. We also compared our social localization approach with a global positioning system (similar to GPS) in which a robot knows its own position and the position of the target area. The social approach performs slightly worse in the absence of obstacles. However, when obstacles are placed between the robot and the target area, the social approach largely outperforms the global knowledge system, since the colour gradient formed by the colony of robots does not merely indicate the direction of the target but, more usefully, the path to follow in order to reach it.
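The colour-gradient idea can be summarised in a few lines of Python. In this reader's sketch, robots near the target display value 0 and every other robot relaxes its displayed value toward one more than the smallest value it can see, as in distributed Bellman-Ford; a searching robot then simply steers toward the lowest visible value. Positions, the sensing radius, and the update rule are illustrative assumptions.

```python
import math

def update_beacons(robots, target, sense=1.5):
    """One round of gradient propagation. robots: {position: displayed value}."""
    new = {}
    for p, v in robots.items():
        if math.dist(p, target) <= sense:
            new[p] = 0                                    # inside the target region
        else:
            seen = [robots[q] + 1 for q in robots
                    if q != p and math.dist(p, q) <= sense]
            new[p] = min([v] + seen)                      # keep the best estimate
    return new

robots = {(float(x), 0.0): math.inf for x in range(6)}    # a line of foragers
for _ in range(6):
    robots = update_beacons(robots, target=(5.0, 0.0))
print(robots)   # values decrease toward the target: a path, not just a bearing
```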

6 Conclusions

In this paper we presented the complete hardware platform resulting from the Perplexus project. The goal of the hardware is to serve as a modelling platform for more realistic complex systems that can interact with the environment and can mimic and/or deal with physical constraints. The complete platform is composed of a reconfigurable digital device (the ubichip), a ubiquitous computing module (the ubidule), and a robotic platform (the marXbot). In every case, we have also provided the possibility of simulating the systems in a transparent manner: the ubimanager tool allows a real chip to be configured or a circuit to be simulated from its own VHDL description [12], and the ASEBA framework [4] allows programs written for the marXbot to run either on the real robot or in the Enki simulator [5].

We have also shown two examples where the platform has been successfully used for the simulation of complex systems. The first, an ontogenetic neural network, uses the ubichip as a substrate for implementing dynamic topology mechanisms such as neurogenesis and synaptogenesis. The second, a social localization task, uses the marXbot to implement a foraging task in which robots must eventually find specific areas. There are other examples of complex systems implemented on the Perplexus hardware platform: incremental learning on the ubichip for a marXbot controller [9], artificial neural networks in SIMD mode [2], optimizer swarms of self-replicating particles, protein-based computation [6], and social networks for activity recognition based on wearable systems. All these applications use one or several parts of the Perplexus hardware platform for their implementation.

The platform has proven to be an interesting alternative for complex systems modelling. The ubichip's architectural features have provided the flexibility required for modelling the complex processes involved in the formation and evolution of dynamic networks. The sensing and actuating capabilities of the marXbot robot have provided an enhanced interaction with the environment and with other robots. And the ubidule has constituted the base platform for hosting the ubichip, allowing it to interact with the world in a flexible manner.


Acknowledgment. The authors would like to thank all the members of the Perplexus project for their valuable work, and their colleagues at the REDS and MOBOTS groups for their support. This work is funded by the FET programme IST-STREP of the European Community, under grant IST-034632 (PERPLEXUS).

References

1. Balch, T., Arkin, R.C.: Communication in reactive multiagent robotic systems. Auton. Robots 1(1), 27–52 (1994)
2. Hauptvogel, M., Madrenas, J., Moreno, J.M.: SpiNDeK: An integrated design tool for the multiprocessor emulation of complex bioinspired spiking neuronal networks. In: Haddow, et al. (eds.) Proceedings of the IEEE Congress on Evolutionary Computation - CEC 2009, pp. 142–149 (2009)
3. Magnenat, S., Longchamp, V., Bonani, M., Rétornaz, P., Germano, P., Bleuler, H., Mondada, F.: Affordable SLAM through the Co-Design of Hardware and Methodology. In: Proceedings of the 2010 IEEE International Conference on Robotics and Automation. IEEE Press, Los Alamitos (2010)
4. Magnenat, S., Rétornaz, P., Bonani, M., Longchamp, V., Mondada, F.: ASEBA: A Modular Architecture for Event-Based Control of Complex Robots. IEEE/ASME Transactions on Mechatronics (2010)
5. Magnenat, S., Waibel, M., Beyeler, A.: Enki - an open source fast 2d robot simulator, http://home.gna.org/enki/
6. Parra, J., Upegui, A., Velasco, J.: Cytocomputation in a biologically inspired and dynamically reconfigurable hardware platform. In: Haddow, et al. (eds.) Proc. of IEEE Congress on Evolutionary Computation - CEC 2009, pp. 150–157 (2009)
7. Pena, J.C., Pena, J., Upegui, A.: Evolutionary graph models with dynamic topologies on the ubichip. In: Hornby, G.S., Sekanina, L., Haddow, P.C. (eds.) ICES 2008. LNCS, vol. 5216, pp. 59–70. Springer, Heidelberg (2008)
8. Satizábal, H.F., Upegui, A., Pérez-Uribe, A.: Social target localization in a population of foragers. In: NICSO, pp. 13–24 (2010)
9. Satizábal, H.F., Upegui, A.: Dynamic partial reconfiguration of the ubichip for implementing adaptive size incremental topologies. In: Haddow, et al. (eds.) Proceedings of the IEEE Congress on Evolutionary Computation - CEC 2009, pp. 131–141 (2009)
10. Thoma, Y., Sanchez, E., Arostegui, J.M.M., Tempesti, G.: A dynamic routing algorithm for a bio-inspired reconfigurable circuit. In: Cheung, P.Y.K., Constantinides, G.A. (eds.) FPL 2003. LNCS, vol. 2778, pp. 681–690. Springer, Heidelberg (2003)
11. Thoma, Y., Tempesti, G., Sanchez, E., Moreno, J.M.: POEtic: an electronic tissue for bio-inspired cellular applications. Biosystems 76(1-3), 191–200 (2004)
12. Thoma, Y., Upegui, A.: Ubimanager: a software tool for managing ubichips. In: NASA/ESA Conference on Adaptive Hardware and Systems, pp. 213–219 (2008)
13. Thoma, Y., Upegui, A., Perez-Uribe, A., Sanchez, E.: Self-replication mechanism by means of self-reconfiguration. In: Lukowicz, P., Thiele, L., Tröster, G. (eds.) ARCS 2007. LNCS, vol. 4415. Springer, Heidelberg (2007)
14. Upegui, A., Perez-Uribe, A., Thoma, Y., Sanchez, E.: Neural development on the ubichip by means of dynamic routing mechanisms. In: Hornby, G.S., Sekanina, L., Haddow, P.C. (eds.) ICES 2008. LNCS, vol. 5216, pp. 392–401. Springer, Heidelberg (2008)

Implementation of a Power-Aware Dynamic Fault Tolerant Mechanism on the Ubichip Platform

Kotaro Kobayashi, Juan Manuel Moreno, and Jordi Madrenas

Delft University of Technology, Delft, The Netherlands [email protected]
Universitat Politècnica de Catalunya, Barcelona, Spain [email protected], [email protected]

Abstract. Dynamic fault-tolerant techniques such as Built-in Self Repair (BISR) are becoming increasingly important as new challenges emerge in the deep-submicron era. A dynamic fault-tolerant system was implemented on the Ubichip platform developed in the PERPLEXUS European project, a bio-inspired custom reconfigurable VLSI. The system is power-aware: power consumption is monitored dynamically to regulate the number of copies made by a self-replication mechanism. This paper reports the design, implementation, and simulation of the fault-tolerant system.

Keywords: Dynamic Fault Tolerance, Self-replication, Reconfiguration, BISR, Bio-inspiration, Ubichip, PERPLEXUS, Power-awareness.

1 Introduction

IC technology scaling, which follows the famous Moore's law, has driven a great deal of advancement in modern electronics over the last few decades. Designers have been able to integrate ever greater numbers of transistors on a limited area of silicon die; modern VLSI systems with multiple function blocks on a single die allow designers to reduce the physical size of systems and manufacturing costs. The ITRS predicts in [2] that the gate length of VLSI systems will go below 20 nm in the latter half of this decade, a length small enough to fit only a few hundred silicon atoms in one line. This deep-submicron paradigm poses new challenges to VLSI design: fabrication will be more intricate, so manufacturing defects will likely increase, while testing for those defects will be very challenging due to the ever-increasing complexity of the systems. Reliability will also suffer due to phenomena such as gate insulator tunnelling, Joule heating, and electromigration. Furthermore, the small feature size will certainly increase unpredictable errors due to alpha particles, namely soft errors, or Single Event Upsets (SEU) [1], [2].


There have been many advancements in techniques such as Design for Test (DFT) and Built-in Self-Test (BIST) [1]. While these tests can effectively detect faults due to defects, they cannot detect unforeseeable faults caused by aging defects or temporal faults such as SEUs. In order to assure reliability while incorporating deep-submicron technologies, a system should have dynamic fault-tolerance capabilities to detect and correct errors at run-time. If a VLSI system can autonomously detect and correct an error situation dynamically, it will increase not only the reliability but also the yield and lifetime of the ICs, resulting in a significant cost reduction [5]. The Ubichip is a bio-inspired custom reconfigurable VLSI system developed in the PERPLEXUS project [6], [10]. The Ubichip offers bio-inspired capabilities such as dynamic routing and self-replication. The operational flexibility provided by these mechanisms makes the Ubichip an ideal platform on which to implement dynamic fault-tolerant systems with Built-in Self Repair (BISR) capabilities. This paper presents the design, development, and simulation of a power-aware fault-tolerant system implemented on the Ubichip. Section 2 discusses the background and overall system architecture. Section 3 briefly introduces the Ubichip platform used in this experiment. Section 4 describes the implementation of the design in detail. Section 5 discusses the implementation and simulation results. Finally, future research areas as well as concluding remarks are included in section 6.

2 A Power-Aware Fault Tolerant System

2.1 Background

In order to protect a system from logic errors at run-time, it can use Built-in Self Repair (BISR). Several different methods of implementing BISR are discussed in [5]. Triple Modular Redundancy (TMR) is a widely known method of BISR. Although it is also known to be area consuming, it is very simple to design and, unlike error correcting codes [3], requires no specialized design tools; it is more versatile in accommodating different logic circuits. In dynamically reconfigurable systems, Function Units (FUs) are configured at run-time as required. Unused FUs can simply be deleted to give more space for necessary functions. In such systems, the same TMR circuit can work for different FUs configured in the same area because of the simplicity of the algorithm.

2.2 Power Awareness

Power consumption must be considered when implementing TMR. Having three identical circuits results in at least three times more power consumption in terms of switching current. Furthermore, power consumption is a major issue in VLSI today: larger circuits, higher operating frequencies, and smaller feature sizes all contribute to higher power consumption, and TMR is intrinsically not a power-efficient design technique. In order to reduce the effect on power consumption, the authors implemented a power-aware TMR system based on previous work presented in [11]: the system monitors its power consumption and eliminates one or both of the TMR copies when the power consumption is above a predefined threshold. While clock gating or power gating could also be used to control the power consumption of TMR designs in the same way, our framework on the Ubichip is capable of dynamic reconfiguration, so the same FU space can be reused for different blocks according to power consumption and operation phase.

2.3 System Description

Figure 1 shows the FSM states of the power-aware design presented in this paper. Initially the system starts with a single functional unit (FU). As our system platform (the Ubichip) does not have current sensing capabilities, the power consumption of the running application is estimated by a 'transition counter'. This subsystem estimates the power consumption by means of a 'counter value' and controls the number of FU copies using the self-replication (SR) function of the Ubichip. The counter value is computed by accumulating output values over multiple clock cycles and counting the number of bit transitions. When the number of transitions from the original FU is at its highest (in this case, more than 3 bits toggling across 2 consecutive clocks), the counter value is '00' and no copies of the FU are made. When the number of transitions is low (0 or 1 bit across 2 consecutive clocks), the counter value becomes '10', which leads the system to create 2 copies of the FU. Counter value '01' corresponds to an intermediate transition count: when the output of the original FU shows a 2-bit transition for more than 2 clock periods, only one copy of the FU is created.

Fig. 1. FSM State Diagram of the implemented system


The system starts in single-FU mode. After a few clock cycles the transition counter estimates the current consumption and indicates it as the 'counter value'. The system constantly monitors its power consumption and changes the number of FU copies accordingly, as in the sketch below.
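The threshold logic can be captured in a few lines. The Python sketch below is a simplified software model of the transition counter: it maps the Hamming distance between the FU output in two consecutive clocks to the 2-bit counter value. The 4-bit output width follows the FU described later, and collapsing the paper's "sustained for more than 2 clock periods" condition into a single comparison is a simplification.

```python
# Simplified model of the transition counter: count toggled output bits
# between two consecutive clocks and map the activity to a counter value.
def counter_value(prev_out: int, curr_out: int) -> str:
    transitions = bin(prev_out ^ curr_out).count("1")   # toggled bits
    if transitions > 3:
        return "00"     # highest activity: single FU, no copies
    if transitions <= 1:
        return "10"     # low activity: two copies (full TMR)
    if transitions == 2:
        return "01"     # intermediate activity: one copy of the FU
    return "01"         # 3 toggles: unspecified in the paper, assumed intermediate

COPIES = {"00": 0, "01": 1, "10": 2}
cv = counter_value(0b1010, 0b0101)                      # all 4 bits toggled
print(cv, "->", COPIES[cv], "copies")                   # 00 -> 0 copies
```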

3 A Reconfigurable Framework: PERPLEXUS

The system was designed within the framework developed in the PERPLEXUS European project, whose kernel is the Ubichip: a reconfigurable VLSI system endowed with bio-inspired capabilities. Details of the PERPLEXUS project can be found in [10], [6].

3.1 Ubichip

Ubichips are mounted on a prototype system called the Ubidule, explained in [10]. A Ubichip consists of three major blocks: an array of reconfigurable processing elements called Macrocells (MC), the System Manager, and a controller for Content Addressable Memory (CAM). The System Manager block is responsible for configuring the reconfigurable array and for external communication. Each MC is made up of four reconfigurable cells called Ubicells, which are explained later in this section. The configuration bit stream for each MC can be recovered and configured dynamically using the Self-Replication (SR) function of the Ubichip. The SR function is used extensively in this project, so its details are briefly explained later in this section. Each MC also contains a Dynamic Routing (DR) control unit, which allows a pair of MCs to establish communication paths dynamically. The DR functionality of the Ubichip is further explained in [7]. Furthermore, a Ubichip can also be configured in multiprocessor mode, where a SIMD-like parallel machine can be implemented.

3.2 The Ubicell

Figure 2 shows the overall organization of a Ubicell. As explained extensively in [4], a Ubicell can be configured to implement various logic functions in LUT mode or to work as part of a multi-processor machine in ALU mode. In this project all the cells are configured in various configurations within LUT mode.

3.3 Inter-cell Connection

Neighboring Ubicells can be connected by selecting appropriate input/output multiplexers. Figure 3 shows the neighborhood connectivity among Ubicells. The output multiplexers are able to select not only the cell output but also raw inputs from neighboring cells. Furthermore, it is possible for any pair of macrocells (4 Ubicells each) to communicate using the Dynamic Routing (DR) capability.


Fig. 2. Organization of a Ubicell (left); Ubicell array and Macrocell (right)

Fig. 3. Inter-Ubicell Connectivity

3.4 Self-Reconfiguration

A group of more than one macrocell (an organism) can be copied to other parts of the Ubicell array using the Self-Replication (SR) mechanism. An organism has the configuration bits of its MCs connected by a chain of shift registers, and the configuration bits can be recovered through this chain by an SR controller. The SR controller can use this recovered bit-stream to configure an empty area during the self-replication process. Details of the SR controller on the Ubichip are explained in [9].

3.5 Ubimanager

The authors used a software tool called Ubimanager, designed in the PERPLEXUS project to manage Ubichips. Ubimanager allows developers to design Ubichip implementations by means of a GUI environment; developers can configure all three layers of the Ubichip: Ubicells, Dynamic Routing (DR) units, and Self-Replication (SR) units. It is also capable of simulating the implementation using ModelSim. A detailed description of the Ubimanager tool is provided in [8]. In the Ubimanager environment, the array of Ubicells is represented in a GUI window; a developer can configure each cell by double-clicking it to open the configuration window.

4 Implementation

Figure 4 shows a block diagram of the dynamic fault-tolerant system implemented on a Ubichip. There are three SR controllers: one is responsible for reading the configuration bit-stream from the original FU, while the other two are responsible for replicating the copies. The control signals for the SR controllers are created in the 'Control FSM' block. The level of system power consumption is sent to the control FSM from the 'Transition Counter' block as the 'counter value'. According to the power consumption level, the FSM changes the operation mode and directs the SR controllers to maintain the appropriate number of FU copies. Every time new copies of the FU are made, the Control FSM block relies on the signal from the 'SR Timer' block to stop the SR process upon completion. The outputs from the FUs are compared in the 'Output Comparator' block.

Fig. 4. Power Aware Fault Tolerant System: Overall Block Diagram

The functionality of its main building blocks is the following.

Functional Unit (FU): As the goal of this experiment is to show a proof-of-concept working system, the implementation of the Functional Unit (FU) was kept simple; a combination of a memory, a counter, and a pseudo-random number generator (LFSR) was configured in a MC, as shown in figure 5.


305

Fig. 5. Functional Unit (FU)

Output Comparator: In this implementation, the output comparator simply indicates the bit-wise XOR of the outputs from each FU (see the sketch below). In a future implementation, the comparator result should be fed back to the controller to implement error correction.

SR Controller: While it is possible for one SR controller to remove and make a copy of an organism (a set of MCs), the 'remote configuration' explained in [9] is necessary to control two different copies separately. In this case, a total of 3 SR units are required: one for recovering the configuration bit stream of the original organism, and one each for configuring the two copies. The SR mechanism of the Ubichip blocks the output from the MCs during the SR process, eliminating the need to filter erroneous output during that period. Furthermore, the values of each register in the MCs are incorporated in the configuration bit stream, so the state of the circuit is preserved in the newly created copy of an organism.

SR Timer: A 4-bit flag called the 'H-flag' contained in each MC defines the shape of an organism. The SR unit does not know the number of MCs included in a single organism, so it is not possible for the SR unit alone to determine the number of cycles required to complete an SR process. A counter is therefore necessary to stop the SR process at the appropriate time.

FSM: While Ubimanager provides a GUI environment for design implementation, it cannot compile from high-level languages such as C or VHDL; the entire circuit must be implemented by a combination of circuits available in the LUT mode of the Ubicells. The authors resorted to commercially available RTL synthesis tools to implement the FSM. First, the state chart was converted to HDL using Mentor Graphics HDL Designer. Next, Precision RTL, also by Mentor Graphics, was used to synthesize the HDL and produce RTL schematics with look-up tables (LUTs). The contents of the LUTs as well as the connections among them were then configured manually into each Ubicell using Ubimanager.

Routing and floor planning: Figure 6 shows the implemented system; one can see the wiring for routing and the configured Ubicells. All the routing and floor planning are conducted manually; there is no automatic tool available, so planning the location of each functional cell, the connections among cells, and the overall floor plan is a crucial part of design implementation on Ubichips and should be conducted carefully.
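As a concrete reading of the comparator described above, the Python sketch below XORs the FU outputs pairwise to flag any disagreement, and additionally shows the bit-wise 2-out-of-3 majority vote that the error-correction extension mentioned as future work would need. The 4-bit output width is an illustrative assumption.

```python
def compare(outputs):
    """outputs: list of 1-3 FU output words. Returns (error_flags, majority)."""
    flags = 0
    for a in outputs:
        for b in outputs:
            flags |= a ^ b                   # any differing bit raises a flag
    majority = None
    if len(outputs) == 3:                    # TMR mode: bit-wise 2-of-3 vote
        a, b, c = outputs
        majority = (a & b) | (a & c) | (b & c)
    return flags, majority

# -> (4, 10): flags 0b0100 mark the disagreeing bit, majority 0b1010
print(compare([0b1010, 0b1010, 0b1110]))
```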


Fig. 6. System Implementation on Ubichip

5 Implementation Results

The implementation was tested using the ModelSim tool integrated in the Ubimanager environment. Figures 7 and 8 show screen-shots of the system under simulation; one can see how different counter values result in different numbers of copies. Each system block was confirmed to be working according to the design intention. After the system was verified by simulation, it was physically implemented in the Ubichip available on the Ubidule board.

5.1 Cell Count, Area Overhead

Table 1 shows the number of cells used for each system block. The fault-tolerant system takes a total of 64 Ubicells. The area overhead in this application is significant because a very primitive 4-cell, single-MC circuit was used as the FU; when a larger FU is implemented, the area of the rest of the system remains unchanged. As the size of the Ubicell array in the Ubichip is 10 by 10 MCs (20×20 = 400 Ubicells), the area overhead of this fault-tolerant system is 16% of the total array area.

5.2 Timing Observation

Configuring cells using a serial register means that the time required for configuration increases with the number of MCs to be replicated. Table 2 shows the cycles required for the SR unit to complete for different numbers of MCs. The worst-case estimate for the operating frequency of the Ubichip is 50 MHz. Since the operation of the FU must pause during the replication process, the replication time, especially for larger FUs, may become a serious issue for timing-critical applications.


Fig. 7. Simulation View. 2FU mode

Fig. 8. Simulation View. 1FU mode

Table 1. Cell Count of the Design

Block                Number of cells
Control FSM          22
SR Controller (x3)   12
Output Comparator    11
SR Timer             5
Transition Counter   14
Functional Unit      4


Table 2. Clock cycles and time in seconds required for Self-Replication (at 50 MHz)

Number of MCs   Clock Cycles   Time in seconds
1               547            10.9 μs
2               1,072          21.4 μs
10              5,272          105 μs
300             157,522        3.15 ms
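As a reader's observation (not a formula given by the authors), the four entries in Table 2 fit a linear cost model exactly, cycles = 525·n + 22 for n macrocells, so replication time scales linearly with organism size:

```python
# Reader's check: Table 2 is consistent with cycles = 525*n + 22,
# evaluated at the 50 MHz worst-case clock.
F_CLK = 50e6
for n, cycles in [(1, 547), (2, 1072), (10, 5272), (300, 157522)]:
    assert cycles == 525 * n + 22
    print(f"{n:4d} MCs: {cycles:7d} cycles = {cycles / F_CLK * 1e6:8.1f} us")
```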

6 Conclusion

6.1 Concluding Remarks

Built-in Self-Repair (BISR) is a technique becoming more and more important as the feature size of VLSIs shrinks and the chance of faults such as aging defects and temporal errors increases. In this paper, a conceptual design of a power-aware BISR system using triple modular redundancy (TMR) was implemented on a custom dynamically reconfigurable platform. The motivation for such a system was explained, along with its design and implementation, followed by the simulation and implementation results and observations. The authors have successfully demonstrated how the Ubichip, a bio-inspired reconfigurable custom VLSI, can be used to implement flexible power-aware fault-tolerant systems.

6.2 Future Work

In order to make the fault-tolerant system presented here available for more practical uses, the authors have identified several directions for future research:

Power estimation: Accurate measurement of power consumption is necessary for a power-aware system to work correctly. As the Ubichip platform does not offer current measurement capabilities, this experiment used transitions of output values to estimate the dynamic power consumption of the functional unit. Research should be conducted to incorporate a system that measures the power consumption more accurately.

Error Correction: In this experiment, a simple bit-wise XOR circuit compared the outputs in the TMR system. Further research and development is necessary to implement error correction capabilities. Such a correction system should detect and locate the erroneous circuit, remove the faulty circuit from the TMR trio, and create a new copy of the circuit in a new location.

SR Controller: The control mechanism of the system was implemented on reconfigurable cells of the Ubichip, resulting in an area overhead on the reconfigurable fabric. Research should be conducted to study the possibility of implementing the self-replication controller circuit as part of the platform, so that developers can easily add this BISR capability to their new designs.


Developing Environment: Ubimanager provides many useful features for designing and implementing functions on the Ubichip platform. However, the lack of a high-level language compiler means that developers must implement LUT contents and routing manually, increasing the development time significantly. Furthermore, the lack of debugging tools makes it very time-consuming to detect and correct errors in a design. Design tools such as a floor planner, an interconnect router, a high-level language compiler, and a debugger would make the Ubichip more accessible for practical applications.

Acknowledgements. This work has been partially funded by the European Union (PERPLEXUS project, Contract no. 34632).

References

1. Bushnell, M.L., Agrawal, V.D.: Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Kluwer Academic Publishers, Boston (2000)
2. International Technology Roadmap for Semiconductors: 2009 ITRS report, emerging research materials. Technical report (2010)
3. Kleihorst, R.P., Benschop, N.F.: Fault tolerant ICs by area-optimized error correcting codes. In: IOLTW, p. 143. IEEE Computer Society, Los Alamitos (2001)
4. Moreno, J.M., Madrenas, J.: A reconfigurable architecture for emulating large-scale bio-inspired systems. In: IEEE Congress on Evolutionary Computation, CEC 2009, pp. 126–133 (2009)
5. Nieuwland, A.K., Kleihorst, R.P.: IC cost reduction by applying embedded fault tolerance for soft errors. J. Electronic Testing 20(5), 533–542 (2004)
6. PERPLEXUS Project: Pervasive computing framework for modeling complex virtually-unbounded systems (2010), http://www.perplexus.org/
7. Thoma, Y., Sanchez, E., Moreno, J.M., Tempesti, G.: A dynamic routing algorithm for a bio-inspired reconfigurable circuit. In: Cheung, P.Y.K., Constantinides, G.A., de Sousa, J.T. (eds.) Field-Programmable Logic and Applications, pp. 681–690. Springer, Heidelberg (2003)
8. Thoma, Y., Upegui, A.: Ubimanager: A software tool for managing ubichips. In: NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2008, pp. 213–219 (2008)
9. Thoma, Y., Upegui, A., Perez-Uribe, A., Sanchez, E.: Self-replication mechanism by means of self-reconfiguration. In: Lukowicz, P., Thiele, L., Tröster, G. (eds.) ARCS 2007. LNCS, vol. 4415. Springer, Heidelberg (2007)
10. Upegui, A., Thoma, Y., Sanchez, E., Pérez-Uribe, A., Moreno, J.M., Madrenas, J., Sassatelli, G.: The PERPLEXUS bio-inspired hardware platform: A flexible and modular approach. KES Journal 12(3), 201–212 (2008)
11. Vargas, J.S., Moreno, J.M., Madrenas, J., Cabestany, J.: Implementation of a dynamic fault-tolerance scaling technique on a self-adaptive hardware architecture. In: Prasanna, V.K., Torres, L., Cumplido, R. (eds.) Proceedings of ReConFig 2009: 2009 International Conference on Reconfigurable Computing and FPGAs, Cancun, Quintana Roo, Mexico, December 9-11, pp. 445–450. IEEE Computer Society, Los Alamitos (2009)

Automatic Synthesis of Lossless Matching Networks

Leonardo Bruno de Sá, Pedro da Fonseca Vieira, and Antonio Mesquita

Brazilian Army Technological Center, Av. das Américas, 28705, Guaratiba, Rio de Janeiro, Brazil
Federal University of Rio de Janeiro, Ilha do Fundão, Electrical Engineering Program, Rio de Janeiro, Brazil
[email protected], [email protected], [email protected]

Abstract. An evolutionary method for the synthesis of impedance matching networks is proposed. The algorithm uses a coding scheme based on the graph adjacency matrix to represent the topology and component values of the circuit. In order to generate realistic solutions, the sensitivities of the network parameters are accounted for during the synthesis process. To this end, a closed-form expression for the Transducer Power Gain sensitivity with respect to the component values of LC lossless matching networks is derived, in such a way that the effects of component tolerances on the matching network performance can easily be quantified. The efficiency of the evolutionary algorithm is tested in the synthesis of an impedance matching network and the results are compared with other methods found in the literature.

Keywords: circuit synthesis, matching networks, graph representation.

1 Introduction

The impedance matching problem consists of finding a linear time-invariant lossless two-port network (also called an equalizer) such that the available power from an input resistive generator is delivered to a complex frequency-dependent load over a prescribed frequency band [1]. Many techniques have been developed for the solution of the impedance matching problem over the last seventy years. The first successful technique, called the analytic gain-bandwidth technique [2-3], is based on a load model that characterizes the matching circuit termination by a prescribed rational transfer function. This technique provides a gain-bandwidth limitation on any lossless infinite-element matching circuit having simple RC or RLC loads. Moreover, it describes a procedure to synthesize practical equalizers for a given load model. The main drawback of this technique is the need for precise load models, requiring the use of lengthy approximation methods. In order to overcome the difficulty of designing an equalizer by analytical means, the Real Frequency Technique (RFT) emerged as a major breakthrough in equalizer design [4-5]. The main advantage of this technique is that no model of the load is required. However, the method is not quite effortless, since several computational phases involving data fitting processes and the explicit factorization of real polynomials are still necessary.


The methods described previously provide a procedure to synthesize an equalizer but present at least one of the following drawbacks: require an approximate rational model of the load [2-3], require some data fitting process [4-5], require an initial guess of the equalizer parameters [6] or use general purpose fixed topologies for the matching network [2-5]. These problems can be conveniently handled if an evolutionary approach is employed to synthesize the matching network. It will be shown that the load characteristics can be directly manipulated by the evolutionary process, improving the synthesis results since no approximation errors derived from the antenna modeling are introduced and the proposed evolutionary process does not require any data fitting or initial parameter guess. Moreover, since classical synthesis methods use a limited number of fixed topologies, they are unable to explore a wide range of design alternatives. On the other hand, evolutionary techniques may be used to find unconventional network topologies [6-7]. In the present work, a hybrid evolutionary method combining two algorithms is proposed for the matching network synthesis problem. The benefits of hybrid methods combining Genetic Algorithm (GA) and traditional optimization techniques were already attested for analog synthesis [8-9]. The topology search is provided by a GA based on the adjacency matrix chromosome representation [10], while the component values tuning is performed combining the GA and the classical Nelder-Mead Downhill Simplex [11] method. The fitness computation considers practical ranges for the component values and the matching network sensitivity, eliminating the need for post-synthesis components tolerance analysis.

2 TPG Sensitivity Computation The impedance matching problem is shown schematically in Fig 1 where the internal nodes of the lossless network are numbered from left to right starting from node 1 up to node n. V1 denotes the voltage on node 1 and Vn denotes the voltage on node n.

The values of the load impedance, Zload ( ω ) , are sampled at a discrete set of frequencies along the desired bandwidth. Then a table containing the real and imaginary parts of Zload ( ω ) at each sample frequency is known. The matching problem for a lossless network may be formulated in terms of the Transducer Power Gain (TPG) defined as the fraction of the maximum power available from the source which is delivered to the load [1]:

Fig. 1. Impedance matching arrangement

312

L.B. de Sá, P. da Fonseca Vieira, and A. Mesquita

TPG ( ω ) = 1 − ρ1 ( ω )

2

(1)

where ρ1 ( ω ) , the reflection coefficient of port 1, is a function of the input impedance Z1 ( ω ) :

ρ1 ( ω ) =

Z1 ( ω ) − R o1

(2)

Z1 ( ω ) + R o1

The voltage V1 may be expressed in terms of Z1 and R o1 as: V1 =

Z1 Z1 + R o1

(3)

where the frequency dependence was omitted to simplify. Combining (1), (2) and (3), the reflection coefficient of port 1 may be expressed in terms of V1 :

ρ1 = 2V1 − 1

(4)

Considering a lossless matching network with m parameters p = [ p1 , p 2 ,..., pm ] where the pi 's are the network component values. The TPG sensitivity with respect to each parameter, pi , is defined as [12]:

= STPG pi

pi ∂TPG ⋅ TPG ∂pi

(5)

Combining (1), (4) and (5), the TPG sensitivity may be written as:

STPG = pi

(

4pi 2

)

2V1 − 1 − 1

⋅ℜ [ 2V1 − 1] ⋅

∂V1 ∂pi

(6)

where ℜ [•] is the real part of a complex variable. Since the nodal voltages of Fig. 1 can be obtained in an AC analysis, the only unknown term to be determined in (6) is the partial derivative of V1 with respect to pi . This term can be expressed as [13]: t ∂ [Y] ∂V1 = ⎡⎣ V a ⎤⎦ ⋅ ⋅ [V] ∂pi ∂pi

(7)

where ⎡⎣ V a ⎤⎦ , [ Y ] and [ V ] are, respectively, the adjoint nodal voltage vector, the nodal admittance matrix and the nodal voltage vector. The nodal voltage vector and the adjoint nodal voltage vector can be obtained, respectively, from the AC smallsignal analysis of the circuits shown in Fig. 1 and Fig. 2. The last term in order to compute (7) is the partial derivative of the admittance matrix. There are only two possible ways of connecting a two-terminal element in a network as shown in Table 1: a “floating” connection or a “grounding” connection. The corresponding contributions to the ∂ [ Y ] / ∂pi matrix are given in the same table.

Automatic Synthesis of Lossless Matching Networks

313

Fig. 2. Circuit used to obtain the adjoint voltage vector Table 1. Partial Derivative of the Admittance Matrix Connection Case

∂ [ Y]

Circuit Connection

∂pi

⎧ j ≠ gnd floating case ⎨ ⎩ k ≠ gnd

⎧ j ≠ gnd grounding case ⎨ ⎩k = gnd

The non-zero entries of the matrices in Table 1 may be expressed as: ∂Yjj ∂pi

=

∂Yjk ∂Ykj ∂Ykk =− =− = YLC ∂pi ∂pi ∂pi

(8)

where YLC is given by:

1 ⎧ , if pi is an inductor ⎪− YLC = ⎨ j ⋅ ω⋅ L2 ⎪ j⋅ ω , if pi is a capacitor ⎩

(9)

where ω∈ Ω = ⎡⎣ωmin , ωmax ⎤⎦ . Replacing (9) and (8) in (7) and (7) in (6):

)(

)

4pi ⎧ ⋅ ℜ ⎣⎡2V1 -1⎦⎤ ⋅ YLC ⋅ Vj -Vk ⋅ Vja -Vka , if floating connection ⎪ 2 2V -1 -1 ⎪⎪ 1 STPG pi = ⎨ 4pi ⎪ ⋅ ℜ ⎡⎣2V1 -1⎤⎦ ⋅ YLC ⋅ Vj ⋅ Vja , if grounding connection ⎪ 2V -1 2 -1 1 ⎪⎩

(

)

(

)

(

(10)

Therefore, according to (10), in order to compute the TPG sensitivity for a lossless impedance matching network, two AC analyses must be performed. In the first AC

314

L.B. de Sá, P. da Fonseca Vieira, and A. Mesquita

analysis, the circuit in Fig. 1 is used to compute V1 ,…, Vk . In the second AC analysis, the circuit in Fig. 2 is used to compute the adjoint voltages V1a ,…, Vka .

3 Proposed Evolutionary Algorithm All input parameters of the proposed evolutionary algorithm are listed in Table 2. Table 2. Control Parameters used in the Proposed Evolutionary Algorithm

Maximum Number of Nodes

Parameter

Acronym N nodes _ max

Minimum Number of Nodes

N nodes _ min

2

Population Size

Nind

200

Number of Generations

N gen

100

Probability of Crossover

PC

0.6

Probability of Mutation Simplex Number of Iterations

Values 10

PM

0.1

N NM

200

λ

0.01

wTPG ,wSens ,wPF,wTR

2,2,1,1

Capacitors Interval

[ Cmin , Cmax ]

[0 ,5 ]F

Inductors Interval

[ L min , L max ]

[0 ,5 ]H

Penalty Constant Weighting Factors

3.1 Representation

Let G ( v, e ) be an oriented graph with no parallel edges and n + 1 nodes sorted out between 0 and n , where 0 is the ground node of the circuit. The reduced adjacency matrix A = [a ij ] of the oriented graph G is the n x n matrix defined as [11]: ∀ i≠ j≠0 ⎧1, if (i, j) ∈ e a ij = ⎨ ⎩0, otherwise a ii = 1, if (i, 0) ∈ e

(11)

In the above definition, the self-loops are replaced by the adjacencies to the ground node. Fig. 3 shows an example of a graph representing a typical topology of an analog circuit and its corresponding reduced adjacency matrix. The branches {e1 , e 2 , e3 } are the adjacencies to the ground node represented by the main diagonal entries. It will be shown that the adjacency matrix representation is extremely flexible, representing any lossless network topology. This is an important advantage when compared with other chromosome coding schemes found in the literature [7, 15] that limit

Automatic Synthesis of Lossless Matching Networks

315

Fig. 3. Adjacency matrix representation for analog circuits (a) oriented graph G (b) adjacency matrix A

the number of topologies generated by the evolutionary process to a small class of circuits such as ladder networks. The reduced adjacency matrix, as stated in (18), allows representing topologies of any type as shown in Fig. 4. In the proposed topology coding scheme, the capacitors, the inductors and the parallel associations (C//L) are, respectively, encoded by the numbers 1, 2 and 3. Note that the parallel association (C//L) allows the adjacency matrix to represent grounded components of different type connected to the same node.

Fig. 4. (a) Ladder and (b) non-conventional topology and their corresponding adjacency matrix representations

It can be observed from the adjacency matrix definition that the proposed encoding scheme can map at most two parallel edges between two vertices. Since elements of the same nature in parallel can be replaced by their equivalents, this does not restrict the topology search in a lossless network synthesis. In the particular case of analog circuits, the representation must simultaneously represent topology and component values. In this sense, a 3D matrix was implemented as illustrated in Fig. 5. In this structure, the first matrix dimension defines the network topology according to (18) with the elements encoding scheme used in Fig. 4. The other two matrix dimensions represent, respectively, capacitor and inductor values.

Fig. 5. Proposed representation of an impedance matching network using the adjacency matrix

316

L.B. de Sá, P. da Fonseca Vieira, and A. Mesquita

The component values ranges are defined at the start of the evolutionary process in such way that the synthesized circuit can be implemented with practical element values as described in the last two lines of Table 2. A linear normalization procedure is employed to represent the component values using a 16-bit unsigned integer matrix. 3.2 Crossover and Mutation The proposed crossover strategy, consisting in exchanging two submatrices with randomly chosen dimensions, is illustrated in Fig. 6. Assume two individuals with their adjacency matrix representations of dimensions m and n . If m < n , the coordinates of the crossover point ( i, j) are chosen on the smaller matrix, where i and j are integers inside the intervals i ∈ [0, m -1]

(12a)

j ∈ [0, m -1]

(12b)

To define the dimensions of the submatrices to be exchanged in the crossover, two integers p and q are randomly chosen in the intervals: p ∈ [1, m - i]

(13a)

q ∈ [1, m - j]

(13b)

Finally, the coordinates of the crossover point on the largest matrix are integers randomly chosen in the intervals: k ∈ [0, n - p] (14a) l ∈ [0, n - q]

(14b)

In Fig. 6, the dimensions of the matrices, the coordinates of the crossover points and the dimensions of the submatrices are, respectively, m = 3 , n = 4 , i = 1 , j = k = l = 2 , p = 1 and q = 2 .

Fig. 6. Example of a crossover between two impedance networks (a) parent 1 (b) parent 2 (c) offspring individual 1 (d) offspring individual 2

Mutation requires only one individual. The proposed mutation operator creates a new individual by randomly replacing an existing submatrix by a new submatrix containing also randomly chosen entries, as depicted in Fig. 8.

Automatic Synthesis of Lossless Matching Networks

317

Fig. 7. Example of a mutation (a) before mutation (b) after mutation

Assume one individual with its adjacency matrix representation of dimension m . The coordinates of the mutation point ( i, j) are chosen according to (12). The dimensions of the submatrix are always p = q = 1 . In Fig. 7, where a capacitor is exchanged by a parallel association of a capacitor and an inductor, the matrix dimension and the coordinates of the mutation point are, respectively, m = 3 and i = j = 1 . It can be noted that the topology and the component values of the impedance matching network are simultaneously changed by the proposed genetic operations. 3.3 Fitness Computation Since the main specifications that a practical lossless matching network should fulfill are: TPG close to one over the prescribed frequency band, low sensitivity and practical component values, the proposed evolutionary process will take into account all these characteristics in the fitness computation. The fitness should be maximized and it is defined as the inverse of the error function ε given by: ε = 1 + w TPG ⋅ ε TPG + w Sens ⋅ ε Sens + w PF ⋅ ε PF + w TR ⋅ ε TR

(15)

In this equation, ε is the total error, ε TPG is the error in the impedance matching, εSens is the error in the network sensitivity, ε PF is the error in the component values, ε TR is the error for using an ideal transformer and the w i 's are weighting factors. The choice of the weighting factors is generally based on expert knowledge of the optimization problem at hand [17]. Combining (1) and (4), the TPG can be written as a function of the voltage on node 1, which was previously stored by evolutionary algorithm after the simulator execution:

TPG = 1 − 2V1 − 1

2

(16)

Having computed the TPG values along the frequency band of interest, a minimax error criterion is used to obtain ε TPG :

{

εTPG = min max TPG −1 ω∈Ω

}

(17)

where Ω = [ ωmin , ωmax ] denotes the frequency band of interest. Low sensitivity with respect to the component values is a necessary condition for practical implementations of evolved matching networks. In this case, the closed form TPG sensitivity derived in Section II is used with a minimax criterion to compose the sensitivity error:

318

L.B. de Sá, P. da Fonseca Vieira, and A. Mesquita

⎧⎪ m ⎫⎪ (18) εSens = min ⎨ max STPG ⎬ pi ⎩⎪ i =1 ω∈Ω ⎭⎪ where m is the number of elements in the lossless matching network. The third requirement that a lossless matching network must met is related to practical component values. A penalty function strategy is used to restrict the component values to practical ranges. The major idea is to transform a constraint optimization problem into an unconstrained one by introducing a penalty term into the total error function [18]:



m

ε PF = λ ⋅

∑d

i

(19)

i =1

where λ and d i are, respectively, a user-defined constant and a distance metric for the i th constraint. The last error, ε TR , is concerned with the use of an ideal transformer for impedance matching: ⎧0.2, if an ideal transformer is used εTR = ⎨ (20) ⎩0 , otherwise 3.4 Algorithm Overview

The epistatic nature of the evolutionary analog circuit synthesis is well known [8, 19]. This means that the behavior of any analog circuit is a combined function of topology and component values. The use of a GA to modify both, the topology and the component values of a circuit may result in an inefficient evolutionary algorithm [8]. In fact, a network topology generated by genetic operations can be properly evaluated only if the component values are tuned. To overcome this problem, an evolutionary algorithm performing the topology search through a GA (crossover and mutation) and the component values tuning through a conventional optimization method should be preferred. The schematic diagram of the proposed evolutionary algorithm is shown in Fig. 8.

Fig. 8. Evolutionary algorithm used in the impedance matching network synthesis including the component values tuning step

In the proposed algorithm all individuals in the initial population are tuned by the Nelder-Mead Downhill Simplex method, which do not require the calculation of derivatives. Since there is the possibility of the Nelder-Mead algorithm getting stuck in local minima, it is combined with the random search of the GA to find the minimum to (15). The criterion of number of iterations, N NM in Table 2, is used to stop the

Automatic Synthesis of Lossless Matching Networks

319

component values tuning. In this work, a proportional selection operator with elitism is used as shown in Fig. 8. The inclusion of the component values tuning in the fitness computation requires a demanding computational effort. Except for the initial population, where all individuals must be tuned, for the next populations only part of the individuals must be tuned. This occurs since in the next populations not all individuals are submitted to the genetic operations. In this sense, a Boolean variable associated to each individual indicating if the individual suffered crossover or mutation is used. Thus, only the individuals that were affected by the genetic operators will be optimized in the tuning step, reducing the computational effort.

4 Numerical Results This example is a well-known test case called Fano LCR load [2], it was used to verify the effectiveness of the proposed evolutionary method. The load impedance consists of a 1.0Ω resistor in parallel with a 1.2F capacitor in series with a 2.3H inductor. The gain must be maximized in the bandwidth [0, 1 rad/s]. The LCR load is sampled at a discrete set of 100 frequencies along the desired bandwidth. This example was already solved in [20] using a hybrid algorithm that optimizes ladder topologies and in [21] using the Real Frequency Technique (RFT). Although the proposed algorithm can synthesize any kind of topology, in this particular case, the topology found by the proposed algorithm was the same found by the other two mentioned approaches as shown in Fig. 9.

Fig. 9. Lossless matching network topology Table 3. Parameters of example TPGmin Passband Ripple in dB Sensmax Transformer Ratio n C1 (F) L2 (F) C3 (F)

RFT [21] 0.848 0.191 24.892 1.483 0.352 2.909 0.922

Hybrid Algorithm [20] 0.852 0.239 19.876 1.485 0.386 2.976 0.951

Proposed Method 0.855 0.264 13.366 1.493 0.409 3.023 0.971

Table 3 summarizes the results. The passband ripple is defined in [21]. The proposed algorithm obtained the best result of TPG and sensitivity, but the worst of passband ripple. This is an expected consequence of trying to maximize the minimum sensitivity and passband gain regardless of the passband ripple. Fig. 10(a) shows the TPG along the

320

L.B. de Sá, P. da Fonseca Vieira, and A. Mesquita

prescribed frequency band for the three mentioned approaches. The simulations were done using HSPICE. The control parameters used by the proposed method are described in Table 2. The evolution of the best individual’s fitness throughout generations for two different configurations of the proposed evolutionary algorithm is performed. In the first configuration, without tuning step, the topology and component values are entirely manipulated by the GA. In this case, as shown in the figure, the fitness stays almost constant along the generations, since nothing was done to deal with the epistatic nature of analog circuits. In the second configuration, with tuning step, the Nelder-Mead Downhill Simplex is used with the GA. It can be noted in this case the substantial fitness changes between consecutive generations provided by the tuning step. The algorithm was run 25 times for each case and only the best individual performance of all runs is shown in Fig. 10(b).

Fig. 10. (a) Transducer Power Gain for the three approaches (b) Fitness versus generation for the best individuals in two different configurations of the evolutionary algorithm.

5 Conclusions A closed form to compute the TPG sensitivity with respect to the component values for a lossless impedance matching network was derived. An evolutionary algorithm including the sensitivity as part of the fitness computation was proposed. The representation of lossless impedance matching networks based on the adjacency matrix was presented as an alternative to representations that limit the number of topologies generated by the evolutionary process. In order to deal with the epistasy problem characteristic of the analog circuit synthesis, the conventional evolutionary algorithm steps were modified by the insertion of the component values tuning step during the fitness computation. This mechanism proved to be efficient, increasing substantially the best individual’s fitness throughout the generations of the evolutionary process. In order to test the algorithm, a well-known LCR load was used as example and it was observed that the results obtained by the proposed approach compare favorably with other results found in the literature.

Automatic Synthesis of Lossless Matching Networks

321

References 1. Balabanian, N., Bickart, T.A., Seshu, S.: Electrical Network Theory. John Wiley & Sons, Chichester (1969) 2. Fano, F.M.: Theoretical limitations on the broadband matching of arbitrary impedances. J. Franklin Inst. 249, 57–83 (1950) 3. Youla, D.C.: A new theory of broadband matching. IEEE Trans. Circuit Theory CT-11, 30–50 (1954) 4. Carlin, H.J.: A New Approach to Gain-Bandwidth Problems. IEEE Trans. on Circ. and Syst. 24(4) (April 1977) 5. Carlin, H.J., Yarman, B.S.: The Double Matching Problem: Analytic and Real Frequency Solutions. IEEE Trans. on Circ. and Syst. 30(1) (April 1983) 6. Koza, J., Bennett, F.H., Andre, D., Keane, M.A.: Genetic Programming III. Darwinian Invention and Problem Solving. Morgan Kaufmann, San Mateo (1999) 7. Lohn, J.D., Colombano, S.P.: A Circuit Representation Technique for Automated Circuit Design. IEEE Trans. Evol. Comp. 3(3), 205–219 (1999) 8. Grimbleby, J.B.: Automatic analogue circuit synthesis using genetic algorithms. IEE Proc. Circuits Devices Syst. 147(6), 319–323 (2000) 9. Damavandi, N., Safavi-Naenini, S.: A Hybrid Evolutionary Programming Method for Circuit Optimization. IEEE Trans. Circ. and Syst. I 52(5) (May 2005) 10. Mesquita, A., Salazar, F.A., Canazio, P.P.: Chromosome representation through adjacency matrix in evolutionary circuits synthesis. In: Proc. of the NASA/DoD Conference on Evolvable Hardware, pp. 102–109 (2002) 11. Nelder, J., Mead, R.: A Simplex Method for Function Minimization. Computer Journal 7, 308–311 (1965) 12. Daryanani, G.: Principles of Active Network Synthesis and Design. John Wiley & Sons, Chichester (1980) 13. Vlach, J., Singhal, K.: Computer Methods for Circuit Analysis and Design, 2nd edn. Van Nostrand Reinhold (1994) 14. Swamy, M.N.S., Thulasiraman, K.: Graphs, Networks and Algorithms. John Wiley & Sons, Chichester (1981) 15. Chang, S., Hou, H., Su, Y.: Automated Passive Filter Synthesis Using a Novel Tree Representation and Genetic Programming. IEEE Trans. Evol. Comp. 10(1), 93–100 (2006) 16. Greenwood, G.W., Tyrrell, A.M.: Introduction to Evolvable Hardware – A Practical Guide for Designing Self-Adaptive Systems. Wiley Interscience, Hoboken (2007) 17. Zebulum, R.S., Pacheco, M.A.C., Vellasco, M.M.B.R.: Evolutionary Electronics - Automatic Design of Electronic Circuits and Systems by Genetic Algorithms. CRC Press, Boca Raton (2001) 18. Smith, A.E., Coit, D.W.: Handbook of Evolutionary Computation. In: De Jong, K., Fogel, L., Schwefel, H. (eds.) C.5.2 (1997) 19. Vieira, P.F., Sa, L.B., Botelho, J.P.B., Mesquita, A.: Evolutionary synthesis of analog circuits using only MOS transistors. In: Proc. of the 2004 NASA/DoD Conference on Evolvable Hardware, pp. 38–45. IEEE Computer Press, USA (2004) 20. Rodríguez, J.L., García-Tuñon, I., Tabeada, J.M., Basteiro, F.O.: Broadband HF Antenna Matching Network Design Using Real-Coded Genetic Algorithm. IEEE Trans. Antennas Propag. 55(3) (March 2007) 21. Carlin, H.J., Amstutz, P.: On optimum broadband matching. IEEE Trans. Circuits and Syst. CAS-28, 401–405 (1981)

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device Michael Farnsworth 1, Elhadj Benkhelifa 1, Ashutosh Tiwari 1, and Meiling Zhu 2 1 Decision Engineering Centre Microsystems and Nanotechnology Centre Cranfield University, College Road, Bedfordshire, MK43 0AL {m.j.farnsworth,e.benkhelifa,a.tiwari,m.zhu}@cranfield.ac.uk 2

Abstract. This paper introduces a novel approach to the evolutionary design optimisation of an MEMS bandpass filter, incorporating areas of multidisciplinary, multi-level and multi-objective design optimisation in the process. In order to demonstrate this approach a comparison is made to previous attempts to design similar bandpass filters, providing comparable results at a significant reduction in functional evaluations. In this endeavour, a circuit equivalent of the MEMS bandpass filter is evolved extrinsically using the SPICE Simulator. Keywords: Multi-Disciplinary Optimisation; Multi-Objective Evolutionary Algorithm; Multi-Level Optimisation; MEMS; Micro-Electro-Mechanical Systems; Extrinsic Evolution.

1

Introduction

Micro-electro-mechanical systems (MEMS) or micro-machines [1,2] are a field grown out of the integrated circuit (IC) industry, utilizing fabrication techniques from the technology of Very-Large-Scale-Integration (VLSI). The goal is to develop smart micro devices which can interact with their environment in some form. The paradigm of MEMS is well established within both the commercial and academic fields. At present encompassing more than just the mechanical and electrical [3], MEMS devices now cover a broad range of domains, including the fluidic, thermal, chemical, biological and magnetic systems. This has resulted in a host of applications to arise, from micro-resonators and actuators, gyroscopes, micro-fluidic, and biological lab on chip devices, to name but a few. Normally, designs of such devices are produced in a trial and error approach dependant on user experience and naturally an antithesis to the goal of allowing designers the ability to focus on device and system design. This approach, nominally coined a ‘Build and Break’ iterative, is both time-consuming and expensive [2]. Therefore the development of a design optimisation environment [15,16], which can allow MEMS designers to automate the process of modelling, simulation and optimisation at all levels of the MEMS design process, is fundamental to the eventual progress in MEMS Industry [2]. Work in MEMS design automation and optimisation can be seen to fall into two distinct areas; firstly the more traditional approaches found within numerical methods such as gradient-based search [7]; and G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 322–334, 2010. © Springer-Verlag Berlin Heidelberg 2010

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device

323

secondly the use of more powerful stochastic methods such as simulated annealing and/or Evolutionary Algorithms (EAs) [4-6]. There has been a recent shift towards the use of EAs, and more specifically the use of Multi-Objective Genetic Algorithms (MOGA) [17] as these stochastic algorithms allow for a more robust approach to tackling the issues of a complex multi-modal landscape. The bulk of the work utilising Genetic Algorithms (GAs) and MOGA has been undertaken by researchers from the University of California, Berkeley, focusing solely on planar MEMS devices [4-6]. The paper highlights and builds upon past approaches introducing a novel multiobjective approach to the multi-level and multi-disciplinary design optimisation of a MEMS. A bandpass filter is chosen as a MEMS case study for this paper in order to demonstrate comparable results to the state of the art in the field. This MEMS device is evolved extrinsically in its equivalent analog circuit form using the SPICE simulator and then physically envisioned using the SUGAR Nodal simulator. Results are compared with those within the literature. This paper begins with a brief overview of the hierarchical design environment of MEMS in section 2, followed with a definition of the bandpass filter problem used in this study in section 3. The next section focuses on a novel evolutionary design optimisation approach to solving this problem in section 4 followed by results in section 5 and ending with conclusions. 2

Hierarchical MEMS Design

The hierarchical nature of MEMS design process provides designers with the problem of how best to approach the possible decomposition of the device at the various levels of modelling and analysis abstractions presented to them. Outlined by Senturia [14] the four levels (System, Device, Physical, and Process) each harbour its own set of tools and modelling approaches. The system level focuses upon the use of lumped element circuit models or block diagrams to model device performance, utilising powerful circuit simulators. They provide the possibility to interface with the mechanical elements of the device, either through analytical models, HDL models, reduced order models or alternatively electrical equivalent representations of the mechanical component. Both the device and physical level provide models of varying granularity. At a device level, a designer can look to build accurate 2D layout models through the use of NODAL simulators and various atomic MEMS elements, or by building mathematical analytical representations. The physical level generally utilises more expensive finite element and boundary element methods to simulate and analyse 3D models of the device. The process level looks towards the creation of appropriate mask layouts and process information needed for the batch process generally employed to fabricate the device. Therefore, by utilising system level tools it is possible to derive the function of the whole coupled electromechanical device, while the device or physical levels allow the device to be envisioned and thus allow fabrication to follow function. 3

Problem Definition

Analog circuit design for Hi, Low and Bandpass filters have been successfully undertaken using evolutionary methods in the past [8] [9], mainly through the use of genetic

324

M. Farnsworth et al.

programming and a circuit or bond graph representation [10]. These approaches looked to use components associated with circuit design and connect them in various topologies in order to match the target filter response. Recently MEMS have become a focus upon which to build devices that can provide superior performance to traditional mechanical tank components such as crystal and SAW resonators [11], widely used in bandpass filters within the radio frequency range. A feature of certain MEMS devices is the ability to represent the device as a whole in both mechanical and electrical equivalents. Taking for example a simple folded flexure resonator [11], the device can be represented as a simple spring-mass damping system, and equally this system has a similar equivalent within the electrical domain. Here the values for Mass (mrs), Stiffness (Krs), and Damping (Crs) of the resonator can be mirrored as Inductance (L), Capacitance (C), and Resistance (R) in the electrical domain. Therefore a mechanical folded flexure resonator can be represented and therefore analysed at a system level by building a simple RLC circuit. The coupling of such resonator units or ‘tanks’ through the use of mechanical bridges or springs allows the development of devices, which can provide certain filter responses. This can also be achieved in the circuit equivalent. The approach on relating the physical parameters of the folded flexure resonator to that of the equivalent circuit values has been outlined by Nguyen [11] and the subsequent equations are shown below.

ܴ௫௡ ൌ

ܿ௥௦ ඥ݇௥௦ ݉௥௦ ൌ ଶ ଶ ߟ௘௡ ܳߟ௘௡

‫ܮ‬௫௡ ൌ 

݉௥௦ ଶ ߟ௘௡

‫ܥ‬௫௡ ൌ 

ଶ ߟ௘௡ ݇௥௦

ߴ‫ܥ‬௡ ߴ‫ݔ‬

(1)

ߟ௘௡ ൌ ܸ௣௡

(2)

ʹߦܰ௙௜௡ ߝ௢ ݄ ߴ‫ܥ‬௡ ൌ ߴ‫ݔ‬ ݀

(4)

(5)

(3)

Where ܸ௣௡ is the dc bias voltage, ξ is a constant that models additional capacitance due to fringe field electrics, ߝ௢ is the permittivity of air, ݄ is the structural layer thickness, ܰ௙௜௡ is the number of comb drive fingers and ݀ is the comb finger gap spacing. Using these equations it is possible to derive resistor, capacitor and inductance values from the damping, stiffness and mass values of the resonator and equivocally vice versa. This allows a direct link between the system and device levels and as a result allows designer to derive both function and fabrication to one particular instance of the MEMS filter design. Figure 1 outlines an approach to decompose a MEMS bandpass filter into separate modelling levels, extract the chosen design variables and construct suitable genotype representations in the case of EAs. In order to assess these two levels, objective functions for evaluation need to be introduced. In the case of filter design, a target response based upon chosen design targets of ‘passband’, stopband’ and ‘central frequency’ can be constructed. Figure 2 shows how to break the filter response into sections of ‘stopband’ and ‘passband’ with ideal target values of ‘20dB or less’ and ‘0dB’. A sampling of the frequency response can then be undertaken over a specified range with the goal to have a filter response in the stopband range equal to or below the target value and in the passband range the goal

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device

325

Fig. 1. Filter Design Synthesis Breakdown Central Frequency Response One

Passband

Central Frequency Response Two

Central Frequency Target

Frequency Distance

Stopband One

Stopband Two

Fig. 2. Filter Objective Breakdown

Fig. 3. Central Breakdown

Frequency

Objective

326

M. Farnsworth et al.

is to simply match the target value. In both cases the objective function is simply the sum of the absolute error for the two ranges however considering the stopband is considerably larger a weighting factor is used to reduce this value. A second objective as shown in figure 3 looks to evaluate the distance of the peak filter response of the individual from the target central frequency. The goal being to differentiate between similar filter shapes which however may lie farther away from the target required. Once a suitable filter response has been found, the circuit model can then be converted to the equivalent mechanical values and then used as targets for 2D resonator layout design. 4

Multi-objective Evolutionary Algorithm Filter Design Synthesis

The design and optimisation of a MEMS bandpass filter forms the basis of our multi level problem. The approach used in this paper looks to couple a multi-objective genetic algorithm NSGAII [17] with an electrical circuit model representation, coined (GAECM). Utilising a varied length, real-valued and integer representation, the goal is to allow the GAECM approach to evolve the topology and parameters of the circuit in order to match the frequency response of a bandpass filter. Once a suitable filter design has been found, its values can then be converted into the equivalent mechanical values for mass and stiffness using the calculated , and then used as objective targets for the evolution of a 2D layout folded flexure resonator device. Past attempts [12][13] towards MEMS filter design optimisation have looked to couple the powerful approach of genetic programming with a bond graph representation, coined (GPBG). Though successful a large number of functional evaluations were required (2.6 million) and no respective circuit values were given and therefore it is not possible to derive whether the actual designs were physically feasible. Even so an approach was outlined to allow the automatic synthesis of a physical device in this case utilising an analytical model of a folded flexure resonator and linking it with the powerful approach of GAs [12-13]. The approach proved successful for the set of targets outlined, in this instance to match certain values for both mass, stiffness and damping of a single resonator device. However it was not a true multi-objective algorithm, nor did the actual values come from the previously designed filter. In order to solve each design problem alterations were made to the NSGAII algorithm in order to improve the overall search ability of the optimizer. The ‘SBX’ crossover for the GAECM algorithm has been adapted to be restricted to only occur between the length of the shortest individual as shown in figure 4. Included in the mutation operator is the ability to ‘clone’ or remove tanks from the individual in an attempt to aid topological search, as shown in figures 5 and 6. Tank 1

Tank 2

Tank 3

Parent One

R

C

L

CS

R

C

L

Parent Two

R

C

L

CS

R

C

L

CS

R

C

L

Restricted ‘SBX’ Crossover

Fig. 4. Restricted Crossover for System Level Representation

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device Tank 1 R1

C1

Tank 2 L1

CS2

R2

C2

327

Tank 3 L2

CS3

R3

C3

L3

R2

C2

L2

Clone

R1

C1

L1

CSc

Rc

Cc

Lc

CS2

CS3

R3

C3

L3

Insertion Point for Cloned Tank

Fig. 5. Cloning Mutation Operator Tank 1 R1

C1

Tank 2 L1

CS2

R2

C2

Tank 3 L2

CS3

R3

C3

L3

Remove Tank

R1

C1

CS3

L1

R3

C3

L3

Fig. 6. Removal Mutation Opreator

The design optimisation of the resonator looks to utilises the model representation and simulation of the NODAL analysis tool named ‘SUGAR’. This particular approach follows that of previous work [8-10] in design optimisation of MEMS using the SUGAR platform, however in this instance a completely new folded flexure resonator as shown in figure 7 is evolved in place of previous simpler meandering resonator devices. Utilising a similar hierarchical representation, the whole device consisting of both components of the central mass and supporting springs of the folded flexure are evolvable. The central mass is made up of ten beam elements, four of which can be designed and then simply mirrored to the other half of the mass. The folded flexure springs are made up of eight individual springs, four at the top and bottom, each connected by three truss beams. Each spring is made up of a number of beam elements each with their own set of design variables, in this case ‘width, length and angle’. In this particular design problem constraints are placed upon the resonator so as to adhere to a more ‘classical’ design, with fixed angles for the central mass and folded Truss Beam Two

Truss Beam One

Truss Beam Three

Beam Three Mass Spring Connector

Beam Two

Beam One Anchor

Mass Connector

Component

Centre Connector

Design Variable

Component

Design Variable

Mass Var1

Length1

Width1

Angle1

Spring1

Beam1

Beam2

Beamn

Length1

Width1

Angle1

Mass Var2

Length2

Width2

Angle2

Spring2

Beam1

Beam2

Beamn

Length1

Width1

Angle1

Mass Var3

Length3

Width3

Angle3

Spring3

Beam1

Beam2

Beamn

Length1

Width1

Angle1

Mass Varn

Lengthn

Widthn

Anglen

Springn

Beam1

Beam2

Beamn

Length1

Width1

Angle1

Fig. 7. Device Level Representation

328

M. Farnsworth et al.

Fig. 8. Whole Spring Crossover

Fig. 9. Inter Beam Crossover

Fig. 10. Central Mass Crossover

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device

329

flexure springs and a simple mirroring along the x and y axis. Adaptations to the crossover operator were introduced to mimic that of previous work [4] and replace the classic ‘SBX’ operator, with a ‘whole spring’ crossover and ‘inter beam’ crossover, shown in figures 8 and 9 respectively when evolving spring design. Central mass crossover in figure 10 however uses the original ‘SBX’ crossover operator. The use of SUGAR provides advantages over a single use analytical model, as it allows more complex devices to be evolved and in the future allows for more novel devices to be incorporated. Three case studies as shown in table 3 form the basis of testing this new approach to filter design, beginning with a relatively low frequency filter taken from [12,13], two more filter design problems are introduced to test the robustness of the algorithm at higher frequencies. The parameters used by NSGAII to solve both the system and device level design problems are shown in table 1, in this instance the system level contains a higher mutation rate to facilitate the chance of adding or removing ‘RCL’ tanks. Also two population and offspring sets were run for each case study at the system level. Table 2 holds the various parameters for the circuit design problem, resistance is worked out from capacitance, inductance and equation (1) and therefore left blank. Each case study was fixed to a specific range where points were sampled at specific frequencies and then used to evaluate the two objectives outlined previously for the system level design. These were a range of [0Hz-10kHz] for case study 1 resulting in 10,000 sampling points, and [0Hz-25kHz] and [85kHz-110kHz] for case studies 2 and 3 respectively, resulting in 25,000 sampling points. As a result weighting factors for the sum of the stopbands were set to ‘divide’ the value by 9 and 25 in order for the algorithm to not focus to heavily on optimising the stopband. Table 1. NSGAII Parameters NSGAII Probability of SBX Crossover Probability of Mutation Distribution Index for crossover Distribution Index for mutation Population Size Offspring Size Selection Size Generations Tests

System 0.8 0.35 20

Device 0.8 0.10 20

20 100 / 20 100 / 10 100 / 10 100 5

20 100 100 100 100 -

Table 2. Circuit Design Variable Parameters Variable Type Tank No Voltage Resistance (ȍ) Capacitance (F) Inductance (H) Finger Number Thickness (—m)

Case Study One Lower Upper Values Values 1 9 1 200 1e-15 1e-11 10 100000 1 200 2e-6 3e-5

Case Study Two Lower Upper Values Values 1 9 1 200 1e-17 1e-14 10 100000 1 200 2e-6 3e-5

Case Study Three Lower Upper Values Values 1 9 1 200 1e-18 1e-15 10 100000 1 200 2e-6 3e-5

Table 3. Case Study Parameter Ranges Passband Stopband 1 Stopband 2 Central Frequency

Case Study One 312Hz – 1000Hz 1Hz – 312Hz 1000Hz – 10kHz 656Hz

Case Study Two 19.5kHz – 20.5kHz 1Hz – 19.5kHz 20.5kHz – 25kHz 20kHz

Case Study Three 99.5kHz – 100.5kHz 85kHz – 99.5kHZ 100.5kHz – 110kHz 100kHz

330

M. Farnsworth et al.

5 Results and Comparison Results for each case study, and each population set for the system level filter design problem are found in table 4, with the best result ranked by filter objective listed for each test. The circuit models for test 4 of case study one for population 100 set, test 1 of case study two and test 5 of case study 3, both population 20 sets were converted to their mechanical equivalents as shown in table 5 and for each resonator ‘tank’ used as objective functions for the design synthesis of a 2D layout resonator device. The filter responses for each of these are shown in figure 11, and the evolved 2D layout designs for these filters are shown in figure 12. In the case of the 2D layout design optimisation, results which had an error of less than 0.1% for each objective were extracted. In comparison with earlier work [12,13] the results presented here show this particular approach to be robust over a set of different case studies where previous attempts focused only on one. In the course of solving each case study the GAECM method provided comparable bandpass filter shapes at a relatively small number of functional evaluations given the state of the art [12,13]. Finally the coupling of NSGAII with the NODAL platform SUGAR provided effective and fast design optimisation of the required 2D resonator layouts. Table 4. Best results for each case study ranked by filter objective Test 1 2 3 4 5 Test 1 2 3 4 5 Test 1 2 3 4 5 Test 1 2 3 4 5 Test 1 2 3 4 5 Test 1 2 3 4 5

Best Result Case Study 1: Population 100 Filter Objective Central Frequency Objective Voltage 941.76 110 112.5 953.40 86 161.7 565.25 293 66.4 478.65 24 43.9 942.03 256 159.7 Best Result Case Study 1: Population 20 Filter Objective Central Frequency Objective Voltage 940.47 112 1 1974.60 97 32.70 476.76 240 7.28 2130.29 0 109.85 2130.30 1 108.75 Best Result Case Study 2: Population 100 Filter Objective Central Frequency Objective Voltage 1798.99 230 84.3 2259.23 1250 54.99 1990.79 30 16.98 3085.71 50 102.68 2422.73 190 2.43 Best Result Case Study 2: Population 20 Filter Objective Central Frequency Objective Voltage 988.58 100 44.16 1293.24 260 78.03 2998.03 10 45.62 2095.91 150 115.56 1048.50 210 26.65 Best Result Case Study 3: Population 100 Filter Objective Central Frequency Objective Voltage 1632.81 170 86.87 2405.76 40 31.78 2712.51 110 169.61 1561.27 50 152.39 2289.03 30 197.81 Best Result Case Study 3: Population 20 Filter Objective Central Frequency Objective Voltage 2319.79 40 127.72 2181.26 30 40.30 1672.20 10 66.03 1628.61 20 27.54 1304.11 190 22.17

Tank Number 2 2 3 3 2 Tank Number 2 2 3 2 2 Tank Number 3 5 3 2 3 Tank Number 5 5 2 3 3 Tank Number 6 2 2 2 5 Tank Number 2 2 3 3 9

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device

331

Table 5. Equivalent mass and stiffness (Kx) values for the best results of each case study Individual Folded Flexure Resonator Values Equivalent Mass (kg) Tank 1 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 2 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 3 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 4 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 5 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 6 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 7 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 8 Equivalent Stiffness (N/m) Equivalent Mass (kg) Tank 9 Equivalent Stiffness (N/m)

Best Result Case Study 1 2 3 5.92e-9 2.34e-10 3.92e-10 0.083 3.91 160.52 4.78e-8 2.50e-10 4.15e-10 0.073 3.24 159.72 3.03e-8 2.67e-10 4.03e-10 0.281 3.99 159.74 2.77e-10 3.92e-10 3.99 160.52 2.26e-10 2.95e-10 3.92 159.74 3.90e-10 159.74 4.18e-10 158.80 4.11e-10 160.52 4.07e-10 159.74

(b)

(a)

(c) Fig. 11. Filter frequency response for the best result for case study one (a), case study two (b) and case study three (c), ranked by filter response objective

332

M. Farnsworth et al.

(a)

(b)

(c) Fig. 12. Folded flexure resonator layout designs for best results from case studies one (a), two (b) and three (c)

6 Conclusions and Future Work Moving towards a more multi-level approach to design optimisation of MEMS will prove to be a challenging task. Presented here was a simple approach to the coupling of both system and device level tools in the hope of designing and optimising a MEMS bandpass filter. This involved combining multiple disciplines from the electrical and mechanical domain, utilising separate circuit level modelling and analysis tools such as ‘SPICE’ with a mechanical NODAL simulator ‘SUGAR. The new GAECM approach proved successful in evolving designs which gave comparable results to earlier work [12][13], but at a fraction of the cost, needing only 10,000 functional evaluations in comparison to 2.6 million with the GPBG approach. Also our designs were restricted to bounds which gave rise to feasible and realisable physical targets unlike previous attempts, by using the required electrical equivalent to mechanical equivalent conversion method presented in [11]. This allowed for the creation of filter designs which could be feasible and realisable in terms of fabrication of the resulting 2D layout designs. The design synthesis of the specific 2D folded flexure resonator devices was undertaken through the SUGAR platform and then using the multi-objective genetic algorithm NSGAII designs were evolved to match the required targets optimally found at the system level. By using NSGAII it is possible to undertake true multi-objective optimisation and the integration of it at both system and device level make the job of coupling the two levels together at a later date far easier than a separate genetic programming and GA approach. The use of a NODAL simulator proved successful in evolving designs that could match the target values

A Novel Approach to Multi-level Evolutionary Design Optimization of a MEMS Device

333

required proving 100% successful in solving all designs with 0.1% target error for each objective set. Also the functional evaluations for each design stood only at 10,000, significantly less than the 137,500 of the current state of the art approach [12,13]. The approach presented proved to be robust enough to handle bandpass filter design problems over a wide range, topological search was facilitated by the introduced changes in the GAECM approach, as can be seen in table 5 with ‘cloning’ of RCL tanks proving essential to both case studies 2 and 3. Overall the novel approach proved to be around 260x faster in terms of required functional evaluations for the filter design problem at the system level, and around 14x as effective at the device level when compared with the state of the art currently [12,13]. Future work looks to expand this approach to include more levels of the MEMS design process, specifically that of the physical level. Here designers utilize finite element and boundary element models to accurately analyse and design MEMS devices at a significant computational cost. Therefore any approach which can look to automate and hasten the design optimisation at this level will be of great benefit.

References [1] Fujita, H.: Two Decades of MEMS– from Surprise to Enterprise. In: Proceedings of MEMS, Kobe, Japan, pp. 21–25 (January 2007) [2] Benkhelifa, E., Farnsworth, M., Tiwari, A., Bandi, G., Zhu., M.: Design and Optimisation of microelectromechanical systems: A review of the state-of-the-art. International Journal of Design Engineering 3(1), 41–76 [3] Hsu, T.R.: MEMS and Microsystems, 2nd edn. Wiley, Chichester (2008) [4] Zhou, N., Agogino, A.M., Pister, K.S.: Automated Design Synthesis for Micro-ElectroMechanical Systems (MEMS). In: Proceedings of the ASME Design Automation Conference, ASME CD ROM, Montreal, Canada, September 29-October 2 (2002) [5] Kamalian, R.H., Takagi, H., Agogino, A.M.: Optimized Design of MEMS by Evolutionary Multi-objective Optimization with Interactive Evolutionary Computation. In: Proceedings of GECCO 2004 (Genetic and Evolutionary Computation Conference), Seattle, Washington, June 26-30 (2004) CD ROM [6] Zhang, Y., Kamalian, R., Agogino, A.M., Séquin, C.H.: Design Synthesis of Microelectromechanical Systems Using Genetic Algorithms with Component-Based Genotype Representation. In: Proc. of GECCO 2006 (Genetic and Evolutionary Computation Conference), Seattle, July 8-12, vol. 1, pp. 731–738 (2006) ISBN 1-59593 187-2 [7] Haronain, D.: Maximizing microelectromechanical sensor and actuator sensitivity by optimizing geometry. Sensors and Actuators A 50, 223–236 (1995) [8] Koza, J.R., Bennett III, F.H., Andre, D., Keane, M.A., Dunlap., F.: Automated Synthesis of Analog Electrical Circuits by Means of Genetic Programming. IEEE Transactions on Evolutionary Computation 1(2), 109–128 (1997) [9] Lohn, J.D., Colombano, S.P.: A Circuit Representation Technique For Automated Circuit Design. IEEE Transactions on Evolutionary Computation 3(3), 205–219 (1999) [10] Fan, Z., Hu, J., Seo, K., Goodman, E.D., Rosenberg, R.C., Zhang, B.: A Bond Graph Representation Approach for Automated Analog Filter Design [11] Wang, K., Nguyen, C.T.-C.: High-Order Medium Frequency Micromechanical Electronic Filters. Journal of MicroElectroMechanical Systems 8(4), 534–556 (1999)

334

M. Farnsworth et al.

[12] Fan, Z., Seo, K.K., Hu, J., Rosenberg, R.C., Goodman, E.D.: System-level synthesis of mems via genetic programming and bond graphs. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 2058–2071. Springer, Heidelberg (2003) [13] Fan, Z., Wang, J., Achiche, S., Goodman, E., Rosenberg, R.: Structured synthesis of MEMS using evolutionary approaches. Applied Soft Computing 8, 579–589 (2008) [14] Senturia, S.D.: Microsystem Design, 8th edn. Kluwer Academic Publishers, Dordrecht (2001) ISBN-0-7923-7246-8 [15] Benkhelifa, E., Farnsworth, M., Tiwari, A., Zhu, M.: An Integrated Framework for MEMS Design Optimisation using modeFrontier. In: EnginSoft International Conference 2009. CAE Technologies For Industry and ANSYS Italian Conference (2009) [16] Benkhelifa, E., Farnsworth, M., Tiwari, A., Zhu, M.: Evolutionary Algorithms for Planar MEMS Design Optimisation: A Comparative Study. In: International Workshop on Nature Inspired Cooperative Strategies for Optimization, NICSO 2010 (to be Published 2010) [17] Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In: Deb, K., Rudolph, G., Lutton, E., Merelo, J.J., Schoenauer, M., Schwefel, H.-P., Yao, X. (eds.) PPSN 2000. LNCS, vol. 1917, pp. 849–858. Springer, Heidelberg (2000)

From Binary to Continuous Gates – and Back Again Matthias Bechmann1, , Angelika Sebald1 , and Susan Stepney2 1

Department of Chemistry, University of York, YO10 5DD, UK [email protected] 2 Department of Computer Science, University of York, YO10 5DD, UK

Abstract. We describe how nuclear magnetic resonance (NMR) spectroscopy can serve as a substrate for the implementation of classical logic gates. The approach exploits the inherently continuous nature of the NMR parameter space. We show how simple continuous NAND gates with sin/sin and sin/sinc characteristics arise from the NMR parameter space. We use these simple continuous NAND gates as starting points to obtain optimised target NAND circuits with robust, error-tolerant properties. We use Cartesian Genetic Programming (CGP) as our optimisation tool. The various evolved circuits display patterns relating to the symmetry properties of the initial simple continuous gates. Other circuits, such as a robust XOR circuit built from simple NAND gates, are obtained using similar strategies. We briefly mention the possibility to include other target objective functions, for example other continuous functions. Simple continuous NAND gates with sin/sin characteristics are a good starting point for the creation of error-tolerant circuits whereas the more complicated sin/sinc gate characteristics offer potential for the implementation of complicated functions by choosing some straightforward, experimentally controllable parameters appropriately.

1

NMR and Binary Gates

Nuclear magnetic resonance (NMR) spectroscopy in conjunction with nonstandard computation usually comes to mind as a platform for the implementation of algorithms using quantum computation. Previously we have taken a different approach by exploring (some of) the options to use NMR spectroscopy for the implementation of classical computation [5]. We have demonstrated how logic gates can be implemented in various different ways by exploiting the spin dynamics of non-coupled nuclear spins in a range of solution-state NMR experiments. When dealing with spin systems composed of isolated nuclear spins, the underlying spin dynamics can be described conveniently by the properties of magnetisation vectors and their response to the action of radio-frequency (r.f.) pulses of different durations, phases, amplitudes and frequencies. Together with the integrated intensities and/or phases of the resulting NMR signals, this scenario provides a 

Corresponding author.

G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 335–347, 2010. Springer-Verlag Berlin Heidelberg 2010

336

M. Bechmann, A. Sebald, and S. Stepney

Fig. 1. NOR gate implemented using NMR. a) NMR pulse sequence. b) Spectra corresponding to the four possible gate outputs where the integrated spectral intensity is mapped to logic outputs 0 and 1. c) Logic truth table mapping NMR parameters to gate inputs 0 and 1. (adapted from [5]).

rich parameter space and a correspondingly large degree of flexibility regarding choices of input and output parameters for the construction of logic gates. Fig. 1 shows an NMR implementation of a NOR gate, for illustration. The effects of r.f. pulses on a given nuclear spin system are fully under experimental control, and the response of the spin system is fully predictable with no approximations involved. An NMR experiment usually starts from the magnetisation vector in its equilibrium position: aligned with the direction of the external magnetic field (the z-direction in the laboratory frame). An r.f. pulse tips the magnetisation vector away from the z-direction. By choosing the duration, amplitude and frequency of the pulses appropriately, the tip of the magnetisation vector can be used to sample the entire sphere around its origin (Fig. 2).

Fig. 2. Magnetisation vector manipulation by r.f. pulses, e.g. rotation of magnetisation vector S from the z-direction to the −y-direction by a suitable r.f. pulse (a)). Structure of a r.f. pulse displaying characterisation parameters for amplitude, frequency, duration and phase as possible gate input controls (b)).

Our previous NMR implementations of logic gates [5] exploited special positions on this sphere, such as NMR spectra corresponding to the effects of 90 , or 180 , or 45 pulses to create binary input/output values. We have demonstrated that there are many different ways for such implementations of conventional logic gates by slightly less conventional NMR implementations, including many

From Binary to Continuous Gates – and Back Again

337

Fig. 3. 2D function graphs displaying influence of NMR parameters on the output of continuous NAND gates. a) Using the duration τp of the r.f. pulse and the duration of a preacquisition delay τd , resulting in sin dependence of both inputs. b) Using the resonance frequency offset ωp and the r.f. pulse duration τp , a sinc dependence for ωp and a sin dependence for τp is obtained. c) Comparison of experimental and theoretical result for a slice of sinc/sin NAND gate (in b) without mapping to the [0, 1] interval. This corresponds to the region in b) marked by the vertical bar in upper right corner. The deviation between experiment and simulation is always less than 0.5 percent.

different ways to define input and output parameters. There are many more possibilities for NMR implementations of conventional logic gates and circuits. Note that for these discrete logic gates a one-to-one mapping of the NMR parameter(s) to the binary state of the gate is possible in a straightforward manner. In this paper we concentrate on another aspect of NMR implementations of classic logic gates. Whereas previously our main focus was on the multitude of different options for implementing discrete logic gates and circuits by NMR, here we exploit another property of basic NMR experiments. Only a minute fraction of, for example, the space accessible to the magnetisation vector has so far been exploited for the construction of discrete logic gates. Now we lift this restriction and take advantage of the inherent continuous properties of our system and the natural computational power provided by the system itself [6]. The underlying continuous spin dynamics hereby provide the basis to the implementation of continuous logic operations. Compared to [5] this means we no longer restrict the inputs and outputs to be the discrete values 0 and 1, but allow them to be continuous values between 0 and 1.

2

Functions of NMR and Continuous Gates

Depending on the position of the magnetisation vector at the start of signal acquisition, the time-domain NMR signal is composed of sin and cos functions, with an exponentially decaying envelope (the so-called free induction decay, FID). Accordingly, trigonometric and exponential functions are two of the continuous functions inbuilt in any NMR experiment. Most commonly, NMR signals are represented in the frequency domain. Hence, Fourier transformation gives access to, for example, the sinc function ((sin x)/x) if applied to a truncated exponential decay. Fig. 3 illustrates this shift to continuous logic gates: we show the NMR implementation of NAND gates where the inputs have functional dependencies of sin/sin (Fig. 3a) and sin/sinc (Fig. 3b). Note how they have the same

338

M. Bechmann, A. Sebald, and S. Stepney

digital NAND gate behaviours at the corners {0, 1} × {0, 1}, but very different behaviours in between. Fig. 3c shows experimental NMR data representing the sinc function used in Fig. 3b. Taking the step to continuous gates, the input/output mapping now applies to the [0, 1] interval and is not as trivial as it is for the discrete logic gates. However, the NMR input parameters and output functions are known in analytical form, giving access to boolean behaviour at the corners of the two-dimensional parameter space, and continuous transitions in between. The digital NAND gate is universal. Here we relax the constraints on the inputs, to form our continuous NAND gates. These continuous gates can serve as starting points for the optimisation of certain properties of the NAND gate itself or, alternatively, for the optimisation of circuits based on NAND gates. We show how to obtain robust NAND gates (ones that still function as digital NAND gates, even if the inputs have considerable errors), by evolving circuits of the continuous single NAND gates with sin/sin (Fig. 3a) and sin/sinc characteristics (Fig. 3b). Then we evolve circuits for a robust XOR gate, constructed from continuous simple NAND gates. Finally, we briefly address the topic of more general continuous gates based on different functions [2] and how the naturally occurring continuous NMR functions may be exploited in such circumstances. Our optimisation tool is Cartesian Genetic Programming (CGP) [3].

3 3.1

Evolving Robust Continuous Gates and Circuits Continuous NAND Gate with sin/sin Characteristics

This continuous gate is based on the NMR parameters τp (pulse duration) and τd (preacquisition delay) (see Figs. 2b and 3a). It involves the following mapping of the NMR input parameters In 1 and In 2 :

In 1 =

In 1 , In 2 τp ; τp90

∈ [0, 1] In 2 = 1 −

τd τd90

(1)

where τp90 corresponds to a pulse duration causing a 90 flip of the magnetisation vector and τd90 is the duration of a preacquisition delay causing a 90 phase shift of the magnetisation vector in the xy-plane. The output of the simple sin/sin NAND gate implemented by the NMR experiment is then     Out = 1 − sin π2 In 1 sin π2 In 2 (2) Our target robust NAND gate is shown in Fig. 4a. It is a continuous gate, with discrete state areas which, accordingly, should represent an error-tolerant, robust gate. The sampling points used to define the fitness function for evolving this robust gate are shown in Fig. 4b. The fitness function f defined over these N sampling points is N  1   (3) f= evo   1 + Outi − Outtarget i i=1

From Binary to Continuous Gates – and Back Again

339

Fig. 4. a) Target robust NAND gate with discrete state areas. This is robust to errors in the inputs, yielding a correct digital NAND gate for inputs rounded to 0 or 1. b) Sampling points used in the fitness function to evolve the robust NAND gate.

Fig. 5. a) Functional behaviour of the array of nine continuous sin/sin NAND gates. b) Optimisation result being a linear array of nine continuous sin/sin NAND gates.

The evolved robust NAND gate is shown in Fig. 5 (see 7 for the CGP parameters used). It displays the desired feature of well-defined, discrete state areas. The behaviour towards the centre differs from Fig. 4a, but provides no contribution to the fitness function. The evolved circuit for the robust NAND gate is a linear array of nine simple NAND gates (Fig. 5b). With increasing lengths of the NAND-gate chains, the resulting circuit for the robust gate becomes fitter. Odd length chains converge to the robust NAND gate behaviour, whereas even-length chains converge toward a corresponding robust AND gate. This is illustrated in Fig. 6. The first simple NAND gate in the chain performs the NAND operation; all the remaining gates, with their paired inputs, act as simple NOT gates. The increasing length chain converges to fitter circuits, because of the S-shaped (1 − sin2 π2 x) form of the sin/sin gate along its x = y diagonal: any value passing through a pair of simple NOT gates moves closer to being 0 or 1, and so converges to 0 or 1 as the chain of simple NOT gates lengthens. The maximum displacement of points by a single NOT gate operation towards 0 or 1 is ≈ 0.11. This can be interpreted as a threshold for the convergence and stability of the array. Random fluctuations added numerically to every gate output in the range of [±0.1] do not hinder the convergence of the array (Fig. 6 last column). For rather large error values (> 0.2) the arrays tend to destabilise, especially for longer arrays.

340

M. Bechmann, A. Sebald, and S. Stepney

Fig. 6. Convergence of theoretical NAND gate arrays. Odd-numbered arrays converge toward target NAND gate (top row), even-numbered arrays (bottom row) converge toward a corresponding AND gate. The final circuit in each row displays the stability of the array convergence under erroneous signal transduction between gates, assuming random fluctuations in the range of [±0.1].

There are two possible sources for experimental imperfection and therefore imperfect gate behaviour: the accuracy by which the experimental NMR parameters (ωp , τp , . . .) can be executed by the NMR hardware; and the accuracy by which the NMR spectra can be acquired and analysed (integrated in this case). A comparison shows that the fluctuations caused by the measurement and analog-digital conversion are by far the dominating factors (e.g. pulses used were of duration 2.5 ms ±50 ns [1], while fluctuations in signal intensity were < ±0.5%). 3.2

Continuous NAND Gate with sin/sinc Characteristics

We now consider circuits based on the continuous simple sin/sinc NAND gate (Fig. 3b), again aiming for the target robust NAND gate with discrete state areas (Fig. 4a). Here mapping of the NMR parameters ωp (r.f. pulse frequency offset) and τp (r.f. pulse duration) is the following In 1 =

τp τp90

;

In 2 = 1 −

ωp ωpmax

(4)

where ωpmax is the maximum allowed r.f. frequency offset (minimum of sinc function). The output of the simple sin/sinc NAND gate implemented by the NMR experiment is then  |κp90 | κ2p90 sin2 (ωeff τp ) + 2ωp2 (1 − cos (ωeff τp )) (5) Out = 1 − 2 ωeff

From Binary to Continuous Gates – and Back Again

341

 where ωeff = ωp2 + κ2p90 assuming a perfect π/2 magnetisation flip for an onresonance r.f. pulse of amplitude and duration κp90 and τp90 respectively. The continuous sin/sinc NAND gate is a more complicated situation because it does not display symmetry along the diagonal, in contrast to the sin/sin NAND gate. We approach evolution of a robust NAND circuit based on simple sin/sinc NAND gates in a step-wise manner. Gate Confined to Include only the First Minimum of the sinc Function. To start with, we use a simple sin/sinc NAND gate confined to include only the first minimum of the sinc function (Fig. 7a).

Fig. 7. a) Initial simple sin/sinc NAND gate with one minimum included. b) CGP evolved result. c) Array of nine simple continuous sin/sinc NAND gates.

Fig. 7b shows the CGP evolved result, a robust arrangement of discrete state areas. The evolved circuit shown at the top of Fig. 7b is more complicated than the linear chain of NAND gates previously found in the circuit based on simple sin/sin NAND gates. If we build such a linear circuit from simple sin/sinc NAND gates we do find an acceptable solution (Fig. 7c), but with slightly poorer fitness. Despite the loss of symmetry of our sin/sinc starting NAND gate, repeated application of linear chains of increasing lengths still converges to the desired behaviour (Fig. 8). Gate Confined to Include the Second Minimum of the sinc Function. Next, we use a simple sin/sinc NAND gate confined to include the first two minima of the sinc function (Fig. 9a). Again, we compare the result of a CGP evolution (Fig. 9b) and the result of applying the linear array of nine simple sin/sinc NAND gates (Fig. 9c). CGP is successful in finding a solution which is fairly well optimised around the 16 sampling points (Fig. 4b), but the areas in between now display less obvious and more complicated characteristics. The linear chain of nine simple sin/sinc NAND gates is here slightly less successful finding a good solution at and around the sampling points, but a pattern relating to the number of minima in the starting gate is emerging. With only one minimum included, there are essentially just two levels in the contour

342

M. Bechmann, A. Sebald, and S. Stepney

Fig. 8. Convergence of one-minimum sin/sinc NAND gate chains for increasing (oddnumbered) chain length.

Fig. 9. a) Initial simple sin/sinc NAND gate with two minima included. b) CGP evolved result. c) Array of nine simple continuous sin/sinc NAND gates.

plot (Fig. 7c). Now, with two minima included, we find three distinct levels (around 0, around 0.5, and around 1; see Fig. 9c), separated from each other by steep steps. Fig. 10 shows the results of repeated application of linear arrays of simple sin/sinc NAND gates of increasing length. One can see how for the application of longer chains the terraced structure and step functions converge.

Fig. 10. Convergence of two-minima sin/sinc NAND gate chains for increasing (oddnumbered) chain length.

From Binary to Continuous Gates – and Back Again

343

Fig. 11. a) Initial simple sin/sinc NAND gate with three minima included. b) CGP evolved result. c) Array of nine simple continuous sin/sinc NAND gates.

Gate Confined to Include the Third Minimum of the sinc Function. Fig. 11 summarises the results when we include three minima of the sinc function in our starting sin/sinc NAND gate. CGP again evolves a solution which is optimised around all 16 sampling points (Fig. 11a), but with even more complicated behaviour in between. The (unevolved) linear chain of sin/sinc NAND gates now creates four distinct levels and an overall stepped structure, but is less fit with respect to the fitness function sampling points of Fig. 4b. From these results, we can see that continuous simple sin/sinc NAND gate can act as a good starting point for the implementation of a variety of complicated functions, simply by choosing the number of minima included appropriately for the starting continuous gate, and by defining a suitable number of sampling points.

4

Evolving XOR Circuits Using NAND Gates

Here we briefly demonstrate that this strategies used for evolving robust NAND circuits can also be used to obtain circuits with other functionality built from simple NAND gates. We use the continuous simple sin/sin NAND gate (Fig. 3a) as the starting point. Our target circuit is a robust XOR gate with discrete state areas (Fig. 12a), with the same 16 sampling points as before. An XOR gate constructed from simple sin/sin NAND gates (the grey region of Fig. 12b) gives the continuous behaviour shown in Fig. 13a. If this is followed by our previously discovered strategy of a chain of simple NAND gates (Fig. 12b), we get the result shown in Fig. 13b: a robust XOR gate. If we use CGP to evolve a solution from scratch, we get the more complicated circuit shown in Fig. 12c, with fitter continuous behaviour (Fig. 13c). Note that evolution here rediscovers the chaining strategy, and applies it to the final part of the circuit.

344

M. Bechmann, A. Sebald, and S. Stepney

Fig. 12. a) The target XOR gate with discrete state areas. b) Applying the NAND-gate chain approach for optimisation. c) CGP evolved circuit.

Fig. 13. a) XOR gate built from continuous NAND gates without optimisation. b) Result of NAND-gate chain approach. c) CGP evolved XOR gate.

5

Truly Continuous Gates

So far we have been using the continuous behaviour of the simple gates to implement robust, but still essentially digital, gates. In this section we use a different fitness function to evolve circuits with interesting truly continuous behaviour. We can make boolean logic continuous on the interval [0,1] by defining AND(a, b) = min(a, b) and NOT(a) = 1 − a (see [2]). These have the digital behaviour at the extreme values. Then NAND = 1 − min(a, b) (Fig. 14a). We start from the continuous simple sin/sin NAND gate (Fig. 14b). At first glance this seems to be a more straightforward optimisation task than for the robust gates, given that both the starting gate and the target function are continuous in nature, with a similar initial structure. Here we take a fitness function sampled over more points in the space, using a regular grid of 6 × 6 points. The evolved result is shown in Fig. 14c, together with the corresponding, rather elaborate, circuit. Here the more complex circuit yields only modest

From Binary to Continuous Gates – and Back Again

345

Fig. 14. a) The target NAND gate where NAND = 1 − min(a,b). b) The initial simple sin/sin NAND gate. c) The CGP evolved gate. d) The CGP evolved circuit (5% mutation rate, population size 500, best fitness 35.25, 10000 generations).

Fig. 15. Stability and error propagation through CGP evolved gate in Fig. 14c: with random error (a) [±0.5%]; (b) [±1%]; (c) [ ±10%]

improvements over the simple gate, with agreement between target and evolved function improving by about a factor 2 over that of the single simple sin/sin NAND gate. In particular, the evolved circuit does not really help to improve agreement with the most prominent feature of the target function, the sharp diagonal ridge. More work is needed to match the natural properties provided by the NMR system with the desired properties of the continuous gates. Fig. 15 shows the truly error tolerant behaviour of the CGP evolved gate in Fig. 14c.

346

6

M. Bechmann, A. Sebald, and S. Stepney

Conclusions and Next Steps

CGP has proved effective at evolving specific continuous circuits from the continuous simple NAND gates provided by our NMR approach. In particular, the simple sin/sinc gates can provide a rich set of disctretised behaviours. In these experiments, neither the robust gates, nor the truly continuous gates, are inspired by the natural properties of the NMR system, but rather by mathematical abstractions. Next steps will involve investigating and exploiting what the simple NAND gates “naturally” provide.

7

Experimental Setup

Evolutionary Setup. We use a modified version of the CGP code of [4]. Our setup uses a linear topology of 60 nodes plus input and output nodes with the maximum number of level-back connections. Optimum results used between nine and 33 nodes. The mutation rate during evolution was varied between 0.5% and 50%, where rates between 5% and 10% performed best. Populations of 50/500 were evolved for 10000 generations. Results presented are the best of 10 evolutionary runs. NMR Spectroscopy. 1 H NMR spectra of 99.8% deuterated CHCl3 (Aldrich Chemicals) were recorded on a Bruker Avance 600 NMR spectrometer, corresponding to a 1 H Larmor frequency of −600.13 MHz. On-resonant 90 pulse durations were 2.5 ms and recycle delays 3 s. Hardware limitations [1]: duration of r.f. pulses accurate to ±50 ns; pulse rise and fall times 5 ns and 4 ns respectively; pulse amplitude switched in 50 ns with a resolution of 0.1 dB; phases are accurate to ±0.006 degree and switched < 300 ns; r.f. range is 3–1100 MHz with a stability of 3 ·10−9 /day and 1 ·10−8 /year and a resolution of < 0.005 Hz. Frequency switching is < 300 ns for 2.5 MHz steps and < 2μs otherwise. Main source of experimental errors is integration error due to limited digitisation resolution, 0.5% maximum.

Acknowledgements We gratefully acknowledge the Leverhulme Trust for supporting this work. We thank Shubham Gupta, IIT Mumbai, India, for his cooperation in the initial stages of this work, supported by the TRANSIT project (EPSRC grant EP/F032749/1), and John Clark, York for continued discussions and comments.

References 1. Butler, E.: NMR hardware user guide version 001. Tech. rep., Bruker Biospin GmbH, Rheinstetten, Germany (2005) 2. Levin, V.: Continuous logic - i. basic concepts. Kybernetes 29(16), 1234–1249 (2000)

From Binary to Continuous Gates – and Back Again

347

3. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000) 4. Miller, J.: Cartesian genetic programming source code (July 2009), http://sites.google.com/site/millerjules/professional 5. Rosell´ o-Merino, M., Bechmann, M., Sebald, A., Stepney, S.: Classical computing in nuclear magnetic resonance. Int. J. of Unconventional Computing 6(3–4) (2010) 6. Stepney, S.: The neglected pillar of material computation. Physica D: Nonlinear Phenomena 237(9), 1157–1164 (2008)

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits Cristian Ruican, Mihai Udrescu, Lucian Prodan, and Mircea Vladutiu Advanced Computing Systems and Architectures Laboratory University “Politehnica” Timisoara, 2 V. Parvan Blvd., Timisoara 300223, Romania {crys,mudrescu,lprodan,mvlad}@cs.upt.ro http://www.acsa.upt.ro

Abstract. Setting the values of various parameters for an evolutionary algorithm is essential for its good performance. This paper discusses two optimization strategies that may be used on a conventional Genetic Algorithm to evolve quantum circuits: adaptive (parameters initial values are set before actually running the algorithm) or self-adaptive (parameters change at runtime). The differences between these approaches are investigated, with the focus being put on algorithm performance in terms of evolution time. When taking into consideration the runtime as main target, the performed experiments show that the adaptive behavior (tuning) is more effective for quantum circuit synthesis as opposed to self-adaptive (control). This research provides an answer to whether an evolutionary algorithm applied to quantum circuit synthesis may be more effective when automatic parameter adjustments are made during evolution.

1

Introduction

The continuous pursuit for performance pushes the exploration of new computing paradigms. The acquired experience from classical computation is considerable, as it is developed over more than half a century, whereas for quantum computing the race has started relatively recently, in the 1980’s. Even from today’s perspective, it cannot exactly be foreseen whether quantum computer will become physically feasible in the next decade. Evolutionary search was already applied for quantum circuit synthesis, with the focus being on the analysis of the genetic operators and their corresponding performance. The task of implementing the Meta-Heuristic approach on Quantum Circuit Synthesis (MH-QCS) makes use of the ProGA [5] framework, that provides all the necessary support for developing genetic algorithms. Our ProGA framework underpins a robust and optimized environment, its architecture being extended to handle the additional statistical information. The statistical data is processed on-the-fly by the adaptive algorithm and the results are used for adjusting the genetic operator’s rates during run-time. We focus on the genetic algorithm parameter control by involving statistical information taken from the current state of the search into algorithm decision. Our experiments reveal a higher convergence rate for the genetic evolution and therefore an important runtime G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 348–359, 2010. c Springer-Verlag Berlin Heidelberg 2010 

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits

349

speedup is achieved by using adaptive parameter tuning, as opposed to the selfadaptive parameter tuning approach. The automatic synthesis of a quantum circuit, for a given function, is not an easy achievable task [16][17][18]; in order to solve this problem the genetic algorithm will evolve a possible solution that will be evaluated against other previous solutions obtained, and eventually a close-to-optimal solution will be indicated. It is hard, if not impossible, to guess the values used for the tuning of genetic algorithm, because even a small change in the circuit topology will generate a different quantum logic function; this is the main motivation for adopting an adaptive genetic algorithm.

2

Background

Quantum computation is computation made with coherent atomic scale dynamics. A quantum computer is a physical device able to perform computation driven by quantum mechanical phenomena, such as entanglement and superposition of basis states. For the classical computer, the unit of information is the bit, whereas in quantum computation its counterpart is the so-called qubit. A quantum bit may be represented by using the spin 1/2 particle. For example, a spin-down | ↓ and a spin-up | ↑ may be used to represent the binary information encoded as |0 and |1. In Bra-Ket notation, a qubit is a normalized vector in a two dimensional Hilbert space |ψ = α|0 + β|1, |α|2 + |β|2 = 1 (α,β ∈ C), where |0 and |1 are the superposed basis states [9]. Genetic Algorithms (GA) are adaptive heuristic search algorithms based on evolutionary ideas of natural selection used to find solutions for optimization and search problems. The new field of Evolvable Quantum Information (EQI) has been established as the merging of quantum computation and evolvable computation [8]. The problem of setting values for different control parameters is crucial in the context of algorithm performance. Each GA parameter is responsible for controlling the evolution path towards the solution. There are two major forms of setting the parameter values for a genetic algorithm [15]: – Parameter tuning: the parameter values are fixed before the algorithm run and remain as such during run-time. There are several disadvantages for tuning: finding good parameters before the run may be time consuming and it is possible not to get optimal values for all the phases. – Parameter control: the initial parameter values are changed during the algorithm run, keeping the dynamic spirit of evolution. The adaption algorithm uses the feedback values from the process to adjust the parameters for better performance. As presented in Figure 1, the upper part of the hierarchy contains a method that aims at finding optimal parameters for the GA, while the lower part is dedicated to possible problem solutions on the application layer. We use the same approach of splitting the design into several layers. Thus, the quantum

350

C. Ruican et al.

(a) control flow

(b) information flow

Fig. 1. The 3-layered hierarchy of parameter tuning [15]

circuit synthesis genetic algorithm will run in the application layer, while the algorithm responsible with the dynamic adjustment of the operators will run in the design layer.

3

Search Methodology

Evolutionary algorithms relate to the probability theory, which is essential for the quantitative analysis of large sets of data, having as starting point the evolution of any random variable (i.e. representation types, selection methods, different operators used, etc; as opposed to the selection methods defined as natural values, the operators are in continuous space). Consider (Ω,S,P) a probability field, where Ω is the set of elementary events, S is the events space and P is a probability measure, then a random variable over Ω is an application X:Ω → R taking the following form: {ω|X(ω) < x} (1) where any subset of Ω is a part of S, where x is a random real number. We can define the probability measure for x: P (X < x) = P {ω|X(ω) < x}

(2)

Algorithm convergence is reducible to convergence in probability, which can be demonstrated by using probability values. It is considered that the evolutionary algorithms exhibit increased robustness (they work well on different data sets) largely due to the optimization functions, where the performance function (fitness) is always followed by the optimization function (metaheuristics). This way, evolutionary algorithms provide better results in comparison with other approaches (i.e. gradient type methods). If we consider X as being the solution space (a set of any individual solution states), then each individual is represented by an element from X; f : X → R. Our purpose is to identify maxx∈X f where x is a vector of decision variables that satisfies f (x) = f (x1 , ..., xn ). The individual fitness is evaluated using a performance function defined as:

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits

351

eval(x) = f (x) + W × penalty(x)

(3)

where f=

f unction(evolved circuit) f unction(initial circuit)

(4)

and penalty = 1 −

number of evolved gates − number of initial gates number of initial gates

(5)

W is a user-defined weight indicating how severely the penalty affects the evaluation. The search process dynamics is generated by applying the crossover and mutation operators. The purpose is to find an optimal combination from population individuals, as the one corresponding to the maximum value for the performance function. Each program execution contains the same number of individuals and it is considered that a following run will always contain better individuals than those from a previous run; the algorithm trend being to reach the global optimum value for the performance function. Each genetic operator is applied with a given probability (defined as an algorithm parameter) over the current population, subsequently generating a new population. Our previously developed framework ProGA [Programming Genetic Algorithms] [5] is a new and powerful tool used to solve real-world genetic problems, in order to compensate for the situations where conventional (deterministic) methods fail to successfully produce an acceptable solution. A significant part of the effort has been dedicated to maintaining the framework’s extensibility which proves especially useful for comparison purposes when two different approaches are applied (adaptive and self-adaptive behavior). While the implementation details were hidden from dedicated components (from a macro level perspective), they do not to affect the chromosome details (at the micro level). The adaptive or self-adaptive rules are not pre-programmed but discovered during the algorithm evolution. The framework allows for different configurations, and thus the comparison between the characteristics of the emerged solutions becomes straightforward and accurate. At the Macro Level the genetic algorithm describes the iteration loops that increase the convergence of the individuals towards a possible solution. Knowledge about the termination criterion and about the probability used for the natural selection is available at this level. The macro level is also responsible for creating the population that will be used during the evolution. At the Micro Level the chromosome details are essential, together with the operations that may be applied (initialization, crossover, mutation, etc.) This level is important for the solution encoding. To provide an accurate comparison between the two major forms of setting parameter values (adaptive and self-adaptive), a third level is introduced in order to interface the common part of the algorithm with the parameter tuning parts (see Fig. 2). Adaptive evolutionary computations can be separated from the main GA in order to facilitate the assessment of the evolved results.

352

C. Ruican et al.

Fig. 2. System’s provided levels

First, a direct relationship between population and adaptive components is present because additional statistical information from the current generation is necessary for parameter adjustment (the decision will later be taken when enough statistical information become available from previous generations). Second, when self-adjustment is used a relationship between the chromosome and the self-adjustment component will be created (the decision is taken by each individual on the applied operator). The quantum circuit representation is crucial for chromosome encoding. Following Nature, where a chromosome is composed of genes, in our chromosome the genes represent circuit sections. This way, we are able to encode a circuit within a chromosome [4], and therefore represent a possible candidate solution (as presented in Fig.3a). A gene will store the specific characteristics of a particular circuit section and genetic operators will be applied either at the gene level or inside the gene. The genome representation is an array of quantum gates that are chosen randomly from a given set, with the only constraint that a quantum gate cannot be split in two genes. The initialization is performed once (at start-up), and is responsible with the genome creation (see Fig. 3b). A gene stores the specific characteristic of a particular quantum circuit section where the mutation operator has the role of producing a change, hence allowing the search algorithm to explore new spaces. The crossover operator will select gates from parents to create offsprings, by copying their contents and properties. 3.1

Static GA Operators

Parameter tuning is one of the approaches used for optimization problems. The parameter values are changed statically before the algorithm run, followed by results evaluation. The tuning becomes complicated when an increased number of parameters need to be adjusted. Thus, “considering four parameters and five values for each of them, one has to test 54 = 625 different setups. Performing 100 independent runs with each setup, this implies 62,500 runs just to establish a good algorithm design” [1]. Algorithm parameters are usually not independent and testing each of the possible combinations proves as practically impossible in many cases while certainly being extremely time-consuming.

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits chromosome encoding gene encoding

gene 1

gene 2

353

gene m

n

1

(a) encoding

(b) control Fig. 3. Chromosome Encoding (a) and Chromosome Initialization (b)

3.2

Adapting GA Operators

Adaptive methods make use of additional information from the current state of the search. The statistical information is later used by the adaptive component for adjusting the algorithm operators. Compared with the static adjustment, for example, in incipient generations, large mutation steps are necessary for a good exploration in the problem search space, and later in the last runs only small mutation steps are needed to narrow the optimal solution. From the meta-heuristic point of view, it is considered that genetic algorithms contain all the necessary information for adaptive behavior. Nevertheless, the adaptive behavior optimizes the circuit synthesis algorithm (from the user point of view the setting of parameters is far from being a trivial task). Two types of statistical data are used as input for the adaptive algorithm. The first type is represented by the fitness results for each population corresponding to the best, mean and worst chromosomes. The second type is represented by the operator performance (see Fig.4). In reference [6], it is considered that the performance records are essential for deciding on operators reward. Functions as Maximum, Minimum, Average and Standard Deviation may be applied on any kind of statistical data. For each generation the maximum, average and minimum fitness values are provided by the genetic algorithm framework and stored within the statistical data. After each generation, the operator performance is updated with statistical data. Following the 1/5- Rechenberg rule [3], the analysis of the acquired data is performed after 5 generations. The operator reward is updated according to the formula given in Eq.6. When the genetic evolution is finished (i.e. when a solution has been

354

C. Ruican et al.

Fig. 4. Adaptive information flow

evolved), other statistical functions are computed. Thus, we defined statistical functions on each generation and statistical functions over all generations. σ(op) = α ∗ Absolute + β ∗ Relative − γ ∗ InRange − δ ∗ W orse

(6)

where the parameters α, β, γ and δ are introduced to rank the operator performance; they are not adjusted during the algorithm evolution. In our experiments we used the following values: α = 20, β = 5, γ = 1 and δ = 1. 3.3

Self-adapting GA Operators

An important view on the optimization problems is emphasized by the ”No Free Lunch“ theorem, stating that any additional performance over one class of problems is exactly paid in terms of performance over another class [2]. The self-adaptive parameter control algorithm outperforms this limitation due to continuous adjustment of the operators probability (evolution together with the algorithm). The dynamic parameters customization will properly handle the objective function, the encoding and the constraints. This approach leads to a flexible genetic algorithm, where the tuning is automatically performed during the genetic evolution. When an evolutionary computation evolves the new values for its adaptive parameters, it is considered to be self-adaptive. The algorithm goal is to dynamically adjust the values for its parameters to bias the evolution of the offspring (i.e. by increasing the algorithm convergence). Following this approach, the chromosome will store additional information about the applied operator success rate within the self-adaptive component (see Fig. 5). The success rate is defined, in the same way as the performance records from the adaptive approach, and it is used to identify a better operator result at the chromosome level. For the adaptive approach the performance records are saved at the population level, whereas for the self-adaptive approach the save is performed at the chromosome level. The decision component returns the result of the comparison between the success rate values for both GA operators - mutation and crossover - and then decide on which operator has more chance of creating a better offspring. If we compare it with the adaptive behavior, where after only 5 generations the adjustment is made (and decided at the population

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits

355

Fig. 5. Self-Adaptive information flow

level), at the self-adaptive each chromosome -based on its success rates- decide on the applied GA operator (there is no probability involved). Even if the GA operators parameter are now removed from our equation, a GA contains other parameters that need to be manually adjusted (i.e. the population size). A small number of individuals generate a ramp-up through solution at the start-up, but it is possible that a solution is not evolved later. If the number of individuals is too high then any generation evaluation takes long time. This paper has made the case for the optimal evolution when the solution is evolved in a faster manner.

4 4.1

Evaluating Quantum Circuit Evolution Experimental Platform

The experiments were conducted on a computer with the following configuration: Intel Core2Duo processor at 2GHz, 4GB RAM memory and Open SuSe 11.2 as operating system. In order to avoid lucky guesses the experiments have been repeated for 10 times, the average result being used for comparison in the provided graphics. To measure the performance of an application, it is common to measure the time spent until a solution is evolved. Because the results may appear within a small period of time, a fine granularity for time measurement was necessary. We used the RDTSC (Read Data Time Stamp Counter) to measure the processor ticks in order to provide excellent, high-resolution information. The number of ticks is independent from the processor platform and it accurately measures short duration events (with laptops or systems supporting Intel@Speed Technology the processor frequency will change as a result of CPU utilization when running on batteries). To estimate the time duration, the number of ticks should be divided by the processor frequency. Each case study is started with a benchmark quantum circuit (see Figure 6) that is used for synthesis algorithm evaluation. For each benchmark the name of the circuit is presented along with its number of qubits (for diversity purpose we performed the evaluation on three-qubit, four-qubit and five-qubit circuits).

356

C. Ruican et al.

|a •



|a ⊕ |b • ⊕ •



|c • • ⊕ ⊕ • (a) ham3

|a • •

|b ⊕ •

|b • ⊕ • •

|c

|c

•⊕

|d

|d ⊕



|e

(b) rd32

⊕• ⊕• ⊕ (c) xor5

Fig. 6. Benchmark circuits used for analysis [7]

Table 1. Configuration for Experiments Configuration Parameter Adaptive Self-Adaptive GA type Non-Overlapping Non-Overlapping Population size 150 150 Generations 100 100 Mutation type Multiple Multiple Crossover type Two points Two points Selector type Roulette Wheel Roulette Wheel Elitism percent 10 10 Mutation probability 0.03 NA Crossover probability 0.3 NA Adaptive increase/decrease 0.1/0.1 NA

The following configuration (see Table 1) is used to evolve synthesis solutions, mutation and crossover probabilities being adjusted during the evolution by following the adaptive or self-adaptive algorithm. The experimental results are presented as tables (see Table 2); the tests and the software source code are made available over the personal web site[13]). Table 2. Experimental Results ham3 rd32 xor5 Parameter Adaptive Self-Adaptive Adaptive Self-Adaptive Adaptive Self-Adaptive MPR 70.00 33.67 64.67 66.00 64.67 57.29 96.25 96.25 97.81 97.81 97.81 99.38 MBF 5.22E+08 1.06E+09 9.03E+08 1.09E+10 9.03E+08 6.61E+10 MT 4 6 3 3 3 7 S

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits

4.2

357

Comparison Analysis

During the evaluation of the experiments, configurable variables were used to measure and control the application results. The data analysis creates correlations between the adjusted (adaptive or self-adaptive) operators and the algorithm results. The algorithm was tested over different quantum circuits for different difficulty levels by increasing the number of circuit qubits. Four factors are explored within our experiments (see Table 2): – MPR (Mean Percentage Runs): average number of evolved generations until a solution is evolved, over all runs – MBF (Mean Best Fitness): average of the best fitness in the last population, over all runs – MT (Mean Time): average of executed ticks until a solution is evolved (measure within the current generation) – S (Solutions): number of evolved solutions Figure 7 contains a detailed comparison on each experimental run applied for the xor5 quantum circuit. For full experimental results the reader is kindly referenced to [13].

Fig. 7. MBF experimental results for the xor5 quantum circuit

Before any analysis of our results for these test cases, we note that our quantum synthesis algorithm always converges toward a solution. Considering all aspects, the adaptive approach proves more effective in developing a faster convergence because better offsprings are evolved, although the self-adaptive approach should be better in terms of evolving solutions, at least theoretically. To this end, consisting of synthesizing quantum circuits, the main goal was to reduce the evolution time; this justifies our choice for the adaptive approach.

358

C. Ruican et al.

In more detail, the effectiveness of the genetic adaptive algorithm is proven for quantum circuit synthesis. The computational power overhead, required by the adaptive component is reasonably small (see MT values expressed in comparison with the self-adaptive); however, the number of evolved solutions is higher for the self-adaptive approach.

5

Conclusions and Perspectives

This paper presented our experimental results over using two different optimization strategies for evolving quantum circuits. For this task, the best performance was achieved by using the adaptive as opposed to the self-adaptive approach. As already proven in [19] [20], metaheuristic approaches are more effective in evolving quantum circuits, being able to provide solutions for 8-qubit circuits, considering that conventional GA approaches are effective only for 4 or 5-qubit circuits. These previously experimented methods employ only the adaptive approach. The implementation and testing of the another metaheuristic approach (i.e. self-adaptive) is presented herein, with the emphasis being put on the comparison between the two strategies, at the algorithmic level. The experimental results suggest that the adaptive strategy is better than the self-adaptive one, for all the considered benchmark circuits. In fairness, it has to be mentioned that the experience gained for developing the adaptive metaheuristic will suggest the fact that further research will level this gap. Nevertheless, the difference in performance obtained by performing the experiments can also be explained by performing more mutations than necessary in many cases (due to the fact that each individual decides about its applied genetic operator). Our future work will try to investigate algorithms with a smaller number of parameters, in order to render the most effective metaheuristic strategy when evolving quantum circuits.

Acknowledgements This work was supported in part by the National University Research Council, Romania, under grant PNII-I17/2007.

References 1. Eiben, A.E., Michalewicz, Z., Schoenauer, M., Smith, J.E.: Parameter Control in Evolutionary Algorithms. In: Parameter Setting in Evolutionary Algorithms, Springer, Heidelberg (2007) 2. Wolpert, D.H., Macready, W.G.: No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 67(1), 67–82 (1997) 3. Rechenberg, I.: Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen. Frommann-Holzboog, Stuttgart (1973) 4. Ruican, C., Udrescu, M., Prodan, L., Vladutiu, M.: Automatic Synthesis for Quantum Circuits using Genetic Algorithms. In: International Conference on Adaptive and Natural Computing Algorithms, pp. 174–183 (2007)

Adaptive vs. Self-adaptive Parameters for Evolving Quantum Circuits

359

5. Ruican, C., Udrescu, M., Prodan, L., Vladutiu, M.: A Genetic Algorithm Framework Applied to Quantum Circuit Synthesis. In: Nature Inspired Cooperative Strategies for Optimization, pp. 419–429 (2007) 6. Gheorghies, O., Luchian, H., Gheorghies, A.: Walking the Royal Road with Integrated-Adaptive Genetic Algorithms. University Alexandru Ioan Cuza of Iasi (2005), http://thor.info.uaic.ro/~tr/tr05-04.pdf 7. Maslov, D.: Reversible Logic Synthesis Benchmarks Page (2008), http://www.cs.uvic.ca/%7Edmaslov/ 8. Spector, L.: Automatic Quantum Computer Programming. A Genetic Programming Approach, 2nd edn. Springer, Heidelberg (2006) 9. Nielsen, M., Chuang, I.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000) 10. Yao, X.: An Empirical Study of Genetic Operators in Genetic Algorithms. Microprocessing and Microprogramming 38(1-5), 707–714 (1993) 11. Hilding, F.G., Ward, K.: Automated Operator Selection on Genetic Algorithms. Knowledge-Based Intelligent Information and Engineering Systems, 903–909 (2005) 12. Affenzeller, M., Wagner, S.: Offspring Selection: A New Self-Adaptive Selection Scheme for Genetic Algorithms. Adaptive and Natural Computing Algorithms, 218–221 (2005) 13. Ruican, C.: Projects Web Site Page (2010), http://www.cs.utt.ro/~crys/index_files/public/ices.tar.gz 14. Luke, S.: Essentials of Metaheuristics. Zeroth Edition (2009), http://cs.gmu.edu/~sean/book/metaheuristics/ 15. Smit, S.K., Eiben, A.E.: Comparing Parameter Tuning Methods for Evolutionary Algorithms. In: IEEE Congress on Evolutionary Computation, pp. 399–406 (2009) 16. Maslov, D., Dueck, G.W.: Level Compaction in Quantum Circuits. In: IEEE Congress on Evolutionary Computation, pp. 2405–2409 (2006) 17. Shende, V., Prasad, A.K., Markov, I.L., Hayes, J.P.: Synthesis of Reversible Logic Circuits. IEEE Transaction on CAD 22 22(6), 710–722 (2003) 18. Lukac, M., Perkowski, M.: Evolving quantum circuits using genetic algorithm. In: NASA/DoD Conference on Evolvable Hardware, pp. 177–185 (2002) 19. Ruican, C., Udrescu, M., Prodan, L., Vladutiu, M.: Quantum Circuit Synthesis with Adaptive Parametres Control. In: European Conference on Genetic Programming, pp. 339–350 (2009) 20. Ruican, C., Udrescu, M., Prodan, L., Vladutiu, M.: Genetic Algorithm Based Quantum Circuit Synthesis with Adaptive Parameters. In: IEEE Congress on Evolutionary Computation, pp. 896–903 (2009)

Imitation Programming Larry Bull Department of Computer Science, University of the West of England, Bristol BS16 1QY, U.K. [email protected]

Abstract. Many nature-inspired mechanisms have been presented for computational design and optimization. This paper introduces a population-based approach inspired by a form of cultural learning - imitation. Imitation is typically defined as learning through the copying of others. In particular, it is used in this paper to design simple circuits using a discrete dynamical system representation – Turing’s unorganised machines. Initial results suggest the imitation computation approach presented is competitive with evolutionary computation, i.e., another class of stochastic population-based search, to design circuits from such recurrent NAND gate networks. Synchronous and asynchronous circuits are considered.

1 Introduction Cultural learning is learning either directly or indirectly from others and imitation is a fundamental form of such adaptation. Dawkins [9] has highlighted the similarity between the copying of behaviours through imitation and the propagation of innate behaviours through genetics within populations. That is, he suggests information passed between individuals through imitation is both selected for by the copier and subject to copy errors, and hence an evolutionary process is at work - consequently presenting the cultural equivalent to the gene, the so-called meme. The term “memetic” has already been somewhat inaccurately adopted by a class of search algorithms which combine evolution with individual learning, although a few exceptions include imitation (e.g., [40]). Some previous work has explored the use of imitation (or imitation-like) processes as a general approach to computational intelligence however, including within reinforcement learning (e.g., [29]) and supervised learning (e.g., [5]). The imitation of humans by machines has been used to design robot controllers (e.g., [6]) and computer game agents (e.g., [13]). Other culture-inspired schemes include the use of artifacts (e.g., [17]) or the use of stored information to guide the production of new evolutionary generations, as in Cultural Algorithms [30]. This paper introduces a new form of imitation computation and applies it to the design of (simple) dynamical circuits consisting of uniform components. In 1948 Alan Turing produced a paper entitled “Intelligent Machinery” in which he highlighted cultural learning as a possible inspiration for techniques by which to program machines (e.g., see [8] for an overview). In the same paper, Turing also presented a formalism he termed “unorganised machines” by which to represent intelligence within computers. These consisted of two types: A-type unorganised machines, G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 360–371, 2010. © Springer-Verlag Berlin Heidelberg 2010

Imitation Programming

361

which were composed of two-input NAND gates connected into disorganised networks (Figure 1, left); and, B-type unorganised machines which included an extra triplet of NAND gates on the arcs between the NAND gates of A-type machines by which to affect their behaviour in a supervised learning-like scheme through the constant application of appropriate extra inputs to the network (Figure 1, right). In both cases, each NAND gate node updates in parallel on a discrete time step with the output from each node arriving at the input of the node(s) on each connection for the next time step. The structure of unorganised machines is therefore very much like a simple artificial neural network with recurrent connections and hence it is perhaps surprising that Turing made no reference to McCulloch and Pitts’ [22] prior seminal paper on networks of binary-thresholded nodes. However, Turing’s scheme extended McCulloch and Pitts’ work in that he also considered the training of such networks with his B-type architecture. This has led to their also being known as “Turing’s connectionism”. Moreover, as Teuscher [35] has highlighted, Turing’s unorganised machines are (discrete) nonlinear dynamical systems and therefore have the potential to exhibit complex behaviour despite their construction from simple elements. Around the same time as Turing was working on artificial intelligence in the 1940’s, John von Neumann, together with Stanislaw Ulam, developed the regular lattice-based discrete dynamical systems known as Cellular Automata (CA) [38]. That is, CAs are discrete dynamical systems which exist on a graph of restricted connectivity but with potentially any logical function at each node, whereas unorganised machines exist on a graph of potentially any connectivity topology but with a restricted logical function at each node. Given their simple structure from universal gates, the current work aims to explore the potential for circuit design using unorganised machines through the use of imitation computation.

Fig. 1. A-type unorganised machine consisting of four two-input NAND gates (left). B-type unorganised machine (right) consisting of four two-input NAND gates. Each connecting arc contains a three NAND gate “interference” mechanism so that external inputs such as S1 and S2 can be applied to affect overall behaviour, i.e., a form of supervised learning.

2 Background The most common form of discrete dynamical system is the Cellular Automaton which consists of an array of cells where the cells exist in states from a finite set and

362

L. Bull

update their states in parallel in discrete time. Traditionally, each cell calculates its next state depending upon its current state and the states of its closest neighbours. Packard [26] was the first to use a computational intelligence technique to design CAs such that they exhibit a given emergent global behaviour, using evolutionary computation. Following Packard, Mitchell et al. (e.g., [24]) have investigated the use of a Genetic Algorithm (GA) [16] to learn the rules of uniform one-dimensional, binary CAs. As in Packard’s work, the GA produces the entries in the update table used by each cell, candidate solutions being evaluated with regard to their degree of success for the given task — density and synchronization. Andre et al. [2] repeated Mitchell et al.’s work evolving the tree-based LISP S-expressions of Genetic Programming (GP) [20] to identify the update rules. They report similar results. Sipper [31] presented a non-uniform, or heterogeneous, approach to evolving CAs. Each cell of a one- or twodimensional CA is also viewed as a GA population member, mating only with its lattice neighbours and receiving an individual fitness. He showed an increase in performance over Mitchell et al.’s work by exploiting the potential for spatial heterogeneity in the tasks. The approach was also implemented on a Field-Programmable Gate Array (FPGA) and, perhaps most significantly, the inherent fault-tolerance of such discrete dynamical systems was explored. That is, it appears the behaviour of such systems gives them robustness to certain types of fault without extra mechanisms. This finding partially motivates the current study. Another early investigation into discrete dynamical networks was that by Kauffman (e.g., see [18] for an overview) with his “Random Boolean Networks” (RBN). An RBN typically consists of a network of N nodes, each performing one of the possible Boolean functions with K inputs from other nodes in the network, all updating synchronously. As such, RBN may be viewed as a generalization of A-type unorganised machines (since they only contain NAND gates, with K=2). Again, such discrete dynamical systems are known to display an inherent robustness to faults - with low K (see [1] for related results with such regulatory network models in general). RBN have recently been evolved for (ensemble) computation [28]. A number of representations have been presented by which to enable the evolution of computer programs and circuits. Most relevant to the representation to be explored in this paper is the relatively small amount of prior work on arbitrary graph-based representations. Significantly, Fogel et al. (e.g., [11]) were the first to evolve graphbased (sequential) programs with their use of finite state machines – Evolutionary Programming (EP). Angeline et al. [4] used a version of Fogel et al.’s approach to design highly recurrent artificial neural networks. Teller and Veloso’s [34] “neural programming” (NP) uses a directed graph of connected nodes, each with functionality defined in the standard GP way, with recursive connections included. Here each node executes in synchronous parallelism for some number of cycles before an output node’s value is taken. Luke and Spector [21] presented an indirect, or cellular, encoding scheme by which to produce graphs, as had been used to design artificial neural networks (e.g., [14]), an approach used to design both unorganised machines [35] and automata networks [7]. 
Poli has presented a scheme wherein nodes are connected in a graph which is placed over a two-dimensional grid. Later, recurrent artificial neural networks were designed such that the nodes were synchronously parallel and variants exist in which some nodes can update more frequently than others (see [27] for an overview). Miller (e.g., [23]) has presented a restricted graph-based representation

Imitation Programming

363

scheme originally designed to consider the hardware implementation of the evolved program wherein a two-dimensional grid of sequentially (feed forward) updating, connected logic blocks is produced. The implementation of arbitrary graphs onto FPGAs has also been considered [37]. An example of what might be identified as a population-based imitation approach is the class of algorithms known as Particle Swarm Optimization (PSO) [19]. Originally intended as a simulation tool for modelling social behaviour, PSO algorithms typically maintain a population of real-valued individuals which move through the problem space by adjusting their constituent variables based upon both their own best ever solution and the current best solution within a social or spatial group. That is, it can be said individuals imitate aspects of other current individuals to try to improve their fitness, typically using randomly weighted coefficients per variable via vector multiplication. In this paper a related form of imitation computation is presented and used to design synchronous and asynchronous dynamical circuits from variable-sized graphs.

3 Designing Unorganised Machines through Imitation A-type unorganised machines have a finite number of possible states and they are deterministic, hence such networks eventually fall into a basin of attraction. Turing was aware that his A-type unorganised machines would have periodic behaviour and he stated that since they represent “about the simplest model of a nervous system with a random arrangement of neurons” it would be “of very great interest to find out something about their behaviour” (see [8]). Figure 2 shows the fraction of nodes which change state per update cycle for 100 randomly created networks, each started from a random initial configuration, for various numbers of nodes N. As can be seen, the time taken to equilibrium is typically around 15 cycles, with all nodes changing state on each cycle thereafter, i.e., oscillating. For the smaller networks (N=5, N=50), some nodes remain unchanging at equilibrium however; with smaller networks, the probability of nodes being isolated is sufficient that the basin of attraction contains a degree of node stasis (see [35] for a similar study).

Fig. 2. Showing the average fraction of two-input NAND gate nodes which change state per update cycle of random A-type unorganised machines with various numbers of nodes N

364

L. Bull

Previously, Teuscher [35] has explored the use of evolutionary computing to design both A-type and B-type unorganised machines together with new variants of the latter. In his simplest encoding, an A-type machine is represented by a string of N pairs of integers, each integer representing the node number within the network from which that NAND gate node receives an input. Turing did not explicitly demonstrate how inputs and outputs were to be determined for A-type unorganised machines. Teuscher used I input nodes for I possible inputs, each of which receive the external input only and are then connected to any of the nodes within the network as usual connections. That is, they are not NAND nodes. He then allows for O outputs from a pre-defined position within the network. Thus his scheme departs slightly from Turing’s for B-type unorganised machines since Turing there showed input NAND nodes receiving the external input (Figure 1). Teuscher uses his own scheme for all of his work on unorganised machines, which may be viewed as directly analogous to specifying the source of inputs via a terminal set in traditional tree-based GP. The significance of this difference is not explored here, with Turing’s input scheme used. Teuscher used a GA to design A-type unorganised machines for bitstream regeneration tasks and simple pattern classification. In the former case, the size of the networks, i.e., the number of nodes, was increased by one after every 30,000 generations until a solution was found. That is, an epochal approach was exploited to tackle the issue of not knowing how complex an A-type unorganised machine will need to be for a given task. Or a fixed, predefined size was used. The basic principle of imitation computation is that individuals alter themselves based upon another individual(s), typically with some error in the process. Individuals are not replaced with the descendants of other individuals as in evolutionary search; individuals persist through time, altering their solutions via imitation. Thus imitation may be seen as a directed stochastic search process, thereby combining aspects of both recombination and mutation used in evolutionary computation. In this paper a variable-length representation of pairs of integers, defining node inputs, each with an accompanying single bit defining the node’s start state, is used. On each round of imitations, each individual in the society/population chooses another to imitate. A number of schemes are possible, such as those used in PSO, but the current highest quality solution is used by all individuals for each trait here. To encourage compact solutions, in the case of a quality tie, the smallest high quality solution is used, or a randomly chosen such individual if a further tie occurs. In the general case, for each trait/variable of an individual, a probability that the imitator will replace their own corresponding variable with a copy of that of the imitated solution (pi) could be tested. If satisfied, a further probability (pe) would then be tested to see if an error will occur in that process. For simplicity, in this paper pi is not used on a per variable basis but deterministically set such that one imitation event occurs per round per individual, with pe = 0.5. The possible imitation operations are to copy a connection, copy a start state, or copy solution size, all with or without error. 
For node connection without error, a randomly chosen node has one of its randomly chosen connections set to the same value as the corresponding node and its same connection in the individual it is imitating. When an error occurs, the connection is set to the copied connection’s id +/- 1 (equal probability, bounded by solution size). Imitation can also copy the start state for a randomly chosen node from the corresponding node, or do it with error (bit flip here). Varying

Imitation Programming

365

solution size depends upon whether the two individuals are the same size, with perfect and erroneous versions again used. Thus if a change of size imitation event is chosen and if the individual being imitated is larger than the copier, the connections and node start state of the first extra node are copied to the imitator, a randomly chosen node being connected to it. If the individual being imitated is smaller than the copied, the last added node is cut from the imitator and all connections to it re-assigned. If the two individuals are the same size, either event can occur (with equal probability). Node addition adds a randomly chosen node from the individual being imitated onto the end of the copier and it is randomly connected into the network. Node deletion is as before. The operation can also occur with errors such that copied connections are either incremented or decremented within bounds. For a problem with a given number of inputs I and a given number of outputs O, the node deletion operator has no effect if the solution consists of only O + I nodes. Similarly, there is a maximum size defined beyond which the growth operator has no effect. A process similar to the selection scheme typically used in Differential Evolution [33] is adopted here: each individual in the current population (μ) creates one alternative solution under imitation (μ’) and it is adopted by that individual if it is of higher quality. In the case of ties, the solution with the fewest number of variables/traits is adopted to reduce bloat, otherwise the decision is random. Other imitation algorithms have made the adoption of imitated solutions probabilistic (e.g., [15]), whereas PSO always accepts new solutions but then also imitates from the given individual’s best ever solution per learning cycle. This aspect of the approach, like many others, is open to future investigation.

4 Experimentation A simple version of the multiplexer task is used initially in this paper since they can be used to build many other logic circuits, including larger multiplexers. These Boolean functions are defined for binary strings of length l = x + 2x under which the x bits index into the remaining 2x bits, returning the value of the indexed bit. The correct response to an input results in a quality increment of 1, with all possible 2l binary inputs being presented per fitness evaluation. Upon each presentation of an input, each node in an unorganised machine has its state set to its specified start state. The input is applied to the first connection of each corresponding I input node. The unorganised machine is then executed for T cycles, where T is typically chosen to enable the machine to reach an attractor. The value on the output node(s) is then taken as the response. It can be noted that Teuscher [35] used the average output node(s) state value over the T cycles to determine the response, again the significance (or not) of this difference is not explored here. All results presented are the average of 10 runs, with a population/society of μ=20 and T=15. Experience found giving initial random solutions N = O+I+30 nodes was useful across all the problems explored here, i.e., with the other parameter/algorithmic settings described. Figure 3 (left) shows the performance of the approach on the 6-bit (x=2) multiplexer problem. Optimal performance (64) is obtained around 5,000 iterations and solutions are eventually two or three nodes smaller than at initialization.

366

L. Bull

Fig. 3. Performance on multiplexer (left) and demultiplexer (right)

A multiplexer has multiple inputs and a single output. The demultiplexer has multiple inputs and multiple outputs. Figure 3 (right) shows performance of the same algorithm for an x=2 demultiplexer, i.e., one with three inputs and four outputs. Again, quality was determined by feeding each of the possible inputs into the A-type machine. It can be seen that optimal performance (8) is reached around 7,000 iterations and solutions are typically around ten nodes smaller than at initialization. As noted above, A-type machines are similar to RBN. The effects of increasing the logic functions to {AND, NAND, OR, NOR, XOR, XNOR}, with a corresponding extra imitation operation, have been briefly explored on the same tasks. Results (not shown) indicate either no statistically significant difference in performance or a significant reduction in performance is seen: Turing’s simpler scheme appears to be robust. However, significantly smaller solutions were sometimes seen which is potentially useful for circuit design, of course.

5 Asynchrony Turing’s unorganized machines were originally described as updating synchronously in discrete time steps. However, there is no reason why this should be the case and there may be significant benefits from relaxing such a constraint. Asynchronous forms of CA have been explored (e.g., [25]) wherein it is often suggested that asynchrony is a more realistic underlying assumption for many natural and artificial systems. Asynchronous logic devices are also known to have the potential to consume less power and dissipate less heat [39], which may be exploitable during efforts towards hardware implementations of such systems. Asynchronous logic is also known to have the potential for improved fault tolerance, particularly through delay insensitive schemes (e.g., [10]). This may also prove beneficial for direct hardware implementations. See Thompson et al. [36] for evolving asynchronous hardware.

Imitation Programming

367

Fig. 4. Showing the average fraction of two-input NAND gate nodes which change state per update cycle of random asynchronous A-type unorganised machines with various numbers of nodes N.

Asynchronous CAs have also been evolved (e.g., [32]). No prior work on the use of asynchronous unorganized machines is known. Asynchrony is here implemented as a randomly chosen node (with replacement) being updated on a given cycle, with as many updates per overall network update cycle as there are nodes in the network before an equivalent cycle to one in the synchronous case is said to have occurred. Figure 4 shows the fraction of nodes which change state per update cycle for 100 randomly created networks, each started from a random initial configuration, for various numbers of nodes N. As can be seen, the time taken to equilibrium is again typically around 15 cycles, with around 10% of nodes changing state on each cycle thereafter, i.e., significantly different behavior to that seen for the synchronous case shown in Figure 2. For the smaller networks (N=5, N=50), there is some slight variance in this behaviour. Figure 5 shows the performance of the imitation algorithm with the asynchronous unorganized machines for the multiplexer and demultiplexer tasks. The same parameters as before were used in each case. As can be seen, the multiplexer task appears significantly harder, on average IP fails to solve the task on every run with the parameters used, compared to consistent optimality after 5,000 iterations in the synchronous node case (Figure 3). Performance was not significantly improved through a variety of minor parameter alterations tried (not shown). It takes around 150,000 iterations to solve the demultiplexer, again a statistically significant decrease in performance over the synchronous case. Moreover, the use of asynchronous node updating has altered the topology of the graphs evolved with more nodes (T-test, p≤0.05) being exploited. This is perhaps to be expected since redundancy, e.g., through sub-circuit duplication, presumably provides robustness to exact updating order during computation. One of the main motivating factors for exploring such unorganised machines is the potential relevance to designing forms of (nano) technology in un-clocked circuits made from simple, uniform components. However, asynchronous versions of RBN have also been presented (e.g., [12]) and so the same increase in node logic functions has been explored here as in the previous section with similar results (not shown).

368

L. Bull

Fig. 5. Performance on multiplexer (left) and demultiplexer (right) of asynchronous system

6 A Comparison with Evolution These initial results therefore indicate that unorganized machines are amenable to (open-ended) design using the imitation algorithm presented. As noted above, one of the earliest forms of evolutionary computation used a graph-based representation – Fogel et al.’s [11] Evolutionary Programming. EP traditionally utilizes five mutation operators to design finite state machines. In this paper EP has been used with the same representation of pairs of integers, defining node inputs, each with an accompanying single bit defining the node’s start state, as above. Similarly, with equal probability, an individual either has: a new NAND node added, with random connectivity; the last added node removed, and those connections to it randomly re-assigned; a randomly chosen connection to a randomly chosen node is randomly re-assigned; or, a randomly chosen node has its start state flipped. The same minimum and maximum solution size limits are maintained as before. The (μ + μ’) selection scheme of EP is also used: each individual in the parent population (μ) creates one randomly mutated offspring (μ’) and the fittest μ individuals form the next generation of parents. In the case of ties, the individual with the fewest number of nodes is kept to reduce bloat, otherwise the decision is random. Fogel et al. used a penalty function to curtail solution complexity, reducing fitness by 1% of size. All other parameters were the same as used above. Figure 6 (left) shows the performance of the EP-Atype system on the 6-bit (x=2) multiplexer problem. Optimal performance (64) is obtained around 200,000 generations and after an initial period of very slight growth, solutions are eventually no bigger than at initialization. Figure 6 (right) shows that optimal performance (8) in the equivalent demultiplexer is reached around 400,000 generations and solutions are typically five or six nodes smaller than at initialization. Hence these results are statistically significantly (T-test, p≤0.05) slower and bigger than those seen above with the imitation algorithm. The same was found to be true for the asynchronous update scheme, where the multiplexer was again unsolved (not shown).

Imitation Programming

369

Fig. 6. Performance on multiplexer (left) and demultiplexer (right) by EP (synchronous)

The imitation algorithm described can be viewed as a parallel hill-climber, simultaneously updating a number of solutions, in contrast to the traditional global replacement scheme used in evolutionary computation (hybrids are also possible, e.g., [3]). It is therefore of interest whether the imitation process aids performance in comparison to using random alterations to individuals, under the same selection process. Results (not shown) indicate that no statistically significant difference is seen from using imitation over purely random alterations on the demultiplexer task (T-test, p>0.05), but an improvement is seen on the multiplexer task through imitation (T-test, p≤0.05). With asynchronous updating imitation is better on the demultiplexer (T-test, p≤0.05). Of course, all algorithms are parameter sensitive to some degree: the parameters used here were simply chosen since they typically enabled optimal performance with all of the basic schemes, both evolution and imitation, on all tasks used, over the allotted time. Future work is needed to explore parameter sensitivity, the role of selecting who to imitate, multiple imitations per iteration, etc.

7 Conclusions This paper has examined a new form of imitation computation and used it to design circuits from discrete dynamical systems. It has also introduced an asynchronous form of the representation. Current work is exploring ways by which to improve the performance of the imitation algorithm for the design of these and other systems. The degree of inherent fault-tolerance of the NAND gate networks due to their dynamical nature is also being explored (e.g., following [18][31]).

References 1. Aldana, M., Cluzel, P.: A natural class of robust networks. PNAS 100(15), 8710–8714 (2003) 2. Andre, D., Koza, J.R., Bennett, F.H., Keane, M.: Genetic Programming III. MIT, Cambridge (1999)

370

L. Bull

3. Angeline, P.: Evolutionary Optimization vs Particle Swarm Optimization. In: Porto, V.W., Waagen, D. (eds.) EP 1998. LNCS, vol. 1447, pp. 601–610. Springer, Heidelberg (1998) 4. Angeline, P., Saunders, G., Pollock, J.: An Evolutionary Algorithm that Constructs Recurrent Neural Networks. IEEE Transactions on Neural Networks 5, 54–65 (1994) 5. Atkeson, C., Schaal, S.: Robot learning from demonstration. In: Proceedings of the Fourteenth International Conference on Machine Learning, pp. 12–20. Morgan Kaufmann, San Francisco (1997) 6. Billard, A., Dautenhahn, K.: Experiments in Learning by Imitation - Grounding and Use of Communication in Robotic Agents. Adaptive Behavior 7(3/4), 415–438 (1999) 7. Brave, S.: Evolving Deterministic Finite Automata using Cellular Encoding. In: Koza, J.R., et al. (eds.) Procs of the First Ann. Conf. on Genetic Programming, pp. 39–44. MIT Press, Cambridge (1996) 8. Copeland, J.: The Essential Turing, Oxford (2004) 9. Dawkins, R.: The Selfish Gene, Oxford (1976) 10. Di, J., Lala, P.: Cellular Array-based Delay Insensitive Asynchronous Circuits Design and Test for Nanocomputing Systems. Journal of Electronic Testing 23, 175–192 (2007) 11. Fogel, L.J., Owens, A.J., Walsh, M.J.: Artificial Intelligence Through A Simulation of Evolution. In: Maxfield, M., et al. (eds.) Biophysics and Cybernetic Systems: Proceedings of the 2nd Cybernetic Sciences Symposium, pp. 131–155. Spartan Books (1965) 12. Gershenson, C.: Classification of Random Boolean Networks. In: Standish, R.K., Bedau, M., Abbass, H. (eds.) Artificial Life VIII, pp. 1–8. MIT Press, Cambridge (2002) 13. Gorman, B., Humphreys, M.: Towards Integrated Imitation of Strategic Planning and Motion Modeling in Interactive Computer Games. Computers in Entertainment 4(4) (2006) 14. Gruau, F., Whitley, D.: Adding Learning to the Cellular Development Process. Evolutionary Computation 1(3), 213–233 (1993) 15. Hassdijk, E., Vogt, P., Eiben, A.: Social Learning in Population-based Adaptive Systems. In: Procs of the 2008 IEEE Congress on Evolutionary Computation, pp. 1386–1392. IEEE Press, Los Alamitos (2008) 16. Holland, J.H.: Adaptation in Natural and Artificial Systems. Univ. of Mich. Press (1975) 17. Hutchins, E., Hazelhurst, B.: Learning in the Cultural Process. In: Langton, C.G., et al. (eds.) Artificial Life II, pp. 689–706. Addison Wesley, Reading (1990) 18. Kauffman, S.A.: The Origins of Order, Oxford (1993) 19. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948. IEEE Press, Los Alamitos (1995) 20. Koza, J.R.: Genetic Programming. MIT Press, Cambridge (1992) 21. Luke, S., Spector, L.: Evolving Graphs and Networks with Edge Encoding: Preliminary Report. In: Koza, J.R. (ed.) Late Breaking Papers at the Genetic Programming 1996 Conference, pp. 117–124. Stanford University, Standford (1996) 22. McCulloch, W.S., Pitts, W.: A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5, 115–133 (1943) 23. Miller, J.: An Empirical Study of the Efficiency of Learning Boolean Functions using a Cartesian Genetic Programming Approach. In: Banzhaf, W., Daida, J., Eiben, A.E., Garzon, M.H., Honavar, V., Jakiela, M., Smith, R.E. (eds.) Proceedings of the Genetic and Evolutionary Computation Conference – GECCO 1999, pp. 1135–1142. Morgan Kaufmann, San Francisco (1999) 24. Mitchell, M., Hraber, P., Crutchfield, J.: Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform Computations. 
Complex Systems 7, 83–130 (1993) 25. Nakamura, K.: Asynchronous Cellular Automata and their Computational Ability. Systems, Computers, Controls 5(5), 58–66 (1974)

Imitation Programming

371

26. Packard, N.: Adaptation Toward the Edge of Chaos. In: Kelso, J., Mandell, A., Shlesinger, M. (eds.) Dynamic Patterns in Complex Systems, pp. 293–301. World Scientific, Singapore (1988) 27. Poli, R.: Parallel Distributed Genetic Programming. In: Corne, D., Dorigo, M., Glover, F. (eds.) New Ideas in Optimisation, pp. 403–431. McGraw-Hill, New York (1999) 28. Preen, R., Bull, L.: Discrete Dynamical Genetic Programming in XCS. In: GECCO-2009: Proceedings of the Genetic and Evolutionary Computation Conference. ACM Press, New York (2009) 29. Price, B., Boutilier, C.: Implicit Imitation in Multiagent Reinforcement learning. In: Procs of Sixteenth Intl Conference on Machine Learning, pp. 325–334. Morgan Kaufmann, San Francisco (1999) 30. Reynolds, R.: An Introduction to Cultural Algorithms. In: Sebald, Fogel, D. (eds.) Procs of 3rd Ann. Conf. on Evolutionary Programming, pp. 131–139. World Scientific, Singapore (1994) 31. Sipper, M.: Evolution of Parallel Cellular Machines. Springer, Heidelberg (1997) 32. Sipper, M., Tomassini, M., Capcarrere, S.: Evolving Asynchronous and Scalable Nonuniform Cellular Automata. In: Proceedings of the Third International Conference on Artificial Neural Networks and Genetic Algorithms, pp. 66–70. Springer, Heidelberg (1997) 33. Storn, R., Price, K.: Differential Evolution - a Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11, 341–359 (1997) 34. Teller, A., Veloso, M.: Neural Programming and an Internal Reinforcement Policy. In: Koza, J.R. (ed.) Late Breaking Papers at the Genetic Programming 1996 Conference, pp. 186–192. Stanford University, Standford (1996) 35. Teuscher, C.: Turing’s Connectionism. Springer, Heidelberg (2002) 36. Thompson, A., Harvey, I., Husbands, P.: Unconstrained Evolution and Hard Consequences. In: Sanchez, E., Tomassini, M. (eds.) Towards Evolvable Hardware 1995. LNCS, vol. 1062. Springer, Heidelberg (1996) 37. Upegui, A., Sanchez, E.: Evolving Hardware with Self-reconfigurable connectivity in Xilinx FPGAs. In: Proceedings of the first NASA/ESA conference on Adaptive Hardware and Systems, pp. 153–162. IEEE Press, Los Alamitos (2006) 38. Von Neumann, J.: The Theory of Self-Reproducing Automata. University of Illinois (1966) 39. Werner, T., Akella, V.: Asynchronous Processor Survey. Comput. 30(11), 67–76 (1997) 40. Wyatt, D., Bull, L.: A Memetic Learning Classifier System for Describing ContinuousValued Problem Spaces. In: Krasnagor, N., Hart, W., Smith, J. (eds.) Recent Advances in Memetic Algorithms, pp. 355–396. Springer, Heidelberg (2004)

EvoFab: A Fully Embodied Evolutionary Fabricator John Rieffel and Dave Sayles Union College Computer Science Department Schenectady, NY 12308 USA

Abstract. Few evolved designs are subsequently manufactured into physical objects – the vast majority remain on the virtual drawing board. We suggest two sources of this “Fabrication Gap”. First, by being descriptive rather than prescriptive, evolutionary design runs the risk of evolving interesting yet unbuildable objects. Secondly, in a wide range of interesting and high-complexity design domains, such as dynamic and highly flexible objects, the gap between simulation and reality is too large to guarantee consilience between design and object. We suggest that one compelling alternative to evolutionary design in these complex domains is to avoid both simulation and description, and instead evolve artifacts directly in the real world. In this paper we introduce EvoFab: a fully embodied evolutionary fabricator, capable of producing novel objects (rather than virtual designs) in situ. EvoFab thereby opens the door to a wide range of incredibly exciting evolutionary design domains.

1

Introduction

Evolutionary algorithms have been used to design a wide number of virtual objects, ranging from virtual creatures [12] to telescope lenses [1]. Recently, with the advent of rapid prototpying 3-D printers, an increasing number of evolved designs have been fabricated in the real world as well. One of the earliest examples of an evolved design crossing the “Fabrication Gap” into reality is Funes’ LEGO structures [4]. In this work, the genotypes were a direct encoding of the physical locations of bricks in the structure - a virtual “blueprint” of the design. Fitness, based upon the weight-bearing ability of the structures, was determined inside a quasi-static simulator. The authors were able to translate the virtual phenotype into a physical object by reading the blueprint and placing physical bricks accordingly. Another notable example of a manufactured evolved design is Lohn’s satellite antenna [6]. The genotype in this case was a generative encoding L-system which, when interpreted by a LOGO-like “turtle”, drew a 3-D model of the antenna. Fitness was determined by measuring the performance of the design within an off-the-shelf antenna simulator. Other evolved designs to cross the Fabrication Gap include robots [7], furniture [5], and tensegrity structures [10]. In each of these later cases, phenotypes G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 372–380, 2010. c Springer-Verlag Berlin Heidelberg 2010 

EvoFab: A Fully Embodied Evolutionary Fabricator

373

were 3D CAD models which could then be printed directly by rapid protyping 3D printers. The quality of these examples belies their quantity. The vast majority of evolved designs remain on the virtual “drawing board”, never to be manufactured. A closer analysis of the examples above provides some insight into this “Fabrication Gap”. For Funes work, building a physical LEGO structure from a descriptive blueprint was facilitated, at least in principle, by the close correspondence between virtual and physical LEGO bricks. In practice, however, the blueprints alone didn’t contain sufficient assembly information: particularly for large structures, the evolved designs first had to be assembled on a flat horizontal surface and then tilted into place – an operation that cannot be inferred from a blueprint. In Lohn’s antenna work, the final product was manufactured by hand: using the 3D model as a guide, a skilled antenna engineer manually bent and soldered pieces of conductive wire to the specified lengths and angles. As these 3D antenna models become more complex, this process becomes increasingly intractable. We see two primary sources of this “Fabrication Gap” between evolved virtual design and physical object. The first issue is that, conventionally, evolved designs are purely descriptive. By specifying what to build but not how to build it, evolutionary design runs the risk of evolving interesting yet unbuildable objects. Imagine an evolutionary design system which evolves images of chocolate cakes. The image describes what the final product looks like (which may be delicious), but there is nothing in the image which provides insight into how it should be prepared, or whether it can even be prepared at all. Similarly, a descriptive representation shows a finished product, but contains no information about how to manufacture it. Secondly, the evolutionary design of complex objects requires high fidelity simulation in order to guarantee that the physical manifestation behaves like its virtual counterpart. For static and rigid objects, such as the tables and robot parts mentioned above, fabrication is relatively straight forward: their behavior can be realistically simulated, and their descriptive phenotype is easily translated into a print-ready CAD file. However, for high-complexity design domains, such as dynamic and highly flexible objects, the gap between simulation and reality is too large to reliably manufacture designs evolved in simulation. This begs the question: in these high complexity domains, is it at all possible to dispense with simulation and description entirely, and instead evolve assembly instructions directly within a rapid prototyper? In such an “evolutionary fabrication” scenario the genotype consists of a linear encoding of instructions to the printer, and the evaluated phenotype is the resulting structure. These ideas have been motivated and explored using simulations of rapid prototypers [8] [9], but until now haven’t been instantiated in the real world. On the face of it of course this proposition seems extreme, and the reasons against it are obvious. First of all, rapid prototyping is a slow process, and so an evolutionary run of hundreds (even thousands) of individuals might take days or weeks – not to mention the associated cost in print material. Furthermore, commercial rapid prototypers cost hundreds of thousands of dollars, and do not

374

J. Rieffel and D. Sayles

allow the access to their underlying API which this approach requires. Finally, commercial prototypers typically only print relatively rigid materials, and so are incapable of producing objects from more interesting design domains. Fortunately, the recent advent of inexpensive desktop fabricators allows for a reexamination of these constraints. Hobbyist-oriented units, such as the Fab@Home and the Makerbot Cupcake, cost only a couple thousand dollars assembled, are open source and “hackable” and, most importantly, are capable of printing a much wider range of print media - from wax and silicone elastomer to chocolate. Furthermore, evolutionary embedded in the real world has produced some profoundly interesting results in other domains. Consider for instance Thompson’s seminal work on “Silicon Evolution” [13], in which pattern discriminators evolved directly on an FPGA behaved qualitatively differently than those evolved in simulation. In fact, the final product wound up exploiting thermal and analog properties of the FPGA – something well outside the domain of the simulator. Similarly, Watson and Ficici applied “Embodied Evolution” [15] (their term) to a population of simple robots, and produced neural network based control strategies which varied significantly from their simulated-evolution counterparts. In each case, the lesson has been that evolution directly in the real world can produce profound results which would have been impossible to produce via simulation. We draw our inspiration for Evolutionary Fabrication largely from these ground breaking insights. In this paper we introduce EvoFab: a fully embodied evolutionary fabricator, capable of automatically designing and manufacturing soft and dynamic structures, thereby bridging the “Fabrication Gap”. After describing the design of this unit in detail, we demonstrate proof-of-concept Evolutionary Fabrication of flexible silicone objects. The ability to automatically design and build soft and dynamic structures via EvoFab opens the door to a wide range of exciting and vital design domains, such as soft robots and biomedical devices.

2

EvoFab: An Evolutionary Fabricator

The system capable of embodied evolutionary fabrication (EvoFab) consists of two parts: a Fab@Home desktop rapid prototyper [14], and a python-based genetic algorithm which interfaces with the Fab@Home. The Fab@Home printer (Figure 1) was developed as a hobbyist desktop rapid prototyper. Its low price, open source software, and large range of print materials makes it ideally suited as an Evofabber. A print syringe, mounted on a X-Y plotter, extrudes material onto an 8” square Z-axis-mounted print platform. We specify seven specific operations which the printer may perform: in,out - move the print head in the +/−Y direction 3mm left,right - move the print head in the +/−X direction 3mm up,down - move the print head in the +/−Z direction 3mm extrude - pushes a fixed volume of print media through the 0.8mm syringe. We refer to a linear encoding of these operations as an assembly plan.

EvoFab: A Fully Embodied Evolutionary Fabricator

375

Fig. 1. A Fab@Home desktop prototyper is used as the foundation of the EvoFab

In conventional setups, prototypers produce three dimensional objects by printing successive layered “slices” of the object on the horizontal plane, lowering the print platform between slices. In the context of Evolutionary Fabrication however, we prefer a more open-ended freeform approach, and so place no constraints upon the print process. The print head is free to move in almost any direction and to perform any operation during the execution of an assembly plan – even if that means causing the syringe to collide with the object it is printing. We will discuss why this might be beneficial in the last section of this paper. After testing a variety of materials ranging from Play-Doh to alginate (an algaebased plaster), we selected silicone bath caulk (GE Silicone II) because of its relatively short (30-minute) cure time, and viscosity (it is thick enough to remain inside the print syringe until extruded, but thin enough to easily extrude into a single strand). Figure 2 illustrates the extrusion of silicone onto the print surface. Because extruding material from a syringe creates a “thread” which dangles between the syringe and the print platform, an extrude command followed by a directional command such as lef t will print a 3mm long line on the print platform. An example assembly plan capable of printing a 3mm square of silicone might appear as the following: [extrude, lef t, extrude, in, extrude, right, extrude, out]

376

J. Rieffel and D. Sayles

Fig. 2. Freeform three dimensional printing of silicone is accomplished by a syringe mounted to an X-Y plotter. The print platform can be moved vertically along the Z axis.

In the context of evolutionary fabrication, these linear encodings of instructions form can a genotype. Mutation and crossover of genotypes is accomplished in just as it would be in any other linear encoding. Figure 3 illustrates the results of two assembly plan genotypes which differ by a small mutation.

Fig. 3. Small changes to assembly plan genotypes produce corresponding changes to the resulting silicone phenotype. The image above compares an original (top) with its mutant (bottom) in which a trailing sequence of the assembly plan has been replaced. Each object was printed from right to left.

EvoFab: A Fully Embodied Evolutionary Fabricator

377

It is worth emphasizing that assembly plans are an indirect encoding – the object which results from executing a particular assembly plan can be considered its phenotype. This layer of indirection gives rise to some interesting consequences, most significant is that there is no longer a 1 : 1 mapping from genotype to phenotype (as there would be in a direct encoding, just as simple bit string GA). Rather, there is an N : 1 mapping: physically identical phenotypes can arise from distinct underlying genotypes. In fact, when you take into account the stochastic nature of the fabrication process it becomes an N : N mapping, meaning that a single genotype can produce slightly different phenotypes when executed multiple times. We explore the consequences of this in our discussion below.

3

Proof of Concept: Interactive Evolution of Shape

We can demonstrate the potential of evolutionary fabrication using a relatively simple Interactive Genetic Algorithm (IGA). Based upon Dawkin’s idea of the “Blind Watchmaker” [3], IGAs replace an objective and automated fitness function with human-based evaluation. IGAs have been successful in a wide range of evolutionary design tasks, most notably in Karl Sims’ seminal work on artificial creatures [12], [11]. We chose as a design task the simple evolution of circular 2-dimensional shapes. A population of size 20 was initialized with random assembly plans, each of which was 20 instructions long. Individuals were then printed onto the platter in batches of four. Once the population was completely printed, the 10 best (most circular) individuals were then selected as parents for the subsequent generations. New children were created using cut-and-splice crossover [16] Each platter of four individuals took roughly 10 minutes to print, corresponding to slightly less than an hour of print time per generation. Figure 4 compares sample phenotypes from the first and ninth generations. After only a small number of generations, the population is already beginning to converge onto more circular shapes.

Fig. 4. Sample individuals from the first (left) and ninth (right) generations of the interactive evolution in which the user is selecting for roundness of shapes. After relatively few generations the population is beginning to converge onto more circular shapes.

378

4

J. Rieffel and D. Sayles

Discussion

The results presented in our proof-of-concept evolutionary fabrication above are enough to lend credence to the potential of EvoFab for exploring even more interesting and complex design domains. Before discussing these applications in more detail it is first worth discussion some of the implications and limitations of this approach. 4.1

Fabrication and Epigenetic Traits

One of the more fascinating consequences of embodied evolutionary fabrication is the capacity for the system as a whole to produce epigenetic traits - that is, phenotypic characteristics which arise purely from the mechanics of assembly, and have no underlying genotypic source. Consider for example the phenotypes in Figure 5, in which the user was selecting for shapes resembling the letter ’A’. At a glance one would assume that the “cross” of the A shapes was produced by an explicit set of operations within the underlying genotypes. In fact, they are instead caused by the print head “dragging” an extraneous thread of print material across the genotype as it moves between print regions. Explorations into simulated evolutionary fabrication have suggested that there may be some interesting benefits to this kind of phenomenon [8]. Consider for instance a print process which extruded two separate subassemblies and then used the syringe head to dynamically assemble them into a larger structure. We hope to use EvoFab to further explore the consequences in embodied systems as well. 4.2

Material Use and Conservation

A natural consequence of evolutionary fabrication is that a significant amount of print material is consumed over the multiple generations of phenotype evaluations. And, while silicone elastomer is less expensive than the plastics used in high-end commercial rapid prototypers, the costs still add up. In order to address this issue we are exploring a number of alternative and recyclable materials such as wax and even ice [2]. Ideally, once they are evaluated for fitness, phenotypes could then be reduced to their original material for reuse in a subsequent print cycle. 4.3

Design Domains

The domains in which Evolutionary Fabrication holds the most promise are those which are too complex or too inscrutable to realistically simulate. One such area is the design of flexible and dynamical systems, such as the morphology of completely soft robots. In light of recent natural disasters in Haiti and Chile, there is a compelling need for more versatile and robust search and rescue robots. Imagine, for instance, a machine that can squeeze through holes, climb up walls, and flow

EvoFab: A Fully Embodied Evolutionary Fabricator

379

Fig. 5. Example of epigenetic traits in a set of phenotypes evolved for likeness to the letter ’A’. In each case, the “crosspiece” which connects the shorter leg to the longer leg is not caused by a genotypic sequence, but is instead caused by the print head dragging extra material across the phenotype as it finishes one print job and moves to the adjacent print region.

around obstacles. Though it may sound like the domain of science fiction, modern advances in materials such as polymers and nanocomposites such a “soft robot” is becoming an increasing possibility. Unfortunately, soft and deformable bodies can possess near-infinite degrees of freedom, and elastic pre-stresses mean that any local perturbation causes a redistribution of forces throughout the structure. As a consequence, soft structures are incredibly difficult to realistically simulate, even in non-dynamic regimes. Furthermore, there are no established principles or purely analytical approaches to the problem of soft mechanical design and control – instead the design task involves significant amounts of human-based trial and error. EvoFab allows the power of evolutionary design techniques to be applied to this compelling and vital design domain. Soft bodies could be evolved and evaluated in situ, without resorting to simulation or post-hoc methods. The results of such endeavors could have significant consequences not just for search-andrescue, but also in biomedical applications such as endoscopy.

References 1. Al-Sakran, S.H., Koza, J.R., Jones, L.W.: Automated re-invention of a previously patented optical lens system using genetic programming. In: Keijzer, M., Tettamanzi, A.G.B., Collet, P., van Hemert, J., Tomassini, M. (eds.) EuroGP 2005. LNCS, vol. 3447, pp. 25–37. Springer, Heidelberg (2005) 2. Barnett, E., Angeles, J., Pasini, D., Sijpkes, P.: Robot-assisted rapid prototyping for ice structures. In: IEEE Int. Conf. on Robotics and Automation (2009)

380

J. Rieffel and D. Sayles

3. Dawkins, R.: The Blind Watchmaker. W. W. Norton & Company, Inc. (September 1986) 4. Funes, P., Pollack, J.B.: Evolutionary body building: Adaptive physical designs for robots. Artificial Life 4(4), 337–357 (1998) 5. Hornby, G.S., Pollack, J.B.: The advantages of generative grammatical encodings for physical design. In: Proceedings of the 2001 Congress on Evolutionary Computation CEC 2001, COEX, World Trade Center, 159 Samseong-dong, Gangnam-gu, Seoul, Korea, 27-30 2001, pp. 600–607. IEEE Press, Los Alamitos (2001) 6. Lohn, J.D., Hornby, G.S., Linden, D.S.: An Evolved Antenna for Deployment on NASA’s Space Technology 5 Mission. In: O’Reilly, U.-M., Riolo, R.L., Yu, T., Worzel, B. (eds.) Genetic Programming Theory and Practice II. Kluwer, Dordrecht (2005) 7. Pollack, J.B., Lipson, H., Hornby, G., Funes, P.: Three generations of automatically designed robots. Artificial Life 7(3), 215–223 (Summer 2001) 8. Rieffel, J.: Evolutionary Fabrication: the co-evolution of form and formation. PhD thesis, Brandeis University (2006) 9. Rieffel, J., Pollack, J.: The Emergence of Ontogenic Scaffolding in a Stochastic Development Environment. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, pp. 804–815. Springer, Heidelberg (2004) 10. Rieffel, J., Valero-Cuevas, F., Lipson, H.: Automated discovery and optimization of large irregular tensegrity structures. Computers & Structures 87(5-6), 368–379 (2009) 11. Sims, K.: Interactive evolution of dynamical systems. In: First European Conference on Artificial Life. MIT Press, Cambridge (1991) 12. Sims, K.: Evolving 3d morphology and behavior by competition. In: Brooks, R., Maes, P. (eds.) Artificial Life IV Proceedings, pp. 28–39. MIT Press, Cambridge (1994) 13. Thompson, A.: Silicon evolution, Stanford University, pp. 444–452. MIT Press, Cambridge (1996) 14. Vilbrandt, T., Malone, E., Lipson, H., Pasko, A.: Universal desktop fabrication. Heterogenous Objects Modeling and Applications, 259–284 (2008) 15. Watson, R.A., Ficici, S.G., Pollack, J.B.: Embodied evolution: Embodying an evolutionary algorithm in a population of robots. In: Angeline, P.J., Michalewicz, Z., Schoenauer, M., Yao, X., Zalzala, A. (eds.) Proceedings of the Congress on Evolutionary Computation, Mayflower Hotel, Washington D.C., USA, 6-9 1999, vol. 1, pp. 335–342. IEEE Computer Society Press, Los Alamitos (1999) 16. Whitley, D., Beveridge, J.R., Guerra-Salcedo, C., Graves, C.: Messy genetic algorithms for subset feature selection. In: International Conference on Genetic Algorithms, ICGA 1997 (1997)

Evolving Physical Self-assembling Systems in Two-Dimensions Navneet Bhalla1 , Peter J. Bentley2 , and Christian Jacob1,3 1

Dept. of Computer Science, Faculty of Science, University of Calgary, 2500 University Drive N.W., Calgary, Alberta, Canada, T2N 1N4 [email protected] 2 Dept. of Computer Science, Faculty of Engineering Sciences, University College London, Malet Place, London, United Kingdom, WC1E 6BT [email protected] 3 Dept. of Biochemistry & Molecular Biology, Faculty of Medicine, University of Calgary, 3280 Hospital Drive N.W., Calgary, Alberta, Canada, T2N 4Z6 [email protected]

Abstract. Primarily top-down design methodologies have been used to create physical self-assembling systems. As the sophistication of these systems increases, it will be more challenging to deploy top-down design, due to self-assembly being an algorithmically NP-complete problem. Alternatively, we present a nature-inspired approach incorporating evolutionary computing, to couple bottom-up construction (self-assembly) with bottom-up design (evolution). We also present two experiments where evolved virtual component sets are fabricated using rapid prototyping and placed on the surface of an orbital shaking tray, their environment. The successful results demonstrate how this approach can be used for evolving physical self-assembling systems in two-dimensions. Keywords: self-assembly, evolutionary computing, rapid prototyping.

1

Introduction

The plethora of complex inorganic and organic systems seen throughout nature is the result of self-assembly. Complex self-assembled entities emerge from decentralised components governed by simple rules. Natural self-assembly is dictated by the morphology of the components and the environmental conditions they are subjected to, as well as their component and environment physical and chemical properties - their information [1] [2]. Components, their environment, and the interactions among them form a system, which can be described by a set of simple rules. Coupled with this bottom-up construction process (self-assembly) bottom-up design is used with living organisms where the process of evolution is displayed through their genetic rule sets - their DNA. Through transcription to RNA and translation to proteins, these rules are mapped to physical shapes, encapsulating the central dogma of molecular biology [3]. Proteins, the resulting self-assembling shapes, are the primary building blocks of living organisms. G. Tempesti, A.M. Tyrrell, and J.F. Miller (Eds.): ICES 2010, LNCS 6274, pp. 381–392, 2010. c Springer-Verlag Berlin Heidelberg 2010 

382

N. Bhalla, P.J. Bentley, and C. Jacob

However, designing artificial, physical, self-assembling systems remains an elusive goal. Based on relevant work [4], primarily top-down design methodologies have been used to create physical self-assembling systems. As the sophistication of these systems increases, it will be more challenging to deploy top-down design, due to self-assembly being an algorithmically NP-complete problem [5]. How to design a set of physical components and their environment, such that the component set self-assembles into a target structure remains an open problem. Evolutionary Computing (EC) [6] is well-suited for such problems. In pursuit of addressing this open problem, we present the incorporation of EC into the three-level approach [7] [8] for designing physical self-assembling systems. The three-level approach comprises specifying a set of self-assembly rules, modelling these rules to determine the outcome of a specific system in software, and translating to a physical system by mapping the set of self-assembly rules using physically encoded information. This is consistent with the definition of self-assembly [9], refined here as a process that involves components that can be controlled through their proper design and their environment, and which are adjustable (components can adjust their position relative to one another). Furthermore, the three-level approach is inspired by the central dogma of molecular biology, in being able to map a set of self-assembly rules directly to physical shapes. This is beneficial in that no knowledge of the target structure’s morphology is required, only its functionality. As a result, incorporating EC into the three-level approach is appropriate. The next section presents background material to which our self-assembly model and evolutionary approach is based upon. Next, an overview of the threelevel approach is presented along with details of an example incorporating EC. Two experiments follow which demonstrate the creation of evolved component sets and their translation, via physically encoded information, to physical systems using rapid prototyping1 . We conclude by summarising how this work provides as proof-of-concept a means to evolving physical self-assembling systems.

2

Background

The abstract Tile Assembly Model (aTAM) [10] was originally developed to model the self-assembly of molecular structures, such as DNA Tiles [11], on a square lattice. These tiles use interwoven strands of DNA to create the square body of a tile (double-stranded) with single strands extending from the edges of the tiles. A tile type is defined by binding domains on the North, West, South, and East edges of a tile. A finite set of tile types is specified (which are in infinite supply in the model). At least one seed tile must be specified to start the self-assembly process. Tiles cannot be rotated or reflected. There cannot be more than one tile type that can be used at a particular assembly location in the growing structure (although the same binding domain is permitted on more than one tile type). All tiles are present in the same environment, a one-pot-mixture. 1

Supplementary resources, including all CAD files and videos, pertaining to the experiments can be found at www.navneetbhalla.com/resources.

Evolving Physical Self-assembling Systems in Two-Dimensions

383

Tiles can only bind together if the interactions between binding domains are of sufficient strength (provided by a strength function), as determined by the temperature parameter. The sum of the binding strengths of the edges of a tile must meet or exceed the temperature parameter. For example, if the temperature parameter is two, at least two strength-one bonds must be achieved to assemble a tile, i.e. the temperature parameter dictates co-operative bonding. The seed tile is first placed on the square lattice environment. Tiles are then selected one at a time, and placed on the grid if the binding strength constraints are satisfied. The output is a given shape of fixed size, if the model can uniquely construct it. aTAM has been used to study algorithmic self-assembly complexity, the Minimum Tile Set Problem (MTSP) and the Tile Concentration Problem (TCP) [5]. The goal of MTSP is to find the lowest number of tile types that can uniquely self-assemble into a target structure. The goal of TCP is to find the relative concentrations of tile types that self-assemble into the target structure using the fewest assembly steps. MTSP is an NP-complete problem for general target structures. The algorithmic complexity of TCP has only been calculated for specific classes of target structures. EC has been applied to self-assembly based on aTAM. In [12], EC was used to evolve (in simulation) different co-operative bonding mechanisms between two to five tiles, to create a ten by ten square.

3

Three-Level Approach and Evolution

We extend aTAM to better suit the components (tiles) used in our systems. We also physically realise the results achieved by our EC implementation. The selfassembly design problem we are concerned with is a combination of MTSP and TCP, as well as several other constraints. These three differences are expanded upon in presenting our incorporation of EC into the three-level approach [7] [8]. The three-level approach provides a high-level description to designing selfassembling systems via physically encoded information [1] [2]. The three phases included in our approach are: (1) definition of rule set, (2) virtual execution of rule set, and (3) physical realisation of rule set (Fig. 1). The three-level approach provides a bottom-up method to create self-assembling systems. This is achieved by being able to directly map a set of self-assembly rules to a physical system. Here we present the addition of EC to evolve the level one rules. Results from the level two modelling are used for evaluation by the evolutionary algorithm (Fig. 1). After running the evolutionary algorithm, if the desired results are achieved, the level one rules can be mapped to a physical system. 3.1

Level One: Definition of Rule Set

To demonstrate how the three-level approach and EC can be used, the following example implementation was constructed. Its purpose is to show how to create a set of physical, two-dimensional, components that self-assemble into a set of target structures, created in parallel. Self-assembly rules are divided into three categories, which define a system: component, environment, and system rules.

384

N. Bhalla, P.J. Bentley, and C. Jacob Level 1: Definition of Rule Set

Level 1: Definition of Rule Set

map rule set to physicallyindependent model for evaluation

map rule set to physicallyindependent model for evaluation

Level 2: Virtual Execution of Rule Set

Level 2: Virtual Execution of Rule Set map rule set to physically encoded information

Level 3: Physical Realisation of Rule Set

evaluate modeling results

Evolutionary Computing

if desired result achieved, then map rule set to physically encoded information

Level 3: Physical Realisation of Rule Set

Fig. 1. Three-level approach (left), and incorporating EC (right)

Component rules specify primarily shape and information. Components are similar in concept to DNA Tiles [11]. Abstractly, components are all squares of unit size. Each edge of a component serves as an information location, in a four-point arrangement, i.e. North-West-South-East. Information is abstractly represented by a capital letter (A to G). If no information is associated with an information location (a neutral site), the dash symbol (−) is used. The spatial relationship of this information defines a component type (Fig. 2). Environment rules specify environmental conditions, such as the temperature of a system and boundary constraints. The temperature determines the threshold to which the assembly protocol must satisfy in order for assembly bonds to occur. Components are confined due to the environment boundary, but are permitted to translate and rotate in two-dimensions, and interact with one another and their environment. However, components are not permitted to be reflected. System rules specify the quantity of each component type, and componentcomponent information interactions (i.e. assembly interactions) and componentenvironment information interactions (i.e. transfer of energy and boundary interactions). In this implementation, there are two types of system interaction rules, referred to as fits rules and breaks rules. Abstractly, if two pieces of complementary information come into contact (i.e. they fit together, A fits B), it will cause them to assemble. This rule type is commutative, meaning if A fits B, then B fits A. Abstractly, if two assembled pieces of information experience a temperature above a certain threshold their assembly breaks. 3.2

Level Two: Virtual Execution of Rule Set

At level two, a self-assembly rule set is mapped to an abstract model. We present an extension to aTAM [10], the concurrent Tile Assembly Model (cTAM) [7]. cTAM is a modelling technique better suited to the type of physical selfassembling systems we use for demonstration purposes. There are five features to cTAM. (1) There are no seed tiles, meaning any two compatible tiles can start the self-assembly process. (2) Tiles can self-assemble into multiple substructures concurrently. (3) Tiles can be rotated, but cannot be reflected. (4) More than

Fig. 2. Example cTAM steps (left: Steps 1-4, including a neutral site) and assembly violations (right: boundary violation, uncomplementary information, and no assembly path)

In this implementation, the temperature parameter is set to one. The initial set of tiles in cTAM is a multiset (type and frequency).

In cTAM (Fig. 2), a single assembly operation is applied at a time, initialised by selecting, at random, a single tile or substructure with an open assembly location. If no other tile or substructure has an open complementary information location, the location on the first tile/substructure is labelled unmatchable. Otherwise, all tiles/substructures with open complementary information locations are placed in an assembly candidate list, from which tiles and substructures are selected at random until one can be successfully added. If none can be added, due to an assembly violation (Fig. 2), the location is labelled unmatchable. If a tile/substructure can be added, the open assembly locations on the two tiles/substructures are updated and labelled match (all applicable assembly locations must match when adding two substructures). The algorithm repeats, halting when every assembly location is labelled match or unmatchable.

At the conclusion of the algorithm, the resulting structures are placed in a single grid environment to determine whether any environment boundary violations occur. A post-evaluation of environment constraints is sufficient for this implementation, as we are more concerned with the set of self-assembled structures than with the environmental constraints.
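The following is a deliberately simplified, runnable sketch of this loop (our own reconstruction: it grows a single structure by single-tile additions, whereas the full cTAM also merges substructures, grows them concurrently, and post-checks the environment boundary). All names are ours; the supply used in the example is the evolved experiment-1 component set of Sect. 4, scaled for one target structure.

import random

FITS = {frozenset(p) for p in [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]}

def fits(a, b):
    # commutative fits rule: A fits B implies B fits A
    return frozenset((a, b)) in FITS

def rotations(t):
    # (N, W, S, E) is cyclic, so a 90-degree rotation is a shift;
    # reflections are never generated
    return [t[i:] + t[:i] for i in range(4)]

STEP = {0: (0, 1), 1: (-1, 0), 2: (0, -1), 3: (1, 0)}   # N, W, S, E offsets
OPP = {0: 2, 1: 3, 2: 0, 3: 1}                          # index of facing edge

def compatible(grid, pos, tile):
    # placement is valid if every contact with a neighbour fits or is
    # neutral, and at least one contact actually binds
    bound = False
    for d, (dx, dy) in STEP.items():
        nb = grid.get((pos[0] + dx, pos[1] + dy))
        if nb is None:
            continue
        a, b = tile[d], nb[OPP[d]]
        if a == "-" or b == "-":
            continue                      # neutral site: non-binding contact
        if not fits(a, b):
            return False                  # uncomplementary information
        bound = True
    return bound

def assemble(supply, rng):
    grid = {(0, 0): supply.pop()}         # no seed tiles: any tile may start
    progress = True
    while progress and supply:
        progress = False
        frontier = [(x + dx, y + dy) for (x, y) in grid
                    for dx, dy in STEP.values() if (x + dx, y + dy) not in grid]
        rng.shuffle(frontier)
        for pos in frontier:              # open assembly locations
            candidates = [(i, r) for i, t in enumerate(supply)
                          for r in rotations(t) if compatible(grid, pos, r)]
            if candidates:                # assembly candidate list
                i, r = rng.choice(candidates)
                grid[pos] = r
                supply.pop(i)
                progress = True
                break
    return grid

rng = random.Random(1)
supply = [("A", "A", "A", "A")] + [("-", "B", "-", "-")] * 4
print(sorted(assemble(supply, rng)))      # five cells forming a cross

With this supply the outcome is the same on every run (up to position): the four B-edged arms can only bind to the A-edged core, so each trial yields the five-component cross.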

3.3 Level Three: Physical Realisation of Rule Set

Components are mapped to their physical equivalents using rapid prototyping (Fig. 3). Physical components are defined by their design space, the set of physically feasible designs. The design space is a combination of a shape space and an assembly protocol space. A key-lock-neutral concept defines the shape space; a 3-magnetic-bit encoding scheme defines the assembly protocol space. Either one or two magnets are used in each position, placed within the sides of the components. The magnets are not flush with the surface, creating an air gap that makes the binding strength adjustable [9] and allows for selective bonding. Lock-to-lock interactions are guaranteed never to occur; this shape characteristic is therefore used to manage the designation of the 3-magnetic-bit encodings to keys and locks. One magnet is placed in each position designated to a key, and two magnets are placed in each position designated to a lock.


Fig. 3. Left to right: physical component shape space (solid thick lines represent the base shape, dashed lines represent neutral sites, and thin solid lines represent key shapes), physical component specifications (top and right view in mm), and an example physical component (blue/red paint on top represents magnetic north/south patterns)

This ensures strong binding between keys and locks, and only weak binding in key-to-key interactions. Such weak bonds are broken at an appropriate environment temperature setting. Therefore, key-to-key matching errors can be avoided, and key-to-lock matching errors can be reduced, through proper designation of the 3-magnetic-bit encodings to keys and locks (Table 1).

Table 1. Key/lock designations to magnetic patterns with abstract labels, and interaction rules ('→' transition, '+' assembly, ';' disassembly, and 'T2' temperature 2)

Key/Lock   3-magnetic-bit   Label   Fits Rule          Breaks Rule
Lock       000              A       A fits B → A+B     T2 breaks A+B → A;B
Lock       110              C       C fits D → C+D     T2 breaks C+D → C;D
Lock       011              E       E fits F → E+F     T2 breaks E+F → E;F
Lock       101              G       G fits H → G+H     T2 breaks G+H → G;H
Key        111              B       B fits A → B+A     T2 breaks B+A → B;A
Key        001              D       D fits C → D+C     T2 breaks D+C → D;C
Key        100              F       F fits E → F+E     T2 breaks F+E → F;E
Key        010              H       H fits G → H+G     T2 breaks H+G → H;G

3.4 Evolving Self-assembly Rule Sets

The objective of the evolutionary algorithm is to search for the best component set (type and concentration) able to self-assemble into a single target structure. We focus on a single structure, as a first step, since effectively evaluating the results of many diverse structures is challenging. Environment and system (fits and breaks) rules are fixed. The following is an overview of the evolutionary algorithm, the genotype and phenotype representations, the fitness function, and the selection, crossover, and genetic operators used.

A generational evolutionary algorithm [6] is used. The evolutionary unit, a gene, is a single component. A databank of gene sequences (a linear representation of the North-West-South-East edges, using the letters A to H and the − symbol) is used to identify and compare genes. There are 6,561 total and 1,665 unique genes (when considering two-dimensional shape and rotations). Elitism is used, with the top 10% of individuals copied to the next generation.

An individual's genotype representation is a variable-length list of genes. At least two genes define a genotype (the minimum for self-assembly to occur). An individual's phenotype representation is the resulting set of self-assembled structures. A single genotype may have more than one phenotype, depending on the set of components and the assembly steps. Therefore, each individual (genotype) is evaluated three times at each generation to help determine its fitness.

A multi-objective fitness function is used to evaluate each individual. The seven objectives can be categorised into evaluating a general and a refined solution (Fig. 4). The general solution has five objectives: (1) area (A), (2) perimeter (P), (3) Euler number (E), (4) z-axis, and (5) matches. Each of these objectives is used to achieve the shape of the target structure. The area, perimeter, and Euler number (connectivity of a shape) are calculated using 2D morphological image analysis [13]. The second moment of inertia about the z-axis [14] is calculated to identify similar but rotated structures. To distinguish reflected structures (which are not permitted), the number of matching components between a self-assembled structure and the target structure is calculated. A refined solution is accounted for by two objectives: (6) locations and (7) error. We consider a refined solution to be one that minimises the number of remaining open assembly locations and the potential assembly errors (due to magnet interactions). The combination of these two objectives also reduces the number of unique components required.

Each objective is normalised using the highest and lowest values from a generation. For objectives one to five (i), the average normalised objective (ANO_i) over three cTAM evaluations is calculated and compared to the target objective (TO_i) value. For objective six, the normalised average over the three cTAM evaluations (ANO_6) is calculated. For objective seven, the normalised objective (NO_7) is calculated with respect to a genotype. The objectives are then weighted to give the final fitness score F (Equation 1), where lower scores are better. The weights were selected from preliminary experiments conducted by the authors.

F = 0.9 × Σ_{i=1..5} |TO_i − ANO_i| + 0.1 × ANO_6 + 0.1 × NO_7        (1)
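A small numeric sketch of Equation (1) follows (the objective values are invented for illustration; the paper normalises each objective per generation using that generation's extremes):

def fitness(to, ano, ano6, no7):
    # Equation (1): weighted shape mismatch plus the two refinement terms
    shape = sum(abs(t - a) for t, a in zip(to, ano))
    return 0.9 * shape + 0.1 * ano6 + 0.1 * no7

# five normalised shape objectives: area, perimeter, Euler, z-axis, matches
TO  = [0.6, 0.8, 0.5, 0.4, 1.0]   # target objective values
ANO = [0.5, 0.7, 0.5, 0.5, 0.9]   # averages over three cTAM evaluations
print(fitness(TO, ANO, ano6=0.2, no7=0.1))   # approximately 0.39; lower is better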

The fitness scores are used during selection. Roulette-wheel selection is used to select two parents (favouring the lowest fitness scores). A variable-length crossover operator is applied to the two parents to create two children. Each gene common to both parents (determined via the gene databank) is copied to each child. Each uncommon gene is copied, with 90% probability, to the child corresponding to its parent; for example, an uncommon gene from parent one has a 90% probability of being copied to child one (and likewise for parent two and child two).

Fig. 4. Fitness objective examples: structure I has A = 5, P = 12, and E = 1, where A = n_s, P = −4 + 2n_e, and E = n_s − n_e + n_v (n_s: number of squares, n_e: number of edges, n_v: number of vertices); structure II has the same second moment of inertia as its reflected equivalent; the number of matches between structure II and its reflection is 3 (III); the number of open locations is 2 (black circles, IV); a sliding-window technique (matrix) sums magnetic errors (an odd number of magnets must match at each position along the sliding window) and is applied to all potential two-component key-to-key interactions in a system, e.g. 2 in IV

After crossover has produced the two children, the genetic operators duplication, deletion, and mutation are applied to each child. A single gene, chosen at random, has a 10% probability of being duplicated, and likewise of being deleted. Each information location in a gene has a 10% probability of being mutated (with equal probability over A to H and −).
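These operators can be sketched as follows (a runnable illustration under our own simplifications: the databank comparison is reduced to tuple equality, and common-gene multiplicities are ignored):

import random

SYMBOLS = "ABCDEFGH-"

def crossover(p1, p2, rng, keep=0.9):
    # common genes go to both children; each uncommon gene follows its
    # parent's child with 90% probability
    common = set(p1) & set(p2)
    c1 = ([g for g in p1 if g in common]
          + [g for g in p1 if g not in common and rng.random() < keep])
    c2 = ([g for g in p2 if g in common]
          + [g for g in p2 if g not in common and rng.random() < keep])
    return c1, c2

def vary(child, rng, rate=0.1):
    # duplication and deletion of one random gene, then per-location mutation
    if child and rng.random() < rate:
        child.append(rng.choice(child))
    if rng.random() < rate and len(child) > 2:    # keep the two-gene minimum
        child.pop(rng.randrange(len(child)))
    return [tuple(rng.choice(SYMBOLS) if rng.random() < rate else s
                  for s in gene) for gene in child]

rng = random.Random(0)
p1 = [("A", "A", "A", "A"), ("-", "B", "-", "-")]
p2 = [("-", "B", "-", "-"), ("-", "-", "-", "A")]
for child in crossover(p1, p2, rng):
    print(vary(child, rng))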

4 Experiments and Results

We present two experiments to demonstrate how self-assembling systems can be evolved in two dimensions. Our hypothesis was that, given the attributes of a target structure, an evolutionary algorithm could be used to evolve a set of component rules that can be mapped to a physical system consisting of an environment containing components able to self-assemble into the target structure. The three-level approach, with the addition of EC, was used to test this hypothesis. Two target structures (Fig. 5) were specified: a cross shape (experiment 1) and a z-shape (experiment 2). For each experiment, enough components are supplied to create up to three target structures. Five trials are run for each experiment. A virtual trial (level two) is considered successful if all three target structures are created; a physical trial (level three) is considered successful if at least one target structure is created. The experimental procedure and results are described in terms of the three phases of the three-level approach.

4.1 Level One: Definition of Rule Set for Experiments

These two target structures were chosen because they offer degrees of complexity in terms of the number of components and their concentrations, and symmetric/asymmetric features in the target structures. Consequently, the two target structures cannot be created by pattern formation alone, which makes them appropriate for determining whether the information encoded in the components is sufficient to achieve the target structures by self-assembly. The independent variable is the set of components, defined by their types and their concentrations. The dependent variable is the resulting self-assembled structures. For each experiment, an evolved component set is generated along with a randomly generated component set, in order to test the independent variable.

The evolutionary algorithm was run for 5,000 generations with a population size of 50 individuals. The initial individual (genotype) length was set to the number of components required to create one target structure. Fig. 5 shows the evolutionary algorithm results; five runs were conducted for each experiment. For experiment one, two optimal solutions were found; the second was chosen for these experiments, as components from previous experiments could be reused [7]. For experiment two, a single optimal solution was found. For the randomly generated component sets, components were created by selecting, with uniform probability, the information assigned to each site, and the number of components generated was equal to the number required to create one target structure. A summary of the component rules for each experiment is provided in Fig. 5. The numbers of components specified (evolved and random) were multiplied by three in order to allow the maximum number of target structures to be created in the experiments.

Fig. 5. Target structures (I and II); evolutionary results (III, IV and V); component sets for the experiments, represented as '(North, West, South, East) × #', where the directions refer to component information locations and # is the quantity:

Experiment 1, Evolved: (A,A,A,A) × 3, (-,B,-,-) × 12
Experiment 1, Random:  (-,-,B,G) × 3, (-,D,E,E) × 3, (C,-,-,C) × 3, (C,E,-,-) × 3, (-,F,B,H) × 3
Experiment 2, Evolved: (-,B,G,-) × 3, (-,-,-,A) × 6, (H,-,-,B) × 3
Experiment 2, Random:  (G,H,H,-) × 3, (-,A,-,-) × 3, (-,H,-,-) × 3, (-,-,E,A) × 3

4.2 Level Two: Virtual Execution of Rule Set for Experiments

cTAM was used to virtually evaluate the ability of each self-assembly rule set (evolved or random) to create its respective target structure. Although cTAM is already used within the evolutionary algorithm, here it is used to verify the creation of multiple target structures. The level two experimental set-up and results are provided below.

Experimental Set-up. The component rules from Fig. 5 were mapped to an abstract representation for cTAM. Each component's shape was a unit square. The size of the environment was represented as a ratio between the size of the base component shape (a square with neutral sites at all four information locations) and the boundary of the environment. Because the environment size represents height and width, the environment used in cTAM for these experiments was ten units by ten units. Since cTAM selects tiles/substructures at random to step through the self-assembly process, a different random seed was used to initialise cTAM for each trial. Five trials were conducted for each experiment.
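For illustration, the seeded-trial protocol can be mimicked with the simplified assemble() sketch from Sect. 3.2, here fed the evolved experiment-1 component set scaled for a single target structure (the actual level two trials run the full cTAM, with three structures' worth of components, in the ten-by-ten environment):

import random

for seed in range(5):
    supply = [("A", "A", "A", "A")] + [("-", "B", "-", "-")] * 4
    structure = assemble(supply, random.Random(seed))   # from the Sect. 3.2 sketch
    print(seed, len(structure))                         # 5 placed components per trial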


Experimental Results. Each evolved component set successfully created all three of its target structures. These results show that even without a component acting as a seed, it is still possible to create target structures, and that multiples of the same target structure can be created when appropriate component information is used. In contrast, in each experiment the randomly generated component set failed to create even one target structure, and for the same reason in both cases: in the first random set, the first and last component types form substructures that are independent of the substructures formed by the second, third, and fourth component types; likewise, in the second random set, the first and third component types form substructures that are independent of the substructures formed by the second and fourth component types.

4.3 Level Three: Physical Realisation of Rule Set for Experiments

With the success of each system using an evolved component set at level two, a level three translation was performed to test whether the translated component set of each system could self-assemble into its respective target structure. A level three translation was not performed on the systems using a randomly generated component set, since they were not successful.

Experimental Set-up. Component mapping followed Table 1. Components were fabricated on an Eden 333 Polyjet rapid prototyping machine using Vero Grey resin. Neodymium (NdFeB) disc magnets (1/16" × 1/32", diameter × radius; grade N50) were inserted into the components, and blue/red paint (north/south) was used to mark the magnetic patterns. The environment size was mapped in accordance with the base component size to specify the dimensions of the circular environment tray. The tray was fabricated on a Dimension Elite rapid prototyping machine using ABS plastic (the sparse-fill option was used to create a rough surface texture). The outer radius of the tray is 135 mm and the inner radius 125 mm; the outer wall height is 9 mm and the inner wall height 6 mm. The tray was mounted on a Maxi Mix II Vortex Mixer (using a tray mounting bracket, also fabricated on the Dimension printer). A tray lid was cut from 2 mm clear acrylic sheet on a Trotec Speedy 300 laser cutter, and was secured to the tray using polycarbonate screws and wing nuts. Materials and methods details are given in [7].

Each physical trial followed seven steps [7]. (1) Set the continuous speed control on the Maxi Mix II Vortex mixer to 1,050 rpm; this speed was found to create an appropriate shaking level (environment temperature) to maintain fits rules while mostly breaking partially matched magnetic codes. (2) Secure the mixer to a table, using a 3" c-clamp and six hex nuts (to help secure the c-clamp to the back of the mixer). (3) Randomly place components on the surface of the tray (trying to ensure that complementary binding sites on the components are not in line with each other). (4) Secure the tray lid. (5) Run the mixer for 20 minutes. (6) Turn the mixer off. (7) Record the state of the system, including: the number of target structures created, the number of matching errors (between conflicting physical information, where no fits rule is applicable), and the number of assembly errors (partial attachment between corresponding physical information, where a fits rule is applicable).

Experimental Results. Every trial, in each experiment, was successful in creating at least one target structure; in the second trial of experiment two, two target structures were created. Fig. 6 shows the final state of the best trial for each experiment. In experiment one, there were no matching errors (none were possible with the 3-magnetic-bit codes present) and no assembly errors. In experiment two, there was only one matching error (trial five) and no assembly errors. As structures self-assembled, the free space in the environment was reduced, constraining the rotation of substructures and sometimes preventing single components from reaching assembly locations. Fisher's Exact Test [15] (one-sided) for analysing binary data was used to determine the statistical significance of creating target structures. For both experiments, the p-value is 0.004, which we consider statistically significant. These successful experiments therefore confirm our hypothesis: given the attributes of a target structure, the three-level approach incorporating EC can be used to evolve a set of component rules that can be mapped to a physical system consisting of an environment containing components able to self-assemble into the target structure.
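The reported p-value can be reproduced with a standard statistics package, assuming the underlying 2 × 2 table contrasts the five uniformly successful trials with the five unsuccessful random-set trials (our reconstruction of the test):

from scipy.stats import fisher_exact

table = [[5, 0],   # evolved component set: successes, failures
         [0, 5]]   # random component set:  successes, failures
_, p = fisher_exact(table, alternative="greater")
print(round(p, 3))   # 0.004, i.e. 1 / C(10, 5)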

Fig. 6. Results for the best trial of experiment one (left) and experiment two (right)

5 Conclusions

Here we used EC to evolve the component set required to create one target structure; as future work, we aim to evolve multiple target structures simultaneously. We have also extended our physical systems and intend to apply EC to evolving physical self-assembling systems in three dimensions. We envision our approach being applicable to the design of (micro)structures, circuits, and DNA computing using self-assembly. The work presented here progresses techniques for solving an open problem in self-assembly: creating a set of components and their environment such that the components self-assemble into a target structure. We presented two proof-of-concept experiments to demonstrate how bottom-up construction (self-assembly) can be coupled with bottom-up design (evolution). EC was incorporated into the three-level approach for designing physical self-assembling systems, and the successful results of the experiments demonstrate how the three-level approach, incorporating EC, can be used to evolve physical self-assembling systems in two dimensions.

References

1. Ball, P.: The Self-made Tapestry. Oxford University Press, Oxford (1999)
2. Thompson, D.W.: On Growth and Form. Dover Publications, New York (1917) (reprint 1992)
3. Crick, F.H.C.: Central Dogma of Molecular Biology. Nature 227, 561-563 (1970)
4. Groß, R., Dorigo, M.: Self-assembly at the Macroscopic Scale. Proc. IEEE 96(9), 1490-1508 (2008)
5. Adleman, L., Cheng, Q., Goel, A., Huang, M.-D., Kempe, D., de Espanés, P.M., Rothemund, P.W.K.: Combinatorial Optimization Problems in Self-assembly. In: 34th ACM International Symposium on Theory of Computing, pp. 23-32. ACM Press, New York (2002)
6. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (2002)
7. Bhalla, N., Bentley, P.J.: Programming Self-assembling Systems via Physically Encoded Information. In: Doursat, R., Sayama, H., Michel, O. (eds.) ME 2010. LNCS. Springer, Heidelberg (2010)
8. Bhalla, N., Bentley, P.J., Jacob, C.: Mapping Virtual Self-assembly Rules to Physical Systems. In: Proceedings of the International Conference on Unconventional Computing, pp. 117-147. Luniver Press, Frome (2007)
9. Whitesides, G.M., Grzybowski, B.: Self-assembly at All Scales. Science 295, 2418-2421 (2002)
10. Winfree, E.: Simulations of Computing by Self-assembly. In: DNA Based Computers IV (1998)
11. Winfree, E., Liu, F., Wenzler, L., Seeman, N.: Design and Self-assembly of Two-dimensional DNA Crystals. Nature 394(6), 539-544 (1998)
12. Terrazas, G., Gheorghe, M., Kendall, G., Krasnogor, N.: Evolving Tiles for Automated Self-assembly Design. In: Proceedings of the 2007 IEEE Congress on Evolutionary Computation, pp. 2001-2008. IEEE Press, New York (2007)
13. Soille, P.: Morphological Image Analysis, 2nd edn. Springer, Berlin (2003)
14. Johnston Jr., E.R., Eisenberg, E., Mazurek, D.: Vector Mechanics for Engineers: Statics, 9th edn. McGraw-Hill Higher Education, New York (2009)
15. Cox, D.R., Snell, E.J.: Analysis of Binary Data, 2nd edn. Chapman & Hall/CRC, Boca Raton (1989)

Author Index

Bechmann, Matthias 335
Benkhelifa, Elhadj 322
Bentley, Peter J. 121, 381
Bhalla, Navneet 381
Bidlo, Michal 85
Bremner, Paul 37
Bull, Larry 360
Cagnoni, Stefano 97
Carrillo, Snaider 133
Cornforth, Theodore W. 157
Dragffy, Gabriel 37
Ebne-Alian, Mohammad 73
Ebner, Marc 109
Eiben, A.E. 169
Farnsworth, Michael 322
Fonseca Vieira, Pedro da 310
Gajda, Zbyšek 13
Gamrat, Christian 262
Glette, Kyrre 250, 274
Graf, Yoan 286
Haasdijk, Evert 169
Harkin, Jim 133
Hilder, James A. 1
Hovin, Mats 274
Ivekovic, Spela 97
Jacob, Christian 381
Kaufmann, Paul 250
Kharma, Nawwaf 73
Kim, Kyung-Joong 157
Knieper, Tobias 250
Kobayashi, Kotaro 299
Kotasek, Zdenek 181
Kuyucu, Tüze 61
Ledwith, Ricky D. 25
Li, Zhifang 193
Liang, Houjun 193
Lipson, Hod 157
Liu, Yang 238
Lowe, David 49
Luo, Wenjian 193
Madrenas, Jordi 145, 299
McDaid, Liam 133
Mesquita, Antonio 310
Miller, Julian F. 25, 61
Miorandi, Daniele 49
Mondada, Francesco 286
Moreno, Juan Manuel 145, 299
Morgan, Fearghal 133
Mujkanovic, Amir 49
Mussi, Luca 97
Pande, Sandeep 133
Pena, Carlos 226
Perez-Uribe, Andres 286
Philippe, Jean-Marc 262
Pipe, Tony 37
Platzner, Marco 250
Prodan, Lucian 348
Rétornaz, Philippe 286
Rieffel, John 372
Rossier, Joël 202, 226
Rouhipour, Marjan 121
Ruican, Cristian 348
Rusu, Andrei A. 169
Sá, Leonardo Bruno de 310
Samie, Mohammad 37
Sanchez, Eduardo 286
Sánchez, Giovanny 145
Satizábal, Héctor F. 286
Sayles, Dave 372
Sebald, Angelika 335
Sekanina, Lukáš 13, 214
Shayani, Hooman 121
Šimáček, Jiří 214
Skarvada, Jaroslav 181
Slany, Karel 85
Stareček, Lukáš 214
Stauffer, André 202
Stepney, Susan 335
Strnadel, Josef 181
Tain, Benoît 262
Tempesti, Gianluca 238
Thoma, Yann 286
Tiwari, Ashutosh 322
Torresen, Jim 250
Trefzer, Martin A. 61
Tyrrell, Andy M. 1, 37, 61, 238
Udrescu, Mihai 348
Upegui, Andres 286
Vasicek, Zdenek 85
Vladutiu, Mircea 348
Walker, James Alfred 1, 37, 238
Wang, Xufa 193
Yamamoto, Lidia 49
Zhu, Meiling 322