English, 234 pages, 2010
Copyright © 2010. Nova Science Publishers, Incorporated. All rights reserved. Applications of Swarm Intelligence, Nova Science Publishers, Incorporated, 2010. ProQuest Ebook Central,
ENGINEERING TOOLS, TECHNIQUES AND TABLES
APPLICATIONS OF SWARM INTELLIGENCE
No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.
ENGINEERING TOOLS, TECHNIQUES AND TABLES Additional books in this series can be found on Nova’s website under the Series tab.
Additional E-books in this series can be found on Nova’s website under the E-book tab.
ENGINEERING TOOLS, TECHNIQUES AND TABLES
APPLICATIONS OF SWARM INTELLIGENCE
LOUIS P. WALTERS
EDITOR
Nova Science Publishers, Inc. New York
Copyright © 2011 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175 Web Site: http://www.novapublishers.com
NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.
Additional color graphics may be available in the e-book version of this book.
LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

Applications of swarm intelligence / editor, Louis P. Walters.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-61728-813-5 (e-book)
1. Swarm intelligence. 2. Problem solving. I. Walters, Louis P.
Q337.3.A67 2010
006.3--dc22
2010017740
Published by Nova Science Publishers, Inc. † New York
CONTENTS

Preface  vii

Chapter 1  Swarm Intelligence and Fuzzy Systems
           Seyed-Hamid Zahiri  1

Chapter 2  Evolutionary Strategies to Find Pareto Fronts in Multiobjective Problems
           Voratas Kachitvichyanukul and Nguyen Phan Bach Su  33

Chapter 3  Particle Swarm Optimization Applied to Real-World Combinatorial Problems: The Case Study of the In-Core Fuel Management Optimization
           Anderson Alvarenga de Moura Meneses and Roberto Schirru  57

Chapter 4  Swarm Intelligence and Artificial Neural Networks
           Mayssam Amiri and Seyed-Hamid Zahiri  77

Chapter 5  Swarm Intelligence for the Self-Assembly of Neural Networks
           Charles E. Martin and James A. Reggia  89

Chapter 6  Application of Particle Swarm Optimization Method to Inverse Heat Radiation Problem
           Kyun Ho Lee  131

Chapter 7  Ant Colony Optimization for Fuzzy System Parameter Optimization: From Discrete to Continuous Space
           Chia-Feng Juang  153

Chapter 8  Particle Swarm Optimization: A Survey
           Adham Atyabi and Sepideh Samadzadegan  167

Chapter 9  Application of PSO to Electromagnetic and Radar-Related Problems in Non Cooperative Target Identification
           B. Errasti-Alcalá, A. Jurado-Lucena, D. Escot-Bocanegra, D. Poyatos-Martínez, R. Fernández-Recio and I. Montiel-Sánchez  179

Chapter 10  Ant Colony Optimization: A Powerful Strategy for Biomarker Feature Selection
            Weixiang Zhao and Cristina E. Davis  193

Chapter 11  Swarm Intelligence Based Anonymous Authentication Protocol for Dynamic Group Management in EHRM System
            N.K. Sreelaja and G.A. Vijayalakshmi Pai  199

Index  215
PREFACE

Swarm Intelligence (SI) describes the emergent collective intelligence of populations or groups of autonomous agents, each with a low level of individual intelligence. The agents interact locally with one another and with their environment, making decisions in a decentralized, self-organized manner. SI and the sub-methods that follow its principles are used for problem solving in a variety of areas, such as robotics and forecasting. This book discusses swarm intelligence techniques and fuzzy logic as useful tools for solving practical engineering problems, and the use of a swarm intelligence algorithm to obtain the optimum neural network structure. Also explored are the application of Particle Swarm Optimization (PSO) methods to inverse heat radiation problems and the use of PSO in computational electromagnetics.

Past research has shown that swarm intelligence techniques and fuzzy logic are two useful tools for solving practical engineering problems. Chapter 1 explains how each of these tools can be used to improve the performance of the other. Every fuzzy system has many structural parameters (e.g., membership functions, fuzzy antecedents, fuzzy consequents, and fuzzy operators). Usually, these parameters are selected through extensive trial-and-error experiments, so in many cases the resulting fuzzy system structure is not optimal. Here, a swarm intelligence optimization algorithm can be used to optimize the fuzzy system parameters automatically. The first topic investigated in the chapter is therefore the capability of swarm intelligence optimization techniques to find optimal fuzzy system parameters. On the other hand, the search process of swarm intelligence techniques is known to be non-linear and very complicated, and this complexity grows further when the algorithms are applied to multi-objective problems. It is therefore hard, if not impossible, to model the search process of swarm intelligence optimization algorithms mathematically. Over the years, however, some understanding of their search behavior has accumulated, and linguistic descriptions of the search process are available. This understanding and its linguistic description make a fuzzy system a good candidate for dynamically controlling the parameters of multi-objective swarm intelligence algorithms, leading to more powerful and efficient designs. The second topic discussed in the chapter is the ability of fuzzy controllers to adapt the parameters of swarm intelligence optimization algorithms
(especially particle swarm optimization) in such a manner that the power and effectiveness of these optimization algorithms are improved. Focusing on multi-objective particle swarm optimization (MOPSO), a fuzzy multi-objective swarm intelligence algorithm, called Fuzzy-MOPSO, is introduced. Fuzzy-MOPSO removes many weak points of conventional MOPSO (e.g., undesirable distribution of the non-dominated solutions along the Pareto front, an undesirable number of non-dominated solutions, local capture, a low convergence rate, and premature convergence). These topics are followed by practical problems in pattern recognition, multi-objective benchmarks, and space allocation; for each, comparison results with other heuristic methods are provided, together with a review of past and ongoing research studies.

The existence of multiple conflicting objectives in many real-world problems poses a real challenge in the search for compromise solutions. A common approach is to find the Pareto front representing the trade-off among the conflicting objectives. Identifying a Pareto front requires many candidate solutions, which naturally favors evolutionary methods that maintain multiple solutions, i.e., population-based methods. Evolutionary methods such as the genetic algorithm (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) have been applied effectively to many difficult single-objective optimization problems in recent years; among them, PSO is one of the more popular approaches, and several PSO algorithms have been proposed to identify Pareto fronts for optimization problems with multiple conflicting objectives. The main focus of these previous studies is on components that search for non-dominated particles and on constructing an effective external archive to store them. Chapter 2 investigates several strategies that affect the movement behavior of the particles in the swarms; these strategies have a strong, direct influence on the quality of the final Pareto fronts. The main strategies studied are: guiding particles based on Crowding Distance (CD), adopting the Differential Evolution (DE) concept for swarm movement, and using swarms with mixed particles to explore Pareto solutions. The strategies are applied to standard benchmark problems to assess the quality of the resulting Pareto fronts.

Particle Swarm Optimization (PSO) is an optimization metaheuristic situated within the swarm intelligence paradigm. Based on collaborative aspects of intelligence, PSO is a search methodology that embodies the social-learning metaphor: individuals learn from their own experience in a group and take advantage of the performance of other individuals. Almost fifteen years after the seminal paper by Kennedy and Eberhart, PSO has a noticeable number of applications, in particular real-world problems on which it outperforms other algorithms. Chapter 3 reviews the application of discrete PSO models to combinatorial optimization problems. The PSO fundamentals are discussed, and the PSO with Random Keys, a discrete model of PSO, is presented. Experimental computational results of its application to instances of the Traveling Salesman Problem and to the real-world In-Core Fuel Management Optimization illustrate the success of this technique.

The most important issue in a neural network is its learning algorithm, which must find the optimum weight vectors that minimize the error. There are many types of learning
algorithms (e.g., the back-propagation algorithm), but none of them can be relied on to be optimal. Here, a swarm intelligence optimization algorithm can be used to establish a new optimum training algorithm. Neural networks also have structural parameters such as the number of hidden layers, the number of neurons in each layer, and the types of activation functions. Usually, these parameters are selected manually by trial and error, involving extensive experiments, so the resulting structure is rarely optimal. At this stage, a swarm intelligence algorithm can be employed to obtain the optimum neural network structure automatically. Chapter 4 follows this topic with a review of past research studies.

The processes controlling the growth, or self-assembly, of biological neural networks are extremely complex and poorly understood. Developmental models have been created to tackle the considerable challenges of modeling neurogenesis. However, the dynamics of neural growth and the networks generated by past models, even though they have generally been limited to two-dimensional spaces, tend to be very difficult to predict and control, particularly when growth is represented as a continuous process and involves large networks with complex topologies. In response to these difficulties, in Chapter 5 the authors present a developmental model of three-dimensional neural growth, based on swarm intelligence in the form of collective movements and principles of self-assembly, that greatly improves the robustness and controllability of artificial neurogenesis involving complex networks. A central innovation of this approach is that neural connections arise as persistent "trails" left behind by moving agents, so that network connections are essentially a record of agent movements. The authors demonstrate the model's effectiveness by using it to produce two large networks that support subsequent learning of topographic and feature maps. Improvements produced by the incorporation of collective movements are further demonstrated through computational experiments. These results highlight the model's potential as a methodology for advancing our understanding of biological neural growth and its relationship to the concepts of swarm intelligence, among other applications.

In Chapter 6, an inverse heat radiation analysis is presented for estimating the thermal radiation properties of an absorbing, emitting, and scattering medium with diffusely emitting and reflecting opaque boundaries. The particle swarm optimization (PSO) algorithm, a recent high-performance intelligent algorithm developed as an alternative to conventional methods, is proposed as an effective means of improving the search efficiency for unknown radiative parameters. To verify its feasibility and performance, PSO is applied to inverse heat radiation analysis to estimate the wall emissivities and the absorption and scattering coefficients in a two-dimensional irregular medium, given measured temperatures. The accuracy of the estimated parameters and the computational efficiency of PSO are compared with results obtained by a genetic algorithm (GA). The results show PSO to be a robust method for the simultaneous estimation of multiple parameters in the inverse heat radiation problem.

Chapter 7 introduces fuzzy system (FS) parameter optimization using discrete and continuous ant colony optimization (ACO) algorithms. The chapter divides the FS parameter optimization problem into two parts: optimization in discrete space and optimization in continuous space. For optimization in discrete space, the chapter describes the use of discrete ACO. The basic concept of using discrete ACO to solve discrete combinatorial optimization problems is described. The problem of FS consequent
parameter optimization in discrete space is formulated as a combinatorial optimization problem. The application of discrete ACO algorithms with different pheromone-matrix definitions, heuristic-value assignments, and graphical representations of this optimization problem is described. For optimization in continuous space, the chapter introduces continuous ACO, describing the basic concept of ACO algorithms that find solutions in continuous space and showing how to apply them to FS parameter optimization problems. The continuous optimization problem is formulated in terms of feasible paths in a graph consisting of nodes and edges. Finally, the chapter presents simulation results of FS optimization using continuous ACO; comparisons with genetic and particle swarm optimization algorithms demonstrate the superior performance of continuous ACO.

Particle Swarm Optimization (PSO) is an evolutionary algorithm inspired by animal social behaviour. PSO iteratively directs its particles toward the optimum using its social and cognitive components. Many modifications of PSO have been proposed, addressing ways of adjusting PSO's parameters (parameter adjustment), the social interaction of the particles (neighbourhood topology), and ways of addressing the search objectives (sub-swarm topology). The PSO approach fits easily into several search-optimization categories, such as self-learning, unsupervised learning, stochastic search, population-based search, and behaviour-based search. Chapter 8 addresses these principal aspects of PSO. In addition, conventional and basic PSO are introduced and their shortcomings discussed. Various suggestions and modifications proposed in the literature are then introduced and discussed.

Particle Swarm Optimization (PSO) is a technique widely used in many application areas and is being studied extensively by many researchers. In electromagnetics, PSO has been shown to be one of the best-behaved algorithms on several specific problems. Chapter 9 presents the application of PSO to two real problems, as part of the authors' work in computational electromagnetics applied to Non Cooperative Target Identification (NCTI): the estimation of the Direction of Arrival (DOA) of several incoming signals, and the estimation of the electromagnetic constitutive parameters of dielectric materials. DOA estimation is a well-known problem involved in different applications; for instance, it is used in ESM (Electronic Support Measures) systems to determine the angular position of a target from its emissions. For the next generation of passive radars using phased-array antennas, the target position will also have to be determined by these techniques, so understanding and mastering them is essential if identification by means of NCTI techniques is to be performed. Owing to the characteristics of these new radars, new efforts are needed to estimate the DOA with fewer snapshots; here, PSO provides a simple but fast and accurate solution to the DOA problem using a single snapshot. The application of PSO to DOA is presented in full detail, including a performance study of the algorithm. Apart from the DOA techniques, estimating the electromagnetic properties of materials is a key part of generating a trustworthy model that electromagnetic software tools can simulate to obtain a radar signature for identification purposes. This estimation can be accomplished through free-space reflection measurement of the material in an
anechoic chamber. Classical methods estimate the complex permittivity from such measurements by minimizing an error function based on transmission-line theory; nowadays, soft-computing techniques like PSO are beginning to be used, with accurate results.

As instrumentation develops in industry and science, we frequently generate multidimensional data sets involving input from a large number of factors or variables. Many parameters of these instrument systems may not relate directly to the systems' core function, and some factors may even contaminate the output signals with noise. The potential obscuring effects of these variables can make it difficult to determine which parts of the instrument data are the most meaningful. Feature selection within data sets is therefore becoming a core technique for detecting pertinent factors or variables for system characterization: it not only reduces the data dimension but also provides pertinent information for studies of system mechanisms, and can ultimately yield information about the underlying instrumentation function. Feature selection has been attempted with a variety of methods, including statistical analyses such as Student's t-test, the Fisher ratio, and analysis of variance (ANOVA); however, these methods are not always feasible for nonlinear systems and non-classification problems. As an artificial intelligence method, the genetic algorithm provides a novel feature-selection strategy for detecting pertinent features in a variety of systems, even those without clear mechanisms, and this type of biologically inspired adaptive learning has prompted other new feature-selection approaches, such as the ant colony algorithm (ACA). The ant colony algorithm, which mimics the social behavior of ants, is a typical swarm intelligence based optimization method, and it has increasingly been applied to system feature selection. Chapter 10 provides a short review of recent ACA-based feature selection studies, compares their outcomes to those of other intelligent feature-selection methods, discusses the advantages and disadvantages of ACA-based feature selection, and suggests promising directions for future research in this area.

The Internet today provides no support for privacy and authentication of multicast packets, yet an increasing number of applications will require secure services in order to restrict group membership and enforce the accountability of group members. Chapter 11 presents a novel architecture for digital identity management: an anonymous authentication protocol that not only has low computational complexity for practical use but also meets the requirements of dynamic groups. The contribution includes a strict analysis of security based on the framework of provable security. The protocol consists of an Authorizing Agent (AA), a Group Controller (GC), and an Access Provider (AP); in the protocol, the AP and GC possess no information that could identify the users in the group. The protocol is particularly suitable for Electronic Health Record Management (EHRM), where many users register to access patient details and the set of participating entities in a group changes frequently. Ant Colony Optimization (ACO) is an emergent collective intelligence of groups of simple autonomous agents. The problem of cumulative member removal is explored, and an ACO-based Boolean Function Minimization Technique (BFMT) for group rekeying is employed in the protocol. Termed the Ant Colony Optimized Boolean Expression Evolver (ABXE), this novel technique efficiently obtains a minimized Boolean expression
while overcoming the drawbacks of existing BFMT techniques for group rekeying. Simulation results show that the minimized Boolean expression found by ABXE represents the minimum number of messages required to distribute the minimum number of keys to the users accessing the EHRM system, thereby reducing the communication overhead.
In: Applications of Swarm Intelligence Editor: Louis P. Walters, pp. 1-32
ISBN: 978-1-61728-602-5 © 2011 Nova Science Publishers, Inc.
Chapter 1
SWARM INTELLIGENCE AND FUZZY SYSTEMS Seyed-Hamid Zahiri* Department of Electrical Engineering, Faculty of Engineering, Birjand University, Birjand, Iran
Abstract
Past research has shown that swarm intelligence techniques and fuzzy logic are two useful tools for solving practical engineering problems. This chapter explains how each of these tools can be used to improve the performance of the other. Every fuzzy system has many structural parameters (e.g., membership functions, fuzzy antecedents, fuzzy consequents, and fuzzy operators). Usually, these parameters are selected through extensive trial-and-error experiments, so in many cases the resulting fuzzy system structure is not optimal. Here, a swarm intelligence optimization algorithm can be used to optimize the fuzzy system parameters automatically. The first topic investigated in this chapter is therefore the capability of swarm intelligence optimization techniques to find optimal fuzzy system parameters. On the other hand, the search process of swarm intelligence techniques is known to be non-linear and very complicated, and this complexity grows further when the algorithms are applied to multi-objective problems. It is therefore hard, if not impossible, to model the search process of swarm intelligence optimization algorithms mathematically. Over the years, however, some understanding of their search behavior has accumulated, and linguistic descriptions of the search process are available. This understanding and its linguistic description make a fuzzy system a good candidate for dynamically controlling the parameters of multi-objective swarm intelligence algorithms, leading to more powerful and efficient designs. The second topic discussed in this chapter is the ability of fuzzy controllers to adapt the parameters of swarm intelligence optimization algorithms (especially particle swarm optimization) in such a manner that the power and effectiveness of these optimization algorithms are improved.

* E-mail address: [email protected]; P.O. Box 97175-376; Phone: +98-561-2227044; Fax: +98-561-2227795. (Corresponding author)
By focusing on multi-objective particle swarm optimization (MOPSO), a fuzzy multi-objective swarm intelligence algorithm, called Fuzzy-MOPSO, is introduced. Fuzzy-MOPSO removes many weak points of conventional MOPSO (e.g., undesirable distribution of the non-dominated solutions along the Pareto front, an undesirable number of non-dominated solutions, local capture, a low convergence rate, and premature convergence). These topics are followed by practical problems in pattern recognition, multi-objective benchmarks, and space allocation. For each practical problem, comparison results with other heuristic methods are provided, together with a review of past and ongoing research studies.
1. Optimizing the Parameters of Fuzzy Systems Using Swarm Intelligence Algorithms

The most important issue in designing a fuzzy system is to determine appropriate fuzzy variables and their membership functions, the optimum number of fuzzy rules with suitable antecedents and consequents, and proper fuzzy operators. These are the structural parameters of any fuzzy system, for which a designer seeks an optimal set-up. In other words, one of the most important considerations in designing any fuzzy system is the generation of optimal fuzzy rules as well as the membership functions for each fuzzy set. In most existing applications, the fuzzy rules are written by domain experts, especially for control problems with only a few inputs. As the number of variables increases, the possible number of rules grows exponentially, which makes it difficult for experts to define a complete rule set that yields good system performance; an automated way to design fuzzy systems is then preferable. In essence, the design of a fuzzy system can be formulated as a search problem in a high-dimensional space where each point represents a rule set, membership functions, and the corresponding system behavior. Given some performance criteria, the performance of the system forms a hypersurface in this space, and developing the optimal fuzzy system design is equivalent to finding the optimal point on this hypersurface. These characteristics make swarm intelligence optimization algorithms good candidates for searching the hypersurface for the optimum.
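To make the hypersurface-search formulation concrete, a particle can simply be a flat vector of membership-function end-points, so that every particle position is one candidate fuzzy-system design. The encoding, value ranges, and helper names below are illustrative assumptions for this sketch, not the chapter's own scheme:

```python
import random

def random_particle(n_membership_functions, lo=0.0, hi=1.0):
    """One point on the design hypersurface: the concatenated (a, b)
    end-points of every membership function in the fuzzy system."""
    particle = []
    for _ in range(n_membership_functions):
        a, b = sorted(random.uniform(lo, hi) for _ in range(2))
        particle.extend([a, b])  # keep a <= b so each function is well-formed
    return particle

def decode(particle):
    """Recover the (a, b) pairs from the flat position vector."""
    return [(particle[i], particle[i + 1]) for i in range(0, len(particle), 2)]
```

A swarm of such vectors can then be evaluated by any performance criterion (e.g., classification accuracy), which plays the role of the hypersurface height at each point.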
Figure 1. Left-trapezoidal, right-trapezoidal, triangle, Gaussian, and sigmoid membership functions.
In this section, the use of swarm intelligence algorithms for optimizing fuzzy system parameters is investigated. As an application to data mining and pattern recognition tasks, the design of an optimum fuzzy classifier using the PSO algorithm is introduced, and other research studies in this area are reported.
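For reference, the canonical PSO update that drives such a design search can be sketched as follows; the coefficient values (w, c1, c2) are common defaults used for illustration, not values taken from the chapter:

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One iteration of the canonical PSO update over all particles."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = random.random(), random.random()
            cognitive = c1 * r1 * (pbest[i][d] - positions[i][d])  # own best experience
            social = c2 * r2 * (gbest[d] - positions[i][d])        # swarm's best position
            velocities[i][d] = w * velocities[i][d] + cognitive + social
            positions[i][d] += velocities[i][d]
    return positions, velocities
```

In the fuzzy-design setting, each position vector would encode the structural parameters being tuned, and pbest/gbest would be tracked under the classifier's performance criterion.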
1.1. Fuzzy Systems

Fuzzy logic provides a general concept for description and measurement. Most fuzzy logic systems encode human reasoning into a program to make decisions or to control a system. Fuzzy logic comprises fuzzy sets, a way of representing non-statistical uncertainty, and approximate reasoning, which includes the operations used to make inferences in fuzzy logic. Fuzzy rule-based systems have been successfully applied to various engineering problems (e.g., pattern recognition [1-2] and control problems [3]). This subsection presents the basic concepts and definitions of fuzzy systems.
1.1.1. Membership Functions

Unlike traditional two-valued logic, in fuzzy logic a fuzzy variable belongs to a fuzzy set by a degree over the range [0,1], which is represented by a membership function. It is this function that defines the fuzzy set. The function can be linear or nonlinear. Commonly used are the left-trapezoidal, right-trapezoidal, triangle, Gaussian, and sigmoid functions, as shown in Figure 1. Definitions of these membership functions as used in this chapter are as follows.
Copyright © 2010. Nova Science Publishers, Incorporated. All rights reserved.
a) Left-trapezoidal membership function:

$$LTrap\_MF(x) = \begin{cases} 1 & \text{if } x \le a \\ \dfrac{b-x}{b-a} & \text{if } a < x \le b \\ 0 & \text{if } x > b \end{cases}$$

b) Right-trapezoidal membership function:

$$RTrap\_MF(x) = \begin{cases} 0 & \text{if } x \le a \\ \dfrac{x-a}{b-a} & \text{if } a < x \le b \\ 1 & \text{if } x > b \end{cases}$$

c) Triangle membership function:

$$Triangle\_MF(x) = \begin{cases} 0 & \text{if } x \le a \\ \dfrac{x-a}{(b-a)/2} & \text{if } a < x \le \dfrac{a+b}{2} \\ \dfrac{b-x}{(b-a)/2} & \text{if } \dfrac{a+b}{2} < x \le b \\ 0 & \text{if } x > b \end{cases}$$

d) Gaussian membership function:

$$Gaussian\_MF(x) = e^{-0.5 y^2}, \quad \text{where } y = \frac{8(x-a)}{b-a} - 4$$

e) Sigmoid membership function:

$$Sig\_MF(x) = \frac{1}{1 + e^{-(y-6)}}, \quad \text{where } y = \frac{12(x-a)}{b-a}$$

f) Reverse-sigmoid membership function:

$$Rsig\_MF(x) = 1 - Sig\_MF(x)$$
Figure 2. A fuzzy variable with triangular membership functions.
From the definitions it can be seen that each of the abovementioned membership functions is determined by two values: the start-point a and the end-point b.
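The six membership functions can be sketched in Python as follows. This is a minimal illustration of the definitions above; the function names are mine, not from the text.

```python
import math

def ltrap_mf(x, a, b):
    # Left-trapezoidal: 1 to the left of a, ramps down to 0 at b.
    if x <= a: return 1.0
    if x <= b: return (b - x) / (b - a)
    return 0.0

def rtrap_mf(x, a, b):
    # Right-trapezoidal: 0 to the left of a, ramps up to 1 at b.
    if x <= a: return 0.0
    if x <= b: return (x - a) / (b - a)
    return 1.0

def triangle_mf(x, a, b):
    # Triangle peaking at the midpoint (a + b) / 2.
    c = (a + b) / 2.0
    if x <= a or x > b: return 0.0
    if x <= c: return (x - a) / ((b - a) / 2.0)
    return (b - x) / ((b - a) / 2.0)

def gaussian_mf(x, a, b):
    # Gaussian: y maps [a, b] onto [-4, 4], so the peak sits at the midpoint.
    y = 8.0 * (x - a) / (b - a) - 4.0
    return math.exp(-0.5 * y * y)

def sig_mf(x, a, b):
    # Sigmoid: y maps [a, b] onto [0, 12], inflection at the midpoint.
    y = 12.0 * (x - a) / (b - a)
    return 1.0 / (1.0 + math.exp(-(y - 6.0)))

def rsig_mf(x, a, b):
    # Reverse-sigmoid: the complement of the sigmoid.
    return 1.0 - sig_mf(x, a, b)
```

Each function attains its extremes where the definitions require: for example, on [0, 1] the triangle peaks at 0.5 and the sigmoid crosses 0.5 at the midpoint.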
Theoretically, each fuzzy variable can have many fuzzy sets with each having its own membership function, but commonly used are three, five, seven, or nine fuzzy sets for each fuzzy variable. Figure 2 shows a fuzzy variable with triangular membership functions.
1.1.2. Fuzzy Rules

The general form of a Mamdani-type fuzzy rule in a fuzzy system is

IF $x_1$ is $A_1$ AND $x_2$ is $A_2$ AND … AND $x_n$ is $A_n$ THEN $y_1$ is $C_1$ AND … AND $y_k$ is $C_k$

where each $y_i$ is a consequent (output) variable whose value is inferred, each $x_i$ is an antecedent (input) variable, and each $A_i$ and $C_i$ is a fuzzy set represented by a membership function. The antecedents are combined by the fuzzy AND operator; AND'ed antecedents are usually calculated by a T-norm [4]. Other fuzzy operators are also defined (e.g. OR, the aggregation operator, and the implication operator). In our application, a fuzzy system is utilized as a fuzzy classifier; in fuzzy classifiers the operator most often applied to feature vectors is the AND operator. All the fuzzy rules in a fuzzy system are fired in parallel. The fuzzy system works as follows:

1. Determine the fuzzy membership values activated by the inputs.
2. Determine which rules are fired in the rule set.
3. Combine the membership values for each activated rule using the AND operator.
4. Trace rule activation membership values back through the appropriate output fuzzy membership functions.
5. Utilize defuzzification to determine the value for each output variable.
6. Make a decision according to the output values.
Determination of the fuzzy membership values of the inputs is often called fuzzification. Each input may activate one or more fuzzy sets of that input variable according to the definitions of the fuzzy membership functions. Only the rules with at least one activated antecedent set are said to be fired by the inputs. The AND operator is typically used to combine the membership values for each fired rule to generate the membership values for the fuzzy sets of the output variables in the consequent part of the rule. Since several rules may be fired in the rule set, for some fuzzy sets of the output variables different membership values may be obtained from different fired rules. There are many ways to combine these values; one commonly used way is the OR operator, that is, to take the maximum value as the membership value of the fuzzy set. Next, a defuzzification method is used to produce a single scalar value for each output variable. A common way to do the defuzzification is the centroid method [4]. Then, according to the output values, decisions can be made to solve the problem. For example, for an M-class classification problem, the range of the output variable of a fuzzy classifier can be divided into M evenly distributed parts; the input pattern then belongs to class i if the inferred output value is located inside the i'th part. This is the approach taken for constructing the fuzzy classifiers in this chapter.
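The six inference steps can be sketched for a single-output fuzzy classifier as follows. This is a minimal illustration, assuming min for AND, max for OR-combination, and centroid defuzzification over a sampled output grid; all names are mine, not from the text.

```python
def infer(rules, x, y_grid):
    """Mamdani-style inference. Each rule is (antecedent_mfs, out_mf), where
    antecedent_mfs maps an input index to a membership function (absent terms
    are simply omitted) and out_mf is the output membership function."""
    # Steps 1-3: fuzzify the inputs and compute each rule's firing strength (AND = min).
    strengths = []
    for antecedent_mfs, out_mf in rules:
        degrees = [mf(x[i]) for i, mf in antecedent_mfs.items()]
        strengths.append((min(degrees) if degrees else 0.0, out_mf))
    # Step 4: clip each output MF at its rule strength and combine with OR = max.
    agg = [max((min(s, mf(y)) for s, mf in strengths), default=0.0) for y in y_grid]
    # Step 5: centroid defuzzification over the sampled output range.
    total = sum(agg)
    return sum(y * m for y, m in zip(y_grid, agg)) / total if total else y_grid[0]

def classify(rules, x, y_grid, n_classes):
    # Step 6: the output range is divided into n_classes evenly distributed parts;
    # the pattern belongs to class i if the inferred value falls in the i-th part.
    y = infer(rules, x, y_grid)
    lo, hi = y_grid[0], y_grid[-1]
    k = int((y - lo) / (hi - lo) * n_classes)
    return min(k, n_classes - 1)
```

For a toy two-class problem with one input ("low input implies low output, high input implies high output"), a low input is assigned to class 0 and a high input to class 1.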
Seyed-Hamid Zahiri
1.2. Designing a Fuzzy Classifier Using the Particle Swarm Optimization (PSO) Algorithm

Let us assume that our pattern classification problem is an M-class problem in an n-dimensional feature space with continuous attributes. The general form of a fuzzy classifier rule is as follows:

IF $x_1$ is $A_1$ AND $x_2$ is $A_2$ AND … AND $x_n$ is $A_n$ THEN $y$ is $C$

where $y$ is the output of the rule, $X = (x_1, ..., x_n)$ is a feature vector (input), and each $A_i$ and $C$ is a fuzzy set represented by a membership function. As mentioned in the previous subsection, the range of the output variable of the rule is divided into M evenly distributed parts; the input pattern belongs to class i if the inferred output value is located inside the i'th part. For example, for a three-class classification problem, the output range is divided into three fuzzy regions of Low, Medium, and High, corresponding to classes 1 to 3 respectively. The major aim in this section is obtaining the optimum fuzzy rule set and membership functions of a fuzzy classifier using PSO. The fuzzy classifier designed by employing PSO is called the PSF-classifier.
1.2.1. Integer-Valued Particle Swarm Optimization with Constriction Coefficient

In the basic PSO proposed by Kennedy and Eberhart [5], many particles move around in a multi-dimensional space, and each particle memorizes its position vector and velocity vector as well as the spot at which it has acquired its best fitness. Furthermore, particles can share data about the best-fitness spot found by all particles. The velocity of each particle is updated with the best positions acquired by all particles over iterations and by the respective particle over generations. To improve the performance of the basic PSO, several new versions have been proposed. At first, the concept of an inertia weight was developed to better control exploration and exploitation [7]. Then, the research done by Clerc [8] indicated that a constriction factor may be necessary to ensure convergence of the particle swarm algorithm. After these two important modifications of the basic PSO, other researchers reported further work on PSO. For example, the multi-phase particle swarm optimization (MPSO) was introduced in [9]; in [10] particle swarm optimization with Gaussian mutation combined the idea of swarm intelligence with concepts from evolutionary algorithms; quantum particle swarm optimization was proposed in [11]; a modified PSO with an increasing inertia weight schedule was proposed in [12]; Gaussian particle swarm optimization (GPSO) was developed in [13]; and guaranteed convergence PSO (GCPSO) was introduced in [14]. In this Section PSO with constriction coefficient is used, because good knowledge about the influence of the constriction coefficient on the PSO search process is available [6]. Also, the "lbest" strategy is considered: in this version, particles have information only of their own and their neighbors' bests, rather than of the entire swarm (the latter strategy is called "gbest").
In this approach updating is executed by the following equations:

$$V_i^{q+1} = \chi \left( V_i^q + \varphi_1 (P_i - Y_i) + \varphi_2 (P_g - Y_i) \right)$$ (1)

$$Y_i^{q+1} = Y_i^q + V_i^{q+1}$$ (2)

$$\chi = \frac{2\kappa}{\left| 2 - \varphi - \sqrt{\varphi^2 - 4\varphi} \right|}, \qquad \varphi = \varphi_1 + \varphi_2, \; \varphi > 4$$ (3)

In these relations i = 1, 2, …, Pop, where "Pop" is the swarm size, q is the generation counter, $V_i = (v_{i1}, v_{i2}, ..., v_{in})$ is the velocity vector of the i'th particle of the swarm, $P_i = (p_{i1}, p_{i2}, ..., p_{in})$ denotes the best position ever visited by the particle, $Y_i = (y_{i1}, y_{i2}, ..., y_{in})$ is its current position, and $P_g = (p_{g1}, p_{g2}, ..., p_{gn})$ is the best particle among a neighborhood of the particles in the swarm. $\varphi_1$ and $\varphi_2$ are random numbers uniformly distributed in the range $(0, \varphi/2)$, and n is the dimension of the space. $\chi$ is the constriction coefficient. We set $\kappa = 1$, meaning that the space is thoroughly searched before the swarm collapses into a point [6]. It is important to know that the above algorithm requires no explicit upper bound $V_{max}$ on the velocity. However, from subsequent experiments and applications [15] it has been concluded that a better approach, used as a "rule of thumb", is to limit $V_{max}$ to $Y_{max}$, which is
equal to the dynamic range of the variable. In the aforementioned PSO with constriction coefficient, particles are real-valued. Since in this Section the problem of optimizing fuzzy parameters is investigated, the particles consist of integer values, so integer-valued particles are needed. For this reason a modification is applied to equation (2) as below:

$$Y_i^{q+1} = Round\left( Y_i^q + V_i^{q+1} \right)$$ (4)

where Round(A) rounds the elements of A to the nearest integers.
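Equations (1)-(4) can be sketched as follows. This is a minimal illustration under my own assumptions: a "gbest" neighborhood is used for brevity (the text adopts "lbest"), the velocity is clamped to the dynamic range as the rule of thumb suggests, and the sphere-like test fitness is mine.

```python
import math
import random

def constriction(phi=4.1, kappa=1.0):
    # Equation (3): chi = 2*kappa / |2 - phi - sqrt(phi^2 - 4*phi)|, with phi > 4.
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))

def integer_pso(fitness, dim, lo, hi, pop=20, iters=200, phi=4.1, seed=1):
    """Minimize `fitness` over integer vectors in [lo, hi]^dim."""
    rng = random.Random(seed)
    chi = constriction(phi)
    Y = [[rng.randint(lo, hi) for _ in range(dim)] for _ in range(pop)]
    V = [[0.0] * dim for _ in range(pop)]
    P = [y[:] for y in Y]                 # personal best positions
    vmax = hi - lo                        # rule of thumb: |V| limited to the dynamic range
    for _ in range(iters):
        Pg = min(P, key=fitness)          # best position found so far ("gbest" here)
        for i in range(pop):
            for d in range(dim):
                r1 = rng.uniform(0.0, phi / 2.0)
                r2 = rng.uniform(0.0, phi / 2.0)
                # Equation (1): constricted velocity update.
                v = chi * (V[i][d] + r1 * (P[i][d] - Y[i][d]) + r2 * (Pg[d] - Y[i][d]))
                V[i][d] = max(-vmax, min(vmax, v))
                # Equation (4): round so particles stay integer-valued (clamped to bounds).
                Y[i][d] = max(lo, min(hi, round(Y[i][d] + V[i][d])))
            if fitness(Y[i]) < fitness(P[i]):
                P[i] = Y[i][:]
    return min(P, key=fitness)
```

On a simple convex integer problem such as minimizing the squared distance to a target point, the swarm settles near the optimum within a few hundred iterations.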
1.2.2. Particle Representation

The first important consideration is the particle representation strategy, i.e. how to encode the fuzzy classifier into particle form. To completely represent a fuzzy classifier (system), each particle must contain all the needed information about the rule set and the membership functions. As an illustration, suppose a classification problem with a four-dimensional feature vector and three reference classes. Each variable has five fuzzy sets representing the linguistic descriptions: Very low, Low, Medium, High, and Very high. In this case, we can use the integers 1-5 to represent each of these five terms, use the integer 0 to represent the absence of a term, and use a minus sign '-' to encode the term "not". For example, the rule "IF input-1 is not Low AND input-2 is not Medium AND input-4 is High, THEN output is Very high" can be encoded as "-2-3045". A total of six types of membership functions (defined in subsection 1.1.1) are used as membership function candidates; each is represented by an integer from 1 to 6. A membership function in our problems is completely determined by three values: the start-point a, the end-point b, and the function type value. In order to have a homogeneous particle, integers are chosen to represent the start-point and end-point instead of real values. Assume for the variable x that its dynamic range is [S, E] and that it has n fuzzy sets. If the fuzzy membership functions are uniformly distributed over the range with half-way overlap [1], then the center point $c_i$ (i = 1, …, n) of the i'th membership function is located at

$$c_i = S + i \cdot step, \qquad i = 1, ..., n$$

where

$$step = \frac{E - S}{n + 1}.$$

We constrain the start-point $a_i$ of the i'th membership function to vary only between $c_{i-1}$ and $c_i$, and the end-point $b_i$ of the i'th membership function to vary only between $c_i$ and $c_{i+1}$. Assume an integer q (q = 1, 2, …, 10) is used to represent $a_i$ and $b_i$; then $a_i$ and $b_i$ can be calculated from the integer q using the following relations:

$$a_i = i \cdot step - \frac{step \cdot (10 - q)}{2 \cdot 10} + S$$
$$b_i = i \cdot step + \frac{step \cdot (10 - q)}{2 \cdot 10} + S, \qquad i = 1, 2, ..., n$$

Assume for our example fuzzy classifier that the maximum acceptable number of rules is 30; then the length of the particle is 1 + 5×(5×(2+1)) + 5×30 = 226 and its form is as follows:

where $s_1$ represents the number of rules, varying between 1 and 30; $s_2$ and $s_3$ represent the start point and end point of the first fuzzy set of the first input variable; $s_4$ represents the membership function type of the first fuzzy set of the first input variable and can vary between one and six; $s_5$ to $s_{76}$ encode the remaining fuzzy membership functions (start point, end point, and type); $s_{77}$ to $s_{81}$ represent the first fuzzy rule; and $s_{222}$ to $s_{226}$ represent the last fuzzy rule. Since $s_1$ specifies how many possible rules are encoded in the particle, only the first $s_1$ rules are used to form the rule set, but not all of them may be feasible. Each possible rule is therefore checked to see whether it represents a feasible rule or not. A rule without a nonzero antecedent or a nonzero consequent part is not a feasible rule and will not be included in the rule set. This scheme closely follows the chromosome representation with which Shi et al. [1] implemented an evolutionary fuzzy classifier using a genetic algorithm. Since we are interested in comparing the power of evolutionary algorithms and swarm intelligence in the task of optimizing fuzzy systems, the implementation scheme of the fuzzy classifier is adopted from [1] to reach more meaningful comparison results.
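The mapping from the encoding integer q to the start- and end-points can be sketched as follows, based on my reconstruction of the relations above; the function name is illustrative.

```python
def membership_points(i, q, S, E, n):
    """Start-point a, center c, and end-point b of the i-th fuzzy set (i = 1..n)
    of a variable with dynamic range [S, E], decoded from the integer q in 1..10."""
    step = (E - S) / (n + 1)          # spacing of the uniformly distributed centers
    c = S + i * step                  # center of the i-th membership function
    offset = step * (10 - q) / (2 * 10)
    a = i * step - offset + S         # start-point, constrained near c from the left
    b = i * step + offset + S         # end-point, constrained near c from the right
    return a, c, b
```

For example, with range [0, 6] and n = 5 fuzzy sets the centers sit at 1, 2, ..., 5; q = 10 collapses the start- and end-points onto the center, while smaller q widens the interval around it.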
1.2.3. Fitness Function Definition

To evaluate the quality of each rule set, first a fitness value is defined for a rule as below:

$$Q = \frac{TP}{TP + FN} \cdot \frac{TN}{FP + TN}$$ (5)

where

TP: True Positives = number of instances covered by the rule that are correctly classified, i.e., the instance class matches the training target class.
FP: False Positives = number of instances covered by the rule that are wrongly classified, i.e., the instance class differs from the training target class.
TN: True Negatives = number of instances not covered by the rule, whose class differs from the training target class.
FN: False Negatives = number of instances not covered by the rule, whose class matches the training target class.

Then the total fitness of a rule set is defined as follows:

$$Fit(Rule\_set) = \sum_{l=1}^{K} Q_l$$ (6)

where $Q_l$ is the fitness of the l'th rule of the K rules in the rule set.
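Equations (5) and (6) can be sketched directly from the confusion counts; the zero-denominator guard is my addition, not in the text.

```python
def rule_quality(tp, fp, tn, fn):
    # Equation (5): sensitivity * specificity of a single rule.
    if tp + fn == 0 or fp + tn == 0:
        return 0.0  # guard against empty classes (my assumption, not in the text)
    return (tp / (tp + fn)) * (tn / (fp + tn))

def rule_set_fitness(confusions):
    # Equation (6): total fitness is the sum of the K rule qualities.
    # `confusions` is a list of (TP, FP, TN, FN) tuples, one per rule.
    return sum(rule_quality(*c) for c in confusions)
```

A rule that covers exactly its target class (no false positives or false negatives) scores Q = 1, the maximum.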
1.3. Experimental Results

The performance evaluation of the optimized fuzzy classifier (PSF-classifier) is investigated in this Section, along with comparative results for a fuzzy classifier whose rules and membership functions are optimized by a genetic algorithm [1] (namely the GAF-classifier). Two performance aspects are considered for comparing the above-mentioned optimized fuzzy classifiers: the recognition score on testing data (predictive accuracy) and rule simplicity. The simplicity is measured by the number of discovered rules and the average number of antecedents per rule [16]. Three pattern classification problems with different feature vector dimensions (4, 9, 34) are used for performance evaluation and comparison of the results. A description of the data sets is given as follows:

Iris data1: The Iris data contains 50 measurements of four features for each of three species: Iris setosa, Iris versicolor, and Iris virginica. The features are sepal length, sepal width, petal length, and petal width.
1 This data set is available at the University of California, Irvine, via anonymous ftp: ftp.ics.uci.edu/pub/machine-learning-databases.
Cancer data2: This breast cancer database, obtained from the University of Wisconsin Hospital, Madison, has 683 breast mass samples belonging to two classes, Benign and Malignant, in a nine-dimensional feature space.

Dermatology data3: The aim for this dataset is to determine the type of Erythemato-Squamous Disease. This database contains 34 attributes, 33 of which are linear valued and one of which is nominal.

To estimate more accurate performance measures, ten-fold cross validation is used: 10% of the whole set of training samples is randomly held out as testing points (the validation set) and the rest is used as the training set for discovery and optimization of fuzzy rules and membership functions. The validation sets are used to estimate the generalization of the classifier. The whole training set is randomly divided into 10 disjoint sets of equal size. Then the PSO and GA methods are each run 10 times to design the PSF-classifier and GAF-classifier, respectively, each time with a different set held out for validation. The estimated predictive accuracy values are the mean values of these 10 recognition scores on the testing data sets. The population size is 40 and the maximum number of iterations is set to 10000 for both PSO and GA. The mutation and crossover rates are chosen equal to 0.01 and 0.7, respectively.

Table 1 presents the predictive accuracy results obtained by the PSF-classifier and GAF-classifier. It can be seen from Table 1 that, for the Iris data, the maximum and average predictive accuracy obtained by the PSF-classifier are better than those of the GAF-classifier on testing data. With regard to the minimum predictive accuracy, the GAF-classifier outperforms the PSF-classifier by a small margin of 0.4%. With regard to the Cancer data, the PSF-classifier has better maximum predictive accuracy than the GAF-classifier, but the average and minimum predictive accuracy values obtained by the GAF-classifier are better than those of the PSF-classifier by only 0.5% and 0.3%, respectively.

For the Dermatology data, the PSF-classifier outperforms the GAF-classifier with respect to the minimum, maximum, and average predictive accuracy values; the improvements of the PSF-classifier over the GAF-classifier are 1.3%, 2.1%, and 2.4%, respectively. The predictive results in Table 1 show that swarm intelligence techniques (PSO in this case) can successfully optimize fuzzy system parameters (a fuzzy classifier in our problem). In comparison with evolutionary algorithms, the obtained results show that the performance of the PSO algorithm in designing optimal fuzzy classifiers is comparable to, and sometimes better than, that of the genetic algorithm.

Table 1. Minimum, maximum, and average predictive accuracy values (%) for PSF-classifier and GAF-classifier

              PSF-classifier         GAF-classifier
              Min.   Max.   Ave.     Min.   Max.   Ave.
Iris          92.2   98.0   96.1     92.6   97.1   95.8
Cancer        91.0   97.5   94.1     91.3   96.8   94.6
Dermatology   90.4   95.3   93.6     89.1   93.2   91.2

2 This data set is available at: http://www.ics.uci.edu/~mlearn/MLRepository.html
3 Same site as above.
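The ten-fold cross-validation split described above can be sketched as follows; this is illustrative, not the authors' code.

```python
import random

def ten_fold_splits(samples, seed=0):
    """Shuffle, divide the whole training set into 10 disjoint equal-size folds,
    and yield (training, validation) pairs with each fold held out once."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::10] for i in range(10)]
    for k in range(10):
        validation = folds[k]
        training = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield training, validation
```

Each of the 10 runs of the optimizer would then use one `(training, validation)` pair, and the reported accuracy is the mean of the 10 validation scores.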
Table 2. Simplicity measures for obtained rules by PSF-classifier and GAF-classifier

              Number of rules                    Number of terms / Number of rules
              PSF-classifier   GAF-classifier    PSF-classifier   GAF-classifier
Iris          5.2±0.31         5.8±0.27          2.78             2.52
Cancer        6.7±0.50         6.1±0.21          2.91             3.25
Dermatology   8.1±0.25         8.3±0.37          3.82             3.96
Table 2 shows the simplicity measures obtained by the PSF-classifier and GAF-classifier. In this Table the first pair of columns shows the average number of discovered rules with standard deviations, and the second pair presents the relative number of terms per rule. On the Iris data set, the PSF-classifier uses an average of 5.2 rules whereas the GAF-classifier uses 5.8, but the rules discovered by the GAF-classifier are simpler than those of the PSF-classifier. On the Cancer data the PSF-classifier uses more rules than the GAF-classifier, but its rules are simpler. On the Dermatology data, both the number of rules and the number of terms per rule of the PSF-classifier are lower than those of the GAF-classifier. This indicates that the performance of PSO (as a swarm intelligence technique) is better than that of GA in optimizing a fuzzy classifier.
1.4. Other Related Research

The optimization of the parameters of fuzzy systems using swarm intelligence algorithms has been implemented in different applications. Tao et al. [17] proposed a fuzzy entropy method incorporating ant colony optimization (ACO). The ACO was used to obtain the optimal parameters of the fuzzy entropy method. They applied their method to the segmentation of infrared objects and illustrated that the fuzzy entropy method incorporating ACO provides improved search performance and requires significantly reduced computation in comparison to GA. Therefore, it may be suitable for real-time vision applications, such as automatic target recognition (ATR). Han and Shi [18] utilized an ACO technique for fuzzy clustering in image segmentation. Chatterjee and Siarry [19] employed the PSO algorithm to simultaneously tune the shape of the fuzzy membership functions as well as the rule consequences for an entire neuro-fuzzy rule-based classifier. Chen and Zhao [20] proposed a data-driven fuzzy clustering method based on the maximum entropy principle (MEP) and PSO. In their algorithm, the memberships of output variables are inferred by the maximum entropy principle, and the centers of the fuzzy rule base are optimized by PSO. In [21], fuzzy c-means clustering, particle swarm optimization, and recursive least-squares are combined to generate a fuzzy modeling system. Li et al. [22] adopted ACO to propose a chaotic optimization method, called CAS (chaotic ant swarm), for solving the problem of designing a fuzzy system to identify dynamical systems. The position vector of each ant in the CAS algorithm corresponds to the parameter vector of the selected fuzzy system. At each learning time step, the CAS algorithm is iterated to give the optimal parameters of the fuzzy system based on the defined fitness. Then
the corresponding CAS-designed fuzzy system is built and applied to the identification of the unknown nonlinear dynamical systems.
2. Intelligently Controlling Multi-objective Swarm Intelligence Parameters Using Fuzzy Systems

Every swarm intelligence optimization algorithm (e.g. PSO and ACO) includes essential parameters which play important roles in its search process. For example, in different versions of the PSO algorithm, the swarm size, neighborhood size, constriction coefficient, inertia weight, social coefficient, and cognitive coefficient are some of those important parameters. In fact, the values of these parameters directly affect convergence rate, exploitation, exploration, premature convergence, etc. In most of the reported research on applications of swarm intelligence algorithms, the structural parameters have been obtained by running the swarm intelligence algorithm several times with different sets of parameters to find a proper set. But such a selection cannot be the optimal setup, because the search process of swarm intelligence algorithms is nonlinear and very complicated, and it is hard (if not impossible) to model the search process mathematically so as to adjust the parameters adaptively by a schedule found by trial and error. In the case of multi-objective optimization, this complexity is greater than in single-objective optimization problems, because the optimization goal itself consists of multiple objectives. On the other hand, over the years, some understanding of the search process of multi-objective swarm intelligence techniques has accumulated, and linguistic descriptions of the search process are currently available. This knowledge makes fuzzy systems good candidates for intelligently controlling the parameters of multi-objective swarm intelligence algorithms. In this Section we investigate how fuzzy controllers may be added to a multi-objective swarm intelligence algorithm to improve its power and effectiveness by controlling the important parameters of these multi-objective optimization techniques.
For this purpose an adaptive fuzzy system is designed and integrated with a proposed integer-valued multi-objective particle swarm optimizer (MOPSO) to develop a more powerful technique named Fuzzy-MOPSO. The designed fuzzy controller adapts the values of the important structural parameters of the integer-valued MOPSO, which are the swarm size, neighborhood size, and constriction coefficient. Three main goals have been considered in designing the proposed method: a) good generalization4, b) maximizing the number of non-dominated solutions, and c) maximizing the spread of non-dominated solutions. Two performance metrics (named aggregation factor and minimal spacing) are introduced and utilized to reach the above goals. The proposed method is investigated as an effective approach on some multi-objective benchmarks and on the space allocation problem, which is a real-world combinatorial optimization problem. Experimental results show that Fuzzy-MOPSO can be successfully applied to the space allocation problem (a complex and hard multi-objective problem), producing solutions of acceptable quality in comparison to similar approaches.
4 "Good generalization" refers to the capability of the algorithm for solving a wide range of different multi-objective problems.
2.1. A Review of Past Research on Multi-objective PSO

Particle swarm optimization (PSO) is a heuristic, parallel, stochastic search algorithm inspired by the choreography of a flock of birds searching for food. The power of this algorithm as a single-objective optimization technique encouraged researchers to extend PSO to multi-objective optimization problems. Since the basic version of PSO has a single-point-centered characteristic, it was difficult to locate the non-dominated points on the Pareto front, because more than one criterion exists to direct the velocity and position of a particle. Some efforts were made to overcome this weakness. The first attempt dates back seven years, when Moore and Chapman developed an approach to solve multi-objective problems using PSO [23]. After this research, to the best of our knowledge, almost thirty proposals have been published in specialized conferences and journals, containing different methods that utilize PSO to construct effective multi-objective particle swarm optimization (MOPSO) algorithms. For example, a method was suggested to use an external archive in which every particle deposits its flight experiences after each flight cycle [24-25]. The updates to the external archive are performed considering a geographically-based system defined in terms of the objective function values of each particle. This approach also uses a mutation operator that acts both on the particles of the swarm and on the range of each design variable of the problem to be solved. Parsopoulos and Vrahatis proposed various types of aggregating functions for solving multi-objective problems by PSO [26]; in fact, in their algorithm the multi-objective problem is transformed into a single-objective one. Baumgartner, Magele, and Renhart also suggested an approach based on linear aggregating functions [27].
In this case, the swarm is equally divided into m sub-swarms, each of which uses a different set of weights and evolves in the direction of its own swarm leader. Their algorithm uses a gradient technique to obtain the Pareto front. Hu, Eberhart, and Shi developed an algorithm for MOPSO in such a way that only one objective is optimized at a time [28]. They incorporated an extended memory similar to the external archive introduced by Coello et al. The multi-objective swarm intelligence technique proposed by Ray and Liew is established on Pareto dominance and combines some evolutionary techniques with the particle swarm [29]. They also used a nearest neighbor density estimator to promote diversity. Chow and Tsui introduced a modified PSO called "Multi-Species PSO", considering each objective function as a species swarm [30]. A communication channel is established between the neighboring swarms for transmitting the information of the best particles, in order to provide guidance for improving their objective values. The abovementioned studies are representative of the several approaches reported in the last seven years. A survey of different MOPSO algorithms was made by Reyes-Sierra and Coello [31]. They classified the current MOPSOs into classes such as aggregating approaches, lexicographic ordering, sub-population approaches, Pareto-based approaches, combined approaches, and other approaches. There is also a parameter-free PSO (called TRIBES-D) which can cope with multi-objective problems5.

5 Particle Swarm Central (PSC), http://www.particleswarm.info, program section.
The balance between global and local search throughout the course of a run is critical to the success of all the proposed MOPSO techniques. It is known that the values of the internal parameters of PSO affect this balancing procedure. For example, the inertia weight influences the trade-off between global (wide-ranging) and local (nearby) exploration abilities [32]. A large inertia weight facilitates global exploration (searching new areas), while a smaller inertia weight tends to facilitate local exploration to fine-tune the current search area. The value of the inertia weight may vary during the optimization process. Shi and Eberhart asserted that by linearly decreasing the inertia weight from a relatively large value to a small one through the course of a PSO run, the PSO tends to have more global search ability at the beginning of the run and more local search ability near its end [33]. On the other hand, Zheng et al. argued that either global or local search ability is associated with a small inertia weight, and that a large inertia weight provides the algorithm more chances to stabilize [12]. In this vein, inspired by the process of the simulated annealing algorithm, those authors proposed to use an increasing inertia weight through the PSO run. In essence, this disagreement over an effective schedule for assigning the inertia weight value occurs because the search process of MOPSO is non-linear and very complicated. In fact, it is hard to model the search process mathematically in order to dynamically adjust the MOPSO parameters. On the other hand, many linguistic descriptions of the effects of the parameters of MOPSO on its search process are available from the related research. These reasons make a fuzzy system a good candidate for dynamically controlling the internal MOPSO parameters. This idea has already been utilized to improve the power of a single-objective simple PSO in designing a swarm-intelligence-based classifier [34].
The idea is extended here to develop a novel multi-objective PSO whose basis is the integer-valued PSO with constriction coefficient. In this Section a fuzzy controller is designed and integrated with a proposed multi-objective particle swarm optimizer to develop a novel technique (named Fuzzy-MOPSO) which can estimate the Pareto front more accurately.
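The linearly decreasing inertia weight schedule of Shi and Eberhart discussed above can be sketched as follows; the 0.9 and 0.4 endpoints are commonly used values I am assuming for illustration, not values stated in the text.

```python
def inertia_weight(t, t_max, w_start=0.9, w_end=0.4):
    # Linearly decreasing inertia weight: a large weight early in the run favors
    # global exploration, a small weight late in the run favors local fine-tuning.
    return w_start - (w_start - w_end) * t / t_max
```

An increasing schedule, as advocated by Zheng et al., would simply swap `w_start` and `w_end`.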
2.2. Fuzzy-MOPSO Algorithm

The Fuzzy-MOPSO algorithm includes two major parts, a MOPSO with constriction coefficient and a fuzzy controller, which are explained as follows:
2.2.1. Integer-Valued MOPSO with Constriction Coefficient
As mentioned in the previous Section, many kinds of MOPSO techniques have been proposed in the literature over the past seven years. Although the idea of controlling the internal parameters of PSO may be applied to all of these algorithms, we adopt the technique proposed by Coello and Lechuga [24] as the conventional MOPSO, because their approach was shown to be highly competitive with current evolutionary multi-objective optimization techniques. They used a simple PSO to construct their MOPSO algorithm, so no convergence guarantee was provided for the algorithm.
Swarm Intelligence and Fuzzy Systems
We use PSO with the constriction coefficient because it was shown that, with a proper parameter set-up, this version of PSO converges almost everywhere [6]. Moreover, to prepare the necessary conditions for solving the space allocation problem, the integer-valued PSO with constriction coefficient is used, as explained in subsection 1-2-1. The basic idea of a global repository is integrated with the integer-valued PSO with constriction coefficient to construct our MOPSO algorithm. Coello and Lechuga [24] introduced their MOPSO algorithm - using a PSO with inertia weight - based on a global repository in which every particle deposits its flight experiences after each flight cycle. The repository is used by the particles to identify a leader that will guide the search. The mechanism of MOPSO is based on hypercubes produced by dividing the explored search space. The algorithm of MOPSO using a PSO with constriction coefficient is the following:
i) Initialize the swarm and the velocity of each particle of the swarm:
For i = 1 to S (S is the initial Swarm_size)
Y[i] = Random_generate();
V[i] = 0;
End For;
ii) Evaluate each particle in the swarm.
iii) Store the positions of the particles that represent non-dominated vectors in the repository REP.
iv) Generate hypercubes of the search space explored so far, and locate the particles using these hypercubes as a coordinate system in which each particle's coordinates are defined according to the values of its objective functions.
v) Initialize the memory of each particle (this memory serves as a guide to travel through the search space; it is also stored in the repository):
For i = 1 to S
PBEST[i] = Y[i];
End For;
vi) WHILE the maximum number of cycles has not been reached DO
a) Initialize the internal parameters of PSO, which are h (neighborhood size), χ (constriction coefficient), and Swarm_size.
b) Compute the speed of each particle using the following expression:
V^(q+1)[i] = χ · (V^q[i] + φ1 · (PBEST[i] − Y[i]) + φ2 · (REP[h] − Y[i])),
where φ1 and φ2 are random numbers uniformly distributed in the range (0, φ/2).
Other parameters have been defined in equation (1). REP[h] is a value taken from the repository REP; the index h is selected in the following way: each hypercube containing more than one particle is assigned a fitness equal to the result of dividing a number x > 1 by the number of particles it contains. This aims to decrease the fitness of hypercubes that contain more particles, and it can be seen as a form of fitness sharing. Roulette-wheel selection is then applied using these fitness values to select the hypercube from which the corresponding particle will be taken. Once the hypercube has been selected, a particle within it is selected randomly. PBEST[i] is the best position that the particle Y[i] has had.
Seyed-Hamid Zahiri

c) Compute the new positions of the particles by adding the speed produced in the previous step, using this equation:
Y^(q+1)[i] = Round(Y^q[i] + V^(q+1)[i])
d) Maintain the particles within the search space in case they go beyond its boundaries (avoid generating solutions that do not lie in the valid search space).
e) Evaluate each particle in the swarm.
f) Update the contents of REP together with the hypercubes. This update consists of inserting all the currently non-dominated locations into the repository; any dominated locations in the repository are eliminated in the process.
g) When the current position of a particle is better than the position contained in its memory, the particle's memory is updated. The criterion for deciding which position should be retained is simply Pareto dominance (i.e. if the current position is dominated by the position in memory, then the position in memory is kept; otherwise, the current position replaces the one in memory; if neither dominates the other, one of them is selected randomly).
h) Increment the loop counter.
vii) End While
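The loop above can be sketched in Python. This is a minimal sketch, not the author's implementation: the helper names (`dominates`, `select_leader`, `update_particle`, `hypercube_of`, and the fitness numerator `x`) are hypothetical, the uniform range (0, φ/2) follows the expression in step b), and boundary handling (step d) and the repository update are omitted.

```python
import random

def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (minimization)."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def select_leader(repository, hypercube_of, x=10.0):
    """Roulette-wheel leader selection: each occupied hypercube gets
    fitness x / (number of particles it contains), then a particle is
    drawn uniformly from the chosen hypercube."""
    cubes = {}
    for p in repository:
        cubes.setdefault(hypercube_of(p), []).append(p)
    keys = list(cubes)
    fitness = [x / len(cubes[k]) for k in keys]
    r, acc = random.uniform(0, sum(fitness)), 0.0
    for k, f in zip(keys, fitness):
        acc += f
        if r <= acc:
            return random.choice(cubes[k])
    return random.choice(cubes[keys[-1]])

def update_particle(y, v, pbest, leader, chi, phi):
    """Steps b) and c): constriction-coefficient velocity update and
    integer-rounded position update."""
    new_v, new_y = [], []
    for j in range(len(y)):
        r1 = random.uniform(0.0, phi / 2.0)
        r2 = random.uniform(0.0, phi / 2.0)
        vj = chi * (v[j] + r1 * (pbest[j] - y[j]) + r2 * (leader[j] - y[j]))
        new_v.append(vj)
        new_y.append(round(y[j] + vj))
    return new_y, new_v
```

Here `hypercube_of` maps a repository member to its hypercube index; in the algorithm it would be derived from the adaptive grid of step iv).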
2.2.2. Designing Fuzzy-Controller for MOPSO
To construct an effective fuzzy controller for adapting the internal parameters of the introduced conventional MOPSO, some metrics are needed to evaluate the progress of its search process. These metrics are evaluated while MOPSO is running, and the best values of its internal parameters are estimated by the fuzzy controller. Thus the effects of these internal parameters on the search process of the algorithm should be investigated carefully in order to design effective fuzzy rules.

2.2.2.1. Metrics of Performance
In the case of multi-objective optimization, the definition of quality is substantially more complex than for single-objective optimization problems, because the optimization goal itself consists of multiple objectives:
- The distance of the resulting non-dominated set to the Pareto-optimal front should be minimized.
- A good (in most cases uniform) distribution of the solutions found is desirable. The assessment of this criterion might be based on a certain distance metric.
- The extent of the obtained non-dominated front should be maximized, i.e., for each objective, a wide range of values should be covered by the non-dominated solutions.
In the literature, several studies can be found that formalize the above definition (or parts of it) by means of quantitative metrics [35]. In this subsection two metrics, named minimal spacing and aggregation factor, are introduced and utilized to evaluate the performance of the search process of the PSO. These two metrics are explained as follows:
a) Minimal spacing
Schott suggested a measure called spacing, which reflects the uniformity of the distribution of the solutions over the non-dominated front [36]. The spacing (S) between solutions is calculated as

S = sqrt( (1/|Q|) Σ_{i∈Q} (d_i − d̄)² )    (7)

where d_i = min_{k∈Q, k≠i} Σ_{m=1}^{M} |f_m^i − f_m^k|, f_m^i (or f_m^k) is the mth objective value of the ith (or kth) solution in the non-dominated solution set Q, and d̄ is the mean value of all the d_i's.
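Schott's spacing of eq. (7) can be sketched as follows; `front` is assumed to be a list of objective vectors (the non-dominated set Q), the nearest-neighbour distance is the Manhattan distance of the definition, and normalization is omitted here.

```python
import math

def spacing(front):
    """Schott's spacing S of eq. (7): the standard deviation of the
    nearest-neighbour Manhattan distances over the non-dominated set Q
    (lower is better; 0 means perfectly even spacing)."""
    q = len(front)
    d = []
    for i, fi in enumerate(front):
        # d_i: Manhattan distance to the closest other member of Q
        d.append(min(sum(abs(a - b) for a, b in zip(fi, fk))
                     for k, fk in enumerate(front) if k != i))
    mean = sum(d) / q
    return math.sqrt(sum((di - mean) ** 2 for di in d) / q)
```

On an evenly spaced front such as [(0,3), (1,2), (2,1), (3,0)] every nearest-neighbour distance is equal, so S is 0.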
Note that a value of S nearer to 0 is desirable, since it indicates that the solutions are uniformly distributed over the Pareto-optimal front. This diversity measure can be calculated for solution sets containing more than two solutions. Figure 3(a) and (b) demonstrate a common scenario in which this measure is expected to fail. In Figure 3(a), since the nearest neighbor of point a is b and vice versa, the value of S will be low, wrongly indicating a uniform spread over the Pareto-optimal front; the measure is unable to indicate that a large gap exists between a and b. Note that under this assumption the value of S will be low in Figure 3(b) as well. Obviously the situation in Figure 3(b) is preferable, since it shows a more uniform distribution over the Pareto-optimal front, but the measure is unable to indicate this. This limitation was overcome in [37] by defining a modified measure called minimal spacing (S_m).
Figure 3. (a) and (b): examples of a set of non-dominated solutions.
The essence of this measure is to consider the distance from a solution to its nearest neighbor which has not already been considered. S_m is calculated among the solutions in the non-dominated set as follows. Initially consider all solutions as unmarked. Take a solution and call it the seed; mark the seed. Start computing the nearest distance from the last marked solution (initially the seed). Each time the minimum distance between two solutions is calculated, only those solutions which are still unmarked are considered.
Once a solution is included in the distance computation process, mark it. Continue this process until all solutions are considered, keeping track of the sum of the distances. Repeat the above process considering each solution as the seed, and find the overall minimum sum of distances. The sequence of solutions and the corresponding distance values are used for the computation of S_m, where again (7) is used with |Q| replaced by |Q| − 1 (since here we have |Q| − 1 distances). As for S, a lower value of S_m indicates better performance of the corresponding multi-objective technique. Note that with this definition, the value of S_m in Figure 3(b) will be lower than that in Figure 3(a), indicating that the former solution set is better. In this regard, it may be mentioned that the ranges of values of the different objectives often vary widely; consequently, they are generally normalized while computing the distances. In other words, when computing the d_i's, the term |f_m^i − f_m^k| is divided by F_m^max − F_m^min in order to normalize it, where F_m^max and F_m^min are respectively the maximum and minimum values of the mth objective. Note that we do not need to know F_m^max and F_m^min in advance; they are the values obtained by the multi-objective optimization algorithm.
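The minimal-spacing procedure described above can be sketched as follows. This is an illustrative reading of [37], not the reference implementation: `f_min`/`f_max` are assumed to hold the per-objective extremes obtained by the optimizer, distances use the normalized Manhattan form of eq. (7), and the seed whose chain gives the minimum total distance is the one kept.

```python
import math

def minimal_spacing(front, f_min, f_max):
    """Minimal spacing S_m: chain each solution to its nearest
    still-unmarked neighbour (objectives normalized by F_max - F_min),
    keep the seed giving the minimum total chain distance, then apply
    eq. (7) to the resulting |Q|-1 distances."""
    def dist(a, b):
        return sum(abs(a[m] - b[m]) / (f_max[m] - f_min[m])
                   for m in range(len(a)))
    n = len(front)
    best = None
    for seed in range(n):
        marked = {seed}
        last, chain = seed, []
        while len(marked) < n:
            # nearest unmarked neighbour of the last marked solution
            nxt = min((i for i in range(n) if i not in marked),
                      key=lambda i: dist(front[last], front[i]))
            chain.append(dist(front[last], front[nxt]))
            marked.add(nxt)
            last = nxt
        if best is None or sum(chain) < sum(best):
            best = chain
    mean = sum(best) / len(best)
    return math.sqrt(sum((d - mean) ** 2 for d in best) / len(best))
```

For an evenly spaced front the chain from an endpoint visits the points in order with identical distances, so S_m is 0, as desired.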
b) Aggregation factor
The minimal spacing (S_m) is a suitable measure for evaluating the distribution quality of the solutions in objective space, but it cannot guarantee a uniform distribution and good coverage of the solutions with regard to all objective functions. Figure 4 shows this defect for a two-objective optimization problem. With regard to minimal spacing alone, both Figure 4(a) and (b) have good (low) values, but it can easily be seen that the solutions aggregate toward objective f_2 in the case shown in Figure 4(a).
Figure 4. (a) S_m is desired but the aggregation of the solutions is undesired; (b) both S_m and the aggregation of the solutions are desired.
Thus S_m alone is not enough for estimating the distribution quality of the solutions. To overcome this weak point, a novel performance metric called the aggregation factor is introduced. Consider a multi-objective problem defined as Maximize F(X) = (f_1(X), f_2(X), ..., f_n(X)), where Y_i = f_i(X), (i = 1, 2, ..., n) is the ith objective value on the estimated Pareto front at the present instant, and X = (x_1, x_2, ..., x_m) is the variable vector of the objective functions.
Figure 5. Example of choosing the M_l's for a two-objective optimization problem.
Suppose that M_l, (l = 1, 2, ..., n), indicates the region of the objective space in which it is desired that the distribution of the obtained Pareto front occurs. Note that the M_l's do not necessarily belong to the global Pareto front; they are only meaningful assumptions (guesses) which are used as references for controlling the distribution of the solutions obtained by the Fuzzy-MOPSO (Figure 5). For example, consider the problem of designing a classifier using a multi-objective algorithm in such a way that the two objectives of Vagueness (Vag) and Error rate (Er) should be minimized. The region of the trade-off front may be limited by M_1: (Vag = 100%, Er = 0%) and M_2: (Vag = 0%, Er = 100%).
Regarding the aforementioned definitions, the aggregation factor (A) is defined as below:

a_l = ( Σ_{i=1}^{m} dist(Y_i, M_l) ) / ( Σ_{j=1}^{n} Σ_{i=1}^{m} dist(Y_i, M_j) ),    A = Π_{l=1}^{n} a_l    (8)
where dist(·) denotes the Euclidean distance. Since Σ_{l=1}^{n} a_l = 1, the maximum value of A occurs when the a_l's are all equal to 1/n, and then A_max = (1/n)^n.
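Equation (8) can be sketched as below, under the assumption that `front` is the list of points Y_i currently on the estimated front and `refs` is the list of reference points M_l; the function names are hypothetical.

```python
import math

def aggregation_factor(front, refs):
    """Aggregation factor A of eq. (8): a_l is the share of the total
    Euclidean distance from the front to reference point M_l, and
    A is the product of the a_l.  Since the a_l sum to 1, A is
    maximized (at (1/n)^n) when the front sits evenly with respect to
    all n reference points."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    col = [sum(dist(y, m) for y in front) for m in refs]
    total = sum(col)
    a = [c / total for c in col]
    prod = 1.0
    for al in a:
        prod *= al
    return prod
```

For two objectives (n = 2) the best attainable value is (1/2)² = 0.25, which matches the value quoted later for the space allocation experiments.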
Figure 6. The scaled membership functions of the inputs (UN, S_m, and A).
2.2.2.2. Fuzzy Parameters
The fuzzy controller in the Fuzzy-MOPSO is built from three major parts: fuzzy inputs, fuzzy outputs, and fuzzy rules.
a) Inputs of fuzzy controller
The inputs of the fuzzy controller are introduced below:
- UN: the number of iterations over which the non-dominated points are unchanged and their fitness values are not considerable (footnote 6).
- S_m: minimal spacing, as introduced in the previous subsection.
- A: aggregation factor, as introduced in the previous subsection.
The normalized membership functions of the inputs are shown in Figure 6.
b) Outputs of fuzzy controller
The outputs of the fuzzy controller are the optimum values of the structural parameters of the Fuzzy-MOPSO:

Footnote 6: When we want to solve a problem with a swarm intelligence or evolutionary algorithm, we generally keep in mind one or more fitness values that we hope the algorithm will reach. For example, in decreasing the overshoot of a control system we want to remove the overshoot entirely (the ideal fitness value is zero); thus fitness values corresponding to 20% or more overshoot are not considerable.
- Swarm_size: the size of the swarm.
- h: the size of the neighborhood around each particle.
- Cf: the controlling factor of the constriction coefficient χ (φ in (3)).
Scaled membership functions of the outputs are shown in Figure 7. It is not claimed that the shapes of the input and output membership functions are optimal, but with these shapes the best experimental results were obtained.
c) Fuzzy rules
To extract effective fuzzy rules for the proposed Fuzzy-MOPSO, a linguistic description of the effects of the structural parameters of MOPSO on its search process is first presented, and then the fuzzy rules are defined.
Figure 7. Normalized outputs membership functions.
Linguistic description of the effect of the structural parameters of MOPSO on its search process

Swarm Size
Swarm size has a significant effect on the search process of MOPSO. A large swarm size reduces the convergence rate considerably and slows down the algorithm, but it causes a wide range of the solution space to be searched, which helps to obtain desired values of the aggregation factor (A) and minimal spacing (S_m). A small swarm size, on the other hand, can cause capture by a local minimum and reduces the performance of MOPSO by locating the obtained front far away from the global Pareto front. The fuzzy rules should be designed in such a manner that when the algorithm is captured in a non-important local region of the search space, the swarm size is increased to escape from this local solution, and when a better solution is received, the swarm size is decreased to improve the convergence rate and search performance.
Low values of A and S_m indicate that there are some major regions in the search space
that the Fuzzy-MOPSO has not explored yet; thus the swarm size should be increased.

Neighborhood Size
In our conventional MOPSO, particles tend to be influenced by their own success along their past history and also by the success of the particles in their neighborhood, i.e. those with which they interact. These 'schemes of interaction' between particles are termed sociometric principles. Particles can interact with each other in a number of ways. The simplest is for a particle to interact with its two nearest neighbors, but any number of nearest neighbors can be used. If the number of nearest neighbors is less than the total number of particles in the swarm, this sociometric principle is called "lbest"; otherwise it is called "gbest". Conceptually, "gbest" connects all the particles together, which means that their social interaction is maximal; in contrast, "lbest" results in a local neighborhood for each particle. Early experience led to neighborhood sizes of about 10% to 40% of the swarm size being used for some applications [38]. Since information is exchanged between neighboring particles in the topology, selecting a smaller neighborhood causes the best position to be found more slowly, but the search performance with regard to minimal spacing (S_m) is improved. Increasing the neighborhood size increases the convergence rate, but the search might be trapped in a local optimum, locating the estimated Pareto front far away from the global one.

Constriction Coefficient
The constriction coefficient (χ) is the most important structural parameter in MOPSO. It is controlled by the parameter φ in (3). An exploration was presented in [6] which indicates how the value of χ affects the search process of the PSO algorithm. Specifically, it has been proved that the application of the constriction coefficient allows control over the dynamical characteristics of the particle swarm, including its exploration versus exploitation propensities.
In fact, the constriction coefficient prevents a build-up of velocity due to particle inertia. Without the constriction coefficient, particles with built-up velocities might explore the search space but lose the ability to fine-tune a result; on the other hand, constricting the particle speed too much might damage the exploration of the search space. Thus the value of the constriction coefficient affects the global versus local abilities of the proposed conventional MOPSO. It can be concluded from [6] that φ determines how strongly particles are attracted by the best positions found previously by themselves and by their neighborhood. This means that the convergence characteristics of the conventional MOPSO can be controlled by φ. As the points on the estimated front tend to the global Pareto front, the part of the search space which a particle explores should become smaller; this means that φ should be increased to decrease χ. The result is a decrease in the inertia of the particle, emphasizing local search instead of global. A smaller improvement in particle fitness calls for a bigger search space for exploration; this means that a decrease in the value of φ is necessary to increase the inertia of the particle.
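Equation (3) is not reproduced in this excerpt; assuming it is the standard Clerc-Kennedy constriction formula, the χ-φ relationship discussed above can be sketched as:

```python
import math

def constriction(phi):
    """Standard Clerc-Kennedy constriction coefficient (assumed form
    of eq. (3)); valid for phi > 4.  A larger phi gives a smaller chi,
    i.e. less particle inertia and more local, fine-tuning behaviour."""
    assert phi > 4
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
```

With the commonly used phi = 4.1 this yields chi of roughly 0.73, and chi decreases monotonically as phi grows, consistent with the description above.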
This results in more dispersion of the swarm, emphasizing global search instead of local, in order to reach good values for the performance metrics A and S_m. Based on the above descriptions, seven fuzzy rules are defined to control the internal parameters of the Fuzzy-MOPSO:
a) IF UN is high and A is low and S_m is high, THEN h is high, Cf is low, and Swarm_size is high.
b) IF S_m is high and UN is medium, THEN h is high and Cf is medium.
c) IF A is low and UN is medium, THEN h is low, Swarm_size is high, and Cf is medium.
d) IF A is high and S_m is medium, THEN Swarm_size is medium and Cf is high.
e) IF A is low and S_m is low, THEN Swarm_size is high, h is medium, and Cf is medium.
f) IF A is high and S_m is high, THEN Swarm_size is high, h is low, and Cf is low.
g) IF A is high and S_m is low and UN is low, THEN Swarm_size is low, h is medium, and Cf is high.
Rule (a) is defined to escape from local, non-important fronts. Rules (b) to (f) are designed to maximize the spread of the solutions found, keeping it as smooth and uniform as possible, and to maximize the number of elements of the trade-off set found; these aims are pursued by monitoring the metrics of minimal spacing (S_m) and the introduced aggregation factor (A). Rule (g) addresses the states in which the Fuzzy-MOPSO has already estimated the Pareto front; in these situations the swarm should not be steered toward new regions, and the computational cost of the algorithm should be decreased to achieve a higher convergence rate. Note that it is not claimed that the extracted fuzzy rules and the shapes of the membership functions are optimal; they are extracted from a linguistic description, and it was found experimentally that these fuzzy rules and membership functions give better performance measures for the obtained Pareto front.
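The seven rules can be sketched as a small Mamdani-style controller. The triangular partitions and the weighted-average defuzzification below are assumptions for illustration only; the chapter's actual membership shapes (Figures 6 and 7) were tuned experimentally and are not reproduced here.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Assumed triangular partitions of [0, 1] and crisp output levels.
MF = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0),
      "high": (0.5, 1.0, 1.5)}
LEVEL = {"low": 0.0, "medium": 0.5, "high": 1.0}

# Rules a)-g): antecedents over (UN, A, Sm), consequents per output.
RULES = [
    ({"UN": "high", "A": "low", "Sm": "high"},
     {"h": "high", "Cf": "low", "Swarm_size": "high"}),
    ({"Sm": "high", "UN": "medium"}, {"h": "high", "Cf": "medium"}),
    ({"A": "low", "UN": "medium"},
     {"h": "low", "Swarm_size": "high", "Cf": "medium"}),
    ({"A": "high", "Sm": "medium"},
     {"Swarm_size": "medium", "Cf": "high"}),
    ({"A": "low", "Sm": "low"},
     {"Swarm_size": "high", "h": "medium", "Cf": "medium"}),
    ({"A": "high", "Sm": "high"},
     {"Swarm_size": "high", "h": "low", "Cf": "low"}),
    ({"A": "high", "Sm": "low", "UN": "low"},
     {"Swarm_size": "low", "h": "medium", "Cf": "high"}),
]

def fuzzy_control(inputs):
    """Min-AND rule firing with weighted-average defuzzification;
    returns normalized values in [0, 1] for Swarm_size, h and Cf."""
    num = {"Swarm_size": 0.0, "h": 0.0, "Cf": 0.0}
    den = dict(num)
    for antecedent, consequent in RULES:
        w = min(tri(inputs[k], *MF[t]) for k, t in antecedent.items())
        for out, term in consequent.items():
            num[out] += w * LEVEL[term]
            den[out] += w
    return {k: (num[k] / den[k] if den[k] else 0.5) for k in num}
```

For example, a stalled, poorly spread search (UN high, A low, S_m high) fires rule (a) alone and pushes the swarm size and neighborhood up while lowering Cf.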
2.3. Space Allocation (Problem Description and Formulation) Space allocation refers to the desired distribution of office space in academic institutions. This is a difficult real-world combinatorial optimization problem that is related to the class of knapsack problems [39]. A particular version of space allocation which is considered here is related to the distribution of some entities, in an academic institution, in such a way that the misuse of room space and the violation of soft constraints are minimized. The entities are staff, lecture rooms, computer rooms, research rooms, etc. Soft constraints are restrictions that limit the ways in which the entities can be allocated to the rooms (e.g. entities that should not share a room, entities that should be allocated together, etc.) and that are penalized if violated.
Also some constraints exist in the problem, which are considered similar to [39] for more validity of the comparative results. Those are:
i) Not sharing - two entities cannot share a room. A penalty of 50 is applied if a constraint of this type is violated.
ii) Be located in - a given entity should be allocated to a given room (e.g. a computer lab). A penalty of 20 is applied if a constraint of this type is violated.
iii) Be adjacent to - two given entities should be allocated in adjacent rooms (e.g. a Ph.D. student and his supervisor). A penalty of 10 is applied if a constraint of this type is violated.
iv) Be away from - two given entities should be allocated away from each other (e.g. a lecture room and the photocopier room). A penalty of 10 is applied if a constraint of this type is violated.
v) Be together with - two given entities should be allocated in the same room (e.g. two Ph.D. students working on the same project). A penalty of 10 is applied if a constraint of this type is violated.
vi) Be grouped with - a given entity should be allocated in a room that is 'close' to a given set of entities (e.g. the members of a research group). A penalty of 5 is applied if a constraint of this type is violated.
Depending on the desired solutions of the problem, any of the above types of constraints can be treated as hard or soft. When a particular constraint is hard, it must be satisfied for the solution to be considered feasible. Mathematically, the space allocation problem can be represented as the allocation of a set of n entities into a set of m available rooms [39]. Each entity j (j = 1, 2, ..., n) has a space requirement w(j). Similarly, each room i (i = 1, 2, ..., m) has a capacity c(i). Each entity must be allocated to exactly one room, and each room can contain zero or more entities. The aggregated space requirement of all the entities allocated to a room i is denoted Q(i). For a given room i, there is space wastage if c(i) > Q(i) and space overuse if c(i) < Q(i). There is a penalty of 1 for each unit of space wasted and a penalty of 2 for each unit of space overused; that is, it is less desirable to overuse space than to waste it. The sum of the penalties due to space wastage and space overuse over all m rooms is called the space misuse and is denoted by F1. The sum of all penalties due to violation of soft constraints is denoted by F2. The problem is tackled as a two-objective optimization problem, where F1 and F2 are minimization objectives. More details about space allocation problems can be found in [40]. Three instances of the space allocation problem are chosen to provide meaningful comparative results with similar research: nott1, nott1b and trent1. The nott1 and nott1b test instances were prepared using real data corresponding to the distribution of office space in the School of Computer Science and Information Technology at the University of Nottingham during the 1999-2000 academic year.
In the nott1 instance there are 131 rooms, 158 entities to be allocated, and 263 constraints (111 hard and 152 soft). The nott1b instance has 115 rooms, 142 entities, and 260 constraints (110 hard and 150 soft). The trent1 instance was prepared using real data corresponding to the distribution of office space in the Chaucer building at Nottingham Trent University during the 2000-2001 academic year. In the trent1 instance there are 73 rooms, 151 entities, and 211 constraints (80 hard and 131 soft) [39].
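The two objectives F1 and F2 described in this subsection can be sketched as follows. The data-structure choices (dictionaries keyed by room and entity, soft constraints as (penalty, predicate) pairs) are hypothetical; only the penalty scheme comes from the text.

```python
def space_misuse(alloc, capacity, space_req):
    """F1: for each room, a penalty of 1 per unit of wasted space
    (capacity exceeds usage) and 2 per unit of overused space."""
    used = {room: 0.0 for room in capacity}
    for entity, room in alloc.items():
        used[room] += space_req[entity]
    f1 = 0.0
    for room, cap in capacity.items():
        diff = cap - used[room]
        f1 += diff if diff > 0 else 2 * (-diff)
    return f1

def constraint_penalty(alloc, soft_constraints):
    """F2: sum of the penalties of violated soft constraints; each
    constraint is a (penalty, predicate) pair where the predicate
    returns True when the allocation satisfies it."""
    return sum(p for p, ok in soft_constraints if not ok(alloc))
```

For example, a "be together with" constraint of type v) would be encoded as `(10, lambda al: al["student1"] == al["student2"])`.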
2.4. Implementation and Experimental Results
In this Section the proposed Fuzzy-MOPSO is implemented and tested on two well-known benchmarks and on three instances of the space allocation problem as a complex combinatorial multi-objective problem.
2.4.1. Application on Well-Known Benchmarks
Two test functions were taken from the specialized literature to compare the proposed Fuzzy-MOPSO with the MOPSO algorithm introduced by Coello and Lechuga [24] without any fuzzy controller. Note that for these experiments the basic constriction-coefficient PSO (not the integer-valued one) was used in the Fuzzy-MOPSO, because floating-point solutions are needed. The benchmark problems are:
1) f̂_1(x) = (f_1(x), f_2(x)) (convex, uniform Pareto front), where
f_1(x) = (1/n) Σ_{i=1}^{n} x_i²,   f_2(x) = (1/n) Σ_{i=1}^{n} (x_i − 2)²
2) f̂_2(x) = (f_1(x), f_2(x)) (convex, non-uniform Pareto front), where
f_1(x) = x_1,   f_2(x) = g(x) (1 − sqrt(f_1(x)/g(x))),   g(x) = 1 + (9/(n−1)) Σ_{i=2}^{n} x_i
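The two benchmark functions can be sketched directly; the square-root form assumed for f̂_2 is the standard ZDT1-style formulation.

```python
import math

def f_hat1(x):
    """Benchmark 1: convex, uniform Pareto front."""
    n = len(x)
    f1 = sum(xi ** 2 for xi in x) / n
    f2 = sum((xi - 2.0) ** 2 for xi in x) / n
    return f1, f2

def f_hat2(x):
    """Benchmark 2 (ZDT1-style): convex, non-uniform Pareto front."""
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    return f1, g * (1.0 - math.sqrt(f1 / g))
```

On the Pareto-optimal front of f̂_2 the tail variables are zero (g = 1), giving f_2 = 1 − sqrt(f_1).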
The number of iterations is set to 150 for each problem, with x ∈ [0, 1]². The MOPSO parameters were fixed as in [24]. The results are shown in Figures 8 and 9. Both MOPSO and the proposed Fuzzy-MOPSO succeed in capturing the shape of the Pareto front on each test function, but as can be seen from Figures 8 and 9 the performance of the proposed Fuzzy-MOPSO is better than that of MOPSO, because the solutions on the Pareto front obtained by MOPSO are dominated (almost everywhere) by the Pareto front calculated by Fuzzy-MOPSO.
Figure 8. Obtained Pareto front for f̂_1(x) by MOPSO and Fuzzy-MOPSO.
Figure 9. Obtained Pareto front for f̂_2(x) by MOPSO and Fuzzy-MOPSO.
2.4.2. Application of Fuzzy-MOPSO on Space Allocation
In this subsection the application of the proposed Fuzzy-MOPSO to the space allocation problem is presented, together with comparative results against similar methods for evaluating the performance of the algorithm. These are a hyper-heuristic approach named TSRoulWheel [39] and a population-based annealing algorithm named PBAA [41].
The problem description and formulation are given in subsection 2-3. First the particle representation is explained, and then the results of the implementation of the proposed Fuzzy-MOPSO for the space allocation problem are described.
a) Particle Representation
Suppose that the allocation of n entities into a set of m available rooms is desired. A particle is a candidate solution (i.e. an allocation) and is represented by a vector Y = (y_1, y_2, ..., y_n) of length n, where each element y_j ∈ {1, 2, ..., m} for j = 1, 2, ..., n indicates the room to which entity j is allocated. Thus, all particles in the swarm are integer-valued. This is precisely why the integer-valued MOPSO was introduced in subsection 2-2-1 and why the Fuzzy-MOPSO was designed on the basis of the modified MOPSO.
b) Experimental and Comparative Results
To demonstrate the effectiveness of the designed Fuzzy-MOPSO, empirical results are presented in this subsection, and the obtained results are compared to the TSRoulWheel and population-based annealing (PBAA) methods. TSRoulWheel is the hyper-heuristic algorithm proposed by Burke et al. [39] which has the best performance among their methods. In this method the choice of individuals is based on roulette-wheel selection, and a given objective u is chosen with a probability proportional to the distance from the value of u in the current solution to the value of u in an ideally optimal solution (i.e. a solution in which the value of each objective is optimal - such a solution may not exist). PBAA is a population-based annealing algorithm tailored for the space allocation problem [41]. This approach is a hybrid algorithm that evolves a population of solutions using a local search heuristic H_LS and a mutation operator. The local search heuristic manages the simple neighborhood search heuristics described above for the space allocation problem; however, H_LS incorporates knowledge of the problem domain to decide which neighborhood search heuristic to apply according to the status of the current solution. The mutation operator disturbs a solution in a controlled way by removing from their assigned rooms those entities that have the highest penalties. All algorithms were executed in MATLAB version 7.0.0.19920 (R14). For Fuzzy-MOPSO an initial swarm size of 100 was selected, and the method was executed for a predefined maximum of 100000 iterations. The non-dominated fronts obtained by the Fuzzy-MOPSO are compared with those produced by the TSRoulWheel and PBAA methods in Figure 10 (a) to (c) for the nott1, nott1b, and trent1 problems, respectively.
With regard to the non-domination characteristics, the number of solutions, and the smoothness of the fronts, it can be seen that the Fuzzy-MOPSO produces better fronts for the nott1 and nott1b problems in comparison to TSRoulWheel and PBAA (Figure 10 (a) and (b)). It can be visually verified that, except for a few points in Figure 10(a) and (b), the trade-off surface obtained by the proposed method dominates the estimated solutions obtained by the other approaches. In the trent1 problem (Figure 10 (c)), as for nott1 and nott1b, the front obtained by the Fuzzy-MOPSO clearly outperforms TSRoulWheel. The proposed algorithm also provides better
solutions in the lower part of the trade-off surface in comparison to the PBAA, while PBAA does better in the upper part of the front.
Figure 10. Non-dominated fronts obtained by TSRoulWheel, PBAA, and Fuzzy-MOPSO for the problems: (a) nott1, (b) nott1b, and (c) trent1.
Swarm Intelligence and Fuzzy Systems
It should be mentioned that the important knowledge of the problem domain incorporated into the PBAA approach helps it to obtain high-quality sets of non-dominated solutions. Here, it can be seen that the multi-objective swarm intelligence based technique appears to be competitive for the trent1 problem, and better for the nott1 and nott1b problems. It can be visually verified from Figure 10 (a) to (c) that the fronts produced by both TSRoulWheel and PBAA in the nott1, nott1b, and trent1 instances suffer from large gaps (empty spaces) among some points of the solutions. These considerable gaps are not seen in the trade-off surfaces produced by the Fuzzy-MOPSO, because the designed fuzzy controller attempts to minimize the minimal spacing (S_m) and the aggregation factor (A). Thus, Fuzzy-MOPSO achieves much better coverage of the trade-off front in the three problem instances, while the other two algorithms (TSRoulWheel and PBAA) produce non-dominated solutions that are clustered in some regions of the trade-off front. The two performance measures (i.e., S_m and A) of the three algorithms are compared for the three problems in Table 1. These performance metrics were defined in subsection 2-2-2-1. Since two-objective optimization is considered here, the maximum value of A is equal to 0.25 for all three problems. The pairs {M1, M2} needed for calculating the aggregation factor were taken as {(130,1400), (270,600)}, {(30,1200), (200,450)}, and {(10,4500), (370,3400)} for nott1, nott1b, and trent1, respectively. The results of Table 1 confirm the claim that the Fuzzy-MOPSO produces the best coverage and uniformity of the trade-off fronts, because the best values of S_m and A are obtained by the proposed algorithm for nott1, nott1b, and trent1. The results of Table 1 also demonstrate the effectiveness of the fuzzy controller in the Fuzzy-MOPSO in leading the adaptive swarm into the more effective and useful regions of the solution space.

Table 1. Performance measures of three algorithms for three problems.

                  nott1             nott1b            trent1
                  S_m      A        S_m      A        S_m      A
TSRoulWheel       0.584    0.055    0.423    0.085    0.162    0.032
PBAA              0.425    0.066    0.358    0.072    0.236    0.121
Fuzzy-MOPSO       0.185    0.178    0.227    0.182    0.154    0.194
3. Conclusion

In this chapter, we described how swarm intelligence algorithms can be employed in fuzzy systems and vice versa. Two issues were investigated: optimizing fuzzy parameters with the aid of a swarm intelligence algorithm, and improving the power and effectiveness of swarm intelligence algorithms with the aid of an added fuzzy controller. Both were illustrated on different single-objective and multi-objective practical problems.
Seyed-Hamid Zahiri
In Section 1 a fuzzy classifier was designed using integer-valued PSO with a constriction coefficient, and its performance was evaluated on three benchmarks in a pattern recognition task. In Section 2 a Fuzzy-MOPSO algorithm was proposed for obtaining the Pareto front. To evaluate the estimated Pareto front, new indices were introduced. Three fuzzy inputs, the aggregation factor (A), the minimal spacing (S_m), and the number of iterations over which the non-dominated points are unchanged (UN), were introduced to track the progress of the algorithm's search process and the quality of the produced trade-off front. By evaluating these indices, seven effective fuzzy rules were extracted, based on linguistic descriptions of the effects of the internal parameters on the search process of a modified MOPSO, to control the important parameters of MOPSO and improve the quality of the obtained Pareto front. The performance of Fuzzy-MOPSO was evaluated by tackling multi-objective combinatorial benchmarks and different instances of the space allocation problem. Overall, the experimental results show that swarm intelligence algorithms can effectively be used in designing fuzzy systems, and that fuzzy controllers can improve the search process of swarm intelligence techniques. Thus, combining fuzzy logic concepts and swarm intelligence techniques leads to more effective tools for solving complex multi-objective problems.
References
[1] Shi, Y; Eberhart, R; Chen, Y. "Implementation of evolutionary fuzzy systems," IEEE Trans. on Fuzzy Systems, 1999, vol. 7, no. 2, 109-119.
[2] Ishibuchi, H; Nakashima, T; Murata, T. "Performance evaluation of fuzzy classifier systems for multidimensional pattern classification problems," IEEE Trans. on Systems, Man and Cybernetics - Part B: Cybernetics, 1999, vol. 29, no. 5, 601-618.
[3] Sugeno, M. "An introductory survey of fuzzy control," Inf. Sci., 1985, vol. 36, nos. 1/2, 59-83.
[4] Jang, JSR; Sun, CT; Mizutani, E. Neuro-Fuzzy and Soft Computing, Prentice Hall, 1997.
[5] Kennedy, J; Eberhart, RC. "Particle swarm optimization," In Proc. IEEE International Conference on Neural Networks IV, 1995, 1942-1948.
[6] Clerc, M; Kennedy, J. "The particle swarm - explosion, stability, and convergence in a multidimensional complex space," IEEE Trans. on Evolutionary Computation, 2002, vol. 6, no. 1, 58-73, February.
[7] Shi, Y; Eberhart, RC. "A modified particle swarm optimizer," In Proc. of the IEEE Intl. Conf. on Evolutionary Computation, 1998, 69-73.
[8] Clerc, M. "The swarm and the queen: towards a deterministic and adaptive particle swarm optimization," In Proc. of the 1999 Congress on Evolutionary Computation, 1999, 1951-1957.
[9] Al-kazemi, B; Mohan, CK. "Multi-phase generalization of the particle swarm optimization algorithm," In Proc. of the 2002 Congress on Evolutionary Computation, 2002, vol. 1, 489-494, May.
[10] Higashi, N; Iba, H. "Particle swarm optimization with Gaussian mutation," In Proc. of
the 2003 IEEE Swarm Intelligence Symposium, SIS '03, 2003, 72-79, April.
[11] Yang, S; Wang, M; Jiao, L. "A quantum particle swarm optimization," In Proc. of the 2004 Congress on Evolutionary Computation, 2004, vol. 1, 320-324, June.
[12] Zheng, Y; Ma, L; Zhang, L; Qian, J. "On the convergence analysis and parameter selection in particle swarm optimization," In Proc. of the Second Intl. Conf. on Machine Learning and Cybernetics, 2003, 1802-1807, November.
[13] Secrest, BR; Lamont, GB. "Visualizing particle swarm optimization - Gaussian particle swarm optimization," In Proc. of the 2003 IEEE Swarm Intelligence Symposium, SIS '03, 2003, 198-204, April.
[14] Van den Bergh, F. An Analysis of Particle Swarm Optimizers, Ph.D. thesis, Department of Computer Science, University of Pretoria, South Africa, 2002.
[15] Eberhart, RC; Shi, Y. "Comparing inertia weights and constriction factors in particle swarm optimization," In Proc. of the 2000 Congress on Evolutionary Computation, 2000, 84-88, July.
[16] Parpinelli, RS; Lopes, HS; Freitas, AA. "Data mining with an ant colony optimization algorithm," IEEE Trans. on Evolutionary Computation, 2002, vol. 6, no. 4, 321-332.
[17] Tao, W; Jin, H; Liu, L. "Object segmentation using ant colony optimization algorithm and fuzzy entropy," Pattern Recognition Letters, 2007, vol. 28, 788-796.
[18] Han, Y; Shi, P. "An improved ant colony algorithm for fuzzy clustering in image segmentation," Neurocomputing, 2007, vol. 70, 665-671.
[19] Chatterjee, A; Siarry, P. "A PSO-aided neuro-fuzzy classifier employing linguistic hedge concepts," Expert Systems with Applications, 2007, vol. 33, 1097-1109.
[20] Chen, D; Zhao, C. "Data-driven fuzzy clustering based on maximum entropy principle and PSO," Expert Systems with Applications, doi:10.1016/j.eswa.2007.09.066, 2007.
[21] Feng, HM. "Hybrid stages particle swarm optimization learning fuzzy modeling systems design," Tamkang Journal of Science and Engineering, 2006, vol. 9, no. 2, 67176.
[22] Li, L; Yang, Y; Peng, H. "Fuzzy system identification via chaotic ant swarm," Chaos, Solitons & Fractals, doi:10.1016/j.chaos.2008.01.011, 2008.
[23] Moore, J; Chapman, R. "Application of particle swarm to multiobjective optimization," Department of Computer Science and Software Engineering, Auburn University, 1999.
[24] Coello, CAC; Lechuga, MS. "MOPSO: a proposal for multiple objective particle swarm optimization," In Congress on Evolutionary Computation (CEC'2002), 2002, vol. 2, 1051-1056.
[25] Coello, CAC; Pulido, GT; Lechuga, MS. "Handling multiple objectives with particle swarm optimization," IEEE Trans. on Evolutionary Computation, 2004, vol. 8, no. 3, 256-279.
[26] Parsopoulos, KE; Vrahatis, MN. "Particle swarm optimization method in multiobjective problems," In Proc. of the 2002 ACM Symposium on Applied Computing (SAC'2002), Madrid, Spain, ACM Press, 2002, 603-607.
[27] Baumgartner, U; Magele, Ch; Renhart, W. "Pareto optimality and particle swarm optimization," IEEE Trans. on Magnetics, 2004, vol. 40, no. 2, 1172-1175.
[28] Hu, X; Eberhart, RC; Shi, Y. "Particle swarm with extended memory for multiobjective optimization," In Proc. of the 2003 IEEE Swarm Intelligence Symposium, 2003, 193-197.
[29] Ray, T; Liew, KM. "A swarm metaphor for multiobjective design optimization,"
Engineering Optimization, 2002, vol. 34, no. 2, 141-153.
[30] Chow, Ch; Tsui, H. "Autonomous agent response learning by a multi-species particle swarm optimization," In Congress on Evolutionary Computation (CEC'2004), 2004, vol. 1, 778-785.
[31] Reyes-Sierra, M; Coello, CAC. "Multi-objective particle swarm optimizers: a survey of the state-of-the-art," Intl. J. of Comput. Intell. Res., 2006, 3, 287-308.
[32] Shi, Y; Eberhart, RC. "Parameter selection in particle swarm optimization," In Evolutionary Programming VII: Proc. of the Seventh Annual Conference on Evolutionary Programming, 1998, 591-600.
[33] Shi, Y; Eberhart, RC. "Empirical study of particle swarm optimization," In Congress on Evolutionary Computation (CEC'1999), 1999, 1945-1950.
[34] Zahiri, SH; Seyedin, SA. "Swarm intelligence based classifiers," J. of the Franklin Inst., 2007, 344, 362-376.
[35] Zitzler, E. "Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications," PhD thesis, Swiss Federal Inst. Technol. (ETH), Zurich, Switzerland, 1999.
[36] Schott, JR. "Fault Tolerant Design Using Single and Multi-Criteria Genetic Algorithms," Ph.D. dissertation, Dept. of Aeronaut. and Astronaut., Massachusetts Inst. Technol., Boston, MA, 1995.
[37] Bandyopadhyay, S; Pal, SK; Aruna, B. "Multiobjective GAs, quantitative indices, and pattern classification," IEEE Trans. on Systems, Man, and Cybernetics - Part B: Cybernetics, 2004, vol. 34, no. 5, 2088-2099.
[38] Miranda, V; Keko, H; Duque, AJ. "Stochastic star communication topology in evolutionary particle swarms (EPSO)," International Journal of Computational Intelligence Research (to appear), 2008.
[39] Burke, EK; Landa Silva, JD; Soubeiga, E. "Multiobjective hyper-heuristic approaches for space allocation and timetabling," to appear in: T. Ibaraki, K. Nonobe, M. Yagiura (eds.), Meta-heuristics: Progress as Real Problem Solvers, Springer.
[40] Landa-Silva, JD. "Metaheuristic and Multiobjective Approaches for Space Allocation," PhD thesis, School of Computer Science and Information Technology, University of Nottingham, 2003.
[41] Burke, EK; Landa Silva, JD. "The influence of the fitness evaluation method on the performance of multiobjective optimisers," European Journal of Operational Research, 2003.
In: Applications of Swarm Intelligence Editor: Louis P. Walters, pp. 33-55
ISBN: 978-1-61728-602-5 © 2011 Nova Science Publishers, Inc.
Chapter 2
EVOLUTIONARY STRATEGIES TO FIND PARETO FRONTS IN MULTIOBJECTIVE PROBLEMS

Voratas Kachitvichyanukul and Nguyen Phan Bach Su
Industrial and Manufacturing Engineering, Asian Institute of Technology, Klongluang, Pathumthani, Thailand
Abstract

The existence of multiple conflicting objectives in many real-world problems poses a real challenge for the search for compromise solutions. One of the common approaches is to find the Pareto front for the trade-off of the conflicting objectives. Identifying the Pareto front requires many potential solutions, which naturally favors evolutionary methods that work with multiple solutions, i.e., population-based methods. Evolutionary methods such as the genetic algorithm (GA), ant colony optimization (ACO), and particle swarm optimization (PSO) have been applied to effectively find solutions for many difficult single-objective optimization problems in recent years. Among these evolutionary methods, one of the more popular approaches is the particle swarm optimization (PSO) algorithm. Several PSO algorithms have been proposed to deal with the problem of identifying Pareto fronts for optimization problems with multiple conflicting objectives. The main focus of these previous studies is on the components that search for the non-dominated particles and on the construction of an effective external archive to store them. This paper investigates various strategies that affect the movement behavior of the particles in the swarms; these strategies have a strong direct influence on the quality of the final Pareto fronts. The main strategies included in the study are the following: guiding particles based on Crowding Distance (CD), adopting the Differential Evolution (DE) concept for swarm movement, and using swarms with mixed particles to explore Pareto solutions. These strategies are applied to solve standard benchmark problems to assess the quality of the resulting Pareto fronts.
Keywords: Particle Swarm Optimization, Pareto Front, Non-dominated Front, Evolutionary Strategy, Multi-objective Optimization.
1. Introduction

Multi-objective optimization (MO) is a branch of optimization which deals simultaneously with more than one objective, instead of the single objective of traditional optimization problems. MO has become increasingly attractive to both practitioners and researchers because most real-world problems contain multiple conflicting objectives. One of the most intuitive methods to solve a multi-objective problem is to combine the objectives into a single aggregated objective function. In this method, each objective function is assigned a weight based on the preference of the decision makers, and the weighted functions are linearly combined. The only remaining task is to use any available optimizer to find the solution of the problem with this single aggregated objective function. However, this approach has two major drawbacks. Firstly, a single solution is obtained based on a set of predefined, subjective weights on the objective functions, so the requirement of prior preference from the decision makers may not lead to a satisfactory result. Secondly, the decision maker's knowledge about the range of each objective value may be limited; as a result, even with a preference in mind, the single solution obtained provides no possibility for trade-offs among decisions. To be more objective, the approach based on a single aggregated objective function needs to be run multiple times to see the effect of the weights on the solutions obtained. Hence it is preferable to provide means for the decision maker to find the trade-off by identifying the non-dominated solutions, or Pareto front, which usually consumes a relatively large amount of computational time. For that reason, many methods have been developed to search for the Pareto front, and the multi-objective Evolutionary Algorithm (EA) is the most commonly selected solution technique.
One of the earlier attempts to solve multi-objective optimization problems using an Evolutionary Algorithm (MOEA) is the Non-dominated Sorting Genetic Algorithm, or NSGA (Srinivas and Deb, 1995). This method was commonly criticized for its high computational complexity, which made it inefficient with a large population size. Another problem with this method is that its effectiveness depends mostly on a pre-defined sharing parameter. To address the drawbacks of the original NSGA, NSGA-II was proposed (Deb et al., 2002), adopting a new non-dominated sorting procedure, an elitism structure, and a measure of crowdedness. In their paper, NSGA-II was demonstrated to outperform other MOEAs such as the Pareto-archived evolutionary strategy (PAES) and the strength Pareto EA (SPEA). More recently, a number of researchers interested in Particle Swarm Optimization (PSO) have developed new searching mechanisms for PSO to search for the Pareto front. Coello et al. (2002) proposed the idea of using an external archive to store non-dominated particles so that each non-dominated particle may contribute its flying experience. The archive of non-dominated particles is then updated using a geographically-based system: the search area in the objective space is divided into hypercubes, and each hypercube is assigned a fitness based on the number of particles it covers. Roulette-wheel selection is applied to choose a hypercube, and a random particle from that hypercube is selected as the leader, or global guidance, for a particle in the swarm in each iteration. Fieldsend et al. (2002) improved this work by using a tree data structure to maintain an unconstrained archive. More recently, Raquel et al. (2005) adopted the idea of crowding distance (CD), proposed in NSGA-II, in their PSO algorithm as a criterion for the selection of leaders from the particles.
In this method, when preparing to move, a particle selects its global leader from the top particles in the external archive, which are sorted in decreasing order of CD.
In the search for the non-dominated solutions that form the Pareto front, the old movement strategy based on a single global best and a single local best in traditional PSO no longer makes sense, and alternative movement strategies are necessary to guide the particles toward the non-dominated solutions. This study focuses on how to efficiently use the information from the external archive to guide the movement of particles in the swarm for a more efficient search for non-dominated solutions. Several movement strategies are discussed and their performance is tested on a set of published benchmark problems. A review of non-dominated solutions and Pareto optimality is given in the next section for general readers who are not familiar with these concepts. In section 3, the PSO concept and the PSO framework for MO problems are presented to show how PSO actually works in a multi-objective environment. Section 4 discusses several movement strategies for particles. The experiments are described in section 5, followed by the analysis of results in section 6. Finally, the conclusions and recommendations for further studies are given in section 7.
2. Pareto Optimality

Multi-objective optimization is the problem of finding solutions that simultaneously minimize a set of conflicting objective functions. The mathematical model for this problem is given as follows:

    minimize f(x) = [f1(x), f2(x), ..., fK(x)]    (1)

subject to:

    g_i(x) <= 0,  i = 1, 2, ..., m    (2)
    h_i(x) = 0,   i = 1, 2, ..., l    (3)

where x is the vector of decision variables, f_i(x) is a function of x, K is the number of objective functions to be minimized, and g_i(x) and h_i(x) are the constraint functions of the problem. Given two decision vectors x, y ∈ R^D, the vector x is said to dominate the vector y (denoted x ≺ y) if f_i(x) <= f_i(y) for all i = 1, 2, ..., K and f_j(x) < f_j(y) for at least one j. As shown in Figure 1, when neither x ≺ y nor y ≺ x holds, x and y are called non-dominated or "trade-off" solutions. A non-dominated front N is a set of solutions such that no member of N is dominated by another member of N. A Pareto optimal front P is a non-dominated front which includes every solution x not dominated by any other y ∈ F, y ≠ x, where F ⊆ R^D is the feasible region.
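The dominance relation above translates directly into code. The following is an illustrative sketch (the function names `dominates` and `non_dominated` are ours, not from the chapter), using the minimization convention of equation (1):

```python
def dominates(fx, fy):
    """True if objective vector fx dominates fy (minimization):
    fx is no worse in every objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

def non_dominated(front):
    """Brute-force O(n^2) filter: keep vectors not dominated by any other."""
    return [f for f in front if not any(dominates(g, f) for g in front if g is not f)]

# Two objectives: (1, 5) and (3, 2) are trade-off solutions,
# while (4, 6) is dominated by both.
pts = [(1, 5), (3, 2), (4, 6)]
assert non_dominated(pts) == [(1, 5), (3, 2)]
```

Note that two identical objective vectors do not dominate each other, since dominance requires a strict improvement in at least one objective.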
Figure 1. x ≺ y for the case with two objectives.
3. Multi-objective Optimization with PSO

Kennedy and Eberhart (1995) proposed the particle swarm optimization (PSO) method, which was inspired by the social behavior of birds flocking and fish schooling. During the search, each individual particle learns and adapts its behavior based on its own experience and the shared experience of the particles in the swarm; hence, the particles can move to potentially better regions of the problem space. The original version of the PSO algorithm (Kennedy and Eberhart, 1995) is called the global model. In that model, the particles in the swarm move across the search space based on their personal best experience (cognitive term) and the global best experience (social term) of the swarm. Shi and Eberhart (1998, 1999) proposed a procedure to adjust the inertia weight to improve the searching ability of PSO. In their approach, the inertia weight is linearly reduced from a pre-defined maximal value to a minimal value, so that the swarm searches the whole space more aggressively at the early stage and gradually reduces the search area at the later stage. One of the drawbacks of the basic PSO algorithm is that the particles in a swarm tend to cluster quickly toward the global best particle, so the swarm is frequently trapped in a local optimum and can no longer move. To deal with this tendency to converge prematurely, a popular approach proposed by Veeramachaneni et al. (2003) divides the swarm into subgroups and replaces the global best with a local best or near neighbor best particle. Pongchairerks and Kachitvichyanukul (2005, 2009b) introduced the use of multiple social learning terms and extended the concept of the standard PSO in GLNPSO. Instead of using just the global best particle, GLNPSO also incorporates the local best and near-neighbor best (Veeramachaneni et al., 2003) as additional social learning factors. In GLNPSO, the updating formulas for the velocity and position in the dth dimension of particle i are:

    ω_id(t+1) = w ω_id(t) + c_p u (ψ^p_id − q_id(t)) + c_g u (ψ^g_d − q_id(t))
                + c_l u (ψ^l_id − q_id(t)) + c_n u (ψ^n_id − q_id(t))    (4)

    q_id(t+1) = q_id(t) + ω_id(t+1)    (5)
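For one dimension of one particle, updates (4) and (5) can be sketched as below. This is a minimal illustration, not the authors' implementation: the default parameter values are arbitrary, and drawing an independent uniform random number u per term is one common reading of the formula.

```python
import random

def glnpso_velocity(q, v, p_best, g_best, l_best, n_best,
                    w=0.7, cp=1.0, cg=1.0, cl=1.0, cn=1.0, rng=random.random):
    """Equation (4): inertia term plus four social learning terms
    (personal, global, local, and near neighbor best) for one dimension."""
    return (w * v
            + cp * rng() * (p_best - q)
            + cg * rng() * (g_best - q)
            + cl * rng() * (l_best - q)
            + cn * rng() * (n_best - q))

def glnpso_position(q, v_new):
    """Equation (5): move the particle by its updated velocity."""
    return q + v_new

# One update step from position 0.0 with velocity 0.1:
v1 = glnpso_velocity(0.0, 0.1, p_best=0.5, g_best=1.0, l_best=0.4, n_best=0.8)
q1 = glnpso_position(0.0, v1)
```

Each of the four attraction terms pulls the particle toward a different reference point; with all four references ahead of the particle, the velocity update is a randomly weighted blend of those pulls.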
In these formulas, ψ^p_id, ψ^g_d, ψ^l_id, and ψ^n_id represent the dth dimension of the personal best, the global best, the local best, and the near neighbor best, respectively, and q_id(t) is the current position of particle i at the tth generation. The local best position of particle i, Ψ^l_i = (ψ^l_i0, ψ^l_i1, ..., ψ^l_iD), is determined as the best particle among K neighboring particles. This is equivalent to dividing the whole swarm into multiple sub-swarms of size K; the local best particle is the best particle among the K neighbors, i.e., the particle that gives the best fitness value in the sub-swarm. The near neighbor best position of particle i, Ψ^n_i = (ψ^n_i0, ψ^n_i1, ..., ψ^n_iD), represents the interaction between particles to achieve a better value of the objective function. The neighborhood is determined by the direct linear distance from the best particle. Each element of Ψ^n_i is determined by the following formula:
    FDR = (Fitness(Θ_i) − Fitness(Ψ_i)) / |q_id − ψ_id|    (6)
The equations for updating the position and the inertia weight w are the same as in traditional PSO, and the other parameters and working procedures are also similar to those of traditional PSO. The PSO approach has been used successfully as a solution method for many difficult single-objective problems (see, for example, Pongchairerks and Kachitvichyanukul, 2009a, and Ai and Kachitvichyanukul, 2009). However, the use of a single reference may not be effective for applications that must deal with multiple conflicting objective functions. As discussed earlier, one of the approaches for solving such problems is to search for the Pareto optimal front, i.e., the set of non-dominated solutions. This Pareto optimal front represents the best solution for problems with multiple conflicting objective functions. This is quite a different proposition from searching for a single best point, and it is necessary to modify the original framework of PSO. The key components to be modified include the following:
- Storage of the elite group of non-dominated solutions found so far
- Selection of reference particles (or leaders) to guide the swarm toward better positions
- Movement strategy, i.e., how to use the reference particles as search guidance
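The first component, maintaining the elite archive, can be prototyped around a dominance test. `update_archive` below is a hypothetical helper of ours, not the book's exact routine; it merges new candidate objective vectors into the archive and drops anything dominated:

```python
def dominates(fx, fy):
    # fx dominates fy under minimization: no worse everywhere, better somewhere
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

def update_archive(archive, candidates):
    """Merge candidate objective vectors into the elite archive, then keep
    only the mutually non-dominated ones (brute force, for illustration)."""
    merged = archive + [c for c in candidates if c not in archive]
    return [f for f in merged if not any(dominates(g, f) for g in merged if g != f)]

elite = update_archive([(1, 5), (3, 2)], [(2, 3), (4, 6)])
# (4, 6) is dominated by (3, 2); the remaining three are mutually non-dominated
```

A real implementation would also cap the archive size, which is where the crowding-based truncation discussed below comes in.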
In multi-objective optimization problems, the flying experience of the swarm needs to be stored as a set of non-dominated solutions instead of a single solution. In this case, the elitist structure mentioned in NSGA-II is adopted. After each update of particle positions, the objective functions of each particle are evaluated, and all particles are processed by a non-dominated sorting procedure. This sorting procedure identifies the group of particles in the swarm which are non-dominated by other particles and puts all of these particles into an archive for the Elite group. The Elite group is then screened again to eliminate inferior solutions, i.e., solutions that are dominated by others in the Elite group. As a result, the Elite group in the archive contains the best non-dominated solutions found so far in the searching process of the swarm. Once the Elite group is formed, one of the biggest challenges for most EAs is how to select candidates from the Elite group to help guide the evolution of the population. The most common criterion is that the leader (or guidance) needs to lead the population to the less
crowded areas to obtain a better spread of the final front. A successful implementation of this idea is given in NSGA-II with the introduction of the crowding distance (CD) as a measure of the spread of the non-dominated front. This approach estimates the density of solutions surrounding a specific solution by calculating the average distance between the two points on either side of this point along each objective (see Deb et al., 2002 for more details). The advantage of this approach is that it does not require the pre-determined sharing parameter of NSGA. Coello et al. (2002) proposed a PSO algorithm with a geographically-based system to locate crowded regions. They divided the objective space into a number of hypercubes, and each member of the Elite archive is assigned to one of these hypercubes. After the archive is classified, a hypercube with the smallest density is considered and one of its members is randomly selected to be used as the global guidance. Finally, the movement of particles is very critical to improving the quality of the Pareto front. Most of the proposed multi-objective PSO (MOPSO) algorithms use only a single global guidance from the Elite group, similar to the traditional PSO movement strategy. However, the existence of multiple candidates in the archive opens a large number of choices for movement. In section 4, several potential movement strategies are discussed as options to fully utilize the Elite archive as guidance for the search. In Figure 2, a PSO framework for multi-objective optimization problems is presented. This framework takes into account all the features mentioned above and is used throughout this study. The implementation of this framework is described in algorithm A1. The Non_dominated_Sort(S) procedure uses the sorting algorithm proposed in NSGA-II to identify non-dominated solutions. After each particle is evaluated, the set of non-dominated solutions is updated and stored in the Elite group.
The number of solutions in the Elite group is usually limited to reduce the computational time of the sorting and updating procedures. When the number of non-dominated solutions exceeds the limit, the particles located in the crowded areas are selectively removed, so that the Elite group can still yield a good Pareto front. The two procedures Select_Guidance(E) and Update_velocity(g) are movement-strategy dependent and will be discussed separately in the next section.
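The crowding distance borrowed from NSGA-II, used here to decide which archive members sit in crowded areas, can be sketched as follows. This follows the description in Deb et al. (2002); the function name and list-based layout are our own choices, and the front is assumed to contain at least two points:

```python
def crowding_distance(front):
    """front: list of objective-vector tuples (len >= 2). Returns a parallel
    list of crowding distances; boundary solutions in each objective get
    float('inf'), interior ones accumulate normalized neighbor gaps."""
    n, k = len(front), len(front[0])
    dist = [0.0] * n
    for m in range(k):
        order = sorted(range(n), key=lambda i: front[i][m])
        span = front[order[-1]][m] - front[order[0]][m] or 1.0  # avoid /0
        dist[order[0]] = dist[order[-1]] = float('inf')
        for j in range(1, n - 1):
            dist[order[j]] += (front[order[j + 1]][m] - front[order[j - 1]][m]) / span
    return dist

d = crowding_distance([(1, 5), (2, 3), (3, 2), (4, 1)])
# the two extreme points get inf; the interior points get finite distances
```

Archive truncation then simply drops the members with the smallest crowding distance first, since they sit in the densest part of the front.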
A1. Algorithm for MOPSO
i.    Initialize the swarm S and set the velocities of all particles to zero
ii.   For each particle i ∈ S with position Θ_i, evaluate the objective functions f_k(Θ_i), ∀k = 1, 2, ..., K
iii.  S* ← Non_dominated_Sort(S), where S* is the set of non-dominated particles in S
iv.   Elite archive E ← Non_dominated_Sort(E ∪ S*)
v.    If the stopping criterion is satisfied, end the procedure; otherwise, go to step vi
vi.   Update_social_learning_terms
vii.  Global guidance g ← Select_Guidance(E)
viii. Update_velocity(g) using equation (4)
ix.   Update_position by equation (5)
x.    Return to step ii
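Algorithm A1 can be exercised end to end on a toy problem. The sketch below is a deliberately bare-bones rendering, not GLNPSO itself: it uses a one-dimensional two-objective problem (Schaffer's F1), only personal-best and global-guidance terms, uniform random selection from the archive as Select_Guidance, and arbitrary parameter values.

```python
import random

def dominates(fx, fy):
    """fx dominates fy (minimization): no worse everywhere, better somewhere."""
    return all(a <= b for a, b in zip(fx, fy)) and any(a < b for a, b in zip(fx, fy))

def evaluate(x):
    # Toy two-objective problem: minimize x^2 and (x - 2)^2 simultaneously.
    return (x * x, (x - 2.0) ** 2)

def mopso(n=20, iters=60, seed=1):
    """Loop of A1 on the toy problem: evaluate (step ii), refilter the elite
    archive (steps iii-iv), draw a random elite as guidance (step vii), then
    update velocities and positions (steps viii-ix)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-4.0, 6.0) for _ in range(n)]
    vs = [0.0] * n
    best = list(xs)          # personal best positions (tie handling simplified)
    archive = []             # elite group: list of (x, f(x)) pairs
    for _ in range(iters):
        pairs = [(x, evaluate(x)) for x in xs] + archive
        archive = [p for p in pairs
                   if not any(dominates(q[1], p[1]) for q in pairs if q is not p)]
        for i in range(n):
            g = rng.choice(archive)[0]          # step vii: random elite leader
            vs[i] = (0.6 * vs[i]
                     + 1.5 * rng.random() * (best[i] - xs[i])
                     + 1.5 * rng.random() * (g - xs[i]))
            xs[i] += vs[i]
            if dominates(evaluate(xs[i]), evaluate(best[i])):
                best[i] = xs[i]
    return archive

front = mopso()
# the returned archive is mutually non-dominated by construction
```

For this problem the Pareto set is the interval 0 <= x <= 2, so a reasonable run should accumulate archive members in or near that range.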
Figure 2. Framework for MOPSO.
In this study, the multiple social learning terms in GLNPSO are used to update the new velocity. As a result, the new velocity is influenced by four social terms: personal best, global best (global guidance), local best and near neighbor best. The global guidance is the most important term in this framework and it depends mainly on the movement strategy adopted by the swarm; therefore, it is discussed separately. The other terms are adjusted in this framework so that it works for MO problems. In MO problems, there are two situations in which the personal best needs to be updated. First, when the new position of a particle dominates its personal best experience, it certainly becomes the personal best. However, if the new position and the personal best are non-dominated with respect to each other, the issue is whether or not to update to the new value. Keeping the current personal best position helps the particle explore the local region more deeply, which can lead to higher quality solutions. On the other hand, it is also desirable to move to the new position to spread out the non-dominated front. Because each decision has its own advantages, the algorithm randomly picks one of them to become the personal best. For the near neighbor best, the fitness distance ratio (FDR), originally developed to find the neighbor best, is modified to handle multiple objective functions as shown in equation (7):

FDR_MO = (Σ_{k=1}^{K} %Δk) / |θid − ψod|, for all d = 1…D, i = 1…L, where %Δk = (fk(Θi) − fk(Ψo)) / fk(Θi)   (7)
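Equation (7) can be sketched directly in Python; the ε guard against zero denominators follows the suggestion in the text, and all names here are illustrative:

```python
# Multi-objective fitness-distance ratio (equation (7)) along one dimension.
EPS = 1e-12  # assumption: small constant guarding zero denominators

def fdr_mo(theta, psi, f_theta, f_psi, d):
    """FDR of neighbor psi for particle theta along dimension d, where
    f_theta and f_psi are the K objective values of the two particles."""
    pct_delta = sum((ft - fp) / (ft + EPS) for ft, fp in zip(f_theta, f_psi))
    return pct_delta / (abs(theta[d] - psi[d]) + EPS)
```

The near neighbor best along dimension d would then be the neighbor maximizing this ratio.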
Voratas Kachitvichyanukul and Nguyen Phan Bach Su

In equation (7), fk(·) is the kth objective function; θid and ψod are the values at dimension d of particle i and its neighbor o; and D and L are the dimension of a particle and the number of particles in the swarm, respectively (refer to Peram et al. (2003) and Veeramachaneni et al. (2003) for more details about FDR with a single objective). In the implementation, a very small value ε should be included in the denominators to handle cases where a denominator might become zero. The amount of improvement that can be made when a neighbor o is chosen is represented by %Δk. By using equation (7), the near neighbor best should be the one that is expected to guide a particle to a position that can achieve the most improvement across all objective functions. In order to prevent the particle from being too sensitive to every change in the swarm, the local best is only updated when the new local best dominates the current one.
4. Movement Strategies

As mentioned in the previous section, MO problems require the swarm to store its search experience as a set of non-dominated solutions instead of a single best one. A key research question is then: how can a particle effectively use the knowledge of this Elite group to guide itself to a better position? Because the target is to identify the near optimal Pareto front, the definition of a better position is more complex than in single objective optimization problems. In the literature, the three common criteria to measure the quality of a non-dominated front 𝒩 are:
- The average distance to the Pareto optimal front 𝒫
- The distribution of non-dominated solutions in 𝒩
- The spread of 𝒩 in the multi-objective space
As in any optimization problem, the gap between the solutions found and the true optimal solutions should be as small as possible. Moreover, the solutions should provide a good outline of the Pareto front so that decision makers can make more informed decisions. Based on the above criteria, six movement strategies are proposed. These strategies are especially designed to obtain a high quality Pareto front. The procedures to perform these movements are included in steps vii and viii of the MOPSO framework.
4.1. Ms1: Pick a Global Guidance Located in the Least Crowded Areas

This strategy aims at diversifying particles in the swarm so that they can put more effort into exploring the less crowded areas, thereby increasing the spread of the non-dominated front. For that reason, a particle in the Elite group with fewer particles surrounding it is preferred when selecting the global guidance. The crowding distance (CD) estimates the density of solutions located around a specific solution by calculating the average distance of the two points on either side of that solution along each objective. The procedure to calculate the CD for each member of the Elite group follows NSGA-II; to make this chapter self-contained, it is given in algorithm CD below.
Algorithm CD: Calculate_crowding_distance(ℰ) (from Deb et al., 2002)
L = |ℰ|
For each i, set ℰ[i].distance = 0
For each objective m
    ℰ = sort(ℰ, m)
    ℰ[1].distance = ℰ[L].distance = ∞
    For i = 2 to (L − 1)
        ℰ[i].distance = ℰ[i].distance + (ℰ[i+1].m − ℰ[i−1].m) / (fm_max − fm_min)
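Algorithm CD translates almost line for line into Python; representing elite members as dicts with an objective vector under the key 'f' is our assumption:

```python
# Crowding distance of Deb et al. (2002): boundary members get infinity,
# interior members accumulate the normalized span of their two neighbors.
def crowding_distance(elite, n_obj):
    for e in elite:
        e['distance'] = 0.0
    L = len(elite)
    for m in range(n_obj):
        elite.sort(key=lambda e: e['f'][m])
        elite[0]['distance'] = elite[-1]['distance'] = float('inf')
        span = (elite[-1]['f'][m] - elite[0]['f'][m]) or 1.0  # guard flat fronts
        for i in range(1, L - 1):
            elite[i]['distance'] += (elite[i + 1]['f'][m] - elite[i - 1]['f'][m]) / span
```

After the call, members with larger `distance` sit in less crowded regions of the front.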
Particles with higher CDs are located in less crowded areas and are considered good candidates for global guidance in this movement strategy. Here, yℊ,d and qi,d denote dimension d of the global guidance and of particle i in the swarm, respectively. The movement direction of Ms1 is shown in Figure 3 and the pseudo-code for this movement strategy is presented in algorithm A2. In step i of algorithm A2, the procedure to calculate the crowding distance (CD) for each member of the Elite group is called.
Figure 3. Movement strategy 1 in bi-objective space.
A2. Algorithm for Ms1
i. Calculate_crowding_distance(ℰ)
ii. Sort ℰ by decreasing order of crowding distance (CD) values
iii. Randomly select a particle ℊ from the top t% of ℰ
iv. Update the global term in the movement of particle i by cg·u·(yℊ,d − qi,d) for all dimensions d, with u ~ U(0,1)
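Steps i to iv of A2 might look as follows in Python; cg and t mirror the parameters listed later in Table 3, and the data layout (dicts with 'distance' and 'pos') is an assumption:

```python
import random

def ms1_global_term(elite, particle_pos, cg=1.0, t=0.10):
    """Pick guidance from the top t% of the elite by crowding distance
    and return the global velocity term for one particle."""
    ranked = sorted(elite, key=lambda e: e['distance'], reverse=True)
    top = ranked[:max(1, int(t * len(ranked)))]
    g = random.choice(top)                 # step iii
    u = random.random()                    # u ~ U(0,1)
    return [cg * u * (yg - q) for yg, q in zip(g['pos'], particle_pos)]
```

The returned list is added to the other social terms when forming the new velocity.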
4.2. Ms2: Create the Perturbation with Differential Evolution Concept
The fact that more than one global non-dominated solution exists raises the question of whether it is better to combine the knowledge of two or more members of the Elite group to guide a particle. In this strategy, the concept of Differential Evolution (DE), proposed by Price and Storn (1995) for continuous function optimization, is adopted to utilize the flying experience of two individuals in the Elite group. The key idea behind DE is to use vector differences to perturb the vector population. In the original DE algorithm, a new parameter vector is generated by adding the weighted difference between two population members to a third member (all of these vectors are randomly selected from the population). A fitness selection scheme similar to that of the Genetic Algorithm (GA) is then carried out to produce offspring for the new population.
Figure 4. Movement strategy 2 in bi-objective space.
The inspiration for this strategy is that PSO tends to converge quite fast to some best solutions in the swarm. This is counterproductive, since it can reduce the ability to search a wider range of solutions on the Pareto front. Therefore, it is desirable to have a mechanism that perturbs the swarm and moves its members to new and less crowded areas. Figure 4 illustrates movement strategy Ms2, which adopts the DE concept to create the moving direction for a particle. The algorithm for Ms2 is presented in A3. The points in Figure 4 show the objective values of each particle in objective space; however, it is important to note that the vectors also represent the corresponding positions of particles and their movements in positional space (these vectors can only be plotted in a higher-dimensional space).
A3. Algorithm for Ms2
i. Calculate_crowding_distance(ℰ)
ii. Sort ℰ by decreasing order of crowding distance (CD) values
iii. Randomly select a particle R1 from the top t% of ℰ
iv. Randomly select a particle R2 from the bottom b% of ℰ
v. Update the global term in the movement of particle i by cg·u·(yR1,d − yR2,d) for all dimensions d, with u ~ U(0,1)
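A3 differs from A2 only in drawing two elite members and using their difference; a sketch under the same assumed data layout:

```python
import random

def ms2_global_term(elite, cg=1.0, t=0.10, b=0.20):
    """DE-style perturbation: R1 from the least crowded top t%,
    R2 from the most crowded bottom b% of the elite."""
    ranked = sorted(elite, key=lambda e: e['distance'], reverse=True)
    n = len(ranked)
    r1 = random.choice(ranked[:max(1, int(t * n))])
    r2 = random.choice(ranked[n - max(1, int(b * n)):])
    u = random.random()
    return [cg * u * (y1 - y2) for y1, y2 in zip(r1['pos'], r2['pos'])]
```

Because the term is a pure difference of two elite positions, it pushes the particle away from crowded regions rather than toward any fixed point.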
4.3. Ms3: Search the Unexplored Space in the Non-Dominated Front

The two strategies discussed above focus mainly on moving particles to less crowded areas and expanding the spread of the non-dominated front. Strategy Ms3, in contrast, aims at filling the gaps in the non-dominated front and hence improving the distribution of the solutions in the front. Figure 5 shows how the information in the Elite group is used to guide a particle toward potentially unexplored space within the current non-dominated front. In this strategy, the first step is to identify the potential gaps in the Elite group. When a gap is found, a pair of vectors is used to represent it. Algorithm A4 gives the procedure to identify the pairs of vectors bounding unexplored areas and to move a particle based on this information.
Figure 5. Movement strategy 3 in bi-objective space.
A4. Algorithm for Ms3
i. Identify the unexplored areas in ℰ:
   For each objective function fk(·)
       Sort ℰ in increasing order of fk(·)
       For i = 1 to |ℰ| − 1
           Gap = fk(Θi+1) − fk(Θi)
           If Gap > x%·(fk_max − fk_min): add the pair (i, i+1) to the unexplored list 𝒰
ii. Randomly select one pair (E1, E2) from 𝒰
iii. Update the global term in the movement of particle i by cg·u·[(E1,d − qi,d) + r·(E1,d − E2,d)] for all dimensions d, with u, r ~ U(0,1)

The range of objective function fk(·) in the Elite group is (fk_max − fk_min). By using the condition Gap > x%·(fk_max − fk_min), it is expected that the final non-dominated front will only contain gaps smaller than x% of any objective function's range. Reducing the value of x can improve the distribution of the final front but, at the same time, it may spread the swarm's effort across the front and slow down the search for better solutions.
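Step i of A4 (gap identification) can be sketched as below; here x is the threshold expressed as a fraction rather than a percentage, and returning index pairs is our representation of the list 𝒰:

```python
def unexplored_pairs(elite_objs, x=0.05):
    """Return index pairs (i, j) of adjacent elite members whose gap in
    some objective exceeds x times that objective's range."""
    pairs = []
    n_obj = len(elite_objs[0])
    for k in range(n_obj):
        # indices of elite members sorted by objective k
        order = sorted(range(len(elite_objs)), key=lambda i: elite_objs[i][k])
        span = elite_objs[order[-1]][k] - elite_objs[order[0]][k]
        for a, b in zip(order, order[1:]):
            if elite_objs[b][k] - elite_objs[a][k] > x * span:
                pairs.append((a, b))
    return pairs
```

A particle following Ms3 would then pick one pair at random and apply the update in step iii.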
4.4. Ms4: Combination of Ms1 and Ms2

This strategy tries to balance the exploration and exploitation abilities of Ms2. Instead of moving purely to new areas by the DE concept, a component similar to Ms1 is added to the perturbation formula in A3, so that a particle not only explores the new region but also benefits from the flying experience of the Elite group to improve solution quality. Ms4 uses the same algorithm as Ms2 with the following updating formula:

cg·u·[(yR1,d − qi,d) + (yR1,d − yR2,d)] = cg·u·(2yR1,d − qi,d − yR2,d)
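Per dimension the combined term is a one-liner; u is a parameter here only to make the example deterministic, where in the algorithm it would be drawn from U(0,1):

```python
def ms4_global_term(y_r1, y_r2, q, cg=1.0, u=1.0):
    """Ms4 update: Ms1-style pull toward R1 plus the Ms2 difference,
    which collapses to cg*u*(2*yR1 - q - yR2) in each dimension."""
    return [cg * u * (2 * a - c - b) for a, b, c in zip(y_r1, y_r2, q)]
```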
4.5. Ms5: Explore the Solution Space with Mixed Particles

Since each movement strategy has its own advantages, which can contribute differently toward a high quality Pareto front, it would be beneficial to include more than one search strategy in the algorithm. A straightforward way to realize this idea is to use a heterogeneous swarm, i.e., a single swarm with a mixture of particles following different movement strategies. Preferably, the composition of a productive swarm should include groups of particles with the following characteristics:

- Ones that prefer to explore based on their own experience, with some influence from their neighbors – Group 1
- Ones that prefer to follow the global trend but avoid crowded areas (Ms1) – Group 2
- Ones that like to explore new areas (Ms2) – Group 3
- Ones that fill the gaps left by previous movements (Ms3) – Group 4
In Ms5, these four groups of particles co-exist in the same swarm and all of their flying experience is stored in a common Elite archive. A particle of the first group does not directly use the global knowledge but explores the space gradually based on its own experience and partial knowledge of its neighbors. For that reason, these particles do not change their movement abruptly every time the global trend changes. This feature helps them explore the local region better. The second group, on the other hand, searches by using the status of the Elite group and moves to positions that have not been well explored. In cases where particles in the Elite group are distributed uniformly, members of this group have movement behavior similar to those in the first group. The responsibility of particles in group 3 is to explore the border to increase the spread of the non-dominated front with their perturbation ability. Although the first three groups explore the search space in many different directions, they may still leave some gaps unexplored because of their convergence at some segments of the Pareto front. The task of the last group is to fill these gaps so that the final front has a better distribution.
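A minimal way to realize the heterogeneous swarm is to assign the four behaviors cyclically, giving the 1:1:1:1 ratio used in the experiments; the group labels are ours:

```python
def assign_groups(n_particles, groups=('group1', 'ms1', 'ms2', 'ms3')):
    """Cyclic assignment of movement behaviors to a swarm of particles."""
    return [groups[i % len(groups)] for i in range(n_particles)]

roles = assign_groups(8)
```

Each particle would then call the velocity-update routine matching its assigned role while sharing the common Elite archive.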
4.6. Ms6: Adaptive Weight Approach

The sixth movement strategy, Ms6, is the only one that does not use the global Elite group. The swarm following Ms6 is divided into n + 1 sub-swarms, where n is the number of objective functions. The first n sub-swarms search for the optimal solution corresponding to each objective function, just like traditional PSO. The last sub-swarm minimizes the adaptive weighted function defined in Gen et al. (2008) by the following formula:

F(x) = Σ_{k=1}^{n} wk·(fk(x) − fk_min), where wk = 1/(fk_max − fk_min)   (8)
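Equation (8) can be sketched as follows; the bounds fk_min and fk_max would be taken from the objective values seen so far by the swarm, and the small ε (our addition) avoids division by zero on a flat objective:

```python
def adaptive_weighted(f_vals, f_min, f_max, eps=1e-12):
    """Adaptive weighted objective of equation (8): each objective is
    normalized by its current range and the normalized terms are summed."""
    return sum((fk - lo) / (hi - lo + eps)
               for fk, lo, hi in zip(f_vals, f_min, f_max))
```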
The six movement strategies described in this section are tested on benchmark problems to examine their effectiveness. Section 5 discusses the test problems as well as the settings of the PSO algorithm used in the experiments.

Table 1. Test problems (Unconstrained)*

SCH (1 variable, x ∈ [−10³, 10³]; convex)
  f1(x) = x²
  f2(x) = (x − 2)²

KUR (3 variables, xi ∈ [−5, 5]; non-convex, disconnected)
  f1(x) = Σi=1..n−1 [−10 exp(−0.2 √(xi² + xi+1²))]
  f2(x) = Σi=1..n [|xi|^0.8 + 5 sin(xi³)]

ZDT1 (30 variables, xi ∈ [0, 1]; convex)
  f1(x) = x1
  f2(x) = g(x)[1 − √(x1/g(x))]
  g(x) = 1 + 9 Σi=2..n xi/(n − 1)

ZDT2 (30 variables, xi ∈ [0, 1]; non-convex)
  f1(x) = x1
  f2(x) = g(x)[1 − (x1/g(x))²]
  g(x) = 1 + 9 Σi=2..n xi/(n − 1)

ZDT3 (30 variables, xi ∈ [0, 1]; convex, disconnected)
  f1(x) = x1
  f2(x) = g(x)[1 − √(x1/g(x)) − (x1/g(x)) sin(10πx1)]
  g(x) = 1 + 9 Σi=2..n xi/(n − 1)

ZDT4 (10 variables, x1 ∈ [0, 1], xi ∈ [−5, 5] for i = 2, …, 10; non-convex)
  f1(x) = x1
  f2(x) = g(x)[1 − √(x1/g(x))]
  g(x) = 1 + 10(n − 1) + Σi=2..n [xi² − 10 cos(4πxi)]

ZDT6 (10 variables, xi ∈ [0, 1]; non-convex, non-uniformly spaced)
  f1(x) = 1 − exp(−4x1) sin⁶(6πx1)
  f2(x) = g(x)[1 − (f1(x)/g(x))²]
  g(x) = 1 + 9 [Σi=2..n xi/(n − 1)]^0.25

*All objectives are to be minimized.
5. Design of Experiments

Ten benchmark problems which often appear in the MO literature are used in this study to evaluate the performance of each movement strategy (Deb et al., 2002). A brief description of these problems is given in Tables 1 and 2. The general PSO parameters used for the experiments are presented in Table 3. In Ms5, the ratio of the numbers of particles in each group is 1:1:1:1. The ratio between sub-swarm sizes in Ms6 is 1:1:2. In order to prevent the swarm from being trapped in a local Pareto front, the position of a particle is reinitialized if the velocity at that position remains zero for 50 iterations. In total, there are 6 × 10 experiments and 30 replications are performed for each experiment. Three metrics proposed by Zitzler et al. (2000) are used to measure the quality of the Pareto front. These metrics are calculated as defined by the following formulas:

M1* = (1/|Y′|) Σ_{p′∈Y′} min{ ‖p′ − p‖* : p ∈ Ȳ }

M2* = (1/(|Y′| − 1)) Σ_{p′∈Y′} |{ q′ ∈ Y′ : ‖p′ − q′‖* > σ* }|

M3* = √( Σ_{i=1}^{n} max{ ‖p′i − q′i‖* : p′, q′ ∈ Y′ } )
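Assuming a Euclidean distance, the three metrics can be sketched as follows (found = Y′, pareto = the reference front Ȳ, sigma = σ*; minimization assumed and all names are ours):

```python
from math import dist, sqrt

def m1(found, pareto):
    """Average distance from each found point to the reference front."""
    return sum(min(dist(p1, p) for p in pareto) for p1 in found) / len(found)

def m2(found, sigma):
    """Distribution: average count of neighbors farther than sigma."""
    return sum(sum(1 for q in found if dist(p, q) > sigma)
               for p in found) / (len(found) - 1)

def m3(found):
    """Spread: square root of the summed per-objective extents."""
    n = len(found[0])
    return sqrt(sum(max(abs(p[i] - q[i]) for p in found for q in found)
                    for i in range(n)))
```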
M1* measures the distance to the Pareto optimal front; a smaller M1* indicates better convergence to the true Pareto optimal front. M2* and M3* show the distribution and the spread of the front obtained, respectively. Here Y′ is the non-dominated front found by an algorithm, while Ȳ is the Pareto optimal front. Higher values of M2* and M3* indicate better results. Because the value of M2* differs from problem to problem and it is difficult to judge how good the distribution of the front is, the measure is rescaled to M2′* = M2*/|Y′|, which lies in the range [0, 1]. The sign * indicates that the solutions are considered in objective space. The distance metric is ‖·‖ and σ* is a niche value used to measure the distribution of the non-dominated front. More details about these metrics can be found in Zitzler et al. (2000).

Table 2. Test problems (Constrained)*
CONSTR (2 variables, x1 ∈ [0.1, 1], x2 ∈ [0, 5])
  f1(x) = x1
  f2(x) = (1 + x2)/x1
  g1(x): x2 + 9x1 ≥ 6
  g2(x): −x2 + 9x1 ≥ 1

SRN (2 variables, xi ∈ [−20, 20], i = 1, 2)
  f1(x) = (x1 − 2)² + (x2 − 1)² + 2
  f2(x) = 9x1 − (x2 − 1)²
  g1(x): x1² + x2² ≤ 225
  g2(x): x1 − 3x2 ≤ −10

TNK (2 variables, xi ∈ [0, π], i = 1, 2)
  f1(x) = x1
  f2(x) = x2
  g1(x): −x1² − x2² + 1 + 0.1 cos(16 arctan(x1/x2)) ≤ 0
  g2(x): (x1 − 0.5)² + (x2 − 0.5)² ≤ 0.5

*All objectives are to be minimized.
Table 3. Parameters of PSO

Inertia weight: linearly reduced from 0.9 to 0.4
Acceleration constants: cp = cg = cl = cn = 1
Swarm size: 50 particles
Upper limit of Elite group: 100 particles
% selected from top of Elite group: 10% (Ms1, Ms2, Ms5)
% selected from bottom of Elite group: 20% (Ms2, Ms5)
Potential gap: 5% of range of each objective function (Ms3, Ms5)
Number of iterations: 500
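The inertia entry in Table 3 is the standard linearly decreasing weight of Shi and Eberhart (1998), e.g.:

```python
def inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight at iteration t, decreasing linearly over the run."""
    return w_start - (w_start - w_end) * t / t_max
```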
6. Results and Discussion

The results of the six movement strategies on unconstrained problems are presented in Table 4. In this table, values scaled relative to the best value are used for ease of comparison. Hence, cells with value 1 indicate the best value among all movement strategies (the true value of a metric is given next to the relative value of the best movement strategy for that metric). For unconstrained problems, Ms5 shows very consistent convergence to the Pareto optimal front and is only slightly beaten by Ms6 on ZDT6. For the distribution of the non-dominated fronts, Ms5 also shows excellent performance in most cases. Ms6 and Ms4 are only slightly better than Ms5 on SCH and KUR, but the fronts obtained by these two strategies have not converged well to the Pareto optimal front. Likewise, Ms6 and Ms4 achieve better spread than Ms5 but worse convergence quality, except on ZDT6. On the first two unconstrained problems, all movement strategies can easily find good non-dominated fronts. However, when the problems become more complicated, only Ms5 still shows robust performance on all problems. Ms6 shows reasonably good results in some cases but is still easily trapped at local Pareto fronts, as shown in Figure 6 for ZDT3. In this problem, although the resulting Pareto front of Ms6 is very well distributed, its convergence to the Pareto optimal front is worse than those of the other movement strategies. This example shows one critical disadvantage of Ms6: the quality of its Pareto fronts depends strongly on the extreme solutions found by each single-objective sub-swarm. Ms1, on the other hand, does not have trouble with local Pareto optimal fronts because of its adaptive behavior. Ms1 focuses the effort of the swarm on less crowded areas, which normally include extreme solutions. Therefore, improvement can be achieved from any solution on the Pareto front.

Yet, when there are certain areas in which non-dominated solutions are easier to obtain, the swarm has a tendency to converge prematurely to these areas. In the experiments, Ms1 shows good performance on SCH and KUR, but it is not effective on ZDT1 to ZDT6 since their first objective functions are easier to minimize. Although Ms2 and Ms4 show good results on some instances, the quality of the resulting Pareto fronts is still one of the glaring weaknesses of this approach. Moreover, it is also noted that increasing the greediness of Ms2 in Ms4 does not guarantee better performance. Except for ZDT6, Ms3 can be considered a robust method among the strategies. However, the performance of this strategy depends strongly on the pre-defined potential gap. For that reason, the distributions of the resulting Pareto fronts from Ms3 can only be discussed on a case-by-case basis.
Table 4. Mean of three metrics (Unconstrained problems)

Test Problem   Strategy   M1*            M2'*           M3*
SCH            Ms1        1.035062       0.99308        0.992314
SCH            Ms2        1.058749       0.992282       0.991052
SCH            Ms3        1.005372       0.994995       0.99221
SCH            Ms4        2.659981       0.994451       1 (5.702696)
SCH            Ms5        1 (0.00362)    0.993711       0.991961
SCH            Ms6        1.064958       1 (0.844259)   0.991961
KUR            Ms1        1.758332       0.913435       0.97581
KUR            Ms2        7.362007       0.989605       0.979561
KUR            Ms3        1.972489       0.912134       0.982567
KUR            Ms4        10.44917       1 (0.833509)   0.976214
KUR            Ms5        1 (0.025247)   0.97219        0.985978
KUR            Ms6        1.353591       0.984856       1 (12.88694)
ZDT1           Ms1        9.478907       0.991565       1 (1.427917)
ZDT1           Ms2        10.80313       0.995602       0.990323
ZDT1           Ms3        6.130603       0.99396        0.96879
ZDT1           Ms4        5.973889       0.997323       0.992017
ZDT1           Ms5        1 (0.002581)   1 (0.824706)   0.990402
ZDT1           Ms6        1.856851       0.99911        0.995628
ZDT2           Ms1        1.331488       0.997514       0.993041
ZDT2           Ms2        9.146298       0.989227       0.987368
ZDT2           Ms3        1.182722       0.981219       0.990835
ZDT2           Ms4        4.433217       0.996627       0.992238
ZDT2           Ms5        1 (0.001029)   1 (0.825737)   0.993288
ZDT2           Ms6        5.566979       0.998383       1 (1.42377)
ZDT3           Ms1        2.671906       0.948516       0.98511
ZDT3           Ms2        3.97204        0.977238       0.983735
ZDT3           Ms3        3.185346       0.859253       0.88697
ZDT3           Ms4        5.37141        0.98068        0.992807
ZDT3           Ms5        1 (0.00217)    1 (0.824289)   1 (1.963175)
ZDT3           Ms6        81.80185       0.956321       0.922507
ZDT4           Ms1        3536.58        0.945161       0.449565
ZDT4           Ms2        6317.99        1 (0.959932)   0.727392
ZDT4           Ms3        17.0183        0.724717       0.126903
ZDT4           Ms4        9820.341       0.92024        1 (9.870613)
ZDT4           Ms5        1 (0.005975)   0.858624       0.143115
ZDT4           Ms6        2028.015       0.634008       0.237908
ZDT6           Ms1        17.82712       0.773826       0.354222
ZDT6           Ms2        2.307523       0.893684       0.397281
ZDT6           Ms3        23.29279       0.553983       0.408121
ZDT6           Ms4        2.159687       0.821724       0.414065
ZDT6           Ms5        1.148484       1 (0.189139)   0.203091
ZDT6           Ms6        1 (0.00564)    0.910736       1 (7.167782)
Figure 6. Non-dominated solutions with six movement strategies on ZDT3.
In order to gain more insight into each movement strategy, the movement behaviors on ZDT4 are simulated. ZDT4 is a problem with multiple local Pareto optimal fronts, which makes it a good test of the effectiveness of a search algorithm. Moreover, the degrees of difficulty of the two objective functions in ZDT4 are quite different. Objective function f1(·) of ZDT4 depends on only one variable and can be quickly minimized by PSO. However, when PSO deals with multi-objective problems, the swarm prefers to move to the segment of the Pareto front which minimizes the easier function. That phenomenon can leave particles in the swarm trapped in a small part of the Pareto optimal front. Figures 7, 8, 9 and 10 respectively show the movement behaviors of strategies Ms1, Ms2, Ms3 and Ms5 on test problem ZDT4. In Figure 7, the swarm was initialized randomly but quickly converged to the area where the minimal value of f1(·), which is zero, is located. After some attempts to escape from the trap, the swarm found a few new non-dominated solutions (step 307). The swarm then moved aggressively toward these points by following Ms1. However, when a more uniform non-dominated front was formed, the swarm again preferred to move toward the region where f1(·) is small. Ms2 performed very poorly on this problem because it lacks a mechanism to improve the quality of each particle. As a result, although particles in the swarm often tried to explore new areas, the fact that they could not improve their quality at the new positions drew them back to the trapped situation, as presented in Figure 8. The same phenomenon was observed with Ms4, so its graph is omitted here. On the other hand, Ms3 showed very good performance in this case. In Figure 9, as with Ms1, the swarm was trapped at first. Nevertheless, when some non-dominated solutions were found, the swarm immediately identified and attacked gaps in the non-dominated front.

During this process, new non-dominated solutions appeared and the gaps between them were gradually filled by particles. Ms5 is possibly the most interesting movement strategy in this case. The first stage of the search process was similar to those of the previous movement strategies. Right after some new non-dominated solutions were found, particles following Ms3 actively filled the gaps in between. At the same time, particles following Ms1 and Ms2 changed their direction toward the new remote areas.
Figure 7. Movement behavior of Ms1.
Figure 8. Movement behavior of Ms2.
Figure 9. Movement behavior of Ms3.
Figure 10. Movement behavior of Ms5.
Free particles (those not following any movement strategy) were influenced by the new trend very slowly and tried to improve their own quality. At first, particles following Ms3 contributed many new solutions to the Elite group. However, these solutions were partly and gradually replaced by better solutions found by free particles and particles following Ms1. When the Pareto optimal front was found, most particles moved back and forth to explore more solutions. Particles following Ms2 were the last ones to settle down on this front. Within fewer than 100 iterations after the swarm found new non-dominated solutions, the Pareto optimal front was formed. This result shows the strength of having mixed particles in a swarm.
Table 5. Mean of three metrics (Constrained problems)

Test Problem   Strategy   M1*            M2'*           M3*
CONSTR         Ms1        1.119458       0.78853        0.965408
CONSTR         Ms2        1 (0.008664)   0.755412       0.991384
CONSTR         Ms3        1.20899        0.827426       0.954657
CONSTR         Ms4        1.053373       0.709421       0.967631
CONSTR         Ms5        1.136229       0.791389       0.956246
CONSTR         Ms6        1.376702       1 (0.642202)   1 (7.850423)
SRN            Ms1        1.050453       0.986699       0.936048
SRN            Ms2        2.978904       0.994076       0.966782
SRN            Ms3        1 (0.535141)   0.989278       0.936716
SRN            Ms4        2.778986       0.998118       0.961672
SRN            Ms5        2.287646       0.983986       0.952682
SRN            Ms6        2.598357       1 (0.830053)   1 (293.54)
TNK            Ms1        1.026727       0.964065       0.995428
TNK            Ms2        1.425835       0.9995         0.992962
TNK            Ms3        1.060425       0.922117       0.98845
TNK            Ms4        1.74975        1 (0.813999)   0.989003
TNK            Ms5        1 (0.004237)   0.988746       1 (1.400844)
TNK            Ms6        1.45399        0.956421       0.983353
For constrained problems, the number of violated constraints is treated like an objective function. However, Elite particles with a smaller number of violated constraints are preferred when the Elite group is updated. The advantage of this approach is that the algorithm not only considers the movements of particles within the feasible search space to obtain better solutions but also obtains good solutions by moving particles from the infeasible region to the feasible region. The results for constrained problems are provided in Table 5. The metrics indicate that the PSO algorithms work reasonably well on constrained problems and no specific approach dominates the others on all three test problems. However, the test problems used so far are quite small. In future studies, more complicated constrained problems are required to confirm the effectiveness of these movement strategies.
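The constraint-handling rule described above can be sketched as follows; modeling constraints as predicates that return True when satisfied is our representation:

```python
def violations(x, constraints):
    """Number of violated constraints, used as an extra objective."""
    return sum(0 if g(x) else 1 for g in constraints)

def preferred(a, b, constraints):
    """Elite update preference: fewer violated constraints wins (tie: a)."""
    return a if violations(a, constraints) <= violations(b, constraints) else b
```

For example, with the two CONSTR constraints from Table 2 expressed as lambdas, a fully feasible candidate is preferred over one violating a single constraint.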
7. Conclusions
Multi-objective optimization is currently an important field because of its potential to deal with real-world problems. Because of the complexity of MO problems, the Evolutionary Algorithm is the most suitable method to find the near Pareto optimal front. In this paper, the Particle Swarm Optimization algorithm is discussed and some suggestions on alternative movement behaviors of the swarm are presented. After analyzing the requirements to achieve a high quality Pareto front as well as the necessary adjustments to the original PSO framework, six movement strategies are proposed. Several benchmark problems are used to test the performance of these strategies. The results on unconstrained problems show that each movement strategy has its own strengths and weaknesses. It is also interesting that the best performance can be achieved by logically combining these strategies into a heterogeneous swarm of mixed particles. The simulation of the proposed movement strategies provides some insight into the movement of the swarm and indicates that diversity in movement is required to obtain a good Pareto front. In future studies, it will be interesting to see how these methods perform on more complex and much larger scale problems. Moreover, making the
movement strategies more adaptive is also an important aspect that needs to be carefully investigated.
Acknowledgment The authors acknowledge the financial support from the 2008 Royal Thai Government Joint Research Program for Visiting Scholars.
References

Ai, T. J. & Kachitvichyanukul, V. (2009). A particle swarm optimization for the vehicle routing problem with simultaneous pickup and delivery. Computers & Operations Research, 36, 1693-1702.
Raquel, C. R. & Naval, P. C., Jr. (2005). An effective use of crowding distance in multiobjective particle swarm optimization. In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, June 25-29, 2005, Washington DC, USA.
Coello Coello, C. A. & Lechuga, M. S. (2002). MOPSO: A proposal for multiple objective particle swarm optimization. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC '02), vol. 2, 1051-1056.
Deb, K., Pratap, A., Agarwal, S. & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 182-197.
Fieldsend, J. E. & Singh, S. (2002). A multi-objective algorithm based upon particle swarm optimisation, an efficient data structure and turbulence. In Proceedings of the 2002 U.K. Workshop on Computational Intelligence, 37-44, Birmingham, UK, September 2002.
Gen, M. & Cheng, R. (1999). Genetic Algorithms and Engineering Optimization (Engineering Design and Automation). Wiley-Interscience.
Kennedy, J. & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, vol. 4, 1942-1948.
Peram, T., Veeramachaneni, K. & Mohan, C. K. (2003). Fitness-distance-ratio based particle swarm optimization. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS '03), 174-181.
Pongchairerks, P. & Kachitvichyanukul, V. (2005). Non-homogenous particle swarm optimization with multiple social structures. In V. Kachitvichyanukul, U. Purintrapiban & P. Utayopas (Eds.), Simulation and Modeling: Integrating Sciences and Technology for Effective Resource Management. Proceedings of the International Conference on Simulation and Modeling 2005. Asian Institute of Technology, Bangkok, Thailand.
Pongchairerks, P. & Kachitvichyanukul, V. (2009a). A two-level particle swarm optimization algorithm on job-shop scheduling problems. International Journal of Operational Research, 4(4), 390-411.
Pongchairerks, P. & Kachitvichyanukul, V. (2009b). Particle swarm optimization algorithm with multiple social learning structures. International Journal of Operational Research, 6(2), 176-194.
Applications of Swarm Intelligence, Nova Science Publishers, Incorporated, 2010. ProQuest Ebook Central,
Evolutionary Strategies to Find Pareto Fronts …
55
Copyright © 2010. Nova Science Publishers, Incorporated. All rights reserved.
Shi, Y. & Eberhart, R. (1998). A modified particle swarm optimizer. In: Evolutionary Computation Proceedings, 1998. IEEE World Congress on Computational Intelligence., The 1998 IEEE International Conference on. 69-73. Shi, Y. & Eberhart, R. C. (1999). Empirical study of particle swarm optimization. In: Evolutionary Computation, 1999. CEC 99. Proceedings of the 1999 Congress on. Vol. 3, 1950 Vol. 3. Srinivas, N. & Deb, K. (1994). September Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation, 2 (3), 221-248. Veeramachaneni, K., Peram, T., Mohan, C. & Osadciw, L. A. (2003). Optimization Using Particle Swarms with Near Neighbor Interactions, in Lecture Notes in Computer Science, Springer Berlin/Heidelberg, ISBN 0302‐9743 (Print) 1611‐3349 (Online), Volume 2723/2003, DOI:10.1007/3‐540‐45105‐6_10. Zitzler et al., (2000) Zitzler, E., Deb, K. & Thiele, L. 2000 Comparison of multiobjective evolutionary algorithms: Empirical results, Evolutionary Computation, 8(2), 173-195.
In: Applications of Swarm Intelligence Editor: Louis P. Walters, pp. 57-76
ISBN: 978-1-61728-602-5 © 2011 Nova Science Publishers, Inc.
Chapter 3
PARTICLE SWARM OPTIMIZATION APPLIED TO REAL-WORLD COMBINATORIAL PROBLEMS: THE CASE STUDY OF THE IN-CORE FUEL MANAGEMENT OPTIMIZATION Anderson Alvarenga de Moura Meneses and Roberto Schirru Nuclear Engineering Program/COPPE, Federal University of Rio de Janeiro, Brazil
Abstract Particle Swarm Optimization (PSO) is an optimization metaheuristic situated within the Swarm Intelligence paradigm. Based upon collaborative aspects of intelligence, the PSO is a search methodology that represents the social learning metaphor: the individuals learn from their own experience in a group and take advantage of the performance of other individuals. Almost fifteen years after the seminal paper written by Kennedy and Eberhart, the PSO has a noticeable number of applications, in particular real-world problems in which it outperforms other algorithms. In this chapter, the application of discrete PSO models to Combinatorial Optimization problems is reviewed. The PSO fundamentals are discussed, and the PSO with Random Keys, a discrete model of the PSO, is presented. Experimental computational results of its application to instances of the Traveling Salesman Problem and to the real-world In-Core Fuel Management Optimization are given as examples that show the success obtained with this technique.
Keywords: Swarm Intelligence, Optimization Metaheuristics, Particle Swarm Optimization, Nuclear Engineering, In-Core Fuel Management Optimization, Combinatorial Optimization
1. Introduction Particle Swarm Optimization (PSO) [1, 2] is a metaheuristic based upon Swarm Intelligence, which considers both individual learning and social cognition, that is, the
learning and experience of a particle and its ability to take advantage of the experience of the swarm in a social context, respectively. Initially inspired by the motion of fish schools, bird flocks and bee swarms, its adaptation as an optimization metaheuristic represents the metaphor of animals acting collaboratively while looking for food or a special area. Almost fifteen years after the seminal paper written by Kennedy and Eberhart in 1995, the PSO has a noticeable number of applications, in particular real-world problems in which it outperforms other algorithms. Therefore, understanding how real-world problems might be modeled in order to be solved by the PSO is particularly important in several areas of knowledge. Thus, an overview of PSO models for combinatorial optimization in real-world problems will be presented, focusing on the PSO with Random Keys (RK) model, which has produced results that outperformed other metaheuristics for combinatorial problems in areas such as nuclear engineering [3] and operational research [4]. To achieve this goal, we present the case study of the real-world In-Core Fuel Management Optimization (ICFMO) [5], a classical problem in Nuclear Engineering. ICFMO, the Loading Pattern (LP) design optimization problem, and the nuclear reactor reload problem are names for the combinatorial problem associated with the nuclear fuel reloading operation executed in Nuclear Power Plants (NPPs), in which part of the nuclear fuel is periodically replaced.
It is a problem that has been studied for more than four decades, and several techniques have been used in this optimization problem, ranging from human expert knowledge, with solutions guided by nuclear reactor physics characteristics such as the enrichment of the fuel assemblies (FAs) and the power distribution within the nuclear reactor core [6, 7, 8], to optimization metaheuristics such as Simulated Annealing (SA) [9, 10], the Genetic Algorithm (GA) [11, 12], Tabu Search (TS) [13], Population-Based Incremental Learning (PBIL) [14], and Ant Colony Optimization (ACO) [15, 16]. However, it is important to highlight that the study of a new technique for the solution of a real-world problem might also involve the solution of computer science benchmarks. The primary reason is that there exist real-world problems that may not be benchmarked, which is the case for the ICFMO, as will be discussed in section 4. Thus, the validation of the code and a preliminary study of the behavior of an optimization metaheuristic intended for the ICFMO might be made with problems such as the Traveling Salesman Problem (TSP) [17, 18]. The optimization of TSP instances that are similar in complexity to the model of the ICFMO that one wants to solve is also discussed herein, as well as the theoretical basis that makes the ICFMO similar to the TSP in complexity. In fact, it is possible to draw an analogy between the TSP and the ICFMO in the following sense: in the TSP, a tour (the order of visitation of the cities) contains the information to be evaluated; in the ICFMO, the positions of the FAs in the reactor core, also represented as a sequence of integer numbers with no repetitions allowed, contain the information to be evaluated.
In sum, the main goal of this chapter is to present the Particle Swarm Optimization with Random Keys (PSORK) [3] and its computational experiments with the benchmark TSPs Oliver30 (symmetrical) and Rykel48 (asymmetrical), comparing swarm sizes and parameters in different experiments, as well as to demonstrate its efficiency as a metaheuristic requiring a relatively low number of evaluations for application to the ICFMO. The remainder of this chapter is organized as follows. The PSO is discussed in section 2. An overview of PSO models for combinatorial problems is presented in section 3. Section 4 presents details about the PSORK. The ICFMO is discussed in section 5, with a brief description of the TSP. Computational experimental results are given in section 6 and discussed in section 7. Finally, conclusions are drawn in section 8.
2. Particle Swarm Optimization The PSO metaheuristic was presented in 1995 and its algorithm models a collaborative search, taking into account the social aspects of intelligence. The PSO was initially proposed to optimize non-linear continuous functions [1, 2]. There exists a profound significance in such collective intelligence. Indeed, the way human beings deal with information and knowledge is only possible because of swarm intelligence. The human abilities to handle abstractions and ideas, as well as the development of science, for example, are collectively and socially constructed. In this context, Swarm Intelligence is a bio-inspired collaborative system whose computational optimization implementation model has achieved considerable results in various knowledge areas. A swarm with P particles performs the optimization in an n-dimensional search space. Each particle i has a position xit = [xi1 xi2 … xin] and a velocity vit = [vi1 vi2 … vin] at an iteration t, updated across the dimensions j = 1, 2, ..., n according to the equations

vijt+1 = wt vijt + c1 r1 (pbestij − xijt) + c2 r2 (gbestj − xijt)     (1)

and

xijt+1 = xijt + vijt+1,     (2)

where i = 1, 2, ..., P. The inertia weight wt may decrease linearly according to the equation

wt = w − (w − wmin) t / tmax     (3)
where w is the maximum inertia constant, wmin is the minimum inertia constant, tmax is the maximum number of iterations and t is the current iteration. High values of wt lead to global search, making the particles explore large areas of the search space, while low values of wt lead to the exploitation of specific areas. On the right side of eq. (1), the first term represents the influence of the particle's own motion, acting as a memory of the particle's previous behavior; the second term represents the individual cognition, where the particle compares its position with its previous best position pbesti; and the third term represents the social aspect of intelligence, based on a comparison between the particle's position and the best result obtained by the swarm, gbest (global best position). Eq. (2) describes how the positions are updated. Both c1 and c2 are acceleration constants: c1 is related to the individual cognition whereas c2 is related to social learning; r1 and r2 are uniformly distributed random numbers. The positions and velocities are initialized randomly at implementation. The basic algorithm is described in Figure 1. The positions xit updated by eqs. (1) and (2) are evaluated by an objective function, or fitness, f(xi) of the problem. The position vectors gbest = [gbest1 gbest2 … gbestn] and pbesti = [pbesti1 pbesti2 … pbestin] are updated depending on the information acquired by the swarm, which constructs its knowledge of the search space over the iterations. As stated earlier, the PSO was initially developed for the optimization of continuous functions. Its outstanding performance in that domain led researchers to investigate the optimization of combinatorial problems with discrete versions of the PSO. Such models will be discussed in the next section.
3. Models of Particle Swarm Optimization for Combinatorial Problems The first PSO model for discrete optimization was developed by Kennedy and Eberhart in 1997 [19], published two years after their first article on the PSO [1]. A discrete version of the PSO was presented with the particles' positions represented as bitstrings. The velocities were represented as probabilities of changing the bits of the positions. Another important PSO model for combinatorial optimization was proposed by Salman et al. in 2002 [20], who applied the PSO to the optimization of the Task Assignment Problem (TAP). The main idea is that the particles keep flying in an n-dimensional space, but their positions are mapped onto combinatorial solutions for the TAP, a problem in which repetitions are allowed. In this case, the mapping onto a combinatorial solution is obtained simply by truncating the components of the positions. Although it proved to be a good solution for the TAP, this approach cannot be used for problems in which the repetition of elements is not allowed in the representation of solutions, such as the TSP or the ICFMO. One year later, Wang et al. [21] presented a PSO model for the TSP whose equations were based on Swap Operators and Swap Sequences. The Particle Swarm Optimization based on Space Transformation with a 2-opt local search procedure was proposed by Pang et al. [22] in 2004 for the solution of the TSP. In the same year, Tasgetiren et al. [23] proposed the PSO with the Smallest Position Value for the Single Machine Total Weighted Tardiness problem, with the Variable Neighborhood Search as a local search procedure. These two approaches might be seen as a different interpretation of the RK approach, proposed by Bean in 1992 [24], which will be discussed in the next section.
Nevertheless, the usage of local search procedures such as the ones used in [22] and [23] might not be interesting or appropriate for other problems, and this is exactly what happens for the ICFMO. For example, it is not possible to ensure that local search procedures used for the optimization of the TSP will be successful for the real-world ICFMO, for the following reason. When the order of two cities in a tour (candidate solution) for a TSP is changed locally, the resulting tour may or may not be a shorter path; nevertheless, it is always a feasible solution. In the case of the ICFMO, the core configuration obtained by exchanging two FAs may be unfeasible. The first results of research involving the PSO with RK for the ICFMO were published in 2005 [25], without local heuristics; it was one of the first applications of the PSO to real-world problems in Nuclear Engineering, alongside the optimization of reactor core designs [26] and the identification of NPP transients [27].

1. Initialization
   1.1. For each particle i in a population of size P:
        1.1.1. Initialize xi randomly.
        1.1.2. Initialize vi randomly.
        1.1.3. Evaluate the fitness f(xi).
        1.1.4. Initialize pbesti with a copy of xi.
   1.2. Initialize gbest with a copy of the xi with the best fitness.
2. Repeat until a stopping criterion is satisfied:
   2.1. For each particle i:
        2.1.1. Update vit and xit according to eqs. (1) and (2).
        2.1.2. Evaluate f(xit).
        2.1.3. If f(pbesti) < f(xit) then pbesti ← xit.
        2.1.4. If f(gbest) < f(xit) then gbest ← xit.

Figure 1. The basic algorithm of the Particle Swarm Optimization.
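As an illustration, the basic algorithm of Figure 1 combined with the update equations (1)-(3) can be sketched as follows. This is a minimal Python sketch for the minimization of a continuous function (the codes described in this chapter were written in Fortran 90); the sphere test function, the velocity clamp, and all parameter values are our illustrative assumptions, not taken from the chapter.

```python
import random

def sphere(x):
    """Sphere benchmark: f(x) = sum(x_j^2), with minimum 0 at the origin."""
    return sum(xj * xj for xj in x)

def pso(f, n, P=50, t_max=1000, c1=2.0, c2=2.0, w_max=0.9, w_min=0.4,
        x_lo=-10.0, x_hi=10.0):
    """Minimize f over [x_lo, x_hi]^n with the basic PSO of Figure 1."""
    v_max = (x_hi - x_lo) / 2.0              # velocity clamp (a common practice)
    # 1. Initialization: random positions/velocities, pbest and gbest bookkeeping.
    x = [[random.uniform(x_lo, x_hi) for _ in range(n)] for _ in range(P)]
    v = [[random.uniform(-v_max, v_max) for _ in range(n)] for _ in range(P)]
    pbest = [xi[:] for xi in x]
    pbest_f = [f(xi) for xi in x]
    g = min(range(P), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    # 2. Main loop: eqs. (1) and (2) with the decreasing inertia weight of eq. (3).
    for t in range(t_max):
        w = w_max - (w_max - w_min) * t / t_max          # eq. (3)
        for i in range(P):
            for j in range(n):
                r1, r2 = random.random(), random.random()
                v[i][j] = (w * v[i][j]
                           + c1 * r1 * (pbest[i][j] - x[i][j])   # individual cognition
                           + c2 * r2 * (gbest[j] - x[i][j]))     # social learning
                v[i][j] = max(-v_max, min(v_max, v[i][j]))
                x[i][j] += v[i][j]                               # eq. (2)
            fi = f(x[i])
            if fi < pbest_f[i]:      # steps 2.1.3 / 2.1.4 (here for minimization)
                pbest[i], pbest_f[i] = x[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = x[i][:], fi
    return gbest, gbest_f

random.seed(42)
best, best_f = pso(sphere, n=5)
```

Note that Figure 1 is written for maximization (better means larger f); the sketch flips the comparisons because the sphere function is minimized.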
Keys:         0.39  0.12  0.54  0.98  0.41
Index:           1     2     3     4     5
Permutation:     2     1     5     3     4

Figure 2. Random Keys approach for the mapping of a real vector onto a combinatorial candidate solution.
The next section will describe the PSO algorithm with RK for combinatorial optimization.
4. Particle Swarm Optimization with Random Keys
4.1. Random Keys The Random Keys (RK) model was proposed by Bean in 1992 [24] for application to GAs. RKs are useful to map a solution with real numbers, which work as keys, onto a combinatorial solution, that is, a candidate solution for a given combinatorial problem. There are several RK models, which might be seen in [24]. For the PSORK, the approach used is the same as in the Single Machine Scheduling Problem (SMSP), with no repetitions allowed. Let us consider a representation of two chromosomes C1 and C2 in the GA, both corresponding to vectors of a five-dimensional real space. With the RK approach, for a chromosome C1 = [0.39 0.12 0.54 0.98 0.41], depicted in Figure 2, the corresponding individual, or a representation of a permutation (a candidate solution for a five-dimensional combinatorial problem where no repetition is allowed), would be I1 = (2, 1, 5, 3, 4), since 0.12 is the lowest number and corresponds to the second allele; 0.39 corresponds to the first allele, and so forth. In other words, the integers 1, 2, 3, 4 and 5 are re-ordered according to the information in the chromosome, in this case C1. As another example, for a chromosome C2 = [0.08 0.36 0.15 0.99 0.76], the individual would be mapped as I2 = (1, 3, 2, 5, 4). If a crossover operation were performed directly on the feasible individuals I1 and I2 for the SMSP, TSP or ICFMO (Figure 3), with a crossing site between the second and third alleles, the resulting offspring, composed of the descending individuals I3 = (2, 1, 2, 5, 4) and I4 = (1, 3, 5, 3, 4), would be unfeasible for the TSP and the ICFMO, since I3 and I4 are not feasible solutions for those problems. They represent solutions with repetition of elements, which could be cities for the TSP or FAs for the ICFMO. In both cases, such repetitions are not allowed.
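The RK decoding just described amounts to sorting the keys and reading off their original indices. A minimal Python sketch (the helper name `decode_rk` is ours; the chapter's codes were written in Fortran 90), reproducing the chromosomes C1 and C2 above:

```python
def decode_rk(keys):
    """Map a vector of real-valued keys onto a permutation (Random Keys):
    the i-th smallest key contributes its (1-based) index to the permutation."""
    return [i + 1 for i, _ in sorted(enumerate(keys), key=lambda p: p[1])]

C1 = [0.39, 0.12, 0.54, 0.98, 0.41]
C2 = [0.08, 0.36, 0.15, 0.99, 0.76]
I1 = decode_rk(C1)   # [2, 1, 5, 3, 4] -- 0.12 is lowest, so allele 2 comes first
I2 = decode_rk(C2)   # [1, 3, 2, 5, 4]
```

Any real vector decodes to a valid permutation, which is the property exploited throughout the rest of the chapter.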
Parent-individual I1: (2, 1 | 5, 3, 4)   →   Descending-individual I3: (2, 1, 2, 5, 4)
Parent-individual I2: (1, 3 | 2, 5, 4)   →   Descending-individual I4: (1, 3, 5, 3, 4)
(cross site between the second and third alleles)

Figure 3. Crossover operation directly applied to two feasible individuals I1 and I2 for the TSP and ICFMO. The descending individuals I3 and I4 are not feasible for these problems.
The RK encoding guarantees that the offspring will be a representation of feasible individuals for these combinatorial problems where no repetition is allowed, since the crossover operation is performed upon the chromosomes instead of directly upon the individuals. Given the two parent-chromosomes C1 and C2, as depicted in Figure 4, with a cross site between the second and third alleles, the descending chromosomes C3 = [0.39 0.12 0.15 0.99 0.76] and C4 = [0.08 0.36 0.54 0.98 0.41] would be decoded into the feasible individuals I3 = (2, 3, 1, 5, 4) and I4 = (1, 2, 5, 3, 4).
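This guarantee can be checked directly: performing the one-point crossover on the key vectors C1 and C2 and only then decoding reproduces the feasible offspring of Figure 4. A Python sketch (helper names are ours, not from the chapter):

```python
def decode_rk(keys):
    """The i-th smallest key contributes its (1-based) index to the permutation."""
    return [i + 1 for i, _ in sorted(enumerate(keys), key=lambda p: p[1])]

def one_point_crossover(a, b, site):
    """Crossover on the chromosomes (key vectors), not on the individuals."""
    return a[:site] + b[site:], b[:site] + a[site:]

C1 = [0.39, 0.12, 0.54, 0.98, 0.41]
C2 = [0.08, 0.36, 0.15, 0.99, 0.76]
C3, C4 = one_point_crossover(C1, C2, site=2)  # cross site between 2nd and 3rd alleles
I3, I4 = decode_rk(C3), decode_rk(C4)
# I3 == [2, 3, 1, 5, 4] and I4 == [1, 2, 5, 3, 4]: no repeated elements,
# so both offspring are feasible for the TSP and the ICFMO.
```

The offspring are feasible by construction: decoding always yields a permutation, whatever the keys.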
Parent-chromosome C1: [0.39 0.12 | 0.54 0.98 0.41]   →   Descending-chromosome C3: [0.39 0.12 0.15 0.99 0.76], decoded into I3 = (2, 3, 1, 5, 4)
Parent-chromosome C2: [0.08 0.36 | 0.15 0.99 0.76]   →   Descending-chromosome C4: [0.08 0.36 0.54 0.98 0.41], decoded into I4 = (1, 2, 5, 3, 4)
(cross site between the second and third alleles)
Figure 4. Crossover operation applied to two chromosomes C1 and C2. The descending individuals I3 and I4 are feasible for the TSP and the ICFMO.

Position xijt:           4.3   2.3   1.2   2.6   4.2
Velocity vijt:          −0.2  +0.4  +0.2  −0.7  +0.2
New position xijt+1:     4.1   2.7   1.4   1.9   4.4
Index j:                   1     2     3     4     5
Auxiliary vector uijt:     3     4     2     1     5

Figure 5. (a) One particle's position and velocity in a 5-dimensional search space. (b) New position as real keys. (c) Auxiliary vector indicating a feasible solution, which will be evaluated by the objective function of the problem.
4.2. Particle Swarm Optimization with Random Keys Salman et al. [20] applied the PSO to the optimization of the TAP by truncating the components of the positions of the particles in order to obtain candidate solutions, since repetitions are allowed in that problem. The PSORK uses the vector xit as keys to generate feasible solutions, as demonstrated in Figure 5, with the step (b)→(c) similar to the process depicted in Figure 2. The main adaptation is the interpretation of the position: it represents a set of keys that allows the mapping of the information acquired along the iterations. Thus, the positions do not need to be rounded or truncated as in the PSO model for the TAP. The position vector's information is decoded by the RK, yielding a feasible solution to be evaluated by the objective function. Thus, we obtain discrete feasible solutions for the ICFMO from a continuous key-search space. In other words, the particles search in a real continuous space, with the positions mapped onto discrete candidate solutions to combinatorial problems, as in the example for a five-dimensional search space depicted in Figure 5.
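One PSORK step with the numbers of Figure 5 can be sketched in Python (the decoding helper is our naming; rounding is only for display):

```python
def decode_rk(keys):
    """The i-th smallest key contributes its (1-based) index to the permutation."""
    return [i + 1 for i, _ in sorted(enumerate(keys), key=lambda p: p[1])]

x_t  = [4.3, 2.3, 1.2, 2.6, 4.2]        # (a) position x_i^t, used as real keys
v_t1 = [-0.2, +0.4, +0.2, -0.7, +0.2]   # updated velocity from eq. (1)
x_t1 = [round(x + v, 1) for x, v in zip(x_t, v_t1)]  # eq. (2): new position
u_t1 = decode_rk(x_t1)                   # (c) auxiliary vector: feasible permutation
# x_t1 == [4.1, 2.7, 1.4, 1.9, 4.4] and u_t1 == [3, 4, 2, 1, 5], as in Figure 5.
```

The particle never leaves the continuous space; only the decoded auxiliary vector is handed to the objective function.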
5. Optimization of Real-World Problems: The Case Study of the in-Core Fuel Management Optimization
5.1. The Traveling Salesman Problem The TSP is an NP-Hard problem and one of the most noticeable Combinatorial Optimization problems [17, 18]. It is a reference problem in Computational Complexity Theory, considered difficult to solve despite having a simple formulation: given a number n ≥ 3 of cities (or nodes) and the distances between them, the goal is to determine the path with the shortest total distance, visiting each city once and returning to the first city visited. Thus, the feasible solutions of the TSP are discrete sets, corresponding to permutations of the sequence of visited cities. When a TSP is symmetrical, given two different cities i and j, the distance dij to go from one to the other is the same as in the inverse path, that is, dij = dji for all i, j = 1, ..., n. A symmetrical TSP has (n − 1)!/2 feasible solutions. For an asymmetrical TSP, dij ≠ dji for some i ≠ j; thus there are (n − 1)! different paths. For the TSP, the PSORK uses the position vector xit as real-number keys to generate feasible solutions, as depicted in Figure 5, with the number of dimensions equal to the number of cities of the problem instance. Each generated solution orders the cities in a visitation sequence, and all solutions are feasible since the RK decoding does not allow repetitions. The main adaptation is the interpretation of the position: it represents a set of keys that allows the decoding of the information acquired along the iterations. The position vector's information is decoded by the RK, yielding a feasible solution to be evaluated by the objective function. Thus, a discrete search space with feasible solutions for the TSP is obtained from a continuous search space of real keys, where the PSO achieves considerable results. In this sense, a search in a real continuous space may generate feasible solutions to combinatorial problems, and this is the approach used for the ICFMO, which will be described in the next subsection.
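Evaluating a particle for the TSP then reduces to decoding its keys into a tour and summing the closed-tour distances. A Python sketch on a hypothetical 4-city symmetric instance (the coordinates and helper names are our illustrative assumptions, not Oliver30/Rykel48 data):

```python
import math

def decode_rk(keys):
    """Keys -> permutation of 0-based city indices (i-th smallest key first)."""
    return [i for i, _ in sorted(enumerate(keys), key=lambda p: p[1])]

def tour_length(tour, dist):
    """Total length of a closed tour (returns to the first city visited)."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

# Hypothetical instance: four cities at the corners of a 4 x 3 rectangle.
cities = [(0.0, 0.0), (0.0, 3.0), (4.0, 3.0), (4.0, 0.0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]

keys = [0.9, 0.1, 0.4, 0.7]     # a particle's position vector
tour = decode_rk(keys)          # [1, 2, 3, 0] -- always a feasible tour
length = tour_length(tour, dist)  # perimeter of the rectangle: 14.0
```

Whatever real values the PSO produces for `keys`, `tour` visits each city exactly once, so no repair step is needed.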
5.2. The In-Core Fuel Management Optimization
5.2.1. A General Description of the ICFMO The fuel-shuffling problem, LP design optimization, or In-Core Fuel Management Optimization (ICFMO) is a prominent problem in Nuclear Engineering for various reasons. The design of LPs for a Pressurized Water Reactor (PWR) involves multiple objectives concerning economics, safety and reactor physics calculations. Its principal characteristics are nonlinearity, multimodality, discrete solutions, nonconvex functions with disconnected feasible regions and high dimensionality [10]. In [28], Galperin performed an analysis of a search-space region in order to understand its structure: 300,000 patterns were generated, with the evaluation of the performance parameters corresponding to the candidate solutions. In this way, it was demonstrated that there exists a large number of local optima in the region studied, about one peak per hundred configurations. Following this rationale, one might roughly estimate 10^11 local optima in the case of a 1/8-symmetry model, which has approximately 10^13 possible configurations (LPs). In this sense, gradient-based algorithms are not adequate for the ICFMO. Conversely, metaheuristics such as SA, PBIL, ACO, GA and TS have been applied to this problem with considerable success [9-16]. After a time period called the operation cycle, it is not possible to maintain the NPP operating at the nominal power. At that time, the shutdown of the NPP is necessary for the reloading operation. In this process, the most burned fuel assemblies are replaced by fresh nuclear fuel assemblies. The ICFMO consists in searching for the best reloading pattern of fuel assemblies, with an objective function evaluated according to specific criteria and methods of Nuclear Reactor Physics. Figure 6 depicts a simplified schematic representation of the 121 nuclear FAs (viewed from the top) of a PWR NPP, considering the combinatorial characteristics of the ICFMO.
In an NPP, the nuclear fuel is disposed into FAs that are distributed inside the reactor core. The ICFMO is the optimization problem related to the operation of finding the best loading pattern, that is, the core configuration that optimizes specific criteria subject to constraints. In this case, the FAs are placed and organized in order to optimize the Uranium utilization in terms of the nuclear fuel cycle length, considering the planned power demand, subject to thermo-hydraulic constraints related to safety such as the maximum allowed fuel assembly burn-up, the hot channel factor and the moderator temperature coefficient. Thus, the ICFMO consists in searching for the best reloading pattern of fuel assemblies, with an objective function and constraints evaluated according to methods of Nuclear Reactor Physics. Concerning the combinatorial characteristic of the ICFMO as an optimization problem, the model depicted in Figure 6 yields 121! (≈ 8.09 × 10^200) possible solutions. Since the power distribution, which is calculated depending on the FAs' positions within the core, must be symmetrical in the nuclear reactor core, this feature can be used to reduce the complexity of the optimization. There are two main axes of symmetry dividing the core into four regions, called the 1/4 (one-fourth) symmetry axes. These axes and the two secondary diagonal axes divide the core into eight regions. Figure 7 represents the 1/8-symmetry model of the nuclear reactor core viewed from the top, which is of interest for this survey on combinatorial optimization of the ICFMO. This symmetry is also used in [12, 14-16]. The so-called 1/8 (one-eighth) symmetry, which is depicted in Figure 7, has 21 FAs: 10 FAs on the symmetry lines, 10 FAs off the symmetry lines and 1 central FA, which does not take part in the permutation. In our PSORK surveys, the FAs that belong to both of the
symmetry axes in the 1/8 symmetry model must be permuted with FAs of both symmetry axes. FAs that do not belong to the symmetry axes must be permuted with FAs that do not belong to the symmetry axes.
1/8-symmetry lines 1/4-symmetry lines
Figure 6. Nuclear reactor core (view from top): 121 Fuel Assemblies and symmetry lines.
Figure 7. Representation of the 1/8-symmetry model: except for the central element in grey, the 20 elements are permuted.
5.2.2. Mathematical Formulation of the In-Core Fuel Management Optimization Formally, for a single-plant, single-cycle optimization, without considering orientation or BP positioning, and for cycle length optimization subject to safety constraints, the ICFMO may be stated as a variation of the Assignment Problem (AP) [29], with similarities in the positioning constraints, although with obvious differences in the evaluation function: given a set S of m FAs and a set D of m positions, the ICFMO problem is to assign each fuel assembly i ∈ S to one and only one position j ∈ D in such a manner that each position gets covered by some FA. If we let the decision variables be

xij = { 1, if i is assigned to j
      { 0, otherwise                                     (4)

then the objective function related to the cycle length might be written as

maximize (or minimize) Fcycle-related(p)                  (5)

where F is a function related to the cycle length (which could be maximized or minimized, depending on the choice of the function) and p is a vector representing a permutation of FAs. The positioning constraints, stating that each FA is assigned to exactly one position and that each position is covered by some FA, are

Σ(j = 1..m) xij = 1,  i = 1, 2, ..., m                    (6)

Σ(i = 1..m) xij = 1,  j = 1, 2, ..., m                    (7)

and safety constraints may be represented by a function G related to the safety:

Gsafety-related(p) ≤ G0 .                                 (8)
Both values for F and G during the optimization process might be obtained with Reactor Physics codes having the LP represented by p as an input. Machado and Schirru [15] have applied Ant Colony Optimization (ACO) [30] to the ICFMO, modeling the problem based on the TSP. Chapot et al. [12] used TSP benchmarks for testing metaheuristics before applying them directly to the ICFMO. This leads to a brief discussion about the models of the ICFMO, the AP and the TSP. For example, let us consider the permutation p1 = [2 1 4 3], representing the positioning of 4 FAs into an abstract reactor core, each one at one of the 4 positions of the core, with FA 2 in the first position, FA 1 in the second position and so on. According to the statement of the ICFMO based on the AP, the decision variables for the permutation p1 could be represented as in the matrix

X = [xij] = | 0 1 0 0 |
            | 1 0 0 0 |
            | 0 0 0 1 |
            | 0 0 1 0 |                                   (9)

representing that FA i is assigned to position j. Based on the formulation of the TSP in [29], which considers i and j as two successive cities, and according to Machado and Schirru [15], the ICFMO may be represented as a sequence of successive FAs i and j in a hypothetical line in the core. The matrix Y with the decision variables yij for the abstract LP represented by p1 is therefore

Y = [yij] = | 0 0 0 1 |
            | 1 0 0 0 |
            | 0 1 0 0 |
            | 0 0 1 0 | .                                 (10)
That is, the sequence given in p1 implies that FA 2 is followed by FA 1, FA 1 is followed by FA 4 and so on, concerning the hypothetical line within the core.
Although the formulations of the ICFMO based on the AP and on the TSP yield different decision-variable matrices, both are equivalent in the following sense. According to Lawler [31], the TSP is a special case of the Koopmans-Beckmann formulation and therefore "the n-city TSP is exactly equivalent to an n×n linear assignment problem" with the constraint that the permutations in the TSP must be cyclic. The advantage of making this point clear is that any technique applied to the TSP or to the AP may be equivalently adapted to the ICFMO, as has been done, for example, with Optimization Metaheuristics such as GA, ACO and PSO.
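Both decision-variable matrices can be built mechanically from the permutation, which makes the equivalence of the two views concrete. A Python sketch for p1 = [2 1 4 3] (helper names are ours; the cyclic closing of the "line" follows the TSP view just quoted):

```python
def assignment_matrix(p):
    """AP view, eq. (9): x[i][j] = 1 iff FA i is assigned to position j."""
    m = len(p)
    X = [[0] * m for _ in range(m)]
    for j, fa in enumerate(p):          # position j holds FA p[j]
        X[fa - 1][j] = 1
    return X

def successor_matrix(p):
    """TSP view, eq. (10): y[i][j] = 1 iff FA i is followed by FA j
    on the (cyclic) hypothetical line through the core."""
    m = len(p)
    Y = [[0] * m for _ in range(m)]
    for k in range(m):
        Y[p[k] - 1][p[(k + 1) % m] - 1] = 1
    return Y

p1 = [2, 1, 4, 3]
X = assignment_matrix(p1)   # rows: FA1 -> pos2, FA2 -> pos1, FA3 -> pos4, FA4 -> pos3
Y = successor_matrix(p1)    # FA2 -> FA1, FA1 -> FA4, FA4 -> FA3, FA3 -> FA2
```

Both matrices are permutation matrices: each row and each column contains exactly one 1, which is precisely the constraint pair (6)-(7).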
5.2.3. Simulation of the Angra 1 NPP with the Reactor Physics Code RECNOD The Angra 1 NPP is a 2-loop PWR located in Rio de Janeiro State, in the Southeast of Brazil, whose core is composed of 121 FAs. The Reactor Physics code RECNOD is a simulator for the Angra 1 NPP. The development and tests related to the 7th cycle of Angra 1 are detailed in [32]. With the 1/8 symmetry used for the RECNOD simulation, FAs must be permuted except for the central element. In our experiments, FAs on the symmetry lines (quartets) are not supposed to be exchanged with elements off the symmetry lines (octets). In [32], other situations in which this kind of symmetry is broken are reported. RECNOD is a nodal code based on the works described by Langenbuch et al. [33], Liu et al. [34] and Montagnini et al. [35], and it has been applied to optimization surveys in several works [12, 14, 15, 16, 32]. The nuclear parameters yielded by the code are, among others, the Maximum Assembly Relative Power (Prm) and the Boron Concentration (CB). The value of Prm is used as a constraint related to safety. The computational cost of the RECNOD code is reduced since it does not perform the Pin Power Reconstruction. However, the usage of Prm as a safety constraint does not violate the technical specifications of the Angra 1 NPP [32]. For a maximum required radial power peak factor FXYmax = 1.435 for Angra 1, the calculations yield a corresponding Prm = 1.395. Any LP with Prm > 1.395 is infeasible in the sense of the safety requirements. The Boron Concentration yielded by the RECNOD code is given at the equilibrium of Xenon, another aspect that reduces the computational cost of the processing without impairing its validity for optimization purposes. Chapot [32] demonstrated that it is possible to extrapolate and predict the cycle length based on the CB at the equilibrium of Xenon, in such a way that 4 ppm are approximately equivalent to 1 Effective Full Power Day (EFPD).
Thus, if we consider equation (5), for a function F related to the cycle, and equation (8), for a function G related to safety, then for this specific case of Angra 1 NPP we have

Fcycle-related(p) = 1/CB ,   (11)

and

Gsafety-related(p) = Prm ,   (12)

G0 = 1.395 ;   (13)

then the ICFMO might be stated as

minimize 1/CB   (14)

subject to

Prm ≤ 1.395 .   (15)

For the implementation of the Particle Swarm Optimization, we have aggregated the objective function and the constraint into one fitness function (considering that the values of Prm are always greater than the reciprocal of the Boron Concentration in our experiments); thus we have

Fitness = 1/CB , if Prm ≤ 1.395
          Prm ,  otherwise   (16)

which was the fitness used in our set of experiments as well as in [3], with the nuclear parameters Prm and CB yielded by the RECNOD code.
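The aggregated fitness of eq. (16) is straightforward to express in code. This is an illustrative sketch with hypothetical names; in the actual experiments RECNOD supplies CB and Prm for each evaluated loading pattern.

```python
def fitness(cb_ppm, prm, prm_limit=1.395):
    """Aggregated fitness of eq. (16): feasible loading patterns
    (Prm <= 1.395) are scored by 1/CB, so a higher Boron Concentration
    gives a lower (better) fitness; infeasible patterns return Prm, which
    in the reported experiments is always greater than 1/CB, so every
    infeasible pattern ranks worse than every feasible one."""
    if prm <= prm_limit:
        return 1.0 / cb_ppm
    return prm
```

A pattern with CB = 2000 ppm therefore beats one with CB = 1000 ppm, and any pattern violating the Prm limit loses to all feasible patterns.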
5.2.3. PSORK Model for the ICFMO
Finally, with the information provided previously, we may describe how PSORK is modeled for the ICFMO. The position vectors xit+1 are used as keys for obtaining the auxiliary vector of feasible solutions ui, with 20 numbers from 1 to 20 without repetition, according to the RK approach depicted in Figure 8, analogous to the mapping depicted in Figure 5. The information in the particle's position vector configures a core which is evaluated by the RECNOD code. Thus, each particle represents one feasible LP; that is, candidate solutions for the ICFMO are obtained from a real 20-dimensional search space, and the RECNOD code evaluates each one of them, yielding CB and Prm for the aggregated objective function (eq. 16).
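The Random Keys decoding described above can be sketched as follows. The FA numbering (1-10 for quartet positions, 11-20 for octet positions) is illustrative rather than taken from RECNOD, but with the keys of Figure 8(a) this decoding reproduces the permutations shown in Figure 8(c).

```python
def decode_random_keys(keys):
    """Decode a vector of real-valued keys into a permutation (Bean's Random
    Keys [24]): the i-th item of the permutation is the index of the i-th
    smallest key."""
    return [i for _, i in sorted((k, i) for i, k in enumerate(keys))]

def decode_lp(position):
    """Split a 20-key particle position into two permutations, mirroring the
    scheme of Figure 8: the first 10 keys place the symmetry-line (quartet)
    FAs, the last 10 place the off-symmetry-line (octet) FAs, so quartets
    and octets are never exchanged with each other."""
    quartets = [i + 1 for i in decode_random_keys(position[:10])]
    octets = [i + 11 for i in decode_random_keys(position[10:])]
    return quartets, octets
```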
6. Computational Experimental Results

The experiments for the TSPs and the ICFMO were performed on an AMD Athlon 64 X2 Dual Core 3800+ processor (1.99 GHz) with 2 GB RAM.
6.1. Traveling Salesman Problem

The codes for the TSP optimization were implemented in FORTRAN90. One 50-particle flight for Oliver30 with 1000 iterations took on average 0.55 seconds. One 500-particle flight for Rykel48 with 5000 iterations took on average 43.7 seconds.
Figure 8. (a) Position xit of the i-th particle within the 20-dimensional space (current iteration t); (b) split of the real keys, since symmetry-line FAs must be placed over the symmetry lines; (c) the first ten real keys represent a solution for the symmetry-line FAs, decoded by the RK approach; the other ten keys are decoded in order to obtain a solution for the FAs that do not belong to positions over the symmetry lines; (d) the original positions in the 1/8-core symmetry; (e) the final loading pattern to be evaluated by the Reactor Physics code RECNOD.
The best known solutions (Opt) are 423.73 for Oliver30 and 14422 for Rykel48 [30]. We present the best result, the worst result, the average of the results (Ave), the standard deviation and the relative error (Err), calculated as

Err = (Ave − Opt)/Opt × 100% .   (17)
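The relative error computation can be checked directly; the sample values below are taken from Table 1.

```python
def relative_error(ave, opt):
    """Relative error of the average result (Ave) with respect to the best
    known solution (Opt), expressed in percent."""
    return (ave - opt) / opt * 100.0

# Oliver30, 500 particles, w = 0.8 (Table 1):
# relative_error(462.95, 423.73) is approximately 9.26 %.
```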
Table 1. General results for PSORK applied to the TSP Oliver30 (wmin = 0.2; 1000 iterations; results within 100 flights; fitness of the best known solution 423.73)

Number of   w     c     Best      Average of    Standard    Worst     Err
Particles               Result    the Results   Deviation   Result    (%)
50          0.7   2.0   425.10    510.42        45.16       617.46    20.46
50          0.8   2.0   424.68    506.10        44.63       635.21    19.44
50          0.9   2.0   428.03    523.28        42.53       645.59    23.49
500         0.7   2.0   423.73    467.61        35.86       565.86    10.36
500         0.8   1.9   423.73    462.95        29.46       556.83     9.26
500         0.9   1.9   423.73    468.61        31.80       544.54    10.59
Table 2. General results for PSORK applied to the TSP Rykel48 (wmin = 0.2; 500 particles; 5000 iterations; results within 100 flights; fitness of the best known solution 14422)

w     c     Best      Average of    Standard    Worst     Err
            Result    the Results   Deviation   Result    (%)
0.7   2.0   14572     16813          949        19204     16.58
0.8   2.0   14720     17023         1320        22172     18.03
0.9   1.9   15289     17535         1368        22433     21.59
Table 3. PSORK applied to the ICFMO: all of the results within the safety limit Prm ≤ 1.395 (wt = 0.8 to 0.2; c1 = c2 = 1.8; 50 particles; 200 iterations; 80 flights)

Best      Average of    Standard    Worst
Result    the Results   Deviation   Result
1405      1223          64          1070
The parameters, the number of iterations and the size of the swarms are also reported. We have run the experiments 100 times for each pair (w, c), where w ∈ {0.7, 0.8, 0.9}, c = c1 = c2 ∈ {1.7, 1.8, 1.9, 2.0, 2.1, 2.2} and wmin = 0.2. All swarms have used the star neighborhood (gbest) topology [2]. The results are shown in Tables 1 (Oliver30; 50 and 500 particles; w = 0.7, 0.8, 0.9) and 2 (Rykel48; 500 particles; w = 0.7, 0.8, 0.9). Figure 9 depicts the box-plots for PSORK applied to the TSP Oliver30 (1000 iterations, 50 and 500 particles); Figure 10 depicts the box-plots for PSORK applied to the TSP Rykel48 (5000 iterations, 500 particles).
6.2. In-Core Fuel Management Optimization

The codes for the ICFMO were implemented in C++. In order to obtain results for PSORK applied to the real-world ICFMO, we have performed 80 flights, each swarm with 50 particles, c1 = c2 = 1.8, and w varying from 0.8 to 0.2 within 200 iterations. The star neighborhood (gbest) topology has also been used in this case. Table 3 presents the results for the ICFMO. The Err value was not calculated, since the best optimum is not known for this real-world problem. Figure 11 depicts the Boron Concentration of each best solution versus its equivalent Prm at the end of each one of the 80 experiments. It is interesting to notice that, although the objective functions of both the TSP and the ICFMO are formulated as minimization problems, in Figure 11 the best results for the ICFMO are depicted as the Boron Concentration because of its direct relationship with the NPP cycle length and its economic advantages. In addition, all Prm values are less than or equal to 1.395, indicating that all of the best LPs found at the end of the flights respect the safety constraints.
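The particle update used throughout these runs is the canonical PSO step with a linearly decreasing inertia weight (w from 0.8 to 0.2, c1 = c2 = 1.8 in the ICFMO case). The sketch below is a generic Python illustration with hypothetical names, not the chapter's FORTRAN90/C++ implementation.

```python
import random

def lerp_inertia(t, t_max, w_start=0.8, w_end=0.2):
    """Inertia weight decreasing linearly from w_start to w_end over
    t_max iterations."""
    return w_start + (w_end - w_start) * t / t_max

def update_particle(x, v, pbest, gbest, t, t_max, c1=1.8, c2=1.8, rng=random):
    """One canonical PSO update for a real-valued particle: new velocity from
    inertia, cognitive (pbest) and social (gbest) terms, then new position."""
    w = lerp_inertia(t, t_max)
    new_v = [w * vi + c1 * rng.random() * (pb - xi) + c2 * rng.random() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

In PSORK, each updated position vector is then decoded by the Random Keys approach before evaluation.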
7. Discussion

7.1. Traveling Salesman Problem
In the case of the Oliver30 symmetrical TSP, our experiments demonstrate that it is possible to attain considerable performance at very low computational cost by increasing the number of particles instead of tuning parameters, that is, searching for better values of w, c1 and c2. Indeed, multiplying the swarm's size by ten in a problem whose search space has approximately 4.4×10^30 candidate solutions represents a proportionally very small increase in computational cost. In other words, the PSO characteristic of finding near-optimal results within reasonable time remains. In the optimization of the instance Rykel48, which has approximately 5.9×10^28 times the number of candidate solutions of the instance Oliver30, the average computational time considering 500-particle swarms was about 7.9 times greater for finding near-optimal results.
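The search-space sizes quoted above follow from counting tours with a fixed starting city; a small sketch, treating Rykel48 as an asymmetric instance (which is what the quoted ratio implies):

```python
from math import factorial

def tsp_tours(n, symmetric=True):
    """Number of distinct tours of an n-city TSP with a fixed starting city:
    (n - 1)! for an asymmetric instance, (n - 1)!/2 for a symmetric one
    (each tour and its reversal have the same length)."""
    tours = factorial(n - 1)
    return tours // 2 if symmetric else tours

oliver30 = tsp_tours(30)                   # 29!/2, about 4.4e30 candidate tours
rykel48 = tsp_tours(48, symmetric=False)   # 47!, about 5.9e28 times more
```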
Figure 9. Box-plots for the PSORK applied to the TSP Oliver30.
Figure 10. Box-plots for the PSORK applied to the TSP Rykel48.
One may also infer consistency in the optimization of the TSP, provided by the Random Keys approach. It is noticeable that PSORK applied to the TSP yields regularity, robustness and convergence in the results, with very similar distributions for 50-particle and 500-particle swarms for the two presented TSPs, as depicted in Figures 9 and 10, reflecting the ability of the swarms to explore large areas of the search space, as well as their exploitation capability due to appropriate values of wt, a characteristic proper to the canonical implementation or basic algorithm of the PSO. For example, experiments for Ant Colony Systems [30] with local heuristics (such as nearest neighbor) reach the best known minimum for the TSP Oliver30; without this local updating, the minimum for the ACS algorithm is 423.91. ACS with the local heuristic 3-opt reaches 14422 [30], without references to its solution without local heuristics. As we have written before, PSORK also does not use any local heuristics, because of its design for application to real-world problems whose local heuristics differ from the ones used for the TSP. Furthermore, the parameter values c1 and c2 that reached the best results for the combinatorial optimization of the proposed TSPs are also consistent with the literature on continuous PSO (c1 + c2 ≈ 4.0) [2].
Figure 11. PSORK applied to ICFMO (wt = 0.8 to 0.2; c1 = c2 = 1.8; 50 particles; 200 iterations; 80 flights). The Boron Concentration versus the Maximum Assembly Relative Power (MARP or Prm).
7.2. In-Core Fuel Management Optimization

In [12], Chapot et al. tested Genetic Algorithms with population size 500 under the same conditions and with the same Reactor Physics code RECNOD, reaching CB = 1026 ppm. PSORK's worst result within the 80 experiments was CB = 1070 ppm. All of the PSORK final results were in accordance with the safety technical requirements, represented by the constraint Prm ≤ 1.395. In [14], the parameter-free PBIL (FPBIL), with a different objective function, reaches its best result, CB = 1554 ppm, without local heuristics, in 430364 evaluations. PSORK's results were reached in 10000 evaluations. Thus, considering the number of evaluations, PSORK attained considerable performance.
8. Conclusions
In this chapter we have reviewed the optimization of combinatorial problems with the Swarm Intelligence metaheuristic PSORK, giving as examples the TSP and the real-world ICFMO. The results confirm that the PSO may optimize combinatorial problems. Increasing the size of the swarms allows finding near-optimal results with considerable performance, that is, with a relatively inexpensive computational cost for the TSP, with general results comparable to ACO and GA. Considering that the swarms do not have previous knowledge of the problems' search space and that no local search heuristic has been used, PSORK has attained important results when applied to combinatorial problems such as the ones presented. The complex and multimodal search space of the real-world ICFMO has been explored efficiently, in comparison with other metaheuristics such as PBIL, GA and FPBIL.
Acknowledgments

The author R.S. would like to acknowledge FAPERJ (Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro) for providing support to this research. Part of this chapter was published in the journal Progress in Nuclear Energy.
References

[1] Eberhart, R; Kennedy, J. A New Optimizer Using Particle Swarm Theory. Proceedings of the Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan. IEEE Service Center, Piscataway, NJ, 1995, 39-43.
[2] Kennedy, J; Eberhart, RC. Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, CA, 2001.
[3] Meneses, AAM; Machado, MD; Schirru, R. Particle Swarm Optimization applied to the nuclear reload problem of a Pressurized Water Reactor. Progress in Nuclear Energy, 2009, 51, 319-326.
[4] Tasgetiren, MF; Liang, YC; Sevkli, M; Gencyilmaz, G. A particle swarm optimization algorithm for makespan and total flowtime minimization in the permutation flowshop sequencing problem. European Journal of Operational Research, 2007, 177, 1930-1947.
[5] Levine, S. In-Core Fuel Management of Four Reactor Types. Handbook of Nuclear Reactor Calculation, Vol. II. CRC Press, 1987.
[6] Naft, BN; Sesonske, A. Pressurized Water Reactor Optimal Fuel Management. Nuclear Technology, 1972, 14, 123-132.
[7] Suh, JS; Levine, SH. Optimized Automatic Reload Program for Pressurized Water Reactors Using Simple Direct Optimization Techniques. Nuclear Science and Engineering, 1990, 105, 371-382.
[8] Galperin, A; Kimhy, Y. Application of Knowledge-Based Methods to In-Core Fuel Management. Nuclear Science and Engineering, 1991, 109, 103-110.
[9] Parks, GT. An Intelligent Stochastic Optimization Routine for Nuclear Fuel Cycle Design. Nuclear Technology, 1990, 89, 233-246.
[10] Stevens, JG; Smith, KS; Rempe, KR. Optimization of Pressurized Water Reactor Shuffling by Simulated Annealing with Heuristics. Nuclear Science and Engineering, 1995, 121, 67-88.
[11] Parks, GT. Multi-objective Pressurized Water Reactor Reload Core Design by Non-Dominated Genetic Algorithm Search. Nuclear Science and Engineering, 1996, 124, 178-187.
[12] Chapot, JLC; Da Silva, FC; Schirru, R. A new approach to the use of genetic algorithms to solve the pressurized water reactor's fuel management optimization problem. Annals of Nuclear Energy, 1999, 26, 641-655.
[13] Lin, C; Yang, JI; Lin, KJ; Wang, ZD. Pressurized Water Reactor Loading Pattern Design Using the Simple Tabu Search. Nuclear Science and Engineering, 1998, 129, 61-71.
[14] Caldas, GHF; Schirru, R. Parameterless evolutionary algorithm applied to the nuclear reload problem. Annals of Nuclear Energy, 2008, 35, 583-590.
[15] Machado, L; Schirru, R. The Ant-Q algorithm applied to the nuclear reload problem. Annals of Nuclear Energy, 2002, 29, 1455-1470.
[16] De Lima, AMM; Schirru, R; Da Silva, FC; Medeiros, JACC. A nuclear reactor core fuel reload optimization using ant colony connective networks. Annals of Nuclear Energy, 2008, 35, 1606-1612.
[17] Papadimitriou, CH; Steiglitz, K. Combinatorial Optimization. Prentice-Hall, New Jersey, 1982.
[18] Lawler, EL; Lenstra, JK; Kan, AHGR; Shmoys, DB. (Eds.). The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. 4th ed. John Wiley & Sons, Wiltshire, Great Britain, 1985.
[19] Kennedy, J; Eberhart, RC. A discrete binary version of the particle swarm algorithm. IEEE Conference on Systems, Man and Cybernetics, 1997, 4104-4109.
[20] Salman, A; Ahmad, I; Al-Madani, S. Particle Swarm Optimization for Task Assignment Problem. Microprocessors and Microsystems, 2002, 26, 363-371.
[21] Wang, KP; Huang, L; Zhou, CG; Pang, W. Particle Swarm Optimization for Traveling Salesman Problem. International Conference on Machine Learning and Cybernetics, 2003, 3, 1583-1585.
[22] Pang, W; Wang, KP; Zhou, CG; Dong, LJ; Liu, M; Zhang, HY; Wang, JY. Modified Particle Swarm Optimization based on Space Transformation for solving Traveling Salesman Problem. Proceedings of the Third International Conference on Machine Learning and Cybernetics, 2004.
[23] Tasgetiren, MF; Sevkli, M; Liang, YC; Gencyilmaz, G. Particle swarm optimization algorithm for single machine total weighted tardiness problem. Proceedings of the IEEE Congress on Evolutionary Computation, Portland, 2004, vol. 2, 1412-1419.
[24] Bean, JC. Genetic Algorithms and Random Keys for Sequencing and Optimization. ORSA Journal on Computing, 1994, 6, no. 2.
[25] Meneses, AAM; Schirru, R. Particle Swarm Optimization Aplicado ao Problema Combinatório com Vistas à Solução do Problema de Recarga em um Reator Nuclear. Proceedings of the International Nuclear Atlantic Conference, INAC, Brazil, 2005 (in Portuguese).
[26] Domingos, RP; Schirru, R; Pereira, CMNA. Particle swarm optimization in reactor core design. Nuclear Science and Engineering, 2006, 152, 197-203.
[27] Medeiros, JACC; Schirru, R. Identification of nuclear power plant transients using the Particle Swarm Optimization algorithm. Annals of Nuclear Energy, 2008, 35, 576-582.
[28] Galperin, A. Exploration of the Search Space of the In-Core Fuel Management Problem by Knowledge-Based Techniques. Nuclear Science and Engineering, 1995, 119, 144-152.
[29] Vanderbei, RJ. Linear Programming: Foundations and Extensions. Kluwer Academic Publishers, 1992.
[30] Dorigo, M; Gambardella, LM. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1997, 1, 53-66.
[31] Lawler, EL. The Quadratic Assignment Problem. Management Science, 1963, 9, 586-599.
[32] Chapot, JLC. Otimização Automática de Recargas de Reatores a Água Pressurizada Utilizando Algoritmos Genéticos. D.Sc. Thesis, COPPE/UFRJ, Brazil, 2000 (in Portuguese).
[33] Langenbuch, S; Maurer, W; Werner, W. Coarse mesh flux expansion method for analysis of space-time effects in large water reactor cores. Nuclear Science and Engineering, 1977, 63, 437-456.
[34] Liu, YS. et al. ANC: A Westinghouse Advanced Nodal Computer Code. Technical Report WCAP-10965, Westinghouse, 1985.
[35] Montagnini, B; Soraperra, P; Trentavizi, C; Sumini, M; Zardini, DM. A well balanced coarse mesh flux expansion method. Annals of Nuclear Energy, 1994, 21, 45-53.
In: Applications of Swarm Intelligence Editor: Louis P. Walters, pp. 77-88
ISBN: 978-1-61728-602-5 © 2011 Nova Science Publishers, Inc.
Chapter 4
SWARM INTELLIGENCE AND ARTIFICIAL NEURAL NETWORKS

Mayssam Amiri and Seyed-Hamid Zahiri*

Research Committee, Shiraz Electric Power Distribution Company (SHEDC), Shiraz, Iran
Department of Electrical Engineering, Faculty of Engineering, Birjand University, Birjand, Iran
Summary

The most important issue in a neural network is its learning algorithm, which must find the optimum weight vectors that reach the minimum error. There are many types of learning algorithms (e.g., the back-propagation algorithm), but none of them is guaranteed to be optimal. Here, a swarm intelligence optimization algorithm can be utilized to establish a new, near-optimum training algorithm. On the other hand, neural networks have structural parameters such as the number of hidden layers, the number of neurons in each layer, and the activation function types. Usually, these parameters are selected manually through a trial-and-error process involving extensive experiments; that is, these structural parameters are not chosen optimally. At this stage, by employing a swarm intelligence algorithm, the optimum neural network structure can be obtained automatically. The above topics are followed by a review of some past research studies.
1. Using Swarm Intelligence for Artificial Neural Networks Training and Structure Optimization

Artificial Neural Networks (ANNs) are known as "universal approximators" and "computational models", with particular characteristics such as the ability to learn or adapt and to organize or generalize data. To date, the design of a (near-)optimal network architecture is made by a human expert and requires a tedious trial-and-error process. In particular, the automatic determination of the optimal number of hidden layers and of hidden nodes in each (hidden) layer is the most critical task. For instance, an ANN with no or too few hidden nodes may not differentiate among complex patterns, leading instead to only a linear estimate of a possibly non-linear problem. In contrast, if the ANN has too many nodes/layers, it might be affected severely by noise in the data due to over-parameterization, which eventually leads to poor generalization. On such complex networks, proper training can also be highly time-consuming. The optimum number of hidden nodes/layers may depend on the input/output vector sizes, the training and test data sizes, and, more importantly, the characteristics of the problem, e.g., its non-linearity, dynamic nature, etc. [1]. One way of learning and predicting the best architecture of ANNs is to use Artificial Intelligence; Swarm Intelligence techniques can be used for this aim.

* E-mail address: [email protected]; P.O. Box: 97175-376; Phone: +98-561-2227044; Fax: +98-561-2227795.
1.1. Artificial Neural Networks
Artificial neural networks realize the sub-symbolic paradigm of representing and processing information. The area of science that deals with methods and systems for information processing using neural networks is called neuro-computation. An artificial neural network (or simply a neural network) is a biologically inspired computational model which consists of processing elements (called neurons) and connections between them, with coefficients (weights) bound to the connections. These constitute the neuronal structure, to which training and recall algorithms are attached. Neural networks are called connectionist models because of the main role of the connections in them; the connection weights are the "memory" of the system.

Even though neural networks have similarities to the human brain, they are not meant to model it. They are meant to be useful models for problem-solving and knowledge-engineering in a "humanlike" way. The human brain is much more complex and, unfortunately, many of its cognitive functions are still not well known. The main characteristics of real and artificial neural networks can be named as below:
• Learning and adaptation
• Generalization
• Massive parallelism
• Robustness
• Associative storage of information
• Spatiotemporal information processing
The first mathematical model of a neuron was proposed by McCulloch and Pitts in 1943. It was a binary device using binary inputs, binary output, and a fixed activation threshold. In general, a model of an artificial neuron is based on the following parameters which describe a neuron:
• Inputs
• Weights
• Input Function
According to the type of values which each of the above parameters can take, different types of neurons have been used so far. The most used activation functions are:
1. The hard-limited threshold function. If the net input value u to the neuron is above a certain threshold, the neuron becomes active (activation value of 1); otherwise it stays inactive (activation value of 0).

2. The linear threshold function. The activation value increases linearly with the increase of the net input signal u, but after a certain threshold the output becomes saturated (to a value of 1, say); there are different variants of this function depending on the range of neuronal output values.

3. The sigmoid function. This is any S-shaped nonlinear transformation function. Different types of sigmoid functions have been used in practice; the most used is the logistic function:

f = 1/(1 + e^(-u)) ,

where e is the base of the natural logarithm (the limit of (1 + 1/n)^n as n approaches infinity). In a more general form, the logistic function can be written as:

f = 1/(1 + e^(-cu)) ,

where c is a constant. The reason why the logistic function has been used as a neuronal activation function is that many algorithms for performing learning in neural networks use the derivative of the activation function, and the logistic function has a simple derivative,

∂f/∂u = f(1 − f) .
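The threshold and sigmoid activations above can be written compactly; the sketch below uses illustrative names, and for c = 1 the derivative reduces to the simple f(1 − f) form.

```python
from math import exp

def hard_limit(u, threshold=0.0):
    """Hard-limited threshold activation: 1 if the net input u exceeds the
    threshold, 0 otherwise."""
    return 1.0 if u > threshold else 0.0

def logistic(u, c=1.0):
    """Logistic (sigmoid) activation f = 1 / (1 + e^(-c u))."""
    return 1.0 / (1.0 + exp(-c * u))

def logistic_derivative(u, c=1.0):
    """Derivative of the logistic function: c * f * (1 - f); for c = 1 this is
    the simple f(1 - f) form used by gradient-based training algorithms."""
    f = logistic(u, c)
    return c * f * (1.0 - f)
```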
4. The Gaussian (bell-shaped) function.

An output signal from a neuron can be represented by a single static potential, or by a pulse, which either occurs (coded as 1) or does not occur (coded as 0). Although a single neuron can perform certain simple information-processing functions, the power of neural computation comes from connecting neurons in networks. One way to understand the ideas behind the process of developing more and more complex artificial neural networks as computational models that comprise small processing elements (neurons) is to look at the history of this process.

An artificial neural network (or simply neural network) is a computational model defined by four parameters:
1. Type of neurons (also called nodes, as a neural network resembles a graph)
2. Connectionist architecture: the organization of the connections between neurons
3. Learning algorithm
4. Recall algorithm
According to the absence or presence of feedback connections in a network, two types of architectures are distinguished:
1. Feedforward architecture. There are no connections back from the output to the input neurons; the network does not keep a memory of its previous output values or of the activation states of its neurons; the perceptron-like networks are of the feedforward type.

2. Feedback architecture. There are connections from output to input neurons; such a network keeps a memory of its previous states, and the next state depends not only on the input signals but also on the previous states of the network; the Hopfield network is of this type.
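A feedforward pass, which has no feedback connections and hence no memory of previous states, can be sketched as follows (illustrative names, logistic activations assumed for every layer):

```python
from math import exp

def logistic(u):
    """Logistic activation f = 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + exp(-u))

def forward(x, layers):
    """Forward pass of a feedforward network: `layers` is a list of
    (weight_matrix, bias_vector) pairs, one pair per layer; the output
    depends only on the current input vector x."""
    a = x
    for weights, biases in layers:
        a = [logistic(sum(w * ai for w, ai in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a
```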
The most attractive characteristic of neural networks is their ability to learn. Learning makes possible the modification of behavior in response to the environment. A neural network is trained so that the application of a set X of input vectors produces the desired (or at least a consistent) set of output vectors Y, or the network learns about internal characteristics and structures of the data from a set X. The set X used for training a network is called a training set, and its elements x are called training examples. The training process is reflected in changes to the connection weights of the network. During training, the network weights should gradually converge to values such that each input vector x from the training set causes a desired output vector y to be produced by the network. Learning occurs if, after supplying a training example, a change in at least one synaptic weight takes place. The learning ability of a neural network is achieved by applying a learning (training) algorithm. Training algorithms are mainly classified into three groups:
1. Supervised. The training examples comprise input vectors x and the desired output vectors y. Training is performed until the neural network "learns" to associate each input vector x with its corresponding and desired output vector y.

2. Unsupervised. Only input vectors x are supplied; the neural network learns some internal features of the whole set of input vectors presented to it.

3. Reinforcement learning. Sometimes called reward-penalty learning, this is a combination of the above two paradigms; it is based on presenting an input vector x to the neural network and looking at the output vector calculated by the network. If it is considered "good," then a "reward" is given to the network, in the sense that the existing connection weights are increased; otherwise the network is "punished": the connection weights, being considered "not appropriately set," are decreased. Thus reinforcement learning is learning with a critic, as opposed to learning with a teacher.

Artificial neural networks have many applications in our world now. Some of these applications are listed below:
• Knowledge acquisition
• Pattern recognition
• Classification
• Signal processing
• Image processing
• Speech processing
• Prediction
• Monitoring, control and planning
• Optimization
• Decision making
Most of the information in this section comes from [2]. For more information about artificial neural networks, the reader may refer to [2-4].
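As a minimal illustration of the supervised paradigm, and of why the simple logistic derivative f(1 − f) matters, the sketch below trains a single logistic neuron by gradient descent on the squared error. It is a generic textbook delta rule with illustrative names, not a specific algorithm from this chapter.

```python
from math import exp

def logistic(u):
    """Logistic activation f = 1 / (1 + e^(-u))."""
    return 1.0 / (1.0 + exp(-u))

def train_neuron(examples, epochs=3000, lr=0.5):
    """Supervised training of a single logistic neuron on (x, y) pairs:
    after each example, the weights are nudged along the negative gradient
    of the squared error, which uses the logistic derivative f(1 - f)."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            f = logistic(sum(wi * xi for wi, xi in zip(w, x)) + b)
            delta = (f - y) * f * (1.0 - f)   # dE/du for E = (f - y)^2 / 2
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b
```

On a linearly separable task such as logical OR, the weights converge so that the outputs land on the correct side of 0.5.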
1.2. Using Particle Swarm Optimization for Artificial Neural Networks Training and Structure Optimization

PSO is motivated by the simulation of social behavior (a flock of birds). This optimization approach updates the population of particles by applying an operator according to the fitness information obtained from the environment, so that the individuals of the population can be expected to move towards better solutions. In this section, selected works on using PSO for ANN training and structure optimization are introduced.

In [5], a handwritten Chinese character recognition method based on PSO neural networks is introduced. This paper brings the particle swarm optimization (PSO) algorithm, which has a parallel search capability, into neural network training, and applies the trained network to handwritten Chinese character recognition. The specific process is as follows: firstly, the distribution features of the handwritten Chinese characters are extracted by image-processing methods; secondly, the inertia weight, an important factor in the PSO algorithm, is improved and made to match the particle's fitness function value; finally, the ANN trained by the improved PSO is used to recognize handwritten Chinese characters. The authors introduce two key points in ANN training with the improved PSO:

1. Setting the mapping relation between the particle's dimensions and the connection weights in the neural network: each dimension of a particle is regarded as a connection weight in the network.

2. Defining the mean square error of the neural network as the fitness function: the minimization of the mean square error is regarded as the optimization objective of the PSO algorithm. When the PSO finds the best particle, the network is completely trained.
The network has d inputs and m neurons in the hidden layer, and there are n neurons in the output layer. According to the BP net structure, there are d × m + m × n weights and m + n thresholds; therefore, there is the same number, d × m + m × n + m + n, of particle dimensions in the PSO training algorithm. Through many experiments, the results show that the character recognition rate is up to 96.25%, which is clearly higher than that of the BP neural network.

In [6], Chinese word segmentation based on the improved FPSO neural networks is implemented as follows:
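The weight-to-dimension mapping of [5] can be made concrete. The d × m + m × n + m + n count comes from the text above, while the ordering of the parameter groups inside the flat particle vector is an assumption made for illustration, not taken from [5].

```python
def particle_dimension(d, m, n):
    """Number of PSO particle dimensions when every network weight and
    threshold becomes one coordinate: d*m + m*n weights plus m + n
    thresholds."""
    return d * m + m * n + m + n

def split_particle(p, d, m, n):
    """Illustrative decoding of a flat particle into the four parameter
    groups (input-to-hidden weights, hidden-to-output weights, hidden
    thresholds, output thresholds); the ordering is an assumption."""
    i = 0
    w_ih = p[i:i + d * m]; i += d * m
    w_ho = p[i:i + m * n]; i += m * n
    th_h = p[i:i + m]; i += m
    th_o = p[i:i + n]
    return w_ih, w_ho, th_h, th_o
```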
Mayssam Amiri and Seyed-Hamid Zahiri
First, a solution is obtained by searching globally with the FPSO (fuzzy clustering particle swarm optimization) algorithm, which has strong parallel search ability, uses real-number encoding, and optimizes the training weights, thresholds, and structure of the neural network. Then, starting from the FPSO results, the solution is refined by the BP algorithm, which has strong local search ability, until the final solution is found. Simulation results show that the proposed method greatly increases both the efficiency and the accuracy of Chinese word segmentation. Ref. [7] proposes an adaptive dynamic feedforward neural network based on a modified particle swarm optimization algorithm. Dynamic time-delay operators are adopted between the input layer and the first hidden layer, and between the last hidden layer and the output layer; they relate the current output to previous inputs and are dynamically adjusted during the iterative system identification process using the PSO algorithm. The time-delay operators in the input part enhance the feedforward network's dynamic response capability and fully express the nonlinear input-output relationship of the actual controlled plant. The delay operators in the output part, in turn, accurately identify the pure time delay. Once these output-part operators are removed, the delay-free system model is obtained exactly, so classic predictive control strategies such as Smith control and model predictive control can achieve a satisfactory control effect. Consequently, the highly nonlinear, long-time-delay problem of the controlled plant is solved by the modified network structure. In addition, white noise and a Logistic map are used to improve the PSO algorithm's global search capability.
The parameters of the dynamic feedforward neural network are then trained by the modified PSO method, avoiding the complexity of gradient calculation and the risk of being trapped in local optima. This lets the network follow system changes quickly and respond in real time. The simulation results show that the proposed LPSO+DEFNN is superior to other neural networks trained with evolutionary algorithms in terms of the solutions found and of system identification. In Ref. [8], successful online training of a generalized neuron (GN) and an MLP with particle swarm optimization is presented. The GN has been shown to approximate nonlinear static and time-varying functions accurately with fast convergence. Further, the GN trained online with PSO has been demonstrated to accurately identify the nonlinear dynamics of a static VAR compensator in a 12-bus FACTS benchmark power system. The GN learns the nonlinear functions and the power system dynamics in much less training time and with fewer weights. A hybrid particle swarm optimization neural network approach for short-term load forecasting (STLF) is implemented in [9]. The approach is applied to real load data, and two performance indicators, root mean squared error (RMSE) and absolute maximum error, are reported and compared. The results show that the prediction is more accurate than those of the traditionally used BP-based perceptron and ARMA models. Considering the practical circumstances of STLF, some issues still need further study: for example, parameters such as the maximum and minimum velocity values in PSO initialization are currently determined by trial and error, so designing a learning algorithm for automated parameter selection will be an important issue in future research. Ref.
[10] presents a novel hybrid particle swarm optimization incorporating a fuzzy inference system and a new cross-mutated operation. An adaptive inertia weight for PSO is presented, with its value determined by a set of fuzzy rules. Furthermore, a
Swarm Intelligence and Artificial Neural Networks
new operation, the cross-mutated operation, is proposed to force particles out of local optima and push them toward the global optimum. With these two new operations, both solution quality and solution reliability are improved. In a numerical example on a neural network application, the proposed method successfully tunes the network parameters and outperforms other PSO methods. Ref. [11] proposes an efficient cultural cooperative particle swarm optimization learning method for the functional-link-based neural fuzzy network (FLNFN) in prediction applications. The FLNFN model can generate a consequent part that is a nonlinear combination of the input variables. The proposed cultural cooperative PSO (CCPSO) method, with cooperative behavior among multiple swarms, increases the global search capacity using the belief space. The advantages of the proposed FLNFN-CCPSO method are as follows: 1. The consequent of the fuzzy rules is a nonlinear combination of input variables; this study uses a functional link neural network for the consequent part of the fuzzy rules, and the functional expansion in the FLNFN model yields this nonlinear combination. 2. The proposed CCPSO, with cooperative behavior among multiple swarms, accelerates the search and increases global search capacity using the belief space. The experimental results demonstrate that the CCPSO method obtains a smaller RMS error than the commonly used PSO and CPSO for time series prediction problems.
Although the FLNFN-CCPSO method outperforms the other methods, one issue remains open: in this study, the number of rules is predefined. In Ref. [12], a novel heuristic structure optimization methodology for radial basis probabilistic neural networks (RBPNNs) is proposed. First, a minimum volume covering hyperspheres (MVCH) algorithm is proposed to select the initial hidden-layer centers of the RBPNN; then the recursive orthogonal least squares algorithm (ROLSA), combined with the particle swarm optimization (PSO) algorithm, is adopted to further optimize this initial structure. The proposed algorithms are evaluated on eight benchmark classification problems and two real-world applications: a plant species identification task involving 50 plant species and a palmprint recognition task. Experimental results show that the proposed algorithm is feasible and efficient for structure optimization of the RBPNN, which achieves higher recognition rates and better classification efficiency than multilayer perceptron networks (MLPNs) and radial basis function neural networks (RBFNNs) in both tasks. Moreover, the generalization performance of the optimized RBPNN in the plant species identification task was markedly better than that of the optimized RBFNN. In Ref. [1], the authors propose a novel technique for the automatic design of artificial neural networks (ANNs) by evolving toward the optimal network configuration(s) within an architecture space. It is based entirely on a multi-dimensional particle swarm optimization (MD PSO) technique, which re-forms the native structure of swarm particles so that they can make inter-dimensional passes with a dedicated dimensional PSO process. Therefore, in a multidimensional search space where the optimum dimension is unknown, swarm particles can seek both positional and dimensional optima.
This eventually removes the need to set a fixed dimension a priori, a common drawback of the family of swarm optimizers. With proper encoding of the network configurations and
parameters into particles, MD PSO can then seek the positional optimum in the error space and the dimensional optimum in the architecture space. The optimum dimension reached at the end of an MD PSO process corresponds to a unique ANN configuration, whose network parameters (connections, weights, and biases) can then be resolved from the positional optimum reached in that dimension. In addition, the proposed technique generates a ranked list of network configurations, from best to worst. This is a crucial piece of information, indicating which configurations are potential alternatives to the best one and which should not be used at all for a particular problem. In this study, the architecture space is defined over feed-forward, fully connected ANNs, so that conventional techniques such as back-propagation, and other evolutionary methods in this field, can also be applied. The proposed technique is applied to highly challenging synthetic problems to test its optimality in evolving networks and to benchmark problems to test its generalization capability, as well as to make comparative evaluations against several competing techniques. The experimental results show that MD PSO generally evolves optimum or near-optimum networks and has a superior generalization capability. Furthermore, MD PSO naturally favors a low-dimensional solution when it performs competitively with a higher-dimensional counterpart, and this native tendency steers the evolution process toward compact network configurations in the architecture space rather than complex ones, as long as optimality prevails. Ref.
[13] proposes a hybrid algorithm combining the particle swarm optimization (PSO) algorithm with the back-propagation (BP) algorithm, referred to as the PSO–BP algorithm, to train the weights of a feedforward neural network (FNN). The hybrid algorithm exploits both the strong global search ability of PSO and the strong local search ability of BP. The paper introduces a novel selection strategy for the inertia weight of the PSO algorithm, and the proposed PSO–BP algorithm adopts a heuristic for transitioning from particle swarm search to gradient-descent search. Three particle encoding strategies, and the problem areas in which each is used, are also given. The experimental results show that the proposed hybrid PSO–BP algorithm is better than the adaptive particle swarm optimization algorithm (APSOA) and the BP algorithm in both convergence speed and accuracy. Ref. [14] presents a particle swarm optimization (PSO) technique for training a multi-layer feed-forward neural network (MFNN) used as a prediction model of diameter error in boring machining. Compared to the back-propagation (BP) algorithm, the proposed algorithm achieves better machining precision with fewer iterations. In Ref. [15], the authors propose a PSO-based method to obtain the optimal weights for a linear combination of multiple neural networks. The classification error rate of the ensemble system was used to evaluate the effectiveness of the method. Ensemble networks created by negative correlation learning (NCL), bagging, and independent training were used to evaluate the performance of the proposed method. Using three benchmark data sets, it was demonstrated that the PSO-based weighting method yields better classification on the test set than the simple averaging and mean square error (MSE) methods.
The experimental results show that PSO can be successfully used to compute the weights for combining multiple neural network classifiers, especially for diverse networks. The time series prediction of a practical power system is investigated in Ref. [16]. A radial basis function neural network (RBFNN) with a nonlinear time-varying evolution
particle swarm optimization (NTVE-PSO) algorithm is developed. When training RBFNNs, the NTVE-PSO method is adopted to determine the optimal structure of the RBFNN for time series prediction; NTVE-PSO is a dynamically adaptive optimization approach that uses nonlinear time-varying evolutionary functions to adjust the inertia and acceleration coefficients, expediting convergence toward the global optimum during the iterations. To compare the performance of the proposed NTVE-PSO method with existing PSO methods, different practical load types from the Taiwan power system (Taipower) are used for one-day-ahead and five-days-ahead time series prediction. Simulation results illustrate that the proposed NTVE-PSO-RBFNN has better forecasting accuracy and computational efficiency for the different electricity demands than the other PSO-RBFNNs. In Ref. [17], the authors first present a learning algorithm for dynamic recurrent Elman neural networks based on a modified particle swarm optimization. The proposed algorithm concurrently computes the evolution of the network structure, the weights, the initial inputs of the context units, and the self-feedback coefficient of the modified Elman network. They then introduce and discuss a novel control method based on the proposed algorithm: a dynamic identifier is constructed to perform speed identification, and a controller is designed to perform speed control, for ultrasonic motors (USMs). Numerical experiments show that the identifier and controller based on the proposed algorithm achieve higher convergence precision and speed than other state-of-the-art algorithms. The effectiveness of the controller is verified for constant, step, and sinusoidal speed profiles. In addition, a preliminary examination with random perturbations also shows the robustness of the two proposed models.
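Several of the hybrids surveyed above, such as those in [6] and [13], share the same two-stage pattern: a global swarm search followed by local gradient refinement. The hand-off can be sketched generically (our own one-dimensional toy objective standing in for a network's error surface; the constants and function are assumptions, not taken from the cited papers):

```python
import math, random

# Two-stage hybrid sketch: a coarse PSO pass locates the basin of the
# global optimum, then plain gradient descent - standing in for
# back-propagation - refines the swarm's best solution.

def f(x):                       # multimodal toy objective (global min at x = 0)
    return x*x + 10.0*(1.0 - math.cos(2.0*math.pi*x))

def df(x):                      # its derivative, used by the local stage
    return 2.0*x + 20.0*math.pi*math.sin(2.0*math.pi*x)

random.seed(1)

# Stage 1: global PSO search over [-5, 5]
pos = [random.uniform(-5.0, 5.0) for _ in range(20)]
vel = [0.0] * 20
pbest = pos[:]
gbest = min(pbest, key=f)
for _ in range(100):
    for i in range(20):
        vel[i] = (0.7 * vel[i]
                  + 1.5 * random.random() * (pbest[i] - pos[i])
                  + 1.5 * random.random() * (gbest - pos[i]))
        pos[i] += vel[i]
        if f(pos[i]) < f(pbest[i]):
            pbest[i] = pos[i]
    gbest = min(pbest, key=f)

# Stage 2: local gradient refinement seeded with the swarm's best
x = gbest
for _ in range(200):
    x -= 0.001 * df(x)

print(round(f(gbest), 4), round(f(x), 4))
```

With a fixed step size below the inverse of the gradient's Lipschitz constant (roughly 1/397 for this objective), the refinement stage can only lower the objective value handed over by the swarm.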
1.3. Using Ant Colony Optimization for Artificial Neural Network Training
In the early 1990s, ant colony optimization (ACO) was introduced by M. Dorigo and colleagues as a novel nature-inspired metaheuristic for the solution of hard combinatorial optimization (CO) problems. The inspiration for ACO is the foraging behavior of real ants. When searching for food, ants initially explore the area surrounding their nest in a random manner. As soon as an ant finds a food source, it evaluates the quantity and quality of the food and carries some of it back to the nest. During the return trip, the ant deposits a chemical pheromone trail on the ground. The quantity of pheromone deposited, which may depend on the quantity and quality of the food, guides other ants to the food source. This indirect communication via pheromone trails enables the ants to find shortest paths between their nest and food sources, a characteristic of real ant colonies that artificial ant colonies exploit to solve CO problems. In this section, selected works on using ACO for ANN training and structure optimization are introduced. Addressing the nonlinearity and non-normality of dam displacement prediction, a dam displacement model based on an improved ant-colony-algorithm neural network was proposed in Ref. [18]. A binary ant colony algorithm is applied to the optimization of the network weights, overcoming the shortcomings of ant algorithms designed for combinatorial optimization when used in continuous domains, while avoiding the BP algorithm's vulnerability to local optima. The improved ant-colony-algorithm neural network therefore combines the rapid global convergence of binary ant colony algorithms with the extensive mapping ability of neural networks. The dam displacement model based on the new ant-colony-algorithm neural
network is built using mixed programming and has been applied in an engineering project. The analysis results show that the model is feasible for nonlinear fitting with high accuracy, and thus provides a new method for dam displacement prediction. Ref. [19] establishes an ant colony neural network model and applies it to lithology recognition and prediction. The model combines an ant colony system with a BP neural network, the ant colony algorithm being used to optimize the weights and thresholds of the BP network. The results show that this model has the extensive mapping ability of a neural network and the rapid global convergence of an ant system. In general, the model offers significant advantages: fast convergence, good generalization ability, and little tendency to fall into local minima. Ref. [20] proposes a new method using an ant colony system (ACS) to determine an appropriate NN structure in terms of neuron connections and weighted links. Because of the training difficulties of general NN structures, this method can cope with the local minimum problem. The resulting structure is trained with BP on the standard breast cancer data set, and the NN structure produced by the proposed method is more efficient than the standard NN approach. Ref. [21] presents a highly efficient RBF neural network algorithm based on ACO. The algorithm takes full advantage of the ants' parallel and global optimization features to find the centers of the basis functions. When the network structure is too large, the method can reduce the number of hidden-layer neurons and simplify the network structure. The experimental results show that, compared with an RBF neural network based on k-means clustering, the algorithm's overall error is smaller and the quality of fit is improved. The authors of Ref. [22] present an ACO algorithm (ACOR) for training feed-forward neural networks for pattern classification.
The performance of the algorithm was evaluated on real-world test problems and compared to specialized algorithms for feedforward neural network training, namely BP and Levenberg-Marquardt (LM), and also to approaches based on a genetic algorithm. The performance of the stand-alone ACOR was comparable to (or at least not much worse than) that of the specialized training algorithms. This result is particularly interesting because ACOR, being a much more generic approach, also allows the training of networks in which the neuron transfer function is not differentiable or is unknown. The hybrid of ACOR and the LM algorithm (ACOR-LM) was in some cases able to outperform BP and LM. Finally, the results indicate that ACOR compares favorably against other general-purpose optimizers such as GAs. In Ref. [23], an induction motor control scheme based on the proposed ACNN-PID controller is implemented and applied to a direct torque control (DTC) system. In this method, an ant colony algorithm (ACA) is applied to optimize the parameters of the NN-PID controller, improving its online self-tuning capability; the parameters of the ACNN-PID controller can be adjusted in real time and are self-learned. In the DTC system, the flux and speed ripple decrease dramatically and stability increases evidently, and the effectiveness of the method is demonstrated experimentally. The speed response of the ACNN-PID system converges more quickly than that of a traditional PID system, so both the dynamic and static performance of the DTC system are clearly improved. A novel ACO-BP neural network is used in Ref. [24] to model coal ash fusion temperature from its chemical composition. Ant colony optimization (ACO) is an ecological-system algorithm that draws its inspiration from the foraging behavior of real ants. A three-layer network is designed with 10 hidden nodes. The oxide contents constitute
the inputs of the network, and the fusion temperature is the output. Data on 80 typical Chinese coal ash samples were used for training and testing. The results show that the ACO-BP neural network obtains better performance than empirical formulas and a plain BP neural network. The well-trained network can be used as a tool to predict coal ash fusion temperature from the oxide contents of the coal ash.
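The pheromone feedback loop described at the start of this section can be made concrete with a classic double-bridge sketch (an illustrative toy of our own, not any of the cited algorithms): ants choose between a short and a long path with probability proportional to pheromone, deposit an amount inversely proportional to path length, and the trails evaporate each iteration:

```python
import random

# Double-bridge sketch of pheromone-mediated path selection: positive
# feedback concentrates pheromone on the shorter path, since ants on it
# deposit more pheromone per trip while both trails evaporate equally.

random.seed(0)
lengths = {"short": 1.0, "long": 2.0}
pheromone = {"short": 1.0, "long": 1.0}    # start with no bias
rho = 0.1                                   # evaporation rate

for _ in range(200):                        # 200 colony iterations
    deposits = {"short": 0.0, "long": 0.0}
    for _ant in range(20):                  # 20 ants per iteration
        total = pheromone["short"] + pheromone["long"]
        path = "short" if random.random() < pheromone["short"] / total else "long"
        deposits[path] += 1.0 / lengths[path]   # shorter path -> larger deposit
    for p in pheromone:                     # evaporate, then reinforce
        pheromone[p] = (1.0 - rho) * pheromone[p] + deposits[p]

share = pheromone["short"] / (pheromone["short"] + pheromone["long"])
print(round(share, 2))  # most pheromone ends up on the short path
```

The same reinforce-and-evaporate loop, applied to candidate weight values instead of bridge arms, is the mechanism the ACO-based training methods above exploit.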
References

[1] Kiranyaz, S; Ince, T; Yildirim, A; Gabbouj, M. "Evolutionary artificial neural networks by multi-dimensional particle swarm optimization", Neural Networks, December 2009, Vol. 22, Issue 10, 1448-1462.
[2] Kasabov, NK. "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", Massachusetts Institute of Technology, second printing, 1998.
[3] Krose, B; van der Smagt, P. "An Introduction to Neural Networks", The University of Amsterdam, 1996.
[4] Rojas, R. "Neural Networks: A Systematic Introduction", Springer, 1996.
[5] Zhitao, G; Jinli, Y; Yongfeng, D; Junhua, G. "Handwritten Chinese Characters Recognition Based on PSO Neural Networks", in the Proceedings of the Second International Conference on Intelligent Networks and Intelligent Systems, 2009, 350-353.
[6] He, J; Chen, L. "Chinese Word Segmentation based on the Improved Particle Swarm Optimization Neural Networks", IEEE Conference on Cybernetics and Intelligent Systems, 2008, 695-699.
[7] Han, M; Fan, J; Han, B. "An Adaptive Dynamic Evolution Feedforward Neural Network on Modified Particle Swarm Optimization", in the Proceedings of the International Joint Conference on Neural Networks, USA, 2009, 1083-1089.
[8] Kiran, R; Jetti, SR; Venayagamoorthy, GK. "Online Training of a Generalized Neuron with Particle Swarm Optimization", in the Proceedings of the International Joint Conference on Neural Networks, BC, Canada, 2006, 5088-5095.
[9] Xuan, W; Jiake, L; Chaofu, W; Deti, X. "A Hybrid Particle Swarm Optimization Neural Network Approach for Short Term Load Forecasting", in the Proceedings of the 4th International Conference on Wireless Communications, Networking and Mobile Computing, 2008, 1-5.
[10] Ling, SH; Hung Nguyen, T; Chan, KY. "A New Particle Swarm Optimization Algorithm for Neural Network Optimization", in the Proceedings of the Third International Conference on Network and System Security, 2009, 516-521.
[11] Lin, CJ; Chen, CH; Lin, CT. "A Hybrid of Cooperative Particle Swarm Optimization and Cultural Algorithm for Neural Fuzzy Networks and Its Prediction Applications", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 2009, Vol. 39, No. 1, 55-68.
[12] Huang, DS; Du, JX. "A Constructive Hybrid Structure Optimization Methodology for Radial Basis Probabilistic Neural Networks", IEEE Transactions on Neural Networks, 2008, Vol. 19, No. 12, 2099-2115.
[13] Zhang, JR; Zhang, J; Lok, TM; Lyu, MR. "A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training", Applied Mathematics and Computation, 2007, Vol. 185, Issue 2, 1026-1037.
[14] …, "… its utilization in a boring machine", Journal of Materials Processing Technology, 2006, Vol. 178, Issue 1-3, 19-23.
[15] Nabavi-Kerizi, SH; Abadi, M; Kabir, E. "A PSO-based weighting method for linear combination of neural networks", Computers & Electrical Engineering, 2008, doi:10.1016/j.compeleceng.2008.04.006.
[16] Lee, CM; Ko, CN. "Time series prediction using RBF neural networks with a nonlinear time-varying evolution PSO algorithm", Neurocomputing, 2009, Vol. 73, Issue 1-3, 449-460.
[17] Ge, HW; Liang, YC; Marchese, M. "A modified particle swarm optimization-based dynamic recurrent neural network for identifying and controlling nonlinear systems", Computers & Structures, 2007, Vol. 85, Issue 21-22, 1611-1622.
[18] Jiang, Y; Wang, J. "The Model of Dam Displacement Based on Improved Ant Colony Algorithm-Neural Networks", in the Proceedings of the First International Workshop on Database Technology and Applications, 2009, 337-340.
[19] Shao, Y; Chen, Q. "Application Ant Colony Neural Network in Lithology Recognition and Prediction: Evidence from China", in the Proceedings of the IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, 2008, 156-159.
[20] Pokudom, N. "Determine of Appropriate Neural Networks Structure using Ant Colony System", in the Proceedings of the ICROS-SICE International Joint Conference, 2009, 4522-4525.
[21] Chun-tao, M; Xiao-xia, L; Li-yong, Z. "Radial Basis Function Neural Network Based on Ant Colony Optimization", in the Proceedings of the International Conference on Computational Intelligence and Security Workshops, 2007, 59-62.
[22] Blum, C; Socha, K. "Training feed-forward neural networks with ant colony optimization: An application to pattern classification", in the Proceedings of the Fifth International Conference on Hybrid Intelligent Systems, 2005.
[23] Chengzhi, C; Xiaofeng, G; Yang, L. "Research on Ant Colony Neural Network PID Controller and Application", in the Proceedings of the Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2007, 253-258.
[24] Liu, YP; Wu, MG; Qian, JX. "Predicting coal ash fusion temperature based on its chemical composition using ACO-BP neural network", Thermochimica Acta, 2007, Vol. 454, Issue 1, 64-68.
In: Applications of Swarm Intelligence Editor: Louis P. Walters, pp. 89-129
ISBN 978-1-61728-602-5 © 2011 Nova Science Publishers, Inc.
Chapter 5
Swarm Intelligence for the Self-Assembly of Neural Networks

Charles E. Martin1,∗ and James A. Reggia2
Departments of Mathematics1 and Computer Science2, University of Maryland, College Park, MD 20742 USA
Abstract The processes controlling the growth or self-assembly of biological neural networks are extremely complex and poorly understood. Developmental models have been created to tackle the considerable challenges of modeling neurogenesis. However, the dynamics of neural growth and the networks generated by these past models, even though they have generally been limited to two-dimensional spaces, tend to be very difficult to predict and control, particularly when growth is represented as a continuous process and involves large networks with complex topologies. In response to these difficulties, we present a developmental model of three-dimensional neural growth based on swarm intelligence in the form of collective movements and principles of self-assembly that greatly improves the robustness and controllability of artificial neurogenesis involving complex networks. A central innovation of our approach is that neural connections arise as persistent “trails” left behind moving agents, so that network connections are essentially a record of agent movements. We demonstrate the model’s effectiveness by using it to produce two large networks that support subsequent learning of topographic and feature maps. Improvements produced by the incorporation of collective movements are further expounded through computational experiments. These results highlight the model’s potential as a methodology for advancing our understanding of biological neural growth, among other applications, and their relationship to the concepts of swarm intelligence.
1. Introduction
Given a set of components, the self-assembly problem entails the design of local control mechanisms that enable these components to self-organize into a given target structure, without individually pre-assigned component positions or central control mechanisms.

∗ E-mail address: [email protected]. (Corresponding author)
Over the past several years self-assembly has received a great deal of attention as a research area in the field of swarm intelligence, with recent work spanning both computer simulations (Arbuckle and Requicha 2004; Grushin and Reggia 2006; Grushin and Reggia 2008; Jones and Matarić 2003; Klavins et al. 2004; Werfel and Nagpal 2006) and physical robotics (Bishop et al. 2005; Gross et al. 2006; Klavins 2007; Nembrini et al. 2005; White et al. 2005). Inspiration has come from both self-assembly in natural physical systems (Whitesides and Grzybowski 2002) and closely-related collective construction in which passive components are manipulated by multiple autonomous agents, such as with nest construction by social insects (Bonabeau et al. 1999). While networks are like other structures studied in past work on self-assembly in that they have discrete components that need to position themselves in appropriate spatial locations, the assembly process of a network differs markedly from what has been studied in most past work because connections must be established between the discrete components. For example, real-world network components include resistors and capacitors in electrical circuits, neuronal cell bodies in neural networks, generators and transformers in power grids, and mixing stations in a chemical manufacturing plant. Yet a major aspect of each of these systems is the connections between the components: the wires in circuits, axons (neuron output connections) in neural networks, transmission lines in power grids, pipes in chemical plants, etc. To our knowledge, relatively little consideration has been given to how such connectivity might arise in past swarm intelligence work related to self-assembly, although progress has been made in engineering, such as the self-assembly of electrical circuits using physical processes (Gracias et al. 2000).
In the following, for concreteness, we focus on the self-assembly of neural network architectures, both natural and artificial. We elected to study neural networks because there is a great deal of current interest in how neurobiological circuits self-organize during development, and because there has been only very limited past work in engineering and computer science on how to control the growth and development of artificial neural networks, as follows. In neuroscience, there is currently an intense experimental effort underway to better understand how complex interactions between genetic and activity-dependent factors determine the wiring of neural circuitry during an organism’s developmental period (Grove and Fukuchi-Shimogori 2003; Lopez-Bendito and Molnar 2003; Spitzer 2006). While the vast majority of models in computational neuroscience do not involve network self-assembly or connection growth, there has been substantial recent interest in modeling neural development. Much of this work has focused on the formation of topographically-structured connections in specific brain regions, and is based upon axon growth that is guided by growth cones, specialized structures at the leading tip of growing axons that are sensitive to local biomolecular gradients (Goodhill et al. 2004; Goodhill and Xu 2005; Honda 2003). Growth cones “steer” the direction in which axons grow to their target termination locations. These past models of neurobiological development, as well as related but more abstract studies oriented towards understanding the principles of neurogenesis rather than towards producing veridical models of development in specific brain regions (Fleischer and Barr 1994; Fleischer 1995; Kalay et al. 1995), are like our own work in explicitly incorporating geometric relations (not just network topology) and in simulating the growth of axons through physical space. However, unlike our work they are limited to two-dimensional space (for
Applications of Swarm Intelligence, Nova Science Publishers, Incorporated, 2010. ProQuest Ebook Central,
Copyright © 2010. Nova Science Publishers, Incorporated. All rights reserved.
Swarm Intelligence for the Self-Assembly of Neural Networks
an exception, see Rust et al. 2003), typically do not incorporate cell migration and division, are generally applied to relatively small networks, are usually concerned with feedforward networks, and do not consider axon-axon interactions during network assembly (for an exception to the latter, see Yates et al. 2004). To our knowledge, no past models of neurogenesis done in computational neuroscience have explicitly recognized the relationship of work in this area to concepts that have emerged from swarm intelligence research on collective movements and self-assembly over the last several years (Grushin and Reggia 2008; Reynolds 1987; Rodriguez and Reggia 2004). Two past developmental models of neurogenesis are particularly relevant to the work we present herein. These models have notable similarities and differences to our own approach, and are among the most advanced developmental models designed specifically to study neural growth, each incorporating a multitude of mechanisms that influence the growth process. Together they encompass most of the common methods of generating continuous growth. In addition, the need to alleviate the shortcomings of these models, particularly the restriction of growth to a two-dimensional space, no direct interactions among growth cones and a lack of controllability, partly motivated our use of collective movements as a means of producing network self-assembly. The first of these models, presented in (Kalay et al. 1995), implements neural growth in a discrete two-dimensional space, in which axons grow between a fixed set of stationary cells. Axon movement is controlled through the combined effects of an L-system and a diffusible chemical emitted by cells. A growth cone can detect local concentrations and gradients in this chemical and change its direction of motion or branch in response to it. 
This mechanism is responsible for guiding axons to target cells over relatively short distances and promoting the appropriate amount of axonal branching. Growth over longer distances, from regions in which the diffusible chemical is not available for guidance, is dictated by the rules of the L-system. While such rules alone provide enough control over development to, in theory, generate any desired network, doing so requires that an inordinate amount of information be hand-encoded into the L-system. By supplementing this class of rules with behaviors that are dependent on the presence of the aforementioned diffusible chemical, the amount of information that needs to be encoded in the L-system is reduced significantly. The other model of neural growth, described in (Fleischer and Barr 1994), is based primarily on environment-dependent rules. This model consists of cells that are capable of moving, dividing and emitting axons. The axons are guided by growth cones and grow through a two-dimensional continuous space, making connections with cells to form networks. Certain cells emit chemicals into the environment, which diffuse and guide the growth cones by establishing chemical gradients. Cells have states that are determined by the cumulative amounts of various types of proteins that they have acquired. Cells may amass proteins by collecting them as they diffuse through the intercellular environment, by receiving them from a contacting cell, or by generating them. In this model, behaviors depend on both the state and the local environment. Because it tends to be difficult to precisely predict and control the local concentrations of diffusible elements, particularly when the dynamics of the sources are uncertain, it is hard to derive a set of rules that will result in the generation of specific networks and geometric patterns of cells.
The largest networks grown by past developmental models that incorporate continuous neural growth, including the two models just described, consist of about 50 neurons and 100
Charles E. Martin and James A. Reggia
connections. In contrast, we will show below that our approach can readily grow recurrent neural networks with over 600 neurons and nearly 50,000 connections. Additionally, whereas past work has focused on feedforward networks, we have focused on growing recurrent networks, which tend to be more difficult to grow. Further, the large recurrent networks grown using our approach are based on target structures with precisely specified topologies. By comparison, the target structures for most of the networks grown by past developmental models are instead specified in terms of general qualities, such as the existence of connections between two layers, as opposed to specific patterns of connectivity. Relatively small networks based on nonspecific target structures are typical of past related work, due in large part to the difficulty of controlling the characteristics of the networks grown by these models. To address these limitations of past models, including the restriction of growth to a two-dimensional space, the absence of direct interactions among growing recurrent connections, and a lack of controllability, we examined the use of collective movements as a means of producing reliable network self-assembly. For the most part, past work on artificial neural networks in computer science and engineering has focused on application-related performance, with little attention being given to issues surrounding network growth, development and self-assembly. There are, however, two exceptions. The first consists of computational techniques that optimize neural network architectures by adding/deleting nodes/connections dynamically during learning. Examples are Marchand's algorithm (Marchand et al. 1990), cascade correlation (Fahlman and Lebiere 1990), optimal brain damage (LeCun et al. 1990), the upstart algorithm (Frean 1990) and recursive deterministic perceptrons (Elizondo et al. 2007).
Unlike the approach we take here, these past network construction methods do not involve growth or self-assembly in a physical space, and will not be considered further. The second exception involves a technique known as developmental encoding, which has been used by researchers evolving neural network architectures with genetic algorithms/programming. Examples include L-systems (Chval 2002), matrix rewriting systems (Kitano 1990), cellular encoding (Gruau 1993), and descriptive encoding (Jung and Reggia 2006). Some of the models based on these methods have incorporated a significant amount of biological detail (Astor and Adami 2000; Cangelosi et al. 1994, Eggenberger 1997). Again, these latter approaches generally do not involve growth or self-assembly in physical space; they typically consider just the topology of networks and not the geometrical relations involved. In this chapter, we present a novel model in which neural network self-assembly arises in a three-dimensional space from direct interactions between components of the developing network. This approach is motivated by numerous potential benefits and applications. Physical realizations of self-assembly require that structural components move through space during the development process, but how this should occur with networks is poorly understood at present. Since our approach incorporates growth in a continuous three-dimensional space it could be useful for studying the science of self-assembly, especially as it relates to network structures. Furthermore, the developmental processes implemented in our model make it suitable for studying phenomena such as self-repair and plasticity, which involve growth and changes in the positions of network components in response to alterations in the network structure. 
There are also many potential neuroscience applications, in which events that occur during the growth process influence the network that ultimately develops, for example, interactions between growing axons, and between axons and cells such as
activity dependent development. Lastly, from the perspective of swarm intelligence based optimization, the continuous nature of the growth process and the local interactions among the network components could ultimately be taken advantage of by modifying our model so that input to a growing network trains its weights or results in a network of optimal size for a particular problem. Our model is intended to grow networks that are more directly inspired by neurobiological processes than most traditional artificial neural networks in computer science and engineering, in the sense that network architectures are defined by their geometry in addition to their topology. Also, the networks grown using our model reproduce the topological representations of past artificial neural network models in a statistical sense. Agents represent two distinct types of entities: cells, which roughly correspond to neuron cell bodies, and growth cones, which are named after the specialized structures at the leading tips of growing axons in biology. Both types of agents move, divide, and exert local “forces” upon one another, the latter being analogous to influences used to produce flock-like collective movement patterns in past swarm intelligence systems (Reynolds 1987; Rodriguez and Reggia 2004). A central innovation of our approach is that network connections arise from “trails” deposited by moving growth cone agents, something that is reminiscent of pheromone trails produced by ants (Bonabeau et al. 1999; Deneubourg et al. 1989; Deneubourg et al. 1990; Franks et al. 1991). Topographic regularity in the developing connections emerges from the collective movements of populations of growth cone agents. Our model integrates these inter-agent influences during network development with rule-based control mechanisms that govern behaviors such as cell division and axon branching, greatly facilitating one’s ability to exert control over resultant network structures. 
Here we focus on the improvements that collective movements bring to our model, as opposed to the importance of the rule set. This was done mostly because rule-based developmental models, such as L-systems, have already been applied to neurogenesis, whereas techniques employing swarm intelligence mediated via local "forces" have not. However, having now used the model to grow many different networks, our experience indicates that the rule-based component of the model does play an important role by making it easier to predict and control discontinuous actions, tailor the agents' dynamics, and incorporate only local interactions. We demonstrate the effectiveness of our approach by using it to produce two large networks (large relative to what has been done in past related work) that support the emergence, during subsequent learning, of topographic and feature maps. Improvements produced by the incorporation of collective movements are further expounded through computational experiments that examine the robustness of our model. The results suggest that our approach has substantial potential as a methodology for advancing the understanding of network self-assembly and its relationship to the concepts of swarm intelligence. To our knowledge, this is the first explicit recognition that the use of swarm intelligence methods can serve as the basis for simulating growth and formation of neural networks, including those with recurrent connectivity. This chapter is organized as follows. In Section 1. we present background material, and motivate our approach by explaining its potential benefits and the goals of the research presented herein. In Section 2. we give a detailed description of our model, an overview of the computational experiments we performed, an explanation of the metrics we used in these experiments, and the details of how our model is implemented as a simulation environment. In Section 3. we cover the details of the computational experiments, including the experimental setups, procedures, and collected data. Section 4. discusses the implications that the results of the experiments have for our approach, and the improvements that it brings to the modeling of self-assembling network structures. This section also includes a discussion of our model's potential relevance to neuroscience. Finally, Section 5. draws some conclusions and presents some future directions of our research.
2. Methods
In this section we describe our developmental model and the simulator that implements it. The model supports the collective growth of individual neurons in a continuous, unbounded, three-dimensional space, with the development of neural networks being governed by two different but integrated mechanisms. The first mechanism, which is inspired by L-systems (Prusinkiewicz and Lindenmayer 1990), consists of a set of rules that are repeatedly applied to neurons and which govern discrete decisions such as cell division and axon emission. The second mechanism involves using local forces to govern the interactions between neurons and their constituent components, growth cones and cells (somas). This mechanism is inspired by collective movements in swarm systems and the principles of self-assembly (Bonabeau et al. 1999; Grushin and Reggia 2006; Grushin and Reggia 2008). However, as described below, our approach is somewhat atypical of swarm intelligence systems in that it is deterministic and the agents’ local coordinate systems all have the same orientation. The need for probabilistic choices never arose as a design issue, and the rules we developed worked well without introducing noise. The shared orientation was used because it made writing rules qualitatively easier (such a common reference frame might be implemented in biological systems via multiple chemical gradients).
2.1. Agents
The model incorporates a system of agents (particles) that move through the three-dimensional space. There are two classes of agents, both of which are represented as spheres (see Figure 1). Cell agents have radius r_c and growth cone agents have radius r_g, with r_c > r_g. Each agent i has a position \vec{r}_i and a velocity \vec{v}_i. In addition, each cell and growth cone has internal state variables; Table 1 lists them. The cell type indicates whether or not a cell can divide and prescribes a particular role for it when activated in a neural network. A cell of type "A" is an afferent (input) cell, type "E" an excitatory cell, and type "I" an inhibitory cell. Appending "S" to any of the aforementioned cell types means that the cell can divide; if the "S" is absent, then it cannot divide. The life variable L ∈ N influences cell division by restricting the number of generations of descendant cells. If L = 0, then a cell cannot divide. Otherwise, the life variable is often used such that when a cell divides the life variables of its two child cells are L − 1, the life variables of their children are L − 2, and so forth, until L = 0, so the original parent cell (which no longer exists because it divided) will have 2^L Lth-generation descendant cells. The local time variable indicates how long an agent has existed. The tag set partitions agents of the same type into subgroups of agents that have all or some of the same tags. This allows agents with otherwise identical states to execute different rules.
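The agent state just described can be sketched as a small data structure. The class and field names below are illustrative, not taken from the authors' implementation; the division bookkeeping follows the life-variable convention in the text.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    """Cell agent: a sphere in continuous 3-D space with internal state."""
    position: tuple          # position vector r_i
    velocity: tuple          # velocity vector v_i
    cell_type: str           # one of "A", "E", "I", "AS", "ES", "IS"
    life: int                # L; restricts generations of descendants
    local_time: float = 0.0  # how long the agent has existed
    tags: set = field(default_factory=set)  # e.g., {"B0", "E54"}

    def can_divide(self) -> bool:
        # Division requires both the "S"-suffixed cell type and L > 0.
        return self.cell_type.endswith("S") and self.life > 0

def descendant_count(life: int) -> int:
    """If each division passes life L-1 to both children, a parent with
    life L ends up with 2**L Lth-generation descendant cells."""
    return 2 ** life
```

Growth cone agents would carry only the `local_time` and `tags` fields plus position and velocity, since cell type and life apply only to cells (Table 1).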
Figure 1. The growth of an axon from its emitting cell to target cells. Each branch of a growing axon is guided by a growth cone that occurs at its tip. Under the proper conditions, when a growth cone comes in contact with a cell it will establish a connection with that cell, as has already occurred here with the upper target cell. The growth cone is continuing to move upwards, presumably towards another target cell (not shown).

Table 1. Internal State Variables Possessed by Cell and Growth Cone Agents
State Variable | Description | Agent Type
Cell Type | one of the following strings: A, E, I, AS, ES, IS | Cell
Life | natural number | Cell
Local Time | nonnegative real number | Cell, Growth Cone
Tag Set | set of alphanumeric strings, e.g., {B0, E54, A3} | Cell, Growth Cone
In this model, a neural network is specified by its geometry (relative spatial positions of the cells) and its topology (connectedness of the cells). The directed edges between cells are referred to as axons. Cells emit growth cones, which occur at the tips of axon branches and guide their growth. An axon branch is implemented as a persistent trail of small, discrete, connected segments deposited behind a moving growth cone. When a growth cone comes in contact with a cell it can establish a connection directed from the cell that emitted the growth cone to the cell that it contacts. In this way the topology of a network is generated by the collective dynamics of the growth cones and cells. Most past developmental models of neural growth do not incorporate dendritic trees, and neither does our model. However, the framework we present here is sufficiently general that they could be added.
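The trail mechanism just described can be sketched as follows. The segment length, agent radii, and class names here are assumptions for illustration, not values given in the chapter.

```python
import math

SEGMENT_LEN = 0.1         # assumed length of one deposited trail segment
R_CELL, R_CONE = 0.5, 0.2 # assumed agent radii, with r_c > r_g

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Axon:
    """One axon branch: a persistent trail of small connected segments
    deposited behind a moving growth cone."""
    def __init__(self, source_cell, start):
        self.source_cell = source_cell  # emitting cell (origin of the edge)
        self.trail = [start]            # deposited segment endpoints
        self.tip = start                # current growth cone position

    def move_tip(self, new_pos):
        # Deposit a segment endpoint each time the cone has moved one
        # segment length beyond the last deposited point.
        while dist(self.trail[-1], new_pos) >= SEGMENT_LEN:
            last = self.trail[-1]
            d = dist(last, new_pos)
            self.trail.append(tuple(l + (n - l) * SEGMENT_LEN / d
                                    for l, n in zip(last, new_pos)))
        self.tip = new_pos

    def touches(self, cell_pos):
        # Sphere contact: a directed connection (source -> target) may
        # form when the cone's boundary intersects the cell's boundary.
        return dist(self.tip, cell_pos) <= R_CELL + R_CONE
```

On contact, the model would record a directed edge from `source_cell` to the contacted cell, which is how the network's topology emerges from the cones' collective dynamics.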
2.2. Rules
The behavior of the agents (particles) is governed in part by a set of rules that control actions such as cell and growth cone division and axon emission, and which manipulate the
values of certain internal state variables such as the tag set. The rules are in the form of if-then statements in which the antecedent is a predicate that must be satisfied in order for an agent to execute the consequent, which is an instruction. An instruction is a command accompanied by the information needed to carry out the command. A command may be thought of as a function in which the information needed to execute the command is passed as arguments to the function. During each time step, every agent looks through its rule set and finds the subset of rules with satisfied antecedents that it may execute. If this subset contains rules that may not be executed on the same time step, then conflict resolution is applied to remove rules from the subset until all conflicts are resolved. In general, each rule takes the form
<Agent Type; Cell Type; Tags; Local Times> ⟹ <Command; arg1; arg2; ...>.
For each agent type, Table 2 lists the requirements that may be specified in each predicate (antecedent) field and which must be met by an agent in order for it to execute the corresponding instruction. Table 3 describes the meaning of each command function and the arguments that each one accepts. Here tags is a set of alphanumeric strings; force ∈ R^3 represents a force acting on the agent, and is specified in spherical coordinates [θ, φ, r], where θ ∈ [0°, 360°), φ ∈ [0°, 180°] and r ∈ [0, ∞); duration ∈ R^+ is a length of time during which an agent applies or is subject to a force; cellType is a string (see Table 1); life ∈ N; direction is a unit vector in R^3 indicating the orientation of an axon emission or a cell or growth cone division, and is specified in spherical coordinates [θ, φ] with r = 1; magnitude ∈ R^+ is the maximum magnitude of a force; and sign ∈ {"+", "−"} indicates whether a force is attractive or repulsive. The local growth force (LGF) and the rule-based force are defined in Section 2.3. Additionally, there are three special symbols (*, #, and +) that may be specified in certain predicate fields or passed as arguments to certain command functions. The * symbol may be placed in the Tags or Local Times predicate fields, in which case it indicates that an agent satisfies the predicate field with any tag set or any local time, respectively. In GenCell and GenCone commands (see Table 3) the tags and life variables may be set to the # symbol. If tags = #, then the newly created agent, a cell in the case of a GenCell command and a growth cone in the case of a GenCone command, will have the same set of tags as the agent executing the rule. If life = #, then the life state variable of the newly created cell will be one less than the life of the cell executing the rule, or zero.
When the + symbol is present in a set of tags specified by the variable tags, it indicates that when an agent executes the corresponding rule, the alphabetically last tag in the executing agent's tag set is to have a value of one added to its numeric portion, and that this new tag is to be added to the set tags for the current execution of the rule. For example, the rule

(Cell; ES; {B0, E2}; *) ⇒ <DivCell; [0, 90]; <GenCell; ES; {B0}; #>; <GenCell; IS; {D0, F1, +}; 3>>
instructs a cell to divide. To execute this rule a cell agent must have a cell type of “ES”, either a “B0” or an “E2” tag, or both tags, and may have any local time (indicated by the * symbol). When a cell satisfies these conditions it divides along the x-axis (specified by spherical coordinates [θ = 0, φ = 90]) into two child cells. The first child cell is of cell type
Table 2. Conditions That May Be Specified in the Predicate Fields

Agent Type: Cell or Growth Cone.
Cell Type: Agent must have the specified cell type (null if Agent Type is Growth Cone).
Tags: Agent must have all tags in the specified set; agent must have at least one tag in the specified set; or agent may have any tag set.
Local Times: Agent's local time must match one time in the specified set, or agent may have any local time.
“ES”, has “B0” as its only tag, and its life is one less than its parent’s life or zero (indicated by the # symbol). The second child cell is of cell type “IS”, its tag set consists of “D0”, “F1” and “E3”, and its life is 3. The plus symbol in the second GenCell command indicates that the new cell is to be given the tag that immediately follows the alphabetically last tag in the parent cell’s tag set (hence the “E3”). Upon being created the local time of each child cell is set to 0.0. As a second example, the rule
(Cone; AND{B2, D0}; {0.5, 2.0}) ⇒ <SetRuleForce; [45, 45, 5]; 2.5>

sets a growth cone agent's rule-based force. In order to execute this rule a growth cone agent must have both "B2" and "D0" in its tag set, which is indicated by the AND preceding the specified set. It must also have a local time of either 0.5 or 2.0. If these conditions are met, then the rule-based force (explained later) of the growth cone agent is set to the spherical coordinate [θ, φ, r] = [45°, 45°, 5] for 2.5 time units or until this agent executes another SetRuleForce command.
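The predicate side of these rules can be sketched as a simple matching function. The dictionary representation and parameter names are illustrative; the sketch covers the * wildcard and both tag-matching modes used in the examples above.

```python
def predicate_matches(agent, p_agent_type, p_cell_type, p_tags, p_times,
                      tags_mode="any"):
    """Illustrative check of a rule antecedent against an agent's state.

    p_tags and p_times may be the wildcard "*". tags_mode selects between
    the default 'at least one tag in the set' semantics and the AND{...}
    'all tags in the set' semantics."""
    if agent["agent_type"] != p_agent_type:
        return False
    # The Cell Type field applies only to cells; it is null for cones.
    if p_agent_type == "Cell" and agent["cell_type"] != p_cell_type:
        return False
    if p_tags != "*":
        if tags_mode == "all":                      # AND{...} form
            if not set(p_tags) <= agent["tags"]:
                return False
        elif not (set(p_tags) & agent["tags"]):     # at-least-one form
            return False
    if p_times != "*" and agent["local_time"] not in p_times:
        return False
    return True
```

Applied to the two example rules: a type-"ES" cell with tag "B0" satisfies (Cell; ES; {B0, E2}; *), and a growth cone with tags {B2, D0} at local time 0.5 satisfies the AND-form antecedent of the SetRuleForce rule.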
2.3. Forces
In addition to rules, agent dynamics are governed by force-based interactions between agents, and between agents and their environment. These "forces" are not physical forces but are intended to represent influences on an agent, including agent-to-agent interactions that are responsible for generating collective movements as in past swarm intelligence systems (Kennedy and Eberhart 1995; Reynolds 1987; Rodriguez and Reggia 2004), agents' interactions with their environment (including collisions), and a separate force that is controlled by the rule set. The forces responsible for producing collective movements are \vec{F}_{cell} and \vec{F}_{cone}, which produce flocking-like behavior among the cells and growth cones respectively, and \vec{F}_{lgf}, which causes growth cones to be attracted to or repelled from cells in their local vicinity. Environmental forces include \vec{F}_{drag}, which induces a viscous drag on all agents, and \vec{F}_{collision}, which approximates collisions between agents. Finally, \vec{F}_{rule} is a rule-dependent force that is specific to each agent. The force-based dynamics of the agents obey Newton's second law, which states that the time rate of change of the momentum of an agent is equal to the net force upon it.
Table 3. Command Functions Used as Rule Consequents

GenCell(cellType, tags, life) | Generates a new cell with the specified cell type, tag set and life.
GenCone(tags) | Generates a new growth cone with the specified tag set.
SetTags(tags) | Sets the agent's tag set to the one specified.
SetRuleForce(force, duration) | Constant force applied to the agent for the specified duration or until it executes another SetRuleForce command.
SetLGF(sign, magnitude, duration) | Cell sets its LGF parameter ±k to the specified magnitude and sign for the indicated duration or until it executes another SetLGF command.
SetCellType(cellType) | Sets the cell's type to the specified cell type.
EmitAxon(direction, GenCone) | Cell emits an axon in the specified direction.
DivCell(direction, GenCell, GenCell) | Cell divides into two new cells along the specified axis of division.
DivCone(direction, GenCone) | Child growth cone splits off from its parent along the specified axis of division, which results in a new axon branch.
Assume that there are n cells and m growth cones, that the mass of each cell is m_c, and that the mass of each growth cone is m_g, with m_c > m_g. Then the force-based movements of cells are governed by

\[ \frac{d^2\vec{r}_i}{dt^2} = \frac{1}{m_c}\left[\vec{F}_{drag}(\vec{v}_i) + \vec{F}^i_{rule}(t) + \sum_{j \ne i}^{n}\left(\vec{F}_{cell}(\vec{r}_{ji}) + \vec{F}_{collision}(\vec{r}_{ji})\right)\right] \quad (1) \]

where i = 1, 2, 3, ..., n, \vec{r}_i is the position vector of cell i, \vec{v}_i is its velocity, \vec{r}_{ji} = \vec{r}_i − \vec{r}_j, and the summation is taken over the cells. Likewise, the growth cones' movements are governed by

\[ \frac{d^2\vec{r}_i}{dt^2} = \frac{1}{m_g}\left[\vec{F}_{drag}(\vec{v}_i) + \vec{F}^i_{rule}(t) + \sum_{j \ne i}^{m}\vec{F}_{cone}(\vec{r}_{ji}) + \sum_{j \ne i}^{n}\vec{F}_{lgf}(\vec{r}_{ji}) + \sum_{j \ne i}^{n+m}\vec{F}_{collision}(\vec{r}_{ji})\right] \quad (2) \]

where i = 1, 2, 3, ..., m, \vec{r}_i is the position vector of growth cone i, the first summation is taken over the growth cones, the second summation is taken over the cells, and the third summation is taken over all cells and growth cones. We describe each of these individual forces below. A rule called the movement criterion causes agents that are moving slowly and not accelerating much to come to rest. Let \vec{F}^i_{net} be the net force on the ith agent given by Equation 1 or 2, excluding the frictional force \vec{F}_{drag}. The movement criterion is
If \|\vec{F}^i_{net}\| \le \epsilon and \|\vec{v}_i\| \le \delta, then \vec{F}^i_{net} \leftarrow \vec{0} and \vec{v}_i \leftarrow \vec{0}.
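The criterion reduces to a small thresholding step applied each time step, sketched below; the ε and δ values in the test are illustrative (the chapter notes they depend on agent type).

```python
import math

def apply_movement_criterion(f_net, v, eps, delta):
    """Zero out the net force and velocity of a slow, weakly forced agent.

    f_net is the net force from Eq. 1 or 2 excluding the drag force,
    per the criterion's definition; v is the agent's velocity."""
    norm = lambda u: math.sqrt(sum(x * x for x in u))
    if norm(f_net) <= eps and norm(v) <= delta:
        return (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
    return f_net, v
```

An agent brought to rest this way stays motionless until the non-drag net force on it again exceeds ε, which is what stabilizes the final network configuration.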
Here, \epsilon and \delta are nonnegative constants that are usually small relative to the magnitudes of the typical net forces and velocities in the system, and have values that depend on the agent type. This rule implies that a motionless agent will not accelerate until the net force on it exceeds \epsilon. The usefulness of this rule is two-fold. First, we often want the velocity of agents to reach zero in a relatively short amount of time in order to achieve a stable neural network configuration, but the viscous frictional force \vec{F}_{drag} on an agent is directly proportional to its velocity, and so the velocity of a slowing agent approaches zero asymptotically. The movement criterion remedies this by forcing a slowing agent's velocity to zero. Second, the movement criterion provides greater control over the static equilibrium configurations that develop among the cells, and does so without the necessity of modifying the intercellular force. These configurations are more stable because the cells are less susceptible to minor disruptive perturbations caused by neighboring cells.

2.3.1. Forces Governing Collective ("Swarm") Movements
The intercellular force governs the local interactions between the cells. It encompasses three interactions commonly used to induce "flocking" or "swarming" behavior in particle systems; namely, collision avoidance (separation), flock centering (cohesion) and velocity matching (alignment) (Reynolds 1987). The intercellular force that a cell j exerts on a cell i is given by

\[ \vec{F}_{cell}(\vec{r}_{ji}) = \begin{cases} (b - c\,r_{ji}^{\alpha})\,e^{-d\,r_{ji}^{\alpha}}\,\hat{r}_{ji}, & \text{if } r_{ji} \le R_{cc} \\ \vec{0}, & \text{if } r_{ji} > R_{cc} \end{cases} \quad (3) \]

where R_{cc} is the size of a cell's local neighborhood when interacting with other cells, \vec{r}_{ji} = \vec{r}_i − \vec{r}_j is the vector that points from cell j to cell i, r_{ji} = \|\vec{r}_{ji}\|, \hat{r}_{ji} = \vec{r}_{ji}/r_{ji}, and b, c, d, α ∈ R^+ are parameters used to tailor the shape of the function. It has a smooth transition from being repulsive at relatively close distances to attractive at further separations, and then decreasing to zero at still farther separations, as illustrated in Figure 2. It thus captures both the collision avoidance and flock centering influences, and although it does not explicitly account for velocity matching, this influence arises as a consequence of the attractive aspect of the force. Note that the cut-off at \|\vec{r}_{ji}\| = R_{cc} causes the force to adhere to the local-interactions-only restriction. Further, the distance at which the transition from repulsion to attraction occurs sets up a characteristic equilibrium separation among cells in close proximity.

Figure 2. The magnitude and sign of the intercellular force as a function of the distance between two cells. A positive value indicates repulsion and a negative value indicates attraction (Eq. 3 with R_{cc} = 3.0, b = 15.0, c = 12.6, d = 0.68 and α = 1.84).

Like cells, growth cones also interact through local forces inspired by flocking behavior. In this case, however, implementing the three forces (separation, cohesion and alignment) explicitly produced more useful collective dynamics. The combined effect of these three forces is represented by a single intercone force \vec{F}_{cone} = \vec{F}_s + \vec{F}_c + \vec{F}_a. Let \vec{r}_i be the position of the ith growth cone and \vec{v}_i its velocity. Define R_{gg} as its neighborhood radius and N_i as the set of growth cones within a distance R_{gg}. The position of the ith growth cone relative to its neighboring growth cones in N_i is given by \vec{\rho}_i = \vec{r}_i − \frac{1}{|N_i|}\sum_{j \in N_i}\vec{r}_j and its relative velocity is \vec{\mu}_i = \vec{v}_i − \frac{1}{|N_i|}\sum_{j \in N_i}\vec{v}_j. The separation force \vec{F}_s acts as a repulsive influence between neighboring growth cones and prevents them from clustering together too closely. It takes the form

\[ \vec{F}_s(\vec{\rho}_i) = k_s\left(1 - \frac{\rho_i}{R_{gg}}\right)^2 \hat{\rho}_i. \quad (4) \]
The cohesion force \vec{F}_c causes neighboring growth cones to be attracted to one another, and is given by

\[ \vec{F}_c(\vec{\rho}_i) = -k_c\left(\frac{\rho_i}{R_{gg}}\right)^2 \hat{\rho}_i. \quad (5) \]
The alignment force $\vec{F}_a$ causes neighboring growth cones to have similar velocities, which increases the uniformity of the collective movements. It takes the form

$$
\vec{F}_a(\vec{\rho}_i, \vec{\mu}_i) = -k_a \left(\frac{\rho_i}{R_{gg}}\right)^2 \hat{\mu}_i. \qquad (6)
$$
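A minimal Python sketch of the three intercone forces of Eqs. 4-6, for 2-D positions and velocities; the function name is ours, and the default parameter values (ks = kc = ka = 10.0, Rgg = 2.5) are taken from Table 4:

```python
import math

def intercone_force(r_i, v_i, neighbor_rs, neighbor_vs,
                    ks=10.0, kc=10.0, ka=10.0, r_gg=2.5):
    """Combined separation + cohesion + alignment force (Eqs. 4-6), 2-D."""
    n = len(neighbor_rs)
    # Relative position rho_i and relative velocity mu_i w.r.t. neighbors.
    rho = tuple(r_i[k] - sum(r[k] for r in neighbor_rs) / n for k in (0, 1))
    mu = tuple(v_i[k] - sum(v[k] for v in neighbor_vs) / n for k in (0, 1))
    rho_len = math.hypot(rho[0], rho[1])
    mu_len = math.hypot(mu[0], mu[1])
    rho_hat = tuple(x / rho_len for x in rho) if rho_len > 0 else (0.0, 0.0)
    mu_hat = tuple(x / mu_len for x in mu) if mu_len > 0 else (0.0, 0.0)
    fs = ks * (1 - rho_len / r_gg) ** 2   # separation (repulsive, Eq. 4)
    fc = -kc * (rho_len / r_gg) ** 2      # cohesion (attractive, Eq. 5)
    fa = -ka * (rho_len / r_gg) ** 2      # alignment (Eq. 6)
    return tuple((fs + fc) * rho_hat[k] + fa * mu_hat[k] for k in (0, 1))
```

At small ρi the separation term dominates, so the net force points along ρ̂i, away from the neighbors' centroid; as ρi approaches Rgg, cohesion dominates and the force pulls the cone back toward the group.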
Here $k_s, k_c, k_a \in \mathbb{R}^+$, $\hat{\rho}_i$ and $\hat{\mu}_i$ are unit vectors, and $\rho_i = \|\vec{\rho}_i\|$.

A local growth force $\vec{F}_{lgf}$ also affects growth cones and is partly responsible for guiding axons from emitting cells to target cells. It represents a local force field that can appear and disappear around specific cells at certain times during the simulation, depending upon information encoded in the rule set. As a growth cone passes through these fields its motion is affected, driving it towards or away from nearby cells. $\vec{F}_{lgf}$ is given by
Swarm Intelligence for the Self-Assembly of Neural Networks
$$
\vec{F}_{lgf}(\vec{r}_{ji}) = \begin{cases} \pm k\,(r_{ji} + 1)^{-\beta}\,\hat{r}_{ji}, & \text{if } r_{ji} \le R_{cg} \\ 0, & \text{if } r_{ji} > R_{cg} \end{cases} \qquad (7)
$$
where $r_{ji}$ is the distance between the ith growth cone and jth cell, $\hat{r}_{ji}$ is the unit vector pointing from the cell to the growth cone, $k, \beta \in \mathbb{R}^+$, and $R_{cg}$ is the radius of interaction between cells and growth cones. As specified in Table 3, the sign and magnitude of the parameter $\pm k$ are set using the SetLGF command function.

2.3.2. Environment and Rule-Based Forces

All agents are subject to a drag-like friction force that limits their acceleration and prevents the center of mass of the agents from drifting continuously. This makes the model a dissipative system that, under most circumstances, comes to rest within a reasonable period of time, forming a static network with a particular geometric structure. The force is given by
$$
\vec{F}_{drag}(\vec{v}_i) = -c_d\,\vec{v}_i \qquad (8)
$$

where $\vec{v}_i$ is the velocity of the ith agent and $c_d \in \mathbb{R}^+$.

In order to make the model more realistic in a physical sense, agents collide rather than simply passing through one another. Recall that both cell and growth cone agents are modeled as spheres, not as points. Collisions are modeled using a penalty method (also known as soft collisions), in which a very strong, short-range force is engaged between two agents when their boundaries intersect. This force is on the order of 10 to 1,000, depending on the type of agents colliding and the distance between their centers. Consider an agent i and an agent j, and let $d_c$ equal the sum of the two agents' radii. Then the collision force exerted on agent i is given by

$$
\vec{F}_{collision}(\vec{r}_{ji}) = \begin{cases} c_f\,(d_c - \|\vec{r}_{ji}\| + 1)^{\gamma}\,\hat{r}_{ji}, & \text{if } \|\vec{r}_{ji}\| \le d_c \\ 0, & \text{if } \|\vec{r}_{ji}\| > d_c \end{cases} \qquad (9)
$$

where $\vec{r}_{ji}$ is the vector that points from the center of agent j to the center of agent i, $\hat{r}_{ji}$ is the corresponding unit vector, and $c_f, \gamma \in \mathbb{R}^+$ are constants.

Recall that the command SetRuleForce causes a constant force to be applied to an agent for a finite period of time (Section 2.2, Table 3). An agent may experience a sequence of such forces that depends entirely on its execution of these commands. For the ith agent, the time-dependent function representing this sequence of forces is expressed as $\vec{F}^{\,i}_{rule}(t)$ and will be referred to as the rule-based force. This force is implemented as a means of tailoring the movements of the agents. It is especially useful for manipulating the trajectories of growing axons and for positioning cells.
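A minimal sketch of the penalty (soft-collision) force magnitude of Eq. 9; the values cf = 75.0 and γ = 3.0 are, as we read Table 4, the cell/growth-cone collision parameters, and the function name is ours:

```python
def f_collision_magnitude(r, d_c, cf=75.0, gamma=3.0):
    """Magnitude of the penalty collision force (Eq. 9) along r_hat_ji.

    r is the center-to-center distance; d_c is the sum of the two radii.
    The force engages only while the agents' boundaries intersect.
    """
    if r > d_c:
        return 0.0
    return cf * (d_c - r + 1) ** gamma

# Two cells of radius 0.5 each, so d_c = 1.0:
print(f_collision_magnitude(1.2, 1.0))  # 0.0 -> not touching
print(f_collision_magnitude(1.0, 1.0))  # 75.0 -> boundaries just touch
# The magnitude grows steeply as the overlap deepens, which is what
# keeps agents from passing through one another.
```

Note the "+ 1" inside the parenthesis: it makes the force jump to cf at first contact rather than ramping up from zero, consistent with the "very strong, short range" description in the text.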
2.4. Experimental Methods
We undertook five computational experiments. Two of the experiments were designed to study the effectiveness of our approach at guiding the self-assembly of relatively large, pre-specified neural network architectures. The other three experiments were aimed at determining the impacts of collective movements on the robustness of our model.
2.4.1. Computational Experiments

In the first two experiments we used our model to grow self-organizing maps (SOM's) (Kohonen 2001; Ritter et al. 1992). SOM's are widely-studied neural networks that have the capacity to learn statistical patterns or features in their input data in an unsupervised manner. There is a great deal of interest in SOM's both for their usefulness as computational tools (Delgado 2000; Haykin 1999; Lendasse et al. 1998; Pulakka and Kujanpa 1998; Vesanto 1999; Vesanto and Alhoniemi 2000), and because they occur frequently in the brain, in which sensory inputs are mapped in a topographically ordered manner onto different regions such as the cerebral cortex (Farkaš and Miikkulainen 1999; Haessly et al. 1995; von der Malsburg 1973; Pearson et al. 1987; Sutton et al. 1994). They typically have two layers of neurons: an input layer, and an output layer in which map formation occurs. Upon receiving input, the neurons in the output layer compete for activity until only a small portion of them remain active (the winning neurons) while the rest exhibit little to no activity. The Hebbian learning mechanism used is such that for a given input pattern the winning neurons will tend to respond more strongly the next time that pattern or a similar pattern is presented to the network. This learning process results in the formation of a topographic or feature map, in which neurons in the output layer that are close together tend to respond strongly to similar inputs, and the number of neurons that respond strongly to a particular input typically correlates with the frequency with which that input was presented to the network during training. For these experiments we selected two previously published SOM's of idealized cerebral cortex regions to use as target structures. One is a topographic map of a patch of somatosensory cortex involving restricted input-to-cortex connectivity (Sutton et al.
1994), and the other is a more complex feature map of a patch of visual cortex involving full input-to-cortex connectivity as well as locally connected excitatory and inhibitory cortical neurons (von der Malsburg 1973). These two networks were originally hand-designed and are representative of a broad class of neural network models of self-organizing maps in the cerebral cortex that have generally been manually constructed in the past. The versions of these networks grown with our model do not necessarily reproduce the patterns of connectivity present in the biological neural networks better than the two original SOM models they are derived from because the intent was to grow networks that approximate the topologies of the models. However, our SOM’s are more like biological systems in that they do not have periodic boundary conditions and their connectivity reproduces that of the original networks in a statistical sense. Because of this difference, once the networks were fully grown and stable, we assessed their abilities to actually form maps during unsupervised Hebbian learning, comparing their functionality in this respect to that of the original models. The SOM’s that our model was used to grow are especially challenging for developmental models to generate, particularly those that incorporate a continuous growth process, for several reasons. The networks are large, each consisting of more than 600 neurons and more than 20,000 connections. There is a significant amount of concurrent axonal growth, which is more like what occurs biologically but complicates the environment in which development takes place. Also, because our model accounts for network geometry in addition to topology, we have the added challenge of dealing with inhomogeneities in the positions of
the neurons. Lastly, the growing axons respond to different rules at different times and under different environmental conditions, which means that in the absence of inter-agent forces it is typically very difficult to find an appropriate set of parameters and a parsimonious set of rules that result in the growth of the desired network. Because of the difficulty in specifying rules and parameters to guide network development without inter-agent forces, many of the larger, more complex networks grown by past developmental models have relied upon rules and parameters derived by optimization methods, such as genetic algorithms or genetic programming, as opposed to being hand-written (Astor and Adami 2000; Fleischer 1995), and even these networks are far less complex than the SOM's our approach has grown, which are based on fixed parameter values and hand-written rule sets, each of which required only about 5 person-days to write. The computational experiments presented herein thus demonstrate the effectiveness of incorporating collective movements into the model and treating the growth process of relatively large and complex networks as one of self-assembly. They also show that the needed rule sets are small enough and intuitive enough that they can be written by hand without the aid of an optimization method. To assess the success of our approach, we evaluate the resulting network architectures by visual inspection, by use of the M1 and M2 similarity measures described below, and by running the resultant networks after self-assembly to verify that they produce good self-organized maps. One of the significant drawbacks of many past developmental models of neurogenesis is their lack of robustness. That is, relatively minor changes in the rules or parameters of these models often result in large and unpredictable changes in network growth, which in turn makes it very difficult to control the characteristics of the networks that ultimately emerge.
In the third, fourth and fifth experiments we used our method to grow networks from a variety of randomly modified rule sets, using both a version of our approach that incorporates collective movements and one that does not. Along with these variations we manipulated the cohesion and velocity alignment forces (Eqs. 5 and 6) by varying the parameters kc and ka respectively. These experiments study the effects that swarm intelligence produced by collective movements has on the robustness of our model.

2.4.2. Implementation Details

Our model is implemented as a simulation environment written in Java. The system of ordinary differential equations given by Equations 1 and 2, which govern the force-based dynamics of the cells and growth cones respectively, has no analytical solution and must therefore be solved numerically. To do so we use the Forward Euler method with a time-step size of 0.02. This numerical scheme allows the simulator to produce dynamics that accurately reflect the growth processes pertinent to our simulations and computational experiments. In the simulations of cortical network models below, the environment in which networks grow is unbounded and no particular units are assigned to time, distance and mass. Cells and growth cones have radii of rc = 0.5 and rg = 0.17 respectively, and the networks we grow have spatial extents of approximately 20 distance units. Table 4 lists the parameter values used in the simulations for each force and for the movement criterion. A few additional conditions are placed on the agents, as follows. The velocities of all agents are bounded above by vmax = 10.0. A cell agent of type "AS", "ES" or "IS" dies (disappears) as soon
as its Local Time exceeds 0.2 time units. This condition ensures the removal of cells that have the ability to divide but will not because they do not satisfy the appropriate predicates in the rule set. If a growth cone comes in contact with a cell, it establishes a connection with that cell if and only if the cell is of type “E” or “I” and is inducing an attractive local growth force. Whenever new cell or growth cone agents are created, which occurs when a “parent” cell or growth cone executes a rule containing an EmitAxon, DivCell or DivCone command, their initial velocities are the same as the velocity of their parent agent at the time the rule was executed. When a cell divides the centers of the two child cells that replace it are initially a distance of 1.05rc = 0.525 from where the parent cell’s center was located. When an axon branches the newly formed growth cone is a distance of 2rg = 0.34 from the center of its parent growth cone. When a cell emits an axon the center of the newly created growth cone that guides the axon is a distance of 2rc + rg + 0.1 = 1.27 from the center of the emitting cell. All agents share a common reference frame (orientation). This means that given a rule that has a command function that takes a vector as an argument (e.g., variables direction or force), all agents that execute the rule will interpret the vector with respect to the same fixed coordinate system (basis), even though the agents do not have any explicit reference to a global coordinate system. Lastly, a growth cone agent dies, along with the axon branch connected to it, if its Local Time exceeds 3.5 time units. This property is reminiscent of the “pruning” of growing axons that occurs in developing biological nervous systems, and it ensures the removal of the very small fraction of growing axons that originate from cells at layer boundaries and fail to establish connections with cells in neighboring layers. 
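The Forward Euler scheme described above can be illustrated with a minimal sketch: a single agent under only the drag force of Eq. 8, using cd = 5.0 and dt = 0.02 from the text and assuming unit mass (m = mc = 1.0); variable names are ours. The speed decays geometrically, which is why the system is dissipative and settles into a static network:

```python
def euler_drag_step(v, m=1.0, cd=5.0, dt=0.02):
    """One Forward Euler step for m*dv/dt = -cd*v (Eq. 8 only), 2-D."""
    ax = -cd * v[0] / m
    ay = -cd * v[1] / m
    return (v[0] + dt * ax, v[1] + dt * ay)

v = (10.0, 0.0)          # start at the velocity cap v_max
for _ in range(200):     # 200 steps = 4 time units
    v = euler_drag_step(v)

# Each step multiplies the speed by (1 - cd*dt/m) = 0.9, so after
# 200 steps the agent has essentially come to rest.
print(abs(v[0]) < 1e-6)
```

In the full simulator the other forces (Eqs. 3-7, 9, and the rule-based force) would be summed into the acceleration at each step; this fragment isolates only the dissipative term.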
The five computational experiments each ran on a computer with two dual-core 2.66GHz Intel Xeon processors, 4 GB of RAM, and 4,096 KB of L2 cache. Two of the experiments consist of growing large recurrent neural networks based on the somatosensory cortex model presented in (Sutton et al. 1994) and the visual cortex model presented in (von der Malsburg 1973). It takes 1.3 hours of CPU time to grow the somatosensory cortex network (612 cells and 21,188 connections), and 3.8 hours to grow the visual cortex network (631 cells and 46,030 connections). In each of the other three computational experiments a network consisting of 551 cells and approximately 61 connections is grown 1,616 times using different rule sets. Each of these three experiments requires approximately 9 hours of CPU time to complete.

2.4.3. Connectivity Measures

Our simulator grows networks in which a neuron may connect with another neuron multiple times and with topologies that are similar, in a statistical sense, to precisely specified templates of connectivity. This means that given a neural network model with a precisely specified topology, such as those often used in neural network applications (e.g., von der Malsburg 1973; Sutton et al. 1994), it is necessary to measure how statistically similar the topology of the network grown by the simulator is to that of the template network. This is accomplished by defining two similarity measures, applying them to the connectivity of each neuron individually, and then computing the average of the two measures over all neurons in the network. Given a set of target neurons T that a neuron n is intended to connect to, the first measure
Table 4. Parameter Values Used in the Computational Experiments

Force                                        Parameter(s) and Value(s)
Equations of Motion (Eqs. 1, 2)              mc = 1.0, mg = 0.5
F_cell (Eq. 3)                               Rcc = 3.0, α = 1.84, b = 15.0, c = 12.6, d = 0.68
F_cone (Eqs. 4, 5, 6)                        Rgg = 2.5, ks = 10.0, kc = 10.0, ka = 10.0
F_lgf (Eq. 7)                                Rcg = 3.0, ±k = +10.0, β = 1.66
F_drag (Eq. 8)                               cd = 5.0
Agent radii                                  rc = 0.5, rg = 0.17
F_collision (Eq. 9), cells and growth cones  cf = 75.0, γ = 3.0
F_collision (Eq. 9), between cells           cf = 100.0, γ = 5.0
F_collision (Eq. 9), between growth cones    cf = 25.0, γ = 3.0
Movement criterion for cells                 δ = 20.0
Movement criterion for growth cones          δ = 10.0
M1 = the percentage of connections from n that connect to a neuron in T

quantifies the degree to which a neuron makes connections to target neurons as opposed to nontarget neurons. The second measure

M2 = the percentage of neurons in T that receive at least one connection from n

quantifies the degree to which a neuron makes a connection to all of its target neurons. The averages are taken over all neurons so as to capture the global similarity of the grown network to its template network. The connectivity template is adhered to exactly when both M1 and M2 are 100%.
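A minimal sketch of the two per-neuron similarity measures, given a neuron n's outgoing connections (a list of target neuron ids, with repeats for multiple connections) and its intended target set T; the function name is ours:

```python
def similarity_measures(connections, targets):
    """Per-neuron M1 and M2, both as percentages.

    connections: ids of neurons that n actually connects to
                 (a repeated id means multiple connections to that neuron).
    targets:     set of neuron ids that n is intended to connect to.
    """
    # M1: share of n's connections that land on some target neuron.
    m1 = 100.0 * sum(1 for c in connections if c in targets) / len(connections)
    # M2: share of target neurons that receive at least one connection.
    m2 = 100.0 * len(targets & set(connections)) / len(targets)
    return m1, m2

m1, m2 = similarity_measures([1, 1, 2, 5], {1, 2, 3})
print(round(m1, 1), round(m2, 1))  # 75.0 66.7
```

The network-level scores reported in the text are simply these per-neuron values averaged over all neurons.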
Two of the networks grown by our simulator use the networks discussed in (von der Malsburg 1973) and (Sutton et al. 1994) as templates. These papers present a number of experiments that demonstrate some of the computational properties of the template networks. We performed a number of similar experiments using the corresponding networks grown by the simulator, the results of which are used as an additional gauge of the topological similarity between the grown networks and their templates. The geometries of these template networks were general enough (e.g., multiple uniform layers of cells) that it was sufficient to determine the similarity of a grown network's geometry in a qualitative manner by visualizing it using the simulator's 3D graphics component. The desired numbers of cells in the grown networks were obtained by writing rules having appropriate values for the life variable. Though important for the growth process, the particular paths taken by growing axons and migrating cells were not rigorously analyzed because the computational properties of the grown networks depend only on their topologies, and not on qualities such as the lengths of their axons.
3. Results
Here we describe the results of the five computational experiments we performed with our model. The results of the first two experiments demonstrate the ability of our model to grow large recurrent neural networks based on specific target networks, and that the computational properties of the grown networks are similar to those exhibited by their target models. The results of the other three experiments illustrate the improvements in robustness that can be gained by incorporating swarm intelligence in the form of collective movements into the neural self-assembly process.
3.1. Somatosensory Cortex Model
In (Sutton et al. 1994), the authors present a SOM of the primary somatosensory cortex (area S1) in which the network input represents focal stimulation of a hand consisting of four fingers and a palm, and the topographic map depicts different areas of the hand and the frequency with which particular regions have been touched. The first layer of the network represents a region of the thalamus and the second layer represents a region of the primary somatosensory cortex. Both the thalamic and cortical neurons act as excitatory cells, and input to the network from the hand enters through the thalamic layer. Figure 3a illustrates the patterns of connectivity in this recurrently connected network. Each neuron in the thalamus connects to its 30 closest cortical neurons and each cortical neuron connects to its six nearest cortical neighbors. In (Sutton et al. 1994), it was demonstrated through a series of computational experiments that this model is sufficient to reproduce a number of important properties exhibited by cortical maps. For example, it was shown that in response to uniformly distributed input stimuli the SOM organizes into a highly-refined topographic map from an initially coarse topographic representation, and that repeated stimulation of a localized region of the sensory layer results in a substantial increase in the degree of cortical representation of that region. We applied our method to grow a network, using the network underlying the self-organizing map described in (Sutton et al. 1994) as a target. The early stages of the
Figure 3. a. The connectivity of the somatosensory cortex model. The planes represent the layers of cortical and thalamic neurons. b. The connectivity of the visual cortex model. The afferent layer corresponds to retina, the other two layers to cortical neurons. The arrows indicate general inter/intra-layer patterns of connectivity.

growth process are shown in Figure 4. This network consists of two layers of neurons, each of which is required to establish roughly topographic connections with sizable groups of target neurons, and to do so in an environment complicated by a substantial amount of concurrent axonal development. The grown network is quite large, consisting of 612 cells and 21,188 connections, and is a recurrent network, which further increases the difficulty of growing it. The similarity measures of the fully grown network were M1 = 87% and M2 = 76%. Our model used a rule set consisting of 58 rules to grow the network. Twelve of these rules regulate cell divisions and dynamics, 44 regulate axon emissions, branching and dynamics, and 2 control the local growth force. This rule set and the initial conditions can be found in (Martin and Reggia 2010), along with a brief explanation of how we design rule sets. The growth process was considered to be completed once each cell had emitted all of its axons and no axons were still growing. To assess the degree to which our grown S1 network possesses the computational properties of the original S1 model (Sutton et al. 1994), we tested the ability of our grown network to replicate the main map formation results obtained with the original network. More specifically, to measure the characteristics and quality of the topographic maps formed by the grown network, we compared measures of the cortical receptive field of each cortical cell before and after training, using the same measures as in (Sutton et al. 1994). The network grown by our approach consists of two layers of neurons that are parallel to the x-y plane.
Let $x_i$ and $y_i$ be the x and y coordinates of thalamic cell i and $a_{ji}$ the activity level of cortical cell j when an input of 1.0 is applied to thalamic cell i and an input of 0.0 is applied to all other thalamic cells. The total response $r_j$ of cortical cell j is defined as

$$
r_j = \sum_i a_{ji}, \qquad (10)
$$
where the sum is taken over all thalamic cells. The receptive field of cortical cell j, which measures how a cortical cell responds to the activities of the thalamic cells, is characterized Applications of Swarm Intelligence, Nova Science Publishers, Incorporated, 2010. ProQuest Ebook Central,
Figure 4. An example of network growth generated by our model. The large spheres represent cells, the lines between cells denote axons, and the small spheres at the ends of growing axons represent growth cones. a. The network seen here develops from a set of four “seed” cells, shown in their initial positions. b, c. Repeated cell divisions cause the thalamic (bottom) and cortical (top) layers to expand. d. Cell divisions have stopped and the axons are beginning to grow. e. The axons begin to establish connections. The lateral cortical connections are not shown. f. Early on in the growth process additional cells have emitted axons. The axons continue to grow and establish connections.
by its center and its moments. The coordinates of its center are given by

$$
\hat{x}_j = \frac{1}{r_j}\sum_i x_i a_{ji} \quad \text{and} \quad \hat{y}_j = \frac{1}{r_j}\sum_i y_i a_{ji}.
$$

Its x and y moments are defined by

$$
w_{xj} = \sqrt{\frac{1}{r_j}\sum_i a_{ji}\,(x_i - \hat{x}_j)^2} \quad \text{and} \quad w_{yj} = \sqrt{\frac{1}{r_j}\sum_i a_{ji}\,(y_i - \hat{y}_j)^2}.
$$
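The receptive-field statistics above (total response, center, and moments) follow directly from the definitions; a minimal sketch, with names of our own choosing:

```python
import math

def receptive_field(xs, ys, acts):
    """Total response, center, and moments of one cortical cell's
    receptive field (Eq. 10 and the center/moment definitions).

    xs, ys: thalamic cell coordinates; acts: a_ji for each thalamic cell i.
    """
    r = sum(acts)
    cx = sum(x * a for x, a in zip(xs, acts)) / r
    cy = sum(y * a for y, a in zip(ys, acts)) / r
    wx = math.sqrt(sum(a * (x - cx) ** 2 for x, a in zip(xs, acts)) / r)
    wy = math.sqrt(sum(a * (y - cy) ** 2 for y, a in zip(ys, acts)) / r)
    return r, (cx, cy), (wx, wy)

# Two equally active thalamic cells at x = 0 and x = 2:
r, center, moments = receptive_field([0.0, 2.0], [0.0, 0.0], [1.0, 1.0])
print(r, center, moments)  # 2.0 (1.0, 0.0) (1.0, 0.0)
```

Smaller moments mean a more sharply tuned cortical cell, which is why the moment reduction after training is used below as a mark of good map formation.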
The moments measure the spread of a cortical receptive field. We performed the two primary computational experiments concerning map formation presented in (Sutton et al. 1994) using the same activation dynamics, input stimuli, Hebbian synaptic changes, and parameter values, but with the thalamocortical network grown using our method. The first experiment tested the ability of the grown but initially untrained network to self-organize into a well-formed topographic map when trained using a uniformly distributed, random sequence of stimuli. In the second experiment the network trained in the first experiment was subjected to further training, but in this case the random sequence of input stimuli was distributed such that the second finger from the left (finger 2) was seven times more likely to be stimulated than the rest of the hand. This experiment tested the ability of the network to adapt its topographic map in response to a change in the input distribution. The quality of the resultant topographic maps was assessed visually in terms of the same three characteristics used in the original study: reduction in the size of receptive field moments, development of receptive field centers that reflect the probability distribution with which the thalamic neurons were stimulated, and the extent to which receptive field centers of neighboring cortical neurons become close to one another. Figures 5a-c illustrate the first two of these characteristics. The ellipses show the centers and moments of the cortical receptive fields plotted in the x-y plane of the input layer (strictly speaking, these are not the full receptive fields but are proportional in size to them). The outline of the hand (four fingers and a palm) is shown. The receptive fields in Figure 5a are for the initial untrained network. Their moments are relatively large and their centers have a somewhat irregular distribution across the hand, as in the original study.
For the untrained network the average x moment wx is 1.008 and the average y moment wy is 1.031, with standard deviations σwx = 0.417 and σwy = 0.442 respectively. In contrast, the receptive fields in Figure 5b, which are those of the network trained with the uniformly distributed stimuli, have much smaller moments and their centers are more evenly distributed across the hand. In this case wx = 0.582 and wy = 0.578, with standard deviations σwx = 0.184 and σwy = 0.209 respectively. These are two of the characteristics of high quality topographic maps (Sutton et al. 1994). Figure 5c shows the cortical receptive fields derived from a network in which the second finger from the left was seven times more likely to be stimulated during training than the rest of the hand. This bias in the stimulations has caused some of the receptive fields from finger 1 and finger 3 and neighboring regions of the palm to shift their centers towards and into the region of the thalamic layer that represents finger 2. Furthermore, the moments of the receptive fields in the finger 2 region have become smaller on average. Specifically, there are 33 receptive field centers in the finger 2 region of the network trained with uniform stimuli, and for these receptive fields wx = 0.572, wy = 0.593, σwx =
0.148 and σwy = 0.150. For the network trained with the biased stimuli, the number of receptive fields in the finger 2 region has increased to 68, with wx = 0.432, wy = 0.456, σwx = 0.253 and σwy = 0.251 for these receptive fields. This means that more cortical cells have become selectively tuned to stimulations of finger 2. This adaptability to changes in the input distribution is one of the desirable properties of SOM's, and from a qualitative standpoint, these results match up well with those presented in (Sutton et al. 1994). Figures 5d-f illustrate the third characteristic used to measure the quality of topographic maps. The dots represent the centers of the cortical receptive fields. The edges connect dots that belong to receptive fields of cortical neurons whose centers are within a distance of 1.6 of one another, which for most cortical neurons defines a neighborhood consisting of the six nearest cortical neighbors. Figure 5d was derived from the untrained thalamocortical network. The large degree of irregularity in the grid structure (edges cross one another 482 times) is indicative of poor topographic map formation. That is, many of the receptive fields that belong to neighboring cortical neurons are not themselves neighbors in the input space. In contrast, Figure 5e, which comes from the network trained with uniformly distributed stimuli, exhibits a far more regular grid (edges cross one another only 53 times), and thus indicates the formation of a better topographic map. In this case, receptive fields that are topological neighbors tend to be geometric neighbors in the input space as well. Figure 5f shows the reorganization of the topographic map in response to an increase in the frequency with which finger 2 was stimulated relative to the rest of the hand.
It can be seen that the map has adapted, in that a significant number of the receptive field centers in regions surrounding finger 2 have moved towards and into the region of the thalamus representing this finger while largely preserving the topological/geometrical neighbors duality (edges cross one another 198 times). This is further evidence that the network grown via our approach possesses the desirable characteristics of SOM's. These results are all qualitatively very similar to those observed in (Sutton et al. 1994), demonstrating that the grown network is as well suited for SOM formation as the original model.
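The grid-regularity statistic quoted above (the number of times edges cross one another) can be computed with a standard proper-intersection test over all edge pairs; the original study does not spell out its exact procedure, so this sketch, with names of our own choosing, is one plausible implementation. Edges that merely share an endpoint, as neighboring grid edges do, are not counted:

```python
def crossings(edges):
    """Count proper crossings among 2-D segments ((x1, y1), (x2, y2))."""
    def orient(p, q, r):
        # Sign of the cross product (q - p) x (r - p).
        return (q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0])

    count = 0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            a, b = edges[i]
            c, d = edges[j]
            if a in (c, d) or b in (c, d):
                continue  # shared endpoint, not a crossing
            # Proper crossing: each segment's endpoints straddle the other.
            if (orient(a, b, c) * orient(a, b, d) < 0 and
                    orient(c, d, a) * orient(c, d, b) < 0):
                count += 1
    return count

x_shape = [((0, 0), (1, 1)), ((0, 1), (1, 0))]
print(crossings(x_shape))  # 1
```

A well-formed topographic map yields a low count (few tangled edges), while a poorly organized map yields a high one, matching the 482 vs. 53 contrast reported above.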
3.2. Visual Cortex Model
In (von der Malsburg 1973), the author presented the first and one of the most influential models of self-organizing maps. It was proposed as an explanation for how organization of visual cortex (area V1) might arise through self-organization (learning) rather than being entirely genetically predetermined. Unlike the topographic map of somatosensory cortex described in the preceding section, this model is a feature map: a map of the sensitivity of cortical excitatory neurons to the orientation of lines that pass through their receptive fields. Further, this visual cortex model differs from the somatosensory model in that it has a more complex architecture, and one that involves highly constrained recurrent connectivity. Past developmental models of neural network growth have most often focused on the easier task of producing feedforward networks without recurrent connections. Figure 3b shows the architecture of the visual cortex model, which consists of an afferent layer (retina), and two hexagonally tessellated cortical layers of excitatory and inhibitory neurons. Recurrent connectivity arises because each excitatory neuron must connect to its six immediately adjacent excitatory neurons and to a corresponding local patch of inhibitory neurons, the latter of which send connections back to a larger annulus of excitatory neurons. This recurrent
Figure 5. In a.-c. the ellipses indicate the centers and moments of cortical receptive fields plotted in the x-y plane of the input layer. The lines indicate the boundaries between the four fingers and palm of a hand (bottom region). a. Receptive fields of the untrained version of the thalamocortical network grown using our approach are large and have a somewhat irregular spatial distribution. b. Receptive fields from the network trained with uniformly distributed stimuli are much smaller and have a more uniform spatial distribution. c. Receptive fields derived from the thalamocortical network trained with finger 2 seven times more likely to be stimulated than the rest of the hand. Some of the receptive fields from surrounding areas have moved towards and into the region of the thalamic layer that represents finger 2 and become even smaller in size. This indicates that the network grown using our method possesses the topographic adaptability that is characteristic of SOM’s. In d.-f. the dots indicate the centers of cortical receptive fields plotted in the x-y plane of the input layer. Dots are connected if they belong to the receptive fields of neighboring cortical neurons. d. Receptive field centers of the untrained version of the thalamocortical network grown by our approach. They indicate a lack of topographic organization in the cortical layer. e. Receptive field centers from the network trained with uniformly distributed stimuli. The increased regularity in the grid structure is one of the indicators of a high quality topographic map. f. Receptive field centers derived from the thalamocortical network trained with finger 2 seven times more likely to be stimulated than the rest of the hand. A significant number of the receptive field centers have shifted into or towards this finger, illustrating the map’s adaptability.
Applications of Swarm Intelligence, Nova Science Publishers, Incorporated, 2010. ProQuest Ebook Central,
Charles E. Martin and James A. Reggia
connectivity among the excitatory and inhibitory neurons results in short range excitation and longer range inhibition among the excitatory cells, which constitute the map forming layer. Given “bars” (lines) of activity as input stimuli, it was shown in (von der Malsburg 1973) that this model is able to self-organize so as to qualitatively reproduce the type of topographic feature map of orientation sensitive neurons observed in the visual cortex.
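The short-range excitation and longer-range inhibition described above is the classic "Mexican hat" interaction profile. As a rough illustration only (this difference-of-Gaussians kernel and its parameter values are ours, not the chapter's equations):

```python
import math

def lateral_weight(dist, a_e=1.0, sigma_e=1.0, a_i=0.5, sigma_i=3.0):
    """Difference-of-Gaussians lateral interaction: net excitation at
    short range, net inhibition at longer range ("Mexican hat")."""
    excite = a_e * math.exp(-dist ** 2 / (2 * sigma_e ** 2))
    inhibit = a_i * math.exp(-dist ** 2 / (2 * sigma_i ** 2))
    return excite - inhibit

# lateral_weight(0.5) is positive (nearby cells excite each other);
# lateral_weight(4.0) is negative (more distant cells inhibit each other).
```

With a narrow excitatory Gaussian and a wider, weaker inhibitory one, nearby excitatory cells reinforce each other while more distant cells compete, which is the ingredient that drives map formation in models of this kind.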
Figure 6 shows the early stages of the developmental process of a SOM modeled after the one developed by von der Malsburg. This network has a complex topology and there is a large amount of simultaneous axonal development. The network is also very large, consisting of 631 cells and 46,030 connections by the time the growth process is complete, and was grown with a rule set that contains 177 rules. Fifteen of those rules regulate cell divisions and dynamics, 160 regulate axon emissions, branching, and dynamics, and the remaining 2 govern the local growth force. The network consists of the same three layers as the original: an afferent (input) layer at the bottom, which is intended to model a retina; a layer of excitatory cells in the middle, which is where map formation occurs; and a layer of inhibitory cells at the top. The excitatory and inhibitory layers jointly represent a patch of the striate cortex. The recurrent network grown via our method based on the template presented by von der Malsburg had M1 = 73% and M2 = 77%. The growth process was considered to be completed once each cell had emitted all of its axons and no axons were still growing. The rule set and an animation of the neural growth process depicted in Figure 6 may be viewed on the World Wide Web (Reggia and Martin 2009).

To further assess the extent to which our grown V1 network corresponds to the original von der Malsburg model (von der Malsburg 1973), we evaluated the ability of our grown network to replicate the main map formation results obtained with the original network. More specifically, once fully assembled, the neural network shown in Figure 6 was trained with the same nine orientations of line-like input patterns, activation dynamics, Hebbian learning rule, and parameter values as in von der Malsburg's original study (von der Malsburg 1973).
Figure 7 illustrates each excitatory neuron's preferred orientation for input stimuli (if it has a preference) before and after training. A neuron's preferred orientation was defined to be the orientation of input stimuli to which the neuron responded most strongly, with an activity level greater than 1.44 standard deviations from its mean response over all nine input orientations. If a neuron's activity level did not satisfy this condition for any of the input orientations, then that neuron was considered to have no preferred orientation. The untrained network exhibits poor topographic organization of the input stimuli. Many of the neurons are not tuned to a particular input orientation, and those that are tuned frequently have neighbors with input orientation preferences very different from their own. Furthermore, a disproportionately large number of neurons respond most strongly to bars with orientations of 90, 70 and 350 degrees, and relatively few respond to orientations of 30, 330 and 310 degrees. In contrast, the trained network is seen to have self-organized into a high-quality topographic feature map. Almost all of the neurons have become tuned to a particular input orientation, and transitions between regions of the map containing neurons with different orientation tunings are much smoother. Moreover, for each input orientation the number of neurons tuned to it is roughly the same. These results are qualitatively similar to those exhibited by the original network (von der Malsburg 1973).
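The preferred-orientation criterion can be stated compactly in code. A sketch (the function name and data layout are ours; the 1.44-standard-deviation threshold is from the text):

```python
import statistics

def preferred_orientation(responses, threshold_sd=1.44):
    """responses: dict mapping input orientation (degrees) -> activity
    level of one neuron. Returns the orientation of the strongest
    response if that response exceeds the neuron's mean response over
    all orientations by more than threshold_sd standard deviations;
    otherwise None (no preferred orientation)."""
    levels = list(responses.values())
    mean = statistics.mean(levels)
    sd = statistics.stdev(levels)
    best = max(responses, key=responses.get)
    if responses[best] > mean + threshold_sd * sd:
        return best
    return None

# A neuron sharply tuned to 40-degree bars (nine orientations tested):
tuned = {o: 1.0 for o in range(0, 180, 20)}
tuned[40] = 5.0
# An untuned neuron responds equally to all nine orientations:
flat = {o: 1.0 for o in range(0, 180, 20)}
```

For the flat response profile the standard deviation is zero, so no orientation can exceed the threshold and the neuron is reported as having no preference, matching the definition above.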
Figure 6. The V1 (visual cortex) network grown using our model. a. The network develops from a set of nine “seed” cells, shown here in their initial positions. b. Repeated cell divisions cause the layers to expand. c. The fully formed afferent (bottom) layer, and partially formed excitatory (middle) and inhibitory (top) layers. d. Cell divisions have stopped and the axons have begun to grow and establish connections. e. Recurrent connectivity similar to that of the original target model (von der Malsburg 1973) is forming between the excitatory and inhibitory layers. f. Early on in the growth process additional cells have emitted axons. The axons continue to grow and establish connections.
Figure 7. Each dot represents the center of a neuron in the excitatory layer. The line passing through a dot represents the orientation of the input stimulus that the corresponding neuron responds to most strongly and is absent (only a dot appears) if a neuron does not exhibit any responses that are greater than 1.44 standard deviations from its mean response over all nine input orientations. a. The orientation tuning of the excitatory neurons prior to training. b. After training. The latter is a well-formed topographic feature map similar to that observed in von der Malsburg’s original model.
3.3. Robustness Experiments
We performed three experiments to determine what, if any, improvements in robustness are brought to our model by incorporating swarm intelligence in the form of locally coordinated collective movements. In this context, robustness is a measure of the degree to which the rule set can be varied without substantially reducing the quality of the networks being grown. This is an important characteristic for any approach to network self-assembly/growth/development to have, because developmental models with degrees of complexity on par with our own often exhibit considerable sensitivity to their parameters, making it difficult to grow predefined complex networks (Fleischer and Barr 1994; Kalay et al. 1995).

We compared the robustness of two different versions of our methodology. In the rules-only version, growth was dictated by the rule set alone and did not incorporate any of the forces that generate collective movements of the growth cones (Eqs. 4-7). Such an approach is comparable to producing growth with an L-system. In contrast, in the rules-and-swarm version networks grew by incorporating both the rule set and the collective movements of "swarms" of growth cones mediated through the intercone force (Eqs. 4-6) and the local growth force (Eq. 7). The intercellular force (Eq. 3) was not used in either version.

An example network grown in these experiments is shown in Figure 8. It was chosen because it captures many of the characteristics that make growing neural networks with a developmental model difficult. Specifically, it has a small region of target cells, coupled with axonal growth over a substantial distance, and intermediary cells acting as obstacles. Ideal networks were defined as those adhering to a connectivity template in which all connections are made to target cells and each target cell receives at least one connection. That is, for the highest quality networks the two similarity measures defined in Section 2.4 are both equal to 100%.
Figure 8. An example of the networks being grown by the simulator during the robustness experiments described in Section 3.3. A single afferent neuron emits axons that must then grow through a dense region of obstacle cells to reach their target neurons, shown in light gray.

An initial rule set was developed by hand (henceforth referred to as the base rule set), such that the rules-only model would grow a good quality network. This base rule set was also separately combined with a set of parameter values governing the force-based interactions of the agents, such that the rules-and-swarm model would grow a good quality network. For the intercone force we used ks = 10 in each experiment below, Rgg = 1.5 in the first experiment, and Rgg = 2.5 in the second and third experiments. In the first experiment ka = 14 and kc = 14; in the second experiment ka = 10 and kc was a variable; in the third experiment both ka and kc were variables. In each experiment the local growth force was attractive, with k = 30, Rcg = 6 and β set so that the strength of the force at its boundary (rji = Rcg) was 10 percent of its maximum value.

The experiments consisted of varying the base rule set in a random manner and then comparing the changes in the quality of the networks grown by the two versions of our methodology. It was hypothesized that collective movements improve robustness, and that one of the primary ways they do so is by making network growth less sensitive to the rule set. This hypothesis would be supported if the quality of the networks grown by the rules-only version deteriorates significantly more than that of the networks generated by the rules-and-swarm version as the random perturbations to the base rule set increase in size.
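The intercone force (Eqs. 4-6) combines separation, cohesion, and velocity alignment among growth cones within the neighborhood radius Rgg. The chapter's exact equations appear earlier in the text; purely as an illustration of this style of local force, with functional forms of our own choosing:

```python
import numpy as np

def intercone_force(pos, vel, i, ks=10.0, kc=14.0, ka=14.0, rgg=1.5):
    """Illustrative separation/cohesion/alignment force on growth cone i.
    pos, vel: (n, 3) arrays of cone positions and velocities. Only
    neighbors within radius rgg contribute (local interactions only)."""
    force = np.zeros(3)
    neighbors = []
    for j in range(len(pos)):
        if j == i:
            continue
        d = pos[j] - pos[i]
        r = np.linalg.norm(d)
        if 0.0 < r < rgg:
            neighbors.append(j)
            force -= ks * d / r ** 2            # separation: repel, stronger when close
    if neighbors:
        center = pos[neighbors].mean(axis=0)
        force += kc * (center - pos[i])         # cohesion: pull toward local centroid
        mean_vel = vel[neighbors].mean(axis=0)
        force += ka * (mean_vel - vel[i])       # alignment: match neighbors' velocity
    return force
```

The key design point, shared with the chapter's model, is that each cone uses only information from cones inside Rgg, so the collective behavior emerges entirely from local interactions.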
We elected to perturb the base rule set by making random changes to two variables, named Local Times and force. In our approach, axonal growth is partly determined by the rule set, and there are two types of rules that are very frequently used and which have a particularly strong influence over the growth process. These are rules that contain either a SetRuleForce command or a DivCone command (see Table 3). More specifically, it is the Local Times variable, which is present in the predicate of rules that contain a DivCone command, and the force variable, which is an argument to a SetRuleForce command, that are largely responsible for the widely varying effects that these types of rules often have. The values of these variables are among the most challenging to derive when writing a rule set to grow a particular network.

All of the rule sets used in the following experiments consisted of 47 rules. Each of the 20 rules with a SetRuleForce command that were varied in the base rule set had a force variable with the same magnitude, but the directions were different. The directions were represented in spherical coordinates [θ, φ], where θ ∈ [0°, 360°) and φ ∈ [0°, 180°]. For a particular SetRuleForce command in the base rule set, let θ0 be the specified azimuth and φ0 the polar angle. Likewise, for one of the 24 rules in the base rule set that contains a DivCone command, let T0 be the specified value of T ∈ R+, which we use to denote the Local Times variable. In the experiments, the rules in the base rule set were varied by drawing θ, φ and T from the uniform probability distributions θ ∈ U[θ0 − ∆θd, θ0 + ∆θd], φ ∈ U[φ0, φ0 + ∆φd] and T ∈ U[(1 − pd)T0, (1 + pd)T0]. Here, ∆θd = 1.7d, ∆φd = d and pd = 0.017d, where d ∈ {1, 2, 3, ..., 15} is the degree of variability of the rule sets. Larger values of d correspond to greater perturbations, and thus to a user having more freedom in choosing the rule set.
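The sampling scheme above can be written down directly; a sketch (the function name is ours, the distributions and constants are as given in the text; wraparound of θ at 360° is omitted for simplicity):

```python
import random

def perturb_rule_params(theta0, phi0, T0, d):
    """Sample perturbed rule parameters for variability degree
    d in {1, ..., 15}: a force direction (theta, phi) in degrees and a
    Local Times value T, drawn from the text's uniform distributions."""
    delta_theta = 1.7 * d       # half-width of the azimuth interval
    delta_phi = d               # width of the polar-angle interval
    p = 0.017 * d               # relative half-width for Local Times
    theta = random.uniform(theta0 - delta_theta, theta0 + delta_theta)
    phi = random.uniform(phi0, phi0 + delta_phi)
    T = random.uniform((1 - p) * T0, (1 + p) * T0)
    return theta, phi, T
```

At the maximum variability d = 15, the azimuth is perturbed by up to ±25.5°, the polar angle by up to +15°, and the Local Times value by up to ±25.5%.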
We compared the effects of increasing the variability of the rule sets on the networks grown by the version of our model without collective movements (rules-only) and the version with collective movements (rules-and-swarm) using the similarity measures (Sect. 2.4). For each value of d, and for both versions of our model, 101 trials were performed. Each trial consisted of growing a network using a rule set that was derived from the base rule set by subjecting the θ, φ and T components of its constituent rules to random perturbations according to the aforementioned probability distributions. In all three experiments each of the parameters of the forces responsible for inducing collective movements (ks, kc, ka, Rgg, k, and Rcg in Eqs. 4-7) was randomly varied by up to ±20 percent at the beginning of each trial, as were the positions of all cells except the afferent cell. For each value of d these trials were used to determine the average values of the similarity measures M1 and M2. The base rule set and initial parameter values were chosen so that the corresponding similarity measures of the networks grown by the two versions of our model were as close in value as we could make them when there was no rule set variability (d = 0). In these experiments we are interested in the changes in the similarity measures relative to their values when d = 0.

In the first experiment the parameters of the intercone force were held constant (Rgg = 1.5, ks = 10, ka = 14 and kc = 14) as d was varied. The results of this experiment are shown in Figure 9. They illustrate the ability of the rules-and-swarm model to persistently grow good quality networks despite progressively larger values of rule set variability, and the comparatively rapid deterioration of network quality with increasing rule set variability for the rules-only model. This indicates that incorporating collective movements into the
network growth process improves the robustness of our model. It was found through trial and error that the improvement occurs as long as the neighborhood radius Rgg is made small enough and the coefficients ka and kc are made large enough relative to ks. When this is the case, M1 will decrease with increasing d at a significantly slower rate than it does for the rules-only version, and M2 will increase up to about d = 8. However, enforcing these constraints on the parameters in the intercone force tends to cause M2 to have a lower value when d is small (seen on the left in Figure 9b).

We hypothesized that both M1 and M2 could be made to take on higher values when d is small, while retaining the greater robustness of the rules-and-swarm version of the model, by allowing one or more of the coefficients in the intercone force to vary as functions of d. Consequently, in the second experiment the parameter kc in the cohesion force was varied according to the function kc(d) = 1.36d − 0.36. That is, the cohesion among the growth cones was increased linearly with respect to increasing rule set variability. The parameters ka = 10 and Rgg = 2.5 were held constant. We performed this experiment multiple times, each time using a different linear function for kc. Specifically, kc(d) = md + (1 − m), and the experiment was repeated using different values for m ∈ R+. For each of the values m ∈ {1.36, 1.71, 1.86, 1.93} the results were very similar. Figure 10 illustrates that if the amount of cohesion among the swarming growth cones is allowed to increase as a linear function of the degree of rule set variability d, then the rules-and-swarm version of our model is capable of growing networks with relatively high values for M1 and M2 when d is small, and that the values of both similarity measures decrease at significantly slower rates with respect to increasing d than they do for the rules-only version.
In conducting the second experiment we discovered that imposing the constraint m ≥ 15/7 in the equation kc(15) = m(15) + (1 − m) results in M1 remaining very close to its base value of 90.7%, but that M2 then decreases further from its base value of 77.2%. In response to this we asked the following question: can further improvements be gained by allowing more than one of the parameters in the intercone force to vary simultaneously? We hypothesized that the decrease in similarity measure M1 could be minimized, while keeping the reduction in M2 small, by allowing both the cohesion factor kc and the velocity alignment factor ka from Equation 6 to vary together as functions of d. To test this hypothesis we performed a third experiment that was identical to the previous experiments except that kc(d) = d and ka(d) = 1.07d + 8.93 were both variables. In this experiment Rgg = 2.5. The results of this third experiment are shown in Figure 11. For the rules-and-swarm version of our model, similarity measure M1 remains very close to its base value, and M2 decreases by only about six percentage points. These results further indicate the usefulness of incorporating collective movements into the network growth process. We performed this experiment five more times using various linear functions for kc and ka. Specifically, kc(d) = mc·d + (1 − mc) and ka(d) = ma·d + (10 − ma), where ma, mc ∈ R+. For each ordered pair (mc, ma) ∈ {(0.64, 1.07), (0.64, 1.43), (1.0, 0.71), (1.0, 1.07), (1.0, 1.43)} the results of the experiment were similar.
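The linear coefficient schedules used in the second and third experiments share a useful property: for any slope, they evaluate to the same value at d = 1. A small sketch (function names ours, formulas from the text):

```python
def kc_schedule(d, m=1.36):
    """Cohesion coefficient from the second experiment:
    kc(d) = m*d + (1 - m). For any slope m, kc(1) = 1."""
    return m * d + (1 - m)

def ka_schedule(d, m_a=1.07):
    """Alignment coefficient from the third experiment:
    ka(d) = m_a*d + (10 - m_a). For any slope m_a, ka(1) = 10."""
    return m_a * d + (10 - m_a)
```

With m = 1.36 the first function reduces to the kc(d) = 1.36d − 0.36 schedule of the second experiment, and with m = 1.0 it reduces to the third experiment's kc(d) = d; the default slope m_a = 1.07 reproduces ka(d) = 1.07d + 8.93.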
Figure 9. A comparison of the change in neural network quality measures a. M1 and b. M2 as rule set variability d increases, between a version of our model that incorporates only rule-based growth and a version that utilizes both rules and collective movements generated via local forces. The parameters of the intercone force were held constant (Rgg = 1.5, ks = 10, ka = 14 and kc = 14). The dashed dark-gray lines indicate the average correctness of the networks grown by the rules-only model when the rule sets have zero variability (d = 0). The dashed light-gray lines indicate the same thing but for the rules-and-swarm model. The error bars are 95% confidence intervals.

Figure 10. Comparison of the dependence of neural network quality on the variability of the rule sets, between a version of our model that incorporates only rule-based growth and a version that utilizes both rules and swarm intelligence generated via local forces. The degree of cohesion kc was an increasing linear function of the degree of variability (d). Same notation as in Figure 9.

Figure 11. Comparison of the dependence of neural network quality on the variability of the rule sets, between a version of our model that incorporates only rule-based growth and a version that utilizes both rules and swarm intelligence generated via local forces. The degree of cohesion kc and velocity alignment ka were both increasing linear functions of the degree of variability (d). Same notation as in Figure 9.

4. Discussion

During any nontrivial neural growth process the many agents involved are subjected to a wide variety of time-varying influences, particularly when the agents are moving. The result is that each agent encounters a highly dynamic local environment, which makes it difficult to predict and control an agent's behavior. This tends to make it very difficult to derive a parsimonious set of rules and parameters that will result in the growth of a particular target network. However, we surmised that incorporating swarm intelligence in the form of collective movements into the model would allow agents to act collectively in a way that overcomes many of the challenges of a dynamic and inhomogeneous environment. In particular, agents could utilize information from their neighbors to guide their own dynamics.

The developmental model of neural network self-assembly that we have presented here incorporates the collective growth and interactions of discrete neurons in a continuous three-dimensional space. Unlike most artificial neural network models, the networks grown using our model are characterized by a geometric relationship among their neurons in addition to a topological one. Furthermore, the grown networks are more like natural networks in that they have nonuniform densities of cells, less regular topologies, redundant connections between neurons, patterns of connectivity that are statistical in nature rather than being exact, and nonperiodic boundary conditions. For example, on average a thalamic cell in the grown network based on the somatosensory cortex model makes 1.9 connections with any target cortical cell, and on average an afferent cell in the grown network based on the visual cortex model makes 5.6 connections with any target excitatory cell. Such characteristics could be of interest to those studying the growth and properties of real-world networks such as biological neural networks or other types of networks that self-assemble from physical components and exhibit similar features.
Past developmental models that incorporate continuous neural network growth have often been plagued by a lack of controllability over the dynamics of the growth processes and the characteristics of the resulting networks. In contrast, our model exhibits greatly improved controllability due largely to the incorporation of local forces among the components of growing neurons. We demonstrated this improvement by using our model to grow two relatively large and topologically complex networks inspired by past neural network models designed by hand. These two networks would, at best, be very difficult to generate using most past models that incorporate a continuous neural growth process (e.g., those that primarily use gradient following). The self-organizing maps that emerged on the grown networks during subsequent learning were shown to exhibit computational properties similar to their archetypes, indicating our model’s ability to grow three-dimensional networks with topologies that are statistical realizations of more abstract neural templates. Another significant result of this study is that incorporating collective movements into our model increased its robustness. The improvements in robustness were demonstrated by a computational experiment in which the network growing capabilities of a version of our model that incorporates collective movements were compared to those of a version that does not. Based on this experiment it is evident that swarming growth cones are able to counterbalance significant increases in the degree of rule set variability. The resulting gain in robustness is one of the primary reasons why our model offers increased controllability with respect to previous models. This is evident in the first of the robustness experiments, where the model’s parameters were independent of the degree of variability, and yet good networks continued to be grown even for large degrees of rule set variability. 
For degrees of variability around d = 12 (see Section 3.3), M2 increased by about 11 percentage points from its base value and M1 decreased by only about 15 percentage points from its
base value. In two additional experiments the amount of cohesion kc and velocity alignment ka among the swarming growth cones were allowed to vary as functions of the degree of variability, yielding substantial additional improvements in the model’s ability to grow high quality networks over a range of rule set variabilities. These improvements stemmed from the growth cones’ ability to collectively guide one another’s trajectories; for example, they were able to establish the proper density among the growing axons and correct abnormal courses of growth. The local growth force, Equation 7, also plays a role by helping to guide growing axons towards their target cells without obfuscating the self-assembly process. The results demonstrate how incorporating collective movements into our model significantly improves its robustness by making network growth less sensitive to the rule set. Furthermore, they show that the improvements in robustness are not highly dependent on the values of the forces’ parameters. The resulting increase in robustness, combined with the intuitiveness that the collective movements give to the agents’ dynamics, makes it easier to find a set of rules and parameters that cause the model to grow networks with desired characteristics. In other words, collective movements improve the controllability of our model. We believe that this is one of the central reasons why our model has been successful at growing pre-specified networks that are much larger than those grown by past models. We have also found that collective movements tend to reduce the size of the rule sets, although we have not quantified this in any way. Our experience suggests that this phenomenon is a consequence of the fact that in writing a rule set to grow a particular network it is easy to take advantage of the collective dynamics of the agents to guide the growth process, which reduces the need to explicitly encode trajectory information in the rule set. 
The rule set, in addition to collective movements, plays an important role in our model. It improves one's ability to predict and control when and where the agents involved in a growing network execute discontinuous actions, such as cell division, axon emission and branching, and changes in a cell's local growth force, all of which have a significant impact on network development. In past models of network development such discontinuous actions are usually triggered by an agent's environment-dependent interactions (the agent's gradual acquisition of a substance diffusing through the environment, its contact with other agents, etc.); however, these interactions are typically very hard to predict and control, and hence so are the actions they induce. The incorporation of the rule set, and its control over discontinuous actions, helps our model to overcome this challenge by making an agent's "decision" to execute such an action relatively environment-independent. That is, none of an agent's state variables (see Table 1) depends on its interactions with the environment, and hence neither do the truth values of the rule predicates. The rule set also enhances controllability through the inclusion of the rule-based force, which allows growth dynamics to be tailored to the specific network being grown by granting each agent a certain degree of autonomy from the other agents and its environment. Furthermore, this force eliminates the need for any long-range guidance forces, thus allowing the model to adhere to the local-interactions-only criterion. In our experience this is important because the use of nonlocal forces in the computational experiments described here tends to create a more convoluted environment, due to the increase in the number of forces to which an agent is simultaneously subjected. This in turn reduces the model's controllability and the intuitiveness of the agents' dynamics.

While the work that we have described in this chapter does not try to model specific
biological data, it is interesting to consider whether our approach has any implications for neuroscience. In neuroscience, there is currently an intense experimental effort underway to better understand how complex interactions between genetic and activity-dependent factors determine the wiring of neural circuitry during an organism’s developmental period (Grove and Fukuchi-Shimogori 2003; Lopez-Bendito and Molnar 2003; Spitzer 2006). While the vast majority of models in computational neuroscience do not involve network self-assembly or connection growth, there has been substantial recent interest in modeling neural development. Much of this work has focused on the formation of topographically structured connections in specific brain regions, and is based upon axon growth that is guided by growth cones that are sensitive to local biomolecular gradients (Goodhill et al. 2004; Goodhill and Xu 2005; Hentschel and van Ooyen 1999; Honda 2003). Growth cones “steer” the direction in which axons grow to their target termination locations. These past models of neurobiological development are like our own work in explicitly incorporating geometric relations (not just network topology) and in simulating the growth of axons through physical space. However, unlike our work they are often but not always limited to two-dimensional space, typically do not incorporate cell migration and division, are generally applied to relatively small networks, are usually concerned with feedforward networks, and do not consider axon-axon interactions during network assembly (for exceptions to the latter, see Goodhill et al. 2004; Yates et al. 2004). To our knowledge, no past models of neurogenesis done in computational neuroscience have explicitly recognized the relationship of work in that area to concepts that have emerged from swarm intelligence research on collective movements and self-assembly over the last several years (Grushin and Reggia 2008; Reynolds 1987; Rodriguez and Reggia 2004). 
Our model thus has a great deal of potential as a tool for future neuroscience studies of biological network growth, in part because it grows networks that incorporate geometry as well as topology. The flexibility of the model allows for straightforward incorporation of additional biological detail such as dendritic trees, more biologically realistic parameter values (e.g., making viscous drag negligible), and the diffusion of chemical messengers like nerve growth factor. Moreover, the continuous nature of the growth process and the incorporation of neural activity dynamics also make it well suited for studying the role of network activity during development (van Ooyen 1994). Finally, it would also be very valuable to construct a precise mapping between neurochemical mechanisms and model rules/equations to stimulate further theoretical advances.

The topology of a neural network is often a significant factor in influencing its performance. Unfortunately, the best topology for a given problem and training algorithm is normally unknown. The designer may use heuristics, but in many circumstances the relationship between the problem, training procedure and network topology is not sufficiently understood to produce a near optimal topology via this approach alone. In such cases optimization methods, such as evolutionary computation or particle swarm optimization, have been successfully adapted to search through the space of potential topologies for an optimal configuration (Carvalho and Ludermir 2007; Chval 2002; Gruau 1993; Jung and Reggia 2006; Kitano 1990). Because the networks grown using our model develop through a process of self-assembly that involves simple, local interactions among large numbers of agents, this process could be extended to implement a form of swarm-based topology optimization. More specifically, the agents constituting a growing network could respond to
Applications of Swarm Intelligence, Nova Science Publishers, Incorporated, 2010. ProQuest Ebook Central,
Charles E. Martin and James A. Reggia
input stimuli and the actions of neighbor agents in such a way that the growth process would effectively be a guided exploration of prospective topologies in which beneficial patterns of connectivity are exploited, eventually leading to convergence upon a network that is a good solution to the problem at hand and is of minimal topological complexity. For example, the model could be adapted to generate minimally complex networks that solve computational problems such as time-series forecasting. This adaptation of the growth process could also be modified to train a network’s weights.
Copyright © 2010. Nova Science Publishers, Incorporated. All rights reserved.
5. Conclusions and Future Work
We have demonstrated some of the benefits that swarm intelligence in the form of collective movements can bring to the self-assembly of networks and how our developmental model incorporates these benefits to improve the modeling of neural network growth. We conclude that collective movements can substantially improve the robustness of network self-assembly as well as the ability to control the characteristics of the grown networks. Using parsimonious sets of rules our model can be used to grow large, three-dimensional, recurrent networks with fairly complex patterns of connectivity that are statistical realizations of rigorously specified topologies. Furthermore, our approach has an abundance of potential future applications, not just in the modeling and computational study of biological neurogenesis during development, but also in topology optimization and the training of artificial neural networks (i.e., by allowing continuing, post-developmental neurogenesis during learning), as well as in computer graphics and other types of pattern formation and self-assembly that occur in systems consisting of agents interacting in a simple and local manner. An important area for future research is the automatic generation of the needed control rules rather than their manual creation when given a target network. Although this has been achieved for self-assembly of some structures (e.g., Grushin and Reggia 2006; Grushin and Reggia 2008), to our knowledge it has not previously been studied for network structures like those considered here. In this scenario a high-level description of the desired network would be automatically translated into a set of rules that the model would implement to grow the specified network. For example, one might specify the size and number of neural layers to grow and the patterns of connectivity to be established between them. Another important direction for future work would be to extend our robustness studies. 
We found that our approach was substantially robust to changes such as randomly varying (by up to ±20 percent) the parameters of the forces responsible for inducing collective movements. However, more convincing results could be obtained by examining in a similar fashion networks independently designed by others, or even automatically, and this remains an issue for future investigation. Important additional directions for future research include assessing the effectiveness of our approach for the self-assembly of nonneural network structures, and examining the effect of introducing nondeterministic rules or noise on network construction.
Swarm Intelligence for the Self-Assembly of Neural Networks
Acknowledgment

Supported in part by NSF Awards ITS-0325089 and DMS-0240049.
References

[1] Arbuckle, D. and Requicha, A. (2004). Active self-assembly. In Proceedings of the IEEE international conference on robotics and automation (ICRA '04) (pp. 896-901). New York: IEEE.
[2] Astor, J. C. and Adami, C. (2000). A developmental model for the evolution of artificial neural networks. Artificial Life, 6, 189-218.
[3] Bishop, J., Burden, S., Klavins, E., et al. (2005). Programmable parts: A demonstration of the grammatical approach to self-organization. In Proceedings of the IEEE international conference on intelligent robots and systems (IROS '05) (pp. 3684-3691). New York: IEEE.
[4] Bonabeau, E., Dorigo, M. and Theraulaz, G. (1999). Swarm intelligence: From natural to artificial systems. New York, NY: Oxford University Press.
[5] Cangelosi, A., Parisi, D. and Nolfi, S. (1994). Cell division and migration in a "genotype" for neural networks. Network: Computation in Neural Systems, 5, 497-515.
[6] Carvalho, M. and Ludermir, T. (2007). Particle swarm optimization of neural network architectures and weights. In Proceedings of the 7th international conference on hybrid intelligent systems (HIS '07) (pp. 336-339). New York: IEEE.
[7] Chval, J. (2002). Evolving artificial neural networks by means of evolutionary algorithms with L-systems based encoding (Research Report). Prague, Czech Republic: Czech Technical University.
[8] Delgado, A. (2000). Control of nonlinear systems using a self-organizing neural network. Neural Computing & Applications, 9(2), 113-123.
[9] Deneubourg, J.-L., Goss, S., Franks, N. and Pasteels, J.-M. (1989). The blind leading the blind: Modelling chemically mediated army ant raid patterns. Journal of Insect Behavior, 2, 719-725.
[10] Deneubourg, J.-L., Aron, S., Goss, S. and Pasteels, J.-M. (1990). The self-organizing exploratory pattern of the Argentine ant. Journal of Insect Behavior, 3, 159-168.
[11] Eggenberger, P. (1997). Creation of neural networks based on developmental and evolutionary principles. In W. Gerstner, A. Germond, M. Hasler and J. Nicoud (Eds.), Proceedings of the international conference on artificial neural networks (ICANN '97) (pp. 337-342). Berlin/Heidelberg, Germany: Springer.
[12] Elizondo, E., Birkenhead, R., Góngora, M., et al. (2007). Analysis and test of efficient methods for building recursive deterministic perceptron neural networks. Neural Networks, 20, 1095-1108.
[13] Fahlman, S. and Lebiere, C. (1990). The cascade-correlation learning architecture. In D. S. Touretzky (Ed.), Advances in neural information processing systems II (pp. 524-532). San Francisco, CA: Morgan Kaufmann.
[14] Farkaš, I. and Miikkulainen, R. (1999). Modeling the self-organization of directional selectivity in the primary visual cortex. In D. Willshaw and A. Murray (Eds.), Proceedings of the international conference on artificial neural networks (ICANN '99) (pp. 251-256). London, England: IEE.
[15] Fleischer, K. and Barr, A. (1994). A simulation testbed for the study of multicellular development: The multiple mechanisms of morphogenesis. In C. G. Langton (Ed.), Artificial life III (Vol. XVII of the SFI studies in the science of complexity) (pp. 389-416). Redwood City, CA: Addison-Wesley.
[16] Fleischer, K. (1995). A multiple-mechanism developmental model for defining self-organizing geometric structures (Dissertation). Pasadena, CA: California Institute of Technology.
[17] Franks, N., Gomez, N., Goss, S. and Deneubourg, J.-L. (1991). The blind leading the blind in army ant raid patterns: Testing a model of self-organization (Hymenoptera: Formicidae). Journal of Insect Behavior, 4, 583-607.
[18] Frean, M. (1990). The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation, 2, 198-209.
[19] Goodhill, G., Gu, M. and Urbach, J. (2004). Predicting axonal response to molecular gradients with a computational model of filopodial dynamics. Neural Computation, 16, 2221-2243.
[20] Goodhill, G. and Xu, J. (2005). The development of retinotectal maps: A review of models based on molecular gradients. Network, 16, 5-34.
[21] Gracias, D., Tien, J., Breen, T., Hsu, C. and Whitesides, G. (2000). Forming electrical networks in three dimensions by self-assembly. Science, 289, 1170-1172.
[22] Gross, R., Bonani, M., Mondada, F. and Dorigo, M. (2006). Autonomous self-assembly in swarm-bots. IEEE Transactions on Robotics, 22, 1115-1130.
[23] Grove, E. and Fukuchi-Shimogori, T. (2003). Generating the cerebral cortical area map. Annual Review of Neuroscience, 26, 355-380.
[24] Gruau, F. (1993). Genetic synthesis of modular neural networks. In S. Forrest (Ed.), Proceedings of the 5th international conference on genetic algorithms (ICGA '93) (pp. 318-325). San Francisco, CA: Morgan Kaufmann.
[25] Grushin, A. and Reggia, J. (2006). Stigmergic self-assembly of prespecified artificial structures in a constrained and continuous environment. Integrated Computer-Aided Engineering, 13, 289-312.
[26] Grushin, A. and Reggia, J. (2008). Automated design of distributed control rules for the self-assembly of pre-specified artificial structures. Robotics and Autonomous Systems, 56, 334-359.
[27] Haessly, A., Sirosh, J. and Miikkulainen, R. (1995). A model of visually guided plasticity of the auditory spatial map in the barn owl. In J. F. Lehman and J. D. Moore (Eds.), Proceedings of the 17th annual meeting of the Cognitive Science Society (pp. 154-158). Hillsdale, NJ: Lawrence Erlbaum.
[28] Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Upper Saddle River, NJ: Prentice-Hall.
[29] Hentschel, H. and van Ooyen, A. (1999). Models of axon guidance and bundling during development. Proceedings of the Royal Society (London) B, 266, 2231-2238.
[30] Honda, H. (2003). Competition between retinal ganglion axons for targets under the servomechanism model. Journal of Neuroscience, 23, 10368-10377.
[31] Jones, C. and Matarić, M. (2003). From local to global behavior in intelligent self-assembly. In Proceedings of the IEEE international conference on robotics and automation (ICRA '03) (pp. 721-726). New York: IEEE.
[32] Jung, J. and Reggia, J. (2006). Evolutionary design of neural network architectures using a descriptive encoding language. IEEE Transactions on Evolutionary Computation, 10, 676-688.
[33] Kalay, A., Parnas, H. and Shamir, E. (1995). Neuronal growth via hybrid system of self-growing and diffusion based grammar rules: I. Bulletin of Mathematical Biology, 57, 205-227.
[34] Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks (pp. 1942-1948). New York: IEEE.
[35] Kitano, H. (1990). Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4, 461-476.
[36] Klavins, E. (2007). Programmable self-assembly. IEEE Control Systems Magazine, 27, 43-56.
[37] Klavins, E., Ghrist, R. and Lipsky, D. (2004). Graph grammars for self-assembling robotic systems. In Proceedings of the IEEE international conference on robotics and automation (ICRA '04) (pp. 5293-5300). New York: IEEE.
[38] Kohonen, T. (2001). Self-organizing maps. New York, NY: Springer.
[39] LeCun, Y., Denker, J. and Solla, S. (1990). Optimal brain damage. In D. Touretzky (Ed.), Advances in neural information processing systems II (pp. 598-605). San Francisco, CA: Morgan Kaufmann.
[40] Lendasse, A., Verleysen, M., de Bodt, E., Gregoire, P. and Cottrell, M. (1998). Forecasting time-series by Kohonen classification. In M. Verleysen (Ed.), Proceedings of the 6th European symposium on artificial neural networks (ESANN '98) (pp. 221-226). Brussels, Belgium: D-Facto public.
[41] Lopez-Bendito, G. and Molnar, Z. (2003). Thalamocortical development: How are we going to get there? Nature Reviews Neuroscience, 4, 276-289.
[42] von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14, 85-100.
[43] Marchand, M., Golea, M. and Rujan, P. (1990). A convergence theorem for sequential learning in two-layer perceptrons. Europhysics Letters, 11(6), 487-492.
[44] Martin, C. and Reggia, J. (2010). Self-assembly of neural networks viewed as swarm intelligence. Swarm Intelligence, 4, 1-36. DOI: 10.1007/s11721-009-0035-7.
[45] Nembrini, J., Reeves, N., Poncet, E., et al. (2005). Flying swarm intelligence for architectural research. In Proceedings of the IEEE swarm intelligence symposium (SIS '05) (pp. 225-232). New York: IEEE.
[46] van Ooyen, A. (1994). Activity-dependent neural network development. Network: Computation in Neural Systems, 5, 401-423.
[47] Pearson, J., Finkel, L. and Edelman, G. (1987). Plasticity in the organization of adult cerebral cortical maps: A computer simulation based on neuronal group selection. The Journal of Neuroscience, 7, 4209-4223.
[48] Prusinkiewicz, P. and Lindenmayer, A. (1990). The algorithmic beauty of plants. New York, NY: Springer.
[49] Pulakka, K. and Kujanpa, V. (1998). Rough level path planning method for a robot using SOFM neural network. Robotica, 16, 415-423.
[50] Reggia, J. and Martin, C. (2009). Self-assembly of a neural network. College Park, MD: Univ. of Maryland, Dept. of Computer Science. http://www.cs.umd.edu/~reggia/martin.html.
[51] Reynolds, C. (1987). Flocks, herds, and schools: A distributed behavioral model. Computer Graphics, 21(4), 25-34.
[52] Ritter, H., Martinetz, T. and Schulten, K. (1992). Neural computation and self-organizing maps. Reading, MA: Addison-Wesley.
[53] Rodriguez, A. and Reggia, J. (2004). Extending self-organizing particle systems to problem solving. Artificial Life, 10, 379-395.
[54] Rust, A., Adams, R., Schilstra, M. and Bolouri, H. (2003). Evolving computational neural systems using synthetic developmental mechanisms. In S. Kumar and P. Bentley (Eds.), On growth, form and computers (pp. 353-376). New York, NY: Academic Press.
[55] Spitzer, N. (2006). Electrical activity in early neuronal development. Nature, 444, 707-712.
[56] Sutton, G., Reggia, J., Armentrout, S. and D'Autrechy, C. (1994). Cortical map reorganization as a competitive process. Neural Computation, 6, 1-13.
[57] Vesanto, J. (1999). SOM-based data visualization methods. Intelligent Data Analysis, 3, 111-126.
[58] Vesanto, J. and Alhoniemi, E. (2000). Clustering of the self-organizing map. IEEE Transactions on Neural Networks, 11(3), 586-600.
[59] Werfel, J. and Nagpal, R. (2006). Extended stigmergy in collective construction. IEEE Intelligent Systems, 21, 20-28.
[60] White, P., Zykov, V., Bongard, J. and Lipson, H. (2005). Three dimensional stochastic reconfiguration of modular robots. In Proceedings of robotics: Science and systems (pp. 161-168). Cambridge, MA: MIT Press.
[61] Whitesides, G. and Grzybowski, B. (2002). Self-assembly at all scales. Science, 295, 2418-2421.
[62] Yates, P., Holub, A., McLaughlin, T., et al. (2004). Computational modeling of retinotopic map development to define contributions of EphA-EphrinA gradients, axon-axon interactions, and patterned activity. Journal of Neurobiology, 59, 95-113.
In: Applications of Swarm Intelligence Editor: Louis P. Walters, pp. 131-152
ISBN: 978-1-61728-602-5 © 2011 Nova Science Publishers, Inc.
Chapter 6
APPLICATION OF PARTICLE SWARM OPTIMIZATION METHOD TO INVERSE HEAT RADIATION PROBLEM

Kyun Ho Lee*
Korea Aerospace Research Institute, Daejeon 305-333, Republic of Korea
Abstract

In this chapter, an inverse heat radiation analysis is presented for the estimation of the thermal radiation properties of an absorbing, emitting, and scattering medium with diffusely emitting and reflecting opaque boundaries. The particle swarm optimization (PSO) algorithm, one of the high-performance intelligence algorithms recently created as an alternative to conventional methods, is proposed as an effective way to improve the search efficiency for unknown radiative parameters. To verify the feasibility and performance of PSO, it is applied to an inverse heat radiation analysis estimating the wall emissivities and the absorption and scattering coefficients in a two-dimensional irregular medium when measured temperatures are given. The accuracy of the estimated parameters and the computational efficiency of PSO are compared with results obtained by a genetic algorithm (GA). Finally, PSO is shown to be quite a robust method for the simultaneous estimation of multiple parameters when applied to the inverse heat radiation problem.
1. Introduction

Inverse heat transfer analyses have numerous applications in various branches of science and engineering. They can be classified according to the heat transfer mechanism involved, such as inverse heat conduction, inverse heat convection and inverse heat radiation analysis, or their combined modes. Inverse heat analyses are particularly advantageous when desired quantities cannot be measured directly from experiments, for example, high surface temperatures on a reentry vehicle or the thermal properties of hot gas during combustion [1]. Especially for inverse heat radiation analyses, many studies have been concerned with the determination of the radiation properties, boundary conditions and temperature profile or source term distribution, given
various types of thermal radiation measurements [2-4]. Unfortunately, the solution of an inverse heat problem may neither exist nor be unique, and, given measurement data containing some error, inverse heat problems cannot be solved directly. Therefore, various mathematical methods have been adopted to obtain a stable solution in spite of the ill-posed character of inverse heat problems. Among them, the conjugate gradient method (CGM), one of the gradient-based optimization methods, has usually been adopted by researchers dealing with inverse heat problems [5-7]. The CGM has the advantage of stably estimating inverse heat solutions in relatively short computational time, but complex mathematical formulations such as the sensitivity and adjoint problems must additionally be solved to calculate the gradient information. Also, infeasible solutions can be obtained if the initial values are not guessed properly or if many parameters are highly correlated, and many iterations are required even when the method converges [8]. As an alternative to gradient-based methods, search-based methods, such as the genetic algorithm (GA), have received much attention for their outstanding characteristics, especially in nonlinear or multi-parameter problems. Li and Yang used a GA in inverse heat radiation analysis for estimating the scattering albedo, optical thickness and phase function between parallel planes [9], while Kim et al. estimated wall emissivities with a hybrid genetic algorithm [10]. Verma and Balaji implemented a GA for estimating radiative properties in a problem of combined conduction-radiation in a one-dimensional plane-parallel participating medium [11]. Although search-based methods have recently been applied successfully to various inverse heat problems thanks to greater computing resources, they still take longer computing times than gradient-based methods.
In this chapter, to easily solve inverse problems encountered in engineering that are highly non-linear, non-monotonic, or of very complex form, particle swarm optimization (PSO) is employed as a fast, robust and stable inverse analysis method rather than the conventional GA method. To verify the feasibility and the performance of PSO, it is applied to inverse heat radiation analysis in estimating the wall emissivities and the absorption and scattering coefficients in a two-dimensional absorbing, emitting and scattering irregular medium when the measured temperatures are given. Also, the accuracy of the estimated parameters and the computational efficiency are compared with the results obtained by a hybrid genetic algorithm (HGA) technique.
2. Principle of Algorithm

2.1. Hybrid Genetic Algorithm (HGA)

Genetic algorithm (GA) is a well known global optimization technique based on Darwin's principle of the 'survival of the fittest' and the natural process of evolution through reproduction [12]. Generally, it starts with a randomly generated population of candidate solutions (individuals) within the expected range of parameters and then exchanges genetic information between individuals to reproduce improved solutions from one generation to the next by three simulated evolution processes: selection, crossover and mutation. By repeating these processes, the relatively "good" individuals are reproduced while the relatively "bad" ones are extinguished at each generation. When a desirable fitness value of the objective function is obtained, the GA stops the evolution processes and the best individuals of the last generation are regarded as the final solution. Based on its demonstrated ability to reach global optimum solutions, the GA has been extensively applied to problems in many
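The generational loop just described can be illustrated with a short sketch. This is not the chapter's implementation: the real-coded encoding, tournament selection, blend crossover and uniform mutation below are assumptions chosen for brevity.

```python
import random

def genetic_algorithm(fitness, lb, ub, n_genes, pop_size=30, n_gen=100,
                      p_cross=0.8, p_mut=0.05):
    """Minimal real-coded GA: selection, crossover and mutation per generation."""
    # Randomly generated initial population within the expected parameter range.
    pop = [[random.uniform(lb, ub) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = [fitness(ind) for ind in pop]

        def select():
            # Tournament selection: the fitter of two random individuals wins.
            a, b = random.sample(range(pop_size), 2)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            if random.random() < p_cross:
                # Arithmetic crossover: blend the two parents' genes.
                w = random.random()
                child = [w * g1 + (1 - w) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = list(p1)
            # Mutation: occasionally replace a gene with a random value.
            child = [random.uniform(lb, ub) if random.random() < p_mut else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Example: maximize -(x1^2 + x2^2 + x3^2); the optimum is the zero vector.
best = genetic_algorithm(lambda x: -sum(g * g for g in x), -5.0, 5.0, 3)
```

Note that this plain GA has no elitism; the elitist strategy and local optimization that distinguish the HGA are described next.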
fields such as optimization design, fuzzy logic control, neural networks, expert systems, scheduling, and many others [13-14]. However, the GA has some drawbacks, such as an inability to perform fine local tuning due to its poor exploitation, premature convergence to a non-global optimum and a longer computing time. Also, a proper selection of the population size, the crossover and mutation probabilities, and the maximum number of generations is required because they greatly affect the performance of the GA. To overcome these difficulties, an elitist strategy and a local optimization algorithm (LOA) are combined with a simple GA, which is thereby called a hybrid genetic algorithm (HGA) [10]. The elitist strategy ensures monotonic improvement in the best fitness value of each generation and helps to reach near the global optima, while the LOA helps converge to them faster. The LOA is applied only to the elite individual, after it has been determined, to reduce computation time. If s = (v_1, v_2, ..., v_m) is the chromosome of the elite individual and the gene v_k is selected for local optimization, the resulting gene v_k' is as follows:

  v_k' = v_k + Δ(t, UB − v_k)   or   v_k' = v_k − Δ(t, v_k − LB)                (1)

where LB and UB are the lower and upper domain bounds of the gene v_k.
Figure 1. The flowcharts of the search-based algorithms. (a) Hybrid genetic algorithm: initialization, fitness evaluation, selection, crossover, mutation, elite strategy and local optimization, repeated until the stop criterion is met. (b) Particle swarm optimization algorithm: initialization, fitness evaluation, local and global best updates, velocity update and position update, repeated until the stop criterion is met.
The function Δ(t, y) returns a value in the range [0, y] such that the probability of Δ(t, y) being close to 0 increases as t increases. The following function is used for Δ(t, y):

  Δ(t, y) = y · (1 − r^((1 − t/Tmax)^b))                (2)

where r is a uniform random number, Tmax is the maximum generation number, and b is a system parameter determining the degree of dependency on the generation number t (b = 1 here). In the LOA, v_k' is calculated using Eq. (1) for each gene of the elite individual. If v_k' is fitter than v_k, the gene of the elite individual is changed to v_k'; otherwise, v_k is maintained. A more detailed description of HGA can be found in the reference [10], and its flowchart is presented in Figure 1(a).
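Eqs. (1) and (2) can be sketched as follows. The fitness function in the demo is hypothetical, and the random choice between stepping toward UB and toward LB is an assumption for illustration.

```python
import random

def delta(t, y, t_max, b=1.0):
    """Eq. (2): a step in [0, y] that shrinks toward 0 as generation t grows."""
    r = random.random()  # uniform random number in [0, 1]
    return y * (1.0 - r ** ((1.0 - t / t_max) ** b))

def loa_step(v_k, t, t_max, lb, ub, fitness_of):
    """One LOA trial on an elite gene, following Eq. (1)."""
    # Step up toward UB or down toward LB; the direction is picked at random
    # here (an assumption: the text does not state how the branch is chosen).
    if random.random() < 0.5:
        v_new = v_k + delta(t, ub - v_k, t_max)
    else:
        v_new = v_k - delta(t, v_k - lb, t_max)
    # Keep the new gene only if it is fitter than the current one.
    return v_new if fitness_of(v_new) > fitness_of(v_k) else v_k

# Demo with a hypothetical fitness peaking at 1.0 inside the bounds [0, 2].
improved = loa_step(0.5, t=10, t_max=100, lb=0.0, ub=2.0,
                    fitness_of=lambda v: -(v - 1.0) ** 2)
```

Because Δ(t, y) stays within [0, y], the trial value never leaves the domain [LB, UB], and the accept-if-fitter rule makes each LOA step monotonically non-worsening.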
2.2. Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is a recent high performance algorithm created in 1995 as an alternative to the GA. It is based on the social behavior of a swarm of birds or insects or a school of fish, which searches for food in a very typical manner. If one of the
members of the swarm sees a desirable path to go, the rest of the swarm will follow quickly. Every member of the swarm searches for the best in its locality and learns from its own experience. Additionally, each member learns from the others, typically from the best performer among them [15]. The PSO consists of three steps, namely, generating the positions and velocities of particles, velocity update, and finally, position update. The flowchart of PSO is presented in Figure 1(b) and its detailed procedure is as follows: 1) Randomly initialize the velocities and the positions of all particles to within predefined ranges. 2) At the current kth iteration, the velocity of the ith particle for the next iteration is updated according to the following equation
  v_i^(k+1) = w·v_i^k + c1·r1·(p_i^k − x_i^k) + c2·r2·(p_g^k − x_i^k)                (3)
where x_i^k and v_i^k are the position and velocity of particle i, and p_i^k and p_g^k are the positions with the best objective value found so far by particle i and by all particles, which are called the local and the global best positions, respectively. w is an inertia factor which controls the flying dynamics, and r1 and r2 are uniform random variables in
the range [0, 1]. Also, c1 and c2 are acceleration coefficients that pull each particle toward the local and the global best positions. From Eq. (3), it can be seen that the current and the best information of the particle and of the swarm are all considered together when updating the velocities of all particles for the next iteration, which is totally different from the GA's updating operations. 3) Finally, after the position of each particle is updated using its new velocity for unit time according to
  x_i^(k+1) = x_i^k + v_i^(k+1)                (4)
the fitness of the objective function is evaluated. 4) The three steps of velocity update, position update, and fitness calculation are repeated until a desired convergence criterion is met, usually a sufficiently good fitness or a maximum number of iterations.
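Steps 1)-4) can be sketched as a minimal minimizer. The sphere objective and the parameter values (w = 0.7, c1 = c2 = 1.5) are illustrative assumptions, not values taken from this chapter.

```python
import random

def pso(objective, lb, ub, dim, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO minimizing `objective` over [lb, ub]^dim via Eqs. (3)-(4)."""
    # Step 1: random positions within predefined ranges, zero velocities.
    x = [[random.uniform(lb, ub) for _ in range(dim)]
         for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [list(xi) for xi in x]                       # local bests p_i
    p_best_val = [objective(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = list(p_best[g]), p_best_val[g]   # global best p_g
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Step 2, Eq. (3): inertia + cognitive pull + social pull.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                # Step 3, Eq. (4): move the particle by its new velocity.
                x[i][d] += v[i][d]
            f = objective(x[i])
            if f < p_best_val[i]:                 # update local best
                p_best[i], p_best_val[i] = list(x[i]), f
                if f < g_best_val:                # update global best
                    g_best, g_best_val = list(x[i]), f
    return g_best, g_best_val

# Example: minimize the sphere function; its minimum is 0 at the origin.
best, val = pso(lambda xs: sum(t * t for t in xs), -5.0, 5.0, 2)
```

Note how the update uses only objective values and positions: no gradient of the objective is ever computed, which is one of the advantages discussed below.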
Similar to the GA, the PSO is also a population-based search algorithm. Firstly, it is initialized with a set of randomly generated potential solutions (particles), and then it iteratively performs the search for the optimum one. But unlike the GA, every particle in PSO flies through the search space with a velocity which is dynamically adjusted according to its own flying experience and its companions' flying experience, instead of using genetic operators, such as crossover and mutation, to reproduce individuals. Namely, the particles have a tendency to fly towards better and better search areas over the course of the search process without using the complex techniques used in the GA. Therefore, the PSO has a much more profound intelligent background and can be implemented more easily than the GA. Generally, it is known that the PSO is much simpler, has fewer parameters to adjust and is computationally inexpensive since its memory and CPU speed requirements are low [16]. Furthermore, it does not require gradient information of the objective function being considered; only its values are necessary. Based on these advantages, the PSO has been successfully applied to solve a lot of practical application problems such as function
optimization, artificial neural network training, pattern recognition, fuzzy control and some other fields in recent years [17].
3. Mathematical Formulation

3.1. Physical Model

Figure 2 shows an irregular quadrilateral enclosure (all dimensions are in meters) which is filled with an absorbing, emitting, scattering and gray gas with an absorption coefficient κ_a and a scattering coefficient σ_s, as described in reference [10]. The non-radiative volumetric heat source is Q = 5.0 kW/m³. The walls are gray walls, and their temperatures are all T_w = 1000 K. The spatial and angular domains are discretized into (N_x × N_y) = 10 × 10 control volumes and (N_θ × N_φ) = 4 × 20 control angles, which corresponds to the S8 quadrature scheme [18]. The temperature distribution in the gray gas is determined from the following energy equation [19]

  ∇·q_r = β_0 (1 − ω_0) [ 4π I_b − Σ_{n=1}^{N_θ} Σ_{m=1}^{N_φ} I^{mn} ΔΩ^{mn} ] = Q                (5)

where q_r is the radiative heat flux and I is the radiation intensity. Also, β_0 = κ_a + σ_s is the extinction coefficient, and ω_0 = σ_s / β_0 is the scattering albedo.
Figure 2. Schematic of the physical system and the position of four measurement points. The quadrilateral enclosure has corners at (0, 0), (2.2, 0), (1.5, 1.2) and (0.5, 1.0).
3.2. Direct Problem

The radiative transfer equation governing the radiation intensity for a gray medium at any position r along a path s through an absorbing, emitting, and scattering medium is given by
  (1/β_0) dI(r, s)/ds = −I(r, s) + (1 − ω_0) I_b(r) + (ω_0 / 4π) ∫_{Ω'=4π} I(r, s') Φ(s'→s) dΩ'                (6)
Φ(s'→s) is the scattering phase function for radiation from incoming direction s' to scattered direction s. It is approximated by a finite series of Legendre polynomials as
  Φ(s'→s) = Φ(cos Ψ) = Σ_{j=0}^{J} C_j P_j(cos Ψ)                (7)
where C_j is the expansion coefficient, and J is the order of the phase function.
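For concreteness, the expansion in Eq. (7) can be evaluated with the standard Bonnet three-term recurrence for Legendre polynomials; the expansion coefficients used in the demo below are hypothetical, not values from this chapter.

```python
def phase_function(cos_psi, coeffs):
    """Evaluate Eq. (7): Phi(cos Psi) = sum_{j=0}^{J} C_j P_j(cos Psi).

    Legendre polynomials are generated with the Bonnet recurrence
    (j + 1) P_{j+1}(x) = (2j + 1) x P_j(x) - j P_{j-1}(x).
    """
    p_prev, p_curr = 1.0, cos_psi        # P_0(x) = 1, P_1(x) = x
    total = coeffs[0] * p_prev
    if len(coeffs) > 1:
        total += coeffs[1] * p_curr
    for j in range(1, len(coeffs) - 1):
        p_next = ((2 * j + 1) * cos_psi * p_curr - j * p_prev) / (j + 1)
        p_prev, p_curr = p_curr, p_next
        total += coeffs[j + 1] * p_curr
    return total

# Isotropic scattering: C_0 = 1 and all higher coefficients zero, so Phi = 1.
iso = phase_function(0.3, [1.0])
# A hypothetical linear-anisotropic expansion: Phi = 1 + 0.5 cos(Psi).
lin = phase_function(0.3, [1.0, 0.5])
```

The isotropic case (C_0 = 1 only) recovers Φ = 1 for every scattering angle, which is the usual sanity check for such an expansion.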
Figure 3. Schematics of the finite volume grids: (a) control volume; (b) control angle.
The boundary condition for a diffusely emitting and reflecting wall can be written as follows:

  I(r_w, s) = ε_w I_b(r_w) + ((1 − ε_w)/π) ∫_{s'·n_w < 0} I(r_w, s') |s'·n_w| dΩ'