225 50 12MB
English Pages 219 [220] Year 2023
Machine Learning: Foundations, Methodologies, and Applications
Liang Feng · Abhishek Gupta Kay Chen Tan · Yew Soon Ong
Evolutionary Multi-Task Optimization Foundations and Methodologies
Machine Learning: Foundations, Methodologies, and Applications Series Editors Kay Chen Tan, Department of Computing, Hong Kong Polytechnic University, Hong Kong, China Dacheng Tao, University of Technology, Sydney, Australia
Books published in this series focus on the theory and computational foundations, advanced methodologies and practical applications of machine learning, ideally combining mathematically rigorous treatments of a contemporary topics in machine learning with specific illustrations in relevant algorithm designs and demonstrations in real-world applications. The intended readership includes research students and researchers in computer science, computer engineering, electrical engineering, data science, and related areas seeking a convenient medium to track the progresses made in the foundations, methodologies, and applications of machine learning. Topics considered include all areas of machine learning, including but not limited to: • • • • • • • • • • • • • • • •
Decision tree Artificial neural networks Kernel learning Bayesian learning Ensemble methods Dimension reduction and metric learning Reinforcement learning Meta learning and learning to learn Imitation learning Computational learning theory Probabilistic graphical models Transfer learning Multi-view and multi-task learning Graph neural networks Generative adversarial networks Federated learning
This series includes monographs, introductory and advanced textbooks, and stateof-the-art collections. Furthermore, it supports Open Access publication mode.
Liang Feng • Abhishek Gupta • Kay Chen Tan • Yew Soon Ong
Evolutionary Multi-Task Optimization Foundations and Methodologies
Liang Feng College of Computer Science Chongqing University Chongqing, China
Abhishek Gupta Singapore Institute of Manufacturing Technology Agency for Science, Technology and Research Singapore, Singapore
Kay Chen Tan Department of Computing The Hong Kong Polytechnic University Hong Kong, China
Yew Soon Ong School of Computer Science & Engineering Nanyang Technological University Singapore, Singapore
ISSN 2730-9908 ISSN 2730-9916 (electronic) Machine Learning: Foundations, Methodologies, and Applications ISBN 978-981-19-5649-2 ISBN 978-981-19-5650-8 (eBook) https://doi.org/10.1007/978-981-19-5650-8 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
The human brain is a prime specimen of the power of evolution, and an inspiration to the field of artificial intelligence as a whole. While biological evolution may be a slow process, its in silico counterpart—forming the field of evolutionary computation—is a highly parallelizable approach to search, optimization, and machine learning. A canonical example of this approach is the evolutionary algorithm (EA), where computationally encoded populations of solution prototypes are sampled, evaluated, and “evolved” (updated) via operators mimicking principles of inheritance and natural selection. The algorithms start with a parent population of candidate solutions (also referred to as individuals) initialized in a predefined search space. The individuals undergo randomized reproduction operations to produce a generation of offspring that are then exposed to environmental selection pressures, with a selected subset forming the parent population for the next iteration. This procedure runs repeatedly, terminating when a target solution is discovered, the population satisfactorily converges, or available computational budget is exhausted. Despite the apparent simplicity of these steps, EAs demonstrate strong search and optimization capability. They have therefore been successfully applied to arrive at optimal or near-optimal solutions across a wide range of real-world problem settings. However, unlike the natural world where evolution has engendered diverse species and produced differently skilled sub-populations, in silico EAs are typically designed to evolve a set of solutions specialized for just a single target task. This convention of problem-solving in isolation tends to curtail the power of implicit parallelism of a population. Skills evolved for a given problem instance do not naturally transfer to populations tasked to solve another. Hence, convergence rates remain restrained, even in settings where related tasks with overlapping search spaces, similar optimal solutions, or with other forms of reusable information routinely recur. In the light of the above, the emerging Evolutionary Multitasking (EMT) paradigm illuminates a new pathway for evolutionary computation research, offering a unique perspective on pushing the envelope of implicit parallelism of EAs. Instead of tackling tasks independently, one or more populations are jointly evolved v
vi
Preface
on a set of distinct tasks, with carefully crafted mechanisms for skills transfer between them. Such a setup not only raises new theoretical questions on what, how, and when to transfer in the unique context of EMT, e.g., for scaling efficiently to growing task sets with varying levels of inter-task relatedness, but also unpacks a new box of tools for practitioners engaged in real-world problem-solving. The goal of this book is to describe advances made in developing EMT algorithms for solving various complex, use-inspired optimization problems. The book is divided into four parts. In Part I, we give an introduction to conventional evolutionary optimization and conceptualize its extension to EMT. An overview of practical applications of EMT is also provided therein. In Part II, we elaborate on EMT approaches spanning implicit and explicit multitasking strategies for solving continuous optimization problems. In Part III, we further expand our discussions to cover hard combinatorial optimization problems. Part IV contains examples in largescale optimization, including single- and multi-objective optimization problems with high dimensional search spaces. Chongqing, China Singapore, Singapore Hong Kong, China Singapore, Singapore
Liang Feng Abhishek Gupta Kay Chen Tan Yew Soon Ong
Contents
Part I
Background
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1.2 Evolutionary Optimization . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1.3 Evolutionary Multi-Task Optimization . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 1.4 Organization of the Book .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .
3 3 4 5 8
2 Overview and Application-Driven Motivations of Evolutionary Multitasking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.1 An Overview of EMT Algorithms . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.2 EMT in Real-World Problems .. . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 2.2.1 Category 1: EMT in Data Science Pipelines . . . . . . . . . . . . . . . . . . 2.2.2 Category 2: EMT in Evolving Embodied Intelligence .. . . . . . . 2.2.3 Category 3: EMT in Unmanned Systems Planning . . . . . . . . . . . 2.2.4 Category 4: EMT in Complex Design . . . . .. . . . . . . . . . . . . . . . . . . . 2.2.5 Category 5: EMT in Manufacturing, Operations Research . . . 2.2.6 Category 6: EMT in Software and Services Computing .. . . . .
11 11 12 12 16 18 20 23 25
Part II
Evolutionary Multi-Task Optimization for Solving Continuous Optimization Problems
3 The Multi-Factorial Evolutionary Algorithm.. . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.1 Algorithm Design and Details . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.1.1 Multi-Factorial Optimization .. . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.1.2 Similarity and Difference Between Multi-factorial Optimization and Multi-Objective Optimization .. . . . . . . . . . . . . 3.1.3 The Multi-Factorial Evolutionary Algorithm .. . . . . . . . . . . . . . . . .
31 31 31 33 34
vii
viii
Contents
3.2 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.2.1 Multitasking Across Functions with Intersecting Optima . . . . 3.2.2 Multitasking Across Functions with Separated Optima . . . . . . 3.2.3 Discussions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4 Multi-Factorial Evolutionary Algorithm with Adaptive Knowledge Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.1 Algorithm Design and Details . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.1.1 Representative Crossover Operators for Continuous Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.1.2 Knowledge Transfer via Different Crossover Operators in MFEA .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.1.3 MFEA with Adaptive Knowledge Transfer . . . . . . . . . . . . . . . . . . . 4.2 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.2.1 Experimental Setup .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.2.2 Performance Metric .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.2.3 Results and Discussions . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.2.4 Other Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5 Explicit Evolutionary Multi-Task Optimization Algorithm .. . . . . . . . . . . . 5.1 Algorithm Design and Details . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.1.1 Denoising Autoencoder . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.1.2 The Explicit EMT Paradigm . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.2 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 5.2.1 Single-Objective Multi-Task Optimization .. . . . . . . . . . . . . . . . . . . 5.2.2 Multi-Objective Multi-Task Optimization .. . . . . . . . . . . . . . . . . . . . 5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . Part III
39 40 43 44 44 47 48 48 51 53 57 57 58 59 66 73 75 76 76 78 82 82 88 94
Evolutionary Multi-Task Optimization for Solving Combinatorial Optimization Problems
6 Evolutionary Multi-Task Optimization for Generalized Vehicle Routing Problem with Occasional Drivers . . .. . . . . . . . . . . . . . . . . . . . 6.1 Vehicle Routing Problem with Heterogeneous Capacity, Time Window and Occasional Driver (VRPHTO) . . . . . . . . . . . . . . . . . . . . 6.1.1 Variants of the Vehicle Routing Problem . .. . . . . . . . . . . . . . . . . . . . 6.1.2 Mathematical Formulation for VRPHTO. .. . . . . . . . . . . . . . . . . . . . 6.2 Algorithm Design and Details . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.2.1 Evolutionary Multitasking for VRPHTO . .. . . . . . . . . . . . . . . . . . . . 6.2.2 Permutation-Based Unified Representation Scheme and Decoding Exemplar .. . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.2.3 Extended Split Procedure.. . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.2.4 Routing Information Exchange Across Instances in Evolutionary Multitasking.. . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.2.5 Chromosome Evaluation in Evolutionary Multitasking . . . . . .
97 98 98 99 101 101 102 104 106 107
Contents
ix
6.3 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.3.1 Benchmark Generation . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.3.2 Experiment Setup .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.3.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing Problem .. . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.1 Capacitated Vehicle Routing Problem (CVPR). . . .. . . . . . . . . . . . . . . . . . . . 7.2 Algorithm Design and Details . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.2.1 Learning of Mapping Across CVRPs . . . . . .. . . . . . . . . . . . . . . . . . . . 7.2.2 Knowledge Transfer Across CVRPs . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.3 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.3.1 Experiment Setup .. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.3.2 Results and Discussions . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.3.3 Real-World Routing Application: The Package Delivery Problem . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . Part IV
107 107 111 111 122 123 123 125 126 127 130 131 132 141 143
Evolutionary Multi-Task Optimization for Solving Large-Scale Optimization Problems
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.1 Existing Approaches for Simplifying Search Space of Large-Scale Single-Objective Optimization Problems . . . . . . . . . . . . . . . . 8.2 Algorithm Design and Details . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.2.1 Construction of the Simplified Problem Space .. . . . . . . . . . . . . . . 8.2.2 Learning of Mapping Across Problem Spaces . . . . . . . . . . . . . . . . 8.2.3 Knowledge Transfer Across Problem Spaces . . . . . . . . . . . . . . . . . 8.2.4 Reconstruction of the Simplified Space . . .. . . . . . . . . . . . . . . . . . . . 8.2.5 Summary of the Multi-Space Evolutionary Search . . . . . . . . . . . 8.3 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.3.1 Experimental Setup .. . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.3.2 Results and Discussion . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 8.3.3 AI Application of Recommender System .. . . . . . . . . . . . . . . . . . . . 8.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.1 Existing Approaches for Large-Scale Evolutionary Multi-Objective Optimization . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.2 Algorithm Design and Details . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.2.1 Outline of the Algorithm . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.2.2 Problem Variation.. . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.2.3 Multi-Factorial Evolutionary Search .. . . . . .. . . . . . . . . . . . . . . . . . . .
147 148 149 150 151 152 153 153 154 154 157 167 169 171 171 173 174 174 179
x
Contents
9.3 Empirical Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.3.1 Experimental Settings . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.3.2 Performance Comparisons with the State-of-the-Arts . . . . . . . . 9.3.3 Effectiveness of Knowledge Transfer .. . . . .. . . . . . . . . . . . . . . . . . . . 9.3.4 Parameter Sensitivity Analysis . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.3.5 Real-World Application of Neural Network Training Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . .
182 182 185 199 199 200 205
References .. .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . 207
Part I
Background
Chapter 1
Introduction
1.1 Optimization Optimization is an essential ingredient in many real-world problem solving systems and artificial intelligence (AI) algorithms [1]. For instance, optimization minimizes the loss/cost function while training machines to learn from data [2], gives costefficient routing solutions for green city logistics [3], discovers out of the box engineering design solutions that may be difficult for humans to imagine [4], and even provides the means to democratize AI itself by automating the configuration of machine learning model architectures [5]. Generally, optimization defines the process of finding sets of inputs to a target objective function which result in the minimum or maximum of that function. Mathematically, a single-objective optimization problem can be expressed in a standard form, as follows: minimize f (x), .
subject to hi (x) ≤ 0, i = 1, 2, . . . , p,
(1.1)
gj (x) = 0, j = 1, 2, . . . , q. where .f (x) is the objective function and .x is the decision variable vector (encoding a candidate solution) that can be written as .x = (x1 , x2 , . . . , xD ) in a D-dimensional search space. Problems with objective functions to be maximized can also be represented in this form, since maximization of .f (x) is the same as minimizing the negative of .f (x). .hi (x) and .gj (x) represent p and q inequality and equality constraints, respectively, which must be satisfied to ensure solution feasibility.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 L. Feng et al., Evolutionary Multi-Task Optimization, Machine Learning: Foundations, Methodologies, and Applications, https://doi.org/10.1007/978-981-19-5650-8_1
3
4
1 Introduction
Moreover, side constraints, i.e., .xkl ≤ xk ≤ xku (where .xkl and .xku are the lower and upper bounds of .xk ), can be directly expressed among the aforementioned inequalities. Note that in the remainder of this chapter, constraint functions shall mostly be left out for simplicity of exposition.
1.2 Evolutionary Optimization The ubiquity of the optimization problem expressed by Eq. 1.1 has led to immense interest among scientists and engineers for decades, resulting in the development of a plethora of computational techniques for tackling them. While there exists a plethora of associated algorithms to choose from, the interest here lies in a family of nature-inspired optimization methodologies that make up the field of evolutionary computation [6, 7]. As the name suggests, the algorithms belonging to this field— often referred to as evolutionary algorithms (EAs for short)—draw inspiration from the foundational principles of evolutionary biology as introduced by Charles Darwin in his seminal book “On the Origin of Species by Means of Natural Selection” [8]. The key distinguishing feature of EAs, relative to conventional (point-based) optimization methods such as gradient ascent/descent, is that they are able to avoid local optima by concurrently processing multiple regions of the search space through a population-based search strategy. As illustrated in Algorithm 1, an evolutionary algorithm (EA) starts with a population of individuals that undergo randomized reproduction operations to produce offspring. This is followed by a computational analogue of natural selection acting on the offspring to guide the evolving population towards favorable regions of the search space. This procedure is executed iteratively, terminating when a predefined condition is satisfied. As EAs are largely agnostic to the mathematical properties of the objective (and constraint) functions at hand, they are broadly applicable to a wide variety of search and optimization tasks—with the expectation of returning near-optimal solutions. What’s more, if prior information about a problem—in the form of analytical expressions and derivatives of the objective functions, or heuristic information about the likely structure of optimal solutions [9]—is available, then that can easily be incorporated within an EA to endow it with improved performance guarantees [10]. The motivation behind emulating nature (particularly evolutionary processes) for solving optimization problems stems from the observation that populations of living organisms demonstrate consummate problem-solving ability. The selfadaptive mechanisms of natural selection have the power to eliminate one of the greatest hurdles in software development, i.e., the need to painstakingly specify all the features of a problem and the actions a program must take to deal with them [11]. In light of the above, it is recognized that one of the keys to the success of computational analogues of evolution, namely EAs, is the emergent phenomenon of implicit parallelism [12]. Such parallelism emerges as a consequence of the
1.3 Evolutionary Multi-Task Optimization
5
Algorithm 1: Pseudocode of an evolutionary algorithm 1 Begin 2 k := 0; /* Initialize the evaluation counter. */ 3 Initialize and evaluate [P (k)]; /* Create an initial population. */ 4 while Stopping conditions are not satisfied do
P (k) := Reproduction[P (k)]; /* Apply reproduction operators.*/ Evaluate [P (k)]; /* Evaluate the fitness of generated individuals.
5 6
*/ 7 8
P (k + 1) := select[P (k), P (k)]; /* Create a new population. */ k := k + 1; /* Increase the evaluation counter. */
9 End
population-based search strategy, and serves as the main distinguishing advantage of EAs over other optimization techniques. Simply put, implicit parallelism allows EAs to concurrently sample, evaluate, and process a vast number of regions of the overall search space while manipulating a relatively small number of individuals - with natural selection guiding the fraction of individuals in a region to grow at a rate proportional to the statistical estimate of its fitness. EAs are thus able to steer multiple individuals through the search space, offering better possibility of overcoming local optima to converge towards those areas that tend to minimize the population’s expected objective function value. In the literature, EAs encompass multiple major branches, including evolution strategies, evolutionary programming, genetic algorithms, etc. [6, 7]. These approaches mainly differ in the level at which they simulate the natural evolution process, the representations of individual solutions, and the search operators used to conduct exploration and exploitation of the optimization search space. Over the last decades, a vast amount of research has been dedicated to the design of new solution representation schemes, search operators, etc., towards enhanced evolutionary optimization performance in problems ranging from NP-hard combinatorial optimization to multi-objective and other high-dimensional optimization search [13–18].
1.3 Evolutionary Multi-Task Optimization Multi-task optimization (MTO) with EAs, alternatively labelled as evolutionary multitasking (EMT) or even multi-factorial optimization, puts forth the novel concept of simultaneously solving multiple self-contained optimization problems/tasks with the added scope of computationally encoded knowledge transfer between them [19–21]. If the problems happen to bear some commonality and/or complementarity in terms of their optimal solution(s) and/or function landscapes then the scope for
6
1 Introduction
knowledge transfer often leads to significant performance improvements relative to solving each problem in isolation [22]. One of the key motivations behind the idea is drawn from the observation that real-world problems seldom exist in isolation. As such, humans possess the innate cognitive ability of recognizing and reusing recurring patterns from their problem-solving experiences to solve related new tasks more efficiently. Along similar lines, AI systems of practical relevance (including those in industrial settings) are also expected to be faced with multiple related problems over their lifetime—with multitasking offering the algorithmic platform for the reuse of knowledge to take place autonomously without the constant need of a human in the loop. Prior to the proposal in [19], it is observed that despite possessing a population of evolving individuals at its disposal, the design of EAs had primarily been focused on solving only a single target task at a time. In contrast, the notion of multitasking pushes the envelope of existing EAs, leveraging the power of implicit parallelism in order to simultaneously search for multiple optimal solutions corresponding to multiple distinct (but possibly related) optimization tasks at once—with each task serving as a distinct factor influencing the evolution of the population. Consider K tasks as depicted in Fig. 1.1, where we let .fi : Xi → R be the objective function of the ith optimization task defined on a compact subset .Xi ⊂ RDi . The goal is to find .x∗i = arg minxi ∈Xi fi (xi ). The input of EMT is therefore a set of such optimization tasks .IS (Input Set).: {f1 , . . . , fi , . . . , fK }, and their corresponding search spaces .{X1 , . . . , Xi , . . . , XK }. The desired output of EMT is then given by the set of optimized solutions .OS (Output Set).: {x∗1 , . . . , x∗i , . . . , x∗K }. In what follows, we provide a more formal description of EMT from the perspective of probabilistic model-based evolutionary search.1 To this end, we first define an optimization task, with fitness function f .i : .Xi → R, in terms of the expected fitness under a probability density function as [25]: .
min
pri (xi ) Xi
fi (xi ) · pri (xi ) · dxi .
(1.2)
In probabilistic model-based EAs, .pri (xi ) represents the underlying search distribution of a population of evolving solutions [26]. Note, if .x∗i is the true optimum of ∗ ∗ .fi (xi ), then a Dirac delta function centred at .x optimizes Eq. 1.2; i.e., .pr (xi ) = i i δ(xi −x∗i ) since . Xi fi (xi )δ(xi −x∗i )dxi = fi (x∗i ). As such, probabilistic reformation does not change the optimization outcome. Based on this, the EMT formulation can
1
Only single-objective minimization without constraints is depicted for simplicity. The concept of MTO readily extends to multiple multi-objective optimization tasks [23], or even a mixture of single- and multi-objective optimization tasks [24].
1.3 Evolutionary Multi-Task Optimization
7
Fig. 1.1 An illustration of MTO with K tasks
be given by the following generalization of Eq. 1.2: minimize{wij ,prj (z)∀i,j }
K i=1
.
subject to
K
Z
fi (z) · [
K
wij · prj (z)] · dz,
j =1
wij = 1, ∀i,
(1.3)
j =1
wij ≥ 0, ∀i, j. Here, .z indicates a point in a unified search space .Z, from which solutions in taskspecific search spaces .{X1 , . . . , Xi , . . . , XK } can be decoded. To avoid introducing extra notation, we assume herein that the task-specific search spaces (and therefore .Z) are equivalent, such that objective functions .f1 , f2 , . . . , fK can be considered to be directly defined in .Z without the need for additional encoding and decoding steps. .prj (z) is the population distribution corresponding to the j th task, and .wij s
8
1 Introduction
are weights of a probability mixture model. Note that Eq. 1.3 would be exactly solved when all probabilistic components of the mixture converge to the respective optimal Dirac delta functions .prj∗ (z) = δ(z − z∗j )) in .Z, and .wij = 0 for all .i = j . It follows that optimizing Eq. 1.3 guides a multitasking population to jointly converge to the global optimums of all K tasks. What’s more, during the course of solving Eq. 1.3, the term .[ K w j =1 ij ·prj (z)] provides a unique bridge for knowledge transfer to occur among them. To elaborate, if candidate solutions evolved for the j th task (drawn from .pj (z)) turn out to be performant on the ith task as well, then it becomes possible for them to be transferred across through sampling of the mixture model - with the extent of cross-sampling being mandated by the mixture weight .wij . In contrast, if solutions transferred from a given source do not excel at the recipient task, then their mutual mixture weights can be gradually neutralized. As a way to reduce the number of variables to be optimized in Eq. 1.3, we may further consider the weights to be tied by imposing a symmetry condition on them, i.e., .wij = wj i ∀i, j . Doing so reflects the practical intuition that if the solutions evolved for task j complement the solving of task i, then the reverse is also likely to be true. Notably, if any pair of tasks do not complement each other, then setting the mutual mixture weight to zero mitigates the danger of any harmful (negative) transfer between them [27, 28].
1.4 Organization of the Book This book describes advances in EMT algorithms in a variety of practical problemsolving settings. The book is divided into four parts. In Part I, we provide a broad literature review with application-driven motivations of EMT. In Part II, we present EMT approaches for solving continuous optimization tasks spanning both singleand multi-objective problems. In Part III, we introduce EMT for combinatorial optimization. Finally in Part IV, drawing on the latest techniques in both continuous and combinatorial optimization, we propose multi-space evolutionary search for tackling large-scale single- and multi-objective optimization problems. Part I contains two chapters (i.e., Chaps. 1–2). Chapter 1 briefly introduced the foundations of evolutionary computation, and conceptualized multi-task optimization from a unique probabilistic modelling perspective. The second chapter begins with a short overview of multi-task algorithms in the literature. It subsequently motivates the developments presented in the remainder of the book through a broad review of application-oriented explorations of EMT. Part II contains three chapters (i.e., Chaps. 3–5). Chapter 3 introduces the multi-factorial evolutionary algorithm (MFEA). The MFEA expands the scope of a traditional evolutionary optimization solver, using just a single population to tackle multiple tasks simultaneously. This algorithm imparts a form of implicit knowledge transfer between tasks through genetic crossover operators that act on and recombine solutions belonging to different tasks. The details of the algorithm as well as experimental MTO test-cases shall be presented. Chapter 4 describes an
1.4 Organization of the Book
9
alternative kind of EMT algorithm with multiple populations, where the knowledge transfer across tasks is realized by explicit transfer learning components. Thereafter, in Chap. 5, a method for adaptive inter-task knowledge transfer in the MFEA is developed. To this end, the performance achieved under various crossover operators inducing implicit genetic transfers is first analyzed. An online operator selection procedure is then crafted based on their accumulated performance scores over the course of a multi-task evolutionary search. Part III contains two chapters (i.e., Chaps. 6–7). Therein, we mainly focus on NP-hard combinatorial optimization problems. In Chap. 6, we first present a single-population EMT algorithm for solving what is called the generalized vehicle routing problem with occasional drivers. Next, a multi-population EMT algorithm with explicit knowledge transfer is proposed and tested on well-known capacitated vehicle routing benchmarks. Part IV contains two chapters (i.e., Chaps. 8–9). Therein, building on the techniques and observations on both continuous and combinatorial optimization tasks, we put forth a novel multi-space evolutionary framework for solving largescale optimization problems. Specific instantiations of multi- and single-population EMT algorithms for solving single- and multi-objective optimization tasks are then proposed in Chaps. 8 and 9, respectively.
Chapter 2
Overview and Application-Driven Motivations of Evolutionary Multitasking
2.1 An Overview of EMT Algorithms A plethora of EMT algorithms have been proposed lately. Some of these either directly or indirectly make use of the probabilistic formulation of MTO discussed in Chap. 1. Nevertheless, as noted in [29], most algorithms typify one of the two methodological classes, i.e., implicit EMT and explicit EMT, described below. An extensive analysis of these methods is not included herein as excellent reviews are available elsewhere [30, 31]. Hence, only a handful of representative approaches are discussed. (1) EMT with implicit knowledge transfer: In these methods, the exchange of information between tasks occurs through evolutionary crossover operators acting on candidate solutions of a single population [32–34]. The population is encoded in a unified search space .Z. Implicit genetic transfers materialize as solutions evolved for different tasks crossover in .Z, hence exchanging learnt skills coded in their genetic material. Over the years, a multitude of evolutionary crossover operators have been developed, each with their own biases. The success of implicit transfers between any task pair thus depends on whether the chosen crossover operator is able to reveal and exploit relationships between their objective function landscapes. For example, in [35], an offline measure of inter-task correlation was defined and evaluated for parent-centric crossovers synergized (strictly) with gradient-based local search updates. In [36], an online measure was derived by means of a latent probability mixture model, akin to Eq. (1.3); the mixture was shown to result from the use of parent-centric operators in the single-population MFEA. (Adapting the extent of transfer based on the coefficients of the mixture model then led to the MFEA-II algorithm.) Greater flexibility in operator selection can however be achieved through self-adaptation strategies as proposed in [37], where data generated during evolutionary search is used for online identification of effective crossover operators for transfer. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 L. Feng et al., Evolutionary Multi-Task Optimization, Machine Learning: Foundations, Methodologies, and Applications, https://doi.org/10.1007/978-981-19-5650-8_2
11
12
2 Overview and Application-Driven Motivations of Evolutionary Multitasking
(2) EMT with explicit knowledge transfer: Here, information exchanges take place between multiple populations. Each population corresponds to a task in MTO and evolves in problem-specific search space .Xi , ∀i. The populations mostly evolve independently, between periodic stages of knowledge transfer. An explicit transfer mechanism is triggered whenever a predefined condition, e.g., transfer interval, is met [38]. For K homogeneous tasks where .X1 = X2 = · · · = XK , island-model EAs for multitasking have been proposed [39], with added functionality to control the frequency and quantity of solution crosssampling between them [40]. Under heterogeneous search spaces, mapping functions .ψij : Xi → Xj , for all .i = j , must be defined to reconcile ith and j th populations. To this end, while most existing EMT methods have made use of linear mapping functions [38, 41], the applicability of fast yet expressive nonlinear maps, as proposed in [42, 43], are deemed worthy of future exploration. Both methodological classes of implicit and explicit EMT have their merits. In the spirit of the no free lunch theorem [44], algorithmic design preferences must therefore be guided by the attributes of the application at hand. Note that implicit genetic transfers naturally emerge from standard crossover operations in unified space, without having to craft ad hoc transfer mechanisms; hence, implementation is relatively simple and scales well for large K. However, composing a unified space and search operators for heterogeneous tasks becomes highly non-trivial (operators that work well for one task may not be effective for another). In contrast, the multipopulation approach of explicit transfer suppresses the need for unification, allowing each task to hold specialized search operators. But additional complexity may be introduced in having to define .O(K 2 ) inter-task solution mappings (.ψij ’s) under heterogeneous search spaces [31].
2.2 EMT in Real-World Problems The aim of this section is to draw the attention of both researchers and practitioners to the many practical use-cases of EMT. Prior literature exploring real-world applications is encapsulated in six broad categories, together with representative case studies and published results that showcase its benefits.
2.2.1 Category 1: EMT in Data Science Pipelines Many aspects of data science and machine learning (ML) pipelines benefit from the salient features of EAs for optimization. Problems such as feature selection [45], hyper-parameter tuning [46], neural architecture search [47], etc., involve nondifferentiable, multimodal objective functions and discrete search spaces that call
2.2 EMT in Real-World Problems
13
for gradient-free optimization. Population-based EAs have even been considered as worthy rivals to, or in synergy with, stochastic gradient descent for learning with differentiable loss functions [48, 49]. Despite the advances, there remain challenges in the efficient scaling of EAs to scenarios characterized by big data (e.g., containing a large number of individual data points), large-scale (high-dimensional) feature/parameter spaces, or involving building sets of multiple learning algorithms (e.g., ensemble learning). EMT provides different pathways to sustain the computational tractability of EAs in such data science settings. EMT with auxiliary task generation: Approaches to augment the training of ML models by turning the problem into MTO—with artificially generated auxiliary tasks—were introduced in [50]. In neural networks for instance, tasks could be defined by different loss functions or network topologies, with the transfer of model parameters between them leading to better training [51]. More generally, for scaling-up the evolutionary configuration of arbitrary ML subsystems, the idea of constructing auxiliary small data tasks from an otherwise large dataset was proposed in [52, 53]. The auxiliary tasks can then be combined with the main task in EMT, accelerating search by using small data to quickly optimize for the large dataset; speedups of over .40% were achieved in some cases of wrapperbased feature selection via an EMT algorithm with explicit transfer [52]. In another application for feature selection, the tendency of stagnation of EAs in highdimensional feature spaces was lessened by initiating information transfers between artificially generated low-dimensional tasks [54, 55]. EMT on sets of learning algorithms: Given a training dataset, an ensemble (or set) of classification models could be learnt by simple repetition of classifier evolution. However, this would multiply computational cost. As an alternative, the study in [56] proposed a variant of multi-factorial genetic programming (MFGP) for simultaneous evolution of an ensemble of decision trees. MFGP enabled a set of classifiers to be generated in a single run, with the transfer and reuse of common subtrees providing substantial cost savings in comparison to repeated (independent) runs of genetic programming. Moving upstream in the data science pipeline, [57] formulated the task of finding optimal feature subspaces for each base learner in an ensemble as an MTO problem. An EMT feature selection algorithm was then proposed to solve this problem, yielding feature subspaces that often outperformed those obtained by seeking the optimal for each base learner independently. A similar idea but targeting the specific case of hyperspectral image classifiers was presented in [58]. Beyond the training of ML models, the literature evinces applications of EMT for image processing as well. For sparse unmixing of hyperspectral images, the approaches in [59, 60] suggest to first partition an image into a set of homogeneous regions. Each member of the set is then incorporated as a constitutive sparse regression task in EMT, allowing implicit genetic transfers to exploit similar sparsity patterns. Results revealed faster convergence to near-optimal solutions via EMT, as opposed to processing pixels or groups of pixels independently. In [61], a multifidelity MTO procedure was incorporated into the hyperspectral image processing framework. A surrogate model was used to estimate the gap between low- and high-
14
2 Overview and Application-Driven Motivations of Evolutionary Multitasking
fidelity evaluations to achieve further improvements in accuracy and algorithmic efficiency. EMT across non-identical datasets: It is envisaged that future cloud-based black-box optimization services shall open up diverse applications of EMT for automated configuration of learning algorithms. Comparable services are already on the horizon, making it possible for users to upload their raw data to the cloud and have high-quality predictive models delivered without the need for extensive human inputs [62]. Different user groups may possess non-identical data, and, as depicted in Fig. 2.1, may even pose different device requirements constraining the transition of trained models from the cloud to the edge. In such settings, EMT could effectively function as an expert ML practitioner, exploiting knowledge transfers across non-identical but related domains to speed up model configuration. Early works showing the plausibility of this idea—using a distinct class of multi-task Bayesian optimization algorithms—were presented in [63, 64]. Recently, an intriguing application of EMT feature selection to understand the employability of university graduates has been explored [65]. Students studying different disciplines (business, engineering, etc.) formed multiple non-identical cohorts, with the data for each cohort forming a feature selection task in MTO. Then, by allowing common features/attributes to be shared through multitasking, efficient identification of determinants that most influence graduate employment outcomes
Fig. 2.1 Cloud computing platforms house black-box optimization services where users can simply upload their raw data to have optimized predictive models delivered [62]. In this setting, EMT could harness knowledge transfers across non-identical but related tasks (e.g., with different training data and/or device requirements) to enable efficient model configuration
2.2 EMT in Real-World Problems
15
was achieved. In [66], a multi-task genetic programming algorithm for feature learning from images was proposed. For a given pair of related but non-identical datasets, the approach jointly evolves common trees together with task-specific trees that extract and share higher-order features for image classification. The effectiveness of the approach was experimentally verified for the case of simultaneously solving two tasks, showing similar or better generalization performance than single-task genetic programming. • Case study in symbolic regression modeling [67] Many works in the literature have explored multitasking in connection with genetic programming [69, 70]. Here, a real-world study of MFGP comprising two symbolic regression tasks with distinct time series data is considered [67]. The first problem instance contains 260 data points representing monthly average atmospheric CO.2 concentrations collected at Alert, Northwest Territories, Canada from January 1986 to August 2007. The second problem instance contains 240 data points representing monthly U.S. No 2 Diesel Retail Prices (DRP) from September 1997 to August 2017. Two simplified tasks with reduced time series datasets were also generated by subsampling of the original data. These were labelled as S_CO.2 and S_DRP, respectively. The MFGP was thus applied to solve three pairs of tasks, i.e., {CO.2 , S_CO.2 }, {CO.2 , DRP} and {DRP, S_DRP}, each with the goal of deriving a symbolic (closed-form mathematical) equation mapping elapsed time to the output prediction. Equations were evolved by minimizing their root mean square error (RMSE) [67]. Table 2.1 summarizes the RMSE values obtained by MFGP and its singletask counterpart SL-GEP [71]. Superior results are highlighted in bold. As can be seen, MFGP outperformed SL-GEP in all experimental settings. Particularly, the best results of CO.2 and DRP were achieved when paired with their corresponding simplified problem variants. This is intuitively agreeable as the simplified tasks (generated by subsampling) are expected to be similar to the original problem instances, hence engendering fruitful transfers of genetic building-blocks that speed up convergence and improve performance. Table 2.1 RMSE values achieved by MFGP and single-task SL-GEP for the symbolic regression of time series data. Best values are marked in bold. The results are obtained from [67]
CO.2 Paired problem RMSE CO.2 N/A MFGP S_CO.2 4.828 DRP 5.495 S_DRP N/A SL-GEP 5.504
DRP RMSE 0.494 N/A N/A 0.478 0.534
16
2 Overview and Application-Driven Motivations of Evolutionary Multitasking
2.2.2 Category 2: EMT in Evolving Embodied Intelligence Evolutionary robotics takes a biologically inspired view of the design of autonomous machines [72]. In particular, EAs are used to adapt robots/agents to their environment by optimizing the parameters and architecture of their control policy (i.e., the function transforming their sensor signals to motor commands) while accounting for, or even jointly evolving, the morphology of the agent itself. It is the design of intelligent behaviour through this interplay between an agent and its environment, mediated by the physical constraints of the agent’s body, sensory and motor system, and brain, that is regarded as embodied intelligence [73]. Put differently, while mainstream robotics seeks to generate better behaviour for a given agent, embodied intelligence enables agents to adapt to diverse forms, shapes and environments, hence setting the stage for the efficacy of EMT with implicit or explicit genetic transfer to be naturally realized [74]. Imagine evolving embodied intelligence by means of different tasks parameterized by an agent’s morphological and environmental descriptors. In [75], a multitasking analogue of an archive-based exploratory search algorithm [76] was used to train a 6-legged robot to walk forward as fast as possible under different morphologies derived by changing the lengths of its legs. Each set of lengths thus defined a specific task. The experiments evolved walking gait controllers for 2000 random morphologies (or tasks) at once, under the intuition that a particular controller might transfer as a good starting point for several morphologies. The results successfully substantiated this intuition, showing that a multi-task optimization algorithm was indeed able to significantly outperform a strong singletask baseline. Similarly, in [75] and [77], a set of planar robotic arm articulation tasks with variable morphology were formulated by parameterizing the arm by the length of its links. The objective of each task was then to find the angles of rotation of each joint minimizing the distance between the tip of the arm and a predefined target. The experiments in [77] confirmed that an anomaly detection model-based adaptive EMT algorithm, with explicit transfer strategy, could achieve faster convergence and better objective function values (averaged across all tasks) compared to the baseline single-task EA. While the two previous examples considered robot morphological variations, [68] applied EMT (in particular, an adaptive version of the MFEA) for simulationbased deep learning of control policies of a robot arm situated in different Meta-World environments [78]. As shown in Fig. 2.2, the various tasks in MTO involved deep neuroevolution of policy parameters of a robot arm interacting with different objects, with different shapes, joints, and connectivity. In the experiments, up to 50 tasks were evolved at the same time, with crossover-based exchange of skills between synergistic tasks leading to higher success rates as well as lower computational cost compared to a single-task soft actor critic algorithm [68]. • Case study in neuroevolution of robot controllers [36]
2.2 EMT in Real-World Problems
17
Feedback Sensor
Sensor
Environment
Environment Knowledge Transfer Window close
Drawer open Effector
Effector
Action
Action Other environments:
Window open
Button press
Door open
Pick place
Peg insert side
Drawer close
Fig. 2.2 The window close and drawer open tasks share similar approaching and pulling movements. Hence, training a robot to perform such tasks simultaneously via EMT allows mutually beneficial knowledge transfers to occur. The lower figure panel visualizes the same robot situated in other Meta-World environments that were included in the experimental study in [68]
Here, a case study on the classical double pole balancing problem under morphological variations is considered. The basic problem setup consists of two inverted poles of different lengths hinged on a moving cart. The objective is for a neural network controller to output a force that acts on the moving cart such that both poles are balanced (i.e., remain within an angle of .±36◦ from the vertical for a specified duration of simulated time), while also ensuring that the cart does not go out of bounds of a 4.8 m horizontal track. Neuroevolution of network parameters continues until the poles are successfully balanced, or the available computational budget is exhausted. The success rates of EAs over multiple randomly initialized runs are recorded for comparison. The input to the neural network is the state of the system which is fully defined by six variables: the position and velocity of the cart on the track, the angle of each pole from the vertical, and the angular velocity of each pole. The Runge-Kutta fourth-order method is used to simulate the entire system. Multiple morphologies in MTO were constructed by varying the difference in the lengths of the two poles. In particular, the length of the long pole was fixed at 1 m, while the length .ls of the shorter pole was set as either 0.60 m (.T1 ), 0.65 m (.T2 ), or 0.70 m (.T3 ). Four resulting MTO settings are denoted as {.T1 , T2 }, {.T1 , T3 }, {.T2 , T3 }, and {.T1 , T2 , T3 }. The architecture of the neural network controller (twolayer with ten hidden neurons) was kept the same for all tasks, thus providing an inherently unified parameter space for transfer. It is well-known that the double pole system becomes increasingly difficult to control as the length of the shorter pole approaches that of the long pole. However, by simultaneously tackling multiple tasks with different levels of difficulty, the controllers evolved for simpler tasks could transfer to help solve more challenging problem instances efficiently.
18
2 Overview and Application-Driven Motivations of Evolutionary Multitasking
Table 2.2 Comparison of success rates (in %) achieved by MFEA-II and a single-task canonical EA (CEA) on different double pole balancing problem instances. Results are obtained from [36]. Best results are marked in bold
Task
.
ls
CEA
T1 T2 .T3
0.60m 0.65m 0.70m
27% 0% 0%
. .
MFEA-II {.T1 , T2 } 30% 27% –
{.T1 , T3 } 30% – 7%
{.T2 , T3 } – 27% 27%
{.T1 , T2 , T3 } 47% 37% 17%
This intuition was borne out by the experimental studies in [36], results of which are also depicted in Table 2.2. A single-task canonical EA (CEA) could only achieve a success rate of 27% on task .T1 while failing on the more challenging instances .T2 and .T3 . In contrast, the MFEA-II algorithm, equipped with exactly the same operators as CEA, achieved better performance across all tasks by virtue of unlocking inter-task skills transfer. Not only did the success rate of .T1 reach 47% (indicating that useful information could even transfer from challenging to simpler tasks), but that of .T2 and .T3 also reached a maximum of 37% and 27%, respectively.
2.2.3 Category 3: EMT in Unmanned Systems Planning Evolutionary approaches are being used to optimize individual behaviours in robot swarms and unmanned vehicle systems. Consider unmanned aerial vehicles (UAVs) as an example. As their usage increases, UAV traffic management systems would be needed to maximize operational efficiency and safety [79], avoiding catastrophes such as collisions, loss of control, etc. In such settings, each UAV may be viewed as an individual agent that perceives its surroundings to solve its corresponding task (e.g., path planning). The communication of acquired perceptual and planning information to other UAVs in related environments could then lead to better and faster decisions collectively. An illustration is depicted in Fig. 2.3 where flight paths of different UAVs share similar straight or bent segments; these priors can be transferred and reused (as common solution building-blocks) to support real-time multi-UAV optimization. Explicit EMT offers a means to this end. An early demonstration of this idea was presented in [80], where two different multi-UAV missions were optimized jointly via the MFEA. The missions were optically distinct. While the first involved a pair of UAVs flying through two narrow openings in a barrier, the second involved four UAVs flying around a geofence of circular planform. The flight paths in both missions however possessed a hidden commonality. In all cases, the optimal magnitude of deviation from the straight line joining the start and end points of any UAV’s path was the same. The MFEA successfully exploited this commonality to quickly evolve efficient flight paths.
2.2 EMT in Real-World Problems
19
Base 3 Destination
2 1 6
5
8
4
7 11 10 9 Fig. 2.3 An illustration of multi-agent path planning. Red stars denote waypoints between the base station and the destination that must be visited by set of UAVs. The flight paths of different UAVs share similar, and hence transferrable, segments (such as segments 1-to-2 in path .p1 and 4-to-5 in path .p2 , or segments 7-to-8 in path .p3 and 9-to-11 in path .p4 ) due to their similar surroundings (e.g., buildings)
A similar application was carried out in [81] for the path planning of mobile agents operating in either the same or different workspaces. It was confirmed that EMT could indeed lead to the efficient discovery of workspace navigation trajectories with effective obstacle avoidance. In [82], a multi-objective robot path planning problem was considered to find solutions that optimally balance travel time and safety against uncertain path dangers. Given three topographic maps with distinct terrains, but bearing similarity in the distribution of obstacles, an EMT algorithm transferring evolved path information was shown to converge to sets of shorter yet safer paths quicker than its single-task counterpart. • Case study in safe multi-UAV path planning [83] As a real-world example, a case study on the multi-objective path planning of five UAVs deployed in a .10 × 7 km.2 region in the southwest of Singapore is presented. The problem is characterized by uncertainty, stemming from the sparsity of data available to model key environmental factors that translate into operational hazards. The objective is thus to minimize travel distance while also minimizing the probability of unsafe events (which could be caused by flying through bad weather,
20
2 Overview and Application-Driven Motivations of Evolutionary Multitasking
Fig. 2.4 Convergence trends of NSGA-II and MO-MFEA-II on multi-UAV path planning. MO-MFEA-II incorporates lower-fidelity auxiliary tasks to help optimize the high-fidelity target .T1 . Plots are obtained from [83]. The shaded area spans .1/2 standard deviation on either side of the mean performance
or by loss of control due to poor communication signal strength). The latter objective is quantified based on a path-integral risk metric derived in [79]. The resultant biobjective optimization problem is further supplemented with constraint functions to ensure safe distance between UAVs, concurrence with altitude boundaries, and prevention of geofence breaches; refer to [83] for a detailed description. The ultimate goal of such a path planning system is to enable real-time decision support. However, the path-integral risk metric is computed via a numerical quadrature scheme that becomes computationally expensive for accurate risk estimation (i.e., when using a high-resolution 1D mesh). Hence, an MTO formulation was proposed in [83] where cheaper low- and medium-fidelity auxiliary tasks were generated (by means of lower-resolution meshes) and combined with the main highfidelity task at hand. The high-, medium-, and low-fidelity tasks are denoted as .T1 , .T2 and .T3 , respectively. Figure 2.4 compares the optimization performance obtained by a single-task multi-objective EA [84] (solving just the high-fidelity task) and a multi-objective version of MFEA-II (MO-MFEA-II) [83] solving {.T1 , .T2 } or {.T1 , .T2 , .T3 }. The hypervolume metric [85] is used to quantify convergence trends in multidimensional objective space. As seen in the figure, both MO-MFEA-II settings led to better hypervolume scores faster than the conventional single-task approach. The speedup is greater when given two auxiliary tasks (i.e., in the case of MTO with {.T1 , .T2 , .T3 }), demonstrating the advantage of transferring good solutions generated by lowerfidelity tasks to quickly optimize the target problem instance.
2.2.4 Category 4: EMT in Complex Design The evaluation of solutions in scientific and engineering design domains often involves time-consuming computer simulation or complex laboratory procedures
2.2 EMT in Real-World Problems
21
to be carried out (such as synthesizing candidate protein structures for protein optimization). The need for active solution sampling and evaluation to solve such tasks from scratch can thus become prohibitively expensive. MTO provides an efficient alternative that has begun to attract widespread attention; examples of practical application have included finite element simulation-based system-in-package design [87], finite difference simulation-based optimization of well locations in reservoir models [88], parameter identification of photovoltaic models [89], optimization of active and reactive electric power dispatch in smart grids [90], design of a coupledtank water level fuzzy control system [91], to name a few. The hallmark of EMT in such applications lies in seeding transferred information into the search, hence building on solutions of related tasks to enable rapid design optimization. This attribute promises to particularly enhance the concpetualization phase of design exercises, where multiple concepts with latent synergies are conceived and assessed at the same time [80, 92]. Take car design as an exemplar. In [93, 94], multi-factorial algorithms were applied to simultaneously optimize the design parameters of three different types of Mazda cars—a sport utility vehicle, a large-vehicle, and a small-vehicle—of different sizes and body shapes, but with the same number of parts. (The three problem instances were first proposed in [95], where the structural simulation software LS-DYNA1 was used to evaluate collision safety and build approximate response surface models.) Each car has 74 design parameters representing the thickness of the structural parts for minimizing weight while satisfying crashworthiness constraints. The experimental results in [93] showed that EMT was able to achieve better performance than the conventional (single-task) approach to optimizing the car designs. In another study, multi-task shape optimization of three types of cars—a pick-up truck, a sedan, and a hatchback—was undertaken to minimize aerodynamic drag (evaluated using OpenFOAM2 simulations) [86]. The uniqueness of the study lies in using a 3D point cloud autoencoder to derive a common design representation space that unifies different car shapes; a graphical summary of this idea is depicted in Fig. 2.5. The transfer of solution building-blocks through the learnt latent space not only opened up the possibility of “out of the box” shape generation, but also yielded up to 38.95% reduction in drag force compared to a single-task baseline given the same computational budget [86]. Not limited to structural and geometric design, EMT has also been successfully applied to process design optimization problems. In an industrial research [96], an adaptive multi-objective, multi-factorial differential evolution (AdaMOMFDE) algorithm was proposed for optimizing continuous annealing production processes under different environmental conditions. A set of environmental parameters defined a certain steel strip production task, with multiple parameter sets forming multiple problem instances in MTO. Each task possessed three objectives, that of achieving prescribed strip hardness specifications, minimization of energy con-
1 2
https://www.lstc.com/products/ls-dyna. https://www.openfoam.com/.
22
2 Overview and Application-Driven Motivations of Evolutionary Multitasking
Knowledge Transfer
Unified Representation
Reconstruction
Fig. 2.5 In many applications of EMT for engineering design, the lack of clear semantic overlap between design parameters could lead to difficulties in the construction of the unified search space .X . One example is in the definition of the unified space of diverse car shapes/geometries for aerodynamic design, which was addressed in [86] using a 3D point cloud autoencoder. Once trained, inter-task knowledge transfers take place in the latent space of the autoencoder
sumption, and maximization of production capacity. Experiments simultaneously solving up to eight tasks were carried out in [96]. The results demonstrated that the AdaMOMFDE algorithm could significantly outperform the single-task NSGA-II (as quantified by convergence trends of the inverted generational distance metric), hence meeting design specifications while potentially boosting productivity in the iron and steel industry. In addition to the focused application areas above, MTO lends a general framework for handling expensive design optimizations by jointly incorporating tasks of multiple levels of fidelity. The real-world case study in the previous subsection was a case in point, albeit belonging to a different category. Other related studies have also appeared in the literature [97]. • Case study in simulation-based process design [23] This study showcases an example where EMT was applied to jointly optimize two types of liquid composite moulding (LCM) processes for producing the same lightweight composite part [23]. The part under consideration was a glass-fibrereinforced epoxy composite disk, while the two LCM processes were resin transfer moulding (RTM) and injection/compression LCM (I/C-LCM). The process details are not reproduced herein for the sake of brevity; interested readers are referred to [23]. The key characteristic of these two processes is that they possess partially overlapping design spaces. Specifically, there exist three design parameters—the pressure and temperature of the epoxy resin when injected into the mould, and the temperature of the mould itself—that have similar physical effect on both LCM processes, hence leading to the scope of exploitable inter-task synergies. The RTM and I/C-LCM optimization problem instances were formulated as biobjective minimization tasks. The first objective was to minimize mould filling time (which in turn increases process throughput), while the second was to minimize peak internal fluid and fibre compaction force (which in turn reduces setup and running cost of peripheral equipment). For a set of candidate design parameters,
2.2 EMT in Real-World Problems I/C-LCM
0.7 0.65 0.6 0.55 0.5 0.45
MO-MFEA NSGA-II
0.4 0
500
1000
Evaluations
1500
2000
Performance (Hypervolume)
RTM
0.75
Performance (Hypervolume)
23
0.7 0.65 0.6 0.55 0.5 0.45
MO-MFEA NSGA-II
0.4 0
200
400
600
800
1000 1200 1400
Evaluations
Fig. 2.6 (a) Hypervolume convergence trends of MO-MFEA and NSGA-II on the RTM process optimization task; (b) hypervolume convergence trends of MO-MFEA and NSGA-II on the I/CLCM process optimization task. These plots have been obtained from the real-world study in [23]
the objective function values for either task were evaluated using a dedicated finite element numerical simulation engine. The outputs of the multitasking MO-MFEA and the single-task NSGA-II are compared in Fig. 2.6 in terms of the normalized hypervolume metric. The convergence trends achieved by MO-MFEA on both tasks were found to surpass those achieved by NSGA-II. Taking RTM as an example (see left panel of Fig. 2.6), the MO-MFEA took only about 1000 evaluations to reach the same hypervolume score reached by NSGA-II at the end of 2000 evaluations. This represents a .∼50% saving in cost, which for expensive simulation-based optimization problems (ubiquitous in scientific and engineering applications) translates to substantial reduction in design time and the wastage of valuable physical resources.
2.2.5 Category 5: EMT in Manufacturing, Operations Research The grand vision of smart manufacturing involves integration of three levels of manufacturing systems, namely, the shop floor, enterprise, and supply chain, into automated and flexible networks that allow for seamless data collection (via distributed sensors), data exchange, analysis, and decision-making [99]. These may be supported by a nerve center or manufacturing control tower, where real-time data is collected across all system levels to offer centralized processing capacity and end-to-end visibility. It is in enabling effective functioning of such control towers that we foresee EMT to thrive, leveraging the scope of seamless data exchanges to deliver fast and optimal (or near-optimal) operational decisions [100]. Targeting energy efficient data collection and transmission to the base location (e.g., the nerve center), [101] demonstrated the utility of EMT for optimizing
24
2 Overview and Application-Driven Motivations of Evolutionary Multitasking
the topology of wireless sensor networks. The optimization of both single-hop and multi-hop network types were combined in MTO to help with consideration of both deployment options. It was shown using a variant of the MFEA with random-key encoding that the exchange of useful information derived from solving both tasks could in fact lead to better overall results than the baseline singletask method. In [102], the follow-on problem of charging the wireless sensors was also undertaken using a multi-task approach. Multiple mobile chargers were simultaneously considered, with the charging schedule for each forming a task in MTO. Returning to manufacturing operations, there exists a sizeable amount of research on applying EMT algorithms to NP-hard problems at the shop floor (e.g., for job shop scheduling [103, 104]) or at the logistics and supply chain levels (e.g., for vehicle routing applications [105, 106] and its extension to pollution-routing [107]). For last-mile logistics in particular, centralized cloud-based EMT was envisioned in [19, 108] to take advantage of similarities in the graph structures of vehicle routing problem (VRP) instances toward rapid optimization. The application of EMT to other forms of graph-based optimization tasks with potential use in manufacturing have also been explored in [109, 110]. Despite some success, there are still challenges in reliably implementing EMT for combinatorial optimization tasks ubiquitous in manufacturing and operations research. A key issue is that of solution representational mismatch, which can lead to negative transfer [111]. For instance, consider unifying two VRPs in EMT that are defined using different node labels/indices even though their underlying customer distributions are similar. Due to the resultant label mismatch, genetic transfers under standard permutation-based solution representations would lead to suboptimal (or even confounding) exchange of routes or subroutes between tasks. Two recent research avenues hold promise in overcoming the aforementioned challenge. The first entails departure from the usual direct transfer of solution prototypes in EMT. Instead, the transfer of higher-order solution construction heuristics that are agnostic to low-level solution representations is proposed (as a form of multi-task hyper-heuristic); both heuristic selection [112] and generative approaches [113] have been put forward, showing greater robustness to representational mismatches in EMT. The second research avenue deals with learning solution representations, transforming problem instances in a manner that minimizes intertask representational mismatch. An illustration of this idea is depicted in Fig. 2.7, where two VRP instances (.V RP1 and .V RP2 ) with seemingly dissimilar customer distributions and node labelling are examined. However, through an isometric transformation (comprising rotation and translation operations) of the nodes in .V RP2 (which preserves shortest routes), a new representation scheme that better aligns both tasks is obtained [98]. • Case study in last-mile logistics planning [13] Following on from the discussions above, a case study on real-world package delivery problem (PDP) instances [13] from a courier company in Beijing, China, is presented. The PDP is a variant of the NP-hard VRP, where the objective function
2.2 EMT in Real-World Problems
25
Fig. 2.7 (a) .V RP1 and .V RP2 possess seemingly dissimilar node distribution and labels; (b) solution representation learning is undertaken to isometrically transform the node distribution of .V RP2 to match .V RP1 ; (c) the similarity of the two VRPs is unveiled after the transformation [98]
pertains to minimizing total routing costs in servicing a set of geographically distributed customers with a fleet of capacity constrained vehicles located at a single or multiple depots. The results presented hereafter are for an explicit EMT combinatorial optimization algorithm (EEMTA for short) whose uniqueness lies in incorporating solution representation learning via sparse matrix transformations to facilitate the transfer of useful information across tasks. The experiments were conducted on four PDP requests that were paired to form two examples of MTO. The pairing was done based on customer distributions, with the resulting MTO formulations referred to as {.P DP1 , .P DP2 } and {.P DP3 , .P DP4 }, respectively. The obtained results show that the EEMTA was able to achieve some extent of performance speed up across all four tasks. Multitasking provided an impetus to the overall search, whilst strongly boosting outcomes of the initial stages of evolution on .P DP2 and .P DP4 in particular. The reader is referred to Chapter 7 for full details of the EEMTA and the experimental study.
2.2.6 Category 6: EMT in Software and Services Computing Many problems in software engineering can eventually be converted into optimization problem instances. Examples include finding the minimum number of test cases to cover the branches of a program, or finding a set of requirements that would minimize software development cost while ensuring customer satisfaction. The objective functions of such tasks generally lack a closed form, hence creating a niche for black-box search methods like EAs—underpinning the field of search-based software engineering [114]. What’s more, as software services increasingly move to public clouds that simultaneously cater to multiple distributed users worldwide, a playing field uniquely suited to EMT emerges. A schematic of EMT’s potential in this regard is highlighted in Fig. 2.8, where the scope of joint construction/evolution of two distinct programs by the efficient transfer and reuse of common buildingblocks of code is depicted.
26
2 Overview and Application-Driven Motivations of Evolutionary Multitasking Program A = x
Program B
= n
y
>
* m
=
while
n
y
y
Block
x
-
y y
m
...
=
1
...
while Block
y Knowledge Transfer
...
=
0
-
y y
1
Fig. 2.8 Two programs A and B concerning different tasks but with similar abstract syntax tree representations are depicted. Knowledge encoded in common subtrees could be efficiently transferred and reused through EMT to enhance the performance of an imagined automated program generator
Concrete realizations of this idea for web service composition (WSC) have been studied in the literature [115, 116]. The composition was achieved in [116] by formulating the problem as one of permutation-based optimization, where solutions encode the coupling of web services into execution workflows. Given the occurrence of multiple similar composition requests, a joint MTO formulation was proposed. The experiments compared three permutation-based variants of the MFEA against a state-of-the-art single-task EA on popular WSC benchmarks. The results showed that multitasking required significantly less execution time than its single-task counterpart, while also achieving competitive (and sometimes better) solution quality in terms of quality of semantic matchmaking and quality of service. In what follows, we delve into a specific use-case in software testing that precisely fits the MTO problem setting with a set of objective functions and a set of corresponding solutions being sought. • Case study in software test data generation [117] In [117], the ability of EMT to guide the search in software branch testing by exploiting inter-branch information was explored. Each task in MTO represented a branch of a given computer program, with the objective of finding an input such that the control flow on program execution (resulting from that input) would bring about the branch. Successfully achieving this is referred to as branch coverage. Hence, the overall problem statement, given multiple branches, was to find a set of test inputs that would maximize the number of branches covered. (Optimal coverage could be less than 100% since certain branches could be infeasible, and hence never covered.) In the experimental study, 10 numerical calculus functions written in C, extracted from the book Numerical Recipes in C: The Art of Scientific Computing [118], were considered. The inputs to these functions are of integer or real type. Two EMT algorithm variants (labelled as MTEC-one and MTEC-all, indicating the number of tasks each candidate solution in a population is evaluated for) that seek to jointly cover all branches of a program were compared against a single-task EA tackling each branch independently. Table 2.3 contains the averaged coverage percentage obtained by all algorithms over 20 independent runs, under uniform
2.2 EMT in Real-World Problems
27
Table 2.3 The coverage percentage obtained by MTEC-one, MTEC-all and single-task EA over 20 independent runs. Best values are marked in bold. Reported results are obtained from [117]
Program plgndr gaussj toeplz bessj bnldev des fit laguer sparse adi
Branches 20 42 20 18 26 16 18 16 30 44
MTEC-one 100 97.62 85 100 80.77 93.44 97.5 91.25 81.33 59.09
MTEC-all 100 97.62 85 100 80.77 91.88 97.5 90.94 90 59.09
Single-task EA 99.58 97.62 84.75 100 76.92 93.44 92.78 85 88 56.25
computational budget. The table reveals that MTEC, by virtue of leveraging intertask information transfers, achieved competitive or superior coverage performance than the independent search approach on the majority of programs.
Part II
Evolutionary Multi-Task Optimization for Solving Continuous Optimization Problems
Chapter 3
The Multi-Factorial Evolutionary Algorithm
This chapter presents the first version of the well-known multi-factorial evolutionary algorithm (MFEA). Although the MFEA is broadly applicable to a wide variety of problem-types, we limit our discussions here to the solving of continuous optimization tasks. We first present important concepts and definitions pertaining to multi-factorial optimization (MFO), an implicit EMT paradigm characterized by the concurrent existence of multiple search spaces corresponding to different tasks, each possessing a unique objective function landscape. The nomenclature is therefore inspired by the observation that every task contributes a unique factor influencing the evolution of a single population of individuals. Next, we detail the MFEA to handle such problems. The methodology is inspired by bio-cultural models of multi-factorial inheritance, which explain the transmission of complex developmental traits to offspring through the interactions of genetic and cultural factors. The numerical experiments reveal several potential advantages of implicit genetic transfer in a multitasking environment. Most notably, we demonstrate that the creation and transfer of refined genetic material can often lead to accelerated convergence for a variety of complex optimization functions.
3.1 Algorithm Design and Details 3.1.1 Multi-Factorial Optimization Akin to Sect. 1.3 of Chap. 1, consider a situation wherein K optimization tasks are to be performed simultaneously, also referred to as a K-factorial problem. Without loss of generality, here all tasks are assumed to be minimization problems. The .j t h task, denoted by .Tj , is considered to have a search space .Xj on which the objective function is defined as .fj : Xj → R. In addition, each task may be constrained by © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 L. Feng et al., Evolutionary Multi-Task Optimization, Machine Learning: Foundations, Methodologies, and Applications, https://doi.org/10.1007/978-981-19-5650-8_3
31
32
3 The Multi-Factorial Evolutionary Algorithm
several equality and/or inequality conditions that must be satisfied for a solution to be considered feasible. In such a setting, we define MFO as a single-population EMT paradigm that builds on the implicit parallelism of population-based search with the aim of finding .{x∗1 , . . . , x∗j , . . . , x∗K } = arg min{f1 (x), . . . , fj (x), . . . , fK (x)}, where .x∗j is a feasible solution in .Xj . Herein, each .fj is treated as an additional factor influencing the evolution of a single population of individuals. In order to design evolutionary solvers for MFO, it is important to formulate a general technique for comparing population members even across distinct tasks in a multitasking environment. To this end, we first define a set of properties for every individual .pi , where .i ∈ {1, 2, . . . , |P |}, in a population P . Note that we assume the individuals are encoded in a unified search space encompassing .{X1 , . . . , Xj , . . . , XK }, and can be decoded into a task-specific solution representation with respect to each of the K optimization tasks. The decoded form of .pi can thus be written as .{xi1 , . . . , xij , . . . , xiK }, where .xi1 ∈ X1 , .. . ., .xij ∈ Xj , .. . ., and i .x K ∈ XK . – Definition 1 (Factorial Cost): For a given task .Tj , the factorial cost .Υji of individual .pi is given by .Υji = λ·δji +fji ; where .λ is a large penalizing multiplier, i i .f and .δ are the objective value and the total constraint violation, respectively, j j of .pi with respect to .Tj . Accordingly, if .pi is feasible with respect to .Tj (zero constraint violation), we have .Υji = fji . – Definition 2 (Factorial Rank): The factorial rank .rji of .pi on task .Tj is simply the index of .pi in the list of population members sorted in ascending order with respect to .Υj . While assigning factorial ranks, whenever .Υja = Υjb for a pair of individuals .pa and .pb , the parity is resolved by random tie-breaking. However, since the performance of the two individuals is equivalent with respect to the .j t h task, we label them as being j -counterparts. i } of an – Definition 3 (Scalar Fitness): The list of factorial ranks .{r1i , r2i , . . . , rK individual .pi is reduced to a scalar fitness .φi based on its best rank over all tasks; 1 i.e., .φi = i . minj∈{1,...,K} {rj }
– Definition 4 (Skill Factor): The skill factor .τi of .pi is the one task, amongst all other tasks in MFO, on which the individual is most effective, i.e., .τi = arg minj {rji } where .j ∈ {1, 2, . . . , K}. Once the fitness of every individual has been scalarized according to Definition 3, performance comparison can be carried out in a straightforward manner. For example, .pa is considered to dominate .pb in multi-factorial sense simply if .φa > φb . We denote this relation between the two individuals as .pa pb . In the event that two individuals have the same skill factor, i.e. .τa = τb = j , and they also happen to be j -counterparts, we label them as being strong counterparts. It is important to note that the procedure described heretofore for comparing individuals is not absolute. As the factorial rank of an individual (and implicitly its
3.1 Algorithm Design and Details
33
scalar fitness and skill factor) depends on the performance of every other individual in the population, the comparison is in fact population dependent. Nevertheless, the procedure guarantees that if an individual .p∗ uniquely maps to the global optimum of any task, then, .φ ∗ ≥ φi for all .i ∈ {1, 2, . . . , |P |}. Therefore, it can be said that the introduced technique is indeed compatible with the ensuing definition of multi-factorial optimality. – Definition 5 (Multi-factorial Optimality): An individual .p∗ , with a list of objective values .{f1∗ , f2∗ , . . . , fK∗ }, is considered optimum in multi-factorial sense iff .∃j ∈ {1, 2, . . . , K} such that .fj∗ ≤ fj (xj ), for all feasible .xj ∈ Xj .
3.1.2 Similarity and Difference Between Multi-factorial Optimization and Multi-Objective Optimization In some cases it can be argued that the standard EAs for multi-objective optimization (MOO) are applicable for the purpose of EMT. However, it must be observed that there exists a fundamental difference between the principles of the two paradigms. While MFO aims to leverage the implicit parallelism of population-based search to exploit latent genetic complementarities between multiple tasks, MOO attempts to efficiently resolve conflicts among competing objectives of the same task. The concept of Pareto optimality [119] is thus not manifest in the prescribed scope for EMT as multi-factorial optimality (see Definition 5) does not depend on finding a good trade-off among the different objectives. Instead, it depends on finding the global optimum of at least one constitutive objective function. (Note, the notion of Pareto optimality could occur in EMT if any of the constitutive tasks was itself MOO.) In order to further emphasize the distinction, we refer to the objective space of a hypothetical 2-factorial problem depicted in Fig. 3.1. From the principles of nondominated sorting used in MOO [119], it follows that individuals .{p2 , p3 , p4 , p5 } belong to the first non-dominated front while .{p1 , p6 } belong to the second nondominated front. In other words, individuals .{p2 , p3 , p4 , p5 } are incomparable to each other and are always preferred over .{p1 , p6 }. However, in the context of MFO, referring to task 1, the individuals .p1 and .p2 (and also, .p5 and .p6 when referring to task 2) are labeled as being strong counterparts. Moreover, .{p1 , p2 , p5 , p6 } {p3 , p4 }. In other words, .p1 , p2 , p5 , p6 are considered incomparable to each other in MFO and are always preferred over .p3 , p4 . Thus, there emerges a disagreement about the relative performance of individuals as deduced from the principles of MOO and MFO.
Fig. 3.1 Sample points in combined objective space of two hypothetical optimization tasks
3 The Multi-Factorial Evolutionary Algorithm
Function value for optimization task 2
34
6 5
p1
4
p2 p3
3
p4
2
p5
1 0
0
p6
1 2 3 4 5 Function value for optimization task 1
6
3.1.3 The Multi-Factorial Evolutionary Algorithm The multi-factorial evolutionary algorithm (MFEA) is inspired by the bio-cultural models of multi-factorial inheritance. As the working of the algorithm is based on the transmission of biological and cultural building blocks (genes and memes) [120, 121] from parents to their offspring, the MFEA is regarded as belonging to the realm of memetic computation [122, 123]. In particular, cultural effects are incorporated via two features of multi-factorial inheritance acting in concert, namely, (a) assortative mating and (b) vertical cultural transmission. Details of these features and their computational analogues shall be discussed herein. Although the basic structure of the MFEA (presented in Algorithm 2) is similar to that of a classical elitist EA [7], the aforementioned memetic augmentations transform it into an effective multitask solver.
3.1.3.1 Population Initialization Assume that in K optimization tasks to be performed simultaneously, the dimensionality of the .j t h task is given by .Dj . Accordingly, we define a unified search space with dimensionality (.Dmult it ask ) equal to .maxj {Dj }. During the population initialization step, every individual is thus endowed with a vector of .Dmult it ask random variables (each lying within the fixed range .[0, 1]). This vector constitutes the chromosome (the complete genetic material) of that individual. Essentially, the .i t h dimension of the unified search space is represented by a random-key .yi , and the fixed range represents the box-constraint of the unified space. While addressing task .Tj , we simply refer to the first .Dj random-keys of the chromosome. The motivation behind using such an encoding technique, in place of simply
3.1 Algorithm Design and Details
35
Algorithm 2: Basic structure of the MFEA 1 Generate an initial population of individuals and store it in current-pop (P ). 2 Evaluate every individual with respect to every optimization task in the
multitasking environment. 3 Compute the skill factor (τ ) of each individual. 4 while while (stopping conditions are not satisfied) do 5 Apply genetic operators on current-pop to generate an offspring-pop (C). 6 7 8 9 10
Refer to Algorithm 3. Evaluate the individuals in offspring-pop for selected optimization tasks only (see Algorithm 3). Concatenate offspring-pop and current-pop to form an intermediate-pop (P ∪ C). Update the scalar fitness (φ) and skill factor (τ ) of every individual in intermediate-pop. Select the fittest individuals from intermediate-pop to form the next current-pop (P ). end while
concatenating the variables of each optimization task to form a giant chromosome of .D1 + D2 + . . . + DK elements, is two-fold: a. From a practical standpoint, it helps circumvent the challenges associated with the curse of dimensionality when several tasks with multidimensional search spaces are to be solved simultaneously. b. On theoretical grounds, it is considered to be an effective means of accessing the power of population-based search. As the schemata (or genetic building blocks) [124] corresponding to different optimization tasks are contained within a unified pool of genetic material, they get processed by the EA in parallel. Most importantly, this encourages the discovery and implicit transfer of useful genetic material from one task to another in an efficient manner. Moreover, as a single individual in the population may inherit genetic building blocks corresponding to multiple optimization tasks, the analogy with multi-factorial inheritance becomes more meaningful.
3.1.3.2 Genetic Mechanisms Canonical EAs employ a pair of genetic operators, namely, crossover and mutation [124, 125], which are analogous to their biological namesakes. A key feature of the MFEA is that certain conditions must be satisfied for two randomly selected parent candidates to undergo crossover. The principle followed is that of nonrandom or assortative mating [126, 128], which states that individuals prefer to mate with those belonging to the same cultural background. In the MFEA, the skill factor (.τ ) is
36
3 The Multi-Factorial Evolutionary Algorithm
viewed as a computational representation of an individual’s cultural bias. Thus, two randomly selected parent candidates can freely undergo crossover if they possess the same skill factor. Conversely, if their skill factors differ, crossover only occurs as per a prescribed random mating probability (rmp), or else mutation occurs. Steps for creating offspring according to these rules are provided in Algorithm 3.
Algorithm 3: Assortative mating 1 Consider two parent candidates pa and pb randomly selected
from current-pop. 2 Generate a random number rand between 0 and 1. 3 if (τa = τb ) or rand 1 2 (2u)1/(ηc +1) ,
if u ≤
where u is a random number generated in the range of [0, 1]. As can be observed, different crossovers have various forms which possess unique bias in generating offspring. Since crossover has also been employed in MFEA for implicit knowledge transfer across tasks, it may lead to different forms
4.1 Algorithm Design and Details
51
of knowledge transfer when different crossovers are configured, which could result in diverse multi-task optimization performance.
4.1.2 Knowledge Transfer via Different Crossover Operators in MFEA In this section, we investigate how different crossover operators for knowledge transfer affect the performance of MFEA, using the common SO and MO multitask optimization problems in the literature [136, 137]. In particular, the PILS and NIMS SO multi-task problems, and the CIHS and NIMS MO multi-task problems are studied in this section. More details of the multi-task benchmarks and configurations of the crossovers will be presented later in Sect. 4.2. Figure 4.4 presents the averaged convergence graphs of MFEA with 6 different crossover operators for knowledge transfer on the four multi-task problems, over 20 independent runs. In Fig. 4.4a and b, the y axis denotes the averaged fitness (objective value) in log scale, x-axis is the generation number. Further, in Fig. 4.4c and d, the y axis represents the averaged IGD value in log scale and x-axis denotes the generation number. As can be observed in Fig. 4.4, the performance of MFEA varies when different crossover operators have been employed for knowledge transfer. The performance gap in terms of objective value and IGD can be observed clearly from Fig. 4.4. For instance, on NIMS2 of Fig. 4.4d, the BLX-.α crossover converges about 2 times faster than the SBX crossover since the 200-th generation. Further, the arithmetical and geometrical crossover achieved the best performance on both SO PILS and SO NIMS. Particularly, on PILS (see Fig. 4.4a), while other operators are all stagnated at the early optimization stage, the arithmetical and geometrical crossover converged fast and obtained much better solutions. However, on MO CIHS, these two crossovers degraded to the worst crossover operators, as shown in Fig.4.4c. Further, on the MO problems, the BLX-.α crossover achieved the best IGD on MO PILS, while the geometrical crossover outperformed the others on MO NIMS2. These observations confirmed that different optimization problems may require different configurations of crossover for knowledge transfer across tasks, for efficient multi-task optimization performance, and there is no single crossover operator can perform well on all the SO and MO problems. Towards robust and efficient multi-task optimization performance when different problems are encountered, here we thus propose a new MFEA with adaptive configuration of crossover for knowledge transfer, termed as MFEA-AKT, which will be presented in the next section.
52
4 Multi-Factorial Evolutionary Algorithm with Adaptive Knowledge Transfer
Fig. 4.4 Convergence traces of MFEA with six different knowledge transfer crossovers on PILS, NIMS of the SO multi-task benchmarks and CIHS, NIMS of the MO multi-task benchmarks, respectively. y-axis: log(Averaged results); x-axis: Generation. (a) Convergence traces of SO P I LS. (b) Convergence traces of SO NI MS. (c) Convergence traces of MO CI H S. (d) Convergence traces of MO NI MS
4.1 Algorithm Design and Details
53
Fig. 4.5 The outline of MFEA-AKT
4.1.3 MFEA with Adaptive Knowledge Transfer This section presents the details of MFEA-AKT. In particular, the framework of MFEA-AKT is illustrated in Fig. 4.5, where the main differences between MFEAAKT and the original MFEA are highlighted in the dashed boxes. In MFEA-AKT, we first introduce three new definitions, which are given below. Definition 1 (Transfer Crossover Indicator) The Transfer crossover indicator (.Tci ) of an individual is an integer in the range of [1, m], where m is the number of crossover operators available for knowledge transfer. Particularly, .Tci = i indicates the individual prefers to take the i-th crossover operator for sharing knowledge with other individuals. Definition 2 (Transferred Offspring) The transferred offspring refers to the solution which is generated by parents with different skill factors via crossover. Definition 3 (Immediate Parent) The parent who has the same skill factor with the transferred offspring is defined as the offspring’s immediate parent.
54
4 Multi-Factorial Evolutionary Algorithm with Adaptive Knowledge Transfer
In MFEA-AKT, towards adaptive knowledge transfer, the transfer crossover indicator of each individual is randomly assigned initially. Next, the adaptive assortative mating kicks in to adaptively configure the crossover operator for knowledge transfer across tasks based on the .Tci of the mated individuals. The .Tci of each individual is updated according to the information collected along the evolutionary search process. Further, the .Tci of each offspring individual is obtained via the adaptive vertical cultural transmission. Last but not the least, other procedures of MFEA-AKT, such as initialization, solution evaluation and selection, are kept the same as MFEA in [19].
4.1.3.1 Adaptive Assortative Mating and Adaptive Vertical Cultural Transmission Algorithm 5 presents the details of the adaptive assortative mating procedure. In contrast to the original MFEA, which fixes the crossover for knowledge transfer across tasks, MFEA-AKT contends to adaptively employ the appropriate knowledge transfer crossovers for different individuals. In particular, as depicted in Algorithm 5, first of all, two parents p.1 and p.2 are randomly selected for reproduction. If the parents hold the same skill factor, the SBX crossover is performed for offspring generation as in MFEA.1 Otherwise, with a random mating probability (rmp), individuals with different skill factors are mated via an adaptively configured crossover for knowledge transfer. Specifically, the .Tci of p.1 or p.2 is randomly selected as the activated transfer crossover indicator (.Tcia ), and the associated crossover operator of .Tcia is then used with p.1 and p.2 for knowledge transfer. Further, the generated individuals, i.e. transferred offspring, take .Tcia as their transfer crossover indicator. If the two offspring c.1 and c.2 are generated via crossover without knowledge transfer (or mutation), c.1 and c.2 inherit the transfer crossover indicator from p.1 and p.2 , respectively. Next, the adaptive vertical cultural transmission procedure is described in Algorithm 6. In particular, if an offspring has two parents p.1 and p.2 , it imitates the skill factor of either p.1 or p.2 with equal probability. Otherwise, the offspring imitates the skill factor of the parent after mutation. If the offspring is a transferred offspring, the parent which has the same skill factor with the offspring is set as the offspring’s immediate parent. This immediate parent will be used in the adaptation of transfer crossover indicator of the offspring, which is detailed in the next section.
1
In order to investigate the effect of crossover in knowledge transfer, we kept the crossover for offspring generation without knowledge transfer the same as that in MFEA. However, other crossover can also be applied here.
4.1 Algorithm Design and Details
55
Algorithm 5: Adaptive assortative mating Input: Two randomly selected parents p1 and p2 , and their skill factor τ1 and τ2 ; A random mating probability rmp. Output: The generated offspring, c1 and c2 . 1: Generate a random number rand∈[0, 1]; 2: if (τ1 == τ2 ) or (rand 0, there exists one and only one route .Ci ∈ S such that .vi ∈ Ci . The objective function of CVRP can then be defined as: Cost (S) =
k
.
(7.1)
dis(Ci )
i=1
where .dis(Ci ) denotes the summation of the travel distance (i.e., .eij ) contained in route .Ci . An illustrative example of CVRP is given in Fig. 7.1.
Vehicle 1 Vehicle 3 Depot
Vehicle 2
Customer Depot
Opmized Roung Soluon
Travel Route of Vehicles Service Vehicle with Capacity Q
Fig. 7.1 The example of a CVRP
7.2 Algorithm Design and Details
125
7.2 Algorithm Design and Details In this section, the details of the presented explicit EMT algorithm for combinatorial optimization problem, particularly CVRP, are presented. Specifically, as depicted in Fig. 7.2, given two CVRPs (i.e., .CVRP1 and .CVRP2 ), the learning of mapping across CVRPs first kicks in to build the connections (i.e., .M12 from .CVRP1 to .CVRP2 and .M21 from .CVRP2 to .CVRP1 ) between these two problems, based on the problem data, i.e., customer distribution. This process guarantees that the useful traits can be explicitly transferred across CVRPs possessing diverse problem properties, such as customer topology, number of vehicles, and customer size. Next, two separate evolutionary solvers are employed to optimize each CVRP, respectively. As the knowledge transfer across CVRPs happens while the evolutionary search progresses online, in this chapter, for simplicity, we define the knowledge transfer happens with a fixed generation interval (see .G1 and .G2 in Fig. 7.2). However, without loss of generality, other methods for defining the transfer frequency can also be applied. For the knowledge transfer across CVRPs, to reduce negative transfer effect, we first apply selection process to identify the high-quality solutions for transfer, which is followed by the knowledge learning process to capture the latent useful information that can be transferred across CVRPs. Further, the knowledge transfer will be performed with the built mappings to share the learned useful traits across CVRPs.
Fig. 7.2 Outline of the presented explicit EMT for solving CVRPs
126
7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
In what follows, the details of the learning of mapping and knowledge transfer across CVRPs are presented.
7.2.1 Learning of Mapping Across CVRPs As discussed in Sect. 7.1, the objective of CVRP is to find the optimal assignments of customers to the vehicles and the optimal service orders of customers assigned to a common vehicle. Therefore, the aim of knowledge transfer in EMT across CVRPs is to share the good customer assignments as well as customer service orders found along the evolutionary search across CVRPs. To this end, the key is to learn the proper mapping between the customers across CVRPs. It is straightforward to see that the simplest and best customer mapping is an one-to-one mapping. In another word, for each customer in one CVRP, if there is one and only one corresponding customer in the other CVRP, the customer assignments and service orders can then be transferred accordingly. However, as different CVRPs may have various number of customers, it is unpractical to learn the one-to-one mapping between customers of different CVRPs. In this chapter, we thus propose to learn a sparse customer mapping for representing one CVRP customer by the most similar customers across CVRPs. In particular, given two CVRPs, i.e., .OPs and .OPt , which contains the corresponding customer information, i.e., customer locations, that are represented by a .d × ns and a .d × nt matrix, respectively (d denotes the number of features for representing the location of a customer, while .ns and .nt give the number of customers in .OPs and .OPt , respectively.). In this way, the problem of finding customers from .OPs to represent customers in .OPt can be formulated as the learning of a .ns × nt transformation matrix .M, so that .OPs ∗ M = OPt . Further, in order to find the most similar customers in .OPs for customers in .OPt , we propose to learn a sparse .M via minimizing a weighted .l1 norm-regularized reconstruction error, which is given by: .
min ||OPs ∗ M − OPt ||F + ||D M||l1 M
(7.2)
where the first term denotes the reconstruction error of customers in .OPt using the customers in .OPs , while the second term is the weighted .l1 norm-based regularization on the mapping .M. . denotes the element-wise product between two matrices, and F is the Frobenius norm. .D is a .ns × nt matrix, which denotes
7.2 Algorithm Design and Details
127
the weight matrix to further reinforce the sparsity of the mapping .M [211]. Each element .dij of .D is calculated by: j
dij = exp[eij − emin ] ∗ eij
.
(7.3)
where .eij denotes the Euclidean distance between the i-th customer in .OPs and the j j -th customer in .OPt , and .emin gives the shortest Euclidean distance between all the customers in .OPs to the j -th customer in .OPt . Further, to solve Eq. 7.2, we propose to learn each column of .M separately, which is given by: .
j
min ||OPs ∗ M:j − OPt ||F + ||Dj ∗ M:j ||l1 M:j
(7.4)
where .M:j denotes the j th column of .M. .Dj is a .ns × ns diagonal matrix, in which the diagonal element .dii is set as .dij that is calculated via Eq. 7.3. By substituting .Dj ∗ M:j by .K, Eq. 7.4 becomes: .
min ||OPs ∗ D−1 j ∗ K − OPt ||F + ||K||l1 j
K
(7.5)
which can be easily solved by the interior-point method [212], Lastly, by calculating M:j = D−1 j ∗ K, the customer mapping .Mst across CVRPs can be obtained by the concat of .M:j s. For two CVRPs, i.e., .CVRP1 and .CVRP2 , there are two customer mappings. One is from .CVRP1 to .CVRP2 , i.e., .M12 , and the other is from .CVRP2 and .CVRP1 , i.e., .M21 . Based on the learning approach in Eq. 7.2, .M12 is obtained by treating the problem data of .CVRP1 and .CVRP2 as .OPs and .OPt , respectively. .M21 is then calculated by exchanging the input and output for .M12 accordingly. .
7.2.2 Knowledge Transfer Across CVRPs Based on the learned customer mapping .Ms across CVRPs, the knowledge sharing towards enhanced EMT performance happens while the evolutionary search progresses online. In particular, the knowledge sharing here across CVRPs is consist of three components, which are detailed below.
7.2.2.1 Solution Selection The selection of solutions for knowledge transfer across CVRP domains is important for enhanced EMT performance, since inappropriate knowledge transfer will bring about negative transfer effect [213]. In CVRP, as the useful information is embedded in the high-quality CVRP solutions, in this study, we propose to select the best Q
128
7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
number of optimized solutions, in terms of objective value, from the source CVRP domain to be transferred to the target CVRP domain, when the knowledge sharing is triggered.
7.2.2.2 Knowledge Learning This process is to capture the useful information embedded in each of the selected solutions, which can be transferred across different CVRPs. As aforementioned, the objective of CVRP is to optimize both the assignments of customers to the vehicles and the service orders of customers assigned to a common vehicle. The optimization process of a CVRP thus can be interpreted as two separate phases. The first phase involves the assignment or clustering of the customers that require services to the appropriate vehicles. The second phase then serves to find the optimal service orders of each vehicle for the assigned customers obtained in phase 1. Therefore, if the proper similarity between customers can be learned, the optimal assignments and service order information can easily be obtained via clustering and pair-wise distance sorting on the customers, respectively. Keeping the above in mind, for each of the selected solution for transfer, here we propose to estimate a new customer representation based on the customer assignments and customer service order information contained in the selected CVRP solution .s. This new customer representation severs as the learned knowledge that can be transferred across CVRPs to generate target CVRP solutions with guidance from the optimized assignments and service orders in .s. In particular, let .CVRPs and .CVRPt denote the source and target CVRP domain, respectively. .ss is the selected solution of .CVRPs . As given in Fig. 7.3, using the optimized solution depicted in Fig. 7.1 as an illustrative example, we first construct a .ns × ns distance matrix .DM for all the customers in .CVRPs , where .ns is the number of customers. Each element .dmij in .DM represents the distance between the .i−th and .j −th customer. Further, in Fig. 7.3, .α denotes a small real number, while .β is a very big real number.1 The motivation behind this new distance matrix is to make the customers served by a common vehicle to be close to each other, while keep customers served by different vehicles away from one another. Further, the distance between customers severed by the same vehicle increases linearly according to the corresponding service orders. As can be observed, the optimized vehicle assignment and service orders in .ss can be easily obtained via clustering and pair-wise distance sorting using the constructed matrix .DM, respectively. Next, the new estimated customer representations .OPnew of .CVRPs can be obtained via the multidimensional scaling s with .DM [214], which possesses the complexity of O(.n3 ), where n denotes the number of customers.
1
The setting of these two values are based on the rule that the vehicle assignment and service order in .ss can be accurately obtained when applying clustering and pair-wise distance sorting with .DM.
7.2 Algorithm Design and Details
129
Fig. 7.3 Illustration of the constructed distance matrix using the CVRP example given in Fig. 7.1
7.2.2.3 Knowledge Transfer With the learned sparse customer mapping .Ms across CVRP domains (i.e., .M12 and .M21 ), and the new CVRP customer representation .OPnew based on .ss , the s knowledge transfer across CVRPs can be performed via the simple operation of matrix multiplication. In particular, as outlined in Algorithm 11, for knowledge transfer from .CVRP1 to .CVRP2 , we first set .CVRP1 and .CVRP2 as .CVRPs and .CVRPt , respectively. Further, for each of the selected solution .ss for transfer, new .OPs is estimated as discussed in Sect. 7.2.2.2, which is a .d × ns matrix, and .d is the new learned number of customer features. The approximated customers of .CVRPt is then obtained via .OPnew = OPnew × M12 . Next, to obtain t s the transferred CVRP solution for .CVRPt , K-Means clustering with random initializations is conducted on .OPnew to derive the customer assignments of t vehicles. Moreover, the service orders of each vehicle are subsequently achieved by sorting the pairwise distances among customers using .OPnew in an ascending t order. The two customers with largest distance shall then denote the first and last customer to be served. Taking the first customer as reference, the service order of the remaining customers are defined according to the sorted orders. Lastly, the transferred solution across CVRPs, are inserted into the population of .CVRPt to undergo natural selection, and bias the optimization search process accordingly. For knowledge transfer from .CVRP2 to .CVRP1 , .CVRPs and .CVRPt are then set as .CVRP2 and .CVRP1 accordingly. The knowledge transfer process is performed exactly the same as discussed above, using the customer mapping .M21 . Furthermore, as can be observed, the computational cost of the knowledge transfer process contains two parts, one is the K-means clustering with complexity of O(.n ∗ d ∗ k ∗ z), and the other is the pairwise distances sorting possessing the complexity of O(.d ∗ n2 ) in the worst case, where n, d, k and z denote the number of
130
7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
customers, the feature dimension of each customer, the number of vehicles and the number of iterations in K-Means clustering, respectively.
Algorithm 11: Knowledge transfer in the presented EMT for solving CVRPs 1 2 3 4 5 6 7 8 9 10
Begin /* Knowledge transfer at G1 from CVRP1 to CVRP2 */ Set CVRPs and CVRPt as CVRP1 to CVRP2 , respectively; S = {si |1 ≤ i ≤ Q}, si is selected solution for transfer from CVRPs , i = 1; for i ≤ Q do Set ss = si , and Estimate OPnew as discussed in Sect. 7.2.2.2; s Obtain OPnew via OPnew × M12 ; t s Perform K-means and pair-wise distance sorting with OPnew t to generate CVRP solution for CVRPt ; Insert the generated solution into the population to undergo natural selection; i = i + 1;
11 /* Knowledge transfer at G2 from CVRP2 to CVRP1 */ 12 Set CVRPs and CVRPt as CVRP2 to CVRP1 , respectively; 13 S = {si |1 ≤ i ≤ Q}, si is selected solution for transfer from
CVRPs , i = 1;
14 for i ≤ Q do 15 Set ss = si , and Estimate OPnew as discussed in Sect. 7.2.2.2; s 16 Obtain OPnew via OPnew × M21 ; t s 17 Perform K-means and pair-wise distance sorting with OPnew t 18 19
to generate CVRP solution for CVRPt ; Insert the generated solution into the population to undergo natural selection; i = i + 1;
20 End
7.3 Empirical Study In this section, comprehensive empirical studies using common CVRP benchmarks are conducted to verify the efficacy of the presented explicit EMT for combinatorial optimization. In particular, we first introduce how the multi-tasking CVRP benchmarks, which possess different problem similarities, are generated based on existing CVRP instances. Next, the experimental configurations as well as the obtained results are presented and discussed.
7.3 Empirical Study
131
Table 7.1 Property summary of the CVRP instances
Instance A-n54-k7 A-n62-k8 A-n80-k10 B-n50-k7 B-n64-k9 B-n78-k10 P-n50-k8 P-n60-k10 P-n76-k5
Customer number 54 62 80 50 64 78 50 60 76
Vehicle capacity 100 100 100 100 100 100 120 120 280
Vehicle number 7 8 10 7 9 10 8 10 5
7.3.1 Experiment Setup In this study, 9 CVRP instances of the “AUGERAT” CVRP benchmark set with diverse properties (e.g., number of customers, vehicle number, etc.) are used. The detailed properties of the 9 CVRP instances considered are summarized in Table 7.1. To construct the multi-tasking CVRP instances, according to the recent reports of multi-tasking benchmarks for continuous optimization [136, 137], here we propose to build the high-similarity, medium-similarity, and low-similarity multi-tasking CVRP pairs by randomly and independently deleting 10, 30, and .50% customers from the CVRP instances, respectively. For instance, to build the high-similarity multi-tasking pairs of “A-n54-k7”, we will perform two separate random deletions of .10% customers from “A-n54-k7”. The resultant two instances are labeled as “An54-k7-h-t1” and “A-n54-k7-h-t2”, where “h” stands for high-similarity, while “t1” and “t2” denote task 1 and task 2 of “A-n54-k7”, respectively. In this way, each CVRP instance will generate 3 sets of multi-tasking CVRP benchmarks, and totally, there are 27 sets of multi-tasking CVRP benchmarks will be built. Subsequently, according to [215, 216], we implement a strong memetic algorithm2 which serves as the single-task evolutionary solver for solving CVRP, that is labeled as EA1. Moreover, as explicit EMT is able to incorporate different solvers for different optimization tasks, to investigate the benefits of employing various search mechanisms for optimization, we further implement another single-task solver, i.e., EA2, which shares the same reproduction with EA1, but differs in the local search settings. In the presented explicit EMT algorithm (EEMTA), EA1 and EA2 are employed for solving task 1 and task 2, respectively. Further, to verify the effectiveness of the transferred solutions across CVRPs, EA1 and EA2 with 2
The implemented solver is a state-of-the-art memetic algorithm, which is able to achieve the best known results of the “AUGERAT” CVRP benchmark, that are available at: http://neo.lcc.uma.es/ vrp/known-best-results/.
132
7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
injection of randomly generated solution along the evolutionary search process, which are labeled as EA1+R and EA2+R respectively, are compared. Please note that, the frequency and amount of solution for injection in both EA1 and EA2 are kept the same as EEMTA. Moreover, to verify the efficacy of EEMTA, besides the single-task EAs, the recently proposed implicit EMT algorithm, i.e., PMFEA [217], for solving combinatorial optimization problems, is also considered as the baseline algorithm. For fair comparison, the search operators of PMFEA are kept the same as EA1 for the comparison on task 1, and changed to be consistent with EA2 for the evaluation on task 2. Laslty, all the search operator and parameter settings of the EA1, EA2, EA1+R, EA2+R, PMFEA and EEMTA are referred to [215–217] accordingly. Detailed configurations are given below: 1. Parameters for EEMTA (a) .α and .β in .DM: .α = 10 and .β = 1000. (b) Number of solutions selected for transfer: .Q = 5. (c) Generation interval for knowledge transfer: .G1 = G2 = 5. 2. Population size: (a) EA1, EA2, EA1+R, EA2+R and EEMTA: 50. (b) PMFEA: 100. 3. Maximum generations: 100. 4. Independent runs: 20. 5. Local search settings. (a) Local search in EA1 and EA1+R: Replace, Single-Insertion and Two-Swap [215, 216]. (b) Local search in EA2 and EA2+R: Replace [215, 216]. 6. Probability of local search: .0.1. As can be observed, since PMFEA has only one population for solving two tasks, the population size of PMFEA is doubled when compared to the other algorithms. The reproduction operator settings of the single-task EA1 and EA2 are kept the same and referred to [215]. Lastly, the explicit solution transfer across CVRPs in EEMTA happens in every 5 generation, and 5 best solutions in terms of objective value from each task will be selected for transfer from one to the other.
7.3.2 Results and Discussions To investigate the performance of EEMTA, we present, analyze, discuss and compare the results obtained against recently proposed EMT algorithms and the traditional single-task EAs, based on the criteria of solution quality and search efficiency.
7.3 Empirical Study
133
7.3.2.1 Solution Quality To evaluate the solution quality of EEMTA, Table 7.2 tabulates all the results obtained by the respective algorithms over 20 independent runs. In the table, the symbol “h”, “m” and “l” in the “Problem” column denote the high-similarity, medium-similarity, and low-similarity multi-tasking CVRP benchmarks, respectively. The column “B.Cost” and “Ave.Cost” give the best solution and averaged solution obtained across 20 independent runs, respectively. Superior performance are highlighted using bold font in the table. Further, in order to obtain the statistically comparison, Wilcoxon rank sum test with .95% confidence level has been conducted on the experimental results. It can be observed from Table 7.2 that on task 1, with solutions implicitly transferred across CVRPs, the PMFEA obtained superior or competitive solution qualities on most of the CVRPs, such as “A-n80-k10-h-t1”, “B-n78-k10-h-t1”, and “P-n60-k10-h-t1”, in terms of “Ave.Cost” when compared against the single-task EA1. As PMFEA and EA1 share the common search operator and parameter settings, these obtained results again confirmed the effectiveness of conducting evolutionary multi-tasking. However, on task 2 of most CVRP instances, although the search operator and parameter settings of PMFEA are now consistent with EA2, PMFEA achieved poor “Ave.Cost” values. This is because EA2 has weaker local search than EA1, the solutions found along the search by EA2 are thus poorer than EA1. As the solution transfer in implicit EMT is via crossover without elite selection, the transferred low-quality solutions caused negative transfer performance in PMFEA. On the other hand, in Table 7.2, EEMTA has been observed to outperform the single task EA1 and EA2 on task 1 and task 2 of most multi-tasking CVRP benchmarks, respectively. In particular, EEMTA achieved superior solution quality in terms of “Ave.Cost”, compared to EA1 or EA2 on all the 27 multi-tasking CVRP benchmarks, and obtained improved “Ave.Cost” values than both EA1 and EA2 on 12 number of multi-tasking CVRPs. Further, in terms of “B.Cost”, on task 2 of most CVRP benchmarks, with solution transferred from task 1, EEMTA has found superior best solutions than EA2. On task 1 of the benchmarks, such as “A-n80-k10m-t1” and “A-n62-k8-h-t1”, with solution transferred from task 2, EEMTA has also achieved better best solutions against EA1. As EA1 has more powerful local search capability than EA2, from the obtained results above, we can see that not only strong solver can improve the search process of the weak solver, but the weak solver may also contains information which is useful for enhancing the search of the strong solver in evolutionary multi-tasking. Further, it also can be observed that, EA1+R and EA2+R achieved competitive solution quality against EA1 and EA2 respectively, on all the CVRP benchmarks. As both EA1+R and EA2+R share the same frequency and amount of solutions for injection, with EEMTA, and only differ in the generation of injected solutions, the obtained superior solution quality of EEMTA confirmed the effectiveness of the presented explicit evolutionary multi-tasking across CVRPs. Lastly, in contrast to the implicit multi-tasking algorithm, i.e., PMFEA, EEMTA has obtained superior solutions with respect to “Ave.Cost” on most of the multitasking CVRP benchmarks. Moreover, due to the advantage of employing different
PMFEA
–
714.0
727.5 ± 8.0 –
690.0
963.5±16.4 – –
938.0 707.0
691.6 ± 1.2 –
–
960.7±20.6 724.9±12.4
–
655.0 707.53 525.0
655.0 ± 0.0 ≈
697.9 ± 1.2+
525.0 ± 0.0 ≈
655.0
696.0
525.0
T1
T2
T1
B-n50-k7-m-t1
1022.0 1062.0
1022.0 1025.1 ± 2.3 ≈ 1055.0 1073.1 ± 8.3+
T1 T2
A-n80-k10-l-t1 A-n80-k10-l-t2
B-n50-k7-h-t1
1382.0
1346.0 1394.1 ± 17.8+
A-n80-k10-m-t2 T2
B-n50-k7-h-t2
1783.0 1259.0
1747.0 1775.6 ± 13.1+
1258.0 1268.1 ± 8.9 ≈
791.7 1652.7
763.0 789.2 ± 15.1+ 1668.0 1678.8 ± 7.6 ≈
T2 A-n62-k8-l-t2 A-n80-k10-h-t1 T1
A-n80-k10-h-t2 T2
765.0
765.0 ± 0.0 ≈
765.0
T1
A-n62-k8-l-t1
A-n80-k10-m-t1 T1
915.5 958.55
916.5 ± 2.3+
975.9 ± 20.8+
915.0
942.0
T1
T2
A-n62-k8-m-t1
A-n62-k8-m-t2 766.5 ± 3.1 – –
915.0 765.0
972.7 ± 9.4 –
–
952.0
917.5 ± 2.9 – 767.4 ± 5.2 –
–
–
970.2 ± 9.6
–
– –
– 1367.0 1409.4±16.9 –
–
1379.0 1400.8±15.5
–
1761.0 1799.1±18.7 1259.0 1270.7 ± 8.1 –
525.1 ± 0.2 –
525.0
525.1 ± 0.2
702.0
655.0 ± 0.0 – –
655.0
726.0±14.0 –
655.0 ± 0.1
–
655.0 525.0
715.0±10.8 –
–
703.0
655.2 ± 0.5 – 525.2 ± 0.4 –
–
–
(continued)
717.6±10.5
–
– 1022.0 1025.6 ± 2.6 – – 1025.5 ± 2.1 1022.0 1024.7 ± 1.6 – 1063.0 1079.1 ± 7.3 – 1063.0 1074.9 ± 7.7 – – 1083.7 ± 9.2 –
1423.4±21.3 –
–
1779.0 1805.6±18.7 –
1271.0 ± 7.8 1258.0 1270.2 ± 7.4 –
1810.7±16.3 –
763.0 803.4±23.3 – 771.0 814.0±18.7 – – 822.7±15.8 – – 1658.0 1680.7±10.1 – – 1679.7±11.3 1655.0 1674.7±11.7 –
765.0
765.7 ± 1.9
960.0
917.5 ± 3.1 – –
915.0
989.5±18.5 –
917.5 ± 2.3
– 1217.0 1224.6 ± 6.6 – – 1219.0 1228.0 ± 8.6 1210.0 1223.7 ± 6.4 – 1236.0 1261.8±19.8 – 1218.0 1254.9±23.9 – – 1257.42 1279.4±10.5 –
727.3 ± 9.1 –
–
713.79
945.0
724.2 ± 10.3 ≈
691.9 ± 1.4 –
707.0
1211.0 1222.0 ± 5.1+ 1237.0 1250.7 ± 12.2+
T2
T1 T2
A-n54-k7-l-t2
A-n62-k8-h-t1 A-n62-k8-h-t2
–
690.0
978.5±19.9 – 691.7 ± 1.1
938.0 690.0
951.3 ± 9.0+
692.2 ± 1.6 ≈
933.0
690.0
T2
T1
A-n54-k7-m-t2
–
A-n54-k7-l-t1
1072.0 1088.2 ± 4.0 –
1082.0
1131.0 1154.8 ± 9.6 – 1138.0 1160.3±11.4 – – 1151.83 1166.7 ± 8.6 – – 899.0 904.4 ± 5.6 – – 904.0 ± 7.7 899.0 902.8 ± 5.5 – 899.0
1072.0 1088.3 ± 4.2 ≈
1118.0 1148.9 ± 13.3+ 899.0 905.8 ± 5.8 ≈
T1
T2 T1
–
EA2 B. cost Ave. cost
EA1 B. cost Ave. cost
EA2+R B. cost Ave. cost
EA1+R B. cost Ave. cost
1088.5 ± 2.5 1087.0 1090.1 ± 2.7 –
Ave. cost
A-n54-k7-h-t1
B. cost
B. cost Ave. cost
Task no. EEMTA
A-n54-k7-h-t2 A-n54-k7-m-t1
Problem
Table 7.2 Solution quality of EEMTA, PMFEA, EA1+R, EA2+R, EA1 and EA2 on the 9 “AUGERAT” multi-tasking CVRP benchmark sets. The superior solution quality of each respective problem instance is highlighted in bold font. (“.≈”, “.+” and “.−” denote EEMTA statistically significant similar, better, and worse than PMFEA, respectively)
134 7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
953.0
781.0 837.0
713.0
561.0 554.0
B-n78-k10-m-t2 T2
T1 T2
T1
T2
T1
T2 T1
T2
T1
T2
B-n78-k10-l-t1 B-n78-k10-l-t2
P-n50-k8-h-t1
P-n50-k8-h-t2
P-n50-k8-m-t1
P-n50-k8-m-t2 P-n50-k8-l-t1
P-n50-k8-l-t2
P-n60-k10-h-t1
P-n60-k10-h-t2
P-n60-k10-m-t1 T1 P-n60-k10-m-t2 T2
679.0
360.0
508.0 412.0
460.0
594.0
563.0
908.0 971.0 781.0 839.43 563.0 602.0 460.0 503.0 412.0 363.72 679.0 732.36 561.0 563.0
782.4 ± 0.9 ≈ 847.6 ± 5.7+
563.0 ± 0.0 ≈
608.4 ± 5.5+
460.0 ± 0.0 ≈
519.1 ± 4.6 ≈ 412.0 ± 0.0 ≈
367.0 ± 4.2+
684.2 ± 4.3 ≈
727.1 ± 6.1+
561.8 ± 1.3 ≈ 571.6 ± 9.7+
908.0
976.0±12.0+
908.1 ± 0.4 ≈
531.0
531.0 ± 0.0 ≈
463.0 ± 1.8 – –
T2 1175.0 1224.9±14.6+ 1221.54 1255.0±14.2 –
B-n78-k10-m-t1 T1
531.0
711.0
711.6±11.4+
461.0
B-n78-k10-h-t2
645.6 ± 9.7
461.0
463.6 ± 1.8
–
T2 541.0 555.5 ± 9.3 ≈ 545.0 559.8 ± 6.8 – – 541.0 552.3 ± 7.1 – – 541.0 553.3 ± 8.8 T1 1056.0 1072.4 ± 8.4 ≈ 1055.79 1069.6 ± 8.5 1058.0 1070.6 ± 7.9 – – 1058.0 1072.6 ± 8.8 – –
623.0 435.0 ± 0.0 –
B-n64-k9-l-t2 B-n78-k10-h-t1
–
T1
699.0
647.0
–
435.0
B-n64-k9-l-t1
647.0
647.0 ± 0.0 ≈
635.6±11.5 – –
T2
623.0
647.0
908.0
531.0
647.0 ± 0.0 –
–
700.0
647.0
–
531.0
714.9 ± 8.0 –
–
908.0
973.0 1002.9±21.3 –
–
1200.0 1244.1±14.3 –
908.2 ± 0.9 –
–
531.0 ± 0.0 –
–
–
–
–
–
721.9±13.7
–
– 977.0 1003.5±21.7
908.1 ± 0.4 –
1217.0 1245.1±13.2
531.0 ± 0.0 –
700.0
647.0 ± 0.0 –
563.0 460.0
563.1 ± 0.2 – 460.0 ± 0.0 –
–
599.0
563.0
–
460.0
611.1 ± 6.3 –
–
–
460.0 ± 0.0 –
602.0
563.1 ± 0.3 –
–
613.1 ± 7.7
–
679.0 –
684.8 ± 4.9 –
–
726.0
360.0
745.9 ± 6.5 –
679.0
368.3 ± 5.0 – –
–
–
363.0 734.0
685.2 ± 3.9 –
372.2 ± 5.3 747.4 ± 5.1
–
(continued)
562.7 ± 1.7 561.0 562.3 ± 1.5 – – 561.0 561.5 ± 1.1 – – 578.3 ± 8.0 – – 558.0 574.7 ± 9.5 – – 557.0 573.8 ± 9.0
746.4 ± 6.3 –
684.1 ± 4.1
372.0 ± 5.4 –
522.1 ± 7.7 – – 499.0 519.6 ± 6.7 – – 504.0 517.6 ± 8.0 412.1 ± 0.5 412.0 412.0 ± 0.0 – – 412.0 412.3 ± 0.7 – –
460.0 ± 0.1
613.6 ± 7.1 –
563.1 ± 0.4
782.1 ± 0.5 782.0 782.2 ± 0.5 – – 781.0 782.3 ± 0.9 – – 856.7±10.1 – – 838.0 851.0 ± 6.3 – – 829.0 851.4 ± 6.5
1008.9±20.5 –
908.3 ± 0.6
531.0 ± 0.0
733.0±15.7 –
647.0 ± 0.0
805.0 ± 0.0 805.0 805.0 ± 0.0 – – 805.0 805.0 ± 0.0 – – 819.0 ± 6.3 – – 796.0 811.3 ± 6.3 – – 802.0 813.4 ± 5.3
463.4 ± 2.0 –
435.0 ± 0.0 –
T1
805.0 805.85
805.0 ± 0.0 ≈ 800.2 ± 2.3+
–
B-n64-k9-m-t2
461.0
463.3 ± 1.9 ≈
435.0
B-n64-k9-m-t1
805.0 796.0
461.0
435.0 ± 0.0
T1 T2
643.7±13.0 –
T2
435.0
435.0 ± 0.0 ≈
B-n64-k9-h-t1 B-n64-k9-h-t2
625.0
633.3 ± 6.8+
B-n50-k7-l-t2
435.0
T1
B-n50-k7-l-t1
624.0
T2
B-n50-k7-m-t2
7.3 Empirical Study 135
422.0
T2
P-n76-k5-l-t2
366.0
522.0
583.0 507.0
P-n76-k5-h-t2 T2 P-n76-k5-m-t1 T1
T1
583.0
P-n76-k5-h-t1 T1
P-n76-k5-l-t1
467.0
P-n60-k10-l-t2 T2
P-n76-k5-m-t2 T2
405.0
405.0 468.0 580.34 608.26 507.0 531.75 422.0 374.54
482.1 ± 6.4 ≈
586.6 ± 2.3 ≈
598.8 ± 6.8+ 507.0 ± 0.0 ≈
538.0 ± 8.8+
422.6 ± 0.8 ≈
383.4 ± 9.8 ≈
B. cost
PMFEA
405.1 ± 0.2 ≈
B. cost Ave. cost
Task no. EEMTA
P-n60-k10-l-t1 T1
Problem
Table 7.2 continued
405.0 585.0
589.1 ± 3.2 –
467.0 –
479.1 ± 5.8 – 583.0
405.0 – 587.6 ± 3.6 –
468.0
–
480.3 ± 6.6
–
B. cost Ave. cost
EA2 405.1 ± 0.2 –
B. cost Ave. cost
EA1
422.0
385.6 ± 5.5 –
423.0 ± 0.9
553.4±11.9 –
537.0
–
367.0
422.7 ± 0.9 –
– – 382.7 ± 7.5 –
555.4±10.7 – 422.0 –
– 423.2 ± 0.9 –
362.0
532.0
381.7±11.8
–
548.7±12.2
627.2 ± 9.6 – – 609.0 631.4 ± 9.1 – – 618.0 633.2 ± 9.7 507.2 ± 0.3 507.0 507.0 ± 0.0 – – 507.0 507.0 ± 0.0 – –
588.3 ± 4.2
–
–
B. cost Ave. cost
EA2+R 405.0 ± 0.0 –
B. cost Ave. cost
EA1+R
484.5 ± 8.0 –
405.2 ± 0.5
Ave. cost
136 7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
7.3 Empirical Study
137
evolutionary solvers in the presented explicit multi-tasking paradigm, EEMTA has achieved significantly better “Ave.Cost” values than PMFEA, on task 2 of totally 22 number of CVRP benchmarks. In summary, since EEMTA shares the common search operators and parameters with PMFEA, EA1+R, EA2+R, EA1 and EA2, for solving the multi-tasking CVRP benchmarks, the obtained enhanced search performance by EEMTA, in terms of solution quality confirmed the effectiveness of EEMTA for conducting explicit EMT for combinatorial optimization.
7.3.2.2 Search Efficiency: Convergence Trends To assess the efficiency of EEMTA, the representative search convergence traces of EEMTA, PMFEA, EA1, and EA2 on the representative high-similarity, mediumsimilarity, and low-similarity multi-tasking CVRP benchmarks are presented in Figs. 7.4, 7.5, and 7.7, respectively. In the figures, the Y-axis denotes the averaged travel cost obtained in log scale, while the X-axis gives the respective computational effort incurred in terms of generation made so far. From these figures, it can be observed that the implicit EMT algorithm, i.e., PMFEA converges faster or competitive than the single-task EAs (i.e., EA1 or EA2)
(a)
(b)
(c)
(d)
(e)
(f)
Fig. 7.4 Convergence traces of EEMTA versus the PMFEA and the single-task EAs on representative high-similarity multi-tasking CVRPs. Y -axis: log(Averaged travel cost); X-axis: generation. (a) A-n62-k8-h. (b) A-n80-k10-h. (c) B-n50-k7-h. (d) B-n78-k10-h. (e) P-n50-k8-h. (f) P-n76-k5-h
138
7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
(a)
(b)
(c)
(d)
(e)
(f)
Fig. 7.5 Convergence traces of EEMTA versus the PMFEA and the single-task EAs on representative medium-similarity multi-tasking CVRPs. Y -axis: log(Averaged travel cost); X-axis: generation. (a) A-n62-k8-m. (b) A-n80-k10-m. (c) B-n50-k7-m. (d) B-n78-k10-m. (e) P-n50-k8m. (f) P-n76-k5-m
on most of the multi-tasking CVRPs. However, as PMFEA employs a common search mechanism for different tasks, and there is no solution selection in the knowledge transfer process across tasks, the improvements in terms of convergence speed achieved by PMFEA are limited even on the high-similarity multi-tasking CVRPs (see Fig. 7.4). Next, for the introduced explicit EMT algorithm for combinatorial optimization, due to the transfer of high-quality solutions from the strong solver EA1 on task 1 to task 2, EEMTA has obtained significantly faster convergence speed over both the single-task EA2 and the multi-tasking PMFEA, on task 2 of all the high-similarity, medium-similarity, and low-similarity multi-tasking CVRPs. For instance, on “An62-k8-h-t2”, EEMTA uses only about 5 generations to arrive the solution obtained by EA2 and PMFEA at generation 20. On “B-n50-k7-m-t2”, EEMTA takes 20 generations to achieve the solution better than that obtained by EA2 and PMFEA over 30 generations. However, due to the decrease of similarity between tasks from high-similarity to the low-similarity multi-tasking benchmarks, we can observe that the improvements of convergence speed obtained by EEMTA over EA2 on task 2 of the multi-tasking CVRPs also decrease from Figs. 7.4, 7.5, 7.6, and 7.7. For example, in contrast to the high-similarity benchmark “A-n62-k8-h-t2” aforementioned (see Fig. 7.4a), on the low-similarity benchmark “A-n62-k8-l-t2”
7.3 Empirical Study
139
(a)
(b)
(c)
Fig. 7.6 Tracking of best solutions along the evolutionary search. (a) A-n54-k7-h. (b) B-n78-k10h. (c) P-n76-k5-h
140
7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
(a)
(b)
(c)
(d)
(e)
(f)
Fig. 7.7 Convergence traces of EEMTA versus the PMFEA and the single-task EAs on representative low-similarity multi-tasking CVRPs. Y -axis: log(Averaged travel cost); X-axis: generation. (a) A-n62-k8-l. (b) A-n80-k10-l. (c) B-n50-k7-l. (d) B-n78-k10-l. (e) P-n50-k8-l. (f) P-n76-k5-l
0
Fig. 7.8 Averaged travel cost in log scale obtained by EEMTA and single task solver (i.e., EA1 and EA2) on all the CVRPs
(see Fig. 7.7a), it takes about 15 generations of EEMTA to reach the solution found by EA2 and PMFEA at generation 20. Further, on task 1 of the multi-tasking CVRPs, although task 2 has a weaker evolutionary solver, i.e., EA2, the transferred solutions from task 2 could also be useful, and thus lead to faster or competitive convergence speed of EEMTA over both EA1 and PMFEA on most of the multi-tasking CVRPs. Moreover, with the injection of randomly generated solutions in EA1 and EA2, we can observe that EA1+R and EA2+R obtained faster convergence speed over EA1
7.3 Empirical Study
141
and EA2 respectively, on most of the CVRP benchmarks, such as Figs. 7.4e, 7.5c, and 7.7d. This is because of the solution diversity introduced along the evolutionary search of EA1+R and EA2+R. However, using the same configuration of frequency and amount of solutions for injection, with the introduced knowledge transfer across CVRPs, EEMTA has been observed to achieve faster convergence speed over both EA1+R and EA2+R on all the high-, medium-, and low-similarity multi-tasking CVRP benchmarks. Further, to explore the reason behind the superior performance obtained by EEMTA over EA1+R and EA2+R, in Fig. 7.6, we plot the tracking of best solutions found along the evolutionary search in EEMTA, EA1+R and EA2+R. In particular, if the best solution is generated from the transferred solutions, it will be given tag value 1, otherwise 0. As can be observed in the figure, in EEMTA, the transferred solutions across CVRPs successfully lead to the finding of best solutions along the search, while the randomly transferred solutions in both EA1+R and EA2+R failed to contribute to the localization of best solutions. Lastly, as G (i.e., .G1 and .G2 ) and Q define the frequency and amount of knowledge sharing between CVRPs, we further study how the configurations of G and Q affect EEMTA. Generally, a small value of G and a big value of Q will greatly increase the frequency and amount of knowledge sharing across tasks, while a big value of G and a small value of Q will reduce the frequency and amount of solution transfer across tasks significantly. In particular, the averaged travel costs obtained by EEMTA and the single-task solver (i.e., EA1 and EA2) on all the CVRPs across 20 independent runs with various configuration of G and Q are summarized in Fig. 7.8. It can be observed from the figures that, superior solution qualities have been obtained by EEMTA when compared to the single-task solvers with all the configurations of G and Q. However, although the optimal confirmations of G and Q are in general problem dependent, fixing .G1 = G2 = 5 and .Q = 5 is found to provide noteworthy results across a variety of problems encountered. In summary, it is worth noting here again that EEMTA has the common search mechanisms, i.e., evolutionary search operators and parameters, with PMFEA, EA1+R, EA2+R, EA1, and EA2 for solving task 1 and task 2 of all the multitasking CVRPs. Therefore, the superior convergence speed of EEMTA can clearly be attributed to the efficiency of the introduced explicit multi-tasking approach for combinatorial optimization.
7.3.3 Real-World Routing Application: The Package Delivery Problem In this section, we further investigate the performance of EEMTA on a realworld vehicle routing application, i.e., the package delivery problem, from the logistics industry. In particular, we aim to carry out the computational demanding optimization of multiple distinct package delivery routing requests from real courier business. It is noted that the requirements of obtaining high-quality routing solution
142
7 Explicit Evolutionary Multi-Task Optimization for Capacitated Vehicle Routing. . .
efficiently is becoming as one of the key challenges of logistics in today’s global economy era, thereby highlighting the potential real-world implications of our proposition. The package delivery problem (PDP) is a NP-hard complex combinatorial optimization task. Due to the fast development of e-commerce, the courier companies confront with a large amount of package delivery tasks everyday. An efficient and effective optimization paradigm for solving PDP cannot only save the operating costs, but also improve the service quality of the courier companies. Typically, the PDP can be defined as the task of servicing a set of customers with a fleet of capacity constrained vehicles located at a single or multiple depot(s), which is a real-world application of the capacitated vehicle routing problem. In the present context, we have four PDP requests from a courier company in beijing, China. In particular, the corresponding number of customers need to be served, the vehicle available at the courier company, and the capacity of the vehicles are summarized in Table 7.3. Usually, courier companies always optimize the PDP requests in a sequential manner via the single-task heuristic solvers. In the present study, we employ EEMTA to solve the PDP requests concurrently. The implicit PMFEA is also employed for solving the PDP requests to further evaluate the efficacy of the introduced explicit evolutionary mutli-tasking algorithm. In particular, for simplicity, here we keep the configurations of search operators and parameters consistent with those used the empirical study above. Moreover, for the four PDP requests, the pyramid match kernel method [218] is used to pair the PDP requests based on the customer distributions. We then have two pairs in this study, which are {PDP request 1, PDP request 2}, and {PDP request 3, PDP request 4}. The convergence graphs obtained by EEMTA, the implicit multi-tasking PMFEA, and the corresponding single-task EAs, i.e., EA1 and EA2, on the PDP requests are presented in Fig. 7.9. As clearly revealed in the figure, both of the evolutionary multi-tasking algorithms, i.e., EEMTA and PMFEA, obtained faster convergence speed against the single-task EAs. Further, it is observed that the knowledge transfer enabled by EEMTA provides a strong impetus to the search process, speeding up the discovery of high-quality solutions by a substantial amount. In fact, with knowledge transferred from the paired PDP, the EEMTA is found to receive a significant boost during the initial stages of evolution itself, on PDP request 2 and PDP request 4 (i.e., Fig. 7.9a, b), enabling it to quickly achieve high-quality routing solutions while consuming considerably lower computational cost. To demonstrate, it takes
Table 7.3 Property summary of the PDP requests
Instance PDP request-1 PDP request-2 PDP request-3 PDP request-4
Customer number 270 255 264 246
Vehicle capacity 862 862 862 862
Vehicle number 16 15 16 14
7.4 Summary
(a)
143
(b)
Fig. 7.9 Convergence traces of EEMTA, PMFEA and the single-task EAs on the real world PDP requests. Y -axis: log(Averaged travel cost); X-axis: generation. (a) Paired PDP request-1 and 2. (b) Paired PDP request-3 and 4
EEMTA about 5 generations on average to attain the routing solution with travel cost significantly lower than that obtained by EA2 over 35 generations. On the other hand, although the knowledge transfer for PDP request 1 and PDP request 2 is from the weak solver EA2, EEMTA is also observed to obtain the solutions achieved by EA1 at generation 35, using around 20 number of generations. It is not hard to imagine that the significant savings of computational cost caused by EEMTA can play a vital role towards cutting down of optimization time, especially when faced with real-world complex combinatorial optimization problems, where computational budget available is limited.
7.4 Summary In this chapter, we have presented an explicit EMT algorithm, i.e., EEMTA, for combinatorial optimization. In particular, by employing CVRP as the illustrating combinatorial optimization problem domain, we have first presented a weighted .l1 norm-regularized formulation to learn the sparse mapping between customers across CVRPs. Further, we have introduced to learn the new representation of customers based on a distance matrix derived from the optimized CVRP solutions, so that the useful traits buried in the optimized CVRP solutions can be transferred across CVRPs via the learned customer mapping, using simple clustering and pair-wise distance sorting processes. Lastly, comprehensive empirical studies on both multitasking CVRP benchmarks and the real world PDP, against the state-of-the-art EMT algorithm and the traditional single-task evolutionary solvers, confirmed the efficacy of EEMTA for combinatorial optimization.
Part IV
Evolutionary Multi-Task Optimization for Solving Large-Scale Optimization Problems
Chapter 8
Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
Today, because of the exponential growth of the volume of data in big data applications, large-scale optimization problems (i.e., optimization problems with a large number of decision variables) have become ubiquitous in the real world [219, 220]. In this chapter, building on the algorithms and observations presented for solving both continuous and combinatorial optimization problems, we further present a novel multi-space evolutionary search framework based on EMT for solving largescale single-objective optimization problem. In particular, for a given large-scale single-objective optimization problem, besides the original problem space, multiple simplified solution spaces are derived for the given problem, which possess unique landscapes. Furthermore, instead of conducting an evolutionary search on the given problem space, evolutionary searches are concurrently performed on both the original and constructed simplified spaces of the given problem. By transferring useful traits while the search progresses online across different spaces, via EMT, an enhanced problem-solving process can be obtained. As the optimization of the large-scale optimization problem is maintained in the given problem space, and the simplified solution spaces serve to provide biases to guide the search in the original problem space, the presented multi-space evolutionary search paradigm provides more flexibilities of the construction of the simplified problem spaces, which does not rely on the assumption that a problem is decomposable or reducible. To verify its efficacy, comprehensive empirical studies in comparison to five state-of-the-art algorithms are conducted using the CEC2013 large-scale benchmark problems as well as an AI application in e-commerce, i.e., movie recommendation.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 L. Feng et al., Evolutionary Multi-Task Optimization, Machine Learning: Foundations, Methodologies, and Applications, https://doi.org/10.1007/978-981-19-5650-8_8
147
148
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
8.1 Existing Approaches for Simplifying Search Space of Large-Scale Single-Objective Optimization Problems In recent years, to improve the evolutionary algorithms used to solve optimization problems involving a large number of decision variables, many attempts have been made to simplify the problem solution space of a given problem for the evolutionary search. According to recent surveys in the literature [219–221], existing approaches to simplifying the search space of a given large-scale optimization problem can generally be categorized as decomposition-based approaches, and dimensionreduction-based methods. In particular, the decomposition-based approaches are also known as divide-and-conquer approaches in evolutionary computation and mainly involve cooperative coevolution (CC) algorithms, which decompose a given large-scale optimization problem into several smaller subproblems and then optimize each subproblem separately using different EAs. Generally, decompositionbased approaches consist of three major steps. First, by considering the structure of the underlying decision variable interactions, the original D-dimensional problem is exclusively divided into N .di -dimensional sub-problems, where . N i=1 di = D. Next, each subproblem is solved by a particular EA. Finally, the d-dimensional solutions to these subproblems are merged to form the D-dimensional complete solution for the original problem. It is straightforward to see how the decomposition of the problem is essential to the performance of CC algorithms, and how an inappropriate decomposition of the decision variables may even lead to a deteriorated optimization performance [221, 222]. Particular examples in this category include strategies that randomly divide the variables into groups without taking the variable interaction into consideration [223–225], approaches that make use of evolutionary information to learn variable interdependency and then divide variables into groups [226, 227], and static decomposition methods that are performed before conducting an evolutionary search based on the detection of variable interaction [222, 228–231]. On the other hand, instead of decomposing the solution space of the given problem, a dimension-reduction-based approach attempts to create a new solution space with lower dimensionality from the original solution space. The evolutionary search is then performed on the newly created low-dimension space, and the obtained solution is mapped back to the original space for evaluation. Generally, the existing approaches perform dimension reduction either by selecting a subset of the original decision variables or transforming the original solution space into a low-dimensional solution space. As can be observed, the preservation of important information for guiding the search toward high-quality solutions in the reduced solution space plays a key role in determining the performance of a dimensionreduction-based approach. Examples belonging to this class include the random matrix projection-based estimation of distribution algorithm, which introduced an ensemble of random projections to low dimensions of the set of fittest search points [232]; random embedding-based approach for large-scale optimization problems with low effective dimensions, which improve the scalability of the simultaneous optimistic optimization by projecting the problem space to a low dimensional
8.2 Algorithm Design and Details
149
space via random embedding [233]; multi-agent system assisted embedding for large-scale optimization that improved the reliability of random embedding via multi-agent system for large-scale optimization [234]; solving large-scale multiobjective optimization via problem reformulation, which transformed the search in the original problem space into a low-dimensional single-objective optimization space [235], and the framework for large-scale optimization based on problem transformation, which optimizes the weight values of groups of decision variables instead of the high-dimensional decision variables directly [236]. Although the above methods have shown good performances in solving largescale optimization problems, there are two main drawbacks with these two categories of methods. First, because decomposition-based methods rely heavily on the accurate detection of decision variable interactions, these methods may fail on large-scale optimization problems with complex variable interactions or that are not decomposable. Second, although dimension reduction may not rely on variable interaction, it is difficult to guarantee that the global optimum or highquality solutions are preserved in the reduced space. However, because a simplified solution space can provide useful information for efficient and effective problem solving, it is desirable to develop new search paradigms for large-scale optimization that can leverage the advantage of simplified solution spaces without the limitations discussed above.
8.2 Algorithm Design and Details This section presents the details of the multi-space evolutionary search for largescale optimization. In particular, the outline of the search paradigm is presented in Fig. 8.1. For a given problem of interest, besides the original problem space, a simplified problem space for the given problem is first created. Next, the mapping between these two problem spaces is learned, which will be used for knowledge transfer across spaces during the evolutionary search process via EMT. Further, by treating these two problem spaces as two tasks, evolutionary searches can be conducted on the tasks concurrently. As can be observed in the figure, knowledge transfer will be performed across tasks while the evolutionary search progresses online (see the green rectangle in Fig. 8.1). In this way, the useful traits found in the simplified problem space can be leveraged to facilitate the search in the original space, while the high-quality solutions found in the original problem space may also guide the search direction in the simplified problem space toward promising areas. Furthermore, to explore the usefulness of diverse auxiliary tasks, the simplified problem space will be re-constructed periodically using the solutions found during the evolutionary search process (see the yellow rectangle in Fig. 8.1). Finally, the EMT process is terminated when certain stopping criteria are satisfied. The following sections present details on the construction of the simplified problem space, learning of mapping across problem spaces, knowledge transfer across problem spaces, and reconstruction of the simplified problem space.
150
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
Fig. 8.1 Workflow of the proposed multi-space evolutionary search for large-scale optimization
8.2.1 Construction of the Simplified Problem Space Because the simplified problem space serves as an auxiliary task of a given problem of interest, there are generally no particular constraints on the construction of the simplified space. Therefore, the existing approaches proposed in the literature, such as random embedding [233], dimension reduction [237], or even search space decomposition [223, 227] could be employed for constructing the space.
8.2 Algorithm Design and Details
151
In this study, for simplicity, the dimension reduction is considered for constructing a simplified problem space, .PSs , in the multi-space evolutionary search paradigm. In particular, to generate initial population .Ps of the evolutionary search in .PSs , initial population .P is first sampled in original problem space .PS, which is routine [161]. Next, the obtained .P in .PS will undergo dimension reduction1 with dimension .ds to generate .Ps for the evolutionary search in .PSs .
8.2.2 Learning of Mapping Across Problem Spaces Once the simplified problem space has been constructed, the mappings across simplified problem space .PSs and original problem space .PS have to be learned, to allow the useful traits found in each space to be transferred across spaces toward efficient and effective problem solving for large-scale optimization. In this study, the mappings across .PSs and .PS are learned using labeled data from each space via supervised learning. In particular, as discussed in Sect. 8.2.1, .Ps is generated by performing dimension reduction of .P. Therefore, each solution in .Ps has a unique corresponding solution in .P. This correspondence thus provides the label information to connect spaces .PSs and .PS. Taking this cue, by configuring d d .T and .S as .P and .Ps , respectively, the mapping .MPSs →PS : .R s → R (.MPSs →PS is a .d × ds matrix; .ds and d are the dimensions of the simplified and original problem spaces, respectively.) from simplified space .PSs to original problem space .PS can then be approximated by minimizing the squared reconstruction loss,2 which is given by the following: Lsq (M) =
.
N 1 ||pi − M × qi ||2 2N
(8.1)
i=1
where N denotes the number of solutions in .S and .T,3 .qi is the solution in .S, and .pi gives the solution in .T, which corresponds to .qi . Further, to simplify the notation, it is assumed that a constant feature is added to the input, that is, .pi = [pi ; 1] and .qi = [qi ; 1], and an appropriate bias is incorporated within the mapping, .M = [M, b]. The loss in Eq. 8.1 is then reduced to the matrix form: Lsq (M) =
.
1
1 tr[(T − M × S)T (T − M × S)] 2N
(8.2)
Without lose of generality, any dimension reduction method can be applied here. In this study, as .S is generated via dimension reduction using .T, this mapping could be directly obtained in the dimension reduction process. However, this learning method is general in cases when only solutions in both the simplified and the original problem spaces are given. 3 .S and .T are thus represented by a .d × N matrix and a .d × N matrix, respectively. s 2
152
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
where tr.(·) and T denote the trace operation and transpose operation of a matrix, respectively. The solution of Eq. 8.2 can be expressed as the well-known closedform solution for the ordinary least squares [164], which is given by the following: M = (T × ST )(S × ST )−1
.
(8.3)
Finally, it is straightforward to see that the mapping .MPS→PSs (a .ds × d matrix) from space .PS to .PSs can also be learned via Eq. 8.3 by configuring .T and .S as .Ps and .P, respectively.
8.2.3 Knowledge Transfer Across Problem Spaces With the learned mappings .MPSs →PS and .MPS→PSs across the simplified and original problem spaces, a knowledge transfer across these two spaces can easily be conducted by the simple operation of matrix multiplication. In particular, for a knowledge transfer from .PSs to .PS, suppose this process occurs every generation 4 .Gt . First, the Q best solutions are selected in terms of the fitness values from the population of the simplified problem space, denoted by .Ss , which is a .ds ×Q matrix. Next, the transferred solutions, .TSPSs →PS , are obtained by .MPSs →PS × Ss . Finally, the solutions in .TSPSs →PS are injected into the population of the original problem space to undergo natural selection for the next generation. On the other hand, every generation .Gt , knowledge transfer also occurs from the original problem space to the simplified problem space. In particular, the Q’ best solutions in terms of fitness values are first selected from the population of the original problem space, which are labeled as .Ss and are a .d × P matrix. Subsequently, the transferred solutions .TSPS→PSs can be obtained by .MPS→PSs × Ss . Further, the solutions in .TSPS→PSs are inserted into the population of the simplified problem space with natural selection. Moreover, after the knowledge transfer process, the updated population of the simplified problem space is further transformed back to the original problem space, and archived in .As . The repeated solutions in .As are removed. As can be observed, .As preserves the search traces in the present simplified problem space, which will be used for the reconstruction of a new simplified space. That is discussed in detail in the next section.
4 The fitness values of solutions in .PS are evaluated by transforming these solutions back to the s original problem space and using the given problem objective function.
8.2 Algorithm Design and Details
153
8.2.4 Reconstruction of the Simplified Space To explore the usefulness of diverse auxiliary tasks for large-scale optimization, instead of using one fixed simplified problem space, the proposal is made to build multiple simplified problem spaces periodically while the evolutionary search progresses online. In particular, if the reconstruction of the simplified problem space occurs every generation .Gr , the dimension reduction used in Sect. 8.2.1 is considered here again to reconstruct a simplified problem space with a new set of solutions in the original problem space. Further, in order to preserve the useful traits found in the last simplified space, the solutions in archive .As are used as the new set of solutions and subjected to the dimension reduction to construct a new .PSs .5 Subsequently, the solutions in .As and corresponding mapped solutions in .PSs will be used to learn mappings .MPSs →PS and .MPS→PSs across problem spaces .PS and .PSs . Finally, the population of the simplified problem space is also re-initialized in the new .PSs , which is shown in detail in Algorithm 12. Algorithm 12: Pseudo code of the population re-initialization in .Ps Input : .PoPs : the population in the simplified problem space before the reconstruction; .As : archive in the original problem space Output: .Ps : the re-initialized population in the new simplified problem space 1 Begin 2 Transform .Ps back to the original problem space using .MPSs →PS , denoted by .Ps ; 3 Perform the reconstruction of the simplified problem space using dimension reduction with .As ; 4 Learn the new .MPSs →PS and .MPS→PSs across problem spaces .PS and .PSs as discussed in Sect. 8.2.2. 5 Re-initialize the population in the new simplified problem space by .Ps = MPS→PSs × Ps ; 6 End
8.2.5 Summary of the Multi-Space Evolutionary Search A summary of the multi-space evolutionary search for large-scale optimization is presented in Algorithm 13. As can be observed, the algorithm starts with the construction of simplified problem spaces (.PSs ), and the learning of mappings
5
In this study, for simplicity, the dimension of .PSs (.ds ) was kept unchanged.
154
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
(i.e., .MPSs →PS and .MPS→PSs ) across the simplified (.PSs ) and original problem (.PS) spaces. Next, the reproduction operators are conducted to generate offspring solutions in .PSs and .PS, respectively. The evaluations of solutions in both .PSs and .PS are performed using the given problem objective function in the original problem space .PS (see lines 8–9 in Algorithm 13). Moreover, while the EMT progresses online, the knowledge transfer across .PSs and .PS occurs in every .Gt generation (see lines 10–13 in Algorithm 13), and the reconstruction of a new simplified problem space .PSs is performed in every .Gr generations (see lines 15– 18 in Algorithm 13). The EMT search process proceeds iteratively until a certain stopping criteria satisfied. Further, archive .As on lines 12-13 of Algorithm 13 is used to store the non-repeating search traces in the simplified problems with solution representation in the original problem space. Without loss of generality, the volume of .As can be configured as needed. However, in this study, for simplicity, the volume of .As is configured as .5 ∗ NP , where NP denotes the population size of the evolutionary search. Once the search traces exceed the volume of .As , only the latest search traces are archived.
8.3 Empirical Study This section discusses the results of comprehensive empirical studies that were conducted to evaluate the performance of the multi-space evolutionary search paradigm on the commonly used large-scale optimization benchmarks, compared to several state-of-the-art algorithms proposed in the literature.
8.3.1 Experimental Setup In this study, the commonly used CEC2013 large-scale optimization benchmark [238], which contains 15 functions with diverse properties, was used to investigate the performance of the multi-space evolutionary search. As summarized in Table 8.1, according to [238], the benchmark consists of both unimodal and multimodal minimization functions, which can be generally categorized into the following five classes: (1) fully separable functions, (2) partially additive separable functions I, (3) partially additive separable functions II, (4) overlapping functions, and (5) fully nonseparable functions. Further, except for functions F13 and F14, all the functions have a dimensionality of 1000. Because of the overlapping property, functions F13 and F14 both have 905 decision variables. For more details on the CEC2013 large-scale optimization benchmark, interested readers can refer to [238]. Next, to verify the efficacy of the multi-space evolutionary search (referred to as MSES hereafter) for large-scale optimization, five state-of-the-art methods for addressing large-scale optimization, including decomposition-based cooperative coevolution and non-decomposition-based approaches, were considered as the
8.3 Empirical Study
155
Algorithm 13: Pseudo code of the multi-space evolutionary search for large-scale optimization Input: P: Given the problem space of interest; ds : dimensionality of the simplified problem space; Gt : interval of knowledge transfer across problem spaces; Gr : interval of simplified space reconstruction Output: s∗ : optimized solution of the given problem 1: Begin 2: Construct the simplified problem space PSs with dimension ds of given problem PS; 3: Learn the mappings MPSs →PS and MPS→PSs across PSs and PS; 4: gen = 1; As = ∅ 5: while terminate condition is not satisfied do 6: gen = gen + 1; 7: Perform reproduction operators (e.g., crossover and mutation) for PSs and PS, respectively; 8: Transform solutions in PSs back to PS; 9: Perform natural selection using the given problem objective function for both PSs and PS in problem space PS; 10: if mod(gen, Gt ) = 0 then 11: Perform knowledge transfer across PSs and PS; 12: Transform the population of PSs back to PS, and archive the population in As ; 13: Remove the repeated solutions in As ; 14: end if 15: if mod(gen, Gr ) = 0 then 16: Reconstruct the new simplified problem space PSs with dimension ds using solutions in As ; 17: Learn the new mappings MPSs →PS and MPS→PSs across the new constructed PSs and PS; 18: Re-initialize the populations in PSs using the new learned MPS→PSs ; 19: end if 20: end while 21: End
baseline algorithms for comparison. In particular, the cooperative coevolution approaches included the recursive decomposition methods proposed by Sun et al. in 2018 (called RDG) [230] and in 2019 (called RDG3) [231], and an improved variant of the differential grouping algorithm introduced by Omidvar et al. which is called DG2 [239]. The non-decomposition-based approaches included the levelbased learning swarm optimizer proposed by Yang et al. in 2018 (called DLLSO) [240], and the random embedding-based method proposed by Hou et al. in 2019 (called MeMAO) [234]. Further, in these compared algorithms, it should be noted that different evolutionary search methods were used as the basic optimizer. For
156
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
Table 8.1 Properties of CEC2013 benchmark functions Separability
Function
Modality
Search space
Base function
Fully separable functions
F1
Unimodal
.[−100, 100]D
Elliptic
F2
Multimodal
.[−5, 5]D
Rastrigin
F3
Multimodal
.[−32, 32]D
F4
Unimodal
.[−100, 100]
Elliptic
F5
Multimodal
.[−5, 5]D
Rastrigin
F6
Multimodal
.[−32, 32]D
Ackley
F7
Multimodal
.[−100, 100]D
F8
Unimodal
.[−100, 100]
Elliptic
F9
Multimodal
.[−5, 5]D
Rastrigin
F10
Multimodal
.[−32, 32]D
Ackley
F11
Unimodal
.[−100, 100]D
F12
Multimodal
.[−100, 100]
Rosenbrock
F13
Unimodal
.[−100, 100]D
Schwefel
F14
Unimodal
.[−100, 100]D
Schwefel
F15
Unimodal
.[−100, 100]D
Schwefel
Partially additive separable functions I
Partially additive separable functions II
Overlapping functions
Fully non-separable functions
Ackley D
D
D
Schwefel
Schwefel
example, RDG, RDG3 and DG2 employed the self-adaptive differential evolution with neighborhood search (SaNSDE) [230, 239] as the optimizer, while MeMAO considered the classical differential evolution (DE) method as the optimizer [234]. Rather than using differential evolution, DLLSO used the particle swarm optimizer as the basic search method [240]. For a fair comparison to the different baseline algorithms, the optimizer for each space in the MSES was kept consistent with the optimizer used in the compared algorithm. Further, the basic optimizers performed in the original problem space are also included as baseline algorithms for comparison, which serves as the ablation studies.6 Lastly, the parameter and operator settings of all the compared algorithms and the MSES were kept the same as those in [230], [231], [239], [240], and [234], which are summarized as follows: • Population size: population size .NP = 50, 100, and 500 for optimizers SaNSDE, DE, and DLLSO, respectively; • Independent number of runs: .runs = 25 for all compared algorithms; • Maximum number of fitness evaluations: .Max_F Es = 3E + 06; • Number of solutions to be transferred across spaces in MSES: .Q = Q = 0.2 ∗ NP ; • Interval of knowledge transfer across problem spaces: .Gt = 1; • Interval of simplified space reconstruction: .Gr = 10;
6
As knowledge transfer in this study is in the form of solutions, and we simply considered a fixed number of solutions for transfer to investigate the performance of the multi-space evolutionary search, the distribution based solver CMAES [231] is not employed in RDG3 in our empirical study.
8.3 Empirical Study
157
• Dimensionality of simplified problem space: .ds = 600; • Dimension reduction method: principal component analysis (PCA) [241]; • Size of .As : .Assize = 5 ∗ NP .
8.3.2 Results and Discussion This section presents and discusses the performance of the MSES in comparison to those of the existing state-of-the-art approaches on the CEC2013 large-scale benchmark functions in terms of the solution quality and search efficiency.
8.3.2.1 Solution Quality Table 8.2 tabulates the results with respect to the averaged objective values and standard deviations obtained by all the compared algorithms over 25 independent runs. In particular, based on the evolutionary solver employed for the search (e.g., SaNSDE, PSO, and DE), the comparison results are divided into three groups, with each group sharing the same evolutionary solver. The best performance in each comparison group is highlighted using a bold font in the table. Further, in order to obtain a statistical comparison, a Wilcoxon rank sum test with a .95% confidence level was conducted on the experimental results, where “+”, “.−”, and “.≈” show that statistically the algorithm is significantly better than, significantly worse than, or similar to MSES, respectively. As can be observed in the table, in all three comparison groups, when using different evolutionary search methods as the optimizer, MSES obtained a superior solution quality in terms of the averaged objective value on most of the problems compared to the other algorithms. In the comparison groups using SaNSDE and PSO as the optimizers, MSES.SaNSDE and MSES.DLLSO, lost to the compared algorithms on large-scale benchmarks F1 and F2. Table 7.1 shows that F1 and F2 are fully separable functions. Moreover, F1 is based on a unimodal “Elliptic” function, and the search space of F2 is only within the range of [.−5, 5], which indicates the simplicity of the search spaces for these two functions. Therefore, the reason behind the obtained performance of MSES could be the simple use of dimension reduction for constructing the simplified problem space of these two problems, which may destroy the separable information between decision variables. However, on the other more complex large-scale benchmarks such as partially additive separable, overlapping, and fully nonseparable problems, where greater appropriate guidance is required for an effective search, MSES.SaNSDE and MSES.DLLSO achieved superior and competitive averaged objective values in contrast to DG2/RDG/RDG3 and DLLSO, respectively. On benchmarks F11, F13 and F14, only MSES was able to consistently find solutions with objective values of approximately .e + 07 in both of these comparison groups. On 15 large-scale benchmarks, MSES.SaNSDE and
SaNSDE
4.70e+05.±7.02e+05.+ 1.50e+04.±1.03e+03.+ 2.06e+01.±7.04e-03.− 7.47e+09.±2.81e+09.− 2.97e+06.±4.20e+05.+ 1.05e+06.±2.02e+03.− 6.83e+06.±3.37e+06.− 1.89e+13.±1.11e+13.− 2.67e+08.±2.84e+07.+ 9.29e+07.±2.81e+05.− 6.38e+08.±1.43e+09.− 4.19e+07.±6.79e+07.− 4.58e+08.±1.67e+08.− 4.94e+08.±5.10e+08.− 4.93e+06.±1.09e+06.−
Comparison 1 MSES.SaNSDE
1.34e+07.±1.54e+06 1.80e+04.±8.60e+02 2.00e+01.±2.39e.−03 1.29e+09.±4.50e+08 4.52e+06.±7.48e+05 1.00e+06.±6.14e+03 1.85e+06.±1.67e+05 7.14e+12.±5.94e+12 4.48e+08.±7.43e+07 9.10e+07.±5.42e+05 1.51e+07.±4.35e+06 1.68e+03.±1.80e+02 1.51e+07.±2.38e+06 4.83e+07.±7.52e+06 1.64e+06.±1.21e+05
Problems
F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15
5.20e+02.±1.32e+03.+ 1.26e+04.±7.07e+02.+ 2.14e+01.±1.38e.−02.− 5.08e+10.±1.78e+10.− 5.35e+06.±4.84e+05.− 1.06e+06.±9.95e+02.− 6.55e+07.±1.96e+07.− 5.71e+15.±1.48e+15.− 4.94e+08.±3.05e+07.− 9.46e+07.±2.78e+05.− 4.82e+09.±5.60e+09.− 2.62e+05.±1.17e+06.− 1.62e+09.±4.58e+08.− 5.26e+09.±2.86e+09.− 1.11e+07.±1.73e+06.−
DG2
2.16e+01.±9.09e+01.+ 1.28e+04.±7.07e+02.+ 2.14e+01.±1.59e.−02.− 4.00e+10.±1.22e+10.− 5.05e+06.±3.69e+05.− 1.06e+06.±1.20e+03.− 9.49e+07.±5.34e+07.− 4.39e+15.±1.76e+15.− 4.98e+08.±2.92e+07.− 9.45e+07.±3.49e+05.− 6.17e+08.±1.36e+08.− 4.38e+03.±1.09e+03.− 3.02e+09.±8.26e+08.− 3.73e+09.±1.95e+09.− 9.76e+06.±1.41e+06.−
RDG
6.57e+00.±2.22e+01.+ 1.25e+04.±5.84e+02.+ 2.14e+01.±1.25e.−02.− 4.55e+10.±1.69e+10.− 4.96e+06.±5.03e+05.− 1.06e+06.±1.21e+03.− 5.91e+07.±2.22e+07.− 4.16e+15.±1.47e+15.− 4.98e+08.±2.46e+07.− 9.45e+07.±2.45e+05.− 3.56e+09.±4.34e+09.− 1.39e+03.±7.80e+01.+ 9.44e+09.±2.20e+09.− 3.36e+10.±1.53e+10.− 1.01e+07.±1.59e+06.−
RDG3
Table 8.2 Averaged objective values and standard deviations obtained by MSES and compared baseline algorithms. (The best performance in each comparison group is highlighted in bold, and “.+”, “.≈”, and “.−” denote that the compared algorithm is statistically significantly better than, similar to, and worse than MSES using different EA solvers, respectively)
158 8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15
Problems
+/.≈/-No.
4/0/11
DLLSO 3.98e.−22.±1.09e.−22.+ 1.12e+03.±6.33e+01.+ 2.16e+01.±5.82e.−03.− 5.85e+09.±1.08e+09.− 6.77e+05.±1.16e+05.− 1.06e+06.±9.75e+02.≈ 1.58e+06.±8.81e+05.≈ 1.30e+14.±4.44e+13.− 4.26e+07.±9.49e+06.≈ 9.40e+07.±2.61e+05.≈ 2.35e+08.±6.58e+07.− 1.77e+03.±1.30e+02.− 3.26e+08.±1.22e+08.− 2.03e+08.±2.41e+08.− 4.51e+06.±4.70e+05.−
–
Comparision 2 MSES.DLLSO 1.00e.−04.±1.51e.−04 2.88e+03.±1.96e+02 2.13e+01.±1.54e.−01 1.30e+09.±5.16e+08 5.79e+05.±6.92e+04 1.06e+06.±1.60e+03 1.90e+06.±7.26e+05 8.73e+11.±1.42e+11 4.14e+07.±7.00e+06 9.40e+07.±2.29e+05 1.62e+07.±1.94e+07 1.62e+03.±1.84e+02 1.11e+07.±7.44e+06 2.48e+07.±8.72e+06 1.49e+06.±1.55e+05
Comparison 3 MSES.DE 5.03e+07.±4.59e+06 9.09e+03.±5.93e+02 2.14e+01.±2.25e.−02 1.10e+09.±2.58e+08 6.33e+06.±2.98e+06 1.06e+06.±9.39e+02 2.34e+06.±3.15e+05 1.13e+10.±6.25e+09 1.06e+08.±2.20e+07 9.41e+07.±1.52e+05 7.95e+07.±2.66e+07 1.78e+03.±2.66e+02 3.06e+07.±6.95e+06 6.52e+07.±8.96e+06 3.59e+06.±3.21e+05
2/0/13 DE 8.31e+04.±8.25e+04.+ 1.19e+04.±5.83e+02.− 2.13e+01.±6.65e.−02.+ 1.83e+10.±7.37e+09.− 1.81e+06.±2.26e+06.+ 1.06e+06.±1.19e+03.≈ 4.15e+07.±2.45e+07.− 1.06e+14.±5.61e+13.− 8.48e+07.±1.25e+07.+ 9.40e+07.±2.62e+05.≈ 3.45e+10.±3.02e+10.− 1.94e+06.±3.65e+06.− 4.64e+09.±1.90e+09.− 6.07e+10.±3.59e+10.− 2.43e+07.±6.31e+06.−
2/0/13 MeMAO 3.31e+11.±2.44e+10.− 1.56e+05.±7.39e+03.− 2.15e+01.±1.69e.−02.− 4.70e+12.±1.04e+12.− 4.31e+07.±4.62e+06.− 1.06e+06.±1.59e+03.≈ 5.54e+13.±4.98e+13.− 1.56e+17.±4.15e+16.− 3.25e+09.±3.99e+08.− 9.40e+07.±2.18e+05.≈ 1.22e+16.±7.07e+15.− 1.75e+12.±3.55e+11.− 1.10e+16.±7.24e+15.− 1.63e+16.±9.69e+15.− 1.32e+18.±5.28e+17.−
3/0/12
8.3 Empirical Study 159
160
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
MSES.DLLSO achieved significantly better averaged objective values on 13, 12 and 9 problems in contrast to DG2/RDG, RDG3, and DLLSO, respectively. Furthermore, in the comparison group of that used DE as the optimizer, MSES.DE obtained superior or competitive averaged objective values on all the large-scale benchmarks compared to MeMAO. In particular, on benchmarks such as F4 and F8, MSES.DE achieved improvements of orders of magnitude in contrast to MeMAO. The objective values achieved on these benchmarks were even superior to those obtained using SaNSDE and PSO as the optimizers. On 15 large-scale benchmarks, MSES.DE achieved significantly better averaged objective values on 13 problems in contrast to MeMAO. In summary, because the presented method used the same optimizer as the compared algorithms in each comparison group, and only differed in the search spaces, the superior solution quality observed in Table 8.2 confirmed the effectiveness of the proposed multi-spaces evolutionary search for large-scale optimization.
8.3.2.2 Search Efficiency This section presents the convergence graphs obtained by all the compared algorithms on all the large-scale benchmarks to assess the search efficiency of the multi-spaces evolutionary search for large-scale optimization. In particular, Figs. 8.2 and 8.3 show the obtained convergence graphs obtained on the fully separable
(a)
(b)
F1
15
(c)
F2
5.5
MSESDLLSO
0
MSESDE MSESDLLSO
-10
MSESSaNSDE DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
-15 -20 -25
0
0.5
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
4
3.5
1
(d)
1.5
2
2.5
3
3
0
0.5
1
106
1.5
2
MSESSaNSDE
1.31
13
MSESSaNSDE
12 11.5 11 10.5 10
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
7.5
7
1.5
2
2.5
3 106
6.5
5.5
1.5
2
0.5
1
1.5
Evaluations
2
2.5
3 106
F6
6.025 MSESDE
6.02
MSESDLLSO
MSESSaNSDE
6.015
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
6.01 6.005
0
2.5
6.03
9.5
Evaluations
1
6.035
MSESSaNSDE
6
1
0.5
(f)
8
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
12.5
0
Evaluations
MSESDLLSO
MSESDLLSO
0.5
MSESDLLSO DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
1.315
1.3
3
MSESDE
MSESDE
0
MSESDE
1.32
106
F5
8.5
13.5
2.5
Evaluations
(e)
F4
14
9
1.33 1.325
1.305
Evaluations
log(Averaged Objective Value)
5
4.5
log(Averaged Objective Value)
-5
1.335
MSESSaNSDE
log(Averaged Objective Value)
log(Averaged Objective Value)
5
log(Averaged Objective Value)
log(Averaged Objective Value)
10
F3
1.34
MSESDE
3 106
6
0
0.5
1
1.5
Evaluations
2
2.5
3 106
Fig. 8.2 Convergence curves of average fitness values (over 25 independent runs) obtained by MSES and compared algorithms on CEC2013 fully separable and partially additive separable functions, i.e., F1–F6 (Y-axis: averaged objective value in log scale; X-axis: number of fitness evaluations). (a) F1. (b) F2. (c) F3. (d) F4. (e) F5. (f) F6
8.3 Empirical Study
161
(b) F7
MSESDLLSO MSESSaNSDE
16
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
14 12 10
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
17 16 15
MSESDLLSO
10
14 13 12
MSESSaNSDE
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
9.5
9
8.5
0.5
1
1.5
2
2.5
10
3
0.5
1
1.5
2
2.5
3
(e) F10 MSESDE
log(Averaged Objective Value)
MSESSaNSDE DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
7.98 7.975 7.97 7.965
14 12 10
6
0.5
1
1.5
2
2.5
3
(g)
0
0.5
1
1.5
2
2.5
log(Averaged Objective Value)
MSESSaNSDE DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
12 10 8
0.5
1
1.5
Evaluations
2
2.5
0
0.5
1
1.5
2
2.5
3 106
F15
20 MSESDLLSO
18
MSESSaNSDE DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
18 16 14 12 10
3 106
6
MSESDE
16
MSESDLLSO MSESSaNSDE
14
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
12 10 8
8
0
6
(i)
20
MSESDLLSO
14
8
MSESDE
MSESDE
16
10
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
Evaluations
F14
22
18
MSESSaNSDE
2
3
(h)
20
MSESDE
12
106
Evaluations
F13
22
3 106
4
106
Evaluations
2.5
MSESDLLSO
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
16
7.955
2
F12
14
18
8
1.5
MSESDE MSESDLLSO
MSESSaNSDE
7.96
1
(f)
20
MSESDLLSO
0.5
Evaluations
F11
22
7.985
0
106
Evaluations
log(Averaged Objective Value)
(d)
0
0
106
7.5
log(Averaged Objective Value)
0
7.99
log(Averaged Objective Value)
MSESDE
MSESSaNSDE
11
Evaluations
log(Averaged Objective Value)
MSESDE
8
8
6
F9
10.5
MSESDLLSO
18
MSESDE
log(Averaged Objective Value)
log(Averaged Objective Value)
18
6
(c) F8
19
log(Averaged Objective Value)
(a) 20
0
0.5
1
1.5
Evaluations
2
2.5
3 106
6
0
0.5
1
1.5
Evaluations
2
2.5
3 106
Fig. 8.3 Convergence curves of average fitness values (over 25 independent runs) obtained by MSES and compared algorithms on CEC2013 partially additive separable, overlapping and fully non-separable functions, i.e., F7–F15 (Y-axis: averaged objective value in log scale; X-axis: number of fitness evaluations). (a) F7. (b) F8. (c) F9. (d) F10. (e) F11. (f) F12. (g) F13. (h) F14. (i) F15
functions, partially additive separable functions, overlapping functions, and fully non-separable functions. In these figures, the Y-axis denotes the averaged objective values in log scale, while the X-axis gives the respective computational effort required in terms of the number of fitness evaluations. It can be observed from Figs. 8.2 and 8.3 that on benchmarks F1 and F2, the compared algorithm DLLSO obtained the best convergence performance. This indicates that on benchmarks with a relatively simple decision space, a search on the original problem space can efficiently find high-quality solutions using properly designed search strategies. Moreover, on the other functions of the CEC 2013 benchmarks with more complex decision spaces (e.g., partially additive separable functions, overlapping functions, and fully nonseparable functions), where greater appropriate guidance is required for an efficient search for high-quality solutions, MSES obtained the
162
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization CPU Time
12000 MSESDLLSO
Computational Time (s)
10000
DLLSO MSESDE
8000
MeMAO DE MSESSaNSDE
6000
SaNSDE DG2 RDG RDG3
4000
2000
0
F8
F14
Problems
Fig. 8.4 Averaged CPU time costs (over 25 independent runs) obtained by all the compared algorithms on representative benchmark problems
best and competitive convergence performances in all three comparison groups. In particular, even on the fully separable function F3, because it is based on the complex “Ackley” function, MSES.SaNSDE and MSES.DLLSO obtained faster convergence over the compared algorithms that shared the same EA solvers. In addition, for functions such as F8, F13, and F14, regardless of which EA was considered as the search optimizer, MSES obtained the best convergence performance in contrast to the baseline algorithms in all three comparison groups. Because MSES used the same search optimizers as the compared algorithms, the superior search speed obtained confirmed the efficiency of the multi-space evolutionary search for largescale optimization. Furthermore, to investigate the computational cost of the multi-space evolutionary search in terms of central processing unit (CPU) time, we also depict the CPU time and convergence graphs with respect to CPU time of all the compared algorithm on representative benchmark problems. In particular, Figs. 8.4 and 8.5 present the averaged CPU time and the convergence curves obtained by all the compared algorithms on benchmark F8 and F14, respectively. It can be observed from the figures that, as the presented paradigm does not include processes like problem analysis and problem decomposition involved in exiting cooperative coevolutionary approaches, the CPU time consumed by the multi-space evolutionary search paradigm will not increase a lot when compared to the baseline evolutionary search algorithms. Finally, to provide deeper insights into the superior performance obtained by MSES, considering the three different search algorithms (SaNSDE, PSO, and DE) as the optimizers, the transferred solutions from the simplified space and the best solutions in the population of the original problem space on the representative
8.3 Empirical Study
(a)
163
(b)
F8
19
MSESDLLSO
DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
17 16 15 14 13 12
MSESDLLSO
MSESSaNSDE DE MeMAO DLLSO SaNSDE DG2 RDG RDG3
16 14 12 10 8
11 10
MSESDE
18
MSESSaNSDE
log(Averaged Objective Value)
log(Averaged Objective Value)
18
F14
20
MSESDE
0
500
1000
CPU Time(s)
1500
2000
6
0
1000
2000
3000
4000
5000
6000
CPU Time(s)
Fig. 8.5 Convergence curves of average fitness values (over 25 independent runs) obtained by MSES and compared algorithms on representative benchmark functions (Y-axis: averaged objective value in log scale; X-axis: CPU time). (a) F8. (b) F14
benchmarks are plotted in Fig. 8.6. As can be observed in the figure, solutions were transferred across the problem spaces during the evolutionary search process. In particular, in Fig. 8.6a, compared to the best solution in the original problem space at different stages of the search, both inferior and superior solutions in terms of the objective value were transferred across the spaces. The former could be eliminated via natural selection, while the latter survived and efficiently guided the evolutionary search in the original problem space toward promising areas of high-quality solutions, which led to the enhanced search performance of MSES, as observed in Table 8.2, Figs. 8.2, and 8.3. Similar observations can also be made in the cases of using PSO and DE as the optimizers, as depicted in Fig. 8.6b, c, respectively. These also confirmed that useful traits could be embedded in the different spaces of a given problem, and concurrently conducting an evolutionary search on multiple spaces can lead to efficient and effective problem-solving for large-scale optimization.
8.3.2.3 Sensitivity Study Five parameters were used in MSES: the size of .As (.Assize ), dimension of the simplified problem space (.ds ), interval for reconstructing the simplified problem space (.Gr ), and interval and number of solutions transferred from the simplified to the original problem space (.Gt and Q, respectively). This section presents and discusses how these parameters affect the performance of MSES. In particular, Figs. 8.7, 8.8, and 8.9 present the averaged objective values obtained by MSES and DLLSO on the representative benchmarks across 25 independent runs with different configurations of .Assize , .ds , Q, .Gt , and .Gr . In the figures, the Xaxis gives different benchmark functions, while the Y-axis denotes the normalized averaged objective value obtained by each compared configuration. Specifically, the
164
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
Fig. 8.6 Illustration of transferred solutions and the best solution in the population on representative benchmarks of different comparison groups. (a) Comparison 1: SaNSDE as EA solver. (b) Comparison 2: PSO as EA solver. (c) Comparison 3: DE as EA solver
obtained objective values on each benchmark are normalized by the worst (largest) objective obtained by all the compared algorithms on the benchmark. Therefore, values close to 0 and 1 denote the best and worst performances, respectively. Further, because DLLSO was observed to obtain a superior solution quality and search speed in contrast to the other compared algorithms in Sects. 8.3.2.1 and 8.3.2.2, it is considered as the baseline algorithm here. For a fair investigation, DLLSO was also used as the optimizer in MSES. Out of these parameters, .Assize , .ds , and .Gr were involved in the simplified problem space construction. .Assize defined the number of solutions for constructing the simplified problem space, while .ds gave the dimensionality of the constructed space. Further, .Gr determined the frequency for reconstructing the simplified problem space. Generally, small numbers of .Assize and .ds simplified the problem space to a large extent, and large numbers of these two parameters could make the constructed space close to the original problem space. In addition, small and large values for .Gr reconstructed the low-dimensional space frequently and infrequently during the evolutionary search, respectively. As can be observed in Fig. 8.7, on the partially additive separable functions, for example, F5 and F8, NP (population size)
8.3 Empirical Study Different Size of A s Normalized Averaged Objective Value
Fig. 8.7 Averaged objective values obtained by MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of .Assize
165
1
0.8
MSES(A ssize=5*NP)
MSES(A ssize=10*NP)
0.4
0.2
0
F5
F8
F10
Problems
F12
F15
Different Dimensionality of the Simplified Problem Space Normalized Averaged Objective Value
Fig. 8.8 Averaged objective values obtained by MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of .ds
DLLSO MSES(A ssize=NP)
0.6
1
0.8 DLLSO MSES(ds=200)
0.6
MSES(ds=400) MSES(ds=600)
MSES(ds=800)
0.4
0.2
0
F5
F8
F10
F12
F15
Problems
number of solutions are already able to construct a useful problem space that can improve the search in the original problem space (see the superior objective values achieved by MSES with .Assize = NP , .Assize = 5 ∗ NP , and .Assize = 10 ∗ NP ). However, on the more complex functions, for example, F12 and F15, a larger .Assize may be required to provide more information for constructing a useful problem space in MSES. Furthermore, for the dimensionality of the simplified problem space, as can be observed in Fig. 8.8, neither a small nor a large number for .ds is good for building a useful simplified problem space because a very low dimensionality could lose important information for efficient evolutionary search, and a space with a dimensionality close to the original problem cannot play a
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
Fig. 8.9 Averaged objective values obtained by MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of .Gr
Interval of Simplified Space Reconstruction Normalized Averaged Objective Value
166
1
0.8 DLLSO MSES(G r =5)
0.6
MSES(G r =10) MSES(G r =15)
0.4
0.2
0
F5
F8
F10
F12
F15
Problems Different Number of Solutions to be Transferred across Space Normalized Averaged Objective Value
Fig. 8.10 Averaged objective values obtained by MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of Q
1
0.8 DLLSO MSES(Q=0.1*NP) MSES(Q=0.2*NP) MSES(Q=0.3*NP) MSES(Q=0.4*NP) MSES(Q=0.5*NP)
0.6
0.4
0.2
0
F5
F8
F10
F12
F15
Problems
complementary role to the original problem space for MSES.7 Lastly, as depicted in Fig. 8.9, the frequency of reconstructing the simplified space did not significantly affect the performance of MSES on the considered large-scale benchmarks.
7
Dimension reduction is only one of the possible ways to construct the simplified problem space, and different dimension reduction approaches may possess various properties. For instance, the principal component analysis can only provide linear dimension reduction, while the selforganizing map is able to conduct nonlinear dimension reduction. Prior knowledge or analysis on the given optimization thus could be helpful in selecting proper approaches for constructing the simplified space in the multi-space evolutionary search.
8.3 Empirical Study
Different Interval of Knowledge Transfer across Problem Spaces Normalized Averaged Objective Value
Fig. 8.11 Averaged objective values obtained by MSES and DLLSO on representative benchmarks across 25 independent runs with various configurations of .Gt
167
1
0.8 DLLSO MSES(G t=1)
0.6
MSES(G t=5)
MSES(G t=10)
0.4
0.2
0
F5
F8
F10
Problems
F12
F15
On the other hand, parameters Q and .Gt defined the amount and frequency of knowledge sharing across problem spaces. Generally, a small value of Q and large value of .Gt significantly reduced the amount and frequency of solution transfer across spaces, while a large value of Q and small value of .Gt greatly increased the amount and frequency of knowledge sharing across problem spaces. It can be observed from Figs. 8.10 and 8.11, with different configurations of Q and .Gt values, superior solution qualities were obtained by MSES compared to DLLSO on most of the benchmarks. However, while the optimal confirmations of these parameters were generally problem-dependent, the configuration considered in the empirical study, as discussed above, was found to provide noteworthy results across a variety of larger-scale optimization problems.
8.3.3 AI Application of Recommender System Recommender systems provide users with personalized online recommendations of products or information and have shown great potential to help users find relevant items from an information overload space [242]. Recommender system has become a vital part of e-commerce and has been widely adopted in a variety of online applications, ranging from social network, tourism, and education, to healthcare, etc. Keeping this in mind, to illustrate the efficacy of the multi-space evolutionary search in solving real artificial intelligence (AI) application involving large-scale optimization, in this section, we future present a study of MSES for movie recommendation. In the experiments, we consider movie recommendation by seeking the movies that are most relevant to users’ preferred type (or genre), which is essentially a single objective large-scale optimization problem. Formally, a minimization problem is
168
8 Multi-Space Evolutionary Search for Large-Scale Single-Objective Optimization
Fig. 8.12 The structure of a solution with chromosome encoding in solving real-world large-scale optimization problem of movie recommendation
investigated here, and the objective function is formulated as [243, 244]: minf (x) =
.
(
1 p · x(i) ) + 1 u i∈R
(8.4)
where R represents the recommendation item set, .pu is the latent embedding vector associated with the target user u. For evolutionary search, following [245], we use .x = {x(1) , x(2) , · · · , x(k) } to present a chromosome as depicted in Fig. 8.12, the embedding vector of i-th item .x(i) denotes as the i-th gene in the chromosome. The length of a chromosome is the multiplication of embedding dimension and the length of item list. In other terms, each chromosome indicates a movie list recommended to the user, which is represented by .R = (m1 , m2 , · · · , mk ). MSES is applied to movie recommendation and the performances are compared with three large-scale optimization methods as discussed above, which are DLLSO, RDG3 and MeMAO. The traditional recommendation algorithm, i.e., Bayesian Personalized Ranking (labeled as BPR) [242], is also considered as a baseline algorithm. Moreover, according to [245], the baseline algorithm BPR is employed to learn the latent embedding of each candidate movie in this study, which is further normalized into a bounded latent space .[0, 100]10). The experiments are conducted on the publicly available Movielens-1M dataset, which contains movie ratings collected from the MovieLens web site.8 The ratings larger than 3 are considered as positive feedback (.575, 169 ratings for .3, 457 movies by .6, 034 users), and the dataset is randomly split into two non-overlapping sets. Empirically, .80% of the ratings are used for training, and the remaining .20% are used for testing [244]. For a fair comparison, the algorithm configurations of the large-scale optimization method are kept consistent with the empirical studies on the benchmark problems above, and the setting of BPR is referred to [242]. The convergence curves of the average fitness values over all users obtained by all the compared algorithms on movie recommendation are presented in Fig. 8.13. It can be observed from the figure that MSES.DLLSO and MSES.SaNSDE achieved faster 8
https://grouplens.org/datasets/movielens/.
8.4 Summary -3.18 -3.2
log(Averaged Objective Value)
Fig. 8.13 Convergence curves of average fitness values (over all users) obtained by MSES and compared algorithms on movie recommendation. (Y-axis: averaged objective value in log scale; X-axis: number of fitness evaluation)
169
-3.22 -3.24 -3.26 -3.28
MSESDLLSO MSESSaNSDE
-3.3
MSESDE DLLSO RDG3 MeMAO BPR
-3.32 -3.34 -3.36
0
0.5
1
Evaluations
1.5
2
104
convergence on the optimization of movie recommendation than the compared algorithms (DLLSO and RDG3) which use the same EA solver, respectively. Moreover, the final optimal averaged objective value of MSES.DLLSO obtained superior performance than that of BPR (the blue pentagram) and the other compared algorithms. This again confirmed the efficacy of the multi-space evolutionary search for solving large-scale optimization problem.
8.4 Summary This chapter presented a multi-space evolutionary search paradigm for largescale optimization. In particular, the details of the problem space construction, learning of the mapping across problem spaces, and knowledge transfer across problem spaces are introduced. In contrast to existing methods, the multi-space evolutionary search paradigm conducts an evolutionary search on multiple solution spaces derived from the given problem, each possessing a unique landscape. More importantly, it makes no assumptions about the given large-scale optimization problem, such as that the problem is decomposable or that a certain relationship exists among the decision variables. To validate the performance of this paradigm, comprehensive empirical studies on both the CEC2013 large-scale benchmark problems and an AI recommendation application were conducted. The results were compared to those of recently proposed large-scale evolutionary algorithms and traditional recommendation approach, which confirmed the efficacy of the multispace evolutionary search for large-scale optimization.
Chapter 9
Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
Besides solving large-scale single objective optimization problems, this chapter further demonstrate the multi-space evolutionary search for large-scale multi-objective optimization by using the evolutionary multitasking paradigm of MFO, termed MOEMT. The presented MOEMT first constructs several simplified problem spaces in a multi-variation manner to assist target optimization. Next, a multi-factorial evolutionary search is performed in the multi-task scenario to simultaneously optimize the original problem and the simplified problem variants. By the implicit knowledge transfer across tasks, valuable traits obtained from the simplified searching space can be seamlessly transferred to the original space providing search direction and guidance, thereby improving the effectiveness and efficiency of the original problem-solving. To evaluate the efficacy of MOEMT, comprehensive empirical studies are conducted on a set of commonly used large-scale multi-objective benchmarks with different numbers of variables and objectives. In addition, for fair comparison and to confirm the universality of MOEMT, diverse evolutionary multi-objective optimization solvers are considered to show the consistency of performance. The statistical results show that MOEMT performs better in terms of both solution quality and search speed than state-of-the-art large-scale methods on most test problems, thereby verifying the superiority of MOEMT.
9.1 Existing Approaches for Large-Scale Evolutionary Multi-Objective Optimization For solving large-scale multi-objective optimization problems (LSMOPs), there are numerical exact mathematics methods [246] and metaheuristic algorithms in the literature, and the latter is extensively recognized as a more practical approach to solve large-scale optimization problems [247]. Although various multi-objective © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 L. Feng et al., Evolutionary Multi-Task Optimization, Machine Learning: Foundations, Methodologies, and Applications, https://doi.org/10.1007/978-981-19-5650-8_9
171
172
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
evolutionary algorithms (MOEAs) have been proposed and performed well on MOPs, such MOEAs are powerless for LSMOPs due to the slow convergence and poor search capability in large-scale search spaces [248]. Thus, without additional function evaluations, specialized algorithms are required to improve the search efficiency. Existing metaheuristic approaches for solving LSMOPs can be broadly separated into three categories. The first is the decomposition-based methods, such as cooperative coevolution (CC) optimization using different grouping or clustering methods [249–255] and distributed parallel strategies [256, 257]. The second category is to design new powerful operators or search strategies that consider all decision variables, e.g., introducing new genetic operators [258, 259] and pairwise competitive learning strategy [248, 260]. The third category is based on the problem transformation or variation strategies [261–265], which vary the original problem and generate a new small-scale problem and then perform the optimization in simplified problem space accordingly. In particular, decomposition-based methods divide the variables into groups or clusters and then perform optimization on each group cooperatively in a divideand-conquer manner. In the literature, the cooperative coevolution (CC) framework assisted with different grouping approaches is common in decompositionbased methods. For example, CCGDE3 [249] and MOEA/D2 [252] directly apply CC with random grouping with two traditional MOEAs. In comparison, CC in MOEA/DVA [250] and LMEA [253] are only used for convergence-related variables after decision variable analysis, which divides the variables into two types based on their contributions for search. In addition, distributed parallel strategies [256, 257, 266] have also been proposed to speed up the CC optimization process for solving LSMOPs. The search space of the variable groups is much smaller than the original problem, which is conducive to the exploration within the group. Challenges in decomposition-based methods are more on formulating appropriate strategies for variables grouping and solution evaluation. However, decomposition-based methods may bring extra computational budget on decision variable analysis and even fail in complex variable interaction and non-separable problems. Particular examples lie in the second category, i.e., introducing new genetic operators or search strategies, include [248, 258–260]. In [258], ten improved crossover operators are defined and embedded into NSGA-III to enhance the performance for solving LSMOPs. SparseEA [260] suggests a new population initialization strategy and genetic operators by considering the sparse nature of the Pareto optimal solutions and shows superiority in tackling large-scale sparse MOPs problems. In LMOCSO [248], a new particle updating strategy based on competitive swarm optimizer (CSO) [267] has been proposed to enhance the searching efficiency on LSMOPs. It implicitly enhances the swarm diversity of conventional particle swarm optimizers (PSO) to alleviate premature convergence. Nevertheless, new operators or strategies may be designed for some specific problems (e.g., sparse LSMOPs), and thus it is difficult for such methods to have generalization performances. Besides, more computing resources are needed to achieve convergence when searching directly in the entire decision variable space.
9.2 Algorithm Design and Details
173
Recently, a direction-guided MOEA (DGEA) [259] is proposed to solve LSMOPs via adaptive offspring generation, in which promising solutions are generated by constructing direction vectors in the decision space. For the transformation-based methods, problem transformation or variation is critical in reducing the original problem scale. In ReMO [261], the simplified problem is constructed by a random embedding matrix. With a strong assumption that most dimensions do not change the objective significantly, random embedding has effectively tackled large-scale problems with low effective dimensions. However, ReMO would fail in cases where the assumption is not satisfied. In MOEA/PSL [265], two unsupervised neural networks are adopted to detect an approximate Pareto-optimal subspace by learning a sparse distribution and a compact representation of the decision variables. However, MOEA/PSL may not be effective for general problems since it is customized for sparse LSMOPs. WOF [262] and LSMOF [263] are other representative transformation-based algorithms. Based on the weight optimization strategy that allocates weight values to the groups of decision variables for updating, the optimization of the weight variables can be regarded as optimizing the subspace of the original problem. Instead of optimizing the variables in each group independently and evaluating solutions with other partner groups being fixed, the weight optimization strategy efficiently changes all decision variables at the same time through pre-allocated weight variables. Despite that LSMOF has further improved WOF by applying weight variables associated with two search directions, this strategy substantially limits the reachable solutions in the original search space and may cause common problems of information loss. From the discussion above, it can be observed that for most of the existing transformation-based methods, there is no guarantee that global or near-global optimum can be retained in the newly generated search space. Nevertheless, these methods have confirmed the effectiveness of the simplified problem and illustrated that the transformation function could be easily reused for problem simplification. These thus motivate us to perform optimization on both the original and simplified problem spaces via the multi-factorial evolutionary search for solving LSMOPs.
9.2 Algorithm Design and Details In this section, the details of the multi-space evolutionary search method, i.e., multi-variation multi-factorial evolutionary algorithm (i.e., MOEMT) for largescale multi-objective optimization is introduced. Firstly, the general workflow of MOEMT is presented in Sect. 9.2.1. Then, we introduce the multi-variation manner, i.e., problem variation and dynamic task replacement, followed by the main procedure of multi-factorial evolutionary search.
174
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
9.2.1 Outline of the Algorithm Firstly, problem variation in MOEMT is used to build a multitasking environment for further multi-factorial evolutionary search (refer to Sect. 9.2.2). As shown at the bottom of Fig. 9.1, a single population with original representations is initialized and assigned with different skill factors to distinguish different tasks. The initial allocation process is carried out randomly, as mentioned in subsection II-B. With a multi-variation manner, multiple simple variants of the original problem can be generated by task construction. Periodically, the search spaces of the inactive task or problem variant can be updated through dynamic task replacement. Next, the simplified problems work together as auxiliary or helper tasks to assist the original optimization under multi-factorial evolutionary search (detailed in Sect. 9.2.3). In MOEMT, offspring generation is controlled by knowledge transfer decision. Meeting the probability condition, cross-cultural individuals (i.e., with distinct skill factors) are allowed to mate with each other, shown in the upper left of Fig. 9.1. In this case, cross-task knowledge transfer existing in genetically encoded solutions is triggered implicitly by chromosomal crossover. Instead of constructing a new unified search space following the conventional MFO [23], MOEMT can reversely reconstruct the task-specific representation to the original high-dimensional space to obtain the unified representation. The generated offspring or new individuals are directly evaluated on the unified space and selectively inherit a parental skill factor associated with the tasks. Knowledge transfer across tasks happens throughout the whole evolutionary process. If the knowledge transfer conditions are not satisfied, depicted in the upper right of Fig. 9.1, individuals conduct intra-task evolution with a particular problem solver and the variant representation in the task-specific search space. Note that solutions found in the simplified spaces will also be reverted to the original decision space for evaluation. Further, the superior genes are eventually preserved from the population by the evolutionary process of survival of the fittest. Finally, MOEMT outputs the nondominated solutions of the original problem. The problem variant with a relatively simple landscape reduces the searching difficulty, leading to an accelerated convergence. On one hand, useful information found in the constructed spaces can be seamlessly transferred to the original problem providing search direction and guidance, thereby improving the effectiveness and efficiency of the original problem-solving. On the other hand, the transferred traits from the original problem also enrich the information in the simplified search spaces for intra-task optimization.
9.2.2 Problem Variation MOEMT provides a way of problem variation for building a multitasking environment, which can construct different simplified variants and adaptively update
9.2 Algorithm Design and Details
175
Fig. 9.1 The illustration of MOEMT framework. N simplified problem variants are created as helper tasks by problem multi-variation for the given high-dimensional problem. The multifactorial evolutionary search improves the problem-solving for the original LSMOPs
the generated variants, namely multi-variation. Two pivotal components based on problem variation, i.e., task construction and dynamic task replacement, are detailed in the following subsections.
176
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
9.2.2.1 Task Construction As an essential component for problem variation, task construction aims to reduce the difficulty of optimization problems for efficient optimization. Since the solutions found in the simplified spaces are also needed to map back to the original space for evaluation and perform knowledge transfer across tasks, the variation approach requires bi-directional mapping, i.e., problem simplification when constructing auxiliary or helper tasks and the reconstruction when unifying solution representation. Tasks with different dimension and reference solutions are identified by the skill factor .τ . The basic properties of the simplified tasks mentioned above are summarized as follows. • • • •
task.dimension: the dimension of the task-specific representation; task.method: the selected variation method; .task.prime: the reference solution for the WO method; .task.population: individuals with the same skill factor. . .
Given the skill factor .τ of a task, the bi-directional variation (i.e., simplification and reconstruction) between original high-dimensional representation .v D and taskspecific low-dimensional representation .ud can be formulated as follows: .
ud = Ψ (v D , τ ) v D = Ψ _inv(ud , τ )
(9.1)
where .Ψ denotes the used mapping or variation method. The focus of this study lies in providing a general framework to give full play to the superiority of evolutionary multitasking for solving LSMOPs without the intention of proposing a new or improved problem variation function. MOEMT has the flexibility to utilize multiple arbitrary variation methods that have been proven effective. In this study, the weight optimization (WO) method, which is widely used in existing transformation-based algorithms [262–264], is borrowed for problem simplification and extended for the reverse process. Recall that dimension reduction is only one particular realization approach for simplification to illustrate the effectiveness of MOEMT. Without loss of generality, other methods to simplify the original problem can also be applied. WO is a representative approach for problem variation. Learning from [262, 263], we design a bi-directional mapping, .W O(v D , P rime) and .W O_inv(ud , P rime), to implement the simplification and reconstruction. The decision variables are divided into several groups by different grouping mechanisms, and each group is assigned a weight variable to control the entire group of variables. Consequently, the optimization of the weight variables can be regarded as optimizing the simplified space of the original problem. The solutions found in the weight variable space are evaluated based on a reference solution P rime, which is selected from the original population based on NF and CD. Following the best-performing choice of grouping and transformation method in [262], we use the linear grouping method
9.2 Algorithm Design and Details
177
and the p-Value transformation as the realization here. For the step of simplification, after p-Value transformation based on the reference solution P rime, the highdimensional intermediate variables .ui takes the median of each group as the final condensed weight variables .ud . W O(v D , P rime) =(median(u1, · · · , ul ), · · · ,
.
median(un−l+1, · · · , un ))
ui = (vi − P rimei )/(a · (P rimei,max − P rimei,min )) + 1.0
.
Ψ (v D , τ ) = W O(v D , task(τ ).prime)
.
(9.2) (9.3) (9.4)
For the inversion, each weight variable .uj controls the entire group of variables, updating reference solution P rime to obtain the reconstructed solution .v D for further evaluation and knowledge transfer. W O_inv(ud , P rime) = (v1 , · · · , vl , · · · , vn−l+1 , · · · , vn ).
.
u1
uγ
vi = P rimei + a · (P rimei,max − P rimei,min ) · (uj − 1.0)
.
Ψ _inv(ud , τ ) = W O_inv(ud , task(τ ).prime)
.
(9.5)
(9.6) (9.7)
The boundary of the weight variables .ud is set to [0,2] to translate the variations into the interval [.−1,1], whereas parameter .a ∈ [0, 1] controls the actual amounts of change. To bring more diversity, the dimensions of weight variables of different tasks are set to be incremental. Using the WO method, several simplified problem variants with varying search spaces are constructed as auxiliary or helper tasks to assist the original problem-solving.
9.2.2.2 Dynamic Task Replacement Dynamic task replacement is another component in MOEMT to further enhance the performance for large-scale multi-objective optimization. In the evolution process, the environmental selection of survival of the fittest will gradually reveal the unbalanced effectiveness of auxiliary or helper tasks. Assuming that most of the solutions found in a task-specific search space do not survive in the next generation, this simplified task (or problem variant) may lose its activity and cannot provide useful information for the original problem-solving. This appearance is reflected in the fact that the number of individuals with skill factors denoting this task is N lower than the initial value: . K , where N is the number of the initial population, and K denotes the number of tasks. In addition to the drawback of limited knowledge provided by the helper task, the original task may encounter various difficulties
178
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
in different evolution stages, e.g., falling into a local optimum. Thus, we should periodically update the properties of the task related to constructing simplified problem variants, e.g., the reference prime solution P rime for weights. Every t function evaluations, MOEMT checks the activity of all tasks and replaces the inactive task. The interval of dynamic task replacement t should not be too small since it will lead to an insufficient search in simplified space and more additional resource consumption. At the same time, the over large t value will also weaken the role of task replacement. To keep the superiority of the current population, task replacement does not change the gene information of individuals. The genetic material .v D of supplementary individuals on the task to be updated is directly inherited from the current population. Due to the different landscapes, the unified representation of the solution (i.e., .v D ) belonging to the selected task needs to be mapped to the new search space (i.e., .ud ) through the simplification method with updated attributes. The summary of the dynamic task replacement is given in Algorithm 14.
Algorithm 14: Dynamic task replacement 1 Given the current population with the original representation v D 2 3 4 5 6 7 8
and current properties of tasks; Select up to N K existing individuals in each task as task.population, and the rest will be shuffled as candidates tP op; for task(i) in T asks do N if len(task(i).pop) < K then N Supply = K − len(task(i).pop); if task(i).method is WO then Select a prime in current population as the reference solution P rime; Update task(i).prime;
12 13
for pj in the first Supply individuals of tP op do τpj = i; udj = Ψ (v D j , τpj ); Append pj to task(i).pop; Delete pj in tP op ;
14
Cluster individuals of each task as new Population;
9 10 11
9.2 Algorithm Design and Details
179
9.2.3 Multi-Factorial Evolutionary Search After task construction, several simplified problem variants with different search spaces are generated as auxiliary or helper tasks for the original problem. Multifactorial evolutionary search is performed in the multi-task scenario to simultaneously optimize the original problem and the simplified problem variants. Since domain knowledge of tasks is typically represented as the populationbased genetic material, the implicit knowledge transfer across tasks is implemented through chromosomal information transmission (e.g., chromosomal crossover). Figure 9.2 gives a toy example of generating offspring during the multi-factorial evolutionary search. When transferring knowledge between different tasks, the crossover is taken place in the original space to ensure a unified representation scheme. In the MOEMT, two processes, i.e., knowledge transfer decision and
Fig. 9.2 Toy example of offspring generation in multi-factorial evolutionary search
180
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
selective inheritance, act in conjunction to achieve implicit knowledge transfer following multi-factorial inheritance features. Details are described as follows.
9.2.3.1 Knowledge Transfer Decision Each task is differentiated by a distinct skill factor as a computational depiction of its cultural trait. Intra-task evolution and cross-task evolution are drove by knowledge transfer decision. For creating offspring or new particles, parent candidates or reference particles are selected following different strategies of configured optimizers based on their scalar fitness values. Algorithm 15 presents the process of knowledge transfer decision for generating offspring generation by crossover and mutation. Algorithm 15: Knowledge transfer decision 1 Given candidate parents p1 and p2 with the original representation
v D and the variant representation ud ; 2 if τp1 == τp2 then 3 (v c1 , v c2 ) = Crossover_Mutate(up1 , up2 ); 4 else if rand(0, 1) < rmp then 5 v p = Ψ _inv(up , τp ); // Inversion
(v c1 , v c2 ) = Crossover_Mutate(v p1 , v p2 );
6 else 7 c1 = Mutate(p1 ) , c2 = Mutate(p2 );
Following the principle of assortative mating [126, 128], individuals sharing a common skill factor can crossover freely in the task-specific search spaces (line 3 in Algorithm 15). If their skill factor differs, the crossover only occurs under a prescribed random mating probability (rmp) (line 5–6 in Algorithm 15), or else parent candidates produce mutant offspring respectively via genetic mutation. Since multiple simplified tasks are reformulated from the original problem via different variation approaches, they are confronted with entirely different landscapes. We cannot directly undergo crossover to transfer the knowledge (in the form of genetic materials) across tasks but decode the representations into a unified search space. Particularly, MOEMT can reversely map the simplified spaces to the original high-dimensional space to obtain a unified representation. Therefore, the decoded chromosome .v D is generated through the reverse reconstruction function .Ψ _inv. For the necessary conditions, we generate a random number of rand between 0 and 1. If rand is small than predefined rmp, seamless information transmission from one task to another occurs. The parameter rmp plays a vital role in regulating
9.2 Algorithm Design and Details
181
the frequency of genetic transfer between tasks. A smaller value of rmp implies more internal evolution that only individuals sharing a common skill factor are allowed to crossover for the single task evolutionary. It is also a way to avoid negative transfer when the similarity of tasks is uncertain. Conversely, if the tasks are closely related, rmp should be set with a higher value to promote the unhindered genetic material exchange.
9.2.3.2 Selective Inheritance The generated offspring and new particles are required to determine the skill factor associated with tasks to obtain their task-specific representations (i.e., .ud ) for nextgeneration search. Taking the inspiration from vertical cultural transmission [122], we propose a selective inheritance strategy summarized in Algorithm 16, which allows offspring to randomly imitates the skill factor of any of the parents and be evaluated for the selected task. For the corresponding part of cross-task cultural exchange (line 4–9 in Algorithm 16), the inherited skill factor is used for selective evaluation and indicating the task-specific simplification method and landscape of the search space. After inheriting the skill factor .τi , the unified representation (.v D ) should be embedded into the simplified space of the current task (.ud ) using the corresponding variation method.
Algorithm 16: Selective inheritance 1 Given an offspring c ∈ C and its original representation v D ; 2 if τp1 == τp2 then 3 c imitate skill factor τp ; 4 else if c = Crossover + Mutate(p1 , p2 ) and rand(0, 1) < 0.5 then 5 c imitate skill factor τp1 ; 6
ud = Ψ (v D , τp1 ); // Simplification
7 else if c = Crossover + Mutate(p1 , p2 ) and rand(0, 1) ≥ 0.5 then 8 c imitate skill factor τp2 ; 9
ud = Ψ (v D , τp2 ); // Simplification
10 else if c = Mutate(p1 ) then 11 c imitate skill factor τp1 ; 12 else 13 c imitate skill factor τp2 ;
182
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
9.3 Empirical Study This section conducts comprehensive experiments on nine benchmark problems from the LSMOP test suite [268] to evaluate the utility of MOEMT for largescale multi-objective optimization. The performance of MOEMT shall be compared with five state-of-the-art large-scale methods representing different categories. The experiment setup and results analysis are presented in the rest of this section.
9.3.1 Experimental Settings 9.3.1.1 Baseline Methods and Benchmark Problems As discussed in Chap. 3, the multi-factorial evolutionary search is performed on both the original problem and the simplified problem relying on any population-based (or swarm-based) optimizer. In this study, four different types of MOEAs (NSGAII [84], MOEADDE [269], SMPSO [270], and LMOCSO [248]) are embedded into the MOEMT algorithm for comprehensive performance comparison. The first three internal optimizers are chosen because they are widely used in the existing large-scale multi-objective optimization frameworks [250, 262, 263] and can also guarantee the fairness of the result. In contrast, the last new competitive swarm optimizer is only used for a fair comparison. Then, five state-of-the-art large-scale MOEAs with seven implementations using different optimizers are involved in the experimental comparison. They are the representative algorithms of the three categories described in Section II, namely the decomposition-based method MOEADVA [250], the strategy-based method LMOCSO [248] and DGEA [259], and the transformation-based methods LSMOF [263] and WOF [262]. In addition to the category, WOF and LSMOF are chosen mainly because both of them adopt the same weight optimization method as ours for problem variation. Previous studies [262] and [263] have empirically confirmed the improvement of their proposed methods for the performance of the embedded internal optimizers, so we directly explore the advantages of MOEMT over the existing large-scale MOEAs instead of comparing with the original version of the embedded MOEAs. Different solvers have a variety of strengths for the problems to be optimized. According to the internal optimizer embedded, the largescale MOEA is divided into four groups and compared with the corresponding MOEMT for fairness. The benchmark problems adopted for the experimental studies are from LSMOP test suite [268], which contains nine widely used test problems for large-scale multiobjective optimization (i.e., LSMOP1-LSMOP9). LSMOP1 to LSMOP4 have a linear Pareto optimal Front (PF) and linear variable linkage in the Pareto optimal solutions (PSs), whereas LSMOP5 to LSMOP9 have nonlinear variable linkage in the PSs and a convex PF for LSMOP5-LSMOP8 and a disconnected PF for
9.3 Empirical Study
183
LSMOP9. Regarding the modality and separability, the LSMOP test suite achieves the combination of different properties of the landscapes. For example, LSMOP1 and LSMOP5 have a unimodal and fully separable fitness landscape. In this study, the number of objectives of test problems (M) is set to 2 and 3, and the number of decision variables (D) is set from 500 up to 5000.
9.3.1.2 Parameter Settings In this study, five state-of-the-art algorithms proposed for LSMOPs are compared against MOEMT. For fair comparisons, all the experiments are implemented by Matlab codes on PlatEMO v2.7 [271] by adopting the recommended parameter settings in the literature of the compared methods. Table 9.1 summarizes the general and algorithm-specific parameter settings. The population size is set to .N = 100, and the maximum number of function evaluations (FEs) is set to .F Es = 100,000 for all compared algorithms. The relatively small number of FEs is more practical for real-world large-scale optimizations because of the limitation of the computing resources and economic cost. Each method is run 20 times on the test problems independently. In MOEMT, the number of simplified problem variants is set to 4, and the dimensions of them are set to increments of 5–20. Random mating probability rmp and interval of dynamic task replacement t control the degree of genetic material exchange across tasks and the frequency of task replacement, respectively. After investigating the effect of different rmp and t values on the performance of MOEMT (refer to Sect. 9.3.4), we adopt the optimal settings in all experiments., i.e., .rmp = 0.8 and .t = 500. The parameter settings related to the WO method are kept consistent with comparison methods [262] for the sake of fairness, including grouping strategy, transformation function, bound of the weight variable, and p.
9.3.1.3 Performance Indicator The goal of the multi-objective optimization problem is that the distance of the obtained PS to the true PF should be minimized, but also, a good extent of the obtained non-dominated front is desirable. To measure both the diversity and convergence of a solution set, the widely used performance indicators, inverted generational distance (IGD) indicator [272, 273] is adopted for experimental evaluation. IGD is a comprehensive performance indicator calculated by the following definition: v∈P dis(v, Q) , .I GD(P , Q) = (9.8) |P | where P is a set of reference points uniformly distributed on the true PFs, and .|P | is the number of the point sets. Q is the optimal non-dominated solution set obtained
184
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
Table 9.1 Parameter settings for MOEMT and comparison algorithms
General settings
Popularion size Termination condition Independent runs MOEA/DVA Number of interaction analysis Number of control property analysis DGEA Number of reference vectors Internal optimizer LSMOF Weight optimization generation Weight optimization population size Internal optimizer WOF Number of groups Number of chosen solutions Weight optimization population size Bound of the weight variable Grouping strategy Transformation function p Original problem evaluations Transformed problem evaluations Delta Internal optimizer MOEMT Number of simplified variants Dimension of simplified spaces Bound of the weight variable Grouping strategy for WO method Transformation function for WO method p for WO method Random mating probability (rmp) Interval of dynamic task replacement Internal optimizer
.N
= 100 = 100000
.F Es
20
6 20
10 NSGA-II
10 30 NSGA-II / MOEADDE
4 M+1 10*(M+1) [0,2] Linear grouping p-value function 0.2 1000 500 0.5 NSGA-II/SMPSO
4 5/10/15/20 [0,2] Linear grouping p-value function 0.2 0.8 500 NSGA-II/MOEADDE/SMPSO/LMOCSO
9.3 Empirical Study
185
by the algorithm, and .dis(v, Q) is the minimum Euclidean distance between the v ∈ P and points in Q. IGD evaluates the convergence and diversity performance by calculating the average value of the minimum distances between each reference point on the true PFs and the solution set obtained by the algorithm. Thus, the smaller the IGD value, the better the algorithm’s overall performance, including convergence and diversity. In our evaluation, .|P | is set to 5000 and 10,000 for biobjective problems and tri-objective problems.
.
9.3.2 Performance Comparisons with the State-of-the-Arts The state-of-the-art large-scale MOEAs are divided into four groups based on the internal optimizer embedded into the algorithm. Pairwise comparisons are conducted in the same group between the instances of large-scale MOEAs and the corresponding MOEMT series. Symbols “+”, “.−”, and “=” indicate the compared algorithm is significantly better than, worse than, and statistically tied by corresponding MOEMT under the Wilcoxon rank-sum test [274] with a 95% confidence interval.
9.3.2.1 IGD Values In the first group, NSGA-II is embedded into LSMOF, WOF, DGEA, and MOEMT, as the internal optimizer. The statistics of IGD results obtained by these three compared algorithms are displayed in Table 9.2. The overall performance of MOEMT is significantly better than the corresponding instances of competitors. To be specific, MOEMT-NSGAII achieves the best results on 52 over 72 test problems, while the compared algorithms using the same internal optimizer, namely LSMOFNSGAII, WOF-NSGAII, and DGEA-NSGAII only respectively gain 8, 0, and 12 best results. MOEMT-NSGAII is outperformed by LSMOF-NSGAII in 8 instances and by DGEA-NSGAII in 13 cases. Most of them are distributed on tri-objective LSMOP2-LSMOP4. When the underlying solver is MOEADDE in the second group, shown in Table 9.3, MOEMT gets 54 out of 72 best results and is outperformed by the LSMOF on 18 test problems. In particular, the best performance obtained by MOEMT-MOEADDE is mainly on LSMOP1, LSMOP5, LSMOP7, LSMOP8, LSMOP9, and tri-objective LSMOP2-LSMOP4, whereas LSMOF-MOEADDE gains the best results on LSMOP6. The performances of the different algorithms also depend on the target problems. Recall that MOEMT-NSGAII fails on tri-objective LSMOP2-LSMOP4, while MOEMT-MOEADDE performed well on the same issues, indicating that MOEMT series with different embedded solvers have the expertise for solving different test problems. Note that our MOEMT-MOEADDE outperforms MOEA/DVA on all test instances, mainly because the preliminary
LSMOP2
Problem LSMOP1
3
2
3
M 2
D 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000
NSGA-II LSMOF-NSGAII 5.92e−1 (4.22e−2)6.23e−1 (2.78e−2)− 6.45e−1 (1.58e−2)− 6.56e−1 (1.17e−2)− 5.67e−1 (7.26e−3)− 6.12e−1 (1.17e−2)− 6.48e−1 (1.69e−2)− 6.92e−1 (1.52e−2)− 2.40e−2 (1.31e−3)1.84e−2 (4.96e−4)− 1.30e−2 (3.66e−4)− 1.10e−2 (4.63e−4)− 8.91e−2 (3.83e−3) = 7.21e−2 (3.63e−3) = 6.51e−2 (5.72e−3) + 6.41e−2 (5.56e−3) + WOF-NSGAII 4.33e−1 (8.39e−2)− 5.11e−1 (1.17e−1)− 4.97e−1 (8.38e−2)− 5.68e−1 (4.84e−2)− 3.82e−1 (3.59e−2)− 5.34e−1 (2.52e−2)− 6.43e−1 (3.96e−2)− 7.55e−1 (3.80e−2)− 2.69e−2 (2.36e−3)− 1.80e−2 (5.03e−4)− 1.23e−2 (3.14e−4)− 1.01e−2 (5.83e−4)− 8.63e−2 (3.82e−3) = 6.90e−2 (2.96e−3) + 6.37e−2 (4.77e−3) + 5.96e−2 (4.92e−3) +
DGEA-NSGAII 3.43e−1 (5.15e−2)− 4.05e−1 (1.08e−1)− 3.82e−1 (1.45e−1)− 3.70e−1 (1.10e−1)− 5.99e−1 (1.41e−1)− 5.99e−1 (1.40e−1)− 6.92e−1 (2.15e−1)− 6.26e−1 (1.74e−1)− 1.49e−2 (6.50e−4)− 9.49e−3 (4.43e−4)− 7.07e–3 (3.19e–4) + 5.44e–3 (1.46e–4) + 6.92e–2 (2.03e–3) + 6.20e–2 (2.71e–3) + 5.93e–2 (1.97e–3) + 5.85e–2 (2.85e–3) +
MOEMT-NSGAII 5.37e–2 (3.91e–2) 8.72e–2 (1.12e–1) 4.68e–2 (3.52e–2) 8.51e–2 (1.10e–1) 3.38e–1 (7.42e–2) 3.65e–1 (9.18e–2) 3.73e–1 (1.06e–1) 3.08e–1 (6.22e–2) 1.11e–2 (6.04e–4) 9.22e–3 (5.63e–4) 7.46e−3 (3.41e−4) 6.19e−3 (3.39e−4) 8.74e−2 (6.44e−3) 7.50e−2 (5.72e−3) 6.98e−2 (3.87e−3) 7.95e−2 (9.07e−3)
Table 9.2 The statics of IGD results obtained by three compared methods with the optimizer NSGA-II on 72 test instances from LSMOP suite. The best performance is shown in bold font
186 9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
LSMOP4
LSMOP3
3
2
3
2
500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000
1.56e+0 (6.81e−4) + 1.57e+0 (5.82e−4)− 1.57e+0 (3.24e−4)− 1.57e+0 (7.86e−5)− 8.53e–1 (8.21e–3) + 8.60e–1 (4.93e–4) + 8.60e–1 (5.29e–5) + 8.60e–1 (5.90e–5) = 5.17e−2 (1.39e−3)− 3.29e−2 (9.26e−4)− 2.36e−2 (1.28e−3)− 1.63e−2 (7.34e−4)− 2.22e−1 (5.10e−3)− 1.45e−1 (7.19e−3)− 9.71e−2 (4.68e−3)− 7.04e−2 (6.46e−3) +
1.50e+0 (4.14e−2) + 1.54e+0 (7.62e−2)− 1.46e+0 (1.07e−1)− 1.54e+0 (9.82e−2)− 8.55e−1 (2.28e−2) + 8.69e−1 (2.05e−2)− 8.69e−1 (3.10e−2)− 8.68e−1 (1.75e−2)− 5.68e−2 (4.60e−3)− 3.64e−2 (2.63e−3)− 2.70e−2 (1.97e−3)− 1.71e−2 (5.30e−4)− 2.11e−1 (7.12e−3)− 1.35e−1 (5.74e−3)− 9.20e−2 (4.32e−3) = 6.69e−2 (3.21e−3) +
1.46e+0 (5.02e–1) + 1.59e+0 (1.51e−1) = 1.87e+0 (4.09e−1)− 4.24e+0 (1.70e+0)− 8.53e−1 (2.23e−2) + 8.83e−1 (1.64e−1)− 1.18e+0 (7.58e−1) = 1.19e+0 (7.44e−1)− 4.68e−2 (3.80e−3)− 2.73e−2 (2.02e−3)− 1.52e−2 (5.93e−4)− 8.38e−3 (4.39e−4) = 1.29e–1 (4.02e–3) + 8.81e–2 (3.25e–3) + 6.82e–2 (2.76e–3) + y6.11e–2 (2.48e–3) +
(continued)
1.56e+0 (9.46e−3) 1.52e+0 (1.16e–1) 1.20e+0 (1.83e–1) 9.08e–1 (8.11e–2) 8.72e−1 (5.43e−2) 8.60e−1 (2.30e−5) 8.60e−1 (2.13e−5) 8.60e−1 (4.24e−5) 2.40e–2 (2.04e–3) 1.66e–2 (8.34e–4) 1.22e–2 (5.50e–4) 8.24e–3 (3.20e–4) 1.61e−1 (1.39e−2) 1.22e−1 (4.97e−3) 9.02e−2 (3.92e−3) 8.32e−2 (8.57e−3)
9.3 Empirical Study 187
LSMOP6
Problem LSMOP5
3
2
3
M 2
Table 9.2 (continued)
D 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000
NSGA-II LSMOF-NSGAII 7.42e−1 (1.14e−16) = 7.42e−1 (1.14e−16) = 7.42e−1 (1.14e−16) = 7.42e−1 (1.14e−16) = 5.29e−1 (1.79e−2)− 5.39e−1 (3.44e−3)− 5.41e−1 (3.02e−3)− 5.40e−1 (3.17e−5)− 3.20e–1 (4.08e–4) = 3.12e−1 (1.98e−4) = 3.08e−1 (5.62e−5) = 3.07e−1 (2.90e−5) = 7.44e–1 (2.41e–2) = 7.77e–1 (3.19e–2) = 7.74e−1 (3.85e−2)− 7.78e–1 (5.26e–2) + WOF-NSGAII 7.13e−1 (8.91e−2) = 7.44e−1 (4.21e−3)− 7.47e−1 (1.31e−2)− 7.47e−1 (9.48e−3)− 5.75e−1 (1.01e−1)− 5.57e−1 (4.59e−2)− 5.85e−1 (9.91e−2)− 6.76e−1 (1.35e−1)− 5.45e−1 (1.43e−1)− 5.69e−1 (1.30e−1)− 5.90e−1 (1.23e−1)− 6.01e−1 (1.23e−1)− 1.56e+0 (2.19e−1)− 1.70e+0 (5.38e−2)− 1.71e+0 (2.19e−2)− 1.67e+0 (8.99e−2)−
DGEA-NSGAII 2.51e+0 (1.24e+0)− 3.75e+0 (1.46e+0)− 3.36e+0 (1.54e+0)− 3.17e+0 (1.20e+0)− 5.36e−1 (1.01e−1) = 8.30e−1 (5.96e−1) = 7.36e−1 (4.36e−1) = 9.66e−1 (6.53e−1)− 5.10e−1 (2.30e−1) = 6.92e−1 (1.60e−1)− 5.81e−1 (2.36e−1)− 7.45e−1 (3.78e−5)− 7.35e+0 (1.06e+1) = 2.33e+1 (4.04e+1) = 4.24e+1 (7.64e+1)− 1.35e+2 (4.68e+2) =
MOEMT-NSGAII 7.01e–1 (1.26e–1) 7.13e–1 (1.29e–1) 7.18e–1 (1.06e–1) 7.42e–1 (1.14e–16) 4.73e–1 (8.49e–2) 5.11e–1 (5.46e–2) 4.93e–1 (8.65e–2) 5.40e–1 (1.33e–1) 4.02e−1 (1.75e−1) 3.10e–1 (1.28e–1) 3.04e–1 (7.68e–2) 2.67e–1 (8.30e–2) 8.47e−1 (2.74e−1) 8.89e−1 (2.84e−1) 7.09e–1 (2.02e–1) 9.97e−1 (2.70e−1)
188 9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
LSMOP8
LSMOP7
3
2
3
2
500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000
1.50e+0 (1.18e−3) = 1.51e+0 (4.61e−4)− 1.51e+0 (3.71e−4)− 1.51e+0 (5.25e−4)− 8.96e−1 (8.28e−3)− 8.65e−1 (2.72e−3)− 8.50e−1 (1.19e−3)− 8.42e−1 (4.39e−3)− 7.42e−1 (1.14e−16) = 7.42e−1 (1.14e−16) = 7.42e−1 (1.14e−16) = 7.42e−1 (1.14e−16) = 3.54e−1 (2.34e−2)− 3.41e−1 (1.83e−2)− 3.46e−1 (2.64e−2)− 3.36e−1 (8.18e−3)−
1.53e+0 (1.14e−1)− 2.43e+1 (1.02e+2)− 1.48e+0 (9.01e−2)− 1.49e+0 (8.16e−2)− 9.32e−1 (2.93e−2)− 8.76e−1 (3.07e−2)− 7.86e−1 (5.72e−2)− 7.89e−1 (7.63e−2)− 6.59e−1 (1.23e−1) = 7.45e−1 (2.98e−3)− 7.43e−1 (1.71e−3)− 7.44e−1 (6.53e−3)− 3.44e−1 (2.92e−2)− 3.25e−1 (3.50e−2)− 3.27e−1 (2.64e−2)− 3.36e−1 (1.19e−2)−
1.03e+3 (1.07e+3)− 1.21e+3 (1.21e+3)− 4.73e+2 (1.27e+3)− 9.29e+2 (1.88e+3)− 9.82e−1 (7.93e−2)− 9.69e−1 (6.06e−2)− 9.65e−1 (1.87e−2)− 9.44e−1 (1.95e−2)− 8.09e−1 (1.73e−1)− 9.16e−1 (3.88e−1)− 1.34e+0 (8.83e−1)− 1.43e+0 (9.73e−1)− 2.11e–1 (1.96e–1) + 2.87e−1 (2.67e−1)− 4.48e−1 (3.07e−1) = 6.20e−1 (2.77e−1)−
continued
1.42e+0 (1.75e–1) 1.32e+0 (2.23e–1) 9.93e–1 (1.22e–1) 9.78e–1 (1.08e–1) 6.99e–1 (1.38e–1) 4.58e–1 (1.05e–1) 3.95e–1 (5.55e–2) 4.77e–1 (1.26e–1) 7.42e–1 (1.14e–16) 7.13e–1 (1.26e–1) 7.42e–1 (1.14e–16) 7.42e–1 (1.14e–16) 2.98e−1 (5.12e−2) 2.60e–1 (6.31e–2) 2.49e–1 (7.15e–2) 1.84e–1 (3.11e–2)
9.3 Empirical Study 189
Best/all +/-/=
Problem LSMOP9
3
M 2
Table 9.2 (continued)
D 500 1000 2000 5000 500 1000 2000 5000
NSGA-II LSMOF-NSGAII 8.09e−1 (6.05e−4)− 8.08e−1 (1.50e−3)− 8.05e−1 (1.70e−3)− 8.04e−1 (2.48e−3)− 1.49e+0 (1.21e−1)− 1.32e+0 (2.01e−1)− 1.22e+0 (1.60e−1)− 1.16e+0 (8.81e−2)− 8/72 8/46/18 WOF-NSGAII 8.10e−1 (2.40e−13)− 8.11e−1 (3.02e−3)− 8.11e−1 (5.16e−3)− 8.10e−1 (1.78e−3)− 1.36e+0 (1.37e−1)− 1.37e+0 (1.46e−1)− 1.36e+0 (1.51e−1)− 1.43e+0 (1.46e−1)− 0/72 6/62/4
DGEA-NSGAII 1.92e+0 (6.87e−1)− 6.82e+0 (2.83e+0)− 1.25e+1 (4.14e+0)− 1.65e+1 (5.18e+0)− 8.44e+0 (2.57e+0)− 2.52e+1 (4.01e+0)− 3.96e+1 (6.44e+0)− 5.40e+1 (4.36e+0)− 12/72 13/48/11
MOEMT-NSGAII 5.59e–1 (2.74e–2) 5.25e–1 (2.35e–2) 5.00e–1 (1.19e–2) 4.83e–1 (7.00e–3) 7.24e–1 (2.50e–1) 6.80e–1 (2.13e–1) 6.63e–1 (2.36e–1) 7.87e–1 (2.87e–1) 52/72 –
190 9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
9.3 Empirical Study
191
Table 9.3 The statics of IGD results obtained by three compared methods with the optimizer MOEADDE on 72 test instances from LSMOP suite. The best performance is shown in bold font
Problem
M
D
MOEADDE MOEA/DVA
LSMOF-MOEADDE
MOEMT-MOEADDE
LSMOP1
2
500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000
7.35e+0 (3.52e−1)− 9.56e+0 (2.41e−1)− 1.07e+1 (1.75e−1)− 1.15e+1 (1.65e−1)− 7.58e+0 (3.18e−1)− 9.60e+0 (3.37e−1)− 1.08e+1 (2.16e−1)− 1.16e+1 (2.81e−1)− 7.32e−2 (2.33e−4)− 4.06e−2 (6.57e−4)− 2.30e−2 (4.35e−4)− 1.27e−2 (4.35e−4)− 8.19e−2 (3.76e−3)− 6.67e−2 (3.33e−3)− 6.23e−2 (3.93e−3)− 5.96e−2 (3.96e−3)− 7.16e+2 (5.95e+2)− 1.10e+3 (1.04e+3)− 1.29e+3 (1.20e+3)− 1.00e+3 (9.76e+2)− 2.32e+2 (2.52e+2)− 5.35e+2 (4.79e+2)− 5.70e+2 (6.01e+2)− 8.58e+2 (6.92e+2)− 1.21e−1 (6.58e−4)− 7.42e−2 (3.44e−4)− 4.38e−2 (4.62e−4)− 2.30e−2 (4.25e−4)− 2.02e−1 (5.47e−3)− 1.31e−1 (3.07e−3)− 9.03e−2 (3.37e−3)− 6.74e−2 (2.77e−3)−
2.31e−1 (1.53e−2)− 3.64e−1 (4.02e−2)− 4.69e−1 (4.59e−2)− 5.40e−1 (3.46e−2)− 5.53e−1 (5.22e−2)− 6.42e−1 (3.90e−2)− 6.91e−1 (4.05e−2)− 6.90e−1 (4.62e−2)− 1.37e−2 (1.32e−3)− 8.86e−3 (1.12e−3) = 6.28e–3 (3.91e–4) + 4.34e–3 (8.95e–5) + 6.80e–2 (1.58e–3) + 6.19e−2 (1.01e−3)− 5.93e−2 (1.00e−3)− 5.86e−2 (1.11e−3)− 1.55e+0 (3.23e−3) = 1.56e+0 (6.61e–4) + 1.57e+0 (3.99e–4) + 1.57e+0 (1.92e–4) + 7.98e−1 (4.62e−2) = 8.17e−1 (4.61e−2)− 8.40e−1 (3.40e−2)− 8.38e−1 (3.78e−2)− 4.16e−2 (1.21e−3)− 2.38e−2 (9.19e−4)− 1.39e−2 (6.02e−4)− 7.43e–3 (6.21e–4) + 1.30e–1 (5.55e–3) = 8.99e–2 (2.09e–3) + 6.99e–2 (1.04e–3) = 6.05e−2 (8.87e−4)−
1.78e–2 (1.64e–3) 1.69e–2 (1.88e–3) 1.51e–2 (1.03e–3) 1.34e–2 (1.03e–3) 1.81e–1 (4.83e–2) 1.69e–1 (3.21e–2) 1.73e–1 (5.02e–2) 1.72e–1 (4.02e–2) 1.06e–2 (1.33e–3) 8.80e–3 (6.59e–4) 8.09e−3 (4.33e−4) 6.85e−3 (4.21e−4) 7.15e−2 (2.20e−3) 5.83e–2 (1.49e–3) 5.40e–2 (2.04e–3) 5.16e–2 (1.45e–3) 1.50e+0 (2.38e–1) 2.17e+0 (2.74e−1) 3.43e+0 (5.86e−1) 5.99e+0 (5.83e−1) 7.71e–1 (1.39e–2) 7.74e–1 (1.88e–2) 7.70e–1 (1.13e–2) 7.98e–1 (4.40e–2) 2.09e–2 (1.06e–3) 1.46e–2 (5.42e–4) 1.14e–2 (5.17e–4) 8.55e−3 (3.46e−4) 1.33e−1 (9.52e−3) 9.22e−2 (2.97e−3) 7.08e−2 (2.19e−3) 5.70e–2 (1.57e–3)
3
LSMOP2
2
3
LSMOP3
2
3
LSMOP4
2
3
(continued)
decision variable analysis of MOEA/DVA consumes a major share of the available computational budget before the actual optimization. In the groups employing SMPSO and LMOCSO as an optimizer, shown in Table 9.4, there are only two comparison algorithms per group. MOEMT-SMPSO
192
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
Table 9.3 (continued)
Problem
MOEADDE MOEA/DVA
LSMOF-MOEADDE
MOEMT-MOEADDE
500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500
1.66e+1 (4.90e−1)− 2.04e+1 (3.15e−1)− 2.29e+1 (3.40e−1)− 2.45e+1 (2.98e−1)− 1.31e+1 (9.73e−1)− 1.71e+1 (6.22e−1)− 1.94e+1 (5.82e−1)− 2.07e+1 (4.64e−1)− 1.86e+3 (1.84e+3)− 2.15e+3 (1.56e+3)− 2.60e+3 (2.69e +3)− 1.84e+3 (1.91e+3)− 2.38e+4 (4.52e+3)− 3.02e+4 (4.67e+3)− 3.97e+4 (6.57e+3)− 3.97e+4 (6.19e+3)− 5.09e+4 (3.66e+3)−
7.42e−1 (1.14e−16)− 7.42e−1 (1.14e−16)− 7.42e−1 (1.14e−16)− 7.42e−1 (1.14e−16)− 5.26e−1 (2.06e−2)− 5.41e−1 (2.20e−3)− 5.42e−1 (3.85e−3)− 6.11e−1 (1.49e−1)− 3.00e–1 (1.89e–2) + 2.88e–1 (2.75e–2) + 2.71e–1 (4.16e–2) + 2.13e–1 (5.54e–2) + 6.96e–1 (6.74e–3) + 6.97e–1 (2.74e–2) = 7.10e−1 (1.33e−2)− 7.36e–1 (4.52e–2) + 1.50e+0 (5.51e−4)−
6.27e–1 (1.12e–1) 6.75e–1 (9.63e–2) 6.18e–1 (1.02e–1) 6.54e–1 (9.62e–2) 3.65e–1 (2.30e–2) y3.70e–1 (2.27e–2) 3.89e–1 (7.41e–2) 4.44e–1 (1.52e–1) 6.79e−1 (1.69e−1) 6.79e−1 (1.69e−1) 6.33e−1 (2.01e−1) 7.19e−1 (1.02e−1) 7.65e−1 (4.41e−1) 1.03e+0 (7.71e−1) 7.03e–1 (2.46e–1) 1.98e+0 (6.19e+0) 1.47e+0 (8.08e–2)
1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000 500 1000 2000 5000
7.14e+4 (2.32e+3)− 8.44e+4 (2.46e+3)− 9.25e+4 (1.87e+3)− 1.48e+3 (1.23e+3)− 1.42e+3 (1.60e+3)− 2.82e+3 (2.71e+3)− 2.12e+3 (2.81e+3)− 1.40e+1 (6.20e−1)− 1.73e+1 (5.01e−1)− 1.95e+1 (3.32e−1)− 2.09e+1 (2.02e−1)− 7.28e−1 (1.04e−1)− 6.90e−1 (7.74e−2)− 8.10e−1 (3.51e−1)− 7.01e−1 (7.22e−2)−
1.50e+0 (4.68e−4)− 1.51e+0 (2.74e−4)− 1.51e+0 (1.51e−4)− 8.40e–1 (4.10e–2) = 8.34e−1 (4.10e−2) = 7.64e−1 (7.89e−2) = 7.98e−1 (8.43e−2) = 7.42e−1 (1.14e−16)− 7.42e−1 (1.14e−16)− 7.42e−1 (1.14e−16)− 7.42e−1 (1.14e−16)− 3.22e−1 (3.28e−2)− 3.34e−1 (2.88e−2)− 2.95e−1 (4.12e−2)− 3.35e−1 (1.15e−1)−
1.32e+0 (1.90e–1) 8.97e–1 (1.04e–1) 9.08e–1 (8.09e–2) 8.64e−1 (8.90e−2) 8.02e–1 (1.69e–1) 7.42e–1 (1.89e–1) 7.11e–1 (1.70e–1) 1.06e–1 (2.17e–1) 1.32e–1 (2.63e–1) 1.87e–2 (1.56e–3) 1.24e–1 (2.66e–1) 2.18e–1 (1.06e–1) 1.07e–1 (6.19e–2) 7.73e–2 (6.10e–3) 7.08e–2 (1.61e–3)
M D
LSMOP5 2
3
LSMOP6 2
3
LSMOP7 2
3
LSMOP8 2
3
(continued)
9.3 Empirical Study
193
Table 9.3 (continued)
Problem
M
D
LSMOP9
2
500 1000 2000 5000 500 1000 2000 5000
3
Best/all +/-/=
MOEADDE MOEA/DVA
LSMOF-MOEADDE
MOEMT-MOEADDE
3.62e+1 (1.98e+0)− 4.92e+1 (1.44e+0)− 5.68e+1 (9.36e−1)− 6.20e+1 (7.71e−1)− 8.63e+1 (4.54e+0)− 1.21e+2 (3.65e+0)− 1.40e+2 (2.95e+0)− 1.52e+2 (2.06e+0)− 0/72 0/72/0
8.09e−1 (7.76e−4)− 8.07e−1 (2.30e−3)− 8.05e−1 (2.92e−3)− 8.02e−1 (3.54e−3)− 1.14e+0 (3.30e−3)− 1.15e+0 (7.07e−3)− 1.15e+0 (9.79e−3)− 1.21e+0 (1.33e−1)− 18/72 14/48/10
7.53e–1 (5.89e–2) 7.54e–1 (5.37e–2) 7.46e–1 (7.93e–2) 7.63e–1 (6.29e–2) 1.14e+0 (8.83e–4) 1.08e+0 (1.71e–1) 1.14e+0 (3.57e–3) 1.13e+0 (1.19e–2) 54/72 –
achieves 64 out of 72 best cases outperforming the corresponding WOF-SMPSO, and MOEMT-LMOCSO is better in 62 out of 72 cases. For WOF-SMPSO, although better results were obtained on only 3 test cases, statistical significance tests show that it is comparable to MOEMT in about one-quarter of the problems. However, on the remaining benchmark problems, it is significantly inferior to MOEMT, mainly due to the transformed search space of WOF limiting the reachable solutions in the original space. LMOCSO only performed well on the tri-objective LSMOP2 and LSMOP4, mostly due to its greater demand for computing resources. Since LMOCSO optimizes all decision variables together, the convergence speed is slow. Regarding the impact of the internal optimizer in MOEMT, it can be systematically reviewed that the instance of MOEMT with the SMPSO solver achieves the average best performance. For most LSMOPs, when the number of objectives for the given problem increases, the average IGD values obtained by the MOEMT series are smaller than other competitors, which indicates MOEMT may be handier for dealing with complex problems.
9.3.2.2 Convergence Trace Due to the page limitation, representative convergence graphs of 11 algorithms on bi-objective LSMOP1 with 1000 variables, bi-objective LSMOP8 with 2000 variables, and tri-objective LSMOP9 with 5000 variables are presented in Fig. 9.3. It can be observed that the series of MOEMT using different solvers converge to a promising level of IGD value in the early stages of the evolutionary process and have fast convergence rates on three test instances. With obvious advantages, MOEMT-SMPSO and MOEMT-MOEADDE stand out among compared methods on both LSMOP1 and LSMOP8, followed by WOF-SMPSO. The convergence trends of the LSMOF series on LSMOP1 are divided into two stages. Although the convergence is accelerated in the second stage, they still cannot exceed the MOEMT
194
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
Table 9.4 The statics of IGD results obtained by two compared methods with the optimizer LMOCSO and SMPSO on 72 test instances from LSMOP suite. The best performance is shown in bold font SMPSO Problem
M D
LSMOP1 2
3
LSMOP2 2
3
LSMOP3 2
3
LSMOP4 2
3
LSMOP5 2
3
LMOCSO
WOF-SMPSO
MOEMT-SMPSO
LMOCSO
500 7.13e−2 (2.89e−3)−
1.31e–2 (1.19e–3)
6.36e−1 (4.96e−2)− 2.12e–1 (1.07e–1)
1000 7.59e−2 (3.12e−3)−
1.32e–2 (1.20e–3)
1.46e+0 (1.00e−1)− 3.29e–1 (1.02e–1)
2000 7.65e−2 (3.32e−3)−
1.29e–2 (1.48e–3)
2.02e+0 (1.64e−1)− 3.97e–1 (1.64e–1)
5000 7.66e−2 (2.80e−3)−
1.20e–2 (9.96e–4)
2.28e+0 (1.62e−1)− 4.89e–1 (1.42e–1)
500 2.02e−1 (1.81e−2)−
1.24e–1 (2.04e–2)
1.37e+0 (1.18e−1)− 6.46e–1 (4.76e–2)
1000 2.13e−1 (2.66e−2)−
1.24e–1 (1.57e–2)
1.67e+0 (9.97e−2)− 6.67e–1 (4.43e–2)
2000 2.06e−1 (2.32e−2)−
1.14e–1 (1.58e–2)
1.84e+0 (1.26e−1)− 7.13e–1 (5.68e–2)
5000 2.08e−1 (1.67e−2)−
1.21e–1 (1.47e–2)
1.93e+0 (1.30e−1)− 7.34e–1 (5.92e–2)
500 9.42e−3 (4.45e−4)−
7.51e–3 (7.30e–4)
4.38e−2 (8.17e−4)− 1.29e–2 (1.92e–3)
1000 7.60e−3 (3.61e−4)−
6.76e–3 (3.98e–4)
2.48e−2 (5.96e−4)− 1.05e–2 (1.21e–3)
2000 6.20e−3 (2.76e−4) =
6.15e–3 (2.22e–4)
1.42e−2 (2.88e−4)− 8.17e–3 (5.12e–4)
5000 5.34e–3 (2.79e–4) =
5.45e−3 (2.07e−4) 7.56e−3 (8.07e−5)− 6.69e–3 (4.61e–4)
MOEMT-LMOCSO
500 6.83e−2 (2.11e−3)−
6.65e–2 (2.98e–3)
5.99e–2 (5.66e–4) +
7.87e−2 (2.25e−3)
1000 5.94e−2 (2.58e−3)−
5.74e–2 (1.62e–3)
4.99e–2 (3.43e–4) +
6.31e−2 (2.57e−3)
2000 5.49e−2 (2.33e−3) =
5.43e–2 (1.64e–3)
4.53e–2 (1.45e–4) +
5.63e−2 (1.95e−3)
5000 5.41e−2 (2.91e−3) =
5.27e–2 (1.89e–3)
4.29e–2 (5.22e–5) +
5.32e−2 (1.37e−3)
500 9.70e−1 (1.81e−1)−
7.55e–1 (1.44e–1)
9.00e–1 (1.05e–1) +
1.56e+0 (5.59e−3)
1000 9.53e−1 (1.64e−1)−
7.41e–1 (1.41e–1)
2.76e+0 (9.28e−1)− 1.55e+0 (7.71e–2)
2000 8.91e−1 (1.03e−1) =
8.47e–1 (2.39e–1)
9.45e+0 (4.26e+0)−
5000 1.02e+0 (9.90e–2) =
1.04e+0 (1.67e−1) 1.85e+1 (4.36e+0)−
1.04e+0 (1.74e–1)
8.80e−1 (9.04e−2) 1.05e+1 (2.11e+0)−
1.33e+0 (1.50e+0)
500 8.60e–1 (1.14e–16) =
1.40e+0 (1.28e–1)
1000 8.60e−1 (1.14e−16) = 8.60e–1 (1.14e–16) 1.34e+1 (2.64e+0)−
8.60e–1 (5.90e–12)
2000 8.60e−1 (1.14e−16) = 8.60e–1 (1.23e–5)
1.37e+1 (3.38e+0)−
1.09e+0 (4.72e–1)
5000 8.60e−1 (1.14e−16) = 8.60e–1 (6.48e–4)
1.90e+1 (9.43e+0)−
1.60e+0 (1.64e+0)
500 3.65e−2 (7.49e−4)−
1.53e–2 (1.03e–3)
8.65e−2 (1.47e−3)− 2.53e–2 (3.21e–3)
1000 2.07e−2 (7.03e−4)−
1.09e–2 (4.15e–4)
5.21e−2 (4.94e−4)− 1.74e–2 (1.83e–3)
2000 1.25e−2 (2.62e−4)−
8.14e–3 (2.87e–4)
3.00e−2 (2.10e−4)− 1.40e–2 (2.14e–3)
5000 7.49e−3 (2.61e−4)−
6.88e–3 (2.14e–4)
1.49e−2 (1.00e−4)− 9.20e–3 (6.50e–4)
500 1.35e−1 (6.17e−3)−
1.27e–1 (7.04e–3)
1.54e–1 (1.64e–3) +
1.82e−1 (9.53e−3)
1000 9.18e−2 (4.93e−3)−
8.85e–2 (3.38e–3)
9.84e–2 (8.40e–4) +
1.15e−1 (2.49e−3)
2000 6.77e–2 (2.45e–3) =
6.80e−2 (2.14e−3) 6.70e–2 (3.40e–4) +
7.97e−2 (2.09e−3)
5000 5.64e−2 (2.60e−3) =
5.63e–2 (2.11e–3)
4.91e–2 (1.43e–4) +
6.02e−2 (1.90e−3)
500 1.68e−1 (1.10e−2)−
7.16e–2 (1.39e–2)
1.25e+0 (1.81e−1)− 7.27e–1 (3.72e–2)
1000 1.92e−1 (1.07e−2)−
7.50e–2 (1.31e–2)
3.68e+0 (5.28e−1)− 7.41e–1 (5.83e–4)
2000 2.01e−1 (1.08e−2)−
7.36e–2 (1.53e–2)
4.95e+0 (6.69e−1)− 7.42e–1 (1.14e–16)
5000 1.94e−1 (1.19e−2)−
6.69e–2 (9.43e–3)
5.82e+0 (6.01e−1)− 7.42e–1 (8.23e–5)
500 4.52e−1 (5.52e−2)−
2.66e–1 (8.55e–2)
2.84e+0 (2.68e−1)− 5.14e–1 (5.67e–2)
1000 4.44e−1 (6.80e−2)−
2.40e–1 (5.80e–2)
3.65e+0 (2.52e−1)− 5.21e–1 (3.11e–2)
2000 4.61e−1 (4.88e−2)−
2.64e–1 (8.32e–2)
3.95e+0 (2.52e−1)− 5.08e–1 (4.64e–2)
5000 4.75e−1 (5.63e−2)−
2.48e–1 (7.71e–2)
4.24e+0 (2.69e−1)− 5.30e–1 (4.68e–2) (continued)
9.3 Empirical Study
195
Table 9.4 (continued) SMPSO Problem
M D
LSMOP6 2
3
LSMOP7 2
3
LSMOP8 2
3
LSMOP9 2
3
LMOCSO
WOF-SMPSO
MOEMT-SMPSO
LMOCSO
500 1.37e−1 (1.48e−2)−
4.33e–2 (9.53e–3)
7.83e−1 (7.28e−3)− 4.71e–1 (1.90e–1)
MOEMT-LMOCSO
1000 1.65e−1 (6.46e−2)−
7.04e–2 (7.59e–2)
7.68e−1 (6.06e−3)− 4.03e–1 (1.59e–1)
2000 1.52e−1 (5.48e−2)−
7.49e–2 (8.68e–2)
7.55e−1 (9.34e−4)− 4.03e–1 (1.57e–1)
5000 1.58e−1 (5.33e−2) =
1.35e–1 (1.21e–1)
7.47e−1 (1.63e−4)− 3.80e–1 (1.23e–1)
500 9.19e−1 (2.39e−1) =
8.53e–1 (3.43e–1)
4.76e+1 (6.01e+1)−
2.55e+0 (2.39e+0)
1000 8.50e−1 (2.04e−1)−
7.97e–1 (2.96e–1)
3.90e+2 (3.65e+2)−
2.53e+0 (1.62e+0)
2000 1.07e+0 (2.86e−1)−
6.97e–1 (1.26e–1)
1.00e+3 (1.12e+3)−
1.52e+0 (3.31e–1)
5000 9.19e−1 (2.53e−1)−
6.59e–1 (8.72e–2)
1.14e+3 (1.08e+3)−
1.76e+0 (1.29e+0)
500 1.22e+0 (1.50e–1) +
1.45e+0 (5.70e−2) 7.99e+1 (6.37e+1)−
1.50e+0 (7.79e–4)
1000 1.11e+0 (8.44e–2) +
1.42e+0 (1.54e−1) 1.58e+3 (8.09e+2)−
1.19e+0 (2.40e–1)
2000 1.08e+0 (7.52e–2) +
1.24e+0 (2.11e−1) 5.12e+3 (2.11e+3)−
9.49e–1 (1.15e–1)
5000 1.09e+0 (9.84e–2) =
1.13e+0 (1.81e−1) 8.17e+3 (2.52e+3)−
9.41e–1 (1.21e–1)
500 7.26e−1 (1.06e−1) =
5.75e–1 (1.66e–1)
9.88e−1 (8.90e−2)− 8.46e–1 (1.00e–1)
1000 5.60e−1 (1.07e−2)−
5.03e–1 (1.23e–1)
9.88e−1 (6.01e−2)− 5.65e–1 (1.13e–1)
2000 5.12e−1 (6.86e−3)−
4.81e–1 (1.34e–1)
9.71e−1 (5.95e−2)− 6.34e–1 (1.94e–1)
5000 4.92e−1 (4.09e−2)−
3.93e–1 (4.82e–2)
9.35e−1 (5.99e−2)− 4.78e–1 (9.52e–2)
500 1.45e−1 (5.78e−3)−
3.48e–2 (9.02e–3)
1.18e+0 (1.51e−1)− 7.10e–1 (8.98e–2)
1000 1.55e−1 (5.13e−3)−
2.53e–2 (6.72e–3)
2.92e+0 (3.53e−1)− 7.33e–1 (4.26e–2)
2000 1.54e−1 (2.59e−3)−
2.04e–2 (5.27e–3)
3.93e+0 (4.52e−1)− 7.41e–1 (2.70e–3)
5000 1.53e−1 (1.12e−3)−
1.97e–2 (6.43e–3)
4.63e+0 (4.51e−1)− 7.42e–1 (2.96e–5)
500 1.64e−1 (7.38e−2)−
9.49e–2 (4.40e–3)
7.40e−1 (1.71e−1)− 3.14e–1 (2.50e–2)
1000 1.28e−1 (5.52e−2)−
8.34e–2 (4.15e–3)
7.92e−1 (1.31e−1)− 3.17e–1 (2.33e–2)
2000 1.34e−1 (6.06e−2)−
8.01e–2 (3.17e–3)
8.33e−1 (1.62e−1)− 2.92e–1 (4.58e–2)
5000 1.16e−1 (4.89e−2)−
7.75e–2 (2.63e–3)
8.96e−1 (9.11e−2)− 3.05e–1 (2.41e–2)
500 8.10e−1 (1.14e−16)− 9.68e–2 (1.64e–1)
5.23e–1 (6.12e–2) +
8.06e−1 (3.85e−3)
1000 6.89e−1 (1.15e−1)−
2.54e–2 (5.93e–3)
9.82e−1 (2.64e−1)− 7.91e–1 (1.31e–2)
2000 5.92e−1 (4.67e−2)−
1.67e–2 (4.60e–3)
5.79e+0 (1.66e+0)−
6.68e–1 (4.05e–2)
5000 5.99e−1 (5.55e−2)−
2.99e–2 (5.21e–2)
1.22e+1 (3.46e+0)−
6.36e–1 (2.45e–2)
500 1.49e+0 (1.21e−1)−
5.82e–1 (1.75e–3)
7.81e−1 (1.63e−1) = 7.27e–1 (2.48e–1)
1000 1.06e+0 (2.04e−1)−
5.68e–1 (3.21e–2)
8.05e−1 (1.17e−1)− 5.98e–1 (1.47e–2)
2000 1.00e+0 (2.48e−1)−
5.23e–1 (6.46e–2)
1.42e+0 (3.58e−1)− 8.38e–1 (2.80e–1)
5000 1.01e+0 (2.77e−1)−
5.05e–1 (6.95e–2)
1.31e+1 (5.55e+0)−
7.45e–1 (2.51e–1)
Best/all
8/72
64/72
10/72
62/72
+/-/=
3/53/16
–
10/61/1
–
series by the end of the evolution. Compared with the bi-objective test cases, the convergence traces on the tri-objective problem have the same trend. In the face of more complex problems, the MOEMT series maintain advantages on tri-objective LSMOP9, shown in Fig. 9.3c. MOEADVA and LMOCSO are not outstanding in all cases because of the slow convergence speed, consistent with the above analysis. It is worth noting that different internal optimizers of MOEMT representing different search strategies have expertise on different test problems. For example,
196
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
Fig. 9.3 The convergence trace of 11 compared methods on bi-objective LSMOP1 with 1000 variables, bi-objective LSMOP8 with 2000 variables, and tri-objective LSMOP9 with 5000 variables based on IGD values. x-axis: number of FEs; y-axis: IGD value. (a) LSMOP1 (M=2 D=1000). (b) LSMOP8 (M=2 D=2000). (c) LSMOP9 (M=3 D=5000)
MOEMT-NSGSII has a clear advantage over MOEMT-MOEADDE on the triobjective LSMOP9, while MOEMT-MOEADDE performs better on the bi-objective LSMOP1 and LSMOP8. How to adaptively choose an appropriate search strategy to improve the MOEMT framework could be studied in depth in future work, and it will not be covered here.
9.3 Empirical Study
197
Fig. 9.4 The final solution sets obtained by 11 compared methods on bi-objective LSMOP1 and LSMOP5 with different dimensions, and the true PFs are indicated by the black line
9.3.2.3 Final Non-dominated Solutions To visualize the performance of these algorithms on LSMOPs, the final nondominated solutions with the median IGD values obtained by 11 compared algorithms on the same test problems (i.e., bi-objective LSMOP1 and LSMOP8 with 500 to 5000 decision variables and tri-objective LSMOP9 with 5000 variables) are depicted in Figs. 9.4 and 9.5, respectively. As can be observed, on LSMOP1 with a linear PF, the MOEMT series get sets of solutions with both good convergence and diversity, especially MOEMT-SMPSO and MOEMT-MEADDE. For the series of WOF and LSMOF, the quality of the final solution sets is the middle case. For MOEADVA and LMOCSO, the solution sets obtained by them are not wellconverged. A similar performance also appears on bi-objective LSMOP8 with a concave PF that both MOEMT-SMPSO and MOEMT-MEADDE gain a collection of high-quality solutions, followed by WOF-SMPSO, with MOEADVA worst. On the tri-objective LSMOP9 with a disconnected PF, most of the solution sets obtained by compared methods are well-converged, but their diversity is insufficient since the number of evaluations is relatively small. In this situation, the MOEMT series can find more good solutions than other competitors accelerating the convergence, which indicates the high efficiency of MOEMT.
9.3.2.4 Computation Efficiency To further explore the computational efficiency of MOEMT, we provide two bar graphs regarding the average computation time of the 11 comparison algorithms on bi-objective and tri-objective LSMOP1-LSMOP9 with 1000 variables, shown in Fig. 9.6. As can be seen, the average computation time values of MOEMT series embedded different solvers are comparable to the comparison algorithms in the same group, especially on more complex problems with three objectives.
198
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
Fig. 9.5 The final solution sets obtained by 11 compared methods on tri-objective LSMOP9 with 5000 decision variables, and the true PFs are indicated by the dash area
Fig. 9.6 Average computation time of 11 compared methods on bi-objective and tri-objective LSMOP1-LSMOP9 with 1000 decision variables. x-axis: test problems; y-axis: time(s)
Compared to the state-of-the-art algorithm for solving large-scale multi-objective optimization problems, MOEMT achieves superior performance without losing computational efficiency. Besides, to find more high-quality solutions by limited computing resources, the knowledge transfer across tasks in MOEMT is carried out throughout the evolutionary process, accompanied by periodic task replacement. In future work, we can further improve the efficiency by adaptively adjusting the frequency of knowledge transfer and task replacement. Overall, compared with state-of-the-art methods, MOEMT achieves promising performance in terms of superior solution quality and fast convergence rate, thus verifying the effectiveness and efficiency of the multi-space evolutionary search for large-scale multi-objective optimization problems.
9.3 Empirical Study
199
Fig. 9.7 Illustration of the transferred solutions and the current population at different evolution stages on bi-objective LSMOP4 with 2000 decision variables and bi-objective LSMOP5 with 1000 decision variables
9.3.3 Effectiveness of Knowledge Transfer Knowledge transfer across tasks plays an essential role in MOEMT for efficient evolutionary search. Valuable traits obtained from the task-specific searching space can be seamlessly transferred to the original space providing search direction and guidance, thereby improving the effectiveness and efficiency of the original problem-solving. The transferred traits from the original problem also can be inherited into the simplified space enriching the information for the intra-task evolutionary search. To depict the intuitive effectiveness of knowledge transfer, Fig. 9.7 illustrates the transferred solutions in the current population at different evolution stages on biobjective LSMOP4 with 2000 decision variables and bi-objective LSMOP5 with 1000 decision variables. Since the convergence rate of MOEMT on LSMOP4 is faster than LSMOP5, the selected moments are at very early stages. It can be observed that the transferred solutions are generally closer to the true PFs than the other solutions in the current population. These high-quality solutions with beneficial knowledge information are transferred across tasks at the beginning of the evolution, significantly improving the convergence rate. In addition to the superior convergence speed, knowledge transfer also brings diversity, spreading the solutions widely along with the true PFs.
9.3.4 Parameter Sensitivity Analysis In this section, parameter sensitivity analysis is adopted to investigate the influence degrees of each parameter on the performance of MOEMT. The tested parameters are the random mating probability rmp and the interval of dynamic task replacement
200
9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
t. Each parameter is set to four incremental values, and other untested parameters remain unchanged. The sensitivity experiments are conducted by MOEMT-SMPSO on a set of LSMOPs with 2-3 objectives and 1000 decision variables. Each instance is run 20 times on the test problems independently. Friedman’s test [275] is adopted to compare the significant differences between four methods with different rmp or t settings. Significant results of the test indicate that at least one of the samples is different from the other samples.
9.3.4.1 Random Mating Probability (rmp) rmp is used to across tasks and is set to .rmp = [0, 0.5, 0.8, 1] for sensitivity analysis. The results related to rmp is summarized in Table 9.5. The p-values computed by Friedman’s test show that there are significant differences among instances with different rmp on 16 out of 18 test problems at the significance level of .α = 0.05. The mean rank at the bottom of the table shows the average rankings of the four instances on all test problems based on IGD values. A smaller rank indicates better performance for this setting. It can be observed that a higher value of rmp obtains better performance than a lower rmp in most cases, indicating that a higher degree of cross-task knowledge transfer promotes the unhindered genetic material exchange and leads to a rapid convergence rate. According to the IGD-based average performance ranks, the instance with .rmp = 0.8 has the best mean rank on all the test problems. Thus, the parameter rmp is recommended to be set to 0.8 in all the other experiments.
9.3.4.2 Interval of Task Replacement (t) The value of t adjusts the frequency of dynamic task replacement. It is set to 100, 500, 1000, and 5000, respectively. According to Friedman’s test results presented in Table 9.6, the performance of MOEMT with different settings of t differs significantly. However, the optimal value for t depends on the problem to be optimized. The best performances on the test problems are divided equally among compared instances except for the case with .t = 100 since a too-small value of t may lead to an insufficient search in simplified space. Referring to the mean ranks of different instances, MOEMT adopts .t = 500 in the experiments.
9.3.5 Real-World Application of Neural Network Training Problem Further, we test MOEMT method and four compared algorithms, i.e., LSMOF, WOF, DGEA, and LMOCSO, on a real-world application of the neural network
Mean rank
LSMOP9
LSMOP8
LSMOP7
LSMOP6
LSMOP5
LSMOP4
LSMOP3
LSMOP2
Problem LSMOP1
M 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
rmp= 0 2.28e−2 (3.99e−3) 2.38e−1 (3.77e−2) 8.67e−3 (9.94e−4) 7.00e−2 (8.03e−3) 1.41e+0 (2.80e−1) 8.60e–1 (1.14e–16) 1.39e−2 (1.33e−3) 9.54e−2 (6.49e−3) 4.70e−1 (1.86e−1) 4.70e−1 (1.44e−1) 2.21e−1 (1.06e−1) 2.50e+0 (3.93e+0) 1.45e+0 (1.36e−1) 5.07e−1 (1.39e−1) 8.52e−2 (1.55e−1) 1.24e−1 (3.12e−2) 5.69e−1 (2.30e−1) 6.31e−1 (2.09e−1) 3.56
rmp= 0.5 1.45e−2 (1.40e−3) 1.48e−1 (2.57e−2) 7.03e−3 (5.63e−4) 5.81e−2 (2.36e−3) 8.21e−1 (1.99e−1) 8.60e−1 (1.14e−16) 1.12e−2 (7.13e−4) 8.85e−2 (2.98e−3) 9.58e−2 (2.79e−2) 2.58e–1 (6.49e–2) 7.46e−2 (9.17e−2) 1.00e+0 (9.96e−1) 1.41e+0 (1.74e−1) 4.75e–1 (1.16e–1) 2.76e−2 (1.06e−2) 8.96e−2 (4.72e−3) 1.19e−1 (1.92e−1) 5.79e−1 (1.21e−2) 2.20
rmp= 0.8 1.32e−2 (1.20e−3) 1.28e–1 (2.48e–2) 6.76e–3 (3.98e–4) 5.77e–2 (2.20e–3) 7.41e−1 (1.41e−1) 8.60e−1 (1.14e−16) 1.09e−2 (4.15e−4) 8.78e–2 (4.19e–3) 7.50e−2 (1.31e−2) 2.96e−1 (6.91e−2) 7.04e−2 (7.59e−2) 8.45e–1 (3.16e–1) 1.42e+0 (1.54e−1) 4.79e−1 (1.26e−1) 2.53e–2 (6.72e–3) 8.48e–2 (4.03e–3) 2.54e–2 (5.93e–3) 5.62e–1 (3.93e–2) 1.96
rmp= 1 1.26e–2 (1.16e–3) 1.30e−1 (3.01e−2) 7.11e−3 (7.82e−4) 5.94e−2 (1.73e−3) 7.40e–1 (2.29e–1) 8.70e−1 (4.36e−2) 1.06e–2 (6.18e–4) 9.22e−2 (5.05e−3) 6.95e–2 (1.09e–2) 3.22e−1 (7.35e−2) 5.34e–2 (3.50e–2) 1.27e+0 (1.30e+0) 1.38e+0 (1.76e–1) 5.88e−1 (1.39e−1) 2.61e−2 (7.77e−3) 8.54e−2 (5.41e−3) 5.28e−1 (1.10e−1) 8.46e−1 (2.78e−1) 2.27
p-value of F-test 0 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0.392 0 < 0.05 0.001 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0.782 0.004 < 0.05 0.017 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 –
Table 9.5 The statics of IGD results obtained by MOEMT-SMPSO with different setting of rmp values (.D = 1000). p-values are computed by the Friedman’s test. Mean Rank at the bottom of the table provides the average performance ranks of the four instances when solving all test problems with Friedman’s test
9.3 Empirical Study 201
Mean rank
LSMOP9
LSMOP8
LSMOP7
LSMOP6
LSMOP5
LSMOP4
LSMOP3
LSMOP2
Problem LSMOP1
M 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
t = 100 1.57e−2 (2.63e−3) 3.40e−1 (1.37e−1) 8.21e−3 (9.66e−4) 6.38e−2 (4.07e−3) 1.30e+0 (3.14e−1) 8.82e−1 (9.43e−2) 1.21e−2 (8.17e−4) 9.85e−2 (4.75e−3) 4.03e−1 (1.97e−1) 5.09e−1 (2.42e−1) 3.29e−1 (1.40e−1) 1.07e+1 (2.64e+1) 1.40e+0 (1.64e–1) 6.32e−1 (1.98e−1) 2.24e–2 (2.80e–3) 9.41e−2 (5.43e−3) 7.03e−1 (1.14e−1) 1.14e+0 (2.99e−3) 3.45
t = 500 1.32e–2 (1.20e–3) 1.28e−1 (2.48e−2) 6.76e−3 (3.98e−4) 5.77e−2 (2.20e−3) 7.41e–1 (1.41e–1) 8.60e–1 (1.14e–16) 1.09e−2 (4.15e−4) 8.78e−2 (4.19e−3) 7.50e−2 (1.31e−2) 2.96e−1 (6.91e−2) 7.04e−2 (7.59e−2) 8.45e−1 (3.16e−1) 1.42e+0 (1.54e−1) 4.79e–1 (1.26e–1) 2.53e−2 (6.72e−3) 8.48e−2 (4.03e−3) 2.54e–2 (5.93e–3) 5.62e–1 (3.93e–2) 2.05
t = 1000 1.34e−2 (1.91e−3) 1.18e–1 (1.52e–2) 6.52e–3 (4.06e–4) 5.71e−2 (2.15e−3) 8.03e−1 (1.81e−1) 8.83e−1 (1.03e−1) 1.03e–2 (4.32e–4) 8.82e−2 (5.17e−3) 6.42e−2 (1.65e−2) 2.73e–1 (1.00e–1) 6.95e−2 (9.17e−2) 8.04e−1 (2.49e−1) 1.42e+0 (1.65e−1) 5.13e−1 (1.32e−1) 3.21e−2 (1.01e−2) 8.58e−2 (3.44e−3) 2.74e−2 (1.90e−2) 5.64e−1 (4.53e−2) 2.14
t = 5000 1.54e−2 (2.45e−3) 1.40e−1 (4.27e−2) 6.74e−3 (4.39e−4) 5.63e–2 (2.11e–3) 9.02e−1 (2.00e−1) 9.42e−1 (2.02e−1) 1.06e−2 (5.99e−4) 8.62e–2 (2.65e–3) 6.34e–2 (7.19e–3) 3.19e−1 (8.57e−2) 4.45e–2 (9.68e–3) 7.09e–1 (2.03e–1) 1.49e+0 (9.20e−2) 4.88e−1 (1.22e−1) 3.82e−2 (8.34e−3) 8.41e–2 (3.60e–3) 2.93e−2 (1.17e−2) 5.65e−1 (2.50e−2) 2.36
p-value of F-test 0 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0.209 0 < 0.05 0 < 0.05 0 < 0.05 0.008 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0.042 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 0 < 0.05 –
Table 9.6 The statics of IGD results obtained by MOEMT-SMPSO with different setting of t values (.D = 1000). p-values are computed by the Friedman’s test. Mean Rank at the bottom of the table provides the average performance ranks of the four instances when solving all test problems with Friedman’s test
202 9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
9.3 Empirical Study
203
training problem provided by PlatEMO v3.0 [260, 271]. Training neural networks can be treated as a multi-objective optimization problem by considering various objectives. In this study, the objectives suggested in [276] are adopted, i.e., minimizing the complexity and classification error rate to optimize the weights of a feedforward neural network. Weight values of a neural network are encoded as a solution with D decision variables, where D equals to the number of weights. Four datasets for the neural network training problem are taken from the UCI machine learning repository [277]. All comparisons embed NSGA-II as the internal optimizer except for LMOCSO. The parameters of each compared method are kept consistent with the previous settings. Each method is executed for 10000 FEs with a population size of 100. HV indicator is adopted to evaluate the final solution set with the reference point (1, 1). Table 9.7 presents the median HV values of five algorithms on the chosen datasets. Details of the datasets, including the number of samples and the number of features, are also provided in the table. It is evident that the overall performance of MOEMT is statistically better than all the comparison methods for solving the neural network training problems. MOEMT obtains the best HV values on all datasets. The performance of MOEMT on the application of large-scale multi-objective optimization verifies the effectiveness of MOEMT and makes the experimental results more convincing.
LMOCSO
MOEMT
521 3.01e−1(1.58e−2)− 2.94e−1(1.15e−2)− 3.01e−1(2.12e−2)− 3.07e−1(2.12e−2)− 3.42e–1(3.04e–2) 1241 3.13e−1(1.65e−2)− 3.15e−1(1.54e−2)− 3.18e−1(1.52e−2)− 3.13e−1(1.62e−2)− 3.63e–1(3.94e–2) – 0/4/0 0/4/0 0/4/0 0/4/0
DGEA
24 60
WOF
1000 Statlog_German Connectionist_bench_Sonar 208 +/-/=
LSMOF
321 3.37e−1(1.66e−2)− 3.38e−1(2.02e−2)− 3.32e−1(1.94e−2)− 3.32e−1(1.82e−2)− 3.90e–1(3.46e–2) 401 3.49e−2(2.35e−2)− 3.41e−1(1.53e−2)− 3.51e−1(2.30e−2)− 3.49e−1(2.21e−2)− 4.25e–1(7.75e–2)
Samples Features D
14 18
Statlog_Australian Climate
690 540
Dataset
Table 9.7 HV values of LSMOF, WOF, DGEA, LMOCSO, and MOEMT on four datasets for neural network training
204 9 Multi-Space Evolutionary Search for Large-Scale Multi-Objective Optimization
9.4 Summary
205
9.4 Summary In this chapter, we have presented mutli-space evolutionary algorithm, i.e., MOEMT, for large-scale multi-objective optimization via multi-variation multifactorial evolutionary search. After constructing several simplified searching spaces by problem variation, MOEMT treats the problem variant as a helper and performs a multi-factorial evolutionary search on both the given problem and the helper tasks. Not only the useful solutions found along the search can be transferred across tasks for efficient problem-solving, but also the preservation of the global optimum of the given LSMOPs can be guaranteed. To verify the effectiveness and efficiency of MOEMT, comprehensive empirical studies on a set of large-scale multi-objective benchmarks against the state-of-the-art algorithms for solving LSMOPs have been conducted. The obtained results show that the simplified task well assisted the original problem, and the knowledge transferred across tasks in MOEMT provides a significant improvement in problem-solving, which has confirmed the efficacy of multi-space evolutionary search for large-scale multi-objective optimization.
References
1. M. Zöller, M.F. Huber, Benchmark and survey of automated machine learning frameworks. J. Artif. Intell. Res. 70, 409–472 (2021) 2. M.I. Jordan, T.M. Mitchell, Machine learning: trends, perspectives, and prospects. Science 349, 255–260 (2015) 3. S.K. Srivastava, Green supply-chain management: a state-of-the-art literature review. Int. J. Manage. Rev. 9(1), 53–80 (2007) 4. T. Rios, B.V. Stein, T. Bäck, B. Sendhoff, S. Menzel, Multitask shape optimization using a 3-d point cloud autoencoder as unified representation. IEEE Trans. Evol. Comput. 26(2), 206–217 (2022) 5. E. Bisong, Google automl: cloud vision, in Building Machine Learning and Deep Learning Models on Google Cloud Platform. (Apress, Berkeley, CA, 2019) 6. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1st edn. (Addison-Wesley Longman Publishing Co., Inc., Boston, 1989) 7. T. Bäck, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms (Oxford University Press, Inc., Oxford, 1996) 8. C. Darwin, The Origin of Species by Means of Natural Selection. Pub One Info, 1859 9. G.B. Dantzig, Discrete-variable extremum problems. Oper. Res. 5(2), 266–288 (1957) 10. Y.S. Ong, M.H. Lim, N. Zhu, K.W. Wong, Classification of adaptive memetic algorithms: a comparative study. IEEE Trans. Syst. Man Cybernet. Part B (Cybernetics) 36(1), 141–152 (2006) 11. J.H. Holland, Genetic algorithms. Sci. Am. 267(1), 66–73 (1992) 12. A. Bertoni, M. Dorigo, Implicit parallelism in genetic algorithms. Artif. Intell. 61(2), 307– 314 (1993) 13. L. Feng, Y. Huang, L. Zhou, J. Zhong, A. Gupta, K. Tang, K.C. Tan, Explicit evolutionary multitasking for combinatorial optimization: a case study on capacitated vehicle routing problem. IEEE Trans. Cybernet. 51(6), 3143–3156 (2021) 14. H. Ishibuchi, Memetic algorithms for evolutionary multiobjective combinatorial optimization, in The 40th International Conference on Computers Indutrial Engineering (2010), pp. 1–2 15. X. Zhang, Y. Tian, Y. Jin, A knee point-driven evolutionary algorithm for many-objective optimization. IEEE Trans. Evolut. Comput. 19(6), 761–776 (2015) 16. B. Tan, H. Ma, Y. Mei, M. Zhang, Evolutionary multi-objective optimization for web service location allocation problem. IEEE Trans. Serv. Comput. 14(2), 458–471 (2021)
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 L. Feng et al., Evolutionary Multi-Task Optimization, Machine Learning: Foundations, Methodologies, and Applications, https://doi.org/10.1007/978-981-19-5650-8
207
208
References
17. X. Zhang, Y. Tian, R. Cheng, Y. Jin, A decision variable clustering-based evolutionary algorithm for large-scale many-objective optimization. IEEE Trans. Evol. Comput. 22(1), 97–112 (2018) 18. W. Chen, Y. Jia, F. Zhao, X. Luo, X. Jia, J. Zhang, A cooperative co-evolutionary approach to large-scale multisource water distribution network optimization. IEEE Trans. Evolut. Comput. 23(5), 842–857 (2019) 19. A. Gupta, Y.S. Ong, L. Feng, Multifactorial evolution: toward evolutionary multitasking. IEEE Trans. Evolut. Comput. 20(3), 343–357 (2016) 20. A. Gupta, Y. Ong, L. Feng, Insights on transfer optimization: because experience is the best teacher. IEEE Trans. Emerg. Top. Comput. Intell. 2(1), 51–64 (2018) 21. K.C. Tan, L. Feng, M. Jiang, Evolutionary transfer optimization - a new frontier in evolutionary computation research. IEEE Comput. Intell. Mag. 16(1), 22–33 (2021) 22. L. Zhou, L. Feng, J. Zhong, Z. Zhu, B. Da, Z. Wu, A study of similarity measure between tasks for multifactorial evolutionary algorithm, in Proceedings of the Genetic and Evolutionary Computation Conference Companion (2018), pp. 229–230 23. A. Gupta, Y.S. Ong, L. Feng, K.C. Tan, Multi-objective multifactorial optimization in evolutionary multitasking. IEEE Trans. Cybernet. 47(7), 1652–1665 (2017) 24. X. Ma, J. Yin, A. Zhu, X. Li, Y. Yu, L. Wang, Y. Qi, Z. Zhu, Enhanced multifactorial evolutionary algorithm with meme helper-tasks, in IEEE Transactions on Cybernetics, 2021 25. D. Wierstra, T. Schaul, T. Glasmachers, Y. Sun, J. Peters, J. Schmidhuber, Natural evolution strategies. J. Mach. Learn. Res. 15(1), 949–980 (2014) 26. Q. Zhang, H. Muhlenbein, On the convergence of a class of estimation of distribution algorithms. IEEE Trans. Evolut. Comput. 8(2), 127–136 (2004) 27. B. Da, A. Gupta, Y.S. Ong, Curbing negative influences online for seamless transfer evolutionary optimization. IEEE Trans. Cybernet. 49(12), 4365–4378 (2018) 28. Y. Wen, C. Ting, Parting ways and reallocating resources in evolutionary multitasking, in 2017 IEEE Congress on Evolutionary Computation (CEC) (IEEE, New York, 2017), pp. 2404– 2411 29. Q. Xu, N. Wang, L. Wang, W. Li, Q. Sun, Multi-task optimization and multi-task evolutionary computation in the past five years: a brief review. Mathematics 9(8), 864 (2021) 30. E. Osaba, A.D. Martinez, J. Del Ser, Evolutionary multitask optimization: a methodological overview, challenges and future research directions (2021). Preprint. arXiv:2102.02558 31. T. Wei, S. Wang, J. Zhong, D. Liu, J. Zhang, A review on evolutionary multi-task optimization: trends and challenges, in IEEE Transactions on Evolutionary Computation, 2021 32. C. Yang, J. Ding, K.C. Tan, Y. Jin, Two-stage assortative mating for multi-objective multifactorial evolutionary optimization, in 2017 IEEE 56th Annual Conference on Decision and Control (CDC) (2017), pp. 76–81 33. L. Feng, W. Zhou, L. Zhou, S. Jiang, J. Zhong, B. Da, Z. Zhu, Y. Wang, An empirical study of multifactorial pso and multifactorial de, in 2017 IEEE Congress on Evolutionary Computation (CEC) (2017), pp. 921–928 34. J. Tang, Y. Chen, Z. Deng, Y. Xiang, C. Joy, A group-based approach to improve multifactorial evolutionary algorithm, in IJCAI (2018), pp. 3870–3876 35. A. Gupta, Y.S. Ong, B. Da, L. Feng, S. Handoko, Landscape synergy in evolutionary multitasking, in 2016 IEEE Congress on Evolutionary Computation (CEC) (IEEE, New York, 2016), pp. 3076–3083 36. K.K. Bali, Y. Ong, A. Gupta, P.S. Tan, Multifactorial evolutionary algorithm with online transfer parameter estimation: Mfea-ii. IEEE Trans. Evolut. Comput. 24(1), 69–83 (2019) 37. L. Zhou, L. Feng, K.C. Tan, J. Zhong, Z. Zhu, K. Liu, C. Chen, Toward adaptive knowledge transfer in multifactorial evolutionary computation, in IEEE Transactions on cybernetics, 2020 38. L. Feng, L. Zhou, J. Zhong, A. Gupta, Y. Ong, K. Tan, A.K. Qin, Evolutionary multitasking via explicit autoencoding, in IEEE Transactions on Cybernetics, 2018
References
209
39. R. Hashimoto, H. Ishibuchi, N. Masuyama, Y. Nojima, Analysis of evolutionary multi-tasking as an island model, in Proceedings of the Genetic and Evolutionary Computation Conference Companion (2018), pp. 1894–1897 40. R. Liaw, C. Ting, Evolutionary manytasking optimization based on symbiosis in biocoenosis, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 4295– 4303 41. J. Lin, H. Liu, B. Xue, M. Zhang, F. Gu, Multiobjective multitasking optimization based on incremental learning. IEEE Trans. Evolut. Comput. 24, 824–838 (2019) 42. R. Lim, A. Gupta, Y.S. Ong, L. Feng, A. Zhang, Non-linear domain adaptation in transfer evolutionary optimization. Cognit. Comput. 13(2), 290–307 (2021) 43. L. Zhou, L. Feng, A. Gupta, Y.S. Ong, Learnable evolutionary search across heterogeneous problems via kernelized autoencoding. IEEE Trans. Evolut. Comput. 25(3), 567–581 (2021) 44. D.H. Wolpert, W.G. Macready, No free lunch theorems for optimization. IEEE Trans. Evolut. Comput. 1(1), 67–82 (1997) 45. B. Xue, M. Zhang, W.N. Browne, X. Yao, A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evolut. Comput. 20(4), 606–626 (2015) 46. S. Huang, J. Zhong, W. Yu, Surrogate-assisted evolutionary framework with adaptive knowledge transfer for multi-task optimization, in IEEE Transactions on Emerging Topics in Computing, 2019 47. Y. Liu, Y. Sun, B. Xue, M. Zhang, G.G. Yen, K.C. Tan, A survey on evolutionary neural architecture search, in IEEE Transactions on Neural Networks and Learning Systems, 2021 48. G. Morse, K.O. Stanley, Simple evolutionary optimization can rival stochastic gradient descent in neural networks, in Proceedings of the Genetic and Evolutionary Computation Conference, 2016, pp. 477–484 49. X. Cui, W. Zhang, Z. Tüske, M. Picheny, Evolutionary stochastic gradient descent for optimization of deep neural networks, in Advances in Neural Information Processing Systems, vol. 31, 2018 50. B. Zhang, Training Deep Neural Networks via Multi-task Optimisation. PhD thesis, Swinburne University of Technology Melbourne, Australia 2020 51. R. Chandra, A. Gupta, Y. Ong, C. Goh, Evolutionary multi-task learning for modular knowledge representation in neural networks. Neural Process. Lett. 47, 993–1009 (2018) 52. N. Zhang, G. Abhishek, Z. Chen, Y. Ong, Evolutionary machine learning with minions: A case study in feature selection, IEEE Transactions on Evolutionary Computation, 2021 53. C. Wang, K. Wu, J. Liu, Evolutionary multitasking auc optimization. Preprint. arXiv:2201.01145, 2022 54. K. Chen, B. Xue, M. Zhang, F. Zhou, An evolutionary multitasking-based feature selection method for high-dimensional classification, in IEEE Transactions on Cybernetics, 2020 55. K. Chen, B. Xue, M. Zhang, F. Zhou, Evolutionary multitasking for feature selection in high-dimensional classification via particle swarm optimisation, in IEEE Transactions on Evolutionary Computation, 2021 56. Y. Wen, C. Ting, Learning ensemble of decision trees through multifactorial genetic programming, in 2016 IEEE Congress on Evolutionary Computation (CEC), 2016, pp. 5293– 5300 57. B. Zhang, A.K. Qin, T. Sellis, Evolutionary feature subspaces generation for ensemble classification, in Proceedings of the Genetic and Evolutionary Computation Conference, 2018, pp. 577–584 58. J. Shi, T. Shao, X. Liu, X. Zhang, Z. Zhang, Y. Lei, Evolutionary multitask ensemble learning model for hyperspectral image classification. IEEE J. Select. Top. Appl. Earth Obser. Remote Sens. 14, 936–950 (2020) 59. H. Li, Y. Ong, M. Gong, Z. Wang, Evolutionary multitasking sparse reconstruction: framework and case study. IEEE Trans. Evolut. Comput. 23(5), 733–747 (2018) 60. Y. Zhao, H. Li, Y. Wu, S. Wang, M. Gong, Endmember selection of hyperspectral images based on evolutionary multitask, in 2020 IEEE Congress on Evolutionary Computation (CEC), 2020, pp. 1–7
210
References
61. J. Li, H. Li, Y. Liu, M. Gong, Multi-fidelity evolutionary multitasking optimization for hyperspectral endmember extraction, in Applied Soft Computing, 2021, p. 107713 62. D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, D. Sculley, Google vizier: a service for black-box optimization, in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 1487–1495 63. K. Swersky, J. Snoek, R.P. Adams, Multi-task Bayesian optimization. Adv. Neural Inf. Process. Syst. 26, 2004–2012 (2013) 64. R.P. Adams, R.J. Snoek, K. Swersky, Systems and methods for multi-task bayesian optimization, January 2 2018. US Patent 9,858,529 65. S. Jayaratna, Understanding University Students’ Journey Using Advanced Data Analytics. PhD thesis, Swinburne University of Technology Melbourne, Australia 2021 66. Y. Bi, B. Xue, M. Zhang, Learning and sharing: a multitasking genetic programming approach to image feature learning, in IEEE Transactions on Evolutionary Computation, 2021 67. J. Zhong, L. Feng, W. Cai, Y. Ong, Multifactorial genetic programming for symbolic regression problems, in IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2018 68. A.D. Martinez, J. Del Ser, E. Osaba, F. Herrera, Adaptive multi-factorial evolutionary optimization for multi-task reinforcement learning, in IEEE Transactions on Evolutionary Computation, 2021 69. E.O. Scott, K.A. De Jong, Multitask evolution with cartesian genetic programming, in Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2017, pp. 255–256 70. E.O. Scott, K.A. De Jong, Automating knowledge transfer with multi-task optimization, in 2019 IEEE Congress on Evolutionary Computation (CEC), 2019, pp. 2252–2259 71. J. Zhong, Y. Ong, W. Cai, Self-learning gene expression programming. IEEE Trans. Evolut. Comput. 20(1), 65–80 (2015) 72. J.C. Bongard, Evolutionary robotics. Commun. ACM 56(8), 74–83 (2013) 73. A. Cangelosi, J. Bongard, M.H. Fischer, S. Nolfi, Embodied Intelligence, 2015, pp. 697–714 74. A. Moshaiov, A. Tal, Family bootstrapping: a genetic transfer learning approach for onsetting the evolution for a set of related robotic tasks, in 2014 IEEE Congress on Evolutionary Computation (CEC) (IEEE, New York, 2014), pp. 2801–2808 75. J. Mouret, G. Maguire, Quality diversity for multi-task optimization, in Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 2020, pp. 121–129 76. J. Mouret, J. Clune, Illuminating search spaces by mapping elites (2015). Preprint, arXiv:1504.04909 77. C. Wang, J. Liu, K. Wu, Z. Wu, Solving multi-task optimization problems with adaptive knowledge transfer via anomaly detection, in IEEE Transactions on Evolutionary Computation, 2021 78. T. Yu, D. Quillen, Z. He, R. Julian, K. Hausman, C. Finn, S. Levine, Meta-world: a benchmark and evaluation for multi-task and meta reinforcement learning, in Conference on Robot Learning (PMLR, New York, 2020), pp. 1094–1100 79. J. Rubio-Hervas, A. Gupta, Y. Ong, Data-driven risk assessment and multicriteria optimization of UAV operations. Aerospace Sci. Technol. 77, 510–523 (2018) 80. Y. Ong, A. Gupta, Evolutionary multitasking: a computer science view of cognitive multitasking. Cognit. Comput. 8(2), 125–142 (2016) 81. Y. Zhou, T. Wang, X. Peng, MFEA-IG: a multi-task algorithm for mobile agents path planning, in 2020 IEEE Congress on Evolutionary Computation (CEC) (IEEE, New York, 2020), pp. 1–7 82. J. Yi, J. Bai, H. He, W. Zhou, L. Yao, A multifactorial evolutionary algorithm for multitasking under interval uncertainties, in IEEE Transactions on Evolutionary Computation, 2020 83. K.K. Bali, A. Gupta, Y. Ong, P.S. Tan, Cognizant multitasking in multiobjective multifactorial evolution: MO-MFEA-II, in IEEE Transactions on Cybernetics, 2020 84. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Comput. 6(2), 182–197 (2002)
References
211
85. L. While, P. Hingston, L. Barone, S. Huband, A faster algorithm for calculating hypervolume. IEEE Trans. Evolut. Comput. 10(1), 29–38 (2006) 86. T. Rios, B. van Stein, T. Bäck, B. Sendhoff, S. Menzel, Multi-task shape optimization using a 3d point cloud autoencoder as unified representation, in IEEE Transactions on Evolutionary Computation, 2021 87. W. Dai, Z. Wang, K. Xue, System-in-package design using multi-task memetic learning and optimization, in Memetic Computing, 2021 88. X. Xue, K. Zhang, K.C. Tan, L. Feng, J. Wang, G. Chen, X. Zhao, L. Zhang, J. Yao, Affine transformation-enhanced multifactorial optimization for heterogeneous problems, in IEEE Transactions on Cybernetics, 2020 89. J. Liang, K. Qiao, M. Yuan, K. Yu, B. Qu, S. Ge, Y. Li, G. Chen, Evolutionary multi-task optimization for parameters extraction of photovoltaic models. Energy Convers. Manage. 207, 112509 (2020) 90. J. Liu, P. Li, G. Wang, Y. Zha, J. Peng, G. Xu, A multitasking electric power dispatch approach with multi-objective multifactorial optimization algorithm. IEEE Access 8, 155902–155911 (2020) 91. D. Wu, X. Tan, Multitasking genetic algorithm (mtga) for fuzzy system optimization. IEEE Trans. Fuzzy Syst. 28(6), 1050–1061 (2020) 92. G. Avigad, A. Moshaiov, Interactive evolutionary multiobjective search and optimization of set-based concepts, IEEE Trans. Syst. Man Cybernet. Part B (Cybernetics) 39(4), 1013–1027 (2009) 93. G. Yokoya, H. Xiao, T. Hatanaka, Multifactorial optimization using artificial bee colony and its application to car structure design optimization, in 2019 IEEE Congress on Evolutionary Computation (CEC) (IEEE, New York, 2019), pp. 3404–3409 94. H. Xiao, G. Yokoya, T. Hatanaka, Multifactorial pso-fa hybrid algorithm for multiple car design benchmark, in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) (IEEE, New York, 2019), pp. 1926–1931 95. T. Kohira, H. Kemmotsu, O. Akira, T. Tatsukawa, Proposal of benchmark problem based on real-world car structure design optimization, in Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2018, pp. 183–184 96. Z. Wang, X. Wang, Multiobjective multifactorial operation optimization for continuous annealing production process. Ind. Eng. Chem. Res. 58, 19166–19178 (2019) 97. C. Yang, J. Ding, Y. Jin, C. Wang, T. Chai, Multitasking multiobjective evolutionary operational indices optimization of beneficiation processes, in IEEE Transactions on Automation Science and Engineering, 2018, pp. 1–12 98. R. Lim, L. Zhou, A. Gupta, Y. Ong, A.N. Zhang, Solution representation learning in multiobjective transfer evolutionary optimization. IEEE Access 9, 41844–41860 (2021) 99. J.W.E. Tay, H.Y. Ng, P.S. Tan, Model factory@ simtech-sense and response manufacturing for industry 4.0, in Implementing Industry 4.0, 2021, p. 399 100. S. Jiang, C. Xu, A. Gupta, L. Feng, Y. Ong, A.N. Zhang, P.S. Tan, Complex and intelligent systems in manufacturing. IEEE Potentials 35(4), 23–28 (2016) 101. N.T. Tam, V.T. Dat, P.N. Lan, H.T.T. Binh, A. Swami, et al., Multifactorial evolutionary optimization to maximize lifetime of wireless sensor network. Inf. Sci. 576, 355–373 (2021) 102. T.T. Huong, H.T.T. Binh, et al., A multi-task approach for maximum survival ratio problem in large-scale wireless rechargeable sensor networks, in 2021 IEEE Congress on Evolutionary Computation (CEC) (IEEE, New York, 2021), pp. 1688–1695 103. J. Park, Y. Mei, S. Nguyen, G. Chen, M. Zhang, Evolutionary multitask optimisation for dynamic job shop scheduling using niched genetic programming, in Australasian Joint Conference on Artificial Intelligence (Springer, New York, 2018), pp. 739–751 104. F. Zhang, Y. Mei, S. Nguyen, M. Zhang, K.C. Tan, Surrogate-assisted evolutionary multitask genetic programming for dynamic flexible job shop scheduling, in IEEE Transactions on Evolutionary Computation, 2021 105. Q. Shang, Y. Huang, Y. Wang, M. Li, L. Feng, Solving vehicle routing problem by memetic search with evolutionary multitasking, in Memetic Computing, 2022, pp. 1–14
212
References
106. E. Osaba, A.D. Martinez, J.L. Lobo, I. Laña, J. Del Ser, On the transferability of knowledge among vehicle routing problems by using cellular evolutionary multitasking, in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (IEEE, New York, 2020) 107. A. Rauniyar, R. Nath, P.K. Muhuri, Multi-factorial evolutionary algorithm based novel solution approach for multi-objective pollution-routing problem. Comput. Ind. Eng. 130, 757– 771 (2019) 108. L. Feng, L. Zhou, A. Gupta, J. Zhong, Z. Zhu, K. Tan, K. Qin, Solving generalized vehicle routing problem with occasional drivers via evolutionary multitasking, in IEEE Transactions on Cybernetics, 2019 109. T.B. Thang, N.B. Long, N.V. Hoang, H.T.T. Binh, Adaptive knowledge transfer in multifactorial evolutionary algorithm for the clustered minimum routing cost problem. Appl. Soft Comput. 105, 107253 (2021) 110. T.P. Dinh, B.H.T. Thanh, T.T. Ba, L.N. Binh, Multifactorial evolutionary algorithm for solving clustered tree problems: competition among cayley codes. Memetic Comput. 12, 185–217 (2020) 111. Y. Yuan, Y. Ong, A. Gupta, P.S. Tan, H. Xu, Evolutionary multitasking in permutation-based combinatorial optimization problems: Realization with TSP, QAP, LOP, and JSP, in 2016 IEEE Region 10 Conference (TENCON) (IEEE, New York, 2016), pp. 3157–3164 112. X. Hao, R. Qu, J. Liu, A unified framework of graph-based evolutionary multitasking hyperheuristic, in IEEE Transactions on Evolutionary Computation, 2020 113. F. Zhang, Y. Mei, S. Nguyen, K.C. Tan, M. Zhang, Multitask genetic programmingbased generative hyperheuristics: a case study in dynamic scheduling, IEEE Transactions on Cybernetics, 2021 114. M. Harman, S.A. Mansouri, Y. Zhang, Search-based software engineering: trends, techniques and applications. ACM Comput. Surv. 45(1), 1–61 (2012) 115. L. Bao, Y. Qi, M. Shen, X. Bu, J. Yu, Q. Li, P. Chen, An evolutionary multitasking algorithm for cloud computing service composition, in World Congress on Services (Springer, New York, 2018), pp. 130–144 116. C. Wang, H. Ma, G. Chen, S. Hartmann, Evolutionary multitasking for semantic web service composition, in 2019 IEEE Congress on Evolutionary Computation (CEC), 2019, pp. 2490– 2497 117. R. Sagarna, Y. Ong, Concurrently searching branches in software tests generation through multitask evolution, in 2016 IEEE Symposium Series on Computational Intelligence (IEEE, New York, 2016) 118. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C (Cambridge University Press, New York, 1988) 119. M. Ehrgott, Multicriteria Optimization (Springer, Berlin, Heidelberg, 2005) 120. M. Iqbal, N.W. Browne, M. Zhang, Reusing building blocks of extracted knowledge to solve complex, large-scale boolean problems. IEEE Trans. Evolut. Comput. 18(4), 465–480 (2014) 121. R. Mills, T. Jansen, R.A. Watson, Transforming evolutionary search into higher-level evolutionary search by capturing problem structure. IEEE Trans. Evolut. Comput. 18(5), 628– 642 (2014) 122. X. Chen, Y.S. Ong, M.H. Lim, K.C. Tan, A multi-facet survey on memetic computation. IEEE Trans. Evolut. Comput. 15(5), 591–607 (2011) 123. Y.S. Ong, M.H. Lim, X. Chen, Memetic computation - past, present & future [research frontier]. IEEE Comput. Intell. Mag. 5(2), 24–31 (2010) 124. A.H. Wright, M.D. Vose, J.E. Rowe, Implicit parallelism. Lect. Notes Comp. Sci. 2724, 1505–1517 (2003) 125. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1st edn. (Addison-Wesley Longman Publishing Co., Inc., New York, 1989) 126. C.R. Cloninger J. Rice, T. Reich, Multifactorial inheritance with cultural transmission and assortative mating. I. Description and basic properties of the unitary models. Am. J. Hum. Genet. 30, 618–643 (1978)
References
213
127. J.C. Bean, Genetic algorithms and random keys for sequencing and optimization. ORSA J. Comput. 6(2), 154–160 (1994) 128. J. Rice, C.R. Cloninger, T. Reich, Multifactorial inheritance with cultural transmission and assortative mating. II. A general model of combined polygenic and cultural inheritance. Am. J. Hum. Genet. 31, 176–198 (1979) 129. C. Lin, S. Lu, Scheduling scientific workflows elastically for cloud computing, in 2011 IEEE 4th International Conference on Cloud Computing, 2011, pp. 746–747 130. Y.S. Ong, Z. Zhou, D. Lim, Curse and blessing of uncertainty in evolutionary algorithm using approximation, in 2006 IEEE International Conference on Evolutionary Computation, 2006 131. P. Chauhan, K. Deep, M. Pant, Novel inertia weight strategies for particle swarm optimization. Memetic Comput. 5(3), 229–251 (2013) 132. K. Deb, R.B. Agrawal, Simulated binary crossover for continuous search space. Complex Syst. 9(2), 115–148 (1995) 133. R. Hinterding, Gaussian mutation and self-adaption for numeric genetic algorithms, in Proceedings of 1995 IEEE International Conference on Evolutionary Computation, vol. 1, (1995), p. 384 134. R. Meuth, M.H. Lim, Y.S. Ong, D.C. Wunsch, A proposition on memes and meta-memes in computing for higher-order learning. Memetic Comput. 1(2), 85–100 (2009) 135. Y.S. Ong, A.J. Keane, Meta-lamarckian learning in memetic algorithms. IEEE Trans. Evolut. Comput. 8(2), 99–110 (2004) 136. B. Da, Y.S. Ong, L. Feng, A.K. Qin, A. Gupta, Z. Zhu, C.K. Ting, K. Tang, X. Yao, Evolutionary multitasking for single-objective continuous optimization: benchmark problems, performance metric, and baseline results. Preprint. arXiv:1706.03470 (2017) 137. Y. Yuan, Y.S. Ong, L. Feng, A.K. Qin, A. Gupta, B. Da, Q. Zhang, K.C. Tan, Y. Jin, H. Ishibuchi, Evolutionary multitasking for multiobjective continuous optimization: Benchmark problems, performance metrics and baseline results (2017). Preprint. arXiv:1706.02766 138. J. Liang, B. Qu, P.N. Suganthan, Problem definitions and evaluation criteria for the CEC 2014 special session and competition on single objective real-parameter numerical optimization. Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou China and Technical Report, Nanyang Technological University, Singapore, 2013 139. H. Li, Q. Zhang, Multiobjective optimization problems with complicated pareto sets, MOEA/D and NSGA-II. IEEE Trans. Evolut. Comput. 13(2), 284–302 (2009) 140. N. Durand, J.M. Alliot, Genetic crossover operator for partially separable functions, in Proceedings of the 3rd Annual Conference on Genetic Programming, 1998, pp. 487–494 141. J. Lin, S. Huang, M. Jiau, An evolutionary multiobjective carpool algorithm using set-based operator based on simulated binary crossover, IEEE Transactions on Cybernetics, 2018, pp. 1–11 142. S. Hui, P.N. Suganthan, Ensemble and arithmetic recombination-based speciation differential evolution for multimodal optimization. IEEE Trans. Cybernet. 46(1), 64–74 (2016) 143. F. Herrera, M. Lozano, A.M. Sánchez, A taxonomy for the crossover operator for real-coded genetic algorithms: an experimental study. Int. J. Intell. Syst. 18(3), 309–338 (2003) 144. T. Jones, Crossover, macromutation, and population-based search, in Proceedings of the Sixth International Conference on Genetic Algorithms, 1995, pp. 73–80 145. E. Falkenauer, The worth of the uniform [uniform crossover], in Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 1, 1999, pp. 776– 782 146. Z. Michalewicz, S.J. Hartley, Genetic algorithms + data structures = evolution programs. Math. Intell. 18(3), 71 (1996) 147. J. Feder, E.L. Hinrichsen, K.J. Måløy, T. Jøssang, Geometrical crossover and self-similarity of dla and viscous fingering clusters. Phys. D: Nonlinear Phenom. 38(1–3), 104–111 (1989) 148. A.H. Wright, Genetic algorithms for real parameter optimization, in Foundations of Genetic Algorithms, vol. 1 (1991), pp. 205–218 149. H.M. Voigt, H. Mühlenbein, D. Cvetkovic, Fuzzy recombination for the breeder genetic algorithm. in Proceedings of the sixth International Conference on Genetic Algorithms, 1995
214
References
150. L.J. Eshelman, J.D. Schaffer, Real-coded genetic algorithms and interval-schemata, in Foundations of Genetic Algorithms, vol. 2 (Morgan Kaufmann, San Mateo, 1993), pp. 187– 202 151. K. Deb, K. Sindhya, T. Okabe, Self-adaptive simulated binary crossover for real-parameter optimization, in Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, 2007, pp. 1187–1194 152. S. Patro, K.K. Sahu, Normalization: a preprocessing stage (2015). Preprint. arXiv:1503.06462 153. M. Iqbal, B. Xue, H.A.-Sahaf, M. Zhang, Cross-domain reuse of extracted knowledge in genetic programming for image classification. IEEE Trans. Evolut. Comput. 21(4), 569–587 (2017) 154. D. O’Neill, H.A.-Sahaf, B. Xue, M. Zhang, Common subtrees in related problems: a novel transfer learning approach for genetic programming. Proceedings of 2017 IEEE Congress on Evolutionary Computation (CEC 2017), 2017, pp. 1287–1294 155. L.J. Eshelman, R. Caruana, J.D. Schaffer, Biases in the crossover landscape, in Proceedings of the 3rd International Conference on Genetic Algorithms, 1989, pp. 10–19 156. K.F. Man, K.S. Tang, S. Kwong, Genetic algorithms: concepts and applications [in engineering design]. IEEE Trans. Ind. Electron. 43(5), 519–534 (1996) 157. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, Cambridge, 2016) http:// www.deeplearningbook.org 158. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS’12 (2012), pp. 1097–1105 159. A. Bordes, X. Glorot, J. Weston, Y. Bengio, Joint learning of words and meaning representations for open-text semantic parsing, in International Conference on Artificial Intelligence and Statistics, 2012, pp. 127–135 160. X. Glorot, A. Bordes, Y. Bengio, Domain adaptation for large-scale sentiment classification: A deep learning approach, in Proceedings of the Twenty-eight International Conference on Machine Learning, ICML’11, 2011, pp. 97–110 161. L. Feng, Y.S. Ong, S. Jiang, A. Gupta, Autoencoding evolutionary search with learning across heterogeneous problems. IEEE Trans. Evolut. Comput. (in press, 2017) 162. A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, J. Schmidhuber, A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Patt. Anal. Mach. Intell. 31(5), 855–868 (2009) 163. P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010) 164. C. Bishop, Pattern Recognition and Machine Learning (Springer, New York, 2006) 165. M. Potter, K. De Jong, Cooperative coevolution: An architecture for evolving coadapted subcomponents. Evol. Comput. 8(1), 1–29 (2000) 166. R.B. Agrawal, K. Deb, R.B. Agrawal, Simulated binary crossover for continuous search space. Complex Syst. 9(2), 115–148 (1995) 167. R. Storn, K. Price, Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997) 168. E. Zitzler, M. Laumanns, L. Thiele, Spea2: improving the strength pareto evolutionary algorithm, in Proceedings of Evolutionary Methods for Design, Optimisation and Control with Application to Industrial Problems (EUROGEN), pp. 95–100 (2001) 169. F.H. Liu, S.Y. Shen, The fleet size and mix vehicle routing problem with time windows. J. Oper. Res. Soc. 50(7), 721–732 (1999) 170. K. Dorling, J. Heinrichs, G.G. Messier, S. Magierowski, Vehicle routing problems for drone delivery. IEEE Trans. Syst. Man Cybernet. Syst. 47(1), 70–85 (2017) 171. J. Wang, Y. Zhou, Y. Wang, J. Zhang, C.L.P. Chen, Z. Zheng, Multiobjective vehicle routing problems with simultaneous delivery and pickup and time windows: formulation, instances, and algorithms. IEEE Trans. Cybernet. 46(3), 582–594 (2016)
References
215
172. D.J. Bertsimas, A vehicle routing problem with stochastic demand. Oper. Res. 40(3), 574–585 (1992) 173. J. Renaud, G. Laporte, F.F. Boctor, A tabu search heuristic for the multi-depot vehicle routing problem. Comput. Oper. Res. 23(3), 229–235 (1996) 174. R.W. Bent, P.V. Hentenryck, Scenario-based planning for partially dynamic vehicle routing with stochastic customers. Oper. Res. 52(6), 977–987 (2004) 175. G.B. Dantzig, J.H. Ramser, The truck dispatching problem. Manage. sci. 6(1), 80–91 (1959) 176. M.M. Solomon, Algorithms for the vehicle routing and scheduling problems with time window constraints. Oper. Res. 35(2), 254–265 (1987) 177. J. Wang, T. Weng, Q. Zhang, A two-stage multiobjective evolutionary algorithm for multiobjective multidepot vehicle routing problem with time windows. IEEE Trans. Cybernet. 49(7), 2467–2478 (2019) 178. A. Gupta, C.K. Heng, Y.S. Ong, P.S. Tan, A.N. Zhang, A generic framework for multicriteria decision support in eco-friendly urban logistics systems. Expert Syst. Appl. 71, 288– 300 (2017) 179. Y. Kuo, C.C. Wang, A variable neighborhood search for the multi-depot vehicle routing problem with loading cost. Expert Syst. Appl. 39(8), 6949–6954 (2012) 180. F. Hernandez, D. Feillet, R. Giroudeau, O. Naud, Branch-and-price algorithms for the solution of the multi-trip vehicle routing problem with time windows. Eur. J. Oper. Res. 249(2), 551– 559 (2016) 181. K.C. Tan, Y.H. Chew, L.H. Lee, A hybrid multiobjective evolutionary algorithm for solving vehicle routing problem with time windows. Comput. Optim. Appl. 34(1), 115–151 (2006) 182. M. Schneider, A. Stenger, D. Goeke, The electric vehicle-routing problem with time windows and recharging stations. Transport. Sci. 48(4), 500–520 (2014) 183. M. Keskin, G. Laporte, B. Catay, Electric vehicle routing problem with time-dependent waiting times at recharging stations. Comput. Oper. Res. 107, 77–94 (2019) 184. S. Pelletier, O. Jabali, G. Laporte, The electric vehicle routing problem with energy consumption uncertainty. Transport. Res. Part B: Methodol. 126, 225–255 (2019) 185. G. Kim, Y.S. Ong, T. Cheong, P.S. Tan, Solving the dynamic vehicle routing problem under traffic congestion. IEEE Trans. Intell. Transport. Syst. 17(8), 2367–2380 (2016) 186. T. Hintsch, S. Irnich, Exact solution of the soft-clustered vehicle-routing problem, in European Journal of Operational Research, 2019 187. I. Rodríguez-Martín, J.-J. Salazar-González, H. Yaman, The periodic vehicle routing problem with driver consistency. Eur. J. Oper. Res. 273(2), 575–584 (2019) 188. Z. Wang, J.-B. Sheu, Vehicle routing problem with drones. Transport. Res. Part B: Methodol. 122, 350–364 (2019) 189. D. Schermer, M. Moeini, O. Wendt, A matheuristic for the vehicle routing problem with drones and its variants. Transport. Res. Part C: Emerg. Technol. 106, 166–204 (2019) 190. L. Dahle, H. Andersson, M. Christiansen, The vehicle routing problem with dynamic occasional drivers, in International Conference on Computational Logistics, 2017, pp. 49– 63 191. P. Toth, D. Vigo, The Vehicle Routing Problem. Monographs on Discrete Mathematics and Applications (Society for Industrial and Applied Mathematics, Philadelphia, 2002) 192. N. Christofides, The Vehicle Routing Problem (Tsinghua University Press, Beijing, 2011) 193. K. Braekers, K. Ramaekers, I.V. Nieuwenhuyse, The vehicle routing problem: State of the art classification and review. Comput. Ind. Eng. 99, 300–313 (2016) 194. S. Reil, A. Bortfeldt, L. Mönch, Heuristics for vehicle routing problems with backhauls, time windows, and 3d loading constraints. Eur. J. Oper. Res. 266(3), 877–894 (2018) 195. M. Schneider, F. Schwahn, D. Vigo, Designing granular solution methods for routing problems with time windows. Eur. J. Oper. Res. 263(2), 493–509 (2017) 196. C. Archetti, M. Savelsbergh, M.G. Speranza, The vehicle routing problem with occasional drivers. Eur. J. Oper. Res. 254(2), 472–480 (2016) 197. C. Koc, T. Bektas, O. Jabali, G. Laporte, A hybrid evolutionary algorithm for heterogeneous fleet vehicle routing problems with time windows. Comput. Oper. Res. 64, 11–27 (2015)
216
References
198. M. Mirabi, A novel hybrid genetic algorithm for the multidepot periodic vehicle routing problem. AI EDAM 29(1), 45–54 (2015) 199. C. Prins, Two memetic algorithms for heterogeneous fleet vehicle routing problems. Eng. Appl. AI 22(6), 916–928 (2009) 200. I.M. Oliver, D.J. Smith, J.R.C. Holland, A study of permutation crossover operators on the traveling salesman problem, in Proceedings of the Second International Conference on Genetic Algorithms on Genetic Algorithms and Their Application, 1987, pp. 224–230 201. J.Y. Potvin, C. Duhamel, F. Guertin, A genetic algorithm for vehicle routing with backhauling. Appl. Intell. 6(4), 345–355 (1996) 202. G. Rudolph, Evolutionary search under partially ordered fitness sets, in International Symposium on Information Science Innovations in Engineering of Natural and Artificial Intelligent Systems, 2001, pp. 818–822 203. E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results. Evolut. Comput. 8(2), 173–95 (2000) 204. N. Labadi, C. Prins, M. Reghioui, A memetic algorithm for the vehicle routing problem with time windows. RAIRO - Oper. Res. 42(3), 415–431 (2008) 205. J. Leung, L. Kelly, J.H. Anderson, Handbook of Scheduling: Algorithms, Models, and Performance Analysis (CRC Press, Inc., Boca Raton, FL, 2004) 206. G. Laporte, Fifty years of vehicle routing. Transport. Sci. 43(4), 408–416 (2009) 207. G. Dantzig, J.H. Ramser, The truck dispatching problem. Manage. Sci. 6(4), 80–91 (1959) 208. A.N. Letchford, J.-J. Salazar-Gonzalez, The capacitated vehicle routing problem: stronger bounds in pseudo-polynomial time. Eur. J. Oper. Res. 272(1), 24–31 (2019) 209. G. Laporte, F. Semet, The vehicle routing problem, in Classical Heuristics for the Capacitated VRP (SIAM, Philadelphia, PA, 2001), pp. 109–128 210. T.K. Ralphs, L. Kopman, W.R. Pulleyblank, L.E. Trotter, On the capacitated vehicle routing problem. Math. Programm. 94(2), 343–359 (2003) 211. D. Xu, Y. Huang, Z. Zeng, X. Xu, Human gait recognition using patch distribution feature and locality-constrained group sparse representation. IEEE Trans. Image Process. 21(1), 316–326 (2012) 212. K. Koh, S.J. Kim, S. Boyd, An interior-point method for large-scale l1-regularized logistic regression. J. Mach. Learn. Res. 8, 1519–1555 (2007) 213. S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010) 214. J. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1), 1–27 (1964) 215. C. Prins, A simple and effective evolutionary algorithm for the vehicle routing problem. Comput. Operat. Res. 31(12), 1985–2002 (2004) 216. T.A.M. Toffolo, T. Vidal, T. Wauters, Heuristics for vehicle routing problems: Sequence or set optimization? Comput. Oper. Res. 105, 118–131 (2018) 217. L. Zhou, L. Feng, J. Zhong, Y.-S. Ong, Z. Zhu, E. Sha, Evolutionary multitasking in combinatorial search spaces: A case study in capacitated vehicle routing problem, in 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (IEEE, New York, 2016), pp. 1–8 218. K. Grauman, T. Darrell, The pyramid match kernel: efficient learning with sets of features. J. Mach. Learn. Res. 8, 725–760 (2007) 219. P. Yang, K. Tang, X. Yao, A parallel divide-and-conquer-based evolutionary algorithm for large-scale optimization. IEEE Access 7, 163105–163118 (2019) 220. J. Jian, Z. Zhan, J. Zhang, Large-scale evolutionary optimization: a survey and experimental comparative study. Int. J. Mach. Learn. Cybernet. 11(3), 729–745 (2020) 221. S. Mahdavi, M.E. Shiri, S. Rahnamayan, Metaheuristics in large-scale global continues optimization: a survey. Inf. Sci. 295(C), 407–428 (2015) 222. M.N. Omidvar, X. Li, Y. Mei, X. Yao, Cooperative co-evolution with differential grouping for large scale optimization. IEEE Trans. Evolut. Comput. 18(3), 378–393 (2014)
References
217
223. Z. Yang, K. Tang, X. Yao, Multilevel cooperative coevolution for large scale optimization, in IEEE Congress on Evolutionary Computation, 2008, pp. 1663–1670 224. Z. Yang, K. Tang, X. Yao, Large scale evolutionary optimization using cooperative coevolution. Inf. Sci. 178(15), 2985–2999 (2008) 225. X. Li, X. Yao, Cooperatively coevolving particle swarms for large scale optimization. IEEE Trans. Evolut. Comput. 16(2), 210–224 (2012) 226. M.N. Omidvar, X. Li, X. Yao, Cooperative co-evolution with delta grouping for large scale non-separable function optimization, in IEEE Congress on Evolutionary Computation, 2010, pp. 1–8 227. Q. Yang, W. Chen, J. Zhang, Evolution consistency based decomposition for cooperative coevolution. IEEE Access 6, 51084–51097 (2018) 228. Y. Sun, M. Kirley, S.K. Halgamuge, Extended differential grouping for large scale global optimization with direct and indirect variable interactions, in Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation, 2015, pp. 313–320 229. Y. Mei, M.N. Omidvar, X. Li, X. Yao, A competitive divide-and-conquer algorithm for unconstrained large-scale black-box optimization. ACM Trans. Math. Softw. 42(2), 1–24 (2016) 230. Y. Sun, M. Kirley, S.K. Halgamuge, A recursive decomposition method for large scale continuous optimization. IEEE Trans. Evolut. Comput. 22(5), 647–661 (2018) 231. Y. Sun, X. Li, A. Ernst, M.N. Omidvar, Decomposition for large-scale optimization problems with overlapping components, in 2019 IEEE Congress on Evolutionary Computation (CEC), 2019, pp. 326–333 232. A. Kabán, J. Bootkrajang, R.J. Durrant, Toward large-scale continuous EDA: a random matrix theory perspective. Evolut. Comput. 24(2), 255–291 (2016) 233. H. Qian, Y. Yu, Scaling simultaneous optimistic optimization for high-dimensional nonconvex functions with low effective dimensions, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI Press, Palo Alto, CA, 2016), pp. 2000–2006 234. Y. Hou, N. Jiang, H. Ge, Q. Zhang, X. Qu, L. Feng, A. Gupta, Memetic multi-agent optimization in high dimensions using random embeddings, in IEEE Congress on Evolutionary Computation, 2019, pp. 135–141 235. C. He, L. Li, Y. Tian, X. Zhang, R. Cheng, Y. Jin, X. Yao, Accelerating large-scale multiobjective optimization via problem reformulation. IEEE Trans. Evol. Comput. 23(6), 949–961 (2019) 236. H. Zille, H. Ishibuchi, S. Mostaghim, Y. Nojima, A framework for large-scale multiobjective optimization based on problem transformation. IEEE Trans. Evol. Comput. 22(2), 260–275 (2018) 237. I.K. Fodor, A survey of dimension reduction techniques. Technical Report, 2002 238. X. Li, K. Tang, M.N. Omidvar, Z. Yang, K. Qin, Benchmark functions for the CEC’2013 special session and competition on large-scale global optimization. Technical Report, 2013 239. M.N. Omidvar, M. Yang, Y. Mei, X. Li, X. Yao, DG2: a faster and more accurate differential grouping for large-scale black-box optimization. IEEE Trans. Evolut. Comput. 21(6), 929– 942 (2017) 240. Q. Yang, W. Chen, J.D. Deng, Y. Li, T. Gu, J. Zhang, A level-based learning swarm optimizer for large-scale optimization. IEEE Trans. Evolut. Comput. 22(4), 578–594 (2018) 241. I.T. Jolliffe, Principal Component Analysis and Factor Analysis (Springer, New York, NY, 1986), pp. 115–128 242. S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, 2009, pp. 452–461 243. L. Li, W. Chu, J. Langford, R.E. Schapire, A contextual-bandit approach to personalized news article recommendation, in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 661–670 244. Q. Ding, Y. Liu, C. Miao, F. Cheng, H. Tang, A hybrid bandit framework for diversified recommendation (2020). Preprint. arXiv:2012.13245
218
References
245. W. Zhou, L. Feng, K.C. Tan, M. Jiang, Y. Liu, Evolutionary search with multi-view prediction for dynamic multi-objective optimization, in IEEE Transactions on Evolutionary Computation, 2021 246. M. Stein, J. Branke, H. Schmeck, Efficient implementation of an active set algorithm for large-scale portfolio selection. Comput. Oper. Res. 35(12), 3945–3961 (2008) 247. S. Mahdavi, M.E. Shiri, S. Rahnamayan, Metaheuristics in large-scale global continues optimization: a survey. Inf. Sci. 295, 407–428 (2015) 248. Y. Tian, X. Zheng, X. Zhang, Y. Jin, Efficient large-scale multiobjective optimization based on a competitive swarm optimizer. IEEE Trans. Cybernet. 50(8), 1–13 (2019) 249. L.M. Antonio, C.A.C. Coello, Use of cooperative coevolution for solving large scale multiobjective optimization problems. 2013 IEEE Congress Evolut. Comput. 2013(2), 2758– 2765 (2013) 250. X. Ma, F. Liu, Y. Qi, X. Wang, L. Li, L. Jiao, M. Yin, M. Gong, A multiobjective evolutionary algorithm based on decision variable analyses for multiobjective optimization problems with large-scale variables. IEEE Trans. Evolut. Comput. 20(2), 275–298 (2016) 251. A. Song, Q. Yang, W.N. Chen, J. Zhang, A random-based dynamic grouping strategy for large scale multi-objective optimization, in 2016 IEEE Congress on Evolutionary Computation, CEC 2016, 2016, pp. 468–475 252. L.M. Antonio, C.A. Coello Coello, Decomposition-based approach for solving large scale multi-objective problems, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9921(221551), 2016, pp. 525–534 253. X. Zhang, Y. Tian, R. Cheng, Y. Jin, A decision variable clustering-based evolutionary algorithm for large-scale many-objective optimization. IEEE Trans. Evolut. Comput. 22(1), 97–112 (2018) 254. M. Li, J. Wei, A cooperative co-evolutionary algorithm for large-scale multi-objective optimization problems. GECCO 2018 Companion - Proceedings of the 2018 Genetic and Evolutionary Computation Conference Companion, 2018, pp. 1716–1721 255. H. Chen, R. Cheng, J. Wen, H. Li, J. Weng, Solving large-scale many-objective optimization problems by covariance matrix adaptation evolution strategy with scalable small subpopulations. Inf. Sci. 509, 457–469 (2020) 256. B. Cao, J. Zhao, Z. Lv, X. Liu, A distributed parallel cooperative coevolutionary multiobjective evolutionary algorithm for large-scale optimization. IEEE Trans. Ind. Inform. 13(4), 2030–2038 (2017) 257. H. Chen, X. Zhu, W. Pedrycz, S. Yin, G. Wu, H. Yan, PEA: Parallel evolutionary algorithm by separating convergence and diversity for large-scale multi-objective optimization. Proceedings - International Conference on Distributed Computing Systems, 2018-July(July), 2018, pp. 223–232 258. J. Yi, L. Xing, G. Wang, J. Dong, A.V. Vasilakos, A.H. Alavi, L. Wang, Behavior of crossover operators in NSGA-III for large-scale optimization problems. Inf. Sci. 509, 470–487 (2020) 259. C. He, R. Cheng, D. Yazdani, Adaptive offspring generation for evolutionary large-scale multiobjective optimization, in IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020 260. Y. Tian, X. Zhang, C. Wang, Y. Jin, An evolutionary algorithm for large-scale sparse multiobjective optimization problems. IEEE Trans. Evolut. Comput. 24(2), 380–393 (2019) 261. H. Qian, Y. Yu, Solving high-dimensional multi-objective optimization problems with low effective dimensions, in 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, pp. 875–881 262. H. Zille, H. Ishibuchi, S. Mostaghim, Y. Nojima, A framework for large-scale multiobjective optimization based on problem transformation. IEEE Trans. Evolut. Comput. 22(2), 260–275 (2018) 263. C. He, L. Li, Y. Tian, X. Zhang, R. Cheng, Y. Jin, X. Yao, Accelerating large-scale multiobjective optimization via problem reformulation. IEEE Trans. Evolut. Comput. 23(6), 949–961 (2019)
References
219
264. R. Liu, J. Liu, Y. Li, J. Liu, A random dynamic grouping based weight optimization framework for large-scale multi-objective optimization problems. Swarm and Evolutionary Computation 55, 100684 (2020) 265. Y. Tian, C. Lu, X. Zhang, K.C. Tan, Y. Jin, Solving large-scale multiobjective optimization problems with sparse optimal solutions via unsupervised neural networks, in IEEE Transactions on Cybernetics, 2020 266. Y. Ge, W. Yu, Y. Lin, Y. Gong, Z. Zhan, W. Chen, J. Zhang, Distributed differential evolution based on adaptive mergence and split for large-scale optimization. IEEE Trans. Cybernet. 48(7), 2166–2180 (2017) 267. R. Cheng, Y. Jin, A competitive swarm optimizer for large scale optimization. IEEE Trans. Cybernet. 45(2), 191–204 (2014) 268. R. Cheng, Y. Jin, M. Olhofer, B. Sendhoff, Test problems for large-scale multiobjective and many-objective optimization. IEEE Trans. Cybernet. 47(12), 4108–4121 (2017) 269. H. Li, Q. Zhang, Multiobjective optimization problems with complicated pareto sets, MOEA/D and NSGA-II. IEEE Trans. Evolut. Comput. 13(2), 284–302 (2008) 270. A.J. Nebro, J.J. Durillo, J. Garcia-Nieto, C.C. Coello, F. Luna, E. Alba, SMPSO: a new pso-based metaheuristic for multi-objective optimization, in 2009 IEEE Symposium on Computational Intelligence in Multi-Criteria Decision-Making (MCDM) (IEEE, New York, 2009), pp. 66–73 271. Y. Tian, R. Cheng, X. Zhang, Y. Jin, Platemo: a matlab platform for evolutionary multiobjective optimization [educational forum]. IEEE Comput. Intell. Mag. 12(4), 73–87 (2017) 272. P.A. Bosman, D. Thierens, The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE Trans. Evolut. Comput. 7(2), 174–188 (2003) 273. A. Zhou, Y. Jin, Q. Zhang, B. Sendhoff, E. Tsang, Combining model-based and geneticsbased offspring generation for multi-objective optimization using a convergence criterion, in 2006 IEEE International Conference on Evolutionary Computation (IEEE, New York, 2006), pp. 892–899 274. F. Wilcoxon, S. Katti, R.A. Wilcox, Critical values and probability levels for the wilcoxon rank sum test and the wilcoxon signed rank test. Select. Tables Math. Stat. 1, 171–259 (1970) 275. M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 32(200), 675–701 (1937) 276. Y. Jin, T. Okabe, B. Sendhoff, Evolutionary multi-objective optimization approach to constructing neural network ensembles for regression, in Applications of Multi-Objective Evolutionary Algorithms (World Scientific, Singapore, 2004), pp. 635–673 277. D. Dheeru, E.K. Taniskidou, UCI machine learning repository, 2017