Adaptation, Learning, and Optimization 25
Liang Feng Yaqing Hou Zexuan Zhu
Optinformatics in Evolutionary Learning and Optimization
Adaptation, Learning, and Optimization Volume 25
Series Editors
Yew Soon Ong, Nanyang Technological University, Singapore, Singapore
Abhishek Gupta, Singapore Institute of Manufacturing Technology, Singapore, Singapore
Maoguo Gong, Xidian University, Xi'an, Shaanxi, China
The roles of adaptation, learning and optimization are becoming increasingly essential and intertwined. The capability of a system to adapt, either through modification of its physiological structure or via some revalidation process of internal mechanisms that directly dictate the response or behavior, is crucial in many real-world applications. Optimization lies at the heart of most machine learning approaches, while learning and optimization are two primary means to effect adaptation in various forms. They usually involve computational processes incorporated within the system that trigger parametric updating and knowledge or model enhancement, giving rise to progressive improvement. This book series serves as a channel to consolidate work related to topics linked to adaptation, learning and optimization in systems and structures. Topics covered under this series include:
• complex adaptive systems including evolutionary computation, memetic computing, swarm intelligence, neural networks, fuzzy systems, tabu search, simulated annealing, etc.
• machine learning, data mining & mathematical programming
• hybridization of techniques that span across artificial intelligence and computational intelligence for a synergistic alliance of strategies for problem-solving
• aspects of adaptation in robotics
• agent-based computing
• autonomic/pervasive computing
• dynamic optimization/learning in noisy and uncertain environments
• systemic alliance of stochastic and conventional search techniques
• all aspects of adaptation in man-machine systems.
This book series bridges the dichotomy of modern and conventional mathematical and heuristic/meta-heuristic approaches to bring about effective adaptation, learning and optimization. It propels the maxim that the old and the new can come together and be combined synergistically to scale new heights in problem-solving. To reach such a level, numerous research issues will emerge and researchers will find the book series a convenient medium to track the progress made. Indexed by SCOPUS, zbMATH, SCImago.
More information about this series at http://www.springer.com/series/8335
Liang Feng • Yaqing Hou • Zexuan Zhu
Optinformatics in Evolutionary Learning and Optimization
Liang Feng, College of Computer Science, Chongqing University, Chongqing, China
Yaqing Hou, School of Computer Science and Technology, Dalian University of Technology, Dalian, China
Zexuan Zhu, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
ISSN 1867-4534 ISSN 1867-4542 (electronic) Adaptation, Learning, and Optimization ISBN 978-3-030-70919-8 ISBN 978-3-030-70920-4 (eBook) https://doi.org/10.1007/978-3-030-70920-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Evolutionary Algorithms (EAs) are adaptive search approaches that take inspiration from the principles of natural selection and genetics. Due to their efficacy in global search and ease of use, EAs have been widely deployed to address complex optimization problems occurring in a plethora of real-world domains, including image processing, automation of machine learning, neural architecture search, urban logistics planning, etc. Despite the success enjoyed by EAs, it is worth noting that most existing EA optimizers conduct the evolutionary search process from scratch, ignoring the data that may have been accumulated from different problems solved in the past. However, today, it is well established that real-world problems seldom exist in isolation, such that harnessing the available data from related problems can yield useful information for more efficient problem-solving. Therefore, in recent years, there has been an increasing research trend of conducting knowledge learning and data processing along the course of an optimization process, with the goal of achieving accelerated search in conjunction with better solution quality. To this end, the term optinformatics has been coined in the literature for the incorporation of information processing and data mining (i.e., informatics) techniques into the optimization process. This book aims first to summarize recent algorithmic advances towards realizing the notion of optinformatics in evolutionary learning and optimization. The book also contains a variety of practical applications, including inter-domain learning in vehicle route planning, data-driven techniques for feature engineering in automated machine learning, as well as evolutionary transfer reinforcement learning. Future directions for algorithmic development in the field of evolutionary computation are also covered.
Chongqing, China    Liang Feng
Dalian, China    Yaqing Hou
Shenzhen, China    Zexuan Zhu
Contents
1 Introduction
   1.1 Evolutionary Learning and Optimization
   1.2 The Rise of Optinformatics in Evolutionary Computation
   1.3 Outline of Chapters
   References

2 Preliminary
   2.1 Meta-Heuristics
      2.1.1 Single-Solution-Based Meta-Heuristics
      2.1.2 Population-Based Meta-Heuristics
      2.1.3 Hybridization and Memetic Algorithms
   2.2 Knowledge Learning and Transfer in Meta-Heuristics
      2.2.1 Memory-Based Knowledge Reuse Method
      2.2.2 Model-Based Knowledge Reuse Method
      2.2.3 Transfer Learning Based Knowledge Reuse Method
      2.2.4 Online Knowledge Reuse Method
   2.3 Memetic Computation
   References

3 Optinformatics Within a Single Problem Domain
   3.1 Knowledge Reuse in The Form of Local Search
      3.1.1 Feature Selection
      3.1.2 Memetic Feature Selection
      3.1.3 Empirical Studies on Real-World Applications
      3.1.4 Summary
   3.2 Knowledge Reuse via Transfer Learning from Past Search Experiences
      3.2.1 Transfer Learning as Culture-Inspired Operators
      3.2.2 Learning from Past Experiences
      3.2.3 Proposed Formulations and Algorithms for Routing Problems
      3.2.4 Case Study on Capacitated Arc Routing Problem
      3.2.5 Case Study on Capacitated Vehicle Routing Problem
      3.2.6 Summary
   References

4 Optinformatics Across Heterogeneous Problem Domains and Solvers
   4.1 Knowledge Learning and Transfer Across Problem Domains Towards Advanced Evolutionary Optimization
      4.1.1 Knowledge Meme Shared by Problem Domains—CVRP & CARP
      4.1.2 A Common Problem Representation for CVR and CAR Problem
      4.1.3 Knowledge Meme Shared by CVRP and CARP
      4.1.4 Evolutionary Optimization with Learning and Evolution of Knowledge Meme Between CVRP and CARP
      4.1.5 Empirical Study
      4.1.6 Summary
   4.2 Evolutionary Knowledge Learning and Transfer in Multi-agent System
      4.2.1 Multi-agent System
      4.2.2 Reinforcement Learning for Individual Learning in MAS
      4.2.3 Transfer Learning in Multi-agent Reinforcement Learning Systems
      4.2.4 Realization of eTL with Learning Agents
      4.2.5 eTL with FALCON and BP
      4.2.6 Empirical Study
      4.2.7 Summary
   References

5 Potential Research Directions
   5.1 Deep Optinformatics in Evolutionary Learning and Optimization
   5.2 Evolutionary Knowledge Transfer Across Reinforcement Learning Tasks
   5.3 GPU Based Optinformatics in Evolutionary Learning and Optimization
   5.4 Theoretical Study of Optinformatics
   References
Chapter 1
Introduction
1.1 Evolutionary Learning and Optimization

In computer science, evolutionary learning and optimization denotes a class of nature-inspired evolutionary computation approaches which represents one of the main pillars of computational intelligence. It encompasses the theory, design, practical application and development of biologically motivated computational paradigms and algorithms. Over the years, with the rapid development of digital technologies and advanced communications, evolutionary learning and optimization has become an important technology for tackling the ever-increasing complexity encountered in many practical applications, e.g., automated deep neural network design [1], large-scale non-convex optimization [2], etc.

In the literature, there are many different variants of evolutionary learning and optimization methods. However, the common underlying idea behind these methods is the same: solving a given problem by evolving a population of individuals under the pressure of natural selection (survival of the fittest), as illustrated in Fig. 1.1. One of the most representative evolutionary learning and optimization methods is the evolutionary algorithm [3, 4]. Evolutionary algorithms (EAs) are adaptive search approaches that take inspiration from the principles of natural selection and genetics. They have been shown to solve nonlinear, multimodal, and discrete NP-hard problems effectively. As outlined in Algorithm 1, a standard EA proceeds in an iterative manner by generating a new population P(k + 1) of individuals from the previous population P(k). Every individual in the population is an encoded (in the form of binary, real, etc.) version of a tentative solution. An evaluation of the fitness or objective function associates a fitness value with every individual, indicating its suitability to the problem. The canonical algorithm applies stochastic operators such as selection, crossover, and mutation on the original population to arrive at an offspring population of individuals.
Fig. 1.1 Workflow of an EA
In a general formulation, reproduction operators are applied to create a temporary population P′(k) containing the offspring. The new individuals are subsequently evaluated before undergoing the selection operator to arrive at the new population P(k + 1). The stopping condition can be defined in the form of a maximum number of iterations allowed, or of finding a solution that matches a defined precision error, or otherwise.
Algorithm 1: Evolutionary Algorithm
1 Begin
2   k := 0;  /* Initialize the generation counter. */
3   Initialize and evaluate [P(k)];  /* Create an initial population. */
4   while Stopping conditions are not satisfied do
5     P′(k) := Reproduction[P(k)];  /* Apply crossover and mutation operators. */
6     Evaluate [P′(k)];
7     P(k + 1) := Select[P′(k), P(k)];  /* Create a new population. */
8     k := k + 1;  /* Increase the generation counter. */
9 End
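For readers who prefer working code to pseudocode, the following is a minimal, self-contained Python sketch of Algorithm 1 (not taken from the book). The binary encoding and the OneMax bit-counting fitness are illustrative stand-ins for the two problem-dependent components discussed below; tournament selection, one-point crossover and bit-flip mutation are likewise only one reasonable choice of operators.

import random

def fitness(individual):
    """Problem-dependent fitness: here simply the number of 1-bits (OneMax)."""
    return sum(individual)

def reproduce(population):
    """Create an offspring population P'(k) via tournament selection,
    one-point crossover and bit-flip mutation."""
    offspring = []
    n = len(population[0])
    while len(offspring) < len(population):
        p1, p2 = (max(random.sample(population, 2), key=fitness) for _ in range(2))
        cut = random.randrange(1, n)
        child = p1[:cut] + p2[cut:]                                # crossover
        child = [b ^ (random.random() < 1.0 / n) for b in child]   # mutation
        offspring.append(child)
    return offspring

def evolutionary_algorithm(n_bits=20, pop_size=30, generations=50):
    # Initialize and evaluate P(0)
    population = [[random.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):                                   # stopping condition
        offspring = reproduce(population)                          # P'(k)
        # Select P(k+1) from parents and offspring (elitist truncation)
        population = sorted(population + offspring,
                            key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

best = evolutionary_algorithm()
print(fitness(best), best)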
It can be observed in Algorithm 1 that an EA is easy to implement and transferable from one application to another, since only two components are problem dependent, i.e., the encoding scheme to represent a problem solution and the fitness function to evaluate the quality of a solution. The choice of suitable reproduction operators often relies on the encoding scheme and not on the specific problem being tackled, since the evolutionary mechanisms underpinning natural evolution are largely species independent. Furthermore, selection operators do not even depend on the encoding method, as they only consider fitness information. Therefore, for a given problem, a suitable evolutionary algorithm can be designed easily, as long as the problem-dependent solution representation can be encoded properly. In recent decades, EAs have been applied in many practical applications, and have achieved significant success in obtaining optimal or near-optimal solutions on a plethora of learning and optimization problems including feature selection [5, 6],
image processing [7, 8], multi-objective optimization [9–11], NP-hard combinatorial optimization [12, 13], multi-agent system [14, 15], dynamic optimization [16–18], etc.
1.2 The Rise of Optinformatics in Evolutionary Computation

Today, it is well recognized that the processes of learning and the transfer of what has been learned are central to humans in problem-solving [19]. Learning has been established as fundamental to humans in functioning and adapting to a fast-evolving society. Besides learning from the successes and mistakes of the past and learning to avoid making the same mistakes again, the human ability to select, generalize and draw upon what has been experienced and learned in one context, and to extend it to new problems, is deemed most remarkable [20]. Within the context of computational intelligence, several core learning technologies in neural and cognitive systems, fuzzy systems, and probabilistic and possibilistic reasoning have been notable for their ability to emulate some of humans' cultural and generalization capabilities, with many now used to enhance our daily life [21, 22].

In contrast, attempts to emulate the cultural intelligence of humans in evolutionary learning and optimization have to date received far less attention. In particular, existing evolutionary algorithms (EAs) have yet to fully exploit the useful traits that may exist in similar tasks (problems) or in the evolutionary search process. The study of evolutionary learning and optimization methodologies that evolve along with the problems solved has been under-explored in the context of evolutionary computation. In machine learning, as illustrated in Fig. 1.2, the idea of taking advantage of useful traits across or within the same problem domains towards enhanced learning performance has received significant interest under the term of transfer learning (TL) in the last decade [23, 24].
Fig. 1.2 An illustration of transfer learning
Nevertheless, the associated research progress of TL has largely been restricted to machine learning applications, such as computer vision [25], natural language processing [26], and speech recognition [27], where the availability of data makes it possible to ascertain the feasibility of knowledge transfer. In the case of evolutionary learning and optimization, there is often little problem-specific data available beforehand. Therefore, new online approaches that can harness recurrent patterns between problem-solving exercises or along the evolutionary search process are needed for automatic knowledge transfer in evolutionary learning and optimization.

In recent years, to incorporate problem learning on data generated during the search towards eventually accelerated and/or improved quality solutions from evolutionary learning and optimization, the term "optinformatics" has been defined as the "introduction of the informatics specialization for mining the data generated in the optimization process, with the aim of extracting possibly implicit and potentially useful information and knowledge" [28]. In particular, evolutionary computation does not have to be entirely a black-box approach that generates only global optimal or near-global optimal solutions. How the solutions are obtained in evolutionary search and how high-quality solutions can be transferred across problems may be brought to light through optinformatics. For instance, from the perspective of numerical optimization alone, methodologies using optinformatics include those hybridizing traditional EAs and local refinement/learning procedures, known as memetic algorithms (MAs) [29, 30]. In addition to combining meta-heuristics that complement one another, a common hybridization procedure is to incorporate transfer learning techniques within EAs in the spirit of Dawkins' notion of memes. The basic idea is to utilize data mining and domain adaptation techniques to dynamically exploit and transfer problem-specific knowledge so as to improve the effectiveness and efficiency of EAs [31–33]. Other examples of optinformatics include the use of response surface models in place of expensive and/or rugged objective functions [34], the incorporation of various statistical models in estimation of distribution algorithms [35], and the evolutionary transfer of knowledge sharing in multi-agent reinforcement learning systems [14], etc.
1.3 Outline of Chapters

To encourage the exploration of advanced designs of evolutionary computation approaches, this book introduces the concept of optinformatics in evolutionary learning and optimization, and provides specific algorithm developments of optinformatics in domains of complex optimization and learning problems. In order to provide the reader with a good appreciation of optinformatics, the book is divided into three parts, covering the background and preliminaries, specific algorithm designs, and a discussion of potential directions for further exploration. Part I, comprising this chapter and Chap. 2, gives the background and motivation of optinformatics in evolutionary learning and optimization. This chapter, in particular, introduces the concept of evolutionary computation as well as optinformatics, while
Chap. 2 presents brief reviews of meta-heuristics, transfer learning, and memetic computation, which are the main methodologies used in optinformatics. Part II, comprising Chaps. 3 and 4, presents the specific algorithm designs of optinformatics in evolutionary learning and optimization within a single problem domain and across heterogeneous problem domains. In particular, Chap. 3 provides manifestations of knowledge mining and reuse by local search and transfer learning for feature selection and NP-hard routing problems, respectively. Chapter 4 presents an EA with learning capability across vehicle and arc routing problem domains, as well as evolutionary transfer reinforcement learning in multi-agent systems. Part III, i.e., Chap. 5, concludes the book with a discussion on promising research directions of optinformatics for tackling the ever-increasing complexity of real-world learning and optimization problems, including deep optinformatics in evolutionary learning and optimization, optinformatics via reinforcement learning, meta-optinformatics in evolutionary learning and optimization, and the theoretical study of optinformatics.
References
1. Y. Sun, B. Xue, M. Zhang, G.G. Yen, Evolving deep convolutional neural networks for image classification. IEEE Trans. Evol. Comput. 24(2), 394–407 (2020)
2. Y. Tian, X. Zhang, C. Wang, Y. Jin, An evolutionary algorithm for large-scale sparse multiobjective optimization problems. IEEE Trans. Evol. Comput. 24(2), 380–393 (2020)
3. X. Yao, Global optimisation by evolutionary algorithms, in Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis, pp. 282–291 (1997)
4. T. Bäck, H. Schwefel, An overview of evolutionary algorithms for parameter optimization. Evol. Comput. 1(1), 1–23 (1993)
5. B. Xue, M. Zhang, W.N. Browne, X. Yao, A survey on evolutionary computation approaches to feature selection. IEEE Trans. Evol. Comput. 20(4), 606–626 (2016)
6. H. Vafaie, K. De Jong, Feature space transformation using genetic algorithms. IEEE Intell. Syst. Appl. 13(2), 57–65 (1998)
7. S.M. Bhandarkar, H. Zhang, Image segmentation using evolutionary computation. IEEE Trans. Evol. Comput. 3(1), 1–21 (1999)
8. M. Omari, S. Yaichi, Image compression based on genetic algorithm optimization, in 2015 2nd World Symposium on Web Applications and Networking (WSWAN), pp. 1–5 (2015)
9. E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans. Evol. Comput. 3(4), 257–271 (1999)
10. C.M. Fonseca, P.J. Fleming, An overview of evolutionary algorithms in multiobjective optimization. Evol. Comput. 3(1), 1–16 (1995)
11. K.C. Tan, T.H. Lee, E.F. Khor, Evolutionary algorithms with dynamic population size and local exploration for multiobjective optimization. IEEE Trans. Evol. Comput. 5(6), 565–588 (2001)
12. F. Neri, G.L. Cascella, N. Salvatore, G. Acciani, D. GASSI, A hierarchical evolutionary-deterministic algorithm in topological optimization of electrical grounding grids. WSEAS Trans. Syst. 4, 2338–2345, 11 (2005)
13. K. Tang, Y. Mei, X. Yao, Memetic algorithm with extended neighborhood search for capacitated arc routing problems. IEEE Trans. Evol. Comput. 13(5), 1151–1166 (2009)
14. Y. Hou, Y. Ong, L. Feng, J.M. Zurada, An evolutionary transfer reinforcement learning framework for multiagent systems. IEEE Trans. Evol. Comput. 21(4), 601–615 (2017)
15. J. Zhou, X. Zhao, X. Zhang, D. Zhao, H. Li, Task allocation for multi-agent systems based on distributed many-objective evolutionary algorithm and greedy algorithm. IEEE Access 8, 19306–19318 (2020)
16. H. Li, M. Li, J. Wang, The performance of genetic algorithms in dynamic optimization problems, in 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), pp. 364–369 (2008)
17. J. Brest, A. Zamuda, B. Boskovic, M.S. Maucec, V. Zumer, Dynamic optimization using self-adaptive differential evolution, in 2009 IEEE Congress on Evolutionary Computation, pp. 415–422 (2009)
18. M. Jiang, Z. Wang, S. Guo, X. Gao, K.C. Tan, Individual-based transfer learning for dynamic multiobjective optimization. IEEE Trans. Cybern. 1–14 (2020)
19. J. Bransford, A.L. Brown, R.R. Cocking, How People Learn: Brain, Mind, Experience, and School (The National Academies Press, Washington, DC, 1999)
20. J.P. Byrnes, Cognitive Development and Learning in Instructional Contexts (Pearson/Allyn and Bacon, 2008)
21. K.C. Tan, Y.J. Chen, K.K. Tan, T.H. Lee, Task-oriented developmental learning for humanoid robots. IEEE Trans. Ind. Electron. 52(3), 906–914 (2005)
22. H. Ishibuchi, K. Kwon, H. Tanaka, A learning algorithm of fuzzy neural networks with triangular fuzzy weights. Fuzzy Sets Syst. 71(3), 277–293 (1995)
23. K. Weiss, T.M. Khoshgoftaar, D.D. Wang, A survey of transfer learning. J. Big Data 3(9) (2016)
24. S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
25. C. Sferrazza, R. D'Andrea, Transfer learning for vision-based tactile sensing, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7961–7967 (2019)
26. S. Ruder, M.E. Peters, S. Swayamdipta, T. Wolf, Transfer learning in natural language processing, in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, pp. 15–18 (2019)
27. J. Kunze, L. Kirsch, I. Kurenkov, A. Krug, J. Johannsmeier, S. Stober, Transfer learning for speech recognition on a budget, in Proceedings of the 2nd Workshop on Representation Learning for NLP, pp. 168–177 (2017)
28. D. Lim, Y.S. Ong, A. Gupta, C.K. Goh, P.S. Dutta, Towards a new praxis in optinformatics targeting knowledge re-use in evolutionary computation: simultaneous problem learning and optimization. Evol. Intell. 9(4), 203–220 (2016)
29. Y.S. Ong, A.J. Keane, Meta-Lamarckian learning in memetic algorithms. IEEE Trans. Evol. Comput. 8(2), 99–110 (2004)
30. X. Chen, Y. Ong, M. Lim, K.C. Tan, A multi-facet survey on memetic computation. IEEE Trans. Evol. Comput. 15(5), 591–607 (2011)
31. A. Gupta, Y. Ong, L. Feng, Multifactorial evolution: toward evolutionary multitasking. IEEE Trans. Evol. Comput. 20(3), 343–357 (2016)
32. L. Feng, Y. Ong, M. Lim, I.W. Tsang, Memetic search with interdomain learning: a realization between CVRP and CARP. IEEE Trans. Evol. Comput. 19(5), 644–658 (2015)
33. L. Feng, L. Zhou, J. Zhong, A. Gupta, Y. Ong, K. Tan, A.K. Qin, Evolutionary multitasking via explicit autoencoding. IEEE Trans. Cybern. 49(9), 3457–3470 (2019)
34. D. Lim, Y. Jin, Y. Ong, B. Sendhoff, Generalizing surrogate-assisted evolutionary computation. IEEE Trans. Evol. Comput. 14(3), 329–355 (2010)
35. P. Larrañaga, J.A. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation (Kluwer Academic Publishers, USA, 2001)
Chapter 2
Preliminary
Evolutionary algorithms are one of the most well-established classes of meta-heuristics. Besides optinformatics in evolutionary learning and optimization, knowledge mining and reuse has also been proposed for improving the performance of many other meta-heuristic methods. To provide the reader with a comprehensive background of optinformatics, this chapter presents a review of meta-heuristics and of knowledge learning and transfer in meta-heuristics. Moreover, as optinformatics represents a form of memetic computation, which has been defined as a computational paradigm that incorporates the notion of meme(s) as basic units of transferable information encoded in computational representations for enhancing the performance of artificial evolutionary systems in the domain of search and optimization [1], an introduction to memetic computation is also presented in this chapter.
2.1 Meta-Heuristics

Meta-heuristics are widely referred to as solution methods that combine local improvement procedures and higher-level strategies to find sufficiently good solutions for an optimization problem with limited information and computational resources [2, 3]. Since meta-heuristics are usually problem-independent, in that they do not take advantage of the specificity of the problem, they are applicable to various optimization problems. They are capable of exploring the target search space efficiently to find near-optimal solutions in a reasonable running time, but they do not guarantee that the optimal solutions can be reached. Meta-heuristic algorithms are designed to pursue a tradeoff between exploration and exploitation. Exploration means generating diverse solutions so as to explore the whole search space, whereas exploitation focuses the search in a local region with the hope of improving the current promising solution in that region. To achieve good performance, a good balance between these two major components should be established.
The earliest meta-heuristics can be traced back to the mid-1960s and 1970s. For example, evolution strategies [4] and evolutionary programming [5] were proposed in 1965 and 1966, respectively, while genetic algorithms [6] and scatter search [7] were put forward in 1975 and 1977, respectively. The majority of meta-heuristics have their origins in the 1980s and 1990s. For instance, genetic programming [8], simulated annealing [9] and tabu search [10], introduced in the 1980s, mark a big step in the development of meta-heuristics. In the 1990s, meta-heuristics enjoyed a steady rise with the invention of ant colony optimization [11], particle swarm optimization [12, 13], and differential evolution [14]. Entering the 21st century, more meta-heuristics have been proposed in the literature, e.g., harmony search [15], the bacteria foraging algorithm [16], artificial bee colony [17], the firefly algorithm [18], and cuckoo search [19]. According to the way meta-heuristics manipulate solutions, existing methods can mainly be categorized into three groups, i.e., single-solution-based, population-based, and hybrid methods. Note that the three groups might not be mutually exclusive, and some meta-heuristic algorithms may be formed by taking ideas from each of them or may transit from one group to another.
2.1.1 Single-Solution-Based Meta-Heuristics

Single-solution-based meta-heuristics usually search for good solutions by modifying or improving a single candidate solution, e.g., simulated annealing [9], iterated local search [20], variable neighborhood search [21], and guided local search [22]. Small changes or moves are imposed on a single solution, mainly in the fashion of local search, where a new solution is generated in a neighborhood of the current one. Different local search strategies, e.g., greedy-based, improvement-first, and random strategies, lead to different neighborhood structures. The local search is conducted iteratively in the neighborhood until some stopping criterion or a local optimum is reached. A local optimum is the best solution outperforming all other solutions in its neighborhood. Nevertheless, the local optimum might not be the global optimum, and some escaping strategy should be used to guide the search toward other regions in the hope of finding better solutions. The simplest way to escape from a local optimum is to restart the search from a different position or to impose a big change on the current solution, as in iterated local search [20]. Another remedy for this issue is to use different search strategies or simply random moves once the search gets trapped in a local optimum, e.g., variable neighborhood search [21] and simulated annealing [9]. As a local optimum is relative to a specific local search strategy, using different local search strategies or random moves can lead to other local optima which are likely to improve the current solution. Memory structures such as the tabu list [10] and augmented objective functions [22] are also widely used to forbid or penalize the revisiting of previous local optima.
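As a concrete illustration of the single-solution search described above, the sketch below implements simulated annealing on a toy one-dimensional objective (this example is not from the book; the objective function, neighbourhood width and cooling schedule are arbitrary assumptions). Worse moves are accepted with a temperature-dependent probability, which is one way of escaping local optima.

import math
import random

def objective(x):
    # toy multimodal function to be minimised
    return x * x + 10.0 * math.sin(3.0 * x)

def simulated_annealing(x0=5.0, temp=10.0, cooling=0.95, steps=2000):
    current, best = x0, x0
    for _ in range(steps):
        candidate = current + random.gauss(0.0, 0.5)   # move within a neighbourhood
        delta = objective(candidate) - objective(current)
        # accept improving moves always, worsening moves with probability exp(-delta/T)
        if delta < 0 or random.random() < math.exp(-delta / max(temp, 1e-12)):
            current = candidate
        if objective(current) < objective(best):
            best = current
        temp *= cooling                                # cool down
    return best

print(simulated_annealing())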
2.1.2 Population-Based Meta-Heuristics

Population-based meta-heuristics maintain and improve a set of candidate solutions, where new promising solutions are generated via the combination of existing solutions in the population. Evolutionary algorithms, e.g., genetic algorithms [6], evolution strategies [4], genetic programming [8], evolutionary programming [5], and differential evolution [14], and swarm intelligence methods, e.g., ant colony optimization [11], particle swarm optimization [12, 13], and artificial bee colony [17], are the best-known population-based meta-heuristics. Evolutionary algorithms are a family of population-based stochastic search methods inspired by the natural process of evolution, including reproduction and selection. High-quality solutions in terms of fitness evaluation in the population are preferentially selected for reproduction, where specialized operators are used to combine attributes of two or more selected solutions into new ones. Swarm intelligence methods model the collective behavior of decentralized and self-organized systems to solve complex optimization problems.
2.1.3 Hybridization and Memetic Algorithms

Hybrid meta-heuristics combine operators from different meta-heuristics or other optimization approaches to take advantage of the strengths of the different components. In particular, the combination of population-based evolutionary algorithms and single-solution-based local search forms the well-known memetic algorithms (MAs) [23]; a minimal sketch of this combination is given after this paragraph. Local search is introduced to fine-tune the individuals of the evolutionary population and improve their convergence in local regions. In diverse contexts, MAs are also commonly known as hybrid evolutionary algorithms (EAs), Baldwinian EAs, Lamarckian EAs, cultural algorithms and genetic local search. Most MAs are typical hybridizations of an EA and local search, while MAs can also be derived from other meta-heuristic search methods (e.g., the Estimation of Distribution Algorithm based compact MA [24], Ant Colony Optimization based MA [25], Memetic Particle Swarm Optimization [26], Artificial Immune Systems inspired MA [27], etc.), or from special designs (e.g., Meta-Lamarckian MA [28], Coevolving MA [29], Adaptive Cellular MA [30], the Probabilistic Memetic Framework [31], etc.). For surveys of MA paradigms, the reader is referred to [32–34].
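The sketch below is an illustrative toy example, not any specific MA from the literature: inside every generation of a population-based search, a first-improvement bit-flip hill climber (the "meme", playing the role of local search) refines the offspring before selection. The OneMax problem, the operators and the parameters are all assumptions for illustration.

import random

def fitness(ind):
    return sum(ind)                          # OneMax as a placeholder problem

def local_search(ind, budget=5):
    # first-improvement hill climbing on single bit flips (Lamarckian: refinements are kept)
    for _ in range(budget):
        i = random.randrange(len(ind))
        neighbour = ind[:]
        neighbour[i] ^= 1
        if fitness(neighbour) > fitness(ind):
            ind = neighbour
    return ind

def memetic_algorithm(n_bits=30, pop_size=20, generations=40):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # global (evolutionary) step: uniform crossover of randomly paired parents
        children = []
        for _ in range(pop_size):
            p1, p2 = random.sample(pop, 2)
            children.append([random.choice(pair) for pair in zip(p1, p2)])
        # local (individual learning) step applied to every child
        children = [local_search(c) for c in children]
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

print(fitness(memetic_algorithm()))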
2.2 Knowledge Learning and Transfer in Meta-Heuristics

In practice, problems seldom exist in isolation, and previously encountered related problem instances often yield useful information that, when properly harnessed, can lead to more efficient future evolutionary search. To date, several attempts have been made to reuse solutions from search experiences, which can be categorized
as memory-based knowledge reuse methods, model-based knowledge reuse methods, transfer learning based knowledge reuse methods, and online knowledge reuse methods.
2.2.1 Memory-Based Knowledge Reuse Method

As indicated by the title, memory-based knowledge reuse methods often archive the high-quality solutions obtained on one problem, and directly reuse these solutions as biases to guide the optimization search on an unseen problem. For instance, Louis and McDonnell [35] presented a study that acquires problem-specific knowledge and subsequently uses it to aid the genetic algorithm (GA) search via case-based reasoning. Rather than starting anew on each problem, appropriate intermediate solutions drawn from similar problems that have been previously solved are periodically injected into the GA population. In a separate study, Cunningham and Smyth [36] also explored the reuse of established high-quality schedules from past problems to bias the search on new traveling salesman problems (TSPs). Similar ideas on implicit and explicit memory schemes to store elite solutions have also been considered in dynamic optimization problems, where the objective function, design variables, and environmental conditions may change with time (for example, periodic changes) [37]. However, as both [35, 36] as well as the works on dynamic optimization problems [35] generally considered the exact storage of past solutions or partial solutions from previously solved problems, and subsequently inserted them directly into the solution population of a new evolutionary search or the dynamic optimization search, they do not apply well to unseen related problems that bear differences in structural properties, such as problem vertex size, topological structures, representations, etc.
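The following fragment sketches the memory-based idea in the spirit of case-injected GAs such as [35]: part of the initial population for a new problem instance is drawn from an archive of elite solutions stored while solving earlier, structurally compatible instances, and the remainder is generated at random. The function name, injection ratio and toy archive are hypothetical, introduced purely for illustration.

import random

def seeded_population(archive, pop_size, n_bits, inject_ratio=0.25):
    """Build an initial population biased by archived elite solutions."""
    n_inject = min(len(archive), int(inject_ratio * pop_size))
    injected = random.sample(archive, n_inject)            # reuse past elites directly
    random_part = [[random.randint(0, 1) for _ in range(n_bits)]
                   for _ in range(pop_size - n_inject)]
    return injected + random_part

# hypothetical archive of elites from previously solved, similar instances
archive = [[1] * 10, [1] * 9 + [0]]
population = seeded_population(archive, pop_size=12, n_bits=10)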
2.2.2 Model-Based Knowledge Reuse Method

On the other hand, instead of reusing the exact past solutions, Pelikan and Hauschild [38] proposed to combine a pre-defined problem-specific distance metric with a prior distribution mined from previous optimization experience to improve model-directed optimization methods, e.g., the estimation of distribution algorithm (EDA). Further, Santana et al. [39] proposed to transfer structural information from subproblems (previous parameter settings) to bias the construction of the aggregation matrix of an EDA for solving the multi-marker tagging single-nucleotide polymorphism (SNP) selection problem. Next, Iqbal et al. [40] presented a study of reusing building blocks extracted from small-scale problems for more efficient problem-solving on complex large-scale problems based on a learning classifier system. However, since these transfer approaches are designed for model-based evolutionary optimization methods (e.g., EDA), they cannot be applied with model-free evolutionary algorithms, such as the genetic algorithm.
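To give a flavour of model-based reuse, the snippet below biases the initial probability vector of a simple univariate EDA with a prior marginal model assumed to have been mined from past runs on related problems. The blend weight and the prior values are illustrative assumptions, not parameters taken from the cited works.

def biased_initial_model(prior_probs, weight=0.3):
    """Blend a prior marginal model with the uninformative 0.5 vector."""
    return [weight * p + (1.0 - weight) * 0.5 for p in prior_probs]

prior = [0.9, 0.8, 0.1, 0.5, 0.95]           # marginals learned on a related problem (assumed)
initial_model = biased_initial_model(prior)  # starting model for the new EDA run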
2.2.3 Transfer Learning Based Knowledge Reuse Method

More recently, to capture general and structured knowledge that can be transferred across different problems and algorithms, several attempts have been made to leverage transfer learning to boost the optimization performance of meta-heuristics. For instance, in [41], instead of directly reusing past optimized routing solutions, Feng et al. proposed to learn a transformation matrix from optimized routing solutions, which is then used to guide the initialization of routing solutions for the evolutionary search on newly encountered, different routing problems. Haslam et al. [42] investigated the ability of genetic programming (GP) with transfer learning on different types of transfer scenarios for symbolic regression, and analyzed how the knowledge learned from the source domain was utilized during the learning process on the target domain. Further, Feng et al. [43] proposed a single-layer denoising autoencoder (DA) to build connections between problem domains by treating past optimized solutions as corrupted versions of the solutions for the newly encountered problem. In this way, optimized solutions found in one problem domain can be transferred to guide the search on other heterogeneous problems.
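A much-simplified sketch of the idea behind learning a mapping between problem domains is given below; it is not the authors' implementation. Given matrices of past optimized solutions from a source domain (S) and a target domain (T), a linear transformation M is fitted by least squares so that S @ M approximates T, and M is then used to transform source solutions into biased seeds for the target search. The toy data and dimensions are assumptions.

import numpy as np

def learn_mapping(S, T):
    """Least-squares solve for M in S @ M ~= T."""
    M, *_ = np.linalg.lstsq(S, T, rcond=None)
    return M

rng = np.random.default_rng(0)
S = rng.random((50, 8))                       # 50 past source-domain solutions (toy data)
T = rng.random((50, 6))                       # corresponding target-domain solutions (toy data)
M = learn_mapping(S, T)
seed_for_target = rng.random((1, 8)) @ M      # transferred solution to seed the target search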
2.2.4 Online Knowledge Reuse Method

In contrast to the other three categories, online knowledge reuse methods usually conduct the optimization search concurrently on multiple search spaces corresponding to different tasks or optimization problems, each possessing a unique function landscape. The knowledge transfer occurs while the optimization processes progress online. For instance, Gupta et al. presented a multifactorial evolutionary algorithm within the domains of single- and multi-objective optimization in [44, 45], respectively. The knowledge transfer between problems is realized via genetic crossover. In [46], Zhou et al. proposed a permutation-based unified representation and a split-based decoding operator for conducting online knowledge reuse across multiple NP-hard capacitated vehicle routing problems. Moreover, Ting et al. explored a resource allocation mechanism for reallocating fitness evaluations to the offspring of different tasks and proposed a framework, namely evolution of biocoenosis through symbiosis (EBS), for many-tasking optimization problems in [47, 48], respectively. In [49], towards positive knowledge sharing across tasks, Bali et al. proposed a linearized domain adaptation strategy that transforms the search space of a simple task into a search space similar to that of its constitutive complex task.
2.3 Memetic Computation

The term "meme" can be traced back to Dawkins in his book "The Selfish Gene", where he defined it as "a unit of information residing in the brain and is the replicator in human cultural evolution" [50]. While some researchers believe that memes should be formalized as information or knowledge restricted to the brain, others consider that the concept extends to artifacts and behaviors. In "The Meme Machine" [51], Blackmore reaffirmed the meme as knowledge copied from one person to another and discussed the theory of "memetic selection" as the survival of the fittest among competitive ideas down through generations. Other researchers, on the other hand, have defined and discussed memes in different ways, from "contagious information patterns that replicate by parasitically infecting human minds" [52] and "constellation of activated neuronal synapses in memory or information encoded in neural structure" [53], to "ideas, the kind of complex idea that forms itself into a distinct memorable unit" [54], and even "genotype as mental representation, and phenotype as implemented behavior or artifact" [55].

Over the past few decades, the meme-inspired computing methodology, or more concisely memetic computation, has attracted increasing research attention [1]. Memetic computation is a paradigm wherein the notion of meme(s) is used as the basic information unit encoded in computational representations for the purpose of problem-solving. Many of the existing works in memetic computation have been established as extensions of classical evolutionary algorithms, taking the form of hybrid [56], adaptive hybrid [57, 58] or memetic algorithms (MAs) [59], where a meme is perceived as a form of individual learning procedure or local search operator to improve the capability of population-based search algorithms. However, falling back on the fundamental definition of a meme by Dawkins and Blackmore [50, 60] (i.e., as the fundamental building block of cultural evolution), the potential merits or true nature of memes remain yet to be fully exploited. Beyond the simple and adaptive hybrids of memetic algorithms, current research in memetic computation further culminates in a meme-centric "memetic automaton" [1]. More specifically, the meme-centric memetic automaton has been proposed as a self-contained adaptive entity or software agent in which memes are defined as units of information or building blocks of knowledge. This software agent is capable of interacting with its surroundings and adapting to complex environments by evolving its memes. In contrast to existing methods on knowledge transfer in agent-based systems, the memetic automaton involves additional aspects of cultural evolution, taking inspiration from Darwin's theory of natural selection and Universal Darwinism. Specifically, the essential backbone of a memetic automaton comprises a series of meme-inspired cultural evolution mechanisms including memetic transmission, selection, replication, or variation. Due to the physiological limits of agents' ability to perceive differences, small errors will be introduced. These errors can generate the "growth" and/or "variation" of the knowledge agents have learned from the domain [61], hence exhibiting higher adaptivity in solving complex problems. Further, memes are naturally defined as building
blocks of agents' knowledge that have transpired in the form of recurring real-world patterns and are represented by neuron [62], tree [63] or graph [64] structures. These meme patterns provide high-level knowledge representations of common problems, hence enabling reuse across problem domains and accommodating the heterogeneity of agents with differing learning architectures.
References
1. X. Chen, Y. Ong, M. Lim, K.C. Tan, A multi-facet survey on memetic computation. IEEE Trans. Evol. Comput. 15(5), 591–607 (2011)
2. K. Sörensen, F. Glover, Metaheuristics. Encycl. Oper. Res. Manag. Sci. 62, 960–970 (2013)
3. R. Balamurugan, A. Natarajan, K. Premalatha, Stellar-mass black hole optimization for biclustering microarray gene expression data. Appl. Artif. Intell. 29(4), 353–381 (2015)
4. I. Rechenberg, Cybernetic Solution Path of an Experimental Problem (Royal Aircraft Establishment Library Translation, 1965)
5. S.F. Fogel, A.J. Owens, Artificial Intelligence Through Simulated Evolution (Wiley, New York, NY, USA, 1966)
6. J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (MIT Press, 1992)
7. F. Glover, Heuristics for integer programming using surrogate constraints. Decis. Sci. 8(1), 156–166 (1977)
8. S.F. Smith, A learning system based on genetic adaptive algorithms. Ph.D. Dissertation, University of Pittsburgh, 1980
9. S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
10. F. Glover, Future paths for integer programming and links to artificial intelligence. Comput. Oper. Res. 13(5), 533–549 (1986)
11. M. Dorigo, Optimization, learning and natural algorithms. Ph.D. Dissertation, Politecnico di Milano, Italy, 1992
12. K. Chakhlevitch, P. Cowling, Hyperheuristics: recent developments, in Adaptive and Multilevel Metaheuristics (Springer, 2008), pp. 3–29
13. J. Kennedy, R. Eberhart, Particle swarm optimization, in Proceedings of 1995 International Conference on Neural Networks, vol. 4 (IEEE, 1995), pp. 1942–1948
14. R. Storn, K. Price, Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997)
15. Z.W. Geem, J.H. Kim, G.V. Loganathan, A new heuristic optimization algorithm: harmony search. Simulation 76(2), 60–68 (2001)
16. K.M. Passino, Biomimicry of bacterial foraging for distributed optimization and control. IEEE Control Syst. Mag. 22(3), 52–67 (2002)
17. D. Karaboga, An idea based on honey bee swarm for numerical optimization. Technical Report TR06 (Erciyes University, 2005)
18. X.-S. Yang, Firefly algorithms for multimodal optimization, in 5th International Symposium on Stochastic Algorithms (Springer, 2009), pp. 169–178
19. X.-S. Yang, S. Deb, Engineering optimisation by cuckoo search. Int. J. Math. Model. Numer. Optim. 1(4), 330–343 (2010)
20. H.R. Lourenço, O.C. Martin, T. Stützle, Iterated local search, in Handbook of Metaheuristics (Springer, 2003), pp. 320–353
21. N. Mladenović, P. Hansen, Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–1100 (1997)
22. C. Voudouris, E. Tsang, Guided local search and its application to the traveling salesman problem. Eur. J. Oper. Res. 113(2), 469–499 (1999)
23. P. Moscato, On evolution, search, optimization, genetic algorithms and martial arts: towards memetic algorithms. Caltech Concurrent Computation Program, C3P Report 826 (1989)
24. T.S. Duque, D.E. Goldberg, K. Sastry, Improving the efficiency of the extended compact genetic algorithm, in Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, pp. 467–468 (2008)
25. Z.-J. Lee, C.-Y. Lee, A hybrid search algorithm with heuristics for resource allocation problem. Inf. Sci. 173(1–3), 155–167 (2005)
26. B. Liu, L. Wang, Y.-H. Jin, An effective PSO-based memetic algorithm for flow shop scheduling. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 37(1), 18–27 (2007)
27. J. Yang, L. Sun, H.P. Lee, Y. Qian, Y. Liang, Clonal selection based memetic algorithm for job shop scheduling problems. J. Bionic Eng. 5(2), 111–119 (2008)
28. Y.S. Ong, A.J. Keane, Meta-Lamarckian learning in memetic algorithms. IEEE Trans. Evol. Comput. 8(2), 99–110 (2004)
29. J.E. Smith, Coevolving memetic algorithms: a review and progress report. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 37, 6–17 (2007)
30. Q.H. Nguyen, Y.S. Ong, M.H. Lim, N. Krasnogor, Adaptive cellular memetic algorithms. Evol. Comput. 17(2), 231–256 (2009)
31. Q.H. Nguyen, Y.S. Ong, M.H. Lim, A probabilistic memetic framework. IEEE Trans. Evol. Comput. 13(3), 604–623 (2009)
32. X. Chen, Y.-S. Ong, M.-H. Lim, K.C. Tan, A multi-facet survey on memetic computation. IEEE Trans. Evol. Comput. 15(5), 591–607 (2011)
33. F. Neri, C. Cotta, P. Moscato, Handbook of Memetic Algorithms, vol. 379 (Springer, 2011)
34. A. Gupta, Y.S. Ong, Memetic Computation: The Mainspring of Knowledge Transfer in a Data-Driven Optimization Era, vol. 21 (Springer, 2018)
35. S.J. Louis, J. McDonnell, Learning with case-injected genetic algorithms. IEEE Trans. Evol. Comput. 8(4), 316–328 (2004)
36. P. Cunningham, B. Smyth, Case-based reasoning in scheduling: reusing solution components. Int. J. Prod. Res. 35(4), 2947–2961 (1997)
37. S. Yang, X. Yao, Population-based incremental learning with associative memory for dynamic environments. IEEE Trans. Evol. Comput. 12(5), 542–561 (2008)
38. M. Pelikan, M.W. Hauschild, Learn from the past: improving model-directed optimization by transfer learning based on distance-based bias. Missouri Estimation of Distribution Algorithms Laboratory, University of Missouri in St. Louis, MO, United States, Technical Report 2012007 (2012)
39. R. Santana, A. Mendiburu, J.A. Lozano, Structural transfer using EDAs: an application to multi-marker tagging SNP selection, in IEEE Congress on Evolutionary Computation, pp. 1–8 (2012)
40. M. Iqbal, W. Browne, M.J. Zhang, Reusing building blocks of extracted knowledge to solve complex, large-scale boolean problems. IEEE Trans. Evol. Comput. 18, 465–580 (2014)
41. L. Feng, Y.S. Ong, A.H. Tan, I.W. Tsang, Memes as building blocks: a case study on evolutionary optimization + transfer learning for routing problems. Memet. Comput. 7(3), 159–180 (2015)
42. E. Haslam, B. Xue, M. Zhang, Further investigation on genetic programming with transfer learning for symbolic regression, in 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 3598–3605 (2016)
43. L. Feng, Y. Ong, S. Jiang, A. Gupta, Autoencoding evolutionary search with learning across heterogeneous problems. IEEE Trans. Evol. Comput. 21(5), 760–772 (2017)
44. A. Gupta, Y. Ong, L. Feng, Multifactorial evolution: toward evolutionary multitasking. IEEE Trans. Evol. Comput. 20(3), 343–357 (2016)
45. A. Gupta, Y.S. Ong, L. Feng, K.C. Tan, Multi-objective multifactorial optimization in evolutionary multitasking. IEEE Trans. Cybern. 47(7), 1652–1665 (2017)
46. L. Zhou, L. Feng, J. Zhong, Y.S. Ong, Z. Zhu, E. Sha, Evolutionary multitasking in combinatorial search spaces: a case study in capacitated vehicle routing problem, in 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8 (2016)
47. Y.W. Wen, C.K. Ting, Parting ways and reallocating resources in evolutionary multitasking, in 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 2404–2411 (2017)
48. R.T. Liaw, C.K. Ting, Evolutionary many-tasking based on biocoenosis through symbiosis: a framework and benchmark problems, in 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 2266–2273 (2017)
49. K.K. Bali, A. Gupta, L. Feng, Y.S. Ong, T.P. Siew, Linearized domain adaptation in evolutionary multitasking, in 2017 IEEE Congress on Evolutionary Computation (CEC), pp. 1295–1302 (2017)
50. R. Dawkins, The Selfish Gene (Oxford University Press, Oxford, 1976)
51. S. Blackmore, The Meme Machine, vol. 25 (Oxford University Press, 2000)
52. G. Grant, Memetic lexicon. Principia Cybernetica Web (1990)
53. J. Delius, Of mind memes and brain bugs; a natural history of culture. Nature and Culture, pp. 26–79 (1989)
54. D.C. Dennett, Consciousness Explained (Penguin UK, 1993)
55. L. Gabora, The origin and evolution of culture and creativity. J. Memet.: Evol. Model. Inf. Transm. 1(1), 1–28 (1997)
56. J.C. Spall, Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, vol. 65 (Wiley, 2005)
57. Y.-S. Ong, M.-H. Lim, N. Zhu, K.-W. Wong, Classification of adaptive memetic algorithms: a comparative study. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 36(1), 141–152 (2006)
58. E. Özcan, B. Bilgin, E.E. Korkmaz, A comprehensive analysis of hyper-heuristics. Intell. Data Anal. 12(1), 3–23 (2008)
59. P. Moscato, C. Cotta, Memetic algorithms. Handbook of Applied Optimization, pp. 157–167 (2002)
60. T. Bäck, U. Hammel, H.-P. Schwefel, Evolutionary computation: comments on the history and current state. IEEE Trans. Evol. Comput. 1(1), 3–17 (1997)
61. M.A. Runco, S. Pritzker, Encyclopedia of Creativity (Academic Press, 1999)
62. A.-H. Tan, L. Ning, D. Xiao, Integrating temporal difference methods and self-organizing neural networks for reinforcement learning with delayed evaluative feedback. IEEE Trans. Neural Netw. 19(2), 230–244 (2008)
63. Y. Kameya, J. Kumagai, Y. Kurata, Accelerating genetic programming by frequent subtree mining, in Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation (ACM, 2008), pp. 1203–1210
64. S.-S. Choi, K. Jung, B.-R. Moon, Lower and upper bounds for linkage discovery. IEEE Trans. Evol. Comput. 13(2), 201–216 (2009)
Chapter 3
Optinformatics Within a Single Problem Domain
This chapter introduces specific algorithm designs of optinformatics in evolutionary learning and optimization within a single problem domain. In particular, the first algorithm considers knowledge learning and reuse in the form of local search integrated into a global search, such as a genetic algorithm, with the aim of improving the search efficiency and effectiveness of global optimization. The well-known feature selection problem is used as the practical problem to evaluate the performance of the resulting optinformatics method. Further, towards general knowledge learning and transfer in evolutionary search, a paradigm that integrates evolutionary search with transfer learning is presented. Taking vehicle routing and arc routing problems as the case studies, knowledge is defined beyond the form of local search. Based on this knowledge definition, the learning, selection, variation, and imitation of knowledge in evolutionary search for routing are discussed. To validate the performance of the optinformatics algorithm, comprehensive empirical studies on commonly used vehicle routing and arc routing problems are conducted.
3.1 Knowledge Reuse in The Form of Local Search

In conventional MAs, knowledge is usually reused in the form of local search. As outlined in Fig. 3.1, domain-knowledge-based local search is usually introduced into MAs to facilitate the search of the algorithm. Studies on MAs have revealed their success on a wide variety of real-world problems, including flow shop scheduling [1, 2], capacitated arc routing [3], VLSI floorplanning [4], material structure prediction [5], and financial portfolio optimization [6]. In particular, MAs not only converge to high-quality solutions, but also search more efficiently than their conventional counterparts [7–14]. In this section, we take MA based feature selection as an example to illustrate knowledge reuse within a single problem domain.
Memetic Algorithm
BEGIN
1. Initialization: Randomly generate an initial population;
2. While (Termination Criterion Not Fulfilled)
3.   Evaluate the fitness of each solution in the population;
4.   Apply local search on all or part of solutions in the population;
5.   Perform selection, recombination and mutation to create a new population;
6. End While
END
Fig. 3.1 Outlines of a memetic algorithm
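To make the outline in Fig. 3.1 concrete, the following minimal Python sketch shows one possible realization of the loop. It is illustrative only and assumes user-supplied fitness and local_search functions; it is not the exact implementation used in this chapter.

```python
import random

def memetic_algorithm(fitness, local_search, n_bits, pop_size=50, generations=100):
    # 1. Initialization: randomly generate an initial population of bit strings
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2-3. Evaluate the fitness of each solution in the population
        pop.sort(key=fitness, reverse=True)
        # 4. Apply local search on part of the population (here: the elite only)
        pop[0] = local_search(pop[0], fitness)
        # 5. Selection (truncation), recombination (uniform crossover) and mutation
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - g if random.random() < 0.01 else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```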
characteristic or relevance of the features to the learning target is abstracted as knowledge and reused in the local search of the memetic feature selection. MAs for feature selection were proposed in [15, 16] and have shown superior performance over GAs and other search methods. Nonetheless, due to the inefficient nature of sequential local search [15] or random bit climbing local search [16] used in these existing methods, a large amount of redundant computation is incurred on evaluating the fitness of feature subsets. This would probably make them less attractive, particularly on problems with large feature size. To handle the feature selection problem for high dimensional data, a wrapper-filter feature selection algorithm (WFFSA) [12] was proposed by using univariate filter ranking based local search to fine-tune the population of genetic algorithm (GA) [17] solutions. Different local search strategies with different local search intensity are also investigated in [12]. The empirical studies on UCI and high dimensional microarray datasets suggest that, applying local search on only the best solution of each population with local search length of four and improvement first strategy, WFFSA searches more efficiently and is capable of producing good classification accuracy with a small number of features simultaneously. Nevertheless, as the univariate filter ranking method fails in detecting possible interactions between features, it is not able to identify redundant features efficiently. To handle problem with large number of irrelevant and redundant features, another Markov blanket embedded genetic algorithm (MBEGA) was introduced in [13]. MBEGA fine-tunes the GA search by adding relevant features and removing redundant and/or irrelevant features in the solutions based on approximate Markov blanket [18]. Studies on MBEGA indicate that it attains better accuracy and robustness than other feature selection algorithms on microarray datasets since such data contains very large number of redundant features. Based on the previous studies, this section introduces the general memetic feature selection framework which could be easily fitted to the demanding of various feature selection problems given appropriate induction algorithm and filter method. Before we detail the framework, some preliminaries on feature selection are provided as follows.
3.1.1 Feature Selection Feature selection has become the focus of many real-world application oriented developments and applied research in recent years. With the rapid advance of computer and database technologies, problems with hundreds and thousands of variables or features are now ubiquitous in pattern recognition, data mining, and machine learning [19]. Given a training set of d labeled fixed-length feature vectors (instances) {(X 1 , Y1 ), ..., (X d , Yd )}, where an instance X i is described as an assignment of values X i = (X i1 , ..., X i N ) to a set of features F = (F1 , ..., FN ) and a class label Yi , the task of classification is to induce a classifier H : X → Y that accurately predicts the labels of novel instances based on the feature values. Theoretically, more features should provide more accurate prediction, while in practice, huge datasets with extremely large number of features will not only significantly slow down the learning process, but also cause confusion to the classifier due to irrelevant or redundant features. Feature selection addresses this problem by removing the irrelevant, redundant, or noisy features. It improves the performance of the classification, reduces the computational cost, and provides better understandings of the datasets. The problem of feature selection involves selecting a minimal subset of M features S = (S1 , ..., S M ) from the original feature set F = (F1 , ..., FN ), where (M ≤ N ) and S ⊆ F, so that the feature space is optimally reduced and the performance of the classification is improved or not significantly decreased [20]. Figure 3.2 shows a simple example of feature selection, where there are 6 instance vectors belonging to two classes. Each instance is assigned 6 values from the feature set F = (F1 , ..., F6 ). Among the features, only F2 and F4 are relevant features, which could provide information for distinguishing the two classes, but they are redundant with each other as they are highly correlated with each other. Using either F2 or F4 , the classifier (e.g., with S = {F2 }, H : Yi = X i2 + 1) makes a correct prediction on all instances. A feature selection is expected to identify either F2 or F4 and eliminate other features.
Fig. 3.2 An example of feature selection
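The toy setting described above can be written out explicitly. The data below is hypothetical and merely mimics the situation of Fig. 3.2 (six instances, six features, with only F2 and F4 carrying class information and being mutually redundant); the actual values in the figure are not reproduced here.

```python
# Hypothetical data in the spirit of Fig. 3.2: only F2 and F4 separate the two
# classes, and they are redundant with each other (perfectly correlated).
X = [
    # F1  F2  F3  F4  F5  F6
    [ 5,  0,  7,  0,  3,  9],
    [ 2,  0,  4,  0,  8,  1],
    [ 9,  0,  6,  0,  5,  4],
    [ 1,  1,  3,  2,  7,  6],
    [ 8,  1,  2,  2,  4,  2],
    [ 3,  1,  9,  2,  6,  8],
]
Y = [1, 1, 1, 2, 2, 2]

# With S = {F2}, the simple classifier H: y_i = x_i2 + 1 labels every instance correctly.
predictions = [row[1] + 1 for row in X]
assert predictions == Y
```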
3.1.1.1 Feature Selection Method
Generally, a typical feature selection method consists of four components: a generation procedure or search procedure, evaluation function, stopping criterion, and validation procedure. Every time a candidate subset of features is generated randomly or based on some heuristics, it is evaluated based on some independent (i.e., without involving any induction algorithm) or dependent criteria (i.e., performance of the induction algorithm). This procedure of feature selection is repeated until a predefined stopping condition is satisfied, which may be due to the completion of search, an achievement of a sufficiently good subset, or violation of the maximum allowable number of iterations. Note that the selection procedure uses only the training data, and the final selected subset should be validated using the induction algorithm and unseen test dataset. In this study, we focus on feature selection for classification. Therefore, the induction algorithm in this study is usually a classifier, e.g., K-Nearest Neighbors (KNN) or Support Vector Machine (SVM). The general process of feature selection is illustrated in Fig. 3.3. Depending on whether an inductive algorithm is used for feature subset evaluation, feature selection algorithms typically fall into three categories: filter, wrapper, and embedded methods [21, 22]. Filter methods evaluate the relevance of the feature subset by using the intrinsic characteristic of the training data, ignoring the induction algorithm. They usually evaluate each feature based on simple statistics obtained from the training data distribution. As filter method do not involve any induction algorithm, they are relatively computationally efficient and statistically robust against overfitting. However, on the other hand, they are subject to the risk of selecting features that could not match the chosen induction algorithm. Wrapper methods, on the contrary, directly utilize the induction algorithm as a black box to evaluate subsets of features according to their learning performance. (Some commonly known embedded feature selection methods, e.g. decision tree, implement
Fig. 3.3 Feature selection procedure
the same idea except that they directly proceed the feature selection during the training the induction algorithm.) Wrapper methods conduct the search of good subset using the induction algorithm as part of the evaluation function. Therefore, they are remarkably simple and generally select more suitable feature subsets for the induction algorithm than filter methods and also outperform in terms of prediction accuracy. The trade-off however is that wrapper methods generally require massive amounts of computation. Embedded methods perform feature selection in the training of inductive algorithm. Embedded methods are efficient for simultaneously optimizing the induction algorithm and feature selection, yet they usually are specific to the given inductive algorithm.
3.1.1.2 Search the Feature Space
The key issue for feature selection methods, especially for wrapper methods, is how to search the space of all possible variable subsets. Ideally, a feature selection method searches through the subsets of features and tries to find the optimal among the competing 2 N candidate subsets according to some evaluation criteria. This problem has been known to be NP-hard [23]. A number of search strategies have been proposed in the literature for feature selection. They can be categorized into complete search (e.g., FOCUS [24], Branch and Bound [25]), heuristic search (e.g., best-first method used in CFS [26], sequential search [27]), and random search (e.g., Random Search used in LVF [28], Las Vegas algorithm [29]). These methods have shown promising results in a number of applications. However, as the number of features N increases, most of these existing methods face the problems of intractable computational time and convergence to suboptimal feature subsets. Recently, an increasing number of nature inspired approaches such as GA are used in feature selection due to their wellknown ability to produce high quality solutions within a tractable time on complex search problems. Inspired by the power of natural evolution, GA conducts global search by maintaining a population of candidate solutions and evolving the population by subjecting it to replication, variation, and selection. GA has been widely used for feature selection and shown promising performance [15, 30, 31]. However, as the feature size increases, GA could take a long time to locate the local optimum in a region of convergence and may sometimes not find the optimum with sufficient precision due to its inherent nature. One way to solve this problem is to hybridize GA with some knowledge-based local search operations to fine-tune and improve the solutions generated by GA, which results in the memetic feature selection.
3.1.2 Memetic Feature Selection

As depicted in Fig. 3.4, the memetic feature selection framework is a hybridization of GA wrapper method and filter method based local search. The training data,
Fig. 3.4 Memetic feature selection
classifier, filter method, and the independent test data are pending for user’s input on specific problem. In the first step, the GA population is initialized randomly with each chromosome encoding a candidate feature subset. Subsequently, the fitness of each chromosome is evaluated based on the given classifier. Afterward, a filter method based local search or meme that captures the knowledge of the feature relevance is applied on all or portions of the chromosomes in the spirit of Lamarckian learning [32]. Genetic operators, i.e., selection, crossover, and mutation, are then used to generate the new population. This procedure repeats until the stopping conditions are satisfied and the best chromosome in the population is outputted as the final solution whose performance is evaluated using the unseen test data.
3.1.2.1 Encoding Representation and Population Initialization
In this memetic feature selection framework, we use a binary string chromosome to represent a candidate feature subset (as shown in Fig. 3.4). The length of the binary string chromosome equals to the total number of features N , with each bit in the chromosome encoding a single feature. A bit ‘1’ (‘0’) indicates the corresponding feature is selected (excluded). The initial population of the GA search is generated randomly.
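As a small illustration of this encoding, the following sketch (hypothetical helper names, not from the book) generates a random population of binary chromosomes and decodes one into the indices of the selected features.

```python
import random

def random_chromosome(n_features):
    # A bit '1' means the corresponding feature is selected, '0' means excluded.
    return [random.randint(0, 1) for _ in range(n_features)]

def decode(chromosome):
    # Map a binary chromosome to the indices of the selected features.
    return [i for i, bit in enumerate(chromosome) if bit == 1]

population = [random_chromosome(10) for _ in range(50)]
print(decode(population[0]))
```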
3.1.2.2 Feature Subset Fitness Function
The objectives of feature selection are to maximize the classification accuracy and meanwhile minimize the number of selected features. One way to handle this problem is to use the aggregating function method, where the objective function is defined as a linear combination of the generalization error and the number of selected features. The weights can be empirically tuned based on the bounds of the number of selected features and the generalization error. However, the weights would vary significantly for different datasets and the selected feature subset would largely depend on the underlying weight vector. To avoid the aforementioned problem, the classification accuracy is considered to be more important than the selected feature size in this study. The fitness function is defined by the generalization error of the input classifier, i.e.,

Fitness(c) = J(H, S_c, E)    (3.1)
where H and Sc denote the corresponding input classifier and selected feature subset encoded in chromosome c, respectively. E specifies the classification error evaluation method which could be training error, cross validation, bootstrap and so on. The feature selection criterion function J (H, Sc , E ) is calculated as the classification error using classifier H , feature subset Sc , and evaluation method E . When two chromosomes happen to have similar fitness (i.e. for a misclassification error of less than one training data instance, the difference between their fitness is designed here to be less than a small value of ε = 1/d, where d is the number of training instances) the one with a smaller number of selected features is given a higher chance of surviving to the next generation.
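The tie-breaking rule described above can be sketched as follows. The function names are illustrative; error(c) stands for J(H, S_c, E) and n_selected(c) for the number of bits set to 1 in chromosome c.

```python
def better(c1, c2, error, n_selected, n_train):
    # Primary criterion: classification error J(H, S_c, E); lower is better.
    # If the two errors differ by less than eps = 1/d (i.e. by less than one
    # misclassified training instance), prefer the chromosome that selects
    # fewer features.
    eps = 1.0 / n_train
    e1, e2 = error(c1), error(c2)
    if abs(e1 - e2) < eps:
        return c1 if n_selected(c1) <= n_selected(c2) else c2
    return c1 if e1 < e2 else c2
```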
3.1.2.3 Knowledge-Based Memetic Operation/Local Search
The filter method based local search used in this memetic feature selection framework is briefly outlined in Fig. 3.5. For a given candidate feature subset encoded in a chromosome c and the filter method, memetic feature selection applies the local search to improve its fitness based on the heuristic operators, namely Add and Del.
Fig. 3.5 Local search procedure
In particular, the Add operator adds features to the candidate feature subset based on the relevance measure defined in the filter method, while the Del operator removes existing features in the selected subset using not only the relevance measure but also a redundancy measure if it is provided in the filter method. More details of Add and Del are given in the following text. The computational complexity of the local search is quantified by the search range L, which defines the maximum numbers of Add and Del operations. The Add and Del operators are applied on c under the improvement first strategy [12], i.e., a random choice from L 2 possible combinations of Add and Del operations is used to search until the first improved solution is obtained. Since our previous study in [12] suggests that applying local search on only the elite chromosome with L = 4 gives best balance between global and local search, we use such a configurations in the following empirical study. (1) Add and Del Operators For a given chromosome c, we define c and c as the subsets of selected and excluded features encoded in c, respectively. The Add operator select features from c and add them to c , while the Del operator removes existing features from c to c . The key question is what features to add or delete in a single operation. The Add and Del operators are illustrated in Fig. 3.4. The features in c are ranked using relevance measure provided in the filter method. F j is the highest ranked feature in c . In Add operation, F j would be the most likely feature to be moved to c . On the other hand, the features in c are ranked using both relevance measure and redundancy measure. Let Fi be the lowest ranked features in c . Using the Del operator, Fi is the most likely feature to be moved to c . Figure 3.4 also depicts the two most likely resultant chromosomes after the Add and Del operations.
Fig. 3.6 Add operator
Fig. 3.7 Del operator in FR
The detail procedure of Add and Del are given in Figs. 3.6 and 3.7, respectively. In Add, features in c are ranked based on the relevance measure, a feature i from c is selected using the linear ranking selection method [33], so that the more relevant a feature is, the more it is likely to be selected and moved to c . While, in Del, all features in c are firstly ranked based on the relevance measure and the availability of redundancy measure is checked. If the redundancy measure is provided in the filter method, a highly relevant feature i is selected using linear ranking selection method and the Del proceeds to remove all other features in c which are considered to be redundant with i using the provided redundancy measure. In the case there is no feature redundant to i , the operator would try to delete i itself. If no redundancy measure is available, a less relevant feature i from c would be selected to c using linear ranking selection method.
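A simplified sketch of these two operators is given below. It uses only a relevance ranking (as in the FR variant described next) and omits the redundancy handling; the linear ranking selection follows [33] in spirit.

```python
import random

def linear_rank_pick(items, key, best_first=True):
    # Linear ranking selection: the better an item's rank, the larger its
    # selection weight (n, n-1, ..., 1).
    ranked = sorted(items, key=key, reverse=best_first)
    weights = [len(ranked) - i for i in range(len(ranked))]
    return random.choices(ranked, weights=weights, k=1)[0]

def add_op(chromosome, relevance):
    # Add: move a (preferably highly relevant) excluded feature into the subset.
    excluded = [i for i, b in enumerate(chromosome) if b == 0]
    if excluded:
        chromosome[linear_rank_pick(excluded, key=lambda i: relevance[i])] = 1
    return chromosome

def del_op(chromosome, relevance):
    # Del: drop a (preferably lowly relevant) selected feature; with AMB or AP,
    # a redundancy measure would additionally be consulted here.
    selected = [i for i, b in enumerate(chromosome) if b == 1]
    if selected:
        chromosome[linear_rank_pick(selected, key=lambda i: relevance[i],
                                    best_first=False)] = 0
    return chromosome
```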
(2) Filter Methods It is noted that any filter feature selection method that could provide feature relevance and redundancy measure could be easily fitted into the local search. Here, we consider three different kinds of filter methods. Filter Ranking (FR) FR method was first used in a MA feature selection paradigm namely WFFSA [12]. In this method, all features are ranked individually based on ReliefF [34]. The relevance measure of each feature is based on an intuitive idea that a feature is more relevant if it distinguishes between a data instance and its nearest neighbor instances from different classes, and less relevant if it distinguishes between an instance and its nearest neighbors from the same class. It is noted that the redundancy measure is not available in FR method. Approximate Markov Blanket (AMB) Zhu et al. proposed to use the AMB [18] in the other MA feature selection method MBEGA for gene selection problem [13]. In AMB, the relevance measure is defined as the C-correlation [18], which evaluates the correlation between a feature and the class label vector. A redundancy measure based on approximate Markov blanket [18] is provided in AMB method. In particular, if a feature Fi has a AMB given by F j , it suggests that Fi gives no additional useful information beyond F j on class Y . Hence, Fi could be considered as redundant and could be safely removed. Affinity Propagation (AP) In this method, all features are clustered using the unsupervised AP method [35], which is a message passing based clustering algorithm. It operates by simultaneously considering all features as potential cluster centers (called “exemplars”) and exchanging real-valued messages between features. Clusters are formed by assigning each feature to its most similar exemplar. The obtained exemplar in each cluster is considered as the best representative feature for the corresponding cluster and features located in the same cluster are considered to be redundant with each other. Accordingly, in the relevance measure, all features are evaluated based on C-correlation and all exemplar features are considered to be more relevant to class Y than those non-exemplar features in spite of the C-correlation value. The redundancy measure detects the feature redundancy based on the cluster membership of the features. For instance, without loss of generality, given a feature Fi , if there exists another feature F j located in the same cluster with Fi and F j is of less relevance than Fi based on the relevance measure, then F j is redundant with Fi and could be removed. In the following text, memetic feature selection paradigms with FR, AMB, and AP based local search would be denoted as MAFR, MAAMB, and MAAP, respectively.
3.1.2.4 Evolutionary Operations
In each generation, the local search procedure in the spirit of Lamarckian learning is processed on the elite chromosome. Subsequently, the population then undergoes the usual evolutionary operations including fitness based selection, uniform crossover,
and mutation operators with elitism [17]. In some cases where the upper bound m of the optimum number of features is acknowledged, it is beneficial to accelerate the evolutionary search by imposing a constraint on the number of bits ‘1’ in each chromosome to a maximum of m. To do so, the restrictive crossover and mutation [12] should be taken for account instead of simple GA operators, so that the number of bits ‘1’ in each chromosome would not violate the constraint m during evolution.
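The restrictive operators of [12] are not reproduced here, but the constraint they maintain can be illustrated with a simple repair step that never lets a chromosome carry more than m selected features. This is a sketch of the idea only.

```python
import random

def enforce_max_ones(chromosome, m):
    # If more than m bits are set after crossover/mutation, randomly reset the
    # surplus bits to 0 so the number of selected features never exceeds m.
    ones = [i for i, b in enumerate(chromosome) if b == 1]
    for i in random.sample(ones, max(0, len(ones) - m)):
        chromosome[i] = 0
    return chromosome
```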
3.1.3 Empirical Studies On Real-World Applications In this section, we investigate the performance of memetic feature selection methods on both microarray and hyperspectral imagery datasets. In particular, we consider the AP based method [36], FCBF (fast correlation-based filter) [18], and standard GA feature selection for comparison. As these methods have been successfully used for gene selection or band selection and demonstrated to attain promising performance [18, 36–39]. Some brief introductions of these methods are provided as follows. FCBF represents a fast correlation based filter method. It first selects a subset of relevant features whose C-correlation are greater than a predefined threshold. Afterward, the selected features are sorted based on C-correlation and the redundant features, which have AMBs, are removed one-by-one from the list. The remaining features thus form the predominant feature subset with zero redundant features. AP feature selection method [36] operates by simultaneously considering all features as potential cluster centers namely exemplars and exchanging messages between features until a good set of exemplars and clusters emerges. The number of exemplars is controlled by quantity called “preference”, which characterizes the a priori knowledge of how good the band is as an exemplar. In this study, the preference value of each feature is set to the median of the similarity matrix by default. After the clustering procedure is finished, all exemplars form the final selected feature subset. The standard GA feature selection is similar to memetic feature selection except that it does not involve the local search. In our empirical study, the memetic feature selection methods and standard GA use the same parameter configurations with population size = 50, crossover probability = 0.6, and mutation rate = 0.1. The stopping criteria of GA and memetic feature selection methods are defined by a convergence to the global optimal or a maximum computational budget of 2000 fitness functional calls is reached. It is worth noting that the fitness function calls made to J (H, Sc , E ) in the local search are also included as part of the total fitness function calls for fair comparison to the GA. The maximum number of bit ‘1’ in the chromosome m is set to 50 for binary class microarray datasets and hyperspectral imagery datasets. For multi-class microarray datasets, m is set to 150.
3.1.3.1 Microarray Data
We first consider the real-world microarray data with significantly large number of features (genes) but only small number of samples. In particular, fourteen publicly available datasets as tabulated in Table 3.1 are considered in the present study. Due to the small number of samples, gene selection algorithms tend to suffer from selection bias or overfitting on microarray data [40]. To prevent such problem, we consider a balanced .632+ external bootstrap [41] for estimating the prediction accuracy of a given gene subset. At each bootstrap, a training set is sampled with replacement from the original dataset, and the test data is formed by unsampled instances. Note that J (H, Sc , E ) uses only the training data while the prediction accuracy of a feature subset is evaluated based on the unseen test data. The external bootstrap is repeated 30 times for each dataset and the average results are reported. In this study, the classifier H is set to the linear kernel support vector machine (SVM) since it has shown superior performance over other methods on microarray data [42]. The classification error evaluation method E is defined as the radius margin bound [43] of SVM which estimates the leave-one-out cross validation error with only one training on the SVM model. For multiclass datasets, a One-Versus-One strategy is applied. In Table 3.2, the average test error and average number of selected genes of each feature selection method on the 14 datasets are reported. AP obtains lower classification than other methods. AP fails to identify the relevant genes, due to the unsupervised nature of the clustering and the short of learning samples. FCBF tends to select more genes than other methods. Statistical test using random permutation test [55], which does not rely on independence assumptions, at significance level of 0.05 shows that all MAs, i.e., MAFR, MAAMB, and MAAP attain lower test error rates than the standard GA. This suggests the local searches in MAs have successfully helped to fine-tune the GA solution more accurately and efficiently. Resultant smaller subset of important genes that generates improved classification accuracies are found for the datasets. Both MAFR and MAAMB give the best performance among all feature selection methods in terms of classification accuracy. MAFR obtains slightly better accuracy, which would probably caused by involving considerably more genes than MAAMB. While, the difference of the accuracy is not significant between each other, and it is worth highlighting that MAAMB converges to significant smaller gene subsets than MAFR by eliminating more redundant genes, which would enhance the process of candidate biomarkers discovering as well as assist researchers in analyzing the increasing biological research data.
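The external bootstrap protocol can be summarized in code. The sketch below shows the resampling logic only; the .632+ weighting of [41], the SVM with radius margin bound, and the One-Versus-One handling are left behind the user-supplied select_features and evaluate callables, which are hypothetical names.

```python
import random

def external_bootstrap(data, select_features, evaluate, repeats=30):
    # Feature selection sees only the bootstrap training sample; accuracy is
    # estimated on the unsampled (out-of-bag) instances, which prevents the
    # selection bias discussed above.
    accuracies = []
    for _ in range(repeats):
        train_idx = [random.randrange(len(data)) for _ in range(len(data))]
        train_set = set(train_idx)
        train = [data[i] for i in train_idx]
        test = [d for i, d in enumerate(data) if i not in train_set]
        subset = select_features(train)          # uses training data only
        accuracies.append(evaluate(subset, train, test))
    return sum(accuracies) / len(accuracies)
```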
3.1.3.2 Hyperspectral Imagery Data
The second real-world application we consider is the band selection for hyperspectral imagery classification. Here we investigate the performance of each feature selection method on two hyperspectral imagery datasets. To calculate the fitness function J (H, Sc , E ), we adopt the K-Nearest Neighborhood (KNN) for H and 3-fold cross validation for E . KNN is one of the most fundamental and simple classification
Table 3.1 Description of all microarray datasets used

Dataset | Genes | Samples | Classes | Description and Reference
Colon | 2000 | 62 | 2 | 40 colon cancer biopsies versus 22 normal biopsies [44]
CNS | 7129 | 60 | 2 | Outcome of the treatments for 60 central nervous system cancer patients (21 survivors and 39 failures) [45]
Leukemia | 7129 | 72 | 2 | Two acute leukemias: Acute Myelogenous Leukemia (AML) and Acute Lymphoblastic Leukemia (ALL) [46]
Breast | 24481 | 97 | 2 | 97 samples from breast cancer patients (46 patients developed distant metastases, the rest 51 remained healthy after their initial diagnosis for an interval of at least 5 years) [47]
Lung | 12533 | 181 | 2 | Classification between malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA) of the lung (31 MPM and 150 ADCA) [48]
Ovarian | 15154 | 253 | 2 | The proteomic spectra of 91 normal persons and 162 ovarian cancer patients [49]
Leukemia3c | 7129 | 72 | 3 | AML, ALL B-cell, and ALL T-cell [46]
Leukemia4c | 7129 | 72 | 4 | AML-Bone Marrow, AML-Peripheral Blood, ALL B-cell, and ALL T-cell [46]
Brain | 10367 | 50 | 4 | 4 malignant glioma types [50]
Lymphoma | 4026 | 62 | 3 | Three most prevalent adult lymphoid tumors [51]
MLL | 12582 | 72 | 3 | AML, ALL, and mixed-lineage leukemia (MLL) [52]
NCI | 1000 | 61 | 9 | NCI60 dataset with 1000 preselected genes [37]
SRBCT | 2308 | 83 | 4 | Small, round blue cell tumors of childhood [53]
Thyroid | 2000 | 168 | 4 | 3 thyroid tumor types (follicular adenoma, follicular carcinoma, and papillary carcinoma) and 1 normal tissue [54]
Table 3.2 Performance of feature selection algorithms on microarray datasets (each cell reports Acc, |Sc|; Acc: classification accuracy; |Sc|: number of selected features)

Dataset | AP | FCBF | GA | MAFR | MAAMB | MAAP
Colon | 84.89, 30.1 | 84.54, 21.0 | 81.01, 23.3 | 84.77, 23.1 | 85.66, 24.5 | 84.79, 18.9
CNS | 71.48, 23.9 | 73.29, 52.8 | 68.30, 24.1 | 72.30, 22.1 | 72.21, 20.5 | 73.09, 22.7
Leukemia | 98.08, 28.1 | 93.82, 32.8 | 92.31, 25.2 | 96.81, 29.6 | 95.89, 12.8 | 97.03, 32.5
Breast | 70.56, 21.6 | 75.89, 127.9 | 68.81, 22.1 | 74.18, 27.8 | 80.74, 14.5 | 73.47, 15.0
Lung | 95.58, 20.5 | 95.70, 66.3 | 98.07, 24.4 | 99.12, 28.6 | 98.96, 14.1 | 98.35, 17.1
Ovarian | 94.48, 33.5 | 99.91, 31.3 | 99.43, 23.3 | 99.50, 18.7 | 99.52, 9.0 | 99.72, 31.6
Leukemia3c | 88.42, 22.4 | 95.92, 79.0 | 93.63, 97.8 | 97.44, 99.1 | 95.72, 23.9 | 92.24, 43.5
Leukemia4c | 84.81, 22.6 | 94.40, 96.5 | 91.87, 98.7 | 94.41, 98.2 | 93.26, 27.7 | 91.25, 34.2
Brain | 75.70, 80.7 | 82.61, 89.5 | 83.58, 99.4 | 84.42, 97.5 | 83.36, 39.3 | 82.42, 84.8
Lymphoma | 95.42, 50.2 | 94.10, 42.7 | 98.83, 100.3 | 99.19, 96.0 | 98.94, 38.1 | 98.85, 43.2
MLL | 90.35, 30.4 | 94.10, 107.0 | 95.45, 96.6 | 95.68, 82.2 | 95.07, 31.8 | 94.47, 53.4
NCI | 71.36, 71.6 | 75.62, 83.9 | 76.16, 98.3 | 75.37, 94.0 | 74.36, 44.7 | 74.55, 93.8
SRBCT | 78.69, 25.4 | 99.01, 102.6 | 97.63, 99.7 | 99.09, 104.1 | 99.16, 52.8 | 97.96, 86.8
Thyroid | 59.36, 40.5 | 82.03, 85.6 | 80.66, 100.9 | 83.08, 99.3 | 80.70, 29.7 | 79.15, 85.9
Average | 82.80, 35.8 | 88.64, 72.8 | 87.55, 66.7 | 89.67, 65.7 | 89.54, 27.4 | 88.38, 47.4
methods. It should be one of the first choices when there is little or no prior knowledge about the distribution of the data. KNN classifier achieves high performance when a large number of training samples are available [36]. The class of a new sample is determined by the labels of k training data points that are nearest this sample. In the following experiments, the number of neighbors k in KNN is set to be 3. It is worth to mention that the low signal-to-noise bands which cover water absorption range of the spectrum are usually removed in advance [56]. However, they are purposefully not removed here to examine the performance of the proposed feature extraction
algorithms. Additionally, in order to make the classification results robust, the training and test data sets are randomly chosen from the data five times, and the mean value is selected as the final pixel classification accuracy. The band selection is conducted using only the training data, and the final selected band subset is evaluated using the unseen test data. The first hyperspectral imagery dataset (as shown in Fig. 3.8) is a section of the subscene taken over Washington D.C mall (500×307 pixels, 210 bands, and 7 classes composed of water, vegetation, man-made structures and shadow) by the HYperspectral Digital Imagery Collection Experiment (HYDICE) sensor [57]. The 7 classes are highlighted with different color in Fig. 3.8a. Figure 3.8b and c show images taken from the randomly selected spectral bands of 10 and 100, respectively. Due to the limitation in computer capacity, we only randomly choose 1600 samples from each class. Eventually, a set of 2800 training samples (400 from each class for learning the KNN classifier) and a set of 8400 test samples (1200 from each class for assessing the accuracy) are obtained. The second hyperspectral imagery dataset (as shown in Fig. 3.9) is a section of a scene taken over northwest Indiana’s Indian Pines (145×145 pixels, 220 bands) by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1992 [58]. The 16 different land-cover classes available in the original ground truth are depicted in Fig. 3.9a. There randomly selected images of band 5, 60, and 180 are also shown in Fig. 3.9b–d. From the 16 classes, seven are discarded to make the experimental analysis more significant, since only few training samples are available for them [59]. For the remaining nine land-cover classes, we choose a quarter from each class to generate the training samples and the rest to be the test samples (as shown in Table 3.3). The band selection results of all feature selection methods on the two hperspectral imagery datasets are tabulated in Table 3.4. Due to the stochastic nature of GA and MAs, the average training accuracy, test accuracy, and number of selected features of GA and MAs for ten runs are reported. The filter methods, i.e., AP and FCBF, tend to select fewer bands and gain better training accuracy, but that does not guarantee a consistent performance on the independent test data. The results in Table 3.4 show that there exists a big gap between the training and test accuracy of AP and FCBF. As the filter methods do not consider the classifier KNN in feature evaluation, they are likely to select feature subsets that could not match KNN. On the other hand, statistical test using random permutation test suggests that the accuracy of wrapper methods, i.e., the standard GA and three MAs, are not significantly different on both training and test data. All of them outperform the filter methods in terms of test accuracy. The test accuracies of GA and MAs are not significantly different from each other, while MAAP performs better than other wrapper methods with slightly higher accuracy and smaller number of selected bands.
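The fitness evaluation J(H, S_c, E) used for band selection, described at the start of this subsection, can be sketched as follows. It assumes scikit-learn is available, which is an assumption of this illustration rather than a statement about the original implementation.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def band_subset_fitness(X_train, y_train, selected_bands):
    # H is a 3-nearest-neighbour classifier and E is 3-fold cross validation on
    # the training pixels; the returned value is the cross-validated error of
    # the candidate band subset (lower is better).
    if not selected_bands:
        return 1.0
    clf = KNeighborsClassifier(n_neighbors=3)
    acc = cross_val_score(clf, X_train[:, selected_bands], y_train, cv=3).mean()
    return 1.0 - acc
```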
Fig. 3.8 Washington D.C. Mall HYDICE dataset: (a) Ground Truth, (b) Band 10, (c) Band 100
Fig. 3.9 Indian Pines AVIRIS dataset: (a) Ground Truth, (b) Band 5, (c) Band 60, (d) Band 180

Table 3.3 Land cover classes with training and test set sizes of the Indian Pines data used in the experiment

Class | Land cover type | Train | Test
1 | Corn-no till | 358 | 1076
2 | Corn-min till | 208 | 626
3 | Grass/Pasture | 124 | 373
4 | Grass/Trees | 186 | 561
5 | Hay-windrowed | 122 | 367
6 | Soybean-no till | 242 | 726
7 | Soybean-min till | 617 | 1851
8 | Soybean-clean till | 153 | 461
9 | Woods | 323 | 971
 | Total | 2333 | 7012
Table 3.4 Performance of feature selection methods on hyperspectral datasets

Dataset | Metric | AP | FCBF | GA | MAFR | MAAMB | MAAP
Washington | Training Acc | 91.32 | 92.36 | 88.21 | 88.32 | 87.93 | 88.21
Washington | Test Acc | 87.54 | 87.52 | 88.23 | 88.19 | 88.26 | 88.60
Washington | |Sc| | 14 | 5 | 32.3 | 30.1 | 32.2 | 24.6
Indiana | Training Acc | 86.63 | 66.57 | 82.34 | 82.94 | 83.33 | 82.94
Indiana | Test Acc | 75.90 | 41.29 | 81.87 | 81.69 | 82.73 | 82.80
Indiana | |Sc| | 11 | 5 | 32.5 | 28.9 | 35.4 | 25.5
Training Acc: Training Accuracy; Test Acc: Test Accuracy; |Sc |: Number of selected features
3.1.4 Summary

In this section, we present a memetic framework for the hybridization of wrapper and filter feature selection methods and apply it to two real-world classification applications: gene selection for cancer classification based on microarray data, and band selection for hyperspectral imagery classification. Three memetic feature selection methods with different local search schemes, i.e., Filter Ranking (MAFR), Approximate Markov Blanket (MAAMB), and Affinity Propagation (MAAP), are investigated using 14 microarray and 2 hyperspectral imagery datasets, and compared against the counterpart standard GA and the filter methods AP and FCBF. All memetic feature selection methods are observed to obtain superior or competitive performance in terms of classification accuracy and number of selected features. In particular, MAAMB obtains the best performance on microarray data. It is capable of effectively eliminating irrelevant and redundant features based on both the Markov blanket and the predictive power of the wrapper model, leading to the identification of a small set of reliable genes for biologists to concentrate on in their wet-lab studies. On the hyperspectral imagery datasets, MAAP obtains the best overall performance among all compared methods. The results obtained in this study suggest that, by reusing knowledge in the form of local search, the memetic feature selection framework is capable of improving the classification performance and accelerating the search in identifying the core feature subsets. Since different knowledge is required for different real-world applications, no single local search method can dominate across all of them; a carefully designed local search method should be selected for each specific problem. Fortunately, as most knowledge about the features, e.g., filter ranking information, relevance, and redundancy, can be easily transformed or directly embedded into the local search procedure, the memetic feature selection framework serves as a candidate platform for the hybridization of filter and GA wrapper methods, so that the feature selection problem stands a better chance of being solved by taking advantage of both.
3.2 Knowledge Reuse via Transfer Learning from Past Search Experiences

Beyond reusing knowledge in the form of local search in MAs, this section presents a study on the memetic computation paradigm of evolutionary optimization + transfer learning for search in a single problem domain. It is well known that, by leveraging the potential common characteristics among problem instances that belong to the same problem domain, i.e., topological properties, data distributions or otherwise, effective assessments of future unseen related problem instances can be achieved more efficiently, without the need to perform an exhaustive search each time or to start the evolutionary search from a ground-zero knowledge state [60–62]. Besides the standard mechanisms of a conventional evolutionary search, for instance the genetic operators in the case of the Genetic Algorithm, the proposed approach has four additional culture-inspired operators, namely, Learning, Selection, Variation and Imitation. The role of the learning operator is to mine for knowledge memes¹ from past experiences of problem-solving, which shall then manifest as instructions to bias the search on future problems intelligently (i.e., narrowing down the search space). The selection operator, on the other hand, selects the high quality knowledge memes that shall then replicate and undergo new innovations via the variation operator, before drawing upon them to enhance future evolutionary search. Lastly, the imitation operator defines the assimilation of knowledge memes in subsequent problem solving. Together, the four culture-inspired operators introduce high quality solutions into the initial population of the evolutionary search on related problems, thus leading to enhanced optimization performance. In this approach, the instructions for carrying out the behavior to act on a given problem are modeled as knowledge memes. The knowledge memes serve as the building blocks of past problem-solving experiences that may be efficiently passed on or replicated to support the search on future unseen problems, by means of cultural evolution. This capacity to draw on the knowledge from previous instances of problem-solving in the spirit of memetic computation [64–66] thus allows future search to be more efficient on related problems.
¹ A meme is defined in [63] as the basic unit of cultural transmission stored in brains. In the context of computational intelligence, memes are defined as recurring real-world patterns or knowledge encoded in computational representations for the purpose of effective problem-solving [64].

3.2.1 Transfer Learning as Culture-Inspired Operators

Fig. 3.10 Proposed memetic computation paradigm: evolutionary optimization (Fig. 3.10a) + transfer learning (Fig. 3.10b)

The proposed memetic computation paradigm based on evolutionary optimization (i.e., Fig. 3.10a) + transfer learning (i.e., Fig. 3.10b) is depicted in Fig. 3.10. It is composed of a conventional evolutionary algorithm as depicted in Fig. 3.10a (or it
can be any state-of-the-art evolutionary algorithm in the domain of interest) and four culture-inspired operators proposed for facilitating faster evolutionary optimization of related problems as depicted in Fig. 3.10b, namely Learning, Selection, Variation and Imitation, whose functions are described in what follows: • Learning Operator: Given that p corresponds to a problem instance and s∗ denotes the optimized solution of p, as attained by an evolutionary solver (labeled here as E S). The learning operator takes the role of modeling the mapping from p to s∗ , to derive the knowledge memes. Thus, the learning process evolves in an incremental manner, and builds up the wealth of ideas in the form of identified knowledge, along with the number of problem instances solved. Note the contrast to a simple storage or exact memory of specific problem instance p with associated solution s∗ as considered in the previous studies based on case-based reasoning [62]. • Selection Operator: Different prior knowledge introduces unique forms of bias into the search. Hence a certain bias would make the search more efficient on
some classes of problem instances but not for others. Inappropriately harnessed knowledge, on the other hand, may lead to the possible impairments of the search. The selection operator thus serves to select the high quality knowledge, from the knowledge pool, that replicate successfully. • Variation Operator: Variation forms the intrinsic innovation tendency of the cultural evolution. Without variations, maladaptive form of bias may be introduced in the evolutionary searches involving new problem instances. For instance, a piece of knowledge, which has been established as beneficial based on its particular demonstration of success on a given problem instance would quickly spiral out of control via replication. This will suppress the diversity and search of the evolutionary optimization across problems. Therefore, variation is clearly essential for retaining diversity in the knowledge pool towards efficient and effective evolutionary search. • Imitation Operator: From Dawkins’s book entitled “The selfish Gene” [63], ideas are copied from one person to another via imitation. In the present context, knowledge memes that are learned from past problem solving experiences replicate by means of imitation and used to enhance future evolutionary search on newly encountered problems.
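Putting the four operators together, the overall flow can be sketched as follows. The callables (es_solve, learn, select, vary, imitate) are placeholders for the operators formalized later in this section; the sketch only illustrates how the knowledge pool is grown and reused across a stream of related problems.

```python
def solve_with_transfer(problems, es_solve, learn, select, vary, imitate):
    som = []                                  # knowledge pool (society of mind)
    for p in problems:
        if som:
            ms = select(som, p)               # pick high-quality memes for p
            mt = vary(ms)                     # innovate / generalize them
            init_pop = imitate(mt, p)         # biased initial population for p
        else:
            init_pop = None                   # no prior knowledge: default init
        solution = es_solve(p, init_pop)      # conventional evolutionary search
        som.append(learn(p, solution))        # offline learning, archived in SoM
    return som
```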
3.2.2 Learning from Past Experiences

The schemata representation of a knowledge meme in computing, as the latent pattern encoded in the mystical mind of nature, is first identified. The problem-solving experiences on the encountered problems are then captured via learning and crystallized as a part of the knowledge pool that forms the memes or building blocks in the society of mind [67]. In this manner, whenever a new problem comes about, the selection operator kicks in to first identify the appropriate knowledge memes from the wealth of previously accumulated knowledge. These knowledge memes then undergo variations to effect the emergence of innovative knowledge. Enhancements to subsequent problem-solving efficiency on given new problems are then achieved by means of imitation.

Referring to Fig. 3.10, at time step i = 1, the evolutionary solver ES is faced with the first problem instance p1 to search on. Since p1 denotes the first problem of its kind to be optimized, no prior knowledge is available for enhancing the evolutionary solver ES search.² This is equivalent to the case where a child encounters a new problem of its kind to work on, in the absence of a priori knowledge that he/she could leverage upon. This condition is considered as "no relevant knowledge available" and the search by solver ES shall proceed normally, i.e., the selection operator remains dormant.

² If a database of knowledge memes learned from relevant past problem-solving experiences in the same domain is available, it can be loaded and leveraged upon.

If s1* corresponds to the optimized solution attained by solver ES on problem instance p1 and M denotes the knowledge meme or building
block, then M1 is the learned knowledge derived from p1 and s1∗ via the learning operator. Since the learning process is conducted offline to the optimization process of future related problems, there is no additional computational burden placed on the existing evolutionary solver E S. On subsequent unseen problem instances j = 2, . . . , ∞, selection kicks in to identify the appropriate knowledge memes Ms from the knowledge pool, denoted here as SoM. Activated knowledge memes Ms then undergo the variation operator to arrive at innovated knowledge Mt that can be imitated to bias subsequent evolutionary optimizations by the E S. In this manner, useful experiences attained from previously solved problem instances are captured incrementally and archived in knowledge pool SoM to form the society of mind, which are appropriately activated to enhance future search performances. Like knowledge housed in the human mind for coping with our everyday life and problem solving, knowledge memes residing in the artificial mind of the evolutionary solver play the role of biasing the search positively on newly encountered problems. In this manner, the intellectual capacity of the evolutionary solver evolves along with the number of problems solved, with transferrable knowledge meme accumulating with time. When a new problem is encountered, suitable learned knowledge meme is activated and varied to guide the solver in the search process. This knowledge pool thus formed the evolving problem domain knowledge that may be activated to solve future evolutionary search efficiently.
3.2.3 Proposed Formulations and Algorithms for Routing Problems

This section presents the proposed formulation and algorithmic implementations of the transfer learning culture-inspired operators, namely, learning, selection, variation and imitation, for faster evolutionary optimization of related problems in the domain of routing, i.e., searching for the suitable task assignments (i.e., customers that require to be serviced) of each vehicle, and then finding the optimal service order of each vehicle for the assigned tasks. The pseudo-code and detailed workflow for the realizations of the proposed new memetic computation paradigm on the routing problem domain are outlined in Algorithm 2 and Fig. 3.11. For a given new routing problem instance p_new^j (with data representation X_new^j) posed to evolutionary solver ES, the mechanism of the selection operator kicks in to select the high quality knowledge memes Ms to activate, if the knowledge pool SoM is not empty. Variation, which takes inspiration from the human's ability to generalize from past knowledge learned in previous problem-solving experiences, then operates on the activated knowledge memes Ms to arrive at the generalized knowledge Mt. Subsequently, for the given new problem instance X_new^j, imitation proceeds to positively bias the search of the evolutionary optimization solver ES, using the generalized knowledge Mt, followed by clustering and pairwise distance sorting (PDS) to generate the biased task assignment and service order solutions that would enhance the search performance on p_new^j. When the search on p_new^j completes, the problem instance p_new^j together with the obtained optimized solution s_new^j* of ES, i.e., (X_new^j, Y_new^j), then undergoes the learning operation so as to update the knowledge pool SoM.³

³ Note that as the learning operation is conducted offline, it does not incur additional cost to the evolutionary optimization of p_new^j.
Algorithm 2: Pseudo code of Fast Evolutionary Optimization of Routing by Transfer Learning from Past Experiences.
1  Begin:
2  for j = 1 : ∞ new problem instances p_new^j or X_new^j do
3    if SoM != ∅ then
4      /* knowledge pool not empty */
5      Perform selection to identify high quality knowledge memes or distance matrices Ms ∈ SoM /* see Eq. 3.11 in later Sect. 3.2.3.2 */
6      Perform variation to derive generalized knowledge Mt from Ms.
7      Perform imitation of Mt on X_new^j to derive the transformed problem distribution X'_new^j, where X'_new^j = Transform(X_new^j, Mt)
8      Empty the initial solution population.
9      for g = 1 : PopulationSize do  /* Fig. 3.14(a) → Fig. 3.14(b) */
10       1. Task Assignment of s_g =
11            KMeans(X'_new^j, VehicleNo., RI)
12            /* Fig. 3.14(b) → Fig. 3.14(c); RI denotes random initial points */
13       2. Service Order of s_g = PDS(X'_new^j)
14            /* Fig. 3.14(c) → Fig. 3.14(d); PDS(·) denotes the pairwise distance sorting */
15       3. Insert s_g into the initial solution population.
16     end for
17   else
18     Proceed with the original population initialization scheme of the evolutionary solver ES.
19   end if
20   /* Start of Evolutionary Solver ES Search */
21   Perform reproduction and selection operations of ES with the generated
22   population until the predefined stopping criteria are satisfied.
23   /* End of Evolutionary Solver ES Search */
24   /* Offline learning from the optimized result */
25   Perform learning on the given p_new^j and the corresponding optimized solution s_new^j*, denoted by (X_new^j, Y_new^j), attained by the ES evolutionary solver, to derive knowledge M_new^j.
26   Archive the learned knowledge of p_new^j into the SoM knowledge pool for subsequent reuse.
27 End

Fig. 3.11 Meme as the instructions for "intelligent" tasks assignment and ordering of routing problems (workflow linking past problem instances P_old = {p_i | i = 1, ..., n} and their optimized solutions S_old = {s_i* | i = 1, ..., n} to the knowledge pool SoM = {M_i | i = 1, ..., n}, and new instances P_new = {p_j} to biased solution populations via MMD-based selection, generalization, clustering and pairwise distance sorting)

3.2.3.1 Learning Operator

This subsection describes the learning of knowledge memes, as building blocks of useful traits, from given routing problem instances p and the corresponding optimized solutions s* (i.e., Line 25 in Algorithm 2). To begin, refer to Fig. 3.18 (in the appendix section), which shall serve as the example routing problem instance used in the illustrations. Figure 3.12, on the other hand, illustrates the learning of knowledge M from an optimized routing problem instance and subsequently using this knowledge to bias the task assignment and ordering of a routing problem. Specifically, Fig. 3.12a depicts the distribution of the tasks in the example routing problem of Fig. 3.18 (in the appendix section) that need to be serviced. Figure 3.12b then denotes the optimized routing solution of the ES evolutionary solver on the problem of Fig. 3.18 (in the appendix section) or Fig. 3.12a. The dashed circles in Fig. 3.12b denote the task assignments of the individual vehicles and the arrows indicate the task service orders, as optimized by ES. Here a knowledge meme M is defined in the form of a distance matrix that maximally aligns the given original distribution and service orders of tasks to the
(a) Original task distribution (b) Optimized solution (c) Knowledge meme biased tasks distribution and assignments
Fig. 3.12 Learning of Knowledge M which shall serve as the instruction for biasing the tasks assignment and ordering of a routing problem
optimized routing solution s∗ attained by solver E S. Using the example routing problem instance in Fig. 3.18 (in appendix section), the knowledge meme is formulated as matrix M that transforms or maps the task distributions depicted in Fig. 3.12a to the desired tasks distribution of s∗ while preserving the corresponding tasks service orders, as depicted in Fig. 3.12b. In this manner, whenever a new routing problem instance is encountered, suitable learned knowledge memes from previously optimized problem instances is then deployed to realign the tasks distribution and service orders constructively. For instance, Fig. 3.12c showcases the desirable scaled or transformed tasks distribution of Fig. 3.12a when the appropriate knowledge meme M is put to work. In particular, it can be observed in Fig. 3.12c that the goal is seeking for
the knowledge memes necessary to re-locate tasks serviced by a common vehicle to become closer to one another (as desired by the optimized solution s* shown in Fig. 3.12b), while tasks serviced by different vehicles are mapped further apart. In addition, to match the service orders of each vehicle to that of the optimized solution s*, the task distribution is adapted according to the sorted pairwise distances in ascending order (e.g., the distance between v1 and v3 is the largest among v1, v2 and v3, while the distance between v10 and v9 is smaller than that of v10 and v8).

In what follows, the proposed mathematical definitions of a knowledge meme M for the transformations of the task distribution are detailed. In particular, let V = {v_i | i = 1, ..., n}, where n is the number of tasks, denote the tasks of a problem instance to be assigned. The distance between any two tasks v_i = (v_i1, ..., v_ip)^T and v_j = (v_j1, ..., v_jp)^T in the p-dimensional space R^p is then given by:

d_M(v_i, v_j) = ||v_i − v_j||_M = sqrt((v_i − v_j)^T M (v_i − v_j))

where T denotes the transpose of a matrix or vector. M is positive semidefinite and can be represented as M = L L^T by means of singular value decomposition (SVD). Substituting this decomposition into d_M(v_i, v_j), one arrives at:

d_M(v_i, v_j) = sqrt((L^T v_i − L^T v_j)^T (L^T v_i − L^T v_j))    (3.2)
From Eq. 3.2, it is worth noting that the distances among the tasks are scaled by meme M. Thus the knowledge meme M performs the realignment of tasks distribution and service orders of a given new problem instance to one that bears greater similarity to the optimized solution s∗ . Next, the proposed mathematical formulations for learning of knowledge meme M are given. The schemata representations of a problem instance (p), optimized solution (s∗ ) and distance constraints set N are first defined. In particular, the data representations of the example problem instance in Fig. 3.18 (in appendix section) is depicted in Fig. 3.13, where v11 , v12 , etc., denote the features representation of each task, and D(·) indicates the Euclidean distance metric. Furthermore, if task vi and task v j are served by the same vehicle, Y(i, j) = 1, otherwise, Y(i, j) = −1. The distance constraints set N contains the service order information of the tasks and derived from the optimized solution s∗ . With respect to the example in Fig. 3.18 (in appendix section), since task v3 is served after v2 from v1 , the constraint thus takes the form of D(v1 , v3 ) > D(v1 , v2 ) as depicted in Fig. 3.13. To derive the knowledge meme M of a given CVRP or CARP problem instance, denoted by (p, s∗ ), the learning task is formulated as a maximization of the statistical dependency4 between X and Y with distance constraints as follows:
⁴ Dependency is a measure of the correlation of two data sets [68]. Here the interest in knowledge meme M in the form of a maximization of the statistical dependency thus ensures maximal alignment between the transformed task distribution and the task distribution of the optimized solution.
Fig. 3.13 Data representations of a problem instance p = X, the corresponding optimized solution s∗ = Y and distance constraints set N
max_K  tr(HKHY)
s.t.  K = X^T M X,  D_ij > D_iq, ∀(i, j, q) ∈ N,  K ⪰ 0    (3.3)
where tr(·) denotes the trace operation of a matrix. X and Y are the matrix representations of a CARP or CVRP instance p and the corresponding problem solution s*, respectively. In addition, H = I − (1/n) 1 1^T centers the data and the labels in the feature space, I denotes the identity matrix, and n equals the number of tasks. D_ij > D_iq is then the constraint imposing that a vehicle serves task q before task j, upon serving task i. Let T_ij denote an n × n matrix that takes non-zeros at T_ii = T_jj = 1, T_ij = T_ji = −1. The distance constraint D_ij > D_iq in Eq. 3.3 is then reformulated as tr(KT_ij) > tr(KT_iq). In addition, slack variables ξ_ijq are introduced to measure the violations of the distance constraints and penalize the corresponding square loss. Consequently, by substituting the constraints into Eq. 3.3, it arrives at:

min_{M,ξ}  −tr(XHYHX^T M) + (C/2) Σ_ijq ξ_ijq²
s.t.  M ⪰ 0
      tr(X^T M X T_ij) > tr(X^T M X T_iq) − ξ_ijq, ∀(i, j, q) ∈ N    (3.4)
where C balances between the two parts of the criterion. The first constraint enforces the learnt knowledge denoted by matrix M to be positive semi-definite, while the second constraint imposes the scaled distances among the tasks to align well with the desired service orders of the optimized solution s∗ (i.e., Y).
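The constraint reformulation tr(KT_ij) > tr(KT_iq) used above can be verified numerically. The sketch below is an illustrative toy example (coordinates and indices are made up): it builds the matrix T_ij with T_ii = T_jj = 1 and T_ij = T_ji = −1 and checks that tr(KT_ij) recovers the squared M-distance between tasks i and j, so that D_ij > D_iq is indeed equivalent to tr(KT_ij) > tr(KT_iq).

```python
import numpy as np

def pair_matrix(n, i, j):
    """n x n matrix T_ij with T_ii = T_jj = 1 and T_ij = T_ji = -1."""
    T = np.zeros((n, n))
    T[i, i] = T[j, j] = 1.0
    T[i, j] = T[j, i] = -1.0
    return T

rng = np.random.default_rng(0)
X = rng.random((2, 5))            # 5 tasks in 2-d space (columns of X)
M = np.eye(2)                     # identity meme, i.e. plain Euclidean geometry
K = X.T @ M @ X                   # K = X^T M X as in Eq. 3.3

i, j = 0, 3
T_ij = pair_matrix(5, i, j)
d_sq = (X[:, i] - X[:, j]) @ M @ (X[:, i] - X[:, j])   # squared M-distance D_ij
print(np.isclose(np.trace(K @ T_ij), d_sq))            # True
```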
To solve the learning problem in Eq. 3.4, a minimax optimization problem is first formulated by introducing dual variables α for the inequality constraints based on Lagrangian theory:

L_r = tr(−HX^T M X H Y) + (C/2) Σ_{ijq} ξ_ijq² − Σ_{ijq} α_ijq (tr(X^T M X T_ij) − tr(X^T M X T_iq) + ξ_ijq)        (3.5)

Setting ∂L_r/∂ξ_ijq = 0:

C ξ_ijq − α_ijq = 0 ⟹ ξ_ijq = (1/C) α_ijq        (3.6)
By substituting Eq. 3.6 into Eq. 3.5, the learning problem in Eq. 3.4 is reformulated as a minimax optimization problem, which is given by:

max_α min_M tr[(−XHYHX^T − Σ_{ijq} α_ijq X T_ij X^T + Σ_{ijq} α_ijq X T_iq X^T) M] − (1/2C) Σ_{ijq} α_ijq²
s.t. M ⪰ 0        (3.7)
By setting

A = XHYHX^T + Σ_{ijq} α_ijq X T_ij X^T − Σ_{ijq} α_ijq X T_iq X^T        (3.8)

and

J^t_ijq = tr[(X T_iq X^T − X T_ij X^T) M] − (1/C) α_ijq        (3.9)
Equations 3.8 and 3.9 are obtained from the above derivations. Then, as is common practice in machine learning, the parameter C in Eq. 3.9 is configured by means of cross-validation, and the learning problem of Eq. 3.7 is solved using readily available methods [69].
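For readers who want a self-contained meme-learning routine without setting up the dual problem, a simplified penalty-based alternative can be sketched as follows. This is our own illustrative stand-in, not the solver referred to in [69]: violated order constraints are penalized with a squared hinge, and M is projected back onto the positive semi-definite cone after each gradient step.

```python
import numpy as np

def learn_meme(X, Y, constraints, C=1.0, lr=1e-3, iters=500):
    """Approximate the meme M of Eq. 3.4 by penalized projected gradient.

    X : (p, n) task coordinates (columns are tasks).
    Y : (n, n) co-service labels (+1 same vehicle, -1 otherwise).
    constraints : list of (i, j, q) triplets requiring D_ij > D_iq.
    """
    p, n = X.shape
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    A = X @ H @ Y @ H @ X.T                      # gradient of tr(XHYHX^T M)
    M = np.eye(p)
    for _ in range(iters):
        G = -A                                   # gradient of the dependence term
        for (i, j, q) in constraints:
            d_ij = X[:, i] - X[:, j]
            d_iq = X[:, i] - X[:, q]
            # squared-hinge penalty on violated constraints D_ij > D_iq
            viol = d_iq @ M @ d_iq - d_ij @ M @ d_ij
            if viol > 0:
                G += C * viol * (np.outer(d_iq, d_iq) - np.outer(d_ij, d_ij))
        M -= lr * G
        # project the symmetric part of M back onto the PSD cone
        w, U = np.linalg.eigh((M + M.T) / 2)
        M = U @ np.diag(np.clip(w, 0.0, None)) @ U.T
    return M
```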
3.2.3.2 Selection Operator
Since different knowledge memes introduce unique biases into the evolutionary search, inappropriately chosen knowledge, and hence biases, can negatively impair the evolutionary search. To facilitate a positive transfer of knowledge that leads to an enhanced evolutionary search, the selection operator (i.e., Line 5 in Algorithm 2) is designed to select and replicate high quality knowledge memes that share common characteristics with the given new problem instance of interest. In particular, for a given set of z unique Ms in SoM, i.e.,
SoM = {M_1, M_2, ..., M_z} that form the knowledge pool, the selection operator is designed to give higher weights μ_i to knowledge memes that would induce positive biases. Furthermore, as the learning and selection of knowledge memes are from problems of a common domain, positive correlation among the problems is a plausible assumption.^5 Here, the knowledge meme coefficient vector μ is derived based on the maximum mean discrepancy criterion:^6

max_μ tr(HKHY) + Σ_{i=1}^{z} (μ_i)² Sim_i
s.t. M_t = Σ_{i=1}^{z} μ_i M_i, μ_i ≥ 0, Σ_{i=1}^{z} μ_i = 1,
     K = X^T M_t X, K ⪰ 0        (3.10)
In Eq. 3.10, the first term serves to maximize the statistical dependence between the input matrix X and the output label Y of the clusters of tasks. The second term measures the similarity between previously solved problem instances and the given new problem of interest. By substituting the constraints of Eq. 3.10 into the objective function, μ is derived as follows:

max_μ tr(HX^T M_t X H Y) + Σ_{i=1}^{z} (μ_i)² Sim_i
s.t. M_t = Σ_{i=1}^{z} μ_i M_i, M_i ⪰ 0,
     μ_i ≥ 0, Σ_{i=1}^{z} μ_i = 1        (3.11)
Sim_i defines the similarity measure between two given problem instances. In vehicle routing, task distribution and vehicle capacity are two key features that define the problem. Hence the similarity measure is formulated here as Sim_i = −(β ∗ MMD_i + (1 − β) ∗ DVC_i), where MMD(D_s, D_t) = ||(1/n_s) Σ_{i=1}^{n_s} φ(x_i^s) − (1/n_t) Σ_{i=1}^{n_t} φ(x_i^t)|| with φ(x) = x denotes the maximum mean discrepancy between the distributions of two given instances, obtained by considering the distance between their corresponding means. DVC_i denotes the discrepancy in vehicle capacity between any two problem instances. Based on domain knowledge, the task distribution (the locations of the nodes to be serviced) has a higher weightage than the vehicle capacity information.

5 From the experimental study, the problems in the benchmark set are mostly verified to be positively correlated.
6 Maximum mean discrepancy measures the distribution differences between two data sets, which can come in the form of vectors, sequences, graphs, and other common structured data types.
This implies that β > 0.5. In this work, β is configured empirically as 0.8 to favour task distribution information over vehicle capacity information. In Eq. 3.11, two unknown variables exist (i.e., μ and Y). Y is obtained from the results of the task assignment (i.e., if task v_i and task v_j are served by the same vehicle, Y(i, j) = 1, otherwise Y(i, j) = −1; the respective task assignment is obtained by clustering on the M-transformed tasks X). With Y fixed, Eq. 3.11 becomes a quadratic programming problem in μ. To solve the optimization problem of Eq. 3.11, clustering (e.g., K-Means) [70] on the input X is first performed directly to obtain the label matrix Y. Keeping Y fixed, μ is then obtained by maximizing Eq. 3.11 via a quadratic programming solver. Next, keeping the chosen M fixed, clustering is performed on the new X (i.e., transformed by the selected M, X = L^T X, where L is obtained by SVD on M) to obtain the label matrix Y.
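A light-weight realization of this alternating procedure is sketched below. It is only an illustrative outline under simplifying assumptions (the similarity values Sim_i are taken as given, the maximization of Eq. 3.11 is handed to a generic SLSQP solver, and the helper names are ours rather than the chapter's): Y is obtained by K-Means on the meme-transformed tasks, and μ is then found by maximizing Eq. 3.11 over the simplex.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.optimize import minimize

def select_memes(X, memes, sims, n_vehicles, n_rounds=3):
    """Alternating estimation of (Y, mu) for Eq. 3.11 (illustrative sketch).

    X      : (p, n) coordinates of the new instance's tasks.
    memes  : list of (p, p) PSD matrices M_1..M_z from the knowledge pool.
    sims   : array of similarity values Sim_i between past and new instances.
    """
    p, n = X.shape
    z = len(memes)
    H = np.eye(n) - np.ones((n, n)) / n
    mu = np.full(z, 1.0 / z)                     # start from a uniform blend

    for _ in range(n_rounds):
        # 1) fix mu: blend memes, transform tasks and cluster them (labels Y)
        M_t = sum(m * Mi for m, Mi in zip(mu, memes))
        w, U = np.linalg.eigh(M_t)
        L = U @ np.diag(np.sqrt(np.clip(w, 0.0, None)))
        labels = KMeans(n_clusters=n_vehicles, n_init=10).fit_predict((L.T @ X).T)
        Y = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)

        # 2) fix Y: maximize tr(H X^T M_t X H Y) + sum mu_i^2 Sim_i on the simplex
        def neg_obj(m):
            Mt = sum(mi * Mi for mi, Mi in zip(m, memes))
            return -(np.trace(H @ X.T @ Mt @ X @ H @ Y) + np.sum(m**2 * sims))
        cons = [{"type": "eq", "fun": lambda m: np.sum(m) - 1.0}]
        mu = minimize(neg_obj, mu, bounds=[(0.0, 1.0)] * z,
                      constraints=cons, method="SLSQP").x
    return mu
```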
3.2.3.3 Variation Operator
Furthermore, to introduce innovations into the selected knowledge memes during subsequent reuse, the variation operator (i.e., Line 6 in Algorithm 2) kicks in. In the present context, taking inspiration from humans' ability to generalize from past problem-solving experiences, variation is realized here in the form of generalization. However, it is worth noting that other alternative forms of probabilistic variation schemes may also be considered, since uncertainties can generate growth and variations of the knowledge that people have of the world [71], hence leading to higher adaptivity for solving complex and non-trivial problems. Here, the variation is derived as a generalization of the selected knowledge memes:

M_t = Σ_{i=1}^{z} μ_i M_i,   (Σ_{i=1}^{z} μ_i = 1, μ_i ∈ [0, 1])

3.2.3.4 Imitation Operator
In CVRP and CARP, the search for the optimal solution is typically conducted in two separate phases. The first phase involves the assignment of the tasks that require services to the appropriate vehicles. The second phase then serves to find the optimal service order of each vehicle for the assigned tasks obtained in phase 1. In what follows, the imitation of learned knowledge memes to bias the initial population of solutions in subsequent evolutionary searches is described. For each solution s_g in the EA population, the knowledge M_t generalized from past experiences is imitated for the purpose of generating positively biased solutions (see Line 10 in Algorithm 2) in the evolutionary search, by transforming or remapping the original task distribution of the solution (i.e., both the task assignments and the task service orders), denoted by X_new^j, to become the new task distribution X_new^{j'} given by:
X_new^{j'} = L^T X_new^j        (3.12)
where L is derived by singular value decomposition of M_t. An illustrative example is depicted in Fig. 3.14, where Fig. 3.14a denotes the original task distribution X_new^j, while Fig. 3.14b is the resultant knowledge biased or transformed task distribution X_new^{j'} obtained using M_t. In phase 1, K-Means clustering with random initializations is conducted on the knowledge biased task distribution X_new^{j'} to derive the task assignments of the vehicles, as depicted in Fig. 3.14c, where the dashed circles denote the task assignments of the individual vehicles, i.e., the tasks that shall be serviced by a common vehicle. In phase 2, the service orders of each vehicle are subsequently obtained by sorting the pairwise distances among tasks in ascending order. The two tasks with the largest distance then denote the first and last tasks to be serviced. Taking the first task as reference, the service order of the remaining tasks is defined according to the sorted order. Referring to Fig. 3.14d as an example, where the arrows indicate the service orders of the tasks, the distance between v10 and v7 is the largest among v10, v9, v8 and v7. With v10 assigned as the reference task to be served, v9 is then the next task to be serviced, since the distance between v10 and v9 is smaller than that between v10 and v8 or v7.

Fig. 3.14 An illustration of knowledge imitation in generating positively biased CVRP or CARP solutions: (a) original task distribution X_new^j, (b) knowledge transformed task distribution X_new^{j'}, (c) K-Means clustering on the knowledge transformed task distribution, (d) service orders obtained via pairwise distance sorting (PDS)
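The two imitation phases described above admit a compact implementation. The following sketch is our own illustration (the function name, vehicle count and handling of ties are placeholders): it remaps the task coordinates with L^T from the SVD of M_t (Eq. 3.12), assigns tasks to vehicles by K-Means, and orders each vehicle's tasks by pairwise distance sorting, taking the two most distant tasks as the first and last to be serviced.

```python
import numpy as np
from sklearn.cluster import KMeans

def imitate(M_t, X, n_vehicles):
    """Generate a knowledge-biased CVRP/CARP solution (illustrative sketch).

    M_t : (p, p) generalized knowledge meme.
    X   : (p, n) original task coordinates (columns are tasks).
    Returns a list of task-index sequences, one per vehicle.
    """
    # Remap the task distribution with L^T, where M_t = L L^T (Eq. 3.12).
    U, s, _ = np.linalg.svd(M_t)
    L = U @ np.diag(np.sqrt(s))
    X_new = L.T @ X

    # Phase 1: task assignment by K-Means on the transformed tasks.
    labels = KMeans(n_clusters=n_vehicles, n_init=10).fit_predict(X_new.T)

    routes = []
    for v in range(n_vehicles):
        idx = np.where(labels == v)[0]
        pts = X_new[:, idx]
        # Phase 2: pairwise distance sorting (PDS). The two most distant tasks
        # become the first and last to be serviced; the rest are ordered by
        # increasing distance from the chosen reference task.
        d = np.linalg.norm(pts[:, :, None] - pts[:, None, :], axis=0)
        first, last = np.unravel_index(np.argmax(d), d.shape)
        order = np.argsort(d[first])              # reference task comes first
        routes.append(idx[order].tolist())
    return routes
```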
3.2.4 Case Study on Capacitated Arc Routing Problem

In this section, empirical studies on the capacitated arc routing problem (CARP) domain are conducted to evaluate the proposed memetic computation paradigm for search, i.e., evolutionary optimization + transfer learning. In particular, both search efficiency and converged solution quality are investigated.
3.2.4.1 Capacitated Arc Routing Problem
Capacitated arc routing is a typical combinatorial optimization problem that concerns servicing a set of street networks using a fleet of capacity constrained vehicles located at a central depot. The objective of the problem is to minimize the total routing cost involved. In practice, the CARP and its variants arise in abundance across applications that involve serving street segments instead of specific nodes or points. Typical examples of CARP include urban waste collection, winter gritting and post delivery [72]. Theoretically, CARP has been proven to be NP-hard, with only explicit enumeration approaches known to solve it optimally. However, large scale problems are generally computationally intractable due to the poor scalability of most enumeration methods. From a survey of the literature, heuristic approaches have played an important role in algorithms capable of providing good solutions within tractable computational time. In [73], Lacomme et al. presented the basic components that have been embedded into memetic algorithms (MAs) for solving the extended version of CARP (ECARP). Lacomme's MA (LMA) was demonstrated to outperform all known heuristics on three sets of benchmarks. Recently, Mei et al. [74] extended Lacomme's work by introducing two new local search methods, which successfully improved the solution quality of LMA. In addition, a memetic algorithm with extended neighborhood search was also proposed for CARP in [3]. Nevertheless, it is worth noting that the majority of these works are designed based on heuristics that come with little theoretical rigor. The CARP, first proposed by Golden and Wong [75], can be formally stated as follows: Given a connected undirected graph G = (V, E), with vertex set V = {v_i}, i = 1 ... n, where n is the number of vertices, and edge set E = {e_i}, i = 1 ... m, with m denoting the number of edges. Consider a demand set D = {d(e_i) | e_i ∈ E}, where d(e_i) > 0 implies that edge e_i requires servicing, a travel cost vector C_t = {c_t(e_i) | e_i ∈
E} with c_t(e_i) representing the cost of traveling on edge e_i, and a service cost vector C_s = {c_s(e_i) | e_i ∈ E} with c_s(e_i) representing the cost of servicing edge e_i.

Definition 1 Given a depot node v_d ∈ V, a travel circuit C starting and ending at v_d is considered valid if and only if the total load Σ_{e_i ∈ C} d(e_i) ≤ W, where W is the capacity of each vehicle. The cost of a travel circuit is then defined by the total service cost for all edges that need service together with the total travel cost of the remaining edges that form the circuit:

cost(C) = Σ_{e_i ∈ C_s} c_s(e_i) + Σ_{e_i ∈ C_t} c_t(e_i)        (3.13)

where C_s and C_t are the edge sets that require servicing and those that do not, respectively.

Definition 2 A set of travel circuits S = {C_i}, i = 1 ... k, is a valid solution to the CARP if and only if:
1. ∀i ∈ [1, k], C_i is valid.
2. ∀e_i ∈ E with d(e_i) > 0, there exists one and only one circuit C_i ∈ S such that e_i ∈ C_i.

The objective of CARP is then to find a valid solution S that minimizes the total cost:

C_S = Σ_{∀C_i ∈ S} cost(C_i)        (3.14)
An example of a CARP is illustrated in Fig. 3.15, with v_d representing the depot, solid lines denoting edges that require servicing (otherwise known as tasks) and dashed lines representing edges that do not require servicing. Each task is assigned a unique integer number (e.g., 2 is assigned to the task from v2 to v1), and the integer numbers enclosed in brackets denote the inversion of each task (i.e., the direction of the edge). In Fig. 3.15, three feasible solution circuits C1 = {0, 4, 2, 0}, C2 = {0, 5, 7, 0}, and C3 = {0, 9, 11, 0} can be observed, each composed of two tasks. A '0' index value is assigned at the beginning and end of each circuit to denote that every circuit starts and ends at the depot. According to Eqs. 3.13 and 3.14, the total cost of a feasible solution S = {C1, C2, C3} is then obtained as the sum of the service costs for all tasks and the travel costs for all edges involved.
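As a concrete illustration of Eqs. 3.13 and 3.14, the helper below computes the cost of a candidate CARP solution and checks the vehicle capacity. The data structures (edge-keyed dictionaries) and the assumption that every demanded edge appearing in a circuit is serviced there are our own simplifications for the example.

```python
def carp_cost(solution, service_cost, travel_cost, demand, capacity):
    """Total cost of a CARP solution (Eqs. 3.13 and 3.14), with a capacity check.

    solution     : list of circuits, each a list of edges; edges with positive
                   demand are assumed to be serviced, the rest only traversed.
    service_cost : dict edge -> c_s(e); travel_cost : dict edge -> c_t(e)
    demand       : dict edge -> d(e);   capacity    : vehicle capacity W
    """
    total = 0.0
    for circuit in solution:
        load = sum(demand[e] for e in circuit if demand.get(e, 0) > 0)
        assert load <= capacity, "circuit violates the vehicle capacity"
        total += sum(service_cost[e] if demand.get(e, 0) > 0 else travel_cost[e]
                     for e in circuit)
    return total
```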
3.2.4.2 Experimental Configuration
The well-established egl benchmark is used in the present experimental study on CARP. The detailed properties of each egl instance are presented in Tables 3.5 and 3.6, where "|V|", "|E_R|", "|E|" and "LB" denote the number of vertices, the number of tasks, the total number of edges and the lower bound of each problem instance, respectively.
Fig. 3.15 Illustration of the CARP (v_d represents the depot; v_i, i = 1, ..., 12, denote the customers)

Table 3.5 Properties of the egl "E" Series CARP benchmarks

Data set | E1A  | E1B  | E1C  | E2A  | E2B  | E2C  | E3A  | E3B  | E3C   | E4A  | E4B  | E4C
|V|      | 77   | 77   | 77   | 77   | 77   | 77   | 77   | 77   | 77    | 77   | 77   | 77
|E_R|    | 51   | 51   | 51   | 72   | 72   | 72   | 87   | 87   | 87    | 98   | 98   | 98
|E|      | 98   | 98   | 98   | 98   | 98   | 98   | 98   | 98   | 98    | 98   | 98   | 98
LB       | 3548 | 4498 | 5566 | 5018 | 6305 | 8243 | 5898 | 7704 | 10163 | 6048 | 8884 | 11427

Table 3.6 Properties of the egl "S" Series CARP benchmarks

Data set | S1A  | S1B  | S1C  | S2A  | S2B   | S2C   | S3A   | S3B   | S3C   | S4A   | S4B   | S4C
|V|      | 140  | 140  | 140  | 140  | 140   | 140   | 140   | 140   | 140   | 140   | 140   | 140
|E_R|    | 75   | 75   | 75   | 147  | 147   | 147   | 159   | 159   | 159   | 190   | 190   | 190
|E|      | 190  | 190  | 190  | 190  | 190   | 190   | 190   | 190   | 190   | 190   | 190   | 190
LB       | 5018 | 6384 | 8493 | 9824 | 12968 | 16353 | 10143 | 13616 | 17100 | 12143 | 16093 | 20375
In traditional CARP, each task is represented by a corresponding head vertex, tail vertex, travel cost and demand (service cost). The shortest distance matrix of the vertices is first derived by means of Dijkstra's algorithm [76], i.e., using the distances available between the vertices of a CARP. The coordinate features (i.e., locations) of each task are then approximated by means of multidimensional scaling [77]. In this manner, each task is represented as a node in the form of coordinates. A CARP instance in the current setting is thus represented by an input matrix X composed of the coordinate features of all tasks in the problem. Such a representation allows standard clustering approaches, such as the K-Means algorithm, to be conducted on the CARP for task assignment, and makes pairwise distance sorting among tasks available for preserving service orders. The label information Y of each task belonging to the CARP instance is defined by the optimized solution of the CARP. In the empirical study, the two evolutionary solvers for CARP in this chapter, i.e., PMA and ILMA, are considered as the baselines for comparison. Furthermore, three solution population initialization procedures based on PMA and ILMA are also considered here. The first is a simple random approach, labeled here as PMA-R and ILMA-R. The second is the informed heuristic based population generation procedure used in the baseline state-of-the-art PMA and ILMA for CARP [74], where the initial population is formed by a fusion of chromosomes generated from Augment_Merge [75], Path_Scanning [78], Ulusoy's Heuristic [79] and the simple random initialization procedure. The last is the proposed solution generation based on the knowledge memes transferred from past problem solving experiences, labeled here as PMA-M and ILMA-M. In PMA-M and ILMA-M, the meme pool is empty at the start, while new memes are accumulated as more CARP instances are optimized. Thus, once again, PMA-M and ILMA-M perform exactly like PMA and ILMA, respectively, on the first problem instance encountered, since they are meant to serve as intelligent versions of PMA and ILMA whose intellect increases with the number of problems solved. In the present study, the CARP instances in Tables 3.5 and 3.6 are solved from left to right, and each data set is solved independently. To facilitate a fair comparison and verify the benefits of learning from past experiences, the evolutionary operators of both PMA and ILMA and their variants are configured consistently with those reported in [74]. In addition, in PMA-M and ILMA-M, the MMD of Eq. 3.10 is augmented with the demand of each task. Lastly, the criteria defined to measure the search performance are listed in Table 3.7. Among these criteria, Number of Fitness Evaluations is used to measure the efficiency of the algorithms, while Ave.Cost and B.Cost serve as the criteria for measuring the solution quality of the algorithms.
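The preprocessing step described above — all-pairs shortest paths followed by multidimensional scaling — can be reproduced with standard scientific-Python tools, as in the sketch below. The library choices, parameter values and the use of task vertices as proxies for task locations are our own simplifications, not those of the original study.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.manifold import MDS

def task_coordinates(adjacency, task_vertices, dim=2):
    """Approximate low-dimensional coordinates for CARP tasks (illustrative sketch).

    adjacency     : (|V|, |V|) matrix of travel costs (0 where no edge exists).
    task_vertices : indices of vertices chosen to represent the required edges.
    """
    # Shortest-path distances between all vertices (Dijkstra's algorithm).
    dist = dijkstra(adjacency, directed=False)
    # Restrict to the task vertices and embed them by multidimensional scaling.
    sub = dist[np.ix_(task_vertices, task_vertices)]
    mds = MDS(n_components=dim, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(sub)
    return coords.T   # shape (dim, number of tasks): the input matrix X
```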
3.2.4.3 Results and Discussion
The obtained empirical results and discussions are presented in this section.

Table 3.7 Criteria for measuring performance

Criterion | Definition
Number of Fitness Evaluations | Average number of fitness evaluation calls made across all 30 independent runs conducted
Ave.Cost | Average travel cost or fitness of the solutions obtained across all 30 independent runs conducted
B.Cost | Best travel cost or fitness of the solutions obtained across all 30 independent runs conducted
Std.Dev | Standard deviation of the solutions' travel cost or fitness across all 30 independent runs conducted

Search Efficiency

To demonstrate the efficiency of the proposed memetic computation paradigm, the search convergence traces of PMA and ILMA and their variants (i.e., PMA-R and
PMA-M; ILMA-R and ILMA-M) on several representative instances of the "E" and "S" series egl benchmarks are depicted in Figs. 3.16 and 3.17, respectively. It can be observed that the baseline PMA and ILMA converge faster than their random initialization variants, with improved solutions attained on most of the CARP instances, e.g., "E2B" (Figs. 3.16a and 3.17a), "E3C" (Figs. 3.16b and 3.17b), "E4B" (Figs. 3.16c and 3.17c), "S2C" (Fig. 3.16e), "S4C" (Fig. 3.17f), etc. This is because solutions with better fitness obtained from heuristic search are used as initial seeds in PMA and ILMA, which makes the subsequent search focus more on regions where potential near-optimal solutions exist. Furthermore, for the proposed memetic computation search paradigm, on both the "E" and "S" series egl benchmarks, PMA-M and ILMA-M converge more efficiently than their randomly initialized variants (i.e., PMA-R and ILMA-R) and heuristic initialized variants (i.e., PMA and ILMA) on almost all the CARP instances presented. Overall, PMA-M and ILMA-M are noted to bring about savings of at least 2 × 10^6 fitness function evaluations in arriving at the solutions attained by PMA and ILMA, respectively, on most of the CARP instances. For instance, on instance "S1B" (Figs. 3.16d and 3.17d), PMA-M and ILMA-M used a total of 1.5 × 10^7 and 1.5 × 10^6 fitness function evaluations, respectively, to converge to the solutions incurred by PMA and ILMA, which otherwise used up significantly larger numbers of fitness evaluations of approximately 5 × 10^7 and 6 × 10^6. Note that the only difference between PMA-M (or ILMA-M) and PMA (or ILMA) is the initialization process, where the former generates the initial population with knowledge memes transferred from past solved CARP instances.

Solution Quality

Furthermore, Tables 3.8, 3.9, 3.10 and 3.11 tabulate the results that measure the solution quality on the "E"-Series and "S"-Series egl CARP datasets as obtained by PMA, ILMA and their variants, across 30 independent runs. In the tables, the method with superior performance with respect to "B.Cost" and "Ave.Cost" is highlighted in bold font. As can be observed from the tables, with the incorporation of heuristic information as inductive search bias in the baseline, PMA and ILMA generally attained
Fig. 3.16 Averaged search convergence traces (across 30 independent runs) of PMA, PMA-R, and PMA-M on representative CARP "E" and "S" Series instances: (a) E2B, (b) E3C, (c) E4B, (d) S1B, (e) S2C, (f) S4B. Y-axis: Travel cost, X-axis: Number of Fitness Evaluations
Fig. 3.17 Averaged search convergence traces (across 30 independent runs) of ILMA, ILMA-R, and ILMA-M on representative CARP "E" and "S" Series instances: (a) E2B, (b) E3C, (c) E4B, (d) S1B, (e) S2B, (f) S4C. Y-axis: Travel cost, X-axis: Number of Fitness Evaluations
Table 3.8 Solution quality of PMA, PMA-R, and PMA-M on egl "E" Series CARP instances. The superior solution quality of each respective problem instance is highlighted in bold font. ("≈", "+" and "−" denote PMA-M statistically significant similar, better, and worse than PMA, respectively)

Data set | PMA (B.Cost / Ave.Cost / Std.Dev) | PMA-R (B.Cost / Ave.Cost / Std.Dev) | PMA-M, proposed method (B.Cost / Ave.Cost / Std.Dev)
1.E1A | 3548 / 3548 / 0 | 3548 / 3548 / 0 | 3548 / 3548 ≈ / 0
2.E1B | 4498 / 4500.70 / 8.53 | 4498 / 4503.40 / 11.38 | 4498 / 4499.30 ≈ / 4.11
3.E1C | 5595 / 5596.80 / 5.69 | 5595 / 5598.60 / 7.58 | 5595 / 5597.60 ≈ / 6.18
4.E2A | 5018 / 5018 / 0 | 5018 / 5018 / 0 | 5018 / 5018 ≈ / 0
5.E2B | 6317 / 6334.60 / 11.29 | 6317 / 6330.80 / 10.92 | 6317 / 6326.90 ≈ / 9.45
6.E2C | 8335 / 8343.30 / 20.89 | 8335 / 8369.40 / 67.28 | 8335 / 8342.90 ≈ / 20.85
7.E3A | 5898 / 5910.00 / 25.29 | 5898 / 5912.60 / 25.26 | 5898 / 5898.00 + / 0
8.E3B | 7775 / 7781.80 / 7.37 | 7777 / 7782.40 / 7.60 | 7775 / 7780.20 ≈ / 5.55
9.E3C | 10292 / 10316.90 / 35.53 | 10292 / 10319.30 / 39.20 | 10292 / 10309.40 ≈ / 34.63
10.E4A | 6461 / 6467.60 / 11.64 | 6461 / 6467.10 / 17.22 | 6454 / 6462.40 ≈ / 4.03
11.E4B | 8988 / 9026.40 / 31.81 | 8998 / 9037.60 / 49.14 | 8988 / 9028.30 ≈ / 20.06
12.E4C | 11541 / 11635.70 / 68.08 | 11604 / 11679.50 / 51.74 | 11545 / 11638.80 ≈ / 47.63
Table 3.9 Solution quality of PMA, PMA-R, and PMA-M on egl "S" Series CARP instances. The superior solution quality of each respective problem instance is highlighted in bold font. ("≈", "+" and "−" denote PMA-M statistically significant similar, better, and worse than PMA, respectively)

Data set | PMA (B.Cost / Ave.Cost / Std.Dev) | PMA-R (B.Cost / Ave.Cost / Std.Dev) | PMA-M, proposed method (B.Cost / Ave.Cost / Std.Dev)
1.S1A | 5018 / 5023.40 / 17.07 | 5018 / 5023.40 / 17.07 | 5018 / 5023.33 ≈ / 18.04
2.S1B | 6388 / 6393.00 / 15.81 | 6388 / 6393.30 / 14.77 | 6388 / 6388 + / 0
3.S1C | 8518 / 8563.60 / 34.90 | 8518 / 8559.70 / 33.73 | 8518 / 8559.30 ≈ / 31.96
4.S2A | 9912 / 9968.60 / 44.94 | 9926 / 10012.60 / 54.03 | 9903 / 9972.30 ≈ / 48.51
5.S2B | 13143 / 13239.00 / 38.07 | 13174 / 13230.90 / 36.06 | 13129 / 13178.70 + / 39.73
6.S2C | 16489 / 16554.60 / 50.66 | 16505 / 16595.20 / 76.22 | 16470 / 16555.30 ≈ / 71.30
7.S3A | 10242 / 10306.70 / 61.31 | 10263 / 10333.70 / 42.69 | 10245 / 10304.20 ≈ / 53.55
8.S3B | 13802 / 13860.80 / 55.89 | 13762 / 13828.00 / 45.98 | 13760 / 13822.80 + / 69.29
9.S3C | 17280 / 17331.40 / 40.80 | 17274 / 17338.70 / 60.95 | 17271 / 17321.70 ≈ / 50.10
10.S4A | 12327 / 12401.70 / 56.72 | 12358 / 12461.60 / 60.52 | 12326 / 12416.30 ≈ / 64.43
11.S4B | 16343 / 16460.50 / 67.29 | 16381 / 16461.80 / 60.18 | 16343 / 16433.00 ≈ / 26.11
12.S4C | 20575 / 20703.80 / 87.18 | 20566 / 20691.40 / 88.06 | 20537 / 20687.30 ≈ / 96.40
improved performance in terms of "Ave.Cost" over their randomly initialized variants PMA-R and ILMA-R, respectively, on instances such as "E1C", "E2B", "E3C", "S3A", "S4B", etc. Turning then to the proposed memetic evolutionary search paradigm, it can be seen from the tables that PMA-M and ILMA-M performed competitively to PMA and ILMA, respectively, on instances "E1-A" and "S1-A", as expected, since these
Table 3.10 Solution quality of ILMA, ILMA-R, and ILMA-M on egl "E" Series CARP instances. The superior solution quality of each respective problem instance is highlighted in bold font. ("≈", "+" and "−" denote ILMA-M statistically significant similar, better, and worse than ILMA, respectively)

Data set | ILMA (B.Cost / Ave.Cost / Std.Dev) | ILMA-R (B.Cost / Ave.Cost / Std.Dev) | ILMA-M, proposed method (B.Cost / Ave.Cost / Std.Dev)
1.E1A | 3548 / 3548 / 0 | 3548 / 3548 / 0 | 3548 / 3548 ≈ / 0
2.E1B | 4498 / 4517.63 / 12.45 | 4498 / 4517.80 / 13.19 | 4498 / 4513.27 ≈ / 14.93
3.E1C | 5595 / 5599.33 / 7.56 | 5595 / 5601.73 / 8.84 | 5595 / 5598.07 ≈ / 8.58
4.E2A | 5018 / 5018 / 0 | 5018 / 5018 / 0 | 5018 / 5018 ≈ / 0
5.E2B | 6317 / 6341.53 / 20.15 | 6317 / 6344.03 / 22.38 | 6317 / 6337.90 ≈ / 11.90
6.E2C | 8335 / 8359.87 / 36.61 | 8335 / 8355.07 / 39.26 | 8335 / 8349.97 ≈ / 26.16
7.E3A | 5898 / 5921.23 / 30.07 | 5898 / 5916.93 / 30.50 | 5898 / 5910.97 + / 30.57
8.E3B | 7777 / 7794.77 / 23.08 | 7777 / 7792.17 / 29.95 | 7775 / 7788.70 ≈ / 15.74
9.E3C | 10292 / 10318.73 / 40.89 | 10292 / 10327.07 / 33.46 | 10292 / 10319.16 ≈ / 36.15
10.E4A | 6461 / 6471.37 / 15.16 | 6458 / 6481.77 / 22.77 | 6461 / 6469.80 ≈ / 10.27
11.E4B | 8995 / 9060.67 / 45.29 | 8993 / 9067.93 / 50.54 | 8988 / 9053.97 ≈ / 41.49
12.E4C | 11555 / 11678.47 / 73.57 | 11594 / 11728.30 / 82.39 | 11576 / 11697.27 ≈ / 76.98
Table 3.11 Solution quality of ILMA, ILMA-R, and ILMA-M on egl "S" Series CARP instances. The superior solution quality of each respective problem instance is highlighted in bold font. ("≈", "+" and "−" denote ILMA-M statistically significant similar, better, and worse than ILMA, respectively)

Data set | ILMA (B.Cost / Ave.Cost / Std.Dev) | ILMA-R (B.Cost / Ave.Cost / Std.Dev) | ILMA-M, proposed method (B.Cost / Ave.Cost / Std.Dev)
1.S1A | 5018 / 5023.93 / 18.14 | 5018 / 5025.97 / 26.97 | 5018 / 5023.67 ≈ / 25.39
2.S1B | 6388 / 6404.07 / 22.96 | 6388 / 6403.30 / 20.89 | 6388 / 6392.80 + / 14.65
3.S1C | 8518 / 8577.63 / 44.18 | 8518 / 8581.67 / 33.98 | 8518 / 8576.53 ≈ / 33.12
4.S2A | 9920 / 10037.43 / 61.51 | 9925 / 10050.30 / 54.24 | 9896 / 10010.20 ≈ / 67.13
5.S2B | 13191 / 13260.03 / 45.37 | 13173 / 13257.90 / 48.94 | 13147 / 13245.56 ≈ / 53.02
6.S2C | 16507 / 16605.10 / 65.26 | 16480 / 16626.43 / 62.90 | 16468 / 16615.40 ≈ / 76.79
7.S3A | 10248 / 10342.77 / 47.56 | 10278 / 10369.40 / 52.42 | 10239 / 10339.40 ≈ / 53.29
8.S3B | 13764 / 13912.97 / 79.85 | 13779 / 13899.70 / 76.96 | 13749 / 13881.33 ≈ / 85.78
9.S3C | 17274 / 17371.10 / 79.12 | 17277 / 17402.43 / 74.37 | 17261 / 17355.03 + / 48.23
10.S4A | 12335 / 12498.47 / 67.72 | 12407 / 12534.47 / 63.23 | 12320 / 12489.43 ≈ / 83.91
11.S4B | 16378 / 16542.93 / 89.65 | 16443 / 16540.43 / 87.52 | 16415 / 16512.43 ≈ / 57.54
12.S4C | 20613 / 20794.80 / 77.51 | 20589 / 20841.13 / 85.53 | 20564 / 20774.20 ≈ / 86.78
are the first encountered problem instances of PMA-M and ILMA-M on the "E" and "S" egl CARP benchmarks, respectively, where no memes are yet available in the meme pool. As more problem instances are optimized, PMA-M is observed to demonstrate superior performance over PMA in terms of Ave.Cost on 15 out of the total
24 egl benchmarks. In addition, ILMA-M is observed to demonstrate superior performance over ILMA in terms of Ave.Cost on 18 out of the total 24 egl benchmarks. To obtain statistically significant comparisons, the Wilcoxon rank sum test with 95% confidence level has been conducted on the experimental results. As can be observed, PMA-M and ILMA-M are noted to search efficiently at no loss in solution quality when compared to PMA and ILMA, respectively.
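The significance labels ("≈", "+", "−") used in Tables 3.8, 3.9, 3.10 and 3.11 can be reproduced for any pair of result samples with the rank-sum test available in SciPy; the snippet below is a generic illustration, with placeholder cost arrays rather than the actual experimental data.

```python
import numpy as np
from scipy.stats import ranksums

def compare(costs_a, costs_b, alpha=0.05):
    """Label algorithm B against A as '+', '-', or '~' at the given level."""
    stat, p = ranksums(costs_a, costs_b)
    if p >= alpha:
        return "~"                               # statistically similar
    return "+" if np.mean(costs_b) < np.mean(costs_a) else "-"

# Example with made-up travel costs from 30 independent runs of each method:
# print(compare(costs_pma, costs_pma_m))
```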
3.2.5 Case Study on Capacitated Vehicle Routing Problem

To further evaluate the efficacy of the proposed transfer learning as culture-inspired operators for fast evolutionary optimization of related problems from past problem solving experiences, comprehensive empirical studies conducted on another well-known routing domain, namely the capacitated vehicle routing problem (CVRP), are presented in this section. As in the previous section, both search efficiency and converged solution quality are investigated.
3.2.5.1 Capacitated Vehicle Routing Problem
The capacitated vehicle routing problem (CVRP), introduced by Dantzig and Ramser [80], is the problem of designing a set of vehicle routes in which a fixed fleet of delivery vehicles of uniform capacity must service known customer demands for a single commodity from a common depot at minimum cost. The CVRP can be formally defined as follows. Given a connected undirected graph G = (V, E), with vertex set V = {v_i}, i = 1 ... n, where n is the number of vertices, and edge set E = {e_ij}, i, j = 1 ... n, with e_ij denoting the arc between vertices v_i and v_j. Vertex v_d corresponds to the depot at which k homogeneous vehicles are based, and the remaining vertices denote the customers. Each arc e_ij is associated with a non-negative weight c_ij, which represents the travel distance from v_i to v_j. Consider a demand set D = {d(v_i) | v_i ∈ V}, where d(v_i) > 0 implies that customer v_i requires servicing (i.e., is known as a task); the CVRP consists of designing a set of least cost vehicle routes R = {C_i}, i = 1 ... k, such that:
1. Each route C_i, i ∈ [1, k], must start and end at the depot node v_d ∈ V.
2. The total load of each route must be no more than the capacity W of each vehicle, Σ_{v_i ∈ C_i} d(v_i) ≤ W.
3. For all v_i ∈ V with d(v_i) > 0, there exists one and only one route C_i ∈ R such that v_i ∈ C_i.
The objective of the CVRP is to minimize the overall distance cost(R) traveled by all k vehicles, defined as:
cost(R) = Σ_{i=1}^{k} c(C_i)        (3.15)

where c(C_i) is the sum of the travel distances of the arcs contained in route C_i. An example of an optimized CVRP routing solution is illustrated in Fig. 3.18, where four vehicle routes, namely, R1 = {0, v1, v2, v3, 0}, R2 = {0, v6, v5, v4, 0}, R3 = {0, v10, v9, v8, v7, 0} and R4 = {0, v14, v13, v12, v11, 0}, can be observed. A '0' index value is assigned at the beginning and end of each route to denote that every route starts and ends at the depot.

Fig. 3.18 An example of a CVRP, with optimized solution s∗ = {0, v1, v2, v3, 0, v6, v5, v4, 0, v10, v9, v8, v7, 0, v14, v13, v12, v11, 0}
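For completeness, Eq. 3.15 together with the three feasibility conditions can be checked with a few lines of code. The helper below is an illustrative sketch; the distance matrix, demand vector and route lists are placeholder inputs.

```python
import numpy as np

def cvrp_cost(routes, dist, demand, capacity):
    """Total travel distance of a CVRP solution (Eq. 3.15), with feasibility checks.

    routes  : list of routes, each a list of customer indices (the depot, index 0,
              is implicit at the start and end of every route).
    dist    : (n, n) symmetric travel-distance matrix c_ij.
    demand  : length-n vector d(v_i), with demand[0] = 0 for the depot.
    """
    served = [v for r in routes for v in r]
    assert len(served) == len(set(served)), "a customer is served more than once"
    total = 0.0
    for r in routes:
        assert sum(demand[v] for v in r) <= capacity, "vehicle capacity violated"
        path = [0] + list(r) + [0]               # start and end at the depot
        total += sum(dist[a, b] for a, b in zip(path, path[1:]))
    return total
```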
3.2.5.2 Experimental Configuration
In the present study, a recently proposed evolutionary algorithm for solving CVRPs, labeled in the original published work as CAMA [81], is considered as the baseline conventional evolutionary solver. All three commonly used CVRP benchmark sets with diverse properties (e.g., number of vertices, number of vehicles, etc.) are investigated in the present empirical study, namely "AUGERAT", "CE" and "CHRISTOFIDES". The detailed properties of these data sets are summarized in Tables 3.12 and 3.13. In CVRP, each task or vertex has corresponding coordinates (i.e., in 2-d space) and a demand. Using the coordinates of the vertices, the task assignment of each vehicle is generated based on K-Means clustering. A CVRP instance is thus represented as an input matrix X (see Fig. 3.13), which is composed of the coordinate features of all tasks in the problem. The desired
Table 3.12 Properties of the "Augerat" CVRP data set

Data set  | |V| | Cv  | LB
A-n32-k5  | 31  | 100 | 784
A-n54-k7  | 53  | 100 | 1167
A-n60-k9  | 59  | 100 | 1354
A-n69-k9  | 68  | 100 | 1159
A-n80-k10 | 79  | 100 | 1763
B-n41-k6  | 40  | 100 | 829
B-n57-k7  | 56  | 100 | 1140
B-n63-k10 | 62  | 100 | 1496
B-n68-k9  | 67  | 100 | 1272
B-n78-k10 | 77  | 100 | 1221
P-n50-k7  | 49  | 150 | 554
P-n76-k5  | 75  | 280 | 627
Table 3.13 Properties of the "CE" and "Christofides" CVRP data sets

Data set   | |V| | Cv   | LB
E-n33-k4   | 32  | 8000 | 835
E-n76-k7   | 75  | 220  | 682
E-n76-k8   | 75  | 180  | 735
E-n76-k10  | 75  | 140  | 830
E-n76-k14  | 75  | 100  | 1021
E-n101-k8  | 100 | 200  | 815
c50        | 50  | 160  | 524.61
c75        | 75  | 140  | 835.26
c100       | 100 | 200  | 826.14
c100b      | 100 | 200  | 819.56
c120       | 120 | 200  | 1042.11
c150       | 150 | 200  | 1028.42
c199       | 199 | 200  | 1291.45
vehicle assigned to each task (i.e., the task assignment) is then given by the optimized solution, Y, of the respective CVRP instance. Besides the proposed knowledge meme biased approach, two other commonly used initialization procedures for generating the population of solution individuals in the state-of-the-art baseline CAMA are investigated here to verify the efficiency and effectiveness of the proposed evolutionary search across problems. The first is the simple random approach for generating the initial population, labeled here as CAMA-R. The second is the informed heuristic population initialization procedure proposed in the state-of-the-art baseline CAMA [81], in which the initial population is a fusion of solutions generated by the Backward Sweep, Saving, and Forward Sweep heuristics and random initialization. The CAMA that employs the proposed transfer learning culture-inspired operators is then denoted as CAMA-M, where the initial population of individuals in CAMA is now generated based on the high quality knowledge memes that have been accumulated from past CVRP solving experiences via the cultural evolutionary mechanisms of learning, selection, variation and imitation. Note that if no prior knowledge has been learned so far, CAMA-M behaves exactly like the baseline CAMA. In the present study, the CVRP instances in Tables 3.12 and 3.13 are solved from left to right, and each data set is solved independently. Moreover, the operator and parameter settings of CAMA-R, CAMA and CAMA-M are kept the same as those of [81] for the purpose of a fair comparison. For CAMA-M, the MMD of Eq. 3.10 is augmented with the demand of each task as one of the problem features.
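The data representation just described (the input matrix X and label matrix Y of Fig. 3.13) can be assembled from a CVRP instance and its optimized routes as in the sketch below; the function name and input conventions are our own illustrative choices.

```python
import numpy as np

def instance_representation(coords, routes):
    """Build the (X, Y) schema of Fig. 3.13 for a CVRP instance (illustrative sketch).

    coords : (n, 2) task (customer) coordinates -> input matrix X of shape (2, n).
    routes : list of routes (lists of task indices) from the optimized solution s*;
             every task index 0..n-1 is assumed to appear in exactly one route.
    """
    X = np.asarray(coords, dtype=float).T
    n = X.shape[1]
    vehicle = np.empty(n, dtype=int)
    for k, route in enumerate(routes):
        vehicle[route] = k
    # Y(i, j) = +1 if tasks i and j are served by the same vehicle, else -1.
    Y = np.where(vehicle[:, None] == vehicle[None, :], 1.0, -1.0)
    return X, Y
```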
3.2.5.3 Results and Discussion
This section presents, analyzes, discusses and compares the results obtained against recently proposed methods based on the criteria of search efficiency and solution quality.

Search Efficiency — Convergence Trends and Speedup

To assess the efficiency of the proposed approach, the representative search convergence traces of CAMA, CAMA-R and CAMA-M on the three different CVRP benchmark sets are presented in Figs. 3.19, 3.20 and 3.21. The Y-axis of the figures denotes the actual travel cost obtained, while the X-axis gives the respective computational effort incurred in terms of the number of fitness evaluation calls made so far. As observed, CAMA-R is noted to converge faster than CAMA on most of the instances in the "AUGERAT" and "CE" benchmarks (e.g., Fig. 3.19a–i). However, on the large scale "CHRISTOFIDES" benchmarks, CAMA searches more efficiently, especially beyond 1000 fitness evaluations (e.g., Fig. 3.21b–d). On the other hand, it can be observed that CAMA-M converges faster than both CAMA-R and CAMA on all the CVRP benchmarks. Particularly, on instances "B-n41-k6" (Fig. 3.19c), "B-n68-k9" (Fig. 3.19e) and "B-n78-k10" (Fig. 3.19f), etc., CAMA-M takes only approximately 250 fitness evaluations to arrive at the
solution quality of CAMA-R and CAMA, which incurred more than 1500 fitness evaluations. On the large scale instances, such as "c150" (Fig. 3.21e) and "c199" (Fig. 3.21f), CAMA-M brings about savings of at least 2000 fitness evaluations in arriving at solution qualities similar to those of CAMA and CAMA-R.
Fig. 3.19 Averaged search convergence traces (across 30 independent runs) of CAMA, CAMA-R, and CAMA-M on the representative CVRP "AUGERAT" benchmark set: (a) A-n60-k9, (b) A-n69-k9, (c) B-n41-k6, (d) B-n57-k7, (e) B-n68-k9, (f) B-n78-k10, (h) P-n50-k7, (i) P-n76-k5. Y-axis: Travel cost, X-axis: Number of Fitness Evaluations. Note that CAMA-M is observed to search significantly faster in converging to near the lower bound solution on the respective CVRPs than the other counterparts
To provide more in-depth insight into the enhanced efficiency of CAMA-M, Table 3.14 also tabulates the amount of speedup by CAMA-M over the baseline CAMA in arriving at different stages of the search, defined by fitness levels, for representative problem instances. Here the speedup is defined by

Speedup_i = CAMA_i^{Fitness Evaluation Calls} / CAMA-M_i^{Fitness Evaluation Calls},   i = 1, ..., N

where N denotes the number of fitness bins considered, and A_i^{Fitness Evaluation Calls} denotes the number of fitness evaluations used by algorithm A to arrive at the fitness attained in bin i. In the results reported in Table 3.14, an equal-width fitness bin size of 8 is used. For example, fitness bins 1 and 8 of problem instance c100b in Table 3.14 (i.e., the last row) show that the speedups of CAMA-M over CAMA at the start of the search and upon search convergence are 2924.24 and 9.44 times, respectively. Note that speedups are observed throughout the entire search in the representative CVRP instances, as seen in the table. For conciseness, the log10(Speedup) of CAMA-M over CAMA at the start of the search on the CVRP instances, in the order that they are solved, is further summarized in Fig. 3.22. It is worth noting that Fig. 3.22 resembles a learning curve, where the increasing knowledge learned corresponds to an increasing log10(Speedup) as more problems are solved in each benchmark set. For example, on the "AUGERAT" benchmark set, the log10(Speedup) is observed to increase from under 2.6 to exceed 3.4 when instances "A2", "A3" and "A4" are solved.

Solution Quality

To evaluate the solution quality of the proposed approach, Tables 3.15, 3.16, and 3.17 tabulate all the results obtained by the respective algorithms over 30 independent runs. The values in "B.Cost" and "Ave.Cost" denoting superior performance are highlighted using bold font. Furthermore, in order to obtain a statistical comparison, the Wilcoxon rank sum test with 95% confidence level has been conducted on the experimental results. It can be observed that, overall, CAMA-R achieved improved solution quality over CAMA in terms of Ave.Cost on most of the "AUGERAT" and "CE" CVRP instances. Particularly, on instance "A-n54-k7", CAMA-R
Fig. 3.20 Averaged search convergence traces (across 30 independent runs) of CAMA, CAMA-R, and CAMA-M on the representative CVRP "CE" benchmark set: (a) E-n76-k7, (b) E-n76-k10, (c) E-n76-k14, (d) E-n101-k8. Y-axis: Travel cost, X-axis: Number of Fitness Evaluations. Note that CAMA-M is observed to search significantly faster in converging to near the lower bound solution on the respective CVRPs than the other counterparts
consistently converges to the best solution fitness across all 30 independent runs, as denoted by "B.Cost". CAMA-R also attained an improved B.Cost solution over CAMA on instance "B-n63-k10". However, as observed in Table 3.17, on the "CHRISTOFIDES" benchmarks, where the size of the problem instances scales up relative to "AUGERAT" and "CE" (i.e., the graphs are bigger in size, with larger numbers of customers or vertices that require servicing), the baseline state-of-the-art CAMA is noted to exhibit superior performance over CAMA-R in terms of Ave.Cost. In terms of B.Cost, CAMA is also observed to have attained superior solution quality over CAMA-R on instance "c199". Since the only difference between CAMA-R and CAMA lies in the heuristic bias introduced in the population initialization phase of the latter, it is possible to infer from the performances of CAMA and CAMA-R that the appropriate inductive biases made available in CAMA have been effective in narrowing down the search regions of the large scale problems. Moving on to the proposed memetic computational search paradigm, the generation of the initial population of solutions is now guided by the memes learned
Fig. 3.21 Averaged search convergence traces (across 30 independent runs) of CAMA, CAMA-R, and CAMA-M on the representative CVRP "CHRISTOFIDES" benchmark set: (a) c75, (b) c100, (c) c100b, (d) c120, (e) c150, (f) c199. Y-axis: Travel cost, X-axis: Number of Fitness Evaluations. Note that CAMA-M is observed to search significantly faster in converging to near the lower bound solution on the respective CVRPs than the other counterparts
Table 3.14 Speedup by CAMA-M over CAMA in the different stages of the search on representative CVRP instances. The values in brackets (.) denote the actual fitness values for fitness bins 1–8

Benchmark set | Instance | Bin1 | Bin2 | Bin3 | Bin4 | Bin5 | Bin6 | Bin7 | Bin8 — entries give Speedup (Fitness)
AUGERAT | A-n60-k9 | 754.55 (1746.00) | 16.89 (1697.38) | 10.87 (1648.75) | 10.16 (1600.13) | 15.75 (1551.50) | 16.52 (1502.88) | 9.72 (1454.25) | 1.75 (1405.63)
AUGERAT | B-n57-k7 | 1615.15 (1361.00) | 591.14 (1333.38) | 459.65 (1305.75) | 32.86 (1278.13) | 16.28 (1250.50) | 10.79 (1222.88) | 6.98 (1195.25) | 3.90 (1167.63)
CE | E-n76-k10 | 618.18 (1243.00) | 13.62 (1192.38) | 8.32 (1141.75) | 2.27 (1091.13) | 2.26 (1040.50) | 6.51 (989.88) | 11.43 (939.25) | 6.11 (888.63)
CE | P-n50-k7 | 1354.55 (684.00) | 124.08 (668.13) | 61.44 (652.25) | 17.96 (636.38) | 17.68 (620.50) | 17.47 (604.63) | 3.36 (588.75) | 1.22 (572.88)
CHRISTOFIDES | c100b | 2924.24 (1087.60) | 31.81 (1054.15) | 17.46 (1020.70) | 18.42 (987.25) | 19.81 (953.80) | 21.77 (920.35) | 26.95 (886.90) | 9.44 (853.45)
Fig. 3.22 Speedup of CAMA-M over CAMA across all the problem instances for the three CVRP benchmark sets: (a) AUGERAT, (b) CE, (c) CHRISTOFIDES. Y-axis: Speedup (log10), X-axis: Problem instance index of each CVRP in the benchmark set. A learning curve with increasing knowledge learned is observed in each benchmark set, as seen from the increasing log10(Speedup) as more problems are solved
from past problem-solving experiences. These memes thus serve as the instructions learned from the experiences of the CVRP instances solved previously, which are then imitated to enhance the search on new problem instances. As discussed in Sect. 3.2.2, the "knowledge pool" of CAMA-M is empty when the first CVRP instance is encountered (e.g., "A-n32-k5" of the "AUGERAT" benchmark set), and thus CAMA-M behaves like the baseline CAMA. As more CVRP instances are encountered, the learning, selection, variation and imitation mechanisms kick in to learn and generalize knowledge that induces positive biases into the evolutionary search on new CVRP instances. It can be observed from Table 3.15 that CAMA-M converges to solution qualities competitive with those attained by both CAMA-R and CAMA on the first problem instance of each CVRP benchmark (since no knowledge meme has been learned yet), while exhibiting superior performance over CAMA-R and CAMA on subsequent CVRP instances. Thus, beyond showing speedups in search performance at no loss
Table 3.15 Solution quality of CAMA, CAMA-R, and CAMA-M on the "AUGERAT" CVRP benchmark set. The superior solution quality of each respective problem instance is highlighted in bold font. ("≈", "+" and "−" denote CAMA-M statistically significant similar, better, and worse than CAMA, respectively)

CVRP instance | CAMA (B.Cost / Ave.Cost / Std.Dev) | CAMA-R (B.Cost / Ave.Cost / Std.Dev) | CAMA-M, proposed method (B.Cost / Ave.Cost / Std.Dev)
A1. A-n32-k5 | 784 / 784 / 0 | 784 / 784 / 0 | 784 / 784 / 0
A2. A-n54-k7 | 1167 / 1169.50 / 3.36 | 1167 / 1167 / 0 | 1167 / 1167 + / 0
A3. A-n60-k9 | 1354 / 1356.73 / 3.59 | 1354 / 1355.20 / 1.86 | 1354 / 1354.4 + / 1.22
A4. A-n69-k9 | 1159 / 1164.17 / 3.07 | 1159 / 1162.20 / 2.41 | 1159 / 1161.43 + / 2.37
A5. A-n80-k10 | 1763 / 1778.73 / 9.30 | 1763 / 1777.07 / 7.94 | 1763 / 1775.7 ≈ / 8.80
B6. B-n41-k6 | 829 / 829.30 / 0.47 | 829 / 829.93 / 0.94 | 829 / 829.53 ≈ / 0.73
B7. B-n57-k7 | 1140 / 1140 / 0 | 1140 / 1140 / 0 | 1140 / 1140 ≈ / 0
B8. B-n63-k10 | 1537 / 1537.27 / 1.46 | 1496 / 1528.77 / 15.77 | 1496 / 1525.86 + / 17.45
B9. B-n68-k9 | 1274 / 1281.47 / 5.56 | 1274 / 1284.80 / 4.51 | 1273 / 1281.43 ≈ / 5.74
B10. B-n78-k10 | 1221 / 1226.07 / 5.48 | 1221 / 1226.80 / 6.39 | 1221 / 1224.37 ≈ / 3.23
P11. P-n50-k7 | 554 / 556.33 / 2.34 | 554 / 554.93 / 1.72 | 554 / 554.26 + / 1.01
P12. P-n76-k5 | 627 / 630.70 / 5.34 | 627 / 628.87 / 1.61 | 627 / 628.63 ≈ / 1.51
in solution quality, CAMA-M has been observed to attain improved solution quality on the CVRPs. In particular, on the "AUGERAT" and "CE" benchmark sets, CAMA-M exhibits superior performance in terms of Ave.Cost on 13 out of 19 CVRP instances. In addition, on the "CHRISTOFIDES" benchmark set, CAMA-M also attained improved solution quality in terms of Ave.Cost on 4 out of 7 CVRP instances. Since CAMA, CAMA-R and CAMA-M share a common baseline evolutionary solver, i.e., CAMA, and differ only in terms of the population initialization phase, the superior performance of CAMA-M can clearly be attributed to the effectiveness of the proposed transfer learning as culture-inspired operators, where imitation of learned knowledge memes from past problem solving experiences is used to generate biased solutions that lead to enhanced future evolutionary searches.

Insights on a Knowledge Biased CVRP Solution

In this subsection, to provide a deeper insight into the mechanisms by which the proposed approach attains the high performance efficacy observed, an analysis of samples of the solutions obtained by CAMA, CAMA-R and CAMA-M, as well as the converged
Table 3.16 Solution quality of CAMA, CAMA-R, and CAMA-M on the "CE" CVRP benchmark set. The superior solution quality of each respective problem instance is highlighted in bold font. ("≈", "+" and "−" denote CAMA-M statistically significant similar, better, and worse than CAMA, respectively)

CVRP instance | CAMA (B.Cost / Ave.Cost / Std.Dev) | CAMA-R (B.Cost / Ave.Cost / Std.Dev) | CAMA-M, proposed method (B.Cost / Ave.Cost / Std.Dev)
E1. E-n33-k4 | 835 / 835 / 0 | 835 / 835 / 0 | 835 / 835 ≈ / 0
E2. E-n76-k7 | 682 / 685.67 / 2.17 | 682 / 684.73 / 1.31 | 682 / 684.66 ≈ / 1.12
E3. E-n76-k8 | 735 / 737.57 / 2.36 | 735 / 737.17 / 1.60 | 735 / 737.06 ≈ / 1.91
E4. E-n76-k10 | 830 / 837.03 / 3.56 | 831 / 835.80 / 3.19 | 830 / 834.73 + / 2.92
E5. E-n76-k14 | 1021 / 1025.67 / 3.48 | 1021 / 1026.27 / 3.33 | 1021 / 1025.80 ≈ / 3.64
E6. E-n101-k8 | 816 / 820.63 / 3.20 | 815 / 818.97 / 1.94 | 815 / 818.53 + / 1.55
Table 3.17 Solution quality of CAMA, CAMA-R, and CAMA-M on the "CHRISTOFIDES" CVRP benchmark set. The superior solution quality of each respective problem instance is highlighted in bold font. ("≈", "+" and "−" denote CAMA-M statistically significant similar, better, and worse than CAMA, respectively)

CVRP instance | CAMA (B.Cost / Ave.Cost / Std.Dev) | CAMA-R (B.Cost / Ave.Cost / Std.Dev) | CAMA-M, proposed method (B.Cost / Ave.Cost / Std.Dev)
C1. c50 | 524.61 / 525.45 / 2.56 | 524.61 / 524.61 / 0 | 524.61 / 525.73 ≈ / 2.90
C2. c75 | 835.26 / 842.32 / 4.04 | 835.26 / 840.28 / 3.52 | 835.26 / 839.53 + / 3.45
C3. c100 | 826.14 / 829.43 / 2.39 | 826.14 / 829.73 / 2.08 | 826.14 / 829.13 ≈ / 1.94
C4. c100b | 819.56 / 819.56 / 0 | 819.56 / 819.56 / 0 | 819.56 / 819.56 ≈ / 0
C5. c120 | 1042.11 / 1044.18 / 2.21 | 1042.11 / 1043.08 / 1.13 | 1042.11 / 1042.83 + / 0.94
C6. c150 | 1032.50 / 1043.27 / 5.67 | 1034.19 / 1044.73 / 5.87 | 1030.67 / 1041.97 ≈ / 6.27
C7. c199 | 1304.87 / 1321.17 / 5.98 | 1313.46 / 1325.48 / 5.99 | 1308.92 / 1322.18 ≈ / 7.51
optimized solution of CAMA, on solving problem instance "B-n41-k6" is presented. In Fig. 3.23, each node denotes a customer that needs to be serviced, and nodes with the same color and shape are serviced by a common route or vehicle. Figure 3.23a and b denote the solutions in the initial populations of the baseline CAMA and CAMA-R, respectively. Figure 3.23c is the solution in CAMA-M, which has been positively biased using the imitated knowledge learned from past experiences of problem solving on CVRP instances "A-n32-k5", "A-n54-k7", "A-n60-k9" and "A-n69-k9". Furthermore, Fig. 3.23d gives the converged optimized solution achieved by CAMA. As observed, the task distribution of the solution in the CAMA-M search bears the greatest similarity to that of the converged optimized solution of CAMA,
Fig. 3.23 An illustration of CVRP solutions in the respective EA populations for solving the "B-n41-k6" CVRP instance. Each point plotted in the sub-figures denotes a CVRP customer node that needs service. Points or nodes with the same symbol are serviced by a common vehicle
as compared to those of CAMA and CAMA-R. Besides task distributions, portions of the figures are magnified in Fig. 3.23c and d for the purpose of illustrating the service orders of a solution obtained by CAMA-M relative to the converged optimized solution of the baseline CAMA, respectively. The magnified subfigures illustrate high similarities between the respective service orders. This suggests that the service order information of the converged optimized solutions for instances "A-n32-k5", "A-n54-k7", "A-n60-k9" and "A-n69-k9" has been successfully learned and preserved by the learning operator of CAMA-M, and that subsequently, through the cultural evolutionary mechanisms of selection, variation and imitation, the learned knowledge meme is imitated to generate positively biased solutions that are close to the optimal solution, thus bringing about significant speedups in the search on related problem instances.
3.2.6 Summary

A significantly under-explored area of evolutionary optimization in the literature is the study of optimization methodologies that can evolve along with the problems solved. In particular, present evolutionary optimization approaches generally start their search from scratch or the ground-zero state of knowledge, independent of how similar the given new problem of interest is to those optimized previously. This chapter proposed a new memetic computation paradigm for search, evolutionary optimization + transfer learning, which models how humans solve problems, and presented a novel study towards intelligent evolutionary optimization of problems through the transfer of structured knowledge, in the form of memes learned from previous problem-solving experiences, to enhance future evolutionary searches. In particular, the four culture-inspired operators, namely, Learning, Selection, Variation and Imitation, have been proposed and presented. The mathematical formulations of the cultural evolutionary operators for solving well established NP-hard routing problems have been derived, where learning is realized by maximizing the statistical dependence between past problem instances solved and their respective optimized solutions. In contrast to earlier works, the proposed approach facilitates a novel representation, learning and imitation of generalized knowledge that provides greater scope for faster evolutionary search on unseen related problems of differing characteristics, including problem size, topology, structure, representation, etc.
References 1. B. Liu, L. Wang, Y.-H. Jin, An effective PSO-based memetic algorithm for flow shop scheduling. IEEE Tran. Syst. Man Cybern. Part B (Cybernetics) 37(1), 18–27 (2007) 2. S.K. Hasan, R. Sarker, D. Essam, D. Cornforth, Memetic algorithms for solving job-shop scheduling problems. Memetic Comput. 1(1), 69–83 (2009) 3. K. Tang, Y. Mei, X. Yao, Memetic algorithm with extended neighborhood search for capacitated arc routing problems. IEEE Trans. Evol. Comput. 13(5), 1151–1166 (2009) 4. M. Tang, X. Yao, A memetic algorithm for vlsi floorplanning. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 37(1), 62–69 (2007) 5. H. Soh, Y.S. Ong, Q.C. Nguyen, Q.H. Nguyen, M.S. Habibullah, T. Hung, J.-L. Kuo, Discovering unique, low-energy pure water isomers: memetic exploration, optimization, and landscape analysis. IEEE Trans. Evol. Comput. 14(3), 419–437 (2010) 6. C. Aranha, H. Iba, The memetic tree-based genetic algorithm and its application to portfolio optimization. Memetic Comput. 1(2), 139–151 (2009) 7. P. Moscato, Memetic algorithm: a short introduction, in New Ideas in Optimization (McGrawHill, London, 1999), pp. 219–234 8. Q.H. Nguyen, Y.S. Ong, M.H. Lim, A probabilistic memetic framework. IEEE Trans. Evol. Comput. 13(3), 604–623 (2009) 9. Y.-S. Ong, M.-H. Lim, N. Zhu, K.-W. Wong, Classification of adaptive memetic algorithms: a comparative study. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 36(1), 141–152 (2006) 10. H. Ishibuchi, T. Yoshida, T. Murata, Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling. IEEE Trans. Evol. Comput. 7(2), 204–223 (2003)
11. P. Merz, On the performance of memetic algorithms in combinatorial optimization, in Proceedings of 2001 Genetic and Evolutionary Computation Conference. Citeseer (2001) 12. Z. Zhu, Y.S. Ong, M. Dash, Wrapper–filter feature selection algorithm using a memetic framework. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 37(1), 70–76 (2007) 13. Z. Zhu, Y.S. Ong, M. Dash, Markov blanket-embedded genetic algorithm for gene selection. Pattern Recogn. 40(11), 3236–3248 (2007) 14. N. Krasnogor, Studies on the Theory and Design Space of Memetic Algorithms. Ph.D. dissertation, University of the West of England (2002) 15. I.S. Oh, J.S. Lee, B.R. Moon, Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 26(11), 1424–1437 (2004) 16. C. Guerra-Salcedo, S. Chen, D. Whitley, S. Smith, Fast and accurate feature selection using hybrid genetic strategies, in Proceedings of the 1999 Congress on Evolutionary Computation, vol. 1, pp. 177–184. IEEE (1999) 17. J.H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (MIT Press, 1992) 18. L. Yu, H. Liu, Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. 5(Oct), 1205–1224 (2004) 19. H. Liu, L. Yu, Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 17(4), 491–502 (2005) 20. M. Dash, H. Liu, Feature selection for classification. Intell. Data Anal. Int. J. 1(3), 131–156 (1997) 21. R. Kohavi, G.H. John et al., Wrappers for feature subset selection. Artif. Intell. 97(1–2), 273– 324 (1997) 22. I. Guyon, A. Elisseeff, An introduction to variable and feature selection. J. Mach. Learn. Res. 3(Mar), 1157–1182 (2003) 23. E. Amaldi, V. Kann et al., On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoret. Comput. Sci. 209(1), 237–260 (1998) 24. H. Almuallim, T.G. Dietterich, Learning Boolean concepts in the presence of many irrelevant features. Artif. Intell. 69(1–2), 279–305 (1994) 25. P. Narendra, K. Fukunaga, A branch and bound algorithm for feature subset selection. IEEE Trans. Comput. 26(9), 912–922 (1977) 26. M. A. Hall. Correlation-based feature selection for discrete and numeric class machine learnin, in Proceedings of the 17th International Conference on Machine Learning (2000), pp. 359–366 27. P. Pudil, J. Novoviˇcová, J. Kittler, Floating search methods in feature selection. Pattern Recogn. Lett. 15(11), 1119–1125 (1994) 28. H. Liu, R. Setiono, et al., A probabilistic approach to feature selection-a filter solution, in Proceedings of 13th International Conference on Machine Learning, vol. 96. Citeseer (1996), pp. 19–327 29. G. Brassard, P. Bratley, Fundamentals of Algorithmics, vol. 524 (Prentice Hall, 1996) 30. J. Yang, V. Honavar, Feature subset selection using a genetic algorithm. IEEE Trans. Intell. Syst. 13(2), 44–49 (1998) 31. M.L. Raymer, W.F. Punch, E.D. Goodman, L.A. Kuhn, A.K. Jain, Dimensionality reduction using genetic algorithms. IEEE Trans. Evol. Comput. 4(2), 164–171 (2000) 32. Y.S. Ong, A.J. Keane, Meta-lamarckian learning in memetic algorithms. IEEE Trans. Evol. Comput. 8(2), 99–110 (2004) 33. J.E. Baker, Adaptive selection methods for genetic algorithms, in Proceedings of the 1st International Conference on Genetic Algorithms. (Hillsdale, New Jersey, 1985), pp. 101–111 34. M. Robnik-Šikonja, I. Kononenko, Theoretical and empirical analysis of relieff and rrelieff. 
Mach. Learn. 53(1–2), 23–69 (2003) 35. B.J. Frey, D. Dueck, Clustering by passing messages between data points. Science 315(5814), 972–976 (2007) 36. S. Jia, Y. Qian, Z. Ji, Band selection for hyperspectral imagery using affinity propagation, in Proceedings of the 2008 Digital Image Computing: Techniques and Applications (2008), pp. 137–141
References
73
37. C.H. Ooi, P. Tan, Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics 19(1), 37–44 (2003) 38. L. Li, C.R. Weinberg, T.A. Darden, L.G. Pedersen, Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the ga/knn method. Bioinformatics 17(12), 1131–1142 (2001) 39. J.J. Liu, G. Cutler, W. Li, Z. Pan, S. Peng, T. Hoey, L. Chen, X.B. Ling, Multiclass cancer classification and biomarker discovery using ga-based algorithms. Bioinformatics 21(11), 2691–2697 (2005) 40. C. Ambroise, G.J. McLachlan, Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. 99(10), 6562–6566 (2002) 41. U.M. Braga-Neto, E.R. Dougherty, Is cross-validation valid for small-sample microarray classification? Bioinformatics 20(3), 374–380 (2004) 42. T. Li, C. Zhang, M. Ogihara, A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 20(15), 2429–2437 (2004) 43. V. Vapnik, Statistical Learning Theory (John Wiley, 1998) 44. U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, A.J. Levine, Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96(12), 6745–6750 (1999) 45. S.L. Pomeroy, P. Tamayo, M. Gaasenbeek, L.M. Sturla, M. Angelo, M.E. McLaughlin, J.Y.H. Kim, L.C. Goumnerova, P.M. Black, C. Lau et al., Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870), 436–442 (2002) 46. T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri et al., Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999) 47. L.J. Van’t Veer, H. Dai, M.J. Van De Vijver, Y.D. He, A.A.M. Hart, M. Mao, H.L. Peterse, K. Van Der Kooy, M.J. Marton, A.T. Witteveen, et al., Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871), 530–536 (2002) 48. G.J. Gordon, R.V. Jensen, L.-L. Hsiao, S.R. Gullans, J.E. Blumenstock, S. Ramaswamy, W.G. Richards, D.J. Sugarbaker, R. Bueno, Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Can. Res. 62(17), 4963–4967 (2002) 49. E.F. Petricoin III, A.M. Ardekani, B.A. Hitt, P.J. Levine, V.A. Fusaro, S.M. Steinberg, G.B. Mills, C. Simone, D.A. Fishman, E.C. Kohn et al., Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359(9306), 572–577 (2002) 50. C.L. Nutt, D.R. Mani, R.A. Betensky, P. Tamayo, J.G. Cairncross, C. Ladd, U. Pohl, C. Hartmann, M.E. McLaughlin, T.T. Batchelor et al., Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Can. Res. 63(7), 1602–1607 (2003) 51. A.A. Alizadeh, M.B. Eisen, R.E. Davis, C. Ma, I.S. Lossos, A. Rosenwald, J.C. Boldrick, H. Sabet, T. Tran, X. Yu et al., Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature 403(6769), 503–511 (2000) 52. S.A. Armstrong, J.E. Staunton, L.B. Silverman, R. Pieters, M.L. den Boer, M.D. Minden, S.E. Sallan, E.S. Lander, T.R. Golub, S.J. Korsmeyer, Mll translocations specify a distinct gene expression profile that distinguishes a unique leukemia. 
Nat. Genet. 30(1), 41–47 (2002) 53. J. Khan, J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C.R. Antonescu, C. Peterson et al., Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat. Med. 7(6), 673–679 (2001) 54. N. Yukinawa, S. Oba, K. Kato, K. Taniguchi, K. Iwao-Koizumi, Y. Tamaki, S. Noguchi, S. Ishii, A multi-class predictor based on a probabilistic model: application to gene expression profiling-based diagnosis of thyroid tumors. BMC Genomics 7(1), 190 (2006) 55. P. Good, Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses (Springer Science & Business Media, 2013)
74
3 Optinformatics within a Single Problem Domain
56. L. Bruzzone, C. Persello, A novel approach to the selection of spatially invariant features for the classification of hyperspectral images with improved generalization capability. IEEE Trans. Geosci. Remote Sens. 47(9), 3180–3191 (2009) 57. R. Neher, A. Srivastava, A Bayesian MRF framework for labeling terrain using hyperspectral imaging. IEEE Trans. Geosci. Remote Sens. 43(6), 1363–1374 (2005) 58. M.F. Baumgardner, L. L. Biehl, D.A. Landgrebe, 220 band aviris hyperspectral image data set: June 12, 1992 Indian pine test site 3, Sep 2015 59. F. Melgani, L. Bruzzone, Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. Remote Sens. 42(8), 1778–1790 (2004) 60. Y.C. Jin, Knowledge Incorporation in Evolutionary Computation. Studies in Fuzziness and Soft Computing (Springer, 2010) 61. P. Cunningham, B. Smyth, Case-based reasoning in scheduling: reusing solution components. Int. J. Product. Res. 35(4), 2947–2961 (1997) 62. S.J. Louis, J. McDonnell, Learning with case-injected genetic algorithms. IEEE Trans. Evol. Comput. 8(4), 316–328 (2004) 63. R. Dawkins, The Selfish Gene (Oxford University Press, Oxford, 1976) 64. Y.S. Ong, M.H. Lim, X.S. Chen, Research frontier: - past, present & future. IEEE Comput. Intell. Mag. 5(2), 24–36 (2010) 65. F. Neri, C. Cotta, P. Moscato, Handbook of Memetic Algorithms. Studies in Computational Intelligence (Springer, 2011) 66. X. Chen, Y. Ong, M. Lim, K.C. Tan, A multi-facet survey on memetic computation. IEEE Trans. Evol. Comput. 15(5), 591–607 (2011) 67. M. Minsky, The Society of Mind (Simon & Schuster, Inc., 1986) 68. A. Gretton, O. Bousquet, A. Smola, B. Scholkopf, ¨ Measuring statistical dependence with Hilbert-Schmidt norms, in Proceedings of Algorithmic Learning Theory, pp. 63–77 (2005) 69. S.C.H. Hoi, J. Zhuang, I. Tsang, A family of simple non-parametric kernel learning algorithms. J. Mach. Learn. Res. (JMLR), 12, 1313–1347 (2011) 70. L. Song, A. Smola, A. Gretton, K.M. Borgwardt, A dependence maximization view of clustering, in Proceedings of the 24th International Conference on Machine learning (2007), pp. 815–822 71. M.A. Runco, S. Pritzker, Encyclopedia of Creativity (Academic Press, 1999) 72. M. Dror, Arc Routing. Theory, Solutions and Applications (Kluwer Academic Publishers, Boston, 2000) 73. P. Lacomme, C. Prins, W. Ramdane-Cherif, Competitive memetic algorithms for arc routing problems. Ann. Oper. Res. 131(1), 159–185 (2004) 74. Y. Mei, K. Tang, X. Yao, Improved memetic algorithm for capacitated arc routing problem. IEEE Congr. Evol. Comput. 1699–1706 (2009) 75. B. Golden, R. Wong, Capacitated arc routing problems. Networks 11(3), 305–315 (1981) 76. E.W. Dijkstra, A note on two problems in connection with graphs. Numer. Math. 1, 269–271 (1959) 77. I. Borg, P.J.F. Groenen, Modern Multidimensional Scaling: Theory and Applications (Springer, 2005) 78. B.L. Golden, J.S. DeArmon, E.K. Baker, Computational experiments with algorithms for a class of routing problems. Comput. Oper. Res. 10(1), 47–59 (1983) 79. G. Ulusoy, The fleet size and mix problem for capacitated arc routing. Eur. J. Oper. Res. 22(3), 329–337 (1985) 80. G. Dantzig, J.H. Ramser, The truck dispatching problem. Manage. Sci. 6, 80–91 (1959) 81. X.S. Chen, Y.S. Ong, Q.H. Nguyen, A conceptual modeling of meme complexes in stochastic search. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(5), 612–625 (2011)
Chapter 4
Optinformatics Across Heterogeneous Problem Domains and Solvers
Besides the illustration of optinformatic algorithm design in evolutionary learning and optimization within a single problem domain, this chapter further introduces specific algorithm developments of optinformatics across heterogeneous problems or solvers. In particular, based on the paradigm of evolutionary search plus transfer learning in Sect. 3.2, the first optinformatic algorithm considers knowledge transfer across problem domains for enhanced vehicle or arc routing performance. It intends to transfer knowledge from vehicle routing to speed up the optimization of arc routing, and vice versa. The vehicle and arc routing problems used in Sect. 3.2 are again considered in this work to evaluate the performance of the optinformatic algorithm. Next, the second optinformatic method introduces evolutionary knowledge learning and transfer across reinforcement learning agents in a multi-agent system. Two types of neural networks, i.e., feedforward and adaptive resonance theory (ART) neural networks, are employed as the reinforcement learning agents, and the applications of mine navigation and the game Unreal Tournament 2004 are used as the learning tasks to investigate the performance of the optinformatic method.
4.1 Knowledge Learning and Transfer Across Problem Domains Towards Advanced Evolutionary Optimization
In the recent decade, it has been observed that many efficient optimizations using modern advanced EAs have been achieved via the incorporation of domain-specific knowledge as inductive biases that fit the problem of interest well [1]. These dedicated EAs have been specially crafted with the embedment of human-expert domain-specific knowledge about the underlying problem so as to speed up search convergence. In recent special issues [2, 3] and journals [4] dedicated to EA research, several successes of evolutionary and memetic algorithms [5–7] that incorporate human expert knowledge have been reported on a plethora of complex applications
including the quadratic assignment problem [8], feature selection [9], permutation flow shop scheduling [10], and VLSI floorplanning [11], etc. To reduce the high reliance on human effort in designing advanced evolutionary algorithms, as discussed in Sect. 2.2, some researchers have considered a direct incorporation of solutions archived from previous searches as an alternative.1 From a survey of the literature, it is worth noting that, in spite of the efforts to automate the incorporation of domain knowledge into future evolutionary search, the success has been limited by several key factors. In particular, the earlier works in [12–14] make strong assumptions on the type of problems solved. References [12, 13] require the newly encountered problem instances to share common tasks with previously solved instances. Feng et al. [14] do not require the tasks to be common among problem instances; they are, however, restricted to the representation used, which impedes the seamless reuse of domain knowledge across problems. To summarize, the greatest barrier to further progress can thus be attributed to the unique representations and characteristics of different problem domains. Hence, it is often the case that the information captured from one problem domain cannot be directly used in another. To date, little or no investigation has been conducted to automate the learning and evolution of knowledge from differing problem domains in the context of evolutionary optimization. Given the restricted theoretical knowledge available in this area and the limited progress made, there is thus an appeal for evolutionary search paradigms that can draw upon useful knowledge learned from different problems previously solved. Hence, this work serves as a feasibility study on an evolutionary paradigm that learns and evolves knowledge nuggets in the form of memes that traverse related problem domains, in the spirit of intelligent evolutionary optimization and enhanced search, where a meme is defined here as the basic unit of cultural transmission [15]. Through this study, the following insights into the incorporation of knowledge across problem domains are provided.
– What is the representation of a knowledge meme?
– How to learn and mine knowledge memes from the evolutionary search?
– How to evolve knowledge memes in the evolutionary search across related problem domains?
– Can evolutionary optimization benefit from the knowledge memes of related problem domains?
– How do the knowledge memes of different but related problem domains influence the evolutionary search?
– What forms of knowledge meme can lead to enhanced evolutionary search?
1 Note that this is in contrast to domain-specific human-crafted EAs, in which domain knowledge is only captured and incorporated once as part of the algorithm design and development process.
4.1.1 Knowledge Meme Shared by Problem Domains—CVRP & CARP
The capacitated vehicle routing problem (CVRP) and the capacitated arc routing problem (CARP) have traditionally emerged as two independent research tracks in the literature. Nevertheless, since both problem domains belong to the family of vehicle routing, it makes one wonder whether the problem-solving experience learned in one domain could be useful to the other. Taking this cue, this section begins with a study on the relatedness between problems of these two independent domains based on their optimized solution routes, since this is what both target to attain. The common objective of both domains is to minimize the distances traveled by the vehicles in serving the available customers, which heavily depends on the specific assignments of customers to each vehicle. In both CVRP and CARP optimized solution routes, each vehicle can be treated as a cluster. The corresponding customer assignments thus encode inter-cluster and intra-cluster structure information, which is determined by the distances of the cluster members: the intra-cluster and inter-cluster distances indicate whether two customers should be served by the same or by different vehicles, respectively. Hence, to study the relatedness between CVRP and CARP optimized solutions, the pairwise distance distributions of customers or tasks that are serviced by the same (intra) and different (inter) vehicles are considered, which are given by the histograms of the following two distance sets:

D_s = {d(t_i, t_j) | t_i, t_j \in T_s}
D_d = {d(t_p, t_q) | t_p, t_q \in T_d}

where T_s and T_d denote the sets of customers or tasks serviced by the same and by different vehicles, respectively. The customer service information is extracted from the optimized solution of each problem domain independently, and d(t_i, t_j) gives the shortest distance between customers (i.e., vertices in CVRP and arcs in CARP) t_i and t_j. Figures 4.1 and 4.2 summarize the pairwise distributions of two CVR problem instances (labeled “A-n54-k7” and “B-n57-k7”) and two CAR problem instances (labeled “E1C” and “E2C”), respectively. The optimized solutions are obtained using recently introduced state-of-the-art evolutionary solvers of the respective domains, i.e., [16, 17]. In the figures, the X-axis reflects the normalized range of the distance values, and the Y-axis denotes the number of distances that fall within the respective range, which is given by:

Y_i = \sum_{j=1}^{nd} I(x_j \in Bin_i)      (4.1)

where nd is the total number of distance values and I(·) denotes the indicator function.
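As a concrete illustration of how these distributions can be computed from an optimized solution, the following minimal sketch (assuming NumPy is available; `dist` is a precomputed pairwise shortest-distance matrix and `routes` is a hypothetical optimized solution given as one list of task indices per vehicle) collects the intra- and inter-vehicle distance sets and bins them as in Eq. 4.1.

```python
# Minimal sketch: intra-/inter-vehicle pairwise distance histograms (Eq. 4.1).
# 'dist' is an n x n matrix of shortest distances between tasks; 'routes' is a
# hypothetical optimized solution given as one list of task indices per vehicle.
import numpy as np
from itertools import combinations

def distance_histograms(dist, routes, n_bins=10):
    vehicle_of = {}
    for v, route in enumerate(routes):
        for task in route:
            vehicle_of[task] = v
    tasks = sorted(vehicle_of)
    d_same, d_diff = [], []
    for i, j in combinations(tasks, 2):
        (d_same if vehicle_of[i] == vehicle_of[j] else d_diff).append(dist[i, j])
    # Normalize distances to [0, 1] before binning, as in the figures.
    scale = max(max(d_same, default=0.0), max(d_diff, default=0.0)) or 1.0
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    y_same, _ = np.histogram(np.asarray(d_same) / scale, bins=bins)  # counts Y_i for D_s
    y_diff, _ = np.histogram(np.asarray(d_diff) / scale, bins=bins)  # counts Y_i for D_d
    return y_same, y_diff
```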
Fig. 4.1 Pairwise distance distributions obtained from optimized “A-n54-k7” and “B-n57-k7” CVRP instances. a denotes the pairwise distance distribution of customers serviced by a common vehicle in “A-n54-k7”; b denotes the pairwise distance distribution of customers serviced by different vehicles in “A-n54-k7”; c presents the pairwise distance distribution of customers serviced by a common vehicle in “B-n57-k7”; d presents the pairwise distance distribution of customers serviced by different vehicles in “B-n57-k7”
As depicted, similar trends in the pairwise distance distributions can be observed for the CVRP and CARP optimized solutions; see Fig. 4.1a versus Fig. 4.2a and Fig. 4.1c versus Fig. 4.2c. These similarities imply the existence of similar structural configurations between CVRP and CARP optimized solutions. In other words, these CVRP and CARP optimized solutions bear common or similar assignments and service orders of customers, despite the differences in the problem representations. It is therefore reasonable to infer that common knowledge exists in these two problem domains, and that it would lead to more efficient and effective problem solving when captured from one problem domain and operated on in the other.
Fig. 4.2 Pairwise distance distributions obtained from optimized “E1C” and “E2C” CARP instances. a denotes the pairwise distance distribution of customers serviced by a common vehicle in “E1C”; b denotes the pairwise distance distribution of customers serviced by different vehicles in “E1C”; c presents the pairwise distance distribution of customers serviced by a common vehicle in “E2C”; d presents the pairwise distance distribution of customers serviced by different vehicles in “E2C”
Inspired by this interesting observation, the remainder of this section explores how to link these two independent routing problem domains, i.e., CVRP and CARP, and how to derive the shared knowledge meme that can be learned and evolved across them for enhanced problem solving.

4.1.2 A Common Problem Representation for CVR and CAR Problems
This subsection proposes to establish a common representation for the CVR and CAR problems, so that the relationship between customers in these two independent domains can be modeled, which makes further knowledge meme evolution across problem domains possible. An example of a CVRP instance and its associated optimized route is given in Fig. 4.3a, where the vertices represent the customers to be serviced and the dashed lines denote the shortest distances between customers.
4.1.2 A Common Problem Representation for CVR and CAR Problem This subsection proposes to establish a common representation for CVR and CAR problem, so that the relationship between customers in these two independent domains can be conducted which makes further knowledge meme evolution across problem domain possible. An example of the CVRP and associated optimized route is given in Fig. 4.3a, where the vertices represents the customers to service and dashed
80
4 Optinformatics Across Heterogeneous Problem …
v2
v2
v1
v3
v1
v3 v {v1 , v2 }
vd
v4
v9
v5
v8 v6
vd
v4
v9
v5 v6
v7
v8
v7
(ii) Corresponding optimized route
(i) CVRP instance
(a) CVRP instance and its optimized route.
e {v h , v t }
e1
e1
e5
e5
vd
vd e2
e2
e4
e4 e3 (i) CARP instance
e3 (ii) Corresponding optimized route
Fig. 4.3 An illustration of CVRP and CARP instances and their respective optimized solutions: a CVRP instance and its optimized route; b CARP instance and its optimized route
Figure 4.3b gives an illustration of a CARP instance and its associated optimized route, where the full lines denote the arcs to be serviced and the dashed lines give the shortest distances between the arcs. Each arc is represented by its head and tail vertices. In CVRP, each customer of Fig. 4.3a(i) is represented as a vertex with given Cartesian coordinates (v = {v_1, v_2}). On the other hand, customers or tasks in CARP, as depicted in Fig. 4.3b(i), are the full-line arcs (e = {v_h, v_t}), where v_h and v_t denote the head and tail vertex, respectively. In CARP, the distances of connected vertices are provided to describe the structure of the problem.
Fig. 4.4 Position approximation for CARP arcs via MDS and manifold alignment
No information on the vertex coordinates is available. As can be observed, CVRP and CARP differ in the representation of customers in a graph network. This impedes the direct incorporation of useful knowledge from one domain into the other. In general, there are three possible means to seek a common representation involving two domains, A and B. The first is to adopt the representation of A, while transforming all problem instances of domain B into A. The second is, conversely, to maintain the representation of B and transform all problem instances of A into the feature space of domain B. Lastly, all problem instances in domains A and B can be mapped to a new representation C, which is common to both. Here, options 1 and 2 are considered. In particular, the mapping from CARP to CVRP is considered, and the latter is used as the common representation. The transformation from the CARP to the CVRP representation begins with a calculation of the shortest distances among all the arcs that need service using Dijkstra's algorithm [18]. Subsequently, the position of each arc is approximated from the obtained shortest distance matrix of arcs by means of multidimensional scaling (MDS) [19]. As a result, each arc is represented as a node with spatial features, like the customers in CVRP. Furthermore, manifold alignment [20] is performed between the CVRP customers and the MDS-approximated CARP arc positions to derive the common feature space of CVRP and CARP, while matching the local geometry and preserving the neighborhood relationships within both CVRP and CARP. This is based on the idea that customers in CVRP and CARP that bear similar local geometry should be close to each other in the new common space. In this way, a common problem representation between the CVRP and CARP problems can be established (see Fig. 4.4). The pseudo-code of the proposed establishment of a common problem representation for CVR and CAR is summarized in Algorithm 1.
Algorithm 1: Pseudo code of the proposed establishment of common representation for CVRP and CARP
1 Begin:
2 for given CVRP instance Iv and CARP instance Ia do
3   Calculate the shortest distance matrix SD among all the arcs of Ia by Dijkstra's algorithm.
4   Approximate spatial features of arcs in Ia by means of MDS with SD.
5   Perform manifold alignment between CVRP customers and the MDS approximated CARP arc positions to derive their common problem representation.
6 End
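The following minimal sketch illustrates steps 2–4 of Algorithm 1, assuming SciPy and scikit-learn are available; the manifold-alignment step is only indicated as a placeholder, since its details follow [20]. The names `arc_adjacency` and `carp_arcs_to_points` are illustrative assumptions, not part of the original text.

```python
# A minimal sketch of Algorithm 1, steps 2-4 (Dijkstra + MDS). The manifold-alignment
# step (line 5) is only stubbed here. 'arc_adjacency' is a hypothetical weighted
# adjacency matrix of the CARP graph restricted to the arcs that require service.
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.manifold import MDS

def carp_arcs_to_points(arc_adjacency, n_components=2, seed=0):
    """Approximate low-dimensional positions for CARP arcs from shortest-path distances."""
    # Step 3: all-pairs shortest distances among the arcs.
    sd = dijkstra(arc_adjacency, directed=False)
    # Step 4: metric MDS on the precomputed distance matrix gives spatial features.
    mds = MDS(n_components=n_components, dissimilarity="precomputed", random_state=seed)
    return mds.fit_transform(sd)

# Step 5 (sketch only): align the CVRP customer coordinates and the MDS-approximated
# CARP arc positions into one common feature space, e.g. via manifold alignment [20].
```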
Fig. 4.5 An illustration example of knowledge meme shared by CVRP and CARP
4.1.3 Knowledge Meme Shared by CVRP and CARP
Like genes that serve as “instructions for building proteins”, memes are “instructions for carrying out behavior, stored in brains” [21–25]. In the present context, a knowledge meme in evolutionary optimization serves as an instruction to guide the search towards the near-optimal solution. In practice, most problems (including optimization problems) seldom exist in isolation [26]. For example, the experience of riding a bicycle can help one learn to ride a motorcycle more productively, and students are able to apply what they have learned in school to their subsequent working life. Thus, experiences or knowledge memes learned from solving one problem can be deployed to enhance the optimization of related problems. With a common problem representation established between CVRP and CARP, any knowledge meme learned from one domain can be directly applied to the other. An illustrative example is depicted in Fig. 4.5. As can be observed, the shared knowledge meme M of CVRP and CARP is a form of instruction mined from the optimized solution routes in the common feature space. When this knowledge meme is further
operated on an unseen routing problem across the domain boundary, it is able to generate high-quality solution routes immediately, without any search process. In both CVRP and CARP, the search for the optimal solution involves first identifying suitable task assignments (i.e., of the vertices or arcs that require service) for each vehicle, and then finding the optimal service order of the assigned tasks for each vehicle. Since the knowledge meme is extracted from the optimized solution routes, it encodes both the successful task assignment and the task service ordering information. In what follows, a specific realization of the form of the knowledge meme, and how it is learned and evolved between the CVR and CAR problem domains, is presented.
4.1.4 Evolutionary Optimization with Learning and Evolution of Knowledge Meme Between CVRP and CARP
On the basis of Chap. 3, this section presents a realization of the knowledge meme and its learning and evolution in the established common problem space of CVRP and CARP for enhanced evolutionary search. For a given CVRP or CARP instance p and its optimal solution s∗, the knowledge meme is derived as a transformation M on the problem instance p, which makes the transformed task distribution align well with the optimal solution s∗. In such a manner, the successful task assignment and task service ordering in s∗ can easily be recovered via techniques such as clustering and pairwise distance comparison operating on the transformed tasks. This is illustrated in Fig. 4.6, where Fig. 4.6a denotes the original task distribution of a given CVRP or CARP instance and Fig. 4.6b represents the obtained optimized solution. If the appropriate transformation M has been captured from Fig. 4.6b and deployed on Fig. 4.6a, the resultant task distribution is as depicted in Fig. 4.6c. As can be observed, the transformation has re-located tasks serviced by a common vehicle to be closer to one another (as desired by the optimized solution shown in Fig. 4.6b), while keeping tasks serviced by different vehicles further apart. In addition, to match the service orders of each vehicle to those of the optimized solution, the task distribution is adapted according to the sorted pairwise distances in ascending order (e.g., the distance between v1 and v3 is the largest among v1, v2 and v3, while the distance between v10 and v9 is smaller than that between v10 and v8). Thus, by conducting clustering on the transformed tasks and pairwise distance comparison on the tasks assigned to each cluster, the task assignment and task service orders of Fig. 4.6b can be obtained, as depicted in Fig. 4.6d.
4.1.4.1 Learning of Knowledge Meme from CVRP or CARP Search Experiences
In particular, for a given CVRP or CARP problem instance and its optimized solution, denoted by (p, s∗), the learning of the knowledge meme M is formulated as a maximization of the statistical dependency [27] between p and s∗ under distance constraints.
Fig. 4.6 An illustration example of how the success of task assignment and task servicing ordering in the optimal solution s∗ is achieved by a transformation: a original CVRP or CARP task distribution; b obtained CVRP or CARP optimized solution; c transformed task distribution; d task assignment and servicing ordering on the transformed task distribution via clustering and pairwise distance comparison
The formulation is given as follows:

max_K  tr(HKHY)
s.t.   K = X^T · M · X,  K ⪰ 0
       D_ij > D_iq,  ∀(i, j, q) ∈ N      (4.2)
where tr(·) denotes the trace operation of a matrix, and X and Y are the matrix representations of a CARP or CVRP instance p and the corresponding problem solution s∗, respectively. If task v_i and task v_j are served by the same vehicle, Y(i, j) = 1; otherwise, Y(i, j) = −1. Furthermore, H = I − (1/n)·11^T centers the data and the labels in the feature space, where I denotes the identity matrix and n equals the number of tasks. D_ij > D_iq is the constraint imposing that, upon serving task i, task q is served before task j by the same vehicle. Let T_ij denote an n × n matrix that takes non-zeros at T_ii = T_jj = 1, T_ij = T_ji = −1. The distance constraint D_ij > D_iq in Eq. 4.2 is then reformulated as tr(KT_ij) > tr(KT_iq). Furthermore, slack variables ξ_ijq are introduced to measure the violations
of the distance constraints and to penalize the corresponding squared loss. Consequently, by substituting the constraints into Eq. 4.2, one arrives at:

min_{M,ξ}  −tr(XHYHX^T M) + (C/2) · \sum_{(i,j,q)} ξ_ijq^2
s.t.       M ⪰ 0
           tr(X^T M X T_ij) > tr(X^T M X T_iq) − ξ_ijq,  ∀(i, j, q) ∈ N      (4.3)
where C balances between the two parts of the criterion. The first constraint enforces the learned knowledge meme, denoted by matrix M, to be positive semi-definite, while the second constraint imposes that the scaled distances among the tasks align well with the desired service orders of the optimized solution s∗ (i.e., Y). With C configured via cross-validation, Eq. 4.3 can be solved as described in [28].
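To make the dependency term in Eqs. 4.2 and 4.3 concrete, the following minimal sketch (an illustrative assumption, not part of the original text) evaluates tr(HKHY) for a candidate meme M, with K = X^T M X built from the task features and Y encoding same-vehicle (+1) / different-vehicle (−1) labels.

```python
# Minimal sketch of the statistical-dependency term used in Eqs. 4.2 and 4.3.
# X: d x n matrix of task features (columns are tasks); M: d x d PSD meme matrix;
# same_vehicle: length-n array of vehicle ids taken from an optimized solution.
import numpy as np

def dependency(X, M, same_vehicle):
    n = X.shape[1]
    K = X.T @ M @ X                          # kernel induced by the meme M
    Y = np.where(same_vehicle[:, None] == same_vehicle[None, :], 1.0, -1.0)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H = I - (1/n) 11^T
    return np.trace(H @ K @ H @ Y)           # tr(HKHY); larger means better alignment

# Example with random data (for shape-checking only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 6))                  # six tasks with 2-D features
M = np.eye(2)                                # identity meme = no transformation
print(dependency(X, M, np.array([0, 0, 0, 1, 1, 1])))
```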
4.1.4.2 Evolution of Knowledge Meme Between CVRP and CARP
The evolution of the knowledge meme between the CVR and CAR problem domains includes the selection of a knowledge meme and the assimilation of the selected knowledge meme for generating high-quality routes for unseen problems.
Selection of Learned Knowledge Meme: As more knowledge memes are learned, the question arises of which knowledge meme should be selected for evolving across the problem domain. Suppose there is a set of m unique Ms in the knowledge pool KP, i.e., KP = {M_1, M_2, ..., M_m}. The knowledge meme selection process is formulated as identifying the weight μ_i of each knowledge meme. A fitter knowledge meme should have a higher weight, and the weights of all knowledge memes sum to 1 (i.e., \sum_{i=1}^{m} μ_i = 1). In particular, the weight vector μ is determined as:

max_μ  tr(HKHY) − \sum_{i=1}^{m} (μ_i)^2 · Dis_i
s.t.   M_t = \sum_{i=1}^{m} μ_i M_i,  μ_i ≥ 0,  \sum_{i=1}^{m} μ_i = 1
       K = X^T · M_t · X,  K ⪰ 0      (4.4)
where Dis_i is the discrepancy measure between two given problem instances. In the present context, Dis_i = β · MMD_i + (1 − β) · Dif_i, where MMD_i denotes the maximum mean discrepancy [29], which is used to compare the distribution similarity between two given instances by measuring the distance between their corresponding means:

MMD(D_s, D_t) = || (1/n_s) \sum_{i=1}^{n_s} φ(x_i^s) − (1/n_t) \sum_{i=1}^{n_t} φ(x_i^t) ||

Dif_i denotes the difference in vehicle capacity between two given problem instances, and β balances between the two parts (i.e., MMD_i and Dif_i) in Dis_i.
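As a concrete illustration of the discrepancy measure, the following sketch computes an empirical MMD with the identity feature map φ(x) = x (a simplifying assumption; any feature map or kernel could be substituted), together with the combined measure Dis_i.

```python
# Minimal sketch of the discrepancy Dis_i = beta * MMD_i + (1 - beta) * Dif_i,
# using the identity feature map phi(x) = x as a simplifying assumption.
import numpy as np

def mmd(source_tasks, target_tasks):
    """Distance between the empirical means of two task-feature sets (n x d arrays)."""
    return np.linalg.norm(source_tasks.mean(axis=0) - target_tasks.mean(axis=0))

def discrepancy(source_tasks, target_tasks, cap_s, cap_t, beta=0.8):
    """beta = 0.8 favours task-distribution similarity over vehicle capacity, as in the text."""
    return beta * mmd(source_tasks, target_tasks) + (1.0 - beta) * abs(cap_s - cap_t)
```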
Based on domain knowledge, the task distribution is given a higher weightage than the vehicle capacity information, which implies β > 0.5. In this work, β is configured empirically as 0.8 to favor task distribution information over vehicle capacity information. In Eq. 4.4, the first term serves to maximize the statistical dependence between the input X and the output label Y for clustering [30]. The second term measures the similarity between the previously solved problem instances and the given new problem of interest. Since two unknown variables exist in Eq. 4.4 (i.e., μ and Y), it can be solved by alternately fixing one variable: when Y is fixed, Eq. 4.4 becomes a quadratic programming problem in μ; when μ is fixed, Y can be obtained by clustering (e.g., K-means) on X. Once μ has been obtained by solving Eq. 4.4, the selected M_t is derived as:

M_t = \sum_{i=1}^{m} μ_i M_i,   (\sum_{i=1}^{m} μ_i = 1,  μ_i ∈ [0, 1])
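A minimal sketch of this alternating scheme is given below, under simplifying assumptions: the μ-update uses a generic constrained optimizer (SLSQP) in place of the quadratic program of the original formulation, and the helper name `select_meme_weights` is illustrative only.

```python
# Minimal sketch of the alternating scheme for Eq. 4.4: fix mu -> cluster to get Y,
# fix Y -> optimize mu on the simplex. SLSQP is used here purely for illustration;
# the original formulation solves this step as a quadratic program.
import numpy as np
from scipy.optimize import minimize
from sklearn.cluster import KMeans

def select_meme_weights(X, memes, dis, n_vehicles, n_iters=5):
    """X: d x n task features of the new instance; memes: list of d x d PSD matrices;
    dis: array of per-meme discrepancies Dis_i; returns (mu, M_t)."""
    m, n = len(memes), X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n
    mu = np.full(m, 1.0 / m)                             # start from uniform weights

    def objective(mu_vec, Y):
        M_t = sum(w * M for w, M in zip(mu_vec, memes))
        K = X.T @ M_t @ X
        return -(np.trace(H @ K @ H @ Y) - np.dot(mu_vec ** 2, dis))

    for _ in range(n_iters):
        M_t = sum(w * M for w, M in zip(mu, memes))
        w_eig, V = np.linalg.eigh(M_t)
        L = V * np.sqrt(np.clip(w_eig, 0.0, None))       # M_t = L @ L.T
        labels = KMeans(n_clusters=n_vehicles, n_init=10).fit_predict((L.T @ X).T)
        Y = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)
        res = minimize(objective, mu, args=(Y,), method="SLSQP",
                       bounds=[(0.0, 1.0)] * m,
                       constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
        mu = res.x
    return mu, sum(w * M for w, M in zip(mu, memes))
```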
Assimilation of Knowledge Meme for Evolutionary Search: Subsequently, the knowledge meme M_t generalized from past experiences is assimilated to enhance the evolutionary search on the other problem domain via the generation of meme-biased solutions. In particular, the task distribution of the original data X_new is first transformed or remapped to a new task distribution X'_new (i.e., from Fig. 4.6a to c) by:

X'_new = L^T X_new      (4.5)
where L is derived from the SVD of M_t. Furthermore, the task assignments of the vehicles and the task service ordering of the meme-biased solution are obtained by clustering on the transformed tasks and by pairwise distance sorting among the tasks assigned to the same cluster (Fig. 4.6d), respectively. In summary, the enhancement of evolutionary search with a knowledge meme learned across problem domains is realized by injecting knowledge-meme-biased solutions into the population of the evolutionary search. An overview flowchart is depicted in Fig. 4.7. By learning from the experienced problems and the corresponding optimized solutions, the knowledge memes are extracted and stored in a pool that resembles the brain of a human being. When a new problem is encountered, the selection and assimilation processes kick in to generate the respective knowledge-meme-biased problem solutions, which are subsequently injected into the population of the evolutionary search to guide the search process.
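The following minimal sketch shows how a meme-biased solution could be generated from M_t (Eq. 4.5 followed by clustering and within-cluster ordering). It is an illustrative assumption: the within-cluster service order is produced by a simple greedy nearest-neighbour walk standing in for the pairwise-distance-sorting step, and `meme_biased_solution` is a hypothetical helper name.

```python
# Minimal sketch of meme assimilation (Eq. 4.5 plus clustering and ordering).
import numpy as np
from sklearn.cluster import KMeans

def meme_biased_solution(X_new, M_t, n_vehicles):
    """X_new: d x n task features of the unseen instance; returns one route per vehicle."""
    w, V = np.linalg.eigh(M_t)
    L = V * np.sqrt(np.clip(w, 0.0, None))               # M_t = L @ L.T
    Xp = (L.T @ X_new).T                                  # Eq. 4.5: transformed tasks, n x d
    labels = KMeans(n_clusters=n_vehicles, n_init=10).fit_predict(Xp)
    routes = []
    for c in range(n_vehicles):
        idx = list(np.where(labels == c)[0])
        if not idx:
            routes.append([])
            continue
        route, current = [idx.pop(0)], None
        current = route[0]
        while idx:                                        # greedy nearest-neighbour ordering
            nxt = min(idx, key=lambda j: np.linalg.norm(Xp[j] - Xp[current]))
            idx.remove(nxt)
            route.append(nxt)
            current = nxt
        routes.append(route)
    return routes
```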
4.1.5 Empirical Study
To investigate the feasibility of learning and evolving knowledge memes across problem domains for enhanced evolutionary search, empirical studies conducted between the challenging NP-hard CVRP and CARP domains are presented in this section.
Fig. 4.7 An illustration of knowledge learning and evolution across problem domain for enhanced evolutionary search
In particular, 10 commonly used CVRP benchmark instances and 10 well-known CARP benchmark instances of diverse properties in terms of vertex size, graph topology, etc., are considered here. In addition, two recent state-of-the-art evolutionary algorithms for solving CVRP and CARP, labeled in their respective published works as CAMA [16] and ILMA [17], are considered as the baseline conventional evolutionary solvers for the respective domains in the present study. In CAMA, the initial population is configured according to [16] as a fusion of solutions generated by Backward Sweep [31], Saving [32], Forward Sweep [31] and random initialization approaches, while in ILMA, the initial population is a fusion of chromosomes generated from Augment_Merge [33], Path_Scanning [34], Ulusoy's Heuristic [35] and a simple random initialization procedure. For the setup of the evolutionary search with knowledge memes derived from a different problem domain, the best solutions of the CARP and CVRP instances are used as the search experiences in each problem domain. CAMA-K and ILMA-K denote the baseline solvers with prior knowledge memes derived from the CARP domain and the CVRP domain, respectively. In particular, their initial populations are generated based on the evolved knowledge meme as discussed in Sect. 4.1.4.2.
Lastly, the operator and parameter settings of CAMA-K and ILMA-K are kept the same as those of [16, 17] for the purpose of fair comparison. In what follows, the empirical studies are presented to answer three questions on knowledge meme transmission across problem domains for evolutionary optimization.
– Can evolutionary optimization benefit from knowledge memes across problem domains?
– How do different knowledge memes across problem domains influence the evolutionary search?
– What knowledge memes across problem domains would lead to enhanced evolutionary search?
4.1.5.1 Can Evolutionary Search Benefit from Different Problem Domains?
Here it is assumed that the optimized solutions for the CARP instances are available, and they are used as past problem-solving experiences in the CAR problem domain to solve unseen CVRPs. Conversely, when solving CARPs, the optimized solutions of the CVRPs are used as the existing search experiences.
4.1.5.2 Solving CVRP with Evolution of Knowledge Meme from CARP Domain
All results obtained on the CVRP instances by the CAMA solver over 30 independent runs are summarized in Table 4.1. B.Cost, Ave.Cost and Std.Dev denote the best solution with minimum cost, the average cost of the best solutions obtained in each run, and the standard deviation of the optimized solutions across the 30 independent runs, respectively. B.Gap measures the difference between the best-found value and the lower-bound value of a benchmark instance, while Ave.Gap gives the difference between the Ave.Cost value and the lower-bound value of each instance. Superior performance is highlighted in bold font. It can be observed from the results in Table 4.1 that, overall, CAMA-K achieved competitive or improved solution quality over CAMA in terms of Ave.Cost on all of the CVRP instances. In particular, with prior knowledge from the CARP domain, CAMA-K obtained a better Ave.Cost value on 8 out of 10 instances. Note that, on instance “A-n54-k7”, CAMA-K consistently converged to the optimal solution over all 30 independent runs (i.e., the corresponding Std.Dev is 0). Subsequently, to assess the efficiency of the proposed approach, the averaged convergence graphs on the CVRP benchmark instances are depicted in Fig. 4.8. In the figures, the dotted line at the bottom of each plot denotes the lower bound or best known solution of the corresponding benchmark instance reported in the literature [16]. As can be observed, on all the CVRP instances, CAMA-K achieved superior performance over its counterpart CAMA.
Table 4.1 Statistic results of CAMA-K and CAMA on CVRP benchmarks

CAMA:
CVRP Instances   B.Cost    Ave.Cost   Std.Dev   B.Gap   Ave.Gap
1. A-n54-k7      1167.00   1168.13    2.58      0       1.13
2. A-n69-k9      1159.00   1162.87    2.81      0       3.87
3. B-n57-k7      1140.00   1140.00    0.00      0       0
4. B-n78-k10     1221.00   1222.70    0.95      0       1.70
5. P-n50-k7      554.00    556.00     2.03      0       2.00
6. E-n76-k8      735.00    738.30     2.42      0       3.30
7. E-n101-k8     815.00    819.37     3.51      0       4.37
8. c75           835.26    840.45     2.79      0       5.19
9. c100b         819.56    819.56     0.00      0       0
10. c199         1305.61   1318.71    7.67      14.16   27.26

CAMA-K:
CVRP Instances   B.Cost    Ave.Cost   Std.Dev   B.Gap   Ave.Gap
1. A-n54-k7      1167.00   1167.00    0.00      0       0
2. A-n69-k9      1159.00   1162.00    2.39      0       3.00
3. B-n57-k7      1140.00   1140.00    0.00      0       0
4. B-n78-k10     1221.00   1222.60    0.44      0       1.60
5. P-n50-k7      554.00    554.67     1.52      0       0.67
6. E-n76-k8      735.00    736.70     1.95      0       1.70
7. E-n101-k8     817.00    819.07     2.03      2.00    4.07
8. c75           835.26    839.83     3.75      0       4.57
9. c100b         819.56    819.56     0.00      0       0
10. c199         1301.00   1315.07    7.93      9.55    23.62
Fig. 4.8 Averaged search convergence graphs of CAMA and CAMA-K on representative CVRP benchmark instances. Y-axis: Travel Cost, X-axis: Number of Fitness Evaluations. (The dotted line at the bottom of each figure denotes the lower bound or best known solution of the respective benchmark reported in the literature)
In particular, the initial starting points of CAMA-K on “A-n69-k9”, “B-n78-k10”, “c75”, “c199”, etc., are already very close to the respective lower-bound solutions, and CAMA-K takes only about 1000 fitness evaluations to arrive at the solution obtained by CAMA after 2000 fitness evaluations. Please note that CAMA-K and CAMA share the same evolutionary solver for CVRP, and differ only in the generation of the initial population. Furthermore, a more intuitive insight into the resultant search speed-up of the proposed knowledge meme transmission across problem domains is provided. Instead of calculating the speed-up obtained by the proposed approach against the baseline at various fitness levels, here the fitness evaluations (FE) saved by CAMA-K to arrive at the converged solution obtained by its counterpart on each CVRP benchmark instance are investigated. The saving is defined as:

FE_saved = (N_cs(A) − N_cs(A-K)) / N_cs(A) × 100%      (4.6)
where N_cs(·) denotes the number of fitness evaluations used by the investigated algorithm to arrive at a given solution cs. The symbol A stands for the investigated baseline algorithm (i.e., CAMA here). If algorithm A-K obtained a poorer averaged convergence solution, cs is set as the averaged convergence solution of A-K; otherwise, cs is configured as the averaged convergence solution of algorithm A. It is worth noting that a positive FE_saved value means the search of A-K is more efficient than that of A, while a negative FE_saved value denotes that A-K's search is slower than its counterpart A. A higher FE_saved value denotes that more fitness evaluations are saved by A-K to arrive at the solution quality level obtained by A. Table 4.2 summarizes the FE_saved results obtained by CAMA-K on all the CVRP benchmark instances considered. As observed, with prior knowledge memes derived from the CARP domain, CAMA-K has brought about up to 76% savings in terms of fitness evaluations over CAMA (i.e., on “c199”). As aforementioned, since the only difference between CAMA-K and CAMA lies in the prior knowledge introduced in the population initialization phase of the former, the superior performance of CAMA-K can clearly be attributed to the effectiveness of the knowledge meme transmission across problem domains. In other words, through this study, it can be concluded that evolutionary search can benefit from knowledge memes derived across problem domains. In what follows, solving the CARP domain with knowledge memes from the CVRP domain is further investigated.
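For completeness, the saving of Eq. 4.6 can be computed with a trivial helper such as the following (an illustrative sketch only).

```python
# Eq. 4.6 as a one-line helper: percentage of fitness evaluations saved by the
# knowledge-biased solver (A-K) relative to the baseline (A) at a common solution level.
def fe_saved(n_evals_baseline: int, n_evals_meme: int) -> float:
    return (n_evals_baseline - n_evals_meme) / n_evals_baseline * 100.0

# Example: the baseline needs 2000 evaluations, the meme-biased solver 1000 -> 50% saved.
print(fe_saved(2000, 1000))
```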
Table 4.2 Computational cost saved by CAMA-K over CAMA in terms of fitness evaluation (FE)

Data          A-n54-k7  A-n69-k9  B-n57-k7  B-n78-k10  P-n50-k7  E-n76-k8  E-n101-k8  c75     c100b   c199
FE_saved (%)  39.08     50.01     49.79     68.07      13.43     46.18     43.79      49.96   59.87   76.23
4.1.5.3 Solving CARP with Evolution of Knowledge Meme from CVRP Domain
On the other hand, all the results obtained on the CARP instances by ILMA over 30 independent runs are presented in this section. In particular, Table 4.3 gives the solution quality comparison between ILMA-K and ILMA. Figure 4.9 presents the respective averaged convergence graphs of ILMA-K and ILMA to assess the efficiency of the proposed approach. The dotted line at the bottom of each plot denotes the lower bound or best known solution of the corresponding benchmark instance reported in the literature [17]. Furthermore, the fitness evaluation savings (i.e., as defined by Eq. 4.6) obtained by ILMA-K over ILMA are summarized in Table 4.4.
As can be observed, with prior knowledge memes from the CVRP domain, superior or competitive performance of ILMA-K over ILMA can be observed in Table 4.3 on most of the considered CARP instances in terms of Ave.Cost. In particular, ILMA-K obtained a lower Ave.Cost on 8 out of 10 CARP instances against ILMA. In addition, the convergence graphs of ILMA-K and ILMA are depicted in Fig. 4.9. As CARP has a higher problem complexity than CVRP, many more fitness evaluations are required to arrive at an acceptable solution when solving CARP. Speeded-up search performance has been obtained by ILMA-K over its counterpart ILMA on most CARP instances. More interestingly, on instances “E1C” (i.e., Fig. 4.9a), “S3C” (i.e., Fig. 4.9e) and “S4C” (i.e., Fig. 4.9f), although ILMA-K does not have a lower-cost starting point than ILMA, it converged to superior solutions. This implies that an appropriate knowledge meme derived from the CVRP domain can not only generate initial solutions with lower cost, but can also guide the search towards a more promising region of the search space via the generated initial solutions. Looking at the fitness evaluation savings defined by Eq. 4.6, ILMA-K brings about up to 52% savings in fitness evaluations over ILMA and obtained speeded-up search performance on 8 out of 10 instances. Here, the only difference between ILMA and ILMA-K is again the prior knowledge introduced in the population initialization phase of the latter; the superior performance of ILMA-K thus again verifies that evolutionary search can benefit from different but related problem domains.
How do the Knowledge Memes of Related Problem Domains Affect the Evolutionary Search? To gain a better understanding of knowledge meme transmission across problem domains for enhanced evolutionary optimization, the proposed approach is further analyzed and compared with knowledge meme transmission where the meme is derived from the most similar, the least similar, and a randomly selected problem instance across problem domains. In particular, the maximum mean discrepancy in Eq. 4.4 is used here to measure the similarity between problem instances. Let A-BM, A-WM and A-RM denote the baseline evolutionary solver A with the initial population generated by the knowledge meme derived from the most similar, the least similar, and a randomly selected problem instance across problem domains, respectively. In the present context, A stands for CAMA when solving CVRP, and for ILMA when solving CARP.
Table 4.3 Statistic results of ILMA-K and ILMA on CARP benchmarks

ILMA:
CARP Instances   B.Cost     Ave.Cost    Std.Dev   B.Gap    Ave.Gap
1. E1C           5595.00    5600.13     8.10      29.00    34.13
2. E2C           8335.00    8356.00     37.52     92.00    113.00
3. E3C           10292.00   10326.77    42.65     129      163.77
4. E4B           8998.00    9051.67     57.96     114      167.67
5. E4C           11570.00   11703.73    71.50     143      276.73
6. S1C           8518.00    8573.33     35.04     25       80.33
7. S2C           16504.00   16630.63    61.45     151      277.63
8. S3C           17257.00   17391.07    75.34     157      291.07
9. S4B           16424.00   16516.13    64.77     331      423.13
10. S4C          20666.00   20809.87    72.40     291      434.87

ILMA-K:
CARP Instances   B.Cost     Ave.Cost    Std.Dev   B.Gap    Ave.Gap
1. E1C           5595.00    5598.40     6.56      29.00    32.40
2. E2C           8335.00    8349.20     26.49     92.00    106.20
3. E3C           10292.00   10314.50    28.86     129      151.50
4. E4B           8998.00    9053.17     49.04     114      169.17
5. E4C           11602.00   11704.93    81.81     175      277.93
6. S1C           8518.00    8567.07     36.91     25       74.07
7. S2C           16466.00   16608.20    67.39     113      255.20
8. S3C           17258.00   17368.10    69.08     158      268.10
9. S4B           16397.00   16509.10    63.58     304      416.10
10. S4C          20624.00   20793.63    80.38     249      418.63
Fig. 4.9 Averaged search convergence graphs of ILMA and ILMA-K on representative CARP benchmark instances. Y-axis: log(Travel Cost), X-axis: Number of Fitness Evaluations. (The dotted line at the bottom of each figure denotes the lower bound or best known solution of the respective benchmark reported in the literature)
Table 4.4 Computational cost saved by ILMA-K over ILMA in terms of fitness evaluation (FE)

Data          E1C    E2C    E3C    E4B     E4C     S1C    S2C    S3C    S4B    S4C
FE_saved (%)  34.66  52.72  49.39  −10.58  −11.92  25.88  32.83  19.85  10.86  7.55
The averaged convergence graphs on representative CVRP and CARP instances are depicted in Figs. 4.10 and 4.11, respectively. As can be observed in Fig. 4.10, with knowledge memes transmitted from the CARP problem domain, CAMA-BM, CAMA-WM and CAMA-K all achieved more efficient search performance than the baseline CAMA. Furthermore, among CAMA-BM, CAMA-WM and CAMA-K, it is obvious that CAMA-WM lost to the other two on all the CVRP instances, since the least similar knowledge meme is incorporated. In contrast, the proposed approach, which learns the weights of the different memes based on problem similarity, achieved superior or competitive performance compared with CAMA-BM. CAMA-RM, which randomly chooses a knowledge meme across the problem domain, achieved a convergence trace between those of CAMA-WM and CAMA-BM. On the other hand, for CARP in Fig. 4.11, negative effects of ILMA-WM have been observed on all the CARP instances when compared to the baseline ILMA. However, with the knowledge meme from the most similar CVRP instance, ILMA-BM achieved superior performance over the baseline ILMA. With a randomly selected knowledge meme, ILMA-RM achieved speeded-up search against ILMA on instances such as “E1C” (i.e., Fig. 4.11a) and “S2C” (i.e., Fig. 4.11d), but fell behind ILMA on the other instances. Moving on to the proposed knowledge meme transmission, ILMA-K attained superior performance to ILMA-WM and ILMA-RM on all the CARP instances, and obtained superior or competitive performance against ILMA-BM. In summary, different knowledge memes introduce unique biases into the evolutionary search, and inappropriately chosen knowledge memes can lead to negative impairments of the evolutionary search process. Further, the comparisons conducted in this section also confirm the effectiveness of the proposed selection scheme for enhanced search.
What Forms of Knowledge Memes from Related Problem Domains Benefit Evolutionary Optimization? In the empirical studies presented above, both enhanced and deteriorated evolutionary search caused by knowledge meme transmission across different but related problem domains have been observed. This subsection further studies the possible reasons behind the various performances obtained, and explores what knowledge memes across different but related problem domains would enhance the evolutionary search. In particular, the discrepancies obtained by each CVRP instance against all CARPs, and by each CARP instance against all CVRPs, computed via Eq. 4.4 (i.e., the maximum mean discrepancy) in the established common feature space, are first depicted in Figs. 4.12 and 4.13, respectively. As can be observed in Fig. 4.12, from CVRP instance “1” (i.e., “A-n54-k7”) to “4” (i.e., “B-n78-k10”), there is a decreasing trend in the corresponding discrepancies.
Fig. 4.10 Averaged search convergence graphs of CAMA, CAMA-WM, CAMA-BM and CAMA-K on representative CVRP benchmark instances. Y-axis: Travel Cost, X-axis: Number of Fitness Evaluations. (The dotted line at the bottom of each figure denotes the lower bound or best known solution of the respective benchmark reported in the literature)
Fig. 4.11 Averaged search convergence graphs of ILMA, ILMA-WM, ILMA-BM and ILMA-K on representative CARP benchmark instances. Y-axis: log(Travel Cost), X-axis: Number of Fitness Evaluations. (The dotted line at the bottom of each figure denotes the lower bound or best known solution of the respective benchmark reported in the literature)
Fig. 4.12 Discrepancies of each CVRP instance against all CARPs. “+” denotes a discrepancy obtained between the CVRP instance and a CARP instance, and “⊕” is the minimum discrepancy obtained by each CVRP instance. The name (e.g., “E1C”) appended to each ⊕ denotes the CARP instance on which the respective CVRP instance achieves the minimum discrepancy
Fig. 4.13 Discrepancies of each CARP instance against all CVRPs. “+” denotes a discrepancy obtained between the CARP instance and a CVRP instance, and “⊕” is the minimum discrepancy obtained by each CARP instance. The name (e.g., “c199”) appended to each ⊕ denotes the CVRP instance on which the respective CARP instance achieves the minimum discrepancy
Looking back at the computational cost saved by CAMA-K over CAMA in Table 4.2, an increasing trend of the computational cost saved can in general be observed from “A-n54-k7” to “B-n78-k10”. Furthermore, on CVRP instance “5” (i.e., “P-n50-k7”), the discrepancies go up, while the corresponding fitness evaluation (FE) saving drops to below 20%. Subsequently, as the discrepancy between the CVRP instances and the CARPs drops from instance 6 (i.e., “E-n76-k8”) to 10 (i.e., “c199”), the FE saved by CAMA-K over CAMA increases again. On the other hand, for CARP in Fig. 4.13, the discrepancies drop first from CARP instance “1” (i.e., “E1C”) to “2” (i.e., “E2C”), and increase subsequently on the later CARP instances. Coupled with the computational cost saved by ILMA-K over ILMA in Table 4.4, it can be observed that the FE saved by ILMA-K goes up from 34.66% to 52.72% from “E1C” to “E2C” and drops to −11.92% at CARP instance “5” (i.e., “E4C”) as the corresponding discrepancies increase. Furthermore, the
FE saved by ILMA-K goes up again at CARP instance “6” (i.e., “S1C”), although most of the discrepancies increase. However, it is worth noting that from CARP instance “6” (i.e., “S1C”) to “10” (i.e., “S4C”), lower minimum discrepancies can be observed than those of CARP instances “4” (i.e., “E4B”) and “5” (i.e., “E4C”). Here, the minimum discrepancy of each CVRP instance against all the CARPs, and of each CARP instance against all the CVRPs, in the established common feature space (i.e., Sect. 4.1.2) is given by:

D_vp = min_{i={1,...,t}} MMD(vp, ap_i)      (4.7)
D_ap = min_{i={1,...,t}} MMD(ap, vp_i)      (4.8)
where vp and ap denote a CVRP and a CARP instance, respectively, and t is the number of CARP or CVRP instances; in the present context, t equals 10. MMD(·) is the maximum mean discrepancy used in Eq. 4.4. In Figs. 4.12 and 4.13, "⊕" denotes the respective minimum discrepancy. The name appended to each ⊕ denotes the instance across the problem domain on which the current problem instance achieves the minimum discrepancy. As most of the discrepancies increase from CARP instance "6" to "10", the FE savings obtained on these instances indicate that the knowledge meme from the CVRP instance with minimum discrepancy plays a dominant role, via its selection weight (i.e., μ in Eq. 4.4), in biasing the search for solving CARP. In what follows, the computational cost savings in terms of fitness evaluations achieved by CAMA-K and ILMA-K with respect to the minimum discrepancy obtained by each problem instance across domains are further investigated. As depicted in Figs. 4.14 and 4.15, the small circles denote the particular FE saved by CAMA-K or ILMA-K together with the minimum discrepancy obtained across problem domains. The straight line is a linear regression of the depicted small circles.
Fig. 4.14 FE saved by CAMA-K over CAMA with respect to the minimum discrepancies obtained by the CVRP instances against all CARPs (Y-axis: FE Saved, X-axis: Discrepancy)
Fig. 4.15 FE saved by ILMA-K over ILMA with respect to the minimum discrepancies obtained by the CARP instances against all CVRPs (Y-axis: FE Saved, X-axis: Discrepancy)
In general, an inversely proportional relationship can be observed between the FE savings and the corresponding minimum discrepancy in both the CVRP and CARP problem domains. In summary, from the observed relationship between FE savings and the discrepancies between instances across problem domains, it can be inferred that an enhanced evolutionary search is obtained with knowledge meme transmission across domains when a low discrepancy exists between the respective problem instances. If the problem instance of interest is very different from the existing solved problems, the corresponding knowledge transmission is not helpful for the evolutionary search and may even impair the search process. To ensure an enhanced evolutionary search by knowledge meme transmission across problem domains, the discrepancy measure, i.e., Eqs. 4.7 and 4.8, in the common feature space can be used as a criterion to evaluate whether a low discrepancy exists between problem domains.
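As a rough illustration of how such a criterion could be applied in practice, the sketch below computes the minimum discrepancy of a new problem instance against a set of already-solved instances in a common feature space and enables meme transfer only when that discrepancy falls below a user-chosen threshold. The mmd function here is a generic empirical maximum mean discrepancy with a Gaussian kernel; the feature matrices, the kernel bandwidth sigma and the threshold are illustrative assumptions rather than the exact quantities used in the book.

import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise Gaussian kernel values between the rows of X and the rows of Y.
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd(X, Y, sigma=1.0):
    # Empirical (biased) maximum mean discrepancy between two samples
    # represented in the common feature space.
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean())

def min_discrepancy(target, sources, sigma=1.0):
    # Criterion in the spirit of Eqs. 4.7 and 4.8: the smallest MMD between
    # the target instance and any of the solved source instances.
    return min(mmd(target, S, sigma) for S in sources)

# Hypothetical usage: transfer memes only when the closest source is near enough.
# target_feats = np.random.rand(50, 4)                      # features of the new instance
# source_feats = [np.random.rand(60, 4) for _ in range(10)]
# if min_discrepancy(target_feats, source_feats) < 0.1:     # 0.1 is an assumed threshold
#     pass  # inject the meme learned from the closest source into the search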
4.1.6 Summary
In this work, a study on the feasibility of an evolutionary paradigm capable of learning and evolving knowledge memes across different but related problem domains for the purpose of enhanced search has been presented. In particular, on two NP-hard combinatorial optimization problems, i.e., CVRP and CARP, the common problem representation, the shared useful knowledge meme representation, and the means to capture and transmit the knowledge meme between the CVRP and CARP evolutionary searches are derived and proposed. Empirical results show that evolutionary optimization can benefit from a different but related problem domain. However, the appropriate choice of knowledge meme is crucial for enhancing the evolutionary search process. In addition, by studying the performances of meme-biased evolutionary search
and the discrepancies between problems in different domains, it has been observed that enhanced evolutionary optimization is obtained from knowledge meme transmission when a low discrepancy exists between the respective problems in the established common feature space.
4.2 Evolutionary Knowledge Learning and Transfer in Multi-agent System
Taking inspiration from Darwin's theory of natural selection and Universal Darwinism [15, 36] as the principal driving forces that govern the evolutionary knowledge transfer process, this section presents an evolutionary Transfer reinforcement Learning framework (eTL) for building intelligent social agents with the capability of adapting to dynamic multi-agent systems (MASs). The intrinsic parallelism of natural evolution, together with the errors introduced by the physiological limits of the agents' ability to perceive differences [37], can generate the "growth" and "variation" of the knowledge that agents learn from the problem domain [38], thus yielding better adaptive capabilities for solving non-trivial and complex problems. Further, the essential backbone of the proposed framework is the memetic automaton2 [25], which includes meme-inspired3 evolutionary mechanisms, namely meme expression, meme assimilation, meme representation, and meme internal and external evolution. In particular, meme representation and meme evolution comprise the main aspects of eTL. Meme representation pertains to what a meme is. Meme evolution, which includes meme internal evolution and meme external evolution, instructs the behavioral learning process of multiple agents. To be specific, meme internal evolution denotes the learning process that updates an individual's internal knowledge via self learning. Meme external evolution, which is central to the behavioral aspects of imitation, models the social interactions among multiple agents. In contrast to existing works on TL in MAS, eTL addresses the aforementioned limitations (e.g., blind reliance) found in the knowledge transfer process of existing frameworks. In particular, the proposed approach constructs social selection mechanisms that are modeled after the principles of human learning [39] to effectively identify appropriate interacting partners, thus bringing about improved agent social learning capabilities in dynamic environments [40].
2 A memetic automaton is defined as a self-contained software or agent which autonomously learns to enhance its adaptivity and capability using memes as the building blocks of information during knowledge evolution.
3 A meme is regarded as a fundamental building unit of cultural information, held in an individual's mind, which is capable of being transmitted to others [15].
4.2.1 Multi-agent System
Multi-agent systems (MAS) are composed of multiple interacting agents that have different information and diverging or common interests within an environment [41]. Due to their adaptivity and effectiveness in solving non-linear and complex problems, MAS have been successfully deployed in many real-world scenarios, such as traffic and transportation [42], military applications [43], distributed systems [44], etc. Formally, a MAS can be depicted using Fig. 4.16. In particular, each individual agent serves as a part of the environment while also being modeled as a separate entity with its own goals, actions and knowledge. The agents may have different learning architectures (indicated by different colors in Fig. 4.16) and different goals or objectives on their problems of interest. Besides the uncertainty that may be inherent in the domain of MAS, an agent needs to take into account the effects caused by other agents' actions, since each agent has the capacity to affect the entire environment with its actions and to influence the actions of its counterpart agents. Furthermore, due to the interaction among agents, a MAS usually has a large state-action space, as the global space is an exponential combination of all local spaces. Thus, the complexity of MAS grows rapidly with the number of agents, which increases the difficulty for agents to develop appropriate behaviors in a changing environment. This has attracted increasing research attention, and new effective approaches for identifying the optimal behavioral actions of agents in MAS are sought after. In the past few decades, the challenges presented by the task of optimizing learning in MAS have thus attracted increasing attention. The majority of existing research can be categorized into two forms.
Fig. 4.16 An illustration of the multi-agent system. (Agents may have differing goals, action spaces or domain knowledge, which are denoted by different fonts. The arrows between agents indicate possible communication between agents)
First, agents can optimize their own actions in the hope of reaching the global optimum. One of the most significant techniques that has proven to be a popular way for individuals to optimize their behaviors is reinforcement learning (RL) [45], wherein agents can only acquire limited environmental feedback rewards, rather than correctly labeled instances, as is common in other machine learning contexts such as supervised and unsupervised learning. Another trend for enhancing the learning performance in complex MAS is to transfer good strategies from a well-studied domain as supplementary knowledge to help and instruct the learning process toward more informed decisions. A notable knowledge reuse methodology, known as transfer learning (TL) [26], leverages knowledge from related tasks for an easier and more effective learning process on newly encountered tasks. Compared to RL in MAS, significantly fewer studies have addressed TL in MAS, which makes it a fertile research field for further investigation. In the following subsections, the focus is placed on these two representative techniques for solving complex MAS, specifically, reinforcement learning for individual optimization in MAS and transfer learning for effective social interaction across multiple agents.
4.2.2 Reinforcement Learning for Individual Learning in MAS
Reinforcement learning (RL) is a paradigm for learning sequential decision making tasks that enables individual agents to optimize their behaviors through trial-and-error interactions [45]. In some challenging cases, the behaviors taken by RL agents may influence not only the immediate reward but also the sequential delayed rewards of follow-up situations. The characteristics of trial-and-error optimization and delayed reward are thus considered the two most distinguishing features of RL. Typically, a decision making task is formulated as a Markov Decision Process (MDP) which is composed of a set of states S, a set of actions A, a reward function R(s, a), and a transition function P(s'|s, a). Figure 4.17 illustrates a standard reinforcement learning model. Specifically, given a state space S = (s1, s2, ..., sn) and a set of actions A = (a1, a2, ..., an), at step t the RL agent first obtains an observation from the environment representing the current state st, which typically includes the reward rt. The most appropriate action at is then predicted by evaluating each possible action in the set A. Upon receiving a reward R(s, a) after performing this action, the agent arrives at a new state s' which is determined by the probability distribution P(s'|s, a). A policy π = P(a|s) specifies, for each state, a distribution deciding which action an agent takes. The value Q*(s, a) of a state-action pair estimates the future reward obtained from (s, a) when following policy π. Q*(s, a) is computed by the Bellman equation as follows:
$$Q^{*}(s, a) = R(s, a) + \gamma \sum_{s'} P(s'|s, a) \max_{a'} Q^{*}(s', a'),$$
Fig. 4.17 Illustration of a standard reinforcement learning process
where 0 < γ < 1 is the discount factor. The aim of an RL agent is hence to search for the policy π which maps states to appropriate actions and maximizes the expected long-term rewards obtained from the environment. Notably, the trade-off between exploration and exploitation of state-action strategies is of great importance during the behavior-optimizing process of RL agents. The RL learner has to exploit what it has learnt in order to maximize rewards in the current states, and it also has to explore new behavioral strategies for making better action selections in the future. In order to tackle this problem for RL learners in real-world environments, various novel algorithms have been proposed. These include, for example, Q-learning [46] combined with the Temporal Difference algorithm [45] for problems in which the merit of an action is only known several steps after the action is conducted, the state-action-reward-state-action (SARSA) algorithm [47] for solving high-dimensional continuous state spaces, and others. In the past decades, RL has gained extensive popularity, and a plethora of learning methods, such as direct policy search [48], temporal difference [49] and Monte Carlo [50], have been proposed for building independent autonomous agents that are able to handle increasingly complex problems. Such methods have achieved significant success in a wide variety of practical applications including manufacturing [51], inventory management [52], delivery management [53], etc. However, as RL agents start learning from a tabula rasa state, a significantly large amount of experiential data and extensive exploration time are often necessary conditions for successful learning to take place. They can sometimes fall short of meeting today's competitive needs for high-efficiency problem-solvers and are hence often deemed too slow for many complex problem domains. Thus, a significant number of current RL studies endeavor to improve the learning efficiency by leveraging domain expertise with various forms of human-provided knowledge. Typical studies in this direction include those that deconstruct the task into a set of subtasks [54], learn with high-level abstractions of one-step actions [55], and abstract over the given state space (e.g., the value function [45]), so that the RL agent generalizes its learning experience more efficiently. The paradigm of transfer learning is one that has recently gained increasing attention for improving RL tasks in complex domains. Compared to previous studies, experience generalization in transfer learning can occur not only within a single task, but also across differing but related tasks. In what follows, a general overview of transfer learning, especially transferring among multiple RL agents, is presented for a more detailed understanding.
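To make the preceding notions concrete, the following sketch implements a plain tabular Q-learning loop with ε-greedy action selection. The environment interface (env.reset, env.step) and all parameter values are illustrative assumptions introduced only for this example, not part of any method described in this book.

import random
from collections import defaultdict

def epsilon_greedy(Q, state, n_actions, eps):
    # Exploit the best known action with probability 1 - eps, otherwise explore.
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])

def q_learning(env, n_actions, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    Q = defaultdict(float)                      # Q(s, a), defaults to 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, n_actions, eps)
            next_state, reward, done = env.step(action)
            # One-step temporal-difference (Q-learning) update toward r + gamma * max_a' Q(s', a').
            best_next = max(Q[(next_state, a)] for a in range(n_actions))
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q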
4.2.3 Transfer Learning in Multi-agent Reinforcement Learning Systems
Transfer learning (TL) has surfaced as an attractive learning approach for enhancing the learning efficacy on a new task by reusing valuable data from a different, but related, task [56]. The idea of TL is not entirely new and has long been studied in the psychology literature [57, 58]. Due to its flexibility and ease of use, TL has become well known as a popular problem solver for various machine learning tasks such as classification [59] and clustering [60], and has enjoyed significant success across a wide variety of real-world applications including computer vision [61], natural language processing [62], etc. Recently, TL has gained increasing attention for enhancing the classical RL paradigm, by leveraging useful state-action strategies from well-studied domains as supplementary knowledge to instruct and coordinate the learning process on newly encountered tasks. From a survey of the literature, TL not only improves agents' performance, but also speeds up the learning process [56]. In the past years, a variety of TL methodologies, such as instance transfer [63], action-value transfer [64], feature transfer [65, 66] and rule exchanging [67], have been proposed and have benefited a wide range of RL tasks. However, it is worth noting that these early approaches focus almost exclusively on leveraging the knowledge or data instances across problem domains [68]. In these works, the mapping of knowledge across differing problems has commonly been hand-coded. TL among multiple agents in MAS differs markedly from these studies and demands a way for the system to determine how to map information independently and automatically [69]. To be fully autonomous, an agent that conducts knowledge transfer typically needs to perform the following steps: (1) select an appropriate source agent for knowledge transfer to the given target agent; (2) investigate the relevance between the source and target agents; (3) perform effective knowledge transfer from the source agents to the possible target agents. The mechanisms used for these steps are interdependent and closely related to what assumptions are made, how the mapping across tasks is defined, and what is transferred. For example, transfer methods that assume the source and target agents have differing state or action spaces may use inter-task mappings [70] to develop corresponding connections. In accordance with the specific tasks, the transferred knowledge can range from very low-level information (i.e., (s, a, r, s') instances, an action value function, etc.) to general heuristics that have the capacity to guide the whole learning process (i.e., partial policies, rules or advice, important features, etc.). Typically, low-level information is transferred across similar agents, while high-level information can be used across agents that bear lower similarity. The goal of transfer is to reduce the overall time required to learn a complex task while effectively reusing past knowledge in a novel task. Considering TL mechanisms at a generic level, the main requirement for successful transfer is the common view that the transferred knowledge from source domains is helpful to the target. Taking this cue, MAS has the capacity to benefit
from transfer learning methods. In MAS, transfer can occur within each single agent, for example, transfer of state-action tuples containing useful information for better decisions. Another form is transfer among different agents, where agents can be affected by their own actions as well as by those of their neighbors. Although this possibility makes MAS an attractive target for TL methods, the scenario of multiple agents and the complexity of MAS bring new challenges. In MAS, all agents (or tasks) learn simultaneously in the same environment. At the beginning of the learning process, no learned knowledge is available. Therefore, any pair of agents in the environment is likely to possess unique experiences, and thus both may have useful information that can be transferred to the other. Some recent research has considered TL for solving multi-agent RL tasks. Boutsioukis et al. [71] focused on evaluating the applicability of transfer learning to multi-agent RL and discussed transferring from a single-agent system to a multi-agent system and from one MAS to another, in an offline manner. Nevertheless, that work focuses on knowledge transfer from a previously well-studied problem domain to enhance problem-solving in a novel domain of interest. In MAS, on the other hand, since agents interact with one another, the potential for multiple agents to learn from one another in the same problem domain simultaneously, or online, should be appropriately exploited. The challenge here is thus to identify suitable source tasks from multiple agents that will contain useful information to transfer [68]. To address the issues of requiring source information and selecting source tasks, Oliveira et al. [72] described a study on an interactive Advice-Exchange (AE) mechanism, wherein agents with poor performance seek advice only from the elitist for the next action to take. One major problem observed is that bad advice from, or blind reliance on, the elitist could hinder the learning process, sometimes even beyond recovery. Similarly, Feng et al. [73] proposed to select teacher agents which not only have superior performance but also share similar past experiences. However, as the experience similarity measure is model specific, the approach is likely to fail when agents of diverse models exist. More recently, Taylor et al. [68] also discussed some key issues, such as when and what to transfer, for conducting TL in MAS while learning progresses online, and further proposed a broadcast-based Parallel Transfer Learning (PTL) framework for MAS. In PTL, each agent broadcasts its knowledge to all the other agents while deciding whose knowledge to accept based on the rewards received from other agents versus the expected rewards it predicts. Nevertheless, agents in this approach tend to infer incorrect actions in unseen circumstances. In the following subsections, more detailed discussions on these works of TL in multi-agent RL systems are presented.
4.2.3.1 Interactive Advice Exchange Mechanism
Oliveira et al. [72] presented a study on an interactive advice-exchange mechanism where agents share episodes of given states by seeking advice at critical times, or by requesting a demonstration of a solution to a specific problem from agents whose performance is currently better than that of all other agents. As compared to other ways of
exchanging environmental states, or internal parameters and policies, the option of seeking advice has distinct advantages in the following aspects: (1) advice exchanging places no restriction on the heterogeneity of learning algorithms; (2) heterogeneous learning algorithms explore the same search space in unique ways, hence providing greater solution diversity; (3) the information included in the exchanged advice is more valuable than exchanged environment states; (4) advice exchanging has less requirement for pre-coded knowledge. At the beginning of the learning process, each agent, which employs a heterogeneous learning algorithm (Random Walk, Simulated Annealing or Q-learning), has to decide whether to undergo the advice-exchange process according to the current average quality of the agents. In the case of advice exchange, the request for advice is sent to the agent with the best quality, with the current state of the advisee included as a parameter. The advisor takes the received state transmitted by the advisee into consideration when developing its best guess as the most probable response to the given state. The response from the advisor is then used to reinforce the internal knowledge of the advisee using the standard backpropagation rule. The system repeats the aforementioned steps and benefits from mutual interaction during the learning process until the terminal condition is satisfied. Advice exchange is a promising approach, but there exists a problem, namely blind reliance or bad advice, which means that an agent may always execute the advice from the best agent. Blind reliance can hinder the learning process, sometimes even beyond recovery. The study in this dissertation also takes advantage of the advice-exchange mechanism. However, unlike advice-exchange, where agents always learn from the elitist, this dissertation introduces an enhanced knowledge transfer approach wherein agents learn from others that not only have higher historical performance but also share the most similar experience. Another difference between the advice-exchange model and the present work is that the reinforcement signal is not only based on the advice given by the advisor but is also directly collected from the environment.
4.2.3.2 Parallel Transfer Learning
Transfer learning has shown significant success in improving the performance of learning agents. However, most traditional transfer approaches require the source task to be finished before valuable information can be transferred to the target task. The execution of the source task thus extends the learning time of the target task. Taking this cue, Taylor et al. [68] proposed a simultaneous TL framework, namely parallel transfer learning (PTL), where source and target tasks can learn in a parallel and bi-directional way. In a PTL system, knowledge transfer occurs whenever the source task is deemed to be helpful to the target task. At each time step, the information of the current agent agt(c) can be broadcast to other agents whenever it shows distinct changes compared with some pre-defined boundary. Meanwhile, agent agt(c) checks whether its own transfer buffer has received any information from other agents in the system. Once confirmed, agent agt(c) determines whether the received information is better than its current knowledge based on the frequency with which the information (state) has been visited.
Finally, agent agt(c) can assimilate the information with higher quality into its own mind. Through repetition of the aforementioned process, agents in the system can explore valuable information of the whole search space faster. PTL removes the restriction of executing the source task prior to the target task. The valuable information can thus be shared among agents as soon as it becomes available. As a result, PTL improves the final performance and takes less learning time when compared to the sequential TL approach, i.e., if the time of the source task is counted. Nevertheless, PTL employs an information broadcasting scheme which tends to be time consuming and redundant, since other agents might have already explored the broadcast information before. Moreover, deciding the standard of valuable information is closely tied to the distribution of differing situations. Agents in this approach thus tend to infer incorrect actions in unseen situations. Differing from the information broadcasting strategy, this report introduces a knowledge transfer scheme which happens mainly via imitation, wherein agents imitate behaviors of partners with better performance, or that are reputed to have better skills, in response to the request for a specific situation. As a result, the redundancy and uncertainty in knowledge sharing can be reduced significantly.
4.2.3.3 Memetic Multi-agent System
Memetics is a new science that has attracted increasing attention in the past few decades. Beyond the formalism of the memetic automaton, which uses memes4 as building blocks of information, Feng et al. [73] proposed a Memetic Multi-Agent System (MeMAS) for modeling agents' social interactions by sharing beneficial knowledge, in terms of memes, across multiple agents. Figure 4.18 depicts the detailed design of MeMAS, which comprises a series of memetic evolutionary learning mechanisms. In MeMAS, each memetic agent employs a FALCON model [74] as the connectionist learning machine in its mind universe, where the memes are acquired and stored in the cognitive field F2. Specifically, meme representation denotes the stored memes, which are described internally as the stored knowledge in an agent's mind universe, and externally as the manifested behavioral actions that can be observed by others. Meme expression is the process for an individual to express its internal knowledge as actions to others. Meme assimilation is the way for an individual to capture these actions and update them into its respective mind universe. Meme internal evolution and meme external evolution comprise the main behavioral learning aspects. Particularly, meme internal evolution is the process for agents to update their mind universe by self learning. Specifically, the agent first predicts the rewards for conducting each available action via meme expression. The reward value is then used to select an action to perform. According to the possible feedback from the environment, a temporal difference formulation and meme assimilation are used to learn the association among multiple steps.
4 A meme is defined as the basic unit of cultural information [15], stored in an individual's mind universe.
Fig. 4.18 The architecture of Memetic Multi-agent System
Meme external evolution serves to model the social interaction among agents via imitation. In the case of meme external evolution, a meme selection process is first used to select the appropriate agent to imitate. Unlike advice-exchange, wherein agents only follow the advice of the agent with the best quality, the similarity of assimilated knowledge among agents is also considered as part of the selection criteria. Once an agent is successfully selected, meme transmission occurs as follows: (1) agent agt(c) passes its current state to the teacher agent agt(s) identified by meme selection; (2) after receiving the given state, agt(s) expresses its inferred action, which can be observed by agt(c); (3) agt(c) imitates the action from agt(s) and updates its mind universe by way of meme assimilation. Taking advantage of the meme-inspired design of individual agents and their interactions, MeMAS leads to a greater level of self-adaptivity and effective social behavior. Particularly, in MeMAS, agents are encouraged to learn from possible teacher agents which not only have superior performance but also share similar past experiences. Nevertheless, as the experience similarity measure is model specific, the approach is likely to fail when agents of diverse models exist. Specifically, MeMAS has merely employed FALCON as the connectionist learner inside the agents' mind universe. The heterogeneity of learning agents, however, has yet to be considered. In practice, modeling the interaction among heterogeneous learning agents based on their internal architectures is non-trivial, since the learning principles, theories and mechanisms of the different agents are unique and usually highly complex. Therefore, this report endeavors to develop an evolutionary transfer learning approach which facilitates effective knowledge transfer among heterogeneous learning agents. In this section, neuronal memes are manifested as memory items or portions of an organism's neurally-stored knowledge inside the agents' mind universe. Further, beyond the formalism of the memetic automaton, and taking memes as the focal point of interest in the field of MAS, this work proposes an evolutionary multi-agent Transfer reinforcement Learning framework (see Fig. 4.19) to develop the evolutionary learning process, particularly agents' social interactions, for more efficient problem solving.
Fig. 4.19 Illustration of the evolutionary Multi-agent Transfer reinforcement Learning framework
In what follows, attention is directed to the representation of memes and the evolutionary mechanisms pertaining to eTL.
4.2.3.4 Proposed eTL Model
In the design of eTL, memes are regarded as the fundamental building blocks of the agents' mind universe, and are categorized into memotypes and sociotypes. Internally, memotypes (depicted by the different LEGO-like block objects that lie in the agents' mind universe in Fig. 4.19) are described as the agents' ideas or knowledge captured as memory items or generalized abstractions inside the agents' mind universe. Externally, sociotypes are defined as the manifested behaviors that can be observed by others. In particular, in the case that differing agents have distinct memotype structures, sociotypes in the proposed eTL offer a channel for agents to transfer knowledge across the whole population. Meme representation and meme evolution form the two core aspects of eTL. A meme then undergoes meme expression and meme assimilation. Meme expression is defined as the process by which an individual expresses its stored neuronal memes as behavioral actions, while meme assimilation captures new memes by translating corresponding behavioral
information into knowledge that blends into the individual's mind universe. That is to say, meme representation pertains to what a meme is. Meme expression/assimilation, on the other hand, activates/updates the meme during the learning process. The meme evolution processes, namely meme internal/external evolution, comprise the main behavioral learning aspects of eTL. To be specific, meme internal evolution denotes the process for agents to update their mind universe through a self-learning process or personal grooming. In eTL, all agents undergo meme internal evolution by exploring the common environment simultaneously. During the self-learning process, meme external evolution may happen to model the social interaction among agents, mainly via imitation, which takes place when memes are transmitted. Specifically, meme external evolution happens whenever the current agent identifies an appropriate teacher agent via a meme selection process. Once the teacher agent is selected, meme transmission occurs to instruct how the agent imitates from others. During this process, meme variation then facilitates the innovative characteristics of knowledge transfer among agents. Based on the feedback received from the environment after conducting an action, the agent then proceeds to update its mind universe accordingly.
4.2.4 Realization of eTL with Learning Agents
This section presents two realizations of learning agents that take the form of neurally-inspired learning structures, namely a Temporal Difference (TD)-FALCON and a BP multi-layer neural network, respectively. To be more specific, FALCON is a natural extension of the self-organizing neural models proposed in [74] for real-time reinforcement learning, while BP is a classical multi-layer network that has been widely used in various learning systems. As aforementioned, eTL comprises several evolutionary operators, including meme representation, meme internal evolution and meme external evolution. In what follows, the detailed realization of each of these operators is presented.
4.2.4.1 FALCON Learning Agent
The FALCON meme-inspired evolutionary mechanisms are discussed in what follows.
FALCON Meme Representation
A FALCON learning agent (depicted in Fig. 4.20) employs a 3-channel neural network structure which is composed of a category field F2 and three input fields, namely a sensory field F1c1 for denoting the current input states, a motor field F1c2 for denoting behavioral actions, and a feedback field F1c3 for denoting reward signals.
Fig. 4.20 Illustration of meme representation taking the form of FALCON and BP
Input vectors: I = (S, A, R) denotes the input vector, where S = (s1, s2, ..., sn) denotes the state vector, and sn indicates the value of input sensory n; A = (a1, a2, ..., am) denotes the action vector, and am indicates the mth possible action; R = (r, 1 − r) denotes the reward vector, with r ∈ [0, 1].
Activity vectors: xck denotes the F1ck activity vector for k = 1, 2, 3, where xc1, xc2, xc3 correspond to the input state S, behavioral action A, and feedback reward R, respectively.
Weight vectors: Wck = (w1ck, w2ck, ..., wjck) denotes the weight vector, wherein wjck indicates the weight of the jth neuron in F2 for learning the input activity vector of F1ck. At the beginning, F2 has only one uncommitted neuron, whose weights are initialized to all 1's. It becomes committed, and another uncommitted neuron is initialized, once the uncommitted neuron is selected to learn an association.
Internally, a memotype is defined as a meme inhabiting the cognitive field F2, which is known as the mind universe of a FALCON learning agent. All these memotypes (neurons) construct the internal knowledge of the agent, which models the association between the input states and actions, thus providing instruction for selecting appropriate actions. Externally, the sociotype meme is regarded as the agent's expressed actions or behaviors, which can be observed and imitated by others.
FALCON Meme Expression
• Meme activation: First, a propagating process occurs in a bottom-up manner such that the activity values of the memotypes in the F2 field are calculated.
Particularly, based on the activity vectors xck, the value Tj, which represents the similarity of the activity vectors with the respective weights for each meme j of F2, is calculated as follows:
$$T_j = \sum_{k=1}^{3} \gamma^{ck} \frac{|\mathbf{x}^{ck} \wedge \mathbf{w}_j^{ck}|}{\alpha^{ck} + |\mathbf{w}_j^{ck}|}, \quad (4.9)$$
where the fuzzy AND operation ∧ is defined by (p ∧ q)_i ≡ min(p_i, q_i) for vectors p and q, and the norm |·| is defined by |p| ≡ Σ_i p_i. The choice parameters α^{ck} and contribution parameters γ^{ck} are predefined, with α^{ck} > 0 and γ^{ck} ∈ [0, 1].
• Meme competition: The meme competition process takes place right after meme activation. During this process, the F2 meme with the highest activation value is identified. Let y denote the F2 activity vector. The winner meme is labeled J, where
$$T_J = \max\{T_j : \text{for all } F_2 \text{ nodes } j\}. \quad (4.10)$$
In particular, the following winner-take-all conditions are satisfied: y_J = 1 and y_j = 0 for all j ≠ J.
• Sociotype readout: After choosing a winning F2 meme J, the winner performs a readout of its weight vectors into the input field F1ck:
$$\mathbf{x}^{ck(\mathrm{new})} = \mathbf{x}^{ck(\mathrm{old})} \wedge \mathbf{w}_J^{ck}. \quad (4.11)$$
Consequently, the xck(new) of F1ck are the result of the fuzzy AND of xck(old) and wJck.
FALCON Meme Assimilation
• Memotype matching: After meme competition, the selected meme J is evaluated by memotype matching, which is conducted to assess whether the weight templates of meme J are sufficiently similar to their respective activity patterns. Particularly, resonance occurs only if the match function mJck for each channel k meets the vigilance criterion:
$$m_J^{ck} = \frac{|\mathbf{x}^{ck} \wedge \mathbf{w}_J^{ck}|}{|\mathbf{x}^{ck}|} \geq \rho^{ck}, \quad (4.12)$$
where the vigilance parameter ρck is pre-defined and ρck ∈ [0, 1] for k = 1, 2, 3. If any vigilance constraint is violated, a mismatch reset happens: the value TJ is set to 0 for the duration of the input presentation, and another meme is selected for memotype matching.
• Memotype learning: If resonance occurs successfully, the weight vector wJck in each channel k is updated as follows:
$$\mathbf{w}_J^{ck(\mathrm{new})} = (1 - \beta^{ck})\,\mathbf{w}_J^{ck(\mathrm{old})} + \beta^{ck}\,(\mathbf{x}^{ck} \wedge \mathbf{w}_J^{ck(\mathrm{old})}), \quad (4.13)$$
where βck ∈ [0, 1] denotes the learning rate parameter. The rationale is to learn by encoding the common attribute values of both the input vectors and the weight vectors.
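A compact numerical sketch of the activation, competition, matching and learning steps of Eqs. 4.9-4.13 is given below. It assumes the three channel activity vectors have already been formed and stores the committed memotypes as rows of per-channel weight matrices; the parameter handling and the treatment of the uncommitted neuron are simplified assumptions, not the authors' reference implementation.

import numpy as np

def falcon_step(x, W, alpha, gamma, rho, beta):
    # x: list of 3 channel activity vectors (state, action, reward)
    # W: list of 3 weight matrices, one row per memotype (neuron) in F2
    n_memes = W[0].shape[0]
    # Meme activation (Eq. 4.9): fuzzy-AND similarity summed over the three channels.
    T = np.zeros(n_memes)
    for k in range(3):
        overlap = np.minimum(x[k][None, :], W[k]).sum(axis=1)
        T += gamma[k] * overlap / (alpha[k] + W[k].sum(axis=1))
    # Meme competition (Eq. 4.10) combined with mismatch reset on the vigilance test.
    for J in np.argsort(-T):
        # Memotype matching (Eq. 4.12): every channel must pass its vigilance criterion.
        if all(np.minimum(x[k], W[k][J]).sum() / x[k].sum() >= rho[k] for k in range(3)):
            # Sociotype readout (Eq. 4.11): fuzzy AND of the input and the winner's weights.
            readout = [np.minimum(x[k], W[k][J]) for k in range(3)]
            # Memotype learning (Eq. 4.13): move the weights toward the fuzzy AND.
            for k in range(3):
                W[k][J] = (1 - beta[k]) * W[k][J] + beta[k] * np.minimum(x[k], W[k][J])
            return J, readout
    return None, x   # no resonance: the full model would recruit an uncommitted neuron here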
4.2.4.2 BP Learning Agent
Here, this subsection describes the BP meme-inspired evolutionary mechanisms.
BP Meme Representation
The mind universe of a BP agent has a three-layer architecture (as shown in Fig. 4.20), consisting of an input layer I for representing the current states S and available actions A, an output layer O consisting of only one neuron for evaluating the association between a selected action and a particular state, a hidden layer H for increasing the expressiveness of the network, and a reward signal R for estimating the errors of the output neurons. All hidden and output neurons receive inputs from the initial inputs or the interconnections and produce outputs by transformation using a symmetrical sigmoid function. In contrast to the FALCON agent, whose memotypes are generated dynamically during the learning process, the internal memotypes housed in a BP agent are modeled as fixed neurons in the hidden layer.
BP Meme Expression
• Neuronal meme (memotype) forward propagation: A forward propagation process first takes place, in which the reward signals of performing each possible action are computed. Specifically, given the state vector S, for each possible action am in A, the outputs of the hidden memotypes are firstly calculated as
$$H_h = \frac{1}{1 + e^{-Gain \times \sum_{i=1}^{p} I_i \times W_{ih}}}, \quad h = 1, \ldots, q, \quad (4.14)$$
where Gain is the gain of the sigmoid function, Ii indicates the value of the ith output in layer I, Wih indicates the connection weight from the ith neuron in layer I to the hth memotype in layer H, and p and q are the numbers of neurons in layers I and H, respectively. Then, the output reward value (Ok)m of performing each action am from A is further computed as
$$(O_k)_m = \frac{1}{1 + e^{-Gain \times \sum_{h=1}^{q} H_h \times W_{hk}}}, \quad k = 1, \quad (4.15)$$
where Hh denotes the output of the hth meme in the hidden layer, Whk is the connective weight between the hth memotype in the hidden layer and the node in the output layer, q is the total number of memotypes in the hidden layer, and k = 1 is the index of the only node in the output layer.
• Neuronal meme competition and sociotype readout: After meme forward propagation, the reward value (Ok)m is used to identify the sociotype meme with the highest reward value. The winner of the winner-take-all competitive strategy is indexed as M, where
$$O_M = \max\{(O_k)_m : \text{for all actions } a_m \text{ in } A\}. \quad (4.16)$$
The corresponding action aM is thus read out as the identified sociotype meme.
BP Meme Assimilation
• Memotype backward error estimation: After performing the identified sociotype meme aM, a memotype backward error estimation process checks the difference between the actual output Ok and the expected output Tk of the network. If neuron b is the node of the output layer, the error εb is estimated via
$$\varepsilon_b^{l} = (T_b - O_b) \cdot O_b \cdot (1 - O_b), \quad (4.17)$$
where l is the index number of the layer. If neuron b is a memotype of the hidden layer, the error signal εb is back propagated from the output layer:
$$\varepsilon_b^{l} = \sum_{k} \varepsilon_k^{l+1} \cdot W_{bk} \cdot H_b \cdot (1 - H_b), \quad (4.18)$$
where Hb denotes the output of the bth memotype in the hidden layer.
• Memotype learning: With the error signal term, the weight vector of each meme in the hidden layer is updated using the generalized learning rule:
$$W_{ab}^{new} = W_{ab}^{old} + \eta \cdot \varepsilon_b^{l} \cdot O_a^{l-1} + \tau \cdot (\eta \cdot \varepsilon_b^{l} \cdot O_a^{l-1})^{old}, \quad (4.19)$$
in which η ∈ [0, 1] is the constant learning rate, τ ∈ [0, 1] is the pre-defined positive momentum accelerating the convergence rate of the error propagation process, Oal−1 denotes the output value of the sublayer related to the connective weight, and the superscripts new and old represent the current and most recent training steps, respectively.
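The sketch below mirrors Eqs. 4.14-4.19 for a single-output BP agent: a forward pass scores a candidate (state, action) input, and a backward pass nudges the weights toward a target Q-value with a momentum term. The network sizes, gain, learning rate and momentum are assumptions made for illustration only.

import numpy as np

class BPAgent:
    def __init__(self, n_inputs, n_hidden, gain=1.0, eta=0.2, tau=0.5):
        self.gain, self.eta, self.tau = gain, eta, tau
        self.W_ih = np.random.uniform(-0.5, 0.5, (n_inputs, n_hidden))  # input -> hidden memotypes
        self.W_ho = np.random.uniform(-0.5, 0.5, n_hidden)              # hidden -> single output node
        self.prev_dW_ih = np.zeros_like(self.W_ih)
        self.prev_dW_ho = np.zeros_like(self.W_ho)

    def _sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-self.gain * z))

    def forward(self, I):
        # Eqs. 4.14-4.15: hidden memotype outputs and the single reward output.
        H = self._sigmoid(I @ self.W_ih)
        O = self._sigmoid(H @ self.W_ho)
        return H, O

    def update(self, I, target):
        # Eqs. 4.17-4.19: backward error estimation followed by momentum-based learning.
        H, O = self.forward(I)
        err_out = (target - O) * O * (1 - O)
        err_hid = err_out * self.W_ho * H * (1 - H)
        dW_ho = self.eta * err_out * H
        dW_ih = self.eta * np.outer(I, err_hid)
        self.W_ho += dW_ho + self.tau * self.prev_dW_ho
        self.W_ih += dW_ih + self.tau * self.prev_dW_ih
        self.prev_dW_ho, self.prev_dW_ih = dW_ho, dW_ih

# Hypothetical usage: encode (state, one-hot action) as the input vector I for each
# candidate action, run forward(), and express the action with the largest output O (Eq. 4.16).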
4.2.4.3 Meme Internal Evolution
A general illustration of the meme internal evolution and meme external evolution processes in the eTL framework is depicted in Fig. 4.19. More specifically, meme internal evolution, as summarized in Algorithm 2, governs the growth of an individual's mind universe via self learning. The internal evolutionary process is made up of a sequence of learning trials, and the trials continue until the terminal conditions, such as mission numbers and fitness levels, are satisfied (see Line 3). During meme internal evolution, an agent first predicts the Q-values of conducting each possible action through a meme expression process, given the current state s and a possible action set A (see Lines 5–6). The received Q-values are then used to select an appropriate action based on an action selection strategy (see Line 7). A TD formula further computes a new estimate of the Q-value for performing the chosen action in the current state upon receiving the environmental feedback after conducting the action. The new Q-value is then used to learn the association from the chosen action and the current state to the new Q-value (see Lines 9–19).
Algorithm 2: Meme Internal Evolution Process
Input: Current state s, a set of possible actions A = (a1, a2, ..., am)
1  Initialization: Generate the initial agents
2  Begin:
3  while stop conditions are not satisfied do
4    for each current agent agt(c) do
5      for each action am ∈ A do
6        Q(s, am) = Predict(s, am)
7      Select an action aM from A: aM = Scheme(Q(s, a)).
8      Perform selected aM: {s', r} = Perform(aM), where s' is the resultant state and r is the immediate reward from the environment.
9      Estimate the reward Q(new)(s, aM) by: Q(new)(s, aM) = Q(s, aM) + ϑδ(1 − Q(s, aM))
10     Do:
11     if agent ∈ FALCON agents then
12       Meme activation with vector {S, A, R}
13       Meme competition with vector {S, A, R}
14       Meme matching with vector {S, A, R}
15       Memotype learning with vector {S, A, R}
16     else
17       Memotype forward propagation with vector {S, A, R}
18       Memotype backward error estimation with vector {S, A, R}
19       Memotype learning with vector {S, A, R}
20 End
In eTL, an ε-greedy action selection scheme is employed with the objective of balancing exploration and exploitation in the meme internal evolution process (see Line 7). Compared to other more complex methods, ε-greedy does not require memorization of specific data during exploration and is reported to be usually the first-choice method, as stated by Sutton [75]. This strategy selects the action that has the highest Q(s, a) value with probability 1 − ε (0 ≤ ε ≤ 1), and otherwise takes a random action. Additionally, in the case of multiple-step prediction problems, the action selection should not depend only on the current states, since the agent may only know the merit of an action several steps into the future. Therefore, a Temporal Difference (TD) method, such as Q-learning, is employed to evaluate the value function of state-action pairs Q(s, a), which denotes the goodness of the learning system performing an action a given state s (see Line 9). Specifically, the iterative updating rule of the Q-values is defined by:
$$Q^{(\mathrm{new})}(s, a) \leftarrow Q(s, a) + \vartheta\,\delta\,(1 - Q(s, a)), \quad (4.20)$$
where ϑ ∈ [0, 1] denotes the learning parameter and δ is the temporal-difference error defined as:
$$\delta = r + \gamma \max_{a'} Q(s', a') - Q(s, a), \quad (4.21)$$
where r ∈ [0, 1] is the immediate reward value, γ ∈ [0, 1] is the discount parameter, a' is the predicted action under the next state s', and max_{a'} Q(s', a') is the maximum estimated Q-value of the next state s'. The scale term (1 − Q(s, a)) guarantees that the Q-values will always be bounded within [0, 1], which is consistent with the input values of the system. Further, the new Q-value computed by the TD formula is encoded into the reward for (1) FALCON agents: R = (Q(new)(s, a), 1 − Q(new)(s, a)), and (2) BP agents: R = (Tk) = (Q(new)(s, a)).
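A minimal sketch of this bounded TD update (Eqs. 4.20 and 4.21) follows. The dictionary-based Q-table and the parameter values are illustrative assumptions; in eTL the resulting new Q-value would then be encoded into the reward field of the FALCON or BP learner as described above.

def bounded_td_update(Q, s, a, r, s_next, actions, theta=0.5, gamma=0.9):
    # Temporal-difference error (Eq. 4.21).
    delta = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions) - Q.get((s, a), 0.0)
    # Bounded update (Eq. 4.20); the (1 - Q) scaling is what the text relies on
    # to keep the Q-values within [0, 1] when rewards also lie in [0, 1].
    q_old = Q.get((s, a), 0.0)
    Q[(s, a)] = q_old + theta * delta * (1.0 - q_old)
    return Q[(s, a)]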
4.2.4.4 Meme External Evolution
Meme external evolution, which serves to model the social interaction among the learning agents, is governed by four evolutionary factors, namely meme selection, meme expression, meme imitation and meme variation.
Meme Selection
Meme selection takes inspiration from Darwin's notion of natural selection, with the goal of identifying the teacher agent with the highest predicted yield. Through this mechanism, agents' internal memes (or knowledge) that are more beneficial for problem solving are duplicated exponentially in the knowledge universe pool, while memes that are less helpful or detrimental rarely get replicated. In the past few years, most researchers have concentrated on exploring "how an agent can learn from other agents". Among the existing schemes, uniform random selection is the simplest and most commonly used scheme for choosing a representative agent from the crowd, where each individual has an equal chance of being chosen. Nevertheless, more efficient selection approaches are possible if useful information about the individuals is available. A well-known selection strategy is "imitate-from-elitist", in which agents learn through a reward mechanism. Based on this elitism selection mechanism, Oliveira et al. [72] presented a study on an interactive advice-exchange mechanism (AE) where agents share episodes of given states by seeking advice at critical times, or by requesting the demonstration of a solution to a specific problem from agents whose performance is currently better than that of all the other agents. However, AE is likely to suffer from the blind reliance problem, since advisees have no knowledge of the basis of this reliance and always seek advice from the teacher agent with the best performance. It has been reported that the best teacher agent may not always be the agent with the best performance. This might happen for several reasons, but the most common of them is that each agent is unique both genetically and memetically. Hence, having every agent respond to the given states with a common action may not suffice. In such cases, an agent also needs to learn which of its partners it can trust and which has the most similar experience in dealing with
the given circumstance. To tackle this blind reliance problem, the notion of "like-attracts-like" [76] has been introduced, wherein agents learn from others not only based on the updated performance, but also on the similarities of the best matched knowledge under the given circumstance. In the design of this work, the decision of which peer an agent should learn from depends both on the agents' superior historical success, namely the elitist criterion, and on the agents' confidence in solving the request for a particular given situation, namely the similarity criterion. The meme selection process in eTL is thus a fusion of the "imitate-from-elitist" and "like-attracts-like" principles. On one hand, the elitist is maintained as a centralized resource to express the judgement of agents' behaviors in accordance with past knowledge. It is the simplest and most direct way to provide recommendations, since agents do not need to compute a reputation and are able to identify the elitist agent directly. However, when the agents communicate with others, the interaction process might be very labour-intensive, and the required information could be inefficient and not always desirable. The reason is that, although the elitist has a higher performance than the other partners, it might not be the most suitable candidate for solving the given unseen circumstance. On the other hand, similarity does not solely focus on the knowledge of the past, but on the prediction of future reward under similar situations. This scheme does not require higher performance from the partners, but rather focuses on the sharing of relevant experiences in dealing with similar previously encountered situations. As a consequence, the similarity scheme narrows down the relevant elitist to learn from. Nevertheless, in the event that agents possess differing prior knowledge (in the form of memotypes) for evaluating a common experience, the similarity-based selection scheme would be futile, since it is difficult for agents to judge whether the knowledge from the teacher is beneficial or detrimental to themselves.
Algorithm 3: Meme Selection Process
Input: Current state s, a set of possible actions A = (a1, a2, ..., am)
1  Begin:
2  Get current agent agt(c) ∈ P
3  for each agent agt(v) where v ≠ c do
4    Get the current state of agent agt(c) as s
5    Get the Si value of agt(v) under the given s: Si(agt(v)) = max{Q(s, A)}/Qbest
6    if (El(agt(v)) ≥ El(agt(c)) & Si(agt(v)) ≥ Si(agt(c))) then
7      Put agt(v) into set B
8  for each agent agt(v) in B do
9    Sc(agt(v)) = El(agt(v)) × Si(agt(v))
10 Select the teacher (or source) agent with the highest Sc(agt(v))
11 End
In meme selection, the trustworthiness of performance is estimated in accordance with the positive and (or) negative outcomes produced by the agents in the past. The detailed outline of the meme selection process is given in Algorithm 3. Particularly, it defines El(agt(v)) as the relative fitness of agent agt(v) and Si(agt(v)) as the relative maximum predicted reward for performing an action under the given state s:
$$El(agt(v)) = \mathrm{Fitness}(agt(v)) / \mathrm{Fitness}_{best}, \qquad Si(agt(v)) = \max\{Q(s, A)\} / Q_{best}, \quad (4.22)$$
where Fitness(agt(v)) denotes the fitness value of an agent agt(v), v in agt(v) is the index of the vth agent, Fitnessbest is the reputation of the elitist or best-performing agent, and Fitness = Fitness + 1 if the agent completes a mission successfully; Q(s, A) indicates the maximum predicted Q-value of agent agt(v) for the given state s, and Qbest is the maximum historical Q value of agent agt(v). In the learning process, agents that have been found to have both higher elitist and similarity values than the current agent agt(c) (also known as the target agent) are identified to form the set B (see Lines 6–7):
$$B = \{agt(v) \in P \mid El(agt(v)) \geq El(agt(c)) \;\&\; Si(agt(v)) \geq Si(agt(c))\}, \quad (4.23)$$
where P denotes the entire population in the environment. Subsequently, the agent with the highest Sc value serves as the teacher agent agt(s) (or source agent) in the selection set for meme transmission (see Lines 8–10):
$$Sc(agt(v)) = El(agt(v)) \times Si(agt(v)). \quad (4.24)$$
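The selection rule of Algorithm 3 and Eqs. 4.22-4.24 can be sketched as follows; the agent objects and their fitness/prediction attributes are hypothetical stand-ins introduced only for illustration.

def meme_selection(current, others, state):
    # El: relative fitness against the best-performing agent (Eq. 4.22).
    best_fitness = max([current.fitness] + [a.fitness for a in others]) or 1.0
    def el(agent):
        return agent.fitness / best_fitness
    # Si: relative maximum predicted Q-value for the given state (Eq. 4.22).
    def si(agent):
        return max(agent.predict_q(state, a) for a in agent.actions) / max(agent.q_best, 1e-9)
    # Candidate set B: agents at least as good as the current agent on both criteria (Eq. 4.23).
    B = [a for a in others if el(a) >= el(current) and si(a) >= si(current)]
    if not B:
        return None                      # no teacher found: fall back to pure self learning
    # Teacher: the candidate with the highest combined score Sc = El x Si (Eq. 4.24).
    return max(B, key=lambda a: el(a) * si(a))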
Meme Transmission
Once a selection is made, meme transmission between agents with unique learning capabilities happens by means of imitation. In the present problems, each agent learns in the same environment and has the same action abilities. Therefore, imitation offers the advantage that the imitating agent can behave at approximately the same performance level as the agent it imitates. In addition, if a human player is imitated, the autonomous agents could seamlessly exhibit more believable and complex human-like behaviors. The meme transmission process in eTL is outlined in Algorithm 4. First, the current agent agt(c) passes its state to the chosen teacher agent agt(s) (see Lines 3–4). Then, agent agt(s) expresses its corresponding action a(agt(s)) by meme expression, which can be observed by agent agt(c) (see Line 5). Agent agt(c) then assimilates the received action into its mind universe by means of meme assimilation (see Lines 7–8). In addition, the detailed variation process which may occur in Algorithm 4 (see Line 6) is described under meme variation as follows.
Algorithm 4: Meme Transmission Process
Input: Current state s, a set of possible actions A = (a1, a2, ..., am)
1  Begin:
2  Get current agent agt(c) ∈ P
3  Get current state s(agt(c)) of agt(c)
4  Pass s(agt(c)) to the teacher agent agt(s)
5  Get action a(agt(s)) of agt(s) through Steps 5-7 in Algorithm 1
6  Perform variation on a(agt(s)) with probability ν
7  Perform a(agt(s)): {s', r} = Perform(a(agt(s))), where s' is the new state and r is the immediate reward.
8  The remaining steps are the same as those in Algorithm 1
9  End
Meme Variation
Meme variation serves to give an inherent innovation tendency to the sociotypes selected for meme transmission in the evolutionary knowledge transfer process. More specifically, in knowledge transmission without variation, agents might believe that the knowledge represented by the sociotypes of an elite agent is always helpful with respect to its particular performance in the given circumstance. Due to the nonlinearity of the knowledge transfer process, this bias could spread and spiral out of control, since the bias infects any other agents that come into contact with it [73]. This would suppress the agents' ability to explore the environment. Therefore, meme variation is of great importance for retaining diversity in the agents' attitude towards the innovative knowledge transfer process. In the proposed eTL framework, meme variation occurs at the expression and imitation stages, in which a probabilistic interference cost is put on the estimated action's Q-value to allow different actions to be selected for expression:
$$Q_t = \lambda \times Rand + (1 - \lambda) \times Q, \quad (4.25)$$
where Qt is the mutated Q, λ ∈ [0, 1] is the parameter controlling the degree of randomness, and Rand ∈ [0, 1] is a random value drawn from a uniform distribution. The pre-defined ν ∈ [0, 1] is the probability controlling the frequency of the meme variation process.
4.2.5 eTL with FALCON and BP
The proposed eTL is illustrated in Fig. 4.21, where a population of agents learn and evolve in a single MAS environment. Particularly, the aforementioned FALCON and BP agents denote the two distinct connectionist learners that coexist in the environment simultaneously.
Fig. 4.21 The fully general multi-agent scenario with the proposed eTL paradigm. Agents employ FALCON or BP as the learning machine in their mind universes and learn the domain knowledge for performing appropriate actions given environmental states. They may also interact directly, as denoted by the arrows connecting the agents
In the design of eTL, each individual can interact with the environment or with one another by undergoing the meme internal evolution process and/or the meme external evolution process, where the former is the learning process for updating an individual's internal knowledge via personal grooming. The latter process, which is central to the behavioral aspects of imitation, models the interactions among multiple agents. The basic steps of eTL are outlined in Algorithm 5. To be specific, a population of N agents is first generated with BP or FALCON architectures. Each of the current agents then performs meme internal evolution. Meanwhile, meme external evolution proceeds if an agent agt(c) identifies a teacher agent agt(s) via meme selection. Then, agent agt(c) undergoes the action predicted using meme internal evolution or meme external evolution according to the probability Cp:
$$C_p(agt(c)) = 1 - \frac{\mathrm{Fitness}(agt(c))}{\mathrm{Fitness}(agt(s))} \times \frac{Si(agt(c))}{Si(agt(s))}, \quad (4.26)$$
where Fitness(agt (c)) is the fitness of agent agt (c). Fitness(agt (s)), on the other hand, denotes the fitness of the selected teacher agent, Si(agt (c)) is the similarity value of current agent agt (c) for the given states, Si(agt (s)) is the similarity value of
Algorithm 5: Basic eTL Framework
1  Begin:
2     Initialization: Initialize N agents
3     while the stop criteria are not satisfied do
4        for each current agent agt(c) do
5           /* Perform meme internal evolution */
6           See Algorithm 1, Lines 5-7
7           if the teacher agent is identified via meme selection then
8              /* Perform meme external evolution */
9              Perform meme expression with agt(s) under the state of agt(c)
10             Perform meme variation with probability ν
11             /* ν is the frequency probability of the variation process */
12             Perform meme transmission for agt(c) to assimilate agt(s)'s action
13             /* End meme external evolution */
14             Evaluate the probability C_p of each agent
15             /* Perform the action from meme external evolution with probability C_p, else perform the action from meme internal evolution */
16          The remaining steps are the same as those in Algorithm 1.
17          /* End meme internal evolution */
18 End
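To make the probabilistic switch of Eq. (4.26) concrete, the following sketch computes C_p and selects between the actions produced by meme internal and external evolution. The clamping of negative values to zero is an assumption introduced here (the chapter does not state how a fitter current agent is handled), and the function interfaces are illustrative only.

import random

def imitation_probability(fitness_c, fitness_s, sim_c, sim_s):
    """Eq. (4.26): C_p = 1 - (Fitness(agt(c))/Fitness(agt(s))) * (Si(agt(c))/Si(agt(s)))."""
    c_p = 1.0 - (fitness_c / fitness_s) * (sim_c / sim_s)
    return max(0.0, c_p)  # clamped at 0 (an assumption: a fitter current agent does not imitate)

def choose_action(internal_action, external_action, fitness_c, fitness_s, sim_c, sim_s):
    """Follow the teacher's (external) action with probability C_p, else act on own knowledge."""
    c_p = imitation_probability(fitness_c, fitness_s, sim_c, sim_s)
    return external_action if random.random() < c_p else internal_action

if __name__ == "__main__":
    # A weak agent imitating a much fitter teacher accepts the transferred action most of the time.
    print(choose_action("FRONT", "LEFT", fitness_c=0.2, fitness_s=0.9, sim_c=0.5, sim_s=0.8))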
4.2.6 Empirical Study

In order to study the effectiveness and efficiency of the proposed evolutionary transfer learning approach, the experiments first validate eTL on the commonly used Minefield Navigation Task (MNT) [77]. Subsequently, a well-known commercial first-person shooter game, namely "Unreal Tournament 2004" (UT2004) [78], is used to investigate the performance of eTL under complex game scenarios.
4.2.6.1 Experimental Platform—Minefield Navigation Task

In MNT, the goal of each unmanned tank (or agent) is to successfully navigate across a minefield comprising randomly positioned mines, other tanks and a target within the
stipulated time frame. Figure 4.22 depicts an overall view of the MNT environment. The unmanned tanks, which spawn randomly within the field at the beginning of each mission, are equipped with sensors and are thus able to access a set of detections, including mine detection, agent detection and target bearing. Further, each tank in MNT possesses sensors with a forward view of 180°, covering the left, left oblique, front, right oblique and right directions. According to the input state, each unmanned tank performs one of five possible actions at each step: A = {LEFT, LEFTFRONT, FRONT, RIGHTFRONT, RIGHT}. In particular, an unmanned tank is rewarded with a positive value of 1 when it arrives at the target. Conversely, if the tank hits a mine, collides with other tanks or does not reach the target within the given time frame, it is assigned zero reward. According to the Q-learning lemma, no negative feedback signal is used in this reward schema, which guarantees that the Q-value always stays within the desired bounds of [0, 1]. In the experimental study, a total of 10 mines and 1 target (known as the red flag) are randomly generated over missions in a 16 × 16 minefield. A mission completes when all tanks reach the target (success), hit a mine or collide with another tank (failure), or exceed 30 time steps (out of time), as considered in [73, 74].
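As a minimal sketch of the reward schema and mission termination rule described above (assuming the outcome of each step is already known; the sensory encoding is omitted), one possible rendering in Python is:

# Illustrative sketch of the MNT reward schema; the outcome labels are assumptions
# introduced here for readability, not identifiers from the actual platform.
ACTIONS = ["LEFT", "LEFTFRONT", "FRONT", "RIGHTFRONT", "RIGHT"]

def mnt_reward(outcome):
    """Reward of 1 for reaching the target; 0 for hitting a mine, colliding, or running out of time."""
    return 1.0 if outcome == "target_reached" else 0.0

def mission_done(outcome, step, max_steps=30):
    """A mission ends on success, on failure, or once the stipulated 30 time steps are exceeded."""
    return outcome in ("target_reached", "mine_hit", "collision") or step >= max_steps

if __name__ == "__main__":
    print(mnt_reward("target_reached"), mnt_reward("mine_hit"), mission_done("none", 30))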
Fig. 4.22 The illustration of MNT
4.2.6.2 Experimental Configuration
The parameter settings of FALCON, BP, the TD methods and the eTL configured in the present experimental study are summarized in Table 4.5. Notably, the configurations of the experiments in MNT are kept consistent with the previous study in [74] for the purpose of fair comparison. The results with respect to the following metrics are then reported:

SR   denotes the average success rate of the agents on completing the missions.
GM   denotes the average number of memotypes in agents. In particular, as discussed in Sect. 4.2.4, the memotypes of FALCON and BP are defined as the number of memes generated in the cognitive field F2 and the number of memes pre-configured in the hidden layer, respectively. In general, for the same level of success rate, agents with a lower number of memotypes are preferred, since this implies better generalization of knowledge.
KN   denotes the average number of knowledge transfer events happening between agents. In general, a higher number of knowledge transfers implies higher computational costs incurred to perform transfer learning among agents.
Table 4.5 Summary of the parameter configuration in the proposed eTL

FALCON parameters
  Choice parameters (α^c1, α^c2, α^c3): (0.1, 0.1, 0.1)
  Learning rates (β^c1, β^c2, β^c3): (1.0, 1.0, 1.0)
  Contribution parameters (γ^c1, γ^c2, γ^c3): (0.5, 0.5, 0)
  Baseline vigilance parameters (ρ^c1, ρ^c2, ρ^c3): (0.2, 0.2, 0.5)
BP parameters
  Learning rate η: 0.25
  Momentum factor τ: 0.5
  Gain of sigmoid function (Gain): 1.0
Temporal difference learning parameters
  TD learning rate ϑ: 0.5
  Discount factor γ: 0.1
  Initial Q-value: 0.5
ε-greedy action policy            FALCON     BP
  Initial ε value                 0.5        0.5
  ε decay rate                    0.0005     0.00025
Transfer variation parameters
  Degree of randomness λ: 0.1
  Frequency of variation ν: 0.1
4.2.6.3 Performance of eTL Using FALCON or BP as Learning Agents
In this subsection, the objective of the present experimental study is to investigate the performance of the proposed eTL with FALCON or BP learning agents on completing the Minefield Navigation Tasks. Specifically, the MAS is composed of six FALCON or BP agents in the 16 × 16 map of the Minefield Navigation Task platform, where the learning architecture of the MAS is discussed in Sect. 4.2.4. Further, the interaction and implementation of the learning agents are presented in Sect. 4.2.3.4. In this study, 60 memotypes are configured in BP to achieve the best success rate while maintaining a low space complexity [74]. To maintain consistency with previous studies [73, 74, 76], the designed experiments conduct 30 sets of independent experiments and report the average performances of (1) FALCON agents at 100-mission intervals over a total of 2000 missions and (2) BP agents at 1000-mission intervals over a total of 20000 missions, respectively. Moreover, two state-of-the-art TL approaches for MAS, namely the Advice Exchange (AE) model [72] and Parallel Transfer Learning (PTL) [68], are considered here for the purpose of comparison. In AE, the agents, or advisees, learn from better-performing advisors based on the "imitate-from-elitist" selection strategy. More specifically, an agent agt(c) with poor performance will seek advice from the best-performing agent agt(b) if q_c < d · q_b, where q_c denotes the average quality of agent agt(c), q_b is the average quality of agent agt(b) and d is a user-defined discount parameter in the interval [0, 1]. For fair comparison, this experiment investigates various values of the discount factor, particularly (i) a small value 0.1 (AE-0.1), (ii) a medium value 0.5 (AE-0.5) and (iii) a large value 0.9 (AE-0.9), respectively, and reports the average success rates of the agents (AE-AVG). Figure 4.23 depicts the average success rates of agents under the differing AE models. Notably, AE models with different discount factors reported distinct success rates. Further, PTL is a TL method where the current agent leverages the knowledge broadcast by the others in the environment. In particular, if an agent has useful information to share, it broadcasts the information to all other agents. Meanwhile, the current agent also checks its communication buffer to determine whether any information has been received and then decides whether to accept the information or discard it. The complete results pertaining to the success rate (SR), generated memotypes (GM) and interval-unit knowledge transfer numbers (KN) of both BP and FALCON agents under Conv. M, the Advice Exchange model (AE), Parallel Transfer Learning (PTL) and the proposed eTL are summarized in Table 4.6 and Figs. 4.24, 4.25. The detailed analyses of the obtained results are discussed comprehensively in what follows.
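For illustration, the AE selection rule quoted above reduces to a one-line comparison; the sketch below applies it to a dictionary of average agent qualities, which is a hypothetical simplification of the advisee-advisor bookkeeping.

def ae_should_seek_advice(q_current, q_best, discount):
    """Advice Exchange rule: the advisee seeks advice from the best agent if q_c < d * q_b."""
    return q_current < discount * q_best

def ae_select_advisor(qualities, current, discount=0.5):
    """Return the best-performing agent if the AE rule triggers, otherwise None."""
    best = max(qualities, key=qualities.get)
    if best != current and ae_should_seek_advice(qualities[current], qualities[best], discount):
        return best
    return None

if __name__ == "__main__":
    q = {"agent_1": 0.35, "agent_2": 0.80, "agent_3": 0.55}
    print(ae_select_advisor(q, "agent_1", discount=0.5))  # agent_1 imitates agent_2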
Fig. 4.23 SRs of agents under AE models with discount factor 0.1, 0.5 and 0.9 on completing the missions in MNTs: (a) FALCON agents; (b) BP agents

Table 4.6 Performance comparison among eTL, existing TL approaches and Conv. M (SR: success rate, GM: generated memotypes, KN: knowledge transfer numbers)

#   Scenario    FALCON agents            BP agents
                SR     KN     GM         SR     KN      GM
1   eTL         71.3   324    188        64.2   1715    60
2   PTL         59.8   131    191        56.4   178     60
3   AE-0.1      63.3   24     152        54.6   59      60
4   AE-0.5      65.4   58     158        60.7   396     60
5   AE-0.9      69.0   428    168        61.4   4959    60
6   AE-AVG      65.8   170    159        58.8   1797    60
7   Conv. M     61.3   0      144        44.5   0       60
Comparison of MASs with and without TL

The success rates obtained by the MAS with and without TL approaches are given in the SR columns of Table 4.6. Their corresponding learning trends are also depicted in Fig. 4.24. It is notable that most TL approaches outperform Conv. M. This is attributed to the TL approaches endowing agents with the capacity to benefit from the knowledge transferred from the better-performing agents, thus accelerating the learning rate of the agents in solving the complex task more efficiently and effectively. Agents in Conv. M, on the other hand, could only undergo learning via personal grooming. To illustrate the efficacy of agents in the eTL framework in solving the minefield navigation problem, Fig. 4.25 depicts sample snapshots of the routes navigated by the FALCON agents. As can be observed, at the early learning stages, such as after 1 and 500 learning missions, the FALCON agents have a high probability of hitting mines or colliding with other agents. As learning progresses, these agents tend to exhibit higher success rates.
Fig. 4.24 SRs of FALCON or BP agents under eTL, PTL, AE-AVG and Conv. M on completing the missions in MNT
Comparison of eTL Against Other State-of-the-Art TL Approaches

When compared to the state-of-the-art TL approaches, the proposed eTL is shown to achieve superior performance in terms of success rate throughout the learning process. In particular, FALCON and BP agents with the proposed eTL reported approximately 11.5% and 7.8% improvements in success rate, respectively, over PTL at the end of the missions (see Table 4.6, column SR). This is because, when deciding whether to accept the information broadcast by the others, agents in PTL tend to make incorrect predictions on previously unseen circumstances. Further, the proposed eTL also demonstrated superiority in attaining higher success rates than all AE models. As discussed in Sect. 4.2.4.4, this can be attributed to the meme selection operator of the proposed eTL, which considers a fusion of the "imitate-from-elitist" and "like-attracts-like" principles so as to give agents the option of choosing more reliable teacher agents than under the AE model. Moreover, it is worth noting that the AE model with a discount factor of 0.9 (labeled as AE-0.9) attained a success rate that is the closest to that of the proposed eTL. However, AE-0.9 incurred higher knowledge transfer numbers (i.e., computational efforts) during the learning process (see Table 4.6, column KN). Moreover, in order to quantitatively evaluate how effectively the memotypes are generated in FALCON agents under different TL approaches, the effectiveness ratio is defined as follows:

Ratio(TL) = GM / (SR(TL) − SR(Conv. M)),     (4.27)
where (SR(TL) − SR(Conv. M)) denotes the improvement in success rate of the learning agents achieved by the TL approach over Conv. M. In general, a smaller positive Ratio value is preferable, as it implies higher knowledge generalization performance.
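Computing the effectiveness ratio of Eq. (4.27) from the tabulated results is straightforward; the sketch below reproduces the FALCON-agent ratios using the SR and GM values of Table 4.6 (PTL is omitted because its success-rate improvement is negative, so the ratio is not meaningful there).

def effectiveness_ratio(gm, sr_tl, sr_conv):
    """Eq. (4.27): Ratio(TL) = GM / (SR(TL) - SR(Conv. M)); smaller positive values are better."""
    return gm / (sr_tl - sr_conv)

if __name__ == "__main__":
    sr_conv = 61.3  # success rate of FALCON agents under Conv. M (Table 4.6)
    for name, gm, sr in [("eTL", 188, 71.3), ("AE-AVG", 159, 65.8)]:
        print(name, round(effectiveness_ratio(gm, sr, sr_conv), 1))  # 18.8 and 35.3, as in Table 4.7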
Fig. 4.25 Snapshots of the FALCON agents’ navigation routes on completing 1, 500, 1000 and 2000 learning missions in Minefield Navigation Task. The objective of each FALCON agent (denoted as a tank in the figure) in a learning mission is to arrive at the target successfully by navigating across the minefield safely and within the allocated time span. Thus a learning mission completes when all tanks reach the target, hit a mine or collide with another tank, or exceed the given time steps. All tanks and mines are randomly generated for each learning mission hence different learning missions have unique navigational routes
Table 4.7 summarizes the effectiveness ratio of the memotypes generated in FALCON agents under different TL approaches at the end of the learning missions. Notably, the proposed eTL reported the most effective memotypes in FALCON agents since it attained the smallest effectiveness ratio among all the three TL approaches.
4.2.6.4 Performance of Proposed eTL Using Heterogeneous Agents
In the previous subsection, a study on MAS with a homogenous setting has been considered, i.e., all agents in the system are assumed to be uniform and bear the same
Table 4.7 The effectiveness of generated memotypes in FALCON agents under the eTL and other existing TL approaches

#   Metrics                       FALCON agents
                                  eTL     PTL        AE-AVG
1   Generated memotypes           188     191        159
2   Success rate improvement      10.0    −1.5(1)    4.5
3   Memotype effectiveness        18.8    191        35.3
learning machine. This subsection further considers the complex, realistic scenario involving a diversity of heterogeneous agents in a MAS. In particular, a mixture of three FALCON agents and three BP agents is employed in the heterogeneous MAS for solving the Minefield Navigation Tasks. Notably, the architecture of the MAS and the interaction among the heterogeneous learning agents are discussed in Sect. 4.2.3.4. In the heterogeneous MAS, knowledge transfer among agents is more complex than that in the homogenous MAS. This is because FALCON and BP agents possess unique learning structures, hence they have different learning capabilities, including specialized action prediction rules, learning speed, etc. As such, the knowledge identified by TL approaches that work well in the homogenous MAS might become futile and even detrimental to agents with dissimilar learning structures. For instance, in the heterogeneous MAS where both FALCON and BP agents are configured using PTL as the TL approach, BP agents will broadcast their useful knowledge to all other agents. Considering that the FALCON agents learn much faster than the BP agents (as discussed in Sect. 4.2.6.3), the knowledge broadcast by the BP agents tends to lag behind that of the FALCON agents, and hence deteriorates the performance of the FALCON agents, as depicted in Fig. 4.26a. Thus, in this subsection, experiments are further conducted to investigate the performance of the proposed eTL in the heterogeneous MAS setting. The success rate (SR), knowledge transfer numbers (KN) and generated memotypes (GM) of both FALCON and BP agents obtained under Conv. M and the TL approaches are presented in Table 4.8. Moreover, the learning trends in terms of success rate are depicted in Fig. 4.26. Notably, FALCON agents under the AE and PTL learning approaches often succumbed to poor success rates in the long run. Averaged over the AE models with discount factors of 0.1, 0.5 and 0.9 (see Fig. 4.27), the FALCON agents attain success rates merely competitive with that of Conv. M. On the other hand, under the PTL scheme, a significant drop of 5.5% in success rate as compared to Conv. M is observed at the end of the learning process (see Table 4.8, column SR). As aforementioned, this is a result of the blind reliance on the knowledge transferred from the BP agents, which misled the FALCON agents into performing erroneous acts under the heterogeneous MAS environment.
Fig. 4.26 SRs of FALCON and BP agents under eTL, PTL, AE-AVG and Conv. M on completing the missions in the heterogeneous MAS: (a) FALCON agents; (b) BP agents

Table 4.8 Comparison among eTL, existing TL approaches and Conv. M in heterogeneous MAS (SR: success rate, GM: generated memotypes, KN: knowledge transfer numbers)

#   Scenario    FALCON agents            BP agents
                SR     KN     GM         SR     KN      GM
1   eTL         69.7   310    168        68.2   1189    60
2   PTL         59.7   93     167        63.1   267     60
3   AE-0.1      65.5   27     144        60.1   33      60
4   AE-0.5      65.5   62     154        66.7   266     60
5   AE-0.9      67.3   423    158        68.2   4700    60
6   AE-AVG      66.1   171    152        65.0   1667    60
7   Conv. M     65.2   0      145        49.1   0       60

Fig. 4.27 SRs of FALCON and BP agents under AE models with discount factor 0.1, 0.5 and 0.9 on completing the missions in the heterogeneous MAS: (a) FALCON agents; (b) BP agents
On the other hand, the proposed eTL significantly outperforms its counterparts, for both FALCON and BP agents, throughout the entire learning process. Specifically, at the end of the learning process, it attained the highest success rates of 69.7% and 68.2% for FALCON and BP agents, respectively (see Table 4.8, column SR). In particular, according to the results in Fig. 4.26, FALCON agents are noted to learn much faster than BP agents. Therefore, the knowledge transferred from FALCON agents helps to complement the learning ability of BP agents, thus enhancing the SRs of BP agents significantly on the MNT problem. The result further highlights the efficacy of the meme selection strategies in eTL in choosing reliable and efficient information while reducing blind reliance in transfer learning, specifically in the context of the heterogeneous MAS. Moreover, the experiments also examine the KNs and GMs of FALCON and BP agents in both homogenous and heterogeneous MNTs. As can be observed in Tables 4.6 and 4.8, PTL reports far fewer KNs than eTL and AE-AVG. However, although PTL incurs far fewer KNs, it generates nearly the largest number of memotypes in FALCON agents among all TL approaches. This indicates that the transferred knowledge in PTL tends to generate many redundant and possibly detrimental memotypes, hence leading to the poor success rates of the FALCON agents. On the other hand, AE-AVG reports the smallest number of memotypes among all TL approaches for FALCON agents. Particularly, according to Table 4.8, the AE approach with a discount factor of 0.9 (labeled as AE-0.9) generates around 158 memotypes and incurs a higher knowledge transfer number (423). eTL, on the contrary, incurs fewer KNs, while still obtaining both more memotypes and higher success rates in FALCON agents. The obtained result thus highlights the efficiency of eTL in making use of the transferred knowledge while generating useful memotypes in FALCON agents. In summary, the performance of the proposed eTL in the heterogeneous MAS is consistent with that in the homogenous setting. This demonstrates the superiority of the proposed eTL in producing higher success rates than existing TL approaches in both homogenous and heterogeneous MASs.
4.2.6.5 Performance of Proposed eTL in Unreal Tournament 2004
Further, the performance of the proposed eTL is verified in the popular but complex commercial first-person shooter video game "Unreal Tournament 2004" (UT2004), which provides an environment for embodying virtual artificially intelligent agents. UT2004 has many built-in game scenarios, among which the most commonly played ones are known as the DeathMatch, Domination, and Capture The Flag (CTF) scenarios. Over the past decades, first-person shooter computer games have attracted considerable attention and account for a large share of the multi-billion-dollar computer game industry. Notably, the modeling of autonomous non-player characters is of great importance for commercial success, since it makes the game more playable and challenging and, more importantly, has been deemed useful in improving the players' degree of satisfaction.
Fig. 4.28 Pogamut architecture
Therefore, this section further considers a practical game problem where eTL is applied to model the non-player characters in UT2004. To customize different game scenarios, UT2004 provides a scripting language, namely UnrealScript, which can be coded by users to easily create maps, game models, agents' algorithms and agents' actions, etc. Further, to apply the FALCON algorithm for agents in Unreal Tournament 2004, Pogamut [79], an integrated open-source platform that facilitates the development of behaviors for virtual agents embodied in the 3D environment of Unreal Tournament 2004, is used in the present context. The architecture of Pogamut is described in Fig. 4.28. Pogamut is developed as a plugin for the NetBeans Java development environment. It can communicate with UT2004 via a generic binding called GameBots 2004 (GB2004) [80]. GB2004 provides a network TCP/IP text protocol for obtaining information from the game environment, and allows users to connect to UT2004 and control in-game avatars using a client-server architecture. Moreover, as GB2004 only exports and imports text messages, a Java library, "Gavialib", is adopted for Pogamut to automatically convert text information into Java objects and to debug the virtual agent remotely through the JMX protocol [81] and the Pogamut NetBeans Plugin. Figure 4.29 depicts a snapshot of the running game taken from the judge view, which showcases the shootout features in UT2004, such as weapons, armour and medical kits, etc. In UT2004, all autonomous robots that spawn randomly within the environment are equipped with a set of sensors. Taking the information captured by the sensors as input states, they can learn and utilize the association of their current states, behavior selections and rewards. Table 4.9 illustrates sample knowledge of the FALCON combatants during the training process. For each state, the autonomous robot performs a set of pre-defined combat actions with some rule-based heuristics. The immediate reward thus helps to update the knowledge in its mind universe. In this set of experiments, five behavioral actions for TD-FALCON combatants are designed. In particular, the specific actions in the action vector A = (EXPLORE, ITEMS, ENGAGE, MEDKIT, ROTATE) are respectively described as follows:
Fig. 4.29 A snapshot of the "Unreal Tournament 2004"

Table 4.9 Sample knowledge of the autonomous robots
State   S = (1.0, 0.0, 0, 1, 1, 0, 0.4, 0.6, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1)
IF      Health is 1.0, and not being damaged, and opponent is in sight, and has 40% of ammo, and currently in collecting item state
THEN    Go into engaging fire state
WITH    Reward of 0.768
1. EXPLORE action, where a FALCON robot wanders around the map randomly.
2. ITEMS action, where a FALCON robot finds and picks up the nearest visible items, such as weapons, armors, ammo, etc.
3. ENGAGE action, where a FALCON robot pursues and eliminates the enemies.
4. MEDKIT action, where a FALCON robot finds the medical kit to recuperate.
5. ROTATE action, where a FALCON robot rotates around the spot.
The state vector of an autonomous robot comprises 18 variables encoded as boolean or discretized numbers within [0, 1]. Assigned in groups, these 18 variables form 9 pairs, wherein each pair has a base status and its corresponding complement (i.e., x and 1 − x as a pair, x ∈ [0, 1]). The base status variables include the health level of an autonomous robot, whether a robot is damaged, whether enemies are spotted, whether
a robot has adequate ammo, and another five boolean variables that indicate the current behavior of a combatant by archiving previous states and actions. The aim of an autonomous robot is to win the game by obtaining a higher reward. The reward vector only comprises two variables, namely the reward and its complement. Defining a neutral value of 0.5 as a successful hit and a high value of 1 as a successful kill, the reward system is kept simple for fast learning.

In the design of this experiment, two groups of six armed robots are first initialized in the region of interest. Each armed robot is equipped with a set of sonar sensors and weapons. One group of armed robots are labeled as Hunters, and each of them has a mind universe that is governed by human-defined rules [78]. The second group comprises autonomous armed robots with FALCON as the online learning machine. Under the simple "DeathMatch" scenario, the two groups of armed robots fight against each other by making use of the available resources, including weapons, medical kits, armors, etc. When an armed robot is killed by the enemies, a new one spawns randomly in the corresponding base. The battle repeats until each group reaches a minimum of 200 missions, wherein a mission completes only if an armed robot in the group is killed by the enemies. To evaluate the performance of the proposed eTL in UT2004 more quantitatively, the kill rate (KR) of game characters is defined as the ratio of enemy kill numbers to agent death numbers [78]:

KR = (1 / (Nt × Nc)) × Σ_{i=1}^{Nt} Σ_{m=1}^{Nc} N(k)_im / N(d)_im,
where Nt is the number of simulations, Nc denotes the number of participating combatants in each team, N(k) indicates the number of kills and N(d), on the contrary, is the number of deaths. KR is thus a good measure of the average fighting performance of the different teams. Typically, a higher KR is preferred, since it indicates that a game character kills more enemies while being killed less often by the others. Figure 4.30 summarizes the competitive performance of the rule-based Hunters and the FALCON robots under both Conv. M and the proposed eTL, in terms of KR. Notably, at the early learning stage, the rule-based Hunters have a significantly higher KR than the FALCON robots, since the rules are well designed for the Hunters, whereas the FALCON robots are left to randomly explore the environment at the beginning of the learning process. However, the KR of the Hunters continues to decrease as the FALCON robots learn during combat. After 200 missions, the KR of the FALCON team under Conv. M is approximately 10% lower than that of the Hunters. FALCON robots under the proposed eTL, on the other hand, report significantly higher KR than those under Conv. M throughout the learning process. In particular, FALCON robots with eTL attained a KR of 1.0 within 200 missions. The improved performance obtained by the proposed eTL demonstrates its efficacy in improving the combat performance of FALCON robots via evolutionary knowledge transfer in the first-person shooter computer game "Unreal Tournament 2004".
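The kill-rate metric can be computed directly from the per-simulation kill and death counts; in the sketch below the counts are supplied as nested lists indexed by simulation and combatant, which is an illustrative data layout rather than the format used by the game platform.

def kill_rate(kills, deaths):
    """KR = (1 / (Nt * Nc)) * sum over simulations i and combatants m of N(k)_im / N(d)_im."""
    nt, nc = len(kills), len(kills[0])
    total = sum(kills[i][m] / deaths[i][m] for i in range(nt) for m in range(nc))
    return total / (nt * nc)

if __name__ == "__main__":
    kills = [[8, 5, 6], [7, 9, 4]]   # N(k)_im for Nt = 2 simulations and Nc = 3 combatants
    deaths = [[4, 5, 3], [7, 3, 8]]  # N(d)_im (assumed non-zero)
    print(round(kill_rate(kills, deaths), 2))  # 1.58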
Fig. 4.30 KR of FALCON robots under the Conv. M and the proposed eTL fighting against the rule-based Hunters
4.2.7 Summary

This section proposes an eTL framework, which is governed by several meme-inspired evolutionary mechanisms, namely meme representation, meme expression, meme assimilation, meme internal evolution and meme external evolution. In particular, comprehensive designs of the meme-inspired operators and two realizations of the learning agents in eTL, which take the form of the (TD-)FALCON and the classical BP multi-layer neural network, respectively, are presented. The performance efficacy of the proposed eTL is investigated via comprehensive empirical studies and benchmarked against both Conv. M without knowledge transfer and two state-of-the-art MAS TL approaches (i.e., the AE model and PTL), on the widely used MNT platform, under homogenous as well as heterogeneous MAS settings. The superior performance achieved by eTL confirms its efficacy in conducting evolutionary knowledge transfer in MAS, and hence provides a basis for further research in developing evolutionary TL strategies. In addition, a well-known first-person shooter game, namely UT2004, is used to demonstrate the effectiveness of the proposed eTL under complex problem-solving scenarios.
References

1. Y.C. Jin, Knowledge Incorporation in Evolutionary Computation. Studies in Fuzziness and Soft Computing (Springer, 2010) 2. Y.S. Ong, N. Krasnogor, H. Ishibuchi, Special issue on memetic algorithm. IEEE Trans. Syst. Man Cybern.—Part B 37(1), 2–5 (2007) 3. Y.S. Ong, M.H. Lim, F. Neri, H. Ishibuchi, Special issue on emerging trends in soft computing: memetic algorithms. Soft Comput.-A Fusion Found. Methodol. Appl. 13(8–9), 1–2 (2009) 4. M.H. Lim, S. Gustafson, N. Krasnogor, Y.S. Ong, Editorial to the first issue, memetic computing. Soft Comput.-A Fusion Found. Methodol. Appl. 1(1), 1–2 (2009)
5. J.E. Smith, Co-evolving memetic algorithms: a review and progress report. IEEE Trans. Syst. Man Cybern.—Part B 37(1), 6–17 (2007) 6. I. Paenke, Y. Jin, J. Branke, Balancing population-and individual-level adaptation in changing environments. Adapt. Behav. 17(2), 153–174 (2009) 7. G. Gutin, D. Karapetyan, A selection of useful theoretical tools for the design and analysis of optimization heuristics. Memet. Comput. 1(1), 25–34 (2009) 8. J. Tang, M.H. Lim, Y.S. Ong, Diversity-adaptive parallel memetic algorithm for solving large scale combinatorial optimization problems. Soft Comput. J. 11(9), 873–888 (2007) 9. Z. Zhu, Y.S. Ong, M. Dash, Wrapper-filter feature selection algorithm using a memetic framework. IEEE Trans. Syst. Man Cybern.—Part B 37(1), 70–76 (2007) 10. S. Hasan, R. Sarker, D. Essam, D. Cornforth, Memetic algorithms for solving job-shop scheduling problems. Memet. Comput. 1(1), 69–83 (2009) 11. M. Tang, X. Yao, A memetic algorithm for VLSI floor planning. IEEE Trans. Syst. Man Cybern.—Part B 37(1), 62–69 (2007) 12. P. Cunningham, B. Smyth, Case-based reasoning in scheduling: reusing solution components. Int. J. Prod. Res. 35(4), 2947–2961 (1997) 13. S.J. Louis, J. McDonnell, Learning with case-injected genetic algorithms. IEEE Trans. Evol. Comput. 8(4), 316–328 (2004) 14. L. Feng, Y.S. Ong, I.W. Tsang, A.H. Tan, An evolutionary search paradigm that learns with past experiences, in IEEE World Congress on Computational Intelligence, Congress on Evolutionary Computation (2012) 15. R. Dawkins, The Selfish Gene (Oxford University Press, Oxford, 1976) 16. X.S. Chen, Y.S. Ong, Q.H. Nguyen, A conceptual modeling of meme complexes in stochastic search. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev. 42(5), 612–625 (2011) 17. Y. Mei, K. Tang, X. Yao, Improved memetic algorithm for capacitated arc routing problem. IEEE Congress on Evolutionary Computation, pp. 1699–1706 (2009) 18. E.W. Dijkstra, A note on two problems in connection with graphs. Numerische Mathematik 1, 269–271 (1959) 19. I. Borg, P.J.F. Groenen, Modern Multidimensional Scaling: Theory and Applications (Springer, 2005) 20. C. Wang, S. Mahadevan, Manifold alignment without correspondence, in Proceedings of the 21st International Joint Conference on Artificial Intelligence, pp. 1273–1278 (2009) 21. F. Neri, C. Cotta, P. Moscato, Handbook of Memetic Algorithms. Studies in Computational Intelligence (Springer, 2011) 22. F. Neri, E. Mininno, Memetic compact differential evolution for cartesian robot control. IEEE Comput. Intell. Mag. 5(2), 54–65 (2010) 23. C.K. Ting, C.C. Liao, A memetic algorithm for extending wireless sensor network lifetime. Inf. Sci. 180(24), 4818–4833 (2010) 24. Y.S. Ong, M.H. Lim, X.S. Chen, Research frontier:—past, present & future. IEEE Comput. Intell. Mag. 5(2), 24–36 (2010) 25. X. Chen, Y. Ong, M. Lim, K.C. Tan, A multi-facet survey on memetic computation. IEEE Trans. Evol. Comput. 15(5), 591–607 (2011) 26. S.J. Pan, Q. Yang, A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345– 1359 (2010) 27. A. Gretton, O. Bousquet, A. Smola, B. Scholkopf, ¨ Measuring statistical dependence with hilbert-schmidt norms. Proceedings of Algorithmic Learning Theory, pp. 63–77 (2005) 28. S.C.H. Hoi, J. Zhuang, I. Tsang, A family of simple non-parametric kernel learning algorithms. J. Mach. Learn. Res. (JMLR) 12, 1313–1347 (2011) 29. K.M. Borgwardt, A. Gretton, M.J. Rasch, H.P. Kriegel, B. Scholkopf, ¨ A.J. Smola, Integrating structured biological data by kernel maximum mean discrepancy. Int. Conf. 
Intell. Syst. Mol. Biol. 49–57 (2006) 30. L. Song, A. Smola, A. Gretton, K.M. Borgwardt. A dependence maximization view of clustering, in Proceedings of the 24th International Conference on Machine Learning, pp. 815–822 (2007)
31. B.E. Gillett, L.R. Miller, A heuristic algorithm for the vehicle-dispatch problem. Oper. Res. 22(2), 340–349 (1974) 32. G. Clarke, J. Wright, Scheduling of vehicles from a central depot to a number of delivery points. Oper. Res. 12(4), 568–581 (1964) 33. B. Golden, R. Wong, Capacitated arc routing problems. Networks 11(3), 305–315 (1981) 34. B.L. Golden, J.S. DeArmon, E.K. Baker, Computational experiments with algorithms for a class of routing problems. Comput. Oper. Res. 10(1), 47–59 (1983) 35. G. Ulusoy, The fleet size and mix problem for capacitated arc routing. Eur. J. Oper. Res. 22(3), 329–337 (1985) 36. T. Bäck, U. Hammel, H.-P. Schwefel, Evolutionary computation: comments on the history and current state. IEEE Trans. Evol. Comput. 1(1), 3–17 (1997) 37. J.W. Eerkens, C.P. Lipo, Cultural transmission, copying errors, and the generation of variation in material culture and the archaeological record. J. Anthr. Archaeol. 24(4), 316–334 (2005) 38. M.A. Runco, S. Pritzker, Encyclopedia of Creativity (Academic Press, 1999) 39. T.L. Huston, G. Levinger, Interpersonal attraction and relationships. Ann. Rev. Psychol. 29(1), 115–156 (1978) 40. F. Bousquet, C. Le Page, Multi-agent simulations and ecosystem management: a review. Ecol. Model. 176(3), 313–332 (2004) 41. P. Stone, M. Veloso, Multiagent systems: a survey from a machine learning perspective. Auton. Robot. 8(3), 345–383 (2000) 42. B. Burmeister, A. Haddadi, G. Matylis, Application of multi-agent systems in traffic and transportation, in IEE Proceedings-Software Engineering [see also Software, IEE Proceedings], vol. 144 (IET, 1997), pp. 51–60 43. D.L. Hancock, G.B. Lamont, Multi agent systems on military networks, in 2011 IEEE Symposium on Computational Intelligence in Cyber Security (CICS) (IEEE, 2011), pp. 100–107 44. M. Pipattanasomporn, H. Feroze, S. Rahman, Multi-agent systems in a distributed smart grid: design and implementation, in Power Systems Conference and Exposition, 2009. PSCE’09. IEEE/PES (IEEE, 2009), pp. 1–8 45. R.S. Sutton, Learning to predict by the methods of temporal differences. Mach. Learn. 3(1), 9–44 (1988) 46. C.J. Watkins, P. Dayan, Q-learning. Mach. Learn. 8(3–4), 279–292 (1992) 47. G.A. Rummery, M. Niranjan, On-line Q-learning using connectionist systems. Department of Engineering (University of Cambridge, 1994) 48. M.P. Deisenroth, G. Neumann, J. Peters et al., A survey on policy search for robotics. Found. Trends Robot. 2(1–2), 1–142 (2013) 49. S.P. Singh, R.S. Sutton, Reinforcement learning with replacing eligibility traces. Mach. Learn. 22(1–3), 123–158 (1996) 50. A. Lazaric, M. Restelli, A. Bonarini, Reinforcement learning in continuous action spaces through sequential monte carlo methods, in Advances in Neural Information Processing Systems, pp. 833–840 (2007) 51. K. Ueda, I. Hatono, N. Fujii, J. Vaario, Reinforcement learning approaches to biological manufacturing systems. CIRP Ann.-Manuf. Technol. 49(1), 343–346 (2000) 52. I. Giannoccaro, P. Pontrandolfo, Inventory management in supply chains: a reinforcement learning approach. Int. J. Prod. Econ. 78(2), 153–161 (2002) 53. A.E. Gaweda, M.K. Muezzinoglu, A.A. Jacobs, G.R. Aronoff, M.E. Brier, Model predictive control with reinforcement learning for drug delivery in renal anemia management, in Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual International Conference of the IEEE (IEEE, 2006), pp. 5177–5180 54. T.G. Dietterich, Hierarchical reinforcement learning with the maxq value function decomposition. J. Artif. Intell. Res. 
(JAIR) 13, 227–303 (2000) 55. R.S. Sutton, D. Precup, S. Singh, Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1–2), 181–211 (1999) 56. M.E. Taylor, P. Stone, Transfer learning for reinforcement learning domains: a survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)
57. R.S. Woodworth, E.L. Thorndike, The influence of improvement in one mental function upon the efficiency of other functions.(i). Psychol. Rev. 8(3), 247 (1901) 58. B.F. Skinner, Science and Human Behavior (Simon and Schuster, 1953) 59. G.P.C. Fung, J.X. Yu, H. Lu, P.S. Yu, Text classification without negative examples revisit. IEEE Trans. Knowl. Data Eng. 18(1), 6–20 (2006) 60. B. Bakker, T. Heskes, Task clustering and gating for bayesian multitask learning. J. Mach. Learn. Res. 4, 83–99 (2003) 61. C.H. Lampert, H. Nickisch, S. Harmeling, Learning to detect unseen object classes by betweenclass attribute transfer, in IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009 (IEEE, 2009), pp. 951–958 62. D. Wang, T.F. Zheng, Transfer learning for speech and language processing, in 2015 AsiaPacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) (IEEE, 2015), pp. 1225–1237 63. M.E. Taylor, N.K. Jong, P. Stone, Transferring instances for model-based reinforcement learning, in Machine Learning and Knowledge Discovery in Databases (Springer, 2008), pp. 488– 505 64. M.E. Taylor, P. Stone, Representation transfer for reinforcement learning, in AAAI 2007 Fall Symposium on Computational Approaches to Representation Change during Learning and Development, pp. 1–8 (2007) 65. B. Banerjee, P. Stone, General game learning using knowledge transfer, in IJCAI, pp. 672–677 (2007) 66. T.J. Walsh, L. Li, M.L. Littman, Transferring state abstractions between MDPs, in ICML Workshop on Structural Knowledge Transfer for Machine Learning (2006) 67. M.E. Taylor, P. Stone, Cross-domain transfer for reinforcement learning, in Proceedings of the 24th International Conference on Machine Learning (ACM, 2007), pp. 879–886 68. A. Taylor, I. Dusparic, E. Galván-López, S. Clarke, V. Cahill, Transfer learning in multi-agent systems through parallel transfer, in Theoretically Grounded Transfer Learning at the 30th International Conference on Machine Learning (ICML) (Omnipress, 2013) 69. P. Vrancx, Y. De H, A. Nowé, Transfer learning for multi-agent coordination, in ICAART (2), pp. 263–272 (2011) 70. M.E. Taylor, S. Whiteson, P. Stone, Transfer via inter-task mappings in policy search reinforcement learning, in Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent systems (ACM, 2007), p. 37 71. G. Boutsioukis, I. Partalas, I. Vlahavas, Transfer learning in multi-agent reinforcement learning domains, in Recent Advances in Reinforcement Learning (Springer, 2012), pp. 249–260 72. E. Oliveira, L. Nunes, Learning by exchanging advice, in Design of Intelligent Multi-agent Systems, Chapter 9, ed. by R. Khosla, N. Ichalkaranje, L. Jain (Spring, New York, NY, USA, 2005) 73. L. Feng, Y. S. Ong, A.H. Tan, X. Chen, Towards human-like social multi-agents with memetic automaton, in 2011 IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2011), pp. 1092–1099 74. Ah-Hwee Tan, Lu Ning, Dan Xiao, Integrating temporal difference methods and self-organizing neural networks for reinforcement learning with delayed evaluative feedback. IEEE Trans. Neural Netw. 19(2), 230–244 (2008) 75. H. Meisner, Interview with Richard S. Sutton. Kïnstliche Intelligenz, 3(1), 41–43 (2009) 76. X. Chen, Y. Zeng, Y.S. Ong, C.S. Ho, Y. Xiang, A study on like-attracts-like versus elitist selection criterion for human-like social behavior of memetic mulitagent systems, in 2013 IEEE Congress on, Evolutionary Computation (CEC) (IEEE, 2013), pp. 1635–1642 77. D. Gordon, D. 
Subramanian, A cognitive model of learning to navigate, in Proceedings of the 19th Conference of the Cognitive Science Society, vol. 25, p. 271 (1997) 78. D. Wang, A. Tan, Creating autonomous adaptive agents in a real-time first-person shooter computer game. IEEE Transactions on Computational Intelligence and AI in Games (2014) 79. J. Gemrot, R. Kadlec, M. Bída, O. Burkert, R. Píbil, J. Havlíˇcek, L. Zemˇcák, J. Šimloviˇc, R. Vansa, M. Štolba, et al., Pogamut 3 can assist developers in building ai (not only) for their videogame agents, in Agents for Games and Simulations (Springer, 2009), pp. 1–15
80. R. Adobbati, A.N. Marshall, A. Scholer, S. Tejada, G.A. Kaminka, S. Schaffer, C. Sollitto, Gamebots: a 3d virtual world test-bed for multi-agent research, in Proceedings of the Second International Workshop on Infrastructure for Agents, MAS, and Scalable MAS, Montreal, Canada, vol. 5 (2001) 81. J. Lindfors, M. Fleury, JMX: Managing J2EE with Java Management Extensions (Sams Publishing, 2002)
Chapter 5
Potential Research Directions
Although optinformatics in evolutionary learning and optimization has made remarkable progress in recent years, there are still a number of potential research directions of optinformatics that we believe will be beneficial to the field of evolutionary computation and that remain to be explored. In this chapter, the possible research directions of optinformatics in evolutionary learning and optimization are discussed.
5.1 Deep Optinformatics in Evolutionary Learning and Optimization

Deep learning (DL) has now become the main driver of many real-world learning applications, such as image classification [1], speech recognition [2], and recommendation systems [3]. It provides great power and flexibility by learning to represent a given task as a nested hierarchy of features, with each layer defining a different level of simplicity and abstraction of the task. DL mimics the way the human brain processes data for use in recognizing speech, detecting objects, and making decisions, etc., so that it is able to unravel huge amounts of unstructured data without human supervision. Common DL architectures include deep neural networks, deep belief networks, recurrent neural networks and convolutional neural networks, etc. [4]. Inspired by DL, and towards further enhanced evolutionary computation, a promising direction for future exploration is deep optinformatics in evolutionary learning and optimization. In particular, instead of learning a specific form of knowledge from the evolutionary search process, deep optinformatics aims at learning knowledge hierarchies in which knowledge at higher levels of the hierarchy is formed by the composition of lower-level knowledge. Automatically learning knowledge at multiple levels of abstraction allows a system to learn diverse and flexible forms of knowledge which can be leveraged at different stages of the evolutionary search process.
5.2 Evolutionary Knowledge Transfer Across Reinforcement Learning Tasks

In Sect. 4.2, the presented evolutionary transfer learning framework has exclusively focused on multi-agent reinforcement learning systems, wherein multiple agents (which may have different learning structures) share the same state and action spaces in a common reinforcement learning task of interest. Particularly, the memetic knowledge transfer process is primarily driven by imitation, which takes place when sociotype memes (defined as the manifested behavioral actions) are transmitted. Since the multiple agents possess common state and action spaces, assimilation of sociotype memes transferred from the teacher agent to the imitating agent can be seamlessly attained, since both share a common mind universe. Due to the unique representations of state and/or action variables in different reinforcement learning tasks, it is often the case that the knowledge acquired from one task cannot be directly used in another. In addition, adaptation of knowledge from unique agents to enable TL across tasks is a non-trivial task. From a survey of the literature, current research works have attempted to build the mappings for given pairs of differing reinforcement learning tasks. Recent studies have demonstrated that existing transfer methods are able to successfully improve the learning capacity of reinforcement learning tasks [5, 6]. Nevertheless, most of these studies assume that the mappings are manually pre-defined in accordance with analysis of specific problems, which can be very labour-intensive and time-consuming [7]. Falling back on the discussion in Sect. 2.3, memes can be naturally materialized as a wide variety of recurring real-world patterns. These meme patterns provide high-level common knowledge representations of problems, and hence enable knowledge reuse across problem domains. Particularly in the field of evolutionary computation, memes have already been defined as the transformation matrices that are transferred across differing problem domains for enhancing the evolutionary search process [8]. On this basis, a promising future direction is thus to extend evolutionary knowledge transfer to work with differing reinforcement learning tasks from multiple related domains.
5.3 GPU Based Optinformatics in Evolutionary Learning and Optimization

In recent decades, general-purpose computing on graphics processing units (GPU) has been successfully used in machine learning and optimization to speed up computational tasks traditionally handled by the central processing unit (CPU). For instance, [9] presented GPU-based randomized algorithms for solving global motion planning, while [10] designed a parallel strategy to speed up time series learning using GPUs. Further, Zhou et al. presented a massively parallel GPU-based A-star search to tackle complex optimization problems efficiently [11].
In the context of evolutionary learning and optimization, since evolutionary algorithms are naturally parallel in solving optimization problems, many efforts have also been made in the literature to explore the design of GPU-based EAs. In particular, Pospichal et al. proposed a parallel genetic algorithm (PGA) based on the popular GPU programming model, i.e., the compute unified device architecture (CUDA) [12]. In [13], Jaros introduced an implementation of the genetic algorithm exploiting a multi-GPU cluster based on the island model for solving large-scale knapsack problems. Furthermore, in [14], Gupta and Tan proposed a parallel GPU-based implementation of NSGA-II with a major focus on non-dominated sorting with large-scale populations for multi-objective evolutionary optimization. The aim of GPU-based optinformatics in evolutionary learning and optimization is to explore possible GPU paradigms for both evolutionary search and knowledge learning as well as transfer across optimization problems, which could further improve the efficiency of knowledge transfer in evolutionary search towards significantly accelerated learning and optimization performance.
5.4 Theoretical Study of Optinformatics

Despite the increasing research activities on optinformatics in evolutionary learning and optimization, there is a lack of sufficiently rigorous theoretical study to date. While there has been some theoretical analysis of knowledge transfer between classification tasks [15, 16], such analyses do not directly apply to evolutionary learning and optimization. To the best of our knowledge, there are only a few studies in the literature working on the similarity measure between optimization problems for positive knowledge transfer in evolutionary search [17, 18]. Unfortunately, these studies are still evaluated via empirical study rather than theoretical analysis, which makes the conclusions less general. To gain a better understanding of how optinformatics works, more theoretical work on optinformatics is thus necessary and expected. Examples include theoretical guarantees about whether a particular source problem can improve evolutionary learning and optimization on a target task (given a particular type of optimization problem), theoretical analysis of the correlation between the amount of knowledge transferred and the improvement in the target problem, and the definition of an optimal problem mapping for knowledge transfer within and across problem domains.
References

1. Y. Sun, B. Xue, M. Zhang, G.G. Yen, Evolving deep convolutional neural networks for image classification. IEEE Trans. Evol. Comput. 24(2), 394–407 (2020) 2. D. Yu, L. Deng, Automatic Speech Recognition: A Deep Learning Approach (Springer Publishing Company, Incorporated, 2014)
3. A. Karatzoglou, B. Hidasi, Deep learning for recommender systems, in Proceedings of the Eleventh ACM Conference on Recommender Systems, pp. 396–397 (2017) 4. Q. Zhang, L.T. Yang, Z. Chen, P. Li, A survey on deep learning for big data. Inf. Fus. 42, 146–157 (2018) 5. H.B. Ammar, E. Eaton, P. Ruvolo, M.E. Taylor, Unsupervised cross-domain transfer in policy gradient reinforcement learning via manifold alignment, in Proceedings of AAAI (2015) 6. Y. Liu, P. Stone, Value-function-based transfer for reinforcement learning using structure mapping, in Proceedings of the National Conference on Artificial Intelligence, vol. 21, p. 415 (2006) 7. M.E. Taylor, P. Stone, Cross-domain transfer for reinforcement learning, in Proceedings of the 24th International Conference on Machine Learning (ACM, 2007), pp. 879–886 8. L. Feng, Y. Ong, M. Lim, I.W. Tsang, Memetic search with interdomain learning: a realization between CVRP and CARP. IEEE Trans. Evol. Comput. 19(5), 644–658 (2015) 9. J. Pan, C. Lauterbach, D. Manocha, g-planner: Real-time motion planning and global navigation using GPUs (2010) 10. I. Coelho, V. Coelho, E. Luz, L. Ochi, F. Guimarães, E. Rios, A GPU deep learning metaheuristic based model for time series forecasting. Appl. Energy 201(C), 412–418 (2017) 11. Y. Zhou, J. Zeng, Massively parallel A* search on a GPU, in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI Press, 2015), pp. 1248–1254 12. P. Pospichal, J. Jaros, J. Schwarz, Parallel genetic algorithm on the CUDA architecture, in Proceedings of the 2010 International Conference on Applications of Evolutionary Computation—Volume Part I, pp. 442–451 (2010) 13. J. Jaros, Multi-GPU island-based genetic algorithm for solving the knapsack problem, in 2012 IEEE Congress on Evolutionary Computation, pp. 1–8 (2012) 14. S. Gupta, G. Tan, A scalable parallel implementation of evolutionary algorithms for multiobjective optimization on GPUs, in 2015 IEEE Congress on Evolutionary Computation (CEC), pp. 1567–1574 (2015) 15. J. Baxter, A model of inductive bias learning. J. Artif. Intell. Res. 12(1), 149–198 (2000) 16. S. Ben-David, R.S. Borbely, A notion of task relatedness yielding provable multiple-task learning guarantees. Mach. Learn. 73(3), 273–287 (2008) 17. L. Zhou, L. Feng, J. Zhong, Z. Zhu, B. Da, Z. Wu, A study of similarity measure between tasks for multifactorial evolutionary algorithm, in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 229–230 (2018) 18. A. Gupta, Y.S. Ong, B. Da, L. Feng, S.D. Handoko, Landscape synergy in evolutionary multitasking, in 2016 IEEE Congress on Evolutionary Computation (CEC), pp. 3076–3083 (2016)